Quantum Geometric Machine Learning
by Elija Perrier
Doctor of Philosophy
May, 2024
Acknowledgments
I would like to thank my principal supervisor Associate Professor Dr. Christopher
Ferrie for his patient, instructive and illuminating supervision throughout the tenure
of my research. In addition, I would like to thank A/P Ferrie for the opportunity to
participate in an AUSMURI collaborative research project while working with senior
and experienced researchers across quantum science disciplines. I would like to ac-
knowledge and thank my co-supervisor, Professor Dr. Dacheng Tao, for his counsel
and informative advice, especially relating to information theory, machine learning
and other computational science. I would like to express particular gratitude to Dr.
Christopher Jackson of the University of Waterloo. Dr Jackson’s insights and men-
torship have had a significant positive impact on both my research and academic
career. I would like to acknowledge the significant assistance of the iHPC training
facility at UTS, whose resources and expertise were important elements in running
large-scale quantum simulations over several months and years. I would like to ac-
knowledge the generous financial support provided by the Australian Government
via the Australian Research Training Program scholarship and by the UTS Faculty
of Engineering and Information Technology. I would like to thank staff, faculty and
students (past and present) at the Centre for Quantum Software and Information,
UTS. In particular, I acknowledge the support and discussions with colleagues Pro-
fessor Dr. Michael Bremner, Associate Professor Dr. Simon Devitt and Professor
Dr. Min-Hsiu Hsieh, together with Dr. Akram Youssry, Dr. Arinta Auza, Dr. Maria
Quadeer and Lirnadë Pira. I would like to also thank my academic colleagues at the
Australian National University whose engagement on deep learning, artificial intel-
ligence and other technical matters was highly beneficial, especially Professor Dr.
Seth Lazar of the ANU, Professor Dr. Tiberio Caetano of the Gradient Institute and
Professor Dr. Kimberlee Weatherill of the University of Sydney. I would like to also
thank colleagues at Stanford University, including Mauritz Kop and encouragement
from Professor Dr. Mateo Aboy at Cambridge University.
Finally I would like to thank my family including my mother, Janice Perrier,
whose help in caring for our young children (including a newborn) over many months
was a lifesaver, and my aunt, Alexsis Starcevich, whose support made that help
possible. Also thank you to my father Chris Perrier and his wife Cecilia Basile, and
my sister, Mirabai Perrier, who in the final weeks gave her all.
Most of all, I would like to express unending gratitude to my partner, Paige New-
man and my children, Violet Perrier-Newman and Scarlett Perrier-Newman. Their
sacrifices in time, space and attention to accommodate and support my research
were essential, without which I could not have completed this work, especially dur-
ing the challenging period of the COVID-19 pandemic through which this thesis was
formulated.
This is a THESIS BY COMPILATION. Parts of this thesis have already been
published. The content has been edited to suit the formatting of the thesis and to
maintain its coherence.
DECLARATION OF PUBLICATIONS INCLUDED IN THE THESIS
Contents
List of Figures xi
1 Introduction 1
1.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.2 Related work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.3 Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
1.4 Chapter Synopses . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
1.5 A Note on Notation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2 Background Theory 17
2.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.2 Quantum Information Processing . . . . . . . . . . . . . . . . . . . . 17
2.2.1 Operator formalism . . . . . . . . . . . . . . . . . . . . . . . . 19
2.2.2 Evolution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
2.2.3 Measurement . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
2.2.4 Quantum control . . . . . . . . . . . . . . . . . . . . . . . . . 23
2.2.5 Open quantum systems . . . . . . . . . . . . . . . . . . . . . . 24
2.3 Algebra and Lie Theory . . . . . . . . . . . . . . . . . . . . . . . . . 26
2.3.1 Lie groups and Lie algebras . . . . . . . . . . . . . . . . . . . 26
2.3.2 Representation theory . . . . . . . . . . . . . . . . . . . . . . 28
2.3.3 Cartan algebras and Root-systems . . . . . . . . . . . . . . . 29
2.3.4 Cartan decompositions . . . . . . . . . . . . . . . . . . . . . . 31
2.4 Differential geometry . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
2.4.1 Manifolds and tangents . . . . . . . . . . . . . . . . . . . . . . 32
2.4.2 Vector fields . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
2.4.3 Tensors and metrics . . . . . . . . . . . . . . . . . . . . . . . . 36
2.4.4 Tangent planes and Lie algebras . . . . . . . . . . . . . . . . . 36
2.4.5 Fibre bundles and Connections . . . . . . . . . . . . . . . . . 37
2.4.6 Geodesics and parallel transport . . . . . . . . . . . . . . . . . 40
Bibliography 433
List of Figures
3.2 The frequency response (left) and the phase response (right) of the
filter that is used to simulate distortions of the control pulses. The
frequency is in units of Hz, and the phase response is in units of rad. 123
4.3 Schema of GRU RNN Greybox model: (a) realised UT inputs (flat-
tened) into a GRU RNN layer comprising GRU cells in which each
segment j plays the role of the time parameter; (b) the output of the
GRU layer is a sequence of control pulses (ĉj ) using tanh activation
functions; (c) these are fed into a custom Hamiltonian estimation
layer to produce a sequence of Hamiltonians (Ĥj ) by applying the
control amplitudes to ∆; (d) the Hamiltonian sequence is fed into a
custom quantum evolution layer implementing the time-independent
Schrödinger equation to produce estimated sequences of subunitaries
(Ûj ) which are fed into (e) a final fidelity layer for comparison with
the true (Uj ). Intermediate outputs are accessible via submodels in
TensorFlow. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 158
4.9 Training and validation loss (MSE): GRU RNN Greybox. G = SU(2), ntrain =
1000, nseg = 100, h = 0.1, epochs = 500. This plot shows the MSE loss
(for training and validation sets) for the GRU RNN Greybox model
where the number of segments was increased from 10 to 100. As can
be seen, the model saturates rapidly once segments are increased to
100 and exhibits no significant learning. Similar results were found for
the SubRiemannian model. This result suggests that simply chang-
ing the number of segments is insufficient for model improvement.
One solution to this problem may be to introduce variable or adap-
tive hyperparameter tuning into the model such that the number of
segments varies dynamically. . . . . . . . . . . . . . . . . . . . . . . . 166
5.1 Cayley transform of −iH⊥III expressed as a rotation into −iλ5 . The
presence of the imaginary unit relates to the compactness here of g
which reflects the boundedness and closed nature of the transforma-
tion characteristic of unitary transformations. . . . . . . . . . . . . . 228
5.2 Cayley transform of −iH⊥III expressed as a rotation into λ5 . By con-
trast with the case to the left, the absence of the imaginary unit is
indicative of non-compactness such that distances are not preserved
(unlike in the unitary case where −i is present). . . . . . . . . . . . . 228
5.3 A transition diagram showing the relationship between energy tran-
sitions and roots. In a quantum control context, a transition between
two energy levels of an atom can be described by a root vector
in the Hamiltonian. For example, a transition |0⟩ → |1⟩ can
be described using the root vector eα . An electromagnetic pulse with
a frequency resonant with the energy difference between these two
levels can, if applied correctly, transition the system consistent with
the action of eα . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 230
5.4 Symmetric root system diagram for the root system described via
roots in equation (5.8.3) for the Lie algebra su(3). The roots α, β, γ
can be seen in terms of angles between the root vectors and can be
calculated using the Cartan matrix. . . . . . . . . . . . . . . . . . . . 230
5.5 Combined diagram of a Dynkin diagram and a symmetric root system
with specified angles and relations. . . . . . . . . . . . . . . . . . . . 231
B.1 Expanded Dynkin diagram of type An with labeled vertices and edges.
The numbers above the nodes indicate the length of the roots rela-
tive to each other. Aij Aji determines the number of lines (or the type
of connection) between vertices i and j. This product can be 0 (no
connection), 1 (single line), 2 (double line), or 3 (triple line), rep-
resenting the angle between the corresponding roots. Additionally,
when Aij Aji > 1, an arrow is drawn pointing from the longer root to
the shorter root. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 304
D.2 Manifold mapping between parameter manifold K and its fibre bun-
dle Rm and target manifold M with associated fibre bundle spaces
R2n . Optimisation across the parameter manifold can be construed
in certain cases as an embedding of the parameter manifold within
the target (label) manifold M, where such embeddings may often be
complex and give rise to non-standard topologies. . . . . . . . . . . . 423
List of Tables
3.3 QDataSet features for quantum state tomography. The left column
lists typical categories in a machine learning architecture. The right
column describes the corresponding feature(s) of the QDataSet that
would fall into such categories for the use of the QDataSet in training
quantum tomography algorithms. . . . . . . . . . . . . . . . . . . . . 118
3.4 QDataSet features for quantum noise spectroscopy. The left column
lists typical categories in a machine learning architecture. The right
column describes the corresponding feature(s) of the QDataSet that
would fall into such categories for the use of the QDataSet in training
quantum noise spectroscopy algorithms. . . . . . . . . . . . . . . . . 118
3.5 QDataSet features for quantum control. The left column lists typical
categories in a machine learning architecture. The right column de-
scribes the corresponding feature(s) of the QDataSet that would fall
into such categories for the use of the QDataSet in training quantum
control algorithms. The specifications are just one of a set of possible
ways of framing quantum control problems using machine learning. . 119
3.9 QDataSet File Description (Square). The left column identifies each
dataset in the respective QDataSet examples while the description
column describes the profile of the square pulse datasets in terms of
(i) number of qubits, (ii) axis of control and pulse wave-form (iii) axis
and type of noise and (iv) whether distortion is present or absent. . 122
3.10 An example of the types of quantum data features which may be in-
cluded in a dedicated large-scale dataset for QML. The choice of such
features will depend on the particular objectives in question. We
include a range of quantum data in the QDataSet, including informa-
tion about quantum states, measurement operators and measurement
statistics, Hamiltonians and their corresponding gates, details of en-
vironmental noise and controls. . . . . . . . . . . . . . . . . . . . . . 123
4.1 Comparison table of batch fidelity MSE ((Uj ) and (Ûj )) for training
(MSE(T)) and validation (MSE(V)) sets along with average opera-
tor fidelity (and order of standard deviation in parentheses) for four
neural networks where Λ0 ∈ su(2n ): (a) GRU & FC Blackbox (origi-
nal) (b) FC Greybox, (c) SubRiemannian model and (d) GRU RNN
Greybox model. Parameters: h = 0.1, nseg = 10, ntrain = 1000; train-
ing/validation 75/25; optimizer: Adam, α ≈ 1e-3. Note*: MSE for
GRU & FC Blackbox is standard MSE comparing (Uj ) with (Ûj ). Sub-
Riemannian and GRU RNN Greybox models outperform blackbox
models on training and validation sets with lower MSE, higher aver-
age operator fidelity and lower variance. . . . . . . . . . . . . . . . . 162
4.2 Comparison table of batch fidelity MSE ((Uj ) v. (Ûj )) for train-
ing (MSE(T)) and validation (MSE(V)) sets along with average op-
erator fidelity (and order of standard deviation in parentheses) for
models where Λ0 ∈ ∆: (a) GRU & FC Blackbox (original) (b) Sub-
Riemannian model and (c) GRU RNN Greybox model. Parameters:
h = 0.1, nseg = 10, ntrain = 1000; training/validation 75/25; opti-
mizer: Adam, α ≈ 1e-3. Note*: MSE for GRU & FC Blackbox is stan-
dard MSE comparing (Uj ) with (Ûj ). For this case, overall the GRU
RNN Greybox model performed slightly better than the SubRieman-
nian model, with both outperforming the GRU & FC Blackbox model.
The FC Greybox model was not tested given its inferior performance
overall. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 162
4.4 Hamiltonian distance and unitary fidelity between Swaddle and Boozer
geodesic approximations. . . . . . . . . . . . . . . . . . . . . . . . . . 184
Introduction
Abstract
The use of geometric and symmetry techniques in quantum and classical information
processing has a long tradition across the physical sciences as a means of theoretical
discovery and applied problem solving. In the modern era, the emergent combina-
tion of such geometric and symmetry-based methods with quantum machine learning
(QML) has provided a rich opportunity to contribute to solving a number of persis-
tent challenges in fields such as QML parametrisation, quantum control, quantum
unitary synthesis and quantum proof generation. In this thesis, we combine state-
of-the-art machine learning methods with techniques from differential geometry and
topology to address these challenges. We present a large-scale simulated dataset
of open quantum systems to facilitate the development of quantum machine learn-
ing as a field. We demonstrate the use of deep learning greybox machine learning
techniques for estimating approximate time-optimal unitary sequences as geodesics
on subRiemannian symmetric space manifolds. Finally, we present novel techniques
utilising Cartan decompositions and variational methods for analytically solving
quantum control problems for certain classes of Riemannian symmetric space.
Geometry will draw the soul toward truth and create the spirit of phi-
losophy (Plato)
Spacetime tells matter how to move; matter tells spacetime how to curve
(John Wheeler )
One geometry cannot be more true than another; it can only be more
convenient (Henri Poincaré)
1.1 Overview
This thesis introduces quantum geometric machine learning (QGML), presenting
results in quantum control and quantum machine learning using techniques from
differential geometry and Lie theory. Our work, as we discuss below and throughout
this thesis, represents a synthesis of four distinct but related disciplines: quantum
information science (focusing specifically on quantum computing and quantum con-
trol), abstract algebra and representation theory, differential geometry and quantum
machine learning. By combining insights and methods from these fields, we have
sought to leverage theory and architectures to explore how hybrid quantum-classical
systems can simulate quantum systems, learn underlying geometric symmetries and
be used as tools for optimisation problems. Throughout this thesis, we have en-
deavoured to connect the relevance of these important concepts to quantum in-
formation processing, control and machine learning in a way that is accessible to
a cross-disciplinary audience. Synthesising techniques across disciplines can be a
challenging and at times unwieldy task as drawing precise maps between concepts
faces a thicket of jargon, terms of art and disciplinary convention that characterise
any academic field. To add clarity to the process, we therefore foreshadow below
how each discipline relates to the overall contributions of this thesis.
(i) Quantum information processing. The primary focus of this thesis is to address
problems in quantum information processing, specifically problems related to
simulating quantum systems and the control of quantum systems. Quantum
information processing is a vast discipline, incorporating quantum comput-
ing, quantum communication and quantum sensing. Our focus is on the first
of these, quantum computing, but we retain a general information-theoretic
framing given the overlap with information sciences through quantum machine
learning theory. The targets and objectives of the work in this thesis are the
discovery and synthesis of unitary operations for quantum computation and
quantum control.
(ii) Algebra. Algebra, by which we primarily mean the theory of continuous Lie
groups and representation theory, informs this thesis in many ways but its main
relevance is that, at least for closed quantum systems, unitary operations
(the discovery of which is the primary objective above) form a Lie group G
whose corresponding Lie algebra g is the basis for constructing the quantum
mechanical Hamiltonians governing the evolution of quantum systems. To this
end, we explore the deep connections between algebra and symmetry reflected
in the work of Cartan and others via concepts such as Cartan decompositions,
abstract root systems and diagrammatic tools such as Dynkin diagrams.
(iii) Geometry. Geometry enters the thesis as a way to frame problems of interest
such that we can leverage specific results in the theory of classical (and quan-
tum) geometric control theory. In geometric terms, the unfolding of a quantum
computation via the representation of unitary evolution can be construed as
the tracing out of a curve on a differentiable manifold M corresponding to the
Lie group G. The evolution of the curve is generated by vectors in the tangent
space T M which can be identified with the Lie algebra g of group generators
above. Time- or energy-optimal (minimal) curves correspond to geodesics, so
the search for time-optimal quantum circuits or evolution becomes a question
of using the machinery of differential geometry to find geodesics with minimal
arc-length ℓ. For certain classes of Lie group G, the symmetry properties of
those groups allow the corresponding manifold G/K (for a chosen isometry
group K) to be construed as a Riemannian (or subRiemannian) symmetric
space admitting a Cartan decomposition g = k ⊕ p. In such cases, results from
geometric control theory show that general optimality (Pontryagin Maximum
Principle) criteria can be met under assumptions about the Lie group (and
Lie algebra) being fully reachable under the operation of the Lie derivative
(commutator). Moreover, the symmetry properties of such groups (specifically
their partition into horizontal HM and vertical V M subspaces corresponding
to the subalgebras p and k respectively) together with the Lie triple property
[p, [p, p]] ⊆ p mean that we can make simplifying assumptions about (a) the
existence of curves γ(t) between U (0) and U (T ) on M and (b) the uniqueness
(via being minimal length) of such curves, such as when those curves are generated
by Hamiltonians drawn from p only. In this case, the optimisation problem is
considerably simplified, becoming a question of finding the minimal (optimal)
(in terms of energy or time) control functions u(t) which apply to such genera-
tors to evolve curves. The final substantive Chapter 5 provides a new method
for calculating such time-optimal curves for use in quantum information.
(iv) Quantum machine learning. Quantum and classical machine learning enter the
thesis as a way to solve this specific problem of finding optimal control
functions u(t) by training hybrid classical-quantum algorithms on data such
as data about quantum geodesics. The quantum element of machine learning
enters via a hybrid quantum-classical stack whereby the rules of quantum evo-
lution (such as via Schrödinger’s equation and via Hamiltonians comprising
Lie algebra elements g) are encoded in the neural network stack. The neu-
ral network architecture then seeks to leverage classical machine learning to
learn control functions u(t) to achieve objectives of synthesising U (T ) (such
as via maximising fidelity between the model estimate Û (T ) and U (T ) itself).
We call this hybrid model a greybox model of quantum machine learning as
it incorporates both a whitebox of known facts (such as the laws of quantum
mechanical state evolution) and a blackbox of unknown non-linear
parameters. We show the utility of this approach in Chapter 3, where we ev-
idence its benefits in constructing a large-scale quantum computing dataset
of specific relevance to researchers in both closed- and open-quantum systems
research and in Chapter 4, where we show how for certain classes of problem,
the greybox method is more effective at learning the geometric and geodesic
structure of time-optimal unitaries from data than other methods.
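To make the greybox idea concrete, the following minimal sketch (illustrative only: the function names, random controls and simple fidelity loss are our simplifications, not the exact architectures of Chapters 3 and 4) composes a blackbox output of control amplitudes with a whitebox layer that hard-codes Schrödinger evolution:

```python
import numpy as np
from scipy.linalg import expm

# Fixed su(2) generators: the "whitebox" (known physics) part of the stack.
SX = np.array([[0, 1], [1, 0]], dtype=complex)
SZ = np.array([[1, 0], [0, -1]], dtype=complex)

def whitebox_evolve(controls, generators, dt):
    """Whitebox layer: piecewise-constant Schrodinger evolution.
    `controls` has shape (n_seg, n_ctrl): amplitudes emitted by the blackbox."""
    U = np.eye(2, dtype=complex)
    for u_seg in controls:
        H = sum(u * G for u, G in zip(u_seg, generators))  # H(u) = sum_k u_k H_k
        U = expm(-1j * H * dt) @ U                         # U_j = exp(-i H_j dt)
    return U

def fidelity_loss(U_est, U_target):
    """1 - gate fidelity, with F = |tr(U_target^dag U_est)|^2 / d^2."""
    d = U_target.shape[0]
    return 1.0 - np.abs(np.trace(U_target.conj().T @ U_est)) ** 2 / d ** 2

# A trainable blackbox (e.g. a GRU RNN) would emit `controls`; random here.
rng = np.random.default_rng(0)
controls = rng.normal(size=(10, 2))    # n_seg = 10 segments, 2 control axes
U_hat = whitebox_evolve(controls, [SX, SZ], dt=0.1)
print(fidelity_loss(U_hat, np.eye(2, dtype=complex)))
```

In the full models this forward pass is implemented as differentiable TensorFlow layers, so that gradients of the fidelity loss propagate back into the blackbox parameters.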
Historical background
Geometric methods have a long lineage in quantum mechanics and quantum infor-
mation. The historical development of late 19th and early 20th century physics
was significantly informed by, and developed in tandem with, geometric techniques,
from general relativity to quantum mechanics. Geometry informed much of 19th
century mathematics and physics.
The search for symmetry has also shaped computational science in profound ways
since its inception, including via techniques to optimise network flows, analyse data,
solve cryptographic problems and facilitate error correction. More recently, the
advent of quantum information sciences [11], specifically quantum computing, has
breathed new life into the use of symmetry reduction techniques drawn from these
fields as researchers seek new methods to overcome challenges posed by computa-
tional intractability or resource constraints. The emergence of sophisticated machine
learning methods and technologies has also brought about an impetus to explore how
well-established mathematics of symmetry can be leveraged together with learning
protocols in order to solve optimisation problems in quantum information science
in general and quantum control specifically. The nature of learning in classical and
quantum information science is a rich and widely studied problem [12, 13]. At the
heart of theories of learning are methods, protocols or even unknown phenomena
that in some way increase the knowledge of a system about the world or some ground
truth. In this sense quantitative sciences of information embody a certain philosoph-
ical stance or claim about what learning is. In the abstract, learnability of systems
is (or can be) construed using stateful representation, in which the knowledge or
information of a system represents an epistemic state, such as one encoded within
the parameters of a neural network [14] or quantum algorithm, as a means of, in
effect, quantifying (together with, for example, rules of inference) what the system
knows. Yet information is not the sole measure relevant to learn-
ing: knowledge about the state of the world is key, as assessed by
figures of merit such as prediction metrics (e.g. loss functions), which most often
stand in as proxies for a system's epistemic state.
Thus a system that randomly assimilates large amounts of information with
poor predictive capacity (that is, one which fails to generalise or which overfits)
is considered one with less capacity to learn (or that has learnt less) than one with
less information but greater generalisability, in both classical and quantum contexts.
Learning thus involves complex interactions between, on the one hand, the amount
of information a system can represent (enabling its versatile generalisability) and,
on the other, the extent to which that system develops sufficiently accurate
representations of the world in order to make accurate predictions, something
manifest in practice in the bias-variance trade-off and studied formally within
statistical learning theory. A learning
protocol is thus one which improves this epistemic state and enables the learning of
some structure and predictive improvement. Learning can therefore be construed, and in
computational science usually is construed, inherently as an optimisation task given
objectives and data about the world. Machine learning, classically, involves the
search for and design of models which satisfy such epistemic objectives to optimise
this learning task using classical data and algorithms. Quantum machine learning
has the same overall aim, yet is complicated by the unique characteristics of quantum
information (as manifest in properties of quantum data and quantum algorithms),
imposing constraints upon how learning occurs in a quantum context.
Quantum control theory (QCT) problems are typically specified by a bilinear
control system of the form U̇ = −iH(u(t))U ,
where initial conditions are chosen depending on the problem at hand (where time
dependence is understood). The QCT problem becomes how to select controls uk ∈
R (bounded) to reach some target U (T ) at time t = T , ideally in a time- or energy-
optimal fashion [15–19]. The algebraic and Lie theoretic features of this problem
enter quantum information-based control theory because quantum computation in
general must be unitary (or represented by unitary channels) in order for quantum
states and quantum data (and measures) to be preserved (as distinct from collapsing to
classical data). This is represented via the computations of interest being described
as unitary group elements U (t) ∈ G where G is a unitary or special unitary group (see
definitions A.1.18, B.2.4 and sections A.1.4 generally). As a result, this necessitates
that Hamiltonians H(t) in Schrödinger’s equation (definition A.2.2) (whose solutions
are such unitaries U (t)) must (in the closed-system case) be comprised of elements
from the corresponding Lie algebra g which generate those group elements (note we
discuss the impact of noise in sections below). Control theory is then characterised
by two primary objectives: (a) determining whether the system is controllable, that
is, whether the desired or target computation or end point can be reached given
the application of a control system; and (b) strategies for determining the optimal
control for satisfying an objective such as energy or time minimisation.
The theory of quantum geometric control (QGC), such as early work in quan-
tum control leveraging Cartan decompositions in NMR contexts [20–22], derives
from a mixture of algebraic techniques (such as Cartan decompositions) and clas-
sical geometric control theory [23–26]. The fulcrum concept here is the seminal
correspondence between Lie groups G and Lie algebras g on the one hand and dif-
ferential manifolds M and tangent spaces T M on the other. To this end, this work
fits within an emerging literature on the application of symmetry reduction and
symmetry-based optimisation methods applied to quantum machine learning. To
solve this optimisation problem (and indeed to simulate open and closed quantum
systems for quantum control as per Chapter 3), we leverage techniques in control
theory, geometry and quantum machine learning. Our work involves leveraging
machine learning (ML) techniques via a hybrid quantum-classical approach adopt-
ing quantum machine learning (QML) principles. It focuses on using parametrised
quantum circuits (see section D.7) in neural network architectures which encode
structural features (such as unitarity of outputs) while enabling machine learning
optimisation.
Other related fields to our work include geometric machine learning (GML) [27]
where machine learning problems, such as parameter manifolds and gradient de-
scent, are studied from an information-theory perspective, bearing similarity with
geometric information science [28] and Lie group machine learning [29]. Probably the
field with the most overlap with QGML is that of geometric QML (GQML) [30–38],
a recently-developed area which studies how symmetry techniques can be used in
the design of, or embedded within, quantum machine learning architectures (usually
parametrised or variational quantum circuits) in ways such that those networks re-
spect, to varying degrees, those underlying symmetries. GQML techniques have also
been shown to have import in addressing aspects of the barren plateau problem in
QML. GQML is probably the most similar field to the work herein due to its use
of dynamical Lie algebras in QML network design. Most of the literature in the
Figure 1.1: Venn diagram describing overlap of quantum information processing and control theory
(Quantum Control Theory), Geometry and Machine Learning (ML). QGML sits at the intersection
of quantum computing and control theory, geometry and machine learning. Related fields, includ-
ing quantum geometric control (QGC), geometric machine learning (GML) and quantum machine
learning (QML) can be situated accordingly. The most overlap between existing literature and the
present work on QGML is geometric quantum machine learning (GQML) which seeks to encode
symmetries across entire quantum machine learning networks.
1.3 Contributions
In this section, we set out a synopsis of each Chapter below. The first five Chapters
are self-contained and should be regarded as the thesis proper. Note that we have
included both a background theory Chapter (see Chapter 2) and extensive supple-
mentary background information in the form of Appendices. The rationale for this
is that the subject matter of this work is inherently multi-disciplinary in nature,
synthesising concepts from quantum information processing with those in algebra,
geometry and machine learning. Each supplementary Appendix is tailored to pro-
vide additional background material in a relatively contained way for readers who
may be familiar with some, but not all, of these diverse scientific disciplines. The
Appendices reproduce or paraphrase standard results in the literature with source
material identified at the beginning of each Appendix. Proofs are omitted for brevity
but can be found in the cited sources and other standard texts. The substantive
Chapters 3, 4 and 5 have been tailored to cross-refer to the Appendices’ sections,
definitions, theorems and equations in order to assist readers who may wish to delve
deeper.
Appendix B (Algebra)
This supplemental Appendix includes background information on Lie theory and
representation theory relevant to geometric concepts applied in our results Chap-
ters. The summary begins with contextualisation of the role of geometry in the
development of algebraic and quantum techniques. It then surveys essential results
from the theory of Lie groups and Lie algebras which are of central importance to the
results we present further on. It covers aspects of representation theory relevant to
the work, including root system derivation, Cartan matrices and Dynkin formalism.
It focuses in particular on Cartan decompositions from an algebraic standpoint.
Background Theory
2.1 Overview
Quantum geometric machine learning intersects multiple mathematical and scien-
tific disciplines. In this Chapter, we provide a brief synopsis of background theory
relevant to our contributory Chapters 3, 4 and 5. In addition to the material be-
low, we have linked to extensive supplemental material set out in the Appendices.
The rationale for this is primarily the interdisciplinary nature of the subject mat-
ter that spans quantum information processing, algebra and representation theory,
differential geometry and machine learning (both classical and quantum).
We begin with the notion of a register (definition A.1.2), where a classical register X is either (a) a simple register, being an
alphabet Σ (describing a state); or (b) a compound register, being an n-tuple of
registers X = (Yk ). Quantum registers and states are then defined as elements of
a complex (Euclidean) space CΣ for a classical state set Σ satisfying specifications
imposed by axioms of quantum mechanics (definition A.1.3). Quantum registers
(and by extension quantum states ψ) are assumed to encode complete information
about the quantum system (such that even open quantum systems (section A.3)
can be reframed as complete closed systems encompassing system and environment
dynamics - see below).
Hilbert spaces are equipped with important structural features (which we describe in
more detail below). First is the inner product (definition A.1.4) ⟨·, ·⟩ : V × V →
C, (ψ, ϕ) ↦ ⟨ψ, ϕ⟩ for ψ, ϕ ∈ V . Together with the Cauchy-Schwarz inequal-
ity (definition A.1.5) we then define a norm (definition A.1.6) on V (C), given by
∥·∥ : V → R, ψ ↦ ∥ψ∥ for ψ ∈ V , which in turn imposes a distance (metric)
function d on V . Complete normed vector spaces are Banach spaces satisfying certain con-
vergence and boundedness properties and allowing definition of an operator norm
(definition A.1.7) ensuring for example that operations (evolutions) remain within
V (C). This structure allows us to define in a quantum informational context the
concept of the dual space, a particularly important concept when we come to geo-
metric and algebraic representations of quantum states and evolutions. The dual
space (definition A.1.8) V ∗ (K) to V (K) is defined as the set of all bounded linear
functionals χ : V → K, ψ ↦ a∥ψ∥ for some (scaling) a ∈ K.
A Hilbert space is then formally defined as a vector space H(C) with an inner
product ⟨·, ·⟩ complete in the norm defined by the inner product, ∥ψ∥ = √⟨ψ, ψ⟩
(definition A.1.9). Hilbert spaces can be composites of other Hilbert spaces, such
as in the case of direct-sum (spaces) H = Hi ⊕ Hj (relevant to state-space de-
composition) (definition A.1.1). Moreover, a Hilbert space admits orthogonal and
orthonormal bases (definition A.1.10), allowing us to work with H using an orthonor-
mal basis (definition A.1.11), which is of fundamental importance to the formalism of quantum
computation.
In many cases (such as those that are the subject of Chapters 3 and 4) we are interested in
two-level quantum systems denoted qubit systems (equation A.1.3) along with multi-
qubit systems, hence we are interested in tensor products of Hilbert spaces Hi ⊗ Hj
(definition A.1.12). Tensor products, which also have a representation as a bilinear
map T : Hi × Hj → C, are framed geometrically later on, forming an important part
of differential geometric formalism (such as relating to contraction operations and
measurement). They exhibit certain universal and convergence properties required
for representations of quantum states and their evolutions (see section A.1.1.1).
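As a concrete illustration (a minimal numpy sketch with arbitrarily chosen states), the inner product, induced norm and tensor product of qubit states can be computed directly:

```python
import numpy as np

ket0 = np.array([1, 0], dtype=complex)                 # |0>
plus = np.array([1, 1], dtype=complex) / np.sqrt(2)    # |+>

inner = np.vdot(ket0, plus)                 # <0|+>: conjugates the first argument
norm = np.sqrt(np.vdot(plus, plus).real)    # ||psi|| = sqrt(<psi, psi>) = 1
pair = np.kron(ket0, plus)                  # |0> (x) |+> in H_i (x) H_j (dim 4)

print(inner, norm, pair)
```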
Using the formalism above, one often uses bra-ket notation where states ψ ∈ H are
represented via ket notation |ψ⟩ with their duals ψ † represented via bra notation
⟨ψ|. Connecting this notation, ubiquitous in quantum information, to the formalism
of vector spaces, duals and mappings provides a useful bridge to geometric framings
of quantum information (we expand upon this further on).
Quantum channels are completely positive trace-preserving (CPTP) maps, which play an important role in quantum information theory where we
are often interested in operations upon (or maps between) operator representations.
The set of such channels is denoted C(H1 , H2 ). An example is framing unitary
evolution in terms of unitary channels (definition A.1.30) where the operation of
U ∈ B(H) is represented as Φ(X) = U XU † for X ∈ B(H). Furthermore, we
can then define quantum-classical channels (definition A.1.31) Φ ∈ C(H1 , H2 )
which transform a quantum state ρ1 ∈ B(H) (with possible off-diagonal terms) into a
classical distribution of states represented by a diagonal density matrix ρ2 ∈ B(H).
Quantum-classical channels are a means of representing general measurement in
quantum information (see generally [43]).
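A minimal sketch of these two channel types (assuming density matrices represented as numpy arrays; full dephasing is used as one simple example of a quantum-classical channel):

```python
import numpy as np

def unitary_channel(rho, U):
    """Unitary channel Phi(X) = U X U^dag (definition A.1.30)."""
    return U @ rho @ U.conj().T

def dephasing_channel(rho):
    """A simple quantum-classical channel: off-diagonal terms are removed,
    leaving a diagonal density matrix (a classical distribution over states)."""
    return np.diag(np.diag(rho))

rho_plus = np.array([[0.5, 0.5], [0.5, 0.5]], dtype=complex)   # |+><+|
H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)    # Hadamard
print(unitary_channel(rho_plus, H))   # = |0><0|
print(dephasing_channel(rho_plus))    # = diag(1/2, 1/2)
```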
2.2.2 Evolution
The evolution of quantum systems is framed by the postulate (axiom A.1.3) that
quantum states evolve according to the Schrödinger equation i dψ = H(t)ψ dt (see
definition A.1.18) where ψ ∈ H and the self-adjoint operator that generates evolu-
tions in time, H(t) ∈ B(H), is denoted the Hamiltonian (definition A.1.33). The
Hamiltonian (which represents the set of Hamilton's equations [44] for quantised
contexts [45]) of a system is the primary means of mathematically characterising
the dynamics of quantum systems, describing state evolution and control archi-
tecture. The equation has equivalent representations in density operator form as
dρ = −i[H, ρ]dt (equation (A.1.20)) and unitary form as dU U⁻¹ = −iH dt (equation
(A.1.21)). The utility of each representation depends on the problem at hand. We
focus primarily on the unitary form due to the apparent relationship with unitary
Lie groups and geometry. Time-dependent solutions to Schrödinger's equation take
the form of unitaries (equation (A.1.22)) U = T+ exp(−i ∫₀^∆t H(t)dt) (see section
A.1.5), the set of which, as we shall see, forms a Lie group. In practice solving for
the time-dependent form of U (t) can be difficult or intractable. Instead a time-
independent approximation U (t) ≈ U (t − 1) · · · U (0) (equation (A.1.25)) is adopted
by assuming that Hamiltonians H(t) (and thus U (t)) can be regarded as constant
over sufficiently small time scales ∆t. Doing so allows a target unitary U (T ) to
be decomposed as a sequence U (T ) = U (tn )...U (t0 ). As discussed below, for full
controllability such that arbitrary U (t) are reachable (within a reachable set R),
we rely upon the Baker-Campbell-Hausdorff theorem (definition B.2.18) to ensure
the subspace p ⊂ g of Hamiltonian generators is bracket-generating (definition
C.4.4) such that g is generated under the operation of the Lie derivative (B.2.6).
We explain these terms in more detail below.
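The piecewise-constant approximation can be illustrated numerically (a sketch, with an arbitrary time-dependent Hamiltonian of our choosing):

```python
import numpy as np
from scipy.linalg import expm

SX = np.array([[0, 1], [1, 0]], dtype=complex)
SZ = np.array([[1, 0], [0, -1]], dtype=complex)

def H(t):
    """An arbitrary time-dependent Hamiltonian for illustration."""
    return np.cos(t) * SX + 0.5 * SZ

T, n = 1.0, 1000
dt = T / n
U = np.eye(2, dtype=complex)
for j in range(n):
    # H(t) treated as constant on [j dt, (j+1) dt): U(T) = U(t_n) ... U(t_0)
    U = expm(-1j * H(j * dt) * dt) @ U

print(np.round(U, 4))   # approximates the time-ordered exponential
```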
2.2.3 Measurement

Measurement of a quantum state ρ via measurement operators Mm updates the state as

ρ′ = Mm ρ Mm† / ⟨Mm† Mm , ρ⟩

where the normalisation arises due to the (partial) trace operation of measurement tr(Mm† Mm ρ). This is sometimes
described as the Copenhagen interpretation or collapse of the wave function view of
quantum mechanics. In information theory terms, each measurement outcome m is
associated with a positive semi-definite operator (equation (A.1.31)). In more ad-
vanced treatments, we are interested in measurement statistics drawn from positive
operator-valued measures (POVMs), a set of positive operators {Em } = {Mm† Mm }
satisfying Σm Em = I in a way that gives us a complete set of positive opera-
tors with which to characterise ρ (see sections A.1.6.1 and A.1.6.3). Because of
this collapse, quantum measurement must be repeated using identical initial con-
ditions (state preparations) in order to generate sufficient measurement statistics
from which quantum data, such as quantum states ρ or operators such as U (t) may
be reconstructed or inferred (e.g. via state or process tomography). Chapter 3
of this work explores the role of measurement in quantum simulations. In Chap-
ters 4 and 5, where we assume access to data about U (t) and estimates Û (t), we assume the
existence of a measurement process that provides statistics that enable character-
isation of operators and states in order to calculate loss for machine learning
optimisation protocols. As flagged above, measurement can be framed in terms of
quantum-to-classical channels (equation A.1.36) and permits composite (e.g. multi-
qubit) system measurement (see section A.1.6.2). Moreover, as we discuss in section
D.6.3, the distinct characteristics of quantum versus classical measurement have im-
plications for quantum control and quantum machine learning (such as that typical
online instantaneous feedback based control is unavailable due to the collapse of ρ
under measurement).
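As an illustration of the generation of measurement statistics under repeated identical preparations (a sketch using the computational-basis POVM):

```python
import numpy as np

rng = np.random.default_rng(1)
rho = np.array([[0.5, 0.5], [0.5, 0.5]])           # |+><+|
povm = [np.diag([1.0, 0.0]), np.diag([0.0, 1.0])]  # {E_m}, sum_m E_m = I

probs = [np.trace(E @ rho).real for E in povm]     # Born rule: p(m) = tr(E_m rho)
shots = rng.choice(len(povm), size=1000, p=probs)  # repeated identical preparations
counts = np.bincount(shots, minlength=len(povm))
print(probs, counts / 1000)   # empirical statistics approach Born probabilities
```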
Quantum measurements then form a key factor in being able to distinguish quan-
tum states and operators according to quantum metrics (section A.1.8). The choice
of appropriate metric on H or B(H) is central to quantum algorithm design and ma-
chine learning where, for example, one must choose an appropriate distance measure
for comparing function estimators fˆ to labels drawn from Y (see section D.6). State
discrimination (section A.1.8.1) for both quantum and probabilistic classical states
requires incorporation of stochasticity (the probabilities) together with a similarity
measure. For example, the Holevo-Helstrom theorem (A.1.38) quantifies the proba-
bility of distinguishing between two quantum states given a single measurement µ.
A range of metrics such as Hamming distance, quantum relative entropy (definition
A.1.27) and fidelity exist for comparing states and operators. In this work, we focus
upon the fidelity function (see section A.1.8.2) allowing state and operator discrim-
ination, F (ρ, σ) = ∥√ρ √σ∥₁ = Tr √(√σ ρ √σ) (equation A.1.49), which is related
to the trace distance (definition A.1.25). The fidelity function is chosen as the loss
function for our empirical risk measure in Chapters 3 and 4 via the cost functional
given in equation 4.6.3.
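The fidelity can be computed directly from this definition (a sketch using scipy's matrix square root; for pure states it reduces to the overlap |⟨ψ|ϕ⟩|):

```python
import numpy as np
from scipy.linalg import sqrtm

def fidelity(rho, sigma):
    """F(rho, sigma) = Tr sqrt( sqrt(sigma) rho sqrt(sigma) )."""
    s = sqrtm(sigma)
    return np.trace(sqrtm(s @ rho @ s)).real

rho = np.diag([1.0, 0.0])                     # |0><0|
sigma = np.array([[0.5, 0.5], [0.5, 0.5]])    # |+><+|
print(fidelity(rho, sigma))                   # |<0|+>| = 1/sqrt(2) ~ 0.707
```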
2.2.4 Quantum control

Quantum control problems take the form of a bilinear control system U̇ = −iH(u(t))U ,
where iH(u(t)) is a Hamiltonian comprising controls and generators (for us, drawn
from a Lie algebra g). The general Pontryagin Maximum Principle (PMP) control
problem then incorporates adjoint (costate) variables and other variational assump-
tions in order to obtain the general form of the control problem, the quantum versions
of which are set out in section C.5.3 (and see generally [15,23,46]). The drift Hamil-
tonian Hd and control Hamiltonian Hc combine as (equations (A.2.9) and (A.2.10)):

H(u) = Hd + Σk Hk uk ,    U̇ = −i (Hd + Σk Hk uk ) U    (2.2.2)
where initial conditions are chosen depending on the problem at hand (often U (0) =
I) or, as per our global Cartan decomposition method in Chapter 5 in order to
streamline solving the optimal control problem. The objective of time-optimal con-
trol is to assume a target UT ∈ Rm (a reachable set given applicable controls) which
is reachable by applying control functions uj (t) ∈ U ⊂ Rm to corresponding gen-
erators in g or a control subset p ⊂ g, subject to a bounded norm on the controls
||uj (t)|| ≤ L. The targets are quantum states ρ(T ) at final time T represented via
CPTP unitary operators (channels) U (T ) acting on initial conditions (also repre-
sented in terms of unitaries). As such, for the quantum case, we are interested in
targets as elements of Lie groups of interest to quantum computation, such as uni-
tary groups and special unitary groups (see definition (B.2.4)). Given theoretical
assumptions as to the existence of length minimising geodesics on Riemannian or
subRiemannian manifolds (see sections C.2.1 and C.4), time optimisation becomes
equivalent to length (equation (C.2.8)) or energy (equation (C.4.2)) minimisation
(see section C.5.3).
Lie groups G are equipped with the structure of a differentiable manifold M
(see section C.1.1 and definition C.1.2). Thus we assume suitable structure on the
underlying manifold M such as the existence of a metric (e.g. Riemannian or sub-
Riemannian, section C.2.1 and C.5.7) which for Lie groups and their corresponding
Lie algebras is the metric induced by the Killing form (definition B.2.12).
In practice the minimisation problem proceeds using variational methods (see
sections C.5.4 and 5.6), which for controls u(t) means minimising arc-
length via minimising control pulses, ℓ = min_{u(t)} ∫₀ᵀ ∥u(t)∥dt (see equation (C.5.3)),
having regard to manifold curvature (which enters the minimisation problem via the
metric).
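A sketch of this discretised objective (an illustrative discretisation of our own, with arc-length approximated as Σj ∥uj∥∆t):

```python
import numpy as np
from scipy.linalg import expm

SX = np.array([[0, 1], [1, 0]], dtype=complex)
SY = np.array([[0, -1j], [1j, 0]], dtype=complex)

def evolve_and_cost(u, generators, dt):
    """Apply piecewise-constant controls u (shape (n_seg, n_ctrl)) and return
    the endpoint U(T) plus the discretised arc-length sum_j ||u_j|| dt."""
    U = np.eye(2, dtype=complex)
    length = 0.0
    for u_j in u:
        H = sum(c * G for c, G in zip(u_j, generators))
        U = expm(-1j * H * dt) @ U
        length += np.linalg.norm(u_j) * dt   # approximates int_0^T ||u(t)|| dt
    return U, length

u = 0.3 * np.ones((10, 2))                   # a candidate control schedule
U_T, ell = evolve_and_cost(u, [SX, SY], dt=0.1)
print(ell)   # the quantity a time/energy-optimal protocol seeks to minimise
```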
2.2.5 Open quantum systems

Open quantum systems (section A.3) are modelled as a composite quan-
tum system comprising the system (or closed quantum evolution) and the environ-
ment (i.e. interactions with the environment). Such open systems can be modelled
(equation (A.3.1)) in a simple case via introducing environmental dynamics into the
Hamiltonian:

H(t) = H0 (t) + H1 (t) = [Hd (t) + Hctrl (t)] + [HSE (t) + HE (t)]    (2.2.3)
where H0 (t) is defined as above encompassing drift and control parts of the Hamil-
tonian and where H1 (t) consists of two terms: (i) a system-environment interaction
part HSE (t) and (ii) a free evolution part of the environment HE (t).
More generally, open system dynamics can be modelled via the Lindblad master equation:

dρ/dt = −i[H, ρ] + Σk γk ( Lk ρ Lk† − ½{Lk† Lk , ρ} )    (2.2.4)

with ρ being the state representation, H(t) the closed system Hamiltonian, Σk
a summation over all noise channels with dephasing (decoherence) rates γk . The
Lindblad operators Lk acting on ρ encode how the environment affects the closed
quantum state. For modelling of noise in Chapter 3, we impose noise β(t) along var-
ious axes of qubits (see section 3.5.3.1), an example being a single qubit Hamiltonian
with noise βz (t) imposed along the z-axis:
H = ½ (Ω + βz (t)) σz + ½ fx (t) σx .
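A sketch of the right-hand side of equation (2.2.4) for this single-qubit example (taking Lk = σz as the dephasing operator, with illustrative values of Ω, fx and γ of our choosing, and βz(t) set to zero):

```python
import numpy as np

SZ = np.array([[1, 0], [0, -1]], dtype=complex)
SX = np.array([[0, 1], [1, 0]], dtype=complex)

def lindblad_rhs(rho, H, Ls, gammas):
    """d(rho)/dt = -i[H, rho] + sum_k gamma_k (L rho L^dag - {L^dag L, rho}/2)."""
    out = -1j * (H @ rho - rho @ H)
    for L, g in zip(Ls, gammas):
        LdL = L.conj().T @ L
        out += g * (L @ rho @ L.conj().T - 0.5 * (LdL @ rho + rho @ LdL))
    return out

Omega, fx, gamma = 1.0, 0.2, 0.1             # illustrative values
H = 0.5 * Omega * SZ + 0.5 * fx * SX         # the Hamiltonian above, beta_z = 0
rho = np.array([[0.5, 0.5], [0.5, 0.5]], dtype=complex)   # |+><+|
print(lindblad_rhs(rho, H, Ls=[SZ], gammas=[gamma]))  # dephasing damps coherences
```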
The Lie bracket [·, ·] on g is denoted the commutator in
quantum contexts or the Lie derivative (owing to its status in terms of derivations),
satisfying a number of algebraic properties (proposition B.2.2). As we discuss in the
next section, Lie algebras have a natural interpretation and nice geometric intuition
in terms of tangent bundles over Lie group manifolds M. Importantly (particularly
for our exegesis in Chapter 5) the Lie derivative can be identified with the endomor-
phic adjoint action of a Lie algebra upon itself (definition B.2.7) ad : g → EndK (g)
such that adX (Y ) = [X, Y ]. As with Lie groups, Lie algebra homomorphisms (defi-
nition B.2.8) allow mappings between Lie algebras which is important in geometric
(fibre bundle) formulations of quantum evolution over manifolds and corresponding
tangent bundles (see Appendix C). Semi-simple Lie algebras and Lie groups admit
a classification system based upon concepts of ideals (definition B.2.11) which is of
fundamental importance to Cartan’s classification of symmetric spaces. Establish-
ing time optimality using variational techniques requires metrics (inner products)
on Hamiltonians composed from g which in turn requires a way of combining gen-
erators in g. This is given by the Killing form (definition B.2.12) which is a bilinear
mapping over g given by B(X, Y ) = Tr(adX adY ) for X, Y ∈ g. Additionally, Car-
tan’s criterion for semisimplicity is that the Killing form is non-degenerate, that
is, g is semisimple if and only if the Killing form for g is non-degenerate, namely
that there is no non-zero X ∈ g such that B(X, Y ) = 0 for all Y ∈ g.
Quantum control in terms of Lie groups and Lie algebras relies upon the impor-
tant bridge between the two provided by the exponential map between G and g. In
essence unitaries U (t) represent exponentiated Lie algebra g elements G ∼ exp(g).
This relationship as we discuss is algebraic but connects nicely (not focused on here)
with wave function representations of solutions to Schrödinger’s equation in terms
of exponentials (or their sine and cosine expansions). In this context, the matrix
exponential is an important bridging concept. It is defined (definition B.2.13) via
the power series e^X = Σ_{n=0}^∞ Xⁿ/n! for X ∈ g and satisfies a number of properties
we rely on throughout (set out in theorem (B.2.5)). Where the exponential map
allows one to transition from g to G, the derivative of the matrix exponential at
t = 0 allows one to transition from g ∈ G to X ∈ g via (d/dt) e^{tX} |_{t=0} = X (equation
(B.2.15)). Moreover, the mapping gives rise to an important unique homomorphism
between G and g at the identity, allowing symmetries and properties of G to be
explored (and, in a control sense, manipulated) by way of g itself. Formally, the
Lie algebra of a matrix group G ⊆ GL(n; C) is then given by (definition B.2.16)
g = {X ∈ Mn (C) | e^{tX} ∈ G for all t ∈ R}. Moreover, this relationship allows us
to map between the adjoint action of g and that of G itself (equation (B.2.16))
[X, Y ] := XY − Y X = (d/dt) e^{tX} Y e^{−tX} |_{t=0} , which we rely upon for example in our final
Chapter.
An important connection between algebraic and geometric methods in quantum
information and machine learning arises from the correspondence between Lie alge-
bras g and the tangent plane (or bundle) of a manifold T M. The correspondence
(theorem B.2.17) provides that the Lie algebra g of G is equivalent to the tan-
gent space to G at the identity element of G. This equivalence means that g has
a representation as X ∈ Mn (C). The vectors X represent derivatives at t = 0 of
smooth curves γ : R → G with γ(0) = I, γ ′ (0) = X, allowing characterisation of
sequences of unitaries (Uj (t)) ∈ G (here j indexes the sequence) as paths γj (t), i.e.
we are equating curves with sequences of unitaries. Note that G can also carry
a representation in Mn (C). Furthermore, a Lie group homomorphism
(section B.2.8) Φ : G → H induces a Lie algebra homomorphism φ : g → h such that dΦ = φ. The homomorphism is
fundamental to important theorems such as the Baker-Campbell-Hausdorff theorem
(section B.2.9):
exp(X) exp(Y ) = exp( X + Y + ½[X, Y ] + (1/12)[X, [X, Y ]] + ... )
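Both the exponential map and a truncated BCH series can be checked numerically for small elements of su(2) (a sketch; we include the −(1/12)[Y, [X, Y ]] term of the full series for accuracy):

```python
import numpy as np
from scipy.linalg import expm

def comm(A, B):
    return A @ B - B @ A

# Small skew-Hermitian (su(2)) elements, so the series converges quickly.
X = -0.1j * np.array([[0, 1], [1, 0]])    # -0.1 i sigma_x
Y = -0.1j * np.array([[1, 0], [0, -1]])   # -0.1 i sigma_z

lhs = expm(X) @ expm(Y)
Z = (X + Y + 0.5 * comm(X, Y)
     + comm(X, comm(X, Y)) / 12 - comm(Y, comm(X, Y)) / 12)  # truncated BCH
print(np.max(np.abs(lhs - expm(Z))))   # small: dominated by 4th-order terms
```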
The adjoint representation (section B.3) maps Lie group and al-
gebra elements to automorphisms of the Lie algebra itself. The benefit of adopting
the adjoint representation is also to some degree one of computational efficiency
where symmetry structure can be more easily diagnosed or embedded (such as in
the case of equivariant neural networks [30, 34, 37]). Adjoint expansions (section
B.3.5) are an important tool in later Chapters (particularly in Chapter 5). Recall-
ing group adjoint action as Adh (g) = hgh^{−1} with the related Lie algebra adjoint
action adX (Y ) = [X, Y ], we note that such conjugation can be expanded in sine and
cosine terms (equation (B.3.2)):

e^{iΘ} X e^{−iΘ} = e^{i adΘ}(X) = cos(adΘ)(X) + i sin(adΘ)(X)    (2.3.1)
This relation, together with the sine and cosine expansions (equation (B.3.7)) of
the adjoint action, is central to the closed-form results for calculation of minimum
time (equation (5.3.13)) and time-optimal Hamiltonians for quantum control involving
Riemannian symmetric spaces (see section 5.6).
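Equation (2.3.1) can be verified numerically by representing adΘ as a matrix on vectorised operators (a sketch; we use the row-major identity vec(AXB) = (A ⊗ Bᵀ)vec(X)):

```python
import numpy as np
from scipy.linalg import expm, cosm, sinm

Theta = np.array([[0.3, 0.1], [0.1, -0.3]])     # Hermitian generator
X = np.array([[0, 1], [1, 0]], dtype=complex)   # operator to be conjugated

n = Theta.shape[0]
I = np.eye(n)
# ad_Theta as a matrix on row-major vectorised operators:
# vec(Theta X - X Theta) = (Theta kron I - I kron Theta^T) vec(X)
ad = np.kron(Theta, I) - np.kron(I, Theta.T)

lhs = expm(1j * Theta) @ X @ expm(-1j * Theta)  # e^{i Theta} X e^{-i Theta}
rhs = ((cosm(ad) + 1j * sinm(ad)) @ X.reshape(-1)).reshape(n, n)
print(np.max(np.abs(lhs - rhs)))                # zero to numerical precision
```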
Lie algebras are defined over R (real) and C (complex) fields. Complex Lie
algebras g (over C) and real Lie algebras g0 (over R) are related via complexification
and the related concept of the real form of a complex Lie algebra. A real vector
space V (R) is the real form of a complex vector space W when the complexification
of V is given by W = V ⊕ iV (definition B.3.3). In Lie algebraic terms this
is g = g0^C = g0 + ig0 (equation (B.3.9)). In particular, this allows us to write
A ∈ Mn (C) as a real-valued matrix only (equation (B.3.10)), which we leverage in
Chapter 4.
Roots of a Lie algebra are constructed (see sections B.4 and
B.5) via the adjoint action. For a subalgebra h ⊂ g with a (diagonal) basis H and
a dual space h∗ with elements ej ∈ h∗ (a dual space of functionals ej : V → C), we
can construct a basis of g given by {h, Eij } where Eij is 1 in the (i, j) location and
zero elsewhere (section B.4.2). We study the adjoint action of elements H ∈ h on
each such eigenvector:

adH (Eij ) = [H, Eij ] = (ei (H) − ej (H)) Eij

which satisfies certain commutation relations given in equation (B.4.2). Roots may
then be placed in a positive or negative ordered sequence. Each root is a non-
generalised weight of adh (g) (a root of g with respect to h). The set of roots is
denoted ∆(g, h). The weights (definition B.4.1) for a given representation π : h →
V (C) can then be constructed as a generalised weight space Vα = {v ∈ V | (π(H) −
α(H)1)^n v = 0, ∀H ∈ h} for Vα ̸= 0 (v ∈ Vα are generalised weight vectors with
α the weights). Here π is the adjoint action of h on X ∈ g. Weights allow g to
be decomposed as g = h ⊕ (⊕α∈∆ gα ) (equation (B.4.3)). Note that g0 is then the weight space
subalgebra associated with the zero weight under this adjoint action. This leads
to the important definition of a Cartan subalgebra (definition B.4.2) h ⊂ g as a
nilpotent Lie algebra of a finite-dimensional complex Lie algebra g such that h = g0 .
Cartan subalgebras are maximally abelian subalgebras of g. There may be multiple
Cartan subalgebras in g and each is conjugate via an automorphism of g (which
for Cartan decompositions means under conjugation by an element of the isometry
group K). As we discuss in Chapter 5, the choice of Cartan subalgebra is of pivotal
significance to the application of global Cartan decompositions for time-optimal
control.
Cartan subalgebras allow the generalisation of the concept of roots to a root
system. In this formulation (definition B.4.3), roots are the non-zero generalised
weights of adh (g) with respect to the Cartan subalgebra. Recall we denote the set of
roots α by ∆(g, h). Cartan subalgebras h are the maximally abelian subalgebras such
that the elements of adg (h) are simultaneously diagonalisable given that [Hk , Hj ] =
0 for Hk , Hj ∈ h. Roots act as functionals such that for α ∈ ∆, we have that
α(H) = B(H, Hα ), ∀H ∈ h where B(·, ·) denotes the Killing form on g and h (see
section B.4.4 for more detail). From this concept of roots, we can define abstract
root systems (section B.4.5) that satisfy a range of properties set out in definition
B.4.4. These in turn allow roots to be framed in geometric terms related to the Weyl
group and concepts of angles between roots (see section B.4.6) which are expressed
via Dynkin diagrams (see below), together with an ordering of roots (see section
B.4.7). Chapter 5 includes derivation of a root system, for example, in relation to
SU (3) connected with the introduction of our method for time-optimal synthesis.
Cartan algebras can be used to construct a Cartan-Weyl basis which is a basis of the
Lie algebra g comprising the Cartan subalgebra h together with root vectors, where
each root can be thought of as a symmetry transformation (see equation (B.4.1)).
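The smallest concrete example is sl(2) (a sketch): with Cartan element H = diag(1, −1) and root vector E12, the adjoint action returns the root value α(H) = e1(H) − e2(H) = 2:

```python
import numpy as np

H = np.diag([1.0, -1.0])                   # Cartan subalgebra element (diagonal)
E12 = np.array([[0.0, 1.0], [0.0, 0.0]])   # root vector e_alpha
E21 = np.array([[0.0, 0.0], [1.0, 0.0]])   # root vector e_{-alpha}

print(H @ E12 - E12 @ H)   # = +2 E12: alpha(H) = e1(H) - e2(H) = 2
print(H @ E21 - E21 @ H)   # = -2 E21: the corresponding negative root
```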
Abstract root systems moreover allow the construction of a Cartan matrix (def-
inition B.4.5) where, given ∆ ⊂ V and a simple root system Π = {αk }, k = 1, ..., n =
dim V , the Cartan matrix of Π is an n × n matrix A = (Aij ) whose entries are given
by Aij = 2⟨αi , αj ⟩/|αi |² . Cartan matrices are then used to construct Dynkin dia-
grams (definition B.4.6) as set out in Figure B.1. Dynkin diagrams allow for visual
representation of a root system, encoding angles and lengths between roots. From a
quantum control perspective, roots can be related to the transition frequencies be-
tween different energy levels of a quantum system. Importantly, the Cartan matrix
(definition B.4.5), derived from the inner products of simple roots, tells us about
the relative strengths and coupling between these transitions (see equation (5.8.14)
for a generic example).
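For example (a sketch), the Cartan matrix of the A2 root system associated with su(3) follows directly from two equal-length simple roots at 120°:

```python
import numpy as np

# Simple roots of the A2 (su(3)) root system: equal lengths, 120 degrees apart.
alpha1 = np.array([1.0, 0.0])
alpha2 = np.array([-0.5, np.sqrt(3) / 2])
simple = [alpha1, alpha2]

A = np.array([[2 * np.dot(ai, aj) / np.dot(ai, ai) for aj in simple]
              for ai in simple])
print(A)   # [[2, -1], [-1, 2]]; A_12 A_21 = 1: a single line in the Dynkin diagram
```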
2.3.4 Cartan decompositions

We conclude this section with a discussion of the key concept of Cartan decompo-
sitions (section B.5) of semisimple Lie groups and their associated algebras. For a
given a complexified semisimple Lie algebra g = g0^C , a corresponding Cartan involu-
tion θ is associated with the decomposition g = k ⊕ p. In turn, this allows a global
decomposition of G = KAK where K = e^k and A = e^a where a ⊂ p (a maximal
abelian subspace of p), which is a generalisation of polar decompositions of matrices.
For classical semisimple groups, the real matrix Lie algebra g0 over R, C is closed un-
der conjugate transposition, in which case g0 is the direct sum of its skew-symmetric
k0 and symmetric p0 members. It can be shown [9] that for semi-simple Lie alge-
bras this admits a decomposition g0 = k0 ⊕ p0 which in turn can be complexified as
g = k ⊕ p. To obtain this decomposition, we consider the fundamental involutive
symmetry automorphism on g denoted the Cartan involution. Using the Killing
form B, X, Y ∈ g and an involution θ, a Cartan involution (definition B.5.1) is an
involution θ for which Bθ (X, Y ) = −B(X, θY ) is a positive definite symmetric bilinear form. From
the existence of this involution, we infer the existence of the relevant Cartan decom-
position given by g = k ⊕ p with k the +1 symmetric eigenspace θ(k) = k and p the
−1 anti(skew)symmetric eigenspace θ(p) = −p satisfying the following commutation
relations:

[k, k] ⊆ k,    [k, p] ⊆ p,    [p, p] ⊆ k.

The subspaces k, p are orthogonal with respect to the Killing form in that Bg (X, Y ) =
0 = Bθ (X, Y ) i.e. for X ∈ k and Y ∈ p we have B(X, Y ) = 0. In a quantum control
context, this aligns with the decomposition of H corresponding to states generated
by p and k. The remarkable and important feature of the commutations above is that
under the operation of the Lie bracket, Hamiltonians comprised solely of elements
in p can in fact reach targets in k despite the fibre bundle structure partitioning
the space (see the following section). The corresponding Lie group decomposition
is given by G = KAK where K = exp(k), A = exp(a) (definition B.5.3) for a ⊂ h.
Every element g ∈ G thus has a decomposition as g = k1 ak2 where k1 , k2 ∈ K, a ∈ A.
This allows unitary channels to be decomposed as U = kac and U = pk (see [15] for the latter) for k, c ∈ exp(k) = K, a ∈ A = exp(a) and p ∈ exp(p). We note also, as discussed
in Chapter 5, that for our global decomposition method, we seek a ⊂ p which
means that the Cartan subalgebra must have a compact and non-compact subset
i.e. h ∩ k ̸= 0 and h ∩ p ̸= 0, where we seek the maximally non-compact (most
overlap with p) subalgebra. Because h may lie entirely within k, it may be that a
Cayley transform is required which in essence involves conjugation of elements of
h with a root vector (see Knapp [9] and section B.5.3 for detail). An example of
Cayley transforms for SU (3) is set out in section 5.8. We now progress to cover
important concepts from differential geometry applicable to methods used in this
work, especially those from geometric control theory.
Two curves γ1(t), γ2(t) through p ∈ M are regarded as equivalent (tangent at p) if, in a local chart with coordinates x^i,
$$\frac{dx^{i}}{dt}(\gamma_{1}(t))\bigg|_{0} = \frac{dx^{i}}{dt}(\gamma_{2}(t))\bigg|_{0}, \qquad i = 1, \ldots, m.$$
Thus the intuitive notion of a vector ‘sticking out’ from a surface or along a curve
is replaced with analytic (derivation) properties of M via its representation in lo-
cal charts. The equivalence class of curves satisfying this condition at p ∈ M is
sometimes denoted [γ], hence we can regard tangents as vectors. The tangent space
Tp M is defined as the set of all tangents at p ∈ M, while the tangent bundle T M is
defined as the union of all Tp M. We also define a projection map from the tangent
bundle to M via π : T M → M which, as we shall see, is related in Lie theoretic
contexts to the exponential map from g to G. A tangent vector v ∈ Tp M can be used to define the directional derivative on f ∈ C∞(M) by considering v as an operator acting on f via
$$v(f) = \frac{d f(\gamma(t))}{dt}\bigg|_{0}.$$
The directional derivative is central to defining the all-important gradient (definition C.1.5), which points in the direction of steepest ascent and is central also to machine learning and quantum information processing. For f : M → R, the gradient of f at a point p ∈ M, denoted ∇f(p), is the unique vector in Tp M satisfying v(f) = ⟨∇f(p), v⟩ for all v ∈ Tp M. Here v(f) denotes the directional derivative of f in the direction of v (as above), and ⟨·, ·⟩ denotes the applicable metric. We discuss gradients at some length in later sections, especially in the context of backpropagation and stochastic gradient descent.
While intuitively one often regards a problem as confined to a single M and
bundle T M, each Tp M in the general case is not necessarily identical. For a map
between manifolds h : M → N, we can define a corresponding pushforward (definition C.1.6) as taking a vector in the tangent space associated with p ∈ M, namely v ∈ Tp M, to its image h∗(v) in the tangent space associated with the corresponding element of N, i.e. h∗ : T M → T N. As discussed in section C.1.2, tangents can be characterised as operators or linear maps acting on functions f ∈ C∞(M), such that tangents act as derivations. Each tangent vector can be written as
$$v = \sum_{\mu=1}^{m} v^{\mu}\, \partial_{\mu}$$
where (∂µ) represent a basis of partial derivatives of T M with vµ ∈ R the component functions (equation (C.1.11)).
A type-(r, s) tensor at p may then be regarded as an element of $T_p\mathcal{M}^{\otimes r} \otimes (T_p^{*}\mathcal{M})^{\otimes s}$, i.e. r tensor products of the tangent space with itself, tensor-producted with s tensor products of the dual cotangent space. Moreover, tensors can be regarded as
mappings taking elements from Tp M and the dual space Tp∗ M and contracting
them down to scalars (equation (C.1.24)). This formalism of contraction is fun-
damental to calculating metrics (via metric tensors), curvature (via the Riemann
curvature tensor) and other operations. Recalling the contraction to unity of basis elements in equation (C.1.16), we can see that contraction of arbitrary tensors $\langle a f^{i}, b\, e_{j}\rangle = ab\,\langle f^{i}, e_{j}\rangle = ab\,\delta^{i}_{j}$ effectively multiplies the remaining tensors (vectors and/or covectors) by the scalar product of coefficients (see section C.1.4). This
can be seen in particular for tensor contractions (definition C.1.15). In quantum
information and control contexts, the quantum measurement (via the partial or full
trace) can be considered in effect a tensorial contraction in this way. Thus tensors
play a central role in quantum theory and dynamics.
A tensor of central importance to calculation of time-optimal unitary sequences,
measurement and machine learning with quantum systems is the metric tensor (sec-
tion C.1.4.1), defined as a (0,2)-type tensor field mapping $g_p : T_p\mathcal{M} \times T_p\mathcal{M} \to \mathbb{R}$ given by $g := g_{ij}\, dx^{i} \otimes dx^{j}$ with metric components $g_{ij} = \langle e_i, e_j\rangle$ and inverse components $g^{ij} = \langle dx^{i}, dx^{j}\rangle$. As we discuss below, it is the existence of a metric tensor
on Riemannian and subRiemannian (symmetric) spaces of interest that allows for
calculation of time-optimal metrics, such as arc-length (minimal time) or energy.
The general principles above can be related to n-forms (section C.1.4.2) in order to
build up the concept of an exterior product of forms ω1 ∧ ω2 (definition C.1.18) and
exterior derivative (definition C.1.19). The former is related to Cartan’s structural
equations (theorem C.2.2) and the Maurer-Cartan form, which can be related to measures of curvature, for example.
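To illustrate how a metric tensor yields the arc-length quantities minimised in time-optimal problems, the following Python sketch (illustrative only, not thesis code) computes the length of a curve on the 2-sphere, whose round metric g = diag(1, sin²θ) stands in for a generic gij:

```python
import numpy as np

def sphere_metric(theta):
    """Round metric on the 2-sphere in (theta, phi) coordinates."""
    return np.diag([1.0, np.sin(theta) ** 2])

def curve_length(gamma, t_grid):
    """Approximate ell(gamma) = int sqrt(g(dgamma, dgamma)) dt on a grid.

    gamma: callable t -> (theta, phi).
    """
    pts = np.array([gamma(t) for t in t_grid])
    vels = np.gradient(pts, t_grid, axis=0)       # finite-difference velocities
    integrand = [np.sqrt(v @ sphere_metric(p[0]) @ v) for p, v in zip(pts, vels)]
    return np.trapz(integrand, t_grid)

# A quarter great circle along a line of longitude: length should be pi/2.
t = np.linspace(0.0, 1.0, 1000)
print(curve_length(lambda s: (np.pi / 2 * s + 1e-6, 0.0), t))   # ~1.5708
```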
Geometric concepts of fibre bundles and connections are integral to geometric control
theory and geometric techniques in quantum information processing and machine
learning. They represent central concepts leveraged in Chapters 4 and 5. They play a
fundamental structural role in the abstract underpinning of how vectors transform
across manifolds M, curvature and parallel transport. Usually in quantum comput-
ing contexts, one assumes the existence of a single Hilbert space H within which
vectors are transported. In geometric framings, one instead usually begins with a differentiable manifold M equipped with a topology of interest. One then associates to points in M additional abstract structure to enable actions like differentiation and mappings of interest to be well-defined, such as a real or complex valued vector space (though it may be some other type of structure). In order to obtain the type of structural consistency quantum information researchers are used to (i.e. the ease of dealing with a single H), one needs to impose additional geometric structure, including bundles and connections. To see this, note that in geometric framings, the evolution of a system from one state to another is represented as a transformation p → p′ for p, p′ ∈ M. Each p ∈ M has its own abstract space, e.g. its own vector space. One must therefore define how vectors in one space, e.g. Tp M, transform to Tp′ M.
To handle this abstract formulation, geometry adopts the concept of a bundle, defined as a triple (E, π, M), where the total space E and the base space M are linked via a continuous projection map π : E → M, together with the preimage map π⁻¹ taking points of M to subsets of E. The idea of a bundle encapsulates the means of assigning abstract structures such as vector spaces to points p ∈ M. These abstract structures are the fibres associated with p. The total space is then the union of all fibres. Thus we define the total (bundle) space as $E = \bigcup_{p\in\mathcal{M}} F_p$, with Fp being fibres which remain abstract at this stage. Formally (definition C.1.21), a fibre over p is the inverse image of p under π: the projection π associates each fibre Fp with a point p ∈ M, where Fp = π⁻¹({p}) defines Fp as the preimage of {p} under π. The tangent bundle, with π⁻¹({p}) = Tp M, is the canonical example, associating to each p ∈ M the tangent space Tp M.
the preimage {p} of p under π. Certain bundles have the special property that the
fibres π −1 ({p}), p ∈ M are all homeomorphic (diffeomorphic for manifolds) to F . In
such cases, F is known as the fibre of the bundle and the bundle is said to be a fibre
bundle. In the tangent bundle, for instance, the fibre over p is the set of all vectors that are tangent to the manifold at the point p. The fibre bundle is sometimes visualised in diagrammatic form (see Isham [48] §5.1.2) as the sequence
$$F \hookrightarrow E \xrightarrow{\;\pi\;} \mathcal{M}.$$
An important canonical fibre bundle is the principal fibre bundle (section C.1.6.1)
whose fibre acts as a Lie group. A principal fibre bundle has a typical fibre that is
a Lie group G, and the action of G on the fibres is by right multiplication, which
is free and transitive. The fibres of the bundle are the orbits of the G-action on E; because the action is free and transitive, each fibre is homeomorphic to G itself. When dealing with principal fibre bundles, often the notation $P = \bigcup_{p\in\mathcal{M}} F_p$ is used to denote the total space (i.e. E) in order to emphasise that each fibre is isomorphic to G itself and that all fibres in the bundle are homogeneous, i.e. they are all structurally isomorphic (so we can utilise a single representation for each). This is not necessarily the case in general
for base spaces E.
To obtain the more specific form of a vector bundle, or a principal bundle whose fibres are the Lie group itself, additional structure (such as local trivialisations and a group action) is imposed. The covariant derivative ∇X is defined for vector fields X ∈ X(M). In this work we generally use the notation ∇γ̇(t) to emphasise that the covariant derivative is with respect to γ̇(t). The notation in geometric settings (section C.1.10) can get quite dense, but we can see in the definition above that $\tau_t^{-1}$
transports vectors back along γ(t) → γ(0) = p0 while preserving parallelism. Here
X(p0 ) can be thought of as the initial vector field at γ(0) = p0 (the start of our
curve) and X(γ(t)) the vector field at some later point γ(t) with τt−1 X(γ(t)) the
vector at a later time transported back. The expression $\tau_t^{-1}X(\gamma(t)) - X(\gamma(0))$ is then zero precisely when the two vectors are identical, hence they are 'parallel'. The ∇X operator is also linear in X(M) (which can also be regarded as a module over C∞(M)). These properties are expressed by considering ∇ as an affine connection, an operator ∇ : X(M) × X(M) → X(M) which associates with X ∈ X(M) a linear mapping ∇X of X(M) satisfying affine conditions (see definition C.1.34). As we mention below, Riemannian manifolds are equipped with an important unique affine connection denoted the Levi-Civita connection.
The Levi-Civita connection is unique in that it is torsion-free (so characterised by
curvature only) and by virtue of its compatibility with the Riemannian metric (i.e.
it preserves the inner product of tangent vectors under parallel transport, in turn
preserving angles and lengths along curves).
We can now define a geodesic (definition C.1.35) using such notions of parallelism
and the covariant derivative. Generally, a geodesic is a curve that locally minimises
distance and is a solution to the geodesic equation derived from a chosen connection.
Denote γ : I → M, t ↦ γ(t) for an interval I ⊂ R (which we, without loss of generality, specify as I = [0, 1]) with an associated tangent vector γ̇(t). Here γ is regular. Two vector fields X, Y are parallel along γ(t) if ∇X Y = 0 for all t ∈ I. A curve γ : I → M, t ↦ γ(t) in M is denoted a geodesic if its tangent vectors γ̇(t) ∈ Tγ(t)M are parallel along γ, corresponding to the condition that ∇γ̇ γ̇ = 0, which we denote the geodesic equation. In a coordinate frame the geodesic equation is expressed in its somewhat more familiar form as (equation C.2.1):
$$\frac{d^{2} u^{\gamma}}{ds^{2}} + \Gamma^{\gamma}_{\alpha\beta}\frac{du^{\alpha}}{ds}\frac{du^{\beta}}{ds} = 0$$
where:
$$\Gamma^{\gamma}_{\alpha\beta} = \frac{1}{2}\, g^{\gamma\mu}\left(\frac{\partial g_{\mu\alpha}}{\partial u^{\beta}} + \frac{\partial g_{\mu\beta}}{\partial u^{\alpha}} - \frac{\partial g_{\alpha\beta}}{\partial u^{\mu}}\right)$$
are the Christoffel symbols of the second kind (essentially connection coefficients)
with g γµ the inverse of the metric tensor and ds usually indicates parametrisation
by arc length. Solutions to this equation are geodesic curves γ(t). For Lie group
manifolds, all geodesics are generated by generators from the horizontal subspace
Hp M, but not all curves generated from the horizontal subspace are geodesics.
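As a numerical illustration (not code from this work), the coordinate form of the geodesic equation can be integrated directly. For the 2-sphere the nonzero Christoffel symbols are Γ^θ_φφ = −sin θ cos θ and Γ^φ_θφ = Γ^φ_φθ = cot θ, giving:

```python
import numpy as np
from scipy.integrate import solve_ivp

def geodesic_rhs(s, y):
    """Geodesic equation on the 2-sphere as a first-order system.

    y = (theta, phi, dtheta/ds, dphi/ds).
    """
    theta, phi, dtheta, dphi = y
    ddtheta = np.sin(theta) * np.cos(theta) * dphi ** 2             # -Gamma^theta_{phi phi}
    ddphi = -2.0 * (np.cos(theta) / np.sin(theta)) * dtheta * dphi  # -2 Gamma^phi_{theta phi}
    return [dtheta, dphi, ddtheta, ddphi]

# Start on the equator moving due east: the solution stays on the equator,
# a great circle, as expected of a geodesic.
sol = solve_ivp(geodesic_rhs, (0.0, np.pi), [np.pi / 2, 0.0, 0.0, 1.0])
print(sol.y[0, -1])   # theta remains ~pi/2 along the geodesic
```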
A discussion of curvature, including Gauss's Theorema Egregium (which provides that Gaussian curvature K is invariant under local isometries), can be found in section C.2.2.
With the understanding of geometric concepts above, we can now progress to key
concepts in subRiemannian geometry and geometric control which are applied in
later Chapters. SubRiemannian geometry is relevant to quantum control problems
where only a subspace p ⊂ g is available for Hamiltonian control. Intuitively, this
means that the manifold exhibits differences with a Riemannian manifold where
the entirety of g is available, including in relation to geodesic length. In this sense
subRiemannian geometry is a more generalised form of geometry with Riemannian
geometry sitting within it conceptually. We detail a few key features of subRieman-
nian theory below before moving onto geometric control.
SubRiemannian geometry involves a manifold M together with a distribution ∆ upon which an inner product is defined. Distribution in this context refers to a linear sub-bundle of the tangent (vector) bundle of M and corresponds to the horizontal subspace of T M discussed above, where the vertical subspace is non-trivial. Formally (definition C.4.1), a subRiemannian structure consists of a distribution ∆, being a vector sub-bundle Hp M ⊂ T M, together with a fibre inner product on Hp M. The sub-bundle corresponds to the horizontal distribution, having the meaning ascribed to the horizontal subspace Hp M above. In the language of Lie algebras, for a decomposition g = k ⊕ p, the accessible (or control) subspace is p ⊂ g rather than p = g. Thus quantum control scenarios where only a subspace of g is available for controls can be framed as subRiemannian problems (under typical assumptions).
Geometrically, this means that the generators or vector fields X(M) are constrained
to limited directions. A transformation not in the horizontal distribution may be possible where the space exhibits certain symmetry structure, such as for symmetric spaces equipped with the Lie triple property (as we discuss for certain subspaces where [p, p] ⊆ k), but in a sense these are indirect, such that the geodesic paths connecting the start and end points will be longer than for a Riemannian geometry on the same manifold.
2.5.1 Overview
We conclude this section by bringing the above concepts together through the ful-
crum of geometric control theory (section C.5) relative to quantum control problems
of interest. The primary problem we are concerned with in our final two chapters is
solving time-optimal control problems for certain classes of Riemannian symmetric spaces. The two overarching principles for optimal control are (a) establishing the existence of controllable trajectories (paths) and thus reachable states; and (b) showing that the chosen path (or equivalence class of paths) is unique by showing it meets a minimisation (thus optimisation) criterion. The Pontryagin Maximum Principle (PMP) provides a framework and conditions for satisfying these existence and uniqueness requirements. As we discuss below, it can often be difficult or infeasible to find solutions to an optimisation problem expressed in terms of the PMP. However,
for certain classes of problem (with which we are concerned) involving target states
in Lie groups G with generators in g, results from algebra and geometry can be
applied to satisfy these two requirements. Thus, under appropriate circumstances,
where targets are in Lie groups G with the Hamiltonian constructed from Lie alge-
bra element g, so long as our distribution ∆ ⊂ g is bracket-generating (so we can
recover all of g via nested commutators), then G is in principle reachable. This sat-
isfies the existence requirement. To satisfy the uniqueness requirement, we then apply minimisation (geodesic) arguments.
In our case, time optimal control is equivalent to finding the time-minimising sub-
Riemannian geodesics on a manifold M corresponding to the homogeneous symmet-
ric space G/K. Our particular focus is the KP problem, where G admits a Cartan
KAK decomposition where g = k ⊕ p, with the control subset (Hamiltonian) com-
prised of generators in p. In particular such spaces exhibit the Lie triple property [[p, p], p] ⊆ p, given [p, p] ⊆ k and [k, p] ⊆ p. In such cases G remains in principle reachable, but minimal-time paths constitute subRiemannian geodesics. Such methods rely
upon symmetry reduction [55]. As D’Alessandro notes [15], the primary problem
in quantum control involving Lie groups and their Lie algebras is whether the set
of reachable states R (defined below) for a system is the connected Lie group G
generated by L = span{−H(u(t))} for H ∈ g (or some subalgebra h ⊂ g) and
u ∈ U (our control set, see below). This is manifest then in the requirement that
R = exp(L). In control theory L is designated the dynamical Lie algebra and is
generated by the Lie bracket (derivative) operation among generators in H. The adjective 'dynamical' is a reference to the time-varying nature of the control functions u(t).
We explicate a few key concepts below. The first is the general requirement that
tangents γ̇(t) be essentially bounded so that ⟨H(t), H(t)⟩S ≤ N for all t ∈ [0, T ] for
some constant N (definition C.5.2). For time-optimal synthesis, we seek geodesic
(or approximately geodesic) curves γ(t) ∈ M. For this, we draw upon the theory
of horizontal subspaces described above manifest via the concept of a horizontal
control curve (definition C.5.3). Given γ(t) ∈ M with γ̇(t) ∈ ∆γ ⊆ Hp M, we can
define horizontal control curves as (equation (C.5.1)):
$$\dot\gamma(t) = \sum_{j}^{m} u^{j}(t)\, X_{j}(\gamma(t))$$
where uj are the control functions given by uj (t) = ⟨Xj (γ(t)), γ̇(t)⟩. The length of
a horizontal curve, which is essentially what we want to minimise in optimisation
problems, is given by (equation (C.5.3)):
$$\ell(\gamma) = \int_{0}^{T} \|\dot\gamma(t)\|\,dt = \int_{0}^{T} \sqrt{\langle\dot\gamma(t), \dot\gamma(t)\rangle}\,dt = \int_{0}^{T} \sqrt{\sum_{j}^{m} u_{j}(t)^{2}}\,dt \qquad (2.5.1)$$
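In numerical work, equation (2.5.1) is evaluated from sampled control amplitudes; a minimal Python sketch (illustrative only):

```python
import numpy as np

def horizontal_curve_length(u, dt):
    """Approximate eq. (2.5.1): u is an (n_steps, m) array of control
    amplitudes u_j(t) sampled every dt; returns sum_t sqrt(sum_j u_j^2) * dt."""
    speeds = np.sqrt((u ** 2).sum(axis=1))    # ||gamma_dot(t)|| at each sample
    return speeds.sum() * dt

# Constant unit-speed control in one direction over T = 1: length ~ 1.
u = np.zeros((1000, 3))
u[:, 0] = 1.0
print(horizontal_curve_length(u, dt=1e-3))    # ~1.0
```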
2.5.4 KP problems
KP problems have been the subject of extensive study in geometric and quantum control [19, 26, 57, 58], drawing on the work of [59] as particularly set out in [23] and
[60]. Later work building on Jurdjevic’s contribution includes that of Boscain [24,61,
62], D’Alessandro [15, 17] and others. KP problems are also the focus of a range of
classical and quantum control studies applying Cartan decompositions of target groups G, whose elements U ∈ G can be written in terms of U = KP or U = KAK (see [15]). In this formulation, the Lie group and Lie
algebra can be decomposed according to a Cartan decomposition (definition B.5.2)
g = k⊕p (and associated Cartan commutation relations). The space is equipped with
a Killing form (definition B.2.12) which defines an implicit positive definite bilinear
form (X, Y ) which in turn allows us to define a Riemannian metric restricted to
G/K in terms of the Killing form. Such control problems posit controls in p with
targets in k and are a form of subRiemannian control problem.
Assume our target groups are connected matrix Lie groups (definition B.2.1).
Recall equation (C.5.1) can be expressed as:
$$\dot\gamma(t) = \sum_{j} X_{j}(\gamma)\, u_{j}(t) \qquad (2.5.2)$$
to which a drift term Aγ(t), for A ∈ k, may be added. Our unitary target in G can be expressed as:
$$\dot\gamma(t) = \sum_{j} \exp(-At)\, X_{j} \exp(At)\, \gamma(t)\, u_{j} \qquad (2.5.4)$$
for bounded ||Ap|| = L. For the KP problem, the PMP equations are integrable. One of Jurdjevic's many contributions was to show that in such KP problem contexts, optimal control for ⃗u is related to the fact that there exist Ak ∈ k and Ap ∈ p such that:
$$\sum_{j}^{m} X_{j}\, u_{j}(t) = \exp(A_{k} t)\, A_{p} \exp(-A_{k} t) \qquad (2.5.5)$$
Following Jurdjevic's solution [60] (see also [56]), optimal pathways are given by curves of the form
$$\gamma(t) = \exp\big((A_{k} + A_{p})t\big)\exp(-A_{k} t),$$
resulting in analytic curves. Our final Chapter 5 sets out an alternative method for
obtaining equivalent time-optimal control solutions for certain classes of problem.
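As a sketch of how such analytic curves can be evaluated (illustrative only, with su(2) generators standing in for a generic KP decomposition in which p = span{iσx, iσy} and k = span{iσz}):

```python
import numpy as np
from scipy.linalg import expm

# Pauli matrices; for su(2), take p = span{iX, iY} and k = span{iZ}.
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

Ap = -0.5j * X      # generator in p (control subspace)
Ak = -0.5j * Z      # generator in k

def kp_curve(t):
    """Candidate time-optimal curve gamma(t) = e^{(Ak + Ap) t} e^{-Ak t}."""
    return expm((Ak + Ap) * t) @ expm(-Ak * t)

U = kp_curve(1.0)
print(np.allclose(U.conj().T @ U, np.eye(2)))   # unitarity check: True
```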
Quantum machine learning (QML) adapts concepts from modern machine learning
(and statistical learning) theory to develop learning protocols for quantum data and
quantum algorithms. This section summarises key concepts relevant to the use of
machine learning in Chapters 3 and 4. A more expansive discussion of background
concepts is set out in Appendix D, from which this section draws. The landscape of QML is already vast; thus, for the purposes of our work, it will assist to situate our results within the useful schema set out in [64] in Table (D.1) from Appendix D, which we reproduce below. Our results in Chapters 3 and 4 fall somewhere between
the second (classical machine learning using quantum data) and fourth (quantum
machine learning for quantum data) categories, whereby we leverage classical ma-
chine learning and an adapted bespoke architecture equivalent to a parametrised
quantum circuit [65] (discussed below).
QML Taxonomy

QML Division                                 Inputs                 Outputs                Process
Classical ML                                 Classical              Classical              Classical
Applied classical ML                         Quantum (Classical)    Classical (Quantum)    Classical
Quantum algorithms for classical problems    Classical              Classical              Quantum
Quantum algorithms for quantum problems      Quantum                Quantum                Quantum

Table 2.1: Quantum and classical machine learning table. Quantum machine learning covers four quadrants (listed in 'QML Division') which differ depending on whether the inputs, outputs or process is classical or quantum.
The standard statistical learning setup assumes the existence of a sample $D_n = \{X_i, Y_i\}^{n}$, loss function L and family of functions (e.g. classifiers) F. The objective then becomes learning an algorithm (rule) that minimises empirical risk, thereby obtaining a best estimator across sampled and out-of-sample data, that is $\hat f_n = \arg\min_{f \in \mathcal{F}} \hat R_n(f)$. Usually f is a parameterised function f = f(θ)
such that the requisite analytic structure (parametric smoothness) for learning pro-
tocols such as stochastic gradient descent is provided for by the parametrisation,
typically where parameters θ ∈ R^{dim θ}. The analyticity of the loss function L means that $\hat R_n(f) = \frac{1}{n}\sum_{i}^{n} L(f(X_i; \theta), Y_i)$ is smooth in θ, which implies the existence of a gradient ∇θ R̂n(f(θ)). The general form of the parameter update rule is then a transition rule on parameter space that maps at each iteration (epoch) $\theta_{i+1} = \theta_{i} - \gamma(n)\, \nabla_{\theta} \hat R_n(f(\theta))$
(see discussion of gradient descent below).
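A minimal Python sketch of this update rule (illustrative only; a one-parameter least-squares model with a fixed learning rate standing in for γ(n)):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=200)
Y = 3.0 * X + rng.normal(scale=0.1, size=200)    # data with true slope 3

theta, lr = 0.0, 0.1
for epoch in range(100):
    grad = np.mean(2 * (theta * X - Y) * X)      # gradient of the empirical MSE risk
    theta = theta - lr * grad                    # theta_{i+1} = theta_i - lr * grad
print(theta)                                     # converges to ~3.0
```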
A crucial choice in any machine learning architecture - and one we justify in de-
tail in later Chapters - is that of the loss function. Two popular choices across
statistics and also machine learning (both classical and quantum) are (a) mean-
squared error (MSE) and (b) root-mean squared error (RMSE). The MSE (equation (D.3.6)) for a function f parameterised by θ over a dataset Dn is given by $\mathrm{MSE}(f_\theta) = \frac{1}{n}\sum_{i=1}^{n}\big(\hat f_\theta(X_i) - Y_i\big)^{2}$. Other common loss functions include (i)
cross-entropy loss e.g. Kullback-Leibler Divergence (see [12] §14) for classification
tasks and comparing distributions (see section A.1.8 for quantum analogues), (ii)
mean absolute error loss and (iii) hinge loss. The choice of loss functions has sta-
tistical implications regarding model performance and complexity, including bias-
variance trade-offs.
As we discuss below, there is a trade-off between the size of F and empirical risk
performance in and out of sample. We can minimise R̂n (f ) by specifying a larger
class of estimation rules. At one extreme, setting f (x) = Yi when x = Xi and zero
otherwise (in effect, F containing a trivial mapping of Xi ) sends R̂n (f ) → 0, but
performs poorly on out of sample data. At the other extreme, one could set f (x)
to capture all Yi , akin to a scatter-gun approach, yet this would inflate R̂n (f ). The
relation between the size and complexity of the prediction rule class F illustrates a tradeoff between approximation and estimation (known as the 'bias-variance' tradeoff for squared loss functions). The tradeoff is quantified by the excess risk, defined as the difference between the risk of the chosen estimator and the minimal achievable risk, decomposable into estimation error and approximation error.
Here estimation error reflects how well fˆ compares with other candidates in F,
while approximation error indicates performance deterioration by restricting F. To
reduce empirical risk (and thus excess risk), two common strategies are (a) limiting
the size of F and thus estimation error and (b) regularisation techniques, consti-
tuting inclusion of a penalty metric that penalises increases in variance (overfitting)
of models, expressed via $\hat f_n = \arg\min_{f\in\mathcal F}\{\hat R_n(f) + C(f)\}$ (equation (D.3.8)). While
such strategies are commonplace and deployed in our machine learning architec-
tures in later Chapters, we note the existence of no free lunch theorems which state
that no algorithm (or choice of fˆ) can universally minimise statistical risk across
all distributions, placing effective limits on learnability in terms of restrictions on
generalisability (see section D.3.3). In section D.3.4 we also set out a few key per-
formance measures that are often used in classical and quantum machine learning
literature. These measures, such as binary classification loss, accuracy, AUC/ROC scores and F1-scores, all seek to assess model performance. The AUC (Area Under the Curve) score represents the area under the ROC (Receiver Operating Characteristic) curve, the latter of which is a plot of the true positive rate (sensitivity) against the false positive rate (1 − specificity) at different thresholds. Intuitively, a higher AUC
score represents a higher ratio over such thresholds between true positives and false
positives, thus providing a measure of how well the model performs (see [12]).
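For concreteness, the AUC can be computed via the rank-sum (Mann-Whitney) formulation; the following NumPy sketch is illustrative (ignoring ties), and libraries such as scikit-learn provide production implementations:

```python
import numpy as np

def auc_score(labels, scores):
    """AUC via ranks: the probability that a randomly chosen positive
    example is scored above a randomly chosen negative example."""
    labels, scores = np.asarray(labels), np.asarray(scores)
    n_pos, n_neg = labels.sum(), (1 - labels).sum()
    ranks = scores.argsort().argsort() + 1        # ranks starting at 1
    rank_sum_pos = ranks[labels == 1].sum()
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

print(auc_score([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))   # 0.75
```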
The expressive power of neural networks effectively arises from the introduction of non-linear functions of the linear components of the model (e.g. projection pursuit regression as discussed below) [12]. A simple example of regularisation in such models is given by ridge regression $\hat\beta_{\mathrm{ridge}} = \arg\min_\beta\{L(y, X\beta) + \lambda\|\beta\|_2^2\}$ (equation (D.4.2)). Here λ ∈ R is a penalty term which inflates the loss more if the parameters β are too large, in a process known as regularisation. Moreover, from such formalism we obtain the ridge function $f(X) = g(\langle X, a\rangle)$ for X ∈ R^m, with g : R → R a univariate function, a ∈ R^m and ⟨·,·⟩ the inner product. Ridge functions essentially wrap the linear models of statistics in a non-linear kernel, composed in what is sometimes denoted projection pursuit regression $f(X) = \sum_{n}^{N} g_n(\omega_n^{T} X)$ (equation (D.4.4)).
The ridge functions $g_n(\omega_n^{T} X)$ vary only in the directions defined by ωn, where the feature $V_n = \omega_n^{T} X$ can be regarded as the projection of X onto the unit vector ωn.
For sufficiently large parameter space, such functions can be regarded as universal
approximators (for arbitrary functions) and form the basis of neural networks. The
architecture of neural networks in quantum and classical machine learning involves a
number of characteristics such as network topology, constraints (e.g. regularisation
strategies or dropout), initial condition choices and transition functions. Neural
network architectures are accordingly modelled using a Fiesler framework, whereby they are formally defined as a nested 4-tuple NN = (T, C, S(0), Φ) where
T is network topology, C is connectivity, S(0) denotes initial conditions and Φ
denotes transition functions (definition D.4.2). This framework has since influenced
modern descriptions of neural networks and their architecture. Neural networks
can abstractly be regarded as extensions of non-linearised linear models (as per
projection pursuit regression above) constituted via functional composition across layers.
Each layer, generally speaking, takes data as input (from previous layers or initial inputs) which become arguments in linear models, which in turn are arguments in a non-linear function denoted an activation function, which represents the output of
the layer. Formally we define an activation function such that for a vector X ∈ Rn ,
weight ω ∈ Rm×n and bias term β0 ∈ Rm , we have an affine (linear) transformation
z = ωX + β0 ∈ Rm . The activation function (definition D.4.3) is then the function
σ : Rm → Rm with σ(z) = σ(ωX + β0 ) = (σ(z1 ), σ(z2 ), . . . , σ(zm ))T .
The function-compositional nature of neural networks is usefully elucidated by
considering the basic feed-forward (multi-layer perceptron) neural network (definition
D.4.4). Such a network comprises multiple layers $a^{(l)}$ of neurons such that each neuron in each layer (other than the first input layer) is a compositional function of neurons in the preceding layer. A fully-connected network is one where each layer's neurons are functions of each neuron in the previous layer. We represent each layer l as (equation D.4.7):
$$a_i^{(l)} = \sigma_i^{l}\left(\sum_{j=1}^{n_{l-1}} w_{ij}^{(l)}\, a_j^{(l-1)} + \beta_i^{(l)}\right).$$
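A minimal NumPy forward pass implementing this layer rule (an illustrative sketch with arbitrary layer sizes and a sigmoid activation):

```python
import numpy as np

def sigma(z):
    """Sigmoid activation applied elementwise."""
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, weights, biases):
    """Feed-forward pass: a^(l) = sigma(W^(l) a^(l-1) + b^(l))."""
    a = x
    for W, b in zip(weights, biases):
        a = sigma(W @ a + b)
    return a

rng = np.random.default_rng(1)
sizes = [4, 8, 8, 2]                          # input, two hidden layers, output
Ws = [rng.normal(size=(m, n)) for n, m in zip(sizes[:-1], sizes[1:])]
bs = [np.zeros(m) for m in sizes[1:]]
print(forward(rng.normal(size=4), Ws, bs))    # network output a^(L)
```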
The typical neural network schema (section D.4.2.3) then involves (a) input layers
where (feature) data X is input, (b) hidden layers of the form above followed by (c)
an output layer aL that outputs according to the problem at hand (e.g. for classifi-
cation or regression problems). Sometimes the final layer is considered in addition
to the network itself. There is a constellation of sophisticated and complex neural
network architectures emerging constantly within the computer science literature.
Training a network proceeds by gradient-descent updates of the weights, of the general form:
$$\omega_{ij}^{(l+1)} = \omega_{ij}^{(l)} - \gamma_l \sum_{k=1}^{N} \frac{\partial \hat R_k}{\partial \omega_{ij}^{(l)}} = \omega_{ij}^{(l)} - \gamma_l \sum_{k=1}^{N} \nabla_{\omega_{ij}^{(l)}} \hat R_k.$$
Here $\omega_{ij}^{(l)}$ is the weight for neuron i in layer l that weights neuron j in layer l − 1, $a^{(l)}$ is layer l, $a_i^{(l)}$ is the ith neuron in layer l and $n_l$ is the number of neurons (units) in layer l. The error quantities used for the updating are given by the backpropagation formula (equation (D.5.12)):
$$\delta_i^{(l)} = \sigma_i^{\prime l}(z_i^{(l)}) \sum_{\mu=1}^{n_{l+1}} \omega_{i\mu}^{(l+1)}\, \delta_\mu^{(l+1)}.$$
Here $\delta_i^{(l)}$ represents the error term for layer l and neuron i, which can be seen to be dependent upon the error term $\delta_\mu^{(l+1)}$ of the subsequent (l + 1)th layer (see section D.5.2 for a full derivation). The backpropagation equations thus allow computation of the gradients used in gradient descent:
$$\frac{\partial \hat R_i}{\partial \omega_{ij}^{(l)}} = \nabla_{\omega_{ij}^{(l)}} \hat R_i = \sigma_i^{\prime l}(z_i^{(l)})\, a_j^{(l-1)} \sum_{\mu=1}^{n_{l+1}} \omega_{i\mu}^{(l+1)}\, \delta_\mu^{(l+1)}. \qquad (2.6.1)$$
In this way, errors are ‘back-propagated’ through the network in a manner that up-
dates weights θ in the direction of minimising loss (i.e. optimising), thus steering
the model f towards the objective. Note our discussion in section D.5.2 on differ-
ences in the quantum case, primarily arising from quantum state properties and the
effects of quantum measurement in collapsing quantum states (see section A.1.6).
In general, there are a number of considerations regarding how gradient descent is
calculated and how hyperparameters are tuned. Backpropagation equations (or vari-
ants thereof) are as a practical matter encoded in software such as TensorFlow [68]
(used in this work) where one can either rely on the standard packages inherent
in the program, or tailor a customised version. In our approach in the following
Chapters, we adopted a range of such methods as detailed below. See sections D.5.3
and D.5.4 for more detail.
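For example, in TensorFlow 2 such a gradient-descent step can be written with automatic differentiation; the snippet below is a generic illustration in which the model, data and optimiser are placeholder assumptions rather than the architectures of later Chapters:

```python
import tensorflow as tf

# Placeholder model and data for illustration only.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(8, activation="sigmoid", input_shape=(4,)),
    tf.keras.layers.Dense(1),
])
optimizer = tf.keras.optimizers.SGD(learning_rate=0.1)
x, y = tf.random.normal((32, 4)), tf.random.normal((32, 1))

with tf.GradientTape() as tape:
    y_hat = model(x)
    loss = tf.reduce_mean((y_hat - y) ** 2)      # empirical MSE risk
grads = tape.gradient(loss, model.trainable_variables)
optimizer.apply_gradients(zip(grads, model.trainable_variables))
```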
3. Non-linearity. Classical neural networks rely upon non-linear and dissipative dynamics [75], contrary to the linear unitary
dynamics of quantum computing. Thus leveraging the functional approxima-
tion benefits of neural networks while respecting quantum information con-
straints requires specific design choices (see [70, 76, 77] for examples). One of
the motivations for the greybox architecture that characterises our approach
in Chapters 3 and 4 is to design machine learning systems that explicitly over-
come this challenge by embedding unitarity constraints within the network
itself.
4. Barren plateaus. Barren plateaus [78] bear some similarity to the classical van-
ishing gradient problem (albeit with specific differences as noted in [79]) where
gradient expectation decreases exponentially with the number of qubits (see
section D.6.4), impairing the ability of the learning protocol to converge as a result.
Proposals exist in the literature (such as weight initialisation and quantum cir-
cuit design) to address or ameliorate their effects, but barren plateaus remain
another quantum-specific phenomenon that must be addressed in quantum
circuit design.
5. Data encoding. Data encoding strategies (section D.6.5) also differ in the quantum case, with data usually encoded via either state representations, such as binary encoding (0 for |0⟩ and 1 for |1⟩), or, for continuous data, phase encoding, e.g. via relative phases exp(iη) where η ∈ (−π, π) (see the sketch following this list). Encoding strategies thus differ from their classical counterparts. They are important for quantum and classical data processing as they enable leveraging the potential advantages that motivate the use of quantum algorithms in the first place.
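A toy NumPy illustration of these two encodings for a single qubit (illustrative only; the normalisation and angle conventions are assumptions):

```python
import numpy as np

ket0, ket1 = np.array([1.0, 0.0]), np.array([0.0, 1.0])

def basis_encode(bit):
    """Binary (basis) encoding: 0 -> |0>, 1 -> |1>."""
    return ket1 if bit else ket0

def phase_encode(eta):
    """Phase encoding of a continuous value eta in (-pi, pi):
    |psi> = (|0> + e^{i eta} |1>) / sqrt(2)."""
    return (ket0 + np.exp(1j * eta) * ket1) / np.sqrt(2)

print(basis_encode(1))            # [0. 1.]
print(phase_encode(np.pi / 4))    # eta carried in the relative phase
```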
In Chapters 3 and 4 we adapt parametrised variational quantum circuits [65, 80] for
solving specific problems in quantum simulation and quantum control. Variational
quantum circuits (see section D.7) can be defined as a unitary operator U(θ(t)) ∈ B(H) parametrised by the set of parameters θ ∈ R^m. A parametrised quantum circuit is a sequence of unitary operations $U(\theta, t) = \mathcal{T}_{+} \exp\big(-i \int_0^{T} H(\theta, t')\,dt'\big)$ (equation (D.7.1)) and, in the time-independent approximation, $U(\theta, t)\big|_{t=T} = U_{T-1}(\theta_{T-1}, \Delta t) \cdots U_{1}(\theta_{1}, \Delta t)$. The optimisation problem is then one of minimising the cost functional on the parameter space given by C(θ) : R^m → R by learning θ* = arg min_θ C(θ). We let fθ : X → Y and denote U(X, θ) (a
quantum circuit) for initial state X ∈ X and parameters θ ∈ R^m. Letting {M} represent the set of (Hermitian) observable (measurement) operators, we have fθ(X) = Tr(M ρ(X, θ)) (equation (D.7.3)) for ρ(X, θ) = U(X, θ) ρ0 U(X, θ)†. Such parametrised circuits are
denoted variational because variational techniques are used to solve the minimisa-
tion problem of finding θ∗ . For our purposes in Chapters 3 and 4 we parametrised
unitaries by control functions uj (t). It is these control functions that are actually
parametrised such that u(t) = u(θ(t)) where a range of deep learning neural net-
works are used to learn optimal u(t) which is then applied to a Hamiltonian to
generate U = U (θ(t)).
The optimisation procedure adopted is based on a fidelity function central to the loss function (cost functional), according to which the classical register θ is updated. The loss function based on the fidelity metric (definition A.1.40) adopted
in equation (D.7.4) in Chapter 4 (batch fidelity) takes the mean squared error (MSE,
see equation (D.3.6) above) of the loss (difference) between fidelity of the estimated
unitary and target as the measure of empirical risk (section D.3) using the notation
for cost functionals C:
$$C(F, 1) = \frac{1}{n}\sum_{j=1}^{n}\big(1 - F(\hat U_j, U_j)\big)^{2}$$
Geometric techniques intersect with machine learning across several streams: (i) geometric control theory, essentially as an optimisation method for PMP-based protocols, (ii) geometric information theory, where famous relationships between Fisher-information metrics and Riemannian metrics have seen the application of differential geometric techniques for optimisation of information-theoretic (or register-based) problems, (iii) Lie group machine learning, which was a relatively early application of some symmetry techniques in Lie theory (familiar to control and geometric theory) to problems in machine learning (such as learning symmetries in datasets reflective of group symmetries) and (iv) geometric quantum machine learning, which leverages
Lie theory and particularly dynamical Lie algebras in the construction and design
of compositional neural networks and to address issues around barren plateaus. Of
these, geometric quantum machine learning (GQML), a field which has emerged rel-
atively recently, is the area within which this work is most well-situated. See section
D.8.3 for more detail.
(i) Objective. In both Chapters 3 and 4, our aim is to provide a sequence of controls generating a target unitary propagator.

(ii) Input layers. Input layers a(0) thus vary but essentially in the case of Chapter
4 comprise target unitaries UT which are then fed into subsequent layers a(l) .
(iii) Feed-forward layers. The feed-forward layers (definition D.4.4) then comprise typical linear neurons with an activation function σ, given by $a^{(1)} = \sigma(W^{T} a^{(0)} + b)$.
(iv) Control pulse layers. The feed-forward layers then feed into bounded control
pulse layers, there being one parametrised control function for each generator
in the Hamiltonian.
(v) Hamiltonian and unitary layers. The control functions are then combined
into Hamiltonian layers which are then fed into a layer comprising unitary
activation functions. In Chapter 4 this enables generation of the candidate
sequence (Ûj ).
(vi) Optimisation strategies. We utilise batch fidelity via an empirical risk measure that is the MSE of 1 minus the fidelity of Uj and Ûj (or otherwise UT, ÛT; see equation 4.6.2). We also experimented with a variety of other hyperparameters (see section D.5.4), including the use of dropout [81], which effectively prunes neurons in order to deal with overfitting. A schematic code sketch of this layered architecture is set out below.
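The following TensorFlow sketch illustrates, in highly simplified form, the layered greybox structure described in (i)-(v); the layer sizes, single-control Hamiltonian H = f σx, time step and input encoding are all illustrative assumptions rather than the architecture of Chapters 3 and 4.

```python
import tensorflow as tf

X = tf.constant([[0, 1], [1, 0]], dtype=tf.complex64)   # Pauli-X generator (assumed)
minus_i_dt = tf.constant(-0.1j, dtype=tf.complex64)     # -i * dt with dt = 0.1 (assumed)

class HamiltonianUnitaryLayer(tf.keras.layers.Layer):
    """(v) Maps a control amplitude f to the unitary U = exp(-i f X dt)."""
    def call(self, f):
        f = tf.cast(f, tf.complex64)
        H = f[:, :, None] * X[None, :, :]               # batch of Hamiltonians f * X
        return tf.linalg.expm(minus_i_dt * H)

controls = tf.keras.Sequential([                        # (ii)-(iv): target -> bounded pulse
    tf.keras.layers.Dense(16, activation="relu", input_shape=(8,)),
    tf.keras.layers.Dense(1, activation="tanh"),        # bounded control amplitude
])

target_features = tf.random.normal((4, 8))              # flattened target data (assumed)
U_hat = HamiltonianUnitaryLayer()(controls(target_features))
print(U_hat.shape)                                      # (4, 2, 2) candidate unitaries
```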
3.1 Abstract
The availability of large-scale datasets on which to train, benchmark and test algo-
rithms has been central to the rapid development of machine learning as a discipline.
Despite considerable advancements, the field of quantum machine learning has thus
far lacked a set of comprehensive large-scale datasets upon which to benchmark the
development of algorithms for use in applied and theoretical quantum settings. In
this Chapter, we introduce such a dataset, the QDataSet, a quantum dataset de-
signed specifically to facilitate the training and development of quantum machine
learning algorithms. The QDataSet comprises 52 high-quality publicly available
datasets derived from simulations of one- and two-qubit systems evolving in the
presence and/or absence of noise. The datasets are structured to provide a wealth
of information to enable machine learning practitioners to use the QDataSet to
solve problems in applied quantum computation, such as quantum control, quan-
tum spectroscopy and tomography. Accompanying the datasets on the associated
GitHub repository are a set of workbooks demonstrating the use of the QDataSet
in a range of optimisation contexts.
The QDataSet is constructed to mimic conditions in laboratories and experi-
ments where inputs and outputs to quantum systems are classical, such as via clas-
sically characterised controls (pulses, voltages) and measurement outcomes in the
form of a classical probability distribution over observable outcomes of measurement
(see section A.1.6). Actual quantum states, coherences and other characteristically
quantum features of the system, while considered ontologically extant, are, episte-
mologically speaking, reconstructions conditioned upon classical input and output
data. In a machine learning context, this means that the encoding of quantum information for learning purposes is mediated by such classical data.
3.2 Introduction
Quantum machine learning (QML) is an emergent multi-disciplinary field combin-
ing techniques from quantum information processing, machine learning and opti-
misation to solve problems relevant to quantum computation [72, 83–85]. The last
decade in particular has seen an acceleration and diversification of QML across a
rich variety of domains. As a discipline at the interface of classical and quantum
computing, subdisciplines of QML can usefully be characterised by where they lie on
the classical-quantum spectrum of computation [86], ranging from quantum-native
(using only quantum information processing) and classical (using only classical in-
formation processing) to hybrid quantum-classical (a combination of both quantum
and classical). At the conceptual core of QML is the nature of how quantum or
hybrid classical-quantum systems can learn in order to solve or improve results in
constrained optimisation problems. The type of machine learning of relevance to
QML algorithms very much depends on the specific architectures adopted. This is
particularly the case for the use of QML to solve important problems in quantum
control, quantum tomography and quantum noise mitigation. Thus QML combines
concepts and techniques from quantum computation and classical machine learning,
while also exploring novel quantum learning architectures.
While quantum-native QML is a burgeoning and important field, the more com-
monplace synthesis of machine learning concepts with quantum systems arises in
classical-quantum hybrid architectures [64, 87–90]. Such architectures are typically
characterised by a classical parametrisation of quantum systems or degrees of free-
dom (measurement distributions or expectation values) whose classical parameters
are updated according to classical optimisation routine (such as variational quantum
circuits discussed in section D.7). In applied laboratory and experimental settings,
hybrid quantum-classical architectures remain the norm primarily due to the fact that
most quantum systems rely upon classical controls [91, 92]. To this end, hybrid
classical-quantum QML architectures which are able to optimise classical controls
or inputs for quantum systems have wider, more near-term applicability for both ex-
periments and NISQ [93, 94] devices. Recent literature on hybrid classical-quantum
algorithms for quantum control [40, 82], noisy control [95] and noise characteri-
sation [82] present examples of this approach. Other recent approaches include
the hybrid use of quantum algorithms and classical objective functions for natural
language processing [96]. Thus the search for optimised classical-quantum QML architectures remains an active and practically important research direction.
We set out below principles of large-scale datasets that are desirable in a quan-
tum machine learning context and which were adopted in the preparation of the
QDataSet. More detail on QML is set out in Appendix D.
The typology above is useful for classifying the various optimisation strategies
adopted across QML and quantum computing more widely. We expand on it in a
bit more detail below in order to situate the QDataSet within the literature. Optimi-
sation strategies across QML and quantum information processing vary considerably.
3.4.1 Overview
Cross-disciplinary programmes focused on building quantum datasets for machine
learning will benefit from a framework to categorise and classify the particular objec-
tives of QML architectures and the articulation of a number of design principles relevant
to the taxonomy of QML datasets. Designing large-scale datasets for QML requires
an understanding of the objectives for which QML research is undertaken and the
extent to which those objectives involve classical and/or quantum information pro-
cessing. Following [86], the application of machine learning techniques to quantum
information processing can be usefully parsed into a simple input / output and pro-
cess taxonomy on the basis of whether information and computational processes are
classical or quantum in nature. Here a process, input or output is 'quantum in nature' if the phenomenon by which the input or output data was generated, or by which the computational process occurs, is itself quantum, noting that measurement outcomes are represented as classical datasets from which the existence of quantum states or processes is inferred. Quantum data encoded in logical qubits, for example in quantum states (superposed or entangled), is different from classical data; in practical terms, information about such quantum data arises by inference on measurement statistics whose outcomes are classical (see section D.6.5 for a discussion of data encoding). This taxonomy can be usefully par-
titioned into four quadrants depending on the objectives of the QML task (to solve
classical or quantum problems) and the techniques adopted (classical or quantum computational methods). Table (D.1) lists the various classical and quantum inputs
according to this taxonomy.
1. Classical machine learning for classical data. The first quadrant covers the
application of classical computational (machine learning) methods to solve
classical problems, that is, problems not involving data or processes of a quan-
tum character.
2. Classical machine learning for quantum data. The second quadrant covers
the application of classical computational and machine learning techniques
to solving problems of a quantum character. Specifically, this subdivision of
QML covers using standard machine learning techniques to solve problems
specific to the theoretical or applied aspects of quantum computing, including
optimal circuit synthesis [40,82,105], design of circuit architectures and so on.
Either input or output data are quantum in nature, while the computational
process by which optimisation, for example, occurs is itself classical.
3. Quantum algorithms for classical optimisation. The third quadrant covers the
application of quantum algorithmic techniques to solving classical problems. In
this subdivision, algorithms are designed leveraging the unique characteristics
of quantum computation, in ways that assist in optimising classical problems
or solve certain classes of problems which may be intractable on a classical
computer. Quantum algorithms are designed with machine learning char-
acteristics, potentially utilising certain computational resources or processes
unavailable when constrained to classical computation. Examples of such algo-
rithms include variational quantum eigensolvers [106–109], quantum analogues
of classical machine learning techniques (e.g. quantum PAC learning [110]) and
hybrid quantum analogues of deep learning architectures (see [72, 90, 111] for
background).

4. Quantum algorithms for quantum problems. The fourth quadrant covers quantum machine learning in the fullest sense: the application of quantum computational processes to problems that are themselves quantum in nature, with inputs, outputs and processing all quantum.
The QDataSet fits within the second subdivision of QML, its primary use being
envisaged as assisting in the development of classical algorithms for optimisation
problems of engineered quantum systems. Our focus on classical techniques ap-
plied to quantum data is deliberate: while advancements in quantum algorithms are
both exciting and promising, the unavailability of a scalable fault-tolerant quantum computing system and limitations in hybrid NISQ devices mean that for the vast
majority of experimental and laboratory use cases, the application of machine learn-
ing is confined to the classical case. Secondly, as a major motivation of this work
is to provide an accessible basis for classical machine learning practitioners to enter
the QML field, it makes sense to focus primarily on applying techniques from the
classical domain to quantum data.
Classical machine learning has become one of the most rapidly advancing scientific
disciplines globally with immense impact across applied and theoretical domains.
The advancement and diversification of machine learning over the last two decades
has been facilitated by the availability of large-scale datasets for use in the research
and applied sciences. Large-scale datasets [14,97,115] have emerged in tandem with
increasing computational power that has seen the velocity, volume and veracity of
data increase [12,116]. Such datasets have both been a catalyst for machine learning
advancements and a consequence or outcome of increasing scope and intensity of
data generation. The availability of large-scale datasets led to the evolution of data
mining, applied engineering and even theoretical results in high energy physics [117].
An important lesson for QML is that developments within these fields have been
facilitated using such datasets in a number of ways. Large-scale datasets improve the
trainability of machine learning algorithms by enabling finer-grained optimisations
via commonplace methods such as backpropagation (discussed in Appendix D). This
has particularly been true within the field of deep learning and neural networks
[14], where large-scale datasets have enabled the development of deeper and richer
algorithmic architectures able to model complex non-linearities and functional forms,
in turn leading to drastic improvements and breakthroughs across domains such as
image classification, natural language processing [118, 119] and time series analysis.
With an abundance of data on which to train algorithms, new techniques such as
regularisation and dimensionality reduction to address problems arising from large-
scale datasets, including overfitting and complexity considerations, have emerged,
in turn spurring novel innovations that have contributed to the advancement of the
field. Large-scale datasets have also served a standardising function by providing
a common basis upon which algorithmic performance may be benchmarked and
standardised. By providing standard benchmarks, large-scale datasets have enabled
researchers to focus on important features of algorithmic architecture in the design
of improvements to training regimes. Such datasets have also enabled the fostering
of the field via competitive platforms such as Kaggle, where researchers compete to
improve upon state of the art results.
Large-scale dataset characteristics affect the utility of the datasets in applied con-
texts. Such characteristics are relevant to the design of quantum datasets. Below we set out a number of principles used in the design of the QDataSet which we believe
provide a useful taxonomy for the QML community to consider when generating data
for use in machine learning-based problems. The aim of the proposed taxonomy for
quantum datasets is to facilitate their interoperability across machine learning plat-
forms (classical and quantum) and for use in optimisation for experimentalists and
engineered quantum systems. While taxonomies and specific architectures will differ
across domains, we believe our proposed QDataSet taxonomy will assist the QML and classical ML communities in guiding large-scale data generation towards principles of interoperability summarised in Table (3.1) and explained below:
3. Training and test sets. Applied datasets for machine learning require training,
validation and test sets in order to adequately train algorithms for objectives,
such as quantum control. The design of quantum datasets in general, and the
QDataSet in particular, has been informed by desirable properties of train-
ing sets. These include, for example: (i) interoperability, ensuring training
set data can be adequately formatted for use in various programming lan-
guages (for example, storing QDataSet data via matrices, vectors and tensors in standard array formats).
4. Data precision and type. Data precision and data typing is an important
consideration for quantum datasets, primarily to facilitate ease of interoper-
ability between software and applied/hardware contexts in the quantum space. Other
considerations can include the degree of precision with which data should be
presented. Ideally, quantum data for use in developing algorithms for applica-
tion in experimental quantum control or measurement scenarios should allow
flexibility of data precision to match instruments used in laboratories on a
case-by-case basis. For example, the QDataSet choices regarding noise de-
grees of freedom (such as amplitude, mean and standard deviation) have been
informed by collaborations with experimental groups.
Studying features of particular datasets and their use in classical contexts assists
in extracting desirable features for large-scale quantum datasets. ImageNet is one
of the leading research projects and database architectures for images [14, 98, 115].
The dataset is one of the most widely cited and important datasets in the de-
velopment of machine learning, especially for image-classification algorithms using
convolutional, hierarchical and other deep-learning based neural networks. The evo-
lution of ImageNet and its use within machine learning disciplines provides a useful
guide and comparison for the development of QML datasets in general. ImageNet
comprises two main components: (i) a public semi-structured dataset of images
together with (ii) an annual competition and workshop. The dataset provides a
‘ground truth’ standardised set of images against which to train categorical image
classification algorithms. The competition and workshop provided and continue to
provide an important institutional practice driving machine learning development.
While beyond the scope of this work, the development of QML would arguably be
considerably assisted by the availability of QML-focused competitions akin to those
commonplace within the classical machine learning community. Such competitive
frameworks would motivate and drive the development of scalable and generalisable
QML algorithms. As is also evident from classical machine learning, competitive for-
mats are also a useful way for laboratories, companies or other projects to leverage
the expertise of the diverse machine learning community.
Another example from the machine learning community which can inform the
development of QML is Kaggle, a leading online platform for machine learning-based
competitions. Kaggle runs competitions where competitors are provided with pre-
diction tasks, training sets and constraints upon the type of algorithm design (such
as resource use and so on). Competitors then develop models aiming to optimise a
measure of success, such as a standard machine learning metric of accuracy, AUC
or some other measure [125]. The open competitive nature of Kaggle is designed
to crowd-source solutions and expertise to problems in machine learning and data
science. A ‘quantum Kaggle’ would be of considerable benefit to the QML commu-
nity by providing a platform through which to spur collaborative and competitive
development of quantum algorithms.
Quantum dataset designers should be cognisant of how their data can be (more easily) used in such platforms and also how their datasets can be designed in ways that facilitate their ease of
use within common machine learning languages, such as TensorFlow, PyTorch and
across languages, such as Python, C#, Julia and others. The QDataSet has been
specifically designed in relatively elementary Python packages such as Numpy in or-
der to facilitate its use across the machine learning community, but also in a way that
we hope makes it useable and clearly understandable by the quantum engineering
community. We deliberately selected Python as the language of choice within which
to build the QDataSet simulation given its status as the leading classical program-
ming language of choice for machine learning. It is also a language adopted across
many of the quantum platforms above. We built the QDataSet using Numpy to
produce datasets as transferable as possible (rather than, for example, in Qutip). A
familiarity with the emerging quantum programming and QML ecosystem is useful
for the design of quantum datasets. We set out a few examples of leading quantum
programming platforms below.
An important aspect of quantum dataset design is the decision regarding what quan-
tum information to include in the dataset. In this section, we list types of quantum
data which may be included in large-scale quantum datasets. By quantum data, we
refer to data generated or characterising quantum systems or processes. More specif-
ically, by quantum data we refer to quantum registers (see definition A.1.3) discussed
in Appendix A together with an assumed state preparation procedure (see [43, 144]
on state preparation generally and the Appendices for more discussion) which suffi-
ciently encode information into quantum states that subsist and transform according
to the protocols of quantum information processing. Quantum data may comprise
a range of different properties, features or characteristics of quantum systems or the environment around quantum systems. Such data is described using a classical al-
phabet Σ and encoded in quantum states in ways that constitute a quantum register
of that information. It may comprise data and information abstracted into a partic-
ular representation or form, such as circuit gates, algebraic formulations, codes etc
or more physical forms, such as statistics or read-outs from measurement devices.
For QML datasets, it is useful to ensure that quantum data is presented in a form that classical machine learning researchers can understand and integrate into their algorithmic techniques. For example, a classically parameterised quantum
circuit (section D.7), as is common throughout the QML literature, would typically include data or tensors of the relevant parameters, the operators related to such pa-
rameters (such as generators) and the quantum states (vectors or density operators)
upon which the circuit acts.
3.4.7.1 Formalism
In a dataset, quantum states and operators will take the form of matrices or tensors. It is worth noting that the
operator acting on a quantum state ρ is a unitary U (t) ∈ B(H) which is itself (in
a closed quantum system) a function (or representation) of the Hamiltonian H(t)
governing the system dynamics. In practice, unitaries are formed by exponentiating
time-independent approximations of Hamiltonians. These unitaries represent solu-
tions to the time-dependent Schrödinger equation governing evolution for a closed
system.
The evolutionary dynamics of the quantum system are completely described by
the Hamiltonian operator acting on the state |ψ⟩ such that |ψ(t)⟩ = U (t) |ψ(t = 0)⟩
(section A.1.4). In density operator notation, such evolution is represented as ρ(t) =
U (t)ρ(t0 )U (t)† . Typically solving the continuous form of the Schrödinger equation
is intractable or infeasible, so a discretised approximation as a discrete quantum
circuit (where each gate Ui is run for a sufficiently small time-step ∆t) is used (e.g.
via Trotter-Suzuki decompositions). Open quantum systems are described by the
formalism of Lindbladian master equations (equation A.1.19), representing the effect
of noise sources upon the evolution of quantum states.
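To make the discretisation concrete, the following minimal Python sketch (not the QDataSet generation code itself) approximates the time-ordered evolution of a single qubit by a product of short-time propagators; the drift strength Omega, the control waveform and the step count are illustrative assumptions only.

```python
import numpy as np
from scipy.linalg import expm

# Pauli matrices
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)

def evolve(H_of_t, T=1.0, steps=2000):
    """Approximate U(T) = T+ exp(-i int_0^T H(t) dt) by a product of
    piecewise-constant propagators exp(-i H(t_k) dt)."""
    dt = T / steps
    U = np.eye(2, dtype=complex)
    for k in range(steps):
        t_mid = (k + 0.5) * dt                  # midpoint of the k-th step
        U = expm(-1j * H_of_t(t_mid) * dt) @ U  # later steps act on the left
    return U

# Illustrative drift plus x-axis control: H(t) = (1/2) Omega sz + (1/2) f_x(t) sx
Omega = 10.0
H = lambda t: 0.5 * Omega * sz + 0.5 * np.sin(2 * np.pi * t) * sx
U = evolve(H)
assert np.allclose(U.conj().T @ U, np.eye(2), atol=1e-8)  # unitarity check
```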
There are a number of other important concepts for classical machine learning prac-
titioners to be aware of when using quantum datasets such as the QDataSet. We
set out some of these below. (a) Relative phase (see section A.1.3): for a qubit system, amplitudes a and b differ by a relative phase if a = exp(iθ)b, θ ∈ R. Relative phase is an important concept as it encodes quantum coherences and is, together with basis encoding, a primary means of encoding data in quantum systems. (b) Entanglement (see section A.1.7): composite states may be entangled, meaning that their measurement outcomes are necessarily correlated. For a two-qubit state (a so-called EPR or Bell state):

$$|\psi\rangle = \frac{|00\rangle + |11\rangle}{\sqrt{2}} \qquad (3.4.1)$$

a measurement outcome on the first qubit determines the outcome of the corresponding measurement on the second. The single-qubit drift Hamiltonian in the QDataSet takes the form:

$$H_d(t) = H_d = \frac{1}{2}\Omega\sigma_z. \qquad (3.4.2)$$
Here σz is the Pauli generator (see equation (3.5.17)) for z-axis rotations. The Ω
term represents the energy gap of the quantum system (the difference in energy
between, for example, the ground and excited state of the qubit, recalling qubits
are characterised by having two distinct quantum states). The single-qubit drift
Hamiltonian for the QDataSet is time-independent for simplicity, though in realistic
cases it will contain a time-dependent component. For the single-qubit control and noise Hamiltonians, we have two cases based upon the axes along which controls and noise are applied. Recall that we can represent a single-qubit system on a
Bloch sphere, with axes corresponding to the expectations of each Pauli operator and
where operations of each Pauli operator constitute rotations about the respective
axes. Our controls are control functions (equation (A.2.6)), mostly time-dependent,
that apply to each Pauli operator (generator). They act to affect the amplitude over
time of rotations about the respective Pauli axes. More detailed treatments of noise
in quantum systems and quantum control contexts can be found in [15, 144].
As discussed above, the functional form of the control functions fα (t) varies (see
section A.2 for background on control functions). We select square and Gaussian pulse waveforms (see below). Each different noise function
βα (t) is parameterised differently depending on various assumptions that are more
specifically detailed in [82] and summarised below. Noise and control functions are
applied to different qubit axes in the single-qubit and two-qubit cases. For a single
qubit, we first have single-axis control along the x-direction:

$$H_{ctrl}(t) = \frac{1}{2} f_x(t)\,\sigma_x \qquad (3.4.3)$$

with the noise (interaction) Hamiltonian $H_1(t)$ along the z-direction (the quantisation axis):

$$H_1(t) = \frac{1}{2} \beta_z(t)\,\sigma_z \qquad (3.4.4)$$
Here the function βz (t) (a classical noise function β(t) applied along the z-axis) may
take a variety of forms depending on how the noise was generated (see below for
a discussion of noise profiles e.g. N1-N6). See section A.3.2. It should be noted
that noise rarely has a functional form and is itself difficult to characterise (so β(t)
should not be thought of as a simple function). For the second case, we implement
multi-axis control along x- and y- directions and noise along x- and z-directions in
the form:
$$H_{ctrl}(t) = \frac{1}{2} f_x(t)\,\sigma_x + \frac{1}{2} f_y(t)\,\sigma_y \qquad (3.4.5)$$

$$H_1(t) = \frac{1}{2} \beta_x(t)\,\sigma_x + \frac{1}{2} \beta_z(t)\,\sigma_z. \qquad (3.4.6)$$
Noiseless evolution may be recovered by choosing $H_1(t) = 0$. Note that, given the choice of the z-axis as the quantisation axis, the application of x-axis noise may give rise to decohering effects. For the two-qubit system, we chose the drift Hamiltonian in the form:

$$H_d(t) = \frac{1}{2}\Omega_1\, \sigma_z \otimes \sigma_0 + \frac{1}{2}\Omega_2\, \sigma_0 \otimes \sigma_z. \qquad (3.4.7)$$
For the control Hamiltonians, we also have two cases. The first is local control along the x-axis of each individual qubit, as in the single-qubit case, together with an interacting control. In the notation, $f_{1\alpha}(t)$ indicates that the control function is applied to the second qubit while the first qubit remains unaffected (denoted by the '1' in the subscript and by the identity operator σ0). The interacting control acts simultaneously on the x-axis of each qubit and is denoted by $f_{xx}(t)$:

$$H_{ctrl}(t) = \frac{1}{2} f_{x1}(t)\,\sigma_x \otimes \sigma_0 + \frac{1}{2} f_{1x}(t)\,\sigma_0 \otimes \sigma_x + \frac{1}{2} f_{xx}(t)\,\sigma_x \otimes \sigma_x. \qquad (3.4.8)$$
The second two-qubit case is local control along the x-axis of each qubit only and is represented as:

$$H_{ctrl}(t) = \frac{1}{2} f_{x1}(t)\,\sigma_x \otimes \sigma_0 + \frac{1}{2} f_{1x}(t)\,\sigma_0 \otimes \sigma_x. \qquad (3.4.9)$$

For the noise, we fix the Hamiltonian to be along the z-axis of both qubits, in the form:

$$H_1(t) = \frac{1}{2} \beta_{z1}(t)\,\sigma_z \otimes \sigma_0 + \frac{1}{2} \beta_{1z}(t)\,\sigma_0 \otimes \sigma_z. \qquad (3.4.10)$$
Notice that, for the case of local-only control and noiseless evolution, this corresponds to two completely independent qubits; we therefore do not include this case, as it is already covered by the single-qubit datasets. We also note that not all interaction terms (such as σz ⊗ σz) need be included in the Hamiltonian. The reason is that, to achieve universal control equivalent to including all generators, one need only include one-local control for each qubit together with interacting (entangling) terms (though we note recent results regarding constraints on 2-local operations in the presence of certain symmetries [146]). Assuming one has a minimal (bracket-generating, see definition C.4.4) set of Pauli generators in su(2) in the Hamiltonian, one may synthesise any Pauli gate of interest for the one- or two-qubit systems (i.e. given two Pauli gates, one can synthesise the third), making the set of targets reachable (definition C.5.4) and thus achieving effective universal control (see section C.5 for bracket-generating subalgebras and equation B.2.22 for discussion of the BCH formula).
To summarise, the QDataSet includes four categories of datasets, set out in Table 3.6. The first two categories are for one-qubit systems: the first has single-axis control and noise, while the second has multi-axis control and noise. The third and fourth categories are two-qubit systems with local-only control or with an additional interacting control, together with noise.
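As a concrete illustration of how these Hamiltonian terms compose, the sketch below assembles the two-qubit drift (3.4.7), control (3.4.8) and noise (3.4.10) terms using Numpy tensor products; the energy gaps and signals are illustrative placeholders rather than actual QDataSet parameters.

```python
import numpy as np

s0 = np.eye(2, dtype=complex)
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)

def H_two_qubit(t, fx1, f1x, fxx, bz1, b1z, Omega1=10.0, Omega2=12.0):
    """Total two-qubit Hamiltonian: drift (3.4.7) plus local and interacting
    x-axis control (3.4.8) plus local z-axis noise (3.4.10). Control and
    noise signals are passed as callables t -> float."""
    Hd = 0.5 * Omega1 * np.kron(sz, s0) + 0.5 * Omega2 * np.kron(s0, sz)
    Hc = (0.5 * fx1(t) * np.kron(sx, s0)
          + 0.5 * f1x(t) * np.kron(s0, sx)
          + 0.5 * fxx(t) * np.kron(sx, sx))
    H1 = 0.5 * bz1(t) * np.kron(sz, s0) + 0.5 * b1z(t) * np.kron(s0, sz)
    return Hd + Hc + H1

# Example with simple placeholder signals
f = lambda t: np.cos(2 * np.pi * t)
zero = lambda t: 0.0
H = H_two_qubit(0.3, fx1=f, f1x=f, fxx=zero, bz1=zero, b1z=zero)
print(H.shape)  # (4, 4)
```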
In [82], a simulation of qubit systems coupled to noise (baths) was used as input training data for a novel greybox machine learning method for characterising noise. In this work, the underlying simulation was modified to generate a greater variety of qubit-noise examples for the single-qubit case. The simulation was then extended beyond that in [82] to generate examples for the two-qubit case (in the presence or absence of noise). As discussed
above, the evolution of closed and open quantum systems is described by Hamil-
tonian dynamics, which encode time-dependent functions into operators which are
the Lie algebra generators (see section B.2.4) of time-translations (operators) acting
on quantum states. To summarise: the Hamiltonian comprises three elements: (i) a
drift Hamiltonian Hd (t), encoding the endogenous evolution of the quantum system;
(ii) a control Hamiltonian Hctrl (t), encoding evolution due to the application of clas-
sical controls which may be applied to the quantum system to steer its evolution;
and (iii) an interaction (noise) Hamiltonian H1 (t), encoding the effects of cou-
pling of the quantum system to its environment, such as a decohering noise source
(a bath). The Hamiltonians are composed using Pauli generators (see section 3.5.3.5
and equations (3.5.17)), representing elements of the Lie algebra su(2). We express
the Hamiltonians in the Pauli operator basis {σi } which forms a complete basis for
our one- and two-qubit systems. Our control functions are represented as fα (t) for
α = x, y, z where the subscript indicates which generator the control applies to.
Concretely, for example, σz control is denoted fz (t)σz . In general, continuous time-
dependent control formulations are difficult - or infeasible - to solve analytically,
where solving here means finding a suitable representation of the control unitary
(equation (A.1.22)):
$$U_{ctrl} = \mathcal{T}_+ \, e^{-i \int_0^T f_\alpha(t)\, \sigma_\alpha / 2 \, dt} \qquad (3.4.11)$$

Controls are usually classical, i.e. fα(t) ∈ R. The simplest control functional
form is fixed amplitude control [147] or what is also described as a square pulse,
where a constant amplitude is applied for a discrete time-step ∆t. Other control waveforms include Gaussian pulses. Quantum control in
the QDataSet context has two primary imperatives. The first is the use of control in
closed noise-free idealised quantum systems where the objective is the use of controls
to steer the quantum system to some desired objective state. This is equivalent to
the synthesis of quantum circuits (sequences of quantum gates) from the identity I
to a target unitary UT . The second is the use of controls in the presence of noise,
where the quantum system is coupled to an environment that potentially decoheres
the system. In this second case, ideally the controls are tailored to neutralise the
effect of noise on the evolution of the quantum system - a process typically described
by dynamical decoupling [148, 149] (see, for example, the Hahn echo).
Crafting suitable controls to mitigate noise effects is challenging. One must properly
time and select appropriate amplitudes for the application of controls to counteract
decohering interference, modelled as a superoperator term in master equations, for
example (see equation (A.3.5)). Typically, it requires information about the noise
spectrum itself, obtained using techniques from quantum noise spectroscopy [144].
It also requires an understanding of how control and noise signals convolve in the
context of quantum evolution. Dealing with noise is a central imperative of quantum
information processing and the engineering of quantum systems where the aim is to
preserve and extend coherence times of quantum information and to correct for errors
that arise during the evolution of quantum computational systems. To this end,
a major stream of research in quantum information processing concerns quantum
error-correcting codes (QEC) as a means of encoding quantum information in a way
that renders it robust to noise-induced errors and/or enables ‘self-correction’ of
errors as a quantum system evolves.
Recall that the control pulse sequences in the QDataSet consist of two types of
waveforms. The first is a train of Gaussian pulses, and the other is a train of square
pulses, both of which are very common in actual experiments. Square pulses are
the simplest waveforms, consisting of a constant amplitude Ak applied for a specific
time interval ∆tk:

$$f(t) = \sum_{k=1}^{n} A_k \, \Pi\!\left(\frac{t - t_k}{\Delta t_k}\right) \qquad (3.4.12)$$

where Π denotes the unit rectangular window, tk the position of the k-th pulse and k runs over the total number of time-steps in the sequence. The three param-
eters of such square pulses are the amplitude Ak , the position in the sequence k and
the time duration over which the pulse is applied ∆t. In the QDataSet, the pulse
parameters are stored in a sequence of vectors {an}. Each vector an is of dimension r and stores the parameters of each pulse (e.g. the Gaussian pulse vectors store the amplitude, mean and variance; the square pulse vectors store pulse position, time interval and amplitude), enabling reconstruction of each pulse from those parameters if desired.
For simplicity, we assume constant time intervals such that ∆tk = ∆t. The Gaussian
waveform can be expressed as:
$$f(t) = \sum_{k=1}^{n} A_k \, e^{-(t-\mu_k)^2 / 2\sigma_k^2}. \qquad (3.4.13)$$
where n is the number of Gaussian pulses in the sequence. The parameters of the
Gaussian pulses differ somewhat from those of the square pulses. Each of the n
pulses in the sequence is characterised by a set of 3 parameters: (i) the amplitude
Ak (as with the square pulses), (ii) the mean µk and (iii) the standard deviation σk of the pulse. Thus in total, the sequence is characterised by 3n parameters. The
amplitudes for both Gaussian and square pulses are chosen uniformly at random
from the interval [Amin , Amax ], the standard deviation for all Gaussian pulses in
the train is fixed to σk = σ, and the means are chosen randomly such that there
is minimal amplitude in areas of overlapping Gaussian waveform for the pulses in
the sequence. The pulse sequences can be represented in the time or frequency
domains [150]. The QDataSet pulse sequences are represented in the time domain, as this has been found to be a more efficient feature representation for machine learning algorithms [82].
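The following sketch generates the two pulse-train parametrisations described above; the amplitude range, pulse count and the evenly spaced Gaussian means are illustrative simplifications (the QDataSet chooses means randomly subject to the minimal-overlap condition).

```python
import numpy as np

rng = np.random.default_rng(0)

def square_train_params(n, T, A_min=-1.0, A_max=1.0):
    """Parameter vectors {a_n} for a square train: (position k, duration, amplitude)."""
    dt = T / n
    amps = rng.uniform(A_min, A_max, n)
    return np.stack([np.arange(n), np.full(n, dt), amps], axis=1)

def gaussian_train(t, n, T, sigma, A_min=-1.0, A_max=1.0):
    """f(t) = sum_k A_k exp(-(t - mu_k)^2 / (2 sigma^2)), equation (3.4.13),
    with a common standard deviation sigma_k = sigma for all pulses."""
    amps = rng.uniform(A_min, A_max, n)
    mus = (np.arange(n) + 0.5) * T / n   # evenly spaced means, minimal overlap
    return sum(A * np.exp(-(t - mu) ** 2 / (2 * sigma ** 2))
               for A, mu in zip(amps, mus))

t = np.linspace(0, 1, 1024)
f = gaussian_train(t, n=5, T=1.0, sigma=0.02)
params = square_train_params(n=5, T=1.0)   # one 3-parameter vector per pulse
```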
As discussed in [82], the choice of structure and characteristics of quantum
datasets depends upon the particular objectives and use cases in question, the laboratory quantum control parameters and experimental limitations. Training datasets in machine learning should ideally be structured so as to enhance the generalisability of trained models. In the language of statistical learning theory, datasets should be chosen so as
to minimise the empirical risk (equation D.3.3) associated with candidate sets of
classifiers [12, 13]. In a quantum control context, this will include understanding for
example the types of controls available to researchers or in experiments, often volt-
age or (microwave) pulse-based [151]. The temporal spacing and amplitude of each
pulse in a sequence of controls applied during an experiment may vary by design
or to some degree uncontrollably. Pulse waveforms can also differ. For example,
the simplest pulse waveform is a constant-amplitude pulse applied for some time
∆t [152]. Such pulses are characterised by, for example, a single parameter, being the
amplitude of the waveform applied to the quantum system (this manifests as we dis-
cuss below as an amplitude applied to the algebraic generators of unitary evolution
(see [15, 40] for an example)). Other models of pulses (such as Gaussian) are more
complex and require more sophisticated parametrisation and encoding with machine
learning architectures in order to simulate. More detail on such considerations and
the particular pulse characteristics in the QDataSet are set out in Tables (3.8) and
(3.9).
For most machine learning practitioners using the QDataSet, the entry point will
be the application of known classical machine learning metrics. More advanced
uses of the QDataSet may utilise quantum-specific metrics directly, for example,
via reconstruction of quantum states from measurement statistics. Some use cases
combine the use of classical and quantum metrics. For example, [40, 82] combine
average operator fidelity (equation A.1.49) with standard mean-squared error (MSE)
(equation (D.3.6)) into a measure of empirical risk denoted as ‘batch fidelity’ as per
equation (D.7.4). In those examples, the objective in question is to train a greybox
algorithm (section D.9) to model certain control pulses needed to synthesise target
unitaries. The algorithms learn the particular control pulses which are applied
to generators. While it is the extraction of control pulses which are of interest
to experimentalists, the final output of the algorithm is a sequence of fidelities
where the fidelity of generated (synthesised) unitaries is compared against the target
(label) unitaries UT . This sequence of fidelities is then compared against a vector of
ones with the loss function set to minimise the MSE (distance) between the fidelity
sequence and the label vector. In doing so, the algorithms are effectively trained to
maximise fidelity (as fidelities ≈ 1 are desirable) yet do so using a hybrid approach.
The QDataSet has been generated such that a combination of classical, quantum
and hybrid divergence metrics may be used in the training process.
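A minimal Numpy sketch of this hybrid loss construction is given below, assuming the normalised operator-overlap form of fidelity; the exact definitions used in [40, 82] are those of equations (A.1.49) and (D.7.4).

```python
import numpy as np

def operator_fidelity(U, V):
    """Normalised overlap |Tr(U^dag V)| / d between two unitaries."""
    d = U.shape[0]
    return np.abs(np.trace(U.conj().T @ V)) / d

def batch_fidelity_loss(synthesised, targets):
    """MSE between a batch of fidelities and a vector of ones, so that
    minimising the loss drives all fidelities towards 1."""
    fids = np.array([operator_fidelity(U, V)
                     for U, V in zip(synthesised, targets)])
    return np.mean((np.ones_like(fids) - fids) ** 2)
```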
The total compressed size of the QDataSet (using Pickle and zip formats) is around
14TB (uncompressed, several petabytes). Researchers can use the QDataSet in a
variety of ways to design algorithms for solving problems in quantum control, quan-
tum tomography and quantum circuit synthesis, together with algorithms focused
on classifying or simulating such data. We also provide working examples of how
to use the QDataSet in practice and its use in benchmarking certain algorithms.
Each part below provides in-depth detail on the QDataSet for researchers who may
be unfamiliar with quantum computing, together with specifications for domain ex-
perts within quantum engineering, quantum computation and quantum machine learning.
The QDataSet was based on simulations of one and two-qubit systems only. The ra-
tionale for this choice was primarily one of computational feasibility. The QDataSet
was generated over a six-month period using the University of Technology, Sydney’s
High Performance Computing (HPC) cluster, with some computations taking sev-
eral weeks alone. In principle larger datasets based on our simulation code can be
prepared, however we note the computational resources of doing so may be consider-
able. To generate the datasets, we wrote bespoke code in TensorFlow which enabled
us to leverage GPU resources in a more efficient manner. As we discuss below, we
were interested in developing a dataset that simulated noise affecting a quantum
system. This required performing Monte Carlo simulations (see section 3.5.3.7) and
solving Schrödinger’s equation several times for each dataset. While existing pack-
ages, such as Qutip (see below) are available to model the effect of noise on quantum
systems, we chose not to rely upon such systems. The reason was that Qutip re-
lies upon Lindblad master equations to simulate system/noise interactions which in
turn rely upon the validity of certain assumptions and approximations. Chief among
these is that noise is Markovian. In our datasets, we included coloured noise with
a power-spectrum density which is non-Markovian. Furthermore, Qutip’s methods
assume a model of a quantum system interacting with a fermionic or bosonic bath
which was not applicable in our case given we were modelling the imposition of
classical noise using Monte Carlo methods.
The resource cost for simulating the various qubit systems depended upon whether
we sought to simulate noise or distortion. We found, however, that simulating the
two-qubit systems took a significant amount of time, nearly four weeks of runtime
for a single two-qubit system. While multiple nodes of the HPC cluster were utilised,
even on the largest node on the cluster (with at least 50-100 cores and two GPUs),
the simulation time was extensive. We estimate that further speedup could be obtained by simulating directly in lower-level languages, such as C++. For this reason, we restricted the QDataSet to simula-
tions of at most two-qubit systems. Such a choice obviously limits direct real-world
applications of algorithms trained on the QDataSet to one- and two-qubit systems
generally. While this may appear a notable limitation given the growing abundance
of higher-order multi-qubit NISQ systems, it remains the case that many experi-
mental laboratories remain limited to small numbers of qubits. We expect in most
situations that one and two-qubit gates are all that are available. Engineering more
than two-body interactions is an immense challenge, available only in certain architectures.
NISQ devices offer promising next steps, but it is primarily one- and two-qubit
systems that have demonstrated the type of long coherence times, fast gate ex-
ecution and fault-tolerant operation needed for truly scalable quantum computa-
tion [93,153,154]. As a result, the QDataSet can be considered relevant to the state
of the art. Additionally, simulating more than two qubits would have exceeded com-
putational capacity constraints related to our specific simulation code, which includes
interactions and iterations over different noise profiles. Moreover, developing algo-
rithms on the basis of small qubit systems is a commonplace way of forming a basis
for algorithms for larger multi-qubit systems: training classical machine learning
algorithms on lower-order qubit systems has the benefit of enabling researchers to
consider how such algorithms can or may learn multi-qubit relations which in turn
can assist in algorithm design when applied to higher-order systems. Doing so will
be an important step in building scalable databases for applying machine learning
to problems in quantum computing.
The QDataSet was generated for non-QEC encoded data. The reasoning behind
this was that (i) specific error-correcting encodings differ considerably from case to
case, whereas unencoded quantum information is more prevalent in the experimen-
tal/laboratory setting; and (ii) quantum computational and NISQ devices are yet to
reach the scale and prevalence necessary for practical testing of QEC at scale. The
simulations in the QDataSet are based upon an alternative technique for quantum
control in the presence of a variety of noise [82], where a greybox neural network
(see section D.9) is used to learn only those characteristics of the noise spectrum rel-
evant to the application of controls (a comparatively simpler problem than seeking
to determine the entire spectrum). In this context, the objective of the QDataSet
is to enable algorithms to natively learn optimal error correction regimes from the
data itself (rather than by encoding in a QEC) via inferring the types of counter-
vailing controls (e.g. control pulses) that should be applied to minimise errors. In
principle, the same type of machine-learning control architecture could also apply to
QEC encoded data: the machine learning algorithms would in effect learn optimal
quantum control conditioned on the data being encoded in an error-correcting code.
Moreover, there is an emergent literature on using machine learning for QEC discov-
ery itself. For machine learning practitioners, the QDataSet thus provides a useful
way to seek to apply advanced classical machine learning techniques to challenging
but important problems.
The QDataSet was developed using methods that aimed to simulate realistic noise
profiles in experimental contexts. Noise applicable to quantum systems is gener-
ally classified as either classical or quantum [155]. Classical noise is represented
typically as a stochastic process [144] and can include, for example (i) slow noise
which is pseudo-static and not varying much over the characteristic time scale of
the quantum system and (ii) fast or ‘white’ noise with a high frequency relative
to the characteristic frequencies (energy scales) of the system [156]. The effect of
quantum noise on quantum systems is usually classified in two forms. The first is dephasing (T2) noise, which characteristically causes quantum systems to decohere, thus destroying or degrading quantum information encoded within qubits (see section A.3.3). Such noise is usually characterised as an operator acting along the quantisation axis of the chosen angular momentum. The second type of noise (T1) can shift the energy state of the system (e.g. shifting the system between ground and excited states) and is associated with operators acting transverse to the quantisation axis.
What this means in practice for the use of the QDataSet is usefully construed as follows using a Bloch sphere. Once an orientation (x, y, z-axes) is chosen, one is effectively making a choice of basis, i.e. the basis of a typical qubit |ψ⟩ = a|0⟩ + b|1⟩ is the basis of eigenstates of the σz operator. When noise acts transverse to the quantisation axis, along the x- or y-axes (i.e. is associated with the σx and σy operators), then it has the potential (if the energy of the noise is sufficient) to shift the energy state the quantum system is in, represented by a 'flip' in the basis from |0⟩ to |1⟩, for example. This type of noise is T1 noise. By contrast, noise acting along the z-axis (associated with the σz operator) has the effect of dephasing a qubit, thus affecting the coherences encoded in the relative phases of the qubit. Such noise is denoted T2 noise. Open quantum systems research (section A.3) and understanding noise in quantum systems is a vast and highly specialised topic. As we describe
below, the QDataSet adopts the novel approach outlined in [82] where, rather than
seeking to fully characterise noise spectra, only the information about noise
relevant to the application of controls (to dampen noise) is sought. Such information
is encoded in the VO operator, which is an expectation that encodes the influence
of noise on the quantum system (see subsection 3.5.3 below). In a quantum control
problem using the QDataSet samples containing noise, for example, the objective
would then be to select controls that neutralise such effects.
• N0: this is the noiseless case (indicated in the QDataSet parameters as set out
in Tables (3.8) and (3.9));
• N1: the noise β(t) is described by its power spectral density (PSD) S1 (f ), a
form of 1/f noise with a Gaussian bump;
• N2: here β(t) is stationary Gaussian coloured noise, described by its autocorrelation matrix, which is chosen such that it is coloured, Gaussian and stationary;
• N3: here the noise β(t) is non-stationary Gaussian coloured noise, again de-
scribed by its autocorrelation matrix which is chosen such that it is coloured,
Gaussian and non-stationary. The noise is simulated via multiplication of a
deterministic time-domain signal with stationary noise;
• N4: in this case, the noise β(t) is described by its autocorrelation matrix
chosen such that it is coloured, non-Gaussian and non-stationary. The non-
Gaussianity of the noise is achieved via squaring the Gaussian noise so as to
achieve requisite non-linearities;
• N5: a noise described by its power spectral density (PSD) S5 (f ), differing from
N1 only via the location of the Gaussian bump; and
• N6: this profile models a noise source that is correlated with one of the other five sources (N1-N5) through a squaring operation. If β(t) is the realisation of one of the five profiles, N6 will have realisations of the form β²(t).
This profile is used for multi-axis and multi-qubit systems.
The N1 and N5 profiles can be generated following the method described in [82]
(see the section entitled “Implementation” onwards). Regarding the other profiles,
any standard numerical package can generate white Gaussian stationary noise. The
QDataSet noise realisations were encoded using the Numpy package of Python. We
deliberately did so in order to avoid various assumptions used in common quan-
tum programming packages, such as Qutip. To add colouring, we convolve the
time-domain samples of the noise with some signal. To generate non-stationarity,
we multiply the time-domain samples by some deterministic signal. Finally, to generate non-
Gaussianity, we start with Gaussian noise and apply a non-linear transformation
such as squaring. The last noise profile is used to model the case of two noise
sources that are correlated with each other. In this case we generate the first one
using any of the profiles N1-N5, and the other source is completely determined.
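The recipe above can be sketched in a few lines of Numpy; the smoothing kernel and modulating signal below are arbitrary illustrative choices, not the QDataSet's actual PSD-based profiles.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1024
t = np.linspace(0, 1, n)

white = rng.standard_normal(n)                  # white Gaussian, stationary

# Colouring: convolve the time-domain samples with a smoothing kernel
kernel = np.exp(-np.linspace(-3, 3, 64) ** 2)
coloured = np.convolve(white, kernel / kernel.sum(), mode="same")

# Non-stationarity: modulate by a deterministic time-domain signal
nonstationary = coloured * (1 + 0.8 * np.sin(2 * np.pi * t))

# Non-Gaussianity: apply a non-linear transformation such as squaring
nongaussian = nonstationary ** 2

# Correlated second source (N6-style): the square of the first realisation
beta1 = coloured
beta6 = beta1 ** 2
```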
Following on from the discussion of noise spectral density and open quantum systems (see section A.3.4), we specify below the three primary single-qubit Hamiltonian and noise configurations used in constructing the QDataSet (see [158]).
1. Single-axis noise; orthogonal pulse. For a qubit with single-axis noise and an orthogonal control pulse, the Hamiltonian is:

$$H = \frac{1}{2}(\Omega + \beta_z(t))\,\sigma_z + \frac{1}{2} f_x(t)\,\sigma_x. \qquad (3.5.1)$$
2. Multi-axis noise; two orthogonal pulses. For single-qubit multi-axis noise and
two orthogonal control pulses, the Hamiltonian is:
$$H = \frac{1}{2}(\Omega + \beta_z(t))\,\sigma_z + \frac{1}{2}(f_x(t) + \beta_x(t))\,\sigma_x + \frac{1}{2} f_y(t)\,\sigma_y \qquad (3.5.3)$$
3. For the noiseless qubit with two orthogonal pulses, the Hamiltonian is:
$$H = \frac{1}{2}\Omega\sigma_z + \frac{1}{2} f_x(t)\,\sigma_x + \frac{1}{2} f_y(t)\,\sigma_y. \qquad (3.5.5)$$

Consider again, for the purpose of deriving the master equation, the single-axis noise case:

$$H = \frac{1}{2}(\Omega + \beta_z(t))\,\sigma_z + \frac{1}{2} f_x(t)\,\sigma_x. \qquad (3.5.6)$$
To derive the Lindblad master equation, first we look for factors affecting system
evolution in terms of unitary and non-unitary dynamics: (a) unitary evolution driven
by the deterministic part of the Hamiltonian and (b) non-unitary evolution resulting
from the interaction with the environment. Thus βz (t) is the environmental inter-
action term which is ultimately related to the decoherence rate γz via the power
spectral density SZ (ω) (see equation (A.3.13) and section A.3.3 for more detail). To
identify the Lindblad operators, we note that the noise is along the z-axis such that $L_k = \sqrt{\gamma_z}\,\sigma_z$, where γz is the decoherence rate, related as above to the power spectral density $S_Z(\omega)$. The corresponding dissipator is:

$$\sqrt{\gamma_z}\,\sigma_z\, \rho\, \sqrt{\gamma_z}\,\sigma_z - \frac{1}{2}\gamma_z\{I, \rho\} = \gamma_z(\sigma_z \rho \sigma_z - \rho) \qquad (3.5.8)$$
such that the Lindblad master equation for our system under z-axis noise is:

$$\frac{d\rho}{dt} = -i\left[\frac{1}{2}\left(\Omega\sigma_z + f_x(t)\,\sigma_x\right), \rho\right] + \gamma_z(\sigma_z \rho \sigma_z - \rho). \qquad (3.5.9)$$
Thus we can see the link between the measurable characteristics of environmental
noise and the theoretical description of its effects on quantum systems via equation
(A.3.5).
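For illustration, equation (3.5.9) can be integrated numerically; the following sketch uses a simple fixed-step Euler scheme with illustrative parameter values (a production simulation would use a higher-order integrator).

```python
import numpy as np

sx = np.array([[0, 1], [1, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)

def lindblad_rhs(rho, t, Omega, fx, gamma_z):
    """Right-hand side of (3.5.9): unitary part plus z-axis dephasing."""
    H = 0.5 * (Omega * sz + fx(t) * sx)
    return -1j * (H @ rho - rho @ H) + gamma_z * (sz @ rho @ sz - rho)

def evolve_rho(rho0, T=1.0, steps=20000, Omega=10.0,
               fx=lambda t: np.cos(10 * t), gamma_z=0.05):
    dt = T / steps
    rho = rho0.astype(complex)
    for k in range(steps):
        rho = rho + dt * lindblad_rhs(rho, k * dt, Omega, fx, gamma_z)
    return rho

rho0 = np.array([[0.5, 0.5], [0.5, 0.5]])  # |+><+|: maximal coherence
rho_T = evolve_rho(rho0)
print(abs(rho_T[0, 1]))                    # coherence decays under dephasing
```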
3.5.3.2 Distortion
In physical experiments, the control pulses are physical signals (such as microwave
pulses), which propagate along cables and get processed by different devices. This
introduces distortions which cannot be avoided in any real devices. However, by
properly engineering the systems, the effects of these distortions can be minimized
and/or compensated for in the design. In this work, we used a linear time-invariant system to model distortions of the control pulses, and the same filter is used for all datasets. We chose a Chebyshev analogue filter [159], where the undistorted control signal is the filter input and the distorted signal is the filter output. Table (3.7) sets out a summary of key parameters.
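As a sketch of this distortion model, one can pass an ideal pulse through a Chebyshev filter using SciPy; the order, ripple and cutoff below are illustrative stand-ins for the actual parameters in Table (3.7), and a digital filter is used here for simplicity in place of the analogue design.

```python
import numpy as np
from scipy import signal

# Digital Chebyshev type-I low-pass as a stand-in for the analogue filter:
# the undistorted pulse is the filter input, the distorted pulse its output.
fs = 1000.0                                     # illustrative sampling rate
b, a = signal.cheby1(N=4, rp=1, Wn=60, btype="low", fs=fs)

t = np.arange(0, 1, 1 / fs)
pulse = ((t > 0.2) & (t < 0.4)).astype(float)   # ideal square pulse
distorted = signal.lfilter(b, a, pulse)         # LTI distortion model
```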
In developing the QDataSet, we have assumed that the environment affecting the
qubit is classical and stochastic, namely that H1 (t) will be a stochastic term that
acts directly on the system. The stochasticity of H1 (t) means that the expectation
of any observable measured experimentally will be given as an average over noise realisations, where

$$U_0(T) = \mathcal{T}_+ \, e^{-i\int_0^T H_0(t)\, dt}$$

is the evolution matrix in the absence of noise and the influence of the noise is encoded in the $V_O$ operator (see below), for a system in the state ρ (equation (A.1.14)). The expectation value of the observable is given by:
$$\langle O \rangle = \mathrm{Tr}(\rho O) = \mathrm{Tr}\left(\rho \sum_m m P_m\right) = \sum_m m\, \mathrm{Pr}(m). \qquad (3.5.16)$$
The set of measurement operators for the QDataSet is the set of Pauli operators
which are important operators in quantum information processing involving qubit
systems. This is in part because such qubit systems can be usefully decomposed
into a Pauli operator basis via the Pauli matrices:
$$\sigma_x = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \quad \sigma_y = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}, \quad \sigma_z = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} \qquad (3.5.17)$$
together with the identity (denoted σ0). Pauli operators are Hermitian (with eigenvalues +1 and −1), traceless and satisfy σi² = I. Together with the identity matrix, they form an orthogonal basis (with respect to the Hilbert-Schmidt product defined as ⟨A, B⟩ = Tr(A†B)) for any 2 × 2
Hermitian matrix. QDataSet qubit states can then be expressed in this basis via
the density matrix:
$$\rho = \frac{1}{2}(I + \mathbf{r} \cdot \boldsymbol{\sigma}) \qquad (3.5.18)$$
where the vector r = (rx, ry, rz), with |r| ≤ 1 (a unit vector for pure states), is called the Bloch vector, and the
vector σ = (σx , σy , σz ). The dot product of these two vectors is just a shorthand
notation for the expression r · σ = rx σx + ry σy + rz σz . As such, a time-dependent
Hamiltonian of a qubit can be expressed as
$$H(t) = \sum_{i \in \{x,y,z\}} \alpha_i(t)\,\sigma_i \qquad (3.5.19)$$
For n-qubit systems (such as two-qubit systems in the QDataSet), Pauli measure-
ments are represented by tensor-products of Pauli operators. For example, a σz
measurement on the first qubit and σx on the second is represented as:
$$\sigma_z \otimes \sigma_x \qquad (3.5.25)$$
The Pauli representation of qubits used in the QDataSet can be usefully visu-
alised via the Bloch sphere as per Figure 3.4. The axes of the Bloch sphere are the
expectation values of the Pauli σx , σy and σz operators respectively. As each Pauli
operator has eigenvalues 1 and −1, the eigenvalues can be plotted along axes of the
2-sphere. For a pure (non-decohered) quantum state ρ, $|\mathbf{r}| = \sqrt{r_x^2 + r_y^2 + r_z^2} = 1$ (as we require Tr ρ² = 1), thus ρ is represented on the Bloch 2-sphere as a vector
originating at the origin and lying on the surface of the Bloch 2-sphere. The evo-
lution of the qubit i.e. a computation according to unitary evolution can then be
represented as rotations of ρ across the Bloch sphere. In noisy contexts, decohered
states are represented by |r| < 1, i.e. the norm of the Bloch vector shrinks and the state no longer resides on the surface of the sphere.
For machine learning practitioners, it is useful to appreciate the operation of
the QDataSet Pauli operators σx , σy , σz as the generators of rotations about the
respective axes of the Bloch sphere. Represented on a Bloch sphere, the application
of σz to a qubit is equivalent to rotating its Bloch vector about the z-axis
(see Figure 3.4). Conceptually, a qubit is in a z-eigenstate if it is lying directly on
either the north (+1) or south (−1) pole. Rotating about the z-axis then is akin to
rotating the vector on the spot, thus no change in the quantum states (or eigenvalues)
for σz occurs because the system exhibits symmetry under such transformations.
This is similarly the case for σx , σy generators with respect to their eigenvalues and
eigenvectors. However, rotations by σα will affect the eigenvalues/vectors in the σβ
basis where α ̸= β e.g. a σx rotation will affect the component of the qubit lying
along the σz axis. Similarly, a σz rotation of a qubit in a σx eigenstate will alter that
state (shown in (a) and (b) of Figure 3.4). An understanding of Pauli operators and
conceptualisation of qubit axes is important to the understanding of the simulated
QDataSet. An understanding of symmetries of relevance to qubit evolution (and
quantum algorithms) is also beneficial. As we describe below, controls or noise are
structured to be applied along particular axes of a qubit and thus can be thought of as controlling, or as distortions upon, the axial rotations of the qubit effected by the corresponding Pauli generator.
There exist higher-dimensional generalisations of the Pauli matrices that allow the formation of orthogonal bases to represent operators in these dimensions. In particular, if we have a system of N qubits, then one simple generalisation is to form the set $\{\sigma_{i_1}^{(1)} \otimes \sigma_{i_2}^{(2)} \otimes \cdots \otimes \sigma_{i_N}^{(N)}\}_{i_j \in \{0,x,y,z\}}$. In other words, we take tensor products of the Paulis, which gives a set of size $4^N$. For example, for a two-qubit system we can form the 16-element set {σ0 ⊗ σ0, σ0 ⊗ σx, σ0 ⊗ σy, σ0 ⊗ σz, σx ⊗ σ0, σx ⊗ σx, σx ⊗ σy, σx ⊗ σz, σy ⊗ σ0, σy ⊗ σx, σy ⊗ σy, σy ⊗ σz, σz ⊗ σ0, σz ⊗ σx, σz ⊗ σy, σz ⊗ σz}. Moreover, for many use cases, we are interested in the minimal number of operators, such as Pauli operators, required to achieve a requisite level of control, such as universal quantum computation.
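A short sketch of generating this $4^N$ tensor-product basis programmatically:

```python
import numpy as np
from itertools import product

paulis = {
    "0": np.eye(2, dtype=complex),
    "x": np.array([[0, 1], [1, 0]], dtype=complex),
    "y": np.array([[0, -1j], [1j, 0]], dtype=complex),
    "z": np.array([[1, 0], [0, -1]], dtype=complex),
}

def pauli_basis(n_qubits):
    """All 4^N tensor products sigma_{i1} x ... x sigma_{iN}, keyed by label."""
    basis = {}
    for labels in product(paulis, repeat=n_qubits):
        op = np.array([[1.0 + 0j]])
        for l in labels:
            op = np.kron(op, paulis[l])
        basis["".join(labels)] = op
    return basis

two_qubit = pauli_basis(2)   # 16 elements, e.g. two_qubit["zx"] = sz x sx
print(len(two_qubit))        # 16
```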
For the single qubit system, initial states are the two eigenstates of each Pauli
operator. As noted above, the quantum state can be decomposed in the Pauli basis
as ρj = ½(I ± σj), for j = 1, 2, 3. This gives a total of 6 states. We perform the
three Pauli measurements on each of these states, resulting in a total of 18 possible
combinations. These 18 measurements suffice to characterise a qubit system. For two qubits, the procedure is similar, but now we initialise each individual qubit into one of the 6 possible eigenstates (giving 36 two-qubit initial states), and we measure all 15 Pauli observables (we exclude the identity). This gives a total of 540 possible combinations.
Measurements of the one- and two-qubit systems for the QDataSet are undertaken
using Monte Carlo techniques. This means that a random Pauli measurement is un-
dertaken multiple times, with the measurement results averaged in order to provide
the resultant measurement distribution for each of the operators. The measurement
of the quantum systems is contingent on the noise realisations for each system. For
the noiseless case, the Pauli measurements are simply the Monte Carlo averages
(expectations) of the Pauli operators. Systems with noise will have one or more
noise realisations (applications of noise) applied to them. To account for this, we
include two separate sets of measurement distributions. The first is the expectation value of the three Pauli operators over all possible initial states for each different noise realisation. These statistics are given by the set {VO} in the QDataSet. Thus
for each type of noise, there will be a set of measurement statistics. The second
is a set of measurement statistics where we average over all noise realisations for
the dataset. This second set of measurements is given by the set {EO }. Including
both sets of measurements enables algorithms trained using the QDataSet to be
more fine-grained in their treatment of noise: in some contexts, while noise profiles
may be uncertain, it is clear that the noise is of a certain type, so the first set of
measurement statistics may be more applicable. For other cases, there is almost no
information about noise profiles or their sources, in which case the average over all
noise realisations may be more appropriate.
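The following toy sketch illustrates the distinction between the per-realisation statistics {VO} and the averaged statistics {EO}; the white-noise realisations and control waveform here are placeholders for the actual N1-N6 profiles and QDataSet pulses.

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(2)
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)

def noisy_expectations(rho0, obs, K=200, T=1.0, steps=100, Omega=10.0):
    """Expectation of `obs` over K noise realisations: the per-realisation
    values play the role of {V_O}-style statistics, and their mean over
    realisations the role of {E_O}-style statistics."""
    dt = T / steps
    vals = np.empty(K)
    for j in range(K):
        beta = rng.standard_normal(steps)        # toy white-noise realisation
        U = np.eye(2, dtype=complex)
        for k in range(steps):
            H = 0.5 * (Omega + beta[k]) * sz + 0.5 * np.cos(10 * k * dt) * sx
            U = expm(-1j * H * dt) @ U
        rho_T = U @ rho0 @ U.conj().T
        vals[j] = np.trace(rho_T @ obs).real
    return vals, vals.mean()

rho0 = np.array([[0.5, 0.5], [0.5, 0.5]], dtype=complex)   # |+><+|
per_realisation, averaged = noisy_expectations(rho0, sx)
```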
For the benefit of researchers using the QDataSet, we briefly set out more detail on how the datasets were generated. The simulator comprises three main
components. The first approximates time-ordered unitary evolution. The second
component generates realisations of noise given random parametrisations of the
power spectral density (PSD) of the noise. The third component simulates quantum
measurement. The simulations are based upon Monte Carlo methods whereby K
randomised pulse sequences give rise to noise realisations. The quantum systems
are then measured to determine the extent to which the noise realisations affect the
expectation values. Trial and error indicated a stabilisation of measurement statis-
tics at around K = 500, thus K ≥ 1000 was chosen for the final simulation run to
generate the QDataSet. The final Pauli measurements are then averages over such
noise realisations. The parameter K is included for each dataset and example (as
described below). For more detail, including useful pseudocode that sets out the
relationship between noise realisations, β(t) and measurement, see supplementary
material in [82] (which we set out for completeness in section 3.13.1).
Further detail regarding the 52 datasets (including code for simulations) that we
present in this work, for use in solving the engineering applications discussed in section 3.8 using classical machine learning, can be found in the repository for the QDataSet
[39]. Table (3.2) sets out the taxonomy of each of the 52 different datasets. Each
dataset comprises 10,000 examples that are compressed into a Pickle file which is in
turn compressed into a zip file. The Item field indicates the dictionary key and the
Description field indicates the dictionary value.
• Noise Layer : An internal class dedicated to the generation of noise within the
simulation as set out in the Monte Carlo method in [82].
• QuantumCell : This internal Python class is essential for realising the time-
ordered evolution of the quantum system.
For category 3, we choose the 1Z (identity on the first qubit, noise along the z-axis for the second) and Z1 (noise along the z-axis for the first qubit, identity along the second) noise to follow the (N1,N6) profile. This category simulates two individual qubits with correlated noise
sources. For category 4, we generate the noiseless, (N1,N5), and (N1,N6) for the 1Z
and Z1 noise. This gives 3 datasets. Therefore, the total number of datasets at this
point is 13. Including the two types of control waveforms, this gives a total of 26.
If we also include the cases of distortion and non-distorted control, then this gives
a total of 52 datasets. Comprehensive detail on the noise profiles used to generate
the datasets is set out above.
We chose a naming convention for the datasets that aims to convey as much information as possible about the parameters chosen for each particular dataset. The name is partitioned into 6 parts, separated by an underscore “_”. We explicate each part below:
1. The first part is either the letter “G” or “S” to denote whether the control
waveform is Gaussian or square.
2. The second part is either “1q” or “2q” to denote the dimensionality of the
system (i.e. the number of qubits).
3. The third part denotes the control Hamiltonian. It is formed by listing the Pauli operators used for the control of each qubit, with qubits separated by a hyphen “-”. For example, category 1 datasets will have “X”, while category 4 will have “IX-XI-XX”.
4. The fourth and fifth parts indicate (i) the axis along which noise is applied (fourth part) and (ii) the type of noise along each axis (fifth part). So “G_2q_IX-XI_IZ-ZI_N1-N6” represents two qubits with control along the x-axis of each qubit, while the noise is applied along the z-axis of each. In this case, N1 noise is applied along the z-axis of the first qubit and N6 noise is applied along the z-axis of the second qubit. For datasets where no noise is applied, these two parts are omitted.
5. Finally, the sixth part denotes the presence of control distortions by the letter “D”; if there is no distortion, this part is omitted.
For example, the dataset “G_2q_IX-XI-XX_IZ-ZI_N1-N6” is a two-qubit system with Gaussian pulses and no distortions, local X control on each qubit and an interacting XX control, along with local noise on each qubit with profile N1 on the first qubit z-axis and N6 on the second qubit z-axis. As another example, the dataset “S_1q_XY_D” is a single-qubit system with square, distorted control pulses along the X and Y axes, and no noise.
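A short, hypothetical helper (not part of the QDataSet code) for parsing this convention might look as follows:

```python
def parse_qdataset_name(name):
    """Parse a QDataSet filename such as 'G_2q_IX-XI-XX_IZ-ZI_N1-N6' or
    'S_1q_XY_D' according to the convention above."""
    parts = name.split("_")
    info = {
        "waveform": {"G": "Gaussian", "S": "square"}[parts[0]],
        "qubits": int(parts[1][0]),
        "control": parts[2].split("-"),
        "distortion": parts[-1] == "D",
    }
    # The noise-axis and noise-profile parts appear only for noisy datasets
    middle = parts[3:-1] if info["distortion"] else parts[3:]
    if len(middle) == 2:
        info["noise_axes"] = middle[0].split("-")
        info["noise_profiles"] = middle[1].split("-")
    return info

print(parse_qdataset_name("G_2q_IX-XI-XX_IZ-ZI_N1-N6"))
print(parse_qdataset_name("S_1q_XY_D"))
```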
• Mean expectation of all observables over all noise realisations. In this case, for
a sample of examples from each dataset in the QDataSet, the mean expectation
over all noise realisations for all observables (i.e. measurements) was compared
against the same mean measurements for the equivalent simulation generated
in Qutip. This was done for the noiseless and noisy case. The two means were
then compared. On average, the error (difference between the means) was of the order $10^{-6}$, demonstrating the equivalence of the QDataSet simulation code with that from Qutip.
Overview
In this section, we include further usage notes related to the 52 QDataSet datasets
based on simulations of one- and two-qubit systems evolving in the presence and/or
absence of noise subject to a variety of controls. Recall that the QDataSet has been
developed primarily for use in training, benchmarking and competitive development
of classical and quantum algorithms for common tasks in quantum control, quan-
tum tomography and noise spectroscopy. It has been generated using customised
code drawing upon base-level Python packages in order to facilitate interoperability
and portability across common machine learning and quantum programming plat-
forms. Each dataset consists of 10,000 samples which in turn comprise a range of
data relevant to the training of machine learning algorithms for solving optimisation
problems. The data includes a range of information (stored in list, matrix or tensor
format) regarding quantum systems and their evolution, such as: quantum state
vectors, drift and control Hamiltonians and unitaries, Pauli measurement distribu-
tions, time series data, pulse sequence data for square and Gaussian pulses and noise
and distortion data.
The aim of generating the datasets is threefold: (a) simulating typical quantum
engineering systems, dynamics and controls used in laboratories; (b) using such
datasets as a basis to train machine learning algorithms to solve certain problems or
achieve certain objectives, such as attainment of a quantum state ρ, quantum circuit
U or solution of a quantum control problem generally (among others); and (c) enabling optimisation of algorithms and spurring development of optimised algorithms for solving problems
in quantum information, analogously with the role of large datasets in the classical
setting. We explain these use cases in more detail below:
2. Training algorithms using datasets. The second use case for the QDataSet
is related but distinct from the first. The aim is that training models us-
ing the datasets has applicability to experimental setups. Thus, a machine learning model trained using the datasets should in theory provide, for example, the optimal set of pulses or interventions needed to solve (and, indeed, optimise) for some objective. It is intended that the output of the machine learning model is an abstraction which can then be realised via the specific experimental setup. The aim then is that the abstraction of each experimental setup allows the application of a variety of machine learning models for optimising in a way that is directly applicable to experimental setups, rather than relying upon experimentalists to work out how to translate the model's output into their particular experimental context. Re-
quiring conformity of outputs within these abstract criteria thus facilitates a
greater, practical, synthesis between machine learning and the implementation
of solutions and procedures in experiments.
3. Benchmarking, development and testing. The third primary use of the datasets
is to provide a basis for benchmarking, development and testing of existing and
new algorithms in quantum machine learning for quantum control, tomography
and related to noise mitigation. As discussed above, classical machine learning
has historically been characterised by the availability of large-scale datasets
with which to train and develop algorithms. The role of these large datasets
is multifaceted: (i) they provide a means of benchmarking algorithms (see
above), such that a common set of problem parameters, constraints and ob-
jectives allows comparison among different models; (ii) their size often means
they provide a richer source of overt and latent (or constructible) features
which machine learning models may draw upon, improving the versatility and
diversity of models which may be usefully trained. The aim of the QDataSet
is then that it can be used in tandem by researchers as a benchmarking tool for
algorithms which they may wish to apply to their own data or experiments.
The QDataSet Hamiltonians are composed using Pauli generators (see, for example, [15]). These generators will typically be used as the tensors or matrices to which classical controls are applied within machine learning architectures. We
explore these questions in other Chapters, especially in the geometric and algebraic
context of subRiemannian quantum control and also classical Pontryagin-based con-
trol theory.
3.9.1 Benchmarking
Benchmarking algorithms using standardised datasets is an important developmen-
tal characteristic of classical machine learning. Benchmarks provide standardised
datasets, preprocessing protocols, metrics, architectural features (such as optimis-
ers, loss functions and regularisation techniques) which ultimately enable research
communities to precisify their research contributions and improve upon state of the
art results. Results in classical machine learning are typically presented by com-
parison with known benchmarks in the field and adjudged by the extent to which
they outperform the current state of the art benchmarks. Results are presented in
tabular format with standardised metrics for comparison, such as accuracy, F1-score
or AUC/ROC statistics. The QDataSet has been designed with these metrics in
mind. For example, a range of classical or quantum statistics (e.g. fidelity) can be
used to benchmark the performance of algorithms that use the datasets in training.
The role of benchmarking is important in classical contexts. Firstly, it enables a
basis for researchers across machine learning subdisciplines to gauge the extent to
which their results correlate to algorithmic design as distinct from unique features
of training data or use cases. Secondly, it provides a basis for better assessing the
algorithmic state of the art within subfields. Given its relative nascency, QML lit-
erature tends to focus on providing proof-of-concept examples as to how classical,
hybrid or quantum-native algorithms can be used for classification or regression
tasks. There is little in the way of systematic benchmarking of QML algorithms to date. Algorithms for tasks such as tomographic classification can be trained using the QDataSet. In any case, an un-
derstanding of standard and state of the art algorithms in each category can provide
QML researchers using the QDataSet with a basis for benchmarking their own al-
gorithms and inform the design of especially hybrid approaches (see [111] for an
overview and for quantum examples of the above).
Candidate architectures include transformer-based models [165]. See section D.4.2 for detailed discussion of neu-
ral network components and architecture. One feature of algorithmic development
that is particularly important is dealing with the curse of dimensionality - and in
a quantum context, barren plateaus [78] (see section D.6.4). Common techniques
to address such problems include dimensionality reduction techniques or symmetry-
based (for example, tensor network) techniques whose ultimate goal is to reduce
datasets down to their most informative structures while maintaining computational
feasibility. While the QDataSet only extends to two-qubit simulations, the size and
complexity of the data suggests the utility of dimensionality-reduction techniques for
particular problems, such as tomographic state characterisation. To this end, algo-
rithms developed using the QDataSet can benefit from benchmarking and adapting
classical dimensionality-reduction techniques, such as principal component analy-
sis, partial regression, singular value decompositions, matrix factorisation and other
techniques [12]. It is also important to mention that there has been considerable
work in QML generally toward the development of quantum and hybrid analogues
of such techniques. These too should be considered when seeking benchmarks.
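As a simple starting point, a classical SVD-based principal component reduction over flattened measurement statistics might look as follows; the random matrix is a placeholder for real QDataSet features, and the 18 columns reflect the single-qubit state/measurement combinations discussed earlier.

```python
import numpy as np

def pca_reduce(X, k):
    """Project rows of X onto the top-k principal components via SVD."""
    Xc = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T    # (n_samples, k) reduced features

# Flatten per-example measurement statistics into feature vectors first, e.g.:
X = np.random.default_rng(3).standard_normal((10_000, 18))  # placeholder data
X_reduced = pca_reduce(X, k=4)
```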
Finally, it is worth mentioning the use (and importance) of ensemble methods
in classical machine learning. Ensemble methods tend to combine what are known
as ‘weak learner’ algorithms into an ensemble which, in aggregate, outperforms any
individual instance of the algorithm. Each weak learner is typically trained by reference to the errors of previously trained learners (as in boosting) or on resampled subsets of the data (as in bagging). Such techniques would
be suitable for use when training algorithms on the QDataSet. Popular examples of
such algorithms are gradient-boosting algorithms, such as XGBoost [166].
Quantum state tomography is expanded upon in more detail in section A.1.6. The quantum state of interest may be
in either a mixed or pure state and the task is to uniquely identify the state among
a range of potential states. Tomography requires that measurements be tomographi-
cally complete (and therefore informationally complete, see definition A.1.36), which
means that the set of measurement operators form a basis for the Hilbert space of
interest. That is, a set of measurement operators {Mm } is tomographically complete
if for every operator A ∈ B(H), there exists a representation of A in terms of {Mm }.
Data are gathered from a discrete set of experiments, where each experiment is a
process of initial state preparation, application of a sequence of gates {Gj} and measurement. This experimental process is repeated N times, leading to
a frequency count ni of a particular observable i. The probability of that observable
is then estimated as:
$$p(i|\rho) \approx \frac{n_i}{N} = \hat{p}_i$$
from which we reconstruct the density matrix ρ. For a detailed exposition of tomog-
raphy formalism and conditions, see D’Alessandro [15]. Quantum process tomogra-
phy is a related but distinct type of tomography. In this case, we also have a set of
test states {ρj } which span B(H). To undertake process tomography, an unknown
gate sequence Gk comprising K gates is applied to the states such that:
$$p(i|G, \rho_j) \approx \frac{n_i}{N} = \hat{p}_{j,i} \qquad (3.10.2)$$
Measurement outcome probabilities follow the Born rule, $p(m) = \mathrm{Tr}(\rho E_m), \; \forall m \in \Sigma$, for measurement operators $\{E_m\}$.
The QDataSet can be used to train machine learning algorithms for tomography. Quantum state and process tomography is particularly challenging. One must ensure that the estimate obtained is physical, i.e. positive semi-definite with unit trace. Furthermore, the number of measurements N required for sufficient precision to completely characterise ρ scales rapidly. Each of the K gates in a sequence Gk requires d²(d − 1) experiments (measurements), where d = dim B(H), so the total number required to sufficiently characterise the quantum process is Kd⁴ − (K − 2)d² − 1 (see [167] for
more detail). Beyond a small number of qubits, it becomes computationally infea-
sible to completely characterise states by direct measurement, thus parametrised or
incomplete tomography must be relied upon. Machine learning techniques naturally
offer potential to assist with such optimisation problems in tomography, especially
neural network approaches where inherent non-linearities may enable sufficient ap-
proximations that traditional tomographic techniques may not. Examples of the
use of classical machine learning include demonstration of improvements due to
neural network-based (non-linear) classifiers over linear classifiers for tomography
tasks [168] and classical convolutional neural networks to assess whether a set of
measurements is informationally complete [169].
The objective of an algorithm trained using the QDataSet may be, for example, to predict (within tolerances determined by the use case) the tomographic description of a final quantum state from a limited set of measurement statistics (to avoid having to undertake N such experiments for large N). Each of the one- and two-qubit datasets is informationally complete with respect to the Pauli operators (and identity), i.e. states can be decomposed into the one- or two-qubit Pauli basis.
There are a variety of objectives and techniques which may be adopted. Each of the
10,000 examples for each profile constitutes an experiment comprising initial state
preparation, state evolution and measurement. One approach using the QDataSet
would be to try to produce an estimate ρ̂(T ) of the final state ρ(T ) (which can be
reconstructed by application of the unitaries in the QDataSet to the initial states)
using the set of Pauli measurements {Em }. To train an algorithm for tomography
without a full set of N measurements being undertaken, one can stipulate the aim
of the machine learning algorithm as being to take a subset of those Pauli measure-
ments as input and try to generate a final state ρ̂(T ) that as closely approximates
the known final state ρ(T ) provided by the QDataSet.
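A minimal sketch of the linear-inversion baseline for the single-qubit case, using the Bloch decomposition (3.5.18) with a simple rescaling projection to keep the estimate physical (maximum-likelihood methods are a common, more principled alternative):

```python
import numpy as np

sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)

def reconstruct_qubit(exp_x, exp_y, exp_z):
    """Linear-inversion estimate rho_hat = (I + r . sigma) / 2 from Pauli
    expectation values, rescaling r if noisy statistics push |r| above 1
    (which would violate positive semi-definiteness)."""
    r = np.array([exp_x, exp_y, exp_z])
    if np.linalg.norm(r) > 1:
        r = r / np.linalg.norm(r)
    return 0.5 * (np.eye(2) + r[0] * sx + r[1] * sy + r[2] * sz)

rho_hat = reconstruct_qubit(0.6, 0.1, 0.7)
print(np.linalg.eigvalsh(rho_hat))   # both eigenvalues >= 0
```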
A variety of techniques can be used to draw from the measurement distributions
and iteratively update the estimate ρ̂(T ), for example gradient-based updating of
such estimates [170]. The distance measure could be any number of the quantum
metrics described in the background chapters above, including state or operator
fidelity, trace distance or quantum relative entropy. Classical loss functions, such as
MSE or RMSE can then be used (as is familiar to machine learning practitioners) to
construct an appropriate loss function for minimisation. A related, but alternative,
approach is to use batch fidelity where the loss function is to minimise the error
between a vector of ones and fidelities, the vector being the size of the relevant
batch. Similar techniques may also be used to develop tools for use in gate set
tomography, where the sequence of gates Gk is given by the sequence of unitaries U0
in the QDataSet. In that case, the objective would be to train algorithms to estimate Gk given the set of measurements, either in the presence or absence of noise. Table (3.3) sets out an example summary for using the QDataSet for tomography.
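To make the pipeline concrete, the following is a minimal sketch (not the QDataSet reference implementation; the array shapes, the measurement-subset size n_in and the layer sizes are illustrative assumptions) of training a network to map a subset of Pauli expectations to a physical estimate ρ̂(T ), using the factorisation ρ̂ = AA†/Tr(AA†) to guarantee positivity and unit trace:

```python
# Hedged sketch: estimate a physical rho_hat(T) from a subset of Pauli
# expectations. Shapes and the subset size n_in are illustrative assumptions.
import tensorflow as tf

d = 2        # Hilbert space dimension (single qubit)
n_in = 6     # number of Pauli expectations supplied as input (assumed)

def to_rho(a):
    """Map 2d^2 real network outputs to rho = A A^dag / tr(A A^dag)."""
    re = tf.reshape(a[:, :d * d], (-1, d, d))
    im = tf.reshape(a[:, d * d:], (-1, d, d))
    A = tf.complex(re, im)
    rho = tf.matmul(A, A, adjoint_b=True)       # positive semi-definite
    tr = tf.linalg.trace(rho)
    return rho / tf.reshape(tr, (-1, 1, 1))     # unit trace

net = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(n_in,)),
    tf.keras.layers.Dense(2 * d * d),           # real and imaginary parts of A
])

def hs_loss(rho_true, a_out):
    """Hilbert-Schmidt distance as a simple classical surrogate loss."""
    diff = rho_true - to_rho(a_out)
    return tf.reduce_mean(tf.math.real(
        tf.linalg.trace(tf.matmul(diff, diff, adjoint_b=True))))
```

The batch fidelity approach described above would simply replace hs_loss with the error between a vector of ones and the batch of fidelities F (ρ(T ), ρ̂(T )).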
The QDataSet can be used to develop and test machine learning algorithms to assist with
noise spectroscopy. In this problem, we are interested in finding models of the noise
affecting a quantum system given experimental measurements. More background
on noise and quantum measurement is set out in section A.3.3. In terms of the VO
operators discussed earlier, we would like to find an estimate of VO given a set of
control pulse sequences, and the corresponding observables. The QDataSet provides
a sequence of VO operators encoding the average effect of noise on measurement
operators. This set of data can be used to train algorithms to estimate VO from
noisy quantum data, such as noisy measurements or Hamiltonians that include noise
terms. An example approach is as follows, proceeding from the principle that we have known information about quantum systems that can be input into the algorithmic architecture (initial states, controls, even measurements) while we are trying to estimate unknown quantities (the noise profile). Intermediate inputs
would include the system and noise Hamiltonians H0 , H1 and/or the system and
noise unitaries U0 , U1 . Alternatively, inputs could also include details of the various
noise realisations. The type of inputs will depend on the type of applied use case,
such as how much information may be known about noise sources. Label data could
be the set of measurements {EO } (expectations of the observables). Given the inputs
(control pulses) and outputs, the problem becomes estimating the mapping {VO },
such that inputs are mapped to outputs via equation (3.5.11). Note that details
about noise realisations or distributions are never accessible experimentally.
Alternatively, architectures may take known information about the system such
as Pauli measurements as inputs or adopt a similar architecture to that in [82, 170]
and construct a multi-layered architecture that replicates the simulation, where the
{V̂O } are extracted from intermediate or custom layers in the architecture. Such
greybox approaches may combine traditional and deep-learning methods and have
the benefit of providing finer-grained control over algorithmic structure by allowing,
for example, the encoding of ‘whitebox’ or known processes from quantum physics
(thereby eliminating the need for the algorithm to learn these processes). Table
(3.4) sets out one example approach that may be adopted.
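By way of illustration, a minimal sketch of such an intermediate, extractable layer is given below (the layer name and the exact composition with the whitebox evolution are assumptions; the precise placement of V̂O follows equation (3.5.11)):

```python
# Hedged sketch of a greybox layer: V_O is held as trainable weights and can be
# extracted after training; the noise-free evolution U0 rho0 U0^dag is computed
# as a fixed ("whitebox") operation upstream of this layer.
import tensorflow as tf

class VOLayer(tf.keras.layers.Layer):
    """Trainable estimate of the V_O operator for a single observable O."""
    def __init__(self, dim, observable):
        super().__init__()
        self.re = self.add_weight(shape=(dim, dim), trainable=True)
        self.im = self.add_weight(shape=(dim, dim), trainable=True)
        self.O = tf.constant(observable, dtype=tf.complex64)

    def call(self, rho_T):
        # rho_T: batch of noise-free evolved states
        V = tf.complex(self.re, self.im)
        # predicted expectation tr(V_O rho(T) O), under the stated convention
        return tf.math.real(tf.linalg.trace(V @ rho_T @ self.O))
```

Training against the label expectations {EO } with an MSE loss then leaves V̂O recoverable directly from the layer weights.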
Blackbox approaches alone are unlikely to efficiently solve this problem, as they would require learning solutions to the Schrödinger equation from scratch. A more efficient approach may be to encode known informa-
tion, such as the laws governing Hamiltonian evolution etc into machine learning
architecture, such as greybox approaches described above. In this case, the target
fα (t) must be included as an intermediate input into the system Hamiltonians gov-
erning the evolution of ρ(t), yet remains the output of interest. In such approaches,
the input data would be the initial states of the QDataSet with the label data being
ρ(T ) (and label estimate ρ̂(T )). Applicable loss functions then seek to minimise
the (metric) distance between ρ(T ) and ρ̂(T ), such as fidelity F (ρ(T ), ρ̂(T )). To recover the sought-after sequence fα (t), the architecture then requires a way to access
the intermediate state of parameters representing fα (t) within the machine learning
architecture.
If path specificity is not important for a use case, then trained algorithms may
synthesise any pathway to achieve ρ̂(T ), subject to the optimisation constraints. The
trained algorithm need not replicate the pathways taken to reach ρ(T ) in the training
data. If path specificity is desirable, then the QDataSet intermediate operators U0 (t)
and U1 (t) can be used to reconstruct the intermediate states, i.e. to recover the time-independent approximation
$$\rho(t_j) = U(t_j)\,\rho(0)\,U(t_j)^\dagger,$$
where U (tj ) is the cumulative propagator reconstructed from the stored per-time-step unitaries.
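As a brief sketch (assuming the per-time-step unitaries are stored as an array, per the U0 item in Table (3.2)):

```python
# Hedged sketch: reconstruct intermediate states rho(t_j) = U_j rho0 U_j^dag
# from an array of stored per-time-step (cumulative) unitaries.
import numpy as np

def intermediate_states(U_steps, rho0):
    return np.array([U @ rho0 @ U.conj().T for U in U_steps])
```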
3.11 Discussion
In this work, we have presented the QDataSet, a large-scale quantum dataset avail-
able for the development and benchmarking of quantum machine learning algo-
rithms. The 52 datasets in the QDataSet comprise simulations of one- and two-qubit systems in a variety of noise-free and noisy contexts, together with a number of sce-
narios for exercising control. Large-scale datasets play an important role in classical
machine learning development, often being designed and assembled precisely for
the purpose of algorithm innovation. Despite its burgeoning status, QML lacks
such datasets designed specifically to facilitate QML algorithm development. The
QDataSet has been designed to address this need in the context of quantum control,
tomography and noise spectroscopy, by providing a resource for cross-collaboration
among machine learning practitioners, quantum information researchers and experimentalists working on applied quantum systems. In this work we have also ventured a number of principles which we hope will assist in producing large-scale datasets for QML, including specification of objectives, quantum data features, structuring and pre-processing. We set out a number of key desiderata for quantum datasets in general.
We also have aimed to provide sufficient background context across quantum theory,
machine learning and noise spectroscopy for machine learning practitioners to treat
the QDataSet as a point of entry into the field of QML. The QDataSet is suffi-
ciently versatile to enable machine learning researchers to deploy their own domain
expertise to design algorithms of direct use to experimental laboratories.
While designed specifically for problems in quantum control, tomography and
noise mitigation, the scope for the application of the QDataSet in QML research
is expansive. QML is an emerging cross-disciplinary field whose progression will
benefit from the establishment of taxonomies and standardised practices to guide
algorithm development. In this vein, we sketch below a number of proposals for the
future use of the QDataSet, building upon principles upon which the QDataSet was
designed, in order to foster the development of QML datasets and research practices.
2. Quantum taxonomies. While taxonomies within and across disciplines will dif-
fer and evolve, there is considerable scope for research programmes examining
optimal taxonomic structuring of quantum datasets for QML. In this work, we
have outlined a proposed skeleton taxonomy that datasets for QML may wish
to adopt or adapt, covering specification of objectives, ways in which data
is described, identification of training (in-sample) and test (out-of-sample)
data, data typing, structuring, completeness and visibility. Further research
in these directions could include expanding taxonomic classifications of QML
in ways that connect with classical machine learning taxonomies, taking the
QDataSet as an example. Doing so would facilitate greater cross-collaboration
among computer scientists and quantum researchers by allowing researchers
Item Description
simulation parameters name: name of the dataset;
dim: the dimension $2^n$ of the Hilbert space for n qubits (dimension 2 for a single qubit, 4 for two qubits);
Ω: the spectral energy gap;
static operators: a list of matrices representing the time-independent parts of the
Hamiltonian (i.e. drift components);
dynamic operators: a list of matrices representing the time-dependent parts of
the Hamiltonian (i.e. control components), without the pulses. So, if we have a
term f (t)σx + g(t)σy , this list will be [σx , σy ]. These dynamic operators are further distinguished (and labelled) according to whether they correspond to (i) undistorted pulses (labelled pulses) or (ii) distorted pulses (labelled distorted );
noise operators: a list of time-dependent parts of the Hamiltonian that are stochastic (i.e. noise components). So, if we have terms like β1 (t)σz + β2 (t)σy , the list will be [σz , σy ];
measurement operators: Pauli operators (including identity) (I, σx , σy , σz );
initial states: the six eigenstates of the Pauli operators;
T : total time (normalised to unity);
num ex : number of examples, set to 10,000;
batch size: size of batch used in data generation (default is 50);
K: number of randomised pulse sequences in Monte Carlo simulation of noise (set
to K = 2000);
noise profile: N0 to N6 (see above);
pulse shape: Gaussian or Square;
num pulses: number of pulses per interval;
elapsed time: time taken to generate the datasets.
pulse parameters The control pulse sequence parameters for the example:
Square pulses: Ak amplitude at time tk ;
Gaussian pulses: Ak (amplitude), µ (mean) and σ (standard deviation).
time range A sequence of time intervals ∆tj with j = 1, ..., M ;
pulses Time-domain waveform of the control pulse sequence.
distorted pulses Time-domain waveform of the distorted control pulse sequence (if there are no
distortions, the waveform will be identical to the undistorted pulses).
expectations The Pauli expectation values (18 or 52, depending on whether one or two qubits; see above). For each state, the order of measurement is: σx , σy , σz applied to the evolved initial states. As the quantum state evolves in time, the expectations will range within the interval [−1, 1].
VO operator The VO operators corresponding to the three Pauli observables, obtained by aver-
aging the operators WO over all noise realizations.
noise Time domain realisations of the relevant noise.
H0 The system Hamiltonian H0 (t) for time-step j.
H1 The noise Hamiltonian H1 (t) for each noise realization at time-step j.
U0 The system evolution matrix U0 (t) in the absence of noise at time-step j.
UI The interaction unitary UI (t) for each noise realization at time-step j.
VO Set of 3 × 2000 expectation values (measurements) of the three Pauli observables
for all possible states for each noise realization. For each state, the order of mea-
surement is: σx , σy , σz applied to the evolved initial states.
EO The expectations values (measurements) of the three Pauli observables for all
possible states averaged over all noise realizations. For each state, the order of
measurement is: σx , σy , σz applied to the evolved initial states.
Table 3.2: QDataSet characteristics. The left column identifies each item in the respective
QDataSet examples (expressed as keys in the relevant Python dictionary) while the description
column describes each item.
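For orientation, a short sketch of loading one example and inspecting these keys is given below (the archive and key names are illustrative; actual names follow the conventions in Table (3.2)):

```python
# Hedged sketch: load one compressed QDataSet example and inspect its keys.
import pickle
import zipfile

with zipfile.ZipFile("G_1q_X.zip") as z:        # illustrative file name
    with z.open(z.namelist()[0]) as f:
        data = pickle.load(f)

print(data.keys())                              # items per Table 3.2
print(data["pulses"].shape, data["expectations"].shape)
```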
Item Description
Objective Algorithm to learn characterisation of state ρ given measurements {EO }.
Inputs Set of Pauli measurements {EO }, one for each of the M experiments
Label Final state ρ(T )
Intermediate inputs Hamiltonians, Unitary operators, Initial states ρ0
Output Estimate of final state ρ̂(T )
Metric State fidelity F (ρ, ρ̂), Quantum relative entropy
Table 3.3: QDataSet features for quantum state tomography. The left column lists typical categories in a machine learning architecture. The right column describes the corresponding feature(s) of the QDataSet that would fall into such categories for the use of the QDataSet in training quantum tomography algorithms.
Item Description
Objective Algorithm to estimate noise operators {VO }, thereby characterising rele-
vant features of noise affecting quantum system.
Inputs Pulse sequence, reconstructed from the pulse parameters feature in the
dataset.
Label Set of measurements {EO }
Intermediate inputs Hamiltonians, Unitary operators, Initial states ρ0
Output Estimate of measurements {ÊO }
Metric MSE (between estimates and label data) M SE(EO , ÊO )
Table 3.4: QDataSet features for quantum noise spectroscopy. The left column lists typical categories in a machine learning architecture. The right column describes the corresponding feature(s) of the QDataSet that would fall into such categories for the use of the QDataSet in training quantum noise spectroscopy algorithms.
Item Description
Objective Algorithm to learn optimal sequence of controls to reach final state
ρ(T ) or (equivalently) synthesise target unitary UT .
Inputs Hamiltonians containing Pauli generators H0 (t)
Label Final state ρ(T ) and (possibly) intermediate states ρ(tj ) for each time-
interval tj .
Intermediate fixed inputs Sequence of unitary operators U0 (t), U1 (t), Initial states ρ0
Intermediate weights Sequence of pulses fα (t), including parameters depending on whether square or Gaussian (for example)
Table 3.5: QDataSet features for quantum control. The left column lists typical categories in a machine learning architecture. The right column describes the corresponding feature(s) of the QDataSet that would fall into such categories for the use of the QDataSet in training quantum control algorithms. The specifications are just one of a set of possible ways of framing quantum control problems using machine learning.
Parameter Value
T 1
M 1024
K 2000
Ω 12
Ω1 12
Ω2 10
n 5
Amin -100
Amax 100
σ T/(12M)
Table 3.7: Dataset Parameters: T : total time, set to unity for standardisation; M : the number of time-steps (discretisations); K: the number of noise realisations; Ω: the energy gap for the single-qubit case (subscripts 1 and 2 represent the energy gaps of each qubit in the two-qubit case); n: number of control pulses; Amax , Amin : maximum and minimum amplitude; σ: standard deviation of pulse spacing (for Gaussian pulses).
Figure 3.1: Plot of an undistorted (orange) pulse sequence against a related distorted (blue) pulse
sequence for the single-qubit Gaussian pulse dataset with x-axis control (‘G 1q X’) over the course
of the experimental runtime. Here f (t) is the functional (Gaussian) form of the pulse sequence for
time-steps t. These plots were used in the first step of the verification process for QDataSet. The
shift in pulse sequence is consistent with expected effects of distortion filters. The pulse sequences
for each dataset can be found in simulation parameters =⇒ dynamic operators =⇒ pulses
(undistorted) or distorted pulses for the distorted case (see Table (3.2) for a description of the
dataset characteristics).
Dataset Description
G 1q X (i) Qubits: one; (ii) Control: x-axis, Gaussian; (iii) Noise: none; (iv) No distor-
tion.
G 1q X D (i) Qubits: one; (ii) Control: x-axis, Gaussian; (iii) Noise: none; (iv) Distortion.
G 1q XY (i) Qubits: one; (ii) Control: x-axis and y-axis, Gaussian; (iii) Noise: none; (iv)
No distortion.
G 1q XY D (i) Qubits: one; (ii) Control: x-axis and y-axis, Gaussian; (iii) Noise: none; (iv)
Distortion.
G 1q XY XZ N1N5 (i) Qubits: one; (ii) Control: x-axis and y-axis, Gaussian; (iii) Noise: N1 on
x-axis, N5 on z-axis; (iv) No distortion.
G 1q XY XZ N1N5 D (i) Qubits: one; (ii) Control: x-axis and y-axis, Gaussian; (iii) Noise: N1 on
x-axis, N5 on z-axis; (iv) Distortion.
G 1q XY XZ N1N6 (i) Qubits: one; (ii) Control: x-axis and y-axis, Gaussian; (iii) Noise: N1 on
x-axis, N6 on z-axis; (iv) No distortion.
G 1q XY XZ N1N6 D (i) Qubits: one; (ii) Control: x-axis and y-axis, Gaussian; (iii) Noise: N1 on
x-axis, N6 on z-axis; (iv) Distortion.
G 1q XY XZ N3N6 (i) Qubits: one; (ii) Control: x-axis and y-axis, Gaussian; (iii) Noise: N3 on
x-axis, N6 on z-axis; (iv) No distortion.
G 1q XY XZ N3N6 D (i) Qubits: one; (ii) Control: x-axis and y-axis, Gaussian; (iii) Noise: N3 on
x-axis, N6 on z-axis; (iv) Distortion.
G 1q X Z N1 (i) Qubits: one; (ii) Control: x-axis, Gaussian; (iii) Noise: N1 on z-axis; (iv)
No distortion.
G 1q X Z N1 D (i) Qubits: one; (ii) Control: x-axis, Gaussian; (iii) Noise: N1 on z-axis; (iv)
Distortion.
G 1q X Z N2 (i) Qubits: one; (ii) Control: x-axis, Gaussian; (iii) Noise: N2 on z-axis; (iv)
No distortion.
G 1q X Z N2 D (i) Qubits: one; (ii) Control: x-axis, Gaussian; (iii) Noise: N2 on z-axis; (iv)
Distortion.
G 1q X Z N3 (i) Qubits: one; (ii) Control: x-axis, Gaussian; (iii) Noise: N3 on z-axis; (iv)
No distortion.
G 1q X Z N3 D (i) Qubits: one; (ii) Control: x-axis, Gaussian; (iii) Noise: N3 on z-axis; (iv)
Distortion.
G 1q X Z N4 (i) Qubits: one; (ii) Control: x-axis, Gaussian; (iii) Noise: N4 on z-axis; (iv)
No distortion.
G 1q X Z N4 D (i) Qubits: one; (ii) Control: x-axis, Gaussian; (iii) Noise: N4 on z-axis; (iv)
Distortion.
G 2q IX-XI IZ-ZI N1-N6 (i) Qubits: two; (ii) Control: x-axis on both qubits, Gaussian; (iii) Noise: N1
and N6 z-axis on each qubit; (iv) No distortion.
G 2q IX-XI IZ-ZI N1-N6 D (i) Qubits: two; (ii) Control: x-axis on both qubits, Gaussian; (iii) Noise: N1
and N6 z-axis on each qubit; (iv) Distortion.
G 2q IX-XI-XX (i) Qubits: two; (ii) Control: single x-axis control on both qubits and x-axis
interacting control, Gaussian; (iii) Noise: none; (iv) No distortion.
G 2q IX-XI-XX D (i) Qubits: two; (ii) Control: single x-axis control on both qubits and x-axis
interacting control, Gaussian; (iii) Noise: none; (iv) Distortion.
G 2q IX-XI-XX IZ-ZI N1-N5 (i) Qubits: two; (ii) Control: single x-axis control on both qubits and x-axis
interacting control, Gaussian; (iii) Noise: N1 and N5 on z-axis noise on each
qubit; (iv) No distortion.
G 2q IX-XI-XX IZ-ZI N1-N5 D (i) Qubits: two; (ii) Control: single x-axis control on both qubits and x-axis
interacting control, Gaussian; (iii) Noise: N1 and N5 on z-axis noise on each
qubit; (iv) Distortion.
Table 3.8: QDataSet File Description (Gaussian). The left column identifies each dataset in the
respective QDataSet examples while the description column describes the profile of the Gaussian
pulse datasets in terms of (i) number of qubits, (ii) axis of control and pulse wave-form (iii) axis
and type of noise and (iv) whether distortion is present or absent.
Dataset Description
S 1q X (i) Qubits: one; (ii) Control: x-axis, square; (iii) Noise: none; (iv) No distor-
tion.
S 1q X D (i) Qubits: one; (ii) Control: x-axis, square; (iii) Noise: none; (iv) Distortion.
S 1q XY (i) Qubits: one; (ii) Control: x-axis and y-axis, square; (iii) Noise: none; (iv)
No distortion.
S 1q XY D (i) Qubits: one; (ii) Control: x-axis and y-axis, square; (iii) Noise: none; (iv)
Distortion.
S 1q XY XZ N1N5 (i) Qubits: one; (ii) Control: x-axis and y-axis, square; (iii) Noise: N1 on
x-axis, N5 on z-axis; (iv) No distortion.
S 1q XY XZ N1N5 D (i) Qubits: one; (ii) Control: x-axis and y-axis, square; (iii) Noise: N1 on
x-axis, N5 on z-axis; (iv) Distortion.
S 1q XY XZ N1N6 (i) Qubits: one; (ii) Control: x-axis and y-axis, square; (iii) Noise: N1 on
x-axis, N6 on z-axis; (iv) No distortion.
S 1q XY XZ N1N6 D (i) Qubits: one; (ii) Control: x-axis and y-axis, square; (iii) Noise: N1 on
x-axis, N6 on z-axis; (iv) Distortion.
S 1q XY XZ N3N6 (i) Qubits: one; (ii) Control: x-axis and y-axis, square; (iii) Noise: N3 on
x-axis, N6 on z-axis; (iv) No distortion.
S 1q XY XZ N3N6 D (i) Qubits: one; (ii) Control: x-axis and y-axis, square; (iii) Noise: N3 on
x-axis, N6 on z-axis; (iv) Distortion.
S 1q X Z N1 (i) Qubits: one; (ii) Control: x-axis, square; (iii) Noise: N1 on z-axis; (iv) No
distortion.
S 1q X Z N1 D (i) Qubits: one; (ii) Control: x-axis, square; (iii) Noise: N1 on z-axis; (iv)
Distortion.
S 1q X Z N2 (i) Qubits: one; (ii) Control: x-axis, square; (iii) Noise: N2 on z-axis; (iv) No
distortion.
S 1q X Z N2 D (i) Qubits: one; (ii) Control: x-axis, square; (iii) Noise: N2 on z-axis; (iv)
Distortion.
S 1q X Z N3 (i) Qubits: one; (ii) Control: x-axis, square; (iii) Noise: N3 on z-axis; (iv) No
distortion.
S 1q X Z N3 D (i) Qubits: one; (ii) Control: x-axis, square; (iii) Noise: N3 on z-axis; (iv)
Distortion.
S 1q X Z N4 (i) Qubits: one; (ii) Control: x-axis, square; (iii) Noise: N4 on z-axis; (iv) No
distortion.
S 1q X Z N4 D (i) Qubits: one; (ii) Control: x-axis, square; (iii) Noise: N4 on z-axis; (iv)
Distortion.
S 2q IX-XI IZ-ZI N1-N6 (i) Qubits: two; (ii) Control: x-axis on both qubits, square; (iii) Noise: N1
and N6 z-axis on each qubit; (iv) No distortion.
S 2q IX-XI IZ-ZI N1-N6 D (i) Qubits: two; (ii) Control: x-axis on both qubits, square; (iii) Noise: N1
and N6 z-axis on each qubit; (iv) Distortion.
S 2q IX-XI-XX (i) Qubits: two; (ii) Control: single x-axis control on both qubits and x-axis
interacting control, square; (iii) Noise: none; (iv) No distortion.
S 2q IX-XI-XX D (i) Qubits: two; (ii) Control: single x-axis control on both qubits and x-axis
interacting control, square; (iii) Noise: none; (iv) Distortion.
S 2q IX-XI-XX IZ-ZI N1-N5 (i) Qubits: two; (ii) Control: x-axis on both qubits and x-axis interacting
control, square; (iii) Noise: N1 and N5 z-axis on each qubit; (iv) No distortion.
S 2q IX-XI-XX IZ-ZI N1-N5 D (i) Qubits: two; (ii) Control: x-axis on both qubits and x-axis interacting
control, square; (iii) Noise: N1 and N5 z-axis on each qubit; (iv) Distortion.
S 2q IX-XI-XX IZ-ZI N1-N6 (i) Qubits: two; (ii) Control: x-axis on both qubits and x-axis interacting
control, square; (iii) Noise: N1 and N6 z-axis on each qubit; (iv) No distortion.
S 2q IX-XI-XX IZ-ZI N1-N6 D (i) Qubits: two; (ii) Control: x-axis on both qubits and x-axis interacting
control, square; (iii) Noise: N1 and N6 z-axis on each qubit; (iv) Distortion.
Table 3.9: QDataSet File Description (Square). The left column identifies each dataset in the
respective QDataSet examples while the description column describes the profile of the square
pulse datasets in terms of (i) number of qubits, (ii) axis of control and pulse wave-form (iii) axis
and type of noise and (iv) whether distortion is present or absent.
Item Description
Quantum states Description of states in the computational basis, usually represented as a vector or matrix (for ρ). May include initial and evolved (intermediate or final) states.
Measurement operators Measurement operators used to generate measurements; description of the POVM.
Measurement distribution Distribution of measurement outcomes of the measurement operators, either the individual measurement outcomes or some average (the QDataSet is an average over noise realisations).
Hamiltonians Description of Hamiltonians, which may include system, drift and environment Hamiltonians. Hamiltonians should also include relevant control functions (if applicable).
Gates and operators Descriptions of gate sequences (circuits) in terms of unitaries (or other operators). The representation of circuits will vary depending on the datasets and use case, but ideally quantum circuits should be represented in a way easily translatable across common quantum programming languages and integrable into common machine learning platforms (e.g. TensorFlow, PyTorch).
Noise Description of noise, either via measurement statistics, known features of noise, or device specifications.
Controls Specification and description of the controls available to act on the quantum system.
Table 3.10: An example of the types of quantum data features which may be included in a dedicated
large-scale dataset for QML. The choice of such features will depend on the particular objectives
in question. We include a range of quantum data in the QDataSet, including information about
quantum states, measurement operators and measurement statistics, Hamiltonians and their cor-
responding gates, details of environmental noise and controls.
Figure 3.2: The frequency response (left) and the phase response (right) of the filter that is used to
simulate distortions of the control pulses. The frequency is in units of Hz, and the phase response
is in units of rad.
Figure 3.3: Plot of the average observable (measurement) value for all observables (the index indicates each observable in order of Pauli measurements) for samples drawn from dataset G 1q X (using TensorFlow ‘tf’, orange line) against the same mean for equivalent simulations in QuTiP (blue line). Each dataset was sampled and compared against QuTiP, with equivalent results. The error between means was of order 10−6 , i.e. the two were effectively identical, so the blue line is hidden beneath the orange due to exact overlap.
Figure 3.4: An example of a quantum state rotation on the Bloch sphere. The |0⟩ , |1⟩ indicate the σz -axis, and X and Y the σx and σy axes respectively. In (a), the vector resides in the +1 σx eigenstate. By rotating about the σz axis by π/4, the vector is rotated to the right, towards the +1 σy eigenstate. A rotation about the σz axis by angle θ is equivalent to the application of the unitary U (θ) = exp(−iθσz /2).
        QN−j ← P̄
    end for
    P ← Concatenate(P, Q)
    β ← Re{ifft(P)}
    return β
end function

function Simulate(ρ, O, T, M, fx, fy, fz, SX, SY, SZ)
    δ ← T/M
    E ← 0
    for k ← 0, K − 1 do
        βx ← GenerateNoise(SX, T, M)
        βy ← GenerateNoise(SY, T, M)
        βz ← GenerateNoise(SZ, T, M)
        for j ← 0, M − 1 do
            t ← (0.5 + j)δ
            Hj ← (1/2)(Ω + βz(t))σz + (1/2)(fx(t) + βx(t))σx + (1/2)(fy(t) + βy(t))σy
        end for
        U ← Evolve(H, δ)
        E ← E + tr(UρU†O)
    end for
    E ← E/K
    return E
end function
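A NumPy rendering of the Simulate routine is sketched below (an assumption-laden sketch, not the QDataSet generation code: the noise generator is passed as a callable returning M time-domain samples, and Evolve is realised as the ordered product of per-step matrix exponentials):

```python
# Hedged sketch of Simulate: Monte Carlo average of tr(U rho U^dag O) over
# K noise realisations, with piecewise-constant evolution per time-step.
import numpy as np
from scipy.linalg import expm

sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)

def simulate(rho, O, T, M, K, Omega, fx, fy, noise):
    delta = T / M
    E = 0.0
    for _ in range(K):
        bx, by, bz = noise(T, M), noise(T, M), noise(T, M)  # time-domain samples
        U = np.eye(2, dtype=complex)
        for j in range(M):
            t = (0.5 + j) * delta
            H = 0.5 * ((Omega + bz[j]) * sz
                       + (fx(t) + bx[j]) * sx
                       + (fy(t) + by[j]) * sy)
            U = expm(-1j * H * delta) @ U   # ordered product of exponentials
        E += np.real(np.trace(U @ rho @ U.conj().T @ O))
    return E / K
```

For example, passing a zero-noise callable such as lambda T, M: np.zeros(M), with the Table (3.7) parameters T = 1, M = 1024, Ω = 12, recovers the noise-free expectation.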
Chapter 4
Quantum Geometric Machine Learning
4.1 Abstract
4.2 Introduction
4.3 Preliminaries
4.3.3 Structure
The structure of this Chapter is as follows. Part 4.4 provides an overview of key
quantum control concepts and literature relevant to our experiments. It draws upon
material explicated in more detail in supplementary Appendices below, particularly
sections A.2, B.2 and C.5 and examines the formulation of quantum control problems
geometrically in terms of Lie groups and differential geometry. It also explores
seminal expositions from Nielsen et al. [181] in which time-optimal quantum circuit
synthesis problems are framed in terms of generating approximate geodesics along
relevant group manifolds.
4.4.1 Overview
The necessity of quantum control for various quantum information and computa-
tion programmes globally has seen the emergent application of classical geometric
control in an effort to solve threshold problems such as how to synthesise time
optimal circuits [23, 24, 59]. Nearly two decades ago, developments in applied quan-
tum control [21, 22, 175] spurred the use of geometric tools to assist in solving
optimisation problems in quantum information processing contexts such as applied
NMR [21, 22, 152, 175]. Related work also explored the use of Lie theoretic, geo-
metric and analytic techniques for controllability of spin particles [182]. Since that
time, the connections between geometry and quantum control/information processing have deepened across cross-disciplinary fields, via the explication of transformations that enable problems in one field, in this case quantum control optimisation objectives (such as minimising controls for synthesis or reachable targets), to be translated into another, namely the language of differential geometry. Of particular note, Nielsen et al. [183]
demonstrated that calculating quantum gate complexity could be framed in terms
of a distance-minimisation problem in the context of Riemannian manifolds. In
that work, upper and lower bounds on quantum gate complexity, relating to the
optimal control cost in synthesising an arbitrary unitary UT ∈ SU (2n ), were shown
to be equivalent to the geometric challenge of finding minimal distances on certain
Riemannian manifolds (section C.2), subRiemannian (section C.4) and Finslerian
manifolds. Subsequently, geometric techniques were utilised [184, 185] to find a
lower bound on the minimal number of unitary gates required to exactly synthesise
UT , thereby specifying a lower bound on the number of gates required to implement
a target unitary channel.
Codebase: https://github.com/eperrier/quant-geom-machine-learning.
Research into quantum control [17,186] and geometric circuit synthesis [185,187,
188] has built upon results regarding the use of geometric techniques in quantum con-
trol settings. Of interest to researchers at the intersection of geometric and machine
learning approaches for quantum circuit synthesis, and the focus of this Chapter, is
a technique developed in [53,54] that combines subRiemannian geometric techniques
with deep learning in order to approximate normal subRiemannian geodesics (defi-
nition C.4.3) for synthesis of time-optimal or nearly-time optimal quantum circuits.
Our results present improved machine learning architectures tailored to learning
such approximate geodesics.
The affinity between quantum control methods and geometric methods arises from many sources within the literature. One fundamental reason is the intimate connection between Lie algebraic formulations of control problems (in classical and quantum settings) on the one hand, and the differential-geometric formulations of Lie theory on the other (as covered in sections C.5 and A.1.8). In typical Lie
theoretic approaches to quantum control problems [15, 23, 46, 61] such as synthesis
of quantum circuits, the quantum unitary of interest U is drawn from a Lie group G
(definition B.2.1). A feature of Lie groups is that they are mathematical structures
that are at once groups but also differentiable manifolds (definition C.1.2), topolog-
ical structures equipped with sufficient geometric and analytical structure to enable
analytic machinery, such as the tools of differential geometry, to be applied to their
study [51].
A typical formulation of control problems in such Lie theoretic terms takes a
target unitary UT to be an element of a Lie group, such as SU (2n ), represented
as a manifold. Associated with the underlying Lie group G is a Lie algebra g (definition B.2.6), say su(2n ), comprising the generators of the underlying Lie group of interest. The Lie algebra g is related to the Lie group G (via the exponential map) such that it both generates the group action of G and allows the symmetry properties of G to be studied by querying the algebra g itself (see definition B.2.8).
Quantum control objectives can then be characterised as attempts to synthesise
a target unitary propagator [21] belonging to such a Lie group G via application
of generators belonging to g in a controlled manner. In the simplest (noise-free)
non-relativistic settings, computation is effected via evolution from U (0) = I to UT .
The drift part of the Hamiltonian represents the (directly) ‘uncontrollable’ aspect
of evolution (and is discussed in more detail below), while the control Hamiltonians
represent evolution generated by those elements (generators) of the quantum system
which are controllable (see definition A.2.1), namely the generators of a Lie algebra of
interest, such as, in the case of qubit systems, generalised Pauli operators (equation
(3.5.17)).
The Hc terms represent the control Hamiltonians:
$$H_c(t) = \sum_{k=1}^{m} v_k(t)\,\tau_k. \tag{4.4.3}$$
Figure 4.1: Sketch of geodesic path. The evolution of quantum states is represented by the evolution
according to Schrödinger’s equation of unitary propagators U as curves (black line) on a manifold
U ∈ G generated by generators (tangent vectors) (blue) in the time-dependent case (4.4.2). For
the time-independent case, the geodesic is approximated by evolution of discrete unitaries for time
∆t, represented by red curves (shown as linear for ease of comprehension). Here Uti represents the
evolved unitary at time ti .
One of the motivations for the use of machine learning in quantum control problems
is precisely their potential utility in learning a sufficient approximation of control
functions needed to achieve quantum control objectives, including in the presence
of noise [95].
That is, the unitary propagator at time tj (from the identity) is the cumulative reverse product (forward-solved cumulant) of a sequence of Uj :
$$U(t_j) = U_j\,U_{j-1}\cdots U_1, \qquad U_j = \exp(-i H_j \Delta t_j),$$
where ∆tj = ∆t = T /N . This approximation is considered appropriate where ∆t is small by comparison to the total evolution time T (or equivalently total energy) (and resulting cumulative errors from the product of such unitaries are sufficiently small), and is an approximation adopted in our experiments detailed below.
Here, Hd,j = Hd (tj ) designates the drift (or internal) part of the Hamiltonian at
time-step tj and similarly for Hc,j . In the discretised approximation, the control
functions vk,j above now represent the amplitude (energy) to be applied at time-step
tj for time ∆t (duration) and typically correspond, for example, to the application
of certain voltages or magnetic fields for a certain period of time. The functional
form of the controls vk,j can vary, with common (idealised) representations including
Gaussian or ‘square’ pulses.
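For concreteness, minimal parametrisations of the two waveforms (parameter names follow the pulse parameters items in Table (3.2); a sketch, not the generation code) might be:

```python
# Hedged sketch of the two idealised pulse parametrisations described above.
import numpy as np

def square_pulse(t, A_k, t_k, width):
    """Constant amplitude A_k inside a window of the given width around t_k."""
    return A_k * (np.abs(t - t_k) < width / 2)

def gaussian_pulse(t, A_k, mu, sigma):
    """Gaussian pulse with amplitude A_k, mean mu and standard deviation sigma."""
    return A_k * np.exp(-((t - mu) ** 2) / (2 * sigma ** 2))
```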
The objective of time optimal control is then to select the set of controls to be
applied (when using a discretised approximation) at time tj for time ∆tj in order to
synthesise UT in the shortest amount of total time. Such geometric approaches in-
volve reparametrisation of quantum circuits, which are discrete, as approximations
to geodesics on Lie group manifolds of interest to quantum information process-
ing [183–185]. A schema illustrating the application of discretised time-independent
unitaries in order to approximate the geodesic quantum circuits is presented above
in Figure 4.1. It is the desire to solve this optimisation problem that motivates geo-
metric recharacterisation of problems in quantum information, such as determining
and solving geodesic equations of motion. It should be noted that in practice the
properties or characteristics of UT may be known with greater certainty than each
Uj . In this work, we assume the existence of a measurement (and indeed tomo-
graphic) process by which Uj may be sufficiently reconstructed such that knowledge
about Uj is accessible.
The adaptation of geometric and variational methods for solving optimisation problems in quantum information processing is characterised in terms of minimising the length of curves along Lie group manifolds G. In geometric contexts,
equation (4.4.9) above can be expressed as horizontal curves (see definition C.5.3)
using equation (C.5.38) where U (t) ∼ γ(t) i.e. as a point in the Lie group manifold
G ∼ M.
Doing so requires selection of a (subRiemannian or Riemannian) metric (defi-
nition C.2.5) (for use in a cost functional) that intuitively measures the distance
(or arc length i.e. using equation (C.2.8)) between elements in the associated Lie
algebra g which, in geometric terms, are represented by tangent vectors belonging
to the associated tangent space T G (see section C.1.5 for discussion of tangent bun-
dle and Lie algebra correspondence). Cost-functionals (see equation (C.5.13)) are
essentially analogous to variational functional equations such that:
b
dxα dxβ
Z
C= gαβ (4.4.10)
a dt dt
where gαβ represents the (in the most general case, not necessarily constant) metric tensor (definition C.2.5), and dx/dt represents the differential of Lie group elements x ∈ G with respect to the unique single parametrisation (i.e. time). Solving the optimi-
sation problem of interest, such as synthesising a circuit in minimal time or with
minimal energy, becomes a question of minimising the cost function according to the
Pontryagin Maximum Principle (see section C.5.4.2). Variational methods in this
approach set δC = 0 and consequently use standard techniques from variational
calculus to derive respective equations of motion, differential equations (see section
C.5.4). The solutions (usually) take the form of exponentiated Lie algebraic ele-
ments i.e. unitary propagators, which ultimately minimise the cost functional and
solve the underlying optimisation problem.
It is worth explicating the form of cost functionals for quantum information practitioners who may be less familiar with geometric methods. In continuous form, the cost functional is
$$C_f = \int_a^b f(H(t))\,dt \tag{4.4.11}$$
while in the discretised case we essentially replace the integral with a sum over the various per-segment Hamiltonians; here f represents the control function(s) applicable to the
Hamiltonian H(t). By selecting the appropriate parametrisation of curves on the
manifold (such as a typical parametrisation by arc-length e.g. equation (C.5.3)),
distance along a curve (representing evolution from one unitary, such as the iden-
tity, to another) can be equated to minimal time required to evolve (and synthesise)
a target unitary UT ∈ G of interest. In cases where there are multiple curves between two points, one must select the minimal path over all such paths [181]
consistent with existence theorems regarding subRiemannian geodesics (see discus-
sion of Chow’s theorem in section C.4.1). Because minimising the cost functional
depends itself upon solutions (unitaries) which are themselves generated by Lie al-
gebraic elements subject to control functions, the optimisation problem of quantum
control thus becomes a problem of identifying the optimal set (sequence) of control
functions to be applied over time in order to minimise the cost functional. Note that
the cost functional above in terms of arc-length is different from the cost functional
using fidelity (equation (4.6.3)) as part of our machine learning architecture below.
Applying standard techniques from the calculus of variations (e.g. the Pontryagin Maximum Principle [189] (section C.5.4.2)) with respect to the cost functional results in the geodesic equation of motion [181, 184] (definition C.1.35), which specifies the path that minimises the action and which (for constant-metric Riemannian manifolds) is typically given by equation (C.2.1):
$$\frac{d^2 x^j}{dt^2} + \Gamma^j_{kl}\,\frac{dx^k}{dt}\frac{dx^l}{dt} = 0 \tag{4.4.13}$$
where x = x(t) ∈ G are the unitary group elements while dx/dt ∈ g represent the differential operators (tangent vectors/generators) of the associated Lie algebra. Also, in (4.4.13) it is implied that the applicable metric gαβ is itself constant across the manifold (which may not always be the case). The Γjkl are Christoffel symbols obtained by variation with respect to the metric. Given a small arc along
a geodesic on a Riemannian manifold, the remainder of the geodesic path is com-
pletely determined by the geodesic equation. Solutions to the geodesic equation
are, in the continuous case, horizontal curves (definition C.5.3) and in the discrete
case approximations to curves, on the manifold of interest. Such curves are inter-
pretable in terms of quantum circuits which are time-optimal when such geodesics
also represent the minimal distance curve linking two unitary group elements on
a manifold. In this way, variational methods leveraging geometric techniques and
characterisation may be utilised for synthesising quantum circuits.
Minimising cost functionals in the way described above involves understanding what
in classical control theory is described as the set of accessible controls available.
Those unitaries which may be synthesised via application of the controls are termed
reachable targets, that is reachable via application of the controls given the gen-
erators (see definition C.5.4). Designing appropriate machine learning algorithms
using geometric methods or otherwise thus requires information on the form of con-
trol function and generators that are available to reach a desired target, such as a
target unitary or quantum state. For a given Lie group G, access to the entire set
of generators g renders any element U ∈ G reachable. In quantum control settings,
access to the full Lie algebraic array of generators occasionally renders the problem of unitary synthesis, i.e. finding the sequence of generators and control pulses, analytically or even trivially obtainable using geometric means, such as Euler decompositions where
G = SU (2) [190]. In certain cases (such as those explored below), we are constrained
or seek to synthesise target unitaries UT using only a subset of the relevant Lie alge-
bra, a subset named the control set (or control subset) p ⊂ g with a decomposition g = p ⊕ k (note we denote the control subset as a set rather than a subalgebra since, where such a decomposition is a Cartan decomposition, the non-closure of p under the bracket, i.e. that [p, p] ⊆ k, results in p not being a subalgebra of g). In such cases, the full set of generators is not directly accessible. However, one may still be able to reach
the target unitary of interest if the elements of p may be combined (by operation
of the Lie bracket or Lie derivative, as discussed below) in order to generate the remaining generators belonging to g, thus providing access to g in its entirety. This will be the case when the control subset p satisfies the Lie triple property (equation (C.5.28)) [[p, p], p] ⊆ p. We distinguish such cases by denoting the first case as a case
of directly accessible controls, while the second case represents indirectly accessible
controls.
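Whether a candidate control subset is bracket-generating can be checked numerically; a small sketch (assuming matrix representations of the generators, with illustrative function names) is:

```python
# Hedged sketch: compute the dimension of the Lie closure of a set of matrix
# generators under repeated commutators; compare with dim su(2^n) = 4^n - 1.
import itertools
import numpy as np

def comm(a, b):
    return a @ b - b @ a

def lie_closure_dim(gens, tol=1e-9):
    basis = []
    def add(x):
        # Gram-Schmidt against the current basis (trace inner product)
        for b in basis:
            x = x - (np.trace(b.conj().T @ x) / np.trace(b.conj().T @ b)) * b
        if np.linalg.norm(x) > tol:
            basis.append(x)
            return True
        return False
    for g in gens:
        add(np.array(g, dtype=complex))
    grew = True
    while grew:
        grew = False
        for a, b in itertools.combinations(list(basis), 2):
            if add(comm(a, b)):
                grew = True
    return len(basis)
```

For example, with σx and σy as inputs, lie_closure_dim returns 3 (their commutator yields σz), illustrating that {σx , σy } is indirectly accessible for su(2) in the sense above.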
Returning to the quantum control paradigm (equation 4.4.9), the drift Hamil-
tonian Hd represents the evolution of a quantum system which cannot be directly
controlled. It may represent a noise term or the interaction of a system with an
environment in open quantum systems’ formulations. Where a control subset p ⊂ g
represents only a subset of the relevant Lie algebra, we can think of the complement k = p⊥ (where g = k ⊕ p) as the generators from which the drift term Hd above, or at least elements of it, is composed (noting that, for example, in open quantum systems or non-unitary evolutions, generators are not necessarily Lie algebraic in character), i.e. that Hd ∈ k.
The interaction between the drift Hd (named due to its origins in fluid dynamics)
and control Hj Hamiltonians depends on the set of such accessible controls avail-
able to solve the quantum control problem of interest. The application of control
Hamiltonians in this case represents, in effect, an attempt to ‘steer’ a system evolv-
ing according to Hd towards a desired target via the adjoint action of Lie group
elements generated by p (see [21] for a discussion).
Understanding the nature of relevant control algebras and the composition of
drift Hamiltonians is an important consideration when designing and implement-
ing machine learning architectures for geometric quantum control, including recent
novel approaches applying machine learning for modelling and control of a reconfig-
urable photonic circuit [95] and to learn characteristics of Hd via quantum feature
engineering [158]. One of the motivations of the present work is to demonstrate the
utility of being able to encode prior information about the relevant control subset
into machine learning protocols whose objective is the output of a time-optimal se-
quence of control pulses, a design choice that requires information about precisely
what generators are accessible.
Selecting the specific control subset and set of control amplitudes in order to generate
time-optimal quantum circuits is a difficult task. Solving this optimisation problem
in quantum control and quantum circuit literature using geometric techniques fol-
lows two broad directions which synthesise results from geometric control theory
(see section C.5 and [23] for a comprehensive review). One such approach uses sym-
metric space formalism and Cartan decompositions [20, 21, 175] to decompose the
Lie algebra g associated with a given Lie group G into symmetric and antisymmetric
subspaces such that g = p ⊕ k (see section B.5 and [9] generally). Here p is the con-
trol subset (containing accessible generators) and k is the subalgebra generating the
non-directly controllable evolution of the system. If a suitable partition can be found
satisfying certain Levi (Cartan) commutation relations (see [186,191]) set out in def-
inition B.5.2, then the Lie group can be decomposed into a Cartan decomposition
G = KAK. By doing so, the problem of selecting the appropriate set of genera-
tors τ ∈ p and control amplitudes is simplified (see section (4.11.1) for a discussion
and [175, 186] in particular). A drawback of such methods as currently applied to
problems in quantum control is their limited scope of application, namely that such
methods apply only to limited symmetric space manifolds for which the methods
were developed. Furthermore, the particular methods in [175] used to determine
the appropriate generators are limited in their generality. Chapter 5 presents novel
results that seek to, for certain classes of symmetric space control, address some of
the challenges in using such methods.
An alternative, but ultimately related, method explored by Nielsen et al. in a
range of papers [181,183–185] approaches the problem of finding optimal generators
and controls via modifying metrics applicable to cost functionals. In [181], geometric
techniques are applied to determine the minimal size circuit to exactly implement a
specific n-qubit unitary operation, combining variational and geometric techniques from Riemannian geometry (discussed in detail in section C.2.1), with the paper
detailing a method for determining the lower bound of circuit complexity and circuit
size by reference to the length of the local minimal geodesic between UT and I
(where length is determined via a Finsler metric on su(2n )). In later work [183–185],
particular metrics with penalty terms are chosen that add higher-weights to higher
order Pauli operators in order to steer the generating set towards one- and two-body
operators which are assessed as being optimal for geodesic synthesis (see Appendix
(4.11.4) for a discussion and section D.3.2 for a discussion of penalty metrics and
regularisation generally). It is shown that in limiting cases applying the variational
techniques and penalty metric of Nielsen et al., the optimal set of generators are
one- and two-body terms [192].
4.5.1 Overview
The difficulties of synthesising geodesics are well-known throughout the geometric and control literature [194] (see section C.5). The geodesically-driven control methods considered here rely upon the partition of tangent directions into horizontal and vertical subspaces: horizontal tangent vectors X are those parallel-transported along the curve γ(t), satisfying
$$\nabla_{\gamma(t)} X = 0.$$
By contrast, evolution of curves tangent to certain directions of tangent vectors in k (that is, along the fibres (section C.1.6)) is not directly possible. This set of directly inaccessible generators k is orthogonal to the set p of horizontal tangent vectors and is described as vertical (see definition C.1.25). More formally, the vertical subspace Vp M of T M comprises vectors X whose evolution along the curve γ(t) is such that ∇γ(t) X ̸= 0 (having some component not tangent to the manifold). In this second case, the
manifold is characterisable as subRiemannian rather than Riemannian. Elements of
the vertical subspace k may still affect the evolution of curves, but only indirectly
to the extent the generators in k are able to be generated by the application of the Lie bracket via the BCH formula (see equation (4.5.5) below and definition B.2.18), i.e. if the distribution is bracket-generating. A number of theorems of
subRiemannian geometry [50] then guarantee the existence and uniqueness of certain
normal subRiemannian geodesics on G which are both unique and minimal in length.
Thus, for generating circuits on G = SU (2n ), by constructing a distribution ∆
that is bracket-generating and comprising only one- and two-body generators, it can
be shown [53] that normal subRiemannian geodesics may be generated which are
minimal and unique, thus approximating the minimal circuits between I and UT . In
the next section, we detail the approach in [54] that leverages such subRiemannian
geometric insights. We do so in order to provide insight into the subRiemannian
machine learning detailed below.
$$U_T \approx U_n \cdots U_1 \approx E(c) = \prod_{j=1}^{n} \underbrace{\exp\left(\sum_{k=1}^{m} h\,c_j^k\,\tau_k\right)}_{U_j} \tag{4.5.1}$$

$$U_j(\Delta t) = \exp\left(\sum_{k=1}^{m} c_j^k\,\tau_k\,\Delta t\right) = \exp(H_j\,\Delta t) \tag{4.5.2}$$
where we have absorbed the imaginary unit −i into the generators. Here the Uj are referred to as (right-multiplicative or right-acting) subunitaries for convenience, again justifiable in the large m, small h limit where h = ∆t, the evolution time of each Uj . The terms ckj := ckj (t) represent the amplitudes of the (square) control pulses applied to the k generators at time interval tj for duration ∆tj = h to generate the unitary Uj (i.e. j indexes the segment, k indexes the control amplitude ck paired with the generator τk ). For notational clarity, in sections A.2 and C.5 the set of cj (t) are denoted uj (t) := (ukj (t)) ∈ U ⊂ Rm , j = 1, ..., m (as per section C.5.4.1).
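As a simple sketch of equation (4.5.2) (with the convention, noted above, that −i is absorbed into the generators; function and argument names are illustrative):

```python
# Hedged sketch: build a subunitary U_j from control amplitudes c_j and
# generators taus (which, per the text, already absorb the factor -i).
import numpy as np
from scipy.linalg import expm

def subunitary(c_j, taus, dt):
    H_j = sum(c * tau for c, tau in zip(c_j, taus))
    return expm(H_j * dt)
```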
The method in [54] in effect becomes a ‘bang-bang control’ problem [147, 200] in
which the time-dependent Schrödinger equation is approximated by a sequence of
time-independent solutions Uj where control Hamiltonians Hj are applied via the
application of a constant amplitude ckj for discrete time interval ∆t = h = T /N (with
N the number of segments). The term E(c) represents an embedding function that maps controls from C n×m into the Lie group manifold, as per equation (4.5.1), with the set of coefficients c ∈ C. The generators τi form a basis for the bracket-generating subset ∆ ⊂ su(2n ) of dimension m. The Hamiltonian that generates each
subunitary Uj is the linear sum of m controls applied to m generators. By compari-
son with the conventional control setting described above (4.4.9), the coefficients ckj
correspond to vk,j .
Because ∆ constitutes a generating set for the entire Lie algebra su(2n ), which in turn acts as the generator of its associated Lie group SU(2n ), an arbitrary unitary U ∈ SU(2n ) can in principle be obtained to arbitrary precision with sufficiently many products of exponentials. This results from the application of the Baker–Campbell–Hausdorff (BCH) theorem (see definition B.2.18 and [181] for a generalised explication), namely that:
$$\exp(A)\exp(B) = \exp\!\left(A + B + \tfrac{1}{2}[A, B] + \dots\right). \tag{4.5.5}$$
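A quick numerical check of (4.5.5) for small generators (magnitudes chosen for illustration only) is:

```python
# Hedged sketch: for small A, B the truncation of the BCH expansion (4.5.5) at
# the first commutator is accurate, with the residual set by higher-order terms.
import numpy as np
from scipy.linalg import expm

sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]], dtype=complex)

A, B = 0.1j * sx, 0.1j * sy
lhs = expm(A) @ expm(B)
rhs = expm(A + B + 0.5 * (A @ B - B @ A))
print(np.linalg.norm(lhs - rhs))   # small residual, of order 0.1**3
```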
In the control setting discussed above (in which each Uj is decomposed into its BCH product with coefficients ckj ), each ckj to be found constitutes some optimal application of the generator τk . This is consistent with the result in [175] (see Appendix (4.11.1)), in which the minimum time for synthesising the target unitary propagator is given by the smallest summation of the coefficients (controls) of the generators $\sum_{i=1}^{n}|\alpha_i|$, which, in our notation, would sum over all control coefficients for all subunitaries, i.e. $\sum_{j=1}^{N}\sum_{k=1}^{m}|c_j^k|$.
It is worth noting that the assumption in [175] and even [184] and other analytic
results in control is that in effect the controls can be applied ‘instantaneously’ such
that the minimum time for evolution of a unitary (via the adjoint action of control
generators on drift Hamiltonians) is lower-bounded by the evolution driven by the
drift Hamiltonian Hd . That is, many such control regimes assume that control
amplitudes can be applied without energy constraints, which is equivalent to being
applicable within infinitesimal time. Often this assumption is justified by the fact
that a control voltage may be many orders of magnitude greater than the energy
scales of the quantum systems to be controlled. In cases where control amplitudes (for example, voltages) are, in any significant sense, upper-bounded, say by energy constraints, the time for optimal synthesis of circuits will of course increase, as the assumption of instantaneity will not hold. For our purposes, in a bang-bang control scenario and assuming that evolution according to any drift Hamiltonian sets a lower bound on evolution time, we consider the control amplitudes ckj as applied for time h rather than instantaneously.
and thus, by doing so, learn the appropriate sequence of control pulses necessary
to generate time optimal evolution of unitaries and, consequently, time optimal
quantum circuits. The method involves generating training data in the form of
normal subRiemannian geodesics on SU(2n ) from I to UT . The exponential product (equation (4.4.7)) represents a path γ(t) along the SU(2n ) manifold; however, there may be an infinity of paths between I and UT , such that the map E is not injective
Specifically, ⟨·, ·⟩ is the restriction of the bi-invariant norm (induced by the inner
product on the tangent bundle) to ∆ ∈ su(2n ). For su(2), this arises from the Killing
form which induces a metric on the manifold (see section C.5.6). The fibre bundle
structure allows partitioning of T M into horizontal and vertical subspaces (albeit if
p = g then V M = ∅). Equation (4.5.8) is the energy equation for a horizontal curve
specified in equation (C.4.2). Here the curve γ(t) ∈ M (path) varies over t ∈ [0, 1]
with tangent vectors to the curve (i.e. along the vector field) given by γ̇(t) ∈ T M.
Hence we can see how minimising path length equates to minimisation of energy.
This approach uses variational methods (section C.5.4) to minimise the path length.
To contextualise this formulation in Lie theoretic terms, γ(t) represent unitaries
U (t) ∈ SU (2n ) and γ̇ the corresponding tangent (Lie algebraic) vectors. Distance
along a path γ(t) generated by the tangent vectors (generators) γ̇(t) is measured by subRiemannian (or in the general case, Riemannian) metrics applied to the
tangent space (see general exposition in section C.2.1). The other key assumption
behind this method is that the applicable metric gαβ is constant.
The normal subRiemannian geodesic equations arising from minimising the energy functional above can be written in differential form [46] as:
$$\dot{\gamma}(t) = u(t)\,\gamma(t) \tag{4.5.9}$$
$$\dot{\Lambda}(t) = [\Lambda(t), u(t)] \tag{4.5.10}$$
$$u(t) = \mathrm{proj}_\Delta(\Lambda(t)) \tag{4.5.11}$$
It is worth unpacking each of these terms, both to connect the equations above to the control and geometric formalism and because they are integrated into the subRiemannian machine learning model detailed below. In control theory for-
malism (section C.5.4.1), equation (4.5.9) is the state equation for state variable
γ(t), corresponding to equations (C.5.5) and (A.2.6). In quantum contexts it corresponds to the Schrödinger equation, hence we can identify γ(t) ≡ U (t) (see equation
(A.2.7)). The u term represents an element of the Lie algebra u ∈ ∆ ⊂ su(2n )
parameterised by t ∈ [0, 1], i.e. u : [0, 1] → su(2n ) with t 7→ u(t). As such, it
represents the generator of evolutions on the underlying manifold SU(2n ). For each
value t, the curve γ(t) represents an element of the Lie group i.e. SU(2n ), again
parametrised by t ∈ [0, 1]. Equation (4.5.10) is the costate (adjoint) equation with costate (adjoint) variables Λ taking values in the Lie algebra su(2n ). Λ represents the
costate (adjoint) variable (akin when integrated to a Lagrange multiplier) encoding
control constraints of the Pontryagin Maximum Principle. They differ from u in that
while u are direct elements of the distribution ∆, Λ(t) are elements of the overall Lie
algebra su(2n ) that are generated by the Lie-bracket between other Λ and u, hence
Λ : [0, 1] → su(2n ). The relationship with the Lie bracket is instructive in that the Lie bracket also has an interpretation as the Lie derivative (definition B.2.6); the adjoint equation represents its dynamics.
The time-derivative Λ̇ refers to how the Lie bracket commutator indicates the
change in a vector field along the path γ(t). In a control setting, the Lie derivative
(definition B.2.6) tells us how Λ changes as it is evolved along curves γ(t) generated
by elements u of the control subset. For parallel transport along geodesics (section
C.1.8), as mentioned above, we require this change to be such that the covariant
derivative (definition C.1.33) of Λ0 as it is parallel transported along the curve is
zero, that is:
$$\nabla_{\gamma(t)}\,\Lambda_0 = 0. \tag{4.5.12}$$
The last term (4.5.11) indicates that u resides in the distribution ∆ by virtue of
the projection of Λ onto the distribution ∆:
$$\mathrm{proj}_\Delta(x) = \sum_i \mathrm{Tr}(x^\dagger \tau_i)\,\tau_i \in \Delta. \tag{4.5.13}$$
The initial conditions comprise γ(0) = I together with an initial costate Λ0 drawn from the associated Lie algebra su(2n ). Given these initial operators, the geodesic equations
then allow determination of tuples of unitaries and generators (positions in the
Lie group manifold, momenta in the Lie algebra) for any particular time value
t ∈ [0, 1]. That is, they provide a formula for determining U (t) and Λ(t). The
distribution (control subset) determines the types of geodesics that may be evolved
along. Because the distribution is bracket generating, in principle any curve along
SU (2n ) may be synthesised in this way (though not necessarily directly).
As noted in [53], the above set of equations can be written as a first-order differential equation via
$$\dot{\gamma}(t) = \mathrm{proj}_\Delta\!\left(\gamma(t)\,\Lambda_0\,\gamma(t)^\dagger\right)\gamma(t). \tag{4.5.14}$$
A first-order integrator (see (4.5.18)) is used to solve for γ(t) = U (t). It is worth
analysing (4.5.14) in light of the discussion above on conjugacy maps and their
relation to time-optimal geodesic paths. The γ(t) terms in the conjugacy map

Λ0 ↦ γ(t)Λ0 γ(t)† (4.5.15)

represent the forward-solved geodesic equations [54, 189] (and see section C.1.10
along with discussion of KP problem solutions in sections C.5.6 and 5.3.2). Given
the initial condition Λ0 , γ(t) here is the cumulative evolved operator in SU(2n ) that
is, for time-step tj , we have:
γ(tj ) = Uj Uj−1 · · · U1 (4.5.16)
(note that in the accompanying code, the imaginary unit is incorporated into ∆).
In the discrete case, the curve γ(t) is partitioned into N such segments of equal time-step h = 1/N, where the Uj are unitaries that forward-solve the geodesic equations, represented
in terms of the Euler discretisation [53]:
γj+1 = Uj γj (4.5.19)
     = exp(−ih proj∆ (γj Λ0 γj† ))γj (4.5.20)
where, again to reiterate, γj+1 represents the cumulative unitary propagator at time
tj+1 and Uj represents the respective unitary that propagates γj → γj+1 . The
Hamiltonian Hj for segment Uj is given by the projection onto ∆:
and is applied for time h (though see Appendix (4.12) below for nuances regarding
the interpretation of h and time given the imposition of ||proj∆ (Λ0 )|| = ||u0 || =
1). A consequence of these formal solutions is that each Hj is constrained to be
generated from ∆. This does not mean that only unitaries directly generated by
∆ are reachable, as the action of unitaries (see (4.5.5)) gives rise to generation of
generators outside ∆. It is, however, of relevance to the construction of machine
learning algorithms seeking to learn and reverse-engineer geodesic approximations
from target unitaries UT . The consequence of this requirement is that the control
functions for machine learning algorithms need only model controls for generators
in ∆.
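To make the forward solution concrete, the following is a minimal Python sketch of the Euler scheme (4.5.19)-(4.5.20) for SU(2) with ∆ = {σx , σy }. The names proj_delta and forward_geodesic, and the generator normalisation, are illustrative assumptions rather than the thesis code.

    import numpy as np
    from scipy.linalg import expm

    # Pauli generators for the distribution Delta = {sigma_x, sigma_y},
    # normalised to be orthonormal under the trace inner product Tr(a† b)
    SX = np.array([[0, 1], [1, 0]], dtype=complex)
    SY = np.array([[0, -1j], [1j, 0]], dtype=complex)
    DELTA = [SX / np.sqrt(2), SY / np.sqrt(2)]

    def proj_delta(x):
        # Projection onto the distribution, eq. (4.5.13)
        return sum(np.trace(x.conj().T @ tau) * tau for tau in DELTA)

    def forward_geodesic(lambda0, n_seg=10, h=0.1):
        # Euler integration of eq. (4.5.14):
        # gamma_{j+1} = exp(-i h proj(gamma Lambda0 gamma†)) gamma_j
        gamma = np.eye(lambda0.shape[0], dtype=complex)
        segments = []
        for _ in range(n_seg):
            H_j = proj_delta(gamma @ lambda0 @ gamma.conj().T)
            U_j = expm(-1j * h * H_j)
            segments.append(U_j)
            gamma = U_j @ gamma
        return gamma, segments

    # Example: initial costate Lambda_0 with random coefficients on Delta
    rng = np.random.default_rng(0)
    coeffs = rng.uniform(-1, 1, size=len(DELTA))
    Lambda0 = sum(c * tau for c, tau in zip(coeffs, DELTA))
    U_T, Uj_seq = forward_geodesic(Lambda0)

Here Λ0 is drawn from span(∆); drawing it from the full su(2n ) (as in [54]) simply requires a larger generator basis in the same construction.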
3. Greybox models: greybox models, as discussed in section D.9 and further on,
seek to combine domain knowledge (such as laws of physics), also known as
whitebox models, together with blackbox models into a hybrid learning pro-
tocol. The particular examples we focus on are variational (parametrised)
quantum circuit models.
The practical engineering, target inputs and outputs of the various machine
learning models differs depending upon metrics of success and use case. For a
typical quantum control problem, the sought output of the architecture is actually
the sequence of control pulses (cj ) to be implemented in order to synthesise the
target unitary (i.e. apply a gate in a quantum circuit). The target unitary UT is
typically one of the inputs into the model architecture.
The approach in [54] is blackbox in nature. In that case, the input to their model
was (for their global decomposition algorithm) UT with label data the sequence (Uj ).
The aim of their algorithm, a multi-layered Gated Recurrent Unit (GRU) RNN, was
to learn a protocol for decomposing arbitrary UT ∈ SU (2n ) into an estimated
sequence (Ûj ) (sequences are indicated by parentheses). The individual Ûj are then
fed into a subsequent simple feed-forward fully-connected neural network whose
output is an estimated sequence of controls (ĉj ) (where cj is used as a shorthand for
each control amplitude ckj applied to generators τk for segment j and parentheses
indicate a sequence) for generating each Ûj using τk ∈ ∆. While Ûj need not itself
(and is unlikely to) be exactly unitary, so long as the controls (ĉj ) are sufficient
to then input into (4.5.2) to generate unitary propagators, then the objective of
learning the inverse mapping (4.5.7) has been achieved. No guarantees of unitarity
from the learnt model are provided in [54], instead there is a reliance upon simply
finding (4.5.7) in order to provide (ĉj ). As we articulate below, while this approach
in theory is feasible, in practice where unitarity is required within the network itself
(as per our greybox method driven by batch fidelity objective functions), a more
detailed engineering framework for the networks is required. It is for this reason
that we adopt a greybox approach where guarantees of unitarity can be obtained via
utilising a Lie-theoretic approach in which controls are learnt parameters, rather
than the specific entries (aij ∈ C) of a unitary matrix group element.
4.6.2 Models
4.6.2.1 Geodesic deep learning architectures
Three deep learning architectures were applied to the problem of learning approxi-
mations to geodesics (definition C.1.35) in SU(2n ):
1. a greybox feed-forward fully-connected model in which controls (ĉj ) are learnt directly from UT (the FC Greybox model);

2. a greybox RNN model using GRU cells [201] in which controls (ĉj ) for estimated Hamiltonians Ĥj are learnt by being trained against (Uj ) (the GRU RNN Greybox model); and

3. a greybox model implementing the subRiemannian geodesic equations, in which the coefficients of the initial costate Λ0 are learnt (the SubRiemannian model).
Each model, described in more detail below, took as initial inputs the target unitary
UT together with unitary sequences (Uj ) (as per (4.4.7) above). Each new model uses (4.4.7) in order to generate estimates (Ûj ). These estimates (Ûj ) were then compared against the true (Uj ) using an operator fidelity metric, with MSE loss taken against a vector of ones (as perfect fidelity results in unity). A second metric of average operator fidelity was also
adopted to provide a measure of how well on training and validation data (see
section D.5.4 for regularisation discussion) the networks were able to synthesise Uj
with respect to the estimated Ûj .
Unlike the segmented neural networks for learning control pulses to generate
specific Uj , the variable weights (and units) of the neural network were constructed
with greater flexibility. The FC Greybox, SubRiemannian and GRU RNN Greybox
models were each tested. Note that MSE((Uj ), (Ûj )) refers to the batch fidelity
MSE described below. For each model, the inputs to the model were the target
unitary UT and its corresponding sequence of subunitaries (Uj ). As detailed below,
the penultimate layer of each model outputs an estimated sequence of subunitaries
(Ûj ). This estimated sequence was then compared to the true sequence (Uj ) using
operator fidelity (see (4.6.2) below). This estimate of fidelity F ((Uj ), (Ûj )) was then
compared using MSE against a vector of ones (i.e. ideal fidelity) which formed
the label for the models. As described below, the customised nature of the models
meant intermediate outputs, including estimated control amplitude sequences (ĉj ),
Hamiltonian estimate sequences (Ĥj ) and (Ûj ) were all accessible. The general
architectural principles of this greybox approach are discussed in section D.9.
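As an illustration of the custom layer structure described above, a quantum evolution layer might be sketched as follows. This is a hedged example: the layer name, tensor shapes and the use of tf.linalg.expm are our assumptions, not the thesis's exact implementation.

    import tensorflow as tf

    class QuantumEvolutionLayer(tf.keras.layers.Layer):
        # Applies the time-independent Schrodinger step U_j = exp(-i h H_j)
        # to a batch of estimated Hamiltonian sequences.
        def __init__(self, h=0.1, **kwargs):
            super().__init__(**kwargs)
            self.h = h

        def call(self, hamiltonians):
            # hamiltonians: complex64 tensor of shape (batch, n_seg, d, d)
            return tf.linalg.expm(tf.complex(0.0, -self.h) * hamiltonians)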
4.6.2.2 Methods
Generation of training data for each of the models tested was achieved via im-
plementing the first-order subRiemannian geodesic equations in Python, adapting
Mathematica code from [54]. A number of adaptations and modifications to the
original format of the code were undertaken: (a) where in [54], unitaries were param-
eterised only via their real components (to effect dimensionality reduction) (relying
upon an analytic means of recovering imaginary components [53]), in our approach
the entire unitary was realised such that U = X + iY . This was adopted to improve
direct generation of target unitaries of interest and to facilitate fidelity calculations.
Using equation (B.3.10), the unitaries were then expressed in the realised form:

Û = ( X  −Y
      Y   X )        (4.6.1)
where dim Û = dim SU (2n+1 ); and (b) in certain iterations of the code for the initial costate Λ0 ∈ su(2n ), the coefficients of the generators were derived using tanh activation functions

tanh(x) = (e^x − e^−x )/(e^x + e^−x )

(with range (−1, 1) rather than [0, 1]), which allowed elements of unitaries to be more accurately generated and also to test (see Appendix 4.12) whether
the first order integrative approach did indeed generate equivalent time-optimal
holonomic paths (as in [190]) (see section C.1.32). Using tanh activation functions
enabled better approximation of the relevant time-optimal control functions which
give rise to the generator coefficients (for example, to reproduce the holonomic paths
of [190], one needs the coefficients to emulate the range of the sine and cosine control
functions which characterise the time-optimal evolution in that case).
Furthermore, (c) one observation from [54] was that the training data generated
unitaries relatively proximal to the identity i.e. curves that did not evolve far from
their origin. This is a consequence of the time interval ∆t for each generator i.e.
∆t = h = 1/nseg where nseg is the number of segments. The consequence of this for
our results was that training and validation performance was very high for UT close
to the identity (that is, similar to training sets), but declined in cases for UT further
away (in terms of metric distance) from the origin. This is consistent with [54]
but also consistent with the lack of generalisation performance in their model. As
such, in some iterations of the experiments we scaled up h by a factor in order to obtain UT which were more spread out across the manifold. Other experiments
undertaken sought to increase the extent to which training data covered manifolds
by increasing the number of segments Uj of the approximate geodesic while keeping
h fixed (between 0 and 1). We report on scale and segment number dependence of
model performance below.
In addition to these modifications, in certain experiments we also supplemented
the [54] generative code with subRiemannian training data from a Python imple-
mentation of Boozer [190]. In this case, given the difficulty of numerically solving for
arbitrary unitaries using Boozer’s approach (whose solutions in the paper rely upon
analytic techniques), we generated rotations about the z-axis by arbitrary angles θ
(denoted η in [190]), then rotated the entire sequence of unitaries Uj by a random
rotation matrix. This has the effect of generating sub-Riemannian geodesics with
arbitrary initial boundary conditions and rotations about arbitrary axes, which in
turn provided a richer dataset for training the various neural networks and machine
learning algorithms.
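A minimal sketch of this augmentation step follows, reusing the numpy setup and Uj_seq from the integrator sketch above; the use of scipy's unitary_group for sampling the random rotation is our choice, not necessarily the thesis's.

    import numpy as np
    from scipy.stats import unitary_group

    # Conjugate each segment of a z-rotation geodesic by a random V in SU(2),
    # turning a fixed-axis solution into one about an arbitrary rotation axis
    V = unitary_group.rvs(2)
    V = V / np.sqrt(np.linalg.det(V))      # project U(2) onto SU(2)
    Uj_rotated = [V @ U @ V.conj().T for U in Uj_seq]

Because conjugation of each segment conjugates the cumulative product, the endpoint of the rotated sequence is V UT V†, i.e. a geodesic to a rotated target.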
For SU (2), the bracket-generating set ∆ can be any two of the three Pauli
operators. Different combinations for ∆ were explored as part of our experimental
process. Our experiments focused on setting our control subset ∆ = {σx , σy } as this
allowed ease of comparison with analytic results of [190] and to enable assessment
of how each machine learning model performed in cases where control subsets were
limited, which was viewed as being more realistic in experimental contexts. Note
that this corresponds to the control problem being a subRiemannian one. It aligns
also with the Cartan decomposition (definition B.5.2) of su(2) (see also Chapter 5
for a comparison with analytical methods).
Test datasets for generalisation, where the trained machine learning models are
tested against out of sample data, were generated using the same subRiemannian
generative code above. We also sought to test, for each of SU (2), SU (4) and SU (8),
the efficacy of the models in generating sequences (Ûj ) that accurately evolved to
randomly generated unitaries from each of those groups. The testing methodology
for geodesic approximation models comprised input of the target UT of interest into
the trained model with the aim of generating control pulses (ĉj ) from which (Ûj )
(and thus ÛT ) could be generated.
In each of the models, a customised layer generates candidate controls (ĉj ) in
the form of variable weights which are updated during each iteration (epoch) of the
model using TensorFlow’s autodifferentiation architecture (which streamlines up-
dating of variable weights). These control amplitudes are then fed into a customised Hamiltonian estimation layer which applies (ĉj ) to the respective generators in ∆. The output of this Hamiltonian estimation layer is a sequence of control Hamiltonians (Ĥj ) which are input into a second customised layer which implements quantum evolution (i.e. equation (4.4.7)) in order to output (Ûj ). A subsequent custom layer takes (Ûj ) and the true (Uj ) as inputs and calculates their fidelity, i.e. it takes as inputs batches of estimates (Ûj ) and the ground-truth sequence (Uj ) and calculates the operator fidelity (see section A.1.8.2) of each Ûj and Uj via:

F (Ûj , Uj ) = |Tr(Ûj† Uj )|² /d² (4.6.2)
where d = dim Uj . It should be noted that in this case, the unitaries are ultimately
complex-valued (rather than in realised form) prior to fidelity calculations. The
outputs of the fidelity layer are the ultimate output (labels) of the model (that is,
the output is a batch-size length vector of fidelities). These outputs are compared to
a label batch-size length vector of ones (equivalent to an objective function targeting
unit fidelity). The applicable cost function used was standard MSE but applied to
the difference between ideal fidelity (unity) and actual fidelity:
C(F, 1) = (1/n) Σ_{j=1}^{n} (1 − F (Ûj , Uj ))² (4.6.3)
where here n represents the chosen batch size for the models, which in most cases was
10 or a multiple thereof. It should also be noted that this approach, which we name
‘batch fidelity’, contributed significantly to improvements in performance: previous
iterations of our experiments had engineered fidelity itself as a direct loss-function
using TensorFlow’s low-level API, which was cumbersome, lacked versatility and resulted in limited improvement by comparison with batch fidelity approaches. A
standard ADAM optimizer [202] (with α = 10−3 ) was used for all models.
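For concreteness, a batch fidelity loss of this kind can be sketched in TensorFlow as follows, assuming the |Tr(Û† U )/d|² form of operator fidelity in (4.6.2); the function name and shape conventions are illustrative.

    import tensorflow as tf

    def batch_fidelity_mse(U_true, U_est):
        # U_true, U_est: complex64 tensors of shape (batch, n_seg, d, d).
        # Fidelity per segment: |Tr(U† Û)|^2 / d^2; loss per eq. (4.6.3).
        d = tf.cast(tf.shape(U_true)[-1], tf.complex64)
        overlap = tf.linalg.trace(tf.matmul(U_true, U_est, adjoint_a=True)) / d
        fidelity = tf.math.real(overlap * tf.math.conj(overlap))
        return tf.reduce_mean(tf.square(1.0 - fidelity))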
To benchmark the performance of the greybox models, a blackbox model was constructed using a simple deep feed-forward fully-connected layer stack taking as input UT and outputting a sequence of estimated control amplitudes (ĉj ). A schema of the model is shown in Figure (4.2). Subse-
quent customised layers construct estimates of Hamiltonians by applying (ĉj ) to the
generators in ∆, which are in turn used to generate subunitaries Ûj .
The stack comprised an initial fully-connected feed forward network with stan-
dard clipped ReLU activation functions (with dropout ∼ 0.2) that was fed UT . This
stack fed into a subsequent dense layer, utilising tanh activation functions, that output (ĉj ). Standard MSE loss against the label data (cj ) was used (akin to the basic
GRU in [54]). The sequence (Uj ) was then reconstructed using (ĉj ) external to the
model and fidelity assessed separately. In this variation of the feed-forward fully-
connected model, a basic greybox approach that instantiated the approximation
(4.4.7) was adopted.
As we discuss in Appendix D, greybox approaches [95] represent a hybrid
synthesis of ‘blackbox’ approaches to machine learning (in which the only known
data are inputs and outputs to a typical machine learning algorithm whose in-
ternal dynamics remain unknown or uninterpretable) and ‘whitebox’ approaches,
where prior knowledge of systems, such as knowledge of applicable physical laws,
is engineered into algorithms. Practically, this means customising layers of neural
network architecture to impose specific physical constraints and laws of quantum
evolution in order to output estimated Hamiltonians and unitaries. The motivation
for this approach is that it is more efficient to engineer known processes, such as
the laws of quantum mechanics, into neural network architecture rather than devote
computational resources to requiring the network to learn what is already known to
be true (and necessary for it to function effectively) such as Schrödinger’s equation.
The greybox architecture used to estimate the control pulses necessary to synthe-
sise each Uj is set-out below. This is achieved by using τi ∈ ∆ to construct estimates
of Hamiltonians Ĥ and unitaries Û . The inputs (training data) to the network are
twofold: firstly, unitaries Û generated by a Hamiltonian composed of generators in
∆ with uniform randomly chosen coefficients ckj ∈ [−1, 1], where the negative values
represent, intuitively, tangent vectors pointing in the opposite direction along a Lie
group manifold:
Ĥj = Σ_{k=1}^{dim ∆} ĉkj τk , with ĉkj ∼ U [−1, 1] (4.6.4)
• Inputs: UT (target unitary) and (Uj ) the training sequences (Uj ); and
• Outputs: Fidelity F (Ûj , Uj ) ∈ [0, 1], representing the fidelities of the estimate
of the sequence (Ûj ) from those in the training data.
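A sketch of the training-data generation just described (equation (4.6.4)) follows, reusing the DELTA generators and expm import from the integrator sketch above; the function name is illustrative.

    def random_training_pair(n_seg=10, h=0.1):
        # Build (U_T, (U_j)) with segment Hamiltonians per eq. (4.6.4):
        # coefficients drawn uniformly from [-1, 1] over generators in Delta
        U_T = np.eye(2, dtype=complex)
        segments = []
        for _ in range(n_seg):
            c = np.random.uniform(-1, 1, size=len(DELTA))
            H_j = sum(ck * tau for ck, tau in zip(c, DELTA))
            U_j = expm(-1j * h * H_j)
            segments.append(U_j)
            U_T = U_j @ U_T
        return U_T, segments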
Figure 4.2: Schema of Fully-Connected Greybox model: (a) realised UT inputs (flattened) into
a stack of feed-forward fully connected layers with ReLU activations and dropout of 0.2; (b) the
final dense layer in this stack outputs a sequence of controls (ĉj ) using tanh activation functions;
(c) these are fed into a custom Hamiltonian estimation layer to produce a sequence of Hamiltonians (Ĥj ) using ∆; (d) these in turn are fed into a custom quantum evolution layer implementing the
time-independent Schrödinger equation to produce estimated sequences of subunitaries (Ûj ) which
are fed into (e) a final fidelity layer for comparison with the true (Uj ). Intermediate outputs are
accessible via submodels in TensorFlow.
The second category of deep learning architectures explored in our experiments was RNN algorithms [14, 203]. RNNs, such as LSTMs, are a popular deep learning tool for the modelling of sequential datasets, such as time-series or other data in which successive data depend upon preceding data points. The interested reader is directed to a number of standard texts [14] covering RNN architectures in general for an overview. In short,
RNNs are modular neural networks comprising ‘cells’, self-enclosed neural networks
consisting of inputs of training data, outputs and a secondary input from preceding
cells. For sequential or time-series data, a sequence of modules are connected for
each entry or time-step in the series, j. The intuition behind RNNs, such as Long Short-Term Memory (LSTM) networks, is that inputs from previous time-step
cells or ‘memories’ can be carried forward throughout the network, enabling it to
more accurately learn patterns in sequential non-Markovian datasets. The original
application of GRU RNNs to solving the geodesic synthesis problem was the focus
of [54]. That work utilised a relatively simple network of GRU layers, popular due
to efficiencies it can provide to training regimes.
In the present case, the aim of the GRU RNN is to generate a model that can
decompose a target unitary UT into a sequence Uj reachable from I ∈ SU (2n ).
The third model (the SubRiemannian model) architecture developed in our experi-
ments expanded upon principles of greybox network design and subRiemannian ge-
ometry in order to generate approximations to subRiemannian geodesics. A schema
of the model is shown in Figure (4.4). The choice of architecture was motivated by
Figure 4.3: Schema of GRU RNN Greybox model: (a) realised UT inputs (flattened) into a GRU
RNN layer comprising GRU cells in which each segment j plays the role of the time parameter; (b)
the output of the GRU layer is a sequence of control pulses (ĉj ) using tanh activation functions; (c)
these are fed into a custom Hamiltonian estimation layer to produce a sequence of Hamiltonians
(Ĥj ) by applying the control amplitudes to ∆; (d) the Hamiltonian sequence is fed into a custom
quantum evolution layer implementing the time-independent Schrödinger equation to produce es-
timated sequences of subunitaries (Ûj ) which are fed into (e) a final fidelity layer for comparison
with the true (Uj ). Intermediate outputs are accessible via submodels in TensorFlow.
The aim of the network was to, given the input UT , learn the control amplitudes
for generating the correct Λ0 which, when input into the subRiemannian normal
geodesic equations, generated the sequence (Ûj ) from which UT could be obtained
(thus resulting in a global decomposition of UT into subunitaries evolved from the
identity). Recall that Λ0 is composed from su(2n ) or ∆ depending on use case (the
original paper [54] selects Λ0 ∈ su(2n )). This generated Λ0 was then input into a
recursive customised layer performing the projection operation (4.5.13) that outputs
estimated Hamiltonians, followed by a quantum layer that ultimately generated the
sequence (Ûj ). The sequence (Ûj ) was then input into a batch fidelity layer for
comparison against the true (Uj ). Once trained, the network could then be used for
prediction of Λ0 , (Uj ), the sequence of amplitudes (ci ) and (Ûj ), each being accessible
via the creation of sub-models that access the respective intermediate custom layer
used to generate such output. Pseudocode for the SubRiemannian model is set-out
in section (4.10.3).
As we discuss in our results section, this architecture provided among the highest-
fidelity performance which is unsurprising given that it effectively reproduces the
subRiemannian generative method in its entirety. One point to note is that, while
this architecture generated the best performance in terms of fidelity, in terms of the
actual learning protocol (i.e. the extent to which the network learns as measured by
declines in loss), it was less adaptive than other architectures. That is, while having
overall lower MSE, it was initialised with a lower MSE which declined less. This is
not unexpected given that, in some sense, the neural network architecture combined
with the whitebox subRiemannian generative procedure overdetermines the task
of learning the coefficients of a single generator Λ0 used as an initial condition.
The other point to note is that in [54], Λ0 ∈ su(2n ) i.e. it is drawn from the full
Lie algebra, not just ∆ (intuitively because it provides a random direction in the
tangent space to commence evolution from). From a control perspective, however,
if one only has access to ∆, one cannot necessarily synthesise Λ0 , thus a second
iteration of experiments where Λ0 ∈ ∆ was undertaken. The applicability of the
SubRiemannian model as a means of solving the control problem is more directly
related to this second case rather than the first.
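A sketch of the Λ0-learning step follows. This is our illustrative construction (a Keras layer holding trainable coefficients over a generator basis, bounded via tanh as described above), not the thesis's exact code; in particular, the thesis's split into positive and negative amplitude estimates is collapsed here into a single coefficient vector for brevity.

    import numpy as np
    import tensorflow as tf

    class Lambda0Layer(tf.keras.layers.Layer):
        # Learns coefficients over a generator basis (Delta or all of su(2^n))
        # and returns the estimated initial costate Lambda_0.
        def __init__(self, basis, **kwargs):
            super().__init__(**kwargs)
            self.basis = tf.constant(np.stack(basis), dtype=tf.complex64)
            self.c = self.add_weight(name="c_lambda0", shape=(len(basis),),
                                     initializer="random_normal", trainable=True)

        def call(self, inputs):
            coeffs = tf.cast(tf.tanh(self.c), tf.complex64)    # bounded coefficients
            return tf.einsum("k,kij->ij", coeffs, self.basis)  # Lambda_0 estimate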
Figure 4.4: Schema of SubRiemannian model: (a) realised UT inputs (flattened) into a set of feed-forward fully-connected dense layers (with dropout ∼ 0.2); (b) two layers (red) output sets of control amplitudes for estimating the positive (ĉ+Λ0 ) and negative (ĉ−Λ0 ) control amplitudes using tanh activation functions; (c) these are fed into two custom Hamiltonian estimation layers to produce the positive Λ̂+0 and negative Λ̂−0 Hamiltonians for Λ0 using ∆ or su(2n ), which are combined into a single Hamiltonian estimate Λ̂0 ; (d) Λ̂0 is fed into a custom subRiemannian layer which generates the control amplitudes (ĉj ), Hamiltonians (Ĥj ) and then implements the time-independent Schrödinger equation to produce estimated sequences of subunitaries (Ûj ) which are fed into (e) a final fidelity layer for comparison with the true (Uj ). Intermediate outputs (a) to (d) are accessible via submodels in TensorFlow. The SubRiemannian model resulted in average gate fidelity, when learning representations of (Uj ), of over 0.99 on training and validation sets, in comparison to existing GRU & FC Blackbox models which recorded average gate fidelities of ≈ 0.70, demonstrating the utility of greybox machine learning models in synthesising unitary sequences.
intermediate layers as in our GRU RNN Greybox model. The local model took Uj as
an input and output the coefficient control amplitude estimates (ĉj ) from which the
sequence (Ûj ) could be reconstructed using ∆. In [54], in order to reduce parameter
size of the model, the original global model was trained only on the real part of
(Uj ) on the basis that the imaginary part could be recovered via application of the
unitarity constraint (see [53] for details).
To learn the individual Uj segments of the approximate geodesic unitary path,
we adapted while substantially modifying the approach in [54]. In that paper, the
method of learning Uj segments was adopted via feeding the real part of a vectorised
(i.e. flattened) unitary Uj into a simple three layer feed-forward fully connected
neural network. The labels for the network were the true control pulse amplitudes
ckj .
In recreating these models, it was found that using only the realised part of unitaries was insufficient for model performance overall. We thus included both real and imaginary parts, partly for model performance but also because it is unclear whether
4.7. RESULTS 161
simply training alone on realised parts of unitaries affects the way in which the
networks would integrate information about the imaginary parts. Furthermore, the
approach in [54] did not use measures, such as fidelity, that are of more utility to quantum information practitioners; our model thus extended the original models by recreating
the unitaries from the estimated controls (ĉj ).
4.7 Results
4.7.1 Overview
Figure 4.5: Training and validation loss (MSE) for SU(2).
Figure 4.6: Training and validation loss (MSE) for SU(4).
4.8 Discussion
Figure 4.7: Training and validation loss (MSE). Comparison of MSE at different time intervals for
the SubRiemannian model. h = 0.1, 0.5 and 1. G = SU (2), ntrain = 1000, nseg = 10, epochs= 500,
Λ0 ∈ su(2n ): This plot shows the differences in MSE on training and validation sets as the time-step
h = ∆t varies from 0.1 to 1. As can be seen, larger h leads to deterioration in performance (higher
MSE). However, smaller h can lead to insufficiently long geodesics, leading to a deterioration in
generalisation. Setting h = 0.1 (red curves) exhibits the best overall performance. Even the increase to h = 0.5 (blue curves) results in an increase in MSE and decrease in performance by
several orders of magnitude (and similarly for h = 1).
The plots above show training and validation loss (MSE) for the models for the case of SU (2), SU (4) and SU (8) for 1000 training examples, 10 segments, h = 0.1 and 500 epochs. All models exhibit a noticeable flatlining of the MSE loss after a relatively small number of epochs, indicative of the models saturating
(reaching capacity for learning), a phenomenon accompanied by predictable overfit-
ting beyond such saturation points. For small h ≈ 0.1, the batch fidelity MSE is
already at very low levels of the order ∼ 10−5 . Again we see these persistently low
MSEs as indicative of a highly determined model in which the task of learning Λ0
(at least for smaller dimensional SU(2n )) is overdetermined from the standpoint of
large hidden layers (with 640 neuron units each), together with a prescriptive sub-
Riemannian method. From one perspective, these highly determined architectures
such as SubRiemannian model have less applicability beyond the particular use-case
Figure 4.8: Training and validation loss (MSE). Comparison of SubRiemannian, FC Greybox
and GRU RNN Greybox models. G = SU (8), ntrain = 1000, nseg = 10, h = 0.1, epochs= 500,
Λ0 ∈ su(2n ): For U ∈ SU (8), we see (main plot - first 100 epochs) that the GRU RNN Greybox
(blue line) performs best in terms of batch fidelity MSE on training and validation sets. As
shown in the inset, the GRU RNN Greybox levels out (saturates) after about 100 epochs and
overall performed the best of each of the models and rendered average operator fidelities of around
0.998. The SubRiemannian model (red) performed less well than the GRU RNN, still recording
high average operator fidelity but exhibiting overfitting as can be seen by the divergence of the
validation (dashed) curve from the training (smooth) curve. The FC Greybox rapidly saturates
for large ntrain and exhibits little in the way of learning. All models render high average operator
fidelity > 0.99 and saturate after around 150 epochs (see inset).
Figure 4.9: Training and validation loss (MSE): GRU RNN Greybox. G = SU (2), ntrain =
1000, nseg = 100, h = 0.1, epochs= 500. This plot shows the MSE loss (for training and vali-
dation sets) for the GRU RNN Greybox model where the number of segments was increased from
10 to 100. As can be seen, the model saturates rapidly once segments are increased to 100 and
exhibits no significant learning. Similar results were found for the SubRiemannian model. This
result suggests that simply changing the number of segments is insufficient for model improvement.
One solution to this problem may be to introduce variable or adaptive hyperparameter tuning into
the model such that the number of segments varies dynamically.
with the demonstrable utility of GRU neural networks for certain quantum control
problems [95]. One possible reason for differences between GRU RNN Greybox and
SubRiemannian models may lie in the sensitivity of Λ0 : the SubRiemannian model’s
only variable degrees of freedom once initiated are in the relatively few weights ckj
learnt in order to synthesise Λ0 . As the dimension of SU (2n ) grows, then the co-
efficients of Λ0 become increasingly sensitive, that is, small variations in ckj have
considerable consequences for shaping the evolution in higher-dimensional spaces. In a sense, Λ0 bears the entire burden of training and so becomes hypersensitive, requiring ever finer-grained tuning. This is in contrast to the GRU, for example, where
the availability of more coefficients ckj means each individual coefficient ckj need not
be as sensitive (can vary more) in order to learn the appropriate sequence.
The experiments run across the various training sets indicated model dependence
on the number of segments and scale h. As can be seen from Figure (4.10), we
find that, not unexpectedly, the performance of models depends upon training data.
In particular, model performance measures such as MSE and fidelity clearly depend
upon time interval h = ∆t: where h is small, i.e. the closer the sequence of (Uj ) is to
approximating the geodesic section, the lower the MSE and higher the fidelity. The
effect on model performance is particularly evident in Figure (4.7) where increasing
h from 0.1 to 0.5 leads to a deterioration in loss of several orders in magnitude
(particularly for h > 0.3). As the step size h increases, the resultant curve less closely approximates a geodesic. Furthermore, for larger step sizes, the conditions
required for the assumption of time independence in unitary evolution (4.4.7) are
less valid.
Figure 4.10: Scale h dependence (SubRiemannian model). G = SU (2), ntrain = 1000, nseg = 10,
epochs= 500. Plot demonstrates increase in batch fidelity MSE as scale h (∆t) increases from 0.1
to 1, indicative of dependence of learning performance on time-interval over which subunitaries Uj
are evolved.
4.8.4 Generalisation
In order to test the generalisation of each model (see Appendix D for discussion),
a number of tests were run. In the first case, a set of random target unitaries ŨT
from the relevant SU(2n ) group of interest were generated. These target ŨT were
then input into the SubRiemannian and GRU RNN Greybox models which output
the estimated approximate geodesic sequences (Ûj ) to propagate from the identity
to ŨT . An estimated endpoint target estimate ÛT for the approximate geodesic was
generated. This estimate was then compared against ŨT to obtain a generalised
gate fidelity F (ŨT , ÛT ) for each test target unitary. Second, because fidelities of test
unitary targets varied considerably, in order to test the extent to which higher fidelities may be related to similarity to the underlying training set of target unitaries
{UT }train on which the models were trained, a second fidelity calculation was under-
taken. The average gate fidelity of ŨT with {UT }train was calculated F̄ (ŨT , {UT }train ).
Correlations among the two fidelities were then assessed.
In the third case, for SU (2) models trained on training data where Λ0 ∈ ∆,
random test unitaries were replaced by ŨT comprising random-angle θ ∈ [−2π, π]
z-rotations. The rationale was to test the extent to which a model based upon
restricted control subset training and architecture could replicate unitaries generated
only from ∆ with high fidelity for the single qubit case of SU (2) where analytic
solutions to the time-optimal synthesis of subRiemannian geodesics are known
[190].
Figure 4.11: Generalisation (SubRiemannian model). G = SU (2), ntrain = 1000, nseg = 10,
epochs= 500, Λ0 ∈ su(2n ). Plot of generalised gate fidelity F (ÛT , ŨT ) of randomly generated
ŨT with the reconstructed estimate ÛT , versus F̄ (ŨT , {UT }train ), the average operator fidelity of the randomly generated ŨT with the training inputs {UT }train to the model. The upward trend indicates an increase
in operator fidelity as similarity (Pearson coefficient of 0.52 to 95% significance) of UT to training
{UT }train increases. Colour gradient indicates low fidelity (blue) to high fidelity (red).
Figure 4.12: Generalisation (SubRiemannian model). G = SU (2), ntrain = 1000, nseg = 10,
epochs= 500, Λ0 ∈ ∆. Plot of generalised gate fidelity F (ÛT , ŨT ) of random-angle θ ∈ [−2π, π]
z-rotations ŨT with the reconstructed estimate ÛT , versus F̄ (ŨT , {UT }train ), the average operator fidelity of the test ŨT with the training inputs {UT }train to the model. Here there
is no statistically significant correlation between UT and training set {UT }train , though higher test
fidelities are evident for UT bearing both high and low similarity to the training set (less depen-
dence on similarity to training set for high fidelities).
Figure 4.13: Generalisation (SubRiemannian model). G = SU (8), ntrain = 1000, nseg = 10,
epochs= 500, Λ0 ∈ su(2n ). Plot of generalised gate fidelity F (ÛT , ŨT ) versus F̄ (ŨT , {UT }train ) (average operator fidelity against the training set {UT }train ). Generalisation was significantly worse for
SU(8), however correlation of generalised gate fidelity with similarity of UT to training sets is
evident.
Figure (4.12) plots generalised gate fidelities for test targets ŨT comprising rotations by a random angle θ about the z-axis. No particular relationship between ŨT and the
training set is apparent. Figure (4.14) plots the same generalised gate fidelities in
relation to θ. Once again there is no immediately discernible pattern between the
angle of the z-rotation and the fidelity of the estimate of ŨT . We do see that high
(above 0.99) fidelities are distributed across the range of θ and that there is some
hollowing out of fidelities between extremes of 0 and 1.
The out of sample performance of both the SubRiemannian and GRU RNN
Greybox models (in both cases limited to generators from ∆) for random unitaries
drawn from SU (4) and SU (8) was significantly worse than for SU (2). Average
generalised gate fidelities were below 0.5 for each of the models tested. This is
not unexpected given that the number of parameters the models must deploy in order to learn the underlying geodesic structure increases considerably as
the Lie group dimension expands. A larger training set may have some benefits,
however we note the saturation of the models suggests that at least for the models
deployed in the experiments described above, expanding training sets is unlikely
to systematically improve the generalisation of the models. Devising strategies to
address both model saturation and ways in which expanded training data could be
leveraged to improve model performance remains a topic for further research.
Figure 4.14: Generalisation (SubRiemannian model). G = SU (2), ntrain = 1000, nseg = 10,
epochs= 500, Λ0 ∈ ∆. Plot of F (ÛT , ŨT ) of random-angle θ ∈ [−2π, π] z-rotations θ. As evi-
dent by the red high fidelities across the range [−2π, π], the SubRiemannian model trained on data
where Λ0 ∈ ∆ and ∆ = {X, Y } in certain cases does generalise relatively well.
The section below sets-out pseudocode for the machine learning models utilised in
the experiments above.
Pseudocode for the Fully-connected Greybox model is set-out below. Note that
TensorFlow inputs required (Uj ) to be separated into real Re(Uj ) and imaginary Im(Uj ) parts and then recombined for input into fidelity calculations. Note the cost
function C(F, 1) below is implicitly a function of ckj (the sequence of which is (cj ))
which are the variable weights in the model. Here γ is the learning rate for the
gradient update and θ the trainable weights of the model.
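The gradient update described here can be sketched using TensorFlow's autodifferentiation as follows. This is a hypothetical training step: the model is assumed to map (UT , (Uj )) to a batch of fidelities as described in section 4.6.2, and all names are illustrative.

    import tensorflow as tf

    optimizer = tf.keras.optimizers.Adam(learning_rate=1e-3)  # gamma in the text

    @tf.function
    def train_step(model, U_target, U_true):
        # theta (trainable weights) updated by the gradient of C(F, 1), eq. (4.6.3)
        with tf.GradientTape() as tape:
            fidelities = model([U_target, U_true], training=True)
            loss = tf.reduce_mean(tf.square(1.0 - fidelities))
        grads = tape.gradient(loss, model.trainable_weights)
        optimizer.apply_gradients(zip(grads, model.trainable_weights))
        return loss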
Pseudocode for the GRU RNN Greybox model is set-out below. Note that Tensor-
Flow inputs required (Uj ) to be separated into real Re(Uj ) and imaginary Im(Uj )
parts and then recombined for input into fidelity calculations. Note the cost func-
tion C(F, 1) below is implicitly a function of ckj (the sequence of which is (cj )) which
are the variable weights in the model. Here γ is the learning rate for the gradient
update and θ the trainable weights of the model.
Pseudocode for the SubRiemannian model is set-out below. Note that TensorFlow
inputs required (Uj ) to be separated into real Re(Uj ) and imaginary Im(Uj ) parts
and then recombined for input into fidelity calculations. This model learns the
coefficients required to generate the initial condition Λ0 . It was found that the
model performed best when positive c+Λ0 and negative c−Λ0 control functions were learnt separately then combined to form the coefficient cΛ0 . Note the cost function
C(F, 1) below is implicitly a function of cΛ0 . Here γ is the learning rate for the
gradient update and θ the trainable weights of the model.
j cx,j cy,j
1 -4.651e-02 0.751e-02
2 -5.668e-02 0.622e-02
3 -5.777e-02 -1.504e-02
4 -5.917e-02 0.947e-02
5 -5.221e-02 -0.529e-02
6 -5.663e-02 0.800e-02
7 -6.137e-02 0.119e-02
8 -4.975e-02 -0.173e-02
9 -5.377e-02 -0.047e-02
10 -5.346e-02 0.079e-02
Table 4.3: Table of control amplitudes for generation of a z-rotation by angle θ = −7.529e-01. At each time-step j, control cx,j is applied to the X generator and cy,j to the Y generator to form the Hamiltonian Hj , applied for time h = ∆t.
A randomly selected angle θ between ±2π was generated using QuTiP, in this case θ = −7.529e-01, generating the target unitary:

UT = ( 0.930 + 0.368i          0
       0                       0.930 − 0.368i )

with segment Hamiltonians of the form Hj = cx,j X + cy,j Y .
The specific controls applied at time step j are set-out in Table 4.3. The resulting
estimated unitary ÛT generated by applying each Hj for time ∆t is:
ÛT = ( 0.926 + 0.377i          −0.003
        0.003                   0.926 − 0.377i )
with the fidelity between the target and estimate F (ÛT , UT ) = 0.9999.
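This worked example can be checked numerically along the lines below. This is a sketch only: the generator normalisation and treatment of the time-step are our assumptions (here h is absorbed into the listed amplitudes), so exact agreement with the quoted fidelity is not guaranteed.

    import numpy as np
    from scipy.linalg import expm

    SX = np.array([[0, 1], [1, 0]], dtype=complex)
    SY = np.array([[0, -1j], [1j, 0]], dtype=complex)
    # Controls (c_x, c_y) from Table 4.3, applied as H_j = c_x X + c_y Y
    controls = [(-4.651e-02, 0.751e-02), (-5.668e-02, 0.622e-02),
                (-5.777e-02, -1.504e-02), (-5.917e-02, 0.947e-02),
                (-5.221e-02, -0.529e-02), (-5.663e-02, 0.800e-02),
                (-6.137e-02, 0.119e-02), (-4.975e-02, -0.173e-02),
                (-5.377e-02, -0.047e-02), (-5.346e-02, 0.079e-02)]
    U_hat = np.eye(2, dtype=complex)
    for cx, cy in controls:
        U_hat = expm(-1j * (cx * SX + cy * SY)) @ U_hat

    U_target = np.diag([0.930 + 0.368j, 0.930 - 0.368j])
    fidelity = abs(np.trace(U_target.conj().T @ U_hat)) ** 2 / 4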
Our experimental results and methods focus on synthesising quantum circuits for
multi-qubit systems where unitary operators are drawn from SU (2n ). For such
multi-qubit (qudit) systems, unitary operators U belong to Lie groups G = SU(2n )
which describe the evolution of n interacting spin−1/2 particles. These groups are
equipped with a corresponding Lie algebra of dimension (2n )2 − 1 = 4n − 1 and
denoted g = su(2n ), represented via traceless 2n × 2n skew-Hermitian (A† = −A)
matrices (see Appendix A). Solving time-optimal problems in such contexts often
relies upon appropriate selection of a subset of generators from the Lie algebra as
the control subset from which to synthesise a quantum circuit (see section C.5).
This is especially the case when selecting a control algebra that renders targets
UT reachable in a way that approximates geodesic curves (definition C.1.35) on
the relevant manifold M as the choice of one set of generators over another can
affect evolution time (and whether generated geodesics are indeed minimal, in cases
where multiple geodesics are available such as via great circles on a 2-sphere). Of
importance in selecting control subsets for time-optimal synthesis of geodesics in
multi-qubit systems [175,183,185,186,192] is the so-called product operator basis i.e.
a basis for the Lie algebra of generalised Pauli matrices, being tensor (Kronecker)
products (definition A.1.12) of elementary Pauli operators. The basis is formed by
Pauli spin matrices {Ix , Iy , Iz } = 1/2{σx , σy , σz } i.e. the sets of generators of rotation
in two-dimensional Hilbert space (and Lie algebra basis), with usual commutation
relations. A basis for su(2n ) comprises many-body tensor products of these Pauli operators, i.e. for an n-qubit operator, there are between one and n Pauli operators tensor-producted with identities at the various indices. An orthogonal basis
{iBs } (frame) for su(2n ) is then given [175] in closed form via:

Bs = 2^(q−1) Π_{k=1}^{n} (Ikα )^(aks)
where α = x, y, z and s indexes each basis element of the frame. The index aks is 1
in q of the indices and 0 otherwise, and is a way of representing:
Ikα = 1 ⊗ ... ⊗ Iα ⊗ 1
where Iα appears only at the kth position with the identity appearing everywhere
else. The parameter q tells us how many Pauli operators are tensor-producted
together e.g. q = 1 means the basis element only has one Pauli and the rest identities;
q = 2 means we are dealing with a basis formed by tensor products of two Pauli
operators and identities etc.
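A short Python sketch of constructing such product-operator basis elements follows; the helper name is illustrative, and the 2^(q−1) normalisation follows the closed form above.

    import numpy as np

    PAULI = {"x": np.array([[0, 1], [1, 0]], dtype=complex) / 2,
             "y": np.array([[0, -1j], [1j, 0]], dtype=complex) / 2,
             "z": np.array([[1, 0], [0, -1]], dtype=complex) / 2}
    ID2 = np.eye(2, dtype=complex)

    def product_operator(n, placements):
        # Tensor product with I_alpha at the given qubit positions
        # (q = len(placements) bodies) and identities elsewhere,
        # scaled by 2^(q-1) per the closed form above.
        op = np.array([[1.0 + 0j]])
        for k in range(n):
            op = np.kron(op, PAULI[placements[k]] if k in placements else ID2)
        return 2 ** (len(placements) - 1) * op

    # e.g. the two-body (q = 2) element 2 I_{1x} I_{3y} for n = 3 qubits:
    B = product_operator(3, {0: "x", 2: "y"})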
Geometric control techniques for multi-qubit systems often focus on selecting one-
and two-body Pauli product operator frames (bases) for relevant control subsets
[53, 184] (see section C.5.5). If the control subset contains only one- and two-body
elements of the Lie algebra g, then curves generated in the corresponding Lie group
G are more likely (with a number of important caveats) to be approximations to
(and in the limit, as the number of gates n → ∞, representations of) geodesic
curves and thus time-optimal synthesis of target unitary propagators. The intuitive
reason for this is that of the full Lie algebra g, one- and two-body generators are less
‘expensive’ as measured, for example, by a metric calculating energy (equation C.4.2).
This approach can be seen across a number of key results in the literature [175, 184,
185, 192] and forms the basis for the relevant distribution used in subRiemannian
variational methods in [53,54] which the protocols developed in this Chapter expand
upon. The preference for one- and two-body Pauli operator frames arises in different
contexts.
For example, it is demonstrated in [175] in the case where G = SU (4) and K =
SU (2) ⊗ SU (2) that by finding an appropriate Cartan decomposition G = KAK
(with associated Lie algebra decomposition g = p ⊕ k) (see sections B.5 and B.4)
and maximally abelian Cartan subalgebra:
h = i span{Ix Sx , Iy Sy , Iz Sz } ⊂ p
(where Iα represent one-body terms and Sβ two-body terms), we can write exp(−ih) =
A in the KAK decomposition as the exponential of a linear combination of the gen-
erators in h, namely:
UF = K1 exp(−i(α1 Ix Sx + α2 Iy Sy + α3 Iz Sz ))K2

with minimal synthesis time

T = min_{αi} Σ_{i=1}^{3} |αi |

when UF is expressed in the form

UF = Q1 exp(−i2πJ(α1 Ix Sx + α2 Iy Sy + α3 Iz Sz ))Q2
where Q1 , Q2 ∈ K. The proof essentially relies on the fact that because synthesis
of Q1 , Q2 takes negligible time, then synthesis time is determined by the time to
synthesise A in the KAK which is determined by the parameters αi , hence minimal
time amounts to minimising the sum of αi . Synthesis time is thus minimal to the
extent that the ‘fewest-body’ Pauli generators are utilised in the control subset.
Thus, to generate minimal (and thus time optimal) paths in G to reach arbitrary target unitaries UT , one should ideally choose the control subset with as
few many-body terms as necessary in order to render UT reachable in a control
sense. In this regard, we note recent work [146] regarding surprising constraints on
realisability and universality of unitary gate sets (in control language, reachability
of circuits) for unitary transformations on composite (e.g. multi-qubit) systems
generated by two-local unitaries. As noted in that work by Marvian, this may require
additional algorithmic tailoring and/or the use of ancilla qubits to circumvent such
restrictions. We have not in this work addressed such generic limitations, but they
are an important consideration in any practical application. It is an open research
question as to whether (and to what extent) machine learning techniques may also
provide a means to bridge such gaps in universality arising from the tension between
two-local unitaries and symmetry properties of composite systems.
Nielsen et al. [181] also focus on adopting one- and two-body terms in their metric-
based approach to characterising and generating time-optimal quantum circuits. For
example, the preference for one- and two-body generators is justified by imposing
a Hamming weight term wt(σ) applied to the Pauli generators σ together with a
penalty function p(·) that penalises the control functional whenever Pauli terms of
high Hamming weight are part of the control Hamiltonian. The idea is that Pauli
n-tuples (tensor products) of anything more than one- or two-body Hamiltonians will be penalised via a higher Hamming weight, as they will have many more non-identity elements, whereas one- and two-body operators have lower Hamming weight. Nielsen et al. demonstrate that selection of one- and two-body generators
is optimal for calculating a lower bound on the complexity measure mG (U ) using
Finsler metrics, i.e.:
dF (I, U ) ≤ mG (U ) (4.11.1)
The significance of restricting control subsets together with bespoke metrics when
utilising geometric optimisation techniques is evident in later work [183]. For quan-
tum control optimisation architectures, this demonstrates the utility of Finsler met-
rics as a more general norm-based measure of distance (and thus optimality) together
with a justification of the selection of one- and two-body generators on the basis
of Hamming weights. The use of the ‘penalty metric’ approach (see discussion in
the context of linear models in section D.4.1 and regularisation generally in section
D.3.2) is explored in further work [185, 192] however, as noted in [53], such ap-
proaches can be convoluted without providing guarantees that optimal generators
will be selected.
In [183], Nielsen et al. expand certain elements of the initial program com-
bining techniques from differential geometry and variational methods to quantum
circuit synthesis and quantum control. This second paper considered the difficulty
of implementing a unitary operation U generated by a time dependent Hamiltonian
evolving to the desired UT . They show that the problem of finding minimal circuits
is equivalent to analogous problems in geometric control theory (whereas this Chapter has more of a focus on quantum control utilising geometric means). They select a
cost function on H(t) such that finding optimal control functions for synthesis of UT
(evolving according to the Schrodinger equation) involves finding minimal geodesics
on a Riemannian manifold (M, g) (see definition C.2.4).
where the first summation is over one- and two-body terms, the second over all other
tensor products. A cost function is constructed with a penalty term p2 imposed that
a relatively trivial but important feature of the machine learning code in model
architectures explored above.
Later work [184] of Nielsen and Dowling provides a more directly applicable
example of how to develop analytic solutions to geodesic synthesis of unitary op-
erations. As with the discussion above, it is worth exploring the key results from
this Chapter in order to understand characteristics of relevance to any attempt
to utilise geometric methods for synthesis of unitary propagators for multi-qubit
systems. In the paper, they develop a method of deforming (homotopically) sim-
ple and well-understood geodesics to geodesics of metrics of interest. Intuitively,
the idea is to start with a known geodesic curve between I and UT and, sub-
ject to certain constraints, ‘bend’ it homotopically (that is, via mappings which
preserve topological properties) into a minimal-length curve. However, as demon-
strated in [184], a similar preference for one- and two-body terms is manifest in the
applicable lifted Hamilton-Jacobi equation. This work is also important for anyone interested in geometric quantum control, given its discussion of significant (and potentially intractable) complexity constraints presented by the quantum extension of the Razborov-Rudich theorem, and its extension of geometric quantum computing to include ancilla qubits.
The approach in [192] is precisely to use the penalty metric approach of Nielsen et
al. to generate a subRiemannian geodesic equation in order to confine the generators
of the curve on the manifold to the control subset A. This is achieved by adopting
the norm-based cost function (pseudometric) where higher-order generator terms
are weighted with penalty q, so that minimisation will by extension favour those
generators (i.e. favour generators in p not k). By doing so, a sufficiently proximal
initial seed for the “shooting method” (see [192, 205]) is generated. This method
is a generic numerical technique for solving differential equations with two-point
boundary problems (where our two points are I and UT on G) and thus generating
approximate geodesics.
su(2n ) = P ⊕ Q, H = HP + HQ (4.11.7)
The idea is that higher-order (three- or more-body) terms in {HQ } carry a penalty
parameter (weight) which is designed, when curve length is obtained via minimising
the action, to penalise higher-order terms in a way that the functional (solution)
to the variational problem is more likely to contain only one- and two-body terms.
Thus instead of restricting the sub-algebra of controls p to only one- and two-body
terms (such as is undertaken in Swaddle), they instead (as per Nielsen’s original
paper) begin with full access to the entire su(2n ) Lie algebra (i.e. fully controllable)
and then proceed to impose constraints in order to refine this down to geodesics
comprising only (or mostly) one- and two-body terms. The distinction with the
subRiemannian approach adopted in [54] is that in the latter case, generators for U
are by design constrained to be drawn from P = ∆ via the projection function in
equation (4.5.13), circumventing imposition of Finslerian penalty metrics.
To this end, one of the objectives of our experiments was to ascertain the relia-
bility of a few different methods of generating geodesics using methods drawn from
geometric control sources. In order to do so, we compared this variational geodesic
generation [46] approach to a known method for analytically determining subRie-
mannian geodesics in SU (2) in [190]. By doing so, we can be confident that the
variational method appropriately approximates geodesics. The challenge posed in
comparing geodesic methods lies in the differing assumptions of each method: [54]
constrains the norms ||proj∆ (u0 )|| = ||u0 || as a means of more efficiently generat-
ing subRiemannian geodesic approximations [53], which is in effect the time scale
(or energy scale) of their method. Conversely, [190], works at different scales. In
practice this means the generators for unitary evolution via each method differ by a
scaling related to the norm of the generators. Such different parameterisations can
be understood as follows:
For some desired tolerance (difference) ϵ, the two approximations are identical if the cumulative norm D(H (S) , H (B) ) of the differences of their jth Hamiltonians satisfies:

D(H (S) , H (B) ) = Σj ‖Hj(S) /Ω − Hj(B) ‖ < ϵ. (4.12.1)
That is, we want to minimise the distance between each Hamiltonian segment. The
result in [190] is a relatively simple control problem where the control subset consists
of Pauli σx , σy generators with the target a rotation about the z-axis by angle η, UT =
exp(−iησz /2). To validate that variational subRiemannian method can reproduce
the time-optimal paths from [190], a transformation between the two that enables
comparison of Hamiltonians at time tj respectively in each formulation must be
found. Pseudocode for such a transformation (in effect, a rescaling) of Hamiltonians
generated using the method in [54] by comparison with those using the method
j D(H (S) , H (B) ) F (Uj(B) , Uj(S) )
1 0.0922 0.9934
2 0.1986 0.9935
3 0.3105 0.9935
4 0.4169 0.9936
5 0.5154 0.9936
6 0.6046 0.9936
7 0.6836 0.9937
8 0.7518 0.9937
9 0.7871 0.9938
10 0.8345 0.9939
Table 4.4: Hamiltonian distance and unitary fidelity between Swaddle and Boozer geodesic ap-
proximations.
in [190] is set-out below (where (S) indicates Hamiltonians using the method in [54]
and (B) the method in [190]).
Here, conjugation by exp(−iω t(B)j σz /2) represents the Euler decomposition of the evolution in [190] as if one had direct access to the generator σz . Alternatively, one can also compare unitaries at equivalent times via the operator fidelity F (Uj(B) , Uj(S) ).
Numerical results comparing both Hamiltonian average distance (4.12.1) and fideli-
ties for ten Uj instances across N segments are set-out in Table 4.4.
Fidelity results indicate little difference between Uj(S) and Uj(B) , while Hamiltonian
distance increases with j. Overall, the results provide some measure of confidence,
though not analytic certainty, that the variational subRiemannian means of geodesic
approximation in [54] are useful candidates for training data.
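A sketch of the distance computation (4.12.1) used for this comparison follows; the names are illustrative, with omega denoting the rescaling Ω between the two parameterisations.

    import numpy as np

    def hamiltonian_distance(H_S, H_B, omega):
        # Cumulative norm of rescaled segment-wise differences, eq. (4.12.1)
        return sum(np.linalg.norm(hs / omega - hb) for hs, hb in zip(H_S, H_B))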
Feed-forward fully-connected neural networks (see section 4.13.1 for general discus-
sion), such as the ones deployed in the models above, can be understood in terms
of functional composition. The objective of deep feed-forward networks is to define
an input-output function z = f (a, w, b) where al are inputs to the layer l (setting
the initial input a0 = x), wl is a tensor of parameters for layer l to be learnt by the
model and bl is a bias tensor applied to al [14, 206]. In its simplest incarnation, the
feed-forward stack takes as input a flattened realised a0 = UT (where k runs over the dimension of the vector). A layer of a simple neural network consists of units or neurons, with activation functions σ (in our case, the ReLU or tanh activation function) applied to the weighted input z l = wl al−1 + bl (with weight tensor wl and bias bl ) such that we have al = σ(z l ),
where we notice that the output of the previous layer is the input vector into the
subsequent layer. All final layers in the feed-forward networks used σ = tanh activa-
tion functions. The output of an entire layer al is a sequence structured as a vector
that then becomes the input to the next layer. Information in this compositional
model flows ‘forward’ (hence ‘feed-forward’).
When the entire set of units of a preceding layer becomes an input into each unit
of the subsequent layer, we say the layer is dense. The weights are updated using
backpropagation and gradient descent with respect to the applicable cost functional
(description from [206] below, here ⊙ is the Hadamard (element-wise) product), x
refers to each training example (batch gradient descent example below).
Output error: δ^{x,L} = ∇a Cx ⊙ σ′(z^{x,L} ), where k runs over neurons in layer L.

Backpropagation: for layers l = L − 1, L − 2, ..., 2, calculate:

δ^{x,l} = ((w^{l+1} )T δ^{x,l+1} ) ⊙ σ′(z^{x,l} )

Gradient: cost function gradient given by:

∂C/∂w^{x,l}_{jk} = a^{x,l−1}_k δ^{x,l}_j and ∂C/∂b^{x,l}_j = δ^{x,l}_j

Update weights: for each layer l = L, L − 1, ..., 2 update:

wl → wl − (η/m) Σx δ^{x,l} (a^{x,l−1} )T
bl → bl − (η/m) Σx δ^{x,l}
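A compact numpy illustration of the backward pass and update above for a single dense layer (a sketch; sigma_prime denotes the derivative of the activation, and all names are illustrative):

    import numpy as np

    def backprop_layer(delta_next, w_next, z, a_prev, sigma_prime):
        # Propagate the error one layer back and form the gradients
        delta = (w_next.T @ delta_next) * sigma_prime(z)   # delta^{x,l}
        grad_w = np.outer(delta, a_prev)                   # dC/dw^l
        grad_b = delta                                     # dC/db^l
        return delta, grad_w, grad_b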
Long-Short Term Memory networks and Gated Recurrent Units are a prevalent
form of recurrent neural network (RNN). RNNs are networks tailored to modelling
sequential data, such as time-series data, or data such as sequences of control am-
plitudes (cj ) [14]. For RNNs, for each time-step t, there is an input xt (such as ct ),
an output yt and hidden-layer output ht . The key intuitive idea behind RNNs is
that ht of the network itself becomes an input into hidden layers for the immediately
next time-step t + 1. LSTMs advance upon this concept by enabling the output of
hidden layers to influence not just the immediately succeeding time-step t + 1, but
also potentially activation functions at later time steps. In this sense LSTMs allow
information about previous hidden layers (or states) to function as ‘memory’ that
is carried forward.
One of the challenges with RNNs is the saturation of networks, where new inputs to an activation function fail to contribute significantly to its output. Intuitively, too much information is saturating the model, so additional information does not lead to material updates (manifest, for example, in flatlining loss, as seen in some examples above). One way to overcome this problem of saturation is to stochastically 'forget' certain information in order to make room for additional information, as manifest in the forget gate of an LSTM, distinct from the update gate. GRUs [207], by contrast, seek to incorporate the output of hidden layers and updates into subsequent hidden layers as detailed below. Their popularity is often attributed to their comparative simplicity, requiring fewer gates and parameters than LSTMs. The reset gate r_t and update gate z_t are given by:
r_t = σ(w_r x_t + u_r h_{t−1} + b_r)

z_t = σ(w_z x_t + u_z h_{t−1} + b_z)

The update gate z_t governs the output of the unit at time t. However, in order to output h_t, an intermediate (candidate) hidden state is calculated:

h̃_t = tanh(w_h x_t + u_h (r_t ⊙ h_{t−1}) + b_h)

where we see the (r_t ⊙ h_{t−1}) term incorporates the influence of the reset gate and previous hidden layer h_{t−1} into the estimate. The final hidden layer output is then calculated by combining the Hadamard products of the update gate and previous hidden state together with the intermediate hidden state:

h_t = z_t ⊙ h_{t−1} + (1 − z_t) ⊙ h̃_t
which is the ultimate output at time t. The incorporation of ht−1 in this way allows
influence of prior information in the sequence to influence future outputs, improving
the correlation between outputs such as controls.
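The gate equations above may be illustrated with a short sketch (again our own illustration, assuming NumPy; parameter names follow the equations above and the Hadamard product ⊙ is elementwise *):

    import numpy as np

    def gru_step(x_t, h_prev, params):
        """Single GRU update following the gate equations above."""
        sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
        wr, ur, br, wz, uz, bz, wh, uh, bh = params
        r_t = sigmoid(wr @ x_t + ur @ h_prev + br)               # reset gate
        z_t = sigmoid(wz @ x_t + uz @ h_prev + bz)               # update gate
        h_tilde = np.tanh(wh @ x_t + uh @ (r_t * h_prev) + bh)   # candidate hidden state
        return z_t * h_prev + (1.0 - z_t) * h_tilde              # h_t

    # Example: 4-dimensional input, 8-dimensional hidden state, random parameters
    rng = np.random.default_rng(0)
    n_in, n_h = 4, 8
    params = [rng.normal(scale=0.1, size=s)
              for s in [(n_h, n_in), (n_h, n_h), (n_h,)] * 3]
    h_t = gru_step(rng.normal(size=n_in), np.zeros(n_h), params)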
Chapter 5

Global Cartan Decompositions for KP Problems

5.1 Abstract
Geometric methods have useful application for solving problems in a range of quan-
tum information disciplines, including the synthesis of time-optimal unitaries in
quantum control. In particular, the use of Cartan decompositions to solve prob-
lems in optimal control, especially lambda systems, has given rise to a range of
techniques for solving the so-called KP problem, where target unitaries belong to
a semi-simple Lie group manifold G whose Lie algebra admits a g = k ⊕ p decom-
position and time-optimal solutions are represented by subRiemannian geodesics
synthesised via a distribution of generators in p. In this Chapter, we propose a
new method utilising global Cartan decompositions G = KAK of symmetric spaces
G/K for generating time-optimal unitaries for targets −iX ∈ [p, p] ⊂ k with con-
trols −iH(t) ∈ p. Target unitaries are parametrised as U = kac where k, c ∈ K and
a = e^{iΘ} with Θ ∈ a. We show that under the assumption dΘ = 0, the corresponding time-optimal unitary control problem can be solved analytically using variational techniques. We identify how such control problems correspond
to the holonomies of a compact globally Riemannian symmetric space, where local
translations are generated by p and local rotations are generated by [p, p].
5.2 Introduction
Symmetry-based decompositions [23,55,209–213] are a common technique for reduc-
ing problem complexity and solving constrained optimisation problems in quantum
control and unitary synthesis. Among various decompositional methods, Cartan
KAK decompositions (section B.5) represent a generalised procedure for decom-
posing certain semi-simple Lie groups [2] exhibiting involutive automorphic symme-
try, akin to generalised Euler or singular-value decompositions. Cartan decompo-
sitions have found specific application across a range of domains, such as synthe-
sising time-optimal Hamiltonians for spin-qubit systems in nuclear magnetic reso-
nance [22, 152, 175], linear optics [214], general qubit subspace decomposition [189],
indirectly relating to entanglement dynamics [215] and the entangling power of
unitary dynamics in multi-qubit systems [216]. Other approaches in information
theory [217] use Cartan decompositions for quantum Shannon decompositions and
quantum circuit programming [218]. More recently, their use has been proposed for
efficient measurement schemes for quantum observables [219], fixed-depth Hamiltonian simulation [220], reducing numerical error in many-body simulations [221] and time-reversal operators [222].
Cartan decompositions have been of interest in quantum unitary synthesis due to
the fact that multi-qubit systems carrying representations of SU (2n ) can often be
classified as type AI and AIII symmetric spaces [15, 22, 175, 218, 223, 224]. Specific interest in symmetric space formalism in quantum computing has largely been due to its use in synthesising time-optimal or more efficient or controllable quantum
circuits [175, 181, 187, 189, 214, 216, 221, 222, 225, 226].
Symmetry-based decompositions (see section B.4 and D.8), such as Cartan de-
compositions, have two-fold application in quantum control problems: firstly, symmetry-
decompositions can simplify unitary synthesis via reducing the computational com-
plexity [227]; secondly, they have specific application in quantum control settings
where the control algebra (and thus set of Hamiltonians) available are only a subal-
gebra of the corresponding full Lie algebra g [53, 175, 228]. This Chapter focuses on
the second use case. For unitary targets belonging to semi-simple connected groups
UT ∈ G amenable to Cartan decomposition as G = KAK (where A < G/K and
K < G), a key challenge is identifying the Hamiltonian which will synthesise the uni-
tary optimally. While broadly utilised, Cartan-based unitary synthesis techniques
have suffered from practical limitations due to exponential circuit depth [181, 193]
together with difficulty in identifying the appropriate form of Cartan decomposi-
tion [227] and form of Hamiltonian. In this Chapter, we address such challenges by
providing a generalised procedure for time-optimal unitary and Hamiltonian syn-
thesis using a global Cartan decomposition. Specifically, we demonstrate that for
parametrised unitaries with targets in G = KAK, by utilising what we denote the
constant-θ method, Hamiltonians composed of generators from the horizontal (anti-
symmetric) bracket-generating distribution of the underlying Lie algebra associated
with G can, in certain cases, represent time-optimal means of synthesising such
unitaries.
5.3 Symmetric Spaces and KP Time-Optimal Control

Under the postulates of quantum mechanics (axiom A.1.3), unitary motion is defined by the Schrödinger equation (definition A.1.18):

dU/dt = −iHU  (5.3.1)
Geometric control theory (section C.5) is characterised by the study of the role of
symmetries in optimal control problems. The KP problem has been most exten-
sively detailed by Jurdjevic et al. [23] in the context of geometric control theory.
A common application of techniques of geometric control involves so-called lambda
systems, three-level quantum systems where only two of the levels are encoded with
information of interest. In such systems, the two lowest energy states of a quantum
system are coupled to the highest third energy state via electromagnetic fields [63].
The typical model for the study of lambda systems in quantum control settings is
the Schrödinger equation in the form:
dU(t)/dt = ÂU(t) + Σ_j B_j U(t) û_j(t),   U(0) = I  (5.3.3)
where U (t) ∈ SU (3) and  is a diagonal matrix comprising the energy eigenval-
ues of the system. The control problem becomes finding control functions û(t) to
synthesise UT = U (T ) in minimum time subject to the constraint that ||û|| < C
for constant bound C. The unitary targets are UT ∈ G, a semi-simple Lie group
admitting a Cartan decomposition into compact and non-compact subgroups K and
P respectively (hence the nomenclature “KP ”). As discussed in [56] and elsewhere
in the literature [181, 184], time-optimisation can be shown to be equivalent to syn-
thesising subRiemannian geodesics (section C.4.1) on a subRiemannian manifold
implied by the homogeneous space G/K. In this picture, the minimum synthesis
time equates to the arc length (equation (C.4.1)), measured according to the applicable subRiemannian metric, of the subRiemannian curve γ calculated by reference to the pulses u(t) (see equation (5) in [56]). Intuitively, in this picture, the control Hamiltonian in p traces out a minimal-time path along the manifold. For certain classes
of KP problem, Jurdjevic et al. [17,23,60,213] have shown that geodesic solutions to
equation (5.3.3) take a certain form (see example below). In [213] this is expressed
as:

dγ/dt = (e^{At} P e^{−At}) γ  (5.3.4)

γ(t) = e^{At} e^{(−A+P)t}  (5.3.5)
with A ∈ k and assuming γ(0) = I for a symmetric matrix P (that is, P ∈ p). Discussion of the case in which e^{(−A+P)t} is a scalar is set out in [229]. Detailed exposition by
Jurdjevic is set out in [23,60,213] and elsewhere. In this Chapter, we show that such
results can be obtained by a global Cartan decomposition in tandem with certain
constraints on chosen initial conditions U (0) = U0 and the choice of generators Φ ∈ k
used to approximate the target unitary in K.
U = ke^{iΘ}c  (5.3.7)
We propose that for certain classes of quantum control problems, namely where
the antisymmetric centralizer generators parameterised by angle θ remain constant,
analytic solutions for time-optimal circuit synthesis are available for non-exceptional
symmetric spaces. Such cases are explicitly where control subsets are limited to cases
where the Hamiltonian comprises generators from a horizontal distribution (bracket-
generating [2,230] and see definition C.4.4) p with p ̸= g (where the vertical subspace
(definition C.1.25) is not null). Only access to subspace p ⊂ g is directly available
for control purposes. If [p, p] ⊆ k holds, arbitrary generators in k may be indirectly
synthesised (via application of Lie brackets) which in turn makes the entirety of
g available and thus, in principle, arbitrary UT ∈ G (if [p, p] = k) reachable (in a
control sense, see definition C.5.4).
U = ke^{iΘ}c = qe^{ic^{-1}Θc}  (5.3.8)

dU U^{-1} = −iH dt  (5.3.9)
where the left-hand side consists of parameters over the manifold of unitaries (or the
group G), while the right-hand side consists of parameters that are in a geometric
sense external to the manifold, in the vector field (definition C.1.7) associated with the Lie algebra g.
Geometrically, we can interpret the left K as a local frame (a choice of basis for
the tangent space Tp M at each point p of the group G - see definition C.1.4), the
right K as a global azimuth (a parameter that describes the position of a point on
the sphere and is the same for all points along the ‘longitude’ (or orbits of) K),
and the Θ as the polar geodesic connecting the two (akin to latitude). The latter
two sets of parameters are the coordinates of a symmetric space. Presented in this
way, the Schrödinger equation represents a Maurer-Cartan (differential) form [2] (see
definition C.1.20), encoding the infinitesimal structure of the Lie group and satisfies
the Maurer-Cartan equation (equation (C.1.39)). For our purposes, it allows us to
interpret the right-hand side in terms of the relevant principal bundle connection (see
sections C.1.6.1 and C.1.7). Intuitively, a connection provides a way to differentiate
sections of the bundle and to parallel transport (definition C.1.8) along curves in
the base manifold G. The minimal connection (see section 5.7 for exposition) here
is a geometric way of expressing evolution only by elements of p ⊂ HM (i.e. the horizontal subspace), where the quantum system evolves according to generators in the horizontal control subset of the Hamiltonian given by:
i( dΘ + sin ad_Θ(dc c^{-1}) ).  (5.3.11)
The evolution of the system can be framed geometrically as tracing out a path on the
manifold (a Riemannian symmetric space) according to generators and amplitudes
(and controls) in the Hamiltonian. This path is a sequence of quantum unitaries. In
quantum mechanics, we are interested in minimising the external time parameter.
For this, we typically minimise the action such that the total time (length) of a
circuit evolving is:
ΩT = ∫_γ √( (i dU U^{-1}, i dU U^{-1}) ).  (5.3.12)
Here the integral is parametrised by the typical geometric path length s, where s ∈ [0, 1] parametrises the path from beginning to end and ds = Ω dt (see the Appendix for more discussion). To find the solution for minimal time T, we consider paths such that dΘ = 0 (and thus dθ = 0), which we denote the 'constant-θ' method. The constant-θ method implies that for the Cartan algebra parameter θ, we have dθ = 0. We conjecture that upon local variation of the total time with respect to the path, we obtain Euler-Lagrange equations which show that dqq^{-1} must also be constant for locally time-optimal paths. The total time for locally time-optimal constant-θ paths then reduces to a dependence on sin ad_Θ(Φ) (see equation (5.6.48)), where:

iΦ = ∫_γ dc c^{-1}.  (5.3.14)
Determining the time-optimal geodesic path requires global variation over all geodesics,
typically a hard or intractable problem. As shown in the literature [181,184,185,233],
variational methods can be used as a means of calculating synthesis time. For
constant-θ paths, such calculations are significantly simplified as we demonstrate
below.
such that the Hamiltonian does not contain elements of k. The intuition for a qubit is that rotations achieved via J_z are by construction parallel transporting vectors via the action of [p, p] ⊆ k. Where J_z is not in the control subset, a way to parallel transport such vectors, achieving a J_z rotation using other generators not in k, is needed. For constant-θ paths, the first of these conditions is explicated as a condition that the generators comprising Φ ∈ k belong to the commutant (see the general method section 5.6 for
discussion). Holonomic targets may be, under certain assumptions, generated via
unitaries U ∈ G/K which in a Riemannian context are paths with zero geodesic
curvature, indicating parallel translation in a geometric sense. By contrast, where
the manifold is subRiemannian (i.e. where p < g) then subRiemannian geodesics
may exhibit non-zero geodesic curvature by comparison with G and g as a whole.
Summarising the above, our conjecture is that for symmetric space controls in p with
holonomy targets in k that constant-θ paths are time-optimal. For such constant-θ
paths, the objective becomes to calculate:
and
U = q e^{a} k  (5.3.21)

dU U^{-1} = −iH dt.  (5.3.22)
The adjoint term can be decomposed into symmetric and anti-symmetric parts such that the first two terms are in p while the latter two are in k. The orthogonal partitioning from the Cartan decomposition ensures simplification such that cross-terms between the p and k components vanish (equations (5.3.30) and (5.3.31)), where dk k^{-1} = i dΦ and dq q^{-1} = i dΨ. Using cosh(ad_{iΘ}) = cos(ad_Θ) and sinh(ad_{iΘ}) = sin(ad_Θ), the connection simplifies accordingly.
5.4 Time-Optimal Control Examples

S² ≈ SU(2)/S(U(1) × U(1)) ≡ AIII(1,1)  (5.4.2)
with Cartan subalgebra chosen to be a = ⟨−iJy ⟩. We can easily see the Cartan
commutation relations (equation (5.3.6)) hold. For S 2 above this corresponds to the
Euler decomposition:
Define:
U = ke^{iΘ}c = qe^{ic^{-1}Θc}.  (5.4.6)
π(ke^{iΘ}c) = e^{2ic^{-1}Θc}.  (5.4.8)
by taking the relevant differentials and inverses in equation (5.4.5), using equation
(B.3.2) and calculating the adjoint action on c. Under this Cartan decomposition,
we see equation (5.4.10) partitioned into symmetric k and antisymmetric p part:
dU U^{-1} = k( iJ_z(dψ + dϕ cos(θ)) + iJ_y dθ − iJ_x dϕ sin(θ) )k^{-1}  (5.4.11)

where the first (J_z) term is in k and the remaining (J_y, J_x) terms are in p.
H = i (dU/dt) U^{-1} = k( J_x ϕ̇ sin(θ) − J_y θ̇ )k^{-1}  (5.4.12)
where we have defined conjugate momenta ϕ̇, θ̇ for extremisation below. To calcu-
late the optimal time, we first define the Killing form on g for which we require
normalisation of the extremised action (such that we can define an appropriately
scaled norm and metric). For a subRiemannian space of interest in the adjoint rep-
resentation, the Euclidean norm is then simply defined in terms of the Killing form
as |X| = √(X, X), such that |J_z|² = 1 (equation (5.4.13)). Define the energy of the Hamiltonian as |H| = Ω such that:
Ωt = Ω ∫_γ dt = ∫_γ |i dU U^{-1}| = ∫_γ |i (dU/ds) U^{-1}| ds.  (5.4.14)
Here γ defines the curve along the manifold generated by the Hamiltonian and
t the time elapsed. Calculating path length here is equivalent to approximating
time elapsed modulo Ω. Consistent with typical differential geometric methods, we
parametrise by arc length ds [51]. Define:
ṫ = |i (dU/ds) U^{-1}| = Ω (dt/ds)  (5.4.15)
Extremisation can be performed using the method of Lagrange multipliers and the
minimal connection above. As we demonstrate below, doing so in conjunction with
the constant-θ assumption simplifies the variational problem of estimating minimal
time. The relevant action is given by:
S = Ωt = ∫_γ ( ṫ + λ(ψ̇ + ϕ̇ cos(θ)) ) ds  (5.4.17)
noting again the role of the connection. Parametrising by arc length s we have:
|i (dU/ds) U^{-1}|² = (ψ̇ + ϕ̇ cos(θ))² + θ̇² + (ϕ̇ sin(θ))².  (5.4.18)
Extremising the action δS = 0 resolves the canonical positions and momenta via the equation above as:

Ω δt/δψ̇ = (ψ̇ + ϕ̇ cos(θ))/ṫ + λ  (5.4.19)

Ω δt/δϕ̇ = ϕ̇/ṫ + (ψ̇/ṫ + λ) cos(θ)  (5.4.20)

Ω δt/δθ̇ = θ̇/ṫ  (5.4.21)

Ω δt/δθ = −(ψ̇/ṫ + λ) ϕ̇ sin(θ)  (5.4.22)

Ω δt/δλ = ψ̇ + ϕ̇ cos(θ)  (5.4.23)
where we have assumed vanishing quadratic infinitesimals to first order, e.g. dϕ² = 0. We note that given λ is constant, it does not affect the total time T. The choice of λ can be considered a global gauge degree of freedom, i.e. ∂T/∂λ = 0 (regardless of λ, the minimal time T remains the same). The minimal connection constraint, ψ̇ + ϕ̇ cos(θ) = 0, holds
as we have specified k. The connection and equation (5.4.23) imply that ψ̇(s) be-
comes a local gauge degree of freedom in that it can vary from point to point along
the path parameter s without affecting the physics of the system (i.e. the rate of
change of ψ can vary from point to point without affecting the energy Ω or time T
of the system). That is:
δT/δψ̇ = 0.  (5.4.26)
We can simplify the equations of motion by setting a gauge fixing condition (some-
times called a gauge trajectory). Thus we select:
ψ̇/ṫ + λ = 0. (5.4.27)
dS = Ω δt/δϕ̇ + Ω δt/δθ̇ = ϕ̇/ṫ + θ̇/ṫ = 0.  (5.4.28)
Thus ϕ̇/ṫ and θ̇/ṫ are constant by the constant-θ assumption i.e. θ̇ = 0. Minimising
over constant θ̇:
Ω ∂T/∂θ̇ = ∫_γ (θ̇/ṫ) ds = 0  (5.4.29)

confirms the independence of T from θ (i.e. as ṫ and the path length ∫_γ ds ≠ 0).
Combining the above results reduces the integrand in equation (5.4.13) to depen-
dence on the dϕ sin θ term (as the minimal connection condition and constant θ
condition cause the first two terms to vanish). Such simplifications then mean opti-
mal time is found via minimisation over initial conditions in equation (5.4.16):

T = min_{θ,ϕ} ∫_γ dϕ sin θ = min_{θ,ϕ} |ϕ sin θ|,   ϕ = ∫_γ dϕ.  (5.4.30)
Note by comparison with the general form of equation (5.6.48), here the ϕ sin θ term
represents sin adΘ (dcc -1 ) = sin adΘ (Φ). The above method shows how the constant-θ
method simplifies the overall extremisation making the minimisation for T manage-
able.
U(T)U_0^{-1} = e^{−iX} = e^{i2ηJ_z} = ke^{iΘ}c = qe^{ic^{-1}Θc}.  (5.4.32)

U_0 = e^{iΘ}.  (5.4.33)
ϕ = 2πn (5.4.35)
for n ∈ Z. In general we must optimise over choices of n, which in this case is simply
n = 1. From this choice of initial condition we have:
Note this is a relatively simple form of X = (1 − cos adΘ )(Φ) as Φ comprises only a
single generator Jz . This condition is equivalent to:
2η = ∫_γ dχ = ∫_γ (dψ + dϕ) = 2πn(1 − cos(θ)).  (5.4.38)
Here we have again used the minimal connection constraint for substitution of vari-
ables. Thus:
cos(θ) = 1 − η/(nπ).  (5.4.39)

sin²θ = 1 − (1 − η/(nπ))² = 2η/(nπ) − (η/(nπ))²  (5.4.43)
using trigonometric identities and setting n = 1. Note the time optimality is con-
sistent with [190], namely:
T = 2√(η(2π − η))/Ω.  (5.4.44)
We now have the optimal time in terms of the parametrised angle of rotation for Jz .
To specify the time-optimal control Hamiltonian, recall the gauge fixing condition
(equation (5.4.27), which can also be written ψ̇/ṫ + λ = 0) such that:

λ = −ψ̇/ṫ  (5.4.45)

= −ψ(T)/T  (5.4.46)

= (ϕ/T) cos(θ)  (5.4.47)

= Ω(π − η)/√(η(2π − η))  (5.4.48)
where we have used equations (5.4.35) and (5.4.39). Connecting the optimal time
to the control pulses and Hamiltonian, note that λ can be regarded (geometrically)
as the rate of turning of the path. In particular, noting that λ = −ψ(T )/T , we
can regard λ as the infinitesimal rotation for time-step dt. In a control setting
with a discretised Hamiltonian, we regard it as the rotation per interval ∆t. Thus
−ψ(T ) → λT and per time interval −ψ(t) → λt. The Hamiltonian then becomes:
U_T = e^{i(η/2)σ_z} = e^{iηJ_z}.  (5.4.56)
With ν = 1 − η/(2π), the time-optimal solution for α(t) becomes α(t) = ωt where:
ω/Ω = 2ν/√(1 − ν²)  (5.4.57)
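The closed-form quantities above can be checked symbolically. The following sketch (assuming SymPy; symbol names are ours) verifies equations (5.4.39), (5.4.43), (5.4.44) and the multiplier chain (5.4.45)–(5.4.48) for n = 1:

    import sympy as sp

    eta, Omega = sp.symbols('eta Omega', positive=True)

    cos_theta = 1 - eta / sp.pi                          # (5.4.39) with n = 1
    sin_sq = sp.expand(1 - cos_theta**2)
    assert sp.simplify(sin_sq - (2*eta/sp.pi - (eta/sp.pi)**2)) == 0   # (5.4.43)

    T = 2 * sp.sqrt(eta * (2*sp.pi - eta)) / Omega       # (5.4.44)

    lam = (2*sp.pi / T) * cos_theta                      # (5.4.47) with phi = 2*pi
    target = Omega * (sp.pi - eta) / sp.sqrt(eta * (2*sp.pi - eta))    # (5.4.48)
    assert sp.simplify(lam - target) == 0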
We now consider the constant-θ method as relevant to lambda systems, specifically for the SU(3)/S(U(1) × U(2)) (AIII(3,1) type) symmetric space, distinguished by the choice of Cartan decomposition (for classification of symmetric spaces generally, see section C.3 and [2]). The fundamental representation of SU(3) generators is given by the Gell-Mann matrices:
λ_1 = |0⟩⟨1| + |1⟩⟨0|   λ_2 = −i|0⟩⟨1| + i|1⟩⟨0|  (5.4.59)

λ_3 = |0⟩⟨0| − |1⟩⟨1|   λ_4 = |0⟩⟨2| + |2⟩⟨0|  (5.4.60)

λ_5 = −i|0⟩⟨2| + i|2⟩⟨0|   λ_6 = |1⟩⟨2| + |2⟩⟨1|  (5.4.61)

λ_7 = −i|1⟩⟨2| + i|2⟩⟨1|  (5.4.62)

λ_8 = (1/√3)( |0⟩⟨0| + |1⟩⟨1| − 2|2⟩⟨2| )  (5.4.63)
Following [232, 234], we set out commutation relations for the adjoint representation of su(3), the Lie algebra of SU(3), in Table 5.1. The row label indicates the first entry in the commutator, the column indicates the second.
[Table 5.1: commutators [X, Y] over the ordered basis {−iλ_1, −iλ_2, −iH_III, −iH_III^⊥, −iλ_4, −iλ_5, −iλ_6, −iλ_7} (rows X, columns Y).]
Table 5.1: Commutation relations for generators in the adjoint representation of su(3). The Cartan decomposition is g = k ⊕ p where k = span{−iλ_1, −iλ_2, −iH_III, −iH_III^⊥} (red) and p = span{−iλ_4, −iλ_5, −iλ_6, −iλ_7} (black). As can be seen visually, the decomposition satisfies the Cartan commutation relations (equation (5.3.6)): the yellow region indicates [k, k] ⊂ k, the green region that [p, p] ⊂ k and the white region that [p, k] ⊂ p. From a control perspective, by inspection it is clear that elements in k can be synthesised (are reachable) via linear compositions of the adjoint action of p upon itself (the green region) as a result of the fact that [p, p] ⊆ k. We choose a = ⟨−iλ_5⟩ with h = ⟨−iH_III, −iλ_5⟩.
and:

ad²_Θ(−iH_III) = 0,   ad²_Θ(−iH_III^⊥) = (−2θ)²(−iH_III^⊥)  (5.4.71)

k = span{−iλ_1, −iλ_2, −iH_III, −iH_III^⊥}  (5.4.72)

p = span{−iλ_4, −iλ_5, −iλ_6, −iλ_7}.  (5.4.73)
We can simply read off the combination {−iH_III, −iλ_5} (such that h ∩ p = ⟨−iλ_5⟩) from the commutation table above. The commutant of a = span{−iλ_5} is the subgroup:

M ≡ { k ∈ K : ∀ iΘ ∈ a, kΘk^{-1} = Θ } = e^{m}  (5.4.74)
We also define:
H_III^⊥′′ = −( √3 H_III^⊥ + H_III )/2 = diag(0, 1, −1)  (5.4.75)
which together with λ6 and λ7 define the microwave Pauli operators and the gener-
ator:
H_III′′ = ( −3H_III − √3 H_III^⊥ )/2 = diag(−2, 1, 1)  (5.4.76)
which commutes with the microwave qubit. Continuing, via equations (5.6.65)-(5.6.66) we obtain the optimal time in the corresponding form.
We set out the general form of targets using the particular Cartan decomposition
and choice of basis for SU (3) above. We set:
Φ = −i( ϕ_1 λ_1 + ϕ_2 λ_2 + ϕ_3 H_III + ϕ_4 √3 H_III^⊥′′ ).  (5.4.80)
using cos(−θ) = cos(θ) and noting that cos α(θ) = cos(0) = 1 for −iH_III. Our targets are in this case of the form:
where 1 − cos α(θ) = 0 for HIII . Note that for more general targets involving linear
combinations of HIII , specific choices or transformations of Φ and transformations
of Θ such that the HIII term does not vanish may be required. The rationale for this
is the subject of ongoing work and is important for proving the extent of (and any
constraints upon) the constant-θ conjecture and to understanding the interplay of
system symmetries with reachability of targets. As per the general method set out
below, this form of X can be derived as follows. We begin with the most general
form of target:
X = −i( η_1 λ_1 + η_2 λ_2 + η_3 H_III + η_4 √3 H_III^⊥′′ ).  (5.4.85)
Minimising evolution time firstly requires a choice of an initial condition. Our target
can be written:
U(T)U_0^{-1} = e^{−iX} = qe^{ic^{-1}Θc}  (5.4.86)

q(T) = k(T)c(T) = e^{Φ′} e^{Φ} = e^{Φ′′} = e^{−iX}  (5.4.87)
Where a target does not comprise a given generator, its coefficient may be set to zero. Gathering terms:
q(T) = e^{Φ′′}  (5.4.89)

Φ′ = −i( ψ_1 λ_1 + ψ_2 λ_2 + ψ_3 H_III + ψ_4 √3 H_III^⊥ )  (5.4.90)

Φ′′ = −i( (ψ_1 + ϕ_1)λ_1 + (ψ_2 + ϕ_2)λ_2 + (ψ_3 + ϕ_3)H_III + (ψ_4 + ϕ_4)√3 H_III^⊥′′ )  (5.4.91)
where the minimisation depends on the choice of ϕ_k. The choice of ϕ_k must be such that the commutant condition e^{iΦ} ∈ M is satisfied. For certain targets X, this means:
Φ = −iϕ_3 √3 H_III^⊥′′ − i2πk( n_x λ_1 + n_y λ_2 + n_z H_III )  (5.4.101)
For simple targets, such as those dealt with in the next section, ϕ_k may simply be an integer multiple 2πk, in which case the minimisation problem becomes one of selecting the appropriate choice of k ∈ Z that minimises T. Note that, as discussed, this particular form of Hamiltonian cannot reach targets in H_III. Having determined
T, the optimal time Hamiltonian is constructed as follows, recalling from equation (5.6.70) the form of the multiplier Λ and noting for completeness that Λ ∈ k (and sin ad_Θ(Φ) ∈ p). We demonstrate the
technique for a specific example gate in the literature below.
In this section, we apply the constant-θ method to derive time-optimal results from [229]. The KP decomposition for D'Alessandro et al. in [63] and [229] is:

p = (1/√2) span{iλ_1, iλ_2, iλ_4, iλ_5}  (5.4.109)

k = (1/√2) span{iλ_3, iλ_6, iλ_7, iλ_8}.  (5.4.110)
As the only difference with our chosen KP decomposition (up to the constant −1/√2) is swapping −iλ_1, −iλ_2 ∈ p and −iλ_6, −iλ_7 ∈ k, we can use the same change of basis from −iλ_3, −iλ_8 to −iH_III, −iH_III^⊥ and choice of a = span{−iλ_5}.
For convenience and continuity with [229], we use the standard notation from that
paper. The key point for our method is the choice of a Cartan subalgebra that is
maximally non-compact allowing us to select Θ proportional to −iλ5 . The form of
targets in [229] Uf is:
U_f = ( e^{−iϕ_s}  0 ; 0  Û_s )  (5.4.111)

where Û_s ∈ U(2) and det Û_s = e^{iϕ_s}. For the given KP decomposition, matrices K
(block diagonal) and P (block anti-diagonal) have the form:
K = ( if  0 ; 0  Q )   P = ( 0  α  β ; −α*  0  0 ; −β*  0  0 )  (5.4.112)
chosen in order to eliminate the drift term. The general forms of A ∈ k and P = iλ_1 ∈ p in that work are:

A = ( a+b  0  0 ; 0  −a  −c ; 0  −c  −b )   P = i( 0  1  0 ; 1  0  0 ; 0  0  0 ).  (5.4.113)
resulting in:

A = ( 0  0  0 ; 0  0  ±7i/√15 ; 0  ±7i/√15  0 )   P = ( 0  i  0 ; i  0  0 ; 0  0  0 ).  (5.4.115)
In [229], the results from equation (5.3.5) are assumed and expressed via the constraint exp((−A + P)t) = exp(2πk/3) = I (where k ∈ Z) and t = t_min = √15 π/4, in which case:

exp(A √15 π/4) = exp( (√15 π/4) ( 0  0  0 ; 0  0  ±7i/√15 ; 0  ±7i/√15  0 ) )  (5.4.117)

= ( 1  0 ; 0  exp(−i(7π/4)σ_x) )  (5.4.118)

= ( 1  0  0 ; 0  cos(−7π/4)  i sin(−7π/4) ; 0  i sin(−7π/4)  cos(−7π/4) ) = U_f  (5.4.119)
where we choose c = −7i/√15 in the second line from equation (5.4.115) above. If
the positive value is used, a similarity transformation K ∈ exp(k) of the KAK
decomposition exp(At)P exp(−At) is required. In the general form of solution to
subRiemannian geodesic equations from Jurdjevic et al., the control algebra is re-
lated to the Hamiltonian via:
exp(At) P exp(−At) = Σ_j u_j(t) B_j  (5.4.120)
In this relatively simple example, we can choose Φ to solely consist of −iλ6 . Setting
Θ = −iλ5 and noting cos adΘ (Φ) = cos α(θ)(Φ):
π/4 = 2π(1 − cos(θ))  (5.4.131)

cos(θ) = 1 − 1/8 = 7/8  (5.4.132)

sin(θ) = √(1 − (7/8)²) = √15/8  (5.4.133)
which is (as time must be positive) the minimum time t to reach the equivalence
class of target Hamiltonians (that generate Uf up to conjugation) in [229]. Note in
this Chapter we denote this minimum time to reach the target as T . For convenience
we assume Ω = 1 such that T = √15 π/4. The corresponding time-optimal Hamiltonian can then be constructed via equation (5.6.70) as per the general method.
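These values admit a direct numerical sanity check (a sketch assuming NumPy and SciPy; the matrix A uses entries +7i/√15, corresponding to the choice c = −7i/√15 in equation (5.4.113)): at t_min = √15 π/4 the factor exp((−A + P)t) closes to the identity, leaving exp(A t_min) as the target block rotation:

    import numpy as np
    from scipy.linalg import expm

    c = 7 / np.sqrt(15)
    A = np.array([[0, 0, 0],
                  [0, 0, 1j * c],
                  [0, 1j * c, 0]])      # A in k, per equation (5.4.115)
    P = np.array([[0, 1j, 0],
                  [1j, 0, 0],
                  [0, 0, 0]])           # P = i * lambda_1 in p

    t_min = np.sqrt(15) * np.pi / 4

    # The geodesic factor exp((-A + P) t) closes to the identity at t_min...
    assert np.allclose(expm((-A + P) * t_min), np.eye(3), atol=1e-10)

    # ...so the endpoint reduces to exp(A t_min): the U_f block from (5.4.119)
    Uf = expm(A * t_min)
    block = np.array([[np.cos(7*np.pi/4), 1j*np.sin(7*np.pi/4)],
                      [1j*np.sin(7*np.pi/4), np.cos(7*np.pi/4)]])
    assert np.allclose(Uf[1:, 1:], block, atol=1e-10)
    assert np.isclose(Uf[0, 0], 1)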
5.5 Discussion
We have demonstrated that for specific categories of quantum control problems,
particularly those where the antisymmetric centralizer generators parameterised by
angle θ remain constant, it is possible to obtain analytic solutions for time-optimal
circuit synthesis for non-exceptional symmetric spaces using a global Cartan decom-
position. This is particularly true when the control subsets are restricted to cases
where the Hamiltonian consists of generators from a horizontal distribution (bracket-
generating [2, 230]) p with p ̸= g (where the vertical subspace is not null). Direct
access is only available to subalgebras p ⊂ g. However, we have shown that if the
assumptions [p, p] ⊆ k and dΘ = 0 hold, arbitrary generators in k can be indirectly
synthesised (via application of Lie brackets), which in turn makes the entirety of g
available in optimal time. Geometrically, we have demonstrated a method for syn-
thesis of time-optimal subRiemannian geodesics using p. Note that, as mentioned
in the example for SU (2) above, subRiemannian geodesics may exhibit non-zero
curvature with respect to the entire manifold, but this is to be expected where we
are limited to a control subset. Consequently, in principle, arbitrary U_T ∈ G (rather than just U_T ∈ G/K) becomes reachable in a control sense.
5.6 Generalised Constant-θ Method

U = ke^{iΘ}c = qe^{ic^{-1}Θc}  (5.6.1)
where k, c ∈ K and eiΘ ∈ A. Define Cartan conjugation (see definition B.5.1) as:
π(U) = U^{χ} U  (5.6.3)

π(ke^{iΘ}c) = e^{2ic^{-1}Θc}.  (5.6.4)
That is, the representation of equation (5.6.1) as projected into the symmetric space
G/K establishing the symmetric space as a section of the K-bundle:
π(G) ≅ G/K.  (5.6.5)
The existence of χ and π are sufficient for G/K to be considered globally symmetric
i.e. it has an involutive symmetry at every point. The compactness of G refers to
the property that G is a compact Lie group, meaning it is a closed and bounded
subset of the Euclidean space in which it is embedded. The symmetric space G/K
is equipped with a Riemannian metric, which is a smoothly varying positive-definite
quadratic form on the tangent spaces of the symmetric space. Noting the Euler
formula (see equation (B.3.2)):
eiΘ Xe−iΘ = AdeiΘ (X) = eiadΘ (X) = cos adΘ (X) + i sin adΘ (X) (5.6.6)
where X ∈ g, eiΘ ∈ A ⊂ G. Note here the (lower-case) adjoint action adΘ (X) is the
action of the Lie algebra generators on themselves (section B.3.4), thus takes the
form of the Lie derivative (definition B.2.6) (commutator):
whereas the (upper-case) group adjoint action AdΘ is one of conjugation of group
elements (hence we exponentiate the generator X implicitly). The Maurer-Cartan
form (definition C.1.20) becomes in general:
Recalling that Cartan conjugation is the negative of the relevant involution ι, define
the control space subalgebra as:
which satisfies [p, p] ⊆ k and the Cartan commutation relations more generally.
By restricting the Maurer-Cartan form (Hamiltonian) to the antisymmetric control
subset:
dU U -1 ∈ p (5.6.10)
which can as per the examples be written in its parametrised form as:
here α is a root (functional) on Θ that selects out the relevant parameter, e.g. when Θ = Σ_k θ_k H_k then α selects out the appropriate θ_k ∈ C (see section B.4.5). Note that if Θ comprises multiple H_k, then the related Hamiltonian may also be expressed as:
H = i (dU/dt) U^{-1} = −k( dΘ/dt + sin ad_Θ((dc/dt) c^{-1}) )k^{-1}.  (5.6.13)
(X, Y) = (1/2) Re Tr(ad_X ad_Y)  (5.6.14)
where adX is the adjoint representation of X. The Killing form (definition B.2.12)
is used to define an inner product on g allowing measurement of lengths and angles
in g. Define the inner product for weights and the Weyl vector ρ:
(α, β) = g^{kl} H_k H_l   and   ρ = (1/2) Σ_{α∈R^+} α = Σ_{k=1}^{r} ϕ_k  (5.6.15)
where {Hk } is a basis for the Cartan subalgebra a, R+ is the set of positive roots
α, r is the rank, and {ϕk } are the fundamental weights (section B.4). The weights
(α, β) of a representation are the eigenvalues of the Cartan subalgebra a, which (see
below) is a maximal abelian subalgebra of g. The Weyl vector ρ is a special weight
that is associated with the root system of g (section B.4.4). It is defined as half
the sum of the positive roots α (counted with multiplicities). The Weyl vector is
used in Weyl’s character formula, which gives the character of a finite-dimensional
representation in terms of its highest weight [235] which we denote below as τ .
Define the Euclidean norm (see A.1.11) using the Killing form as:
|X| = √(X, X).  (5.6.16)
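As a brief numerical illustration of this norm (a sketch assuming NumPy; the basis and helper functions are our own), the Killing form (5.6.14) can be computed directly from adjoint matrices. For su(2) with J_z = σ_z/2 this recovers |J_z| = 1, as used in the SU(2) example above:

    import numpy as np

    sx = np.array([[0, 1], [1, 0]], dtype=complex)
    sy = np.array([[0, -1j], [1j, 0]])
    sz = np.diag([1, -1]).astype(complex)
    basis = [-0.5j * s for s in (sx, sy, sz)]   # su(2) basis X_k = -i sigma_k / 2

    def ad_matrix(X, basis):
        """Matrix of ad_X = [X, .] expressed in the given basis (via least squares)."""
        A = np.stack([b.ravel() for b in basis], axis=1)
        cols = [np.linalg.lstsq(A, (X @ b - b @ X).ravel(), rcond=None)[0]
                for b in basis]
        return np.stack(cols, axis=1)

    def killing(X, Y, basis):
        """(X, Y) = (1/2) Re Tr(ad_X ad_Y), as in equation (5.6.14)."""
        return 0.5 * np.real(np.trace(ad_matrix(X, basis) @ ad_matrix(Y, basis)))

    Jz = sz / 2
    print(np.sqrt(killing(Jz, Jz, basis)))   # 1.0, i.e. |J_z| = 1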
We leverage the fact that the Killing form is quadratic for semi-simple Lie algebras
g such that:
|idU U -1 |2 = |ik -1 dk + cos adΘ (idcc -1 )|2 + |dΘ|2 + | sin adΘ (dcc -1 )|2 . (5.6.17)
|H| = Ω. (5.6.18)
By the Schrödinger equation, the time elapsed over the path γ is given by:

Ωt = Ω ∫_γ dt = ∫_γ |i dU U^{-1}| = ∫_γ |i (dU/ds) U^{-1}| ds.  (5.6.19)
Define:
ṫ = |i (dU/ds) U^{-1}|  (5.6.20)
with:
T = min_γ t.  (5.6.21)

Note:

k( k^{-1}k̇ + cos ad_Θ(ċ c^{-1}) )k^{-1} = ( U̇U^{-1} − χ(U̇U^{-1}) )/2  (5.6.23)

and

k( iΘ̇ + i sin ad_Θ(ċ c^{-1}) )k^{-1} = ( U̇U^{-1} + χ(U̇U^{-1}) )/2.  (5.6.24)
We can further simplify via expanding in the restricted Cartan-Weyl basis, noting that α indexes the relevant roots and r indexes the relevant sets of roots in ∆. The restricted Cartan-Weyl basis allows g to be decomposed as the sum of the commutant basis vectors H_k ∈ a and the root vectors E_α:

g = ⟨H_k⟩ ⊕ ⟨E_α⟩  (5.6.25)
where there are r such positive roots. The Maurer-Cartan form (equation (5.6.22))
becomes expressed in terms of weights and weight vectors:
Θ = H_k θ^k  (5.6.26)

and related expansions for the root-vector components. In the above equations, the α are summed over restricted positive roots (of which there are r many), with F_α ∈ k − m and ϕ̇, ψ̇ ∈ C coefficients. The Lagrange multiplier
vector Λ in equation (5.6.29) is in generalised form. Note that in the SU (2) case
above, the simplicity of k = {Jz } simplifies the multiplier term in the action equation
(5.4.17). Here the Cartan subalgebra h is given by h = (h ∩ k) ⊕ a.
This algebra is distinct from a maximally compact Cartan algebra in that h intersects
with both p (the non-compact part of h) and k (the compact part of h). In many
cases, the elements of h are themselves diagonal (see for example Hall and others
[232, 235]) and entirely within k. In our case, a ⊂ p but m ⊂ k. The construction
of the restricted Cartan-Weyl basis is via the adjoint action of a on the Lie algebra,
such that it gives rise to pairings Fα ∈ k − m, Eα ∈ p − a conjugate under the adjoint
action.
The Cartan-Weyl basis has the property that the commutation relations between
the basis elements are simplified because (i) the Cartan generators (belonging to an
abelian subalgebra) commute and (ii) the commutation relations between a Cartan
generator and a root vector are proportional to the root vector itself. The com-
mutation relations between two root vectors can be more complicated, but they
are determined by the structure of the root system. The set of roots are the non-
zero weights of the adjoint representation of the Lie algebra. The roots form a
discrete subset of the dual space to the Cartan subalgebra, and they satisfy certain
symmetries and relations that are encoded in the Dynkin diagram of the Lie alge-
bra [2,230]. Transforming to the restricted Cartan-Weyl basis allows us to represent
the parametrised form of equation (5.6.17) as:
|sin ad_Θ(ċ c^{-1})|² = Σ_{α,r} g_{αα} (ϕ̇^{α,r})² sin² α(Θ)  (5.6.32)

|k^{-1}k̇ + cos ad_Θ(ċ c^{-1})|² = Σ_{α,r} g_{αα} ( ψ̇^{α,r} + ϕ̇^{α,r} cos α(Θ) )² + |m|²  (5.6.33)
Here we have used the fact that the only non-vanishing elements of the Killing form
are gαα and gjk (and using gαα = (Eα , Eα )), the inner product of the basis elements
with themselves, and the restricted Cartan-Weyl basis to simplify the variational
equations. The nonzero functional derivatives are:
Ω δt/δψ̇^{α,r} = g_{αα} ( ψ̇^{α,r} + ϕ̇^{α,r} cos α(Θ) )/ṫ + λ^{α,r}  (5.6.35)

Ω δt/δϕ̇^{α,r} = g_{αα} ( ϕ̇^{α,r}/ṫ + ( ψ̇^{α,r}/ṫ + λ^{α,r} ) cos α(Θ) )  (5.6.36)

Ω δt/δθ̇^{k} = g_{kl} θ̇^{l}/ṫ  (5.6.37)

Ω δt/δθ^{k} = −Σ_{α,r} g_{αα} ( ψ̇^{α,r}/ṫ + λ^{α,r} ) ϕ̇^{α,r} α(H_k) sin α(Θ)  (5.6.38)

Ω δt/δλ^{α,r} = ψ̇^{α,r} + ϕ̇^{α,r} cos α(Θ).  (5.6.39)
From the Euler-Lagrange equations (see section C.5.4 generally) and by design from
the connection we have the constraints:
We assume again the Lagrange multipliers are constant. As for the case of SU (2),
each Lagrange multiplier λα,r then becomes a global gauge degree of freedom in the
sense that:
∂T/∂λ^{α,r} = 0  (5.6.41)
with ψ̇(s) becoming local gauge degrees of freedom under the constraint:
δT/δψ̇^{α,r} = 0.  (5.6.42)
k -1 k̇/ṫ + Λ = 0 (5.6.43)
and
we see:
Θ = constant. (5.6.47)
The above functional equations show that when the action is varied, each term in
equation (5.6.8) vanishes apart from the sin adΘ (dcc -1 ) term. Calculating optimal
evolution time then reduces to minimization over initial conditions:
where:

Φ = −∫_γ i dc c^{-1}.  (5.6.49)
where:

F_{α,r} = (1/α(Θ)) ad_Θ(E_{α,r}) ∈ k − m  (5.6.51)

ad²_Θ(E_{α,r}) = α(Θ)² E_{α,r} ∈ p − a  (5.6.52)

ad_Θ(m) = 0  (5.6.53)

such that:

sin ad_Θ(Φ̇) = Σ_{α,r} ϕ̇^{α,r} sin α(Θ) E_{α,r} ∈ p.  (5.6.54)
Again:

U(T)U_0^{-1} = e^{−iX} = ke^{iΘ}c = qe^{ic^{-1}Θc}.  (5.6.56)

Explicitly we equate:

U(T) = q,   U(0) = e^{ic^{-1}Θc}.  (5.6.57)

U_0 = e^{iΘ}  (5.6.58)
where M can be regarded as the group elements generated by the commutant algebra
m, that is M = exp(m). Such a condition is equivalent to ΘcΘ -1 = c. In the
SU (2) case, because we only have a single generator in Φ, the condition manifests
in requiring the group elements c ∈ K to resolve to ±I, which in turn imposes a requirement that their parameters ϕ_j be of the form ϕ_j = 2πn for n ∈ Z in order for e^{iϕ_j k_j} = I where k_j ∈ k. In general, however, the commutant will have a nontrivial
connected subgroup and it still “quantizes” into multiple connected components.
In practice this means that ϕk may not, in general, be integer multiples of 2π and
instead must be chosen to meet the commutant condition in each case. We note
that:
or equivalently
X = ∫_γ i dq q^{-1}  (5.6.61)

= ∫_γ k( i k^{-1}dk + i dc c^{-1} )k^{-1}  (5.6.62)

= ∫_γ k( (1 − cos ad_Θ) i dc c^{-1} )k^{-1}  (5.6.63)

= −(1 − cos ad_Θ) Φ  (5.6.64)
where we have used the minimal connection in equation (5.6.11), where the last
equality comes from k(0) = I and the arc-length parametrisation of γ between
s ∈ [0, 1]. The optimal time is:
where Φ̇ = Φ/T , Θ∗ and Φ∗ are the critical values which minimize the time, and the
multiplier is the turning rate:
Λ = −k^{-1}k̇/ṫ  (5.6.68)

= −( ∫_γ k^{-1}dk )/T  (5.6.69)

= cos ad_{Θ*}(Φ*)/T.  (5.6.70)
The global minimization then depends upon the choice of Cartan subalgebra a ∋ Θ
(as illustrated in the examples above).
5.7 Minimal Connections and Holonomy Groups

e^{iΘ} X e^{−iΘ} = e^{i ad_Θ}(X) = cos ad_Θ(X) + i sin ad_Θ(X)  (5.7.1)
reveals:
Symmetric controls:
−iHdt ∈ p (5.7.3)
which is minimal in the sense that it minimizes the invariant line element
min_{k∈K} |i dU U^{-1}|² = min_{k∈K} ( |i k^{-1}dk + i cos ad_Θ(dc c^{-1})|² + |dΘ|² + |sin ad_Θ(dc c^{-1})|² )  (5.7.6)
where the minimization is over curves, k(t). This means that minimization with a
horizontal constraint is equivalent to minimization over the entire isometry space.
Such submanifolds are said to be totally geodesic. Targets U (T ) = e−iX with effec-
tive Hamiltonians:
are known as holonomies because they are generators of holonomic groups, groups
whose action on a representation (e.g. the relevant vector space) causes vectors to be
parallel-transported in space i.e. the covariant derivative vanishes along such curves
traced out by the group action. We discuss holonomic groups in definition C.1.32.
Intuitively, one can consider holonomic groups as orbits such that any transformation
generated by elements of k will ‘transport’ the relevant vector along paths represented
as geodesics (definition C.1.35). However, such vectors are constrained to such orbits
if those generators are only drawn from k i.e. [k, k] ⊆ k for a chosen subalgebra k. To
transport vectors elsewhere in the space, one must apply generators in p which is
analogous to shifting a vector to a new orbit. Although in application, one considers
U (0) = 1, it is important to remember this variational problem is right-invariant, so
one could just as well let U (0) = U0 be arbitrary, as long as the target is understood
to correspondingly be U(T) = e^{−iX}U_0. In this case, the time-optimal paths are subRiemannian geodesics (see section C.4.1), which differ from Riemannian geodesics
to the extent that, intuitively put, evolution in the direction of k is required with the
distinction between the two types of geodesics captured by the concept of geodesic
curvature.
Figure 5.1: Cayley transform of −iH_III^⊥ expressed as a rotation into −iλ_5. The presence of the imaginary unit relates to the compactness here of g, which reflects the boundedness and closed nature of the transformation characteristic of unitary transformations.

Figure 5.2: Cayley transform of −iH_III^⊥ expressed as a rotation into λ_5. By contrast with the case in Figure 5.1, the absence of the imaginary unit is indicative of non-compactness such that distances are not preserved (unlike in the unitary case where −i is present).
Recall the maximally compact Cartan subalgebra h = ⟨H_III, H_III^⊥⟩ ⊂ k in (5.4.73)
is entirely within k. To obtain a maximally non-compact Cartan subalgebra, we
conjugate h via an appropriately chosen group element (see Section 7 of Part VI
of [9]). In principle, we could read off the combination ⟨−iHIII , −iλ5 ⟩ (such that
h ∩ p = −iλ5 ) from the commutation table in Table 5.1.
5.8 Cayley Transforms and Dynkin Diagrams

To demonstrate the application of a Cayley transform to obtain the requisite
maximally non-compact Cartan subalgebra, we use a root (definition B.4.3) (in
our case, γ below) to construct a generator of a transformation to a new Cartan
subalgebra whose intersection with p increases by one dimension. We construct a
root system (section B.4.4) in order to find a Cayley transformation C such that:
⟨−iH_III, −iH_III^⊥⟩ → ⟨−iH_III, −iλ_5⟩.  (5.8.1)
Note that the root vectors are essentially raising operators that promote the system
from a lower to a higher energy state (as indicated by the ket-bra notation), while
their adjoints are lowering operators. The elements of h are linear combinations of
λ3 , λ8 which form a basis for h. To obtain the roots, we conjugate by this basis
which spans our original h above:
Conjugating by Hγ we have:
and:
[(1/2) iλ_4, −iH_γ] = −iλ_5,   [(1/2) iλ_4, −iH_III] = 0.  (5.8.9)
Cayley transformations take the form of rotations (conjugations i.e. the adjoint
action) by the angle π/2. We include two diagrams in Figures 5.1 and 5.2 out of
interest. The rotation in Fig. 5.1 represents a transformation by a compact group
element (unitary) preserving distances and angles. By contrast, for a rotation where the generator lacks the imaginary unit coefficient (Fig. 5.2), the geometry is hyperbolic in nature, reflecting a non-compact transformation. This gives an interesting geomet-
ric intuition about the role of the imaginary unit. Our generator of transformations
is iλ4 , thus the Cayley transformation for Hγ becomes:
π
Cγ = e 4 adiλ4 Cγ (−iHγ ) = −iλ5 . (5.8.10)
We can see this visually via the commutation relations along the −iH_III row of Table 5.1, where such a choice of Cartan subalgebra intersects with both p and k. We could also have chosen h = ⟨−iH_III, −iλ_4⟩.
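The transformation (5.8.10) can be confirmed numerically. In the sketch below (assuming NumPy and SciPy, and taking H_γ = [E_γ, E_{−γ}] = |0⟩⟨0| − |2⟩⟨2| as implied by the root construction above), C_γ is realised as conjugation by e^{iπλ_4/4}:

    import numpy as np
    from scipy.linalg import expm

    l4 = np.array([[0, 0, 1], [0, 0, 0], [1, 0, 0]], dtype=complex)   # |0><2| + |2><0|
    l5 = np.array([[0, 0, -1j], [0, 0, 0], [1j, 0, 0]])               # -i|0><2| + i|2><0|
    H_gamma = np.diag([1, 0, -1]).astype(complex)                     # [E_gamma, E_-gamma]

    # Cayley transform C_gamma = exp((pi/4) ad_{i lambda_4}) as a conjugation
    C = expm(1j * np.pi / 4 * l4)
    assert np.allclose(C @ (-1j * H_gamma) @ C.conj().T, -1j * l5, atol=1e-12)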
Diagrammatically, we can understand the relationship between roots and state
transitions as set out in Figure 5.3 which shows the relationship of roots to transitions
between energy states in the lambda system. The root system can also be described
Figure 5.3: A transition diagram showing the relationship between energy transitions and roots (states 0, 1, 2 with roots α, β, γ). In a quantum control context, a transition between two energy levels of an atom can be described by a root vector in the Hamiltonian. For example, a transition |0⟩ → |1⟩ can be described using the root vector e_α. An electromagnetic pulse with a frequency resonant with the energy difference between these two levels can, if applied correctly, transition the system consistent with the action of e_α.
as set out in Fig. 5.4 where the angles between root vectors (we give α as the
example) are calculated using the Cartan matrix below. We can represent the
Figure 5.4: Symmetric root system diagram for the root system described via roots in equation (5.8.3) for the Lie algebra su(3), with root vectors λ_1 + iλ_2 (α), λ_6 + iλ_7 (β) and λ_4 + iλ_5 (γ). The roots α, β, γ can be seen in terms of angles between the root vectors and can be calculated using the Cartan matrix.
relevant Dynkin diagram (definition B.4.6) for this root system. Recall the entries A_ij of the associated Cartan matrix (definition B.4.5) are calculated by reference to the roots as A_ij = 2(α_i · α_j)/(α_i · α_i), where (·) denotes the Euclidean inner product. The Cartan matrix encodes information about angles between simple roots and ratios of their lengths. Angles between different simple roots appear as off-diagonal elements in the Cartan matrix of the form −1, −2, ... (diagonal entries are 2). Given α · α = β · β = 4 and α · β = −2, we have A_αβ = 2(α · β)/(α · α) = 2 cos(θ) = −1, with θ being the angle between the roots; we find cos(θ) = −1/2 such that θ = 2π/3. The Dynkin diagram represents a root
system for su(3). Each node represents one of the simple roots (α, β) connected
by a single line. By convention, the lines connecting simple roots reflect the angle
between them. For an angle of 2π/3, we have one line connecting the nodes. Hence
we see the connection between root systems as represented by the Dynkin diagram
in Figure 5.5 (see [24] and references therein for a discussion on the relation of
resonance to optimality).
Figure 5.5: Combined diagram of a Dynkin diagram and a symmetric root system with specified angles and relations: two nodes α and β connected by a single line, with an angle of 2π/3 between the corresponding roots.
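For illustration, the angle computation can be reproduced in a few lines (a sketch assuming NumPy; the particular Euclidean realisation of the roots is our own choice, normalised so that α · α = β · β = 4):

    import numpy as np

    # A Euclidean realisation of the su(3) simple roots with alpha.alpha = beta.beta = 4
    alpha = np.array([2.0, 0.0])
    beta = np.array([-1.0, np.sqrt(3.0)])

    A_ab = 2 * np.dot(alpha, beta) / np.dot(alpha, alpha)   # Cartan matrix entry: -1
    cos_t = np.dot(alpha, beta) / (np.linalg.norm(alpha) * np.linalg.norm(beta))
    print(A_ab, np.arccos(cos_t))   # -1.0 and 2*pi/3 ≈ 2.0944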
5.8.1 Hamiltonian
Below is a toy model of how such roots relate to a Lambda system Hamiltonian
with energy levels labeled as |0⟩ , |1⟩ , |2⟩. Here the root vectors from the Lie algebra
correspond to transition operators between these states and are incorporated into a Hamiltonian of the form:

H = Σ_j ω_j H_j + Σ_α ( g_α E_α + ḡ_α E_α† )

where ω_j are the energy eigenvalues corresponding to the Cartan subalgebra elements
Hj , and gα are coupling constants for the transitions. The Hj and Eα correspond
to the Cartan and non-Cartan elements of g, respectively. The root vectors Eα cor-
respond to transition operators between energy states. For example, if Eα = |0⟩⟨1|,
it induces a transition from state |1⟩ to state |0⟩ when the system interacts with
a resonant control field. The non-Cartan subalgebra elements, which correspond
to the root vectors of the Lie algebra, drive the transitions among different energy
levels. The Cartan subalgebra elements correspond to the diagonal elements in the
Hamiltonian, which can be thought of as the stationary energy levels. By con-
trast, the non-Cartan elements are off-diagonal, representing the possible transitions or interactions between these energy levels.
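A minimal sketch of such a toy Hamiltonian (assuming NumPy; the energies and couplings are illustrative placeholders rather than physical values) is:

    import numpy as np

    # Toy lambda-system Hamiltonian: diagonal (Cartan) part plus root-vector couplings
    w = np.array([1.0, 1.2, 2.5])          # energies of |0>, |1>, |2> (placeholders)
    g = {'alpha': 0.1, 'beta': 0.2}        # coupling constants for driven transitions

    E = {'alpha': np.zeros((3, 3), dtype=complex),   # E_alpha = |0><1|
         'beta':  np.zeros((3, 3), dtype=complex)}   # E_beta  = |1><2|
    E['alpha'][0, 1] = 1.0
    E['beta'][1, 2] = 1.0

    H = np.diag(w).astype(complex)
    for name, Ea in E.items():
        H += g[name] * Ea + np.conj(g[name]) * Ea.conj().T   # raising + lowering parts
    assert np.allclose(H, H.conj().T)                        # Hermitian by construction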
Appendix A
A.1 Overview
This Appendix sets out background formalism from quantum information processing and quantum mechanics itself. The study of such formalism usually begins with the study
of quantum axioms or postulates, a set of assumptions about quantum systems and
their evolution derived from observation and from which other results and properties
of quantum systems are deduced. There are a variety of ways to express such ax-
ioms according to chosen formalism (and indeed debate continues within fields such
as quantum foundations, mathematics and philosophy as to the appropriateness of
such postulates). In the synopses below, we have followed the literature in [11, 45]
and [43], making a few adjustments where convenient (such as splitting the measure-
ment postulate in two). We partly frame the postulates in information-theoretic fashion concurrently with traditional formulations. The axioms form a structured ordering
of this background material, anchoring various definitions, theorems and results rel-
evant to establishing and justifying methods and results in the substantive Chapters
above. We proceed to discussing key elements of open quantum systems (relevant
to, for example, the simulated quantum machine learning dataset that is the subject of
Chapter 3) together with elements of the theory of quantum control (which we later
expand upon in terms of algebraic and geometric formulations). For those familiar
with quantum mechanics and quantum information processing, this section can be
skipped without loss of benefit. Most of the material below is drawn from stan-
dard texts in the literature, including [43, 45, 238]. Proofs are omitted for the most
part (being accessible in the foregoing or other textbooks on the topic). We begin
with the first of the quantum postulates (which we denote axioms following [45]) of
quantum mechanics, that related to quantum states.
The axiom above provides that quantum systems are represented by (unit) state
vectors within a complex-valued vector (Hilbert) space H whose dimensionality is
determined according to the physics of the problem. There are various formalisms for
representing such state vectors in quantum information processing, a common one
being Dirac or ‘bra-ket’ notation. In this notation, the state vector is represented
as a ‘ket’ |ψ⟩ ∈ H. Associated with a ket is a corresponding ‘bra’ notation ⟨ψ|
which strictly speaking is a linear (one) form (or function) that acts on |ψ⟩ such
that ⟨ψ1 |ψ2 ⟩ ∈ C for two states |ψ1 ⟩ , |ψ2 ⟩. Quantum states are typically defined
as elements of a Hilbert space H, a vector space over C equipped with an inner
product. To this end, we introduce the definition of a state space.
Definition A.1.1 (State space). A state space V(K) is a vector space over a field K, typically C. Elements ψ ∈ V represent possible states of a quantum system. In particular, for a_k ∈ C and ψ_k ∈ V, the state ψ = Σ_k a_k ψ_k ∈ V.
The formulation of classical states here maps largely to that found in computer
science literature where states are constructed from alphabets etc. Classical states
may be composed of subregisters which determine the overall state description. That
is, a register has configurations (states) it may adopt (with subregisters determining
higher-order registers). Probabilistic states are distributions over classical states the
register may assume, denoted P(Σ) with states represented by probability vectors
p(a) ∈ P(Σ). From the classical case, we can then construct the quantum infor-
mation representation of a quantum register and states which are represented by
density operators (defined below).
Definition A.1.4 (Inner Product). An inner product on V(K) is a mapping from a vector space to a field K:

⟨·, ·⟩ : V × V → K,   (ψ, ϕ) ↦ ⟨ψ, ϕ⟩   for ψ, ϕ ∈ V, c ∈ K

satisfying:
Inner products and bilinear forms are crucial components across quantum, alge-
braic and geometric methods in quantum information processing. Later, we examine
the relationship between differential (n-)forms and inner products (see Appendix B)
which are relevant to variational results in Chapter 5. There the inner product
is analogous to a bilinear mapping of V × V ∗ → K, implicit in the commonplace
‘braket’ notation of quantum information (where kets |ψ⟩ and ⟨ψ| are duals to each
other and inner products are given by ⟨ψ|ψ⟩). In the formalism below, we regard
|ψ⟩ as an element of a vector (Hilbert) space V (C) while the corresponding ⟨ψ| as an
element of the dual vector space V ∗ (C). Moreover, in later Chapters we adopt (and
argue for the utility of) a geometric approach where inner products and norms are
defined in terms of metric tensors g over manifolds and vector space relationships
to fibres. The inner product defined on a vector space V then gives rise to a norm
via the Cauchy-Schwarz inequality.
We are now equipped to define the norm on the vector spaces with specific
properties as follows.
cos(θ) = ⟨ϕ, ψ⟩ / ( ∥ϕ∥ · ∥ψ∥ )  (A.1.1)
Definition A.1.7 (Operator norm). If V1 (K) and V2 (K) are normed spaces, the
linear mapping A : V1 (K) → V2 (K) is bounded if
sup_{ψ ∈ V_1∖{0}} ∥Aψ∥/∥ψ∥ < ∞.
Intuitively this tells us that linear mappings on V(K) remain within the closure of V(K). The set of such linear maps forms a complementary vector space denoted the dual space. The norm is given via the field norm ∥·∥ = |·| ∈ K. The set of all such bounded linear functionals is denoted the dual space V*(K) to V(K).
If V(K) is normed then V*(K) is also a Banach space. Moreover, we have that for ψ ∈ V and χ ∈ V*:

1. χ(ψ) ∈ K.

2. |χ(ψ)| ≤ ∥χ∥∥ψ∥.

3. ∀χ ∈ V*, χ(ψ) = 0 ⟺ ψ = 0.
Note in numbered item 2 above, duals can also be thought of as types of vectors
themselves and as functions that take vectors as arguments. We discuss dual spaces
and functionals in further chapters, particularly in relation to representations of Lie
algebras and differential forms. With the concept of a dual space, inner product and
norm, we can now define the Hilbert space.
Often we are interested in direct sums or direct (tensor) products of Hilbert spaces
such as tensor product of Hilbert spaces (crucial to multi-qubit quantum compu-
tations). We discuss these in later Chapters. In some cases, multipartite Hilbert
spaces may be characterised by direct sums rather than direct (tensor) products.
This is the case where a Hilbert space is decomposable into disjoint regions.
Proposition A.1.1 (Hilbert Space Direct Sum). The direct sum of a sequence (H_j) of separable Hilbert spaces, denoted by

H := ⊕_{j=1}^{∞} H_j,

is itself a Hilbert space with inner product ⟨ϕ, ψ⟩ = Σ_j ⟨ϕ_j, ψ_j⟩ for all ϕ_j, ψ_j ∈ H_j and ψ, ϕ ∈ H. Each subspace inherits the inner product and
associated completeness properties of H, confirming each Hj as a Hilbert space. The
partitioning of a Hilbert space via a Cartan decomposition of Lie algebraic generators
(which form a Hilbert space as a vector space upon which an inner product is defined)
g into g = k ⊕ p is an example of such a decomposition which preserves the overall
structure of the vector space within which quantum state vectors subsist [239]. Thus
the essential properties of quantum state vectors are retained in the case of such
decomposition. Hilbert spaces have a number of features of importance to quantum
computational contexts. These include certain orthogonality relations which are
important to their decomposition in ways that can simplify problems (such as those
of unitary synthesis that we examine in later Chapters).
The boundedness and linearity of states means we can also consider them as (convergent) sums ψ = Σ_j a_j e_j, a_j ∈ C, where a_j = ⟨e_j, ψ⟩ ∈ C.
Definition A.1.12 (Tensor Product). A tensor product of two vector spaces V_1(K), V_2(K) is itself a vector space W(K) equipped with a bilinear map T : V_1(K) × V_2(K) → W(K).
Figure A.1: Commutative diagram showing the universal property of tensor products discussed above: any bilinear map Φ : V_1(K) × V_2(K) → U factors through T via a unique linear map Φ̃ : W(K) → U.
Qubits are normalised such that they are unit vectors ⟨ψ|ψ⟩ = 1 (that is |a|2 + |b|2 =
1), where a, b ∈ C are amplitudes and their squared moduli, the probabilities, for
measuring outcomes of 0 or 1 (corresponding to states |0⟩ , |1⟩ respectively). Here
⟨ψ|ψ ′ ⟩ denotes the inner product of quantum states |ψ⟩ , |ψ ′ ⟩. As can be observed
in equation (A.1.3), as distinct from a classical (deterministic or stochastic) state,
quantum states may subsist ontologically in a superposition of their basis states.
The stateful characteristic of quantum systems is also manifest in the first pos-
tulate of quantum mechanics, which associates Hilbert spaces to enclosed physical
systems under the assumption that ψ ∈ H provides complete information about the
system [238]. In quantum computing, qubits may be either physical (representing
the realisation of a two-level physical system conforming to the qubit criteria) or
logical, where a set of systems (e.g. a set of one or more physical qubits) behaves
abstractly in the manner of a single qubit, which is important in error correction.
Quantum computations may also involve ancilla qubits used to conduct irreversible
logical operations [11].
For problems in quantum computing and control, we are interested in how quan-
tum systems evolve, how they are measured and how they may be controlled. To
this end, we include below a brief synopsis of operator formalism and evolution of
quantum systems. In Appendix B, we characterise these formal features of quantum
information processing in algebraic and geometric terms. One of the consequences of
definition A.1.1 is the existence, unlike in the classical case, of quantum superposi-
tion states i.e. that a quantum system can exist simultaneously in multiple different
states until measured. Given a quantum system in states represented by unit vectors
|ψ1 ⟩ , |ψ2 ⟩ , . . . , |ψn ⟩ ∈ H, any linear combination of these states is also in H:
|ψ⟩ = Σ_{k=1}^n ck |ψk⟩   (A.1.4)

where ck ∈ C are complex coefficients satisfying the normalisation condition Σ_{k=1}^n |ck|² = 1
under axiom A.1.1. The state |ψ⟩ encodes the probabilities of the system being
found in any of the basis states upon measurement, with the probability of finding
the system in state |ψk ⟩ given by |ck |2 .
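As a concrete illustration of equation (A.1.4) and the Born probabilities, a minimal sketch in NumPy (the coefficients and basis here are arbitrary example choices, not drawn from this work):

```python
import numpy as np

# Arbitrary example coefficients for a 3-level superposition (hypothetical values)
c = np.array([1 + 1j, 2, 0.5j])
c = c / np.linalg.norm(c)          # enforce sum_k |c_k|^2 = 1

# Basis states |psi_k> taken as the computational basis e_k
basis = np.eye(3, dtype=complex)
psi = sum(ck * ek for ck, ek in zip(c, basis))

probs = np.abs(c) ** 2             # Born-rule probability of outcome k
assert np.isclose(probs.sum(), 1.0)
print(probs)
```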
Definition A.1.13 (Bounded linear operators). The space B(H) of bounded linear
operators A on H is defined by
∥Aψ∥ ≤ K∥ψ∥
for all ψ ∈ H. The smallest such K is the operator norm of A, denoted by ∥A∥.
We also note Riesz's Theorem: for any bounded linear functional ξ : H → C, there exists a dual element χ ∈ H such that ξ(ψ) = ⟨χ, ψ⟩ (and hence an equivalence of norms). Riesz's Theorem guarantees in such cases the existence of an adjoint operator A†, which is crucial for Hermitian operators.
The adjoint A† of A is the unique operator satisfying ⟨A†ϕ, ψ⟩ = ⟨ϕ, Aψ⟩ for all ϕ, ψ ∈ H. Adjoints satisfy:

(A†)† = A,  (AB)† = B†A†,  (αA + βB)† = ᾱA† + β̄B†,  I† = I.
Note too that ∥A† ∥ = ∥A∥ < ∞. If A† = A, then A is self-adjoint which we denote
as Hermitian (or skew-self-adjoint if A† = −A).
Note that projection operators are normal operators and that the set of projection operators is denoted Proj(H). For each closed subspace V ⊆ H there exists a projection P ∈ Proj(H) whose image is V.
U(H, Y) = {A ∈ B(H, Y) | A†A = I_H}

In order for an isometry of the form A ∈ U(H, Y) to exist, it must hold that dim(Y) ≥ dim(H). Every isometry preserves not only the Euclidean norm but inner products as well: ⟨Au, Av⟩ = ⟨u, v⟩ for all u, v ∈ H. Isometries then allow us to define unitary operators as those isometries U : H → H mapping H onto itself, satisfying

⟨Uϕ, Uψ⟩ = ⟨ϕ, ψ⟩

for all ϕ, ψ ∈ H (i.e. U is surjective and maintains inner products). Unitary operators preserve norms: ∥Uψ∥ = ∥ψ∥.
Unitary operators are thus a specific type of normal operator. From this defi-
nition, we deduce that ∥U ∥ = 1. U is unitary if and only if U † = U −1 such that
U U † = U † U = I. Hermitian operators, unitary operators and positive semi-definite
operators are central to quantum information processing (and the theory of quantum
mechanics generally). Importantly, there is a bijection from the space of Hermitian
operators on the complex space H to real Euclidean spaces. Unitary operators rep-
resent propagators of quantum state vectors ψ(t) ∈ H. Any Hermitian operator admits a decomposition as a difference of positive operators P, Q:

H = P − Q   (A.1.7)

The trace of an operator A is given by:

Tr(A) = Σ_j ⟨ej, Aej⟩,
Here Trk ⊗ I is shorthand for the application of the trace operator to Xk. The partial trace has the effect, in geometric terms (see below), of contracting the relevant (tensor) product space (along the dimension associated with Xk) and scaling the residual tensor by the value of that trace in C. In density operator formalism, the partial trace gives rise to a reduced density operator. For ρT a state on the tensor product HA ⊗ HB, tracing out HB yields the reduced density operator on HA:

ρA = TrB ρT   (A.1.9)
In later sections below, we note the relation of the trace to quantum measure-
ment.
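To make the contraction concrete, the following minimal sketch (NumPy; assuming a bipartite system with dim HA = dim HB = 2 and an example product state) implements equation (A.1.9) by reshaping ρT into a four-index tensor and contracting the B indices:

```python
import numpy as np

def partial_trace_B(rho_T: np.ndarray, dA: int, dB: int) -> np.ndarray:
    """Reduced state rho_A = Tr_B(rho_T) for rho_T on H_A (x) H_B."""
    # Reshape into a 4-index tensor rho[a, b, a', b'] and contract b = b'
    rho = rho_T.reshape(dA, dB, dA, dB)
    return np.einsum('abcb->ac', rho)

# Example: product state |0><0| (x) |+><+| ; tracing out B returns |0><0|
zero = np.array([[1, 0], [0, 0]], dtype=complex)
plus = 0.5 * np.ones((2, 2), dtype=complex)
rho_T = np.kron(zero, plus)
print(partial_trace_B(rho_T, 2, 2))   # -> |0><0|
```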
All trace-class operators are Hilbert-Schmidt and the trace exhibits the usual
cyclic property Tr(AB) = Tr(BA). The inner product is given by ⟨A, B⟩ = Tr(A† B)
with the norm as per above ||A||2 = ⟨A, A⟩. With the concept of the trace we can
also note a number of norms (and metrics) used further on, in particular norms
(and thus metrics) for comparing quantum states (such as trace-distance related
measures) and in Chapter 5 for variational techniques. Schatten p-norms are given
by:
||X||_p = (Tr[(X†X)^{p/2}])^{1/p}   (A.1.10)

For p = 1 this is the trace norm, equal to the sum of the singular values of X. The trace distance is commonly
used to compare distances of quantum states via their operator representation as
density operators (matrices). We now define the density operator.
For density operators ρ, σ representing quantum systems, the trace distance between such operators is given by:

dT(ρ, σ) = ½ ||ρ − σ||₁   (A.1.15)
Definition A.1.26 (Pure and Mixed States). A density operator ρ ∈ B(H) rep-
resents a pure state if there exists a unit vector ψ ∈ H such that ρ is equal to the
orthogonal projection onto span{ψ}. The density matrix ρ is called a mixed state if
no such unit vector ψ exists.
Closed-system pure states remain pure under the action generated by Hamilto-
nians. For pure states, Tr(ρ²) = 1 while for mixed states Tr(ρ²) < 1. Density matrices form a convex set: λρ₁ + (1 − λ)ρ₂ is a density matrix for λ ∈ (0, 1). Pure states are those that cannot be expressed as ρ = λρ₁ + (1 − λ)ρ₂ for ρ₁ ≠ ρ₂. Mixed and pure states can also be conceived of epistemically: if the state of a quantum system is known exactly, i.e. ρ = |ψ⟩⟨ψ|, then it is denoted a pure state, while where there is (epistemic) uncertainty about its state, it is a mixed state, i.e. ρ = Σ_i pi ρi where Tr(ρ²) < 1 (as all pi < 1). Such properties of pure and mixed states are important
tests for the effects of, for example, decoherence arising from various sources of noise
(see Chapter 3 for more detail).
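The purity criterion Tr(ρ²) above is simple to verify numerically; a minimal sketch (NumPy, with arbitrary example states):

```python
import numpy as np

def purity(rho: np.ndarray) -> float:
    return np.trace(rho @ rho).real

pure = np.array([[1, 0], [0, 0]], dtype=complex)   # rho = |0><0|
mixed = 0.5 * np.eye(2, dtype=complex)             # maximally mixed qubit
print(purity(pure))    # 1.0 : pure state
print(purity(mixed))   # 0.5 : mixed state, Tr(rho^2) < 1
```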
In many applications of quantum information processing, we seek a metric to
ascertain similarity between quantum states. Density matrices can be used to define
a measure such as the quantum relative entropy or von Neumann entropy, noted in our discussion of metrics (relevant to quantum machine learning below in subsection A.1.8).
A superposition may be transformed into another state with different relative phases. Such a state is coherent in the sense that

c₁e^{iθ₁}ψ₁ + c₂e^{iθ₂}ψ₂ = c₁ψ₁ + c₂ψ₂

holds only when the relative phases are trivial; relative phase is thus central to measurement statistics by which states and registers are described. We now briefly describe quantum channels due to their relevance, in our
context, to measurement and system evolution.
The concept of a channel derives from information theory [237] as a way of ab-
stracting a medium or instrument of information transfer. The concept has since
been adapted for application in quantum information theory such that a quantum
channel is a linear map from one space of square operators to another satisfying the two conditions of (complete) positivity and trace preservation.
Φ : B(H1 ) → B(H2 )
such that Φ is (a) completely positive and (b) trace-preserving. Maps satisfying these
two properties are denoted CPTP maps.
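Concretely, a CPTP map admits a Kraus (operator-sum) representation Φ(ρ) = Σk Ak ρ Ak† with Σk Ak†Ak = I. The sketch below (NumPy; the amplitude-damping Kraus operators and the rate γ are standard textbook examples, not parameters from this work) checks completeness and trace preservation:

```python
import numpy as np

# Amplitude-damping channel with decay probability gamma (example parameter)
gamma = 0.3
A0 = np.array([[1, 0], [0, np.sqrt(1 - gamma)]], dtype=complex)
A1 = np.array([[0, np.sqrt(gamma)], [0, 0]], dtype=complex)
kraus = [A0, A1]

# Completeness: sum_k A_k^dag A_k = I  (trace preservation)
assert np.allclose(sum(A.conj().T @ A for A in kraus), np.eye(2))

def channel(rho):
    """Phi(rho) = sum_k A_k rho A_k^dag  (operator-sum form of a CPTP map)."""
    return sum(A @ rho @ A.conj().T for A in kraus)

rho = 0.5 * np.ones((2, 2), dtype=complex)   # |+><+|
out = channel(rho)
assert np.isclose(np.trace(out), 1.0)        # trace preserved
print(out)
```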
The set of such channels is denoted C(H1 , H2 ) and C(H) for C(H, H). Of
particular importance is the concept of unitary channels.
Definition A.1.30 (Unitary Channel). Given the unitary operator U ∈ B(H), then
the following map is a unitary channel:
Φ(X) = U XU †
Φ ∈ C(H1 , H2 )
where pm = Tr(Mm ρ Mm†) is the probability of obtaining outcome m, and Em represents the eigenstates corresponding to outcome m projected onto a diagonal basis (i.e. diagonalised with entries corresponding to measurement outcomes, equivalent to Ea,a for a = m in the diagonal basis operators) to represent the post-measurement state.
Hence, quantum-classical channels are general measurement channels in quantum in-
formation. Other features of quantum-classical channels, such as their compactness
and convexity over H and the existence of product measurements (for multi-state
systems), are noted in the literature (and are ultimately important to both controllability/reachability and learnability). We explore their implementation in terms of ma-
chine learning techniques in later Chapters. Another concept of importance in mea-
surement and quantum machine learning is that of partial measurements, especially
the relationship between partial measurements and the partial trace operation.
Partial measurements are thus related to partial traces and, in geometric con-
texts (as discussed below), contraction operations over tensor fields (where the trace
operation can, under certain geometric conditions relevant to quantum control, be
related to tensorial contraction).
i dψ/dt = Hψ   (A.1.18)
We examine the formalism and consequences of this axiom below. In later Ap-
pendices below, we connect the evolution of quantum systems to geometric and
algebraic formalism (in terms of transformations along Lie group manifolds). Before
we do so, we include a short digression on the two perspectives of quantum evolution
formalism.
We also mention these well-known formalisms as we can connect both with a more
geometric formalism for expressing quantum evolution in Chapters 4 and 5, such
as the Maurer-Cartan forms and the adjoint action of Lie groups. It is also worth
mentioning the hybrid interaction (or von Neumann) picture in which both states
and operators evolve, expressed (as we discuss further on) in terms of the Liouville
von Neumann equation (for closed quantum systems) and the master equation (for
open quantum systems [144]):
ρ̇ = −i[H, ρ] + κD[Ô]ρ,   D[Ô]ρ = ÔρÔ† − ½(Ô†Ôρ + ρÔ†Ô)   (A.1.19)

dρ/dt = −i[H, ρ],   ρ(t) = e^{−iHt} ρ₀ e^{iHt}   (A.1.20)
for ρ0 = ρ(t = 0). In unitary operator formalism, the above can be written:
dU U⁻¹ = −iH dt   (A.1.21)
where ψ(t) = U (t)ψ0 . It is useful to unpack this formalism and define the Hamil-
tonian. In later Chapters, we connect the Hamiltonian to geometric, algebraic and
variational principles (such as the framing of equation (A.1.21) in terms of the
Maurer-Cartan form). Hamiltonians as a formalism arise from variational tech-
niques (see [44] for a detailed discussion) for determining equations of motion.
Solving the time-dependent Schrödinger equation given in definition A.1.18 is often challenging or infeasible, requiring perturbation methods or other techniques. A common approximation used in quantum information processing and quantum control is the time-independent approximation to the Schrödinger equation. Recall from equation (A.1.22) that the time-dependent solution to the closed-system Schrödinger equation is given by:

U(T) = T₊ e^{−i ∫_0^T H(t)dt}   (A.1.23)
The time-ordering reflects the fact that the generators within the time-dependent
Hamiltonian do not, in general, commute at different time instants (i.e. [H(ti ), H(tj )] ̸=
0). In certain cases, a time-independent approximation to equation (A.1.22) may
be adopted:
U(T) = T₊ e^{−i ∫_0^T H(t)dt}   (A.1.24)
≃ lim_{N→∞} e^{−iH(t_N)Δt} e^{−iH(t_{N−1})Δt} · · · e^{−iH(t₀)Δt}   (A.1.25)
where we have applied equation (B.2.14). In such cases, the time interval [0, T ]
is divided into equal segments of length ∆t while the Hamiltonian is considered
constant over each such segment. Such results are crucial to time-independent optimal
control theory where control problems are simplified by assuming that Hamiltonians
may be approximated as constant over small time intervals such that the system is described by the control function u(t) applied over each such period. Even when only a subset of the Lie algebra g is available, we may be able to achieve universal control (that is, the ability to synthesise all generators by repeated application of the control Hamiltonian) via commutator terms that in turn allow us to generate the full set of generators of the corresponding Lie algebra. This is framed geometrically in terms of the distribution ∆ (a geometric framing of our set of generators) being 'bracket-generating'.
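A minimal numerical sketch of the piecewise-constant approximation in equations (A.1.24)–(A.1.25) (Python with SciPy; the time-dependent Hamiltonian is an arbitrary example with non-commuting generators):

```python
import numpy as np
from scipy.linalg import expm

X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def H(t):
    # Example time-dependent Hamiltonian (non-commuting drift and drive)
    return Z + np.cos(2 * t) * X

T, N = 1.0, 2000
dt = T / N

# U(T) ~ product of short-time propagators with H held constant on each slice
U = np.eye(2, dtype=complex)
for k in range(N):
    U = expm(-1j * H(k * dt) * dt) @ U     # later slices act last (left-multiply)

assert np.allclose(U.conj().T @ U, np.eye(2))  # unitarity preserved
print(U)
```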
The approximation is generally available under the assumption that the Hamiltonian is constant over increments ∆t, as mentioned above. Of central importance to time-independent approximations is the Baker-Campbell-Hausdorff (BCH) formula (definition B.2.18). For universal control in quantum systems, repeated application of
the control Hamiltonian allows us to synthesize all generators of the Lie algebra
corresponding to the system’s dynamics. This is where the BCH formula becomes
crucial, as it allows for the generation of effective Hamiltonians that include com-
mutator terms, thus expanding the set of reachable operations. For example, the
BCH formula shows us that by carefully sequencing the application of H₁ and H₂, we can effectively evolve under the Hermitian generator −i[H₁, H₂] formed from the commutator:

e^{−[H₁,H₂]Δt} ≈ (e^{−iH₁s} e^{−iH₂s} e^{iH₁s} e^{iH₂s})ⁿ,   s = √(Δt/n)   (A.1.27)
for sufficiently large n and small ∆t. This approximation is powerful for designing
control sequences in quantum systems with non-commuting dynamics.
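A numerical sketch of the group-commutator sequence in equation (A.1.27) (Python with SciPy; the Pauli generators are arbitrary example choices, and the target is the effective Hermitian generator −i[H₁, H₂]):

```python
import numpy as np
from scipy.linalg import expm

X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
H1, H2 = X, Z

dt, n = 0.1, 400
s = np.sqrt(dt / n)

# One group-commutator cycle e^{-iH1 s} e^{-iH2 s} e^{iH1 s} e^{iH2 s}
cycle = expm(-1j*H1*s) @ expm(-1j*H2*s) @ expm(1j*H1*s) @ expm(1j*H2*s)
U_seq = np.linalg.matrix_power(cycle, n)

# Target: evolution under the effective Hermitian generator -i[H1, H2]
H_eff = -1j * (H1 @ H2 - H2 @ H1)
U_target = expm(-1j * H_eff * dt)

print(np.linalg.norm(U_seq - U_target))   # -> small; shrinks as n grows
```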
A.1.6 Measurement
Measurement is at the centre of quantum mechanics and quantum information pro-
cessing. In the section below, we set out two further axioms relating to measurement,
adapted from the literature (see [11,45]). The first concerns measurement and oper-
ators. The second concerns the characterisation of post-measurement states. Mea-
surement formalism varies to some degree depending on how information-theoretic
the adopted approach is. We incorporate elements of both the traditional and
information-based formalism. We have aimed to present a useful coverage of mea-
surement formalism specifically tailored to later Chapters, where measurement plays
a critical role, for example, in both simulated quantum systems (for use in state
description and other tasks) and machine learning. The first axiom defines mea-
surement. We split this into two in terms of (a) measurement probabilities (e.g.
the Born rule) and (b) the effect of measurement (wave collapse). Note here we
adopt an operator formalism where measurement is represented by a set of oper-
ators {M}, the outcomes of which are measurements m ∈ Σ (i.e. mapping to our register configurations), such that we can write {Mm | m ∈ Σ} ⊂ Pos(H).
The reference to Mᵏ (following Hall [45]) is to the fact that the k-th power of the
measurement operator M corresponds to the k-th moment of the probability distri-
bution for measurement outcomes associated with M (i.e. expectation, variance and
so on). Here M is a measurement operator with eigenvectors em and eigenvalues m
corresponding to outcomes of measurement.
The second measurement axiom (which can be viewed as an extension of the
first) describes the effect of measurement on quantum states, namely the collapse of
the wave function into eigenstates of the measurement operator.
p(m) = ⟨ψ|Mm†Mm|ψ⟩ = Tr(Mm†Mm ρ)   (A.1.28)

|ψ′⟩ = Mm|ψ⟩ / √(⟨ψ|Mm†Mm|ψ⟩)   (A.1.29)

ρ′ = Mm ρ Mm† / ⟨Mm†Mm, ρ⟩   (A.1.30)

The set of measurement operators satisfies Σm Mm†Mm = I, reflecting the probabilistic nature
of measurement outcomes. Measurement is modelled as a random variable in Σ,
described by the probability distribution p ∈ P(Σ). The act of measurement tran-
sitions M : ρ → ρ′ described by equation (A.1.29). In quantum information theory,
we further refine the concept of measurement as per below.
A measurement may thus be characterised as a probability measure from the set of measurement outcomes to the set Pos(H):

µ : Σ → Pos(H),  m ↦ µ(m) := Mm†Mm,  Σ_{m∈Σ} µ(m) = I   (A.1.31)

Here Σ is the set of measurement outcomes m (which describe our quantum state ρ) following application of Mm (note we include µ(m) := Mm†Mm as a bridge between the commonplace formalism of Nielsen et al. [11] and the slightly more information-theoretic form in Watrous [43]). In this notation (from [43]), p(m) = ⟨µ(m), ρ⟩.
When a measurement is performed on a system described by the density operator
ρ, the probability of obtaining outcome m is given by:
p(m) = Tr(Mm ρ Mm†)   (A.1.32)

and the state of the system after the measurement (post-measurement state) is

ρm = Mm ρ Mm† / p(m).   (A.1.33)
Connecting the standard terminology with the more information-theoretic terminology for measurement, we note that {Mm | m ∈ Σ} ⊂ B(H) such that Σm Mm†Mm = I.
Measurement then corresponds to m ∈ Σ being selected at random (i.e. modelled as a
random variable) with probability given by equation (A.1.28) and post-measurement
state given by equation (A.1.30). Such measurements are described as non-destructive
measurements.
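A minimal simulation of equations (A.1.28)–(A.1.30) for a computational-basis measurement of a single qubit (NumPy; the input state is an arbitrary example):

```python
import numpy as np

rng = np.random.default_rng(0)

# Measurement operators for the computational basis: M_m = |m><m|
M = [np.diag([1, 0]).astype(complex), np.diag([0, 1]).astype(complex)]

psi = np.array([1, 1j]) / np.sqrt(2)          # example state

# Born rule: p(m) = <psi| M_m^dag M_m |psi>
p = [np.vdot(psi, Mm.conj().T @ Mm @ psi).real for Mm in M]
assert np.isclose(sum(p), 1.0)                # completeness sum_m M_m^dag M_m = I

m = rng.choice(2, p=p)                        # outcome selected at random
psi_post = M[m] @ psi / np.sqrt(p[m])         # post-measurement state (A.1.29)
print(m, psi_post)
```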
We calculate the probability distribution over outcomes via the Born rule using the trace, noting the expectation as per equation (A.1.14). Measurement thus has a representation as a quantum-to-classical channel ΦM ∈ C(X, Y) (see definition A.1.31) under the condition that it is completely dephasing, i.e. ΦM(ρ) is diagonal for all ρ ∈ D(X). Such quantum-to-classical channels are those that can be realised as a measurement of a register X. As Watrous (§2.3) notes, there is a correspondence between the mapping of measurement outcomes m to operators M ∈ Pos(X). Com-
bining Watrous and Nielsen et al.’s formalism, Φ(X) describes a classical description
(distribution) of the outcome probabilities when a quantum state ρ is measured:
ΦM(ρ) = Σ_{m∈Σ} Tr(Mm ρ Mm†) |m⟩⟨m| = Σ_{m∈Σ} ⟨µ(m), ρ⟩ Em   (A.1.36)

η : Σ → Pos(⊗k Xk)   (A.1.37)

η(m) = Tr_{Xk}(I_{k−1} ⊗ µ(m) ⊗ I_{k+1})(ρ)   (A.1.38)
where Xk ∈ Hk (note sometimes the trace above is denoted TrHk to indicate tracing
out of the k-th subsystem). The measurement µ(m) essentially maps Xk (which,
recall, could be a superposition state) to a classical state scaled by m (and in this
sense we can think of measurement as a tensorial contraction as discussed in later
Chapters). In this formalism, the post-measurement state in equation (A.1.29)
carries over, with the denominator supplying the normalisation scalar.
We are also concerned in particular with projective measurements, where each measurement is a projection operator, i.e. µ(m) ∈ Proj(H). Each projective measurement projects the state ρ into an eigenstate of the respective projection operator. The set of such operators {µ(m) | m ∈ Σ} is an orthogonal set, which means there are at most dim H distinct outcomes m. It can be shown that any measurement can be framed as a projective measurement on a suitably enlarged space.
Positive operator-valued measure (POVM) formalism more fully describes the measurement statistics and post-measurement state of the system. For a POVM, we define a set of positive semi-definite operators {Em} = {Mm†Mm} satisfying Σm Em = I in a way that gives us a complete set of positive operators (such formalism being more general than simply relying on projection operators).
Equation (A.1.41) above expresses the same principle of equation (A.4.6) in operator
formalism with the inner product. The uncertainty (variance) in measurements of
A is given by (∆A)² = ⟨A²⟩ψ − ⟨A⟩ψ².
The expectation value together with the uncertainty of measurements of A are im-
portant measurement statistics of A with respect to state ψ. We are also often
interested in how measurement statistics evolve over time which can be modelled
as:
d⟨A⟩ψ/dt = ⟨−i[A, H]⟩   (A.1.43)
where [·, ·] denotes the commutator [A, B] = AB − BA. The commutator (in the
form of the Lie derivative) is more fully described in Appendix B and proposition
(B.2.2).
In particular, for EPR pairs (Bell states) this means that certain states such as:
|ψ⟩ = (1/√2)(|00⟩ + |11⟩)   (A.1.46)
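The entangled character of the Bell state (A.1.46) is reflected in its reduced density operator: tracing out either qubit leaves the maximally mixed state even though the global state is pure. A minimal sketch (NumPy):

```python
import numpy as np

bell = np.zeros(4, dtype=complex)
bell[0] = bell[3] = 1 / np.sqrt(2)            # (|00> + |11>)/sqrt(2)
rho = np.outer(bell, bell.conj())

# Trace out qubit B: reshape to rho[a, b, a', b'] and contract b = b'
rho_A = np.einsum('abcb->ac', rho.reshape(2, 2, 2, 2))
print(rho_A)                                   # -> I/2, the maximally mixed state
print(np.trace(rho_A @ rho_A).real)            # purity 0.5 despite global purity 1
```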
Metrics play a central technical role in classical machine learning, fundamentally be-
ing the basis upon which machine learning algorithms update, via techniques such
as backpropagation. Indeed metrics at their heart concern the ability to quantifi-
ably distinguish objects in some way and to this extent have been integral to the
very concept of information as portended by Hartley in which information refers to
quantifiable ability of a receiver to distinguish symbolic sequences [247]. Hartley’s
original measure of information prefigured advanced approaches to quantifying and
describing information, such as Kolmogorov and others [248].
Metrics for quantum information processing are related to but distinct from their classical counterparts, and understanding these differences is important for researchers
applying classical machine learning algorithms to solve problems involving quantum
data. As is commonplace within machine learning, chosen metrics will differ depend-
ing on the objectives, optimisation strategies and datasets. For a classical bit string,
there are a variety of classical information distance metrics used in general [238]. In
more theoretical and advanced treatments, available metrics will depend upon the
underlying structure of the problem (e.g. topology) (see [43] for a comprehensive
discussion). Metrics used will depend also upon whether quantum states or opera-
tors are used as the comparators, though one can relatively easily translate between
operator and state metrics. We outline a number of commonly used quantum met-
rics below and discuss their implementation in classical contexts, such as in loss
functions. Note below we take license with the term metric as certain measures be-
low, such as quantum relative entropy, do not (as with their classical counterparts)
strictly constitute metrics as such.
State discrimination for both quantum and probabilistic classical states requires
incorporation of stochasticity (the probabilities) together with a similarity measure.
The Holevo-Helstrom theorem quantifies the probability of distinguishing between
two quantum states given a single measurement µ. State discrimination here is a
binary classification problem.
λ⟨µ(0), ρ0⟩ + (1 − λ)⟨µ(1), ρ1⟩ ≤ ½ + ½ ||λρ0 − (1 − λ)ρ1||₁   (A.1.47)
F(ρ, σ) = ||√ρ √σ||₁ = Tr √(√σ ρ √σ)   (A.1.49)
Finally, the relationship of fidelity F and trace function || · ||1 for state ρ, σ ∈ D(H)
is given by Fuchs-van de Graaf inequalities, which can be expressed as:
2 − 2F(ρ, σ) ≤ ||ρ − σ||₁ ≤ 2√(1 − F(ρ, σ)²)   (A.1.51)

Fidelity and trace distance are related via D(ρ, σ) = √(1 − F(ρ, σ)²). Fidelity can
also be interpreted as a metric by calculating the angle ζ = arccos F (ρ, σ). Note in
some texts that equation (A.1.49) is sometimes denoted root fidelity while fidelity
is its square (see [241]) and that fidelity can be related to transition probability
between mixed states [249].
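A numerical sketch of trace distance (A.1.15), fidelity (A.1.49) and the Fuchs-van de Graaf bounds (A.1.51) (Python with SciPy; the two density matrices are arbitrary examples):

```python
import numpy as np
from scipy.linalg import sqrtm

def trace_distance(rho, sigma):
    eigs = np.linalg.eigvalsh(rho - sigma)
    return 0.5 * np.abs(eigs).sum()            # (1/2)||rho - sigma||_1

def fidelity(rho, sigma):
    s = sqrtm(rho)
    return np.trace(sqrtm(s @ sigma @ s)).real # Tr sqrt(sqrt(rho) sigma sqrt(rho))

rho = np.array([[0.75, 0.25], [0.25, 0.25]], dtype=complex)   # example state
sigma = 0.5 * np.eye(2, dtype=complex)                        # maximally mixed

F, D = fidelity(rho, sigma), trace_distance(rho, sigma)
# Fuchs-van de Graaf in trace-distance form: 1 - F <= D <= sqrt(1 - F^2)
print(F, D, 1 - F <= D + 1e-9, D <= np.sqrt(1 - F**2) + 1e-9)
```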
Other metrics common in quantum machine learning literature include:
1. Hamming distance, the number of places at which two bit strings are unequal.
These measures provide a further basis for comparing the output of algorithms to labelled data during training.
We discuss the use and relationship of such metrics to quantum machine learning
algorithms deployed in further chapters in more detail in Appendix D.
A.2.1 Overview
Quantum information tasks can be modelled in terms of a two-step experiment: (a) a quantum state and measurement instrument preparation procedure (isolating the quantum system in a particular state); (b) a measurement step where the instrument
interacts with the quantum system to yield measurement statistics. Obtaining such
statistics (by which the state preparation can potentially be confirmed) occurs via
multiple repetitions of such experiments. While quantum systems are represented
distinctly from classical systems, their state preparation, control and measurement
are classically parametrised (e.g. by parameters in R or C). Information can be distin-
guished between classically described and evolving (classical information) as distinct
from being described and evolving according to quantum formalism (quantum infor-
mation). In certain cases, such as with certain classes of quantum machine learning
or variational algorithms [73, 88, 100, 251], parameter registers may themselves be
represented (or stored within) quantum registers. However, as noted above, quan-
tum registers themselves concern distributions and evolution over classical registers
(albeit with certain non-classical features). Thus ultimately quantum information
processing, and tasks involving quantum systems, involve constructions using clas-
sical information.
Our work and results in later Chapters are focused upon typical quantum control
problems [15, 46, 144], where it is assumed there exists a set of controls (such as pulses
or voltages) with which the quantum system may be controlled or steered towards a
target state (or unitary). Recall that in density matrix formalism, the Schrödinger
equation in terms of a density operator and Hamiltonian is:
i dρ/dt = [H(t), ρ(t)]   (A.2.2)
In physical systems, the Hamiltonian corresponds to the total energy (sum of kinetic and potential
energies) of the system under consideration. In control theory, we can separate
out the Hamiltonian in equation (A.2.2) into drift and control forms. For a closed
system (i.e. a noiseless isolated system with no interaction with the surrounding
environment), it can be expressed in the general form:
H(t) = H0(t) := Hd(t) + Hctrl(t)   (A.2.3)
Hd (t) is called the drift Hamiltonian and corresponds to the natural evolution of
the system in the absence of any control. The second term Hctrl (t) is called the
control Hamiltonian and corresponds to the controlled external forces we apply to
the system (such as electromagnetic pulses applied to an atom, or a magnetic field
applied to an electron). This is also sometimes described as an interaction Hamil-
tonian (representing how the system interacts with classical controls e.g. pulses or
other signals). This allows us to define a control-theoretic form of the Schrödinger
equation.
i dρ/dt = [Hd(t) + Hctrl(t), ρ(t)]   (A.2.4)
where ρ(0) is the initial state of the system, the unitary evolution matrix U (t) is
given by equation (A.1.18). We discuss control and its application in some detail in
later Chapters, specifically relating to geometric quantum control as expressed by
formal (Pontryagin) control theory where targets are unitaries UT ∈ G for unitary
groups G. As D’Alessandro [15] notes, the typical quantum control methodology
covers: (a) obtaining the Hamiltonian of the system in an appropriate form, (b)
identifying the internal and interaction components of the Hamiltonian (and often
calculating the energy eigenstates for a time-independent approximation) and (c)
specifying a finite dimensional and bounded control system for control coefficients.
Here we sketch some of the control theory discussed in later Chapters. D'Alessandro
[15] notes that a general control system has the following form (which as we discuss
is due to Pontryagin):
ẋ = f (t, x, u) (A.2.6)
where x represents the system state, f is a vector field and u = u(t) are real-valued time-varying (or constant over small ∆t intervals) control functions. The state is in general not directly accessible, only via some other function or channel, e.g. g(x), say a measurement operation. In quantum settings, equation (A.2.6) is described by the Schrödinger equation. In unitary form this is equivalent to:

U̇(t) = −iH(u(t))U(t)   (A.2.7)
where H(u(t)) is a Hamiltonian comprising controls and generators (for us, drawn
from a Lie algebra g). The drift Hamiltonian Hd and control Hamiltonian Hc com-
bine as:
H(u) = Hd + Σk Hk uk   (A.2.8)
Such solutions are drawn from Jurdjevic [23] from geometric control theory litera-
ture. These are a major focus of our final chapter, where we show equivalent results
can be obtained for certain symmetric space quantum control problems by using a
global Cartan decomposition together with certain variational techniques. The form
of equation (A.2.9) can be easily construed in an information-theoretic fashion, such
as when logic gates (e.g. upon qubits) are sought to be engineered.
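A minimal sketch of the bilinear control form (A.2.8) under piecewise-constant controls (Python with SciPy; the drift, control generators and pulse amplitudes are arbitrary example choices rather than optimised controls):

```python
import numpy as np
from scipy.linalg import expm

X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

Hd = Z                      # drift Hamiltonian
Hc = [X, Y]                 # control generators H_k
u = np.array([[0.8, -0.2], [0.1, 0.5], [-0.4, 0.3]])   # u_k per time slice
dt = 0.2

# U = prod_j exp(-i (Hd + sum_k u_jk H_k) dt), later slices acting last
U = np.eye(2, dtype=complex)
for u_j in u:
    H = Hd + sum(uk * Hk for uk, Hk in zip(u_j, Hc))
    U = expm(-1j * H * dt) @ U

print(U)                     # candidate approximation to a target U_T in SU(2)
```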
As noted in [15], an important distinction between classical and quantum control
where |·| denotes the operator norm. This theorem is denoted the quantum recurrence
theorem.
A.3.1 Overview
We conclude this background Appendix with a brief synopsis of open quantum
systems and quantum control. This is of particular relevance to Chapter 3, where our
QDataSet models the influence of noise upon one- and two-qubit systems. Moreover,
the open-system Hamiltonian may be decomposed as:

H(t) = H0(t) + H1(t) = [Hd(t) + Hctrl(t)] + [HSE(t) + HE(t)]   (A.3.1)
H0 (t) is defined as before to encompass the drift and control parts of the Hamilto-
nian. The new term H1 (t) now consists of two terms: HSE (t) represents an inter-
action term with the environment, while HE (t) represents the free evolution of the
environment in the absence of the system. In this case, the Hamiltonian controls the
dynamics of both the system and environment combined in a highly non-trivial way.
In other words, the state becomes the joint state between the system and environ-
ment. The combined system and environment then become closed. Modelling such
open quantum systems is complex and challenging and is typically undertaken using
a variety of stochastic master equations [144] or sophisticated noise spectroscopy.
As detailed in Chapter 3, the QDataSet contains a variety of noise realisations for
one and two qubit systems together with details of a recent novel operator [82] for
characterising noise in quantum systems. As such we briefly summarise a few key
concepts relating to open quantum systems. Detail can be found in Wiseman and
Milburn [144] and other standard texts.
and (c) complete positivity, namely S must not only map positive operators to
positive operators for the system but also when tensored with the identity operator
on any auxiliary system R.
Definition A.3.2 (Lindblad Master Equation). The mixed unitary and non-unitary
time-evolution of a quantum system represented by ρ interacting with its environment
is given by the Lindblad master equation.
dρ/dt = −i[H, ρ] + Σk γk (Lk ρ Lk† − ½{Lk†Lk, ρ})   (A.3.5)
Here ρ is the density matrix of the quantum system, H is the Hamiltonian of the system, dictating the unitary evolution due to the system's internal dynamics, Σk indicates a sum over all possible noise channels k affecting the system, and γk are the rates at which the system interacts with the environment through the k-th channel,
quantifying the strength of the non-unitary processes. Of importance are Lk , the
Lindblad operators associated with each noise channel, describing how the system
interacts with its environment and, in particular, how the environment transforms
ρ in a way that affects properties of interest, such as its coherence.
Lindblad operators Lk are not superoperators themselves, but they are integral
components in the definition of the Lindblad superoperator, which describes specific
physical processes (e.g., photon emission, absorption, or scattering). In contrast, the
superoperator L represents the entire evolution of the quantum system including the
dissipative dynamics. To understand this within the context of quantum trajectories,
where the evolution of a system is modeled under continuous measurement, the
stochastic master equation can be written as:
dρ = −i[H, ρ] dt + Σk L[Lk]ρ dt + √η K[L]ρ dWt   (A.3.6)
In the equations above, L represents the Lindblad operator, ρ is the density matrix of
the system, η is the efficiency of the measurement, and dWt is the Wiener increment
representing the stochastic process of the measurement that models the infinitesimal
evolution of processes that have continuous paths. The equation comprises:
(i) a Hamiltonian term −i[H, ρ], describing the unitary evolution of a closed sys-
tem;
(ii) a Lindbladian (superoperator) term L[L]ρ, which describes the average effects
of system-environment interactions and is given by:
L[Lk]ρ = γk (Lk ρ Lk† − ½{Lk†Lk, ρ})   (A.3.7)
(iii) a backaction (superoperator) term K[L]ρ, which accounts for changes in the state conditioned on the continuous measurement record:

√η K[L]ρ = √η (Lρ + ρL† − ρ Tr(Lρ + ρL†)).   (A.3.8)
where η denotes the influence of the backaction term. As Wiseman and Mil-
burn note, in the Heisenberg picture the backaction is manifest in the system
operators rather than state.
Note for completeness that the Lk terms represent the channels through which the
quantum system interacts with its environment i.e. the quantum trajectories. By
contrast, the backaction term seeks to model the effect of the (usually single) mea-
surement channel (which is why we do not sum over k).
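The averaged (non-stochastic) part of this evolution can be sketched with a simple explicit Euler integration of equation (A.3.5) (NumPy; a single amplitude-damping channel L = σ₋ with example rate γ, omitting the measurement backaction term):

```python
import numpy as np

sm = np.array([[0, 1], [0, 0]], dtype=complex)   # sigma_- : |1> -> |0> decay
H = np.array([[1, 0], [0, -1]], dtype=complex)   # example Hamiltonian
gamma, dt, steps = 0.5, 0.001, 5000

rho = np.array([[0, 0], [0, 1]], dtype=complex)  # start in excited state |1><1|

def lindblad_rhs(rho):
    unitary = -1j * (H @ rho - rho @ H)
    LdL = sm.conj().T @ sm
    dissipator = gamma * (sm @ rho @ sm.conj().T - 0.5 * (LdL @ rho + rho @ LdL))
    return unitary + dissipator

for _ in range(steps):
    rho = rho + dt * lindblad_rhs(rho)           # explicit Euler step

print(rho.real)                                  # population relaxes towards |0><0|
print(np.trace(rho).real)                        # trace preserved (~1)
```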
with frequency ω of the pulse (e.g. some signal) and F (ω) represents the amplitude
of the control pulse. If F (ω) is confined within a specific frequency band, |ω| ≤ Ω0 ,
then the quantum system’s response to environmental noise can be represented by
the convolution:
I = ∫_{−∞}^∞ dω F(ω) S(ω),   (A.3.10)
where S(ω) is the noise power spectral density (PSD), a measure of noise intensity. In
such cases, only noise components within the control’s effective bandwidth, |ω| ≤ Ω0 ,
are relevant to solving the quantum control problem. Connecting with the Lindblad
master equation (see equation A.3.5), each Lindblad operator Lk corresponds to a
type of noise interaction between the quantum system and the environment. The
decoherence rates γk can be derived from correlation functions of noise operators
coupling to the system as represented via the Lk operators. Correlation functions
capture temporal correlations of the noise operators β(t) on the system. The (two-time) correlation function takes the form G(t) = ⟨β(t)β(0)⟩,
where ⟨·⟩ denotes the expectation with respect to the environmental state, and
β(t) = eiHenv t β(0)e−iHenv t in the Heisenberg picture with Henv being the Hamiltonian
of the environment. The PSD S(ω) is then obtained from the Fourier transform of
G(t):
S(ω) = ∫_{−∞}^∞ G(t) e^{−iωt} dt.   (A.3.12)
where the proportionality constant η depends upon specific interactions between the
system and environment, such as coupling constants or environmental energy states.
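Equation (A.3.12) can be sketched numerically by discretising G(t) and applying a fast Fourier transform (NumPy; the Gaussian-decaying correlation function is an arbitrary example chosen because its transform is known analytically):

```python
import numpy as np

tc, dt, N = 1.0, 0.01, 4096
t = (np.arange(N) - N // 2) * dt
G = np.exp(-t**2 / (2 * tc**2))                 # example correlation function

# S(omega) = int G(t) e^{-i omega t} dt, via FFT with continuum normalisation dt
S = np.fft.fftshift(np.fft.fft(np.fft.ifftshift(G))) * dt
omega = 2 * np.pi * np.fft.fftshift(np.fft.fftfreq(N, d=dt))

# For Gaussian G, the PSD is Gaussian of width ~ 1/tc (analytic check)
S_exact = np.sqrt(2 * np.pi) * tc * np.exp(-(omega * tc)**2 / 2)
print(np.max(np.abs(S.real - S_exact)))        # small discretisation error
```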
Definition A.4.1 (Compact Set). The set A ⊆ V is compact if, for every sequence
(vj ) in A, there exists a subsequence (vjk ) that converges to a vector v ∈ A i.e.
(vjk) → v. For finite-dimensional spaces, a set A ⊆ V is compact if and only if it is both closed and bounded, by the Heine-Borel theorem.
A continuous real-valued function attains a maximum and minimum value on A (i.e. analogous to the mean value theorem). For A ⊆ V compact and f : V → W continuous on A, continuous maps preserve compactness, i.e. the image f(A) = {f(v) : v ∈ A} is compact.
Given A ⊆ V (C), B ⊆ W (C), we set some basic properties of measurable sets
relevant to various chapters.
Definition A.4.2 (Borel sets). A Borel set or subset C ⊂ A is a set for which (a) C is open in A, (b) C is the complement of another Borel subset, or (c) C = ∪_{k=1}^∞ Ck for a countable set {Ck} of Borel subsets Ck ⊂ A.
with:

E(X) = ∫_0^∞ Pr(X ≥ λ) dλ   (A.4.4)

for X ≥ 0.
A similar formalism then applies for quantum states described via alphabets Σ
with probability vectors p ∈ P(Σ). The random variable X takes the form of a
mapping X : Σ → R such that for Γ ⊆ Σ:
Pr(X ∈ Γ) = Σ_{a∈Γ} p(a)   (A.4.5)
with p(a) the probability of the state described by a vector in Σ. Expectation becomes the familiar discretised form E(X) = Σ_a p(a)X(a). It can be shown that
γ(A) = (2π)^{−1/2} ∫_A exp(−α²/2) dα   (A.4.6)

γn(A) = (2π)^{−n/2} ∫_A exp(−||u||²/2) dνn(u)   (A.4.7)

γn(UA) = γn(A)   (A.4.8)
where A ⊆ Rⁿ is Borel and U is an orthogonal transformation of Rⁿ. The consequence is that for i.i.d. random variables Xk, the Gaussian measure projected onto a subspace is equivalent to a Gaussian measure on that subspace, i.e.:

Yk = Σ_{j=1}^n Ukj Xj   (A.4.9)
Definition A.4.8 (Haar measure). The Haar measure on a locally compact group
G is a measure µH that is invariant under the group action. For the unitary group
U (X ), it is defined as:
µH : Borel(U (X )) → [0, 1]
satisfying µH(gA) = µH(A) for all g ∈ U(X) and all Borel sets A ∈ Borel(U(X)).
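Numerically, Haar-random unitaries are commonly sampled via the QR decomposition of a complex Gaussian (Ginibre) matrix with a phase correction, a standard construction (e.g. Mezzadri); a minimal sketch (NumPy):

```python
import numpy as np

def haar_unitary(n: int, rng=np.random.default_rng()) -> np.ndarray:
    """Sample U from the Haar measure on U(n) via QR of a Ginibre matrix."""
    ginibre = (rng.standard_normal((n, n))
               + 1j * rng.standard_normal((n, n))) / np.sqrt(2)
    Q, R = np.linalg.qr(ginibre)
    # Fix the phase ambiguity of QR so the distribution is exactly Haar
    return Q * (np.diag(R) / np.abs(np.diag(R)))

U = haar_unitary(4)
assert np.allclose(U.conj().T @ U, np.eye(4))   # unitarity check
```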
Appendix (Algebra)
B.1 Introduction
B.1.1 Overview
A major focus of quantum geometric machine learning is framing quantum control
problems in terms of geometric control where target quantum computations are rep-
resented as UT ∈ G and G is a Lie group of interest. In this generic model, we are
interested then in constructing time-optimal Hamiltonians composed of generators
from the associated Lie algebra g (or some subspace thereof) and associated control
functions u(t) so as to evolve the quantum system towards the target UT in the
most efficient manner possible. Doing so according to optimality criteria, such as
optimal (minimum) time or energy in turn requires a deep understanding of quan-
tum information processing concepts, but also algebraic and geometric concepts at
the heart of much modern quantum and classical physics. By using methods that
leverage symmetry properties of quantum systems, time-optimal solutions can often
be more easily approximated to find solutions or solution forms that would other-
wise be intractable or numerically infeasible to discover. This synthesis of geometry
and algebra is made possible by seminal results in the history of modern mathe-
matics which established deep connections between algebra, analysis and geometry.
Connections between branches of mathematics were prefigured in ancient meditations of natural philosophy and became a central motivation for much of the 19th-century mathematical innovation in geometry, analysis, algebra and number
theory that led to the hitherto unparalleled development of the field from Hilbert on-
wards. The widespread application of ideas drawn from these disciplines has been of
significance within physics and other quantitative sciences, including more recently
computer science and machine learning fields.
In this Appendix, we cover the key elements of the theory of Lie algebras and
Lie groups relevant to the quantum geometric machine learning programme and
greybox machine learning architectures discussed further on. The exegesis below is
far from complete and we have assumed a level of familiarity with elementary group
and representation theory. The Appendix concentrates on key principles relevant
to understanding Lie groups and their associated Lie algebras, with a particular
focus on matrix groups and foundational concepts such as bilinear (Killing) forms,
the Baker-Campbell-Hausdorff theorem and other elementary principles. We then
proceed to a short discussion of elements of representation theory, including a discus-
sion of semi-simple Lie group classification, adjoint action, Cartan decompositions
and abstract root systems. The material below is related throughout to the novel
results in this work in later Chapters, as well as the other Appendices. The content
below is also interconnected with the following Appendix C on geometric principles
(originally a single Appendix, it was split for readability), especially regarding differ-
entiable manifolds, geodesics and geometric control. Most of the material is sourced
from standard texts such as those by Knapp [9, 230], Helgason [2], Hall [235] and
others. As with the other Appendices, proofs are largely omitted (being accessible
in the referenced texts) and content is paraphrased within exposition regarding its
applicability to the main thesis. Readers with sufficient background in these topics
may skip this Appendix.
B.2.1 Overview
An essential concept in physics is symmetry, especially rotational and Lorentz sym-
metry in various physical systems. Symmetries are often described by continuous
groups—parameterized groups with a manifold structure that allows smooth op-
erations. These are known as Lie groups, which possess a differentiable manifold
structure that supports smooth group operations. The tangent space at the identity
element of a Lie group forms a Lie algebra through a natural bracket operation,
encoding the properties of the Lie group while being more tractable due to its linear
structure.
In quantum mechanics, symmetry is reflected in the Hilbert space through a
unitary action of a symmetry group. This is formalized by a unitary represen-
tation, a continuous homomorphism from a symmetry group G into the group of
unitary operators U ∈ B(H) acting on the quantum Hilbert space H (axiom A.1.1).
However, in reality, physical states correspond to unit vectors in H differing by a
phase, so we use projective unitary representations, mapping G into the quotient
U (H)/U (1), where U (H) is the set of unitaries of H and U (1) is the group of phase
factors, or complex numbers of unit magnitude. This captures the essence of physical symmetry in quantum mechanics.
Lie groups encapsulate the concept of a continuous group allowing the study
of infinitesimal transformations (such as translations and rotations) and associ-
ated symmetry properties. The ability to analyze infinitesimal transformations
is what connects Lie groups to differential geometry and physics, particularly in
the context of studying continuous symmetries relevant to symmetry reduction.
The corresponding Lie Groups are also equipped with product and inverse maps
µ : G × G → G, ℓ : G → G. An important property of Lie groups is left translation
and the related concept of left invariance, central to the construction of invariant
measures (such as the Haar measure) on groups. For a Lie group G, left translation
by g ∈ G, denoted Lg : G → G, Lg (h) 7→ gh for h ∈ G, is a diffeomorphism from G
onto itself. Right translations are similarly defined as Rg (h) 7→ hg. Left invariance
of a vector field X is defined where dLg(Xh) = Xgh (i.e. X commutes with left translations), i.e. if, for all g ∈ G, we have dLg(Xe) = Xg, where e is the identity
element in G and dLg is the differential of left translation by g.
Given a smooth function f : G → R, we denote fg (h) = f (gh) for g, h ∈ G
the left translate. A vector field X on G is itself left invariant if (Xf )g = X(fg ).
Such left invariant X form the Lie algebra g of G. The vector field X can then be
regarded as a set of tangent vectors Xg at g ∈ G. The special relationship with the
identity is then further understood as follows: the mapping X → XI (where I is the
identity element of G) is a vector space isomorphism between g to the tangent space
XI (mapping vector fields to their vectors at the identity element, thus identifying
the Lie algebra with the tangent space at the identity), thereby (as we discuss
below and in the next Appendix) encoding the structure of the group through its
tangent space at the identity. This isomorphism preserves (and thus ‘carries’) the
Lie bracket, allowing the tangent space at the identity in G to be identified as the
Lie algebra g itself. Thus the entire Lie algebra, and thus group, can be studied in
geometric terms of the infinitesimal translations near the identity.
Definition B.2.2 (General linear group). The group of all invertible n-dimensional complex matrices is denoted the general linear group, GL(n; C) = {A ∈ Mn(C) | det A ≠ 0}.
Definition B.2.3 (Lie group homomorphism). Given matrix Lie groups G1 and
G2 , a Lie group homomorphism φ : G1 → G2 is a continuous group homomorphism.
Such a homomorphism is an isomorphism if it is bijective and with a continuous
inverse. Lie groups for which there exists such an isomorphism are equivalent (up
to the isomorphism).
This group is the group formed by the set of unitary operators in definition
A.1.18. Unitary operators preserve inner products such that U is unitary if and
only if ⟨Uv, Uw⟩ = ⟨v, w⟩ for all v, w ∈ H. The special unitary group SU(n) is the subset of U(n) with determinant 1. For the geometric study of unitary sequences
(for minimisation and time-optimal problems), we characterise such sequences as
paths on a Lie group manifold which is sufficiently connected to enable paths to be
well defined.
For qubits, SU (2) allows for a consistent and complete description of qubit states and
their transformations, which in turn is essential for the design and understanding of
quantum algorithms and the behaviour of quantum systems. This is also the case for
multi-qubit systems, which can be described in terms of tensor products of SU (2).
For multi-qubit computations (and even single-qubit ones), we are often interested
in decomposition into a sequence of single-qubit gates. The simply connected nature
of gates drawn from SU(2) guarantees the density of universal gate sets.
The matrix exponential satisfies a number of standard properties:

(i) e⁰ = I
(ii) e^{Xᵀ} = (e^X)ᵀ and e^{X†} = (e^X)†
(iii) e^{YXY⁻¹} = Y e^X Y⁻¹.
Definition B.2.6 (Lie algebra and Lie derivative). An algebra g is a vector space over a field K equipped with a bilinear form (product) [X, Y] for X, Y ∈ g that is linear in both variables. Such an algebra is a Lie algebra if the bracket is antisymmetric, [X, Y] = −[Y, X], and satisfies the Jacobi identity [X, [Y, Z]] + [Y, [Z, X]] + [Z, [X, Y]] = 0.
The product [X, Y ] above is denoted the Lie derivative (sometimes the Lie
bracket). In quantum information contexts, the Lie derivative is denoted the com-
mutator. The commutator exhibits a number of properties set out below.
1. bilinearity in both arguments, e.g. [αA, B + βC] = α[A, B] + αβ[A, C] for α, β ∈ C;
In geometric contexts, the Lie bracket (commutator) is equated with the Lie
derivative, satisfying the conditions above. We use these terms interchangeably
as appropriate for the context. Note that for g ⊂ A (where A is an associative
algebra), g can be constructed as a Lie algebra using the Lie bracket [x, y] = xy −
yx, ∀x, y ∈ g (in such cases A is the universal enveloping algebra of g). An important
characteristic of Lie algebras is the action of a Lie algebra upon itself as represented
by the adjoint action, realised in the form of the Lie derivative.
Definition B.2.7 (Adjoint action). The adjoint action of a Lie algebra upon itself, ad : g → EndK(g), is given by adX(Y) = [X, Y],
where EndK (g) represents the set of K-scalar endomorphisms of g upon itself and
X, Y ∈ g.
More intuition about the relationship between the Lie derivative and differential forms is provided by noting that the adjoint action is a derivation: the Jacobi identity is equivalent to adX([Y, Z]) = [adX(Y), Z] + [Y, adX(Z)].
where a, b ⊂ g:
for Xk , Yk ∈ gk .
For Cartan decompositions, we require the classification of semi-simple Lie groups.
This in turn requires other sundry definitions.
Definition B.2.10 (Centralizer and Normalizer). The centralizer Zg(s) of s ⊂ g is a subalgebra of g comprising elements that commute with all of s: Zg(s) = {X ∈ g | [X, S] = 0 ∀S ∈ s}.
From these definitions we obtain the concepts of g being solvable if the derived series satisfies g⁽ʲ⁾ = 0 for some j, and nilpotent if the lower central series satisfies gⱼ = 0 for some j (nilpotent implies solvable). We can now define
simple and semi-simple Lie algebras as follows:
This last test for semisimplicity is equivalent to the radical rad g = 0. Simple g are semisimple by construction. Semisimple algebras are closed under commutation, [g, g] = g (important for control), and have trivial centres. The criterion
for semi-simplicity is related together with the Killing form below to Cartan’s sem-
inal classification of Riemannian symmetric spaces (see section C.3) which is the
subject of our work in Chapter 5.
for X, Y ∈ g.
where X ∈ g or some other vector space, such as Mn (C). More intuition can be
obtained via the derivative of the matrix exponential function:
(d/dt) e^{tX} |_{t=0} = X.   (B.2.15)
It can then be shown that there exists a unique element of the Lie algebra
(representation) used to establish the relationship between a Lie algebra and Lie
group.
These relations show how continuous symmetries of Lie groups may be studied
via the Lie algebra as a tangent space at the identity element of the group. In
quantum information contexts, the matrix exponential allows the study of a system's
dynamics via mapping the Hamiltonian H (see definition A.1.33) to group elements
represented as unitary operators (definition A.1.18). We now formalise this relation-
ship.
Definition B.2.16 (Lie Algebra of a Matrix Lie Group). Given a matrix Lie group
G ⊆ GL(n; C), the Lie algebra g for G is defined as follows:
[X, Y] := XY − YX = (d/dt) e^{tX} Y e^{−tX} |_{t=0}   (B.2.16)
belongs to g. The first three properties of g say that g is a real vector space. Since
Mn (C) is an associative algebra under the operation of matrix multiplication, the last
property of g shows that g is a real Lie algebra. Equation (B.2.16) also indicates that
the Lie algebra g is the set of derivatives at t = 0 of all smooth curves γ(t) ∈ G where
such curves equal the identity at zero. It thus provides a bridge between typical
algebraic formulations of commutators and geometric representations of tangents.
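Equation (B.2.16) can be checked numerically by comparing the commutator against a finite-difference derivative of the conjugated curve (Python with SciPy; the generators are random anti-Hermitian examples):

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
X = A - A.conj().T          # anti-Hermitian example generator
B = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
Y = B - B.conj().T

def curve(t):
    # gamma(t) = e^{tX} Y e^{-tX}
    return expm(t * X) @ Y @ expm(-t * X)

h = 1e-6
derivative = (curve(h) - curve(-h)) / (2 * h)   # d/dt at t = 0
commutator = X @ Y - Y @ X

print(np.linalg.norm(derivative - commutator))  # ~ O(h^2), numerically small
```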
In the next section we provide more detail on the relationship between Lie theory
and differential geometric concepts such as curves.
The significance of the exponential map can be further understood as follows. Group actions Φ : G → G have a corresponding map in the Lie algebra ϕ : g → g such that Φ(e^X) = e^{ϕ(X)}, where ϕ(X) = (d/dt) Φ(e^{tX})|_{t=0} (implying smoothness for each X). Here ϕ is a linear homomorphism.
Theorem B.2.17 (Lie algebra tangent space correspondence). Each Lie algebra g of
G is equivalent to the tangent space to G at the identity. The algebra g is equivalent
to the set of X ∈ Mn (C) such that there exists a smooth curve γ : R → Mn (C) ⊆ G
with γ(0) = I, γ ′ (0) = X.
This can be seen given equation (B.2.18) in concert with γ(t) = eX(t) (such that
γ ′ (0) ∈ g). Moreover, for a matrix Lie group that is connected, then it can be shown
that for γ ∈ G, there is a finite set of elements in g that generate γ.
B.2.8 Homomorphisms
The relationship between Lie algebras and Lie groups via the exponential map is
considered a lifting of homomorphisms of Lie algebras to homomorphisms of (an-
alytic) groups. That is, for G, H analytic subgroups, G simply connected and the
Lie algebra homomorphism φ : g → h, then there exists a smooth homomorphism
Φ : G → H such that dΦ = φ. As [9] notes, there are two equivalent ways to express
such a lifting: either relying on lifting homomorphisms to simply connected analytic
groups, or relying on existence theorems for sets of solutions to differential equa-
tions. The first approach involves defining curves γ : R → G with γ(t) 7→ exp(tX).
One then defines d/dt and X̃ as left invariant fields on R, G such that:
dγ(t)(d/dt) f = (d/dt) f(γ(t)) = (d/dt) f(exp(tX)) = X̃f(exp(tX))
hence we see the important explicit relationship between the left-invariant vector
fields X̃ and d/dt. Among other things, this shows the mapping X → exp(X)
is smooth (given the smoothness of d/dt). Expressed in local coordinate charts,
the equation above represents a system of differential equations satisfied by curves
γ(t) represented as exponentials. Among other aspects, this relation enables the
machinery of analysis to be brought to bear in group theoretic problems in quantum
information contexts. For example, it enables the application of Taylor’s theorem
such that:
(X̃ⁿf)(g exp(tX)) = (dⁿ/dtⁿ) f(g exp(tX))

X̃f(g) = (d/dt) f(g exp(tX)) |_{t=0}   (B.2.21)
shows again how the operation of left-invariant vector fields X̃ is equated with
differential operators.
(see Hall [235] §5 for proofs including the Poincaré integral form). The BCH
formula is important in quantum control settings in particular as we discuss in
other parts of this work. Moreover, equation (B.2.22) fundamentally elucidates
the important role of adjoint action (the commutator) in shaping quantum state
evolution via its effect upon exponentials (as diffeomorphic maps on M) which
arises from the non-commutativity of the Lie bracket.
B.3.1 Overview
In this section, we briefly cover elements of the representation theory of semi-simple
groups. This is relevant to geometric treatments involving Lie algebras. Represen-
tations are abstractions (homomorphisms) of groups in terms of actions on general linear (invertible) vector spaces, i.e. on GL(n; C), the group of invertible linear transformations of a finite-dimensional vector space V, for which the associated Lie algebra is gl(n, C).
B.3.2 Representations
We begin with the definition of representations.
systems, Cartan matrices and such diagrams in quantum control settings is explored
primarily in the final Chapter of this work.
In the above, the former is the Lie algebra adjoint action and the latter the group adjoint action, the two concepts being related by the exponential map. We denote adg(X) = gXg⁻¹. We note that each Lie group exhibits the structure of a manifold
that is real and analytic (under multiplication and inversion and for the exponential
map). Moreover, as noted in [9], it can be shown that each real Lie algebra has a
one-to-one finite dimensional representation on complex vector spaces V (C).
e^{iΘ} X e^{−iΘ} = e^{i adΘ}(X) = cos adΘ(X) + i sin adΘ(X).   (B.3.2)

The cos adΘ(X) term can be understood in terms of the cosine expansion:

cos adΘ(X) = Σ_{n=0}^∞ ((−1)ⁿ/(2n)!) (adΘ)²ⁿ(X).   (B.3.3)
Thus each term is a multiple of ad²Θ. For certain choices of generators Θ and X ∈ k this effectively means that the adjoint action acts as an involution up to scalar coefficients given by the relevant root functional α(X) (see below). Thus in the SU(2) case, ad²ⁿ_{Jy}(Jz) ∝ Jz (with n = 0, 1, 2, ...), each application of the adjoint action acquires a θ² term (from Θ = iθJy), such that the series can be written:

cos adΘ(−iJz) = Σ_{n=0}^∞ ((−1)ⁿ/(2n)!) θ²ⁿ (−iJz) = cos(θ)(−iJz)   (B.3.4)
More generally, the Cartan commutation relations exhibit such an involutive prop-
erty given that for Θ ∈ k:
(for Φ ∈ k) such as in equation (5.3.20) and similarly for the sine terms. For the
sin adΘ (X) terms, by contrast, the adjoint action is of odd order:
sin adΘ(X) = Σ_{n=0}^∞ ((−1)ⁿ/(2n+1)!) (adΘ)²ⁿ⁺¹(X)   (B.3.7)
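The expansions (B.3.3) and (B.3.7) rest on the identity e^A X e^{−A} = e^{adA}(X); a minimal sketch verifying it by truncating the adjoint series (Python with SciPy, with su(2) matrices as the example, matching the Θ = iθJy discussion above):

```python
import numpy as np
from scipy.linalg import expm

Jy = 0.5 * np.array([[0, -1j], [1j, 0]], dtype=complex)
Jz = 0.5 * np.array([[1, 0], [0, -1]], dtype=complex)

theta = 0.7
A = 1j * theta * Jy                  # Theta = i*theta*Jy as in the SU(2) example
X = -1j * Jz

lhs = expm(A) @ X @ expm(-A)         # e^A X e^{-A}

ad = lambda P, Q: P @ Q - Q @ P
rhs, term = np.zeros_like(X), X.copy()
for n in range(1, 25):               # truncated series sum_n ad_A^n(X)/n!
    rhs = rhs + term
    term = ad(A, term) / n

print(np.linalg.norm(lhs - rhs))     # ~ 0 to machine precision
```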
Definition B.3.3 (Real form). We say that a real vector space V(R) is the real form of a complex vector space W(C) when the two are related as W = V ⊕ iV (regarding W as a real vector space).
B.4.2 Roots
We define root systems as per below [9, 235]. Given a subalgebra h ⊂ g with a
(diagonal) basis H and a dual space h∗ with basis elements ej ∈ h∗ , we recall that
the duals are a vector space of functionals ej : V → C. For a matrix Lie algebra, we can construct a basis of g given by {h, Eij} where Eij is 1 in the (i, j) location and zero elsewhere. We are interested in the adjoint action of elements H ∈ h on each such eigenvector:

[H, Eij] = (ei(H) − ej(H)) Eij
where ej selects out the (j, j)th element of a matrix (recalling duals ej (v) = vj δij , vj ∈
C form a vector space of linear functionals). That is α = ei − ej is a linear functional
on h such that α : h → C. Such functionals α are denoted roots. Thus g can be decomposed as g = h ⊕ (⊕_{i≠j} CEij), with [Eij, Eji] = Eii − Ejj ∈ h. Roots are then defined as positive or negative and may be ordered as a sequence. Each root is a nongeneralised weight of adh(g)
(a root of g with respect to h). The set of roots is denoted ∆(g, h).
Vα = {v ∈ V |(π(H) − α(H)1)n v = 0, ∀H ∈ h}
for Vα ̸= 0.
gα = {X ∈ g | (adH − α(H)1)ⁿ X = 0, ∀H ∈ h}

g = h ⊕ (⊕_{α∈∆} gα)
i.e. π is the adjoint action of h on X ∈ g. Here g0 (not to be confused with the real
g0 we discuss below) is then the weight space subalgebra associated with the zero
weight under this adjoint action, it denotes the weight space corresponding to the
zero weight in the decomposition of the Lie algebra g (the centralizer of h ⊂ g). We
come now to the definition of a Cartan subalgebra.
A Cartan subalgebra h is a maximal abelian subalgebra such that the elements of
adg (h) are simultaneously diagonalisable. Moreover, all such Cartan subalgebras of
g are conjugate via an automorphism of g in the form of an adjoint action. With this
understanding of Cartan subalgebras we can now generalise the concepts of roots.
Roots exhibit a number of properties [9]. We set out a number of these relevant to
results in later Chapters and in particular our methods in Chapter 5.
Definition B.4.3 (Roots). Non-zero generalised weights of adh(g) are denoted the roots of g with respect to the Cartan subalgebra. The set of roots α is denoted ∆(g, h).
Given a Killing form B(·, ·), a few standard results hold: (a) B(gα, gβ) = 0 for α, β ∈ ∆ ∪ {0} with α + β ≠ 0; (b) α ∈ ∆ =⇒ −α ∈ ∆; (c) each root α is associated with
a Hα ∈ h such that we can identify a correspondence between the root functional
and Killing form given by:
α(H) = B(H, Hα ) ∀H ∈ h
[H, Eα ] = α(H)Eα
(see [9] for standard results e.g Lemma 2.18). Other results include that adh (g) is
simultaneously diagonalisable, and that the Killing form has an expression in terms
of the functional:
B(H, H′) = Σ_α α(H) α(H′),   B : h × h → C
together with normalisability (such that B(Eα , E−α ) = 1) and standard commuta-
tion relations:
[Hα, Eα] = α(Hα)Eα,   [Hα, E−α] = −α(Hα)E−α,   [Eα, E−α] = Hα.   (B.4.3)
(p, q ≥ 0) with:

p − q = 2⟨β, α⟩/⟨α, α⟩.
sα · β = β − 2(⟨β, α⟩/⟨α, α⟩) α,   β ∈ E

2⟨β, α⟩/⟨α, α⟩ ∈ Z.
The rank of V is dim(V). The elements α ∈ ∆ are roots. A root system is reduced if it satisfies (a) to (d) above, and non-reduced if it does not satisfy (b). The symmetric-space root systems of interest in this work are of the non-reduced type. Here s_α is a reflection about the hyperplane orthogonal to α (an orthogonal transformation with determinant −1). Such reflections generate a group of orthogonal transformations of V, the Weyl group W. This general formulation allows root systems to be associated to all complex semisimple Lie algebras.
If the angle θ between two roots α, β satisfies θ ∈ (0, π/2), then ±(α − β) are roots. These angles can be used to plot the relevant root system (see Chapter 5). In more general settings, the geometric representation of roots, under certain assumptions, exhibits certain permutation symmetries, and the applicable Weyl group is defined in terms of these.
One final concept before coming to the Cartan matrix is that of fixing a lexicographic ordering, related to the concept of positivity of roots. Given such a set of roots:

∆ = {α_1, α_2, . . . , α_l}

a root β = Σ_{i=1}^l b_i α_i is positive with respect to the lexicographic ordering if the first non-zero coefficient b_j in its expansion with respect to the basis ∆ is positive. Conversely, the root β is negative if this first non-zero coefficient is negative. The lexicographic order is defined by comparing roots: for two distinct roots β = Σ_{i=1}^l b_i α_i and γ = Σ_{i=1}^l c_i α_i, we say that β > γ if, at the first index j where b_j and c_j differ, we have b_j > c_j. This comparison allows partitioning the root system into two disjoint sets ∆+ and ∆− of positive and negative roots.
The highest root in a root system is the maximal element with respect to this
lexicographic ordering and is always a positive root. This root plays a pivotal role
in the structure and representation theory of semisimple Lie algebras.
As we articulate in Chapter 5, the construction of the abstract root system for Lie algebras (related to quantum control problems of interest) involves determining root systems. We note that a root α is simple if α > 0 and α cannot be decomposed as a sum of two positive roots. It can be shown that any positive root can be decomposed as α = Σ_i n_i α_i (with α_i simple and n_i ≥ 0). The sum of the n_i is an integer denoted the level of α with respect to the set of simple roots (important as the set of simple roots generates the entire root system through integer linear combinations).
We now define the Cartan matrix used in Chapter 5. To do so we fix a root system
∆, assume each α ∈ ∆ is reduced and fix an ordering as per above. Data from
Cartan matrices may be represented in diagrammatic form via Dynkin diagrams.
Definition B.4.5 (Cartan matrix). Given a root system ∆ in V and simple root system Π = {α_k}, the Cartan matrix A has entries:

A_ij = 2⟨α_i, α_j⟩/|α_i|² (B.4.4)
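As a minimal sketch (illustrative only; the Euclidean inner product and the A_2 simple roots below are assumptions, not taken from the thesis), the Cartan matrix (B.4.4) can be computed directly from a choice of simple roots:

import numpy as np

def cartan_matrix(simple_roots):
    # A_ij = 2 <alpha_i, alpha_j> / |alpha_i|^2, as per (B.4.4)
    roots = np.asarray(simple_roots, dtype=float)
    n = len(roots)
    A = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            A[i, j] = 2 * roots[i] @ roots[j] / (roots[i] @ roots[i])
    return A

# Simple roots of A_2 (associated with su(3)) embedded in R^3:
alpha1 = [1, -1, 0]   # e1 - e2
alpha2 = [0, 1, -1]   # e2 - e3
print(cartan_matrix([alpha1, alpha2]))   # [[ 2. -1.], [-1.  2.]]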
While Dynkin diagrams are not a focus of this work, we do relate them to quantum control results in Chapter 5, hence here we mention a few sundry properties. For an l × l Cartan matrix A, the associated Dynkin diagram has the
following structure: (a) there are at most l pairs of vertices i < j having one edge
(at least) between; (b) the diagram has no loops; and (c) at most three edges (triple
points) connect to any node. The Cartan matrix may be used to determine a Dynkin
diagram (up to scaling factors) with a description set out in Fig. B.1 (see [9] for
more detailed exposition).
Figure B.1: Expanded Dynkin diagram of type An with labeled vertices and edges. The numbers
above the nodes indicate the length of the roots relative to each other. Aij Aji determines the
number of lines (or the type of connection) between vertices i and j. This product can be 0 (no
connection), 1 (single line), 2 (double line), or 3 (triple line), representing the angle between the
corresponding roots. Additionally, when Aij Aji > 1, an arrow is drawn pointing from the longer
root to the shorter root.
By constructing an abstract reduced root system, one can then show that for two
complex semisimple Lie algebras with isomorphic abstract reduced root systems, the
associated Cartan matrices are isomorphic. If two complex semisimple Lie algebras
have the same Dynkin diagram, they are isomorphic, thereby sharing the same
algebraic structure. Operations on Cartan matrices correspond to operations on
Dynkin diagrams, allowing visualisation of transformations. It can be shown that
the choice of Cartan matrix is independent of the choice of positive ordering up to
a permutation of index and that the Cartan matrix determines the reduced root
system. The Weyl group, as the set of orthogonal transformations, can be used to
determine ∆+ and Π.
For classical semisimple groups, the matrix Lie algebra g_0 over R or C is closed under conjugate transposition, in which case g_0 is the direct sum of its skew-symmetric k_0 and symmetric p_0 members. Recall from equation (B.3.9) that the complexification of g_0 is denoted g = g_0^C = g_0 + ig_0. The real Lie algebra g_0 has a decomposition as the direct sum g_0 = k_0 ⊕ p_0. We can construct u_0 = k_0 + ip_0 as a subalgebra forming a real vector space of complex matrices (that is, where the coefficients of matrices are real). As
Knapp notes [9], there are certain requirements with respect to g0 and u0 such as
k0 = g0 ∩ u0 and p0 = g0 ∩ iu0 which allow us to decompose g0 = k0 ⊕ p0 in a way
that in turn allows the complexification:
g=k⊕p (B.5.1)
As the focus of this work is geometric and machine learning techniques for Lie groups and algebras of relevance to quantum control, we omit the standard steps showing the complexification of g_0 = k_0 ⊕ p_0 from this work; see [9, 230, 235] for more detail. Note that we can define the mapping θ(X) = −X† (negative of the complex conjugate transpose) as an involution (squaring to the identity) with θ[X, Y] = [θ(X), θ(Y)]. A Cartan involution θ is then an involution for which the bilinear form B_θ(X, Y) := −B(X, θY) is symmetric and positive definite; its +1 and −1 eigenspaces k and p furnish the Cartan decomposition:

g = k ⊕ p (B.5.3)
Here k, p are orthogonal with respect to the Killing form, in that B_g(X, Y) = 0 = B_θ(X, Y) for X ∈ k and Y ∈ p. For the corresponding Lie group G, the existence of the Cartan decomposition of g (where K ⊂ G is analytic with Lie algebra k) gives rise to a global Cartan decomposition of G. The global Cartan decomposition G = K exp(p), together with the fact that p = ⋃_{k∈K} Ad_k(a), means that the global Cartan decomposition is expressible as G = KAK, which we define below.
U = k a k′  U = pk (B.5.5)

for k, k′ ∈ K, a ∈ A = exp(a) and p ∈ exp(p).
For our results in Chapter 5, we require transformation of the relevant Cartan subalgebra (e.g. for su(3)) to one which is maximally non-compact in order to prove our results relating to time-optimal control. This is because we require our maximally abelian subalgebra a ⊂ h to be an element of our control subset, i.e. we require a ⊂ p. Because Cartan decompositions g = k ⊕ p, with associated involutions and h ⊂ g, are conjugate up to inner automorphisms of g (denoted Int g, namely those which have a representation as conjugation by an element g ∈ G, that is h′ = Int_g(h) = ghg⁻¹), we can apply a generator in k to transform to a new subalgebra that is maximally non-compact (see below). In the literature, it can be shown
that in particular any such h is thereby conjugate to a θ-stable Cartan subalgebra (for involution θ). This means that we can decompose the Cartan subalgebra as h = t ⊕ a where t ⊂ k, a ⊂ p. As Knapp notes, the roots of (g, h) are real on a_0 ⊕ it_0. The subalgebras t and a are the compact and non-compact subalgebras of h respectively, with dim t being the compact dimension and dim a being the non-compact dimension. The Cartan subalgebra h = t ⊕ a is maximally compact if its compact subalgebra is as large as possible, and maximally non-compact in the converse case. For a given Cartan decomposition, h is maximally non-compact if and only if a is a maximal abelian subspace of p.
(For su(3), for example, combinations such as E_γ + E_{−γ} = λ_4 relate root vectors to the Gell-Mann basis.)
The study of geometric methods usually begins with the definition and study of the
properties of differentiable manifolds. One begins with the simplest concept of a
set of elements, characterised only by their distinctiveness (non-identity), akin to
pure spacetime points M. The question becomes how to impose structure upon
M in a useful or illuminating manner for explaining or modelling phenomena. To
do so, in practice one assumes that M is a topological space, following which ad-
ditional structural complexity, necessary for certain operations, such as analytical
operations (differentiation etc) or measures (for integration), is added. The mate-
rial on differential geometry in this Appendix is drawn from standard texts in the
field [2, 9, 44, 48–52]. Most results are presented without proofs which can be found
in the collection of resources noted above. The Chapter begins with an outline of
basic conceptualisation of manifolds, tangent planes, pushforwards and bundles. It
then moves on to vector (and tensor) fields, n-forms, connections and parallel trans-
port. The second section of this Appendix concentrates on geometric control the-
ory, drawn primarily from the work of Jurdjevic [23, 25, 26, 60, 213], Boscain [61, 62],
D’Alessandro [15] and others. It focuses primarily on classical geometric control
theory, but the results, especially where controls are constructed from semi-simple
Lie groups and algebras, carry over to the quantum realm.
Proposition C.1.1 (Lie Group (Manifold)). A Lie group G equipped with a dif-
ferentiable structure is a smooth differentiable manifold via the smoothness of group
operations (arising from the continuity properties of G i.e. being a topological group)
given by multiplication m : G × G → G, defined by m(g, h) = gh, and inversion
i : G → G, defined by i(g) = g −1 .
Consider a function f : M → N between manifolds, expressed locally via charts (U, ϕ) on M and (V, ψ) on N. The idea is that we can say the function f between manifolds is a C^r function if all local representatives between the two coverings are themselves C^r functions, i.e. we can designate f as a C^r function if, for all coverings (basically atlases) of M, N by charts (or rather coordinate neighbourhoods), the local representative functions ψ ∘ f ∘ ϕ⁻¹ are C^r. Differentiable functions require r ≥ 1, while smooth functions are C^∞.
Usually we are working with C ∞ spaces. The complete atlas (a C ∞ atlas) pro-
vides the differential structure required for differential calculus and analytic approx-
imations thereby allowing a differentiable structure to be imposed on M (where
K = C one also imposes holomorphic constraints upon functions defined on M).
For brevity, we denote a manifold as differentiable whether complex or real. In differential geometry, the structure (topology) preserving map is fulfilled by a C^∞ (or technically C^r, where r may or may not be finite) function: f : M → N is a C^∞ diffeomorphism if f is a bijection and f, f⁻¹ are both C^∞. Two diffeomorphic manifolds can be regarded as 'equivalent', or as 'copies' of a single abstract manifold.
For example, for the n-sphere S^n ⊂ R^{n+1}, the tangent space at x may be realised concretely as:

T_x S^n := {v ∈ R^{n+1} | x · v = 0} (C.1.1)

A (local) curve through p ∈ M is a map γ : R ⊃ (−ϵ, ϵ) → M, t ↦ γ(t), with γ(0) = p. As t varies 'infinitesimally' within (−ϵ, ϵ), the curve moves infinitesimally away from p ∈ M but stays within the neighbourhood U_p. Next, we define a tangent as a property of curves.
Definition C.1.4 (Tangent). Curves are tangent if the images of the two curves (i.e. the two maps) at t = 0, t ∈ (−ϵ, ϵ), and their derivatives are identical. Formally, two curves γ_1, γ_2 are tangent at a point p ∈ M if:

(i) the curves agree at the point, i.e. γ_1(0) = γ_2(0) = p; and

(ii) the derivatives of the two curves in a local coordinate system are the same, i.e. in some local coordinate system (x^1, ..., x^m) around the point p, the two curves are 'tangent' in the usual sense as curves in R^m, i.e.:

(d/dt) x^i(γ_1(t)) |_0 = (d/dt) x^i(γ_2(t)) |_0, i = 1, ..., m
If γ1 , γ2 are tangent in one coordinate system, then they are tangent in any other
coordinate system that covers p ∈ M. Hence the definition of tangent is independent
of the coordinate system chosen. The equivalence class of curves satisfying this
condition at p ∈ M is sometimes denoted [γ] which can be shown to form a vector
space, hence we can reasonably denote tangents at p as vectors. Note that curves
may not be tangent at another point q ∈ M, hence tangent vectors are thought of
as local (around p) equivalence relations, hence the importance of connections (see
below) that, intuitively speaking, tell us how to map between tangent spaces across a
manifold. The idea here is that the equivalence relation among curves arises from the tangent relation (curves are equivalent when they agree at p (for t = 0) and their differentials with respect to coordinate functions are identical). From this construction of the
tangent we obtain a number of important definitions:
(i) Tangent space: the tangent space Tp M to M at p ∈ M is the set of all tangent
vectors at p.
(ii) Tangent bundle: defined as the union of tangent spaces, i.e. TM := ⋃_{p∈M} T_pM.
A tangent vector v = [γ] acts on smooth functions f as the directional derivative:

v(f) = (d/dt) f(γ(t)) |_0
The gradient points in the direction of steepest ascent, and we can define it as follows: ∇f(p) is the unique tangent vector satisfying ⟨∇f(p), v⟩ = v(f) for all v ∈ T_pM, where v(f) denotes the directional derivative of f in the direction of v (as above), and ⟨·, ·⟩ denotes the applicable metric (see below) on M.
In a coordinate chart (U, ϕ), the gradient takes the local form

∇f(p) = Σ_{i,j} g^{ij} (∂f/∂x^j) ∂/∂x^i |_{ϕ(p)}

where ∂/∂x^j |_{ϕ(p)} are the basis vectors of the tangent space T_pM in the coordinate chart, and ∂f/∂x^j are the partial derivatives of f with respect to the coordinates x^j. The inverse ϕ⁻¹ reverses this mapping, translating coordinates in R^n (or C^n) back to points on the manifold M. We can understand how ∇ 'points' in the direction of steepest ascent by noting that:

v(f) = ⟨∇f(p), v⟩ = |∇f(p)| |v| cos θ

where cos θ is given by equation (A.1.1). Then v(f) is maximal for cos(θ) = 1, which occurs when v and ∇f(p) are parallel. Hence ∇f(p) as a vector points in the direction of maximal increase, i.e. steepest ascent. To obtain steepest gradient descent, we utilise −∇. Although in this work we only touch upon geometric machine learning, this type of framing of tangents is important for the geometric framing of statistical learning methods, such as gradient descent (see section D.5.1), specifically in relation to stochastic gradient descent optimisation (definition D.5.1).
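A small numerical sketch (illustrative; the quadratic test function and Euclidean metric are assumptions) confirms that the gradient direction maximises the directional derivative v(f) = ⟨∇f(p), v⟩ over unit vectors, with −∇ giving steepest descent:

import numpy as np

def grad_f(p):
    # gradient of f(x, y) = x^2 + 3y^2 under the Euclidean metric
    return np.array([2 * p[0], 6 * p[1]])

p = np.array([1.0, 0.5])
g = grad_f(p)

# Directional derivatives over a sweep of unit vectors; the maximiser
# aligns with g/|g| (cos(theta) = 1), and -g gives steepest descent.
angles = np.linspace(0, 2 * np.pi, 360, endpoint=False)
dirs = np.stack([np.cos(angles), np.sin(angles)], axis=1)
best = dirs[np.argmax(dirs @ g)]
print(best, g / np.linalg.norm(g))   # approximately equal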
As is shown in the literature, tangent spaces T_pM at p can be considered as real vector spaces. The intuition for this is that two curves γ_1, γ_2 cannot be added directly because M lacks the structure to do so. However, the coordinate space R^m to which we have a mapping is a vector space, thus one considers how each curve γ_1, γ_2 is represented in R^m, forming t ↦ ϕ ∘ γ_1(t) + ϕ ∘ γ_2(t), which we can think of as mapping from R into M and then into R^m. The object ϕ ∘ γ_1(t) + ϕ ∘ γ_2(t) is a curve in R^m which passes through the origin of R^m when t = 0 (choosing coordinates with ϕ(p) = 0). Via ϕ⁻¹, this additive object is then considered a curve in M that passes through p when t = 0. We then define vector addition and scalar multiplication accordingly. From this we can show that T_pM is a real vector space, and the definitions above allow the assertion of the existence of a vector space of tangent vectors independent of the choice of chart and distinct from the curves γ_i themselves. Because of this vector structure, the tangent bundle is a vector bundle.
C.1.1.2 Push-forwards

Given a smooth map h : M → N, the push-forward is the map:

h∗ : TM → TN (C.1.3)
v ↦ h∗(v) (C.1.4)
h∗(v) := [h ∘ γ] (C.1.5)

where v = [γ].
Tangent vectors can also be characterised as derivations, as they satisfy the requirements of a derivation (linearity and the Leibniz rule). This is a useful way to build intuition around tangent vectors as differential operators upon a space and can be seen via the definition of tangent vectors themselves, i.e.:

v(f) = (d/dt) f(γ(t)) |_0
It can be shown that the vectors in such a space of derivations have a representation as vectors in the space constituted by partial derivatives as vectors, allowing the equating of derivations at p with the tangent space at p. In this representation, the partial derivatives represent the basis vectors of the vector space with coefficients (component or coordinate functions) as:

v ∈ D_p(M) ⟹ v = Σ_{μ=1}^m v(x^μ) ∂/∂x^μ |_p = Σ_{μ=1}^m v^μ ∂/∂x^μ |_p (C.1.11)
where the set of real numbers {v^1, ..., v^m} (or {v^μ}) are the components of the vector in the 'direction' of ∂/∂x^μ. It can then be shown that there exists an isomorphism between T_pM and D_pM, allowing, in particular, expression of important terms such as the Jacobian in terms of local representatives of push-forward maps between tangent planes.
A vector field X on M is a smooth assignment of a tangent vector X_p ∈ T_pM for each p ∈ M. Given two vector fields X, Y, the composition difference

[X, Y] = XY − YX (C.1.14)

coincides with the Lie derivative (commutator) (see definition (B.2.6)). That is, for the map X : C^∞(M) → C^∞(M), the map M → R defined by (Xf)(p) = X_p f provides a geometric characterisation of the Lie derivative of the function f along the vector field (an assignment of tangent vectors, which are operators) X. The set of all vector fields X on M is sometimes denoted VFld(M) [48] or X(M) (or Γ(M)) and carries the structure of a real vector space.
In a coordinate chart (U, ϕ), a vector field has the local form X = Σ_μ X^μ ∂/∂x^μ, where X^μ are the coefficient functions and the partial derivatives the basis. The functions X^μ are the components of the vector field X with respect to the coordinate system associated with the chart (U, ϕ), i.e. they are the coordinate functions that tell us the components of the vector at some point p ∈ M given a coordinate system. The form of the components depends upon the coordinate chart in use. Given two coordinate charts (U, ϕ), (U′, ϕ′) with U ∩ U′ ≠ ∅, the vector field should be well defined on the overlap, i.e.:

X = Σ_μ X^μ ∂/∂x^μ = Σ_ν X^{ν′} ∂/∂x^{ν′}

We may then express the coefficient functions in one coordinate chart (or frame) in terms of another:

X^{ν′} = Σ_μ (∂x^{ν′}/∂x^μ) X^μ
As discussed above, this composed vector field is the commutator of vector fields
X, Y and is denoted [X, Y ]. In component form the commutator is represented as:
[X, Y]^μ = Σ_ν (X^ν ∂Y^μ/∂x^ν − Y^ν ∂X^μ/∂x^ν) (C.1.15)
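The component formula (C.1.15) lends itself to symbolic computation; the following sketch (assuming sympy is available, with two illustrative fields on R² that are not from the thesis) computes the commutator of a rotation field and a dilation field:

import sympy as sp

x, y = sp.symbols('x y')
coords = [x, y]
X = [y, -x]   # rotation field:  y d/dx - x d/dy
Y = [x, y]    # dilation field:  x d/dx + y d/dy

def commutator(X, Y, coords):
    # [X, Y]^mu = sum_nu (X^nu d_nu Y^mu - Y^nu d_nu X^mu)
    return [sp.simplify(sum(X[n] * sp.diff(Y[m], coords[n])
                            - Y[n] * sp.diff(X[m], coords[n])
                            for n in range(len(coords))))
            for m in range(len(coords))]

print(commutator(X, Y, coords))   # [0, 0]: rotations commute with dilations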
We now connect the theory of vector fields with integral curves and Hamiltonian flow,
on our way towards understanding the theory of Riemannian and sub-Riemannian
geodesic curves. Vector fields can be regarded as generators of an ‘infinitesimal’
diffeomorphism of a manifold e.g. in the form δ(xµ ) = ϵX µ (x). Recall that tangent
vectors can be regarded as (a) equivalence classes of curves or (b) derivations defined
at a point in the manifold. The question is: given a vector field X on M, is it
possible to ‘fill’ M with a family of curves in such a way that the tangent vector
to the curve that passes through any particular point p ∈ M is just the vector
field X evaluated at that point, i.e. the derivation Xp ? This idea is important in
general canonical theory of classical mechanics, where the state space of a system is
represented by a certain (even-dimensional e.g. symplectic phase-space) manifold M
and physical quantities are represented by real-valued differentiable functions on M.
The manifold M is equipped with a structure that associates to each such function
(representing physical quantities) f : M → R a vector field Xf on M. The family
of curves that 'fit' X_f plays an important role. In particular, for quantum systems, curves associated with the vector field X_H, where H : M → R is the energy function, are the dynamical trajectories of the system, i.e. the Hamiltonian flow. Of particular
importance to our search for time-optimal curves (geodesics) in later chapters are
integral curves. We want to find a single curve that (i) passes through p ∈ M and
(ii) is such that the tangent vector at each point along the curve agrees with the
vector field at that point. For this we use the definition of an integral curve.
Definition C.1.8 (Integral curves). Given vector field X on M, the integral curve
of X passing through p ∈ M is a curve γ : (−ϵ, ϵ) → M, t 7→ γ(t) such that we have
γ(0) = p and the push-forward satisfies:
γ_∗ (d/dt |_t) = X_{γ(t)} ∀t ∈ (−ϵ, ϵ) ⊂ R
The components X^μ of the vector field X determine the form taken by the integral curve t ↦ γ(t). Remember, the vector field (and tangent vectors) should be thought of as differential operators, so we have:

X^μ(γ(t)) = (d/dt) x^μ(γ(t))
with the boundary condition that x^μ(γ(0)) = x^μ(p). Certain other properties are also sought of integral curves, such as completeness, i.e. that the curves are defined for all t (associated with assumptions regarding the compactness of the manifold).
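Integral curves can be obtained numerically by integrating d x^μ(γ(t))/dt = X^μ(γ(t)); the sketch below (illustrative, assuming scipy; the rotation field is an example chosen here, not from the thesis) does so on R², where the integral curves are circles:

import numpy as np
from scipy.integrate import solve_ivp

def X(t, q):
    # rotation vector field X = (-y, x)
    return [-q[1], q[0]]

p = [1.0, 0.0]   # initial condition gamma(0) = p
sol = solve_ivp(X, (0.0, 2 * np.pi), p, dense_output=True)
print(sol.sol(np.pi / 2))   # approximately (0, 1): a quarter turn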
We are also interested in the extent to which a vector field can be regarded as
the generator of infinitesimal translations on the manifold M. To this end we define
a one-parameter group of local diffeomorphisms (see definition B.2.14) at a point
p ∈ M consisting of the following (a triple with certain properties): (i) an open neighbourhood U(p), (ii) a real constant ϵ > 0 and (iii) a family of diffeomorphisms of U, given by {ϕ_t : |t| < ϵ}, onto the open set ϕ_t(U) ⊂ M, i.e. ϕ_t : (−ϵ, ϵ) × U → M with (t, q) ↦ ϕ_t(q). The group has the following properties: (a) the map from the parameter and neighbourhood to M, i.e. (−ϵ, ϵ) × U → M, (t, q) ↦ ϕ_t(q), is smooth in both arguments; (b) compositions follow ϕ_s(ϕ_t(q)) = ϕ_{s+t}(q); and (c) ϕ_0(q) = q, the identity on M. The set of local diffeomorphisms is denoted Diff(M). The term
one-parameter subgroup refers to the composition property (b) above and that the
map t → ϕt can be thought of as a representation (vector space with map) of part of
the additive group of the real line. Families of local diffeomorphisms can be thought
of as inducing vector fields. Through each point q ∈ M, there passes a local curve
t 7→ ϕt (q). Because of the existence of this curve, we can construct a vector field on
U by taking tangents to this family of curves at q. This vector field is denoted X ϕ ,
the vector field induced by the diffeomorphism ϕ, more formally:
X_q^ϕ(f) := (d/dt) f(ϕ_t(q)) |_0 ∀q ∈ U ⊂ M.
It can then be shown that ∀q ∈ U the curve t 7→ ϕt (q) is an integral curve of X ϕ for
|t| < ϵ.
We now briefly mention local flows and their connection with Diff(M). A local flow of
a vector field is a one-parameter local diffeomorphic group such that the vector fields
induced by the group are the vector field itself. More formally, for a vector field X
defined on an open subset U ⊂ M, the local flow of X at p is a one-parameter group
of local diffeomorphisms defined on some open subset V ⊂ U such that p ∈ V ⊂ U
and such that the vector field induced by this family equals X. In a local coordinate
system, the local flow (the family of diffeomorphisms ϕ_t^X) is expressed in the following way:

X^μ(ϕ_t^X(q)) = (d/dt) x^μ(ϕ_t^X(q))

which, integrated to first order in t, gives:

x^μ(ϕ_t^X(q)) = x^μ(q) + tX^μ(q) + O(t²)

i.e. the transformed coordinates of q ↦ ϕ_t^X(q) are approximated by the original coordinates x^μ(q), translated to first order by tX^μ(q). As X^μ belongs to the vector field, this is what allows us to say that the vector field generates 'infinitesimal' transformations on the manifold: it is responsible (up to approximation) for 'shifting' from q to ϕ_t^X(q) in terms of a local coordinate system on the manifold. We discuss
the relationship of integral curves and local flows to time-optimal geodesics in the
context of geometric control theory further on in section C.5.5.2 on Hamiltonian
flow, providing a connection between geometric formalism and quantum information
processing.
Recall from above the definition (A.1.8) that for any vector space V = V (K) we
define the associated dual space V ∗ as the space of all bounded linear maps L :
V → K (see definition A.1.13). Here we focus on K = R for exposition. In this
mapping formulation, the dual of a real vector space V is the collection V ∗ of all
linear maps L : V → R. The action of L ∈ V ∗ on v ∈ V is usually denoted
⟨L, v⟩ or sometimes with the subscript V as well. Note that here ⟨·, ·⟩ represents
the evaluation of the functional at a vector in the sense of assigning a scalar to each
vector v ∈ V . Note that by the Riesz representation theorem (for finite dimensional
V ), the isomorphism between V and V ∗ means that we can associate to each vector
v a unique such functional Lv , thus allowing us to view the evaluation of functionals
in this way as a generalisation of the inner product (hence the notational similarity).
The dual itself can be given the structure of a real vector space by leveraging the vector-space properties of R into which L maps, i.e.:

(aL_1 + bL_2)(v) := aL_1(v) + bL_2(v), a, b ∈ R, v ∈ V
If V with dim V < ∞ has a basis {e1 , ..., en }, then the dual basis for V ∗ is a collection
of vectors {f 1 , ..., f n } (they are vectors because they are linear maps L ∈ V ∗ which
is a vector space). This set {f i } is uniquely specified by the criterion that:
⟨f i , ej ⟩ = δji (C.1.16)
Note equation (C.1.16) may also be written as f i (ej ) = δji where duals take (or act
upon) vectors as their arguments, which is in essence an application of the Riesz
representation theorem. Because of the isomorphism between a vector and its dual, one can 'invert' dual and vector spaces, so that the dual becomes the argument of a vector mapping to a scalar, but for clarity one usually maintains the formalism above. For maps between different vector spaces L : V → W, L induces a dual mapping L∗ : W∗ → V∗ acting on k ∈ W∗ by:

(L∗k)(v) = k(L(v)), ∀v ∈ V
which says that the dual-map L∗ acting on k in W ∗ (itself a dual space) generates a
map L∗ k which lives in V ∗ . This map then acts on v ∈ V (taking it to R), acting as
a pullback (discussed below): L∗ is pulling back the linear functional k from W ∗ to
a linear functional in V ∗ . To explicate the relationship between V and V ∗ , it can be
shown that there exists a canonical map χ : V → (V∗)∗ with ⟨χ(v), ℓ⟩_{V∗} = ⟨ℓ, v⟩_V, where v ∈ V, ℓ ∈ V∗, which is an isomorphism if dim V < ∞. This concept captures
the spirit of what is sometimes described as V being equivalent to the ‘dual of its
dual’. With these concepts to hand, we can now define cotangent structures.
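Before doing so, a brief numerical aside (illustrative only; the basis below is an arbitrary choice): when dim V < ∞, the dual basis {f^i} satisfying ⟨f^i, e_j⟩ = δ^i_j can be computed by matrix inversion, with the basis vectors e_j as columns:

import numpy as np

e1, e2, e3 = [1.0, 0.0, 0.0], [1.0, 1.0, 0.0], [0.0, 1.0, 1.0]
E = np.column_stack([e1, e2, e3])   # basis vectors as columns
F = np.linalg.inv(E)                # row i acts as the dual functional f^i

print(np.round(F @ E, 12))          # identity matrix: <f^i, e_j> = delta^i_j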
A cotangent (dual) vector at p ∈ M is a linear map:

k : T_pM → R

while an inner product is a map:

⟨·, ·⟩ : T_pM × T_pM → R (C.1.17)

The two are related via metric tensors and one-forms. Consider a metric tensor g, which is a type (0, 2)-tensor, meaning it takes two vectors as arguments and returns a scalar. Choosing an inner product ⟨·, ·⟩ on a vector (tangent) space T_pM gives rise to an isomorphism φ : T_pM → T_p∗M. The function φ(u) takes u ∈ T_pM and returns its dual, such that φ(u)(v) = ⟨u, v⟩, i.e. we have a function taking a vector, obtaining its dual, then taking the inner product with v and returning a real number. In this sense, we can identify the inner product as the action of a covector on a vector, i.e.:

⟨u, v⟩ = φ(u)v.

In components: ⟨u, v⟩ = g_ij u^i v^j = u_j v^j.
The set {∂/∂x^μ |_p} is a basis for the set of derivations and thus, by the isomorphism above, of the tangent space at p. There is also an associated basis about p for the corresponding dual T_p∗M, denoted via the differentials (dx^μ)_p: these are linear operators in the dual space which, when composed with the basis vectors ∂/∂x^ν |_p of T_pM, lead to deltas. That is:

⟨(dx^μ)_p, ∂/∂x^ν |_p⟩ := δ^μ_ν. (C.1.18)
In this way, we can expand dual vectors k ∈ T_p∗M in terms of component functions and this basis:

k = Σ_{μ=1}^m k_μ (dx^μ)_p

where the components k_μ ∈ R for the coordinate system (and thus the basis vectors dx^μ) are given via the bilinear map (e.g. inner product in certain contexts) of the dual vector k with the basis elements of the tangent space:

k_μ = ⟨k, ∂/∂x^μ |_p⟩_p
recalling v^μ = v(x^μ) and the resemblance here to inner products. Using this formalism, we can choose a local representation for one-forms:

ω = Σ_μ ω_μ dx^μ (C.1.19)

i.e. we can express a one-form (cotangent vector) in terms of the basis set of the dual space, where the above is a shortened expression for:

ω_p = Σ_μ ω_μ(p)(dx^μ)_p
We now describe the pullback property of one-forms. While it is not true in general that a map h : M → N (between manifolds) can be used to push forward a vector field on M (i.e. we cannot always use some commuting differential form to act as our pushforward in the general case), the map h can nonetheless be used to pull back a one-form on N. This feature is connected with the way in which the global topological
structure of a manifold is reflected in DeRham cohomology groups of M defined
using differential forms and to the fact that fibre bundles also pull-back (not push-
forward) (see Frankel [49] §13 for detailed discussion).
Definition C.1.11 (Pull-back). A pullback is the dual of the map between tangent spaces across manifolds. If h : M → N and we have the linear map (the pushforward) h∗ : T_pM → T_{h(p)}N, then the pullback map is defined as:

h∗ : T∗_{h(p)}N → T∗_pM (C.1.20)
and is considered, under certain conditions, the dual of h∗. This means that for all maps k ∈ T∗_{h(p)}N (which are also vectors in the associated vector space) and v ∈ T_pM we have:

⟨h∗k, v⟩_p = ⟨k, h∗v⟩_{h(p)}
To unpack the formalism, note that T∗_{h(p)}N is a reference to the dual of the tangent space at h(p) ∈ N, the image of p ∈ M, so we can see h∗ as 'pulling back' to the dual space at p ∈ M. The term ⟨h∗k, v⟩_p refers to h∗ (the pullback) acting on k, i.e. it 'pulls back' the map k living in T∗_{h(p)}N to a map living in T∗_pM. Thus h∗k is a map living in T∗_pM, which is itself a linear map that acts on the tangent space T_pM, i.e. on v ∈ T_pM. The definition says that this pullback is equivalent (diagrammatically) to k acting on v having been pushed forward to T_{h(p)}N by h∗, which means it can be acted on by k ∈ T∗_{h(p)}N. In both cases, we have a mapping to R.
It can be shown (see [48] for specific exposition) that the pullback h∗ of ω (again, in the dual space of N) evaluated at p ∈ M is given by the pullback acting on that one-form ω, i.e. h∗ω, which then acts on the basis elements of T_pM, namely the ∂/∂x^μ terms, mapping them to R via the bilinear form. This is definitionally equated with the one-form ω acting on the basis element ∂/∂x^μ of T_pM once it has been 'pushed forward' to T_{h(p)}N by the pushforward map h∗, which is also equivalent to an evaluation of the one-form at h(p) (as we are assuming the existence of h : M → N here). The point of the pullback is to allow the structure of maps defined on or in relation to one manifold, say N, to be expressed or represented in terms of structure on or related to another manifold M.
We now specify the relationship of Lie derivatives (and thus commutators) to pullbacks. Using definition (C.1.12), the pull-back of a one-form ω on M can be related to a vector field X with an associated one-parameter group of local diffeomorphisms t ↦ ϕ_t^X. For each t, there exists an associated pull-back (ϕ_t^X)∗. This one-parameter family of pulled-back forms describes how the one-form ω changes along the flow lines (integral curves) of X. In this formalism we can connect Lie derivatives to one-forms and pullbacks. The Lie derivative of ω associated with vector field X, denoted L_X ω, is then the rate of change of ω along the flow-lines (integral curves) of the one-parameter group ϕ_t^X of diffeomorphisms associated with the vector field X. That is:

L_X ω := (d/dt) (ϕ_t^X)∗ ω |_{t=0} (C.1.21)

and, for a vector field Y:

(d/dt) (ϕ_{−t}^X)_∗ (Y) |_{t=0} = [X, Y] = L_X Y (C.1.22)
Recall that the X in ϕ_t^X refers to the group of local diffeomorphisms that induces the vector field X; (ϕ_t^X)∗ is then the associated family of pull-backs acting on one-forms. Equation (C.1.21) can be understood as follows. Recall that L_X f = Xf and L_X Y = [X, Y] (as naive multiplication of vector fields lacks the derivation quality). It can be shown that L_X⟨ω, Y⟩ = ⟨L_X ω, Y⟩ + ⟨ω, L_X Y⟩. Recall that ϕ_t^X represents the flow of X on M (a one-parameter group of diffeomorphisms): each value of t is associated with a diffeomorphism from M to itself. The flow moves points along the integral curves of the vector field X. Associated with such flow is an inverse flow, indicated by the negative sign in ϕ_{−t}^X, in essence a rewinding of the flow (which, recall, is possible as a diffeomorphism). The operation ϕ_{−t}^X moves points p ∈ M along integral curves in the opposite direction to ϕ_t^X. In this way, we can see how the commutator (Lie derivative) [X, Y] expresses the rate of change of the vector field Y along integral curves of X. Moreover, the differential df of a function can be defined via:

⟨df, X⟩ := Xf = L_X f
In local coordinates, df = Σ_μ (∂_μ f) dx^μ, i.e. we represent the function in the dual basis dx^μ and take derivatives with respect to the coordinates, i.e. ∂_μ. Note also the commutation of the pullback h∗ with d: h∗(df) = d(f ∘ h).
Definition C.1.14 (Tensor types (geometric)). Tensors are tensor products of tangent spaces and/or cotangent spaces. A tensor of type (r, s), an element of T^{r,s}_pM at a point p ∈ M, belongs to the tensor product space:

T^{r,s}_pM := [⊗^r T_pM] ⊗ [⊗^s T_p∗M] (C.1.23)

i.e. r tensor products of the tangent space tensor-producted with s tensor products of the dual (cotangent) space.
In this formulation, with vectors and their duals, the role of tensors as multilinear maps becomes apparent. We can represent the T^{r,s}_p-type tensor in terms of a multilinear mapping of the Cartesian product of a vector space and its dual to R which, in functional notation (Cartesian products describing the function of tensors rather than the direct product), is given by:

(×^r T_p∗M) × (×^s T_pM) → R (C.1.24)
Such tensor maps in effect take tensors as their arguments. The upper index r indicates the number of contravariant components of the tensor, associated with T_pM. These are akin to 'directional' components transforming in the same way as coordinate differentials dx^i. Contravariant components are typically denoted via superscripts corresponding to the directions along which the tensor acts with respect to covectors. The subscript index s denotes the number of covariant components, consisting of linear functionals in T_p∗M (recall for duals and vectors we have components and bases).
This formulation is usefully understood in terms of equation (C.1.16), i.e. ⟨f^i, e_j⟩ = δ^i_j. Each basis element of the vector and its dual either annihilates or leads to unity (scaled by any coefficients a, b); thus if we denote component coefficients a, b ∈ R for vectors and duals as af^i, be_j, then:

⟨af^i, be_j⟩ = ab δ^i_j.
Thus the definition of vectors and duals, as bases of T_pM and T_p∗M, shows how the tensorial map effectively acts as a contraction (see below): it reduces the dimensionality of the input vectors and covectors (the input tensor) and multiplies the remainder by the corresponding scalar product of the real coefficients of the annihilated basis vectors and duals. As we see below, the usual inner
product on vector spaces can be framed in the language of metric tensors (where the
metric tensor acts as a bilinear form resulting in a scalar product). Framing tensors
as multilinear maps connects to concepts further on of, for example, metric tensors
which act on Tp M and Tp∗ M to contract tensor products to scalars, forming the ba-
sis for measuring the time, energy or length of paths given by Hamiltonian evolution
in our quest for time-optimal geodesics. In quantum information, the eponymous
tensor networks represent an important application of tensor formalism for a variety
of use cases, such as error correction and algorithm design [257, 258]. We note for
completeness a few types of special (r, s) tensors. Note the relation between the
tangent planes and cotangent planes:
(i) T^{0,1}_pM = T_p∗M is the cotangent space at p (the space of covectors or one-forms). An example at p ∈ M would be a covector or one-form in T_p∗M with one covariant component;

(ii) T^{1,0}_pM = T_pM = (T_p∗M)∗ (the dual of the dual returns the tangent plane). An example would be a vector in T_pM consisting of one contravariant component;
(v) Tpr,s M a mixed tensor comprising both contravariant (acting on vectors) and
covariant (acting on covectors) elements. Such higher-rank tensors are used
to represent more complex objects, such as curvature tensors or stress-energy
tensors.
Covariant tensors are multi-linear maps that take vectors (from Tp M) as inputs and
return scalars, while contravariant tensors take covectors (from Tp∗ M) as inputs.
From this we obtain a tensor field on M (a generalisation of the concept of a vector
field), which is a smooth association of a tensor of type (r, s) for p ∈ M. It is
instructive to observe how Tp M is associated with contravariant transformations
while Tp∗ M is associated with covariant transformations. This is intuitively related
to the fact that dual spaces (cotangent spaces) consist of linear maps that act on
vectors in the tangent space, hence they transform covariantly (i.e. they co-vary with
vectors). On the other hand, structures that transform ‘opposite to the way tangent
vectors do’, i.e. according to the basis changes in the cotangent space, are said to
transform contravariantly. Since tensor spaces can involve multiple layers of dual
spaces, we consider their transformation properties in terms of how they relate to
the transformations of the basis in the tangent and cotangent spaces. The elementary tensor v ⊗ w can be defined as the unique bilinear map V∗ × W∗ → R which evaluates to ⟨k, v⟩⟨l, w⟩ ∈ R on (k, l) ∈ V∗ × W∗. We can now also usefully define a tensorial contraction as
a generalisation of equations (C.1.16). Contraction is an operation that reduces
the order of a tensor by pairing covariant vectors, indicated by (lower) indices with
contravariant vectors, indicated by (upper) indices and summing over the dimension
of the manifold, effectively treating the contraction over indices as if they are in an
inner product, akin to a trace operation. Recalling that a tensor is a mapping (a
function), then we can define the contraction as a function, being a tensor, that
takes as its argument another tensor. In general, the contracting tensor can be
of any order, but in practice contraction is often performed using metric tensors,
tensors which, as maps to R, satisfy the conditions of being metric (we discuss this
below). The most general form of a contraction is set out below. Following [2, 48], first recall that tensor products of vectors and their duals pair via:

⟨X_1 ⊗ · · · ⊗ X_r, ω′_1 ⊗ · · · ⊗ ω′_r⟩ = Π_i ω′_i(X_i),
C^i_j is the contraction of the i-th contravariant index and j-th covariant index:

C^i_j(⊗_{k=1}^r X_k ⊗_{l=1}^s ω_l) = ⟨X_i, ω_j⟩ (X_1 ⊗ ... ⊗ X̂_i ⊗ ... ⊗ X_r ⊗ ω_1 ⊗ · · · ⊗ ω̂_j ⊗ ... ⊗ ω_s) (C.1.26)

reducing the type (r, s) input to type (r − 1, s − 1); on basis elements, the pairing ⟨X_i, ω_j⟩ reduces to δ^j_i as per equation (C.1.16). Here the X̂_i symbol denotes removal of the i-th element (following [2]).
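In components, contraction is a pairing and sum over one upper and one lower index; a brief sketch (illustrative, using numpy's einsum; the random tensors are placeholders) shows contraction of a (1,1)-tensor to a scalar (the trace) and of a (2,1)-tensor to a (1,0)-tensor:

import numpy as np

T11 = np.random.rand(3, 3)        # components T^i_j of a (1,1)-tensor
print(np.einsum('ii->', T11))     # C^1_1: the trace, a scalar

T21 = np.random.rand(3, 3, 3)     # components T^{ij}_k of a (2,1)-tensor
print(np.einsum('ijj->i', T21))   # contract the 2nd upper with the lower index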
Thus contraction reduces the rank of a tensor by one in both the covariant and
contravariant indices. Note that we can in principle construct a more general con-
traction mapping the product of individual contractions. In particular we examine
metric tensors, which provide a way to compute inner products between tangent
vectors, thereby inducing a geometry and way of measuring distance (and thus time
optimality) on M.
Definition C.1.16 (Metric tensor). The metric tensor is a (0, 2)-type tensor field mapping g_p : T_pM × T_pM → R, expressible locally as g_p = g_ij dx^i ⊗ dx^j with components g_ij = g_p(e_i, e_j), where e_i = ∂_i = ∂/∂x^i are basis elements of T_pM and dx^i are the corresponding dual basis elements; the inverse metric tensor g^{ij} satisfies g^{ik}g_{kj} = δ^i_j. Indices are lowered via the metric:

v_i = g_ij v^j. (C.1.30)
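A short sketch (illustrative; the diagonal metric is an assumption for the example) demonstrates lowering an index via v_i = g_ij v^j, equation (C.1.30), and raising it back with the inverse metric:

import numpy as np

g = np.diag([1.0, 4.0, 9.0])       # metric components g_ij
g_inv = np.linalg.inv(g)           # inverse metric g^{ij}

v_up = np.array([1.0, 2.0, 3.0])   # contravariant components v^j
v_down = g @ v_up                  # covariant components v_i = g_ij v^j

print(v_down)                      # [ 1.  8. 27.]
print(g_inv @ v_down)              # recovers v^j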
We conclude this section by introducing n-forms and exterior products. Consider the following elementary n-forms, assuming 0 ≤ n ≤ dim M with n ∈ N. A 0-form is a function in C^∞(M). A one-form (1-form) assigns, at each point p ∈ M, a linear functional that maps vectors in T_pM to R; it is a section of the cotangent bundle T∗M. An n-form is then a totally antisymmetric tensor field of type (0, n), i.e.:

ω(X_{P(1)}, . . . , X_{P(n)}) = deg(P) ω(X_1, . . . , X_n)

where X_i are arbitrary vector fields on M, and deg(P) is the degree of the permutation P (+1 even, −1 odd). The set of all n-forms on M is denoted A^n(M).
An n-form is therefore a particular way of assigning n cotangent vectors to p ∈ M.
We can now define an important concept, the wedge or exterior product in terms of
a tensor product of n-forms.
Definition C.1.18 (Exterior product). For ω_1 ∈ A^{n_1}(M), ω_2 ∈ A^{n_2}(M), the wedge or exterior product of the n-forms ω_1, ω_2 is the (n_1 + n_2)-form, denoted ω_1 ∧ ω_2 and defined as:

ω_1 ∧ ω_2 = (1/(n_1!n_2!)) Σ_{σ(P)} (−1)^{deg(P)} (ω_1 ⊗ ω_2)^P (C.1.31)
The permutations σ(P) are understood as follows. For a tensor field ω of type (0, n), the permuted tensor field ω^P is defined to be the permutation map applied to the arguments, i.e.:

ω^P(X_1, ..., X_n) = ω(X_{P(1)}, ..., X_{P(n)})

for all vector fields X_1, ..., X_n on the manifold M. The factor 1/(n_1!n_2!) is a normalisation factor given the ways of permuting ω_1, ω_2, while the sum over permutations σ(P) ensures all orderings of vectors are considered and that the result is antisymmetric with respect to exchange of vectors. When applied to vector fields, the result
is a real number interpreted as an oriented volume spanned by those vectors asso-
ciated with p ∈ M. Intuitively one can think of how in two dimensions the wedge
(or cross) product leads to a volume (area) which can be given an orientation based
on the direction of its vectors. While not a focus of this work, we note that the
generalised pullback commutes with the wedge product, noting for h : M → N and
differential forms α, β, the pullback is a homomorphism h∗ (α ∧ β) = (h∗ α) ∧ (h∗ β).
Wedge products turn vectors into a graded algebra, important in various algebraic
geometry techniques. The basis of n-forms at p ∈ M is given by wedge products of the differentials:

(dx^{i_1})_p ∧ · · · ∧ (dx^{i_n})_p, i_1 < · · · < i_n.
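For two one-forms, (C.1.31) reduces to (ω_1 ∧ ω_2)(u, v) = ω_1(u)ω_2(v) − ω_1(v)ω_2(u); the sketch below (illustrative only) evaluates this oriented-area pairing on R³:

import numpy as np

def wedge(w1, w2, u, v):
    # (w1 ^ w2)(u, v) = w1(u) w2(v) - w1(v) w2(u)
    return (w1 @ u) * (w2 @ v) - (w1 @ v) * (w2 @ u)

w1 = np.array([1.0, 0.0, 0.0])   # components of dx
w2 = np.array([0.0, 1.0, 0.0])   # components of dy
u = np.array([1.0, 0.0, 0.0])
v = np.array([0.0, 1.0, 0.0])
print(wedge(w1, w2, u, v))       # 1.0: oriented area of the (u, v) square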
Indeed it can be shown that dω(X_i) at p ∈ M depends only upon the vector fields X_i at p, a fact related to the property of the exterior derivative being f-linear:

dω(X_1, ..., fX_i, ..., X_n) = f dω(X_1, ..., X_n), f ∈ C^∞(M).
Such formulations then form the basis for important theorems related to DeRham
cohomology (see Frankel [49]). In this context the exterior derivative can be defined
as follows.
For an n-form ω = Σ ω_{i_1...i_n} dx^{i_1} ∧ · · · ∧ dx^{i_n}, the exterior derivative is dω = Σ dω_{i_1...i_n} ∧ dx^{i_1} ∧ · · · ∧ dx^{i_n}, where dω_{i_1...i_n} = Σ_j (∂ω_{i_1...i_n}/∂x^j) dx^j.
The exterior derivative has the property that d(dω) = 0 for any form ω, and it
is linear over the smooth functions on M, i.e., for a smooth function f and a form
α, we have d(f α) = df ∧ α + f dα. It is relevant in particular to the Maurer-Cartan
form and Cartan structure equations discussed elsewhere.
In this section, we recount a few properties of Lie groups and Lie algebras from
earlier sections, connecting them explicitly to geometric formulations used in later
chapters. Recall from proposition (C.1.1) that a Lie group is a group equipped with
the structure of a differentiable manifold such that the group operations are smooth
i.e. where the map in parameter space that takes us from g1 (parametrised by θ1 )
to g2 in G is a differentiable map.
A fundamental aspect of Lie group theory is the isomorphism between the set of
all left-invariant vector fields on a Lie group G, denoted L(G), and the tangent
space Te G at the identity element e of G. This isomorphism implies that one can
understand the behaviour of left-invariant vector fields by examining transforma-
tions within Te G. In particular, this concept is foundational when considering the
exponential map. The tangent space Te G is, effectively, the Lie algebra of the Lie
group G, and mappings between different Lie groups G → H can be studied by
examining the corresponding maps between their Lie algebras Te G → Te H where
here e denotes the identity element for G. In addition, given the completeness of
left-invariant vector fields on G, we can extend integral curves using the group struc-
ture even if G is not compact. Moreover, we can see how the exponential map is
constructed by considering the unique integral curve of a left-invariant vector field
at t = 0, mapping t 7→ exp(tA), where A is an element of the Lie algebra Te G. The
map is formally defined as exp : Te G → G, such that exp(A) = exp(tA) evaluated at
t = 1. It can be demonstrated that the exponential map is a local diffeomorphism
near the identity e, mapping a neighborhood in Te G smoothly onto a neighborhood
in G. This elucidates that the exponential map generates a one-parameter subgroup
of G (see definition B.2.14), and, in fact, every one-parameter subgroup of G can be
expressed in the form t 7→ exp(tA), a fact related to the bijective correspondence
between one-parameter subgroups of the Lie group G and elements of its Lie alge-
bra discussed earlier. Thus the neighborhood of e in G, which is diffeomorphically
mapped by exp : Te G → G, is filled with the images of these subgroup maps, allow-
ing investigations into these subgroups by examining the local structure around the
identity element of the Lie algebra.
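A quick numerical check (illustrative, assuming scipy; the su(2) generator is an example choice) of the one-parameter subgroup property exp((s + t)A) = exp(sA) exp(tA):

import numpy as np
from scipy.linalg import expm

sx = np.array([[0, 1], [1, 0]], dtype=complex)
A = 1j * sx / 2                      # anti-Hermitian element of su(2)

s, t = 0.7, 1.3
lhs = expm((s + t) * A)
rhs = expm(s * A) @ expm(t * A)
print(np.allclose(lhs, rhs))         # True: one-parameter subgroup property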
Right and left translations by g ∈ G are defined as:

r_g : G → G, g′ ↦ g′g    l_g : G → G, g′ ↦ gg′

A vector field X on G is left-invariant if:

l_{g∗} X = X, ∀g ∈ G (C.1.33)
or equivalently:

l_{g∗}(X_h) = X_{gh}, ∀g, h ∈ G

with a similar notion of right invariance. Here l_{g∗} denotes the pushforward, being the derivative map of the left action l_g: l_g maps between manifolds, while l_{g∗} is the corresponding push-forward mapping between their respective tangent spaces. Connecting with the notation above, given an isomorphism between the tangent space at identity and L(G), given by ξ : T_eG → L(G) with ξ(A) = L^A for A ∈ T_eG, we define L^A_g = l_{g∗}A for all g ∈ G. The set of left-invariant vector fields is denoted L(G). Indeed one of the important properties of a left-invariant X is
that it is complete (Isham [48] §4.2). Similarly, local compactness is important in
quantum contexts as it relates to the existence, for example, of the Haar measure
(discussed in Appendix A above) where the measure exists on G if G is (quasi)-
invariant under left and right translations. For vector fields X1 , X2 on M that
are by a pushforward h∗ mapped to vector fields Y1 , Y2 on a manifold N where
h : M → N , the commutator [X1 , X2 ] is h-related to the commutator [Y1 , Y2 ]. For
left-invariant vector fields X1 , X2 , we then have that the commutator is invariant
under the left-invariant actions (translations) of G, namely:
which shows that [X1 , X2 ] ∈ L(G). It can be shown that the set L(G) is a ‘sub-Lie
algebra’ of the infinite-dimensional Lie algebra of all vector fields on the manifold
G. This is equivalent to the Lie algebra of G and represents a way of construing the
Lie algebra in terms of important invariant properties of vector fields. It can also be
shown that there is an isomorphism between the tangent space at the identity e ∈ G
i.e. Te G and L(G), allowing L(G) (and diffeomorphisms of G) to be explored via
actions upon T_eG (at the identity). The commutator [A, B] ∈ T_eG (for A, B ∈ T_eG) is defined such that:

L^{[A,B]} = [L^A, L^B].
In a basis {X_α} of the Lie algebra, the commutator takes the form [X_α, X_β] = Σ_γ C^γ_{αβ} X_γ for C^γ_{αβ} ∈ R, where C^γ_{αβ} denotes the relevant structure constants (which play roles in quantum mechanics, geometry and elsewhere). The adjoint map also has an expression in terms of actions on T_eG, and left-invariant fields X are complete, such that integral curves of X can be extended for all t ∈ R.
Because of its relevance in particular to our Chapter 5 results, we set out the signif-
icance of left and right invariance in relation to quantum control. Our problem in
that Chapter is to determine the optimal time to synthesise a unitary UT ∈ G using
only controls in p, the antisymmetric subspace corresponding to the Cartan decom-
position g = k ⊕ p, where g is the Lie algebra of G. Now consider the Schrödinger
equation:
dU/dt = −iH(t)U. (C.1.35)
Presented in this way, the Hamiltonian, on the left of U, is the generator of (infinitesimal) translations (to first order) U → U + dU over increments dt. U is our unitary at t while U + dU is our unitary at t + dt. The Hamiltonian is applied for dt, i.e. dU U⁻¹ = −iH(t)dt. Thus time in equation (C.1.35) moves from right to left.
Intuitively this is because U on the right-hand side represents the quantum state at
time t, while dU on the left-hand side represents it at the later time t + dt. When
we say right or left translation by g, we can think of right translation as following
the flow of time once its direction is chosen, and left translation in this case as time
flowing ‘backwards’ relative to that choice. Thus right and left action by h ∈ G is
expressed as follows:

(dU/dt) h = −iH(t)U h  (right action) (C.1.36)

h (dU/dt) = −ihH(t)U = −i(hH(t)h⁻¹) hU  (left action). (C.1.37)
Thus we can see how, once an implicit direction is chosen, left action, as distinct from right action, acts to conjugate (thus transform by an effective rotation) the Hamiltonian, H(t) → hH(t)h⁻¹ (recalling here that the action of h on g ∈ g is by conjugation, expressed infinitesimally as g → [X_h, g]). Thus while the representation of left or right invariance can be chosen initially in an arbitrary fashion, once chosen, in certain circumstances they are not equivalent.
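This conjugation effect can be checked numerically; the sketch below (illustrative, for a constant Hamiltonian; the Pauli operators are example choices, not from the thesis) confirms that left translation by h amounts to evolving under the rotated Hamiltonian hHh⁻¹:

import numpy as np
from scipy.linalg import expm

sx = np.array([[0, 1], [1, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)

H = sz                                # constant Hamiltonian
h = expm(-1j * np.pi / 4 * sx)        # fixed group element (unitary)
t = 0.9

U = expm(-1j * H * t)                 # solves dU/dt = -iHU
# Left action: hU equals evolution under hHh^{-1}, then applied to h.
print(np.allclose(h @ U, expm(-1j * (h @ H @ h.conj().T) * t) @ h))   # True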
We briefly note a few connections between the exponential map (discussed in Appendix B) and the geometric formalism above. The exponential map can be related to integral curves: t ↦ γ^{L^A}(t) is the unique integral curve satisfying:

A = γ^{L^A}_∗ (d/dt |_0) (C.1.38)

of the left-invariant vector field L^A associated with the identity, that is, γ^{L^A}(0) = e, and which is defined for all t ∈ R. The notation γ^{L^A} refers to the integral
curve generated by the left-invariant vector field LA originating from the identity ele-
ment e ∈ G. For any g ∈ G, the left translation map Lg : G → G, Lg (h) = gh, h ∈ G
does not alter L^A. Here A is the tangent vector to the curve γ^{L^A} at e, where γ^{L^A}_∗ represents the pushforward of the curve's tangent vector at t = 0. The integral curve is then written as a mapping from the affine parameter, t ↦ exp(tA), with A ∈ T_eG. The map exp : T_eG → G is defined via exp A = exp(tA)|_{t=1}, reflecting the conven-
tion that one moves from the identity e ∈ G along the integral curve generated by
A to exp(A) ∈ G with time t ∈ [0, 1]. This reflects the idea that evolution of curves
in G can be framed in terms of evolution from the identity element and therefore
studied in terms of the canonical Lie algebra associated with Te G.
Other results mentioned in Appendix B also apply, such as the exponential map
being a local diffeomorphism from T_eG to a neighbourhood of e ∈ G and that t ↦ exp(At) is considered a
unique one-parameter subgroup of G (indeed that all such one-parameter subgroups
are of that form for A ∈ T_eG ≃ L(G)). This reiterates the one-to-one association between one-parameter subgroups of G and elements of its Lie algebra.
Differential forms are an important tool for exhibiting how algebraic properties of g
manifest within or affect geometric and topological properties of a system. Of note
in this regard is the Maurer-Cartan form which is a g-valued one-form on G. The
form can be understood as relating structure constants and wedge products to the
derivative of a one-form, providing a means of expressing v ∈ Tp M in terms of g.
Given the Maurer-Cartan equation:

dω^α + (1/2) Σ_{β,γ=1}^n C^α_{βγ} ω^β ∧ ω^γ = 0 (C.1.39)
Here ω_g = l_{g⁻¹∗} : T_gG → g. The one-form ω_g relates the tangent space at any point g ∈ G back to the tangent space at the identity and therefore represents a map from T_gG → g. As Sharpe [8] notes, the equation is of profound use in classifying
properties of spaces. In particular, Cartan’s structural equations (see Theorem C.2.2
below) can be expressed as:
dω_g = (1/2)[ω_g, ω_g], where the bracket of g-valued forms is defined by [ω_g, ω_g](u, v) := [ω_g(u), ω_g(v)] (C.1.40)
which is also described as the Cartan curvature equation as it describes local cur-
vature (where dωg = 0 equates to flatness). Note we can derive Cartan curvature
equation (C.1.40) from equation (C.1.39).
We can connect different types of group actions discussed in earlier parts of this work to their geometric equivalents:

(i) Kernel of a group action: the kernel K comprises the elements acting trivially on all of M:

K = {g ∈ G | gp = p, ∀p ∈ M}.
(ii) Effective group action: is where the kernel equals the identity, that is K = {e}.
(iii) Free group action: for all p ∈ M it is the case that {g ∈ G | gp = p} = {e}, i.e. every point of M is moved by every non-trivial element; g = e is the only element such that gp = p.
(v) Orbit: the orbit O_p of the G-action through p is the set of all points in M that can be reached from p, that is O_p = {gp | g ∈ G}.
An important action throughout this work is the adjoint action of G on itself (dis-
cussed in Chapter 2):
Adg (g ′ ) = gg ′ g −1 .
The kernel of this action is the centre C(G), because the centre is the set of elements of G that commute with every element in G: for any g′ ∈ C(G) we have Ad_g(g′) = gg′g⁻¹ = g′gg⁻¹ = g′, i.e. the (effective) kernel of this action of conjugating by g is the set of elements of G on which the action is equivalent to the identity.
identity. The relation between the commutator of a pair of Lie algebra elements
(crucial for all quantum control problems) and the push-forward Adg∗ of the adjoint
action of G can be expressed as follows. Let A, B ∈ T_eG with Lie bracket [A, B]; then:

[A, B] = (d/dt) Ad_{exp(tA)∗}(B) |_{t=0}.
Recalling that the exponential is given such that exp(Ad_{g∗}(B)) = g(exp(B))g⁻¹, it can be shown that the commutator can be thought of in terms of this conjugacy, with X^{Ad_{g∗}(A)} = δ_{g⁻¹∗}(X^A), where δ_{g⁻¹∗} is the differential (pushforward) of the map g⁻¹ acting on X^A. That is, the vector field associated with the adjointly transformed Lie algebra element A is equivalent to the correspondingly transformed vector field X^A.
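The derivative formula for the bracket can be verified by a finite difference; a sketch (illustrative, for su(2); the generators are example choices):

import numpy as np
from scipy.linalg import expm

sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]], dtype=complex)
A, B = 1j * sx, 1j * sy

def Ad(g, X):
    # adjoint (conjugation) action Ad_g(X) = g X g^{-1}
    return g @ X @ np.linalg.inv(g)

eps = 1e-6
diff = (Ad(expm(eps * A), B) - Ad(expm(-eps * A), B)) / (2 * eps)
print(np.allclose(diff, A @ B - B @ A, atol=1e-6))   # True: [A, B] recovered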
We conclude this section by reiterating, in a geometric context, the key results related to Lie algebra homomorphisms and connecting these to vector fields. This is important for further on, when we introduce the connection on vector (fibre) bundles that serves a pivotal role in our final chapter. In the theory of infinitesimal transformations, we can represent the Lie algebra g of G via vector fields on the manifold M on which G acts. This mapping is a homomorphism of Lie algebras and associates X^A with A ∈ T_eG. Assume G right-acts on M. The mapping

A ↦ X^A

that assigns to each A ∈ T_eG the vector field X^A (denoted the induced vector field on M) is a homomorphism from the Lie algebra L(G) ≃ T_eG into the infinite-dimensional Lie algebra of all vector fields on M. This is denoted:

[X^A, X^B] = X^{[A,B]}
Definition C.1.21 (Fibre). The fibre F_p over p is the inverse image of p under π. Geometrically, in the case of the tangent bundle TM, for example, π⁻¹ associates p ∈ M with the tangent space T_pM. Formally, we define the fibre F_p as follows. The projection π associates each fibre F_p with a point p ∈ M, where:

F_p = π⁻¹({p}),

which defines F_p as the preimage of p under π (that is, the set of all points in E mapped to p ∈ M).
Certain bundles have the special property that the fibres π −1 ({p}), p ∈ M are
all homeomorphic (diffeomorphic for manifolds) to F . In such cases, F is known as
the fibre of the bundle and the bundle is said to be a fibre bundle (we explore this
more via associated bundles below). For vectors, this is the set of all vectors that
are tangent to the manifold at the point p. The fibre bundle is sometimes visualised
in diagrammatic form (see Isham [48] §5.1):
F ↪ E → M (with projection π)
Related to fibre bundles is the idea of a section. A section of a bundle (E, π, M) is a map from the base space (manifold) to the total space:

s : M → E

such that the image of each point p ∈ M lies in the fibre π⁻¹({p}) over p, i.e.:

π ∘ s = id_M

So the section is a map from the manifold such that we take p, send it up to the total space E, and then that image of p, when subjected to π, takes us back to p ∈ M; hence π ∘ s is equivalent to the identity on M. One can also define maps between bundles and pullbacks among fibres analogous to the case for vector bundles.
An important canonical fibre bundle is the principal fibre bundle whose fibre acts
as a Lie group in a particular way. For our purposes, they allow for the definition of
connections (see below) which describe how fibres (or vector spaces) are connected
over different points in M. Connections are of fundamental importance to results
in our final Chapter and also to definitions of vertical and horizontal subspaces in
sub-Riemannian control problems further on. A connection on a principal bundle
defines a notion of horizontal and vertical subspaces within the tangent space of the
total space. This distinction is crucial for defining parallel transport and curvature,
concepts that are central to understanding the dynamics and control of systems with
symmetry. Firstly, we define a principal fibre bundle as follows.
A principal fibre bundle is a bundle (E, π, M) on which a Lie group G acts freely with M ≃ E/G, where E/G is the orbit space of the G-action on E and ρ is here the usual projection map, such that the projections π : E → M and ρ : E → E/G are compatible with the isomorphism M ≃ E/G.
A principal fibre bundle has a typical fibre that is a Lie group G, and the action of G on the fibres is by right multiplication, which is free and transitive. The fibres of the bundle are the orbits of the G-action on E and, the action being free and transitive, each is homeomorphic to G. All non-principal bundles are associated with a principal bundle. Principal G-bundles are those where G acts freely on E (free action meaning that the only element of G that acts as an identity element on $x \in E$ is
the identity in G itself). Given a closed subgroup $H \subset G$, the quotient space G/H is also an orbit space with fibre H. When G is the fibre itself and the action on G is both free and transitive, we use the notation P to denote the total space (i.e. E). The notation P emphasises that each fibre is isomorphic to G itself and that all fibres in the bundle are homogeneous, i.e. they are all structurally isomorphic (so we can utilise a single representation for each). A few other concepts to note include:
(i) Principal total space (P). For principal G bundles, the total space is often
denoted as a principal total space where the principal G bundle is then indi-
cated by the triple (P, π, M) where P is the principal total space. There exist
principal maps between two such bundles i.e. u : P → P ′ for (P, π, M) and
(P ′ , π ′ , M′ ). The mapping is G-equivariant as u(pg) = u(p)g. Here G acts on
P and P projects onto M via π.
For the avoidance of doubt, we have so far (with slight abuse of notation for con-
venience) been equating G ≡ M (as distinct from explicitly notationally indicating
the group acting on the manifold M × G), where Tg G for g ∈ G captures the in-
finitesimal directions in which G can evolve, mapping directly to tangent spaces
Tp M on the manifold. Usually the principal bundle (P, π, M) is introduced as a
more abstract formulation to cater for where G acts on a different manifold (e.g.
where G ̸= M). The formulation of total space P allows consideration of how the
tangent spaces $T_gG$ (that is, for our formulation, $T_pM$) are related across the entirety of M. Thus in our treatment, we are interested in how G acts upon itself, which in the language of quantum operators is how operators act upon one another, e.g. $U_1(t)U_2(t) = U'(t)$ can be regarded as group elements acting on themselves.
While principal fibre bundles above allow us to abstractly associate the group action
of G to a manifold, the fibres remain abstract. In practice we want an association
between fibre bundles and more familiar structures from a geometric control and
quantum information processing perspective, e.g. we want our fibres to have the
structure of vector spaces, tangent space or Lie algebras. For this we turn to the
concept of associated fibre bundles which enable the construction of bundles with
fibres that are not necessarily groups but can be any space on which the group acts.
This is particularly relevant to our use of geometric methods where fibres are vector
spaces, such as Lie algebras or tangent spaces. The idea [48] is that an associated
bundle can be constructed where G acts as a group of transformations on F .
Vector bundles are the primary type of (associated) fibre bundle with which we
are concerned in this work. The concept has broad application, e.g. in machine
learning by allowing the representation of data points that are sensitive to inherent
symmetries in the data. In quantum settings, unitary operations and quantum
channels, which describe the evolution of quantum states, can be seen as bundle
maps that act on the sections of vector bundles. Unitary operations, representing
reversible quantum evolutions, can be modeled as isomorphisms of Hilbert spaces H (definition A.1.9) as fibres which preserve their structure. Quantum channels, as more general completely positive maps, can be treated analogously.
C.1.7 Connections
Connections are a fundamentally important concept in geometry, encapsulating the differentiation of vector fields along curves on manifolds. A connection on
a manifold M, or more specifically on a tangent bundle T M, provides a systematic
way to parallel transport vectors along paths, allowing the comparison of tangent
spaces at different points on M, effectively facilitating the extension of the notion
of directional derivatives to curved spaces. Intuitively the idea of a connection
is a means of associating vectors between infinitesimally adjacent tangent planes
Tp M → Tp+dp M as one progresses from γ(t) = p to γ(t + dt) = p + dt on M.
Another way to think about them is that they provide a way of identifying how
Tp M transforms along the curve γ(t), giving conditions to be satisfied, in the form
of transformation rules for such vectors, for tangent vectors γ̇(t) to remain parallel
as they are transported along a curve (parallel transport or the vanishing of the
covariant derivative, which we discuss below).
Connections are also a fundamental means of distinguishing between Riemannian
and subRiemannian manifolds via the decomposition of fibres (and Lie algebras) into
horizontal and vertical subspaces. They are thus central to quantum and classical
control problems, such as the KP control problem we study extensively in the final
Chapter. In Riemannian geometry, for example, Christoffel symbols represent co-
efficients of connection on the bundle of frames (set of bases) for M. Connections,
as we explain below, are related to notions of parallel transport and covariant dif-
ferentiation. The underlying idea of parallel transport and covariant differentiation
requires that one compares points in neighbouring fibres (or vector spaces, in the
case of a vector bundle) in a way that isn’t dependent upon a particular local co-
ordinate system (or trivialisation). Thus a concept of directionality is needed such
that vector fields point from one fibre to another. Vector fields arising from Lie
algebras lack this intrinsic orientation of directional pointing. The connection pro-
vides a concept of directionality by partitioning the fibre into horizontal and vertical
subspaces as discussed below.
Consider a principal G-bundle (P, π, M) above. A connection on P provides a
smooth splitting of the tangent space Tp P at each point p ∈ P into vertical and
horizontal subspaces, Tp P = Vp P ⊕ Hp P , where Vp P is tangent to the fibre and
$H_pP$ is isomorphic to $T_{\pi(p)}M$. To understand this formalism, we expand upon the important concepts of vertical and horizontal subspaces. These are fundamental in later chapters, where the synthesis of time-optimal (approximate) geodesics arises by way of this decomposition. The vertical subspace is defined as:

$V_pP = \{\tau \in T_pP \,|\, \pi_*\tau = 0\}$

with the splitting:

$T_pP = V_pP \oplus H_pP.$
The horizontal subspace consists of those vectors that are orthogonal (under a cho-
sen Riemannian or pseudo-Riemannian metric on the bundle) to the vertical space
with respect to a chosen connection. Vectors in Hp P are “horizontal” in the sense
that they correspond to displacements that lead to movement in the base manifold
M when considered under parallel transport defined by the connection. This can be expressed through the equivariance condition $H_{pg}P = (\delta_g)_* H_pP$, where $\delta_g(p) = pg$ denotes the right action of G on P. This condition ensures that the designation of vectors as horizontal is consistent with the geometric structure of the bundle as defined by the group action.

[Figure C.1: Commutative diagram showing the relationship of the connection (map from $G \to P \to M$), the projection map $\pi : P \to M$ and the induced horizontal map (the pushforward) $\pi_* : TP \to TM$.]
Intuitively, vertical subspaces manifest the internal symmetry of the bundle en-
coded by the Lie group G, while horizontal subspaces encapsulate the geometry of
how the bundle expands over the base manifold. This geometric structure, facil-
itated by the connection, is crucial for defining parallel transport, curvature and
ultimately geodesics (and their approximations) that characterise time-optimality.
We can thus formally define a connection ([48, 49] and [2]) in these terms: for the principal bundle $G \to P \to M$, the projection π induces the pushforward

$\pi_* : T_pP \to T_{\pi(p)}M.$
By construction, the kernel of this map is the vertical subspace. Connections can also be usefully understood in terms of one-forms. Connections can be associated with certain L(G)-valued one-forms ω on P. Recall the map $\ell : L(G) \to VFlds(P)$, $A \mapsto X^A$, so $\ell^{-1}$ maps back to the Lie algebra L(G). If $\tau \in T_pP$, then:

$\omega_p : T_pP \to L(G), \qquad \omega_p(\tau) = \ell^{-1}(\mathrm{ver}(\tau))$

where $\mathrm{ver}(\tau)$ denotes the vertical component of τ in $V_pP$. As this map takes us back to L(G), with ℓ the isomorphism of L(G) with $V_pP$, we can associate one-forms with maps between the tangent space and the Lie algebra as follows:

(i) $\omega_p(X^A) = A, \ \forall p \in P, A \in L(G)$;

(ii) $\omega_{pg}((\delta_g)_*\tau) = \mathrm{Ad}_{g^{-1}}(\omega_p(\tau))$ (equivariance under the right action); and

(iii) $\tau \in H_pP \iff \omega_p(\tau) = 0$.
As touched upon above, the horizontal subspace of the tangent space TP can thus be thought of as the kernel of the one-form taking values in the Lie algebra L(G) of left-invariant vector fields (see section C.1.34).
The decomposition into vertical and horizontal subspaces discussed above is par-
ticularly relevant to time optimal control problems. In essence, given the Cartan
decomposition (B.5.2) g = k ⊕ p we associate k as the vertical and p as the hori-
zontal subspace. We can then understand the symmetry relations expressed by the
commutators related to this vertical and horizontal sense of directionality: i.e. given
[k, k] ⊂ k, [k, p] ⊂ p and [p, p] ⊂ k we can see that the horizontal generators under
the adjoint action shift from p ∈ G/K to p′ ∈ G/K, while the vertical generators in
k do not translate those points in G/K. We introduce the Cartan connection and its curvature two-form to elucidate this further, the latter given by Cartan's structure equation:

$\Omega_U = d\theta_U + \frac{1}{2}[\theta_U, \theta_U]$  (C.1.45)
1. Vertical subspace k. The first relation [k, k] ⊆ k provides that the Lie bracket of
two vertical generators remains within the vertical subspace. Geometrically,
this means that transformations generated by elements of k remain within the
fibres of the principal bundle G → G/K. Such vertical movements do not
evolve to other points in the base space G/K, reflecting an intrinsic, self-
contained dynamics within each fibre, akin to internal symmetries. Geometri-
cally, translations by elements of k keep points within the same orbit in G/K.
2. Horizontal subspace p. The relation $[p, p] \subseteq k$ provides that the Lie bracket of two horizontal generators is vertical. Geometrically, traversing a closed loop generated by horizontal directions can yield a net vertical displacement within the fibre, the signature of curvature that underlies the bracket-generating behaviour relied upon in subRiemannian control.
3. Vertical acting on horizontal. The relation [k, p] ⊂ p implies that the action of
vertical elements on horizontal ones results in horizontal displacements. This
can be interpreted as the influence of the group’s internal symmetries on the di-
rectionality of movement within G/K. In other words, the vertical generators,
through their adjoint action, can modify the direction of horizontal generators
without leaving the horizontal plane, thereby affecting the trajectory of points
in G/K without transitioning to vertical movement.
Recall that in the full Lie group G, orbits are generated by the action of its sub-
groups, including K, on points within G. For a subgroup K and an element g ∈ G,
the orbit of g under the action of K is defined as $O_g = \{k \cdot g \,|\, k \in K\}$. These orbits
represent continuous trajectories or paths within G that are traced out by the action
of K, reflecting the inherent symmetry structure imposed by K on G. When con-
sidering the quotient space G/K, which represents the space of left cosets of K in
G, the action of K on G translates differently. Here, each point in G/K corresponds
to an orbit of K in G, and the quotient space essentially collapses these orbits to
single points, with the projection map π : G → G/K sending elements of G to their
corresponding orbits in G/K.
This perspective arises because, within G/K, each orbit $O_g$ is identified with a single point, and the distinct actions of elements of K on points in G that would have moved them along their orbits are now seen as leaving the corresponding points in G/K invariant. That is:

$k \cdot (gK) = gK$
for all k ∈ K and g ∈ G, where gK denotes the coset (or point in G/K) correspond-
ing to g. The action of K on any g ∈ G thus translates to the invariance of the coset
gK in G/K, effectively rendering K as acting by fixed points in the quotient space.
To this end, each fibre of G/K can be thus regarded as an equivalence class of orbits.
For any Lie group G and a subgroup K, the quotient space G/K is constructed by
partitioning G into equivalence classes under the equivalence relation:
$g \sim g' \iff g' = gk$
for k ∈ K. This relation groups elements of G into sets where each set, or equivalence
class, contains elements that can be transformed into one another by the right action
of elements of K. Mathematically, an equivalence class of g ∈ G can be denoted by
the coset gK = {gk | k ∈ K}, which represents an orbit of g under the action of K.
In the quotient topology of G/K, each point corresponds to one such equivalence
class or orbit. The projection map π : G → G/K maps each element g ∈ G to the
equivalence class gK it belongs to. The preimage π −1 (π(g)) = gK under this map
is a fibre over the point π(g) in G/K. Therefore, each fibre of G/K is representative
of an orbit in G under the action of K, signifying that the entire set of elements
in G that are related through multiplication by elements of K are collapsed to a
single point in G/K. This illustrates how the quotient space encapsulates the idea of
moving from the specific (individual group elements in G) to the general (equivalence
classes or orbits in G/K) by abstracting away the internal symmetries represented
by K.
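To make the coset construction concrete, the short sketch below (an illustration of ours; any finite group would serve) partitions the cyclic group $\mathbb{Z}_6$ into cosets of the subgroup $K = \{0, 3\}$, mimicking the projection $\pi : G \to G/K$:

# Partition G = Z_6 into cosets gK of the subgroup K = {0, 3},
# mimicking the projection pi : G -> G/K (illustrative only).
G = list(range(6))                  # Z_6 under addition mod 6
K = [0, 3]                          # a closed subgroup

def coset(g):
    """The equivalence class gK = {g + k mod 6 : k in K}."""
    return frozenset((g + k) % 6 for k in K)

pi = {g: coset(g) for g in G}       # each g is sent to its orbit/coset
quotient = set(pi.values())         # G/K: the distinct equivalence classes

print(sorted(sorted(c) for c in quotient))
# [[0, 3], [1, 4], [2, 5]] -- three fibres, each an orbit collapsed to a point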
(i) $\pi_*(X_p^\uparrow) = X_{\pi(p)}$; and

(ii) $V_p(X_p^\uparrow) = 0$.

The vector field $X^\uparrow$ is called the horizontal lift of X as it 'lifts' the vector field X on M up into the horizontal subspace of TP. The requirement $V_p(X_p^\uparrow) = 0$, indicating that $X^\uparrow$ lies entirely in the horizontal subspace $H_pP$, encapsulates the essence of parallel transport as maintaining the direction of X through the fibres of P. For a smooth curve γ in M, a horizontal lift $\gamma^\uparrow : [a, b] \to P$ is a curve whose tangent vectors lie in the horizontal subspaces $H_{\gamma^\uparrow(t)}P$, i.e. it is horizontal in that:

$V_p[\dot\gamma^\uparrow] = 0, \qquad \pi(\gamma^\uparrow(t)) = \gamma(t), \ \forall t \in [a, b].$
Given a point p ∈ P above γ(a), there exists a unique horizontal lift γ ↑ starting at
p. This uniqueness underscores the connection’s role in determining a specific way
to transport points along curves in M, embodying the geometric intuition behind
parallel transport.
Definition C.1.30 (Parallel vector fields). For a curve $\gamma(t) \in M$, the vector field X is parallel along γ(t) if the covariant derivative vanishes along the curve, that is if:

$\nabla_{\dot\gamma(t)} X = 0.$

Parallel transport along γ is then the map

$\tau : \pi^{-1}(\{\gamma(a)\}) \to \pi^{-1}(\{\gamma(b)\})$

carrying the fibre over γ(a) to the fibre over γ(b) such that the covariant rate of change of the transported vector vanishes. Holonomy reflects how connections on M influence the paths taken by curves on the manifold, where curves traced out by the action of the holonomy group constitute (time-optimal) subRiemannian or Riemannian geodesics (discussed further on in this work). The covariant derivative may be recovered from parallel transport via the limit:

$\nabla_{\dot\gamma}\psi = \lim_{t \to 0}\frac{1}{t}\left(\tau_t^{-1}\psi(\gamma(t)) - \psi(x_0)\right)$

where $\tau_t^{-1}$ is the parallel transport operator from the fibre $\pi^{-1}(\{\gamma(t)\})$ back to the fibre $\pi^{-1}(\{x_0\})$.
Note we use a slight abuse of notation where ∇γ refers more properly to ∇γ̇
where γ̇(t) is the tangent vector to the curve γ at the point γ(t). Here τt−1 transports
vectors back along γ(t) → γ(0) while preserving parallelism. We interchange with
the notation ∇X which typically refers to the covariant derivative along a vector
field X ∈ X(M) on a manifold M. Here, X is a smooth section of the tangent
bundle T M, and ∇X is an operator that acts on a vector field or section of a vector
bundle. If two curves γ1 , γ2 are tangent at p ∈ M then we have ∇γ1 ψ = ∇γ2 ψ.
Moreover, connecting with the notation:

(a) For a tangent vector $X_p \in T_pM$, $\nabla_{X_p}\psi$ denotes the covariant derivative of the section ψ at p in the direction $X_p$ (as per the parallel transport limit above); and

(b) For the vector field X on M, the covariant derivative along X is the linear operator:

$\nabla_X : \Gamma(PV) \to \Gamma(PV), \qquad \nabla_X\psi(p) = \nabla_{X_p}\psi.$
Consider now affine connections, noting X(M) is also denoted Γ(PV ). We note that
∇X on X(M) exhibits the structure of derivation:
∇X (f ψ) = f ∇X ψ + X(f )ψ (C.1.48)
for f ∈ C ∞ (M). The ∇X operator is also linear in X(M) (which can also be
regarded as a module over C ∞ (M)). These properties are expressed by considering
∇X as an affine connection.
That is, $\nabla_{fX+gY}\psi = f\nabla_X\psi + g\nabla_Y\psi$ for $f, g \in C^\infty(M)$, $X, Y \in \mathfrak{X}(M)$. In a coordinate basis, the connection acts on the basis fields as:

$\nabla_{\partial_i}\partial_j = \Gamma^k_{ij}\partial_k$

where $\Gamma^k_{ij} = (\nabla_{\partial_i}\partial_j)^k$ are Christoffel symbols of the second kind and $(\cdot)^k$ indicates the k-th component. For the covariant derivative of Y with respect to X, the k-th coordinate is given by:

$(\nabla_X Y)^k = X^i(\nabla_i Y)^k = X^i\left(\frac{\partial Y^k}{\partial x^i} + \Gamma^k_{ij}Y^j\right).$
We can use the above concepts to define a notion of parallelism leading to a definition
of geodesics which are curves that locally minimise distance and are solutions to the
geodesic equation derived from a chosen connection. Denote γ : I → M, t 7→ γ(t) for
an interval I ⊂ R (which we generally without loss of generality specify as I = [0, 1])
with an associated tangent vector γ̇(t). Here γ is regular. Two vector fields X, Y
are parallel along γ(t) if:
∇X Y = 0 ∀t ∈ I.
In local coordinates on a chart U, the covariant derivative reads:

$\nabla_X(Y) = \sum_k\left(\sum_i X^i\frac{\partial Y^k}{\partial x^i} + \sum_{i,j}X^i Y^j\Gamma^k_{i,j}\right)\frac{\partial}{\partial x^k}$ on U

while along γ(t) parallelism requires:

$\frac{dY^k}{dt} + \sum_{i,j}\Gamma^k_{i,j}\frac{dx^i}{dt}Y^j = 0 \quad (t \in J)$  (C.1.49)
which can be shown [2] to represent equation (C.1.48) in the limit as t → 0. Equation
(C.1.49) indicates that the component form of the covariant derivative along γ(t) is
zero (hence indicative of parallel transport). The covariant derivative operator ∇X
has a number of properties, including that (i) it is a derivation (an (r, s)-tensor being
drawn from the derivation space of tangent and cotangent vectors respectively), (ii)
it preserves tensors (i.e. it maps (r, s) tensors to (r, s) tensors) ∇X : T r,s M → T r,s M
and (iii) commutes with all contractions $C^i_j$ (see above). From this we come to the first formulation of a geodesic, the geodesic equation:
$\frac{d^2u^\gamma}{ds^2} + \Gamma^\gamma_{\alpha\beta}\frac{du^\alpha}{ds}\frac{du^\beta}{ds} = 0$

where:

$\Gamma^\gamma_{\alpha\beta} = \frac{1}{2}g^{\gamma\mu}\left(\frac{\partial g_{\mu\alpha}}{\partial u^\beta} + \frac{\partial g_{\mu\beta}}{\partial u^\alpha} - \frac{\partial g_{\alpha\beta}}{\partial u^\mu}\right)$
are the Christoffel symbols of the second kind (essentially connection coefficients)
with g γµ the inverse of the metric tensor and ds usually indicates parametrisation
by arc length. It can be shown that given an affine connection above, for any
p ∈ M and Xp , there exists a unique maximal geodesic t 7→ γ(t) such that γ(0) = p
and γ̇(0) = Xp , i.e. it cannot be extended to a larger interval. As Nielsen et al.
note [183, 184], the geodesic equation is solved either (a) as an initial value problem
by specifying γ(0) = p, γ̇(0) = v for v ∈ Tp M, or (b) a boundary value problem
for γ(0) = p and γ(1) = q using variational methods. For Lie group manifolds, all
geodesics are generated by generators from the horizontal subspace Hp M, but not
all curves generated from the horizontal subspace are geodesics. Note (for clarity)
with regards to Hp P and Hp M that we can denote the image of Hp P under π∗ as
Hp M, representing bundle’s horizontal structure within the tangent space at π(p)
in M, thereby elucidating how the geometry of the bundle expands over the base
manifold via the connection.
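The geodesic equation can be integrated numerically as an initial value problem. The sketch below (our own illustration, assuming SciPy) does so on the unit 2-sphere in coordinates (θ, φ), whose nonzero Christoffel symbols are $\Gamma^\theta_{\phi\phi} = -\sin\theta\cos\theta$ and $\Gamma^\phi_{\theta\phi} = \Gamma^\phi_{\phi\theta} = \cot\theta$; a curve starting on the equator with purely azimuthal velocity should remain on the equator, tracing a great circle:

import numpy as np
from scipy.integrate import solve_ivp

def geodesic_rhs(s, state):
    """Geodesic equation on the unit 2-sphere in coordinates (theta, phi)."""
    theta, phi, dtheta, dphi = state
    # d^2 u / ds^2 = -Gamma^gamma_{alpha beta} (du^alpha/ds)(du^beta/ds)
    ddtheta = np.sin(theta) * np.cos(theta) * dphi**2
    ddphi = -2.0 * dtheta * dphi / np.tan(theta)
    return [dtheta, dphi, ddtheta, ddphi]

# Start on the equator with unit azimuthal speed
y0 = [np.pi / 2, 0.0, 0.0, 1.0]
sol = solve_ivp(geodesic_rhs, [0, np.pi], y0, rtol=1e-10, atol=1e-10)

# The solution remains on the equator (theta = pi/2): a great circle
assert np.allclose(sol.y[0], np.pi / 2, atol=1e-6)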
for any vector fields X, Y, Z ∈ X(M), where ∇ denotes the Levi-Civita connection
associated with the metric g.
The Riemann curvature tensor measures the extent to which the affine connection
(covariant derivative) is not commutative i.e. the failure of the covariant derivatives
to commute and therefore the extent of intrinsic curvature of the manifold. In this
way, it corresponds to the non-holonomy of M. Recall Holp (ω) for a loop γ reflects
parallel transportation of a vector v ∈ Tp M to and from p ∈ M along γ. The
deviation of v from its initial position can be given by the action of the Riemann
curvature tensor along the loop expressed by the Ambrose-Singer Holonomy The-
orem [262] which states that the Lie algebra of Holp (ω) at p ∈ M is generated
by the values of the curvature Ω over all horizontal two-planes in Tp P where p is
a point of the fibre over x (see also section C.2.3.2 below). The properties of $Hol_p(\omega)$ that describe how parallel transport around closed loops γ distorts vectors are related to the curvature experienced by these vectors in planes spanned by tangent vectors at $T_xM$. Integrating the curvature over all such two-dimensional subspaces identifies the set of possible curvatures that parallel transport can induce, i.e. each element in $Hol_p(\omega)$ corresponds to a unique way of parallel transporting, and the relation to curvature quantifies the amount of twisting or distortion of the vector (the curvature being responsible for both the distortion experienced and the Lie algebra of $Hol_p(\omega)$).
Given that on Riemannian manifolds the affine connection is the Levi-Civita connection, which is metric-compatible and torsion-free, the Riemann curvature tensor has an expression in terms of the Christoffel symbols:

$R^\alpha_{\beta\gamma\delta} = \Gamma^\alpha_{\beta\delta,\gamma} - \Gamma^\alpha_{\beta\gamma,\delta} + \Gamma^\mu_{\beta\delta}\Gamma^\alpha_{\mu\gamma} - \Gamma^\mu_{\beta\gamma}\Gamma^\alpha_{\mu\delta}.$
To obtain measures of curvature, one can then perform contractions with the metric tensor g. The Ricci tensor is obtained by a tensor contraction over the first and third indices, i.e. $R_{\mu\nu} = R^\lambda_{\mu\lambda\nu}$. Scalar curvature is then obtained via contraction with the inverse metric tensor, $R = g^{\mu\nu}R_{\mu\nu}$, in effect a trace operation.
With an appropriate one-form, we can then define the torsion tensor field as the argument for the one-form mapping $\omega : T^*M \times TM \times TM \to \mathbb{R}$ with $(\omega, X, Y) \mapsto \omega(T(X, Y))$, where T is a type (1, 2) tensor field. We can similarly define the curvature tensor field via $\omega : T^*M \times TM \times TM \times TM \to \mathbb{R}$ with $(\omega, X, Y, Z) \mapsto \omega(R(X, Y) \cdot Z)$, where R is a type (1, 3) tensor field. It can be shown that the relevant one-forms in turn determine the structure of the covariant derivative on M, while it can be shown (via Cartan's structure equations) that the forms ω are themselves described by the torsion and curvature tensor fields as per below (adapted from Theorem 8.1 in [2], §8). The relation of curvature and torsion to differential forms is shown explicitly in Cartan's structural equations, which we set out below.
2g(X, ∇Z Y ) = Zg(X, Y ) + g(Z, [X, Y ]) + Y g(X, Z) + g(Y, [X, Z]) − Xg(Y, Z) − g(X, [Y, Z]).
(C.2.6)
$g_p : T_pM \times T_pM \to \mathbb{R}, \qquad (v, w) \mapsto g_p(v, w)$

with induced norm:

$\|\cdot\|_p : T_pM \to \mathbb{R}, \qquad \|v\|_p = \sqrt{g_p(v, v)}.$
The Riemannian metric is a (0, 2)-form and thus a tensor, aligning with definition (C.1.16). Recall that (0, 2)-forms at $p \in M$ have a representation as the tensor product of two
cotangent vectors at p. Given a coordinate basis for $T_p^*M$ at p, denoted $\{dx^1, dx^2, \ldots, dx^n\}$, a (0, 2)-form g can be expressed in this basis as:

$g = \sum_{i,j} g_{ij}\, dx^i \otimes dx^j,$
where the gij are the components of the form with respect to the basis, and they can
be functions on the manifold. We can then identify an induced norm on M using this metric: for $X \in T_pM$, $\|X\| = \sqrt{g_p(X, X)}$.
We may now define the important concept of arc (curve) length, which is fundamental to later chapters, for a curve (segment) $t \mapsto \gamma(t)$ with metric $g_\gamma = g$.

Definition C.2.6 (Arc length). Given a curve $\gamma(t) \in M$ with $t \in [0, T]$ and metric g, the arc length of the curve from γ(0) to γ(T) is given by:

$\ell(\gamma) = \int_0^T\left(g(\dot\gamma(t), \dot\gamma(t))\right)^{1/2}dt.$  (C.2.8)
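As a quick numerical check of definition C.2.6 (an illustrative sketch of ours), the arc length integral can be approximated by quadrature; for the equator of the unit sphere under the round metric, traversed over $t \in [0, 2\pi]$, $\ell(\gamma)$ recovers the circumference 2π:

import numpy as np

# Arc length of gamma(t) = (theta(t), phi(t)) on the unit sphere with the
# round metric g = diag(1, sin(theta)^2) (illustrative sketch).
T = 2 * np.pi
t = np.linspace(0.0, T, 20001)
theta, phi = np.full_like(t, np.pi / 2), t          # the equator
dtheta, dphi = np.gradient(theta, t), np.gradient(phi, t)

speed = np.sqrt(dtheta**2 + np.sin(theta)**2 * dphi**2)  # (g(gdot, gdot))^{1/2}
ell = np.trapz(speed, t)                                 # equation (C.2.8)
print(ell)  # ~ 6.2832 = 2*pi, the circumference of a great circle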
Here d(p, q) satisfies the usual axioms of metrics, namely (a) symmetry d(p, q) =
d(q, p), (b) triangle inequality d(p, q) ≤ d(p, r) + d(r, q) and (c) positive definite
d(p, q) = 0 if and only if p = q. It can then be shown that for such a Riemannian
manifold M with metric d given by equation (C.2.9) together with a curve γpq joining
$p, q \in M$, if $\ell(\gamma_{pq}) = d(p, q)$ then $\gamma_{pq}$ is a geodesic. Then consider an open ball around p with radius r, $B_r(p)$, and the associated sphere around p denoted $S_r(p)$. Assume the ball $V_r(0) = \{X \in T_pM \,|\, \|X\| < r\}$ is a normal neighbourhood of $0 \in T_pM$. In this case it can be shown that $B_r(p) = \exp_p(V_r(0))$. For complete Riemannian manifolds (where every Cauchy sequence in M converges), each pair of points $p, q \in M$ can be joined by a geodesic of length d(p, q).
In this section we briefly set out a few key theorems and definitions from standard
differential geometry relating to fundamental forms and curvature which are familiar
for differential geometry on Euclidean spaces. In particular we connect standard
results to the coordinate-free formalism adopted above.
(i) First fundamental form. The first fundamental form is an expression of the metric tensor g on M and, in a coordinate basis (u, v), is denoted:

$I = E\,du^2 + 2F\,du\,dv + G\,dv^2.$

(ii) Second fundamental form. The second fundamental form describes how surfaces curve in ambient space, i.e. by reference to a normal to the surface. In general, one may define a level set of a function $f : M \to \mathbb{R}$ as the set of points $p \in M$ such that $f(p) = c \in \mathbb{R}$, with unit normal $N = \nabla f/\|\nabla f\|$; in a coordinate basis the second fundamental form is:

$II = e\,du^2 + 2f\,du\,dv + g\,dv^2.$
(iii) Gauss and Codazzi equations. The Gauss equation provides that the Gauss curvature is a measure of the intrinsic curvature of the manifold. Given the first and second fundamental forms in a coordinate basis, with principal curvatures $k_1, k_2$ (the eigenvalues of the shape operator, $dN = \mathrm{diag}(k_1, k_2)$), the Gauss curvature K and mean curvature H are:

$K = \frac{\det(h_{ij})}{\det(g_{ij})} = \frac{eg - f^2}{EG - F^2} = k_1k_2$  (C.2.15)

$H = \frac{1}{2}(k_1 + k_2) = \frac{1}{2}\,\frac{eE - 2fF + gG}{EG - F^2}.$  (C.2.16)

The Codazzi equations provide compatibility conditions which, when satisfied, ensure that the local shape of the surface given by the normal curvature $\nabla_X N$ is consistent with the global properties of the manifold (thereby providing an integrability condition).
C.2.3.1 Overview

Recall that a section $s : M \to E$ of a bundle satisfies:

$\pi(s(p)) = p, \ \forall p \in M.$  (C.2.19)
This condition ensures that the section lifts each point in the base manifold back
to a specific point in the total space, intersecting each fibre exactly once. For a
vector bundle section then the section is a map assigning to each p ∈ M a vector
in the corresponding fibre. We now define the sectional curvature as taking a two-
dimensional sub-bundle of the section in order to calculate curvature. The intuitive
idea is that sectional curvature allows one to observe how a plane embedded in the
manifold bends or curves as it moves through the space.
For linearly independent $X, Y \in T_pM$, the sectional curvature is:

$K(X, Y) = \frac{g(R(X, Y)Y, X)}{g(X, X)g(Y, Y) - g(X, Y)^2} = \frac{\langle R(X, Y)Y, X\rangle}{\langle X, X\rangle\langle Y, Y\rangle - \langle X, Y\rangle^2}$

where R is the Riemann curvature tensor and g is the Riemannian metric tensor; the denominator is the squared area of the parallelogram spanned by X and Y, simplifying when g(X, Y) = 0 (X, Y orthogonal).
Sectional curvature K thus measures how the Riemann curvature tensor, when acting on X, Y, relates to the area spanned by X and Y. This spanning of an area is also described by a "2-Grassmannian bundle", which intuitively speaking is the set of all two-dimensional planes $H \subset T_pM$. A Riemannian manifold then has constant curvature c if K(H) = c for all such planes. Indeed it can be shown (see [2], Prop. 12.3) that sectional curvature determines the Riemannian curvature, such that if $K_p = K_q$ for two tangent spaces $T_pM, T_qM$, then $R_p = R_q$. For a two-dimensional manifold, the sectional curvature of the plane spanned by (X, Y) coincides with the Gauss curvature.
C.3.1 Overview
Symmetric spaces were originally defined as Riemannian manifolds whose curvature tensor is invariant under all parallel translations. Cartan's work allowed the classification of all symmetric spaces in terms of classical and exceptional semi-simple Lie groups. Cartan's classification of symmetric spaces (see Chapters IV and IX of [2]) is of seminal importance. Locally they are manifolds of the form $\mathbb{R}^n \times G/K$, where G is a semi-simple Lie group with an involutive automorphism whose fixed point set is the compact group K, while G/K, as a homogeneous space, is equipped with a G-invariant structure.
As Helgason [2] notes, Cartan adopted two methods for the classification prob-
lem. The first, based on holonomy groups, provides that, for p in a Riemannian manifold M, the holonomy group (definition C.1.32) of M is the group of all linear transformations of the tangent space $T_pM$ corresponding to parallel transportation around closed curves $\gamma \in M$. Parallel translation is equated with the action of the holonomy group and leaves the Riemannian metric invariant, leading to a method of classification. The more direct second method involves demonstrating that the invariance of the curvature tensor under parallel transport is equivalent to geodesic symmetry being a local isometry for all $p \in M$ through which the geodesic curve passes. Such spaces, equipped with geodesic symmetry and global isometry, possess a transitive group of isometries such that the space is represented as a coset space G/K with an involutive automorphism whose fixed point set is the isotropy group K. The study of symmetric spaces then becomes a question of studying specific involutive automorphisms of semi-simple Lie algebras, thus connecting to the classification of semi-simple Lie groups.
Locally symmetric spaces are those for which the curvature tensor is covariantly constant:

$\nabla_X R = 0.$
It can be shown that this is equivalent to the sectional curvature being invari-
ant under parallel translations. In the formalism of Helgason, given a Riemannian
manifold M with Riemannian structure Q, an analytic Riemannian manifold is one
where both M and Q are analytic. From this we obtain the global form, i.e. a Riemannian globally symmetric space, where each $p \in M$ is an (isolated) fixed point of an involutive isometry θ with θ² = I. In this case θ is also a geodesic symmetry in that it induces $\dot\gamma_p \to -\dot\gamma_p$, i.e. a reflection symmetry along the geodesic. As Helgason,
following Cartan, notes, in such cases certain facts follow from being a Riemannian
globally symmetric space G/K (in which case (G, K) is sometimes denoted a Rie-
mannian symmetric pair), with a Cartan decomposition as discussed above and an
association of $H_pM \sim p$ and $V_pM \sim k$. It can be shown then that the Riemann curvature tensor for the (homogeneous) symmetric space G/K with a Riemannian metric takes the simple algebraic form:

$R(X, Y)Z = -[[X, Y], Z], \qquad X, Y, Z \in p.$
The curvature tensor is independent of the group action of h ∈ G, that is, the
curvature tensor remains invariant. In geometric terms, this is reflected by the fact
that the pullback of the metric tensor by h corresponds to the original metric tensor
itself. This G-invariance property of symmetric spaces G/K means that the parallel
transport of vectors remains the same.
Curvature plays an important role in the classification of symmetric spaces. Three classes of symmetric space can be classified according to their sectional curvature: given a Lie algebra g equipped with an involutive automorphism, the associated symmetric spaces are of compact type (sectional curvature $K \ge 0$), non-compact type ($K \le 0$) or Euclidean type ($K = 0$).
The classification of Riemannian symmetric spaces by Cartan (see Knapp [9] and Helgason [2] for detailed discussion, especially Helgason Chapter IX) in terms of simple Lie groups is set out below in Table C.1. Type I spaces are compact and associated with compact simple Lie groups, while Type II spaces are their non-compact duals (listed in the alternative description column). Type III spaces are compact simple Lie groups regarded as symmetric spaces, while Type IV spaces are their non-compact duals, involving complex structures related to complexifications of simple Lie algebras. The families of simple Lie groups are denoted by $A_n$, $B_n$, $C_n$ and $D_n$, where $A_n$ corresponds to the special linear group SL(n+1, C), $B_n$ to the special orthogonal group SO(2n+1), $C_n$ to the symplectic group Sp(n), and $D_n$ to the special orthogonal group SO(2n). Additionally, the exceptional Lie groups are denoted $E_6$, $E_7$, $E_8$, $F_4$ and $G_2$, each signifying a unique algebraic structure that induces specific geometric properties on the associated symmetric spaces.
Table C.1: Classification of Riemannian Globally Symmetric Spaces (adapted from Helgason [2] as compiled in Wikipedia)

Type      Lie Algebra Class   Symmetric Space                  Alternative Description
I (II)    AI(n)               SU(n)/SO(n)                      -
I (II)    AII(2n)             SU(2n)/Sp(2n)                    SU(2n)*/Sp(2n)
I (II)    AIII(n,r)           U(n)/[U(r)×U(n−r)]               U(r,n−r)/[U(r)×U(n−r)]
I (II)    BDI(n,r)            SO(n)/[SO(r)×SO(n−r)]            SO(r,n−r)/[SO(r)×SO(n−r)]
I (II)    DIII(n)             SO(2n)/U(n)                      SO(2n)*/U(n)
I (II)    CI(n)               Sp(2n)/U(n)                      Sp(2n,R)/U(n)
I (II)    CII(n,r)            Sp(2n)/[Sp(2r)×Sp(2n−2r)]        Sp(2r,2n−2r)/[Sp(2r)×Sp(2n−2r)]
III (IV)  A(n)                SU(n)×SU(n)/SU(n)                -
III (IV)  BD(n)               SO(n)×SO(n)/SO(n)                SO(n,C)/SO(n)
III (IV)  C(n)                Sp(2n)×Sp(2n)/Sp(2n)             -
where $\|\dot\gamma(t)\| = \sqrt{\langle\dot\gamma(t), \dot\gamma(t)\rangle}$ is computed over the horizontal subspace HM. The subRiemannian distance $d_S$ is defined as the infimum of all such lengths, namely $d_S(A, B) = \inf \ell(\gamma)$ over all horizontal curves connecting $A, B \in M$. To guarantee that a horizontal curve realises a minimal or geodesic distance, we require that the curve γ be absolutely continuous with $\dot\gamma(t) \in H_{\gamma(t)}M$ for almost all t. In this case we can define a subRiemannian minimising geodesic as the absolutely continuous horizontal path that realises the distance between two points in M.
The energy functional $E(\gamma) = \frac{1}{2}\int_0^T\|\dot\gamma(t)\|^2 dt$ is also commonly used, as it can be more convenient to minimise this quantity rather than ℓ(γ). A horizontal curve γ that minimises E among all curves also minimises length and can be parametrised by constant speed (analogous to parametrisation by constant arc length for minimal geodesics) via $d(q_0, q_1)/T$ for $\gamma(0) = q_0, \gamma(T) = q_1 \in M$.
A subRiemannian structure may equivalently be encoded via a cometric:

$\langle\cdot, \cdot\rangle : T^*M \otimes T^*M \to \mathbb{R}$

from which a subRiemannian Hamiltonian can be constructed.
Here γ(t) is a projected curve, i.e. a curve constituted via projection onto the
subRiemannian manifold. As noted in the literature, there are subRiemannian ge-
ometries for which minimal geodesics exist that are not solutions to subRiemannian
Hamiltonian equations, so-called ‘singular’ geodesics. The existence of subRieman-
nian geodesics and importantly the ability for curves generated by distributions ∆ to
effectively cover the entirety of M (thus, in a quantum information context, making
any U (t) in principle reachable via only the subset p), relies upon bracket-generating
properties of ∆ and Chow’s theorem.
C.5.1 Overview
The primary problem we are concerned with in our final two chapters is solving
time optimal control problems for certain classes of Riemannian symmetric space.
In our case, time optimal control is equivalent to finding the time-minimising subRie-
mannian geodesics on a manifold M corresponding to the homogeneous symmetric
space G/K. Our particular focus is the KP problem, where G admits a Cartan
KAK decomposition where g = k ⊕ p, with the control subset (Hamiltonian) com-
prised of generators in p. In particular such spaces exhibit the Lie triple property
[[p, p], p] ⊆ k given [p, p] ⊆ k. Here, g remains in principle reachable, but where
minimal time paths constitute subRiemannian geodesics. Such methods rely upon
symmetry reduction [209, 210]. As D'Alessandro notes [15], the primary problem in quantum control involving Lie groups and their Lie algebras is whether the set of reachable states R (defined below) for a system is the connected Lie group G generated by $g = \mathrm{span}\{-iH(u(t))\}$ for $H \in g$ (or some subalgebra $h \subset g$) and $u \in U$ (our control set, see below). This is then manifest in the requirement that $R = \exp(g)$. In control theory, g is designated the dynamical Lie algebra and is generated by the Lie bracket (derivative) operation among the generators in H. The adjective 'dynamical' refers to the time-varying control functions u(t).
In the general case, M is a $C^r$ manifold (see section C.1.1) and the controls are Lebesgue-integrable bounded functions taking values in U (Lebesgue measurability is required as restricting to piecewise continuous controls does not guarantee optimality). The dynamics $\dot\gamma(t) = f(t, \gamma, u)$ (i.e. equation (A.2.6)) are defined such that $f : \mathbb{R} \times M \times U \to TM$. The values of $t \in I$ over which equation (A.2.6) has a unique solution determine the admissible control trajectory, which we discuss below in the following subsections.
Returning to horizontal curves above, we assume that such curves γ : R ⊃
[0, T ] → M satisfy certain usual Lipschitz continuity requirements (almost every-
where differentiable as per above). We assume that ||γ̇(t)|| is essentially bounded as
follows.
Definition C.5.2 (Essentially bounded). A tangent γ̇(t) is essentially bounded where there exists a constant N and a mapping $H : [0, T] \to TM$, $t \mapsto H(t) = \dot\gamma$ such that $\langle H(t), H(t)\rangle_S \le N$ for almost all $t \in [0, T]$.
This condition ensures that H(t) = γ̇(t) almost everywhere along the curve
(which we assume to be regular γ̇ ̸= 0). As discussed above, horizontal curves are
then those whose tangents (generators) are in the horizontal distribution for such
a curve, that is where γ̇(t) ∈ ∆γ(t) . Recalling our vector fields Xj are differential
operators on γ(t), we can then write the curve in terms of control functions and
generators.
Definition C.5.3 (Horizontal control curves). Given γ(t) ∈ M with γ̇(t) ∈ ∆γ ⊆
Hp M, we can define horizontal control curves as:
$\dot\gamma(t) = \sum_j^m u_j(t)X_j(\gamma(t))$  (C.5.1)
The boundedness of γ̇(t) sets an overall bound on uj (t). Recall that the length
of a horizontal curve is given by:
$\ell(\gamma) = \int_0^T\|\dot\gamma(t)\|\,dt = \int_0^T\sqrt{\langle\dot\gamma(t), \dot\gamma(t)\rangle}\,dt = \int_0^T\sqrt{\sum_j^m u_j(t)^2}\,dt.$  (C.5.3)
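To make equations (C.5.1) and (C.5.3) concrete, the sketch below (an illustrative example of ours, assuming SciPy) integrates a control system on SU(2) with generators $X_j = -i\sigma_j/2$ and evaluates the curve length as the integral of $\sqrt{\sum_j u_j(t)^2}$:

import numpy as np
from scipy.linalg import expm

# Control generators X_j = -i sigma_j / 2 spanning the distribution
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]], dtype=complex)
X = [-0.5j * sx, -0.5j * sy]

def u(t):
    """Illustrative control functions u_j(t) with ||u(t)|| = 1."""
    return np.array([np.cos(t), np.sin(t)])

# Integrate gamma_dot = (sum_j u_j(t) X_j) gamma via piecewise exponentials
T, n = 1.0, 2000
dt = T / n
gamma = np.eye(2, dtype=complex)
length = 0.0
for k in range(n):
    uj = u((k + 0.5) * dt)
    A = uj[0] * X[0] + uj[1] * X[1]
    gamma = expm(A * dt) @ gamma              # horizontal curve on SU(2)
    length += np.sqrt(np.sum(uj**2)) * dt     # equation (C.5.3)

assert np.allclose(gamma.conj().T @ gamma, np.eye(2))  # gamma stays unitary
print(length)  # equals T here, since the curve is parametrised at unit speed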
The problem is then to find the minimal time T for a Hamiltonian to generate a minimal geodesic curve such that $\gamma(0) = U_0$ and $\gamma(T) = U_T$ for $\gamma : [0, T] \to M$. This is described [56] as the minimum time optimal control problem [23]. In principle, the problem is based upon the two
equivalent statements (a) γ is a minimal subRiemannian (normal) geodesic between
U0 and UT parametrised by constant speed (arc length); and (b) γ is a minimum
time trajectory subject to ||⃗u|| ≤ L almost everywhere (where u stands in for the
set of controls via control vector ⃗u). SubRiemannian minimising geodesics starting
from q0 to q1 (our UT ) subject to bounded speed L describe optimal synthesis on
M. There are two loci of geodesics related to their optimality.
(i) Critical locus (CR(M)) being the set of points where minimising geodesics are
not optimal. This is defined such that for p ∈ CR(M), the horizontal curve
γ is a minimising geodesic for all points t ∈ [0, T ] but not for infinitesimally
extended interval [0, T + ϵ], the set of such points being critical points; and
(ii) Cut locus (CL(M)) being the set of points p ∈ M reached by more than one
minimising geodesic (γi , γj ) over t ∈ [0, T ], where at least one is optimal (such
points p ∈ M then being cut points).
When minimising curves γ are analytic functions of $t \in [0, \infty)$, then $CL(M) \subseteq CR(M)$, i.e. cut points are critical points. In terms of optimal control, the critical locus indicates
points at which a geodesic ceases to be time optimal beyond a marginal extension
of the parameter interval as per above. Conversely, the cut locus represents points
where multiple minimal geodesics intersect, delineating the farthest extents of unique
geodesic paths within the reachable set, thereby affecting optimality by introducing
alternative minimal paths. An important concept with a geometric framing and one
drawn from control theory is that of a reachable set.
Definition C.5.4 (Reachable set). The set of all points p ∈ M such that, for
γ(t) : [0, T ] → M there exists a bounded function ⃗u, ||⃗u|| ≤ L where γ(0) = q0 and
γ(T ) = p is called the reachable set and is denoted R(T ).
Note that for reachable sets under usual assumptions we have that $T_1 \le T_2$ implies $R(T_1) \subseteq R(T_2)$. For optimal trajectories, $p = \gamma(T)$ belongs to the boundary of R(T). In general, targets are specified in the space of equivalence classes of orbits [p]. For our particular
case, we assume that G = M. For convenience and notational efficiency we denote
by g ∈ G the relevant group diffeomorphisms of M. We also make reasonable
assumptions about the existence of a minimal orbit type and that for points on
the same orbit q2 = gq1 , then one can transition between minimising geodesics
parametrised by constant speed L via the group action γ1 = gγ2 , which effectively
pushes γ around the orbit while satisfying time optimality.
The dynamics $\dot\gamma(t) = f(\gamma(t), u(t))$ (equation (C.5.5)) hold for almost all $t \in I$, and solution curves γ(t) can in typical circumstances be regarded as unique. The classical PMP approach then posits a variational equation along the integral curve γ(t), expressing how small variations in the initial conditions of a dynamical system affect its evolution; this is crucial for determining optimal trajectories that satisfy both the dynamical constraints imposed by the system and the objective of minimising a specific cost function. For a solution curve γ(t), the matrix:
$A_{ij}(t) = \frac{\partial f_i}{\partial\gamma_j}(\gamma(t), u(t))$  (C.5.6)
expresses how the vector field (whose components are fi ) changes around γ(t). The
variational system along the curve γ(t), u(t) is then:
$\frac{dv_i}{dt} = \sum_j^n A_{ij}(t)v_j(t)$  (C.5.7)
The corresponding adjoint (costate) system is:

$\frac{dp_i}{dt} = -\sum_j^n p_j(t)A_{ji}(t)$  (C.5.8)

where $p_i$ are costate variables (see below, essentially akin to Lagrange multiplier terms). Solutions to equation (C.5.8) allow construction of the time optimal Hamiltonian. Solutions γ(t), v(t) satisfy:
$\sum_{i=1}^n p_i(t)v_i(t) = c,$  (C.5.9)
for some constant c, such that the solutions have a constant inner product over time,
constituting level sets in their applicable phase space. The systems can be expressed
via the Hamiltonian as a function of γ, p and u as follows:
$H(\gamma, p, u) = p_0f_0(\gamma, u) + \sum_{i=1}^n p_if_i(\gamma, u)$  (C.5.10)
with:
$\frac{d\gamma_i}{dt} = \frac{\partial H}{\partial p_i}(\gamma(t), p(t), u(t)) = f_i(\gamma(t), u(t))$  (C.5.11)

$\frac{dp_i}{dt} = -\frac{\partial H}{\partial\gamma_i}(\gamma(t), p(t), u(t)) = -\sum_j^n p_j\frac{\partial f_j}{\partial\gamma_i}(\gamma(t), u(t)).$  (C.5.12)
Note that $f_0(\gamma(t), u(t))$ is the function we seek to minimise (hence $p_0 \le 0$ below), while the costates $p_i$ act in effect as Lagrange multiplier terms such that the $f_i$ represent the rate of cost for exerting control u(t) given state γ(t). Optimisation is then expressed in terms of a cost functional which quantifies the evolution along the curve from $\gamma(t_0)$ to $\gamma(t_1)$ in M:

$C(u) = \int_{t_0}^{t_1} f_0(\gamma(t), u(t))\,dt$
where (γ(t), u(t)) is denoted a trajectory and C(u) the cost function. A trajectory
is optimal relative to γ(t0 ) = a, γ(t1 ) = b if it is the minimal cost among all trajec-
tory costs from a to b. The optimal control problem is then constituted by finding
an optimal trajectory. The PMP then specifies necessary conditions that any opti-
mal trajectory must satisfy. From this, we can obtain the concept of the maximal Hamiltonian $H_M$ associated with an integral curve (γ(t), p(t), u(t)):

$H_M(\gamma(t), p(t)) = \max_{u \in U} H(\gamma(t), p(t), u).$
The PMP transforms the problem of minimising C(u) into one of Hamiltonian max-
imisation. It does so by the introduction of the costate variables p(t) which are
analogous to Lagrange multipliers, enabling inclusion of system dynamics in the
optimisation. Thus the Hamiltonian is defined to include the dynamics and cost function with costate variables. In essence it assumes an optimal minimum length given by the cost term $p_0f_0(\gamma, u)$, which is negative, and then adds to the Hamiltonian the term $\sum_i p_if_i(\gamma, u)$ representing the additional energy or cost from the controls. The PMP is formally given below.
(i) (γ(t), p(t), u(t)) is a solution curve to the Hamiltonian equations (C.5.11)–(C.5.12);

(ii) $H(\gamma(t), p(t), u(t)) = H_M(\gamma(t), p(t))$ for almost all $t \in [0, T]$; and

(iii) $H_M(\gamma(t), p(t))$ is constant along the trajectory, with $(p_0, p(t)) \neq 0$.
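As an elementary illustration of these conditions (a toy classical sketch of ours rather than a quantum example), consider the minimum-time problem for a double integrator $\dot x_1 = x_2$, $\dot x_2 = u$ with $|u| \le 1$. The Hamiltonian $H = p_1x_2 + p_2u$ is maximised by the bang-bang control $u = \mathrm{sign}(p_2)$, with costates satisfying $\dot p_1 = 0$, $\dot p_2 = -p_1$:

import numpy as np

# Minimum-time control of x1' = x2, x2' = u with |u| <= 1 (toy PMP example).
# Costates: p1' = -dH/dx1 = 0, p2' = -dH/dx2 = -p1; u* = sign(p2) maximises H.
dt, T = 1e-4, 2.0
x = np.array([-1.0, 0.0])   # start at rest at x = -1
p = np.array([1.0, 1.0])    # an (assumed) costate initialisation
for _ in range(int(T / dt)):
    uu = np.sign(p[1]) if p[1] != 0 else 0.0   # Hamiltonian maximisation
    x = x + dt * np.array([x[1], uu])          # state equations (C.5.11)
    p = p + dt * np.array([0.0, -p[0]])        # costate equations (C.5.12)

# With p2(t) = 1 - t, the control switches from +1 to -1 at t = 1 and the
# bang-bang extremal reaches the origin at T = 2: print(x) gives ~ (0, 0).
print(x)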
Following [15, 23], typical quantum control problems fundamentally take one of three forms: (a) the problem of Mayer, where $C(u) = C_T(\gamma(T), T)$; (b) the problem of Lagrange, where $C(u) = \int_0^T L(\gamma(t), u(t), t)\,dt$; and (c) the more general Bolza form combining both, where $C_T$ denotes some cost functional for time T:

$C(u) = C_T(\gamma(T), T) + \int_0^T L(\gamma(t), u(t), t)\,dt$  (C.5.14)
subject to the constraint imposed by the Schrödinger equation (A.1.18). Both the Hamiltonian and unitary target U are complex-valued; however, they can be represented over the reals via the mapping $\mathbb{C}^n \to \mathbb{R}^{2n}$. For the optimal solution, one assumes the existence of a set of optimal controls $u_j(t)$, which may be unknown, and replaces them with an approximating control $u^\epsilon$. The cost function is then evaluated with respect to $u^\epsilon$.
Following this, there are two types of variation condition, known as strong variation and weak variation, describing how much the choice of $u^\epsilon$ differs from the true u, which can be used to optimise. The PMP above is based upon strong (needle) variations, involving assumptions about how much $u^\epsilon$ varies over intervals. One assumes that for $\tau \in (0, T]$:

$u^\epsilon(t) = \begin{cases} u(t) & t \in [0, \tau-\epsilon] \cup (\tau, T] \\ v & t \in (\tau-\epsilon, \tau] \end{cases}$  (C.5.16)
where v is any value in the admissible control set $U \subset \mathbb{R}^m$. In this context, the PMP is then expressed as follows: assuming the optimal control u(t) and a solution γ(t) of equation (C.5.5) (in the form of Schrödinger's equation) exist, there exists a non-zero (vector) solution to the adjoint equations and the terminal condition at T.
Practically speaking, to apply the PMP in quantum contexts, the following procedure is used. First we define the Hamiltonian (equation (C.5.10)). For each state γ and costate p, we maximise the Hamiltonian over $u \in U$ such that:

$u^*(t) = \arg\max_{u \in U} H(\gamma(t), p(t), u)$

with boundary conditions $\gamma(0) = \gamma_0$ (and $p_T(T) = -\phi_\gamma(\gamma(T))$). Controls that satisfy these conditions are denoted extremal. The Hamiltonian is then written as a function of the controls, e.g. $H = \sum_j u_j(t)\sigma_j$ for a typical qubit control problem where $H \in g = su(2)$ with final state target $U_f \in G = SU(2)$ and Pauli generators $\sigma_j$. The cost function is then of the form:
$J(u_j) = \mathrm{Re}\left(\mathrm{Tr}\,U(T)^\dagger U_f\right) + \eta\int_0^T\sum_j u_j(t)^2\,dt$  (C.5.25)
where η is a parameter controlling the contribution of the second term to the cost functional. It can be shown (see [15] §6) that the optimal controls are then sinusoidal, e.g. of the form $u_1(t) = A\cos(\omega t + B)$, $u_2(t) = A\sin(\omega t + B)$, where A, ω and B are free parameters which can be determined through a minimisation procedure. As noted in the literature, the solutions to the PMP equations are
extremal curves such that the trajectories γ(t) can be said to reside on the boundary
of the reachable set R. Solving the PMP can be challenging. One class of tractable
problems is when the targets UT ∈ G belong to a Riemannian symmetric space
where the symmetry properties of the related Cartan decomposition can, in certain
cases, allow the control solutions uj to be found in closed form. This is the main
focus of Chapter 5. In Chapter 4, we also discuss and examine the work of Nielsen et al. [183, 184] on geodesic approaches to quantum circuit complexity.
Certain approaches to geometric control use the language of dynamical Lie algebras
defined by the criterion that:
R = exp(h) (C.5.27)
for some Lie algebra h where h = span{−iH(u(t))}. ‘Dynamical’ here refers to the
fact that h is composed of generators multiplied by control functions u(t) which
are time-dependent and thus dynamic. Such a condition is equivalent to complete
controllability, namely that every UT ∈ G is obtainable via a particular sequence of
controls u(t). The condition relies upon the bracket-generating characteristics of h
(see definition C.4.4) to recover the full Lie algebra g.
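The bracket-generating condition can be verified numerically by growing the span of nested commutators until its dimension stabilises. The sketch below (ours, illustrative) does so for the control generators $\{-i\sigma_x, -i\sigma_y\}$, recovering $\dim su(2) = 3$ and hence complete controllability of a single qubit:

import numpy as np
from itertools import product

sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]], dtype=complex)
gens = [-1j * sx, -1j * sy]          # control generators

def span_dim(ops):
    """Dimension of the real span of the vectorised operators."""
    M = np.array([np.concatenate([op.flatten().real, op.flatten().imag])
                  for op in ops])
    return np.linalg.matrix_rank(M)

# Close under commutators until the dimension of the span stabilises
algebra = list(gens)
while True:
    dim = span_dim(algebra)
    algebra += [A @ B - B @ A for A, B in product(algebra, repeat=2)]
    if span_dim(algebra) == dim:
        break

print(dim)  # 3 = dim su(2): bracket-generating, so completely controllable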
In some literature [15], the number of applications of the commutator required to obtain g is denoted the depth of the bracket. Moreover, the condition in equation (C.5.27) relies upon the fact that $\exp(h) \subset G$ and that G is compact. This is an important point: in our final Chapter (and in subRiemannian geometric control problems generally) admitting a Cartan decomposition $g = k \oplus p$, the horizontal subset p is not closed under the bracket (as $[p, p] \subseteq k$) and so is not a subalgebra; one cannot reach arbitrary states (limits) using generators in p directly. However, symmetric spaces admitting such a decomposition exhibit the Lie triple property with respect to p, namely that $[p, [p, p]] \subseteq p$ where $[p, p] \subseteq k$, which is a necessary requirement for the bracket-generating recovery of g.
Controllability is sometimes segmented into three types: (a) pure state control-
lable, where for U (0), U (T ) ∈ G, there exist controls {uj (t)} rendering both reach-
able; (b) equivalent state controllable, where {uj (t)} exist to reach U1 (T ) up to some
phase factor ϕ such that $U(T) = \exp(i\phi)U_1(T)$; and (c) density-matrix controllable, such that there exist controls $\{u_j(t)\}$ enabling $\rho_i \to \rho_k$ for all $\rho_i, \rho_k \in B(H)$ (i.e. any state is reachable from any other).
We can express the above in terms of symplectic manifolds and Hamiltonians which
elucidates the deep connection between Hamiltonian dynamics and the geometric
formalism above. Given a control system defined on M, the Hamiltonian can be
represented as a map H : T ∗ M → R which generates a flow (see definition C.1.2.4)
on the cotangent bundle which represents the phase space of the quantum system
(as we discuss below). As discussed below (definition C.5.7), one forms define a
Hamiltonian vector field associated with the Hamiltonian. Hamiltonian flow is the
local flow generated by a Hamiltonian vector field $X_H$ in a symplectic manifold. That is, the flow $\phi_t^{X_H}$ (to connect with local flow (section C.1.2.4)) constitutes evolution in phase space along integral curves consistent with the symplectic structure. The flow is given by Hamilton's equations and, in a quantum context, by Schrödinger's equation.
The Poisson bracket $\{f, g\} = \omega(X_f, X_g)$ is antisymmetric, $\{g, f\} + \{f, g\} = 0$, and satisfies the Leibniz property $\{f, gh\} = \{f, g\}h + g\{f, h\}$. The term $\omega^{-1}$ is taken from Hall [45] and denotes the bilinear form on $T^*M$ that arises by way of the canonical identification of $T^*M$ with TM when ω is non-degenerate. With these identifications, and in particular the isomorphism described above, we can define the Hamiltonian vector field as follows.
$df = \omega(X_f, \cdot)$  (C.5.30)
In this case, we can relate such a Hamiltonian vector field to the Poisson bracket via $\{f, g\} = \omega(X_f, X_g)$, with f generating a local flow $\varphi_f$, being the local flow generated by $-X_f$; such a flow preserves ω, expressed via $\mathcal{L}_{X_f}\omega = 0$.
We then have that $f, g, h \in C^\infty(M)$ form a Lie algebra under the Poisson bracket satisfying the Jacobi identity, which in turn allows us to transition from the Poisson bracket to the commutator via:

$\frac{d}{dt}f(\Phi_t^H(z)) = \{f, H\}(\Phi_t^H(z)),$  (C.5.33)

$\frac{df}{dt} = \{f, H\}.$  (C.5.34)
The Hamiltonian H dictates the flow or vector field on this manifold by deter-
mining the direction and rate of change of state variables in M (i.e. curves γ(t)
represented as sequences of unitaries U (t)).
We briefly connect to the formulation of Hamiltonians, as the generator of time
translations, in terms of controls uj (t) and generators in g. The Hamiltonian deter-
mines a Hamiltonian vector field on the symplectic manifold T ∗ M (the cotangent
bundle of M). The (0, 2)-symplectic form ω encodes information about the evo-
lution of γ(t) ∈ M as shown in equation (C.5.29). Given a Hamiltonian function
H(q, p) on a symplectic manifold M equipped with ω, the Hamiltonian vector field $X_H$ is defined as per equation (C.5.30) above (with f = H), recalling dH is the differential of H. In local phase space coordinates (q, p) we have $\omega = \sum_j^n dp_j \wedge dq^j$, where n denotes the
degrees of freedom of the system. The vector field is then defined using Hamilton's equations:

$X_H = \sum_j^n\left(\frac{\partial H}{\partial p_j}\frac{\partial}{\partial q^j} - \frac{\partial H}{\partial q^j}\frac{\partial}{\partial p_j}\right)$  (C.5.35)

$\dot q^j = \frac{\partial H}{\partial p_j}, \qquad \dot p_j = -\frac{\partial H}{\partial q^j}.$  (C.5.36)
Hence we see how the Hamiltonian H determines the rates of change (i.e., the
velocities) of the coordinates q j and pj , establishing the dynamics of the system
on the symplectic manifold. The Hamiltonian vector fields and the corresponding
flows not only preserve this symplectic structure but also facilitate the analysis of
dynamical properties, such as the conservation laws dictated by the Poisson brackets,
which are essential in the study of classical Hamiltonian systems. While quantisation
methods allow transition from classical to quantum formalism, there are a number
of subtleties between geometric phase spaces in classical and quantum case (see
[263] which also discusses the significance of the Bell–Kochen–Specker theorem for
translating between classical and quantum phase space formalism).
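As a simple numerical illustration of equations (C.5.36) (a sketch of ours), the harmonic oscillator $H = \frac{1}{2}(p^2 + q^2)$ can be integrated with the symplectic (semi-implicit) Euler method, which respects the symplectic structure and so exhibits no secular drift in H:

import numpy as np

def H(q, p):
    """Harmonic oscillator Hamiltonian (illustrative)."""
    return 0.5 * (p**2 + q**2)

# Symplectic (semi-implicit) Euler: update p from dH/dq, then q from dH/dp
q, p = 1.0, 0.0
dt, steps = 1e-3, 100_000
E0 = H(q, p)
for _ in range(steps):
    p -= dt * q      # pdot = -dH/dq
    q += dt * p      # qdot = +dH/dp (using the updated p)

print(abs(H(q, p) - E0))  # remains O(dt): bounded error, no secular drift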
(This is analogous to the cut locus above, where more than one optimal geodesic reaches the same point.)
C.5.6 KP Problems
A particular type of subRiemannian optimal control problem with which we are
concerned in Chapter 5 is the KP problem. In control theory settings, the problem
was articulated in particular via Jurdjevic’s extensive work on geometric control
[19, 26, 57, 58], drawing on the work of [59] as particularly set out in [23] and [60].
Later work building on Jurdjevic’s contribution includes that of Boscain [24, 61, 62],
D’Alessandro [15, 17] and others. KP problems are also the focus of a range of
classical and quantum control problems focused on the application of Cartan decompositions of target manifolds, with $g = k \oplus p$, such as in nuclear magnetic resonance [21] and others (see our final chapter). The essence of the KP problem is where M corresponds to a semi-simple Lie group G together with right-invariant vector fields $\mathfrak{X}(M)$ equivalent to the corresponding Lie algebra g. In this formulation, the Lie group and Lie algebra can be decomposed according to a Cartan decomposition which is, recalling equation (B.5.2):

$g = k \oplus p$
and equipped with a Killing form (definition B.2.12) which defines an implicit pos-
itive definite bilinear form (X, Y ) which in turn allows us to define a Riemannian
metric restricted to G/K in terms of the Killing form. Thus for our purposes,
we understand the KP problem as a minimum time problem for a subRiemannian
symmetric space G/K.
Such symmetry methods also allow characterisation of reachable sets for a larger class of problems. Connecting with the language of control, we can frame equation (C.5.1) in terms of drift and control parts with:

$\dot\gamma(t) = A\gamma(t) + \sum_j X_j(\gamma(t))u_j(t)$  (C.5.38)
where Aγ(t) represents a drift term for $A \in k$. The evolution toward our unitary target in G can then be expressed as:

$\dot\gamma(t) = \sum_j \exp(-At)X_j\exp(At)\gamma(t)u_j(t)$  (C.5.39)
for bounded ||Ap || = L. For the KP problem, the PMP equations are integrable.
One of Jurdjevic’s many contributions was to show that in such KP problem con-
texts, optimal control for ⃗u is related to the fact that there exists Ak ∈ k and Ap ∈ p
such that:
$\sum_j^m X_ju_j(t) = \exp(A_kt)A_p\exp(-A_kt).$  (C.5.40)

Following Jurdjevic's solution [60] (see also [56]), optimal pathways are given by:

$\gamma(t) = \exp(t(A_k + A_p))\exp(-tA_k),$

resulting in analytic curves. As we explore in our final Chapter and as noted in the
literature, one can select Ap ∈ a ⊂ p where a is the non-compact part of a maximally
abelian Cartan subalgebra in p. In this regard, we see conjugation of a Cartan
subalgebra element by elements of K, reminiscent of the KAK decomposition itself.
It is also worth noting that a being a maximal abelian subalgebra means that equation (C.5.39) is invariant under the action of $k \in K$, reflected in the commutation relation $[k, p] \subseteq p$. Albertini et al. [56] note that for $\gamma(t) \notin CL(M)$, with H the isotropy group of γ, the tuple $(A_k, A_p)$ minimising the geodesic is invariant under the
action of h ∈ H. This gives rise to a general method for time optimal KP control
problems set out in [56]: (i) identify the symmetry group of the problem, (ii) specify
G/K, (iii) find boundaries of R (which may require numerical solution), (iv) find
the first value of t such that $\pi(\gamma(t)) \in \pi(R)$ and (v) identify the orbit space within which the optimal control exists and then move within an orbit to the final target $U_T = \gamma(T)$. We draw the reader's attention to the work of Jurdjevic [23, 213]
for discussion and in particular D’Alessandro [15] (§6.4.2) for detailed exposition
of this Cartan-based solution to the KP problem. In our final chapter, we revisit
this method demonstrating how our novel use of a global Cartan decomposition and
variational methods give rise to time optimal synthesis results consistent with this
literature.
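The structure of this solution can be checked numerically. The sketch below (an illustration of ours, assuming SciPy) constructs $\gamma(t) = e^{t(A_k + A_p)}e^{-tA_k}$ on SU(2), with $k = \mathrm{span}\{-i\sigma_z/2\}$ and $p = \mathrm{span}\{-i\sigma_x/2, -i\sigma_y/2\}$, and verifies that the body-frame velocity $\gamma^{-1}\dot\gamma$ remains horizontal (has no component along k) for all t, as implied by $[k, p] \subseteq p$:

import numpy as np
from scipy.linalg import expm

sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)

Ak = -0.5j * 0.7 * sz                 # vertical generator in k
Ap = -0.5j * (1.0 * sx + 0.3 * sy)    # horizontal generator in p

def gamma(t):
    """Jurdjevic-type KP curve gamma(t) = exp(t(Ak + Ap)) exp(-t Ak)."""
    return expm(t * (Ak + Ap)) @ expm(-t * Ak)

# The body-frame velocity gamma^{-1} gamma_dot should remain in p:
# its component along sigma_z (the k direction) must vanish for all t.
eps = 1e-6
for t in np.linspace(0.0, 2.0, 9):
    g = gamma(t)
    gdot = (gamma(t + eps) - gamma(t - eps)) / (2 * eps)
    v = np.linalg.inv(g) @ gdot
    assert abs(np.trace(v @ sz)) < 1e-5   # velocity is horizontal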
Appendix D
Appendix (Quantum Machine Learning)
D.1 Introduction
In this Appendix, we survey literature from classical and quantum machine learning
relevant to later chapters. We include a high-level review of key concepts from sta-
tistical learning theory for both context and utility. We specifically focus on the deep learning architectures adopted in Chapters 3 and 4, building up towards an exegesis on neural networks by showing how they are a non-linear extension of generalised linear models. We focus on specific aspects of neural network architecture and optimisation procedures, specifically stochastic gradient descent and its implementation via the backpropagation equations. We then provide a short overview of the burgeoning field of quantum machine learning. We track how quantum analogues of classical
machine and classical statistical learning have developed, noting key similarities and
differences. In particular, in order to contextualise the learning protocols adopted
in Chapters 3 and 4, we provide a comparative analysis of learning in a quan-
tum context, with a specific comparative analysis between quantum and classical
backpropagation techniques. We also focus on the relationship between learning
(in both classical and quantum contexts [265]) and measurement, emphasising the
hybrid nature of QML learning protocols (in the main) arising from dependence
upon quantum-classical measurement channels. We examine how techniques from
geometry, algebra and representation theory have been specifically (and relatively re-
cently, in some cases) integrated into both classical and quantum machine learning
strategies, such as in the form of equivariant and dynamical Lie-algebraic meth-
ods [266, 267]. In doing so, we provide a survey overview of geometric machine
learning, together with the use of representation theory and differential geometry in
relevant aspects of classical machine learning, briefly summarising recent results in
invariant and geometric QML [35, 268–270]. Finally, we map out key architectural
characteristics of hybrid quantum-classical methods denoted as greybox machine
learning used in our work above.
In this section, we set out a brief synopsis of key principles of machine learning, in-
cluding foundational elements of supervised and unsupervised models, architectural
principles regarding data, models and loss functions. We cover principles behind
training, optimisation, regularisation and generalisation, together with the primary
classes of algorithm used in this work. Our treatment focuses on both classical and
quantum forms of machine learning.
QML Taxonomy

QML Division                                Inputs                 Outputs                Process
Classical ML                                Classical              Classical              Classical
Applied classical ML                        Quantum (Classical)    Classical (Quantum)    Classical
Quantum algorithms for classical problems   Classical              Classical              Quantum
Quantum algorithms for quantum problems     Quantum                Quantum                Quantum

Table D.1: Quantum and classical machine learning taxonomy. Quantum machine learning covers four
quadrants which differ depending on whether the inputs, outputs or process is classical or quantum.
average across Dn. Thus the objective becomes one of optimising the selection rule
for f ∈ F given training data Dn (i.e. for the specific algorithm selecting f given Dn)
rather than minimising a specific f. The empirical risk

    R̂_n(f) = (1/n) ∑_{i=1}^n L(f(X_i), Y_i)    (D.3.3)

represents an estimate of the average loss over the training set comprising n
samples of (X_i, Y_i). The objective then becomes learning an algorithm (rule) that
minimises empirical risk, thereby obtaining an optimal estimator across sampled and
out-of-sample data, namely

    f̂_n = argmin_{f ∈ F} R̂_n(f)    (D.3.4)

i.e. the f that minimises R̂_n. As the number of samples n → ∞, then by assumption
(of i.i.d. Dn), R̂(f) → R(f). Equation (D.3.3) is typically reflected in (batch) gradient
descent methods that seek to solve the optimisation task (D.3.4). Gradient descent
methods adopt an update rule, which can usefully (for the purposes of comparison
with quantum) be conceived of as a state transition rule. An important (and
ubiquitous) class of optimisation algorithms is gradient descent (with backpropagation),
which requires constraining the estimators f̂ so that R̂_n(f) is smooth (continuous,
differentiable), i.e. requiring f ∈ C^k, the class of all k-times differentiable functions
(ideally C^∞). Moreover, it is typical that f is chosen to be a parameterised
function f = f(θ) such that the requisite analytic structure (parametric smoothness)
is provided for by the parametrisation, typically with parameters θ ∈ R^m.
The parameters θ are often denoted the weights of a neural network, in which case
R̂_n(f(θ)) is often just denoted as a function of the parameters (given the data
is fixed).
The empirical risk R̂_n(f(θ)) is smooth in θ, which implies the existence of a gradient
∇_θ R̂_n(f(θ)), a key requirement of backpropagation. The update rule is then a
transition rule for θ that maps at each iteration (epoch) θ_{i+1} = θ_i − γ(n)∇_θ R̂_n(f(θ_i)).
Here γ(n) ≥ 0 is the step size, whose value may depend upon the iteration (epoch) n
(with ∑_n γ(n) = ∞ and ∑_n γ²(n) < ∞, the square-summability condition familiar
from stochastic approximation [67]).
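By way of illustration, a minimal sketch of this update rule in NumPy (the quadratic loss, synthetic data and step-size schedule below are hypothetical choices satisfying the above conditions, not code from the thesis):

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))               # n = 100 samples, m = 3 features
theta_true = np.array([1.0, -2.0, 0.5])
Y = X @ theta_true + 0.1 * rng.normal(size=100)

def risk_gradient(theta, X, Y):
    # Gradient of R̂_n(θ) = (1/n) Σ_i (⟨θ, X_i⟩ − Y_i)² with respect to θ
    return 2.0 * X.T @ (X @ theta - Y) / len(Y)

theta = np.zeros(3)
for n in range(1, 501):
    gamma = 0.1 / n                          # γ(n): Σ_n γ(n) = ∞, Σ_n γ(n)² < ∞
    theta = theta - gamma * risk_gradient(theta, X, Y)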
A variety of common loss functions are used as empirical risk estimators. Two popular
choices across statistics and machine learning (both classical and quantum)
are (a) mean-squared error (MSE) and (b) root-mean-square error (RMSE). Given
data Dn(X, Y) with (X_i, Y_i) ∼ P_XY and a function estimator f_θ as per above, MSE
is defined as follows.
Definition D.3.3 (Mean Squared Error). The MSE for a function f parameterised
by θ over a dataset Dn is:

    MSE(f_θ) = (1/n) ∑_{i=1}^n ( f̂_θ(X_i) − Y_i )²    (D.3.6)

i.e. the L² loss. MSE calculates the average of the squares of the differences between
predicted and label values. RMSE is defined as √(MSE(f_θ)). Other common loss
functions include (i) cross-entropy loss, e.g. Kullback-Leibler divergence (see [12]
§14), for classification tasks and comparing distributions (see section A.1.8 for quantum
analogues), (ii) mean absolute error loss and (iii) hinge loss. The choice of loss
function has statistical implications regarding model performance and complexity,
including bias-variance trade-offs.
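For concreteness, a minimal NumPy illustration of these two losses (predictions and labels are hypothetical):

import numpy as np

y_pred = np.array([0.9, 2.1, 2.9])           # hypothetical estimates f̂_θ(X_i)
y_true = np.array([1.0, 2.0, 3.0])           # labels Y_i

mse = np.mean((y_pred - y_true) ** 2)        # (1/n) Σ_i (f̂_θ(X_i) − Y_i)²
rmse = np.sqrt(mse)                          # RMSE = √MSE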
As we discuss below, there is a trade-off between the size of F and empirical risk
performance in and out of sample. Such risk R̂_n(f) can always be minimised by
specifying a sufficiently large class of prediction rules, the most extreme example being
f(x) = Y_i when x = X_i and zero otherwise (in effect, F containing a trivial mapping
of X_i to Y_i for all X_i). In this case R̂_n(f) → 0, but f performs poorly on out-of-sample
data. Such an example also provides more context on what it means to learn.
Recalling that E[R̂_n(f̂_n)] is contingent upon f̂ estimated from Dn(X, Y) (the data
samples) and that the optimal f (one that minimises statistical risk) is represented by
inf_{f∈F} R(f), estimation error captures how well f̂, which is learnt from data, performs
against all possible choices in F. By contrast, as approximation error indicates the
best choice of f to minimise statistical risk, the approximation error indicates
the deterioration in performance arising from restricting F to subsets of all possible
f ∈ F* (i.e. all measurable f). The trade-off is characterised by the fact that if F
becomes small (fewer prediction rules), estimation error tends to zero (i.e. there is
little that must be learned from the data), but the approximation error increases.
Conversely, by increasing the size of F, R̂_n(f̂) can be made small (intuitively, we
have a greater selection of f from which to choose). But the chosen f that renders
R̂_n(f̂) minimal, namely f̂_n, will overfit the data, because R̂_n(f̂_n) is an optimistic
(downward-biased) estimate of the true risk R(f̂_n).
where R(f) = E[L(f(X), Y)] is the true risk (equation (D.3.2)) associated with algorithm A
for a particular distribution P_XY in predicting function f ∈ F. The theorem
indicates that no algorithm can universally minimise true (statistical) risk across
all distributions. In particular, the theorem sets bounds upon how well models can
generalise. Below we discuss the quantum analogue of the no-free-lunch theorem.
    Accuracy = (TP + TN) / (TP + FP + TN + FN)    (D.3.10)
             = 1 − E[L(f(X), Y)] = 1 − E[I(f(X) ≠ Y)]    (D.3.11)

where (1/n) ∑_{i=1}^n I(f(X_i) = Y_i) is the proportion of correct predictions. Accuracy can
be written as:

    Accuracy(f) = (1/n) ∑_{i=1}^n I(f(X_i) = Y_i)    (D.3.14)

in which case empirical risk is one minus the accuracy. Given that E[L(f(X), Y)] =
lim_{n→∞} R̂_n(f), we can understand accuracy as per equation (D.3.11).
Thus we see the inverse relationship between minimising empirical risk and maximising
accuracy.
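A small illustration of this inverse relationship (with hypothetical binary predictions):

import numpy as np

y_pred = np.array([1, 0, 1, 1, 0])           # hypothetical classifications f(X_i)
y_true = np.array([1, 0, 0, 1, 0])           # labels Y_i

accuracy = np.mean(y_pred == y_true)         # (1/n) Σ_i I(f(X_i) = Y_i)
empirical_risk = np.mean(y_pred != y_true)   # 0-1 empirical risk = 1 − accuracy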
while retaining predictive power. Ridge regression, common across the sciences, is
one such model utilising ridge functions, namely functions formed by composing a
univariate function with an affine transformation. Ridge coefficients are then those
ω̂ which minimise a penalised loss function, parametrised by some parameter λ ∈ R
(so higher values of ∥ω∥₂² incur a higher penalty):

    ω̂ = argmin_ω [ ∑_{i=1}^n L(f_ω(X_i), Y_i) + λ∥ω∥₂² ]

where here C_P(f) = λ∥ω∥₂², connecting with the discussion of penalty terms above.
As we can see (via the loss function; for simplicity we assume L ∈ R and f: R^m → R),
the addition of the λ∥ω∥₂² term (denoted a regularisation term) penalises larger
weights ω. The regularisation term thereby aims to address overfitting by reducing
the magnitude of ω and thus the variance of the model. A useful way of connecting
generalised linear models with neural networks is via projection pursuit models,
which represent linear combinations of non-linear ridge functions. We do so in order
to elucidate the geometric and directional nature of statistical learning, in turn
building intuition for quantum geometric machine learning.
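As a sketch, this penalised least-squares objective admits the closed-form solution ω̂ = (XᵀX + λI)⁻¹XᵀY, illustrated on synthetic data (all values hypothetical):

import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 4))                          # design matrix
Y = X @ np.array([2.0, 0.0, -1.0, 0.5]) + 0.1 * rng.normal(size=50)

lam = 0.5                                             # penalty parameter λ
# Ridge estimator: minimises ||Xω − Y||² + λ||ω||², solved in closed form
omega_hat = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ Y)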
A ridge function f(X) = g(⟨X, e_m⟩) is invariant under translations in directions e_j
orthogonal to e_m:

    f( X + ∑_{j=1}^{m−1} b_j e_j ) = g( ⟨X, e_m⟩ + ∑_{j=1}^{m−1} b_j ⟨e_j, e_m⟩ ) = g(⟨X, e_m⟩) = f(X)    (D.4.3)

which elucidates the invariance under the affine transformation. Ridge functions
then form the basis for what in statistical learning is denoted projection pursuit
regression [12], with the general form [276]:

    f(X) = ∑_{n=1}^N g_n(⟨ω_n, X⟩)    (D.4.4)
where the functions g_n provide a wide variety of non-linear functions to choose from.
As Hastie et al. [12] note, if N is sufficiently large, then the model can approximate
any continuous function in R^m (see [14] §6.4.1). Thus projection pursuit regression
models can be regarded as universal approximators, a key characteristic of the
success of neural networks.

The formal definition of a neural network has influenced later literature and
technical work in the field. We discuss briefly some of its elements below:
2. Constraints. Bounds upon (a) the weight value range ∥ω_ij∥ ≤ K₁ ∈ R, (b) local
thresholds (biases or offsets) and (c) the activation range, i.e. ∥σ∥ ≤ K₂ ∈ R;
3. Initial state. Covering initialisation of (a) weights, (b) local thresholds and (c)
activation values; and
4. Transition functions. These are transition functions for (a) neuron functions
(specifying the output of a neuron via its activation function), (b) learning rules
(updating weights and offsets), (c) clamping functions that keep certain neurons
constant and (d) ontogenic functions, those that change network topology.
Modern formulations generally track, with some differences, this taxonomy, with
each category and subcategory being a widely studied subdiscipline of machine learn-
ing research, such as those concerned with optimal network topology, pretraining
and tuning of hyperparameters (see [14]). Of note is that while statistical learning
theory and computer science can provide some theoretical bases for network design,
in general there are few or limited theoretical guarantees relating to network architectural
features (such as network topology or choice of initialisation). As such, tuning
such architectural characteristics (which are often represented as hyperparameters)
remains largely an empirical exercise. Activation functions may in general be
vector-valued:

    σ: R^m → R^n,   σ_k: R → R,   k = 1, ..., n
    σ(z) = σ(ωX + ω₀) = (σ₁(z), ..., σ_n(z))
Each point within the simplex S represents a possible probability distribution over
m discrete outcomes, where p_i is the probability of the i-th outcome. The function
σ transforms a vector z ∈ R^m into a vector p ∈ S such that each component of p,
denoted p_i, can be interpreted as the probability of the i-th outcome. Framed
in this way, the activation satisfies the requirements for interpretation as a probability
distribution, namely (i) p_i ≥ 0 (non-negativity) and (ii) ∑_{i=1}^m p_i = 1. An example
of an activation function often adopted as a proxy for probability is the softmax
function [278], being a type of fold function:

    p_i = e^{z_i} / ∑_{j=1}^m e^{z_j}    (D.4.6)
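A minimal implementation of equation (D.4.6) (the max-subtraction is a standard numerical-stability step that leaves the result unchanged):

import numpy as np

def softmax(z: np.ndarray) -> np.ndarray:
    # p_i = exp(z_i) / Σ_j exp(z_j); subtracting max(z) avoids overflow
    exp_z = np.exp(z - np.max(z))
    return exp_z / exp_z.sum()

p = softmax(np.array([2.0, 1.0, 0.1]))
assert np.isclose(p.sum(), 1.0) and np.all(p >= 0)   # valid distribution on S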
neurons a_i^{(l)} such that each neuron in each layer (other than the first input layer) is
a (composition) function of the neurons in the preceding layer. Formally:

    a_i^{(l)} = σ_i^l( ∑_{j=1}^{n_{l−1}} ω_{ij}^{(l)} a_j^{(l−1)} + ω_{i0}^{(l)} ) = σ_i^l( ∑_{j=0}^{n_{l−1}} ω_{ij}^{(l)} a_j^{(l−1)} )    (D.4.7)

where l indexes each layer and i each neuron in that layer; the neuron a_i^{(l)} is valued
by an activation function σ_i^{(l)} for that neuron in that layer (definition D.4.3),
which is itself a function of the sum of the weights ω_{ij}^{(l)} applied to the outputs
a_j^{(l−1)} of each neuron of the previous layer, together with a bias term ω_{i0}^{(l)}.
Sometimes the literature just uses a_i^{(l)} or σ_i^{(l)} for neurons, but the distinction
is usually maintained to emphasise the activation function as a function and a_i^{(l)}
as a neuron. Neurons are also occasionally referred to as 'units'. Note that in
the right-most term above we have absorbed the bias ω_{i0}^{(l)} into the summation for
convenience (which can be achieved by treating the zeroth neuron of layer l−1 as the
identity, a_0^{(l−1)} ≡ 1). Following the classification of networks above, a fully-connected
neural network with layers indexed from l = 0, ..., L can then be formally described
as below. For convenience in the following, we denote the linear argument of the
activation functions as:

    z_i^{(l)} = ∑_{j=0}^{n_{l−1}} ω_{ij}^{(l)} a_j^{(l−1)}    (D.4.8)

which indicates how the jth neuron in the previous layer l−1 feeds into the ith
neuron of the lth layer, weighted by the corresponding weight ω_{ij}^{(l)}. We include a
diagram depicting the fully-connected feed-forward neural network in Fig. D.1. Each
layer may then be written as the tuple of its neurons:

    a^{(l)} = (a_1^{(l)}, ..., a_{n_l}^{(l)})    (D.4.9)
3. Output layer. The output layer a^{(L)} is then chosen to accord with the problem
at hand; for classification problems it may be either a binary classification
or a link function (such as a logit function) giving an outcome σ_i^{(L)} ∈ [0, 1]
interpretable as a probability (where, for example, to classify K objects we
would have i = 1, ..., K). The overall output of the network is the estimator
Ŷ, denoted Ŷ = (Ŷ₁, Ŷ₂, ..., Ŷ_{n_L}), where n_L is the number of neurons in the
layer:

    f(X) = Ŷ = a^{(L)} = (a_1^{(L)}, a_2^{(L)}, ..., a_{n_L}^{(L)}).    (D.4.10)

Sometimes in the literature the output layer may itself be subject to an additional
transformation (see for example [12] §11.6), but such transformations can always
be cast as a final layer whose activation function is simply the transformation
itself.
As Hastie et al. [12] note, the non-linearity of the activation functions σ expands
the size of the class of candidate functions F that can be learnt. Functionally, the
network can be regarded as a functional composition, with each layer representing a
function comprising linear inputs into non-linear activations σ^l applied sequentially.
We can thus regard the neural network as a functional composition among activation
functions with inputs X and outputs a^{(L)}. While not the focus here, in geometric
contexts doing so allows us to frame the network as maps among differentiable
manifolds. We also note that multilayer feed-forward networks are of particular
significance due to the universal approximation theorem [279] for learning arbitrary
functions (and other universal and function approximation results), with a quantum
analogue to the effect that functions can be represented to arbitrary error at
exponential depth (which is practically infeasible).
[Figure D.1 appears here: a schema of the first two layers of a fully-connected feed-forward
neural network, showing input neurons a_i^{(0)} feeding neurons a_i^{(1)} via weights
w_{i,j}, together with the matrix form a^{(1)} = σ(ωa^{(0)} + ω₀) and the algebraic form
a_i^{(l)} = σ_i^l(∑_{j=1}^{n_{l−1}} ω_{ij}^{(l)} a_j^{(l−1)} + ω_{i0}^{(l)}).]

Figure D.1: Schema of the first two layers of a fully-connected feed-forward neural network (definition
D.4.4) together with associated matrix and algebraic representation. Here a_i^{(l)} are the
neurons and ω_{ij}^{(l)} the weights (absorbing bias terms) for neuron a_j^{(l)} (diagram adapted
from [1]).
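To make equations (D.4.7)–(D.4.9) concrete, the following sketch computes a forward pass through a small fully-connected network (the layer sizes, tanh activation and random weights are illustrative assumptions, not the architectures of Chapters 3 and 4):

import numpy as np

rng = np.random.default_rng(2)
layer_sizes = [3, 5, 2]                      # n_0 = 3 inputs, one hidden layer, n_L = 2
# ω^{(l)} has shape (n_l, n_{l−1} + 1); column 0 holds the bias ω_{i0}^{(l)}
weights = [rng.normal(scale=0.5, size=(n_out, n_in + 1))
           for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]

def forward(x):
    a = x
    for w in weights:
        a_aug = np.concatenate(([1.0], a))   # zeroth neuron a_0^{(l−1)} ≡ 1
        z = w @ a_aug                        # z_i^{(l)} = Σ_j ω_{ij}^{(l)} a_j^{(l−1)}
        a = np.tanh(z)                       # a_i^{(l)} = σ(z_i^{(l)})
    return a                                 # output layer a^{(L)} = f(X)

y_hat = forward(np.array([0.2, -0.1, 0.7]))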
to train the model (in practice this involves a batch of randomly sampled training
data). Neural networks are parametrised by a set of unknown weights which are
updated (learned) according to an optimisation procedure. We now discuss probably
the most important optimisation technique in quantum machine learning (as in
classical machine learning), namely backpropagation based upon stochastic gradient
descent.
D.5.2 Backpropagation
Backpropagation [280–283] is a stochastic gradient descent-based method for optimising
model performance across neural network layers. For neural networks, the
most common approach is to use the method of gradient descent to update parameters,
with the particular method by which this update is calculated being via the
backpropagation equations. It represents, subject to a number of conditions, an efficient
way of performing gradient descent calculations in order to update the parameters
ω of models f_θ to achieve some objective, such as minimisation of empirical risk
(equation (D.3.2)) (loss), by, having calculated an estimate f̂_θ(X), propagating the error
δ ∼ |f̂_θ − Y| through the network. Propagation here refers to the use of the calculated
error δ in updating parameters ω across network layers. Backpropagation consists
of two phases: (a) a forward pass, which involves calculating each layer's activation
function a_i^{(l)} (see equation (D.4.7)); and (b) a backward pass, where the backpropagation
updates are calculated. From equation (D.3.5), assume a loss function denoted
generically by R̂_n(ω) (in Chapters 3 and 4, we use variations of mean square error).
Firstly, for gradient descent, recall the directional derivative (definition C.1.2) in differential
form for Riemannian and sub-Riemannian manifolds, and the gradient (definition C.1.5).
Recall our network is a composition of (activation) functions parametrised by weight
tensors ω. For completeness, in this section: ω_{ij}^{(l)} is the weight for neuron i
in layer l that weights neuron j in layer l−1; a^{(l)} is layer l; a_i^{(l)} is the ith neuron in
layer l; and n_l is the number of neurons (units) in layer l. We define gradient descent
for optimisation using these definitions as follows.
Definition D.5.1 (Gradient descent optimisation). Optimisation by gradient descent
is defined as a mapping:

    ω_{ij}^{(l)} ← ω_{ij}^{(l)} − γ_l ∑_{k=1}^N ∂R̂_k/∂ω_{ij}^{(l)} = ω_{ij}^{(l)} − γ_l ∑_{k=1}^N ∇_{ω_{ij}^{(l)}} R̂_k    (D.5.1)

where (a) R̂_k are the loss/error values for the kth data point in the training set Dn(X, Y),
(b) ω_{ij}^{(l)} denotes the weight of the ith neuron of the lth layer weighting the jth neuron
of the (l−1)th layer and (c) γ_l is the learning rate for that layer (which is usually constant
across layers, and often across networks).
Equation (D.5.1) updates each weight ω_{ij}^{(l)} by reference to each example (X_i, Y_i);
in practice, however, a subsample of Dn, denoted a batch, is used, such that R̂_k is
an average over the batch of size N_B, i.e. (1/N_B) ∑_k^{N_B} R̂_k (we omit the
summation below for brevity, and set k = i for consistency with the choice
of each data point (X_i, Y_i)). Calculating ∇_{ω_{ij}^{(l)}} R̂_i relies upon the chain rule. First,
consider how R̂_i varies in the linear case of z_i^{(l)} (without applying the non-linear
activation function σ):

    ∂R̂_i/∂ω_{ij}^{(l)} = (∂R̂_i/∂z_i^{(l)}) (∂z_i^{(l)}/∂ω_{ij}^{(l)})    (D.5.2)
The first of these terms is denoted the error, while the second term is equal to the
activation of the preceding layer:

    δ_i^{(l)} = ∂R̂_i/∂z_i^{(l)}    (D.5.3)

    ∂z_i^{(l)}/∂ω_{ij}^{(l)} = ∂/∂ω_{ij}^{(l)} ( ∑_{μ=0}^{n_{l−1}} ω_{iμ}^{(l)} a_μ^{(l−1)} ) = a_j^{(l−1)}    (D.5.4)

where the partial derivatives vanish in equation (D.5.4) for all but μ = j. The δ_i^{(l)}
term in equation (D.5.3) is denoted the error. As we show below, δ_i^{(l)} is dependent
upon errors in the (l+1)th layer, hence the error terms propagate 'backwards', giving
the name backpropagation.
For the output layer, we note that R̂_i = R̂_i(Ŷ_i, Y_i) = R̂_i(σ_i^{(L)}(z_i^{(L)}), Y_i); thus, by
the chain rule:

    δ_i^{(L)} = ∂R̂_i/∂z_i^{(L)} = (∂R̂_i/∂σ_i^{(L)}) (∂σ_i^{(L)}/∂z_i^{(L)}) = (∂R̂_i/∂σ_i^{(L)}) σ_i′(z_i^{(L)})    (D.5.5)
Applying the chain rule recursively through the hidden layers yields the backpropagation
equations (culminating in equation (D.5.11)), with:

    ∂R̂_i/∂ω_{ij}^{(l)} = ∇_{ω_{ij}^{(l)}} R̂_i = σ_i^{l′}(z_i^{(l)}) a_j^{(l−1)} ∑_{μ=1}^{n_{l+1}} ω_{iμ}^{(l+1)} δ_μ^{(l+1)}    (D.5.12)
We can see from equation (D.5.12) that the error δ_i^{(l)} for layer l is dependent on the
errors in the (l+1)th layer, i.e. δ_μ^{(l+1)}. In this sense, errors propagate backwards from
the final output layer to the first layer. Backpropagation thus relies firstly on the forward
pass, in which the estimates f̂_k(X) are computed, allowing computation first of δ_k^{(L)}
(errors based on outputs and labels) (the forward phase), following which the errors are
'back-propagated' throughout the network via the backpropagation equations (D.5.11)
(the 'backward phase'). Error terms for each layer l are calculated by weighting the error
terms δ_μ^{(l+1)} of the subsequent layer via ω_{iμ}^{(l+1)} and then scaling via σ_i^{l′}(z_i^{(l)}). Doing so
allows computation of the gradients in equation (D.5.2). A training epoch is then one
full round of forward and backward passes. Note that equation (D.5.1) sums over the
training set, referred to as batch learning (the most common case). Alternatively, one
can perform backpropagation point-wise based on single observations (Xi , Yi ) ∈ Dn .
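By way of illustration, a minimal sketch of the forward and backward phases for a hypothetical two-layer network with sigmoid activations and mean-square error (the data, sizes and learning rate are illustrative assumptions, not the thesis's implementation):

import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(200, 2))
Y = (X[:, 0] * X[:, 1] > 0).astype(float).reshape(-1, 1)   # toy labels

W1 = rng.normal(scale=0.5, size=(2, 8))   # hidden layer weights
W2 = rng.normal(scale=0.5, size=(8, 1))   # output layer weights
b1, b2 = np.zeros(8), np.zeros(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

gamma = 0.5
for epoch in range(2000):
    # forward pass: a^{(l)} = σ(z^{(l)}) (equation (D.4.7))
    a1 = sigmoid(X @ W1 + b1)
    a2 = sigmoid(a1 @ W2 + b2)
    # backward pass: output error δ^{(L)} (D.5.5), then propagate back (D.5.12)
    delta2 = (a2 - Y) * a2 * (1 - a2)          # (∂R̂/∂σ) σ′(z^{(L)})
    delta1 = (delta2 @ W2.T) * a1 * (1 - a1)   # weight by ω^{(l+1)}, scale by σ′
    # gradient descent update (D.5.1), averaged over the batch
    W2 -= gamma * a1.T @ delta2 / len(X)
    W1 -= gamma * X.T @ delta1 / len(X)
    b2 -= gamma * delta2.mean(axis=0)
    b1 -= gamma * delta1.mean(axis=0)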
Quantum backpropagation also brings with it a number of subtleties arising be-
cause of the effect of measurement on quantum states. Thus the classical analogue
of backpropagation does not carry over one-for-one to the quantum case. In many
cases, models learn classically parametrised quantum circuit features via implement-
ing offline classical backpropagation (as per above) conditional upon measurement
statistics. Examples of quantum-related backpropagation proposals include, for ex-
ample, a series of papers by Verdon et al. [71–74, 88, 251] where continuous param-
eters are quantised and encoded in a superposition of quantum registers enabling a
backpropagation-style algorithm. Other examples, including quantum neural net-
work proposals [70, 284, 285], provide automatic differentiation as a means of propagating
measurement-derived gradient information through a quantum analogue of a
fully-connected network. In each case, one must remember that the optimisation
strategy relies upon measurement, which (as per definition A.1.34) is an inherently
quantum-to-classical channel.
where ∇_v is the directional derivative given in equation (C.1.2) and α is the step size (see
section D.5.4). Because calculating the exponential term in equation (D.5.14) can
be difficult, mapping to the parameter bundle instead occurs via a first-order Taylor
expansion of the exponential, the Euclidean retraction R (adapted from [287]),
which shifts p by some infinitesimal amount. Natural gradient descent is then given
by:

    θ_{t+1} = θ_t − α g^{−1}(θ_t) ∇_θ L(θ_t)

Here g^{−1}(θ) is the dual Riemannian metric, which is invariant under invertible smooth
changes of parametrisation. Note the presence of g^{−1} indicates that the gradient
transforms contravariantly. The benefit of the natural gradient is that it is
invariant under coordinate transformations, though there are some subtleties regarding
convergence discussed in the literature [286, 288, 289]. The quantum analogue of
natural gradient descent is briefly touched upon below as a technique for
optimisation.
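A schematic sketch of this update (the toy Fisher metric for a Gaussian parametrisation θ = (μ, σ) is an illustrative assumption; the pseudo-inverse guards against degenerate metrics):

import numpy as np

def natural_gradient_step(theta, grad, metric, alpha=0.1):
    # θ_{t+1} = θ_t − α g⁻¹(θ_t) ∇L(θ_t); pseudo-inverse for stability
    return theta - alpha * np.linalg.pinv(metric) @ grad

theta = np.array([0.5, 2.0])                                  # θ = (μ, σ) for N(μ, σ²)
fisher = np.diag([1.0 / theta[1] ** 2, 2.0 / theta[1] ** 2])  # Fisher metric g(θ)
grad = np.array([0.3, -0.1])                                  # hypothetical ∇L(θ)
theta_next = natural_gradient_step(theta, grad, fisher)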
As the abstract definition of neural networks in section D.4.2 above indicates, there
is a wide combinatorial landscape of parameters for the design of neural network
architectures, including topology, choices of activation functions, loss functions, reg-
ularisation, dropout and so on. Hyperparameters are settings of a neural network
model which are not adapted or updated by the learning protocol of the model in
a direct sense (though model performance may inform ancillary hyperparametric
models). In effect, changing a hyperparameter changes the neural network model
itself. We set out briefly a few such parameters, techniques and hyperparameters
used in the results detailed in later chapters.
(i) Learning rate. The learning rate parameter γ_r (or gain, in adaptive learning)
in equation (D.5.1) represents the step size at each iteration. Mathematically
it represents a scaling of the (directional derivative) ∇_ω R̂ and is among the
most important hyperparameters for network performance.
(ii) Epochs. The number of epochs of training (an epoch being a full forward and
backward pass of the backpropagation algorithm in equation (D.5.11)) is a tunable
parameter of neural networks. In principle the number of epochs can affect the
sensitivity of weights to training data and can risk overfitting, such
that models do not generalise well. Conversely, insufficient epochs may not
sufficiently minimise empirical risk (definition D.3.2).
Finally, we briefly note that while statistical learning theory provides a generalised
framework for analysing model performance, deep learning networks often exhibit
singularities, i.e. they are singular models. The fact that over-parametrised deep
learning models can generalise well runs somewhat counter to the implied trade-off
between model complexity and performance at the heart of statistical learning
theory [290]. In such cases, alternative approaches such as singular learning theory
provide means of estimating appropriate statistical learning figures of merit for such
models (see the seminal work by Watanabe [291] generally and [292] for more detail).
areas. The unifying factor behind quantum machine learning is the use of quantum
data or quantum information processes in ways that enable, constitute, or rely upon
some type of learning protocol as measured or indicated by relevant metrics, such
as in-sample performance, variance, accuracy and so on. The spectrum of quan-
tum machine learning algorithms is expansive, covering quantum neural networks,
parametrised (variational) quantum circuits, quantum support vector machines and
other formulations [35]. Similarly, the application of deep learning principles to
quantum algorithm design, such as via quantum neural networks (see [64, 70] for
an overview), quantum deep learning architectures (including quantum analogues
of graph neural networks, convolutional neural networks and others [71, 72, 74, 88])
speaks to the diversity of approaches now studied in the field. Coupled with these
approaches, emerging trends in the use of symmetry techniques, such as dynamical
Lie algebraic neural networks, symmetry-preserving learning protocols and geomet-
ric quantum machine learning approaches (discussed in particular in section D.8.3)
offer new methods for improving on problem-solving in the quantum and classical
domains. Much literature is also devoted to understanding the differences, some
subtle, some less so, between classical and quantum machine learning, from the
theoretical level of statistical learning down to the practical implementation of
algorithms in each case (examples include literature in quantum algorithm design and
complexity theory [293]). In this section, we provide a short high-level summary of
some of the key features of quantum machine learning relevant to Chapters 3 and
4, with a particular focus on variational or parametrised quantum machine learning
circuits. First, we consider a few key differences in principle between classical and
quantum approaches in machine learning, starting with the fundamental distinction
between the need for quantum channels to preserve unitarity (definition A.1.30)
and the dissipative nature of classical neural networks [294].
As Schuld et al. note [295], the challenge of adapting classical neural network architectures
to the quantum realm lies in part in structural distinctions between classical
and quantum computing. Classical neural network architectures are fundamentally
based upon non-linear and often dissipative dynamics, as distinct from the linear
unitary dynamics of quantum computing. In classical neural networks, operations
such as activation functions and normalisation can lead to a loss of information (e.g.
applying a ReLU activation [296] can set negative values to zero, constituting information
loss). Classical neural networks are also often irreversible (as with pooling or
other convolutional network operations), such that the network is not, functionally
speaking, locally bijective between layers. Quantum information processing, by contrast,
Quantum learning tasks are similarly contingent upon the particular functional
forms to be learnt. For example, [301] consider a class of learning problems con-
cerned with learning the function:
The model in [301] is in many ways generic and reflective of many QML archi-
tectures, reflecting the fact that information extraction for updating protocols in
effect represents a quantum-classical channel (definition A.1.31) via measurement
or in-built trace operations. A comparison with quantum control is again useful in
this regard. For hybrid classical-quantum structures, empirical risk minimisation is
with respect to a classical representation f (x) (e.g. contingent upon measurement
outcomes m given measurement operator M ) and typically a classical loss function
(e.g. fidelity measures, quantum Shannon entropy or some other function of input
and output state representations). Physical intuition is useful here: classical gradient
descent then enters into the Hamiltonian H governing the evolution (and transition)
of the quantum system via the controls (e.g. coefficients c_i that affect generator
amplitudes). Other results in quantum-related statistical learning include bounds on the
expressibility of variational quantum circuits [302–306].
The limits of quantum machine learning algorithms from a statistical learning per-
spective have been examined throughout the literature [307]. For example, it is
shown in [307] that challenges persist for quantum machine learning algorithms
with polylogarithmic time complexity in input dimensions. In this situation, the
statistical error of the QML algorithm is polynomially dependent upon the number
of samples in the training set, such that the best error rate ε is achievable only if
statistical and approximation error scale equivalently, which in turn requires approx-
imation error to scale polynomially with the number of samples. For logarithmic
dependency on sampling, it is shown that the additive sampling error from mea-
surement further introduces a polynomial dependency on the number of samples.
Such constraints affect the viability of whether certain popular or mooted QML al-
gorithms may in fact provide any advantage over classical analogues. In many
cases, then, proposed QML algorithms face challenges posed by barren plateaus,
entanglement growth and statistical learning barriers. There is thus, as with classical
machine learning, a need to consider how algorithms may (if at all) be architected
in order to solve such scaling challenges. Doing so is one of the motivations for
this work and its exploration of techniques from geometry, topology and symmetry
algebras. As discussed in the introduction, in a machine learning context the ability
to encode prior or known information offers one route to such efficiency, hence our
synthesis of what we denote (and further articulate) as greybox QML together with
geometric QML.
(ii) Wave function collapse. Unlike the classical case, quantum measurement decoheres
the state into eigenstates of the measurement operator M_m. The decohering
process of quantum measurement channels thus leads to information loss,
making dynamic updating problematic (which is also the reason online control
is problematic).
sentation via |ψ⟩ ∝ |000001 11001⟩ (see [111]). Amplitude encoding associates normalised
classical information with quantum amplitudes: e.g. for an n-qubit system
(with 2^n different possible (basis) states |j⟩), a normalised classical sequence
x = (x₁, ..., x_{2^n}) ∈ C^{2^n} with ∑_k |x_k|² = 1 (possibly with only real parts) can
be encoded as:

    |ψ_x⟩ = ∑_{j=1}^{2^n} x_j |j⟩

Other examples of sample-based encoding (e.g. Qsample and dynamic encoding) are
also relevant but not addressed here. From a classical machine learning perspective,
such encoding regimes also enable both features and labels to be encoded into
quantum systems.
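A minimal sketch of amplitude encoding for a classical vector (the padding-to-a-power-of-two and normalisation steps are standard, but the helper itself is hypothetical):

import numpy as np

def amplitude_encode(x):
    # Pad to 2^n basis states and normalise so that Σ_k |x_k|² = 1
    n = int(np.ceil(np.log2(len(x))))
    psi = np.zeros(2 ** n, dtype=complex)
    psi[: len(x)] = x
    return psi / np.linalg.norm(psi)

psi = amplitude_encode(np.array([3.0, 1.0, 2.0]))   # a 2-qubit state |ψ_x⟩
assert np.isclose(np.vdot(psi, psi).real, 1.0)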
In Chapters 3 and 4, we assume the standard orthonormal computational basis
{|0⟩, |1⟩} such that ⟨1|0⟩ = ⟨0|1⟩ = 0 and ⟨1|1⟩ = ⟨0|0⟩ = 1. Quantum states
encode information of interest and of use in optimisation problems. They are not
directly observable; rather, their structure must be reconstructed from known
information about the system. In machine learning contexts, quantum states may be
used as inputs, constituent elements in intermediate computations or label (output)
data. For example, in the QDataset (explored in Chapter 3), intermediate quantum
states at any time step may be reconstructed using the intermediate Hamiltonians
and unitaries for each example. The code repository for the QDataSet simulation
provides further detail on how quantum state representations are used to generate
the QDataSet [171]. Depending on machine learning architecture, quantum states
will usually be represented as matrices or tensors and may be used as inputs (for
example, flattened), label data or as an intermediate input, such as in intermediate
layers within a hybrid classical-quantum neural network (see [40, 95]). For example,
consider the matrix representation of eigenstates of a Pauli σz operator below:
    σ_z = ( 1    0
            0   −1 )    (D.6.2)

In the computational basis, this operator has two eigenstates |0⟩, |1⟩ for eigenvalues
λ = 1, −1:

    |0⟩ = (1, 0)ᵀ for λ = 1,   |1⟩ = (0, 1)ᵀ for λ = −1    (D.6.3)
where we have adopted the convention that the λ = 1 eigenstate is represented by |0⟩
and the λ = −1 eigenstate by |1⟩ (our choice is consistent with QuTiP; practitioners
should check the platform they are using for its choice of representation).
These eigenstates have a density operator representation as:

    ρ = a|0⟩⟨0| + b|0⟩⟨1| + c|1⟩⟨0| + d|1⟩⟨1| = ( a   b
                                                  c   d )    (D.6.6)

where a, b, c, d ∈ C are the respective complex-valued amplitudes. Given ρ = ∑_i p_i ρ_i,
the diagonal elements a_n of the density matrix describe the probability p_n of the
For pure states, the diagonal of the density matrix will have only one non-zero
element (equal to 1), so that ρ = ρ_i. A mixed state will have multiple entries
along the diagonal such that 0 ≤ a_n < 1. For example, the σ_z eigenvectors have the
representations |0⟩⟨0| = diag(1, 0) and |1⟩⟨1| = diag(0, 1).
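A short numerical illustration of the distinction (the purity Tr[ρ²] equals 1 for pure states and is less than 1 for mixed states):

import numpy as np

ket0 = np.array([[1.0], [0.0]])
ket1 = np.array([[0.0], [1.0]])

rho_pure = ket0 @ ket0.conj().T                          # |0⟩⟨0|, diagonal (1, 0)
rho_mixed = 0.5 * rho_pure + 0.5 * ket1 @ ket1.conj().T  # equal mixture, diagonal (0.5, 0.5)

print(np.trace(rho_pure @ rho_pure).real)                # 1.0: pure
print(np.trace(rho_mixed @ rho_mixed).real)              # 0.5: mixed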
the sequence U_i up to i = j (note the order of application is such that U_j ⋯ U_0 |ψ⟩).
In this section, we sketch the general use of variational quantum algorithms adopted
in Chapters 3 and 4. As Schuld et al. [64] note, variational quantum circuits can
be interpreted as both deterministic and probabilistic machine learning models for
functions fθ : X → Y (or learning a probability distribution Pθ ). As noted in
the literature, quantum circuits parametrised by θ which are updated according
to some objective (cost) function as part of an optimisation process can in some
sense naturally be regarded as akin to machine learning models [65, 80]. Firstly, we
define a parametrised quantum circuit in terms of a unitary U (θ) dependent upon
parameters θ ∈ Rm .
Such a unitary may be expressed as a time-ordered exponential of Hamiltonian operations:

    U(θ, t) = T₊ exp( −i ∫₀ᵀ H(θ, t′) dt′ ).    (D.7.1)
i.e. a sequence of unitaries acting on the initial state |ψ(0)⟩ which we can represent
via U0 (t). We assume left-action on states |ψ⟩ where dependence upon t is under-
stood. Recall each Ui (θi (t)) above is the solution to the time-dependent Schrödinger
equation (equation A.1.18) as a sequence of integrals. For simplicity, we assume
the time-independent approximation (equation (A.1.25)). The aim of a variational
quantum algorithm (VQA) is to find an optimal set of parameters minimising a cost
functional C(θ): R^m → R on the parameter space, such that θ* = argmin_{θ ∈ R^m} C(θ).
The term variational in this quantum context [106] derives from the underlying
principles of the calculus of variations for finding optimal solutions, such as (originally)
computing the lowest-energy states of a system. As Schuld et al.
note, the term variational quantum circuit is sometimes used interchangeably with
parametrised quantum circuit (see Benedetti et al. [65] for a still-salient review); in
this work, we adopt this association between the two. We set out the deterministic
and probabilistic forms below. For a deterministic model, we let f_θ: X → Y and
denote by U(X, θ) (a quantum circuit) the circuit for initial state X ∈ X and parameters θ ∈
R^m. Let {O_k} represent the set of (Hermitian) observable (measurement) operators.
Then the following function represents a deterministic model:

    f_θ(X) = Tr[ O_k ρ(X, θ) ]    (D.7.3)

where ρ(X, θ) = U(X, θ)ρ₀U(X, θ)†. Common approaches to the form of U(X, θ) include,
in analogy with classical machine learning, U(X, θ) = W(θ)S(X), where W(θ)
represents a parametrised weight tensor and S(X) represents the encoding of initial
features. In this work, we adopt geometric quantum control-style architectures
where the unitary channel layers in each network are constructed from Hamiltonian
layers comprising dynamical Lie algebra generators X_j ∈ g (see section C.5.5.1)
and associated time-dependent control functions u_j(t) (sometimes denoted c_j(t)).
In terms of measurement statistics, for U(θ)|ψ(0)⟩ = |ψ(t)⟩, the class
of functions in equation (D.7.3) we seek to learn are represented by measurement
statistics on the evolved states.
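A single-qubit sketch of a deterministic model of this form (the generators, the encoding structure U(X, θ) = W(θ)S(X) as above, and the observable O = σ_z are illustrative assumptions):

import numpy as np
from scipy.linalg import expm

sx = np.array([[0, 1], [1, 0]], dtype=complex)       # generator for W(θ)
sz = np.array([[1, 0], [0, -1]], dtype=complex)      # observable O

def model(x, theta):
    S = expm(-1j * x * sz / 2)                       # feature encoding S(X)
    W = expm(-1j * theta * sx / 2)                   # parametrised weight unitary W(θ)
    rho0 = np.array([[1, 0], [0, 0]], dtype=complex) # ρ₀ = |0⟩⟨0|
    U = W @ S                                        # U(X, θ) = W(θ)S(X)
    rho = U @ rho0 @ U.conj().T
    return np.trace(sz @ rho).real                   # f_θ(X) = Tr[O ρ(X, θ)]

print(model(x=0.3, theta=1.1))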
The typical supervised QML task for unitary synthesis involves a data space Dn(X, Y)
where feature data is drawn from quantum state data for a Hilbert space, X ∼ H,
and labels Y ∼ K (which can be real (R) or complex (C) valued). In many cases,
including our two machine learning-based chapters, H is a tensor product of qubit
spaces, H^⊗ = ⊗ⁿH with dim H^⊗ = 2ⁿ. Here Dn(X, Y) = {(ρ_i, y_i)}_i, where Dn is
sampled from an underlying population. In Chapter 4, for example, the training data
generated is somewhat unusual in that the label data (that which we want to accurately
predict) is the sequence of n unitaries (U_n) (i.e. (U_n) ∈ Y), while the target (final
state) unitary U_T is actually fed in via an input layer (i.e. U_T ∈ X). In that case,
as we discuss, what we are interested in is, conditioned on U_T, the set of controls
c_j(Δt) that can be generated such that the resulting sequence (U_n(Δt_n)) leads to U_T.
Conversely, for the QDataSet in Chapter 3, the training data may vary, such as
being classical measurement statistics arising from m Pauli measurements (in which
case Y ⊂ R^m). As noted in our discussion of measurement (section A.1.6), for our
final two chapters we assume access to a measurement process that provides us with
data about U.
Under the time-independent approximation, assuming H(t) is constant over small Δt,
we can effectively treat the controls as parametrised by θ, so that our unitaries are
U = U(θ(t)). In many cases, our parameters, e.g. θ ∈ R^m, can be framed as the fibre
of an underlying manifold K such that the commutative diagram in Figure D.2 holds.
While not the subject of this work, understanding the relationship between
parameter manifolds and target (label) data manifolds (often described geometrically
in terms of embeddings) is an important research direction utilising techniques
from geometry and machine learning, with specific applications to solving problems
in quantum computing.
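A sketch of this piecewise-constant approximation for a single qubit, composing U = U_n ⋯ U_0 from small time slices (the drift term, control pulse shape and Δt are hypothetical choices):

import numpy as np
from scipy.linalg import expm

sx = np.array([[0, 1], [1, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)

dt = 0.05
controls = np.sin(np.linspace(0, np.pi, 20))         # samples of u(t), one per slice

U = np.eye(2, dtype=complex)
for u in controls:
    H = sz + u * sx                                  # H held constant over Δt
    U = expm(-1j * H * dt) @ U                       # left action: U_j ⋯ U_0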
[Figure D.2 appears here: a commutative diagram relating the parameter manifold K
(with fibre bundle R^m and projection π_K) to the target manifold M ≅ G (with fibre
bundle R^{2n} and projection π_M) via the maps h: K → M and h*: R^m → R^{2n}.]

Figure D.2: Manifold mapping between parameter manifold K and its fibre bundle R^m and target
manifold M with associated fibre bundle space R^{2n}. Optimisation across the parameter manifold
can be construed in certain cases as an embedding of the parameter manifold within the target
(label) manifold M, where such embeddings may often be complex and give rise to non-standard
topologies.
(i) Parameter shift rules. Parameter shifts [325, 326] in the context of gradient
descent represent a method of calculating gradients by shifting parameters.
By evaluating the cost function C(θ) at two shifted parameter values θ ± ε,
the rescaled difference forms an unbiased estimate of ∇_θ C,
usually restricted to gates with two distinct eigenvalues [326]. The reason
this matters is that in certain cases classical stochastic gradient descent is
inappropriate or unable to obtain a well-defined gradient. For a cost functional
C(θ) we calculate the gradient as:

    ∇_θ C = ( C(θ + ε) − C(θ − ε) ) / 2    (D.7.5)
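A single-qubit sketch of the parameter-shift estimate for U(θ) = exp(−iθσ_x/2), for which a shift of ε = π/2 reproduces the exact gradient of the (hypothetical) cost C(θ) = ⟨ψ(θ)|σ_z|ψ(θ)⟩:

import numpy as np
from scipy.linalg import expm

sx = np.array([[0, 1], [1, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)
ket0 = np.array([1, 0], dtype=complex)

def cost(theta):
    psi = expm(-1j * theta * sx / 2) @ ket0
    return np.real(psi.conj() @ sz @ psi)            # C(θ) = ⟨ψ(θ)|σ_z|ψ(θ)⟩ = cos θ

theta, eps = 0.7, np.pi / 2
grad = (cost(theta + eps) - cost(theta - eps)) / 2   # equation (D.7.5)
assert np.isclose(grad, -np.sin(theta))              # matches the analytic derivative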
(ii) Quantum natural gradients. The quantum natural gradient method [327], related
to the Fisher information metric and the Fubini–Study metric F_Q, provides
an update of the form:

    θ_{t+1} = θ_t − η g⁺(θ_t) ∇L(θ_t)

where η is the learning rate, L the loss function and g⁺ is related to the pseudo-inverse
of the Fubini–Study metric tensor. This metric tensor is sometimes denoted
the 'quantum geometric tensor', which effectively involves inner-product
contractions between states and their derivatives as a means of calculating the
directional derivatives.
    K_j^l(t) = (η 2^{m_{l−1}} i / S) ∑_x Tr_rest[ M_j^l(x, t) ]

where M_j^l encodes the errors calculated via a commutator term which essentially
propagates from the input layer to the lth layer for one term, and from the
label state to the (l+1)th layer for the other; the difference between the unitaries
between any two states then shows up as a non-zero commutator, which is what M_j^l
encodes. This is shown to be equivalent to calculating the derivative of L for
gradient-update purposes via the equations above, with the benefit that only
two layers need to be updated at any one time rather than needing to update
all layers.
In this final section, we sketch an outline of the general structure of the greybox variational
quantum circuits that are the subject of our next two chapters. We connect the
general architecture to three related fields in quantum and classical machine learning:
(a) Lie-algebraic methods (such as Lie group machine learning [329, 330]), (b) geometric
machine learning [27] and (c) geometric quantum machine learning [30, 34].
Our architectures connect learning protocols with the geometry of symmetric
spaces, in particular discovering strategies for learning sequences of controls for
generating time-optimal geodesic paths along the manifold of interest. We describe
this relationship below.
(i) Objective. Firstly, our problem is one of using machine learning to synthesise
time-optimal unitaries relevant to quantum computational contexts. Specifically,
this means using machine learning architectures to solve certain control
systems expressed through the Pontryagin Maximum Principle (definition
C.5.5). Our approach, focused on quantum control, lends itself to variational
quantum algorithms from a design-choice perspective.
(iii) Inputs. The input layers a^{(0)} of the network take as their feature data unitaries
U(t) ∈ G (or, in the case of multi-qubit systems, G ⊗ G), where for Chapters 3 and 4, G =
SU(2). The unitaries have a representation by way of the usual representation
of SU(2) in terms of the matrix group M₂(C) (see definition B.2.1). Denoting
such matrices X ∈ M₂(C) (or X₁ ⊗ X₂ ∈ M₂(C) ⊗ M₂(C), denoted by the single X for
convenience), the initial layers tend to transform (e.g. flatten) such matrices
into a realised (real-valued) form. The activation function of these input layers tends
then just to be the identity, σ(X) = X.
(iv) Feed-forward layers. The input layers are then fed into a sequence of typical
classical neural network layers, in our case feed-forward layers (see definition
D.4.4).
(v) Control pulse layers. As we sketch below, parameter network layers (parametrised
by θ) feed into layers comprising control pulses u_j(t). The neurons within these
layers are characterised by activation functions such that u_j(t) ∈ [0, 1], where
we rely implicitly on the fact that, for the bounded control problem, we can
arbitrarily scale (normalise) controls to fit within minimum length L (see definition
C.2.6).
(vi) Hamiltonian and unitary layers. The control pulse layers then feed into bespoke
layers that (i) construct the Hamiltonian from fixed Lie algebra generators
in g, which is then (ii) input into a layer comprising an exponential activation
that constructs a time-independent approximation to the time-dependent unitary
operator (a sketch follows this list). In all cases, our target of interest is the
optimal unitary for time t_j. In Chapter 4, we are interested in the sequence of
controls (u_i(t)) that generate a sequence of estimated unitary operators (Û_j(t))
which, under the assumption of time-independence (see definition A.1.25), allows
us to generate our target U(T) ∈ G. Each node generating a unitary has k generators
and control functions u_i(t), i = 1, ..., k, where k = dim g (or the dimension of the
relevant control subset). Our labels and our function estimates are sequences of
unitaries and, as we note below, the target U_T is actually one of the inputs to
the models.
(viii) Measurement. In Chapter 3, measurement statistics for each time slice are
captured in the data (from which one can reconstruct unitaries or Hamiltonians
for each such time slice), whereas for Chapter 4, the existence of an external process
that allows U(t) and Û(t) to be constructed is assumed.
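A minimal sketch of such a Hamiltonian-unitary layer (a non-trainable NumPy analogue; the generators, Δt and control values are illustrative assumptions, whereas in Chapters 3 and 4 these layers sit inside a trainable network):

import numpy as np
from scipy.linalg import expm

# Fixed Lie algebra generators (su(2) Pauli matrices) and a time slice Δt
generators = [np.array([[0, 1], [1, 0]], dtype=complex),
              np.array([[0, -1j], [1j, 0]], dtype=complex),
              np.array([[1, 0], [0, -1]], dtype=complex)]
dt = 0.1

def unitary_layer(controls):
    # (i) Hamiltonian layer: H = Σ_j u_j X_j with control amplitudes u_j ∈ [0, 1]
    H = sum(u * X for u, X in zip(controls, generators))
    # (ii) exponential activation: time-independent approximation Û = exp(−iHΔt)
    return expm(-1j * H * dt)

U_hat = unitary_layer(np.array([0.2, 0.7, 0.1]))
assert np.allclose(U_hat @ U_hat.conj().T, np.eye(2))   # unitarity preserved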
Bibliography
[2] S. Helgason, Differential Geometry, Lie Groups, and Symmetric Spaces, ser.
ISSN. Elsevier Science, 1979.
[3] T. Hawkins, Emergence of the theory of Lie groups: An essay in the history
of mathematics 1869–1926. Springer Science & Business Media, 2012.
[6] T. L. Heath et al., The thirteen books of Euclid's Elements. Courier
Corporation, 1956.
[10] S. Helgason, Differential Geometry and Symmetric Spaces, ser. ISSN. Elsevier
Science, 1962.
[13] V. Vapnik, The Nature of Statistical Learning Theory. Springer, New York,
1995.
[17] D. D’Alessandro and B. Sheller, “On K-P sub-Riemannian Problems and their
Cut Locus,” in 2019 18th European Control Conference (ECC), Jun. 2019, pp.
4210–4215.
[21] N. Khaneja, R. Brockett, and S. Glaser, “Time Optimal Control in Spin Sys-
tems,” Physical Review A, vol. 63, no. 3, Feb. 2001.
[24] U. Boscain, T. Chambrion, and J.-P. Gauthier, “On the K + P Problem for
a Three-Level Quantum System: Optimality Implies Resonance,” Journal of
Dynamical and Control Systems, vol. 8, no. 4, pp. 547–572, Oct. 2002.
[26] V. Jurdjevic and I. Kupka, “Control systems on semi-simple Lie groups and
their homogeneous spaces,” in Annales de l’institut Fourier, vol. 31, 1981, pp.
151–179.
[28] S.-i. Amari, Information geometry and its applications. Springer, 2016, vol.
194.
[29] F. Li, L. Zhang, and Z. Zhang, Lie Group Machine Learning. Berlin, Boston:
De Gruyter, 2019.
[39] E. Perrier, A. Youssry, and C. Ferrie, “QDataSet, quantum datasets for ma-
chine learning,” Scientific Data, vol. 9, no. 1, pp. 1–22, 2022.
[40] E. Perrier, D. Tao, and C. Ferrie, “Quantum geometric machine learning for
quantum circuits and control,” New Journal of Physics, vol. 22, no. 10, p.
103056, Oct. 2020.
[41] E. Perrier and C. S. Jackson, "Solving the KP problem with the global Cartan
decomposition," 2024, arXiv:2404.02358.
[47] J. Hall, Beyond AI: Creating the conscience of the machine. Amherst, NY:
Prometheus, 2007.
[51] M. do Carmo, Differential Geometry of Curves and Surfaces: Revised and Up-
dated Second Edition, ser. Dover Books on Mathematics. Dover Publications,
2016.
[75] J. J. Hopfield, "Neural networks and physical systems with emergent collective
computational abilities," Proceedings of the National Academy of Sciences USA,
vol. 79, no. 8, pp. 2554–2558, Apr. 1982.
[89] L. Zhou, S. Pan, J. Wang, and A. V. Vasilakos, “Machine learning on big data:
Opportunities and challenges,” Neurocomputing, vol. 237, pp. 350–361, 2017.
[93] J. Preskill, “Quantum Computing in the NISQ era and beyond,” Quantum,
vol. 2, p. 79, 2018.
[105] G. Riviello, K. M. Tibbetts, C. Brif, R. Long, R.-B. Wu, T.-S. Ho, and H. Rab-
itz, “Searching for quantum optimal controls under severe constraints,” Phys-
ical Review A, vol. 91, no. 4, p. 043401, 2015.
[115] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, "ImageNet: A
large-scale hierarchical image database," in 2009 IEEE Conference on Computer
Vision and Pattern Recognition. IEEE, 2009, pp. 248–255.
[116] R. Kitchin and G. McArdle, “What makes Big Data, Big Data? Exploring the
ontological characteristics of 26 datasets,” Big Data & Society, vol. 3, no. 1,
p. 2053951716631130, Jun. 2016.
[124] R. Pascanu, T. Mikolov, and Y. Bengio, “On the difficulty of training recur-
rent neural networks,” in Proceedings of the 30th International Conference on
International Conference on Machine Learning - Volume 28, ser. ICML’13.
Atlanta, GA, USA: JMLR.org, Jun. 2013, pp. III–1310–III–1318.
[129] M. Rupp, A. Tkatchenko, K.-R. Müller, and O. A. von Lilienfeld, “Fast and
Accurate Modeling of Molecular Atomization Energies with Machine Learn-
ing,” Physical Review Letters, vol. 108, no. 5, p. 058301, Jan. 2012.
[164] A. Gelman and J. Hill, Data Analysis Using Regression and Multilevel/Hierarchical
Models, ser. Analytical Methods for Social Research. Cambridge
University Press, 2007.
[168] J. Gao, L.-F. Qiao, Z.-Q. Jiao, Y.-C. Ma, C.-Q. Hu, R.-J. Ren, A.-L. Yang,
H. Tang, M.-H. Yung, and X.-M. Jin, “Experimental Machine Learning of
Quantum States,” Physical Review Letters, vol. 120, no. 24, p. 240501, Jun.
2018.
[171] E. Perrier, A. Youssry, and C. Ferrie, “QDataset: Quantum Datasets for Ma-
chine Learning,” arXiv:2108.06661, 2021.
[187] P. Leifer, “Quantum geometry of the Cartan control problem,” Oct. 2008,
arXiv:0810.3188 [physics].
[188] B. Li, Z.-H. Yu, and S.-M. Fei, “Geometry of Quantum Computation with
Qutrits,” Scientific Reports, vol. 3, p. 2594, Sep. 2013.
[191] W. A. de Graaf, Lie Algebras: Theory and Algorithms. Elsevier, Feb. 2000.
[193] W. Huang, “An explicit family of unitaries with exponentially minimal length
Pauli geodesics,” Jan. 2007, arXiv:quant-ph/0701202.
[194] L. Noakes, “A Global algorithm for geodesics,” Journal of the Australian Math-
ematical Society, vol. 65, no. 1, pp. 37–50, Aug. 1998.
[201] B. Lin, J. Yang, X. He, and J. Ye, “Geodesic Distance Function Learning via
Heat Flow on Vector Fields,” arXiv:1405.0133 [cs, math, stat], May 2014.
[211] G. Dirr and U. Helmke, “Lie Theory for Quantum Control,” GAMM-
Mitteilungen, vol. 31, no. 1, pp. 59–93, 2008.
[214] W.-Q. Liu, X.-J. Zhou, and H.-R. Wei, “Collective unitary evolution with
linear optics by Cartan decomposition,” Europhysics Letters, 2022.
[223] "Cartan's equivalence method," Wikipedia, Oct. 2018, page version ID: 862708484.
[224] Z.-Y. Su, “A Scheme of Cartan Decomposition for su(N),” Mar. 2006,
arXiv:quant-ph/0603190.
[231] R. Hermann, Lie Groups for Physicists, ser. Mathematical physics monograph
series. W. A. Benjamin, 1966.
[233] T.-L. Wang, L.-N. Wu, W. Yang, G.-R. Jin, N. Lambert, and F. Nori, “Quan-
tum Fisher information as a signature of the superradiant quantum phase
transition,” New J. Phys., vol. 16, p. 063039, 2014.
[235] B. C. Hall, Lie groups, Lie algebras, and representations. Springer, 2013.
[239] J. R. Schue, “Hilbert space methods in the theory of Lie algebras,” Transac-
tions of the American Mathematical Society, vol. 95, no. 1, pp. 69–80, 1960.
[245] K. Kraus, A. Böhm, J. D. Dollard, and W. Wootters, States, Effects, and Op-
erations Fundamental Notions of Quantum Theory: Lectures in Mathematical
Physics at the University of Texas at Austin. Springer, 1983.
[252] Z. Liu and C. Zheng, “Recurrence Theorem for Open Quantum Systems,”
Feb. 2024, arXiv:2402.19143 [quant-ph].
[255] B. Collins and P. Śniady, “Integration with respect to the Haar measure on
unitary, orthogonal and symplectic group,” Communications in Mathematical
Physics, vol. 264, no. 3, pp. 773–795, 2006.
[265] S. Aaronson, "The learnability of quantum states," Proceedings of the Royal
Society A: Mathematical, Physical and Engineering Sciences, vol. 463, no.
2088, pp. 3089–3114, Dec. 2007.
[272] P. McCullagh and J. A. Nelder, Generalized linear models. Chapman & Hall,
1989.
[275] D. H. Wolpert and W. G. Macready, "No free lunch theorems for optimization,"
IEEE Transactions on Evolutionary Computation, vol. 1, no. 1, pp.
67–82, Apr. 1997.
[288] S.-I. Amari, “Natural gradient works efficiently in learning,” Neural computa-
tion, vol. 10, no. 2, pp. 251–276, 1998.
[293] S. Aaronson, “How Much Structure Is Needed for Huge Quantum Speedups?”
Sep. 2022, arXiv:2209.06930 [quant-ph].
[294] M. Schuld, I. Sinayskiy, and F. Petruccione, “The quest for a quantum neural
network,” Quantum Information Processing, vol. 13, no. 11, pp. 2567–2586,
2014.
[296] D. Boob, S. S. Dey, and G. Lan, "Complexity of training ReLU neural network,"
Discrete Optimization, p. 100620, 2020.
[297] S. Kak, "On quantum neural computing," Information Sciences, vol. 83, no. 3–4,
pp. 143–160, 1995.
[302] Y. Du, M.-H. Hsieh, T. Liu, S. You, and D. Tao, “On the learnability of
quantum neural networks,” arXiv:2007.12369, 2020.
[310] Z. Liu, L.-W. Yu, L.-M. Duan, and D.-L. Deng, “The Presence and Ab-
sence of Barren Plateaus in Tensor-network Based Machine Learning,” 2021,
arXiv:2108.08312.
[319] H.-Y. Liu, T.-P. Sun, Y.-C. Wu, Y.-J. Han, and G.-P. Guo, “A Parameter
Initialization Method for Variational Quantum Algorithms to Mitigate Barren
Plateaus Based on Transfer Learning,” arXiv:2112.10952, 2021.
[320] K. Zhang, M.-H. Hsieh, L. Liu, and D. Tao, “Gaussian initializations help
deep variational quantum circuits escape from the barren plateau,” 2022,
arXiv:2203.09376.
[321] C. Zhao and X.-S. Gao, “Analyzing the barren plateau phenomenon in training
quantum neural networks with the ZX-calculus,” Quantum, vol. 5, p. 466, Jun.
2021.
[330] M. Lu and F. Li, "Survey on Lie group machine learning," Big Data Mining
and Analytics, vol. 3, no. 4, pp. 235–258, Dec. 2020.
[332] F. Li and H. Xu, “The theory framework of Lie group machine learning
(LML),” Computer Technology and application, vol. 1, no. 3, pp. 62–80, 2007.