
Quantum Geometric Machine Learning

by Elija Perrier

Thesis submitted in fulfillment of the requirements of the degree of

Doctor of Philosophy

under the supervision of



A/Prof. Dr. Christopher Ferrie

Prof. Dr. Dacheng Tao

University of Technology Sydney


Faculty of Engineering and Information Technology

May, 2024
Acknowledgments
I would like to thank my principal supervisor Associate Professor Dr. Christopher
Ferrie for his patient, instructive and illuminating supervision throughout the tenure
of my research. In addition, I would like to thank A/P Ferrie for the opportunity to
participate in an AUSMURI collaborative research project while working with senior
and experienced researchers across quantum science disciplines. I would like to ac-
knowledge and thank my co-supervisor, Professor Dr. Dacheng Tao, for his counsel
and informative advice, especially relating to information theory, machine learning
and other computational sciences. I would like to express particular gratitude to Dr.
Christopher Jackson of the University of Waterloo. Dr Jackson’s insights and men-
torship have had a significant positive impact on both my research and academic
career. I would like to acknowledge the significant assistance of the iHPC training
facility at UTS, whose resources and expertise were important elements in running
large-scale quantum simulations over several months and years. I would like to ac-
knowledge the generous financial support provided by the Australian Government
via the Australian Research Training Program scholarship and by the UTS Faculty
of Engineering and Information Technology. I would like to thank staff, faculty and
students (past and present) at the Centre for Quantum Software and Information,
UTS. In particular, I acknowledge the support and discussions with colleagues Pro-
fessor Dr. Michael Bremner, Associate Professor Dr. Simon Devitt and Professor
Dr. Min-Hsiu Hsieh, together with Dr. Akram Youssry, Dr. Arinta Auza, Dr. Maria
Quadeer and Lirandë Pira. I would like to also thank my academic colleagues at the
Australian National University whose engagement on deep learning, artificial intel-
ligence and other technical matters was highly beneficial, especially Professor Dr.
Seth Lazar of the ANU, Professor Dr. Tiberio Caetano of the Gradient Institute and
Professor Dr. Kimberlee Weatherall of the University of Sydney. I would like to also
thank colleagues at Stanford University, including Mauritz Kop, and to acknowledge
the encouragement of Professor Dr. Mateo Aboy at Cambridge University.
Finally I would like to thank my family including my mother, Janice Perrier,
whose help in caring for our young children (including a newborn) over many months
was a lifesaver, and my aunt, Alexsis Starcevich, whose support made that help
possible. Also thank you to my father Chris Perrier and his wife Cecilia Basile, and
my sister, Mirabai Perrier, who in the final weeks gave her all.
Most of all, I would like to express unending gratitude to my partner, Paige New-
man and my children, Violet Perrier-Newman and Scarlett Perrier-Newman. Their
sacrifices in time, space and attention to accommodate and support my research
were essential, without which I could not have completed this work, especially dur-
ing the challenging period of the COVID-19 pandemic through which this thesis was
formulated.
This is a THESIS BY COMPILATION. Parts of this thesis have already been
published. The content has been edited to suit the formatting of the thesis and to
maintain its coherence.
DECLARATION OF PUBLICATIONS INCLUDED IN THE THESIS
Contents

List of Figures xi

List of Tables xix

1 Introduction 1
1.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.2 Related work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.3 Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
1.4 Chapter Synopses . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
1.5 A Note on Notation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16

2 Background Theory 17
2.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.2 Quantum Information Processing . . . . . . . . . . . . . . . . . . . . 17
2.2.1 Operator formalism . . . . . . . . . . . . . . . . . . . . . . . . 19
2.2.2 Evolution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
2.2.3 Measurement . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
2.2.4 Quantum control . . . . . . . . . . . . . . . . . . . . . . . . . 23
2.2.5 Open quantum systems . . . . . . . . . . . . . . . . . . . . . . 24
2.3 Algebra and Lie Theory . . . . . . . . . . . . . . . . . . . . . . . . . 26
2.3.1 Lie groups and Lie algebras . . . . . . . . . . . . . . . . . . . 26
2.3.2 Representation theory . . . . . . . . . . . . . . . . . . . . . . 28
2.3.3 Cartan algebras and Root-systems . . . . . . . . . . . . . . . 29
2.3.4 Cartan decompositions . . . . . . . . . . . . . . . . . . . . . . 31
2.4 Differential geometry . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
2.4.1 Manifolds and tangents . . . . . . . . . . . . . . . . . . . . . . 32
2.4.2 Vector fields . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
2.4.3 Tensors and metrics . . . . . . . . . . . . . . . . . . . . . . . . 36
2.4.4 Tangent planes and Lie algebras . . . . . . . . . . . . . . . . . 36
2.4.5 Fibre bundles and Connections . . . . . . . . . . . . . . . . . 37
2.4.6 Geodesics and parallel transport . . . . . . . . . . . . . . . . . 40


2.4.7 Riemannian and subRiemannian manifolds . . . . . . . . . . . 42


2.4.8 Symmetric spaces . . . . . . . . . . . . . . . . . . . . . . . . . 43
2.4.9 SubRiemannian Geometry . . . . . . . . . . . . . . . . . . . . 44
2.5 Geometric control . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
2.5.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
2.5.2 Time optimisation . . . . . . . . . . . . . . . . . . . . . . . . 47
2.5.3 Variational methods . . . . . . . . . . . . . . . . . . . . . . . 47
2.5.4 KP problems . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
2.6 Quantum machine learning . . . . . . . . . . . . . . . . . . . . . . . . 50
2.6.1 Statistical learning theory . . . . . . . . . . . . . . . . . . . . 51
2.6.2 Loss functions and model complexity . . . . . . . . . . . . . . 52
2.6.3 Deep learning . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
2.6.4 Neural networks . . . . . . . . . . . . . . . . . . . . . . . . . . 54
2.6.5 Optimisation methods . . . . . . . . . . . . . . . . . . . . . . 55
2.6.6 Machine learning and quantum computing . . . . . . . . . . . 56
2.6.7 Parametrised variational quantum circuits . . . . . . . . . . . 58
2.6.8 Greybox machine learning . . . . . . . . . . . . . . . . . . . . 59

3 QDataSet and Quantum Greybox Learning 61


3.1 Abstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
3.2 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
3.3 Overview of QML . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
3.4 QML Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
3.4.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
3.4.2 Large-Scale Data and Machine Learning . . . . . . . . . . . . 67
3.4.3 Taxonomy of large-scale datasets . . . . . . . . . . . . . . . . 68
3.4.4 QML Datasets . . . . . . . . . . . . . . . . . . . . . . . . . . 73
3.4.5 QML and QC platforms . . . . . . . . . . . . . . . . . . . . . 73
3.4.6 Quantum data in datasets . . . . . . . . . . . . . . . . . . . . 75
3.4.7 QDataSet Methodological Overview . . . . . . . . . . . . . . . 76
3.4.7.1 Formalism . . . . . . . . . . . . . . . . . . . . . . . . 77
3.4.7.2 Additional methodological concepts . . . . . . . . . . 78
3.4.7.3 One- and two-qubit Hamiltonians . . . . . . . . . . . 79
3.4.8 Hamiltonians: drift, control, noise . . . . . . . . . . . . . . . . 81
3.4.8.1 QDataSet Control . . . . . . . . . . . . . . . . . . . 83
3.4.8.2 QDataSet Metrics . . . . . . . . . . . . . . . . . . . 84
3.5 Experimental Methods . . . . . . . . . . . . . . . . . . . . . . . . . . 85
3.5.1 QDataSet and Scalability . . . . . . . . . . . . . . . . . . . . 86
3.5.1.1 QDataSet and Error-Correction . . . . . . . . . . . . 87

3.5.2 QDataSet Noise Methodology . . . . . . . . . . . . . . . . . . 88


3.5.2.1 Noise characteristics . . . . . . . . . . . . . . . . . . 88
3.5.3 QDataSet noise profiles . . . . . . . . . . . . . . . . . . . . . . 89
3.5.3.1 Noise profile details . . . . . . . . . . . . . . . . . . . 90
3.5.3.2 Distortion . . . . . . . . . . . . . . . . . . . . . . . . 92
3.5.3.3 QDataSet noise operators . . . . . . . . . . . . . . . 92
3.5.3.4 QDataSet Measurements . . . . . . . . . . . . . . . . 94
3.5.3.5 Pauli matrices . . . . . . . . . . . . . . . . . . . . . 95
3.5.3.6 Pauli measurements . . . . . . . . . . . . . . . . . . 95
3.5.3.7 Monte Carlo measurements . . . . . . . . . . . . . . 98
3.5.3.8 Monte Carlo Simulator . . . . . . . . . . . . . . . . . 98
3.6 Simulation Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
3.6.1 QDataSet form . . . . . . . . . . . . . . . . . . . . . . . . . . 99
3.6.2 QDataSet parameters . . . . . . . . . . . . . . . . . . . . . . . 99
3.6.3 Greybox Algorithms . . . . . . . . . . . . . . . . . . . . . . . 100
3.6.4 Datasets and naming convention . . . . . . . . . . . . . . . . . 100
3.7 Technical Validation . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
3.7.1 Distortion analysis . . . . . . . . . . . . . . . . . . . . . . . . 102
3.7.2 Comparison with Qutip . . . . . . . . . . . . . . . . . . . . . 102
3.8 Usage Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
3.8.1 QDataSet Control Sets . . . . . . . . . . . . . . . . . . . . . . 105
3.9 Machine learning using the QDataSet . . . . . . . . . . . . . . . . . . 106
3.9.1 Benchmarking . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
3.9.2 Benchmarking by learning protocol . . . . . . . . . . . . . . . 107
3.9.3 Benchmarking by objectives and architecture . . . . . . . . . . 108
3.10 Example applications of the QDataSet . . . . . . . . . . . . . . . . . 109
3.10.1 Quantum state tomography . . . . . . . . . . . . . . . . . . . 109
3.10.2 Quantum noise spectroscopy . . . . . . . . . . . . . . . . . . . 112
3.10.3 Quantum control and circuit synthesis . . . . . . . . . . . . . 113
3.11 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114
3.12 Code availability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116
3.13 Figures & Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116
3.13.1 Monte Carlo algorithm . . . . . . . . . . . . . . . . . . . . . . 120

4 Quantum Geometric Machine Learning 127


4.1 Abstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
4.2 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128
4.3 Preliminaries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
4.3.1 Problem description . . . . . . . . . . . . . . . . . . . . . . . 129

4.3.2 New contributions . . . . . . . . . . . . . . . . . . . . . . . . 129


4.3.3 Structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130
4.4 Quantum control and geometry . . . . . . . . . . . . . . . . . . . . . 131
4.4.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131
4.4.2 Quantum control formalism . . . . . . . . . . . . . . . . . . . 132
4.4.2.1 Control formulations . . . . . . . . . . . . . . . . . . 132
4.4.2.2 Path-length and Lie groups . . . . . . . . . . . . . . 135
4.4.2.3 Accessible controls and drift Hamiltonians . . . . . . 137
4.4.2.4 Geometric optimisation . . . . . . . . . . . . . . . . 139
4.5 SubRiemannian quantum circuit synthesis . . . . . . . . . . . . . . . 140
4.5.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 140
4.5.2 SubRiemannian Normal Geodesics . . . . . . . . . . . . . . . 142
4.5.3 One- and two-body terms . . . . . . . . . . . . . . . . . . . . 144
4.5.4 Generating geodesics . . . . . . . . . . . . . . . . . . . . . . . 145
4.6 Experimental Design . . . . . . . . . . . . . . . . . . . . . . . . . . . 148
4.6.1 Experimental objectives . . . . . . . . . . . . . . . . . . . . . 149
4.6.2 Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 150
4.6.2.1 Geodesic deep learning architectures . . . . . . . . . 150
4.6.2.2 Methods . . . . . . . . . . . . . . . . . . . . . . . . . 151
4.6.2.3 Geodesic architectures: Fully-connected Greybox . . 154
4.6.2.4 Geodesic architectures: GRU RNN Greybox . . . . . 156
4.6.2.5 Geodesic architectures: SubRiemannian model . . . . 157
4.6.2.6 Geodesic architectures: GRU & Fully-connected (original) model . . . . . . . . . . . . . . 159
4.7 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 161
4.7.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 161
4.7.2 Tables and charts . . . . . . . . . . . . . . . . . . . . . . . . . 162
4.8 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 163
4.8.1 Geodesic approximation performance . . . . . . . . . . . . . . 163
4.8.2 Greybox improvements . . . . . . . . . . . . . . . . . . . . . . 163
4.8.3 Segment and scale dependence . . . . . . . . . . . . . . . . . . 166
4.8.4 Generalisation . . . . . . . . . . . . . . . . . . . . . . . . . . . 167
4.9 Future directions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 171
4.10 Algorithmic architectures . . . . . . . . . . . . . . . . . . . . . . . . . 172
4.10.1 Fully-connected Greybox model . . . . . . . . . . . . . . . . . 172
4.10.2 GRU RNN Greybox model, parameters θ = (w, b) . . . . . . . 173
4.10.3 SubRiemannian model . . . . . . . . . . . . . . . . . . . . . . 173
4.10.4 Simulation Design . . . . . . . . . . . . . . . . . . . . . . . . . 174
4.10.5 Generalisation: worked example . . . . . . . . . . . . . . . . . 174

4.11 Differential geometry and Lie groups . . . . . . . . . . . . . . . . . . 176


4.11.1 Generating subspaces for geodesics . . . . . . . . . . . . . . . 176
4.11.2 Product Operator Basis . . . . . . . . . . . . . . . . . . . . . 176
4.11.3 One- and two-body operators . . . . . . . . . . . . . . . . . . 177
4.11.4 Nielsen’s approach . . . . . . . . . . . . . . . . . . . . . . . . 178
4.12 Comparing geodesic approximations . . . . . . . . . . . . . . . . . . . 182
4.13 Neural network and GRU architectures . . . . . . . . . . . . . . . . . 185
4.13.1 Feed-forward neural networks . . . . . . . . . . . . . . . . . . 185
4.13.2 LSTMs and GRUs . . . . . . . . . . . . . . . . . . . . . . . . 186

5 Global Cartan Decompositions for KP problems 189


5.1 Abstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 189
5.2 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 189
5.3 Symmetric spaces and KP time-optimal control . . . . . . . . . . . . 191
5.3.1 Overview - setting the scene . . . . . . . . . . . . . . . . . . . 191
5.3.2 KP problems for Lambda systems . . . . . . . . . . . . . . . . 192
5.3.3 KAK decompositions . . . . . . . . . . . . . . . . . . . . . . . 193
5.3.4 Sketch of constant-θ method . . . . . . . . . . . . . . . . . . . 194
5.3.5 Holonomy targets . . . . . . . . . . . . . . . . . . . . . . . . . 196
5.3.6 Symmetric space controls in p . . . . . . . . . . . . . . . . . . 197
5.3.7 Cartan decomposition . . . . . . . . . . . . . . . . . . . . . . 198
5.4 Time-optimal control examples . . . . . . . . . . . . . . . . . . . . . 200
5.4.1 SU(2) time-optimal control . . . . . . . . . . . . . . . . . . . . 200
5.4.2 SU (3)/S(U (1) × U (2)) time-optimal control . . . . . . . . . . 206
5.4.2.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . 206
5.4.2.2 General form of target unitaries in SU (3) . . . . . . 210
5.4.2.3 Comparison with existing methods . . . . . . . . . . 213
5.4.2.4 Constant-θ method . . . . . . . . . . . . . . . . . . . 215
5.5 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 217
5.6 Generalised constant-θ method . . . . . . . . . . . . . . . . . . . . . 218
5.7 Minimal connections and holonomy groups . . . . . . . . . . . . . . . 226
5.8 Cayley transforms and Dynkin diagrams . . . . . . . . . . . . . . . . 228
5.8.1 Hamiltonian . . . . . . . . . . . . . . . . . . . . . . . . . . . . 231

A Appendix (Quantum Information Processing) 233


A.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 233
A.1.1 State space . . . . . . . . . . . . . . . . . . . . . . . . . . . . 234
A.1.1.1 Tensor product states . . . . . . . . . . . . . . . . . 239
A.1.1.2 Quantum State formalism . . . . . . . . . . . . . . . 241
A.1.2 Operators and evolution . . . . . . . . . . . . . . . . . . . . . 242

A.1.2.1 Operator formalism . . . . . . . . . . . . . . . . . . 242


A.1.3 Density operators and multi-state systems . . . . . . . . . . . 247
A.1.3.1 Quantum channels . . . . . . . . . . . . . . . . . . . 250
A.1.4 Quantum evolution . . . . . . . . . . . . . . . . . . . . . . . . 252
A.1.4.1 Two pictures of quantum evolution . . . . . . . . . . 253
A.1.5 Hamiltonian formalism . . . . . . . . . . . . . . . . . . . . . . 254
A.1.5.1 Time-independent approximations . . . . . . . . . . 255
A.1.6 Measurement . . . . . . . . . . . . . . . . . . . . . . . . . . . 256
A.1.6.1 POVMs and Kraus Operators . . . . . . . . . . . . . 258
A.1.6.2 Composite system measurement . . . . . . . . . . . . 259
A.1.6.3 Informational completeness and POVMs . . . . . . . 260
A.1.6.4 Expectation evolution . . . . . . . . . . . . . . . . . 261
A.1.7 Quantum entanglement . . . . . . . . . . . . . . . . . . . . . . 262
A.1.8 Quantum metrics . . . . . . . . . . . . . . . . . . . . . . . . . 263
A.1.8.1 State discrimination . . . . . . . . . . . . . . . . . . 264
A.1.8.2 Fidelity function . . . . . . . . . . . . . . . . . . . . 265
A.2 Quantum Control . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 266
A.2.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 266
A.2.2 Evolution, Hamiltonians and control . . . . . . . . . . . . . . 267
A.2.3 Control systems and strategies . . . . . . . . . . . . . . . . . . 269
A.3 Open quantum systems . . . . . . . . . . . . . . . . . . . . . . . . . . 270
A.3.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 270
A.3.2 Noise and quantum evolution . . . . . . . . . . . . . . . . . . 272
A.3.3 Noise and decoherence . . . . . . . . . . . . . . . . . . . . . . 272
A.3.4 Noise and spectral density . . . . . . . . . . . . . . . . . . . . 274
A.4 Analysis, Measure Theory & Probability . . . . . . . . . . . . . . . . 275
A.4.1 Analysis and Measure Theory . . . . . . . . . . . . . . . . . . 275
A.4.2 Probability measure . . . . . . . . . . . . . . . . . . . . . . . 277
A.4.3 Haar measure . . . . . . . . . . . . . . . . . . . . . . . . . . . 278

B Appendix (Algebra) 281


B.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 281
B.1.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 281
B.2 Lie theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 282
B.2.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 282
B.2.2 Lie groups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 283
B.2.3 Matrix Lie groups . . . . . . . . . . . . . . . . . . . . . . . . . 284
B.2.4 Lie algebras . . . . . . . . . . . . . . . . . . . . . . . . . . . . 286
B.2.5 Killing form . . . . . . . . . . . . . . . . . . . . . . . . . . . . 289

B.2.6 Matrix exponential . . . . . . . . . . . . . . . . . . . . . . . . 290


B.2.7 Lie algebra of matrix Lie group . . . . . . . . . . . . . . . . . 291
B.2.8 Homomorphisms . . . . . . . . . . . . . . . . . . . . . . . . . 293
B.2.9 Baker-Campbell-Hausdorff theorem . . . . . . . . . . . . . . . 294
B.3 Representation theory . . . . . . . . . . . . . . . . . . . . . . . . . . 294
B.3.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 294
B.3.2 Representations . . . . . . . . . . . . . . . . . . . . . . . . . . 294
B.3.3 Complex Semi-simple Lie Algebras . . . . . . . . . . . . . . . 295
B.3.4 Adjoint action and commutation relations . . . . . . . . . . . 296
B.3.5 Adjoint expansions . . . . . . . . . . . . . . . . . . . . . . . . 296
B.3.6 Real and complex forms . . . . . . . . . . . . . . . . . . . . . 298
B.4 Cartan algebras and Root-systems . . . . . . . . . . . . . . . . . . . . 298
B.4.1 Complex Semi-simple Lie Algebras . . . . . . . . . . . . . . . 298
B.4.2 Roots . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 299
B.4.3 Cartan subalgebras . . . . . . . . . . . . . . . . . . . . . . . . 300
B.4.4 Root system properties . . . . . . . . . . . . . . . . . . . . . . 301
B.4.5 Abstract root systems . . . . . . . . . . . . . . . . . . . . . . 302
B.4.6 Reduced abstract root systems . . . . . . . . . . . . . . . . . . 302
B.4.7 Ordering of root systems . . . . . . . . . . . . . . . . . . . . . 303
B.4.8 Cartan matrices . . . . . . . . . . . . . . . . . . . . . . . . . . 303
B.5 Cartan decompositions . . . . . . . . . . . . . . . . . . . . . . . . . . 305
B.5.1 Compact Real Forms . . . . . . . . . . . . . . . . . . . . . . . 305
B.5.2 Compact and non-compact subalgebras . . . . . . . . . . . . . 307
B.5.3 Cayley transforms . . . . . . . . . . . . . . . . . . . . . . . . . 308

C Appendix (Differential Geometry) 309


C.1 Differential geometry . . . . . . . . . . . . . . . . . . . . . . . . . . . 309
C.1.1 Manifolds and charts . . . . . . . . . . . . . . . . . . . . . . . 309
C.1.1.1 Tangent spaces . . . . . . . . . . . . . . . . . . . . . 311
C.1.1.2 Push-forwards . . . . . . . . . . . . . . . . . . . . . 314
C.1.2 Tangent spaces and derivations . . . . . . . . . . . . . . . . . 315
C.1.2.1 Vector fields . . . . . . . . . . . . . . . . . . . . . . . 316
C.1.2.2 Vector fields and commutators . . . . . . . . . . . . 318
C.1.2.3 Integral curves and Hamiltonian flow . . . . . . . . . 319
C.1.2.4 Local flows . . . . . . . . . . . . . . . . . . . . . . . 320
C.1.3 Cotangent vectors and dual spaces . . . . . . . . . . . . . . . 321
C.1.3.1 Dual spaces . . . . . . . . . . . . . . . . . . . . . . . 321
C.1.3.2 One forms . . . . . . . . . . . . . . . . . . . . . . . . 323
C.1.3.3 Pullbacks of one-forms . . . . . . . . . . . . . . . . . 325

C.1.3.4 Lie derivatives and pullbacks . . . . . . . . . . . . . 326


C.1.4 General tensors and n-forms . . . . . . . . . . . . . . . . . . . 328
C.1.4.1 Metric tensor . . . . . . . . . . . . . . . . . . . . . . 331
C.1.4.2 n-forms and exterior products . . . . . . . . . . . . . 332
C.1.5 Tangent planes and Lie algebras . . . . . . . . . . . . . . . . . 334
C.1.5.1 Left and right translation . . . . . . . . . . . . . . . 334
C.1.5.2 Right and left invariance and Schrödinger’s equation 336
C.1.5.3 Exponentials, integral curves and tangent spaces . . 337
C.1.5.4 Maurer-Cartan Form . . . . . . . . . . . . . . . . . . 338
C.1.5.5 Infinitesimal transformations and adjoint action . . . 340
C.1.6 Fibre bundles . . . . . . . . . . . . . . . . . . . . . . . . . . . 341
C.1.6.1 Principal fibre bundles . . . . . . . . . . . . . . . . . 343
C.1.6.2 Associated bundles . . . . . . . . . . . . . . . . . . . 344
C.1.7 Connections . . . . . . . . . . . . . . . . . . . . . . . . . . . . 346
C.1.7.1 Relation to Cartan decompositions . . . . . . . . . . 350
C.1.7.2 Fixed points and orbits . . . . . . . . . . . . . . . . 351
C.1.8 Parallel transport and horizontal lifts . . . . . . . . . . . . . . 353
C.1.9 Covariant differentiation . . . . . . . . . . . . . . . . . . . . . 355
C.1.10 Geodesics and parallelism . . . . . . . . . . . . . . . . . . . . 357
C.2 Riemannian manifolds . . . . . . . . . . . . . . . . . . . . . . . . . . 358
C.2.1 Riemannian Manifolds and Metrics . . . . . . . . . . . . . . . 360
C.2.2 Fundamental forms . . . . . . . . . . . . . . . . . . . . . . . . 363
C.2.3 Curvature and forms . . . . . . . . . . . . . . . . . . . . . . . 365
C.2.3.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . 365
C.2.3.2 Sectional curvature . . . . . . . . . . . . . . . . . . . 366
C.3 Symmetric spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 367
C.3.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 367
C.3.2 Classification of symmetric spaces . . . . . . . . . . . . . . . . 369
C.4 SubRiemannian geometry . . . . . . . . . . . . . . . . . . . . . . . . 370
C.4.1 SubRiemannian Geodesics . . . . . . . . . . . . . . . . . . . . 371
C.5 Geometric control theory . . . . . . . . . . . . . . . . . . . . . . . . . 373
C.5.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 373
C.5.2 Geometric control preliminaries . . . . . . . . . . . . . . . . . 373
C.5.3 Time optimal control . . . . . . . . . . . . . . . . . . . . . . . 374
C.5.4 Variational methods . . . . . . . . . . . . . . . . . . . . . . . 376
C.5.4.1 Pontryagin Maximum Principle . . . . . . . . . . . . 376
C.5.4.2 PMP and quantum control . . . . . . . . . . . . . . 378
C.5.5 Time optimal problems with Lie groups . . . . . . . . . . . . . 381
C.5.5.1 Dynamical Lie algebras . . . . . . . . . . . . . . . . 381

C.5.5.2 Symplectic manifolds and Hamiltonian flow . . . . . 382


C.5.5.3 Geodesics and Hamiltonian Flow . . . . . . . . . . . 385
C.5.6 KP Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . 386
C.5.7 SubRiemannian control and symmetric spaces . . . . . . . . . 386

D Appendix (Quantum Machine Learning) 389


D.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 391
D.2 The Nature of Learning . . . . . . . . . . . . . . . . . . . . . . . . . 392
D.2.1 Taxonomies of learning . . . . . . . . . . . . . . . . . . . . . . 392
D.3 Statistical Learning Theory . . . . . . . . . . . . . . . . . . . . . . . 393
D.3.1 Classical statistical learning theory . . . . . . . . . . . . . . . 393
D.3.1.1 Empirical risk . . . . . . . . . . . . . . . . . . . . . . 395
D.3.1.2 Common Loss Functions . . . . . . . . . . . . . . . . 396
D.3.1.3 Model complexity and tradeoffs . . . . . . . . . . . . 396
D.3.2 Reducing empirical risk . . . . . . . . . . . . . . . . . . . . . . 397
D.3.3 No-Free Lunch Theorems . . . . . . . . . . . . . . . . . . . . . 398
D.3.4 Statistical performance measures . . . . . . . . . . . . . . . . 398
D.4 Deep Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 399
D.4.1 Linear models . . . . . . . . . . . . . . . . . . . . . . . . . . . 400
D.4.2 Neural networks . . . . . . . . . . . . . . . . . . . . . . . . . . 402
D.4.2.1 Neural network components . . . . . . . . . . . . . . 403
D.4.2.2 Layers in neural networks . . . . . . . . . . . . . . . 404
D.4.2.3 Neural network schema . . . . . . . . . . . . . . . . 405
D.5 Optimisation methods . . . . . . . . . . . . . . . . . . . . . . . . . . 406
D.5.1 Optimisation and Gradient Descent . . . . . . . . . . . . . . . 406
D.5.2 Backpropagation . . . . . . . . . . . . . . . . . . . . . . . . . 407
D.5.3 Natural gradients . . . . . . . . . . . . . . . . . . . . . . . . . 410
D.5.4 Regularisation and Hyperparameter tuning . . . . . . . . . . . 411
D.6 Quantum Machine Learning . . . . . . . . . . . . . . . . . . . . . . . 412
D.6.1 Neural networks and quantum systems . . . . . . . . . . . . . 413
D.6.2 Quantum statistical learning . . . . . . . . . . . . . . . . . . . 414
D.6.3 Quantum measurement and machine learning . . . . . . . . . 416
D.6.4 Barren Plateaus . . . . . . . . . . . . . . . . . . . . . . . . . . 417
D.6.5 Encoding data in quantum systems . . . . . . . . . . . . . . . 418
D.7 Variational quantum circuits . . . . . . . . . . . . . . . . . . . . . . . 420
D.7.1 QML Optimisation . . . . . . . . . . . . . . . . . . . . . . . . 423
D.7.2 QML Gradients . . . . . . . . . . . . . . . . . . . . . . . . . . 424
D.8 Symmetry-based QML . . . . . . . . . . . . . . . . . . . . . . . . . . 426
D.8.1 QML Optimal Control . . . . . . . . . . . . . . . . . . . . . . 427

D.8.2 Geometric information theory . . . . . . . . . . . . . . . . . . 427


D.8.3 Geometric QML . . . . . . . . . . . . . . . . . . . . . . . . . . 428
D.9 Greybox machine learning . . . . . . . . . . . . . . . . . . . . . . . . 429

Bibliography 433
List of Figures

1.1 Venn diagram describing overlap of quantum information processing
and control theory (Quantum Control Theory), Geometry and
Machine Learning (ML). QGML sits at the intersection of quantum
computing and control theory, geometry and machine learning. Re-
lated fields, including quantum geometric control (QGC), geometric
machine learning (GML) and quantum machine learning (QML) can
be situated accordingly. The most overlap between existing literature
and the present work on QGML is geometric quantum machine learn-
ing (GQML) which seeks to encode symmetries across entire quantum
machine learning networks. . . . . . . . . . . . . . . . . . . . . . . . . 10

3.1 Plot of an undistorted (orange) pulse sequence against a related distorted
(blue) pulse sequence for the single-qubit Gaussian pulse dataset
with x-axis control (‘G 1q X’) over the course of the experimental
runtime. Here f (t) is the functional (Gaussian) form of the pulse
sequence for time-steps t. These plots were used in the first step of
the verification process for QDataSet. The shift in pulse sequence
is consistent with expected effects of distortion filters. The pulse se-
quences for each dataset can be found in simulation parameters =⇒
dynamic operators =⇒ pulses (undistorted) or distorted pulses for
the distorted case (see Table (3.2) for a description of the dataset
characteristics). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120

3.2 The frequency response (left) and the phase response (right) of the
filter that is used to simulate distortions of the control pulses. The
frequency is in units of Hz, and the phase response is in units of rad. 123


3.3 Plot of average observable (measurement) value for all observables
(index indicates each observable in order of Pauli measurements) for
all measurement outcomes for samples drawn from dataset G 1q X
(using TensorFlow ‘tf’, orange line) against the same mean for equiv-
alent simulations in Qutip (blue line - not shown due to identical
overlap) for a single dataset. Each dataset was sampled and compar-
ison against Qutip was undertaken with equivalent results. The error
between means was of order 10⁻⁶, i.e. they were effectively identical
(so the blue line is not shown). . . . . . . . . . . . . . . . . . . . . . 124

3.4 An example of a quantum state rotation on the Bloch sphere. The
|0⟩ , |1⟩ indicate the σz -axis, the X and Y the σx and σy axes respectively.
In (a), the vector is residing in a +1 σx eigenstate. By rotating
about the σz axis by π/4, the vector is rotated to the right, to the +1
σy eigenstate. A rotation about the σz axis by angle θ is equivalent
to the application of the unitary U (θ) = exp(−iθσz /2). . . . . . . . . 124

4.1 Sketch of geodesic path. The evolution of quantum states is represented
by the evolution according to Schrödinger’s equation of unitary
propagators U as curves (black line) on a manifold U ∈ G generated
by generators (tangent vectors) (blue) in the time-dependent case
(4.4.2). For the time-independent case, the geodesic is approximated
by evolution of discrete unitaries for time ∆t, represented by red
curves (shown as linear for ease of comprehension). Here Uti repre-
sents the evolved unitary at time ti . . . . . . . . . . . . . . . . . . . . 134

4.2 Schema of Fully-Connected Greybox model: (a) realised UT inputs
(flattened) into a stack of feed-forward fully connected layers with
ReLU activations and dropout of 0.2; (b) the final dense layer in this
stack outputs a sequence of controls (ĉj ) using tanh activation functions;
(c) these are fed into a custom Hamiltonian estimation layer to
produce a sequence of Hamiltonians (Ĥj ) using ∆; (d) these in turn
are fed into a custom quantum evolution layer implementing the time-
independent Schrödinger equation to produce estimated sequences of
subunitaries (Ûj ) which are fed into (e) a final fidelity layer for comparison
with the true (Uj ). Intermediate outputs are accessible via
submodels in TensorFlow. . . . . . . . . . . . . . . . . . . . . . . . . 156

4.3 Schema of GRU RNN Greybox model: (a) realised UT inputs (flat-
tened) into a GRU RNN layer comprising GRU cells in which each
segment j plays the role of the time parameter; (b) the output of the
GRU layer is a sequence of control pulses (ĉj ) using tanh activation
functions; (c) these are fed into a custom Hamiltonian estimation
layer to produce a sequence of Hamiltonians (Ĥj ) by applying the
control amplitudes to ∆; (d) the Hamiltonian sequence is fed into a
custom quantum evolution layer implementing the time-independent
Schrödinger equation to produce estimated sequences of subunitaries
(Ûj ) which are fed into (e) a final fidelity layer for comparison with
the true (Uj ). Intermediate outputs are accessible via submodels in
TensorFlow. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 158

4.4 Schema of SubRiemannian model: (a) realised UT inputs (flattened)
into a set of feed-forward fully-connected dense layers (with dropout
∼ 0.2); (b) two layers (red) output sets of control amplitudes for
estimating the positive (ĉ+Λ0 ) and negative (ĉ−Λ0 ) control amplitudes using
tanh activation functions; (c) these are fed into two custom Hamiltonian
estimation layers to produce the positive Λ̂+0 and negative Λ̂−0
Hamiltonians for Λ0 using ∆ or su(2n ) that are combined into a single
Hamiltonian estimate Λ̂0 ; (d) Λ̂0 is fed into a custom subRiemannian
layer which generates the control amplitudes (ĉj ), Hamiltonians
(Ĥj ) and then implements the time-independent Schrödinger equation
to produce estimated sequences of subunitaries (Ûj ) which are
fed into (e) a final fidelity layer for comparison with the true (Uj ).
Intermediate outputs (a) to (d) are accessible via submodels in TensorFlow.
The SubRiemannian model resulted in average gate fidelity
when learning representations of (Uj ) of over 0.99 on training and
validation sets in comparison to existing GRU & FC Blackbox models
which recorded average gate fidelities of ≈ 0.70, demonstrating the
utility of greybox machine learning models in learning to synthesise
unitary sequences. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 160

4.5 Training and validation loss (MSE) for SU(2). . . . . . . . . . . . . . 163

4.6 Training and validation loss (MSE) for SU(4). . . . . . . . . . . . . . 163



4.7 Training and validation loss (MSE). Comparison of MSE at different
time intervals for the SubRiemannian model. h = 0.1, 0.5 and 1.
G = SU (2), ntrain = 1000, nseg = 10, epochs= 500, Λ0 ∈ su(2n ): This
plot shows the differences in MSE on training and validation sets as
the time-step h = ∆t varies from 0.1 to 1. As can be seen, larger h
leads to deterioration in performance (higher MSE). However, smaller
h can lead to insufficiently long geodesics, leading to a deterioration
in generalisation. Setting h = 0.1 (red curves) exhibits the best over-
all performance. Even a smaller jump up to h = 0.5 (blue curves)
exhibits an increase in MSE and decrease in performance by several
orders of magnitude (and similarly for h = 1). . . . . . . . . . . . . . 164

4.8 Training and validation loss (MSE). Comparison of SubRiemannian,
FC Greybox and GRU RNN Greybox models. G = SU (8), ntrain =
1000, nseg = 10, h = 0.1, epochs= 500, Λ0 ∈ su(2n ): For U ∈ SU (8),
we see (main plot - first 100 epochs) that the GRU RNN Greybox
(blue line) performs best in terms of batch fidelity MSE on training
and validation sets. As shown in the inset, the GRU RNN Greybox
levels out (saturates) after about 100 epochs and overall performed
the best of each of the models and rendered average operator fidelities
of around 0.998. The SubRiemannian model (red) performed less-well
than the GRU RNN, still recording high average operator fidelity
but exhibiting overfitting as can be seen by the divergence of the
validation (dashed) curve from the training (smooth) curve. The FC
Greybox rapidly saturates for large ntrain and exhibits little in the way
of learning. All models render high average operator fidelity > 0.99
and saturate after around 150 epochs (see inset). . . . . . . . . . . . . 165

4.9 Training and validation loss (MSE): GRU RNN Greybox. G = SU (2), ntrain =
1000, nseg = 100, h = 0.1, epochs= 500. This plot shows the MSE loss
(for training and validation sets) for the GRU RNN Greybox model
where the number of segments was increased from 10 to 100. As can
be seen, the model saturates rapidly once segments are increased to
100 and exhibits no significant learning. Similar results were found for
the SubRiemannian model. This result suggests that simply chang-
ing the number of segments is insufficient for model improvement.
One solution to this problem may be to introduce variable or adap-
tive hyperparameter tuning into the model such that the number of
segments varies dynamically. . . . . . . . . . . . . . . . . . . . . . . . 166

4.10 Scale h dependence (SubRiemannian model). G = SU (2), ntrain =
1000, nseg = 10, epochs= 500. Plot demonstrates increase in batch
fidelity MSE as scale h (∆t) increases from 0.1 to 1, indicative of
dependence of learning performance on time-interval over which sub-
unitaries Uj are evolved. . . . . . . . . . . . . . . . . . . . . . . . . . 167

4.11 Generalisation (SubRiemannian model). G = SU (2), ntrain = 1000, nseg =
10, epochs= 500, Λ0 ∈ su(2n ). Plot of generalised gate fidelity
F (ÛT , ŨT ) of randomly generated ŨT with the reconstructed estimate
ÛT , versus the average operator fidelity of randomly generated
UT with training {UT }train inputs to the model. The upward trend
indicates an increase in operator fidelity as similarity (Pearson coefficient
of 0.52 to 95% significance) of UT to training {UT }train increases.
Colour gradient indicates low fidelity (blue) to high fidelity (red). . . 168

4.12 Generalisation (SubRiemannian model). G = SU (2), ntrain = 1000, nseg =
10, epochs= 500, Λ0 ∈ ∆. Plot of generalised gate fidelity F (ÛT , ŨT )
of random-angle θ ∈ [−2π, π] z-rotations against generated ŨT with
the reconstructed estimate ÛT , versus the average operator fidelity
of randomly generated UT with training {UT }train inputs to the
model. Here there is no statistically significant correlation between
UT and training set {UT }train , though higher test fidelities are evident
for UT bearing both high and low similarity to the training set (less
dependence on similarity to training set for high fidelities). . . . . . . 169

4.13 Generalisation (SubRiemannian model). G = SU (8), ntrain = 1000, nseg =
10, epochs= 500, Λ0 ∈ su(2n ). Plot of generalised gate fidelity
F (ÛT , ŨT ) versus the average operator fidelity against the training
set {UT }train . Generalisation was significantly worse for SU(8);
however, correlation of generalised gate fidelity with similarity of UT
to the training set is evident. . . . . . . . . . . . . . . . . . . . . . . 170

4.14 Generalisation (SubRiemannian model). G = SU (2), ntrain = 1000, nseg =
10, epochs= 500, Λ0 ∈ ∆. Plot of F (ÛT , ŨT ) of random-angle θ ∈
[−2π, π] z-rotations θ. As evidenced by the red high fidelities across
the range [−2π, π], the SubRiemannian model trained on data where
Λ0 ∈ ∆ and ∆ = {X, Y } in certain cases does generalise relatively well. . . . 171


5.1 Cayley transform of −iHIII expressed as a rotation into −iλ5 . The
presence of the imaginary unit relates to the compactness here of g
which reflects the boundedness and closed nature of the transforma-
tion characteristic of unitary transformations. . . . . . . . . . . . . . 228


5.2 Cayley transform of −iHIII expressed as a rotation into λ5 . By con-
trast with the case to the left, the absence of the imaginary unit is
indicative of non-compactness such that distances are not preserved
(unlike in the unitary case where −i is present). . . . . . . . . . . . . 228
5.3 A transition diagram showing the relationship between energy tran-
sitions and roots. In a quantum control context, a transition between
two energy levels of an atom can be described by a root vector
in the Hamiltonian. For example, a transition from |0⟩ to |1⟩ can
be described using the root vector eα . An electromagnetic pulse with
a frequency resonant with the energy difference between these two
levels can, if applied correctly, transition the system consistent with
the action of eα . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 230
5.4 Symmetric root system diagram for the root system described via
roots in equation (5.8.3) for the Lie algebra su(3). The roots α, β, γ
can be seen in terms of angles between the root vectors and can be
calculated using the Cartan matrix. . . . . . . . . . . . . . . . . . . . 230
5.5 Combined diagram of a Dynkin diagram and a symmetric root system
with specified angles and relations. . . . . . . . . . . . . . . . . . . . 231

A.1 Commutative diagram showing the universal property of tensor products
discussed above. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 240

B.1 Expanded Dynkin diagram of type An with labeled vertices and edges.
The numbers above the nodes indicate the length of the roots rela-
tive to each other. Aij Aji determines the number of lines (or the type
of connection) between vertices i and j. This product can be 0 (no
connection), 1 (single line), 2 (double line), or 3 (triple line), rep-
resenting the angle between the corresponding roots. Additionally,
when Aij Aji > 1, an arrow is drawn pointing from the longer root to
the shorter root. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 304

C.1 Commutative diagram showing the relationship of the connection
(map from G → P → M), the projection map π : P → M and
induced horizontal map (the pushforward) π∗ . . . . . . . . . . . . . 348

D.1 Schema of the first two layers of a fully-connected feed-forward neural
network (definition D.4.4) together with associated matrix and algebraic
representation. Here a_i^{(l)} are the input layer neurons, ω_{lj}^{(l)} are
the weights (absorbing bias terms) for neuron a_j^{(l)} (diagram adapted
from [1]). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 407

D.2 Manifold mapping between parameter manifold K and its fibre bun-
dle Rm and target manifold M with associated fibre bundle spaces
R2n . Optimisation across the parameter manifold can be construed
in certain cases as an embedding of the parameter manifold within
the target (label) manifold M, where such embeddings may often be
complex and give rise to non-standard topologies. . . . . . . . . . . . 423
List of Tables

2.1 Quantum and classical machine learning table. Quantum machine
learning covers four quadrants (listed in ‘QML Division’) which differ
depending on whether the inputs, outputs or process is classical or
quantum. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50

3.1 Taxonomy of large-scale datasets which can guide the generation of
QML datasets and development of QML dataset taxonomies. . . . . . 68

3.2 QDataSet characteristics. The left column identifies each item in
the respective QDataSet examples (expressed as keys in the relevant
Python dictionary) while the description column describes each item. 117

3.3 QDataSet features for quantum state tomography. The left column
lists typical categories in a machine learning architecture. The right
column describes the corresponding feature(s) of the QDataSet that
would fall into such categories for the use of the QDataSet in training
quantum tomography algorithms. . . . . . . . . . . . . . . . . . . . . 118

3.4 QDataSet features for quantum noise spectroscopy. The left column
lists typical categories in a machine learning architecture. The right
column describes the corresponding feature(s) of the QDataSet that
would fall into such categories for the use of the QDataSet in training
quantum noise spectroscopy algorithms. . . . . . . . . . . . . . . . . 118

3.5 QDataSet features for quantum control. The left column lists typical
categories in a machine learning architecture. The right column de-
scribes the corresponding feature(s) of the QDataSet that would fall
into such categories for the use of the QDataSet in training quantum
control algorithms. The specifications are just one of a set of possible
ways of framing quantum control problems using machine learning. . 119


3.6 The general categorization of the provided datasets. The QDataSet
examples were generated from simulations of either one or two qubit
systems. For each one or two qubit simulation, the drift component
of the Hamiltonian was along a particular axis (the z-axis) for the
single-qubit case and the z-axis of the first qubit for the two-qubit
case (but not the second qubit) or vice versa. Controls were applied
along different axes, such as x- or y- axes. Finally, noise was similarly
added to different axes: the z-axis (and in some cases the x-axis) of
the single qubit case and the z-axis case of the first or second qubit
for the two-qubit case. . . . . . . . . . . . . . . . . . . . . . . . . . . 119

3.7 Dataset Parameters: T : total time, set to unity for standardisation;
M : the number of time-steps (discretisations); K: the number of
noise realisations; Ω: the energy gap for the single qubit case (where
subscripts 1 and 2 represent the energy gap for each qubit in the
two-qubit case); n: number of control pulses; Amax , Amin : maximum
and minimum amplitude; σ: standard deviation of pulse spacing (for
Gaussian pulses). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120

3.8 QDataSet File Description (Gaussian). The left column identifies
each dataset in the respective QDataSet examples while the descrip-
tion column describes the profile of the Gaussian pulse datasets in
terms of (i) number of qubits, (ii) axis of control and pulse wave-form
(iii) axis and type of noise and (iv) whether distortion is present or
absent. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121

3.9 QDataSet File Description (Square). The left column identifies each
dataset in the respective QDataSet examples while the description
column describes the profile of the square pulse datasets in terms of
(i) number of qubits, (ii) axis of control and pulse wave-form (iii) axis
and type of noise and (iv) whether distortion is present or absent. . 122

3.10 An example of the types of quantum data features which may be in-
cluded in a dedicated large-scale dataset for QML. The choice of such
features will depend on the particular objectives in question. We
include a range of quantum data in the QDataSet, including informa-
tion about quantum states, measurement operators and measurement
statistics, Hamiltonians and their corresponding gates, details of en-
vironmental noise and controls. . . . . . . . . . . . . . . . . . . . . . 123

4.1 Comparison table of batch fidelity MSE ((Uj ) and (Ûj )) for training
(MSE(T)) and validation (MSE(V)) sets along with average opera-
tor fidelity (and order of standard deviation in parentheses) for four
neural networks where Λ0 ∈ su(2n ): (a) GRU & FC Blackbox (origi-
nal) (b) FC Greybox, (c) SubRiemannian model and (d) GRU RNN
Greybox model. Parameters: h = 0.1, nseg = 10, ntrain = 1000; train-
ing/validation 75/25; optimizer: Adam, α ≈ 1e-3. Note*: MSE for the
GRU & FC Blackbox is the standard MSE comparing (Uj ) with (Ûj ). Sub-
Riemannian and GRU RNN Greybox models outperform blackbox
models on training and validation sets with lower MSE, higher aver-
age operator fidelity and lower variance. . . . . . . . . . . . . . . . . 162

4.2 Comparison table of batch fidelity MSE ((Uj ) v. (Ûj )) for train-
ing (MSE(T)) and validation (MSE(V)) sets along with average op-
erator fidelity (and order of standard deviation in parentheses) for
models where Λ0 ∈ ∆: (a) GRU & FC Blackbox (original) (b) Sub-
Riemannian model and (c) GRU RNN Greybox model. Parameters:
h = 0.1, nseg = 10, ntrain = 1000; training/validation 75/25; opti-
mizer: Adam, α ≈1e-3. Note*: MSE for GRU & FC Blackbox stan-
dard MSE comparing (Uj ) with Ûj . For this case, overall the GRU
RNN Greybox model performed slightly better than the SubRieman-
nian model, with both outperforming the GRU & FC Blackbox model.
The FC Greybox model was not tested given its inferior performance
overall. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 162

4.3 Table of control amplitudes for generation of z-rotation by angle
θ = −7.529e-01. At each time-interval k, controls cx,j are applied to X
generators and cy,j are applied to Y generators to form Hamiltonian
Hj , for time h = ∆t. . . . . . . . . . . . . . . . . . . . . . . . . . . . 175

4.4 Hamiltonian distance and unitary fidelity between Swaddle and Boozer
geodesic approximations. . . . . . . . . . . . . . . . . . . . . . . . . . 184

5.1 Commutation relations for generators in adjoint representation of
su(3). The Cartan decomposition is g = k⊕p where k = span{−iλ1 , −iλ2 , −iλ3 , −iHIII }
(red) and p = span{−iλ4 , −iλ5 , −iλ6 , −iλ7 } (black). As can be seen
visually, the decomposition satisfies the Cartan commutation rela-
tions (equation (5.3.6)): the yellow region indicates [k, k] ⊂ k, the
green region that [p, p] ⊂ k and the white region that [p, k] ⊂ p. From
a control perspective, by inspection it is clear that elements in k can
be synthesised (are reachable) via linear compositions of the adjoint
action of p upon itself (the green region) as a result of the fact that
[p, p] ⊆ k. We choose a = ⟨−iλ5 ⟩ with h = ⟨−iHIII , −iλ5 ⟩. . . . . . . . 208

C.1 Classification of Riemannian Globally Symmetric Spaces (adapted
from Helgason [2] as compiled in Wikipedia) . . . . . . . . . . . . . . 369

D.1 Quantum and classical machine learning table. Quantum machine
learning covers four quadrants which differ depending on whether the
inputs, outputs or process is classical or quantum. . . . . . . . . . . . 393
Chapter 1

Introduction

Abstract
The use of geometric and symmetry techniques in quantum and classical information
processing has a long tradition across the physical sciences as a means of theoretical
discovery and applied problem solving. In the modern era, the emergent combina-
tion of such geometric and symmetry-based methods with quantum machine learning
(QML) has provided a rich opportunity to contribute to solving a number of persis-
tent challenges in fields such as QML parametrisation, quantum control, quantum
unitary synthesis and quantum proof generation. In this thesis, we combine state-
of-the-art machine learning methods with techniques from differential geometry and
topology to address these challenges. We present a large-scale simulated dataset
of open quantum systems to facilitate the development of quantum machine learn-
ing as a field. We demonstrate the use of deep learning greybox machine learning
techniques for estimating approximate time-optimal unitary sequences as geodesics
on subRiemannian symmetric space manifolds. Finally, we present novel techniques
utilising Cartan decompositions and variational methods for analytically solving
quantum control problems for certain classes of Riemannian symmetric space.


Geometry will draw the soul toward truth and create the spirit of phi-
losophy (Plato)

There is geometry in the humming of the strings, there is music in the


spacing of the spheres (Pythagoras)

Spacetime tells matter how to move; matter tells spacetime how to curve
(John Wheeler)

One geometry cannot be more true than another; it can only be more
convenient (Henri Poincaré)

1.1 Overview
This thesis introduces quantum geometric machine learning (QGML), presenting
results in quantum control and quantum machine learning using techniques from
differential geometry and Lie theory. Our work, as we discuss below and throughout
this thesis, represents a synthesis of four distinct but related disciplines: quantum
information science (focusing specifically on quantum computing and quantum con-
trol), abstract algebra and representation theory, differential geometry and quantum
machine learning. By combining insights and methods from these fields, we have
sought to leverage theory and architectures to explore how hybrid quantum-classical
systems can simulate quantum systems, learn underlying geometric symmetries and
be used as tools for optimisation problems. Throughout this thesis, we have en-
deavoured to connect the relevance of these important concepts to quantum in-
formation processing, control and machine learning in a way that is accessible to
a cross-disciplinary audience. Synthesising techniques across disciplines can be a
challenging and at times unwieldy task, as drawing precise maps between concepts
means navigating a thicket of jargon, terms of art and disciplinary conventions that characterise
any academic field. To add clarity to the process, we therefore foreshadow below
how each discipline relates to the overall contributions of this thesis.

(i) Quantum information processing. The primary focus of this thesis is to address
problems in quantum information processing, specifically problems related to
simulating quantum systems and the control of quantum systems. Quantum
information processing is a vast discipline, incorporating quantum comput-
ing, quantum communication and quantum sensing. Our focus is on the first
of these, quantum computing, but we retain a general information-theoretic
framing given the overlap with information sciences through quantum machine
learning theory. The target of the work in this thesis is the construction of
operators and computations represented by unitary operations U (t) acting on
quantum states |ψ(t)⟩ belonging to a Hilbert space H. These operations evolve
the state over times t = 0, ..., T from some initial state |ψ(0)⟩ to a target
end-state at time T given by |ψ(T )⟩ = U (T ) |ψ(0)⟩, while meeting certain
optimality criteria, such as minimal time or energy. The particular states of
interest are mostly single- and multi-qubit systems (such that G = SU (2) or
a tensor product thereof), but our work also encompasses higher-dimensional
quantum systems (such as qutrits related to SU (3)).
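
To make this framing concrete, consider the following minimal Python sketch
(our illustration rather than code from the thesis; the choice of Hamiltonian
H = σx and time T = 1 is arbitrary), which evolves |ψ(0)⟩ = |0⟩ under
U (T ) = exp(−iHT ) and checks that the propagator is unitary:

```python
import numpy as np
from scipy.linalg import expm

# Pauli-X as an illustrative (arbitrary) choice of Hamiltonian H.
sigma_x = np.array([[0, 1], [1, 0]], dtype=complex)

T = 1.0                          # illustrative target time
U_T = expm(-1j * sigma_x * T)    # propagator U(T) = exp(-iHT) (hbar = 1)

psi_0 = np.array([1, 0], dtype=complex)  # initial state |psi(0)> = |0>
psi_T = U_T @ psi_0                      # |psi(T)> = U(T)|psi(0)>

# Unitarity check: U(T) U(T)^dagger = I, so state norms are preserved.
assert np.allclose(U_T @ U_T.conj().T, np.eye(2))
assert np.isclose(np.linalg.norm(psi_T), 1.0)
print(np.round(psi_T, 4))
```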

(ii) Algebra. Algebra, by which we primarily mean the theory of continuous Lie
groups and representation theory, informs this thesis in many ways, but its main
relevance is that, at least for closed quantum systems, the unitary operations
(the discovery of which is the primary objective above) form a Lie group G
whose corresponding Lie algebra g is the basis for constructing the quantum
mechanical Hamiltonians governing the evolution of quantum systems. To this
end, we explore the deep connections between algebra and symmetry reflected
in the work of Cartan and others via concepts such as Cartan decompositions,
abstract root systems and diagrammatic tools such as Dynkin diagrams.
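
As a small worked check of this group-algebra correspondence (a sketch of ours,
assuming the standard Pauli basis for su(2)), one can verify numerically that the
generators close under the commutator and that exponentiating an algebra element
yields an element of SU(2):

```python
import numpy as np
from scipy.linalg import expm

sigma_x = np.array([[0, 1], [1, 0]], dtype=complex)
sigma_y = np.array([[0, -1j], [1j, 0]], dtype=complex)
sigma_z = np.array([[1, 0], [0, -1]], dtype=complex)

# Anti-Hermitian basis of su(2): X_k = -i sigma_k / 2.
X, Y, Z = (-1j * s / 2 for s in (sigma_x, sigma_y, sigma_z))

comm = lambda a, b: a @ b - b @ a

# Closure under the bracket: [X, Y] = Z and cyclic permutations.
assert np.allclose(comm(X, Y), Z)
assert np.allclose(comm(Y, Z), X)
assert np.allclose(comm(Z, X), Y)

# Exponentiating an algebra element yields a group element: a traceless
# anti-Hermitian generator exponentiates to a unitary with det = 1.
U = expm(0.7 * X + 1.3 * Z)  # arbitrary real coefficients
assert np.allclose(U @ U.conj().T, np.eye(2))
assert np.isclose(np.linalg.det(U), 1.0)
```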

(iii) Geometry. Geometry enters the thesis as a way to frame problems of interest
such that we can leverage specific results in the theory of classical (and quan-
tum) geometric control theory. In geometric terms, the unfolding of a quantum
computation via the representation of unitary evolution can be construed as
the tracing out of a curve on a differentiable manifold M corresponding to the
Lie group G. The evolution of the curve is generated by vectors in the tangent space T M, which can be equated with the group generators in the Lie algebra g mentioned above. Time- or energy-optimal (minimal) curves correspond to geodesics, so
the search for time-optimal quantum circuits or evolution becomes a question
of using the machinery of differential geometry to find geodesics with minimal
arc-length ℓ. For certain classes of Lie group G, the symmetry properties of
those groups allow the corresponding manifold G/K (for a chosen isometry
group K) to be construed as a Riemannian (or subRiemannian) symmetric
space admitting a Cartan decomposition g = k ⊕ p. In such cases, results from
geometric control theory show that general optimality (Pontryagin Maximum
Principle) criteria can be met under assumptions about the Lie group (and
Lie algebra) being fully reachable under the operation of the Lie derivative
(commutator). Moreover, the symmetry properties of such groups (specifically
their partition into horizontal HM and vertical V M subspaces corresponding
to the subalgebras p and k respectively) together with the Lie triple property
[p, [p, p]] ⊆ p mean that we can make simplifying assumptions about (a) the existence of curves γ(t) between U(0) and U(T) on M and (b) the uniqueness (via being of minimal length) of such curves, such as when those curves are generated by Hamiltonians drawn from p only. In this case, the optimisation problem is
considerably simplified, becoming a question of finding the minimal (optimal)
(in terms of energy or time) control functions u(t) which apply to such genera-
tors to evolve curves. The final substantive Chapter 5 provides a new method
for calculating such time-optimal curves for use in quantum information.

(iv) Quantum machine learning. Quantum and classical machine learning enter the
thesis as a way to then solve for this specific problem of finding optimal control
functions u(t) by training hybrid classical-quantum algorithms on data such
as data about quantum geodesics. The quantum element of machine learning
enters via a hybrid quantum-classical stack whereby the rules of quantum evo-
lution (such as via Schrödinger’s equation and via Hamiltonians comprising
Lie algebra elements g) are encoded in the neural network stack. The neu-
ral network architecture then seeks to leverage classical machine learning to
learn control functions u(t) to achieve objectives of synthesising U (T ) (such
as via maximising fidelity among the model estimate Û (T ) and U (T ) itself).
We call this hybrid model a greybox model of quantum machine learning as
it incorporates a whitebox of both known facts (such as the laws of quantum
mechanical state evolution) and a blackbox in terms of unknown non-linear
parameters. We show the utility of this approach in Chapter 3, where we ev-
idence its benefits in constructing a large-scale quantum computing dataset
of specific relevance to researchers in both closed- and open-quantum systems
research and in Chapter 4, where we show how for certain classes of problem,
the greybox method is more effective at learning the geometric and geodesic
structure of time-optimal unitaries from data than other methods.

Our contribution to the cross-disciplinary pollination among these disciplines is


necessarily specific to our problems of interest. Nevertheless, we hope that this the-
sis makes a modest and useful contribution to joining the dots, as it were, between
disciplines and serves as a motivation for researchers within such disciplines to ex-
plore the rich tapestries of complementary methods available to them across the
academic aisles.

Historical background
Geometric methods have a long lineage in quantum mechanics and quantum infor-
mation. The historical development of late 19th and early 20th century physics was significantly informed by geometric techniques, from general relativity to quantum mechanics. Geometry informed much of 19th century mathematical physics, including Hamilton’s reformulation of Newtonian mechanics, Riemann’s work on differential geometry and Klein’s Erlangen program. As
we explore below, the work of Poincaré, Cartan, Weyl and others was significantly
influenced by revolutionary geometric ideas. In modern quantum field theory and
other disciplines, geometry has and continues to play a significant role. The synthe-
sis of geometry with other fields of mathematics, including group theory, analysis
and number theory has in turn opened up rich seams of insight via cross-disciplinary
pollination. What is perhaps not often emphasised in modern formalism is how the
early pioneers of quantum mechanics were profoundly influenced by the emerging
tools of non-Euclidean geometry and its relationship with symmetry (see [3] for a
detailed history). For context, we explore a bit more historical background below,
before proceeding to a summary of contributions of this thesis and a summary of
each Chapter and the supplementary Appendices.

Understanding the historical development of geometric methods is a useful aide-mémoire for situating the modern adoption of differential geometry within quantum information and machine learning contexts. The development of geometry and algebra, two of the four pillars of modern mathematics [4], has been a tandem one, as each discipline influenced the other through shared methodologies and conceptual
abstraction. Philosophically, the relationship between algebra and geometry has
exhibited a symbiotic waxing and waning throughout the development of mathe-
matics’ early formalism via the then-canonical ancient distinctions of the science
of magnitude (geometry) and the science of multitude (arithmetic) [5]. Tradition-
ally, geometric and algebraic (or what today would be regarded as proto-algebraic)
proofs or demonstrations took relatively distinct forms. For example, Euclid’s geometric proofs in the Elements [6] were characterised by demonstrative visual methods, while alternative proofs, such as the celebrated proof of the infinity of primes, relied upon explicit logical inference and deduction. Over successive
centuries, the distinctions between geometric and algebraic methods began to blur
as scholars uncovered significant connections between the fields. An example in-
cludes Galois’s famous relation of geometric constructions (in terms of canonical
ruler/compass structures) to symmetry properties of polynomials and groups. Sim-
ilarly, significant contributions by Gauss, Riemann, Cayley and others helped set
the scene for the emergence of later seminal results at the end of the 19th and start
of the 20th centuries and for the interplay of algebraic and geometric techniques
drawn upon by modern physics and information science. Of particular note, the
late nineteenth-century Erlangen Program of Felix Klein [7] mentioned above put
forward a systematic means of describing geometries in terms of their symmetry
properties, as described by the mathematics of groups. Klein’s innovative approach
unified diverse geometric concepts under a common framework that significantly
informed the subsequent development of mathematical fields and physical sciences.


In particular, the unification of geometry and group theory via the identification
of geometries with specific transformation groups originally developed by Sophus
Lie has had a profound impact on the development of not just mathematics, but
also classical and quantum physics. Lie’s idée fixe in describing the transformation
of geometric objects in terms of underlying symmetries formed a primary motiva-
tion for his later development of the theory of continuous transformation groups
(Lie groups). Together with other developments, Lie groups thus became an in-
dispensable fulcrum through which the mathematical pillars of geometry, analysis
(especially the study of continuity) and geometry were tied together with spectacu-
lar results across the mathematical sciences.

An example of an important discovery across disciplines [8] is Klein’s insight that non-Euclidean geometries could be recognised as examples of coset spaces G/H (for Lie groups G and subgroups H), an insight which initially seemed incompatible with the other central trend of nineteenth century geometry, the development of Riemannian geometry (owing to a focus on discrete algebraic structures in the former case and reliance upon smoothly varying metrics and continuous curvature in the latter case). Klein’s
results promoted a synthesis of algebraic, Lie theoretic, and geometric methods
within the then-emerging field of representation theory, providing a powerful frame-
work for analysing structural symmetries of mathematical and physical systems.
As Knapp [9] notes, the development of representation theory and Lie theoretic-
methods has taken diverse approaches within the literature. Some tomes have em-
phasised primarily the algebraic and group-theoretic basis of Cartan decompositions
and optimisation. Others, such as the work of Helgason [2, 10], have followed more
closely Cartan’s genesis, an approach that elucidates the concurrent development
of differential-geometric principles and Lie theory together. Of particular note for this thesis is the oeuvre of Cartan, specifically the unification of geometry, symmetry and algebra via his seminal classification of Riemannian symmetric spaces according to the taxonomy of semi-simple Lie groups, which is regarded as one of the 20th century’s most elegant mathematical achievements. Moreover, the emerging
synthesis of geometry and algebra had profound impacts on physics itself where ge-
ometric and Lie algebraic principles were usefully adopted across the field, reaching
an apotheosis of sorts in the classical theory of general relativity. Similarly, non-
commutative algebra and geometry played a significant role in the development of
quantum mechanics in the early part of the 20th century.

The Nature of Learning

The search for symmetry has also shaped computational science in profound ways
since its inception, including via techniques to optimise network flows, analyse data,
solve cryptographic problems and facilitate error correction. More recently, the
advent of quantum information sciences [11], specifically quantum computing, has
breathed new life into the use of symmetry reduction techniques drawn from these
fields as researchers seek new methods to overcome challenges posed by computa-
tional intractability or resource constraints. The emergence of sophisticated machine
learning methods and technologies has also brought about an impetus to explore how
well-established mathematics of symmetry can be leveraged together with learning
protocols in order to solve optimisation problems in quantum information science
in general and quantum control specifically. The nature of learning in classical and
quantum information science is a rich and widely studied problem [12, 13]. At the
heart of theories of learning are methods, protocols or even unknown phenomena
that in some way increase the knowledge of a system about the world or some ground
truth. In this sense quantitative sciences of information embody a certain philosophical stance or claim about what learning is. In the abstract, learnability of systems is (or can be) construed using stateful representation, in which the knowledge or information of a system represents an epistemic state, such as one encoded within the parameters of a neural network [14] or quantum algorithm, providing a means of in effect quantifying that knowledge (together with, for example, rules of inference). Yet information is not the sole measure relevant to learn-
ing: knowledge or information about the state of the world is key as assessed by
figures of merit such as prediction metrics (such as loss functions), which most often
stand in as proxies for a system’s epistemic state.
Thus a system that randomly assimilates large amounts of information with
poor predictive capacity (that is, one which fails to generalise or which overfits)
is considered one with less capacity to learn (or that has learnt less) than one with
less information but greater generalisability, in both classical and quantum contexts.
To learn thus involves complex interactions between the amount of information a
system can represent on the one hand to enable its versatile generalisability and the
extent to which that system develops sufficiently accurate representations about the
world in order to make accurate predictions, something manifest in practice in bias-
variance trade-offs and studied formally within statistical learning theory. A learning protocol is thus one which improves this epistemic state and enables the learning of
some structure and predictive improvement. Thus learning can be construed, and in
computational science usually is construed, inherently as an optimisation task given
objectives and data about the world. Machine learning, classically, involves the
search for and design of models which satisfy such epistemic objectives to optimise
this learning task using classical data and algorithms. Quantum machine learning
has the same overall aim, yet is complicated by the unique characteristics of quantum
information (as manifest in properties of quantum data and quantum algorithms),
imposing constraints upon how learning occurs in a quantum context.

1.2 Related work


As foreshadowed above, symmetry and geometric methods have been an integral part
of the development of modern mathematics and physics since ancient times, through to the development of classical physics and the revolution in quantum physics of the last century. The thesis draws upon this lineage, applying quantum information, Lie theory and differential geometry to problems in quantum control and statistical
learning. For contextualisation, we situate QGML within the intersection of these
existing disciplines as set out in the Venn diagram in Figure 1.1. Our work is focused primarily on quantum information theory, particularly quantum control theory
(QCT), both in terms of simulating open quantum systems (not only for control)
and time-optimal control for optimal unitary synthesis. In QCT (see Appendix A
and section A.2), we are primarily interested in how to apply controls in order to
obtain quantum states belonging to Hilbert spaces |ψ(t)⟩ ∈ H (ρ(t) ∈ B(H)). For
convenience, we rely upon the unitary operator (channel) representation of states
ρ(t) = U (t)ρ(0)U (t)† , hence the quantum state of interest is represented via the uni-
tary operator applied to the initial quantum state ρ(0) at time t = 0. Quantum state
evolution is given by Schrödinger’s equation dU U −1 = −iHdt, the solutions to which
belong to unitary or special unitary groups G. We rely (as is often the case in QCT
literature) upon time-independent approximate solutions to Schrödinger’s equation.
In QCT, the generic quantum control problem is described via a Hamiltonian H
comprising a drift part Hd and a control Hamiltonian Hc part:
$$H = H_d + \sum_k u_k H_k, \qquad \dot{U} = -i\left(H_d + \sum_k u_k H_k\right)U \qquad (1.2.1)$$

where initial conditions are chosen depending on the problem at hand (where time
dependence is understood). The QCT problem becomes how to select controls uk ∈
R (bounded) to reach some target U (T ) at time t = T ideally in a time or energy-
optimal fashion [15–19]. The algebraic and Lie theoretic features of this problem
enter quantum information-based control theory because quantum computation in
general must be unitary (or represented by unitary channels) in order for quantum
state and quantum data (and measure) to be preserved (as distinct from collapsing to
classical data). This is represented via the computations of interest being described
as unitary group elements U (t) ∈ G where G is a unitary or special unitary group (see
definitions A.1.18, B.2.4 and sections A.1.4 generally). As a result, this necessitates
that Hamiltonians H(t) in Schrödinger’s equation (definition A.2.2) (whose solutions
are such unitaries U (t)) must (in the closed-system case) be comprised of elements
from the corresponding Lie algebra g which generate those group elements (note we
discuss the impact of noise in sections below). Control theory is then characterised
by two primary objectives: (a) determining whether the system is controllable, that
is, whether the desired or target computation or end point can be reached given
the application of a control system; and (b) strategies for determining the optimal
control for satisfying an objective such as energy or time minimisation.
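Objective (a), controllability, can be illustrated numerically: the dynamical Lie algebra generated by iterated commutators of the drift and control generators should span the full algebra of interest. Below is a rough sketch of such a bracket-generating check for a single qubit (assuming, for illustration only, a drift Hd = Z and a single control Hc = X; the helper names are ours):

```python
# Sketch of a bracket-generating (controllability) check: iterate Lie
# brackets of the drift and control generators and track the real
# dimension of the dynamical Lie algebra they span.
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def vec(A):
    """Flatten a complex matrix into a real vector (real-linear span)."""
    return np.concatenate([A.real.ravel(), A.imag.ravel()])

def lie_algebra_dim(generators, depth=5):
    """Dimension of the dynamical Lie algebra spanned by iterated brackets."""
    basis = [-1j * g for g in generators]
    for _ in range(depth):
        brackets = [a @ b - b @ a for a in basis for b in basis]
        candidate = basis + brackets
        old = np.linalg.matrix_rank(np.array([vec(A) for A in basis]), tol=1e-10)
        new = np.linalg.matrix_rank(np.array([vec(A) for A in candidate]), tol=1e-10)
        if new == old:          # no new directions: algebra has closed
            break
        basis = candidate
    return np.linalg.matrix_rank(np.array([vec(A) for A in basis]), tol=1e-10)

# Drift Z plus control X bracket-generate all of su(2) (dimension 3),
# so this single-qubit system is fully controllable in the sense of (a).
print(lie_algebra_dim([Z, X]))   # -> 3
```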
The theory of quantum geometric control (QGC), such as early work in quan-
tum control leveraging Cartan decompositions in NMR contexts [20–22], derives
from a mixture of algebraic techniques (such as Cartan decompositions) and clas-
sical geometric control theory [23–26]. The fulcrum concept here is the seminal
correspondence between Lie groups G and Lie algebras g on the one hand and dif-
ferential manifolds M and tangent spaces T M on the other. To this end, this work
fits within an emerging literature on the application of symmetry reduction and
symmetry-based optimisation methods applied to quantum machine learning. To
solve this optimisation problem (and indeed to simulate open and closed quantum
systems for quantum control as per Chapter 3), we leverage techniques in control
theory, geometry and quantum machine learning. Our work involves leveraging
machine learning (ML) techniques via a hybrid quantum-classical approach adopt-
ing quantum machine learning (QML) principles. It focuses on using parametrised
quantum circuits (see section D.7) in neural network architecture which encodes
structural features (such as unitarity of outputs) while enabling machine learning
optimisation.
Other related fields to our work include geometric machine learning (GML) [27]
where machine learning problems, such as parameter manifolds and gradient de-
scent, are studied from an information-theory perspective, bearing similarity with
geometric information science [28] and Lie group machine learning [29]. Probably the
field with the most overlap with QGML is that of geometric QML (GQML) [30–38], a recently-developed area which studies how symmetry techniques can be used in the design of, or embedded within, quantum machine learning architectures (usually parametrised or variational quantum circuits) in ways such that those networks respect, to varying degrees, those underlying symmetries. GQML techniques have also
been shown to have import in addressing aspects of the barren plateau problem in
QML. GQML is probably the most similar field to the work herein due to its use
of dynamical Lie algebras in QML network design. Most of the literature in the
field is concerned with constructing quantum machine learning architectures (e.g.
variational quantum circuits) which respect compositionally certain symmetries by
encoding Lie algebraic generators within VQC layers, whereas the work in Chapter
4 for example does not seek an entire network that respects symmetry properties,
but rather only a sub-network of layers whose output (e.g. in the form of unitary
operators U (t)) respects symmetries inherent in unitary groups. Our work also takes
on a more differentially geometric flavour than GQML and also emphasises the re-
lationship with formal control theory to some degree more than GQML literature
usually does. Moreover, our work in that Chapter is concerned with design choices
to enable the network to learn geodesic approximations via simplifying the problem
down to one of learning optimal control functions.


Figure 1.1: Venn diagram describing overlap of quantum information processing and control theory
(Quantum Control Theory), Geometry and Machine Learning (ML). QGML sits at the intersection
of quantum computing and control theory, geometry and machine learning. Related fields, includ-
ing quantum geometric control (QGC), geometric machine learning (GML) and quantum machine
learning (QML) can be situated accordingly. The most overlap between existing literature and the
present work on QGML is geometric quantum machine learning (GQML) which seeks to encode
symmetries across entire quantum machine learning networks.

1.3 Contributions

In Chapters 3, 4 and 5, we present three distinct but connected contributions to the


literature on geometric techniques in quantum machine learning. The first is the pre-
sentation of a large-scale synthetic dataset of open quantum systems evolving in the
presence of noise, which we denote the QDataSet (Chapter 3). While quantum ma-
chine learning has evolved for more than a decade, the field has been characterised
by a sparsity of large-scale datasets from which to benchmark and develop algo-
rithms (as is standard practice in classical machine learning). To remedy this gap, the QDataSet was produced. The Chapter also presents a novel machine learning architecture that combines both blackbox and whitebox techniques into a so-called greybox architecture, the underlying rationale being that learning tasks are more efficient when algorithms are encoded with prior information. The next novel contribution of this thesis, in Chapter 4, is the demon-
stration of the use of quantum machine learning techniques for learning approximate
time-optimal unitary sequences in the form of geodesics over multi-qubit manifolds
SU (2n ). The results in that Chapter evidence the learnability, in certain cases,
of sequences of unitary operators (Uj ) ∈ G (where G ≃ SU (2)) and multi-qubit
variations which constitute geodesics along the implicit Lie group manifold G. As
a result, they may be interpreted as time-optimal sequences for generating target
unitaries UT ∈ G. Furthermore, we validate the utility of an experimental ‘greybox’
approach to hybrid quantum-classical machine learning architecture for solving the
learning problem, demonstrating its benefits over blackbox methods. Code for the
algorithms (and benchmarking examples) is available via a repository in line with
principles of open access and reproducibility.

The final contribution of this thesis is in Chapter 5, where we present a novel


analytical method for determining Hamiltonians that generate time-optimal uni-
tary sequences for certain classes of symmetric spaces. These results draw upon deep
connections between algebra and geometry as pioneered by the work of Cartan and
subsequently developed by others across geometric control theory and quantum con-
trol. The results show that under reasonable simplifying assumptions related to the
Cartan decomposition of the Lie group manifold G, Hamiltonians for generating cer-
tain time-optimal unitaries UT ∈ G can be analytically determined using variational
means. The novel result (where G/K (for an isometry subgroup K) can be con-
strued as a globally Riemannian symmetric space) provides a means to tackle one of the
key barriers to learning time-optimal unitary sequences, namely the exponentially
hard problem of globally minimising over all geodesic paths.

1.4 Chapter Synopses

In this section, we set out a synopsis of each Chapter below. The first five Chapters
are self-contained and should be regarded as the thesis proper. Note that we have
included both a background theory Chapter (see Chapter 2) and extensive supple-
mentary background information in the form of Appendices. The rationale for this
is that the subject matter of this work is inherently multi-disciplinary in nature,
synthesising concepts from quantum information processing with those in algebra,
geometry and machine learning. Each supplementary Appendix is tailored to pro-
vide additional background material in a relatively contained way for readers who may be familiar with some, but not all, of these diverse scientific disciplines. The
Appendices reproduce or paraphrase standard results in the literature with source
material identified at the beginning of each Appendix. Proofs are omitted for brevity
but can be found in the cited sources and other standard texts. The substantive
Chapters 3, 4 and 5 have been tailored to cross-refer to the Appendices’ sections,
definitions, theorems and equations in order to assist readers who may wish to delve
deeper.

Supplemental Appendix C (Differential Geometry) goes into slightly greater


detail than other Chapters in order to lay out the theoretical underpinnings of
geodesics, measuring distance and ultimately optimisation of unitary sequences via
minimisation of arc-length. One of the observations throughout the years during
which this thesis was produced was that while researchers in each discipline, such as
quantum information processing or machine learning, generally found the key con-
cepts of the other accessible, this was much less so for geometry and representation
theory. Geometry, especially coordinate-free non-Euclidean geometry, in particular
is a thicket of concepts wrapped in idiosyncratic terminology such as bundles, pull-
backs, pushforwards, forms and so on. While this is not unique to geometry as a
discipline, it was observed that this generally added to its opacity and difficulty in
being understood or usefully adopted by researchers outside the confines of math-
ematical research. This opacity also, in the author’s opinion at least, somewhat
contributes to the lack of appreciation of how intertwined the development of geom-
etry and quantum mechanics was in the late 19th and early 20th centuries, where,
in the modern era, the connection of geometry to familiar tools such as commuta-
tion (or structure constants) is obscured, but becomes apparent for example when
considering Cartan’s structural equations (see theorem C.2.2) or the Riemannian
curvature tensor (definition C.2.1).

Chapter 2 (Background Theory)


This Chapter provides an abridged summary of background theory across quantum
information processing, Lie theory and representation theory, differential geometry
and classical / quantum machine learning relevant to the substantive results in Chap-
ters 3, 4 and 5. The Chapter begins by surveying quantum information processing,
covering elementary axioms of quantum mechanics as they apply to quantum com-
puting. It covers operator formalism, evolution of quantum states, measurement,
quantum control and open quantum systems. In addition it introduces concepts
of quantum registers and quantum channels. It then covers algebra and Lie the-
ory, including Lie groups, representation theory, Cartan algebras, root systems and
Dynkin diagrams. Following this, it summarises basic principles of differential ge-
ometry, including differential manifolds, vector fields, tensor fields, tangent planes,
fibre bundles, geodesics, Riemannian and subRiemannian manifolds and symmetric
spaces. It also covers elementary theory of geometric control. The final section
summarises principles of machine learning from classical and quantum perspectives,
including statistical learning theory and variational quantum circuits.

Chapter 3 (QDataSet and Quantum Greybox Learning)


This Chapter presents material published in relation to the QDataSet, a large-scale
synthetic dataset of open quantum systems set out in [39]. The Chapter explores the
importance of large-scale datasets in machine learning for the development of novel
algorithms relevant to theoretical and applied problems in the field. It sets out theory
and implementation of open quantum systems models used to develop the dataset,
together with background information on open quantum systems noise protocols
and the theory of open quantum systems. It presents examples of how to use the
dataset (along with code) related to common tasks such as quantum tomography and
quantum control. The related repository contains code for generating the dataset,
together with example Python code and worked examples for using the dataset for
benchmarking in a variety of use cases. As at the date of this thesis, the paper and
dataset have been referenced in over 20 publications.

Chapter 4 (Quantum Geometric Machine Learning)


This Chapter presents results related to machine learning architectures for learn-
ing time-optimal subRiemannian unitary geodesics UT ∈ SU (2n ) set out in [40].
The work surveys geometric methods in quantum computing, including the work of
Nielsen and others applying variational techniques to circuit synthesis. It reviews the
underlying theory canvassed in earlier Chapters as applicable to the case of multi-
qubit circuit synthesis. It presents machine learning architectures and experimental


protocols for learning geodesic structure from underlying training data constructed
from horizontal distributions within relevant Lie algebras ∆ ⊂ su(2n ). A raft
of machine learning architectures are presented in order to demonstrate the utility
of ‘greybox’ algorithm design (which incorporates known information into stochas-
tic machine learning architecture). Results of experiments are presented along with
algorithm pseudocode and links to source-code material.

Chapter 5 (Global Cartan Decompositions for KP Problems)


The final Chapter presents novel results set out in [41] related to the use of Car-
tan decompositions for solving certain unitary sequence optimisation KP quantum
control problems for targets UT ∈ G whose Lie algebra admits a Cartan decomposition g = k ⊕ p (for symmetric subalgebra k and antisymmetric complement p), where G/K is a symmetric space and G is equipped with a KAK decomposition G = KAK.
ory presented in earlier Chapters related to symmetric spaces and semi-simple Lie
groups. It connects this theory to the KAK representation of unitary targets and
unitary evolution. It posits a constant-θ ansatz where θ parametrises generators
drawn from the subset a ⊂ p from a maximally non-compact Cartan subalgebra
h ⊂ g. It provides a general proof and method that uses Cayley transformations
and variational (Lagrangian) methods to show how the global optimisation problem
can, for certain classes of symmetric space, be analytically solved. It also presents
results related to root systems and quantum control.

Appendix A (Quantum Information Processing)


This supplemental Appendix contains the background theory and context for quan-
tum information processing and quantum control. The first part presents a short
recapitulation of primary theorems and definitions in quantum information theory
covering background information on state spaces, operators and system evolution,
density operators, composite systems and measurement. The second part provides
background detail on quantum control formalism and open quantum systems theory
relevant to later Chapters.

Appendix B (Algebra)
This supplemental Appendix includes background information on Lie theory and
representation theory relevant to geometric concepts applied in our results Chap-
ters. The summary begins with contextualisation of the role of geometry in the
development of algebraic and quantum techniques. It then surveys essential results
from the theory of Lie groups and Lie algebras which are of central importance to the
results we present further on. It covers aspects of representation theory relevant to
the work, including root system derivation, Cartan matrices and Dynkin formalism.
It focuses in particular on Cartan decompositions from an algebraic standpoint.

Appendix C (Differential Geometry)


This supplemental Appendix provides a summary and discussion of background the-
ory related to differential geometry, such as the theory of manifolds, fibre and tangent
bundles and tensor formalism. It begins with a coverage of elementary geometric
concepts, such as differentiable manifolds, charts, tangent and cotangent (dual)
spaces and tangent planes. It then proceeds to summarise key concepts such as fi-
bre bundles (and vertical/horizontal subspaces), connections on manifolds, parallel
transport and horizontal lifts and covariant differentiation. With this background
in train, it focuses on key concepts for optimisation of geodesics and measuring arc-
length. Its particular focus is Riemannian manifolds, drawing in connections with
Cartan’s classification of symmetric spaces in terms of semi-simple Lie groups. It
then surveys concepts in subRiemannian geometry (and the theory of distributions)
relevant to the quantum control problems of Chapters 4 and 5. It concludes with a focus
on geometric control theory relevant to quantum control, with an emphasis on the
application of Cartan methods that enable simplification (or symmetry reduction).

Appendix D (Quantum Machine Learning)


This supplemental Appendix surveys key background literature in classical and
quantum machine learning. It begins with an overview of classical statistical learn-
ing theory, following which common methods in machine learning are examined. It
sets out methods and theory relevant in particular to machine learning architecture
used in later Chapters, such as deep neural networks. It also examines these results
from the perspective of geometric, algebraic and functional analysis theory touched
on in previous Chapters. The Chapter then proceeds to a summary of key elements
of quantum machine learning, especially variational and other methods used in areas
such as quantum control and algorithm synthesis. It examines different algorithmic
design approaches from ‘black-box’ methods to more engineered ‘whitebox’ methods
where existing assumptions and prior knowledge (such as with respect to quantum
theory) are encoded in algorithms for efficiency purposes. It briefly summarises the
use and role of geometric and Lie algebraic techniques in classical machine learning
and control. It concludes with a synopsis of a few recent emergent trends combining
Lie algebraic theory and quantum machine learning.

1.5 A Note on Notation


Consistency of notation is notoriously honoured in the breach in many fields
where different symbolism or conventions for equivalent concepts can confuse readers
unfamiliar with content. This can be compounded when seeking to build connections
across disciplines, such as between quantum information notation and algebraic or
geometric nomenclature. Geometry in particular can often be riddled with subtle
yet distinct ways of presenting concepts, especially when grappling with coordinate
versus coordinate-free means of expressing results. We mention a few examples
below primarily for readers unfamiliar with differential geometry notation. This discussion can safely be skipped by those familiar with geometry, as it is more a foreshadowing for readers from other disciplines.
In general we have aimed for notational consistency across the Chapters below,
such as opting for representing Lie group manifold elements as γ(t) ∈ M where γ(t)
is often the symbolism for curves along manifolds (see section C.1.3 for example).
We then connect γ(t) as the expression of curves traced out on G via unitary group
elements U (t) ∈ G.
For convenience, we usually interchange G and M to emphasise the differentiable
manifold properties of G and leave the more formal relation with principal G-bundles
understood while pointing readers towards relevant literature. In other cases, we
endeavour for consistency across geometric, algebraic and control theory notation.
For example, in traditional control theory, x(t) is used for state variables when
discussing the Pontryagin Maximum Principle (e.g. see section C.5.4.1), whereas we
use γ(t) again as a means of describing how state evolution (in this case represented
by the unitary channel U (t)) can be geometrically represented in terms of curves on
manifolds. Another example is the important relationship between Lie algebras g
and tangent planes (see section C.1.5) which allows us to (under certain conditions)
equate the geometric concept of tangent vectors describing evolution of γ(t) in terms
of generators from the Lie algebra g generating evolutions U (t) via the Hamiltonian
H(t). Here the Hamiltonian is in effect constructed from Lie algebra elements and
the correspondence with tangent planes and g allows us to then think of H(t) as
constructed from tangent bundle elements (which is not unrelated to how state
evolution is presented in quantum field theory for example).
Chapter 2

Background Theory

2.1 Overview
Quantum geometric machine learning intersects multiple mathematical and scien-
tific disciplines. In this Chapter, we provide a brief synopsis of background theory
relevant to our contributory Chapters 3, 4 and 5. In addition to the material be-
low, we have linked to extensive supplemental material set out in the Appendices.
The rationale for this is primarily the interdisciplinary nature of the subject mat-
ter that spans quantum information processing, algebra and representation theory,
differential geometry and machine learning (both classical and quantum).

2.2 Quantum Information Processing


Our starting point in this work is the field of quantum information processing. This
section covers key elements of quantum information processing theory relevant to
quantum control and quantum simulation results in Chapters 3, 4 and 5, summaris-
ing the more expansive treatment of background quantum information processing
theory set out in Appendix A. We begin with the concept of the state of a quantum
system represented as a unit vector ψ over a complex Hilbert space H = H(C). Two
unit vectors ψ1 , ψ2 ∈ H, where ψ2 = cψ1 for c ∈ C, correspond to the same physical
state (axiom A.1.1). In density operator formalism, the quantum system is described
by the bounded linear operator ρ ∈ B(H) acting on H. For bounded A ∈ B(H),
the expectation value of A is Tr(ρA). Composite quantum systems are represented
by tensor products ψk ⊗ ψj (or equivalently in operator formalism, ρk ⊗ ρj ). More generally, H is a state space V (K) construed as a vector space over a field K, typically C. Elements ψ ∈ V represent possible states of a quantum system (definition
A.1.1). Quantum information processing synthesises such theory with foundational
concepts of information via classical registers and states (definition A.1.2), where a classical register X is either (a) a simple register, being an
alphabet Σ (describing a state); or (b) a compound register, being an n-tuple of
registers X = (Yk ). Quantum registers and states are then defined as an element of
a complex (Euclidean) space CΣ for a classical state set Σ satisfying specifications
imposed by axioms of quantum mechanics (definition A.1.3). Quantum registers
(and by extension quantum states ψ) are assumed to encode complete information
about the quantum system (such that even open quantum systems (section A.3)
can be reframed as complete closed systems encompassing system and environment
dynamics - see below).
Hilbert spaces are equipped with important structural features (we describe in
more detail below). First is the inner product (definition A.1.4) ⟨·, ·⟩ : V × V → C, (ψ, ϕ) 7→ ⟨ψ, ϕ⟩ for ψ, ϕ ∈ V . Together with the Cauchy-Schwarz inequality (definition A.1.5) we then define a norm (definition A.1.6) on V (C), given by ∥·∥ : V → R, ψ 7→ ∥ψ∥ for ψ ∈ V , which in turn imposes a distance (metric) function d on V . Normed vector spaces are Banach spaces satisfying certain convergence and boundedness properties and allowing definition of an operator norm (definition A.1.7) ensuring, for example, that operations (evolutions) remain within V (C). This structure allows us to define in a quantum informational context the concept of the dual space, a particularly important concept when we come to geometric and algebraic representations of quantum states and evolutions. The set of all bounded linear functionals χ : V → K is denoted the dual space V ∗ (K) to V (K) (definition A.1.8), such that χ : ψ 7→ a∥ψ∥ for some (scaling) a ∈ K.
A Hilbert space is then formally defined as a vector space H(C) with an inner product ⟨·, ·⟩ complete in the norm defined by the inner product, ∥ψ∥ = √⟨ψ, ψ⟩
(definition A.1.9). Hilbert spaces can be composites of other Hilbert spaces, such
as in the case of direct-sum (spaces) H = Hi ⊕ Hj (relevant to state-space de-
composition) (definition A.1.1). Moreover, a Hilbert space admits orthogonal and orthonormal bases (definitions A.1.10 and A.1.11), allowing us to work with H using an orthonormal basis, which is of fundamental importance to the formalism of quantum
computation.
In many cases (such as those the subject of Chapters 3 and 4) we are interested in
two-level quantum systems denoted qubit systems (equation A.1.3) along with multi-
qubit systems, hence we are interested in tensor products of Hilbert spaces Hi ⊗ Hj
(definition A.1.12). Tensor products, which also have a representation as a bilinear
map T : Hi × Hj → C, are framed geometrically later on, forming an important part
of differential geometric formalism (such as relating to contraction operations and
measurement). They exhibit certain universal and convergence properties required
for representations of quantum states and their evolutions (see section A.1.1.1).

Using the formalism above, one often uses bra-ket notation where states ψ ∈ H are
represented via ket notation |ψ⟩ with their duals ψ † represented via bra notation
⟨ψ|. Connecting this notation, ubiquitous in quantum information, to the formalism
of vector spaces, duals and mappings provides a useful bridge to geometric framings
of quantum information (we expand upon this further on).
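For readers less familiar with this formalism, the correspondences above (kets as vectors, bras as their duals, composite systems as tensor products) can be illustrated in a few lines of Python with numpy; a minimal sketch:

```python
# Elementary illustration of the Hilbert-space formalism for a qubit:
# kets as column vectors, inner products, norms and composite states
# via the tensor product.
import numpy as np

ket0 = np.array([1, 0], dtype=complex)
ket1 = np.array([0, 1], dtype=complex)
psi = (ket0 + ket1) / np.sqrt(2)          # |+> superposition state

print(np.vdot(psi, psi))                  # <psi|psi> = 1 (unit norm)
print(np.vdot(ket0, ket1))                # orthogonality: <0|1> = 0

psi_12 = np.kron(psi, ket0)               # two-qubit state in H_i (x) H_j
print(psi_12.shape)                       # (4,): dimension doubles per qubit
```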

2.2.1 Operator formalism

Quantum control, quantum machine learning and unitary synthesis is fundamen-


tally concerned with the directed evolution of quantum systems to obtain some final
quantum state. The rules governing quantum state dynamics present fundamental
constraints upon how such systems evolve. We characterise quantum state evo-
lution and interaction via operator formalism and measurement axioms such that, for every measurable physical observable quantity defined classically by a real-valued function on phase space, there exists a self-adjoint (Hermitian) linear measurement operator M acting on H (axiom A.1.2).
are drawn from the set of (norm) bounded endomorphic linear maps A : H → H
which we denote B(H) (definition A.1.13). For any operator A ∈ B(H) we de-
fine the adjoint A† : H → H (definition A.1.14) as the unique operator such that
⟨ϕ, Aψ⟩ = ⟨A† ϕ, ψ⟩ for all ϕ, ψ ∈ H. Adjoint operators satisfy a number of impor-
tant properties, including that A is a Hermitian operator with real eigenvalues if A† = A.
In quantum information, we are particularly focused on positive semi-definite oper-
ators A = B † B ∈ B(H) such that the norm is non-negative ||Bψ||2 ≥ 0 (definition
A.1.15). Doing so allows us to define projection operators P such that P 2 = P and, for some closed subspace V ⊆ H, P |V = I and P (V ⊥ ) = 0, which are essential for quantum measurement formalism. We can also define the important class of isometry operators (definition A.1.17), A ∈ B(X , Y), which preserve the norm ||Av|| = ||v||. This in turn allows us to define normal operators
(A† A = AA† ) and in turn the crucial set of isometry operators denoted unitary
operators U ∈ B(H) which preserve inner products ⟨U ϕ, U ψ⟩ = ⟨ϕ, ψ⟩. From this
relation we deduce that ||U ψ|| = ||ψ|| and thus that U † = U −1 such that U U † = U † U = I, which is often the form in which they are first introduced in quantum contexts. Im-
portantly, the norm (inner-product) preserving homomorphic characteristic of uni-
tary operators gives rise to the preservation of Haar measure (see definition A.4.8)
in turn relating to conservation of probability. The unitary group U (n) is a central
Lie group in quantum information processing and each Chapter in this work and is
a concept we revisit in several guises.
Operator formalism above allows the assertion of the spectral theorem, which states that each normal operator X ∈ B(H) can be decomposed as a linear combination of projection operators Πk (changing notation to avoid confusion with P below) onto their corresponding eigenspaces with eigenvalues λk , that is $X = \sum_k^m \lambda_k \Pi_k$ (see definition A.1.19). In quantum measurement formalism (section
A.1.6), the spectral theorem manifests via measurement operators being framed
as projection operators Mm whose eigenvalues m are observed with probability $\langle\psi|M_m^\dagger M_m|\psi\rangle$ (see equation A.1.28). Other key operations include the trace operator (definition A.1.20) such that for A ∈ B(H) we have $\operatorname{Tr}(A) = \sum_j \langle e_j, A e_j \rangle$
for an orthonormal basis {ej } of H. The trace can be construed as a tensorial con-
traction (see section C.1.15) with a range of important properties discussed below
related to both measurement and system-environment interactions in open quantum
systems (section A.3 and Chapter 3) via the partial trace (definition A.1.21).
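The spectral theorem, the trace as a contraction over an orthonormal basis, and the partial trace can all be checked numerically; the following is an illustrative sketch (the helper partial_trace_B is ours):

```python
# Numerical illustration of the spectral theorem X = sum_k lambda_k Pi_k,
# the trace Tr(A) = sum_j <e_j, A e_j>, and the partial trace Tr_B.
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))
Hop = (A + A.conj().T) / 2                  # Hermitian (normal) operator

evals, evecs = np.linalg.eigh(Hop)
projs = [np.outer(evecs[:, k], evecs[:, k].conj()) for k in range(3)]
print(np.allclose(Hop, sum(l * P for l, P in zip(evals, projs))))   # spectral

print(np.isclose(np.trace(Hop),
                 sum(np.vdot(evecs[:, j], Hop @ evecs[:, j]) for j in range(3))))

def partial_trace_B(rho_AB, dA, dB):
    """Tr_B over the second tensor factor of a (dA*dB)-dim density matrix."""
    return rho_AB.reshape(dA, dB, dA, dB).trace(axis1=1, axis2=3)

rho_A, rho_B = np.eye(2) / 2, np.diag([1.0, 0.0])
print(np.allclose(partial_trace_B(np.kron(rho_A, rho_B), 2, 2), rho_A))
```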
An important operator representation in quantum information contexts and one
used throughout this work is that of density operators which function as in effect
representations of distributions over quantum states. First, we define a Hilbert-
Schmidt operator A ∈ B(H) as one such that Tr(A† A) < ∞. A density operator
is then a positive semi-definite operator ρ ∈ B(H) that is self-adjoint with unit trace Tr(ρ) = 1. The set of density operators, denoted D(H), is a convex
set (important for transitions among quantum states represented by ρ). Moreover,
density operators can be considered as a statistical description of quantum states,
analogous to probability distributions over pure states ρa (indexed by the set Γ)
where $\rho = \sum_{a \in \Gamma} p(a)\rho_a$. Here p(a) describes the probability of observing (obser-
vations that characterise the system in state) ρa and so in this sense represents a
probability distribution over (pure) states. Density operator formalism thus allows
quantum states and operations upon them to be represented via operator formalism.
In particular, we then arrive at a crucial distinction between pure quantum states (those with Tr(ρ2 ) = 1) and mixed quantum states (ensembles {ρi }) such that Tr(ρ2 ) < 1 (see also other measures such as quantum relative entropy (definition
A.1.27) for discerning pure versus mixed states). Pure states represent the most
specific (and complete) information that can be known about a quantum system
(where uncertainty is inherently ontological), while mixed states represent epistemic
uncertainty about the quantum state [42] relevant in particular to open quantum
systems. Density matrix formalism allows state similarity to be more easily framed,
such as via trace distance (definition A.1.25). The formalism also allows the def-
inition of coherent and incoherent superpositions (see definition A.1.28) which is
of particular relevance to understanding decohering phenomena in noise-models of
open quantum systems (such as those we discuss in later Chapters).
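The pure/mixed distinction via the purity Tr(ρ2) is easy to verify numerically; a small sketch:

```python
# Purity Tr(rho^2) distinguishes pure states (= 1) from mixed states (< 1):
# here a pure |+><+| state versus the maximally mixed qubit state.
import numpy as np

plus = np.array([1, 1], dtype=complex) / np.sqrt(2)
rho_pure = np.outer(plus, plus.conj())    # Tr(rho) = 1, Tr(rho^2) = 1
rho_mixed = np.eye(2) / 2                 # equal ensemble of |0>, |1>

for rho in (rho_pure, rho_mixed):
    print(np.trace(rho).real, np.trace(rho @ rho).real)
# -> 1.0 1.0 (pure) and 1.0 0.5 (mixed)
```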
Another set of concepts addressed in this work drawn from information theory
is that of quantum channels, which are maps between spaces of linear operators, i.e. Φ : B(H1 ) → B(H2 ). Such channels are completely positive trace-preserving
(CPTP) maps which play an important role in quantum information theory where we
often are interested in operations upon (or maps between) operator representations.
The set of such channels is denoted C(H1 , H2 ). An example is framing unitary
evolution in terms of unitary channels (definition A.1.30) where the operation of
U ∈ B(H) is represented as Φ(X) = U XU † for X ∈ B(H). Furthermore, we
can also then define quantum-classical channels (definition A.1.31) Φ ∈ C(H1 , H2 )
which transform a quantum state ρ1 ∈ B(H) (with possible off-diagonal terms) into a
classical distribution of states represented by a diagonal density matrix ρ2 ∈ B(H).
Quantum-classical channels are a means of representing general measurement in
quantum information (see generally [43]).
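As a small sketch of these two channel types, below is a unitary channel and a fully dephasing quantum-classical channel (the diagonal-projection shorthand used for the latter is an equivalent form of the channel ρ 7→ ½(ρ + ZρZ)):

```python
# Channels as maps on operators: a unitary channel Phi(X) = U X U^dag, and
# a quantum-classical (fully dephasing) channel that deletes off-diagonal
# terms, mapping rho to a diagonal (classical) distribution.
import numpy as np

H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)   # Hadamard

def unitary_channel(rho, U):
    return U @ rho @ U.conj().T

def dephasing_channel(rho):
    return np.diag(np.diag(rho))          # keeps populations only

rho = np.array([[0.5, 0.5], [0.5, 0.5]], dtype=complex)       # |+><+|
print(unitary_channel(rho, H).round(3))  # coherent rotation to |0><0|
print(dephasing_channel(rho).round(3))   # classical mixture diag(1/2, 1/2)
```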

2.2.2 Evolution

The evolution of quantum systems is framed by the postulate (axiom A.1.3) that
quantum states evolve according to the Schrödinger equation idψ = H(t)ψdt (see
definition A.1.18) where ψ ∈ H and the self-adjoint operator that generates evolu-
tions in time, H(t) ∈ B(H), is denoted the Hamiltonian (definition A.1.33). The
Hamiltonian (which represents the set of Hamilton’s equations [44] for quantised
contexts [45]) of a system is the primary means of mathematically characterising
the dynamics of quantum systems, describing state evolution and control archi-
tecture. The equation has equivalent representations in density operator form as
dρ = −i[H, ρ]dt (equation (A.1.20)) and unitary form as dU U -1 = −iHdt (equation
(A.1.21)). The utility of each representation depends on the problem at hand. We
focus primarily on the unitary form due to the apparent relationship with unitary
Lie groups and geometry. Time-dependent solutions to Schrödinger’s equation take the form of unitaries (equation (A.1.22)) $U = \mathcal{T}_+ \exp\left(-i \int_0^{\Delta t} H(t)\,dt\right)$ (see section
A.1.5) the set of which as we shall see form a Lie group. In practice solving for
the time-dependent form of U (t) can be difficult or intractable. Instead a time-
independent approximation U (t) ≈ U (t − 1), , , U (0) (equation (A.1.25)) is adopted
by assuming that Hamiltonians H(t) (and thus U (t)) can be regarded as constant
over sufficiently small time scales ∆t. Doing so allows a target unitary U (T ) to
be decomposed as a sequence U (T ) = U (tn )...U (t0 ). As discussed below, for full
controllability such that arbitrary U (t) are reachable (within a reachable set R),
we rely upon the Baker-Campbell-Hausdorff theorem (definition B.2.18) to ensure
the subspace p ⊂ g of Hamiltonian generators is bracket-generating (definition
C.4.4) such that g is generated under the operation of the Lie derivative (B.2.6).
We explain these terms in more detail below.
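The quality of the time-independent approximation can be illustrated numerically: evolving under a time-dependent H(t) with coarser versus finer piecewise-constant steps, the discretised product converges as ∆t shrinks (the specific H(t) below is an arbitrary illustration of ours):

```python
# Sketch of the piecewise-constant approximation U(T) = U(t_n)...U(t_0):
# coarser discretisations of a time-dependent H(t) converge to the
# fine-grained (time-ordered) evolution as the step size shrinks.
import numpy as np
from scipy.linalg import expm

X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
H = lambda t: np.cos(t) * X + np.sin(t) * Z   # illustrative H(t)

def propagate(T, n):
    dt = T / n
    U = np.eye(2, dtype=complex)
    for j in range(n):
        U = expm(-1j * H((j + 0.5) * dt) * dt) @ U   # H constant over dt
    return U

U_ref = propagate(1.0, 4096)                  # proxy for the exact evolution
for n in (8, 64, 512):
    print(n, np.linalg.norm(propagate(1.0, n) - U_ref))
# error shrinks with dt, justifying the time-independent approximation
```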

2.2.3 Measurement

Measurement formalism is fundamentally central to quantum information process-


ing. We can parse the measurement postulate of quantum mechanics into two parts.
The first (axiom A.1.4) provides that given $\psi = \sum_j^{\infty} a_j e_j$, $a_j \in \mathbb{C}$, where ψ ∈ H, the probability of measuring an observable (using operator M ) with value m is given by |am |2 (with em as eigenstates of the measurement operator). This is expressed as $p(m) = \langle\psi|M_m^\dagger M_m|\psi\rangle = \operatorname{tr}(M_m^\dagger M_m \rho)$ (equation (A.1.28)). The second part (axiom A.1.5), usually rolled into one measurement axiom, provides that measurement by M on state ρ leads to a post-measurement state ρ′ given by (equation (A.1.30)):


$$\rho' = \frac{M_m \rho M_m^\dagger}{\langle M_m^\dagger M_m, \rho \rangle}$$

due to the (partial) trace operation of measurement $\operatorname{tr}(M_m^\dagger M_m \rho)$. This is sometimes described as the Copenhagen interpretation or collapse of the wave function view of
quantum mechanics. In information theory terms, each measurement outcome m is
associated with a positive semi-definite operator (equation (A.1.31)). In more ad-
vanced treatments, we are interested in measurement statistics drawn from positive

operator-valued measures (POVMs), a set of positive operators $\{E_m\} = \{M_m^\dagger M_m\}$ satisfying $\sum_m E_m = I$ in a way that gives us a complete set of positive opera-
tors with which to characterise ρ (see sections A.1.6.1 and A.1.6.3). Because of
this collapse, quantum measurement must be repeated using identical initial con-
ditions (state preparations) in order to generate sufficient measurement statistics
from which quantum data, such as quantum states ρ or operators such as U (t) may
be reconstructed or inferred (e.g. via state or process tomography). Chapter 3
of this work explores the role of measurement in quantum simulations. In Chap-
ters 4 and 5, we assume access to data about U (t) and estimates Û (t), and assume the existence of a measurement process that provides statistics enabling characterisation of operators and states in order to calculate loss for machine learning
optimisation protocols. As flagged above, measurement can be framed in terms of
quantum-to-classical channels (equation A.1.36) and permits composite (e.g. multi-
qubit) system measurement (see section A.1.6.2). Moreover, as we discuss in section
D.6.3, the distinct characteristics of quantum versus classical measurement have im-
plications for quantum control and quantum machine learning (such as that typical
online instantaneous feedback based control is unavailable due to the collapse of ρ
under measurement).
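A minimal simulation of these measurement postulates, including the repeated-preparation statistics just described (computational-basis projectors on an illustrative single-qubit state):

```python
# Born-rule sketch: projective measurement of |psi> in the computational
# basis, the post-measurement (collapsed) state, and repeated identical
# preparations to estimate probabilities from counts.
import numpy as np

psi = np.array([np.sqrt(0.7), np.sqrt(0.3)], dtype=complex)
projectors = [np.diag([1, 0]).astype(complex), np.diag([0, 1]).astype(complex)]

probs = [np.vdot(psi, P @ psi).real for P in projectors]  # p(m) = <psi|P_m|psi>
print(probs)                                              # [0.7, 0.3]

m = 0                                                     # suppose outcome 0
post = projectors[m] @ psi / np.sqrt(probs[m])            # collapsed state
print(post)                                               # |0>

rng = np.random.default_rng(1)
counts = rng.choice([0, 1], size=1000, p=probs)           # repeated shots
print(np.bincount(counts) / 1000)                         # empirical statistics
```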
Quantum measurements then form a key factor in being able to distinguish quan-
tum states and operators according to quantum metrics (section A.1.8). The choice
of appropriate metric on H or B(H) is central to quantum algorithm design and ma-
chine learning where, for example, one must choose an appropriate distance measure
for comparing function estimators fˆ to labels drawn from Y (see section D.6). State
discrimination (section A.1.8.1) for both quantum and probabilistic classical states
requires incorporation of stochasticity (the probabilities) together with a similarity
measure. For example, the Holevo-Helstrom theorem (A.1.38) quantifies the proba-
bility of distinguishing between two quantum states given a single measurement µ.
A range of metrics such as Hamming distance, quantum relative entropy (definition
A.1.27) and fidelity exist for comparing states and operators. In this work, we focus
upon the fidelity function (see section
pA.1.8.2) allowing state and operator discrim-
√ √ √ √ 
ination F (ρ, σ) = ρ σ 1 = Tr σρ σ (equation A.1.49) which is related
to the trace distance (definition A.1.25). The fidelity function is chosen as the loss
function for our empirical risk measure in Chapters 3 and 4 via the cost functional
given in equation 4.6.3.
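The fidelity can be evaluated directly from this definition; a small sketch (the PSD square-root helper via eigendecomposition is ours):

```python
# Direct evaluation of F(rho, sigma) = Tr sqrt( sqrt(sigma) rho sqrt(sigma) ).
import numpy as np

def psd_sqrt(A):
    """Square root of a positive semi-definite matrix via eigendecomposition."""
    w, V = np.linalg.eigh(A)
    return (V * np.sqrt(np.clip(w, 0, None))) @ V.conj().T

def fidelity(rho, sigma):
    s = psd_sqrt(sigma)
    return np.trace(psd_sqrt(s @ rho @ s)).real

plus = np.array([1, 1], dtype=complex) / np.sqrt(2)
rho = np.outer(plus, plus.conj())
print(fidelity(rho, rho))            # 1.0 for identical states
print(fidelity(rho, np.eye(2) / 2))  # ~0.707 against the maximally mixed state
```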

2.2.4 Quantum control


Quantum control problems consider how to exert control over quantum systems in
order to obtain certain computational results, or steer quantum systems in ways
corresponding to computations represented by U (t). Quantum control is at the
centre of each of Chapters 3 (as a use-case for the QDataSet and via learning
noise-cancellation operators V0 ), 4 for finding time-optimal geodesics using machine
learning and 5 for finding time-optimal Hamiltonians using global Cartan decompo-
sitions. For a closed quantum system (noiseless and isolated with no environmental
interactions) ρ ∈ B(H) the Hamiltonian can be partitioned into a drift Hd (t) and
control Hctrl (t) part, H0 (t) = Hd (t) + Hctrl (t), such that Schrödinger’s equation (equa-
tion (A.2.4)) becomes idρ = [Hd (t) + Hctrl (t), ρ(t)]dt (see definition A.2.1). The drift
term represents the uncontrollable evolution of the quantum system, while the con-
trol term represents the controllable part. As discussed in sections A.2.3 and C.5.3,
quantum control problems usually follow the Pontryagin Maximum Principle [15,23]
(see section C.5.4.2) where the state of a system x is described according to a (first
order) differential equation ẋ = f (t, x, u) where x represents the system state, f is
a vector field while u = u(t) are real-valued time varying (or constant over small ∆t
interval) control functions. In quantum settings, equation (A.2.6) is described by
the Schrödinger equation. In unitary form this is equivalent to (equation (A.2.7)):

ẋ ∼ U̇ f (x, t, u) ∼ −iH(u(t))U (t) (2.2.1)

where iH(u(t)) is a Hamiltonian comprising controls and generators (for us, drawn
from a Lie algebra g). The general Pontryagin Maximum Principle (PMP) control
problem then incorporates adjoint (costate) variables and other variational assump-
tions in order to obtain the general form of control problem, the quantum versions
of which are set out in section C.5.3 (and see generally [15,23,46]). The drift Hamil-
tonian Hd and control Hamiltonian Hc combine as (equations (A.2.9) and (A.2.10)):
$$H(u) = H_d + \sum_k u_k H_k, \qquad \dot{U} = -i\left(H_d + \sum_k u_k H_k\right)U \qquad (2.2.2)$$

where initial conditions are chosen depending on the problem at hand (often U (0) =
I) or, as per our global Cartan decomposition method in Chapter 5, in order to
streamline solving the optimal control problem. The objective of time-optimal con-
trol is to assume a target UT in the reachable set (given applicable controls) which
is reachable by applying control functions uj(t) ∈ U ⊂ Rᵐ to corresponding gen-
erators in g or a control subset p ⊂ g, subject to a bounded norm on the controls
||uj(t)|| ≤ L. The targets are quantum states ρ(T) at final time T, represented via
unitary channels (CPTP maps) U(T) acting on initial conditions (also repre-
sented in terms of unitaries).
targets as elements of Lie groups of interest to quantum computation, such as uni-
tary groups and special unitary groups (see definition (B.2.4)). Given theoretical
assumptions as to the existence of length minimising geodesics on Riemannian or
subRiemannian manifolds (see sections C.2.1 and C.4), time optimisation becomes
equivalent to length (equation (C.2.8)) or energy (equation (C.4.2)) minimisation
(see section C.5.3).
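To make the control formulation concrete, the sketch below (an illustrative assumption, not the thesis code; the function name propagate and the pulse values are hypothetical) propagates U̇ = −i(Hd + ∑k uk Hk)U for piecewise-constant controls by composing matrix exponentials over small time steps:

```python
import numpy as np
from scipy.linalg import expm

X = np.array([[0, 1], [1, 0]], dtype=complex)   # Pauli sigma_x
Z = np.array([[1, 0], [0, -1]], dtype=complex)  # Pauli sigma_z

def propagate(H_drift, H_ctrls, controls, dt):
    """Compose U(T) for piecewise-constant pulses; controls has shape (n_steps, n_ctrls)."""
    U = np.eye(H_drift.shape[0], dtype=complex)
    for u in controls:
        H = H_drift + sum(uk * Hk for uk, Hk in zip(u, H_ctrls))
        U = expm(-1j * H * dt) @ U
    return U

# Drift along z, a single bounded control along x.
controls = 0.5 * np.ones((100, 1))
U_T = propagate(0.5 * Z, [X], controls, dt=0.01)
print(np.round(U_T, 3))
```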
Lie groups G are equipped with the structure of a differentiable manifold M
(see section C.1.1 and definition C.1.2). Thus we assume suitable structure on the
underlying manifold M such as the existence of a metric (e.g. Riemannian or sub-
Riemannian, section C.2.1 and C.5.7) which for Lie groups and their corresponding
Lie algebras is the metric induced by the Killing form (definition B.2.12).
In practice the minimisation problem proceeds using variational methods (see
sections C.5.4 and 5.6), and so for controls u(t) this means minimising arc-
length via minimising control pulses, ℓ = min_{u(t)} ∫₀ᵀ ||u(t)|| dt (see equation (C.5.3)),
having regard to manifold curvature (which enters the minimisation problem via the
metric).

2.2.5 Open quantum systems


The role of external noise and environmental interactions in quantum information
processing is central to solving problems in quantum control for open quantum sys-
tems. While Chapters 4 and 5 assume a closed system, Chapter 3 explicitly models
its quantum simulations and formalism on the basis of open quantum systems (sec-
tion A.3). Open quantum systems formalism involves constructing an overall quan-
tum system comprising the system (or closed quantum evolution) and the environ-
ment (i.e. interactions with the environment). Such open systems can be modelled
(equation (A.3.1)) in a simple case via introducing environmental dynamics into the
Hamiltonian:

H(t) = H0(t) + H1(t) = (Hd(t) + Hctrl(t)) + (HSE(t) + HE(t))        (2.2.3)

where H0 (t) is defined as above encompassing drift and control parts of the Hamil-
tonian and where H1 (t) consists of two terms: (i) a system-environment interaction
part HSE (t) and (ii) a free evolution part of the environment HE (t).

To model the effect of noise on quantum evolution, we utilise the concept of
a superoperator (definition A.3.1) S : B(H) → B(H) which is trace preserving or
decreasing, exhibits convex linearity and complete positivity such that it is a CP
map (equation (A.3.4)). Superoperators then allow us to represent the effect of
environmental interactions (noise) as quantum or classical channels via the effect
on density operator evolution. A common method of representing such open-state
evolution is via the Lindblad master equation (section A.3.3) which formalises non-
unitary evolution arising as a result of interaction with noise sources, such as baths.
Here mixed unitary and non-unitary evolution of the quantum system is given by
(equation (A.3.5)):

dρ/dt = −i[H, ρ] + ∑k γk ( Lk ρ Lk† − ½{Lk† Lk, ρ} )        (2.2.4)

with ρ being the state representation, H(t) the closed-system Hamiltonian, ∑k
a summation over all noise channels with dephasing (decoherence) rates γk. The
Lindblad operators Lk acting on ρ encode how the environment affects the closed
quantum state. For modelling of noise in Chapter 3, we impose noise β(t) along var-
ious axes of qubits (see section 3.5.3.1), an example being a single qubit Hamiltonian
with noise βz (t) imposed along the z-axis:

H = ½(Ω + βz(t))σz + ½ fx(t)σx.
2 2

where the type of noise is characterised according to regimes set out therein. In
general the effect of noise on controls u(t) is modelled via the power spectral density,
for which section 3.5.3.1 provides a discussion.
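By way of illustration, the Lindblad dynamics of equation (2.2.4) for a single qubit with dephasing along the z-axis can be integrated with a simple explicit Euler scheme. This is a hedged sketch under assumed rates and step sizes, not the simulation code used for the QDataSet:

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def lindblad_step(rho, H, Ls, gammas, dt):
    """One explicit-Euler step of drho/dt = -i[H, rho] + sum_k gamma_k D[L_k](rho)."""
    drho = -1j * (H @ rho - rho @ H)
    for g, L in zip(gammas, Ls):
        LdL = L.conj().T @ L
        drho += g * (L @ rho @ L.conj().T - 0.5 * (LdL @ rho + rho @ LdL))
    return rho + dt * drho

# Dephasing (L = sigma_z) decays the off-diagonal coherences of |+><+|.
rho = 0.5 * np.array([[1, 1], [1, 1]], dtype=complex)
for _ in range(1000):
    rho = lindblad_step(rho, H=0.5 * X, Ls=[Z], gammas=[0.1], dt=0.01)
print(np.round(rho, 3))
```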

2.3 Algebra and Lie Theory

2.3.1 Lie groups and Lie algebras

This work, especially Chapters 3, 4 and 5, focuses on quantum control problems
where target computations are represented as elements of unitary Lie groups UT ∈ G.
Synthesising unitaries to reach UT in a time-optimal fashion involves Hamiltonians
composed using generators of the corresponding Lie algebra g and associated control
functions. In particular, we are interested in leveraging the symmetry and differ-
ential geometric structure of Lie groups to model time-optimal unitary synthesis as
a problem of learning minimal geodesics on Riemannian (section C.2) and subRie-
mannian (section C.4) manifolds. This is especially the case in Chapter 5 where we
leverage symmetry structure of semi-simple Lie groups via global Cartan decompo-
sitions to solve for time-optimal Hamiltonians analytically. In this section, we cover
underlying principles of Lie theory, such as Lie groups, Lie algebras, Killing forms
and approximation theorems. We summarise Cartan decompositions and important
root-space decompositions.
The starting point for our analysis is the concept of a Lie group G being a
Hausdorff topological group equipped with a smooth manifold structure (definition
B.2.1). Lie groups are continuous groups owing to their parametrisation via the
continuous parameter t ∈ R and thus enable the study of infinitesimal transforma-
tions and associated symmetry properties. We are primarily interested in matrix Lie
groups (section B.2.3) which are closed subgroups of the general linear group of all
invertible n-dimensional complex matrices GL(n; C) = {A ∈ Mn (C)| det(A) ̸= 0}.
In particular, the existence of Lie group homomorphisms (definition B.2.3) between
Lie groups φ : G1 → G2 allows for various groups of interest in quantum comput-
ing to be studied by way of representations of GL(n; C). Unitary operators (and
channels) described above then have a representation as a unitary group (definition
B.2.4). Importantly unitary groups are connected Lie groups (definition B.2.1) such
that for all A, B ∈ G there exists a continuous path γ : [0, 1] → Mn (C) such that
γ(0) = A and γ(1) = B with γ(t) ∈ G for all t. This is important for establish-
ing the existence of geodesics γ(t) ∈ G (as time-optimal paths in control theory).
For qubits, our target unitaries are drawn from the special unitary group SU (2)
(equation B.2.1).
Associated with each Lie group G is a Lie algebra g (definition B.2.6) which is
of central importance as the algebras from which Hamiltonians governing quantum
evolution are composed. Lie algebras are vector spaces over a field R or C equipped
with a bilinear form (product) [X, Y ] for X, Y ∈ g satisfying (a) [X, Y ] = −[Y, X]
and (b) the Jacobi identity. The bracket [X, Y ] we identify as the commutator in
quantum contexts or the Lie derivative (owing to its status in terms of derivations)
satisfying a number of algebraic properties (proposition B.2.2). As we discuss in the
next section, Lie algebras have a natural interpretation and nice geometric intuition
in terms of tangent bundles over Lie group manifolds M. Importantly (particularly
for our exegesis in Chapter 5) the Lie derivative can be identified with the endomor-
phic adjoint action of a Lie algebra upon itself (definition B.2.7) ad : g → EndK (g)
such that adX (Y ) = [X, Y ]. As with Lie groups, Lie algebra homomorphisms (defi-
nition B.2.8) allow mappings between Lie algebras which is important in geometric
(fibre bundle) formulations of quantum evolution over manifolds and corresponding
tangent bundles (see Appendix C). Semi-simple Lie algebras and Lie groups admit
a classification system based upon concepts of ideals (definition B.2.11) which is of
fundamental importance to Cartan’s classification of symmetric spaces. Establish-
ing time optimality using variational techniques requires metrics (inner products)
on Hamiltonians composed from g which in turn requires a way of combining gen-
erators in g. This is given by the Killing form (definition B.2.12) which is a bilinear
mapping over g given by B(X, Y) = Tr(adX ∘ adY) for X, Y ∈ g. Additionally, Car-
tan’s criterion for semisimplicity is that the Killing form is non-degenerate, that
is g is semisimple if and only if the Killing form for g is non-degenerate, namely
B(X, Y) = 0 for all Y ∈ g only if X = 0.
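These definitions can be checked directly in small cases. The sketch below (illustrative only; the basis convention and helper name ad_matrix are assumptions) builds the matrices of adX for the su(2) basis Tk = −iσk/2 and evaluates B(X, Y) = Tr(adX adY), recovering a non-degenerate (here, negative definite) Killing form consistent with semisimplicity:

```python
import numpy as np

# su(2) basis: T_k = -i sigma_k / 2, so that [T_i, T_j] = eps_ijk T_k.
sig = [np.array(m, dtype=complex) for m in
       ([[0, 1], [1, 0]], [[0, -1j], [1j, 0]], [[1, 0], [0, -1]])]
basis = [-0.5j * s for s in sig]

def ad_matrix(Xm, basis):
    """Matrix of ad_X = [X, .] expressed in the given basis (via least squares)."""
    flat = np.array([b.flatten() for b in basis]).T
    cols = [np.linalg.lstsq(flat, (Xm @ b - b @ Xm).flatten(), rcond=None)[0]
            for b in basis]
    return np.array(cols).T

ads = [ad_matrix(T, basis) for T in basis]
B = np.array([[np.trace(a @ b).real for b in ads] for a in ads])
print(np.round(B, 6))                 # -2 * identity for this normalisation
print(abs(np.linalg.det(B)) > 1e-9)   # non-degenerate, so su(2) is semisimple
```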
Quantum control in terms of Lie groups and Lie algebras relies upon the impor-
tant bridge between the two provided by the exponential map between G and g. In
essence unitaries U (t) represent exponentiated Lie algebra g elements G ∼ exp(g).
This relationship as we discuss is algebraic but connects nicely (not focused on here)
with wave function representations of solutions to Schrödinger’s equation in terms
of exponentials (or their sine and cosine expansions). In this context, the matrix
exponential is an important bridging concept. It is defined (definition B.2.13) via
the power series e^X = ∑_{n=0}^∞ Xⁿ/n! for X ∈ g and satisfies a number of properties
we rely on throughout (set out in theorem (B.2.5)). Where the exponential map
allows one to transition from g to G, the derivative of the matrix exponential at
t = 0 allows one to transition from G back to X ∈ g via (d/dt) e^{tX}|_{t=0} = X (equation
(B.2.15)). Moreover, the mapping gives rise to an important unique homomorphism
between G and g at the identity, allowing symmetries and properties of G to be
explored (and, in a control sense, manipulated) by way of g itself. Formally, the
Lie algebra of a matrix group G ⊆ GL(n; C) is then given by (definition B.2.16)
g = {X ∈ Mn(C) | e^{tX} ∈ G for all t ∈ R}. Moreover, this relationship allows us
to map between the adjoint action of g and that of G itself (equation (B.2.16)):
[X, Y] := XY − YX = (d/dt)(e^{tX} Y e^{−tX})|_{t=0}, which we rely upon for example in our final
Chapter.
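The identity [X, Y] = (d/dt)(e^{tX} Y e^{−tX})|_{t=0} can be verified numerically by central differences, a useful sanity check when implementing adjoint actions. The following sketch (illustrative, with hypothetical helper names) uses random anti-Hermitian matrices, i.e. elements of u(n):

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(1)

def random_antihermitian(n):
    A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return A - A.conj().T  # an element of u(n)

X, Y = random_antihermitian(3), random_antihermitian(3)
t = 1e-6
conj = lambda s: expm(s * X) @ Y @ expm(-s * X)
finite_diff = (conj(t) - conj(-t)) / (2 * t)     # d/dt e^{tX} Y e^{-tX} at t = 0
commutator = X @ Y - Y @ X
print(np.max(np.abs(finite_diff - commutator)))  # small residual
```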
An important connection between algebraic and geometric methods in quantum
information and machine learning arises from the correspondence between Lie alge-
bras g and the tangent plane (or bundle) of a manifold T M. The correspondence
(theorem B.2.17) provides that each Lie algebra g of G is equivalent to the tan-
gent space to G at the identity element of G. This equivalence means that g has
a representation as X ∈ Mn (C). The vectors X represent derivatives at t = 0 of
smooth curves γ : R → G with γ(0) = I, γ ′ (0) = X, allowing characterisation of
sequences of unitaries (Uj (t)) ∈ G (here j indexes the sequence) as paths γj (t), i.e.
we are equating curves with sequences of unitaries. Note that G can also carry
a representation in Mn(C). Furthermore, Lie group and Lie algebra homomorphisms
(section B.2.8) are related via Φ : G → H and φ : g → h such that dΦ = φ. The homomorphism is
fundamental to important theorems such as the Baker-Campbell-Hausdorff theorem
(section B.2.9):
 
exp(X) exp(Y) = exp( X + Y + ½[X, Y] + (1/12)[X, [X, Y]] + ··· )

for X, Y ∈ g, which facilitates approximations for quantum control. Moreover it
importantly allows us to understand how evolutions on M (represented via diffeo-
morphic exponential group elements) are shaped by not just Lie algebra elements
individually, but their conjugation (adjoint action). For example in our final Chapter
5 with Cartan decompositions (section 5.3.7) g = p ⊕ k, it is the fact that [p, p] ⊆ k
(together with the other relations constitutive of the Lie triple property of such
symmetric spaces) which fundamentally allows elements in K ⊂ G to be reachable
via application of controls to p and for such curves γ(t) to have a representation in
the homogeneous space G/K.
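For small generators the Baker-Campbell-Hausdorff series truncates well, which is what makes it useful for approximation in quantum control. A minimal numerical check (a sketch; the third-order term −(1/12)[Y, [X, Y]] is included alongside the terms shown above) for su(2) elements:

```python
import numpy as np
from scipy.linalg import expm, logm

eps = 0.05
X = -1j * eps * np.array([[0, 1], [1, 0]])   # -i eps sigma_x
Y = -1j * eps * np.array([[1, 0], [0, -1]])  # -i eps sigma_z
comm = lambda A, B: A @ B - B @ A

Z_bch = (X + Y + 0.5 * comm(X, Y)
         + (comm(X, comm(X, Y)) - comm(Y, comm(X, Y))) / 12)
Z_exact = logm(expm(X) @ expm(Y))
print(np.max(np.abs(Z_bch - Z_exact)))  # residual at fourth order in eps
```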

2.3.2 Representation theory

Representation theory is an important feature of the application of Lie groups (and
non-commutative algebra) in quantum control and machine learning settings. It is of
particular relevance to our final Chapter, but also others below. Denoting GL(n; C)
as GL(V) we define a representation of G as a finite-dimensional continuous homo-
morphism π : G → GL(V); for the corresponding Lie algebra representations, π([X, Y]) = [π(X), π(Y)] (definition B.3.1).
Lie algebras may also carry representations into GL(V ). Representations satisfy a
number of properties set out in section B.3.2 which allow groups and algebras to
be studied by way of such representations. An important such representation is
the adjoint representation. Given a Lie group G and Lie algebra g with smooth
isomorphism Adg(h) = ghg⁻¹ where g, h ∈ G, there exists a corresponding Lie al-
gebra map adX : g → g such that exp(Adg(X)) = g(exp(X))g⁻¹ and
Ad_{exp(X)} = exp(adX). Importantly, the adjoint representation maps Lie al-
gebra elements to automorphisms of the Lie algebra itself. The benefit of adopting
the adjoint representation is also to some degree one of computational efficiency
where symmetry structure can be more easily diagnosed or embedded (such as in
the case of equivariant neural networks [30, 34, 37]). Adjoint expansions (section
B.3.5) are an important tool in later Chapters (particularly in Chapter 5). Recall-
ing group adjoint action as Adh(g) = hgh⁻¹ with the related Lie algebra adjoint
action adX(Y) = [X, Y], we note that such conjugation can be expanded in sine and
cosine terms (equation (B.3.2)):

e^{iΘ} X e^{−iΘ} = e^{i adΘ}(X) = cos(adΘ)(X) + i sin(adΘ)(X)        (2.3.1)

This relation together with the sine and cosine (equation (B.3.7)) expansions of
the adjoint action are central to the closed-form results for the minimum time
(equation (5.3.13)) and time-optimal Hamiltonians for quantum control involving
Riemannian symmetric spaces (see section 5.6).
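Equation (2.3.1) can be checked numerically by vectorising the adjoint action, adΘ = Θ ⊗ I − I ⊗ Θᵀ for row-major flattening. The following sketch (illustrative parameter choices) confirms that conjugation agrees with the cosine and sine expansion of adΘ:

```python
import numpy as np
from scipy.linalg import expm, cosm, sinm

Theta = 0.3 * np.array([[1, 0], [0, -1]], dtype=complex)  # 0.3 * sigma_z
X = np.array([[0, 1], [1, 0]], dtype=complex)             # sigma_x

# ad_Theta acting on flattened (row-major) matrices: Theta X - X Theta.
ad = np.kron(Theta, np.eye(2)) - np.kron(np.eye(2), Theta.T)

lhs = (expm(1j * Theta) @ X @ expm(-1j * Theta)).flatten()
rhs = (cosm(ad) + 1j * sinm(ad)) @ X.flatten()
print(np.max(np.abs(lhs - rhs)))  # agreement to machine precision
```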
Lie algebras are defined over R (real) and C (complex) fields. Complex Lie
algebras g (over C) and real Lie algebras g0 (over R) are related via complexification
and the related concept of the real form of a complex Lie algebra. A real vector
space V(R) is the real form of a complex vector space W when the complexification
of V is given by W = V ⊕ iV (definition B.3.3). In Lie algebraic terms this
is g = g0^C = g0 + ig0 (equation (B.3.9)). In particular, this allows us to write
A ∈ Mn (C) as a real-valued matrix only (equation (B.3.10)) which we leverage in
Chapter 4.

2.3.3 Cartan algebras and Root-systems


Cartan decompositions of symmetric space Lie groups provide a technique for solving
certain classes of time-optimal quantum control problems. Cartan algebraic formal-
ism (see section B.4 for an expanded summary) is crucial to the symmetry-reduction
techniques examined in this work. Moreover, Cartan’s pioneering methods are at
the heart of establishing connections between algebraic (Lie-theoretic) formalism
and differential geometric framing. Here we define Cartan subalgebras and connect
them with root systems and accompanying diagrammatic and geometric interpreta-
tions (summarising primarily standard literature from [2, 9, 10, 47]).
Root systems are primarily used in the classification and structure theory of Lie
algebras, while weight systems are used in the study and classification of their repre-
sentations. In particular, in both cases we effectively probe the structure of g using
the adjoint action in the adjoint representation to reveal its symmetry structure
used in time-optimal synthesis. This is the case for example in Chapter 5 where the
commutation table, Table 5.1, evidences the Cartan commutation relations (section
B.5) via the adjoint action. For a subalgebra h ⊂ g with a (diagonal) basis H and
a dual space h∗ with elements ej ∈ h∗ (a dual space of functionals ej : h → C), we
can construct a basis of g given by {h, Eij} where Eij is 1 in the (i, j) location and
zero elsewhere (section B.4.2). We study the adjoint action of elements H ∈ h on
each such eigenvector:

adH(Eij) = [H, Eij] = (ei(H) − ej(H))Eij = α(H)Eij

Here α = ei − ej is a linear functional on h such that α : h → C. Such functionals α
are denoted roots. The Lie algebra g can then be decomposed as (equation (B.4.1)):

g = h ⊕_{i≠j} ℂ Eij = h ⊕_{α∈∆} gα        gα = {X ∈ g | adH(X) = α(H)X, ∀H ∈ h}

which satisfies certain commutation relations given in equation (B.4.2). Roots may
then be placed in a positive or negative ordered sequence. Each root is a non-
generalised weight of adh (g) (a root of g with respect to h). The set of roots is
denoted ∆(g, h). The weights (definition B.4.1) for a given representation π : h →
V(C) can then be constructed as a generalised weight space Vα = {v ∈ V | (π(H) −
α(H)·1)ⁿ v = 0, ∀H ∈ h} for Vα ≠ 0 (v ∈ Vα are generalised weight vectors with
α the weights). Here π is the adjoint action of h on X ∈ g. Weights allow g to
be decomposed as g = h ⊕α∈∆ gα (B.4.3). Note that g0 is then the weight space
subalgebra associated with the zero weight under this adjoint action. This leads
to the important definition of a Cartan subalgebra (definition B.4.2) h ⊂ g as a
nilpotent Lie algebra of a finite-dimensional complex Lie algebra g such that h = g0 .
Cartan subalgebras are maximally abelian subalgebras of g. There may be multiple
Cartan subalgebras in g and each is conjugate via an automorphism of g (which
for Cartan decompositions means under conjugation by an element of the isometry
group K). As we discuss in Chapter 5, the choice of Cartan subalgebra is of pivotal
significance to the application of global Cartan decompositions for time-optimal
control.
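The root construction above can be made concrete for sl(3, ℂ): taking h to be the traceless diagonal matrices, the adjoint action of a generic H ∈ h on each matrix unit Eij returns the eigenvalue (ei − ej)(H). A small sketch (illustrative choice of H):

```python
import numpy as np

def E(i, j, n=3):
    """Matrix unit with a 1 in the (i, j) entry."""
    M = np.zeros((n, n))
    M[i, j] = 1.0
    return M

H = np.diag([1.0, 2.0, -3.0])  # a generic (traceless) Cartan element
for i in range(3):
    for j in range(3):
        if i != j:
            bracket = H @ E(i, j) - E(i, j) @ H  # = (h_i - h_j) E_ij
            print(f"(e{i+1} - e{j+1})(H) = {bracket[i, j]:+.0f}")
```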
Cartan subalgebras allow the generalisation of the concept of roots to a root
system. In this formulation (definition B.4.3), roots are the non-zero generalised
weights of adh (g) with respect to the Cartan subalgebra. Recall we denote the set of
roots α is ∆(g, h). Cartan subalgebras h are the maximally abelian subalgebras such
that the elements of adg (h) are simultaneously diagonalisable given that [Hk , Hj ] =
0 for Hk , Hj ∈ h. Roots act as functionals such that for α ∈ ∆, we have that
α(H) = B(H, Hα ), ∀H ∈ h where B(·, ·) denotes the Killing form on g and h (see
section B.4.4 for more detail). From this concept of roots, we can define abstract
root systems (section B.4.5) that satisfy a range of properties set out in definition
B.4.4. These in turn allow roots to be framed in geometric terms related to the Weyl
group and concepts of angles between roots (see section B.4.6) which are expressed
via Dynkin diagrams (see below), together with an ordering of roots (see section
B.4.7). Chapter 5 includes derivation of a root system, for example, in relation to
SU (3) connected with the introduction of our method for time-optimal synthesis.
Cartan algebras can be used to construct a Cartan-Weyl basis which is a basis of the
Lie algebra g comprising the Cartan subalgebra h together with root vectors, where
each root can be thought of as a symmetry transformation (see equation (B.4.1)).
Abstract root systems moreover allow the construction of a Cartan matrix (def-
inition B.4.5) where, given ∆ ⊂ V and a simple root system Π = {αk}, k = 1, ..., n =
dim V, the Cartan matrix of Π is an n × n matrix A = (Aij) whose entries are given
by Aij = 2⟨αi, αj⟩/|αi|². Cartan matrices are then used to construct Dynkin dia-
grams (definition B.4.6) as set out in Figure B.1. Dynkin diagrams allow for visual
representation of root systems, encoding angles and lengths between roots. From a
quantum control perspective, roots can be related to the transition frequencies be-
tween different energy levels of a quantum system. Importantly, the Cartan matrix
(definition B.4.5), derived from the inner products of simple roots, tells us about
the relative strengths and coupling between these transitions (see equation (5.8.14)
for a generic example).
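For instance, for the A2 root system associated with su(3), the simple roots α₁ = e₁ − e₂ and α₂ = e₂ − e₃ yield the familiar 2 × 2 Cartan matrix. A minimal sketch of the computation Aij = 2⟨αi, αj⟩/|αi|²:

```python
import numpy as np

# Simple roots of A2 in the e-basis.
alpha = np.array([[1, -1, 0], [0, 1, -1]], dtype=float)
A = np.array([[2 * a @ b / (a @ a) for b in alpha] for a in alpha])
print(A)  # [[ 2. -1.] [-1.  2.]]
```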

2.3.4 Cartan decompositions

We conclude this section with a discussion of the key concept of Cartan decompo-
sitions (section B.5) of semisimple Lie groups and their associated algebras. For a
given complexified semisimple Lie algebra g = g0^C, a corresponding Cartan involu-
tion θ is associated with the decomposition g = k ⊕ p. In turn, this allows a global
decomposition G = KAK where K = e^k and A = e^a with a ⊂ p (a maximal
abelian subspace of p), which is a generalisation of polar decompositions of matrices.
For classical semisimple groups, the real matrix Lie algebra g0 over R, C is closed un-
der conjugate transposition, in which case g0 is the direct sum of its skew-symmetric
k0 and symmetric p0 members. It can be shown [9] that for semi-simple Lie alge-
bras this admits a decomposition g0 = k0 ⊕ p0 which in turn can be complexified as
g = k ⊕ p. To obtain this decomposition, we consider the fundamental involutive
symmetry automorphism on g denoted the Cartan involution. Using the Killing
form B on X, Y ∈ g, a Cartan involution (definition B.5.1) is an involution θ for
which Bθ(X, Y) = −B(X, θY) is a positive definite symmetric bilinear form. From
the existence of this involution, we infer the existence of the relevant Cartan decom-
position given by g = k ⊕ p with k the +1 symmetric eigenspace θ(k) = k and p the
-1 anti(skew)symmetric eigenspace θ(p) = −p satisfying the following commutation
relations from equation (B.5.4):

[k, k] ⊆ k [k, p] ⊆ p [p, p] ⊆ k

The subspaces k, p are orthogonal with respect to the Killing form in that Bg (X, Y ) =
0 = Bθ (X, Y ) i.e. for X ∈ k and Y ∈ p we have B(X, Y ) = 0. In a quantum control
context, this aligns with the decomposition of H corresponding to states generated
by p and k. The remarkable and important feature of the commutations above is that
under the operation of the Lie bracket, Hamiltonians comprised solely of elements
in p can in fact reach targets in k despite the fibre bundle structure partitioning
the space (see the following section). The corresponding Lie group decomposition
is given by G = KAK where K = exp(k), A = exp(a) (definition B.5.3) for a ⊂ h.
Every element g ∈ G thus has a decomposition as g = k1 ak2 where k1 , k2 ∈ K, a ∈ A.
This allows unitary channels to be decomposed as U = kac and U = pk (see [15]
for the latter) for k, c ∈ exp(k) = K and p ∈ exp(p). We note also, as discussed
in Chapter 5, that for our global decomposition method, we seek a ⊂ p which
means that the Cartan subalgebra must have a compact and non-compact subset
i.e. h ∩ k ̸= 0 and h ∩ p ̸= 0, where we seek the maximally non-compact (most
overlap with p) subalgebra. Because h may lie entirely within k, it may be that a
Cayley transform is required which in essence involves conjugation of elements of
h with a root vector (see Knapp [9] and section B.5.3 for detail). An example of
Cayley transforms for SU (3) is set out in section 5.8. We now progress to cover
important concepts from differential geometry applicable to methods used in this
work, especially those from geometric control theory.
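The Cartan commutation relations above are easy to verify in the simplest case. Taking k = span{−iσz/2} and p = span{−iσx/2, −iσy/2} in su(2) (an illustrative split, with a hypothetical helper in_span), the three inclusions can be checked numerically:

```python
import numpy as np

sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)
k = [-0.5j * sz]
p = [-0.5j * sx, -0.5j * sy]

def in_span(M, basis):
    """Check whether M lies in the span of the basis matrices."""
    flat = np.array([b.flatten() for b in basis]).T
    coeff = np.linalg.lstsq(flat, M.flatten(), rcond=None)[0]
    return np.allclose(flat @ coeff, M.flatten())

comm = lambda A, B: A @ B - B @ A
print(all(in_span(comm(a, b), k) for a in k for b in k))  # [k, k] in k
print(all(in_span(comm(a, b), p) for a in k for b in p))  # [k, p] in p
print(all(in_span(comm(a, b), k) for a in p for b in p))  # [p, p] in k
```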

2.4 Differential geometry

2.4.1 Manifolds and tangents


In this section we provide a synopsis of key concepts from geometry related to quan-
tum information processing and machine learning. A more extensive exegesis is set
out in Appendix C. This section and the Appendix summarise standard material
from [2, 9, 44, 48–52]. We begin with an outline of basic concepts such as manifolds,
tangent spaces, tensor fields (including vector fields, integral curves and local flows),
together with cotangent spaces, metrics and tangent planes. We then explore fibre
bundle theory and its central use in geometric contexts via partitioning bundles
(tangent bundles for our purposes) into horizontal and vertical subspaces. We de-
velop the theory of connections on a bundle as a lead into definitions of geodesics and
parallel transport. We relate these concepts to those of curvature and the definitions
of Riemannian (and subRiemannian) manifolds and symmetric spaces (of particular
importance as we model our quantum control problems as symmetric space control
problems). We then discuss subRiemannian geometry before discussing geometric
control theory and variational methods such as the Pontryagin Maximum Principle.
The study of geometric methods usually begins with the definition and study
of the properties of differentiable manifolds. One begins with the simplest concept
of a set of elements, characterised only by their distinctiveness (non-identity), akin
to pure spacetime points M. The question becomes how to impose structure upon
M in a useful or illuminating manner for explaining or modelling phenomena. A
differentiable manifold (definition C.1.2) is a Hausdorff topological space M together
with a global differentiable structure. The pure manifold M lacks sufficient structure
for analytic operations such as differentiability to be well-defined. For this, we
associate to points in M an analytic structure equipped with sufficient structure for
differential operations. This is via equipping M with coordinate charts (definition
C.1.1), each of which is a pair (U, ϕ) on M where U ⊂ M (an open subset) and
ϕ : U → Km where p 7→ ϕ(p) (usually taking K = R or C) is a homeomorphism. The
coordinates of a point p are maps from p ∈ U ⊂ M to the Euclidean space Km i.e.
(ϕ1 (p), ..., ϕm (p)). The coordinate functions are each of these individual functions
ϕµ : U → K. Lie groups (proposition C.1.1) can then be construed geometrically: a group
G equipped with a differentiable structure is a smooth differentiable manifold via
the smoothness of group operations (arising from the continuity properties of G, i.e.
being a topological group). We are often interested in functions defined on p ∈ M.
Functions that preserve such structure are the C^r functions, those
which are differentiable up to order r, where differentiability requires r ≥ 1 while
smoothness requires r = ∞, i.e. f ∈ C^∞(M) (see section C.1.1 for more detail).
Tangent spaces T M (section C.1.1.1) of a manifold are a crucial concept for
geometric characterisation of quantum control and machine learning. We start with
the concept of a curve (definition C.1.3) on the manifold γ(t) ∈ M being a smooth
(i.e. C ∞ ) map γ : R ⊃ (−ϵ, ϵ) → M, t 7→ γ(t) from an interval in R crossing 0 into
M, where γ(0) = p (the ‘start’ of the curve). With this definition, we then define
tangents (definition C.1.4) in terms of the derivatives of curves in a local coordinate
system (x1 , ..., xm ) such that two curves are tangent if their derivatives in Rm at
γ(t) = p align, that is:

(d/dt) x^i(γ₁(t))|₀ = (d/dt) x^i(γ₂(t))|₀        i = 1, ..., m.

Thus the intuitive notion of a vector ‘sticking out’ from a surface or along a curve
is replaced with analytic (derivation) properties of M via its representation in lo-
cal charts. The equivalence class of curves satisfying this condition at p ∈ M is
sometimes denoted [γ], hence we can regard tangents as vectors. The tangent space
Tp M is defined as the set of all tangents at p ∈ M, while the tangent bundle T M is
defined as the union of all Tp M. We also define a projection map from the tangent
bundle to M via π : T M → M which, as we shall see, is related in Lie theoretic
contexts to the exponential map from g to G. Tangent vector v ∈ Tp M can be used
to define the directional derivative on f ∈ C∞(M) via considering v as an operator on
f via v(f) = (d/dt) f(γ(t))|₀. The directional derivative points in the direction of steepest
ascent and is central to defining the all-important gradient (definition C.1.5), central
also to machine learning and quantum information processing. For f : M → R on M,
the gradient of f at a point p ∈ M, denoted ∇f (p) is the unique vector v ∈ Tp M
satisfying v(f ) = ⟨∇f (p), v⟩. Here v(f ) denotes the directional derivative of f in the
direction of v (as above), and ⟨·, ·⟩ denotes the applicable metric. We discuss gra-
dients at some length in later sections especially in the context of backpropagation
and stochastic gradient descent.
While intuitively one often regards a problem as confined to a single M and
bundle T M, each Tp M in the general case is not necessarily identical. For a map
between manifolds h : M → N , then we can define a corresponding pushforward
(definition C.1.6) as taking a vector in the tangent space associated with p ∈ M,
namely v ∈ Tp M and represented as h∗ (v) to the tangent space associated with the
element of N i.e. h∗ : T M → T N. As discussed in section C.1.2, tangents can be
characterised as operators or linear maps acting on functions f ∈ C∞(M) such that
elements of T M act as derivations. Each tangent vector can be written as v = ∑_{µ=1}^m v^µ (∂µ) where
(∂µ) represent a basis of partial derivatives of T M with v^µ ∈ R the component
functions (equation (C.1.11)).

2.4.2 Vector fields

We construct the notion of a vector field as an assignment of Tp M to each p ∈
M satisfying certain linear properties (see definition C.1.7) and closure under the
commutator (Lie derivative) [X, Y ] (see section C.1.2.2 for detail). Vector fields,
having a basis in differential operators, can be thought of as operators on f ∈
C ∞ (M) which are themselves defined on M. We denote the set of vector fields
X(M). Thus we can map between Lie algebras and vector fields in specific cases of
interest in quantum control. Vector fields are importantly regarded as generators of
infinitesimal diffeomorphisms (curves γ(t)) on M in the form δ(xµ ) = ϵX µ (x). For
optimal control where all U ∈ G (or p ∈ M) are reachable, we require M to be filled
with a family of curves γ(t) such that the tangent vector to any such curve at γ(p)
is the vector field Xp . Such curves are denoted integral curves defined as a curve
γ : (−ϵ, ϵ) → M, t 7→ γ(t) such that (at the identity in R) we have γ(0) = p and the
push-forward γ∗ equates to the vector field at p, that is γ∗(d/dt) = Xp (definition
C.1.8). The vector field can then be regarded as the generator of infinitesimal
translations on M constituted by a group of one-parameter (e.g. parametrised by t ∈
I ⊂ R) diffeomorphisms, such that vector fields are equated with diffeomorphisms
in an equivalent way to Lie algebras with Lie groups via the exponential map. Thus
we can connect Lie theory and differential geometric formulations of curves to map
generators of curves in g to vector fields, facilitating later discussion of Hamiltonians
generating (geodesic) curves γ(t) which are integral curves. More particularly, we
have local flow (section C.1.2.4) which provides a way to construct integral curves
from the vector field in a way parametrised by t.
In earlier sections we introduced notions of dual spaces in the context of quantum
information processing and algebra. We relate these to important concepts of one-
forms, n-forms and tensor products. From dual spaces we define cotangent vectors
as a map from the tangent space to R i.e. k : Tp M → R (definition C.1.9). This
can be thought of as selecting, for v ∈ Tp M the appropriate dual k such that ⟨k, v⟩
is in R. The cotangent space is the space of all cotangent vectors i.e. at p ∈ M it is
the set Tp∗ M of all such linear maps constituting cotangent vectors. If Tp M has a
basis {e1, ..., en}, then the dual basis for Tp∗M is a collection of vectors {f¹, ..., fⁿ}
uniquely specified by the criterion that ⟨f i , ej ⟩ = δji ∈ R i.e. they contract to unity
(equation (C.1.16)). Understanding how vectors and dual basis elements essentially
contract each other down to scalars is a fundamental point of tensorial contractions.
The cotangent bundle T∗M is the collection (a vector bundle) of all such cotangent
vectors for all p ∈ M, i.e. T∗M = ⋃_{p∈M} Tp∗M.

From cotangent vectors, we then construct the definition of a one-form ω on M
as the assignment of a cotangent vector (being a smooth linear map) ωp to every
point p ∈ M (definition C.1.10) i.e. ω : X → R, X ∈ X(M). We discuss this in more
detail in section C.1.3.2, including the relationship of one forms to inner products
and metrics. As noted in the literature, given h : M → N, while we cannot in gen-
eral obtain a differential form to act as a pushforward h∗ : Tp M → Th(p) N between
tangent spaces, we can always obtain a pullback that maps in the other direction
h∗ : T∗_{h(p)} N → Tp∗M (definition C.1.11). Pullbacks provide a means of transforming
differential forms, tensors and other objects between manifolds in order to under-
stand how they transform. In Lie derivative contexts (section C.1.3.4), for a vector
field X on M with flow ϕ_t^X, the pullback ϕ_t^{X∗} ω denotes how ω transforms under
each diffeomorphism ϕ_t. The Lie derivative of ω with respect to X, L_X ω, is then the
rate of change of ϕ_t^{X∗} ω as it ‘travels backwards’ to 0 (equation C.1.21), describing how
ω instantaneously changes along the flow of X. In the general case, we move from
the Lie derivative to the exterior derivative to understand how differential forms df
(for any f ∈ C ∞ (M)) change locally on manifolds M (see definition C.1.19).

2.4.3 Tensors and metrics


With the formalism of vectors and forms, we can then express tensor products in
terms of tensor products of tangents and cotangents. A tensor of type (r, s) ∈ Tp^{r,s}M
at a point p ∈ M belongs to the tensor product space (equation (C.1.23)):

Tp^{r,s}M := [⊗^r Tp M] ⊗ [⊗^s Tp∗M]        (2.4.1)

i.e. r tensor products of the tangent space with itself tensor-producted with s ten-
sor products of the dual cotangent space. Moreover, tensors can be regarded as
mappings taking elements from Tp M and the dual space Tp∗ M and contracting
them down to scalars (equation (C.1.24)). This formalism of contraction is fun-
damental to calculating metrics (via metric tensors), curvature (via the Riemann
curvature tensor) and other operations. Recalling the contraction to unity of ba-
sis elements in equation (C.1.16), we can see that contraction of arbitrary tensors
⟨af i , bej ⟩ = ab ⟨f i , ej ⟩ = abδji effectively multiplies the remaining tensors (vectors
and/or covectors) by the scalar product of coefficients (see section C.1.4). This
can be seen in particular for tensor contractions (definition C.1.15). In quantum
information and control contexts, the quantum measurement (via the partial or full
trace) can be considered in effect a tensorial contraction in this way. Thus tensors
play a central role in quantum theory and dynamics.
A tensor of central importance to calculation of time-optimal unitary sequences,
measurement and machine learning with quantum systems is the metric tensor (sec-
tion C.1.4.1) defined as a [0,2]-form tensor field mapping gp : Tp M × Tp M → R
given by g := gij dxi ⊗ dxj with metric components gij = ⟨ei , ej ⟩ and inverse com-
ponents g ij = ⟨dxi , dxj ⟩. As we discuss below, it is the existence of a metric tensor
on Riemannian and subRiemannian (symmetric) spaces of interest that allows for
calculation of time-optimal metrics, such as arc-length (minimal time) or energy.
The general principles above can be related to n-forms (section C.1.4.2) in order to
build up the concept of an exterior product of forms ω1 ∧ ω2 (definition C.1.18) and
exterior derivative (definition C.1.19). The former is related to Cartan’s structural
equations (theorem C.2.2) and the Maurer Cartan form which can be related to
measures of curvature for example.
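As a small worked instance of tensorial contraction, the inner product ⟨v, w⟩ = g_ij v^i w^j is a single einsum; the metric below is the round-sphere metric diag(1, sin²θ) evaluated at an (assumed) θ = π/3:

```python
import numpy as np

g = np.array([[1.0, 0.0],
              [0.0, np.sin(np.pi / 3) ** 2]])  # sphere metric at theta = pi/3
v, w = np.array([1.0, 2.0]), np.array([0.5, -1.0])
print(np.einsum('ij,i,j->', g, v, w))  # g_ij v^i w^j
```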

2.4.4 Tangent planes and Lie algebras


A fundamental aspect of Lie group theory is the isomorphism between the set of
all left-invariant vector fields on a Lie group G, denoted L(G), and the tangent
space Te G at the identity element e of G. This isomorphism implies that one can
understand the behaviour of left-invariant vector fields by examining transformations
within Te G (section C.1.5). The concept of left- and right-invariance of a group
action begins with elementary concepts of left translation lg : G → G, g ′ 7→ gg ′ and
equivalent right-translation. We then say that a vector field X on G is left-invariant
if lg∗ X = X, ∀g ∈ G i.e. that the left g-action on X (represented by the pushforward
lg∗ - recall we are mapping algebraic and geometric concepts here) keeps X constant.
Left invariance also applies to the Lie derivative (equation (C.1.34)). Thus we
can construe g in terms of invariant properties of vector fields X. Exponential
maps can be framed in terms of integral curves, namely as the unique integral
curve t ↦ γ^{L^A}(t) satisfying A = γ^{L^A}_∗ (d/dt)|₀ (section C.1.5.3) of the left-invariant
vector field L^A associated with the identity, that is γ^{L^A}(0) = e, and which
is defined for all t ∈ R. The notation γ^{L^A} refers to the integral curve generated
by the left-invariant vector field LA originating from the identity element e ∈ G,
reflective of the idea that the evolution of curves γ(t) (including those with which
we are concerned in relation to time-optimality) can be studied in terms of the
canonical Lie algebra associated with Te G. In more advanced treatments, which
we leverage in Chapter 5, the Maurer-Cartan form (definition C.1.20) provides a
useful way identifying the geometric encoding of symmetries of G and g (see section
C.1.5.4), in particular by more explicitly showing how underlying structural features
of g are related to geometric properties such as curvature and indeed Hamiltonian
evolution. The Maurer-Cartan form is a g-valued one-form ω on G given by ωg(v) =
l_{g⁻¹∗}(v). The Maurer-Cartan equation then describes the differential form via
dω^α + ½ ∑_{β,γ=1}^n C^α_{βγ} ω^β ∧ ω^γ = 0 (see equation (C.1.39) and definition C.1.20). In Chapter
5 we relate the Schrödinger equation to the Maurer-Cartan form as a means of
connecting to Lie group-related forms (see section 5.6 and equation (5.6.8)).

2.4.5 Fibre bundles and Connections

Geometric concepts of fibre bundles and connections are integral to geometric control
theory and geometric techniques in quantum information processing and machine
learning. They represent central concepts leveraged in Chapters 4 and 5. They play a
fundamental structural role in the abstract underpinning of how vectors transform
across manifolds M, curvature and parallel transport. Usually in quantum comput-
ing contexts, one assumes the existence of a single Hilbert space H within which
vectors are transported. In geometric framings, instead usually one begins with the
type of differentiable manifold M equipped with a topology of interest. One then
associates to points in M additional abstract structure to enable actions like dif-
ferentiation and mappings of interest to be well-defined, such as a real or complex
valued vector space, but it may be some other type of structure. In order to ob-
tain the type of structural consistency quantum information researchers are used to
(i.e. the ease of dealing with a single H), one needs to impose additional geometric
structure, including bundles and connections. To see this, note that in geometric
framings, the evolution of a system from one state to another is represented as a
transformation from p → p′ for p, p′ ∈ M. Each p ∈ M has its own abstract space
e.g. its own vector space. One must therefore define how vectors in one space e.g.
Tp M transforms to Tp′ M.
To handle this abstract formulation, geometry adopts the concept of a bundle,
defined as a triple (E, π, M), where E (denoted the total space) and M are linked
via a continuous projection map π : E → M with a corresponding preimage map
π⁻¹ : M → E. The idea of a bundle encapsulates the means of assigning abstract
structures such as vector spaces to points p ∈ M. These abstract structures are the
fibres associated with p. The total space is then the union of all fibres. Thus we
define the total (bundle) space as E = ⋃_{p∈M} Fp, with Fp being fibres which remain
abstract at this stage. Formally (definition C.1.21), a fibre over p is the inverse
image of p under π. It arises via the map π −1 : M → T M, an example of a
fibre bundle associating p ∈ M with the tangent space Tp M. The projection π
associates each fibre Fp with a point p ∈ M, where Fp = π −1 ({p}) defines Fp as
the preimage {p} of p under π. Certain bundles have the special property that the
fibres π −1 ({p}), p ∈ M are all homeomorphic (diffeomorphic for manifolds) to F . In
such cases, F is known as the fibre of the bundle and the bundle is said to be a fibre
bundle. For vectors, this is the set of all vectors that are tangent to the manifold
at the point p. The fibre bundle is sometimes visualised in diagrammatic form (see
Isham [48] §5.1.2):
F ↪ E −π→ M
An important canonical fibre bundle is the principal fibre bundle (section C.1.6.1)
whose fibre acts as a Lie group. A principal fibre bundle has a typical fibre that is
a Lie group G, and the action of G on the fibres is by right multiplication, which
is free and transitive. The fibres of the bundle are the orbits of the G-action on E;
because the action is free and transitive, each fibre is diffeomorphic to G
itself. When dealing with principal fibre
bundles, often the notation P = ⋃_{p∈M} Fp is used to denote the total space (i.e. E)
in order to emphasise that each fibre is isomorphic to G itself and that all fibres
in the bundle are homogeneous i.e. they are all structurally isomorphic (so we can
utilise a single representation for each). This is not necessarily the case in general
for base spaces E.
To obtain the more specific form of a vector bundle or where fibres are the Lie
algebras g themselves (which intuitively connects with the T M ∼ g correspondence),
we rely on the definition of an associated bundle (definition C.1.23) which effectively
allows fibres bundles that are vector bundles (definition C.1.24), being a type of
fibre bundle in which each fibre exhibits the structure of an n-dimensional vector
space. Vector bundles usefully allow for the definition of connections (see below)
which describe how fibres (or vector spaces) are connected over different points in
M. Intuitively the idea of a connection is a means of associating vectors between
infinitesimally adjacent tangent planes Tp M → Tp+dp M as one progresses from
γ(t) = p to γ(t + dt) = p + dt on M.
Connections (section C.1.7) are of fundamental importance to results in our final
Chapter and also to definitions of vertical and horizontal subspaces in subRieman-
nian control problems further on. A connection on a principal bundle defines a
notion of horizontal and vertical subspaces within the tangent space of the total
space. Hence they are a fundamental means of distinguishing between Riemannian
and subRiemannian manifolds via the decomposition of fibres (and Lie algebras)
into horizontal and vertical subspaces. This distinction is crucial for defining paral-
lel transport and curvature, concepts that are central to understanding the dynamics
and control of systems with symmetry. A connection on P (which we treat in terms
of a vector bundle) provides a smooth splitting of the tangent space Tp P at each
point p ∈ P into vertical and horizontal subspaces, Tp P = Vp P ⊕ Hp P , where Vp P
is tangent to the fibre (recall we measure tangency here in terms of inner products)
and Hp P is isomorphic to Tπ(p) M. The vertical subspace (definition C.1.25) Vp P at
p ∈ P is defined as the kernel of the differential of the projection map π : P → M.
Vp P consists of tangent vectors to P at p that are ‘vertical’ in the sense that they
point along the fibre π −1 (π(p)). These vectors represent infinitesimal movements
within the fibre itself, without leading to any displacement in the base manifold
M. The horizontal subspace Hp P (definition C.1.26) at p ∈ P consists of vectors
that are ‘horizontal’ in the sense that they correspond to displacements that lead
to movement in the base manifold M when considered under parallel transport de-
fined by the connection. A connection (definition C.1.27) (on a principal G-bundle
but we focus here on vector bundles) G → E → M is a smooth assignment of
horizontal subspaces Hp P ⊂ Tp P , to each point p ∈ M. Essentially they allow the
partitioning of the vector bundle and importantly specify the horizontal subspace
which in turn is crucial for generating geodesics along M. For example, in Chapter
4 and [53, 54], generators chosen from Hp M (the distribution ∆) are relied upon in
order to generate geodesic training data.
In Chapter 5, our control subset p ⊂ g can be construed as a horizontal sub-
space (see section C.1.7.1 for more discussion). The decomposition into vertical and
horizontal subspaces discussed above is particularly relevant to time optimal control
problems. In essence, given the Cartan decomposition (B.5.2) g = k ⊕ p we asso-
ciate k as the vertical and p as the horizontal subspace. We can then understand
the symmetry relations expressed by the commutators related to this vertical and
horizontal sense of directionality: i.e. given [k, k] ⊂ k, [k, p] ⊂ p and [p, p] ⊂ k we can
see that the horizontal generators under the adjoint action shift from p ∈ G/K to
p′ ∈ G/K, while the vertical generators in k do not translate those points in G/K.

2.4.6 Geodesics and parallel transport


We now define concepts of parallel transport, covariant differentiation and geodesics,
each of which are essential to the geometric time-optimal control formulation applied
in later Chapters. We are interested in horizontal vector fields whose flow lines move
from one fibre to another, intuitively constituting translation across a manifold
via generators in the horizontal subspace. Parallel transport shifts vector fields
vectors along integral curves such that they are parallel according to a specified
connection. The concept of horizontal lifts (definition C.1.29) is central to this
process by enabling (i) preservation of a concept of direction and (ii) preserving
‘horizontality’ such that the notion of straightness or parallelism of Euclidean space
can be extended to curved manifolds, essential for the study of geodesics. For a
smooth curve γ in M, a horizontal lift γ ↑ : [a, b] → P is a curve whose tangent
vectors are in the horizontal subspaces. The vector field (Xp↑ ) is identified as the
horizontal lift of X as it ‘lifts’ up the vector field X on M into the horizontal
subspace of T P . The requirement of Vp (Xp↑ ) = 0, indicates that that X ↑ lies entirely
in the horizontal subspace Hp M, encapsulating the essence of parallel transport as
maintaining the direction of X through the fibres of P .
Given a curve γ(t) ∈ M, the vector field X is parallel along γ(t) if the covariant
derivative vanishes along the curve. We denote such fields as parallel vector fields
(definition C.1.30). The notion of parallel transport along γ from γ(a) to γ(b) is
defined as a map τ : Tγ(a) M → Tγ(b) M satisfying τ : π −1 ({γ(a)}) → π −1 ({γ(b)})
(definition C.1.31). In more familiar language this can be shown to be equivalent
to ∇γ̇(t) X = 0 by maintaining vectors in the horizontal subspace Hp M. Note
sometimes we define for γ̇(t) ∈ Tp M the terms ∇γ̇(t) := ∇γ(t) where γ(t) is any
curve that belongs to the equivalence class of [γ̇(t)] (Isham §6.7). The covariant
derivative (definition C.1.33) ∇ can then be defined in terms of parallel transport
as:

∇γ X := lim_{t→0} [ τt⁻¹ X(γ(t)) − X(γ(0)) ] / t

where X ∈ X. In this work we generally use the notation ∇γ̇(t) to emphasise the
covariant derivative is with respect to γ̇(t). The notation in geometric settings
(section C.1.10) can get quite dense, but we can see in the definition above that τt−1
transports vectors back along γ(t) → γ(0) = p0 while preserving parallelism. Here
X(p0 ) can be thought of as the initial vector field at γ(0) = p0 (the start of our
curve) and X(γ(t)) the vector field at some later point γ(t) with τt−1 X(γ(t)) the
vector at a later time transported back. The limit above is then zero when
τt⁻¹X(γ(t)) = X(γ(0)), which requires identity between the vectors; hence they
are ‘parallel’. The ∇X operator is also linear in X(M) (which can also be regarded
as a module over C∞(M)). These properties are expressed by considering ∇X as
an affine connection, an operator ∇ : X(M) × X(M) → X(M) which associates
with X ∈ X(M) a linear mapping ∇X of X(M) satisfying affine conditions (see
definition C.1.34). As we mention below, Riemannian manifolds are equipped
with an important unique affine connection denoted the Levi-Civita connection.
The Levi-Civita connection is unique in that it is torsion-free (so characterised by
curvature only) and by virtue of its compatibility with the Riemannian metric (i.e.
it preserves the inner product of tangent vectors under parallel transport, in turn
preserving angles and lengths along curves).

We can now define a geodesic (definition C.1.35) using such notions of parallelism
and the covariant derivative. Generally, a geodesic is a curve that locally minimises
distance and is a solution to the geodesic equation derived from a chosen connection.
Denote γ : I → M, t 7→ γ(t) for an interval I ⊂ R (which we generally without loss
of generality specify as I = [0, 1]) with an associated tangent vector γ̇(t). Here γ
is regular. Two vector fields X, Y are parallel along γ(t) if ∇X Y = 0, ∀t ∈ I. A
curve γ : I → M, t ↦ γ(t) in M is denoted a geodesic if the set of tangent vectors
{γ̇(t)} = Tγ(t) M is parallel with respect to γ, corresponding to the condition that
∇γ γ̇ = 0, which we denote the geodesic equation. In a coordinate frame the geodesic
equation is expressed in its somewhat more familiar form as (equation C.2.1):

d²u^γ/ds² + Γ^γ_{αβ} (du^α/ds)(du^β/ds) = 0

where:

Γ^γ_{αβ} = ½ g^{γµ} ( ∂g_{µα}/∂u^β + ∂g_{µβ}/∂u^α − ∂g_{αβ}/∂u^µ )

are the Christoffel symbols of the second kind (essentially connection coefficients)
with g γµ the inverse of the metric tensor and ds usually indicates parametrisation
by arc length. Solutions to this equation are geodesic curves γ(t). For Lie group
manifolds, all geodesics are generated by generators from the horizontal subspace
Hp M, but not all curves generated from the horizontal subspace are geodesics.
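The geodesic equation can be integrated directly once the Christoffel symbols are known. The sketch below (a standard textbook example, not thesis code) does so for the unit 2-sphere in (θ, φ) coordinates, where Γ^θ_{φφ} = −sin θ cos θ and Γ^φ_{θφ} = cot θ; a curve launched along the equator stays on it, as a great circle should:

```python
import numpy as np
from scipy.integrate import solve_ivp

def geodesic_rhs(s, y):
    th, ph, dth, dph = y
    ddth = np.sin(th) * np.cos(th) * dph**2  # -Gamma^th_{ph ph} dph^2
    ddph = -2.0 / np.tan(th) * dth * dph     # -2 Gamma^ph_{th ph} dth dph
    return [dth, dph, ddth, ddph]

# Start on the equator heading due east.
sol = solve_ivp(geodesic_rhs, [0, np.pi], [np.pi / 2, 0.0, 0.0, 1.0], rtol=1e-9)
print(np.allclose(sol.y[0], np.pi / 2, atol=1e-6))  # theta stays at pi/2
```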

2.4.7 Riemannian and subRiemannian manifolds

Equipped with definitions of geodesics and parallelism we can define Riemannian
and subRiemannian manifolds (section C.2). A Riemannian manifold (see definition
C.2.4) is the tuple (M, g) (i.e. a manifold M with a metric g) where to each p ∈ M is
assigned a positive definite map gp : Tp M×Tp M → R (described usually in terms of
being an inner product) and an associated norm ||Xp || : Tp M → R. Usually we first
define a Riemannian structure (definition C.2.3) as a type-(0,2) tensor field g such
that (a) g(X, Y ) = g(Y, X) (symmetric) and (b) for p ∈ M, g is a non-degenerate
bilinear form gp : Tp M×Tp M → R. A Riemannian manifold is then formally defined
as a connected C ∞ (M) manifold with a Riemannian structure such that there exists
a unique affine connection satisfying (a) ∇X Y −∇Y X = [X, Y ] (zero torsion) and (b)
∇Z g = 0 (parallel transport preserving inner products where g is the Riemannian
metric). For a richer discussion of the importance of curvature and torsion to such
definitions, see definitions of the Riemann curvature tensor (definition C.2.1) and
Cartan’s structural equations in theorem C.2.2. We can then define the important
concept of a Riemannian metric (definition C.2.5) as an assignment to each p ∈ M
of a positive-definite inner product gp : Tp M × Tp M → R with an induced norm
|| · ||p : Tp M → R, v ↦ √(gp(v, v)). Note that the metric is a (0, 2)-form thus a
tensor, aligning with definition (C.1.16).
With the metric in hand, we can now posit a definition of arc length (definition
C.2.6) which is central to the concept of measuring distance on manifolds and, by
extension, time-optimal paths. Given a curve γ(t) ∈ M with t ∈ [0, T] and metric
g, the arc length of the curve from γ(0) to γ(T) is given by:

ℓ(γ) = ∫₀ᵀ (g(γ̇(t), γ̇(t)))^{1/2} dt        (2.4.2)

Assuming M is simply connected, then all p, q ∈ M can be joined via a curve
segment. We then define the metric of distance (equation (C.2.9)) between p, q ∈ M
as the infimum of the shortest curve measured according to the equation (C.2.8)
above, d(p, q) = inf γ ℓ(γ). The preservation of distances can then be understood in
terms of total geodesicity of manifolds. A sub-manifold S of a Riemannian manifold
M is geodesic at p if each geodesic tangent to S at p is also a curve in S. The
submanifold S is totally geodesic if it is geodesic for all p ∈ S. It can then be
shown that if S is totally geodesic, then parallel translation along γ ∈ S always
transports tangents to tangents, that is τ : Sp M → Sp′ M. Further background
detail relevant to the differential geometric theory underpinning the core idea of
being able to calculate (and compare) distances on manifolds M of interest (such as
key concepts of first and second fundamental forms and Gauss’s Theorema Egregium
which provides that Gaussian curvature K is invariant under local isometries) can
be found in section C.2.2.
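Equation (2.4.2) is straightforward to evaluate numerically for a sampled curve. A minimal sketch (illustrative helper name arc_length), again using the round metric on the sphere, where a quarter of the equator should have length π/2:

```python
import numpy as np

def arc_length(theta, phi, t):
    """Trapezoidal estimate of int (g(gamma', gamma'))^{1/2} dt on the unit sphere."""
    dth, dph = np.gradient(theta, t), np.gradient(phi, t)
    speed = np.sqrt(dth**2 + np.sin(theta)**2 * dph**2)
    return float(np.sum(0.5 * (speed[1:] + speed[:-1]) * np.diff(t)))

t = np.linspace(0, 1, 2001)
print(arc_length(np.full_like(t, np.pi / 2), (np.pi / 2) * t, t))  # ~ pi/2
```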

2.4.8 Symmetric spaces


Chapters 4 and 5 focus on quantum control problems where G corresponds to Lie
groups admitting Cartan decompositions. As manifolds M these groups constitute
symmetric spaces (section C.3) and may be classified according to Cartan’s regimen.
Symmetric spaces were originally defined as Riemannian manifolds whose curvature
tensor is invariant under all parallel translation. Locally, they resemble Riemannian
manifolds of the form Rn ×G/K where G is a semi-simple lie group (definition B.2.11)
with an involutive automorphism whose fixed point set is the compact group K while
G/K, as a homogeneous space, is provided by a G-invariant structure (see [9]).
Cartan thus allowed the classification of all symmetric spaces in terms of classical
and exceptional semi-simple Lie groups.
The study of symmetric spaces then becomes a question of studying specific
involutive automorphisms of semi-simple Lie algebras, thus connecting to the clas-
sification of semi-simple Lie groups. Geodesic symmetry is defined in terms of dif-
feomorphisms φ of M which fix p ∈ M and reverse geodesics through that point
i.e. when acting the map f : γ(1) = q → γ(−1) = q ′ , such that dφp : Tp M →
Tp M, X 7→ dφ(X) = −X (i.e. dφp ≡ −I) where I denotes the identity in Tp M.
Here dφ can be thought of as the effect of φ on the tangent space e.g. in Lie alge-
braic terms φ(γ(t)) = φ(exp(X)) = exp(dφ(X)) = exp(−X) for X ∈ g. This means
manifold M is locally symmetric around p ∈ M. The manifold M is then locally
(Riemannian) symmetric (definition C.3.1) if each geodesic symmetry is isometric
that is, if there is at least one geodesic symmetry about p which is an isometry (and
globally so if this applies for all p ∈ M). As noted in the literature, this is equiv-
alent to the vanishing of the covariant derivative of the curvature tensor along the
geodesic. A globally symmetric space is one where geodesic symmetries are isome-
tries for all M. A manifold being affine locally symmetric is one which, given ∇,
the torsion tensor T and curvature tensor R, T = 0 and ∇Z R = 0 for all Z ∈ T M.
A manifold is a Riemannian globally symmetric space if each p ∈ M is a fixed point
of an involutive symmetry θ (see definition B.5.1) with θ² = I.
It can be shown then that the Riemann curvature tensor for the symmetric space
(homogeneous) G/K with a Riemannian metric allows for curvature, via R:

Rp (X, Y )Z = −[[X, Y ], Z] X, Y, Z ∈ p (2.4.3)

Curvature plays an important role in the classification of symmetric spaces.


Three classes of symmetric space can be classified according to their sectional cur-
vature as follows. Given a Lie algebra g equipped with an involutive automorphism
θ² = I, with corresponding group G with G/K as above, we have a G-invariant
structure on the Riemannian metric i.e. g(X, Y) = g(hX, hY) for h ∈ G/K. Then
the three types of symmetric space are: (i) G/K compact, with sectional curvature
κ(X, Y) > 0; (ii) G/K non-compact, with κ(X, Y) < 0; and (iii) G/K Euclidean, with κ(X, Y) = 0.
Curvature can be related in an important way to commutators as per the presence
of commutator terms in the Riemannian curvature tensor (section C.2.1) and the
second of the Cartan structural equations (theorem C.2.2). The classification table
for symmetric spaces is given in section C.3.2. Our focus in later Chapters is on
SU (2) (and tensor products thereof) together with SU (3).

2.4.9 SubRiemannian Geometry

With the understanding of geometric concepts above, we can now progress to key
concepts in subRiemannian geometry and geometric control which are applied in
later Chapters. SubRiemannian geometry is relevant to quantum control problems
where only a subspace p ⊂ g is available for Hamiltonian control. Intuitively, this
means that the manifold exhibits differences with a Riemannian manifold where
the entirety of g is available, including in relation to geodesic length. In this sense
subRiemannian geometry is a more general form of geometry, with Riemannian
geometry sitting within it as a special case. We detail a few key features of
subRiemannian theory below before moving on to geometric control.
SubRiemannian geometry involves a manifold M together with a distribution
∆ upon which an inner product is defined. Distribution in this context refers to a
linear sub-bundle of the tangent (vector) bundle of M and corresponds to the hori-
zontal subspace of T M discussed above and where the vertical subspace is non-null.
Formally it is defined (definition C.4.1) as consisting of a distribution ∆, being a
vector sub-bundle Hp M ⊂ T M together with a fibre inner product on Hp M. The
sub-bundle corresponds to the horizontal distribution, having the meaning ascribed
to the horizontal subspace Hp M above. In the language of Lie algebras, for a
decomposition g = k ⊕ p, the accessible (or control) subspace is p ⊂ g rather than
p = g. Thus quantum control scenarios where only a subspace of g is available
for controls can be framed as subRiemannian problems (under typical assumptions).
Geometrically, this means that the generators or vector fields X(M) are constrained
to limited directions. A transformation not in the horizontal distribution may still
be reachable where the space exhibits certain symmetry structure, such as for
symmetric spaces equipped with the Lie triple property (as we discuss for certain
subspaces where [p, p] ⊆ k), but in a sense such transformations are indirect: the
geodesic paths connecting the start and end points will be longer than for a
Riemannian geometry on M. A curve on M is a horizontal curve if it is tangent to
Hp M. The subRiemannian length ℓ = ℓ(γ) (for γ smooth and horizontal) is then
defined in the same way as the Riemannian length, via $\ell(\gamma) = \int \|\dot{\gamma}(t)\|\,dt$.
The subRiemannian distance is similarly defined (noting that the subRiemannian
distance is greater than or equal to the Riemannian distance, since fewer paths are
available; the subRiemannian metric on T∗M is denoted a cometric (definition C.4.2)).
From the subRiemannian metric we specify a system of Hamilton-Jacobi equations
on T∗M, the solution to which is a subRiemannian geodesic. This formalism can be
used to define a subRiemannian Hamiltonian given by $H(p, \alpha) = \frac{1}{2}(\alpha, \alpha)_p$, where
α ∈ T∗M and (·, ·)p is the cometric. SubRiemannian geometry has its own form of
Hamiltonian differential equations (see [50] and Appendix C) given by
$\dot{x}_i = \partial H/\partial p_i$, $\dot{p}_i = -\partial H/\partial x_i$, denoted the normal geodesic
equations (theorem C.4.3). The coordinates xi are the position functions and the pi
the momenta functions for coordinate vector fields. Under these conditions and the
assumption that the distribution ∆ is bracket-generating (definition C.4.4) in the same
way a Lie subalgebra may be, the theory (the Chow-Raschevskii theorem, see [50])
guarantees the existence of geodesics that minimise subRiemannian distance.
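As an aside, the normal geodesic equations can be integrated numerically. The following minimal sketch (ours, not from the thesis) does so for the Heisenberg group, the canonical bracket-generating example, with the assumed horizontal frame X1 = ∂x − (y/2)∂z, X2 = ∂y + (x/2)∂z:

```python
import numpy as np
from scipy.integrate import solve_ivp

# Normal geodesic equations for H(q, p) = ((px - y*pz/2)**2 + (py + x*pz/2)**2)/2,
# the subRiemannian Hamiltonian built from the cometric on the Heisenberg group.
def geodesic_rhs(t, s):
    x, y, z, px, py, pz = s
    u1 = px - 0.5 * y * pz            # momenta paired with the horizontal frame
    u2 = py + 0.5 * x * pz
    return [u1,                        # dx/dt  =  dH/dpx
            u2,                        # dy/dt  =  dH/dpy
            0.5 * (x * u2 - y * u1),   # dz/dt  =  dH/dpz
            -0.5 * pz * u2,            # dpx/dt = -dH/dx
            0.5 * pz * u1,             # dpy/dt = -dH/dy
            0.0]                       # dpz/dt = -dH/dz (z is cyclic)

sol = solve_ivp(geodesic_rhs, (0.0, 2 * np.pi), [0, 0, 0, 1.0, 0.0, 1.0])
print(sol.y[:3, -1])                   # endpoint (x, y, z) of the normal geodesic
```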

2.5 Geometric control

2.5.1 Overview
We conclude this section by bringing the above concepts together through the ful-
crum of geometric control theory (section C.5) relative to quantum control problems
of interest. The primary problem we are concerned with in our final two chapters is
solving time optimal control problems for certain classes of Riemannian symmetric
space. The two overarching principles for optimal control are (a) establishing the
existence of controllable trajectories (paths) and thus reachable states; and (b)
showing that the chosen path (or equivalence class of paths) is unique by showing it
meets a minimisation (and thus optimisation) criterion. The Pontryagin Maximum
Principle provides a framework and conditions for satisfying these existence and
uniqueness requirements. As we discuss below, it can, however, often be difficult or
infeasible to find solutions to an optimisation problem expressed in terms of the PMP.
However, for certain classes of problem (with which we are concerned) involving target
states in Lie groups G with generators in g, results from algebra and geometry can be
applied to satisfy these two requirements. Thus, under appropriate circumstances,
where targets are in Lie groups G with the Hamiltonian constructed from elements of
the Lie algebra g, so long as our distribution ∆ ⊂ g is bracket-generating (so we can
recover all of g via nested commutators), then G is in principle reachable. This
satisfies the existence requirement. To satisfy the uniqueness requirement, we then
apply results from differential geometry and algebra regarding the construction of
Hamiltonians that ensure the paths generated are geodesics, and thus optimal by
virtue of being minimal-time trajectories.

In our case, time optimal control is equivalent to finding the time-minimising sub-
Riemannian geodesics on a manifold M corresponding to the homogeneous symmet-
ric space G/K. Our particular focus is the KP problem, where G admits a Cartan
KAK decomposition where g = k ⊕ p, with the control subset (Hamiltonian) com-
prised of generators in p. In particular such spaces exhibit the Lie triple property
[[p, p], p] ⊆ p given [p, p] ⊆ k. In such cases G remains in principle reachable, but
minimal-time paths constitute subRiemannian geodesics. Such methods rely
upon symmetry reduction [55]. As D'Alessandro notes [15], the primary problem
in quantum control involving Lie groups and their Lie algebras is whether the set
of reachable states R (defined below) for a system is the connected Lie group G
generated by L = span{−H(u(t))} for H ∈ g (or some subalgebra h ⊂ g) and
u ∈ U (our control set, see below). This is manifest in the requirement that
R = exp(L). In control theory L is designated the dynamical Lie algebra and is
generated by the Lie bracket (derivative) operation among generators in H. The
adjective 'dynamical' refers to the time-varying nature of the control functions u(t).
We explicate a few key concepts below. The first is the general requirement that
tangents γ̇(t) be essentially bounded so that ⟨H(t), H(t)⟩S ≤ N for all t ∈ [0, T ] for
some constant N (definition C.5.2). For time-optimal synthesis, we seek geodesic
(or approximately geodesic) curves γ(t) ∈ M. For this, we draw upon the theory
of horizontal subspaces described above manifest via the concept of a horizontal
control curve (definition C.5.3). Given γ(t) ∈ M with γ̇(t) ∈ ∆γ ⊆ Hp M, we can
define horizontal control curves as (equation (C.5.1)):

$$\dot{\gamma}(t) = \sum_{j=1}^{m} u_j(t)\, X_j(\gamma(t))$$

where uj are the control functions given by uj (t) = ⟨Xj (γ(t)), γ̇(t)⟩. The length of
a horizontal curve, which is essentially what we want to minimise in optimisation
problems, is given by (equation (C.5.3)):
$$\ell(\gamma) = \int_0^T \|\dot{\gamma}(t)\|\,dt = \int_0^T \sqrt{\langle \dot{\gamma}(t), \dot{\gamma}(t)\rangle}\,dt = \int_0^T \sqrt{\sum_{j=1}^{m} u_j^2(t)}\,dt \tag{2.5.1}$$

Note that when [0, T] is normalised this is equivalent to parametrisation by arc length.
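To illustrate (a sketch of ours; the su(2) generators and sinusoidal controls are assumed purely for demonstration), a horizontal curve can be generated from piecewise-constant controls and its length evaluated via equation (2.5.1):

```python
import numpy as np
from scipy.linalg import expm

# Generate a horizontal curve on SU(2) from controls u_j(t) applied to
# generators X_j in the distribution, then compute its subRiemannian length.
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]], dtype=complex)
gens = [-0.5j * sx, -0.5j * sy]                 # horizontal distribution

T, n_steps = 1.0, 200
dt = T / n_steps
ts = np.linspace(0.0, T, n_steps, endpoint=False)
u = np.stack([np.cos(2 * np.pi * ts), np.sin(2 * np.pi * ts)])  # u_j(t)

U = np.eye(2, dtype=complex)
for k in range(n_steps):                        # gamma-dot = sum_j u_j X_j gamma
    H = u[0, k] * gens[0] + u[1, k] * gens[1]
    U = expm(H * dt) @ U                        # piecewise-constant evolution

length = float(np.sum(np.sqrt(np.sum(u**2, axis=0))) * dt)  # eq. (2.5.1)
print(length)
```

With these controls Σj uj²(t) = 1 for all t, so the computed length equals T, i.e. the curve is parametrised by arc length.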

2.5.2 Time optimisation


For subRiemannian and Riemannian manifolds, the problem addressed in our final
two chapters in particular is, for a given target unitary UT ∈ G, how to identify the
minimal time for, and a Hamiltonian generating, a minimal geodesic curve such that
γ(0) = U0 and γ(T) = UT for γ : [0, T] → M. This is described [56] as the minimum-time
optimal control problem [23]. In principle, the problem rests upon the equivalence of
the two statements (a) γ is a minimal subRiemannian (normal) geodesic between U0
and UT parametrised by constant speed (arc length); and (b) γ is a minimum-time
trajectory subject to ||⃗u|| ≤ L almost everywhere (where the control vector ⃗u stands
in for the set of controls). SubRiemannian minimising geodesics from q0 to q1 (our UT)
subject to bounded speed L describe optimal synthesis on M.
There are also subtleties relating to whether loci of geodesics are in critical loci or cut
loci (see section C.5.3), something explored in the geometric control literature. The
critical locus CR(M) identifies t (and thus γ(t)) for which a geodesic ceases to be
time optimal beyond a marginal extension ϵ of the parameter interval I, effectively
bounding the reachable set (see below) for time-optimality to within this interval.
A cut locus CL(M) indicates the set of p ∈ M where multiple minimal geodesics
intersect, providing a measure of redundancy or flexibility given controls (multiple
options for time-optimality) in control problems. The concept of reachability is
framed in terms of reachable sets - put simply, for any control problem to be well-
formed, we require that the target or end-point (UT ∈ G in our case) be ‘reachable’
by application of controls to generators. Formally (definition C.5.4), the set of all
points p ∈ M such that, for γ(t) : [0, T ] → M there exists a bounded function
⃗u, ||⃗u|| ≤ L where γ(0) = q0 and γ(T ) = p is called the reachable set and is denoted
R(T ).

2.5.3 Variational methods


Of fundamental importance to optimal control solutions (in geometric and other
cases) is the application of variational calculus in terms of Euler-Lagrange, Hamil-
tonian and control optimisation formalism. Sections C.5.4 and C.5.6 set out in some
detail the formalism of geometric control leveraged in this work, supplemented by
discussion of generic quantum control principles in section A.2.
As noted in our discussion of quantum control above, the Pontryagin Maximum
Principle (PMP) (section C.5.4.1) is one of the fundamental principles governing op-
timal control theory. The principle sets out necessary conditions for optimal control
of classical and quantum systems, where system evolution is represented geometrically
by time- (or energy-) minimal paths γ(t) on M. Assume a differentiable manifold M
on which we define C∞(M) functions (f1(γ, u), ..., fm(γ, u)),
where our coordinate charts ϕ ∈ S ⊂ Rn parametrise curves γ(t) ∈ M. We have as
our control variable u ∈ U ⊂ Rm where U is our control set. Both are parametrised
as u = u(t), γ = γ(t) for t ∈ I = [t0 , t1 ] (assuming boundedness and measurabil-
ity). The evolution of γ ∈ M is determined by the state and the control, according
to the differential state equation γ̇i (t) = fi (γ(t), u(t)) (equation (C.5.5)) for almost
all t ∈ I. Solution curves γ(t) to the state equation are typically unique under
these conditions. Importantly, the PMP provides a framework to understand how
small variations in the initial conditions affect the system’s evolution, necessary
for determining optimal trajectories that satisfy constraints and minimise the cost
functional (see equation (C.5.13)). The PMP encodes dynamical constraints via ad-
joint (costate) equations where each state variable γi(t) has a corresponding adjoint
variable or costate pi(t). The dynamics of these adjoint variables are described by the
adjoint equations $\frac{dp_i}{dt} = -\frac{\partial H}{\partial \gamma_i}(\gamma(t), p(t), u(t))$ (equations (C.5.8)), where H
is the Hamiltonian (equation (C.5.10)). The adjoint variables p(t) effectively serve
a role similar to Lagrange multipliers, providing a mechanism to incorporate the
state dynamics and constraints directly into the Hamiltonian. The PMP then pro-
vides (definition C.5.5) a maximum principle for solving the optimal control prob-
lem by requiring that for a trajectory (γ(t), u(t)) evolving from a → b over interval
t ∈ I = [0, T ], there exists a non-zero absolutely continuous curve p(t) on I satisfy-
ing constraints that (a) the set of states, costates and controls (γ(t), p(t), u(t)) is a
solution curve to the Hamiltonian equations; (b) that the Hamiltonian is maximal
H(γ(t), p(t), u(t)) = HM(γ(t), p(t)) for almost all t ∈ [0, T]; and (c) p0(T) ≤ 0 and
HM(γ(T), p(T)) = 0.
In practice the PMP can be difficult or complicated to solve; thus in control
theory one often seeks ways to simplify problems. The primary relevance of the
PMP in our work relates to guarantees regarding the existence of time-optimal
solutions, which we seek to learn in Chapter 4 using machine learning and for which
we provide an alternative method of deduction in Chapter 5. This is especially the
case where target states γ(T) belong to Lie groups G with associated Lie algebras g:
there, certain symmetry-reduction techniques can make the problem tractable, which
often means reducing the task to a simpler minimisation problem. One such class of
problems, where G/K is a Riemannian symmetric space, comprises the KP problems
which are the subject of our last two chapters.

2.5.4 KP problems

A particular type of subRiemannian optimal control problem with which we are
concerned in the final Chapter is the KP problem. In control theory settings, the
problem was articulated in particular via Jurdjevic’s extensive work on geometric
control [19, 26, 57, 58], drawing on the work of [59] as particularly set out in [23] and
[60]. Later work building on Jurdjevic’s contribution includes that of Boscain [24,61,
62], D'Alessandro [15, 17] and others. KP problems are also the focus of a range of
classical and quantum control problems centred on the application of Cartan
decompositions of target manifolds, where elements U ∈ G can be written in terms
of U = KP or U = KAK (see [15]). In this formulation, the Lie group and Lie
algebra can be decomposed according to a Cartan decomposition (definition B.5.2)
g = k ⊕ p (with associated Cartan commutation relations). The space is equipped with
a Killing form (definition B.2.12) which defines an implicit positive definite bilinear
form (X, Y), which in turn allows us to define a Riemannian metric restricted to
G/K in terms of the Killing form. Such control problems posit controls in p (with
drift in k) and are a form of subRiemannian control problem.

Assume our target groups are connected matrix Lie groups (definition B.2.1).
Recall equation (C.5.1) can be expressed as:
$$\dot{\gamma}(t) = \sum_j X_j(\gamma)\, u_j(t) \tag{2.5.2}$$

where Xj ∈ ∆ = p, our control subset. For the KP problem, we can situate
γ(0) = 1 ∈ M (at the identity) with ||⃗u|| ≤ L, in turn specifying a reachable set R(T).
As D'Alessandro et al. [56, 63] note, reachable sets for KP problems yield the reachable
sets for a larger class of problems. Connecting with the language of control, we can
frame equation (C.5.1) in terms of drift and control parts with:
$$\dot{\gamma}(t) = A\gamma(t) + \sum_j X_j(\gamma)\, u_j(t) \tag{2.5.3}$$

where Aγ(t) represents a drift term for A ∈ k. The evolution towards our unitary
target in G can be expressed as:

$$\dot{\gamma}(t) = \sum_j \exp(-At)\, X_j \exp(At)\, \gamma(t)\, u_j \tag{2.5.4}$$

for bounded ||Ap || = L. For the KP problem, the PMP equations are integrable.
One of Jurdjevic’s many contributions was to show that in such KP problem con-
texts, optimal control for ⃗u is related to the fact that there exists Ak ∈ k and Ap ∈ p
such that:

$$\sum_{j=1}^{m} X_j u_j(t) = \exp(A_k t)\, A_p \exp(-A_k t) \tag{2.5.5}$$
Following Jurdjevic's solution [60] (see also [56]), optimal pathways are given by:

$$\dot{\gamma}(t) = \exp(A_k t)\, A_p \exp(-A_k t)\, \gamma(t), \qquad \gamma(0) = 1 \tag{2.5.6}$$

$$\gamma(t) = \exp(-A_k t) \exp((A_k + A_p)t) \tag{2.5.7}$$

resulting in analytic curves. Our final Chapter 5 sets out an alternative method for
obtaining equivalent time-optimal control solutions for certain classes of problem.
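As a concrete illustration (a sketch of ours; the su(2) choices of A_k ∈ k and A_p ∈ p are assumed, and sign conventions for the conjugation vary across the literature), equation (2.5.7) can be evaluated and checked against its defining ODE directly:

```python
import numpy as np
from scipy.linalg import expm

X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
Ak = -0.7j * Z    # illustrative element of k
Ap = -0.5j * X    # illustrative element of p

def gamma(t):
    # analytic KP geodesic of equation (2.5.7)
    return expm(-Ak * t) @ expm((Ak + Ap) * t)

# Differentiating (2.5.7) directly gives
# gamma-dot(t) = exp(-Ak t) Ap exp(Ak t) gamma(t); we verify numerically.
t, eps = 0.9, 1e-6
num_dot = (gamma(t + eps) - gamma(t - eps)) / (2 * eps)
ana_dot = expm(-Ak * t) @ Ap @ expm(Ak * t) @ gamma(t)
print(np.allclose(num_dot, ana_dot, atol=1e-6))   # True
```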

2.6 Quantum machine learning

Quantum machine learning (QML) adapts concepts from modern machine learning
(and statistical learning) theory to develop learning protocols for quantum data and
quantum algorithms. This section summarises key concepts relevant to the use of
machine learning in Chapters 3 and 4. A more expansive discussion of background
concepts is set out in Appendix D, from which this section draws. The landscape of
QML is already vast; for the purposes of our work it will thus assist to situate our
results within the useful schema set out in [64], given in Table (D.1) of Appendix D
and reproduced below as Table 2.1. Our results in Chapters 3 and 4 fall somewhere
between the second (classical machine learning using quantum data) and fourth
(quantum machine learning for quantum data) categories, whereby we leverage
classical machine learning and an adapted bespoke architecture equivalent to a
parametrised quantum circuit [65] (discussed below).

QML Taxonomy

QML Division                                  Inputs                 Outputs                Process
Classical ML                                  Classical              Classical              Classical
Applied classical ML                          Quantum (Classical)    Classical (Quantum)    Classical
Quantum algorithms for classical problems     Classical              Classical              Quantum
Quantum algorithms for quantum problems       Quantum                Quantum                Quantum

Table 2.1: Quantum and classical machine learning table. Quantum machine learning covers four
quadrants (listed in ‘QML Division’) which differ depending on whether the inputs, outputs or
process is classical or quantum.

2.6.1 Statistical learning theory

The formal theory of learning in computational science is classical statistical learning
theory [12, 13, 66, 67], which sets out theoretical conditions for function estimation
from data. In the most general sense, one is interested in estimating Y as some
function of X, i.e. Y = f(X) + ϵ (where ϵ indicates random error independent of X).
Statistical learning problems are typically framed as function estimation models [13]
involving procedures for finding optimal functions f : X → Y, x 7→ f(x) = y (where
X, Y are typically vector spaces), where it is usually assumed that X, Y ∼ PXY, i.e.
random variables drawn from a joint distribution with realisations x, y respectively.
Formally, this model of learning involves (a) a generator of the random variable X ∼
PX, (b) a supervisor function that assigns Yi to each Xi such that (X, Y) ∼ PXY
and (c) an algorithm for learning a set of functions (the hypothesis space) f(X, θ),
where θ ∈ Θ (with θ ∈ Km parametrising such functions).
The two primary types of learning problems are supervised and unsupervised
learning (defined below) where the aim is to learn a model family {f } or distribution
P. Supervised learning (definition D.3.1) problems are framed in terms of a given
sample Dn = {Xi, Yi} comprising i.i.d. tuples (Xi, Yi) ∈ X × Y where X, Y ∼
PXY. The supervised learning task consists of learning the mapping f : X → Y for
both in-sample and out-of-sample (Xi, Yi) (it is denoted ‘supervised’ owing to the
joint distribution of features X with labels Y). Unsupervised learning by contrast is
defined for D = {Xi}, X ∼ PX, without output labels Yi for each input Xi. Rather,
the labels, such as in the form of classifications or clusters, are learnt (usually via
some form of distance metric applied to D and a protocol optimising for statistical
properties of clustering). The next two Chapters deal with supervised learning
problems.
Finding optimal functions intuitively means minimising the error ϵ = |f(x) − y|
for a given function. This is described generically as risk minimisation where, as
we discuss below, there are various measures of risk (or uncertainty), which typically
are expectations (averages) of loss. One assumes the existence of a well-defined (or
chosen) loss function L : Y × Y → R, (f(x), y) 7→ L(f(x), y) and a joint probability
distribution over X × Y denoted by (X, Y) ∼ PXY with associated density pXY. The
statistical (true) risk is then given by the expectation R(f) = E[L(f(X), Y)|f].
The minimum risk (Bayes risk) is the infimum of R(f) over the set F∗ of all such
measurable functions (learning rules) f, denoted $R^* = \inf_{f \in \mathcal{F}^*} R(f)$ (note that F∗
is usually larger than F). The objective of supervised learning then becomes to
use data to generate functional estimators conditioned on the data samples, fˆn(x) =
f(x; Dn), in order to minimise the estimate E[R(fˆn)]. The overall task is then one
of generalisation, which is intuitively the minimisation of risk across both in-sample
and out-of-sample data.


Formally, the task is then defined as one of empirical risk minimisation, where
empirical risk is $\hat{R}_n(f) = \frac{1}{n}\sum_{i}^{n} L(f(X_i), Y_i)$ (definition D.3.2). It assumes the
existence of a sample Dn = {Xi, Yi}n, loss function L and family of functions (e.g.
classifiers) F. The objective then becomes learning an algorithm (rule) that minimises
empirical risk, thereby obtaining a best estimator across sampled and out-of-sample
data, that is $\hat{f}_n = \arg\min_{f \in \mathcal{F}} \hat{R}_n(f)$. Usually f is a parameterised function f = f(θ)
such that the requisite analytic structure (parametric smoothness) for learning
protocols such as stochastic gradient descent is provided for by the parametrisation,
typically where parameters θ ∈ Rθ. The analyticity of the loss function L means
that $\hat{R}_n(f) = \frac{1}{n}\sum_{i}^{n} L(f(X_i; \theta), Y_i)$ is smooth in θ, which implies the existence of
a gradient $\nabla_\theta \hat{R}_n(f(\theta))$. The general form of the parameter update rule is then a
transition rule on Rθ that maps at each iteration (epoch)
$\theta_{i+1} = \theta_i - \gamma(n)\nabla_\theta \hat{R}_n(f(\theta))$
(see discussion of gradient descent below).
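To make the update rule concrete, the following minimal sketch (ours; the synthetic data and linear model are assumed purely for illustration) performs empirical risk minimisation under squared loss:

```python
import numpy as np

# ERM via the update rule theta_{i+1} = theta_i - gamma * grad R_hat_n, for a
# linear model f(x; theta) = theta @ x under squared loss.
rng = np.random.default_rng(0)
n, m = 200, 3
Xs = rng.normal(size=(n, m))
theta_true = np.array([1.0, -2.0, 0.5])
Ys = Xs @ theta_true + 0.1 * rng.normal(size=n)   # Y = f(X) + eps

theta = np.zeros(m)
gamma = 0.1                                        # learning rate
for epoch in range(500):
    residuals = Xs @ theta - Ys                    # f(X_i; theta) - Y_i
    grad = 2.0 / n * Xs.T @ residuals              # gradient of empirical risk
    theta -= gamma * grad                          # gradient-descent update

print(theta)   # approaches theta_true as empirical risk is minimised
```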

2.6.2 Loss functions and model complexity

A crucial choice in any machine learning architecture - and one we justify in de-
tail in later Chapters - is that of the loss function. Two popular choices across
statistics and also machine learning (both classical and quantum) are (a) mean-
squared error (MSE) and (b) root-mean-squared error (RMSE). The MSE (equation
(D.3.6)) for a function f parameterized by θ over a dataset Dn is given by
$\mathrm{MSE}(f_\theta) = \frac{1}{n}\sum_{i=1}^{n} \left(f_\theta(X_i) - Y_i\right)^2$. Other common loss functions include (i)
cross-entropy loss e.g. Kullback-Leibler Divergence (see [12] §14) for classification
tasks and comparing distributions (see section A.1.8 for quantum analogues), (ii)
mean absolute error loss and (iii) hinge loss. The choice of loss functions has sta-
tistical implications regarding model performance and complexity, including bias-
variance trade-offs.
As we discuss below, there is a trade-off between the size of F and empirical risk
performance in and out of sample. We can minimise R̂n (f ) by specifying a larger
class of estimation rules. At one extreme, setting f (x) = Yi when x = Xi and zero
otherwise (in effect, F containing a trivial mapping of Xi ) sends R̂n (f ) → 0, but
performs poorly on out of sample data. At the other extreme, one could set f (x)
to capture all Yi , akin to a scatter-gun approach, yet this would inflate R̂n (f ). The
relation between prediction rule F size and complexity illustrates a tradeoff between
approximation and estimation (known as ‘bias-variance’ tradeoff for squared loss
functions). The tradeoff is denoted excess risk, defined as the difference between
expected empirical risk and Bayes risk (equation (D.3.7)):

$$E[R(\hat{f}_n)] - R^* = \underbrace{E[\hat{R}_n(\hat{f}_n)] - \inf_{f \in \mathcal{F}} R(f)}_{\text{estimation error}} + \underbrace{\inf_{f \in \mathcal{F}} R(f) - R^*}_{\text{approximation error}}$$

Here estimation error reflects how well fˆ compares with other candidates in F,
while approximation error indicates performance deterioration by restricting F. To
reduce empirical risk (and thus excess risk), two common strategies are (a) limiting
the size of F and thus estimation error and (b) regularisation techniques, consti-
tuting inclusion of a penalty metric that penalises increases in variance (overfitting)
of models expressed via fˆn = arg minf ∈F {R̂n (f ) + C(f )} (equation (D.3.8)). While
such strategies are commonplace and deployed in our machine learning architec-
tures in later Chapters, we note the existence of no free lunch theorems which state
that no algorithm (or choice of fˆ) can universally minimise statistical risk across
all distributions, placing effective limits on learnability in terms of restrictions on
generalisability (see section D.3.3). In section D.3.4 we also set out a few key per-
formance measures that are often used in classical and quantum machine learning
literature. These measures, such as binary classification loss, accuracy, AUC/ROCR
scores and F1-scores all seek to assess model performance. The AUC (Area Under
the Curve) score represents the area under the ROC (Receiver Operating Character-
istic) curve, the latter of which is a plot of the true positive rate (sensitivity) against
the false positive rate (1-specificity) different thresholds. Intuitively, a higher AUC
score represents a higher ratio over such thresholds between true positives and false
positives, thus providing a measure of how well the model performs (see [12]).

2.6.3 Deep learning


The machine learning architectures focused on in Chapters 3 and 4 are deep learning
neural network architectures. In this section we briefly note a few key features of
deep learning, focusing on its technical relationship with generalised linear model
theory and on architectural principles. Classical neural networks derive from gener-
alised linear models. The basic form of such a model (definition D.4.1) starts with i.i.d.
samples X ∈ Rm and Y ∈ R where (X, Y) ∼ PXY, with the task of estimating
Y according to:

$$\hat{Y} = \hat{\beta}_0 + \sum_{j=1}^{m} X_j \hat{\beta}_j + \epsilon.$$

Here β̂0 ∈ R is the estimate of bias, β ∈ Rm is a vector of coefficients and ϵ denotes
uncorrelated (random) errors. The adaptation of generalised linear models to neural
networks effectively arises from the introduction of non-linear functions of the linear
components of the model (e.g. projection pursuit regression, as discussed below) [12].
A simple example of regularisation in such models is given by ridge regression,
$\hat{\beta}_{\mathrm{ridge}} = \arg\min_{\beta}\{L(y, X\beta) + \lambda\|\beta\|_2^2\}$ (equation (D.4.2)). Here λ ∈ R is a penalty term
which inflates the loss more if the parameters β are too large, in a process known
as regularisation. Moreover, from such formalism we obtain the ridge function
f(X) = g(⟨X, a⟩) for X ∈ Rm, with g : R → R a univariate (non-linear) function,
a ∈ Rm and ⟨·, ·⟩ the inner product. Ridge functions essentially wrap the linear
models of statistics in a non-linear kernel, an approach sometimes denoted projection
pursuit regression, $f(X) = \sum_{n}^{N} g_n(\omega_n^T X)$ (equation (D.4.4)).
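As a brief illustration (ours, on synthetic data), the ridge regression of equation (D.4.2) under squared loss admits a familiar closed-form solution:

```python
import numpy as np

# Ridge regression: argmin_beta { ||y - X beta||^2 + lam * ||beta||_2^2 },
# solved in closed form as (X^T X + lam I)^{-1} X^T y.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 5))
beta_true = rng.normal(size=5)
y = X @ beta_true + 0.1 * rng.normal(size=100)

lam = 0.5   # regularisation strength; penalises large coefficients
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(5), X.T @ y)
print(beta_ridge)
```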

2.6.4 Neural networks

The ridge functions gn(ωnT X) vary only in the directions defined by ωn, where the
feature Vn = ωnT X can be regarded as the projection of X onto the unit vector ωn.
For sufficiently large parameter space, such functions can be regarded as universal
approximators (for arbitrary functions) and form the basis of neural networks. The
architecture of neural networks in quantum and classical machine learning involves a
architecture of neural networks in quantum and classical machine learning involves a
number of characteristics such as network topology, constraints (e.g. regularisation
strategies or dropout), initial condition choices and transition functions. Neural
network architectures are accordingly modelled using a Fiesler framework, whereby
they are formally defined as a nested 4-tuple NN = (T, C, S(0), Φ), where T is network
topology, C is connectivity, S(0) denotes initial conditions and Φ denotes transition
functions (definition D.4.2). This framework has since influenced modern descriptions
of neural networks and their architecture. Neural networks can abstractly be regarded
as extensions of non-linearised linear models (as per projection pursuit regression
above) constituted via functional composition across layers. Each layer, generally
speaking, takes data as an input (from previous layers or initial inputs), which becomes
the argument of a linear model, whose output in turn is the argument of a non-linear
function denoted an activation function, representing the output of the layer. Formally
we define an activation function such that for a vector X ∈ Rn,
weight ω ∈ Rm×n and bias term β0 ∈ Rm , we have an affine (linear) transformation
z = ωX + β0 ∈ Rm . The activation function (definition D.4.3) is then the function
σ : Rm → Rm with σ(z) = σ(ωX + β0 ) = (σ(z1 ), σ(z2 ), . . . , σ(zm ))T .
The function-compositional nature of neural networks is usefully elucidated by
considering the basic feed-forward (multi-layer perceptron) neural network (definition
D.4.4). Such a network comprises multiple layers of neurons $a_i^{(l)}$ such that each neuron
in each layer (other than the first input layer) is a compositional function of neurons
in the preceding layer. A fully-connected network is one where each layer's neurons
are functions of each neuron in the previous layer. We represent each layer l as
(equation (D.4.7)):

$$a_i^{(l)} = \sigma_i^{l}\left(\sum_{j=1}^{n_{l-1}} w_{ij}^{(l)} a_j^{(l-1)} + \beta_i^{(l)}\right).$$
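A minimal sketch of this compositional structure (ours; the layer sizes, random weights and tanh activation are illustrative choices) is:

```python
import numpy as np

# Fully-connected feed-forward pass per equation (D.4.7): each layer applies
# an affine map z = W a + b followed by an activation sigma.
def sigma(z):
    return np.tanh(z)          # an example choice of activation function

rng = np.random.default_rng(2)
sizes = [4, 8, 8, 2]           # input, two hidden layers, output
weights = [rng.normal(size=(m, n)) for n, m in zip(sizes[:-1], sizes[1:])]
biases = [rng.normal(size=m) for m in sizes[1:]]

a = rng.normal(size=sizes[0])  # input layer a^(0) (feature data X)
for W, b in zip(weights, biases):
    a = sigma(W @ a + b)       # a^(l) = sigma(W^(l) a^(l-1) + beta^(l))
print(a)                       # output layer a^(L)
```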

The typical neural network schema (section D.4.2.3) then involves (a) input layers
where (feature) data X is input, (b) hidden layers of the form above followed by (c)
an output layer aL that outputs according to the problem at hand (e.g. for classifi-
cation or regression problems). Sometimes the final layer is considered in addition
to the network itself. There is a constellation of sophisticated and complex neural
network architectures emerging constantly within the computer science literature.

2.6.5 Optimisation methods


The learning component of machine learning is an optimisation procedure designed
to reduce excess risk via minimising empirical risk. In general, the hidden layers and
output layers of the network referred to above comprise parameters (weights) θ which
are dynamically updated over training runs. Most optimisation algorithms used
across machine learning involve some sort of stochastic gradient descent whereby
parameters θ of estimated functions fˆ(θ) are updated according to a generalised
directional derivative. The most common method for calculating the gradient used
in this update rule is backpropagation, which seeks to update parameters via prop-
agating errors δ ∼ |fˆθ − Y| across the network. Backpropagation consists of two
phases: (a) a forward pass, which involves calculating, for each unit $a_i^{(l)}$ in a layer,
the layer's activation function $\sigma_i^{(l)}$ (see equation (D.4.7)); and (b) a backward pass
where the backpropagation updates are calculated. The most common method of
updating parameters in the backward pass is stochastic gradient descent (definition
D.5.1), which is defined via the mappings:

$$\omega_{ij}^{(l+1)} = \omega_{ij}^{(l)} - \gamma_l \sum_{k=1}^{N} \frac{\partial \hat{R}_k}{\partial \omega_{ij}^{(l)}} = \omega_{ij}^{(l)} - \gamma_l \sum_{k=1}^{N} \nabla_{\omega_{ij}^{(l)}} \hat{R}_k.$$

(l)
Here ωij is the weight vector for neuron i in layer l that weights neuron j in layer
(l)
l − 1, a(l) is layer l, ai is the ith neuron in layer l and nl is the number of neurons
(units) in layer l. The error quantities used for the updating are given by the
backpropagation formula (equation (D.5.12)):
$$\delta_i^{(l)} = \sigma_i'^{l}(z_i^{(l)}) \sum_{\mu=1}^{n_{l+1}} \omega_{i\mu}^{(l+1)} \delta_\mu^{(l+1)}.$$

Here $\delta_i^{(l)}$ represents the error term for layer l and neuron i, which can be seen to be
dependent upon the error term $\delta_\mu^{(l+1)}$ of the subsequent (l + 1)th layer (see section
D.5.2 for a full derivation). The backpropagation equations thus allow computation
of gradient descent:
$$\frac{\partial \hat{R}_i}{\partial \omega_{ij}^{(l)}} = \nabla_{\omega_{ij}^{(l)}} \hat{R}_i = \sigma_i'^{l}(z_i^{(l)})\, a_j^{(l-1)} \sum_{\mu=1}^{n_{l+1}} \omega_{i\mu}^{(l+1)} \delta_\mu^{(l+1)}. \tag{2.6.1}$$

In this way, errors are ‘back-propagated’ through the network in a manner that up-
dates weights θ in the direction of minimising loss (i.e. optimising), thus steering
the model f towards the objective. Note our discussion in section D.5.2 on differ-
ences in the quantum case, primarily arising from quantum state properties and the
effects of quantum measurement in collapsing quantum states (see section A.1.6).
In general, there are a number of considerations regarding how gradient descent is
calculated and how hyperparameters are tuned. Backpropagation equations (or vari-
ants thereof) are as a practical matter encoded in software such as TensorFlow [68]
(used in this work) where one can either rely on the standard packages inherent
in the program, or tailor a customised version. In our approach in the following
Chapters, we adopted a range of such methods as detailed below. See sections D.5.3
and D.5.4 for more detail.
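As a minimal illustration of such a training step in TensorFlow (the framework used in this work, though the model, data and hyperparameters below are placeholders rather than those used in later Chapters):

```python
import tensorflow as tf

# One gradient-descent step on empirical risk: forward pass under a
# GradientTape, then backward pass applying the computed gradients.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),
    tf.keras.layers.Dense(8, activation="tanh"),
    tf.keras.layers.Dense(1),
])
opt = tf.keras.optimizers.SGD(learning_rate=0.1)
loss_fn = tf.keras.losses.MeanSquaredError()

Xs = tf.random.normal((32, 4))           # a batch of features
Ys = tf.random.normal((32, 1))           # labels

with tf.GradientTape() as tape:          # forward pass
    loss = loss_fn(Ys, model(Xs))        # empirical risk on the batch
grads = tape.gradient(loss, model.trainable_variables)   # backward pass
opt.apply_gradients(zip(grads, model.trainable_variables))
```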

2.6.6 Machine learning and quantum computing


The preceding sections on machine learning have focused on primarily classical ma-
chine learning principles. Quantum machine learning represents an extension of such
principles where either the relevant data is quantum data or the information process-
ing regimen is quantum in nature. Quantum machine learning algorithms include
a diverse array of architectures, including quantum neural networks, parametrised
(variational) quantum circuits, quantum support vector machines and other tech-
niques [35,69]. Deep learning and neural network analogues have also been a feature
of quantum machine learning literature for some time, with a variety of designs for
quantum neural networks within the literature (see [64,70] for an overview). In more
recent years, quantum deep learning architectures (including quantum analogues of
graph neural networks, convolutional neural networks and others [71–74]) have con-
tinued to emerge and shape the discipline. As we discuss below, the quantum nature
of data or information processing inherent in these approaches gives rise to a number
of differences and challenges distinct from its classical counterpart. We itemise a
few of these key differences below.

1. Dissipative versus unitary. As noted in the literature [64] and discussed in
section D.6.1, classical neural network architectures are fundamentally based
upon non-linear and dissipative dynamics [75], contrary to the linear unitary
dynamics of quantum computing. Thus leveraging the functional approxima-
tion benefits of neural networks while respecting quantum information con-
straints requires specific design choices (see [70, 76, 77] for examples). One of
the motivations for the greybox architecture that characterises our approach
in Chapters 3 and 4 is to design machine learning systems that explicitly over-
come this challenge by embedding unitarity constraints within the network
itself.

2. Quantum statistical learning. As with its classical counterpart, quantum
machine learning has its own analogue of statistical learning theory, sometimes
denoted quantum (statistical) learning theory, which explores bounds and results
on expressibility, complexity and boundedness of quantum machine learning
architectures. Quantum machine learning also faces challenges with regard to
entanglement (see section A.1.7) growth as systems scale affecting the perfor-
mance of algorithms.

3. Quantum measurement. The unique nature of quantum measurement (section
A.1.6) gives rise to challenges from the fact that measurement causes quan-
tum states ρ to collapse onto eigenstates of the measurement operator M via
decohering quantum-classical measurement channels (definition A.1.31). This
causes a loss of quantum information in contrast to the classical case.

4. Barren plateaus. Barren plateaus [78] bear some similarity to the classical van-
ishing gradient problem (albeit with specific differences as noted in [79]) where
gradient expectation decreases exponentially with the number of qubits (see
section D.6.4), interfering with the efficacy of the learning protocol as a result.
Proposals exist in the literature (such as weight initialisation and quantum cir-
cuit design) to address or ameliorate their effects, but barren plateaus remain
another quantum-specific phenomenon that must be addressed in quantum
circuit design.

5. Data encoding. Data encoding strategies (section D.6.5) also differ in the
quantum case, with data usually being encoded via either state representations,
such as binary encoding (0 as |0⟩ and 1 as |1⟩), or, for continuous data,
phase encoding, e.g. via relative phases exp(iη) where η ∈ (−π, π) (a minimal
sketch follows this list). Encoding strategies thus differ from their classical
counterparts. They are important for quantum and classical data processing
as they enable leveraging the potential advantages that motivate the use of
quantum algorithms in the first place.
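The following minimal sketch (ours; purely illustrative) shows the two encodings mentioned above for a single qubit:

```python
import numpy as np

# Basis (binary) encoding of a bit, and phase encoding of a continuous
# value eta in (-pi, pi) as a relative phase in an equal superposition.
def basis_encode(bit):
    return (np.array([1, 0], dtype=complex) if bit == 0
            else np.array([0, 1], dtype=complex))

def phase_encode(eta):
    # (|0> + exp(i*eta)|1>)/sqrt(2): data carried in the relative phase
    return (np.array([1, 0], dtype=complex)
            + np.exp(1j * eta) * np.array([0, 1], dtype=complex)) / np.sqrt(2)

print(basis_encode(1), phase_encode(0.5))
```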

2.6.7 Parametrised variational quantum circuits

In Chapters 3 and 4 we adapt parametrised variational quantum circuits [65, 80] for
solving specific problems in quantum simulation and quantum control. Variational
quantum circuits (see section D.7) can be defined as a unitary operator U(θ(t)) ∈
B(H) parametrised by the set of parameters θ ∈ Rm. A parametrised quantum
circuit is a sequence of unitary operations
$U(\theta, t) = T_+ \exp\left(-i\int_0^T H(\theta, t')\,dt'\right)$ (equation (D.7.1)) and, in the
time-independent approximation,
$U(\theta, t)\big|_{t=T} = U_{T-1}(\theta_{T-1}, \Delta t) \cdots U_1(\theta_1, \Delta t)$. The optimisation problem is then one of
minimising a cost functional on the parameter space, C(θ) : Rm → R, by learning
θ∗ = arg minθ C(θ). We let fθ : X → Y and denote by U(X, θ) (a quantum circuit)
the circuit for initial state X ∈ X and parameters θ ∈ Rm. Let {M} represent
the set of (Hermitian) observable (measurement) operators, with fθ(X) = Tr(Mρ(X, θ))
(equation (D.7.3)) for ρ(X, θ) = U(X, θ)ρ0U(X, θ)†. Such parametrised circuits are
denoted variational because variational techniques are used to solve the minimisa-
tion problem of finding θ∗ . For our purposes in Chapters 3 and 4 we parametrised
unitaries by control functions uj (t). It is these control functions that are actually
parametrised such that u(t) = u(θ(t)) where a range of deep learning neural net-
works are used to learn optimal u(t) which is then applied to a Hamiltonian to
generate U = U (θ(t)).
The optimisation procedure adopted is based on the fidelity function as central
to the loss function (cost functional) according to which the classical register θ is
updated. The loss function based on the fidelity metric (definition A.1.40) adopted
in equation (D.7.4) in Chapter 4 (batch fidelity) takes the mean squared error (MSE,
see equation (D.3.6) above) of the loss (difference) between fidelity of the estimated
unitary and target as the measure of empirical risk (section D.3) using the notation
for cost functionals C:

$$C(F, 1) = \frac{1}{n}\sum_{j=1}^{n} \left(1 - F(\hat{U}_j, U_j)\right)^2$$

In doing so, we assume the existence of measurement protocols that adequately
specify Ûj and Uj. In our case, parameters θ are equivalent to a classical register
Σ which is updated according to classical gradient descent (section D.5). In sec-
tion D.7.1 we discuss a number of hybrid quantum-classical and primarily quantum
mechanical means of effectively calculating gradient descent in a QML context, in-
cluding parameter shift rules, quantum natural gradients, quantum backpropagation
and back-action based methods.
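A minimal sketch of this batch-fidelity cost (ours; gate fidelity F = |Tr(U†Û)/d|² is assumed here as an illustrative choice of fidelity measure for d-dimensional unitaries):

```python
import tensorflow as tf

# Batch-fidelity cost C(F, 1) = mean((1 - F(U_hat, U))^2), with gate
# fidelity F = |Tr(U^dag U_hat)/d|^2 evaluated per batch element.
def batch_fidelity_cost(U_hat, U):
    d = tf.cast(tf.shape(U)[-1], tf.complex64)
    overlaps = tf.linalg.trace(tf.linalg.adjoint(U) @ U_hat) / d
    F = tf.abs(overlaps) ** 2              # fidelity per batch element
    return tf.reduce_mean((1.0 - F) ** 2)  # MSE of (1 - fidelity)

# usage: identical unitaries give zero cost
I = tf.eye(2, dtype=tf.complex64, batch_shape=[4])
print(batch_fidelity_cost(I, I).numpy())   # -> 0.0
```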
In section D.8 we briefly cover a range of these areas, both classical and quantum
as follows. These include (i) the use of machine learning in optimal (geometric) con-
trol theory, essentially as an optimisation method for PMP-based protocols, (ii)
geometric information theory, where famous relationships between Fisher-information
metrics and Riemannian metrics have seen the application of differential-geometric
techniques for optimisation of information-theoretic (or register-based) problems,
(iii) Lie group machine learning, a relatively early application of symmetry
techniques from Lie theory (familiar to control and geometric theory) to
problems in machine learning (such as learning symmetries in datasets reflective
of group symmetries) and (iv) geometric quantum machine learning, which leverages
Lie theory and particularly dynamical Lie algebras in the construction and design
of compositional neural networks and to address issues around barren plateaus. Of
these, geometric quantum machine learning (GQML), a field which has emerged rel-
atively recently, is the area within which this work is most well-situated. See section
D.8.3 for more detail.

2.6.8 Greybox machine learning


Finally, we briefly outline the key hybrid classical-quantum greybox machine learning
approach (see section D.9) which forms the basis of our design choices in Chapters 3
and 4. By encoding a priori information within a classical neural network stack, we
obviate the need for the network to learn such rules (such as requirements regarding
unitarity or Hamiltonian construction), thereby affording guarantees (such as unitarity
of outputs) in a way that blackbox learning at best only approximates.
quantum circuits and other approaches adopt implicitly similar techniques when,
for example, seeking to generate unitaries using Pauli generators from su(2). We
focus on synthesising the use of Lie algebras with time-optimal methods in geomet-
ric control as encoded within a hybrid quantum-classical variational (parametrised)
circuit network.
In our case, especially in Chapter 4, our targets are UT ∈ G ≃ M (for Lie group
G) generated (via Hamiltonian flow) by Hamiltonians H from the corresponding Lie
algebra g. We focus on an architecture that learns optimal control from training data
comprising geodesic unitary sequences in order to generate geodesics (or approxi-
mate them) on G. As discussed above, learning optimal control functions uj(t) (for
directions j, where j = 1, ..., dim g) is a form of optimisation in a machine learning
context; the uj(θ(t)) parametrise the system and are learnt from training
data that satisfies PMP constraints. Control functions are adjusted to minimise our
loss functions (explicated below in terms of fidelity F (ÛT , UT )). Geometrically, this
is represented as transformations in one or more of the directions of elements of
Hp M (our distribution ∆). A sketch of the architecture is below:

(i) Objective. In both Chapters 3 and 4, our aim is to provide a sequence of controls
for synthesising UT ∈ G. In Chapter 3, this takes the form of a hybrid
quantum-classical circuit which learns control pulses that steer the Hamiltonians
comprising Lie algebra generators from su(2) towards generating candidate
unitaries ÛT which minimise fidelity error with our labelled data UT. In Chapter
4, we treat the sequence of unitaries (denoting the sequence (Uj )) as training
data that is generated according to PMP and geometric principles in order to
be geodesic. We embed a control function layer which is parameterised by θ
as per above, details of which can be extracted once fidelity of the candidate
geodesic sequence (Ûj ) reaches an acceptable threshold.

(ii) Input layers. Input layers a(0) thus vary but essentially in the case of Chapter
4 comprise target unitaries UT which are then fed into subsequent layers a(l) .

(iii) Feed-forward layers. The feed-forward layers (definition D.4.4) then comprise
typical linear neurons with an activation function σ given by a(1) = σ(W T a(0) +
b).

(iv) Control pulse layers. The feed-forward layers then feed into bounded control
pulse layers, there being one parametrised control function for each generator
in the Hamiltonian.

(v) Hamiltonian and unitary layers. The control functions are then combined
into Hamiltonian layers which are then fed into a layer comprising unitary
activation functions. In Chapter 4 this enables generation of the candidate
sequence (Ûj ).

(vi) Optimisation strategies. We utilise batch fidelity via an empirical risk measure
that is the MSE of one minus the fidelity of Uj and Ûj (or otherwise UT, ÛT; see
equation 4.6.2). We also experimented with a variety of other hyperparameters
(see section D.5.4), including the use of dropout [81], which effectively prunes
neurons in order to deal with overfitting. A schematic sketch of this stack in
code follows below.
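By way of illustration, a minimal sketch of such a stack follows (ours, with illustrative layer sizes, generators and encodings assumed; the thesis's actual architectures are given in the relevant Chapters):

```python
import tensorflow as tf

# Schematic greybox stack: classical dense layers emit bounded control
# amplitudes u_j; a Hamiltonian layer combines them with fixed su(2)
# generators; a matrix exponential acts as a 'unitary activation', so
# outputs are unitary by construction rather than by learning.
sx = tf.constant([[0, 1], [1, 0]], dtype=tf.complex64)
sy = tf.constant([[0, -1j], [1j, 0]], dtype=tf.complex64)
gens = tf.stack([sx, sy])                      # control generators

class UnitaryLayer(tf.keras.layers.Layer):
    def call(self, u):
        # u: (batch, 2) real controls -> H = sum_j u_j X_j -> U = exp(-iH)
        u_c = tf.complex(u, tf.zeros_like(u))
        H = tf.einsum('bj,jmn->bmn', u_c, gens)
        return tf.linalg.expm(-1j * H)

controls = tf.keras.Sequential([               # classical feed-forward stack
    tf.keras.Input(shape=(8,)),                # encoded target description
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(2, activation="tanh"),   # bounded control pulses
])

x = tf.random.normal((4, 8))
U_hat = UnitaryLayer()(controls(x))            # candidate unitaries
print(U_hat.shape)                             # (4, 2, 2)
```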

Pseudo-code describing algorithms is set out in the relevant chapter. As noted in
the subsequent Chapters, code for the relevant models is available on repositories.
Chapter 3

QDataSet and Quantum Greybox Learning

3.1 Abstract

The availability of large-scale datasets on which to train, benchmark and test algo-
rithms has been central to the rapid development of machine learning as a discipline.
Despite considerable advancements, the field of quantum machine learning has thus
far lacked a set of comprehensive large-scale datasets upon which to benchmark the
development of algorithms for use in applied and theoretical quantum settings. In
this Chapter, we introduce such a dataset, the QDataSet, a quantum dataset de-
signed specifically to facilitate the training and development of quantum machine
learning algorithms. The QDataSet comprises 52 high-quality publicly available
datasets derived from simulations of one- and two-qubit systems evolving in the
presence and/or absence of noise. The datasets are structured to provide a wealth
of information to enable machine learning practitioners to use the QDataSet to
solve problems in applied quantum computation, such as quantum control, quan-
tum spectroscopy and tomography. Accompanying the datasets on the associated
GitHub repository are a set of workbooks demonstrating the use of the QDataSet
in a range of optimisation contexts.
The QDataSet is constructed to mimic conditions in laboratories and experi-
ments where inputs and outputs to quantum systems are classical, such as via clas-
sically characterised controls (pulses, voltages) and measurement outcomes in the
form of a classical probability distribution over observable outcomes of measurement
(see section A.1.6). Actual quantum states, coherences and other characteristically
quantum features of the system, while considered ontologically extant, are, episte-
mologically speaking, reconstructions conditioned upon classical input and output
data. In a machine learning context, this means that the encoding of quantum
states and quantum processes (such as unitary evolution) represents the encoding of
constraints upon how computation may evolve. To this end, we follow the data
generation protocols set out in [82] which we explicate below.

3.2 Introduction
Quantum machine learning (QML) is an emergent multi-disciplinary field combin-
ing techniques from quantum information processing, machine learning and opti-
misation to solve problems relevant to quantum computation [72, 83–85]. The last
decade in particular has seen an acceleration and diversification of QML across a
rich variety of domains. As a discipline at the interface of classical and quantum
computing, subdisciplines of QML can usefully be characterised by where they lie on
the classical-quantum spectrum of computation [86], ranging from quantum-native
(using only quantum information processing) and classical (using only classical in-
formation processing) to hybrid quantum-classical (a combination of both quantum
and classical). At the conceptual core of QML is the nature of how quantum or
hybrid classical-quantum systems can learn in order to solve or improve results in
constrained optimisation problems. The type of machine learning of relevance to
QML algorithms very much depends on the specific architectures adopted. This is
particularly the case for the use of QML to solve important problems in quantum
control, quantum tomography and quantum noise mitigation. Thus QML combines
concepts and techniques from quantum computation and classical machine learning,
while also exploring novel quantum learning architectures.
While quantum-native QML is a burgeoning and important field, the more com-
monplace synthesis of machine learning concepts with quantum systems arises in
classical-quantum hybrid architectures [64, 87–90]. Such architectures are typically
characterised by a classical parametrisation of quantum systems or degrees of free-
dom (measurement distributions or expectation values) whose classical parameters
are updated according to classical optimisation routine (such as variational quantum
circuits discussed in section D.7). In applied laboratory and experimental settings,
hybrid quantum-classical architectures remain the norm primarily due the fact that
most quantum systems rely upon classical controls [91, 92]. To this end, hybrid
classical-quantum QML architectures which are able to optimise classical controls
or inputs for quantum systems have wider, more near-term applicability for both ex-
periments and NISQ [93, 94] devices. Recent literature on hybrid classical-quantum
algorithms for quantum control [40, 82], noisy control [95] and noise characteri-
sation [82] present examples of this approach. Other recent approaches include
the hybrid use of quantum algorithms and classical objective functions for natural
language processing [96]. Thus the search for optimising classical-quantum QML
architectures is well-motivated from a theoretical and applied perspective.


Despite the increasing maturity of hybrid classical-quantum QML as a discipline,
the field lacks many of the characteristics that have been core to the extraordinary
successes of classical machine learning in general and deep learning in particular.
Classical machine learning has been driven to a considerable extent by the availabil-
ity of large-scale, high-quality accessible datasets against which algorithms can be
developed and tested for accuracy, reliability and scalability. The availability of such
datasets as MNIST [97], ImageNet [98], Netflix [99] and other large scale corpora has
acted as a catalyst for not just innovations within the machine learning community,
but also for the development of benchmarks and protocols that have helped guide
the field. Such datasets have also fostered important cross-collaborations among dis-
ciplines in ways that have advanced classical machine learning. By contrast, QML
as a discipline lacks a similarly standardised set of canonical large-scale datasets
against which machine learning researchers (along the quantum-classical spectrum)
may benchmark their algorithms and upon which to base innovation. Moreover,
the absence of such large-scale standardised datasets arguably holds back important
opportunities for cross-collaboration among quantum physicists, computer science
and other fields.
In this Chapter, we seek to address this gap in QML research by presenting a com-
prehensive dataset for use in QML tailored to solving problems in quantum control,
quantum tomography and noise mitigation. The QDataSet is a dedicated resource
designed for researchers across classical and quantum computation to develop and
train hybrid classical-quantum algorithms for use in theoretical and applied settings
relating to these subfields of QML related to control, tomography and noise char-
acterisation of quantum systems. We name this dataset QDataSet and our project
the QML Dataset Project. The motivation behind the QML Dataset Project is to
map out a similar data architecture for the training and development of QML as ex-
ists for classical machine learning. The focus of the QDataSet on quantum control,
tomography and noise problems means that it is of most relevance to these areas as
distinct from other areas of QML, such as the development of quantum-only machine
learning architectures per se. The contributions of our paper are as follows:

1. Presentation of QDataSet for quantum machine learning, comprising multiple
rich large-scale datasets for use in training classical machine learning algorithms
for a variety of quantum information processing tasks including quantum control,
quantum tomography, quantum noise spectroscopy and quantum characterisation;

2. Presentation of desiderata of QML datasets in order to facilitate their use by
theoretical and, in particular, applied researchers; and

3. Demonstration of using the QDataSet for benchmarking classical and hybrid
classical-quantum algorithms for quantum control.

We set out below principles of large-scale datasets that are desirable in a quan-
tum machine learning context and which were adopted in the preparation of the
QDataSet. More detail on QML is set out in Appendix D.

3.3 Overview of QML


Quantum machine learning, a cross-disciplinary field that explores techniques and
synergies between quantum information processing and machine learning, has emerged
as a cutting-edge research field at the frontiers of quantum information and com-
putational science. In just a few short years, QML has expanded into a diverse
multitude of sub-disciplines, covering such topics as quantum algorithm design, al-
gorithmic optimisation, error correction, quantum control, tomography and topo-
logical state classification [38, 64, 69, 84, 100–103]. More broadly, QML intersects
classical and quantum sub-disciplines both in its domain, namely its application
and utilisation of classical and quantum data, and functionality/methodology, in its
utilisation of either classical or quantum (or a hybrid combination of both) informa-
tion processing methods. To this end, QML can be usefully segmented according to
a classical-quantum typology [86, 100] depending on whether quantum or classical
data or information processing is utilised:

(i) Classical-classical (CC): classical data is processed via classical information
processing methods; typically this characterises the vast majority of machine
learning disciplines;

(ii) Quantum-Classical (QC): quantum data is processed using classical information
processing;

(iii) Classical-Quantum (CQ): classical data is processed using quantum information
processing techniques, often characterised as classical information via quantum
channels [11, 43]; and

(iv) Quantum-Quantum (QQ): quantum data is processed using quantum information
processing, covering typical prospective use-cases of quantum computing devices.

The typology above is useful for classifying the various optimisation strategies
adopted across QML and quantum computing more widely. We expand on it in a
bit more detail below in order to situate the QDataSet within the literature. Optimi-
sation strategies across QML and quantum information processing vary considerably
depending upon use-cases, research programmes and objectives. Understanding the


differences between both classical and quantum data and between classical and quan-
tum information processing, is integral to any research programme involving QML.
In particular, while classical techniques and datasets often taken as a standard ref-
erence against which the various strands of QML are benchmarked, the distinct
natures of classical and quantum data and information processing mean that in
many cases direct analogies between quantum and classical information strategies
are unavailable. Indeed the non-equivalence of quantum and classical information
processing is precisely the underlying motivation behind the pursuit of quantum
computation itself. It should be noted, however, that while the ‘data’ versus
‘processing’ distinction is a useful heuristic, in reality the nature of quantum and classical
data is inseparable from the information processing associated with quantum and
classical physics. Furthermore, and more generally, classical information itself is
considered an emergent limit of underlying quantum information processing [104].

3.4 QML Objectives

3.4.1 Overview
Cross-disciplinary programmes focused on building quantum datasets for machine
learning will benefit from a framework to categorise and classify the particular
objectives of QML architectures, and from an articulation of a number of design
principles relevant to the taxonomy of QML datasets. Designing large-scale datasets
for QML requires
an understanding of the objectives for which QML research is undertaken and the
extent to which those objectives involve classical and/or quantum information pro-
cessing. Following [86], the application of machine learning techniques to quantum
information processing can be usefully parsed into a simple input / output and pro-
cess taxonomy on the basis of whether information and computational processes are
classical or quantum in nature. Here a process, input or output being ‘quantum
in nature’ refers to the phenomenon by which the input or output data was gen-
erated, or by which the computational process occurs, is itself quantum in nature
given that measurement outcomes are represented as classical datasets from which
the existence of quantum states or processes is inferred. Quantum data encoded
in logical qubits, for example in quantum states (superpositions or entangled), is
different from classical data, in practical terms information about such quantum
data arises by inference on measurement statistics whose outcomes are classical (see
section D.6.5 for a discussion of data encoding). This taxonomy can be usefully par-
titioned into four quadrants depending on the objectives of the QML task (to solve
classical or quantum problems) and the techniques adopted (classical or quantum
computational methods). Table (D.1) lists the various classical and quantum inputs
according to this taxonomy.

1. Classical machine learning for classical data. The first quadrant covers the
application of classical computational (machine learning) methods to solve
classical problems, that is, problems not involving data or processes of a quan-
tum character.

2. Classical machine learning for quantum data. The second quadrant covers
the application of classical computational and machine learning techniques
to solving problems of a quantum character. Specifically, this subdivision of
QML covers the use of standard machine learning techniques to solve problems
specific to the theoretical or applied aspects of quantum computing, including
optimal circuit synthesis [40,82,105], design of circuit architectures and so on.
Either input or output data are quantum in nature, while the computational
process by which optimisation, for example, occurs is itself classical.

3. Quantum algorithms for classical optimisation. The third quadrant covers the
application of quantum algorithmic techniques to solving classical problems. In
this subdivision, algorithms are designed leveraging the unique characteristics
of quantum computation, in ways that assist in optimising classical problems
or in solving certain classes of problems which may be intractable on a classical
computer. Quantum algorithms are designed with machine learning char-
acteristics, potentially utilising certain computational resources or processes
unavailable when constrained to classical computation. Examples of such algo-
rithms include variational quantum eigensolvers [106–109], quantum analogues
of classical machine learning techniques (e.g. quantum PAC learning [110]) and
hybrid quantum analogues of deep learning architectures (see [72, 90, 111] for
background).

4. Quantum algorithms for quantum information processing. The fourth quad-
rant covers the application of quantum algorithms to solve quantum problems,
that is, problems whose input or output data is itself quantum in nature. This
division covers the extensive field of quantum algorithm design, including the
famous Grover and Shor algorithms [112–114].

The QDataSet fits within the second subdivision of QML, its primary use being
envisaged as assisting in the development of classical algorithms for optimisation
problems of engineered quantum systems. Our focus on classical techniques ap-
plied to quantum data is deliberate: while advancements in quantum algorithms are
both exciting and promising, the unavailability of a scalable fault-tolerant quantum
computing system and limitations in hybrid NISQ devices mean that for the vast
majority of experimental and laboratory use cases, the application of machine learn-
ing is confined to the classical case. Secondly, as a major motivation of this work
is to provide an accessible basis for classical machine learning practitioners to enter
the QML field, it makes sense to focus primarily on applying techniques from the
classical domain to quantum data.

3.4.2 Large-Scale Data and Machine Learning

Classical machine learning has become one of the most rapidly advancing scientific
disciplines globally with immense impact across applied and theoretical domains.
The advancement and diversification of machine learning over the last two decades
has been facilitated by the availability of large-scale datasets for use in the research
and applied sciences. Large-scale datasets [14,97,115] have emerged in tandem with
increasing computational power that has seen the velocity, volume and veracity of
data increase [12,116]. Such datasets have both been a catalyst for machine learning
advancements and a consequence or outcome of increasing scope and intensity of
data generation. The availability of large-scale datasets led to the evolution of data
mining, applied engineering and even theoretical results in high energy physics [117].
An important lesson for QML is that developments within these fields have been
facilitated using such datasets in a number of ways. Large-scale datasets improve the
trainability of machine learning algorithms by enabling finer-grained optimisations
via commonplace methods such as backpropagation (discussed in Appendix D). This
has particularly been true within the field of deep learning and neural networks
[14], where large-scale datasets have enabled the development of deeper and richer
algorithmic architectures able to model complex non-linearities and functional forms,
in turn leading to drastic improvements and breakthroughs across domains such as
image classification, natural language processing [118, 119] and time series analysis.
With an abundance of data on which to train algorithms, new techniques such as
regularisation and dimensionality reduction to address problems arising from large-
scale datasets, including overfitting and complexity considerations, have emerged,
in turn spurring novel innovations that have contributed to the advancement of the
field. Large-scale datasets have also served a standardising function by providing
a common basis upon which algorithmic performance may be benchmarked and
standardised. By providing standard benchmarks, large-scale datasets have enabled
researchers to focus on important features of algorithmic architecture in the design
of improvements to training regimes. Such datasets have also enabled the fostering
of the field via competitive platforms such as Kaggle, where researchers compete to
improve upon state of the art results.
Data Set Characteristics

Item                          Description
Objectives                    Specification of the objectives for which the dataset was both
                              created and to be used
Description                   Sufficient description of data, representation and theoretical
                              description
Training/test                 Identification of training (in-sample) and test (out-of-sample)
                              subsets of data
Data Types                    Specification of the types of data and formats to be used
Structuring                   Degree to which data is structured or unstructured
Dimensionality                The dimension of the datasets, whether dimensional reduction
                              or kernel methods are needed
Preprocessing                 Extent to which preprocessing is required, covering
                              transformations of data such as sparsification or decomposition
Data quality, consistency     Extent to which data is missing, uncertain, incorrect or noisy,
and completeness              along with any necessary imputation methods
Visible v. hidden             Extent to which data points are ‘direct’ or ‘indirect’ (inferred)
                              and whether imputations are necessary

Table 3.1: Taxonomy of large-scale datasets which can guide the generation of QML datasets and
development of QML dataset taxonomies.

3.4.3 Taxonomy of large-scale datasets


Large-scale classical machine learning datasets share common structural and archi-
tectural characteristics designed to facilitate the objectives for which the datasets
were compiled. There are a range of considerations when generating these types of
datasets, including the specific objectives, the types of data to be stored, the degree
of structuring of data (including whether highly structured or unstructured), the
dimensionality of datasets, the extent of preprocessing of datasets required, data
quality issues (such as missing, uncertain or incorrect data - an issue for example in
quantum information processing contexts given sources of error and uncertainty),
data imputation for missing datasets and whether data is visible or hidden data (e.g.
whether data is direct or a feature constructed from other data), the number of data
points, format, default tasks of the datasets, data temporality (how contemporane-
ous data is), control of datasets and access to data. Datasets are also structured
depending on the machine learning algorithms for which they were developed, taking
into account the types of objectives, loss functions, optimisers, development envi-
ronment and programming languages of interest to researchers. Table (3.1) sets out
a range of issues and desiderata in this regard.

Large-scale dataset characteristics affect the utility of the datasets in applied con-
texts. Such characteristics are relevant to the design of quantum datasets. Below we
set out a number of principles used in the design of the QDataSet which we believe
provide a useful taxonomy for the QML community to consider when generating data
for use in machine learning-based problems. The aim of the proposed taxonomy for
quantum datasets is to facilitate their interoperability across machine learning plat-
forms (classical and quantum) and for use in optimisation for experimentalists and
engineered quantum systems. While taxonomies and specific architectures will differ
across domains, we believe our proposed QDataSet taxonomy will assist the QML
and classical ML communities in guiding large-scale data generation towards the principles
of interoperability summarised in Table (3.1) and explained below:

1. Objectives. Quantum datasets, as with classical datasets, benefit from being
constructed with particular objectives in mind. Most major classical large-
scale datasets are compiled for specific objectives such as, for example, clas-
sification or regression tasks. In a quantum setting, such objectives include
quantum algorithm design, circuit synthesis, quantum control, tomography or
measurement-based objectives (such as sampling). The QDataSet’s objectives
are to provide training data for use in the development of machine learning algorithms
for controlled experimental and engineered quantum systems. This objective
has informed the feature selection and structural design, such as inclusion of
measurement statistics, Hamiltonian and unitary sequences and the various
types of noise and distortion.

2. Description. Sufficiently describing datasets, efficiently representing the data
and providing theoretical context for how and why the datasets are so repre-
sented, enhances their utility. Representation of data (its form, structure, data
types and so on) affects their ease of use and uptake. For machine learning,
optimal data representation is an important aspect of feature learning and repre-
sentation learning [120]. In this work, we go to some lengths to describe the
various structural aspects of the QDataSet in order to facilitate its uptake by
researchers in designing algorithms. We have especially set out background
information for machine learning practitioners who may be unfamiliar with
quantum data in an effort to reduce barriers facing cross-disciplinary collabo-
ration.

3. Training and test sets. Applied datasets for machine learning require training,
validation and test sets in order to adequately train algorithms for objectives,
such as quantum control. The design of quantum datasets in general, and the
QDataSet in particular, has been informed by desirable properties of train-
ing sets. These include, for example: (i) interoperability, ensuring training
set data can be adequately formatted for use in various programming lan-
guages (for example storing QDataSet data via matrices, vectors and tensors
in Python); (ii) generalisability, preprocessing of datasets to improve general-
isability of algorithmic results, especially to test or out-of-sample data [12] (in
the QDataSet, we do this via providing a variety of noise-affected datasets)
and minimise measures such as empirical risk (equation (D.3.2)) (see in gen-
eral Appendix D); (iii) feature smoothing: trained algorithms can often focus
on information-rich yet small subspaces of data which, while informative for
in-sample prediction, can lead to decreased generalisability across the majority
of in- and out-of-sample data lacking such features. Feature smoothing is a
technique to coarse-grain features so that less weight is put on rarer though
information-rich features in order to improve generalisation. In a quantum
context, this may involve an iterative process of trial and error that trains
datasets and seeks to identify relevant features; alternatively, it may involve
using techniques from quantum information theory to classify regions of high
and low information content e.g. via entropy measures.

4. Data precision and type. Data precision and data typing are important
considerations for quantum datasets, primarily to facilitate ease of interoper-
ability between software and applied/hardware platforms in the quantum space. Other
considerations include the degree of precision with which data should be
presented. Ideally, quantum data for use in developing algorithms for applica-
tion in experimental quantum control or measurement scenarios should allow
flexibility of data precision to match instruments used in laboratories on a
case-by-case basis. For example, the QDataSet choices regarding noise de-
grees of freedom (such as amplitude, mean and standard deviation) have been
informed by collaborations with experimental groups.

5. Structuring. Data structuring, the degree to which data is structured according
to taxonomies, is an important characteristic of classical datasets that affects
their use and functionality. For quantum datasets, structuring encompasses
the types of information that would be included and how that information is
categorised. In the selection of real-world applicable datasets, researchers will
have a range of choices of salient information to include, such as: theoretical
details of the candidate Hamiltonians, details of the physical laboratory setting
such as controls, exogenous parameters such as temperature (a significant en-
vironmental variable affecting quantum systems), noise or other disturbances;
the characteristics of measurement devices and so on. Spectroscopic informa-
tion, including details of the spectroscopy used, may also be included. What
to include and not include will depend upon the particular use cases and
generality (or specificity) of the datasets. In each case, it makes sense for
quantum datasets to contain as much useful information as possible such as
about parameters, say exogenous environment parameters, or distortion infor-
mation which may affect measurement devices. Doing so enables algorithms
trained on quantum datasets to improve their performance and generalise bet-
ter. Examples of such information in the QDataSet include details we have
included regarding noise profiles and distortion simulations.

6. Dimensionality. The dimensionality of datasets is an important consideration.
Large-dimensional datasets often require adaptation in order to facilitate al-
gorithmic learning. This is especially so in order to address the ubiquitous curse
of dimensionality [121], where, as dimensions of datasets increase, algorithms
may fail to converge, gradients may vanish or become stuck in barren plateaus.
This may occur during the preprocessing stage, in-processing or during post-
processing. Techniques such as principal component analysis [12], matrix fac-
torisation [101], feature extraction together with algebraic techniques such as
singular value decompositions are all motivated primarily to reduce the dimen-
sionality and complexity of datasets, thereby minimising the hypothesis search
space. Moreover, learning algorithms which can efficiently solve problems with
sparse datasets often have computational advantages. Quantum data by its
nature rapidly becomes higher-dimensional as the number of qubits or com-
putational resources increases. Such vast search spaces present challenges for
QML, such as the barren plateau problem [78], the quantum analogue of the
vanishing gradient problem in classical machine learning (albeit with differ-
ences arising due to exponentially-large search spaces) (see section D.6.4).

7. Preprocessing. Datasets often require or benefit from preprocessing in order
to ameliorate problems during training, such as vanishing gradients, bias or
problems with convergence. Preprocessing data can include techniques such
as sparsification [122] or smoothing or other strategies. For example, for quan-
tum circuit synthesis, ensuring training data samples are drawn from across
the Hilbert space of interest rather than limited to subspaces can assist with
generalisation (see [40] for a geometric example). In such cases, quantum
dataset preparation may benefit from preprocessing to address sparsity con-
cerns (see [123] for examples and for classical analogues of vanishing gradi-
ents [124]).

8. Visibility. Classical machine learning is often concerned with the extraction,
or development, of features. In many forms of classical machine learning,
such as those using kernel methods, or deep learning, features of importance
to optimal performance of an algorithm may need to be inferred from the
data. Quantum datasets in many ways face such challenges from the outset
as quantum data can never be directly observed; rather, it must be inferred
from measurement statistics. When constructing quantum datasets, the extent
to which such inferred (as distinct from directly observed) data is included will be an
important choice. In the QDataSet, we have chosen to include a range of
such ‘hidden’ or ‘inferred’ data to assist practitioners with use of the dataset,
including the intermediate forms of Hamiltonian, unitaries and other data that
is not itself directly accessible but is a by-product of our simulation (accessible
via intermediate layers).

Studying features of particular datasets and their use in classical contexts assists
in extracting desirable features for large-scale quantum datasets. ImageNet is one
of the leading research projects and database architectures for images [14, 98, 115].
The dataset is one of the most widely cited and important datasets in the de-
velopment of machine learning, especially for image-classification algorithms using
convolutional, hierarchical and other deep-learning based neural networks. The evo-
lution of ImageNet and its use within machine learning disciplines provides a useful
guide and comparison for the development of QML datasets in general. ImageNet
comprises two main components: (i) a public semi-structured dataset of images
together with (ii) an annual competition and workshop. The dataset provides a
‘ground truth’ standardised set of images against which to train categorical image
classification algorithms. The competition and workshop provided and continue to
provide an important institutional practice driving machine learning development.
While beyond the scope of this work, the development of QML would arguably be
considerably assisted by the availability of QML-focused competitions akin to those
commonplace within the classical machine learning community. Such competitive
frameworks would motivate and drive the development of scalable and generalisable
QML algorithms. As is also evident from classical machine learning, competitive for-
mats are also a useful way for laboratories, companies or other projects to leverage
the expertise of the diverse machine learning community.
Another example from the machine learning community which can inform the
development of QML is Kaggle, a leading online platform for machine learning-based
competitions. Kaggle runs competitions where competitors are provided with pre-
diction tasks, training sets and constraints upon the type of algorithm design (such
as resource use and so on). Competitors then develop models aiming to optimise a
measure of success, such as a standard machine learning metric of accuracy, AUC
or some other measure [125]. The open competitive nature of Kaggle is designed
to crowd-source solutions and expertise to problems in machine learning and data
science. A ‘quantum Kaggle’ would be of considerable benefit to the QML commu-
nity by providing a platform through which to spur collaborative and competitive
development of quantum algorithms.
3.4.4 QML Datasets


While quantum datasets for machine learning (quantum and classical) are neither
as prevalent nor as ubiquitous as those in the classical realm, there are some ex-
amples in the literature of quantum or hybrid quantum-classical datasets generated
for use in machine learning contexts. QML datasets can be categorised into: (1)
general quantum datasets produced for purposes other than QML, such as quantum
datasets in quantum chemistry or other fields, which can be preprocessed or used as
training data in QML contexts. Such datasets are not specifically produced for the
purposes of QML per se; (2) dedicated QML-specific quantum datasets, generated
and structured for the purposes of QML. This second category mainly consists of
quantum datasets for use in classical or hybrid machine learning contexts. Quan-
tum datasets currently available tend towards one or other of these classifications,
though there is overlap, for example, with quantum datasets designed for use in
machine learning which are nevertheless highly domain-specific. Examples include
quantum chemistry datasets for use in deep tensor neural networks [126], datasets
for learning spectral properties of molecular systems [127–130] and for solid-state
physics [131–134].
A recent example is provided by the dedicated quantum chemistry datasets
known as the QM7-X dataset [135], an expansive dataset of physiochemical proper-
ties for several million equilibrium and non-equilibrium structures of small organic
molecules. The QM7-X dataset spans regions of chemical compound space and was
generated to provide a basis for machine-learning assisted design of molecules with
specific properties. The dataset builds upon previous iterations of QM-series and
related quantum chemistry datasets [129, 136, 137]. Structurally, the dataset com-
bines global (molecular) properties and local (atomic) information, including ground
state quantities (spectra and moments) along with response quantities (related to
polarisation and dispersion). The dataset is highly domain-specific and represents
a salient example of a dataset designed to spur machine-learning driven research
within a particular field. More recent examples include entangled datasets for use
in QML research [138].

3.4.5 QML and QC platforms


The use of quantum datasets in machine learning has been facilitated over the last
several years by a surge in quantum programming languages and platforms
for both QML and quantum computing generally. While such platforms are dy-
namic and changing, it is important that quantum datasets for machine learning
be constructed to be as interoperable as possible with platforms across the quantum
and classical machine learning communities. Generators of large-scale quantum datasets should
be cognisant of how their data can be (more easily) used in the platforms described below
and also how their datasets can be designed in ways that facilitate their ease of
use within common machine learning frameworks, such as TensorFlow and PyTorch, and
across languages, such as Python, C#, Julia and others. The QDataSet has been
specifically designed in relatively elementary Python packages such as Numpy in or-
der to facilitate its use across the machine learning community, but also in a way that
we hope makes it useable and clearly understandable by the quantum engineering
community. We deliberately selected Python as the language of choice within which
to build the QDataSet simulation given its status as the leading classical program-
ming language of choice for machine learning. It also is a language adopted across
many of the quantum platforms above. We built the QDataSet using Numpy to
produce datasets as transferable as possible (rather than, for example, in Qutip). A
familiarity with the emerging quantum programming and QML ecosystem is useful
for the design of quantum datasets. We set out a few examples of leading quantum
programming platforms below.

Qutip [139] is a leading quantum simulation and algorithmic package in Python
used for open quantum systems’ simulation. The package, while not developed
specifically for QML, is widely used for hybrid quantum-classical systems’ research.
Inputs to Qutip algorithms are Numpy-based vectors, tensors and matrices used to
represent density matrices, quantum states and operators. Qutip permits a wide
range of simulations to be run and data to be generated, including for state prepa-
ration, control and drift Hamiltonians, pulse sequences and noise modelling. As dis-
cussed below, the QDataSet was built in Python using primarily the Numpy
package, but was verified using Qutip. Q# is Microsoft’s primary open-source pro-
gramming language for quantum algorithm design and execution. The platform
comprises a number of libraries, simulators and a software development kit (QDK).
Quantum Tensorflow (QTF) [140] is a hybrid quantum-classical version of Google’s
leading open-source machine learning Tensorflow platform. QTF is constructed to
enable the synthesis of classical and quantum algorithmic machine learning, for ex-
ample classically parameterised quantum circuits, variational quantum circuits and
eigensolvers, quantum convolutional neural networks and other quantum analogues
of classical machine learning architectures. QTF follows Tensorflow’s overall ma-
chine learning structure and data taxonomy. Input data is usually in the form of
tensors. QTF’s in-platform datasets vary depending on use case, but the platform
primarily draws upon classical datasets for hybrid use cases (quantum computation
applied to solving classical optimisation tasks). For simulated quantum-native data,
QTF draws upon Cirq, Google’s open source framework for programming quantum
computers [141]. Cirq is focused on providing a software library for research into
and simulations of quantum circuits, the idea being to develop quantum algorithms
in Cirq that can be run on quantum computers and simulators.


Strawberry Fields [142] is an open-source QML and quantum algorithm pro-
gramming platform developed by Xanadu for photonic quantum computing. Qiskit
is another open source software development kit for quantum circuits, control and
applications on quantum and hybrid computers [143]. Qiskit is based on an open
source quantum assembly language (QASM) standardised abstraction of quantum
circuits. Other platforms enabling the integration of quantum datasets and QML
algorithms (either quantum or classical) include those available via IBM’s Quantum
Experience. The QDataSet has been designed for interoperability across most
of these platforms. Practically speaking, this means that researchers can select
dataset features of interest, such as tensors of quantum states, Hamiltonians, uni-
tary operators (gates) or even noise information and integrate as datasets for use in
algorithms designed using platforms above. Similarly, machine learning researchers
should find the form of data relatively familiar to typical datasets in machine learn-
ing, where information is encoded in tensors, lists or matrices. Examples of using
similar datasets in customised TensorFlow machine learning models can be found in
various sources [40, 82].

3.4.6 Quantum data in datasets

An important aspect of quantum dataset design is the decision regarding what quan-
tum information to include in the dataset. In this section, we list types of quantum
data which may be included in large-scale quantum datasets. By quantum data, we
refer to data generated by, or characterising, quantum systems or processes. More specif-
ically, by quantum data we refer to quantum registers (see definition A.1.3) discussed
in Appendix A together with an assumed state preparation procedure (see [43, 144]
on state preparation generally and the Appendices for more discussion) which suffi-
ciently encode information into quantum states that subsist and transform according
to the protocols of quantum information processing. Quantum data may comprise
a range of different properties, features or characteristics of quantum systems and the
environment around them. Such data is described using a classical al-
phabet Σ and encoded in quantum states in ways that constitute a quantum register
of that information. It may comprise data and information abstracted into a partic-
ular representation or form, such as circuit gates, algebraic formulations, codes etc
or more physical forms, such as statistics or read-outs from measurement devices.
For QML datasets, it is useful to ensure that quantum data is sufficiently described for classi-
cal machine learning researchers to understand, and to integrate quantum data
into their algorithmic techniques. For example, a classically parameterised quantum
circuit (section D.7), as common throughout the QML literature, would typically
include data or tensors of the relevant parameters, the operators related to such pa-
rameters (such as generators) and the quantum states (vectors or density operators)
upon which the circuit acts.
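
To make this concrete, the following is a minimal, purely illustrative sketch of how one sample of such a parameterised-circuit dataset might be packaged as Numpy arrays. This is not the QDataSet's actual schema; the field names are hypothetical.

    import numpy as np

    # Hypothetical record for one sample of a parameterised-circuit dataset:
    # the circuit parameters, the generators they multiply, and the initial state.
    sample = {
        "theta": np.array([0.1, 0.7]),                      # circuit parameters
        "generators": np.stack([
            np.array([[0, 1], [1, 0]], dtype=complex),      # sigma_x
            np.array([[1, 0], [0, -1]], dtype=complex),     # sigma_z
        ]),
        "rho0": np.array([[1, 0], [0, 0]], dtype=complex),  # initial state |0><0|
    }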

3.4.7 QDataSet Methodological Overview

In this section, we provide an overview of the methods according to which the
QDataSet was developed. Our exposition includes detail of (i) the Hamiltonians used
to generate the dataset, (ii) characteristics of control mechanisms used to control
the evolution of the quantum simulations and (iii) measurement procedures by which
information is extracted from the evolved quantum simulations. This section is
stand-alone but can also be read in tandem with the Appendices. We aim to equip
classical machine learning practitioners with a minimum working knowledge of our
methodology to enable them to understand both how the datasets were generated
and the ways in which quantum data and quantum computing differ from their
classical counterparts relevant to solving problems in quantum control, tomography
and noise spectroscopy. Our focus, as throughout this work, is on the application of
classical and hybrid classical-quantum machine learning algorithms and techniques
to solve constrained optimisation problems using quantum data (as distinct from
the application of purely quantum algorithms).

A synopsis of quantum postulates and quantum information processing set out in
Appendix A also aims to provide a mapping between the ways in which data and compu-
tation is characterised in quantum contexts and their analogues in classical machine
learning. For example, the description of dataset characteristics, dimensionality,
input features, training data to what constitutes labels, the types of loss functions
applicable are all important considerations in classical contexts. By providing a
straightforward way to translate quantum dataset characteristics into typical clas-
sical taxonomies used in machine learning, we aim to help lower barriers to more
machine learning practitioners becoming involved in the field of QML. A high-level
summary of some of the types of quantum features that QML-dedicated datasets
ideally would contain is set out in the background Appendices. Our explication
of the QDataSet below provides considerable detail on each quantum data feature
contained within the datasets including parameters, assumptions behind our choice
of quantum system specifications, measurement protocols and noise context. A dic-
tionary of specifications for each example in the QDataSet is set out in Tables (3.8)
and (3.9).
3.4.7.1 Formalism

Recall in density operator formalism (definition A.1.24), a quantum system can be
described via a Hermitian positive semi-definite density operator ρ ∈ B(H) with
trace unity acting on the state space of the system (such that if the system is in
state ρi with probability pi then ρ = Σi pi ρi as per equation (A.1.13)). Density
operators are generalised operator-representations of probability distributions over
quantum states with particular properties: all their eigenvalues have to be real,
non-negative, sum to unity, inheriting the necessary constraints of a probability dis-
tribution. States |ψ⟩ in the Hilbert space may describe composite systems, formed as
the tensor product of the state spaces (see A.1.1.1) of the component physical systems,
that is |ψ⟩ = ⊗i |ψi ⟩. We also mention here (as discussed in section A.3) the impor-
tance of open quantum systems, where a total system Hamiltonian H can be
decomposed as H = HS + HE + HI (equation (A.3.1)), comprising a closed quan-
tum system Hamiltonian HS , an environment Hamiltonian HE and an interaction
Hamiltonian term HI , which is typically how noise is modelled in quantum con-
texts. Open quantum systems are typically approximated using master equations
to capture the dissipative effects of system/environment interaction. The dissipative
nature of open quantum systems has parallels with the dissipative characteristics
of neural networks (see [111]). Simulated data of open quantum systems can be
generated in a number of packages, such as Qutip. For the QDataSet, we made a
design decision to directly simulate the effects of coupling to dissipative environmen-
tal baths using more elementary Monte Carlo methods. The reason for this is that
the master equation formalism requires a number of assumptions on the system
(see [139, 144]) which may be difficult to apply to experimental setups. We also
chose to manually engineer the effect of baths in order to minimise the theoretical
barriers for classical machine learning practitioners using the QDataSet.
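
As a rough sketch of this Monte Carlo approach (illustrative only, and not the QDataSet generation code itself): one draws random realisations of a classical noise process β(t), solves the Schrödinger equation for each realisation, and averages the resulting observables. The noise model and parameter values below are assumptions for illustration.

    import numpy as np
    from scipy.linalg import expm

    sigma_x = np.array([[0, 1], [1, 0]], dtype=complex)
    sigma_z = np.array([[1, 0], [0, -1]], dtype=complex)

    rng = np.random.default_rng(0)
    dt, steps, K = 0.01, 200, 500           # time step, steps, noise realisations
    psi0 = np.array([1, 0], dtype=complex)  # initial state |0>

    expectations = np.zeros(K)
    for k in range(K):
        beta = rng.normal(0.0, 1.0, size=steps)    # one white-noise realisation
        psi = psi0.copy()
        for b in beta:                             # piecewise-constant evolution
            H = 0.5 * sigma_x + 0.5 * b * sigma_z  # control + noise Hamiltonian
            psi = expm(-1j * H * dt) @ psi
        expectations[k] = (psi.conj() @ sigma_z @ psi).real

    print(expectations.mean())  # Monte Carlo average over noise realisations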
In this work, we focus on both closed and open quantum system simulation.
Closed quantum systems evolve over time ∆t = t1 − t0 according to the Schrödinger
equation (A.1.18) whose solutions take the form of unitary channels. As discussed
below, given the difficulties in solving for time dependency, unitaries are typically
approximated by time-independent sequences of unitaries (see section A.1.5.1). The
evolution of quantum states is characterised by such unitaries operating upon vectors
that transform (transition) to other states (section A.2.2). Intermediate quantum
states |ψ ′ ⟩ may be represented as the result of applying unitary operators to initial
states |ψ⟩ such that |ψ ′ ⟩ = U (t) |ψ0 ⟩ (or ρ′ = U (t)ρ0 U (t)† ). Given initial quantum
states, quantum state evolution can be represented entirely via operator dynamics
and representations or more information-theoretically in terms of quantum channels
acting to transition among quantum registers. In standard programming languages,
78 CHAPTER 3. QDATASET AND QUANTUM GREYBOX LEARNING

operators will take the form of matrices or tensors. It is worth noting that the
operator acting on a quantum state ρ is a unitary U (t) ∈ B(H) which is itself (in
a closed quantum system) a function (or representation) of the Hamiltonian H(t)
governing the system dynamics. In practice, unitaries are formed by exponentiating
time-independent approximations of Hamiltonians. These unitaries represent solu-
tions to the time-dependent Schrödinger equation governing evolution for a closed
system.
The evolutionary dynamics of the quantum system are completely described by
the Hamiltonian operator acting on the state |ψ⟩ such that |ψ(t)⟩ = U (t) |ψ(t = 0)⟩
(section A.1.4). In density operator notation, such evolution is represented as ρ(t) =
U (t)ρ(t0 )U (t)† . Typically solving the continuous form of the Schrödinger equation
is intractable or infeasible, so a discretised approximation as a discrete quantum
circuit (where each gate Ui is run for a sufficiently small time-step ∆t) is used (e.g.
via Trotter-Suzuki decompositions). Open quantum systems are described by the
formalism of Lindbladian master equations (equation A.1.19), representing the effect
of noise sources upon the evolution of quantum states.
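
As a minimal sketch of the discretised evolution just described, assuming a single qubit and using only Numpy and Scipy (the parameter values are illustrative), one can propagate a density operator step by step:

    import numpy as np
    from scipy.linalg import expm

    # Drift Hamiltonian H = (1/2) Omega sigma_z with an illustrative energy gap
    sigma_z = np.array([[1, 0], [0, -1]], dtype=complex)
    H = 0.5 * 1.0 * sigma_z

    # Initial state |+> as a density operator rho = |psi><psi|
    psi0 = np.array([1, 1], dtype=complex) / np.sqrt(2)
    rho = np.outer(psi0, psi0.conj())

    # Discretised evolution rho -> U rho U^dagger with U = exp(-i H dt)
    dt, steps = 0.01, 100
    U = expm(-1j * H * dt)
    for _ in range(steps):
        rho = U @ rho @ U.conj().T

    print(np.trace(rho).real)  # unitary evolution preserves the trace (= 1)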

3.4.7.2 Additional methodological concepts

There are a number of other important concepts for classical machine learning prac-
titioners to be aware of when using quantum datasets such as the QDataSet. We
set out some of these: (a) relative phase (see section A.1.3), for a qubit system,
amplitudes a and b differ by a relative phase if a = exp(iθ)b, θ ∈ R (relative phase
is an important concept as the relative phase encodes quantum coherences and is,
together with basis encoding, a primary means of encoding data in quantum sys-
tems); (b) entanglement (see section A.1.7), composite states (known as EPR or Bell
states), may be entangled, meaning that their measurement outcomes are necessarily
correlated. For a two-qubit state:

|ψ⟩ = (|00⟩ + |11⟩)/√2    (3.4.1)

a measurement outcome of 0 on the first qubit necessarily means that a measure-
ment of the second qubit will result in the post-measurement state |0⟩ also i.e. the
measurement statistics of each qubit correlate. States that are entangled cannot
be written as tensor products of component states i.e. |ψ⟩ ≠ |ψ1 ⟩ |ψ2 ⟩ (see section
A.1.7); (c) expectation, expectation values of an operator A (e.g. a measurement)
can be written as E(A) = tr(ρA) (see equation (A.1.41)); (d) commutativity: where multi-
ple measurements are performed on a system, the outcome will be order-dependent
if the measurement operators corresponding to those measurements do not commute
i.e. if [A, B] ≠ 0 (see definition B.2.6). This is distinct from the classical case; and (e)
no cloning, quantum data cannot be copied, fundamentally distinguishing it from
classical data (see theorem A.1.39).
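
The following short Numpy sketch (with illustrative values) demonstrates points (b) to (d) above: the Bell state of equation (3.4.1), an expectation value computed as tr(ρA), and the non-commutativity of Pauli operators.

    import numpy as np

    sigma_x = np.array([[0, 1], [1, 0]], dtype=complex)
    sigma_z = np.array([[1, 0], [0, -1]], dtype=complex)

    # (b) Bell state |psi> = (|00> + |11>)/sqrt(2) as in equation (3.4.1)
    bell = np.zeros(4, dtype=complex)
    bell[0] = bell[3] = 1 / np.sqrt(2)
    rho = np.outer(bell, bell.conj())

    # (c) expectation E(A) = tr(rho A), here A = sigma_z on the first qubit
    A = np.kron(sigma_z, np.eye(2))
    print(np.trace(rho @ A).real)      # 0: outcomes +1 and -1 equally likely

    # (d) [sigma_x, sigma_z] != 0, so measurement order matters
    print(np.allclose(sigma_x @ sigma_z, sigma_z @ sigma_x))  # False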
There are a range of other aspects of quantum systems that are relevant to the
use of machine learning algorithms for solving optimisation problems (see section
D.5) which we pass over but which are relevant to research programmes using such
algorithms, including the role of error-correcting codes (designed to limit or self-
correct errors to achieve fault-tolerant quantum computing). While not the focus of
the QDataSet, it is worth noting for machine learning practitioners that a distinction
is usually drawn in the quantum information literature between physical and logical
qubits. A physical qubit is a two-level physical quantum system. A logical qubit is
itself an abstraction from a collection of physical qubits which in aggregate behave
according to qubit parameters [145]. For more complex treatments (involving a
multitude of physical qubits) in quantum control or quantum error correction, the
underlying simulation code may be adapted accordingly.

3.4.7.3 One- and two-qubit Hamiltonians

We now describe the QDataSet Hamiltonians which are integral to understanding
the method by which the datasets were generated. First we describe the single-qubit
Hamiltonian and then move to an exposition of the two-qubit case. Each Hamilto-
nian has been designed consistent with solving use-cases below. In particular, we
have adopted quantum control formalism for Hamiltonians (as discussed in Appen-
dices A and B) in terms of drift and control components (see section A.2). For the
single-qubit system, the drift Hamiltonian Hd (t) is fixed in the form:

Hd(t) = Hd = ½ Ω σz.    (3.4.2)

Here σz is the Pauli generator (see equation (3.5.17)) for z-axis rotations. The Ω
term represents the energy gap of the quantum system (the difference in energy
between, for example, the ground and excited state of the qubit, recalling qubits
are characterised by having two distinct quantum states). The single-qubit drift
Hamiltonian for the QDataSet is time-independent for simplicity, though in realistic
cases it will contain a time-dependent component. For the single-qubit control
and noise Hamiltonians we have two cases, based upon the axes along which
controls and noise are applied. Recall we can represent a single qubit system on a
Bloch sphere, with axes corresponding to the expectations of each Pauli operator and
where operations of each Pauli operator constitute rotations about the respective
axes. Our controls are control functions (equation (A.2.6)), mostly time-dependent,
that apply to each Pauli operator (generator). They act to affect the amplitude over
time of rotations about the respective Pauli axes. More detailed treatments of noise
in quantum systems and quantum control contexts can be found in [15, 144].
As discussed above, the functional form of the control functions fα (t) varies (see
section A.2 for background on control functions). We select both square pulse and
Gaussian pulse waveforms (see below). Each noise function
βα (t) is parameterised differently depending on various assumptions that are more
specifically detailed in [82] and summarised below. Noise and control functions are
applied to different qubit axes in the single-qubit and two-qubit cases. For a single
qubit, we first have single-axis control along the x-direction:

Hctrl(t) = ½ fx(t) σx    (3.4.3)

with the noise (interaction) Hamiltonian H1(t) along the z-direction (the quantisation
axis):

H1(t) = ½ βz(t) σz    (3.4.4)

Here the function βz (t) (a classical noise function β(t) applied along the z-axis) may
take a variety of forms depending on how the noise was generated (see below for
a discussion of noise profiles e.g. N1-N6). See section A.3.2. It should be noted
that noise rarely has a closed functional form and is itself difficult to characterise (so β(t)
should not be thought of as a simple deterministic function). For the second case, we implement
multi-axis control along the x- and y-directions and noise along the x- and z-directions in
the form:

Hctrl(t) = ½ fx(t) σx + ½ fy(t) σy    (3.4.5)

H1(t) = ½ βx(t) σx + ½ βz(t) σz.    (3.4.6)

Noiseless evolution may be recovered by choosing H1(t) = 0. Note that, given the choice
of the z-axis as the quantisation axis, the application of x-axis noise may give rise
to decohering effects. For the two-qubit system, we chose the drift Hamiltonian in
the form:

Hd(t) = ½ Ω1 σz ⊗ σ0 + ½ Ω2 σ0 ⊗ σz.    (3.4.7)

For the control Hamiltonians, we also have two cases. The first one is local control
along the x-axis of each individual qubit, akin to the single-qubit case for each qubit. In
the notation, f1α (t) indicates that the control function is applied to, in this case,
the second qubit, while the first qubit remains unaffected (denoted by the ‘1’ in the
subscript and by the identity operator σ0 ). We also introduce an interacting control.
This is a control that acts simultaneously on the x-axis of each qubit, denoted by
fxx (t):

Hctrl(t) = ½ fx1(t) σx ⊗ σ0 + ½ f1x(t) σ0 ⊗ σx + fxx(t) σx ⊗ σx.    (3.4.8)

The second two-qubit case is for local control along the x-axis of each qubit only
and is represented as:

Hctrl(t) = ½ fx1(t) σx ⊗ σ0 + ½ f1x(t) σ0 ⊗ σx,    (3.4.9)

For the noise, we fix the Hamiltonian to be along the z-axis of both qubits, in the
form:

H1(t) = ½ βz1(t) σz ⊗ σ0 + ½ β1z(t) σ0 ⊗ σz.    (3.4.10)

Notice that, for the case of local-only control and noiseless evolution, this corre-
sponds to two completely independent qubits and thus we do not include this case, as
it is already covered by the single-qubit datasets. We also note that not all interac-
tion terms (such as σz ⊗σz ) need be included in the Hamiltonian. The reason for this
is that to achieve universal control equivalent to including all generators, one only
need include one-local control for each qubit together with interacting (entangling)
terms (though we note recent results regarding constraints on 2-local operations in
the presence of certain symmetries [146]). Assuming one has a minimal (bracket-
generating, see definition C.4.4) set of Pauli generators in su(2) in the Hamiltonian,
one may synthesise any Pauli gate of interest for the one- or two-qubit systems
(i.e. given two Pauli gates, one can synthesise the third) making the set of targets
reachable (definition C.5.4), thus achieving effective universal control (see section C.5
for bracket-generating subalgebras and equation B.2.22 for discussion of the BCH
formula).
To summarise, the QDataSet includes four categories for the datasets, set out in
Table 3.6. The first two categories are for 1-qubit systems: the first with single-axis
control and noise, the second with multi-axis control and noise. The third and
fourth categories are 2-qubit systems with local-only control or with an additional
interacting control, together with noise.
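
The Hamiltonians above are straightforward to assemble numerically. As a sketch, assuming Numpy and treating the control and noise functions as arbitrary callables supplied by the user, the single-qubit Hamiltonians of equations (3.4.2)-(3.4.4) and the two-qubit drift of equation (3.4.7) might be constructed as follows:

    import numpy as np

    # Pauli basis; sigma_0 is the identity
    s0 = np.eye(2, dtype=complex)
    sx = np.array([[0, 1], [1, 0]], dtype=complex)
    sz = np.array([[1, 0], [0, -1]], dtype=complex)

    def single_qubit_H(t, Omega, f_x, beta_z):
        """Total H(t) = Hd + Hctrl(t) + H1(t) for the single-axis case,
        equations (3.4.2)-(3.4.4); f_x and beta_z are callables."""
        return 0.5 * Omega * sz + 0.5 * f_x(t) * sx + 0.5 * beta_z(t) * sz

    def two_qubit_drift(Omega1, Omega2):
        """Hd = (1/2) Omega1 sz (x) s0 + (1/2) Omega2 s0 (x) sz, equation (3.4.7)."""
        return 0.5 * Omega1 * np.kron(sz, s0) + 0.5 * Omega2 * np.kron(s0, sz)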

3.4.8 Hamiltonians: drift, control, noise


Recapping the above, the QDataSet comprises datasets for one- and two-qubit systems
evolving in the presence and absence of noise. The canonical forms of Hamiltonian
discussed above from which those in the QDataSet are developed are given in [82].
In that work, a limited set of single-qubit systems subject to external environmental
noise (baths) was used as input training data for a novel greybox machine learning
alternative method for characterising noise. In this work, the underlying simulation
was modified to generate a greater variety of qubit-noise examples for the single
qubit case. The simulation was then extended beyond that in [82] to generate ex-
amples for the two-qubit case (in the presence or absence of noise). As discussed
above, the evolution of closed and open quantum systems is described by Hamil-
tonian dynamics, which encode time-dependent functions into operators which are
the Lie algebra generators (see section B.2.4) of time-translations (operators) acting
on quantum states. To summarise, the Hamiltonian comprises three elements: (i) a
drift Hamiltonian Hd (t), encoding the endogenous evolution of the quantum system;
(ii) a control Hamiltonian Hctrl (t), encoding evolution due to the application of clas-
sical controls which may be applied to the quantum system to steer its evolution;
and (iii) an interaction (noise) Hamiltonian H1(t), encoding the effects of cou-
pling of the quantum system to its environment, such as a decohering noise source
(a bath). The Hamiltonians are composed using Pauli generators (see section 3.5.3.5
and equations (3.5.17)), representing elements of the Lie algebra su(2). We express
the Hamiltonians in the Pauli operator basis {σi } which forms a complete basis for
our one- and two-qubit systems. Our control functions are represented as fα (t) for
α = x, y, z where the subscript indicates which generator the control applies to.
Concretely, for example, σz control is denoted fz (t)σz . In general, continuous time-
dependent control formulations are difficult - or infeasible - to solve analytically,
where solving here means finding a suitable representation of the control unitary
(equation (A.1.22)):
Uctrl = T+ exp(−i ∫_0^T fα(t) σα/2 dt)    (3.4.11)

Most controls are classical, i.e. fα(t) ∈ R. The simplest control functional
form is fixed-amplitude control [147], also described as a square pulse,
where a constant energy (expressed as an amplitude) is applied for a discrete time-
step ∆t. Other control waveforms include Gaussian. Moreover, quantum control in
the QDataSet context has two primary imperatives. The first is the use of control in
closed noise-free idealised quantum systems where the objective is the use of controls
to steer the quantum system to some desired objective state. This is equivalent to
the synthesis of quantum circuits (sequences of quantum gates) from the identity I
to a target unitary UT . The second is the use of controls in the presence of noise,
where the quantum system is coupled to an environment that potentially decoheres
the system. In this second case, ideally the controls are tailored to neutralise the
effect of noise on the evolution of the quantum system - a process typically described
by dynamic decoupling [148, 149] (see for example Hahn echo or other examples).
Crafting suitable controls to mitigate noise effects is challenging. One must properly
time and select appropriate amplitudes for the application of controls to counteract
decohering interference, modelled as a superoperator term in master equations, for
example (see equation (A.3.5)). Typically, it requires information about the noise
spectrum itself, obtained using techniques from quantum noise spectroscopy [144].
It also requires an understanding of how control and noise signals convolve in the
context of quantum evolution. Dealing with noise is a central imperative of quantum
information processing and the engineering of quantum systems where the aim is to
preserve and extend coherence times of quantum information and to correct for errors
that arise during the evolution of quantum computational systems. To this end,
a major stream of research in quantum information processing concerns quantum
error-correcting codes (QEC) as a means of encoding quantum information in a way
that renders it robust to noise-induced errors and/or enables ‘self-correction’ of
errors as a quantum system evolves.
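
In practice, the time-ordered exponential of equation (3.4.11) is approximated as a product of piecewise-constant steps, as noted above. A minimal Numpy/Scipy sketch of this approximation follows; the pulse values are illustrative only.

    import numpy as np
    from scipy.linalg import expm

    sigma_x = np.array([[0, 1], [1, 0]], dtype=complex)

    def control_unitary(f_vals, dt, sigma=sigma_x):
        """Approximate equation (3.4.11) by a time-ordered product of
        piecewise-constant steps, with later steps multiplied on the left:
        U = exp(-i f(t_n) sigma dt/2) ... exp(-i f(t_1) sigma dt/2)."""
        U = np.eye(2, dtype=complex)
        for f in f_vals:
            U = expm(-1j * f * sigma * dt / 2) @ U
        return U

    # Example: a square pulse of unit amplitude held over 100 steps of dt = 0.01
    U = control_unitary(np.ones(100), dt=0.01)
    print(np.allclose(U.conj().T @ U, np.eye(2)))  # True: U is unitary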

3.4.8.1 QDataSet Control

Recall that the control pulse sequences in the QDataSet consist of two types of
waveforms. The first is a train of Gaussian pulses, and the other is a train of square
pulses, both of which are very common in actual experiments. Square pulses are
the simplest waveforms, consisting of a constant amplitude Ak applied for a specific
time interval ∆tk :

f(∆tk) = f = Ak ∆tk    (3.4.12)

where k runs over the total number of time-steps in the sequence. The three param-
eters of such square pulses are the amplitude Ak , the position in the sequence k and
the time duration over which the pulse is applied ∆t. In the QDataSet, the pulse
parameters are stored in a sequence of vectors {an}. Each vector an is of dimension
r, storing the parameters of each pulse (e.g. the Gaussian pulse vectors store the amplitude,
mean and variance, the square pulse vectors store pulse position, time interval and
amplitude), enabling reconstruction of each pulse from those parameters if desired.
For simplicity, we assume constant time intervals such that ∆tk = ∆t. The Gaussian
waveform can be expressed as:
f(t) = Σ_{k=1}^{n} Ak exp(−(t − µk)²/2σk²).    (3.4.13)

where n is the number of Gaussian pulses in the sequence. The parameters of the
Gaussian pulses differ somewhat from those of the square pulses. Each of the n
pulses in the sequence is characterised by a set of 3 parameters: (i) the amplitude
Ak (as with the square pulses), (ii) the mean µk and (iii) the variance σk of the
pulse sequence. Thus in total, the sequence is characterised by 3n parameters. The
amplitudes for both Gaussian and square pulses are chosen uniformly at random
from the interval [Amin , Amax ], the standard deviation for all Gaussian pulses in
the train is fixed to σk = σ, and the means are chosen randomly such that there
is minimal amplitude in areas of overlapping Gaussian waveform for the pulses in
the sequence. The pulse sequences can be represented in the time or frequency
domains [150]. The QDataSet pulse sequences are represented in the time domain,
as this has been found to be a more efficient feature representation for machine learning algorithms [82].
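
As an illustration, the Gaussian pulse train of equation (3.4.13) can be generated in a few lines of Numpy. The parameter values below are illustrative only and do not reproduce the QDataSet's sampling procedure exactly.

    import numpy as np

    def gaussian_pulse_train(t, amps, means, sigma):
        """f(t) = sum_k A_k exp(-(t - mu_k)^2 / (2 sigma^2)), equation (3.4.13),
        with a shared standard deviation sigma as in the QDataSet."""
        t = np.asarray(t)[:, None]
        return np.sum(amps * np.exp(-(t - means) ** 2 / (2 * sigma ** 2)), axis=1)

    # Illustrative parameters: n pulses, amplitudes uniform on [A_min, A_max],
    # means spaced so that neighbouring Gaussians overlap minimally
    rng = np.random.default_rng(0)
    n, A_min, A_max, T = 5, -1.0, 1.0, 1.0
    amps = rng.uniform(A_min, A_max, size=n)
    means = (np.arange(n) + 0.5) * T / n
    f = gaussian_pulse_train(np.linspace(0, T, 1024), amps, means, sigma=T / (12 * n))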
As discussed in [82], the choice of structure and characteristics of quantum
datasets depends upon the particular objectives and use cases in question, the labo-
ratory quantum control parameters and experimental limitations. Training datasets
in machine learning should ideally be structured so as to enhance generalisabil-
ity. In the language of statistical learning theory, datasets should be chosen so as
to minimise the empirical risk (equation D.3.3) associated with candidate sets of
classifiers [12, 13]. In a quantum control context, this will include understanding for
example the types of controls available to researchers or in experiments, often volt-
age or (microwave) pulse-based [151]. The temporal spacing and amplitude of each
pulse in a sequence of controls applied during an experiment may vary by design
or to some degree uncontrollably. Pulse waveforms can also differ. For example,
the simplest pulse waveform is a constant-amplitude pulse applied for some time
∆t [152]. Such pulses are characterised by, for example, a single parameter, being the
amplitude of the waveform applied to the quantum system (this manifests as we dis-
cuss below as an amplitude applied to the algebraic generators of unitary evolution
(see [15, 40] for an example)). Other models of pulses (such as Gaussian) are more
complex and require more sophisticated parametrisation and encoding with machine
learning architectures in order to simulate. More detail on such considerations and
the particular pulse characteristics in the QDataSet are set-out in Tables (3.8) and
(3.9).

3.4.8.2 QDataSet Metrics

For most machine learning practitioners using the QDataSet, the entry point will
be the application of known classical machine learning metrics. More advanced
uses of the QDataSet may utilise quantum-specific metrics directly, for example,
via reconstruction of quantum states from measurement statistics. Some use cases
combine the use of classical and quantum metrics. For example, [40, 82] combine
average operator fidelity (equation A.1.49) with standard mean-squared error (MSE)
(equation (D.3.6)) into a measure of empirical risk denoted as ‘batch fidelity’ as per
equation (D.7.4). In those examples, the objective in question is to train a greybox
algorithm (section D.9) to model certain control pulses needed to synthesise target
unitaries. The algorithms learn the particular control pulses which are applied
to generators. While it is the extraction of control pulses which are of interest
to experimentalists, the final output of the algorithm is a sequence of fidelities
where the fidelity of generated (synthesised) unitaries is compared against the target
(label) unitaries UT . This sequence of fidelities is then compared against a vector of
ones with the loss function set to minimise the MSE (distance) between the fidelity
sequence and the label vector. In doing so, the algorithms are effectively trained to
maximise fidelity (as fidelities ≈ 1 are desirable) yet do so using a hybrid approach.
The QDataSet has been generated such that a combination of classical, quantum
and hybrid divergence metrics may be used in the training process.
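
A simple sketch of such a hybrid metric follows, assuming a normalised operator-overlap fidelity as a stand-in for the average operator fidelity of equation (A.1.49) used in [40, 82]; the function names are illustrative.

    import numpy as np

    def operator_fidelity(U, V):
        """Normalised overlap |tr(U^dagger V)| / d between a synthesised
        unitary U and a target V (a simple proxy for equation (A.1.49))."""
        return np.abs(np.trace(U.conj().T @ V)) / U.shape[0]

    def batch_fidelity_loss(unitaries, targets):
        """MSE between the vector of fidelities and a vector of ones,
        mirroring the 'batch fidelity' construction described above."""
        fids = np.array([operator_fidelity(U, V)
                         for U, V in zip(unitaries, targets)])
        return np.mean((fids - 1.0) ** 2)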

3.5 Experimental Methods


The QDataSet comprises 52 datasets based on simulations of one- and two-qubit
systems evolving in the presence and/or absence of noise subject to a variety of con-
trols. It has been developed to provide a large-scale set of datasets for the training,
benchmarking and competitive development of classical and quantum algorithms for
common tasks in quantum sciences, including quantum control, quantum tomogra-
phy and noise spectroscopy. It has been generated using customised code drawing
upon base-level Python packages and TensorFlow in order to facilitate interoper-
ability and portability across common machine learning and quantum programming
platforms. Each dataset consists of 10,000 samples which in turn comprise a range
of data relevant to the training of machine learning algorithms for solving optimi-
sation problems. The data includes a range of information (stored in list, matrix or
tensor format) regarding quantum systems and their evolution, such as: quantum
state vectors, drift and control Hamiltonians and unitaries, Pauli measurement dis-
tributions, time series data, pulse sequence data for square and Gaussian pulses and
noise and distortion data.

The total compressed size of the QDataSet (using Pickle and zip formats) is around
14TB (uncompressed, several petabytes). Researchers can use the QDataSet in a
variety of ways to design algorithms for solving problems in quantum control, quan-
tum tomography and quantum circuit synthesis, together with algorithms focused
on classifying or simulating such data. We also provide working examples of how
to use the QDataSet in practice and its use in benchmarking certain algorithms.
Each part below provides in-depth detail on the QDataSet for researchers who may
be unfamiliar with quantum computing, together with specifications for domain ex-
perts within quantum engineering, quantum computation and quantum machine
learning. Previous chapters and Appendix A contain extensive background material for
researchers unfamiliar with quantum information processing. The Appendices also
contain discussions of relevant quantum control definitions and concepts relevant
to the construction and use of the QDataSet. We also set out further below ex-
ample applications of the QDataSet together with links to corresponding Jupyter
notebooks. The notebooks are designed to illustrate basic problem solving in tomog-
raphy, quantum control and quantum noise spectroscopy using the QDataSet. They
are designed to enable machine learning researchers to input their algorithms into
the relevant section of the code for testing and experimentation. Machine learning
uptake often occurs via adapting example code and so we regard these examples as
an important demonstration of the use-case for the QDataSet.

3.5.1 QDataSet and Scalability

The QDataSet was based on simulations of one- and two-qubit systems only. The ra-
tionale for this choice was primarily one of computational feasibility. The QDataSet
was generated over a six-month period using the University of Technology Sydney’s
High Performance Computing (HPC) cluster, with some computations taking sev-
eral weeks alone. In principle larger datasets based on our simulation code can be
prepared, however we note the computational resources of doing so may be consider-
able. To generate the datasets, we wrote bespoke code in TensorFlow which enabled
us to leverage GPU resources in a more efficient manner. As we discuss below, we
were interested in developing a dataset that simulated noise affecting a quantum
system. This required performing Monte Carlo simulations (see section 3.5.3.7) and
solving Schrödinger’s equation several times for each dataset. While existing pack-
ages, such as Qutip (see below), are available to model the effect of noise on quantum systems, we chose not to rely upon such packages. The reason was that Qutip relies upon Lindblad master equations to simulate system/noise interactions, which in turn rely upon the validity of certain assumptions and approximations. Chief among these is that the noise is Markovian. In our datasets, we included coloured noise with a power spectral density which is non-Markovian. Furthermore, Qutip's methods assume a model of a quantum system interacting with a fermionic or bosonic bath
which was not applicable in our case given we were modelling the imposition of
classical noise using Monte Carlo methods.
The resource cost for simulating the various qubit systems depended upon whether we sought to simulate noise or distortion. We found, however, that simulating the two-qubit systems took a significant amount of time, nearly four weeks of runtime
for a single two-qubit system. While multiple nodes of the HPC cluster were utilised,
even on the largest node of the cluster (with 50-100 cores and two GPUs), the simulation time was extensive. We estimate
that greater speedup could be obtained by simulating directly in lower-level languages, such as C++. For this reason, we restricted the QDataSet to simula-
tions of at most two-qubit systems. Such a choice obviously limits direct real-world
applications of algorithms trained on the QDataSet to one- and two-qubit systems
generally. While this may appear a notable limitation given the growing abundance
of higher-order multi-qubit NISQ systems, it remains the case that many experi-
mental laboratories remain limited to small numbers of qubits. We expect in most
situations that one- and two-qubit gates are all that are available. Engineering more than two-body interactions is a considerable challenge, available only in certain architectures.
NISQ devices offer promising next steps, but it is primarily one- and two-qubit
systems that have demonstrated the type of long coherence times, fast gate ex-
ecution and fault-tolerant operation needed for truly scalable quantum computa-
tion [93,153,154]. As a result, the QDataSet can be considered relevant to the state
of the art. Additionally, simulating more than two qubits would have exceeded com-
putational capacity constraints related to our specific simulated code which includes
interactions and iterations over different noise profiles. Moreover, developing algo-
rithms on the basis of small qubit systems is a commonplace way of forming a basis
for algorithms for larger multi-qubit systems: training classical machine learning
algorithms on lower-order qubit systems has the benefit of enabling researchers to
consider how such algorithms can or may learn multi-qubit relations which in turn
can assist in algorithm design when applied to higher-order systems. Doing so will
be an important step in building scalable databases for applying machine learning
to problems in quantum computing.

3.5.1.1 QDataSet and Error-Correction

The QDataSet was generated for non-QEC encoded data. The reasoning behind
this was that (i) specific error-correcting encodings differ considerably from case to
case, whereas unencoded quantum information is more prevalent in the experimen-
tal/laboratory setting; and (ii) quantum computational and NISQ devices are yet to
reach the scale and prevalence necessary for practical testing of QEC at scale. The
simulations in the QDataSet are based upon an alternative technique for quantum
control in the presence of a variety of noise profiles [82], where a greybox neural network
(see section D.9) is used to learn only those characteristics of the noise spectrum rel-
evant to the application of controls (a comparatively simpler problem than seeking
to determine the entire spectrum). In this context, the objective of the QDataSet
is to enable algorithms to natively learn optimal error correction regimes from the
data itself (rather than by encoding in a QEC) via inferring the types of counter-
vailing controls (e.g. control pulses) that should be applied to minimise errors. In
principle, the same type of machine-learning control architecture could also apply to
QEC encoded data: the machine learning algorithms would in effect learn optimal
quantum control conditioned on the data being encoded in an error-correcting code.
Moreover, there is an emergent literature on using machine learning for QEC discov-
ery itself. For machine learning practitioners, the QDataSet thus provides a useful
way to seek to apply advanced classical machine learning techniques to challenging
but important problems.

3.5.2 QDataSet Noise Methodology


3.5.2.1 Noise characteristics

The QDataSet was developed using methods that aimed to simulate realistic noise
profiles in experimental contexts. Noise applicable to quantum systems is gener-
ally classified as either classical or quantum [155]. Classical noise is represented
typically as a stochastic process [144] and can include, for example (i) slow noise
which is pseudo-static and not varying much over the characteristic time scale of
the quantum system and (ii) fast or ‘white’ noise with a high frequency relative
to the characteristic frequencies (energy scales) of the system [156]. The effect of
quantum noise on quantum systems is usually classified in two forms. The first is
dephasing (T2 ) noise, which characteristically causes quantum systems to decohere,
thus destroying or degrading quantum information encoded within qubits (see sec-
tion A.3.3). Such noise is usually characterised as an operator acting transverse to
the quantisation axis of chosen angular momentum. The second type of noise (T1 )
can shift the energy state of the system (e.g. shifting the system from a ground to
an excited state).
What this means in practice for the use of the QDataSet is usefully construed as follows using a Bloch sphere. Once an orientation (x, y, z-axes) is chosen, one is effectively making a choice of basis, i.e. the basis of a typical qubit |ψ⟩ = a|0⟩ + b|1⟩ is the basis of eigenstates of the σz operator. When noise acts transverse to the quantisation axis (i.e. is associated with the σx or σy operators), it has the potential (if the energy of the noise is sufficient) to shift the energy state in which the quantum system is, represented by a 'flip' in the basis from |0⟩ to |1⟩ for example. This type of noise is T1 noise. By contrast, noise may act along the quantisation (z-)axis of a qubit, associated with the σz operator. Noise along this axis has the effect of dephasing the qubit, thus affecting the coherences encoded in the relative phases of the qubit. Such noise is denoted T2 noise. Open quantum systems research (section A.3) and understanding noise in quantum systems is a vast and highly specialised topic. As we describe
below, the QDataSet adopts the novel approach outlined in [82] where, rather than
seeking to fully characterise noise spectra, only the information about noise relevant to the application of controls (to dampen noise) is sought. Such information
is encoded in the VO operator, which is an expectation that encodes the influence
of noise on the quantum system (see subsection 3.5.3 below). In a quantum control
problem using the QDataSet samples containing noise, for example, the objective
would then be to select controls that neutralise such effects.

3.5.3 QDataSet noise profiles


In developing the QDataSet, we chose sets of noise profiles with different statistical
properties. The selected noise profiles have been chosen to emulate commonplace
types of noise in experimental settings. Doing so improves the utility of algorithms
trained using the QDataSet for application in actual experimental and laboratory
settings. While engineers and experimentalists across quantum disciplines will usu-
ally be familiar with theoretical and practical aspects of noise in quantum systems,
many machine learning and other researchers to whom the QDataSet is directed
will not. To assist machine learning practitioners who may not be familiar with elementary features of noise, it is useful to understand a number of conceptual classifications related to noise used in the QDataSet, as follows: (i) power spectral density (which describes the distribution of the noise signal over frequency); (ii) white noise (usually high-frequency noise with a flat frequency spectrum); (iii) coloured noise, a stochastic process whose values are correlated spatially or temporally; (iv) autocorrelated noise, where the waveform is biased towards shorter (blue) or longer (red) wavelengths, as distinct from uncorrelated noise, where waveforms are relatively uniformly distributed across wavelengths; and (v) stationary noise (whose statistical properties are constant over time) versus non-stationary noise (whose statistical properties vary over time). See the literature on noise in signal processing for more detail (such as [157] for an introduction or [144] for a more advanced quantum treatment). The noise realizations are generated in the time domain following one of the profiles listed below (see [82] for specific functional forms):

• N0: this is the noiseless case (indicated in the QDataSet parameters as set out
in Tables (3.8) and (3.9));

• N1: the noise β(t) is described by its power spectral density (PSD) S1 (f ), a
form of 1/f noise with a Gaussian bump;

• N2: here β(t) is stationary Gaussian coloured noise described by its autocorrelation matrix, chosen such that it is coloured, Gaussian and stationary (typically lower frequency); it is produced via convolving Gaussian white noise with a deterministic signal;

• N3: here the noise β(t) is non-stationary Gaussian coloured noise, again de-
scribed by its autocorrelation matrix which is chosen such that it is coloured,
Gaussian and non-stationary. The noise is simulated via multiplication of a
deterministic time-domain signal with stationary noise;

• N4: in this case, the noise β(t) is described by its autocorrelation matrix
chosen such that it is coloured, non-Gaussian and non-stationary. The non-
Gaussianity of the noise is achieved via squaring the Gaussian noise so as to
achieve requisite non-linearities;

• N5: a noise described by its power spectral density (PSD) S5 (f ), differing from
N1 only via the location of the Gaussian bump; and

• N6: this profile models a noise source that is correlated with one of the other five sources (N1-N5) through a squaring operation. If β(t) is a realization of one of the five profiles, N6 will have realisations of the form β²(t). This profile is used for multi-axis and multi-qubit systems.

The N1 and N5 profiles can be generated following the method described in [82] (see the section entitled “Implementation” onwards). Regarding the other profiles, any standard numerical package can generate white Gaussian stationary noise. The QDataSet noise realisations were encoded using the Numpy package of Python. We deliberately did so in order to avoid various assumptions used in common quantum programming packages, such as Qutip. To add colouring, we convolve the time-domain samples of the noise with a deterministic signal. To generate non-stationarity, we multiply the time-domain samples by a deterministic signal. Finally, to generate non-Gaussianity, we start with Gaussian noise and apply a non-linear transformation such as squaring. The last noise profile (N6) is used to model the case of two noise sources that are correlated with each other: we generate the first using any of the profiles N1-N5, and the other source is then completely determined.
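For illustration, the following minimal NumPy sketch (our own, with illustrative kernel and envelope choices rather than the QDataSet's exact parameters) applies these three transformations to white Gaussian noise:

import numpy as np

rng = np.random.default_rng(seed=42)
M = 1024                                 # number of time steps
t = np.linspace(0, 1, M)

white = rng.normal(0.0, 1.0, M)          # white Gaussian stationary noise

# Colouring: convolve white noise with a deterministic kernel (N2 style).
kernel = np.exp(-((t - 0.5) ** 2) / 0.01)
coloured = np.convolve(white, kernel, mode="same")

# Non-stationarity: modulate by a deterministic time-domain envelope (N3 style).
envelope = 1.0 + np.sin(2 * np.pi * t)
non_stationary = coloured * envelope

# Non-Gaussianity: apply a non-linear transformation, here squaring (N4 style).
non_gaussian = non_stationary ** 2

# Correlated second source (N6 style): realisations of the form beta(t)**2.
beta = coloured
beta_correlated = beta ** 2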

3.5.3.1 Noise profile details

Following on from discussion of noise spectral density and open quantum systems
(see section A.3.4), we specify three primary noise profiles for the construction of
the QDataSet below (see [158]).

1. Single-axis noise; orthogonal pulse. For a qubit with single-axis noise and orthogonal control pulses, the Hamiltonian is given by

H = ½(Ω + β_z(t))σ_z + ½ f_x(t)σ_x.   (3.5.1)

With z-axis noise PSD:

S_Z(ω) = 1/(ω + 1) + 0.8 e^(−(ω−20)²/10)   for 0 < ω ≤ 50
S_Z(ω) = 0.25 + 0.8 e^(−(ω−20)²/10)   for ω > 50.   (3.5.2)

2. Multi-axis noise; two orthogonal pulses. For single-qubit multi-axis noise and two orthogonal control pulses, the Hamiltonian is:

H = ½(Ω + β_z(t))σ_z + ½(f_x(t) + β_x(t))σ_x + ½ f_y(t)σ_y   (3.5.3)

with x-axis noise PSD:

S_X(ω) = 1/(ω + 1)^1.5 + 0.5 e^(−(ω−15)²/10)   for 0 < ω ≤ 20
S_X(ω) = 5/48 + 0.5 e^(−(ω−15)²/10)   for ω > 20.   (3.5.4)

Here z-axis noise is as per the first case.

3. For the noiseless qubit with two orthogonal pulses, the Hamiltonian is:

H = ½ Ωσ_z + ½ f_x(t)σ_x + ½ f_y(t)σ_y.   (3.5.5)
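The piecewise PSDs in equations (3.5.2) and (3.5.4) can be transcribed directly; a minimal NumPy sketch follows:

import numpy as np

def S_Z(omega):
    # z-axis noise PSD, equation (3.5.2)
    bump = 0.8 * np.exp(-((omega - 20.0) ** 2) / 10.0)
    return np.where(omega <= 50.0, 1.0 / (omega + 1.0) + bump, 0.25 + bump)

def S_X(omega):
    # x-axis noise PSD, equation (3.5.4)
    bump = 0.5 * np.exp(-((omega - 15.0) ** 2) / 10.0)
    return np.where(omega <= 20.0,
                    1.0 / (omega + 1.0) ** 1.5 + bump,
                    5.0 / 48.0 + bump)

omega = np.linspace(0.01, 100.0, 1000)
psd_z, psd_x = S_Z(omega), S_X(omega)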

We examine the relationship to the master equation formalism in section A.3.2 and the Lindblad master equation (A.3.5). Consider a single qubit with single-axis noise profile β_z(t) along the z-axis and orthogonal control pulses f_x(t) applied along the x-axis. The Hamiltonian is:

H = ½(Ω + β_z(t))σ_z + ½ f_x(t)σ_x.   (3.5.6)

To derive the Lindblad master equation, first we look for factors affecting system
evolution in terms of unitary and non-unitary dynamics: (a) unitary evolution driven
by the deterministic part of the Hamiltonian and (b) non-unitary evolution resulting
from the interaction with the environment. Thus βz (t) is the environmental inter-
action term which is ultimately related to the decoherence rate γz via the power
spectral density SZ (ω) (see equation (A.3.13) and section A.3.3 for more detail). To
identify the Lindblad operators, we note that the noise is along the z-axis such that

L_k = √γ_z σ_z, where γ_z is the decoherence rate related, as above, to the noise PSD. The Lindblad master equation is then:

dρ/dt = −i[½(Ωσ_z + f_x(t)σ_x), ρ] + √γ_z σ_z ρ √γ_z σ_z − ½{γ_z σ_z², ρ}.   (3.5.7)

Noting σ_z² = I (the identity), the dissipative term simplifies to:

√γ_z σ_z ρ √γ_z σ_z − ½ γ_z{I, ρ} = γ_z(σ_z ρ σ_z − ρ),   (3.5.8)

such that the Lindblad master equation for our system under z-axis noise is:

dρ/dt = −i[½(Ωσ_z + f_x(t)σ_x), ρ] + γ_z(σ_z ρ σ_z − ρ).   (3.5.9)

Thus we can see the link between the measurable characteristics of environmental
noise and the theoretical description of its effects on quantum systems via equation
(A.3.5).
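As a worked illustration, equation (3.5.9) can be integrated numerically; the following minimal sketch (our own, with illustrative values for Ω, f_x and γ_z and a constant control amplitude) uses SciPy:

import numpy as np
from scipy.integrate import solve_ivp

sz = np.array([[1, 0], [0, -1]], dtype=complex)
sx = np.array([[0, 1], [1, 0]], dtype=complex)
Omega, f_x, gamma_z = 1.0, 0.5, 0.05       # illustrative parameters

def rhs(t, y):
    # Right-hand side of equation (3.5.9) with rho flattened to a vector.
    rho = y.reshape(2, 2)
    H = 0.5 * (Omega * sz + f_x * sx)
    drho = -1j * (H @ rho - rho @ H) + gamma_z * (sz @ rho @ sz - rho)
    return drho.ravel()

rho0 = np.array([[1, 0], [0, 0]], dtype=complex)   # start in |0><0|
sol = solve_ivp(rhs, (0.0, 10.0), rho0.ravel(),
                t_eval=np.linspace(0, 10, 101))
rho_T = sol.y[:, -1].reshape(2, 2)                  # final density matrix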

3.5.3.2 Distortion

In physical experiments, the control pulses are physical signals (such as microwave
pulses), which propagate along cables and get processed by different devices. This
introduces distortions which cannot be avoided in any real devices. However, by
properly engineering the systems, the effects of these distortions can be minimized
and/or compensated for in the design. In this work, we used a linear time-invariant (LTI) system to model distortions of the control pulses, and the same filter is used for all datasets. We chose a Chebyshev analogue filter [159], where the undistorted control signal is the input and the distorted signal is the filter output. Table (3.7) sets out a summary of key parameters.
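A minimal sketch of this style of LTI distortion, using SciPy's Chebyshev Type I design, is as follows; the filter order, ripple and cutoff here are illustrative assumptions rather than the QDataSet's actual parameters (which are in Table (3.7)):

import numpy as np
from scipy import signal

M = 1024
t = np.linspace(0, 1, M)
pulse = np.exp(-((t - 0.5) ** 2) / 0.002)        # undistorted Gaussian pulse

# Design an analogue Chebyshev Type I low-pass filter, then discretise it.
b, a = signal.cheby1(N=4, rp=1.0, Wn=2 * np.pi * 20, btype="low", analog=True)
dt = t[1] - t[0]
bz, az = signal.bilinear(b, a, fs=1.0 / dt)      # analogue -> digital

distorted = signal.lfilter(bz, az, pulse)        # filter output = distorted pulse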

3.5.3.3 QDataSet noise operators

In developing the QDataSet, we have assumed that the environment affecting the
qubit is classical and stochastic, namely that H1 (t) will be a stochastic term that
acts directly on the system. The stochasticity of H1 (t) means that the expectation
of any observable measured experimentally will be given as:

⟨O⟩c = ⟨Tr(ρ(T )O)⟩c , (3.5.10)

where O represents the measurement operator corresponding to the observable of interest (e.g. O = M_m in the notation above) and ⟨·⟩_c is a classical expectation over the distribution of the noise realizations. It can then be shown (see [82]) that this can be expressed in terms of the initial state ρ(0) and the evolution, fixed over the time interval [0, T], as:

⟨O(T)⟩ = Tr(V_O(T) U_0†(T) ρ(0) U_0(T) O)   (3.5.11)

where U_0(T) = T_+ e^(−i ∫_0^T H_0(t) dt) is the time-ordered evolution matrix in the absence of noise and:

V_O(T) = ⟨W_O(T)⟩_c   (3.5.12)
       = ⟨O⁻¹ Ũ_I†(T) O Ũ_I(T)⟩_c   (3.5.13)

is a novel noise operator introduced in [82] which characterises the expectation of


noise relevant to synthesising counteracting control pulses. We encapsulate the
full quantum evolution via the operator W_O(T). Note that V_O is formed via the partial tracing out of the effect of the noise and its interaction with the control pulses, so it encodes only those features relevant to the impact of the noise (not the full noise spectrum). Importantly, the use of the V_O operator is designed to allow information about noise and observables to be separated (a hallmark of dynamical decoupling approaches). Thus of particular importance is the assumption (see section A.3.4) that the noise is bounded by an upper frequency limit, i.e. |ω| ≤ Ω0.
The modified interaction unitary Ũ_I(T) is defined such that:

U(T) = Ũ_I(T) U_0(T),   (3.5.14)

where U(T) = T_+ e^(−i ∫_0^T H(t) dt) is the full evolution matrix. This contrasts with the conventional definition of the interaction unitary, which takes the form U(T) = U_0(T) U_I(T).
The V_O operator is used in the simulation code for the QDataSet to characterise the effect of noise on such values. Ideally, in a noise-free scenario, V_O should tend to the identity (representative of the absence of noise). The idea behind including such noise operators is that this data can then be input into machine learning models to assist the algorithms to learn, for example, appropriate pulse sequences or controls that send V_O → I (neutralising the noise).

A detailed explanation and example of the use of the VO operator is provided


in [82]. For machine learning practitioners, the operator VO may, for example, be
used in an algorithm that seeks to negate the effect of VO . The utility of this
approach is that full noise spectroscopy is not required.
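A minimal sketch of this construction (our own illustration, not the QDataSet source) estimates V_O as a Monte Carlo average of O⁻¹ Ũ_I† O Ũ_I over noise realisations, per equation (3.5.13); in the noiseless case it reduces to the identity:

import numpy as np

def estimate_VO(O, U_tildes):
    # Average O^{-1} U~^dag O U~ over modified interaction unitaries,
    # one per noise realisation.
    O_inv = np.linalg.inv(O)
    samples = [O_inv @ U.conj().T @ O @ U for U in U_tildes]
    return np.mean(samples, axis=0)

# Sanity check: with no noise every U~ is the identity, so V_O = I.
sz = np.array([[1, 0], [0, -1]], dtype=complex)
V = estimate_VO(sz, [np.eye(2, dtype=complex)] * 100)
assert np.allclose(V, np.eye(2))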

QDataSet Measurement Methodology

3.5.3.4 QDataSet Measurements

The simulated quantum measurements of the QDataSet are inherently probabilistic.


As set out in section A.1.6, measurement of quantum systems yields an underlying
probability distribution over the possible measurement outcomes (observables) m
of the system which are in turn determined by the state of the system and the
measurement process. There are several ways to describe quantum measurements
mathematically. The most common method (which we adopted) involves projective
measurements. In this case, an observable O is described mathematically by a
Hermitian operator. The eigendecomposition of the operator can be expressed in the form O = Σ_m m P_m, where m are the eigenvalues and P_m (sometimes denoted M_m to emphasise their role in measurement) are the associated projectors (definition A.1.16) onto the corresponding eigenspace. The projectors P_m must satisfy P_m² = P_m and Σ_m P_m = I (the identity operator), to ensure we obtain a valid probability distribution over the outcomes. In more sophisticated treatments, O may belong to a POVM, described above, which partitions the Hilbert space H into distinct projective subspaces H_m associated with each POVM operator O (see definition A.1.35 and section A.1.6.1). The probability of measuring an observable is given by:

Pr(m) = Tr(ρPm ), (3.5.15)

for a system in the state ρ (equation (A.1.14)). The expectation value of the observable is given by:

⟨O⟩ = Tr(ρO) = Tr(ρ Σ_m m P_m) = Σ_m m Pr(m).   (3.5.16)

As detailed below, the QDataSet contains measurement outcomes for a variety of


noiseless and noisy systems. The measurement operators chosen are the Pauli op-
erators described below and the QDataSet contains the expectation values for each
Pauli measurement operator. In a classical machine learning context, these measure-
ment statistics form training data labels in optimisation problems, such as designing
algorithms that can sequence control pulses in order to efficiently (time-minimally) synthesise a target state or unitary (and thus undertake a quantum computation) of interest.
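A minimal sketch of equations (3.5.15) and (3.5.16) for a qubit follows; the example state is arbitrary:

import numpy as np

rho = np.array([[0.75, 0.25], [0.25, 0.25]], dtype=complex)   # example state

sz = np.array([[1, 0], [0, -1]], dtype=complex)
eigvals, eigvecs = np.linalg.eigh(sz)
projectors = [np.outer(v, v.conj()) for v in eigvecs.T]       # P_m for each m

probs = [np.trace(rho @ P).real for P in projectors]          # Pr(m) = Tr(rho P_m)
expectation = sum(m * p for m, p in zip(eigvals, probs))      # <O> = sum_m m Pr(m)
assert np.isclose(expectation, np.trace(rho @ sz).real)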

3.5.3.5 Pauli matrices

The set of measurement operators for the QDataSet is the set of Pauli operators
which are important operators in quantum information processing involving qubit
systems. This is in part because such qubit systems can be usefully decomposed
into a Pauli operator basis via the Pauli matrices:
σ_x = [0 1; 1 0],   σ_y = [0 −i; i 0],   σ_z = [1 0; 0 −1]   (3.5.17)

together with the identity (denoted σ0 ). Pauli operators are Hermitian (with eigen-
values +1 and −1), traceless and satisfy σ_i² = I. Together with the identity matrix (which is sometimes denoted by σ_0), they form an orthonormal basis (with
respect to the Hilbert-Schmidt product defined as ⟨A, B⟩ = Tr(A† B)) for any 2 × 2
Hermitian matrix. QDataSet qubit states can then be expressed in this basis via
the density matrix:

ρ = ½(I + r · σ)   (3.5.18)

where r = (r_x, r_y, r_z) is the Bloch vector (a unit vector for pure states), and σ = (σ_x, σ_y, σ_z). The dot product of these two vectors is just shorthand notation for the expression r · σ = r_x σ_x + r_y σ_y + r_z σ_z. As such, a time-dependent
Hamiltonian of a qubit can be expressed as
H(t) = Σ_{i∈{x,y,z}} α_i(t) σ_i   (3.5.19)

with the time-dependence absorbed in the coefficients αi (t).
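A minimal NumPy sketch of the Bloch decomposition (3.5.18) follows:

import numpy as np

I2 = np.eye(2, dtype=complex)
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)

def rho_from_bloch(r):
    # Density matrix rho = (I + r . sigma)/2 for a Bloch vector r.
    rx, ry, rz = r
    return 0.5 * (I2 + rx * sx + ry * sy + rz * sz)

rho = rho_from_bloch([0.0, 0.0, 1.0])                 # pure |0><0| state
assert np.isclose(np.trace(rho @ rho).real, 1.0)      # purity 1 on the sphere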

3.5.3.6 Pauli measurements

The measurements simulated in the QDataSet are Pauli measurements. These


are formed by taking the expectation value of each Pauli matrix e.g. Tr(ρσi ) for
i ∈ {x, y, z} (the identity is omitted). The resultant measurement distributions will
typically form labelled data in a machine learning context. Measurement distribu-
tions are ultimately how various properties of the quantum system are inferred (i.e.
via reconstructive inference), such as the characteristics of quantum circuits, evolu-
tionary paths and tomographical quantum state description. As we describe below,
measurements in the QDataSet comprise measurements on each eigenstate (six in
total) of each Pauli operator by all Pauli operators. Hermitian operators have a
spectral decomposition in terms of eigenvalues and their corresponding projectors:

P_0 = |0⟩⟨0| = ½(I + σ_z) = [1 0; 0 0]   (3.5.20)

P_1 = |1⟩⟨1| = ½(I − σ_z) = [0 0; 0 1]   (3.5.21)

thus we can write:

σ_z = 1 × [1 0; 0 0] − 1 × [0 0; 0 1]   (3.5.22)

For example, for a Pauli measurement on a qubit in the −1 eigenstate with respect to the σ_z operator:

Tr(ρσ_z†) = Tr([0 0; 0 1] [1 0; 0 −1]) = Tr([0 0; 0 −1]) = −1   (3.5.23)

which is as expected. We should expect the probability of observing λ = −1 in this state to be unity (given the state is in the eigenstate):

Pr(m = −1) = Tr(P_1² ρ) = Tr(P_1 ρ) = 1   (3.5.24)

For n-qubit systems (such as two-qubit systems in the QDataSet), Pauli measure-
ments are represented by tensor-products of Pauli operators. For example, a σz
measurement on the first qubit and σx on the second is represented as:

σz ⊗ σx (3.5.25)

In programming matrix notation, this is represented as a 4 × 4 matrix (tensor):

σ_z ⊗ σ_x = [1 0; 0 −1] ⊗ [0 1; 1 0]   (3.5.26)

          = [1 × (0 1; 1 0)   0 ;  0   −1 × (0 1; 1 0)]   (3.5.27)

          = [0 1 0 0; 1 0 0 0; 0 0 0 −1; 0 0 −1 0]   (3.5.28)
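In NumPy, this tensor product is a single call to np.kron:

import numpy as np
sz = np.array([[1, 0], [0, -1]])
sx = np.array([[0, 1], [1, 0]])
print(np.kron(sz, sx))    # reproduces the 4x4 matrix in equation (3.5.28)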

The Pauli representation of qubits used in the QDataSet can be usefully visualised via the Bloch sphere as per Figure 3.4. The axes of the Bloch sphere are the expectation values of the Pauli σ_x, σ_y and σ_z operators respectively. As each Pauli operator has eigenvalues 1 and −1, the eigenvalues can be plotted along axes of the 2-sphere. For a pure (non-decohered) quantum state ρ, |r| = √(r_x² + r_y² + r_z²) = 1 (as we require Tr ρ² = 1), thus ρ is represented on the Bloch 2-sphere as a vector originating at the origin and lying on the surface of the sphere. The evolution of the qubit, i.e. a computation according to unitary evolution, can then be represented as rotations of ρ across the Bloch sphere. In noisy contexts, decohered states ρ have |r| < 1, i.e. the norm of the Bloch vector shrinks and ρ no longer resides on the surface.
For machine learning practitioners, it is useful to appreciate the operation of
the QDataSet Pauli operators σx , σy , σz as the generators of rotations about the
respective axes of the Bloch sphere. Represented on a Bloch sphere, the application
of σz to a qubit is equivalent to rotating the quantum state vector ρ about the z-axis
(see Figure 3.4). Conceptually, a qubit is in a z-eigenstate if it is lying directly on
either the north (+1) or south (−1) pole. Rotating about the z-axis then is akin to
rotating the vector on the spot, thus no change in the quantum states (or eigenvalues)
for σz occurs because the system exhibits symmetry under such transformations.
This is similarly the case for σx , σy generators with respect to their eigenvalues and
eigenvectors. However, rotations by σα will affect the eigenvalues/vectors in the σβ
basis where α ̸= β e.g. a σx rotation will affect the component of the qubit lying
along the σz axis. Similarly, a σz rotation of a qubit in a σx eigenstate will alter that
state (shown in (a) and (b) of Figure 3.4). An understanding of Pauli operators and
conceptualisation of qubit axes is important to the understanding of the simulated
QDataSet. An understanding of symmetries of relevance to qubit evolution (and
quantum algorithms) is also beneficial. As we describe below, controls and noise are structured to be applied along particular axes of a qubit, and can thus be thought of as ways to control, or as distortions upon, the axial rotations of the qubit effected by the corresponding Pauli generator.
There exist higher-dimensional generalizations of the Pauli matrices that allow forming an orthonormal basis to represent operators in these dimensions. In particular, if we have a system of N qubits, then one simple generalization is to form the set {σ_{i_1}^{(1)} ⊗ σ_{i_2}^{(2)} ⊗ ··· ⊗ σ_{i_N}^{(N)}} with i_j ∈ {0, x, y, z}. In other words, we take tensor products of the Paulis, which gives a set of size 4^N. For example, for a two-qubit system we can form the 16-element set {σ_0 ⊗ σ_0, σ_0 ⊗ σ_x, σ_0 ⊗ σ_y, σ_0 ⊗ σ_z, σ_x ⊗ σ_0, σ_x ⊗ σ_x, σ_x ⊗ σ_y, σ_x ⊗ σ_z, σ_y ⊗ σ_0, σ_y ⊗ σ_x, σ_y ⊗ σ_y, σ_y ⊗ σ_z, σ_z ⊗ σ_0, σ_z ⊗ σ_x, σ_z ⊗ σ_y, σ_z ⊗ σ_z}. Moreover,
for many use cases, we are interested in the minimal number of operators, such as
Pauli operators, required to achieve a requisite level of control, such as universal
quantum computation.
For the single-qubit system, initial states are the two eigenstates of each Pauli operator. As noted above, the quantum state can be decomposed in the Pauli basis as ρ_j = ½(I ± σ_j), for j ∈ {x, y, z}. This gives a total of 6 states. We perform the three Pauli measurements on each of these states, resulting in a total of 18 possible combinations. These 18 measurements are important to characterize a qubit system. For two qubits, the procedure is similar, but now we initialize each individual qubit into one of the 6 possible eigenstates (36 two-qubit initial states) and measure all 15 two-qubit Pauli observables (we exclude the identity). This gives a total of 540 possible combinations.
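These counts can be checked with a quick enumeration (a sketch of our own):

from itertools import product

paulis = ["x", "y", "z"]
single_states = [(p, s) for p in paulis for s in ("+", "-")]     # 6 eigenstates
print(len(single_states) * len(paulis))                           # 18 combinations

two_qubit_states = list(product(single_states, repeat=2))         # 36 initial states
observables = [(a, b) for a in ["0"] + paulis
                        for b in ["0"] + paulis][1:]              # 15 (identity excluded)
print(len(two_qubit_states) * len(observables))                   # 540 combinations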

3.5.3.7 Monte Carlo measurements

Measurements of the one- and two-qubit systems for the QDataSet are undertaken
using Monte Carlo techniques. This means that a random Pauli measurement is un-
dertaken multiple times, with the measurement results averaged in order to provide
the resultant measurement distribution for each of the operators. The measurement
of the quantum systems is contingent on the noise realisations for each system. For
the noiseless case, the Pauli measurements are simply the Monte Carlo averages
(expectations) of the Pauli operators. Systems with noise will have one or more
noise realisations (applications of noise) applied to them. To account for this, we
include two separate sets of measurement distributions. The first is the expectation value of the three Pauli operators over all possible initial states for each different noise realisation. These statistics are given by the set {V_O} in the QDataSet. Thus
for each type of noise, there will be a set of measurement statistics. The second
is a set of measurement statistics where we average over all noise realisations for
the dataset. This second set of measurements is given by the set {EO }. Including
both sets of measurements enables algorithms trained using the QDataSet to be
more fine-grained in their treatment of noise: in some contexts, while noise profiles
may be uncertain, it is clear that the noise is of a certain type, so the first set of
measurement statistics may be more applicable. For other cases, there is almost no
information about noise profiles or their sources, in which case the average over all
noise realisations may be more appropriate.

3.5.3.8 Monte Carlo Simulator

For the benefit of researchers using the QDataSet, we briefly set out further detail on how the datasets were generated. The simulator comprises three main
components. The first approximates time-ordered unitary evolution. The second
component generates realisations of noise given random parametrisations of the
power spectral density (PSD) of the noise. The third component simulates quantum
measurement. The simulations are based upon Monte Carlo methods whereby K
randomised pulse sequences give rise to noise realisations. The quantum systems
are then measured to determine the extent to which the noise realisations affect the
expectation values. Trial and error indicated a stabilisation of measurement statis-
tics at around K = 500, thus K ≥ 1000 was chosen for the final simulation run to
generate the QDataSet. The final Pauli measurements are then averages over such
noise realisations. The parameter K is included for each dataset and example (as
described below). For more detail, including useful pseudocode that sets out the
relationship between noise realisations, β(t) and measurement, see supplementary
material in [82] (which we set out for completeness in section 3.13.1).
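To convey the structure of this procedure, the following self-contained sketch (our own illustration, using stand-ins for the simulator's noise sampling, evolution and measurement; the first-order integrator and parameters are assumptions) averages an observable over K noise realisations:

import numpy as np

rng = np.random.default_rng(0)
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)

def sample_noise(M=100):
    # Stand-in white-noise realisation beta(t); the QDataSet uses profiles N1-N6.
    return rng.normal(0.0, 0.1, M)

def evolve(rho0, beta, dt=0.01, Omega=1.0):
    # Piecewise-constant evolution; first-order steps for illustration only.
    rho = rho0
    for b in beta:
        H = 0.5 * (Omega + b) * sz
        U = np.eye(2) - 1j * H * dt
        rho = U @ rho @ U.conj().T
        rho = rho / np.trace(rho).real     # renormalise the first-order step
    return rho

def measure(rho, O):
    return np.trace(rho @ O).real

def monte_carlo_expectation(rho0, O, K=1000):
    # Average the observable over K independent noise realisations.
    return np.mean([measure(evolve(rho0, sample_noise()), O) for _ in range(K)])

plus = 0.5 * np.array([[1, 1], [1, 1]], dtype=complex)   # |+><+| initial state
print(monte_carlo_expectation(plus, sx, K=200))           # ~cos(1), noise-damped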

3.6 Simulation Results

3.6.1 QDataSet form

Quantum information in the QDataSet is stored following the Tensorflow convention


of interpreting multidimensional arrays. For example, the noise Hamiltonian for one example is stored as a (1, M, K, 2, 2) array, where the first dimension is the batch, the second is time (assuming M steps), and the remaining dimensions relate to the object itself. In this case the third dimension denotes the noise realization (assuming a maximum of K realizations), and the last two dimensions ensure we have a square matrix of size 2. The simulation of the datasets is based on a Monte
Carlo method, where a number of evolution trajectories are simulated and then
averaged to calculate the observables. The exact details can be found in [82] (see
“Implementation” section) which we reproduce and expand upon in this work for
completeness.
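The storage convention can be illustrated in NumPy (M and K values here are illustrative):

import numpy as np

M, K = 1024, 1000
noise_hamiltonians = np.zeros((1, M, K, 2, 2), dtype=complex)
# axis 0: batch (one example); axis 1: M time steps;
# axis 2: K noise realisations; axes 3-4: the 2x2 operator itself.
H_at = noise_hamiltonians[0, 10, 3]    # noise Hamiltonian at step 10,
print(H_at.shape)                       # realisation 3 -> (2, 2)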

3.6.2 QDataSet parameters

Further detail regarding the 52 datasets (including code for simulations) that we
present in this work, for use in solving the engineering applications discussed in section 3.8 using classical machine learning, can be found in the repository for the QDataSet
[39]. Table (3.2) sets out the taxonomy of each of the 52 different datasets. Each
dataset comprises 10,000 examples that are compressed into a Pickle file which is in
turn compressed into a zip file. The Item field indicates the dictionary key and the
Description field indicates the dictionary value.

3.6.3 Greybox Algorithms


The simulation.py module is responsible for simulating a quantum system in the presence of noise. This module is constructed using the TensorFlow framework and consists of the following classes (background on which is set out in section D.9). We set out a brief description of key components of the code below, followed by an illustrative sketch after the list:

• Noise Layer : An internal class dedicated to the generation of noise within the
simulation as set out in the Monte Carlo method in [82].

• HamiltonianConstruction: An internal Python class designed for constructing


the Hamiltonians necessary for the quantum system’s dynamics.

• QuantumCell : This internal Python class is essential for realizing the time-
ordered evolution of the quantum system.

• QuantumEvolution: This Python class implements the time-ordered quantum


evolution, ensuring the system evolves in accordance with the specified Hamil-
tonians.

• Quantum Measurement: An internal Python class that models the effects of


coupling losses at the output of the quantum system, simulating the process
of quantum measurements.

• VO Layer : An internal Python class used to calculate the Vo operator (see


section 3.5.3.3 above).

• quantumTFsim: This is the principal class in the Python module. It defines the machine learning model for the qubit, integrating quantum simulation with TensorFlow.
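The following minimal sketch conveys the style of such internal layers in the spirit of HamiltonianConstruction; the class name, shapes and implementation are our own illustrative assumptions, not the QDataSet source:

import tensorflow as tf

class HamiltonianLayer(tf.keras.layers.Layer):
    # Builds H(t) = sum_i alpha_i(t) sigma_i from control amplitudes.
    def __init__(self):
        super().__init__()
        sx = tf.constant([[0, 1], [1, 0]], dtype=tf.complex64)
        sy = tf.constant([[0, -1j], [1j, 0]], dtype=tf.complex64)
        sz = tf.constant([[1, 0], [0, -1]], dtype=tf.complex64)
        self.paulis = tf.stack([sx, sy, sz])        # shape (3, 2, 2)

    def call(self, alphas):
        # alphas: real tensor of shape (batch, time, 3) of control amplitudes
        a = tf.cast(alphas, tf.complex64)
        return tf.einsum("bti,ijk->btjk", a, self.paulis)   # (batch, time, 2, 2)

H = HamiltonianLayer()(tf.random.normal((4, 100, 3)))
print(H.shape)    # (4, 100, 2, 2)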

3.6.4 Datasets and naming convention


Each dataset can be categorised according to the number of qubits in the system
and the noise profile to which the system was subject. Table (3.6) sets out a sum-
mary of such categories. While other types of profiles or combinations could have
been utilised, our aim was to select categories which reflect the types of noise and
categorisations relevant to experimental laboratories working on problems such as
quantum computation. For category 1 of the datasets, we created datasets with
noise profiles N1, N2, N3, N4, together with the noiseless case. This gives a total
of 5 datasets. For category 2, the noise profiles for the X and Z axes respectively
are chosen to be (N1,N5), (N1,N6), (N3,N6). Together with the noiseless case, this
gives a total of 4 datasets. For category 3 (two-qubit system), we chose only the
1Z (identity on the first qubit, noise along the z−axis for the second) and Z1 (noise
along the z−axis for the first qubit, identity along the second) noise to follow the
(N1,N6) profile. This category simulates two individual qubits with correlated noise
sources. For category 4, we generate the noiseless, (N1,N5), and (N1,N6) for the 1Z
and Z1 noise. This gives 3 datasets. Therefore, the total number of datasets at this
point is 13. Including the two types of control waveforms, this gives a total of 26.
If we also include the cases of distortion and non-distorted control, then this gives
a total of 52 datasets. Comprehensive detail on the noise profiles used to generate
the datasets is set-out above.
We chose a naming convention for the datasets that seeks to deliver as much information as possible about the parameters chosen for a particular dataset. The name is partitioned into six parts, separated by an underscore sign “_”. We explicate each part below:

1. The first part is either the letter “G” or “S” to denote whether the control
waveform is Gaussian or square.

2. The second part is either “1q” or “2q” to denote the dimensionality of the
system (i.e. the number of qubits).

3. The third part denotes the control Hamiltonian. It is formed by listing the Pauli operators we are using for the control of each qubit, and we separate between qubits by a hyphen “-”. For example, category 1 datasets will have “X”, while category 4 will have “IX-XI-XX”.

4. The fourth and fifth parts indicate (i) the axis along which noise is applied (fourth part) and (ii) the type of noise along each axis (fifth part). So “G_2q_IX-XI_IZ-ZI_N1-N6” represents two qubits with control along the x-axis of each qubit, while the noise is applied along the z-axis of each. In this case, N1 noise is applied along the z-axis of the first qubit and N6 noise is applied along the z-axis of the second qubit. For datasets where no noise is applied, these two parts are omitted.

5. Finally, the sixth part denotes the presence of control distortions by the letter
“D”, otherwise it is not included.

For example, the dataset “G_2q_IX-XI-XX_IZ-ZI_N1-N6” is a two-qubit dataset with Gaussian pulses, no distortion, local X control on each qubit and an interacting XX control, along with local noise on each qubit with profile N1 on the first qubit z-axis and N6 on the second qubit z-axis. As another example, the dataset “S_1q_XY_D” is a single-qubit system with square, distorted control pulses along the x- and y-axes, and no noise.
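A small helper (a hypothetical utility of our own, not shipped with the QDataSet) can unpack this convention:

def parse_qdataset_name(name):
    parts = name.split("_")
    info = {
        "pulse": {"G": "Gaussian", "S": "square"}[parts[0]],
        "qubits": parts[1],
        "control": parts[2],
        "distortion": parts[-1] == "D",
    }
    remaining = parts[3:-1] if info["distortion"] else parts[3:]
    if remaining:                        # noise axes and profiles, when present
        info["noise_axes"], info["noise_profiles"] = remaining
    return info

print(parse_qdataset_name("G_2q_IX-XI-XX_IZ-ZI_N1-N6"))
print(parse_qdataset_name("S_1q_XY_D"))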

3.7 Technical Validation


Technical validation of the QDataSet was undertaken by comparing the QDataSet
data against the leading quantum simulation toolkit Qutip, an open-source package for
simulating the dynamics of open quantum systems [139]. For each of the 26 different
simulated systems (each comprising a noisy and noise-free case), the procedure set
out below was adopted. A Jupyter notebook containing the code used for technical
validation and verification of the datasets is available on the QDataSet repository.

3.7.1 Distortion analysis


Firstly, for each of the one- and two-qubit datasets, the distorted and undistorted
pulse sequences were compared for a sample of examples from each dataset in order
to assess the effect of relevant distortion filters. A plot of the comparison for the
single-qubit case with Gaussian control along the x-axis is presented in Figure (3.1).
The plot compares the sequence of Gaussian control pulses for the undistorted case (blue) and the distorted case (orange). The expectation was for a shift in the Gaussian pulse curves as a result of the distortion filters. An error with the datasets would have seen the distorted pulses differ significantly from the undistorted pulses and, moreover, would likely have produced non-Gaussian pulse forms. As can be seen from Figure (3.1), the distorted and undistorted pulses appear almost identical but for a shift and a minor amplitude (f(t)) reduction in the distorted case, which was seen for each example. This provided us with assurance that the simulation
was appropriately modelling distortion of the pulses. As a second assurance, we plotted the effect of the distortion filter (when applied) and evaluated the frequency response of the filter. The aim of this process was to identify visually whether the frequency response H(Ω) and the phase response exhibit the appropriate form (i.e. no obvious errors). The verification plot is shown in Figure (3.2).

3.7.2 Comparison with Qutip


The second and primary technical validation of the QDataSet was undertaken by
comparing mean expectation values of observables for subsamples of each of the
datasets against the equivalent expectations for simulations and measurements un-
dertaken in using Qutip [139]. To generate the Qutip equivalents, the equivalent
parameters (e.g. Hamiltonian parameters, pulse parameters) were input into Qutip
to generate the relevant outputs. For each dataset in the QDataSet, the verification
procedure was run on varying samples. To undertake this process, we adopted two
validation strategies.

• Mean expectation of all observables over all noise realisations. In this case, for
a sample of examples from each dataset in the QDataSet, the mean expectation
over all noise realisations for all observables (i.e. measurements) was compared
against the same mean measurements for the equivalent simulation generated
in Qutip. This was done for the noiseless and noisy case. The two means were
then compared. On average the error (difference between the means) was of the order 10⁻⁶, demonstrating the equivalence of the QDataSet simulation code with that from Qutip.

• Mean expectation of single observables over separate noise realisations. In the


second case, the mean expectation over all noise realisations for each separate
observable was compared against the same mean measurements for the equiva-
lent simulation generated in Qutip. Again, this was done for the noiseless and
noisy case. Comparison of the two means showed that on average the error
(difference between the means) of the order 10−07 , in turn demonstrating the
equivalence of the QDataSet simulation code with that from Qutip.

3.8 Usage Notes

Overview
In this section, we include further usage notes related to the 52 QDataSet datasets
based on simulations of one- and two-qubit systems evolving in the presence and/or
absence of noise subject to a variety of controls. Recall that the QDataSet has been
developed primarily for use in training, benchmarking and competitive development
of classical and quantum algorithms for common tasks in quantum control, quan-
tum tomography and noise spectroscopy. It has been generated using customised
code drawing upon base-level Python packages in order to facilitate interoperability
and portability across common machine learning and quantum programming plat-
forms. Each dataset consists of 10,000 samples which in turn comprise a range of
data relevant to the training of machine learning algorithms for solving optimisation
problems. The data includes a range of information (stored in list, matrix or tensor
format) regarding quantum systems and their evolution, such as: quantum state
vectors, drift and control Hamiltonians and unitaries, Pauli measurement distribu-
tions, time series data, pulse sequence data for square and Gaussian pulses and noise
and distortion data.
Researchers can use the QDataSet in a variety of ways to design algorithms for
solving problems in quantum control, quantum tomography and quantum circuit
synthesis, together with algorithms focused on classifying or simulating such data.
We also provide working examples of how to use the QDataSet in practice and its use
in benchmarking certain algorithms. Each part below provides in-depth detail on
the QDataSet for researchers who may be unfamiliar with quantum computing, to-
gether with specifications for domain experts within quantum engineering, quantum
computation and quantum machine learning.
The aim of generating the datasets is threefold: (a) simulating typical quantum
engineering systems, dynamics and controls used in laboratories; (b) using such
datasets as a basis to train machine learning algorithms to solve certain problems or
achieve certain objectives, such as attainment of a quantum state ρ, quantum circuit
U or quantum control problem generally (among others); and (c) enabling optimisation of algorithms and spurring development of optimised algorithms for solving problems
in quantum information, analogously with the role of large datasets in the classical
setting. We explain these use cases in more detail below:

1. Datasets as simulations. Firstly, we have aimed to generate datasets which


abstractly simulate the types of data, characteristics and features which would
be commonly used in laboratories and experimental setups. Each dataset
is an abstraction (say, of particular Hamiltonians or noise profiles) which can have any number of physical realisations depending on the experimental design. Different experiments can thus ultimately realise, in the abstract, the same or a sufficiently similar structure as that provided by the data. This is
an important design choice relating to how the QDataSet is intended to be
used. For example, the implementation of the particular Hamiltonians or state
preparation may be done using trapped-ion setups, quantum dot or transmon-
based qubits [160], doped systems or otherwise. We assume the availability
of a mapping between the dataset features, such as the controls pulses, and
particular control devices (such as voltage or microwave-based controls), for
example, in the laboratory.

2. Training algorithms using datasets. The second use case for the QDataSet
is related but distinct from the first. The aim is that training models us-
ing the datasets has applicability to experimental setups. Thus, for example, a machine learning model trained using the datasets should in theory provide the optimal set of pulses or interventions needed to solve
(and, indeed, optimise) for some objective. It is intended that the output
of the machine learning model is an abstraction which can then be realised
via the specific experimental setup. The aim then is that the abstraction of each experiment's setup allows the application of a variety of machine learning models for optimising in a way that is directly applicable to experimental setups, rather than relying upon experimentalists to work out how to translate the model's output into their particular experimental context. Re-
quiring conformity of outputs within these abstract criteria thus facilitates a
greater, practical, synthesis between machine learning and the implementation
of solutions and procedures in experiments.

3. Benchmarking, development and testing. The third primary use of the datasets
is to provide a basis for benchmarking, development and testing of existing and
new algorithms in quantum machine learning for quantum control, tomography
and related to noise mitigation. As discussed above, classical machine learning
has historically been characterised by the availability of large-scale datasets
with which to train and develop algorithms. The role of these large datasets
is multifaceted: (i) they provide a means of benchmarking algorithms (see
above), such that a common set of problem parameters, constraints and ob-
jectives allows comparison among different models; (ii) their size often means
they provide a richer source of overt and latent (or constructible) features
which machine learning models may draw upon, improving the versatility and
diversity of models which may be usefully trained. The aim of the QDataSet
is then that it can be used in tandem by researchers as benchmarking tool for
algorithms which they may wish to apply to their own data or experiments.

3.8.1 QDataSet Control Sets


For some machine learning applications, we are interested in minimising the number
of controls that must be applied to a quantum system (thus minimising the resources
required to control the system). In such cases, we may seek a minimal control al-
gebra or gate set. For example, the minimal number of Pauli operators required
to achieve a complete control set of generators for synthesising an arbitrary unitary acting on n qubits in a 2^n-dimensional Hilbert space is given by a bracket-generating set (distribution) ∆ ⊆ su(2^n) [40, 50] (see discussion in Appendix B), which can be understood in more sophisticated treatments in the context of Lie algebras and representation theory (the subject of Chapter 5 in particular). Here su(2^n) represents the Lie algebra corresponding to the group SU(2^n), i.e. the complete set of generators required to span the n-qubit dynamics in the Pauli basis.
can reconstruct the full Pauli basis via the operation of the Lie bracket [54] (though
noting this may require additional algorithmic design choices involving ancilla for 2-
local operations in the presence of unitary symmetries as per [146]). The QDataSet
generators for one- and two-qubit systems are simply the single- and two-qubit (tensor-product) sets of Paulis respectively (i.e. not the minimal set ∆). For higher-dimensional
problems, whether to restrict generators to those within ∆ becomes a consideration
for machine learning architectures (see literature on time-optimal quantum control
such as [15]). These generators will typically be used as the tensors or matrices
to which classical controls are applied within machine learning architectures. We
explore these questions in other Chapters, especially in the geometric and algebraic
context of subRiemannian quantum control and also classical Pontryagin-based con-
trol theory.
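As a minimal illustration of bracket generation (a sketch of our own, not thesis code), the commutator of σ_z and σ_x recovers σ_y, so the pair {σ_x, σ_z} bracket-generates su(2):

import numpy as np

sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)

comm = lambda A, B: A @ B - B @ A
assert np.allclose(comm(sz, sx), 2j * sy)     # [sz, sx] = 2i sy

# Hence span{sx, sz, [sz, sx]} = span{sx, sy, sz} (up to factors of i),
# so the distribution {sx, sz} suffices to generate all single-qubit unitaries.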

3.9 Machine learning using the QDataSet


There are many problems related to the characterization and control of quantum
systems that can be solved using machine learning techniques. In this section, we
give an overview of a number of such problems and how to approach them using
machine learning. We provide a brief overview of the different types of problems in
quantum computing, engineered quantum systems and quantum control for which
the QDataSet and algorithms trained using it may be useful. The list is necessarily
non-exhaustive and is intended to provide some direction mainly to machine learning
researchers unfamiliar with key problems in applied quantum science.

3.9.1 Benchmarking
Benchmarking algorithms using standardised datasets is an important developmental characteristic of classical machine learning. Benchmarks provide standardised datasets, preprocessing protocols, metrics and architectural features (such as optimisers, loss functions and regularisation techniques) which ultimately enable research communities to make precise their research contributions and improve upon state of the
art results. Results in classical machine learning are typically presented by com-
parison with known benchmarks in the field and adjudged by the extent to which
they outperform the current state of the art benchmarks. Results are presented in
tabular format with standardised metrics for comparison, such as accuracy, F1-score or AUC/ROC statistics. The QDataSet has been designed with these metrics in
mind. For example, a range of classical or quantum statistics (e.g. fidelity) can be
used to benchmark the performance of algorithms that use the datasets in training.
The role of benchmarking is important in classical contexts. Firstly, it enables a
basis for researchers across machine learning subdisciplines to gauge the extent to
which their results correlate to algorithmic design as distinct from unique features
of training data or use cases. Secondly, it provides a basis for better assessing the
algorithmic state of the art within subfields. Given its relative nascency, QML lit-
erature tends to focus on providing proof-of-concept examples as to how classical,
hybrid or quantum-native algorithms can be used for classification or regression
tasks. There is little in the way of systematic benchmarking of QML algorithms against their classical counterparts in terms of the performance of the machine learning algorithms specifically.
Recent examples in a QML setting of benchmarking include comparisons of us-
ing different error distributions relevant to quantum chemistry (and how these affect
performance) [161], benchmarking machine learning algorithms for adaptive phase
estimation [162] and generative machine learning with tensor networks [163]. In
quantum information science more broadly, comparison with classical algorithms is
often driven from computational complexity considerations and the search for quan-
tum supremacy or outperformance, namely whether there exists a classical algorithm
which can achieve results with equivalent efficiency of the quantum algorithm. Users
of the QDataSet for QML research into quantum control, tomography or noise mit-
igation would benefit from adopting (and adapting) practices common in classical
machine learning when reporting results, especially the inclusion of benchmarks
against leading state of the art algorithms for particular use-cases, such as classifi-
cation or regression tasks. Selecting the appropriate benchmarking algorithms itself
tends to benefit from domain expertise. The QDataSet has been designed in order
to be benchmarked against both classical and quantum algorithms.

3.9.2 Benchmarking by learning protocol

As discussed in Appendix D, typically machine learning algorithm classification is


based firstly on whether the learning protocols are supervised, unsupervised or semi-
supervised [12, 14]. Supervised learning uses known input and output (label) data
to train algorithms to estimate label data. Algorithmic models are updated ac-
cording to an optimisation protocol, typically gradient descent, in order to achieve
some objective, such as minimisation of a loss function that compares the simi-
larity of estimates to label data. Unsupervised learning, by contrast, is a learning
protocol where label or classification of data is unknown and must be estimated
via grouping or clustering together in order to ascertain identifying features. Com-
mon techniques include clustering, dimensionality reduction techniques or graph-
based methods. Semi-supervised learning is an intermediate algorithmic classifica-
tion drawing on aspects of both supervised and unsupervised learning protocols.
Usually known label data, say where only part of a dataset is labelled or classified,
is included in architectures in order to learn the classifications which in turn can be
used in supervised contexts. The QDataSet can be used in a variety of supervised,
unsupervised or semi-supervised contexts. For example, training an algorithm for
optimal quantum control can be undertaken in a supervised context (using pulse
data, measurement statistics or Hamiltonian sequences) as label data and modelling
estimates accordingly. Alternatively, semi-supervised or unsupervised protocols for
tomographic classification can be trained using the QDataSet. In any case, an un-
derstanding of standard and state of the art algorithms in each category can provide
QML researchers using the QDataSet with a basis for benchmarking their own al-
gorithms and inform the design of especially hybrid approaches (see [111] for an
overview and for quantum examples of the above).

3.9.3 Benchmarking by objectives and architecture


The choice of benchmarking algorithms will also be informed by the objectives and
architecture. Classically, algorithms can be parsed into various categories. Typ-
ically they are either regression-based algorithms (see discussion of linear models
in section D.4.1), used where the objective is to estimate (and minimise error in
relation to) continuous data or classification-based algorithms, where the objective
is to classify data into discrete categories. Regression algorithms are algorithms
that seek to model relationships between input variables and outputs iteratively
by updating models (and estimates) in order to minimise error between estimates
and label data. Typical regression algorithms usually fall within broader families
of generalised linear models (GLMs) [164] and include algorithms such as ordinary
least squares, linear and logistic regression, logit and probit models, multivariate
models and other models depending on link functions of interest. GLMs are also
characterised by regularisation techniques that seek to optimise via penalising higher
complexity, outlier weights or high variance. GLMs offer more flexibility for use in
QML and for using the QDataSet in particular as they are not confined to assuming
errors are normally distributed. Other approaches using Bayesian methods, such
as naive Bayes, Gaussian Bayes, Bayesian networks and averaged one-dependence
estimators provide yet further avenues for benchmarking algorithms trained on the
QDataSet for classification or regression tasks.
Classification models aim to solve decision problems via classification. They
typically compare new data to existing datasets using a metric or distance measure.
Examples include clustering algorithms such as k-nearest neighbour, support vector
machines, learning vector quantisation, decision-trees, locally weighted learning, or
graphical models using spatial filtering. Most of the algorithms mentioned thus far
fall within traditional machine learning.
Over the last several decades, neural network architectures have emerged as a driving force of machine learning globally. Quantum analogues and hybrid neural network architectures have a relatively long lineage, including quantum analogues of perceptrons, quantum neural networks, quantum Hopfield networks
(see [103,111]) through to modern deep learning architectures (such as convolutional,
recurrent, graphical and hierarchical neural networks, generative models [14]) and

transformer-based models [165]. See section D.4.2 for detailed discussion of neu-
ral network components and architecture. One feature of algorithmic development
that is particularly important is dealing with the curse of dimensionality and, in
a quantum context, barren plateaus [78] (see section D.6.4). Common techniques
to address such problems include dimensionality reduction techniques or symmetry-
based (for example, tensor network) techniques whose ultimate goal is to reduce
datasets down to their most informative structures while maintaining computational
feasibility. While the QDataSet only extends to two-qubit simulations, the size and
complexity of the data suggests the utility of dimensionality-reduction techniques for
particular problems, such as tomographic state characterisation. To this end, algo-
rithms developed using the QDataSet can benefit from benchmarking and adapting
classical dimensionality-reduction techniques, such as principal component analy-
sis, partial regression, singular value decompositions, matrix factorisation and other
techniques [12]. It is also important to mention that there has been considerable
work in QML generally toward the development of quantum and hybrid analogues
of such techniques. These too should be considered when seeking benchmarks.
Finally, it is worth mentioning the use (and importance) of ensemble methods
in classical machine learning. Ensemble methods tend to combine what are known
as ‘weak learner’ algorithms into an ensemble which, in aggregate, outperforms any
individual instance of the algorithm. Each weak learner’s performance is updated
by reference to a subset of the population of weak learners. Such techniques would
be suitable for use when training algorithms on the QDataSet. Popular examples of
such algorithms are gradient-boosting algorithms, such as XGBoost [166].

3.10 Example applications of the QDataSet


In this section, we outline a number of applications for which the QDataSet can be
used. These include training machine learning algorithms for use in quantum state
(or process) tomography, quantum noise spectroscopy and quantum control. The
QDataSet repository contains a number of example Jupyter notebooks correspond-
ing to the examples below. The idea behind these datasets is that machine learning
practitioners can input their own algorithms into the code to run experiments and
test how well their algorithms perform.

3.10.1 Quantum state tomography


Quantum state tomography involves reconstructing an estimate ρ̂ of the state of a
quantum system given a set of measured observables. Here we summarise the use
of the QDataSet for tomography tasks. Discussion of measurement protocols is ex-

panded upon in more detail in section A.1.6. The quantum state of interest may be
in either a mixed or pure state and the task is to uniquely identify the state among
a range of potential states. Tomography requires that measurements be tomographi-
cally complete (and therefore informationally complete, see definition A.1.36), which
means that the set of measurement operators form a basis for the Hilbert space of
interest. That is, a set of measurement operators {Mm } is tomographically complete
if for every operator A ∈ B(H), there exists a representation of A in terms of {Mm }.

Abstractly, the problem involves stipulating a set of operators {Oi }i as input,
and the corresponding desired target outputs {⟨Oi ⟩}i . The objective is to find the
best model that fits this data. We know that the relation between these is given by
⟨Oi ⟩ = Tr(ρOi ) and we can use this fact to find the estimate of the state. Tomogra-
phy requires repeatedly undertaking different measurements upon quantum states
described by identical density matrices which in turn gives rise to a measurement
distribution from which probabilities of observables can be inferred. Such inferred
probabilities are in turn used to construct a density matrix consistent with observed
measurement distributions thus characterising the state. More formally, assuming
an informationally complete positive-operator valued measure (POVM) {Oi } span-
ning the Hilbert-Schmidt space B(H) of operators on H, we can write the probability
of an observation i given density matrix ρ using the Hilbert-Schmidt norm above
i.e:

p(i|ρ) = ⟨Oi ⟩ = Tr(ρOi ) (3.10.1)

Data are gathered from a discrete set of experiments, where each experiment is a
process of initial state preparation, application of a sequence of gates {Gj } and
measurement. This experimental process is repeated N times, leading to
a frequency count ni of a particular observable i. The probability of that observable
is then estimated as:

p(i|ρ) ≈ ni /N = p̂i

from which we reconstruct the density matrix ρ. For a detailed exposition of tomog-
raphy formalism and conditions, see D’Alessandro [15]. Quantum process tomogra-
phy is a related but distinct type of tomography. In this case, we also have a set of
test states {ρj } which span B(H). To undertake process tomography, an unknown
gate sequence Gk comprising K gates is applied to the states such that:

p(i|G, ρj ) ≈ ni /N = p̂j,i (3.10.2)

Connecting with discussion of measurement in Appendix A, we observe that a set



of measurement operators {Mm } is tomographically complete if for every operator


A ∈ B(H), there exists a representation of A in terms of {Mm }, i.e.,
A = ∑ₘ am Mm ,

where am ∈ C. Given a POVM {Em } and a series of measurement outcomes {p(m)}


for a state ρ, ρ can be reconstructed via solving the following set of linear equations:

p(m) = Tr(ρEm ), ∀m ∈ Σ,

subject to the constraints that ρ is positive semi-definite.
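By way of a concrete single-qubit illustration, the following sketch (NumPy assumed) performs the linear inversion above from estimated Pauli expectation values; for noisy frequency estimates the raw reconstruction may fail to be positive semi-definite, in which case a projection back onto physical states is required.

import numpy as np

# Single-qubit Pauli basis (including identity).
I = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def linear_inversion(ex, ey, ez):
    """Reconstruct rho = (I + <X>X + <Y>Y + <Z>Z)/2 from estimated
    Pauli expectations; noisy estimates may require projection onto
    the positive semi-definite cone afterwards."""
    return 0.5 * (I + ex * X + ey * Y + ez * Z)

# Frequency estimates p_hat of the +1 outcome give <P> = 2*p_hat - 1.
rho_hat = linear_inversion(2*0.51 - 1, 2*0.50 - 1, 2*0.99 - 1)
print(np.round(rho_hat, 3))  # approximately |0><0| for these counts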

The QDataSet can be used to train machine learning algorithms for tomography.
Quantum state and process tomography is particularly challenging. One must ensure
that the estimate we get is physical, i.e. positive semi-definite with unit trace.
Furthermore, the number of measurements N required for sufficient precision to
completely characterise ρ scales rapidly. Each of the K gates in a sequence Gk
requires d2 (d − 1) experiments (measurements), where d = dim B(H), so the total
number required to sufficiently characterise the quantum process is Kd4 − (K − 2)d2 − 1 (see [167] for
more detail). Beyond a small number of qubits, it becomes computationally infea-
sible to completely characterise states by direct measurement, thus parametrised or
incomplete tomography must be relied upon. Machine learning techniques naturally
offer potential to assist with such optimisation problems in tomography, especially
neural network approaches, whose inherent non-linearities may enable sufficient
approximations where traditional tomographic techniques fall short. Examples of the
use of classical machine learning include demonstration of improvements due to
neural network-based (non-linear) classifiers over linear classifiers for tomography
tasks [168] and classical convolutional neural networks to assess whether a set of
measurements is informationally complete [169].

The objective of an algorithm trained using the QDataSet may be, for example,
to predict (within tolerances determined by the use case) the tomographic
description of a final quantum state from a limited set of measurement statistics (to
avoid having to undertake N such experiments for large N ). Each of the one- and
two-qubit datasets is informationally complete with respect to the Pauli operators
(and identity), i.e. can be decomposed in the corresponding one- or two-qubit Pauli basis.
There are a variety of objectives and techniques which may be adopted. Each of the
10,000 examples for each profile constitutes an experiment comprising initial state
preparation, state evolution and measurement. One approach using the QDataSet
would be to try to produce an estimate ρ̂(T ) of the final state ρ(T ) (which can be
reconstructed by application of the unitaries in the QDataSet to the initial states)

using the set of Pauli measurements {Em }. To train an algorithm for tomography
without a full set of N measurements being undertaken, one can stipulate the aim
of the machine learning algorithm as being to take a subset of those Pauli measure-
ments as input and try to generate a final state ρ̂(T ) that as closely approximates
the known final state ρ(T ) provided by the QDataSet.
A variety of techniques can be used to draw from the measurement distributions
and iteratively update the estimate ρ̂(T ), for example gradient-based updating of
such estimates [170]. The distance measure could be any number of the quantum
metrics described in the background chapters above, including state or operator
fidelity, trace distance or quantum relative entropy. Classical loss functions, such as
MSE or RMSE can then be used (as is familiar to machine learning practitioners) to
construct an appropriate loss function for minimisation. A related, but alternative,
approach is to use batch fidelity where the loss function is to minimise the error
between a vector of ones and fidelities, the vector being the size of the relevant
batch. Similar techniques may also be used to develop tools for use in gate set
tomography, where the sequence of gates Gk is given by the sequence of unitaries U0
in the QDataSet. In that case, the objective would be to train algorithms to estimate
Gk given the set of measurements, either in the presence or absence of noise. Table
(3.3) sets out an example summary for using the QDataSet for tomography.
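The batch-fidelity approach described above can be sketched as follows, assuming TensorFlow and, for simplicity, that the overlap Tr(ρσ) is used as the fidelity (exact only where the target state is pure):

import tensorflow as tf

def batch_fidelity_loss(rho_true, rho_pred):
    """MSE between a vector of ones and per-example overlaps
    Tr(rho_true rho_pred); this overlap equals the state fidelity
    only when rho_true is pure, an assumption of this sketch.

    Both arguments are complex tensors of shape (batch, d, d)."""
    overlap = tf.linalg.trace(tf.matmul(rho_true, rho_pred))
    fidelity = tf.math.real(overlap)
    return tf.reduce_mean(tf.square(tf.ones_like(fidelity) - fidelity))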

3.10.2 Quantum noise spectroscopy

The QDataSet can be used to develop and test machine learning algorithms to assist with
noise spectroscopy. In this problem, we are interested in finding models of the noise
affecting a quantum system given experimental measurements. More background
on noise and quantum measurement is set out in section A.3.3. In terms of the VO
operators discussed earlier, we would like to find an estimate of VO given a set of
control pulse sequences, and the corresponding observables. The QDataSet provides
a sequence of VO operators encoding the average effect of noise on measurement
operators. This set of data can be used to train algorithms to estimate VO from
noisy quantum data, such as noisy measurements or Hamiltonians that include noise
terms. An example approach is as follows, proceeding from the principle
that we have known information about the quantum system which can be input into
the algorithmic architecture (initial states, controls, even measurements) while we
are trying to estimate unknown quantities (the noise profile). Intermediate inputs
would include the system and noise Hamiltonians H0 , H1 and/or the system and
noise unitaries U0 , U1 . Alternatively, inputs could also include details of the various
noise realisations. The type of inputs will depend on the type of applied use case,
such as how much information may be known about noise sources. Label data could

be the set of measurements {EO } (expectations of the observables). Given the inputs
(control pulses) and outputs, the problem becomes estimating the mapping {VO },
such that inputs are mapped to outputs via equation (3.5.11). Note that details
about noise realisations or distributions are never accessible experimentally.
Alternatively, architectures may take known information about the system such
as Pauli measurements as inputs or adopt a similar architecture to that in [82, 170]
and construct a multi-layered architecture that replicates the simulation, where the
{V̂O } are extracted from intermediate or custom layers in the architecture. Such
greybox approaches may combine traditional and deep-learning methods and have
the benefit of providing finer-grained control over algorithmic structure by allowing,
for example, the encoding of ‘whitebox’ or known processes from quantum physics
(thereby eliminating the need for the algorithm to learn these processes). Table
(3.4) sets out one example approach that may be adopted.
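A minimal blackbox baseline consistent with Table (3.4) is sketched below; TensorFlow/Keras is assumed, the shapes correspond to the single-qubit datasets (M = 1024 time-steps, 18 expectation values) and the layer sizes are illustrative only. A greybox variant would replace the final dense layers with custom layers that compute U0 (t) from H0 and expose the V̂O estimates as intermediate outputs.

import tensorflow as tf

# Illustrative shapes for the single-qubit datasets: M time-steps per
# waveform; 18 expectations (3 Pauli observables x 6 initial states).
M, n_expectations = 1024, 18

model = tf.keras.Sequential([
    tf.keras.Input(shape=(M, 1)),                 # pulse amplitude per step
    tf.keras.layers.GRU(64),                      # temporal pulse features
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(n_expectations, activation="tanh"),  # E_O in [-1, 1]
])
model.compile(optimizer="adam", loss="mse")
# model.fit(pulses, expectations, batch_size=50, epochs=10)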

3.10.3 Quantum control and circuit synthesis


The QDataSet has been designed in particular to facilitate algorithmic design for
quantum control. As discussed in our sections on quantum control (sections A.2
and C.5), we wish to compare different (hybrid and classical) machine learning
algorithms to optimise a typical problem in quantum control, namely describing the
optimal sequence of pulses in order to synthesise a target unitary UT of interest.
Here the datasets form the basis of training, validation and test sets used to train
and verify each algorithm. The target (label) data for quantum control problems
can vary. Typically, the objective of quantum control is to achieve a reachable state
ρ(T ) via the application of control functions to generators, such as Pauli operators.
Achieving the objective means developing an algorithm that outputs a sequence
of control functions which in turn describe the sequence of experimental controls
fα (t). A typical machine learning approach to quantum control takes ρ(T ) as an
input together with intermediate inputs, such as the applicable generators (e.g.
Pauli operators encoded in the system Hamiltonian H0 (t) of the QDataSet). The
algorithm must learn the appropriate time-series distribution of fα (t) (the set of
control pulses included in the QDataSet, their amplitudes and the sequence in which they
should be applied) in order to synthesise the estimate ρ̂(T ). Some quantum control
problems are agnostic as to the quantum circuit pathway (sequence of unitaries)
taken to reach ρ̂(T ), though usually the requirement is that the circuit be resource
optimal in some sense, such as time-optimal (shortest time) or energy-optimal (least
energy).
One approach is to treat fα (t) as the label data and ρ(T ) as input data to try
to learn a mapping between them. A naive blackbox approach is unlikely to effi-

ciently solve this problem as it would require learning from scratch solutions to the
Schrödinger equation. A more efficient approach may be to encode known informa-
tion, such as the laws governing Hamiltonian evolution, into the machine learning
architecture, as in the greybox approaches described above. In this case, the target
fα (t) must be included as an intermediate input into the system Hamiltonians gov-
erning the evolution of ρ(t), yet remains the output of interest. In such approaches,
the input data would be the initial states of the QDataSet with the label data being
ρ(T ) (and label estimate ρ̂(T )). Applicable loss functions then seek to minimise
the (metric) distance between ρ(T ) and ρ̂(T ), such as fidelity F (ρ(T ), ρ̂(T )). To re-
cover the sought after sequence fα (t), the architecture then requires a way to access
the intermediate state of parameters representing fα (t) within the machine learning
architecture.
If path specificity is not important for a use case, then trained algorithms may
synthesise any pathway to achieve ρ̂(T ), subject to the optimisation constraints. The
trained algorithm need not replicate the pathways taken to reach ρ(T ) in the training
data. If path specificity is desirable, then the QDataSet intermediate operators U0 (t)
and U1 (t) can be used to reconstruct the intermediate states i.e. to recover the time-
independent approximation:

U (t)ρU (t)† ≈ (Un · · · U1 )ρ(U1† · · · Un† ) (3.10.3)

An example of such an approach is in [40] where time-optimal quantum circuit


data, representing geodesics on Lie group manifolds, is used to train algorithms
for generating time-optimal circuits. Table (3.5) sets out schemata for using the
QDataSet in a quantum control context.
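Where path specificity is required, equation (3.10.3) is straightforward to implement. The sketch below assumes NumPy, with unitaries a hypothetical list of per-time-step propagators (e.g. reconstructed from the QDataSet U0 arrays):

import numpy as np

def intermediate_states(rho0, unitaries):
    """Reconstruct intermediate states rho(t_j) from an initial state
    rho0 and a list of per-step propagators [U_1, ..., U_n], using
    rho(t_j) = (U_j ... U_1) rho0 (U_j ... U_1)^dagger."""
    states = []
    U = np.eye(rho0.shape[0], dtype=complex)
    for Uj in unitaries:
        U = Uj @ U                        # accumulate the ordered product
        states.append(U @ rho0 @ U.conj().T)
    return states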

3.11 Discussion
In this work, we have presented the QDataSet, a large-scale quantum dataset avail-
able for the development and benchmarking of quantum machine learning algo-
rithms. The 52 datasets in the QDataSet comprise simulations of one- and two-qubit
systems in a variety of noise-free and noisy contexts, together with a number of
scenarios for exercising control. Large-scale datasets play an important role in classical
machine learning development, often being designed and assembled precisely for
the purpose of algorithm innovation. Despite its burgeoning status, QML lacks
such datasets designed specifically to facilitate QML algorithm development. The
QDataSet has been designed to address this need in the context of quantum control,
tomography and noise spectroscopy, by providing a resource for cross-collaboration
among machine learning practitioners, quantum information researchers and experi-

mentalists working on applied quantum systems. In this work we have also ventured
a number of principles which we hope will assist in producing large-scale datasets for
QML, including the specification of objectives, quantum data features, structuring
and preprocessing. We set out a number of key desiderata for quantum datasets in general.
We also have aimed to provide sufficient background context across quantum theory,
machine learning and noise spectroscopy for machine learning practitioners to treat
the QDataSet as a point of entry into the field of QML. The QDataSet is suffi-
ciently versatile to enable machine learning researchers to deploy their own domain
expertise to design algorithms of direct use to experimental laboratories.
While designed specifically for problems in quantum control, tomography and
noise mitigation, the scope for the application of the QDataSet in QML research
is expansive. QML is an emerging cross-disciplinary field whose progression will
benefit from the establishment of taxonomies and standardised practices to guide
algorithm development. In this vein, we sketch below a number of proposals for the
future use of the QDataSet, building upon principles upon which the QDataSet was
designed, in order to foster the development of QML datasets and research practices.

1. Algorithm development. The primary research programme flowing from the


QDataSet involves its use in the development of algorithms with direct appli-
cability to quantum experimental and laboratory setups. As discussed above,
the QDataSet has been designed to be versatile and of use across a range of
use cases, such as quantum control, tomography and noise spectroscopy. In ad-
dition, its design enables machine learning practitioners to benchmark their
algorithms. Future research involving the QDataSet could cover a systematic
benchmarking of common types of classical machine learning algorithms for su-
pervised and unsupervised learning. We also anticipate research programmes
expanding upon greybox and hybrid models, using the QDataSet as a way to
benchmark state of the art QML models.

2. Quantum taxonomies. While taxonomies within and across disciplines will dif-
fer and evolve, there is considerable scope for research programmes examining
optimal taxonomic structuring of quantum datasets for QML. In this work, we
have outlined a proposed skeleton taxonomy that datasets for QML may wish
to adopt or adapt, covering specification of objectives, ways in which data
is described, identification of training (in-sample) and test (out-of-sample)
data, data typing, structuring, completeness and visibility. Further research
in these directions could include expanding taxonomic classifications of QML
in ways that connect with classical machine learning taxonomies, taking the
QDataSet as an example. Doing so would facilitate greater cross-collaboration
among computer scientists and quantum researchers by allowing researchers

to easily transfer their domain expertise.

3. Experimental interoperability. An important factor in expanding the reach and


impact of QML is the extent to which QML algorithms are directly applica-
ble to solving problems in applied engineering settings. Ideally, QML results
and architecture should be ‘platform agnostic’ - able to be applied to a wide
variety of experimental systems, such as superconductor, transmon, trapped
ion, photonic or quantum dot-based setups. Achieving interoperability across
dynamically evolving technological landscapes is challenging for any discipline.
For QML, the more that simulations within common platforms (such as those
mentioned above) can easily integrate into each other and usefully simulate
applied experiments, the greater the reach of algorithms trained using them.
To the extent that the QDataSet can demonstrably be used across various
platforms, algorithm design using it can assist these research imperatives.

We encourage participants in the quantum community to advance the development


of dedicated quantum datasets for the benefit of QML and expect such efforts to
contribute significantly to the advancement of the field and cross-disciplinary col-
laboration.

3.12 Code availability


The datasets are stored in an online repository and are accessible via links on the site.
The largest of the datasets is over 500GB (compressed), the smallest being around
1.4GB (compressed). The QDataSet is provided subject to open-access MIT/CC
licensing for researchers globally. The code used to generate the QDataSet is con-
tained in the associated repository (see below), together with instructions for repro-
duction of the dataset. The QDataSet code requires TensorFlow 2.0 or later along with a
current Anaconda installation of Python 3. The code used to simulate the QDataSet
is available via the GitHub repository [171] (https://github.com/eperrier/QDataSet).
A Jupyter notebook containing the code used for technical validation and verifica-
tion of the datasets is available on this QDataSet Github repository. Please note
that at the time of compilation of this thesis, the QDataSet was in the process of
being updated to another data repository.
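As a starting point for working with the datasets, a single example can be loaded along the following lines. The archive name and internal layout here are assumptions for illustration; the example notebooks in the repository contain the canonical loading code.

import pickle
import zipfile

# Illustrative loading of one pickled example from a QDataSet archive;
# the file name and member layout are assumptions (see the repository
# notebooks for the exact structure).
with zipfile.ZipFile("G_1q_X.zip") as zf:
    with zf.open(zf.namelist()[0]) as f:
        example = pickle.load(f)

print(example.keys())  # e.g. pulses, expectations, H0, U0, Vo, Eo, ...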

3.13 Figures & Tables



Item Description
simulation name: name of the dataset;
parameters
dim: the dimension 2n of the Hilbert space for n qubits (dimension 2 for single
qubit, 4 for two qubits);
Ω: the spectral energy gap;
static operators: a list of matrices representing the time-independent parts of the
Hamiltonian (i.e. drift components);
dynamic operators: a list of matrices representing the time-dependent parts of
the Hamiltonian (i.e. control components), without the pulses. So, if we have a
term f (t)σx + g(t)σy , this list will be [σx , σy ]. These dynamic operators are further
distinguished (and labelled) according to whether they are (i) undistorted pulses (labelled
pulses) or (ii) distorted pulses (labelled distorted );
noise operators: a list of time-dependent parts of the Hamiltonian that are stochas-
tic (i.e. noise components). So, if we have terms like β1 (t)σz + β2 (t)σy , the list will
be [σz , σy ];
measurement operators: the Pauli operators including identity (I, σx , σy , σz );
initial states: the six eigenstates of the Pauli operators;
T : total time (normalised to unity);
num ex : number of examples, set to 10,000;
batch size: size of batch used in data generation (default is 50);
K: number of randomised pulse sequences in Monte Carlo simulation of noise (set
to K = 2000);
noise profile: N0 to N6 (see above);
pulse shape: Gaussian or Square;
num pulses: number of pulses per interval;
elapsed time: time taken to generate the datasets.
pulse parameters The control pulse sequence parameters for the example:
Square pulses: Ak amplitude at time tk ;
Gaussian pulses: Ak (amplitude), µ (mean) and σ (standard deviation).
time range A sequence of time intervals ∆tj with j = 1, ..., M ;
pulses Time-domain waveform of the control pulse sequence.
distorted pulses Time-domain waveform of the distorted control pulse sequence (if there are no
distortions, the waveform will be identical to the undistorted pulses).
expectations The Pauli expectation values (18 or 52, depending on whether one or two qubits
are simulated; see above). For each state, the order of measurement is: σx , σy , σz applied to the
evolved initial states. As the quantum state evolves in time, the expectations
will range within the interval [−1, 1].
VO operator The VO operators corresponding to the three Pauli observables, obtained by aver-
aging the operators WO over all noise realizations.
noise Time domain realisations of the relevant noise.
H0 The system Hamiltonian H0 (t) for time-step j.
H1 The noise Hamiltonian H1 (t) for each noise realization at time-step j.
U0 The system evolution matrix U0 (t) in the absence of noise at time-step j.
UI The interaction unitary UI (t) for each noise realization at time-step j.
VO Set of 3 × 2000 expectation values (measurements) of the three Pauli observables
for all possible states for each noise realization. For each state, the order of mea-
surement is: σx , σy , σz applied to the evolved initial states.

EO The expectation values (measurements) of the three Pauli observables for all
possible states averaged over all noise realizations. For each state, the order of
measurement is: σx , σy , σz applied to the evolved initial states.

Table 3.2: QDataSet characteristics. The left column identifies each item in the respective
QDataSet examples (expressed as keys in the relevant Python dictionary) while the description
column describes each item.

Item Description
Objective Algorithm to learn characterisation of state ρ given measurements {EO }.
Inputs Set of Pauli measurements {EO }, one for each of the M experiments (in
the QDataSet, this is
Label Final state ρ(T )
Intermediate in- Hamiltonians, Unitary operators, Initial states ρ0
puts
Output Estimate of final state ρ̂(T )
Metric State fidelity F (ρ, ρ̂), Quantum relative entropy

Table 3.3: QDataSet features for quantum state tomography. The left column lists typical
categories in a machine learning architecture. The right column describes the corresponding
feature(s) of the QDataSet that would fall into such categories for the use of the QDataSet in
training quantum state tomography algorithms.

Item Description
Objective Algorithm to estimate noise operators {VO }, thereby characterising rele-
vant features of noise affecting quantum system.
Inputs Pulse sequence, reconstructed from the pulse parameters feature in the
dataset.
Label Set of measurements {EO }
Intermediate in- Hamiltonians, Unitary operators, Initial states ρ0
puts
Output Estimate of measurements {ÊO }
Metric MSE (between estimates and label data) M SE(EO , ÊO )
Table 3.4: QDataSet features for quantum noise spectroscopy. The left column lists typical
categories in a machine learning architecture. The right column describes the corresponding
feature(s) of the QDataSet that would fall into such categories for the use of the QDataSet in
training quantum noise spectroscopy algorithms.

Item Description
Objective Algorithm to learn optimal sequence of controls to reach final state
ρ(T ) or (equivalently) synthesise target unitary UT .
Inputs Hamiltonians containing Pauli generators H0 (t)
Label Final state ρ(T ) and (possibly) intermediate states ρ(tj ) for each time-
interval tj .
Intermediate Sequence of unitary operators U0 (t), U1 (t), Initial states ρ0
fixed inputs
Intermediate Sequence of pulses fα (t) including parameters depending on whether
weights square or Gaussian (for example)

Output Estimate of final state ρ̂(T ) and intermediate states ρ̂(tj )


Metric Average operator fidelity F (ρ, ρ̂)

Table 3.5: QDataSet features for quantum control. The left column lists typical categories in
a machine learning architecture. The right column describes the corresponding feature(s) of the
QDataSet that would fall into such categories for the use of the QDataSet in training quantum
control algorithms. The specifications are just one of a set of possible ways of framing quantum
control problems using machine learning.

Category Qubits Drift Control Noise


1 1 (z) (x) (z)
2 1 (z) (x, y) (x, z)
3 2 (z1, 1z) (x1, 1x) (z1, 1z)
4 2 (z1, 1z) (x1, 1x, xx) (z1, 1z)
Table 3.6: The general categorization of the provided datasets. The QDataSet examples were
generated from simulations of either one- or two-qubit systems. For each simulation, the drift
component of the Hamiltonian was along the z-axis: the qubit's z-axis in the single-qubit case,
and the z-axis of each qubit in the two-qubit case. Controls were applied along different axes,
such as the x- or y-axes. Finally, noise was similarly added to different axes: the z-axis (and in
some cases the x-axis) in the single-qubit case, and the z-axis of each qubit in the two-qubit case.

Parameter Value
T 1
M 1024
K 2000
Ω 12
Ω1 12
Ω2 10
n 5
Amin -100
Amax 100
σ T/(12M)
Table 3.7: Dataset Parameters: T : total time, set to unity for standardisation; M : the number of
time-steps (discretisations); K: the number of noise realisations; Ω: the energy gap for the single
qubit case (where subscripts 1 and 2 represent the energy gap for each qubit in the two-qubit
case); n: number of control pulses; Amax , Amin : maximum and minimum amplitude; σ: standard
deviation of pulse spacing (for Gaussian pulses).

Figure 3.1: Plot of an undistorted (orange) pulse sequence against a related distorted (blue) pulse
sequence for the single-qubit Gaussian pulse dataset with x-axis control (‘G 1q X’) over the course
of the experimental runtime. Here f (t) is the functional (Gaussian) form of the pulse sequence for
time-steps t. These plots were used in the first step of the verification process for QDataSet. The
shift in pulse sequence is consistent with expected effects of distortion filters. The pulse sequences
for each dataset can be found in simulation parameters =⇒ dynamic operators =⇒ pulses
(undistorted) or distorted pulses for the distorted case (see Table (3.2) for a description of the
dataset characteristics).

3.13.1 Monte Carlo algorithm


We set out below the Monte Carlo pseudocode referred to in section 3.5.3.8. This
pseudocode is reproduced from [158] (see Algorithm 1 (Monte Carlo simulation of a
noisy qubit) below).

Dataset Description
G 1q X (i) Qubits: one; (ii) Control: x-axis, Gaussian; (iii) Noise: none; (iv) No distor-
tion.
G 1q X D (i) Qubits: one; (ii) Control: x-axis, Gaussian; (iii) Noise: none; (iv) Distortion.
G 1q XY (i) Qubits: one; (ii) Control: x-axis and y-axis, Gaussian; (iii) Noise: none; (iv)
No distortion.
G 1q XY D (i) Qubits: one; (ii) Control: x-axis and y-axis, Gaussian; (iii) Noise: none; (iv)
Distortion.
G 1q XY XZ N1N5 (i) Qubits: one; (ii) Control: x-axis and y-axis, Gaussian; (iii) Noise: N1 on
x-axis, N5 on z-axis; (iv) No distortion.
G 1q XY XZ N1N5 D (i) Qubits: one; (ii) Control: x-axis and y-axis, Gaussian; (iii) Noise: N1 on
x-axis, N5 on z-axis; (iv) Distortion.
G 1q XY XZ N1N6 (i) Qubits: one; (ii) Control: x-axis and y-axis, Gaussian; (iii) Noise: N1 on
x-axis, N6 on z-axis; (iv) No distortion.
G 1q XY XZ N1N6 D (i) Qubits: one; (ii) Control: x-axis and y-axis, Gaussian; (iii) Noise: N1 on
x-axis, N6 on z-axis; (iv) Distortion.
G 1q XY XZ N3N6 (i) Qubits: one; (ii) Control: x-axis and y-axis, Gaussian; (iii) Noise: N3 on
x-axis, N6 on z-axis; (iv) No distortion.
G 1q XY XZ N3N6 D (i) Qubits: one; (ii) Control: x-axis and y-axis, Gaussian; (iii) Noise: N3 on
x-axis, N6 on z-axis; (iv) Distortion.
G 1q X Z N1 (i) Qubits: one; (ii) Control: x-axis, Gaussian; (iii) Noise: N1 on z-axis; (iv)
No distortion.
G 1q X Z N1 D (i) Qubits: one; (ii) Control: x-axis, Gaussian; (iii) Noise: N1 on z-axis; (iv)
Distortion.
G 1q X Z N2 (i) Qubits: one; (ii) Control: x-axis, Gaussian; (iii) Noise: N2 on z-axis; (iv)
No distortion.
G 1q X Z N2 D (i) Qubits: one; (ii) Control: x-axis, Gaussian; (iii) Noise: N2 on z-axis; (iv)
Distortion.
G 1q X Z N3 (i) Qubits: one; (ii) Control: x-axis, Gaussian; (iii) Noise: N3 on z-axis; (iv)
No distortion.
G 1q X Z N3 D (i) Qubits: one; (ii) Control: x-axis, Gaussian; (iii) Noise: N3 on z-axis; (iv)
Distortion.
G 1q X Z N4 (i) Qubits: one; (ii) Control: x-axis, Gaussian; (iii) Noise: N4 on z-axis; (iv)
No distortion.
G 1q X Z N4 D (i) Qubits: one; (ii) Control: x-axis, Gaussian; (iii) Noise: N4 on z-axis; (iv)
Distortion.
G 2q IX-XI IZ-ZI N1-N6 (i) Qubits: two; (ii) Control: x-axis on both qubits, Gaussian; (iii) Noise: N1
and N6 z-axis on each qubit; (iv) No distortion.
G 2q IX-XI IZ-ZI N1-N6 D (i) Qubits: two; (ii) Control: x-axis on both qubits, Gaussian; (iii) Noise: N1
and N6 z-axis on each qubit; (iv) Distortion.
G 2q IX-XI-XX (i) Qubits: two; (ii) Control: single x-axis control on both qubits and x-axis
interacting control, Gaussian; (iii) Noise: none; (iv) No distortion.
G 2q IX-XI-XX D (i) Qubits: two; (ii) Control: single x-axis control on both qubits and x-axis
interacting control, Gaussian; (iii) Noise: none; (iv) Distortion.
G 2q IX-XI-XX IZ-ZI N1-N5 (i) Qubits: two; (ii) Control: single x-axis control on both qubits and x-axis
interacting control, Gaussian; (iii) Noise: N1 and N5 on z-axis noise on each
qubit; (iv) No distortion.
G 2q IX-XI-XX IZ-ZI N1-N5 D (i) Qubits: two; (ii) Control: single x-axis control on both qubits and x-axis
interacting control, Gaussian; (iii) Noise: N1 and N5 on z-axis noise on each
qubit; (iv) Distortion.

Table 3.8: QDataSet File Description (Gaussian). The left column identifies each dataset in the
respective QDataSet examples while the description column describes the profile of the Gaussian
pulse datasets in terms of (i) number of qubits, (ii) axis of control and pulse wave-form (iii) axis
and type of noise and (iv) whether distortion is present or absent.

Dataset Description
S 1q X (i) Qubits: one; (ii) Control: x-axis, square; (iii) Noise: none; (iv) No distor-
tion.
S 1q X D (i) Qubits: one; (ii) Control: x-axis, square; (iii) Noise: none; (iv) Distortion.
S 1q XY (i) Qubits: one; (ii) Control: x-axis and y-axis, square; (iii) Noise: none; (iv)
No distortion.
S 1q XY D (i) Qubits: one; (ii) Control: x-axis and y-axis, square; (iii) Noise: none; (iv)
Distortion.
S 1q XY XZ N1N5 (i) Qubits: one; (ii) Control: x-axis and y-axis, square; (iii) Noise: N1 on
x-axis, N5 on z-axis; (iv) No distortion.

S 1q XY XZ N1N5 D (i) Qubits: one; (ii) Control: x-axis and y-axis, square; (iii) Noise: N1 on
x-axis, N5 on z-axis; (iv) Distortion.
S 1q XY XZ N1N6 (i) Qubits: one; (ii) Control: x-axis and y-axis, square; (iii) Noise: N1 on
x-axis, N6 on z-axis; (iv) No distortion.
S 1q XY XZ N1N6 D (i) Qubits: one; (ii) Control: x-axis and y-axis, square; (iii) Noise: N1 on
x-axis, N6 on z-axis; (iv) Distortion.

S 1q XY XZ N3N6 (i) Qubits: one; (ii) Control: x-axis and y-axis, square; (iii) Noise: N3 on
x-axis, N6 on z-axis; (iv) No distortion.
S 1q XY XZ N3N6 D (i) Qubits: one; (ii) Control: x-axis and y-axis, square; (iii) Noise: N3 on
x-axis, N6 on z-axis; (iv) Distortion.

S 1q X Z N1 (i) Qubits: one; (ii) Control: x-axis, square; (iii) Noise: N1 on z-axis; (iv) No
distortion.
S 1q X Z N1 D (i) Qubits: one; (ii) Control: x-axis, square; (iii) Noise: N1 on z-axis; (iv)
Distortion.
S 1q X Z N2 (i) Qubits: one; (ii) Control: x-axis, square; (iii) Noise: N2 on z-axis; (iv) No
distortion.
S 1q X Z N2 D (i) Qubits: one; (ii) Control: x-axis, square; (iii) Noise: N2 on z-axis; (iv)
Distortion.
S 1q X Z N3 (i) Qubits: one; (ii) Control: x-axis, square; (iii) Noise: N3 on z-axis; (iv) No
distortion.
S 1q X Z N3 D (i) Qubits: one; (ii) Control: x-axis, square; (iii) Noise: N3 on z-axis; (iv)
Distortion.
S 1q X Z N4 (i) Qubits: one; (ii) Control: x-axis, square; (iii) Noise: N4 on z-axis; (iv) No
distortion.
S 1q X Z N4 D (i) Qubits: one; (ii) Control: x-axis, square; (iii) Noise: N4 on z-axis; (iv)
Distortion.
S 2q IX-XI IZ-ZI N1-N6 (i) Qubits: two; (ii) Control: x-axis on both qubits, square; (iii) Noise: N1
and N6 z-axis on each qubit; (iv) No distortion.
S 2q IX-XI IZ-ZI N1-N6 D (i) Qubits: two; (ii) Control: x-axis on both qubits, square; (iii) Noise: N1
and N6 z-axis on each qubit; (iv) Distortion.
S 2q IX-XI-XX (i) Qubits: two; (ii) Control: single x-axis control on both qubits and x-axis
interacting control, square; (iii) Noise: none; (iv) No distortion.
S 2q IX-XI-XX D (i) Qubits: two; (ii) Control: single x-axis control on both qubits and x-axis
interacting control, square; (iii) Noise: none; (iv) Distortion.
S 2q IX-XI-XX IZ-ZI N1-N5 (i) Qubits: two; (ii) Control: x-axis on both qubits and x-axis interacting
control, square; (iii) Noise: N1 and N5 z-axis on each qubit; (iv) No distortion.
S 2q IX-XI-XX IZ-ZI N1-N5 D (i) Qubits: two; (ii) Control: x-axis on both qubits and x-axis interacting
control, square; (iii) Noise: N1 and N5 z-axis on each qubit; (iv) Distortion.
S 2q IX-XI-XX IZ-ZI N1-N6 (i) Qubits: two; (ii) Control: x-axis on both qubits and x-axis interacting
control, square; (iii) Noise: N1 and N6 z-axis on each qubit; (iv) No distortion.
S 2q IX-XI-XX IZ-ZI N1-N6 D (i) Qubits: two; (ii) Control: x-axis on both qubits and x-axis interacting
control, square; (iii) Noise: N1 and N6 z-axis on each qubit; (iv) Distortion.
Table 3.9: QDataSet File Description (Square). The left column identifies each dataset in the
respective QDataSet examples while the description column describes the profile of the square
pulse datasets in terms of (i) number of qubits, (ii) axis of control and pulse wave-form (iii) axis
and type of noise and (iv) whether distortion is present or absent.

Item Description
Quantum states Description of states in computational basis, usually represented as
vector or matrix (for ρ). May include initial and evolved (interme-
diate or final) states
Measurement Measurement operators used to generate measurements, description
operators of POVM.
Measurement Distribution of measurement outcome of measurement operators,
distribution either the individual measurement outcomes or some average (the
QDataSet is an average over noise realisations).
Hamiltonians Description of Hamiltonians, which may include system, drift and
environment Hamiltonians. Hamiltonians should also include relevant
control functions (if applicable).
Gates and oper- Descriptions of gate sequences (circuits) in terms of unitaries (or
ators other operators). The representation of circuits will vary depending
on the datasets and use case, but ideally quantum circuits should
be represented in a way easily translatable across common quan-
tum programming languages and integrable into common machine
learning platforms (e.g. TensorFlow, PyTorch).
Noise Description of noise, either via measurement statistics, known fea-
tures of noise, device specifications.
Controls Specification and description of the controls available to act on the
quantum system.
Table 3.10: An example of the types of quantum data features which may be included in a dedicated
large-scale dataset for QML. The choice of such features will depend on the particular objectives
in question. We include a range of quantum data in the QDataSet, including information about
quantum states, measurement operators and measurement statistics, Hamiltonians and their cor-
responding gates, details of environmental noise and controls.

Figure 3.2: The frequency response (left) and the phase response (right) of the filter that is used to
simulate distortions of the control pulses. The frequency is in units of Hz, and the phase response
is in units of rad.

Figure 3.3: Plot of average observable (measurement) value for all observables (index indicates
each observable in order of Pauli measurements) for all measurement outcomes for samples drawn
from dataset G 1q X (using TensorFlow ‘tf’, orange line) against the same mean for equivalent
simulations in Qutip (blue line - not shown due to identical overlap) for a single dataset. Each
dataset was sampled and comparison against Qutip was undertaken with equivalent results. The
error between means was of order 10−6 , i.e. the two were effectively identical (hence the blue
line is obscured by the orange).

Figure 3.4: An example of a quantum state rotation on the Bloch sphere. The |0⟩ , |1⟩ indicates
the σz -axis, the X and Y the σx and σy axes respectively. In (a), the vector is residing in a +1 σx
eigenstate. By rotating about the σz axis by π/4, the vector is rotated to the right, to the +1 σy
eigenstate. A rotation about the σz axis by angle θ is equivalent to the application of the unitary
U (θ) = exp(−iθσz /2).

Algorithm 1 Monte Carlo simulation of a noisy qubit

function Evolve(H, δ)
    U ← I
    for t ← 0, M − 1 do
        Ut ← exp(−iHt δ)
        U ← Ut U
    end for
    return U
end function

function GenerateNoise(S, T , M )
    N ← M/2
    for j ← 0, N − 1 do
        ϕ ← Random(0, 1)
        Pj ← √((M/T ) Sj ) e^(2πiϕ)
        QN −j ← P̄j
    end for
    P ← Concatenate(P , Q)
    β ← Re{ifft(P )}
    return β
end function

function Simulate(ρ, O, T , M , fx , fy , fz , SX , SY , SZ )
    δ ← T /M
    E ← 0
    for k ← 0, K − 1 do
        βx ← GenerateNoise(SX , T , M )
        βy ← GenerateNoise(SY , T , M )
        βz ← GenerateNoise(SZ , T , M )
        for j ← 0, M − 1 do
            t ← (0.5 + j)δ
            Hj ← (1/2)(Ω + βz (t))σz + (1/2)(fx (t) + βx (t))σx + (1/2)(fy (t) + βy (t))σy
        end for
        U ← Evolve(H, δ)
        E ← E + Tr(U ρU † O)
    end for
    E ← E/K
    return E
end function
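For reference, a direct NumPy transcription of Algorithm 1 is sketched below. The PSD normalisation and operator ordering follow the pseudocode above, and the default Ω = 12 follows Table (3.7); the sketch is illustrative rather than optimised.

import numpy as np
from scipy.linalg import expm

X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def generate_noise(S, T, M, rng):
    """Draw one time-domain noise realisation from one-sided PSD samples S."""
    N = M // 2
    phases = rng.uniform(0.0, 1.0, N)
    P = np.sqrt((M / T) * S[:N]) * np.exp(2j * np.pi * phases)
    spectrum = np.concatenate([P, np.conj(P[::-1])])  # Hermitian spectrum
    return np.real(np.fft.ifft(spectrum))

def simulate(rho, O, T, M, K, fx, fy, fz, SX, SY, SZ, Omega=12.0, seed=0):
    """Monte Carlo estimate of <O> after noisy evolution, averaged over K
    noise realisations; fx, fy, fz are control waveforms sampled at M steps."""
    rng = np.random.default_rng(seed)
    delta, E = T / M, 0.0
    for _ in range(K):
        bx, by, bz = (generate_noise(S, T, M, rng) for S in (SX, SY, SZ))
        U = np.eye(2, dtype=complex)
        for j in range(M):
            H = 0.5 * ((Omega + bz[j]) * Z + (fx[j] + bx[j]) * X
                       + (fy[j] + by[j]) * Y)
            U = expm(-1j * H * delta) @ U   # time-ordered product
        E += np.real(np.trace(U @ rho @ U.conj().T @ O))
    return E / K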
Chapter 4

Quantum Geometric Machine


Learning

4.1 Abstract

The application of machine learning techniques to solve problems in quantum con-


trol together with established geometric methods for solving optimisation problems
leads naturally to an exploration of how machine learning approaches can be used
to enhance geometric approaches for solving problems in quantum information pro-
cessing. In this Chapter, we review and extend the application of deep learning to
quantum geometric control problems. Specifically, we demonstrate enhancements in
time-optimal control in the context of quantum circuit synthesis problems by ap-
plying novel deep learning algorithms in order to approximate geodesics (and thus
minimal circuits) along Lie group manifolds relevant to low-dimensional multi-qubit
systems, such as SU (2), SU (4) and SU (8). We demonstrate the superior per-
formance of greybox models, which combine traditional blackbox algorithms with
whitebox models (which encode prior domain knowledge of quantum mechanics),
as a means of learning underlying quantum circuit distributions of interest. Our re-
sults demonstrate how geometric control techniques can be used to both (a) verify
the extent to which geometrically synthesised quantum circuits lie along geodesic,
and thus time-optimal, routes and (b) synthesise those circuits. Our results are of
interest to researchers in quantum control and quantum information theory seek-
ing to combine machine learning and geometric techniques for time-optimal control
problems.


4.2 Introduction

Machine learning-based approaches to solving theoretical and applied problems in


quantum control have gained considerable traction over recent years as researchers
leverage access to enhanced computational resources in order to solve numerical opti-
misation problems [82,172–174]. Concurrently, geometric control techniques (section
C.5) in which the tools of differential geometry and topology are applied to problems
in quantum information processing, have been applied in a variety of quantum con-
trol programmes [152,175–177]. The synthesis of geometry and quantum information
has also recently emerged as a subject of interest to researchers in complexity geometry [178, 179].
It is natural therefore that the intersection of geometric and machine learning
techniques in quantum control emerges as a cross-disciplinary research direction.
Understanding such synergies between techniques within geometric control, quantum
information processing and machine learning opens up promising pathways within
theoretical and applied quantum computational research, with potential applica-
tion across other research domains. In this Chapter, we extend previous research
seeking to combine techniques from geometric control, quantum information pro-
cessing and machine learning in order to synthesise time-optimal quantum circuits
for multi-qubit quantum systems. The development of techniques for improving the
time-optimality of quantum circuit synthesis is of interest to researchers across the
spectrum of theoretical [180] and applied quantum information science given the dif-
ficulties and challenges of synthesising quantum circuits for desired computations,
let alone time-optimal ones. We approach this ubiquitous problem by extending ge-
ometric methods for generating approximate normal subRiemannian geodesic (and
thus time-optimal) paths along certain Lie group manifolds of interest to quantum
information processing (such as SU(2n )) with tailored deep learning-based machine
learning techniques. Our results consist of: (1) an evaluation of certain existing
approaches for approximating geodesics along Lie manifolds via discrete sequences
of unitary propagators; (2) determination of the optimal set of controls for generat-
ing discrete approximations to certain geodesic sequences of unitaries in SU(2n ) for
application in multi-qubit systems; and (3) demonstration of the utility of adopting
so-called ‘greybox’ machine learning architectures [158] which combine ‘whitebox’
architectures, i.e. prior information (such as known laws of quantum mechanics)
with ‘blackbox’ architectures, such as various neural network architectures, into
synthesising quantum circuits.

4.3 Preliminaries

4.3.1 Problem description


The focus of this Chapter is on the development of novel machine learning archi-
tectures that leverage results from subRiemannian geometry in a quantum control
setting. Such techniques are of relevance to practitioners within quantum control for
a variety of reasons. First, as we discuss below in our explication of subRiemannian
geometry in quantum control settings, subRiemannian control problems are a gen-
eralisation of standard Riemannian control problems in that they represent a more
general form of Riemannian geometry (see sections C.4 and C.5).
Second, subRiemannian quantum control problems arise where only a subset of
the full Lie algebra (of generators) is itself directly accessible as a control subset.
This is of direct relevance to many quantum control scenarios which may be envi-
sioned for quantum computing devices in which one does not have access to the full
set of underlying generators, say for arbitrary multi-qubit (qudit) systems with a
limited gate set. Many quantum control problems are in fact, when characterised
geometrically, subRiemannian quantum control problems.
Third is the result that the synthesis of quantum circuits (i.e. sequences of uni-
tary propagators) in a time-optimal fashion using geometric techniques (in which
time-optimality is equated with generating discretised approximations to minimal
distance geodesics on underlying Lie group manifolds) may call for subRiemannian
rather than Riemannian geometric characterisation. The reason for this is that in
order to generate such geodesic approximations, it may be necessary to restrict the
underlying control subset of generators to a subset of the full Lie algebra (section
B.2.4). For many multi-qubit systems, quantum circuits may be constructed in or-
der to approximate geodesics (and thus be characterised as time optimal) where the
generating Lie algebra is restricted to what are known as one- and two-body Pauli
operators (tensor products of at most two standard Pauli operators), rather than
the full Lie algebra. These three issues - the prevalence of subRiemannian geometric
features in quantum control problems, the restricted availability of generators when
undertaking control and the need to synthesise circuits in a time-optimal fashion -
motivate the use of geometric techniques applied in this Chapter.

4.3.2 New contributions


In this Chapter, we report a number of experimental results based upon simulations
of machine learning models for quantum circuit synthesis.
First, we report improved machine learning architectures for quantum circuit

synthesis. We demonstrate in-sample improvements according to standard machine


learning metrics (see section D.3.4) including MSE and average operator fidelity
(equation (D.7.4)) for training, validation and generalisation data by several orders
of magnitude compared with relevant state of the art methods. We demonstrate
that customised deep learning architectures which utilise a combination of standard
and bespoke neural network layers, together with customised objective functions
(such as fidelity measures) of relevance to quantum information processing, achieve
superior results. This approach, denoted 'greybox' machine learning (section D.9),
is characterised by models that combine known prior assumptions about quantum
information processing with machine learning architectures. We demonstrate
enhanced performance of greybox over blackbox models.
Second, we report an improvement on previous work combining subRiemannian
geometric training with data and deep learning [54] to synthesise quantum circuits.
We show that optimal sets of controls may be obtained using feed-forward
fully-connected networks, Gated Recurrent Unit (GRU) recurrent neural networks
(RNNs) and custom geometric machine learning models. However, we also report on difficulties
in usefully adapting such approaches for generalisation. Third, we demonstrate that
machine learning protocols to learn discretised geodesic approximations in SU (2n )
are particularly sensitive to hyperparameter tuning, including time for application
of generators and coverage of training geodesics over manifolds of interest. We
show that selection of small time-steps for discretised unitary evolution will result
in geodesic approximations highly proximal to the identity in SU (2n ) (as was the
case in [54]), resulting in a deterioration in the ability to (in-sample and out of
sample) learn geodesic approximations to target unitaries further away (by what-
ever relevant distance metric or norm is adopted) along the manifolds. Improving
model performance to generalise beyond the proximity of the identity is shown to
require small evolutionary timescales but also an increased number of segments of
the geodesic approximation i.e. a correct balance of timescale and segmentation.

4.3.3 Structure

The structure of this Chapter is as follows. Part 4.4 provides an overview of key
quantum control concepts and literature relevant to our experiments. It draws upon
material explicated in more detail in supplementary Appendices below, particularly
sections A.2, B.2 and C.5 and examines the formulation of quantum control problems
geometrically in terms of Lie groups and differential geometry. It also explores
seminal expositions from Nielsen et al. [181] in which time-optimal quantum circuit
synthesis problems are framed in terms of generating approximate geodesics along
relevant group manifolds.

Part 4.5 details the application of subRiemannian geometric theory to quantum


circuit synthesis, drawing upon the exegesis in section C.5.7. Part 4.6 lays out
the design principles behind the series of experiments undertaken to develop im-
proved machine learning architectures for quantum circuit synthesis via approximate
discretised geodesics. This section provides a detailed implementation of greybox
variational quantum circuits for machine learning discuss in sections D.7 and D.9.
Readers interested only in the technical details of the architectures should skip to
this section. Part 4.7 details the results of the various experiments, with discussion
set out in Part 4.8. Future work and directions emerging from this research are then
discussed in Part 4.9. Code for the experiments may be found on GitHub at
https://github.com/eperrier/quant-geom-machine-learning.

4.4 Quantum control and geometry

4.4.1 Overview
The necessity of quantum control for various quantum information and computa-
tion programmes globally has seen the emergent application of classical geometric
control in an effort to solve threshold problems such as how to synthesise time
optimal circuits [23, 24, 59]. Nearly two decades ago, developments in applied quan-
tum control [21, 22, 175] spurned the use of geometric tools to assist in solving
optimisation problems in quantum information processing contexts such as applied
NMR [21, 22, 152, 175]. Related work also explored the use of Lie theoretic, geo-
metric and analytic techniques for controllability of spin particles [182]. Since that
time, the connections between geometry and quantum control/information processing
have deepened across cross-disciplinary fields, via the explication of transformations
that enable problems in one field, in this case quantum control optimisation
objectives (such as minimising controls for synthesis or reachable targets), to be
translated into another, namely the language of differential geometry. Of particular note, Nielsen et al. [183]
demonstrated that calculating quantum gate complexity could be framed in terms
of a distance-minimisation problem in the context of Riemannian manifolds. In
that work, upper and lower bounds on quantum gate complexity, relating to the
optimal control cost in synthesising an arbitrary unitary UT ∈ SU (2n ), were shown
to be equivalent to the geometric challenge of finding minimal distances on certain
Riemannian manifolds (section C.2), subRiemannian (section C.4) and Finslerian
manifolds. Subsequently, geometric techniques were utilised [184, 185] to find a
lower bound on the minimal number of unitary gates required to exactly synthesise
UT , thereby specifying a lower bound on the number of gates required to implement
a target unitary channel.

Research into quantum control [17,186] and geometric circuit synthesis [185,187,
188] has built upon results regarding the use of geometric techniques in quantum con-
trol settings. Of interest to researchers at the intersection of geometric and machine
learning approaches for quantum circuit synthesis, and the focus of this Chapter, is
a technique developed in [53,54] that combines subRiemannian geometric techniques
with deep learning in order to approximate normal subRiemannian geodesics (defi-
nition C.4.3) for synthesis of time-optimal or nearly-time optimal quantum circuits.
Our results present improved machine learning architectures tailored to learning
such approximate geodesics.

4.4.2 Quantum control formalism

4.4.2.1 Control formulations

The affinity between quantum control methods and geometric control (and
non-control) methods arises from many sources within the literature. One fundamental
reason is the intimate connection between Lie algebraic formulations of control
problems, in classical and quantum settings, on the one hand, and the
differential-geometric formulations of Lie theory on the other (as covered in sections C.5 and A.1.8). In typical Lie
theoretic approaches to quantum control problems [15, 23, 46, 61] such as synthesis
of quantum circuits, the quantum unitary of interest U is drawn from a Lie group G
(definition B.2.1). A feature of Lie groups is that they are mathematical structures
that are at once groups but also differentiable manifolds (definition C.1.2), topolog-
ical structures equipped with sufficient geometric and analytical structure to enable
analytic machinery, such as the tools of differential geometry, to be applied to their
study [51].
A typical formulation of control problems in such Lie theoretic terms takes a
target unitary UT to be an element of a Lie group, such as SU (2n ), represented
as a manifold. Associated with the underlying Lie group G is a Lie algebra g
(definition B.2.6), say su(2n ), comprising the generators of the underlying Lie group
of interest. The Lie algebra g exhibits a homomorphism with the Lie group G
such that it both generates the group action of G and allows the symmetry
properties of G to be studied by querying the algebra g itself (see definition B.2.8).
Quantum control objectives can then be characterised as attempts to synthesise
a target unitary propagator [21] belonging to such a Lie group G via application
of generators belonging to g in a controlled manner. In the simplest (noise-free)
non-relativistic settings, computation is effected via evolution from U (0) = I to UT

according to the time-dependent Schrödinger equation (definition A.1.18):

$$U(t) = T_+\exp\left(-i\int_0^t H(s)\,ds\right) \tag{4.4.1}$$

where $T_+$ represents the time-ordering operator to ensure appropriate causal ordering of operators and we have set ℏ = 1 for convenience. The above formulation may also be expressed in terms (discussed in more detail below) of the time-dependent drift $H_d(t)$ and control $H_c(t)$ Hamiltonians characteristic of quantum control settings (section A.2):

$$\dot{U}(t) = -i(H_d(t) + H_c(t))U(t). \tag{4.4.2}$$

The drift part of the Hamiltonian represents the (directly) ‘uncontrollable’ aspect
of evolution (and is discussed in more detail below), while the control Hamiltonians
represent evolution generated by those elements (generators) of the quantum system
which are controllable (see definition A.2.1), namely the generators of a Lie algebra of
interest, such as, in the case of qubit systems, generalised Pauli operators (equation
(3.5.17)).
The $H_c$ terms represent the control Hamiltonians:

$$H_c(t) = \sum_{k=1}^{m} v_k(t)\,\tau_k \tag{4.4.3}$$

parametrised by control functions vk (t) which are continuous functions of time.


These Hamiltonians are composed from (usually linear) functions of the generators
τk ∈ g (where dim g = m and k indexes the generators) belonging to the correspond-
ing Lie algebra (such as generalised Pauli operators, i.e. tensor products of Pauli operators, for SU(2^n)).
The time-dependence of the Hamiltonians is encoded in these time-dependent con-
trol functions as the generators themselves are not time-dependent. While often
linear, the functional time dependence can and does often assume non-linear and
complicated functional forms, especially in the presence of noise.
Analytically solving for the form of the control function is difficult and usually
intractable for higher-order qudit systems or open quantum systems in the presence
of noise, with numerical methods usually adopted instead [152]. This is due to
the limited circumstances in which purely analytic solutions are possible (see [15,
175] for examples). It is also because, more practically, such analytic methods are
insufficient for solving quantum control problems in open quantum systems where
noise is present unless the full noise spectrum is known, which is rare (see discussion
of noise in section A.3.2). In both cases, solutions to control problems usually
rely upon well-established numerical techniques, such as dynamical decoupling [92].

Figure 4.1: Sketch of geodesic path. The evolution of quantum states is represented by the evolution
according to Schrödinger’s equation of unitary propagators U as curves (black line) on a manifold
U ∈ G generated by generators (tangent vectors) (blue) in the time-dependent case (4.4.2). For
the time-independent case, the geodesic is approximated by evolution of discrete unitaries for time
∆t, represented by red curves (shown as linear for ease of comprehension). Here Uti represents the
evolved unitary at time ti .

One of the motivations for the use of machine learning in quantum control problems
is precisely their potential utility in learning a sufficient approximation of control
functions needed to achieve quantum control objectives, including in the presence
of noise [95].

It is common (as discussed below), in appropriate circumstances, to replace the typical time-dependent Schrödinger equation with its time-independent discretised
approximation (definition A.1.25). In this case, evolution towards a target unitary
propagator UT is approximated via a sequence of successive unitaries generated by
time-independent Hamiltonians $H_j$ applied at time $t_j$ for duration $\Delta t_j$:

$$\begin{aligned}
U(t) &= T_+\exp\left(-i\int_0^t H(s)\,ds\right) && (4.4.4)\\
&= \lim_{N\to\infty}\prod_{j=N}^{0}\exp(-iH_j(t_j)\Delta t) && (4.4.5)\\
&\approx \prod_{j=N}^{0}\exp(-iH_j(t_j)\Delta t) && (4.4.6)\\
&= \prod_{j=N}^{0} U_j = U_N \cdots U_j \cdots U_0 && (4.4.7)
\end{aligned}$$

where ∆tj = ∆t = T /N . That is, the unitary propagator at time t (from the iden-
tity) is the cumulative reverse product (forward-solved cumulant) of a sequence of
Uj . This approximation is considered appropriate where ∆t is small by comparison
to total evolution time T (or equivalently total energy) (and resulting cumulative
errors from the product of such unitaries are sufficiently small) and is an approxi-
mation adopted in our experiments detailed below.
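To make the discretised approximation concrete, the following minimal Python sketch (illustrative only, not drawn from the accompanying codebase) implements equation (4.4.7) as a reverse-ordered product of matrix exponentials for a single qubit:

```python
# A minimal sketch of equation (4.4.7): the propagator as a reverse-ordered
# product of exponentials of piecewise-constant Hamiltonians.
import numpy as np
from scipy.linalg import expm

X = np.array([[0, 1], [1, 0]], dtype=complex)      # Pauli sigma_x
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)   # Pauli sigma_y

def propagator(hamiltonians, dt):
    """Approximate U(T) = U_N ... U_0, applying each H_j for duration dt."""
    U = np.eye(hamiltonians[0].shape[0], dtype=complex)
    for H in hamiltonians:            # j = 0, ..., N
        U = expm(-1j * H * dt) @ U    # later segments act on the left
    return U

# Example: N segments with randomly drawn control amplitudes v_{k,j}
rng = np.random.default_rng(0)
N, dt = 100, 0.01
Hs = [rng.uniform(-1, 1) * X + rng.uniform(-1, 1) * Y for _ in range(N)]
U_T = propagator(Hs, dt)
assert np.allclose(U_T.conj().T @ U_T, np.eye(2))  # the product stays unitary
```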

Adopting this approximation allows (4.4.2) to be expressed as:

$$\begin{aligned}
\dot{U}_j &= -i(H_{d,j} + H_{c,j})U_j && (4.4.8)\\
&= -i\Big(H_{d,j} + \sum_{k=1}^{m} v_{k,j}\,\tau_k\Big)U_j. && (4.4.9)
\end{aligned}$$

Here, Hd,j = Hd (tj ) designates the drift (or internal) part of the Hamiltonian at
time-step tj and similarly for Hc,j . In the discretised approximation, the control
functions vk,j above now represent the amplitude (energy) to be applied at time-step
tj for time ∆t (duration) and typically correspond, for example, to the application
of certain voltages or magnetic fields for a certain period of time. The functional
form of the controls vk,j can vary, with common (idealised) representations including
Gaussian or ‘square’ pulses.
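For illustration, idealised square and Gaussian pulse envelopes of the kind referred to above may be sketched as follows (the amplitudes and timings are arbitrary choices for the example, not parameters taken from the thesis):

```python
# Illustrative control pulse envelopes v_k(t); vectorised over an array t.
import numpy as np

def square_pulse(t, amp=1.0, t_on=0.2, t_off=0.8):
    """Idealised 'square' pulse: constant amplitude while switched on."""
    return amp * ((t >= t_on) & (t < t_off)).astype(float)

def gaussian_pulse(t, amp=1.0, t0=0.5, sigma=0.1):
    """Idealised Gaussian pulse centred at t0 with width sigma."""
    return amp * np.exp(-((t - t0) ** 2) / (2 * sigma ** 2))

t = np.linspace(0, 1, 500)
v = gaussian_pulse(t)   # one control function v_k(t) in (4.4.3)
```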
The objective of time optimal control is then to select the set of controls to be
applied (when using a discretised approximation) at time tj for time ∆tj in order to
synthesise UT in the shortest amount of total time. Such geometric approaches in-
volve reparametrisation of quantum circuits, which are discrete, as approximations
to geodesics on Lie group manifolds of interest to quantum information process-
ing [183–185]. A schema illustrating the application of discretised time-independent
unitaries in order to approximate the geodesic quantum circuits is presented above
in Figure 4.1. It is the desire to solve this optimisation problem that motivates geo-
metric recharacterisation of problems in quantum information, such as determining
and solving geodesic equations of motion. It should be noted that in practice the
properties or characteristics of UT may be known with greater certainty than each
Uj . In this work, we assume the existence of a measurement (and indeed tomo-
graphic) process by which Uj may be sufficiently reconstructed such that knowledge
about Uj is accessible.

4.4.2.2 Path-length and Lie groups

The adaptation of geometric and variational methods for solving optimisation problems in quantum information processing is characterised in terms of minimising the length of curves along Lie group manifolds G. In geometric contexts,
equation (4.4.9) above can be expressed as horizontal curves (see definition C.5.3)
using equation (C.5.38) where U (t) ∼ γ(t) i.e. as a point in the Lie group manifold
G ∼ M.
Doing so requires selection of a (subRiemannian or Riemannian) metric (defi-
nition C.2.5) (for use in a cost functional) that intuitively measures the distance
(or arc length i.e. using equation (C.2.8)) between elements in the associated Lie
algebra g which, in geometric terms, are represented by tangent vectors belonging

to the associated tangent space T G (see section C.1.5 for discussion of tangent bun-
dle and Lie algebra correspondence). Cost-functionals (see equation (C.5.13)) are
essentially analogous to variational functional equations such that:

$$C = \int_a^b g_{\alpha\beta}\,\frac{dx^\alpha}{dt}\frac{dx^\beta}{dt}\,dt \tag{4.4.10}$$

where $g_{\alpha\beta}$ represents the (in the most general case, not necessarily constant) metric tensor (definition C.2.5) and $dx/dt$ represents the derivative of the Lie group elements $x \in G$ with respect to the single curve parameter (i.e. time). Solving the optimi-
sation problem of interest, such as synthesising a circuit in minimal time or with
minimal energy, becomes a question of minimising the cost function according to the
Pontryagin Maximum Principle (see section C.5.4.2). Variational methods in this
approach set δC = 0 and consequently use standard techniques from variational
calculus to derive respective equations of motion, differential equations (see section
C.5.4). The solutions (usually) take the form of exponentiated Lie algebraic ele-
ments i.e. unitary propagators, which ultimately minimise the cost functional and
solve the underlying optimisation problem.
It is worth explicating the form of cost functionals for quantum information
practitioners who may be less familiar with geometric methods. In the discretised
case, we essentially replace the integral with a sum over the various Hamiltonians
such that we have:

$$C_f = \int_a^b f(H(t))\,dt \tag{4.4.11}$$

for the continuous case and

$$C_f = \sum_{j=a}^{b} f(H_j(t_j)) \tag{4.4.12}$$

for the discrete case, where f represents the control function(s) applicable to the
Hamiltonian H(t). By selecting the appropriate parametrisation of curves on the
manifold (such as a typical parametrisation by arc-length e.g. equation (C.5.3)),
distance along a curve (representing evolution from one unitary, such as the iden-
tity, to another) can be equated to minimal time required to evolve (and synthesise)
a target unitary UT ∈ G of interest. In cases where there are multiple curves be-
tween two points, then one must select the minimal path over all such paths [181]
consistent with existence theorems regarding subRiemannian geodesics (see discus-
sion of Chow’s theorem in section C.4.1). Because minimising the cost functional
depends itself upon solutions (unitaries) which are themselves generated by Lie al-
gebraic elements subject to control functions, the optimisation problem of quantum

control thus becomes a problem of identifying the optimal set (sequence) of control
functions to be applied over time in order to minimise the cost functional. Note that
the cost functional above in terms of arc-length is different from the cost functional
using fidelity (equation (4.6.3)) as part of our machine learning architecture below.
Applying standard techniques from the calculus of variations (e.g. the Pontrya-
gin Maximum Principle [189] (section C.5.4.2)) with respect to the cost functional
results in the geodesic equation of motion [181, 184] (definition C.1.35) which spec-
ifies the path that minimises the action and which, for constant-metric Riemannian manifolds, is typically given by equation (C.2.1):

$$\frac{d^2 x^j}{dt^2} + \Gamma^j_{kl}\,\frac{dx^k}{dt}\frac{dx^l}{dt} = 0 \tag{4.4.13}$$

where x = x(t) ∈ G are the unitary group elements while dx/dt ∈ g represent
the differential operators (tangent vectors/generators) of the associated Lie algebra.
Also implicit in (4.4.13) is that the form of the applicable metric $g_{\alpha\beta}$ is itself identical across the manifold (which may not always be the case). $\Gamma^j_{kl}$ represent Christoffel
terms obtained by variation with respect to the metric. Given a small arc along
a geodesic on a Riemannian manifold, the remainder of the geodesic path is com-
pletely determined by the geodesic equation. Solutions to the geodesic equation
are, in the continuous case, horizontal curves (definition C.5.3) and in the discrete
case approximations to curves, on the manifold of interest. Such curves are inter-
pretable in terms of quantum circuits which are time-optimal when such geodesics
also represent the minimal distance curve linking two unitary group elements on
a manifold. In this way, variational methods leveraging geometric techniques and
characterisation may be utilised for synthesising quantum circuits.

4.4.2.3 Accessible controls and drift Hamiltonians

Minimising cost functionals in the way described above involves understanding what
in classical control theory is described as the set of accessible controls. Those unitaries which may be synthesised via application of the controls to the available generators are termed reachable targets (see definition C.5.4). Designing appropriate machine learning algorithms
using geometric methods or otherwise thus requires information on the form of con-
trol function and generators that are available to reach a desired target, such as a
target unitary or quantum state. For a given Lie group G, access to the entire set
of generators g renders any element U ∈ G reachable. In quantum control settings,
access to the full Lie algebraic array of generators occasionally render the problem
of unitary synthesis, i.e. the sequence of generators and control pulses, analytically
or trivially obtainable using geometric means, such as Euler decompositions where

G = SU (2) [190]. In certain cases (such as those explored below), we are constrained
or seek to synthesise target unitaries UT using only a subset of the relevant Lie alge-
bra, a subset named the control set (or control subset) p ⊂ g with a decomposition
g = p ⊕ k (note we denote the control subset as a set rather than a subalgebra since, where such a decomposition is a Cartan decomposition, the non-compactness of p, i.e. that [p, p] ⊆ k, results in p not being a subalgebra of g). In such cases, the full
set of generators is not directly accessible. However, one may still be able to reach
the target unitary of interest if the elements of p may be combined (by operation
of the Lie bracket or Lie derivative, as discussed below) in order to generate the remaining generators belonging to g, thus providing access to g in its entirety. This
will be the case when the control subset p satisfies the Lie triple property (equation
(C.5.28)) [[p, p], p] ⊆ p. We distinguish such cases by denoting the first case as a case
of directly accessible controls, while the second case represents indirectly accessible
controls.
Returning to the quantum control paradigm (equation 4.4.9), the drift Hamil-
tonian Hd represents the evolution of a quantum system which cannot be directly
controlled. It may represent a noise term or the interaction of a system with an
environment in open quantum systems’ formulations. Where a control subset p ⊂ g
represents only a subset of the relevant Lie algebra, we can think of the complement k = p⊥ (where g = k ⊕ p) as the set of generators from which the drift term Hd above, or at least elements of it, is composed (noting that, for example, in open quantum systems or non-unitary evolutions, generators are not necessarily Lie algebraic in character), i.e. that Hd ∈ k.
The interaction between the drift Hd (named due to its origins in fluid dynamics) and control Hc Hamiltonians depends on the set of such accessible controls avail-
able to solve the quantum control problem of interest. The application of control
Hamiltonians in this case represents, in effect, an attempt to ‘steer’ a system evolv-
ing according to Hd towards a desired target via the adjoint action of Lie group
elements generated by p (see [21] for a discussion).
Understanding the nature of relevant control algebras and the composition of
drift Hamiltonians is an important consideration when designing and implement-
ing machine learning architectures for geometric quantum control, including recent
novel approaches applying machine learning for modelling and control of a reconfig-
urable photonic circuit [95] and to learn characteristics of Hd via quantum feature
engineering [158]. One of the motivations of the present work is to demonstrate the
utility of being able to encode prior information about the relevant control subset
into machine learning protocols whose objective is the output of a time-optimal se-
quence of control pulses, a design choice that requires information about precisely
what generators are accessible.

4.4.2.4 Geometric optimisation

Selecting the specific control subset and set of control amplitudes in order to generate
time-optimal quantum circuits is a difficult task. Solving this optimisation problem
in quantum control and quantum circuit literature using geometric techniques fol-
lows two broad directions which synthesise results from geometric control theory
(see section C.5 and [23] for a comprehensive review). One such approach uses sym-
metric space formalism and Cartan decompositions [20, 21, 175] to decompose the
Lie algebra g associated with a given Lie group G into symmetric and antisymmetric
subspaces such that g = p ⊕ k (see section B.5 and [9] generally). Here p is the con-
trol subset (containing accessible generators) and k is the subalgebra generating the
non-directly controllable evolution of the system. If a suitable partition can be found
satisfying certain Levi (Cartan) commutation relations (see [186,191]) set out in def-
inition B.5.2, then the Lie group can be decomposed into a Cartan decomposition
G = KAK. By doing so, the problem of selecting the appropriate set of genera-
tors τ ∈ p and control amplitudes is simplified (see section (4.11.1) for a discussion
and [175, 186] in particular). A drawback of such methods as currently applied to
problems in quantum control is their limited scope of application, namely that such
methods apply only to limited symmetric space manifolds for which the methods
were developed. Furthermore, the particular methods in [175] used to determine
the appropriate generators are limited in their generality. Chapter 5 presents novel
results that seek to, for certain classes of symmetric space control, address some of
the challenges in using such methods.
An alternative, but ultimately related, method explored by Nielsen et al. in a
range of papers [181,183–185] approaches the problem of finding optimal generators
and controls via modifying metrics applicable to cost functionals. In [181], geometric
techniques are applied to determine the minimal size circuit to exactly implement a
specific n-qubit unitary operation combining variational and geometric techniques
from Riemannian geometry (discussed in detail in section C.2.1), with the paper
detailing a method for determining the lower bound of circuit complexity and circuit
size by reference to the length of the local minimal geodesic between UT and I
(where length is determined via a Finsler metric on su(2n )). In later work [183–185],
particular metrics with penalty terms are chosen that add higher-weights to higher
order Pauli operators in order to steer the generating set towards one- and two-body
operators which are assessed as being optimal for geodesic synthesis (see Appendix
(4.11.4) for a discussion and section D.3.2 for a discussion of penalty metrics and
regularisation generally). It is shown that in limiting cases applying the variational
techniques and penalty metric of Nielsen et al., the optimal set of generators are
one- and two-body terms [192].

Such variational and penalty metric-based approaches have their drawbacks,


however: there are limited convergence guarantees due to, for example, the existence of exponentially long Pauli geodesics (many unitaries have minimal Pauli geodesics of exponential length (see [193])), reliance upon complicated boundary conditions, or
the difficulty in discovering homotopic maps with which to deform known geodesics
into other geodesic paths [53,185,192]. The approach in the work of Nielsen et al. is
also less general in that it assumes the entire distribution is the Lie algebra su(2n ).
A common characteristic of both approaches in the case of SU (2n ) is a preference
for control subspaces comprising only one- and two-body Pauli operators (operators
that are tensor products of at most one or two Pauli operators) [183]. In [20,175], the
rationale is that higher-order (more than two-body) generators introduce coupling
terms which increase evolution time. In [181, 184, 185], this rationale manifests in
the imposition of penalty metrics upon higher-order terms in cost functionals. This
approach penalises higher-order generators by assigning to them a higher weighting
in the metric, thereby penalising higher-order terms in the cost function which seeks
to minimise the metric of interest (in the case of [184], often Finslerian metrics F ).
Thus there are strong motivations for preferencing one- and two-body generator
control subsets when devising strategies for quantum circuit synthesis. It can be
shown that for higher-order SU(2^n) systems, three- or more-body generators can themselves be composed via one- and two-body generators [53] when they form
a bracket-generating set [50]. These combined results motivate the selection of
a (minimal) set of one- and two-body generators that can generate the entire Lie
algebra su(2n ). This is a characteristic of the bracket-generating set (or distribution)
∆ adopted in [54] (see definition C.4.4), where instead of imposing penalty metrics
or relying on decompositions to obtain optimal control subsets, the control subsets
are selected initially to comprise only one- and two-body terms. Such reasoning does
not guarantee the utility of one- and two-body terms per se (see [53] for technical
examples) but can provide in certain cases a basis for potentially preferring such
generators when designing optimisation protocols, such as via machine learning, to
approximate geodesics. We also note recent important results which may affect
these methods from Marvian [146] regarding constraints on universality of control
from 2-local operations in the absence of certain correctives being applied.

4.5 SubRiemannian quantum circuit synthesis

4.5.1 Overview
The difficulties of synthesising geodesics are well-known throughout geometric and
control literature [194] (see section C.5). The geodesically-driven control methods

articulated above face considerable challenges in terms of the complexities of the


relevant boundary-value problem when adopting certain ‘penalty’ metrics designed
to enforce the geodesic constraints on Finslerian manifolds. Though analytic or nu-
merical (including machine learning) architectures are unlikely to provide means of systematically synthesising approximate geodesics and time-optimal unitaries for arbitrary propagators or higher-dimensional Lie groups, they have potential
utility for lower-order qudit systems.

In [53, 54], an approach leveraging subRiemannian, rather than Riemannian,


geometry is adopted in order to overcome some of these barriers to quantum circuit
synthesis using geodesic approximations. SubRiemannian geometry [17, 50, 56, 183,
195] is a generalised form of Riemannian geometry that is well-developed in classical
control contexts. In its simplest description, it covers typical geometries where only
a subset of the full Lie algebra g is directly accessible. We discuss subRiemannian
geometry in section C.4 and follow-on sections.

For the purposes of quantum control, it is helpful to characterise subRieman-


nian manifolds in Lie theoretic terms (see [50] for a more formal treatment). The
concepts below, especially the important correspondence between Lie algebras and
tangent bundles are covered extensively in Appendices B and C. For a given Lie
group manifold G, the Lie algebra g comprises generators which also form (i.e. cor-
respond to) a basis of the tangent space T M (section C.1.5). Curves γ(t) along G
are those generated by generators τ ∈ g such that the generators may be thought of
as tangent vectors tangent to the curves they generate. The curves which may be
generated on a manifold in many ways characterise the manifold. A distinguishing
feature of Riemannian and subRiemannian manifolds is the set of accessible gener-
ators. Riemannian manifolds are characterised by full direct access to g, that is, all
generators in g may generate curves on G. In more formal language, the directions a curve may evolve along (or the subalgebra of its generators) are characterised by certain subsets ∆ of the tangent bundle T M for a manifold G. The distribution ∆ in this
context (see definition C.4.1) is also denoted the horizontal tangent space Hp M,
which intuitively refers to tangent vectors being ‘tangent’ and along the manifold
but more formally refers to the fact that the covariant derivative (definition C.1.33)
of those (generating) tangent vectors X (along the vector field Xf ) along the curve
is zero, that is

$$\nabla_{\gamma(t)} X = 0.$$

The vanishing of the covariant derivative is the characteristic definition of parallel


transport (section C.1.8). By contrast, it may be the case that only a subspace
p ⊂ g where g = p ⊕ k is accessible for generation of curves on G. In this case,

evolution of curves tangent to certain directions of tangent vectors in k (that is, along
the fibres (section C.1.6)) is not directly possible. This set of directly inaccessible
generators k is orthogonal to the set p of horizontal tangent vectors and is described
as vertical (see definition C.1.25). More formally, the vertical subspace Vp M of T M
comprises vectors X whose evolution along the curve γ(t) is such that ∇γ(t) X ̸= 0
(having some component not tangent to the manifold). In this second case, the
manifold is characterisable as subRiemannian rather than Riemannian. Elements of
the vertical subspace k may still affect the evolution of curves, but only indirectly
to the extent the generators in k are able to be generated by the application of
the Lie bracket via the BCH formula (see equation (4.5.5) below and definition B.2.18), i.e. if the distribution is bracket-generating. A number of theorems of subRiemannian geometry [50] then guarantee the existence of certain normal subRiemannian geodesics on G which are both unique and minimal in length.
Thus, for generating circuits on G = SU (2n ), by constructing a distribution ∆
that is bracket-generating and comprising only one- and two-body generators, it can
be shown [53] that normal subRiemannian geodesics may be generated which are
minimal and unique, thus approximating the minimal circuits between I and UT . In
the next section, we detail the approach in [54] that leverages such subRiemannian
geometric insights. We do so in order to provide insight into the subRiemannian
machine learning detailed below.

4.5.2 SubRiemannian Normal Geodesics


The motivation behind the approach in [54] is to solve the problem of finding time-
optimal sequences of gates via approximating subRiemannian normal geodesics on
Lie group manifolds [184,196–199] in order to synthesise target unitary propagators
UT ∈ SU (2n ). The basis of the approach is to firstly adopt the time-independent
approximation (4.4.7) and express UT as an approximate product of exponentials:

$$U_T \approx U_n \cdots U_1 \approx E(c) = \prod_{j}^{n} \underbrace{\exp\left(\sum_{k}^{m} h\,c^k_j \tau_k\right)}_{U_j} \tag{4.5.1}$$

$$U_j(\Delta t) = \exp\left(\sum_{k}^{m} c^k_j \tau_k\,\Delta t\right) = \exp(H_j\,\Delta t) \tag{4.5.2}$$

where we have absorbed the imaginary unit −i into the generators. The Uj are referred to herein as (right-multiplicative or right-acting) subunitaries for convenience, again justifiable in the many-segment, small-h limit where h = ∆t, the evolution time of each Uj . The terms $c^k_j := c^k_j(t)$ represent the amplitudes of the (square) control pulses applied to the k generators at time interval tj for duration ∆tj = h to generate

unitary Uj (i.e. j indexes the segment, k indexes the control amplitude ck paired
with the generators τk ). For notational clarity, in sections A.2 and C.5 the set of
cj (t) are denoted uj (t) := (ukj (t)) ∈ U ⊂ Rm , j = 1, ..., m (as per section C.5.4.1).
The method in [54] in effect becomes a ‘bang-bang control’ problem [147, 200] in
which the time-dependent Schrödinger equation is approximated by a sequence of
time-independent solutions Uj where control Hamiltonians Hj are applied via the
application of a constant amplitude ckj for discrete time interval ∆t = h = T /N (with
N the number of segments). The term E(c) represents an embedding function that
maps controls from $\mathbb{C}^{n\times m}$ into the Lie group manifold:

$$E : \mathbb{C}^{n\times m} \to SU(2^n) \tag{4.5.3}$$

$$c = (c^1_1, \ldots, c^m_1, \ldots, c^1_N, \ldots, c^m_N) \mapsto \prod_{j}^{n}\exp\left(\sum_{k}^{m} h\,c^k_j \tau_k\right) \tag{4.5.4}$$

with the set of coefficients c ∈ C. The generators $\tau_i$ form a basis for the bracket-generating subset $\Delta \subset su(2^n)$ of dimension m. The Hamiltonian that generates each
subunitary Uj is the linear sum of m controls applied to m generators. By compari-
son with the conventional control setting described above (4.4.9), the coefficients ckj
correspond to vk,j .

Because ∆ constitutes the set of generators of the entire Lie algebra su(2n ) which
in turn acts as the generator of its associated Lie group SU(2n ), an arbitrary unitary
U ∈ SU(2n ) can in principle be obtained to arbitrary precision with sufficiently-
many products of exponentials. This results from the application of the Baker-Campbell-Hausdorff (BCH) theorem (see definition B.2.18 and [181] for a generalised explication), namely that:

$$\exp(A)\exp(B) = \exp\left(A + B + \tfrac{1}{2}[A, B] + \ldots\right). \tag{4.5.5}$$

The approach in [53,54] is to constrain application to cases where U may be synthe-


sised as a product that is polynomial in n, meaning the number of exponentials (subunitaries) required to synthesise U is at most a polynomial function of n. We discuss the effect for machine learning algorithms of
increasing n on outcomes such as fidelity measures below.

In the control setting discussed above (in which each Uj is decomposed into
its BCH product with coefficients $c^k_j$), each $c^k_j$ to be found constitutes some
optimal application of the generator τk . This is consistent with the result in [175] (see
Appendix (4.11.1)), in which the minimum time for synthesising the target unitary
propagator is given by the smallest summation of the coefficients (controls) of the generators $\sum_{i=1}^{n}|\alpha_i|$ which, in our notation, would sum over all control coefficients for all subunitaries, i.e. $\sum_{j=1}^{N}\sum_{k=1}^{m}|c^k_j|$.
It is worth noting that the assumption in [175] and even [184] and other analytic
results in control is that in effect the controls can be applied ‘instantaneously’ such
that the minimum time for evolution of a unitary (via the adjoint action of control
generators on drift Hamiltonians) is lower-bounded by the evolution driven by the
drift Hamiltonian Hd . That is, many such control regimes assume that control
amplitudes can be applied without energy constraints, which is equivalent to being
applicable within infinitesimal time. Often this assumption is justified by the fact
that a control voltage may be many orders of magnitude greater than the energy
scales of the quantum systems to be controlled. In cases where control amplitudes
(for example, voltages) are, in any significant sense, upper-bounded say by energy
constraints, then time for optimal synthesis of circuits will of course increase as the
assumption of instantaneity will not hold. For our purposes, in a bang-bang control
scenario and assuming evolution according to any drift Hamiltonian sets a lower-
bound on evolution time, we consider the control amplitudes ckj as applied for time
h rather than instantaneously.

4.5.3 One- and two-body terms


As discussed above, the approach in [53, 54] is to in essence circumvent the need
for elaborate penalty terms in bespoke metrics to penalise higher-order generalised
Pauli geodesic generators (akin to regularisation) by instead simply constraining the
control subset, the distribution ∆, to be spanned by one- and two-body (Kronecker product) Pauli operators (here ι ranges over the standard index 0, 1, 2, 3 of the Paulis including identity):

$$\Delta = \mathrm{span}\left\{\frac{i}{\sqrt{2^n}}\,\sigma^j_\iota,\ \frac{i}{\sqrt{2^n}}\,\sigma^k_\iota\sigma^l_\iota\right\} \tag{4.5.6}$$

where $\sigma^j_\iota$ indicates the n-fold Kronecker (tensor) product with the Pauli operator $\sigma_\iota$ at position $j$ and the two-dimensional identity operator at all other indices.
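A minimal sketch of how such a distribution may be constructed numerically is given below (assuming the normalisation of (4.5.6); the construction is illustrative rather than taken from the accompanying codebase):

```python
# Construct the one- and two-body Pauli generators of equation (4.5.6).
import itertools
import numpy as np

I2 = np.eye(2, dtype=complex)
PAULIS = [
    np.array([[0, 1], [1, 0]], dtype=complex),      # sigma_x
    np.array([[0, -1j], [1j, 0]], dtype=complex),   # sigma_y
    np.array([[1, 0], [0, -1]], dtype=complex),     # sigma_z
]

def kron_chain(ops):
    """n-fold Kronecker product of a list of 2x2 operators."""
    out = ops[0]
    for op in ops[1:]:
        out = np.kron(out, op)
    return out

def one_and_two_body(n):
    """One-body and two-body generators, normalised by i / sqrt(2^n)."""
    norm = 1j / np.sqrt(2 ** n)
    gens = []
    for j in range(n):                                  # one-body terms
        for P in PAULIS:
            ops = [I2] * n
            ops[j] = P
            gens.append(norm * kron_chain(ops))
    for j, k in itertools.combinations(range(n), 2):    # two-body terms
        for P, Q in itertools.product(PAULIS, repeat=2):
            ops = [I2] * n
            ops[j], ops[k] = P, Q
            gens.append(norm * kron_chain(ops))
    return gens

delta = one_and_two_body(2)   # 6 + 9 = 15 generators, spanning su(4)
```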
The underlying approach of the geodesic approximation method in these papers
is to seek to learn the inverse map:

$$E^{-1} : SU(2^n) \to \mathbb{C}^{n\times m} \tag{4.5.7}$$

and thus, by doing so, learn the appropriate sequence of control pulses necessary
to generate time optimal evolution of unitaries and, consequently, time optimal
quantum circuits. The method involves generating training data in the form of
normal sub-Riemannian geodesics on SU(2n ) from I to UT . The exponential product
(equation (4.4.7)) represents a path γ(t) along the SU(2^n) manifold; however, there may be an infinity of paths between I and UT , such that the map E is not injective (and paths are neither unique nor necessarily minimal), meaning $E^{-1}$ is not well-defined.

4.5.4 Generating geodesics


To solve this uniqueness problem, [53, 54] propose to synthesise paths that approxi-
mate minimal normal sub-Riemannian geodesics described above. To generate nor-
mal subRiemannian geodesics in SU(2n ), [54] limit the norm of boundary conditions
(a computational efficiency choice) and apply a generalised form of the Pontrya-
gin Maximum Principle [147]. They follow well-established variational approaches
in [46] where subRiemannian geodesics may be found by minimising the energy
(cost) functional:

$$E[\gamma] = \int_0^1 \langle \dot{\gamma}(t), \dot{\gamma}(t)\rangle\,dt. \tag{4.5.8}$$

Specifically, ⟨·, ·⟩ is the restriction of the bi-invariant norm (induced by the inner
product on the tangent bundle) to $\Delta \subset su(2^n)$. For su(2), this arises from the Killing
form which induces a metric on the manifold (see section C.5.6). The fibre bundle
structure allows partitioning of T M into horizontal and vertical subspaces (albeit if
p = g then V M = ∅). Equation (4.5.8) is the energy equation for a horizontal curve
specified in equation (C.4.2). Here the curve γ(t) ∈ M (path) varies over t ∈ [0, 1]
with tangent vectors to the curve (i.e. along the vector field) given by γ̇(t) ∈ T M.
Hence we can see how minimising path length equates to minimisation of energy.
This approach uses variational methods (section C.5.4) to minimise the path length.
To contextualise this formulation in Lie theoretic terms, γ(t) represent unitaries
U (t) ∈ SU (2n ) and γ̇ the corresponding tangent (Lie algebraic) vectors. Distance
along a path γ(t) generated by the tangent vectors (generators) γ̇(t) is measured
by subRiemannian (or in the general case, Riemannian) metrics applied to the
tangent space (see general exposition in section C.2.1). The other key assumption
behind this method is that the applicable metric gαβ is constant.
The normal subRiemannian geodesic equations arising from minimising the en-
ergy functional above can be written in differential form [46] as:

$$\begin{aligned}
\dot{\gamma}(t) &= u\,\gamma(t) && (4.5.9)\\
\dot{\Lambda} &= [\Lambda, u] && (4.5.10)\\
u &= \mathrm{proj}_\Delta(\Lambda) && (4.5.11)
\end{aligned}$$

It is worth unpacking each of these terms in order to connect the equations above to the control and geometric formalism, and because they are integrated into
the subRiemannian machine learning model detailed below. In control theory for-
malism (section C.5.4.1), equation (4.5.9) is the state equation for state variable
γ(t), corresponding to equations (C.5.5) and (A.2.6). In quantum contexts it corre-

sponds to the Schrödinger equation, hence we can identify γ(t) ≡ U (t) (see equation
(A.2.7)). The u term represents an element of the Lie algebra u ∈ ∆ ⊂ su(2n )
parameterised by t ∈ [0, 1], i.e. u : [0, 1] → su(2n ) with t 7→ u(t). As such, it
represents the generator of evolutions on the underlying manifold SU(2n ). For each
value t, the curve γ(t) represents an element of the Lie group i.e. SU(2n ), again
parametrised by t ∈ [0, 1]. Equation (4.5.10) is the costate (adjoint) equation with
costate (adjoint) variables Λ taking values in the Lie algebra $su(2^n)$. Λ represents the
costate (adjoint) variable (akin when integrated to a Lagrange multiplier) encoding
control constraints of the Pontryagin Maximum Principle. They differ from u in that
while u are direct elements of the distribution ∆, Λ(t) are elements of the overall Lie
algebra su(2n ) that are generated by the Lie-bracket between other Λ and u, hence
$\Lambda : [0, 1] \to su(2^n)$. The relationship with the Lie bracket is instructive in that the Lie bracket also has an interpretation as the Lie derivative (definition B.2.6); the adjoint equation represents its dynamics.

The time-derivative Λ̇ refers to how the Lie bracket commutator indicates the
change in a vector field along the path γ(t). In a control setting, the Lie derivative
(definition B.2.6) tells us how Λ changes as it is evolved along curves γ(t) generated
by elements u of the control subset. For parallel transport along geodesics (section
C.1.8), as mentioned above, we require this change to be such that the covariant
derivative (definition C.1.33) of Λ0 as it is parallel transported along the curve is
zero, that is:

$$\nabla_{\gamma(t)} \Lambda_0 = 0. \tag{4.5.12}$$

The last term (4.5.11) indicates that u resides in the distribution ∆ by virtue of
the projection of Λ onto the distribution ∆:

$$\mathrm{proj}_\Delta(x) = \sum_i \mathrm{Tr}(x^\dagger \tau_i)\,\tau_i \in \Delta. \tag{4.5.13}$$

This projection function is important in that it ensures that the generators of Uj


remain within ∆, facilitating the parallel transport of Λ0 and that Uj are therefore
able to be synthesised from the control subset in our machine learning protocols.
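In code, the projection (4.5.13) is a short routine; the sketch below assumes the generators τi are orthonormal under the trace inner product (illustrative, not drawn from the accompanying codebase):

```python
import numpy as np

def proj_delta(x, generators):
    """proj_Delta(x) = sum_i Tr(x^dag tau_i) tau_i, as in (4.5.13)."""
    out = np.zeros_like(x)
    for tau in generators:
        out = out + np.trace(x.conj().T @ tau) * tau
    return out
```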
Here γ(t) ∼ U (t) and γ̇(t) ∼ Λ(t). The geodesic curves γ(t) depend on the initial
condition Λ(0) (the ‘momentum’ term) with the initial ‘position’ in the manifold
being the identity unitary. In the geometric control setting over Lie group manifolds, such as unitary groups, one selects an initial generalised coordinate (akin to 'position' in the manifold) and generalised momentum, which in turn amounts to selecting an initial 'starting' unitary from the Lie group for the evolution at t = 0, usually the identity U(0) ∈ SU(2^n), along with a starting momentum Λ(0) drawn from the

associated Lie algebra su(2n ). Given these initial operators, the geodesic equations
then allow determination of tuples of unitaries and generators (positions in the
Lie group manifold, momenta in the Lie algebra) for any particular time value
t ∈ [0, 1]. That is, they provide a formula for determining U (t) and Λ(t). The
distribution (control subset) determines the types of geodesics that may be evolved
along. Because the distribution is bracket generating, in principle any curve along
SU (2n ) may be synthesised in this way (though not necessarily directly).

As noted in [53], the above set of equations can be written as a first-order dif-
ferential equation via
$$\dot{\gamma}(t) = \mathrm{proj}_\Delta(\gamma(t)\Lambda_0\gamma(t)^\dagger)\,\gamma(t) \tag{4.5.14}$$

A first-order integrator (see (4.5.18)) is used to solve for γ(t) = U (t). It is worth
analysing (4.5.14) in light of the discussion above on conjugacy maps and their
relation to time optimal geodesic paths. The γ(t) terms in the conjugacy map:

$$\gamma(t)\Lambda_0\gamma(t)^\dagger \to \Lambda_j \tag{4.5.15}$$

represent the forward-solved geodesic equations [54, 189] (and see section C.1.10
along with discussion of KP problem solutions in sections C.5.6 and 5.3.2). Given
the initial condition Λ0 , γ(t) here is the cumulative evolved operator in SU(2^n); that is, for time-step tj , we have:

$$\gamma(t_j) = \prod_{i=N}^{j} U_i \tag{4.5.16}$$

In this respect conjugating Λ0 by the γ(tj ) is equivalent to adopting a co-rotating


basis or so-called moving frame for the Lie algebra (not dissimilar to how conjugation
acts in a standard Euler decomposition such as in [190]). Projecting the conjugated
Λ0 back onto the horizontal space HM (i.e. ∆) then defines Λj as Λ0 parallel
transported along the approximate geodesic. This algorithmic approximation thus
achieves (a) a way to parallel transport Λ0 and (b) a decomposition method for gen-
erating Uj . The continuous curve γ(t) is discretised via partitioning the parametri-
sation interval into N temporal segments. A first-order integrator is then utilised to
solve the differential equation. In continuous form, the integration equation for the
unitary propagator applied over interval ∆t takes the time-dependent form (4.4.1):

$$U(t) = \exp\left(-i\int_0^{\Delta t} \mathrm{proj}_\Delta(\gamma(t)\Lambda_0\gamma(t)^\dagger)\,dt\right) \tag{4.5.17}$$

(note that in the accompanying code, the imaginary unit is incorporated into ∆).
In the discrete case, the curve γ(t) is partitioned into N such segments of equal

parameter-interval h = ∆t, indexed by γj where j = 1, ..., N . The first-order inte-


gration resolves to:
$$U_j = \exp\left(-ih\,\mathrm{proj}_\Delta(\gamma_j\Lambda_0\gamma_j^\dagger)\right) \tag{4.5.18}$$

where here Uj are unitaries that forward-solve the geodesic equations, represented
in terms of the Euler discretisation [53]:

$$\begin{aligned}
\gamma_{j+1} &= U_j\,\gamma_j && (4.5.19)\\
&= \exp\left(-ih\,\mathrm{proj}_\Delta(\gamma_j\Lambda_0\gamma_j^\dagger)\right)\gamma_j && (4.5.20)
\end{aligned}$$

where, to reiterate, γj+1 represents the cumulative unitary propagator at time
tj+1 and Uj represents the respective unitary that propagates γj → γj+1 . The
Hamiltonian Hj for segment Uj is given by the projection onto ∆:

$$H_j = \mathrm{proj}_\Delta(\gamma_j\Lambda_0\gamma_j^\dagger) \tag{4.5.21}$$

and is applied for time h (though see Appendix (4.12) below for nuances regarding
the interpretation of h and time given the imposition of ||proj∆ (Λ0 )|| = ||u0 || =
1). A consequence of these formal solutions is that each Hj is constrained to be
generated from ∆. This does not mean that only unitaries directly generated by
∆ are reachable, as the action of unitaries (see (4.5.5)) gives rise to generation of
generators outside ∆. It is, however, of relevance to the construction of machine
learning algorithms seeking to learn and reverse-engineer geodesic approximations
from target unitaries UT . The consequence of this requirement is that the control
functions for machine learning algorithms need only model controls for generators
in ∆.
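To make the integrator concrete, a minimal Python sketch of the Euler discretisation (4.5.18)-(4.5.20) follows. It assumes Hermitian generators orthonormal under the trace inner product, with the factor of −i written explicitly rather than absorbed into ∆ as in the accompanying code; all names are illustrative:

```python
# Forward-solve the normal subRiemannian geodesic equations via the
# first-order (Euler) integrator of (4.5.18)-(4.5.20).
import numpy as np
from scipy.linalg import expm

def proj_delta(x, generators):
    out = np.zeros_like(x)
    for tau in generators:
        out = out + np.trace(x.conj().T @ tau) * tau
    return out

def geodesic_segments(Lambda0, generators, N):
    h = 1.0 / N
    gamma = np.eye(Lambda0.shape[0], dtype=complex)
    subunitaries, cumulative = [], [gamma]
    for _ in range(N):
        H_j = proj_delta(gamma @ Lambda0 @ gamma.conj().T, generators)  # (4.5.21)
        U_j = expm(-1j * h * H_j)     # (4.5.18)
        gamma = U_j @ gamma           # (4.5.19)
        subunitaries.append(U_j)
        cumulative.append(gamma)
    return subunitaries, cumulative

# Example: n = 1 qubit with Delta = {sigma_x, sigma_y}, as in the experiments.
X = np.array([[0, 1], [1, 0]], dtype=complex) / np.sqrt(2)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex) / np.sqrt(2)
Z = np.array([[1, 0], [0, -1]], dtype=complex) / np.sqrt(2)
Lambda0 = 0.3 * X + 0.5 * Y + 0.8 * Z   # an arbitrary initial costate
Us, gammas = geodesic_segments(Lambda0, [X, Y], N=100)
```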

4.6 Experimental Design


In this section, we detail our experimental design and implementation of various
machine learning models that build upon and extend work in [54] applying deep
learning to the problem of approximate geodesic quantum circuit synthesis. The
overall objective of our experiments was to compare the performance of a variety of different machine learning architectures in simulated environments in terms of generating time-optimal quantum circuits when trained on approximate normal subRiemannian geodesics in SU(2^n) Lie groups. While other methods, such as the
‘shooting’ method [192] provide alternative means of generating geodesic data, it
was shown in [54] that such methods, particularly for higher-order SU(8) cases, led to
considerable increases in runtime compared with neural network approaches. In any
case, as our primary focus in this Chapter was on investigating the utility of greybox

approaches to geometric machine learning architectures, such alternative methods


(for example, implementing the methods of [184, 185]) of generating geodesics or
approximations thereto were not canvassed.

4.6.1 Experimental objectives


Synthesis of geodesics for use as training data in the various machine learning pro-
tocols utilised an adapted subRiemannian approach from [54] and [190]. Our overall
objectives required the ability to decompose a target unitary UT in order to generate
the sequence (Uj ) from UT and, in turn, render each Uj synthesisable from a set of
control amplitudes applied to generators from ∆. There are a variety of classical
deep learning approaches that can be adopted to solve this type of supervised
learning problem, including:

1. Standard neural network models: neural network models (section D.4.2.3)


adopt variations on simple fully-connected or other architectures that seek to learn
an optimal configuration of hidden representations in order to output (and
thus generate) the desired sequence. On their own such models tend to be
blackbox models, in which algorithms are trained to learn a mapping from
inputs (training data) to outputs (labels) without any necessary interpretabil-
ity or clarity about the nature of the mapping or intermediate features being
generated by the network;

2. Generative models: generative models, such as variational autoencoders (VAEs)


and generative adversarial networks (GANs) and transformers, seek to learn the
underlying distribution of ground truth data, then use that learnt distribution
to generate new similar data (distributions); and

3. Greybox models: greybox models, as discussed in section D.9 and further on,
seek to combine domain knowledge (such as laws of physics), also known as
whitebox models, together with blackbox models into a hybrid learning pro-
tocol. The particular examples we focus on are variational (parametrised)
quantum circuit models.

The practical engineering, target inputs and outputs of the various machine learning models differ depending upon metrics of success and use case. For a
typical quantum control problem, the sought output of the architecture is actually
the sequence of control pulses (cj ) to be implemented in order to synthesise the
target unitary (i.e. apply a gate in a quantum circuit). The target unitary UT is
typically one of one or more inputs into the model architecture.
The approach in [54] is blackbox in nature. In that case, the input to their model
was (for their global decomposition algorithm) UT , with the sequence (Uj ) as label data.

The aim of their algorithm, a multi-layered Gated Recurrent Unit (GRU) RNN, was
to learn a protocol for decomposing arbitrary UT ∈ SU(2^n) into an estimated
sequence (Ûj ) (sequences are indicated by parentheses). The individual Ûj are then
fed into a subsequent simple feed-forward fully-connected neural network whose
output is an estimated sequence of controls (ĉj ) (where cj is used as a shorthand for
each control amplitude ckj applied to generators τk for segment j and parentheses
indicate a sequence) for generating each Ûj using τk ∈ ∆. While Ûj need not itself
(and is unlikely to) be exactly unitary, so long as the controls (ĉj ) are sufficient
to then input into (4.5.2) to generate unitary propagators, then the objective of
learning the inverse mapping (4.5.7) has been achieved. No guarantees of unitarity from the learnt model are provided in [54]; instead there is a reliance upon simply
finding (4.5.7) in order to provide (ĉj ). As we articulate below, while this approach
in theory is feasible, in practice where unitarity is required within the network itself
(as per our greybox method driven by batch fidelity objective functions), a more
detailed engineering framework for the networks is required. It is for this reason
that we adopt a greybox approach where guarantees of unitarity can be obtained via
utilising a Lie-theoretic approach in which controls are learnt parameters, rather
than the specific entries (aij ∈ C) of a unitary matrix group element.

4.6.2 Models
4.6.2.1 Geodesic deep learning architectures

Three deep learning architectures were applied to the problem of learning approxi-
mations to geodesics (definition C.1.35) in SU(2n ):

1. a simple multi-layer feed-forward fully-connected (FC) network (definition


4.13.1) model implementing an adaptation of (4.5.2) that learns controls (cj )
trained against (Uj ) (the FC Greybox model);

2. a greybox RNN model using GRU cells [201] in which controls (ĉj ) for esti-
mated Hamiltonians Ĥj are learnt by being trained against (Uj ) (the GRU
RNN Greybox model); and

3. a fully-connected subRiemannian greybox model (the SubRiemannian model)


which generates controls (ĉj ) by concurrently implementing (4.5.13) and learn-
ing the control pulses cΛ0 for the initial generator Λ0 (that is, a model that
replicates the subRiemannian normal geodesic equations while learning initial
conditions for respective geodesics).

Each model, described in more detail below, took as initial inputs the target unitary
UT together with unitary sequences (Uj ) (as per (4.4.7) above). Each new model uses

various neural network architectures to generate controls (ĉj ) for generators τk ∈ ∆


(where $\hat{H}_j = \sum_k \hat{c}^k_j \tau_k$), which are in turn evolved via customised layers implementing (4.4.7) in order to generate estimates (Ûj ). These estimates (Ûj ) were then compared against the true (Uj ) using an operator fidelity metric, with MSE loss taken against a vector of ones (as perfect fidelity will result in unity). A second metric of average operator fidelity was also adopted to provide a measure of how well, on training and validation data (see section D.5.4 for a discussion of regularisation), the networks were able to synthesise Uj with respect to the estimated Ûj .
Unlike the segmented neural networks for learning control pulses to generate
specific Uj , the variable weights (and units) of the neural network were constructed
with greater flexibility. The FC Greybox, SubRiemannian and GRU RNN Greybox models were each tested. Note that MSE((Uj ), (Ûj )) refers to the batch fidelity
MSE described below. For each model, the inputs to the model were the target
unitary UT and its corresponding sequence of subunitaries (Uj ). As detailed below,
the penultimate layer of each model outputs an estimated sequence of subunitaries
(Ûj ). This estimated sequence was then compared to the true sequence (Uj ) using
operator fidelity (see (4.6.2) below). This estimate of fidelity F ((Uj ), (Ûj )) was then
compared using MSE against a vector of ones (i.e. ideal fidelity) which formed
the label for the models. As described below, the customised nature of the models
meant intermediate outputs, including estimated control amplitude sequences (ĉj ),
Hamiltonian estimate sequences (Ĥj ) and (Ûj ) were all accessible. The general
architectural principles of this greybox approach are discussed in section D.9.

4.6.2.2 Methods

Generation of training data for each of the models tested was achieved via im-
plementing the first-order subRiemannian geodesic equations in Python, adapting
Mathematica code from [54]. A number of adaptations and modifications to the
original format of the code were undertaken: (a) where in [54], unitaries were param-
eterised only via their real components (to effect dimensionality reduction) (relying
upon an analytic means of recovering imaginary components [53]), in our approach
the entire unitary was realised such that U = X + iY . This was adopted to improve
direct generation of target unitaries of interest and to facilitate fidelity calculations.
Using equation (B.3.10), the unitaries became expressed in terms of:

$$\hat{U} = \begin{pmatrix} X & -Y \\ Y & X \end{pmatrix} \tag{4.6.1}$$

where $\dim \hat{U} = \dim SU(2^{n+1})$; and (b) in certain iterations of the code for $\Lambda : [0, 1] \to su(2^n)$, the coefficients of the generators were derived using tanh activation

functions:

$$\tanh(x) = \frac{e^x - e^{-x}}{e^x + e^{-x}}$$

(with a range −1 to 1 rather than the range [0, 1]) that allowed elements of uni-
taries to be more accurately generated and also to test (see Appendix 4.12) whether
the first order integrative approach did indeed generate equivalent time-optimal
holonomic paths (as in [190]) (see section C.1.32). Using tanh activation functions
enabled better approximation of the relevant time-optimal control functions which
give rise to the generator coefficients (for example, to reproduce the holonomic paths
of [190], one needs the coefficients to emulate the range of the sine and cosine control
functions which characterise the time-optimal evolution in that case).
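As an aside, the realification in (4.6.1) and its inverse are straightforward to implement; the following sketch is illustrative rather than the thesis code:

```python
# Realise U = X + iY as the real block matrix of equation (4.6.1).
import numpy as np

def realify(U):
    X, Y = U.real, U.imag
    return np.block([[X, -Y], [Y, X]])

def complexify(R):
    """Recover U = X + iY from its realised form."""
    d = R.shape[0] // 2
    return R[:d, :d] + 1j * R[d:, :d]
```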
Furthermore, (c) one observation from [54] was that the training data generated
unitaries relatively proximal to the identity i.e. curves that did not evolve far from
their origin. This is a consequence of the time interval ∆t for each generator i.e.
∆t = h = 1/nseg where nseg is the number of segments. The consequence of this for
our results was that training and validation performance was very high for UT close
to the identity (that is, similar to training sets), but declined in cases for UT further
away (in terms of metric distance) from the origin. This is consistent with [54]
but also consistent with the lack of generalisation performance in their model. As
such, in some iterations of the experiments we scaled up h by a factor in order
to seek UT which were more spread-out across the manifold. Other experiments
undertaken sought to increase the extent to which training data covered manifolds
by increasing the number of segments Uj of the approximate geodesic while keeping
h fixed (between 0 and 1). We report on scale and segment number dependence of
model performance below.
In addition to these modifications, in certain experiments we also supplemented
the [54] generative code with subRiemannian training data from a Python imple-
mentation of Boozer [190]. In this case, given the difficulty of numerically solving for
arbitrary unitaries using Boozer’s approach (whose solutions in the paper rely upon
analytic techniques), we generated rotations about the z-axis by arbitrary angles θ
(denoted η in [190]), then rotated the entire sequence of unitaries Uj by a random
rotation matrix. This has the effect of generating sub-Riemannian geodesics with
arbitrary initial boundary conditions and rotations about arbitrary axes, which in
turn provided a richer dataset for training the various neural networks and machine
learning algorithms.
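A sketch of this augmentation step is given below, reading 'rotated by a random rotation matrix' as conjugation by a random unitary V (which maps a geodesic of z-rotations to one about an arbitrary axis); the function names are illustrative:

```python
# Conjugate a sequence of unitaries by a random V to obtain geodesics
# about arbitrary axes from z-rotation geodesics.
import numpy as np
from scipy.stats import unitary_group

def rotate_sequence(Us, V=None):
    V = unitary_group.rvs(2) if V is None else V
    return [V @ U @ V.conj().T for U in Us]
```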
For SU (2), the bracket-generating set ∆ can be any two of the three Pauli
operators. Different combinations for ∆ were explored as part of our experimental
process. Our experiments focused on setting our control subset ∆ = {σx , σy } as this

allowed ease of comparison with analytic results of [190] and to enable assessment
of how each machine learning model performed in cases where control subsets were
limited, which was viewed as being more realistic in experimental contexts. Note
that this corresponds to the control problem being a subRiemannian one. It aligns
also with the Cartan decomposition (definition B.5.2) of su(2) (see also Chapter 5
for a comparison with analytical methods).
Test datasets for generalisation, where the trained machine learning models are
tested against out-of-sample data, were generated using the same subRiemannian
generative code above. We also sought to test, for each of SU (2), SU (4) and SU (8),
the efficacy of the models in generating sequences (Ûj ) that accurately evolved to
randomly generated unitaries from each of those groups. The testing methodology
for geodesic approximation models comprised input of the target UT of interest into
the trained model with the aim of generating control pulses (ĉj ) from which (Ûj )
(and thus ÛT ) could be generated.
In each of the models, a customised layer generates candidate controls (ĉj ) in
the form of variable weights which are updated during each iteration (epoch) of the
model using TensorFlow’s autodifferentiation architecture (which streamlines up-
dating of variable weights). These control amplitudes are then fed into a customised Hamiltonian estimation layer which applies (ĉj ) to the respective generators in ∆. The output of this Hamiltonian estimation layer is a sequence of control Hamiltonians (Ĥj ) which are input into a second customised layer which implements quantum evolution (i.e. equation (4.4.7)) in order to output (Ûj ). A subsequent custom layer takes batches of the estimates (Ûj ) and the ground truth sequence (Uj ) as inputs and calculates the operator fidelity (see section A.1.8.2) of each Ûj and Uj via:

$$F(\hat{U}_j, U_j) = |\mathrm{Tr}(\hat{U}_j^\dagger U_j)|^2 / d^2 \tag{4.6.2}$$

where d = dim Uj . It should be noted that in this case, the unitaries are ultimately
complex-valued (rather than in realised form) prior to fidelity calculations. The
outputs of the fidelity layer are the ultimate output (labels) of the model (that is,
the output is a batch-size length vector of fidelities). These outputs are compared to
a label batch-size length vector of ones (equivalent to an objective function targeting
unit fidelity). The applicable cost function used was standard MSE but applied to
the difference between ideal fidelity (unity) and actual fidelity:

$$C(F, 1) = \frac{1}{n}\sum_{j=1}^{n}\left(1 - F(\hat{U}_j, U_j)\right)^2 \tag{4.6.3}$$

where n represents the chosen batch size for the models, which in most cases was

10 or a multiple thereof. It should also be noted that this approach, which we name
‘batch fidelity’, contributed significantly to improvements in performance: previous
iterations of our experiments had engineered fidelity itself as a direct loss-function
using TensorFlow's low-level API, which was cumbersome, lacked versatility and resulted in limited improvement by comparison with batch fidelity approaches. A standard ADAM optimizer [202] (with $\alpha = 10^{-3}$) was used for all models.
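A hedged TensorFlow sketch of this batch fidelity construction is given below: a custom layer computes the operator fidelity (4.6.2) per batch element, and the loss is the MSE of those fidelities against unity, as in (4.6.3). Layer and function names are illustrative, not those of the accompanying codebase:

```python
import tensorflow as tf

class BatchFidelity(tf.keras.layers.Layer):
    """Operator fidelity (4.6.2) for each element of a batch."""
    def call(self, U_est, U_true):
        # U_est, U_true: complex tensors of shape (batch, d, d)
        d = tf.cast(tf.shape(U_true)[-1], tf.float32)
        overlap = tf.linalg.trace(tf.matmul(U_est, U_true, adjoint_a=True))
        return tf.abs(overlap) ** 2 / d ** 2   # shape (batch,)

def batch_fidelity_loss(fidelities):
    """MSE against the ideal vector of ones, equation (4.6.3)."""
    return tf.reduce_mean((1.0 - fidelities) ** 2)
```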

4.6.2.3 Geodesic architectures: Fully-connected Greybox

To benchmark the performance of the greybox models, a blackbox model was constructed using a simple deep feed-forward fully-connected layer stack taking as input UT and outputting a sequence of estimated control amplitudes (ĉj ). A schema of the model is shown in Figure (4.2). Subse-
quent customised layers construct estimates of Hamiltonians by applying (ĉj ) to the
generators in ∆, which are in turn used to generate subunitaries Ûj .
The stack comprised an initial fully-connected feed-forward network with stan-
dard clipped ReLU activation functions (with dropout ∼ 0.2) that was fed UT . This
stack fed into a subsequent dense layer that output (ĉj ) using tanh activation func-
tions. Standard MSE loss against the label data (cj ) was used (akin to the basic
GRU in [54]). The sequence (Uj ) was then reconstructed using (ĉj ) external to the
model and fidelity assessed separately. In this variation of the feed-forward fully-
connected model, a basic greybox approach instantiating the approximation
(4.4.7) was adopted.
As we discuss in Appendix D, greybox approaches [95] represent a hybrid
synthesis of ‘blackbox’ approaches to machine learning (in which the only known
data are inputs and outputs to a typical machine learning algorithm whose in-
ternal dynamics remain unknown or uninterpretable) and ‘whitebox’ approaches,
where prior knowledge of systems, such as knowledge of applicable physical laws,
is engineered into algorithms. Practically, this means customising layers of neural
network architecture to impose specific physical constraints and laws of quantum
evolution in order to output estimated Hamiltonians and unitaries. The motivation
for this approach is that it is more efficient to engineer known processes, such as
the laws of quantum mechanics, into neural network architecture rather than devote
computational resources to requiring the network to learn what is already known to
be true (and necessary for it to function effectively) such as Schrödinger’s equation.
The greybox architecture used to estimate the control pulses necessary to synthe-
sise each Uj is set-out below. This is achieved by using τi ∈ ∆ to construct estimates
of Hamiltonians Ĥ and unitaries Û . The inputs (training data) to the network are
twofold: firstly, unitaries Û generated by a Hamiltonian composed of generators in
∆ with uniform randomly chosen coefficients ckj ∈ [−1, 1], where the negative values

represent, intuitively, tangent vectors pointing in the opposite direction along a Lie
group manifold:

$$\hat{H}_j = \sum_{k=1}^{\dim \Delta} \hat{c}^k_j\, \tau_k, \qquad \hat{c}^k_j \sim U[-1, 1] \qquad (4.6.4)$$

$$\hat{U}_j = \exp(-h \hat{H}_j) \qquad (4.6.5)$$

(recalling i is absorbed into τk for convenience). The coefficients cj are constructed


via successive feed-forward fully-connected dense layers before being applied to the
generators: they are the optimal controls being sought and represent updatable
weights in the network. Secondly, a tensor of training Uj (generated extrinsically
from ∆) is separately input into the network.
Because TensorFlow layers require outputs/inputs as real floats, Uj is separated
into real Re(Uj ) and imaginary Im(Uj ) parts which are subsequently recombined
in a customised layer. The specific controls being learnt by the network were accessi-
ble using standard TensorFlow techniques that allow access to intermediate output
layers (we do this by creating separate models whose output is the output of an
intermediate layer). This approach allows access to intermediate outputs, such as
Ĥj and Ûj .
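
A short sketch of both techniques follows (the layer name "hamiltonian_estimation" is a hypothetical stand-in, not a name from the thesis code):

```python
# Sketch (illustrative names) of real/imaginary recombination and of
# exposing an intermediate layer's output via a Keras submodel.
import tensorflow as tf

class Recombine(tf.keras.layers.Layer):
    """Recombines Re(U) and Im(U) float tensors into a complex tensor."""
    def call(self, u_re, u_im):
        return tf.complex(u_re, u_im)

# Assuming a trained functional `model` with a layer named
# "hamiltonian_estimation" (hypothetical), its intermediate output can be
# read out with a submodel:
# submodel = tf.keras.Model(
#     inputs=model.input,
#     outputs=model.get_layer("hamiltonian_estimation").output)
# h_hat_sequence = submodel.predict(test_targets)
```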
For training and validation of the model, we utilised the following inputs and
outputs:

• Inputs: UT (target unitary) and (Uj ) the training sequences (Uj ); and

• Outputs: Fidelity F (Ûj , Uj ) ∈ [0, 1], representing the fidelities of the estimate
of the sequence (Ûj ) from those in the training data.

In the model, UT is fed into an initial feed-forward fully-connected layer which


is in turn connected to a dense flattened layer to produce a coefficient ĉkj for each
generator τk ∈ ∆ in (4.5.2). Thus for a model with nseg segments and dim ∆ = d, a
total of nseg × d coefficients ĉk are generated.
These are then applied to the generators τk in a customised Hamiltonian esti-
mation layer. The output of this layer is then input into a unitary layer that
generates each subunitary:

$$\hat{U}_j = \exp(-h\, \hat{c}^k_j \tau_k) \qquad (4.6.6)$$

with summation implied in order to generate the estimated sequence of unitaries


(Ûj ). A subsequent custom layer calculates F (Ûj , Uj ). The output of this layer is
a (batched) vector of fidelities which are compared against a label of ones using a
standard MSE loss function and Adam optimiser. This model is the simplest of the

greybox models adopted in our experiments. Pseudocode for the Fully-connected


Greybox model is set-out in section 4.10 below.
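
A minimal sketch of the dense control-estimation stack (steps (a)-(b) of Figure 4.2) is given below; the 640-unit hidden layers match the hidden layer sizes noted in section 4.8, while the use of plain (unclipped) ReLU and the exact depth are simplifying assumptions:

```python
# Sketch of the dense control-estimation stack of Figure 4.2 (a)-(b);
# widths, depth and activations are assumptions for illustration.
import tensorflow as tf

def control_stack(dim, n_seg, n_gen, width=640, dropout=0.2):
    """Maps a flattened (realised) U_T to n_seg * n_gen control amplitudes."""
    u_t = tf.keras.Input(shape=(2 * dim * dim,))  # Re and Im parts, flattened
    x = tf.keras.layers.Dense(width, activation="relu")(u_t)
    x = tf.keras.layers.Dropout(dropout)(x)
    x = tf.keras.layers.Dense(width, activation="relu")(x)
    x = tf.keras.layers.Dropout(dropout)(x)
    # tanh keeps each control amplitude in [-1, 1], matching c_j^k ~ U[-1, 1].
    c = tf.keras.layers.Dense(n_seg * n_gen, activation="tanh")(x)
    c = tf.keras.layers.Reshape((n_seg, n_gen))(c)
    return tf.keras.Model(inputs=u_t, outputs=c)
```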

Figure 4.2: Schema of Fully-Connected Greybox model: (a) realised UT inputs (flattened) into
a stack of feed-forward fully connected layers with ReLU activations and dropout of 0.2; (b) the
final dense layer in this stack outputs a sequence of controls (ĉj ) using tanh activation functions;
(c) these are fed into a custom Hamiltonian estimation layer to produce a sequence of Hamiltonians
(Ĥj ) using ∆; (d) these in turn are fed into a custom quantum evolution layer implementing the
time-independent Schrödinger equation to produce estimated sequences of subunitaries (Ûj ) which
are fed into (e) a final fidelity layer for comparison with the true (Uj ). Intermediate outputs are
accessible via submodels in TensorFlow.

4.6.2.4 Geodesic architectures: GRU RNN Greybox

The second category of deep learning architectures explored in our experiments was
RNN algorithms [14,203]. LSTMs are a popular deep learning tool for the modelling
of sequential datasets, such as time-series or other data in which successive data
depends upon preceding data points. The interested reader is directed to a number of
standard texts [14] covering RNN architectures in general for an overview. In short,
RNNs are modular neural networks comprising ‘cells’, self-enclosed neural networks
consisting of inputs of training data, outputs and a secondary input from preceding
cells. For sequential or time-series data, a sequence of modules are connected for
each entry or time-step in the series, j. The intuition behind RNNs, such as Long-
Short Term Memory networks (LSTMs), is that inputs from previous time-step
cells or ‘memories’ can be carried forward throughout the network, enabling it to
more accurately learn patterns in sequential non-Markovian datasets. The original
application of GRU RNNs to solving the geodesic synthesis problem was the focus
of [54]. That work utilised a relatively simple network of GRU layers, popular due
to efficiencies it can provide to training regimes.
In the present case, the aim of the GRU RNN is to generate a model that can
decompose a target unitary UT into a sequence Uj reachable from I ∈ SU (2n ).

The GRU RNN seeks to reverse-engineer the geodesically approximate sequence of


subunitaries through a learning protocol that is itself sequential. In this model, the
index j of the sequence (Uj ) is akin to a ‘time slice’. At each slice j, the unitary
Uj is input into the corresponding GRU cell Gj (one for each segment). A schema
of the model is shown in Figure (4.3). The cell activation functions were set to
the tanh function given that its range of [−1, 1] accorded with the range of elements
of desired subunitaries (the function tanh over operators (matrices) is understood
in the usual way; see Appendix 4.13 for background). The output of the GRU cell
Gj then becomes, with a certain probability, an input into the successor GRU cell
Gj+1 , which also takes as an input the successor subunitary Uj+1 .
The output of the GRU RNN is itself a sequence of control amplitudes (ĉj )
which were then applied to generators in ∆ in a custom Hamiltonian estimation layer
in TensorFlow in order to construct Hamiltonian estimates, followed by quantum
evolution layers to generate estimated subunitaries Ûj . As with the other models
above, the sequence (Ûj ) was then input into a customised batch fidelity layer for
comparison against the corresponding (Uj ). Our variations of the basic GRU RNN
differed in that rather than simply concatenating and flattening all (Uj ) into a long
single vector for input into a single GRU cell, each Uj was associated with time-slice
j, the objective being a discretised output (Ûj ).
Our main adaptation to the standard GRU RNN model was to include cus-
tomised layers as described above such that the output (Ûj ) were themselves gener-
ated by inputting learnt coefficients (ĉj ) into custom Hamiltonian estimation layers
(containing generators from ∆), followed by quantum evolution layers (exponentia-
tion) to generate the estimates. In this respect we followed novel approaches devel-
oped in [95,158], particularly around sequential Hamiltonian estimation (though we
restricted ourselves throughout to square pulse forms for (ĉj ) only instead of also
trialling Gaussian pulses). Here the aim of the GRU is to replicate the algorithmic
approach in [54], for example learning how Λ0 is conjugated by Uj in the generation
of Uj+1 . Again, this represents in effect a form of ‘whitebox’ engineering in which
assured knowledge, namely how unitaries approximately evolve under the cumula-
tive action of subunitaries, is encoded into customised layers within the network
(rather than having the network ‘deduce’ this process). Pseudocode for the GRU
RNN Greybox model is set-out in section 4.10.2.
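
A compact sketch of the control-sequence portion of this architecture (steps (a)-(b) of Figure 4.3) is given below; the unit count and the time-distributed dense head are illustrative assumptions rather than the precise thesis implementation:

```python
# Sketch of a GRU layer producing per-segment control amplitudes; shapes
# and unit counts are assumptions for illustration.
import tensorflow as tf

def gru_control_stack(dim, n_seg, n_gen, units=64):
    """Maps per-slice (realised, flattened) unitaries to controls (c_hat_j)."""
    u_seq = tf.keras.Input(shape=(n_seg, 2 * dim * dim))
    h = tf.keras.layers.GRU(units, activation="tanh",
                            return_sequences=True)(u_seq)
    # A time-distributed dense head yields n_gen amplitudes per slice j.
    c = tf.keras.layers.TimeDistributed(
        tf.keras.layers.Dense(n_gen, activation="tanh"))(h)
    return tf.keras.Model(inputs=u_seq, outputs=c)
```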

4.6.2.5 Geodesic architectures: SubRiemannian model

The third model (the SubRiemannian model) architecture developed in our experi-
ments expanded upon principles of greybox network design and subRiemannian ge-
ometry in order to generate approximations to subRiemannian geodesics. A schema
of the model is shown in Figure (4.4). The choice of architecture was motivated by

Figure 4.3: Schema of GRU RNN Greybox model: (a) realised UT inputs (flattened) into a GRU
RNN layer comprising GRU cells in which each segment j plays the role of the time parameter; (b)
the output of the GRU layer is a sequence of control pulses (ĉj ) using tanh activation functions; (c)
these are fed into a custom Hamiltonian estimation layer to produce a sequence of Hamiltonians
(Ĥj ) by applying the control amplitudes to ∆; (d) the Hamiltonian sequence is fed into a custom
quantum evolution layer implementing the time-independent Schrödinger equation to produce es-
timated sequences of subunitaries (Ûj ) which are fed into (e) a final fidelity layer for comparison
with the true (Uj ). Intermediate outputs are accessible via submodels in TensorFlow.

insights from the variational means of generating subRiemannian geodesics them-


selves [46,49], namely that a machine learning model that effectively leveraged known
or assumed knowledge regarding evolution of unitaries and their generation would
perform better than more blackbox-oriented approaches. In essence, the model al-
gorithmically implemented the recursive method of generating approximate subRie-
mannian geodesics, which relies only upon Λ0 and ∆; the model thereby need only
learn the choice of initial condition Λ0 , rather than how to construct Hamil-
tonians or evolve according to the laws of quantum mechanics (which were instead
dealt with via customised layers).
The method in [54] assumes certain prior knowledge or information in order to
generate output, including: (a) the distribution ∆ i.e. the control subset in an
experiment of interest; (b) the form of variational equations giving rise to normal
subRiemannian geodesics; (c) hyperparameters, such as knowledge of the number of
segments in each approximation and time-step h. The form of (4.5.13) provides (via
the trace operation) the control amplitudes ĉj for each generator for Hamiltonian Ĥj .
Once the initial generator Λ0 is selected, given these prior assumptions, the output
of geodesic approximations is predetermined. This characterisation was then used
to design the network architecture: the inputs to the network were target unitaries
UT , together with the associated sequence (Uj ) and control subset ∆.

The aim of the network was to, given the input UT , learn the control amplitudes
for generating the correct Λ0 which, when input into the subRiemannian normal
geodesic equations, generated the sequence (Ûj ) from which UT could be obtained
(thus resulting in a global decomposition of UT into subunitaries evolved from the
identity). Recall that Λ0 is composed from su(2n ) or ∆ depending on use case (the
original paper [54] selects Λ0 ∈ su(2n )). This generated Λ0 was then input into a
recursive customised layer performing the projection operation (4.5.13) that outputs
estimated Hamiltonians, followed by a quantum layer that ultimately generated the
sequence (Ûj ). The sequence (Ûj ) was then input into a batch fidelity layer for
comparison against the true (Uj ). Once trained, the network could then be used for
prediction of Λ0 , (Uj ), the sequence of amplitudes (ci ) and (Ûj ), each being accessible
via the creation of sub-models that access the respective intermediate custom layer
used to generate such output. Pseudocode for the SubRiemannian model is set-out
in section (4.10.3).
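
The recursion the custom subRiemannian layer implements can be sketched in numpy as below. Two points are assumptions standing in for the projection operation (4.5.13): the projection onto ∆ is written via the Hilbert-Schmidt (trace) inner product assuming orthonormal generators, and the cumulative unitary is carried forward through the conjugation:

```python
# Numpy sketch (one reading, under stated assumptions) of the recursive
# normal-geodesic construction: conjugate Lambda_0, project onto the
# control distribution Delta, exponentiate, and accumulate.
import numpy as np
from scipy.linalg import expm

def subriemannian_sequence(lam0, delta, h, n_seg):
    """Returns the sequence of subunitaries (U_j) generated from Lambda_0."""
    dim = lam0.shape[0]
    y = np.eye(dim, dtype=complex)        # cumulative unitary, Y = U_0 = I
    units = []
    for _ in range(n_seg):
        x = y @ lam0 @ y.conj().T         # conjugate the initial generator
        # Control amplitudes: trace-projection coefficients onto each tau,
        # assuming the generators in delta are Hilbert-Schmidt orthonormal.
        c = [np.trace(tau.conj().T @ x).real for tau in delta]
        h_j = sum(ck * tau for ck, tau in zip(c, delta))
        u_j = expm(-h * h_j)              # i absorbed into the generators
        units.append(u_j)
        y = u_j @ y                       # carry the evolution forward
    return units
```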
As we discuss in our results section, this architecture provided among the highest-
fidelity performance which is unsurprising given that it effectively reproduces the
subRiemannian generative method in its entirety. One point to note is that, while
this architecture generated the best performance in terms of fidelity, in terms of the
actual learning protocol (i.e. the extent to which the network learns as measured by
declines in loss), it was less adaptive than other architectures. That is, while having
overall lower MSE, it was initialised with a lower MSE which declined less. This is
not unexpected given that, in some sense, the neural network architecture combined
with the whitebox subRiemannian generative procedure overdetermines the task
of learning the coefficients of a single generator Λ0 used as an initial condition.
The other point to note is that in [54], Λ0 ∈ su(2n ) i.e. it is drawn from the full
Lie algebra, not just ∆ (intuitively because it provides a random direction in the
tangent space to commence evolution from). From a control perspective, however,
if one only has access to ∆, one cannot necessarily synthesise Λ0 ; thus a second
iteration of experiments where Λ0 ∈ ∆ was undertaken. The applicability of the
SubRiemannian model as a means of solving the control problem is more directly
related to this second case rather than the first.

4.6.2.6 Geodesic architectures: GRU & Fully-connected (original) model

In order to benchmark the performance of the greybox models described above, we


recreated the original global and local machine learning models utilised in [54]. In
that paper, the global model utilised a simple shallow-depth GRU RNN taking UT
as inputs and outputting sequence estimates (Ûj ) (being trained on the true (Uj )).
In this global decomposition, each element of each Uj is in effect a trainable weight,
with the GRU RNN returning the full (Ûj ) instead of only the control amplitudes as

Figure 4.4: Schema of SubRiemannian model: (a) realised UT inputs (flattened) into a set of
feed-forward fully-connected dense layers (with dropout ∼ 0.2); (b) two layers (red) output sets
of control amplitudes for estimating the positive (ĉ^+_Λ0) and negative (ĉ^−_Λ0) control amplitudes us-
ing tanh activation functions; (c) these are fed into two custom Hamiltonian estimation layers to
produce the positive Λ̂^+_0 and negative Λ̂^−_0 Hamiltonians for Λ0 using ∆ or su(2n ), which are com-
bined into a single Hamiltonian estimate Λ̂0 ; (d) Λ̂0 is fed into a custom subRiemannian layer
which generates the control amplitudes (ĉj ) and Hamiltonians (Ĥj ) and then implements the time-
independent Schrödinger equation to produce estimated sequences of subunitaries (Ûj ) which are
fed into (e) a final fidelity layer for comparison with the true (Uj ). Intermediate outputs (a) to
(d) are accessible via submodels in TensorFlow. The SubRiemannian model resulted in average
gate fidelity, when learning representations of (Uj ), of over 0.99 on training and validation sets,
in comparison to existing GRU & FC Blackbox models which recorded average gate fidelities of
≈ 0.70, demonstrating the utility of greybox machine learning models in learning to synthesise
unitary sequences.

intermediate layers as in our GRU RNN Greybox model. The local model took Uj as
an input and output the coefficient control amplitude estimates (ĉj ) from which the
sequence (Ûj ) could be reconstructed using ∆. In [54], in order to reduce parameter
size of the model, the original global model was trained only on the real part of
(Uj ) on the basis that the imaginary part could be recovered via application of the
unitarity constraint (see [53] for details).
To learn the individual Uj segments of the approximate geodesic unitary path,
we adapted, while substantially modifying, the approach in [54]. In that paper, the
method of learning Uj segments was adopted via feeding the real part of a vectorised
(i.e. flattened) unitary Uj into a simple three layer feed-forward fully connected
neural network. The labels for the network were the true control pulse amplitudes
ckj .
In recreating these models it was found that using only the realised part of uni-
taries was insufficient for model performance overall. We therefore included both
real and imaginary parts, partly for model performance but also because it is unclear
whether training alone on realised parts of unitaries affects the way in which the
networks would integrate information about the imaginary parts. Furthermore, the
approach in [54] did not use measures, such as fidelity, of more utility to quantum in-
formation practitioners; thus our model extended the original models by recreating
the unitaries from the estimated controls (ĉj ).

4.7 Results

4.7.1 Overview

The motivation behind the architectures above is to develop protocols by which


time-optimal quantum circuits may be implemented via sequences of control pulses
applied to quantum computational systems. In this respect, the objective is for
the architectures to receive a target unitary UT as input and output a sequence of
control pulses (ĉj ) necessary to synthesise the estimate ÛT that optimises fidelity
F (ÛT , UT ). Our experimental method sought to enable comparison of blackbox and
greybox methods across the synthesis of unitary propagators (gates) in SU (2n ) for
n = 1, 2, 3 and higher order groups in order to achieve this objective. We also sought
to gain insight into hyperparameters of model architecture by shedding light on, for
example, the optimal number of segments, the number of training examples and the
form of training data.
Throughout our experiments, we observed that the selection of hyperparameters
for both the training data and the models had a significant impact on performance.
For example, as we discuss below, varying h = ∆t noticeably affected
performance in terms of training/validation batch fidelity
MSE and generalisation to test sets. For this reason, we extended our experiments
to include progressively increasing values of h from h = 1/nseg to around h ≈ 1.
Generalisation of models was tested via assessing the fidelities of (Ûj ) output by
the trained models and also independently reconstructing (Ûj ) from the estimates
of control coefficients (ĉj ). In this respect, our architecture benefited from the
customised layering in that intermediate outputs of the models, such as control
coefficients, sequences of estimated Hamiltonians (Ĥj ), the actual unitary sequences
(Uj ) and fidelities could all easily be extracted from the models using TensorFlow’s
standard Keras model functional API. As discussed in [95,158], one of the benefits of
this type of architecture is that it allows practitioners to ‘open’ the machine learning
box, as it were, to validate at intermediate steps that the whitebox outputs of the
model match expectations, which in turn is useful for model tuning and engineering.

4.7.2 Tables and charts


Experimental results are set out in the tables and figures below. In Table (4.1),
each of the four models was trained and evaluated against training data from SU (2),
SU (4) and SU (8). For the greybox models, batch fidelity MSE was chosen as the
relevant loss function. For the GRU & FC Blackbox model that replicated (subject
to the inclusion of imaginary parts of unitaries in training) the original machine
learning models in [54], standard MSE comparing realised unitary sequences (Uj )
and estimates (Ûj ) was used. Average gate fidelities for training and validation
data sets were also recorded, with the order of magnitude of the standard error provided in
parentheses. Bold entries indicate the lowest MSE and highest fidelity metrics for models
trained on SU (2), SU (4) and SU (8) training data respectively.

Comparison table: training and validation | Λ0 ∈ su(2n )


Model              |              SU(2)               |              SU(4)               |              SU(8)
                   | MSE(T)    MSE(V)    Fidelity     | MSE(T)    MSE(V)    Fidelity     | MSE(T)    MSE(V)    Fidelity
GRU & FC Blackbox* | 3.693e-05 3.559e-05 0.6936(e-01) | 4.144e-05 4.887e-05 0.7170(e-02) | 1.852e-04 4.447e-04 0.7231(e-02)
FC Greybox         | 1.827e-05 1.681e-05 0.9964(e-05) | 3.924e-05 4.156e-05 0.9940(e-05) | 2.607e-04 2.450e-04 0.9842(e-05)
SubRiemannian (XY) | 8.728e-09 3.211e-10 0.9999(e-05) | 1.521e-07 2.007e-07 0.9999(e-05) | 1.024e-05 1.137e-04 0.9998(e-05)
GRU RNN Greybox    | 1.414e-07 1.348e-07 0.9998(e-05) | 9.019e-08 1.204e-07 0.9998(e-05) | 3.557e-06 1.186e-05 0.9998(e-05)
Table 4.1: Comparison table of batch fidelity MSE ((Uj ) v. (Ûj )) for training (MSE(T)) and
validation (MSE(V)) sets, along with average operator fidelity (and order of standard deviation
in parentheses) for four neural networks where Λ0 ∈ su(2n ): (a) GRU & FC Blackbox (original),
(b) FC Greybox, (c) SubRiemannian model and (d) GRU RNN Greybox model. Parameters:
h = 0.1, nseg = 10, ntrain = 1000; training/validation 75/25; optimizer: Adam, α ≈ 1e-3. Note*:
MSE for the GRU & FC Blackbox is standard MSE comparing (Uj ) with (Ûj ). SubRiemannian and
GRU RNN Greybox models outperform blackbox models on training and validation sets with lower
MSE, higher average operator fidelity and lower variance.

Comparison table: training and validation | Λ0 ∈ ∆


Model              |              SU(2)               |              SU(4)               |              SU(8)
                   | MSE(T)    MSE(V)    Fidelity     | MSE(T)    MSE(V)    Fidelity     | MSE(T)    MSE(V)    Fidelity
GRU & FC Blackbox* | 1.053e-07 8.668e-08 0.7180(e-02) | 1.328e-04 1.739e-04 0.9621(e-04) | 4.283e-05 1.045e-04 0.7177(e-02)
SubRiemannian (XY) | 2.616e-09 9.263e-11 0.9999(e-05) | 5.224e-08 5.983e-09 0.9999(e-05) | 2.165e-07 6.874e-05 0.9979(e-05)
GRU RNN Greybox    | 7.290e-10 7.086e-10 0.9999(e-05) | 3.478e-09 5.505e-09 0.9999(e-05) | 2.817e-07 1.092e-06 0.9994(e-05)
Table 4.2: Comparison table of batch fidelity MSE ((Uj ) v. (Ûj )) for training (MSE(T)) and
validation (MSE(V)) sets, along with average operator fidelity (and order of standard deviation in
parentheses) for models where Λ0 ∈ ∆: (a) GRU & FC Blackbox (original), (b) SubRiemannian
model and (c) GRU RNN Greybox model. Parameters: h = 0.1, nseg = 10, ntrain = 1000; train-
ing/validation 75/25; optimizer: Adam, α ≈ 1e-3. Note*: MSE for the GRU & FC Blackbox is
standard MSE comparing (Uj ) with (Ûj ). For this case, overall the GRU RNN Greybox model
performed slightly better than the SubRiemannian model, with both outperforming the GRU & FC
Blackbox model. The FC Greybox model was not tested given its inferior performance overall.

Figure 4.5: Training and validation loss (MSE) for SU(2).
Figure 4.6: Training and validation loss (MSE) for SU(4).

4.8 Discussion

4.8.1 Geodesic approximation performance


As can be seen from Table (4.1), the in-sample (training/validation) performance of
the models varied considerably between blackbox and greybox approaches. From a
use-case and training data perspective, as can be seen from Table (4.1), while the
SubRiemannian and GRU RNN Greybox models outperformed the existing bench-
mark in [54] in terms of in-sample batch fidelity MSE loss, we see a decline in the
quality of estimates of Uj ∈ SU (2n ) for higher n. MSE overall increases with
dimension n, which is not unexpected.

4.8.2 Greybox improvements


As can be seen from Table 4.1, the greybox models in general significantly outper-
formed (with fidelities around the 0.99 mark) the generic blackbox models (with
fidelities in the order of 0.70) for in-sample training and validation experiments for
all greybox models and all training data sets (Λ0 ∈ su(2n ) and Λ0 ∈ ∆). By com-
parison with existing approaches in [54] and blackbox models that seek to directly
synthesise control sequences (cj ) or unitary sequences (Uj ), the SubRiemannian and
GRU RNN Greybox models outperformed the FC Greybox model in batch fidelity
MSE by several orders of magnitude. This is most apparent in Figures
(4.5) and (4.6). In Figure (4.5), representing training of the models on SU (2) data,
the SubRiemannian model performs the best of the models, though it exhibits
overfitting at around the 80-epoch level. These improvements were also accompa-
nied by functional benefits such as the guarantees of unitarity of Uj .
Figures (4.5), (4.6) and (4.8) show training and validation loss for the three mod-

Figure 4.7: Training and validation loss (MSE). Comparison of MSE at different time intervals for
the SubRiemannian model. h = 0.1, 0.5 and 1. G = SU (2), ntrain = 1000, nseg = 10, epochs= 500,
Λ0 ∈ su(2n ): This plot shows the differences in MSE on training and validation sets as the time-step
h = ∆t varies from 0.1 to 1. As can be seen, larger h leads to deterioration in performance (higher
MSE). However, smaller h can lead to insufficiently long geodesics, leading to a deterioration in
generalisation. Setting h = 0.1 (red curves) exhibits the best overall performance. Even a smaller
jump up to h = 0.5 (blue curves) exhibits an increase in MSE and decrease in performance by
several orders of magnitude (and similarly for h = 1).

els for the case of SU (2), SU (4) and SU (8) for 1000 training examples, 10 segments,
h = 0.1 and 500 epochs. All models exhibit a noticeable flatlining of the MSE loss
after a relatively short number of epochs, indicative of the models saturating
(reaching capacity for learning), a phenomenon accompanied by predictable overfit-
ting beyond such saturation points. For small h ≈ 0.1, the batch fidelity MSE is
already at very low levels of the order ∼ 10−5 . Again we see these persistently low
MSEs as indicative of a highly determined model in which the task of learning Λ0
(at least for smaller dimensional SU(2n )) is overdetermined from the standpoint of
large hidden layers (with 640 neuron units each), together with a prescriptive sub-
Riemannian method. From one perspective, these highly determined architectures
such as SubRiemannian model have less applicability beyond the particular use-case

Figure 4.8: Training and validation loss (MSE). Comparison of SubRiemannian, FC Greybox
and GRU RNN Greybox models. G = SU (8), ntrain = 1000, nseg = 10, h = 0.1, epochs= 500,
Λ0 ∈ su(2n ): For U ∈ SU (8), we see (main plot - first 100 epochs) that the GRU RNN Greybox
(blue line) performs best in terms of batch fidelity MSE on training and validation sets. As
shown in the inset, the GRU RNN Greybox levels out (saturates) after about 100 epochs and
overall performed the best of each of the models and rendered average operator fidelities of around
0.998. The SubRiemannian model (red) performed less well than the GRU RNN,
still recording high average operator fidelity but exhibiting overfitting, as can be seen
by the divergence of the validation (dashed) curve from the training (smooth) curve.
The FC Greybox rapidly saturates
for large ntrain and exhibits little in the way of learning. All models render high average operator
fidelity > 0.99 and saturate after around 150 epochs (see inset).

of learning the subRiemannian geodesic approximations specified by the method


in [54]. A comparison of FC Greybox, which is a more generalisable architecture
(not restricted to whitebox engineering of the subRiemannian algorithm) indicates
relatively high performance measures of low MSE and high fidelity.
While the SubRiemannian model performed best in the case of Λ0 ∈ su(2n ),
as is evident from Tables (4.1 and 4.2), the GRU RNN Greybox model performed
almost as well for SU(2) and moderately outperformed the SubRiemannian model
for SU (4) and SU (8) for most cases. The GRU RNN Greybox model was noticeably
faster to train than the FC Greybox model by several hours and was slightly quicker
to train than the SubRiemannian model but also flatlines (saturates) relatively early
as evident in Figure (4.9). This is of note considering the fact that the GRU RNN
Greybox model has more parameters than the SubRiemannian model (which os-
tensibly needs to only learn control amplitudes for Λ0 generation) and is consistent

Figure 4.9: Training and validation loss (MSE): GRU RNN Greybox. G = SU (2), ntrain =
1000, nseg = 100, h = 0.1, epochs= 500. This plot shows the MSE loss (for training and vali-
dation sets) for the GRU RNN Greybox model where the number of segments was increased from
10 to 100. As can be seen, the model saturates rapidly once segments are increased to 100 and
exhibits no significant learning. Similar results were found for the SubRiemannian model. This
result suggests that simply changing the number of segments is insufficient for model improvement.
One solution to this problem may be to introduce variable or adaptive hyperparameter tuning into
the model such that the number of segments varies dynamically.

with the demonstrable utility of GRU neural networks for certain quantum control
problems [95]. One possible reason for differences between GRU RNN Greybox and
SubRiemannian models may lie in the sensitivity of Λ0 : the SubRiemannian model’s
only variable degrees of freedom once initiated are in the relatively few weights ckj
learnt in order to synthesise Λ0 . As the dimension of SU (2n ) grows, then the co-
efficients of Λ0 become increasingly sensitive, that is, small variations in ckj have
considerable consequences for shaping the evolution in higher-dimension spaces, in
a sense, Λ0 bears the entire burden of training and so becomes hypersensitive and
requires ever fine-grained tuning. This is in contrast to the GRU, for example, where
the availability of more coefficients ckj means each individual coefficient ckj need not
be as sensitive (can vary more) in order to learn the appropriate sequence.

4.8.3 Segment and scale dependence

The experiments run across the various training sets indicated model dependence
on the number of segments and scale h. As can be seen from Figure (4.10), we
find that, not unexpectedly, the performance of models depends upon training data.
In particular, model performance measures such as MSE and fidelity clearly depend
upon time interval h = ∆t: where h is small, i.e. the closer the sequence of (Uj ) is to
approximating the geodesic section, the lower the MSE and higher the fidelity. The
effect on model performance is particularly evident in Figure (4.7) where increasing
h from 0.1 to 0.5 leads to a deterioration in loss of several orders in magnitude

(particularly for h > 0.3). As step size h increases, the resultant curve less closely
approximates a geodesic. Furthermore, for larger step sizes, the conditions
required for the assumption of time independence in unitary evolution (4.4.7) are
less valid.

Figure 4.10: Scale h dependence (SubRiemannian model). G = SU (2), ntrain = 1000, nseg = 10,
epochs= 500. Plot demonstrates increase in batch fidelity MSE as scale h (∆t) increases from 0.1
to 1, indicative of dependence of learning performance on time-interval over which subunitaries Uj
are evolved.

4.8.4 Generalisation
In order to test the generalisation of each model (see Appendix D for discussion),
a number of tests were run. In the first case, a set of random target unitaries ŨT
from the relevant SU(2n ) group of interest were generated. These target ŨT were
then input into the SubRiemannian and GRU RNN Greybox models which output
the estimated approximate geodesic sequences (Ûj ) to propagate from the identity
to ŨT . An estimated endpoint ÛT for the approximate geodesic was
generated. This estimate was then compared against ŨT to obtain a generalised
gate fidelity F (ŨT , ÛT ) for each test target unitary. Second, because fidelities of test
unitary targets varied considerably, a second fidelity calculation was undertaken in
order to test the extent to which higher fidelities may be related to similarity to the
underlying training set of target unitaries {UT }train on which the models were
trained. The average gate fidelity F̄ (ŨT , {UT }train ) of ŨT with {UT }train was
calculated. Correlations between the two fidelities were then assessed.
In the third case, for SU (2) models trained on training data where Λ0 ∈ ∆,
random test unitaries were replaced by ŨT comprising random-angle θ ∈ [−2π, π]
z-rotations. The rationale was to test the extent to which a model based upon
restricted control subset training and architecture could replicate unitaries generated
only from ∆ with high fidelity for the single-qubit case of SU (2), where analytic
solutions to the time-optimal synthesis of subRiemannian geodesics are known
[190].
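
The fidelity calculations used in these generalisation tests can be sketched as follows (a hedged illustration; the trained-model outputs and data shapes are hypothetical stand-ins for the thesis models):

```python
# Sketch of the generalisation scoring: compose the estimated subunitaries
# into U_hat_T, score against the random target, and compare with the
# target's average fidelity to the training set.
import numpy as np
from functools import reduce

def gate_fidelity(u, v):
    d = u.shape[0]
    return abs(np.trace(u.conj().T @ v)) ** 2 / d ** 2

def generalisation_scores(u_tilde, u_hat_segments, training_targets):
    # Endpoint estimate: ordered product of the estimated subunitaries.
    u_hat_t = reduce(lambda acc, u: u @ acc, u_hat_segments)
    f_test = gate_fidelity(u_tilde, u_hat_t)
    # Average fidelity of the test target against the training targets.
    f_train = np.mean([gate_fidelity(u_tilde, u) for u in training_targets])
    return f_test, f_train

# Over many test targets, np.corrcoef(f_tests, f_trains)[0, 1] gives the
# Pearson correlation between the two fidelity measures.
```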

Figure 4.11: Generalisation (SubRiemannian model). G = SU (2), ntrain = 1000, nseg = 10,
epochs = 500, Λ0 ∈ su(2n ). Plot of generalised gate fidelity F (ÛT , ŨT ) of randomly generated
ŨT with the reconstructed estimate ÛT , versus F̄ (ŨT , {UT }train ), the average operator fidelity of
randomly generated ŨT with the training inputs {UT }train to the model. The upward trend indicates
an increase in operator fidelity as similarity (Pearson coefficient of 0.52 at 95% significance) of ŨT
to the training set {UT }train increases. Colour gradient indicates low fidelity (blue) to high fidelity (red).

Generalisation of both GRU RNN Greybox and SubRiemannian models trained


on SU (2) was of mixed success. Figure (4.11) plots F (ŨT , ÛT ) against F̄ (ŨT , {UT }train )
for the SubRiemannian model (comprising only X, Y generators) trained on ran-

Figure 4.12: Generalisation (SubRiemannian model). G = SU (2), ntrain = 1000, nseg = 10,
epochs = 500, Λ0 ∈ ∆. Plot of generalised gate fidelity F (ÛT , ŨT ) of random-angle θ ∈ [−2π, π]
z-rotation targets ŨT with the reconstructed estimate ÛT , versus F̄ (ŨT , {UT }train ), the average
operator fidelity of ŨT with the training inputs {UT }train to the model. Here there is no statistically
significant correlation between test fidelity and similarity to the training set {UT }train , though higher
test fidelities are evident for ŨT bearing both high and low similarity to the training set (less depen-
dence on similarity to the training set for high fidelities).

domly generated unitaries in SU (2) where Λ0 ∈ su(2n ) (colour gradient indicates


low fidelity (blue) to high (red)). As can be seen, generalised gate fidelity varies
considerably, with an average of 0.6474 and considerable uncertainty
(standard deviation of 0.2288). In this case, there is a discernible relation-
ship between fidelity and the extent to which training and test set unitaries (used in
generalisation) are similar. We see this via the upward trend of fidelities as similar-
ity of ŨT to the training set {UT }train increases. This is not uncommon in machine
learning contexts, where the more similar data used for generalising a model are to
training data, the more accurate the model.
The model was able to generate approximations to normal subRiemannian geodesics
for certain random unitaries with a fidelity of over 0.99 by comparison with the in-
tended target ŨT . A worked example illustrating the generation of specific control
amplitudes using the GRU RNN Greybox model for a specific z-rotation is set-out
in section 4.10.5. Identifying structural characteristics among those estimates with
higher fidelity remains an open problem.
By comparison, Figure (4.12) plots the relationship between generalised gate
fidelity and similarity to training set data for the SubRiemannian model trained on
data generated where Λ0 ∈ ∆. In this case, the test unitaries ŨT were rotations by

Figure 4.13: Generalisation (SubRiemannian model). G = SU (8), ntrain = 1000, nseg = 10,
epochs = 500, Λ0 ∈ su(2n ). Plot of generalised gate fidelity F (ÛT , ŨT ) versus F̄ (ŨT , {UT }train )
(average operator fidelity against the training set {UT }train ). Generalisation was significantly worse
for SU(8); however, correlation of generalised gate fidelity with similarity of ŨT to the training
set is evident.

a random angle θ about the z-axis. No particular relationship between ŨT and the
training set is apparent. Figure (4.14) plots the same generalised gate fidelities in
relation to θ. Once again there is no immediately discernible pattern between the
angle of the z-rotation and the fidelity of the estimate of ŨT . We do see that high
(above 0.99) fidelities are distributed across the range of θ and that there is some
hollowing out of fidelities between extremes of 0 and 1.
The out of sample performance of both the SubRiemannian and GRU RNN
Greybox models (in both cases limited to generators from ∆) for random unitaries
drawn from SU (4) and SU (8) was significantly worse than for SU (2). Average
generalised gate fidelities were below 0.5 for each of the models tested. This is
not unexpected given that the number of parameters the models must deploy in
order to learn the underlying geodesic structure increases considerably as
the Lie group dimension expands. A larger training set may have some benefits;
however we note the saturation of the models suggests that at least for the models
deployed in the experiments described above, expanding training sets is unlikely
to systematically improve the generalisation of the models. Devising strategies to
address both model saturation and ways in which expanded training data could be
leveraged to improve model performance remains a topic for further research.

Figure 4.14: Generalisation (SubRiemannian model). G = SU (2), ntrain = 1000, nseg = 10,
epochs = 500, Λ0 ∈ ∆. Plot of F (ÛT , ŨT ) against the angle θ of random-angle θ ∈ [−2π, π]
z-rotations. As evident from the red high fidelities across the range [−2π, π], the SubRiemannian
model trained on data where Λ0 ∈ ∆ and ∆ = {X, Y } in certain cases does generalise relatively well.

4.9 Future directions


This Chapter presents a comparative analysis of greybox models for learning and
generating candidate quantum circuits from time optimally generated training data.
The results from the experiments above present a clear case for the benefits of greybox
machine learning architectures for specific applications in quantum control. The in-
crease in performance over blackbox models, as evidenced by training and validation
average operator fidelities for synthesised quantum circuits of over 0.99 in each case,
demonstrates that machine learning based methods of quantum circuit synthesis can
benefit from customised architecture that engineers known or necessary information
into learning protocols. This is especially the case for quantum machine learning
architectures: to the extent that learning protocol resources need not be devoted to
rediscovering known information or relationships, such protocols can better leverage
the power of blackbox neural networks for those parts of problems which are unknown.
While the models outperformed current benchmarks on training and validation
sets, they faced considerable challenges generalising well. Future work that may im-
prove upon performance could include exploring hyperparameter learning, such as
dynamically learning optimal numbers of segments, time-scale h (including variable
time-scales) or other metrics (such as Finslerian metrics; see section 4.11.4 below) for
use within the subRiemannian variational algorithm itself. The cross-disciplinary

intersection of geometry, machine learning and quantum information processing pro-


vides a rich seam of emergent research directions for which the application of both
geometric quantum control and greybox machine learning architectures explored in
this Chapter are potentially useful. It is important to note that the methods devel-
oped in this Chapter, particularly the SubRiemannian model and GRU RNN Grey-
box were both specifically engineered for particular objectives. While the models
developed in this Chapter and experiments were tailored for the particular problem
of learning subRiemannian normal geodesics for quantum circuit synthesis, the over-
all architectural framework in which geometric knowledge is encoded into machine
learning protocols has potential for useful application in quantum and classical infor-
mation processing tasks. Future work building upon the greybox machine learning
results in this Chapter could include an exploration of ways to combine the exten-
sive utility of symmetric space formalism, methods of Cartan and other geometric
techniques with machine learning.

4.10 Algorithmic architectures

The section below sets-out pseudocode for the machine learning models utilised in
the experiments above.

4.10.1 Fully-connected Greybox model

Pseudocode for the Fully-connected Greybox model is set-out below. Note that
TensorFlow inputs required (Uj ) to be separated into real Re(Uj ) and imaginary
Im(Uj ) parts and then recombined for input into fidelity calculations. Note the cost
function C(F, 1) below is implicitly a function of ckj (the sequence of which is (cj )),
which are the variable weights in the model. Here γ is the learning rate for the
gradient update and θ the trainable weights of the model.

Algorithm 2 Fully-connected Greybox model

Inputs: UT , Re(Uj ), Im(Uj ), ∆, h
Labels: v = (1...1), dim v = |(Uj )|
FC Dense Network: UT → tanh(UT ; θ) = (ĉj )
Hamiltonian estimation: (ĉj ), ∆ → (Ĥj ) = (Σk ĉkj τk ) where τk ∈ ∆
Quantum Evolution: (Ĥj ) → (Ûj ) = (exp(−hĤj ))
Fidelity: Re(Uj ), Im(Uj ), (Ûj ) → F (Ûj , Uj )
MSE: min C(F, 1) = (1/n) Σj (1 − F (Ûj , Uj ))²
Update: θ → θ − γ∇θ C(F, 1)

4.10.2 GRU RNN Greybox model, parameters θ = (w, b)

Pseudocode for the GRU RNN Greybox model is set-out below. Note that Tensor-
Flow inputs required (Uj ) to be separated into real Re(Uj ) and imaginary Im(Uj )
parts and then recombined for input into fidelity calculations. Note the cost func-
tion C(F, 1) below is implicitly a function of ckj (the sequence of which is (cj )), which
are the variable weights in the model. Here γ is the learning rate for the gradient
update and θ the trainable weights of the model.

Algorithm 3 GRU RNN Greybox model

Inputs: UT , Re(Uj ), Im(Uj ), ∆, h
Labels: v = (1...1), dim v = |(Uj )|
GRU RNN: UT → tanh(UT ; θ) = (ĉj )
Hamiltonian estimation: (ĉj ), ∆ → (Ĥj ) = (Σk ĉkj τk ) where τk ∈ ∆
Quantum Evolution: (Ĥj ), h → (Ûj ) = (exp(−hĤj ))
Fidelity: Re(Uj ), Im(Uj ), (Ûj ) → F (Ûj , Uj )
MSE: min C(F, 1) = (1/n) Σj (1 − F (Ûj , Uj ))²
Update: θ → θ − γ∇θ C(F, 1)

4.10.3 SubRiemannian model

Pseudocode for the SubRiemannian model is set-out below. Note that TensorFlow
inputs required (Uj ) to be separated into real Re(Uj ) and imaginary Im(Uj ) parts
and then recombined for input into fidelity calculations. This model learns the
coefficients required to generate the initial condition Λ0 . It was found that the
model performed best when positive c^+_Λ0 and negative c^−_Λ0 control functions were
learnt separately and then combined to form the coefficients cΛ0 . Note the cost function
C(F, 1) below is implicitly a function of cΛ0 . Here γ is the learning rate for the
gradient update and θ the trainable weights of the model.

Algorithm 4 SubRiemannian model

Inputs: UT , Re(Uj ), Im(Uj ), A = ∆ or su(2n ), U0 = I, h, nseg
Labels: v = (1...1), dim v = |(Uj )|
FC Dense Network: UT → tanh(UT ; θ) = c^+_Λ0 , tanh(UT ; θ) = c^−_Λ0
Λ0 estimation:
    c^+_Λ0 , A → Λ^+_0 = Σk c^{k+} τk , τk ∈ A;
    c^−_Λ0 , A → Λ^−_0 = Σk c^{k−} τk , τk ∈ A
Λ0 layer: Λ^+_0 , Λ^−_0 → Λ0
subRiemannian layer: Λ0 → (Ûj ). Set Y = U0 .
For j in nseg :
    Λ0 → Y Λ0 Y† = X
    X → Ĥj = proj∆ (X), cj
    Ĥj → Ûj+1 = exp(−hĤj )
    Y = Ûj+1
return (Ûj )
Fidelity: (Ûj ), Re(Uj ), Im(Uj ) → F (Ûj , Uj )
MSE: min C(F, 1) = (1/n) Σj (1 − F (Ûj , Uj ))²
Update: ck → ck − γ∇ck C(F, 1)

4.10.4 Simulation Design


Simulation of training datasets for use in the machine learning models was under-
taken in Python. The simulation was adapted from Mathematica code accompa-
nying [54] with a number of adaptations. The code is constructed as a class with
the following hyperparameters: (i) n, selecting the Lie group of interest SU (2n );
(ii) nseg , the number of segments (indexed by j) in the global decomposition into
subunitaries (Uj ); (iii) ntrain , the number of training examples; (iv) a parameter for
whether Λ0 ∈ su(2n ) or ∆; (v) a set of parameters for selecting (for SU (2)) which
Pauli operators constituted ∆; (vi) a parameter for whether unitaries were to be
generated in accordance with the example formulation in [190]; and (vii) a parameter
for selecting h (which defaults to 1/nseg in the event of a null entry). Upon selection
of parameters, the class generates an extensive selection of training data in various
forms (see the relevant code repository for code with commentary), including complex
and realised iterations of UT , (Uj ), cj , ∆ and other key inputs into the models.
Training data was generated using a combination of standard Python numerical and
scientific packages together with the quantum simulation software QuTiP [139].
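
A simplified sketch of such a generator (function rather than class form, with illustrative names, implementing equations (4.6.4) and (4.6.5)) is:

```python
# Sketch of a training-data generator along the lines described above;
# names and structure are illustrative, not the repository code.
import numpy as np
from scipy.linalg import expm

def make_training_example(delta, n_seg, h, rng):
    """One training example: controls (c_j), segments (U_j) and target U_T."""
    dim = delta[0].shape[0]
    controls = rng.uniform(-1.0, 1.0, size=(n_seg, len(delta)))  # c ~ U[-1,1]
    u_t = np.eye(dim, dtype=complex)
    segments = []
    for c_j in controls:
        h_j = sum(c * tau for c, tau in zip(c_j, delta))  # equation (4.6.4)
        u_j = expm(-h * h_j)                              # equation (4.6.5)
        segments.append(u_j)
        u_t = u_j @ u_t                                   # accumulate target
    return controls, segments, u_t

# Example: Delta = {sigma_x, sigma_y}, with i absorbed into the generators.
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]], dtype=complex)
rng = np.random.default_rng(0)
c, segs, ut = make_training_example([1j * sx, 1j * sy],
                                    n_seg=10, h=0.1, rng=rng)
```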

4.10.5 Generalisation: worked example


We include below an example of the controls generated by the GRU RNN Greybox
model for SU (2) when estimating an arbitrary z-rotation by angle θ. A randomly

 j     cx,j         cy,j
 1    -4.651e-02    0.751e-02
 2    -5.668e-02    0.622e-02
 3    -5.777e-02   -1.504e-02
 4    -5.917e-02    0.947e-02
 5    -5.221e-02   -0.529e-02
 6    -5.663e-02    0.800e-02
 7    -6.137e-02    0.119e-02
 8    -4.975e-02   -0.173e-02
 9    -5.377e-02   -0.047e-02
10    -5.346e-02    0.079e-02
Table 4.3: Control amplitudes for generation of a z-rotation by angle θ = -7.529e-01. At each
time-step j, control cx,j is applied to the X generator and cy,j to the Y generator to form
Hamiltonian Hj , applied for time h = ∆t.

selected angle θ between ±2π was generated using QuTiP, in this case θ = -7.529e-01,
generating the target unitary:
$$U_T = \begin{pmatrix} 0.930 + 0.368i & 0 \\ 0 & 0.930 - 0.368i \end{pmatrix}$$

The generators in this case were σx and σy . To generate an approximate geodesic


sequence, we require the neural network to learn controls cx,j for X and cy,j for Y
for ten subsidiary Hamiltonians Hj (linear compositions of X, Y ):

$$H_j = c_{x,j} X + c_{y,j} Y$$

The specific controls applied at time step j are set-out in Table 4.3. The resulting
estimated unitary ÛT generated by applying each Hj for time ∆t is:
$$\hat{U}_T = \begin{pmatrix} 0.926 + 0.377i & -0.003 \\ 0.003 & 0.926 - 0.377i \end{pmatrix}$$

with the fidelity between the target and estimate F (ÛT , UT ) = 0.9999.
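
The reconstruction procedure can be sketched as below; the value of h and the generator normalisation are assumptions, so the snippet illustrates the composition of segment unitaries and the fidelity check rather than reproducing the quoted numbers exactly:

```python
# Sketch of the Table 4.3 reconstruction: compose the segment unitaries in
# order and compare with the target. h and generator scaling are assumed.
import numpy as np
from scipy.linalg import expm

sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]], dtype=complex)

def reconstruct(controls_xy, h):
    """U_hat_T = prod_j exp(-i h (c_{x,j} X + c_{y,j} Y)), applied in order."""
    u = np.eye(2, dtype=complex)
    for cx, cy in controls_xy:
        u = expm(-1j * h * (cx * sx + cy * sy)) @ u
    return u

def fidelity(u_hat, u):
    return abs(np.trace(u_hat.conj().T @ u)) ** 2 / u.shape[0] ** 2
```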

4.11 Differential geometry and Lie groups

4.11.1 Generating subspaces for geodesics

4.11.2 Product Operator Basis

Our experimental results and methods focus on synthesising quantum circuits for
multi-qubit systems where unitary operators are drawn from SU (2n ). For such
multi-qubit (qudit) systems, unitary operators U belong to Lie groups G = SU(2n )
which describe the evolution of n interacting spin-1/2 particles. These groups are
equipped with a corresponding Lie algebra of dimension $(2^n)^2 - 1 = 4^n - 1$,
denoted g = su(2n ) and represented via traceless $2^n \times 2^n$ skew-Hermitian
($A = -A^\dagger$) matrices (see Appendix A). Solving time-optimal problems in
such contexts often relies upon appropriate selection of a subset of generators from
the Lie algebra as the control subset from which to synthesise a quantum circuit
(see section C.5).
This is especially the case when selecting a control algebra that renders targets
UT reachable in a way that approximates geodesic curves (definition C.1.35) on
the relevant manifold M as the choice of one set of generators over another can
affect evolution time (and whether generated geodesics are indeed minimal, in cases
where multiple geodesics are available such as via great circles on a 2-sphere). Of
importance in selecting control subsets for time-optimal synthesis of geodesics in
multi-qubit systems [175,183,185,186,192] is the so-called product operator basis i.e.
a basis for the Lie algebra of generalised Pauli matrices, being tensor (Kronecker)
products (definition A.1.12) of elementary Pauli operators. The basis is formed by
Pauli spin matrices {Ix , Iy , Iz } = 1/2{σx , σy , σz } i.e. the sets of generators of rotation
in two-dimensional Hilbert space (and Lie algebra basis), with usual commutation
relations. A basis for su(2n ) comprises many-body tensor products of these
Pauli operators, i.e. for an n-qubit operator, between 1 and n Pauli operators
appear in tensor products with identities at the remaining indices. An orthogonal
basis {iB} (frame) for su(2n ) is then given [175] in closed form via:

$$B_s = 2^{q-1} \prod_{k=1}^{n} (I_{k\alpha})^{a_{ks}}$$

where α = x, y, z and s indexes each basis element of the frame. The index aks is 1
in q of the indices and 0 otherwise, and is a way of representing:

$$I_{k\alpha} = \mathbb{1} \otimes \cdots \otimes I_\alpha \otimes \cdots \otimes \mathbb{1}$$

where Iα appears only at the kth position with the identity appearing everywhere
else. The parameter q tells us how many Pauli operators are tensor-producted

together e.g. q = 1 means the basis element only has one Pauli and the rest identities;
q = 2 means we are dealing with a basis formed by tensor products of two Pauli
operators and identities etc.
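
A sketch of this construction in Python, using the normalised operators Iα = σα /2 introduced above, is:

```python
# Sketch constructing the product-operator basis elements B_s for n qubits,
# following the closed form above (the 2^{q-1} prefactor).
import itertools
import numpy as np

paulis = {
    "x": np.array([[0, 1], [1, 0]], dtype=complex) / 2,    # I_x = sigma_x / 2
    "y": np.array([[0, -1j], [1j, 0]], dtype=complex) / 2,
    "z": np.array([[1, 0], [0, -1]], dtype=complex) / 2,
    "1": np.eye(2, dtype=complex),
}

def basis_elements(n):
    """Yield (label, B_s) over all non-identity Pauli strings of length n."""
    for labels in itertools.product("xyz1", repeat=n):
        q = sum(1 for l in labels if l != "1")   # number of Pauli factors
        if q == 0:
            continue                              # skip the pure identity
        b = np.array([[1.0]], dtype=complex)
        for l in labels:
            b = np.kron(b, paulis[l])
        yield labels, 2 ** (q - 1) * b

# For su(2^n) this yields 4^n - 1 elements, e.g. 15 for n = 2.
assert sum(1 for _ in basis_elements(2)) == 15
```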

4.11.3 One- and two-body operators

Geometric control techniques for multi-qubit systems often focus on selecting one-
and two-body Pauli product operator frames (bases) for relevant control subsets
[53, 184] (see section C.5.5). If the control subset contains only one- and two-body
elements of the Lie algebra g, then curves generated in the corresponding Lie group
G are more likely (with a number of important caveats) to be approximations to
(and in the limit, as the number of gates n → ∞, representations of) geodesic
curves and thus time-optimal synthesis of target unitary propagators. The intuitive
reason for this is that, of the full Lie algebra g, one- and two-body generators are less
‘expensive’ as measured, for example, by a metric calculating energy (equation C.4.2).
This approach can be seen across a number of key results in the literature [175, 184,
185, 192] and forms the basis for the relevant distribution used in subRiemannian
variational methods in [53,54] which the protocols developed in this Chapter expand
upon. The preference for one- and two-body Pauli operator frames arises in different
contexts.
For example, it is demonstrated in [175] in the case where G = SU (4) and K =
SU (2) ⊗ SU (2) that by finding an appropriate Cartan decomposition G = KAK
(with associated Lie algebra decomposition g = p ⊕ k) (see sections B.5 and B.4)
and maximally abelian Cartan subalgebra:

$$\mathfrak{h} = i\,\mathrm{span}\{I_x S_x, I_y S_y, I_z S_z\} \subset \mathfrak{p}$$

(where Iα represent one-body terms and Sβ two-body terms), we can write exp(−ih) =
A in the KAK decomposition as the exponential of a linear combination of the gen-
erators in h, namely:

$$U_F = K_1 \exp(-i(\alpha_1 I_x S_x + \alpha_2 I_y S_y + \alpha_3 I_z S_z)) K_2$$

where K1 , K2 ∈ K = SU(2) ⊗ SU(2). In this case, any Hamiltonian from k can


be generated using the controls in p (essentially by showing they can generate the
two-body terms via action of the single-body operators I on S) and is time optimal.
Because synthesis depends on the evolution of the drift Hamiltonian according to
generators in k (as acted on via the adjoint action of K) and because this depends on
the coefficients of the generators αi , then the minimal time is given by the coefficients

of the generators in p used to steer Hd :

$$T = \min_{\alpha_i} \sum_{i=1}^{3} |\alpha_i|$$

hence the optimisation problem becomes relatively straight-forward. One rationale


for preferring one- and two-body generators [175] is that higher-order (i.e. more than
one-body) generators are shown to have coefficients which include a scalar coupling
strength J between the relevant spins such that each Hj has a coefficient 2π and the
two-body (Ii Si ) term has coefficient 2πJ. Thus a time-optimal problem becomes a
simpler optimisation problem of finding the minimal sum $\sum_i \alpha_i$ satisfying:

$$U_F = Q_1 \exp(-i 2\pi J(\alpha_1 I_x S_x + \alpha_2 I_y S_y + \alpha_3 I_z S_z)) Q_2$$

where Q1 , Q2 ∈ K. The proof essentially relies on the fact that because synthesis
of Q1 , Q2 takes negligible time, then synthesis time is determined by the time to
synthesise A in the KAK which is determined by the parameters αi , hence minimal
time amounts to minimising the sum of αi . Synthesis time is thus minimal to the
extent that the ‘fewest-body’ Pauli generators are utilised in the control subset.
Thus, to generate minimal (and thus time-optimal) paths in G to reach
arbitrary target unitaries UT , one should ideally choose the control subset with as
few many-body terms as necessary in order to render UT reachable in a control
sense. In this regard, we note recent work [146] regarding surprising constraints on
realisability and universality of unitary gate sets (in control language, reachability
of circuits) for unitary transformations on composite (e.g. multi-qubit) systems
generated by two-local unitaries. As noted in that work by Marvian, this may require
additional algorithmic tailoring and/or the use of ancilla qubits to circumvent such
restrictions. We have not in this work addressed such generic limitations, but they
are an important consideration in any practical application. It is an open research
question as to whether (and to what extent) machine learning techniques may also
provide a means to bridge such gaps in universality arising from the tension between
two-local unitaries and symmetry properties of composite systems.

4.11.4 Nielsen’s approach

Nielsen et al. [181] also focus on adopting one- and two-body terms in their metric-
based approach to characterising and generating time-optimal quantum circuits. For
example, the preference for one- and two-body generators is justified by imposing
a Hamming weight term wt(σ) applied to the Pauli generators σ together with a
penalty function p(·) that penalises the control functional whenever Pauli terms of

high Hamming weight are part of the control Hamiltonian. The idea is that Pauli
n-tuples (tensor products) of anything more than one- or two-body Hamiltoni-
ans will be penalised via a higher Hamming weight, as they will have many more
non-identity elements, whereas one- and two-body operators have lower Hamming
weight. Nielsen et al. demonstrate that selection of one- and two-body generators
is optimal for calculating a lower bound on the complexity measure mG (U ) using
Finsler metrics, i.e.:

$$d_F(I, U) \le m_G(U) \qquad (4.11.1)$$

where G is a universal gate set in SU(2n ).

The significance of restricting control subsets together with bespoke metrics when
utilising geometric optimisation techniques is evident in later work [183]. For quan-
tum control optimisation architectures, this demonstrates the utility of Finsler met-
rics as a more general norm-based measure of distance (and thus optimality), together
with a justification of the selection of one- and two-body generators on the basis
of Hamming weights. The use of the ‘penalty metric’ approach (see discussion in
the context of linear models in section D.4.1 and regularisation generally in section
D.3.2) is explored in further work [185, 192] however, as noted in [53], such ap-
proaches can be convoluted without providing guarantees that optimal generators
will be selected.

In [183], Nielsen et al. expand certain elements of the initial program com-
bining techniques from differential geometry and variational methods to quantum
circuit synthesis and quantum control. This second paper considered the difficulty
of implementing a unitary operation U generated by a time dependent Hamiltonian
evolving to the desired UT . They show that the problem of finding minimal circuits
is equivalent to analogous problems in geometric control theory i.e. this Chapter
has more of a focus on quantum control utilising geometric means. They select a
cost function on H(t) such that finding optimal control functions for synthesis of UT
(evolving according to the Schrodinger equation) involves finding minimal geodesics
on a Riemannian manifold (M, g) (see definition C.2.4).

In this case, H(t) is written in terms of a Pauli operator expansion:


′ ′′
X X
H= hσ σ + hσ σ (4.11.2)
σ σ

where the first summation is over one- and two-body terms, the second over all other
tensor products. A cost function is constructed with a penalty term p2 imposed that
180 CHAPTER 4. QUANTUM GEOMETRIC MACHINE LEARNING

penalises the higher-order terms:


v
u ′ ′′
uX X
F (H) = t hσ σ + p 2 hσ σ (4.11.3)
σ σ

with the total cost to be minimised given by:


Z T
d(U ) ≡ dtF (H(t)) (4.11.4)
0

Due to parametrisation invariance, F (a Finsler metric) can be rescaled such that


T = d(U ). The overall effect is to demonstrate that using O(n2 d(I, U )3 ) one- and
two-qubit gates, it is possible to synthesise a unitary UA satisfying ||UT − UA || ≤
c, where c is a constant and UT is the target unitary gate. Moreover, the work
demonstrates the optimality of unitary synthesis via following minimal geodesics in
the Lie group manifold generated by one- and two-body generators (as we focus on
below). Nielsen notes the number of one- and two-qubit terms (i.e. dim ∆) for the
relevant Lie algebra is given by

dim ∆ = 9n(n − 1)/2 + 3n (4.11.5)

a relatively trivial but important feature of the machine learning code in model
architectures explored above.

Later work [184] of Nielsen and Dowling provides a more directly applicable
example of how to develop analytic solutions to geodesic synthesis of unitary op-
erations. As with the discussion above, it is worth exploring the key results from
this Chapter in order to understand characteristics of relevance to any attempt
to utilise geometric methods for synthesis of unitary propagators for multi-qubit
systems. In the paper, they develop a method of deforming (homotopically) sim-
ple and well-understood geodesics to geodesics of metrics of interest. Intuitively,
the idea is to start with a known geodesic curve between I and UT and, sub-
ject to certain constraints, ‘bend’ it homotopically (that is, via mappings which
preserve topological properties) into a minimal-length curve. However, as demon-
strated in [184], a similar preference for one- and two-body terms is manifest in the
applicable lifted Hamilton-Jacobi equation (this work is also important for anyone
interested in geometric quantum control given its discussion of significant (and po-
tentially intractable) complexity constraints presented by the quantum extension
of the Rabarov-Rudich theorem and also extend geometric quantum computing to
include ancilla qubits.

In subsequent work utilising Nielsen et al.’s approach [192], the application of


4.11. DIFFERENTIAL GEOMETRY AND LIE GROUPS 181

penalty metrics is extended in order to show its utility in synthesising time-optimal


geodesics. In that paper, it is shown that a bound on the norm of the Hamiltonian
H(t) is equivalent to a bound on the speed of evolution, that is, such a bound implies
that minimal-time paths are minimal distance in which the norm function is used
as the distance measure. Given ||H(t)|| = E, they demonstrate that for any curve
connecting I and UT , the length of time-optimal curves is given by:
Z T Z T
L= ||H(t)||dt = Edt = ET (4.11.6)
0 0

where minimising evolution time T thereby minimises distance L. Hamiltonians


of interest are confined to a control subset A that disjunctively partitions the Lie
algebra M (i.e. equivalent to generators being drawn from p or k above) and cases
where ||H(t)|| ≤ E (where the Hamiltonian can rescaled i.e. reparametrised so
that the norm equals E at all points on the path which in effect keeps the path
identical but time shorter). They introduce a slight modification, that Tr(H 2 (t)) =
E 2 in order to introduce the quantum brachistochrone problem (see also [204]), a
quantum analogue of the brachistochrone (meaning ‘shortest time’) problem from
classical variational mechanics [44]. Their method in essence adopts the penalty-
metric approach of [184] such that in the limit, the lowest-energy solution tends
towards minimal time by reason of the increased cost associated with higher-order
(more than one- and two-body) generalised Pauli generators.

The approach in [192] is precisely to use the penalty metric approach of Nielsen et
al. to generate a subRiemannian geodesic equation in order to confine the generators
of the curve on the manifold to the control subset A. This is achieved by adopting
the norm-based cost function (pseudometric) where higher-order generator terms
are weighted with penalty q, so that minimisation will by extension favour those
generators (i.e. favour generators in p not k). By doing so, a sufficiently proximal
initial seed for the “shooting method” (see [192, 205]) is generated. This method
is a generic numerical technique for solving differential equations with two-point
boundary problems (where our two points are I and UT on G) and thus generating
approximate geodesics.

One of the other connections of Nielsen et al.’s work on geometric complexity to


Cartan decompositions lies in the intuitive understanding that, in a control setting,
sufficiently large penalty metrics in many ways correspond to a pseudo-partitioning
of bundles/spaces into those subspaces which have low energy cost to reach and
those subspaces which, although in principle reachable in a control setting (defini-
tion C.5.4), cannot be obtained feasibly. Thus discovering ways in which state spaces
that are effectively partitioned or which are equipped with an effective decomposition
182 CHAPTER 4. QUANTUM GEOMETRIC MACHINE LEARNING

arising from penalty metrics is of use.


Another motivation of restricting control subsets as in [53, 54] and our experi-
ments above) to one- and two-body terms is to be found via the geodesic approx-
imations via the decomposition of the Lie algebra into projective subspace opera-
tors [184, 196, 197]. In [192], this was achieved via setting P(H) = HP for one- and
two-body Pauli terms and Q(H) = HQ for three- or more-body Pauli terms using
the Jordan-Hahn decomposition (see equation (A.1.7)) such that

su(2n ) = P + Q H = HP + HQ (4.11.7)

The idea is that higher-order (three- or more-body) terms in {HQ } carry a penalty
parameter (weight) which is designed, when curve length is obtained via minimising
the action, to penalise higher-order terms in a way that the functional (solution)
to the variational problem is more likely to contain only one- and two-body terms.
Thus instead of restricting the sub-algebra of controls p to only one- and two-body
terms (such as is undertaken in Swaddle), they instead (as per Nielsen’s original
paper) begin with full access to the entire su(2n ) Lie algebra (i.e. fully controllable)
and then proceed to impose constraints in order to refine this down to geodesics
comprising only (or mostly) one- and two-body terms. The distinction with the
subRiemannian approach adopted in [54] is that in the latter case, generators for U
are by design constrained to be drawn from P = ∆ via the projection function in
equation (4.5.13), circumventing imposition of Finslerian penalty metrics.

4.12 Comparing geodesic approximations


Generation of geodesics in a QML context relies upon the availability of ways to
compare whether outputs of machine learning models do in fact closely approximate
geodesic curves. Thus the availability of reliable analytic and numerical methods
for the generation of geodesics for use as training, testing and validation datasets
is important. In our work, we sought to adapt the novel algorithmic approach to
geodesic synthesis from [54] to include performance metrics of relevance to quantum
information processing, such as fidelity measures. By comparison, [190] sets out an
algorithm for determining time-optimal sub-Riemannian geodesics in SU (2) which
can be used to benchmark the performance of different machine learning approaches
to synthesising approximate geodesics. While the derivation of time optimal pa-
rameters in [190] relies upon complicated sequence of coordinate transformations
which is not easily scalable, it does provide a useful basis for comparison with the
methods in [54]. In [190], it is shown that time-optimal paths, where target unitaries
constitute rotations by angle θ about the z-axis with generators being Pauli X and
4.12. COMPARING GEODESIC APPROXIMATIONS 183

Y operators, can be synthesised in time-optimal fashion by following ‘circular’ or


holonomic paths along which they are parallel-transported. On the Bloch sphere,
this is represented as ‘circular’ paths emanating from the north pole whose diameter
increases with increasing θ ∈ [0, 2π]. Intuitively, the greater the angle of rotation,
the greater the diameter of the holonomic path (see section C.1.32).

To this end, one of the objectives of our experiments was to ascertain the relia-
bility of a few different methods of generating geodesics using methods drawn from
geometric control sources. In order to do so, we compared this variational geodesic
generation [46] approach to a known method for analytically determining subRie-
mannian geodesics in SU (2) in [190]. By doing so, we can be confident that the
variational method appropriately approximates geodesics. The challenge posed in
comparing geodesic methods lies in the differing assumptions of each method: [54]
constrains the norms ||proj∆ (u0 )|| = ||u0 || as a means of more efficiently generat-
ing subRiemannian geodesic approximations [53], which is in effect the time scale
(or energy scale) of their method. Conversely, [190], works at different scales. In
practice this means the generators for unitary evolution via each method differ by a
scaling related to the norm of the generators. Such different parameterisations can
be understood as follows:

Swaddle parametrisation Boozer parametrisation


(S) (B)
||Hj || = Ωj ||Hj || = 1
(S) (B)
dtj = h = 1/N dtj = Ωj h/1
j j
(S)
X (B)
X
tj = h = jh tj = Ωj h/1
k=1 k=1

For some desired tolerance (difference) ϵ, the two approximations at are identical if
the cumulative norms D(H (S) , H (B) ) of the sum of their jth Hamiltonians satisfy:

X Hj(S) (B)
(S) (B)
D(H ,H )= − Hj < ϵ. (4.12.1)
j
Ω j

That is, we want to minimise the distance between each Hamiltonian segment. The
result in [190] is a relatively simple control problem where the control subset consists
of Pauli σx , σy generators with the target a rotation about the z-axis by angle η, UT =
exp(−iησz /2). To validate that variational subRiemannian method can reproduce
the time-optimal paths from [190], a transformation between the two that enables
comparison of Hamiltonians at time tj respectively in each formulation must be
found. Pseudocode for such a transformation (in effect, a rescaling) of Hamiltonians
generated using the method in [54] by comparison with those using the method
184 CHAPTER 4. QUANTUM GEOMETRIC MACHINE LEARNING

(B) (S)
j D(H (S) , H (B) ) F (Uj , Uj )
1 0.0922 0.9934
2 0.1986 0.9935
3 0.3105 0.9935
4 0.4169 0.9936
5 0.5154 0.9936
6 0.6046 0.9936
7 0.6836 0.9937
8 0.7518 0.9937
9 0.7871 0.9938
10 0.8345 0.9939
Table 4.4: Hamiltonian distance and unitary fidelity between Swaddle and Boozer geodesic ap-
proximations.

in [190] is set-out below (where (S) indicates Hamiltonians using the method in [54]
and (B) the method in [190]).

Algorithm 5 Comparison of subRiemannian and analytic geodesic circuits in SU (2)


(S)
Generate Hj
(S)
Calculate ||Hj || = Ωj
(B)
Calculate tj = jk=1 Ωj h
P
(B) (B)
H0 = Ω1j H1
(B) σz (B) σz
(B) (B)
Calculate Hj = e−iωtj 2 H0 eiωtj 2

(B)
Here, conjugation by exp(−iωtj σ2z ) represents the Euler decomposition of the
evolution in [190] as if one had direct access to the generator σz . Alternatively, one
(B) (S)
can also compare unitaries at equivalent times via operator fidelity F (Uj , Uj )
where:

(B) (B) (B)


Uj = exp(−iHj dtj ) (4.12.2)

where we again use the assumption:


(B) (B) (B) (B)
(B)
UF ≈ e−iHN dtN
...e−iH1 dt1
. (4.12.3)

Numerical results comparing both Hamiltonian average distance (4.12.1) and fideli-
ties for ten Uj instances across N segments are set-out below.

(S) (B)
Fidelity results indicate little difference between Uj and Uj , while Hamiltonian
distance increases with j. Overall, the results provide some measure of confidence,
though not analytic certainty, that the variational subRiemannian means of geodesic
approximation in [54] are useful candidates for training data.
4.13. NEURAL NETWORK AND GRU ARCHITECTURES 185

4.13 Neural network and GRU architectures

In this section, we briefly summarise aspects of the neural network architectures


adopted in our experiments above. More background theory and detail can be
found in Appendix D.

4.13.1 Feed-forward neural networks

Feed-forward fully-connected neural networks (see section 4.13.1 for general discus-
sion), such as the ones deployed in the models above, can be understood in terms
of functional composition. The objective of deep feed-forward networks is to define
an input-output function z = f (a, w, b) where al are inputs to the layer l (setting
the initial input a0 = x), wl is a tensor of parameters for layer l to be learnt by the
model and bl is a bias tensor applied to al [14, 206]. In its simplest incarnation, the
feed-forward stack takes as input a flattened realised a0 = UT (where k runs over
the dimension of the vector). A layer of a simple neural network consists of units or
neurons activation functions σ (in our case, the ReLU or tanh activation function)
applied to the z such that we have al = σ(z l ), vector and bias b:

al = σ(z l ) = σ(wl al−1 + bl ) (4.13.1)

where we notice that the output of the previous layer is the input vector into the
subsequent layer. All final layers in the feed-forward networks used σ = tanh activa-
tion functions. The output of an entire layer al is a sequence structured as a vector
that then becomes the input to the next layer. Information in this compositional
model flows ‘forward’ (hence ‘feed-forward’).

When the entire set of units of a preceding layer becomes an input into each unit
of the subsequent layer, we say the layer is dense. The weights are updated using
backpropagation and gradient descent with respect to the applicable cost functional
(description from [206] below, here ⊙ is the Hadamard (element-wise) product), x
refers to each training example (batch gradient descent example below).
186 CHAPTER 4. QUANTUM GEOMETRIC MACHINE LEARNING

Algorithm 6 Stochastic gradient descent and backpropagation (batch) [206]


Input: Set x = a0
Feed-forward: For m layers, for l = 2, ..., m calculate:
z x,l = wl ax,l−1 + bl
ax,l = σ(z x,l )
σ = tanh for l = m
Output layer (L = m) error δ x,l :
δ x,l = ((wl+1 )T δ x,l+1 ) ⊙ σ ′ (z x,l )
∂ax,L
σ ′ = ∂zx,L
k

k
k runs over neurons in layer L
Backpropagation: for layers l = L − 1, L − 2, ..., 2, calculate:
δ x,L = ∇a Cx ⊙ σ ′ (z x,L )
Gradient: cost function gradient given by:

∂c
x,l
(∂wjk
= ax,l−1
k δjx,l and ∂C
∂bx,l
= δjl
j
Update weights: for each layer l = L, L − 1, ..., 2 update:
η P
wl → wl − m x,l x,l−1 T
x δ (a )
l l η P x,l
b → b − m xδ

4.13.2 LSTMs and GRUs

Long-Short Term Memory networks and Gated Recurrent Units are a prevalent
form of recurrent neural network (RNN). RNNs are networks tailored to modelling
sequential data, such as time-series data, or data such as sequences of control am-
plitudes (cj ) [14]. For RNNs, for each time-step t, there is an input xt (such as ct ),
an output yt and hidden-layer output ht . The key intuitive idea behind RNNs is
that ht of the network itself becomes an input into hidden layers for the immediately
next time-step t + 1. LSTMs advance upon this concept by enabling the output of
hidden layers to influence not just the immediately succeeding time-step t + 1, but
also potentially activation functions at later time steps. In this sense LSTMs allow
information about previous hidden layers (or states) to function as ‘memory’ that
is carried forward.
One of the challenges regarding RNNs is the saturation of networks where new
inputs to an activation function fail to contribute significantly to its output. In-
tuitively too much information is saturating the model, so additional information
does not lead to material updates (manifest, for example in flatlining loss, as seen
in some examples above). A way to overcome this problem of saturation includes
to stochastically ‘forget’ certain information in order to make room for additional
information, as manifest in the forget gate of an LSTM, distinct from the update
gate. GRUs [207] by contrast seek to incorporate the output of hidden layers and
updates into subsequent hidden layers as detailed below. Their popularity is often
4.13. NEURAL NETWORK AND GRU ARCHITECTURES 187

owing to their improved speedup over LSTMs for a variety of contexts.


The reset gate combines the input xt at time t with the previous time-step hidden
state ht−1 to define a reset output rt [208]:

rt = σ(wr xt + ur ht−1 + br )

where wr , ur are updatable weight matrices and br is an applicable bias, with σ an


activation function (in our models, the tanh function to produce control amplitudes
(cj ) but usually the sigmoid function). The update gate remains:

zt = σ(wz xt + uz ht−1 + bz )

This update gate is the output of the unit at time t. However, in order to output
ht , an intermediate hidden layer state is calculated:

h̃t = tanh(wh xt + uh (rt ⊙ ht−1 ) + bh )

where we see the (rt ⊙ ht−1 ) term incorporates the influence of the reset gate and
previous hidden layer ht−1 into the estimate. The final hidden layer output is then
calculated by combining the Hadamard products of the update gate and previous
hidden state together with the intermediate hidden state:

ht = zt ⊙ ht−1 + (1 − zt ) ⊙ h̃t

which is the ultimate output at time t. The incorporation of ht−1 in this way allows
influence of prior information in the sequence to influence future outputs, improving
the correlation between outputs such as controls.
188 CHAPTER 4. QUANTUM GEOMETRIC MACHINE LEARNING
Chapter 5

Global Cartan Decompositions for


KP problems

5.1 Abstract
Geometric methods have useful application for solving problems in a range of quan-
tum information disciplines, including the synthesis of time-optimal unitaries in
quantum control. In particular, the use of Cartan decompositions to solve prob-
lems in optimal control, especially lambda systems, has given rise to a range of
techniques for solving the so-called KP problem, where target unitaries belong to
a semi-simple Lie group manifold G whose Lie algebra admits a g = k ⊕ p decom-
position and time-optimal solutions are represented by subRiemannian geodesics
synthesised via a distribution of generators in p. In this Chapter, we propose a
new method utilising global Cartan decompositions G = KAK of symmetric spaces
G/K for generating time-optimal unitaries for targets −iX ∈ [p, p] ⊂ k with con-
trols −iH(t) ∈ p. Target unitaries are parametrised as U = kac where k, c ∈ K and
a = eiΘ with Θ ∈ a. We show that the assumption of dΘ = 0 equates to the corre-
sponding time-optimal unitary control problem being able to be solved analytically
using variational techniques. We identify how such control problems correspond
to the holonomies of a compact globally Riemannian symmetric space, where local
translations are generated by p and local rotations are generated by [p, p].

5.2 Introduction
Symmetry-based decompositions [23,55,209–213] are a common technique for reduc-
ing problem complexity and solving constrained optimisation problems in quantum
control and unitary synthesis. Among various decompositional methods, Cartan
KAK decompositions (section B.5) represent a generalised procedure for decom-

189
190CHAPTER 5. GLOBAL CARTAN DECOMPOSITIONS FOR KP PROBLEMS

posing certain semi-simple Lie groups [2] exhibiting involutive automorphic symme-
try, akin to generalised Euler or singular-value decompositions. Cartan decompo-
sitions have found specific application across a range of domains, such as synthe-
sising time-optimal Hamiltonians for spin-qubit systems in nuclear magnetic reso-
nance [22, 152, 175], linear optics [214], general qubit subspace decomposition [189],
indirectly relating to entanglement dynamics [215] and the entangling power of
unitary dynamics in multi-qubit systems [216]. Other approaches in information
theory [217] use Cartan decompositions for quantum Shannon decompositions and
quantum circuit programming [218]. More recently, their use has been proposed for
efficient measurement schemes for quantum observables [219], fixed-depth Hamilto-
nian simulation [220], reducing numerical error in many-body simulations [221] and
time-reversal operators [222] and also measurement of quantum observables [219].
Cartan decompositions have been of interest in quantum unitary synthesis due to
the fact that multi-qubit systems carrying representations of SU (2n ) can often be
classified as type AI and AIII symmetric spaces [15, 22, 175, 218, 223, 224]. Specific
interest symmetric space formalism in quantum computing has largely been due
to their use in synthesising time-optimal or more efficient or controllable quantum
circuits [175, 181, 187, 189, 214, 216, 221, 222, 225, 226].

Symmetry-based decompositions (see section B.4 and D.8), such as Cartan de-
compositions, have two-fold application in quantum control problems: firstly, symmetry-
decompositions can simplify unitary synthesis via reducing the computational com-
plexity [227]; secondly, they have specific application in quantum control settings
where the control algebra (and thus set of Hamiltonians) available are only a subal-
gebra of the corresponding full Lie algebra g [53, 175, 228]. This Chapter focuses on
the second use case. For unitary targets belonging to semi-simple connected groups
UT ∈ G amenable to Cartan decomposition as G = KAK (where A < G/K and
K < G), a key challenge is identifying the Hamiltonian which will synthesise the uni-
tary optimally. While broadly utilised, Cartan-based unitary synthesis techniques
have suffered from practical limitations due to exponential circuit depth [181, 193]
together with difficulty in identifying the appropriate form of Cartan decomposi-
tion [227] and form of Hamiltonian. In this Chapter, we address such challenges by
providing a generalised procedure for time-optimal unitary and Hamiltonian syn-
thesis using a global Cartan decomposition. Specifically, we demonstrate that for
parametrised unitaries with targets in G = KAK, by utilising what we denote the
constant-θ method, Hamiltonians composed of generators from the horizontal (anti-
symmetric) bracket-generating distribution of the underlying Lie algebra associated
with G can, in certain cases, represent time-optimal means of synthesising such
unitaries.
5.3. SYMMETRIC SPACES AND KP TIME-OPTIMAL CONTROL 191

5.3 Symmetric spaces and KP time-optimal con-


trol

5.3.1 Overview - setting the scene

Under the postulates of quantum mechanics (axiom A.1.3), unitary motion is defined
by the Schrödinger equation (definition A.1.18):

dU
= −iHU (5.3.1)
dt

whose solution is often formally written as the time-ordered exponential (equation


(A.1.22)):
RT
U (T ) = T+ e−i 0 H(t)dt
U0 . (5.3.2)

Usually, these kind of equations are considered as matrix or operator equations in


a given representation. However, such equations can be considered more funda-
mentally differential geometric in nature. This is because time-ordered exponentials
can be expanded in commutators which themselves have a structure that is often
universal. Universal in this context refers to the fact that a matrix representation
can be one of an infinite number of otherwise inequivalent representations. Perhaps
the most useful and well-known example is angular momentum, with commutators
that define the group of virtual rotations. To emphasize angular momenta and rota-
tions as universal is to appreciate that they are concepts which precede or transcend
any particular matrix representations, and this universality serves as a platform for
understanding details such as energy quantization.
The differential geometry of universal structures was a famous passion of Poincaré,
who originated the ideas of “universal covering group” and “universal enveloping al-
gebra”, and Klein, for whom the “universal properties” of category theory are an
homage (see [7, 8] and [3] for historical background). In quantum physics however,
although it is certainly useful to have an intuitive understanding that rotation is
universal, there is usually no need to work directly with things like universal covers.
Instead, the standard is to assume a Hilbert space (definition A.1.9) and immediately
consider angular momenta as acting on states to logically derive the spin quantum
numbers and consequent matrix representations. Indeed, this quantum standard to
act on states is why equation (5.3.1) is often called “Schrödinger.” Followed by the
basic fact that the matrix group SU (2) is the fundamental representation of the
universal cover of 3-dimensional rotations, one can then use this matrix group (and
perhaps a little good faith) to come to understand the universal. However, if more
192CHAPTER 5. GLOBAL CARTAN DECOMPOSITIONS FOR KP PROBLEMS

general transformation groups are considered, calculations with matrix representa-


tions can make the universal geometry they are representing far less apparent. This
article demonstrates that the so-called KP-problems, can be solved completely with
the help of some universal techniques. The KP problems are a family of unitary
control problems (section A.2) where the Hamiltonian control space p is a Lie triple
system, defined by the property [p, [p, p]] ⊆ p, and the target Hamiltonian is in
[p, p] ⊆ k. We expand on this below and in section C.5.6.

5.3.2 KP problems for Lambda systems

Geometric control theory (section C.5) is characterised by the study of the role of
symmetries in optimal control problems. The KP problem has been most exten-
sively detailed by Jurdjevic et al. [23] in the context of geometric control theory.
A common application of techniques of geometric control involves so-called lambda
systems, three-level quantum systems where only two of the levels are encoded with
information of interest. In such systems, the two lowest energy states of a quantum
system are coupled to the highest third energy state via electromagnetic fields [63].
The typical model for the study of lambda systems in quantum control settings is
the Schrödinger equation in the form:

dU (t) X
= ÂU (t) + Bj U (t)ûj (t) U (0) = I (5.3.3)
dt j

where U (t) ∈ SU (3) and  is a diagonal matrix comprising the energy eigenval-
ues of the system. The control problem becomes finding control functions û(t) to
synthesise UT = U (T ) in minimum time subject to the constraint that ||û|| < C
for constant bound C. The unitary targets are UT ∈ G, a semi-simple Lie group
admitting a Cartan decomposition into compact and non-compact subgroups K and
P respectively (hence the nomenclature “KP ”). As discussed in [56] and elsewhere
in the literature [181, 184], time-optimisation can be shown to be equivalent to syn-
thesising subRiemannian geodesics (section C.4.1) on a subRiemannian manifold
implied by the homogeneous space G/K. In this picture, the minimum synthesis
time equates to the arc length (equation (C.4.1) (measured according to the appli-
cable subRiemannian metric) of the subRiemannian curve γ calculated by reference
to the pulses u(t) (see equation (5) in [56]). Intuitively, in this picture, the control
Hamiltonian in p traces out a minimal-time along the manifold. For certain classes
of KP problem, Jurdjevic et al. [17,23,60,213] have shown that geodesic solutions to
equation (5.3.3) take a certain form (see example below). In [213] this is expressed
5.3. SYMMETRIC SPACES AND KP TIME-OPTIMAL CONTROL 193

as:


= γ eKt P e−At

(5.3.4)
dt
γ(t) = e(−A+P )t e−At (5.3.5)

with A ∈ k and assuming γ(0) = I for a symmetric matrix P (that is in p). Discussion
of the case in which e(−A+P )t is a scalar is set out in [229]. Detailed exposition by
Jurdjevic is set out in [23,60,213] and elsewhere. In this Chapter, we show that such
results can be obtained by a global Cartan decomposition in tandem with certain
constraints on chosen initial conditions U (0) = U0 and the choice of generators Φ ∈ k
used to approximate the target unitary in K.

5.3.3 KAK decompositions


In many quantum control and quantum algorithm settings, the target unitary UT is
an element of a semi-simple Lie group G such that UT ∈ G. In certain cases, G may
be decomposed into a homogeneous space G/K where K represents an isometry
(stabilizer) subgroup K < G. Such semi-simple Lie groups can be decomposed as
G = KAK where a ⊂ p and A = exp(a), a decomposition known as a Cartan
decomposition [2, 15, 17, 20, 152, 183, 186]. We discuss this decomposition at length
in Appendix B, particularly in sections B.5 and B.4. The decomposition G = KAK
is considered a global Cartan decomposition (applying to the entire group) rendering
G/K a globally Riemannian symmetric space (see section C.3). The corresponding
Lie algebra g may similarly be decomposed as g = k ⊕ p where p = k⊥ . The existence
of a Cartan decomposition is equivalent to the satisfaction of the following canonical
Cartan commutation relations (definition B.5.2):

[k, k] ⊆ k [p, p] ⊆ k [p, k] ⊆ p. (5.3.6)

Given a choice of maximally non-compact Cartan subalgebra a ⊂ p, G can be


decomposed as G = KAK, where A = ea . Doing so allows unitaries in G to be
written as:

U = keiΘ c (5.3.7)

where k, c ∈ K and eiΘ ∈ A (where θ parametrises a generators in a, e.g. a ro-


tation angle). Satisfaction of Cartan commutation relations is also equivalent to
the existence of an involution (see definition B.5.1) χ2 = I which partitions G
(and g) into symmetric χ(k) = k and antisymmetric χ(p) = −p subalgebras. Here
K = ek , A ⊂ P = ep [2, 230, 231]. Elements of G can be written in terms of the
194CHAPTER 5. GLOBAL CARTAN DECOMPOSITIONS FOR KP PROBLEMS

relevant group of action of subgroups K and A, including, where relevant, unitary


elements of G. Arbitrary targets UT ∈ G remain reachable, however elements of
the vertical (symmetric) subalgebra must be indirectly synthesised via application
of the Lie derivative (bracket) i.e. [p, p] ⊆ k. Such decompositions are manifestly
coordinate-free when represented using differential forms. Satisfying such criteria
allows G/K to be equivalently characterised as a Riemannian (or subRiemannian)
symmetric space.

We propose that for certain classes of quantum control problems, namely where
the antisymmetric centralizer generators parameterised by angle θ remain constant,
analytic solutions for time-optimal circuit synthesis are available for non-exceptional
symmetric spaces. Such cases are explicitly where control subsets are limited to cases
where the Hamiltonian comprises generators from a horizontal distribution (bracket-
generating [2,230] and see definition C.4.4) p with p ̸= g (where the vertical subspace
(definition C.1.25) is not null). Only access to subspace p ⊂ g is directly available
for control purposes. If [p, p] ⊆ k holds, arbitrary generators in k may be indirectly
synthesised (via application of Lie brackets) which in turn makes the entirety of
g available and thus, in principle, arbitrary UT ∈ G (if [p, p] = k) reachable (in a
control sense, see definition C.5.4).

5.3.4 Sketch of constant-θ method

Here we sketch the constant-θ method. In subsequent sections, we provide worked


examples. A generalised form of the method is set out in section 5.6 below. A target
unitary UT ∈ G may be decomposed into a Cartan G = KAK coordinate system
can be written in the form:

-1 Θc
U = keiΘ c = qe−ic (5.3.8)

where c, q, k ∈ K and eiΘ ∈ A ⊂ P . Locally forward evolution in time can be


written differentially as:

dU U -1 = −iHdt (5.3.9)

where the left-hand side consists of parameters over the manifold of unitaries (or the
group G), while the right-hand side consists of parameters that are in a geometric
sense external to the manifold, in the vector field (definition C.1.7) associated with
5.3. SYMMETRIC SPACES AND KP TIME-OPTIMAL CONTROL 195

the manifold. Taking (5.3.8) as our unitary, then:


  
-1
dU U = k k dk + cos adΘ (dcc ) + i dΘ + sin adΘ (dcc ) k -1 .
-1 -1 -1
(5.3.10)

Geometrically, we can interpret the left K as a local frame (a choice of basis for
the tangent space Tp M at each point p of the group G - see definition C.1.4), the
right K as a global azimuth (a parameter that describes the position of a point on
the sphere and is the same for all points along the ‘longitude’ (or orbits of) K),
and the Θ as the polar geodesic connecting the two (akin to latitude). The latter
two sets of parameters are the coordinates of a symmetric space. Presented in this
way, the Schrödinger equation represents a Maurer-Cartan (differential) form [2] (see
definition C.1.20), encoding the infinitesimal structure of the Lie group and satisfies
the Maurer-Cartan equation (equation (C.1.39)). For our purposes, it allows us to
interpret the right-hand side in terms of the relevant principal bundle connection (see
sections C.1.6.1 and C.1.7). Intuitively, a connection provides a way to differentiate
sections of the bundle and to parallel transport (definition C.1.8) along curves in
the base manifold G. The minimal connection (see section 5.7 for exposition) here
is a geometric way of expressing evolution only by elements in p ∈ HM (i.e. the
horizontal subspace) where the quantum systems evolves according to generators in
the horizontal control subset of the Hamiltonian given by:
 
i dΘ + sin adΘ (dcc -1 ) . (5.3.11)

The evolution of the system can be framed geometrically as tracing out a path on the
manifold (a Riemannian symmetric space) according to generators and amplitudes
(and controls) in the Hamiltonian. This path is a sequence of quantum unitaries. In
quantum mechanics, we are interested in minimising the external time parameter.
For this, we typically minimise the action such that the total time (length) of a
circuit evolving is:
Z p
ΩT = (idU U -1 , idU U -1 ). (5.3.12)
γ

In common with geometric methods, the integral is usually parametrised by path


length s (so integrated between 0 and 1 - see the examples below) of the curve γ
traced out along the manifold. Here Ω = |H| and where (X, Y ) = Tr(adX adY ) is the
(representation independent) Killing form (see definition B.2.12) which, in certain
cases induces a scalar form (see below and [230, 232]). As is typical in differential
geometry, the minimisation of evolution time of the curve γ ∈ G (where γ represents
a unitary circuit parameterised by arc length (see equation (C.2.8)) occurs over a
196CHAPTER 5. GLOBAL CARTAN DECOMPOSITIONS FOR KP PROBLEMS

typical geometric path length s where s ∈ [0, 1] parametrises the path from beginning
to end and ds = Ωdt (see the Appendix for more discussion). To find the solution
for minimal time T , we consider paths such that dΘ = 0 (and thus dθ = 0) which we
denote the ‘constant-θ’ method. The constant-θ method implies that for the Cartan
algebra parameter θ, we have dθ = 0. We conjecture that upon local variation of the
total time with respect to the path, we obtain Euler-Lagrange equations which show
that dqq -1 must also be constant for locally time-optimal paths. The total time for
locally time-optimal constant-θ paths becomes:

ΩT = min | sin adΘ (Φ)| (5.3.13)


θ,ϕ

where:
Z
iΦ = dcc -1 . (5.3.14)

Determining the time-optimal geodesic path requires global variation over all geodesics,
typically a hard or intractable problem. As shown in the literature [181,184,185,233],
variational methods can be used as a means of calculating synthesis time. For
constant-θ paths, such calculations are significantly simplified as we demonstrate
below.

5.3.5 Holonomy targets


The local frame has two coordinate systems that are particular, the cardinal (basis)
frame “k” and the geodesic frame “q.” The cardinal frame describes the local,
compact part of the group, while the geodesic frame describes the global, non-
compact part of the group (see [9] for detailed discussion of such concepts). We
define a target in the holonomy group K to be one such that:

U (T )U (0) -1 = eiX ∈ K. (5.3.15)

Holonomy group (definition C.1.32) refers to the set of group transformations a


vector undergoes when it is parallel transported around a closed loop in a manifold.
For symmetric spaces, the choice of connection gives rise to an implicit subgroup
K < G which acts as a holonomy group whose action traces out equivalence classes
of orbit (and so in this sense k defines rotations) while p defines translations between
orbits (intuitively seen in the key commutation relation [p, p] ⊆ k). We define the
connection for the geodesic frame:
h i
q -1 dq = c -1 (1 − cos adΘ )(dcc -1 ) c (5.3.16)
5.3. SYMMETRIC SPACES AND KP TIME-OPTIMAL CONTROL 197

such that the Hamiltonian does not contain elements of k. The intuition for a qubit
is that rotations achieved via Jz are by construction parallel transporting vectors
via the action of [p, p] ⊆ k. Where Jz is not in the control subset, a way to parallel
transport such vectors, achieving a Jz rotation, but using other generators not in K
is needed. For constant-θ paths:

AdΦ (Θ) = Θ and therefore X = (1 − cos adΘ )Φ. (5.3.17)

Note the first of these conditions is explicated as a condition that the generators
comprising Φ ∈ k belong to the commutant (see the general method section 5.6 for
discussion). Holonomic targets may be, under certain assumptions, generated via
unitaries U ∈ G/K which in a Riemannian context are paths with zero geodesic
curvature, indicating parallel translation in a geometric sense. By contrast, where
the manifold is subRiemannian (i.e. where p < g) then subRiemannian geodesics
may exhibit non-zero geodesic curvature by comparison with G and g as a whole.

5.3.6 Symmetric space controls in p

Summarising the above, our conjecture is that for symmetric space controls in p with
holonomy targets in k that constant-θ paths are time-optimal. For such constant-θ
paths, the objective becomes to calculate:

ΩT = min sin adΘ (Φ) (5.3.18)


Θ,Φ

under the constraints

AdΦ (Θ) = Θ (5.3.19)

and

X = (1 − cos adΘ )Φ. (5.3.20)

The variations can be performed elegantly by the feature that Θ is in a symmetric


space and Φ is in a reductive Lie group. From this point, the problem of minimising
time is undertaken using variational techniques (such as Lagrange multipliers). The
constant-θ assumption allows us to simplify this problem to a significant degree. We
also show in our general exposition how a transformation to a restricted Cartan-
Weyl basis can also assist in simplifying the often challenging global minimisation
problem.
198CHAPTER 5. GLOBAL CARTAN DECOMPOSITIONS FOR KP PROBLEMS

5.3.7 Cartan decomposition

To explicitly connect the G = KAK decomposition to the typical Schrödinger equa-


tion, consider the KAK decomposition of a unitary U ∈ G given by:

U = qea k (5.3.21)

where q, k ∈ K and ea ∈ A. Schrödinger’s equation can be written consistent with


the Maurer-Cartan form (see section 5.6) as:

dU U −1 = −iHdt. (5.3.22)

Expanding out we have:

dU = d(qea k) = ea kdq + qea kda + qea dk (5.3.23)


U −1 = (qea k)−1 = k −1 e−a q −1 . (5.3.24)

Which resolves to:

dU U -1 = dqq -1 + qdaq -1 + qea dkk -1 e−a q -1 (5.3.25)


= dqq -1 + qdaq -1 + qeada (dkk -1 )q -1 . (5.3.26)

The adjoint term can be decomposed into symmetric and anti-symmetric parts:

eada (X) = cosh(ada )(X) + sinh(ada )(X) (5.3.27)


| {z } | {z }
even powers odd powers

such that:

eada (dkk −1 ) = cosh(ada )(dkk −1 ) + sinh(ada )(dkk −1 ) . (5.3.28)


| {z } | {z }
∈k ∈p

As a ⊂ p and given the Cartan commutation relations, we have [a, k] ⊂ p while


[a, [a, k]] ⊂ k. Thus the symmetric term in equation (5.3.28) above, comprising even
powers of generators is in k while the antisymmetric term, comprising odd powers
of generators, will be in p. Rearranging equation (5.3.22):

dU U -1 = q[da + sinh(ada )(dkk -1 ) + q -1 dq + cosh(ada )(dkk -1 )]q -1 . (5.3.29)

The first two terms are in p while the latter two are in k. The orthogonal partitioning
from the Cartan decomposition ensures simplification such that cross-terms such as
5.3. SYMMETRIC SPACES AND KP TIME-OPTIMAL CONTROL 199

Tr(pk) vanish. Thus:

Tr (dU U -1 )2 = Tr (da + sinh(ada )(dkk -1 ))2 + Tr (q -1 dq + cosh(ada )(dkk -1 ))2


     

(5.3.30)

and:

Tr (dU U -1 )2 = Tr da2 + Tr sinh(ada )(dkk -1 ))2 + Tr (q -1 dq + cosh(ada )(dkk -1 ))2


       

(5.3.31)

using Tr(XadY (Z) = −Tr(adY (X)Z).

To demonstrate the constant-θ method, we assume that our controls (generators


in our Hamiltonian) are in p. Doing so restricts our Hamiltonian to the horizon-
tal subspace (definition C.1.26) under which the quantum state is parallel trans-
ported as the system evolves. This is equivalent to approximating a subRiemannian
geodesic circuit over the relevant differentiable manifold for our chosen group G/K.
The choice of generators that enables such parallel transport without the use of
generators in k is equivalent to a Hamiltonian comprising only generators in p, en-
suring the quantum state undergoes parallel transport along curves γ ∈ G/K such
that ∇γ̇ γ̇ = 0 (see section C.1.9). In equation (5.3.29) this is equivalent to setting
da + sinh(ada )(dkk -1 ) = 0 or da = − sinh(ada )(dkk -1 ) (in geometric parlance, set-
ting a minimal connection see sections 5.7 and C.1.7 generally). Using a change of
variables we denote:

a = iΘ k = eiΦ q = eiΨ (5.3.32)

where dkk -1 = idΦ and dqq -1 = idΨ. Using cosh(adiΘ ) = cos(adΘ ) and sinh(adiΘ ) =
sin(adΘ ) the connection becomes:

dΨ + cos(adΘ )(dΦ) = 0. (5.3.33)

Assuming a constant theta dΘ = 0 together with the above transformations allows


our Hamiltonian in equation (5.3.29) to be written as:

Hdt = eiΨ (−i sin(adΘ )(dΦ))e−iΨ . (5.3.34)

From this point, we must then solve the minimisation problem.


200CHAPTER 5. GLOBAL CARTAN DECOMPOSITIONS FOR KP PROBLEMS

5.4 Time-optimal control examples


We demonstrate such time-optimal analytic solutions for a few common types of
symmetric space quantum control with targets in SU (2) and SU (3) to explicate the
constant-θ method. We generalise this method in section 5.6.

5.4.1 SU(2) time-optimal control


Consider G = SU (2) with isometry group (definition A.1.17) K = S(U(1) × U(1)) =
span{eiησz }. Further define Pauli matrices {σk } with angular momenta Jk = 21 σk .
Define the Cartan conjugation χ as:

U χ = e−iJz π U † eiJz π (5.4.1)

together with a Cartan projection π(U ) = U χ U . To define the relevant Cartan


decomposition, we note the quotient group G/K corresponds to the symmetric space:

SU (2)
S2 ≈ ≡ AIII(1,1) (5.4.2)
S(U (1) × U (1))

where S 2 = π(G) = G/K. Detail on the classification of symmetric spaces is


set out in section C.3.2). As distinct from the projection in [190], the projection
is representation independent. The Cartan projection above defines the relevant
G = KAK decomposition. The corresponding Cartan decomposition of the Lie
algebra g satisfying equations (5.3.6) is:

g = su(2) = ⟨−iJz ⟩ ⊕ ⟨−iJx , −iJy ⟩ (5.4.3)


| {z } | {z }
k p

with Cartan subalgebra chosen to be a = ⟨−iJy ⟩. We can easily see the Cartan
commutation relations (equation (5.3.6)) hold. For S 2 above this corresponds to the
Euler decomposition:

U = eiJz ψ eiJy θ eiJz ϕ . (5.4.4)

Define:

k = eiJz ψ iΘ = iJy θ c = eiJz ϕ q = kc = eiJz χ (5.4.5)

to represent the unitary as:

-1 Θc
U = keiΘ c = qeic . (5.4.6)
5.4. TIME-OPTIMAL CONTROL EXAMPLES 201

Under this change of variables, the Cartan conjugation becomes:

k χ = k -1 (eiΘ )χ = eiΘ (U V )χ = V χ U χ (5.4.7)

with (using the adjoint action (definition B.2.7)):

-1 iΘc
π(keiΘ c) = e2c . (5.4.8)

The Schrödinger equation then becomes:

dU U -1 = k k -1 dk + cos adΘ (dcc -1 ) + idΘ + i sin adΘ (dcc -1 ) k -1



(5.4.9)
 
= k iJz (dψ + dϕ cos(θ)) + iJy dθ − iJx dϕ sin(θ) k -1 (5.4.10)

by taking the relevant differentials and inverses in equation (5.4.5), using equation
(B.3.2) and calculating the adjoint action on c. Under this Cartan decomposition,
we see equation (5.4.10) partitioned into symmetric k and antisymmetric p part:
 
dU U -1 = k iJz (dψ + dϕ cos(θ)) + iJy dθ − iJx dϕ sin(θ) k -1 . (5.4.11)
| {z } | {z }
∈k ∈p

Restricting dU U -1 ∈ p is equivalent to defining a minimal connection dψ = −dϕ cos θ.


That is, effectively setting the symmetric part of the Hamiltonian to zero (or in
geometric parlance, restricting to the horizontal subspace corresponding to the un-
derlying subRiemannian manifold). The Hamiltonian becomes:

dU -1  
H = i U = k Jx ϕ̇ sin(θ) − Jy θ̇ k -1 (5.4.12)
dt

where we have defined conjugate momenta ϕ̇, θ̇ for extremisation below. To calcu-
late the optimal time, we first define the Killing form on g for which we require
normalisation of the extremised action in (such that we can define an appropriately
scaled norm and metric). For a subRiemannian space of interest in the adjoint rep-
resentation, the Euclidean norm is then simply defined in terms of the Killing form
p
as |X| = (X, X) such that:

|idU U -1 |2 = (dψ + dϕ cos(θ))2 + dθ2 + (dϕ sin(θ))2 (5.4.13)

where |Jz |2 = I. Define the energy of the Hamiltonian as |H| = Ω such that:
Z Z Z
-1 dU -1
Ωt = Ω dt = |idU U | = i U ds. (5.4.14)
γ γ γ ds
202CHAPTER 5. GLOBAL CARTAN DECOMPOSITIONS FOR KP PROBLEMS

Here γ defines the curve along the manifold generated by the Hamiltonian and
t the time elapsed. Calculating path length here is equivalent to approximating
time elapsed modulo Ω. Consistent with typical differential geometric methods, we
parametrise by arc length ds [51]. Define:

dU -1 dt
ṫ = i U = (5.4.15)
ds ds

setting optimal (minimal) time as:


Z
T = min t = min ṫds . (5.4.16)
γ γ

Extremisation can be performed using the method of Lagrange multipliers and the
minimal connection above. As we demonstrate below, doing so in conjunction with
the constant-θ assumption simplifies the variational problem of estimating minimal
time. The relevant action is given by:
Z  
S = Ωt = ṫ + λJz (ψ̇ + ϕ̇ cos(θ)) ds (5.4.17)
γ

noting again the role of the connection. Parametrising by arc length s we have:
2
dU
i U -1 = (ψ̇ + ϕ̇ cos(θ))2 + θ̇2 + (ϕ̇ sin(θ))2 . (5.4.18)
ds

Extremising the action δS = 0 resolves the canonical position and momenta via the
equation above as:

δt 1 
Ω = ψ̇ + ϕ̇ cos(θ) + λ (5.4.19)
δ ψ̇ ṫ

!
δt ϕ̇ ψ̇
Ω = + + λ cos(θ) (5.4.20)
δ ϕ̇ ṫ ṫ

δt θ̇
Ω = (5.4.21)
δ θ̇ ṫ

!
δt ψ̇
Ω =− + λ ϕ̇ sin(θ) (5.4.22)
δθ ṫ
5.4. TIME-OPTIMAL CONTROL EXAMPLES 203

δt
Ω = ψ̇ + ϕ̇ cos(θ) (5.4.23)
δλ

where we have assumed vanishing quadratic infinitesimals to first order e.g. dϕ2 = 0.
We note that given λ is constant, it does not affect the total time T . The choice of
λ can be considered a global gauge degree of freedom i.e. ∂T∂λ
= 0 (i.e. regardless of
λ minimal time, T remains the same). The minimal connection constraint:

k -1 dk = − cos adΘ (dcc -1 ) (5.4.24)

can be written as:

ψ̇ = −ϕ̇ cos(θ) (5.4.25)

as we have specified k. The connection and equation (5.4.23) imply that ψ̇(s) be-
comes a local gauge degree of freedom in that it can vary from point to point along
the path parameter s without affecting the physics of the system (i.e. the rate of
change of ψ can vary from point to point without affecting the energy Ω or time T
of the system). That is:

δT
= 0. (5.4.26)
δ ψ̇

We can simplify the equations of motion by setting a gauge fixing condition (some-
times called a gauge trajectory). Thus we select:

ψ̇/ṫ + λ = 0. (5.4.27)

Recalling extremisation via dS = 0, we find equations (5.4.20) and (5.4.21) become:

δt δt ϕ̇ θ̇
dS = Ω + Ω = + = 0. (5.4.28)
δ ϕ̇ δ θ̇ ṫ ṫ

Thus ϕ̇/ṫ and θ̇/ṫ are constant by the constant-θ assumption i.e. θ̇ = 0. Minimising
over constant θ̇:
Z
∂T
Ω = (θ̇/ṫ) ds = 0 (5.4.29)
∂ θ̇ γ

R
confirms the independence of T from θ (i.e. as ṫ and the path-length γ ds ̸= 0).
Combining the above results reduces the integrand in equation (5.4.13) to depen-
dence on the dϕ sin θ term (as the minimal connection condition and constant θ
condition cause the first two terms to vanish). Such simplifications then mean opti-
204CHAPTER 5. GLOBAL CARTAN DECOMPOSITIONS FOR KP PROBLEMS

mal time is found via minimisation over initial conditions in equation (5.4.16):
Z Z
T = min dϕ sin θ = min |ϕ sin θ| ϕ= dϕ. (5.4.30)
θ,ϕ γ γ

Note by comparison with the general form of equation (5.6.48), here the ϕ sin θ term
represents sin adΘ (dcc -1 ) = sin adΘ (Φ). The above method shows how the constant-θ
method simplifies the overall extremisation making the minimisation for T manage-
able.

Now consider holonomy targets of the form:

U (T )U0-1 = e−iX = eiJz 2η ∈ K (5.4.31)

with controls only in p = {iJx , iJy }. By assumption U is of KAK form (equation


(5.4.6)):

-1 Θc
U (T )U0-1 = e−iX = eiJz 2η = keiΘ c = qeic . (5.4.32)

Choose the initial condition as:

U0 = eiΘ . (5.4.33)

The simplest form is when c resolves to identity. This in turn requires c ∈ K to


resolve to the identity:
n o
-1
M ≡ c ∈ K : cΘc = Θ = {±I} (5.4.34)

Given we have a single element in k, this is equivalent c = exp(iJz ϕ) where:

ϕ = 2πn (5.4.35)

for n ∈ Z. In general we must optimise over choices of n, which in this case is simply
n = 1. From this choice of initial condition we have:

q(T ) = k(T )c(T ) = e−iX (5.4.36)


q(T ) = eiJz ψ eiJz ϕ = eiJz (ψ+ϕ) . (5.4.37)
5.4. TIME-OPTIMAL CONTROL EXAMPLES 205

Note this is a relatively simple form of X = (1 − cos adΘ )(Φ) as Φ comprises only a
single generator Jz . This condition is equivalent to:
Z Z
2η = dχ = dψ + dϕ = 2πn(1 − cos(θ)). (5.4.38)
γ γ

Here we have again used the minimal connection constraint for substitution of vari-
ables. Thus:

η
cos(θ) = 1 − . (5.4.39)

Substituting into equation (5.4.30) we have that:


r
2η  η 2
ΩT = min 2πn − (5.4.40)
n nπ nπ
p
= 2 min η(2πn − η) (5.4.41)
n
p
= 2 η(2π − η) (5.4.42)

where we have used:

2 η 2
 2η  η 2
sin θ = 1 − = − (5.4.43)
nπ nπ nπ

using trigonometric identities and setting n = 1. Note the time optimality is con-
sistent with [190], namely:
p
2 η(2π − η)
T = . (5.4.44)

We now have the optimal time in terms of the parametrised angle of rotation for Jz .
To specify the time-optimal control Hamiltonian, recall the gauge fixing condition
(equation (5.4.27), which can also be written ψ/t + λ = 0) such that:

λ = −ψ̇/ṫ (5.4.45)
= −ψ(T )/T (5.4.46)
ϕ
= cos(θ) (5.4.47)
T
π−η
= Ωp (5.4.48)
η(2π − η)

where we have used equations (5.4.35) and (5.4.39). Connecting the optimal time
to the control pulses and Hamiltonian, note that λ can be regarded (geometrically)
as the rate of turning of the path. In particular, noting that λ = −ψ(T )/T , we
can regard λ as the infinitesimal rotation for time-step dt. In a control setting
206CHAPTER 5. GLOBAL CARTAN DECOMPOSITIONS FOR KP PROBLEMS

with a discretised Hamiltonian, we regard it as the rotation per interval ∆t. Thus
−ψ(T ) → λT and per time interval −ψ(t) → λt. The Hamiltonian then becomes:

H(t) = Ωe−iJz λt ea eiJz λt (5.4.49)

where a ∈ a ⊂ p. Selecting Jx , the Hamiltonian resolves to:

H(t) = Ωe−iJz λt/2 Jx eiJz λt/2 (5.4.50)


! ! !
Ω e−iλt/2 0 0 1 eiλt/2 0
= (5.4.51)
2 0 eiλt/2 1 0 0 e−iλt/2
! !
Ω e−iλt/2 0 0 e−iλt/2
= (5.4.52)
2 0 eiλt/2 eiλt/2 0
!
Ω 0 e−iλt
= (5.4.53)
2 eiλt 0
!
Ω 0 cos(λt) − i sin(λt)
= (5.4.54)
2 cos(λt) + i sin(λt) 0
= Ω(cos λtJx + sin λtJy ). (5.4.55)

The Hamiltonian is comprised of control subset generators Jx , Jy ∈ p with control


amplitudes λ given by their coefficients over time-interval t. In [190], the target
unitary is of the form:

η
UT (t) = ei 2 σz = eiηJz . (5.4.56)

With ν = 1 − η/(2π), the time-optimal solution for α(t) becomes α(t) = ωt where:


ω/Ω = √ (5.4.57)
1 − ν2

The minimum control time is given by:



π 1 − ν2
T = . (5.4.58)

5.4.2 SU (3)/S(U (1) × U (2)) time-optimal control


5.4.2.1 Overview

We now consider now the constant-θ method of relevance to lambda systems, specif-
ically for the SU (3)/S(U (1)×U (2)) (AIII(3,1) type) symmetric space, distinguished
by the choice of Cartan decomposition (for classification of symmetric spaces gener-
ally see section C.3 and [2]). The fundamental representation of SU (3) generators
5.4. TIME-OPTIMAL CONTROL EXAMPLES 207

is via the Gell-man matrices:


   
0 1 0 0 −i 0
λ1 = 1 0 0 = |0⟩⟨1| + |1⟩⟨0| λ2 =  i 0 0 = −i|0⟩⟨1| + i|1⟩⟨0|
   

0 0 0 0 0 0
(5.4.59)
   
1 0 0 0 0 1
λ3 = 0 −1 0 = |0⟩⟨0| − |1⟩⟨1| λ4 = 0 0 0 = |0⟩⟨2| + |2⟩⟨0|
   

0 0 0 1 0 0
(5.4.60)
   
0 0 −i 0 0 0
λ5 = 0 0 0  = −i|0⟩⟨2| + i|2⟩⟨0| λ6 = 0 0 1 = |1⟩⟨2| + |2⟩⟨1|
   

i 0 0 0 1 0
(5.4.61)
 
0 0 0
λ7 = 0 0 −i = −i|1⟩⟨2| + i|2⟩⟨1| (5.4.62)
 

0 i 0
 
1 0 0
1  1
λ8 = √ 0 1 0  = √ (|0⟩⟨0| + |1⟩⟨1| − 2|2⟩⟨2|) (5.4.63)

3 3
0 0 −2

Following [232,234], we set out commutation relations for the adjoint representation
of su(3), the Lie algebra of SU (3) in Table (5.1). The row label indicates the first
entry in the commutator, the column indicates the second.
208CHAPTER 5. GLOBAL CARTAN DECOMPOSITIONS FOR KP PROBLEMS

−iλ1 √ −iλ2 −iH
√ III −iHIII −iλ4 −iλ5 −iλ6 −iλ7

−iλ1 √ 0 i 3HIII − iHIII −i
√ 3λ2 iλ2 −iλ7 iλ6 −iλ5 iλ4

−iλ2 −i 3H√III + iHIII √0 3iλ1 −iλ1 −iλ6 −iλ7 iλ
√4 iλ5

−iHIII i 3λ2 −i 3λ1 0 0 0 0 −i 3λ7 i 3λ6

−iHIII −iλ2 iλ1 0 0 −i2λ5 i2λ4 −iλ7 iλ6

−iλ4 iλ7 iλ6 0 i2λ5 0 −2iHIII −iλ2 −iλ1

−iλ5 −iλ6 iλ7 √0 −i2λ4 i2HIII 0 iλ1 √ iλ2 ⊥

−iλ6 iλ5 −iλ4 i √3λ7 iλ7 iλ2 −iλ1 √ 0 −i 3HIII + HIII


−iλ7 −iλ4 −iλ5 −i 3λ6 −iλ6 iλ1 −iλ2 i 3HIII + HIII 0
Table 5.1: Commutation relations for generators in adjoint representation of su(3). The Cartan decomposition is g = k ⊕ p where k =

span{−iλ1 , −iλ2 , −iHIII , −iHIII } (red) and p = span{−iλ4 , −iλ5 , −iλ6 , −iλ7 } (black). As can be seen visually, the decomposition satisfies the Cartan com-
mutation relations (equation (5.3.6)): the yellow region indicates [k, k] ⊂ k, the green region that [p, p] ⊂ k and the white region that [p, k] ⊂ p. From a control
perspective, by inspection is clear that elements in k can be synthesised (is reachable) via linear compositions of the adjoint action of p upon itself (the green
region) as a result of the fact that [p, p] ⊆ k. We choose a = ⟨−iλ5 ⟩ with h = ⟨−iHIII , −iλ5 ⟩.
5.4. TIME-OPTIMAL CONTROL EXAMPLES 209

The typical Cartan decomposition of g = su(3) = k ⊕ p follows:



k = span{−iλ1 , −iλ2 , −iλ3 , −i 3λ8 } (5.4.64)
p = span{−iλ4 , −iλ5 , −iλ6 , −iλ7 } (5.4.65)

with a maximally compact Cartan subalgebra for g often chosen to be



h = span{−iλ3 , −i 3λ8 }. (5.4.66)

At this stage, our Cartan subalgebra is not maximally non-compact (having no


intersection with p), which we deal with below. Our interest is in lambda sys-
tems, that is, 3-level quantum systems where the Hamiltonian control space is a
4-dimensional space of optical transitions generated by p (where (λ1 , λ2) may be
substituted for (λ6 , λ7 )). The target space is the 4-dimensional unitary subgroup
K = S(U (1) × U (2)) of microwave transitions generated by k. With respect to this
decomposition, it is convenient to introduce a slight change of basis:

3 1 1
HIII = − λ3 + λ8 = √ (−|0⟩⟨0| + 2|1⟩⟨1| − |2⟩⟨2|) (5.4.67)
2 2 3

⊥ 1 3
HIII = λ3 + λ8 = |0⟩⟨0| − |2⟩⟨2|. (5.4.68)
2 2

Note for convenience:


√ √
3 1 ⊥ 3 ⊥ 1
λ3 = − HIII + HIII λ8 = HIII + HIII (5.4.69)
2 2 2 2
√ ⊥
√ √ ⊥
λ3 + 3λ8 = 2HIII − λ3 + 3λ8 = 3HIII + HIII (5.4.70)

and:

⊥ ⊥
ad2Θ (−iHIII ) = 0 ad2Θ (−iHIII ) = (−2θ)2 (−HIII ) (5.4.71)

Under this change of basis:


k = span{−iλ1 , −iλ2 , −iHIII , −iHIII } (5.4.72)
p = span{−iλ4 , −iλ5 , −iλ6 , −iλ7 }. (5.4.73)

Under this transformation, the maximally compact Cartan subalgebra



h = {HIII , HIII } ∈ k in (5.4.73) is entirely within k. In principle, to obtain a
maximally noncompact Cartan subalgebra, we conjugate h via an appropriately
chosen group element, thereby performing a Cayley transform (see Section 7 of Part
VI of [9] and for an explicit example see section 5.8 below). However, in our case we
210CHAPTER 5. GLOBAL CARTAN DECOMPOSITIONS FOR KP PROBLEMS

can simply read off the combination {−iH_III, −iλ5} (such that h ∩ p = span{−iλ5}) from the commutation table above. The commutant of a = span{−iλ5} is the subgroup:
\[
M \equiv \left\{k \in K : \forall\, i\Theta \in \mathfrak{a},\ k\Theta k^{-1} = \Theta\right\} = e^{\mathfrak{m}}. \tag{5.4.74}
\]

We also define:
\[
H_{III}^{\perp\prime\prime} = \frac{\sqrt{3}H_{III} + H_{III}^{\perp}}{2} = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -1 \end{pmatrix} \tag{5.4.75}
\]
which together with λ6 and λ7 defines the microwave Pauli operators, and the generator:
\[
H_{III}^{\prime\prime} = \frac{\sqrt{3}H_{III} - 3H_{III}^{\perp}}{2} = \begin{pmatrix} -2 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} \tag{5.4.76}
\]

which commutes with the microwave qubit. Continuing, via equations (5.6.65-
5.6.66) we have optimal time in the form:

\[
\Omega T = \min_{\Theta,\Phi}\,|\sin\mathrm{ad}_{\Theta}(\Phi)| \tag{5.4.77}
\]
with initial condition U0 = e^{iΘ} ∈ A; the target unitary is given by UT = e^{−iX} with:
\[
X = (1 - \cos\mathrm{ad}_{\Theta})(\Phi). \tag{5.4.78}
\]

Here Φ ∈ k and is in the commutant with respect to Θ. The time-optimal circuit is generated by the Hamiltonian:
\[
H(t) = e^{-i\Lambda t}\sin\mathrm{ad}_{\Theta^*}(\Phi^*)\,e^{i\Lambda t} \qquad \Lambda = \frac{\cos\mathrm{ad}_{\Theta^*}(\Phi^*)}{T} \equiv \cos\mathrm{ad}_{\Theta^*}(\dot{\Phi}^*). \tag{5.4.79}
\]
Here Θ∗, Φ∗ reflect a choice of parameters (e.g. θk, ϕj) which minimise the time T.


To calculate the optimal time and corresponding Hamiltonian explicitly, we note
adΘ (Φ) = [Θ, Φ] ∈ p while ad2Θ (Φ) = [Θ, [Θ, Φ]] ∈ k (see section (B.3.5)).

5.4.2.2 General form of target unitaries in SU (3)

We set out the general form of targets using the particular Cartan decomposition
and choice of basis for SU (3) above. We set:

\[
\Phi = -i\left(\phi_1\lambda_1 + \phi_2\lambda_2 + \phi_3 H_{III}^{\perp\prime\prime} + \phi_4\sqrt{3}H_{III}\right). \tag{5.4.80}
\]

The cos(adΘ) term is proportional to the application of ad²Θ:
\[
\cos\mathrm{ad}_\Theta(\Phi) = \cos\alpha(\theta)(\Phi) \tag{5.4.81}
\]
\[
= \cos(\theta)(-i\phi_1\lambda_1) + \cos(\theta)(-i\phi_2\lambda_2) + \cos(2\theta)(-i\phi_3 H_{III}^{\perp\prime\prime}) - i\phi_4\sqrt{3}H_{III} \tag{5.4.82}
\]
using cos(−θ) = cos(θ) and the fact that cos α(θ) = cos(0) = 1 for −iH_III. Our targets are in this case of the form:

\[
X = (1 - \cos\mathrm{ad}_\Theta)(\Phi) = (1 - \cos\alpha(\theta))(\Phi) \tag{5.4.83}
\]
\[
= -i\left((1 - \cos(\theta))\phi_1\lambda_1 + (1 - \cos(\theta))\phi_2\lambda_2 + (1 - \cos(2\theta))\phi_3 H_{III}^{\perp\prime\prime}\right) \tag{5.4.84}
\]
where 1 − cos α(θ) = 0 for H_III. Note that for more general targets involving linear
combinations of HIII , specific choices or transformations of Φ and transformations
of Θ such that the HIII term does not vanish may be required. The rationale for this
is the subject of ongoing work and is important for proving the extent of (and any
constraints upon) the constant-θ conjecture and to understanding the interplay of
system symmetries with reachability of targets. As per the general method set out
below, this form of X can be derived as follows. We begin with the most general
form of target:

\[
X = -i\left(\eta_1\lambda_1 + \eta_2\lambda_2 + \eta_3 H_{III}^{\perp\prime\prime} + \eta_4\sqrt{3}H_{III}\right). \tag{5.4.85}
\]

Minimising evolution time firstly requires a choice of an initial condition. Our target
can be written:

\[
U(T)U_0^{-1} = e^{-iX} = qe^{ic^{-1}\Theta c} \tag{5.4.86}
\]
with U0 = e^{iΘ}, and by the commutant condition we have that:
\[
q(T) = k(T)c(T) = e^{\Phi'}e^{\Phi} = e^{\Phi''} = e^{-iX}. \tag{5.4.87}
\]

In general, we must choose Φ such that it is in the commutant:


\[
\Phi = -i\left(\phi_1\lambda_1 + \phi_2\lambda_2 + \phi_3 H_{III}^{\perp\prime\prime} + \phi_4\sqrt{3}H_{III}\right), \qquad e^{\Phi} \in M. \tag{5.4.88}
\]

Where a target does not comprise a given generator, its coefficient may be set to zero.
Gathering terms:

\[
q(T) = e^{\Phi''} \tag{5.4.89}
\]
\[
\Phi' = -i\left(\psi_1\lambda_1 + \psi_2\lambda_2 + \psi_3 H_{III}^{\perp\prime\prime} + \psi_4\sqrt{3}H_{III}\right) \tag{5.4.90}
\]
\[
\Phi'' = -i\Big((\psi_1 + \phi_1)\lambda_1 + (\psi_2 + \phi_2)\lambda_2 \tag{5.4.91}
\]
\[
\qquad\qquad + (\psi_3 + \phi_3)H_{III}^{\perp\prime\prime} + (\psi_4 + \phi_4)\sqrt{3}H_{III}\Big) \tag{5.4.92}
\]

Using the form of minimal connection (equation (5.4.24)):

ψ̇k = −ϕ̇k cos α(θ) (5.4.93)

equate coefficients for our target Hamiltonian X, where, by gathering terms by generator, we note that ηk = ϕk(1 − cos αk(θ)) (excluding the vanishing −iH_III term):

\[
\eta_1 = \phi_1(1 - \cos(\theta)) \qquad \eta_2 = \phi_2(1 - \cos(\theta)) \tag{5.4.94}
\]
\[
\eta_3 = \phi_3(1 - \cos(2\theta)) \qquad \eta_4 = \phi_4(1 - 1) = 0 \tag{5.4.95}
\]

recovering the form in equation (5.4.84) up to relevant generators. Continuing:


 
\[
\cos(\theta) = 1 - \frac{1}{2}\left(\frac{\eta_1}{\phi_1} + \frac{\eta_2}{\phi_2}\right) \qquad \sin(\theta) = \sqrt{1 - \cos(\theta)^2} \tag{5.4.96}
\]
\[
\cos(2\theta) = 1 - \frac{\eta_3}{\phi_3} \qquad \sin(2\theta) = \sqrt{1 - \cos(2\theta)^2} \tag{5.4.97}
\]

Note that η4 does not contribute. Optimal time is parametrised as:

\[
\Omega T = \min_{\Theta,\Phi}|\sin\mathrm{ad}_\Theta(\Phi)| = \min_{\theta,\phi_k}|\sin\mathrm{ad}_\Theta(\Phi)| \tag{5.4.98}
\]
\[
= \min_{\theta,\phi_k}\sqrt{\sum_k \phi_k^2\sin^2\alpha_k(\theta)} \tag{5.4.99}
\]
\[
= \min_{\phi_k}\sqrt{\phi_1^2\sin^2(\theta) + \phi_2^2\sin^2(\theta) + \phi_3^2\sin^2(2\theta)} \tag{5.4.100}
\]

where the minimisation depends on the choice of ϕk . The choice of ϕi must be such
that the commutant condition eiΦ ∈ M is satisfied. For certain targets X, this
means:
\[
\Phi = -i\phi_3\sqrt{3}H_{III} - i2\pi k\left(n_x\lambda_1 + n_y\lambda_2 + n_z H_{III}^{\perp\prime\prime}\right) \tag{5.4.101}
\]

for any ϕ3 ∈ R, k ∈ Z and unit vector n̂ = (nx , ny , nz ). The remainders of p and k


pair into root spaces of a:

[iλ5 , iλ1 ] = −iλ6 [iλ5 , iλ6 ] = iλ1 , (5.4.102)


[iλ5 , iλ2 ] = iλ7 [iλ5 , iλ7 ] = −iλ2 (5.4.103)

and
\[
[i\lambda_5, i\lambda_4] = 2iH_{III}^{\perp} \qquad [i\lambda_5, iH_{III}^{\perp}] = -2i\lambda_4. \tag{5.4.104}
\]

In this formulation, for a target unitary U(T) = e^{−iX}:
\[
X = (1 - \cos\mathrm{ad}_\Theta)(\Phi) \tag{5.4.105}
\]
\[
= -i2\pi k\left((1 - \cos\theta)(n_x\lambda_1 + n_y\lambda_2) + (1 - \cos(2\theta))n_z H_{III}^{\perp\prime\prime}\right). \tag{5.4.106}
\]

For simple targets, such as those dealt with in the next section, ϕk may simply be an integer multiple 2πk, in which case the minimisation problem becomes one of selecting the appropriate choice of k that minimises T. Note that, as discussed, this particular form of Hamiltonian cannot reach targets in H_III. Having determined
T , the optimal time Hamiltonian is constructed as follows. Recall from equation
(5.6.70) that:

\[
H(t) = e^{-i\Lambda t}\sin\mathrm{ad}_{\Theta^*}(\Phi^*)\,e^{i\Lambda t} \tag{5.4.107}
\]
\[
\Lambda = \frac{\cos\mathrm{ad}_{\Theta^*}(\Phi^*)}{T} \tag{5.4.108}
\]

noting for completeness that Λ ∈ k (and sin adΘ (Φ) ∈ p). We demonstrate the
technique for a specific example gate in the literature below.

5.4.2.3 Comparison with existing methods

In this section, we apply the constant θ method to derive time optimal results
from [229]. The KP decomposition for D’Alessandro et al. in [63] and [229] is:

\[
\mathfrak{p} = \frac{1}{\sqrt{2}}\,\mathrm{span}\{i\lambda_1, i\lambda_2, i\lambda_4, i\lambda_5\} \tag{5.4.109}
\]
\[
\mathfrak{k} = \frac{1}{\sqrt{2}}\,\mathrm{span}\{i\lambda_3, i\lambda_6, i\lambda_7, i\lambda_8\}. \tag{5.4.110}
\]

As the only difference with our chosen KP decomposition (up to the constant 1/√2) is swapping −iλ1, −iλ2 ∈ p and −iλ6, −iλ7 ∈ k, we can use the same change of basis from −iλ3, −iλ8 to −iH_III, −iH_III^⊥ and the same choice of a = span{−iλ5}.

For convenience and continuity with [229], we use the standard notation from that
paper. The key point for our method is the choice of a Cartan subalgebra that is
maximally non-compact allowing us to select Θ proportional to −iλ5 . The form of
targets Uf in [229] is:
\[
U_f = \begin{pmatrix} e^{-i\phi_s} & 0 \\ 0 & \hat{U}_s \end{pmatrix} \tag{5.4.111}
\]

where Us ∈ U (2) and det Us = eiϕs . For the given KP decomposition, matrices K
(block diagonal) and P (block anti-diagonal) have the form:
 
\[
K = \begin{pmatrix} if & 0 \\ 0 & Q \end{pmatrix} \qquad P = \begin{pmatrix} 0 & \alpha & \beta \\ -\alpha^* & 0 & 0 \\ -\beta^* & 0 & 0 \end{pmatrix} \tag{5.4.112}
\]

chosen in order to eliminate the drift term. The general form of A ∈ k and P =
iλ1 ∈ p in that work are:
   
\[
A = \begin{pmatrix} a+b & 0 & 0 \\ 0 & -a & -c \\ 0 & -c & -b \end{pmatrix} \qquad P = i\begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}. \tag{5.4.113}
\]

P is expressed to be an element of a Cartan subalgebra of p. However the full Cartan


subalgebra is not given. In that paper, the given target is a simple Hadamard gate:
 
\[
U_f = \begin{pmatrix} 1 & 0 & 0 \\ 0 & \frac{1}{\sqrt{2}} & \frac{i}{\sqrt{2}} \\ 0 & \frac{i}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{pmatrix} = \exp(H) = \exp\left(\frac{i\pi}{4}\lambda_6\right) \tag{5.4.114}
\]

resulting in:
   
\[
A = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & \pm\frac{7i}{\sqrt{15}} \\ 0 & \pm\frac{7i}{\sqrt{15}} & 0 \end{pmatrix} \qquad P = \begin{pmatrix} 0 & i & 0 \\ i & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}. \tag{5.4.115}
\]

In [229], the solution to subRiemannian geodesics (drawing upon results originally presented by Jurdjevic in [213] (p. 257) and later with more exposition in [60] (p. 28), relying upon, as Jurdjevic notes, the right invariance of the vector field under the action of elements of K) is of the form:
\[
U_f = \exp(At)\exp((-A + P)t) = \exp(At). \tag{5.4.116}
\]



In [229], the results from equations (5.3.5) are assumed and expressed via the constraint exp((−A + P)t) = I (satisfied for t = 2πk√15/8, k ∈ Z, the eigenvalues of −A + P being {0, ±8i/√15}), with t = t_min = T = √15π/4, in which case:
\[
\exp(A\sqrt{15}\pi/4) = \exp\left(\frac{\sqrt{15}\pi}{4}\begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & \pm\frac{7i}{\sqrt{15}} \\ 0 & \pm\frac{7i}{\sqrt{15}} & 0 \end{pmatrix}\right) \tag{5.4.117}
\]
\[
= \begin{pmatrix} 1 & 0 \\ 0 & \exp\left(-\frac{7i\pi}{4}\sigma_x\right) \end{pmatrix} \tag{5.4.118}
\]
\[
= \begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos(-7\pi/4) & i\sin(-7\pi/4) \\ 0 & i\sin(-7\pi/4) & \cos(-7\pi/4) \end{pmatrix} = U_f \tag{5.4.119}
\]

where we choose c = −7i/√15 in the second line from equation (5.4.115) above. If
the positive value is used, a similarity transformation K ∈ exp(k) of the KAK
decomposition exp(At)P exp(−At) is required. In the general form of solution to
subRiemannian geodesic equations from Jurdjevic et al., the control algebra is re-
lated to the Hamiltonian via:
\[
\exp(At)P\exp(-At) = \sum_j u_j(t)B_j \tag{5.4.120}
\]

for Bj ∈ p and uj (t) ∈ R (with ||u|| = M for M = Ω constant, by the Pontryagin


‘bang bang’ principle).

5.4.2.4 Constant-θ method

To demonstrate our constant θ method, we first obtain our target generators in


terms of k:
 
\[
\log(U_f) = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & \frac{i\pi}{4} \\ 0 & \frac{i\pi}{4} & 0 \end{pmatrix} = \frac{i\pi}{4}\lambda_6 = \frac{\pi}{2\sqrt{2}}\left(\frac{i}{\sqrt{2}}\lambda_6\right) \tag{5.4.121}
\]

which becomes our target X:


  
\[
X = \frac{\pi}{2\sqrt{2}}\left(-\frac{i}{\sqrt{2}}\lambda_6\right) = \frac{\pi}{4}(-i\lambda_6) = -i\eta_6\lambda_6 \tag{5.4.122}
\]
\[
\Phi = -i\phi_6\lambda_6 \tag{5.4.123}
\]

In this relatively simple example, we can choose Φ to solely consist of −iλ6 . Setting
Θ = −iλ5 and noting cos adΘ (Φ) = cos α(θ)(Φ):

X = (1 − cos adΘ )(Φ) (5.4.124)


= (1 − cos α(θ))(Φ) (5.4.125)
= (1 − cos(θ))(−iϕ6 λ6 ) (5.4.126)
= −iη6 λ6 (5.4.127)

where we have used (via the even and odd powers ad^{2k}_{−iλ5} and ad^{2k+1}_{−iλ5} respectively):
\[
\cos\mathrm{ad}_\Theta(-i\lambda_6) = \cos(\theta)(-i\lambda_6) \qquad \cos\alpha(\theta) = \cos(\theta) \tag{5.4.128}
\]
\[
\sin\mathrm{ad}_\Theta(-i\lambda_6) = \sin(\theta)(i\lambda_1) \qquad \sin\alpha(\theta) = \sin(\theta). \tag{5.4.129}
\]

Noting Ad(exp(X))(Y) = exp(adX)(Y), such that for c = exp(−iϕ6λ6):

e−iϕ6 λ6 (−iλ5 )eiϕ6 λ6 = −iλ5 (5.4.130)

our commutant condition is satisfied for ϕ6 = 2π, then:

\[
\frac{\pi}{4} = 2\pi(1 - \cos(\theta)) \tag{5.4.131}
\]
\[
\cos(\theta) = 1 - \frac{1}{8} = \frac{7}{8} \tag{5.4.132}
\]
\[
\sin(\theta) = \sqrt{1 - \left(\frac{7}{8}\right)^2} = \frac{\sqrt{15}}{8} \tag{5.4.133}
\]

which is “minimum T” in [229]. Minimal time is then:

\[
\Omega T = \min_{\Theta,\Phi}\left|\sin\mathrm{ad}_\Theta(\Phi)\right| \tag{5.4.134}
\]
\[
= \min_{\theta,\phi_k}\sqrt{\phi_6^2\sin^2\alpha(\theta)} \tag{5.4.135}
\]
\[
= \pm 2\pi\sin(\theta) \tag{5.4.136}
\]
\[
= \pm 2\pi\frac{\sqrt{15}}{8} = \pm\frac{\sqrt{15}\pi}{4} \tag{5.4.137}
\]

which is (as time must be positive) the minimum time t to reach the equivalence
class of target Hamiltonians (that generate Uf up to conjugation) in [229]. Note in
this Chapter we denote this minimum time to reach the target as T . For convenience

we assume Ω = 1 such that T = √15π/4. To calculate the Hamiltonian:

H(t) = e−iΛt sin adΘ∗ (Φ∗ )eiΛt (5.4.138)



and apply our formulation:

\[
\Lambda = \frac{\cos\mathrm{ad}_{\Theta^*}(\Phi^*)}{T} \tag{5.4.139}
\]
\[
= \frac{2\pi\cos\theta}{T}(-i\lambda_6) \tag{5.4.140}
\]
\[
= \frac{7/8}{\sqrt{15}/8}(-i\lambda_6) \tag{5.4.141}
\]
\[
= \frac{7}{\sqrt{15}}(-i\lambda_6) \tag{5.4.142}
\]
\[
= \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & -\frac{7i}{\sqrt{15}} \\ 0 & -\frac{7i}{\sqrt{15}} & 0 \end{pmatrix} \tag{5.4.143}
\]

which is A in [229]. Then from equations (5.4.129) and (5.4.137):
\[
\sin\mathrm{ad}_\Theta(\Phi) = 2\pi\sin(\theta)\,i\lambda_1 = 2\pi\sin(\theta)\begin{pmatrix} 0 & i & 0 \\ i & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix} = \frac{\sqrt{15}\pi}{4}\begin{pmatrix} 0 & i & 0 \\ i & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix} \tag{5.4.144}
\]

which is P in [229] scaled by T . Using our formulation, the Hamiltonian is then:

\[
H(t) = e^{-i\Lambda t}\sin\mathrm{ad}_{\Theta^*}(\Phi^*)\,e^{i\Lambda t} = e^{At}Pe^{-At} \tag{5.4.145}
\]
\[
= i\lambda_1\cos\omega t \pm i\lambda_5\sin\omega t \tag{5.4.146}
\]
for ω = 7/√15.
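The chain above can be cross-checked numerically. The sketch below (numpy/scipy assumed) verifies that, at T = √15π/4, the Jurdjevic-type solution exp(At) exp((−A + P)t) closes exp((−A + P)T) to the identity and reproduces Uf:

```python
import numpy as np
from scipy.linalg import expm

T = np.sqrt(15) * np.pi / 4
A = np.array([[0, 0, 0],
              [0, 0, -7j/np.sqrt(15)],
              [0, -7j/np.sqrt(15), 0]])
P = np.array([[0, 1j, 0],
              [1j, 0, 0],
              [0, 0, 0]])
Uf = np.array([[1, 0, 0],
               [0, 1/np.sqrt(2), 1j/np.sqrt(2)],
               [0, 1j/np.sqrt(2), 1/np.sqrt(2)]])

# exp((-A+P)T) returns to the identity at the minimal time T ...
assert np.allclose(expm((-A + P) * T), np.eye(3))
# ... so the geodesic solution exp(AT)exp((-A+P)T) equals the target Uf.
assert np.allclose(expm(A * T) @ expm((-A + P) * T), Uf)
```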

5.5 Discussion
We have demonstrated that for specific categories of quantum control problems,
particularly those where the antisymmetric centralizer generators parameterised by
angle θ remain constant, it is possible to obtain analytic solutions for time-optimal
circuit synthesis for non-exceptional symmetric spaces using a global Cartan decom-
position. This is particularly true when the control subsets are restricted to cases
where the Hamiltonian consists of generators from a horizontal distribution (bracket-
generating [2, 230]) p with p ̸= g (where the vertical subspace is not null). Direct
access is only available to subalgebras p ⊂ g. However, we have shown that if the
assumptions [p, p] ⊆ k and dΘ = 0 hold, arbitrary generators in k can be indirectly
synthesised (via application of Lie brackets), which in turn makes the entirety of g
available in optimal time. Geometrically, we have demonstrated a method for syn-
thesis of time-optimal subRiemannian geodesics using p. Note that, as mentioned

in the example for SU (2) above, subRiemannian geodesics may exhibit non-zero
curvature with respect to the entire manifold, but this is to be expected where we
are limited to a control subset. Consequently, in principle, arbitrary UT ∈ G (rather than just UT ∈ G/K) becomes reachable in a control sense.

5.6 Generalised constant-θ method

In this section, we generalise the method detailed above. Given a G = KAK


decomposition, an arbitrary unitary UT ∈ G has a decomposition as:

\[
U = ke^{i\Theta}c = qe^{ic^{-1}\Theta c} \tag{5.6.1}
\]

where k, c ∈ K and eiΘ ∈ A. Define Cartan conjugation (see definition B.5.1) as:

\[
k^{\chi} = k^{-1} \qquad (e^{i\Theta})^{\chi} = e^{i\Theta} \qquad (UV)^{\chi} = V^{\chi}U^{\chi}. \tag{5.6.2}
\]

The Cartan projection is:

\[
\pi(U) = U^{\chi}U \tag{5.6.3}
\]
which takes U to an element of the subspace of G that is fixed by χ. Combining with equation (5.6.1) we have:
\[
\pi(ke^{i\Theta}c) = e^{2ic^{-1}\Theta c}. \tag{5.6.4}
\]

That is, the representation of equation (5.6.1) as projected into the symmetric space G/K establishes the symmetric space as a section of the K-bundle:
\[
\pi(G) \cong G/K. \tag{5.6.5}
\]

The existence of χ and π is sufficient for G/K to be considered globally symmetric, i.e. it has an involutive symmetry at every point. The compactness of G refers to
the property that G is a compact Lie group, meaning it is a closed and bounded
subset of the Euclidean space in which it is embedded. The symmetric space G/K
is equipped with a Riemannian metric, which is a smoothly varying positive-definite
quadratic form on the tangent spaces of the symmetric space. Noting the Euler
formula (see equation (B.3.2)):

eiΘ Xe−iΘ = AdeiΘ (X) = eiadΘ (X) = cos adΘ (X) + i sin adΘ (X) (5.6.6)

where X ∈ g, eiΘ ∈ A ⊂ G. Note here the (lower-case) adjoint action adΘ (X) is the
action of the Lie algebra generators on themselves (section B.3.4), thus takes the
form of the Lie derivative (definition B.2.6) (commutator):

adΘ (X) = [Θ, X] (5.6.7)

whereas the (upper-case) group adjoint action AdΘ is one of conjugation of group
elements (hence we exponentiate the generator X implicitly). The Maurer-Cartan
form (definition C.1.20) becomes in general:

\[
dU\,U^{-1} = k\left(k^{-1}dk + \cos\mathrm{ad}_\Theta(dc\,c^{-1}) + i\,d\Theta + i\sin\mathrm{ad}_\Theta(dc\,c^{-1})\right)k^{-1}. \tag{5.6.8}
\]
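The Euler-type identity (5.6.6) underlying this expansion can be checked numerically by summing the exponential series of adΘ; a minimal sketch (numpy/scipy, random Hermitian test matrices):

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(0)
M = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))
Theta = (M + M.conj().T) / 2                    # random Hermitian Theta
N = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))
X = (N + N.conj().T) / 2

ad = lambda A, B: A @ B - B @ A
lhs = expm(1j * Theta) @ X @ expm(-1j * Theta)  # Ad_{e^{i Theta}}(X)

# Right side: e^{i ad_Theta}(X) summed as a (truncated) Lie series.
rhs, term = X.copy(), X.copy()
for k in range(1, 60):
    term = 1j * ad(Theta, term) / k
    rhs = rhs + term
assert np.allclose(lhs, rhs)
```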

Recalling that Cartan conjugation is the negative of the relevant involution ι, define
the control space subalgebra as:

p = {−iH : χ(−iH) = −iH} (5.6.9)

which satisfies [p, p] ⊆ k and the Cartan commutation relations more generally.
By restricting the Maurer-Cartan form (Hamiltonian) to the antisymmetric control
subset:

dU U -1 ∈ p (5.6.10)

we thereby define a minimal connection (see section 5.7):

k -1 dk = − cos adΘ (dcc -1 ) (5.6.11)

which can as per the examples be written in its parametrised form as:

ψ̇ α,r + ϕ̇α,r cos α(Θ) = 0 (5.6.12)

Here α is a root (a functional) on Θ that selects out the relevant parameter, e.g. when Θ = Σ_k θ_k H_k then α selects out the appropriate θ_k ∈ C. See section B.4.5 and Appendix B for more background.
Appendix B for more background.

Note that if Θ comprises multiple Hk , then the related Hamiltonian may also be
expressed as:

\[
H = i\frac{dU}{dt}U^{-1} = -k\left(\dot{\Theta} + \sin\mathrm{ad}_\Theta(\dot{c}c^{-1})\right)k^{-1}. \tag{5.6.13}
\]

Given g as the Lie algebra of G, define the Killing form as:

\[
(X, Y) = \frac{1}{2}\mathrm{Re}\,\mathrm{Tr}(\mathrm{ad}_X\,\mathrm{ad}_Y) \tag{5.6.14}
\]

where adX is the adjoint representation of X. The Killing form (definition B.2.12)
is used to define an inner product on g allowing measurement of lengths and angles
in g. Define the inner product for weights and the Weyl vector ρ:
\[
(\alpha, \beta) = g^{kl}H_kH_l \qquad \text{and} \qquad \rho = \frac{1}{2}\sum_{\alpha\in R^+}\alpha = \sum_{k=1}^{r}\phi_k \tag{5.6.15}
\]

where {Hk } is a basis for the Cartan subalgebra a, R+ is the set of positive roots
α, r is the rank, and {ϕk } are the fundamental weights (section B.4). The weights
(α, β) of a representation are the eigenvalues of the Cartan subalgebra a, which (see
below) is a maximal abelian subalgebra of g. The Weyl vector ρ is a special weight
that is associated with the root system of g (section B.4.4). It is defined as half
the sum of the positive roots α (counted with multiplicities). The Weyl vector is
used in Weyl’s character formula, which gives the character of a finite-dimensional
representation in terms of its highest weight [235] which we denote below as τ .

Define the Euclidean norm (see A.1.11) using the Killing form as:
\[
|X| = \sqrt{(X, X)}. \tag{5.6.16}
\]
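As an illustration of (5.6.14) and (5.6.16), the following sketch (numpy; su(2) chosen purely for brevity, all names illustrative) builds the matrix of adX in a chosen basis and evaluates the form (X, Y) = ½ Re Tr(adX adY):

```python
import numpy as np

sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]])
sz = np.array([[1, 0], [0, -1]], dtype=complex)
basis = [-0.5j * s for s in (sx, sy, sz)]          # an su(2) basis

def ad_matrix(X, basis):
    """Matrix of ad_X = [X, .] in the given basis (coefficients via lstsq)."""
    B = np.column_stack([b.flatten() for b in basis])
    cols = []
    for b in basis:
        c, *_ = np.linalg.lstsq(B, (X @ b - b @ X).flatten(), rcond=None)
        cols.append(c)
    return np.column_stack(cols)

def killing(X, Y, basis):
    return 0.5 * np.trace(ad_matrix(X, basis) @ ad_matrix(Y, basis)).real

# For su(2) the Killing form is 4 Tr(XY), so the 1/2-normalised form is 2 Tr(XY).
X = basis[0]
assert np.isclose(killing(X, X, basis), 2 * np.trace(X @ X).real)
```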

We leverage the fact that the Killing form is quadratic for semi-simple Lie algebras
g such that:

|idU U -1 |2 = |ik -1 dk + cos adΘ (idcc -1 )|2 + |dΘ|2 + | sin adΘ (dcc -1 )|2 . (5.6.17)

Let the Hamiltonian have an isotropic cutoff:

|H| = Ω. (5.6.18)

By the Schrödinger equation, the time elapsed over the path γ is given by:
\[
\Omega t = \Omega\int_\gamma dt = \int_\gamma |i\,dU\,U^{-1}| = \int_\gamma \left|i\frac{dU}{ds}U^{-1}\right| ds. \tag{5.6.19}
\]

Define:

\[
\dot{t} = \left|i\frac{dU}{ds}U^{-1}\right| \tag{5.6.20}
\]

with:

\[
T = \min_\gamma t. \tag{5.6.21}
\]

To perform the local minimization, we introduce a vector of Lagrange multipliers


(defined below in equation (5.6.29)):
\[
\Omega t = \int_\gamma \left(\dot{t} + \left\langle\Lambda,\ k^{-1}\dot{k} + \cos\mathrm{ad}_\Theta(\dot{c}c^{-1})\right\rangle\right)ds \tag{5.6.22}
\]

note
\[
k\left(k^{-1}\dot{k} + \cos\mathrm{ad}_\Theta(\dot{c}c^{-1})\right)k^{-1} = \frac{\dot{U}U^{-1} - \chi(\dot{U}U^{-1})}{2} \tag{5.6.23}
\]
and
\[
k\left(i\dot{\Theta} + i\sin\mathrm{ad}_\Theta(\dot{c}c^{-1})\right)k^{-1} = \frac{\dot{U}U^{-1} + \chi(\dot{U}U^{-1})}{2}. \tag{5.6.24}
\]

We can further simplify via expanding in the restricted Cartan-Weyl basis, noting that α indexes the relevant roots and r indexes the relevant sets of roots in ∆. The restricted Cartan-Weyl basis allows g to be decomposed as the sum of the commutant basis vectors Hk ∈ a and the root vectors Eα:
\[
\mathfrak{g} = \langle H_k\rangle \oplus \bigoplus_\alpha \langle E_\alpha\rangle \tag{5.6.25}
\]

where there are r such positive roots. The Maurer-Cartan form (equation (5.6.22))
becomes expressed in terms of weights and weight vectors:

Θ = Hk θk (5.6.26)

iċc -1 = Fα,r ϕ̇α,r (5.6.27)

ik -1 k̇ = Fα,r ψ̇ α,r + m (5.6.28)

and

Λ = Fα,r λα,r + m. (5.6.29)

In the above equations, the α are summed over restricted positive roots (of which
there are r many). Fα ∈ k − m with ϕ̇, ψ̇ ∈ C coefficients. The Lagrange multiplier

vector Λ in equation (5.6.29) is in generalised form. Note that in the SU (2) case
above, the simplicity of k = {Jz } simplifies the multiplier term in the action equation
(5.4.17). Here the Cartan subalgebra h is given by:

\[
\mathfrak{h} = \mathfrak{a} \oplus \mathfrak{m}. \tag{5.6.30}
\]

This algebra is distinct from a maximally compact Cartan algebra in that h intersects
with both p (the non-compact part of h) and k (the compact part of h). In many
cases, the elements of h are themselves diagonal (see for example Hall and others
[232, 235]) and entirely within k. In our case, a ⊂ p but m ⊂ k. The construction
of the restricted Cartan-Weyl basis is via the adjoint action of a on the Lie algebra,
such that it gives rise to pairings Fα ∈ k − m, Eα ∈ p − a conjugate under the adjoint
action.

The Cartan-Weyl basis has the property that the commutation relations between
the basis elements are simplified because (i) the Cartan generators (belonging to an
abelian subalgebra) commute and (ii) the commutation relations between a Cartan
generator and a root vector are proportional to the root vector itself. The com-
mutation relations between two root vectors can be more complicated, but they
are determined by the structure of the root system. The set of roots are the non-
zero weights of the adjoint representation of the Lie algebra. The roots form a
discrete subset of the dual space to the Cartan subalgebra, and they satisfy certain
symmetries and relations that are encoded in the Dynkin diagram of the Lie alge-
bra [2,230]. Transforming to the restricted Cartan-Weyl basis allows us to represent
the parametrised form of equation (5.6.17) as:

|Θ̇|2 = gkl θ̇k θ̇l (5.6.31)

X
| sin adΘ (ċc -1 )|2 = gαα (ϕ̇α,r )2 sin2 α(Θ) (5.6.32)
α,r

X 2
|k -1 k̇ + cos adΘ (ċc -1 )|2 = gαα ψ̇ α,r + ϕ̇α,r cos α(Θ) + |m|2 (5.6.33)
α,r

and using the vector of Lagrange multipliers with the connection:
\[
\left\langle\Lambda,\ k^{-1}\dot{k} + \cos\mathrm{ad}_\Theta(\dot{c}c^{-1})\right\rangle = \sum_{\alpha,r} g_{\alpha\alpha}\lambda^{\alpha,r}\left(\dot{\psi}^{\alpha,r} + \dot{\phi}^{\alpha,r}\cos\alpha(\Theta)\right) + |m|^2. \tag{5.6.34}
\]

Here we have used the fact that the only non-vanishing elements of the Killing form
are gαα and gjk (and using gαα = (Eα , Eα )), the inner product of the basis elements

with themselves, and the restricted Cartan-Weyl basis to simplify the variational
equations. The nonzero functional derivatives are:
 
\[
\Omega\frac{\delta t}{\delta\dot{\psi}^{\alpha,r}} = g_{\alpha\alpha}\left(\frac{\dot{\psi}^{\alpha,r} + \dot{\phi}^{\alpha,r}\cos\alpha(\Theta)}{\dot{t}} + \lambda^{\alpha,r}\right) \tag{5.6.35}
\]
\[
\Omega\frac{\delta t}{\delta\dot{\phi}^{\alpha,r}} = g_{\alpha\alpha}\left(\frac{\dot{\phi}^{\alpha,r}}{\dot{t}} + \left(\frac{\dot{\psi}^{\alpha,r}}{\dot{t}} + \lambda^{\alpha,r}\right)\cos\alpha(\Theta)\right) \tag{5.6.36}
\]
\[
\Omega\frac{\delta t}{\delta\dot{\theta}^{k}} = g_{kl}\frac{\dot{\theta}^{l}}{\dot{t}} \tag{5.6.37}
\]
\[
\Omega\frac{\delta t}{\delta\theta^{k}} = -\sum_{\alpha,r} g_{\alpha\alpha}\left(\frac{\dot{\psi}^{\alpha,r}}{\dot{t}} + \lambda^{\alpha,r}\right)\dot{\phi}^{\alpha,r}\,\alpha(H_k)\sin\alpha(\Theta) \tag{5.6.38}
\]
\[
\Omega\frac{\delta t}{\delta\lambda^{\alpha,r}} = \dot{\psi}^{\alpha,r} + \dot{\phi}^{\alpha,r}\cos\alpha(\theta). \tag{5.6.39}
\]

From the Euler-Lagrange equations (see section C.5.4 generally) and by design from
the connection we have the constraints:

k -1 k̇ = − cos adΘ (ċc -1 ). (5.6.40)

We assume again the Lagrange multipliers are constant. As for the case of SU (2),
each Lagrange multiplier λα,r then becomes a global gauge degree of freedom in the
sense that:

∂T
=0 (5.6.41)
∂λα,r

with ψ̇(s) becoming local gauge degrees of freedom under the constraint:

δT
= 0. (5.6.42)
δ ψ̇ α,r

We are free to choose the gauge trajectory ψ̇(s) as:

k -1 k̇/ṫ + Λ = 0 (5.6.43)

With this choice, we have also by the remaining Euler-Lagrange equations:

ċc -1 /ṫ = constant (5.6.44)

and

Θ̇/ṫ = constant. (5.6.45)

Minimizing over the constant θ̇:


\[
\Omega\frac{\partial T}{\partial\dot{\theta}^{k}} = \int_\gamma g_{kl}(\dot{\theta}^l/\dot{t})\,ds = 0 \tag{5.6.46}
\]

we see:

Θ = constant. (5.6.47)

The above functional equations show that when the action is varied, each term in
equation (5.6.8) vanishes apart from the sin adΘ (dcc -1 ) term. Calculating optimal
evolution time then reduces to minimization over initial conditions:

\[
T = \min_{\Theta,\Phi}\left|\sin\mathrm{ad}_\Theta(\Phi)\right| \tag{5.6.48}
\]
where
\[
\Phi = \int_\gamma -i\,dc\,c^{-1}. \tag{5.6.49}
\]

We can express this in terms of the Cartan-Weyl basis as follows:
\[
\Phi = \sum_{\alpha,r}\dot{\phi}^{\alpha,r}F_{\alpha,r} + m \tag{5.6.50}
\]
where:
\[
F_{\alpha,r} = \frac{1}{\alpha(\Theta)}\mathrm{ad}_\Theta(E_{\alpha,r}) \in \mathfrak{k} - \mathfrak{m} \tag{5.6.51}
\]
\[
\mathrm{ad}^2_\Theta(E_{\alpha,r}) = \alpha(\Theta)^2 E_{\alpha,r} \in \mathfrak{p} - \mathfrak{a} \tag{5.6.52}
\]
\[
\mathrm{ad}_\Theta(m) = 0 \tag{5.6.53}
\]
such that:
\[
\sin\mathrm{ad}_\Theta(\dot{\Phi}) = \sum_{\alpha,r}\dot{\phi}^{\alpha,r}\sin\alpha(\Theta)E_{\alpha,r} \in \mathfrak{p}. \tag{5.6.54}
\]

The minimisation can be expressed in terms of parameters noting the trace as a


rank-2 tensor contraction (see section C.1.15). The problem of finding the minimal
time Hamiltonian is thereby simplified considerably. Consider targets of the form:

\[
U(T)U_0^{-1} = e^{-iX} \in K. \tag{5.6.55}
\]

Again:

\[
U(T)U_0^{-1} = e^{-iX} = ke^{i\Theta}c = qe^{ic^{-1}\Theta c}. \tag{5.6.56}
\]

Explicitly we equate:

\[
U(T) = q \qquad U(0) = e^{ic^{-1}\Theta c}. \tag{5.6.57}
\]

As was the case for SU (2), we select initial conditions as:

U0 = eiΘ (5.6.58)

where we can see a quantization condition emerges by satisfying the commutant


condition that c = exp(Φ) ∈ K (with Φ ∈ k) commute with Θ:
\[
\left\{\exp(\Phi) \in K : \exp(\Phi)\,\Theta\,\exp(-\Phi) = \Theta\right\} = M \tag{5.6.59}
\]

where M can be regarded as the group elements generated by the commutant algebra
m, that is M = exp(m). Such a condition is equivalent to ΘcΘ -1 = c. In the
SU (2) case, because we only have a single generator in Φ, the condition manifests
in requiring the group elements c ∈ K to resolve to ±I, which in turn imposes a
requirement that their parameters ϕj be of the form ϕj = 2πn for n ∈ Z in order for e^{iϕj kj} = I where kj ∈ k. In general, however, the commutant will have a nontrivial
connected subgroup and it still “quantizes” into multiple connected components.
In practice this means that ϕk may not, in general, be integer multiples of 2π and
instead must be chosen to meet the commutant condition in each case. We note
that:

q(T ) = k(T )c(T ) = e−iX (5.6.60)



or equivalently
\[
X = \int_\gamma i\,dq\,q^{-1} \tag{5.6.61}
\]
\[
= \int_\gamma k\left(ik^{-1}dk + i\,dc\,c^{-1}\right)k^{-1} \tag{5.6.62}
\]
\[
= \int_\gamma k\left((1 - \cos\mathrm{ad}_\Theta)\,i\,dc\,c^{-1}\right)k^{-1} \tag{5.6.63}
\]
\[
= -(1 - \cos\mathrm{ad}_\Theta)\,\Phi \tag{5.6.64}
\]

where we have used the minimal connection in equation (5.6.11), where the last
equality comes from k(0) = I and the arc-length parametrisation of γ between
s ∈ [0, 1]. The optimal time is:

\[
\Omega T = \min_{\Theta,\Phi}\left|\sin\mathrm{ad}_\Theta(\Phi)\right| \tag{5.6.65}
\]
with minimization over constraints:
\[
e^{\Phi} \in M \qquad \text{and} \qquad X = (1 - \cos\mathrm{ad}_\Theta)(\Phi). \tag{5.6.66}
\]

Time-optimal control is given by Hamiltonians of the form:
\[
H(t) = e^{-i\Lambda t}\sin\mathrm{ad}_{\Theta^*}(\Phi^*)\,e^{i\Lambda t} \tag{5.6.67}
\]
where Φ̇ = Φ/T, Θ∗ and Φ∗ are the critical values which minimize the time, and the multiplier is the turning rate:
\[
\Lambda = -k^{-1}\dot{k}/\dot{t} \tag{5.6.68}
\]
\[
= -\int_\gamma k^{-1}\dot{k}\,/\,T \tag{5.6.69}
\]
\[
= \frac{\cos\mathrm{ad}_{\Theta^*}(\Phi^*)}{T}. \tag{5.6.70}
\]

The global minimization then depends upon the choice of Cartan subalgebra a ∋ Θ
(as illustrated in the examples above).

5.7 Minimal connections and holonomy groups

A simple application of the product rule and the Euler formula:

eiΘ Xe−iΘ = eiadΘ (X) = cos adΘ (X) + i sin adΘ (X) (5.7.1)

reveals:
\[
dU\,U^{-1} = k\left(k^{-1}dk + \cos\mathrm{ad}_\Theta(dc\,c^{-1}) + i\,d\Theta + i\sin\mathrm{ad}_\Theta(dc\,c^{-1})\right)k^{-1}. \tag{5.7.2}
\]

Symmetric controls:

−iHdt ∈ p (5.7.3)

which satisfy the Lie triple property:

[p, [p, p]] ⊆ p (5.7.4)

define the connection:

k -1 dk = − cos adΘ (dcc -1 ) (5.7.5)

which is minimal in the sense that it minimizes the invariant line element
\[
\min_{k\in K}|i\,dU\,U^{-1}|^2 = \min_{k\in K}|ik^{-1}dk + i\cos\mathrm{ad}_\Theta(dc\,c^{-1})|^2 + |d\Theta|^2 + |\sin\mathrm{ad}_\Theta(dc\,c^{-1})|^2 \tag{5.7.6}
\]

where the minimization is over curves, k(t). This means that minimization with a
horizontal constraint is equivalent to minimization over the entire isometry space.
Such submanifolds are said to be totally geodesic. Targets U (T ) = e−iX with effec-
tive Hamiltonians:

−iX ∈ k ⊇ [p, p] (5.7.7)

are known as holonomies because they are generators of holonomic groups, groups
whose action on a representation (e.g. the relevant vector space) causes vectors to be
parallel-transported in space i.e. the covariant derivative vanishes along such curves
traced out by the group action. We discuss holonomic groups in definition C.1.32.
Intuitively, one can consider holonomic groups as orbits such that any transformation
generated by elements of k will ‘transport’ the relevant vector along paths represented
as geodesics (definition C.1.35). However, such vectors are constrained to such orbits
if those generators are only drawn from k i.e. [k, k] ⊆ k for a chosen subalgebra k. To
transport vectors elsewhere in the space, one must apply generators in p which is
analogous to shifting a vector to a new orbit. Although in application, one considers
U (0) = 1, it is important to remember this variational problem is right-invariant, so
one could just as well let U(0) = U0 be arbitrary, as long as the target is understood to correspondingly be U(T) = e^{−iX}U0. In this case, the time-optimal paths are subRiemannian geodesics (see section C.4.1) which differ from Riemannian geodesics

to the extent that, intuitively put, evolution in the direction of k is required with the
distinction between the two types of geodesics captured by the concept of geodesic
curvature.

5.8 Cayley transforms and Dynkin diagrams


In this section, we give an example of computing the Cayley transform of a Cartan
subalgebra h in order to increase its intersection with non-compact subalgebras of
g. This is of interest because the method described above in this Chapter seeks a
maximally abelian subalgebra a ⊂ h ⊂ g from which to deduce the Cartan KAK
decomposition. We discuss background theory on this topic in section B.5.3 (follow-
ing [9]).


[Figures 5.1 and 5.2 depict rotations generated by e^{ad_{−iλ4}Θ} and e^{ad_{λ4}Θ}, taking the vertical axis (−iH_III^⊥ and H_III^⊥ respectively) into −iλ5 and λ5 respectively.]

Figure 5.1: Cayley transform of −iH_III^⊥ expressed as a rotation into −iλ5. The presence of the imaginary unit relates to the compactness here of g, which reflects the boundedness and closed nature of the transformation characteristic of unitary transformations.

Figure 5.2: Cayley transform of H_III^⊥ expressed as a rotation into λ5. By contrast with the case to the left, the absence of the imaginary unit is indicative of non-compactness, such that distances are not preserved (unlike in the unitary case where −i is present).


Recall the maximally compact Cartan subalgebra h = ⟨−iH_III, −iH_III^⊥⟩ ⊂ k in (5.4.72) is entirely within k. To obtain a maximally non-compact Cartan subalgebra, we conjugate h via an appropriately chosen group element (see Section 7 of Part VI of [9]). In principle, we could read off the combination ⟨−iH_III, −iλ5⟩ (such that h ∩ p = ⟨−iλ5⟩) from the commutation table in Table 5.1.
To demonstrate the application of a Cayley transform to obtain the requisite
maximally non-compact Cartan subalgebra, we use a root (definition B.4.3) (in
our case, γ below) to construct a generator of a transformation to a new Cartan
subalgebra whose intersection with p increases by one dimension. We construct a

root system (section B.4.4) in order to find a Cayley transformation C such that:


\[
\langle -iH_{III}, -iH_{III}^{\perp}\rangle \to \langle -iH_{III}, -i\lambda_5\rangle. \tag{5.8.1}
\]

By inspection we seek a transformation from −iHIII⊥ → −iλ5 . To find the Cayley


transformation, we first construct a root system via root vectors as follows:

\[
\frac{\lambda_1 + i\lambda_2}{2} = |0\rangle\langle 1| = e_\alpha \qquad \frac{\lambda_4 + i\lambda_5}{2} = |0\rangle\langle 2| = e_\gamma \qquad \frac{\lambda_6 + i\lambda_7}{2} = |1\rangle\langle 2| = e_\beta \tag{5.8.2}
\]
\[
\frac{\lambda_1 - i\lambda_2}{2} = e_{-\alpha} \qquad \frac{\lambda_4 - i\lambda_5}{2} = e_{-\gamma} \qquad \frac{\lambda_6 - i\lambda_7}{2} = e_{-\beta}. \tag{5.8.3}
\]

Note that the root vectors are essentially raising operators that promote the system
from a lower to a higher energy state (as indicated by the ket-bra notation), while
their adjoints are lowering operators. The elements of h are linear combinations of
λ3 , λ8 which form a basis for h. To obtain the roots, we conjugate by this basis
which spans our original h above:

\[
[\lambda_3, e_\alpha] = 2e_\alpha \qquad [\lambda_8, e_\alpha] = 0 \qquad \alpha = (2, 0) \tag{5.8.4}
\]
\[
[\lambda_3, e_\gamma] = e_\gamma \qquad [\lambda_8, e_\gamma] = \sqrt{3}e_\gamma \qquad \gamma = (1, \sqrt{3}) \tag{5.8.5}
\]
\[
[\lambda_3, e_\beta] = -e_\beta \qquad [\lambda_8, e_\beta] = \sqrt{3}e_\beta \qquad \beta = (-1, \sqrt{3}). \tag{5.8.6}
\]

The roots are given by α, γ, β. As γ is a linear combination of α and β, the latter two are the simple roots for this system. Denote H_III^⊥ = Hγ, then note:
\[
e_\gamma + e_{-\gamma} = |0\rangle\langle 2| + |2\rangle\langle 0| = \lambda_4. \tag{5.8.7}
\]

Conjugating by Hγ we have:
\[
[H_\gamma, e_\gamma + e_{-\gamma}] = 2(e_\gamma - e_{-\gamma}) = 2i\lambda_5 \tag{5.8.8}
\]
and:
\[
\left[\tfrac{1}{2}i\lambda_4, -iH_\gamma\right] = -i\lambda_5 \qquad \left[\tfrac{1}{2}i\lambda_4, -iH_{III}\right] = 0. \tag{5.8.9}
\]

Cayley transformations take the form of rotations (conjugations i.e. the adjoint
action) by the angle π/2. We include two diagrams in Figures 5.1 and 5.2 out of
interest. The rotation in Fig. 5.1 represents a transformation by a compact group
element (unitary) preserving distances and angle. For contrast, a rotation where the
generator lacks the imaginary unit coefficient (Fig. 5.2), the geometry is hyperbolic
in nature, reflecting non-compact transformation. This gives an interesting geomet-

ric intuition about the role of the imaginary unit. Our generator of transformations
is iλ4 , thus the Cayley transformation for Hγ becomes:

\[
C_\gamma = e^{\frac{\pi}{4}\mathrm{ad}_{i\lambda_4}} \qquad C_\gamma(-iH_\gamma) = -i\lambda_5. \tag{5.8.10}
\]
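This transform is straightforward to verify numerically; a sketch (numpy/scipy) conjugates −iHγ by the group element exp(i(π/4)λ4), which implements e^{(π/4) ad_{iλ4}}:

```python
import numpy as np
from scipy.linalg import expm

l4 = np.array([[0, 0, 1], [0, 0, 0], [1, 0, 0]], dtype=complex)
l5 = np.array([[0, 0, -1j], [0, 0, 0], [1j, 0, 0]])
Hg = np.diag([1.0, 0.0, -1.0]).astype(complex)   # H_gamma = H_III_perp = |0><0| - |2><2|

C = expm(1j * np.pi / 4 * l4)                    # group element for e^{(pi/4) ad_{i l4}}
assert np.allclose(C @ (-1j * Hg) @ C.conj().T, -1j * l5)
```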

The maximally non-compact Cartan subalgebra h under this transformation is:
\[
\mathfrak{h} = \mathfrak{t} \oplus \mathfrak{a} = \langle -iH_{III}, -i\lambda_5\rangle. \tag{5.8.11}
\]

We can see this visually in Table 5.1 via the commutation relations along the −iH_III row, where such a choice of Cartan subalgebra intersects with both p and k. We could also have chosen h = ⟨−iH_III, −iλ4⟩.
Diagrammatically, we can understand the relationship between roots and state
transitions as set out in Figure 5.3 which shows the relationship of roots to transitions
between energy states in the lambda system. The root system can also be described

[Figure 5.3 shows a three-level transition diagram: α connects levels 0 and 1, β connects levels 1 and 2, and γ connects levels 0 and 2.]

Figure 5.3: A transition diagram showing the relationship between energy transitions and roots. In a quantum control context, a transition between two energy levels of an atom can be described by a root vector in the Hamiltonian. For example, a transition |0⟩ → |1⟩ can be described using the root vector eα. An electromagnetic pulse with a frequency resonant with the energy difference between these two levels can, if applied correctly, transition the system consistent with the action of eα.

as set out in Fig. 5.4 where the angles between root vectors (we give α as the
example) are calculated using the Cartan matrix below. We can represent the

[Figure 5.4 shows the root vectors α (λ1 + iλ2), γ (λ4 + iλ5) and β (λ6 + iλ7) as vectors in the plane.]

Figure 5.4: Symmetric root system diagram for the root system described via roots in equation (5.8.3) for the Lie algebra su(3). The roots α, β, γ can be seen in terms of angles between the root vectors and can be calculated using the Cartan matrix.

relevant Dynkin diagram (definition B.4.6) for this root system. Recall the entries

Aij of the associated Cartan matrix (definition B.4.5) are calculated by reference to
the roots as Aij = 2(α · β)/(α · α) where (·) denotes the Euclidean inner product.
The Cartan matrix encodes information about angles between simple roots and
ratios of their lengths. Angles between different simple roots will be off-diagonal
elements in the Cartan matrix of the form −1, −2, ... (diagonal entries are 2). Given
α · α = 4 and β · β = 4 for (α, β), and α · β = −2, we have Aαβ = 2(α · β)/(α · α) = 2cos(θ) (for roots of equal length), with θ being the angle between the roots, so Aαβ = −1. Noting that:

α · β = |α||β| cos(θ) (5.8.12)


−2 = (2)(2) cos(θ) (5.8.13)

we find cos(θ) = −1/2 such that θ = 2π/3. The Dynkin diagram represents a root
system for su(3). Each node represents one of the simple roots (α, β) connected
by a single line. By convention, the lines connecting simple roots reflect the angle
between them. For an angle of 2π/3, we have one line connecting the nodes. Hence
we see the connection between root systems as represented by the Dynkin diagram
in Figure (5.5) (see [24] and references therein for a discussion on the relation of
resonance to optimality).

[Figure 5.5: the Dynkin diagram for su(3), two nodes α and β joined by a single line, shown alongside the root system with the 2π/3 angle between the simple roots.]

Figure 5.5: Combined diagram of a Dynkin diagram and a symmetric root system with specified angles and relations.

5.8.1 Hamiltonian

Below is a toy model of how such roots relate to a Lambda system Hamiltonian
with energy levels labeled as |0⟩ , |1⟩ , |2⟩. Here the root vectors from the Lie algebra
correspond to transition operators between these states and are incorporated into

the Hamiltonian as:
\[
H = \sum_j \omega_j H_j + \sum_{\alpha\in\Delta} g_\alpha E_\alpha \tag{5.8.14}
\]

where ωj are the energy eigenvalues corresponding to the Cartan subalgebra elements
Hj , and gα are coupling constants for the transitions. The Hj and Eα correspond
to the Cartan and non-Cartan elements of g, respectively. The root vectors Eα cor-
respond to transition operators between energy states. For example, if Eα = |0⟩⟨1|,
it induces a transition from state |1⟩ to state |0⟩ when the system interacts with
a resonant control field. The non-Cartan subalgebra elements, which correspond
to the root vectors of the Lie algebra, drive the transitions among different energy
levels. The Cartan subalgebra elements correspond to the diagonal elements in the
Hamiltonian, which can be thought of as the stationary energy levels. By contrast, the non-Cartan elements are off-diagonal, representing the possible transitions or interactions between these energy levels.
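A toy numerical sketch of (5.8.14) follows (numpy; the level energies ωj and couplings below are illustrative assumptions, not values from the text):

```python
import numpy as np

def ketbra(i, j, d=3):
    m = np.zeros((d, d), dtype=complex); m[i, j] = 1.0
    return m

w = [0.0, 1.0, 2.5]                       # illustrative level energies (Cartan part)
g_a, g_b = 0.1, 0.05                      # illustrative coupling constants

H = sum(w[j] * ketbra(j, j) for j in range(3))
H += g_a * (ketbra(0, 1) + ketbra(1, 0))  # e_alpha + e_{-alpha}: |0> <-> |1| coupling
H += g_b * (ketbra(1, 2) + ketbra(2, 1))  # e_beta  + e_{-beta}:  |1> <-> |2| coupling
assert np.allclose(H, H.conj().T)         # Hermitian control Hamiltonian
```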
Appendix A

Appendix (Quantum Information


Processing)

A.1 Overview

Quantum information processing is characterised by constraints upon how informa-


tion is represented and processed arising from the foundational postulates of quan-
tum mechanics. While information as the subject of enquiry within physics has a
long lineage, quantum information processing as an emergent and distinct discipline
arose largely through the convergence (and often synthesis) of concepts in compu-
tational and informational sciences and quantum physics, principally through the
vision of quantum computing envisaged by Feynman [236] and others. The mod-
ern day field incorporates concepts from quantum physics, computer science and
information and communication theory (pioneered by Shannon and others [237]).
Quantum information theory is often classified into three overlapping but distinct
sub-domains of quantum computing, quantum communication and quantum sens-
ing. Our focus in this work is in the first of these areas, quantum computing, albeit
we leverage concepts drawn from communication theory such as quantum channels
and registers.
Below we provide an overview of key elements of quantum postulates [238] and
quantum information processing for classical machine learning practitioners solving
optimisation problems for quantum engineering. In particular, we describe ways in
which quantum data are typically represented (such as via tensors), how quantum
processes are usually expressed in common programming languages. Quantum in-
formation processing draws upon formalism and conceptual framing from a range of
disciplines, including analysis, information theory and complexity theory, together
with algebra, geometry and other numerous branches of mathematics. All quantum
information processing rests, however, upon the mathematical formalism of quan-


tum mechanics itself. The study of such formalism usually begins with the study
of quantum axioms or postulates, a set of assumptions about quantum systems and
their evolution derived from observation and from which other results and properties
of quantum systems are deduced. There are a variety of ways to express such ax-
ioms according to chosen formalism (and indeed debate continues within fields such
as quantum foundations, mathematics and philosophy as to the appropriateness of
such postulates). In the synopses below, we have followed the literature in [11, 45]
and [43], making a few adjustments where convenient (such as splitting the measure-
ment postulate in two). We partly frame postulates information-theoretic fashion
concurrently with traditional formulations. The axioms form a structured ordering
of this background material, anchoring various definitions, theorems and results rel-
evant to establishing and justifying methods and results in the substantive Chapters
above. We proceed to discussing key elements of open quantum systems (relevant
to, for example, the simulated quantum machine learning dataset the subject of
Chapter 3) together with elements of the theory of quantum control (which we later
expand upon in terms of algebraic and geometric formulations). For those familiar
with quantum mechanics and quantum information processing, this section can be
skipped without loss of benefit. Most of the material below is drawn from stan-
dard texts in the literature, including [43, 45, 238]. Proofs are omitted for the most
part (being accessible in the foregoing or other textbooks on the topic). We begin
with the first of the quantum postulates (which we denote axioms following [45]) of
quantum mechanics, that related to quantum states.

A.1.1 State space


In this section, we set out a number of standard formulations for conceptualising
quantum information and quantum control problems. Advanced readers familiar
with quantum formalism can progress to other sections without loss of generality.
We begin with vector spaces and some results from functional analysis. In later
Chapters, we will equate typical Hilbert space formalism in terms of geometric and
algebraic concepts related to manifolds, fibres and connections. The first axiom
concerns representations of quantum systems in terms of states.

Axiom A.1.1 (Quantum states). The state of a quantum system is represented by


a unit vector ψ in an appropriate Hilbert space H. Two unit vectors ψ1 , ψ2 ∈ H,
where ψ2 = cψ1 for c ∈ C, correspond to the same physical state. In density operator
formalism, the quantum system is described by the linear operator ρ ∈ B(H) acting
on H. For bounded A ∈ B(H), the expectation value of A is Tr(ρA). Composite
quantum systems are represented by tensor products ψk ⊗ ψj (or equivalently in
operator formalism, ρk ⊗ ρj ).

The axiom above provides that quantum systems are represented by (unit) state
vectors within a complex-valued vector (Hilbert) space H whose dimensionality is
determined according to the physics of the problem. There are various formalisms for
representing such state vectors in quantum information processing, a common one
being Dirac or ‘bra-ket’ notation. In this notation, the state vector is represented
as a ‘ket’ |ψ⟩ ∈ H. Associated with a ket is a corresponding ‘bra’ notation ⟨ψ|
which strictly speaking is a linear (one) form (or function) that acts on |ψ⟩ such
that ⟨ψ1 |ψ2 ⟩ ∈ C for two states |ψ1 ⟩ , |ψ2 ⟩. Quantum states are typically defined
as elements of a Hilbert space H, a vector space over C equipped with an inner
product. To this end, we introduce the definition of a state space.

Definition A.1.1 (State space). A state space V(K) is a vector space over a field K, typically C. Elements ψ ∈ V represent possible states of a quantum system. In particular, for a_k ∈ C and ψ_k ∈ V, the state ψ = Σ_k a_k ψ_k ∈ V.

Evolutions of quantum systems over time are described by mappings within


this state space (e.g. linear operators on V ). Importantly, as can be seen above,
any linear combination of basis states ψk (or in braket notation, |ψk ⟩) is also a
quantum state, an assumption crucial to the existence of two-level qubit systems
and entanglement. The state space of a quantum system is fundamental in the
description of quantum states, where the vectors are typically normalized to have a
unit norm due to the probabilistic interpretation of quantum mechanics. Later, we
refine our working definition of state space in terms of Hilbert spaces, operators and
other concepts such as channels. Our aim is to provide sufficient depth and breadth
for readers in order to connect quantum state space with concepts from algebra,
representation theory, geometry and statistical learning. While we utilise standard
bra-ket notation and formalism, we also incorporate a more quantum information-
theoretic approach following Watrous [43] who defines states in terms of registers
(alphabets or tuples thereof) as descriptions of quantum states according to an
alphabet Σ.

Definition A.1.2 (Classical registers and states). A classical register X is either


(a) a simple register, being an alphabet Σ (describing a state); or (b) a compound
register, being an n-tuple of registers X = (Yk ).
A classical state set Γ of X is then either (i) Γ = Σ or (ii) in the case of a compound
register, the Cartesian product of those state sets Σ = ×k Γk .

The formulation of classical states here maps largely to that found in computer
science literature where states are constructed from alphabets etc. Classical states
may be composed of subregisters which determine the overall state description. That
is, a register has configurations (states) it may adopt (with subregisters determining

higher-order registers). Probabilistic states are distributions over classical states the
register may assume, denoted P(Σ) with states represented by probability vectors
p(a) ∈ P(Σ). From the classical case, we can then construct the quantum infor-
mation representation of a quantum register and states which are represented by
density operators (defined below).

Definition A.1.3 (Quantum registers and states). A quantum state is defined as


an element of a complex (Euclidean) space CΣ for a classical state set Σ satisfying
specifications imposed by axioms of quantum mechanics (see below) as they apply to
states.

In the terminology of operators (see below) a quantum state is defined as a


density operator ρ ∈ B(H). We refer in parts to this formulation below and in later
Chapters. To come to our formal definition of Hilbert spaces H, first we define the
inner product on V (K).

Definition A.1.4 (Inner Product). An inner product on V(K) is a mapping from a vector space to a field K, ⟨·, ·⟩ : V × V → K, (ψ, ϕ) ↦ ⟨ψ, ϕ⟩ for ψ, ϕ ∈ V, c ∈ K, satisfying:

(i) ⟨ψ|ϕ⟩ = ⟨ϕ|ψ⟩.

(ii) ⟨ϕ|ϕ⟩ ∈ R+ and ⟨ϕ|ϕ⟩ = 0 ⇐⇒ ϕ = 0.

(iii) ⟨cϕ|ψ⟩ = c ⟨ϕ|ψ⟩ and ⟨ϕ|cψ⟩ = c∗ ⟨ϕ|ψ⟩.

(iv) ⟨ϕ + ψ|χ⟩ = ⟨ϕ|χ⟩ + ⟨ψ|χ⟩ and ⟨ϕ|ψ + χ⟩ = ⟨ϕ|ψ⟩ + ⟨ϕ|χ⟩.

Inner products and bilinear forms are crucial components across quantum, alge-
braic and geometric methods in quantum information processing. Later, we examine
the relationship between differential (n-)forms and inner products (see Appendix B)
which are relevant to variational results in Chapter 5. There the inner product
is analogous to a bilinear mapping of V × V ∗ → K, implicit in the commonplace
‘braket’ notation of quantum information (where kets |ψ⟩ and ⟨ψ| are duals to each
other and inner products are given by ⟨ψ|ψ⟩). In the formalism below, we regard
|ψ⟩ as an element of a vector (Hilbert) space V (C) while the corresponding ⟨ψ| as an
element of the dual vector space V ∗ (C). Moreover, in later Chapters we adopt (and
argue for the utility of) a geometric approach where inner products and norms are
defined in terms of metric tensors g over manifolds and vector space relationships
to fibres. The inner product defined on a vector space V then gives rise to a norm
via the Cauchy-Schwartz inequality.

Definition A.1.5 (Cauchy-Schwarz Inequality). The Cauchy-Schwarz inequality applies to inner product spaces V(K) such that for ϕ, ψ ∈ V:
\[
|\langle\phi, \psi\rangle|^2 \leq \langle\phi, \phi\rangle\langle\psi, \psi\rangle.
\]
If ∥·∥ : V → R where ∥ψ∥ = √⟨ψ, ψ⟩, then the inner product defines a norm on V.

We are now equipped to define the norm on the vector spaces with specific
properties as follows.

Definition A.1.6 (Norm). Define a norm on a vector space V over K = R or C as a mapping ∥·∥ : V → R, ψ ↦ ∥ψ∥, such that for ψ, ϕ ∈ V and c ∈ K:
1. ∥ψ∥ ≥ 0 and ψ = 0 ⟺ ∥ψ∥ = 0.
2. ∥cψ∥ = |c|∥ψ∥.
3. ∥ϕ + ψ∥ ≤ ∥ϕ∥ + ∥ψ∥.

A norm ∥.∥ on V defines a distance (metric) function d on V as d(ϕ, ψ) = ∥ψ−ϕ∥.


As standard we can then define a generalised angle given two vectors ϕ, ψ ∈ V in
an inner product space where the cosine of the angle θ between ϕ and ψ is given by:

\[
\cos(\theta) = \frac{\langle\phi, \psi\rangle}{\|\phi\|\cdot\|\psi\|} \tag{A.1.1}
\]

A normed vector space is a Banach space if it is complete with respect to the


associated distance function (e.g. inner product). Banach spaces are equipped with
certain convergence and boundedness properties which enable among other things
definitions of bounded linear functionals and dual spaces, both of which are relevant
to results in later Chapters. Banach spaces allow us to define an operator norm.

Definition A.1.7 (Operator norm). If V1(K) and V2(K) are normed spaces, the linear mapping A : V1(K) → V2(K) is bounded if
\[
\sup_{\psi\in V_1\setminus\{0\}}\frac{\|A\psi\|}{\|\psi\|} < \infty.
\]

Intuitively this tells us that linear mappings on V (K) remain within the closure
of V (K). The set of such linear forms a complementary vector space denoted the
dual space.

Definition A.1.8 (Dual space). A bounded linear functional on a normed vector space V(K) is a bounded linear map χ : V → K, ψ ↦ a∥ψ∥ for some (scaling) a ∈ K. The norm is given via the field norm ∥·∥ = |·| ∈ K. The set of all such bounded linear functionals is denoted the dual space V*(K) to V(K).
If V(K) is normed then V*(K) is also a Banach space. Moreover, we have that for ψ ∈ V and χ ∈ V*:

1. χ(ψ) ∈ K.
2. |χ(ψ)| ≤ ∥χ∥∥ψ∥.
3. ∀χ ∈ V ∗ , χ(ψ) = 0 ⇐⇒ ψ = 0.

Note in numbered item 2 above, duals can also be thought of as types of vectors
themselves and as functions that take vectors as arguments. We discuss dual spaces
and functionals in further chapters, particularly in relation to representations of Lie
algebras and differential forms. With the concept of a dual space, inner product and
norm, we can now define the Hilbert space.

Definition A.1.9 (Hilbert Space). A Hilbert space is a vector space H(K) with an inner product ⟨·, ·⟩ complete in the norm defined by the inner product ∥ψ∥ = √⟨ψ, ψ⟩.

The norm on H satisfies:



\[
\|\psi\|^2 := \sum_{j=1}^{\infty}\|\psi_j\|^2 < \infty.
\]

Often we are interested in direct sums or direct (tensor) products of Hilbert spaces
such as tensor product of Hilbert spaces (crucial to multi-qubit quantum compu-
tations). We discuss these in later Chapters. In some cases, multipartite Hilbert
spaces may be characterised by direct sums rather than direct (tensor) products.
This is the case where a Hilbert space is decomposable into disjoint regions.

Proposition A.1.1 (Hilbert Space Direct Sum). The direct sum of a sequence of separable Hilbert spaces Hj, denoted by
\[
\mathcal{H} := \bigoplus_{j=1}^{\infty}\mathcal{H}_j,
\]
has a representation as a space of sequences ψ = (ψ1, ψ2, ψ3, ...) where ψj ∈ Hj.

A direct sum of each Hj is the set of ψ = (ψ1 , ψ2 , ψ3 , . . .) where ψj = 0 for all


but finitely many ψj . An inner product on the direct sum can be defined via:

\[
\langle\phi, \psi\rangle = \sum_{j=1}^{\infty}\langle\phi_j, \psi_j\rangle,
\]

for all ϕj , ψj ∈ Hj and ψ, ϕ ∈ H. Each subspace inherits the inner product and
associated completeness properties of H, confirming each Hj as a Hilbert space. The
partitioning of a Hilbert space via a Cartan decomposition of Lie algebraic generators
(which form a Hilbert space as a vector space upon which an inner product is defined)
g into g = k ⊕ p is an example of such a decomposition which preserves the overall
structure of the vector space within which quantum state vectors subsist [239]. Thus
the essential properties of quantum state vectors are retained in the case of such
decomposition. Hilbert spaces have a number of features of importance to quantum
computational contexts. These include certain orthogonality relations which are
important to their decomposition in ways that can simplify problems (such as those
of unitary synthesis that we examine in later Chapters).

Definition A.1.10 (Orthogonality and orthonormality). A Hilbert space H exhibits


the following relations related to orthogonality:

1. ψ, ϕ ∈ H are orthogonal if ⟨ψ, ϕ⟩ = 0.

2. (Orthogonal space) If Hi ⊂ H, the orthogonal subspace Hi⊥ ⊂ H of Hi is:

Hi⊥ = {ϕ ∈ H|⟨ψ, ϕ⟩ = 0 for all ψ ∈ Hi }

In many quantum information tasks, we assume that states ψ ∈ H can be de-


composed into a direct sum Hi ⊕ Hi⊥ = H such that ψ = ψi + ψi⊥ . States are then
represented in terms of an orthonormal basis for H.

Definition A.1.11 (Orthonormal Basis). The set {ej }, ej ∈ H, j = 1, ..., dim H is


an orthonormal basis if ⟨ej , ek ⟩ = δjk (the Kronecker delta) and H = span{aej |a ∈
C}. In this case:
\[
\|\psi\|^2 = \sum_j |a_j|^2 \tag{A.1.2}
\]

The boundedness and linearity of states means we can also consider them as (convergent) sums ψ = Σ_j a_j e_j, a_j ∈ C where a_j = ⟨e_j, ψ⟩ ∈ C.

A.1.1.1 Tensor product states

In many cases of interest, we have disjoint or multi-state (e.g. multi-qubit) quantum


systems of interest. This is especially the case with the multi-qubit systems the
subject of Chapters 3 and 4. Vector spaces for multi-state systems are represented
by tensor products of vector spaces and Hilbert spaces which we define below.

Definition A.1.12 (Tensor Product). A tensor product of two vector spaces V1(K), V2(K) is itself a vector space W(K) equipped with a bilinear map T : V1(K) × V2(K) → W(K).

To allow us to represent a single (equivalent) tensor product between two such


vector spaces, we note the following ‘universal property’ [45] which allows elements
of W to be written as elements u ⊗ v where u ∈ V1 , v, v ∈ V2 . Given another vector
space Z(K) along with a bilinear mapping Φ : V1 × V2 → Z, it can be shown that
e : W → Z such that the diagram below commutes:
there exists a unique linear map Φ
It can then be shown that the tensor product of V1 , V2 exists and is unique up to a

[Figure A.1 shows the commutative triangle: T : V1(K) × V2(K) → W(K), with Φ : V1 × V2 → Z factoring as Φ = Φ̃ ∘ T through the unique map Φ̃ : W → Z.]

Figure A.1: Commutative diagram showing the universal property of tensor products discussed above.

canonical isomorphism. This allows us to represent the mapping T in terms of the


direct product T : V1 × V2 → V1 ⊗ V2, (u, v) ↦ u ⊗ v.
The universal property guarantees that given Φ : V1 × V2 → Z, there will exist a map Φ̃ : V1 ⊗ V2 → Z with Φ̃(u ⊗ v) = Φ(u, v). Among other things this means that dim(V1 ⊗ V2) = (dim V1)(dim V2). Under such assumptions, arbitrary ψ ∈ V1 ⊗ V2 can then be decomposed as linear compositions of u ⊗ v for a given basis. It also allows the space of operators (linear maps Ak ∈ End(Vk)) to be constructed in tensor-product form by guaranteeing the existence of Ak ⊗ Aj : Vk ⊗ Vj → Vk ⊗ Vj exhibiting (a) homomorphic structure (Ak ⊗ Aj)(u ⊗ v) = (Ak u) ⊗ (Aj v) and (b) linear compositionality (Ak ⊗ Aj)(Ap ⊗ Aq) = (Ak Ap) ⊗ (Aj Aq). For tensor products
compositionality (Ak ⊗ Aj )(Ap ⊗ Aq ) = (Ak Ap ) ⊗ (Aj Ar ). For tensor products
of Hilbert spaces, we note that given multiple inner product spaces Vk with inner
products ⟨vp , vq ⟩k , there exists an inner product on the tensor product space ⊗k Vk
given by:

⟨up ⊗ vp , uq ⊗ vq ⟩ = ⟨up , uq ⟩k ⟨vp , vq ⟩j

where up , uq ∈ Vk and vp , vq ∈ Vj . In this way, a Hilbert tensor product ⊗k Hk space


is equipped with sufficient structure via the inner product and bounded operators
as per above (allowing, among other things, metric calculations relevant to machine
learning and variational techniques to be implemented). We note for completeness
that orthonormal bases {ej }k for Hk form a tensorial orthonormal basis for the
Hilbert space ⊗Hk given by {×k ej,k } (tensor product of orthonormal bases).
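These tensor-product properties are directly checkable numerically: np.kron realises both u ⊗ v and A ⊗ B, and the homomorphic structure and factorised inner product follow entrywise (a minimal sketch with random test data):

```python
import numpy as np

rng = np.random.default_rng(1)
u = rng.normal(size=2) + 1j * rng.normal(size=2)
v = rng.normal(size=3) + 1j * rng.normal(size=3)
A = rng.normal(size=(2, 2))
B = rng.normal(size=(3, 3))

# Homomorphic structure: (A (x) B)(u (x) v) = (Au) (x) (Bv).
assert np.allclose(np.kron(A, B) @ np.kron(u, v), np.kron(A @ u, B @ v))

# The induced inner product factorises over the tensor factors.
u2 = rng.normal(size=2) + 1j * rng.normal(size=2)
v2 = rng.normal(size=3) + 1j * rng.normal(size=3)
assert np.isclose(np.vdot(np.kron(u, v), np.kron(u2, v2)),
                  np.vdot(u, u2) * np.vdot(v, v2))
```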

A.1.1.2 Quantum State formalism

A significant proportion of quantum information research concentrates on single (or


multiple) qubit (quantum bit) systems. Qubits are two-level quantum systems (two-
dimensional state spaces) comprising state vectors with orthonormal bases {|0⟩ , |1⟩}
(and accompanied by corresponding operator representations below). In this formal-
ism a qubit can be defined as follows as a two-state quantum mechanical system:

|ψ⟩ = a |0⟩ + b |1⟩ . (A.1.3)

Qubits are normalised such that they are unit vectors ⟨ψ|ψ⟩ = 1 (that is |a|2 + |b|2 =
1), where a, b ∈ C are amplitudes and their squared moduli, the probabilities, for
measuring outcomes of 0 or 1 (corresponding to states |0⟩ , |1⟩ respectively). Here
⟨ψ|ψ ′ ⟩ denotes the inner product of quantum states |ψ⟩ , |ψ ′ ⟩. As can be observed
in equation (A.1.3), as distinct from a classical (deterministic or stochastic) state,
quantum states may subsist ontologically in a superposition of their basis states.
The stateful characteristic of quantum systems is also manifest in the first pos-
tulate of quantum mechanics, which associates Hilbert spaces to enclosed physical
systems under the assumption that ψ ∈ H provides complete information about the
system [238]. In quantum computing, qubits may be either physical (representing
the realisation of a two-level physical system conforming to the qubit criteria) or
logical, where a set of systems (e.g. a set of one or more physical qubits) behaves abstractly in the manner of a single qubit, which is important in error correction.
Quantum computations may also involve ancilla qubits used to conduct irreversible
logical operations [11].
For problems in quantum computing and control, we are interested in how quan-
tum systems evolve, how they are measured and how they may be controlled. To
this end, we include below a brief synopsis of operator formalism and evolution of
quantum systems. In Appendix B, we characterise these formal features of quantum
information processing algebraic and geometric terms. One of the consequences of
definition A.1.1 is the existence, unlike in the classical case, of quantum superposi-
tion states i.e. that a quantum system can exist simultaneously in multiple different
states until measured. Given a quantum system in states represented by unit vectors
|ψ1 ⟩ , |ψ2 ⟩ , . . . , |ψn ⟩ ∈ H, any linear combination of these states is also in H:
\[
|\psi\rangle = \sum_{k=1}^{n} c_k|\psi_k\rangle \tag{A.1.4}
\]

where c_k ∈ C are complex coefficients satisfying the normalization condition Σ_{k=1}^{n} |c_k|² = 1. Such a superposition state |ψ⟩ is a unit vector in H. It is a valid quantum state
1. Such a superposition state |ψ⟩ is a unit vector in H. It is a valid quantum state



under axiom A.1.1. The state |ψ⟩ encodes the probabilities of the system being
found in any of the basis states upon measurement, with the probability of finding
the system in state |ψk ⟩ given by |ck |2 .
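A minimal numerical illustration of (A.1.3)-(A.1.4) follows (numpy; the amplitudes are illustrative):

```python
import numpy as np

a, b = 0.6, 0.8j                      # illustrative amplitudes with |a|^2 + |b|^2 = 1
psi = a * np.array([1, 0]) + b * np.array([0, 1])
probs = np.abs(psi) ** 2              # measurement probabilities |a|^2, |b|^2
assert np.isclose(probs.sum(), 1.0)   # unit norm <psi|psi> = 1
print(probs)                          # [0.36, 0.64]
```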

A.1.2 Operators and evolution


Often it is more convenient to work with (and consider evolutions of) functions
and algebras associated with operators on H rather than ψ ∈ H directly. This
operator formalism manifests in both recasting state spaces in terms of density
operators and considering maps between operators in terms of quantum channels.
Such mappings also serve to connect quantum formalism with the types of mappings
and formalism common within machine learning and geometry. We now consider
formalism associated with the second axiom of quantum mechanics, namely that
physical observables correspond to real eigenvalues of (measurement) operators.

Axiom A.1.2 (Observable Operators). For every measurable physical observable quantity m, defined by a real-valued function (denoted an operator) M on classical phase space, there exists a self-adjoint (Hermitian) linear measurement operator M acting on H. The result of a measurement of the state must be a real-valued eigenvalue of the measurement operator.

A.1.2.1 Operator formalism

In quantum information, operators are thought of as maps between Hilbert spaces.


To understand this, we begin by considering the general class of endomorphisms on
H (and later describe Hilbert-Schmidt space and channels) and define B(H) as the
space of bounded endomorphic linear maps of H:

Definition A.1.13 (Bounded linear operators). The space B(H) of bounded linear
operators A on H is defined by

B(H) = {A : H → H | A is linear and bounded},

where A is bounded if there exists a non-negative real number K such that:

∥Aψ∥ ≤ K∥ψ∥

for all ψ ∈ H. The smallest such K is the operator norm of A, denoted by ∥A∥.

We also note Riesz's Theorem: for any bounded linear functional ξ : H → C, there exists a dual element χ ∈ H such that ξ(ψ) = ⟨χ, ψ⟩ (and so an equivalence of norms). Riesz's Theorem guarantees the existence in such cases of an adjoint operator A†, which is crucial for Hermitian operators.

Definition A.1.14 (Adjoint of an Operator). For any A ∈ B(H), there exists a


unique linear operator A† : H → H, called the adjoint of A, such that

⟨ϕ, Aψ⟩ = ⟨A† ϕ, ψ⟩

for all ϕ, ψ ∈ H.

The adjoint is also known as the Hermitian conjugate. In physics, sometimes


the adjoint of such an operator, A† is denoted A∗ (we reserve the ∗ notation for
dual spaces). This can occasionally lead to some confusion as sometimes A∗ is
just considered the complex (entry-wise) conjugate. We set out a few important
properties of such operators below.

Proposition A.1.2 (Adjoint Operator Properties and Hermitian Operators). For


all A, B ∈ B(H) and α, β ∈ C we have

$(A^\dagger)^\dagger = A, \qquad (AB)^\dagger = B^\dagger A^\dagger, \qquad (\alpha A + \beta B)^\dagger = \bar{\alpha}A^\dagger + \bar{\beta}B^\dagger, \qquad I^\dagger = I.$

Note too that ∥A† ∥ = ∥A∥ < ∞. If A† = A, then A is self-adjoint which we denote
as Hermitian (or skew-self-adjoint if A† = −A).

Matrix representation of Hermitian conjugation is via the (complex)-conjugate


transpose $A^\dagger = (A^*)^T$. In braket notation we define $|\psi\rangle^\dagger \doteq \langle\psi|$. The adjoint op-
erator is also important practically as an element used when determining Cartan
involutions (and thus Cartan decompositions) with respect to symmetric and anti-
symmetric subspaces of g. This allows us to introduce unitary operators which are
crucial to quantum information processing and our results herein. Before we define
unitaries, we set out a few standard important operator classes.

Definition A.1.15 (Positive Semidefinite Operators). An operator A ∈ B(H) is


positive semi-definite if for all ψ ∈ H, ⟨ψ, Aψ⟩ ≥ 0. This equates to A = B † B for
some operator B ∈ B(H) such that:

⟨ψ, Aψ⟩ = ⟨ψ, B † Bψ⟩ = ⟨Bψ, Bψ⟩ = ||Bψ||2 ≥ 0 (A.1.5)

Denote the set of such operators as:

Pos(H) = {B † B|B ∈ B(H)}.

Positive semidefinite operators are Hermitian. A positive definite operator is a positive semi-definite operator that is also invertible, i.e. A ∈ Pos(H) such that det(A) ≠ 0. Positive semi-definite operators are commonly expressed via constraints on the inner product, namely that ⟨Av, v⟩ ≥ 0 for v ∈ H. Operators A ∈ B(H) that commute with their adjoint, i.e. AA† = A†A, are denoted normal operators and share the same set of eigenvectors as their adjoint. Using these definitions we can then specify two important classes, projection operators and isometries.

Definition A.1.16 (Projection Operators). An operator P ∈ Pos(H) is an (orthogonal) projection operator if P^2 = P and, for some closed subspace V ⊆ H, P acts as the identity on V and annihilates its orthogonal complement, i.e. Pv = v for v ∈ V and Pv = 0 for v ∈ V⊥.

Note that projection operators are normal operators and that the set of projec-
tion operators is denoted Proj(H). For each V there exists a P such that the image
of P is V .

Definition A.1.17 (Isometries). A bounded linear operator A ∈ B(H, Y) (for


Hilbert spaces H, Y) is defined to be an isometry if ∥Aw∥ = ∥w∥ for all w ∈ H
(i.e. it preserves the norm of every vector in H). The statement is equivalent to A†A = I_H (the identity operator on H). We define the set of such
isometries as:

U (H, Y) = {A ∈ B(H, Y) | A† A = IH }

In order for an isometry of the form A ∈ U (H, Y) to exist, it must hold that
dim(Y) ≥ dim(H). Every isometry preserves not only the Euclidean norm, but
inner products as well: ⟨Au, Av⟩ = ⟨u, v⟩ for all u, v ∈ H. Isometries allow us to then define unitary operators as the isometries in U(H, H), which map H to itself.

Definition A.1.18 (Unitary Operator). An operator U ∈ B(H) is unitary if it is surjective and:

⟨Uϕ, Uψ⟩ = ⟨ϕ, ψ⟩

for all ϕ, ψ ∈ H. Unitary operators preserve norms: ∥Uψ∥ = ∥ψ∥.

Unitary operators are thus a specific type of normal operator. From this defi-
nition, we deduce that ∥U ∥ = 1. U is unitary if and only if U † = U −1 such that
U U † = U † U = I. Hermitian operators, unitary operators and positive semi-definite
operators are central to quantum information processing (and the theory of quantum
mechanics generally). Importantly, there is a bijection from the space of Hermitian
operators on the complex space H to real Euclidean spaces. Unitary operators rep-
resent propagators of quantum state vectors ψ(t) ∈ H and, moreover, represent
solutions to Schrödinger’s equation (which we discuss in more detail), forming a


group. Not all evolution is unitary (e.g. open quantum systems and measurement
itself). However evolution of quantum systems modelled via unitary operators is
usually the foundational basis for more complex models.
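These operator properties are easy to check numerically. The sketch below (an illustrative numpy/scipy example, not code from the thesis) verifies that a randomly constructed Hermitian operator satisfies H† = H, that exponentiating −iH yields a unitary, and one of the adjoint identities of Proposition A.1.2:

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(0)

# Random Hermitian operator H = (X + X^dagger)/2 on a 2-dim Hilbert space
X = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))
H = (X + X.conj().T) / 2
assert np.allclose(H, H.conj().T)            # self-adjoint: H^dagger = H

# Exponentiating -iH yields a unitary: U U^dagger = I
U = expm(-1j * H)
assert np.allclose(U @ U.conj().T, np.eye(2))

# Adjoint identity from Proposition A.1.2: (AB)^dagger = B^dagger A^dagger
A, B = rng.normal(size=(2, 2)), rng.normal(size=(2, 2))
assert np.allclose((A @ B).conj().T, B.conj().T @ A.conj().T)
```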
For normal operators we have the spectral theorem, which states that every normal operator X ∈ B(H) can be expressed as a linear combination of projection operators Π_k onto distinct orthogonal subspaces; that is, such normal operators decompose into a sum of (complex) eigenvalues multiplying projection operators. This fact is expressed via the spectral theorem.

Theorem A.1.19 (Spectral Theorem). Given H with normal operator X ∈ B(H), there exist $\lambda_k \in \mathbb{C}$ (k = 1, ..., m) and projection operators $\Pi_k \in \mathrm{Proj}(\mathcal{H})$ with $\sum_k \Pi_k = I$ such that:

$X = \sum_{k=1}^{m} \lambda_k \Pi_k$

This decomposition is unique in the sense that each λ_k corresponds to an eigenvalue of X, and each Π_k projects onto the subspace formed by the eigenvectors associated with λ_k. Normal operators that may be decomposed in this way have an orthonormal basis of eigenvectors, and the same basis may be used for decomposing commuting normal operators. In particular, both Hamiltonians H and measurement operators are Hermitian, so the eigenvalues in the spectral decomposition above are real. Among other things, this allows the Jordan-Hahn decomposition [240], where the spectral decomposition $H = \sum_k \lambda_k \Pi_k$ is partitioned into:
$P = \sum_k \max\{\lambda_k, 0\}\,\Pi_k \qquad Q = \sum_k \max\{-\lambda_k, 0\}\,\Pi_k$ (A.1.6)

for P, Q ∈ Pos(H) with PQ = 0, allowing H to be written as:

H =P −Q (A.1.7)
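As a hedged illustration of the spectral theorem and the Jordan-Hahn decomposition, the following numpy sketch (the example operator is an arbitrary choice) reconstructs a Hermitian operator from its spectral projectors and partitions it as H = P − Q:

```python
import numpy as np

# Hermitian operator (illustrative); spectral theorem: H = sum_k lambda_k Pi_k
H = np.array([[1.0, 2.0], [2.0, -1.0]])
evals, evecs = np.linalg.eigh(H)

# Rank-1 spectral projectors Pi_k = |v_k><v_k|
projectors = [np.outer(evecs[:, k], evecs[:, k].conj()) for k in range(len(evals))]
H_reconstructed = sum(lam * Pi for lam, Pi in zip(evals, projectors))
assert np.allclose(H, H_reconstructed)

# Jordan-Hahn decomposition H = P - Q with P, Q >= 0 and PQ = 0 (equation A.1.6)
P = sum(max(lam, 0) * Pi for lam, Pi in zip(evals, projectors))
Q = sum(max(-lam, 0) * Pi for lam, Pi in zip(evals, projectors))
assert np.allclose(H, P - Q) and np.allclose(P @ Q, np.zeros_like(H))
```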

The spectral decomposition allows functions f to be written as $f(X) = \sum_k f(\lambda_k)\Pi_k$. The spectral decomposition is also related to more general decomposi-
tions, such as singular value decompositions which hold for arbitrary X ∈ B(H), and
polar decompositions (see [43, 45]) which are commonly used in machine learning
contexts. They are also related to the later focus of this work in terms of gener-
alised decompositions of Lie group manifolds, such as Cartan decompositions. We
now define an important operator, the trace.

Definition A.1.20 (Trace). Given a non-negative self-adjoint A ∈ B(H), the trace of A is given by:

$\mathrm{Tr}(A) = \sum_j \langle e_j, A e_j\rangle,$

for an orthonormal basis {ej } of H (where A is denoted ‘trace class’ if Tr(A) is


finite).

Note that for any A ∈ B(H), A†A is self-adjoint and non-negative. As a result, $\sqrt{A^\dagger A}$ is well-defined, given that for such self-adjoint operators and a bounded, measurable function f : σ(A) → C we have:

$f(A) = \int_{\sigma(A)} f(x)\, d\mu^A(x) \qquad A = \int_{\sigma(A)} x\, d\mu^A(x)$

where σ(A) is the spectrum of A (equipped with its Borel σ-algebra) and $\mu^A : \sigma(A) \to \mathrm{Proj}(\mathcal{H})$ is a projection-valued measure (see section A.4 for background). The trace is independent of the choice of orthonormal basis, with standard properties such as cyclicity Tr(AB) = Tr(BA) for A, B ∈ B(H). The trace operation can sometimes usefully be considered as a tensorial contraction of $A = A^i_{\ j}$ with $\delta^j_{\ i}$, i.e. $\mathrm{Tr}(A) = A^i_{\ j}\delta^j_{\ i}$. For self-adjoint trace-class operators, the trace is the sum of the associated eigenvalues λ_j such that $\mathrm{Tr}(A) = \sum_j \lambda_j < \infty$.
An important operation to note from a measurement and machine learning per-
spective is the partial trace operation below. We can define the partial trace as
follows.

Definition A.1.21 (Partial trace). Given a tensor product of Xk ∈ B(H), the


partial trace of Xk with respect to the product state is:

$(\mathrm{Tr}_k \otimes I)(X_{k-1} \otimes X_k \otimes X_{k+1}) = \mathrm{Tr}(X_k)\,(X_{k-1} \otimes X_{k+1}) =: \mathrm{Tr}_{X_k}(X_{k-1} \otimes X_{k+1})$ (A.1.8)

Here Trk ⊗ I is shorthand for the application of the trace operator to Xk . The
partial trace has the effect of, in geometric terms (see below) contracting the relevant
(tensor) product space (along the dimension associated with Xk ) and scaling the
residual tensor by the value of that trace in C. In density operator formalism, the
partial trace gives rise to a reduced density operator. For ρ_T a state on the tensor product space H_A ⊗ H_B, tracing out H_B in effect contracts the total tensor state onto subsystem A, yielding the reduced state:

$\rho_A = \mathrm{Tr}_B\, \rho_T$ (A.1.9)

In later sections below, we note the relation of the trace to quantum measure-
ment.
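A minimal numpy sketch of this operation follows (the helper partial_trace_B is an illustrative function written for this example, not part of any library API); it shows that tracing out H_B recovers the reduced state ρ_A for a product state:

```python
import numpy as np

def partial_trace_B(rho_T, dA, dB):
    """Trace out subsystem B of a density operator on H_A (x) H_B (equation A.1.9)."""
    rho = rho_T.reshape(dA, dB, dA, dB)
    return np.einsum('ijkj->ik', rho)  # contract the two B indices

# Example: a product state rho_A (x) rho_B reduces back to rho_A
rho_A = np.array([[0.75, 0.25], [0.25, 0.25]])
rho_B = np.array([[0.5, 0.0], [0.0, 0.5]])
rho_T = np.kron(rho_A, rho_B)
assert np.allclose(partial_trace_B(rho_T, 2, 2), rho_A)
```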

A.1.3 Density operators and multi-state systems

Modelling multi-state and multi-qubit systems is an important feature of quantum


information systems’ research. It is also the primary focus of results in Chapters
3 to 5. In the foregoing, typically a quantum system is described as an element of
a Hilbert space ψ ∈ H whose evolution is characterised by the operation of linear
operators |ψ(t)⟩ = U (t) |ψ0 ⟩. However, for multi-state systems, this state formalism
becomes convoluted when considering measurement and expectation outcomes of
quantum systems. Hence we introduce completely positive self-adjoint operators,
denoted density operators. Such operators are akin to generalised probability distri-
butions over expectations of observables. Thus for a distribution D of expectation
values of observables of a measurement operator M, it can be shown that there exists a density operator ρ where D(M) = Tr(ρM). For unit vectors in H, this reduces to the projection of M onto ψ with expectation given by ⟨ψ, Mψ⟩. First, we define
a Hilbert-Schmidt operator in terms of the trace.

Definition A.1.22 (Hilbert-Schmidt Operator). An operator A ∈ B(H) is Hilbert-


Schmidt if Tr(A† A) < ∞.

All trace-class operators are Hilbert-Schmidt and the trace exhibits the usual
cyclic property Tr(AB) = Tr(BA). The inner product is given by ⟨A, B⟩ = Tr(A† B)
with the norm as per above ||A||2 = ⟨A, A⟩. With the concept of the trace we can
also note a number of norms (and metrics) used further on, in particular norms
(and thus metrics) for comparing quantum states (such as trace-distance related
measures) and in Chapter 5 for variational techniques. Schatten p-norms are given
by:

$\|X\|_p = \left(\mathrm{Tr}\,(X^\dagger X)^{p/2}\right)^{1/p}$ (A.1.10)

for X ∈ B(H). Schatten norms are important in quantum information processing in


that they are preserved under unitary transformations. Schatten 2-norms (where p =
2) are sometimes denoted the Frobenius norm which corresponds to the Euclidean
norm of an operator X represented as a vector M ∈ M (C):
!1/2
X
||M ||2 = |M (j, k)|2 (A.1.11)
j,k

ranging over indices j, k. For the trace norm we have the following.

Definition A.1.23 (Trace norm). The Schatten p-norm with p = 1 is denoted the trace norm:

$\|X\|_1 = \mathrm{Tr}\left(\sqrt{X^\dagger X}\right).$ (A.1.12)

This also equals the sum of singular values of X. The trace norm is commonly used to compare quantum states via their representation as density operators (matrices). We now define the density operator.

Definition A.1.24 (Density operator). A density operator is a positive semi-definite


operator ρ ∈ B(H) that is self-adjoint and non-negative with Tr(ρ) = 1.

Density operators tend to be the more formal representation of quantum states.


The set of density operators is denoted D(H) which can be shown to be a convex set.
As Bengtsson et al. note [241] (see §8.7), this is crucial for the additional structure
required for the set of such operators to form an algebra. For a state description
given by an alphabet Γ, with ρ_m ∈ D(H) the density operator for the outcome described by m ∈ Γ (i.e. outcomes are elements of the relevant register, which in practice depends upon measurement outcome expectations), it can be shown that:

$\rho = \sum_{m\in\Gamma} p(m)\,\rho_m.$ (A.1.13)

Here p describes the probability of observing (observations that characterise the


system in state) ρm and so in this sense represents a probability distribution over
(pure) states. Equation (A.1.13) describes a mixed state. Such a description is of an
ensemble of quantum states which can be construed as a function η : Γ → Pos(H) such that $\mathrm{Tr}(\sum_{m\in\Gamma}\eta(m)) = 1$. Here the trace acts as a sort of normalising measure such that ρ_m = η(m)/Tr(η(m)). Density matrices characterise expectation values
of observables as families of expectation values via:

Φρ (A) = Tr(ρA) = Tr(Aρ) (A.1.14)

where A ∈ B(H), Φρ is a linear functional Φρ : B(H) → C, associating operators


on H to scalar values in C. The map Φρ reflects a distribution of measurement
outcomes satisfying Φρ(I) = 1, with Φρ(A) ∈ R for A self-adjoint, and Φρ(A) ≥ 0 if A
is non-negative. We denote ρ = |ψ⟩⟨ψ| with Tr(|ψ⟩⟨ψ|A) = ⟨ψ, Aψ⟩. Note also that
relative phases ψ1 = eiθ ψ2 indicate the same states ρ1 = ρ2 , reflecting the physical
equivalence of states that differ only by a global phase. The trace norm above can
then be used to calculate trace distance as a measure of similarity as the Schatten
1-norm:

Definition A.1.25 (Trace distance). Given two operators ρ, σ ∈ B(H) representing quantum systems, the trace distance between such operators is given by:

$d_T(\rho, \sigma) = \frac{1}{2}\|\rho - \sigma\|_1$ (A.1.15)

Trace distance is considered a generalisation (in the quantum setting) of to-


tal variation distance between two probability distributions. The metric is a central
component of the quantum machine learning objective (loss) functions used in Chap-
ters 3 and 4. Density matrices are also convenient for distinguishing between pure
states and mixed states which we define as follows.

Definition A.1.26 (Pure and Mixed States). A density operator ρ ∈ B(H) rep-
resents a pure state if there exists a unit vector ψ ∈ H such that ρ is equal to the
orthogonal projection onto span{ψ}. The density matrix ρ is called a mixed state if
no such unit vector ψ exists.

Closed-system pure states remain pure under the action generated by Hamiltonians. For pure states, Tr(ρ^2) = 1 while for mixed states Tr(ρ^2) < 1. Density matrices form a convex set: λρ_1 + (1 − λ)ρ_2 ∈ D(H) for λ ∈ (0, 1). Pure states are those that cannot be expressed as ρ = λρ_1 + (1 − λ)ρ_2 for ρ_1 ≠ ρ_2. Mixed and pure states can also be conceived of epistemically: if the state of a quantum system is known exactly, i.e. ρ = |ψ⟩⟨ψ|, then it is denoted a pure state, while where there is (epistemic) uncertainty about its state, it is a mixed state, i.e. $\rho = \sum_i p_i \rho_i$ with Tr(ρ^2) < 1 (as all p_i < 1). Such properties of pure and mixed states are important tests for the effects of, for example, decoherence arising from various sources of noise (see Chapter 3 for more detail).
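The purity test Tr(ρ²) can be illustrated with a short numpy sketch (the states below are arbitrary choices for illustration):

```python
import numpy as np

ket0 = np.array([1, 0], dtype=complex)
ket_plus = np.array([1, 1], dtype=complex) / np.sqrt(2)

# Pure state: rho = |+><+| has purity Tr(rho^2) = 1
rho_pure = np.outer(ket_plus, ket_plus.conj())
print(np.trace(rho_pure @ rho_pure).real)   # 1.0

# Mixed state: an incoherent mixture of |0><0| and |+><+| has Tr(rho^2) < 1
rho_mixed = 0.5 * np.outer(ket0, ket0.conj()) + 0.5 * rho_pure
print(np.trace(rho_mixed @ rho_mixed).real)  # 0.75 < 1
```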
In many applications of quantum information processing, we seek a metric to
ascertain similarity between quantum states. Density matrices can be used to define
a metric such as quantum relative or von Neumann entropy noted in our discussion
of metrics (relevant to quantum machine learning below in subsection A.1.8):

Definition A.1.27 (Quantum relative (von Neumann) entropy). Given a density


matrix ρ ∈ B(H) for a quantum state and trace operation, we define the von Neu-
mann entropy or quantum relative entropy as:

S(ρ) = Tr(−ρ log ρ)

A state ρ is pure if and only if S(ρ) = 0 and mixed otherwise. Density matrix


formalism can be used to formally understand and express quantum superposition
in terms of coherent (quantum) and incoherent superposition.

Definition A.1.28 (Coherent Superposition). For two quantum states ψ_1, ψ_2 ∈ H with amplitudes c_1, c_2 ∈ C, the state c_1ψ_1 + c_2ψ_2 is called coherent if its relative phases are physically significant. That is, the states

$c_1 e^{i\theta_1}\psi_1 + c_2 e^{i\theta_2}\psi_2 \quad \text{and} \quad c_1\psi_1 + c_2\psi_2$

are physically the same state only when $e^{i\theta_1} = e^{i\theta_2}$ for θ_1, θ_2 ∈ R.

Coherence is characterized by preservation of relative phases. By contrast, an


incoherent superposition of states ψ1 , ψ2 ∈ H with probabilities p1 , p2 ≥ 0 such that
p1 + p2 = 1 is described by the density operator ρ = p1 |ψ1 ⟩⟨ψ1 | + p2 |ψ2 ⟩⟨ψ2 |. The
expectation value of an observable A in this state is given by:

Tr(ρA) = p1 ⟨ψ1 , Aψ1 ⟩ + p2 ⟨ψ2 , Aψ2 ⟩

which is central to measurement statistics by which states and registers are de-
scribed. We now briefly describe quantum channels due to their relevance, in our
context, to measurement and system evolution.

A.1.3.1 Quantum channels

The concept of a channel derives from information theory [237] as a way of ab-
stracting a medium or instrument of information transfer. The concept has since
been adapted for application in quantum information theory such that a quantum
channel is a linear map from one space of square operators to another, that satisfies
the two conditions of (complete) positivity and trace preservation.

Definition A.1.29 (Quantum Channel). A quantum channel is a linear map

Φ : B(H1 ) → B(H2 )

such that Φ is (a) completely positive and (b) trace-preserving. Maps satisfying these
two properties are denoted CPTP maps.

The set of such channels is denoted C(H1 , H2 ) and C(H) for C(H, H). Of
particular importance is the concept of unitary channels.

Definition A.1.30 (Unitary Channel). Given a unitary operator U ∈ B(H), the map

$\Phi(X) = U X U^\dagger$

for X ∈ B(H) (e.g. X = ρ) is denoted a unitary channel; it is an example of a completely positive, trace-preserving (CPTP) map.

Such channels represent an idealised channel or memory operator, as operat-


ing on the quantum state causes no change in the state of the register ψ it acts
upon (in this sense representing a fixed point operator with respect to the register
state) [43]. For multi-state systems, there exist product channels C(⊗k Xk , ⊗k Yk ).
The trace operator can also be construed as a channel. For problems in quan-
tum control and geometry explored below, we are mainly interested in learning the
CPTP map Φ : U (0) → U (T ) satisfying some optimality condition, such as min-
imisation of evolution time or energy. Channels play a crucial role across quantum
information processing, giving rise to considerations such as optimal representations
(distinct from Lie group representations) for such channels. Importantly, the class
of channels we seek to learn preserves those essential properties of quantum oper-
ators above, such as Hermiticity and trace-preservation (ensuring, for example, we
remain within the closure of unitary or special unitary groups when formulating
unitary sequences). Other standard results for channels, such as the fact that for H_1, H_2 the set C(H_1, H_2) is compact and convex, are important in guaranteeing the controllability and reachability of target unitaries. We direct the reader to [43] for more
detailed exposition.
In the formalism of quantum information, we can also cast measurements as
channels mapping to classical registers in Rn . For this purpose, we introduce the
notion of a quantum-classical channel.

Definition A.1.31 (Quantum-classical channel). A quantum-classical channel is a


CPTP map

Φ ∈ C(H1 , H2 )

which transforms a quantum state ρ1 ∈ B(H1 ) into a classical distribution of states


represented by a diagonal density matrix ρ2 ∈ B(H2 ).

The channel is realised via a measurement comprising a dephasing channel com-


ponent ∆ that eliminates off-diagonal (coherent) elements i.e. it is a quantum-
classical channel if Φ = ∆Φ where ∆ ∈ C is a dephasing channel which has the
effect of transforming off-diagonal entries in ρ1 ∈ B(H1 ) to zero while preserving
diagonal entries.
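A completely dephasing channel is straightforward to sketch numerically. The following illustrative numpy example (the dephase helper is hypothetical, written only for this sketch) zeroes the off-diagonal entries of a density matrix while preserving its trace:

```python
import numpy as np

def dephase(rho):
    """Completely dephasing channel: zero the off-diagonal (coherent) entries
    while preserving the diagonal; equivalently, the Kraus form
    Delta(rho) = sum_m |m><m| rho |m><m| in the computational basis."""
    return np.diag(np.diag(rho))

# A coherent superposition |+><+| dephases to a classical (diagonal) mixture
ket_plus = np.array([1, 1], dtype=complex) / np.sqrt(2)
rho = np.outer(ket_plus, ket_plus.conj())
print(dephase(rho))                  # diag(0.5, 0.5)
print(np.trace(dephase(rho)).real)   # trace preserved: 1.0
```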
Reiterating our discussion of density operators above, this is equivalent to Φ : ρ → ρ′ where ρ′ is a diagonal density operator. As Watrous notes, these quantum-classical
channels are those which can be represented as measurements of quantum registers
X, formally described by ρ, with measurement outcomes m stored in a register Y
with classical state set Σ. It can be shown that for every such channel Φ ∈ C, there
exists a unique (measurement) µ : Σ → P os(H) such that (connecting standard
252 APPENDIX A. APPENDIX (QUANTUM INFORMATION PROCESSING)

notation with Watrous’s notation):


$\Phi(X) = \sum_{m\in\Sigma} \langle\mu(m), X\rangle\, E_m$ (A.1.16)

$\Phi(\rho) = \sum_{m\in\Sigma} (M_m \rho M_m^\dagger) = \sum_{m\in\Sigma} p_m E_m$ (A.1.17)

where $p_m = \mathrm{Tr}(M_m \rho M_m^\dagger)$ is the probability of obtaining outcome m, and E_m rep-
resents the eigenstates corresponding to outcome m projected onto a diagonal basis
(i.e. diagonalised with entries corresponding to measurement outcomes equivalent
to Ea,a for a = m diagonal basis operators) to represent the post-measurement state.
Hence, quantum-classical channels are general measurement channels in quantum in-
formation. Other features of quantum-classical channels, such as their compactness
and convexity over H and the existence of product measurements (for multi-state
systems), are noted in the literature and are ultimately important to both controllability/reachability and learnability. We explore their implementation in terms of ma-
chine learning techniques in later Chapters. Another concept of importance in mea-
surement and quantum machine learning is that of partial measurements, especially
the relationship between partial measurements and the partial trace operation.

Definition A.1.32 (Partial measurement). For a compound (quantum) state reg-


ister X = (Yk ), k = 1, ..., n a measurement on a single register µ : Σ → P os(Yk ) is
denoted a partial measurement.

Partial measurements are thus related to partial traces and, in geometric con-
texts (as discussed below), contraction operations over tensor fields (where the trace
operation can, under certain geometric conditions relevant to quantum control, be
related to tensorial contraction).

A.1.4 Quantum evolution


We now consider the evolution of quantum systems, beginning with the evolution
axiom (adapted from the literature including [45]).

Axiom A.1.3 (Evolution (Schrödinger equation)). Quantum state evolution is de-


scribed by the following first-order differential equation, denoted the Schrödinger
equation:


$i\,\frac{d\psi}{dt} = H\psi$ (A.1.18)

where ψ ∈ H and H ∈ B(H) is a fixed Hermitian operator denoted as the Hamilto-


nian. In the standard way we set ℏ = 1 for convenience.
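For a time-independent Hamiltonian, this equation can be integrated directly with a matrix exponential. A minimal scipy sketch follows (the Hamiltonian and initial state are arbitrary illustrative choices), evolving a qubit under a Pauli-X Hamiltonian:

```python
import numpy as np
from scipy.linalg import expm

# Pauli-X Hamiltonian (hbar = 1); drives oscillations between |0> and |1>
H = np.array([[0, 1], [1, 0]], dtype=complex)
psi0 = np.array([1, 0], dtype=complex)

# Closed-system solution psi(t) = exp(-iHt) psi0 for time-independent H
for t in [0.0, np.pi / 4, np.pi / 2]:
    psi_t = expm(-1j * H * t) @ psi0
    print(t, np.abs(psi_t) ** 2)  # Born probabilities oscillate as cos^2 / sin^2
```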

We examine the formalism and consequences of this axiom below. In later Ap-
pendices below, we connect the evolution of quantum systems to geometric and
algebraic formalism (in terms of transformations along Lie group manifolds). Before
we do so, we include a short digression on the two perspectives of quantum evolution
formalism.

A.1.4.1 Two pictures of quantum evolution

Quantum mechanics is traditionally portrayed as being describable by two some-


what different but mathematically equivalent formulations, the Schrödinger picture,
focusing on state evolution and the Heisenberg picture, focusing on operator evo-
lution [242]. In the Schrödinger picture, the quantum system is represented by a
wavefunction |ψ(t)⟩ parametrised by time t and which evolves according to equation
(A.1.18). In this picture, the operators, representing observables, that act on |ψ(t)⟩ are time-independent. Measurement probabilities of an observable (see below) are given by the squared modulus of the inner product of |ψ(t)⟩ with the corresponding eigenvector of the observable. By contrast,
in the Heisenberg picture quantum state vectors are time-independent |ψ⟩ and it is
the operators Ô(t), corresponding to measureable observables, which evolve in time
as:

Ô(t) = eiHt Ô(0)e−iHt

We also mention these well-known formalisms as we can connect both with a more
geometric formalism for expressing quantum evolution in Chapters 4 and 5, such
as the Maurer-Cartan forms and the adjoint action of Lie groups. It is also worth
mentioning the hybrid interaction (or von Neumann) picture in which both states
and operators evolve, expressed (as we discuss further on) in terms of the Liouville
von Neumann equation (for closed quantum systems) and the master equation (for
open quantum systems [144]):

$\dot{\rho} = -i[H, \rho] + \kappa\mathcal{D}[\hat{O}]\rho \qquad \mathcal{D}[\hat{O}]\rho = \hat{O}\rho\hat{O}^\dagger - \frac{1}{2}\left(\hat{O}^\dagger\hat{O}\rho + \rho\hat{O}^\dagger\hat{O}\right)$ (A.1.19)

where each term is parametrised by t (omitted for brevity). Here Ô is an arbitrary


operator, D is a superoperator acting on (density) operators ρ, and κ is the decoherence rate. The interaction
picture is specifically relevant to our greybox open quantum systems’ simulations
for machine learning that is the subject of Chapter 3. We include below a number of
standard definitions and results relevant to understanding the operator formalism
used throughout this work. Later we connect such formalism to geometric and
algebraic methods, along with formalism in quantum machine learning.
254 APPENDIX A. APPENDIX (QUANTUM INFORMATION PROCESSING)

A.1.5 Hamiltonian formalism

In density operator formalism, state evolution in Axiom (A.1.3) is described via:


$\frac{d\rho}{dt} = -i[H, \rho] \qquad \rho(t) = e^{-iHt}\rho_0 e^{iHt}$ (A.1.20)

for ρ0 = ρ(t = 0). In unitary operator formalism, the above can be written:

$dU\,U^{-1} = -iH\,dt$ (A.1.21)

where ψ(t) = U (t)ψ0 . It is useful to unpack this formalism and define the Hamil-
tonian. In later Chapters, we connect the Hamiltonian to geometric, algebraic and
variational principles (such as the framing of equation (A.1.21) in terms of the
Maurer-Cartan form). Hamiltonians as a formalism arise from variational tech-
niques (see [44] for a detailed discussion) for determining equations of motion.

Definition A.1.33 (Hamiltonian operator). The operator H ∈ B(H) described


in equation (A.1.21) is a self-adjoint (Hermitian) operator denoted the Hamiltonian. The Hamiltonian operator is described as the generator of quantum evolutions.

The Hamiltonian H(t) ∈ B(H) of a system is the primary means of mathemat-


ically characterising the dynamics of quantum systems. Hamiltonians specify the
data-encoding process by which information is encoded into quantum states along
with how the system evolves and how the quantum computation may be controlled.
Solutions to equation (A.1.21) are of the form:

$U = T_+ \exp\left(-i\int_0^{\Delta t} H(t)\,dt\right)$ (A.1.22)

where T+ is the time-ordering operator (described below). As can be seen, U is


unitary as U U † = I (the adjoint action has no effect on T+ ). Being unitary, these
solutions at once can be represented as linear operators on H that preserve inner
products (see definition B.2.4) and Haar measure (see definition A.4.8). As per
definition A.1.30, these solutions can be represented as unitary channels acting on
density operators via Φ(ρ) = U ρU † . Unitary evolution itself is required to preserve
quantum coherence and probability measures of systems (which give rise to the
enhanced computational power of quantum systems). Moreover, as we discuss in
Appendix B, the set of unitaries forms a Lie group (see definition B.2.4) and so
has a natural geometric interpretation in terms of a Lie group manifold G whose
evolution is described by a Hamiltonian composed of generators of the corresponding
Lie algebra g.

A.1.5.1 Time-independent approximations

Solving the time-dependent Schrödinger equation given in equation (A.1.18) is of-
ten challenging or unfeasible, requiring perturbation methods or other techniques.
A common approximation used in quantum information processing and quantum
control is the time-independent approximation to the Schrödinger equation. Recall
from equation (A.1.22) the time-dependent solution to the close-system Schrödinger
equation is given by:
$U(T) = T_+ e^{-i\int_0^T H(t)\,dt}$ (A.1.23)

The time-ordering reflects the fact that the generators within the time-dependent
Hamiltonian do not, in general, commute at different time instants (i.e. [H(ti ), H(tj )] ̸=
0). In certain cases, a time-independent approximation to equation (A.1.22) may
be adopted:
$U(t) = T_+ e^{-i\int_0^T H(t)\,dt}$ (A.1.24)
$\simeq \lim_{N\to\infty} e^{-iH(t_N)\Delta T}\, e^{-iH(t_{N-1})\Delta T} \cdots e^{-iH(t_0)\Delta T}$ (A.1.25)

where ∆T = T /N and tj = j∆T . The Suzuki-Trotter decomposition [243, 244] and


Lie-Trotter formulation:
$e^{-i(H_1 + H_2)} = \lim_{n\to\infty}\left(e^{-iH_1/n}\, e^{-iH_2/n}\right)^n$ (A.1.26)

under certain conditions allow the time-varying Hamiltonian to be approximated by


a piece-wise constant Hamiltonian. Equation (A.1.26) can be seen via:

$e^{-i(H_1+H_2)t} = \lim_{n\to\infty}\left(1 + \frac{-i(H_1+H_2)t}{n}\right)^n = \lim_{n\to\infty}\left(1 + \frac{-iH_1 t}{n}\right)^n \left(1 + \frac{-iH_2 t}{n}\right)^n = \lim_{n\to\infty}\left(e^{-iH_1 t/n}\, e^{-iH_2 t/n}\right)^n$

where we have applied equation (B.2.14). In such cases, the time interval [0, T ]
is divided into equal segments of length ∆t while the Hamiltonian is considered
constant over that interval. Such results are crucial to time-independent optimal
control theory where control problems are simplified by assuming that Hamiltonians
may be approximated as constant over small time intervals such that the system is
described by the control function u(t) applied over such period. Even when only
a subset of the Lie algebra g is available, we may be able to achieve universal
control (that is, the ability to synthesise all generators, by repeated application of
256 APPENDIX A. APPENDIX (QUANTUM INFORMATION PROCESSING)

the control Hamiltonian) via commutator terms that in turn allow us to generate
the full generators of the corresponding Lie algebra. This is framed geometrically
in terms of the distribution ∆ (a geometric framing of our set of generators) being
‘bracket-generating’.
The approximation is generally available under the assumption that the Hamil-
tonian over increments ∆t is constant, as mentioned above. Of central importance to time-independent approximations is the Baker-Campbell-Hausdorff (BCH) formula (def-
inition B.2.18). For universal control in quantum systems, repeated application of
the control Hamiltonian allows us to synthesize all generators of the Lie algebra
corresponding to the system’s dynamics. This is where the BCH formula becomes
crucial, as it allows for the generation of effective Hamiltonians that include com-
mutator terms, thus expanding the set of reachable operations. For example, the
BCH formula shows us that by carefully sequencing the application of H1 and H2 ,
we can effectively implement the commutator [H1 , H2 ] as part of the evolution:
$e^{-i[H_1,H_2]\Delta t} \approx \left(e^{-iH_1\Delta t/n}\, e^{-iH_2\Delta t/n}\, e^{iH_1\Delta t/n}\, e^{iH_2\Delta t/n}\right)^n$ (A.1.27)

for sufficiently large n and small ∆t. This approximation is powerful for designing
control sequences in quantum systems with non-commuting dynamics.
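The Lie-Trotter limit of equation (A.1.26) can be checked numerically. The sketch below (with arbitrarily chosen non-commuting generators, not a system from the thesis) compares the exact propagator for H_1 + H_2 against the Trotterised product, with the error shrinking as n grows:

```python
import numpy as np
from scipy.linalg import expm

# Two non-commuting generators (Pauli X and Pauli Z)
H1 = np.array([[0, 1], [1, 0]], dtype=complex)
H2 = np.array([[1, 0], [0, -1]], dtype=complex)
t = 1.0

exact = expm(-1j * (H1 + H2) * t)
for n in [1, 10, 100, 1000]:
    step = expm(-1j * H1 * t / n) @ expm(-1j * H2 * t / n)
    trotter = np.linalg.matrix_power(step, n)
    print(n, np.linalg.norm(trotter - exact))  # error decreases as O(1/n)
```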

A.1.6 Measurement
Measurement is at the centre of quantum mechanics and quantum information pro-
cessing. In the section below, we set out two further axioms relating to measurement,
adapted from the literature (see [11,45]). The first concerns measurement and oper-
ators. The second concerns the characterisation of post-measurement states. Mea-
surement formalism varies to some degree depending on how information-theoretic an approach is adopted. We incorporate elements of both the traditional and
information-based formalism. We have aimed to present a useful coverage of mea-
surement formalism specifically tailored to later Chapters, where measurement plays
a critical role, for example, in both simulated quantum systems (for use in state
description and other tasks) and machine learning. The first axiom defines mea-
surement. We split this into two in terms of (a) measurement probabilities (e.g.
the Born rule) and (b) the effect of measurement (wave collapse). Note here we
adopt an operator formalism where measurement is represented by a set of oper-
ators {M } the outcomes of which are measurements m ∈ Σ (i.e. mapping to our
register configurations) such that we can write {Mm |m ∈ Σ} ⊂ P os(H).

Axiom A.1.4 (Measurement). Given ψ ∈ H, the probability distribution for the measurement of an observable m is such that $E[M^k] = \langle\psi, M^k\psi\rangle$. Given an orthonormal basis $\{e_j\}$, we may write $\psi = \sum_{j=1}^{\infty} a_j e_j$, $a_j \in \mathbb{C}$, with the probability of observing m given by $P(M = m) = |a_m|^2$ (the Born rule).

The reference to M k (following Hall [45]) is to the fact that the k-th power of the
measurement operator M corresponds to the k-th moment of the probability distri-
bution for measurement outcomes associated with M (i.e. expectation, variance and
so on). Here M is a measurement operator with eigenvectors em and eigenvalues m
corresponding to outcomes of measurement.
The second measurement axiom (which can be viewed as an extension of the
first) describes the effect of measurement on quantum states, namely the collapse of
the wave function into eigenstates of the measurement operator.

Axiom A.1.5 (Effect of Measurement). Measurement via measurement operator M


with measurement outcome m ∈ R, causes a quantum system transition to ψ ′ ∈ H
such that M ψ ′ = mψ ′ where m is the eigenvalue of Mm corresponding to the observed
outcome, and ψ ′ is the corresponding eigenstate of Mm .

This postulate is also known as the collapse of the wave-function or Copenhagen


interpretation. These postulates can be expanded to encompass multiple possible
measurement outcomes whereby quantum measurements are framed as sets of mea-
surement operators {Mm }, where m indexes the outcome of a measurement (e.g.
an energy level or state indicator), i.e. an observable. The probability p(m) of ob-
servable m upon measuring |ψ⟩ is represented by such operators acting on the state
such that measurement probability is given by:

$p(m) = \langle\psi|M_m^\dagger M_m|\psi\rangle = \mathrm{tr}(M_m^\dagger M_m \rho)$ (A.1.28)

with the post-measurement state |ψ ′ ⟩ given by:

$|\psi'\rangle = \frac{M_m|\psi\rangle}{\sqrt{\langle\psi|M_m^\dagger M_m|\psi\rangle}}$ (A.1.29)

$\rho' = \frac{M_m \rho M_m^\dagger}{\langle M_m^\dagger M_m, \rho\rangle}$ (A.1.30)


The completeness relation on the measurement operators, $\sum_m M_m^\dagger M_m = I$, reflects the probabilistic nature
of measurement outcomes. Measurement is modelled as a random variable in Σ,
described by the probability distribution p ∈ P(Σ). The act of measurement tran-
sitions M : ρ → ρ′ described by equation (A.1.29). In quantum information theory,
we further refine the concept of measurement as per below.

Definition A.1.34 (Measurement). A measurement is defined in terms of a probability measure from the set of measurement outcomes to the set Pos(H):

$\mu : \Sigma \to \mathrm{Pos}(\mathcal{H}), \qquad m \mapsto \mu(m) := M_m^\dagger M_m, \qquad \sum_{m\in\Sigma}\mu(m) = I$ (A.1.31)

Here Σ is the set of measurement outcomes m (which describe our quantum state
ρ) following application of M_m (note we include $\mu(m) := M_m^\dagger M_m$ as a bridge between the commonplace formalism of Nielsen et al. [11] and the slightly more information-theoretic form in Watrous [43]). In this notation (from [43]), p(m) = ⟨µ(m), ρ⟩.
When a measurement is performed on a system described by the density operator
ρ, the probability of obtaining outcome m is given by:


$p(m) = \mathrm{Tr}(M_m \rho M_m^\dagger)$ (A.1.32)

and the state of the system after the measurement (post-measurement state) is


$\rho_m = \frac{M_m \rho M_m^\dagger}{p(m)}.$ (A.1.33)

This measurement process can be described by a quantum channel E that maps


the initial density operator ρ to a final density operator ρ′ that is a mixture of the
post-measurement states, weighted by their respective probabilities:
$\rho' = \mathcal{E}(\rho) = \sum_m p(m)\,\rho_m = \sum_m M_m \rho M_m^\dagger$ (A.1.34)

Connecting the standard terminology with the more information-theoretic terminology for measurement, we note that $\{M_m | m \in \Sigma\} \subset B(\mathcal{H})$ such that $\sum_m M_m^\dagger M_m = I$. Measurement then corresponds to m ∈ Σ being selected at random (i.e. modelled as a
random variable) with probability given by equation (A.1.28) and post-measurement
state given by equation (A.1.30). Such measurements are described as non-destructive
measurements.
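A non-destructive projective measurement can be simulated directly from equations (A.1.32) and (A.1.33). The following illustrative numpy sketch (measurement operators and state chosen arbitrarily) samples an outcome from the Born distribution and forms the post-measurement state:

```python
import numpy as np

# Projective measurement of a qubit in the computational basis
M = [np.diag([1.0, 0.0]).astype(complex), np.diag([0.0, 1.0]).astype(complex)]
psi = np.array([np.sqrt(0.2), np.sqrt(0.8)], dtype=complex)
rho = np.outer(psi, psi.conj())

# Born rule p(m) = Tr(M_m rho M_m^dagger), equation (A.1.32)
probs = [np.trace(Mm @ rho @ Mm.conj().T).real for Mm in M]

# Outcome modelled as a random variable over Sigma = {0, 1}
m = np.random.default_rng(1).choice(2, p=probs)

# Post-measurement state rho_m = M_m rho M_m^dagger / p(m), equation (A.1.33)
rho_post = M[m] @ rho @ M[m].conj().T / probs[m]
print(probs, m)
print(rho_post)  # collapsed onto the eigenstate |m><m|
```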

A.1.6.1 POVMs and Kraus Operators

In more advanced treatments, positive-operator valued measure (POVM) formalism


more fully describes the measurement statistics and post-measurement state of the system. For a POVM, we define a set of positive operators $\{E_m\} = \{M_m^\dagger M_m\}$ satisfying $\sum_m E_m = I$ in a way that gives us a complete set of positive operators
(such formalism being more general than simply relying on projection operators). In
this way the measurement operators M can be regarded as Kraus operators satisfying
the requisite completeness relation [245, 246]. As we are interested in probability
distributions rather than individual probabilities from a single measurement, we
A.1. OVERVIEW 259

calculate the probability distribution over outcomes via Born rule using the trace:

$p(E_i) = \mathrm{Tr}(\rho E_i) \qquad \langle A\rangle = \mathrm{Tr}(\rho A)$ (A.1.35)

noting the expectation as per equation (A.1.14). Measurement thus has a repre-
sentation as a quantum to classical channel ΦM ∈ C(X , Y) (see definition A.1.31)
under the condition that it is completely dephasing, i.e. Φ_M(ρ) is diagonal for all ρ ∈ D(X). Such quantum-to-classical channels are those that can be realised as a
measurement of a register X. As Watrous (§2.3) notes, there is a correspondence
between the mapping of measurement outcomes m to operators M ∈ P os(X ). Com-
bining Watrous and Nielsen et al.’s formalism, Φ(X) describes a classical description
(distribution) of the outcome probabilities when a quantum state ρ is measured:
$\Phi_M(\rho) = \sum_{m\in\Sigma} \mathrm{Tr}(M_m \rho M_m^\dagger)\,|m\rangle\langle m| = \sum_{m\in\Sigma} \langle\mu(m), \rho\rangle\, E_m$ (A.1.36)

Further discussion of measurement procedures is set out below (such as in Chapter


3). Note that measurements described above give rise to a set of measurement statis-
tics and that these are used, e.g. via process or state tomography, to reconstruct
or infer the state |ψ⟩ or channel U; that is, the outcome of measurement is a set of outcomes {m} from which measurement statistics using operator M are
calculated. Such statistics are then used to infer or reconstruct data describing quan-
tum states (or in operator formalism the channels or operators U (t) themselves). In
practice, for sequences of unitaries (Un (t)) or quantum states |ψ⟩, because of state
collapse (A.1.5), we assume that we cannot measure each unitary at each time point
t (albeit see discussion on post-selection measurement based quantum computing).

A.1.6.2 Composite system measurement

For composite systems, a measurement of the k-th state ρk (or Xk depending on


formalism) will result in a post-measurement state conditional upon that measure-
ment. The state ρk will ‘collapse’ onto one of the eigenvectors of the measurement
operator Mm corresponding to the observable m obtained. In composite systems,
because of the distributivity of scalar multiplication, the post-measurement state is
then a tensor product of the original states with that eigenvector state scaled by the
measurement outcome (then renormalised). This can be expressed as:

η : Σ → P os(⊗k Xk ) (A.1.37)
η(m) = TrXk (Ik−1 ⊗ µ(m) ⊗ Ik+1 )(ρ) (A.1.38)

where Xk ∈ Hk (note sometimes the trace above is denoted TrHk to indicate tracing
out of the k-th subsystem). The measurement µ(m) essentially maps Xk (which,
recall, could be a superposition state) to a classical state scaled by m (and in this
sense we can think of measurement as a tensorial contraction as discussed in later
Chapters). In this formalism, the post-measurement state in equation (A.1.29) becomes the following, where the denominator is the normalisation scalar:

$\rho' = \frac{\eta(m)}{\mathrm{Tr}(\eta(m))} = \frac{\mathrm{Tr}_{X_k}(I_{k-1} \otimes \mu_k(m) \otimes I_{k+1})(\rho)}{\langle\mu(m), \rho_k\rangle}$ (A.1.39)

We are also concerned in particular with projective measurements where each mea-
surement is a projection operator i.e. µ(m) ∈ P roj(H). Each projective measure-
ment projects the state ρ into an eigenstate of the respective projection operator.
The set of such operators {µ(m)|m ∈ Σ} is an orthogonal set, which means there
are at most dim H distinct observables m. It can be shown (via Naimark dilation) that any measurement can be framed as a projective measurement on a suitably extended Hilbert space.

A.1.6.3 Informational completeness and POVMs

Positive-operator valued measure (POVM) formalism more fully describes the mea-
surement statistics and post-measurement state of the system. For a POVM, we define a set of positive semi-definite operators $\{E_m\} = \{M_m^\dagger M_m\}$ satisfying $\sum_m E_m = I$ in a way that gives us a complete set of positive operators (such formalism being more general than simply relying on projection operators).

Definition A.1.35 (POVM). A Positive Operator-Valued Measure (POVM) on a


Hilbert space H is a set {Em } of operators that satisfies the following conditions:
$E_m \geq 0 \;\; \forall m, \qquad \sum_m E_m = I_{\mathcal{H}}.$ (A.1.40)

Here, Em represents the effect corresponding to the measurement outcome m,


and IH is the identity operator on H. The operators Em are positive semi-definite
and the set {Em } is complete in the sense that it sums to the identity, ensuring that
probabilities over all outcomes sum to one.
To estimate probability distributions, we similarly calculate the probability dis-
tribution over outcomes via Born rule using the trace p(Ei ) = Tr(ρEi ). Further
discussion of measurement procedures is set out below (such as in Chapter 3).
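As an illustration of Definition A.1.35, the following numpy sketch constructs a standard non-projective qubit POVM (the symmetric 'trine' effects, chosen here purely as an example) and verifies completeness and the Born-rule distribution:

```python
import numpy as np

# Three symmetric 'trine' effects E_k = (2/3)|phi_k><phi_k| for a qubit
angles = [0, 2 * np.pi / 3, 4 * np.pi / 3]
states = [np.array([np.cos(a / 2), np.sin(a / 2)], dtype=complex) for a in angles]
effects = [(2 / 3) * np.outer(s, s.conj()) for s in states]

# Completeness: positive semi-definite effects summing to the identity (A.1.40)
assert np.allclose(sum(effects), np.eye(2))

# Outcome distribution p(E_k) = Tr(rho E_k) for rho = |0><0|
rho = np.diag([1.0, 0.0]).astype(complex)
print([np.trace(rho @ E).real for E in effects])  # sums to 1
```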
We are also interested, in a quantum machine learning context, in utilising informationally-complete measurements, where the set of measurements spans the entire space B(H).
This is because quantum states are uniquely determined by their measurement statis-
tics. For problems such as state identification (tomography) and other typical tasks,
A.1. OVERVIEW 261

understanding the underlying probability distribution for each measurement out-


come enables a complete description of that state (or in Watrous’s terminology, a
full description of the register).

Definition A.1.36 (Informationally-complete measurements). A measure µ : Σ →


P os(H), is informationally complete if span{µ(m)|m ∈ Σ} = B(H).

For diagonal states ρ ∈ D(H), the probability vector p ∈ P(Σ) completely


specifies ρ, which intuitively shows how density operators can be thought of as
operator-analogues of probability distributions over classical states (registers). See
[43] for more generalised descriptions in terms of instruments.
Finally we briefly note extremal measurements, which are quantum-classical channels corresponding to extreme points of the set of all quantum-classical channels C(H). A measurement µ is extremal if for all measurement choices µ_j, µ_k with µ = λµ_j + (1 − λ)µ_k for λ ∈ (0, 1) we have µ_j = µ_k. Extremal measurements are those that cannot be expressed as non-trivial convex combinations of other measurements.
We don’t expand upon these types of measurements but note that extremal POVMs
are often associated with obtaining maximum information about a quantum system.
They are optimal in the sense that no other measurement can provide strictly more
information about the quantum state with the fewest measurement outcomes (hence
potentially providing better training data sets from a statistical learning perspective
for tasks such as state discrimination). Extremal measurements are relevant to opti-
mal design of experiments in order to assist statistical learning processes by offering
data that captures the most distinct features of the quantum state’s behaviour.

A.1.6.4 Expectation evolution

The evolution of quantum states can also be characterised in terms of changes in


measurement statistics over time. That is, equation (A.1.21) can also be used to
model the evolution of the measurement statistics [45]. Recall that if A ∈ B(H) is
Hermitian then the expectation value of A for state ψ ∈ H is:

⟨A⟩ψ = ⟨ψ, Aψ⟩ = Tr(ρA) (A.1.41)

Equation (A.1.41) above expresses the same principle of equation (A.4.6) in operator
formalism with the inner product. The uncertainty (variance) in measurements of
A is given by:

(∆A)2 = ⟨A2 ⟩ψ − (⟨A⟩ψ )2 (A.1.42)

The expectation value together with the uncertainty of measurements of A are im-
portant measurement statistics of A with respect to state ψ. We are also often

interested in how measurement statistics evolve over time which can be modelled
as:

$\frac{d}{dt}\langle A\rangle_\psi = \langle -i[A, H]\rangle$ (A.1.43)

where [·, ·] denotes the commutator [A, B] = AB − BA. The commutator (in the
form of the Lie derivative) is more fully described in Appendix B and proposition
(B.2.2).
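Equation (A.1.43) can be verified numerically by finite differences. The sketch below (with illustrative choices H = X and A = Z, not a system from later Chapters) compares d⟨A⟩/dt against ⟨−i[A, H]⟩:

```python
import numpy as np
from scipy.linalg import expm

H = np.array([[0, 1], [1, 0]], dtype=complex)   # Hamiltonian: Pauli X
A = np.array([[1, 0], [0, -1]], dtype=complex)  # observable: Pauli Z
psi0 = np.array([1, 0], dtype=complex)

def expval(t):
    """Expectation <A> in the evolved state psi(t) = exp(-iHt) psi0."""
    psi = expm(-1j * H * t) @ psi0
    return (psi.conj() @ A @ psi).real

t, dt = 0.3, 1e-6
lhs = (expval(t + dt) - expval(t - dt)) / (2 * dt)   # numerical d<A>/dt
psi = expm(-1j * H * t) @ psi0
rhs = (psi.conj() @ (-1j * (A @ H - H @ A)) @ psi).real  # <-i[A, H]>
print(lhs, rhs)  # agree to numerical precision
```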

A.1.7 Quantum entanglement

A fundamental (or in Schrödinger’s estimation, the fundamental) distinguishing


property of quantum mechanics from its classical counterpart is the phenomenon
of entanglement. Entanglement has a few different but equivalent definitions, such
as being a state ρ which cannot be represented as a tensor product state (definition
A.1.12). More formally, entanglement can be defined as per below.

Definition A.1.37 (Entanglement). Given Hilbert spaces HA and HB and compos-


ite system H = H_A ⊗ H_B, a pure state |ψ⟩ ∈ H is entangled if it cannot be written
as a product of states from each subsystem, i.e., there do not exist |ϕA ⟩ ∈ HA and
|ϕB ⟩ ∈ HB such that:

|ψ⟩ = |ϕA ⟩ ⊗ |ϕB ⟩ . (A.1.44)

A mixed state ρ ∈ B(H) is entangled if it cannot be expressed as a convex combina-


tion of product states:
$\rho \neq \sum_i p_i\, \rho_i^A \otimes \rho_i^B,$ (A.1.45)

where $\{p_i\}$ are probabilities, and $\rho_i^A \in B(\mathcal{H}_A)$ and $\rho_i^B \in B(\mathcal{H}_B)$ are density matrices for the subsystems A and B, respectively.

In particular, for EPR pairs (Bell states) this means that certain states such as:

$|\psi\rangle = \frac{1}{\sqrt{2}}\left(|00\rangle + |11\rangle\right)$ (A.1.46)

cannot be represented as a tensor product state. We mention entanglement only


tangentially in this work but it is of course of central importance across quantum
information processing.
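A standard numerical signature of entanglement for pure states, illustrated below for the Bell state of equation (A.1.46), is that the reduced state of either subsystem is mixed (here maximally mixed, with purity 1/2); this is a sketch, not a general entanglement test for mixed states:

```python
import numpy as np

# Bell state (A.1.46) as a vector in the |ab> ordering, a, b in {0, 1}
bell = np.array([1, 0, 0, 1], dtype=complex) / np.sqrt(2)
rho = np.outer(bell, bell.conj())

# Trace out subsystem B: the reduced state of A is maximally mixed
rho_A = np.einsum('ijkj->ik', rho.reshape(2, 2, 2, 2))
print(rho_A)                          # I/2
print(np.trace(rho_A @ rho_A).real)   # purity 0.5 < 1
```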

A.1.8 Quantum metrics

Metrics play a central technical role in classical machine learning, fundamentally be-
ing the basis upon which machine learning algorithms update, via techniques such
as backpropagation. Indeed metrics at their heart concern the ability to quantifi-
ably distinguish objects in some way and to this extent have been integral to the
very concept of information as portended by Hartley in which information refers to
quantifiable ability of a receiver to distinguish symbolic sequences [247]. Hartley’s
original measure of information prefigured advanced approaches to quantifying and
describing information, such as Kolmogorov and others [248].

Metrics for quantum information processing are related to but distinct from their
classical counterparts and understanding these differences is important for researchers
applying classical machine learning algorithms to solve problems involving quantum
data. As is commonplace within machine learning, chosen metrics will differ depend-
ing on the objectives, optimisation strategies and datasets. For a classical bit string,
there are a variety of classical information distance metrics used in general [238]. In
more theoretical and advanced treatments, available metrics will depend upon the
underlying structure of the problem (e.g. topology) (see [43] for a comprehensive
discussion). Metrics used will depend also upon whether quantum states or opera-
tors are used as the comparators, though one can relatively easily translate between
operator and state metrics. We outline a number of commonly used quantum met-
rics below and discuss their implementation in classical contexts, such as in loss
functions. Note below we take license with the term metric as certain measures be-
low, such as quantum relative entropy, do not (as with their classical counterparts)
strictly constitute metrics as such.

Being able to quantify the difference or similarity of quantum states is fundamen-


tal to the application of any machine learning protocols in quantum information
processing. This problem of state discrimination, of how accurately states can be
distinguished using measurement, is examined briefly below. In particular, we focus
on the key metric in later Chapters of fidelity among target unitary operators and
those generated via our algorithmic methods. We summarise briefly a few key re-
sults in the area, including the Holevo-Helstrom theorem (for providing a weighted
difference between states). In machine learning, state discrimination is also central
to classification methods. Indeed difference measures and metrics are central to
underlying objective (loss) functions underpinning the learnability of classical and
quantum machine learning algorithms.

A.1.8.1 State discrimination

State discrimination for both quantum and probabilistic classical states requires
incorporation of stochasticity (the probabilities) together with a similarity measure.
The Holevo-Helstrom theorem quantifies the probability of distinguishing between
two quantum states given a single measurement µ. State discrimination here is a
binary classification problem.

Definition A.1.38 (Holevo-Helstrom theorem). Given ρ_0, ρ_1 ∈ D(H) with λ ∈ [0, 1], for each choice of measurement µ : {0, 1} → Pos(H), the probability of correctly (λ) or incorrectly (1 − λ) distinguishing a state is bounded by:

$\lambda\langle\mu(0), \rho_0\rangle + (1-\lambda)\langle\mu(1), \rho_1\rangle \leq \frac{1}{2} + \frac{1}{2}\left\|\lambda\rho_0 - (1-\lambda)\rho_1\right\|_1$ (A.1.47)

Where, for a suitably chosen projective measurement µ, the relation is one of


equality. State discrimination is central to results in this work, especially in Chap-
ters 5 and 6 where discriminating states (measuring their similarity) is the key to
training quantum machine learning models. A detailed exposition of state discrimi-
nation from an information theoretic perspective can be found [43]. For our primary
results relating to state (and by extension operator) discrimination in later Chap-
ters, we focus on the fidelity metric as the basis for distinguishing quantum states
and operators (central to, for example, our objective functions in machine learning
applications we use). We focus on fidelity below. Briefly we also mention an impor-
tant feature of quantum information (especially as its relates to machine learning
applications) in the form of the no cloning theorem which provides that quantum
information cannot be identically copied. That is, no procedure exists for identically
replicating any arbitrary quantum state |ψ⟩ on a Hilbert space H.

Theorem A.1.39 (No-Cloning Theorem). The no cloning theorem provides that


there does not exist a completely positive, trace-preserving (CPTP) map Φ : B(H) →
B(H ⊗ H) such that for any pure state |ψ⟩ ∈ H:

Φ(|ψ⟩⟨ψ|) = |ψ⟩⟨ψ| ⊗ |ψ⟩⟨ψ|. (A.1.48)

The no cloning theorem illustrates the fundamentally non-classical nature of


quantum information. Note, however, that it does not prevent the preparation of
identical quantum states (essential to machine learning protocols including those
explored in Appendix D) i.e. identical ‘copies’ of quantum states ρ can be produced
via identical preparation procedures. Rather, the no cloning theorem is a claim
about operations on a state that would result in an identical copy being produced.
A.1. OVERVIEW 265

A.1.8.2 Fidelity function

Of central importance to later results in Chapters 3 and 4 is the fidelity function


which quantifies the degree of similarity or overlap between two quantum states (or
their operator representations in P os(H)). Fidelity is a key metric in quantum in-
formation processing, especially in quantum control and quantum unitary synthesis
problems. It is central to the quantum machine learning loss function architecture
adopted in this work. As such, we provide a more in-depth assessment of the metric
with regard to quantum state comparison.

Definition A.1.40 (Fidelity). The fidelity between two operators ρ, σ ∈ P os(H) is


given by:

$F(\rho, \sigma) = \left\|\sqrt{\rho}\sqrt{\sigma}\right\|_1 = \mathrm{Tr}\left(\sqrt{\sqrt{\sigma}\,\rho\,\sqrt{\sigma}}\right)$ (A.1.49)

The fidelity function exhibits a number of properties including: (a) continu-


ity - it is continuous at (ρ, σ) (important for statistical learning performance); (b) symmetry, F(ρ, σ) = F(σ, ρ); (c) positive semi-definiteness, F(ρ, σ) ≥ 0, with F(ρ, σ) = 0 if and only if ρσ = 0; (d) $F(\rho, \sigma) \leq \sqrt{\mathrm{Tr}(\rho)\,\mathrm{Tr}(\sigma)}$; (e) preservation under conjugation by isometries (unitary channels) V ∈ U(H, H) such that $F(\rho, \sigma) = F(V\rho V^\dagger, V\sigma V^\dagger)$; (f) $F(\lambda\rho, \sigma) = \sqrt{\lambda}\,F(\rho, \sigma) = F(\rho, \lambda\sigma)$ for λ ≥ 0. In terms of den-
sity operators ρ, σ ∈ P os(H), we have that F (ρ, σ) ∈ [0, 1] with F = 0 if and only
if ρ ⊥ σ and F = 1 if and only if ρ = σ. The fidelity function can be shown to be
jointly concave in its arguments and monotonic under the action of channels. The
latter of these properties provides that for Φ ∈ C(H) with ρ, σ ∈ P os(H):

F (ρ, σ) ≤ F (Φ(ρ), Φ(σ)) (A.1.50)

Finally, the relationship of fidelity F and trace function || · ||1 for state ρ, σ ∈ D(H)
is given by Fuchs-van de Graaf inequalities, which can be expressed as:
p
2 − 2F (ρ, σ) ≤ ||ρ − σ||1 ≤ 2 1 − F (ρ, σ)2 (A.1.51)
p
Fidelity and trace distance are related via D(ρ, σ) = 1 − F (ρ, σ)2 . Fidelity can
also be interpreted as a metric by calculating the angle ζ = arccos F (ρ, σ). Note in
some texts that equation (A.1.49) is sometimes denoted root fidelity while fidelity
is its square (see [241]) and that fidelity can be related to transition probability
between mixed states [249].
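Definition A.1.40 translates directly into a few lines of scipy (a sketch only; as noted above, conventions vary between root fidelity and its square, and the states below are arbitrary illustrative choices):

```python
import numpy as np
from scipy.linalg import sqrtm

def fidelity(rho, sigma):
    """Root fidelity F = Tr sqrt(sqrt(sigma) rho sqrt(sigma)), equation (A.1.49).
    Note: scipy's sqrtm can be numerically delicate for singular matrices."""
    s = sqrtm(sigma)
    return np.trace(sqrtm(s @ rho @ s)).real

rho = np.diag([1.0, 0.0]).astype(complex)            # |0><0|
plus = np.array([1, 1], dtype=complex) / np.sqrt(2)
sigma = np.outer(plus, plus.conj())                  # |+><+|

print(fidelity(rho, sigma))   # |<0|+>| = 1/sqrt(2) for pure states
print(fidelity(rho, rho))     # 1 when the states coincide
```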
Other metrics common in quantum machine learning literature include:

1. Hamming distance, the number of places at which two bit strings are unequal.
Hamming distance is important in error-correcting contexts and quantum com-


munication [250].

2. Trace distance or L1-Kolmogorov distance described in definition A.1.25. Trace


distance is a metric preserved under unitary transformations. It is therefore a
widely used similarity metric in quantum information.

3. Quantum relative entropy (see definition A.1.27) is the quantum analogue of


Shannon entropy [241]. It is given by S(ρ) = −Tr(ρ log ρ). The quantum analogue of (binary) cross-entropy is in turn given by:

S(ρ||σ) = Tr(ρ log ρ) − Tr(ρ log σ) (A.1.52)

These measures provide a further basis for comparing for the output of algo-
rithms to labelled data during training.

We discuss the use and relationship of such metrics to quantum machine learning
algorithms deployed in further chapters in more detail in Appendix D.

A.2 Quantum Control

A.2.1 Overview
Quantum information tasks can be modelled in terms of a two-step experiment: (a) a quantum state and measurement instrument preparation procedure (isolating the quantum system in a particular state); (b) a measurement step where the instrument
interacts with the quantum system to yield measurement statistics. Obtaining such
statistics (by which the state preparation can potentially be confirmed) occurs via
multiple repetitions of such experiments. While quantum systems are represented
distinctly from classical systems, their state preparation, control and measurement
is classically parametrised (e.g. by parameters in R or C). Information can be distin-
guished between classically described and evolving (classical information) as distinct
from being described and evolving according to quantum formalism (quantum infor-
mation). In certain cases, such as with certain classes of quantum machine learning
or variational algorithms [73, 88, 100, 251], parameter registers may themselves be
represented (or stored within) quantum registers. However, as noted above, quan-
tum registers themselves concern distributions and evolution over classical registers
(albeit with certain non-classical features). Thus ultimately quantum information
processing, and tasks involving quantum systems, involve constructions using clas-
sical information.

As Wiseman et al. note [144] the preparation-evolve-measure procedure of quan-


tum parameter estimation can be framed as one where a quantum system mediates
(and transforms) the classical parameters of a state preparation procedure to the
classical measurement statistics. Construed as a transformation mediating the tran-
sition from classical (input) to classical (measurement/output) information, quan-
tum systems can be seen as establishing constraints upon the accuracy of the estima-
tion of such parameters (albeit ontological limits given the fundamental grounding of
physics in quantum mechanics). For example, the synthesis of a particular computation or state can be modelled as the task of quantum parameter estima-
tion where parameters θ ∈ C (such as control pulses, see below) are estimated from
measurement outcomes of a state whose evolution is modelled as:

ρ0 → ρT = e−iHt ρ0 eiHt = U ρ0 U † (A.2.1)


Here the Hamiltonian is parameterised by θ as $H = \sum_k \theta_k G_k$ where G_k are generators
(see below). Assuming ρ0 is (sufficiently) known along with Gk , then ascertaining
the set of controls θk to (optimally - say minimising time or energy) synthesise ρT is
contingent on estimates θ̂ via minimising a cost function |θk − θ̂k |. The estimates θ̂
are themselves usually constructed via comparing the measurement statistics of the
estimate ρ̂T against ρT . This is also the case when seeking to estimate, for example,
state preparation parameters from measurement statistics. Thus constraints upon
measurement statistics act as constraints or bounds upon how accurately quantum
parameter estimation (and tasks dependent on it, such as unitary synthesis) may
be performed. In this work, we primarily base our optimisations (in later Chapters)
upon the fidelity measure (discussed below), assuming the existence of an (optimal)
measurement protocol by which to construct our estimates of, for example, target
unitaries ÛT . Below we set out a few of the key assumptions and features of quantum
control formalism used in later Chapters. We revisit this formalism in Appendices
below in terms of Lie theoretic and differential geometric models of quantum control.
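To make the above concrete, the following minimal sketch (our own illustration; the choice of Pauli generators, the target parameters and the function names are assumptions of the example) frames state synthesis as estimation of θ against a fidelity-based cost:

import numpy as np
from scipy.linalg import expm

# A choice of generators G_k for illustration: the Pauli matrices
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
generators = [X, Y, Z]

def evolve(theta, rho0, t=1.0):
    # rho_T = e^{-iHt} rho_0 e^{iHt} with H = sum_k theta_k G_k (hbar = 1)
    H = sum(th * G for th, G in zip(theta, generators))
    U = expm(-1j * H * t)
    return U @ rho0 @ U.conj().T

rho0 = np.array([[1, 0], [0, 0]], dtype=complex)   # pure state |0><0|
theta_true = np.array([0.3, 0.0, 0.7])
rho_T = evolve(theta_true, rho0)                   # target state

def cost(theta_hat):
    # State infidelity 1 - Tr(rho_T rho_hat); the overlap Tr(rho sigma)
    # equals the fidelity here because both states remain pure.
    return 1.0 - np.real(np.trace(rho_T @ evolve(theta_hat, rho0)))

print(cost(theta_true), cost(theta_true + 0.1))    # zero at the true controls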

A.2.2 Evolution, Hamiltonians and control

Our work and results in later Chapters are focused upon typical quantum control
problems [15,46,144], where it is assumed there exist a set of controls (such as pulses
or voltages) with which the quantum system may be controlled or steered towards a
target state (or unitary). Recall that in density matrix formalism, the Schrödinger
equation in terms of a density operator and Hamiltonian is:

i dρ(t)/dt = [H(t), ρ(t)]   (A.2.2)

In physical systems, the Hamiltonian corresponds to the total energy (sum of kinetic and potential
energies) of the system under consideration. In control theory, we can separate
out the Hamiltonian in equation (A.2.2) into drift and control forms. For a closed
system (i.e. a noiseless isolated system with no interaction with the surrounding
environment), it can be expressed in the general form:

H(t) := H0(t) = Hd(t) + Hctrl(t)   (A.2.3)

Hd (t) is called the drift Hamiltonian and corresponds to the natural evolution of
the system in the absence of any control. The second term Hctrl (t) is called the
control Hamiltonian and corresponds to the controlled external forces we apply to
the system (such as electromagnetic pulses applied to an atom, or a magnetic field
applied to an electron). This is also sometimes described as an interaction Hamil-
tonian (representing how the system interacts with classical controls e.g. pulses or
other signals). This allows us to define a control-theoretic form of the Schrödinger
equation.

Definition A.2.1 (Control (Schrödinger) equation). Given quantum state ρ ∈ B(H)


with Hamiltonian given by H0 (t) = Hd (t)+Hctrl (t) define the control form of equation
(A.2.2) as:


i dρ(t)/dt = [Hd(t) + Hctrl(t), ρ(t)]   (A.2.4)

The solution of the evolution equation at time t = T is given by:

ρ(T ) = U (T )ρ(0)U † (T ), (A.2.5)

where ρ(0) is the initial state of the system and the unitary evolution matrix U (t) is
given by equation (A.1.18). We discuss control and its application in some detail in
later Chapters, specifically relating to geometric quantum control as expressed by
formal (Pontryagin) control theory where targets are unitaries UT ∈ G for unitary
groups G. As D’Alessandro [15] notes, the typical quantum control methodology
covers: (a) obtaining the Hamiltonian of the system in an appropriate form, (b)
identifying the internal and interaction components of the Hamiltonian (and often
calculating the energy eigenstates for a time-independent approximation) and (c)
specifying a finite dimensional and bounded control system for control coefficients.

A.2.3 Control systems and strategies

We sketch some of the discussion of control theory in later Chapters here. D’Alessandro
[15] notes that a general control system has the following form (which as we discuss
is due to Pontryagin):

ẋ = f (t, x, u) (A.2.6)

where x represents the system state, f is a vector field while u = u(t) are real-
valued time varying (or constant over small ∆t interval) control functions. The
state is in general not directly accessible, but only via some other function or channel, e.g. g(x),
say a measurement operation. In quantum settings, equation (A.2.6) is described
by the Schrödinger equation. In unitary form this is equivalent to:

ẋ ∼ U̇,   f(x, t, u) ∼ −iH(u(t))U   (A.2.7)

where H(u(t)) is a Hamiltonian comprising controls and generators (for us, drawn
from a Lie algebra g). The drift Hamiltonian Hd and control Hamiltonian Hc com-
bine as:
H(u) = Hd + Σk Hk uk   (A.2.8)

where f in equation (A.2.7) is thought of as a linear function of x and an affine


function of the controls u. For ensembles, the evolution can be described via control
Hamiltonians acting upon the set of density matrices as well. Overall the generic
control solution is then:

ρ(t) = U(t)ρ(0)U†(t),   U̇ = −iH(u)U,   U(0) = I   (A.2.9)

with solutions given by:


U̇ = −i (Hd + Σk Hk uk) U   (A.2.10)

Such solutions are drawn from Jurdjevic [23] in the geometric control theory litera-
ture. These are a major focus of our final chapter, where we show equivalent results
can be obtained for certain symmetric space quantum control problems by using a
global Cartan decomposition together with certain variational techniques. The form
of equation (A.2.9) can be easily construed in an information-theoretic fashion, such
as when logic gates (e.g. upon qubits) are sought to be engineered.
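As a sketch of how such solutions are computed numerically (our own minimal example; the drift term, control set and pulse amplitudes are assumptions), equation (A.2.10) can be integrated for piecewise-constant controls as a time-ordered product of matrix exponentials:

import numpy as np
from scipy.linalg import expm

X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

H_drift = 0.5 * Z       # drift Hamiltonian H_d (an assumed example)
H_controls = [X]        # control Hamiltonians H_k

def propagator(pulses, dt):
    # pulses has shape (n_steps, n_controls): amplitudes u_k held constant
    # over each interval dt. Later time slices act on the left, giving a
    # time-ordered product approximating U(T).
    U = np.eye(2, dtype=complex)
    for u in pulses:
        H = H_drift + sum(uk * Hk for uk, Hk in zip(u, H_controls))
        U = expm(-1j * H * dt) @ U
    return U

pulses = 0.8 * np.ones((100, 1))       # a constant-amplitude pulse train
U_T = propagator(pulses, dt=0.01)
print(np.allclose(U_T.conj().T @ U_T, np.eye(2)))   # unitarity check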
As noted in [15], an important distinction between classical and quantum control is the unavailability of classical feedback control in the quantum case as a result of the measurement axiom A.1.5, namely the collapse of ρ → ρ′: the modification of the
state ρ by measurement means that evolution must be considered as starting from
a different initial condition ρ′(0). Instead, we focus on open-loop control schemes
which rely upon an a priori control system. In this case, control inputs are applied
without observing the current state ρ(t), meaning no adjustment is made during the
control strategy (the sequence of controls and their characteristics), such as to the
sequence or amplitude of control pulses. This is distinct from feedback control where
the control strategy is dynamically adaptable based on the state (e.g. as say in
reinforcement learning).
We also note other conditions regarding reachability of targets UT ∈ G. In
particular, we rely upon the quantum recurrence theorem [252].

Definition A.2.2 (Quantum Recurrence Theorem). For any U ∈ G, where G represents a unitary Lie group acting on H, with initial condition U(0), there exist ϵ, T > 0 such that:

∥U(0) − U(T)U(0)U(T)†∥ < ϵ,

where ∥·∥ denotes the operator norm. This theorem is denoted the quantum recurrence theorem.

The action of the group element U ∈ G on H is the quantum analogue of the classical flow in dynamical systems. Indeed the quantum recurrence theorem is the quantum analogue of the Poincaré classical recurrence theorem, which states that any Hamiltonian system defined on a finite phase space will eventually evolve to within arbitrarily small proximity of its initial state. The quantum recurrence theorem guarantees that quantum evolution under the action of G (via conjugation) will evolve the quantum state U arbitrarily close to U(0) after some finite time T. This reflects the quasi-periodic nature of the evolution in a finite-dimensional Hilbert space and follows from the spectral properties of the unitary group on such a space.
Before moving to our next Chapter on Lie groups and associated algebraic concepts,
we mention a few concepts from open quantum systems relevant to later Chapters.

A.3 Open quantum systems

A.3.1 Overview
We conclude this background Appendix with a brief synopsis of open quantum
systems and quantum control. This is of particular relevance to Chapter 3, where our
QDataSet models the influence of noise upon one- and two-qubit systems. Moreover,

it is of seminal importance in any quantum information processing task. As Wiseman and Milburn note, projective measurements (see above) are inadequate for describing real measurements because, among other reasons, such measurements never directly measure the system of interest as a closed system [144]. Rather, the system always
interacts with an environment. Even if this logic is extended back to the interaction
of the observer themselves, some truncation of the phenomena into system (measured
via applying the measurement postulates above for projective measurement) and
environment must be made (so-called Heisenberg cuts). While the focus of this work
is on geometric and machine learning techniques, we briefly summarise a few relevant
and key aspects of open quantum systems modelling. This is again of relevance to
Chapter 3 in relation to how our quantum simulations explicitly factor in a variety
of noise sources. Most of our results in Chapters 3 and 4 assume a simplified control
problem, one of a closed quantum system, while Chapter 3 demonstrates (through
the VO operator) how machine learning can be used to learn those features of an
open or noisy system in a way that can allow for mitigation or noise cancellation to
a degree.

Open quantum systems construct an overall quantum system comprising the


system (or closed quantum evolution) and the environment (i.e. interactions with
the environment). A simple Hamiltonian model for such open systems is:

H(t) := H0(t) + H1(t) = [Hd(t) + Hctrl(t)] + [HSE(t) + HE(t)]   (A.3.1)

H0 (t) is defined as before to encompass the drift and control parts of the Hamilto-
nian. The new term H1 (t) now consists of two terms: HSE (t) represents an inter-
action term with the environment, while HE (t) represents the free evolution of the
environment in the absence of the system. In this case, the Hamiltonian controls the
dynamics of both the system and environment combined in a highly non-trivial way.
In other words, the state becomes the joint state between the system and environ-
ment. The combined system and environment then become closed. Modelling such
open quantum systems is complex and challenging and is typically undertaken using
a variety of stochastic master equations [144] or sophisticated noise spectroscopy.
As detailed in Chapter 3, the QDataSet contains a variety of noise realisations for
one and two qubit systems together with details of a recent novel operator [82] for
characterising noise in quantum systems. As such we briefly summarise a few key
concepts relating to open quantum systems. Detail can be found in Wiseman and
Milburn [144] and other standard texts.

A.3.2 Noise and quantum evolution


Of importance to practical development and applications of quantum computing,
especially the control of quantum systems, is understanding the role of noise and
other environmental interactions via the theory of superoperators, which are defined
as follows.
Definition A.3.1 (Superoperators). A superoperator S is a linear map on the space
of bounded linear operators on a Hilbert space H given as S : B(H) → B(H),
 7→ S Â.
For a superoperator S to correspond to a physical process, such as a quantum
channel, it must satisfy the following: (a) trace preserving or decreasing, that is,
for any state ρ we have that 0 ≤ Tr(Sρ) ≤ 1, (b) convex linearity, that is, for
probabilities {pj }:
S(Σj pj ρj) = Σj pj S(ρj)   (A.3.2)

and (c) complete positivity, namely S must not only map positive operators to
positive operators for the system but also when tensored with the identity operator
on any auxiliary system R:

(IR ⊗ S)(ρRS ) ≥ 0. (A.3.3)

A superoperator S that satisfies these properties is a CPTP map and therefore a


quantum channel (by definition A.1.29), often represented by the Kraus operators
{Ek } in the form:
S(ρ) = Σk Ek ρ Ek†,   Σk Ek† Ek = I_H   (A.3.4)
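As a brief illustration (a minimal sketch of our own; the amplitude-damping channel is a standard example chosen for concreteness), the Kraus form and its completeness condition can be verified directly:

import numpy as np

def apply_channel(kraus_ops, rho):
    # S(rho) = sum_k E_k rho E_k^dagger for a Kraus representation {E_k}
    return sum(E @ rho @ E.conj().T for E in kraus_ops)

p = 0.3   # decay probability of the amplitude-damping channel
E0 = np.array([[1, 0], [0, np.sqrt(1 - p)]])
E1 = np.array([[0, np.sqrt(p)], [0, 0]])
kraus = [E0, E1]

# Completeness: sum_k E_k^dagger E_k = I (trace preservation)
print(np.allclose(sum(E.conj().T @ E for E in kraus), np.eye(2)))

rho = 0.5 * np.array([[1, 1], [1, 1]])      # |+><+|
rho_out = apply_channel(kraus, rho)
print(np.isclose(np.trace(rho_out), 1.0))   # trace preserved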

A.3.3 Noise and decoherence


Noise effects are of fundamental importance to simulating quantum systems (the
subject of Chapter 5) but also to the ability of machine learning algorithms to learn
patterns in data. As shown above, open quantum systems are described by the
evolution of their density operators. The formal and most common method of
representing such open-state evolution is via the Lindblad master equation which
formalises non-unitary evolution arising as a result of interaction with noise sources,
such as baths. The most general form of the master equation for the density operator
ρ of an open quantum system, preserving complete positivity and the trace of ρ, is
given by the Lindblad equation:

Definition A.3.2 (Lindblad Master Equation). The mixed unitary and non-unitary
time-evolution of a quantum system represented by ρ interacting with its environment
is given by the Lindblad master equation.

dρ/dt = −i[H, ρ] + Σk γk (Lk ρ Lk† − ½{Lk† Lk, ρ})   (A.3.5)

Here ρ is the density matrix of the quantum system, H is the Hamiltonian of the system, dictating the unitary evolution due to the system's internal dynamics, Σk indicates a sum over all possible noise channels k affecting the system and γk are the
rates at which the system interacts with the environment through the k th channel,
quantifying the strength of the non-unitary processes. Of importance are Lk , the
Lindblad operators associated with each noise channel, describing how the system
interacts with its environment and, in particular, how the environment transforms
ρ in a way that affects properties of interest, such as its coherence.
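As an illustration (a minimal sketch of our own, using a first-order Euler integrator and an assumed single dephasing channel Lk = Z), equation (A.3.5) can be integrated numerically for a driven qubit:

import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
H = 0.5 * X                   # an assumed drive Hamiltonian
L_ops, gammas = [Z], [0.05]   # one dephasing channel with rate gamma

def lindblad_rhs(rho):
    # Right-hand side of the Lindblad master equation (A.3.5)
    drho = -1j * (H @ rho - rho @ H)
    for g, L in zip(gammas, L_ops):
        LdL = L.conj().T @ L
        drho += g * (L @ rho @ L.conj().T - 0.5 * (LdL @ rho + rho @ LdL))
    return drho

rho = np.array([[1, 0], [0, 0]], dtype=complex)   # start in |0>
dt, steps = 1e-3, 5000
for _ in range(steps):
    rho = rho + dt * lindblad_rhs(rho)   # explicit Euler step

print(np.real(np.trace(rho)))   # trace remains ~1
print(np.abs(rho[0, 1]))        # off-diagonal coherence damped by dephasing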
Lindblad operators Lk, which describe specific physical processes (e.g., photon emission, absorption, or scattering), are not superoperators themselves, but they are integral components in the definition of the Lindblad superoperator L, which represents the entire evolution of the quantum system including the dissipative dynamics. To understand this within the context of quantum trajectories,
where the evolution of a system is modeled under continuous measurement, the
stochastic master equation can be written as:

dρ = −i[H, ρ] dt + Σk L[Lk]ρ dt + √η K[L]ρ dWt   (A.3.6)

In the equations above, L represents the Lindblad operator, ρ is the density matrix of
the system, η is the efficiency of the measurement, and dWt is the Wiener increment
representing the stochastic process of the measurement that models the infinitesimal
evolution of processes that have continuous paths. The equation comprises:

(i) a Hamiltonian term −i[H, ρ], describing the unitary evolution of a closed sys-
tem;

(ii) a Lindbladian (superoperator) term L[L]ρ, which describes the average effects
of system-environment interactions and is given by:
L[Lk]ρ = γk (Lk ρ Lk† − ½{Lk† Lk, ρ})   (A.3.7)

(iii) a backaction (superoperator) term K[L]ρ, which accounts for the changes in
the system due to measurement and is given by:

√η K[L]ρ = √η (Lρ + ρL† − ρ Tr(Lρ + ρL†)).   (A.3.8)

where η denotes the influence of the backaction term. As Wiseman and Mil-
burn note, in the Heisenberg picture the backaction is manifest in the system
operators rather than the state.

Note for completeness that the Lk terms represent the channels through which the
quantum system interacts with its environment i.e. the quantum trajectories. By
contrast, the backaction term seeks to model the effect of the (usually single) mea-
surement channel (which is why we do not sum over k).

A.3.4 Noise and spectral density


We conclude with a few remarks relevant in particular to discussions of noise spectral
density in Chapter 3 (section 3.5.3.1). In quantum control, complete knowledge of
environmental noise is usually (if not always) impossible to obtain. However, in
certain cases, even partial knowledge of open system dynamics can assist in achieving
gate fidelities. One such example is where the control pulse is band-limited. To
understand this, consider the frequency domain representation of a control pulse
u(t):
F(ω) = ∫_(−∞)^(∞) u(t) exp(−iωt) dt   (A.3.9)

where ω is the frequency of the pulse (e.g. some signal) and F (ω) represents the amplitude
of the control pulse. If F (ω) is confined within a specific frequency band, |ω| ≤ Ω0 ,
then the quantum system’s response to environmental noise can be represented by
the convolution:
I = ∫_(−∞)^(∞) F(ω) S(ω) dω,   (A.3.10)

where S(ω) is the noise power spectral density (PSD), a measure of noise intensity. In
such cases, only noise components within the control’s effective bandwidth, |ω| ≤ Ω0 ,
are relevant to solving the quantum control problem. Connecting with the Lindblad
master equation (see equation A.3.5), each Lindblad operator Lk corresponds to a
type of noise interaction between the quantum system and the environment. The
decoherence rates γk can be derived from correlation functions of noise operators
coupling to the system as represented via the Lk operators. Correlation functions
capture temporal correlations of the noise operators β(t) on the system. The (two-
time or interval) correlation function G(t) is defined as:

G(t) = ⟨β(t)β(0)† ⟩, (A.3.11)

where ⟨·⟩ denotes the expectation with respect to the environmental state, and
β(t) = eiHenv t β(0)e−iHenv t in the Heisenberg picture with Henv being the Hamiltonian
of the environment. The PSD S(ω) is then obtained from the Fourier transform of
G(t):
S(ω) = ∫_(−∞)^(∞) G(t) e^(−iωt) dt.   (A.3.12)

This quantity characterizes the strength of noise or fluctuations in the environment


at frequency ω. As noted in the literature [253, 254], classical PSD indicates the
magnitude of noise at a particular frequency ω, while quantum noise PSD indicates
the magnitude of Golden rule transition rates for emission or absorption events.
Dephasing rates γk can thus be shown to be related to (proportional to) S(ω) (see
[254]). That is, γk is determined by integrating over the PSD frequencies of interest:
γk = η ∫_(−Ω0)^(Ω0) S(ω) dω   (A.3.13)

where the proportionality constant η depends upon specific interactions between the
system and environment, such as coupling constants or environmental energy states.
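As an illustration (a minimal sketch of our own; the synthetic noise model, band limit and sampling parameters are assumptions of the example), the PSD and the band-limited rate integral of equation (A.3.13) can be estimated from a sampled noise trace via its autocorrelation (the Wiener-Khinchin relation):

import numpy as np

rng = np.random.default_rng(0)
dt, n = 1e-3, 2**14

# Synthetic low-pass (Ornstein-Uhlenbeck-like) classical noise beta(t)
tau, beta = 0.02, np.zeros(n)
for i in range(1, n):
    beta[i] = beta[i-1] * (1 - dt / tau) + np.sqrt(dt) * rng.standard_normal()

# Autocorrelation G(t) = <beta(t) beta(0)>, estimated over lags
G = np.correlate(beta, beta, mode="full")[n-1:] / np.arange(n, 0, -1)

# PSD S(omega) as the Fourier transform of G(t) (equation A.3.12)
S = np.real(np.fft.rfft(G)) * dt
omega = 2 * np.pi * np.fft.rfftfreq(n, d=dt)

# Rate proxy: integrate the PSD over the band |omega| <= Omega_0
Omega0 = 2 * np.pi * 50
band = omega <= Omega0
gamma_k = np.sum(S[band]) * (omega[1] - omega[0])   # up to the constant eta
print(gamma_k)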

A.4 Analysis, Measure Theory & Probability

A.4.1 Analysis and Measure Theory


In this section, we include a few concepts of relevance from analysis, measure theory
and probability theory underpinning the theory of quantum information processing
above.

Definition A.4.1 (Compact Set). The set A ⊆ V is compact if, for every sequence
(vj ) in A, there exists a subsequence (vjk ) that converges to a vector v ∈ A i.e.
(vjk ) → v. For finite-dimensional spaces, a set A ⊆ V is compact if and only if it is both closed
and bounded by the Heine-Borel theorem.

Compactness of sets is important for controllability and reachability such as via the Poincaré and quantum recurrence theorems (see definition A.2.2). A ⊆ V is compact if and only if it is both closed and bounded. If A is compact and f : A → R is continuous on A, then f attains a maximum and minimum value on A (the extreme value theorem). For A ⊆ V compact and f : V → W continuous on A, continuous maps preserve compactness i.e. the image f (A) = {f (v) : v ∈ A} is compact.
Given A ⊆ V (C), B ⊆ W (C), we set out some basic properties of measurable sets relevant to various chapters.

Definition A.4.2 (Borel sets). A Borel set or subset C ⊂ A is a set for which (a) C is open in A, (b) C is the complement of another Borel subset or (c) for a countable set {Ck}, Ck ⊂ A, then C = ⋃k Ck.

The set of Borel subsets is Borel(A). Functions f : A → B are Borel if f⁻¹(C) ∈ Borel(A) for all C ∈ Borel(B) (i.e. preimages of Borel sets are Borel). Borel subsets have characteristic functions χC (u) which act as a (binary) classifier where χC (u) = 1 if u ∈ C and 0 otherwise. Borel sets exhibit closure (algebraic)
properties in that scalar multiplication of Borel functions remain Borel, while Borel
functions are closed under addition. These and other properties allow us to define a
(Borel) measure which is important in the underlying probabilistic quantum theories
discussed above.

Definition A.4.3 (Measure). A measure is a map:

µ : Borel(A) → [0, ∞]   (A.4.1)

such that (a) µ(∅) = 0 and (b) (countable additivity) for a countable collection of disjoint Borel sets {Ck}:

µ(⋃k Ck) = Σk µ(Ck)

For probability we are interested in normalised measures µ(A) = 1. A Borel


measure is a measure defined on the Borel σ-algebra of a topological space, which is
the sigma-algebra generated by the open sets (or, equivalently, by the closed sets).
From the measure µ one can also define product measures and so on. An important
concept we briefly mention is that of integrable functions.
For continuous variables or continuous spectrum observables, the probability of
measuring a particular outcome can be described by Borel integrable functions over
the spectrum of the observable. Borel integrable functions permit integrations over
Lie group manifolds in the analysis and optimization of control protocols as set out
in Chapter 5.

Definition A.4.4 (Borel integrable). Given a measure µ, define the integral:

∫ f(u) dµ(u)

An integrable (Borel) function g : A → K with respect to µ is one for which there exist Borel functions f0, f1 : A → [0, ∞) such that:

∫ g(u) dµ(u) = ∫ f0(u) dµ(u) − α ∫ f1(u) dµ(u)   (A.4.2)

where α = 1 for K = R and α = i for K = C. Here g = f0 − αf1.

Integrable functions exhibit certain features such as linearity (with respect to


scalars in K).

A.4.2 Probability measure


We define a probability measure as follows, relevant in particular to quantum information processing.

Definition A.4.5 (Probability measure). A probability measure on a set A is a measure µ : Borel(A) → [0, 1] with µ(A) = 1.

In probability theory, the measurable sets A are typically denoted events. Being a measure, proba-


bility must exhibit countable additivity. Random variables X are then defined as
Borel functions X : A → R distributed with respect to µ. Standard definitions and
properties of probabilities are then defined with respect to this measure theoretic
definition. We then define a few standard related properties.

Definition A.4.6 (Expectation). The expectation value of a random variable X


(with respect to µ) is then defined as the integral:
E(X) = ∫ X(u) dµ(u)   (A.4.3)

with:

E(X) = ∫_0^∞ Pr(X ≥ λ) dλ   (A.4.4)

for X ≥ 0.

A similar formalism then applies for quantum states described via alphabets Σ
with probability vectors p ∈ P(Σ). The random variable X takes the form of a
mapping X : Σ → R such that for Γ ⊆ Σ:
Pr(X ∈ Γ) = Σ_(a∈Γ) p(a)   (A.4.5)

with p(a) the probability of the state described by a vector in Σ. Expectation be-
comes the familiar discretised form E(X) = Σa p(a)X(a). It can be shown that
distribution of states with respect to such probability vectors is equivalent to distri-


bution according to Borel probability measure. A few other concepts used in this
work are set out below.

Definition A.4.7 (Gaussian measure). The standard Gaussian measure is a Borel


measure γ : Borel(R) → [0, 1] where:

γ(A) = (2π)^(−1/2) ∫_A exp(−α²/2) dα   (A.4.6)

The integration measure dα in equation (A.4.6) is the standard Borel measure on


R. A standard normal random variable is such that P r(X ∈ A) = γ(A). For
multidimensional systems, we note also the standard measure on n−dimensional
systems as γn : Borel(Rn ) → [0, 1] such that:

γn(A) = (2π)^(−n/2) ∫_A exp(−||u||²/2) dvn(u)   (A.4.7)

where vn is an n−fold product measure. An important feature of this measure is that


it is invariant under orthogonal transformations; this includes rotations (including those generated by, for example, group elements in specific cases), e.g.:

γn (U A) = γn (A) (A.4.8)

where A ⊆ Rn is Borel and U is an orthogonal operator on Rn. The consequence is that for i.i.d. random
variables Xk , the Gaussian measure projected onto a subspace is equivalent to a
Gaussian measure on that subspace i.e:
Yk = Σ_(j=1)^n Ukj Xj   (A.4.9)

A.4.3 Haar measure


A Borel algebra can also be defined as the σ-algebra generated by all open subsets
of the Lie topological group G (being locally compact and Hausdorff). Left and
right action on S ⊂ G is defined as usual via gS, Sg, with left- and right-translation (measure) invariance being defined as µ(gS) = µ(S) and µ(Sg) = µ(S) respectively. Using this formulation,
we can define the Haar measure [255] in the quantum context as a unitarily invariant
Borel probability measure µH .

Definition A.4.8 (Haar measure). The Haar measure on a locally compact group
G is a measure µH that is invariant under the group action. For the unitary group
U (X ), it is defined as:

µH : Borel(U (X )) → [0, 1]

satisfying for all g ∈ U (X ) and for all Borel sets A ⊆ Borel(U (X )):

µH (gA) = µH (A) = µH (Ag)

The measure is also normalized such that µH (U (X )) = µH (G) = 1.

The Haar measure allows definition of a uniform measure on topologically structured groups, such as locally compact unitary groups. The measure is left- and right-invariant (see the discussion of Lie groups in section B.2.2). Thus, in a quantum mechanical sense, ensuring unitary operators are chosen according to the Haar measure means that (when selecting unitaries) each has an equal probability of being chosen (related to creating random quantum circuits and simulating quantum dynamics). One can also define the Haar measure in terms of operator-valued random variables (see [43]).
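As an illustration (a minimal sketch of our own, using the standard construction that QR-decomposes a complex Ginibre matrix and fixes the phase ambiguity), unitaries can be sampled according to the Haar measure as follows:

import numpy as np

def haar_unitary(n, rng=None):
    # Sample U from the Haar measure on U(n): QR-decompose a complex
    # Ginibre matrix, then rescale columns by unit phases so that the
    # resulting distribution is exactly Haar (not merely unitary).
    rng = rng or np.random.default_rng()
    z = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    d = np.diag(r)
    return q * (d / np.abs(d))

U = haar_unitary(4)
print(np.allclose(U.conj().T @ U, np.eye(4)))   # unitarity check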
Appendix B

Appendix (Algebra)

B.1 Introduction

B.1.1 Overview
A major focus of quantum geometric machine learning is framing quantum control
problems in terms of geometric control where target quantum computations are rep-
resented as UT ∈ G and G is a Lie group of interest. In this generic model, we are
interested then in constructing time-optimal Hamiltonians composed of generators
from the associated Lie algebra g (or some subspace thereof) and associated control
functions u(t) so as to evolve the quantum system towards the target UT in the
most efficient manner possible. Doing so according to optimality criteria, such as
optimal (minimum) time or energy in turn requires a deep understanding of quan-
tum information processing concepts, but also algebraic and geometric concepts at
the heart of much modern quantum and classical physics. By using methods that
leverage symmetry properties of quantum systems, time-optimal solutions can often
be more easily approximated to find solutions or solution forms that would other-
wise be intractable or numerically infeasible to discover. This synthesis of geometry
and algebra is made possible by seminal results in the history of modern mathe-
matics which established deep connections between algebra, analysis and geometry.
Connections between branches of mathematics were prefigured in ancient meditations on natural philosophy and became a central motivation for much of the 19th century's mathematical innovation in geometry, analysis, algebra and number theory that led to the hitherto unparalleled development of the field from Hilbert on-
wards. The widespread application of ideas drawn from these disciplines has been of
significance within physics and other quantitative sciences, including more recently
computer science and machine learning fields.
In this Appendix, we cover the key elements of the theory of Lie algebras and
Lie groups relevant to the quantum geometric machine learning programme and


greybox machine learning architectures discussed further on. The exegesis below is
far from complete and we have assumed a level of familiarity with elementary group
and representation theory. The Appendix concentrates on key principles relevant
to understanding Lie groups and their associated Lie algebras, with a particular
focus on matrix groups and foundational concepts such as bilinear (Killing) forms,
the Baker-Campbell-Hausdorff theorem and other elementary principles. We then
proceed to a short discussion of elements of representation theory, including a discus-
sion of semi-simple Lie group classification, adjoint action, Cartan decompositions
and abstract root systems. The material below is related throughout to the novel
results in this work in later Chapters, as well as the other Appendices. The content
below is also interconnected with the following Appendix C on geometric principles
(originally a single Appendix, it was split for readability), especially regarding differ-
entiable manifolds, geodesics and geometric control. Most of the material is sourced
from standard texts such as those by Knapp [9, 230], Helgason [2], Hall [235] and
others. As with the other Appendices, proofs are largely omitted (being accessible
in the referenced texts) and content is paraphrased within exposition regarding its
applicability to the main thesis. Readers with sufficient background in these topics
may skip this Appendix.

B.2 Lie theory

B.2.1 Overview
An essential concept in physics is symmetry, especially rotational and Lorentz sym-
metry in various physical systems. Symmetries are often described by continuous groups: parameterized groups equipped with a differentiable manifold structure that supports smooth group operations. These are known as Lie groups. The tangent space at the identity
element of a Lie group forms a Lie algebra through a natural bracket operation,
encoding the properties of the Lie group while being more tractable due to its linear
structure.
In quantum mechanics, symmetry is reflected in the Hilbert space through a
unitary action of a symmetry group. This is formalized by a unitary represen-
tation, a continuous homomorphism from a symmetry group G into the group of
unitary operators U ∈ B(H) acting on the quantum Hilbert space H (axiom A.1.1).
However, in reality, physical states correspond to unit vectors in H differing by a
phase, so we use projective unitary representations, mapping G into the quotient
U (H)/U (1), where U (H) is the set of unitaries of H and U (1) is the group of phase
factors, or complex numbers of unit magnitude. This captures the essence of physi-
cal state representations, as an ordinary or projective representation of a Lie group


induces a corresponding representation of its Lie algebra. For example, angular
momentum operators are a representation of the Lie algebra of the rotation group.
Understanding the representation of Lie groups and their algebras in a way relevant
to quantum information processing is therefore central to solving geometric quan-
tum control problems, especially using machine learning. To this end, we set out
background theory below. We formalise these concepts below.

B.2.2 Lie groups


A topological group is a group G equipped with a topology such that the group’s
binary operation and inverse function are continuous functions with respect to the
topology. Such groups exhibit both algebraic structure and topological structure,
allowing algebraic operations (due to group structure) and continuous functions on
the group manifold (due to its topology). A continuous group is one equipped with
continuous group operations i.e. that G is parameterised by a parameter that is
continuous (and even smooth) on its domain e.g. θ ∈ I ⊂ R. If G is Hausdorff (has the T2 topology) and that parameter is differentiable, then G exhibits the structure of a
Lie group (and as we discuss below, that of a differentiable manifold (see definition
C.1.2)).

Definition B.2.1 (Lie groups). A Lie group G is a Hausdorff topological group


equipped with a smooth manifold structure, namely one that supports group opera-
tions that are smooth maps.

Lie groups encapsulate the concept of a continuous group allowing the study
of infinitesimal transformations (such as translations and rotations) and associ-
ated symmetry properties. The ability to analyze infinitesimal transformations
is what connects Lie groups to differential geometry and physics, particularly in
the context of studying continuous symmetries relevant to symmetry reduction.
The corresponding Lie Groups are also equipped with product and inverse maps
µ : G × G → G, ℓ : G → G. An important property of Lie groups is left translation
and the related concept of left invariance, central to the construction of invariant
measures (such as the Haar measure) on groups. For a Lie group G, left translation
by g ∈ G, denoted Lg : G → G, Lg(h) = gh for h ∈ G, is a diffeomorphism from G onto itself. Right translations are similarly defined as Rg(h) = hg. Left invariance of a vector field X is defined by (dL_(g⁻¹h))(Xg) = Xh (i.e. that X commutes with left translations), i.e. if, for all g ∈ G, we have dLg(Xe) = Xg, where e is the identity element in G and dLg is the differential of left translation by g.
Given a smooth function f : G → R, we denote fg (h) = f (gh) for g, h ∈ G
the left translate. A vector field X on G is itself left invariant if (Xf )g = X(fg ).
Such left invariant X form the Lie algebra g of G. The vector field X can then be
regarded as a set of tangent vectors Xg at g ∈ G. The special relationship with the
identity is then further understood as follows: the mapping X → XI (where I is the
identity element of G) is a vector space isomorphism from g to the tangent space at I (mapping vector fields to their vectors at the identity element, thus identifying
the Lie algebra with the tangent space at the identity), thereby (as we discuss
below and in the next Appendix) encoding the structure of the group through its
tangent space at the identity. This isomorphism preserves (and thus ‘carries’) the
Lie bracket, allowing the tangent space at the identity in G to be identified as the
Lie algebra g itself. Thus the entire Lie algebra, and thus group, can be studied in
geometric terms of the infinitesimal translations near the identity.

B.2.3 Matrix Lie groups


We begin with the general linear group GL(n; C) acting on Mn (C), the set of n × n complex matrices.

Definition B.2.2 (General linear group). The group of all invertible n-dimensional
complex matrices is denoted the general linear group:

GL(n; C) = {A ∈ Mn (C)| det(A) ̸= 0}

As we discuss later on, matrix representations of groups are central to many


methods in quantum information processing. We note (as discussed in Chapter 4) that Mn(C) may be identified (locally) with C^(n²) = R^(2n²) via representing another dimension for the imaginary terms (each complex matrix element represented by two real numbers). A matrix Lie group is defined as a closed subgroup of GL(n; C)
i.e. each such group is a real-valued submanifold of the general linear group (closed
here means in the topological sense, such that limit points of sequences remain
within the subgroup). To study transformations using such groups, we rely upon
group homomorphisms.

Definition B.2.3 (Lie group homomorphism). Given matrix Lie groups G1 and
G2 , a Lie group homomorphism φ : G1 → G2 is a continuous group homomorphism.
Such a homomorphism is an isomorphism if it is bijective and with a continuous
inverse. Lie groups for which there exists such an isomorphism are equivalent (up
to the isomorphism).

Important groups include the group of n−dimensional invertible real-valued ma-


trices GL(n, R), and the special linear groups SL(n, C) and SL(n, R) with determinant 1. Of particular importance is the unitary group defined below.

Definition B.2.4 (Unitary matrix and unitary group). An operator U ∈ Mn (C)


is unitary if U † U = U U † = I. The set of unitary operators form the unitary group
denoted U (n).

This group is the group formed by the set of unitary operators in definition
A.1.18. Unitary operators preserve inner products such that U is unitary if and
only if ⟨U v, U w⟩ = ⟨v, w⟩ for all v, w ∈ H. The special unitary group SU (n) is the
set of all U ∈ U (n) such that det U = 1. For the geometric study of unitary sequences
(for minimisation and time-optimal problems), we characterise such sequences as
paths on a Lie group manifold which is sufficiently connected to enable paths to be
well defined.

Proposition B.2.1 (Connected Lie Group). A (matrix) Lie group G is connected


if for all A, B ∈ G there exists a continuous path γ : [0, 1] → Mn (C) such that
γ(0) = A and γ(1) = B with γ(t) ∈ G for all t.

The connectedness of a Lie group is crucial to its representation in terms of a


differentiable manifold with suitable topology for quantum information tasks. We
characterise G as simply connected if it is connected and every continuous loop
(i.e. a curve such that γ(0) = γ(1)) in G can be shrunk continuously to a point g ∈ G. A matrix Lie group G is considered compact if it is compact as a subset of Mn(C) ≅ R^(2n²).
In particular, of relevance to later Chapters, we note that the groups O(n), SO(n), U (n)
and SU (n) are compact. U (n) is connected allowing continuous paths between
U (0) → U (t). This is an important aspect of controllability (and reachability) of
quantum states in later Chapters. It also impacts the learnability of unitary se-
quences from training data (such as discussed in Chapter 4) i.e. non-compactness
can affect the ability of a model to properly estimate or learn mathematical struc-
ture required for learning optimal controls. The group SU (2), the key group for
qubit-based quantum computation, is defined as:
SU(2) = { ( α  −β̄ ; β  ᾱ ) : α, β ∈ C, |α|² + |β|² = 1 }.   (B.2.1)

For qubits, SU (2) allows for a consistent and complete description of qubit states and
their transformations, which in turn is essential for the design and understanding of
quantum algorithms and the behaviour of quantum systems. This is also the case for
multi-qubit systems, which can be described in terms of tensor products of SU (2).
For multi-qubit computations (and even single-qubit ones), we are often interested
in decomposition into a sequence of single-qubit gates. The simply connected nature
of gates drawn from SU (2) guarantees the density of universal gate sets, allowing
approximation of (and controllability for) relevant target unitaries to any desired
accuracy. We now set out below a few standard algebraic facts about Lie groups and
Lie algebras (sourced from [235] and [9]). In particular we emphasise concepts related
to characterising quantum unitary sequences as curves along Lie group manifolds
generated by the corresponding Lie algebra. First, we list a few properties of the
matrix exponential of relevance.

Theorem B.2.5 (Matrix Exponential Properties). The following are properties of


the matrix exponential for X, Y ∈ Mn (C):

(i) e0 = I
(ii) e^(X^Tr) = (e^X)^Tr and e^(X†) = (e^X)†

(iii) For X ∈ Mn (C) and invertible Y ∈ Mn (C):

e^(Y X Y⁻¹) = Y e^X Y⁻¹.

(iv) det(eX ) = eTr(X)

(v) If XY = Y X then [X, Y ] = 0, so e^(X+Y) = e^X e^Y

(vi) eX is invertible and (eX )−1 = e−X

(vii) If XY ≠ Y X, we have the Lie product formula:

e^(X+Y) = lim_(m→∞) (e^(X/m) e^(Y/m))^m.
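These properties are straightforward to verify numerically; the following minimal sketch of our own checks properties (iv) and (vii) for random complex matrices:

import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(1)
X = 0.3 * (rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3)))
Y = 0.3 * (rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3)))

# Property (iv): det(e^X) = e^{Tr(X)}
print(np.isclose(np.linalg.det(expm(X)), np.exp(np.trace(X))))

# Property (vii), the Lie product formula: e^{X+Y} = lim (e^{X/m} e^{Y/m})^m
m = 2000
approx = np.linalg.matrix_power(expm(X / m) @ expm(Y / m), m)
print(np.linalg.norm(approx - expm(X + Y)))   # small, and shrinks as m grows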

B.2.4 Lie algebras


Lie algebras are a fundamental mathematical concept for describing the evolution
of quantum systems, geometric structure and time-optimal control. As we expand
upon below, Lie algebras provide a bridge to study symmetry groups by way of their
matrix representations and transformations thereof.

Definition B.2.6 (Lie algebra and Lie derivative). An algebra g is a vector space
over a field K equipped with a bilinear form (product) [X, Y ] for X, Y ∈ g that is
linear in both variables. Such an algebra is a Lie algebra if:

(a) [X, X] = 0 and [X, Y ] = −[Y, X], ∀X, Y ∈ g; and

(b) the Jacobi identity is satisfied, namely:

[[X, Y ], Z] + [[Y, Z], X] + [[Z, X], Y ] = 0 (B.2.2)



The product [X, Y ] above is denoted the Lie derivative (sometimes the Lie
bracket). In quantum information contexts, the Lie derivative is denoted the com-
mutator. The commutator exhibits a number of properties set out below.

Proposition B.2.2 (Lie bracket properties). Properties of Lie bracket (commuta-


tor) include:

1. bilinearity (linearity in both arguments): [αA, B + βC] = α[A, B] + αβ[A, C] for α, β ∈ C;

2. antisymmetry [B, A] = −[A, B];

3. derivability (Leibniz rule) [A, BC] = [A, B]C+B[A, C] (related to derivations).

In geometric contexts, the Lie bracket (commutator) is equated with the Lie
derivative, satisfying the conditions above. We use these terms interchangeably
as appropriate for the context. Note that for g ⊂ A (where A is an associative
algebra), g can be constructed as a Lie algebra using the Lie bracket [x, y] = xy −
yx, ∀x, y ∈ g (in such cases A is the universal enveloping algebra of g). An important
characteristic of Lie algebras is the action of a Lie algebra upon itself as represented
by the adjoint action, realised in the form of the Lie derivative.

Definition B.2.7 (Adjoint action). The adjoint action of a Lie algebra upon itself
ad : g → EndK (g) is given by:

adX (Y ) = [X, Y ] (B.2.3)

where EndK (g) represents the set of K-scalar endomorphisms of g upon itself and
X, Y ∈ g.

More intuition about the relationship between the Lie derivative and differential forms is provided by showing that the adjoint action is a derivation: satisfying the Jacobi identity is equivalent to:

adZ [X, Y ] = [X, adZ (Y )] + [adZ (X), Y ] (B.2.4)

The adjoint action is a homomorphic endomorphism of g i.e. ad[X,Y ] = adX adY −


adY adX with kernel the center Zg defined below.
In quantum information processing, Lie algebras tend to be represented via rep-
resentations of G ⊂ GL(n, C) or its closed subgroups (closed linear group). An
important result in the theory of Lie groups is that the Lie algebra g of G is canon-
ically isomorphic with the linear Lie algebra gl(n, C) i.e. µ : g → gl(n, C) such that
X ∈ g can be represented in terms of real and imaginary components in gl(n, C). Thus the
behaviour of gl(n, C) may be used to study that of g and thus G of interest.

Definition B.2.8 (Lie algebra homomorphism). A linear map φ : g → h where:

φ([X, Y ]) = [φ(X), φ(Y )] (B.2.5)

where a, b ⊂ g:

[a, b] ={[X, Y ]|X ∈ a, Y ∈ b} (B.2.6)

is defined to be a Lie algebra homomorphism from g to h.


We then can define subalgebras (of critical importance to our discussion of Cartan
methods below and control subsets) and ideals.
Definition B.2.9 (Lie subalgebras and ideals). A set h ⊂ g is a Lie subalgebra if
[h, h] ⊆ h, in which case h is also a Lie algebra. A subalgebra is an ideal if [h, g] ⊆ h.
Lie algebras can also be written as direct sums of the underlying vector spaces
g1 , g2 equipped with an (induced) Lie bracket (adjoint preserving Killing form):

[(X1, X2), (Y1, Y2)] = ([X1, Y1], [X2, Y2])   (B.2.7)

for Xk , Yk ∈ gk .
For Cartan decompositions, we require the classification of semi-simple Lie groups.
This in turn requires other sundry definitions.
Definition B.2.10 (Centralizer and Normalizer). The centralizer Zg (s) of s ⊂ g is
a subalgebra of g comprising elements that commute with all of s:

Zg (s) = {X ∈ g|[X, Y ] = 0, ∀Y ∈ s} (B.2.8)

The normalizer of s in g is the set of all elements of g whose commutator with s is in s:

Ng (s) = {X ∈ g|[X, Y ] ∈ s, ∀Y ∈ s} (B.2.9)

The center Zg is the set of X ∈ g that commute with all elements in g. To understand semi-simple Lie algebras, we can define nested commutators with a commutator series and lower central series as:

g^0 = g,   g^(j+1) = [g^j, g^j],   g = g^0 ⊇ g^1 ⊇ ...   (commutator series)   (B.2.10)

g_0 = g,   g_(j+1) = [g, g_j],   g = g_0 ⊇ g_1 ⊇ ...   (lower central series)   (B.2.11)

From these definitions we obtain the concepts of g being solvable if g^j = 0 for some j and nilpotent if g_j = 0 for some j (nilpotent implies solvable). We can now define
simple and semi-simple Lie algebras as follows:

Definition B.2.11 (Simple and semisimple). A finite-dimensional g is simple if g


is nonabelian and has no proper non-zero ideals. We say g is semisimple if g has
no nonzero solvable ideals. A Lie group G is said to be semisimple if its Lie algebra
g is semisimple.

This last test for semisimplicity is equivalent to the radical vanishing, rad g = 0. Simple g are semisimple by construction. Such algebras are closed under commutation, [g, g] = g (important for control). Semisimple g have trivial centers. The criterion
for semi-simplicity is related, together with the Killing form below, to Cartan's sem-
inal classification of Riemannian symmetric spaces (see section C.3) which is the
subject of our work in Chapter 5.

B.2.5 Killing form


We now define the Killing form, a bilinear form defined on g which plays a cen-
tral role in allowing algebraic concepts to be connected to geometric analogues and
enabling classification of algebraic structures. In particular, as we discuss below,
conditions on the Killing form allow its association as a Riemannian or subRieman-
nian metric (used to calculate minimal path lengths, and thus time, of curves in G)
for symmetric spaces the subject of our final chapter.

Definition B.2.12 (Killing form). The Killing form is a bilinear transformation


given by:

B(X, Y ) = Tr(adX adY )   (B.2.12)

for X, Y ∈ g.

The Killing form is an important element in Cartan’s classification of semi-simple


Lie algebras and, among other things, establishing appropriate metrics (and inner
products) on g for variational methods discussed later on. Additionally, Cartan’s
criterion for semisimplicity is that the Killing form is non-degenerate, that is g is
semisimple if and only if the Killing form for g is non-degenerate, namely that B(X, Y ) = 0 for all Y ∈ g implies X = 0. Other concepts we briefly note are that g is reductive if for each
ideal a ⊂ g, there exists another ideal b ⊂ g such that g = a ⊕ b (i.e. can be written
as the direct product of ideals). It can be shown this is equivalent to g being closed
under the conjugate transpose (adjoint) action. Under such conditions, the Killing
form can be used to define an inner product via (X, Y ) = −B(X, Y ) and thus a
metric. Moreover, the Killing form is invariant under the adjoint action of the Lie
group, meaning B(Adg X, Adg Y ) = B(X, Y ) for all g ∈ G and X, Y ∈ g. This
invariance property of the Killing form ensures the induced metric is well-defined
across the manifold G. We explore this in our final chapter.
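As an illustration (a minimal sketch of our own for su(2); the basis choice and normalisation are assumptions of the example), the Killing form can be computed directly from the matrices of the adjoint action and compared with the known su(n) formula B(X, Y) = 2n Tr(XY):

import numpy as np

# Basis of su(2): anti-Hermitian traceless matrices e_j = -i sigma_j / 2
sigmas = [np.array([[0, 1], [1, 0]], dtype=complex),
          np.array([[0, -1j], [1j, 0]]),
          np.array([[1, 0], [0, -1]], dtype=complex)]
basis = [-0.5j * s for s in sigmas]

def bracket(A, B):
    return A @ B - B @ A

def coords(V):
    # Coordinates in the chosen basis, using Tr(e_i e_j) = -delta_ij / 2
    return np.array([-2 * np.trace(e @ V) for e in basis])

def ad_matrix(V):
    # Matrix of ad_V = [V, .] acting on su(2) in the chosen basis
    return np.column_stack([coords(bracket(V, e)) for e in basis])

def killing(V, W):
    return np.trace(ad_matrix(V) @ ad_matrix(W))

V = basis[0] + 2 * basis[2]
W = basis[1] - basis[0]
print(np.isclose(killing(V, W), 4 * np.trace(V @ W)))    # B = 2n Tr(XY), n = 2
print(np.real(killing(basis[0], basis[0])))  # -2: negative definite (compact form)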

B.2.6 Matrix exponential


The matrix exponential is an important feature in the theory of Lie algebras and Lie
groups (especially regarding their connection), acting as a fulcrum concept linking, both operationally and semantically, algebraic theory to quantum theory. The
various representations of the matrix exponential serve a variety of uses, such as in
terms of infinite power series (especially relating to the use of the adjoint operator
in Chapter 5) and as group elements. It is defined below.

Definition B.2.13 (Matrix exponential). The matrix exponential is defined by the


following power series:

e^X = Σ_(n=0)^∞ X^n / n!   (B.2.13)

This can usefully be expressed as:


e^X = lim_(n→∞) (I + X/n)^n   (B.2.14)

where X ∈ g or some other vector space, such as Mn (C). More intuition can be
obtained via the derivative of the matrix exponential function:

(d/dt) e^(tX) |_(t=0) = X.   (B.2.15)

Equation (B.2.15) provides an important way of understanding, via the represen-


tation of matrix groups, the connection between Lie algebras and Lie groups. It
indicates that the derivative at the identity element of the group is precisely the Lie
algebra element X, demonstrating how Lie algebras serve as the infinitesimal gener-
ators of Lie group actions. Moreover, the one-parameter subgroup theorem below,
stating that for every element X in the Lie algebra there corresponds a unique one-
parameter subgroup in the Lie group with A(t) = exp (tX), explicitly reveals the
relationship between the algebraic structure of Lie algebras and the topological and
geometric structure of Lie groups. This unique mapping via the exponential allows
the exploration of continuous symmetries of groups by way of their corresponding
Lie algebra. We note then the definition of a one-parameter subgroup of G.

Definition B.2.14 (One-Parameter Subgroup of GL(n; C)). A one-parameter sub-


group of GL(n; C) is denoted A(·) and is a continuous homomorphism of R → GL(n; C). Equivalently, it corresponds to a continuous map A : R → GL(n; C) such that A(0) = I and A(s + t) = A(s)A(t) for s, t ∈ R.

It can then be shown that there exists a unique element of the Lie algebra
(representation) used to establish the relationship between a Lie algebra and Lie
group.

Theorem B.2.15 (One-Parameter Subgroup Exponential Representation). If A(·)


is a one-parameter subgroup of GL(n; C), then there exists a unique X ∈ Mn (C)
such that A(t) = exp (tX) parametrised by t ∈ R.

These relations show how continuous symmetries of Lie groups may be studied
via the Lie algebra as a tangent space at the identity element of the group. In
quantum information contexts, the matrix exponential allows the study of system’s
dynamics via mapping the Hamiltonian H (see definition A.1.33) to group elements
represented as unitary operators (definition A.1.18). We now formalise this relation-
ship.

B.2.7 Lie algebra of matrix Lie group


Lie algebras are associated with Lie groups via the exponential map.

Definition B.2.16 (Lie Algebra of a Matrix Lie Group). Given a matrix Lie group
G ⊆ GL(n; C), the Lie algebra g for G is defined as follows:

g = {X ∈ Mn (C) | etX ∈ G for all t ∈ R}.

Note that X ∈ g if and only if the one-parameter subgroup is in the closure of G.


For X ∈ g, it is sufficient that etX ∈ G for all t ∈ R. For any matrix Lie group G,
the Lie algebra g of G has the following properties. The zero matrix 0 belongs to g.
For all X in g, tX belongs to g for all real numbers t. For all X and Y in g, X + Y
belongs to g. For all A ∈ G and X ∈ g we have AXA−1 ∈ g. For all X and Y in g,
the commutator [X, Y ], defined by

[X, Y ] := XY − Y X = (d/dt) e^(tX) Y e^(−tX) |_(t=0)   (B.2.16)

belongs to g. The first three properties of g say that g is a real vector space. Since
Mn (C) is an associative algebra under the operation of matrix multiplication, the last
property of g shows that g is a real Lie algebra. Equation (B.2.16) also indicates that
the Lie algebra g is the set of derivatives at t = 0 of all smooth curves γ(t) ∈ G where
such curves equal the identity at zero. It thus provides a bridge between typical
algebraic formulations of commutators and geometric representations of tangents.
In the next section we provide more detail on the relationship between Lie theory
and differential geometric concepts such as curves. Note that we can define the Lie

algebras of u(n) and su(n):

u(n) = {X ∈ Mn (C)|X † = −X} (B.2.17)


su(n) = {X ∈ u(n)|Tr(X) = 0} (B.2.18)

The significance of the exponential map can be further understood as follows. Group
actions Φ : G → G have a corresponding map in the Lie algebra ϕ : g → g such
that:

Φ(etX ) = etϕX (B.2.19)

where ϕ(X) = (d/dt) Φ(e^(tX))|_(t=0) (implying smoothness for each X). Here ϕ is a linear
homomorphism such that:

ϕ([X, Y ]) = [ϕ(X), ϕ(Y )],   ϕ(AXA⁻¹) = Φ(A)ϕ(X)Φ(A)⁻¹   (B.2.20)

where A ∈ G and X ∈ g. For a group G and corresponding algebra g, X ∈ g if and


only if eX ∈ G.
A key feature in quantum control contexts is framing control of the evolution of
quantum systems in terms of paths over unitary Lie group manifolds. For this to be
the case, in essence we require that Lie group homomorphisms be characterised or
determined by Lie algebra homomorphisms - that is, we essentially want guarantees
that arbitrary paths along the manifold can be generated by applying controls to
generators in g. This is particularly the case when the Lie groups of interest exhibit
topological properties of being simply connected (meaning any closed curve in G can be shrunk to an arbitrary element of G itself, that is, there are no 'holes'). In
a control context, the simply connected nature of the underlying group manifold is
important for reachability of targets. If a group G is connected and simply con-
nected, then it can be shown that the Lie algebra and Lie group are related such
that there is a unique mapping Φ such that the group-algebra homomorphism corre-
spondence set out in equation (B.2.19) holds. From a geometric perspective, the
following relationship between Lie algebras and tangent spaces is set out.

Theorem B.2.17 (Lie algebra tangent space correspondence). Each Lie algebra g of
G is equivalent to the tangent space to G at the identity. The algebra g is equivalent
to the set of X ∈ Mn (C) such that there exists a smooth curve γ : R → Mn (C) ⊆ G
with γ(0) = I, γ ′ (0) = X.

This can be seen given equation (B.2.18) in concert with γ(t) = eX(t) (such that
γ ′ (0) ∈ g). Moreover, for a matrix Lie group that is connected, then it can be shown
that for γ ∈ G, there is a finite set of elements in g that generate γ. These facts are

important to the characterisation and control of quantum unitary sequences (and


circuits) in terms of paths along manifolds in Chapters 4 and 5. The relationship
between Lie algebra and group homomorphisms is not simply one-for-one, as can
be seen from the fact that su(2) ≃ so(3) while SU (2) and SO(3) are not isomorphic (see
literature [9, 230] etc for more detail). We note finally for completeness that on
occasions one is interested in the universal cover G̃ of a group G which effectively
is a simply connected matrix Lie group which ‘covers’ the space of interest (in the
sense that there is a group homomorphism between the groups and isomorphism
among their algebras that allows the study or manipulation of group G, which may
not be simply connected, via G̃ and g̃).

B.2.8 Homomorphisms

The relationship between Lie algebras and Lie groups via the exponential map is
considered a lifting of homomorphisms of Lie algebras to homomorphisms of (an-
alytic) groups. That is, for G, H analytic subgroups, G simply connected and the
Lie algebra homomorphism φ : g → h, then there exists a smooth homomorphism
Φ : G → H such that dΦ = φ. As [9] notes, there are two equivalent ways to express
such a lifting: either relying on lifting homomorphisms to simply connected analytic
groups, or relying on existence theorems for sets of solutions to differential equa-
tions. The first approach involves defining curves γ : R → G with γ(t) 7→ exp(tX).
One then defines d/dt and X̃ as left invariant fields on R, G such that:
 
dγ(t)(d/dt) f = (d/dt) f(γ(t)) = (d/dt) f(exp(tX)) = X̃f(exp(tX))

hence we see the important explicit relationship between the left-invariant vector
fields X̃ and d/dt. Among other things, this shows the mapping X → exp(X)
is smooth (given the smoothness of d/dt). Expressed in local coordinate charts,
the equation above represents a system of differential equations satisfied by curves
γ(t) represented as exponentials. Among other aspects, this relation enables the
machinery of analysis to be brought to bear in group theoretic problems in quantum
information contexts. For example, it enables the application of Taylor’s theorem
such that:

(X̃^n f)(g exp(tX)) = (d^n/dt^n) f(g exp(tX))

for g ∈ G, f ∈ C ∞ . Noting also that:

X̃f(g) = (d/dt) f(g exp(tX)) |_(t=0)   (B.2.21)

shows again how the operation of left-invariant vector fields X̃ is equated with
differential operators.

B.2.9 Baker-Campbell-Hausdorff theorem


As Hall [235] notes, an important feature of the correspondence between Lie alge-
bras and Lie groups is to show that the Lie group homomorphism Φ : G → H (for
Lie groups G and H) defined by Φ(exp(X)) = exp(ϕ(X)) where ϕ : g → h is a
Lie algebra homomorphism. A consequence of this homomorphism is the Baker-
Campbell-Hausdorff (BCH) theorem which allows a direction relationship between
the properties of the exponential map and Lie algebraic operations to become ap-
parent.
Definition B.2.18 (Baker-Campbell-Hausdorff). The Baker-Campbell-Hausdorff formula shows that the map Φ, defined from a neighbourhood Ue ⊂ G of the identity element e into H, is a local homomorphism induced by the Lie group-Lie algebra homomorphism, such that for sufficiently small X, Y ∈ g:

exp(X) exp(Y ) = exp( X + Y + ½[X, Y ] + (1/12)[X, [X, Y ]] + ... )   (B.2.22)

(see Hall [235] §5 for proofs including the Poincaré integral form). The BCH
formula is important in quantum control settings in particular as we discuss in
other parts of this work. Moreover, equation (B.2.22) elucidates the important role of the adjoint action (the commutator), arising from the non-commutativity of the Lie bracket, in shaping quantum state evolution via its effect upon exponentials (as diffeomorphic maps on M).
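As an illustrative numerical sketch (not drawn from the thesis; NumPy and SciPy assumed), the truncation in equation (B.2.22) can be checked for small su(2) elements. Note that the ellipsis in (B.2.22) contains, at the same order, a −(1/12)[Y, [X, Y]] term, which we include:

```python
import numpy as np
from scipy.linalg import expm

# su(2) elements as small skew-Hermitian matrices (Pauli-based, illustrative)
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]], dtype=complex)
X, Y = 0.1j * sx, 0.2j * sy

def comm(a, b):
    """Lie bracket [a, b] = ab - ba."""
    return a @ b - b @ a

# Third-order BCH truncation of log(exp(X) exp(Y))
Z = X + Y + comm(X, Y) / 2 + comm(X, comm(X, Y)) / 12 - comm(Y, comm(X, Y)) / 12

# Residual is O(||X||^4): small for these generators
print(np.linalg.norm(expm(X) @ expm(Y) - expm(Z)))
```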

B.3 Representation theory

B.3.1 Overview
In this section, we briefly cover elements of the representation theory of semi-simple
groups. This is relevant to geometric treatments involving Lie algebras. Representations are abstractions (homomorphisms) of groups in terms of actions on vector spaces, i.e. homomorphisms into GL(n, C), the group of invertible linear transformations of a finite-dimensional vector space V, for which the associated Lie algebra is gl(n, C).

B.3.2 Representations
We begin with the definition of representations.
Definition B.3.1 (Finite-Dimensional Representation of a Lie Group). A finite-dimensional representation of G is a continuous homomorphism π : G → GL(V). The corresponding Lie algebra representation π : g → gl(V) satisfies:

$$ \pi([X, Y]) = [\pi(X), \pi(Y)] \tag{B.3.1} $$

for X, Y ∈ g.

Given a representation π : G → GL(V), a subspace U ⊂ V is invariant if π(g)u ∈ U for all g ∈ G, u ∈ U (and, for a Lie algebra representation φ : g → gl(V), if φ(X)U ⊆ U for X ∈ g). A representation on a non-zero space V is irreducible if the only invariant subspaces of V are the trivial subspace {0} and V itself. Two representations of G are isomorphic if and only if the corresponding Lie algebra representations are isomorphic. This result allows us in particular to move between the fundamental and adjoint representations, with implications, for example, for the optimal parametrisation of invariant neural networks whose parameters are correlated to the choice of representation [30, 256]. Representations φ, φ′ are equivalent if there exists an isomorphism between the two vector spaces χ : V → V′ such that χφ(X) = φ′(X)χ. A completely reducible representation is then one whose space can be decomposed into a direct sum of irreducible invariant subspaces, i.e. such that V = ⊕ᵢ Uᵢ with Uᵢ ⊂ V invariant.

B.3.3 Complex Semi-simple Lie Algebras


Of central importance to our result in Chapter 5 are what are known as Cartan
decompositions of symmetric space Lie groups as a technique for solving certain
classes of time-optimal quantum control problems. In this section, we provide a brief
summary of Cartan decompositions, their relationship to Lie groups and algebras
together with root systems and Dynkin diagrams. The section largely summarises
standard work in the literature [2, 9, 10, 47]. Quantum information processing prob-
lems in this work (and much of the field) are often related to semisimple Lie groups
and Lie algebras. In our case, we are interested in Cartan subalgebras h ⊂ g that
are abelian and certain subclasses of maximally abelian subalgebras. The Cartan
subalgebra provides the generators whose adjoint action on g enables its root-space decomposition, which in turn gives rise to an abstracted root system. The root system can then in turn be related (given a fixed ordering or choice of basis) to simple roots. These roots can be used to form a basis of g, from which a Cartan matrix and Dynkin diagram may be constructed and, as we show in Chapter 5, related to analytic forms of time-optimal unitary synthesis. The relevance of root systems, Cartan matrices and such diagrams in quantum control settings is explored
primarily in the final Chapter of this work.

B.3.4 Adjoint action and commutation relations


An important representation used throughout this work is the adjoint representation.
We set out its definition below.

Definition B.3.2 (Adjoint representation). Given a Lie group G with Lie algebra g, conjugation by g ∈ G induces the smooth isomorphism Ad_g : g → g, Ad_g(X) = gXg⁻¹ for X ∈ g (the group adjoint action). Its infinitesimal counterpart is the Lie algebra adjoint ad_X : g → g, ad_X(Y) = [X, Y], the two being related via the exponential map:

$$ \exp(\mathrm{Ad}_g(X)) = g \exp(X) g^{-1}, \qquad \mathrm{Ad}_{\exp(X)} = \exp(\mathrm{ad}_X). $$

In the above, the former relation expresses the group adjoint action via conjugation, while the latter relates it to the Lie algebra adjoint. We denote Ad_g(X) = gXg⁻¹. We note that each Lie group exhibits the structure of a manifold that is real and analytic (under multiplication, inversion and the exponential map). Moreover, as noted in [9], it can be shown that each real Lie algebra has a faithful (one-to-one) finite-dimensional representation on a complex vector space V(C).
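As a minimal numerical sketch of this correspondence (our illustration, assuming NumPy/SciPy and the su(2) basis convention X_k = −iσ_k/2, which is an assumption of this example), one can check Ad_{exp(X)} = exp(ad_X) by expressing ad_X as a 3 × 3 matrix in the chosen basis:

```python
import numpy as np
from scipy.linalg import expm

sig = [np.array([[0, 1], [1, 0]], dtype=complex),
       np.array([[0, -1j], [1j, 0]], dtype=complex),
       np.array([[1, 0], [0, -1]], dtype=complex)]
basis = [-0.5j * s for s in sig]          # su(2) basis X_k = -i sigma_k / 2

def coords(M):
    """Coordinates of M in the chosen basis: M = sum_k c_k X_k."""
    return np.real(1j * np.array([np.trace(M @ s) for s in sig]))

X = 0.7 * basis[0] + 0.3 * basis[2]
Y = basis[1]

g = expm(X)
lhs = coords(g @ Y @ np.linalg.inv(g))    # Ad_{exp(X)}(Y) = g Y g^{-1}

adX = np.column_stack([coords(X @ e - e @ X) for e in basis])  # ad_X as a matrix
rhs = expm(adX) @ coords(Y)               # exp(ad_X) acting on coordinates

print(np.allclose(lhs, rhs))              # True: Ad_{exp(X)} = exp(ad_X)
```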

B.3.5 Adjoint expansions


We note for completeness a few additional concepts. Firstly, automorphisms of a Lie algebra g are invertible linear maps L such that [L(X), L(Y)] = L([X, Y]) for X, Y ∈ g. In addition we have that ad_{L(X)} = L ∘ ad_X ∘ L⁻¹. The Killing form is invariant under such automorphisms: B(L(X), L(Y)) = B(X, Y) for X, Y ∈ g.
Note that the adjoint action of group elements can be expressed via conjugation
as Adh (g) = hgh -1 where g, h ∈ G. The adjoint action of a Lie algebra upon itself,
which we focus on, is via the commutator namely adX (Y ) = [X, Y ] where X, Y ∈ g.
Thus terms such as adΘ (Φ) involve calculation of commutation relations among
generators. We also note the following relation (utilised in Chapter 5):

$$ e^{i\Theta} X e^{-i\Theta} = e^{i\,\mathrm{ad}_\Theta}(X) = \cos(\mathrm{ad}_\Theta)(X) + i \sin(\mathrm{ad}_\Theta)(X). \tag{B.3.2} $$

The cos(ad_Θ)(X) term can be understood in terms of the cosine expansion:

$$ \cos(\mathrm{ad}_\Theta)(X) = \sum_{n=0}^{\infty} \frac{(-1)^n}{(2n)!} (\mathrm{ad}_\Theta)^{2n}(X). \tag{B.3.3} $$
Thus each term is a multiple of ad_Θ². For certain choices of generators Θ and X ∈ k this effectively means that the adjoint action acts as an involution up to scalar coefficients given by the relevant root functional α(X) (see below). Thus in the SU(2) case, ad_{J_y}^{2n}(J_z) ∝ J_z (with n = 0, 1, 2, ...), and each application of the adjoint action acquires a θ² term (from Θ = iθJ_y), such that the series can be written:

$$ \cos(\mathrm{ad}_\Theta)(-iJ_z) = \sum_{n=0}^{\infty} \frac{(-1)^n}{(2n)!}\, \theta^{2n} (-iJ_z) = \cos(\theta)(-iJ_z) \tag{B.3.4} $$
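A quick numerical check of this resummation (our sketch, assuming NumPy/SciPy and the convention J_k = σ_k/2, which may differ from the thesis' normalisations): conjugation by e^{iθJ_y} resums to e^{iθJ_y} J_z e^{−iθJ_y} = cos(θ)J_z − sin(θ)J_x.

```python
import numpy as np
from scipy.linalg import expm

sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)
Jx, Jy, Jz = sx / 2, sy / 2, sz / 2       # spin-1/2 convention (an assumption)

theta = 0.9
U = expm(1j * theta * Jy)

conjugated = U @ Jz @ U.conj().T          # adjoint action by conjugation
resummed = np.cos(theta) * Jz - np.sin(theta) * Jx

print(np.allclose(conjugated, resummed))  # True
```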

More generally, the Cartan commutation relations exhibit such an involutive property given that for Θ ∈ p:

$$ \mathrm{ad}_\Theta(\mathfrak{k}) = [\Theta, \mathfrak{k}] \subset \mathfrak{p}, \qquad \mathrm{ad}^2_\Theta(\mathfrak{k}) = [\Theta, [\Theta, \mathfrak{k}]] \subset \mathfrak{k} \tag{B.3.5} $$

In the general case, assuming an appropriately chosen representation, each even (2n) or odd (2n + 1) power of the adjoint action (and thus our cos θ and sin θ terms) will be scaled by an eigenvalue α, such eigenvalue being the root (see below for an example). Thus we have ad_Θ^{2n}(k) ⊂ k and:

$$ \cos(\mathrm{ad}_\Theta)\Phi = \cos(\alpha(\theta))\Phi \tag{B.3.6} $$

(for Φ ∈ k) such as in equation (5.3.20) and similarly for the sine terms. For the sin(ad_Θ)(X) terms, by contrast, the adjoint action is of odd order:

$$ \sin(\mathrm{ad}_\Theta)(X) = \sum_{n=0}^{\infty} \frac{(-1)^n}{(2n+1)!} (\mathrm{ad}_\Theta)^{2n+1}(X) \tag{B.3.7} $$

such that ad_Θ^{2n+1}(k) ⊂ p, hence the sine term arising in our Hamiltonian form equation (5.6.67) which, when conjugated with Λ, results in a Hamiltonian given in terms of control generators, namely:

$$ [\sin(\mathrm{ad}_\Theta)(\mathfrak{k}), \Lambda] \subset \mathfrak{p} \tag{B.3.8} $$

By convention, sometimes one sees in the literature KAK decompositions written in the form e^{k₁} a e^{k₂} where k₁, k₂ ∈ k and e^{kᵢ} ∈ K. Here the two exponentials are group elements, while a is an element of A = exp(a) with the subalgebra a ⊂ p. Thus in equation (B.3.2) we write the adjoint action as conjugation given it involves Lie group (rather than Lie algebraic) elements, while for the action of a Lie algebra on itself we would use the commutation relation. The choice is usually one of convenience.
B.3.6 Real and complex forms


Complex Lie algebras g (over C) and real Lie algebras g0 (over R) are related via
complexification and the related concept of the real form of a complex Lie algebra.
Such complexification of Lie algebras is relevant in particular to formulating Cartan
decompositions of Lie algebras and their associated groups (see definition B.5.2).
We begin with a definition of real forms.

Definition B.3.3 (Real form). We say that a real vector space V(R) is the real form of a complex vector space W(C) when the two are related as:

$$ W^{\mathbb{R}} = V \oplus iV $$

where we denote W the complexification of V.

This equivalence is a statement that a real vector space can be complexified in


the manner above. For Lie algebras, we often denote a real Lie algebra (as a vector
space over R) via a 0 subscript g₀. In such terms, we may denote:

$$ \mathfrak{g} = \mathfrak{g}_0^{\mathbb{C}} = \mathfrak{g}_0 + i\mathfrak{g}_0 \tag{B.3.9} $$

as the complexification of g₀ (equivalently indicating that g₀ is the real form of the complex Lie algebra g). Doing so allows complex coefficients while retaining structure of interest from g₀ (including decompositional structure). We can describe the real forms of g via g = g₀ + ig₀, analogous to splitting Cⁿ into R²ⁿ (parametrising the imaginary part).
In particular for M ∈ Mₙ(C) we have under this transformation (used in equation (4.6.1)):

$$ Z(M) = \begin{pmatrix} \mathrm{Re}\,M & -\mathrm{Im}\,M \\ \mathrm{Im}\,M & \mathrm{Re}\,M \end{pmatrix} \tag{B.3.10} $$
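A minimal sketch (assuming NumPy) of the realification map Z in equation (B.3.10); the check confirms Z is an algebra homomorphism, so matrix structure is preserved in the passage from Mₙ(C) to M₂ₙ(R):

```python
import numpy as np

def realify(M):
    """Map M in M_n(C) to Z(M) in M_{2n}(R), as in equation (B.3.10)."""
    return np.block([[M.real, -M.imag],
                     [M.imag,  M.real]])

rng = np.random.default_rng(1)
A = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))
B = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))

# Z respects products and sums: Z(AB) = Z(A)Z(B), Z(A + B) = Z(A) + Z(B)
assert np.allclose(realify(A @ B), realify(A) @ realify(B))
assert np.allclose(realify(A + B), realify(A) + realify(B))
```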

B.4 Cartan algebras and Root-systems

B.4.1 Complex Semi-simple Lie Algebras


Cartan subalgebras are of central importance across an expansive variety of physi-
cal and informational problems in physics and quantum computation. The algebras
derive from Cartan’s study of symmetry groups and provide a means of character-
ising the underlying or fundamental properties of such groups, such as their asso-
ciated root systems. In turn, as we discuss and show throughout this work, those symmetry-related properties, such as roots and weights, have significant implications for symmetry reduction and characterisation of a wide range of physical and
mathematical phenomena. This includes, in particular, weight systems (or set of
weights) associated with representations of a Lie algebra or Lie group. Weights are
elements of the dual space (see definition A.1.8) of a Cartan subalgebra and label
the eigenspaces of the representation with respect to the Cartan subalgebra. Root
systems are primarily used in the classification and structure theory of Lie algebras,
while weight systems are used in the study and classification of their representa-
tions. As noted in the literature, the process involves (a) identifying the (complex
semi-simple Lie algebra) g, (b) selecting a Cartan subalgebra h, (c) constructing
an abstract reduced root system, (d) selecting a choice of ordering and (e) then
constructing a Cartan matrix.

B.4.2 Roots

We define root systems as per below [9, 235]. Given a subalgebra h ⊂ g with a (diagonal) basis and a dual space h∗ with basis elements eⱼ ∈ h∗, we recall that the duals form a vector space of functionals eⱼ : h → C. For a matrix Lie algebra, we can construct a basis of g given by {h, Eᵢⱼ} where Eᵢⱼ is 1 in the (i, j) location and zero elsewhere. We are interested in the adjoint action of elements H ∈ h on each such eigenvector:

$$ \mathrm{ad}_H(E_{ij}) = [H, E_{ij}] = (e_i(H) - e_j(H))E_{ij} = \alpha(H)E_{ij} $$

where eⱼ selects out the (j, j)th element of a matrix, i.e. eⱼ(H) = Hⱼⱼ. That is, α = eᵢ − eⱼ is a linear functional
on h such that α : h → C. Such functionals α are denoted roots. Thus g can be
decomposed as:

$$ \mathfrak{g} = \mathfrak{h} \oplus \bigoplus_{i \neq j} \mathbb{C}E_{ij} = \mathfrak{h} \oplus \bigoplus_{\alpha \in \Delta} \mathfrak{g}_\alpha, \qquad \mathfrak{g}_\alpha = \{X \in \mathfrak{g} \mid \mathrm{ad}_H(X) = \alpha(H)X, \ \forall H \in \mathfrak{h}\} \tag{B.4.1} $$

This is the root space decomposition of g with X ∈ gα , denoted the Cartan-Weyl


basis. As we see below, such decomposition is related to the existence of Car-
tan subalgebras and eventually Cartan decompositions, relevant to results in above
Chapters. The commutation relations are given by:

$$ [\mathfrak{g}_\alpha, \mathfrak{g}_\beta] \begin{cases} = \mathfrak{g}_{\alpha+\beta} & \text{if } \alpha + \beta \text{ is a root} \\ = 0 & \text{if } \alpha + \beta \text{ is neither a root nor } 0 \\ \subseteq \mathfrak{h} & \text{if } \alpha + \beta = 0 \end{cases} \tag{B.4.2} $$
with [Eᵢⱼ, Eⱼᵢ] = Eᵢᵢ − Eⱼⱼ ∈ h. Roots may then be designated positive or negative (the latter carrying sign −) and ordered as a sequence. Each root is a nongeneralised weight of ad_h(g) (a root of g with respect to h). The set of roots is denoted ∆(g, h).
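As a concrete numerical sketch (ours, assuming NumPy), the root-space decomposition (B.4.1) can be verified for sl(3, C): with diagonal H ∈ h, each matrix unit E_ij (i ≠ j) is an eigenvector of ad_H with eigenvalue α(H) = e_i(H) − e_j(H):

```python
import numpy as np

n = 3
h_basis = [np.diag([1., -1., 0.]), np.diag([0., 1., -1.])]  # Cartan basis of sl(3)

def E(i, j):
    """Matrix unit E_ij: 1 in the (i, j) entry, zero elsewhere."""
    M = np.zeros((n, n))
    M[i, j] = 1.0
    return M

for i in range(n):
    for j in range(n):
        if i == j:
            continue
        for H in h_basis:
            ad_H = H @ E(i, j) - E(i, j) @ H    # ad_H(E_ij) = [H, E_ij]
            alpha_H = H[i, i] - H[j, j]         # root value e_i(H) - e_j(H)
            assert np.allclose(ad_H, alpha_H * E(i, j))
```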

B.4.3 Cartan subalgebras


Given a Lie algebra representation, we can construct a system analogous to a root
system, via a weight space and weight vectors, as follows.

Definition B.4.1 (Weights). Given a representation π of h on V(C) and functionals α ∈ h∗, define the generalised weight space as:

$$ V_\alpha = \{v \in V \mid (\pi(H) - \alpha(H)\mathbb{1})^n v = 0 \text{ for some } n, \ \forall H \in \mathfrak{h}\} $$

for V_α ≠ 0.

The elements v ∈ V_α are generalised weight vectors, with α the weights. If h is a nilpotent Lie algebra, then there are finitely many generalised weights and V = ⊕_α V_α (the weight space decomposition). Weights needn't be linearly independent and are dependent upon roots. For a Lie algebra, we denote V_α as g_α. It can be shown that for such a nilpotent h acting via ad_g(h) (the adjoint action on g), the generalised weight vectors have the following properties: (a) g = ⊕ g_α, (b) h ⊆ g_0 and (c) [g_α, g_β] ⊆ g_{α+β} (zero if α + β is not a weight). Note that

$$ \mathfrak{g}_\alpha = \{X \in \mathfrak{g} \mid (\mathrm{ad}_H - \alpha(H)\mathbb{1})^n X = 0 \text{ for some } n, \ \forall H \in \mathfrak{h}\}, \qquad \mathfrak{g} = \mathfrak{h} \oplus \bigoplus_{\alpha \in \Delta} \mathfrak{g}_\alpha $$

i.e. π is the adjoint action of h on X ∈ g. Here g₀ (not to be confused with the real form g₀ discussed above) denotes the weight space corresponding to the zero weight in the decomposition of the Lie algebra g, namely the centralizer of h in g. We come now to the definition of a Cartan subalgebra.

Definition B.4.2 (Cartan subalgebra). A Cartan subalgebra h ⊂ g is a nilpotent Lie subalgebra of a finite-dimensional complex Lie algebra g such that h = g₀. For semisimple g, Cartan subalgebras are maximally abelian subalgebras of g.

Equivalently, h is a Cartan subalgebra if it is nilpotent and equal to its own normalizer in g, that is if h = N_g(h) = {X ∈ g | [X, h] ⊆ h}. It can be shown that all finite-dimensional complex Lie algebras g have Cartan subalgebras. For complex semisimple Lie algebras, Cartan subalgebras h are abelian ([H_k, H_j] = 0 for H_k, H_j ∈ h). A
Cartan subalgebra h is the maximally abelian subalgebra such that the elements of
adg (h) are simultaneously diagonalisable. Moreover, all such Cartan subalgebras of
g are conjugate via an automorphism of g in the form of an adjoint action. With this
understanding of Cartan subalgebras we can now generalise the concepts of roots.

B.4.4 Root system properties

Roots exhibit a number of properties [9]. We set out a number of these relevant to
results in later Chapters and in particular our methods in Chapter 5.

Definition B.4.3 (Roots). Non-zero generalised weights of ad_h(g) are denoted the roots of g with respect to the Cartan subalgebra. The set of roots α is denoted ∆(g, h).

Given a Killing form B(·, ·), a few standard results hold: (a) B(g_α, g_β) = 0 for α, β ∈ ∆ ∪ {0} with α + β ≠ 0; (b) α ∈ ∆ ⟹ −α ∈ ∆; (c) each root α is associated with an H_α ∈ h such that we can identify a correspondence between the root functional and Killing form given by:

$$ \alpha(H) = B(H, H_\alpha) \quad \forall H \in \mathfrak{h} $$

noting that ∆ spans h∗. Importantly, we have for each (non-zero) eigenvector E_α ∈ g that:

$$ [H, E_\alpha] = \alpha(H)E_\alpha $$

(see [9] for standard results, e.g. Lemma 2.18). Other results include that ad_h(g) is simultaneously diagonalisable, and that the Killing form B : h × h → C has an expression in terms of the root functionals:

$$ B(H, H') = \sum_{\alpha \in \Delta} \alpha(H)\alpha(H') $$

together with normalisability (such that B(E_α, E_{−α}) = 1) and standard commutation relations:

$$ [H_\alpha, E_\alpha] = \alpha(H_\alpha)E_\alpha, \qquad [H_\alpha, E_{-\alpha}] = -\alpha(H_\alpha)E_{-\alpha}, \qquad [E_\alpha, E_{-\alpha}] = H_\alpha. \tag{B.4.3} $$

Finally in this section, we note concepts of relevance to Cartan matrices applied in


the final chapter. We denote root strings, or α-strings, as follows: given α ∈ ∆, β ∈ ∆ ∪ {0}, the α-string that contains β is the set of all elements β + nα ∈ ∆ ∪ {0} for n ∈ Z, essentially a (symmetric) string of roots β + nα for −p ≤ n ≤ q (p, q ≥ 0) with:

$$ p - q = \frac{2\langle \beta, \alpha \rangle}{\langle \alpha, \alpha \rangle}. $$

B.4.5 Abstract root systems


Abstract root systems are an important way of capturing the essence of the sym-
metries within a Lie algebra via enabling the construction of relations among roots
that may be interpreted using the formalism and language of symmetry.

Definition B.4.4 (Root System). A root system (V, ∆) is a finite-dimensional real vector space V(R) equipped with an inner product ⟨·, ·⟩ and norm |·|², together with a subset ∆ = {α ∈ V | α ≠ 0} satisfying: (a) V = span(∆); (b) for α ∈ ∆ and c ∈ R, cα ∈ ∆ only if c = ±1; (c) ∆ is closed under the linear transformations (reflections) s_α:

$$ s_\alpha \cdot \beta = \beta - \frac{2\langle \beta, \alpha \rangle}{\langle \alpha, \alpha \rangle}\, \alpha, \qquad \beta \in V $$

i.e. s_α · β ∈ ∆ for α, β ∈ ∆; and (d):

$$ \frac{2\langle \beta, \alpha \rangle}{\langle \alpha, \alpha \rangle} \in \mathbb{Z}. $$

The rank of the root system is dim(V). The elements α ∈ ∆ are roots. A root system is reduced if it satisfies (a) to (d) above, and non-reduced if it does not satisfy (b). The root systems of symmetric spaces of interest in this work are of the non-reduced type. Each s_α is a reflection about the hyperplane orthogonal to α (an orthogonal transformation with determinant −1). Such reflections generate a subgroup of the orthogonal group of V, the Weyl group W. This general formulation allows root systems to be associated to all complex semisimple Lie algebras.
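To make the reflection construction concrete, the following sketch (ours, assuming NumPy; the A₂ root data is standard) builds the reflections s_α for the simple roots of A₂ and enumerates the group they generate, recovering the Weyl group W ≅ S₃ of order 6:

```python
import numpy as np

def reflection(alpha):
    """s_alpha(beta) = beta - 2<beta, alpha>/<alpha, alpha> alpha, as a matrix."""
    a = np.asarray(alpha, dtype=float)
    return np.eye(len(a)) - 2.0 * np.outer(a, a) / (a @ a)

s1 = reflection([1., -1., 0.])    # simple roots of A_2 embedded in R^3
s2 = reflection([0., 1., -1.])

group, frontier = [np.eye(3)], [np.eye(3)]
while frontier:                   # closure under left-multiplication by generators
    g = frontier.pop()
    for s in (s1, s2):
        h = s @ g
        if not any(np.allclose(h, k) for k in group):
            group.append(h)
            frontier.append(h)

print(len(group))                 # 6: the Weyl group of A_2 is S_3
```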

B.4.6 Reduced abstract root systems


A reduced abstract root system in h₀∗ is one where α ∈ ∆ but 2α ∉ ∆. Familiar
Lie algebraic concepts apply to root systems such as the classification of reducible
root systems (orthogonal decompositions V = ⊕Vk such that α belongs to a Vk )
and corresponding irreducible root systems. An important element in the geometric
characterisation of root systems is set out below. Given α, β (linearly independent)
and ⟨α, α⟩ ≥ ⟨β, β⟩ it can be shown that there are limited allowed angles θ and
length ratios among the roots via the proposition that: (a) ⟨α, β⟩ = 0, (b) ⟨α, α⟩ =
⟨β, β⟩ with θ = π/3 or 2π/3, (c) ⟨α, α⟩ = 2 ⟨β, β⟩ (for θ = π/4, 3π/4) or (d)
⟨α, α⟩ = 3⟨β, β⟩ (for θ = π/6, 5π/6). If θ ∈ (π/2, π) (obtuse) then α + β is a root; if θ ∈ (0, π/2) then ±(α − β) are roots. These angles can be used to plot the relevant
root system (see Chapter 5). In more general settings, the geometric representation
of roots, under certain assumptions, then exhibits certain permutation symmetries
and the applicable Weyl group is defined in terms of these.

B.4.7 Ordering of root systems

One final concept before coming to the Cartan matrix is that of fixing a lexicographic
ordering related to the concept of positivity of roots. Given such a set of roots:

∆ = {α1 , α2 , . . . , αl }

we introduce an ordering as follows. A root β = Σ_{i=1}^{l} b_i α_i is positive with respect to the lexicographic ordering if the first non-zero coefficient b_j in its expansion with respect to the basis ∆ is positive. Conversely, the root β is negative if this first non-zero coefficient is negative. The lexicographic order is defined by comparing roots: for two distinct roots β = Σ_{i=1}^{l} b_i α_i and γ = Σ_{i=1}^{l} c_i α_i, we say that β > γ if, at the first index j where b_j and c_j differ, we have b_j > c_j. This comparison allows partitioning the root system into two disjoint sets ∆⁺ and ∆⁻ of positive and negative roots.
The highest root in a root system is the maximal element with respect to this
lexicographic ordering and is always a positive root. This root plays a pivotal role
in the structure and representation theory of semisimple Lie algebras.
As we articulate in Chapter 5, the construction of the abstract root system for Lie algebras (related to quantum control problems of interest) involves determining root systems. We note that a root α is simple if α > 0 and it cannot be decomposed as the sum of two positive roots. It can be shown that any positive root can be decomposed as α = Σ_i n_i α_i (with α_i simple and n_i ≥ 0). The sum of the n_i is an integer denoted the level of α with respect to the set of simple roots (important as the set of simple roots generates the entire root system through integer linear combinations).

B.4.8 Cartan matrices

We now define the Cartan matrix used in Chapter 5. To do so we fix a root system
∆, assume ∆ is reduced, and fix an ordering as per above. Data from
Cartan matrices may be represented in diagrammatic form via Dynkin diagrams.

Definition B.4.5 (Cartan matrix). Given ∆ ⊂ V and a simple root system Π = {α_k}, k = 1, ..., n = dim V, the Cartan matrix of Π is an n × n matrix A = (A_ij) where:

$$ A_{ij} = \frac{2\langle \alpha_i, \alpha_j \rangle}{|\alpha_i|^2} \tag{B.4.4} $$
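As a check (a small sketch assuming NumPy), computing (B.4.4) for the standard simple roots of A₂ (i.e. sl(3)) recovers the familiar Cartan matrix:

```python
import numpy as np

# Simple roots of A_2: alpha_i = e_i - e_{i+1} in R^3
simple = [np.array([1., -1., 0.]), np.array([0., 1., -1.])]

A = np.array([[2.0 * np.dot(ai, aj) / np.dot(ai, ai) for aj in simple]
              for ai in simple])
print(A)   # [[ 2. -1.]
           #  [-1.  2.]]
```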

A variety of other standard properties apply to abstract root systems ∆, constraining the roots and their ratios (e.g. 2⟨β, α⟩/|α|² = 0, ±1, ±2, ±3, ±4, and = 0, ±1 if α, β are non-proportional with |α| ≤ |β|). These and other constraints limit the
ratios and relation of roots in ways that allow their representation in terms of Cartan
matrices and diagrammatic representations. Different orderings (enumerations) of Π
lead to different Aij , however they are all conjugate to each other. Cartan matrices
satisfy certain properties e.g. Aij ∈ Z, Aii = 2, Aij ≤ 0(i ̸= j), Aij = 0 ↔ Aji = 0
and the existence of a diagonal matrix D that renders A symmetric positive definite
under conjugation. The (multiple) block diagonality of a Cartan matrix implies its
reducibility, otherwise the matrix is irreducible. From the Cartan matrix, we can
construct a Dynkin diagram (as distinct from a root diagram) as follows.

Definition B.4.6 (Dynkin diagram). A Dynkin diagram is a graph diagram of a


set of simple roots Π. Let each simple root αi be a vertex of the graph, with a weight
proportional to ⟨αi , αi ⟩ = |αi |2 . The graph construction rules are as follows. Two
simple roots αi , αj are connected by Aij Aji edges (which may be zero).

While Dynkin diagrams are not a focus of this work, we do relate them to quantum control results in Chapter 5, hence here we mention a few sundry properties. For an l × l Cartan matrix A, the associated Dynkin diagram has the following structure: (a) there are at most l pairs of vertices i < j having at least one edge between them; (b) the diagram has no loops; and (c) at most three edges (triple points) connect to any node. The Cartan matrix may be used to determine a Dynkin diagram (up to scaling factors), with a description set out in Fig. B.1 (see [9] for more detailed exposition).

[Figure B.1: Expanded Dynkin diagram of type Aₙ with labeled vertices α₁, α₂, ..., αₙ and edges. The numbers above the nodes indicate the length of the roots relative to each other. A_ij A_ji determines the number of lines (or the type of connection) between vertices i and j. This product can be 0 (no connection), 1 (single line), 2 (double line), or 3 (triple line), representing the angle between the corresponding roots. Additionally, when A_ij A_ji > 1, an arrow is drawn pointing from the longer root to the shorter root.]

By constructing an abstract reduced root system, one can then show that for two
complex semisimple Lie algebras with isomorphic abstract reduced root systems, the associated Cartan matrices are isomorphic. If two complex semisimple Lie algebras have the same Dynkin diagram, they are isomorphic, thereby sharing the same algebraic structure. Operations on Cartan matrices correspond to operations on Dynkin diagrams, allowing visualisation of transformations. It can be shown that the choice of Cartan matrix is independent of the choice of positive ordering up to a permutation of indices, and that the Cartan matrix determines the reduced root system. The Weyl group, as the group of orthogonal transformations generated by root reflections, can be used to determine ∆⁺ and Π.

B.5 Cartan decompositions


An important set of theorems for the results in Chapters 4 and 5 comprises those related
to the Cartan decomposition of semisimple Lie groups and their associated algebras.
The Cartan decomposition allows decomposition of a classical matrix Lie algebra into
Hermitian and skew-Hermitian components. For a given complexified semisimple Lie algebra g = g₀^C, a corresponding Cartan involution θ is associated with the decomposition g = k ⊕ p. In turn, this allows a global decomposition G = KAK where K = e^k and A = e^a with a ⊂ p (a maximal abelian subspace of p), which can be viewed analogously to the polar decomposition of matrices. Of importance to our result in Chapter 5 is that Cayley transforms (see below) may be used to transform between any Cartan subalgebras up to conjugacy. We use this result to define our appropriate control decomposition in Chapter 5. We briefly set out relevant background theory below, before moving to geometric concepts and interpretation.

B.5.1 Compact Real Forms

For classical semisimple groups, the matrix Lie algebra g₀ over R or C is closed under conjugate transposition, in which case g₀ is the direct sum of its skew-symmetric members k₀ and symmetric members p₀. Recall from equation (B.3.9) that the complexification of g₀ is denoted g = g₀^C = g₀ + ig₀. The real Lie algebra g₀ thus has a decomposition as the direct sum g₀ = k₀ ⊕ p₀. We can construct a real vector space of complex matrices (that is, with real coefficients) u₀ = k₀ + ip₀ as a subalgebra. As Knapp notes [9], there are certain requirements with respect to g₀ and u₀, such as k₀ = g₀ ∩ u₀ and p₀ = g₀ ∩ iu₀, which allow us to decompose g₀ = k₀ ⊕ p₀ in a way that in turn allows the complexification:

$$ \mathfrak{g} = \mathfrak{k} \oplus \mathfrak{p} \tag{B.5.1} $$
As the focus of this work is geometric and machine learning techniques for Lie groups and algebras of relevance to quantum control, we omit the specific standard steps showing the complexification of g₀ = k₀ ⊕ p₀; see [9, 230, 235] for more detail. Note that we can define the mapping θ(X) = −X† (negative of the conjugate transpose) as an involution (squaring to the identity) with θ[X, Y] = [θ(X), θ(Y)]. We then define a Cartan involution as follows.

Definition B.5.1 (Cartan involution). Given a Killing form B and an involution θ on g, we say θ is a Cartan involution if the symmetric bilinear form

$$ B_\theta(X, Y) = -B(X, \theta Y) \tag{B.5.2} $$

is positive definite for X, Y ∈ g.

It can be shown (see below) that involutions on g₀ induce Cartan involutions on g (for complex semisimple g, the only Cartan involutions of g^R are conjugations with respect to real forms of g). Any two Cartan involutions are conjugate via an inner automorphism in Int g. This result is related to the existence statement that g is equipped with a Cartan involution, unique up to such conjugacy (the proofs usually proceed via g₀, which is then complexified). From the existence of this involution, we infer the existence of the relevant Cartan decomposition.

Definition B.5.2 (Cartan decomposition). Given a complex semisimple Lie algebra


g with Cartan involution θ, g can be decomposed as:

$$ \mathfrak{g} = \mathfrak{k} \oplus \mathfrak{p} \tag{B.5.3} $$

with k the +1 (symmetric) eigenspace, θ(k) = k, and p the −1 (antisymmetric) eigenspace, θ(p) = −p, satisfying the following commutation relations:

$$ [\mathfrak{k}, \mathfrak{k}] \subseteq \mathfrak{k}, \qquad [\mathfrak{k}, \mathfrak{p}] \subseteq \mathfrak{p}, \qquad [\mathfrak{p}, \mathfrak{p}] \subseteq \mathfrak{k} \tag{B.5.4} $$

Here k, p are orthogonal with respect to the Killing form in that B_g(X, Y) = 0 = B_θ(X, Y), i.e. for X ∈ k and Y ∈ p we have B(X, Y) = 0. For the corresponding Lie group G, the existence of the Cartan decomposition of g (where K ⊂ G is analytic with Lie algebra k) gives rise to a global Cartan decomposition of G. The global Cartan decomposition G = K exp(p), together with the fact that p = ⋃_{k∈K} Ad_k(a), means that the global Cartan decomposition is expressible as G = KAK, which we define below.

Definition B.5.3 (KAK Decomposition). Given G with a Cartan decomposition as


in definition (B.5.2), G can be decomposed as G = KAK where K = exp(k), A =
exp(a). Every element g ∈ G thus has a decomposition as g = k1 ak2 where k1 , k2 ∈
K, a ∈ A.
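As an illustrative sketch (our example, not the thesis' construction; NumPy assumed): for G = SL(2, R) with K = SO(2) and a the diagonal traceless matrices, the singular value decomposition furnishes a numerical KAK decomposition g = k₁ak₂:

```python
import numpy as np

rng = np.random.default_rng(0)
g = rng.normal(size=(2, 2))
g /= np.sqrt(abs(np.linalg.det(g)))      # rescale so |det g| = 1
if np.linalg.det(g) < 0:
    g[:, 0] *= -1                        # now g is in SL(2, R)

U, S, Vt = np.linalg.svd(g)              # g = U diag(S) Vt with U, Vt in O(2)
if np.linalg.det(U) < 0:                 # rotate both factors into SO(2) = K
    D = np.diag([1.0, -1.0])
    U, Vt = U @ D, D @ Vt                # leaves the product U diag(S) Vt invariant

a = np.diag(S)                           # a in A: diagonal, positive, det 1
assert np.allclose(U @ a @ Vt, g)        # g = k1 a k2
assert np.isclose(np.linalg.det(U), 1.0) and np.isclose(np.linalg.det(Vt), 1.0)
```

Since det g = 1 forces the singular values to multiply to 1, log(a) is diagonal and traceless, i.e. it lies in a ⊂ p, mirroring the polar-decomposition analogy noted above.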
Where such a Cartan decomposition is obtainable, we denote (as discussed further on) G/K as a global Riemannian symmetric space. Note that in this case we can decompose an arbitrary U ∈ G as:

$$ U = kac, \qquad U = pk \tag{B.5.5} $$

where k, c ∈ exp(k) = K, p ∈ exp(p) = P and a ∈ A = exp(a) ⊂ P.

The Cartan decomposition of semi-simple Lie algebras and groups is central to results in our Chapter on global Cartan decompositions for time-optimal control. It is also important to results relevant to Chapter 4, given the role of Cartan decompositions in quantum control formalism [15, 59, 152, 175]. In particular, we rely upon the KAK decomposition and the assumption that a ∈ A = exp(a) is parametrised by a constant parameter (the 'constant-θ' ansatz) which, as we demonstrate, simplifies the minimisation problem for finding certain time (and energy) optimal sequences of unitaries.

B.5.2 Compact and non-compact subalgebras

For our results in Chapter 5, we require transformation of the relevant Cartan sub-
algebra (e.g. for su(3)) to one which is maximally non-compact in order to prove
our results relating to time-optimal control. This is because we require our max-
imally abelian subalgebra a ⊂ h to lie within our control subset, i.e. we
require a ⊂ p. Because Cartan decompositions g = k ⊕ p with associated involu-
tions and h ⊂ g are conjugate up to inner automorphisms of g (denoted Int g, namely those which have a representation as conjugation by an element g ∈ G, that is h′ = Int_g(h) = ghg⁻¹), we can apply a generator in k to transform to a new subalgebra that is maximally non-compact (see below). In the literature, it can be shown in particular that any such h is thereby conjugate to a θ-stable Cartan subalgebra (for involution θ). This means that we can decompose the Cartan subalgebra as h = t ⊕ a where t ⊂ k, a ⊂ p. As Knapp notes, the roots of (g, h) are real on a₀ ⊕ it₀. The subalgebras t and a are the compact and non-compact subalgebras of h respectively, with dim t being the compact dimension and dim a being the non-compact dimension. The Cartan subalgebra h = t ⊕ a is maximally compact if its compact subalgebra is as large as possible, and maximally non-compact in the converse case. For a given Cartan decomposition, h is maximally non-compact if and only if a is a maximal abelian subspace of p.
B.5.3 Cayley transforms


As we discuss in Chapter 5, our Cartan decomposition for quantum control prob-
lems related to symmetric spaces requires a ⊂ h which is a maximally non-compact
Cartan subalgebra. Cartan subalgebras h are conjugate to other Cartan subalgebras
via a transformation of the form h′ = KhK⁻¹. In certain cases, such as the typical Cartan decomposition of SU(3) that we explicitly explore in our final chapter, h is not maximally non-compact (e.g. for su(3), choosing h = span{λ₃, λ₈} in terms of Gell-Mann generators renders h ⊂ k while h ∩ p = {0}). To remedy this, we apply Cayley transforms, of which there are two types: (a) one constructed from an imaginary non-compact root β (denoted by Knapp as c_β) whose application to h causes the intersection h ∩ p to increase by one dimension; and (b) another constructed from a real root α (denoted by Knapp as d_α) whose application to h causes the intersection h ∩ p to decrease by one dimension. Our interest is in the first of these, c_β. An example of the form of such a transformation for sl(2, C) is given by:

$$ c_\beta = \mathrm{Ad}\!\left(\exp\!\left(i\frac{\pi}{4} X\right)\right), \qquad X = \bar{E}_\beta + E_\beta \tag{B.5.6} $$

where E_β ∈ g_β, Ē_β ∈ g_{−β}. In our example, we choose γ = (1, 3) such that:

$$ E_\gamma + E_{-\gamma} = \lambda_4 $$

where λ₄ is the corresponding Gell-Mann matrix (see equation (5.4.63)). We include


a worked example of a Cayley transformation in Chapter 5. Having covered key
elements of algebra and representation theory relevant to our results’ Chapters, we
now cover relevant background in differential geometry and then geometric control
theory. Our focus is largely on the synthesis of geometric and algebraic formalism,
where specific Lie groups are represented in terms of differentiable manifolds, Lie
algebras in terms of fibre bundles (the Lie algebra of a Lie group can be associated
with the structure group of a principal fibre bundle, and Lie algebra-valued differ-
ential forms play a crucial role in defining connections on these bundles) and where
transformations are given in terms of connections (such as Levi-Civita and others).
We also relate basic elements of the theory of Riemannian and sub-Riemannian man-
ifolds, geodesic theory and the application of variational methods to the theory of
Lie algebras and groups articulated above in a quantum context. Finally, we in-
tegrate our algebraic and geometric exegesis into the context of geometric control
theory, with a specific focus on quantum control relevant to the main results of this
work.
Appendix C

Appendix (Differential Geometry)

The study of geometric methods usually begins with the definition and study of the
properties of differentiable manifolds. One begins with the simplest concept of a
set of elements, characterised only by their distinctiveness (non-identity), akin to
pure spacetime points M. The question becomes how to impose structure upon
M in a useful or illuminating manner for explaining or modelling phenomena. To
do so, in practice one assumes that M is a topological space, following which ad-
ditional structural complexity, necessary for certain operations, such as analytical
operations (differentiation etc) or measures (for integration), is added. The mate-
rial on differential geometry in this Appendix is drawn from standard texts in the
field [2, 9, 44, 48–52]. Most results are presented without proofs which can be found
in the collection of resources noted above. The Chapter begins with an outline of
basic conceptualisation of manifolds, tangent planes, pushforwards and bundles. It
then moves on to vector (and tensor) fields, n-forms, connections and parallel trans-
port. The second section of this Appendix concentrates on geometric control the-
ory, drawn primarily from the work of Jurdjevic [23, 25, 26, 60, 213], Boscain [61, 62],
D’Alessandro [15] and others. It focuses primarily on classical geometric control
theory, but the results, especially where controls are constructed from semi-simple
Lie groups and algebras, carry over to the quantum realm.

C.1 Differential geometry

C.1.1 Manifolds and charts


We begin with the concept of a topological manifold M, being a connected Hausdorff
(T2 ) space (a topological space such that for any two distinct points there exist
disjoint neighbourhoods). From this point, we define coordinate charts and atlases
as follows.


Definition C.1.1 (Coordinate charts and atlases). A coordinate chart is a pair (U, ϕ) on M where U ⊂ M is an open subset and ϕ : U → Kᵐ, p ↦ ϕ(p) (usually taking K = R or C), is a homeomorphism onto its image. A family of such charts whose domains cover M, with smooth transition maps ϕᵢ ∘ ϕⱼ⁻¹ on overlaps, is denoted an atlas.

An atlas constitutes the assignment of a collection of open charts to the under-


lying manifold, which in turn allows the imposition of differential structure on the
manifold.

Definition C.1.2 (Differentiable manifold). A differentiable manifold is a Haus-


dorff topological space M together with a global differentiable structure.

The coordinates of a point p are the maps from p ∈ U ⊂ M to Euclidean space, i.e. (ϕ¹(p), ..., ϕᵐ(p)). The coordinate functions are each of the individual functions ϕ^µ : U → K; there are m of them, which together give the map from U to Kᵐ. These functions are defined as compositions taking information from p to Kᵐ and then to K. An example is Euclidean space Rᵐ, which can be equipped with a differential structure via a globally-defined coordinate chart, where vectors v ∈ Rᵐ have components v = (x¹, ..., xᵐ). Any finite-dimensional vector space V can be regarded as a differentiable manifold by choosing a basis set of vectors and using this to map V isomorphically to Rᵐ: if {e₁, ..., eₘ} is a basis of V, then the vector v = Σᵢ vⁱeᵢ is mapped to the m-tuple (v¹, ..., vᵐ) ∈ Rᵐ. Again, the key point is that {eᵢ} is the basis of V, whereas the coordinate functions (or set thereof) are the elements parametrised by Rᵐ. For certain manifolds, such as the sphere S² or the torus, there is no way of parametrising the space with a single coordinate chart, hence we may require an atlas with different coordinate functions. In this case a mapping between coordinate functions that preserves smoothness is required. Of crucial importance
for our analysis is the fact that a Lie group G (see definition B.2.1) is a group that
is also a smooth manifold via the smoothness of group operations.

Proposition C.1.1 (Lie Group (Manifold)). A Lie group G equipped with a dif-
ferentiable structure is a smooth differentiable manifold via the smoothness of group
operations (arising from the continuity properties of G i.e. being a topological group)
given by multiplication m : G × G → G, defined by m(g, h) = gh, and inversion
i : G → G, defined by i(g) = g −1 .

Functions defined at points p ∈ M whose domains overlap multiple charts are required to be differentiable. The role of structure-preserving functions is played by C^r functions, those which are differentiable up to order r. In particular we are interested in local representatives: for a function between manifolds f : M → N, the local representative is its expression (in a geometric sense) as a function between the coordinate charts (U, ϕ) on M and (V, ψ) on N. We can then designate f a C^r function if, for all coverings (atlases) of M, N by charts (coordinate neighbourhoods), the local representative functions are themselves C^r. Differentiable functions require r ≥ 1, while smooth functions are C^∞.

Usually we are working with C ∞ spaces. The complete atlas (a C ∞ atlas) pro-
vides the differential structure required for differential calculus and analytic approx-
imations thereby allowing a differentiable structure to be imposed on M (where
K = C one also imposes holomorphic constraints upon functions defined on M).
For brevity, we denote a manifold as differentiable whether complex or real. In dif-
ferential geometry, the structure (topology) preserving map is fulfilled by a C ∞ (or
technically C r where r may or may not be finite) function such that the function
f : M → N is a C ∞ diffeomorphism assuming f is a bijection and f, f -1 are both
C ∞ . Two diffeomorphic manifolds can be regarded as ‘equivalent’ or ‘copies’ of a
single abstract manifold.

We denote by F the set of smooth differentiable functions C^∞(M) (with which we shall mainly be concerned). Such functions compose differentiably. The set of such functions C^∞ forms an algebra admitting derivations (see below). It can be shown under typical criteria that, for a closed set C contained in an open set V ⊆ M, there exists a φ ∈ C^∞ such that φ(C) = 1 and φ = 0 outside V, i.e. functioning as an indicator or classifier function (something of relevance to statistical learning of, for example, classifiers). Of relevance to product spaces in quantum information, product spaces of two manifolds M × N are themselves (topologically) manifolds, represented by products in the vector space representation. We leave as understood usual features such as definitions of covering, paracompactness and normality of M. With these concepts, we now move on to curves and tangent spaces.

C.1.1.1 Tangent spaces

Tangent spaces are particularly important in algebraic geometry, geometric control


and geometric approaches to machine learning. The tangent space to a manifold M is typically denoted TM. A basic example is the tangent space at a point x of the n-sphere S^n embedded in R^{n+1}, often illustrated by reference to the real vector algebra as:

$$ T_x S^n := \{v \in \mathbb{R}^{n+1} \mid x \cdot v = 0\} \tag{C.1.1} $$

indicating vectors orthogonal to x (intuitively constructed from derivatives at p ∈ M). More constructively for our purposes,
tangent spaces can be thought of in a more algebraic way connected with the local
differentiable properties of functions on the manifold such that tangent spaces, like
differentiable manifolds, make no reference to ambient spaces in their definition e.g.
higher-dimensional vector spaces. As Isham [48] notes “[t]he crucial question, there-
fore, is to understand what should replace the intuitive idea of a tangent vector as
something that is tangent to a surface in the usual sense and which, in particular,
'sticks out' into the containing space" (p.72). To answer this question is to con-
strue the tangent as a property of curves in the manifold. We define a tangent vector
in terms of properties of curves γ ∈ M which are in turn usually parametrised in
a particular way (using affine parameters or arc length). Moreover, because there
are many curves that are tangent, the concept of a tangent vector as an equivalence
class of curves is used.

Definition C.1.3 (Curves on manifold). A curve γ on a manifold M is a smooth (i.e. C^∞) map:

$$ \gamma : \mathbb{R} \supset (-\epsilon, \epsilon) \to \mathcal{M}, \qquad t \mapsto \gamma(t) $$

(i.e. from an interval in R containing 0 into M), such that γ(0) = p.

As t varies ‘infinitesimally’ within (−ϵ, ϵ), the curve moves infinitesimally away
from p ∈ M but that it stays within the neighbourhood Up . Next, we define a
tangent as a property of curves.

Definition C.1.4 (Tangent). Two curves are tangent if their images at t = 0, t ∈ (−ϵ, ϵ), and their derivatives there are identical. Formally, two curves γ₁, γ₂ are tangent at a point p ∈ M if:

(i) γ₁(0) = γ₂(0) = p; and

(ii) the derivatives of the two curves in a local coordinate system are the same, i.e. in some local coordinate system (x¹, ..., xᵐ) around the point p, the two curves are 'tangent' in the usual sense as curves in Rᵐ:

$$ \left.\frac{d x^i(\gamma_1(t))}{dt}\right|_{0} = \left.\frac{d x^i(\gamma_2(t))}{dt}\right|_{0}, \qquad i = 1, ..., m $$
If γ1 , γ2 are tangent in one coordinate system, then they are tangent in any other
coordinate system that covers p ∈ M. Hence the definition of tangent is independent
of the coordinate system chosen. The equivalence class of curves satisfying this
condition at p ∈ M is sometimes denoted [γ] which can be shown to form a vector
space, hence we can reasonably denote tangents at p as vectors. Note that curves
may not be tangent at another point q ∈ M, hence tangent vectors are thought of
as local (around p) equivalence relations, hence the importance of connections (see
below) that, intuitively speaking, tell us how to map between tangent spaces across a
manifold. The idea here is that the equivalence relation among curves arises from the tangent relation: curves are equivalent when they agree at p (at t = 0) and their differentials with respect to coordinate functions are identical. From this construction of the tangent we obtain a number of important definitions:

(i) Tangent space: the tangent space Tp M to M at p ∈ M is the set of all tangent
vectors at p.
(ii) Tangent bundle: defined as the union of tangent spaces, i.e. $T\mathcal{M} := \bigcup_{p \in \mathcal{M}} T_p\mathcal{M}$.

(iii) Projection map: a map π : T M → M from the tangent bundle to M as-


sociated with each point p ∈ M. This map is ‘natural’ in the sense that it
associates each tangent plane (set of tangent vectors) with the point p ∈ M.

The tangent vector v ∈ T_pM can also be construed as a directional derivative on functions f defined on the manifold, by defining its 'action' on f (denoted v(f)) as a differential operator evaluated at the origin of the curve's parametrisation (i.e. the 0 point of the interval):

$$ v(f) = \left.\frac{d f(\gamma(t))}{dt}\right|_{0} $$

From this, we can define the gradient, which points in the direction of steepest ascent, as follows.

Definition C.1.5 (Gradient). Given a smooth function f : M → R on a differ-


entiable manifold M, the gradient of f at a point p ∈ M, denoted ∇f (p), is the
unique vector in the tangent space ∇f (p) ∈ Tp M that satisfies:

v(f ) = ⟨∇f (p), v⟩

for all v ∈ Tp M where v(f ) denotes the directional derivative of f in the direction
of v (as above), and ⟨·, ·⟩ denotes the applicable metric (see below) on M.

In a coordinate frame (U, ϕ) around p, with ϕ = (x¹, . . . , xⁿ), the gradient of f can be expressed as:

$$ \nabla f(p) = \sum_{j=1}^{n} \frac{\partial (f \circ \phi^{-1})}{\partial x^j}(\phi(p)) \left.\frac{\partial}{\partial x^j}\right|_{p} $$

where ∂/∂x^j|_p are the basis vectors of the tangent space T_pM in the coordinate chart, and ∂f/∂x^j are the partial derivatives of f with respect to the coordinates x^j.

The inverse ϕ−1 reverses this mapping, translating coordinates in Rn (or Cn ) back to
points on the manifold M. We can understand how ∇ 'points' in the direction of steepest ascent by noting that:

$$ \nabla f(p) \cdot v = |\nabla f(p)||v| \cos\theta \tag{C.1.2} $$

where cos θ is given by equation (A.1.1). Then ∇f(p) · v is maximal for cos(θ) = 1, which occurs when v and ∇f(p) are parallel. Hence ∇f(p) as a vector points in the direction of maximal increase, i.e. steepest ascent. To obtain steepest gradient descent, we utilise −∇. Although in this work we only touch upon geometric machine learning, this type of framing of tangents is important for the geometric framing of statistical learning methods, such as gradient descent (see section D.5.1), specifically in relation to stochastic gradient descent optimisation (definition D.5.1).
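A minimal coordinate-chart sketch of this (ours, assuming NumPy; the quadratic objective is purely illustrative): iterating steps along −∇f realises steepest descent:

```python
import numpy as np

def grad_f(x):
    """Gradient of f(x, y) = (x - 1)^2 + 4 y^2 in the standard chart on R^2."""
    return np.array([2.0 * (x[0] - 1.0), 8.0 * x[1]])

x, eta = np.array([3.0, 2.0]), 0.1       # initial point and step size
for _ in range(100):
    x = x - eta * grad_f(x)              # step along -grad f (steepest descent)

print(x)                                  # approaches the minimiser (1, 0)
```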
As is shown in the literature, tangent spaces T_pM at p can be considered as real vector spaces. The intuition for this is that two curves γ₁, γ₂ cannot be added directly because M lacks the structure to do so. However, the coordinate space Rᵐ to which we have a mapping is a vector space, thus one considers how each curve γ₁, γ₂ is represented in Rᵐ via t ↦ ϕ ∘ γ₁(t) + ϕ ∘ γ₂(t), which we can think of as mapping from R into M then into Rᵐ. The object ϕ ∘ γ₁(t) + ϕ ∘ γ₂(t) is a curve in Rᵐ which passes through the origin of Rᵐ when t = 0 (choosing ϕ(p) = 0); pulled back via ϕ⁻¹, this additive object is then considered a curve in M that passes through p when t = 0. We then define vector addition and scalar multiplication accordingly. From this we can show that T_pM is a real vector space, and that the definitions above allow assertion of the existence of a vector space of tangent vectors independent of the choice of chart and distinct from the curves γᵢ. Because of this vector structure, the tangent bundle is a vector bundle.

C.1.1.2 Push-forwards

An important map to mention in geometry is that of a push-forward. Recall that


tangent spaces are akin to a local linearisation of the manifold. That is, a map
h : M → N can be linearised via the architecture of tangent spaces and their vector
space structure. Given this map between manifolds M, N , we are interested in
maps between their corresponding tangent spaces T M, T N . This map is denoted


the push-forward h∗ and is defined as follows.

Definition C.1.6 (Push-forward). Given a mapping h : M → N and v ∈ T_pM, the push-forward is a function taking a vector in the tangent space at p ∈ M, namely v ∈ T_pM, to a vector h_*(v) in the tangent space at h(p) ∈ N:

$$ h_* : T\mathcal{M} \to T\mathcal{N} \tag{C.1.3} $$
$$ v \mapsto h_*(v) \tag{C.1.4} $$
$$ h_*(v) := [h \circ \gamma] \tag{C.1.5} $$

where v = [γ].

Pushforwards are linear mappings between tangent spaces which compose functorially, i.e. (g ∘ h)_* = g_* ∘ h_*. It can be shown that certain commutative diagrams hold with respect to pushforwards such that, as maps between tangent spaces, they are independent of intermediate tangent spaces through which they transition.
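Concretely (a small SymPy sketch, using the identification of the push-forward's local representative with the Jacobian discussed in the next section; the map h here is a hypothetical example):

```python
import sympy as sp

x, y = sp.symbols('x y')
h = sp.Matrix([x**2 - y, x * y])        # a smooth map h : R^2 -> R^2 (illustrative)

J = h.jacobian([x, y])                  # local representative of h_* at (x, y)

v = sp.Matrix([1, 2])                   # tangent vector at p = (1, 1)
push = J.subs({x: 1, y: 1}) * v         # h_*(v), a vector tangent at h(p)
print(push)                             # Matrix([[0], [3]])
```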

C.1.2 Tangent spaces and derivations


To understand the connection between geometry and Lie algebras, and ultimately
symmetry groups relevant to quantum machine learning, we focus on tangent spaces,
vector fields and later tensor fields which exhibit the structure of a derivation. Recall
that a derivation is a function on an algebra which generalizes certain features of
the derivative operator. For an algebra A over a field K (also a ring), a derivation
of A is a map D : A → A such that:

D(αf + βg) = αDf + βDg (scalar distributivity) (C.1.6)


D(f g) = f D(g) + gD(f ) (product rule aka Leibniz’s law) (C.1.7)

where α, β ∈ K; f, g ∈ A. Intuitively, we can think of derivations as typical linear


operators that allow differentiation along a direction. C ∞ forms an algebra such
that (this also relates to the condition of being a derivation):

(λf )(p) = λf (p) (Scalar multiplicative distributivity) (C.1.8)


(f + g)(p) = f (p) + g(p) (Additive distributivity) (C.1.9)
(f g)(p) = f (p)g(p) (Multiplicative distributivity) (C.1.10)

for λ ∈ F (the field), p ∈ M and f, g ∈ C ∞. Importantly, tangent vectors can be


construed as operators or linear maps acting on functions f ∈ C ∞ (M) by noting
they satisfy the requirements of a derivation. This is a useful way to build intuition
around tangent vectors as differential operators upon a space and can be seen via
the definition of tangent vectors themselves, i.e.:

$$ v(f) = \left.\frac{d f(\gamma(t))}{dt}\right|_{0} $$

where v = [γ], the equivalence class of curves tangent at p ∈ M (sometimes tangent


vectors are described as a derivation map from such a ring of functions on C ∞ (M)
into R). The space of derivations Dp M (i.e. D at p ∈ M) satisfies the requirements
of being a real vector space if we define:

(v1 + v2 )f = v1 (f ) + v2 (f ), (rv)f = rv(f ) v1 , v2 , v ∈ Dp (M), r ∈ R, f ∈ C ∞ (M).

As shown in the literature, it can be shown that the vectors in such a space of deriva-
tions have a representation as vectors in the space constituted by partial derivatives
as vectors, allowing the equating of derivations at p with the tangent space at p. In
this representation, the partial derivatives represent the basis vectors of the vector
space with coefficients (component or coordinate functions) as:

$$ v \in D_p(\mathcal{M}) \implies v = \sum_{\mu=1}^{m} v(x^\mu) \left.\frac{\partial}{\partial x^\mu}\right|_p = \sum_{\mu=1}^{m} v^\mu \left.\frac{\partial}{\partial x^\mu}\right|_p \tag{C.1.11} $$

where the set of real numbers {v 1 , ..., v m } or {v µ } are the components of the vector in
the ‘direction’ of ∂xµ . That is, it can then be shown that there exists an isomorphism
between Tp M and Dp M allowing, in particular expression of important terms such
as the Jacobian in terms of local representatives of push-forward maps between
tangent planes.

C.1.2.1 Vector fields

Of particular importance to quantum information and algebraic transformations in


this work is the concept of a vector field as an assignment of vectors in tangent
planes to each point in M.

Definition C.1.7 (Vector field). A vector field X on a C^∞ manifold M is a derivation of the algebra C^∞(M) by way of an assignment of X_p ∈ T_pM, ∀p ∈ M. Given f ∈ C^∞(M) and vector fields X, Y ∈ D(M), Xf is a smooth function and X + Y a vector field, as follows:

$$ Xf : \mathcal{M} \to \mathbb{R}, \qquad p \mapsto (Xf)(p) = X_p f \tag{C.1.12} $$
$$ X + Y : p \mapsto X_p + Y_p \tag{C.1.13} $$
for p ∈ M.

A vector field thus assigns a tangent vector to each p ∈ M. We think of vector fields as operators on functions f which are themselves defined on M. Note that in the set of vector fields, Helgason [2] denotes X, Y ∈ D(M) as D¹(M). If the above conditions are met, then D(M) is a module over the ring of functions F on C^∞(M). If both vector fields are in this module, i.e. if X, Y ∈ D(M), then we have that XY − YX is also a derivation of C^∞(M) and is itself a vector field. This vector field (again, a derivation of the algebra C^∞(M) = A in the above definition of derivation) is denoted:

$$ [X, Y] = XY - YX \tag{C.1.14} $$

and coincides with the Lie derivative (commutator) (see definition (B.2.6)). That is, for the map X : C^∞(M) → C^∞(M), the map from M → R defined by (Xf)(p) = X_p f provides a geometric characterisation of the Lie derivative of the function f along the vector field (an assignment of tangent vectors, which are operators) X. The set of all vector fields X on M is sometimes denoted VFld(M) [48] or X(M) (or Γ(M)) and carries the structure of a real vector space.

Note that a vector field can be regarded as a first-order differential operator: the tangent vector X_p ∈ T_pM is a differential operator on C^∞ functions defined on M, so a vector field, as an assignment of such tangent vectors, can be considered a first-order differential operator on M. Thus for a neighbourhood U ⊂ M around p, the vector field X can be written as:

$$ X = \sum_{\mu=1}^{m} X^\mu \frac{\partial}{\partial x^\mu} $$

where X µ are the coefficient functions and the partial derivatives the basis. The
functions X µ are the components of the vector field X with respect to the coordinate
system associated with the chart (U, ϕ). The components are sometimes denoted X µ
(though notation may vary) i.e. they are the coordinate functions that tell us the
components of the vector at some point p ∈ M given a coordinate system. The form
of the components depends upon the coordinate chart in use. Given two coordinate
charts (U, ϕ), (U′, ϕ′) with U ∩ U′ ≠ ∅, the tangent vector should be defined in a way that is equivalent across the two coordinate systems, that is:

$$ X = \sum_{\mu} X^\mu \partial_{x^\mu} = \sum_{\nu} X'^\nu \partial_{x'^\nu} $$

We may then express the coefficient functions in one coordinate chart (or frame) in terms of another:

$$ X'^\nu = \sum_{\mu} X^\mu \frac{\partial x'^\nu}{\partial x^\mu} $$

C.1.2.2 Vector fields and commutators

We explicate a number of relevant features of vector fields and commutators, given


their importance in both quantum and geometric methods adopted in the substan-
tive Chapters above. Commutators arise in this context by considering composition
of vector fields: naively composing two vector fields does not yield a third. X, Y can be viewed as linear maps from C^∞(M) → C^∞(M), so we can define the composition X ∘ Y : C^∞(M) → C^∞(M) as (X ∘ Y)(f) = X(Y(f)); however, this is not
a vector field because the Leibnizian derivation property (‘product rule’) isn’t sat-
isfied. Instead, it can be shown that X ◦ Y − Y ◦ X satisfies the derivation criteria
(see text) and so is a vector field:

(X ◦ Y − Y ◦ X)(f g) = g(X ◦ Y − Y ◦ X)f + f (X ◦ Y − Y ◦ X)g

As discussed above, this composed vector field is the commutator of vector fields
X, Y and is denoted [X, Y]. In component form (with ∂_ν = ∂/∂x^ν) the commutator is represented as:

$$ [X, Y]^\mu = \sum_{\nu} \left(X^\nu \partial_\nu Y^\mu - Y^\nu \partial_\nu X^\mu\right) \tag{C.1.15} $$

The commutator is both antisymmetric [X, Y ] = −[Y, X] and importantly satisfies


the Jacobi identity. The relation to Lie algebras is apparent. In general a Lie
algebra defined as a vector space g together with a bilinear map [·, ·] : g × g →
g, (A, B) 7→ [A, B] satisfies (a) antisymmetry and (b) the Jacobi identity for elements
A, B ∈ g. This bilinear mapping is not associative (ordering matters), forming a non-
associative algebra. Note that the Lie algebra ‘bracket’ is defined as compositions
which satisfy antisymmetry and Jacobi, rather than directly as the commutator.
It turns out that for certain vector spaces (those with which we are concerned in
this work), the commutator will satisfy the requirements for being a Lie algebra.
One such example is the set of M (n, C) where the Lie bracket is defined to be the
commutator [A, B] := AB − BA as discussed above. For our purposes, the results
relating to antisymmetry and satisfaction of the Jacobi identity imply that the set of vector fields X on M (using the same notation for the set as for an individual field) has the structure of a real Lie algebra. Note also that vector fields X on M and Y on N are said to be h-related for a map h : M → N if the pushforward satisfies h∗(X_p) = Y_{h(p)} for all p ∈ M.
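As a short symbolic sketch (assuming SymPy; the fields are illustrative), treating vector fields as first-order differential operators on C^∞(R²) and computing their commutator:

```python
import sympy as sp

x, y = sp.symbols('x y')
f = sp.Function('f')(x, y)

# Vector fields as first-order differential operators: X = d/dx, Y = x d/dy
X = lambda h: sp.diff(h, x)
Y = lambda h: x * sp.diff(h, y)

bracket = sp.simplify(X(Y(f)) - Y(X(f)))   # [X, Y]f = (XY - YX)f
print(bracket)                              # Derivative(f(x, y), y): [X, Y] = d/dy
```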
C.1.2.3 Integral curves and Hamiltonian flow

We now connect the theory of vector fields with integral curves and Hamiltonian flow,
on our way towards understanding the theory of Riemannian and sub-Riemannian
geodesic curves. Vector fields can be regarded as generators of an ‘infinitesimal’
diffeomorphism of a manifold e.g. in the form δ(xµ ) = ϵX µ (x). Recall that tangent
vectors can be regarded as (a) equivalence classes of curves or (b) derivations defined
at a point in the manifold. The question is: given a vector field X on M, is it
possible to ‘fill’ M with a family of curves in such a way that the tangent vector
to the curve that passes through any particular point p ∈ M is just the vector
field X evaluated at that point, i.e. the derivation Xp ? This idea is important in
general canonical theory of classical mechanics, where the state space of a system is
represented by a certain (even-dimensional e.g. symplectic phase-space) manifold M
and physical quantities are represented by real-valued differentiable functions on M.
The manifold M is equipped with a structure that associates to each such function
(representing physical quantities) f : M → R a vector field Xf on M. The family
of curves that ‘fit’ Xf play an important role. In particular for quantum systems,
curves associated with the vector field XH where H : M → R is the energy function,
are dynamical trajectories of the system i.e. the Hamiltonian flow. Of particular
importance to our search for time-optimal curves (geodesics) in later chapters are
integral curves. We want to find a single curve that (i) passes through p ∈ M and
(ii) is such that the tangent vector at each point along the curve agrees with the
vector field at that point. For this we use the definition of an integral curve.

Definition C.1.8 (Integral curves). Given vector field X on M, the integral curve
of X passing through p ∈ M is a curve γ : (−ϵ, ϵ) → M, t 7→ γ(t) such that we have
γ(0) = p and the push-forward satisfies:
 
\[ \gamma_* \left( \frac{d}{dt} \Big|_t \right) = X_{\gamma(t)} \qquad \forall t \in (-\epsilon, \epsilon) \subset \mathbb{R} \]

The components of X µ of the vector field X determine the form taken by the
integral curve t 7→ γ(t). Remember, the vector field (and tangent vectors) should
be thought of as differential operators, so we have:

\[ X^\mu(\gamma(t)) = \frac{d}{dt} x^\mu(\gamma(t)) \]

with the boundary condition that xµ (γ(0)) = xµ (p). Certain other properties are
also sought on integral curves, such as completeness properties such that the curves
are defined for all t (associated with assumptions regarding the compactness of the
manifold).
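As a numerical illustration, an integral curve can be obtained by solving the defining ODE above. A minimal sketch assuming scipy is available; the rotation field is an illustrative choice:

```python
import numpy as np
from scipy.integrate import solve_ivp

# Vector field X on R^2 with components X^mu(x) = (-x2, x1):
# its integral curves satisfy d/dt x^mu(gamma(t)) = X^mu(gamma(t)).
def X(t, x):
    return [-x[1], x[0]]

p = [1.0, 0.0]   # gamma(0) = p
sol = solve_ivp(X, (0.0, 2*np.pi), p, dense_output=True,
                rtol=1e-10, atol=1e-12)

# The integral curve through p is the unit circle; after t = 2*pi it
# returns to p, illustrating a complete (globally defined) curve.
print(sol.sol(2*np.pi))  # approximately [1, 0]
```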

We are also interested in the extent to which a vector field can be regarded as
the generator of infinitesimal translations on the manifold M. To this end we define
a one-parameter group of local diffeomorphisms (see definition B.2.14) at a point
p ∈ M consisting of the following (a triple with certain properties): (i) an open
neighbourhood U (p), (ii) a real constant ϵ > 0 and (iii) a family of diffeomorphisms
of U , given by {ϕ_t : |t| < ϵ}, onto the open set ϕ_t(U) ⊂ M, i.e. ϕ_t : (−ϵ, ϵ) × U → M
and (t, q) ↦ ϕ_t(q). The group has the following properties: (a) maps from the pa-
rameter and neighbourhood to M are smooth, i.e. (−ϵ, ϵ) × U → M, (t, q) ↦ ϕ_t(q)
is smooth on both domains; (b) compositions follow ϕ_s(ϕ_t(q)) = ϕ_{s+t}(q) and ϕ_0 = id_M,
i.e. ϕ_0(q) = q. The set of local diffeomorphisms is denoted Diff(M). The term
one-parameter subgroup refers to the composition property (b) above and that the
map t → ϕt can be thought of as a representation (vector space with map) of part of
the additive group of the real line. Families of local diffeomorphisms can be thought
of as inducing vector fields. Through each point q ∈ M, there passes a local curve
t 7→ ϕt (q). Because of the existence of this curve, we can construct a vector field on
U by taking tangents to this family of curves at q. This vector field is denoted X ϕ ,
the vector field induced by the diffeomorphism ϕ, more formally:

\[ X_q^\phi(f) := \frac{d}{dt} f(\phi_t(q)) \Big|_{t=0} \qquad \forall q \in U \subset M. \]

It can then be shown that ∀q ∈ U the curve t 7→ ϕt (q) is an integral curve of X ϕ for
|t| < ϵ.

C.1.2.4 Local flows

We now briefly mention local flows and its connection with Diff(M). A local flow of
a vector field is a one-parameter local diffeomorphic group such that the vector fields
induced by the group are the vector field itself. More formally, for a vector field X
defined on an open subset U ⊂ M, the local flow of X at p is a one-parameter group
of local diffeomorphisms defined on some open subset V ⊂ U such that p ∈ V ⊂ U
and such that the vector field induced by this family equals X. In a local coordinate
system, local flow - the family of diffeomorphisms ϕX t - is expressed in the following
way:

\[ X^\mu(\phi_t^X(q)) = \frac{d}{dt} x^\mu(\phi_t^X(q)) \]
dt

around q ∈ U ⊂ M. We then Taylor expand the coordinates (which are in a vector
space) x^µ(ϕ^X_t(q)) of the transformed point ϕ^X_t(q) near t = 0, resulting in:

\[ x^\mu(\phi_t^X(q)) = x^\mu(q) + t X^\mu(q) + O(t^2) \]

Recalling that x^µ(ϕ^X_t(q)) ∈ R^m, the expansion shows that the coordinates x^µ of
the transformed point q ↦ ϕ^X_t(q) are approximated by the original coordinates x^µ(q)
translated to first order by tX^µ(q). As X^µ belongs to the vector field, this is what
allows us to say that the vector field generates ‘infinitesimal’ transformations on
the manifold, because it is responsible (up to approximation) for ‘shifting’ from
q to ϕ^X_t(q) in terms of a local coordinate system on the manifold. We discuss
the relationship of integral curves and local flows to time-optimal geodesics in the
context of geometric control theory further on in section C.5.5.2 on Hamiltonian
flow, providing a connection between geometric formalism and quantum information
processing.
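The first-order behaviour of a local flow can also be checked numerically. A minimal sketch, assuming numpy and using the rotation flow (whose exact form is known) as an illustrative example:

```python
import numpy as np

# Exact flow phi_t of the rotation field X(x) = (-x2, x1): rotation by angle t.
def phi(t, q):
    c, s = np.cos(t), np.sin(t)
    return np.array([c*q[0] - s*q[1], s*q[0] + c*q[1]])

def X(q):
    return np.array([-q[1], q[0]])

q = np.array([1.0, 0.5])
for t in [1e-1, 1e-2, 1e-3]:
    # Compare the exact flow with its first-order expansion q + t X(q).
    err = np.linalg.norm(phi(t, q) - (q + t*X(q)))
    print(t, err)  # error shrinks like t^2, confirming the O(t^2) remainder
```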

C.1.3 Cotangent vectors and dual spaces


In this section, we discuss important concepts of one-forms, cotangent vectors, lead-
ing to discussion of pullbacks, tensors and n-forms along with corresponding defini-
tions of covariance and contravariance.

C.1.3.1 Dual spaces

Recall from above the definition (A.1.8) that for any vector space V = V (K) we
define the associated dual space V ∗ as the space of all bounded linear maps L :
V → K (see definition A.1.13). Here we focus on K = R for exposition. In this
mapping formulation, the dual of a real vector space V is the collection V ∗ of all
linear maps L : V → R. The action of L ∈ V ∗ on v ∈ V is usually denoted
⟨L, v⟩ or sometimes with the subscript V as well. Note that here ⟨·, ·⟩ represents
the evaluation of the functional at a vector in the sense of assigning a scalar to each
vector v ∈ V . Note that by the Riesz representation theorem (for finite dimensional
V ), the isomorphism between V and V ∗ means that we can associate to each vector
v a unique such functional Lv , thus allowing us to view the evaluation of functionals
in this way as a generalisation of the inner product (hence the notational similarity).
The dual itself can be given the structure of a real vector space by leveraging the
vector-space properties of R into which L maps i.e.:

(i) ⟨L1 + L2 , v⟩ := ⟨L1 , v⟩ + ⟨L2 , v⟩ ;

(ii) ⟨rL, v⟩ := r ⟨L, v⟩ , r ∈ R.



If V with dim V < ∞ has a basis {e1 , ..., en }, then the dual basis for V ∗ is a collection
of vectors {f 1 , ..., f n } (they are vectors because they are linear maps L ∈ V ∗ which
is a vector space). This set {f i } is uniquely specified by the criterion that:

⟨f i , ej ⟩ = δji (C.1.16)

Note equation (C.1.16) may also be written as f i (ej ) = δji where duals take (or act
upon) vectors as their arguments, which is in essence an application of the Riesz
representation theorem. Because of the isomorphism between a vector and its dual,
one can ‘invert’ dual and vector spaces, so the dual is the argument of a vector
map to a scalar, but for clarity one usually maintains the formalism above. For
maps between different vector spaces L : V → W , then L induces a dual mapping
L∗ : W ∗ → V ∗ on k ∈ W ∗ by:

⟨L∗ k, v⟩V := ⟨k, Lv⟩W

which says that the dual-map L∗ acting on k in W ∗ (itself a dual space) generates a
map L∗ k which lives in V ∗ . This map then acts on v ∈ V (taking it to R), acting as
a pullback (discussed below): L∗ is pulling back the linear functional k from W ∗ to
a linear functional in V ∗ . To explicate the relationship between V and V ∗ , it can be
shown that there exists a canonical map χ : V → (V^∗)^∗ where ⟨χ(v), ℓ⟩_{V^∗} = ⟨ℓ, v⟩_V
for v ∈ V, ℓ ∈ V^∗, which is an isomorphism if dim V < ∞. This concept captures
the spirit of what is sometimes described as V being equivalent to the ‘dual of its
dual’. With these concepts to hand, we can now define cotangent structures.
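Before doing so, we note that the dual-basis condition ⟨f^i, e_j⟩ = δ^i_j admits a simple computational illustration: for a finite-dimensional basis arranged as the columns of a matrix, the dual basis functionals are the rows of the inverse matrix. A minimal numpy sketch (the basis chosen is arbitrary):

```python
import numpy as np

# Columns of E are basis vectors e_1, e_2, e_3 of V = R^3 (not orthonormal).
E = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0],
              [0.0, 0.0, 1.0]])

# The dual basis functionals f^i, represented as row vectors, are the rows
# of E^{-1}: then <f^i, e_j> = (E^{-1} E)_{ij} = delta^i_j.
F = np.linalg.inv(E)
print(np.allclose(F @ E, np.eye(3)))   # True: <f^i, e_j> = delta^i_j
```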

Definition C.1.9 (Cotangent vectors). A cotangent vector is a map from tangent


space to R. At a point p ∈ M, it is defined as a real linear map:

k : Tp M → R

The value (which is in R) of k when acting on v ∈ Tp M is denoted ⟨k, v⟩ or ⟨k, v⟩p


(analogous to using inner-product symbolism for mapping to scalar or contraction).

The cotangent space is the space of all cotangent vectors i.e. at p ∈ M it is


the set Tp∗ M of all such linear maps constituting cotangent vectors. It is a vector
space and is the dual of the vector space Tp M. Similarly, the cotangent bundle
T^∗M is the collection (a vector bundle) of all such cotangent vectors for all p ∈ M, i.e.
T^∗M = ⋃_{p∈M} T_p^∗M.

C.1.3.2 One forms

We now come to the important definition of a one-form. In the field of geometric


optimal control, one-forms are essential for formulating and solving control problems.
One-forms can represent constraints and objectives in control problems, particularly
in the context of Hamiltonian systems, where the dynamics can be expressed in terms
of symplectic geometry (which we discuss in section C.5.5.2).

Definition C.1.10 (One-form). A one form ω defined on M is an assignment of


a cotangent vector (being a smooth linear map) ωp to every point p ∈ M. We
can consider a one-form as a map on vector fields, ω : X(M) → C^∞(M), sending
X ∈ X(M) to the function p ↦ ⟨ω_p, X_p⟩; i.e. a one-form is a field of cotangent vectors, meaning
it assigns a cotangent vector to each point in a smoothly varying manner.

One-forms are useful and important for understanding tensorial contractions


and, under certain conditions, relate to inner products. Noting that ⟨ω, X⟩ (p) :=
⟨ωp , Xp ⟩p , we observe that the one-form as an assignment of a cotangent vector to
each point p ∈ M, in turn induces a map from the tangent plane at p to R. In this
sense the representation ⟨ωp , Xp ⟩p specifies a map ω from the vector field at p to R
and that this mapping is smooth i.e. infinitely-differentiable. Connecting with the
definition of the inner product (equation (A.1.4)) above, we can consider an inner
product in terms of tangent spaces:

⟨·, ·⟩ : Tp M × Tp M → R (C.1.17)

and one forms. Consider a metric tensor g which is a type (0, 2)-tensor meaning it
takes two vectors as an argument and returns a scalar. Choosing an inner product
⟨·, ·⟩ on a vector (tangent) space Tp M gives rise to an isomorphism:

φ : T_p M → T_p^∗M, v ↦ ⟨v, ·⟩.

The function φ(u) takes u ∈ T_p M and returns its dual such that φ(u)(v) =
⟨u, v⟩, i.e. we have a function taking a vector, obtaining its dual, then taking the inner
product with v and returning a real number. In this sense, we can identify the inner
product as the action of a covector on a vector, i.e.:

⟨u, v⟩ = φ(u)v.

In terms of components of the metric tensor, this is equivalent to:

⟨u, v⟩ = g_{ij} u^i v^j = u_j v^j

but noting that there is no canonical identification of V with V ∗ . Such concepts


are important in quantum and geometric methods, especially Hamiltonian dynam-
ics where, if a system has a configuration space Q then the space of classical states
is defined to be the cotangent bundle i.e. set of all linear maps from the tangent
plane into R (see [48]). We briefly note the expression of dual vectors in terms of
components and basis elements.

As discussed above, selecting a local coordinate system around p ∈ M gives rise
to a basis set associated with p ∈ M:

\[ \left\{ \left( \frac{\partial}{\partial x^\mu} \right)_p \right\}. \]

This is a basis for the set of derivations and thus of the tangent space at p by the
isomorphism above. There is also an associated basis about p for the corresponding
dual T_p^∗M, denoted via the differentials:

(dx^1)_p, ..., (dx^m)_p.

As with the basis of the dual space above, the basis for the dual space T_p^∗M is given
via a set of linear operators in the dual space which, when composed with the basis
vectors of T_p M (i.e. the ∂/∂x^µ operators), lead to deltas. That is:

\[ \left\langle (dx^\mu)_p , \left( \frac{\partial}{\partial x^\nu} \right)_p \right\rangle := \delta^\mu_\nu . \tag{C.1.18} \]

In this way, we can expand dual vectors k ∈ T_p^∗M in terms of component functions
and this basis:

\[ k = \sum_{\mu=1}^{m} k_\mu \, (dx^\mu)_p \]

where the components k_µ ∈ R of the vector for the coordinate system and thus
the basis vectors dx^µ are given via the bilinear map (e.g. inner product in certain
contexts) of the dual vector k with the basis element of the tangent vector:

\[ k_\mu = \left\langle k , \left( \frac{\partial}{\partial x^\mu} \right)_p \right\rangle_p . \]

Remembering that k is a map acting on v ∈ T_p M, we can write the bilinear
map pairing ⟨k, v⟩ similarly as:

\[ \langle k, v \rangle = \sum_{\mu=1}^{m} k_\mu v^\mu \]

recalling v^µ = v(x^µ) and the resemblance here to inner products. Using this formal-
ism, we can choose a local representation for one-forms:

\[ \omega = \sum_\mu \omega_\mu \, dx^\mu \tag{C.1.19} \]

i.e. we can express a one-form (cotangent vector) in terms of the basis set of the
dual space, where the above is a shortened expression for:

\[ \omega_p = \sum_\mu \omega_\mu(p) \, (dx^\mu)_p \]

for p ∈ U ⊂ M. The components ω_µ of ω are functions on U:

\[ \omega_\mu(q) := \left\langle \omega , \frac{\partial}{\partial x^\mu} \right\rangle_q \]

where q ∈ U ⊂ M. Such a construction shows how one-forms can be construed as
smooth cross-sections of T^∗M, analogous to vector fields defined as smooth sections
of TM (see [48]).

C.1.3.3 Pullbacks of one-forms

We now describe the pullback property of one-forms. While it isn’t true in general
that a map h : M → N (between manifolds) can be used to push-forward a vector
field on M i.e. we cannot always use some commuting differential form to act as
our pushforward in the general case, this map h can be used to pull-back a one-
form on N . This feature is connected with the way in which the global topological
structure of a manifold is reflected in DeRham cohomology groups of M defined
using differential forms and to the fact that fibre bundles also pull-back (not push-
forward) (see Frankel [49] §13 for detailed discussion).

Definition C.1.11 (Pull-back). A pullback is the dual of the map between tangent
spaces across manifolds. If h : M → N and we have the linear map (the pushfor-
ward) h_∗ : T_p M → T_{h(p)} N, then the pullback map is defined as:

\[ h^* : T^*_{h(p)} N \to T^*_p M \tag{C.1.20} \]

and is considered under certain conditions the dual of h_∗. This means that for all
maps k ∈ T^*_{h(p)}N (which are also vectors in the associated vector space) and all v ∈ T_p M,
we have:

\[ \langle h^* k, v \rangle_p := \langle k, h_* v \rangle_{h(p)} \]


To unpack the formalism, note that Th(p) N is a reference to the dual of the tan-
gent space for p ∈ M mapped to h(p) ∈ N , so we can see h∗ as ‘pulling back’ to
the dual space of p ∈ M. The term ⟨h∗ k, v⟩p refers to h∗ (the pullback) acting on
∗ ∗
k i.e. it ‘pulls back’ the map k living in Th(p) N to a map living in Th(p) M. Thus
∗ ∗
h k is a map living in Th(p) M which is itself a linear map that acts on the tangent
space Tp M, i.e. v ∈ Tp M. The definition says that this pullback is equivalent
(diagrammatically) to k acting on h∗ having been pushed forward to Th(p) N which

means it cans be acted on by k ∈ Th(p) N . In both cases, we have a mapping to R.

We can also extend pullbacks to one-forms. Recall, a one-form is an assignment


of a cotangent vector ωp to p ∈ M and this cotangent vector, which is a mapping
from Tp M to R, acts on the tangent vector at p, i.e. v ∈ Tp M to R. We define a
pullback of one-forms as follows.

Definition C.1.12 (Pull-back (one-forms)). If ω is a one-form on N , then the


pullback of ω is itself a one-form h∗ ω on M expressed as:

⟨h∗ ω, v⟩p := ⟨ω, h∗ v⟩h(p)

for all p ∈ M and v ∈ Tp M.

It can be shown (see [48] for detailed exposition) that the pullback h^*ω of ω (again,
a one-form in the dual space of N) evaluated at p ∈ M is given by the pullback acting on that
one-form ω, i.e. h^*ω, which then acts on the basis elements of T_p M, namely the
∂/∂x^µ terms, mapping them to R via the bilinear form. This is definitionally equated
with the one-form ω acting on the basis element ∂/∂x^µ of T_p M once it has been
‘pushed forward’ to T_{h(p)} N by the pushforward map h_∗, which is also equivalent to an
evaluation of the one-form at h(p) (as we are assuming the existence of h : M → N
here). The point of the pullback is to allow the structure of maps defined on or in
relation to one manifold, say N, to be expressed or represented in terms of structure
on or related to another manifold M.
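A small symbolic sketch may help fix ideas (assuming sympy; the map h and the one-form are illustrative choices): pulling back a one-form on N amounts to contracting its components with the Jacobian of h.

```python
import sympy as sp

r, th = sp.symbols('r theta', positive=True)

# h: M -> N in coordinates, h(r, theta) = (x, y) = (r cos(theta), r sin(theta)).
x = r*sp.cos(th)
y = r*sp.sin(th)

# A one-form on N with components in the (dx, dy) basis;
# take omega = x dy - y dx (the angular form, up to scale).
omega = sp.Matrix([-y, x])   # components (omega_x, omega_y), evaluated at h(p)

# Pullback components: (h* omega)_a = omega_i (at h(p)) * d h^i / d u^a,
# i.e. contraction with the Jacobian of h.
Jh = sp.Matrix([[sp.diff(x, r), sp.diff(x, th)],
                [sp.diff(y, r), sp.diff(y, th)]])
pullback = sp.simplify(Jh.T * omega)
print(pullback)   # (0, r**2): h*(x dy - y dx) = r^2 dtheta
```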

C.1.3.4 Lie derivatives and pullbacks

We now specify the relationship of Lie derivatives (and thus commutators) to pull-
backs. Using definition (C.1.12), the pull-back of a one-form ω on M can be related
to a vector field X with an associated one-parameter group of local diffeomorphisms


t ↦ ϕ^X_t. For each t, there exists an associated pull-back (ϕ^X_t)^*. This one-parameter
family of pulled-back forms describes how the one-form ω changes along the flow
lines (integral curves) of X. In this formalism we can connect Lie derivatives to
one-forms and pullbacks. The Lie derivative of ω associated with vector field X,
denoted L_X ω, is then the rate of change of ω along the flow-lines (integral curves)
of the one-parameter group ϕ^X_t of diffeomorphisms associated with the vector field
X. That is:

\[ \mathcal{L}_X \omega := \frac{d}{dt} \Big|_{t=0} (\phi_t^X)^* \omega \tag{C.1.21} \]

Note by comparison that the Lie derivative of a vector field can be expressed as:

\[ \frac{d}{dt} \Big|_{t=0} (\phi_{-t}^X)_* (Y) = [X, Y] = \mathcal{L}_X Y \tag{C.1.22} \]

Recall that the X in ϕ^X_t refers to the group of local diffeomorphisms that induces the
vector field X, and that (ϕ^X_t)^* is the associated family of pullbacks acting on one-forms.
Equation (C.1.21) can be understood as follows. Recall
that L_X f = Xf and L_X Y = [X, Y] (as naive multiplication of vector fields lacks the
derivation property). It can be shown that L_X⟨ω, Y⟩ = ⟨L_X ω, Y⟩ + ⟨ω, L_X Y⟩. Recall that
ϕ^X_t represents the flow of X on M (a one-parameter group of diffeomorphisms).
This says that each value of t is associated with a diffeomorphism from M to itself.
The flow moves points along the integral curves of the vector field X. Associated
with such flow is an inverse flow, indicated by the negative sign in ϕ^X_{−t}, in essence a
rewinding of the flow (which, recall, is possible as each ϕ^X_t is a diffeomorphism). The operation
ϕ^X_{−t} moves points p ∈ M along integral curves in the opposite direction to ϕ^X_t. In

this way, we can see how the commutator (Lie derivative) [X, Y ] expresses the rate
of change of the vector field Y along integral curves of X. Moreover, df can be used
to define the exterior derivative.

Definition C.1.13 (Exterior derivative). The exterior derivative of f ∈ C ∞ (M) is


given by:

⟨df, X⟩ := Xf = LX f

for vector fields X on M.

In local coordinates, this has a representation as:


\[ (df)_p = \sum_{\mu=1}^{m} \left( \frac{\partial}{\partial x^\mu} f \right)_p (dx^\mu)_p \]

i.e. we expand df in the dual basis dx^µ of T_p^∗M, with coefficients given by the partial
derivatives ∂_µ f with respect to the local coordinates. Note also that the pullback h^*
commutes with the exterior derivative: h^*(df) = d(f ◦ h).
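The identity L_X⟨ω, Y⟩ = ⟨L_X ω, Y⟩ + ⟨ω, L_X Y⟩ quoted above can be verified symbolically using the standard coordinate formulas for the Lie derivative of vector fields and one-forms. A minimal sympy sketch (with arbitrarily chosen components):

```python
import sympy as sp

x1, x2 = sp.symbols('x1 x2', real=True)
coords = [x1, x2]

X = sp.Matrix([x1*x2, sp.sin(x1)])   # vector field components X^mu
Y = sp.Matrix([x2**2, x1 + x2])      # vector field components Y^mu
w = sp.Matrix([sp.exp(x2), x1*x2])   # one-form components omega_mu

def lie_vec(X, Y):
    """(L_X Y)^mu = X^nu d_nu Y^mu - Y^nu d_nu X^mu  (= [X, Y]^mu)."""
    return sp.Matrix([sum(X[n]*sp.diff(Y[m], coords[n])
                          - Y[n]*sp.diff(X[m], coords[n]) for n in range(2))
                      for m in range(2)])

def lie_form(X, w):
    """(L_X omega)_mu = X^nu d_nu omega_mu + omega_nu d_mu X^nu."""
    return sp.Matrix([sum(X[n]*sp.diff(w[m], coords[n])
                          + w[n]*sp.diff(X[n], coords[m]) for n in range(2))
                      for m in range(2)])

pairing = (w.T * Y)[0]   # <omega, Y> = omega_mu Y^mu
lhs = sum(X[n]*sp.diff(pairing, coords[n]) for n in range(2))  # L_X f = X f
rhs = (lie_form(X, w).T * Y)[0] + (w.T * lie_vec(X, Y))[0]
print(sp.simplify(lhs - rhs))   # 0: the Leibniz identity holds
```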

C.1.4 General tensors and n-forms


We can now express tensors using the geometric formalism above, which we utilise for
our discussion of time-optimal geometric control. Generalising tangent and cotan-
gent vectors requires tensors and n-forms. To do this, we adopt the idea of the
tensor product of two vector spaces V and W , denoted (as per definition (A.1.12))
by V ⊗ W with a geometric framing. The tensors examined above can then be
represented as compositions of tangent and cotangent spaces.

Definition C.1.14 (Tensor types (geometric)). Tensors are vector products of tan-
gent spaces and/or cotangent spaces. A tensor of type (r, s) ∈ Tpr,s M at a point
p ∈ M belongs to the tensor product space:
" # " #
Tpr,s M := ⊗r Tp M ⊗ ⊗s Tp∗ M (C.1.23)

i.e. r tensor products of the tangent space tensor-producted with s tensor products
of the dual (cotangent) space.

The basis of Tpr,s M can be thought of as composed of the tensor products of r


basis elements of the tangent space (partial derivatives) and s basis elements of the
dual (cotangent) space (differentials):
"  # " #

Tpr,s M = ⊗ri=1 ⊗ ⊗sj=1 (dxvj ) .
∂xµi

In this formulation, with vectors and their duals, tensors as linear maps becomes
apparent. We can represent the Tpr,s type tensor in terms of a multilinear mapping
of the Cartesian product of a vector space and its dual to R which, in functional
notation (Cartesian products describing the function of tensors rather the direct
product) is given by:
   
\[ \left( \times^r \, T_p^* M \right) \times \left( \times^s \, T_p M \right) \to \mathbb{R} \tag{C.1.24} \]

where a multilinear map is viewed as a function that takes r cotangent vectors as
input and s tangent vectors as input and maps them to R (the difference being
notational: r always indexes the slots taking cotangent vectors, s always indexes the slots taking tangent
vectors). Tensors take products of vectors and covectors as their arguments, thus

tensor maps in effect take tensors as their arguments. The upper index r indexes
the number of contravariant components of the tensor, associated with T_p M. These
are akin to ‘directional’ components transforming in the same way as coordinate
differentials dx^i; contravariant components are typically denoted via superscripts,
indicating the directions along which the tensor acts with respect to covectors. The
subscript index s denotes the number of covariant components, consisting of linear
functionals in T_p^∗M (recall for duals and vectors we have components and bases).
This formulation is usefully understood in terms of equation (C.1.16) i.e. ⟨f i , ej ⟩ =
δji . Each basis element of the vector and its dual either annihilate or lead to unity
(scaled by any coefficients a, b), thus if we denote component coefficients a, b ∈ R
for vectors and duals af i , bej , then:

⟨af i , bej ⟩ = ab ⟨f i , ej ⟩ = abδji (C.1.25)

Thus the definition of vectors and duals, as bases of Tp M and Tp∗ M shows how the
tensorial map effectively acts as contraction (see below) which reduces the dimen-
sionality of the input vectors and covectors (the input tensor) and then multiplies
the remainder by the corresponding scalar product of the real coefficients of the basis
vectors for the vector and its dual so annihilated. As we see below, the usual inner
product on vector spaces can be framed in the language of metric tensors (where the
metric tensor acts as a bilinear form resulting in a scalar product). Framing tensors
as multilinear maps connects to concepts further on of, for example, metric tensors
which act on Tp M and Tp∗ M to contract tensor products to scalars, forming the ba-
sis for measuring the time, energy or length of paths given by Hamiltonian evolution
in our quest for time-optimal geodesics. In quantum information, the eponymous
tensor networks represent an important application of tensor formalism for a variety
of use cases, such as error correction and algorithm design [257, 258]. We note for
completeness a few types of special (r, s) tensors. Note the relation between the
tangent planes and cotangent planes:
(i) T_p^{0,1}M = T_p^∗M is the cotangent space at p (space of covectors or one-forms).
An example at p ∈ M would be a covector or one-form in T_p^∗M with one covariant
component;

(ii) T_p^{1,0}M = T_p M = (T_p^∗M)^∗ (dual of dual returns the tangent plane). An
example would be a vector in T_p M consisting of one contravariant
component;

(iii) Tpr,0 M is the space of r-contravariant tensors (multilinear functions of covec-


tors);

(iv) Tp0,s M is the space of s-covariant tensors (multilinear functions of vectors);


330 APPENDIX C. APPENDIX (DIFFERENTIAL GEOMETRY)

(v) Tpr,s M a mixed tensor comprising both contravariant (acting on vectors) and
covariant (acting on covectors) elements. Such higher-rank tensors are used
to represent more complex objects, such as curvature tensors or stress-energy
tensors.

Covariant tensors are multi-linear maps that take vectors (from Tp M) as inputs and
return scalars, while contravariant tensors take covectors (from Tp∗ M) as inputs.
From this we obtain a tensor field on M (a generalisation of the concept of a vector
field), which is a smooth association of a tensor of type (r, s) for p ∈ M. It is
instructive to observe how Tp M is associated with contravariant transformations
while Tp∗ M is associated with covariant transformations. This is intuitively related
to the fact that dual spaces (cotangent spaces) consist of linear maps that act on
vectors in the tangent space, hence they transform covariantly (i.e. they co-vary with
vectors). On the other hand, structures that transform ‘opposite to the way tangent
vectors do’, i.e. according to the basis changes in the cotangent space, are said to
transform contravariantly. Since tensor spaces can involve multiple layers of dual
spaces, we consider their transformation properties in terms of how they relate to
the transformations of the basis in the tangent and cotangent spaces. The elementary tensor v ⊗ w can be
defined as the unique bilinear map V^∗ × W^∗ → R which evaluates to ⟨k, v⟩⟨l, w⟩ ∈ R
on (k, l) ∈ V^∗ × W^∗. We can now also usefully define a tensorial contraction as
a generalisation of equations (C.1.16). Contraction is an operation that reduces
the order of a tensor by pairing covariant vectors, indicated by (lower) indices with
contravariant vectors, indicated by (upper) indices and summing over the dimension
of the manifold, effectively treating the contraction over indices as if they are in an
inner product, akin to a trace operation. Recalling that a tensor is a mapping (a
function), then we can define the contraction as a function, being a tensor, that
takes as its argument another tensor. In general, the contracting tensor can be
of any order, but in practice contraction is often performed using metric tensors,
tensors which, as maps to R, satisfy the conditions of being metric (we discuss this
below). The most general form of a contraction is set out below. Following [2, 48]
first recall that vectors and their duals are related via:
\[ \left\langle X_1 \otimes \cdots \otimes X_r \otimes \omega_1 \otimes \cdots \otimes \omega_s ,\; \omega'_1 \otimes \cdots \otimes \omega'_r \otimes X'_1 \otimes \cdots \otimes X'_s \right\rangle = \prod_{i,j} \omega'_i(X_i)\, \omega_j(X'_j), \]

for X_i, X'_j ∈ TM and ω_j, ω'_i ∈ T^∗M. We then define tensor contractions as follows.

Definition C.1.15 (Tensor contractions). A (1, 1)-contraction is a linear mapping
C^i_j : T^r_s M → T^{r−1}_{s−1} M with (C^i_j(T^r_s))_p = C^i_j(T^r_{s,p}) for T ∈ T^r_s M, p ∈ M. The mapping
C^i_j is the contraction of the ith contravariant index and jth covariant index:

\begin{align}
C^i_j\left( \otimes_{j=1}^{r} X_j \otimes_{i=1}^{s} \omega_i \right) &= \langle X_i , \omega_j \rangle \underbrace{\left( X_1 \otimes \ldots \otimes \hat{X}_i \otimes \ldots \otimes X_r \otimes \omega_1 \otimes \cdots \otimes \hat{\omega}_j \otimes \ldots \otimes \omega_s \right)}_{(r,s)} \tag{C.1.26} \\
&= \delta_{ij} \left( X_1 \otimes \ldots \otimes \hat{X}_i \otimes \ldots \otimes X_r \otimes \omega_1 \otimes \cdots \otimes \hat{\omega}_j \otimes \ldots \otimes \omega_s \right) \tag{C.1.27} \\
&= \underbrace{\left( X_1 \otimes \ldots \otimes X_r \otimes \omega_1 \otimes \cdots \otimes \omega_s \right)}_{(r-1,\,s-1)} \quad (X_i,\ \omega_j \text{ omitted}) \tag{C.1.28}
\end{align}

Where the \hat{X}_i symbol denotes removal of the ith element (following [2]).

Where the X̂i symbol denotes removal of the ith element (following [2]).

Thus contraction reduces the rank of a tensor by one in both the covariant and
contravariant indices. Note that we can in principle construct a more general con-
traction mapping the product of individual contractions. In particular we examine
metric tensors, which provide a way to compute inner products between tangent
vectors, thereby inducing a geometry and way of measuring distance (and thus time
optimality) on M.
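Computationally, contraction is simply a summation over a paired upper and lower index, which numpy's einsum expresses directly. A minimal sketch (the tensor entries are random and purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# A type-(2,1) tensor T^{ab}_{c} on a 3-dimensional space, stored as T[a, b, c].
T = rng.normal(size=(3, 3, 3))

# C^1_1: contract the first contravariant index with the (single) covariant
# index, summing over the dimension -- a trace over that index pair.
C = np.einsum('aba->b', T)   # result is a type-(1,0) tensor
print(C.shape)               # (3,)

# Contraction of a (1,1) tensor recovers the ordinary matrix trace.
A = rng.normal(size=(3, 3))
print(np.isclose(np.einsum('aa->', A), np.trace(A)))   # True
```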

C.1.4.1 Metric tensor

We now move to the definition of the metric tensor, of fundamental importance to


time optimal and machine learning approaches adopted in this work.

Definition C.1.16 (Metric tensor). The metric tensor is a (0, 2)-type tensor field
mapping g_p : T_p M × T_p M → R given by:

\[ g := g_{ij} \, dx^i \otimes dx^j, \qquad g_{ij} := \langle e_i, e_j \rangle, \qquad g^{ij} := \langle dx^i, dx^j \rangle \tag{C.1.29} \]

where e_i = ∂_i = ∂/∂x^i are basis elements of T_p M and dx^i are the corresponding dual
basis elements; g^{ij} denotes the inverse metric tensor.

Recall in this notation that v = v^j e_j, and similarly for v expressed in
the dual basis, i.e. v = v_j dx^j. Metric tensors induce an inner product on T_p M, allowing both lengths
(magnitudes) of vectors and angles between vectors to be calculated. The metric tensor can be
used for raising and lowering indices, in effect converting between vectors and their
covectors: acting on the vector v it produces the covariant components v_i through
index lowering. This process is described by:

vi = gij v j . (C.1.30)

which can be seen, assuming an orthonormal basis {ei } as:

⟨v, ej ⟩ = ⟨v i ei , ej ⟩ = v i ⟨ei , ej ⟩ = v i δij = vj

where gij = δij in the orthonormal case (see [49] p.lix).
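A minimal numerical sketch of index lowering and raising (assuming numpy; the polar-coordinate metric on the punctured plane is an illustrative choice):

```python
import numpy as np

# Polar-coordinate metric on R^2 \ {0}: g = dr (x) dr + r^2 dtheta (x) dtheta,
# evaluated at a point with r = 2.
r = 2.0
g = np.array([[1.0, 0.0],
              [0.0, r**2]])

v = np.array([3.0, 1.0])          # contravariant components v^j = (v^r, v^theta)
v_low = g @ v                     # v_i = g_ij v^j  (index lowering)
print(v_low)                      # [3., 4.]

# The inverse metric g^{ij} raises the index again.
print(np.linalg.inv(g) @ v_low)   # recovers [3., 1.]

# <v, v> = g_ij v^i v^j gives the squared length: 9 + 4 = 13.
print(v @ g @ v)
```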

C.1.4.2 n-forms and exterior products

We conclude this section introducing n-forms and exterior products. Consider the
following elementary n-forms, assuming 0 ≤ n ≤ dim M with n ∈ N. A 0-form is
a function in C ∞ (M). A one-form (1-form) at each point p ∈ M, assigns a linear
functional that maps vectors in Tp M to R. It is a section of the cotangent bundle
T ∗ M defined as follows.

Definition C.1.17 (n-form). An n-form is a tensor field, represented by ω,
of type (0, n) that is totally skew-symmetric. For any permutation P of the indices
1, 2, . . . , n:

ω(X1 , . . . , Xn ) = (−1)deg(P ) ω(XP (1) , . . . , XP (n) )

where Xi are arbitrary vector fields on M, and deg(P ) is the degree of the
permutation P (+1 even, −1 odd). The set of all n-forms on M is denoted An (M).
An n-form is therefore a particular way of assigning n cotangent vectors to p ∈ M.
We can now define an important concept, the wedge or exterior product in terms of
a tensor product of n-forms.

Definition C.1.18 (Exterior product). For ω1 ∈ An1 (M), ω2 ∈ An2 (M) the wedge
or exterior product of n-forms ω1 , ω2 is the (n1 + n2 )-form, denoted ω1 ∧ ω2 and
defined as:

\[ \omega_1 \wedge \omega_2 = \frac{1}{n_1! \, n_2!} \sum_{\sigma(P)} (-1)^{\deg(P)} (\omega_1 \otimes \omega_2)^P \tag{C.1.31} \]

where σ(P ) denotes permutations over P .

The permutations σ(P ) are understood as follows. For a tensor field ω of type
(0, n), the permuted tensor field ω P is defined to be the permutation map applied
to the permutation index i.e:

ω P (X1 , ..., Xn ) = ω(XP (1) , ..., XP (n) )

for all vector fields X1 , ..., Xn on the manifold M. The factor 1/(n1 !n2 !) is a normal-
isation factor given the ways of permuting ω1 , ω2 , while the sum over permutations

σ(P ) ensures all orderings of vectors are considered and that the result is antisym-
metric with respect to exchange of vectors. When applied to vector fields, the result
is a real number interpreted as an oriented volume spanned by those vectors asso-
ciated with p ∈ M. Intuitively one can think of how in two dimensions the wedge
(or cross) product leads to a volume (area) which can be given an orientation based
on the direction of its vectors. While not a focus of this work, we note that the
generalised pullback commutes with the wedge product, noting for h : M → N and
differential forms α, β, the pullback is a homomorphism h∗ (α ∧ β) = (h∗ α) ∧ (h∗ β).
Wedge products turn vectors into a graded algebra, important in various algebraic
geometry techniques. The basis of n-forms at p ∈ M is given by the wedge product
of differentials:

(dxµ1 )p ∧ ... ∧ (dxµn )p

Indeed it can be shown that dω(Xi ) at p ∈ M only depends upon vector fields Xi
at p, a fact related to the property of the exterior derivative being f -linear:

dω(X1 , ..., f Xi , ..., Xn+1 ) = f dω(X1 , ..., Xi , ..., Xn+1 )


dω(f X, Y ) = f dω(X, Y )

Such formulations then form the basis for important theorems related to DeRham
cohomology (see Frankel [49]). In this context the exterior derivative can be defined
as follows.

Definition C.1.19 (Exterior Derivative). Given A(M) := T ∗ M, the exterior


derivative is a map d : An (M) → An+1 (M), which takes an n-form to an (n + 1)-
form. For an n-form ω ∈ An (M) given locally by
\[ \omega = \sum_{i_1, \ldots, i_n} \omega_{i_1 \ldots i_n} \, dx^{i_1} \wedge \ldots \wedge dx^{i_n}, \]

the exterior derivative is defined as


\[ d\omega = \sum_{i_1, \ldots, i_n} d\omega_{i_1 \ldots i_n} \wedge dx^{i_1} \wedge \ldots \wedge dx^{i_n} \tag{C.1.32} \]

where \( d\omega_{i_1 \ldots i_n} = \sum_j \frac{\partial \omega_{i_1 \ldots i_n}}{\partial x^j} \, dx^j \).

The exterior derivative has the property that d(dω) = 0 for any form ω, and it
satisfies a (graded) Leibniz rule with respect to smooth functions on M, i.e., for a smooth function f and a form
α, we have d(f α) = df ∧ α + f dα. It is relevant in particular to the Maurer-Cartan
form and Cartan structure equations discussed elsewhere.
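The property d(dω) = 0 can be verified directly from the coordinate definition (C.1.32). A minimal sympy sketch implementing d on 0-forms and 1-forms over R³ (the particular functions are illustrative):

```python
import sympy as sp

x, y, z = sp.symbols('x y z', real=True)
coords = [x, y, z]

def d0(f):
    """Exterior derivative of a 0-form: df = (d_j f) dx^j, as components."""
    return [sp.diff(f, c) for c in coords]

def d1(w):
    """Exterior derivative of a 1-form w = w_i dx^i:
    (dw)_{ij} = d_i w_j - d_j w_i (antisymmetric components)."""
    return sp.Matrix(3, 3, lambda i, j: sp.diff(w[j], coords[i])
                                        - sp.diff(w[i], coords[j]))

f = sp.sin(x*y) + z**3
print(sp.simplify(d1(d0(f))))   # zero matrix: d(df) = 0

w = [x*y, y*z, sp.exp(x)]       # an arbitrary 1-form
print(d1(w))                    # its antisymmetric 2-form components
```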

C.1.5 Tangent planes and Lie algebras

In this section, we recount a few properties of Lie groups and Lie algebras from
earlier sections, connecting them explicitly to geometric formulations used in later
chapters. Recall from proposition (C.1.1) that a Lie group is a group equipped with
the structure of a differentiable manifold such that the group operations are smooth
i.e. where the map in parameter space that takes us from g1 (parametrised by θ1 )
to g2 in G is a differentiable map.

A fundamental aspect of Lie group theory is the isomorphism between the set of
all left-invariant vector fields on a Lie group G, denoted L(G), and the tangent
space Te G at the identity element e of G. This isomorphism implies that one can
understand the behaviour of left-invariant vector fields by examining transforma-
tions within Te G. In particular, this concept is foundational when considering the
exponential map. The tangent space Te G is, effectively, the Lie algebra of the Lie
group G, and mappings between different Lie groups G → H can be studied by
examining the corresponding maps between their Lie algebras Te G → Te H where
here e denotes the identity element for G. In addition, given the completeness of
left-invariant vector fields G, we can extend integral curves using the group struc-
ture even if G is not compact. Moreover, we can see how the exponential map is
constructed by considering the unique integral curve of a left-invariant vector field
at t = 0, mapping t 7→ exp(tA), where A is an element of the Lie algebra Te G. The
map is formally defined as exp : Te G → G, such that exp(A) = exp(tA) evaluated at
t = 1. It can be demonstrated that the exponential map is a local diffeomorphism
near the identity e, mapping a neighborhood in Te G smoothly onto a neighborhood
in G. This elucidates that the exponential map generates a one-parameter subgroup
of G (see definition B.2.14), and, in fact, every one-parameter subgroup of G can be
expressed in the form t 7→ exp(tA), a fact related to the bijective correspondence
between one-parameter subgroups of the Lie group G and elements of its Lie alge-
bra discussed earlier. Thus the neighborhood of e in G, which is diffeomorphically
mapped by exp : Te G → G, is filled with the images of these subgroup maps, allow-
ing investigations into these subgroups by examining the local structure around the
identity element of the Lie algebra.

C.1.5.1 Left and right translation

Of importance to connecting Lie groups and algebras to geometric constructs is the


existence of left and right translations of G onto itself, which can be used to map
the local tangent bundle around the entire group. Recall left translations and right

translations of G are diffeomorphisms given by the group actions:

\[ r_g : G \to G, \quad g' \mapsto g' g; \qquad l_g : G \to G, \quad g' \mapsto g g'. \]

A vector field X on a Lie group G is left-invariant if it is lg -related to itself for


all g ∈ G i.e.:

lg∗ X = X, ∀g ∈ G (C.1.33)

or equivalently:

lg∗ (Xg′ ) = Xgg′ , ∀g, g ′ ∈ G

with a similar notion of right invariance. Here l_{g∗} denotes the pushforward, being
the derivative map of the left action l_g: l_g maps the manifold to itself, while l_{g∗} is
the corresponding push-forward mapping between the respective tangent spaces.
Connecting with the notation above, given an isomorphism between the tangent
space at identity and L(G) given by ξ : T_e G → L(G) with ξ(A) = L^A for A ∈ T_e G,
we define L^A_g = l_{g∗}A for all g ∈ G. The set of left invariant vector fields is
denoted L(G). Indeed one of the important properties of a left-invariant X is
that it is complete (Isham [48] §4.2). Similarly, local compactness is important in
quantum contexts as it relates to the existence, for example, of the Haar measure
(discussed in Appendix A above) where the measure exists on G if G is (quasi)-
invariant under left and right translations. For vector fields X1 , X2 on M that
are mapped by a pushforward h_∗ to vector fields Y_1, Y_2 on a manifold N where
h : M → N , the commutator [X1 , X2 ] is h-related to the commutator [Y1 , Y2 ]. For
left-invariant vector fields X1 , X2 , we then have that the commutator is invariant
under the left-invariant actions (translations) of G, namely:

lg∗ [X1 , X2 ] = [lg∗ X1 , lg∗ X2 ] = [X1 , X2 ] (C.1.34)

which shows that [X1 , X2 ] ∈ L(G). It can be shown that the set L(G) is a ‘sub-Lie
algebra’ of the infinite-dimensional Lie algebra of all vector fields on the manifold
G. This is equivalent to the Lie algebra of G and represents a way of construing the
Lie algebra in terms of important invariant properties of vector fields. It can also be
shown that there is an isomorphism between the tangent space at the identity e ∈ G
i.e. Te G and L(G), allowing L(G) (and diffeomorphisms of G) to be explored via
actions upon Te G (at the identity). The commutator [A, B] ∈ Te G (for A, B ∈ Te G)

is then defined to be the unique element in Te G satisfying:

L[A,B] = [LA , LB ].

Expanding on the above notation, here L^A is defined by L^A_g := l_{g∗}A, ∀g ∈ G, i.e. L^A is the
vector field on G defined by such left translation, and L^{[A,B]}_g = l_{g∗}[A, B] represents
the left invariant vector field associated with the Lie bracket [A, B] ∈ T_e G. We note
also the structure constants of a Lie algebra L(G) ≃ Te G arising from commutators
of its basis elements E_α, that is:

\[ [E_\alpha, E_\beta] = \sum_{\gamma=1}^{n} C^\gamma_{\alpha\beta} E_\gamma \]

for C^γ_{αβ} ∈ R, where C^γ_{αβ} denotes the relevant structure constants (which play roles
in quantum mechanics, geometry and elsewhere). The adjoint map also has an
expression in terms of actions on T_e G; moreover, left-invariant fields X are complete, so that
integral curves of X can be extended for all t ∈ R.
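Structure constants are readily computed for matrix Lie algebras. A minimal numpy sketch for su(2) (the basis normalisation E_α = −iσ_α/2 is a common but illustrative choice, for which C^γ_{αβ} = ε_{αβγ}):

```python
import numpy as np

# Basis E_alpha = -i sigma_alpha / 2 of su(2) (anti-Hermitian, traceless).
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)
E = [-1j*s/2 for s in (sx, sy, sz)]

def comm(A, B):
    return A @ B - B @ A

# Structure constants from [E_a, E_b] = sum_c C^c_{ab} E_c, extracted via the
# trace inner product <A, B> = -2 tr(A B), orthonormal for this basis.
C = np.zeros((3, 3, 3))
for a in range(3):
    for b in range(3):
        bracket = comm(E[a], E[b])
        for c in range(3):
            C[c, a, b] = np.real(-2*np.trace(bracket @ E[c]))

print(C[2, 0, 1], C[2, 1, 0])   # 1.0, -1.0 : C^z_{xy} = +1 = epsilon_{xyz}
```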

C.1.5.2 Right and left invariance and Schrödinger’s equation

Because of its relevance in particular to our Chapter 5 results, we set out the signif-
icance of left and right invariance in relation to quantum control. Our problem in
that Chapter is to determine the optimal time to synthesise a unitary UT ∈ G using
only controls in p, the antisymmetric subspace corresponding to the Cartan decom-
position g = k ⊕ p, where g is the Lie algebra of G. Now consider the Schrödinger
equation:

\[ \frac{dU}{dt} = -iH(t) U. \tag{C.1.35} \]

Presented in this way, the Hamiltonian on the left of U , is the generator of (in-
finitesimal) translations (to second order) U → U + dU over increments dt. U is our
unitary at t while U + dU is our unitary at t + dt. The Hamiltonian is applied for
dt i.e. dU/U = −iH(t)dt. Thus time in equation (C.1.35) moves from right to left.
Intuitively this is because U on the right-hand side represents the quantum state at
time t, while dU on the left-hand side represents it at the later time t + dt. When
we say right or left translation by g, we can think of right translation as following
the flow of time once its direction is chosen, and left translation in this case as time
flowing ‘backwards’ relative to that choice. Thus right and left action by h ∈ G is

expressed as follows:

\[ \frac{dU}{dt} h = -iH(t) U h \quad \text{(right action)} \tag{C.1.36} \]

whereas left action is given by:

\[ h \frac{dU}{dt} = -i h H(t) U = -i \left( h H(t) h^{-1} \right) h U \quad \text{(left action)}. \tag{C.1.37} \]

Thus we can see how, once an implicit direction is chosen, as distinct from right
action, left action acts to conjugate (thus transform by an effective rotation) the
Hamiltonian H(t) → hH(t)h^{-1} (recalling here that the action of h on g is by conju-
gation, expressed infinitesimally as g → [X_h, g]). Thus while the representation
of left or right invariance can be chosen initially in an arbitrary fashion, once chosen
in certain circumstances they are not equivalent.
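The conjugation H(t) → hH(t)h^{-1} induced by left action can be illustrated numerically. A minimal sketch assuming numpy/scipy; the particular H and h below are hypothetical choices:

```python
import numpy as np
from scipy.linalg import expm

# Pauli matrices as generators.
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)

H = sz                          # Hamiltonian at some instant
h = expm(-1j * (np.pi/4) * sx)  # a fixed group element h in SU(2)

# Right action leaves the generator untouched: d(Uh)/dt = -iH (Uh).
# Left action conjugates it:  d(hU)/dt = -i (h H h^{-1}) (hU).
H_conj = h @ H @ h.conj().T     # h^{-1} = h^dagger for unitary h
print(np.round(H_conj, 10))     # sigma_z is rotated to -sigma_y here
```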

C.1.5.3 Exponentials, integral curves and tangent spaces

We briefly note a few connections between the exponential map (discussed in Ap-
pendix B) and the geometric formalism above. The exponential map can be related
to integral curves, i.e. as the unique integral curve satisfying:
 
LA A d
t→γ (t) A= γ∗L (C.1.38)
dt 0

of the left invariant vector field L^A associated with the identity in G, that is
γ^{L^A}(0) = e, and which is defined for all t ∈ R. The notation γ^{L^A} refers to the integral
curve generated by the left-invariant vector field L^A originating from the identity ele-
ment e ∈ G. For any g ∈ G, the left translation map l_g : G → G, l_g(h) = gh, h ∈ G
does not alter L^A. Here A is the tangent vector to the curve γ^{L^A} at e, where γ^{L^A}_*
represents the pushforward of the curve’s tangent vector at t = 0. The integral curve
is then written as a mapping from the affine parameter t ↦ exp(tA) with A ∈ T_e G.
The map exp : T_e G → G is defined via exp A = exp(tA)|_{t=1}, reflecting the conven-
tion that one moves from the identity e ∈ G along the integral curve generated by
A to exp(A) ∈ G with time t ∈ [0, 1]. This reflects the idea that evolution of curves
in G can be framed in terms of evolution from the identity element and therefore
studied in terms of the canonical Lie algebra associated with Te G.
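The one-parameter subgroup property underlying this construction, exp((s+t)A) = exp(sA) exp(tA), is easily checked numerically for matrix groups. A minimal sketch (assuming scipy; the generator A is an illustrative choice):

```python
import numpy as np
from scipy.linalg import expm

# A Lie algebra element A in su(2) (anti-Hermitian): A = -i sigma_y.
A = np.array([[0, -1], [1, 0]], dtype=complex)

s, t = 0.3, 0.7
# One-parameter subgroup property: exp((s+t)A) = exp(sA) exp(tA).
print(np.allclose(expm((s + t)*A), expm(s*A) @ expm(t*A)))   # True

# exp(A) = exp(tA) evaluated at t = 1, by definition:
print(np.allclose(expm(A), expm(1.0*A)))                     # True
```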

Other results mentioned in Appendix B also apply, such as the exponential map
being local diffeomorphism from Te G to e ∈ G and that t 7→ exp(At) is considered a
unique one-parameter subgroup of G (indeed that all such one-parameter subgroups
are of that form for A ∈ Te G ≃ L(G)). This reiterates the one-to-one association

between one-parameter subgroups of G and the exponential map. Left-invariance
also applies to n-forms, and Lie algebras can be said to have associated with
them dual Lie algebras (see the literature for standard discussion).

C.1.5.4 Maurer-Cartan Form

Differential forms are an important tool for exhibiting how algebraic properties of g
manifest within or affect geometric and topological properties of a system. Of note
in this regard is the Maurer-Cartan form which is a g-valued one-form on G. The
form can be understood as relating structure constants and wedge products to the
derivative of a one-form, providing a means of expressing v ∈ Tp M in terms of g.
Given the Maurer-Cartan equation:

\[ d\omega^\alpha + \frac{1}{2} \sum_{\beta,\gamma=1}^{n} C^\alpha_{\beta\gamma} \, \omega^\beta \wedge \omega^\gamma = 0 \tag{C.1.39} \]

we observe that the commutator of one-forms is a two-form which can be obtained


via exterior differentiation. The term dω^α is the exterior derivative of the form, and
the wedge products ω^β ∧ ω^γ are summed against the structure constants C^α_{βγ} of the Lie
algebra, which encode the Lie bracket.

Definition C.1.20 (Maurer-Cartan Form). Let G be a Lie group. The Maurer-


Cartan form is a g-valued one-form ω on G, where g is the Lie algebra of G. For
each g ∈ G, ω_g : T_g G → g is defined by

\[ \omega_g(v) = l_{g^{-1} *}(v) \]

where l_{g^{-1}∗} denotes the differential (pushforward) of left multiplication by g^{-1} in G, and v ∈ T_g G
is a tangent vector at g.

Here l_{g^{-1}∗} : T_g G → g. The one-form ω_g relates the tangent space at any point
g ∈ G back to the tangent space at the identity and therefore represents a map
from T_g G → g. As Sharpe [8] notes, the equation is of profound use in classifying
properties of spaces. In particular, Cartan’s structural equations (see Theorem C.2.2
below) can be expressed as:

\[ d\omega_g = \frac{1}{2} [\omega_g, \omega_g], \qquad [\omega_g, \omega_g](u, v) := [\omega_g(u), \omega_g(v)] \tag{C.1.40} \]

which is also described as the Cartan curvature equation as it describes local cur-
vature (where dωg = 0 equates to flatness). Note we can derive Cartan curvature
equation (C.1.40) from equation (C.1.39).
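For matrix Lie groups, the Maurer-Cartan form admits the concrete expression ω = g^{-1} dg. A minimal sympy sketch for SO(2) (an illustrative choice), showing that ω returns a fixed Lie algebra element along the group:

```python
import sympy as sp

theta = sp.symbols('theta', real=True)

# A curve in SO(2): g(theta) = rotation by theta.
g = sp.Matrix([[sp.cos(theta), -sp.sin(theta)],
               [sp.sin(theta),  sp.cos(theta)]])

# For matrix groups the Maurer-Cartan form is omega = g^{-1} dg; along this
# curve, omega(d/dtheta) = g^{-1} g'(theta), which should be a constant Lie
# algebra element (reflecting left-invariance).
omega = sp.simplify(g.inv() * g.diff(theta))
print(omega)   # [[0, -1], [1, 0]]: the so(2) generator, independent of theta
```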

Connecting the concepts above to quantum information problems in this work,


the form takes a tangent vector v ∈ T_g G and returns a left-invariant vector field
associated with v, meaning the field remains constant along the flow. Here l_{g^{-1}}
translates v back to the identity, from which it can be pushed forward to any g′ by left translation.
As we discuss, the Maurer-Cartan form assists in identifying left-invariant vec-
tor fields invariant under the group action, simplifying equations of motion in opti-
mal control contexts. For optimal control, we are also interested in transformation
groups. For example, a group G acts on M if there exists a homomor-
phism γ : G → σ(M) with g ↦ γ_g, where σ(M) is the permutation group
of M (with an equivalent anti-homomorphism for right action). Left action on a
differentiable manifold can then be considered as a homomorphism from the group
G to the group of diffeomorphisms of M, that is g ↦ γ_g where γ : G → Diff(M).
This gives rise to the notion of equivariant mappings, being structure preserving
maps between pairs of group actions of G on manifolds M, N where g 7→ γg , g 7→ γg′
on M and N respectively where γg′ ◦ f = f ◦ γg . Equivariance has recently become
of interest in quantum machine learning via the construction of equivariant neural
networks where the network architecture is constructed to be equivariant [259–261].

We can connect different types of group actions discussed in earlier parts of this
work to their geometric equivalents:

(i) Kernel : kernel of a G-action is the subgroup of G defined by:

K = {g ∈ G|gp = p, ∀p ∈ M}.

(ii) Effective group action: is where the kernel equals the identity, that is K = {e}.

(iii) Free group action: if for all p ∈ M it is the case that {g ∈ G|gp = p} = {e},
i.e. every point of M is moved by every non-identity element: g = e is the only
element such that gp = p.

(iv) Transitive group action: a G-action is transitive if any pair of points p, q ∈ M
can be connected by an element of the group, i.e. there exists a g ∈ G such that
p = gq. In transitive action, the whole of M can be probed by the G-action.
This is complementary to the idea that in an effective action the whole of G
can be probed by its action on the manifold M.

(v) Orbit: the orbit Op of the G-action through p is the set of all points in M that
can be reached from p, that is:

Op = {q ∈ M|∃g ∈ G, q = gp}. (C.1.41)


340 APPENDIX C. APPENDIX (DIFFERENTIAL GEOMETRY)

When a group G acts on a set X (group action on X) it permutes the elements
of X. The path along which an element of X moves under the action is its orbit. An
example is the sphere S² acted on by the circle group S¹ via rotations around
the z axis: the orbits are circles of constant latitude (see the numerical sketch
following this list). Such action is analogous
to the action of the compact subgroup K under a Cartan decomposition g =
k ⊕ p, the intuitive idea being that each orbit of K is a set about which the k ∈ K
transform, but that to reach different orbits, one requires the action of some
element generated by p.

(vi) Stabiliser (isotropy) groups: for a G-action on M, the stabiliser G_p of the
action at p ∈ M is given by:

\[ G_p = \{ g \in G \,|\, gp = p \}. \tag{C.1.42} \]

The stabilisers of an element p ∈ M can be thought of as group actions (or


permutations of M which form a group homomorphically mapped to G) which
leave p unchanged, i.e. the set of group action identity elements with respect
to p ∈ M where we can think of a set of permutations which leave p unchanged
or ‘in the same position’. Stabiliser groups are fundamental to quantum infor-
mation (especially surface codes). In our final chapter, the stabiliser group K
of G is of fundamental importance, noting, for example, that stabiliser groups
along an orbit are conjugate, i.e. if p and q = gp belong to the same orbit of a
G-action, then G_q = g G_p g^{-1}.
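A minimal numerical sketch of the orbit and stabiliser concepts above, using the S¹-on-S² rotation example referred to in item (v) (assuming numpy):

```python
import numpy as np

def rot_z(t):
    """Element of the circle group S^1 acting on S^2 by rotation about z."""
    return np.array([[np.cos(t), -np.sin(t), 0],
                     [np.sin(t),  np.cos(t), 0],
                     [0, 0, 1]])

p = np.array([np.sin(1.0), 0.0, np.cos(1.0)])   # a point on S^2 off the poles
pole = np.array([0.0, 0.0, 1.0])

ts = np.linspace(0, 2*np.pi, 5)
orbit = [rot_z(t) @ p for t in ts]
# The orbit of p is a circle of constant latitude (z-coordinate unchanged):
print({round(q[2], 10) for q in orbit})   # a single value: cos(1.0)

# The pole is fixed by every rotation: its stabiliser is all of S^1,
# while a generic point is stabilised only by the identity rotation.
print(all(np.allclose(rot_z(t) @ pole, pole) for t in ts))   # True
```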

C.1.5.5 Infinitesimal transformations and adjoint action

An important action throughout this work is the adjoint action of G on itself (dis-
cussed in Chapter 2):

Adg (g ′ ) = gg ′ g −1 .

The kernel of this action is the centre C(G), because the centre is the set of elements
of G that commute with every element in G, but then for any g′ ∈ C(G) we have
Ad_g(g′) = gg′g^{-1} = g′gg^{-1} = g′, i.e. the (effective) kernel of this action of conjugating
by g is the set of elements of G that result in this action being equivalent to the
identity.
(crucial for all quantum control problems) and the push-forward Adg∗ of the adjoint
action of G can be expressed as follows. Let A, B ∈ Te G with Lie bracket [A, B],

then the bracket has a representation as differentiated adjoint:

\[ [A, B] = \frac{d}{dt} \mathrm{Ad}_{\exp(tA)*}(B) \Big|_{t=0}. \]

Recalling that the exponential is given such that exp(Adg∗ (B)) = g(exp(B))g −1 , it
can be shown that the relation between the commutator can be thought of in terms
of this conjugacy:

exp(tA) exp(B) exp(−tA) = exp(B + t[A, B] + O(t²))

and X^{Ad_{g∗}(A)} = δ_{g^{-1}∗}(X^A), where δ_{g^{-1}∗} is the differential (pushforward) of the map
g^{-1} acting on X^A. That is, the vector field associated with the adjointly transformed
Lie algebra element A is equivalent to the correspondingly transformed vector field X^A.
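The differentiated-adjoint representation of the bracket above can be checked by finite differences for matrix groups. A minimal sketch (assuming numpy/scipy; the su(2) generators are illustrative):

```python
import numpy as np
from scipy.linalg import expm

# Anti-Hermitian generators A, B in su(2).
A = -0.5j*np.array([[0, 1], [1, 0]], dtype=complex)    # -i sigma_x / 2
B = -0.5j*np.array([[1, 0], [0, -1]], dtype=complex)   # -i sigma_z / 2

def Ad(g, X):
    """Adjoint action of the group element g on the algebra element X."""
    return g @ X @ np.linalg.inv(g)

# [A, B] = d/dt Ad_{exp(tA)}(B) at t = 0, checked by central differences.
t = 1e-6
deriv = (Ad(expm(t*A), B) - Ad(expm(-t*A), B)) / (2*t)
comm = A @ B - B @ A
print(np.allclose(deriv, comm, atol=1e-8))   # True
```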
We conclude this section via reiterating in a geometric context the key results
related to Lie algebra homomorphisms and connecting these to vector fields. This
is an important result for further on when we introduce the connection between
vector (fibre) bundles that serves a pivotal role in our final chapter. In the theory
of infinitesimal transformations, we can represent the Lie algebra g of G via vector
fields on M on which G acts. This mapping is a homomorphism of Lie algebras and
associates X A with A ∈ Te G. Assume G right-acts on M. Given a mapping

A 7→ X A

that assigns to each A ∈ Te G the vector field X A (denoted the induced vector field
on M) is a homomorphism from the Lie algebra L(G) ≃ Te G into the infinite-
dimensional Lie algebra of all vector fields on M. This is denoted:

[X A , X B ] = X [A,B]

for all A, B ∈ T_e G ≃ L(G). For left actions, the map A ↦ X^A is an anti-
homomorphism, [X^A, X^B] = X^{[B,A]} = −X^{[A,B]}, from T_e G ≃ L(G). If the action is effective
on M, then the map A ↦ X^A is an isomorphism from T_e G ≃ L(G) onto a Lie algebra
of vector fields on M.

C.1.6 Fibre bundles


In many cases of physical importance, the vector space V arises as a representa-
tion space of some internal symmetry group, especially for example in symmetry
reduction techniques used in geometric control theory. Fibre bundles allow for the
study of situations where the structure of interest over each point of a base space
is more complex than just a vector space, allowing, for example, gauge symmetries

to be identified (e.g. where gauge symmetries are not completely represented as


transformations on vector spaces). They are of particular relevance in gauge the-
ories providing a means of describing fields and their connections over manifolds.
In quantum information problems involving systems with a geometric or topologi-
cal character, quantum state space can be considered as sections of a fibre bundle
where the base space represents a space of parameters or configurations, and the
fibres represent the Hilbert spaces associated with each configuration. In geometric
control, we often look to find ways of modelling control
inputs and state spaces as fibres. Fibre bundles are an abstraction of vector bundles
and are utilised when, for example, mappings lack vector space properties. We
begin with the concept of a bundle, defined as a triple (E, π, M), where E and M
are linked via a continuous projection map π : E → M. We briefly sketch out the
theory of fibre bundles and principal bundles as these are touched upon in other
sections. Define a bundle space as E = ⋃_{p∈M} F_p, with F_p being fibres. A fibre is
then defined as follows.

Definition C.1.21 (Fibre). The fibre Fp over p is the inverse image of p under π.
Geometrically, it arises via the map π −1 : M → T M, an example of a fibre bundle
associating p ∈ M with the tangent space Tp M. Formally, we define the fibre Fp as
follows. The projection π associates each fibre Fp with a point p ∈ M, where:

Fp = π −1 ({p}),

defines Fp as the preimage of p under π (that is the set of all points in E mapped
to p ∈ M).

Certain bundles have the special property that the fibres π −1 ({p}), p ∈ M are
all homeomorphic (diffeomorphic for manifolds) to F . In such cases, F is known as
the fibre of the bundle and the bundle is said to be a fibre bundle (we explore this
more via associated bundles below). For vectors, this is the set of all vectors that
are tangent to the manifold at the point p. The fibre bundle is sometimes visualised
in diagrammatic form (see Isham [48] §5.1):

\[ F \hookrightarrow E \xrightarrow{\;\pi\;} M \]

Related to fibre bundles is the idea of a section. The section of a bundle (E, π, M)
is a map from the base space (manifold) to the total space:

s:M→E

such that the image of each point p ∈ M lies in the fibre π −1 ({p}) over p such that:

π ◦ s = idM

So the section is the map from the manifold such that we take p, send it up to the
total space E, then that image of p, when subject to π, takes us back to p ∈ M ,
hence π ◦ s is equivalent to the identity on M. One can also define maps between
bundles and pullbacks among fibres analogous to the case for vector bundles.

C.1.6.1 Principal fibre bundles

An important canonical fibre bundle is the principal fibre bundle whose fibre acts
as a Lie group in a particular way. For our purposes, they allow for the definition of
connections (see below) which describe how fibres (or vector spaces) are connected
over different points in M. Connections are of fundamental importance to results
in our final Chapter and also to definitions of vertical and horizontal subspaces in
subRiemannian control problems further on. A connection on a principal bundle
defines a notion of horizontal and vertical subspaces within the tangent space of the
total space. This distinction is crucial for defining parallel transport and curvature,
concepts that are central to understanding the dynamics and control of systems with
symmetry. Firstly, we define a principal fibre bundle as follows.

Definition C.1.22 (Principal fibre bundle). A principal fibre bundle (E, π, M) is
a G-bundle (a fibre bundle) if the total space E is a right-G space and if the bundle
(E, π, M) is isomorphic in the following way:

(E, π, M) ≃ (E, ρ, E/G)

where E/G is the orbit space of the G-action on E and ρ is here the usual projection
map such that we have the diagram:

\[ \begin{array}{ccc} E & \xrightarrow{\;u\;} & E \\ {\scriptstyle \pi}\downarrow & & \downarrow{\scriptstyle \rho} \\ M & \longrightarrow & E/G \end{array} \]

A principal fibre bundle has a typical fibre that is a Lie group G, and the action
of G on the fibres is by right multiplication, which is free and transitive. In a
general G-bundle, the fibres are the orbits of the G-action on E and hence are not necessarily
homeomorphic to each other. All non-principal bundles are associated with the
principal bundle. Principal G-bundles arise when G acts freely on E (that is, free
action is when the only element of G that acts as an identity element on x ∈ E is

the identity in G itself). Given a closed subgroup H ⊂ G, then the quotient space
G/H is also an orbit space with fibre H. When G is the fibre itself and the action
on G is both free and transitive, we use the notation P to denote the total space
(i.e. E). The notation P is used to emphasise that each fibre is isomorphic to G
itself and that all fibres in the bundle are homogeneous i.e. they are all structurally
isomorphic (so we can utilise a single representation for each). A few other concepts
to note include:

(i) Principal total space (P). For principal G bundles, the total space is often
denoted as a principal total space where the principal G bundle is then indi-
cated by the triple (P, π, M) where P is the principal total space. There exist
principal maps between two such bundles i.e. u : P → P ′ for (P, π, M) and
(P ′ , π ′ , M′ ). The mapping is G-equivariant as u(pg) = u(p)g. Here G acts on
P and P projects onto M via π.

(ii) Triviality. The triviality (or trivialisation) of a principal G-bundle relates to


whether the bundle expressly respects the product structure of the base space
with G. A principal G-bundle is trivial if it is isomorphic to the product bundle
(M × G, pr1 , M) where pr1 : M × G → M is a projection onto M. Trivi-
ality requires (i) compatibility with projection maps so that diffeomorphisms
f respect the fibration structure of P by mapping fibres in P over p ∈ M
to {p} × G and (ii) equivariance with respect to G such that f (pg) = f (p)g,
where f (p) ∈ M × G (such that f preserves the group action structure of the
bundle).

For the avoidance of doubt, we have so far (with slight abuse of notation for con-
venience) been equating G ≡ M (as distinct from explicitly notationally indicating
the group acting on the manifold M × G), where Tg G for g ∈ G captures the in-
finitesimal directions in which G can evolve, mapping directly to tangent spaces
Tp M on the manifold. Usually the principal bundle (P, π, M) is introduced as a
more abstract formulation to cater for where G acts on a different manifold (e.g.
where G ̸= M). The formulation of total space P allows consideration of how the
tangent spaces T_g G (that is, for our formulation, T_p M) are related across the entirety
of M. Thus in our treatment, we are interested in how G acts upon itself, which
in the language of quantum operators is how operators act upon themselves e.g.
U1 (t)U2 (t) = U ′ (t) can be regarded as group elements acting on themselves.

C.1.6.2 Associated bundles

While principal fibre bundles above allow us to abstractly associate the group action
of G to a manifold, the fibres remain abstract. In practice we want an association

between fibre bundles and more familiar structures from a geometric control and
quantum information processing perspective, e.g. we want our fibres to have the
structure of vector spaces, tangent space or Lie algebras. For this we turn to the
concept of associated fibre bundles which enable the construction of bundles with
fibres that are not necessarily groups but can be any space on which the group acts.
This is particularly relevant to our use of geometric methods where fibres are vector
spaces, such as Lie algebras or tangent spaces. The idea [48] is that an associated
bundle can be constructed where G acts as a group of transformations on F .

Definition C.1.23 (Associated bundle). Given a principal fibre bundle (P, π, M)


and a left G-space F , an associated fibre bundle is a fibre bundle (PF , πF , M) where
the principal total space PF is defined as the quotient (P × F )/G (also represented
as P ×G F being the space formed by the Cartesian product of P and F modulo the
group action).

The projection πF : PF → M is induced by the projection π : P → M of


the principal bundle. For each p ∈ M, the space π_F^{-1}(p) is homeomorphic to F,
in essence allowing us to select structures, such as vector spaces or tangent spaces,
which respect the fibre bundle structure of the principal G-bundle. In this way, the
principal bundle structure can be extended over the chosen fibres e.g. vector spaces
in a way that varies smoothly from point to point over M. This is in turn central
to the concept of a connection, which gives a covariant derivative along the fibres of
the bundle.
The primary form of associated bundle we are interested in are vector bundles,
which are important due to their being equipped with a natural vector space struc-
ture. For convenience we retain E instead of PF .

Definition C.1.24 (Vector bundle). An n-dimensional complex or real vector bun-


dle (E, π, M) is a type of fibre bundle in which each fibre exhibits the structure of
an n-dimensional vector space. For each p ∈ M there exists a neighbourhood U and
way of locally expressing trivialisation of the bundle via h : U × Rn → π −1 (U ) such
that for all y ∈ U, h : {y} × Rn → π −1 ({y}) is linear.

Vector bundles are the primary type of (associated) fibre bundle with which we
are concerned in this work. The concept has broad application, e.g. in machine
learning by allowing the representation of data points that are sensitive to inherent
symmetries in the data. In quantum settings, unitary operations and quantum
channels, which describe the evolution of quantum states, can be seen as bundle
maps that act on the sections of vector bundles. Unitary operations, representing
reversible quantum evolutions, can be modelled as isomorphisms of Hilbert spaces
H (definition A.1.9) as fibres, preserving their structure. Quantum channels,

which describe more general (possibly irreversible if decohering) quantum evolutions,


can be represented as linear maps between fibres (see [241] for more detail).
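
As a minimal numerical sketch of this picture (assuming, for illustration, a trivial rank-2 complex vector bundle over S¹; the discretisation and all names are illustrative, not drawn from the cited sources), a section assigns a fibre vector to each base point and a unitary bundle map acts linearly on each fibre:

```python
import numpy as np

# Base manifold M = S^1, discretised; total space of the trivial bundle is M x C^2.
thetas = np.linspace(0, 2 * np.pi, 100, endpoint=False)

# A smooth section psi : M -> M x C^2, p -> (p, psi(p)); here a normalised
# two-component (qubit-like) vector attached to each fibre.
psi = np.stack([np.cos(thetas / 2),
                np.sin(thetas / 2) * np.exp(1j * thetas)], axis=1)

# A unitary bundle map covering the identity on M: it acts linearly on each
# fibre (here the Hadamard unitary, applied fibrewise).
H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
psi_new = psi @ H.T      # apply H to the vector in every fibre

# Bundle-map check: fibres map linearly onto fibres and norms are preserved.
assert np.allclose(np.linalg.norm(psi, axis=1), 1.0)
assert np.allclose(np.linalg.norm(psi_new, axis=1), 1.0)
```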

C.1.7 Connections
Connections are a fundamentally important concept in geometry, encapsulating the
concept of differentiation of vector fields along curves on manifolds. A connection on
a manifold M, or more specifically on a tangent bundle T M, provides a systematic
way to parallel transport vectors along paths, allowing the comparison of tangent
spaces at different points on M, effectively facilitating the extension of the notion
of directional derivatives to curved spaces. Intuitively the idea of a connection
is a means of associating vectors between infinitesimally adjacent tangent planes
Tp M → Tp+dp M as one progresses from γ(t) = p to γ(t + dt) = p + dt on M.
Another way to think about them is that they provide a way of identifying how
Tp M transforms along the curve γ(t), giving conditions to be satisfied, in the form
of transformation rules for such vectors, for tangent vectors γ̇(t) to remain parallel
as they are transported along a curve (parallel transport or the vanishing of the
covariant derivative, which we discuss below).
Connections are also a fundamental means of distinguishing between Riemannian
and subRiemannian manifolds via the decomposition of fibres (and Lie algebras) into
horizontal and vertical subspaces. They are thus central to quantum and classical
control problems, such as the KP control problem we study extensively in the final
Chapter. In Riemannian geometry, for example, Christoffel symbols represent co-
efficients of connection on the bundle of frames (set of bases) for M. Connections,
as we explain below, are related to notions of parallel transport and covariant dif-
ferentiation. The underlying idea of parallel transport and covariant differentiation
requires that one compares points in neighbouring fibres (or vector spaces, in the
case of a vector bundle) in a way that isn’t dependent upon a particular local co-
ordinate system (or trivialisation). Thus a concept of directionality is needed such
that vector fields point from one fibre to another. Vector fields arising from Lie
algebras lack this intrinsic orientation of directional pointing. The connection pro-
vides a concept of directionality by partitioning the fibre into horizontal and vertical
subspaces as discussed below.
Consider a principal G-bundle (P, π, M) above. A connection on P provides a
smooth splitting of the tangent space Tp P at each point p ∈ P into vertical and
horizontal subspaces, Tp P = Vp P ⊕ Hp P , where Vp P is tangent to the fibre and
Hp P is isomorphic to Tπ(p) M. To understand this formalism, we expand upon
important concepts of vertical and horizontal subspaces. These are fundamental
in later chapters, where synthesis of time-optimal (approximate) geodesics arise by
C.1. DIFFERENTIAL GEOMETRY 347

way of choosing Hamiltonians comprising generators from horizontal subspaces (Lie


subalgebras). Evolution according to horizontal subspaces in Riemannian manifolds
relates to vanishing (zero) covariant derivatives.

Definition C.1.25 (Vertical subspace). The vertical subspace Vp P at p ∈ P is


defined as the kernel of the differential of the projection map π : P → M, i.e.,

Vp P = ker(π∗ |p ) = {v ∈ Tp P | π∗ (v) = 0}.

where π∗ is the associated (pushforward) mapping, i.e. π∗ : Tp P → Tπ(p) M. Connections
can be associated with L(G)-valued one-forms ω (assigning to each tangent vector an
element of L(G)) where ωp(X^A) = A for p ∈ P and A ∈ L(G). Intuitively, Vp P
consists of tangent vectors to P at p that are “vertical” in the sense that they point
along the fibre π^{-1}(π(p)). These vectors represent infinitesimal movements within
the fibre itself, without leading to any displacement in the base manifold M. Vector
fields X^A_p belong to the subspace Vp P ⊂ Tp P; intuitively they point ‘along’ the fibres.
This vertical subspace may equivalently be written:

Vp P = {τ ∈ Tp P | π∗ τ = 0}

By contrast, horizontal vectors represent transformations between fibres. We for-


malise the definition of a horizontal subspace below.

Definition C.1.26 (Horizontal subspace). The horizontal subspace Hp P at p ∈
P, on the other hand, is selected by the connection and complements the vertical
subspace within Tp P. It is formally defined such that τ ∈ Hp P if and only if ωp(τ) =
0 (i.e. Hp P is the kernel of the connection one-form):

Hp P = {τ ∈ Tp P | ωp(τ) = 0},    (C.1.43)

so that every nonzero τ ∈ Hp P satisfies π∗τ ≠ 0.

Specifically, a connection on P provides a smooth assignment of a horizontal


subspace to each point in P such that the tangent space at any point p decomposes
as a direct sum:

Tp P = Vp P ⊕ Hp P.

The horizontal subspace consists of those vectors that are orthogonal (under a cho-
sen Riemannian or pseudo-Riemannian metric on the bundle) to the vertical space
with respect to a chosen connection. Vectors in Hp P are “horizontal” in the sense
that they correspond to displacements that lead to movement in the base manifold
M when considered under parallel transport defined by the connection. This can
be understood diagrammatically: the projection map π : P → M induces a (pushforward)
map π∗ : Tp P → Tπ(p) M.

[Figure C.1: Commutative diagram showing the relationship of the connection (map from G →
P → M), the projection map π : P → M and the induced horizontal map (the pushforward) π∗ : T P → T M.]

In this structure, A ↦ X^A is an isomorphism of L(G) onto Vp P. The vertical
subspace thus comprises those vectors that are in the kernel of the map π∗, i.e. they
are mapped to the zero vector in T M and so do not generate evolutions in the base
manifold M. Analogously, these vectors ‘point up’ and do not specify a direction in
which to move on M, akin to a vector pointing vertically out of a surface. The
horizontal subspace, by contrast, comprises vectors that generate translations on the
base manifold. This is intuitively equivalent to those vectors ‘pointing’ in some
direction of translation on the manifold from, say, p → p′ ∈ M. Because p′ has its
own fibre (e.g. vector space in a vector bundle), those horizontal subspace vectors
‘point’ horizontally towards neighbouring fibres.

It is useful to dwell on the notion of orthogonality between vertical and horizontal


subspaces implied by the connection. Indeed in many ways it is more illuminat-
ing to begin with this concept of orthogonality to build intuition around the terms
vertical and horizontal. This provides a means of distinguishing between vertical and
horizontal movements in P . The choice of horizontal subspace Hp P at p ∈ P is not
arbitrary. Rather, it is subject to conditions ensuring consistency with the group
action and smooth variation across P :

δg∗ (Hp P ) = Hpg P, ∀g ∈ G, p ∈ P,

where δg(p) = pg denotes the right action of G on P. This condition ensures that
the designation of vectors as horizontal is consistent with the geometric structure of
the bundle as defined by the group action.
Intuitively, vertical subspaces manifest the internal symmetry of the bundle en-
coded by the Lie group G, while horizontal subspaces encapsulate the geometry of
how the bundle expands over the base manifold. This geometric structure, facil-
itated by the connection, is crucial for defining parallel transport, curvature and
ultimately geodesics (and their approximations) that characterise time-optimality.

We can thus formally define a connection [2, 48, 49] in these terms.

Definition C.1.27 (Connection). A connection on a principal G-bundle given as:

G→P →M

is a smooth assignment of horizontal subspaces Hp P ⊂ Tp P to each point p ∈ P in
the total space such that:

(i) Tp P ≃ Vp P ⊕ Hp P, ∀p ∈ P (i.e. decomposable into direct sum of horizon-


tal/vertical)

(ii) δg∗(Hp P) = Hpg P, ∀g ∈ G, p ∈ P (equivariance with respect to the right
action), where δg(p) = pg denotes the right action.

An example of a connection is the covariant derivative, which describes how
vectors transform across the fibre (vector) bundle. The commutative diagram in Figure
(C.1) illustrates how the projection map π : P → M gives rise to an induced
(pushforward) map from the tangent space of the total space to that of the base manifold:

π∗ : Tp P → Tπ(p) M.

By construction, the kernel of this map is the vertical subspace. Connections can
also be usefully understood in terms of one-forms. Connections can be associated
with certain L(G)-valued one-forms ω on P. Recall the map ℓ : L(G) →
V F lds(P), A ↦ X^A; ℓ^{-1} maps back to the Lie algebra L(G). If τ ∈ Tp P, then:

ωp : Tp P → L(G),    ωp(τ) = ℓ^{-1}(Vp P(τ))

where Vp P(τ) denotes the vertical component of τ. As this map takes us back to
L(G), ℓ is the isomorphism of L(G) with Vp P, so we can associate one-forms with
maps between the tangent space and the Lie algebra as follows:
(i) ωp (X A ) = A, ∀p ∈ P, A ∈ L(G);

(ii) δg∗ω = Ad_{g^{-1}} ω, with (δg∗ω)(τ) = Ad_{g^{-1}} ω(τ);

(iii) τ ∈ Hp P ⇐⇒ ωp (τ ) = 0.

As touched upon above, the horizontal subspace of the tangent space T P can thus
be thought of as the kernel of the one-form that maps to the Lie algebra L(G) of
left-invariant vector fields (see section C.1.34):

Hp P = {τ ∈ Tp P | ωp(τ) = 0},    where ωp(τ) = ℓ^{-1}(Vp P(τ))



In this formulation, the connection can be expressed as an L(G)-valued one-form


on P satisfying (i) to (iii) above. The map ℓ : L(G) → V F lds(P ) maps Lie alge-
bra elements to vector fields on P such that ℓ−1 (Vp P (τ )) would map this vertical
component back to an element of the Lie algebra.
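
To make the vertical/horizontal splitting concrete, the following is a minimal numerical sketch using the Hopf bundle S³ → S² with structure group U(1) (a standard example, though not one drawn from the sources above); the finite-difference pushforward and all function names are illustrative assumptions.

```python
import numpy as np

def hopf(z):
    # Projection pi : S^3 -> S^2 for z = (z1, z2) in C^2 with |z| = 1 (Hopf map)
    z1, z2 = z
    return np.array([2 * (z1 * np.conj(z2)).real,
                     2 * (z1 * np.conj(z2)).imag,
                     abs(z1) ** 2 - abs(z2) ** 2])

def pushforward(z, v, eps=1e-6):
    # Finite-difference approximation of pi_* at z applied to a tangent vector v
    return (hopf(z + eps * v) - hopf(z - eps * v)) / (2 * eps)

rng = np.random.default_rng(0)
z = rng.normal(size=2) + 1j * rng.normal(size=2)
z /= np.linalg.norm(z)                 # a point of the total space S^3

vert = 1j * z                          # vertical direction: generator of z -> e^{i t} z
print(np.linalg.norm(pushforward(z, vert)))   # ~ 0 : vertical vectors lie in ker(pi_*)

# A horizontal vector: orthogonal (real inner product) to both z and iz
w = rng.normal(size=2) + 1j * rng.normal(size=2)
for b in (z, 1j * z):
    w = w - np.vdot(b, w).real * b     # project out the z and iz directions
print(np.linalg.norm(pushforward(z, w)))      # > 0 : horizontal vectors move the base point
```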

C.1.7.1 Relation to Cartan decompositions

The decomposition into vertical and horizontal subspaces discussed above is par-
ticularly relevant to time-optimal control problems. In essence, given the Cartan
decomposition (B.5.2) g = k ⊕ p, we associate k with the vertical and p with the hori-
zontal subspace. We can then understand the symmetry relations expressed by the
commutators in terms of this vertical and horizontal sense of directionality: given
[k, k] ⊂ k, [k, p] ⊂ p and [p, p] ⊂ k, we can see that the horizontal generators under
the adjoint action shift points p ∈ G/K to p′ ∈ G/K, while the vertical generators in
k do not translate those points in G/K (these relations are unpacked in the list below,
followed by a short numerical check). We introduce a further definition to elucidate
this, the Cartan connection.

Definition C.1.28 (Cartan Connection). A Cartan connection on a principal G-
bundle (P, π, M) (with group G, subgroup H (and subalgebra h) and homogeneous
space G/H) consists of an atlas (see definition C.1.1) of charts U ⊂ M and a g-valued
one-form θU : T U → g defined on each chart such that (a) θU mod h is a linear
isomorphism T_uU → g/h for every u ∈ U and (b) for overlapping charts U, V there
is a map h : U ∩ V → H with:

θV = Ad(h^{-1})θU + h∗ωH    (C.1.44)

where ωH is the Maurer-Cartan form of H.

Note that Cartan curvature (equation (C.1.40)) can be expressed as:

ΩU = dθU + ½ [θU, θU]    (C.1.45)

We can understand the commutation relations in terms of principal bundles and


horizontal/vertical subspaces as follows:

1. Vertical subspace k. The first relation [k, k] ⊆ k provides that the Lie bracket of
two vertical generators remains within the vertical subspace. Geometrically,
this means that transformations generated by elements of k remain within the
fibres of the principal bundle G → G/K. Such vertical movements do not
evolve to other points in the base space G/K, reflecting an intrinsic, self-
contained dynamics within each fibre, akin to internal symmetries. Geometri-
cally, translations by elements of k keep points within the same orbit in G/K.

2. Horizontal to vertical. The second commutation relation [p, p] ⊂ k indicates


that the commutator of two horizontal elements results in a vertical element.
This property is particularly relevant as it relates to the curvature of the con-
nection in G/K. The horizontal generators under the adjoint action can lead
to a shift that results in a vertical movement, suggesting that purely hori-
zontal displacements can induce curvature in the space, effectively “bending”
into the vertical subspace. This phenomenon is a hallmark of the non-trivial
geometry in subRiemannian spaces and is crucial for understanding the un-
derlying geometric structure of G/K. Elements of p correspond to horizontal
directions that can be projected onto the base space G/K, effectively causing
translations from one orbit to another. This curvature is a manifestation of
how horizontal translations alter the geometric structure of the space, moving
points across different orbits in G/K (see below).

3. Vertical acting on horizontal. The relation [k, p] ⊂ p implies that the action of
vertical elements on horizontal ones results in horizontal displacements. This
can be interpreted as the influence of the group’s internal symmetries on the di-
rectionality of movement within G/K. In other words, the vertical generators,
through their adjoint action, can modify the direction of horizontal generators
without leaving the horizontal plane, thereby affecting the trajectory of points
in G/K without transitioning to vertical movement.
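
The three commutation relations above can be checked directly in a small numerical sketch. The following assumes the standard Cartan decomposition of su(2) with k = span{iZ} and p = span{iX, iY} (corresponding to the homogeneous space SU(2)/U(1) ≅ S²); the least-squares membership test is an illustrative construction.

```python
import numpy as np

# Pauli matrices; su(2) is spanned by the skew-Hermitian {iX, iY, iZ}
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

k = [1j * Z]             # 'vertical' subalgebra (rotations about the z-axis)
p = [1j * X, 1j * Y]     # 'horizontal' complement

def bracket(A, B):
    return A @ B - B @ A

def in_span(M, basis):
    # Least-squares membership test: is M in the span of the basis?
    A = np.column_stack([b.ravel() for b in basis])
    coeffs, *_ = np.linalg.lstsq(A, M.ravel(), rcond=None)
    return np.allclose(A @ coeffs, M.ravel())

assert all(in_span(bracket(a, b), k) for a in k for b in k)   # [k, k] ⊆ k
assert all(in_span(bracket(a, b), p) for a in k for b in p)   # [k, p] ⊆ p
assert all(in_span(bracket(a, b), k) for a in p for b in p)   # [p, p] ⊆ k
print("Cartan commutation relations hold for su(2) = k ⊕ p")
```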

Recall that in the full Lie group G, orbits are generated by the action of its sub-
groups, including K, on points within G. For a subgroup K and an element g ∈ G,
the orbit of g under the (right) action of K is defined as Og = {g · k | k ∈ K}. These orbits
represent continuous trajectories or paths within G that are traced out by the action
of K, reflecting the inherent symmetry structure imposed by K on G. When con-
sidering the quotient space G/K, which represents the space of left cosets of K in
G, the action of K on G translates differently. Here, each point in G/K corresponds
to an orbit of K in G, and the quotient space essentially collapses these orbits to
single points, with the projection map π : G → G/K sending elements of G to their
corresponding orbits in G/K.

C.1.7.2 Fixed points and orbits

Translations by elements of k preserve the orbit of a point in G/K (reflecting the


isotropy subgroup K’s action where K is the stabiliser (see equation (C.1.42)) of
a point in G/K) while translations by elements of p have the capacity to navigate
across different orbits, revealing the transitive but stratified nature of the action of
G on G/K. This can be related to the fixed-point operation of K. In the quotient
space G/K, the orbits of K in G appear as if K acts as a fixed-point operator. This

perspective arises because, within G/K, each orbit Og is identified with a single
point, and the distinct actions of elements of K on points in G that would have
moved them along their orbits are now seen as leaving the corresponding points in
G/K invariant. That is:

(gk)K = gK    (C.1.46)

for all k ∈ K and g ∈ G, where gK denotes the coset (or point in G/K) correspond-
ing to g. The action of K on any g ∈ G thus translates to the invariance of the coset
gK in G/K, effectively rendering K as acting by fixed points in the quotient space.
To this end, each fibre of G/K can be thus regarded as an equivalence class of orbits.
For any Lie group G and a subgroup K, the quotient space G/K is constructed by
partitioning G into equivalence classes under the equivalence relation:

g ∼ g ′ ←→ g ′ = gk

for k ∈ K. This relation groups elements of G into sets where each set, or equivalence
class, contains elements that can be transformed into one another by the right action
of elements of K. Mathematically, an equivalence class of g ∈ G can be denoted by
the coset gK = {gk | k ∈ K}, which represents an orbit of g under the action of K.

In the quotient topology of G/K, each point corresponds to one such equivalence
class or orbit. The projection map π : G → G/K maps each element g ∈ G to the
equivalence class gK it belongs to. The preimage π −1 (π(g)) = gK under this map
is a fibre over the point π(g) in G/K. Therefore, each fibre of G/K is representative
of an orbit in G under the action of K, signifying that the entire set of elements
in G that are related through multiplication by elements of K are collapsed to a
single point in G/K. This illustrates how the quotient space encapsulates the idea of
moving from the specific (individual group elements in G) to the general (equivalence
classes or orbits in G/K) by abstracting away the internal symmetries represented
by K.

This explicitly builds intuition for the symmetry reducing properties of KP


decompositions and the fundamental role of connections in defining such symmetries
on manifolds. Moreover, as we discuss below, the synthesis of connections in terms
of vertical and horizontal spaces, related to the homogeneous space G/K allows for
a detailed understanding of Riemannian and subRiemannian geometry and, in turn,
geometric control theory.

C.1.8 Parallel transport and horizontal lifts


We come now to important concepts related to geodesics and time optimality. We
are interested in the concept of parallel transport in a principal bundle ξ = (P, π, M)
with a connection given (via the one-form) ω. We are interested in horizontal vec-
tor fields whose flow lines move from one fibre to another, intuitively constituting
translation across a manifold via generators in the horizontal subspace. Parallel
transport shifts vectors along integral curves such that they remain parallel
according to a specified connection. The concept of horizontal lifts is central to this
process by enabling (i) preservation of a concept of direction and (ii) preserving
‘horizontality’ such that the notion of straightness or parallelism of Euclidean space
can be extended to curved manifolds, essential for the study of geodesics. Note that
while all geodesics are generated by generators in the horizontal subspace Hp M, not
all curves generated by horizontal generators are geodesics.

Definition C.1.29 (Horizontal lift). Given a vector field X on M and connection


expressed in terms of ω, its horizontal lift X ↑ is a vector field on P that is everywhere
horizontal with respect to the one-form ω such that given π∗ : Hp P → Tπ(p) M is an
isomorphism, for all X on M there exists a unique vector field X ↑ satisfying the
following for all p ∈ P :

(i) π∗ (Xp↑ ) = Xπ(p)

(ii) Vp (Xp↑ ) = 0

The vector field X ↑ is called the horizontal lift of X as it ‘lifts’ the vector field
X on M into the horizontal subspace of T P .

The requirement Vp(X ↑_p) = 0 indicates that X ↑ lies entirely in the horizontal
subspace Hp P, encapsulating the essence of parallel transport as maintaining
the direction of X through the fibres of P. For a smooth curve γ in M, a horizontal
lift γ ↑ : [a, b] → P is a curve whose tangent vectors lie in the horizontal subspaces
Hγ↑(t) P, i.e. it is horizontal in that:

V(γ̇ ↑(t)) = 0,
π(γ ↑(t)) = γ(t), ∀t ∈ [a, b].

Given a point p ∈ P above γ(a), there exists a unique horizontal lift γ ↑ starting at
p. This uniqueness underscores the connection’s role in determining a specific way
to transport points along curves in M, embodying the geometric intuition behind
parallel transport.

Definition C.1.30 (Parallel vector fields). For a curve γ(t) ∈ M, the vector field
X is parallel along γ(t) if the covariant derivative vanishes along the curve, that is
if:

∇γ̇(t) X(t) = 0 ∀t ∈ [a, b] (C.1.47)

where ∇ is the covariant derivative defined below.


Note that for γ̇(t) ∈ Tp M we often use the notation ∇γ̇(t) := ∇γ(t), where
γ(t) is any curve belonging to the equivalence class [γ̇(t)] (Isham §6.7). In
this work we generally use the notation ∇γ̇(t) to emphasise that the covariant derivative
is taken with respect to γ̇(t). Parallel transport of v ∈ Tγ(a) M along γ(t) for t ∈ [a, b]
is then defined by the following map τ, which takes v to the corresponding vector
field value X(b) ∈ Tγ(b) M. The vector field X is the unique vector field along γ such
that X(a) = v and ∇γ̇(t) X(t) = 0 is satisfied. Parallelism is expressible in terms of
horizontal lifts as follows [48, 49].
Definition C.1.31 (Parallel transport). For γ : [a, b] → M, the notion of parallel
transport along γ from γ(a) to γ(b) is defined as a map τ : Tγ(a) M → Tγ(b) M
satisfying the following:

τ : π −1 ({γ(a)}) → π −1 ({γ(b)})

The map represents assigning, to each point p ∈ π^{-1}({γ(a)}), the endpoint
γ ↑(b) ∈ π^{-1}({γ(b)}) of the horizontal lift γ ↑ of the curve γ that passes through p at t = a.
Parallel transport is defined across fibres, where horizontal curves are mapped
into horizontal curves by the right action (δg) of G on P; the right action commutes
with τ for all g ∈ G, i.e. τ ◦ δg = δg ◦ τ. It can be shown that if p1 = τ(p) then p1 g = τ(p)g = τ(pg)
with the result that τ is a bijection among fibres. The concept of parallel transport
is then extended to fibres via the holonomy group. This is important in Chapter
5, where we are interested in holonomic paths for time-optimal synthesis. The
holonomy group Holp (ω) consists of all elements of G that can be realized as the
result of parallel transporting a reference fibre along closed curves (loops) starting
and ending at p ∈ M.
Definition C.1.32 (Holonomy group). For a loop γ : [0, 1] → M with γ(0) =
γ(1) = p and a horizontal lift γ ↑ of γ starting at a point x ∈ P above p, the endpoint
of γ ↑ is related to x by an element of G. The set of all such elements, for all possible
loops γ based at p, forms the holonomy group Holp (ω).
For a vector field X to be parallel along γ (for all t ∈ [a, b]), the requirement
∇γ̇(t) X(t) = 0 says that the rate of change of the vector field and the connection’s

adjustment of that rate of change sum to zero. Holonomy reflects how connections
on M influence the paths taken by curves on the manifold, where curves traced out
by the action of the holonomy group constitute (time-optimal) subRiemannian or
Riemannian geodesics (discussed further on in this work).
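
As a numerical sketch of holonomy (a standard textbook example rather than one drawn from the sources above), consider parallel transport on the round 2-sphere around a latitude circle; the Levi-Civita holonomy around such a loop is a rotation by 2π cos θ, which the explicit-Euler integration of ∇γ̇ X = 0 below recovers. The discretisation parameters are illustrative assumptions.

```python
import numpy as np

# Parallel transport around the latitude theta = pi/4 on the unit 2-sphere.
# In the orthonormal frame (e_theta, e_phi), the condition grad_{gamma'} X = 0
# along phi -> (theta, phi) reduces (via the Christoffel symbols of the round
# metric) to:  dX^th/dphi = cos(theta) X^ph,  dX^ph/dphi = -cos(theta) X^th,
# i.e. a rotation by the holonomy angle -2*pi*cos(theta) over the full loop.
theta = np.pi / 4
c = np.cos(theta)
n = 100_000
dphi = 2 * np.pi / n

X = np.array([1.0, 0.0])                              # start with e_theta
for _ in range(n):
    X = X + dphi * np.array([c * X[1], -c * X[0]])    # explicit Euler step
    X /= np.linalg.norm(X)                            # re-normalise: transport is an isometry

print(np.arctan2(X[1], X[0]))              # ~ 1.8403 rad
print((-2 * np.pi * c) % (2 * np.pi))      # = 1.8403... : the holonomy angle mod 2*pi
```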

C.1.9 Covariant differentiation


We now come to the important concept of covariant differentiation. As noted in the
literature, the challenge in defining the derivative of a section lies in how to
compare neighbouring points in two different fibres in a way that does not depend
on the local coordinate frame. For bundles equipped with a connection, which allows
points in one fibre to be pulled back to those in another, this comparison can be undertaken.
Consider the map ψ : M → PV where ψ is a section, being a smooth map that
assigns a vector in the fibre to each point in M (here PV is the total space of the vector
bundle where each fibre is a vector space isomorphic to V, following the notation
of [48]). We wish to understand how ψ varies along curves in the manifold. This
C.1.5). Recall that X(M) denotes the set of smooth vector fields on M.

Definition C.1.33 (Covariant derivative). Denote a principal G-bundle ξ = (P, π, M)


with V a vector space linear representation of G. Consider the curve γ : [0, ϵ] → M
such that γ(0) = p0 ∈ M, and let ψ : M → PV be a section of the associated vector
bundle. Then the covariant derivative of ψ in the direction γ at p0 is

∇γ ψ := lim_{t→0} [τ_t^{-1} ψ(γ(t)) − ψ(γ(0))] / t

where τ_t^{-1} is the parallel transport operator from the fibre π^{-1}({γ(t)}) to the
fibre π^{-1}({γ(0)}) = π^{-1}({p0}).

Note we use a slight abuse of notation where ∇γ refers more properly to ∇γ̇
where γ̇(t) is the tangent vector to the curve γ at the point γ(t). Here τt−1 transports
vectors back along γ(t) → γ(0) while preserving parallelism. We interchange with
the notation ∇X which typically refers to the covariant derivative along a vector
field X ∈ X(M) on a manifold M. Here, X is a smooth section of the tangent
bundle T M, and ∇X is an operator that acts on a vector field or section of a vector
bundle. If two curves γ1 , γ2 are tangent at p ∈ M then we have ∇γ1 ψ = ∇γ2 ψ.
Moreover, connecting with the notation:

(a) For v ∈ Tp M, the covariant derivative of a section ψ of PV along v is represented as


∇v ψ = ∇γ ψ where γ here is any curve in the equivalence class v = [γ].

(b) For the vector field X on M, the covariant derivative along X is the linear
operator:

∇X : Γ(PV ) → Γ(PV )

on the set of sections Γ(PV) of the vector bundle PV over M, defined pointwise,
that is:

(∇X ψ)(p) = ∇Xp ψ.

Consider now affine connections, noting that when PV = T M the set of sections
Γ(PV) is precisely X(M). We note that ∇X on X(M) exhibits the structure of a derivation:

∇X (f ψ) = f ∇X ψ + X(f )ψ (C.1.48)

for f ∈ C ∞ (M). The ∇X operator is also linear in X(M) (which can also be
regarded as a module over C ∞ (M)). These properties are expressed by considering
∇X as an affine connection.

Definition C.1.34 (Affine connection). An affine connection on M is an operator


∇ : X(M) × X(M) → X(M) which associates with X ∈ X(M) a linear mapping
∇X of X(M) into itself i.e. (X, Y) ↦ ∇(X, Y) = ∇X Y satisfying the following two
conditions:

(∇1 ) ∇f X+gY = f ∇X + g∇Y ;


(∇2 ) ∇X (f Y ) = f ∇X (Y ) + (Xf )Y

for f, g ∈ C ∞ (M ), X, Y ∈ X(M).

The affine connection is one way to define the covariant derivative ∇X on a


vector bundle (PV , πV , M), which in turn can be related to the connection one-form
on the principal bundle ξ (see above). The affine connection together with a local
coordinate chart (U, p) can be defined via:

∇∂i ∂j = Γ^k_{ij} ∂k

where Γkij = (∇∂i ∂j )k are Christoffel symbols of the second kind and (·)k indicates
the k-th component. For the covariant derivative of Y with respect to X, such k-th
coordinate is given by:

(∇X Y)^k = X^i (∇i Y)^k = X^i (∂Y^k/∂x^i + Γ^k_{ij} Y^j).

C.1.10 Geodesics and parallelism

We can use the above concepts to define a notion of parallelism leading to a definition
of geodesics which are curves that locally minimise distance and are solutions to the
geodesic equation derived from a chosen connection. Denote γ : I → M, t ↦ γ(t) for
an interval I ⊂ R (which we generally without loss of generality specify as I = [0, 1])
with an associated tangent vector γ̇(t). Here γ is regular. Two vector fields X, Y
are parallel along γ(t) if:

∇X Y = 0 ∀t ∈ I.

This definition of parallelism involves values of X, Y only on the curve γ. In local
coordinates {x^i}, it can be shown that there exist coordinate functions X^i, Y^j for
i, j ≤ m on the coordinate neighbourhood U as per above such that:

X = Σ_i X^i ∂/∂x^i,    Y = Σ_j Y^j ∂/∂x^j

∇X(Y) = Σ_k ( Σ_i X^i ∂Y^k/∂x^i + Σ_{i,j} X^i Y^j Γ^k_{ij} ) ∂/∂x^k    on U

with the result:

dY^k/dt + Σ_{i,j} Γ^k_{ij} (dx^i/dt) Y^j = 0,    t ∈ I    (C.1.49)

which can be shown [2] to represent equation (C.1.48) in the limit as t → 0. Equation
(C.1.49) indicates that the component form of the covariant derivative along γ(t) is
zero (hence indicative of parallel transport). The covariant derivative operator ∇X
has a number of properties, including that (i) it is a derivation (an (r, s)-tensor being
drawn from the derivation space of tangent and cotangent vectors respectively), (ii)
it preserves tensors (i.e. it maps (r, s) tensors to (r, s) tensors) ∇X : T r,s M → T r,s M
and (iii) commutes with all contractions Cji (see above). From this we come to the
first iteration of a geodesic.

Definition C.1.35 (Geodesic). A curve γ : I → M, t ↦ γ(t) in M is denoted a
geodesic if its field of tangent vectors γ̇(t) ∈ Tγ(t) M is parallel with respect to γ,
corresponding to the condition that ∇γ̇ γ̇ = 0, which we denote the geodesic equation.

In a coordinate frame the geodesic equation is expressed as:

d²u^γ/ds² + Γ^γ_{αβ} (du^α/ds)(du^β/ds) = 0

where:

Γ^γ_{αβ} = ½ g^{γµ} (∂g_{µα}/∂u^β + ∂g_{µβ}/∂u^α − ∂g_{αβ}/∂u^µ)

are the Christoffel symbols of the second kind (essentially connection coefficients)
with g γµ the inverse of the metric tensor and ds usually indicates parametrisation
by arc length. It can be shown that given an affine connection above, for any
p ∈ M and Xp, there exists a unique maximal geodesic t ↦ γ(t) such that γ(0) = p
and γ̇(0) = Xp , i.e. it cannot be extended to a larger interval. As Nielsen et al.
note [183, 184], the geodesic equation is solved either (a) as an initial value problem
by specifying γ(0) = p, γ̇(0) = v for v ∈ Tp M, or (b) a boundary value problem
for γ(0) = p and γ(1) = q using variational methods. For Lie group manifolds, all
geodesics are generated by generators from the horizontal subspace Hp M, but not
all curves generated from the horizontal subspace are geodesics. Note (for clarity)
with regards to Hp P and Hp M that we can denote the image of Hp P under π∗ as
Hp M, representing the bundle’s horizontal structure within the tangent space at π(p)
in M, thereby elucidating how the geometry of the bundle expands over the base
manifold via the connection.
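
To ground these formulas, the following is a small symbolic sketch (using sympy; the round 2-sphere is an illustrative assumption, not an example from the cited works) computing the Christoffel symbols of the second kind from the metric and hence the coordinate form of the geodesic equation.

```python
import sympy as sp

# Round 2-sphere: coordinates u^1 = theta, u^2 = phi, metric ds^2 = d(theta)^2 + sin^2(theta) d(phi)^2
theta, phi = sp.symbols('theta phi')
x = [theta, phi]
g = sp.Matrix([[1, 0], [0, sp.sin(theta) ** 2]])
ginv = g.inv()

def christoffel(c, a, b):
    # Gamma^c_{ab} = (1/2) g^{c mu} (dg_{mu a}/du^b + dg_{mu b}/du^a - dg_{ab}/du^mu)
    return sp.simplify(sum(
        sp.Rational(1, 2) * ginv[c, m]
        * (sp.diff(g[m, a], x[b]) + sp.diff(g[m, b], x[a]) - sp.diff(g[a, b], x[m]))
        for m in range(2)))

for c in range(2):
    for a in range(2):
        for b in range(2):
            G = christoffel(c, a, b)
            if G != 0:
                print(f"Gamma^{x[c]}_({x[a]},{x[b]}) =", G)

# Nonzero symbols: Gamma^theta_(phi,phi) = -sin(theta)cos(theta) and
# Gamma^phi_(theta,phi) = Gamma^phi_(phi,theta) = cos(theta)/sin(theta),
# so the geodesic equations read:
#   theta'' - sin(theta)cos(theta) (phi')^2 = 0,   phi'' + 2 cot(theta) theta' phi' = 0.
```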

C.2 Riemannian manifolds


The covariant derivative / affine connection ∇ on M allows us to define concepts
of torsion and curvature tensor fields. First, we define the celebrated Riemann
curvature tensor. The tensor intuitively represents the non-commutativity of covariant differentiation, made precise below.
Definition C.2.1 (Riemann curvature tensor). Given an (M, g) Riemannian man-
ifold (see below) and set of vector fields X(M), define the following (1, 3) tensor field:

X(M) × X(M) × X(M) → X(M) (C.2.1)


(X, Y, Z) ↦ R(X, Y )Z = (∇X ∇Y − ∇Y ∇X − ∇[X,Y ] )Z (C.2.2)
= ∇X ∇Y Z − ∇Y ∇X Z − ∇[X,Y ] Z (C.2.3)

for any vector fields X, Y, Z ∈ X(M), where ∇ denotes the Levi-Civita connection
associated with the metric g.
The Riemann curvature tensor measures the extent to which the affine connection
(covariant derivative) is not commutative i.e. the failure of the covariant derivatives

to commute and therefore the extent of intrinsic curvature of the manifold. In this
way, it corresponds to the non-holonomy of M. Recall Holp (ω) for a loop γ reflects
parallel transportation of a vector v ∈ Tp M to and from p ∈ M along γ. The
deviation of v from its initial position can be given by the action of the Riemann
curvature tensor along the loop expressed by the Ambrose-Singer Holonomy The-
orem [262], which states that the Lie algebra of Holp(ω) is generated by the values
of the curvature Ω over all horizontal two-planes in Tp P, where p denotes a point in
the fibre over the basepoint in M (see also section C.2.3.2 below). The properties of Holp(ω) that
describe how parallel transport around closed loops γ distorts vectors are related to
the curvature experienced by these vectors in planes spanned by tangent vectors in
Tp M. Integrating the curvature over all such two-dimensional subspaces identifies the set of
possible curvatures that parallel transport can induce i.e. each element in Holp (ω)
corresponds to a unique way of parallel transporting and the relation to curvature
quantifies the amount of twisting or distortion of the vector (Rµν being responsible
for both the curvature experienced and the Lie algebra of Holp (ω)).
Given that on Riemannian manifolds the affine connection is the Levi-Civita
connection, which is metric compatible and torsion free, the Riemann curvature tensor
has an expression in terms of the second covariant derivative:

R(X, Y) = ∇²_{X,Y} − ∇²_{Y,X}

indicating how the non-commutativity of the second covariant derivatives of vector


fields X and Y represents the intrinsic curvature of M. Given the relationship
between second derivatives and curvature, one can see how the Riemann curvature
tensor gives rise to a measure of curvature on manifolds. In coordinate notation, it
is denoted as:

R^α_{βγδ} = Γ^α_{βδ,γ} − Γ^α_{βγ,δ} + Γ^µ_{βδ} Γ^α_{µγ} − Γ^µ_{βγ} Γ^α_{µδ}.

To obtain measures of curvature, one can then perform contractions with the metric
tensor g to obtain the Ricci tensor, formed by a tensor contraction over the first and
third indices, i.e. R_{µν} = R^λ_{µλν}. Scalar curvature is then obtained via contraction
with the inverse metric tensor, R = g^{µν}R_{µν}, which is in effect a trace operation.
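
These coordinate expressions can likewise be verified symbolically. The following sketch (again for the illustrative case of the unit 2-sphere) computes the Riemann tensor from the Christoffel symbols and contracts it to the Ricci tensor and scalar curvature.

```python
import sympy as sp

# Unit 2-sphere metric in coordinates (theta, phi)
th, ph = sp.symbols('theta phi')
x, n = [th, ph], 2
g = sp.Matrix([[1, 0], [0, sp.sin(th) ** 2]])
gi = g.inv()

# Christoffel symbols Gamma[a][b][c] = Gamma^a_{bc}
Gamma = [[[sum(sp.Rational(1, 2) * gi[a, m]
               * (sp.diff(g[m, b], x[c]) + sp.diff(g[m, c], x[b]) - sp.diff(g[b, c], x[m]))
               for m in range(n)) for c in range(n)] for b in range(n)] for a in range(n)]

def riemann(a, b, c, d):
    # R^a_{bcd} = Gamma^a_{bd,c} - Gamma^a_{bc,d} + Gamma^m_{bd} Gamma^a_{mc} - Gamma^m_{bc} Gamma^a_{md}
    return sp.simplify(sp.diff(Gamma[a][b][d], x[c]) - sp.diff(Gamma[a][b][c], x[d])
                       + sum(Gamma[m][b][d] * Gamma[a][m][c]
                             - Gamma[m][b][c] * Gamma[a][m][d] for m in range(n)))

# Ricci: contraction over first and third indices; scalar: trace with the inverse metric
ricci = sp.Matrix(n, n, lambda b, d: sp.simplify(sum(riemann(a, b, a, d) for a in range(n))))
scalar = sp.simplify(sum(gi[b, d] * ricci[b, d] for b in range(n) for d in range(n)))
print(ricci)    # diag(1, sin^2(theta)), i.e. R_{mu nu} = g_{mu nu} on the unit sphere
print(scalar)   # 2 — the constant scalar curvature of S^2
```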

Given the definitions:

T(X, Y) = ∇X Y − ∇Y X − [X, Y]    (torsion tensor)
R(X, Y) = ∇X ∇Y − ∇Y ∇X − ∇[X,Y]    (curvature tensor),

with an appropriate one-form we can then define a torsion tensor field as the
argument of the one-form mapping ω : T∗M × TM × TM → R with (ω, X, Y) ↦
ω(T(X, Y)), where T(X, Y) ∈ T^1_2 M (i.e. a type-(1,2) tensor). We can similarly define the
curvature tensor field ω : T∗M × TM × TM × TM → R with (ω, X, Y, Z) ↦
ω(R(X, Y) · Z), where R(X, Y) · Z ∈ T^1_3 M. It can be shown that the relevant one-
forms in turn determine the structure of the covariant derivative on M, while it
can be shown (via Cartan’s structure equations) that the forms ω are themselves
described by the torsion and curvature tensor fields as per below (adapted from
Theorem 8.1 in [2], §8). The relation of curvature and torsion to differential forms
is shown explicitly in Cartan’s structural equations which we set out below.

Theorem C.2.2 (Cartan structural equations). Given a principal bundle connection


(definition C.1.27), Cartan’s structural equations are given by:
dω^i = −Σ_p ω^i_p ∧ ω^p + ½ Σ_{j,k} T^i_{jk} ω^j ∧ ω^k    (C.2.4)

dω^i_l = −Σ_p ω^i_p ∧ ω^p_l + ½ Σ_{j,k} R^i_{ljk} ω^j ∧ ω^k    (C.2.5)

Here, for completeness, dω^i represents the exterior derivative (how ω^i changes
in all directions); ω^i_p is a one-form of connection coefficients, encoding
information about non-linear behaviour; and the ∧ product captures the idea of an
oriented area spanned by the one-forms ω^i. Cartan’s structural equations provide an
important tool for using the local behaviour of a connection to understand global
properties of M, such as curvature and torsion.

C.2.1 Riemannian Manifolds and Metrics


We now come to identifying how to calculate lengths of curves on manifolds relevant
to our substantive Chapters. Firstly, we define a (pseudo)-Riemannian structure in
terms of a tensor field. In doing so we gain further insight into metrics in terms
of tensor fields of type (0,2). A Riemannian manifold is the tuple (M, g) (i.e. a
manifold M with a metric g) where to each p ∈ M is assigned a positive-definite
map gp : Tp M × Tp M → R (usually described as an inner product)
and an associated norm || · ||p : Tp M → R. We integrate these concepts in the
formalism above as follows.

Definition C.2.3 (Riemannian structure). Define a pseudo-Riemannian structure


on M as a type-(0,2) tensor field g such that (a) g(X, Y ) = g(Y, X) (symmetric)
and (b) for p ∈ M, g is a nondegenerate bilinear form gp : Tp M × Tp M → R.

If we assume gp is positive definite for all p ∈ M then we speak of Riemannian


structures and manifolds as follows.

Definition C.2.4 (Riemannian Manifold). A Riemannian manifold is a connected
C∞ manifold with a Riemannian structure, on which there exists a unique affine
connection satisfying the following:

(i) ∇X Y − ∇Y X = [X, Y ] (namely the torsion tensor T = 0); and

(ii) ∇Z g = 0 (parallel transport preserves the inner product on tangent spaces).

By applying ∇Z to the tensor field formed from X ⊗ Y ⊗ g, one can specify the


Riemann connection. We understand this as follows. An affine connection ∇ is
metric compatible with a metric g when:

X⟨Y, Z⟩ = ⟨∇X Y, Z⟩ + ⟨Y, ∇X Z⟩

which can be expressed using the metric g as:

Xg(Y, Z) = g(∇X Y , Z) + g(Y, ∇X Z).

Metric-compatible connections preserve angles, orthogonality and lengths of vec-


tors in Tp M when transported along γ(t). The fundamental theorem of Riemannian
geometry is an existence and uniqueness theorem which asserts the existence of
a torsion-free metric compatible connection, denoted the Levi-Civita connection,
which can be expressed via the Koszul formula as per below.

2g(X, ∇Z Y ) = Zg(X, Y ) + g(Z, [X, Y ]) + Y g(X, Z) + g(Y, [X, Z]) − Xg(Y, Z) − g(X, [Y, Z]).
(C.2.6)

An example of a connection where torsion is non-zero yet remains metric compatible


is the Weitzenböck connection. The metric g determines the local structure but not
global structure. We now define the Riemannian metric.

Definition C.2.5 (Riemannian metric). A Riemannian metric is an assignment to


each p ∈ M of a positive-definite inner product:

gp : Tp M × Tp M → R

with an induced norm:

|| · ||p : Tp M → R,    v ↦ √(gp(v, v)).

The Riemannian metric is a (0, 2)-form and thus a tensor, aligning with definition (C.1.16).
Recall that (0, 2)-forms at p ∈ M have a representation as the tensor product of two
cotangent vectors at p. Given a coordinate basis {dx¹, dx², . . . , dxⁿ} for T∗p M at p,
a (0, 2)-form g can be expressed in this basis as:

g = Σ_{i,j} g_ij dx^i ⊗ dx^j,

where the gij are the components of the form with respect to the basis, and they can
be functions on the manifold. We can then identify an induced norm on M using
this metric. For X ∈ Tp M , denote:

||X|| := (gp(X, X))^{1/2}.    (C.2.7)

We may now define the important concept of arc (curve) length, which is fundamental
to later chapters. Given a curve (segment) t ↦ γ(t) with t ∈ [0, T] and metric
gγ = g, define the arc length of γ as follows:

Definition C.2.6 (Arc length). Given a curve γ(t) ∈ M with t ∈ [0, T] and metric
g, the arc length of the curve from γ(0) to γ(T) is given by:

ℓ(γ) = ∫₀ᵀ (g(γ̇(t), γ̇(t)))^{1/2} dt.    (C.2.8)
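
As a simple worked instance of (C.2.8) (an illustrative numerical sketch, with the latitude circle on the unit sphere as an assumed example):

```python
import numpy as np

# Arc length of the latitude circle theta = theta0 on the unit 2-sphere:
# gamma(t) = (theta0, t), gamma'(t) = (0, 1), so g(gamma', gamma') = sin^2(theta0)
# and (C.2.8) gives l(gamma) = 2*pi*sin(theta0).
theta0 = np.pi / 3
t = np.linspace(0, 2 * np.pi, 10_000)
speed = np.full_like(t, np.sin(theta0))            # (g(gamma', gamma'))^{1/2}

length = np.sum(speed[:-1] * np.diff(t))           # Riemann-sum approximation of the integral
print(length, 2 * np.pi * np.sin(theta0))          # both ~ 5.441
```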

Given any normal neighbourhood N0 of 0 ∈ Tp M and defining Np = exp(N0),


then for any q ∈ Np , denote the unique geodesic in Np joining p to q as γpq . In
this case it can be shown that ℓ(γpq) ≤ ℓ(γ) for each curve segment γ joining p and q.
Under this assumption, we can impose a metric upon the Riemannian manifold M
as follows. Assuming M is connected, all p, q ∈ M can be joined via a
curve segment. The distance of p, q is then defined by the infimum of the shortest
curve measured according to the equation (C.2.8) above and is defined as:

d(p, q) = inf_γ ℓ(γ).    (C.2.9)

Here d(p, q) satisfies the usual axioms of metrics, namely (a) symmetry d(p, q) =
d(q, p), (b) triangle inequality d(p, q) ≤ d(p, r) + d(r, q) and (c) positive definite
d(p, q) = 0 if and only if p = q. It can then be shown that for such a Riemannian
manifold M with metric d given by equation (C.2.9) and a curve γpq joining
p, q ∈ M, if ℓ(γpq) = d(p, q) then γpq is a geodesic. Now consider an open
ball around p with radius r, Br (p) and associated sphere around p denoted Sr (p).
Assume the open ball Vr(0) = {X ∈ Tp M | 0 ≤ ||X|| < r} is a normal neighbourhood
of 0 ∈ Tp M. In this case it can be shown that Br(p) = exp(Vr(0)). For complete
Riemannian manifolds (where every Cauchy sequence in M converges), each pair of points
p, q ∈ M can be joined by a geodesic of length d(p, q).

The preservation of distances can then be understood in terms of total geodesicity


of manifolds. A sub-manifold S of a Riemannian manifold M is geodesic at p if each
geodesic tangent to S at p is also a curve in S. The submanifold S is totally geodesic
if it is geodesic for all p ∈ S. It can then be shown that if S is totally geodesic,
then parallel translation along γ ⊂ S always transports tangents to S to tangents
to S, that is τ : Tp S → Tq S for p, q ∈ S. Summarising the above points:

• Riemann Connection: This refers to the affine connection on a Riemannian


manifold M, representing differentiation of vector fields X(M) along curves γ
and underpinning calculations related to parallel transport.

• Riemann metric: This provides a notion of distance on M, providing a means


of measuring lengths of curves and angles between vectors, thereby defining
the manifold geometry.

• Riemann curvature tensor: This tensor R^α_{βγδ} quantifies curvature on M and


derives from the Riemann connection. Importantly, it is independent of any
coordinate embedding.

• Levi-Civita Connection: Given all connections on a Riemannian manifold with


metric (M, g), the Levi-Civita connection is the unique connection that is both
torsion-free and metric-compatible, i.e. preserving the Riemannian metric
under parallel transport (under which the covariant derivative of the metric
tensor vanishes).

C.2.2 Fundamental forms

In this section we briefly set out a few key theorems and definitions from standard
differential geometry relating to fundamental forms and curvature which are familiar
from differential geometry on Euclidean spaces. In particular we connect standard
results to the coordinate-free formalism adopted above.

(i) First fundamental form. The first fundamental form is an expression of the
metric tensor g on M and is denoted:

Ip (X, Y ) = gp (X, Y ) X, Y ∈ Tp M. (C.2.10)

Given a coordinate chart ϕ : R² → M, (u, v) ↦ p ∈ M, define as
usual ϕ_u = ∂_u ϕ (and similarly ϕ_v), with u′ = du/dt along a curve parametrised
by t. Then (recalling g(X, Y) = ⟨X, Y⟩):

E = g(ϕ_u, ϕ_u)    F = g(ϕ_u, ϕ_v)    G = g(ϕ_v, ϕ_v)    (C.2.11)

(g_ij) = ( E  F
           F  G )    (C.2.12)

I_p = E(u′)² + 2F u′v′ + G(v′)² = g_ij (dx^i/dt)(dx^j/dt) = (ds/dt)²    (C.2.13)

with equivalents for higher-dimensional manifolds. The first fundamental form


thus represents a quadratic form associated with the metric tensor [49].

(ii) Second fundamental form. The second fundamental form describes how sur-
faces curve in ambient space, i.e. by reference to a normal to the surface. In general,
define a level set function f : M → R as the set of points p ∈ M such that
f (p) = c ∈ R:

Sc = {p ∈ M|f (p) = c}. (C.2.14)

Sc ⊂ M will be a submanifold if ∇f ≠ 0, ∀p ∈ Sc. Recall that ∇f points in the
direction of the steepest increase in f. For ∇f(p) ≠ 0, f is non-constant except
along directions tangent to Sc . Thus Sc resembles a hypersurface in M (where
dim Sc = dim M − 1), which is a generalisation of M being, for example, R3
with Sc = R2 and ∇f being the normal vector ‘pointing out’ of the surface.
Thus we can denote a generalised unit normal vector via N = ∇f /||∇f ||. The
second fundamental form is then defined as:

IIp (X, Y ) = g(∇X N, Y ) = ⟨S(X), Y ⟩

for X, Y ∈ Tp M. Here S(X) = ∇X N , the shape operator which encodes


information about a manifold’s (or submanifold’s) curvature by describing how
the normal vector field N changes along the vector field X for X ∈ Tp M (i.e.
as one moves tangent to the surface). We then have in the coordinate frame:

e = −⟨Nu, ϕu⟩    f = −⟨Nu, ϕv⟩ = −⟨Nv, ϕu⟩    g = −⟨Nv, ϕv⟩

IIp = e(u′)² + 2f u′v′ + g(v′)²

(iii) Gauss and Codazzi equations. The Gauss equation provides that the Gaussian
curvature is a measure of intrinsic curvature of the manifold. In a coordinate
frame, take the first and second fundamental forms in a coordinate basis with
components ϕ_u, ϕ_v, and let the principal curvatures k₁, k₂ be the eigenvalues
of the shape operator (dN = diag(k₁, k₂) in a principal basis). Gaussian
curvature K is then (see also the numerical sketch following this list):

K = det(h_ij)/det(g_ij) = (eg − f²)/(EG − F²) = k₁k₂    (C.2.15)

H = ½(k₁ + k₂) = (eG − 2fF + gE)/(2(EG − F²))    (C.2.16)

where H is the mean curvature (the average of the principal curvatures k₁, k₂).
Gaussian curvature is intrinsic to the manifold, while mean curvature is extrinsic,
depending on how the manifold is embedded in ambient space. In coordinate-free
form, the Gauss
equation can be expressed as:

[∇X , ∇Y ]Z = −K det(X, Y )JZ (C.2.17)

where det(X, Y) is the area form of the Riemannian manifold and J is an
endomorphism J : T M → T M such that ⟨JX, Y⟩ = det(X, Y); J is denoted an
almost complex structure, J² = −I (and so intuitively acts to rotate by π/2,
for example taking X to a vector proportional to Y). These results provide
that the Gaussian curvature K is invariant under local isometry, a statement
of Gauss’s celebrated Theorema Egregium (see [49]). Additionally, the Codazzi
equation provides that the covariant derivative of IIp is symmetric, that is:

[∇X , ∇Y ]N = (∇X ∇Y − ∇Y ∇X )N = ∇[X,Y ] N (C.2.18)

which, when satisfied, ensures that the local shape of the surface given by
normal curvature ∇X N is consistent with the global properties of the manifold
(thereby providing an integrability condition).
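
The following symbolic sketch (the unit-sphere parametrisation is an illustrative assumption) computes the first and second fundamental forms of a parametrised surface and recovers K = k₁k₂ = 1 via equation (C.2.15).

```python
import sympy as sp

u, v = sp.symbols('u v')
# Unit sphere parametrisation phi(u, v); its outward unit normal is the position vector itself
phi = sp.Matrix([sp.sin(u) * sp.cos(v), sp.sin(u) * sp.sin(v), sp.cos(u)])
N = phi

phi_u, phi_v = phi.diff(u), phi.diff(v)
E, F, G = phi_u.dot(phi_u), phi_u.dot(phi_v), phi_v.dot(phi_v)   # first fundamental form

# Second fundamental form via e = -<N_u, phi_u> etc.
e = -N.diff(u).dot(phi_u)
f = -N.diff(u).dot(phi_v)
g2 = -N.diff(v).dot(phi_v)

K = sp.simplify((e * g2 - f ** 2) / (E * G - F ** 2))    # Gaussian curvature, (C.2.15)
print(sp.simplify(E), sp.simplify(F), sp.simplify(G))    # 1, 0, sin^2(u)
print(K)                                                 # 1 — constant curvature of the sphere
```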

C.2.3 Curvature and forms

C.2.3.1 Overview

Curvature is a concept of fundamental importance in geometric methods across


quantum information, algebra, machine learning and control. Curvature also plays
a profound role in the classification of symmetric spaces of interest in quantum in-
formation, where in many cases computations may be expressed in terms of the
evolution of curves over unitary Lie group manifolds generated by Hamiltonians
drawn from corresponding Lie algebras. In this section, we contextualise curvature
in the language of differential forms, beginning with the concepts of sectional cur-
vature and revisiting curvature tensor fields on the way to defining the celebrated
Riemann curvature tensor.

C.2.3.2 Sectional curvature

We define sectional curvature on the way to defining the formalism of Riemannian


symmetric spaces, the subject in particular of our final chapter. Sectional curvature
K(σp ) provides a means of classifying curvature on Riemannian manifolds via a two-
dimensional sub-bundle σp ⊂ Tp M. Geometrically it corresponds to the Gaussian
curvature with plane σp constructed from tangents to geodesics γ in the direction
of σp . Recall the definition of the section of a fibre bundle.

Definition C.2.7 (Section of a Fibre Bundle). Let π : E → M be a fibre bundle


with total space E, base space M, and fibre F . A section of this fibre bundle is a
continuous map s : M → E such that π ◦ s = idM , the identity map on M. In other
words, for each point p ∈ M, s(p) is a point in the fibre over p, i.e.,

π(s(p)) = p, ∀p ∈ M. (C.2.19)

This condition ensures that the section lifts each point in the base manifold back
to a specific point in the total space, intersecting each fibre exactly once. For a
vector bundle section then the section is a map assigning to each p ∈ M a vector
in the corresponding fibre. We now define the sectional curvature via a two-
dimensional subspace of each tangent space in order to calculate curvature. The intuitive
idea is that sectional curvature allows one to observe how a plane embedded in the
manifold bends or curves as it moves through the space.

Definition C.2.8 (Sectional curvature). For a Riemannian manifold M and point
p ∈ M with a two-dimensional subspace σp ⊂ Tp M and X, Y ∈ σp, define the
sectional curvature K : Tp M × Tp M → R, (X, Y) ↦ K(X, Y) as:

K(X, Y) = g(R(X, Y)Y, X) / (g(X, X)g(Y, Y) − g(X, Y)²)
        = ⟨R(X, Y)Y, X⟩ / (⟨X, X⟩⟨Y, Y⟩ − ⟨X, Y⟩²)

where R is the Riemann curvature tensor and g the Riemannian metric tensor (for
orthonormal X, Y, with g(X, Y) = 0, the denominator reduces to 1).
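
As a brief worked example (the unit 2-sphere with round metric g = diag(1, sin²θ) is an illustrative case): taking X = ∂θ and Y = ∂φ, one finds g(R(X, Y)Y, X) = sin²θ while g(X, X)g(Y, Y) − g(X, Y)² = 1 · sin²θ − 0 = sin²θ, so that K(X, Y) = 1 at every point, recovering the constant sectional (and Gaussian) curvature of the unit sphere.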

Sectional curvature K thus measures how the Riemann curvature tensor, when
acting on X, Y, relates to the area spanned by X and Y. This spanning of an area
is also described by a “2-Grassmannian bundle”, which intuitively speaking is the set
of all two-dimensional subspaces σp ⊂ Tp M. A Riemannian manifold then has constant
curvature c if K(σp) = c for all such planes. Indeed it can
be shown (see [2], Prop. 12.3) that sectional curvature determines the Riemannian
curvature, such that if Kp = Kq for two tangent spaces Tp M, Tq M, then Rp = Rq. For a
two-dimensional manifold, the sectional curvature of the plane spanned by (X, Y )

is the Gaussian curvature at that point. Given that Gaussian curvature is restricted to
two dimensions, sectional curvature in one sense allows a generalisation of Gaussian
curvature measurement to higher-dimensional manifolds.

C.3 Symmetric spaces

C.3.1 Overview
Symmetric spaces were originally defined as Riemannian manifolds whose curvature
tensor is invariant under all parallel translation. Cartan’s work enabled the classification of
all symmetric spaces in terms of classical and exceptional semi-simple Lie groups.
Cartan’s classification of symmetric spaces (see Chapters IV and IX of [2]) is of
seminal importance. Locally they are manifolds of the form Rⁿ × G/K, where G is a
semi-simple Lie group with an involutive automorphism whose fixed point set is the
compact group K, while G/K, as a homogeneous space, is equipped with a G-invariant
structure.
As Helgason [2] notes, Cartan adopted two methods for the classification problem.
The first, based on holonomy groups, provides that for p ∈ M Riemannian, the
holonomy group (definition C.1.32) of M is the group of all linear transformations
of the tangent space Tp M corresponding to parallel transportation along closed
curves γ ∈ M. Parallel translation is equated with the action of the holonomy
group and leaves the Riemannian metric invariant, leading to a method of classification.
The more direct second method involves demonstrating that the invariance of the
curvature tensor under parallel transport is equivalent to geodesic symmetry being a
local isometry for all p ∈ M through which the geodesic curve passes. Such spaces,
equipped with geodesic symmetry and global isometry, carry a transitive group of
isometries G such that the space is represented as a coset space G/K, with an
involutive automorphism whose fixed point set is the isotropy group K. The
study of symmetric spaces then becomes a question of studying specific involutive
automorphisms of semi-simple Lie algebras, thus connecting to the classification of
semi-simple Lie groups.

Geodesic symmetry is defined in terms of diffeomorphisms φ of M which fix p ∈ M
and reverse geodesics through that point (i.e. φ maps q = γ(t = 1) to q′ = γ(t = −1)
for γ(0) = p), such that dφp : Tp M → Tp M, X ↦ dφ(X) = −X (i.e. dφp ≡ −I,
where I denotes the identity on Tp M). Here dφ can be thought of as the effect of φ
on the tangent space, e.g. in Lie-algebraic terms φ(γ(t)) = φ(exp(X)) = exp(dφ(X)) =
exp(−X) for X ∈ g. This means the manifold M is locally symmetric around p ∈ M.
That is, the manifold M is locally (Riemannian)

symmetric if each geodesic symmetry is an isometry, i.e. if there is at least one
geodesic symmetry about p which is an isometry (and globally so if this applies for
all p ∈ M). As noted in the literature, this is equivalent to the vanishing of the co-
variant derivative of the curvature tensor along the geodesic. A globally symmetric
space is one where geodesic symmetries are isometries for all p ∈ M. An affine
locally symmetric manifold is one for which, given ∇, the torsion tensor T and
curvature tensor R, we have T = 0 and ∇Z R = 0 for all Z ∈ T M.

Definition C.3.1 (Riemannian local symmetric space). A Riemannian manifold
M is locally symmetric if for all p ∈ M there exists a normal neighbourhood Np on
which the geodesic symmetry about p is an isometry. In other words, a Riemannian locally symmetric
space is a manifold M where the Riemann curvature tensor is invariant under all
parallel translations:

∇X R = 0.

It can be shown that this is equivalent to the sectional curvature being invari-
ant under parallel translations. In the formalism of Helgason, given a Riemannian
manifold M with Riemannian structure Q, an analytic Riemannian manifold is one
where both M and Q are analytic. From this we obtain the global form i.e. a
Riemannian globally symmetric space if all p ∈ M are fixed points of an involutive
symmetry θ such that θ2 = I. In this case θ is also a geodesic symmetry in that
θ(p) induces γ̇p → −γ̇p i.e. a reflection symmetry along the geodesic. As Helgason,
following Cartan, notes, in such cases certain facts follow from being a Riemannian
globally symmetric space G/K (in which case (G, K) is sometimes denoted a Rie-
mannian symmetric pair), with a Cartan decomposition as discussed above and an
association of Hp M ∼ p and Vp M ∼ k. It can then be shown that the Riemann
curvature tensor for the (homogeneous) symmetric space G/K with a Riemannian
metric is given by:

Rp (X, Y )Z = −[[X, Y ], Z] X, Y, Z ∈ p. (C.3.1)

The curvature tensor is independent of the group action of h ∈ G, that is, the
curvature tensor remains invariant. In geometric terms, this is reflected by the fact
that the pullback of the metric tensor by h is the original metric tensor itself. This
G-invariance property of symmetric spaces G/K means that parallel transport of
vectors is preserved under the group action.
Curvature plays an important role in the classification of symmetric spaces.
Three classes of symmetric space can be classified according to their sectional cur-
vature as follows. Given a Lie algebra g equipped with an involutive automorphism

θ2 = I, with corresponding group G with G/K as above, we have a G-invariant


structure on the Riemannian metric, i.e. g(X, Y) = g(hX, hY) for h ∈ G. Then
the three types of symmetric space are:

(i) G/K compact, then K(X, Y ) > 0;

(ii) G/K non-compact, then K(X, Y ) < 0; and

(iii) G/K Euclidean, then K(X, Y ) = 0.

The non-compactness or compactness can be assessed via the rank of M = G/K,
being the maximum dimension of a subspace of T M on which the sectional curvature
vanishes (K = 0).

C.3.2 Classification of symmetric spaces

The classification of Riemannian symmetric spaces by Cartan (see Knapp [9] and
Helgason [2] for detailed discussion, especially Helgason Chapter IX) in terms of simple
Lie groups is set out below in Table C.1. Type I spaces are non-compact and are
associated with non-compact simple Lie groups, while Type II spaces are their com-
pact counterparts. Type III spaces are distinct in that they are Riemannian duals of
themselves. Type IV spaces involve complex structures and are related to complex-
ifications of simple Lie algebras. The families of simple Lie groups are denoted by
An , Bn , Cn , and Dn , where An corresponds to the special linear group SL(n + 1, C),
Bn to the special orthogonal group SO(2n + 1), Cn to the symplectic group Sp(n),
and Dn to the special orthogonal group SO(2n). Additionally, the exceptional Lie
groups are denoted as E6 , E7 , E8 , F4 , and G2 , each signifying a unique algebraic
structure that induces specific geometric properties on the associated symmetric
spaces.

Table C.1: Classification of Riemannian Globally Symmetric Spaces (adapted from Helgason [2]
as compiled in Wikipedia)

Type      Lie Algebra Class   Symmetric Space                  Alternative Description
I (II)    AI(n)               SU(n)/SO(n)                      -
I (II)    AII(2n)             SU(2n)/Sp(2n)                    SU(2n)*/Sp(2n)
I (II)    AIII(n, r)          U(n)/[U(r) × U(n−r)]             U(r, n−r)/[U(r) × U(n−r)]
I (II)    BDI(n, r)           SO(n)/[SO(r) × SO(n−r)]          SO(r, n−r)/[SO(r) × SO(n−r)]
I (II)    DIII(n)             SO(2n)/U(n)                      SO(2n)*/U(n)
I (II)    CI(n)               Sp(2n)/U(n)                      Sp(2n, R)/U(n)
I (II)    CII(n, r)           Sp(2n)/[Sp(2r) × Sp(2n−2r)]      Sp(2r, 2n−2r)/[Sp(2r) × Sp(2n−2r)]
III (IV)  A(n)                SU(n) × SU(n)/SU(n)              -
III (IV)  BD(n)               SO(n) × SO(n)/SO(n)              SO(n, C)/SO(n)
III (IV)  C(n)                Sp(2n) × Sp(2n)/Sp(2n)           -

C.4 SubRiemannian geometry


We now discuss subRiemannian geometry which is fundamental to Chapters 4 and
5 of this work where the problem at hand is to develop techniques for time optimal
synthesis in relation to subRiemannian manifolds. Most of the material is drawn
from Montgomery’s seminal work on subRiemannian geometry [50] together with
other standard source material on the topic. In brief, subRiemannian geometry involves
a manifold M together with a distribution upon which an inner product is defined.
Distribution in this context refers to a linear sub-bundle of the tangent bundle of
M and corresponds to the horizontal subspace of T M discussed above, where
the vertical subspace is non-null. SubRiemannian manifolds are characterised by
the distribution being a proper sub-bundle or, in the language of Lie algebras, by
having, for a decomposition g = k ⊕ p, an accessible (or control) subspace
p ⊂ g rather than p = g. This means that the generators or vector fields X(M) are
constrained to limited directions. A transformation not in the horizontal distribution
may still be reachable (as we discuss for certain subspaces where [p, p] ⊆ k), but
the geodesic paths connecting the start and end points will be longer than for a
Riemannian geometry on M. Formally, we define a subRiemannian manifold as
follows.
follows.

Definition C.4.1 (SubRiemannian manifold and distributions). A subRiemannian


manifold (geometry) on M consists of a distribution ∆, being a vector sub-bundle
Hp M ⊂ T M together with a fibre inner product on Hp M. The sub-bundle cor-
responds to the horizontal distribution, having the meaning ascribed to horizontal
subspace Hp M above.

A curve on M is a horizontal curve if it is tangent to Hp M. Similar to Rie-


mannian manifolds, subRiemannian length ℓ = ℓ(γ) (for γ smooth and horizontal)
is then defined similarly as:

ℓ(γ) = ∫ ||γ̇(t)|| dt        (C.4.1)

where ||γ̇(t)|| = √⟨γ̇(t), γ̇(t)⟩ is computed over the horizontal subspace HM. Sub-
Riemannian distance dS is defined as the infimum of all such lengths, namely
dS (A, B) = inf ℓ(γ) of all curves connecting A, B ∈ M. To guarantee that a hor-
izontal curve has a minimal or geodesic distance, we require that the curve γ be
absolutely continuous which is to say that its derivative γ̇(t) ∈ Hp M for all t. In
this case we can define a subRiemannian minimising geodesic as that absolutely
continuous horizontal path that realises the distance between two points in M.

Sometimes we define the energy of a horizontal curve as:

E(γ) = (1/2) ∫_γ ||γ̇(t)||² dt        (C.4.2)

as it can be more convenient to minimise this quantity rather than ℓ(γ). A hori-
zontal curve γ that minimises E among all curves also minimises length and can
be parametrised by constant speed (analogous to parametrisation by constant arc
length for minimal geodesics) via d(q0 , q1 )/T for γ(0) = q0 , γ(T ) = q1 ∈ M.

C.4.1 SubRiemannian Geodesics


As Montgomery notes, while a Riemannian metric is defined by a covariant (0, 2)-
tensor over all of T M, in subRiemannian geometry the metric is defined only on a
subset of the tangent space (the horizontal distribution). Instead, it is usual for a
subRiemannian metric to be encoded as a contravariant symmetric two-tensor, i.e.
a section of a bundle over M. We denote this
the cometric as follows:

Definition C.4.2 (SubRiemannian (co)metric). The subRiemannian metric, or co-


metric, is a section of the bundle T M ⊗ T M of symmetric bilinear forms on the
cotangent bundle T ∗ M. This gives rise to a fibre-bilinear form:

⟨·, ·⟩ : T ∗ M ⊗ T ∗ M → R.

The form defines a symmetric bundle map β : T ∗ M → T M with α(βp (µ)) =
((α, µ))p where α, µ ∈ Tp∗ M and p ∈ M. Symmetric here refers to the fact that β
equals its adjoint β ∗ . The cometric then satisfies the condition that Im(βp ) = Hp M
and induces a subRiemannian inner product via α(v) = ⟨βp (α), v⟩p for v ∈ Hp M,
α ∈ Tp∗ M. Using the subRiemannian metric we may specify a system of Hamilton-Jacobi
equations on T ∗ M, the solutions to which are subRiemannian geodesics.
This formalism can be used to define a subRiemannian Hamiltonian given by
H(p, α) = (1/2)(α, α)p where α ∈ T ∗ M and (·, ·)p is the cometric. As Montgomery
notes, β uniquely determines the subRiemannian structure. The formalism also has
its own form of Hamiltonian differential equations (see [50] Appendix A) given by
ẋi = ∂H/∂pi , ṗi = −∂H/∂xi , denoted the normal geodesic equations. Here xi are
the coordinates and pi are momenta functions for coordinate vector fields.
In this sense, Riemannian geometry is a special case of subRiemannian geometry
where the distribution ∆ is the entire tangent bundle (in Lie-algebraic terms, p = g)
with cometric g µν . As with the Riemannian case, there are equivalent existence
theorems regarding minimal geodesics.

Theorem C.4.3 (Normal subRiemannian geodesics). Given a solution (γ(t), p(t)),
with γ(t) ∈ M, to the normal geodesic equations on T ∗ M, every sufficiently short
segment of γ(t) is a minimising subRiemannian geodesic, and γ is the unique
minimising geodesic between its endpoints γ(0) = q0 , γ(T ) = q1 .

Here γ(t) is a projected curve, i.e. a curve constituted via projection onto the
subRiemannian manifold. As noted in the literature, there are subRiemannian ge-
ometries for which minimal geodesics exist that are not solutions to subRiemannian
Hamiltonian equations, so-called ‘singular’ geodesics. The existence of subRieman-
nian geodesics and importantly the ability for curves generated by distributions ∆ to
effectively cover the entirety of M (thus, in a quantum information context, making
any U (t) in principle reachable via only the subset p), relies upon bracket-generating
properties of ∆ and Chow’s theorem.

Definition C.4.4 (Bracket generating (distribution)). A distribution ∆ (as a col-
lection of vector fields {Xi }) is called bracket generating if repeated application of
the Lie bracket among the Xi spans the entire tangent bundle. Equivalently, a distribution
(in our case defined by a control subset of the Lie algebra) ∆ = h ⊂ g for a Lie alge-
bra g is bracket generating if repeated application of the Lie bracket (commutator)
(definition B.2.6) obtains the entirety of g.

By a theorem of Chow and Raschevskii (see [50]), it can be shown that if M


is connected with a bracket generating distribution and complete (with respect to
the subRiemannian metric), then there exists a minimising geodesic connecting any
two points of M. For a decomposition g = k ⊕ p, we are then interested in bracket-
generating control subsets p with [p, p] ⊆ k, so that we have access to the
entirety of g (recall p is our horizontal subspace (distribution) Hp M).
A subRiemannian structure on M is, as noted above, given by the sub-bundle
distribution ∆ ⊆ HM, with canonical projection π∆ : ∆ → M whose fibres at
p ∈ M are given by ∆p = π∆⁻¹(p) ⊆ Hp M.
Assume dim ∆p = m, independent of the point p ∈ M. Such a setup
corresponds, in traditional control contexts, to the existence of m smooth vector
fields X(M) = {X1 , ..., Xm } spanning the same distribution ∆p at each point
p ∈ M. We assume that X(M) is bracket generating, i.e. it is a (minimal) set of
generators whose brackets recover the entire g; under this assumption we have,
for all p ∈ M, that the bracket closure of ∆p spans Tp M. The Riemannian metric restricted to
∆p ⊂ Tp M provides a smooth, positive definite inner product for generators in ∆p
(also, for convenience, we assume the usual orthonormality of vectors and covectors).
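As a concrete, hedged illustration of the bracket-generating condition (definition C.4.4) and Chow’s theorem, the following Python sketch (an illustrative example, not from the thesis) computes the bracket closure of the control subset p = span{iσx, iσy} ⊂ su(2); a single commutator recovers the missing direction iσz, so the distribution is bracket generating:

import numpy as np

def comm(A, B):
    return A @ B - B @ A

def lie_closure(gens, max_depth=5, tol=1e-10):
    # Iteratively append commutators until the spanned dimension stabilises.
    basis = [g.flatten() for g in gens]
    elems = list(gens)
    for depth in range(1, max_depth + 1):
        new = [comm(a, b) for a in elems for b in elems]
        candidate = basis + [n.flatten() for n in new]
        if (np.linalg.matrix_rank(np.array(candidate), tol=tol)
                == np.linalg.matrix_rank(np.array(basis), tol=tol)):
            return np.linalg.matrix_rank(np.array(basis), tol=tol), depth - 1
        basis, elems = candidate, elems + new
    return np.linalg.matrix_rank(np.array(basis), tol=tol), max_depth

sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]], dtype=complex)
dim, depth = lie_closure([1j * sx, 1j * sy])
print(dim, depth)   # 3, 1: the closure spans all of su(2) after one bracket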

C.5 Geometric control theory

C.5.1 Overview
The primary problem we are concerned with in our final two chapters is solving
time optimal control problems for certain classes of Riemannian symmetric space.
In our case, time optimal control is equivalent to finding the time-minimising subRie-
mannian geodesics on a manifold M corresponding to the homogeneous symmetric
space G/K. Our particular focus is the KP problem, where G admits a Cartan
KAK decomposition where g = k ⊕ p, with the control subset (Hamiltonian) com-
prised of generators in p. In particular such spaces exhibit the Lie triple property
[[p, p], p] ⊆ p given [p, p] ⊆ k. Here, all of G remains in principle reachable, but
minimal time paths constitute subRiemannian geodesics. Such methods rely upon
symmetry reduction [209, 210]. As D’Alessandro notes [15], the primary problem in
quantum control involving Lie groups and their Lie algebras is whether the set of
reachable states R (defined below) for a system is the connected Lie group G gener-
ated by g = span{−iH(u(t))} for H ∈ g (or some subalgebra h ⊂ g) and u ∈ U (our
control set, see below). This is manifest in the requirement that R = exp(g).
In control theory g is designated the dynamical Lie algebra and is generated by the
Lie bracket (derivative) operation among the generators in H. The adjective
‘dynamical’ refers to the time-varying control functions u(t).

C.5.2 Geometric control preliminaries


Recall from section A.2 that we define a control system (definition A.2.3) as time-
dependent vector fields defined over a differentiable manifold M parametrised by
control functions u(t) which give rise to a set of solutions to the differential state
equation (A.2.6) in the form of admissible control trajectories γ(t). Optimal control
aims to minimise a functional (an objective) over such control trajectories subject
to constraints. In control theory [147] (§2 and §4) we define a control system as
follows.

Definition C.5.1 (Control system). A control system is a 4-tuple Σ = (M, U, f, U)


where M is the state space represented as a differentiable manifold, U ⊂ Rm the set
of controls for uk (t) ∈ U , f are the dynamics and U is the set of admissible controls.

In the general case, M is a C r manifold (see section C.1.1) and U are Lebesgue
integrable bounded functions in U (Lebesgue measurability is required as restricting
to piecewise continuous controls does not guarantee optimality). The dynamics
γ̇(t) = f (t, γ, u) (i.e. equation (A.2.6)) are defined such that f : R × M × U →
T M. The solution γ(t) over the interval t ∈ I on which equation (A.2.6) has a unique solution is denoted
the admissible control trajectory, which we discuss in the following subsections.
Returning to horizontal curves above, we assume that such curves γ : R ⊃
[0, T ] → M satisfy certain usual Lipschitz continuity requirements (almost every-
where differentiable as per above). We assume that ||γ̇(t)|| is essentially bounded as
follows.
Definition C.5.2 (Essentially bounded). A tangent γ̇(t) is essentially bounded
where there exists a constant N and mapping H : [0, T ] → T M, t 7→ H(t) = γ̇(t)
such that ⟨H(t), H(t)⟩S ≤ N for all t ∈ [0, T ].
This condition ensures that H(t) = γ̇(t) almost everywhere along the curve
(which we assume to be regular γ̇ ̸= 0). As discussed above, horizontal curves are
then those whose tangents (generators) are in the horizontal distribution for such
a curve, that is where γ̇(t) ∈ ∆γ(t) . Recalling our vector fields Xj are differential
operators on γ(t), we can then write the curve in terms of control functions and
generators.
Definition C.5.3 (Horizontal control curves). Given γ(t) ∈ M with γ̇(t) ∈ ∆γ ⊆
Hp M, we can define horizontal control curves as:
γ̇(t) = Σ_{j=1}^{m} uj (t)Xj (γ(t))        (C.5.1)

where uj are the control functions given by:

uj (t) = ⟨Xj (γ(t)), γ̇(t)⟩ . (C.5.2)

The boundedness of γ̇(t) sets an overall bound on uj (t). Recall that the length
of a horizontal curve is given by:
v
T T T u m 2
Z Z Z uX
p
ℓ(γ) = ||γ̇(t)||dt = ⟨γ̇(t), γ̇(t)⟩dt = t uj (t)dt. (C.5.3)
0 0 0 j

When [0, T ] is normalised this is equivalent to parametrisation by arclength. As
noted in [56], a consequence of the Chow-Raschevskii theorem is that M being connected
and X(M) being bracket generating renders (M, dS ) a subRiemannian metric space,
where dS is the subRiemannian distance function.
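A brief numerical sketch (synthetic control pulses and orthonormal generators assumed; not a thesis example) of equation (C.5.3), computing the length, and for comparison the energy (C.4.2), of a horizontal curve directly from its control functions via trapezoidal quadrature:

import numpy as np

T, n = 1.0, 1001
t = np.linspace(0.0, T, n)
u = np.vstack([np.cos(2 * np.pi * t),          # u_1(t), illustrative pulse
               0.5 * np.sin(2 * np.pi * t)])   # u_2(t)

speed = np.sqrt(np.sum(u ** 2, axis=0))        # ||gamma_dot(t)|| as in (C.5.3)
dt = np.diff(t)
length = np.sum(0.5 * (speed[1:] + speed[:-1]) * dt)               # l(gamma)
energy = np.sum(0.25 * (speed[1:] ** 2 + speed[:-1] ** 2) * dt)    # E(gamma)
print(length, energy)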

C.5.3 Time optimal control


For subRiemannian and Riemannian manifolds, the problem addressed in our final
two chapters in particular is, for a given target unitary UT ∈ G, how to identify the
minimal time T and the Hamiltonian that generates a minimal geodesic curve such that
γ(0) = U0 and γ(T ) = UT for γ : [0, T ] → M. This is described [56] as the minimum
time optimal control problem [23]. In principle, the problem is based upon the two
equivalent statements (a) γ is a minimal subRiemannian (normal) geodesic between
U0 and UT parametrised by constant speed (arc length); and (b) γ is a minimum
time trajectory subject to ||⃗u|| ≤ L almost everywhere (where u stands in for the
set of controls via control vector ⃗u). SubRiemannian minimising geodesics starting
from q0 to q1 (our UT ) subject to bounded speed L describe optimal synthesis on
M. There are two loci of geodesics related to their optimality.

(i) Critical locus (CR(M)) being the set of points where minimising geodesics are
not optimal. This is defined such that for p ∈ CR(M), the horizontal curve
γ is a minimising geodesic for all points t ∈ [0, T ] but not for infinitesimally
extended interval [0, T + ϵ], the set of such points being critical points; and

(ii) Cut locus (CL(M)) being the set of points p ∈ M reached by more than one
minimising geodesic (γi , γj ) over t ∈ [0, T ], where at least one is optimal (such
points p ∈ M then being cut points).

When the minimising γ are analytic functions of t ∈ [0, ∞), then CL(M) ⊆ CR(M),
i.e. cut points are critical points. In terms of optimal control, the critical locus indicates
points at which a geodesic ceases to be time optimal beyond a marginal extension
of the parameter interval as per above. Conversely, the cut locus represents points
where multiple minimal geodesics intersect, delineating the farthest extents of unique
geodesic paths within the reachable set, thereby affecting optimality by introducing
alternative minimal paths. An important concept with a geometric framing and one
drawn from control theory is that of a reachable set.

Definition C.5.4 (Reachable set). The set of all points p ∈ M such that, for
γ(t) : [0, T ] → M there exists a bounded function ⃗u, ||⃗u|| ≤ L where γ(0) = q0 and
γ(T ) = p is called the reachable set and is denoted R(T ).

Note for reachable sets under usual assumptions we have T1 ≤ T2 implies R(T1 ) ⊆
R(T2 ). For optimal trajectories, p = γ(T ) belongs on the boundary of R(T ). In
general, targets live in the space of equivalence classes of orbits [p]. For our particular
case, we assume that G = M. For convenience and notational efficiency we denote
by g ∈ G the relevant group diffeomorphisms of M. We also make reasonable
assumptions about the existence of a minimal orbit type and that for points on
the same orbit q2 = gq1 , then one can transition between minimising geodesics
parametrised by constant speed L via the group action γ1 = gγ2 , which effectively
pushes γ around the orbit while satisfying time optimality.

C.5.4 Variational methods


Of fundamental importance to optimal control solutions (in geometric and other
cases) is the application of variational calculus in terms of Euler-Lagrange, Hamil-
tonian and control optimisation formalism represented by the Pontryagin Maximum
Principle below. In the working below, we follow Jurdjevic’s geometric framing of the
Pontryagin maximum principle (PMP) (see [23], Chapter 11). The maximum princi-
ple represents an extension of techniques developed by Weierstrass and others, with a
particular focus on optimality conditions and the distinction between ‘weak’ (Euler-
Lagrangian) and ‘strong’ (PMP) minima. The PMP is a cornerstone of optimal
control theory. It sets out necessary conditions for optimal control
of curves γ(t) for dynamical systems defined on a differentiable manifold M ∼ Rn .
The principle assumes the existence of functions (f1 (γ(t), u(t)), . . . , fm (γ(t), u(t)))
on M with u(t) control variable u ∈ U ⊂ Rm . Here U is the control set which is
parameterized alongside γ as u = u(t), γ = γ(t) for t ∈ I = [t0 , t1 ] (under usual
assumptions of boundedness and measurability of t). The evolution of γ(t) in M is
driven by the interaction of the state and control according to the differential state
equation:

γ̇i (t) = fi (γ(t), u(t)) (C.5.4)

for almost all t ∈ I. Solution curves γ(t) are typically unique under these conditions.
The PMP approach employs a variational equation to analyze how small variations
in the initial conditions affect the system’s evolution. This principle is crucial for de-
termining optimal trajectories that satisfy both the dynamical constraints imposed
by the system and the objective of minimizing a specific cost function.

C.5.4.1 Pontryagin Maximum Principle

Consider as our differentiable manifold M ∼ Rn on which we define C ∞ (M) func-


tions (f1 (γ, u), ..., fm (γ, u)) where our coordinate charts ϕ ∈ S ⊂ Rn parametrise
curves γ(t) ∈ M. We have as our control variable u ∈ U ⊂ Rm where U is our
control set. Both are parametrised as u = u(t), γ = γ(t) for t ∈ I = [t0 , t1 ] (as-
suming boundedness and measurability). The evolution of γ ∈ M is determined by
the state and the control, according to the differential equation. We have the state
equation:

γ̇i (t) = fi (γ(t), u(t)) (C.5.5)

for almost all t ∈ I. Solution curves γ(t) can in typical circumstances be regarded as
unique. The classical PMP approach then posits a variational equation for
the integral curve γ(t), which expresses how small variations in the initial conditions
of a dynamical system affect its evolution. For a solution curve γ(t), the matrix:

Aij (t) = ∂fi /∂γj (γ(t), u(t))        (C.5.6)

expresses how the vector field (whose components are fi ) changes around γ(t). The
variational system along the curve γ(t), u(t) is then:
dvi /dt = Σ_{j=1}^{n} Aij (t)vj (t)        (C.5.7)

a set of differential equations describing the evolution of a small perturbation (i.e. a


variation) of γ(t) by v(t). The solution to v(t) describes how sensitive the system is
to such perturbations over time and thus how stable it is. Related to the variational
system is the adjoint system, a set of differential equations for optimality with respect
to p(t):

dpi /dt = −Σ_{j=1}^{n} pj Aji (t)        (C.5.8)

where pi are costate variables (see below, essentially akin to Lagrange multiplier
terms). Solutions to equation (C.5.8) allow construction of the time optimal Hamil-
tonian. Solutions γ(t), v(t) satisfy:
Σ_{i=1}^{n} pi (t)vi (t) = c,        (C.5.9)

for some constant c, such that the solutions have a constant inner product over time,
constituting level sets in their applicable phase space. The systems can be expressed
via the Hamiltonian as a function of γ, p and u as follows:
H(γ, p, u) = p0 f0 (γ, u) + Σ_{i=1}^{n} pi fi (γ, u)        (C.5.10)

with:

dγi /dt = ∂H/∂pi (γ(t), p(t), u(t)) = fi (γ(t), u(t))        (C.5.11)

dpi /dt = −∂H/∂γi (γ(t), p(t), u(t)) = −Σ_{j} pj ∂fj /∂γi (γ(t), u(t)).        (C.5.12)

Note that f0 (γ(t), u(t)) is the running cost we seek to minimise (hence p0 ≤ 0 below)
while the costates pi act, in effect, as Lagrange multiplier terms pricing the
rate of cost for exerting control u(t) given state γ(t). Optimisation is then expressed
in terms of a cost functional which quantifies the evolution along the curve from
γ(t0 ) to γ(t1 ) in S ⊂ M:
C(u) = ∫_{t0}^{t1} f0 (γ(t), u(t)) dt

where (γ(t), u(t)) is denoted a trajectory and C(u) the cost function. A trajectory
is optimal relative to γ(t0 ) = a, γ(t1 ) = b if it is the minimal cost among all trajec-
tory costs from a to b. The optimal control problem is then constituted by finding
an optimal trajectory. The PMP then specifies necessary conditions that any opti-
mal trajectory must satisfy. From this, we can obtain the concept of the maximal
Hamiltonian HM associated with an integral curve (γ(t), p(t), u(t)):

HM (γ(t), p(t)) = sup_{u∈U} H(γ(t), p(t), u(t)).        (C.5.13)

The PMP transforms the problem of minimising C(u) into one of Hamiltonian max-
imisation. It does so by the introduction of the costate variables p(t) which are
analogous to Lagrange multipliers, enabling inclusion of system dynamics in the
optimisation. Thus the Hamiltonian is defined to include the dynamics and cost
function with co-state variables. In essence it encodes the (negative) running cost via
the p0 f0 (γ, u) term and then adds to the Hamiltonian the term Σi pi fi (γ, u)
representing the additional energy or cost from the controls. The
PMP is formally given below.

Definition C.5.5 (Pontryagin maximum principle). The maximum principle for


solving the optimal control problem provides that for a trajectory (γ(t), u(t)) evolving
from a → b over interval t ∈ I = [0, T ], there exists a non-zero absolutely continuous
curve p(t) on I satisfying:

(i) (γ(t), p(t), u(t)) is a solution curve to the Hamiltonian equations (C.5.11)–(C.5.12);

(ii) H(γ(t), p(t), u(t)) = HM (γ(t), p(t)) for almost all t ∈ [0, T ]; and

(iii) p0 (T ) ≤ 0 and HM (γ(T ), p(T )) = 0.
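As a toy illustration of conditions (i)–(iii) (a minimal sketch under simplifying assumptions, not an example from the thesis): for the scalar system γ̇ = u with running cost f0 = u²/2 and fixed endpoints, the Hamiltonian H = −u²/2 + pu (taking p0 = −1) is maximised at u = p, and the adjoint equation gives ṗ = 0; shooting on the constant costate then recovers the straight-line optimal trajectory:

import numpy as np

a, b, T = 0.0, 1.0, 2.0   # boundary data: gamma(0) = a, gamma(T) = b

def endpoint(p0, n=1000):
    x, dt = a, T / n
    for _ in range(n):
        u = p0            # argmax_u H(x, p, u) for H = -u**2/2 + p*u
        x += u * dt       # state equation x_dot = u
    return x              # p_dot = -dH/dx = 0, so p stays at p0

lo, hi = -10.0, 10.0      # bisect on the costate so that x(T) = b
for _ in range(60):
    mid = 0.5 * (lo + hi)
    if endpoint(mid) < b:
        lo = mid
    else:
        hi = mid
print(mid, (b - a) / T)   # both ~0.5: the known constant optimal control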

C.5.4.2 PMP and quantum control

Following [15, 23], typical quantum control problems take one of
three forms: (a) the problem of Mayer, where C(u) = CT (γ(T ), T ) for some terminal
cost functional CT at time T ; (b) the problem of Lagrange, where
C(u) = ∫_0^T L(γ(t), u(t), t) dt; and (c) their combination into the more general Bolza form:

C(u) = CT (γ(T ), T ) + ∫_0^T L(γ(t), u(t), t) dt        (C.5.14)

subject to the constraint imposed by the Schrödinger equation (A.1.18). Both the
Hamiltonian and unitary target U are complex-valued, however they can be rep-
resented in real values via the mapping Cn → R2n . For the optimal solution, one
assumes the existence of a set of optimal controls uj (t), which may be unknown, and
replaces them with an approximating control uϵ . The cost function then satisfies:

C(uϵ ) − C(u) ≥ 0. (C.5.15)

Following this, there are two types of variation condition, known as strong
variation and weak variation, describing how far the choice of uϵ departs from the true
u, which can be used to optimise. The PMP above is based upon strong
variations, which involve assumptions about how much uϵ varies over given intervals.
One assumes that for τ ∈ (0, T ]:

uϵ (t) = u(t) for t ∈ [0, τ − ϵ] ∪ (τ, T ],    uϵ (t) = v for t ∈ (τ − ϵ, τ ]        (C.5.16)

where v is any value in the admissible control set U ⊂ Rm . In this context, the PMP
is expressed as follows: assuming the optimal control u(t) and a solution γ(t) of
equation (C.5.5) in the form of Schrödinger’s equation exist, there exists a non-zero
(vector) solution to the adjoint equations and terminal condition at T :

ṗT = −pT ∂f /∂γ (γ(t), u(t))        pT (T ) = −ϕγ (γ(T ))        (C.5.17)

such that for all τ ∈ (0, T ]:

pT (τ )f (γ(τ ), u(τ )) ≥ pT (τ )f (γ(τ ), v).        (C.5.18)

Practically speaking, to apply the PMP in quantum contexts, the following proce-
dure is used. First we define the Hamiltonian (equation (C.5.10)). For each state γ
and costate p, we maximise the Hamiltonian over u ∈ U such that:

H(p, γ, u) ≥ H(p, γ, v) (C.5.19)



recalling v is determined as per equation (C.5.16). The solutions to this maximi-
sation problem enable us to write u = u(γ, p), i.e. as a function of the state and
costate. We can convert the general loss (equation (C.5.14)) into a ‘Mayer form’
by expressing ϕ as a function of γ and the loss Lagrangian L. In this case, the
differential equations (C.5.12) then become:

ṗT = −pT ∂f /∂γ (γ, u(γ, p)) − µ ∂L/∂γ        µ̇ = 0        (C.5.20)

with boundary conditions γ(0) = γ0 and pT (T ) = −ϕγ (γ(T )). Controls
that satisfy these conditions are denoted extremal. The Hamiltonian is then written
as a function of the controls:

H(p, γ, u) = pT f (γ, u) − L(γ, u) (C.5.21)


= pT H(u)γ − L(γ, u) (C.5.22)
= H(u)p + L(γ, u) (by skew-symmetry) (C.5.23)

with adjoint equations:

ṗT = −pT H(u)        p(T ) = −ϕγ (γ(T ))        (C.5.24)

for a typical qubit control problem where H ∈ g = su(2) for final state target
Uf ∈ G = SU (2), with Pauli generators σj where H = Σj uj (t)σj . The cost
function is then of the form:

J(uj ) = Re(U (T )Uf ) + η ∫_0^T Σj uj (t)² dt        (C.5.25)

where η is a parameter controlling the contribution of the second term to the cost
functional. It can be shown (see [15] §6) that the optimal controls are then of the
form:

uj (t) = A cos(ωt + B) (C.5.26)

where A, ω and B are free parameters which can be determined through a minimisa-
tion procedure. As noted in the literature, the solutions to the PMP equations are
extremal curves such that the trajectories γ(t) can be said to reside on the boundary
of the reachable set R. Solving the PMP can be challenging. One class of tractable
problems is when the targets UT ∈ G belong to a Riemannian symmetric space
where the symmetry properties of the related Cartan decomposition can, in certain
cases, allow the control solutions uj to be found in closed form. This is the main
focus of Chapter 5. In Chapter 4, we also discuss and examine the work of Nielsen

et al. and others on geometric quantum control [181, 184, 185] where minimisation of
paths on corresponding Lie group SU(2^n) manifolds is related to obtaining minimum
bounds on circuit complexity (a research programme built upon in later work by
Brown and Susskind [178, 179]).
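The following hedged numerical sketch (pulse values and helper names are illustrative assumptions, not the thesis’s code) evolves a qubit under H(t) = Σj uj(t)σj with the ansatz uj(t) = Aj cos(ωj t + Bj) of equation (C.5.26), scoring the result against a target Uf via the trace-overlap term of (C.5.25):

import numpy as np
from scipy.linalg import expm

sig = [np.array([[0, 1], [1, 0]], dtype=complex),
       np.array([[0, -1j], [1j, 0]], dtype=complex),
       np.array([[1, 0], [0, -1]], dtype=complex)]

def evolve(A, w, B, T=1.0, n=2000):
    # piecewise-constant (Trotterised) integration of Schrodinger's equation
    U, dt = np.eye(2, dtype=complex), T / n
    for k in range(n):
        t = (k + 0.5) * dt
        H = sum(A[j] * np.cos(w[j] * t + B[j]) * sig[j] for j in range(3))
        U = expm(-1j * H * dt) @ U
    return U

U_f = expm(-1j * np.pi / 4 * sig[0])                 # example target rotation
U_T = evolve(A=[np.pi / 4, 0, 0], w=[0, 0, 0], B=[0, 0, 0])
print(abs(np.trace(U_f.conj().T @ U_T)) / 2)         # ~1 for this hand-picked pulse

In practice the free parameters A, ω and B would be determined by a minimisation procedure over this overlap rather than hand-picked as above.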

C.5.5 Time optimal problems with Lie groups


In this section, we examine the general variational problem above framed in terms
of the geometric and algebraic formalism of Lie groups. One of the motivations for
the geometric turn in much of control theory (where applicable) is that geomet-
ric tools and techniques have been able to provide an efficient framework through
which to satisfy the optimisation requirements of the PMP. Recall the two overar-
ching principles for optimal control are (a) establishing the existence of controllable
trajectories (paths) and thus reachable states; and (b) to show that the chosen
path (or equivalence class of paths) is unique by showing it meets a minimisation
(thus optimisation) criteria. For certain classes of control problem, one can leverage
geometric and symmetry properties to satisfy these requirements.
When the target manifold M is a Lie group G with a Lie algebra g, then for
any element in G to be reachable is equivalent to G ∼ exp(g) by the properties
of the exponential homomorphism (section C.1.5.3). This will be the case when
Hamiltonians H are constructed from a distribution ∆ ⊆ g (with controls applied
to generators in ∆) that is a bracket-generating set (definition C.4.4) such that the
application of the Baker-Campbell-Hausdorff formula (definition B.2.18) generates
arbitrary targets in G. This satisfies the existence requirement for PMP solutions.
The uniqueness requirement is then satisfied via results from geometry that guar-
antee (a) the metric structure that allows calculation of arc-length or energy and
(b) that provides for there to be a solution (or class of solutions) which are unique
by reason of minimising time or energy as calculated by way of the said metric,
that is, unique geodesics. See standard presentations in [23, 46] for more technical
exposition. Thus geometric control involving Lie groups can, in certain cases, offer
a way to considerably simplify complex PMP optimisation problems. We elucidate
specific applications of the PMP in the quantum case below.

C.5.5.1 Dynamical Lie algebras

Certain approaches to geometric control use the language of dynamical Lie algebras
defined by the criterion that:

R = exp(h) (C.5.27)

for some Lie algebra h where h = span{−iH(u(t))}. ‘Dynamical’ here refers to the
fact that h is composed of generators multiplied by control functions u(t) which
are time-dependent and thus dynamic. Such a condition is equivalent to complete
controllability, namely that every UT ∈ G is obtainable via a particular sequence of
controls u(t). The condition relies upon the bracket-generating characteristics of h
(see definition C.4.4) to recover the full Lie algebra g.
In some literature [15], the number of applications of the commutator in order
to obtain g is denoted the depth of the bracket. Moreover, the condition in equation
(C.5.27) relies upon the fact that exp(h) ⊂ G and that G is compact. This is
an important point: in our final Chapter (and in subRiemannian geometric control
problems generally), where g admits a Cartan decomposition g = k ⊕ p, the control
subset p is not closed under the bracket (as [p, p] ⊆ k), so one cannot reach arbitrary
targets via exp(p) alone. However, symmetric spaces admitting such a decomposition
exhibit the Lie triple property with respect to p, namely that [p, [p, p]] ⊆ p where
[p, p] ⊆ k, which can be expressed as:

[[p, p], p] ⊆ p (C.5.28)

Controllability is sometimes segmented into three types: (a) pure state control-
lable, where for U (0), U (T ) ∈ G, there exist controls {uj (t)} rendering both reach-
able; (b) equivalent state controllable, where {uj (t)} exist to reach U1 (T ) up to some
phase factor ϕ such that U (T ) = exp(iϕ)U1 (T ); and (c) density-matrix controllable
such that there exist controls {uj (t)} enabling ρi → ρk for all ρi , ρk ∈ B(H) (i.e. any
state is reachable from any other).
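A small numerical check (assuming the standard example decomposition su(2) = k ⊕ p with k = span{iσz} and p = span{iσx, iσy}; not drawn from the thesis) that the Cartan commutation relations, and hence the Lie triple property (C.5.28), hold:

import numpy as np

sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)
k_basis, p_basis = [1j * sz], [1j * sx, 1j * sy]

def comm(A, B):
    return A @ B - B @ A

def in_span(M, basis, tol=1e-10):
    # the rank does not grow when M already lies in the span of the basis
    rows = [b.flatten() for b in basis]
    return (np.linalg.matrix_rank(np.array(rows + [M.flatten()]), tol=tol)
            == np.linalg.matrix_rank(np.array(rows), tol=tol))

print(all(in_span(comm(a, b), k_basis) for a in p_basis for b in p_basis))  # [p,p] in k
print(all(in_span(comm(a, b), p_basis) for a in k_basis for b in p_basis))  # [k,p] in p
print(all(in_span(comm(comm(a, b), c), p_basis)
          for a in p_basis for b in p_basis for c in p_basis))              # [[p,p],p] in p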

C.5.5.2 Symplectic manifolds and Hamiltonian flow

We can express the above in terms of symplectic manifolds and Hamiltonians which
elucidates the deep connection between Hamiltonian dynamics and the geometric
formalism above. Given a control system defined on M, the Hamiltonian can be
represented as a map H : T ∗ M → R which generates a flow (see definition C.1.2.4)
on the cotangent bundle which represents the phase space of the quantum system
(as we discuss below). As discussed below (definition C.5.7), one forms define a
Hamiltonian vector field associated with the Hamiltonian. Hamiltonian flow is the
local flow generated by a Hamiltonian vector field XH in a symplectic manifold. That
is, the flow ϕX
t
H
(to connect with local flow (section C.1.2.4)) constitutes evolution in
phase space along integral curves consistent with symplectic structure. The flow is
given by Hamilton’s equations and in a quantum context by Schrödinger’s equations.

For more context, as Hall [45] notes, intuitively a symplectic manifold M is


one with sufficient structure to define the Poisson bracket of two functions f1 , f2 ∈
C ∞ (M) on it.

Definition C.5.6 (Symplectic manifold). A symplectic manifold is a smooth dif-


ferentiable manifold equipped with a non-degenerate (0, 2)-form ω on M.

Symplectic manifolds are even dimensional as odd-dimensional bundles cannot


be equipped with non-degenerate skew bilinear forms. Thus we assume M is of
dimension 2n for convenience. It can be shown that the cotangent bundle T ∗ M
of any manifold has the structure of a symplectic manifold. Given a coordinate
system xi then, as per our discussion above, we can represent an element in Tp∗ M
as ϕ = X j dxj with pj the coordinate functions in R. Recall that π -1 : M → T ∗ M .
Define a one-form θ(X) = ϕ(π ∗ (X)) sometimes denoted the tautological one-form
(so called because it returns the value of θ acting on tangent vectors at p). Define
a (0,2)-form via the exterior product ω = dθ = dpj ∧ dxj . As discussed above, the
non-degeneracy of ω allows the identification of Tp M with Tp∗ M. The two-form ω
also has a corresponding dual defined on T ∗ M (taking duals as its arguments) given
by ω -1 : T ∗ M × T ∗ M → R. Then, given f, g ∈ C ∞ (M), we can define the Poisson
bracket in differential formalism as:

{f, g} = −ω⁻¹ (df, dg)        (C.5.29)

which is antisymmetric, {g, f } + {f, g} = 0, and satisfies the Leibniz rule {f, gh} =
{f, g}h + g{f, h}. The term ω⁻¹ is taken from Hall [45] and denotes the bilinear form
on T ∗ M that arises by way of the canonical identification of T ∗ M with T M when
ω is non-degenerate. With these
identifications and in particular the isomorphism described above, we can define the
Hamiltonian vector field as follows.

Definition C.5.7 (Hamiltonian vector field). For f ∈ C ∞ (M), we denote Xf ∈


T M the Hamiltonian vector field associated to f if:

df = ω(Xf , ·)        (C.5.30)

where Xf corresponds to df under the isomorphism given by ω.

In this case, we can relate such Hamiltonian vector field to the Poisson bracket
via:

Xf (g) = {f, g} = −Xg (f ) ω(Xf , Xg ) = −{f, g} (C.5.31)

for f, g ∈ C ∞ (M). Thus conceptually we connect vector fields to Poisson brackets
in the Hamiltonian formalism. We then define the Hamiltonian flow generated by f
as the local flow φf generated by −Xf , such flow preserving ω, as expressed via
LXf ω = 0.
We then have that f, g, h ∈ C ∞ (M) form a Lie algebra under the Poisson bracket
satisfying the Jacobi identity, which in turn allows us to transition from the Poisson
bracket to the commutator via:

[Xf , Xg ] = X{f,g} (C.5.32)

For the flow φ generated by −X on M, if φ preserves ω then we can represent


X = Xf . In this case, we can assert that the flow is locally (or, if applicable to all
M, globally) Hamiltonian. We then have the concept of a Hamiltonian generator
where φ = φf . As Hall notes [45], any smooth function on a symplectic manifold
M generates a Hamiltonian flow. We can then define the familiar Hamiltonian H
defined on T ∗ M. We then obtain the definition of a Hamiltonian and Hamiltonian
system in such geometric terms.

Definition C.5.8 (Hamiltonian systems and Hamiltonians). A Hamiltonian system


on a symplectic manifold M = T ∗ M is a tuple (M, ΦH ) where ΦH represents the
Hamiltonian flow generated by the Hamiltonian H on M. Conserved quantities
for the Hamiltonian system are represented by functions f such that f (ΦH_t (v)) is
independent of t for each v ∈ M. In general,

d/dt f (ΦH_t (z)) = {f, H}(ΦH_t (z)),        (C.5.33)

for all z ∈ M, or, more concisely,

df /dt = {f, H}.        (C.5.34)

The functions f are conserved quantities if and only if {f, H} = 0.

The Hamiltonian H dictates the flow or vector field on this manifold by deter-
mining the direction and rate of change of state variables in M (i.e. curves γ(t)
represented as sequences of unitaries U (t)).
We briefly connect to the formulation of Hamiltonians, as the generator of time
translations, in terms of controls uj (t) and generators in g. The Hamiltonian deter-
mines a Hamiltonian vector field on the symplectic manifold T ∗ M (the cotangent
bundle of M). The (0, 2)-symplectic form ω encodes information about the evo-
lution of γ(t) ∈ M as shown in equation (C.5.29). Given a Hamiltonian function
H(q, p) on a symplectic manifold M equipped with ω, the Hamiltonian vector field
XH is defined as per equation (C.5.30) above, recalling dH is the differential of H. In
local phase space coordinates (p, q) we have ω = Σ_{j=1}^{n} dpj ∧ dq j where n denotes the

degrees of freedom of the system. The vector field is then defined using Hamilton’s
equations:

XH = Σ_{j=1}^{n} ( ∂H/∂pj ∂/∂q j − ∂H/∂q j ∂/∂pj )        (C.5.35)

In canonical position and momenta coordinates (q j , pj ):

q̇ j = ∂H/∂pj        ṗj = −∂H/∂q j .        (C.5.36)

Hence we see how the Hamiltonian H determines the rates of change (i.e., the
velocities) of the coordinates q j and pj , establishing the dynamics of the system
on the symplectic manifold. The Hamiltonian vector fields and the corresponding
flows not only preserve this symplectic structure but also facilitate the analysis of
dynamical properties, such as the conservation laws dictated by the Poisson brackets,
which are essential in the study of classical Hamiltonian systems. While quantisation
methods allow transition from classical to quantum formalism, there are a number
of subtleties between geometric phase spaces in classical and quantum case (see
[263] which also discusses the significance of the Bell–Kochen–Specker theorem for
translating between classical and quantum phase space formalism).
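To make the flow equations concrete, the following minimal sketch (the standard harmonic oscillator H = (p² + q²)/2, not a thesis example) integrates Hamilton’s equations (C.5.36) with the symplectic Euler method; the near-perfect conservation of H illustrates dH = 0 along the Hamiltonian flow:

def symplectic_euler(q, p, dt=1e-3, steps=10000):
    # H = (p**2 + q**2)/2; the (q, p) update below preserves the symplectic form
    H0 = 0.5 * (p ** 2 + q ** 2)
    for _ in range(steps):
        p -= q * dt        # p_dot = -dH/dq = -q
        q += p * dt        # q_dot =  dH/dp =  p (using the updated p)
    return q, p, 0.5 * (p ** 2 + q ** 2) - H0

q, p, drift = symplectic_euler(1.0, 0.0)
print(q, p, drift)         # energy drift stays O(dt) with no secular growth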

C.5.5.3 Geodesics and Hamiltonian Flow

Finally in this section, we mention connections with the symplectic Hamiltonian


flow formalism above and optimal curves on M. We can relate a geodesic γ(t) on a
Riemannian (or subRiemannian) manifold M to the Hamiltonian dynamics above by
considering the cotangent bundle T ∗ M and a (kinetic) Hamiltonian given by
H(x, p) = (1/2)g ij (x)pi pj , where g ij is the inverse of the metric tensor (see section C.1.4.1),
i.e. we have a Hamiltonian quadratic in momenta. In this case, it can be shown
that Hamilton’s equations are equivalent to the geodesic equations, providing a
phase space formulation of geodesic flows. The Hamiltonian system is characterised
by geodesic flow on T M, and the geodesics represent integral curve projections of
this Hamiltonian flow onto M. The Hamiltonian vector fields above preserve the
symplectic form ω on T ∗ M, implying that the energy H is conserved along these
flows, thereby defining a Hamiltonian flow where dH = 0 along the trajectories.
Thus we see how symplectic representations of state evolution can provide conceptual
maps between Hamiltonians in a quantum information context and time-optimal
geodesic curves γ(t) from a geometric perspective (see Terry Tao’s blog [264], noting
also that often one reframes the problem of geodesic flow in terms of energy rather
than geodesic paths due to degeneracy among such paths i.e. where paths belong

to a cut locus above such that there is more than one optimal geodesic).
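As an illustrative sketch (the round 2-sphere in coordinates (θ, φ); a demonstration under assumed coordinates, not a thesis example) of geodesics as projections of the flow of the kinetic Hamiltonian H = (1/2)g^{ij}p_i p_j: integrating Hamilton’s equations from an equatorial initial condition traces a great circle, with H conserved along the flow:

import numpy as np
from scipy.integrate import solve_ivp

def flow(t, y):
    # H = (p_th**2 + p_ph**2 / sin(th)**2) / 2 on the round 2-sphere
    th, ph, pth, pph = y
    return [pth,                                      # theta_dot =  dH/dp_theta
            pph / np.sin(th) ** 2,                    # phi_dot   =  dH/dp_phi
            pph ** 2 * np.cos(th) / np.sin(th) ** 3,  # p_theta_dot = -dH/dtheta
            0.0]                                      # p_phi conserved (symmetry)

y0 = [np.pi / 2, 0.0, 0.0, 1.0]                       # start on the equator
sol = solve_ivp(flow, (0.0, 2 * np.pi), y0, rtol=1e-10, atol=1e-10)
H = 0.5 * (sol.y[2] ** 2 + sol.y[3] ** 2 / np.sin(sol.y[0]) ** 2)
print(np.max(np.abs(sol.y[0] - np.pi / 2)), np.ptp(H))  # great circle; H constant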

C.5.6 KP Problems
A particular type of subRiemannian optimal control problem with which we are
concerned in Chapter 5 is the KP problem. In control theory settings, the problem
was articulated in particular via Jurdjevic’s extensive work on geometric control
[19, 26, 57, 58], drawing on the work of [59] as particularly set out in [23] and [60].
Later work building on Jurdjevic’s contribution includes that of Boscain [24, 61, 62],
D’Alessandro [15, 17] and others. KP problems also arise in a range of
classical and quantum control settings built upon the application of Cartan decompositions
of target manifolds G = KP , such as in nuclear magnetic resonance [21] and others
(see our final chapter). The essence of the KP problem is where M corresponds to a
semi-simple Lie group G together with right-invariant vector fields X(M) equivalent
to the corresponding Lie algebra g. In this formulation, the Lie group and Lie algebra
can be decomposed according to a Cartan decomposition (definition B.5.2), which,
recalling the equation above, is:

g = k ⊕ p

together with the Cartan commutation relations (Lie triple):

[k, k] ⊆ k [k, p] ⊆ p [p, p] ⊆ k

and equipped with a Killing form (definition B.2.12) which defines an implicit pos-
itive definite bilinear form (X, Y ) which in turn allows us to define a Riemannian
metric restricted to G/K in terms of the Killing form. Thus for our purposes,
we understand the KP problem as a minimum time problem for a subRiemannian
symmetric space G/K.

C.5.7 SubRiemannian control and symmetric spaces


Assume our target groups are connected matrix Lie groups (definition B.2.1). Recall
equation (C.5.1) can be expressed as:
γ̇(t) = Σ_j Xj (γ)uj (t)        (C.5.37)

where Xj ∈ ∆ = p, our control subset. For the KP problem, we can situate
γ(0) = I ∈ M (at the identity) with ||⃗u|| ≤ L, in turn specifying a reachable set R(T ).
As D’Alessandro et al. [56, 63] note, reachable sets for KP problems determine reachable
sets for a larger class of problems. Connecting with the language of control, we can
frame equation (C.5.1) in terms of drift and control parts with:
γ̇(t) = Aγ(t) + Σ_j Xj (γ(t))uj (t)        (C.5.38)

where Aγ(t) represents a drift term for A ∈ k. Our unitary target in G can be
expressed as:
γ̇(t) = Σ_j exp(−At)Xj exp(At)γ(t)uj (t)        (C.5.39)

for bounded ||Ap || = L. For the KP problem, the PMP equations are integrable.
One of Jurdjevic’s many contributions was to show that in such KP problem con-
texts, optimal control for ⃗u is related to the fact that there exist Ak ∈ k and Ap ∈ p
such that:

Σ_{j=1}^{m} Xj uj (t) = exp(Ak t)Ap exp(−Ak t).        (C.5.40)

Following Jurdjevic’s solution [60] (see also [56]), optimal pathways are given by:

γ̇(t) = exp(Ak t)Ap exp(−Ak t)γ(t) γ(0) = 1 (C.5.41)


γ(t) = exp(−Ak t) exp((Ak + Ap )t) (C.5.42)

resulting in analytic curves. As we explore in our final Chapter and as noted in the
literature, one can select Ap ∈ a ⊂ p where a is the non-compact part of a maximally
abelian Cartan subalgebra in p. In this regard, we see conjugation of a Cartan
subalgebra element by elements of K, reminiscent of the KAK decomposition itself.
It is also worth noting that a being a maximal abelian subalgebra means that equation
(C.5.39) is invariant under the action of k ∈ K, reflected in the commutation relation
[k, p] ⊆ p. Albertini et al. [56] note that for γ(t) ∉ CL(M), with H the isotropy
group of γ, the tuple (Ak , Ap ) minimising the geodesic is invariant under the
action of h ∈ H. This gives rise to a general method for time optimal KP control
problems set out in [56]: (i) identify the symmetry group of the problem, (ii) specify
G/K, (iii) find boundaries of R (which may require numerical solution), (iv) find
the first value of t such that π(γ(t)) ∈ π(R) and (v) identifying the orbit space
within which the optimal control exists and then moving within an orbit to the final
target UT = γ(T ). We draw the reader’s attention to work of Jurdjevic [23, 213]
for discussion and in particular D’Alessandro [15] (§6.4.2) for detailed exposition
of this Cartan-based solution to the KP problem. In our final chapter, we revisit
this method demonstrating how our novel use of a global Cartan decomposition and

variational methods give rise to time optimal synthesis results consistent with this
literature.
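A numerical sketch of the closed-form extremal (C.5.42) for an su(2) instance (the choices of Ak ∈ k and Ap ∈ p below are illustrative assumptions): finite differences confirm that the curve is horizontal, i.e. its right-translated velocity γ̇γ⁻¹ remains in p = span{iσx, iσy} with vanishing k-component:

import numpy as np
from scipy.linalg import expm

sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)
Ak, Ap = 0.7j * sz, 1.0j * sx              # example elements of k and p

def gamma(t):
    # Jurdjevic-style closed-form extremal (C.5.42)
    return expm(-Ak * t) @ expm((Ak + Ap) * t)

eps = 1e-6
for t in np.linspace(0.1, 2.0, 5):
    gdot = (gamma(t + eps) - gamma(t - eps)) / (2 * eps)  # finite-difference velocity
    v = gdot @ np.linalg.inv(gamma(t))                    # right-translated tangent in g
    k_comp = -0.5 * np.trace(v @ (1j * sz))               # <v, i sz> with <A,B> = -tr(AB)/2
    print(abs(k_comp))                                    # ~0: gamma is horizontal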
Appendix D

Appendix (Quantum Machine


Learning)


D.1 Introduction

In this Appendix, we survey literature from classical and quantum machine learning
relevant to later chapters. We include a high-level review of key concepts from sta-
tistical learning theory for both context and utility. We specifically focus on deep
learning architecture adopted in Chapters 3 and 4, building up towards an exegesis
on neural networks by showing how they are a non-linear extension of generalised
linear models. We focus on specific aspects of neural network architecture and opti-
misation procedures, specifically stochastic gradient descent and its implementation
via backpropagation equations. We then provide a short overview of the burgeoning
field of quantum machine learning. We track how quantum analogues of classical
machine and classical statistical learning have developed, noting key similarities and
differences. In particular, in order to contextualise the learning protocols adopted
in Chapters 3 and 4, we provide a comparative analysis of learning in a quan-
tum context, with a specific comparative analysis between quantum and classical
backpropagation techniques. We also focus on the relationship between learning
(in both classical and quantum contexts [265]) and measurement, emphasising the
hybrid nature of QML learning protocols (in the main) arising from dependence
upon quantum-classical measurement channels. We examine how techniques from
geometry, algebra and representation theory have been specifically (and relatively re-
cently, in some cases) integrated into both classical and quantum machine learning
strategies, such as in the form of equivariant and dynamical Lie-algebraic meth-
ods [266, 267]. In doing so, we provide a survey overview of geometric machine
learning, together with the use of representation theory and differential geometry in
relevant aspects of classical machine learning, briefly summarising recent results in
invariant and geometric QML [35, 268–270]. Finally, we map out key architectural
characteristics of hybrid quantum-classical methods denoted as greybox machine
learning used in our work above.


D.2 The Nature of Learning

D.2.1 Taxonomies of learning

In this section, we set out a brief synopsis of key principles of machine learning, in-
cluding foundational elements of supervised and unsupervised models, architectural
principles regarding data, models and loss functions. We cover principles behind
training, optimisation, regularisation and generalisation, together with the primary
classes of algorithm used in this work. Our treatment focuses on both classical and
quantum forms of machine learning.

A common taxonomy for approaching learning problems involves classification


in terms of concepts of data, modelling, risk/loss and algorithm type (supervised or
unsupervised learning) [64]. The concept of learning is then manifest in the ability
of a model to accurately predict or reproduce known information (such as labels
from training data) and to generalise to out-of-sample data well. Other adjuncts for
characterising learning of algorithms include algorithmic expressibility or complexity
measures [271]. A model is then represented as an algorithm which achieves these
objectives, as measured by figures of merit, such as empirical risk, training and gen-
eralisation error, algorithmic complexity and others. We discuss these in more detail
with our summary of statistical learning theory concepts below. In many cases, the
models in question, both classical and quantum, are extensions of well-understood
statistical modelling techniques, such as generalised linear models [12,272], graphical
models, kernel methods [273] or deep-learning algorithms [14]. Quantum machine
learning adapts concepts from modern machine learning (and statistical learning)
theory to develop learning protocols for quantum data, or to use quantum algorithms
themselves; classical techniques, including generative models [165, 274], have been adapted to quantum set-
tings. The landscape of QML is already vast, thus for the purposes of our work it will
assist to situate our results within the useful schema set out in [64] below in Table
(D.1). Our results in Chapters 3 and 4 fall somewhere between the second (classi-
cal machine learning using quantum data) and fourth (quantum machine learning
for quantum data) categories, whereby we leverage classical machine learning and
an adapted bespoke architecture equivalent to a parametrised quantum circuit [65]
(discussed below).

QML Taxonomy

QML Division                                Inputs                Outputs               Process
Classical ML                                Classical             Classical             Classical
Applied classical ML                        Quantum (Classical)   Classical (Quantum)   Classical
Quantum algorithms for classical problems   Classical             Classical             Quantum
Quantum algorithms for quantum problems     Quantum               Quantum               Quantum

Table D.1: Quantum and classical machine learning table. Quantum machine learning covers four
quadrants which differ depending on whether the inputs, outputs or process is classical or quantum.

D.3 Statistical Learning Theory

D.3.1 Classical statistical learning theory


The formal theory of learning in computational science is classical statistical learning
theory [12, 13, 66, 67] which sets out theoretical conditions for function estimation
from data. In the most general sense, one is interested in estimating Y as some
function of X, or Y = f (X) + ϵ (where ϵ indicates random error independent of X).
Statistical learning problems are typically framed as function estimation models [13]
involving procedures for finding optimal functions f : X → Y, x 7→ f (x) = y (where
X , Y are typically vector spaces) where it is usually assumed that X, Y ∼ PXY , i.e.
random variables drawn from a joint distribution with realisations x, y respectively.
Formally, this model of learning involves (a) a generator of the random variable X ∼
PX , (b) a supervisor function that assigns Yi to each Xi such that (X, Y ) ∼ PXY
and (c) an algorithm for learning a set of functions (the hypothesis space) f (X, Θ)
where θ ∈ Θ parametrises such functions.
Typically X are denoted the features and Y are denoted the labels. Feature and
label data is usually classified into discrete or continuous data. The two primary
types of learning problems are supervised and unsupervised learning (defined below)
where the aim is to learn a model family {f } or distribution P.

Definition D.3.1 (Supervised learning). Given a sample Dn = {Xi , Yi } comprising


tuples (Xi , Yi ) ∈ X × Y (where for simplicity it is assumed Dn is i.i.d) where X, Y ∼
PXY , a supervised learning task consists of learning the mapping f : X → Y for both
in-sample and out-of-sample (Xi , Yi ).

Unsupervised learning is defined for D = {Xi } with X ∼ PX : unsupervised learn-
ing does not utilise corresponding output labels Yi for each input Xi . Instead, the
objective is to discover underlying patterns, structures, or distributions within the

dataset Dn , such as clusters or classifications. Semi-supervised learning is where


some but not all Xi are labelled. Self-supervised learning is where a model f is
trained on data in X itself (e.g. where traditionally unlabelled data is used in effect
as a label, such as where X is a sequence of data and each element in the sequence
is predicted using the previous items in the sequence). The primary subject of this
work is supervised learning of unitary operators as elements of Lie groups of rele-
vance to quantum information processing, with a focus on U (t) ∈ G for G = SU (2)
or SU (3).

Finding optimal functions intuitively means minimising the error ϵ = |f (x) − y|


for a given function. This is described generically as risk minimisation where, as
we discuss below, there are various measures of risk (or uncertainty) which typ-
ically are expectations (averages) of loss. The objective of supervised learning
then becomes one of generalisation, which is intuitively the minimisation of risk
across both in-sample and out-of-sample data. More formally, given a loss function
L : Y × Y → R, (f (x), y) 7→ L(f (x), y) and a joint probability distribution over
X × Y denoted by (X, Y ) ∼ PXY and associated density pXY (with usual assump-
tions regarding applicable measure, Lebesgue for continuous, counting for discrete),
then an expectation operator can be defined as:
E[f (X, Y )] = ∫ f (x, y) dPXY (x, y) = ∫ f (x, y)pXY (x, y) dx dy.        (D.3.1)

The statistical risk or true risk of f is then given by:

R(f ) = E[L(f (X), Y )|f ]. (D.3.2)

Statistical risk represents the average (expected) loss of f with
respect to L. The minimum risk (Bayes risk) is the infimum of R(f ) over all
such measurable functions (learning rules) f , denoted R∗ = inf_{f ∈F ∗} R(f ), where F ∗
represents the set of measurable functions f such that F ⊂ F ∗ . Of course the
challenge is that PXY is usually not known, such that the estimation of f relies
upon random samples of training data comprising n examples of input x and label y
data, denoted Dn = {Xi , Yi }n (where for simplicity it is assumed Dn is i.i.d and x, y
represent realisations of random variables X, Y ). The learning task then becomes
framed as finding an estimate fˆ of f (conditional upon Dn ) among a set of functions
(prediction rules) F. This functional estimator conditioned on the sample is denoted
fˆn (x) = f (x; Dn ). The learning objective then aims to minimise R(fˆn ) with high
probability over the distribution of data, i.e. we want to minimise the expected risk
given by E[R(fˆn )]. Intuitively this means the average error (risk) of the prediction
rules is minimised over the distribution (of samples) Dn i.e. fˆn performs well on

average across Dn . Thus the objective becomes one of optimising the selection rule
of f ∈ F given training data Dn (i.e. for the specific algorithm selecting f given Dn )
rather than minimising a specific f .

D.3.1.1 Empirical risk

The concept of empirical risk enables an estimation of statistical risk on out-of-
sample (unseen) data using Dn [67], a sample of n i.i.d. training examples.

Definition D.3.2 (Empirical risk). Given a sample Dn = {Xi , Yi }n , loss function


L and a function (hypothesis) f ∈ F , the empirical risk is denoted:
R̂n (f ) = (1/n) Σ_{i=1}^{n} L(f (Xi ), Yi )        (D.3.3)

and represents an estimate of the average loss over the training set comprising n
samples of Xi , Yi .

The objective then becomes learning an algorithm (rule) that minimises empirical
risk, thereby obtaining an optimal estimator across sampled and out-of-sample data,
namely:

fˆn = arg min_{f ∈F} R̂n (f )        (D.3.4)

i.e. the f that minimises R̂. As the number of samples n → ∞, then by assumption
(of i.i.d. Dn ), R̂(f ) → R(f ). Equation (D.3.3) is typically reflected in (batch) gra-
dient descent methods that seek to solve for the optimisation task (D.3.4). Typical
gradient descent rules adopt an update rule, which can usefully (for the purposes of
comparison with quantum) be conceived of as a state transition rule. An important
(and ubiquitous) class of optimisation algorithm is gradient descent (and backprop-
agation) which requires constraining estimators fˆ that enable R̂n (f ) to be smooth
(continuous, differentiable), a requirement of f ∈ C k , the class of all k-differentiable
functions (ideally C ∞ ). Moreover, it is typical that f is chosen to be a parameterised
function f = f (θ) such that the requisite analytic structure (parametric smooth-
ness) is provided for by the parametrisation, typically where parameters θ ∈ Rm .
The parameters θ are often denoted the weights of a neural network, for example
where R̂n (f (θ)) is often just denoted as a function of the parameters (given the data
is fixed).

Under such assumptions, it can be shown (assuming the analyticity of L) that:


R̂n (f ) = (1/n) Σ_{i=1}^{n} L(f (Xi ; θ), Yi )        (D.3.5)

is smooth in θ, which implies the existence of a gradient ∇θ R̂n (f (θ)), a key
requirement of backpropagation. The update rule is then a transition rule for θ that
maps at each iteration (epoch) θi+1 = θi − γ(n)∇θ R̂n (f (θ)). Here γ(n) ≥ 0 is the
step-size, whose value may depend upon the iteration (epoch) n (with Σn γ(n) = ∞
and Σn γ(n)² < ∞, the square-summability condition on the step sizes [67]).
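As a minimal sketch of this update rule (the synthetic data and least-squares model are illustrative assumptions; the decaying step size γ(n) = c/n satisfies both summability conditions above):

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
theta_true = np.array([1.5, -0.7])
Y = X @ theta_true + 0.1 * rng.normal(size=200)    # Y = f(X) + eps

theta = np.zeros(2)
for epoch in range(1, 201):
    grad = 2.0 / len(X) * X.T @ (X @ theta - Y)    # gradient of empirical MSE risk
    theta -= (0.5 / epoch) * grad                  # theta_{i+1} = theta_i - gamma(n) grad
print(theta)                                       # approaches theta_true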

D.3.1.2 Common Loss Functions

A variety of common loss functions are used as empirical risk estimators. Two pop-
ular choices across statistics and also machine learning (both classical and quantum)
are (a) mean-squared error (MSE) and (b) root-mean squared error (RMSE). Given
data Dn (X, Y ) with (Xi , Yi ) ∼ PXY and a function estimator fθ as per above, MSE
is defined as follows.

Definition D.3.3 (Mean Squared Error). The MSE for a function f parameterized
by θ over a dataset Dn is:
MSE(fθ ) = (1/n) Σ_{i=1}^{n} ( fθ (Xi ) − Yi )²        (D.3.6)

i.e. L2 loss. MSE calculates the average of the squares of the differences between
predicted and label values. RMSE is defined as √MSE(fθ ). Other common loss
functions include (i) cross-entropy loss, e.g. Kullback-Leibler divergence (see [12]
§14), for classification tasks and comparing distributions (see section A.1.8 for quan-
tum analogues), (ii) mean absolute error loss and (iii) hinge loss. The choice of loss
function has statistical implications regarding model performance and complexity,
including bias-variance trade-offs.
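A direct transcription of Definition D.3.3 (the arrays below are purely illustrative):

import numpy as np

preds = np.array([0.9, 2.1, 2.9, 4.2])    # f_theta(X_i)
labels = np.array([1.0, 2.0, 3.0, 4.0])   # Y_i
mse = np.mean((preds - labels) ** 2)      # average of squared differences
rmse = np.sqrt(mse)                       # RMSE = sqrt(MSE)
print(mse, rmse)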

D.3.1.3 Model complexity and tradeoffs

As we discuss below, there is a trade-off between the size of F and empirical risk
performance in and out of sample. Such risk R̂n (f ) can always be minimised by
specifying a large class of prediction rules. The most extreme example being f (x) =
Yi when x = Xi and zero otherwise (in effect, F containing a trivial mapping of Xi
to Yi for all Xi ). In this case, R̂n (f ) → 0, but performs poorly on out of sample
data. Such an example also provides more context on what it means to learn,

namely that learning is properly characterised as selecting F that minimises R̂n (f )


across in-sample and out-of-sample data. The size and complexity of prediction
rules F illustrates a tradeoff between approximation and estimation (known as ‘bias-
variance’ tradeoff for squared loss functions). This tradeoff is formally represented
by excess risk, the difference between expected empirical risk and Bayes risk:

\mathbb{E}[R_n(\hat{f}_n)] - R^* = \underbrace{\mathbb{E}[\hat{R}_n(\hat{f}_n)] - \inf_{f \in \mathcal{F}} R(f)}_{\text{estimation error}} + \underbrace{\inf_{f \in \mathcal{F}} R(f) - R^*}_{\text{approximation error}}    (D.3.7)

Recalling that E[R̂n (fˆn )] is contingent upon fˆ estimated from Dn (X, Y ) (the data sam-
ples) and that the optimal f (one that minimises statistical risk) is represented by
inf_{f∈F} R(f ), estimation error captures how well fˆ, which is learnt from data, performs
against all possible choices in F. By contrast, approximation error indicates the
deterioration in performance arising from restricting F to subsets of all possible
f ∈ F ∗ (i.e. all measurable f ). The tradeoff is characterised by the fact that if F
becomes small (fewer prediction rules), estimation error tends to zero (i.e. there is
little that must be learned from the data), but the approximation error increases.
Conversely, by increasing the size of F, the size of R̂n (fˆ) can be minimised (in-
tuitively, we have a greater selection of f to choose from according to which R̂n (fˆ)
may be made small). But the chosen f that renders R̂n (fˆ) minimal, namely fˆn , will
tend to overfit the data, because R̂n (fˆ) is then an overly optimistic estimate of the
true risk Rn (fˆn ).

D.3.2 Reducing empirical risk


Two empirical risk-based minimisation methods for handling overfitting are (a) to
limit the size of F in order to control estimation error. However, given the estima-
tion/approximation error tradeoff, placing upper bounds on estimation error also
places lower bounds on approximation error; and (b) including a penalty metric
(the basis of regularisation) that penalises model complexity. In this latter case, we
express the inclusion of the penalty (cost) term CP (f ) as:

\hat{f}_n = \arg\min_{f \in \mathcal{F}} \{ \hat{R}_n(f) + C_P(f) \}.    (D.3.8)

In practice, CP (f ) may be proportional to the degree of f (e.g. for a polynomial) or


the norm of the derivative of f . Often CP (f ) is proportional to parameter norms.
Limiting training data size n is one method of limiting the size of F (such as the
method of sieves) on the assumption that F increases monotonically with n. In
Bayesian contexts, where empirical risk reflects a log likelihood function, C(f ) is
interpretable as incorporating prior knowledge about the likelihood of models. In

this case, exp(−CP (f )) defines a prior probability distribution on F (i.e. the
prior probability of f ), so that f being highly probable equates to a small CP (f ) and vice versa.
Other models, such as ‘description length methods’, estimate model complexity via
the number of bits required to represent the model. Other techniques include hold-
out methods involving splitting training data D into training sets DT and test (or
validation) sets DV , where DT is used to select fˆn and assessed (via a separate
risk measure) against DV . Knowing how to partition the training data can be
difficult. A common approach in machine learning is to use k-fold cross validation
which randomly splits data into training and test sets, removing entry k from D
and minimising the regularised empirical risk [12, 14]. Theoretically demonstrating
guarantees for improved performance can remain challenging, however (see [12] §7.10
for a general discussion).
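By way of illustration, the following is a minimal sketch of one standard form of k-fold cross-validation, assuming generic fit and risk callables and array-valued data (the names are illustrative and not drawn from the codebases referenced in later chapters):

import numpy as np

def k_fold_risk(X, Y, fit, risk, k=5, seed=0):
    """Estimate out-of-sample risk by k-fold cross-validation: randomly
    partition D_n into k folds, train on k-1 folds (D_T), evaluate on the
    held-out fold (D_V), then average the k validation risks."""
    n = len(X)
    idx = np.random.default_rng(seed).permutation(n)
    folds = np.array_split(idx, k)
    risks = []
    for v in range(k):
        val = folds[v]
        train = np.concatenate([folds[j] for j in range(k) if j != v])
        f_hat = fit(X[train], Y[train])            # select f_hat on D_T
        risks.append(risk(f_hat, X[val], Y[val]))  # assess on D_V
    return np.mean(risks)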
Regularisation and penalty-based methods are an important means of addressing
empirical risk and form the basis of regularisation to deal with overfitting. We
examine these in particular when reviewing the work of Nielsen et al. [181, 183, 185]
and penalty metrics for geometric quantum control.

D.3.3 No-Free Lunch Theorems


Statistical learning theory (both classical and quantum) also has a number of no free
lunch theorems which place bounds upon the generalisation performance of general-
purpose optimisation algorithms. The classical no free lunch theorem [275] asserts
that for any algorithm A, when averaged over all possible problems, the expected
performance of A is equivalent to any other algorithm A′ . Mathematically, this can
be represented in terms of risk across different types of learning problems.
Given the set F of all possible functions f : X → Y and the set of all distributions
P, for any two algorithms fA and fA′ the no free lunch theorem [275] provides that:
\int_{\mathcal{P}} R_{f_A}(L(f(X), Y)) \, dP = \int_{\mathcal{P}} R_{f_{A'}}(L(f(X), Y)) \, dP    (D.3.9)

where Rf (L(f (X), Y )) is the true risk (equation D.3.2) associated with algorithm A
for a particular distribution PXY in predicting function f , and f ∈ F. The theorem
indicates that no algorithm can universally minimise true (statistical) risk across
all distributions. In particular, the theorem sets bounds upon how well models can
generalise. Below we discuss the quantum analogue of the no free lunch theorem.

D.3.4 Statistical performance measures


Before moving onto consideration of deep learning algorithms, we mention a few
common statistical measures (including those used in Chapter 3) in the context

of statistical learning theory. Much machine learning and statistical modelling,


especially classification algorithms, utilise measures such as true positives (TP), true
negatives (TN), false positives (FP) and false negatives (FN) by reference to a
functional binary classifier estimator fˆ(X) : X → Y where Y = {0, 1}. These can
be related in statistical learning terms such as accuracy, a metric used in Chapter 4
(see results’ tables and discussion in section 4.7.2) to compare model performance
among candidate quantum machine learning architectures. Accuracy measures the
proportion of correct predictions among the total number of cases examined and
can be linked with the expectation of the indicator loss:

\mathrm{Accuracy} = \frac{TP + TN}{TP + FP + TN + FN}    (D.3.10)
                  = 1 - \mathbb{E}[L(f(X), Y)] = \mathbb{E}[\mathbb{I}(f(X) = Y)].    (D.3.11)

In terms of statistical learning measures, empirical risk with an indicator function
I(f (Xi ) = Yi ), which is 1 if f (Xi ) = Yi and 0 otherwise, can be written as:

\hat{R}_n(f) = \frac{1}{n} \sum_{i=1}^{n} \left( 1 - \mathbb{I}(f(X_i) = Y_i) \right)    (D.3.12)
             = 1 - \frac{1}{n} \sum_{i=1}^{n} \mathbb{I}(f(X_i) = Y_i)    (D.3.13)

where \frac{1}{n} \sum_{i=1}^{n} \mathbb{I}(f(X_i) = Y_i) is the proportion of correct predictions. Accuracy can
be written as:

\mathrm{Accuracy}(f) = \frac{1}{n} \sum_{i=1}^{n} \mathbb{I}(f(X_i) = Y_i)    (D.3.14)

in which case empirical risk is one minus the accuracy. Given that E[L(f (X), Y )] =
limn→∞ R̂n (f ), we can understand accuracy as per equation (D.3.11) above.
Thus we see the inverse relationship between minimising empirical risk and max-
imising accuracy.
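The following minimal NumPy sketch renders this relationship (names illustrative): accuracy as the proportion of correct predictions (equation (D.3.14)) and 0-1 empirical risk as one minus accuracy.

import numpy as np

def accuracy(y_pred, y_true):
    """Proportion of correct predictions (equation (D.3.14))."""
    y_pred, y_true = np.asarray(y_pred), np.asarray(y_true)
    return np.mean(y_pred == y_true)

def empirical_risk_01(y_pred, y_true):
    """Empirical risk under the indicator (0-1) loss: one minus accuracy."""
    return 1.0 - accuracy(y_pred, y_true)

# For a binary classifier, accuracy can equivalently be computed from the
# confusion-matrix counts as (TP + TN) / (TP + FP + TN + FN).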

D.4 Deep Learning


Chapters 3 and 4 of this work concern the construction of a range of deep learning
architectures to solve optimisation problems in quantum control. In this section,
we cover the key elements of the machine learning architectures adopted in those
chapters. We focus on neural network formalism, with a particular emphasis on
‘greybox’ methods whereby neural network architectures are encoded with a priori
information relevant to the problem at hand, such as via encoding specific laws,

rules or transformations (such as, in our case, the time-independent approximation


of Schrödinger’s equation) which assist the learning process.

D.4.1 Linear models


Neural networks used in this work derive ultimately from an adaptation of statistical
methods related to generalised linear models. In this section, we briefly survey the
key features of generalised linear models as they relate to neural networks. We
subsequently define neural network architecture in terms of non-linear functional
composition of a variety of (usually) linear models, via layers in a deep neural
network having a representation as a directed graph. Tracking the construction
of neural networks by building up intuition through linear models and then non-
linear networks is a useful way of demystifying much of the jargon around
neural networks, while also serving to connect the architecture to concepts across our
previous chapters. We begin with basic linear models.

Definition D.4.1 (Linear models). Given X ∈ Rm and Y ∈ R where samples


(X, Y ) ∼ PXY are identically and independently distributed, we define a basic linear
model for estimating Y (assuming it as a scalar for simplicity) as:
\hat{Y} = \omega_0 + \sum_{j=1}^{m} X_j^T \omega_j + \epsilon    (D.4.1)

where ω0 ∈ R is the estimate of bias while ω ∈ Rm is a vector of coefficients (weights)


with ϵ uncorrelated (random) errors.

Sometimes ω0 (being constant) is absorbed such that Ŷ = X T ω where it is un-


derstood that the zeroth index is identity (so we just have the ω0 term). We drop
the transpose symbol on X and other tensors from hereon in, it being understood.
The coefficients ω̂ are determined according to a chosen minimum estimate of em-
pirical risk, usually least squares. Where ωj ∈ Rm satisfies smoothness and other
(mathematically) nice properties, then we can regard ωj as being drawn from a
differentiable manifold M ≃ Rm (homeomorphically) where the basis of ωj can be
considered to be (tangent) vectors, each of which has directions j = 1, ..., m. An ex-
tension to forms of generalised additive model can then be undertaken by considering
projection pursuit regression (used by Hastie et al. [12] as we see below to connect
with neural network architecture). In generalised linear model theory, minimising
empirical risk is often undertaken by imposing penalty metrics in objective/loss
functions in order to steer models away from high-variance (and thus overfitting of)
classes of estimates fˆ. Such methods are sometimes denoted ‘shrinkage methods’
in the vernacular of statistical learning as they aim to minimise coefficient variance

while retaining predictive power. Ridge regression, common across the sciences, is
one such model utilising ridge functions, being functions that are a combination of
univariate functions with an affine transformation. Ridge coefficients are then those
ω̂ which minimise a penalised loss function, parametrised by some parameter λ ∈ R
(so higher values of ∥ω∥22 incur a higher penalty):

\hat{\omega}_{\mathrm{ridge}} = \arg\min_{\omega} \left\{ L(Y, f(X)) + \lambda \|\omega\|_2^2 \right\}    (D.4.2)

where here CP (f ) = λ∥ω∥22 connecting with the discussion of penalty terms above.
As we can see (via the loss function, for simplicity we assume L ∈ R), f : Rm → R,
while we can regard the addition of the λ∥ω∥22 term (denoted a regularisation term)
as one that penalises larger weights ω. The regularisation term thereby aims to
address overfitting by reducing the magnitude of ω and thus the variance of the
model. A useful way of connecting generalised linear models with neural networks is
via projection pursuit models, which represent linear combinations of non-linear
ridge functions. We do so in order to elucidate the geometric and directional nature
of statistical learning in this way, in turn connecting with intuition for quantum
geometric machine learning.
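As a concrete rendering of the penalised objective in equation (D.4.2), the following minimal NumPy sketch computes the closed-form ridge estimate for squared-error loss; it assumes the bias has been absorbed (e.g. via centred data) and the variable names are illustrative:

import numpy as np

def ridge_fit(X, Y, lam):
    """Closed-form ridge estimate: minimises ||Y - X w||^2 + lam * ||w||_2^2,
    i.e. squared-error loss plus the penalty C_P(f) = lam * ||w||_2^2."""
    _, m = X.shape
    return np.linalg.solve(X.T @ X + lam * np.eye(m), X.T @ Y)

# Example: the penalty shrinks the coefficients towards zero as lam grows.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
Y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=50)
for lam in (0.0, 1.0, 100.0):
    print(lam, ridge_fit(X, Y, lam))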

A ridge function is defined as f (X) = g(⟨X, e⟩) for X ∈ Rm , with g : R → R
a univariate function (in effect, a univariate function applied to a linear projection), e ∈ Rm , and the
inner product as defined previously (definition A.1.4). The ridge function is constant
along directions orthogonal to em . By construction, vectors in hyperplanes of Rm ,
denoted e1 , ..., em−1 , are orthogonal to em i.e. ⟨ej , em ⟩ = 0 for j = 1, ..., m − 1. Thus:

f\left( X + \sum_{j=1}^{m-1} b_j e_j \right) = g\left( \langle X, e_m \rangle + \sum_{j=1}^{m-1} b_j \langle e_j, e_m \rangle \right) = g(\langle X, e_m \rangle) = f(X)    (D.4.3)

which elucidates the invariance under the affine transformation. Ridge functions
then form the basis for what in statistical learning is denoted projection pursuit regres-
sion [12] with the general form [276]:

f(X) = \sum_{n}^{N} g_n(\omega_m X)    (D.4.4)

for dimension N . In this formulation, the ωm are m unit vectors in the m directions of
the manifold M. The key element is that the estimator Ŷ = f (X) is a function
of derived features Vm = ωm X rather than X directly. Note for consistency that
our general weight tensor ω is a scaled version of ωm . The ridge functions gn (ωm X)
vary only in the directions defined by ωm , where the feature Vm = ωm X can be
regarded as the projection of X onto the unit vector ωm . In general there is a
wide variety of non-linear functions gn to choose from. As Hastie et al. [12] note, if
N is sufficiently large, then the model can approximate any continuous function in
Rm (see [14] §6.4.1). Thus projection pursuit regression models can be regarded as
universal approximators, a key characteristic of the success of neural networks.
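A minimal sketch of evaluating a model of the form (D.4.4), with fixed (untrained) directions and nonlinearities purely for illustration:

import numpy as np

def ppr_forward(X, omegas, gs):
    """Evaluate f(X) = sum_n g_n(<omega_n, X>) (equation (D.4.4)):
    a sum of ridge functions, each varying only along its direction omega_n."""
    return sum(g(X @ w) for w, g in zip(omegas, gs))

# Two ridge terms in R^3 with fixed directions and nonlinearities
omegas = [np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 1.0]) / np.sqrt(2)]
gs = [np.tanh, np.cos]
x = np.array([0.2, -0.4, 0.9])
print(ppr_forward(x, omegas, gs))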

D.4.2 Neural networks


The formal definition of neural networks by Fiesler [277] provides a taxonomy to
understand variable elements.

Definition D.4.2 (Neural network). A neural network can be formally defined as a


nested 4-tuple N N = (T, C, S(0), Φ) where T is network topology, C is connectivity,
S(0) denotes initial conditions and Φ denotes transition functions.

The formal definition of a neural network has influenced later literature and
technical work in the field. We discuss briefly some of the elements below:

1. Topology. A directed graph G = (V, E) of functional composition, with ver-
tices V representing neurons and E edges between neurons. The
topology comprises (a) a framework of Ln layers each with Nn neurons; and (b)
a set of connectivity relations between source neurons in L0 exhibiting certain
properties such as (i) intraconnectivity, self-connectivity or supraconnectivity,
(ii) symmetry or asymmetry and (iii) order;

2. Constraints. Bounds upon (a) weight value range ||ωij || ≤ K1 ∈ R, (b) local
thresholds (biases of offsets) and (c) activation range i.e. ||σ|| ≤ K2 ∈ R;

3. Initial state. Covering initialisation of (a) weights, (b) local thresholds and (c)
activation values; and

4. Transition functions. These are transition functions for (a) neuron functions
(specifying the output of a neuron via its activation function), (b) learning rules (up-
dating weights and offsets), (c) clamping functions that keep certain neurons
constant and (d) ontogenic functions, those that change network topology.

Modern formulations generally track, with some differences, this taxonomy, with
each category and subcategory being a widely studied subdiscipline of machine learn-
ing research, such as those concerned with optimal network topology, pretraining
and tuning of hyperparameters (see [14]). Of note is that while statistical learning
theory and computer science can provide some theoretical bases for network design,
in general there are few or limited theoretical guarantees relating network architec-
tural features (such as network topology or choice of initialisation). As such, tuning
such architectural characteristics (which are often represented as hyperparameters

of a model or architecture) tends to be experimental (or in reality trial-and-error)


and empirical in nature. Moreover, as we show in particular in Chapter 4, network
architecture and connectivity is often a key and important element in both the per-
formance of networks but also their description via visual or other means.

D.4.2.1 Neural network components

An elementary neural network can be understood as a non-linear extension of the


generalised linear models discussed above within the network taxonomy. In the
examples below, neural network layers a(l) are indexed by l = 0, ..., L where l = 0
is the first layer and l = L the output layer. We consider a simple fully-connected
feed-forward network comprising an input layer a(0) with m inputs i.e. X ∈ Rm
(whose activation function, as we discuss below, can be considered identity neuron
functions), a set of hidden layers a^{(1)} to a^{(L−1)} and a final layer a^{(L)} that provides
predicted outputs. Each layer comprises N neurons so that a^{(l)} = (a_1^{(l)}, ..., a_{n_l}^{(l)}), where
nl is the number of neurons in layer l. That is, each neuron has its own activation
function though in practice these are considered by convention the same for each
neuron (for simplicity we keep the number of neurons in each layer constant). Each
neuron in each layer takes the inputs of (all or some) neurons in the previous layer
(or the initial data X) as an argument for its own activation function (which is
usually non-linear) comprising a weight tensor (which for consistency with linear
models we generally denote ω) and a bias term ω0 which we usually absorb into ω.
Here we adopt the general case of ω ∈ Rn×m to allow for subsequent layers that may
have n neurons. Activation functions form the basis for the non-linearity of neural
networks. We define activation functions below.

Definition D.4.3 (Activation function). Given an input vector X ∈ Rm , weight


tensor ω ∈ Rn×m and bias term ω0 ∈ Rn we define the affine transformation z =
ωX + ω0 ∈ Rn . An activation function is then defined as:

\sigma : \mathbb{R}^m \to \mathbb{R}^n, \qquad \sigma_k : \mathbb{R} \to \mathbb{R}, \quad k = 1, ..., n

\sigma(z) = \sigma(\omega X + \omega_0) = \left( \sigma_1(z), ..., \sigma_n(z) \right)

where the activation function σ(z) is (usually) applied element-wise.

Each neuron in each layer is constituted by an activation function σk and together


they form an n-dimensional layer (or more precisely each layer has its own number
of neurons nl but we leave this understood). Activation functions σ are generally
classified into one of the following types: (i) ridge functions (as described above)

where f (X) = g(ωX + ω0 ) for a non-linear ridge function g : R → R, (ii) radial


functions where f (X) = g(||X −c||) where c is the centre of the radial basis function
(e.g. for a radius r = ||X − c||), g : R → R is a non-linear function with || · || a
norm (usually Euclidean) or (iii) fold functions such as the softmax function. Fold
functions, such as softmax, are particularly important as they are often interpreted
formally or heuristically as probabilities relating to classification. Let σ : Rm → S
be a function, where S is the standard (m − 1)-simplex defined as:
S = \left\{ p \in \mathbb{R}^m \;\middle|\; p_i \geq 0, \ \forall i \in \{1, \ldots, m\}, \ \text{and} \ \sum_{i=1}^{m} p_i = 1 \right\}.    (D.4.5)

Each point within the simplex S represents a possible probability distribution over
m discrete outcomes, where pi is the probability of the i-th outcome. The function
σ transforms a vector z ∈ Rm into a vector p ∈ S such that each component of p,
denoted as pi , can be interpreted as the probability of the i-th outcome. Framed
in this way, the activation satisfies requirements for interpretation as a probability
distribution, namely (i) pi ≥ 0 (non-negativity) and (ii) \sum_{i=1}^{m} p_i = 1. An example
of an activation function often adopted as a proxy for probability is the softmax
function [278], being a type of fold function:

p_i = \frac{e^{z_i}}{\sum_{j=1}^{m} e^{z_j}}    (D.4.6)

which effectively converts a vector of real numbers into a probability distribution of


possible outcomes.
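A minimal NumPy implementation of equation (D.4.6) is as follows; subtracting the maximum is a standard numerical-stability device that leaves the output unchanged, since the softmax is invariant under constant shifts of z:

import numpy as np

def softmax(z):
    """Map z in R^m to a point in the probability simplex S (equation (D.4.6))."""
    e = np.exp(z - np.max(z))  # shift for numerical stability
    return e / e.sum()

p = softmax(np.array([2.0, 1.0, 0.1]))
print(p, p.sum())  # components are non-negative and sum to 1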

D.4.2.2 Layers in neural networks

As discussed above, clustering or layering of neurons is a hallmark of neural net-


work architecture. While fully-connected feed-forward networks (where each neuron
in each layer is forward-connected to each other) exhibit relatively symmetric struc-
ture, not all networks are as simply described given concurrent parallel networks,
or common practices such as residual connections that may connect initial layers to
layers further down the hierarchy directly rather than through intermediate layers.
Nevertheless, for exposition we consider in essence a fully-connected feedforward
network. To do so, we describe a canonical multi-layer perceptron model of a feed-
forward network (directly related to one of the candidate models used in Chapter
6, namely the fully-connected model). In the following we denote nl the number of
neurons for layer l.

Definition D.4.4 (Feed-forward neural network). A feed-forward neural network


(also known as a multi-layer perceptron network) consists of multiple layers a^{(l)} of
neurons a_i^{(l)} such that each neuron in each layer (other than the first input layer) is
a (composition) function of each neuron in the preceding layer. Formally:

a_i^{(l)} = \sigma_i^l \left( \sum_{j=1}^{n_{l-1}} \omega_{ij}^{(l)} a_j^{(l-1)} + \omega_{i0}^{(l)} \right) = \sigma_i^l \left( \sum_{j=0}^{n_{l-1}} \omega_{ij}^{(l)} a_j^{(l-1)} \right)    (D.4.7)

where l indexes each layer and i each neuron a_i^{(l)} in that layer, valued by an activation
function \sigma_i^{(l)} for that neuron in that layer (definition D.4.3), which is itself a function
of the sum of weight matrices \omega_{ij}^{(l)} applied to the output of each neuron of the previous
layer a_j^{(l-1)}, together with a bias term \omega_{i0}^{(l)}.
Sometimes the literature just uses a_i^{(l)} or \sigma_i^{(l)} for neurons, but the distinction
is usually maintained to emphasise the activation function as a function and a_i^{(l)}
as a neuron. Neurons are also occasionally referred to as ‘units’. Note that in
the right-most term above we have absorbed the bias ωi0 into the summation for
convenience (which can be achieved by considering the zeroth neuron a_0^{(l-1)} in layer l − 1
as the identity). Following the classification of networks above, a fully-connected
neural network with layers indexed from l = 0, ..., L can then be formally described
below. For convenience in the following, we denote the linear argument of activation
functions as follows:

z_i^{(l)} = \sum_{j=0}^{n_{l-1}} \omega_{ij}^{(l)} a_j^{(l-1)}    (D.4.8)

which indicates how the jth neuron in the previous layer l − 1 feeds into the ith
neuron of the lth layer, weighted by the corresponding weight \omega_{ij}^{(l)}. We include a
diagram depicting the fully-connected feed forward neural network in Fig. D.1.
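Before proceeding, the following minimal NumPy sketch of the forward pass implements equations (D.4.7)-(D.4.8), absorbing the bias via a constant zeroth input as described above (the shapes and initialisation are illustrative only):

import numpy as np

def forward(x, weights, sigma=np.tanh):
    """Forward pass through a fully-connected feed-forward network.
    Each layer computes z^(l) = W^(l) a^(l-1) (bias absorbed, eq. (D.4.8))
    and a^(l) = sigma(z^(l)) (equation (D.4.7))."""
    a = np.asarray(x)
    for W in weights:
        a = np.append(1.0, a)  # a_0^(l-1) = 1 carries the bias term
        a = sigma(W @ a)       # element-wise activation
    return a

# A 3-2-1 network with randomly initialised weights (bias column included)
rng = np.random.default_rng(0)
weights = [rng.normal(size=(2, 4)), rng.normal(size=(1, 3))]
print(forward(np.array([0.1, -0.2, 0.4]), weights))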

D.4.2.3 Neural network schema

A schema of the toy model is set out in Figure D.1:


1. Input layer. The input layer is represented by feature data X = (x1 , ..., xm ) ∈
Rm . For the input layer (l = 0), we can just regard the activation function as
the identity, a_j^{(0)} = x_j for the j-th input (j = 1, ..., n0 ).

2. Hidden layer. Here the layers indexed by l comprise neurons indexed by i


represented by their activation function a_i^{(l)}. A layer l comprises nl neurons.
For a fully-connected feed-forward layer, this means all neurons in the imme-
diately preceding layer are inputs into each activation function (neuron) of the
immediately subsequent layer:

a^{(l)} = (a_1^{(l)}, ..., a_{n_l}^{(l)})    (D.4.9)

using equation (D.4.7) above.

3. Output layer. The output layer a^{(L)} is then chosen to accord with the problem
at hand, so for classification problems it may be either a binary classification
or link-function (such as a logit function) giving an outcome σl ∈ [0, 1] inter-
pretable as a probability (e.g. \sigma_i^{(L)} where, for example, to classify K objects we
would have i = 1, ..., K). The overall output of the network is the estimator
Ŷ , denoted as Ŷ = (Ŷ1 , Ŷ2 , ..., ŶnL ), where nL is the number of neurons in the
layer:

f(X) = \hat{Y} = a^{(L)} = \left( a_1^{(L)}, a_2^{(L)}, ..., a_{n_L}^{(L)} \right).    (D.4.10)

Sometimes in the literature the output layer may itself be subject to an additional
transformation (see for example [12] §11.6) but such transformations can always just
be cast as a final layer whose activation function is simply whatever the transfor-
mation is.
As Hastie et al. [12] note, the non-linearity of the activation functions σ expands
the size of the class of candidate functions F that can be learnt. Functionally, the
network can be regarded as a functional composition, with each layer representing a
function comprising linear inputs into nonlinear activations σ l applied sequentially.
We can regard the neural network as a functional composition among activation
functions with inputs X and outputs a(L) . While not the focus here, in geometric
contexts doing so can allow us to frame the network as maps among differentiable
manifolds. We also note that multilayer feed-forward networks are of particular
significance due to the universal approximation theorem [279] for learning arbitrary
functions (and other universal and function approximation results) with a quantum
analogue that functions can be represented to arbitrary error at exponential depth
(which is practically infeasible).

D.5 Optimisation methods

D.5.1 Optimisation and Gradient Descent


The learning component of machine learning is an optimisation procedure designed
to reduce excess risk via minimising empirical risk. The main optimisation algo-
rithm used across most machine learning involves some sort of stochastic gradient
descent whereby parameters θ of estimated functions fˆ(θ) are updated according
to a generalised directional derivative. In keeping with the geometric framing of
much of this work, we present a definition below. Stochastic gradient descent is a
type of gradient descent where only a randomly sampled subset of Dn (X, Y ) is used

[Figure D.1 appears here: a diagram of the first two layers of a fully-connected feed-forward network, with the corresponding matrix form a^{(1)} = \sigma(\omega^{(0)} a^{(0)} + \omega_0^{(0)}) and equation (D.4.7).]

Figure D.1: Schema of the first two layers of a fully-connected feed-forward neural network (definition D.4.4) together with associated matrix and algebraic representation. Here a_i^{(l)} are the input layer neurons and \omega_{ij}^{(l)} are the weights (absorbing bias terms) for neuron a_j^{(l)} (diagram adapted from [1]).

to train the model (in practice this involves a batch of randomly sampled training
data). Neural networks are parametrised by a set of unknown weights which are up-
datable (learnable) by means of updating according to an optimisation procedure.
We now discuss probably the most important optimisation technique in quantum
machine learning (as with classical), namely backpropagation based upon stochastic
gradient descent.

D.5.2 Backpropagation
Backpropagation [280–283] is a stochastic gradient descent-based method for opti-
mising model performance across neural network layers. For neural networks, the
most common approach is to use the method of gradient descent to update pa-
rameters, with the particular method by which this update is calculated being via
backpropagation equations. It represents, subject to a number of conditions, an effi-
cient way of performing gradient descent calculations in order to update parameters
ω of models fθ to achieve some objective, such as minimising empirical risk (equa-
tion (D.3.2)) (loss) by, having calculated an estimate fˆθ (X), propagating the error
δ ∼ |fˆθ −Y | through the network. Propagation here refers to the use of the calculated
error δ in updating parameters ω across network layers. Backpropagation consists
of two phases: (a) a forward pass, which involves calculating each layer’s activation
function a_i^{(l)} (see equation (D.4.7)); and (b) a backward pass where the backpropaga-
tion updates are calculated. From equation (D.3.5), assume a loss function denoted
generically by R̂n (ω) (in Chapters 3 and 4, we use variations of mean square error).

Firstly, for gradient descent recall the directional derivative (definition C.1.2) in differential
form for Riemannian and subRiemannian manifolds and the gradient (definition C.1.5).
Recall our network is a composition of (activation) functions parametrised by weight
tensors ω. For completeness, in this section: \omega_{ij}^{(l)} is the weight vector for neuron i
in layer l that weights neuron j in layer l − 1, a^{(l)} is layer l, a_i^{(l)} is the ith neuron in
layer l and nl is the number of neurons (units) in layer l. We define gradient descent
for optimisation using these definitions as follows.
Definition D.5.1 (Gradient descent optimisation). Optimisation by gradient de-
scent is defined as a mapping:

\omega_{ij}^{(l+1)} = \omega_{ij}^{(l)} - \gamma_l \sum_{k=1}^{N} \frac{\partial \hat{R}_k}{\partial \omega_{ij}^{(l)}} = \omega_{ij}^{(l)} - \gamma_l \sum_{k=1}^{N} \nabla_{\omega_{ij}^{(l)}} \hat{R}_k    (D.5.1)

where (a) R̂k are loss/error values for the kth data point in the training set Dn (X, Y ),
(b) \omega_{ij}^{(l)} denote the weights of the ith neuron for the lth layer weighting the jth neuron
of the l−1th layer and (c) γl the learning rate for that layer (which is usually constant
across layers and often networks).
Equation (D.5.1) updates each weight \omega_{ij}^{(l)} by reference to each example (Xi , Yi );
however, in practice a subsample of Dn is used, denoted a batch, such that R̂k is
an average over the batch size NB, i.e. (1/N_B) \sum_{k}^{N_B} \hat{R}_k (we omit the
summation below for brevity) (note we set k = i for consistency with the choice
of each data point (Xi , Yi )). Calculating \nabla_{\omega_{ij}^{(l)}} \hat{R}_i relies upon the chain rule. First,
consider how R̂i varies in the linear case of z_i^{(l)} (without applying the non-linear
activation function σ):

\frac{\partial R_i}{\partial \omega_{ij}^{(l)}} = \frac{\partial R_i}{\partial z_i^{(l)}} \frac{\partial z_i^{(l)}}{\partial \omega_{ij}^{(l)}}    (D.5.2)

The first of these terms is denoted the error, while the second term is shown to be
equivalent to the activation of the preceding layer:

\delta_i^{(l)} = \frac{\partial R_i}{\partial z_i^{(l)}}    (D.5.3)

\frac{\partial z_i^{(l)}}{\partial \omega_{ij}^{(l)}} = \frac{\partial}{\partial \omega_{ij}^{(l)}} \left( \sum_{\mu=0}^{n_{l-1}} \omega_{i\mu}^{(l)} a_\mu^{(l-1)} \right) = a_j^{(l-1)}    (D.5.4)

where the partial derivatives vanish in equation (D.5.4) for all but µ = j. The \delta_i^{(l)}
term in equation (D.5.3) is denoted the error. As we show below, \delta_i^{(l)} is dependent
upon errors in the l + 1th layer, hence the error terms propagate ‘backwards’, giving
the name backpropagation.

For the output layer, we note that \hat{R}_i = \hat{R}_i(\hat{Y}_i, Y_i) = \hat{R}_i(\sigma_i^{(L)}(z_i^{(L)}), Y), thus by
the chain rule:

\delta_i^{(L)} = \frac{\partial \hat{R}_i}{\partial z_i^{(L)}} = \frac{\partial \hat{R}_i}{\partial \sigma_i^{(L)}} \frac{\partial \sigma_i^{(L)}}{\partial z_i^{(L)}} = \frac{\partial \hat{R}_i}{\partial \sigma_i^{(L)}} \sigma_i'(z_i^{(L)})    (D.5.5)

and:

\frac{\partial R_i}{\partial \omega_{ij}^{(l)}} = \delta_i^{(l)} a_j^{(l-1)}.    (D.5.6)

For hidden layers l < L we have:


\delta_i^{(l)} = \frac{\partial \hat{R}_i}{\partial z_i^{(l)}} = \sum_{\mu=1}^{n_{l+1}} \frac{\partial \hat{R}_i}{\partial z_\mu^{(l+1)}} \frac{\partial z_\mu^{(l+1)}}{\partial z_i^{(l)}}    (D.5.7)
             = \sum_{\mu=1}^{n_{l+1}} \delta_\mu^{(l+1)} \frac{\partial z_\mu^{(l+1)}}{\partial z_i^{(l)}}.    (D.5.8)

Noting equation (D.4.8), then:

z_\mu^{(l+1)} = \sum_{i=0}^{n_l} \omega_{i\mu}^{(l+1)} \sigma_i^l(z_i^{(l)})    (D.5.9)

\frac{\partial z_\mu^{(l+1)}}{\partial z_i^{(l)}} = \omega_{i\mu}^{(l+1)} \sigma_i^{l\,\prime}(z_i^{(l)})    (D.5.10)

resolves to the backpropagation formula:


\delta_i^{(l)} = \sigma_i^{l\,\prime}(z_i^{(l)}) \sum_{\mu=1}^{n_{l+1}} \omega_{i\mu}^{(l+1)} \delta_\mu^{(l+1)}    (D.5.11)

with:

\frac{\partial \hat{R}_i}{\partial \omega_{ij}^{(l)}} = \nabla_{\omega_{ij}^{(l)}} \hat{R}_i = \sigma_i^{l\,\prime}(z_i^{(l)}) a_j^{(l-1)} \sum_{\mu=1}^{n_{l+1}} \omega_{i\mu}^{(l+1)} \delta_\mu^{(l+1)}    (D.5.12)

We can see from equation (D.5.12) that the error \delta_i^{(l)} for layer l is dependent on errors
in the (l+1)th layer, i.e. \delta_\mu^{(l+1)}. In this sense, errors propagate backwards from the final
output layer to the first layer. Backpropagation thus relies firstly on the forward
pass, in which the estimates fˆk (X) are computed, allowing computation of the \delta_k^{(L)} (errors
based on outputs and labels) (the forward phase), following which they are ‘back-
propagated’ throughout the network via the backpropagation equations (D.5.11)
(the ‘backward phase’). Error terms for layer l − 1 are calculated by weighting via
\omega_{i\mu}^{(l+1)} the error terms for the next layer \delta_\mu^{(l+1)} and then scaling via \sigma_i^{l\,\prime}(z_i^{(l)}). Doing so

allows computation of the gradients in equation (D.5.2). A training epoch is then one
full round of forward and backward passes. Note that equation (D.5.1) sums over the
training set, referred to as batch learning (the most common case). Alternatively, one
can perform backpropagation point-wise based on single observations (Xi , Yi ) ∈ Dn .
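To make the forward and backward phases concrete, the following is a minimal NumPy sketch of a single training step for a two-layer network with tanh hidden units, a linear output and squared-error loss (biases are omitted for brevity; the architecture is illustrative, not that used in later chapters):

import numpy as np

def train_step(x, y, W1, W2, gamma=0.1):
    """One forward and one backward pass for a two-layer network with
    R = 0.5 * ||a2 - y||^2. Returns the updated weight matrices."""
    # Forward pass (equation (D.4.7))
    z1 = W1 @ x
    a1 = np.tanh(z1)
    a2 = W2 @ a1                               # identity output activation
    # Backward pass: output error, then propagate back (equation (D.5.11))
    delta2 = a2 - y                            # dR/dz2 for a linear output
    delta1 = (1 - a1 ** 2) * (W2.T @ delta2)   # sigma'(z1) * (W^T delta)
    # Gradient-descent update (equations (D.5.1), (D.5.12))
    W2 = W2 - gamma * np.outer(delta2, a1)
    W1 = W1 - gamma * np.outer(delta1, x)
    return W1, W2

rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(4, 3)), rng.normal(size=(1, 4))
x, y = np.array([0.5, -0.3, 0.8]), np.array([1.0])
for _ in range(100):
    W1, W2 = train_step(x, y, W1, W2)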
Quantum backpropagation also brings with it a number of subtleties arising be-
cause of the effect of measurement on quantum states. Thus the classical analogue
of backpropagation does not carry over one-for-one to the quantum case. In many
cases, models learn classically parametrised quantum circuit features via implement-
ing offline classical backpropagation (as per above) conditional upon measurement
statistics. Examples of quantum-related backpropagation proposals include, for ex-
ample, a series of papers by Verdon et al. [71–74, 88, 251] where continuous param-
eters are quantised and encoded in a superposition of quantum registers enabling a
backpropagation-style algorithm. Other examples, including quantum neural net-
work proposals [70, 284, 285] provide automatic differentiation as a means of propa-
gating a form of automatic measurement that is then propagated through a quantum
analogue of a fully-connected network. In each case, one must remember that the
optimisation strategy relies upon measurement which (as per definition A.1.34) is
an inherently quantum-to-classical channel.

D.5.3 Natural gradients


We briefly mention the concept also of natural gradients here, drawn primarily from
geometric information theory [286,287] as a means of connecting with the formalism
of Appendix B. As noted in the literature [28, 287], local gradient descent based
optimisation depends on the choice of local parameters. Given (in the scalar case)
two parameterisations η and θ, with initialisations θ0 = θ(η0 ) and η0 = η(θ0 ), we will
usually have η(θ) ̸= θ(η), such that the path and stationary endpoint of gradient descent
in parameter space may be different (using L as our loss function to avoid confusion
with R below):

\theta_{t+1} = \theta_t - \alpha_t \nabla_\theta L_\theta(\theta_t) \;\neq\; \eta_{t+1} = \eta_t - \alpha_t \nabla_\eta L_\eta(\eta_t)    (D.5.13)

referring to a generic smooth loss function L ∈ C ∞ (M). Here αt is step-size in the


gradient descent function. The solution to this problem is to select a natural gradient
[288] which provides a directional derivative in the intrinsically steepest direction
with respect to the (Riemann) metric tensor. In information theory, parameter space
can be modelled in terms of a differentiable manifold where updating parameters
corresponds to routes or curves on the manifold. In this case, points on the manifold
p ∈ M have a representation, for example, as a set of parameters (say a vector)
associated with that point i.e. θp . Stochastic gradient descent on Riemannian

manifolds (M, g) [289], denoted by ∇M , utilises the exponential map (definition


B.2.13) exp : Tp M → M for its update protocol such that:

p_{t+1} = \exp_{p_t}(-\alpha_t \nabla_{\mathcal{M}} L(p_t)), \qquad \nabla_{\mathcal{M}} L(p) = \nabla_v \left( L(\exp_p(v)) \right) \big|_{v=0}    (D.5.14)

where ∇v is the directional derivative given in equation C.1.2 and α is step size (see
section D.5.4). Because calculating the exponential term in equation (D.5.14) can
be difficult, instead mapping to the parameter bundle occurs via a first-order Taylor
expansion of the exponential Euclidean retraction R (adapted from [287]):

θt+1 = Rθt (−αt ∇θ Lθ (θt )) (D.5.15)

which shifts p by some infinitesimal amount. Natural gradient descent is then given
by:

θt+1 = θt − αt gθ−1 (θt )∇θ Lθ (θt ) (D.5.16)

where the natural gradient ∇N is defined as:

∇N Lθ (θ) := gθ−1 (θ)∇θ Lθ (θ). (D.5.17)

Here g_\theta^{-1}(\theta) is the dual (inverse) Riemannian metric. Note that the presence of g^{-1}
indicates that the natural gradient transforms contravariantly. The benefit of the natural gradient is that it is
invariant under invertible smooth coordinate transformations (reparametrisations), though there are some subtleties regard-
ing convergence discussed in the literature [286, 288, 289]. The quantum analogue of
natural gradient descent is briefly touched upon in Appendix D as a technique for
optimisation.
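A minimal sketch of the natural-gradient update in equation (D.5.16), assuming a callable returning the metric tensor g(θ) (e.g. a Fisher information estimate), is as follows; with the Euclidean metric g = I it reduces to ordinary gradient descent:

import numpy as np

def natural_gradient_step(theta, grad, metric, alpha=0.1):
    """One natural-gradient update (equation (D.5.16)):
    theta <- theta - alpha * g^{-1}(theta) grad L(theta).
    Solving the linear system avoids forming g^{-1} explicitly."""
    g = metric(theta)
    nat_grad = np.linalg.solve(g, grad)  # g^{-1} grad: the natural gradient
    return theta - alpha * nat_grad

# With the Euclidean metric g = I this is ordinary gradient descent.
theta = np.array([1.0, 2.0])
grad = np.array([0.4, -0.2])
print(natural_gradient_step(theta, grad, lambda th: np.eye(2)))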

D.5.4 Regularisation and Hyperparameter tuning

As the abstract definition of neural networks in section D.4.2 above indicates, there
is a wide combinatorial landscape of parameters for the design of neural network
architectures, including topology, choices of activation functions, loss functions, reg-
ularisation, dropout and so on. Hyperparameters are settings of a neural network
model which are not adapted or updated by the learning protocol of the model in
a direct sense (though model performance may inform ancillary hyperparametric
models). In effect changing the hyperparameter changes the neural network model
itself. We set out briefly a few such parameters, techniques and hyperparameters
used in the results detailed in later chapters.

(i) Learning rate. The learning rate parameter γr (or gain in adaptive learning)
in equation (D.5.1) represents the step size at each iteration. Mathematically
it represents a scaling of the (directional derivative) ∇ω R and is among the
most important hyperparameters for network performance.

(ii) Epochs. The number of epochs (each a full forward and backward pass of the
backpropagation algorithm in equation (D.5.11)) over which training occurs is a tunable pa-
rameter of neural networks. In principle the number of epochs can affect the
sensitivity of weights to training data and can risk overfitting such
that models do not generalise well. Conversely, insufficient epochs may not
sufficiently minimise empirical risk (definition D.3.2).

(iii) Regularisation. As flagged above, regularisation is a means of seeking to min-


imise variance, and thus overfitting, of models trained on in-sample training
data where inference (i.e. predicting / inferring label outputs Y ) is undertaken
on out-of-sample feature data X.

(iv) Generalisation. Generalisation refers to a statistical measure of risk that cap-


tures the out-of-sample performance of a model (see discussion above).

Finally, we briefly note that while statistical learning theory provides a generalised
framework for analysing model performance, deep learning networks often exhibit
singularities i.e. they are singular models. The fact that over-parametrised deep
learning models can generalise well runs counter somewhat to the implied trade-
off between model complexity and performance at the heart of statistical learning
theories [290]. In such cases, alternative approaches such as singular learning the-
ory provide means of estimating appropriate statistical learning figures of merit for
models (see seminal work by Watanabe [291] generally and [292] for more detail).

D.6 Quantum Machine Learning


In this section we consider how the principles of classical statistical learning and
classical machine learning algorithms are related to the hybrid quantum-classical
models explored later on in this work. In line with our taxonomy focusing on data,
models and metrics (measures), we start with a brief synopsis of how data (quan-
tum or classical) is actually encoded for use in quantum machine learning scenarios.
In this section, we cover a number of important differences between classical and
quantum machine learning. We only sketch a few germane points related to the ob-
jectives of our work. As we discuss in Chapter 3, quantum machine learning is now
a vast and variegated discipline combining elements of quantum computation and
information processing, statistical learning theory, communication theory and other

areas. The unifying factor behind quantum machine learning is the use of quantum
data or quantum information processes in ways that enable, constitute, or rely upon
some type of learning protocol as measured or indicated by relevant metrics, such
as in-sample performance, variance, accuracy and so on. The spectrum of quan-
tum machine learning algorithms is expansive, covering quantum neural networks,
parametrised (variational) quantum circuits, quantum support vector machines and
other formulations [35]. Similarly, the application of deep learning principles to
quantum algorithm design, such as via quantum neural networks (see [64, 70] for
an overview), quantum deep learning architectures (including quantum analogues
of graph neural networks, convolutional neural networks and others [71, 72, 74, 88])
speaks to the diversity of approaches now studied in the field. Coupled with these
approaches, emerging trends in the use of symmetry techniques, such as dynamical
Lie algebraic neural networks, symmetry-preserving learning protocols and geomet-
ric quantum machine learning approaches (discussed in particular in section D.8.3)
offer new methods for improving on problem-solving in the quantum and classical
domain. Much literature is also devoted to understanding the differences, some
subtle, some less-so, between classical and quantum machine learning both at the
theoretical level of statistical learning, down to the practical implementation of al-
gorithms in each case (examples include literature in quantum algorithm design and
complexity theory [293]). In this section, we provide a short high-level summary of
some of the key features of quantum machine learning relevant to Chapters 3 and
4, with a particular focus on variational or parametrised quantum machine learning
circuits. First-off, we consider a few key principle differences between classical and
quantum approaches in machine learning, starting with the fundamental distinction
between the need for quantum channels to preserve unitarity (definition A.1.30)
versus the dissipative nature of classical neural networks [294].

D.6.1 Neural networks and quantum systems

As Schuld et al. note [295], the challenge for adapting classical neural network archi-
tecture to the quantum realm in part lies in structural distinctions between classical
and quantum computing. Classical neural network architectures are fundamentally
based upon non-linear and often dissipative dynamics, as distinct from the linear
unitary dynamics of quantum computing. Classical neural networks, operations
such as activation functions and normalisation can lead to a loss of information (e.g.
applying a ReLU activation [296]) can set negative values to zero, constituting infor-
mation loss. Classical neural networks also are often irreversible (such as pooling or
other convolutional network operations) such the network is not, functionally speak-
ing, locally bijective between layers. Quantum information processing, by contrast,

is constituted by unitary operations on H which preserve the axiomatic linearity of


quantum state space (i.e. that a quantum system remains a linear combination (su-
perposition) of basis states \sum_j a_j |\psi\rangle_j following unitary evolution). Furthermore,
unitary evolution is information-preserving in that all unitary operations are in
principle reversible (ensuring, among other things, conservation of total probabil-
ity). Thus from the onset, there are key distinctions to be resolved and bridged for
neural network architecture and quantum information processing to be synthesised.
Early progenitors of quantum machine learning architectures adapting princi-
ples from neural networks include Hopfield networks [75], being a (directed) graph
network where weights are indexed by i and the neurons by j. Hopfield networks
usefully illustrate the differences between classical and quantum networks. Hopfield
networks exhibit associative (content-addressable) memory, which allows retrieval of
the network state from P stored network states

X^P = \{(x_1^1, ..., x_N^1), ..., (x_1^P, ..., x_N^P)\}.

Associative memory therefore allows computation using incomplete input informa-


tion instead of the pattern’s exact storage address in RAM (basis of human mem-
ory and learning). Modern day neural networks represent functional compositions
of non-linear equations. Early forms of neurons included perceptrons (sometimes
denoted binary neurons or linear threshold functions) such as the McCulloch-Pitt
neuron (see [100, 294] for a review of the early history of the field). Fundamen-
tally, as noted in the literature, if quantum neural networks or machine learning
circuits merely induced measurements in order to update the parameters, then they
would proceed classically and in effect destroy (due to measurement collapse (ax-
iom A.1.5) [294]) the underlying quantum properties of the data. Early techniques
for addressing this implicit challenge included leveraging quantum measurement
to select system eigenstates [297], the use of dissipative quantum systems (being
revisited now in light of thermodynamic hybrid quantum computing), encoding pa-
rameter evolution in quantum terms [298], use of entanglement [299], teleportation
and back-action based approaches [300].

D.6.2 Quantum statistical learning

Quantum learning tasks are similarly contingent upon the particular functional
forms to be learnt. For example, [301] consider a class of learning problems con-
cerned with learning the function:

f (x) = tr(OE(|x⟩ ⟨x|)) (D.6.1)



where O represents a typical known observable (operator) and E represents a com-


pletely positive trace preserving map i.e. a channel representing physical (possibly
quantum) state evolution. Equation (D.6.1) is generic, representing classical inputs
mapped to real numbers (e.g. results of Hermitian measurement operators O). Ob-
servation (measurement) constitutes an interaction between an experiment and the
physical (quantum) system E, which can be thought of as querying E by way of
O. As with classical, the learning problem is to estimate a function h(x) ≈ f (x)
while minimising statistical and empirical risk, but with the added constraint of
minimising the number of queries on E.

The model in [301] is in many ways generic and reflective of many QML archi-
tectures, reflecting the fact that information extraction for updating protocols in
effect represents a quantum-classical channel (definition A.1.31) via measurement
or in-built trace operations. A comparison with quantum control is again useful in
this regard. For hybrid classical-quantum structures, empirical risk minimisation is
with respect to a classical representation f (x) (e.g. contingent upon measurement
outcomes m given measurement operator M ) and typically a classical loss function
(e.g. fidelity measures, quantum Shannon entropy or some other function of input
and output state representations). Physical intuition is useful here: classical gradi-
ent descent then enters into Hamiltonian H governing the evolution (and transition)
of the quantum system via controls (e.g. coefficients ci that affect generator ampli-
tudes). Other results in quantum-related statistical learning include bounds on the
expressibility of variational quantum circuits [302–306].

The limits of quantum machine learning algorithms from a statistical learning per-
spective have been examined throughout the literature [307]. For example, it is
shown in [307] that challenges persist for quantum machine learning algorithms
with polylogarithmic time complexity in input dimensions. In this situation, the
statistical error of the QML algorithm is polynomially dependent upon the number
of samples in the training set, such that the best error rate ε is achievable only if
statistical and approximation error scale equivalently, which in turn requires approx-
imation error to scale polynomially with the number of samples. For logarithmic
dependency on sampling, it is shown that the additive sampling error from mea-
surement further introduces a polynomial dependency on the number of samples.
Such constraints affect the viability of whether certain popular or mooted QML al-
gorithms may in fact provide any advantage over classical analogues. Thus in many
cases proposed QML algorithms face challenges posed by barren plateaus, entan-
glement growth and statistical learning barriers. Thus there is, as with classical
machine learning, a need to consider how algorithms may (if at all) be architected
in order to solve such scaling challenges. Doing so is one of the motivations for

this work and its exploration of techniques from geometry, topology and symmetry
algebras. As discussed in the introduction, in a machine learning context, the ability
to encode prior or known information offers one route to such efficiency, hence our
synthesis of what we denote (and further articulate) as greybox QML together with
geometric QML.

D.6.3 Quantum measurement and machine learning


As discussed in Appendix A, quantum systems are constructed on the basis of un-
derlying measurement instruments and their accompanying theory. In quantum
information processing, measurement is an integral part of the evolution and charac-
terisation of systems as distinct from classical theory. Recall from section A.1.6 that
measurement can be considered as a quantum-classical channel (definition A.1.31)
and equation (A.1.17) that stochastically maps from states ρ to a classical register
(of measurement outcomes) Σ. As per the measurement axioms of quantum me-
chanics, the effect of measurement is to collapse a state ρ into an eigenvalue of the
relevant measurement operator Mm (axiom A.1.5).
Thus the question arises as to how machine learning models of quantum phe-
nomena, trained on stochastic quantum measurement data (such as reconstructions
of unitaries from measurement statistics) ought to integrate such stochasticity and
measurement practices into their algorithms. To begin with, it helps to identify
key structural differences between the classical and quantum measurement and how
these differences may impact learning protocols to be adopted:

(i) Probabilistic outcomes. Quantum measurements are inherently probabilistic


(axiom A.1.4) such that measurements (of outputs) will differ given identical
inputs, but moreover it may be unclear what intermediate quantum evolution
has occurred. Classical machine learning is no stranger to handling uncer-
tainty or probabilistic characteristics of computing, however quantum com-
puting presents distinct challenges given the nature of quantum measurement.

(ii) Wave function collapse. Unlike the classical case, quantum measurement de-
coheres the state into eigenstates of the measurement operator Mm ; this
decohering process of quantum measurement channels leads to information loss,
such that dynamic updating becomes problematic (which is also the reason
online control is problematic).

(iii) No cloning and non-repeatability. No cloning theorems applicable in quantum


information (theorem A.1.39) mean that classical machine learning methods
relying upon copying data (especially during training versus simple initial state
preparation) may be unavailable.

Our models of quantum simulation specifically incorporate mea-
surement data into the QDataSet in Chapter 3. By contrast, in Chapter 4, we train
a greybox neural network to generate approximate geodesics. The network is trained
on geodesic sequences generated from horizontal distributions ∆. Our target (label)
data in that case are unitary targets UT , but the construction of these ultimately
depends upon measurement, which we assume has been undertaken (we do this in
order to focus upon the key objectives of modelling time optimal paths rather than
tomographical reconstruction). We discuss a few of these issues below in the context
of parameter shift rules.

D.6.4 Barren Plateaus

Barren plateaus are a phenomenon related to the training of QML algorithms using


gradient-based methods. Early papers on quantum algorithm-related barren plateaus
[78] identified phenomena where the variance of a gradient expectation was shown
to decrease exponentially in the number of qubits. In this sense they are somewhat
analogous to the classical vanishing gradient problem [14, 124] albeit with specific
differences (see [79] for a comparison). The vanishing variance of the gradient is
known as a barren plateau, reflecting the geometric intuition whereby the process
of gradient descent (seeking local or global minima) finds itself in a sparse or ‘flat’
(zero-curvature) region. This is in line with the principle underlying the di-
rectional derivative (see definition C.1.2 and gradients C.1.5) i.e. that generally, zero
curvature implies a form of (pseudo)-isotropy where there is no necessary preferred
direction of steepest descent, thus confounding the gradient descent protocol. Since
the original paper, a plethora of studies [308–311] have examined conditions under
which barren plateaus may or may not arise, together with strategies for preprocess-
ing or tuning initial conditions (such as parameter initialisation strategies θ [312] or
Bayesian methods [313]) to mitigate or even side-step such issues. Other examples
using control paradigms include [314] which leverage Lie algebras encoded within
certain ansatzes (including popular Hamiltonian variational ansatzes [315]) to show
guarantees for avoiding barren plateaus using such ansatzes may not exist, or show-
ing noise-induced barren plateaus for variational quantum algorithms [316]. Con-
versely, for certain classes of QML algorithms such as quantum convolutional neural
networks, barren plateaus may be absent assuming the architecture is appropriately
chosen [317]. Examples or strategies to address barren plateaus include using adap-
tive quantum natural gradients for greater stability and optimised descent [318], the
use of parameter initialisation [319] (a state efficient ansatz) ground state prepara-
tion, the use of Gaussian parameter initialisations [320] or even category-theoretic
methods [321]. Experimentally, we found, when tuning our quantum deep learning

architectures in Chapters 3 and 4, initialising parameters as per [124] worked effec-


tively to avoid such issues. Barren plateaus are particularly relevant for researchers
designing QML algorithms.

D.6.5 Encoding data in quantum systems

In this section, we examine protocols for encoding information in quantum algo-


rithms used in quantum machine learning. Most quantum information optimisation
problems involve information encoded in quantum systems, either by construction
in an experiment involving quantum systems themselves, or via encoding exogenous
or classical information into quantum systems (such as qubits) in order to lever-
age the benefits of quantum computation. Both approaches involve the input into
quantum states in a process known as state preparation. The way in which data
is encoded in quantum systems affects the performance and expressiveness of many
quantum algorithms [64]. Information is usually encoded using one of four stan-
dard encoding methods including: (a) basis encoding, (b) amplitude encoding, (c)
qsample encoding and (d) dynamic encoding. The first of these, basis encoding,
is a technique that encodes classical information into quantum basis states. Usu-
ally, such procedures involve transformation of data into classical binary bit-strings
(x1 , ..., xd ), xi ∈ {0, 1} then mapping each bit string to the quantum basis state of a
set of qubits of a composite system. For example, x ∈ RN , say a set of decimals,
is converted into a d-dimensional bit string (e.g. 0.1 → 00001..., −0.6 → 11001...),
suitably normalised such that x = \sum_{k}^{d} (1/2^k) x_k. The sequence x is given a repre-
sentation via |ψ⟩ ∝ |000001 11001⟩ (see [111]). Amplitude encoding associates nor-
malised classical information, e.g. for an n-qubit system (with 2^n different possible
(basis) states |j⟩), a normalised classical sequence x \in \mathbb{C}^{2^n}, \sum_k |x_k|^2 = 1 (possibly
with only real parts), with quantum amplitudes x = (x_1, ..., x_{2^n}), encoded as
|\psi_x\rangle = \sum_{j}^{2^n} x_j |j\rangle. Other examples of sample-based encoding (e.g. Qsample and dy-
namic encoding are also relevant but not addressed here). From a classical machine
learning perspective, such encoding regimes also enable both features and labels to
be encoded into quantum systems.
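The following minimal NumPy sketch illustrates basis and amplitude encoding as described above (the function names are illustrative; in practice these encodings are implemented as state-preparation circuits):

import numpy as np

def basis_encode(bits):
    """Basis encoding: map a classical bit string to the corresponding
    computational basis state of len(bits) qubits, e.g. [1, 0] -> |10>."""
    index = int("".join(str(b) for b in bits), 2)
    psi = np.zeros(2 ** len(bits), dtype=complex)
    psi[index] = 1.0
    return psi

def amplitude_encode(x):
    """Amplitude encoding: normalise x so that sum |x_k|^2 = 1 and use the
    entries as amplitudes |psi_x> = sum_j x_j |j>."""
    x = np.asarray(x, dtype=complex)
    return x / np.linalg.norm(x)

print(basis_encode([1, 0]))           # |10> = (0, 0, 1, 0)
print(amplitude_encode([3.0, 4.0]))   # (0.6, 0.8): a valid single-qubit state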
In Chapters 3 and 4, we assume the standard orthonormal computational basis
{|0⟩ , |1⟩} such that ⟨1|0⟩ = ⟨0|1⟩ = 0 and ⟨1|1⟩ = ⟨0|0⟩ = 1. Quantum states
encode information of interest and use in optimisation problems. They are not
directly observable, but rather their structure must be reconstructed from known
information about the system. In machine learning contexts, quantum states may be
used as inputs, constituent elements in intermediate computations or label (output)
data. For example, in the QDataset (explored in Chapter 3), intermediate quantum
states at any time step may be reconstructed using the intermediate Hamiltonians

and unitaries for each example. The code repository for the QDataSet simulation
provides further detail on how quantum state representations are used to generate
the QDataSet [171]. Depending on machine learning architecture, quantum states
will usually be represented as matrices or tensors and may be used as inputs (for
example, flattened), label data or as an intermediate input, such as in intermediate
layers within a hybrid classical-quantum neural network (see [40, 95]). For example, consider the matrix representation of the Pauli $\sigma_z$ operator and its eigenstates below:
$$\sigma_z = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} \tag{D.6.2}$$
In the computational basis, this operator has two eigenstates $|0\rangle, |1\rangle$ for eigenvalues $\lambda = 1, -1$:
$$|0\rangle = \begin{pmatrix} 1 \\ 0 \end{pmatrix} \;\text{for}\; \lambda = 1 \qquad |1\rangle = \begin{pmatrix} 0 \\ 1 \end{pmatrix} \;\text{for}\; \lambda = -1 \tag{D.6.3}$$

where we have adopted the formalism that the $\lambda = 1$ eigenstate is represented by $|0\rangle$ and the $\lambda = -1$ eigenstate is represented by $|1\rangle$ (our choice is consistent with QuTiP; practitioners should check the platforms they are using for the choice of representation). These eigenstates have a density operator representation as:
$$\rho_{\lambda=1} = |0\rangle\langle 0| \qquad \rho_{\lambda=-1} = |1\rangle\langle 1| \tag{D.6.4}$$
with matrix representations:
$$|0\rangle\langle 0| = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} \qquad |1\rangle\langle 1| = \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix}. \tag{D.6.5}$$

Each element $|a\rangle\langle b| = E_{a,b}$ can be considered as an operator basis representation for the relevant density operators (not to be confused with root system vectors in representation theory below). For machine learning practitioners, one way to think about density operators (see definition A.1.24) is by associating rows and columns with ket and bra vector representations:
$$\rho = a|0\rangle\langle 0| + b|0\rangle\langle 1| + c|1\rangle\langle 0| + d|1\rangle\langle 1| \;\dot{=}\; \begin{pmatrix} a & b \\ c & d \end{pmatrix} \tag{D.6.6}$$
where rows index the kets $|0\rangle, |1\rangle$, columns index the bras $\langle 0|, \langle 1|$, and $a, b, c, d \in \mathbb{C}$ are the respective complex-valued amplitudes. Given $\rho = \sum_i p_i \rho_i$, the diagonal elements $a_n$ of the density matrix describe the probability $p_n$ of the system residing in state $\rho_n$, that is
$$\rho_{nn} = a_n a_n^* = p_n \geq 0 \tag{D.6.7}$$

For pure states, the diagonal of the density matrix will have only one non-zero element (equal to 1), so that $\rho = \rho_i$. A mixed state will have multiple entries along the diagonal such that $0 \leq a_n < 1$. For example, the $\sigma_z$ eigenvectors have the representation:
$$|0\rangle\langle 0| \;\dot{=}\; \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} \qquad |1\rangle\langle 1| \;\dot{=}\; \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix} \tag{D.6.8}$$
with rows and columns labelled by $|0\rangle, |1\rangle$ and $\langle 0|, \langle 1|$ as above.

Sometimes the density matrix representation of a state will be equivalent to the outer product of the state, but caution should be applied as this is not generally the case. Translating between the nomenclature and symbolism of quantum information and the more familiar matrix representations used in machine learning assists machine learning researchers in developing their algorithmic architecture. For example, the QDataSet simulation code utilises state space representations of data and operations thereon in order to generate the output contained in the datasets themselves. To recover a quantum state $|\psi(t_j)\rangle = U(t_j)|\psi_0\rangle \approx \prod_i U_i |\psi_0\rangle$, one can apply the sequence $U_i$ up to $i = j$ (note the order of application is such that $U_j \cdots U_0 |\psi\rangle$).
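As an illustration of this left-action ordering, the following is a minimal sketch (illustrative names; not the QDataSet repository code) of applying a sequence of per-step unitaries to an initial state:

```python
import numpy as np

def evolve(psi0, unitaries, j):
    """Apply U_j ... U_0 to |psi_0>: later unitaries act last (leftmost)."""
    psi = psi0
    for U in unitaries[: j + 1]:  # U_0 applied first, U_j applied last
        psi = U @ psi
    return psi

# Example: two small rotations about z for a single qubit.
sz = np.diag([1.0, -1.0]).astype(complex)
dt = 0.1
U_step = np.diag(np.exp(-1j * np.diag(sz) * dt))  # exp(-i sz dt); sz is diagonal
psi0 = np.array([1.0, 0.0], dtype=complex)        # |0>
psi_t = evolve(psi0, [U_step, U_step], j=1)       # |psi(t_1)> = U_1 U_0 |0>
assert np.isclose(np.linalg.norm(psi_t), 1.0)     # unitarity preserves the norm
```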

D.7 Variational quantum circuits

In this section, we sketch the general use of variational quantum algorithms adopted
in Chapters 3 and 4. As Schuld et al. [64] note, variational quantum circuits can
be interpreted as both deterministic and probabilistic machine learning models for
functions fθ : X → Y (or learning a probability distribution Pθ ). As noted in
the literature, quantum circuits parametrised by θ which are updated according
to some objective (cost) function as part of an optimisation process can in some
sense naturally be regarded as akin to machine learning models [65, 80]. Firstly, we
define a parametrised quantum circuit in terms of a unitary U (θ) dependent upon
parameters θ ∈ Rm .

Definition D.7.1 (Parametrised quantum circuits). A parametrised unitary operator is a unitary operator $U(\theta, t)$ acting on a Hilbert space $\mathcal{H}$, parametrised by a set of parameters $\theta \in \mathbb{R}^m$. A parametrised quantum circuit is then a sequence of unitary operations:
$$U(\theta, t) = \mathcal{T}_+ \exp\left(-i \int_0^T H(\theta, t')\, dt'\right). \tag{D.7.1}$$

In the time-independent approximation, we have
$$U(\theta, t)\big|_{t=T} = U_{T-1}(\theta_{T-1})(\Delta t) \cdots U_1(\theta_1)(\Delta t) \tag{D.7.2}$$
i.e. a sequence of unitaries acting on the initial state $|\psi(0)\rangle$, which we can represent via $U_0(t)$. We assume left-action on states $|\psi\rangle$ where dependence upon $t$ is understood. Recall that each $U_i(\theta_i(t))$ above is the solution to the time-dependent Schrödinger equation (equation A.1.18) as a sequence of integrals. For simplicity, we assume the time-independent approximation (equation (A.1.25)). The aim of a variational quantum algorithm (VQA) is to find an optimal set of parameters minimising a cost functional on the parameter space given by $C(\theta) : \mathbb{R}^m \to \mathbb{R}$ such that:
$$\theta^* = \arg\min_\theta C(\theta).$$

The term variational in this quantum context [106] derives from the underlying principles of variational calculus of finding optimal solutions, such as (originally) computing the lowest energy states of a system. As Schuld et al. note, the term variational quantum circuit is sometimes used interchangeably with parametrised quantum circuit (see Benedetti et al. [65] for a still-salient review). In this work, we adopt this association between the two. We set out the deterministic and probabilistic forms below. For a deterministic model, we let $f_\theta : \mathcal{X} \to \mathcal{Y}$ and denote by $U(X, \theta)$ a quantum circuit for initial state $X \in \mathcal{X}$ and parameters $\theta \in \mathbb{R}^m$. Let $\{O_k\}$ represent the set of (Hermitian) observable (measurement) operators. Then the following function represents a deterministic model:
$$f_\theta(X) = \mathrm{Tr}(O\rho(X, \theta))$$
where $\rho(X, \theta) = U(X, \theta)\rho_0 U(X, \theta)^\dagger$ and $O \in \{O_k\}$. Common approaches to the form of $U(X, \theta)$ include, in analogy with classical machine learning, $U(X, \theta) = W(\theta)S(X)$, where $W(\theta)$ represents a parametrised weight tensor and $S(X)$ represents the encoding of initial features. In this work, we adopt geometric quantum control-style architectures where the unitary channel layers in each network are constructed from Hamiltonian layers comprising dynamical Lie algebra generators $X_j \in \mathfrak{g}$ (see section C.5.5.1) and associated time-dependent control functions $u_j(t)$ (sometimes denoted $c_j(t)$). In terms of measurement statistics, for $U(\theta)|\psi(0)\rangle = |\psi(t)\rangle$, the class of functions in equation (D.7.3) we seek to learn is represented by measurement statistics, i.e. a mapping from inputs (e.g. quantum data $x$) to $\mathbb{R}$ as:
$$f_\theta(x) = \sum_m m\, p(m) \tag{D.7.3}$$

ranging over measurements $\{m\}$. In practice, because of post-measurement state collapse, one takes the expectation to estimate $f_\theta$ as $\hat{f}_\theta(x) = (1/N)\sum_i^N m^{(s)}$, where $m^{(s)} \in \{m\}$ is the outcome of a random measurement from the set of possible outcomes. Reducing the error $f_\theta - \hat{f}_\theta \leq \epsilon$ requires $O(\epsilon^{-2})$ measurements, growing quickly with precision. Thus one must repeat the experiment $N$ times depending on the $\epsilon$ sought. In general we are interested in learning distributions of measurement statistics (from which to reconstruct states or operators) given inputs $X \in \mathcal{X}$ and conditional upon labels (outputs) $\mathcal{Y}$. In measurement terms, we regard $Y_i \in \mathcal{Y}$ as measurement outcomes, so we are interested in a supervised probabilistic quantum model for a conditional distribution given by $P_\theta(m|X)$, not just $P_\theta(X)$ itself. Because probabilistic quantum models generate samples, they are often considered generative models.
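The sampling overhead above can be illustrated with a short numerical sketch (the outcome set and distribution here are illustrative placeholders rather than a particular circuit):

```python
import numpy as np

rng = np.random.default_rng(0)
outcomes = np.array([+1.0, -1.0])   # e.g. eigenvalues of a Pauli observable
p = np.array([0.8, 0.2])            # p(m) induced by the circuit U(theta)
f_exact = np.dot(outcomes, p)       # the deterministic model value f_theta(x)

for N in [100, 10_000, 1_000_000]:
    samples = rng.choice(outcomes, size=N, p=p)  # N repeated experiments
    f_hat = samples.mean()                       # empirical estimate
    print(N, abs(f_hat - f_exact))  # error shrinks roughly like 1/sqrt(N)
```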

The typical supervised QML task for unitary synthesis involves a data space $D_n(\mathcal{X}, \mathcal{Y})$ where feature data is drawn from quantum state data for a Hilbert space $\mathcal{X} \sim \mathcal{H}$ and labels $\mathcal{Y} \sim \mathcal{K}$ (which can be real ($\mathbb{R}$) or complex ($\mathbb{C}$) valued). In many cases, including our two machine learning-based chapters, $\mathcal{H}$ is a tensor product for qubits, $\mathcal{H}^{\otimes} = \otimes^n \mathcal{H}$ where $\dim \mathcal{H}^{\otimes} = 2^n$. The data take the form $D_n(\mathcal{X}, \mathcal{Y}) = \{\rho_i, y_i\}_i$, where $D_n$ is sampled from an underlying population. In Chapter 4, for example, the training data generated is somewhat unusual in that the label data (that which we want to accurately predict) is the sequence of $n$ unitaries $(U_n)$ (i.e. $(U_n) \in \mathcal{Y}$), while the target (final state) unitary $U_T$ is actually fed in via an input layer (i.e. $U_T \in \mathcal{X}$). In that case, as we discuss, what we are interested in is, conditioned on $U_T$, what set of controls $c_j(\Delta t)$ can be generated such that the resulting sequence $(U_n(\Delta t_n))$ leads to $U_T$. Conversely, for the QDataSet in Chapter 3, the training data may vary, such as being classical measurement statistics arising from $m$ Pauli measurements (in which case $\mathcal{Y} \subset \mathbb{R}^m$). As noted in our discussion on measurement (section A.1.6), for our final two chapters we assume access to a measurement process that provides us data about $U$.

Finally, we include a few comments regarding geometric formulations of the parametrisations above. Recall that unitary elements in our approach are parametrised by control functions $u(t)$. However, these control functions are themselves parametrised by $\theta$, i.e. $u(t) = u(\theta(t))$. In this work, we adopt the time-independent approximation assuming $H(t)$ is constant over small $\Delta t$; then effectively we can treat the controls as parametrised by $\theta$, and in this way our unitaries $U = U(\theta(t))$. In many cases, our parameter space, e.g. $\theta \in \mathbb{R}^m$, can be framed as the fibre of an underlying manifold $K$ such that the commutative diagram in Figure D.2 holds. While not the subject of this work, understanding the relationship between parameter manifolds and target (label) data manifolds (often described geometrically in terms of embedding) is an important research direction utilising techniques from geometry and machine learning, with specific applications to solving problems in quantum computing.

[Figure D.2 shows a commutative diagram with $h^*: \mathbb{R}^m \to \mathbb{R}^{2n}$, projections $\pi_K: \mathbb{R}^m \to K$ and $\pi_{\mathcal{M}}: \mathbb{R}^{2n} \to \mathcal{M}$, and $h: K \to \mathcal{M} \cong G$.]
Figure D.2: Manifold mapping between parameter manifold $K$ and its fibre bundle $\mathbb{R}^m$ and target manifold $\mathcal{M}$ with associated fibre bundle spaces $\mathbb{R}^{2n}$. Optimisation across the parameter manifold can be construed in certain cases as an embedding of the parameter manifold within the target (label) manifold $\mathcal{M}$, where such embeddings may often be complex and give rise to non-standard topologies.

D.7.1 QML Optimisation


The quantum supervised learning problem above becomes one of learning the distribution $P_\theta(Y|X)$. The most common technique for parametrised quantum circuits is automatic differentiation [322], via which one can implement stochastic gradient descent and backpropagation algorithms. As in the classical case, a loss function is chosen, typically the cost functional $C(f_\theta, \hat{f}_\theta) = C(\theta)$. Thus, for example, a loss function based on the fidelity metric (definition A.1.40) adopted in equation (D.7.4) in Chapter 4, denoted batch fidelity, takes the mean squared error (MSE, see equation (D.3.6) above) of the loss (difference) between the fidelity of our unitary estimator and the target unitary for empirical risk as:
$$C(F(\theta), 1) = \frac{1}{n}\sum_{j=1}^{n}\left(1 - F(\hat{U}(\theta)_j, U(\theta)_j)\right)^2. \tag{D.7.4}$$

This equation implicitly relies upon assumptions regarding measurement protocols that allow us to specify $\hat{U}_j$ and $U_j$. Because we are using channels $U_j$ for states $\rho_j \in \mathcal{B}(\mathcal{H})$, we could equivalently specify equation (D.7.4) in terms of the measurement statistics required to (tomographically) reconstruct $U_j$ (something we touch upon in Chapter 3), in which case it could be written in terms of informationally complete measurement statistics (see section A.1.6.3) with loss functions given in terms of expectation statistics (equation A.1.14) contingent upon parameters (i.e. controls) $\theta$. Before moving on to a sketch of our greybox variational quantum architecture adopted in Chapters 3 and 4, it is worth saying a few words about differences in the implementation of gradient descent in the classical and quantum case, and parameter shift rules in particular.
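As an illustrative sketch of equation (D.7.4), the following computes the batch fidelity loss assuming, for simplicity, the normalised trace overlap $|\mathrm{Tr}(V^\dagger W)|/d$ as the unitary fidelity (the fidelity metric itself is defined in definition A.1.40; this particular choice is an assumption for illustration):

```python
import numpy as np

def unitary_fidelity(V, W):
    """Normalised trace overlap |Tr(V^dag W)| / d (illustrative fidelity)."""
    d = V.shape[0]
    return np.abs(np.trace(V.conj().T @ W)) / d

def batch_fidelity_loss(U_hats, U_targets):
    """MSE of (1 - F) over a batch of (estimated, target) unitary pairs."""
    residuals = [1.0 - unitary_fidelity(Uh, Ut)
                 for Uh, Ut in zip(U_hats, U_targets)]
    return np.mean(np.square(residuals))

# Example: a perfect estimate gives zero loss.
I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
print(batch_fidelity_loss([I2, X], [I2, X]))  # 0.0
```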

D.7.2 QML Gradients


The discussion of gradient-based optimisation methods above is classically focused, relying on the mathematically nice properties of differentiable manifolds $\mathcal{M}$ (such as configuration spaces or registers parametrising quantum states or operators). Equivalent and analogous methods of calculating gradients in quantum information processing have been studied in the context of a variety of quantum and hybrid quantum-classical machine learning architectures. Early examples include utilising quantum encoding for faster numerical gradient estimation [323], including in the multivariate case [324]. There are a number of candidate optimisation strategies in the literature that effectively adapt classical stochastic gradient descent to the quantum case, including the following:

(i) Parameter shift rules. Parameter shifts [325, 326] in the context of gradient descent represent a method of calculating gradients by shifts of parameters. By evaluating the cost function $C(\theta)$ at two shifted parameter values $\theta \pm \epsilon$, the rescaled difference of the evaluations forms an unbiased estimate of $\nabla_\theta C$, usually restricted to gates with two distinct eigenvalues [326]. The reason this matters is that in certain cases classical stochastic gradient descent is inappropriate or unable to obtain a well-defined gradient. For a cost functional $C(\theta)$ we calculate the gradient as:
$$\nabla_\theta C = \frac{C(\theta + \epsilon) - C(\theta - \epsilon)}{2} \tag{D.7.5}$$
($\epsilon$ is usually set to $\pi/2$ for operators with $\pm 1$ eigenvalues). An example would be where a quantum circuit includes a rotation gate $R_x(\theta) = \exp(-i\sigma_x \theta/2)$. Connecting with section A.1.6.4, the gradient of the expectation of an observable $A$ with respect to a parametrised gate $U(\theta)$ in simple form is:
$$\nabla_\theta \langle A \rangle_{U(\theta)} = \frac{\langle A \rangle_{U(\theta+\epsilon)} - \langle A \rangle_{U(\theta-\epsilon)}}{2} \tag{D.7.6}$$
providing a means of understanding how measurement statistics vary as $U(\theta)$ evolves (a numerical sketch of this rule is given after this list).

(ii) Quantum natural gradients. The quantum natural gradient method [327], related to the Fisher information metric and the Fubini-Study metric $F_Q$, provides a candidate quantum analogue of natural gradient descent (see section D.5.3):
$$\theta_{t+1} = \theta_t - \eta\, g^{+}(\theta_t)\, \nabla \mathcal{L} \tag{D.7.7}$$
where $\eta$ is the learning rate, $\mathcal{L}$ the loss function and $g^{+}$ is related to the pseudo-inverse of the Fubini-Study metric tensor. This metric tensor is sometimes denoted the 'quantum geometric tensor', which effectively involves inner-product contractions between states and their derivatives as a means of calculating the directional derivatives.

(iii) Quantum autodifferentiation and backpropagation. In a series of papers, Beer et al. [70, 284, 328] propose a quantum analogue of autodifferentiation and backpropagation for (for the most part fully connected) quantum neural network architectures. Quantum autodifferentiation extends the concept of computational graphs to quantum circuits, enabling the calculation of gradients via a quantum version of the backpropagation algorithm. The proposal is based upon a dissipative quantum neural network architecture leveraging perceptron unitaries acting on $m + n$ qubits with $(2^{m+n})^2 - 1$ parameters. The circuit comprises $L + 2$ layers of qubits (the $+2$ being one layer for input, one for output, the remaining $L$ being hidden). Each perceptron is represented by a unitary, with information being propagated essentially via partial trace (definition A.1.21) operations. More specifically (see §4 therein), each layer of qubits $\rho^{l-1}$ is tensored with the subsequent layer of qubits $\rho^l$ initialised in $|0\rangle\langle 0|$. A set of unitaries is applied to both layers, with the $(l-1)$-th layer being traced out. This mapping of layers $l-1$ to $l$ is denoted via the map $\mathcal{E}_t^l$. The backpropagation step works similarly, but by applying the adjoint of this layer-to-layer map, $\mathcal{F}_t^l$, first to the tensor product of layers $l$ and $l-1$, then tracing out layer $l$. The loss function $\mathcal{L}$ is a function of the fidelity of the labelled (target) states and the final output layer of the network. In particular, for fidelity-based loss functions with a quantum neural network architecture where each layer comprises qubits acted upon by unitaries, errors are backpropagated using a unitary operation on qubits that encodes the error, given by:
$$U_j^l(t + \epsilon) = \exp(i\epsilon K_j^l(t))\, U_j^l(t)$$
for training step size $\epsilon$, where:
$$K_j^l(t) = \frac{\eta\, 2^{m_{l-1}}\, i}{S} \sum_x \mathrm{Tr}_{\mathrm{rest}}\, M_j^l(x, t)$$
where $M_j^l$ encodes the errors calculated via a commutator term: one term propagates essentially from the input layer to the $l$-th layer, the other from the label state back to the $(l+1)$-th layer; the difference between the unitaries between any two states then shows up as a non-zero commutator, which is what $M_j^l$ encodes. This, it is shown, is equivalent to calculating the derivative of $\mathcal{L}$ for gradient update purposes via the equations above, with the benefit that only two layers need to be updated at any one time rather than needing to update all layers to do so.

(iv) Backaction-based gradients. In [72], Verdon et al. introduce a quantum-specific analogue of backpropagation, baqprop (backwards quantum propagation of phase errors), based on the back-action arising from interaction of the quantum system with an external apparatus (e.g. measurement). The model adopted seeks to calculate the gradient of an oracle function $f$ whose input is a continuous register (see definitions A.1.2 and A.1.3). Given position $\hat{\Phi}_j$ and momentum $\hat{\Pi}_j$ operators for a multi-(two-)qubit register, a von Neumann measurement interaction
$$e^{-if(\hat{\phi}_1)\hat{\Pi}_2} : |\phi_1, \phi_2\rangle \to |\phi_1, \phi_2 + f(\phi_1)\rangle$$
represents a controlled displacement of the second register which computes the function $f$ on the first register and stores the result in the second. While in the position basis it appears the first register is unchanged, in the momentum basis it can be shown that the momentum of the first register is affected. This can be seen by expressing the operation via the adjoint action (definition B.2.7); in the Heisenberg frame this is
$$\mathrm{Ad}_{e^{if(\hat{\phi}_1)\hat{\Pi}_2}}(\hat{\Pi}_1) = \hat{\Pi}_1 + f'(\hat{\phi}_1)\hat{\Pi}_2.$$
Information about the gradient is then obtained by various measurement protocols on $\hat{\Pi}_1$, i.e. the momentum of the first register is shifted in proportion to the gradient $f'$. The actual optimisation procedures applied involve quantum-tunnelling-style propagation of errors and hybrid-style momentum measurement gradient descent.
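As referenced in item (i), a minimal numerical sketch of the parameter-shift rule for a single-qubit rotation $R_x(\theta)$ and observable $\sigma_z$ (here $\langle \sigma_z \rangle = \cos\theta$, so the exact gradient is $-\sin\theta$):

```python
import numpy as np
from scipy.linalg import expm

sx = np.array([[0, 1], [1, 0]], dtype=complex)
sz = np.diag([1.0, -1.0]).astype(complex)
psi0 = np.array([1.0, 0.0], dtype=complex)   # initial state |0>

def expectation(theta):
    """<sigma_z> after R_x(theta) = exp(-i sigma_x theta / 2); equals cos(theta)."""
    psi = expm(-1j * sx * theta / 2) @ psi0
    return np.real(psi.conj() @ sz @ psi)

theta = 0.7
shift = np.pi / 2   # the pi/2 shift discussed in the text
grad_ps = (expectation(theta + shift) - expectation(theta - shift)) / 2
print(grad_ps, -np.sin(theta))   # parameter-shift estimate vs exact gradient
```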

D.8 Symmetry-based QML

In this final section, we sketch an outline of the general structure of the greybox variational quantum circuits that are the subject of our next two chapters. We connect the general architecture to three related fields in quantum and classical machine learning: (a) Lie algebraic methods (such as Lie group machine learning [329, 330]), (b) geometric machine learning [27] and (c) geometric quantum machine learning [30, 34].

D.8.1 QML Optimal Control


In terms of the discussion of optimal control examined in the previous chapter, the backpropagation algorithms are designed to learn function estimates $f_\theta$ that map from time-optimal geodesic data to sequences of controls. An effective learning protocol would thereby, in principle, learn controls (and thus unitary sequences) satisfying the requisite PMP conditions. These approaches seek to learn parametrised functions, $u_j(t)$ in the case of control theory and weights $\omega$ in the case of neural networks, to optimise an objective function (maximising the PMP Hamiltonian or minimising the loss/cost function $\mathcal{L}$ in backpropagation). In essence, the conceptual connection between controls acting on generators (updated to satisfy the constraints of the optimal control problem) and updatable weights acting on input features (updated to satisfy the objective of minimising loss) provides a bridge between classical control theory and machine learning. Indeed the deep connections between the two are manifest across the literature, especially in the work of Bellman [121, 331].
In Chapter 4, we assume that the sequence of unitaries (generated from a theoretically justified means of obtaining subRiemannian geodesics) is time-optimal and thus bounded by total time (path-length) $L$; we thus adopt a learning protocol that seeks to learn the sequence of controls $u(t) \in [0, 1]$ or $u(t) \in [-1, 1]$ (using a tanh activation function) which minimises the error (measured via fidelity) of $\hat{U}_j(t)$ versus $U_j(t)$ for a sequence of unitaries $j = 1, ..., n$. Put another way, by reason of the assumption that the training data consists of geodesic paths (represented by sequences of unitaries), their length is minimal and unique. From equation C.2.8, because the controls set the length $\ell(\gamma)$, the sequence of controls obtaining $\hat{U}_j(t) \simeq U_j(t)$ will be minimal (up to some error tolerance).

D.8.2 Geometric information theory


Geometric information theory emerged in the second half of the twentieth century via the application of techniques from differential geometry to information theory. In particular, this saw results mapping concepts of Riemannian manifolds and metrics across to information manifolds and figures of merit common in statistics, such as Fisher information. The fundamental organising principle is to regard information-theoretic problems of interest in terms of geometric concepts. Thus information is presented as inhering within a differentiable manifold, e.g. a Riemannian or Kähler manifold, with information theory based upon specific measures of information [286]. As Amari [28, 288] (credited with first equating Fisher information and the Riemannian metric) notes, information geometry is a means of exploring the world using modern (differential) geometry, as a discipline arising from the use of such techniques to study symmetry and invariance properties involved in statistical inference. In the interests of time and space, we do not in this work delve into information geometry in any significant way; however, we note its use in machine learning in a variety of areas, such as clustering, support vector machines, belief propagation and boosting. The underlying idea of information geometry is to cast information and probability distributions in terms of geometric structures such as differentiable manifolds. Other areas of machine learning exploring geometric and algebraic techniques include Lie group machine learning [29, 329] where, similar to the use of Lie groups in quantum contexts, machine learning networks and algorithms are constructed in order to respect and leverage symmetries underpinning groups and their corresponding Lie algebras (see [330, 332] for comprehensive reviews and theory).

D.8.3 Geometric QML

In recent years, interest in applying geometric techniques to assist in symmetry reduction and in symmetry-enhanced quantum machine learning network design has gained prominence. We summarise some of the key results relevant to our complementary work below. This emerging sub-discipline, sometimes named geometric quantum machine learning [32, 35], seeks to incorporate symmetry properties into the design of machine learning architecture as a means of improving figures of merit, such as runtime complexity, metrics (such as fidelity) or generalisation and overparametrisation. This is often achieved by seeking to build parametrised quantum circuits or quantum neural network architectures which, as functional compositions, respect underlying symmetries of input data, or which lead to outputs that respect symmetry properties, such as outputs in terms of unitaries or parametrisations thereof. For example, [333] design quantum neural network layers $\mathcal{F}$ that respect permutation symmetry under the action of the permutation group $S_n$. In this case, $\sigma\mathcal{F}(\psi) = \mathcal{F}(\sigma(\psi))$ for $\sigma \in S_n$ (a numerical sketch of this equivariance condition is given below).
Central to the GQML programme is the use of symmetry-respecting architectures, usually in the form of neural network layers consisting of unitaries generated by generators of dynamical Lie groups (see section C.5.5.1). The use of symmetry reduction is shown in a number of works to, for example, reduce the effective parametrisation required in order to meet certain performance thresholds [34]. Relatedly, the reduction in parametrisation owing to such embedded symmetries has been shown to mitigate other architectural issues, such as the risk of barren plateaus [334]. Geometric QML is the closest extant discipline related to the present work. The distinction between this work and the various geometric QML literature [30-38] in this emerging field is that the latter is usually concerned with constructing quantum machine learning architectures (e.g. variational quantum circuits) which compositionally respect certain symmetries by encoding Lie algebraic generators within VQC layers, whereas the work in Chapter 4, for example, does not seek an entire network that respects symmetry properties, but rather only a sub-network of layers whose output (e.g. in the form of unitary operators $U(t)$) respects symmetries inherent in unitary groups. Moreover, our work in that Chapter is concerned with design choices enabling the network to learn geodesic approximations via simplifying the problem down to one of learning optimal control functions.
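As referenced above, a minimal numerical sketch of the permutation-equivariance condition (illustrative; not the construction of [333]): twirling an arbitrary two-qubit operator over $S_2$ yields a layer that commutes with the SWAP representation, and hence satisfies $\sigma\mathcal{F}(\psi) = \mathcal{F}(\sigma(\psi))$.

```python
import numpy as np

SWAP = np.eye(4)[[0, 2, 1, 3]].astype(complex)  # permutation of two qubit wires

rng = np.random.default_rng(1)
A = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
F = 0.5 * (A + SWAP @ A @ SWAP)                 # twirl A over S_2

psi = rng.normal(size=4) + 1j * rng.normal(size=4)
psi /= np.linalg.norm(psi)

# Equivariance check: applying SWAP before or after the layer agrees.
assert np.allclose(SWAP @ (F @ psi), F @ (SWAP @ psi))
```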

D.9 Greybox machine learning


Our approach in the Chapters above represents a related, but distinct, stream of research leveraging algebraic and geometric techniques for solving time-optimal unitary synthesis problems. Our general approach to architecture, focusing on quantum geometric machine learning, is set out below. The method is a hybrid of quantum and classical architectures, which we denote greybox architecture or greybox learning, leveraging variational parametrised quantum circuits (equation D.7.2) that embed symmetries of interest (via Lie algebra generators and unitary activation functions) in order to generate unitary targets respecting symmetry structures. The underlying idea is that known a priori information is encoded within a classical neural network stack, obviating the need for the network to learn such rules (such as requirements regarding unitarity or Hamiltonian construction) and thereby affording guarantees (such as unitarity of outputs) in a way that blackbox learning at best only approximates. Although we do not formally prove it, our focus is on the idea that by doing so we are in effect minimising empirical risk by reducing the variance in the models that would otherwise be required to, for example, learn and implement the Schrödinger equation from scratch. We can analogise between the directionality of gradient descent updates used to estimate $\hat{Y}$ and the use of machine learning methods to solve time-optimal problems. Sketching this out: in the previous Appendices, we articulated how $U_T \in G \simeq \mathcal{M}$ (for Lie group $G$) are generated (via Hamiltonian flow) by Hamiltonians $H$ from the corresponding Lie algebra $\mathfrak{g}$. By the correspondence between tangent vectors and Lie algebra basis elements, we can then regard learning optimal control functions $u_j(t)$ (for directions $j$ where $j = 1, ..., \dim \mathfrak{g}$) as an equivalent optimisation in a machine learning context. The controls $u_j(t)$ in effect parametrise our system consistent with the generalised Pontryagin method, with the adjustment of the control functions to minimise our loss functions (explicated below in terms of fidelity $F(\hat{U}_T, U_T)$) geometrically represented as transformations in one or more of the directions of elements of $H_p\mathcal{M}$ (our distribution $\Delta$). The geometric features we are seeking to leverage are the symmetry properties of subRiemannian symmetric spaces, in particular discovering strategies for learning sequences of controls for generating time-optimal geodesic paths along the manifold of interest. We describe this relationship below.
(i) Objective. Firstly, our problem is one of using machine learning to synthesise time-optimal unitaries relevant to quantum computational contexts. Specifically, this means using machine learning architecture to solve certain control systems expressed through the Pontryagin Maximum Principle (definition C.5.5). Our approach, focused on quantum control, lends itself to variational quantum algorithms from a design-choice perspective.

(ii) Control optimisation. Our aim is therefore to generate a sequence of control pulses that may be applied to corresponding Lie algebra generators (in our case corresponding to single- and multi-qubit systems) in order to generate sequences of time-optimal (or as approximately time-optimal as we can get) unitary operators, given the objectives and constraints required, including (a) ensuring outputs respect unitary constraints and (b) ensuring controls (and by extension path-length) respect minimisation (and geodesic) constraints.

(iii) Inputs. The input layers $a^{(0)}$ of the network take as their feature data unitaries $U(t) \in G$ or, in the case of multi-qubit systems, $G \otimes G$; for Chapters 3 and 4, $G = SU(2)$. The unitaries have a representation by way of the usual representation of $SU(2)$ in terms of the matrix group $M_2(\mathbb{C})$ (see definition B.2.1). Denoting such matrices $X \in M_2(\mathbb{C})$ (or $X_1 \otimes X_2 \in M_2(\mathbb{C}) \otimes M_2(\mathbb{C})$, denoted by the single $X$ for convenience), the initial layers tend to transform (e.g. flatten) such matrices into a realised (real-valued) form. The activation function of these input layers tends then to just be the identity:
$$a^{(0)} = I(X) \tag{D.9.1}$$
consistent with the generic neural network schema in section D.4.2.3.

(iv) Feed-forward layers. The input layers are then fed into a sequence of typical classical neural network layers, in our case feed-forward layers (see definition D.4.4):
$$a^{(1)} = \sigma(\omega a^{(0)} + b). \tag{D.9.2}$$
The feed-forward layers tend to be, at least in our examples, fully-connected and dense.

(v) Control pulse layers. As we sketch below, parameter network layers (parametrised by $\theta$) feed into layers comprising control pulses $u_j(t)$. The neurons within these layers are characterised by activation functions such that $u_j(t) \in [0, 1]$, where we rely implicitly on the fact that, for the bounded control problem, we can arbitrarily scale (normalise) controls to fit within minimum length $L$ (see definition C.2.6).

(vi) Hamiltonian and Unitary layers. The control pulse layers then feed into bespoke layers which (i) construct the Hamiltonian from fixed Lie algebra generators in $\mathfrak{g}$, which is then (ii) input into a layer comprising an exponential activation that constructs the time-independent approximation to the time-dependent unitary operator (see the sketch following this list). In all cases, our target of interest is the optimal unitary for time $t_j$. In Chapter 4, we are interested in the sequence of controls $(u_i(t))$ that generate a sequence of estimated unitary operators $(\hat{U}_j(t))$ which, under the assumption of time-independence (see definition A.1.25), allows us to generate our target $U(T) \in G$. Each node generating a unitary has $k$ generators and control functions $u_i(t), i = 1, ..., k$ where $k = \dim \mathfrak{g}$ (or the dimension of whatever the control subset is). Our labels and our function estimates are sequences of unitaries and, as we note below, the target $U_T$ is actually one of the inputs to the models.

(vii) Optimisation. The optimisation occurs by reference to a cost function based on the fidelity metric (definition A.1.40). The fidelity of each unitary estimate in the sequence $\hat{U}_j(t)$ is estimated. From a control theory perspective, the optimisation is classical offline control, where an exogenous protocol is used to optimise and then adjust the parameters $\theta$ accordingly, which ultimately feed into adjusting the controls $u_i(t)$. Several classes of network are used, including a vanilla (basic) fully-connected feedforward network, an LSTM and the greybox approach which we detail here. The networks are trained on data $D_n(X, Y)$ generated from a distribution $\Delta \subset \mathfrak{su}(2)$, such that the resulting Hamiltonians (and thus curves on $\mathcal{M}$) are time-optimal; these are subRiemannian geodesics.

(viii) Measurement. In Chapter 3, measurement statistics for each time slice are captured in the data (from which one can reconstruct unitaries or Hamiltonians for each such time slice), whereas for Chapter 4, the existence of an external process that allows for $\hat{U}(t)$ and $U(t)$ to be constructed is assumed.
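As referenced in item (vi), the following is a minimal sketch of the Hamiltonian and unitary layers (the parameter-producing network is elided; all names and values are illustrative): bounded controls weight fixed $\mathfrak{su}(2)$-type generators, and a matrix-exponential activation guarantees unitarity of the output by construction.

```python
import numpy as np
from scipy.linalg import expm

sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]], dtype=complex)
sz = np.diag([1.0, -1.0]).astype(complex)
generators = [sx, sy, sz]   # fixed Pauli generators (up to factors of i, su(2))

def unitary_layer(raw_params, dt=0.1):
    """Map unconstrained parameters to controls u_j in [-1, 1] via tanh, then
    to U = exp(-i sum_j u_j X_j dt); output is unitary by construction."""
    u = np.tanh(np.asarray(raw_params))       # bounded control amplitudes
    H = sum(uj * Xj for uj, Xj in zip(u, generators))  # Hamiltonian layer
    return expm(-1j * H * dt)                 # exponential activation layer

# A sequence of such layers composes (left-action) into U_hat(T).
steps = [unitary_layer(p) for p in [[0.3, -0.2, 0.8], [1.1, 0.0, -0.5]]]
U_hat = np.linalg.multi_dot(list(reversed(steps)))      # U_2 U_1
assert np.allclose(U_hat.conj().T @ U_hat, np.eye(2))   # unitarity guaranteed
```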
Bibliography

[1] Neutelings, Izaak, “Tikz.net,” https://tikz.net/.

[2] S. Helgason, Differential Geometry, Lie Groups, and Symmetric Spaces, ser.
ISSN. Elsevier Science, 1979.

[3] T. Hawkins, Emergence of the theory of Lie groups: An essay in the history
of mathematics 1869–1926. Springer Science & Business Media, 2012.

[4] T. Gowers, J. Barrow-Green, and I. Leader, The Princeton companion to math-


ematics. Princeton University Press, 2010.

[5] M. Detlefsen, “Formalism,” in The Oxford Handbook of Philosophy of Mathe-


matics and Logic. Oxford University Press, Jun. 2007.

[6] T. L. Heath and others, The thirteen books of Euclid’s Elements. Courier
Corporation, 1956.

[7] J. Gray, “Felix Klein’s Erlangen Program,‘Comparative considerations of re-


cent geometrical researches’(1872),” in Landmark Writings in Western Math-
ematics 1640-1940. Elsevier, 2005, pp. 544–552.

[8] R. W. Sharpe, Differential geometry: Cartan’s generalization of Klein’s Er-


langen program. Springer Science & Business Media, 2000, vol. 166.

[9] A. W. Knapp and A. W. Knapp, Lie groups beyond an introduction. Springer,


1996, vol. 140.

[10] S. Helgason, Differential Geometry and Symmetric Spaces, ser. ISSN. Elsevier
Science, 1962.

[11] M. A. Nielsen and I. L. Chuang, Quantum Computation and Quantum Infor-


mation: 10th Anniversary Edition, 10th ed. New York, NY, USA: Cambridge
University Press, 2011.

[12] T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learn-


ing: Data Mining, Inference, and Prediction, ser. Springer Series in Statistics.
Springer New York, 2013.


[13] V. Vapnik, The Nature of Statistical Learning Theory. Springer, New York,
1995.

[14] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning. Cambridge, MA:


MIT Press, 2016.

[15] D. D’Alessandro, Introduction to Quantum Control and Dynamics. CRC


Press, Aug. 2007.

[16] D. D’Alessandro and J. T. Hartwig, “Dynamical Decomposition of Bilinear


Control Systems Subject to Symmetries,” Journal of Dynamical and Control
Systems, vol. 27, no. 1, pp. 1–30, 2021.

[17] D. D’Alessandro and B. Sheller, “On K-P sub-Riemannian Problems and their
Cut Locus,” in 2019 18th European Control Conference (ECC), Jun. 2019, pp.
4210–4215.

[18] F. Albertini and D. D’Alessandro, “Controllability of symmetric spin net-


works,” Journal of Mathematical Physics, vol. 59, no. 5, p. 052102, 2018.

[19] V. Jurdjevic and J. P. Quinn, “Controllability and stability,” Journal of dif-


ferential equations, vol. 28, no. 3, pp. 381–389, 1978.

[20] N. Khaneja and S. J. Glaser, “Cartan decomposition of SU(2n) and control of


spin systems,” Chemical Physics, vol. 267, no. 1, pp. 11–23, Jun. 2001.

[21] N. Khaneja, R. Brockett, and S. Glaser, “Time Optimal Control in Spin Sys-
tems,” Physical Review A, vol. 63, no. 3, Feb. 2001.

[22] N. Khaneja, S. J. Glaser, and R. Brockett, “Sub-Riemannian Geometry and


Time Optimal Control of Three Spin Systems: Quantum Gates and Coherence
Transfer,” Physical Review A, vol. 65, no. 3, Jan. 2002.

[23] V. Jurdjevic, Geometric Control Theory, ser. Cambridge Studies in Advanced Mathematics. Cambridge University Press, 1997.

[24] U. Boscain, T. Chambrion, and J.-P. Gauthier, “On the K + P Problem for
a Three-Level Quantum System: Optimality Implies Resonance,” Journal of
Dynamical and Control Systems, vol. 8, no. 4, pp. 547–572, Oct. 2002.

[25] V. Jurdjevic, “Non-Euclidean Elastica,” American Journal of Mathematics,


vol. 117, no. 1, pp. 93–124, 1995.

[26] V. Jurdjevic and I. Kupka, “Control systems on semi-simple Lie groups and
their homogeneous spaces,” in Annales de l’institut Fourier, vol. 31, 1981, pp.
151–179.

[27] M. M. Bronstein, J. Bruna, T. Cohen, and P. Veličković, “Geometric deep


learning: Grids, groups, graphs, geodesics, and gauges,” arXiv:2104.13478,
2021.

[28] S.-i. Amari, Information geometry and its applications. Springer, 2016, vol.
194.

[29] F. Li, L. Zhang, and Z. Zhang, Lie Group Machine Learning. Berlin, Boston:
De Gruyter, 2019.

[30] M. Larocca, F. Sauvage, F. M. Sbahi, G. Verdon, P. J. Coles, and M. Cerezo,


“Group-Invariant Quantum Machine Learning,” PRX Quantum, vol. 3, no. 3,
p. 030341, Sep. 2022.

[31] M. Larocca, P. Czarnik, K. Sharma, G. Muraleedharan, P. J. Coles, and


M. Cerezo, “Diagnosing barren plateaus with tools from quantum optimal
control,” 2021, arXiv:2105.14377.

[32] M. Larocca, N. Ju, D. García-Martín, P. J. Coles, and M. Cerezo, “Theory of


overparametrization in quantum neural networks,” arXiv:2109.11676, 2021.

[33] M. Larocca, E. Calzetta, and D. A. Wisniacki, “Exploiting landscape geometry


to enhance quantum optimal control,” Physical Review A, vol. 101, no. 2, p.
023410, Feb. 2020.

[34] Q. T. Nguyen, L. Schatzki, P. Braccia, M. Ragone, P. J. Coles, F. Sauvage,


M. Larocca, and M. Cerezo, “Theory for equivariant quantum neural net-
works,” PRX Quantum, vol. 5, no. 2, p. 020328, 2024.

[35] M. Cerezo, G. Verdon, H.-Y. Huang, L. Cincio, and P. J. Coles, “Challenges


and opportunities in quantum machine learning,” Nature Computational Sci-
ence, 2022.

[36] M. Cerezo, A. Poremba, L. Cincio, and P. J. Coles, “Variational quantum


fidelity estimation,” Quantum, vol. 4, p. 248, 2020.

[37] M. Ragone, P. Braccia, Q. T. Nguyen, L. Schatzki, P. J. Coles, F. Sauvage,


M. Larocca, and M. Cerezo, “Representation Theory for Geometric Quantum
Machine Learning,” Feb. 2023, arXiv:2210.07980 [quant-ph, stat].

[38] M. C. Caro, H.-Y. Huang, M. Cerezo, K. Sharma, A. Sornborger, L. Cin-


cio, and P. J. Coles, “Generalization in quantum machine learning from few
training data,” Nature Communications, vol. 13, 2022.

[39] E. Perrier, A. Youssry, and C. Ferrie, “QDataSet, quantum datasets for ma-
chine learning,” Scientific Data, vol. 9, no. 1, pp. 1–22, 2022.

[40] E. Perrier, D. Tao, and C. Ferrie, “Quantum geometric machine learning for
quantum circuits and control,” New Journal of Physics, vol. 22, no. 10, p.
103056, Oct. 2020.

[41] E. Perrier and C. S. Jackson, “Solving the KP problem with the global cartan
decomposition,” 2024, arXiv:2404.02358.

[42] N. Harrigan and R. W. Spekkens, “Einstein, Incompleteness, and the Epis-


temic View of Quantum States,” Foundations of Physics, vol. 40, no. 2, pp.
125–157, Feb. 2010.

[43] J. Watrous, The Theory of Quantum Information. Cambridge University


Press, 2018.

[44] H. Goldstein, Classical Mechanics. Pearson Education, Sep. 2002.

[45] B. C. Hall, Quantum theory for mathematicians. Springer, 2013.

[46] Y. L. Sachkov, “Control theory on Lie groups,” Journal of Mathematical Sci-


ences, vol. 156, no. 3, p. 381, Jan. 2009.

[47] J. Hall, Beyond AI: Creating the conscience of the machine. Amherst, NY:
Prometheus, 2007.

[48] C. J. Isham, Modern differential geometry for physicists. World Scientific


Publishing Company, 1999, vol. 61.

[49] T. Frankel, The Geometry of Physics: An Introduction. Cambridge University


Press, 2011.

[50] R. Montgomery, A. M. Society, P. Landweber, M. Loss, T. Ratiu, and


J. Stafford, A Tour of Subriemannian Geometries, Their Geodesics, and Appli-
cations, ser. Mathematical surveys and monographs. American Mathematical
Society, 2002.

[51] M. do Carmo, Differential Geometry of Curves and Surfaces: Revised and Up-
dated Second Edition, ser. Dover Books on Mathematics. Dover Publications,
2016.

[52] S. Kobayashi and K. Nomizu, Foundations of Differential Geometry, ser. Foun-


dations of Differential Geometry. Interscience Publishers, 1963, vol. 1.

[53] M. E. Swaddle, “SubRiemannian geodesics and cubics for efficient quantum


circuits,” PhD Thesis, The University of Western Australia, 2017.

[54] M. Swaddle, L. Noakes, H. Smallbone, L. Salter, and J. Wang, “Generating


three-qubit quantum circuits with neural networks,” Physics Letters A, vol.
381, no. 39, pp. 3391–3395, 2017.

[55] B. A. Sheller, “Symmetry Reduction in K-P Problems,” PhD Thesis, Iowa


State University, United States – Iowa.

[56] F. Albertini and D. D’Alessandro, “On Symmetries in Time Optimal Control,


SubRiemannian Geometries, and the KP Problem,” Journal of Dynamical and
Control Systems, vol. 24, no. 1, pp. 13–38, Jan. 2018.

[57] V. Jurdjevic, “Abstract control systems: controllability and observability,”


SIAM Journal on Control, vol. 8, no. 3, pp. 424–439, 1970.

[58] V. Jurdjevic and H. J. Sussmann, “Control systems on Lie groups,” Journal


of Differential equations, vol. 12, no. 2, pp. 313–329, 1972.

[59] N. Khaneja, S. J. Glaser, and R. Brockett, “Sub-riemannian geometry and


time optimal control of three spin systems: Quantum gates and coherence
transfer,” Physical Review A, vol. 65, no. 3, p. 032301, 2002.

[60] V. Jurdjevic, “Hamiltonian point of view of non-Euclidean geometry and el-


liptic functions,” Systems & Control Letters, vol. 43, no. 1, pp. 25–41, May
2001.

[61] U. Boscain, M. Sigalotti, and D. Sugny, “Introduction to the Pontryagin Max-


imum Principle for Quantum Optimal Control,” PRX Quantum, vol. 2, no. 3,
p. 030203, Sep. 2021.

[62] U. Boscain and F. Rossi, “Invariant Carnot-Caratheodory metrics on SU(3),


SO(3), SL(2) and lens spaces,” SIAM Journal on Control and Optimization,
vol. 47, no. 4, pp. 1851–1878, 2008.

[63] D. D’Alessandro, B. A. Sheller, and Z. Zhu, “Time-optimal control of quantum


lambda systems in the KP configuration,” Journal of Mathematical Physics,
vol. 61, no. 5, p. 052107, May 2020.

[64] M. Schuld and F. Petruccione, Machine Learning with Quantum Computers.


Springer, 2021.

[65] M. Benedetti, E. Lloyd, S. Sack, and M. Fiorentini, “Parameterized quantum


circuits as machine learning models,” Quantum Science and Technology, vol. 4,
no. 4, p. 043001, 2019.

[66] G. James, D. Witten, T. Hastie, and R. Tibshirani, An Introduction to Sta-


tistical Learning: with Applications in R, ser. Springer Texts in Statistics.
Springer New York, 2013.

[67] V. N. Vapnik, “An overview of statistical learning theory,” IEEE Transactions


on Neural Networks, vol. 10, no. 5, pp. 988–999, Sep. 1999.

[68] M. Abadi, P. Barham, J. Chen, Z. Chen, A. Davis, J. Dean, M. Devin, S. Ghe-


mawat, G. Irving, M. Isard, and others, “Tensorflow: A system for large-scale
machine learning,” in 12th USENIX symposium on operating systems design
and implementation OSDI 16, 2016, pp. 265–283.

[69] V. Dunjko and H. J. Briegel, “Machine learning & artificial intelligence in


the quantum domain: a review of recent progress,” Reports on Progress in
Physics, vol. 81, no. 7, p. 074001, 2018.

[70] K. Beer, “Quantum neural networks,” PhD Thesis, 2022, arXiv:2205.08154


[quant-ph].

[71] G. Verdon, M. Broughton, J. R. McClean, K. J. Sung, R. Babbush, Z. Jiang,


H. Neven, and M. Mohseni, “Learning to learn with quantum neural networks
via classical neural networks,” arXiv:1907.05415, 2019.

[72] G. Verdon, J. Pye, and M. Broughton, “A universal training algorithm for


quantum deep learning,” arXiv:1806.09729, 2018.

[73] G. Verdon, T. McCourt, E. Luzhnica, V. Singh, S. Leichenauer, and J. Hidary,


“Quantum Graph Neural Networks,” arXiv:1909.12264 [quant-ph], Sep. 2019,
arXiv: 1909.12264.

[74] G. Verdon, M. Broughton, and J. Biamonte, “A quantum algorithm to train


neural networks using low-depth circuits,” arXiv:1712.05304, 2017.

[75] J. J. Hopfield, “Neural networks and physical systems with emergent collective
computational abilities,” Proc Natl Acad Sci U S A, vol. 79, pp. 27–8424, Apr.
1982.

[76] N. Killoran, T. R. Bromley, J. M. Arrazola, M. Schuld, N. Quesada, and


S. Lloyd, “Continuous-variable quantum neural networks,” Physical Review
Research, vol. 1, no. 3, p. 033063, 2019.

[77] L. Franken and B. Georgiev, “Explorations in quantum neural networks with


intermediate measurements,” in Proceedings of ESANN, 2020.

[78] J. R. McClean, S. Boixo, V. N. Smelyanskiy, R. Babbush, and H. Neven,


“Barren plateaus in quantum neural network training landscapes,” Nature
Communications, vol. 9, no. 1, pp. 1–6, 2018.

[79] M. Larocca, S. Thanasilp, S. Wang, K. Sharma, J. Biamonte, P. J. Coles,


L. Cincio, J. R. McClean, Z. Holmes, and M. Cerezo, “A review of barren
plateaus in variational quantum computing,” May 2024, arXiv.2405.00781.

[80] M. Cerezo, K. Sharma, A. Arrasmith, and P. J. Coles, “Variational Quantum


State Eigensolver,” npj Quantum Information, vol. 8, no. 1, pp. 1–11, 2022.

[81] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov,


“Dropout: a simple way to prevent neural networks from overfitting,” The
journal of machine learning research, vol. 15, no. 1, pp. 1929–1958, 2014.

[82] A. Youssry, G. A. Paz-Silva, and C. Ferrie, “Characterization and control of


open quantum systems beyond quantum noise spectroscopy,” npj Quantum
Information, vol. 6, no. 1, pp. 1–13, Dec. 2020, number: 1 Publisher: Nature
Publishing Group.

[83] M. H. Amin, E. Andriyash, J. Rolfe, B. Kulchytskyy, and R. Melko, “Quantum


Boltzmann Machine,” Physical Review X, vol. 8, no. 2, p. 021050, Apr. 2018.

[84] J. Biamonte, P. Wittek, N. Pancotti, P. Rebentrost, N. Wiebe, and S. Lloyd,


“Quantum machine learning,” Nature, vol. 549, no. 7671, pp. 195–202, 2017.

[85] M. Schuld, V. Bergholm, C. Gogolin, J. Izaac, and N. Killoran, “Evaluating


analytic gradients on quantum hardware,” Physical Review A, vol. 99, no. 3,
p. 032331, 2019.

[86] E. Aïmeur, G. Brassard, and S. Gambs, “Machine Learning in a Quantum


World,” in Advances in Artificial Intelligence, ser. Lecture Notes in Computer
Science, L. Lamontagne and M. Marchand, Eds. Berlin, Heidelberg: Springer,
2006, pp. 431–442.

[87] M. Schuld, V. Bergholm, C. Gogolin, J. Izaac, and N. Killoran, “Evaluating


analytic gradients on quantum hardware,” Physical Review A, vol. 99, no. 3,
p. 032331, 2019.

[88] G. Verdon, M. Broughton, and J. Biamonte, “A quantum algorithm to train


neural networks using low-depth circuits,” Aug. 2019, arXiv:1712.05304 [cond-
mat, physics:quant-ph].

[89] L. Zhou, S. Pan, J. Wang, and A. V. Vasilakos, “Machine learning on big data:
Opportunities and challenges,” Neurocomputing, vol. 237, pp. 350–361, 2017.

[90] T. Vidick and J. Watrous, “Quantum Proofs,” Foundations and Trends®


in Theoretical Computer Science, vol. 11, no. 1-2, pp. 1–215, 2016, arXiv:
1610.01664.

[91] D. Dong and I. R. Petersen, “Quantum control theory and applications: A


survey,” IET Control Theory & Applications, vol. 4, no. 12, pp. 2651–2671,
Dec. 2010, arXiv: 0910.2350.

[92] L. Viola, E. Knill, and S. Lloyd, “Dynamical Decoupling of Open Quantum


Systems,” Physical Review Letters, vol. 82, no. 12, pp. 2417–2421, Mar. 1999.

[93] J. Preskill, “Quantum Computing in the NISQ era and beyond,” Quantum,
vol. 2, p. 79, 2018.

[94] K. Bharti, A. Cervera-Lierta, T. H. Kyaw, T. Haug, S. Alperin-Lea, A. Anand,


M. Degroote, H. Heimonen, J. S. Kottmann, T. Menke, W.-K. Mok, S. Sim, L.-
C. Kwek, and A. Aspuru-Guzik, “Noisy intermediate-scale quantum (NISQ)
algorithms,” Jan. 2021, arXiv: 2101.08448.

[95] A. Youssry, R. J. Chapman, A. Peruzzo, C. Ferrie, and M. Tomamichel, “Mod-


eling and control of a reconfigurable photonic circuit using deep learning,”
Quantum Science and Technology, vol. 5, no. 2, p. 025001, Jan. 2020.

[96] R. Lorenz, A. Pearson, K. Meichanetzidis, D. Kartsaklis, and B. Coecke,


“QNLP in Practice: Running Compositional Models of Meaning on a Quan-
tum Computer,” Feb. 2021, arXiv: 2102.12846.

[97] Y. LeCun, C. Cortes, and C. J. C. Burges, “The MNIST database of hand-


written digits, 1998,” URL http://yann. lecun. com/exdb/mnist, vol. 10, p. 34,
1998.

[98] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang,


A. Karpathy, A. Khosla, M. Bernstein, A. C. Berg, and L. Fei-Fei, “ImageNet
Large Scale Visual Recognition Challenge,” International Journal of Computer
Vision, vol. 115, no. 3, pp. 211–252, Dec. 2015.

[99] Y. Zhou, D. Wilkinson, R. Schreiber, and R. Pan, “Large-Scale Parallel Collab-


orative Filtering for the Netflix Prize,” in Algorithmic Aspects in Information
and Management, ser. Lecture Notes in Computer Science, R. Fleischer and
J. Xu, Eds. Berlin, Heidelberg: Springer, 2008, pp. 337–348.

[100] M. Schuld, I. Sinayskiy, and F. Petruccione, “An introduction to quantum


machine learning,” Contemporary Physics, vol. 56, no. 2, pp. 172–185, 2015.

[101] C. Ciliberto, M. Herbster, A. D. Ialongo, M. Pontil, A. Rocchetto, S. Sev-


erini, and L. Wossnig, “Quantum machine learning: a classical perspective,”
Proceedings of the Royal Society A: Mathematical, Physical and Engineering
Sciences, vol. 474, no. 2209, p. 20170551, 2018.

[102] M. Bilkis, M. Cerezo, G. Verdon, P. J. Coles, and L. Cincio, “A semi-


agnostic ansatz with variable structure for quantum machine learning,” 2021,
arXiv:2103.06712.

[103] V. Dunjko, J. M. Taylor, and H. J. Briegel, “Quantum-Enhanced Machine


Learning,” Physical Review Letters, vol. 117, no. 13, p. 130501, Sep. 2016.

[104] E. Joos, H. D. Zeh, C. Kiefer, D. J. W. Giulini, J. Kupsch, and I.-O. Sta-


matescu, Decoherence and the Appearance of a Classical World in Quantum
Theory. Springer Science & Business Media, Mar. 2013.

[105] G. Riviello, K. M. Tibbetts, C. Brif, R. Long, R.-B. Wu, T.-S. Ho, and H. Rab-
itz, “Searching for quantum optimal controls under severe constraints,” Phys-
ical Review A, vol. 91, no. 4, p. 043401, 2015.

[106] A. Peruzzo, J. McClean, P. Shadbolt, M.-H. Yung, X.-Q. Zhou, P. J. Love,


A. Aspuru-Guzik, and J. L. O’Brien, “A variational eigenvalue solver on a
photonic quantum processor,” Nature Communications, vol. 5, p. 4213, Jul.
2014, eprint: 1304.3061.

[107] Z.-C. Yang, A. Rahmani, A. Shabani, H. Neven, and C. Chamon, “Optimiz-


ing variational quantum algorithms using pontryagin’s minimum principle,”
Physical Review X, vol. 7, no. 2, p. 021027, 2017.

[108] D. Wecker, M. B. Hastings, and M. Troyer, “Progress towards practical quan-


tum variational algorithms,” Physical Review A, vol. 92, no. 4, p. 042303, Oct.
2015.

[109] J. R. McClean, J. Romero, R. Babbush, and A. Aspuru-Guzik, “The theory


of variational hybrid quantum-classical algorithms,” New Journal of Physics,
vol. 18, no. 2, p. 023023, 2016.

[110] S. Lloyd, M. Mohseni, and P. Rebentrost, “Quantum principal component


analysis,” Nature Physics, vol. 10, no. 9, pp. 631–633, Sep. 2014.

[111] M. Schuld and F. Petruccione, Supervised learning with quantum computers.


Springer, 2018, vol. 17.

[112] L. K. Grover, “Quantum Mechanics Helps in Searching for a Needle in a


Haystack,” Physical Review Letters, vol. 79, no. 2, pp. 325–328, Jul. 1997.

[113] G. Brassard and P. Hoyer, “An Exact Quantum Polynomial-Time Algorithm


for Simon’s Problem,” Proceedings of the Fifth Israeli Symposium on Theory
of Computing and Systems, pp. 12–23, 1997, arXiv: quant-ph/9704027.

[114] P. W. Shor, “Polynomial-Time Algorithms for Prime Factorization and Dis-


crete Logarithms on a Quantum Computer,” SIAM Journal on Computing,
vol. 26, no. 5, pp. 1484–1509, 1997, arXiv: quant-ph/9508027.

[115] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, “Imagenet: A
large-scale hierarchical image database,” in 2009 IEEE conference on computer
vision and pattern recognition. Ieee, 2009, pp. 248–255.

[116] R. Kitchin and G. McArdle, “What makes Big Data, Big Data? Exploring the
ontological characteristics of 26 datasets,” Big Data & Society, vol. 3, no. 1,
p. 2053951716631130, Jun. 2016.

[117] K. Albertsson, P. Altoe, D. Anderson, M. Andrews, J. P. A. Espinosa,


A. Aurisano, L. Basara, A. Bevan, W. Bhimji, D. Bonacorsi, P. Calafiura,
M. Campanelli, L. Capps, F. Carminati, S. Carrazza, T. Childers, E. Coni-
avitis, K. Cranmer, C. David, D. Davis, J. Duarte, M. Erdmann, J. Eschle,
A. Farbin, M. Feickert, N. F. Castro, C. Fitzpatrick, M. Floris, A. Forti,
J. Garra-Tico, J. Gemmler, M. Girone, P. Glaysher, S. Gleyzer, V. Glig-
orov, T. Golling, J. Graw, L. Gray, D. Greenwood, T. Hacker, J. Har-
vey, B. Hegner, L. Heinrich, B. Hooberman, J. Junggeburth, M. Kagan,
M. Kane, K. Kanishchev, P. Karpiński, Z. Kassabov, G. Kaul, D. Kcira,
T. Keck, A. Klimentov, J. Kowalkowski, L. Kreczko, A. Kurepin, R. Kutschke,
V. Kuznetsov, N. Köhler, I. Lakomov, K. Lannon, M. Lassnig, A. Limosani,
G. Louppe, A. Mangu, P. Mato, H. Meinhard, D. Menasce, L. Moneta,
S. Moortgat, M. Narain, M. Neubauer, H. Newman, H. Pabst, M. Paganini,
M. Paulini, G. Perdue, U. Perez, A. Picazio, J. Pivarski, H. Prosper, F. Psihas,
A. Radovic, R. Reece, A. Rinkevicius, E. Rodrigues, J. Rorie, D. Rousseau,
A. Sauers, S. Schramm, A. Schwartzman, H. Severini, P. Seyfert, F. Siroky,
K. Skazytkin, M. Sokoloff, G. Stewart, B. Stienen, I. Stockdale, G. Strong,
S. Thais, K. Tomko, E. Upfal, E. Usai, A. Ustyuzhanin, M. Vala, S. Val-
lecorsa, J. Vasel, M. Verzetti, X. Vilasís-Cardona, J.-R. Vlimant, I. Vukotic,
S.-J. Wang, G. Watts, M. Williams, W. Wu, S. Wunsch, and O. Zapata, “Ma-
chine Learning in High Energy Physics Community White Paper,” Journal of
Physics: Conference Series, vol. 1085, p. 022008, Sep. 2018.

[118] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, “BERT: Pre-training of


Deep Bidirectional Transformers for Language Understanding,” in Proceedings
of the 17th Annual Conference of the North American Chapter of the Asso-
ciation for Computational Linguistics: Human Language Technologies, 2019,
pp. 4171–4186.

[119] T. Brown, B. Mann, N. Ryder, M. Subbiah, J. D. Kaplan, P. Dhariwal,


A. Neelakantan, P. Shyam, G. Sastry, A. Askell, and others, “Language models
are few-shot learners,” Advances in Neural Information Processing Systems,
vol. 33, pp. 1877–1901, 2020.

[120] W. L. Hamilton, R. Ying, and J. Leskovec, “Representation Learning on


Graphs: Methods and Applications,” Sep. 2017, arXiv: 1709.05584.

[121] R. Bellman, “Dynamic Programming and Lagrange Multipliers,” Proceedings


of the National Academy of Sciences of the United States of America, vol. 42,
no. 10, pp. 767–769, Oct. 1956.

[122] S. Ravishankar, B. Wen, and Y. Bresler, “Online Sparsifying Transform Learn-


ing—Part I: Algorithms,” IEEE Journal of Selected Topics in Signal Process-
ing, vol. 9, no. 4, pp. 625–636, Jun. 2015, conference Name: IEEE Journal of
Selected Topics in Signal Processing.

[123] C. O. Marrero, M. Kieferova, and N. Wiebe, “Entanglement Induced Barren


Plateaus,” 2020, arXiv:2010.15968.

[124] R. Pascanu, T. Mikolov, and Y. Bengio, “On the difficulty of training recur-
rent neural networks,” in Proceedings of the 30th International Conference on
International Conference on Machine Learning - Volume 28, ser. ICML’13.
Atlanta, GA, USA: JMLR.org, Jun. 2013, pp. III–1310–III–1318.

[125] C. S. Bojer and J. P. Meldgaard, “Kaggle forecasting competitions: An over-


looked learning opportunity,” International Journal of Forecasting, vol. 37,
no. 2, pp. 587–603, Apr. 2021.

[126] K. T. Schütt, F. Arbabzadah, S. Chmiela, K. R. Müller, and A. Tkatchenko,


“Quantum-chemical insights from deep tensor neural networks,” Nature Com-
munications, vol. 8, no. 1, p. 13890, Jan. 2017, number: 1 Publisher: Nature
Publishing Group.

[127] S. Chmiela, A. Tkatchenko, H. E. Sauceda, I. Poltavsky, K. T. Schütt, and


K.-R. Müller, “Machine learning of accurate energy-conserving molecular force
fields,” Science Advances, vol. 3, no. 5, p. e1603015, May 2017.

[128] R. Ramakrishnan, M. Hartmann, E. Tapavicza, and O. A. von Lilienfeld, “Electronic spectra from TDDFT and machine learning in chemical space,” The Journal of Chemical Physics, vol. 143, no. 8, p. 084111, Aug. 2015.

[129] M. Rupp, A. Tkatchenko, K.-R. Müller, and O. A. von Lilienfeld, “Fast and Accurate Modeling of Molecular Atomization Energies with Machine Learning,” Physical Review Letters, vol. 108, no. 5, p. 058301, Jan. 2012.

[130] A. P. Bartók, M. J. Gillan, F. R. Manby, and G. Csányi, “Machine-learning approach for one- and two-body corrections to density functional theory: Applications to molecular and condensed water,” Physical Review B, vol. 88, no. 5, p. 054104, Aug. 2013.

[131] C. Sutton, L. M. Ghiringhelli, T. Yamamoto, Y. Lysogorskiy, L. Blumenthal, T. Hammerschmidt, J. R. Golebiowski, X. Liu, A. Ziletti, and M. Scheffler, “Crowd-sourcing materials-science challenges with the NOMAD 2018 Kaggle competition,” npj Computational Materials, vol. 5, no. 1, pp. 1–11, Nov. 2019.

[132] C. Nyshadham, M. Rupp, B. Bekker, A. V. Shapeev, T. Mueller, C. W. Rosenbrock, G. Csányi, D. W. Wingate, and G. L. W. Hart, “Machine-learned multi-system surrogate models for materials prediction,” npj Computational Materials, vol. 5, no. 1, pp. 1–6, Apr. 2019.

[133] W. J. Szlachta, A. P. Bartók, and G. Csányi, “Accuracy and transferability of Gaussian approximation potential models for tungsten,” Physical Review B, vol. 90, no. 10, p. 104108, Sep. 2014.

[134] F. A. Faber, A. Lindmaa, O. A. von Lilienfeld, and R. Armiento, “Machine Learning Energies of 2 Million Elpasolite (ABC2D6) Crystals,” Physical Review Letters, vol. 117, no. 13, p. 135502, Sep. 2016.

[135] J. Hoja, L. Medrano Sandonas, B. G. Ernst, A. Vazquez-Mayagoitia, R. A. DiStasio Jr., and A. Tkatchenko, “QM7-X, a comprehensive dataset of quantum-mechanical properties spanning the chemical space of small organic molecules,” Scientific Data, vol. 8, no. 1, p. 43, Feb. 2021.

[136] L. Ruddigkeit, R. van Deursen, L. C. Blum, and J.-L. Reymond, “Enumeration of 166 Billion Organic Small Molecules in the Chemical Universe Database GDB-17,” Journal of Chemical Information and Modeling, vol. 52, no. 11, pp. 2864–2875, Nov. 2012.

[137] J. S. Smith, R. Zubatyuk, B. Nebgen, N. Lubbers, K. Barros, A. E. Roitberg, O. Isayev, and S. Tretiak, “The ANI-1ccx and ANI-1x data sets, coupled-cluster and density functional theory properties for molecules,” Scientific Data, vol. 7, no. 1, p. 134, May 2020.

[138] L. Schatzki, A. Arrasmith, P. J. Coles, and M. Cerezo, “Entangled Datasets for Quantum Machine Learning,” 2021, arXiv:2109.03400.

[139] J. R. Johansson, P. D. Nation, and F. Nori, “QuTiP 2: A Python framework for the dynamics of open quantum systems,” Computer Physics Communications, vol. 184, no. 4, pp. 1234–1240, Apr. 2013.

[140] M. Broughton, G. Verdon, T. McCourt, A. J. Martinez, J. H. Yoo, S. V. Isakov, P. Massey, R. Halavati, M. Yuezhen Niu, A. Zlokapa, E. Peters, O. Lockwood, A. Skolik, S. Jerbi, V. Dunjko, M. Leib, M. Streif, D. Von Dollen, H. Chen, S. Cao, R. Wiersema, H.-Y. Huang, J. R. McClean, R. Babbush, S. Boixo, D. Bacon, A. K. Ho, H. Neven, and M. Mohseni, “TensorFlow Quantum: A software framework for quantum machine learning,” 2020, arXiv:2003.02989.

[141] Cirq Developers, “Cirq,” May 2021.

[142] N. Killoran, J. Izaac, N. Quesada, V. Bergholm, M. Amy, and C. Weedbrook, “Strawberry Fields: A Software Platform for Photonic Quantum Computing,” Quantum, vol. 3, p. 129, Mar. 2019, arXiv:1804.03159.

[143] G. Aleksandrowicz, T. Alexander, P. Barkoutsos, L. Bello, Y. Ben-Haim, D. Bucher, F. J. Cabrera-Hernández, J. Carballo-Franquis, A. Chen, C.-F. Chen, J. M. Chow, A. D. Córcoles-Gonzales, A. J. Cross, A. Cross, J. Cruz-Benito, C. Culver, S. D. L. P. González, E. D. L. Torre, D. Ding, E. Dumitrescu, I. Duran, P. Eendebak, M. Everitt, I. F. Sertage, A. Frisch, A. Fuhrer, J. Gambetta, B. G. Gago, J. Gomez-Mosquera, D. Greenberg, I. Hamamura, V. Havlicek, J. Hellmers, Ł. Herok, H. Horii, S. Hu, T. Imamichi, T. Itoko, A. Javadi-Abhari, N. Kanazawa, A. Karazeev, K. Krsulich, P. Liu, Y. Luh, Y. Maeng, M. Marques, F. J. Martín-Fernández, D. T. McClure, D. McKay, S. Meesala, A. Mezzacapo, N. Moll, D. M. Rodríguez, G. Nannicini, P. Nation, P. Ollitrault, L. J. O’Riordan, H. Paik, J. Pérez, A. Phan, M. Pistoia, V. Prutyanov, M. Reuter, J. Rice, A. R. Davila, R. H. P. Rudy, M. Ryu, N. Sathaye, C. Schnabel, E. Schoute, K. Setia, Y. Shi, A. Silva, Y. Siraichi, S. Sivarajah, J. A. Smolin, M. Soeken, H. Takahashi, I. Tavernelli, C. Taylor, P. Taylour, K. Trabing, M. Treinish, W. Turner, D. Vogt-Lee, C. Vuillot, J. A. Wildstrom, J. Wilson, E. Winston, C. Wood, S. Wood, S. Wörner, I. Y. Akhalwaya, and C. Zoufal, “Qiskit: An Open-source Framework for Quantum Computing,” Jan. 2019.

[144] H. M. Wiseman and G. J. Milburn, Quantum Measurement and Control. Cambridge University Press, 2010.

[145] B. Shaw, M. M. Wilde, O. Oreshkov, I. Kremsky, and D. A. Lidar, “Encoding one logical qubit into six physical qubits,” Physical Review A, vol. 78, no. 1, p. 012337, Jul. 2008.

[146] I. Marvian, “Restrictions on realizable unitary operations imposed by symmetry and locality,” Nature Physics, vol. 18, no. 3, pp. 283–289, 2022.

[147] H. Schättler and U. Ledzewicz, Geometric Optimal Control: Theory, Methods and Examples. Springer Science & Business Media, Jun. 2012.

[148] R. S. Gupta and M. J. Biercuk, “Machine Learning for Predictive Estimation of Qubit Dynamics Subject to Dephasing,” Physical Review Applied, vol. 9, no. 6, p. 064042, Jun. 2018.

[149] S. Mavadia, V. Frey, J. Sastrawan, S. Dona, and M. J. Biercuk, “Prediction and real-time compensation of qubit decoherence via machine learning,” Nature Communications, vol. 8, Jan. 2017.

[150] A. Bandrauk, M. Delfour, and C. Le Bris, Eds., Quantum Control: Mathematical and Numerical Challenges: CRM Workshop, October 6–11, 2002, Montréal, Canada, ser. CRM Proceedings & Lecture Notes. American Mathematical Society, 2003.

[151] L. C. L. Hollenberg, A. S. Dzurak, C. Wellard, A. R. Hamilton, D. J. Reilly, G. J. Milburn, and R. G. Clark, “Charge-based quantum computing using single donors in semiconductors,” Physical Review B, vol. 69, no. 11, p. 113301, Mar. 2004.

[152] N. Khaneja, T. Reiss, C. Kehlet, T. Schulte-Herbrüggen, and S. J. Glaser, “Optimal control of coupled spin dynamics: design of NMR pulse sequences by gradient ascent algorithms,” Journal of Magnetic Resonance, vol. 172, no. 2, pp. 296–305, 2005.

[153] A. Chatterjee, P. Stevenson, S. De Franceschi, A. Morello, N. P. de Leon, and F. Kuemmeth, “Semiconductor qubits in practice,” Nature Reviews Physics, vol. 3, no. 3, pp. 157–177, Mar. 2021.

[154] R. Srinivas, S. C. Burd, H. M. Knaack, R. T. Sutherland, A. Kwiatkowski, S. Glancy, E. Knill, D. J. Wineland, D. Leibfried, A. C. Wilson, D. T. C. Allcock, and D. H. Slichter, “High-fidelity laser-free universal control of trapped ion qubits,” Nature, vol. 597, no. 7875, pp. 209–213, Sep. 2021.

[155] G. A. Paz-Silva, L. M. Norris, F. Beaudoin, and L. Viola, “Extending comb-based spectral estimation to multiaxis quantum noise,” Physical Review A, vol. 100, no. 4, p. 042334, Oct. 2019.

[156] M. P. Wardrop and A. C. Doherty, “Exchange-based two-qubit gate for singlet-triplet qubits,” Physical Review B, vol. 90, no. 4, p. 045418, Jul. 2014.

[157] S. J. Orfanidis, Introduction to Signal Processing. Prentice-Hall, Inc., 1995.

[158] A. Youssry, G. A. Paz-Silva, and C. Ferrie, “Beyond Quantum Noise Spectroscopy: modelling and mitigating noise with quantum feature engineering,” Mar. 2020, arXiv:2003.06827 [quant-ph].

[159] B. Popescu, H. Rahman, and U. Kleinekathöfer, “Chebyshev Expansion Applied to Dissipative Quantum Systems,” The Journal of Physical Chemistry A, vol. 120, no. 19, pp. 3270–3277, May 2016.

[160] A. Dewes, F. Ong, V. Schmitt, R. Lauro, N. Boulant, P. Bertet, D. Vion, and D. Esteve, “Characterization of a two-transmon processor with individual single-shot qubit readout,” Physical Review Letters, vol. 108, no. 5, p. 057002, 2012.

[161] P. Pernot, B. Huang, and A. Savin, “Impact of non-normal error distributions on the benchmarking and ranking of quantum machine learning models,” Machine Learning: Science and Technology, vol. 1, no. 3, p. 035011, Aug. 2020.

[162] N. F. Costa, Y. Omar, A. Sultanov, and G. S. Paraoanu, “Benchmarking machine learning algorithms for adaptive quantum phase estimation with noisy intermediate-scale quantum sensors,” EPJ Quantum Technology, vol. 8, no. 1, p. 16, Dec. 2021.

[163] M. L. Wall, M. R. Abernathy, and G. Quiroz, “Generative machine learning with tensor networks: Benchmarks on near-term quantum computers,” Physical Review Research, vol. 3, no. 2, p. 023010, Apr. 2021.

[164] A. Gelman and J. Hill, Data Analysis Using Regression and Multilevel/Hierarchical Models, ser. Analytical Methods for Social Research. Cambridge University Press, 2007.
bridge University Press, 2007.
[165] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin, “Attention is All you Need,” in NIPS. Curran Associates, Inc., 2017, pp. 5998–6008.

[166] T. Chen and C. Guestrin, “XGBoost: A Scalable Tree Boosting System,” in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ser. KDD ’16. New York, NY, USA: Association for Computing Machinery, Aug. 2016, pp. 785–794.

[167] D. Greenbaum, “Introduction to Quantum Gate Set Tomography,” Sep. 2015, arXiv:1509.02921.

[168] J. Gao, L.-F. Qiao, Z.-Q. Jiao, Y.-C. Ma, C.-Q. Hu, R.-J. Ren, A.-L. Yang, H. Tang, M.-H. Yung, and X.-M. Jin, “Experimental Machine Learning of Quantum States,” Physical Review Letters, vol. 120, no. 24, p. 240501, Jun. 2018.

[169] Y. S. Teo, S. Shin, H. Jeong, Y. Kim, Y.-H. Kim, G. I. Struchalin, E. V. Kovlakov, S. S. Straupe, S. P. Kulik, G. Leuchs, and L. L. Sánchez-Soto, “Benchmarking quantum tomography completeness and fidelity with machine learning,” New Journal of Physics, vol. 23, no. 10, p. 103021, Oct. 2021.

[170] A. Youssry, C. Ferrie, and M. Tomamichel, “Efficient online quantum state estimation using a matrix-exponentiated gradient method,” New Journal of Physics, vol. 21, no. 3, p. 033006, Mar. 2019.

[171] E. Perrier, A. Youssry, and C. Ferrie, “QDataset: Quantum Datasets for Machine Learning,” 2021, arXiv:2108.06661.

[172] P. Batra, M. H. Ram, and T. S. Mahesh, “Recommender System Expedited Quantum Control Optimization,” Feb. 2022, arXiv:2201.12550 [quant-ph].

[173] C. Chen, D. Dong, H. X. Li, J. Chu, and T. J. Tarn, “Fidelity-based probabilistic Q-learning for control of quantum systems,” IEEE Transactions on Neural Networks and Learning Systems, vol. 25, May 2014.

[174] B. Flynn, A. A. Gentile, N. Wiebe, R. Santagati, and A. Laing, “Quantum Model Learning Agent: characterisation of quantum systems through machine learning,” Dec. 2021, arXiv:2112.08409 [quant-ph].

[175] N. Khaneja and S. J. Glaser, “Cartan decomposition of SU(2^n) and control of spin systems,” Chemical Physics, vol. 267, no. 1, pp. 11–23, Jun. 2001.

[176] N. Khaneja, “On Some Model Problems in Quantum Control,” Communications in Information & Systems, vol. 9, no. 1, pp. 1–40, 2009.

[177] A. Ekert, M. Ericsson, P. Hayden, H. Inamori, J. A. Jones, D. K. L. Oi, and V. Vedral, “Geometric quantum computation,” Journal of Modern Optics, vol. 47, no. 14–15, pp. 2501–2513, Nov. 2000.

[178] A. R. Brown and L. Susskind, “Complexity geometry of a single qubit,” Physical Review D, vol. 100, no. 4, p. 046020, Aug. 2019.

[179] H. W. Lin and L. Susskind, “Complexity geometry and Schwarzian dynamics,” Journal of High Energy Physics, vol. 2020, no. 1, p. 87, Jan. 2020.

[180] E. Farhi, J. Goldstone, and S. Gutmann, “A Quantum Approximate Optimization Algorithm,” Nov. 2014, arXiv:1411.4028 [quant-ph].

[181] M. A. Nielsen, “A Geometric Approach to Quantum Circuit Lower Bounds,” Quantum Info. Comput., vol. 6, no. 3, pp. 213–262, May 2006.

[182] D. D’Alessandro, “Constructive controllability of one and two spin 1/2 particles,” in Proceedings of the 2001 American Control Conference (Cat. No.01CH37148), vol. 2, Jun. 2001, pp. 1715–1720.

[183] M. A. Nielsen, M. R. Dowling, M. Gu, and A. C. Doherty, “Optimal control, geometry, and quantum computing,” Physical Review A, vol. 73, no. 6, p. 062323, Jun. 2006.

[184] M. R. Dowling and M. A. Nielsen, “The Geometry of Quantum Computation,” Quantum Info. Comput., vol. 8, no. 10, pp. 861–899, Nov. 2008.

[185] M. Gu, A. Doherty, and M. A. Nielsen, “Quantum control via geometry: An explicit example,” Physical Review A, vol. 78, no. 3, p. 032327, Sep. 2008.

[186] D. D’Alessandro, “Lie Algebraic Analysis and Control of Quantum Dynamics,” Mar. 2008, arXiv:0803.1193 [quant-ph].

[187] P. Leifer, “Quantum geometry of the Cartan control problem,” Oct. 2008, arXiv:0810.3188 [physics].

[188] B. Li, Z.-H. Yu, and S.-M. Fei, “Geometry of Quantum Computation with Qutrits,” Scientific Reports, vol. 3, p. 2594, Sep. 2013.

[189] H. N. S. Earp and J. K. Pachos, “A constructive algorithm for the Cartan decomposition of SU(2^N),” Journal of Mathematical Physics, vol. 46, no. 8, p. 082108, Aug. 2005.

[190] A. D. Boozer, “Time-optimal synthesis of SU(2) transformations for a spin-1/2 system,” Physical Review A, vol. 85, no. 1, p. 012317, Jan. 2012.

[191] W. A. de Graaf, Lie Algebras: Theory and Algorithms. Elsevier, Feb. 2000.

[192] X. Wang, M. Allegra, K. Jacobs, S. Lloyd, C. Lupo, and M. Mohseni, “Quantum Brachistochrone Curves as Geodesics: Obtaining Accurate Minimum-Time Protocols for the Control of Quantum Systems,” Physical Review Letters, vol. 114, no. 17, p. 170501, Apr. 2015.

[193] W. Huang, “An explicit family of unitaries with exponentially minimal length Pauli geodesics,” Jan. 2007, arXiv:quant-ph/0701202.

[194] L. Noakes, “A global algorithm for geodesics,” Journal of the Australian Mathematical Society, vol. 65, no. 1, pp. 37–50, Aug. 1998.

[195] K. Shizume, T. Nakajima, R. Nakayama, and Y. Takahashi, “Quantum Computational Riemannian and Sub-Riemannian Geodesics,” Progress of Theoretical Physics, vol. 127, no. 6, pp. 997–1008, Jun. 2012.

[196] H. E. Brandt, “Jacobi fields in the Riemannian geometry of quantum computation,” in Quantum Information and Computation VIII, vol. 7702. International Society for Optics and Photonics, Apr. 2010, p. 770208.

[197] ——, “Riemannian curvature in the differential geometry of quantum computation,” Physica E: Low-dimensional Systems and Nanostructures, vol. 42, no. 3, pp. 449–453, Jan. 2010.

[198] ——, “Aspects of the Riemannian geometry of quantum computation,” International Journal of Modern Physics B, vol. 26, no. 27n28, p. 1243004, Sep. 2012.

[199] ——, “Tools in the Riemannian geometry of quantum computation,” Quantum Information Processing, vol. 11, no. 3, pp. 787–839, Jun. 2012.

[200] J. P. LaSalle, “The ‘bang-bang’ principle,” IFAC Proceedings Volumes, vol. 1, no. 1, pp. 503–507, Aug. 1960.

[201] B. Lin, J. Yang, X. He, and J. Ye, “Geodesic Distance Function Learning via Heat Flow on Vector Fields,” May 2014, arXiv:1405.0133 [cs, math, stat].

[202] D. P. Kingma and J. Ba, “Adam: A Method for Stochastic Optimization,” Jan. 2017, arXiv:1412.6980 [cs].

[203] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural Computation, vol. 9, no. 8, pp. 1735–1780, 1997.

[204] A. Carlini, A. Hosoya, T. Koike, and Y. Okudaira, “Time-optimal unitary operations,” Physical Review A, vol. 75, no. 4, p. 042308, Apr. 2007.

[205] W. Press, S. Teukolsky, W. Vetterling, and B. Flannery, Numerical Recipes 3rd Edition: The Art of Scientific Computing. Cambridge University Press, 2007.

[206] M. A. Nielsen, Neural Networks and Deep Learning. Determination Press, 2015.

[207] K. Cho, B. van Merrienboer, C. Gulcehre, D. Bahdanau, F. Bougares, H. Schwenk, and Y. Bengio, “Learning phrase representations using RNN encoder-decoder for statistical machine translation,” Sep. 2014, arXiv:1406.1078.

[208] K. Cho, B. van Merrienboer, D. Bahdanau, and Y. Bengio, “On the Properties of Neural Machine Translation: Encoder-Decoder Approaches,” in Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, D. Wu, M. Carpuat, X. Carreras, and E. M. Vecchi, Eds., 2014, pp. 103–111.

[209] R. Zeier and T. Schulte-Herbrüggen, “Symmetry Principles in Quantum Systems Theory,” Journal of Mathematical Physics, vol. 52, no. 11, p. 113510, Nov. 2011, arXiv:1012.5256 [math-ph, physics:quant-ph].

[210] A. Echeverría-Enríquez, J. Marín-Solano, M. Muñoz-Lecanda, and N. Román-Roy, “Geometric reduction in optimal control theory with symmetries,” Reports on Mathematical Physics, vol. 52, no. 1, pp. 89–113, 2003.

[211] G. Dirr and U. Helmke, “Lie Theory for Quantum Control,” GAMM-Mitteilungen, vol. 31, no. 1, pp. 59–93, 2008.

[212] D. D’Alessandro and F. Albertini, “Quantum symmetries and Cartan decompositions in arbitrary dimensions,” Journal of Physics A: Mathematical and Theoretical, vol. 40, no. 10, pp. 2439–2453, Feb. 2007.

[213] V. Jurdjevic, “Optimal control, geometry, and mechanics,” Mathematical Control Theory, pp. 227–267, 1999.

[214] W.-Q. Liu, X.-J. Zhou, and H.-R. Wei, “Collective unitary evolution with linear optics by Cartan decomposition,” Europhysics Letters, 2022.

[215] M. J. Bremner, J. L. Dodd, M. A. Nielsen, and D. Bacon, “Fungible dynamics: There are only two types of entangling multiple-qubit interactions,” Physical Review A, vol. 69, no. 1, p. 012313, Jan. 2004.

[216] S. S. Bullock and G. K. Brennen, “Canonical decompositions of n-qubit quantum computations and concurrence,” Journal of Mathematical Physics, vol. 45, no. 6, pp. 2447–2467, May 2004.

[217] B. Drury and P. Love, “Constructive quantum Shannon decomposition from Cartan involutions,” Journal of Physics A: Mathematical and Theoretical, vol. 41, no. 39, p. 395305, Sep. 2008.

[218] R. R. Tucci, “An Introduction to Cartan’s KAK Decomposition for QC Programmers,” Jul. 2005, arXiv:quant-ph/0507171.

[219] T.-C. Yen and A. F. Izmaylov, “Cartan Subalgebra Approach to Efficient Measurements of Quantum Observables,” PRX Quantum, vol. 2, no. 4, p. 040320, Oct. 2021.

[220] E. Kökcü, T. Steckmann, J. Freericks, E. F. Dumitrescu, and A. F. Kemper, “Fixed Depth Hamiltonian Simulation via Cartan Decomposition,” 2021, arXiv:2104.00728.

[221] T. Steckmann, T. Keen, A. F. Kemper, E. F. Dumitrescu, and Y. Wang, “Simulating the Mott transition on a noisy digital quantum computer via Cartan-based fast-forwarding circuits,” 2021, arXiv:2112.05688.

[222] S. S. Bullock, G. K. Brennen, and D. P. O’Leary, “Time reversal and n-qubit canonical decompositions,” Journal of Mathematical Physics, vol. 46, no. 6, p. 062104, 2005.

[223] Wikipedia, “Cartan’s equivalence method,” Oct. 2018, page version ID: 862708484.

[224] Z.-Y. Su, “A Scheme of Cartan Decomposition for su(N),” Mar. 2006, arXiv:quant-ph/0603190.

[225] S. S. Bullock, “Note on the Khaneja Glaser Decomposition,” Mar. 2004, arXiv:quant-ph/0403141.

[226] C. Gokler, S. Lloyd, P. Shor, and K. Thompson, “Efficiently Controllable Graphs,” Physical Review Letters, vol. 118, no. 26, p. 260501, Jun. 2017.

[227] E. Kökcü, T. Steckmann, Y. Wang, J. K. Freericks, E. F. Dumitrescu, and A. F. Kemper, “Fixed Depth Hamiltonian Simulation via Cartan Decomposition,” Physical Review Letters, vol. 129, no. 7, p. 070501, Aug. 2022, arXiv:2104.00728 [cond-mat, physics:quant-ph].

[228] G. K. Brennen, “An observable measure of entanglement for pure states of multi-qubit systems,” 2003, arXiv:quant-ph/0305094.

[229] F. Albertini, D. D’Alessandro, and B. Sheller, “Sub-Riemannian Geodesics in SU(n)/S(U(n-1) × U(1)) and Optimal Control of Three Level Quantum Systems,” IEEE Transactions on Automatic Control, vol. 65, no. 3, pp. 1176–1191, 2020.

[230] A. W. Knapp, Representation Theory of Semisimple Groups: An Overview Based on Examples. Princeton: Princeton University Press, 2001.

[231] R. Hermann, Lie Groups for Physicists, ser. Mathematical Physics Monograph Series. W. A. Benjamin, 1966.

[232] R. N. Cahn, Semi-simple Lie Algebras and Their Representations. Courier Corporation, 2014.

[233] T.-L. Wang, L.-N. Wu, W. Yang, G.-R. Jin, N. Lambert, and F. Nori, “Quantum Fisher information as a signature of the superradiant quantum phase transition,” New Journal of Physics, vol. 16, p. 063039, 2014.

[234] M. Byrd, “The Geometry of SU(3),” Aug. 1997, arXiv:physics/9708015.

[235] B. C. Hall, Lie Groups, Lie Algebras, and Representations. Springer, 2013.

[236] R. P. Feynman, “Simulating physics with computers,” International Journal of Theoretical Physics, vol. 21, no. 6, pp. 467–488, Jun. 1982.

[237] C. Shannon, “A mathematical theory of communication,” Bell System Technical Journal, vol. 27, pp. 379–423, 1948.

[238] M. A. Nielsen and I. L. Chuang, Quantum Computation and Quantum Information, 10th anniversary ed. Cambridge University Press, 2010.

[239] J. R. Schue, “Hilbert space methods in the theory of Lie algebras,” Transactions of the American Mathematical Society, vol. 95, no. 1, pp. 69–80, 1960.

[240] J. Watrous, “Quantum statistical zero-knowledge,” 2002, arXiv:quant-ph/0202111.

[241] I. Bengtsson and K. Życzkowski, Geometry of Quantum States: An Introduction to Quantum Entanglement. Cambridge University Press, 2006.

[242] J. J. Sakurai, Modern Quantum Mechanics, revised ed. Addison-Wesley, 1994.

[243] A. M. Childs, Y. Su, M. C. Tran, N. Wiebe, and S. Zhu, “Theory of Trotter error with commutator scaling,” Physical Review X, vol. 11, no. 1, p. 011020, 2021.

[244] M. Suzuki, “Generalized Trotter’s formula and systematic approximants of exponential operators and inner derivations with applications to many-body problems,” Communications in Mathematical Physics, vol. 51, no. 2, pp. 183–190, 1976.

[245] K. Kraus, A. Böhm, J. D. Dollard, and W. Wootters, States, Effects, and Operations: Fundamental Notions of Quantum Theory, ser. Lectures in Mathematical Physics at the University of Texas at Austin. Springer, 1983.

[246] K. E. Hellwig and K. Kraus, “Pure operations and measurements,” Communications in Mathematical Physics, vol. 11, no. 3, pp. 214–220, Sep. 1969.

[247] R. V. Hartley, “Transmission of information,” Bell System Technical Journal, vol. 7, no. 3, pp. 535–563, 1928.

[248] T. M. Cover, P. Gacs, and R. M. Gray, “Kolmogorov’s contributions to information theory and algorithmic complexity,” The Annals of Probability, vol. 17, no. 3, pp. 840–865, 1989.

[249] A. Uhlmann, “The ‘transition probability’ in the state space of a *-algebra,” Reports on Mathematical Physics, vol. 9, no. 2, pp. 273–279, 1976.

[250] J. F. Doriguello and A. Montanaro, “Quantum sketching protocols for Hamming distance and beyond,” Physical Review A, vol. 99, no. 6, p. 062331, Jun. 2019.

[251] G. Verdon, J. Marks, S. Nanda, S. Leichenauer, and J. Hidary, “Quantum Hamiltonian-Based Models and the Variational Quantum Thermalizer Algorithm,” Oct. 2019, arXiv:1910.02071 [quant-ph].

[252] Z. Liu and C. Zheng, “Recurrence Theorem for Open Quantum Systems,” Feb. 2024, arXiv:2402.19143 [quant-ph].

[253] A. A. Clerk, “Quantum noise and quantum measurement,” in Quantum Machines: Measurement and Control of Engineered Quantum Systems, 2008.

[254] A. A. Clerk, M. H. Devoret, S. M. Girvin, F. Marquardt, and R. J. Schoelkopf, “Introduction to quantum noise, measurement, and amplification,” Reviews of Modern Physics, vol. 82, no. 2, pp. 1155–1208, Apr. 2010.

[255] B. Collins and P. Śniady, “Integration with respect to the Haar measure on unitary, orthogonal and symplectic group,” Communications in Mathematical Physics, vol. 264, no. 3, pp. 773–795, 2006.

[256] M. Larocca, N. Ju, D. García-Martín, P. J. Coles, and M. Cerezo, “Theory of overparametrization in quantum neural networks,” Nature Computational Science, vol. 3, no. 6, pp. 542–551, Jun. 2023.

[257] C. J. Wood, J. D. Biamonte, and D. G. Cory, “Tensor networks and graphical calculus for open quantum systems,” Quantum Info. Comput., vol. 15, pp. 0579–0811, 2015.

[258] R. Orús, “A practical introduction to tensor networks: Matrix product states and projected entangled pair states,” Annals of Physics, vol. 349, pp. 117–158, 2014.

[259] J. J. Meyer, M. Mularski, E. Gil-Fuster, A. A. Mele, F. Arzani, A. Wilms, and J. Eisert, “Exploiting symmetry in variational quantum machine learning,” May 2022, arXiv:2205.06217 [quant-ph].

[260] T. S. Cohen, “Equivariant convolutional networks,” PhD thesis, University of Amsterdam, 2021.

[261] P. Mernyei, K. Meichanetzidis, and I. I. Ceylan, “Equivariant quantum graph circuits,” in International Conference on Machine Learning. PMLR, 2022, pp. 15401–15420.

[262] W. Ambrose and I. M. Singer, “A theorem on holonomy,” Transactions of the American Mathematical Society, vol. 75, no. 3, pp. 428–443, 1953.

[263] N. de Silva and R. S. Barbosa, “Contextuality and noncommutative geometry in quantum mechanics,” Communications in Mathematical Physics, vol. 365, no. 2, pp. 375–429, Jan. 2019.

[264] T. Tao, “The Euler-Arnold equation,” blog post, Jun. 2010.

[265] S. Aaronson, “The learnability of quantum states,” Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences, vol. 463, no. 2088, pp. 3089–3114, Dec. 2007.

[266] S. Villar, D. W. Hogg, K. Storey-Fisher, W. Yao, and B. Blum-Smith, “Scalars are universal: Equivariant machine learning, structured like classical physics,” Advances in Neural Information Processing Systems, vol. 34, pp. 28848–28863, 2021.

[267] A. Skolik, M. Cattelan, S. Yarkoni, T. Bäck, and V. Dunjko, “Equivariant quantum circuits for learning on weighted graphs,” 2022, arXiv:2205.06109.

[268] J. Cerezo, J. Kubelka, R. Robbes, and A. Bergel, “Building an Expert Recommender Chatbot,” May 2019, pp. 59–63.

[269] M. Cerezo, A. Sone, T. Volkoff, L. Cincio, and P. J. Coles, “Cost function dependent barren plateaus in shallow parametrized quantum circuits,” Nature Communications, vol. 12, no. 1, pp. 1–12, 2021.

[270] M. Cerezo, A. Arrasmith, R. Babbush, S. C. Benjamin, S. Endo, K. Fujii, J. R. McClean, K. Mitarai, X. Yuan, L. Cincio, and P. J. Coles, “Variational quantum algorithms,” Nature Reviews Physics, vol. 3, no. 1, pp. 625–644, 2021.

[271] S. Aaronson, Quantum Computing Since Democritus. Cambridge University Press, 2013.

[272] P. McCullagh and J. A. Nelder, Generalized Linear Models. Chapman & Hall, 1989.

[273] N. Cristianini, J. Shawe-Taylor, and others, An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods. Cambridge University Press, 2000.

[274] T. Wolf, L. Debut, V. Sanh, J. Chaumond, C. Delangue, A. Moi, P. Cistac, T. Rault, R. Louf, M. Funtowicz, J. Davison, S. Shleifer, P. von Platen, C. Ma, Y. Jernite, J. Plu, C. Xu, T. Le Scao, S. Gugger, M. Drame, Q. Lhoest, and A. Rush, “Transformers: State-of-the-Art Natural Language Processing,” in EMNLP Demo. ACL, 2020.

[275] D. H. Wolpert and W. G. Macready, “No free lunch theorems for optimization,” IEEE Transactions on Evolutionary Computation, vol. 1, no. 1, pp. 67–82, Apr. 1997.

[276] J. H. Friedman and W. Stuetzle, “Projection pursuit regression,” Journal of the American Statistical Association, vol. 76, no. 376, pp. 817–823, 1981.

[277] E. Fiesler, “Neural network classification and formalization,” Computer Standards & Interfaces, vol. 16, no. 3, pp. 231–239, Jul. 1994.

[278] A. Martins and R. Astudillo, “From Softmax to Sparsemax: A Sparse Model of Attention and Multi-Label Classification,” in ICML, vol. 48. JMLR.org, 2016, pp. 1614–1623.

[279] K. Hornik, M. Stinchcombe, and H. White, “Multilayer feedforward networks are universal approximators,” Neural Networks, vol. 2, no. 5, pp. 359–366, 1989.

[280] D. E. Rumelhart, G. E. Hinton, and R. J. Williams, “Learning representations by back-propagating errors,” Nature, vol. 323, no. 6088, pp. 533–536, 1986.

[281] P. J. Werbos, “Backpropagation through time: what it does and how to do it,” Proceedings of the IEEE, vol. 78, no. 10, pp. 1550–1560, 1990.

[282] Y. LeCun, J. S. Denker, and S. A. Solla, “Optimal Brain Damage,” in Advances in Neural Information Processing Systems, 1990, pp. 598–605.

[283] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based learning applied to document recognition,” Proceedings of the IEEE, vol. 86, no. 11, pp. 2278–2324, 1998.

[284] K. Beer, M. Khosla, J. Köhler, and T. J. Osborne, “Quantum machine learning of graph-structured data,” Mar. 2021, arXiv:2103.10837 [quant-ph].

[285] R. Beals, H. Buhrman, R. Cleve, M. Mosca, and R. de Wolf, “Quantum lower bounds by polynomials,” Journal of the ACM, vol. 48, no. 4, pp. 778–797, Jul. 2001.

[286] F. Nielsen and F. Barbaresco, Geometric Science of Information. Springer, 2015.

[287] F. Nielsen, “An elementary introduction to information geometry,” Entropy, vol. 22, no. 10, p. 1100, 2020.

[288] S.-I. Amari, “Natural gradient works efficiently in learning,” Neural Computation, vol. 10, no. 2, pp. 251–276, 1998.

[289] S. Bonnabel, “Stochastic gradient descent on Riemannian manifolds,” IEEE Transactions on Automatic Control, vol. 58, no. 9, pp. 2217–2229, 2013.

[290] C. Zhang, S. Bengio, M. Hardt, B. Recht, and O. Vinyals, “Understanding deep learning (still) requires rethinking generalization,” Communications of the ACM, vol. 64, no. 3, pp. 107–115, 2021.

[291] S. Watanabe, Algebraic Geometry and Statistical Learning Theory. Cambridge University Press, 2009, vol. 25.

[292] S. Wei, D. Murfet, M. Gong, H. Li, J. Gell-Redman, and T. Quella, “Deep learning is singular, and that’s good,” IEEE Transactions on Neural Networks and Learning Systems, vol. 34, no. 12, pp. 10473–10486, 2022.

[293] S. Aaronson, “How Much Structure Is Needed for Huge Quantum Speedups?” Sep. 2022, arXiv:2209.06930 [quant-ph].

[294] M. Schuld, I. Sinayskiy, and F. Petruccione, “The quest for a quantum neural network,” Quantum Information Processing, vol. 13, no. 11, pp. 2567–2586, 2014.

[295] ——, “Quantum computing for pattern classification,” in Trends in Artificial Intelligence, 2014, pp. 208–220, arXiv:1412.3646.

[296] D. Boob, S. S. Dey, and G. Lan, “Complexity of training ReLU neural network,” Discrete Optimization, p. 100620, 2020.

[297] S. Kak, “On quantum neural computing,” Information Sciences, vol. 83, pp. 143–160, 1995.

[298] M. Peruš, “Neural networks as a basis for quantum associative networks,” Neural Network World, vol. 10, pp. 1001–1013, 2000.

[299] E. C. Behrman, J. Niemel, J. E. Steck, and S. R. Skinner, “A quantum dot neural network,” 1996.

[300] D. S. Abrams and S. Lloyd, “Quantum algorithm providing exponential speed increase for finding eigenvalues and eigenvectors,” Physical Review Letters, vol. 83, no. 24, p. 5162, 1999.

[301] H.-Y. Huang, R. Kueng, and J. Preskill, “Information-Theoretic Bounds on Quantum Advantage in Machine Learning,” Physical Review Letters, vol. 126, no. 19, p. 190505, May 2021.

[302] Y. Du, M.-H. Hsieh, T. Liu, S. You, and D. Tao, “On the learnability of quantum neural networks,” 2020, arXiv:2007.12369.

[303] M. Raghu, B. Poole, J. Kleinberg, S. Ganguli, and J. Sohl-Dickstein, “On the expressive power of deep neural networks,” in International Conference on Machine Learning. PMLR, 2017, pp. 2847–2854.

[304] C.-C. Chen, M. Watabe, K. Shiba, M. Sogabe, K. Sakamoto, and T. Sogabe, “On the Expressibility and Overfitting of Quantum Circuit Learning,” ACM Transactions on Quantum Computing, vol. 2, no. 2, pp. 1–24, 2021.

[305] S. Sim, P. D. Johnson, and A. Aspuru-Guzik, “Expressibility and Entangling Capability of Parameterized Quantum Circuits for Hybrid Quantum-Classical Algorithms,” Advanced Quantum Technologies, vol. 2, no. 12, p. 1900070, 2019.

[306] K. Nakaji and N. Yamamoto, “Expressibility of the alternating layered ansatz for quantum computation,” Quantum, vol. 5, p. 434, 2021.

[307] C. Ciliberto, A. Rocchetto, A. Rudi, and L. Wossnig, “Statistical limits of supervised quantum learning,” Physical Review A, vol. 102, no. 4, p. 042414, Oct. 2020.

[308] Z. Holmes, A. Arrasmith, B. Yan, P. J. Coles, A. Albrecht, and A. T. Sornborger, “Barren plateaus preclude learning scramblers,” Physical Review Letters, vol. 126, no. 19, p. 190501, 2021.

[309] Z. Holmes, K. Sharma, M. Cerezo, and P. J. Coles, “Connecting ansatz expressibility to gradient magnitudes and barren plateaus,” PRX Quantum, vol. 3, no. 1, p. 010313, Jan. 2022.

[310] Z. Liu, L.-W. Yu, L.-M. Duan, and D.-L. Deng, “The Presence and Absence of Barren Plateaus in Tensor-network Based Machine Learning,” 2021, arXiv:2108.08312.

[311] T. L. Patti, K. Najafi, X. Gao, and S. F. Yelin, “Entanglement devised barren plateau mitigation,” Physical Review Research, vol. 3, no. 3, p. 033090, 2021.

[312] C. O. Marrero, M. Kieferová, and N. Wiebe, “Entanglement-induced barren plateaus,” PRX Quantum, vol. 2, no. 4, p. 040316, 2021.

[313] A. Rad, A. Seif, and N. M. Linke, “Surviving The Barren Plateau in Variational Quantum Circuits with Bayesian Learning Initialization,” 2022, arXiv:2203.02464.

[314] M. Larocca, P. Czarnik, K. Sharma, G. Muraleedharan, P. J. Coles, and M. Cerezo, “Diagnosing Barren Plateaus with Tools from Quantum Optimal Control,” Quantum, vol. 6, p. 824, Sep. 2022.

[315] A. A. Mele, G. B. Mbeng, G. E. Santoro, M. Collura, and P. Torta, “Avoiding barren plateaus via transferability of smooth solutions in Hamiltonian Variational Ansatz,” 2022, arXiv:2206.01982.

[316] S. Wang, E. Fontana, M. Cerezo, K. Sharma, A. Sone, L. Cincio, and P. J. Coles, “Noise-induced barren plateaus in variational quantum algorithms,” Nature Communications, vol. 12, no. 1, pp. 1–11, 2021.

[317] A. Pesah, M. Cerezo, S. Wang, T. Volkoff, A. T. Sornborger, and P. J. Coles, “Absence of Barren Plateaus in Quantum Convolutional Neural Networks,” Physical Review X, vol. 11, no. 4, p. 041011, 2021.

[318] T. Haug and M. Kim, “Optimal training of variational quantum algorithms without barren plateaus,” 2021, arXiv:2104.14543.

[319] H.-Y. Liu, T.-P. Sun, Y.-C. Wu, Y.-J. Han, and G.-P. Guo, “A Parameter Initialization Method for Variational Quantum Algorithms to Mitigate Barren Plateaus Based on Transfer Learning,” 2021, arXiv:2112.10952.

[320] K. Zhang, M.-H. Hsieh, L. Liu, and D. Tao, “Gaussian initializations help deep variational quantum circuits escape from the barren plateau,” 2022, arXiv:2203.09376.

[321] C. Zhao and X.-S. Gao, “Analyzing the barren plateau phenomenon in training quantum neural networks with the ZX-calculus,” Quantum, vol. 5, p. 466, Jun. 2021.

[322] M. Bartholomew-Biggs, S. Brown, B. Christianson, and L. Dixon, “Automatic differentiation of algorithms,” Journal of Computational and Applied Mathematics, vol. 124, no. 1, pp. 171–190, Dec. 2000.

[323] S. P. Jordan, “Fast quantum algorithm for numerical gradient estimation,” Physical Review Letters, vol. 95, no. 5, p. 050501, Jul. 2005, arXiv:quant-ph/0405146.

[324] A. Gilyén, S. Arunachalam, and N. Wiebe, “Optimizing quantum optimization algorithms via faster quantum gradient computation,” Nov. 2017, arXiv:1711.00465.

[325] K. Mitarai, M. Negoro, M. Kitagawa, and K. Fujii, “Quantum circuit learning,” Physical Review A, vol. 98, no. 3, p. 032309, 2018.

[326] D. Wierichs, J. Izaac, C. Wang, and C. Y.-Y. Lin, “General parameter-shift rules for quantum gradients,” Quantum, 2022.

[327] J. Stokes, J. Izaac, N. Killoran, and G. Carleo, “Quantum natural gradient,” Quantum, vol. 4, p. 269, 2020.

[328] K. Beer, D. Bondarenko, T. Farrelly, T. J. Osborne, R. Salzmann, D. Scheiermann, and R. Wolf, “Training deep quantum neural networks,” Nature Communications, vol. 11, no. 1, p. 808, 2020.

[329] A. Moskalev, A. Sepliarskaia, I. Sosnovik, and A. Smeulders, “LieGG: Studying Learned Lie Group Generators,” Jan. 2023, arXiv:2210.04345 [cs].

[330] M. Lu and F. Li, “Survey on Lie group machine learning,” Big Data Mining and Analytics, vol. 3, no. 4, pp. 235–258, Dec. 2020.

[331] R. Bellman, Introduction to the Mathematical Theory of Control Processes: Linear Equations and Quadratic Criteria. Elsevier, 2016.

[332] F. Li and H. Xu, “The theory framework of Lie group machine learning (LML),” Computer Technology and Application, vol. 1, no. 3, pp. 62–80, 2007.

[333] L. Schatzki, M. Larocca, F. Sauvage, and M. Cerezo, “Theoretical Guarantees for Permutation-Equivariant Quantum Neural Networks,” manuscript in preparation, 2022.

[334] M. Arjovsky, L. Bottou, I. Gulrajani, and D. Lopez-Paz, “Invariant risk minimization,” 2019, arXiv:1907.02893.
