100% found this document useful (8 votes)
72 views83 pages

Programming Languages and Systems 21st European Symposium on Programming ESOP 2012 Held as Part of the European Joint Conferences on Theory and Practice of Software ETAPS 2012 Tallinn Estonia March 24 April 1 2012 Proceedings 1st Edition by Helmut Seidl 3642288693 9783642288692 - Download the ebook now and own the full detailed content

The document provides information about various eBooks available for download at ebookball.com, including titles related to programming languages, software science, and nursing. It highlights the 31st European Symposium on Programming (ESOP 2022) and the 25th European Joint Conferences on Theory and Practice of Software (ETAPS 2022), detailing their organization, acceptance rates, and featured speakers. Additionally, it mentions the optional artifact evaluation introduced at ESOP 2022 for accepted papers.

Uploaded by

galyeatusi
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
100% found this document useful (8 votes)
72 views83 pages

Programming Languages and Systems 21st European Symposium on Programming ESOP 2012 Held as Part of the European Joint Conferences on Theory and Practice of Software ETAPS 2012 Tallinn Estonia March 24 April 1 2012 Proceedings 1st Edition by Helmut Seidl 3642288693 9783642288692 - Download the ebook now and own the full detailed content

The document provides information about various eBooks available for download at ebookball.com, including titles related to programming languages, software science, and nursing. It highlights the 31st European Symposium on Programming (ESOP 2022) and the 25th European Joint Conferences on Theory and Practice of Software (ETAPS 2022), detailing their organization, acceptance rates, and featured speakers. Additionally, it mentions the optional artifact evaluation introduced at ESOP 2022 for accepted papers.

Uploaded by

galyeatusi
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 83

Quick and Easy Ebook Downloads – Start Now at ebookball.

com for Instant Access

Programming Languages and Systems 21st European


Symposium on Programming ESOP 2012 Held as Part of
the European Joint Conferences on Theory and
Practice of Software ETAPS 2012 Tallinn Estonia
March 24 April 1 2012 Proceedings 1st Edition by
Helmut Seidl 3642288693 9783642288692
https://ebookball.com/product/programming-languages-and-
systems-21st-european-symposium-on-programming-
esop-2012-held-as-part-of-the-european-joint-conferences-on-
theory-and-practice-of-software-etaps-2012-tallinn-estonia-
march-24/

OR CLICK BUTTON

DOWLOAD NOW

Instantly Access and Download Textbook at https://ebookball.com


Your digital treasures (PDF, ePub, MOBI) await
Download instantly and pick your perfect format...

Read anywhere, anytime, on any device!

Foundations of Software Science and Computation Structures


25th International Conference, FOSSACS 2022 Held as Part
of the European Joint Conferences on Theory and Practice
of Software, ETAPS 2022 Munich, Germany, April 2–7, 2022
https://ebookball.com/product/foundations-of-software-science-and-
Proceedings 1st edition by Patricia Bouyer, Lutz Schröder
computation-structures-25th-international-conference-
fossacs-2022-held-as-part-of-the-european-joint-conferences-on-theory-
366254458X 9783662544587
and-practice-of-software-etaps-2022-munic/
ebookball.com

Principles of Security and Trust 7th International


Conference POST 2018 Held as Part of the European Joint
Conferences on Theory and Practice of Software ETAPS 2018
Thessaloniki Greece April 14 20 2018 Proceedings 7th
https://ebookball.com/product/principles-of-security-and-trust-7th-
ediion by Lujo Bauer, Ralf Küsters ISBN 3319897217
international-conference-post-2018-held-as-part-of-the-european-joint-
conferences-on-theory-and-practice-of-software-
978-3319897219
etaps-2018-thessaloniki-greece-april-14-20-201/
ebookball.com

Windows Command Line for Windows 8 1 Windows Server 2012


Windows Server 2012 R2 1st Edition by William Stanek ISBN
B00S7AWSIC
https://ebookball.com/product/windows-command-line-for-
windows-8-1-windows-server-2012-windows-server-2012-r2-1st-edition-by-
william-stanek-isbn-b00s7awsic-15962/

ebookball.com

Advances in Computing and Information Technology


Proceedings of the Second International Conference on
Advances in Computing and Information Technology July 13
15 2012 Chennai India 1st Edition by Kathy Schwalbe ISBN
https://ebookball.com/product/advances-in-computing-and-information-
B004U0Y3B8
technology-proceedings-of-the-second-international-conference-on-
advances-in-computing-and-information-technology-
july-13-15-2012-chennai-india-1st-edition-by-kat/
ebookball.com
Quintessence of Dental Technology 2012 1st Edition by
Duarte, Sillas ISBN 0867155620 9780867155624

https://ebookball.com/product/quintessence-of-dental-
technology-2012-1st-edition-by-duarte-sillas-
isbn-0867155620-9780867155624-7054/

ebookball.com

Knowledge based Software Engineering Proceedings of the


Seventh Joint Conference On Knowledge Based Software
Engineering 1st Edition by Enn Tyugu, Takahira Yamaguchi
ISBN 1586036408 9781586036409
https://ebookball.com/product/knowledge-based-software-engineering-
proceedings-of-the-seventh-joint-conference-on-knowledge-based-
software-engineering-1st-edition-by-enn-tyugu-takahira-yamaguchi-
isbn-1586036408-9781586036409-19744/
ebookball.com

Nursing Diagnoses Definitions and Classification 2012 14


1st edition by NANDA International ISBN 1118301935
9781118301937
https://ebookball.com/product/nursing-diagnoses-definitions-and-
classification-2012-14-1st-edition-by-nanda-international-
isbn-1118301935-9781118301937-2238/

ebookball.com

Software Engineering And Algorithms Proceedings


Proceedings of 10th Computer Science On-line Conference
2021 Vol 1 10th Edition by Radek Silhavy 3030774414
9783030774417
https://ebookball.com/product/software-engineering-and-algorithms-
proceedings-proceedings-of-10th-computer-science-on-line-
conference-2021-vol-1-10th-edition-by-radek-
silhavy-3030774414-9783030774417-16062/
ebookball.com

Lippincotts Nursing Drug Guide 2012 1st Edition by Amy


Karch 1609136217 978-1609136215

https://ebookball.com/product/lippincotts-nursing-drug-guide-2012-1st-
edition-by-amy-karch-1609136217-978-1609136215-1392/

ebookball.com
ARCoSS Ilya Sergey (Ed.)

Programming
LNCS 13240

Languages
and Systems
31st European Symposium on Programming, ESOP 2022
Held as Part of the European Joint Conferences
on Theory and Practice of Software, ETAPS 2022
Munich, Germany, April 2–7, 2022
Proceedings
Lecture Notes in Computer Science 13240
Founding Editors
Gerhard Goos, Germany
Juris Hartmanis, USA

Editorial Board Members


Elisa Bertino, USA Gerhard Woeginger , Germany
Wen Gao, China Moti Yung , USA
Bernhard Steffen , Germany

Advanced Research in Computing and Software Science


Subline of Lecture Notes in Computer Science

Subline Series Editors


Giorgio Ausiello, University of Rome ‘La Sapienza’, Italy
Vladimiro Sassone, University of Southampton, UK

Subline Advisory Board


Susanne Albers, TU Munich, Germany
Benjamin C. Pierce, University of Pennsylvania, USA
Bernhard Steffen , University of Dortmund, Germany
Deng Xiaotie, Peking University, Beijing, China
Jeannette M. Wing, Microsoft Research, Redmond, WA, USA
More information about this series at https://link.springer.com/bookseries/558
Ilya Sergey (Ed.)

Programming
Languages
and Systems
31st European Symposium on Programming, ESOP 2022
Held as Part of the European Joint Conferences
on Theory and Practice of Software, ETAPS 2022
Munich, Germany, April 2–7, 2022
Proceedings

123
Editor
Ilya Sergey
National University of Singapore
Singapore, Singapore

ISSN 0302-9743 ISSN 1611-3349 (electronic)


Lecture Notes in Computer Science
ISBN 978-3-030-99335-1 ISBN 978-3-030-99336-8 (eBook)
https://doi.org/10.1007/978-3-030-99336-8
© The Editor(s) (if applicable) and The Author(s) 2022. This book is an open access publication.
Open Access This book is licensed under the terms of the Creative Commons Attribution 4.0 International
License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution
and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and
the source, provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this book are included in the book’s Creative Commons license,
unless indicated otherwise in a credit line to the material. If material is not included in the book’s Creative
Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use,
you will need to obtain permission directly from the copyright holder.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this book are
believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors
give a warranty, expressed or implied, with respect to the material contained herein or for any errors or
omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in
published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
ETAPS Foreword

Welcome to the 25th ETAPS! ETAPS 2022 took place in Munich, the beautiful capital
of Bavaria, in Germany.
ETAPS 2022 is the 25th instance of the European Joint Conferences on Theory and
Practice of Software. ETAPS is an annual federated conference established in 1998,
and consists of four conferences: ESOP, FASE, FoSSaCS, and TACAS. Each
conference has its own Program Committee (PC) and its own Steering Committee
(SC). The conferences cover various aspects of software systems, ranging from theo-
retical computer science to foundations of programming languages, analysis tools, and
formal approaches to software engineering. Organizing these conferences in a coherent,
highly synchronized conference program enables researchers to participate in an
exciting event, having the possibility to meet many colleagues working in different
directions in the field, and to easily attend talks of different conferences. On the
weekend before the main conference, numerous satellite workshops took place that
attract many researchers from all over the globe.
ETAPS 2022 received 362 submissions in total, 111 of which were accepted,
yielding an overall acceptance rate of 30.7%. I thank all the authors for their interest in
ETAPS, all the reviewers for their reviewing efforts, the PC members for their con-
tributions, and in particular the PC (co-)chairs for their hard work in running this entire
intensive process. Last but not least, my congratulations to all authors of the accepted
papers!
ETAPS 2022 featured the unifying invited speakers Alexandra Silva (University
College London, UK, and Cornell University, USA) and Tomáš Vojnar (Brno
University of Technology, Czech Republic) and the conference-specific invited
speakers Nathalie Bertrand (Inria Rennes, France) for FoSSaCS and Lenore Zuck
(University of Illinois at Chicago, USA) for TACAS. Invited tutorials were provided by
Stacey Jeffery (CWI and QuSoft, The Netherlands) on quantum computing and
Nicholas Lane (University of Cambridge and Samsung AI Lab, UK) on federated
learning.
As this event was the 25th edition of ETAPS, part of the program was a special
celebration where we looked back on the achievements of ETAPS and its constituting
conferences in the past, but we also looked into the future, and discussed the challenges
ahead for research in software science. This edition also reinstated the ETAPS men-
toring workshop for PhD students.
ETAPS 2022 took place in Munich, Germany, and was organized jointly by the
Technical University of Munich (TUM) and the LMU Munich. The former was
founded in 1868, and the latter in 1472 as the 6th oldest German university still running
today. Together, they have 100,000 enrolled students, regularly rank among the top
100 universities worldwide (with TUM’s computer-science department ranked #1 in
the European Union), and their researchers and alumni include 60 Nobel laureates.
vi ETAPS Foreword

The local organization team consisted of Jan Křetínský (general chair), Dirk Beyer
(general, financial, and workshop chair), Julia Eisentraut (organization chair), and
Alexandros Evangelidis (local proceedings chair).
ETAPS 2022 was further supported by the following associations and societies:
ETAPS e.V., EATCS (European Association for Theoretical Computer Science),
EAPLS (European Association for Programming Languages and Systems), and EASST
(European Association of Software Science and Technology).
The ETAPS Steering Committee consists of an Executive Board, and representa-
tives of the individual ETAPS conferences, as well as representatives of EATCS,
EAPLS, and EASST. The Executive Board consists of Holger Hermanns
(Saarbrücken), Marieke Huisman (Twente, chair), Jan Kofroň (Prague), Barbara König
(Duisburg), Thomas Noll (Aachen), Caterina Urban (Paris), Tarmo Uustalu (Reykjavik
and Tallinn), and Lenore Zuck (Chicago).
Other members of the Steering Committee are Patricia Bouyer (Paris), Einar Broch
Johnsen (Oslo), Dana Fisman (Be’er Sheva), Reiko Heckel (Leicester), Joost-Pieter
Katoen (Aachen and Twente), Fabrice Kordon (Paris), Jan Křetínský (Munich), Orna
Kupferman (Jerusalem), Leen Lambers (Cottbus), Tiziana Margaria (Limerick),
Andrew M. Pitts (Cambridge), Elizabeth Polgreen (Edinburgh), Grigore Roşu (Illinois),
Peter Ryan (Luxembourg), Sriram Sankaranarayanan (Boulder), Don Sannella
(Edinburgh), Lutz Schröder (Erlangen), Ilya Sergey (Singapore), Natasha Sharygina
(Lugano), Pawel Sobocinski (Tallinn), Peter Thiemann (Freiburg), Sebastián Uchitel
(London and Buenos Aires), Jan Vitek (Prague), Andrzej Wasowski (Copenhagen),
Thomas Wies (New York), Anton Wijs (Eindhoven), and Manuel Wimmer (Linz).
I’d like to take this opportunity to thank all authors, attendees, organizers of the
satellite workshops, and Springer-Verlag GmbH for their support. I hope you all
enjoyed ETAPS 2022.
Finally, a big thanks to Jan, Julia, Dirk, and their local organization team for all their
enormous efforts to make ETAPS a fantastic event.

February 2022 Marieke Huisman


ETAPS SC Chair
ETAPS e.V. President
Preface

This volume contains the papers accepted at the 31st European Symposium on
Programming (ESOP 2022), held during April 5–7, 2022, in Munich, Germany
(COVID-19 permitting). ESOP is one of the European Joint Conferences on Theory
and Practice of Software (ETAPS); it is dedicated to fundamental issues in the spec-
ification, design, analysis, and implementation of programming languages and systems.
The 21 papers in this volume were selected by the Program Committee (PC) from
64 submissions. Each submission received between three and four reviews. After
receiving the initial reviews, the authors had a chance to respond to questions and
clarify misunderstandings of the reviewers. After the author response period, the papers
were discussed electronically using the HotCRP system by the 33 Program Committee
members and 33 external reviewers. Two papers, for which the PC chair had a conflict
of interest, were kindly managed by Zena Ariola. The reviewing for ESOP 2022 was
double-anonymous, and only authors of the eventually accepted papers have been
revealed.
Following the example set by other major conferences in programming languages,
for the first time in its history, ESOP featured optional artifact evaluation. Authors
of the accepted manuscripts were invited to submit artifacts, such as code, datasets, and
mechanized proofs, that supported the conclusions of their papers. Members of the
Artifact Evaluation Committee (AEC) read the papers and explored the artifacts,
assessing their quality and checking that they supported the authors’ claims. The
authors of eleven of the accepted papers submitted artifacts, which were evaluated by
20 AEC members, with each artifact receiving four reviews. Authors of papers with
accepted artifacts were assigned official EAPLS artifact evaluation badges, indicating
that they have taken the extra time and have undergone the extra scrutiny to prepare a
useful artifact. The ESOP 2022 AEC awarded Artifacts Functional and Artifacts
(Functional and) Reusable badges. All submitted artifacts were deemed Functional, and
all but one were found to be Reusable.
My sincere thanks go to all who contributed to the success of the conference and to
its exciting program. This includes the authors who submitted papers for consideration;
the external reviewers who provided timely expert reviews sometimes on very short
notice; the AEC members and chairs who took great care of this new aspect of ESOP;
and, of course, the members of the ESOP 2022 Program Committee. I was extremely
impressed by the excellent quality of the reviews, the amount of constructive feedback
given to the authors, and the criticism delivered in a professional and friendly tone.
I am very grateful to Andreea Costea and KC Sivaramakrishnan who kindly agreed to
serve as co-chairs for the ESOP 2022 Artifact Evaluation Committee. I would like to
thank the ESOP 2021 chair Nobuko Yoshida for her advice, patience, and the many
insightful discussions on the process of running the conference. I thank all who con-
tributed to the organization of ESOP: the ESOP steering committee and its chair Peter
Thiemann, as well as the ETAPS steering committee and its chair Marieke Huisman.
viii Preface

Finally, I would like to thank Barbara König and Alexandros Evangelidis for their help
with assembling the proceedings.

February 2022 Ilya Sergey


Organization

Program Chair
Ilya Sergey National University of Singapore, Singapore

Program Committee
Michael D. Adams Yale-NUS College, Singapore
Danel Ahman University of Ljubljana, Slovenia
Aws Albarghouthi University of Wisconsin-Madison, USA
Zena M. Ariola University of Oregon, USA
Ahmed Bouajjani Université de Paris, France
Giuseppe Castagna CNRS, Université de Paris, France
Cristina David University of Bristol, UK
Mariangiola Dezani Università di Torino, Italy
Rayna Dimitrova CISPA Helmholtz Center for Information Security,
Germany
Jana Dunfield Queen’s University, Canada
Aquinas Hobor University College London, UK
Guilhem Jaber Université de Nantes, France
Jeehoon Kang KAIST, South Korea
Ekaterina Komendantskaya Heriot-Watt University, UK
Ori Lahav Tel Aviv University, Israel
Ivan Lanese Università di Bologna, Italy, and Inria, France
Dan Licata Wesleyan University, USA
Sam Lindley University of Edinburgh, UK
Andreas Lochbihler Digital Asset, Switzerland
Cristina Lopes University of California, Irvine, USA
P. Madhusudan University of Illinois at Urbana-Champaign, USA
Stefan Marr University of Kent, UK
James Noble Victoria University of Wellington, New Zealand
Burcu Kulahcioglu Ozkan Delft University of Technology, The Netherlands
Andreas Pavlogiannis Aarhus University, Denmark
Vincent Rahli University of Birmingham, UK
Robert Rand University of Chicago, USA
Christine Rizkallah University of Melbourne, Australia
Alejandro Russo Chalmers University of Technology, Sweden
Gagandeep Singh University of Illinois at Urbana-Champaign, USA
Gordon Stewart BedRock Systems, USA
Joseph Tassarotti Boston College, USA
Bernardo Toninho Universidade NOVA de Lisboa, Portugal
x Organization

Additional Reviewers
Andreas Abel Gothenburg University, Sweden
Guillaume Allais University of St Andrews, UK
Kalev Alpernas Tel Aviv University, Israel
Davide Ancona Università di Genova, Italy
Stephanie Balzer Carnegie Mellon University, USA
Giovanni Bernardi Université de Paris, France
Soham Chakraborty Delft University of Technology, The Netherlands
Arthur Chargueraud Inria, France
Ranald Clouston Australian National University, Australia
Fredrik Dahlqvist University College London, UK
Olivier Danvy Yale-NUS College, Singapore
Benjamin Delaware Purdue University, USA
Dominique Devriese KU Leuven, Belgium
Paul Downen University of Massachusetts, Lowell, USA
Yannick Forster Saarland University, Germany
Milad K. Ghale University of New South Wales, Australia
Kiran Gopinathan National University of Singapore, Singapore
Tristan Knoth University of California, San Diego, USA
Paul Levy University of Birmingham, UK
Umang Mathur National University of Singapore, Singapore
McKenna McCall Carnegie Mellon University, USA
Garrett Morris University of Iowa, USA
Fredrik Nordvall Forsberg University of Strathclyde, UK
José N. Oliveira University of Minho, Portugal
Alex Potanin Australian National University, Australia
Susmit Sarkar University of St Andrews, UK
Filip Sieczkowski Heriot-Watt University, UK
Kartik Singhal University of Chicago, USA
Sandro Stucki Chalmers University of Technology and University
of Gothenburg, Sweden
Amin Timany Aarhus University, Denmark
Klaus v. Gleissenthall Vrije Universiteit Amsterdam, The Netherlands
Thomas Wies New York University, USA
Vladimir Zamdzhiev Inria, Loria, Université de Lorraine, France

Artifact Evaluation Committee Chairs


Andreea Costea National University of Singapore, Singapore
K. C. Sivaramakrishnan IIT Madras, India

Artifact Evaluation Committee


Utpal Bora IIT Hyderabad, India
Darion Cassel Carnegie Mellon University, USA
Organization xi

Pritam Choudhury University of Pennsylvania, USA


Jan de Muijnck-Hughes University of Glasgow, UK
Darius Foo National University of Singapore, Singapore
Léo Gourdin Université Grenoble-Alpes, France
Daniel Hillerström University of Edinburgh, UK
Jules Jacobs Radboud University, The Netherlands
Chaitanya Koparkar Indiana University, USA
Yinling Liu Toulouse Computer Science Research Center, France
Yiyun Liu University of Pennsylvania, USA
Kristóf Marussy Budapest University of Technology and Economics,
Hungary
Orestis Melkonian University of Edinburgh, UK
Shouvick Mondal Concordia University, Canada
Krishna Narasimhan TU Darmstadt, Germany
Mário Pereira Universidade NOVA de Lisboa, Portugal
Goran Piskachev Fraunhofer IEM, Germany
Somesh Singh Inria, France
Yahui Song National University of Singapore, Singapore
Vimala Soundarapandian IIT Madras, India
Contents

Categorical Foundations of Gradient-Based Learning . . . . . . . . . . . . . . . . . . 1


Geoffrey S. H. Cruttwell, Bruno Gavranović, Neil Ghani, Paul Wilson,
and Fabio Zanasi

Compiling Universal Probabilistic Programming Languages with Efficient


Parallel Sequential Monte Carlo Inference . . . . . . . . . . . . . . . . . . . . . . . . . 29
Daniel Lundén, Joey Öhman, Jan Kudlicka, Viktor Senderov,
Fredrik Ronquist, and David Broman

Foundations for Entailment Checking in Quantitative Separation Logic . . . . . 57


Kevin Batz, Ira Fesefeldt, Marvin Jansen, Joost-Pieter Katoen,
Florian Keßler, Christoph Matheja, and Thomas Noll

Extracting total Amb programs from proofs . . . . . . . . . . . . . . . . . . . . . . . . 85


Ulrich Berger and Hideki Tsuiki

Why3-do: The Way of Harmonious Distributed System Proofs . . . . . . . . . . . 114


Cláudio Belo Lourenço and Jorge Sousa Pinto

Relaxed virtual memory in Armv8-A. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143


Ben Simner, Alasdair Armstrong, Jean Pichon-Pharabod,
Christopher Pulte, Richard Grisenthwaite, and Peter Sewell

Verified Security for the Morello Capability-enhanced Prototype


Arm Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 174
Thomas Bauereiss, Brian Campbell, Thomas Sewell,
Alasdair Armstrong, Lawrence Esswood, Ian Stark, Graeme Barnes,
Robert N. M. Watson, and Peter Sewell

The Trusted Computing Base of the CompCert Verified Compiler . . . . . . . . 204


David Monniaux and Sylvain Boulmé

View-Based Owicki–Gries Reasoning for Persistent x86-TSO. . . . . . . . . . . . 234


Eleni Vafeiadi Bila, Brijesh Dongol, Ori Lahav, Azalea Raad,
and John Wickerson

Abstraction for Crash-Resilient Objects . . . . . . . . . . . . . . . . . . . . . . . . . . . 262


Artem Khyzha and Ori Lahav

Static Race Detection for Periodic Programs . . . . . . . . . . . . . . . . . . . . . . . . 290


Varsha P Suresh, Rekha Pai, Deepak D’Souza, Meenakshi D’Souza,
and Sujit Kumar Chakrabarti
xiv Contents

Probabilistic Total Store Ordering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 317


Parosh Aziz Abdulla, Mohamed Faouzi Atig, Raj Aryan Agarwal,
Adwait Godbole, and Krishna S.

Linearity and Uniqueness: An Entente Cordiale . . . . . . . . . . . . . . . . . . . . . 346


Daniel Marshall, Michael Vollmer, and Dominic Orchard

A Framework for Substructural Type Systems . . . . . . . . . . . . . . . . . . . . . . 376


James Wood and Robert Atkey

A Dependent Dependency Calculus . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 403


Pritam Choudhury, Harley Eades III, and Stephanie Weirich

Polarized Subtyping. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 431


Zeeshan Lakhani, Ankush Das, Henry DeYoung, Andreia Mordido,
and Frank Pfenning

Structured Handling of Scoped Effects. . . . . . . . . . . . . . . . . . . . . . . . . . . . 462


Zhixuan Yang, Marco Paviotti, Nicolas Wu, Birthe van den Berg,
and Tom Schrijvers

Region-based Resource Management and Lexical Exception Handlers


in Continuation-Passing Style . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 492
Philipp Schuster, Jonathan Immanuel Brachthäuser,
and Klaus Ostermann

A Predicate Transformer for Choreographies: Computing Preconditions


in Choreographic Programming. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 520
Sung-Shik Jongmans and Petra van den Bos

Comparing the Expressiveness of the p-calculus and CCS . . . . . . . . . . . . . . 548


Rob van Glabbeek

Concurrent NetKAT: Modeling and analyzing stateful,


concurrent networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 575
Jana Wagemaker, Nate Foster, Tobias Kappé, Dexter Kozen,
Jurriaan Rot, and Alexandra Silva

Author Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 603


Categorical Foundations of Gradient-Based Learning

Geoffrey S. H. Cruttwell1 ( ) , Bruno Gavranović2 ( ) , Neil Ghani2 ( ) ,


Paul Wilson4 ( ) , and Fabio Zanasi4 ( )
1
Mount Allison University, Canada
2
University of Strathclyde, United Kingdom
3
University College London

Abstract. We propose a categorical semantics of gradient-based ma-


chine learning algorithms in terms of lenses, parametric maps, and re-
verse derivative categories. This foundation provides a powerful explana-
tory and unifying framework: it encompasses a variety of gradient descent
algorithms such as ADAM, AdaGrad, and Nesterov momentum, as well
as a variety of loss functions such as MSE and Softmax cross-entropy,
shedding new light on their similarities and differences. Our approach to
gradient-based learning has examples generalising beyond the familiar
continuous domains (modelled in categories of smooth maps) and can
be realized in the discrete setting of boolean circuits. Finally, we demon-
strate the practical significance of our framework with an implementation
in Python.

1 Introduction
The last decade has witnessed a surge of interest in machine learning, fuelled by
the numerous successes and applications that these methodologies have found in
many fields of science and technology. As machine learning techniques become
increasingly pervasive, algorithms and models become more sophisticated, posing
a significant challenge both to the software developers and the users that need to
interface, execute and maintain these systems. In spite of this rapidly evolving
picture, the formal analysis of many learning algorithms mostly takes place at a
heuristic level [41], or using definitions that fail to provide a general and scalable
framework for describing machine learning. Indeed, it is commonly acknowledged
through academia, industry, policy makers and funding agencies that there is a
pressing need for a unifying perspective, which can make this growing body of
work more systematic, rigorous, transparent and accessible both for users and
developers [2, 36].
Consider, for example, one of the most common machine learning scenar-
ios: supervised learning with a neural network. This technique trains the model
towards a certain task, e.g. the recognition of patterns in a data set (cf. Fig-
ure 1). There are several different ways of implementing this scenario. Typically,
at their core, there is a gradient update algorithm (often called the “optimiser”),
depending on a given loss function, which updates in steps the parameters of the
network, based on some learning rate controlling the “scaling” of the update. All
c The Author(s) 2022
I. Sergey (Ed.): ESOP 2022, LNCS 13240, pp. 1–28, 2022.
https://doi.org/10.1007/978-3-030-99336-8_1
2 Cruttwell, Gavranović, Ghani, Wilson, and Zanasi

of these components can vary independently in a supervised learning algorithm


and a number of choices is available for loss maps (quadratic error, Softmax
cross entropy, dot product, etc.) and optimisers (Adagrad [20], Momentum [37],
and Adam [32], etc.).

Fig. 1: An informal illustration of gradient-based learning. This neural network


is trained to distinguish different kinds of animals in the input image. Given an
input X, the network predicts an output Y , which is compared by a ‘loss map’
with what would be the correct answer (‘label’). The loss map returns a real
value expressing the error of the prediction; this information, together with the
learning rate (a weight controlling how much the model should be changed in
response to error) is used by an optimiser, which computes by gradient-descent
the update of the parameters of the network, with the aim of improving its
accuracy. The neural network, the loss map, the optimiser and the learning rate
are all components of a supervised learning system, and can vary independently
of one another.

This scenario highlights several questions: is there a uniform mathemati-


cal language capturing the different components of the learning process? Can
we develop a unifying picture of the various optimisation techniques, allowing
for their comparative analysis? Moreover, it should be noted that supervised
learning is not limited to neural networks. For example, supervised learning is
surprisingly applicable to the discrete setting of boolean circuits [50] where con-
tinuous functions are replaced by boolean-valued functions. Can we identify an
abstract perspective encompassing both the real-valued and the boolean case?
In a nutshell, this paper seeks to answer the question:
what are the fundamental mathematical structures underpinning gradient-
based learning?
Our approach to this question stems from the identification of three funda-
mental aspects of the gradient-descent learning process:
(I) computation is parametric, e.g. in the simplest case we are given a function
f : P × X → Y and learning consists of finding a parameter p : P such
Categorical Foundations of Gradient-Based Learning 3

that f (p, −) is the best function according to some criteria. Specifically, the
weights on the internal nodes of a neural network are a parameter which the
learning is seeking to optimize. Parameters also arise elsewhere, e.g. in the
loss function (see later).
(II) information flows bidirectionally: in the forward direction, the computa-
tion turns inputs via a sequence of layers into predicted outputs, and then
into a loss value; in the reverse direction, backpropagation is used propa-
gate the changes backwards through the layers, and then turn them into
parameter updates.
(III) the basis of parameter update via gradient descent is differentiation e.g.
in the simple case we differentiate the function mapping a parameter to its
associated loss to reduce that loss.

We model bidirectionality via lenses [6, 12, 29] and based upon the above
three insights, we propose the notion of parametric lens as the fundamental
semantic structure of learning. In a nutshell, a parametric lens is a process with
three kinds of interfaces: inputs, outputs, and parameters. On each interface,
information flows both ways, i.e. computations are bidirectional. These data
are best explained with our graphical representation of parametric lenses, with
inputs A, A′ , outputs B, B ′ , parameters P , P ′ , and arrows indicating information
flow (below left). The graphical notation also makes evident that parametric
lenses are open systems, which may be composed along their interfaces (below
center and right).
Q Q′
′ ′ Q Q ′
P P P P

B
A B A C
(1)
A′ B′ A′ C′ P P′
B′
A B
A′ B′

This pictorial formalism is not just an intuitive sketch: as we will show, it can
be understood as a completely formal (graphical) syntax using the formalism of
string diagrams [39], in a way similar to how other computational phenomena
have been recently analysed e.g. in quantum theory [14], control theory [5, 8],
and digital circuit theory [26].
It is intuitively clear how parametric lenses express aspects (I) and (II) above,
whereas (III) will be achieved by studying them in a space of ‘differentiable
objects’ (in a sense that will be made precise). The main technical contribution
of our paper is showing how the various ingredients involved in learning (the
model, the optimiser, the error map and the learning rate) can be uniformly
understood as being built from parametric lenses.
We will use category theory as the formal language to develop our notion of
parametric lenses, and make Figure 2 mathematically precise. The categorical
perspective brings several advantages, which are well-known, established princi-
ples in programming language semantics [3,40,49]. Three of them are particularly
4 Cruttwell, Gavranović, Ghani, Wilson, and Zanasi

A P P
B

Optimiser

P P′
B′
A B L
Learning
Model Loss
rate
A′ B′ L′

Fig. 2: The parametric lens that captures the learning process informally sketched
in Figure 1. Note each component is a lens itself, whose composition yields the
interactions described in Figure 1. Defining this picture formally will be the
subject of Sections 3-4.

important to our contribution, as they constitute distinctive advantages of our


semantic foundations:
Abstraction Our approach studies which categorical structures are sufficient
to perform gradient-based learning. This analysis abstracts away from the
standard case of neural networks in several different ways: as we will see, it
encompasses other models (namely Boolean circuits), different kinds of op-
timisers (including Adagrad, Adam, Nesterov momentum), and error maps
(including quadratic and softmax cross entropy loss). These can be all un-
derstood as parametric lenses, and different forms of learning result from
their interaction.
Uniformity As seen in Figure 1, learning involves ingredients that are seem-
ingly quite different: a model, an optimiser, a loss map, etc. We will show
how all these notions may be seen as instances of the categorical defini-
tion of a parametric lens, thus yielding a remarkably uniform description of
the learning process, and supporting our claim of parametric lenses being a
fundamental semantic structure of learning.
Compositionality The use of categorical structures to describe computation
naturally enables compositional reasoning whereby complex systems are anal-
ysed in terms of smaller, and hence easier to understand, components. Com-
positionality is a fundamental tenet of programming language semantics; in
the last few years, it has found application in the study of diverse kinds of
computational models, across different fields— see e.g. [8,14,25,45]. As made
evident by Figure 2, our approach models a neural network as a parametric
lens, resulting from the composition of simpler parametric lenses, capturing
the different ingredients involved in the learning process. Moreover, as all
the simpler parametric lenses are themselves composable, one may engineer
a different learning process by simply plugging a new lens on the left or right
of existing ones. This means that one can glue together smaller and relatively
simple networks to create larger and more sophisticated neural networks.
Categorical Foundations of Gradient-Based Learning 5

We now give a synopsis of our contributions:


– In Section 2, we introduce the tools necessary to define our notion of para-
metric lens. First, in Section 2.1, we introduce a notion of parametric cat-
egories, which amounts to a functor Para(−) turning a category C into one
Para(C) of ‘parametric C-maps’. Second, we recall lenses (Section 2.2). In a
nutshell, a lens is a categorical morphism equipped with operations to view
and update values in a certain data structure. Lenses play a prominent role
in functional programming [47], as well as in the foundations of database
theory [31] and more recently game theory [25]. Considering lenses in C sim-
ply amounts to the application of a functorial construction Lens(−), yield-
ing Lens(C). Finally, we recall the notion of a cartesian reverse differential
category (CRDC): a categorical structure axiomatising the notion of differ-
entiation [13] (Section 2.4). We wrap up in Section 2.3, by combining these
ingredients into the notion of parametric lens, formally defined as a morphism
in Para(Lens(C)) for a CRDC C. In terms of our desiderata (I)-(III) above,
note that Para(−) accounts for (I), Lens(−) accounts for (II), and the CRDC
structure accounts for (III).
– As seen in Figure 1, in the learning process there are many components at
work: the model, the optimiser, the loss map, the learning rate, etc.. In Sec-
tion 3, we show how the notion of parametric lens provides a uniform char-
acterisation for such components. Moreover, for each of them, we show how
different variations appearing in the literature become instances of our ab-
stract characterisation. The plan is as follows:
◦ In Section 3.1, we show how the combinatorial model subject of the training
can be seen as a parametric lens. The conditions we provide are met by the
‘standard’ case of neural networks, but also enables the study of learning for
other classes of models. In particular, another instance are Boolean circuits:
learning of these structures is relevant to binarisation [16] and it has been
explored recently using a categorical approach [50], which turns out to be
a particular case of our framework.
◦ In Section 3.2, we show how the loss maps associated with training are also
parametric lenses. Our approach covers the cases of quadratic error, Boolean
error, Softmax cross entropy, but also the ‘dot product loss’ associated with
the phenomenon of deep dreaming [19, 34, 35, 44].
◦ In Section 3.3, we model the learning rate as a parametric lens. This
analysis also allows us to contrast how learning rate is handled in the ‘real-
valued’ case of neural networks with respect to the ‘Boolean-valued’ case of
Boolean circuits.
◦ In Section 3.4, we show how optimisers can be modelled as ‘reparame-
terisations’ of models as parametric lenses. As case studies, in addition to
basic gradient update, we consider the stateful variants: Momentum [37],
Nesterov Momentum [48], Adagrad [20], and Adam (Adaptive Moment Es-
timation) [32]. Also, on Boolean circuits, we show how the reverse derivative
ascent of [50] can be also regarded in such way.
– In Section 4, we study how the composition of the lenses defined in Section 3
yields a description of different kinds of learning processes.
6 Cruttwell, Gavranović, Ghani, Wilson, and Zanasi

◦ Section 4.1 is dedicated to modelling supervised learning of parameters,


in the way described in Figure 1. This amounts essentially to study of
the composite of lenses expressed in Figure 2, for different choices of the
various components. In particular we look at (i) quadratic loss with basic
gradient descent, (ii) softmax cross entropy loss with basic gradient descent,
(iii) quadratic loss with Nesterov momentum, and (iv) learning in Boolean
circuits with XOR loss and basic gradient ascent.
◦ In order to showcase the flexibility of our approach, in Section 4.2 we de-
part from our ‘core’ case study of parameter learning, and turn attention
to supervised learning of inputs, also called deep dreaming — the idea
behind this technique is that, instead of the network parameters, one up-
dates the inputs, in order to elicit a particular interpretation [19, 34, 35, 44].
Deep dreaming can be easily expressed within our approach, with a differ-
ent rearrangement of the parametric lenses involved in the learning process,
see (8) below. The abstract viewpoint of categorical semantics provides a
mathematically precise and visually captivating description of the differ-
ences between the usual parameter learning process and deep dreaming.
– In Section 5 we describe a proof-of-concept Python implementation, avail-
able at [17], based on the theory developed in this paper. This code is intended
to show more concretely the payoff of our approach. Model architectures, as
well as the various components participating in the learning process, are now
expressed in a uniform, principled mathematical language, in terms of lenses.
As a result, computing network gradients is greatly simplified, as it amounts
to lens composition. Moreover, the modularity of this approach allows one to
more easily tune the various parameters of training.
We show our library via a number of experiments, and prove correctness by
achieving accuracy on par with an equivalent model in Keras, a mainstream
deep learning framework [11]. In particular, we create a working non-trivial
neural network model for the MNIST image-classification problem [33].
– Finally, in Sections 6 and 7, we discuss related and future work.

2 Categorical Toolkit

In this section we describe the three categorical components of our framework,


each corresponding to an aspect of gradient-based learning: (I) the Para con-
struction (Section 2.1), which builds a category of parametric maps, (II) the
Lens construction, which builds a category of “bidirectional” maps (Section
2.2), and (III) the combination of these two constructions into the notion of
“parametric lenses” (Section 2.3). Finally (IV) we recall Cartesian reverse dif-
ferential categories — categories equipped with an abstract gradient operator.

Notation We shall use f ; g for sequential composition of morphisms f : A → B


and g : B → C in a category, 1A for the identity morphism on A, and I for the
unit object of a symmetric monoidal category.
Categorical Foundations of Gradient-Based Learning 7

2.1 Parametric Maps

In supervised learning one is typically interested in approximating a function


g : Rn → Rm for some n and m. To do this, one begins by building a neural
network, which is a smooth map f : Rp × Rn → Rm where Rp is the set of
possible weights of that neural network. Then one looks for a value of q ∈ Rp
such that the function f (q, −) : Rn → Rm closely approximates g. We formalise
these maps categorically via the Para construction [9, 23, 24, 30].

Definition 1 (Parametric category). Let (C, ⊗, I) be a strict4 symmetric


monoidal category. We define a category Para(C) with objects those of C, and
a map from A to B a pair (P, f ), with P an object of C and f : P ⊗ A →
B. The composite of maps (P, f ) : A → B and (P ′ , f ′ ) : B → C is the pair
(P ′ ⊗ P, (1P ′ ⊗ f ); f ′ ). The identity on A is the pair (I, 1A ).

Example 1. Take the category Smooth whose objects are natural numbers and
whose morphisms f : n → m are smooth maps from Rn to Rm . As described
above, the category Para(Smooth) can be thought of as a category of neural
networks: a map in this category from n to m consists of a choice of p and a
map f : Rp × Rn → Rm with Rp representing the set of possible weights of the
neural network.

As we will see in the next sections, the interplay of the various components
at work in the learning process becomes much clearer once represented the mor-
phisms of Para(C) using the pictorial formalism of string diagrams, which we
now recall. In fact, we will mildly massage the traditional notation for string
diagrams (below left), by representing a morphism f : A → B in Para(C) as
below right.
P
P
f B A f B
A

This is to emphasise the special role played by P , reflecting the fact that in
machine learning data and parameters have different semantics. String diagram-
matic notations also allows to neatly represent composition of maps (P, f ) : A →
B and (P ′ , f ′ ) : B → C (below left), and “reparameterisation” of (P, f ) : A → B
by a map α : Q → P (below right), yielding a new map (Q, (α⊗1A ); f ) : A → B.

P P′ α
(2)
P
B
A f f′ C A f B

4
One can also define Para(C) in the case when C is non-strict; however, the result
would be not a category but a bicategory.
8 Cruttwell, Gavranović, Ghani, Wilson, and Zanasi

Intuitively, reparameterisation changes the parameter space of (P, f ) : A → B to


some other object Q, via some map α : Q → P . We shall see later that gradient
descent and its many variants can naturally be viewed as reparameterisations.
Note coherence rules in combining the two operations in (2) just work as ex-
pected, as these diagrams can be ultimately ‘compiled’ down to string diagrams
for monoidal categories.

2.2 Lenses

In machine learning (or even learning in general) it is fundamental that infor-


mation flows both forwards and backwards: the ‘forward’ flow corresponds to a
model’s predictions, and the ‘backwards’ flow to corrections to the model. The
category of lenses is the ideal setting to capture this type of structure, as it is a
category consisting of maps with both a “forward” and a “backward” part.

Definition 2. For any Cartesian category C, the category of (bimorphic) lenses


in C, Lens(C), is the category with the following data. Objects are pairs (A, A′ )
of objects in C. A map from (A, A′ ) to (B, B ′ ) consists of a pair (f, f ∗ ) where
f : A → B (called the get or forward part of the lens) and f ∗ : A × B ′ →
A′ (called the put or backwards part of the lens). The composite of (f, f ∗ ) :
(A, A′ ) → (B, B ′ ) and (g, g ∗ ) : (B, B ′ ) → (C, C ′ ) is given by get f ; g and put
⟨π0 , ⟨π0 ; f, π1 ⟩; g ∗ ⟩; f ∗ . The identity on (A, A′ ) is the pair (1A , π1 ).

The embedding of Lens(C) into the category of Tambara modules over C


(see [7, Thm. 23]) provides a rich string diagrammatic language, in which lenses
may be represented with forward/backward wires indicating the information
flow. In this language, a morphism (f, f ∗ ) : (A, A′ ) → (B, B ′ ) is written as
below left, which can be ‘expanded’ as below right.

f B
A
A ∗ B
(f, f )
A′ B′
A′ f∗
B′

It is clear in this language how to describe the composite of (f, f ∗ ) : (A, A′ ) →


(B, B ′ ) and (g, g ∗ ) : (B, B ′ ) → (C, C ′ ):

f B g C
A
(3)
′ ∗ ∗
A f g
B′ C′
Categorical Foundations of Gradient-Based Learning 9

2.3 Parametric Lenses

The fundamental category where supervised learning takes place is the composite
Para(Lens(C)) of the two constructions in the previous sections:

Definition 3. The category Para(Lens(C)) of parametric lenses on C has


as objects pairs (A, A′ ) of objects from C. A morphism from (A, A′ ) to (B, B ′ ),
called a parametric lens5 , is a choice of parameter pair (P, P ′ ) and a lens (f, f ∗ ) :
(P, P ′ ) × (A, A′ ) → (B, B ′ ) so that f : P × A → B and f ∗ : P × A × B ′ → P ′ × A′
String diagrams for parametric lenses are built by simply composing the graph-
ical languages of the previous two sections — see (1), where respectively a mor-
phism, a composition of morphisms, and a reparameterisation are depicted.
Given a generic morphism in Para(Lens(C)) as depicted in (1) on the left,
one can see how it is possible to “learn” new values from f : it takes as input an
input A, a parameter P , and a change B ′ , and outputs a change in A, a value
of B, and a change P ′ . This last element is the key component for supervised
learning: intuitively, it says how to change the parameter values to get the neural
network closer to the true value of the desired function.
The question, then, is how one is to define such a parametric lens given
nothing more than a neural network, ie., a parametric map (P, f ) : A → B.
This is precisely what the gradient operation provides, and its generalization to
categories is explored in the next subsection.

2.4 Cartesian Reverse Differential Categories

Fundamental to all types of gradient-based learning is, of course, the gradient


operation. In most cases this gradient operation is performed in the category of
smooth maps between Euclidean spaces. However, recent work [50] has shown
that gradient-based learning can also work well in other categories; for example,
in a category of boolean circuits. Thus, to encompass these examples in a single
framework, we will work in a category with an abstract gradient operation.

Definition 4. A Cartesian left additive category [13, Defn. 1] consists of


a category C with chosen finite products (including a terminal object), and an
addition operation and zero morphism in each homset, satisfying various axioms.
A Cartesian reverse differential category (CRDC) [13, Defn. 13] consists
of a Cartesian left additive category C, together with an operation which provides,
for each map f : A → B in C, a map R[f ] : A × B → A satisfying various
axioms.

For f : A → B, the pair (f, R[f ]) forms a lens from (A, A) to (B, B). We
will pursue the idea that R[f ] acts as backwards map, thus giving a means to
“learn”f .
5
In [23], these are called learners. However, in this paper we study them in a much
broader light; see Section 6.
10 Cruttwell, Gavranović, Ghani, Wilson, and Zanasi

Note that assigning type A×B → A to R[f ] hides some relevant information:
B-values in the domain and A-values in the codomain of R[f ] do not play the
same role as values of the same types in f : A → B: in R[f ], they really take in a
tangent vector at B and output a tangent vector at A (cf. the definition of R[f ]
in Smooth, Example 2 below). To emphasise this, we will type R[f ] as a map
A × B ′ → A′ (even though in reality A = A′ and B = B ′ ), thus meaning that
(f, R[f ]) is actually a lens from (A, A′ ) to (B, B ′ ). This typing distinction will
be helpful later on, when we want to add additional components to our learning
algorithms.
The following two examples of CRDCs will serve as the basis for the learning
scenarios of the upcoming sections.

Example 2. The category Smooth (Example 1) is Cartesian with product given


by addition, and it is also a Cartesian reverse differential category: given a
smooth map f : Rn → Rm , the map R[f ] : Rn × Rm → Rn sends a pair (x, v)
to J[f ]T (x) · v: the transpose of the Jacobian of f at x in the direction v. For
example, if f : R2 → R3 is defined as f (x1 , x2 ) := (x31 + 2x1 x2 , x2 , sin(x
  1 )), then
 2  v1
3x1 + 2x2 0 cos(x1 )  
R[f ] : R2 × R3 → R2 is given by (x, v) 7→ · v2 . Using
2x1 1 0
v3
the reverse derivative (as opposed to the forward derivative) is well-known to be
much more computationally efficient for functions f : Rn → Rm when m ≪ n
(for example, see [28]), as is the case in most supervised learning situations
(where often m = 1).

Example 3. Another CRDC is the symmetric monoidal category POLY Z2 [13,


Example 14] with objects the natural numbers and morphisms f : A → B the B-
tuples of polynomials Z2 [x1 . . . xA ]. When presented by generators and relations
these morphisms can be viewed as a syntax for boolean circuits, with parametric
lenses for such circuits (and their reverse derivative) described in [50].

3 Components of learning as Parametric Lenses


As seen in the introduction, in the learning process there are many components
at work: a model, an optimiser, a loss map, a learning rate, etc. In this section
we show how each such component can be understood as a parametric lens.
Moreover, for each component, we show how our framework encompasses several
variations of the gradient-descent algorithms, thus offering a unifying perspective
on many different approaches that appear in the literature.

3.1 Models as Parametric Lenses


We begin by characterising the models used for training as parametric lenses.
In essence, our approach identifies a set of abstract requirements necessary to
perform training by gradient descent, which covers the case studies that we will
consider in the next sections.
Categorical Foundations of Gradient-Based Learning 11

The leading intuition is that a suitable model is a parametric map, equipped


with a reverse derivative operator. Using the formal developments of Section 2,
this amounts to assuming that a model is a morphism in Para(C), for a CRDC
C. In order to visualise such morphism as a parametric lens, it then suffices to
apply under Para(−) the canonical morphism R : C → Lens(C) (which exists
for any CRDC C, see [13, Prop. 31]), mapping f to (f, R[f ]). This yields a functor
Para(R) : Para(C) → Para(Lens(C)), pictorially defined as

P P′

P f B

A
A f B 7→ (4)
A′ R[f ]
B′

Example 4 (Neural networks). As noted previously, to learn a function of type


Rn → Rm , one constructs a neural network, which can be seen as a function of
type Rp × Rn → Rm where Rp is the space of parameters of the neural network.
As seen in Example 1, this is a map in the category Para(Smooth) of type
Rn → Rm with parameter space Rp . Then one can apply the functor in (4)
to present a neural network together with its reverse derivative operator as a
parametric lens, i.e. a morphism in Para(Lens(Smooth)).

Example 5 (Boolean circuits). For learning of Boolean circuits as described in


[50], the recipe is the same as in Example 4, except that the base category is
POLYZ2 (see Example 3). The important observation here is that POLY Z2 is a
CRDC, see [13, 50], and thus we can apply the functor in (4).

Note a model/parametric lens f can take as inputs an element of A, an


element of B ′ (a change in B) and a parameter P and outputs an element of
B, a change in A, and a change in P . This is not yet sufficient to do machine
learning! When we perform learning, we want to input a parameter P and a pair
A × B and receive a new parameter P . Instead, f expects a change in B (not an
element of B) and outputs a change in P (not an element of P ). Deep dreaming,
on the other hand, wants to return an element of A (not a change in A). Thus, to
do machine learning (or deep dreaming) we need to add additional components
to f ; we will consider these additional components in the next sections.

3.2 Loss Maps as Parametric Lenses


Another key component of any learning algorithm is the choice of loss map.
This gives a measurement of how far the current output of the model is from
the desired output. In standard learning in Smooth, this loss map is viewed as
a map of type B × B → R. However, in our setup, this is naturally viewed as a
12 Cruttwell, Gavranović, Ghani, Wilson, and Zanasi

parametric map from B to R with parameter space B.6 We also generalize the
codomain to an arbitrary object L.

Definition 5. A loss map on B consists of a parametric map (B, loss) :


Para(C)(B, L) for some object L.

Note that we can precompose a loss map (B, loss) : B → L with a neural
network (P, f ) : A → B (below left), and apply the functor in (4) (with C =
Smooth) to obtain the parametric lens below right.

P P′ B B′
P B B
A f loss L (5)
B 7→
A f loss L A′ R[f ] R[loss] L′
B′

This is getting closer to the parametric lens we want: it can now receive
inputs of type B. However, this is at the cost of now needing an input to L′ ; we
consider how to handle this in the next section.

Example 6 (Quadratic error). In Smooth, the standard loss function on Rb is


quadratic error:P it uses L = R and has parametric map e : Rb × Rb → R given
1 b
by e(bt , bp ) = 2 i=1 ((bp )i −(bt )i )2 , where we think of bt as the “true” value and
bp the predicted value. This has reverse derivative R[e] : Rb × Rb × R → Rb × Rb
given by R[e](bt , bp , α) = α · (bp − bt , bt − bp ) — note α suggests the idea of
learning rate, which we will explore in Section 3.3.

Example 7 (Boolean error). In POLYZ2 , the loss function on Zb which is im-


plicitly used in [50] is a bit different: it uses L = Zb and has parametric map
e : Zb × Zb → Zb given by
e(bt , bp ) = bt + bp .
(Note that this is + in Z2 ; equivalently this is given by XOR.) Its reverse deriva-
tive is of type R[e] : Zb × Zb × Zb → Zb × Zb given by R[e](bt , bp , α) = (α, α).

Example 8 (Softmax cross entropy). The Softmax cross entropy loss is a Rb -


Pb
parametric map Rb → R defined by e(bt , bp ) = i=1 (bt )i ((bp )i −log(Softmax(bp )i ))
exp((bp )i )
where Softmax(bp ) = Pb exp((b ) )
is defined componentwise for each class i.
j=1 p j

We note that, although bt needs to be a probability distribution, at the


moment there is no need to ponder the question of interaction of probability
distributions with the reverse derivative framework: one can simply consider bt
as the image of some logits under the Softmax function.
6
Here the loss map has its parameter space equal to its input space. However, putting
loss maps on the same footing as models lends itself to further generalizations where
the parameter space is different, and where the loss map can itself be learned. See
Generative Adversarial Networks, [9, Figure 7.].
Categorical Foundations of Gradient-Based Learning 13

Example 9 (Dot product). In Deep Dreaming (Section 4.2) we often want to focus
only on a particular element of the network output Rb . This is done by supplying
a one-hot vector bt as the ground truth to the loss function e(bt , bp ) = bt ·bp which
computes the dot product of two vectors. If the ground truth vector y is a one-
hot vector (active at the i-th element), then the dot product performs masking of
all inputs except the i-th one. Note the reverse derivative R[e] : Rb × Rb × R →
Rb × Rb of the dot product is defined as R[e](bt , bp , α) = (α · bp , α · bt ).

3.3 Learning Rates as Parametric Lenses


After models and loss maps, another ingredient of the learning process are learn-
ing rates, which we formalise as follows.
Definition 6. A learning rate α on L consists of a lens from (L, L′ ) to (1, 1)
where 1 is a terminal object in C.
Note that the get component of the learning rate lens must be the unique map
to 1, while the put component is a map L × 1 → L′ ; that is, simply a map
α∗ : L → L′ . Thus we can view α as a parametric lens from (L, L′ ) → (1, 1)
(with trivial parameter space) and compose it in Para(Lens(C)) with a model
and a loss map (cf. (5)) to get
P P′ B B′

B L
A f loss
α (6)
A′ R[f ] R[loss]
B′ L′

Example 10. In standard supervised learning in Smooth, one fixes some ϵ > 0
as a learning rate, and this is used to define α: α is simply constantly −ϵ, ie.,
α(l) = −ϵ for any l ∈ L.
Example 11. In supervised learning in POLY Z2 , the standard learning rate is
quite different: for a given L it is defined as the identity function, α(l) = l.
Other learning rate morphisms are possible as well: for example, one could
fix some ϵ > 0 and define a learning rate in Smooth by α(l) = −ϵ · l. Such a
choice would take into account how far away the network is from its desired goal
and adjust the learning rate accordingly.

3.4 Optimisers as Reparameterisations


In this section we consider how to implement gradient descent (and its variants)
into our framework. To this aim, note that the parametric lens (f, R[f ]) rep-
resenting our model (see (4)) outputs a P ′ , which represents a change in the
parameter space. Now, we would like to receive not just the requested change
in the parameter, but the new parameter itself. This is precisely what gradient
descent accomplishes, when formalised as a lens.
14 Cruttwell, Gavranović, Ghani, Wilson, and Zanasi

Definition 7. In any CRDC C we can define gradient update as a map G in


Lens(C) from (P, P ) to (P, P ′ ) consisting of (G, G∗ ) : (P, P ) → (P, P ′ ), where
G(p) = p and G∗ (p, p′ ) = p + p′7 .

Intuitively, such a lens allows one to receive the requested change in parameter
and implement that change by adding that value to the current parameter. By its
type, we can now “plug” the gradient descent lens G : (P, P ) → (P, P ′ ) above the
model (f, R[f ]) in (4) — formally, this is accomplished as a reparameterisation
of the parametric morphism (f, R[f ]), cf. Section 2.1. This gives us Figure 3
(left).

P P S×P S×P

+ Optimiser

P P′ P P′
A B A B

Model Model

A′ B′ A′ B′

Fig. 3: Model reparameterised by basic gradient descent (left) and a generic


stateful optimiser (right).

Example 12 (Gradient update in Smooth). In Smooth, the gradient descent repa-


rameterisation will take the output from P ′ and add it to the current value of
P to get a new value of P .

Example 13 (Gradient update in Boolean circuits). In the CRDC POLY Z2 , the


gradient descent reparameterisation will again take the output from P ′ and
add it to the current value of P to get a new value of P ; however, since + in
Z2 is the same as XOR, this can be also be seen as taking the XOR of the
current parameter and the requested change; this is exactly how this algorithm
is implemented in [50].

Other variants of gradient descent also fit naturally into this framework by
allowing for additional input/output data with P . In particular, many of them
keep track of the history of previous updates and use that to inform the next one.
This is easy to model in our setup: instead of asking for a lens (P, P ) → (P, P ′ ),
we ask instead for a lens (S ×P, S ×P ) → (P, P ′ ) where S is some “state” object.
7
Note that as in the discussion in Section 2.4, we are implicitly assuming that P = P ′ ;
we have merely notated them differently to emphasize the different “roles” they play
(the first P can be thought of as “points”, the second as “vectors”)
Categorical Foundations of Gradient-Based Learning 15

Definition 8. A stateful parameter update consists of a choice of object S


(the state object) and a lens U : (S × P, S × P ) → (P, P ′ ).
Again, we view this optimiser as a reparameterisation which may be “plugged
in” a model as in Figure 3 (right). Let us now consider how several well-known
optimisers can be implemented in this way.
Example 14 (Momentum). In the momentum variant of gradient descent, one
keeps track of the previous change and uses this to inform how the current
parameter should be changed. Thus, in this case, we set S = P , fix some γ >
0, and define the momentum lens (U, U ∗ ) : (P × P, P × P ) → (P, P ′ ) . by
U (s, p) = p and U ∗ (s, p, p′ ) = (s′ , p + s′ ), where s′ = −γs + p′ . Note momentum
recovers gradient descent when γ = 0.
In both standard gradient descent and momentum, our lens representation
has trivial get part. However, as soon as we move to more complicated variants,
this is not anymore the case, as for instance in Nesterov momentum below.
Example 15 (Nesterov momentum). In Nesterov momentum, one uses the mo-
mentum from previous updates to tweak the input parameter supplied to the
network. We can precisely capture this by using a small variation of the lens in
the previous example. Again, we set S = P , fix some γ > 0, and define the Nes-
terov momentum lens (U, U ∗ ) : (P × P, P × P ) → (P, P ′ ) by U (s, p) = p + γs
and U ∗ as in the previous example.
Example 16 (Adagrad). Given any fixed ϵ > 0 and δ ∼ 10−7 , Adagrad [20] is
given by S = P , with the lens whose get part is (g, p) 7→ p. The put is (g, p, p′ ) 7→
(g ′ , p + δ+√
ϵ
g′
⊙ p′ ) where g ′ = g + p′ ⊙ p′ and ⊙ is the elementwise (Hadamard)
product. Unlike with other optimization algorithms where the learning rate is
the same for all parameters, Adagrad divides the learning rate of each individual
parameter with the square root of the past accumulated gradients.
Example 17 (Adam). Adaptive Moment Estimation (Adam) [32] is another method
that computes adaptive learning rates for each parameter by storing exponen-
tially decaying average of past gradients (m) and past squared gradients (v). For
fixed β1 , β2 ∈ [0, 1), ϵ > 0, and δ ∼ 10−8 , Adam is given by S = P × P , with
the lens whose get part is (m, v, p) 7→ p and whose put part is put(m, v, p, p′ ) =
(mb ′ , vb′ , p + δ+√
ϵ
b′
v
b ′ ) where m′ = β1 m + (1 − β1 )p′ , v ′ = β2 v + (1 − β2 )p′2 ,
⊙m
m′ v′
b′ =
and m 1−β1t
, vb′ = 1−β2t
.

Note that, so far, optimsers/reparameterisations have been added to the


P/P ′ wires. In order to change the model’s parameters (Fig. 3). In Section 4.2
we will study them on the A/A′ wires instead, giving deep dreaming.

4 Learning with Parametric Lenses


In the previous section we have seen how all the components of learning can be
modeled as parametric lenses. We now study how all these components can be
16 Cruttwell, Gavranović, Ghani, Wilson, and Zanasi

put together to form supervised learning systems. In addition to studying the


most common examples of supervised learning: systems that learn parameters,
we also study different kinds systems: those that learn their inputs. This is a
technique commonly known as deep dreaming, and we present it as a natural
counterpart of supervised learning of parameters.
Before we describe these systems, it will be convenient to represent all the
inputs and outputs of our parametric lenses as parameters. In (6), we see the
P/P ′ and B/B ′ inputs and outputs as parameters; however, the A/A′ wires are
not. To view the A/A′ inputs as parameters, we compose that system with the
parametric lens η we now define. The parametric lens η has the type (1, 1) →
(A, A′ ) with parameter space (A, A′ ) defined by (getη = 1A , putη = π1 ) and can
A
A
be depicted graphically as . Composing η with the rest of the learning
A′
system in (6) gives us the closed parametric lens

A A′ P P′ B B′
A B L
Model Loss α (7)
A′ B′ L′

This composite is now a map in Para(Lens(C)) from (1, 1) to (1, 1); all its inputs
and outputs are now vertical wires, ie., parameters. Unpacking it further, this is
a lens of type (A × P × B, A′ × P ′ × B ′ ) → (1, 1) whose get map is the terminal
map, and whose put map is of the type A × P × B → A′ × P ′ × B ′ . It can be
unpacked as the composite put(a, p, bt ) = (a′ , p′ , b′t ), where

bp = f (p, a) (b′t , b′p ) = R[loss](bt , bp , α(loss(bt , bp ))) (p′ , a′ ) = R[f ](p, a, b′p ).

In the next two sections we consider further additions to the image above which
correspond to different types of supervised learning.

4.1 Supervised Learning of Parameters

The most common type of learning performed on (7) is supervised learning of


parameters. This is done by reparameterising (cf. Section 2.1) the image in the
following manner. The parameter ports are reparameterised by one of the (pos-
sibly stateful) optimisers described in the previous section, while the backward
wires A′ of inputs and B ′ of outputs are discarded. This finally yields the com-
plete picture of a system which learns the parameters in a supervised manner:
Categorical Foundations of Gradient-Based Learning 17

A S×P S×P
B

Optimiser

P P′
B′
A B L

Model Loss α
′ ′ ′
A B L

Fixing a particular optimiser (U, U ) : (S × P, S × P ) → (P, P ′ ) we again


unpack the entire construction. This is a map in Para(Lens(C)) from (1, 1) to


(1, 1) whose parameter space is (A × S × P × B, S × P ). In other words, this
is a lens of type (A × S × P × B, S × P ) → (1, 1) whose get component is the
terminal map. Its put map has the type A × S × P × B → S × P and unpacks
to put(a, s, p, bt ) = U ∗ (s, p, p′ ), where

p = U (s, p) bp = f (p, a)
(b′t , b′p ) = R[loss](bt , bp , α(loss(bt , bp ))) (p′ , a′ ) = R[f ](p, a, b′p ).

While this formulation might seem daunting, we note that it just explicitly
specifies the computation performed by a supervised learning system. The vari-
able p represents the parameter supplied to the network by the stateful gradient
update rule (in many cases this is equal to p); bp represents the prediction of
the network (contrast this with bt which represents the ground truth from the
dataset). Variables with a tick ′ represent changes: b′p and b′t are the changes
on predictions and true values respectively, while p′ and a′ are changes on the
parameters and inputs. Furthermore, this arises automatically out of the rule for
lens composition (3); what we needed to specify is just the lenses themselves.
We justify and illustrate our approach on a series of case studies drawn from
the literature. This presentation has the advantage of treating all these instances
uniformly in terms of basic constructs, highlighting their similarities and differ-
ences. First, we fix some parametric map (Rp , f ) : Para(Smooth)(Ra , Rb ) in
Smooth and the constant negative learning rate α : R (Example 10). We then
vary the loss function and the gradient update, seeing how the put map above
reduces to many of the known cases in the literature.

Example 18 (Quadratic error, basic gradient descent). Fix the quadratic error
(Example 6) as the loss map and basic gradient update (Example 12). Then the
aforementioned put map simplifies. Since there is no state, its type reduces to
A × P × B → P , and we have put(a, p, bt ) = p + p′ , where (p′ , a′ ) = R[f ](p, a, α ·
(f (p, a) − bt )). Note that α here is simply a constant, and due to the linearity
of the reverse derivative (Def 4), we can slide the α from the costate into the
basic gradient update lens. Rewriting this update, and performing this sliding we
obtain a closed form update step put(a, p, bt ) = p+α·(R[f ](p, a, f (p, a)−bt ); π0 ),
18 Cruttwell, Gavranović, Ghani, Wilson, and Zanasi

where the negative descent component of gradient descent is here contained in


the choice of the negative constant α.

This example gives us a variety of regression algorithms solved iteratively


by gradient descent: it embeds some parametric map (Rp , f ) : Ra → Rb into the
system which performs regression on input data - where a denotes the input to
the model and bt denotes the ground truth. If the corresponding f is linear and
b = 1, we recover simple linear regression with gradient descent. If the codomain
is multi-dimensional, i.e. we are predicting multiple scalars, then we recover
multivariate linear regression. Likewise, we can model a multi-layer perceptron or
even more complex neural network architectures performing supervised learning
of parameters simply by changing the underlying parametric map.

Example 19 (Softmax cross entropy, basic gradient descent). Fix Softmax cross
entropy (Example 8) as the loss map and basic gradient update (Example 12).
Again the put map simplifies. The type reduces to A × P × B → P and we have
put(a, p, bt ) = p + p′ where (p′ , a′ ) = R[f ](p, a, α · (Softmax(f (p, a)) − bt )). The
same rewriting performed on the previous example can be done here.

This example recovers logistic regression, e.g. classification.

Example 20 (Mean squared error, Nesterov Momentum). Fix the quadratic error
(Example 6) as the loss map and Nesterov momentum (Example 15) as the
gradient update. This time the put map A × S × P × B → S × P does not have a
simplified type. The implementation of put reduces to put(a, s, p, bt ) = (s′ , p+s′ ),
where p = p + γs, (p′ , a′ ) = R[f ](p, a, α · (f (p, a) − bt )), and s′ = −γs + p′ .

This example with Nesterov momentum differs in two key points from all
the other ones: i) the optimiser is stateful, and ii) its get map is not trivial.
While many other optimisers are stateful, the non-triviality of the get map here
showcases the importance of lenses. They allow us to make precise the notion of
computing a “lookahead” value for Nesterov momentum, something that is in
practice usually handled in ad-hoc ways. Here, the algebra of lens composition
handles this case naturally by using the get map, a seemingly trivial, unused
piece of data for previous optimisers.
Our last example, using a different base category POLY Z2 , shows that our
framework captures learning in not just continuous, but discrete settings too.
Again, we fix a parametric map (Zp , f ) : POLYZ2 (Za , Zb ) but this time we fix
the identity learning rate (Example 11), instead of a constant one.

Example 21 (Basic learning in Boolean circuits). Fix XOR as the loss map (Ex-
ample 7) and the basic gradient update (Example 13). The put map again
simplifies. The type reduces to A × P × B → P and the implementation to
put(a, p, bt ) = p + p′ where (p′ , a′ ) = R[f ](p, a, f (p, a) + bt ).

A sketch of learning iteration. Having described a number of examples in


supervised learning, we outline how to model learning iteration in our framework.
Recall the aforementioned put map whose type is A × P × B → P (for simplicity
Categorical Foundations of Gradient-Based Learning 19

here modelled without state S). This map takes an input-output pair (a0 , b0 ),
the current parameter pi and produces an updated parameter pi+1 . At the next
time step, it takes a potentially different input-output pair (a1 , b1 ), the updated
parameter pi+1 and produces pi+2 . This process is then repeated. We can model
this iteration as a composition of the put map with itself, as a composite (A ×
put × B); put whose type is A × A × P × B × B → P . This map takes two input-
output pairs A × B, a parameter and produces a new parameter by processing
these datapoints in sequence. One can see how this process can be iterated any
number of times, and even represented as a string diagram.
But we note that with a slight reformulation of the put map, it is possible
to obtain a conceptually much simpler definition. The key insight lies in seeing
that the map put : A × P × B → P is essentially an endo-map P → P with some
extra inputs A × B; it’s a parametric map!
In other words, we can recast the put map as a parametric map (A × B, put) :
Para(C)(P, P ). Being an endo-map, it can be composed with itself. The resulting
composite is an endo-map taking two “parameters”: input-output pair at the
time step 0 and time step 1. This process can then be repeated, with Para
composition automatically taking care of the algebra of iteration.

A×B A×B A×B

P
P put put . n. . put P

This reformulation captures the essence of parameter iteration: one can think
of it as a trajectory pi , pi+1 , pi+2 , ... through the parameter space; but it is a
trajectory parameterised by the dataset. With different datasets the algorithm
will take a different path through this space and learn different things.

4.2 Deep Dreaming: Supervised Learning of Inputs

We have seen that reparameterising the parameter port with gradient descent
allows us to capture supervised parameter learning. In this section we describe
how reparameterising the input port provides us with a way to enhance an input
image to elicit a particular interpretation. This is the idea behind the technique
called Deep Dreaming, appearing in the literature in many forms [19, 34, 35, 44].

S×A S×A P B

Optimiser

A A′
B′
A B L

Model Loss α (8)


A′ B′ L′
20 Cruttwell, Gavranović, Ghani, Wilson, and Zanasi

Deep dreaming is a technique which uses the parameters p of some trained


classifier network to iteratively dream up, or amplify some features of a class b on
a chosen input a. For example, if we start with an image of a landscape a0 , a label
b of a “cat” and a parameter p of a sufficiently well-trained classifier, we can start
performing “learning” as usual: computing the predicted class for the landscape
a0 for the network with parameters p, and then computing the distance between
the prediction and our label of a cat b. When performing backpropagation, the
respective changes computed for each layer tell us how the activations of that
layer should have been changed to be more “cat” like. This includes the first
(input) layer of the landscape a0 . Usually, we discard this changes and apply
gradient update to the parameters. In deep dreaming we discard the parameters
and apply gradient update to the input (see (8)). Gradient update here takes these
changes and computes a new image a1 which is the same image of the landscape,
but changed slightly so to look more like whatever the network thinks a cat looks
like. This is the essence of deep dreaming, where iteration of this process allows
networks to dream up features and shapes on a particular chosen image [1].
Just like in the previous subsection, we can write this deep dreaming system
as a map in Para(Lens(C)) from (1, 1) to (1, 1) whose parameter space is (S×A×
P ×B, S ×A). In other words, this is a lens of type (S ×A×P ×B, S ×A) → (1, 1)
whose get map is trivial. Its put map has the type S × A × P × B → S × A
and unpacks to put(s, a, p, bt ) = U ∗ (s, a, a′ ), where a = U (s, a), bp = f (p, a),
(b′t , b′p ) = R[loss](bt , bp , α(loss(bt , bp ))), and (p′ , a′ ) = R[f ](p, a, b′p ).
We note that deep dreaming is usually presented without any loss function as
a maximisation of a particular activation in the last layer of the network output
[44, Section 2.]. This maximisation is done with gradient ascent, as opposed to
gradient descent. However, this is just a special case of our framework where
the loss function is the dot product (Example 9). The choice of the particular
activation is encoded as a one-hot vector, and the loss function in that case
essentially masks the network output, leaving active only the particular chosen
activation. The final component is the gradient ascent: this is simply recovered
by choosing a positive, instead of a negative learning rate [44]. We explicitly
unpack this in the following example.
Example 22 (Deep dreaming, dot product loss, basic gradient update). Fix Smooth
as base category, a parametric map (Rp , f ) : Para(Smooth)(Ra , Rb ), the dot
product loss (Example 9), basic gradient update (Example 12), and a positive
learning rate α : R. Then the above put map simplifies. Since there is no state, its
type reduces to A × P × B → A and its implementation to put(a, p, bt ) = a + a′ ,
where (p′ , a′ ) = R[f ](p, a, α · bt ). Like in Example 18, this update can be rewrit-
ten as put(a, p, bt ) = a + α · (R[f ](p, a, bt ); π1 ), making a few things apparent.
This update does not depend on the prediction f (p, a): no matter what the net-
work has predicted, the goal is always to maximize particular activations. Which
activations? The ones chosen by bt . When bt is a one-hot vector, this picks out
the activation of just one class to maximize, which is often done in practice.
While we present only the most basic image, there is plenty of room left
for exploration. The work of [44, Section 2.] adds an extra regularization term
Categorical Foundations of Gradient-Based Learning 21

to the image. In general, the neural network f is sometimes changed to copy


a number of internal activations which are then exposed on the output layer.
Maximizing all these activations often produces more visually appealing results.
In the literature we did not find an example which uses the Softmax-cross entropy
(Example 8) as a loss function in deep dreaming, which seems like the more
natural choice in this setting. Furthermore, while deep dreaming commonly uses
basic gradient descent, there is nothing preventing the use of any of the optimiser
lenses discussed in the previous section, or even doing deep dreaming in the
context of Boolean circuits. Lastly, learning iteration which was described in at
the end of previous subsection can be modelled here in an analogous way.

5 Implementation

We provide a proof-of-concept implementation as a Python library — full usage


examples, source code, and experiments can be found at [17]. We demonstrate
the correctness of our library empirically using a number of experiments im-
plemented both in our library and in Keras [11], a popular framework for deep
learning. For example, one experiment is a model for the MNIST image clas-
sification problem [33]: we implement the same model in both frameworks and
achieve comparable accuracy. Note that despite similarities between the user in-
terfaces of our library and of Keras, a model in our framework is constructed
as a composition of parametric lenses. This is fundamentally different to the
approach taken by Keras and other existing libraries, and highlights how our
proposed algebraic structures naturally guide programming practice
In summary, our implementation demonstrates the advantages of our ap-
proach. Firstly, computing the gradients of the network is greatly simplified
through the use of lens composition. Secondly, model architectures can be ex-
pressed in a principled, mathematical language; as morphisms of a monoidal
category. Finally, the modularity of our approach makes it easy to see how var-
ious aspects of training can be modified: for example, one can define a new
optimization algorithm simply by defining an appropriate lens. We now give a
brief sketch of our implementation.

5.1 Constructing a Model with Lens and Para

We model a lens (f, f ∗ ) in our library with the Lens class, which consists of a
pair of maps fwd and rev corresponding to f and f ∗ , respectively. For example,
we write the identity lens (1A , π2 ) as follows:
i d e n t i t y = Lens ( lambda x : x , lambda x dy : x dy [ 1 ] )

The composition (in diagrammatic order) of Lens values f and g is written


f >> g, and monoidal composition as f @ g. Similarly, the type of Para maps
is modeled by the Para class, with composition and monoidal product written
the same way. Our library provides several primitive Lens and Para values.
22 Cruttwell, Gavranović, Ghani, Wilson, and Zanasi

Let us now see how to construct a single layer neural network from the com-
position of such primitives. Diagramatically, we wish to construct the following
model, representing a single ‘dense’ layer of a neural network:

Rb×a Rb×a Rb Rb

Rb Rb
Ra Rb
linear bias activation (9)
Ra Rb
Rb Rb
Here, the parameters of linear are the coefficients of a b × a matrix, and the
underlying lens has as its forward map the function (M, x) → M · x, where M is
the b × a matrix whose coefficients are the Rb×a parameters, and x ∈ Ra is the
input vector. The bias map is even simpler: the forward map of the underlying
lens is simply pointwise addition of inputs and parameters: (b, x) → b+x. Finally,
the activation map simply applies a nonlinear function (e.g., sigmoid) to the
input, and thus has the trivial (unit) parameter space. The representation of
this composition in code is straightforward: we can simply compose the three
primitive Para maps as in (9):
def d e n s e ( a , b , a c t i v a t i o n ) :
return l i n e a r ( a , b ) >> b i a s ( b ) >> a c t i v a t i o n

Note that by constructing model architectures in this way, the computation


of reverse derivatives is greatly simplified: we obtain the reverse derivative ‘for
free’ as the put map of the model. Furthermore, adding new primitives is also
simplified: the user need simply provide a function and its reverse derivative in
the form of a Para map. Finally, notice also that our approach is truly composi-
tional: we can define a hidden layer neural network with n hidden units simply
by composing two dense layers, as follows:
d e n s e ( a , n , a c t i v a t i o n ) >> d e n s e ( n , b , a c t i v a t i o n )

5.2 Learning
Now that we have constructed a model, we also need to use it to learn from
data. Concretely, we will construct a full parametric lens as in Figure 2 then
extract its put map to iterate over the dataset.
By way of example, let us see how to construct the following parametric lens,
representing basic gradient descent over a single layer neural network with a
fixed learning rate:
A P P B

P P′
B′
A B L
dense loss (10)
ϵ
A′ B′ L′
Categorical Foundations of Gradient-Based Learning 23

This morphism is constructed essentially as below, where apply update(α,


f ) represents the ‘vertical stacking’ of α atop f :
a p p l y u p d a t e ( b a s i c u p d a t e , d e n s e ) >> l o s s >> l e a r n i n g r a t e ( ϵ )

Now, given the parametric lens of (10), one can construct a morphism step :
B ×P ×A → P which is simply the put map of the lens. Training the model then
consists of iterating the step function over dataset examples (x, y) ∈ A×B to op-
timise some initial choice of parameters θ0 ∈ P , by letting θi+1 = step(yi , θi , xi ).
Note that our library also provides a utility function to construct step from
its various pieces:
s t e p = s u p e r v i s e d s t e p ( model , update , l o s s , l e a r n i n g r a t e )

For an end-to-end example of model training and iteration, we refer the


interested reader to the experiments accompanying the code [17].

6 Related Work
The work [23] is closely related to ours, in that it provides an abstract categorical
model of backpropagation. However, it differs in a number of key aspects. We
give a complete lens-theoretic explanation of what is back-propagated via (i)
the use of CRDCs to model gradients; and (ii) the Para construction to model
parametric functions and parameter update. We thus can go well beyond [23]
in terms of examples - their example of smooth functions and basic gradient
descent is covered in our subsection 4.1.
We also explain some of the constructions of [23] in a more structured way.
For example, rather than considering the category Learn of [23] as primitive,
here we construct it as a composite of two more basic constructions (the Para
and Lens constructions). The flexibility could be used, for example, to com-
positionally replace Para with a variant allowing parameters to come from a
different category, or lenses with the category of optics [38] enabling us to model
things such as control flow using prisms.
One more relevant aspect is functoriality. We use a functor to augment a
parametric map with its backward pass, just like [23]. However, they additionally
augmented this map with a loss map and gradient descent using a functor as
well. This added extra conditions on the partial derivatives of the loss function:
it needed to be invertible in the 2nd variable. This constraint was not justified
in [23], nor is it a constraint that appears in machine learning practice. This led
us to reexamine their constructions, coming up with our reformulation that does
not require it. While loss maps and optimisers are mentioned in [23] as parts of
the aforementioned functor, here they are extracted out and play a key role: loss
maps are parametric lenses and optimisers are reparameterisations. Thus, in this
paper we instead use Para-composition to add the loss map to the model, and
Para 2-cells to add optimisers. The mentioned inverse of the partial derivative
of the loss map in the 2nd variable was also hypothesised to be relevant to deep
dreaming. We have investigated this possibility thoroughly in our paper, showing
24 Cruttwell, Gavranović, Ghani, Wilson, and Zanasi

it is gradient update which is used to dream up pictures. We also correct a small


issue in Theorem III.2 of [23]. There, the morphisms of Learn were defined up to
an equivalence (pg. 4 of [23]) but, unfortunately, the functor defined in Theorem
III.2 does not respect this equivalence relation. Our approach instead uses 2-cells
which comes from the universal property of Para — a 2-cell from (P, f ) : A → B
to (Q, g) : A → B is a lens, and hence has two components: a map α : Q → P
and α∗ : Q × P → Q. By comparison, we can see the equivalence relation of [23]
as being induced by map α : Q → P , and not a lens. Our approach highlights
the importance of the 2-categorical structure of learners. In addition, it does not
treat the functor Para(C) → Learn as a primitive. In our case, this functor
has the type Para(C) → Para(Lens(C)) and arises from applying Para to a
canonical functor C → Lens(C) existing for any reverse derivative category, not
just Smooth. Lastly, in our paper we took advantage of the graphical calculus
for Para, redrawing many diagrams appearing in [23] in a structured way.
Other than [23], there are a few more relevant papers. The work of [18] con-
tains a sketch of some of the ideas this paper evolved from. They are based
on the interplay of optics with parameterisation, albeit framed in the setting of
diffeological spaces, and requiring cartesian and local cartesian closed structure
on the base category. Lenses and Learners are studied in the eponymous work
of [22] which observes that learners are parametric lenses. They do not explore
any of the relevant Para or CRDC structure, but make the distinction between
symmetric and asymmetric lenses, studying how they are related to learners de-
fined in [23]. A lens-like implementation of automatic differentiation is the focus
of [21], but learning algorithms aren’t studied. A relationship between category-
theoretic perspective on probabilistic modeling and gradient-based optimisation
is studied in [42] which also studies a variant of the Para construction. Usage of
Cartesian differential categories to study learning is found in [46]. They extend
the differential operator to work on stateful maps, but do not study lenses, pa-
rameterisation nor update maps. The work of [24] studies deep learning in the
context of Cycle-consistent Generative Adversarial Networks [51] and formalises
it via free and quotient categories, making parallels to the categorical formula-
tions of database theory [45]. They do use the Para construction, but do not
relate it to lenses nor reverse derivative categories. A general survey of category
theoretic approaches to machine learning, covering many of the above papers,
can be found in [43]. Lastly, the concept of parametric lenses has started appear-
ing in recent formulations of categorical game theory and cybernetics [9,10]. The
work of [9] generalises the study of parametric lenses into parametric optics and
connects it to game thereotic concepts such as Nash equilibria.

7 Conclusions and Future Directions

We have given a categorical foundation of gradient-based learning algorithms


which achieves a number of important goals. The foundation is principled and
mathematically clean, based on the fundamental idea of a parametric lens. The
foundation covers a wide variety of examples: different optimisers and loss maps
Categorical Foundations of Gradient-Based Learning 25

in gradient-based learning, different settings where gradient-based learning hap-


pens (smooth functions vs. boolean circuits), and both learning of parameters
and learning of inputs (deep dreaming). Finally, the foundation is more than
a mere abstraction: we have also shown how it can be used to give a practical
implementation of learning, as discussed in Section 5.
There are a number of important directions which are possible to explore
because of this work. One of the most exciting ones is the extension to more
complex neural network architectures. Our formulation of the loss map as a
parametric lens should pave the way for Generative Adversarial Networks [27],
an exciting new architecture whose loss map can be said to be learned in tandem
with the base network. In all our settings we have fixed an optimiser beforehand.
The work of [4] describes a meta-learning approach which sees the optimiser as a
neural network whose parameters and gradient update rule can be learned. This
is an exciting prospect since one can model optimisers as parametric lenses;
and our framework covers learning with parametric lenses. Recurrent neural
networks are another example of a more complex architecture, which has already
been studied in the context of differential categories in [46]. When it comes to
architectures, future work includes modelling some classical systems as well, such
as the Support Vector Machines [15], which should be possible with the usage
of loss maps such as Hinge loss.
Future work also includes using the full power of CRDC axioms. In particular,
axioms RD.6 or RD.7, which deal with the behaviour of higher-order derivatives,
were not exploited in our work, but they should play a role in modelling some
supervised learning algorithms using higher-order derivatives (for example, the
Hessian) for additional optimisations. Taking this idea in a different direction,
one can see that much of our work can be applied to any functor of the form
F : C → Lens(C) - F does not necessarily have to be of the form f 7→ (f, R[f ])
for a CRDC R. Moreover, by working with more generalised forms of the lens
category (such as dependent lenses), we may be able to capture ideas related
to supervised learning on manifolds. And, of course, we can vary the parameter
space to endow it with different structure from the functions we wish to learn. In
this vein, we wish to use fibrations/dependent types to model the use of tangent
bundles: this would foster the extension of the correct by construction paradigm
to machine learning, and thereby addressing the widely acknowledged problem
of trusted machine learning. The possibilities are made much easier by the com-
positional nature of our framework. Another key topic for future work is to link
gradient-based learning with game theory. At a high level, the former takes lit-
tle incremental steps to achieve an equilibrium while the later aims to do so in
one fell swoop. Formalising this intuition is possible with our lens-based frame-
work and the lens-based framework for game theory [25]. Finally, because our
framework is quite general, in future work we plan to consider further modifica-
tions and additions to encompass non-supervised, probabilistic and non-gradient
based learning. This includes genetic algorithms and reinforcement learning.

Acknowledgements Fabio Zanasi acknowledges support from epsrc EP/V002376/1.


Geoff Cruttwell acknowledges support from NSERC.
26 Cruttwell, Gavranović, Ghani, Wilson, and Zanasi

References

1. Inceptionism: Going deeper into neural networks (2015), https://ai.googleblog.


com/2015/06/inceptionism-going-deeper-into-neural.html
2. Explainable AI: the basics - policy briefing (2019), royalsociety.org/ai-
interpretability
3. Abramsky, S., Coecke, B.: A categorical semantics of quantum protocols. In: Pro-
ceedings of the 19th Annual IEEE Symposium on Logic in Computer Science, 2004.
pp. 415–425 (2004). https://doi.org/10.1109/LICS.2004.1319636
4. Andrychowicz, M., Denil, M., Gomez, S., Hoffman, M.W., Pfau, D., Schaul, T.,
Shillingford, B., de Freitas, N.: Learning to learn by gradient descent by gradient
descent. In: 30th Conference on Neural Information Processings Systems (NIPS)
(2016)
5. Baez, J.C., Erbele, J.: Categories in Control. Theory and Applications of Categories
30(24), 836–881 (2015)
6. Bohannon, A., Foster, J.N., Pierce, B.C., Pilkiewicz, A., Schmitt, A.: Boomerang:
Resourceful lenses for string data. SIGPLAN Not. 43(1), 407–419 (Jan 2008).
https://doi.org/10.1145/1328897.1328487
7. Boisseau, G.: String Diagrams for Optics. arXiv:2002.11480 (2020)
8. Bonchi, F., Sobocinski, P., Zanasi, F.: The calculus of signal flow di-
agrams I: linear relations on streams. Inf. Comput. 252, 2–29 (2017).
https://doi.org/10.1016/j.ic.2016.03.002, https://doi.org/10.1016/j.ic.2016.
03.002
9. Capucci, M., Gavranovi’c, B., Hedges, J., Rischel, E.F.: Towards foundations of
categorical cybernetics. arXiv:2105.06332 (2021)
10. Capucci, M., Ghani, N., Ledent, J., Nordvall Forsberg, F.: Translating Extensive
Form Games to Open Games with Agency. arXiv:2105.06763 (2021)
11. Chollet, F., et al.: Keras (2015), https://github.com/fchollet/keras
12. Clarke, B., Elkins, D., Gibbons, J., Loregian, F., Milewski, B., Pillmore, E., Román,
M.: Profunctor optics, a categorical update. arXiv:2001.07488 (2020)
13. Cockett, J.R.B., Cruttwell, G.S.H., Gallagher, J., Lemay, J.S.P., MacAdam, B.,
Plotkin, G.D., Pronk, D.: Reverse derivative categories. In: Proceedings of the
28th Computer Science Logic (CSL) conference (2020)
14. Coecke, B., Kissinger, A.: Picturing Quantum Processes: A First Course in Quan-
tum Theory and Diagrammatic Reasoning. Cambridge University Press (2017).
https://doi.org/10.1017/9781316219317
15. Cortes, C., Vapnik, V.: Support-vector networks. Machine learning 20(3), 273–297
(1995)
16. Courbariaux, M., Bengio, Y., David, J.P.: BinaryConnect: Training Deep Neural
Networks with binary weights during propagations. arXiv:1511.00363
17. CRCoauthors, A.: Numeric Optics: A python library for constructing and training
neural networks based on lenses and reverse derivatives. https://github.com/
anonymous-c0de/esop-2022
18. Dalrymple, D.: Dioptics: a common generalization of open games and gradient-
based learners. SYCO7 (2019), https://research.protocol.ai/publications/
dioptics-a-common-generalization-of-open-games-and-gradient-based-
learners/dalrymple2019.pdf
19. Dosovitskiy, A., Brox, T.: Inverting convolutional networks with convolutional net-
works. arXiv:1506.02753 (2015)
Categorical Foundations of Gradient-Based Learning 27

20. Duchi, J., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning
and stochastic optimization. Journal of Machine Learning Research 12(Jul), 2121–
2159 (2011)
21. Elliott, C.: The simple essence of automatic differentiation (differentiable functional
programming made easy). arXiv:1804.00746 (2018)
22. Fong, B., Johnson, M.: Lenses and learners. In: Proceedings of the 8th International
Workshop on Bidirectional transformations (Bx@PLW) (2019)
23. Fong, B., Spivak, D.I., Tuyéras, R.: Backprop as functor: A compositional per-
spective on supervised learning. In: Proceedings of the Thirty fourth Annual IEEE
Symposium on Logic in Computer Science (LICS 2019). pp. 1–13. IEEE Computer
Society Press (June 2019)
24. Gavranovic, B.: Compositional deep learning. arXiv:1907.08292 (2019)
25. Ghani, N., Hedges, J., Winschel, V., Zahn, P.: Compositional game theory. In:
Proceedings of the 33rd Annual ACM/IEEE Symposium on Logic in Computer
Science. p. 472–481. LICS ’18 (2018). https://doi.org/10.1145/3209108.3209165
26. Ghica, D.R., Jung, A., Lopez, A.: Diagrammatic Semantics for Digital Circuits.
arXiv:1703.10247 (2017)
27. Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair,
S., Courville, A., Bengio, Y.: Generative adversarial nets. In: Ghahramani, Z.,
Welling, M., Cortes, C., Lawrence, N.D., Weinberger, K.Q. (eds.) Advances in
Neural Information Processing Systems 27, pp. 2672–2680 (2014), http://papers.
nips.cc/paper/5423-generative-adversarial-nets.pdf
28. Griewank, A., Walther, A.: Evaluating derivatives: principles and techniques of
algorithmic differentiation. Society for Industrial and Applied Mathematics (2008)
29. Hedges, J.: Limits of bimorphic lenses. arXiv:1808.05545 (2018)
30. Hermida, C., Tennent, R.D.: Monoidal indeterminates and cate-
gories of possible worlds. Theor. Comput. Sci. 430, 3–22 (Apr 2012).
https://doi.org/10.1016/j.tcs.2012.01.001
31. Johnson, M., Rosebrugh, R., Wood, R.: Lenses, fibrations and universal transla-
tions. Mathematical structures in computer science 22, 25–42 (2012)
32. Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. In: Bengio,
Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations,
ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings
(2015), http://arxiv.org/abs/1412.6980
33. Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied
to document recognition. In: Proceedings of the IEEE. pp. 2278–2324 (1998).
https://doi.org/10.1109/5.726791
34. Mahendran, A., Vedaldi, A.: Understanding deep image representations by invert-
ing them. arXiv:1412.0035 (2014)
35. Nguyen, A.M., Yosinski, J., Clune, J.: Deep neural networks are easily fooled: High
confidence predictions for unrecognizable images. arXiv:1412.1897 (2014)
36. Olah, C.: Neural networks, types, and functional programming (2015), http://
colah.github.io/posts/2015-09-NN-Types-FP/
37. Polyak, B.: Some methods of speeding up the convergence of iteration meth-
ods. USSR Computational Mathematics and Mathematical Physics 4(5), 1 –
17 (1964). https://doi.org/https://doi.org/10.1016/0041-5553(64)90137-5, http:
//www.sciencedirect.com/science/article/pii/0041555364901375
38. Riley, M.: Categories of optics. arXiv:1809.00738 (2018)
39. Selinger, P.: A survey of graphical languages for monoidal categories. Lecture Notes
in Physics p. 289–355 (2010)
28 Cruttwell, Gavranović, Ghani, Wilson, and Zanasi

40. Selinger, P.: Control categories and duality: on the categorical semantics of the
lambda-mu calculus. Mathematical Structures in Computer Science 11(02), 207–
260 (4 2001). https://doi.org/null, http://journals.cambridge.org/article_
S096012950000311X
41. Seshia, S.A., Sadigh, D.: Towards verified artificial intelligence. CoRR
abs/1606.08514 (2016), http://arxiv.org/abs/1606.08514
42. Shiebler, D.: Categorical Stochastic Processes and Likelihood. Compositionality
3(1) (2021)
43. Shiebler, D., Gavranović, B., Wilson, P.: Category Theory in Machine Learning.
arXiv:2106.07032 (2021)
44. Simonyan, K., Vedaldi, A., Zisserman, A.: Deep inside convolutional networks:
Visualising image classification models and saliency maps. arXiv:1312.6034 (2014)
45. Spivak, D.I.: Functorial data migration. arXiv:1009.1166 (2010)
46. Sprunger, D., Katsumata, S.y.: Differentiable causal computations via delayed
trace. In: Proceedings of the 34th Annual ACM/IEEE Symposium on Logic in
Computer Science. LICS ’19, IEEE Press (2019)
47. Steckermeier, A.: Lenses in functional programming. Preprint, available at
https://sinusoid.es/misc/lager/lenses.pdf (2015)
48. Sutskever, I., Martens, J., Dahl, G., Hinton, G.: On the importance of initial-
ization and momentum in deep learning. In: Dasgupta, S., McAllester, D. (eds.)
Proceedings of the 30th International Conference on Machine Learning. vol. 28,
pp. 1139–1147 (2013), http://proceedings.mlr.press/v28/sutskever13.html
49. Turi, D., Plotkin, G.: Towards a mathematical operational semantics. In: Pro-
ceedings of Twelfth Annual IEEE Symposium on Logic in Computer Science. pp.
280–291 (1997). https://doi.org/10.1109/LICS.1997.614955
50. Wilson, P., Zanasi, F.: Reverse derivative ascent: A categorical approach to learn-
ing boolean circuits. In: Proceedings of Applied Category Theory (ACT) (2020),
https://cgi.cse.unsw.edu.au/~eptcs/paper.cgi?ACT2020:31
51. Zhu, J.Y., Park, T., Isola, P., Efros, A.A.: Unpaired Image-to-Image Translation
using Cycle-Consistent Adversarial Networks. arXiv:1703.10593 (2017)

Open Access This chapter is licensed under the terms of the Creative Commons
Attribution 4.0 International License (http://creativecommons.org/licenses/by/
4.0/), which permits use, sharing, adaptation, distribution and reproduction in any
medium or format, as long as you give appropriate credit to the original author(s) and
the source, provide a link to the Creative Commons license and indicate if changes
were made.
The images or other third party material in this chapter are included in the chapter’s
Creative Commons license, unless indicated otherwise in a credit line to the material. If
material is not included in the chapter’s Creative Commons license and your intended
use is not permitted by statutory regulation or exceeds the permitted use, you will need
to obtain permission directly from the copyright holder.
Compiling Universal Probabilistic Programming
Languages with Efficient Parallel Sequential
Monte Carlo Inference?

Daniel Lundén1 ( ) , Joey Öhman2 , Jan Kudlicka3 , Viktor Senderov4 ,


Fredrik Ronquist4,5 , and David Broman1

1
EECS and Digital Futures, KTH Royal Institute of Technology, Stockholm,
Sweden, {dlunde,dbro}@kth.se
2
AI Sweden, Stockholm, Sweden, [email protected]
3
Department of Data Science and Analytics, BI Norwegian Business School, Oslo,
Norway, [email protected]
4
Department of Bioinformatics and Genetics, Swedish Museum of Natural History,
Stockholm, Sweden, {viktor.senderov,fredrik.ronquist}@nrm.se
5
Department of Zoology, Stockholm University

Abstract. Probabilistic programming languages (PPLs) allow users to


encode arbitrary inference problems, and PPL implementations provide
general-purpose automatic inference for these problems. However, con-
structing inference implementations that are efficient enough is challeng-
ing for many real-world problems. Often, this is due to PPLs not fully ex-
ploiting available parallelization and optimization opportunities. For ex-
ample, handling probabilistic checkpoints in PPLs through continuation-
passing style transformations or non-preemptive multitasking—as is done
in many popular PPLs—often disallows compilation to low-level lan-
guages required for high-performance platforms such as GPUs. To solve
the checkpoint problem, we introduce the concept of PPL control-flow
graphs (PCFGs)—a simple and efficient approach to checkpoints in low-
level languages. We use this approach to implement RootPPL: a low-level
PPL built on CUDA and C++ with OpenMP, providing highly effi-
cient and massively parallel SMC inference. We also introduce a general
method of compiling universal high-level PPLs to PCFGs and illustrate
its application when compiling Miking CorePPL—a high-level universal
PPL—to RootPPL. The approach is the first to compile a universal PPL
to GPUs with SMC inference. We evaluate RootPPL and the CorePPL
compiler through a set of real-world experiments in the domains of phylo-
genetics and epidemiology, demonstrating up to 6× speedups over state-
of-the-art PPLs implementing SMC inference.

Keywords: Probabilistic Programming Languages · Compilers · Se-


quential Monte Carlo · GPU Compilation
?
This project is financially supported by the Swedish Foundation for Strategic Re-
search (FFL15-0032 and RIT15-0012), the European Union’s Horizon 2020 re-
search and innovation program under the Marie Skłodowska-Curie grant agreement
PhyPPL (No 898120), and the Swedish Research Council (grant number 2018-04620).

c The Author(s) 2022


I. Sergey (Ed.): ESOP 2022, LNCS 13240, pp. 29–56, 2022.
https://doi.org/10.1007/978-3-030-99336-8_2
30 D. Lundén et al.

1 Introduction

Probabilistic programming languages (PPLs) allow for encoding a wide range of


statistical inference problems and provide inference algorithms as part of their
implementations. Specifically, PPLs allow language users to focus solely on en-
coding their statistical problems, which the language implementation then solves
automatically. Many such languages exist and are applied in, e.g., statistics, ma-
chine learning, and artificial intelligence. Some example PPLs are WebPPL [20],
Birch [32], Anglican [40], and Pyro [10].
However, implementing efficient PPL inference algorithms is challenging for
many real-world problems. Most often, universal 6 PPLs implement general-
purpose inference algorithms—most commonly sequential Monte Carlo (SMC)
methods [14], Markov chain Monte Carlo (MCMC) methods [18], Hamiltonian
Monte Carlo (HMC) methods [12], variational inference (VI) [39], or a combina-
tion of these. In some cases, poor efficiency may be due to an inference algorithm
not well suited to the particular PPL program. However, in other cases, the PPL
implementations do not fully exploit opportunities for parallelization and opti-
mization on the available hardware. Unfortunately, doing this is often tricky
without introducing complexity for end-users of PPLs.
A critical performance consideration is handling probabilistic checkpoints [37]
in PPLs. Checkpoints are locations in probabilistic programs where inference al-
gorithms must interject, for example, to resample in SMC inference or record
random draw locations where MCMC inference can explore alternative execution
paths. The most common approach to checkpoints—used in universal PPLs such
as WebPPL [20], Anglican [40], and Birch [32]—is to associate them with PPL-
specific language constructs. In general, PPL users can place these constructs
without restriction, and inference algorithms interject through continuation-
passing style (CPS) transformations [9,20,40] or non-preemptive multitasking
[32] (e.g., coroutines) that enable pausing and resuming executions. These so-
lutions are often not available in languages such as C and CUDA [1] used for
high-performance platforms such as graphics processing units (GPUs), making
compiling PPLs to these languages and platforms challenging. Some approaches
for running PPLs on GPUs do exist, however. LibBi [29] runs on GPUs with
SMC inference but is not universal. Stan [12] and AugurV2 [22] partially run
MCMC inference on GPUs but have limited expressive power. Pyro [10] runs on
GPUs, but currently not in combination with SMC. In this paper, we compile a
universal PPL and run it with SMC on GPUs for the first time.
A more straightforward approach to checkpoints, used for SMC in Birch [32]
and Pyro [10], is to encode models with a step function called iteratively. Check-
points then occur each time step returns. This paper presents a new approach to
checkpoint handling, generalizing the step function approach. We write prob-
abilistic programs as a set of code blocks connected in what we term a PPL
6
A term due to Goodman et al. [19]. No precise definition exists, but in principle, a
universal PPL program can perform probabilistic operations at any point. In partic-
ular, it is not always possible to statically determine the number of random variables.
Compiling Universal PPLs with Efficient Parallel SMC Inference 31

Section 2 Section 4 Section 3


Miking RootPPL C++ or CUDA
Compiler Executable
CorePPL Language Compiler

RootPPL SMC
Inference Engine

Fig. 1: The CorePPL and RootPPL toolchain. Solid rectangular components


(gray) represent programs and rounded components (blue) translations. The
dashed rectangles indicate paper sections.

control-flow graph (PCFG). PPL checkpoints are restricted to only occur at


tail position in these blocks, and communication between blocks is only allowed
through an explicit PCFG state. As a result, pausing and resuming executions
is straightforward: it is simply a matter of stopping after executing a block and
then resuming by running the next block. A variable in the PCFG state, set from
within the blocks, determines the next block. This variable allows for loops and
branching and gives the same expressive power as other universal PPLs. We im-
plement the above approach in RootPPL: a low-level universal PPL framework
built using C++ and CUDA with highly efficient and parallel SMC inference.
RootPPL consists of both an inference engine and a simple macro-based PPL.
A problem with RootPPL is that it is low-level and, therefore, challenging
to write programs in. In particular, sending data between blocks through the
PCFG state can quickly get difficult for more complex models. To solve this, we
develop a general technique for compiling high-level universal PPLs to PCFGs.
The key idea is to decompose functions in the high-level language to a set of
PCFG blocks, such that checkpoints in the original function always occur at
tail position in blocks. As a result of the decomposition, the PCFG state must
store a part of the call stack. The compiler adds code for handling this call
stack explicitly in the PCFG blocks. We illustrate the compilation technique by
introducing a high-level source language, Miking CorePPL, and compiling it to
RootPPL. Fig. 1 illustrates the overall toolchain.
In summary, we make the following contributions.

– We introduce PCFGs, a framework for checkpoint handling in PPLs, and use


it to implement RootPPL: a low-level universal PPL with highly efficient and
parallel SMC inference (Section 3).
– We develop an approach for compiling high-level universal PPLs to PCFGs
and use it to compile Miking CorePPL to RootPPL. In particular, we give an
algorithm for decomposing high-level functions to PCFG blocks (Section 4).

Furthermore, we introduce Miking CorePPL in Section 2 and evaluate the


performance of RootPPL and the CorePPL compiler in Section 5 on real-world
models from phylogenetics and epidemiology, achieving up to 6× speedups over
the state-of-the-art. An artifact accompanying this paper supports the evalua-
tion [26]. An extended version of this article is also available [27]. A † symbol in
the text indicates more information is available in the extended version.
31 D. Lundén et al.

2 Miking CorePPL

This section introduces the Miking CorePPL language, used as a source language
for the compiler in Section 4. We discuss design considerations (Section 2.1) and
present the syntax and semantics (Section 2.2).

2.1 Design Considerations

Miking CorePPL (or CorePPL for short) is an intermediate representation (IR)


PPL, similar to IRs used by LLVM [6] and GCC [2]. This allows the reuse
of CorePPL as a target for domain-specific high-level PPLs and PPL compiler
back-ends. Consequently, CorePPL needs to be expressive enough to allow easy
translation from various domain-specific PPLs and simple enough for practical
use as a shared IR for compilers. Therefore, we base CorePPL on the lambda
calculus, extended with standard data types and constructs.
We must also consider which PPL-specific constructs to include. Critically,
most PPLs include constructs for defining random variables and likelihood up-
dating [21]. CorePPL includes such constructs, including first-class probability
distributions, to match the expressive power of existing PPLs.

2.2 Syntax and Semantics

We build CorePPL on top of the Miking framework [11]: a meta-language system


for creating domain-specific and general-purpose languages. This allows reusing
many existing Miking language components and transformations when building
the CorePPL language. More precisely, CorePPL extends Miking Core—a core
functional programming language in Miking—with PPL constructs.
A CorePPL program t is inductively defined by

t ::= x | lam x. t | t1 t2 | let x = t1 in t2 | C t | c


| recursive [let x = t] in
| match t1 with p then t2 else t3 | [t1 , t2 , . . ., tn ] (1)
| {l1 = t1 , l2 = t2 , . . ., l3 = t3 }
| assume t | weight t | observe t1 t2 | D t1 t2 . . . t|D|

where the metavariable x ranges over a set of variable names; C over a set of data
constructor names; p over a set of patterns; l over a set of record labels; and c over
various literals, such as integers, floating-point numbers, booleans, and strings, as
well as over various built-in functions in prefix form such as addi (adds integers).
The notation [let x = t] indicates a sequence of mutually recursive let bindings.
The metavariable D ranges over a set of probability distribution names, with |D|
indicating the number of parameters for a distribution D. For example, for the
normal distribution, |N | = 2. In addition to (1), we will also use the standard
syntactic sugar ; to indicate sequencing, as well as if t1 then t2 else t3 for
match t1 with true then t2 else t3 .
Compiling Universal PPLs with Efficient Parallel SMC Inference 33

0.4 Standard geometric


recursive let geometric = lam p.

Probability
1

2 let x = assume (Bernoulli p) in 0.2


3 if x then 0
0.4 Weighted geometric
4 weight (log 1.5);
5 addi 1 (geometric p) 0.2
6 else 1 0
0 1 2 3 4 5 6 7 8 9 ···
7 in geometric 0.5
Outcome
(a) (b)

Fig. 2: A toy example encoding a skewed geometric distribution, illustrating


CorePPL. Part (a) gives the CorePPL program, and part (b) the corresponding
distribution. The upper part of (b) shows the distribution for (a) with line 4
omitted, and the lower part of (b) shows it with line 4 included.

Consider the simple but illustrative CorePPL program in Fig. 2a. The pro-
gram encodes a variation of the geometric distribution, for which the result is the
number of times a coin is flipped until the result is tails. The program’s core is
the recursive function geometric, defined using a function over the probability
of heads for the coin, p. We initially call this function at line 7 with the argument
0.5, indicating a fair coin. On line 2, we define the random variable x to have a
Bernoulli distribution (i.e., a single coin flip) using the assume construct (often
known as sample in PPLs with sampling-based inference). If the random variable
is false (tails), we stop and return the result 1. If the random variable is true
(heads), we keep flipping the coin by a recursive call to geometric and add 1 to
this result. To illustrate likelihood updating, we make a contrived modification
to the standard geometric distribution by adding weight (log 1.5) on line 4.
This construct weights the execution by a factor of 1.5 each time the result is
heads. Note that CorePPL weight computations are in log-space for numerical
stability (hence the log 1.5 to factor by 1.5). Thus, the unnormalized probabil-
ity of seeing n coin flips, including the final tails, is 0.5n ·1.5n−1 —where 1.5n−1 is
the factor introduced by the n−1 calls to weight. The difference compared to the
standard geometric distribution is illustrated in Fig. 2b. The weight construct
is also commonly named factor or score in other PPLs.
What separates PPLs from ordinary programming languages is the ability to
modify the likelihood of execution paths, akin to the use of weight in Fig. 2a. We
often use likelihood modification to condition a probabilistic model on observed
data. For this purpose, CorePPL includes an explicit observe construct, which
allows for modifying the likelihood based on observed data assumed to originate
from a given probability distribution. For instance, observe 0.3 (Normal 0 1)
updates the likelihood with fN (0,1) (0.3) (note that this can equivalently be ex-
pressed through weight), where fN (0,1) is the probability density function of
the standard normal distribution. This conditioning can be related to Bayes’
theorem: the random variables defined in a program define a prior distribution
(e.g., the upper part of Fig. 2b), the use of the weight and observe primitives a
34 D. Lundén et al.

likelihood function, and the inference algorithm of the PPL infers the posterior
distribution (e.g., the lower part of Fig. 2b)
CorePPL includes sequences, recursive variants, records, and pattern match-
ing, standard in functional languages. For example, [1, 2, 3] defines a se-
quence of length 3, {a = false, b = 1.2} a record with labels a and b, and
Leaf {age = 1.0} a variant with the constructor name Leaf, containing a
record with the label age. The match construct allows pattern matching. For ex-
ample, match a with Leaf {age = f} then f else 0.0 checks if a is a Leaf
and returns its age if so, or 0.0 otherwise. Here, f is a pattern variable that is
bound to the value of the age element of a in the then branch of the match.
The data types and pattern matching features in Miking, and consequently
CorePPL, are not directly related to the paper’s key contributions. Therefore,
we do not discuss them further. However, the CorePPL compiler in Section 4.3
supports the features, and the CorePPL models in Section 5 make frequent use
of them. We consider CorePPL again in Section 4 when compiling to PCFGs.

3 PPL control-flow graphs and RootPPL

This section introduces the new PCFG concept (Section 3.1) and shows how to
apply SMC over these (Section 3.2). Finally, we present the PCFG and SMC-
based RootPPL framework (Section 3.3).

3.1 PPL Control-Flow Graphs

In order to handle checkpoints efficiently without CPS or non-preemptive mul-


titasking, we introduce PPL control-flow graphs (PCFGs). In contrast to tra-
ditional PPLs, where checkpoints are most often implicit, we make them ex-
plicit and central in the PCFG framework. The main benefit of this approach
is that the handling of checkpoints in inference algorithms is greatly simplified,
which allows for implementing the framework in low-level languages. However,
the explicit checkpoint approach makes PCFGs relatively low-level, and they are
mainly intended as a target when compiling from high-level PPLs. We introduce
such a compiler in Section 4.
Formally, we define a PCFG as a 6-tuple (B, S, sim, b0 , bstop , L). The first
component B is a set of basic blocks inspired by basic blocks used as a part
of the control-flow analysis in traditional compilers [8]. In practice, the blocks
in B are pieces of code that together make up a complete probabilistic pro-
gram. Unlike basic blocks used in traditional compilers, we allow these pieces of
code to contain branches internally. The second component S is a set of states,
representing collections of information that flow between basic blocks. In prac-
tice, this state often contains local variables that live between blocks and an
accumulated likelihood. The blocks and states form the domain of the function
sim : B × S → B × S × {false, true}. This function performs computation specific
for the given block over the given state and outputs a successor block indicating
Compiling Universal PPLs with Efficient Parallel SMC Inference 35

b2 sim(b0 , s0 ) 7→ (b1 , s1 , false)


sim(b1 , s1 ) 7→ (b2 , s2 , true)
b0 b1 b4 bstop sim(b2 , s2 ) 7→ (b4 , s3 , true)
b3 sim(b4 , s3 ) 7→ (bstop , s4 , false)

(a)
(b)

Fig. 3: A PCFG illustration. Part (a) shows an example PCFG. The arrows de-
note the possible flows of control between the blocks, with regular arrows denot-
ing checkpoint transitions and arrows with open tips non-checkpoint transitions.
Part (b) shows a possible execution sequence with sim for (a).

Algorithm 1 A standard SMC algorithm applied to PCFGs.


Input: A PCFG (B, S, sim, b0 , bstop , L). A set of initial states {sn }N
n=1 .
Output: An updated set of states {sn }N n=1 .

1. Initialization: For each 1 ≤ n ≤ N , let an := b0 and cn := false.


2. Propagation: If all an = bstop , terminate and output {sn }N n=1 . If not, for each
1 ≤ n ≤ N where cn = false, let (an , sn , cn ) := sim(an , sn ). If all cn = true, go
to 3. If not, repeat 2.
3. Resampling: For each 1 ≤ n ≤ N , let pn := L(sn )/ N i=1 L(si ). For each 1 ≤
P
n ≤ N , draw a new index i from {i}N i=1 with probabilities {pi }i=1 . Let (sn , bn ) :=
N 0 0

(si , bi ). Finally, for each 1 ≤ n ≤ N , let (sn , bn , cn ) := (s0n , b0n , false). Go to 2.

what to execute next, an updated state, and a boolean indicating whether or


not there is a checkpoint at the end of the executed block.
To illustrate this formalization, consider the PCFG in Fig. 3a for which
B = {b0 , b1 , . . . , b4 , bstop }. The block b0 is present in every PCFG and represents
its entry point. Similarly, the block bstop is a unique block indicating termination,
which must be reachable from all other blocks. For some initial state s0 ∈ S,
Fig. 3b illustrates a possible execution sequence starting at b0 in Fig. 3a before
terminating at bstop . The structure of a PCFG restricts checkpoints to only occur
at the end of basic blocks and confines communication between blocks to the
state. These restrictions greatly simplify inference algorithm implementations.
More precisely, rather than relying on CPS or non-preemptive multitasking, the
inference algorithm can simply run a block b with sim, handle the checkpoint,
and then run the successor block indicated by the output of sim.

3.2 SMC and PCFGs


To prepare for introducing RootPPL in Section 3.3, we present how to apply
SMC inference to PCFGs. The work by Naesseth et al. [33] contains a more
general and pedagogical introduction to SMC. At a high level, SMC inference
works by simulating many instances—known as particles in SMC literature—of
36 D. Lundén et al.

a PCFG program concurrently, occasionally resampling the different particles


based on their current likelihoods. In CorePPL, for example, such likelihoods
are determined by weight and observe. Resampling allows the downstream
simulation to focus on particles with a higher likelihood.
In order to apply SMC inference over PCFGs, we need some way of deter-
mining the likelihood of the SMC particles. For this, we use the final component
of the PCFG definition, L : S → R≥0 , which is a function mapping states to a
likelihood (a non-negative real number). Concretely, this likelihood is most often
stored directly in the state as a real number, and L simply extracts it.
Algorithm 1 defines an SMC algorithm over PCFGs. It takes a PCFG as
input, together with a set of N states {sn }N n=1 , which represent the SMC par-
ticles. Step 1 in the algorithm sets up variables an and cn , indicating for each
particle its current block and whether or not a checkpoint has occurred in it.
Step 2 simulates all particles that have not yet reached a checkpoint using sim.
This step repeats until all particles have reached a checkpoint (this is a synchro-
nization point for parallel implementations). Step 3 uses the likelihood function
L to compute the relative likelihoods of all particles and then resamples them
based on this. That is, we sample N particles from the existing N particles (with
replacement) based on the relative likelihoods. After resampling, we return to
step 2. If all particles have reached the termination block bstop , the algorithm
terminates and returns the current states.
Note in Algorithm 1 that the input states are not required to be identical. For
example, each state should have a unique seed used to generate random num-
bers (e.g., with assume in CorePPL). Non-identical initial states in Algorithm 1
imply that different particles may traverse the blocks in B differently and reach
checkpoints at different times. Although this means that different particles can
be at different blocks concurrently, the SMC algorithm is still correct [24]. This
PCFG property is essential as it allows for the encoding of universal probabilis-
tic programs in PCFG-based PPLs. Furthermore, it implies that some particles
may reach bstop earlier than others. To solve this, we require in Algorithm 1 that
sim(bstop , s) = (bstop , s, true) holds for all states s. That is, particles that have
finished also participate in resampling and cannot cause step 2 to loop infinitely.
Next, we describe our implementation of PCFGs with SMC: RootPPL.

3.3 RootPPL

We make use of the PCFG framework when implementing RootPPL: a new


low-level PPL framework built on top of CUDA C++ and C++, intended
for highly optimized and massively parallel SMC inference on general-purpose
GPUs. RootPPL consists of two major components: a macro-based C++ PPL
for encoding probabilistic models and an SMC inference engine.
The macro-based language has two purposes: to support compiling the same
program to either CPU or GPU and to simplify the encoding of models for
programmers. As a result, the macros hide all hardware details from the pro-
grammer. To illustrate this macro-based PPL, consider the example RootPPL
Another Random Scribd Document
with Unrelated Content
an ein Oratorium „Luther“, über das sich ein reger Briefwechsel mit
Pohl entspinnt. Einige Stellen daraus seien hier angeführt, als
Beispiel, bis zu welchem Grade Schumann seine „Dichter“
beeinflußte und wie er sich das Ideal eines Oratoriums ersonnen
hatte: „Das Oratorium müßte ein durchaus volkstümliches werden,
das Bürger und Bauer verstände – dem Helden gleich, der ein so
großer Volksmann war. Und in diesem Sinne würde ich mich auch
bestreben, meine Musik zu halten, also am allerwenigsten künstlich,
kompliziert, kontrapunktisch, sondern einfach, eindringlich, durch
Rhythmus und Melodie vorzugsweise wirkend. Im übrigen stimme
ich mit allem, was Sie wegen Behandlung des Textes in metrischer
Hinsicht sagen, wie über die volkstümlich altdeutsche Haltung, die
dem Gedichte zu geben wäre, durchaus überein. – Das Oratorium
müßte für Kirche und Konzertsaal passen. Es dürfte mit Einschluß
der Pausen zwischen den verschiedenen Abteilungen nicht über
zweieinhalb Stunden dauern. – Alles bloß Erzählende und
Reflektierende wäre möglichst zu vermeiden, überall die dramatische
Form vorzuziehen. – Möglichst historische Treue, namentlich die
Wiedergabe der bekannten Kraftsprüche Luthers. Sein Verhältnis zur
Musik überhaupt, seine Liebe für sie, in hundert schönen Sprüchen
von ihm ausgesprochen, dürfte gleichfalls nicht unerwähnt bleiben. –
Hutten, Sickingen, Hans Sachs, Lucas Kranach, die Kurfürsten
Friedrich und Johann von Hessen müssen wir wohl aufgeben –
leider! Erzählungsweise mögen sie aber alle vorkommen. Ich glaube,
wir müssen den Stoff auf die einfachsten Züge zurückführen, oder
nur wenige der großen Begebenheiten aus Luthers Leben
herausnehmen. Auch glaube ich, dürfen wir dem Eingreifen
übersinnlicher Wesen nicht zu großen Platz einräumen, es will mir
nicht zu des Reformators ganzem Charakter passen, wie wir ihn nun
einmal recht als einen geraden, männlichen und auf sich selbst
gegründeten kennen. – Gelegenheit zu Chören geben Sie mir, wo Sie
können. Händels ‚Israel in Egypten‘ gilt mir als das Ideal eines
Chorwerks und eine so bedeutende Rolle wünschte ich dem Chor
auch im ‚Luther‘ zugeteilt. Der Choral ‚Ein’ feste Burg‘ dürfte als
höchste Steigerung nicht eher als zum Schluß erscheinen, als
Schlußchor. – Lassen Sie uns das große Werk mit aller Kraft ergreifen
und daran festhalten.“ Doch kam es trotz der in den letzten Worten
gegebenen Versicherung zur Ausführung des so wohl bedachten
Planes nicht. Andere, wichtiger dünkende Projekte drängten sich in
den Vordergrund. Denn im verzweifelten Streben ja nur Originelles,
Bahnbrechendes zu produzieren, sann Schumann endlich gar auf die
Einführung neuer Kunstgattungen, nahm das „weltliche Oratorium“
wieder auf und erfand das unerfreuliche, zwitterhafte Genre der
„Chorballade“.
D e r R o s e P i l g e r f a h r t (op. 112) heißt das bis zur
Abgeschmacktheit sentimentale Carmen Horns, welches jetzt ein –
sehr untergeordnetes – Seitenstück zu der sinnigen „Peri“ abgeben
sollte, ein Carmen, dessen Heldin eine menschgewordene Blume ist,
welche liebt, heiratet und endlich, man weiß nicht recht warum, von
der E l f e nkönigin unter die E n g e l aufgenommen wird. Kaum
begreifen wir, wie Schumann dies fade Produkt gezierter
Goldschnittlitteratur anziehen konnte, ihn, der einige Jahre vorher
sich geäußert: „Schwache Worte zu komponieren ist mir ein Greuel;
ich verlange keinen großen Dichter, aber eine gesunde Sprache und
Gesinnung.“ Horn war thöricht und eingebildet zu glauben, an dem
geringen Erfolg, den das Werk bei den ersten Aufführungen
davontrug, sei die „in der Auffassung verfehlte“ Musik des Meisters
schuld. Diese gehört nun freilich nicht zu seinen besten Leistungen,
enthält aber immerhin gar manche reizvolle Nummer. Derselbe
Dichter richtete ihm auch die Uhlandsche Ballade: D e r
K ö n i g s s o h n (op. 116) für seine Experimente her, Pohl im
folgenden Jahre D e s S ä n g e r s F l u c h (op. 139). Darnach
kommt der Geibelsche Romanzenkranz V o m P a g e n u n d d e r
K ö n i g s t o c h t e r (op. 140) an die Reihe und schließlich 1853 d a s
G l ü c k v o n E d e n h a l l (op. 143) von seinem Hausarzte Dr.
Hasenclever bearbeitet. Das Prinzip dieser zwischen Epos und Drama
schwankenden Gattung, wie es in op. 139 am deutlichsten
ausgeprägt erscheint, besteht darin, daß der erzählende Teil des
Gedichtes einer gewissen Stimme übertragen, alles Übrige jedoch,
nötigenfalls mit Hilfe von beträchtlichen Texterweiterungen, zu
Sologesang, Duett, Terzett oder Chor umgestaltet wird, je nachdem
es die Situation erfordert. So entwarf Schumann zu „des Sängers
Fluch“ folgendes Schema:

Nr. 1. C h o r m i t S o l i s .
Es stand in alten Zeiten – blühender Genoß.

Nr. 2. D u e t t f o r m (etwa zehn Zeilen).


Alter und Jüngling.
Nun sei bereit – steinern Herz.

Nr. 3. R e c i t a t i v (Sopran).
Schon stehen – zum Ohre schwoll.

Ensemble.
Alter, Jüngling, König, Königin, Chor.
(Breit auszuführen).

Nr. 4. R e c i t a t i v .
Und wie vom Sturm zerstoben – Gärten gellt.

Nr. 5. H a r f n e r.
Weh euch!

Nr. 6. C h o r.
Der Alte hat’s gerufen – Das ist des Sängers Fluch.

Pohl legt nun den beiden Sängern ganze Uhlandsche Lieder in


den Mund und der Chor drückt seine Ergriffenheit mit dem, zu
diesem Zwecke veränderten Verse: „Wie schlägt der Greis die Saiten,
so wundervoll und mild“ aus. Ferner wird der Königin ein früheres,
zärtliches Verhältnis zum Jünglinge insinuiert. Die beiden sind so
unvorsichtig, im Angesichte des ganzen Hofes ein langes Liebesduett
anzustimmen, worauf der König in begreiflicher Entrüstung den
kecken Troubadour mit den Worten: „Stirb feiger Sklavensohn!“
niedersticht. Daß Schumann die poetischen Sünden eines
unerfahrenen Kunstnovizen für gut befinden, ja sogar veranlassen
konnte, zeugt gewiß von einer bedeutenden Trübung seines einst so
klaren, richtigen Blickes.
Im März 1852 gab das Schumannsche Ehepaar unter großem
Zulauf auswärtiger Musiker (Liszt, Robert Franz, Joachim, Meinardus
u. a.) eine Anzahl von Konzerten in Leipzig, doch vermochten selbst
vor diesem wohlgesinnten Publikum gerade die letzten Werke keinen
rechten Erfolg zu erringen. Denn selten nur erhellt in ihnen ein Blitz,
manchmal wohl ein fernes Wetterleuchten des Genies das lastende
Dunkel. Müde und abgespannt kehrte der Meister nach Düsseldorf
zurück, an dem Männergesangsfest (im August) beteiligte er sich nur
wenig und eine Kur in Scheveningen, wo er mit seinem alten
Freunde Verhulst das Wiedersehen feierte, brachte nur
vorübergehende Besserung. Von seinem sonderbaren Zustand giebt
eine Erinnerung des Malers Bendemann an eine Abendgesellschaft,
bei welcher auch Schumann zugegen war, interessanten Bericht:
„Nach dem Essen trug Klara einige Klavierstücke vor. Schumann
hatte sich unterdes seiner Gewohnheit nach in ein Nebenzimmer
zurückgezogen. Seine Frau, die immer sehr besorgt um ihn war, ging
nach Beendigung ihrer Vorträge zu ihm hin. Ich begleitete sie. Als
wir eintraten, schrak er aus seinem träumerischen Versunkensein
empor und fragte: Wer hat da gespielt? Ich merkte, wie die Frage
Frau Schumann ins Herz schnitt. Aber Robert, ich habe ja gespielt,
erwiderte sie mit bebender Stimme. So warst du das!? gab
Schumann gleichgültig zurück und versank wieder in seine
Meditationen. Die sonderbaren Worte hatten die liebe Frau so
angegriffen, daß sie förmlich unwohl wurde und den Wunsch
aussprach, nach Hause zu gehen. Schumann hatte darnach nur ein
kurzes: Warum denn? Es ist ja ganz nett hier! – Aber bester
Schumann, wenn Ihre Frau unwohl ist, müssen Sie sie doch nicht
zurückhalten, sagte ich, mich ins Mittel legend. Da brach er denn
auf. Am andern Morgen aber empfing ich einen Brief von ihm, worin
er sich ziemlich beleidigt über meine Einmischung aussprach.“
(Schrattenholz, E. Bendemann). Über seinen damaligen
Geisterglauben erzählt ein anderer Freund, Wasiliewski, Folgendes:
„Als ich im Mai 1853 eines Tages in Schumanns Zimmer eintrat, lag
er auf dem Sofa und las in einem Buche. Auf mein Befragen, was
der Inhalt des letzteren sei, erwiderte er mit gehobener Stimme: ‚O!
Wissen Sie noch nichts vom Tischrücken?‘ ‚Wohl!‘ sagte ich in
scherzendem Tone. Hierauf öffneten sich weit seine für gewöhnlich
halbgeschlossenen Augen. Die Pupille dehnte sich krampfhaft
auseinander und mit eigentümlich geisterhaftem Ausdrucke sagte er
langsam: ‚d i e T i s c h e w i s s e n a l l e s.‘ Als ich diesen
drohenden Ernst sah, ging ich, um ihn nicht zu reizen, auf seine
Meinung ein, infolge dessen er sich beruhigte. Dann rief er seine
zweite Tochter herbei und fing an mit ihr und einem kleinen Tische
zu experimentieren, wobei er ihn auch den Anfang der C-
mollsymphonie von Beethoven markieren ließ.“ Doch konnte
Schumann zu gewissen Zeiten auch recht umgänglich, ja wohl
aufgeräumt sein. Beim großen niederrheinischen Musikfest
(Pfingsten 1853) that er sich als Dirigent sogar wieder hervor und
brachte nebst der umgearbeiteten D-mollsymphonie auch eine neu
komponierte F e s t o u v e r t ü r e ü b e r d a s R h e i n w e i n l i e d
mit Gesang (Soli und Chor) unter vielem Beifall zur Aufführung.
Denkwürdig ist sodann sein Verhältnis zu Johannes Brahms, der an
ihn durch Joachim empfohlen war. Schumann, die außergewöhnliche
Begabung des zwanzigjährigen Künstlers erkennend, griff
anerkennungsfreudig wie immer, noch einmal zur Feder und
proklamierte ihn in der Brendelschen Zeitschrift – nicht als
vielverheißenden Jünger, sondern als „starken Streiter“ und Meister,
der berufen sei, „den höchsten Ausdruck unserer Zeit in idealer
Weise auszusprechen.“ Brieflich bezeichnet er ihn als den, „der
kommen mußte“, als „einen jungen Aar“, welcher „die größte
Bewegung in der musikalischen Welt hervorrufen werde“. Selbst wer
diesem enthusiastischen Preise nicht in ganzem Umfange
beipflichten kann, muß zugeben, daß Brahms unter den absoluten
Musikern der letzten Zeit entschieden und verdientermaßen den
ersten Platz sich erobert hat. Kurz, der „blonde Johannes“ war
Schumanns erklärter Liebling von Stund an. Oft gedachten die
beiden Künstler bei ihrem Beisammensein auch des fern weilenden,
aber zum Konzerte erwarteten Freundes Joachim, ja, der stille,
schweigsame Robert schwang sich gar einmal zu folgendem Toast in
Charadenform auf: „Drei Silben; die erste liebte ein Gott, die zwei
anderen lieben viele Leser, das Ganze lieben wir alle; das Ganze und
der Ganze sollen leben!“ (Jo-achim). Auch existiert eine Violinsonate,
welche er, Brahms und der junge Musiker Albert Dietrich dem
berühmten Geigenvirtuosen zu Ehren geschrieben haben. Sie wurde
ihm nach seiner Ankunft in Düsseldorf vorgespielt und richtig
erkannte er den Verfasser jedes einzelnen Satzes. Später ersetzte
Schumann die zwei fremden Sätze durch eigene Musik und gab die
Komposition als (op. 131) heraus.
Leider zählten solche Perioden geistiger Frische nur nach Tagen.
In demselben Monate (Oktober), in welchen der Verkehr mit Brahms
fällt, nahmen auch die Abonnementskonzerte ihren Anfang und
schon beim ersten zeigte es sich deutlich, daß Schumann seiner
Aufgabe als Dirigent nicht mehr gewachsen sei. Man bat ihn, das
Dirigieren zur Schonung seiner Gesundheit wenigstens vorläufig dem
zweiten Kapellmeister zu überlassen, beleidigte aber durch dieses
vielleicht nicht genug schonend vorgebrachte Ansinnen den
reizbaren Meister derart, daß er seine Stelle ohne Verzug
niederlegte.
Er wandte Düsseldorf den Rücken und begab sich mit Klara auf
eine Konzerttour nach Holland. Da durfte er denn, mit Ehren
überhäuft, die ihm vermeintlich zugefügten Kränkungen vergessen.
„Das holländische Publikum,“ schreibt er frohbewegt, „ist das
enthusiastischste, die Bildung im ganzen dem Besten zugewendet.
Überall hört man neben den alten Meistern auch die neuen. So fand
ich in Hauptstädten Aufführungen meiner Kompositionen vorbereitet
(der dritten Symphonie in Rotterdam und Utrecht, der zweiten in
Haag und Amsterdam, auch der Rose in Haag), daß ich mich nur
hinzustellen brauchte, um sie zu dirigieren. Ich habe zu meiner
Verwunderung gesehen, daß meine Musik hier beinahe heimischer
ist als im Vaterlande.“
Am 22. Dezember traf das Künstlerpaar wieder zu Hause ein, mit
dem Vorsatz, die leidige Stadt im Laufe des kommenden Frühlings
für immer zu verlassen. Zwei litterarische Pläne beschäftigten den
Meister fortan: Zuerst die Herausgabe seiner gesammelten Aufsätze
in Buchform,[9] wobei es ihm eine Freude war zu bemerken, daß er
in der langen Zeit, seit über zwanzig Jahren, von den damals
ausgesprochenen Ansichten fast gar nicht abgewichen sei; dann eine
Zusammenstellung aller Dichtersprüche über Musik von den ältesten
Zeiten an, unter dem Titel „Dichtergarten“. An der Vollendung dieser
Arbeit verhinderte ihn jedoch sein neuerdings ausbrechendes Leiden.
Becker erzählt, daß Schumann einmal im Gasthause plötzlich die
Zeitung weggelegt habe mit den Worten: „Ich kann nicht mehr
lesen; ich höre fortwährend A.“ Solche Sinnestäuschungen kehrten
jetzt mit doppelter Stärke wieder. Des Nachts erscheinen ihm
Schubert und sein lieber Mendelssohn und singen ihm eine rührende
Melodie vor, bis er vom Lager springt und sie aufzeichnet.[10]
Geisterstimmen tönen ihm entgegen, bald mild und freundlich, bald
drohend und vorwurfsvoll. Eine furchtbare Angst ergreift den
Beunruhigten, während Klara durch vierzehn bange Nächte und Tage
alles aufbietet, um die quälenden Gedanken des Gatten zu
zerstreuen. Umsonst! Am Fastnachtsmontag, am 17. Februar 1854,
entfernt er sich heimlich aus dem Hause, eilt auf die Rheinbrücke
und springt, seinen peinlichen Zustand zu enden, in den eisigen
Strom. Aber anwesende Schifferknechte fischen ihn noch lebendig
wieder heraus: einen Wahnsinnigen. Man brachte ihn am 4. März in
die Privatheilanstalt des Dr. Richarz zu Endenich bei Bonn.
Ererbte Disposition und übermäßige Anstrengung des Geistes –
das gaben die Ärzte als Ursache der Krankheit an. Allein sie erklären
damit nicht alles. Schumann war ein großer, kräftiger, gesund
aussehender Mann, dessen Leben im ganzen glücklich genannt
werden darf. Alles, was er wünschte, hatte er errungen,
Freundschaft, Liebe, Künstlerschaft; nie hatte ihm die Sorge ums
tägliche Brot ihr bleiches Antlitz gewiesen. Bedenkt man, wie andere
Künstler schwach an Körper, bettelarm, mit wundem Herzen durch
die Welt gewandelt sind, ebenso produktiv, vielleicht noch
produktiver als unser Meister, ohne dabei dem Wahnsinn verfallen zu
sein, so muß man einräumen, daß es hier eine besondere
Bewandtnis gehabt habe. Fragen wir darum nicht weiter die Medizin,
sondern lieber einen mitfühlend verstehenden Genius – Grillparzer.
Und dieser sagt uns mit ausdrücklichem Bezug auf Schumann: „Ich
meine immer, ein Künstler, der wahnsinnig wird, sei im K a m p f e
g e g e n s e i n e N a t u r gelegen.“ Herrlicher Tiefblick des Dichters!
Wie beschämt er die Instrumente des secierenden Gelehrten, welche
nur über Gehirnentartung, Gefäßerweiterung und dergleichen
Aufschluß zu geben wußten. Grillparzer, der Schumann bloß einmal
flüchtig gesehen, löst uns durch einen scharfen Hieb seines
Intellekts den gordischen Knoten der Schumannfrage und führt uns
zur Erkenntnis, daß nicht das Übermaß des Schaffens, sondern das
Schaffen gegen seine Natur und Eigenart diesen Geist erschöpft und
zerrüttet habe. Im Verlaufe unserer Darstellung wurde dieses
Moment schon mehrmals hervorgehoben.
Zu Endenich lebte Schumann meist in einem Zustande tiefster
melancholischer Depression. Manchmal trat wieder eine so
bedeutende Besserung ein, daß man sogar auf völlige Genesung
hoffen zu können glaubte: dann verschwand der irre Blick aus seinen
Augen, er musizierte und schrieb an Familie und Verleger. Doch am
Beginn des Sommers 1856 ging es mit ihm merklich zu Ende. Seine
Freunde, die in treuer Anhänglichkeit oft zum Krankenhaus
wanderten, um Nachrichten einzuholen, kamen mit immer
traurigerer Botschaft heim. Die einzige Beschäftigung dieses einst so
bedeutenden Geistes bestände darin, sich in einem, von Brahms
geschenkten Atlas eingebildete Reisen zusammenzusuchen. Am 29.
Juli um vier Uhr nachmittags verschied er sanft in den Armen seiner
Gattin. Sie und alle, die Schumann liebten, fühlten sich erleichtert: er
war ja befreit von schwerem Leiden.
Die Nachricht von Schumanns Tode durcheilte Bonn und die
anderen rheinischen Städte in wenigen Stunden. Auf Straßen und
Plätzen wurde man darauf angeredet, ob man die Trauerkunde
schon vernommen. Von fern und nah eilten die Verehrer des
Meisters zu seinem Begräbnis, Brahms, Joachim, Dietrich gingen
barhaupt hinter dem Sarge, Hiller sprach die Grabrede und die
Bevölkerung Bonns war in Scharen herbeigeströmt, um den
Leichenzug zu sehen, als würde ein König beerdigt. Auf dem Bonner
Kirchhof vor dem Sternenthor, nicht weit von Vater Arndt, ruht jetzt
der müde Sänger. Dort hat ihm im Mai 1880 die Liebe seiner Freunde
ein würdiges Denkmal errichtet.

[8] Bekannte Irrenanstalt.


[9] Erschienen bei Wigand, 1854. Später ging das Verlagsrecht an
Breitkopf und Härtel über. Neue Ausgabe: Universal-Bibliothek Nr.
2472/73. 2561/62. 2621/22.
[10] Johannes Brahms hat über dieses im Nachlasse des Meisters
vorgefundene Thema sehr schöne Variationen geschrieben (op.
23).
8. Rückblick.
Nicht mitzuhassen, mitzulieben bin ich da.
Antigone
.

Wenn unser Auge bei der Betrachtung des Schumannschen


Lebensganges an manchen Schwächen und Irrtümern haften mußte,
so darf der folgende Abschnitt nun ganz dem Preise seiner
bedeutenden Leistungen gewidmet sein. Mag der eine dieser, der
andere jener den Vorzug zuerkennen – über ihren hohen Wert ist
heutzutage alle Welt einig und zählt ihn, wenn auch nicht zu jenen
Gewaltigen, welche die gesamten idealen Bestrebungen des
verfließenden Zeitalters kraftvoll zusammenfassend, einem künftigen
den Stempel ihres Geistes aufdrücken, so doch zu den Musikern
ersten Ranges. Er war ein musikalischer Kernmensch und vermag
darum sogar in Kunstgattungen, für welche ihm die spezielle
Befähigung abging, in Ehren zu bestehen und die volle Teilnahme
des Hörers zu erwecken. Sollen wir aber diese seine spezielle
Befähigung genauer umschreiben, was könnten wir Treffenderes
vorbringen, als das Urteil, welches er selbst, über einen anderen
Künstler (L. Berger) freilich, gefällt hat: „Überhaupt war er in kleinen
Formen glücklicher als in größeren, wie dies oft bei Naturen der Fall,
die stets ihr Bestes, Tiefstes, Innigstes geben möchten. Schlage man
solche Stücke nicht zu gering an. Eine gewisse breite Unterlage, ein
bequemes Aufbauen und Abschließen mag mancher Leistung zum
Lobe gereichen. Es giebt aber Tondichter, die, wozu andere Stunden
brauchen, in Minuten auszusprechen wissen: zur Darstellung, wie
zum Genießen solcher geistig konzentrierter Kompositionen gehört
aber freilich auch eine gesteigerte Kraft des Darstellenden wie des
Aufnehmenden und dann auch die rechte Stunde und Zeit; denn
schöne, bequeme Form läßt sich immer genießen und auslegen;
tiefer Gehalt wird aber nicht zu jeder Zeit verstanden. Daß Berger
auch größerer Formen Meister war, hat er in seinen Sonaten und
Konzerten bewiesen. Keineswegs aber geben wir für diese eben jene
kleineren, genialeren Arbeiten hin, wie jene Charakterstücke, einige
seiner Variationen und vor allem seine Lieder.“
So Schumann. Um die Schönheiten seiner Musik zu empfinden,
bedarf es natürlich, wie bei allen tiefer gedachten Kunstwerken einer
eingehenden Bekanntschaft. „Als ich Schumann zu spielen anfing,“
erzählt Robert Hamerling,[11] „glaubte ich in seiner Tonsprache dem
hellen Klangreize anderer Meister gegenüber etwas Herbes, Dumpfes
zu finden, welches jedoch, sobald es nur vom rechten Geschick und
Verständnis bewältigt war, in den süßesten Wohllaut sich auflöste.
Vor allem will das Individuelle, Charakteristische des Tonstückes bei
ihm erfaßt und festgehalten sein und aus diesem Grunde ist die
Kenntnisnahme der Überschriften, die er über seine Werke setzt, für
den vollen Genuß des Hörers so unentbehrlich, wie für den Vortrag.
Was soll der Hörer von den Sprüngen des Harlekins denken, wenn er
nicht weiß, daß es eben Harlekinssprünge sind.“ Als Gegensatz zu
Mendelssohns ohrenfälligem Formenspiel wurde diese Musik vom
Publikum anfangs mit der – gleichfalls unverständlich dünkenden –
„Zukunftsmusik“ zusammengeworfen, wozu noch der Umstand
beitrug, daß Brendel in seiner Zeitschrift die beiden Richtungen mit
demselben Enthusiasmus verfocht. Schumanns Parteigänger
protestierten entschieden, aber nicht ganz mit Recht gegen solche
Beiordnung, denn thatsächlich finden sich zwischen ihm und der
„Weimarer Schule“ gar viele und wichtige Berührungspunkte.
Beiden ist vor allem die geschichtliche Basis: Bach und (der
letzte) Beethoven gemeinsam. „Das Tiefkombinatorische, Poetische
und Humoristische der neueren Musik,“ bemerkt Schumann, „hat
seinen Ursprung zumeist in Bach. Die sogenannten Romantiker
stehen Bach weit näher als Mozart, wie ich selbst tagtäglich vor
diesem Hohen beichte, mich durch ihn zu reinigen und zu stärken
suche,“ und ein andermal gesteht er, daß ihn nur mehr „das
Äußerste“ reize; „Bach fast durchweg, Beethoven zumeist in seinen
späteren Werken.“ „Es ist wahr, zum Verständnis der
Beethovenschen letzten Quartette gehört mehr als bloße Lust zum
Hören. Der empfänglichste, offenste Musikmensch wird ungerührt
von ihnen gehen, wenn er nicht tiefe Kenntnis des Charakters
Beethovens und dessen späterer Aussprache mitbringt. Dann aber,
ist er auf dem Wege dahin, so kann auch dem menschlichen Geiste
kaum etwas Wunderwürdigeres geboten werden, als jene
Schöpfungen, denen in ihrer tiefsinnigen Gestaltung, ihrem alle
menschlichen Satzungen überschwebenden Ideenfluge von anderer
neuerer Musik gar nichts und im übrigen nur einiges etwa von Lord
Byron oder von Jean Pauls und Goethes späteren Werken verglichen
werden kann.“ Die Tonkunst ist ihm „die veredelte Sprache der
Seele; andere finden in ihr einen Ohrenrausch, andere ein
Rechenexempel und üben sie in dieser Weise aus. Aber das wäre
eine kleine Kunst, die nur klänge und keine Sprache noch Zeichen
für Seelenzustände hätte!“ Auch die Lehre von der Einheit des
Kunstgefühls, von der Urverwandtschaft der Künste läßt sich in
Schumanns Schriften und Briefen nachweisen. „Die Ästhetik der
einen Kunst ist die der anderen; nur das Material ist verschieden.
Der gebildete Musiker wird an einer Rafaelschen Madonna mit
gleichem Nutzen studieren können, wie der Maler an einer
Mozartschen Symphonie. Noch mehr: dem Bildhauer wird jeder
Schauspieler zur ruhigen Statue, diesem die Werke jenes zu
lebendigen Gestalten; dem Maler wird das Gedicht zum Bild, der
Musiker setzt die Gemälde in Töne um.“ Gleich den „Neudeutschen“
möchte sich Schumann vom partikularistischen Standpunkt des
Musikers zum höheren, allgemeineren des „Künstlers“ schlechthin
erheben. „Sieh dich tüchtig im Leben um, wie auch in anderen
Künsten und Wissenschaften,“ ruft er dem Musiker zu und bekennt:
„Es affiziert mich alles, was in der Welt vorgeht, Politik, Litteratur,
Menschen; über alles denke ich nach meiner Weise nach, was sich
dann durch die Musik Luft machen, einen Ausweg suchen will.
Deshalb sind viele meiner Kompositionen so schwer zu verstehen,
weil sie an entfernte Interessen anknüpfen, oft auch bedeutend, weil
mich alles Merkwürdige der Zeit ergreift und ich es dann musikalisch
aussprechen muß.“ Sogar was die Programmmusik anbelangt,
stimmt er mit den Neudeutschen in der Kardinalfrage: nach ihrer
Berechtigung überein. Er meint zwar einmal, es sei kein gutes
Zeichen für ein Tonstück, wenn es der Überschrift bedarf; es wäre
dann gewiß nicht der inneren Tiefe entquollen, sondern erst durch
irgend eine äußere Vermittlung angeregt; und vom Programm einer
Berliozschen Symphonie: „Ganz Deutschland schenkt es ihm; solche
Wegweiser haben immer etwas Unwürdiges, Charlatanmäßiges.“
Doch stehen damit zahlreiche andere Aussprüche Schumanns in
offenbarem Widerspruch, so daß es den Anschein hat, als sei er in
dieser Frage nicht ganz schlüssig geworden. Man höre: Warum
könnte nicht einen Beethoven inmitten seiner Phantasien der
Gedanke an Unsterblichkeit überfallen? Warum nicht das Andenken
eines großen gefallenen Helden ihn zu einem großen Werke
begeistern? Warum nicht einen anderen die Erinnerung an eine selig
verlebte Zeit? Italien, die Alpen, das Bild des Meeres, eine
Frühlingsdämmerung – hätte uns die Musik noch nichts von diesem
allen erzählt? Ja, selbst kleinere, speziellere Bilder können der Musik
einen so reizend festen Charakter verleihen, daß man überrascht
wird, wie sie solche Züge auszudrücken vermag. So erzählte mir ein
Komponist, daß sich ihm während des Niederschreibens das Bild
eines Schmetterlings, der auf einem Blatte im Bache
mitfortschwimmt, aufgedrungen: dies hätte dem kleinen Stücke die
Zartheit und Naivetät gegeben, wie es nur irgend das Bild in der
Wirklichkeit besitzen mag.... Man irrt sich gewiß, wenn man glaubt,
die Komponisten legen sich Feder und Papier in der elenden Absicht
zurecht, dies oder jenes auszudrücken. Doch schlage man zufällige
Einflüsse und Eindrücke von außen nicht zu gering an. Unbewußt
neben der musikalischen Phantasie wirkt oft eine Idee fort, neben
dem Ohr das Auge, und dieses, das immer thätige Organ, hält dann
mitten unter den Klängen und Tönen gewisse Umrisse fest, die sich
mit der vorrückenden Musik zu deutlichen Gestalten verdichten und
ausbilden können. Je mehr nun mit der Musik verwandte Elemente
die mit den Tönen erzeugten Gedanken oder Gebilde in sich tragen,
von je poetischerem oder plastischerem Ausdrucke wird die
Komposition sein und, je phantastischer oder schärfer der Musiker
aufpaßt, um so mehr wird sein Werk erheben oder ergreifen. Daß
sich Schumann zu seinen Symphonien durch äußere Momente
anregen ließ, wurde bereits erwähnt, aber selbst in seine
Kammermusikwerke spielen, wie es scheint, allerlei poetische Ideen
hinein. Umgekehrt, und das ist das Entscheidende, genießt
Schumann absolute Musik nicht rein musikalisch, sondern als Poet,
als Künstler: „Ich kann nicht unterlassen anzuführen, wie mir
einstens während eines Schubertschen Marsches der Freund, mit
dem ich spielte, auf meine Frage, ob er nicht ganz eigene Gestalten
vor sich sehe, zur Antwort gab: Wahrhaftig, ich befand mich in
Sevilla, aber vor mehr als hundert Jahren, mitten unter auf- und
abspazierenden Dons und Donnas mit Schleppkleid,
Schnabelschuhen u. s. w. Merkwürdigerweise waren wir in unseren
Visionen bis auf die Stadt einig.“ Verwahrte sich Schumann nicht
ausdrücklich dagegen, so dürfte man ihn (mit Liszt) nach den
Überschriften, die er seinen Tonstücken giebt, als Programmatiker
betrachten; doch erklärt er zu wiederholten Malen, sie seien erst
später entstanden und nur „feinere Fingerzeige für Vortrag und
Auffassung.“ Mag dies nun für all seine Werke zutreffen oder nicht –
jedenfalls ist Schumann, zwar kein Partner, wohl aber ein Vorläufer
der „neudeutschen Schule“, das notwendige Mittelglied zwischen ihr
und Beethoven.
Der Glaube, in welchem er komponierte, war der an die
unversiegbare Kraft des deutschen Kunstgeistes: „Mir ist oft, als
ständen wir an den Anfängen, als könnten wir noch Saiten
anschlagen, von denen man früher noch nicht gehört.“ Auch der
Hinweis auf die Zukunft kehrt bei Schumann häufig wieder, ja einmal
sagt er gar: „Eine Zeitschrift für z u k ü n f t i g e M u s i k fehlt noch,
als Redakteure wären freilich nur Männer, wie der ehemalige, blind
gewordene Kantor an der Thomasschule (Bach) und der taube, in
Wien ruhende Kapellmeister (Beethoven) passend.“ Aber gleich den
Neudeutschen verwirft er bloße Nachahmung dieser verehrten
Meister. Persönlichkeit scheint ihm nicht nur das „höchste Glück der
Erdenkinder“, sondern auch die vornehmste Eigenschaft jedes
wirklichen Künstlers. „Nenne mich beileibe nicht mehr Jean Paul den
Zweiten oder Beethoven den Zweiten,“ schreibt er Klara; „da könnte
ich dich eine Minute lang hassen. Ich will zehnmal weniger sein als
andere, aber nur für mich etwas.“ Noch muß die nationale Tendenz
Schumanns hervorgehoben werden, denn er ist eigentlich der erste
Komponist, der sein Deutschtum stärker betont, der die Anlehnung
an ausländische Muster geflissentlich vermeidet – ganz wie die
Neudeutschen. „Die höchsten Spitzen italienischer Kunst reichen
noch nicht an die Anfänge wahrhaft Deutscher,“ sagt er, ebenso
überzeugungsvoll, als ungerecht, und die Fremdwörter, welche sich
zu seiner Zeit auf Titeln und in den Vortragsbezeichnungen breit
machten, auszumerzen, war sein eifrigstes Bestreben. Als Brendel
bei der Übernahme der Zeitschrift die Absicht aussprach, sie in
Zukunft mit lateinischen Lettern drucken zu lassen, protestierte
Schumann sehr heftig dagegen und erklärte, daß er dann imstande
sei, sie nicht wieder anzusehen. – Endlich hat Schumann mit den
Neudeutschen die Doppelthätigkeit als Künstler und Schriftsteller
gemeinsam. Bei ihm ist diese Vereinigung zweier Talente, wie Liszt
vortrefflich ausführt, „noch durch das Verdienst erhöht, daß er nicht
unbewußt dem Drange der Verhältnisse nachgab und, nachdem er
diese erkannt, nicht erst die äußerste Notwendigkeit zum Handeln
abwartete. Nicht zufrieden, für seine Idee, die damals ebenso wenig
allgemein begriffen wurde, als sie es kaum in den nächsten
Dezennien sein dürfte, zu eifern, zu predigen, zu arbeiten, zu
kämpfen, setzte er für die erkannte Wahrheit Gut und Leben ein. Ein
richtiger Blick ist zu allen Zeiten sein Vorzug, seine Kritik liefert ein
schönes Beispiel eines prinzipiell strengen, faktisch wohlwollenden
Geistes, der anspruchsvoll für die Kunst, nachsichtig gegen die
Künstler ist, der gern aus seiner Heimat in den Wolkenschichten als
freundlicher Gast in bescheidenen Niederungen einkehrt, dem
Vielwollenden vieles verzeiht, redliche Gesinnung und beharrliches
Streben ermuntert, sich mutig und voll Zorn gegen reiche Geister
erhebt, die ihren Reichtum nicht zum alleinigen Nutzen der Kunst
erheben wollen, der selbst im Tadel sanft gegen Schwache ist und
im Lobe selbst gebieterisch gegen Erfolgreiche – ehrlich aber gegen
alle.“ Schumanns unmittelbaren Schülern kann freilich der Vorwurf
nicht erspart werden, die von ihm erlernte stilistische Gewandtheit
mehr dazu benützt zu haben, ihren eigenen Witz auf Kosten des
Kunstwerkes glänzen zu lassen, als es verständnisvoll und
wohlwollend zu beleuchten; allein wer wird den Meister für die
Verirrungen seiner Jünger zur Verantwortung ziehen? Er hatte
fürwahr eine höhere Meinung von dem ehrwürdigen Amte der Kritik:
„Thörichten, Eingebildeten schlägt sie die Waffen aus der Hand,“
sagt er, „Willige schont, bildet sie; Mutigen tritt sie rüstig freundlich
entgegen: vor Starken senkt sie die Degenspitze und salutiert.“
Ferner: „Wir gestehen, daß wir für die höchste Kritik halten, welche
durch sich selbst einen Eindruck hinterläßt, dem gleich, den das
anregende Original hervorbringt. In diesem Sinne könnte Jean Paul
zum Verständnis einer Beethovenschen Symphonie durch ein
poetisches Gegenstück mehr beitragen, als die Dutzend-Kunstrichter,
die Leitern an den Koloß legen, um ihn nach Ellen zu messen.“
Unter den zahlreichen, zum größeren Teile in die „Gesammelten
Schriften“ aufgenommenen, kritischen Artikeln Schumanns haben
namentlich zwei bedeutenden Einfluß auf den musikalischen
Geschmack in Deutschland gehabt; der eine über Berlioz, worin er
den Wert dieses zumeist als Halbverrückten behandelten Künstlers
unbefangen aufdeckte, ist besonders in methodologischer Hinsicht
von außerordentlicher Bedeutung und man hört es noch heute von
Musikern, daß sie daraus mehr wirkliches und nützliches Wissen
schöpfen, als aus den umfangreichsten Lehrbüchern musikalischer
Gelehrsamkeit. In späterer Zeit wurde Schumann allerdings
empfindlicher gegen das Bizarre, Extravagante in Berlioz’ Musik,
doch blieb es immer sein Stolz, „mit der kritischen Weisheit nicht
zehn Jahre hinterdrein gefahren zu sein, sondern im voraus gesagt
zu haben, daß Genie in dem Franzosen gesteckt.“ Griepenkerls, von
wahrem Fanatismus für Berlioz erfüllte Broschüre: „Ritter Berlioz in
Braunschweig“, besprach er, trotzdem gegen ihn selbst polemisiert
wurde, überaus freundlich und schloß mit den Worten: „Möge die
kleine Schrift gelesen werden; sie enthält manch blitzenden
Gedanken und konnte auf so vieles Unwürdige, Ignorantenhafte,
was neuerdings über Berlioz geschrieben worden ist, gar nicht
ausbleiben.“ Dem merkwürdigen Künstler aber schlage, was um ihn
vorgeht, alles zum Besten aus, wie Goethe sagt im Tasso:

„Ruhm und Tadel


Muß er ertragen lernen, sich und andere
Wird er gezwungen recht zu kennen.“

Ganz anders gestaltete sich hingegen das Verhältnis Schumanns


zu einem anderen, nicht minder berühmten Pariser Musiker:
Meyerbeer. Nicht als hätte er bei diesem das große Talent verkannt,
aber sein unkünstlerisches Streben nach Effekt, die innere Hohlheit
seiner Werke stieß den idealistischen, ehrlichen Deutschen ab.
Bedenkt man das ungeheuere, durch die bezahlte Tagespresse
erzeugte Ansehen, welches Meyerbeer genoß und seinen
verderblichen Einfluß auf den öffentlichen Kunstgeschmack (es gab
Leute, die ihn Mozart an die Seite stellten!), so muß Schumanns
Artikel gegen „Die Hugenotten“ als eine nationale und künstlerische
That bezeichnet werden. „Nie unterschrieb ich etwas mit so fester
Überzeugung als heute,“ sagte er am Schlusse des Aufsatzes und
auch späterhin ist er von seiner Meinung nicht abgekommen. Die
erste, nicht günstige Kritik über „Tannhäuser“ scheint, wenigstens
indirekt, Meyerbeer verschuldet zu haben, insofern, als Wagners
offenkundige Beziehungen zu ihm, Schumanns Argwohn erweckten.
Mit welchem Freimut er sein Urteil zurücknahm, sobald er sich von
der Unbilligkeit des Tadels überzeugt hatte, hörten wir schon.
Sowohl durch seinen gewählten, reinen und graziösen Stil, als
durch die treffende, harmonische Anwendung seiner Bilder gehört
Schumann unbedingt zu den hervorragendsten Schriftstellern der
dreißiger Jahre. Bis dahin hatte man in Deutschland selten
Wissenschaftliches, Vernünftiges und Richtiges über Musik in einem
blühenderen Stil als dem bei Lehrbüchern der Arithmetik
gebräuchlichen vortragen gehört. Schumann vermied diese
Trockenheit der Fachmenschen, die in so wenig anziehender Weise
und immer nur vom technischen Standpunkt aus über Musik derartig
gesprochen hatten, daß man leicht von ihr selbst hätte abgeschreckt
werden können. Er wußte die Laien zu interessieren, denen bisher
die musikalischen Zeitschriften meistens für zu viel Langeweile zu
wenig Belehrung geboten hatten.
Wie Jean Paul die zwei kontrastierenden Seiten seines Wesens in
„Walt und Wult“ verkörpert hat, so Schumann bekanntlich das
Seinige in „Florestan und Eusebius“. In wie feiner Weise macht er
sich nun diese Fiktion für seine Kritik zu nutze! Er stellt Florestan als
Repräsentanten der abstrakten Kunst hin und Eusebius als das
liebevoll auffassende Künstlergemüt und vermochte, indem er bald
diesem, bald jenem das Wort erteilte, Doppelkritiken nach den bei
der Beurteilung notwendigen zwei Gesichtspunkten zu geben. Noch
mehr. Er gewinnt durch diesen Davidsbund auch eine prächtig
sinnige Einkleidung für seine kritischen Aufsätze. Als er neu
erschienene Tanzlitteratur zu besprechen hat, bedient er sich
beispielsweise der Fiktion eines Maskenballes, welchen die Bündler
veranstalten und wobei die zu besprechenden Stücke aufgespielt
werden. Oder er berichtet über die Gewandhauskonzerte des
Oktober 1835 in Form von Briefen, die Eusebius an Chiara (Klara)
nach Italien schreibt. Auch recht lustige Episoden weiß er
gelegentlich zu erfinden, so wenn bei einer von Franzilla Pixis
gesungenen Donizettischen Arie „etwas sehr Nasses“ auf der Backe
selbst des gestrengen Kunstrichters Florestan sichtbar wird. Daheim
läuft er dann wütend auf und ab, vor sich hinsprechend: „O ewige
Schande! O Florestan, bist du bei Sinnen, hast du deshalb den
Marpurg studiert, deshalb das wohltemperierte Klavier seciert,
kannst du deshalb den Bach und Beethoven auswendig, um bei einer
miserablen Arie von Donizetti nach vielen Jahren so etwas wie
weinen? Hätte ich die Thränen, zu nichts wollt’ ich sie zerkratzen mit
der Faust!“ Darauf setzt er sich unter schrecklichem Lamentieren ans
Klavier und spielt jene Arie so wirtshausmäßig, lächerlich und
fratzenhaft, daß er endlich beruhigt zu sich sagen kann: „Wahrhaftig,
nur der Ton ihrer Stimme war’s, der mir so ins Herz ging!“
Die Idee des Davidsbundes wurde von den Freunden Schumanns
ehedem viel bewundert, ja, man dichtete allerhand Tiefsinnigkeiten
in sie hinein. „Vor allem mahne uns dieser Name,“ kommentiert
Wedel (Zuccamaglio), „an den ewigen, heiligen Bund der Dicht- und
Tonkunst. Der Name des gekrönten Sängers, der nur in
gottbegeisterten Liedern sich verewigte, deute uns immer das
Verhältnis zwischen Kunst und Religion, erinnere uns, daß die
Sprache einer Geisterwelt nicht herabgewürdigt werden darf, dem
Niedrigen im Menschen zu schmeicheln und das Verwerfliche zu
übergolden und zu verbrämen.“ Solche Gedanken lagen wohl
Schumann ursprünglich fern. Er war, obgleich gut christlich gesinnt,
doch eigentlich keine tief religiöse Natur. Das erklärt sich aus der
Geistesrichtung der Periode, in welcher er lebte. Auch auf
Unterschiede zwischen den christlichen Konfessionen zu halten, fiel
ihm nicht ein, was schon durch den Umstand bewiesen wird, daß er,
der doch Protestant war, 1852 für die katholische Kirche in
Düsseldorf eine Messe und ein Requiem geschrieben hat.
Schließlich soll noch angeführt sein, daß Schumann in den Jahren
1837–39 darauf sann, dem nur in seiner Phantasie existierenden
Bunde wirkliches Leben zu erteilen, einen großen, deutschen
Künstlerbund zu begründen. Auch einen anderen derartigen Plan
hegte er damals: die Errichtung einer Agentur zur Herausgabe von
Musikwerken, „welche den Zweck hätten, alle Vorteile, die bis jetzt
den Verlegern in so reichlichem Maße zufließen, den Komponisten
zuzuwenden.“ Man sieht, auch an praktischen Vorschlägen ließ es
dieser merkwürdige Geist nicht fehlen, so wenig er selbst darnach
geschaffen war, sie in Wirklichkeit durchführen zu können. Immerhin
muß man seinem richtigen, idealen Wollen die gebührende
Anerkennung entrichten.
Robert Schumanns Leben und Schaffen, stellt es nicht treu die
Eigenart seiner Zeit, des zweiten Drittels unseres Jahrhunderts dar?
Es ist das Zeitalter erneuter Regsamkeit auf dem Gebiete der Kunst,
der Wissenschaft und des öffentlichen Lebens. Überall zeigen sich
Keime und Ansätze, viel Enthusiasmus und Fleiß, aber geringe
Thatkraft. Es ist das Zeitalter politischer und geistiger Zersplitterung;
aber auch der Vorbereitung auf die später gewonnene Einheit; und
Schumann strebte weiter: nach Erfüllung wenigstens in der Kunst
und setzte sein edles Leben vergebens daran. Erst das künftige
Geschlecht sollte ernten, wo jenes gesäet hatte. Dank darum den
emsigen Säeleuten und nicht zum mindesten fürwahr unserem
Meister. Seine Werke aber mögen im deutschen Volke immer
unvergeßlich bleiben!

[11] R. Hamerling, Prosa. Bd. 1: Meine Lieblinge (Hamb. 1891).


Register der Werke Robert
Schumanns.
1. Numerierte Werke.
1. Thème sur le nom Abegg.[12]
2. Papillons.
3. Studien für Pianoforte nach Kaprizen von Paganini.
4. Intermezzi.
5. Impromptüs über ein Thema von Klara Wieck.
6. Davidsbündlertänze. (Gewidmet Walther von Goethe.)
7. Toccata.
8. Allegro.
9. Carnaval. Scènes mignonnes.
10. VI Etudes pour le pianoforte d’après des caprices de
Paganini.
11. Grande Sonate (dediée à Mlle. Clara Wieck).
12. Phantasiestücke.
13. Etudes symphoniques (dediée Sterndale Bennett).
14. Concert sans orchestre (dediée J. Moscheles).
15. Kinderscenen.
16. Kreisleriana (gewidmet Friedrich Chopin).
17. Phantasie für Pianoforte (Franz Liszt).
18. Arabeske.
19. Blumenstück.
20. Humoreske.
21. Novelletten (Adolph Henselt).
22. Sonate Nr. II.
23. Nachtstücke.
24. Liederkreis von Heine (Mlle. Pauline Garcia).[13]
25. Myrthen. Liederkreis. („Seiner geliebten Braut“.)
26. Faschingsschwank aus Wien.
27. Lieder und Gesänge (Heft I).
28. Drei Romanzen für Pianoforte.
29. Drei Gedichte von Geibel für mehrstimmigen Gesang u.
Pianoforte.
30. Drei Gedichte von E. Geibel.
31. Drei Gesänge von A. Chamisso.
32. Scherzo, Gigue, Romanze und Fughette für Pianoforte,
komponiert 1838–39.
33. Sechs Lieder für vierstimmigen Männergesang.
34. Vier Duette für Sopran und Tenor mit Pianoforte.
35. Zwölf Gedichte von J. Kerner.
36. Sechs Gedichte von Reinick.
37. Zwölf Gedichte aus E. Rückerts Liebesfrühling.
38. Erste Symphonie (B-Dur).
39. Liederkreis von J. v. Eichendorff.
40. Fünf Lieder (H. C. Andersen).
41. Drei Quartette (E. Mendelssohn).
42. Frauenliebe und Leben.
43. Drei zweistimmige Lieder mit Pianoforte.
44. Quintett für Pianoforte, Violinen, Viola, Cello (Klara).
45. Romanzen und Balladen (Heft I).
46. Andante und Variationen für zwei Pianoforte.
47. Quartett für Pianoforte, Violine, Viola und Cello.
48. Dichterliebe aus dem Buche der Lieder von Heine
(Schröder-Devrient).
49. Romanzen und Balladen (Heft II).
50. Das Paradies und die Peri von Th. Moore für Soli, Chor,
Orchester.
51. Lieder und Gesänge (Heft II), komponiert 1842.
52. Ouvertüre, Scherzo und Finale für Orchester.
53. Romanzen und Balladen (Heft III).
54. Konzert für das Pianoforte mit Orchester.
55. Fünf Lieder von R. Burns für gemischten Chor.
56. Studien für den Pedalflügel.
57. Belsazar, Ballade von Heine.
58. Skizzen für den Pedalflügel.
59. Vier Gesänge für gemischten Chor.
60. Sechs Fugen über den Namen Bach für Orgel.
61. Zweite Symphonie (C-Dur).
62. Drei Gesänge für Männerchor.
63. Trio für Pianoforte, Violine und Cello (D-moll).
64. Romanzen und Balladen (Heft IV).
65. Ritornelle von F. Rückert für mehrstimmigen
Männergesang.
66. Bilder aus Osten. Impromptüs für Pianoforte zu vier
Händen.
67. Romanzen und Balladen für Chor (I. Heft).
68. Album für die Jugend.
69. Romanzen für Frauenstimmen (Heft I).
70. Adagio und Allegro für Pianoforte und Horn.
71. Adventlied von F. Rückert für Sopran, Chor, Orchester.
72. Vier Fugen für Pianoforte (Carl Reinecke).
73. Phantasiestücke für Pianoforte und Klarinette.
74. Spanisches Liederspiel f. eine u. mehrere Stimmen m.
Pianoforte.
75. Romanzen und Balladen für Chor (II. Heft).
76. Vier Märsche für Pianoforte.
77. Lieder und Gesänge (Heft III).
78. Vier Duette für Sopran und Tenor mit Pianoforte.
79. Liederalbum für die Jugend.
80. Zweites Trio (F-Dur).
81. Genoveva (Oper in vier Akten nach Tieck und F. Hebbel).
82. Waldscenen (Neun Klavierstücke).
83. Drei Gesänge.
84. Beim Abschied zu singen. (Lied f. Chor m.
Instrumentalbegleitung).
85. Zwölf vierhändige Klavierstücke für kleine und große
Kinder.
86. Konzertstück für vier Hörner und großes Orchester.
87. Der Handschuh (Ballade von Schiller).
88. Phantasiestücke für Pianoforte, Violine, Cello.
89. Sechs Gesänge von W. von der Neun (Jenny Lind).
90. Sechs Gedichte von Lenau und Requiem.
91. Romanzen für Frauenstimmen (Heft II).
92. Introduktion und Allegro. (Konzertstück für Pianoforte mit
Orchester).
93. Motette „Verzweifle nicht“ von Rückert für doppelten
Männerchor und Orgel.
94. Drei Romanzen für Hoboe und Pianoforte.
95. Drei Gesänge von Byron mit Harfe oder Pianoforte.
96. Lieder und Gesänge (Heft IV).
97. Dritte Symphonie (Es-Dur).
98. a) Lieder aus „Wilhelm Meister“; b) Requiem für Mignon
für Soli, Chor, Orchester.
99. Bunte Blätter (14 Stücke für Pianoforte). ††[14].
100. Ouvertüre zur Braut von Messina für Orchester, kompon.
1851.
101. Minnespiel aus Rückerts Liebesfrühling für eine und
mehrere Stimmen mit Pianoforte.
102. Fünf Stücke im Volkston für Cello und Pianoforte.
103. Mädchenlieder von E. Kulmann f. zwei Soprane m.
Pianoforte.
104. Sieben Lieder von E. Kulmann.
105. Sonate (A-moll) für Pianoforte und Violine.
106. Schön Hedwig (Ballade von Hebbel f. Deklamation u.
Pianoforte).
107. Sechs Gesänge.
108. Nachtlied von Hebbel f. Chor u. Orchester. (Dem Dichter
gewidmet.)
109. Ballscenen für Pianoforte zu vier Händen.
110. Drittes Trio (G-moll). Niels Gade zugeeignet.
111. Drei Phantasiestücke für Pianoforte.
112. Der Rose Pilgerfahrt (Märchen nach Moritz Horn für Soli,
Chor, Orchester).
113. Märchenbilder (vier Stücke für Pianoforte und Viola).
114. Drei Lieder für drei Frauenstimmen m. Pianoforte,
kompon. 1853.
115. Manfred (dramatisches Gedicht von Lord Byron).
116. Der Königssohn (Ballade von Uhland, für Soli, Chor,
Orchester).
117. Vier Husarenlieder von Lenau.
118. Drei Klaviersonaten für die Jugend, komponiert 1853.
119. Drei Gedichte von S. Pfarrius.
120. Vierte Symphonie (D-moll).
121. Sonate (D-moll) für Violine und Pianoforte. (Ferd. David.)
122. Ballade vom Haideknaben von Hebbel. Die Flüchtlinge
von Shelley, für Deklamation und Pianoforte.
123. Festouvertüre mit Gesang über das Rheinweinlied.
124. Albumblätter (20 Klavierstücke). ††
125. Fünf heitere Gesänge.
126. Sieben Klavierstücke in Fughettenform, komponiert 1853.
127. Lieder und Gesänge.
128. Ouvertüre zu Shakespeares Julius Cäsar.
129. Konzert (A-moll) für Cello mit Orchester.
130. Kinderball (sechs leichte Tanzstücke zu vier Händen, für
Pianoforte, komponiert 1853).
131. Phantasie für Violine mit Orchester oder Pianoforte (J.
Joachim).
132. Märchenerzählungen (vier Stücke für Klarinette, Viola u.
Pianoforte, komponiert 1853).
133. Gesänge der Frühe (fünf Stücke für Pianoforte, kompon.
1853).
134. Konzert-Allegro mit Introduktion für Pianoforte mit
Orchester, komponiert 1853, (J. Brahms).
135. Gedichte der Königin Maria Stuart.
136. Ouvertüre zu Hermann und Dorothea. (Seiner lieben
Klara.)[15]
137. Jagdlieder aus Laubes Jagdbrevier f. vierstimmigen
Männerchor.
138. Spanische Liebeslieder für eine und mehrere Stimmen mit
Pianoforte zu vier Händen.
139. Des Sängers Fluch (Ballade nach Uhland, bearbeitet von
R. Pohl für Soli, Chor, Orchester).
140. Vom Pagen und der Königstochter (Balladen von Geibel, f.
Soli, Chor, Orchester).
141. Vier doppelchörige Gesänge.
142. Vier Gesänge, komponiert 1842.
143. Das Glück von Edenhall (Ballade nach Uhland, bearbeitet
von Hasenclever, für Soli, Chor, Orchester).
144. Neujahrslied von Rückert, für Chor und Orchester.
145. Romanzen und Balladen für Chorgesang (III. Heft).
146. Romanzen und Balladen für Chorgesang (IV. Heft).
147. Messe für vierstimmigen Chor mit Orchester.
148. Requiem für Chor und Orchester.
2. Unnumerierte Werke.
I. Scenen aus Goethes Faust, für Soli, Chor, Orchester.
II. Der deutsche Rhein von N. Becker, für Soli, Chor,
Orchester, komponiert 1840.
III. Soldatenlied von Hoffmann von Fallersleben. †
IV. Scherzo und Presto für Pianoforte. †
V. Kanon über „An Alexis“ für Pianoforte. †

[12] op. 1–23, ausschließlich zweihändige Klavierwerke.


[13] Wo nichts weiteres angegeben ist, sind die Lieder einstimmig
mit Begleitung des Pianoforte.
[14] Das Zeichen † bedeutet, daß die Entstehungszeit dieser, im
Texte nicht weiter berührten Komposition nicht bekannt ist, zwei
† † , daß das Werk aus mehreren zu verschiedenen Zeiten
entstandenen Stücken besteht.
[15] Die Werke von op. 136 an sind erst nach Schumanns Tode
erschienen.

Ende.
Musiker-Biographien
aus Reclams Universal-
Bibliothek

Auber. Von Ad. Kohut. Bd. 17. Nr. 3389


J. S. Bach. Von Rich. Batka. Bd. 15.
Nr. 3070

Beethoven. Von L. Nohl. Bd. 2. Nr.


1181/81 a

Bellini. Von Paul Voß. Bd. 23. Nr. 4238


Berlioz. Von Br. Schrader. Bd. 28. Nr. 5043

Bizet. Von Paul Voß. Bd. 22. Nr. 3925


Brahms. Von Richard von Perger.
Bd. 27. Nr. 5006

Cherubini. Von M. E. Wittmann.


Bd. 18. Nr. 3434

Chopin. Von E. Redenbacher. Bd.


30. Nr. 5327

Cornelius. Von Edgar Istel. Bd. 25. Nr.


4766
Welcome to Our Bookstore - The Ultimate Destination for Book Lovers
Are you passionate about books and eager to explore new worlds of
knowledge? At our website, we offer a vast collection of books that
cater to every interest and age group. From classic literature to
specialized publications, self-help books, and children’s stories, we
have it all! Each book is a gateway to new adventures, helping you
expand your knowledge and nourish your soul
Experience Convenient and Enjoyable Book Shopping Our website is more
than just an online bookstore—it’s a bridge connecting readers to the
timeless values of culture and wisdom. With a sleek and user-friendly
interface and a smart search system, you can find your favorite books
quickly and easily. Enjoy special promotions, fast home delivery, and
a seamless shopping experience that saves you time and enhances your
love for reading.
Let us accompany you on the journey of exploring knowledge and
personal growth!

ebookball.com

You might also like