
Synthesis Lectures on Computer Science

Max Cohen
Calin Belta

Adaptive and Learning-Based Control of Safety-Critical Systems
Synthesis Lectures on Computer Science
The series publishes short books on general computer science topics that will appeal to
advanced students, researchers, and practitioners in a variety of areas within computer
science.
Max Cohen · Calin Belta

Adaptive and Learning-Based Control of Safety-Critical Systems
Max Cohen
Department of Mechanical Engineering
Boston University
Boston, MA, USA

Calin Belta
Department of Mechanical Engineering
Boston University
Boston, MA, USA

ISSN 1932-1228 ISSN 1932-1686 (electronic)


Synthesis Lectures on Computer Science
ISBN 978-3-031-29309-2 ISBN 978-3-031-29310-8 (eBook)
https://doi.org/10.1007/978-3-031-29310-8

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature
Switzerland AG 2023
This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole
or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation,
broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage
and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or
hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does
not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective
laws and regulations and therefore free for general use.
The publisher, the authors, and the editors are safe to assume that the advice and information in this book are
believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give
a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that
may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and
institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
To my mom
—Max Cohen

To Ana and Stefan


—Calin Belta
Preface

Motivation and Objectives

The rising levels of autonomy exhibited by complex cyber-physical systems have brought
questions related to safety and adaptation to the forefront of the minds of controls and
robotics engineers. Often, such autonomous systems are deemed to be safety-critical
in the sense that failures during operation could significantly harm the system itself,
other autonomous systems, or, in the worst-case, humans interacting with such a system.
Complicating the design of control and decision-making algorithms for safety-critical
systems is that they must cope with large amounts of uncertainty as they are increasingly
deployed autonomously in real-world environments. For example, a drone required to
deliver medicine to remote regions may encounter unknown wind gusts along its path; due
to unforeseen weather conditions, an autonomous vehicle may drive on surfaces where
the friction forces between the tires and ground become uncertain; a robotic manipulator
operating in close proximity with humans may need to transport various loads with uncer-
tain masses while avoiding collisions with humans moving in an unpredictable manner.
In all of these scenarios, the autonomous systems’ control/decision-making policies must
be able to adapt to uncertainties while adhering to safety-critical constraints.
Recent advances in artificial intelligence (AI) and machine learning (ML) have
facilitated the design of control and decision-making policies directly from data. For
example, advancements in reinforcement learning (RL) have enabled robots to learn
high-performance control policies purely from trial-and-error interaction with their envi-
ronment; advances in deep neural network architectures have allowed for learning control
policies directly from raw sensory information; advancements in Bayesian inference have
allowed for constructing non-parametric models of complicated dynamical systems with
probabilistic accuracy guarantees. The fact that these ML techniques can learn control
policies and dynamic models directly from data makes them extremely attractive for use in
autonomous systems that must operate in the face of uncertainties. Despite these promises,
the performance of such ML approaches is tightly coupled to the data used for training,
and the resulting models may act unexpectedly when exposed to data outside their training distribution.
That is, although such ML models are extremely expressive at describing the complex

input-output relation of the training data, they are not necessarily adaptive in that input
data outside of the range of the training dataset may produce unexpected outputs. This
phenomenon makes it challenging to directly deploy such learning-based controllers on
safety-critical systems that will inevitably encounter unexpected scenarios that cannot be
accounted for using pre-existing data.
The main objective of this book is to present a unified framework for the design of con-
trollers that learn from data online with formal guarantees of correctness. We are primarily
concerned with ensuring that such learning-based controllers provide safety guarantees,
a property formalized using the framework of set invariance. Our focus in this book is
on online learning-based control, or adaptive control, in which learning and control occur
simultaneously in the feedback loop. Rather than using a controller trained on an a priori
dataset collected offline that is then statically deployed on a system, we are interested in
using real-time data to continuously update the control policy online and cope with uncer-
tainties that are challenging to characterize until deployment. In this regard, most of the
controllers developed in this book are dynamic feedback controllers in that they depend
on the states of an auxiliary dynamical system representing an adaptation algorithm that
evolves based upon data observed in real-time. This idea is not new—it has been the
cornerstone of the field of adaptive control for decades. From this perspective, the main
objective of this book is to extend techniques from the field of adaptive control, which
has primarily been concerned with stabilization and tracking problems, to consider more
complex control specifications, such as safety, that are becoming increasingly relevant in
the realm of robotic and autonomous systems.

Intended Audience

This book is intended to provide an introduction to learning-based control of safety-critical
systems for a wide range of scientists, engineers, and researchers. We have attempted to
write this book in a self-contained manner—a solid background in vector calculus, linear
algebra, and differential equations should be sufficient to grasp most of the mathematical
concepts introduced herein. Prior knowledge of nonlinear systems theory would be useful
(e.g., an introductory course that covers the basics of Lyapunov stability) as it serves as the
starting point for most of the developments in this book, but is not strictly necessary as we
briefly cover the concepts used throughout this book in Chap. 2. Researchers from control
theory are shown how established control-theoretic tools, such as Lyapunov functions, can
be suitably transposed to address problems richer than stabilization and tracking.
They are also exposed to ML and its integration with control-theoretic tools with the goal
of dealing with uncertainty. ML researchers are shown how control-theoretic and formal
methods tools can be leveraged to provide guarantees of correctness for learning-based
control approaches, such as reinforcement learning.

Book Outline and Usage

This book is organized into nine chapters. In most chapters, we begin with a short intro-
duction that provides motivation and an overview of the methods introduced therein and
then immediately move into the technical content. Discussions on related works are post-
poned until the Notes section, which can be found at the end of each chapter. We have
aimed to prove most of the results we state; however, in an effort to keep this book rel-
atively self-contained, we omit proofs that require the reader to consult outside sources,
and instead provide references to where such proofs may be found in the Notes section of
each chapter. For increased readability, we do not cite references in the technical parts—
the works on which the material is based are cited in the Notes sections at the end of each
chapter. The contents of each chapter are summarized as follows:

• In Chap. 1 we provide an informal overview of the topics discussed in this book.


• In Chap. 2 we review the fundamentals of Lyapunov stability theory and how such
ideas can be directly used for control synthesis using the notion of a control Lyapunov
function (CLF). Although these concepts are likely familiar to many readers, here we
recast these ideas in the modern context of optimization-based control in which control
inputs are computed as the solution to convex optimization problems that guarantee
stability of the closed-loop system by construction.
• In Chap. 3 we provide a concise introduction to the field of safety-critical control
in which the central object of study is a control barrier function (CBF), an exten-
sion of CLFs from stability to safety problems. Before introducing CBFs, we first
review the idea of formalizing the concept of safety in dynamical systems using the
notion of set invariance. After covering CBFs, we present an extension of the CBF
methodology using the notion of a high order CBF (HOCBF), which provides a sys-
tematic framework for dynamically extending a CBF candidate to construct a valid
safety certificate.
• In Chap. 4, we provide a short introduction to adaptive control of nonlinear systems—
a field focused on simultaneous learning and control of uncertain nonlinear systems.
Our discussion on nonlinear adaptive control centers around the notion of an adaptive
CLF (aCLF), which extends the CLF paradigm from Chap. 2 to adaptive control sys-
tems. Here, we also introduce the more modern adaptive control concept of concurrent
learning—a data-driven technique that can be used to strengthen traditional adaptive
controllers.
• In Chap. 5 we unite the safety-critical control framework of Chap. 3 with the nonlinear
adaptive control framework proposed in Chap. 4 via the notion of an adaptive CBF
(aCBF). In particular, we demonstrate how the concurrent learning adaptive control
techniques from the previous chapter can be used to enforce safety under the worst-
case bounds on the model uncertainty while simultaneously reducing such bounds to
allow for less conservative behavior as more data about the system is collected.

• In Chap. 6 we extend the adaptive control methods from Chaps. 4 and 5 to a larger
set of parameter estimation algorithms using the notions of input-to-state stability and
input-to-state safety. The adaptive control techniques in this chapter are referred to
as “modular” as they allow interchangeability of the parameter estimation algorithm
without affecting the stability and safety guarantees of the adaptive controller.
• In Chap. 7 we extend the adaptive control techniques from previous chapters to handle
a more general class of uncertainties. In particular, whereas the methods from Chaps. 4
to 6 handled systems in which the uncertainty enters the dynamics in an additive fash-
ion, in this chapter we consider systems in which the uncertainty enters the dynamics
in a multiplicative fashion in the sense that the dynamics are bilinear in the control
and uncertainty.
• In Chap. 8, we move from an adaptive control framework to a reinforcement learning
(RL) framework in which the goal is to control a system to optimize a cost function
while satisfying safety-critical constraints. We illustrate how the adaptive control tech-
niques from earlier chapters can be extended to this domain by developing safe online
model-based RL (MBRL) algorithms that allow for safely learning the system dynam-
ics and solution to an optimal control problem online in real-time, rather than in an
episodic learning framework as is typical in RL approaches.
• In Chap. 9, we broaden the class of control specifications under consideration. In par-
ticular, we shift our objective from designing controllers that guarantee stability and
safety to controllers that guarantee the satisfaction of more general linear temporal
logic (LTL) specifications. Here, we show how the problem of LTL controller synthe-
sis can be broken down into a sequence of reach-avoid problems that can be solved
using the Lyapunov and barrier functions introduced in earlier chapters. As a specific
example of this idea, we apply the MBRL framework from Chap. 8 to solve such a
sequence of reach-avoid problems, and ultimately, to design a controller that enforces
satisfaction of an LTL specification.

This book can be used and read in a few different ways. Although it was written primarily
to present research results by the authors and others in a unified framework, it could also
serve as the main reference for teaching a course for various audiences. For an audience
with a strong background in control theory, the material in this book could be used to
teach a short course that provides an overview of safe learning-based control from an
adaptive control perspective. For an audience with less exposure to advanced topics in
control theory (e.g., first-year graduate students), this book could be used to teach a full
course on the topic of safe learning-based control by covering the material in this book
along with more extensive coverage of background material in nonlinear systems, adaptive
control, machine learning, and formal methods.

Boston, MA, USA
Max Cohen
Calin Belta

Acknowledgements We gratefully acknowledge our collaborators who contributed to the results
in this book: we thank Roberto Tron who was instrumental in developing the approach detailed in
Chap. 7, and we thank Zachary Serlin and Kevin Leahy who contributed to the results reported in
Chap. 9. The first author gratefully acknowledges support from the National Science Foundation
Graduate Research Fellowship Program, which has funded much of the research presented in this
book. The second author acknowledges partial support from the National Science Foundation.
Contents

1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
2 Stabilizing Control Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2.1 Lyapunov Stability Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2.1.1 Stability Notions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
2.1.2 Lyapunov Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.2 Control Lyapunov Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.3 Designing Control Lyapunov Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
2.3.1 Feedback Linearization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
2.3.2 Backstepping . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
2.3.3 Design Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
2.4 Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
3 Safety-Critical Control . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
3.1 Safety and Set Invariance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
3.2 Control Barrier Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
3.3 High Order Control Barrier Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
3.4 Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
4 Adaptive Control Lyapunov Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
4.1 Adaptive Nonlinear Control . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
4.2 Concurrent Learning Adaptive Control . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
4.2.1 Parameter Identification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
4.2.2 Concurrent Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
4.3 Exponentially Stabilizing Adaptive CLFs . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69


4.4 Numerical Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72


4.5 Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
5 Adaptive Safety-Critical Control . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
5.1 Adaptive Control Barrier Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
5.2 Robust Adaptive Control Barrier Functions . . . . . . . . . . . . . . . . . . . . . . . . . . 80
5.3 High Order Robust Adaptive Control Barrier Functions . . . . . . . . . . . . . . . 86
5.4 Numerical Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
5.5 Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
6 A Modular Approach to Adaptive Safety-Critical Control . . . . . . . . . . . . . . . . 95
6.1 Input-to-State Stability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
6.2 Modular Adaptive Stabilization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
6.3 Input-to-State Safety . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
6.4 Numerical Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
6.5 Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
7 Robust Safety-Critical Control for Systems with Actuation Uncertainty . . . 117
7.1 A Duality-Based Approach to Robust Safety-Critical Control . . . . . . . . . . 118
7.1.1 Robust Control Barrier Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119
7.1.2 Robust Control Lyapunov Functions . . . . . . . . . . . . . . . . . . . . . . . . . . 121
7.2 Online Learning for Uncertainty Reduction . . . . . . . . . . . . . . . . . . . . . . . . . . 123
7.3 Numerical Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126
7.4 Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131
8 Safe Exploration in Model-Based Reinforcement Learning . . . . . . . . . . . . . . . 133
8.1 From Optimal Control to Reinforcement Learning . . . . . . . . . . . . . . . . . . . . 135
8.2 Value Function Approximation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137
8.3 Online Model-Based Reinforcement Learning . . . . . . . . . . . . . . . . . . . . . . . . 140
8.3.1 System Identification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 140
8.3.2 Safe Exploration via Simulation of Experience . . . . . . . . . . . . . . . . 141
8.4 Numerical Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 155
8.5 Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 160
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 162
9 Temporal Logic Guided Safe Model-Based Reinforcement Learning . . . . . . 165
9.1 Temporal Logics and Automata . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 166
9.2 Simultaneous Stabilization and Safety . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 169

9.3 A Hybrid Systems Approach to LTL Control Synthesis . . . . . . . . . . . . . . . 173


9.4 Temporal Logic Guided Reinforcement Learning . . . . . . . . . . . . . . . . . . . . . 178
9.5 Numerical Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 182
9.6 Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 188
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 190

Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 193
Acronyms

aCBF Adaptive control barrier function


aCLF Adaptive control Lyapunov function
BE Bellman error
CBF Control barrier function
CLF Control Lyapunov function
DBA Deterministic Büchi automaton
DNN Deep neural network
DTA Distance to acceptance
eISS Exponentially input-to-state stable
eISS-CLF Exponential input-to-state stabilizing control Lyapunov function
ES-aCLF Exponentially stabilizing adaptive control Lyapunov function
ES-CLF Exponentially stabilizing control Lyapunov function
FE Finite excitation
HJB Hamilton-Jacobi-Bellman equation
HOCBF High order control barrier function
HO-RaCBF High order robust adaptive control barrier function
ISS Input-to-state stability
ISSf Input-to-state safety
ISSf-CBF Input-to-state safe control barrier function
ISSf-HOCBF Input-to-state safe high order control barrier function
KKT Karush-Kuhn-Tucker
LP Linear program
LQR Linear quadratic regulator
LTL Linear temporal logic
MBRL Model-based reinforcement learning
PE Persistence of excitation
QP Quadratic program
RaCBF Robust adaptive control barrier function
RCBF Robust control barrier function
RCLF Robust control Lyapunov function


RL Reinforcement learning
RLS Recursive least squares
ROI Region of interest
SMID Set membership identification
Notation

N Set of natural numbers


Z Set of integers
R Set of real numbers
R≥a Set of real numbers greater than or equal to a ∈ R
R>a Set of real numbers strictly greater than a ∈ R
Rn The n-dimensional Euclidean vector space
xi The ith component of a vector x ∈ Rn
x⊤ Transpose of a vector x ∈ Rn
x⊤ y Inner product ∑ᵢ₌₁ⁿ xᵢ yᵢ between two vectors x, y ∈ Rn
‖x‖ Euclidean norm of a vector x ∈ Rn
Rm×n Set of m × n matrices with real entries
Aij The (i, j) entry of a matrix A ∈ Rm×n
A⊤ Transpose of a matrix A ∈ Rm×n
‖A‖ Induced norm of a matrix A ∈ Rm×n
I n×n The n × n identity matrix
λmin (A) Minimum eigenvalue of a matrix A ∈ Rn×n
λmax (A) Maximum eigenvalue of a matrix A ∈ Rn×n
∅ Empty set
2^C Power set of set C
C1 × C2 Cartesian product of sets C1 and C2
∂C Boundary of a set C
Int(C) Interior of a set C
C̄ Closure of a set C
C1 \ C2 Set difference of sets C1 and C2
Br (x) Open ball of radius r ∈ R>0 centered at x ∈ Rn
B̄r (x) Closed ball of radius r ∈ R>0 centered at x ∈ Rn
∂h/∂x (x) The m × n Jacobian matrix of a continuously differentiable
function h : Rn → Rm evaluated at x ∈ Rn
∇h(x) The gradient of a continuously differentiable scalar function
h : Rn → R evaluated at x ∈ Rn


L f h The Lie derivative L f h(x) = (∂h/∂x)(x) f (x) of a continuously
differentiable function h : Rn → R along a vector field
f : Rn → Rn
L g h The Lie derivative L g h(x) = (∂h/∂x)(x)g(x) of a continuously
differentiable function h : Rn → R along a vector field
g : Rn → Rn×m
K : X⇒U A set-valued mapping that assigns to each x ∈ X a set
K (x) ⊂ U
K Set of class K functions
K∞ Set of class K ∞ functions
KL Set of class KL functions
Ke Set of extended class K functions
Ke∞ Set of extended class K∞ functions
L∞ Space of piecewise continuous and bounded functions
L2 Space of piecewise continuous and square integrable
functions
wo (0)wo (1) . . . wo (i) ∈ O Word over set O
(wo (0)wo (1) . . . wo (k))ω Infinitely many repetitions of a sequence
Oω Set of infinite words over set O
⊤ Boolean constant true
¬, ∧, ∨, →, ↔ Boolean operators negation, conjunction, disjunction,
implication, and equivalence
◯ "Next" temporal operator
U “Until” temporal operator
♦ “Eventually” temporal operator
□ "Always" temporal operator
1 Introduction

In this brief introductory chapter, we provide an informal description of the problem that we
consider throughout the book. We introduce the two classes of control systems that we focus
on and discuss our assumptions and relevance to applications. We also provide a high-level
description of the technical approach.

Safe learning-based control generally concerns the design of control policies for uncertain
dynamical systems that ensure the satisfaction of safety constraints. Such safety constraints
may be represented by constraints on the states of a dynamical system (i.e., the system tra-
jectory should remain within a prescribed “safe set” at all times), constraints on the actuators
of a system (i.e., control effort must not exceed specified bounds), or both. Secondary to
such safety requirements are performance requirements, such as asymptotic stability of an
equilibrium point or the minimization of a cost function, that would be desirable to satisfy,
but not at the cost of violating any safety constraints. The challenge that safe learning-based
control aims to address is the construction of control policies enforcing the satisfaction of
these aforementioned requirements in the face of model uncertainty. To make these ideas
more concrete, consider the control system

ẋ = f (x, u),

where x ∈ X is the system state belonging to the state space X, u ∈ U is the control input
belonging to the (possibly bounded) control space U, and f (x, u) is a vector field charac-
terizing the system dynamics. The main problem considered herein is to design a control
policy that renders the above system safe, that is, that the resulting closed-loop trajectory
x(t) remains in a safe set x(t) ∈ C ⊂ X at all times under the control input u(t) ∈ U, with-
out (precise) knowledge of the vector field f governing the system dynamics. In this book,
we present a suite of tools for solving this problem by building an estimate of the dynam-
ics f̂ (x, u) using data collected online that is then used in a model-based control policy

to satisfy the aforementioned objectives. Importantly, we aim to ensure that such a policy
guarantees satisfaction of all safety constraints not only after a suitable estimate has been
obtained, but also during the learning process. Such a problem is very challenging without
further restrictions on the system model or learning approach used—in what follows we
discuss a special case of this more general problem and outline the main techniques used to
solve it.
The primary focus of this book is on safe learning-based control from an adaptive control
perspective. For much of this book, our development focuses on nonlinear control affine
systems with parametric uncertainty:

ẋ = f (x) + F(x)θ + g(x)u,

where θ ∈ Θ is a vector of uncertain parameters belonging to the (possibly bounded) parameter
space Θ. Here, f (x) models the system drift dynamics describing the natural flow of
the system in the absence of control inputs or uncertainty, g(x) is a matrix whose columns
capture the system’s control directions, and F(x) is a matrix whose columns characterize
the directions along which the uncertain parameters act. The assumption that the uncer-
tainty manifests itself in a parametric fashion implies that the structure of the dynamics is
known but may depend on various quantities, such as inertial or friction parameters, that
are unknown. Fortunately, most real-world robotic systems (e.g., those whose equations
of motion are derived using Lagrangian mechanics) satisfy this structural assumption, and
systems not satisfying such an assumption can often be made to satisfy it using hand-picked
or learned features to represent the uncertainty.
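
To make this structure concrete, the sketch below (our own illustration; the model and all constants are assumptions, not values from the text) writes a damped pendulum with unknown gravity and friction coefficients in exactly this form:

```python
import numpy as np

# Minimal sketch (our illustration): a pendulum with unknown parameters
# written in the form xdot = f(x) + F(x) @ theta + g(x) @ u, where
# x = (angle, angular rate). The model and constants are assumptions.

def f(x):
    # Known drift: pure kinematics (the angle integrates the rate).
    return np.array([x[1], 0.0])

def F(x):
    # Known features multiplying the unknown theta = (a, b): a scales
    # gravity (g/l) and b scales viscous friction.
    return np.array([[0.0, 0.0],
                     [-np.sin(x[0]), -x[1]]])

def g(x):
    # Known control direction (unit inverse inertia for simplicity).
    return np.array([[0.0], [1.0]])

def dynamics(x, u, theta):
    return f(x) + F(x) @ theta + g(x) @ u

# Evaluate the model at a state/input pair with a guessed theta.
x, u, theta = np.array([0.3, -0.1]), np.array([0.5]), np.array([9.81, 0.2])
print(dynamics(x, u, theta))
```

Here only θ is unknown; the features collected in F(x) are known functions of the state, which is precisely what the adaptive controllers developed in later chapters exploit.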
We are primarily interested in designing adaptive controllers u = k(x, θ̂), where θ̂ is
an estimate of the uncertain parameters, that ensure the closed-loop system trajectory x(t)
remains in some prescribed safe set C ⊂ X at all times. Complicating this problem is the
fact that θ̂ is not a static estimate of the uncertainty, but continuously evolves according
to its own dynamics θ̂˙ , which may, in turn, depend on the system state x and even the
control input u. This gives rise to a tight feedback loop between learning and control, which
necessitates carefully selecting both the control policy k(x, θ̂) and the learning algorithm
itself—characterized by the auxiliary dynamical system θ̂˙ —to ensure the ultimate control
objective is met. To this end, we demonstrate how modern ideas from machine learning
can be incorporated into traditional adaptation algorithms to guarantee convergence of the
parameter estimates to their true values, allowing for a gradual reduction in uncertainty as
more data about the system is collected.
We approach the problem of designing learning-based control algorithms for safety-
critical systems through the use of certificate functions from nonlinear control theory. The
most familiar of such functions is the well-known Lyapunov function for certifying the sta-
bility of dynamical systems, and its extension to control systems—the control Lyapunov
function (CLF). The main ideas of Lyapunov theory can be suitably transposed to address
safety, rather than stability, using the dual notion of a barrier function for safety certi-
fication, and a control barrier function (CBF) for control synthesis. Much of this book
concentrates on developing adaptive versions of CLFs and CBFs that facilitate the design of
both controllers and adaptation algorithms that guarantee stability and safety, respectively,
by construction. Here, we recast the traditional design of adaptive controllers in the modern
context of optimization-based control in which control inputs are computed as the solution
to a convex optimization problem whose constraints guarantee satisfaction of closed-loop
system properties (stability, safety) by construction.
In later parts of this book, we shift from the problem of safe adaptive control to safe
reinforcement learning in which the objective is to construct a controller for an uncertain
dynamical system that minimizes the infinite-horizon cost functional
∫₀^∞ ℓ(x(s), u(s)) ds,

where ℓ(x, u) is a running cost, while ensuring that the system state remains in a safe set at
all times. Extending the adaptive control ideas introduced in early chapters, we demonstrate
how similar techniques can be leveraged to safely learn a parametric estimate of the value
function of the above optimal control problem online using data from a single trajectory.
Finally, we discuss an extension of the aforementioned approaches to richer control
specifications given in the form of temporal logic formulas, which provide a formal way to
express complex control objectives beyond that of stability and safety. We focus on Linear
Temporal Logic (LTL), which is particularly well suited to the synthesis of control strategies using
automata-based techniques. We show how the safety guarantees developed for adaptive
control throughout the book can be extended to accommodate expressive temporal logic
specifications.
2 Stabilizing Control Design

In the first technical chapter of the book, we introduce fundamental notions of stability for
dynamical systems, and review ways in which stability can be enforced through feedback
control. Central to our treatment is the notion of a control Lyapunov function, which maps
stability requirements to control constraints. The most important statement that we make in
this chapter is that stability can be enforced as a constraint in an optimization problem,
paving the way for seamless integration with safety, which is treated later in the book. This
chapter is organized as follows. In Sect. 2.1, we define stability and Lyapunov functions, and
review the main stability verification results based on Lyapunov theory. We introduce control
Lyapunov functions in Sect. 2.2, where we also discuss enforcing stability as a constraint
in an optimization problem. We present two widely used methods for designing control
Lyapunov functions in Sect. 2.3. We conclude with references, discussions, and suggestions
for further reading in Sect. 2.4.

2.1 Lyapunov Stability Theory

Here we recount the main ideas behind Lyapunov stability theory, which serves as a founda-
tion for the technical developments in later chapters. Much of this material may be familiar
to many readers; however, we will use it as a means to establish common ground regarding
notation, definitions, and preliminary results that will be used throughout the rest of the
book. Moreover, we hope this presentation will help highlight the duality between stability
and safety properties, encoded via Lyapunov functions and barrier functions, respectively,
the latter of which will be covered in the subsequent chapters.

2.1.1 Stability Notions

We consider a nonlinear dynamical system described by

ẋ = f (x), (2.1)

where x ∈ X ⊂ Rn is the system state, assumed to take values in an open subset X ⊂ Rn
of the n-dimensional Euclidean space, and f : X → Rn is a vector field that is locally
Lipschitz on X, which models the system dynamics.

Definition 2.1 A mapping f : X → Y with X ⊂ Rn and Y ⊂ Rm is said to be locally
Lipschitz on X if, for all x0 ∈ X, there exist positive constants δ, L ∈ R>0 such that for all
x, y ∈ Bδ (x0 ),

‖ f (x) − f (y)‖ ≤ L‖x − y‖. (2.2)
When f : X → Y is locally Lipschitz on X and the domain of the mapping X is understood
from the context, we will simply say that f is locally Lipschitz.

The Lipschitz assumption on a vector field f ensures that the associated dynamical
system generates unique trajectories from any initial condition x0 ∈ X, at least for short
time intervals. Formally, we have:

Theorem 2.1 Let f : X → Rn be locally Lipschitz. Then, for any initial condition x0 ∈
X, there exists a maximal interval of existence I (x0 ) := [0, τmax ) ⊂ R, τmax ∈ R>0 , and
a continuously differentiable mapping x : I (x0 ) → X such that t ↦ x(t) is the unique
solution to (2.1) on I (x0 ) in the sense that it solves the initial value problem

x(0) = x0 ,
(2.3)
ẋ(t) = f (x(t)), ∀t ∈ I (x0 ).

The vector field f is called forward complete if τmax = ∞ in the above theorem. With a
slight abuse of notation, we use x(·) or t ↦ x(t) to distinguish a trajectory of (2.1) from an
arbitrary state of (2.1), which is simply denoted by x.
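
Two classical scalar examples (ours, for illustration) show why these hypotheses matter: local Lipschitz continuity guarantees uniqueness but not forward completeness, and dropping it can destroy uniqueness altogether.

```latex
\dot{x} = x^2,\quad x(0) = x_0 > 0
  \;\Longrightarrow\; x(t) = \frac{x_0}{1 - x_0 t},
  \qquad \tau_{\max} = \frac{1}{x_0} < \infty;
\qquad
\dot{x} = \sqrt{|x|},\quad x(0) = 0
  \;\Longrightarrow\; x(t) \equiv 0 \ \text{and}\ x(t) = \frac{t^2}{4}
  \ \text{both solve the IVP.}
```

In the first case f is locally Lipschitz, so the solution is unique but escapes in finite time; in the second, f fails to be locally Lipschitz at the origin, so Theorem 2.1 does not apply and uniqueness is lost.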
A fundamental problem studied in control theory concerns the stability of equilibrium
points of (2.1).

Definition 2.2 A point xe ∈ X is said to be an equilibrium point of (2.1) if f (xe ) = 0.

For the remainder of this chapter, we assume that there exists at least one equilibrium point
of (2.1), which, without loss of generality, is assumed to be at the origin. Informally, the
origin is said to be stable for (2.1) if initial conditions near the origin produce trajectories that
stay near it. The origin is asymptotically stable if such trajectories also approach the origin.
One way to formally define various notions of stability is through the use of different classes
of scalar comparison functions. Two classes of comparison functions useful for studying
stability are defined as follows.

Definition 2.3 (Class K, K∞ functions) A continuous function α : [0, a) → R≥0 , where
a ∈ R>0 , is said to be a class K function, denoted by α ∈ K, if α(0) = 0 and α(·) is strictly
increasing. If a = ∞ and lim_{r→∞} α(r ) = ∞, then α is said to be a class K∞ function,
which we denote by α ∈ K∞ .

Definition 2.4 (Class KL function) A continuous function β : [0, a) × R≥0 → R≥0 ,
where a ∈ R>0 , is said to be a class KL function, denoted by β ∈ KL, if β(r , s) is a
class K function of r for each fixed s ∈ R≥0 , and if β(r , s) is decreasing in s for each fixed
r ∈ [0, a).
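
A few standard examples (ours) may help fix these definitions:

```latex
\alpha_1(r) = \arctan(r) \in \mathcal{K} \setminus \mathcal{K}_\infty
  \;\; (\text{bounded by } \pi/2), \qquad
\alpha_2(r) = r^2 \in \mathcal{K}_\infty, \qquad
\beta(r,s) = r e^{-s} \in \mathcal{KL}.
```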

Making use of these comparison functions allows us to concisely define various notions
of stability.

Definition 2.5 (Stability) Let t ↦ x(t) be a trajectory of (2.1) from an initial condition
x0 ∈ X in the sense of (2.3). The origin for (2.1) is said to be

• locally stable if there exists α ∈ K and a positive constant δ ∈ R>0 , such that for all
x0 ∈ Bδ (0),
‖x(t)‖ ≤ α(‖x0 ‖), ∀t ∈ R≥0 ; (2.4)
• locally asymptotically stable if there exists β ∈ KL and a positive constant δ ∈ R>0 ,
such that for all x0 ∈ Bδ (0),

‖x(t)‖ ≤ β(‖x0 ‖, t), ∀t ∈ R≥0 ; (2.5)

• locally exponentially stable if it is locally asymptotically stable and β(r , s) = kr e^{−cs} for
some positive constants k, c ∈ R>0 .

If the above conditions hold for all x0 ∈ X then we remove the “local” qualifier and simply
say that the origin is stable, asymptotically stable, or exponentially stable.

According to the above definitions, the stability of an equilibrium point implies that
the trajectory from a fixed initial condition x0 will remain within a ball of radius α(‖x0 ‖)
centered at the origin. If the system is asymptotically stable, then, in addition to remaining
in some ball of the origin, the properties of class KL functions allow us to conclude that
trajectories approach the origin in the limit as time goes to infinity. Thus, stability can
be used to answer questions related to the safety (“never do anything bad”) and liveness
(“eventually do something good”) properties of a system. For example, if the ball of radius
α(‖x0 ‖) centered at the origin represents some “safe” region that the system should remain in
at all times, then certifying the stability of (2.1) allows one to conclude safety of the system
(a more in-depth treatment of safety is postponed to the next chapter).
On the other hand, if the origin represents some desired state that the system should reach,
then certifying asymptotic stability allows one to conclude that the system will eventually
converge to some neighborhood of that state. If asymptotic stability can be strengthened
to exponential, then one can even provide estimates on how long it takes for the system
to reach such a region. The main limitation of the preceding approach is that certifying
these properties depends on knowledge of the system trajectories, which is problematic
since nonlinear dynamical systems of the form (2.1) generally do not admit closed-form
solutions.

2.1.2 Lyapunov Functions

One way to certify the stability of the equilibrium points of a dynamical system without
explicit knowledge of its trajectories is through the use of a Lyapunov function. Roughly
speaking, a Lyapunov function is a positive definite scalar function that captures a gener-
alized notion of the system’s “energy”; if this energy is conserved or decays along system
trajectories, then one can draw conclusions about stability. To examine how a scalar func-
tion changes along a system’s trajectories without explicitly requiring knowledge of such
trajectories, we use the Lie derivative. Given a scalar function V : X → R and a vector
field f : X → Rn , the Lie derivative of V along f is defined as
L f V (x) := (∂V/∂x)(x) f (x) = ∇V (x)⊤ f (x), (2.6)
which measures the rate of change of V along the vector field f . The Lie derivative directly
encodes the time rate of change of V along a given trajectory t ↦ x(t) since

(d/dt) V (x(t)) = (∂V/∂x)(x(t)) ẋ(t) = (∂V/∂x)(x(t)) f (x(t)) = L f V (x(t)).
The notion of a Lie derivative generalizes to vector-valued functions h : Rn → Rm using
the same definition as in (2.6), in which case L f h(x) = (∂h/∂x)(x) f (x) ∈ Rm will be an
m-dimensional vector.
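
Lie derivatives are mechanical to compute, and it can be useful to check them symbolically. The following is a minimal sketch (our own; it assumes the sympy package, and the system and candidate below are made up for the example):

```python
import sympy as sp

# Symbolic state for a two-dimensional example.
x1, x2 = sp.symbols('x1 x2', real=True)
x = sp.Matrix([x1, x2])

f = sp.Matrix([x2, -x1 - x2])        # example vector field f(x)
V = (x1**2 + x2**2) / 2              # Lyapunov function candidate

# L_f V(x) = (dV/dx)(x) f(x): row Jacobian times the vector field.
grad_V = sp.Matrix([V]).jacobian(x)  # 1 x 2 row vector
LfV = sp.simplify((grad_V * f)[0, 0])
print(LfV)                           # prints -x2**2
```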
Since V̇ denotes differentiation of V with respect to time, we often abbreviate the Lie
derivative as V̇ (x) := L f V (x). The main idea behind certifying stability using the Lyapunov
method is to guarantee that some positive definite scalar “energy” function V decreases along
system trajectories, which can be ensured by checking that

L f V (x) < 0 (2.7)

for all x ∈ X \ {0}. If such a condition holds and V is positive definite, then V (x(t)) must
decrease along any solution t ↦ x(t) to zero, implying the solution itself converges to
zero. If such an energy function V satisfies the preceding conditions (positive definiteness,
negative definite Lie derivative), then we refer to V as a Lyapunov function, which provides a
certificate for the stability of the system. To formalize these ideas, we give a few preliminary
definitions.

Definition 2.6 (Lyapunov function candidate) A continuously differentiable scalar function
V : X → R≥0 is said to be a Lyapunov function candidate if there exist α1 , α2 ∈ K∞ , such
that for all x ∈ X,

α1 (‖x‖) ≤ V (x) ≤ α2 (‖x‖). (2.8)

The requirement that a Lyapunov function candidate be bounded by class K∞ functions
can be relaxed to require that α1 , α2 ∈ K if one wishes to only establish local stability
results. The existence of such class K functions for all x ∈ X is guaranteed if V : X → R≥0
is positive definite2 on X. If, in addition, V is radially unbounded3 , then α1 , α2 can be taken
as class K∞ functions. For ease of exposition, our working definition of a Lyapunov function
candidate will assume that α1 , α2 ∈ K∞ with the understanding that all stability results can
be given a local characterization through the use of class K functions.

Definition 2.7 (Lyapunov function) A Lyapunov function candidate is said to be a Lyapunov
function if

V̇ (x) = L f V (x) < 0, ∀x ∈ X \ {0}. (2.9)

Note that if V is a Lyapunov function, then L f V is negative definite on X, implying the
existence of α3 ∈ K such that

L f V (x) ≤ −α3 (‖x‖), ∀x ∈ X.

We now have all the tools in place to state the main result regarding Lyapunov functions.

Theorem 2.2 (Lyapunov’s Direct Method) Let the origin be an equilibrium point of (2.1)
and let V : X → R≥0 be a Lyapunov function. Then, the origin of (2.1) is asymptotically
stable.

The power of Theorem 2.2 is that it allows for certifying stability purely based on the
vector field f describing the system dynamics (2.1) and a suitably constructed Lyapunov
function V : verifying that f points directly into the sublevel sets of the Lyapunov function V

is sufficient to certify the stability of the equilibrium point. If stronger conditions are placed
on V , then such an approach can be used to certify the exponential stability of equilibria.

2 Recall that a function V : X → R≥0 is positive definite on X if V (x) > 0 for all x ∈ X \ {0} and V (x) = 0 if and only if x = 0.
3 Recall that a function is radially unbounded if lim_{‖x‖→∞} V (x) = ∞.

Theorem 2.3 Let the conditions of Theorem 2.2 hold and suppose that αi (‖x‖) = ci ‖x‖2
with ci ∈ R>0 for all i ∈ {1, 2, 3}. Then, the origin of (2.1) is exponentially stable.

We will use the preceding theorem as an opportunity to introduce a tool known as the
Comparison Lemma that allows for bounding the trajectories of a possibly very complicated
dynamical system by the trajectories of a much simpler one.

Lemma 2.1 (Comparison Lemma) Let f : R × R≥0 → R be locally Lipschitz in its first
argument and continuous in its second. Consider the initial value problem

ẏ(t) = f (y(t), t), ∀t ∈ I (y0 ),
y(t0 ) = y0 ,

where I (y0 ) = [t0 , τmax ) is the solution’s maximal interval of existence. Now let v :
[t0 , τmax ) → R be a continuously differentiable function satisfying

v̇(t) ≤ f (v(t), t), ∀t ∈ [t0 , τmax ),
v(t0 ) ≤ y0 .

Then v(t) ≤ y(t) for all t ∈ [t0 , τmax ).

We now demonstrate how the Comparison Lemma in conjunction with Theorem 2.3 can be
used to establish exponentially decaying bounds on the trajectory of a dynamical system.

Proof (of Theorem 2.3) The Lie derivative of the Lyapunov function candidate can be
bounded as
V̇ (x) = L f V (x) ≤ −c3 ‖x‖2 ≤ −(c3 /c2 ) V (x).

We now introduce the scalar comparison system

ẏ = −(c3 /c2 ) y,
y(0) = V (x0 ),

whose solution is given by


c
− c3 t
y(t) = V (x0 )e 2 .
It then follows from the Comparison Lemma that
c
− c3 t
V (x(t)) ≤ V (x0 )e 2 .
Using c1 ‖x‖2 ≤ V (x) ≤ c2 ‖x‖2 , the above inequality implies that

‖x(t)‖ ≤ √(c2 /c1 ) ‖x0 ‖ e^{−(c3 /(2c2 )) t},

which implies the origin is exponentially stable, as desired. 
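
The exponential envelope derived above is easy to verify numerically. Below is a minimal sketch (ours, assuming numpy and scipy) for ẋ = −x with V (x) = ½‖x‖², for which c1 = c2 = ½ and c3 = 1:

```python
import numpy as np
from scipy.integrate import solve_ivp

# Numerical check of Theorem 2.3 (our illustration). For xdot = -x and
# V(x) = 0.5 * ||x||^2 we have c1 = c2 = 0.5 and Vdot = -||x||^2, so
# c3 = 1 and the guaranteed envelope is
#   ||x(t)|| <= sqrt(c2/c1) * ||x0|| * exp(-c3 * t / (2 * c2)).
c1, c2, c3 = 0.5, 0.5, 1.0
x0 = np.array([1.0, -2.0])

sol = solve_ivp(lambda t, x: -x, (0.0, 5.0), x0, dense_output=True)
t = np.linspace(0.0, 5.0, 50)
traj_norm = np.linalg.norm(sol.sol(t), axis=0)
envelope = np.sqrt(c2 / c1) * np.linalg.norm(x0) * np.exp(-c3 * t / (2 * c2))

assert np.all(traj_norm <= envelope + 1e-6)  # the bound holds pointwise
```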

The main limitation of the Lyapunov approach is that it relies on constructing a Lyapunov
function, which often raises the question “How does one find a Lyapunov function?” If no
additional assumptions are placed on f (other than those required for the existence and
uniqueness of solutions to the corresponding differential equation), then this is a challenging
question to answer, and we do not attempt to do so in this book. Rather, we note that in this
book the question of “finding a Lyapunov function” is not necessarily the question we seek
to answer, since it is implicit in this question that a controller has already been designed
for a system, and that our goal is to verify that such a controller renders the origin stable
for the closed-loop system. If the ultimate objective of the control design, however, is to
enforce stability, then a different approach would be to design a controller that enforces
the Lyapunov conditions by construction, thereby obviating the need to perform post hoc
verification or find a Lyapunov function for the closed-loop system. In the following section,
we explore this approach using the notion of a control Lyapunov function.

2.2 Control Lyapunov Functions

In the previous section, we briefly recounted the main ideas behind Lyapunov stability
theory and illustrated how Lyapunov functions can be used to certify the stability of closed-
loop dynamical systems of the form ẋ = f (x). In this section, we discuss an extension of
Lyapunov methods to open dynamical systems, or control systems, of the form ẋ = f (x, u),
where u is a control input that allows for modifying the natural dynamics of the system,
encoded by the vector field f (x, 0). Note that by fixing a feedback controller u = k(x),
we obtain a new closed-loop dynamical system ẋ = f cl (x) := f (x, k(x)) whose stability
properties could then be studied by searching for a suitable Lyapunov function as in Sect. 2.1.
In this section, we present an alternative approach to certifying the stability of control systems
based on the notion of a control Lyapunov function (CLF). In essence, a CLF is a Lyapunov
function candidate for a control system, whose Lie derivative can be made to satisfy the
Lyapunov conditions by appropriate control action.
The main idea behind the CLF approach is as follows. Rather than fixing a desired
controller and then searching for a Lyapunov function, we fix a desired Lyapunov function
and then construct a controller such that the Lyapunov conditions are satisfied for the closed-
loop system by construction. The distinction between these two approaches is subtle, but
we believe important. The process of finding a Lyapunov function is very challenging, often
requiring a certain amount of ingenuity or trial and error. However, the process of finding a
CLF can be made almost systematic for many relevant classes of control systems. Various
methods to construct CLFs are reviewed in Sect. 2.3.
We focus on a class of nonlinear systems of the form

ẋ = f (x) + g(x)u, (2.10)

where x ∈ X ⊂ Rn is the state of the system with X an open set, and u ∈ U ⊂ Rm is the
control input. The vector field f : X → Rn models the system drift dynamics and is assumed
to be locally Lipschitz, whereas the columns of g : X → Rn×m , denoted by gi : X → Rn ,
i ∈ {1, . . . , m}, are vector fields capturing the system’s control directions, also assumed to
be locally Lipschitz. Control systems in the form (2.10) are called affine since the right-hand
side of (2.10) is an affine function in control (if the state is fixed).
Given a state feedback controller u = k(x), we obtain the corresponding closed-loop
system:
ẋ = f (x) + g(x)k(x) =: f cl (x). (2.11)
Note that if k : X → U is locally Lipschitz, then the corresponding closed-loop vector field
f cl : X → Rn is also locally Lipschitz. Hence, by Theorem 2.1, for any initial condition x0 ∈
X, there exists a maximal interval of existence I (x0 ) = [0, τmax ) ⊆ R≥0 and a continuously
differentiable function x : I (x0 ) → X such that

x(0) =x0 ,
(2.12)
ẋ(t) = f (x(t)) + g(x(t))k(x(t)), ∀t ∈ I (x0 ).

We are interested in designing controllers for (2.10) that guarantee the stability of the
origin for the resulting closed-loop system by construction, which can be accomplished
using a CLF. To this end, let V : X → R≥0 be a Lyapunov function candidate and observe
that the Lie derivative of V along the dynamics (2.10) is given by
V̇ (x, u) = (∂V/∂x)(x)( f (x) + g(x)u) = L f V (x) + L g V (x)u.

Here, L g V (x) = (∂V/∂x)(x)g(x) ∈ R1×m is the 1 × m matrix whose components are the Lie
derivatives of V along each column of g. At this point, we cannot ask if V is a Lyapunov
function for (2.10)–this would first require fixing a controller u = k(x), thereby producing
a new dynamical system ẋ = f cl (x), and then checking if L fcl V (x) satisfies the criteria of
Definition 2.7. However, we can ask if it is possible, for each nonzero x, to pick some input
u to enforce the Lyapunov conditions upon V̇ (x, u). If V satisfies such a condition, then we
say that V is a CLF. Formally, we define a CLF as follows.

Definition 2.8 (Control Lyapunov function) A Lyapunov function candidate V : X → R≥0
is said to be a control Lyapunov function (CLF) for (2.10) if there exists α ∈ K such that
for all x ∈ X \ {0},

inf_{u∈U} {L f V (x) + L g V (x)u} < −α(‖x‖). (2.13)

Determining if a Lyapunov function candidate is a CLF depends heavily on the behavior of
L g V . For example, when U = Rm (i.e., the control input is unconstrained) the condition in
(2.13) can be restated as

∀x ∈ X \ {0} : L g V (x) = 0 =⇒ L f V (x) < −α(‖x‖). (2.14)

That is, when L g V (x) ≠ 0 and the control input is unconstrained, it is always possible to
pick some u such that the scalar inequality in (2.13) is satisfied; when L g V (x) = 0 one must
rely on the drift dynamics f to ensure that such a condition is met. If control constraints are
present–for example, when U ⊂ Rm is a convex polytope–determining if V is a CLF is a
much more challenging problem, which will be discussed later in this section.
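
As a simple illustration (ours), consider the scalar system ẋ = x + u with U = R and candidate V (x) = x²/2, for which L f V (x) = x² and L g V (x) = x; since L g V (x) ≠ 0 whenever x ≠ 0,

```latex
\inf_{u \in \mathbb{R}} \left\{ x^2 + x u \right\} = -\infty
\quad \text{for all } x \neq 0,
```

so (2.13) holds for any α ∈ K and V is a CLF; for instance, u = −2x yields V̇ = −x² and renders the origin exponentially stable.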
The appeal of a CLF is that it induces an entire family of stabilizing policies expressed
through the set-valued map

K clf (x) := {u ∈ U | L f V (x) + L g V (x)u ≤ −α(‖x‖)}, (2.15)

that assigns to each x ∈ X a set K clf (x) ⊂ U of control values satisfying the CLF condition
from (2.13). The main result with regard to CLFs is that choosing any locally Lipschitz
controller u = k(x) satisfying k(x) ∈ K clf (x) for all x ∈ X renders the origin asymptotically
stable.

Theorem 2.4 If V is a CLF for (2.10), then any locally Lipschitz controller u = k(x)
satisfying k(x) ∈ K clf (x) for all x ∈ X renders the origin asymptotically stable.

Proof Putting u = k(x) and computing V̇ reveals that

V̇ (x) = L f V (x) + L g V (x)k(x) ≤ −α(‖x‖),

which, according to Definition 2.7, implies that V is a Lyapunov function for the closed-loop
system. From Theorem 2.2, it follows that the origin is asymptotically stable for the closed
loop system. 

The notion of a CLF can also be specialized to handle exponential stabilization tasks
using the notion of an exponentially stabilizing CLF.

Definition 2.9 (Exponentially stabilizing CLF) A continuously differentiable function V :
X → R≥0 is said to be an exponentially stabilizing control Lyapunov function (ES-CLF) if
there exist positive constants c1 , c2 , c3 ∈ R>0 such that for all x ∈ X,

c1 ‖x‖2 ≤ V (x) ≤ c2 ‖x‖2 , (2.16)


and for all x ∈ X \ {0},

inf_{u∈U} {L f V (x) + L g V (x)u} < −c3 ‖x‖2 . (2.17)

Similar to CLFs, an ES-CLF V induces a set-valued map K es-clf : X ⇒ U that associates
to each x ∈ X the set K es-clf (x) ⊂ U of control values satisfying the ES-CLF condition from
(2.17) as

K es-clf (x) := {u ∈ U | L f V (x) + L g V (x)u ≤ −c3 ‖x‖2 }. (2.18)

Choosing any locally Lipschitz controller satisfying k(x) ∈ K es-clf (x) for all x ∈ X renders
the origin of the closed-loop system exponentially stable, as shown in the following theorem.

Theorem 2.5 If V is an ES-CLF for (2.10), then any locally Lipschitz controller u = k(x)
satisfying k(x) ∈ K es-clf (x) for all x ∈ X renders the origin exponentially stable.

Proof With u = k(x), the Lie derivative of V along the closed-loop dynamics satisfies

V̇ (x) = L f V (x) + L g V (x)k(x) ≤ −c3 ‖x‖2 ,

and the theorem follows from Theorem 2.3. 
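
For linear systems, ES-CLFs can be constructed systematically; we record this standard fact here for illustration. If ẋ = Ax + Bu and K is any matrix such that Acl = A − BK is Hurwitz, then for any Q = Q⊤ ≻ 0 the Lyapunov equation

```latex
A_{cl}^{\top} P + P A_{cl} = -Q
```

has a unique solution P = P⊤ ≻ 0, and V (x) = x⊤P x satisfies (2.16)–(2.17) with c1 = λmin (P), c2 = λmax (P), and any c3 < λmin (Q), since the input u = −K x already achieves L f V (x) + L g V (x)u = −x⊤Qx ≤ −λmin (Q)‖x‖2 .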

The existence of a CLF implies the existence of control inputs that, for each nonzero x,
enforce negativity of V̇ (x, u), and Theorem 2.4 illustrates that if such inputs can be stitched
together into a locally Lipschitz feedback control policy, then that policy renders the origin
asymptotically stable.
The approach taken in this book is to view the CLF condition (2.13) as a constraint that
must be satisfied by the control input u for each x ∈ X. When viewed as a function of u, such
a constraint is affine and therefore convex, which allows control inputs satisfying the CLF
condition (2.13) to be computed, for any x ∈ X, by solving a convex
optimization problem. For example, inputs satisfying the CLF condition can be computed
as the solution to the optimization problem
k(x) = arg min_{u∈U} (1/2)‖u‖2
subject to L f V (x) + L g V (x)u ≤ −α(‖x‖),     (2.19)

which is a quadratic program (QP) for a given, fixed x, provided U = Rm or U is a convex


polytope. The controller in (2.19) returns, for any given x, the input u = k(x) of minimum
norm that satisfies k(x) ∈ K clf (x) and is often referred to as a pointwise min-norm controller.
Clearly, such a controller satisfies k(x) ∈ K clf (x) for each x ∈ X and therefore renders the
origin asymptotically stable provided x ↦ k(x) is locally Lipschitz on X. However, the
fact that control inputs are computed as the solution to an optimization problem in (2.19),

as opposed to being output from a closed-form feedback law, raises concerns regarding
the continuity and smoothness properties of the resulting controller. The remainder of this
section is thus dedicated to establishing various properties of the controller in (2.19). In
particular, we will study the more general QP-based controller

k(x) = arg min_{u∈R^m} (1/2)‖u‖² − k0(x)ᵀu
       subject to a(x) + b(x)ᵀu ≤ 0,        (2.20)

so that such results can be directly extended to other QP-based controllers introduced
throughout the book. The following result provides conditions that guarantee the QP-based
controller (2.20) is locally Lipschitz.

Theorem 2.6 (Lipschitz continuity of QP Controllers) Consider the QP-based controller


k : X → Rm defined on some open subset X ⊂ Rn and suppose that k0 : X → Rm , a :
X → R, and b : X → Rm are locally Lipschitz on X. Further, assume that

∀x ∈ X : b(x) = 0 =⇒ a(x) < 0. (2.21)

Then, the solution to (2.20) can be expressed as



k(x) = { k0(x),                                           if a(x) + b(x)ᵀk0(x) ≤ 0,
       { k0(x) − ((a(x) + b(x)ᵀk0(x))/‖b(x)‖²) b(x),      if a(x) + b(x)ᵀk0(x) > 0,        (2.22)

and is locally Lipschitz on X.

Proof The proof leverages the Karush-Kuhn-Tucker (KKT) conditions for optimality. We
first note that since the objective function is convex and differentiable, and the constraints
are affine, the KKT conditions are necessary and sufficient for optimality. We next define
the Lagrangian

L(x, u, λ) := (1/2)‖u‖² − k0(x)ᵀu + λ(a(x) + b(x)ᵀu),        (2.23)

where λ ∈ R is the Lagrange multiplier. The KKT conditions state that a pair (u ∗ (x), λ∗ (x))
is optimal if and only if the following conditions are satisfied:
∂L/∂u (x, u*(x), λ*(x)) = 0        (stationarity)
a(x) + b(x)ᵀu*(x) ≤ 0        (primal feasibility)
λ*(x) ≥ 0        (dual feasibility)
λ*(x)(a(x) + b(x)ᵀu*(x)) = 0        (complementary slackness).

Hence, for (2.20) the KKT conditions imply that an optimal solution (u ∗ (x), λ∗ (x)) must
satisfy
u*(x) = k0(x) − λ*(x)b(x).        (2.24)

We will derive the closed-form solution to (2.20) by breaking the dual feasibility con-
dition down into two cases: (i) λ∗ (x) = 0 and (ii) λ∗ (x) > 0. If λ∗ (x) = 0 then (2.24)
implies that u ∗ (x) = k0 (x). To determine the subset of the state space where this solu-
tion applies, we leverage the primal feasibility condition to see that if u ∗ (x) = k0 (x) then
a(x) + b(x) k0 (x) ≤ 0. This implies that u ∗ (x) = k0 (x) is the optimal solution to the QP
(2.20) in the set
Ω1 := {x ∈ X | a(x) + b(x)ᵀk0(x) ≤ 0}.        (2.25)

We now show that the condition in (2.21) implies that any point such that b(x) = 0 lies
strictly in the interior of Ω1. Indeed, note from (2.21) that

b(x) = 0 =⇒ a(x) < 0 =⇒ a(x) + b(x)ᵀk0(x) < 0 =⇒ x ∈ Int(Ω1).        (2.26)

We now consider the case when λ∗ (x) > 0. In such a case it follows from the comple-
mentary slackness condition that we must have

0 = a(x) + b(x)ᵀu*(x)
  = a(x) + b(x)ᵀ(k0(x) − λ*(x)b(x))        (2.27)
  = a(x) + b(x)ᵀk0(x) − λ*(x)‖b(x)‖²,

where the second equality follows from substituting in the stationarity condition (2.24). To
solve the above equation for λ* we note that when λ*(x) > 0 we must have b(x) ≠ 0. Indeed,
if it were not, then b(x) = 0, which would imply that u*(x) = k0(x) and that λ*(x) = 0,
which contradicts the initial assumption that λ*(x) > 0. Hence, using the fact that b(x) ≠ 0
and solving for λ* yields

λ*(x) = (a(x) + b(x)ᵀk0(x))/‖b(x)‖²,        (2.28)
which, after substituting back into (2.24), yields

u*(x) = k0(x) − ((a(x) + b(x)ᵀk0(x))/‖b(x)‖²) b(x).        (2.29)
To determine the subset of the state space where the above solution applies, we note from
the complementary slackness condition (2.27) that

a(x) + b(x)ᵀk0(x) = λ*(x)‖b(x)‖² > 0,        (2.30)

where the inequality follows from λ*(x) > 0 and b(x) ≠ 0. Hence, the optimal solution to
the QP (2.20) is given by (2.29) in the set

Ω2 := {x ∈ X | a(x) + b(x)ᵀk0(x) > 0}.        (2.31)

Combining the results for Ω1 and Ω2, the solution to (2.20) is given by




k(x) = { k0(x),                                           if x ∈ Ω1,
       { k0(x) − ((a(x) + b(x)ᵀk0(x))/‖b(x)‖²) b(x),      if x ∈ Ω2,        (2.32)

which coincides with the solution given in (2.22). Clearly, x ↦ k(x) is locally Lipschitz
on Ω1 as x ↦ k0(x) is locally Lipschitz on X ⊃ Ω1. Furthermore, x ↦ k(x) is locally
Lipschitz on Ω2 since b(x) ≠ 0 for all x ∈ Ω2 and x ↦ k0(x), a(x), b(x) are all locally
Lipschitz on X ⊃ Ω2. Given that k is locally Lipschitz on Ω1 and Ω2, it remains to show
that k is continuous on

Ω̄1 ∩ Ω̄2 = ∂Ω1 = {x ∈ X | a(x) + b(x)ᵀk0(x) = 0}.        (2.33)

To this end, let {xi}i∈N be a convergent sequence in Ω1 such that lim_{i→∞} xi = x ∈ ∂Ω1.
Using the fact that x ↦ k0(x) is continuous, we then have

lim_{i→∞} k(xi) = lim_{i→∞} k0(xi) = k0(x).        (2.34)

Now let {xi}i∈N be a convergent sequence in Ω2 such that lim_{i→∞} xi = x ∈ ∂Ω1. Using the
fact that x ↦ k0(x), x ↦ a(x), and x ↦ b(x) are all continuous we have

lim_{i→∞} k(xi) = lim_{i→∞} ( k0(xi) − ((a(xi) + b(xi)ᵀk0(xi))/‖b(xi)‖²) b(xi) ) = k0(x),        (2.35)

where the last equality follows from x ∈ ∂Ω1 and (2.33). Note that the limit in (2.35) exists
because lim_{i→∞} b(xi) ≠ 0 for the sequence {xi}i∈N in Ω2 converging to x ∈ ∂Ω1 since the
set of points where b(x) = 0 lies strictly in the interior of Ω1. Combining (2.34) and (2.35)
reveals that x ↦ k(x) is continuous on Ω̄1 ∩ Ω̄2, which, along with the fact that x ↦ k(x)
is locally Lipschitz on Ω1 and Ω2, reveals that x ↦ k(x) is locally Lipschitz on X. □

Remark 2.1 Note that in the above proof, establishing Lipschitz continuity of the QP-based
controller relies heavily on the fact that a strict, rather than a nonstrict, inequality is used in
(2.21). If a nonstrict inequality is used then it is possible that b may vanish on ∂Ω1 from
(2.33), in which case the limit in (2.35) may not exist.
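To make the closed-form expression (2.22) concrete, the following is a minimal Python sketch (our own illustration, assuming numpy); the CLF-QP (2.19) with U = R^m is recovered by taking k0 = 0, a(x) = L_f V(x) + α(x), and b(x) = L_g V(x)ᵀ.

import numpy as np

def min_norm_qp(k0, a, b):
    """Closed-form solution (2.22) of the QP (2.20):
    min_u 0.5*||u||^2 - k0^T u  subject to  a + b^T u <= 0."""
    violation = a + b @ k0
    if violation <= 0:
        return k0                             # constraint inactive: x in Omega_1
    return k0 - (violation / (b @ b)) * b     # active constraint: x in Omega_2

Note that the division by ‖b(x)‖² is only reached on Ω2, where b(x) ≠ 0 is guaranteed by (2.21).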

A straightforward application of the preceding theorem reveals that the solution to the
CLF-QP (2.19) when U = Rm is given by

k(x) = { 0,                                                  if L_f V(x) + α(x) ≤ 0,
       { −((L_f V(x) + α(x))/‖L_g V(x)‖²) L_g V(x)ᵀ,         if L_f V(x) + α(x) > 0,        (2.36)

and is locally Lipschitz on X \ {0}. In general, the above controller may fail to be continuous
at the origin. However, such a controller can be made to be continuous at the origin (i.e., we
may take k(0) = 0 while preserving continuity) provided the corresponding CLF satisfies
the following condition.

Definition 2.10 A CLF V is said to satisfy the small control property if for each ε ∈ R>0
there exists a δ ∈ R>0 such that if x ∈ Bε (0) \ {0}, then there exists a u ∈ Bδ (0) such that
L f V (x) + L g V (x)u ≤ −α(x).

The small control property guarantees that, for states arbitrarily close to the origin, there
exist small enough control inputs that enforce negativity of V̇.
The QP-based approach to control brings with it the ability to incorporate additional
objectives into the control design, expressed as additional constraints in the QP. For exam-
ple, polytopic control constraints can be incorporated by simply including the additional
halfspace constraint A0 u ≤ b0 , where A0 and b0 are a matrix and vector, respectively,
of appropriate dimensions describing the control constraint set U = {u ∈ Rm | A0 u ≤ b0 }.
Although it is straightforward in practice to incorporate such a constraint, care must be taken
to ensure feasibility of the QP and satisfaction of the ultimate control objective. For example,
if it cannot be verified that V is a valid CLF under input constraints, then there may not
exist control values simultaneously satisfying the input constraints and the CLF condition,
leading to infeasibility of the QP. Even if simultaneous satisfaction of such constraints can
be guaranteed, there are few results that ensure the resulting controller will be sufficiently
smooth to guarantee existence and uniqueness of solutions to the closed-loop system. One
way to address this challenge is to relax a subset of constraints and then penalize the mag-
nitude of this relaxation in the objective function. Since control constraints are typically an
intrinsic property of the system under consideration and cannot be relaxed, one can relax
the CLF constraint with a scalar relaxation variable δ ∈ R and solve a relaxed version of the
original CLF-QP from (2.19) as
(k(x), δ*) = arg min_{u∈U, δ∈R} (1/2)‖u‖² + pδ²
             subject to L_f V(x) + L_g V(x)u ≤ −α(x) + δ,        (2.37)

where p ∈ R>0 is a relaxation penalty. The above controller respects the actuation limitations
of the system but does not necessarily guarantee stability. Despite this theoretical limitation,
the relaxed CLF-QP controller in (2.37) has been successfully used in practice (see the notes
in Sect. 2.4). The idea of relaxing constraints in an effort to balance potentially competing
control objectives will be further explored in Chap. 3 in the context of safety-critical control.
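In practice, the relaxed CLF-QP (2.37) is typically handed to an off-the-shelf convex solver. The sketch below is our own illustration using cvxpy (an assumed dependency); LfV, LgV, and alpha_x denote the problem data evaluated at the current state, and the input set is the polytope U = {u | A0 u ≤ b0}.

import cvxpy as cp

def relaxed_clf_qp(LfV, LgV, alpha_x, A0, b0, p=100.0):
    """Solve the relaxed CLF-QP (2.37) at a single state x."""
    u = cp.Variable(A0.shape[1])
    delta = cp.Variable()
    objective = cp.Minimize(0.5 * cp.sum_squares(u) + p * cp.square(delta))
    constraints = [LfV + LgV @ u <= -alpha_x + delta,  # relaxed CLF condition
                   A0 @ u <= b0]                       # input constraints U
    cp.Problem(objective, constraints).solve()
    return u.value, delta.value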

2.3 Designing Control Lyapunov Functions

The benefits of the CLF approach discussed thus far are contingent upon the construc-
tion of a valid CLF. In this section, we present two approaches–feedback linearization and
backstepping–for systematically constructing CLFs for special classes of nonlinear con-
trol systems. We keep our exposition brief, only focusing on the main concepts. Further
discussions and references to technical details can be found in Sect. 2.4.

2.3.1 Feedback Linearization

In this section, we briefly outline the technique of feedback linearization as a means to


systematically construct CLFs for certain classes of nonlinear control systems. The main
idea behind feedback linearization is, as the name suggests, to transform a nonlinear system
into a linear one by means of feedback, allowing linear control tools to be leveraged to
complete the control design. The benefit of combining feedback linearization with CLFs is
that once a feedback linearization-based controller is known, converse Lyapunov theorems
can be invoked to produce a Lyapunov function certifying the stability of the control design,
which, by definition, is also a CLF for the original nonlinear system. Hence, rather than
implementing the controller that fully cancels the nonlinearities of the system, one can use
any controller satisfying the CLF conditions, such as the QP-based controller proposed in
Eq. (2.19).
We focus on the nonlinear control system (2.10), restated here for convenience:

ẋ = f (x) + g(x)u,

to which we associate an output


y = h(x), (2.38)
where h : X → Rm maps each element of the state space X ⊂ Rn to a vector of outputs
y ∈ Rm . Note that the dimension of the output is the same as that of the control input.
Our objective is to design a controller for (2.10) such that the output is driven to zero.
Accomplishing this objective depends heavily on the notion of the relative degree of the
output with respect to the system dynamics. Informally, the relative degree of a component of
the output (2.38) with respect to system (2.10) is the number of times that output component
needs to be differentiated along the system dynamics for the control input to explicitly
appear. Obviously, different output components can have different relative degrees. The
collection of the relative degrees for each output component forms the vector relative degree
of the output. A more detailed discussion and references can be found in Sect. 2.4. Before
proceeding we introduce the notion of higher order Lie derivatives obtained by taking the
Lie derivative of a Lie derivative. For example, given the output h : X → Rm and the vector
field f : X → Rn , the second order Lie derivative of h along f is defined as
L_f² h(x) = (∂(L_f h)/∂x)(x) f(x).

In general, we denote the r-th order Lie derivative as

L_f^r h(x) = (∂(L_f^{r−1} h)/∂x)(x) f(x).
It is also possible to take higher order Lie derivatives along different vector fields. For
example, we can take the Lie derivative of L f h along g as

L_g L_f h(x) = (∂(L_f h)/∂x)(x) g(x).
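These higher order Lie derivatives are mechanical to compute symbolically. The following sympy sketch is our own illustration (it uses the inverted pendulum of Sect. 2.3.3 with all constants set to one) and reproduces L_f h, L_f² h, and L_g L_f h for the output h(x) = q.

import sympy as sp

q, qd = sp.symbols('q qd')
x = sp.Matrix([q, qd])
f = sp.Matrix([qd, sp.sin(q) - qd])   # drift vector field (unit constants)
g = sp.Matrix([0, 1])                 # control vector field

def lie(h, vf):
    """Lie derivative L_vf h = (dh/dx) vf of a scalar h."""
    return (sp.Matrix([h]).jacobian(x) @ vf)[0]

h = q
Lfh = lie(h, f)          # qd
L2fh = lie(Lfh, f)       # sin(q) - qd
LgLfh = lie(Lfh, g)      # 1, so the input appears in the second derivative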
For the remainder of this section, we assume that the vector relative degree of the output is
well-defined (i.e., all components have the same relative degree) on X with respect to (2.10),
and it is equal to 2. Under this assumption, differentiation of the output along the system
dynamics yields
ẏ = L_f h(x),
ÿ = L_f² h(x) + L_g L_f h(x) u,        (2.39)
where L g L f h : X → Rm×m is referred to as the decoupling matrix, which is invertible on
X provided the relative degree 2 condition holds. Applying the control

u = (L_g L_f h(x))^{−1} (−L_f² h(x) + μ),        (2.40)

where μ ∈ R^m is an auxiliary input to be specified, and defining the coordinates
η := [yᵀ ẏᵀ]ᵀ ∈ R^{2m} yields the linear system

η̇ = Fη + Gμ, (2.41)

with
F = [ 0_{m×m}  I_{m×m} ; 0_{m×m}  0_{m×m} ],   G = [ 0_{m×m} ; I_{m×m} ],        (2.42)
where 0m×m ∈ Rm×m is an m × m matrix of zeros. Choosing the auxiliary input as μ =
−K η, such that the closed-loop system matrix A := F − G K is Hurwitz renders the origin
of η̇ = Aη exponentially stable. From standard converse Lyapunov theorems, it follows that,
for any symmetric positive definite Q ∈ R2m×2m , there exists a symmetric positive definite
P ∈ R2m×2m solving the Lyapunov equation

P A + AᵀP = −Q,        (2.43)

such that
V(η) = ηᵀPη        (2.44)
is a Lyapunov function certifying exponential stability of the closed-loop system’s origin. In
particular, taking the Lie derivative of V along the closed-loop vector field and leveraging
the Lyapunov equation (2.43) leads to

V̇(η) = 2ηᵀP Aη = ηᵀ(P A + AᵀP)η = −ηᵀQη ≤ −λmin(Q)‖η‖².        (2.45)

Hence, V is also an ES-CLF for the output dynamics as V satisfies

λmin(P)‖η‖² ≤ V(η) ≤ λmax(P)‖η‖², ∀η ∈ R^{2m},        (2.46a)



inf_{u∈R^m} V̇(η, u) < −c λmin(Q)‖η‖², ∀η ∈ R^{2m} \ {0},        (2.46b)

with c ∈ (0, 1), which can be used to generate a controller that exponentially drives η to
zero by solving the QP
min_{u∈R^m} (1/2)‖u‖²
subject to V̇(η, u) ≤ −c λmin(Q)‖η‖².        (2.47)
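Numerically, the ES-CLF V(η) = ηᵀPη is obtained by solving the Lyapunov equation (2.43). Below is a minimal sketch of our own, assuming scipy, for an output of dimension m = 1 and an arbitrarily chosen stabilizing gain K.

import numpy as np
from scipy.linalg import solve_continuous_lyapunov

m = 1                                              # output dimension (illustrative)
F = np.block([[np.zeros((m, m)), np.eye(m)],
              [np.zeros((m, m)), np.zeros((m, m))]])
G = np.vstack([np.zeros((m, m)), np.eye(m)])
K = np.hstack([2.0 * np.eye(m), 3.0 * np.eye(m)])  # any gain making F - G K Hurwitz
A = F - G @ K

Q = np.eye(2 * m)
# scipy solves a X + X a^T = q, so pass a = A^T, q = -Q to get P A + A^T P = -Q
P = solve_continuous_lyapunov(A.T, -Q)

V = lambda eta: eta @ P @ eta                      # ES-CLF from (2.44)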

2.3.2 Backstepping

In this section, we discuss backstepping as a means to systematically construct CLFs for a


particular class of nonlinear control systems. Backstepping is a recursive design procedure
applicable to classes of nonlinear control systems with a hierarchical structure, in which
higher order states act as “virtual” control inputs for the lower order dynamics. This approach
allows for systematically constructing a CLF for the overall system using only a CLF for
the lowest order subsystem. To this end, consider a nonlinear control affine system in strict
feedback form
ẋ = f 0 (x) + g0 (x)ξ
(2.48)
ξ̇ = f 1 (x, ξ ) + g1 (x, ξ )u,
where (x, ξ ) ∈ Rn × R p is the system state, u ∈ Rm is the control input, and the func-
tions f 0 : Rn → Rn , g0 : Rn → Rn× p , f 1 : Rn × R p → R p , g1 : Rn × R p → R p×m
characterize the system dynamics. We assume that the functions characterizing the dynam-
ics are locally Lipschitz and that g1 is pseudo-invertible on its domain, with
g1(x, ξ)† := (g1(x, ξ)ᵀ g1(x, ξ))^{−1} g1(x, ξ)ᵀ denoting the Moore-Penrose pseudo-inverse. Our main
objective is to design a controller that renders x = 0 an asymptotically stable equilibrium
point for the closed-loop system. The backstepping methodology proceeds by viewing ξ as
a “virtual” control input for the subsystem

ẋ = f 0 (x) + g0 (x)ξ, (2.49)

and then designing a controller k0 : Rn → R p that would render the origin of the closed-
loop system
ẋ = f 0 (x) + g0 (x)k0 (x),
asymptotically stable, provided we could simply choose ξ = k0 (x).
Let V0 : Rn → R≥0 be a twice-continuously differentiable CLF for the first subsystem
in the sense that there exists a twice-continuously differentiable controller k0 : Rn → R p
and α1 , α2 , α3 ∈ K∞ such that, for all x ∈ Rn ,

α1(‖x‖) ≤ V0(x) ≤ α2(‖x‖),



L_{f0} V0(x) + L_{g0} V0(x)k0(x) ≤ −α3(‖x‖).


Now define the coordinate transformation z := ξ − k0 (x), which represents the difference
between ξ and the desired control we would implement on the first subsystem (2.49) if ξ
were directly controllable. Using the (x, z) ∈ Rn × R p coordinates, system (2.48) can be
represented as

ẋ = f0(x) + g0(x)k0(x) + g0(x)z,
ż = f1(x, ξ) + g1(x, ξ)u − (∂k0/∂x)(x)( f0(x) + g0(x)k0(x) + g0(x)z ).        (2.50)
Now consider the composite Lyapunov function candidate

V(x, z) = V0(x) + (1/2) zᵀz,

whose time derivative is

V̇ = L_{f0} V0(x) + L_{g0} V0(x)k0(x) + L_{g0} V0(x)z
   + zᵀ( f1(x, ξ) + g1(x, ξ)u − (∂k0/∂x)(x)( f0(x) + g0(x)k0(x) + g0(x)z ) ).        (2.51)

Choosing the control as



u = g1(x, ξ)† ( −f1(x, ξ) + (∂k0/∂x)(x)( f0(x) + g0(x)k0(x) + g0(x)z ) − L_{g0} V0(x)ᵀ − K z ),        (2.52)
where K ∈ R p× p is positive definite, yields

V̇ = L_{f0} V0(x) + L_{g0} V0(x)k0(x) − zᵀK z
   ≤ −α3(‖x‖) − λmin(K)‖z‖²,        (2.53)

which implies that V is a Lyapunov function for the transformed system (2.50). Hence, V
is also a CLF for the transformed system (2.50) as V satisfies

α1(‖x‖) + (1/2)‖z‖² ≤ V(x, z) ≤ α2(‖x‖) + (1/2)‖z‖², ∀(x, z) ∈ R^n × R^p,        (2.54a)
inf_{u∈R^m} V̇(x, z, u) < −c(α3(‖x‖) + λmin(K)‖z‖²), ∀(x, z) ∈ (R^n \ {0}) × (R^p \ {0}).        (2.54b)
Although the preceding discussion focused on hierarchical systems with only 2 subsystems,
the same steps can be recursively applied to systems with an arbitrary finite number q ∈ N
of subsystems of the form

ẋ = f 0 (x) + g0 (x)ξ1
ξ̇1 = f 1 (x, ξ1 ) + g1 (x, ξ1 )ξ2
ξ̇2 = f 2 (x, ξ1 , ξ2 ) + g2 (x, ξ1 , ξ2 )ξ3
..
.
ξ̇q = f q (x, ξ1 , ξ2 , . . . , ξq ) + gq (x, ξ1 , ξ2 , . . . , ξq )u.

2.3.3 Design Example

We close this section with a simple example that demonstrates the construction of a CLF
using the methods outlined thus far. We consider an inverted pendulum with angular position
q ∈ R whose equations of motions can be expressed as

mℓ² q̈ − mgℓ sin(q) = τ − b q̇,        (2.55)

where m ∈ R>0 is the pendulum’s mass, assumed to be concentrated at its tip, ℓ ∈ R>0
is the pendulum’s length, g ∈ R>0 is the acceleration due to gravity, b ∈ R>0 is a viscous
damping coefficient, and τ ∈ R is the torque applied to the base of the pendulum. Taking
the state of the system as x = [q q̇]ᵀ and the control input as u = τ allows the pendulum
to be expressed in control-affine form (2.10) as

ẋ = [ q̇ ; (g/ℓ) sin(q) − (b/(mℓ²)) q̇ ] + [ 0 ; 1/(mℓ²) ] u,        (2.56)

where the first term is f(x) and the second is g(x).

Our main objective is to design a feedback controller u = k(x) that drives the pendu-
lum to q = 0 using a CLF, which we’ll accomplish using both feedback linearization and
backstepping.

Feedback Linearization
To proceed with the feedback linearization approach, we define the output y = h(x) = q
and compute the Lie derivative of y to get ẏ = L f h(x) = q̇. As the input u does not appear
in ẏ, the relative degree of the output is greater than one, and we proceed by computing the
Lie derivative of L f h to get

ÿ = q̈ = (g/ℓ) sin(q) − (b/(mℓ²)) q̇ + (1/(mℓ²)) u,        (2.57)

where the first two terms constitute L_f² h(x) and the coefficient multiplying u is L_g L_f h(x).

As L_g L_f h(x) ≠ 0 for all x ∈ X = R², the output has relative degree 2 on X. Applying the
feedback

u = (L_g L_f h(x))^{−1} (−L_f² h(x) + μ) = mℓ² ( −(g/ℓ) sin(q) + (b/(mℓ²)) q̇ + μ ),

and defining η = [y ẏ]ᵀ yields the linear control system

η̇ = [ 0 1 ; 0 0 ] η + [ 0 ; 1 ] μ.

Using the proportional-derivative controller μ = −K_p y − K_d ẏ, with K_p, K_d ∈ R>0,
such that the origin of the closed-loop linear system

η̇ = [ 0 1 ; −K_p −K_d ] η =: Aη,

is exponentially stable, allows for constructing a Lyapunov function V(η) = ηᵀPη, where
P ∈ R^{2×2} is the positive definite solution to the Lyapunov equation (2.43) for any positive
definite Q ∈ R^{2×2}. For this particularly simple output, we have η = [y ẏ]ᵀ = [q q̇]ᵀ = x.
Thus, converting back to the original coordinates x, the Lyapunov function for the original
system is given by V(x) = xᵀPx, which is, by definition, a CLF for the original nonlinear
control system.
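To see the full pipeline at work, the following sketch (our own construction, with arbitrary parameter values) builds P from the Lyapunov equation and simulates the pendulum under the pointwise min-norm controller (2.36) induced by the resulting ES-CLF.

import numpy as np
from scipy.linalg import solve_continuous_lyapunov

m, ell, grav, b = 1.0, 1.0, 9.8, 0.1          # pendulum parameters (illustrative)
f = lambda x: np.array([x[1], grav/ell*np.sin(x[0]) - b/(m*ell**2)*x[1]])
g = np.array([0.0, 1.0/(m*ell**2)])

Kp, Kd = 1.0, 2.0
A = np.array([[0.0, 1.0], [-Kp, -Kd]])
P = solve_continuous_lyapunov(A.T, -np.eye(2))   # P A + A^T P = -I
c = 0.5                                          # rate constant as in (2.46b)

def k(x):
    LfV = 2.0 * (P @ x) @ f(x)
    LgV = 2.0 * (P @ x) @ g
    a = LfV + c * (x @ x)             # ES-CLF residual at u = 0 (lambda_min(Q) = 1)
    if a <= 0 or abs(LgV) < 1e-12:    # the CLF property makes a <= 0 whenever LgV = 0
        return 0.0
    return -a / LgV                   # scalar min-norm input, as in (2.36)

x = np.array([0.8, 0.0])
for _ in range(10000):                # forward-Euler simulation, dt = 1e-3
    x = x + 1e-3 * (f(x) + g * k(x))
print(x)                              # x approaches the origin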
Backstepping
To proceed with the backstepping approach, we first represent the system in strict-feedback
form with state (x, ξ ) = (q, q̇) and control u = τ as

ẋ = ξ,
ξ̇ = (g/ℓ) sin(x) − (b/(mℓ²)) ξ + (1/(mℓ²)) u.        (2.58)
We then design the “virtual” controller for the first subsystem k0(x) = −K_p x with
K_p ∈ R>0, whose stability can be certified using the Lyapunov function V0(x) = (1/2)x². This
function is twice-continuously differentiable and satisfies the Lyapunov conditions with
α1(s) = α2(s) = (1/2)s² and α3(s) = K_p s². We next introduce the coordinate transformation
z = ξ − k0 (x) = ξ + K p x with dynamics

ż = ξ̇ − (∂k0/∂x)(x) ẋ
  = (g/ℓ) sin(x) − (b/(mℓ²)) ξ + (1/(mℓ²)) u + K_p ξ.        (2.59)
Thus, the dynamics in the (x, z) coordinates can be expressed as

ẋ = z − K_p x,
ż = (g/ℓ) sin(x) − (b/(mℓ²)) ξ + (1/(mℓ²)) u + K_p z − K_p² x.        (2.60)

Now consider the Lyapunov function candidate V(x, z) = (1/2)x² + (1/2)z², whose time
derivative is

V̇ = x(z − K_p x) + z( (g/ℓ) sin(x) − (b/(mℓ²)) ξ + (1/(mℓ²)) u + K_p z − K_p² x ).        (2.61)

Taking the control as

u = mℓ² ( −(g/ℓ) sin(x) + (b/(mℓ²)) ξ − K_p z + K_p² x − x − K_d z ),        (2.62)

with K d ∈ R>0 then yields


V̇ = −K_p x² − K_d z².        (2.63)
Hence, V is a CLF for the original nonlinear control system as it satisfies the conditions
from (2.54) with α1(s) = α2(s) = (1/2)s², α3(s) = K_p s², and K = K_d.
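A quick numerical check of the identity (2.63) at randomly sampled states (a sketch of our own, with arbitrary constants):

import numpy as np

m, ell, grav, b, Kp, Kd = 1.0, 1.0, 9.8, 0.1, 1.0, 1.0   # illustrative constants

def u_bs(x, xi):
    """Backstepping controller (2.62)."""
    z = xi + Kp * x
    return m*ell**2 * (-grav/ell*np.sin(x) + b/(m*ell**2)*xi
                       - Kp*z + Kp**2*x - x - Kd*z)

rng = np.random.default_rng(0)
for x, xi in rng.normal(size=(5, 2)):
    z = xi + Kp * x
    xidot = grav/ell*np.sin(x) - b/(m*ell**2)*xi + u_bs(x, xi)/(m*ell**2)
    Vdot = x*xi + z*(xidot + Kp*xi)               # d/dt of V = x^2/2 + z^2/2
    assert np.isclose(Vdot, -Kp*x**2 - Kd*z**2)   # matches (2.63)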

2.4 Notes

In this chapter we introduced background on nonlinear dynamical systems as well as the


fundamentals of Lyapunov theory in both the analysis and control of nonlinear systems. For
a more in-depth treatment of ordinary differential equations we refer the reader to [1, 2],
whereas more details on nonlinear systems analysis can be found in [3–6]. The standard
definitions for stability of an equilibrium point are typically given an ε–δ characterization.
In this book we have opted instead to leverage comparison functions in Sect. 2.1.1, which
provide a concise and elegant framework for characterizing stability. Our choice to use
comparison functions is also motivated by the characterizations of safety presented in the
following chapter and its duality with stability. One of the earliest works that leverages
comparison functions to define notions of stability is the book by Hahn [7] and the work by
Sontag in [8] made the use of comparison functions more common in the control-theoretic
literature. A survey of comparison functions, including a more complete history and various
technical results, can be found in [9].
The notion of Control Lyapunov Function (CLF) discussed in Sect. 2.2 was first intro-
duced by Artstein [10] in 1983, where it was shown that the existence of a continuously
differentiable CLF is sufficient for the existence of a controller, locally Lipschitz everywhere
except possibly the origin, that asymptotically stabilizes the origin of the closed-loop sys-
tem. The results in [10], however, are non-constructive in the sense that no explicit formula
for constructing such a controller from a CLF is provided. In 1989, Sontag [11] provided a
“universal” formula using what is now commonly referred to as Sontag’s Formula to explicitly
construct a controller from a given CLF. Other universal constructions of controllers
from CLFs were proposed in [12], which introduced the point-wise min-norm controller that
selects, at each state, the control value of minimum norm satisfying the CLF conditions. It
was shown in [12] that such controllers are inverse optimal in the sense that any point-wise
min-norm controller is also the optimal controller for a meaningful optimal control prob-
lem. Such controllers have shown a large amount of practical success in controlling complex
nonlinear systems, such as bipedal robots [13].
Although it was clear in [12] that the point-wise min-norm controller is the solution to
a convex optimization problem (2.19), namely a quadratic program (QP), explicitly solving
these optimization problems to generate the corresponding control actions only became pop-
ular in the last decade [14–19]. The shift from traditional closed-form feedback controllers
to optimization-based feedback controllers has been motivated by the ability to incorporate
multiple, potentially competing, objectives into a single controller by carefully selecting
the cost function and constraints in the resulting optimization problem. For example, the
authors of [15] impose torque limits on the CLF controller for bipedal robots from [13]
by embedding both the CLF conditions and torque limits as constraints in a QP, resulting
in the relaxed CLF-QP from (2.37). Another example motivating this shift is the ability to
combine CLFs with the control barrier functions discussed in the next chapter to balance
performance and safety objectives.
The Karush-Kuhn-Tucker (KKT) conditions used to derive the closed-form solution to
the CLF-QP (2.19) can be found in popular books on convex optimization, see, for example,
that of Boyd and Vandenberghe [20] or that of Bertsekas [21]. Results establishing continuity
and/or smoothness of more general optimization-based controllers can be found in [16, 22,
23]. Our proof of Theorem 2.6 is inspired by those in [24, 25]. More details on the small
control property for ensuring continuity of the CLF-QP controller at the origin can be found
in [23, 26].
The feedback linearization technique described in Sect. 2.3.1 has been a popular tool for
nonlinear control design since its inception, which can be traced back to the work of Brockett
[27]. For an in-depth treatment of feedback linearization, we refer the reader to books on
geometric nonlinear control [28, 29]. Traditionally, a criticism of the feedback linearization
approach is that it may cancel useful nonlinearities, such as damping terms that could oth-
erwise aid the stabilization objective. More recently, works such as [13] have demonstrated
the effectiveness of this methodology as a means to generate CLFs for nonlinear control
systems that are feedback equivalent to linear systems. Our discussion on the connections
between feedback linearization and CLFs follows that of [13].
Backstepping, reviewed in Sect. 2.3.2, was introduced in the early 1990s as an alternative
to feedback linearization-based designs. By recursively constructing a CLF by “backstep-
ping” through each subsystem, such designs are often able to avoid the cancellation of
useful nonlinearities that may be cancelled by feedback linearization-based techniques. For
a more in-depth treatment of backstepping we refer the reader to [26], and for a more
complete survey of nonlinear control designs we refer the reader to [30]. Our discussion on the
connections between backstepping and CLFs is inspired by that in [31].

References

1. Arnold VI (1978) Ordinary differential equations. MIT Press


2. Hirsch MW, Smale S (1974) Differential equations, dynamical systems, and linear algebra.
Academic Press
3. Khalil HK (2002) Nonlinear systems, 3rd ed. Prentice Hall
4. Sontag ED (2013) Mathematical control theory: deterministic finite dimensional systems.
Springer Science & Business Media
5. Slotine JJE, Li W (1991) Applied nonlinear control. Prentice Hall
6. Haddad WM, Chellaboina VS (2011) Nonlinear dynamical systems and control: a lyapunov-
based approach. Princeton University Press
7. Hahn W (1967) Stability of motion. Springer
8. Sontag ED (1989) Smooth stabilization implies coprime factorization. IEEE Trans Autom Con-
trol 34(4):435–443
9. Kellett CM (2014) A compendium of comparison function results. Math Control Signals Syst
26:339–374
10. Artstein Z (1983) Stabilization with relaxed controls. Nonlinear Anal Theory Methods Appl
7(11):1163–1173
11. Sontag ED (1989) A universal construction of Artstein’s theorem on nonlinear stabilization. Syst
Control Lett 13:117–123
12. Freeman RA, Kokotovic PV (1996) Inverse optimality in robust stabilization. SIAM J Control
Optim 34(4):1365–1391
13. Ames AD, Galloway K, Sreenath K, Grizzle JW (2014) Rapidly exponentially stabilizing control
lyapunov functions and hybrid zero dynamics. IEEE Trans Autom Control 59(4):876–891
14. Ames AD, Powell M (2013) Towards the unification of locomotion and manipulation through
control lyapunov functions and quadratic programs. Control of cyber-physical systems, pp 219–
240
15. Galloway K, Sreenath K, Ames AD, Grizzle JW (2015) Torque saturation in bipedal robotic
walking through control lyapunov function-based quadratic programs. IEEE Access, vol 3
16. Morris BJ, Powell MJ, Ames AD (2015) Continuity and smoothness properties of nonlinear
optimization-based feedback controllers. In: Proceedings of the IEEE conference on decision
and control, pp 151–158
17. Nguyen Q, Sreenath K (2015) Optimal robust control for bipedal robots through control lyapunov
function based quadratic programs. In: Robotics: science and systems
18. Ames AD, Grizzle JW, Tabuada P (2014) Control barrier function based quadratic programs with
application to adaptive cruise control. In: Proceedings of the IEEE conference on decision and
control, pp 6271–6278
19. Ames AD, Xu X, Grizzle JW, Tabuada P (2017) Control barrier function based quadratic programs
for safety critical systems. IEEE Trans Autom Control 62(8):3861–3876
20. Boyd S, Vandenberghe L (2004) Convex optimization. Cambridge University Press
21. Bertsekas DP (2016) Nonlinear programming, 3rd ed. Athena Scientific
22. Hager WH (1979) Lipschitz continuity for constrained processes. SIAM J Control Optim
17(3):321–338

23. Jankovic M (2018) Robust control barrier functions for constrained stabilization of nonlinear
systems. Automatica 96:359–367
24. Molnar TG, Kiss AK, Ames AD, Orosz G (2022) Safety-critical control with input delay in
dynamic environment. IEEE Trans Control Syst Technol
25. Tan X, Cortez WS, Dimarogonas DV (2022) High-order barrier functions: robustness, safety and
performance-critical control. IEEE Trans Autom Control 67(6):3021–3028
26. Krstić M, Kanellakopoulos I, Kokotović P (1995) Nonlinear and adaptive control design. Wiley
27. Brockett R (1978) Feedback invariants for nonlinear systems. IFAC Proc Vol 11(1):1115–1120
28. Isidori A (1995) Nonlinear control systems, 3rd ed. Springer
29. Nijmeijer H, van der Schaft A (2015) Nonlinear dynamical control systems. Springer
30. Kokotovic PV, Arcak M (2001) Constructive nonlinear control: a historical perspective. Auto-
matica 37:637–662
31. Taylor AJ, Ong P, Molnar TG, Ames AD (2022) Safe backstepping with control barrier functions.
In: Proceedings of the IEEE conference on decision and control, pp 5775–5782
3 Safety-Critical Control

In the preceding chapter, we discussed the fundamentals of Lyapunov theory, and how
these ideas can be used to design controllers enforcing stability of dynamical systems. In
the present chapter, we discuss how such ideas can be transposed to address the problem
of safety. Informally, safety can be thought of as requiring a system to never do anything
“bad.” This abstract notion is dependent on the application under consideration. For example,
in autonomous driving, safety may correspond to an autonomous vehicle never leaving
its current lane, whereas for a robot navigating in a cluttered environment, safety may
correspond to avoiding collisions with obstacles. In this book, the notion of safety is linked
to set invariance, in the sense that a system is safe if it never leaves a set deemed “good”. In the
last chapter of the book (Chap. 9), we will briefly discuss the satisfaction of temporal logic
formulas, which includes (is strictly more expressive than) safety, and is usually referred
to as “correctness”. Safety is defined in Sect. 3.1, before control barrier functions used to
enforce it are introduced in Sects. 3.2 and 3.3. Final remarks, references, and suggestions
for further reading are included in Sect. 3.4.

3.1 Safety and Set Invariance

In this section, we provide a definition of safety for dynamical systems and provide conditions
under which dynamical systems can be guaranteed to be safe. Formally, the notion of safety
is linked with the fundamental concept of set invariance.

Definition 3.1 (Forward invariance and safety) A set C ⊂ Rn is said to be forward invariant
for (2.1) if, for each x0 ∈ C, the trajectory x : I (x0 ) → Rn with x(0) = x0 satisfies x(t) ∈
C, for all t ∈ I (x0 ). If C is forward invariant, then the system is said to be safe on C.


Similar to the notions of stability from Definition 2.5, the above definition of safety
requires knowledge of the system trajectory, which motivates the development of conditions
imposed on the system vector field f that can be used to certify safety. For general closed
subsets C ⊂ Rn , such a development relies on the concept of tangent cone to a set.

Definition 3.2 For a closed set C ⊂ R^n, the Bouligand tangent cone¹ to C at a point x ∈ C
is defined as

TC(x) := { v ∈ R^n | lim inf_{τ→0⁺} ‖x + τv‖_C / τ = 0 },        (3.1)

where

‖x‖_C := inf_{y∈C} ‖x − y‖

is the distance from a point x ∈ R^n to the set C.

Note that, if x ∈ Int(C), then TC(x) = R^n, and if x ∉ C, then TC(x) = ∅. It is only on


the boundary of C that TC (x) becomes interesting. Informally, for x ∈ ∂C the Bouligand
tangent cone TC (x) is the set of all vectors v ∈ Rn that are tangent to or point into C. The
following result provides necessary and sufficient conditions for the forward invariance of
closed sets for dynamical systems with a locally Lipschitz vector field.

Theorem 3.1 (Nagumo’s Theorem) A closed set C ⊂ Rn is forward invariant for (2.1) if
and only if for all x ∈ ∂C
f (x) ∈ TC (x). (3.2)

The above result, often referred to as Nagumo’s Theorem or Brezis’s Theorem, simply
states that C is forward invariant if and only if, for each point on the boundary of C, the
vector field f at such points is either tangent to or points into C. A challenge with directly
applying Theorem 3.1 is that for a general closed set C ⊂ Rn , the computation of the tangent
cone may be nontrivial. To provide more practical conditions to certifying safety, we now
specialize the class of sets whose forward invariance we wish to certify. In particular, we
consider sets C ⊂ Rn that can be expressed as the zero superlevel set of a continuously
differentiable function h : Rn → R as

C = {x ∈ R^n | h(x) ≥ 0},
∂C = {x ∈ R^n | h(x) = 0},        (3.3)
Int(C) = {x ∈ R^n | h(x) > 0}.

¹ The Bouligand tangent cone is also referred to as the contingent cone.



For sets of the form (3.3), it can be shown that, provided² that ∇h(x) ≠ 0 for all x ∈ ∂C,
the tangent cone to C at x ∈ ∂C can be represented as:

TC(x) = {v ∈ R^n | ∇h(x)ᵀv ≥ 0}.        (3.4)

A straightforward application of Theorem 3.1 yields the following result:

Corollary 3.1 Consider a closed set C ⊂ Rn defined as the zero superlevel set of a contin-
uously differentiable function h : R^n → R as in (3.3) and assume that ∇h(x) ≠ 0 for all
x ∈ ∂C. Then C is forward invariant for (2.1) if and only if, for all x ∈ ∂C,

L f h(x) ≥ 0. (3.5)

The above condition can be useful for verifying the safety of a dynamical system as it
only requires checking that (3.5) holds on the boundary of C. However, the fact that this
condition is defined only on the boundary of C makes it challenging to use as the basis
for a control design. Ideally, it would be useful to establish invariance conditions over the
entirety of C (or even all of Rn ) so that a single continuous controller u = k(x) could be
used to enforce safety, and possibly other control objectives. One approach would be to
simply ensure that (3.5) holds over all of C; however, this is restrictive as such an approach
would render every superlevel set of h forward invariant (rather than only the zero level set).
Extending condition (3.5) to the entirety of C in the least restrictive fashion requires the
introduction of a new comparison function.

Definition 3.3 (Extended class K, K∞ functions) A continuous function α : (−b, a) → R,


a, b ∈ R>0, is said to be an extended class K function, denoted by α ∈ Kᵉ, if α(0) = 0 and
α(·) is strictly increasing. A continuous function α : R → R is said to be an extended class
K∞ function, denoted by α ∈ K∞ᵉ, if α(0) = 0, α(·) is strictly increasing,
lim_{r→∞} α(r) = ∞, and lim_{r→−∞} α(r) = −∞.

Essentially, an extended class K∞ function is a class K function defined on the entire real
line, and facilitates the definition of a barrier function, which plays a role dual to that of a
Lyapunov function for establishing forward invariance and safety.

Definition 3.4 (Barrier function) Let h : Rn → R be a continuously differentiable func-


tion defining a set C ⊂ R^n as in (3.3) such that ∇h(x) ≠ 0 for all x ∈ ∂C. Then, h is said to
be a barrier function for (2.1) on C if there exists an α ∈ K∞ᵉ such that for all x ∈ R^n

L f h(x) ≥ −α(h(x)). (3.6)

² The condition that ∇h(x) ≠ 0 for all x ∈ ∂C is equivalent to 0 being a regular value of h, which
ensures that h⁻¹({0}) = ∂C is an embedded submanifold of R^n.

Introducing an extended class K∞ function on the right-hand-side of (3.6) allows t →


h(x(t)) to decrease along trajectories t → x(t) of (2.1) but never become negative. That is,
the condition in (3.6) allows the system trajectory to approach the boundary of C, but never
cross it, thereby ensuring safety in a minimally restrictive fashion. This extension is subtle,
but plays an important role in extending barrier functions to control systems, as it directly
enlarges the class of functions that can serve as a barrier function, thereby directly enlarging
the set of controllers that can be used to render the system safe.

Remark 3.1 The requirement in Definition 3.4 that the barrier condition (3.6) holds on all of
Rn can be generalized so that the condition is only required to hold on some open set D ⊃ C.
This generalization also permits the use of an extended class Kᵉ function on the right-hand-side
of the inequality in (3.6) provided that such a function is defined on all of D.

The following result constitutes the main result with regard to barrier functions and shows
that the existence of such a function is sufficient to certify the forward invariance of C.

Theorem 3.2 If h is a barrier function for (2.1), then C is forward invariant.

Proof The properties of α and (3.6) ensure that for x ∈ ∂C, L f h(x) ≥ 0 and the conclusion
of the theorem follows from Theorem 3.1. 
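As a simple sanity check of Theorem 3.2, consider the scalar system ẋ = −x³ with h(x) = x, so that C = R≥0 and ḣ = −h³ ≥ −α(h) for α(r) = r³ ∈ K∞ᵉ. The short simulation below (a sketch of our own) shows that trajectories starting in C remain in C, while a trajectory starting outside C approaches it, as formalized in Proposition 3.1 below.

import numpy as np

f = lambda x: -x**3              # here h(x) = x and hdot = -h^3 = -alpha(h)
for x0 in (2.0, 0.5, -0.5):      # the last initial condition lies outside C
    x = x0
    for _ in range(5000):        # forward Euler with dt = 1e-3
        x += 1e-3 * f(x)
    print(x0, '->', round(x, 4)) # stays in, or converges toward, C = [0, inf)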

One may note that in Definition 3.4, condition (3.6) is required to hold over Rn (or on some
open set D ⊃ C as noted in Remark 3.1), rather than only on C. The benefit of enforcing such
a condition over a larger set containing C is that it endows the barrier function with a certain
degree of robustness in the sense that trajectories that begin outside C will asymptotically
approach C in the limit as time goes to infinity. To formalize this idea we introduce the
notion of stability with respect to sets.

Definition 3.5 A closed forward invariant set C ⊂ Rn is said to be stable for (2.1) if for
each ε ∈ R>0 there exists a δ ∈ R>0 such that

‖x0‖_C < δ =⇒ ‖x(t)‖_C < ε, ∀t ∈ R≥0.        (3.7)

Definition 3.6 A closed forward invariant set C ⊂ Rn is said to be asymptotically stable


for (2.1) if it is stable and δ is chosen such that

‖x0‖_C < δ =⇒ lim_{t→∞} ‖x(t)‖_C = 0.        (3.8)

Using the above definitions of stability with respect to sets allows for establishing the
following result.

Proposition 3.1 Let h be a barrier function for (2.1) on a set C ⊂ Rn as in (3.3) and suppose
either one of the following conditions hold:

• the vector field f is forward complete;


• the set C is compact.

Then, C is asymptotically stable for (2.1).

3.2 Control Barrier Functions

In this section we discuss an extension of barrier functions to control systems of the form
(2.10)
ẋ = f (x) + g(x)u,
via the notion of a control barrier function (CBF). Before introducing CBFs we must for-
mally define what it means for a control system with inputs (as opposed to a closed-loop
system) to be safe on a set C. Note that the definition of forward invariance introduced in
the previous section cannot be directly applied to (2.10) as the trajectories of (2.10) cannot
be determined without first fixing a controller. Rather than fixing a feedback controller for
(2.10) and then studying the safety of the closed-loop system, we wish to study the intrinsic
properties of (2.10) and determine if it is possible to design a feedback controller that ren-
ders a set forward invariant for the resulting closed-loop system. Such a property is captured
using the notion of controlled invariance.

Definition 3.7 (Controlled invariance) A set C ⊂ Rn is said to be controlled invariant


for (2.10) if there exists a locally Lipschitz feedback controller k : Rn → U such that C
is forward invariant for the closed-loop system ẋ = f (x) + g(x)k(x). If C is controlled
invariant for (2.10) then C is said to be a safe set.

Analogous to how barrier functions certify the forward invariance of sets for closed-
loop systems, CBFs certify the controlled invariance, and thus the safety, of sets for control
systems.

Definition 3.8 (Control barrier function) Let h : Rn → R be a continuously differentiable


function defining a set C ⊂ R^n as in (3.3) such that ∇h(x) ≠ 0 for all x ∈ ∂C. Then, h is
said to be a control barrier function (CBF) for (2.10) on C if there exists an α ∈ K∞ᵉ such
that for all x ∈ R^n

sup_{u∈U} {L_f h(x) + L_g h(x)u} > −α(h(x)).        (3.9)

In other words, h is a CBF for (2.10) if for each x ∈ Rn there exists an input u ∈ U
satisfying the barrier condition

L f h(x) + L g h(x)u ≥ −α(h(x)).

As with the definition of barrier functions from the previous section, the CBF condition
(3.9) can be generalized to hold on some open set D ⊃ C rather than all of Rn (see Remark
3.1).
Similar to CLFs, the definition of a CBF allows for defining the set-valued map K cbf :
Rn ⇒ U as
K cbf (x) = {u ∈ U | L f h(x) + L g h(x)u ≥ −α(h(x))}, (3.10)
which assigns to each x ∈ Rn a set K cbf (x) ⊂ U of control values satisfying condition (3.9).
We again note the distinction between the strict inequality used in the definition of a CBF
(3.9) and the nonstrict inequality used in the definition of K cbf (3.10). The purpose of the
strict inequality is twofold: (1) the strict inequality helps establish Lipschitz continuity of
the resulting QP-based controller; (2) the strict inequality ensures that the supremum in (3.9)
can actually be achieved by a given controller. Similar to CLFs, determining if h is a CBF
depends heavily on the behavior of L g h. When U = Rm , the condition in (3.9) is equivalent
to
∀x ∈ Rn : L g h(x) = 0 =⇒ L f h(x) > −α(h(x)), (3.11)
implying that whenever L_g h(x) ≠ 0 one can always pick some u to satisfy the CBF condition
from (3.9). Determining the validity of h as a CBF when U ≠ R^m is a much more challenging
problem–some insights are provided in Sect. 3.4. The main theoretical result with regard
to CBFs is that the existence of such a function implies the controlled invariance of C.

Theorem 3.3 Let h : Rn → R be a CBF for (2.10) on a set C ⊂ Rn . Then, any locally
Lipschitz controller u = k(x) satisfying k(x) ∈ K cbf (x) for all x ∈ Rn renders C forward
invariant for the closed-loop system.

Proof With u = k(x), the closed-loop vector field f_cl(x) := f(x) + g(x)k(x) is locally
Lipschitz and satisfies L_{f_cl} h(x) ≥ −α(h(x)) for all x ∈ R^n. Hence, h is a barrier function
for ẋ = f_cl(x) and forward invariance follows from Theorem 3.2. □

One of the practical benefits of CBFs is their ability to act as a safety filter for a predesigned
control policy k0 : Rn → Rm , which may not have been designed to guarantee safety a
priori. In particular, CBFs allow one to modify such a policy in a minimally invasive fashion
to guarantee safety by filtering out unsafe actions from k0 through the optimization-based
controller
k(x) = arg min_{u∈U} (1/2)‖u − k0(x)‖²
       subject to L_f h(x) + L_g h(x)u ≥ −α(h(x)),        (3.12)

which is a QP for a given x if U = R^m or U is a convex polytope. For the controller in
(3.12) to provide safety guarantees, it is important that it be locally Lipschitz, as required
by Theorem 3.3. Fortunately, when U = R^m the QP in (3.12) is a special case of (2.20),
and thus has a closed-form solution given by

k(x) = { k0(x),                                          if ψ(x) ≥ 0,
       { k0(x) − (ψ(x)/‖L_g h(x)‖²) L_g h(x)ᵀ,           if ψ(x) < 0,        (3.13)

where ψ(x) := L_f h(x) + L_g h(x)k0(x) + α(h(x)), and is locally Lipschitz.
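The filter (3.13) is again a one-branch computation; the following numpy sketch is our own (for U = R^m, with the model terms supplied as callables):

import numpy as np

def safety_filter(x, k0, f, g, h, grad_h, alpha):
    """Minimally invasive CBF filter (3.13) applied to a nominal policy k0."""
    Lfh = grad_h(x) @ f(x)
    Lgh = grad_h(x) @ g(x)                    # row vector L_g h(x)
    psi = Lfh + Lgh @ k0(x) + alpha(h(x))
    if psi >= 0:
        return k0(x)                          # nominal input is already safe
    return k0(x) - (psi / (Lgh @ Lgh)) * Lgh  # project onto the safe halfspace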


Rather than filtering a pre-designed control policy through the QP in (3.12), it is also
possible to unify both CBF (safety) and CLF (performance) objectives in a single QP. Since
the constraints corresponding to these objectives may be conflicting (in the sense that there
may not exist a single control value satisfying both constraints simultaneously), it is necessary to
relax one of the objectives by treating the corresponding constraint as a soft constraint. Since
safety objectives are typically “hard” in the sense that they cannot be relaxed, it is common
to relax the performance objective (represented by a CLF) by replacing the standard CLF
condition V̇ ≤ −γ (V ) for some γ ∈ K with the soft constraint V̇ ≤ −γ (V ) + δ, where
δ ∈ R is a slack variable. Taking such an approach, one can solve the QP
(k(x), δ*) = arg min_{u∈U, δ∈R} (1/2)uᵀH(x)u + F(x)ᵀu + pδ²
             subject to L_f h(x) + L_g h(x)u ≥ −α(h(x)),
                        L_f V(x) + L_g V(x)u ≤ −γ(V(x)) + δ,        (3.14)

where H : Rn → Rm×m is locally Lipschitz and H (x) is a positive definite matrix for each
x ∈ Rn , F : Rn → Rm is locally Lipschitz, and p ∈ R>0 is a weight that penalizes the
magnitude of the CLF relaxation, to obtain a controller satisfying the CBF conditions and
(relaxed) CLF conditions.
We close this section by providing two examples of applications where CBFs are well-
suited to address the competing objectives of stability and safety.
Example 3.1 (Adaptive cruise control) A problem that has served as motivation for
the development of CBFs is the adaptive cruise control (ACC) problem. This problem
considers a vehicle on the road tasked with driving at a specified speed, while maintaining
a safe distance behind the preceding vehicle on the road. Solving such a problem requires
designing a feedback controller that balances the potentially competing objectives of driving
at a desired speed (stability) and maintaining an appropriate distance behind the lead vehicle
on the road (safety), and has thus served as a well-motivated scenario for benchmarking CBF-
based controllers.
We formalize the ACC problem by introducing a system with state x = [v d]ᵀ ∈ R², where
v ∈ R is the velocity of the ego (controlled) vehicle and d ∈ R is the distance between the
ego vehicle and the lead vehicle. The dynamics of the system are given by

ẋ = [ v̇ ; ḋ ] = [ −(1/M)( f0 + f1 v + f2 v² ) ; vl − v ] + [ 1/M ; 0 ] u =: f(x) + g(x)u,        (3.15)

where M ∈ R>0 is the mass of the ego vehicle, f 0 , f 1 , f 2 ∈ R>0 are aerodynamic drag
coefficients, and vl ∈ R>0 is the velocity of the lead vehicle. The control input u ∈ R of the
system is the wheel force of the ego vehicle. The specification “drive at a desired speed” can
be encoded as the stabilization of the point v = vd . To design a nominal controller achieving
this stabilization objective, we take a feedback linearization approach by defining the output
y = v − vd . Following the steps outlined in Sect. 2.3.1, we construct a nominal controller
k0 whose stability can be certified using the CLF V(y) = (1/2)y². The specification “maintain
a safe distance behind the lead vehicle” can be encoded through the safe set C ⊂ R2 defined
as the zero superlevel set of
h(x) = d − τd v,
where τd ∈ R>0 is the desired time headway.³ Note that h is indeed a CBF when U = R as

L_g h(x) = ∇h(x)ᵀ g(x) = [ −τd  1 ] [ 1/M ; 0 ] = −τd/M,

which satisfies L_g h(x) ≠ 0 for all x ∈ R² and ∇h(x) ≠ 0 for all x ∈ ∂C.
Results from simulations of the closed-loop system under the resulting CBF-based con-
troller from (3.12) are provided in Fig. 3.1. Note that the ego vehicle initially accelerates
from its initial velocity to its desired velocity (left plot); however, as the distance between

Fig. 3.1 Example simulation of the ACC scenario. The left plot illustrates the evolution of the ego
vehicle’s velocity with the dashed line indicating the desired velocity vd = 24 m/s. The right plot
illustrates the evolution of the CBF along the system trajectory, where the dashed line denotes h = 0

³ This is typically chosen as τd = 1.8.

the ego and lead vehicle decreases (right plot), the ego vehicle must slow down to ensure it
remains a safe distance behind the lead vehicle at all times.
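A compact simulation of this scenario is sketched below (our own; the parameter values are illustrative and not those used to generate Fig. 3.1). The nominal controller cancels drag and drives v toward vd, and the closed form (3.13) filters it for safety.

import numpy as np

M, f0, f1, f2 = 1650.0, 0.1, 5.0, 0.25          # illustrative parameters
vl, vd, tau_d, lam, a_cbf = 14.0, 24.0, 1.8, 1.0, 1.0

def step(x, dt=1e-2):
    v, d = x
    Fr = f0 + f1*v + f2*v**2                    # resistive/drag force
    k0 = Fr - lam*M*(v - vd)                    # nominal: v -> vd exponentially
    h = d - tau_d*v                             # CBF h(x) = d - tau_d * v
    Lfh = (vl - v) + tau_d*Fr/M
    Lgh = -tau_d/M
    psi = Lfh + Lgh*k0 + a_cbf*h                # alpha(r) = a_cbf * r
    u = k0 if psi >= 0 else k0 - psi/Lgh        # scalar form of (3.13)
    return x + dt*np.array([(u - Fr)/M, vl - v])

x = np.array([18.0, 80.0])                      # initial speed and headway
for _ in range(4000):
    x = step(x)
print(x, x[1] - tau_d*x[0])                     # h remains nonnegative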
Example 3.2 (Robot motion planning) Another scenario in which CBFs have shown
success in balancing the competing objectives of performance and safety is in the robot
motion planning problem. Here, the objective is to design a feedback controller that drives
a robotic system from some initial configuration to a goal configuration while avoiding
obstacles in the workspace. To illustrate the application of CBFs to this scenario, we consider
an extremely simple version of this problem in which the objective is to drive a planar single
integrator ẋ = u (i.e., we directly control the robot’s velocity) to a goal location while
avoiding a circular obstacle in the workspace. The goal-reaching (stability) objective can
be accomplished by considering the CLF V (x) = 21 x x, whereas the obstacle avoidance
(safety) objective can be accomplished by considering the set C ⊂ R2 defined as the zero
superlevel set of
h(x) = x − xo 2 − ro2 ,
where xo ∈ R2 is the location of the obstacle’s center and ro ∈ R>0 is its radius. This
function is a CBF when U = R2 since ∇h(x) = 2(x − xo ) so that ∇h(x) = 0 for x ∈
∂C, and L g h(x) = 2(x − xo ) , which is non-zero everywhere except for the center of the
obstacle. Hence, although the CBF conditions do not hold on all of Rn , they do hold on
some open set D ⊃ C, which is sufficient to guarantee the controlled invariance of C (see
Remark 3.1).
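The entire controller fits in a few lines. The sketch below (our own, with an arbitrarily placed obstacle) uses the simple nominal policy k0(x) = −x in place of the CLF-QP, and exposes the parameters a and c of α(r) = ar^c discussed next.

import numpy as np

xo, ro = np.array([1.5, 1.5]), 0.5      # obstacle center and radius (illustrative)

def k(x, a=1.0, c=1):
    k0 = -x                             # nominal policy driving x to the origin
    h = (x - xo) @ (x - xo) - ro**2
    Lgh = 2.0*(x - xo)                  # single integrator: L_f h = 0
    psi = Lgh @ k0 + a*h**c             # alpha(r) = a r^c
    return k0 if psi >= 0 else k0 - (psi/(Lgh @ Lgh))*Lgh

x = np.array([3.0, 2.5])
for _ in range(1500):                   # forward Euler, dt = 1e-2
    x = x + 1e-2 * k(x)
print(x)                                # near the origin, having skirted the obstacle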
We use this simple example to demonstrate the impact of changing the hyperparameters
of the CBF, namely, the extended class K∞ function α, as well as some limitations of the
CBF approach, in general. Common choices of extended class K∞ function α include any
power function of the form α(r) = ar^c, with a ∈ R>0 and c any odd number, with c = 1
or c = 3 being favored in practice. In Fig. 3.2 we demonstrate the impact of changing both

Fig. 3.2 Simulation results for the robot motion planning scenario for different choices of extended
class K function α. The left plot illustrates the trajectories of the closed-loop system, where the
gray disk denotes the obstacle, and the right plot shows the evolutions of the CBF along the system
dynamics

the coefficient a and power c from α on the resulting trajectory of the closed-loop system
under the CBF-QP controller from (3.12), with the nominal policy chosen as the CLF-QP
controller. Generally speaking, increasing a leads to more “aggressive” behavior in the
sense that the robot is able to approach the obstacle very quickly, whereas smaller values of
a result in more conservative behavior. For the particular initial condition used in Fig. 3.2,
taking c = 3 (i.e., a cubic α) results in more conservative behavior compared to taking c = 1
(i.e., a linear α). In general, however, cubic extended class K∞ functions allow for quickly
approaching the boundary of C when beginning far away, but become more conservative
(compared to linear α) near the safe set’s boundary.
We close this example with a short discussion on fundamental limitations of the CBF
approach in relation to satisfaction of stabilization objectives. When designing a CBF-based
controller for the robot motion planning problem, our objective is to assign a vector field to

Fig. 3.3 Closed-loop vector field for the robot motion planning problem with α(r ) = r , where the
gray disk denotes the obstacle. The colors of the vectors at each x indicate the magnitude of the
closed-loop vector field f (x) + g(x)k(x) evaluated at that point, which is simply the magnitude of
the control input at each x. Lighter and darker colors correspond to larger and smaller magnitudes of
the control input, respectively

the closed-loop dynamics such that the resulting trajectories flow around an obstacle and to
a goal location. This idea is depicted in Fig. 3.3. By inspecting the closed-loop vector field
in Fig. 3.3, it appears that simultaneous objectives of stability and safety will be achieved
from almost all initial conditions. However, for initial conditions behind the obstacle (i.e.,
in the top left corner of Fig. 3.3) that satisfy x1 = −x2 , the stabilization objective is not
achieved, as the robot gets “stuck” behind the obstacle. We note that this phenomenon is
not solely due to the fact that the stabilization objective is treated as “soft” constraint in the
QP, but rather due to more fundamental limitations of the behavior that can be exhibited by
continuous vector fields. Such limitations have not been extremely detrimental in practice;
however, they still must be taken into consideration when designing controllers tasked with
guaranteeing both stability and safety.

3.3 High Order Control Barrier Functions

In the previous section we introduced CBFs as a tool to synthesize controllers that guarantee
safety, a concept formally encapsulated using the notion of set invariance. Once a CBF
is known, it can be used within an optimization-based framework to filter unsafe actions
out of an a priori designed controller, or combined with CLFs to generate control actions
that mediate the potentially conflicting objectives of performance and safety. Although
constructing a controller given a CBF is straightforward, constructing a CBF is a much more
challenging problem as it implicitly requires knowledge of a controlled invariant set that can
be described as the zero superlevel set of a single continuously differentiable function h.
The challenges in constructing CBFs become more apparent when one makes the distinc-
tion between a state constraint set and a safe set. As discussed in the previous section, a set
C ⊂ Rn is considered safe for the nonlinear control system (2.10) if it is controlled invariant.
That is, a set is a safe set for (2.10) if it is possible to render the set forward invariant for
the resulting closed-loop system through the appropriate design of a feedback controller.
On the other hand, a state constraint set is simply the set of states that are deemed by the
user to not be in violation of a given state constraint. In some simple cases these sets may
coincide; in general, however, they may be very different. For example, in the robot motion
planning problem introduced in the previous section, the state constraint set corresponds to
all positions that are outside the obstacle. This set is also a safe set when the control input is
simply the robot’s velocity (i.e., a single integrator model) since the state constraint serves
as a CBF. When the control input is acceleration and the robot’s state consists of both position
and velocity (i.e., a double integrator model), however, the state constraint set is no longer
a controlled invariant set as states on the boundary of such a set will contain velocities that
direct the robot into the obstacle.
In this section, we provide a methodology to construct safe sets for nonlinear control
systems from user-specified state constraint sets. That is, given a state constraint set C0 ⊂ Rn
whose controlled invariance we cannot certify, we aim to produce a controlled invariant set

C ⊂ Rn such that C ⊂ C0 . If such a controlled invariant set can be found, then the original
state constraint can be conservatively enforced by designing a feedback controller rendering
C forward invariant for the closed-loop system. In particular, we consider state constraint
sets of the form
C0 = {x ∈ Rn | h(x) ≥ 0}, (3.16)
where h : Rn → R is continuously differentiable, which is of the same form as (3.3). As
discussed in the previous section, if h is a CBF then clearly such a state constraint set is
also a controlled invariant set. However, if L g h(x) = 0 for all x ∈ Rn , then h is unlikely to
be a CBF since the control input has no influence over the behavior of ḣ. Such a situation
arises, for example, when (3.16) represents a constraint on the configuration or kinematics
of a mechanical system, but does not take into account the dynamics where the control input
will eventually enter. This phenomenon is related to the relative degree of h, which we
informally introduced in Sect. 2.3 in the context of feedback linearization. We now provide
a formal definition of relative degree for a scalar function.

Definition 3.9 (Relative degree) A scalar function h : Rn → R is said to have relative degree r ∈ N with respect to (2.10) at a point x ∈ Rn if

1. h is r-times continuously differentiable at x;
2. there exists a δ ∈ R>0 such that for all y ∈ Bδ(x) and for each i < r − 1, we have L g L f^i h(y) = 0;
3. L g L f^{r−1} h(x) ≠ 0.

If h has relative degree r for each point in a set D ⊂ Rn , then h is said to have relative
degree r on D.

Remark 3.2 With a slight abuse of terminology, we will often say that a scalar function
h : Rn → R has relative degree r ∈ N to mean that it has relative degree r for at least one
point in Rn .

We now illustrate many of the ideas introduced in this section thus far using the following
running example.
Example 3.3 (Inverted pendulum) To illustrate many of the concepts introduced in this
section, we now introduce an example that we will revisit later. We consider the inverted
pendulum from (2.56) with state x = [q q̇]⊤, where q ∈ R denotes the angular position of the pendulum, and the dynamics are reproduced here for convenience:

ẋ = f (x) + g(x)u,  with  f (x) = [q̇, (g/ℓ) sin(q) − (b/(mℓ²)) q̇]⊤  and  g(x) = [0, 1/(mℓ²)]⊤.

Our objective is to design a feedback controller that ensures the angular position of the pendulum remains less than π/4 radians. Such an objective can be encapsulated by the constraint

h(x) = π/4 − q,

which induces a constraint set defined as the zero superlevel set of h. Is it possible to use h as a CBF to render such a constraint set forward invariant? Computing the gradient ∇h(x) = [−1 0]⊤ and the corresponding Lie derivatives yields

L f h(x) = [−1 0] f (x) = −q̇,
L g h(x) = [−1 0] g(x) = 0.

As L g h(x) = 0 for all x ∈ Rn , the relative degree of h is greater than one for all x ∈ Rn .
That is, it is impossible for the control input to influence the change of h along the system
vector field and thus h is not a CBF. The main challenge in using h as a CBF is that,
although it defines a relatively simple constraint on the system’s configuration, it does not
fully capture the behavior necessary to prevent violation of such a constraint. For example,
if the pendulum is moving rapidly towards the boundary of the constraint set it may be
impossible to stop the pendulum before the constraint is violated. Generally speaking, one
must take into account the full dynamics of the system when specifying the safe set via h.
For some simple examples, such as the one presented here, there are better choices of h
that will yield a valid CBF. In general, however, encoding the behavior necessary to prevent
constraint violation in a single continuously differentiable h may be very challenging. In
what follows we provide a partial solution to this challenge based upon the observation that,
although we may not be able to directly influence the behavior ḣ, we may be able to control
the behavior of higher order Lie derivatives of h. For example, by computing

L g L f h(x) = ∇L f h(x)⊤ g(x) = [0 −1] g(x) = −1/(mℓ²),
we see that h has relative degree 2 for all x ∈ Rn , implying that the control input may
influence the behavior of ḧ for all x ∈ Rn .
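Computations of this kind are easy to mechanize with a computer algebra system. The following sketch (our own illustration, not from the text) uses SymPy to reproduce the Lie derivative calculations above and confirm that h has relative degree 2; the symbols ell and grav are placeholders for the pendulum length ℓ and the gravitational constant:

```python
import sympy as sp

q, qd = sp.symbols('q qd', real=True)
m, ell, b, grav = sp.symbols('m ell b grav', positive=True)
x = sp.Matrix([q, qd])

# Pendulum drift and control vector fields from Example 3.3
f = sp.Matrix([qd, (grav / ell) * sp.sin(q) - (b / (m * ell**2)) * qd])
g = sp.Matrix([0, 1 / (m * ell**2)])

def lie(V, h):
    # Lie derivative of the scalar function h along the vector field V
    return sp.simplify((sp.Matrix([h]).jacobian(x) * V)[0])

h = sp.pi / 4 - q        # constraint function from Example 3.3
Lfh = lie(f, h)          # -> -qd
print(lie(g, h))         # -> 0: the relative degree exceeds one
print(lie(g, Lfh))       # -> -1/(ell**2*m), nonzero: relative degree two
```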
We provide a solution to the problem outlined in the preceding example by introducing
the notion of a high order control barrier function (HOCBF). In essence, the HOCBF
method works by dynamically extending the original constraint function h and then placing
conditions on the higher order derivatives of h that are sufficient to guarantee satisfaction of
the original constraint. As will be shown shortly, such an approach provides a natural and
systematic method to construct safe sets from user-defined safety constraints.

We begin our exposition by considering a state constraint function h : Rn → R as in


(3.16) with relative degree r ∈ N, which will be dynamically extended to access the control
input. That is, we compute the derivative of h along the system dynamics (2.10) until the
control input appears. To this end, consider the collection of functions

ψ0 (x) = h(x),
(3.17)
ψi (x) = ψ̇i−1 (x) + αi (ψi−1 (x)), ∀i ∈ {1, . . . , r − 1},

where each αi ∈ K∞^e. Note that, as h has relative degree r, each ψi for i ∈ {0, . . . , r − 1} is independent of u for all x ∈ Rn, while ψ̇r−1(x, u) will depend on u at least on some open subset of Rn. We associate to each ψi, i ∈ {0, . . . , r − 1}, a set Ci ⊂ Rn defined as the zero superlevel set of ψi:

Ci := {x ∈ Rn | ψi (x) ≥ 0},   (3.18)

whose intersection

C := ⋂_{i=0}^{r−1} Ci,   (3.19)

will serve as the set whose controlled invariance we wish to certify using the notion of HOCBF.

Definition 3.10 (High order CBF) Let h : Rn → R have relative degree r ∈ N for (2.10) that recursively defines a set C ⊂ Rn as in (3.19) such that ∇ψi (x) ≠ 0 for all x ∈ ∂Ci for each i ∈ {0, . . . , r − 1}. Then, h is said to be a high order control barrier function (HOCBF) for (2.10) on C if there exists αr ∈ K∞^e such that for all x ∈ Rn

sup_{u∈U} { L f ψr−1 (x) + L g ψr−1 (x)u } > −αr (ψr−1 (x)).   (3.20)

Remark 3.3 The HOCBF condition (3.20) can be expressed in terms of the original constraint function h by noting that

L f ψr−1 (x) = L f^r h(x) + Σ_{i=1}^{r−1} L f^i (αr−i ◦ ψr−i−1)(x),
L g ψr−1 (x) = L g L f^{r−1} h(x).

Remark 3.4 As with the CBFs of the previous section, the above definition can be gener-
alized to hold only on some open set D ⊂ Rn containing C rather than all of Rn .

Example 3.4 (Inverted pendulum (continued)) Continuing on with our running example,
our objective is now to investigate the impact of dynamically extending h(x) = π/4 − q to
compute a candidate safe set as in (3.19). We have already seen that h has relative degree 2

Fig. 3.4 Depiction of the HOCBF induced safe set for the inverted pendulum. The left plot illustrates
the resulting safe set when taking α1 as a linear function whereas the right plot illustrates the resulting
safe set when taking α1 as a cubic function. In each plot, the green region corresponds to the safe set
(3.19), the gray region corresponds to states that lie in the state constraint set but not the safe set, and
the red region corresponds to states that are in violation of the state constraint. In each plot, the solid
black curve denotes the boundary of the safe set. Note that, as the safe set is unbounded, these plots
only depict a portion of C–the green region extends infinitely far down the q̇ axis and infinitely far to
the left along the q axis

for all x ∈ Rn. We begin by computing the sequence of functions from (3.17) as

ψ0 (x) = h(x) = π/4 − q,
ψ1 (x) = ψ̇0 (x) + α1 (ψ0 (x)) = −q̇ + α1 (π/4 − q),

for some α1 ∈ K∞^e. Each of these functions satisfies ∇ψi (x) ≠ 0 whenever ψi (x) = 0, and they are used to define two closed sets as

C0 = {x ∈ R2 | π/4 − q ≥ 0}
C1 = {x ∈ R2 | −q̇ + α1 (π/4 − q) ≥ 0}.

These sets are used to define a candidate safe set as C = C0 ∩ C1 , which is illustrated in
Fig. 3.4 for different choices of α1 .
The main idea behind the definition of C is that the dependence of the safety requirement
on the higher order dynamics is implicitly encoded through higher order derivatives of h.
For this particular example, the resulting safe set C dictates that as the angular position of the
pendulum approaches the constraint boundary, the velocity must decrease and eventually
become non-positive at the boundary. The choice of extended class K∞ function α1 used in
the definition of C determines the manner in which the velocity may decrease.
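The recursion (3.17) and the membership test for C in (3.19) are equally mechanical. The sketch below (our own illustration, repeating the symbolic setup of the earlier snippet with placeholder parameter values m = ℓ = 1 and b = 0.1) builds ψ0 and ψ1 for the pendulum and tests membership in C = C0 ∩ C1 for a linear and a cubic α1, mirroring the two panels of Fig. 3.4:

```python
import sympy as sp

q, qd = sp.symbols('q qd', real=True)
x = sp.Matrix([q, qd])
f = sp.Matrix([qd, 9.81 * sp.sin(q) - 0.1 * qd])  # placeholder m = ell = 1, b = 0.1

def build_psi(h, alphas):
    # psi_0 = h and psi_i = Lf(psi_{i-1}) + alpha_i(psi_{i-1}), as in (3.17)
    psis = [h]
    for alpha in alphas:
        Lf_psi = (sp.Matrix([psis[-1]]).jacobian(x) * f)[0]
        psis.append(sp.simplify(Lf_psi + alpha(psis[-1])))
    return psis

h = sp.pi / 4 - q
psi_linear = build_psi(h, [lambda s: s])     # alpha_1(s) = s (Fig. 3.4, left)
psi_cubic = build_psi(h, [lambda s: s**3])   # alpha_1(s) = s^3 (Fig. 3.4, right)

def in_C(psis, q_val, qd_val):
    # membership in C = C_0 ∩ ... ∩ C_{r-1} from (3.19)
    return all(psi.subs({q: q_val, qd: qd_val}) >= 0 for psi in psis)

print(in_C(psi_linear, 0.0, 0.5))  # True: slow enough to stop in time
print(in_C(psi_linear, 0.0, 1.0))  # False: same position, velocity too high
```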
Following a similar approach to the standard CBF case, a HOCBF allows us to define
the set-valued map K ψ : Rn ⇒ U assigning to each x ∈ Rn the set

K ψ (x) := {u ∈ U | L f ψr −1 (x) + L g ψr −1 (x)u ≥ −αr (ψr −1 (x))}, (3.21)

of control values satisfying the HOCBF condition from (3.20). Before proceeding with the
main technical result regarding HOCBFs, we require a few properties regarding tangent
cones of sets defined as in (3.19). First, note that (3.19) can be expressed as

C = {x ∈ Rn | ∀i ∈ {0, . . . , r − 1}, ψi (x) ≥ 0}
∂C = {x ∈ Rn | ∃i ∈ {0, . . . , r − 1}, ψi (x) = 0}   (3.22)
Int(C) = {x ∈ Rn | ∀i ∈ {0, . . . , r − 1}, ψi (x) > 0}.

Next, denote the set of all active constraints of C at a point x by

AC (x) := {i ∈ {0, . . . , r − 1} | ψi (x) = 0}. (3.23)

Then, the tangent cone to C at a point x can be expressed as

TC (x) = {v ∈ Rn | ∀i ∈ AC (x), ∇ψi (x)⊤ v ≥ 0},   (3.24)

provided that ∇ψi (x) ≠ 0 whenever ψi (x) = 0 for each i ∈ {0, . . . , r − 1}. The following
theorem shows that the existence of a HOCBF is sufficient to enforce forward invariance of
C and thus satisfaction of the original safety constraint.

Theorem 3.4 Let h : Rn → R be a HOCBF for (2.10) on a set C as in (3.19). Then, any
locally Lipschitz controller u = k(x) satisfying k(x) ∈ K ψ (x) for all x ∈ Rn renders C
forward invariant for the closed-loop system.

Proof Define f cl (x) := f (x) + g(x)k(x) as the closed-loop vector field of (2.10) under the
controller u = k(x). As the controller is locally Lipschitz, the closed-loop vector field is as
well. By the definitions of ψi from (3.17) we have

ψ̇i−1 (x) = L fcl ψi−1 (x) = ψi (x) − αi (ψi−1 (x)), ∀i ∈ {1, . . . , r − 1},

or, equivalently,
L fcl ψ0 (x) = ψ1 (x) − α1 (ψ0 (x))
L fcl ψ1 (x) = ψ2 (x) − α2 (ψ1 (x))
..
.
L fcl ψr −2 (x) = ψr −1 (x) − αr −1 (ψr −2 (x)).
By (3.22) we have that for all x ∈ C, ψi (x) ≥ 0 for each i ∈ {0, . . . , r − 1}. Hence, for all
x ∈C

L fcl ψ0 (x) ≥ − α1 (ψ0 (x))


L fcl ψ1 (x) ≥ − α2 (ψ1 (x))
..
.
L fcl ψr −2 (x) ≥ − αr −1 (ψr −2 (x)).
By (3.21) the controller u = k(x) ∈ K ψ (x) ensures that for all x ∈ Rn

L fcl ψr −1 (x) = L f ψr −1 (x) + L g ψr −1 (x)k(x) ≥ −αr (ψr −1 (x)).

It thus follows from the two preceding equations that, for all x ∈ C,

L fcl ψi−1 (x) ≥ −αi (ψi−1 (x)), ∀i ∈ {1, . . . , r }.

Hence, for all x ∈ ∂C, we have

L fcl ψi (x) ≥ 0, ∀i ∈ AC (x).

By the definition of TC from (3.24), the preceding argument implies

f cl (x) ∈ TC (x), ∀x ∈ ∂C,

and the forward invariance of C follows from Theorem 3.1. 

Note that the preceding theorem establishes forward invariance of C from (3.19) and
not of the constraint set C0 from (3.16). That is, an initial condition of x0 ∈ C0 does not
necessarily guarantee that x(t) ∈ C0 for all t ∈ I (x0 ). Rather, the theorem asserts that x0 ∈
C =⇒ x(t) ∈ C for all t ∈ I (x0 ), which is sufficient to guarantee that x(t) ∈ C0 for all
t ∈ I (x0 ) since C ⊂ C0 . Similar to CBFs, the motivation behind enforcing the HOCBF
conditions on a larger set containing C is to endow the HOCBF with a certain degree of
robustness, in the sense that solutions that begin outside or leave4 C will asymptotically
approach C. Indeed, under certain conditions, the set C is also asymptotically stable for the
closed-loop system when h is a HOCBF.

Proposition 3.2 Let h be a HOCBF for (2.10) on a set C as in (3.19) and assume that C
is compact. Then, any locally Lipschitz controller u = k(x) satisfying k(x) ∈ K ψ (x) for all
x ∈ Rn renders C asymptotically stable.

When h is a HOCBF for (2.10), controllers enforcing forward invariance of C can be con-
structed similarly to the standard CBF case. For example, given a locally Lipschitz nominal
control policy k0 (x) one can solve the QP

4 Due to external perturbations/disturbances or unmodeled dynamics.



k(x) = arg min_{u∈U} (1/2)‖u − k0 (x)‖²
        subject to  L f ψr−1 (x) + L g ψr−1 (x)u ≥ −αr (ψr−1 (x)),   (3.25)

to produce a controller u = k(x) satisfying k(x) ∈ K ψ (x) that acts as a safety filter for the nominal policy. The results regarding Lipschitz continuity of the QP-based controllers in the previous section can be directly extended to show Lipschitz continuity of the HOCBF-QP in (3.25), and to provide a closed form solution to the QP when U = Rm as

k(x) = k0 (x),                                          if η(x) ≥ 0,
k(x) = k0 (x) − (η(x)/‖L g ψr−1 (x)‖²) L g ψr−1 (x)⊤,   if η(x) < 0,   (3.26)

where η(x) := L f ψr−1 (x) + L g ψr−1 (x)k0 (x) + αr (ψr−1 (x)). Of course, the validity of this approach is conditioned upon h being a HOCBF. When L g ψr−1 (x) ≠ 0 for all x ∈ Rn and U = Rm, h is a HOCBF since it is always possible to pick an input u ∈ Rm satisfying the HOCBF condition (3.20).
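The closed-form expression (3.26) is simple to implement. Below is a minimal NumPy sketch (our own illustration, not from the text) of the resulting safety filter for the inverted pendulum with h(x) = π/4 − q and the choices α1(s) = α2(s) = s, so that ψ1(x) = −q̇ + (π/4 − q); the physical parameters and the nominal input are placeholder values:

```python
import numpy as np

m, ell, b, grav = 1.0, 1.0, 0.1, 9.81   # placeholder pendulum parameters

def drift_accel(q, qd):
    return (grav / ell) * np.sin(q) - (b / (m * ell**2)) * qd

def hocbf_filter(q, qd, u_nom, alpha2=lambda s: s):
    """Closed-form HOCBF-QP solution (3.26) with psi_1 = -qd + (pi/4 - q)."""
    psi1 = -qd + (np.pi / 4 - q)
    Lf_psi1 = -qd - drift_accel(q, qd)   # grad psi_1 = [-1, -1]
    Lg_psi1 = -1.0 / (m * ell**2)
    eta = Lf_psi1 + Lg_psi1 * u_nom + alpha2(psi1)
    if eta >= 0.0:
        return u_nom                     # nominal input is already safe
    return u_nom - eta / Lg_psi1         # minimal correction, scalar (3.26)

# A nominal push toward the constraint boundary gets overridden by braking
print(hocbf_filter(q=0.7, qd=0.4, u_nom=2.0))
```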
Example 3.5 (Inverted pendulum (continued)) We continue our inverted pendulum example by investigating the validity of h as a HOCBF and therefore, by Theorem 3.4, the controlled invariance of C. We have already seen that h has relative degree 2 and that ∇ψi (x) ≠ 0 for x ∈ ∂Ci for each i ∈ {0, 1}. It thus remains to show that h satisfies the HOCBF condition (3.20), which, by Remark 3.3, can be done by analyzing the behavior of L g L f h. As

L g L f h(x) = −1/(mℓ²),

which satisfies L g L f h(x) ≠ 0 for all x ∈ R2, h is indeed a HOCBF when U = R. Thus,
the controller in (3.25) renders C forward invariant by Theorem 3.4. The closed-loop vector
field of the inverted pendulum under such a controller with α1 (r ) = α2 (r ) = r is provided
in Fig. 3.5.

In the preceding example, the validity of h as a HOCBF was guaranteed by ensuring that L g ψr−1 (x) ≠ 0 for all x ∈ Rn. However, if there exist points in Rn such that L g ψr−1 (x) = 0, then (3.20) only holds if

∀x ∈ Rn : L g ψr−1 (x) = 0 =⇒ L f ψr−1 (x) > −αr (ψr−1 (x)).

Unfortunately, the above condition is unlikely to hold at all points in Rn (or even all points in C) where L g L f^{r−1} h vanishes.
Example 3.6 (Singularities in HOCBFs) In this example, we consider the same inverted pendulum as in the previous example, but slightly modify the state constraint to

h(x) = π/4 − q²,

Fig. 3.5 Closed-loop vector field of the inverted pendulum under the HOCBF-QP controller from
(3.25) plotted over safe set and constraint set. The dashed black line indicates the boundary of the
state constraint set C0 , the solid black line denotes the boundary of the resulting safe set C, and colors
of the arrows indicate the magnitude of the vectors

which encodes the requirement that the pendulum's configuration should satisfy q² ≤ π/4. The gradient of h is

∇h(x) = [−2q, 0]⊤,

and

L f h(x) = [−2q 0] f (x) = −2q q̇,
L g h(x) = [−2q 0] g(x) = 0,

so h has relative degree larger than one. Computing

L g L f h(x) = ∇L f h(x)⊤ g(x) = [−2q̇ −2q] g(x) = −2q/(mℓ²),

reveals that h has relative degree 2 everywhere except on the set {x ∈ R2 | q = 0}. Taking h
as a candidate HOCBF we compute

ψ0 (x) = π/4 − q²,
ψ1 (x) = −2q q̇ + α1 (π/4 − q²),

for some α1 ∈ K∞^e. The resulting safe set C = C0 ∩ C1 is illustrated in Fig. 3.6 for α1 (r) = r,
where the dotted red line denotes the set of points where L g L f h vanishes.
Since L g ψ1 (x) = L g L f h(x) = 0 when q = 0, we must ensure that L f ψ1 (x) > −α2 (ψ1 (x)) for some α2 ∈ K∞^e at such points. We first note that

∇ψ1 (x) = [−2q̇ − 2q, −2q]⊤,

and thus L f ψ1 (x) = −2q̇² whenever q = 0. Hence, the HOCBF condition requires that

q = 0 =⇒ −2q̇² > −α2 (π/4) =⇒ q̇² < (1/2)α2 (π/4).

That is, the magnitude of velocity must be sufficiently small at points where we cannot
directly influence the higher order derivatives of the HOCBF candidate. As such points of
high velocity are contained in C, h is not a HOCBF. The inability to satisfy the HOCBF conditions at such points renders the HOCBF-QP (3.25) infeasible there, giving rise to singularities in the control input and, consequently, the closed-loop vector field. For
example, in the right plot of Fig. 3.6, the HOCBF conditions will be violated along the dotted
red line for any points in the shaded regions.

The previous example demonstrates that even relatively simple safety constraints can lead to invalid HOCBFs due to points where L g L f^{r−1} h(x) = 0. Fortunately, provided the set of points where L g L f^{r−1} h(x) = 0 does not lie on the boundary of the constraint set ∂C0, it is always possible to make a simple modification to h that generates a valid HOCBF.

Proposition 3.3 (Removing the singularity) Consider a state constraint set C0 ⊂ Rn defined as the zero superlevel set of h0 : Rn → R as in (3.16) and assume h0 has relative degree r ∈ N. Let

E := {x ∈ Rn | L g L f^{r−1} h0 (x) = 0},

be the set of all points in Rn where L g L f^{r−1} h0 (x) = 0 and suppose there exists an ε ∈ R>0 such that E is completely contained within the ε-superlevel set of h0:

Fig. 3.6 Safe set for the inverted pendulum with a modified safety constraint h(x) = π/4 − q². In each plot the safe set is generated using α1 (r) = r, whose boundary is depicted by the solid black curve. The dashed red line in each plot indicates the set of points where L g L f h(x) = 0, which, for this example, is simply q = 0. In the right plot the shaded regions represent the set of states where q̇² ≥ (1/2)α2 (π/4), which is the set of points where the HOCBF condition is violated at q = 0. This set is generated by taking α2 (r) = r

E ⊂ {x ∈ Rn | h0 (x) ≥ ε}.

Define

h(x) := τ (h0 (x)/ε),   (3.27)

where τ : R → R is any sufficiently smooth function satisfying

τ (0) = 0,
τ (s) = 1,   for s ≥ 1,   (3.28)
∇τ (s) > 0,  for s < 1.

Provided U = Rm, then h from (3.27) is a HOCBF.

The above result exploits the fact that the original invariance conditions introduced in
Sect. 3.1 only need to be enforced on the boundary of a given set. The use of extended class K
functions in the CBF and HOCBF approach permits the extension of such conditions to the
entirety of a set, but brings additional challenges in verifying that such conditions are satisfied
at points where the control input cannot influence the evolution of the CBF/HOCBF. The
procedure outlined in Proposition 3.3 essentially modifies the HOCBF candidate h 0 so that
the HOCBF conditions trivially hold at points in the constraint set where L g L f^{r−1} h0 (x) = 0.
Provided such points lie strictly in the interior of the constraint set, then no modifications
are made to the HOCBF candidate on the boundary of C0 .
Example 3.7 (Singularities in HOCBFs (continued)) We continue our example of the inverted pendulum with a demonstration of the procedure for generating valid HOCBFs

Fig. 3.7 Example of a transition function satisfying the criteria of Proposition 3.3 (left) and the modified HOCBF safe set for different values of ε (right). Here, the dashed black line denotes the boundary of the state constraint set C0 and the solid lines of varying color denote the boundary of the resulting safe set C for different values of ε as indicated in the legend

outlined in Proposition 3.3. Previously we saw that h0 (x) = π/4 − q² is not a valid HOCBF as condition (3.20) fails to hold at all points in the candidate safe set where L g L f h0 (x) = 0. Such points occur when q = 0, which implies that

{x ∈ R2 | L g L f h0 (x) = 0} ⊂ {x ∈ R2 | h0 (x) ≥ ε},

for any ε ∈ (0, π/4). It follows from Proposition 3.3 that h(x) = τ (h0 (x)/ε) is a valid HOCBF for any smooth τ satisfying (3.28). An example of such a function when h0 has relative degree r = 2 is given by

τ (s) = (s − 1)³ + 1,  if s ≤ 1,
τ (s) = 1,             if s > 1,

which is illustrated in Fig. 3.7 (left). The safe set resulting from using h as a HOCBF for different values of ε is shown in Fig. 3.7 (right).
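For completeness, the transition function above and the modified candidate (3.27) are captured by the following short sketch (our own illustration; the value of ε and the test points are placeholders):

```python
import numpy as np

def tau(s):
    # Transition function satisfying (3.28) for r = 2
    return np.where(s <= 1.0, (s - 1.0)**3 + 1.0, 1.0)

def h_modified(q, eps=0.1):
    # Modified HOCBF candidate (3.27) built from h0 = pi/4 - q^2
    h0 = np.pi / 4 - q**2
    return tau(h0 / eps)

# h and h0 share the same zero level set, but h is flat (equal to 1) on the
# eps-superlevel set of h0, which contains the singular set q = 0.
print(h_modified(np.array([0.0, 0.5, np.sqrt(np.pi) / 2])))
```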

Although the above approach provides a technique to remove singularities from the
HOCBF when the input is unconstrained, the resulting safe set becomes a poor approximation
of the viability kernel5 under input constraints as it allows for extremely high velocities
near the boundary of the constraint set (see Fig. 3.7). An alternative, albeit more heuristic,
approach is to simply use additional CBFs that prevent the system from entering regions in
which the HOCBF conditions are violated as shown in the following example.

Example 3.8 (Singularities in HOCBFs (continued)) We continue our running example


of the inverted pendulum by providing a heuristic approach to removing singularities from

5 The viability kernel is the maximum controlled invariant subset of the state constraint set.

Fig. 3.8 The set resulting from intersecting the HOCBF candidate safe set with the candidate safe set that restricts the velocity of the pendulum for different choices of α2. The curves of varying color denote the boundary of the resulting safe set for different choices of α2 as noted in the legend

the HOCBF. As we saw previously, the constraint function h(x) = π/4 − q² is not a HOCBF for the inverted pendulum (at least not on all of C) because the HOCBF conditions are violated at points in C where L g L f h(x) = 0. In particular, at such points of singularity, the HOCBF conditions dictate that

q = 0 =⇒ q̇² < (1/2)α2 (π/4),

where, recall that for simplicity, we have chosen α1 (r ) = r to construct the candidate safe
set. Rather than removing the singularity using Proposition 3.3, which results in a safe
set that permits extremely large velocities near the boundary of the constraint set, one can
alternatively define a new CBF candidate

h v (x) = (1/2)α2 (π/4) − c − q̇²,

for some arbitrarily small c ∈ R>0, that defines a set Cv := {x ∈ Rn | h v (x) ≥ 0} whose intersection with the original candidate safe set, C ∩ Cv, represents the subset of C where the HOCBF conditions hold. Note that h v has relative degree one since

∇h v (x) = [0, −2q̇]⊤,   L g h v (x) = [0 −2q̇] g(x) = −2q̇/(mℓ²).

Moreover, h v is a valid CBF on Cv when U = Rm since ∇h v (x) ≠ 0 whenever h v (x) = 0 for c sufficiently small, and L g h v (x) = 0 if and only if q̇ = 0, which implies that 0 > −αv (h v (x)) whenever L g h v (x) = 0, a condition that holds everywhere on the interior6 of Cv. The set resulting from intersecting C with Cv for different choices of α2 is shown
in Fig. 3.8. Although it is possible to individually satisfy the conditions imposed by the

6 Note that a given controller only needs to satisfy the nonstrict inequality to guarantee safety.

HOCBF and h v within the set depicted in Fig. 3.8, it is not clear if both conditions are
mutually satisfiable, especially under actuation limits, which is a challenging problem in
general.
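Because both the HOCBF condition and the CBF condition on h v are affine in u, mutual satisfiability can at least be checked pointwise when U = R: each condition defines a half-line of admissible inputs, and one can test whether the two half-lines intersect. The sketch below (our own illustration with placeholder parameters, α1(s) = s baked into ψ1, and a single α used for both α2 and αv) performs this check at a given state; note that such a pointwise test says nothing about invariance and therefore does not resolve the general problem:

```python
import numpy as np

m, ell, b, grav, c = 1.0, 1.0, 0.1, 9.81, 1e-3   # placeholder values

def accel(q, qd):
    return (grav / ell) * np.sin(q) - (b / (m * ell**2)) * qd

def u_halfline(a, rhs):
    # Solve a*u >= rhs over U = R: returns (lo, hi), or None if infeasible
    if a > 0:
        return (rhs / a, np.inf)
    if a < 0:
        return (-np.inf, rhs / a)
    return (-np.inf, np.inf) if rhs <= 0 else None

def mutually_feasible(q, qd, alpha=lambda s: s):
    # HOCBF condition on psi_1 = -2*q*qd + (pi/4 - q^2)
    psi1 = -2 * q * qd + (np.pi / 4 - q**2)
    Lf1 = (-2 * qd - 2 * q) * qd - 2 * q * accel(q, qd)
    I1 = u_halfline(-2 * q / (m * ell**2), -alpha(psi1) - Lf1)
    # CBF condition on h_v = alpha(pi/4)/2 - c - qd^2
    hv = alpha(np.pi / 4) / 2 - c - qd**2
    I2 = u_halfline(-2 * qd / (m * ell**2), 2 * qd * accel(q, qd) - alpha(hv))
    if I1 is None or I2 is None:
        return False
    return max(I1[0], I2[0]) <= min(I1[1], I2[1])

print(mutually_feasible(q=0.3, qd=-0.2))
```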

3.4 Notes

In this chapter, we introduced the notion of safety of dynamical systems using the formalism of invariant sets, and showed how the recent development of control barrier functions (CBFs) facilitates
the design of controllers enforcing safety/set invariance. Necessary and sufficient conditions
for set invariance were first established by Nagumo in 1942 [1], with similar conditions being
independently rediscovered decades later by others [2, 3]. A proof of Nagumo’s Theorem
(Theorem 3.1) can be found in [4, Ch. 4.1] and a broader introduction to set invariance from
a control-theoretic perspective can be found in [5, 6].
The first modern version of a barrier function in the context of safety verification was
introduced in [7], where such a function was referred to as a barrier certificate. Such barrier
certificates provided Lyapunov-like sufficient conditions for the verification of invariance
properties for nonlinear and hybrid systems. A limitation of these early instantiations of
barrier functions is that they effectively enforced the conditions of Corollary 3.1 over the entirety of the safe set rather than only on its boundary. Such an approach is overly restrictive
as it renders every superlevel set of the safe set forward invariant rather than only the zero
superlevel set. Other early attempts to define barrier functions in control theory came via
the notion of a barrier Lyapunov function (BLF) [8]. These BLFs operate under the same
premise as the barrier certificates mentioned above but are also positive definite, which
further restricts the class of safe sets that can be described using such a function.
The notion of a barrier function introduced in this chapter first appeared in the series of
papers [9–11]. Here, a subtle, yet highly consequential, modification to early definitions
of barrier functions was proposed by extending the conditions for set invariance from the
boundary of the safe set to the entirety of the safe set in a least-restrictive fashion. In essence,
these works replaced the earlier condition

ḣ(x) ≥ 0, ∀x ∈ C,

with the less restrictive condition

ḣ(x) ≥ −α(h(x)), ∀x ∈ Rn , α ∈ K e ,

which allows the value of the barrier function h to decrease along the system trajectory, but
never become negative. Originally, such conditions were formulated for reciprocal barrier
functions [9] that take unbounded values on the boundary of the safe set, but were quickly
extended to zeroing barrier functions in [10, 11] that vanish on the boundary of the safe set,
such as those proposed in Definition 3.4. Shifting from reciprocal to zeroing barrier functions

allowed for the development of robustness results for such barriers, such as Proposition 3.1,
that ensure the safe set is not only forward invariant, but also asymptotically stable. The
concept of a control barrier function (CBF) was first introduced in [12]; however, the more
recent version presented in this chapter was introduced in [9–11]. These works also made
popular the quadratic programming (QP) approach to multi-objective control. The adap-
tive cruise control example is taken from [9–11]. A further discussion on the history and
applications of CBFs can be found in [13].
Initial attempts to extend the CBF methodology to constraints with higher relative degree
were first explored in [14, 15] by leveraging a backstepping approach to construct CBFs from
relative degree 2 safety constraints. A general approach for constructing high order CBFs
(HOCBFs) was first developed in [16] using the notion of an Exponential Control Barrier Function (ECBF). These ECBFs are essentially the same as the HOCBFs presented earlier
in this chapter except that the extended class K functions are limited to linear functions.
The HOCBF approach presented here that allows for the use of general extended class
K functions was first developed in [17, 18]. The observation that the HOCBF conditions
may be violated at points in the candidate safe set when the constraint function h does not
have a uniform relative degree was first pointed out in [19]. The method of “removing the
singularity” from the HOCBF was also proposed in [19] along with additional robustness
results that guarantee asymptotic stability of the safe set generated by HOCBFs–Propositions
3.2 and 3.3 were first stated and proved in [19] as well. Alternative methods to constructing
CBFs from high relative degree safety constraints rely on extensions of CLF backstepping
as introduced in Sect. 2.3.2 to CBF backstepping–a process introduced in [20].
Most methods to construct CBFs do so under the premise that the system’s control
authority is unlimited (i.e., U = Rm ). Constructing valid CBFs when additional actuation
limits are present is a challenging task and an active area of research. For simple systems
and constraints, it is often possible to analytically derive a CBF that respects actuation
bounds. For example, the authors of [21] construct CBFs for mobile robots modeled as
double integrators with acceleration bounds by determining the minimum braking distance
required to stop before colliding with obstacles. Such an approach is highly effective in
certain scenarios, but generally requires closed-form solutions to the system dynamics. This
idea is extended to general systems and constraints in [22–27] using the notion of a backup
CBF. This approach allows for systematically determining a control invariant subset of the
constraint set, which is implicitly represented using a backup control law that renders a
smaller set forward invariant. Notably, such an approach is applicable to safety constraints
with high relative degree and to systems with actuation bounds, but does require more
computational effort at run-time as it requires numerically integrating online the dynamics
of the system under the backup policy. Similar ideas in the context of HOCBFs have appeared
in [28], whereas [29, 30] provides an alternative approach to constructing HOCBFs under
actuation bounds by including additional constraints in the resulting QP that ensure the
system remains within a controlled invariant subset of the constraint set. Other approaches
to constructing valid CBFs and HOCBFs rely on sum-of-squares programming [31] and
machine learning [32, 33].

As briefly noted in Example 3.2, uniting CBFs and CLFs may fail to (asymptotically)
stabilize a system when the stability and safety conditions are not mutually satisfiable. In
Example 3.2 this manifests itself as the robot getting trapped in a “local minima” behind
the obstacle; however, more generally speaking, such a phenomenon arises from the funda-
mental limitations of the behavior that can be achieved using a continuous static feedback
controller. Such limitations were originally pointed out by Brockett in [34], a paper that
derived necessary conditions for stabilization by means of continuous static state feedback.
Further details on such limitations are discussed in [35, 36]. In the context of CBFs, the work
of [37] illustrates that the standard CBF-CLF-QP (3.14) provides no guarantees of stability
due to the use of the relaxation variable, even if simultaneous stabilization and safety is pos-
sible, and proposes a modification to (3.14) that can be used to establish local asymptotic
stability. Additionally, it was shown in [38] that (3.14) may induce additional asymptotically
stable points that may be located on the boundary of the safe set. Other works have looked
to address these challenges by deriving conditions under which CBFs may be shown to be
compatible with CLFs to design controllers that guarantee simultaneous stabilization and
safety [39].

References

1. Nagumo M (1942) Über die Lage der Integralkurven gewöhnlicher Differentialgleichungen. In: Proceedings of the Physico-Mathematical Society of Japan, vol 24, no 3
2. Brezis H (1970) On a characterization of flow-invariant sets. Commun Pure Appl Math 23:261–
263
3. Redheffer RM (1972) The theorems of Bony and Brezis on flow-invariant sets. Am Math Monthly
79(7):740–747
4. Abraham R, Marsden JE, Ratiu T (1988) Manifolds, tensor analysis, and applications, 2nd ed.
Springer
5. Blanchini F, Miani S (2008) Set-theoretic methods in control. Springer
6. Blanchini F (1999) Set invariance in control. Automatica 35(11):1747–1767
7. Prajna S, Jadbabaie A (2004) Safety verification of hybrid systems using barrier certificates. In:
Proceedings of the international workshop on hybrid systems: computation and control, pp 477–
492
8. Tee KP, Ge SS, Tay EH (2009) Barrier Lyapunov functions for the control of output-constrained
nonlinear systems. Automatica 45(4):918–927
9. Ames AD, Grizzle JW, Tabuada P (2014) Control barrier function based quadratic programs with
application to adaptive cruise control. In: Proceedings of the IEEE conference on decision and
control, pp 6271–6278
10. Xu X, Tabuada P, Grizzle JW, Ames AD (2015) Robustness of control barrier functions for
safety critical control. In: Proceedings of the IFAC conference on analysis and design of hybrid
systems, pp 54–61
11. Ames AD, Xu X, Grizzle JW, Tabuada P (2017) Control barrier function based quadratic programs
for safety critical systems. IEEE Trans Autom Control 62(8):3861–3876
12. Wieland P, Allgöwer F (2007) Constructive safety using control barrier functions. In: Proceedings
of the IFAC symposium on nonlinear control systems

13. Ames AD, Coogan S, Egerstedt M, Notomista G, Sreenath K, Tabuada P (2019) Control barrier
functions: theory and applications. In: Proceedings of the European control conference, pp 3420–
3431
14. Hsu S, Xu X, Ames AD (2015) Control barrier function based quadratic programs with applica-
tion to bipedal robotic walking. In: Proceedings of the American control conference, pp 4542–
4548
15. Nguyen Q, Sreenath K (2015) Safety-critical control for dynamical bipedal walking with precise
footstep placement. In: Proceedings of the IFAC conference on analysis and design of hybrid
systems, pp 147–154
16. Nguyen Q, Sreenath K (2016) Exponential control barrier functions for enforcing high relative-
degree safety-critical constraints. In: Proceedings of the American control conference, pp 322–
328
17. Xiao W, Belta C (2019) Control barrier functions for systems with high relative degree. In:
Proceedings of the IEEE conference on decision and control, pp 474–479
18. Xiao W, Belta C (2022) High order control barrier functions. IEEE Trans Autom Control
67(7):3655–3662
19. Tan X, Cortez WS, Dimarogonas DV (2022) High-order barrier functions: robustness, safety and
performance-critical control. IEEE Trans Autom Control 67(6):3021–3028
20. Taylor AJ, Ong P, Molnar TG, Ames AD (2022) Safe backstepping with control barrier functions.
In: Proceedings of the IEEE conference on decision and control, pp 5775–5782
21. Wang L, Ames AD, Egerstedt M (2017) Safety barrier certificates for collisions-free multirobot
systems. IEEE Trans Robot 33(3):661–674
22. Gurriet T, Singletary A, Reher J, Ciarletta L, Feron E, Ames AD (2018) Towards a framework for
realizable safety critical control through active set invariance. In: Proceedings of the ACM/IEEE
international conference on cyber-physical systems, pp 98–106
23. Gurriet T, Mote M, Ames AD, Feron E (2018) An online approach to active set invariance. In:
Proceedings of the IEEE conference on decision and control, pp 3592–3599
24. Gurriet T, Nilson P, Singletary A, Ames AD (2019) Realizable set invariance conditions for
cyber-physical systems. In: Proceedings of the American control conference, pp 3642–3649
25. Gurriet T, Mote M, Singletary A, Feron E, Ames AD (2019) A scalable controlled set invariance
framework with practical safety guarantees. In: Proceedings of the IEEE conference on decision
and control, pp 2046–2053
26. Gurriet T, Mote M, Singletary A, Nilsson P, Feron E, Ames AD (2020) A scalable safety critical
control framework for nonlinear systems. IEEE Access, vol 8
27. Chen Y, Jankovic M, Santillo M, Ames AD (2021) Backup control barrier functions: formulation
and comparative study. In: Proceedings of the IEEE conference on decision and control, pp 6835–
6841
28. Breeden J, Panagou D (2021) High relative degree control barrier functions under input con-
straints. In: Proceedings of the IEEE conference on decision and control, pp 6119–6124
29. Xiao W, Belta C, Cassandras CG (2022) Sufficient conditions for feasibility of optimal control
problems using control barrier functions. Automatica, vol 135
30. Xiao W, Belta C, Cassandras CG (2022) Adaptive control barrier functions. IEEE Trans Autom
Control 67(5):2267–2281
31. Clark A (2021) Verification and synthesis of control barrier functions. In: Proceedings of the
IEEE conference on decision and control, pp 6105–6112
32. Xiao W, Belta C, Cassandras CG (2020) Feasibility-guided learning for constrained optimal
control problems. In: Proceedings of the IEEE conference on decision and control, pp 1896–
1901

33. Dawson C, Qin Z, Gao S, Fan C (2021) Safe nonlinear control using robust neural Lyapunov-
barrier functions. In: Proceedings of the 5th annual conference on robot learning
34. Brockett RW (1983) Asymptotic stability and feedback stabilization. In: Millman RS, Brockett
RW, Sussmann H (eds) Differential geometric control theory, pp 181–191. Birkhauser
35. Liberzon D (2003) Switching in systems and control. Birkhäuser, Boston, MA
36. Sontag ED (1999) Stability and stabilization: discontinuities and the effect of disturbances. In:
Clarke FH, Stern RJ, Sabidussi G (eds) Nonlinear analysis, differential equations, and control.
Springer, Dordrecht, pp 551–598
37. Jankovic M (2018) Robust control barrier functions for constrained stabilization of nonlinear
systems. Automatica 96:359–367
38. Reis MF, Aguiar AP, Tabuada P (2021) Control barrier function-based quadratic programs intro-
duce undesirable asymptotically stable equilibria. IEEE Control Syst Lett 5(2):731–736
39. Cortez WS, Dimarogonas DV (2022) On compatibility and region of attraction for safe, stabi-
lizing control laws. IEEE Trans Autom Control 67(9):4924–4931
4 Adaptive Control Lyapunov Functions

In Chap. 2, we introduced Lyapunov theory as a tool to design controllers enforcing stability


of nonlinear control systems. In this chapter, we extend these ideas to nonlinear control
systems with uncertain dynamics. We focus on the case where the uncertainty enters the
system in a structured manner through uncertain parameters. Such a situation occurs when
the structure of the vector fields are known, but certain physical attributes of the system
(e.g., inertia and damping properties) under consideration are unknown. The dynamics of
many relevant systems, especially those in robotics, obey such an assumption. We argue
that exploiting this structure, rather than treating uncertainties as a “black box,” allows for
the development of efficient learning-based approaches to control with strong guarantees of
correctness. We provide an introduction to adaptive nonlinear control in Sect. 4.1, where sta-
bilization does not necessarily require convergence of the unknown parameters to their true
values. Next, we present two methods that enforce both stability and parameter convergence
in Sect. 4.2 and show how convergence can be guaranteed to be exponential in Sect. 4.3. We
conclude with final remarks and suggestions for further readings in Sect. 4.4.

Our development in this chapter revolves around a control affine system with additive para-
metric uncertainty
ẋ = f (x) + F(x)θ + g(x)u, (4.1)
where f and g are as in (2.10), and F : Rn → Rn× p is a matrix-valued function whose
columns correspond to vector fields capturing the directions along which the uncertain
parameters θ ∈ R p act.
We assume that F(0) = 0 so that the origin remains an equilibrium point of (4.1) with
u = 0. When the uncertainty enters a system as a linear combination of known nonlinear
features F(x) and unknown parameters θ , such as in (4.1), we say that the system is linear
in the uncertain parameters or simply linear in the parameters. The model in (4.1) captures


the situation in which we understand the structure of the system dynamics, but may not have
full knowledge of the system’s attributes, such as its inertia or damping properties.

4.1 Adaptive Nonlinear Control

Adaptive control focuses on the problem of simultaneous learning and control: we seek to
design a controller u = k(x, θ̂) based on an estimate θ̂ ∈ R p of the uncertain parameters θ ,
while concurrently improving this estimate using information observed along the system tra-
jectory. This improvement in parameter estimation is accomplished using a parameter update
law/estimation algorithm, which manifests itself as a dynamical system θ̂˙ = τ (x, θ̂, t),
where τ : Rn × R p × R≥0 → R p is a vector field that is locally Lipschitz in (x, θ̂) and
piecewise continuous in t. Thus, the adaptive control problem can be understood as the
design of a controller and parameter update law

u = k(x, θ̂)
(4.2)
θ̂˙ = τ (x, θ̂, t),

that guarantee the satisfaction of certain properties (stability, safety) of the closed-loop
system. In this regard, adaptive control can be seen as a form of dynamic feedback control
in the sense that the parameters of the controller adjust over time to accomplish the control
objective. A central object studied in the remainder of this book is the parameter estimation
error
θ̃ = θ − θ̂. (4.3)
In general, throughout this book, a hat, as in θ̂, denotes the estimate of some quantity, whereas a tilde, as in θ̃, denotes the corresponding estimation error, i.e., the difference between the true quantity and its estimate. Note that, since θ is constant, the time-derivative of the parameter estimation error is given by θ̃˙ = −θ̂˙. As
we will see shortly, when studying the behavior of the closed-loop system (4.1), it is often
necessary to study the properties of the composite dynamical system

ẋ = f (x) + F(x)θ + g(x)k(x, θ̂),
θ̃˙ = −τ (x, θ̂, t),   (4.4)

composed of the original system dynamics and the parameter estimation error dynamics.
The traditional problem in adaptive control is to design a dynamic controller that stabilizes
(4.1) to an equilibrium point or desired reference trajectory, which can be accomplished using
the Lyapunov-based methods introduced in Chap. 2. The following definition provides an
extension of CLFs to adaptive control systems.

Definition 4.1 (Adaptive CLF) A Lyapunov function candidate V is said to be an adaptive


control Lyapunov function (aCLF) for (4.1) if there exists α ∈ K such that for all x ∈
Rn \ {0} and all θ̂ ∈ R p

inf_{u∈U} {L f V (x) + L F V (x)θ̂ + L g V (x)u} < −α(‖x‖).   (4.5)

As was the case for the CLFs from Chap. 2, an aCLF induces a set-valued map K aclf : Rn × R p ⇒ U that associates to each pair (x, θ̂) a set of control values K aclf (x, θ̂) ⊂ U satisfying the aCLF condition as

K aclf (x, θ̂) := {u ∈ U | L f V (x) + L F V (x)θ̂ + L g V (x)u ≤ −α(‖x‖)}.   (4.6)

The existence of an aCLF implies the existence of a control policy that stabilizes the
estimated dynamics f (x) + F(x)θ̂ + g(x)u for each θ̂ . Before observing how such a policy
affects the validity of the aCLF V as a Lyapunov function for the actual system dynamics
(4.1), we use the parameter estimation error (4.3) to represent (4.1) as

ẋ = f (x) + F(x)θ̂ + g(x)u + F(x)θ̃.

This implies the Lie derivative of the aCLF V along the system dynamics can be expressed
as
V̇ = L f V (x) + L F V (x)θ̂ + L g V (x)u + L F V (x)θ̃.
Choosing the aCLF induced control policy u = k(x, θ̂) ∈ K aclf (x, θ̂) implies that

V̇ ≤ −α(‖x‖) + L F V (x)θ̃.

Unfortunately, the above analysis is inconclusive in terms of stability since the sign-
indefinite term L F V (x)θ̃ prevents us from drawing any conclusions about the sign of V̇ .
Fortunately, we have yet to specify the other component of our adaptive controller, the
parameter update law. In a similar vein to the CLF approach, where we designed a controller
to enforce the Lyapunov conditions by construction, in the adaptive control setting we select
a parameter update law so that the Lyapunov conditions for the composite system (4.4) are
satisfied by construction. To this end, consider a new Lyapunov function candidate for the
composite dynamical system (4.4)
Va (x, θ̃) = V (x) + (1/2) θ̃⊤ Γ⁻¹ θ̃,   (4.7)

where Γ ∈ R p×p is positive definite, which consists of the aCLF V and a weighted quadratic term that penalizes the parameter estimation error. Computing the Lie derivative of Va along the composite system dynamics with u = k(x, θ̂) ∈ K aclf (x, θ̂) yields

V̇a = L f V (x) + L F V (x)θ̂ + L g V (x)k(x, θ̂) + L F V (x)θ̃ − θ̃⊤ Γ⁻¹ θ̂˙
    ≤ −α(‖x‖) + L F V (x)θ̃ − θ̃⊤ Γ⁻¹ θ̂˙.   (4.8)

Although the problematic term L F V (x)θ̃ is still present, we now have another variable θ̂˙ that we are free to choose as we wish. Selecting

θ̂˙ = Γ L F V (x)⊤,   (4.9)

reduces (4.8) to

V̇a ≤ −α(‖x‖) ≤ 0.   (4.10)
Although we have eliminated all sign-indefinite terms, we can only conclude that V̇a ≤ 0 (V̇a is negative semidefinite), which implies that the origin of the composite system (4.4)
is stable by Theorem 2.2. We can actually go a step further than this and demonstrate
convergence of the original system trajectory t → x(t) to the origin with the help of the
following theorem.

Theorem 4.1 (LaSalle-Yoshizawa Theorem) Consider a dynamical system ẋ = f (x, t)


where f : Rn × R≥0 → Rn is locally Lipschitz in x, uniformly in t, satisfying f (0, t) = 0
for all t ∈ R≥0 . Let V : Rn → R≥0 be a Lyapunov function candidate and assume there
exists a continuous function W : Rn → R such that for all (x, t) ∈ Rn × R≥0

V̇ = L f V (x, t) ≤ −W (x) ≤ 0.

Then, all trajectories of ẋ = f (x, t) are uniformly bounded and satisfy

lim_{t→∞} W (x(t)) = 0.

The above result is often referred to as the LaSalle-Yoshizawa Theorem. Combining Theorem 4.1 with (4.10) implies that lim_{t→∞} α(‖x(t)‖) = 0, which, by the properties of class K functions, implies lim_{t→∞} ‖x(t)‖ = 0. The preceding discussion is formalized by the following theorem.

Theorem 4.2 Let V be an aCLF for (4.1) and assume the parameter estimates are updated
according to
θ̂˙ = Γ L F V (x)⊤.
Then, any controller u = k(x, θ̂) locally Lipschitz on (Rn \ {0}) × R p satisfying k(x, θ̂) ∈
K aclf (x, θ̂) for all (x, θ̂) ∈ Rn × R p renders the origin of the composite system (4.4) stable
and ensures that limt→∞ x(t) = 0.

An interesting consequence of Theorem 4.2 is that the adaptive control strategy is capable
of accomplishing the control objective despite there being no guarantee that the estimated
parameters converge to their true values. This phenomenon is one of the defining features
of adaptive control: exactly learning the parameters is generally not a necessary condition
for satisfaction of the control objective.1

1 Although, as discussed later in this chapter, there are many benefits to parameter convergence.

In Chap. 2 we saw that for many classes of systems there exist systematic methods to
construct CLFs. When system (4.1) satisfies a certain structural condition such techniques
can be directly extended to construct aCLFs.

Definition 4.2 (Matched uncertainty) The parameters in (4.1) are said to be matched if
there exists a locally Lipschitz mapping ϕ : Rn → Rm× p such that

F(x) = g(x)ϕ(x). (4.11)

When the parameters in (4.1) are matched, the feature mapping F can be expressed as a
linear combination of the control directions:

ẋ = f (x) + g(x)(u + ϕ(x)θ ), (4.12)

implying that if θ were known then the term ϕ(x)θ could simply be canceled by the control
input. This structural condition on (4.1) greatly simplifies the construction of an aCLF, as it
allows one to directly use a CLF for the nominal dynamics ẋ = f (x) + g(x)u as an aCLF
for the uncertain dynamics (4.1).

Proposition 4.1 Let V be a CLF for ẋ = f (x) + g(x)u with U = Rm . If the parameters
in (4.1) are matched, then V is an aCLF for (4.1) with U = Rm .

Proof When U = Rm , the aCLF condition (4.5) is equivalent to

∀(x, θ̂) ∈ Rn × R p : L g V (x) = 0 =⇒ L f V (x) + L F V (x)θ̂ < −α(‖x‖).   (4.13)

If the uncertain parameters are matched, then F(x) = g(x)ϕ(x), which implies that
L F V (x) = L g V (x)ϕ(x) and, consequently, that L g V (x) = 0 =⇒ L F V (x) = 0. Hence,
(4.13) reduces to

∀x ∈ Rn : L g V (x) = 0 =⇒ L f V (x) < −α(‖x‖),   (4.14)

which is exactly the statement that V is a CLF for ẋ = f (x) + g(x)u when U = Rm . 
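To make the matched condition concrete, here is a quick illustration of our own (reusing the inverted pendulum of Chap. 3 and assuming, purely for the sake of the example, that the damping coefficient b is the only unknown). Taking θ = b, the dynamics can be written as

ẋ = [q̇, (g/ℓ) sin(q)]⊤ + [0, −q̇/(mℓ²)]⊤ θ + [0, 1/(mℓ²)]⊤ u,

so that F(x) = [0, −q̇/(mℓ²)]⊤ = g(x)(−q̇), i.e., ϕ(x) = −q̇, and F(0) = 0 as required. The uncertainty is therefore matched, and by Proposition 4.1 any CLF for the nominal pendulum dynamics may be used as an aCLF.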

The significance of dealing with matched uncertainty is that the construction of an aCLF
can be done independently of the uncertain parameters. When the parameters are not matched
the construction of an aCLF is much more challenging, and typically relies on using adaptive
backstepping, a technique that extends the standard backstepping idea from Sect. 2.3.2 to
adaptive control. Note that although we have framed most of the results regarding aCLFs as
applicable to the general system (4.1), these are often only applicable to those systems with
matched uncertainties. This is because, in general, an aCLF V may also depend on estimates
of the uncertain parameters V (x, θ̂), which complicates the aCLF condition (4.5) since, in

this situation, the Lie derivative of V depends on the update law θ̂˙ , which in turn depends
on the aCLF V itself. Since our working definition of an aCLF from Definition 4.1 does
not include parameter dependence, most of our discussion in this chapter, and the following
few chapters, will be limited to systems with matched parameters. Additional methods to
construct aCLFs for systems with unmatched parameters will be discussed in Sect. 4.5, and
an adaptive control method that handles general systems of the form (4.1) based on ideas
from reinforcement learning will be presented in Chap. 8.
Similar to classical CLFs, once an aCLF is known, a controller satisfying the conditions
of Definition 4.1 can be constructed using the QP

k(x, θ̂) = arg min_{u∈U} (1/2)‖u‖²   (4.15)
           subject to  L f V (x) + L F V (x)θ̂ + L g V (x)u ≤ −α(‖x‖).

Note that we are not limited to using convex optimization-based controllers–any con-
troller satisfying the criteria of Theorem 4.2 can be used to complete the stabilization task;
however, taking the optimization-based approach of (4.15) brings with it the benefits dis-
cussed in Chap. 2. When U = Rm , the QP in (4.15) is a special case of the QP from (2.20)
and thus admits the closed form solution

k(x, θ̂) = 0,                                       if ψ(x, θ̂) ≤ 0,
k(x, θ̂) = −(ψ(x, θ̂)/‖L g V (x)‖²) L g V (x)⊤,      if ψ(x, θ̂) > 0,   (4.16)

where ψ(x, θ̂) := L f V (x) + L F V (x)θ̂ + α(‖x‖), and is locally Lipschitz on (Rn \ {0}) × R p.
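As a concrete illustration (our own sketch, not from the text), consider the scalar system ẋ = θx + u, for which f(x) = 0, F(x) = x, and g(x) = 1, so the uncertainty is matched with ϕ(x) = x and V(x) = x²/2 is an aCLF with α(s) = s. The snippet below simulates the closed-form controller (4.16) together with the update law (4.9) under forward-Euler integration; all numerical values are placeholders:

```python
import numpy as np

theta = 2.0            # true but unknown parameter
gamma = 5.0            # adaptation gain (Gamma, scalar here)
dt, steps = 1e-3, 5000

x, theta_hat = 1.0, 0.0
for _ in range(steps):
    # For V(x) = x^2/2: LfV = 0, LFV = x^2, LgV = x, alpha(|x|) = |x|
    psi = x**2 * theta_hat + abs(x)
    u = 0.0 if psi <= 0.0 else -psi / x   # min-norm controller (4.16)
    theta_hat += dt * gamma * x**2        # update law (4.9): Gamma * LFV^T
    x += dt * (theta * x + u)             # true closed-loop dynamics
print(f"x = {x:.4f}, theta_hat = {theta_hat:.3f} (true theta = {theta})")
```

Running this, x(t) is driven to zero while θ̂ typically settles at a value different from θ, illustrating the remark following Theorem 4.2.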

4.2 Concurrent Learning Adaptive Control

In the previous section, we demonstrated how adaptive control provides a methodology


to control nonlinear systems with uncertain parameters via online parameter estimation.
An interesting property of most adaptive controllers is that convergence of the estimated
parameters to their true values is generally not guaranteed. In the Lyapunov-based approach
introduced in the previous section, the parameters are updated to satisfy the Lyapunov condi-
tions and not necessarily to provide the best estimate of the parameters. Intuition, however,
suggests that better control performance may be achieved if the parameter estimates are
driven to their true values. Indeed, parameter convergence in adaptive control is highly
desirable as it allows for establishing exponential stability and increases robustness to exter-
nal disturbances. In this section, we discuss two methods to achieve parameter convergence.
The first is based on the persistence of excitation (PE) condition (Sect. 4.2.1).
The second is an adaptive control technique that can enforce parameter convergence under
conditions that are weaker than PE (Sect. 4.2.2).

4.2.1 Parameter Identification

Establishing parameter convergence in adaptive control has traditionally relied on satisfy-


ing the persistence of excitation (PE) condition. As the name suggests, such a condition
requires the system trajectory to be sufficiently “excited” (a more formal definition is given
in Definition 4.3), which typically requires injecting some form of exploration noise in the
control input that may detract from performance. To formalize the conditions under which
parameter convergence may be achieved, we start by constructing a linear regression model
for estimating the uncertain parameters θ. To this end, note that, along a given state-control
trajectory t → (x(t), u(t)), the uncertain parameters satisfy the relation

ẋ(t) − f (x(t)) − g(x(t))u(t) = F(x(t))θ, ∀t ≥ 0.

To remove the dependence of the above relation² on ẋ, we note that, by integrating the above over a finite time interval [t − Δt, t] ⊂ R≥0, Δt ∈ R≥0, the relation can be equivalently expressed as

x(t) − x(t − Δt) − ∫_{max{t−Δt,0}}^{t} ( f (x(s)) + g(x(s))u(s)) ds = ∫_{max{t−Δt,0}}^{t} F(x(s)) ds θ,

for all t ≥ 0. Defining

Y(t) := x(t) − x(t − Δt) − ∫_{max{t−Δt,0}}^{t} ( f (x(s)) + g(x(s))u(s)) ds,
ℱ(t) := ∫_{max{t−Δt,0}}^{t} F(x(s)) ds,   (4.17)

yields the following linear regression equation for θ:

Y(t) = ℱ(t)θ.   (4.18)

The integrals in (4.17) can be computed using measurements of the system state and control
input using standard numerical integration routines. Given a parameter estimate θ̂, we can
then compute the prediction error

e(θ̂, t) = Y(t) − ℱ(t)θ̂.   (4.19)

If the ultimate objective of the parameter estimator were to drive θ̂ to θ , then one possible
approach would be to update the estimates to minimize the squared prediction error

2 In practical applications, ẋ may contain quantities such as acceleration that may not be directly available for measurement. It is also possible to work with an estimate x̂˙ of the state derivative; however, this could require numerically differentiating state measurements, which could produce a noisy estimate of ẋ.

E(θ̂, t) = (1/2)‖e(θ̂, t)‖²,   (4.20)

which could be done using gradient descent

θ̂˙ = −Γ ∇_θ̂ E(θ̂, t) = Γ ℱ(t)⊤ (Y(t) − ℱ(t)θ̂),   (4.21)

where Γ ∈ R p×p is positive definite. Using Lyapunov-based tools it is straightforward to
show that such an approach ensures that the parameter estimates remain bounded.
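The construction of (Y(t), ℱ(t)) and the update (4.21) are straightforward to realize from logged data. The sketch below (our own illustration, reusing the scalar system ẋ = θx + u from earlier with placeholder values, and taking Δt to be the whole logging window) approximates the integrals in (4.17) with the trapezoidal rule and then runs a discretized version of (4.21):

```python
import numpy as np

def trapezoid(vals, dt):
    # trapezoidal quadrature over uniformly spaced samples
    return dt * (vals[1:] + vals[:-1]).sum(axis=0) / 2.0

def regression_pair(xs, us, f, g, F, dt):
    # Assemble (Y, script-F) from (4.17) over a window of logged samples
    integrand = np.array([f(x) + g(x) * u for x, u in zip(xs, us)])
    Y = xs[-1] - xs[0] - trapezoid(integrand, dt)
    F_script = trapezoid(np.array([F(x) for x in xs]), dt)
    return Y, F_script

# Log data from x_dot = theta*x + u (theta hidden from the estimator)
theta, dt = 2.0, 1e-2
xs, us = [1.0], []
for _ in range(100):
    us.append(-3.0 * xs[-1])
    xs.append(xs[-1] + dt * (theta * xs[-1] + us[-1]))
us.append(-3.0 * xs[-1])
xs, us = np.array(xs), np.array(us)

Y, Fs = regression_pair(xs, us, f=lambda x: 0.0, g=lambda x: 1.0,
                        F=lambda x: x, dt=dt)
theta_hat, gamma = 0.0, 1.0
for _ in range(200):                            # discretized update (4.21)
    theta_hat += gamma * Fs * (Y - Fs * theta_hat)
print(theta_hat)   # close to theta = 2, up to integration error
```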

Lemma 4.1 Let t → θ̂ (t) be generated by (4.21). Then, the parameter estimation error
t → θ̃(t) remains bounded for all time.

Proof Consider the Lyapunov function candidate V (θ̃) = (1/2) θ̃⊤ Γ⁻¹ θ̃. The Lie derivative of V along the parameter estimation dynamics can be computed as

V̇ = −θ̃⊤ Γ⁻¹ θ̂˙
  = −θ̃⊤ ℱ(t)⊤ (Y(t) − ℱ(t)θ̂)
  = −θ̃⊤ ℱ(t)⊤ ℱ(t) θ̃
  ≤ 0,

where the final inequality follows from the fact that ℱ(t)⊤ℱ(t) is at least positive semi-definite. As V̇ ≤ 0, the Lyapunov function candidate is non-increasing along trajectories of the parameter estimation error: V (θ̃(t)) ≤ V (θ̃(0)) for all t ≥ 0. Using the bounds on V it follows that for all t ≥ 0

‖θ̃(t)‖ ≤ √(λmax(Γ⁻¹)/λmin(Γ⁻¹)) ‖θ̃(0)‖ = √(λmax(Γ)/λmin(Γ)) ‖θ̃(0)‖,   (4.22)

which implies that the parameter estimation error is bounded for all time. 

Similar to the Lyapunov-based parameter estimators outlined in the previous section, there
is no guarantee that the parameter estimates will converge to their true values. Traditionally,
convergence of the parameter estimates can only be ensured when the system trajectories
are persistently excited.

Definition 4.3 (Persistence of excitation) A matrix-valued signal ℱ : R≥0 → Rn×p is said to be persistently excited if there exist positive constants T, c ∈ R>0 such that for all t ∈ R≥0

∫_{t}^{t+T} ℱ(s)⊤ ℱ(s) ds ≥ c I p×p.   (4.23)

The persistence of excitation (PE) condition implies that over any given finite time interval [t, t + T ] ⊂ R≥0, the matrix ∫_{t}^{t+T} ℱ(s)⊤ℱ(s) ds is positive definite, and is historically the condition that has been required for convergence of the parameter estimates in adaptive control. We do not formally show this here, and instead direct the reader to Sect. 4.5 at the end of this chapter for more details. Imposing the PE condition for parameter convergence is
challenging for multiple reasons, especially for nonlinear systems. The PE condition cannot
be verified a priori since it depends on knowledge of the systems trajectories, which are
unknown, and is often not possible to check at run-time since it requires reasoning about
all possible future behaviors of the system. From a more practical standpoint, achieving
PE often requires exciting the system by injecting a probing signal into the control input,
which could cause unexpected behaviors that may be especially undesirable in safety-critical
systems.

4.2.2 Concurrent Learning

Concurrent learning is an adaptive control technique that allows for enforcing parameter
convergence under much weaker conditions than the PE condition. The main idea behind
such an approach is to store input-output data that is observed along the system trajectory
in a history stack, which is then leveraged in the parameter update law to facilitate param-
eter convergence. That is, rather than simply using the data observed at the current time t
to update the parameter estimates, as in (4.21), we leverage historical data from previous
time steps t1 < t2 < t3 < · · · < t to improve the parameter estimates. The term “Concurrent
Learning” comes from the fact that, in such an approach, instantaneous data is used concur-
rently with historical data to improve parameter convergence. Intuitively, if this historical
data is sufficiently rich (i.e., if the recorded input-output data from previous time-steps are
sufficiently distinct from one another), then one can show convergence of the parameters to
their true values. We formalize this intuition by defining the notion of a history stack.

Definition 4.4 (History stack) A collection of tuples of the form H(t) = {(Y_j (t), ℱ_j (t))}_{j=1}^{M} is said to be a history stack with M ∈ N entries for system (4.1) at time t ∈ R≥0 if each tuple satisfies

Y_j (t) = ℱ_j (t)θ.

The piecewise continuous mappings Y_j : R≥0 × N → Rn and ℱ_j : R≥0 × N → Rn×p associate to each t ∈ R≥0 and each j ∈ {1, . . . , M} the values of Y(ti) and ℱ(ti) as defined in (4.17) recorded at some previous point in time ti ≤ t.

A history stack may be initially empty, in which case we define F j (0) = 0, Y j (0) = 0 for all
j ∈ {1, . . . , M}, or may contain pre-recorded data from an auxiliary dataset collected offline.
Input-output tuples of data generated along a state-control trajectory t → (x(t), u(t)) can
be stored in H by filling in entries that were initially equal to zero, or by replacing previous

tuples with new tuples. Algorithms for deciding when to store a given tuple (Y(t), F (t))
in H will be discussed shortly. The following definition outlines the key condition that a
history stack must satisfy to guarantee convergence of the parameter estimates.

Definition 4.5 (Finite excitation) A history stack $\mathcal{H}$ is said to satisfy the finite excitation (FE) condition if there exists a time $T \in \mathbb{R}_{\geq 0}$ and a positive constant $\lambda \in \mathbb{R}_{>0}$ such that
$$\inf_{t\in[T,\infty)} \lambda_{\min}\Bigg(\sum_{j=1}^M \mathcal{F}_j(t)^\top \mathcal{F}_j(t)\Bigg) \geq \lambda > 0. \tag{4.24}$$

The above definition states that a history stack $\mathcal{H}$ satisfies the finite excitation (FE) condition if there exists some finite time $T$ such that the matrix $\sum_{j=1}^M \mathcal{F}_j(t)^\top \mathcal{F}_j(t)$ is positive definite for all $t \geq T$. The following theorem shows that satisfaction of the finite excitation condition is sufficient for parameter convergence.

Theorem 4.3 Consider system (4.1) and let $\mathcal{H}$ be a history stack for (4.1). If the estimated parameters are updated according to
$$\dot{\hat{\theta}} = \Gamma \sum_{j=1}^M \mathcal{F}_j(t)^\top \big[\mathcal{Y}_j(t) - \mathcal{F}_j(t)\hat{\theta}\big], \tag{4.25}$$
where $\Gamma \in \mathbb{R}^{p\times p}$ is positive definite, and $\mathcal{H}$ satisfies the finite excitation condition, then the parameter estimation error $t \mapsto \tilde{\theta}(t)$ exponentially converges to zero in the sense that, for all $t \in \mathbb{R}_{\geq 0}$,
$$\|\tilde{\theta}(t)\| \leq \sqrt{\frac{\lambda_{\max}(\Gamma)}{\lambda_{\min}(\Gamma)}}\,\|\tilde{\theta}(0)\|\,e^{-\lambda\lambda_{\min}(\Gamma)(t-T)}. \tag{4.26}$$

Proof Consider the Lyapunov function candidate $V(\tilde{\theta}) = \frac{1}{2}\tilde{\theta}^\top \Gamma^{-1}\tilde{\theta}$, which satisfies
$$\tfrac{1}{2\lambda_{\max}(\Gamma)}\|\tilde{\theta}\|^2 = \tfrac{1}{2}\lambda_{\min}(\Gamma^{-1})\|\tilde{\theta}\|^2 \leq V(\tilde{\theta}) \leq \tfrac{1}{2}\lambda_{\max}(\Gamma^{-1})\|\tilde{\theta}\|^2 = \tfrac{1}{2\lambda_{\min}(\Gamma)}\|\tilde{\theta}\|^2, \tag{4.27}$$
for all $\tilde{\theta} \in \mathbb{R}^p$. The Lie derivative of $V$ along the parameter estimation dynamics can be computed as
$$\dot{V}(\tilde{\theta}, t) = -\tilde{\theta}^\top \Gamma^{-1}\dot{\hat{\theta}} = -\tilde{\theta}^\top \sum_{j=1}^M \mathcal{F}_j(t)^\top\big[\mathcal{Y}_j(t) - \mathcal{F}_j(t)\hat{\theta}\big] = -\tilde{\theta}^\top \sum_{j=1}^M \mathcal{F}_j(t)^\top \mathcal{F}_j(t)\tilde{\theta}.$$
For any $t \geq 0$, the matrix $\sum_{j=1}^M \mathcal{F}_j(t)^\top \mathcal{F}_j(t)$ is at least positive semi-definite, implying that $\dot{V}(\tilde{\theta}, t) \leq 0$ for all $t \geq 0$ and thus $V(\tilde{\theta}(t), t) \leq V(\tilde{\theta}(0), 0)$ for all $t \geq 0$. Provided $\mathcal{H}$ satisfies the finite excitation condition, then for all $t \geq T$, the matrix $\sum_{j=1}^M \mathcal{F}_j(t)^\top \mathcal{F}_j(t)$ is positive definite, allowing $\dot{V}$ to be further bounded as
$$\dot{V}(\tilde{\theta}, t) \leq -\lambda\|\tilde{\theta}\|^2 \leq -2\lambda\lambda_{\min}(\Gamma)V(\tilde{\theta}, t), \quad \forall t \geq T. \tag{4.28}$$
Invoking the comparison lemma to solve the above differential inequality over the interval $[T, \infty)$ then yields
$$V(\tilde{\theta}(t), t) \leq V(\tilde{\theta}(T), T)e^{-2\lambda\lambda_{\min}(\Gamma)(t-T)} \leq V(\tilde{\theta}(0), 0)e^{-2\lambda\lambda_{\min}(\Gamma)(t-T)}, \tag{4.29}$$
where the second inequality follows from the fact that $V(\tilde{\theta}(t), t) \leq V(\tilde{\theta}(0), 0)$ for all $t \geq 0$. Note that the above bound is also valid for all $t \geq 0$ as
$$V(\tilde{\theta}(0), 0) \leq V(\tilde{\theta}(0), 0)e^{-2\lambda\lambda_{\min}(\Gamma)(t-T)}, \quad \forall t \in [0, T].$$
Combining the bounds in (4.27) with those in (4.29) yields (4.26), as desired. $\square$

The main assumption imposed in the previous theorem is that $\mathcal{H}$ satisfies the FE condition, which requires the existence of a time $T$ such that the matrix $\sum_{j=1}^M \mathcal{F}_j(t)^\top \mathcal{F}_j(t)$ is positive definite for all time thereafter. In general, ensuring that $\sum_{j=1}^M \mathcal{F}_j(t)^\top \mathcal{F}_j(t)$ becomes positive definite after some finite time is challenging, since, just like the PE condition, this would require reasoning about future behavior of the system trajectory. However, there exist methods for ensuring that data is added to $\mathcal{H}$ in such a way that $\lambda_{\min}\big(\sum_{j=1}^M \mathcal{F}_j(t)^\top \mathcal{F}_j(t)\big)$ is non-decreasing in time. Hence, if one can verify that $\sum_{j=1}^M \mathcal{F}_j(t)^\top \mathcal{F}_j(t)$ is positive definite at a single instant in time (e.g., by periodically checking its minimum eigenvalue), then one can verify satisfaction of the finite excitation condition. An algorithm for accomplishing this objective is outlined in Algorithm 1. For ease of presenting the algorithm, we define
$$\lambda_{\min}(\mathcal{H}(t)) := \lambda_{\min}\Bigg(\sum_{j=1}^M \mathcal{F}_j(t)^\top \mathcal{F}_j(t)\Bigg),$$
for a history stack $\mathcal{H}$, and
$$\mathcal{H}_j(t) := (\mathcal{Y}_j(t), \mathcal{F}_j(t)),$$
as the tuple of data present in the $j$th slot of $\mathcal{H}$ at time $t$. This algorithm, referred to as the Singular Value Maximizing Algorithm, takes as inputs an initial history stack $\mathcal{H}(0)$ and a tolerance threshold for adding new data $\varepsilon \in \mathbb{R}_{>0}$. At each time instant $t$, the algorithm checks if the value of $\mathcal{F}(t)$ is sufficiently distinct from the value of $\mathcal{F}_i(t)$, with $i \in \{1, \ldots, M\}$ denoting the index of the slot most recently updated (initialized to the first slot), according to the tolerance threshold $\varepsilon$. If the current value of $\mathcal{F}(t)$ is sufficiently different from the previously recorded value and $\mathcal{H}$ is not yet full (i.e., if $i < M$), then the current tuple $(\mathcal{Y}(t), \mathcal{F}(t))$ is stored in slot $i+1$ of $\mathcal{H}$. If the current value of $\mathcal{F}(t)$ is sufficiently different from the previously recorded value and $\mathcal{H}$ is full (i.e., if $i = M$), then the algorithm checks whether adding the current tuple $(\mathcal{Y}(t), \mathcal{F}(t))$ to $\mathcal{H}$ will increase $\lambda_{\min}(\mathcal{H}(t))$. This process entails replacing the tuple of data present in each slot with the current tuple $(\mathcal{Y}(t), \mathcal{F}(t))$ and checking whether the minimum eigenvalue of the history stack with the new data is larger than that of the original stack without the data added. If replacing the existing data in a particular slot leads to an increase in $\lambda_{\min}(\mathcal{H}(t))$, then this newly formed history stack replaces the old history stack. If replacing the existing data in multiple slots leads to an increase in $\lambda_{\min}(\mathcal{H}(t))$, then the current tuple $(\mathcal{Y}(t), \mathcal{F}(t))$ is stored in the slot whose replacement results in the largest increase of $\lambda_{\min}(\mathcal{H}(t))$. If it is not possible to increase $\lambda_{\min}(\mathcal{H}(t))$ by replacing existing data with new data, then no changes to the history stack are made, thereby ensuring that $\lambda_{\min}(\mathcal{H}(t))$ is non-decreasing in time.

Algorithm 1 Singular Value Maximizing Algorithm

Require: History stack $\mathcal{H}(0)$ at $t = 0$ and a tolerance for adding new data $\varepsilon \in \mathbb{R}_{>0}$
  $i \leftarrow 1$ ⊳ Set stack index to 1
  if $\|\mathcal{F}(t) - \mathcal{F}_i(t)\| / \|\mathcal{F}_i(t)\| \geq \varepsilon$ then ⊳ Check if current data is different enough
    if $i < M$ then ⊳ If stack is not full
      $i \leftarrow i + 1$ ⊳ Bump index by 1
      $\mathcal{H}_i(t) \leftarrow (\mathcal{Y}(t), \mathcal{F}(t))$ ⊳ Record current data in $i$th slot of stack
    else ⊳ If stack is full
      $\mathcal{H}_{\mathrm{temp}} \leftarrow \mathcal{H}$ ⊳ Copy data in current stack
      $\lambda_{\mathrm{old}} \leftarrow \lambda_{\min}(\mathcal{H}_{\mathrm{temp}}(t))$ ⊳ Compute minimum eigenvalue of current stack
      $\Lambda \leftarrow \emptyset$
      for $j \in \{1, \ldots, M\}$ do ⊳ For each entry in the stack
        $\mathcal{H}_{\mathrm{temp},j}(t) \leftarrow (\mathcal{Y}(t), \mathcal{F}(t))$ ⊳ Replace data in $j$th slot with current data
        $\lambda_j \leftarrow \lambda_{\min}(\mathcal{H}_{\mathrm{temp}}(t))$ ⊳ Compute minimum eigenvalue
        $\Lambda \leftarrow \Lambda \cup \{\lambda_j\}$ ⊳ Save minimum eigenvalue
        $\mathcal{H}_{\mathrm{temp}} \leftarrow \mathcal{H}$ ⊳ Reset the temporary stack
      end for
      $\lambda_{\mathrm{new}} \leftarrow \max \Lambda$ ⊳ Get the largest minimum eigenvalue
      $k \leftarrow \arg\max \Lambda$
      if $\lambda_{\mathrm{new}} > \lambda_{\mathrm{old}}$ then
        $\mathcal{H}_k(t) \leftarrow (\mathcal{Y}(t), \mathcal{F}(t))$ ⊳ Replace old data with new data
      end if
    end if
  end if
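For concreteness, the following Python transcription of Algorithm 1 is a minimal sketch using zero-based indexing; the data layout (a list of (Y_j, F_j) array pairs, with unfilled slots holding zero arrays) and the helper names are our own assumptions:

```python
import numpy as np

def stack_lambda_min(stack):
    """lambda_min(H(t)): smallest eigenvalue of sum_j F_j^T F_j."""
    S = sum(Fj.T @ Fj for _, Fj in stack)
    return np.linalg.eigvalsh(S)[0]

def svm_update(stack, i, Y, F, eps):
    """One pass of Algorithm 1 (zero-based indexing).

    stack : list of M tuples (Y_j, F_j); unfilled slots hold zero arrays.
    i     : index of the most recently updated slot.
    (Y, F): current integrated data pair, satisfying Y = F @ theta.
    """
    M = len(stack)
    _, F_i = stack[i]
    denom = np.linalg.norm(F_i)
    # Accept the new pair only if it differs enough from the last stored one
    # (an all-zero slot is treated as "different enough").
    if denom == 0.0 or np.linalg.norm(F - F_i) / denom >= eps:
        if i < M - 1:                        # stack not yet full
            stack[i + 1] = (Y, F)
            return stack, i + 1
        lam_old = stack_lambda_min(stack)    # stack is full: try swaps
        lams = []
        for j in range(M):
            trial = list(stack)
            trial[j] = (Y, F)
            lams.append(stack_lambda_min(trial))
        k = int(np.argmax(lams))
        if lams[k] > lam_old:                # commit only an improving swap
            stack[k] = (Y, F)
    return stack, i
```

Because a swap is committed only when it strictly increases the minimum eigenvalue, the stored excitation is never lost once recorded.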

4.3 Exponentially Stabilizing Adaptive CLFs

In the previous section we demonstrated that the convergence properties of traditional param-
eter estimation routines could be enhanced by leveraging a history stack of recorded input-
output data. When such data is sufficiently rich (as characterized by the minimum eigenvalue
of a recorded data matrix), such concurrent learning parameter estimators ensure exponential
convergence of the parameter estimation error to zero. In the present section, we exploit this
exponential convergence to endow adaptive controllers with exponential stability guarantees.
Our development begins by specializing the notion of an aCLF from Definition 4.1.

Definition 4.6 (Exponentially stabilizing adaptive CLF) A Lyapunov function candidate $V : \mathbb{R}^n \to \mathbb{R}_{\geq 0}$ is said to be an exponentially stabilizing adaptive control Lyapunov function (ES-aCLF) for (4.1) if there exist positive constants $c_1, c_2, c_3 \in \mathbb{R}_{>0}$ such that for all $x \in \mathbb{R}^n$
$$c_1\|x\|^2 \leq V(x) \leq c_2\|x\|^2, \tag{4.30}$$
and for all $x \in \mathbb{R}^n \setminus \{0\}$ and $\hat{\theta} \in \mathbb{R}^p$
$$\inf_{u\in U}\big\{L_f V(x) + L_F V(x)\hat{\theta} + L_g V(x)u\big\} < -c_3\|x\|^2. \tag{4.31}$$

Following the same recipe outlined in Chap. 2, given an ES-aCLF we construct the set-valued map
$$K_{\text{es-aclf}}(x, \hat{\theta}) := \{u \in U \,|\, L_f V(x) + L_F V(x)\hat{\theta} + L_g V(x)u \leq -c_3\|x\|^2\}, \tag{4.32}$$

that assigns to each (x, θ̂) a set of control inputs satisfying the conditions from Definition 4.6.
The following theorem demonstrates that any controller satisfying the conditions of Defini-
tion 4.6 renders the origin of the composite system (4.4) exponentially stable provided the
parameter estimates are updated using a history stack that satisfies the FE condition.

Theorem 4.4 For system (4.1), let $\mathcal{H}$ be a history stack and $V$ be an ES-aCLF. Suppose the estimated parameters are updated according to
$$\dot{\hat{\theta}} = \Gamma\Bigg(L_F V(x)^\top + \gamma_c \sum_{j=1}^M \mathcal{F}_j(t)^\top\big[\mathcal{Y}_j(t) - \mathcal{F}_j(t)\hat{\theta}\big]\Bigg), \tag{4.33}$$
where $\Gamma \in \mathbb{R}^{p\times p}$ is positive definite and $\gamma_c \in \mathbb{R}_{>0}$. If $\mathcal{H}$ satisfies the FE condition, then any controller $u = k(x, \hat{\theta})$ locally Lipschitz on $(\mathbb{R}^n \setminus \{0\}) \times \mathbb{R}^p$ satisfying $k(x, \hat{\theta}) \in K_{\text{es-aclf}}(x, \hat{\theta})$ for all $(x, \hat{\theta}) \in \mathbb{R}^n \times \mathbb{R}^p$ renders the origin of the composite system (4.4) exponentially stable in the sense that

$$\left\|\begin{bmatrix} x(t) \\ \tilde{\theta}(t) \end{bmatrix}\right\| \leq \sqrt{\frac{\eta_2}{\eta_1}}\,\left\|\begin{bmatrix} x(0) \\ \tilde{\theta}(0) \end{bmatrix}\right\| e^{-\frac{\eta_3}{2\eta_2}(t-T)}, \quad \forall t \geq 0, \tag{4.34}$$
where
$$\eta_1 := \min\Big\{c_1, \tfrac{1}{2\lambda_{\max}(\Gamma)}\Big\}, \quad \eta_2 := \max\Big\{c_2, \tfrac{1}{2\lambda_{\min}(\Gamma)}\Big\}, \quad \eta_3 := \min\{c_3, \gamma_c\lambda\}.$$

Proof Consider the composite Lyapunov function candidate
$$V_a(x, \tilde{\theta}) := V(x) + \tfrac{1}{2}\tilde{\theta}^\top \Gamma^{-1}\tilde{\theta},$$
which satisfies
$$\eta_1\left\|\begin{bmatrix} x \\ \tilde{\theta} \end{bmatrix}\right\|^2 \leq V_a(x, \tilde{\theta}) \leq \eta_2\left\|\begin{bmatrix} x \\ \tilde{\theta} \end{bmatrix}\right\|^2, \quad \forall (x, \tilde{\theta}) \in \mathbb{R}^n \times \mathbb{R}^p. \tag{4.35}$$
Computing the Lie derivative of $V_a$ along the composite system dynamics yields
$$\begin{aligned}
\dot{V}_a(x, \tilde{\theta}, t) &= L_f V(x) + L_F V(x)\hat{\theta} + L_g V(x)k(x, \hat{\theta}) + L_F V(x)\tilde{\theta} \\
&\quad - \tilde{\theta}^\top L_F V(x)^\top - \gamma_c\tilde{\theta}^\top \sum_{j=1}^M \mathcal{F}_j(t)^\top\big[\mathcal{Y}_j(t) - \mathcal{F}_j(t)\hat{\theta}\big] \\
&= L_f V(x) + L_F V(x)\hat{\theta} + L_g V(x)k(x, \hat{\theta}) - \gamma_c\tilde{\theta}^\top \sum_{j=1}^M \mathcal{F}_j(t)^\top \mathcal{F}_j(t)\tilde{\theta} \\
&\leq -c_3\|x\|^2 - \gamma_c\tilde{\theta}^\top \sum_{j=1}^M \mathcal{F}_j(t)^\top \mathcal{F}_j(t)\tilde{\theta},
\end{aligned} \tag{4.36}$$
where the last inequality follows from the definition of $k$. For any $t \geq 0$, the matrix $\sum_{j=1}^M \mathcal{F}_j(t)^\top \mathcal{F}_j(t)$ is at least positive semi-definite, allowing (4.36) to be further bounded as
$$\dot{V}_a(x, \tilde{\theta}, t) \leq -c_3\|x\|^2 \leq 0. \tag{4.37}$$
As $\dot{V}_a$ is negative semi-definite for all $t$, the origin of the composite system is stable and $V_a(x(t), \tilde{\theta}(t), t) \leq V_a(x(0), \tilde{\theta}(0), 0)$ for all $t \geq 0$. Moreover, by Theorem 4.1 we have that $\lim_{t\to\infty} c_3\|x(t)\|^2 = 0$, which implies $\lim_{t\to\infty} x(t) = 0$. Provided $\mathcal{H}$ satisfies the FE condition, then for all $t \geq T$, the matrix $\sum_{j=1}^M \mathcal{F}_j(t)^\top \mathcal{F}_j(t)$ is positive definite, allowing (4.36) to be further bounded as
$$\dot{V}_a(x, \tilde{\theta}, t) \leq -c_3\|x\|^2 - \gamma_c\lambda\|\tilde{\theta}\|^2 \leq -\frac{\eta_3}{\eta_2}V_a(x, \tilde{\theta}), \quad \forall t \geq T. \tag{4.38}$$
Invoking the comparison lemma to solve the above differential inequality over the interval $[T, \infty)$ yields
$$V_a(x(t), \tilde{\theta}(t), t) \leq V_a(x(T), \tilde{\theta}(T), T)e^{-\frac{\eta_3}{\eta_2}(t-T)} \leq V_a(x(0), \tilde{\theta}(0), 0)e^{-\frac{\eta_3}{\eta_2}(t-T)}, \tag{4.39}$$
where the second inequality follows from the observation that $V_a(x(t), \tilde{\theta}(t), t) \leq V_a(x(0), \tilde{\theta}(0), 0)$ for all $t \geq 0$. Note that the above bound is also valid for all $t \geq 0$ as
$$V_a(x(0), \tilde{\theta}(0), 0) \leq V_a(x(0), \tilde{\theta}(0), 0)e^{-\frac{\eta_3}{\eta_2}(t-T)}, \quad \forall t \in [0, T].$$
Rearranging terms using the bounds on $V_a$ from (4.35) yields the bound in (4.34), as desired. $\square$

The previous theorem demonstrates that concurrent learning provides a pathway towards
guaranteeing both parameter convergence and exponential stability in the context of adaptive
control. An interesting consequence of the above theorem is that asymptotic stability of
x = 0 is guaranteed regardless of the satisfaction of the FE condition. That is, exploiting the
recorded data can only aid in the stabilization task–if the FE condition is not satisfied then
the overall control objective is still achieved, albeit not in an exponential fashion.
Similar to the aCLFs of Sect. 4.1, when the parameters in (4.1) are matched, the con-
struction of an ES-aCLF can be performed independently of the uncertainty.

Proposition 4.2 Let V : Rn → R≥0 be an ES-CLF for ẋ = f (x) + g(x)u with U = Rm .


If the parameters in (4.1) are matched, then V is an ES-aCLF for (4.1) with U = Rm .

Proof The proof follows the same argument as that of Proposition 4.1. 

Once an ES-aCLF is known, controllers satisfying the conditions of Definition 4.6 can be computed through the QP
$$\begin{aligned}
k(x, \hat{\theta}) = \arg\min_{u\in U} \quad & \tfrac{1}{2}\|u\|^2 \\
\text{subject to} \quad & L_f V(x) + L_F V(x)\hat{\theta} + L_g V(x)u \leq -c_3\|x\|^2,
\end{aligned} \tag{4.40}$$
which admits a closed-form solution when $U = \mathbb{R}^m$ that is locally Lipschitz on $(\mathbb{R}^n \setminus \{0\}) \times \mathbb{R}^p$ as
$$k(x, \hat{\theta}) = \begin{cases} 0 & \text{if } \psi(x, \hat{\theta}) \leq 0, \\[1ex] -\dfrac{\psi(x, \hat{\theta})}{\|L_g V(x)\|^2}L_g V(x)^\top & \text{if } \psi(x, \hat{\theta}) > 0, \end{cases} \tag{4.41}$$
where $\psi(x, \hat{\theta}) := L_f V(x) + L_F V(x)\hat{\theta} + c_3\|x\|^2$.
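When $U = \mathbb{R}^m$, the closed-form solution (4.41) is straightforward to implement; the sketch below is illustrative, with the Lie-derivative callables standing in for a user-supplied model:

```python
import numpy as np

def es_aclf_control(x, theta_hat, LfV, LFV, LgV, c3):
    """Closed-form solution (4.41) of the pointwise min-norm QP (4.40).

    LfV, LFV, LgV : callables returning L_f V(x) (scalar), L_F V(x)
                    (length-p array), and L_g V(x) (length-m array).
    """
    psi = LfV(x) + LFV(x) @ theta_hat + c3 * (x @ x)   # psi(x, theta_hat)
    Lg = LgV(x)
    if psi <= 0:
        return np.zeros_like(Lg)      # decrease condition already holds at u = 0
    return -(psi / (Lg @ Lg)) * Lg    # min-norm correction along L_g V(x)^T
```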



4.4 Numerical Examples

Example 4.1 (Inverted pendulum (revisited)) Consider again an unstable nonlinear system in the form of the inverted pendulum from (2.56)
$$m\ell^2\ddot{q} - mg\ell\sin(q) = u - b\dot{q},$$
with state $x = [q \ \dot{q}]^\top \in \mathbb{R}^2$, where we assume that the constants $b, g$ are unknown. Defining the uncertain parameter vector as $\theta := [g \ b]^\top \in \mathbb{R}^2$ allows this system to be put into the form of (4.1) as
$$\dot{x} = \underbrace{\begin{bmatrix} \dot{q} \\ 0 \end{bmatrix}}_{f(x)} + \underbrace{\begin{bmatrix} 0 & 0 \\ \frac{1}{\ell}\sin(q) & -\frac{1}{m\ell^2}\dot{q} \end{bmatrix}}_{F(x)}\underbrace{\begin{bmatrix} g \\ b \end{bmatrix}}_{\theta} + \underbrace{\begin{bmatrix} 0 \\ \frac{1}{m\ell^2} \end{bmatrix}}_{g(x)}u. \tag{4.42}$$

Our main objective is to design an aCLF-based controller that stabilizes the origin of the above system. Note that the uncertainty is matched since there exists a $\phi : \mathbb{R}^n \to \mathbb{R}^{1\times 2}$ satisfying
$$F(x) = \underbrace{\begin{bmatrix} 0 \\ \frac{1}{m\ell^2} \end{bmatrix}}_{g(x)}\underbrace{\begin{bmatrix} m\ell\sin(q) & -\dot{q} \end{bmatrix}}_{\phi(x)}, \tag{4.43}$$
which implies that an aCLF can be constructed by finding a CLF for the simple double integrator
$$\dot{x} = \underbrace{\begin{bmatrix} \dot{q} \\ 0 \end{bmatrix}}_{f(x)} + \underbrace{\begin{bmatrix} 0 \\ \frac{1}{m\ell^2} \end{bmatrix}}_{g(x)}u, \tag{4.44}$$

Fig. 4.1 Trajectory of the inverted pendulum under the ES-aCLF controller (blue curve) and aCLF controller (orange curve). Both trajectories start from an initial condition of $x_0 = [\frac{\pi}{6} \ 0]^\top$ and converge to the origin

Fig. 4.2 State (top) and parameter estimate (bottom) trajectories for the inverted pendulum under
the ES-aCLF controller and aCLF controller. In each plot the solid curves correspond to results
generated by the ES-aCLF controller and the dotted curves correspond to those generated by the
aCLF controller. In the bottom plot, the dashed lines of corresponding color denote the true values
of the unknown parameters

which could be performed using the methods from Sect. 2.3 or by solving a linear quadratic regulator (LQR) problem for the nominal linear system. In what follows we demonstrate numerically the theoretical claim that concurrent learning-based parameter estimators achieve convergence of the parameter estimates under weaker conditions than persistence of excitation. For the simulations we take as our aCLF the function $V(x) = q^2 + \frac{1}{2}\dot{q}^2 + q\dot{q}$ and choose $\Gamma = I_{2\times 2}$, $\gamma_c = 20$ as learning gains used in the ES-aCLF update law (4.33). The data used in the update law is stored in a history stack with $M = 30$ entries, where the integration window used to generate the data is chosen as $\Delta t = 0.2$ s.
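A minimal closed-loop sketch of this setup is given below; the physical constants, step size, and the omission of the concurrent-learning sum in the update law are our own simplifications for brevity, not the simulation code used to generate the figures:

```python
import numpy as np

# Illustrative constants (assumptions): unit mass/length pendulum.
m, ell = 1.0, 1.0
theta_true = np.array([9.81, 0.5])            # true [g, b]

def F_reg(x):                                  # regressor F(x) in (4.42)
    q, qd = x
    return np.array([[0.0, 0.0],
                     [np.sin(q) / ell, -qd / (m * ell**2)]])

f = lambda x: np.array([x[1], 0.0])            # drift f(x)
g = np.array([0.0, 1.0 / (m * ell**2)])        # input vector g(x)
gradV = lambda x: np.array([2 * x[0] + x[1],   # V = q^2 + qd^2/2 + q*qd
                            x[1] + x[0]])

def k(x, th, c3=1.0):                          # closed-form controller (4.41)
    psi = gradV(x) @ (f(x) + F_reg(x) @ th) + c3 * (x @ x)
    LgV = gradV(x) @ g                         # scalar here since m = 1 input
    return 0.0 if psi <= 0 else -psi / LgV

x, th_hat, dt = np.array([np.pi / 6, 0.0]), np.zeros(2), 1e-3
for _ in range(20_000):                        # 20 s of Euler integration
    u = k(x, th_hat)
    # Update law (4.33) with Gamma = I; concurrent-learning sum omitted here.
    th_hat = th_hat + dt * (F_reg(x).T @ gradV(x))
    x = x + dt * (f(x) + F_reg(x) @ theta_true + g * u)
```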

In the simulations we compare the performance of the ES-aCLF induced controller with that obtained using a standard aCLF (i.e., with no additional data to enforce parameter convergence), the results of which are illustrated in Figs. 4.1 and 4.2. In particular, the plot in Fig. 4.1 depicts the trajectory of the inverted pendulum in the $q \times \dot{q}$ plane, where each trajectory can be seen to converge from its initial condition to the origin. This is also illustrated in Fig. 4.2 (top), which provides the evolution of the pendulum's states over time. The trajectories under each controller are similar; however, the states under the ES-aCLF exhibit less overshoot than those produced by the aCLF controller. Moreover, the parameters under the ES-aCLF controller converge to their true values (see Fig. 4.2 (bottom)), a property that will become very important in the following chapter when extending adaptive control techniques to safety-critical systems.

4.5 Notes

In this chapter we provided a brief introduction to adaptive control of nonlinear systems from a control Lyapunov perspective. Central to our approach was the idea of concurrent learning, in which instantaneous data is used alongside recorded data to learn the uncertain parameters online. Early works in adaptive control primarily focused on the control of
uncertain linear systems using the framework of model reference adaptive control (MRAC),
wherein the objective is to update the parameters of the controller so that the system states
converge to those of a suitably constructed reference model. A more in-depth introduction
to early results in the adaptive control can be found in several textbooks such as [1–3], with
extensions to nonlinear systems developed a few decades later [4–6]. More details on the
persistence of excitation (PE) condition can be found in any of the aforementioned textbooks
on adaptive control.
Historically, adaptive control methods have been categorized as either direct or indirect.
Direct adaptive control typically parameterizes a controller whose parameters are updated
directly to accomplish a control objective. Indirect adaptive control typically estimates the
system’s uncertain parameters without regard to the underlying control objective, which is
then used to compute control actions. The distinction between direct and indirect adaptive
control dominates the literature on linear adaptive control; however, for nonlinear systems
such a distinction is generally blurred since the parameters used in the controller are typi-
cally the estimated system parameters themselves. In the context of nonlinear systems, adap-
tive control designs are often classified as either Lyapunov-based 3 or estimation-based. In
Lyapunov-based designs, estimates of the uncertain system parameters are typically updated
to satisfy the Lyapunov conditions for stability without regard to how accurate such esti-
mates are with respect to the true values of the parameters. On the other hand, estimation-
based designs typically update the estimated parameters to minimize the prediction error
without considering how such estimates may affect the stability of the closed-loop system.

3 More generally, such designs can be classified as certificate-based since they are also applicable to
other certificate functions such as barrier functions.

Approaches that combine the benefits of Lyapunov-based and estimation-based (direct and
indirect) are often referred to as composite adaptive controllers [7].
The control Lyapunov perspective on adaptive control was outlined in [8] with the intro-
duction of the adaptive control Lyapunov function (aCLF). A more recent account of the
CLF perspective on adaptive control is presented in [9]. Based on the preceding discus-
sion, the aCLF method is clearly classified as a Lyapunov-based adaptive control approach.
Our statement of the LaSalle-Yoshizawa theorem that plays a fundamental role in proving
the stability of adaptive control systems is adapted from [6]. A fundamental property that
we exploit in this chapter to construct aCLFs is that when the uncertain parameters are
matched, an aCLF can be constructed by constructing a CLF for the nominal control affine
system. When the parameters are not matched, the construction of an aCLF is more chal-
lenging; however, there still exists a wealth of tools for constructing aCLFs in this scenario.
The most popular approach is via adaptive backstepping–a process extensively outlined in
[6]. Although adaptive backstepping provides a systematic approach towards constructing
aCLFs for certain classes of nonlinear systems, the resulting aCLF is often parameter depen-
dent, which significantly complicates the Lyapunov-based update law needed to establish
stability. More recently, [10, 11] introduced the notion of an unmatched CLF (uCLF), which
is a parameter dependent aCLF-type function that can be constructed using the backstepping
methodology, but that also allows for the use of much simpler Lyapunov-based update laws.
Alternative computational approaches to constructing aCLFs or uCLFs can be performed
using sum-of-squares programming provided the system vector fields are polynomial [12].
Traditionally, parameter convergence in adaptive control relied on the PE condition,
which, as argued in this chapter, is rather restrictive. Concurrent learning, first introduced by
Chowdhary and coauthors [13–15], replaces the PE condition with less restrictive conditions
that depend only on data observed along the system trajectory. Exploiting such data allows for
ensuring exponential convergence of the parameter estimates to their true values, which, in
turn, allows for establishing exponential stability of a composite dynamical system consisting
of the system and parameter estimation error dynamics. The singular value maximizing algorithm for recording data that ensures the minimum eigenvalue of the history stack is non-decreasing was first developed in [16]. Based on our earlier discussion, concurrent learning
adaptive control can be classified as a composite adaptive control approach. Originally,
such concurrent learning techniques required measurements of the state derivative ẋ to
compute the prediction error–the works of [17, 18] provide a methodology to remove this
restriction using state derivative estimation and numerical integration, respectively. Our
approach presented in this chapter that alleviates such an assumption by integrating the
dynamics over a finite horizon was introduced in [18] and has been referred to as integral

concurrent learning. The development of concurrent learning adaptive control from a CLF
perspective was introduced in [19] using the notion of an exponentially stabilizing aCLF.

References

1. Ioannou PA, Sun J (2012) Robust adaptive control. Dover


2. Ioannou P, Fidan B (2006) Adaptive control tutorial. SIAM
3. Sastry S, Bodson M (2011) Adaptive control: stability, convergence, and robustness. Dover
4. Slotine JJE, Li W (1987) On the adaptive control of robot manipulators. Int J Robot Res 6(3):49–
59
5. Slotine JJE, Li W (1991) Applied nonlinear control. Prentice Hall
6. Krstić M, Kanellakopoulos I, Kokotović P (1995) Nonlinear and adaptive control design. Wiley
7. Slotine JJE, Li W (1989) Composite adaptive control of robot manipulators. Automatica
25(4):509–519
8. Krstić M, Kokotović P (1995) Control lyapunov functions for adaptive nonlinear stabilization.
Syst Control Lett 26(1):17–23
9. Taylor AJ, Ames AD (2020) Adaptive safety with control barrier functions. In: Proceedings of
the American control conference, pp 1399–1405
10. Lopez BT, Slotine JJE (2022) Universal adaptive control of nonlinear systems. IEEE Control
Syst Lett 6:1826–1830
11. Lopez BT, Slotine JJE (2022) Adaptive variants of optimal feedback policies. In: 4th annual
conference on learning for dynamics and control, vol 166. Proceedings of machine learning
research, pp 1–12
12. Moore J, Tedrake R (2014) Adaptive control design for underactuated systems using sums-of-
squares optimization. In: Proceedings of the American control conference, pp 721–728
13. Chowdhary G, Johnson E (2010) Concurrent learning for convergence in adaptive control with-
out persistency of excitation. In: Proceedings of the IEEE conference on decision and control,
pp 3674–3679
14. Chowdhary G (2010) Concurrent learning for convergence in adaptive control without persistency
of excitation. PhD thesis, Georgia Institute of Technology, Atlanta, GA
15. Chowdhary G, Yucelen T, Muhlegg M, Johnson EN (2013) Concurrent learning adaptive control
of linear systems with exponentially convergent bounds. Int J Adapt Control Signal Process
27(4):280–301
16. Chowdhary G, Johnson E (2011) A singular value maximizing data recording algorithm for
concurrent learning. In: Proceedings of the American control conference, pp 3547–3552
17. Kamalapurkar R, Reish B, Chowdhary G, Dixon WE (2017) Concurrent learning for parameter
estimation using dynamic state-derivative estimators. IEEE Trans Autom Control 62(7):3594–
3601
18. Parikh A, Kamalapurkar R, Dixon WE (2019) Integral concurrent learning: adaptive control with
parameter convergence using finite excitation. Int J Adapt Control Signal Process 33(12):1775–
1787
19. Cohen MH, Belta C (2022) High order robust adaptive control barrier functions and exponentially stabilizing adaptive control lyapunov functions. In: Proceedings of the American control conference, pp 2233–2238
5 Adaptive Safety-Critical Control

In the previous chapter, we discussed how techniques from adaptive control can be used to
construct stabilizing controllers for nonlinear systems with uncertain parameters by dynam-
ically adjusting the controller based upon data observed online. In the present chapter, we
discuss how the same ideas can be applied to construct controllers that enforce safety, rather
than stability specifications. In Sect. 5.1, we define adaptive control barrier functions by
extending the notion of adaptive control Lyapunov function that we discussed previously.
We further improve this definition by introducing robust adaptive control barrier functions in
Sect. 5.2. These notions are first defined for safety constraints with relative degree one with
respect to the system dynamics—we extend them for higher relative degrees in Sect. 5.3. We
conclude with references, final remarks, and suggestions for further reading in Sect. 5.5.

Throughout this chapter, we focus on the uncertain nonlinear system (4.1) given here again
for convenience:
ẋ = f (x) + F(x)θ + g(x)u.
Our objective is to design an adaptive controller u = k(x, θ̂) that renders a closed set C ⊂ Rn
forward invariant.

5.1 Adaptive Control Barrier Functions

In this section, we extend the notion of an aCLF from Sect. 4.1 to a safety-critical setting
using the notion of an adaptive CBF (aCBF). Once again we consider the set C defined as
the zero superlevel set of a continuously differentiable function h : Rn → R as in (3.3):

C = {x ∈ Rn | h(x) ≥ 0}.

As the parameters θ in (4.1) are unknown we cannot directly enforce the CBF condition

$$\sup_{u\in U}\big\{L_f h(x) + L_F h(x)\theta + L_g h(x)u\big\} > -\alpha(h(x))$$
over $C$ for some $\alpha \in \mathcal{K}_\infty^e$. Rather, similar to the aCLF approach from Sect. 4.1, we will enforce the CBF condition using an estimated model of the system dynamics and then eliminate the residual parameter estimation error by carefully selecting the parameter update law. Such a development motivates the following definition:

Definition 5.1 (Adaptive CBF) Let $h : \mathbb{R}^n \to \mathbb{R}$ be a continuously differentiable function defining a set $C \subset \mathbb{R}^n$ as in (3.3) such that $\nabla h(x) \neq 0$ for all $x \in \partial C$. Then, $h$ is said to be an adaptive control barrier function (aCBF) for (4.1) if, for all $(x, \hat{\theta}) \in C \times \mathbb{R}^p$,
$$\sup_{u\in U}\big\{L_f h(x) + L_F h(x)\hat{\theta} + L_g h(x)u\big\} > 0. \tag{5.1}$$

One may notice a few differences between the standard CBF definition and the one presented above, the most notable being the absence of the extended class $\mathcal{K}_\infty$ function on the right-hand-side of the inequality in (5.1). Unfortunately, replacing the right-hand-side of (5.1) with $\alpha(h(x))$ will be insufficient to establish forward invariance of $C$. Similar to the aCLF case, our analysis will proceed with studying the properties of a composite barrier function that contains the parameter estimation error. To this end, consider the composite barrier function candidate
$$h_a(x, \tilde{\theta}) := h(x) - \tfrac{1}{2}\tilde{\theta}^\top \Gamma^{-1}\tilde{\theta}, \tag{5.2}$$
where $\Gamma \in \mathbb{R}^{p\times p}$ is a positive definite learning gain, which defines a family of sets $C_\theta \subset \mathbb{R}^n$ parameterized by $\tilde{\theta}$ as
$$\begin{aligned}
C_\theta &= \{x \in \mathbb{R}^n \,|\, h_a(x, \tilde{\theta}) \geq 0\} \\
\partial C_\theta &= \{x \in \mathbb{R}^n \,|\, h_a(x, \tilde{\theta}) = 0\} \\
\mathrm{Int}(C_\theta) &= \{x \in \mathbb{R}^n \,|\, h_a(x, \tilde{\theta}) > 0\}.
\end{aligned} \tag{5.3}$$
Note that $C_\theta \subset C$ for each $\tilde{\theta} \in \mathbb{R}^p$ as
$$x \in C_\theta \implies h(x) \geq \tfrac{1}{2}\tilde{\theta}^\top \Gamma^{-1}\tilde{\theta} \implies h(x) \geq 0 \implies x \in C.$$

Hence, designing a controller that renders $C_\theta$ forward invariant provides a pathway towards (conservatively) ensuring that $x(t) \in C$ for all $t \in I(x_0)$. To facilitate the construction of such a controller, note that an aCBF $h$ induces the set-valued map
$$K_{\text{acbf}}(x, \hat{\theta}) := \{u \in U \,|\, L_f h(x) + L_F h(x)\hat{\theta} + L_g h(x)u \geq 0\}, \tag{5.4}$$
that associates to each $(x, \hat{\theta}) \in C \times \mathbb{R}^p$ the set of control values $K_{\text{acbf}}(x, \hat{\theta}) \subset U$ satisfying the aCBF condition from (5.1). Before stating the main result regarding aCBFs, note that, for a given initial condition $x_0 \in C$ and an initial parameter estimation error $\tilde{\theta}_0 \in \mathbb{R}^p$, the gain matrix $\Gamma$ must be selected such that $h(x_0) \geq \frac{1}{2}\tilde{\theta}_0^\top \Gamma^{-1}\tilde{\theta}_0$ to ensure that $x_0 \in C_\theta$, a sufficient

condition for which is that
$$\lambda_{\min}(\Gamma) \geq \frac{\|\tilde{\theta}_0\|^2}{2h(x_0)}. \tag{5.5}$$
The following theorem provides conditions under which a controller drawn from K acbf (x, θ̂)
renders Cθ forward invariant.

Theorem 5.1 Let $h : \mathbb{R}^n \to \mathbb{R}$ be an aCBF for (4.1) and consider the family of sets $C_\theta \subset C \subset \mathbb{R}^n$ defined by the composite barrier function candidate from (5.2). Suppose the estimated parameters are updated according to
$$\dot{\hat{\theta}} = -\Gamma L_F h(x)^\top, \tag{5.6}$$
and that $\Gamma$ is selected such that (5.5) holds for a given initial condition $(x_0, \tilde{\theta}_0) \in C \times \mathbb{R}^p$. Then, the trajectory of the composite dynamical system
$$\begin{bmatrix} \dot{x} \\ \dot{\tilde{\theta}} \end{bmatrix} = \begin{bmatrix} f(x) + F(x)\theta + g(x)k(x, \hat{\theta}) \\ \Gamma L_F h(x)^\top \end{bmatrix}, \tag{5.7}$$
with $u = k(x, \hat{\theta}) \in K_{\text{acbf}}(x, \hat{\theta})$ locally Lipschitz on $C \times \mathbb{R}^p$, satisfies $x(t) \in C$ for all $t \in I(x_0, \tilde{\theta}_0)$.

Proof Taking the Lie derivative of the composite barrier function candidate $h_a$ along the vector field of the composite dynamical system (5.7) yields
$$\begin{aligned}
\dot{h}_a &= L_f h(x) + L_F h(x)\theta + L_g h(x)k(x, \hat{\theta}) - \tilde{\theta}^\top \Gamma^{-1}\dot{\tilde{\theta}} \\
&= L_f h(x) + L_F h(x)\hat{\theta} + L_g h(x)k(x, \hat{\theta}) \geq 0,
\end{aligned} \tag{5.8}$$
where the second line follows from substituting in the parameter update law (5.6) and the final inequality follows from the properties of the controller in (5.4). Integrating the above over a finite time interval $[0, t] \subset \mathbb{R}$ reveals that
$$h_a(x(t), \tilde{\theta}(t)) \geq h_a(x_0, \tilde{\theta}_0). \tag{5.9}$$
Hence, provided $\Gamma$ is selected such that (5.5) holds, then $h_a(x_0, \tilde{\theta}_0) \geq 0$, which implies $x(t) \in C_\theta$ for all $t \in I(x_0, \tilde{\theta}_0)$. As $C_\theta \subset C$ for each $\tilde{\theta} \in \mathbb{R}^p$, the preceding argument implies that $x(t) \in C$ for all $t \in I(x_0, \tilde{\theta}_0)$, as desired. $\square$
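To make the result concrete, the sketch below draws control values from $K_{\text{acbf}}$ in (5.4) via a pointwise min-norm correction of a nominal controller, analogous to the closed-form QP solutions used elsewhere in this book, together with the update law (5.6); all callables and names here are illustrative assumptions rather than the authors' code.

```python
import numpy as np

def acbf_control(x, theta_hat, k0, Lfh, LFh, Lgh):
    """Min-norm modification of a nominal controller k0 so the returned
    input lies in K_acbf(x, theta_hat) from (5.4).

    Lfh, LFh, Lgh : callables returning L_f h(x) (scalar), L_F h(x)
                    (length-p array), and L_g h(x) (length-m array).
    """
    u0 = k0(x, theta_hat)
    Lg = Lgh(x)
    slack = Lfh(x) + LFh(x) @ theta_hat + Lg @ u0
    if slack >= 0:
        return u0                          # nominal input already satisfies (5.4)
    return u0 - (slack / (Lg @ Lg)) * Lg   # project onto the constraint boundary

def acbf_update(x, LFh, Gamma):
    """Parameter update law (5.6): theta_hat_dot = -Gamma @ L_F h(x)^T."""
    return -Gamma @ LFh(x)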

The preceding theorem provides safety guarantees for the adaptive controller derived from an aCBF by rendering the family of subsets $C_\theta \subset C$ forward invariant. Similar to the definition of a standard CBF, when $U = \mathbb{R}^m$ the condition in (5.1) can be expressed as
$$\forall (x, \hat{\theta}) \in C \times \mathbb{R}^p : \quad L_g h(x) = 0 \implies L_f h(x) + L_F h(x)\hat{\theta} > 0. \tag{5.10}$$



Verifying the above condition for all θ̂ ∈ R p may be very challenging. Fortunately, when
the parameters in (4.1) are matched, the aCBF condition is independent of θ̂.

Proposition 5.1 Let $h : \mathbb{R}^n \to \mathbb{R}$ be a continuously differentiable function defining a set $C \subset \mathbb{R}^n$ as in (3.3) such that $\nabla h(x) \neq 0$ for all $x \in \partial C$. Suppose $U = \mathbb{R}^m$ and that (4.1) satisfies the matching condition. Then, $h$ is an aCBF for (4.1) if
$$\forall x \in C : \quad L_g h(x) = 0 \implies L_f h(x) > 0. \tag{5.11}$$

Proof Follows similar steps to that of Proposition 4.1. 

Although the above proposition illustrates that the design of $h$ can be decoupled from the uncertain parameters when (4.1) satisfies the matching condition, it does not state that $h$ being a CBF for the nominal dynamics $\dot{x} = f(x) + g(x)u$ is sufficient to guarantee that $h$ is an aCBF for the uncertain dynamics $\dot{x} = f(x) + F(x)\theta + g(x)u$. Indeed, the aCBF condition (5.1) is much stronger than the standard CBF condition (3.9) as it requires every non-negative superlevel set of $h$ to be controlled invariant rather than only the zero superlevel set, which is one source of conservatism of the aCBF approach. The other source of conservatism stems from the fact that a subset $C_\theta$ of $C$ is rendered forward invariant, rather than $C$ itself. As the term $\frac{1}{2}\tilde{\theta}^\top \Gamma^{-1}\tilde{\theta} \to 0$, the composite barrier candidate $h_a$ from (5.2) approaches the original barrier candidate $h$, which implies $C_\theta \to C$. Hence, the conservatism of the approach can be reduced in two ways: (1) choosing $\Gamma$ with a larger minimum eigenvalue; (2) decreasing the parameter estimation error $\tilde{\theta}$. In theory, one can take $\Gamma$ as large as one likes; in practice, this is ill-advised as large adaptation gains can amplify the effect of unmodeled dynamics and disturbances. Thus, a more practical approach may be to reduce the parameter estimation error; yet, as discussed in the previous chapter, traditional adaptive control methods typically provide no guarantees of convergence of the parameter estimates. Fortunately, the data-driven adaptive control tools introduced in Sect. 4.2 provide a methodology to reduce the level of uncertainty in the parameter estimates online as more data becomes available. As shown in the following section, such tools can be gracefully integrated into the aCBF framework to address the limitations outlined above.

5.2 Robust Adaptive Control Barrier Functions

In this section we demonstrate how the limitations of the aCBF approach can be addressed by uniting tools from concurrent learning adaptive control with the notion of a robust adaptive control barrier function (RaCBF). The main idea behind the RaCBF methodology is to robustly account for the worst-case bound on the parameter estimation error, but to reduce such a bound online as more data about the system becomes available. For such an approach to be tractable, we require stronger assumptions on (4.1) in the form of prior knowledge of $\theta$.

Assumption 5.1 There exists a known subset of the parameter space $\Theta \subset \mathbb{R}^p$ and a maximum estimation error $\tilde{\vartheta}_{\max} \in \mathbb{R}_{\geq 0}$ such that
$$\Theta := \{\hat{\theta} \in \mathbb{R}^p \,|\, \|\theta - \hat{\theta}\| \leq \tilde{\vartheta}_{\max}\}. \tag{5.12}$$

The above assumption implies that, although we do not know the exact values of the uncertain
parameters θ , we do know some region of the parameter space in which the parameters lie.
From a theoretical standpoint, this is more restrictive than the assumptions posed in the
previous chapter in which no assumptions on where the parameters lie were made. We
argue, however, that this is not restrictive from a practical standpoint as such parameters
generally correspond to the physical attributes of a system (e.g., mass, inertia, damping,
etc.) that may take on known ranges of values.

Definition 5.2 (Robust adaptive CBF) Let $h : \mathbb{R}^n \to \mathbb{R}$ be a continuously differentiable function defining a set $C \subset \mathbb{R}^n$ as in (3.3) such that $\nabla h(x) \neq 0$ for all $x \in \partial C$. Then, $h$ is said to be a robust adaptive control barrier function (RaCBF) for (4.1) on $C$ if there exists $\alpha \in \mathcal{K}_\infty^e$ such that for all $(x, \hat{\theta}) \in \mathbb{R}^n \times \Theta$
$$\sup_{u\in U}\big\{L_f h(x) + L_F h(x)\hat{\theta} + L_g h(x)u\big\} > -\alpha(h(x)) + \|L_F h(x)\|\tilde{\vartheta}_{\max}. \tag{5.13}$$

When $U = \mathbb{R}^m$, the RaCBF condition (5.13) can be restated as
$$\forall (x, \hat{\theta}) \in \mathbb{R}^n \times \Theta : \quad L_g h(x) = 0 \implies L_f h(x) + L_F h(x)\hat{\theta} > -\alpha(h(x)) + \|L_F h(x)\|\tilde{\vartheta}_{\max}, \tag{5.14}$$
which may be very challenging to verify for all possible $\hat{\theta} \in \Theta$. Similar to the results in the previous section, when (4.1) satisfies the matching condition, the criteria for determining the validity of $h$ as a RaCBF becomes much simpler.

Proposition 5.2 Let h be a CBF for ẋ = f (x) + g(x)u with U = Rm on a set C and
suppose the parameters in (4.1) are matched. Then, h is a RaCBF for (4.1) on C.

Proof Follows the same steps as that of Proposition 4.1. 

As in the previous section, a RaCBF induces a family of control policies expressed through the set-valued map
$$K_{\text{RaCBF}}(x, \hat{\theta}) := \{u \in U \,|\, L_f h(x) + L_F h(x)\hat{\theta} + L_g h(x)u \geq -\alpha(h(x)) + \|L_F h(x)\|\tilde{\vartheta}_{\max}\}, \tag{5.15}$$

assigning to each $(x, \hat{\theta}) \in D \times \Theta$ the set $K_{\text{RaCBF}}(x, \hat{\theta}) \subset U$ of control values satisfying the RaCBF condition (5.13). The following result demonstrates that any locally Lipschitz controller $u = k(x, \hat{\theta})$ satisfying $k(x, \hat{\theta}) \in K_{\text{RaCBF}}(x, \hat{\theta})$ renders $C$ forward invariant.

Proposition 5.3 Let $h : \mathbb{R}^n \to \mathbb{R}$ be a RaCBF for (4.1) on a set $C$ as in (3.3), and let Assumption 5.1 hold. Then, any locally Lipschitz controller $u = k(x, \hat{\theta})$ satisfying $k(x, \hat{\theta}) \in K_{\text{RaCBF}}(x, \hat{\theta})$ for all $(x, \hat{\theta}) \in \mathbb{R}^n \times \Theta$ renders $C$ forward invariant.

Proof The Lie derivative of $h$ along the closed-loop system vector field is
$$\begin{aligned}
\dot{h} &= L_f h(x) + L_F h(x)\theta + L_g h(x)k(x, \hat{\theta}) \\
&= L_f h(x) + L_F h(x)\hat{\theta} + L_g h(x)k(x, \hat{\theta}) + L_F h(x)\tilde{\theta}.
\end{aligned} \tag{5.16}$$
Provided $\hat{\theta} \in \Theta$ and Assumption 5.1 holds, $\dot{h}$ can be lower bounded as
$$\begin{aligned}
\dot{h} &\geq L_f h(x) + L_F h(x)\hat{\theta} + L_g h(x)k(x, \hat{\theta}) - \|L_F h(x)\|\|\tilde{\theta}\| \\
&\geq L_f h(x) + L_F h(x)\hat{\theta} + L_g h(x)k(x, \hat{\theta}) - \|L_F h(x)\|\tilde{\vartheta}_{\max} \\
&\geq -\alpha(h(x)),
\end{aligned} \tag{5.17}$$
where the last inequality follows from (5.15). Thus, since $\nabla h(x) \neq 0$ for all $x \in \partial C$ and $\dot{h} \geq -\alpha(h(x))$, $h$ is a barrier function for the closed-loop system and the forward invariance of $C$ follows from Theorem 3.2. $\square$

The preceding result shows that a RaCBF remains valid as a safety certificate while updating
the parameter estimates by accounting for the worst-case bound on the estimation error. Our
ultimate objective, however, is to reduce this worst-case bound over time as our parameter
estimates improve. The following result shows that if h is a RaCBF for (4.1) with a given
level of model uncertainty ϑ̃max , then it remains a RaCBF as the level of uncertainty is
reduced.

Lemma 5.1 If $h$ is a RaCBF for (4.1) for a given $\tilde{\vartheta}_{\max} \in \mathbb{R}_{\geq 0}$, then it is also a RaCBF for any $\tilde{\vartheta} \in [0, \tilde{\vartheta}_{\max}]$ in the sense that for all $(x, \hat{\theta}) \in \mathbb{R}^n \times \Theta$
$$\sup_{u\in U}\big\{L_f h(x) + L_F h(x)\hat{\theta} + L_g h(x)u\big\} > -\alpha(h(x)) + \|L_F h(x)\|\tilde{\vartheta}. \tag{5.18}$$

Proof If $\tilde{\vartheta} \in \mathbb{R}_{\geq 0}$ is such that $\tilde{\vartheta} \leq \tilde{\vartheta}_{\max}$, then $\|L_F h(x)\|\tilde{\vartheta}_{\max} \geq \|L_F h(x)\|\tilde{\vartheta}$. Thus, if $h$ is a RaCBF for (4.1), then for all $(x, \hat{\theta}) \in \mathbb{R}^n \times \Theta$
$$\sup_{u\in U}\big\{L_f h(x) + L_F h(x)\hat{\theta} + L_g h(x)u\big\} > -\alpha(h(x)) + \|L_F h(x)\|\tilde{\vartheta}_{\max} \geq -\alpha(h(x)) + \|L_F h(x)\|\tilde{\vartheta}, \tag{5.19}$$
implying (5.18) holds. $\square$



The following assumption outlines the characteristics of parameter estimators that reduce
the level of uncertainty online.

Assumption 5.2 There exists a parameter update law $\dot{\hat{\theta}} = \tau(\hat{\theta}, t)$, with $\tau$ locally Lipschitz in $\hat{\theta}$ and piecewise continuous in $t$, such that $\hat{\theta}(t) \in \Theta$ for all $t \in \mathbb{R}_{\geq 0}$. Moreover, there exists a piecewise continuous, non-increasing function $\tilde{\vartheta} : \mathbb{R}_{\geq 0} \to \mathbb{R}_{\geq 0}$ such that
$$\|\tilde{\theta}(t)\| \leq \tilde{\vartheta}(t) \leq \tilde{\vartheta}_{\max}, \quad \forall t \in \mathbb{R}_{\geq 0}. \tag{5.20}$$

Examples of parameter estimation routines satisfying the above assumption will be provided
shortly. The following theorem constitutes the main result with regard to RaCBFs.

Theorem 5.2 Let $h : \mathbb{R}^n \to \mathbb{R}$ be a RaCBF for (4.1) on a set $C$ as in (3.3), and let Assumptions 5.1 and 5.2 hold. Define the set-valued map
$$K(x, \hat{\theta}, \tilde{\vartheta}) := \big\{u \in U \,|\, L_f h(x) + L_F h(x)\hat{\theta} + L_g h(x)u \geq -\alpha(h(x)) + \|L_F h(x)\|\tilde{\vartheta}\big\}.$$
Then, any locally Lipschitz controller $u = k(x, \hat{\theta}, \tilde{\vartheta}(t))$ satisfying $k(x, \hat{\theta}, \tilde{\vartheta}(t)) \in K(x, \hat{\theta}, \tilde{\vartheta}(t))$ for all $(x, \hat{\theta}, \tilde{\vartheta}(t)) \in \mathbb{R}^n \times \Theta \times [0, \tilde{\vartheta}_{\max}]$, with $t \mapsto \tilde{\vartheta}(t)$ from Assumption 5.2, renders $C$ forward invariant.

Proof The Lie derivative of $h$ along the closed-loop vector field is
$$\begin{aligned}
\dot{h} &= L_f h(x) + L_F h(x)\theta + L_g h(x)k(x, \hat{\theta}, \tilde{\vartheta}(t)) \\
&= L_f h(x) + L_F h(x)\hat{\theta} + L_g h(x)k(x, \hat{\theta}, \tilde{\vartheta}(t)) + L_F h(x)\tilde{\theta}.
\end{aligned} \tag{5.21}$$
Now let $t \mapsto (x(t), \hat{\theta}(t))$ be the trajectories of the composite system
$$\begin{bmatrix} \dot{x} \\ \dot{\hat{\theta}} \end{bmatrix} = \begin{bmatrix} f(x) + F(x)\theta + g(x)k(x, \hat{\theta}, \tilde{\vartheta}(t)) \\ \tau(\hat{\theta}, t) \end{bmatrix},$$
whose existence and uniqueness on some maximal interval of existence are guaranteed given the assumptions on the controller and update law. Note that, as under Assumption 5.2 we have $\tilde{\vartheta}(t) \leq \tilde{\vartheta}_{\max}$, the set $K(x(t), \hat{\theta}(t), \tilde{\vartheta}(t))$ is non-empty for each $t$ by Lemma 5.1. Hence, lower bounding $\dot{h}$ along the system trajectory yields
$$\begin{aligned}
\dot{h} &\geq L_f h(x(t)) + L_F h(x(t))\hat{\theta}(t) + L_g h(x(t))k(x(t), \hat{\theta}(t), \tilde{\vartheta}(t)) - \|L_F h(x(t))\|\|\tilde{\theta}(t)\| \\
&\geq -\alpha(h(x(t))) + \|L_F h(x(t))\|\tilde{\vartheta}(t) - \|L_F h(x(t))\|\|\tilde{\theta}(t)\| \\
&\geq -\alpha(h(x(t))).
\end{aligned} \tag{5.22}$$

As $\nabla h(x(t)) \neq 0$ for any $x(t) \in \partial C$ and $\dot{h} \geq -\alpha(h(x(t)))$, $h$ is a barrier function for the closed-loop system and $C$ is forward invariant by Theorem 3.2. $\square$

The above result demonstrates that combining a RaCBF with a parameter estimation routine
that reduces the estimation error online allows for constructing controllers that robustly
enforce safety while reducing conservatism as more data about the system becomes available.
It is interesting to note that if the parameter estimation algorithm enforces convergence of
the estimates to their true values, then the RaCBF controller converges to the standard CBF
controller in the limit as time goes to infinity. It is also important to note that safety is
guaranteed regardless of whether the parameter estimation error converges to zero—safety
is guaranteed so long as the estimation error does not increase along a given trajectory. Given
a RaCBF h and nominal adaptive controller k0 , inputs satisfying the RaCBF conditions can
be enforced by solving the QP
1
k(x, θ̂, ϑ̃) = arg min u − k0 (x, θ̂) 2
u∈U 2
subject to L f h(x) + L F h(x)θ̂ + L g h(x)u ≥ −α(h(x)) + L F h(x) ϑ̃,
(5.23)
which has a locally Lipschitz closed-form solution when U = Rm given by

⎨k0 (x, θ̂) if ψ(x, θ̂, ϑ̃) ≥ 0
k(x, θ̂, ϑ̃) = ψ(x, θ̂, ϑ̃) (5.24)
⎩k0 (x, θ̂) − L h(x) if ψ(x, θ̂, ϑ̃) < 0,
L h(x) 2 g
g

where

ψ(x, θ̂, ϑ̃) = L f h(x) + L F h(x)θ̂ + L g h(x)k0 (x, θ̂) + α(h(x)) − L F h(x) ϑ̃.
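The closed-form expression (5.24) translates directly into code; the following sketch assumes $U = \mathbb{R}^m$ and user-supplied callables for $h$, its Lie derivatives, and the extended class-$\mathcal{K}_\infty$ function $\alpha$ (all names here are illustrative placeholders):

```python
import numpy as np

def racbf_control(x, theta_hat, vartheta, k0, h, Lfh, LFh, Lgh, alpha):
    """Closed-form solution (5.24) of the RaCBF QP (5.23).

    vartheta : current worst-case estimation error bound, as in (5.20).
    """
    u0 = k0(x, theta_hat)
    LF, Lg = LFh(x), Lgh(x)
    # Constraint slack at the nominal input, including the robustness
    # margin ||L_F h(x)|| * vartheta from (5.13).
    psi = (Lfh(x) + LF @ theta_hat + Lg @ u0
           + alpha(h(x)) - np.linalg.norm(LF) * vartheta)
    if psi >= 0:
        return u0                            # nominal input is already safe
    return u0 - (psi / (Lg @ Lg)) * Lg       # min-norm safe correction
```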

Remark 5.1 If the nominal adaptive controller $k_0$ in (5.23) is generated by an aCLF or ES-aCLF, it may be necessary to use different estimates of $\theta$ in the objective function and constraint. That is, one may need to solve the QP
$$\begin{aligned}
\min_{u\in U} \quad & \tfrac{1}{2}\|u - k_0(x, \hat{\theta}_{\text{clf}})\|^2 \\
\text{subject to} \quad & L_f h(x) + L_F h(x)\hat{\theta}_{\text{cbf}} + L_g h(x)u \geq -\alpha(h(x)) + \|L_F h(x)\|\tilde{\vartheta},
\end{aligned}$$
where $\hat{\theta}_{\text{cbf}}$ and $\hat{\theta}_{\text{clf}}$ are estimates generated by the update laws needed to guarantee safety and stability when using CBFs and CLFs, respectively. This redundancy in parameter estimation can be removed using the methods developed in Chap. 6.

We close this section by providing a particular example of a parameter estimator that satisfies Assumption 5.2 using the concurrent learning method introduced in Sect. 4.2. Recall that integrating (4.1) over some finite time interval $[t - \Delta t, t]$ yields the linear regression equation from (4.18)
$$\mathcal{Y}(t) = \mathcal{F}(t)\theta,$$
with $\mathcal{Y}$ and $\mathcal{F}$ defined as in (4.17). Using the above relation, one can update the parameters as
$$\dot{\hat{\theta}} = \gamma \sum_{j=1}^M \mathcal{F}_j(t)^\top\big[\mathcal{Y}_j(t) - \mathcal{F}_j(t)\hat{\theta}\big], \tag{5.25}$$
where $\gamma \in \mathbb{R}_{>0}$ is a learning gain, given a history stack $\mathcal{H} = \{(\mathcal{Y}_j, \mathcal{F}_j)\}_{j=1}^M$ consisting of tuples of $\mathcal{Y}$ and $\mathcal{F}$ recorded at various instances along the system trajectory.
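In practice, each tuple $(\mathcal{Y}_j, \mathcal{F}_j)$ can be assembled from logged state-input data by numerically integrating the known and uncertain parts of (4.1) over a window; a sketch follows, where the data layout and use of trapezoidal quadrature are our own assumptions:

```python
import numpy as np

def integrated_pair(ts, xs, us, f, F, g):
    """Form one (Y, F) tuple for the regression (4.18) by trapezoidal
    integration of (4.1) over the logged window ts[0]..ts[-1].

    ts : (N,) sample times;  xs : (N, n) states;  us : (N, m) inputs.
    f, F, g : callables for the model terms in (4.1).
    """
    # Known part of the dynamics, f(x) + g(x)u, at each logged sample.
    fs = np.array([f(x) + g(x) @ u for x, u in zip(xs, us)])
    Fs = np.array([F(x) for x in xs])                  # regressor samples
    Y = xs[-1] - xs[0] - np.trapz(fs, ts, axis=0)      # integrated output
    F_int = np.trapz(Fs, ts, axis=0)                   # integrated regressor
    return Y, F_int   # Y = F_int @ theta, up to quadrature error
```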

Proposition 5.4 Let the estimated parameters be updated according to (5.25) and consider the initial value problem
$$\dot{\tilde{\vartheta}}(t) = -\gamma\lambda_{\min}\Bigg(\sum_{j=1}^M \mathcal{F}_j(t)^\top \mathcal{F}_j(t)\Bigg)\tilde{\vartheta}(t), \quad \tilde{\vartheta}(0) = \tilde{\vartheta}_{\max}. \tag{5.26}$$
Then, $\hat{\theta}(t) \in \Theta$ for all $t \in \mathbb{R}_{\geq 0}$, $t \mapsto \tilde{\vartheta}(t)$ is non-increasing, and $\|\tilde{\theta}(t)\| \leq \tilde{\vartheta}(t)$ for all $t \in \mathbb{R}_{\geq 0}$.

Proof Provided the parameters are updated according to (5.25), the parameter estimation error evolves according to
$$\dot{\tilde{\theta}} = -\gamma \sum_{j=1}^M \mathcal{F}_j(t)^\top \mathcal{F}_j(t)\tilde{\theta}, \tag{5.27}$$
which can be upper bounded as
$$\tfrac{d}{dt}\|\tilde{\theta}\| \leq -\gamma\lambda_{\min}\Bigg(\sum_{j=1}^M \mathcal{F}_j(t)^\top \mathcal{F}_j(t)\Bigg)\|\tilde{\theta}\|. \tag{5.28}$$
As $\sum_{j=1}^M \mathcal{F}_j(t)^\top \mathcal{F}_j(t)$ is at least positive semidefinite for all $t \in \mathbb{R}_{\geq 0}$, we have $\tfrac{d}{dt}\|\tilde{\theta}\| \leq 0$. By the same reasoning, we also have that $\dot{\tilde{\vartheta}} \leq 0$. Using the comparison lemma (Lemma 2.1) to solve the above differential inequality implies that
$$\|\tilde{\theta}(t)\| \leq \exp\Bigg(-\gamma\int_0^t \lambda_{\min}\Bigg(\sum_{j=1}^M \mathcal{F}_j(s)^\top \mathcal{F}_j(s)\Bigg)ds\Bigg)\|\tilde{\theta}(0)\|. \tag{5.29}$$

Furthermore, the solution to the initial value problem in (5.26) is given by
$$\tilde{\vartheta}(t) = \exp\Bigg(-\gamma\int_0^t \lambda_{\min}\Bigg(\sum_{j=1}^M \mathcal{F}_j(s)^\top \mathcal{F}_j(s)\Bigg)ds\Bigg)\tilde{\vartheta}_{\max}. \tag{5.30}$$
Since $\|\tilde{\theta}(0)\| \leq \tilde{\vartheta}_{\max}$, the above implies that $\|\tilde{\theta}(t)\| \leq \tilde{\vartheta}(t) \leq \tilde{\vartheta}_{\max}$ for all $t \in \mathbb{R}_{\geq 0}$, which, based on (5.12), also ensures $\hat{\theta}(t) \in \Theta$ for all $t \in \mathbb{R}_{\geq 0}$, as desired. $\square$
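An Euler-discretized sketch of the estimator (5.25) coupled with the bound dynamics (5.26) is given below; the resulting bound can then be supplied to the RaCBF-QP (5.23) at each step. The function name and data layout are illustrative assumptions:

```python
import numpy as np

def cl_step(theta_hat, vartheta, stack, gamma, dt):
    """One Euler step of the concurrent-learning law (5.25) together with
    the worst-case bound dynamics (5.26); `stack` holds tuples (Y_j, F_j)."""
    S = sum(Fj.T @ (Yj - Fj @ theta_hat) for Yj, Fj in stack)
    lam = np.linalg.eigvalsh(sum(Fj.T @ Fj for _, Fj in stack))[0]
    theta_hat = theta_hat + dt * gamma * S          # update law (5.25)
    vartheta = vartheta - dt * gamma * lam * vartheta   # bound ODE (5.26)
    return theta_hat, vartheta
```

Note that vartheta shrinks only while the stack's minimum eigenvalue is positive, mirroring the role of the finite excitation condition.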

5.3 High Order Robust Adaptive Control Barrier Functions

In the previous section, we outlined a methodology for safe robust adaptive control using
CBFs. The main limitation of such an approach is that it is contingent on knowledge of a
CBF-like function, which implicitly requires knowledge of a controlled invariant set that can
be expressed as the zero superlevel set of a single continuously differentiable function. As
we discussed in Sect. 3.3, constructing such a function may be challenging as a user-defined
state constraint set does not typically coincide with a controlled invariant set. Rather, one
may need to search for a controlled invariant subset of the state constraint set—a process
that, under certain assumptions, can be carried out using the high order CBF (HOCBF)
approach.
In this section, we unite the HOCBF approach with the safe robust adaptive control
approach outlined in the previous sections of this chapter. Similar to Sect. 3.3, we begin by
considering the state constraint set

C0 = {x ∈ Rn | h(x) ≥ 0},

where h : Rn → R has relative degree r ∈ N. Our development proceeds by placing addi-


tional assumptions on the structure of the uncertainty in (4.1). Namely, we assume that if h
has relative degree r , then the uncertain parameters only appear in the r th derivative of h
along the system dynamics.

Assumption 5.3 If $h : \mathbb{R}^n \to \mathbb{R}$ has relative degree $r \in \mathbb{N}$ for (4.1), then $L_F L_f^i h(x) = 0$ for all $x \in \mathbb{R}^n$ and all $i \in \{0, 1, \ldots, r-2\}$, and there exists some nonempty set $\mathcal{R} \subset \mathbb{R}^n$ such that $L_F L_f^{r-1} h(x) \neq 0$ for all $x \in \mathcal{R}$.

Although we have only defined the notion of relative degree (see Def. 3.9) for control affine
systems without uncertain parameters (2.10), the same criteria for relative degree applies to
(4.1) as well. Following the same approach as in Sect. 3.3, we compute the derivative of h
along the dynamics (4.1) until both the control input and uncertain parameters appear. To
this end, we once again consider the collection of functions from (3.17):

$$\begin{aligned}
\psi_0(x) &= h(x) \\
\psi_i(x) &= \dot{\psi}_{i-1}(x) + \alpha_i(\psi_{i-1}(x)), \quad \forall i \in \{1, \ldots, r-1\},
\end{aligned}$$
where $\alpha_i \in \mathcal{K}_\infty^e$. If $h$ has relative degree $r$ and Assumption 5.3 holds, then each $\psi_i$, $i \in \{0, \ldots, r-1\}$, will be independent of both $u$ and $\theta$ for all $x \in \mathbb{R}^n$, whereas $\dot{\psi}_{r-1}(x, u, \theta)$ will depend on both $u$ and $\theta$. Each $\psi_i$, $i \in \{0, \ldots, r-1\}$, is associated to a set $C_i \subset \mathbb{R}^n$ as in (3.18):
$$C_i = \{x \in \mathbb{R}^n \,|\, \psi_i(x) \geq 0\},$$
and we define the candidate safe set as in (3.19):
$$C = \bigcap_{i=0}^{r-1} C_i.$$

Before proceeding, we note that throughout this section it will be assumed that Assump-
tion 5.1 holds so that there exists some maximum possible parameter estimation error ϑ̃max .
The following definition extends the concept of a HOCBF to nonlinear control systems with
parametric uncertainty.

Definition 5.3 (High order RaCBF) Let $h : \mathbb{R}^n \to \mathbb{R}$ have relative degree $r \in \mathbb{N}$ for (4.1) such that $\nabla\psi_i(x) \neq 0$ for all $x \in \partial C_i$ for each $i \in \{0, \ldots, r-1\}$, and let Assumption 5.3 hold. Then, $h$ is said to be a high order robust adaptive control barrier function (HO-RaCBF) for (4.1) on a set $C$ as in (3.19) if there exists $\alpha_r \in \mathcal{K}_\infty^e$ such that for all $(x, \hat{\theta}) \in \mathbb{R}^n \times \Theta$
$$\sup_{u\in U}\big\{L_f\psi_{r-1}(x) + L_F\psi_{r-1}(x)\hat{\theta} + L_g\psi_{r-1}(x)u\big\} > -\alpha_r(\psi_{r-1}(x)) + \|L_F\psi_{r-1}(x)\|\tilde{\vartheta}_{\max}. \tag{5.31}$$

Similar to the previous section, $h$ is a HO-RaCBF if $U = \mathbb{R}^m$ and
$$\forall (x, \hat{\theta}) \in \mathbb{R}^n \times \Theta : \quad L_g\psi_{r-1}(x) = 0 \implies L_f\psi_{r-1}(x) + L_F\psi_{r-1}(x)\hat{\theta} > -\alpha_r(\psi_{r-1}(x)) + \|L_F\psi_{r-1}(x)\|\tilde{\vartheta}_{\max},$$
which can be further simplified when the parameters are matched.

Proposition 5.5 Let $h$ be a HOCBF for $\dot{x} = f(x) + g(x)u$ with $U = \mathbb{R}^m$ on a set $C$ defined as in (3.19), and suppose the parameters in (4.1) are matched. Then, $h$ is a HO-RaCBF for (4.1) on $C$.

Proof The proof follows the same steps as that of Proposition 4.1. 

Recall from Sect. 3.3 that it may be challenging to construct h such that the HOCBF
conditions are satisfied at all points where L g ψr −1 (x) = 0; however, if such points are

strictly bounded away from the boundary of the constraint set, $h$ can always be modified so that the HOCBF conditions hold. Keeping in line with the approach taken in previous sections, we define the set-valued map
$$K_\psi(x, \hat{\theta}) = \{u \in U \,|\, L_f\psi_{r-1}(x) + L_F\psi_{r-1}(x)\hat{\theta} + L_g\psi_{r-1}(x)u \geq -\alpha_r(\psi_{r-1}(x)) + \|L_F\psi_{r-1}(x)\|\tilde{\vartheta}_{\max}\}, \tag{5.32}$$
that assigns to each $(x, \hat{\theta}) \in \mathbb{R}^n \times \Theta$ the set of controls satisfying the HO-RaCBF condition (5.31). The following proposition shows that drawing controllers (in a point-wise sense) from such a map renders $C$ forward invariant for the closed-loop system.

Proposition 5.6 Let $h : \mathbb{R}^n \to \mathbb{R}$ be a HO-RaCBF for (4.1) on a set $C \subset \mathbb{R}^n$ as in (3.19) and let Assumption 5.1 hold. Then, any locally Lipschitz controller $u = k(x, \hat{\theta})$ satisfying $k(x, \hat{\theta}) \in K_\psi(x, \hat{\theta})$ for all $(x, \hat{\theta}) \in \mathbb{R}^n \times \Theta$ renders $C$ forward invariant.

Proof Define the closed-loop system vector field by
$$f_{\text{cl}}(x, \hat{\theta}) := f(x) + F(x)\theta + g(x)k(x, \hat{\theta}),$$
and note that the Lie derivative of $\psi_{r-1}$ along $f_{\text{cl}}$ satisfies
$$\begin{aligned}
\dot{\psi}_{r-1} &= L_{f_{\text{cl}}}\psi_{r-1}(x, \hat{\theta}) \\
&= L_f\psi_{r-1}(x) + L_F\psi_{r-1}(x)\theta + L_g\psi_{r-1}(x)k(x, \hat{\theta}) \\
&= L_f\psi_{r-1}(x) + L_F\psi_{r-1}(x)\hat{\theta} + L_g\psi_{r-1}(x)k(x, \hat{\theta}) + L_F\psi_{r-1}(x)\tilde{\theta} \\
&\geq L_f\psi_{r-1}(x) + L_F\psi_{r-1}(x)\hat{\theta} + L_g\psi_{r-1}(x)k(x, \hat{\theta}) - \|L_F\psi_{r-1}(x)\|\tilde{\vartheta}_{\max} \\
&\geq -\alpha_r(\psi_{r-1}(x)).
\end{aligned}$$
The remainder of the proof then follows the same steps as those of Theorem 3.4. $\square$

Similar to RaCBFs, a HO-RaCBF remains a valid safety certificate as the level of uncertainty is reduced.

Lemma 5.2 If $h$ is a HO-RaCBF for (4.1) for a given $\tilde{\vartheta}_{\max} \in \mathbb{R}_{\geq 0}$, then it is also a HO-RaCBF for any $\tilde{\vartheta} \in [0, \tilde{\vartheta}_{\max}]$ in the sense that for all $(x, \hat{\theta}) \in \mathbb{R}^n \times \Theta$
$$\sup_{u\in U}\big\{L_f\psi_{r-1}(x) + L_F\psi_{r-1}(x)\hat{\theta} + L_g\psi_{r-1}(x)u\big\} > -\alpha_r(\psi_{r-1}(x)) + \|L_F\psi_{r-1}(x)\|\tilde{\vartheta}. \tag{5.33}$$

Proof Follows the same steps as that of Lemma 5.1. $\square$



Theorem 5.3 Let $h : \mathbb{R}^n \to \mathbb{R}$ be a HO-RaCBF for (4.1) on a set $C$ as in (3.19), and let Assumptions 5.1 and 5.2 hold. Define the set-valued map
$$K_\psi(x, \hat{\theta}, \tilde{\vartheta}) = \{u \in U \,|\, L_f\psi_{r-1}(x) + L_F\psi_{r-1}(x)\hat{\theta} + L_g\psi_{r-1}(x)u \geq -\alpha_r(\psi_{r-1}(x)) + \|L_F\psi_{r-1}(x)\|\tilde{\vartheta}\}.$$
Then, any locally Lipschitz controller $u = k(x, \hat{\theta}, \tilde{\vartheta}(t))$ satisfying $k(x, \hat{\theta}, \tilde{\vartheta}(t)) \in K_\psi(x, \hat{\theta}, \tilde{\vartheta}(t))$ for all $(x, \hat{\theta}, \tilde{\vartheta}(t)) \in \mathbb{R}^n \times \Theta \times [0, \tilde{\vartheta}_{\max}]$, with $t \mapsto \tilde{\vartheta}(t)$ from Assumption 5.2, renders $C$ forward invariant.

Proof The proof is a combination of that of Theorem 5.2 and Proposition 5.6. 

Given a HO-RaCBF and a parameter estimator satisfying Assumption 5.2, inputs satisfying the conditions of the above theorem can be computed by solving the QP
$$\begin{aligned}
k(x, \hat{\theta}, \tilde{\vartheta}) = \arg\min_{u\in U} \quad & \tfrac{1}{2}\|u - k_0(x, \hat{\theta})\|^2 \\
\text{subject to} \quad & L_f\psi_{r-1}(x) + L_F\psi_{r-1}(x)\hat{\theta} + L_g\psi_{r-1}(x)u \geq -\alpha_r(\psi_{r-1}(x)) + \|L_F\psi_{r-1}(x)\|\tilde{\vartheta},
\end{aligned} \tag{5.34}$$
where $k_0$ is a locally Lipschitz nominal adaptive controller. When $U = \mathbb{R}^m$, the closed-form solution to the above QP is given by
$$k(x, \hat{\theta}, \tilde{\vartheta}) = \begin{cases} k_0(x, \hat{\theta}) & \text{if } \Phi(x, \hat{\theta}, \tilde{\vartheta}) \geq 0, \\[1ex] k_0(x, \hat{\theta}) - \dfrac{\Phi(x, \hat{\theta}, \tilde{\vartheta})}{\|L_g\psi_{r-1}(x)\|^2}L_g\psi_{r-1}(x)^\top & \text{if } \Phi(x, \hat{\theta}, \tilde{\vartheta}) < 0, \end{cases} \tag{5.35}$$
where
$$\Phi(x, \hat{\theta}, \tilde{\vartheta}) := L_f\psi_{r-1}(x) + L_F\psi_{r-1}(x)\hat{\theta} + L_g\psi_{r-1}(x)k_0(x, \hat{\theta}) + \alpha_r(\psi_{r-1}(x)) - \|L_F\psi_{r-1}(x)\|\tilde{\vartheta},$$
and is locally Lipschitz.

5.4 Numerical Examples

Example 5.1 We illustrate many of the ideas introduced in this chapter using a more complex version of the robot motion planning problem introduced in Example 3.2. In particular, we now consider a mobile robot modeled as a planar double integrator with uncertain friction effects
$$\ddot{q} = u - \mu\dot{q}, \tag{5.36}$$

where $q \in \mathbb{R}^2$ denotes the position of the robot, $u \in \mathbb{R}^2$ is its commanded acceleration, and $\mu \in \mathbb{R}^2$ is a vector of uncertain friction coefficients. Taking the state as $x := [q^\top \ \dot{q}^\top]^\top$ and the uncertain parameters as $\theta := \mu$ allows the system to be expressed in the form of (4.1) as
$$\begin{bmatrix} \dot{x}_1 \\ \dot{x}_2 \\ \dot{x}_3 \\ \dot{x}_4 \end{bmatrix} = \begin{bmatrix} x_3 \\ x_4 \\ 0 \\ 0 \end{bmatrix} + \begin{bmatrix} 0 & 0 \\ 0 & 0 \\ -x_3 & 0 \\ 0 & -x_4 \end{bmatrix}\theta + \begin{bmatrix} 0 & 0 \\ 0 & 0 \\ 1 & 0 \\ 0 & 1 \end{bmatrix}u. \tag{5.37}$$

For simplicity, we set $\theta = [1 \ 1]^\top$. Note from (5.36) that the uncertain parameters are clearly matched to the control input. The objective is to stabilize the system to the origin while avoiding a set of static obstacles in the workspace. To achieve the stabilization objective, we construct an exponentially stabilizing adaptive control Lyapunov function (ES-aCLF) introduced in Chap. 4 by solving an LQR problem for the nominal system without any uncertainty. The safety objective is achieved by considering a collection of state constraint sets of the form (3.16) with
$$h_i(x) = (x_1 - y_{1,i})^2 + (x_2 - y_{2,i})^2 - R_i^2,$$
where $[y_{1,i} \ y_{2,i}]^\top \in \mathbb{R}^2$ denotes the location of the obstacle's center and $R_i \in \mathbb{R}_{>0}$ denotes its radius, which are used to form a collection of candidate HO-RaCBFs defined by the extended class $\mathcal{K}_\infty$ functions $\alpha_1(r) = \alpha_2(r) = r$. With straightforward calculations one can verify that such a function has relative degree 2 with respect to the control input and uncertain parameters.
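For a single obstacle, the terms entering the HO-RaCBF condition can be computed in closed form. The following sketch (our own derivation for this example, with $\alpha_1(r) = r$; the function name is an assumption) returns the quantities needed by the QP (5.34):

```python
import numpy as np

def ho_racbf_terms(x, yc, R):
    """Lie-derivative terms of psi_1 for one obstacle in Example 5.1.

    x  : state (x1, x2, x3, x4) of the double integrator (5.37).
    yc : obstacle center (y1, y2);  R : obstacle radius.
    """
    e = x[:2] - np.asarray(yc)          # position error to obstacle center
    v = x[2:]                           # velocity states (x3, x4)
    h = e @ e - R**2
    psi1 = 2 * e @ v + h                # psi_1 = h_dot + alpha_1(h)
    Lf = 2 * v @ v + 2 * e @ v          # grad(psi1) along f = (x3, x4, 0, 0)
    LF = np.array([-2 * e[0] * v[0],    # grad(psi1) along the columns of F(x)
                   -2 * e[1] * v[1]])
    Lg = 2 * e                          # grad(psi1) along g(x) = [0; I]
    return psi1, Lf, LF, Lg
```

With $\alpha_2(r) = r$, the HO-RaCBF constraint for this obstacle then reads $L_f\psi_1 + L_F\psi_1\hat{\theta} + L_g\psi_1 u \geq -\psi_1 + \|L_F\psi_1\|\tilde{\vartheta}$.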
To demonstrate the impact of reducing the level of uncertainty online, we simulate the system under the resulting HO-RaCBF/ES-aCLF controller and compare the results to controllers that account for the worst-case uncertainty without any online adaptation. To learn the uncertain parameters online, we use the concurrent learning estimator outlined in (5.25), where we maintain a history stack with $M = 20$ entries using an integration window of $\Delta t = 0.5$ s and a learning gain of $\gamma = 10$. As noted in Remark 5.1, different update laws are required for the ES-aCLF and HO-RaCBF to provide stability and safety guarantees, respectively. Hence, we maintain two separate estimates of the uncertain parameters, where the parameters for the ES-aCLF controller are updated using the estimator proposed in Theorem 4.4 with $\Gamma = I_{2\times 2}$ and $\gamma_c = 10$. Note that, although two different update laws must be used, the same history stack can be shared between them.
To demonstrate the relationship between adaptation and safety, we run a set of simulations comparing performance under the HO-RaCBF controller to that under a purely robust approach (i.e., accounting for the maximum possible parameter estimation error without adaptation), where the set of possible parameters $\Theta$ satisfying Assumption 5.1 is varied. The results of these simulations are reported in Figs. 5.1, 5.2 and 5.3. As shown in Figs. 5.1 and 5.2, the trajectories under both controllers are safe for all levels of uncertainty; however, for larger levels of uncertainty, the purely robust controller is overly conservative, causing the system trajectory to diverge from the origin. In contrast, the adaptive controller reduces

Fig. 5.1 Position trajectory for the mobile robot under each controller across four different uncertainty sets, where the solid lines denote trajectories under the adaptive controller and the dashed lines denote trajectories under the purely robust controller. The gray disks represent obstacles in the workspace

Fig. 5.2 Minimum value among the two HO-RaCBFs point-wise in time along each system trajectory. The solid and dashed curves have the same interpretation as those in Fig. 5.1 and the dashed black line denotes $h(x) = 0$

Fig. 5.3 Estimates of the uncertain parameters used in the ES-aCLF and HO-RaCBF controller for the simulation corresponding to the uncertainty set $\Theta = [0, 3]^2$

the uncertainty online and achieves the dual objectives of stability and safety. In fact, the convergence of the trajectory to the origin under the adaptive controller is minimally affected by the initial level of uncertainty, whereas the trajectory under the robust controller fails to converge to the origin in the presence of large uncertainty. The ability of the concurrent learning estimators to identify the uncertain parameters is demonstrated in Fig. 5.3, which shows the trajectory of the estimated parameters used in the ES-aCLF and HO-RaCBF controllers, both of which converge to their true values in just under 15 s.

5.5 Notes

In this chapter, we discussed extensions of traditional nonlinear adaptive control techniques


from stability specifications to safety specifications. Early works on incorporating safety
constraints into adaptive control approaches relied on the use of barrier Lyapunov functions
(BLF) [1]; however, as noted in Chap. 3 BLFs are often overly restrictive. Other promising
approaches to safe adaptive control leverage model predictive control (MPC). In this set-
ting, safety in the presence of uncertainties is typically enforced using a tube-based MPC
approach, in which the size of the tube is reduced by reducing the uncertainty in the parameter
estimates using various parameter estimation routines [2–5].
The notion of an adaptive control barrier function (aCBF) was introduced in [6], and
extended the adaptive control Lyapunov function (aCLF) paradigm [7] from stability to
safety problems. These ideas were extended using the notion of a robust aCBF (RaCBF) in
[8], where techniques based on set membership identification [9] were used to reduce the
level of uncertainty in the parameter estimates online. The idea of using concurrent learning
to reduce parameter uncertainty when using CBFs was introduced in [10], with related
works [11] also exploiting the concurrent learning paradigm to reduce parameter uncertainty
online. The extension of RaCBFs to high relative degree safety constraints was introduced in
[12]. Other related works uniting ideas from adaptive control and CBFs include [13–16]. A

different version of an aCBF was introduced in [17], where adaptation was used to adhere to
potentially time-varying control bounds rather than to estimate unknown parameters online.
Beyond traditional adaptive control methods, other learning-based approaches have also
been combined with CBFs to develop safety-critical controllers for uncertain systems. The
methods in [18–20] leverage an episodic learning framework to reduce the impact of uncer-
tainties on the CBF conditions, allowing potential safety violations to be reduced episodically
as more data about the system is collected. By delegating the learning process to offline com-
putations, such approaches allow for the use of powerful function approximators, such as
deep neural networks, to represent unknown terms in the system dynamics. Along similar
lines, probabilistic approaches that leverage Gaussian processes (GPs) as function approxi-
mators have been combined with CBFs as they allow for making high confidence statements
regarding estimated model errors. Works that combine GPs with CBFs to develop controllers
for uncertain systems with probabilistic safety guarantees include [21–24].

References

1. Tee KP, Ge SS, Tay EH (2009) Barrier Lyapunov functions for the control of output-constrained
nonlinear systems. Automatica 45(4):918–927
2. Tanaskovic M, Fagiano L, Smith R, Morari M (2014) Adaptive receding horizon control for
constrained MIMO systems. Automatica 50:3019–3029
3. Lopez BT (2019) Adaptive robust model predictive control for nonlinear systems. PhD thesis,
Massachusetts Institute of Technology
4. Lu X, Cannon M, Koksal-Rivet D (2021) Robust adaptive model predictive control: performance
and parameter estimation. Int J Robust Nonlinear Control 31(18):8703–8724
5. Köhler J, Kötting P, Soloperto R, Allgöwer F, Müller MA (2021) A robust adaptive model
predictive control framework for nonlinear uncertain systems. Int J Robust Nonlinear Control
31(18)
6. Taylor AJ, Ames AD (2020) Adaptive safety with control barrier functions. In: Proceedings of
the American control conference, pp 1399–1405
7. Krstić M, Kokotović P (1995) Control Lyapunov functions for adaptive nonlinear stabilization.
Syst & Control Lett 26(1):17–23
8. Lopez BT, Slotine JJ, How JP (2021) Robust adaptive control barrier functions: an adaptive and
data-driven approach to safety. IEEE Control Syst Lett 5(3):1031–1036
9. Kosut RL, Lau MK, Boyd SP (1992) Set-membership identification of systems with parametric
and nonparametric uncertainty. IEEE Trans Autom Control 37(7):929–941
10. Isaly A, Patil OS, Sanfelice RG, Dixon WE (2021) Adaptive safety with multiple barrier functions
using integral concurrent learning. In: Proceedings of the American control conference, pp 3719–
3724
11. Azimi V, Hutchinson S (2021) Exponential control Lyapunov-barrier function using a filtering-
based concurrent learning adaptive approach. IEEE Trans Autom Control
12. Cohen MH, Belta C (2022) High order robust adaptive control barrier functions and exponen-
tially stabilizing adaptive control Lyapunov functions. In: Proceedings of the American control conference, pp 2233–2238

13. Zhao P, Mao Y, Tao C, Hovakimyan N, Wang X (2020) Adaptive robust quadratic programs using
control Lyapunov and barrier functions. In: Proceedings of the IEEE conference on decision and
control, pp 3353–3358
14. Black M, Arabi E, Panagou D (2021) A fixed-time stable adaptation law for safety-critical control
under parametric uncertainty. In: Proceedings of the European control conference, pp 1328–1333
15. Maghenem M, Taylor AJ, Ames AD, Sanfelice RG (2021) Adaptive safety using control barrier
functions and hybrid adaptation. In: Proceedings of the American control conference, pp 2418–
2423
16. Nguyen Q, Sreenath K (2022) L1 adaptive control barrier functions for nonlinear underactuated
systems. In: Proceedings of the American control conference, pp 721–728
17. Xiao W, Belta C, Cassandras CG (2022) Adaptive control barrier functions. IEEE Trans Autom
Control 67(5):2267–2281
18. Taylor AJ, Singletary A, Yue Y, Ames A (2020) Learning for safety-critical control with control
barrier functions. In: Proceedings of the 2nd annual conference on learning for dynamics and
control. Proceedings of machine learning research, vol 120, pp 708–717
19. Taylor AJ, Singletary A, Yue Y, Ames AD (2020) A control barrier perspective on episodic
learning via projection-to-state safety. IEEE Control Syst Lett 5(3):1019–1024
20. Csomay-Shanklin N, Cosner RK, Dai M, Taylor AJ, Ames AD (2021) Episodic learning for safe
bipedal locomotion with control barrier functions and projection-to-state safety. In: Proceedings
of the 3rd annual conference on learning for dynamics and control. Proceedings of machine
learning research, vol 144, pp 1041–1053
21. Castaneda F, Choi JJ, Zhang B, Tomlin CJ, Sreenath K (2021) Pointwise feasibility of Gaussian process-based safety-critical control under model uncertainty. In: Proceedings of the IEEE
conference on decision and control, pp 6762–6769
22. Dhiman V, Khojasteh MJ, Franceschetti M, Atanasov N (2021) Control barriers in Bayesian
learning of system dynamics. IEEE Trans Autom Control
23. Fan DD, Nguyen J, Thakker R, Alatur N, Agha-mohammadi A, Theodorou EA (2020) Bayesian
learning-based adaptive control for safety critical systems. In: Proceedings of the IEEE interna-
tional conference on robotics and automation, pp 4093–4099
24. Cheng R, Khojasteh MJ, Ames AD, Burdick JW (2020) Safe multi-agent interaction through
robust control barrier functions with learned uncertainties. In: Proceedings of the IEEE confer-
ence on decision and control, pp 777–783
6 A Modular Approach to Adaptive Safety-Critical Control

In the previous chapter, we introduced adaptive control barrier functions (aCBFs) for sys-
tems with uncertain parameters. Central to that approach was the construction of a suitable
parameter estimation algorithm that continuously reduced the level of uncertainty in the
parameter estimates using data collected online. In this chapter, by unifying the concepts of
input-to-state stability (ISS) and input-to-state-safety (ISSf), we develop a framework for
modular adaptive control that addresses some limitations of that method. Specifically, we
show how to allow more freedom in the parameter estimation algorithm, how to relax the
required knowledge on the parameter bounds, and how to reduce the redundancy in param-
eter estimation necessary for safety and stability. In Sect. 6.1, we introduce input-to-state
stability (ISS). The concept of modular adaptive stabilization is defined in Sect. 6.2. The ISS
concept is extended to input-to-state safety (ISSf) in Sect. 6.3. We include numerical exam-
ples in Sect. 6.4 and conclude with final remarks, references, and suggestions for further
reading in Sect. 6.5.

As in the previous few chapters,1 our central object of interest is the nonlinear dynamical
system with parametric uncertainty (4.1), given here again for convenience:

ẋ = f (x) + F(x)θ + g(x)u,

where x ∈ Rn is the state, u ∈ U ⊂ Rm is the control input, θ ∈ R p are the uncertain parameters, and f : Rn → Rn, F : Rn → Rn×p, g : Rn → Rn×m characterize the system dynamics. Letting θ̂ ∈ R p be an estimate of the uncertain parameters, recall that the

1 The term modular adaptive control is often synonymous with indirect adaptive control or estimation-
based adaptive control.


parameter estimation error θ̃ ∈ R p is defined as

θ̃ = θ − θ̂.

The estimation error allows (4.1) to be equivalently represented as

ẋ = f (x) + F(x)θ̂ + g(x)u + F(x)θ̃. (6.1)

The approach taken in this chapter is to view the term F(x)θ̃ as a disturbance input to the
nominal system dynamics f (x) + F(x)θ̂ + g(x)u, and then to characterize the impact of
such a disturbance on the stability and safety properties of the nominal system using the
notions of ISS and ISSf.

6.1 Input-to-State Stability

We begin our development by reviewing the concept of input-to-state stability (ISS), which allows for characterizing stability of nonlinear systems in the presence of disturbances/uncertainties. In this section, we briefly introduce the notion of ISS for the general
disturbed dynamical system
ẋ = f (x, d), (6.2)
where x ∈ Rn is the system state, d ∈ R p is a disturbance input, and f : Rn × R p → Rn
is a vector field, locally Lipschitz in its arguments, and later specialize this definition to our
system of interest.

Definition 6.1 (Input-to-state stability) System (6.2) is said to be input-to-state stable if there exist β ∈ KL and ι ∈ K∞ such that for any initial condition x0 ∈ Rn and any continuous disturbance signal d(·), the solution t ↦ x(t) of the initial value problem

ẋ(t) = f (x(t), d(t))


x(0) = x0 ,

satisfies
‖x(t)‖ ≤ β(‖x0‖, t) + ι(‖d‖∞), (6.3)

where ‖d‖∞ := sup_{t∈R≥0} ‖d(t)‖. If β(r, s) = cr exp(−λs) for some c, λ ∈ R>0, then (6.2)
is said to be exponentially ISS (eISS).

The definition of ISS implies that trajectories remain bounded and will converge to a ball
about the origin, the size of which depends on the magnitude of the disturbance input.
Clearly, in the absence of disturbances (i.e., if ‖d‖∞ = 0) we recover the definition of
asymptotic stability. As with the definitions of stability introduced in Chap. 2, verifying the
ISS property using Definition 6.1 requires knowledge of the system trajectories, which are

difficult to obtain for nonlinear systems in general. Fortunately, ISS properties can be given
a Lyapunov-like characterization using the notion of an ISS Lyapunov function.

Definition 6.2 (ISS Lyapunov function) A continuously differentiable function V : Rn → R≥0 is said to be an ISS Lyapunov function for (6.2) if there exist α1, α2, ρ ∈ K∞ and α3 ∈ K such that, for all (x, d) ∈ Rn × R p,

α1(‖x‖) ≤ V (x) ≤ α2(‖x‖), (6.4)

‖x‖ ≥ ρ(‖d‖) =⇒ L f V (x, d) ≤ −α3(‖x‖). (6.5)

The following theorem shows that the existence of an ISS Lyapunov function is sufficient
to establish ISS of (6.2).

Theorem 6.1 Let V be an ISS Lyapunov function for (6.2). Then, (6.2) is ISS as in (6.3) with ι(r) = (α1⁻¹ ◦ α2 ◦ ρ)(r).

We now specialize the notion of ISS to the uncertain control system (6.1) with the parameter
estimation error acting as a disturbance input to the nominal dynamics.

Definition 6.3 (Exponential ISS-CLF) A continuously differentiable function V : Rn → R≥0 is said to be an exponential input-to-state stable control Lyapunov function (eISS-CLF) for (6.1) if there exist positive constants c1, c2, c3, ε ∈ R>0 such that

c1‖x‖2 ≤ V (x) ≤ c2‖x‖2, ∀x ∈ Rn, (6.6)

and for all (x, θ̂) ∈ (Rn \ {0}) × R p,

inf_{u∈U} [L f V (x) + L F V (x)θ̂ + L g V (x)u] < −c3 V (x) − (1/ε)‖L F V (x)‖2. (6.7)

Similar to the CLFs from earlier chapters, the above definition allows for constructing the set-valued map

K ISS(x, θ̂) := {u ∈ U | L f V (x) + L F V (x)θ̂ + L g V (x)u ≤ −c3 V (x) − (1/ε)‖L F V (x)‖2} (6.8)
that assigns to each (x, θ̂) ∈ Rn × R p a set of control values satisfying the conditions of
Definition 6.3. The following theorem shows that any locally Lipschitz controller belonging
to the above set enforces ISS of (6.1).

Theorem 6.2 Let V be an eISS-CLF for (6.1) and assume that θ̃ (·) ∈ L∞ . Then, any con-
troller u = k(x, θ̂) ∈ K ISS (x, θ̂) locally Lipschitz on (x, θ̂) ∈ (Rn \ {0}) × R p renders (6.1)
eISS in the sense that, for all t ∈ R≥0 ,

 
‖x(t)‖ ≤ √(c2/c1) ‖x(0)‖ exp(−½c3t) + ½√(ε/(c1c3)) ‖θ̃‖∞, (6.9)

where the first term on the right-hand side plays the role of β(‖x(0)‖, t) in (6.3) and the second that of ι(‖θ̃‖∞).

Proof The Lie derivative of V along the closed-loop dynamics can be bounded as

V̇ = L f V (x) + L F V (x)θ̂ + L g V (x)k(x, θ̂) + L F V (x)θ̃
  ≤ −c3 V (x) − (1/ε)‖L F V (x)‖2 + L F V (x)θ̃
  ≤ −c3 V (x) + (ε/4)‖θ̃‖2 (6.10)
  ≤ −c3 V (x) + (ε/4)‖θ̃‖2∞,

where the first inequality follows from the definition of K ISS , the second from completing
squares, and the third from the assumption that θ̃ (·) ∈ L∞ . Invoking the comparison lemma
(Lemma 2.1) and using (6.6) yields

‖x(t)‖2 ≤ (c2/c1)‖x(0)‖2 exp(−c3t) + (ε/(4c1c3))‖θ̃‖2∞
‖x(t)‖ ≤ √(c2/c1) ‖x(0)‖ exp(−½c3t) + ½√(ε/(c1c3)) ‖θ̃‖∞, (6.11)

where the second inequality follows from the fact that the square root is a subadditive
function, which implies that (6.1) is eISS as in (6.9), as desired. 

Similar to previous chapters, given an eISS-CLF, control inputs satisfying the above theorem
can be computed for any (x, θ̂) by solving the optimization problem
k(x, θ̂) = arg min_{u∈U} ½‖u‖2
subject to L f V (x) + L F V (x)θ̂ + L g V (x)u ≤ −c3 V (x) − (1/ε)‖L F V (x)‖2, (6.12)
which is a QP when U = Rm or U is a convex polytope. If U = Rm, the closed-form solution to (6.12) is given by

k(x, θ̂) = { 0,  if ψ(x, θ̂) ≤ 0
          { −(ψ(x, θ̂)/‖L g V (x)‖2) L g V (x)ᵀ,  if ψ(x, θ̂) > 0, (6.13)

where
ψ(x, θ̂) := L f V (x) + L F V (x)θ̂ + c3 V (x) + (1/ε)‖L F V (x)‖2.
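To make (6.13) concrete, the sketch below (in Python with NumPy, a tooling choice of ours; the function signature and the toy scalar system in the usage lines are illustrative assumptions rather than part of the development above) evaluates the min-norm eISS-CLF controller from precomputed Lie derivatives.

```python
import numpy as np

def eiss_clf_controller(theta_hat, LfV, LFV, LgV, V, c3, eps):
    """Min-norm eISS-CLF controller (6.13) for U = R^m.

    theta_hat: (p,) parameter estimate; LfV: scalar L_f V(x);
    LFV: (p,) row of L_F V(x); LgV: (m,) row of L_g V(x); V: scalar V(x).
    """
    psi = LfV + LFV @ theta_hat + c3 * V + (1.0 / eps) * (LFV @ LFV)
    if psi <= 0.0:
        return np.zeros_like(LgV)   # the eISS-CLF condition already holds with u = 0
    # otherwise, rescale -L_g V(x)^T to satisfy the condition with equality;
    # by Definition 6.3, psi > 0 implies L_g V(x) != 0, so the division is safe
    return -(psi / (LgV @ LgV)) * LgV

# toy usage on the scalar system xdot = theta*x + u with V(x) = 0.5 x^2:
x = 1.0
u = eiss_clf_controller(np.array([0.5]), LfV=0.0, LFV=np.array([x * x]),
                        LgV=np.array([x]), V=0.5 * x * x, c3=1.0, eps=10.0)
```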
Note that when U = Rm and the parameters in (4.1) are matched, the construction of an
eISS-CLF can be done similarly to the construction of an aCLF (i.e., independently of the
uncertain parameters). This follows from the fact that the statement that V is an eISS-CLF

when U = Rm is equivalent to the statement that

L g V (x) = 0 =⇒ L f V (x) + L F V (x)θ̂ < −c3 V (x) − (1/ε)‖L F V (x)‖2.

Hence, when the parameters are matched, L g V (x) = 0 =⇒ L F V (x) = 0, and the above
condition reduces to the standard ES-CLF condition for the nominal dynamics. This obser-
vation is summarized in the following proposition.

Proposition 6.1 Let V : Rn → R≥0 be an ES-CLF for ẋ = f (x) + g(x)u with U = Rm .


If the parameters in (6.1) are matched, then V is an eISS-CLF for (6.1) with U = Rm .

6.2 Modular Adaptive Stabilization

In the previous section we demonstrated how the notion of ISS can be used to characterize the
impact of parameter estimation error on the stability of an adaptive control system, and how
eISS-CLFs could be used to construct controllers ensuring ISS of the closed-loop system. In
the present section we outline properties of a general class of parameter estimators that can
be combined with eISS-CLF-based controllers to provide asymptotic stability guarantees,
rather than ISS guarantees. The following lemma outlines the characteristics of such a class
of parameter estimators.

Lemma 6.1 Consider a parameter update law θ̂˙ = τ (θ̂, t), with τ locally Lipschitz in its
first argument and piecewise continuous in its second, and a Lyapunov-like function Vθ :
R p × R≥0 → R≥0 satisfying

η1‖θ̃‖2 ≤ Vθ(θ̃, t) ≤ η2‖θ̃‖2, ∀(θ̃, t) ∈ R p × R≥0, (6.14)

for some η1 , η2 ∈ R>0 . Provided

V̇θ (θ̃, t) ≤ 0, ∀(θ̃ , t) ∈ R p × R≥0 , (6.15)

then θ̃ (·) ∈ L∞ . Furthermore, if there exists a pair (η3 , T ) ∈ R>0 × R≥0 such that

V̇θ(θ̃, t) ≤ −η3‖θ̃‖2, ∀(θ̃, t) ∈ R p × R≥T, (6.16)

then θ̃ (·) ∈ L2 ∩ L∞ and


‖θ̃(t)‖ ≤ (η2/η1) ‖θ̃(0)‖ e^{η3T/(2η2)} e^{−η3t/(2η2)}, ∀t ∈ R≥0. (6.17)

Proof Since V̇θ(θ̃(t), t) ≤ 0 for all t ∈ R≥0, Vθ(θ̃(t), t) is nonincreasing and Vθ(θ̃(t), t) ≤ Vθ(θ̃(0), 0) for all t ∈ R≥0. Using (6.14) this implies that for all t ∈ R≥0

‖θ̃(t)‖ ≤ √(η2/η1) ‖θ̃(0)‖, (6.18)

and thus θ̃(·) ∈ L∞. For t ∈ R≥T, V̇θ(θ̃(t), t) ≤ −η3‖θ̃(t)‖2, which, after using the comparison lemma and (6.14), implies

‖θ̃(t)‖ ≤ √(η2/η1) ‖θ̃(T)‖ e^{−η3(t−T)/(2η2)}
       ≤ (η2/η1) ‖θ̃(0)‖ e^{η3T/(2η2)} e^{−η3t/(2η2)}, (6.19)

for all t ∈ R≥T. This bound is also valid for all t ∈ R≥0 as stated in (6.17) since 1 ≤ exp(−(η3/(2η2))(t − T)) for all t ∈ [0, T]. The bound in (6.17) also implies θ̃(·) ∈ L2 since

∫₀ᵗ ‖θ̃(s)‖2 ds ≤ ((η2/η1) ‖θ̃(0)‖ e^{η3T/(2η2)})2 ∫₀ᵗ e^{−η3s/η2} ds
            = ((η2/η1) ‖θ̃(0)‖ e^{η3T/(2η2)})2 (−(η2/η3)(e^{−η3t/η2} − 1)),

which, after taking limits as t → ∞, implies that

∫₀^∞ ‖θ̃(s)‖2 ds ≤ (η2³/(η3η1²)) ‖θ̃(0)‖2 e^{η3T/η2} < ∞. (6.20)

Combining (6.20) and (6.18) yields θ̃(·) ∈ L2 ∩ L∞.

The condition in (6.15) requires that the parameter estimation error remain bounded for all time - a property satisfied by a variety of standard estimation algorithms (e.g., gradient descent, recursive least squares, etc.). The condition in (6.16) is more restrictive: it requires the parameter estimates to converge exponentially to their true values after a certain time period, and is reminiscent of the concurrent learning parameter estimators introduced
in earlier chapters. Later in this chapter we will provide specific examples of concurrent
learning-based parameter estimators satisfying the conditions of Lemma 6.1. We show in
the following theorem that combining a parameter estimator satisfying the conditions of
Lemma 6.1 with a controller satisfying the conditions in (6.7) renders the origin of the
closed-loop system asymptotically stable.

Theorem 6.3 If V is an eISS-CLF for (6.1) and the conditions of Lemma 6.1 hold, then any
bounded controller u = k(x, θ̂) locally Lipschitz on (Rn \ {0}) × R p satisfying k(x, θ̂) ∈
K ISS (x, θ̂) for all (x, θ̂) ∈ Rn × R p renders the origin of (6.1) asymptotically stable.

To prove the above theorem we require a classical tool used extensively in adaptive control
known as Barbalat’s Lemma.

Lemma 6.2 (Barbalat’s Lemma) Consider a signal x(·) and suppose that x(·), ẋ(·) ∈ L∞
and x(·) ∈ L2 . Then limt→∞ x(t) = 0.

Proof (of Theorem 6.3) The stability of the origin follows directly from Theorem 6.2
since θ̃ (·) ∈ L∞ by Lemma 6.1. To show that the origin is also attractive in the sense
that limt→∞ x(t) = 0 we rearrange the third line of (6.10):

c1c3‖x‖2 ≤ c3 V (x) ≤ (ε/4)‖θ̃‖2 − V̇. (6.21)

Integrating the above over a finite time interval [0, t] yields


c1c3 ∫₀ᵗ ‖x(s)‖2 ds ≤ (ε/4) ∫₀ᵗ ‖θ̃(s)‖2 ds − V (x(t)) + V (x(0))
              ≤ (ε/4) ∫₀ᵗ ‖θ̃(s)‖2 ds + V (x(0)).

Taking limits as t → ∞ and noting that θ̃ (·) ∈ L2 by Lemma 6.1 yields


∫₀^∞ ‖x(s)‖2 ds ≤ (ε/(4c1c3)) ∫₀^∞ ‖θ̃(s)‖2 ds + V (x(0))/(c1c3) < ∞,

implying x(·) ∈ L2 . It follows from Lemma 6.1 that θ̃ (·) ∈ L∞ and thus θ̂(·) ∈ L∞ . Com-
bining this with the assumption that u = k(x, θ̂) is bounded and x(·) ∈ L∞ implies that
ẋ(·) ∈ L∞ . Since x(·), ẋ(·) ∈ L∞ and x(·) ∈ L2 , Lemma 6.2 implies limt→∞ x(t) = 0. 

The above theorem only certifies asymptotic stability whereas the exponentially stabiliz-
ing adaptive CLF (ES-aCLF) controllers posed in Chap. 4 enforce exponential stability. It
should be noted, however, that the exponential stability results of Chap. 4 are established
with respect to a composite system consisting of the original dynamical system and the
parameter estimation error dynamics. The ISS approach presented herein has the benefit
of characterizing the transient behavior of the system trajectory rather than a composite
system trajectory. Moreover, as discussed earlier in this chapter, taking the ISS approach
outlined above has certain benefits when combined with CBF-based adaptive controllers, as
demonstrated in the subsequent section.

6.3 Input-to-State Safety

In this section we present an extension of the ISS formalism to safety specifications using
the notion of input-to-state safety (ISSf). Similar to ISS, the framework of ISSf allows
for characterizing the degradation of safety guarantees in the presence of uncertainties.
In particular, the ISSf framework is concerned with establishing forward invariance of an
inflated safe set Cδ ⊃ C whose inflation is proportional to the magnitude of uncertainty

perturbing the nominal system dynamics. Formally, given a continuously differentiable


function h : Rn → R we define the inflated safe set for some δ ∈ R≥0 as

Cδ := {x ∈ Rn | h(x) + γ(δ) ≥ 0},
∂Cδ := {x ∈ Rn | h(x) + γ(δ) = 0}, (6.22)
Int(Cδ) := {x ∈ Rn | h(x) + γ(δ) > 0},

where γ ∈ K∞. The fact that γ ∈ K∞ implies that Cδ = C, with C as in (3.3), when δ = 0, so that we recover the original safe set in the absence of any uncertainty. We first
define the notion of ISSf for the uncertain dynamical system (6.2) and then extend it to our
system of interest (6.1) in the context of control design.

Definition 6.4 (Input-to-state safety) Consider system (6.2) and a set C ⊂ Rn as in (3.3).
System (6.2) is said to be input-to-state safe if there exist γ ∈ K∞ and δ ∈ R≥0 such that for all d(·) satisfying ‖d‖∞ ≤ δ, the set Cδ in (6.22) is forward invariant.

The above definition provides a pathway towards developing controllers for (6.1) using the
notion of an input-to-state safe control barrier function (ISSf-CBF).

Definition 6.5 (Input-to-state safe CBF) A continuously differentiable function h : Rn → R defining a set C ⊂ Rn as in (3.3) is said to be an input-to-state safe control barrier function for (6.1) on C if ∇h(x) ≠ 0 for all x ∈ ∂C and there exist α ∈ K∞e, ε ∈ R>0 such that for all (x, θ̂) ∈ Rn × R p

sup_{u∈U} [L f h(x) + L F h(x)θ̂ + L g h(x)u] > −α(h(x)) + (1/ε)‖L F h(x)‖2. (6.23)

The above definition allows for characterizing the set of all controllers meeting the criteria
in (6.23) as
 
K ISSf(x, θ̂) := {u ∈ U | L f h(x) + L F h(x)θ̂ + L g h(x)u ≥ −α(h(x)) + (1/ε)‖L F h(x)‖2}. (6.24)
We show in the following theorem that any locally Lipschitz controller satisfying k(x, θ̂) ∈
K ISSf (x, θ̂) renders the closed-loop system ISSf.

Theorem 6.4 Let h be an ISSf-CBF for (6.1) on C and assume that θ̃(·) ∈ L∞ such that
‖θ̃‖∞ ≤ δ for some δ ∈ R≥0. Then, any locally Lipschitz controller u = k(x, θ̂) satisfying k(x, θ̂) ∈ K ISSf(x, θ̂) for all (x, θ̂) ∈ Rn × R p renders Cδ forward invariant for the closed-loop system with

γ(δ) := −α⁻¹(−εδ2/4). (6.25)

Proof Taking the Lie derivative of h along the closed-loop system and lower bounding
yields
ḣ = L f h(x) + L F h(x)θ̂ + L g h(x)k(x, θ̂) + L F h(x)θ̃
  ≥ −α(h(x)) + (1/ε)‖L F h(x)‖2 + L F h(x)θ̃
  ≥ −α(h(x)) + (1/ε)‖L F h(x)‖2 − ‖L F h(x)‖‖θ̃‖∞ (6.26)
  ≥ −α(h(x)) + (1/ε)‖L F h(x)‖2 − ‖L F h(x)‖δ
  ≥ −α(h(x)) − εδ2/4,

where the first inequality follows from the definition of K ISSf, the second from

L F h(x)θ̃ ≥ −‖L F h(x)‖‖θ̃‖ ≥ −‖L F h(x)‖‖θ̃‖∞,

the third from ‖θ̃‖∞ ≤ δ, and the fourth from completing squares. Now define

h δ (x, δ) := h(x) + γ (δ), (6.27)

and note that Cδ is the zero-superlevel set of h δ . Taking the Lie derivative of h δ along the
closed-loop system and lower bounding yields

ḣδ = ḣ + γ̇(δ) = ḣ ≥ −α(h(x)) − εδ2/4. (6.28)

Hence, to show that Cδ is forward invariant, we must show that ḣδ ≥ 0 whenever x ∈ ∂Cδ.
To this end, for x ∈ ∂ Cδ we have
 
h(x) = −γ(δ) = α⁻¹(−εδ2/4), (6.29)

which implies that, for x ∈ ∂ Cδ ,

α(h(x)) + εδ2/4 = 0,
and it follows from (6.28) that

x ∈ ∂ Cδ =⇒ ḣ δ ≥ 0,

which implies the forward invariance of Cδ by Corollary 3.1, as desired. 

Given an ISSf-CBF and a nominal adaptive policy k0 (x, θ̂), e.g., an eISS-CLF-QP con-
troller from earlier in this chapter, an ISSf-CBF-based safety filter can be constructed by
solving the optimization problem

k(x, θ̂) = arg min_{u∈U} ½‖u − k0(x, θ̂)‖2
subject to L f h(x) + L F h(x)θ̂ + L g h(x)u ≥ −α(h(x)) + (1/ε)‖L F h(x)‖2, (6.30)
which is a QP when U = Rm or U is a convex polytope. If U = Rm, the closed-form solution to (6.30) is given by

k(x, θ̂) = { k0(x, θ̂),  if ψ(x, θ̂) ≥ 0
          { k0(x, θ̂) − (ψ(x, θ̂)/‖L g h(x)‖2) L g h(x)ᵀ,  if ψ(x, θ̂) < 0, (6.31)

where

ψ(x, θ̂) := L f h(x) + L F h(x)θ̂ + L g h(x)k0(x, θ̂) + α(h(x)) − (1/ε)‖L F h(x)‖2.
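A minimal sketch of the closed-form safety filter (6.31) follows (Python/NumPy; the function name, argument layout, and the choice of α are our own illustrative assumptions).

```python
import numpy as np

def issf_cbf_filter(u_nom, h, Lfh, LFh, Lgh, theta_hat, alpha, eps):
    """Closed-form ISSf-CBF safety filter (6.31) for U = R^m.

    u_nom: (m,) nominal input k0(x, theta_hat); h: scalar h(x);
    Lfh: scalar; LFh: (p,); Lgh: (m,); alpha: extended class-K function.
    """
    psi = (Lfh + LFh @ theta_hat + Lgh @ u_nom
           + alpha(h) - (1.0 / eps) * (LFh @ LFh))
    if psi >= 0.0:
        return u_nom                 # nominal input already satisfies (6.23)
    # minimal correction along L_g h(x)^T restores the robustified CBF condition
    return u_nom - (psi / (Lgh @ Lgh)) * Lgh
```

With, e.g., alpha = lambda s: s, the filter leaves the nominal policy untouched whenever it is already robustly safe and otherwise applies the smallest Euclidean-norm correction.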

A benefit of the above controller is that a single set of parameter estimates is shared between the nominal policy (typically a performance-based policy) and the safety controller, which is in contrast to the methods developed in Chaps. 4 and 5 that require the parameters to be updated in a particular way to guarantee stability and safety. As with the majority of results outlined thus far, when the parameters in (6.1) are matched and U = Rm, construction of an ISSf-CBF can be done independently of the uncertain parameters.

Proposition 6.2 Let h : Rn → R be a CBF for ẋ = f (x) + g(x)u with U = Rm . If the


parameters in (6.1) are matched, then h is an ISSf-CBF for (6.1) with U = Rm .

As discussed in Chap. 3, the construction of a CBF may be challenging when the user-
defined constraint set does not coincide with a controlled invariant safe set. In what follows,
we partially2 address this challenge by demonstrating how to extend the high order CBF
(HOCBF) methodology from previous chapters to this ISSf setting. Our development par-
allels that of Sect. 5.3: we first consider a constraint set

C0 := {x ∈ Rn | h(x) ≥ 0},

for some constraint function h : Rn → R with relative degree r, and assume that Assumption 5.3 holds so that both the control input and uncertain parameters only appear in h(r), the rth derivative of h along system (6.1). We then recursively construct the collection of
functions from (3.17)

ψ0(x) = h(x)
ψi(x) = ψ̇i−1(x) + αi(ψi−1(x)), ∀i ∈ {1, . . . , r − 1},

2 We only partially address this issue as all of our results regarding the construction of controlled
invariant sets using the HOCBF methodology leverage the simplifying assumption that no actuation
bounds are present.

where each αi ∈ K∞e, which is used to construct the candidate safe set

C := ⋂_{i=0}^{r−1} Ci,

where

Ci := {x ∈ Rn | ψi(x) ≥ 0}, ∀i ∈ {0, . . . , r − 1}.
To extend the HOCBF framework to this ISSf setting, we define a collection of inflated sets

Cδ,i := {x ∈ Rn | ψi (x) + γi (δ) ≥ 0}, ∀i ∈ {0, . . . , r − 1} (6.32)

which is used to define the overall inflated safe set


Cδ := ⋂_{i=0}^{r−1} Cδ,i, (6.33)

where each γi ∈ K∞ , whose controlled invariance we wish to certify using the notion of an
input-to-state safe high order control barrier function (ISSf-HOCBF).

Definition 6.6 (ISSf high order CBF) Let h : Rn → R have relative degree r ∈ N at some x ∈ Rn for (6.1) such that ∇ψi(x) ≠ 0 for all x ∈ ∂Ci for each i ∈ {0, . . . , r − 1} and let Assumption 5.3 hold. The function h is said to be an input-to-state safe high order control barrier function for (6.1) on C as in (3.19) if there exist αr ∈ K∞e, ε ∈ R>0 such that for all (x, θ̂) ∈ Rn × R p

sup_{u∈U} {L f ψr−1(x) + L F ψr−1(x)θ̂ + L g ψr−1(x)u} > −αr(ψr−1(x)) + (1/ε)‖L F ψr−1(x)‖2. (6.34)

The above definition allows for constructing the set of all control policies satisfying the
criteria in (6.34) as

K ψ(x, θ̂) := {u ∈ U | L f ψr−1(x) + L F ψr−1(x)θ̂ + L g ψr−1(x)u ≥ −αr(ψr−1(x)) + (1/ε)‖L F ψr−1(x)‖2}, (6.35)

and we show in the following theorem that any locally Lipschitz policy belonging to the
above set renders Cδ forward invariant.

Theorem 6.5 Let h be an ISSf-HOCBF for (6.1) on C ⊂ Rn as in (3.19) and assume that θ̃(·)
is such that ‖θ̃‖∞ ≤ δ for some δ ∈ R≥0. Then, any locally Lipschitz controller u = k(x, θ̂)
satisfying k(x, θ̂) ∈ K ψ (x, θ̂) for all (x, θ̂) ∈ Rn × R p , with K ψ as in (6.35), renders Cδ
from (6.33) forward invariant for the closed-loop system with

 
γr−1(δ) := −αr⁻¹(−εδ2/4), (6.36)
γi(δ) := −αi+1⁻¹(−γi+1(δ)), ∀i ∈ {0, . . . , r − 2}.

Proof The proof is approached in the same manner as that of Theorem 6.4. Taking the Lie
derivative of ψr −1 along the closed-loop system and lower bounding yields

ψ̇r−1 = L f ψr−1(x) + L F ψr−1(x)θ̂ + L g ψr−1(x)k(x, θ̂) + L F ψr−1(x)θ̃
     ≥ −αr(ψr−1(x)) + (1/ε)‖L F ψr−1(x)‖2 + L F ψr−1(x)θ̃
     ≥ −αr(ψr−1(x)) + (1/ε)‖L F ψr−1(x)‖2 − ‖L F ψr−1(x)‖‖θ̃‖∞ (6.37)
     ≥ −αr(ψr−1(x)) + (1/ε)‖L F ψr−1(x)‖2 − ‖L F ψr−1(x)‖δ
     ≥ −αr(ψr−1(x)) − εδ2/4,

where the first inequality follows from the definition of K ψ, the second from

L F ψr−1(x)θ̃ ≥ −‖L F ψr−1(x)‖‖θ̃‖ ≥ −‖L F ψr−1(x)‖‖θ̃‖∞,

the third from ‖θ̃‖∞ ≤ δ, and the fourth from completing squares. Now define

ψδ,r −1 (x, δ) := ψr −1 (x) + γr −1 (δ), (6.38)

and note that Cδ,r −1 is the zero-superlevel set of ψδ,r −1 . Taking the Lie derivative of ψδ,r −1
along the closed-loop system and lower bounding yields

ψ̇δ,r−1 = ψ̇r−1 + γ̇r−1(δ) = ψ̇r−1 ≥ −αr(ψr−1(x)) − εδ2/4. (6.39)

To show that Cδ,r−1 is forward invariant we must show that x ∈ ∂Cδ,r−1 =⇒ ψ̇δ,r−1 ≥ 0.
To this end, observe that

x ∈ ∂ Cδ,r −1 =⇒ ψr −1 (x) = −γr −1 (δ),

so that
x ∈ ∂Cδ,r−1 =⇒ ψ̇r−1 ≥ −αr(−γr−1(δ)) − εδ2/4.

Taking γr−1 as in (6.36), we have

x ∈ ∂Cδ,r−1 =⇒ ψ̇r−1 ≥ −αr(αr⁻¹(−εδ2/4)) − εδ2/4 = 0.

It then follows from Corollary 3.1 that Cδ,r −1 is forward invariant for the closed-loop system,
which implies that the closed-loop trajectory t ↦ x(t) satisfies

ψr −1 (x(t)) ≥ −γr −1 (δ), ∀t ∈ I (x(0)),

where I (x(0)) ⊂ R≥0 is the trajectory’s maximal interval of existence from an initial con-
dition of x(0) ∈ Cδ . Using the definition of ψr −1 from (3.17), we then have that

ψr −1 (x(t)) = ψ̇r −2 (x(t), θ̃(t)) + αr −1 (ψr −2 (x(t))) ≥ −γr −1 (δ), ∀t ∈ I (x(0)),

which implies that

ψ̇r −2 ≥ −αr −1 (ψr −2 (x(t))) − γr −1 (δ), ∀t ∈ I (x(0)).

Now define
ψδ,r −2 (x, δ) := ψr −2 (x) + γr −2 (δ), (6.40)
and note that Cδ,r −2 is the zero-superlevel set of ψδ,r −2 . Taking the Lie derivative of ψδ,r −2
along the closed-loop system and lower bounding yields

ψ̇δ,r −2 = ψ̇r −2 + γ̇r −2 (δ) = ψ̇r −2 ≥ −αr −1 (ψr −2 (x(t))) − γr −1 (δ). (6.41)

Hence, to show that Cδ,r−2 is forward invariant we must show that x ∈ ∂Cδ,r−2 =⇒ ψ̇δ,r−2 ≥ 0. To this end, observe that

x ∈ ∂ Cδ,r −2 =⇒ ψr −2 (x) = −γr −2 (δ),

so that
x ∈ ∂ Cδ,r −2 =⇒ ψ̇r −2 ≥ −αr −1 (−γr −2 (δ)) − γr −1 (δ).
Taking γr −2 as in (6.36), we have
 
x ∈ ∂Cδ,r−2 =⇒ ψ̇r−2 ≥ −αr−1(αr−1⁻¹(−γr−1(δ))) − γr−1(δ) = 0.

It then follows from Corollary 3.1 that Cδ,r −2 is forward invariant for the closed-loop system.
One can then take analogous steps to those outlined above for the remaining ψi terms to show
that, provided x(0) ∈ Cδ , then ψδ,i (x(t)) ≥ 0 for all t ∈ I (x(0)) and all i ∈ {0, . . . , r − 1},
which implies x(t) ∈ Cδ for all t ∈ I (x(0)), as desired. 
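To make the recursion (6.36) concrete, the short sketch below computes the inflation margins when each αi is linear, αi(s) = ki s (an illustrative choice of K∞e functions with explicit inverses; any invertible choice works the same way).

```python
def inflation_margins(ks, eps, delta):
    """Compute gamma_0(delta), ..., gamma_{r-1}(delta) from (6.36).

    ks[i-1] holds the slope k_i of alpha_i(s) = k_i * s, so alpha_i^{-1}(s) = s / k_i.
    """
    r = len(ks)
    gammas = [0.0] * r
    gammas[r - 1] = (eps * delta**2 / 4.0) / ks[r - 1]  # = -alpha_r^{-1}(-eps*delta^2/4)
    for i in range(r - 2, -1, -1):                      # gamma_i = -alpha_{i+1}^{-1}(-gamma_{i+1})
        gammas[i] = gammas[i + 1] / ks[i]
    return gammas

# e.g., the r = 2 setup of the upcoming Example 6.1 (alpha_1(s) = s, alpha_2(s) = 0.5 s,
# eps = 1) gives inflation_margins([1.0, 0.5], 1.0, 0.1) == [0.005, 0.005]
```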

As with ISSf-CBFs, given an ISSf-HOCBF and a nominal adaptive policy k0 (x, θ̂), a
safety filter enforcing the invariance of Cδ can be constructed by solving the optimization
problem
k(x, θ̂) = arg min_{u∈U} ½‖u − k0(x, θ̂)‖2
subject to L f ψr−1(x) + L F ψr−1(x)θ̂ + L g ψr−1(x)u ≥ −αr(ψr−1(x)) + (1/ε)‖L F ψr−1(x)‖2, (6.42)

which is again a QP when U = Rm or U is a convex polytope. Moreover, when U = Rm, the above QP has a closed-form solution, which is given by

k(x, θ̂) = { k0(x, θ̂),  if Ψ(x, θ̂) ≥ 0
          { k0(x, θ̂) − (Ψ(x, θ̂)/‖L g ψr−1(x)‖2) L g ψr−1(x)ᵀ,  if Ψ(x, θ̂) < 0, (6.43)

where

Ψ(x, θ̂) := L f ψr−1(x) + L F ψr−1(x)θ̂ + L g ψr−1(x)k0(x, θ̂) + αr(ψr−1(x)) − (1/ε)‖L F ψr−1(x)‖2.
Similar to ISSf-CBFs, when the parameters in (6.1) are matched and U = Rm , an ISSf-
HOCBF can be constructed using a HOCBF for the nominal system dynamics:

Proposition 6.3 Let h : Rn → R be a HOCBF for ẋ = f (x) + g(x)u with U = Rm. If the parameters in (6.1) are matched, then h is an ISSf-HOCBF for (6.1).

Recall that care must be taken when constructing HOCBFs as even seemingly benign safety
constraints may produce invalid HOCBFs without further modifications (see Sect. 3.3).

6.4 Numerical Examples

Example 6.1 We illustrate the methods developed in this chapter using a simple obstacle
avoidance scenario for a planar mobile robot modeled as a double integrator with nonlinear
drag effects of the form
q̈ = −D‖q̇‖q̇ + u, (6.44)

where q ∈ R2 denotes the robot's position, u ∈ R2 its commanded acceleration, and D ∈ R2×2 a diagonal matrix of damping coefficients. Defining x := [qᵀ q̇ᵀ]ᵀ ∈ R4 allows (6.44) to be represented in the form of (4.1) as

ẋ = [ q̇ ; 0 ] + [ 02×2 ; −‖q̇‖ diag(q̇) ] [ D1 ; D2 ] + [ 02×2 ; I2×2 ] u, (6.45)
       f(x)              F(x)                  θ             g(x)

where 02×2 ∈ R2×2 is a 2 × 2 matrix of zeros, I2×2 is a 2 × 2 identity matrix, diag(·) constructs a diagonal matrix from a vector, [a ; b] denotes vertical stacking, and D1, D2 ∈ R>0 are the unknown drag coefficients. Our control objective is to drive (6.44) to the origin while avoiding an obstacle in the workspace and learning the uncertain parameters online. To estimate the uncertain param-
eters online, we leverage the concurrent learning approach introduced in Chap. 4. Recall
that such an approach is predicated on the observation that, along state-control trajectory
(x(·), u(·)), system (4.1) can be expressed as

∫_{max{t−Δt, 0}}^{t} ẋ(s) ds = ∫_{max{t−Δt, 0}}^{t} f(x(s)) ds + ( ∫_{max{t−Δt, 0}}^{t} F(x(s)) ds ) θ + ∫_{max{t−Δt, 0}}^{t} g(x(s)) u(s) ds,

for all t ≥ 0, where Δt ∈ R>0 is the length of the integration window. Defining


Y(t) := ∫_{max{t−Δt, 0}}^{t} (ẋ(s) − f(x(s)) − g(x(s))u(s)) ds,
F(t) := ∫_{max{t−Δt, 0}}^{t} F(x(s)) ds

yields the linear relationship for the uncertain parameters

Y(t) = F(t)θ. (6.46)
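A sketch of how these window integrals might be assembled from sampled closed-loop data is given below (Python/NumPy; the trapezoidal discretization and the exact evaluation of the ẋ integral as a state difference are our own implementation choices, not prescribed by the text).

```python
import numpy as np

def history_stack_entry(ts, xs, us, f, F, g):
    """Approximate (Y(t), F(t)) in (6.46) from samples over one integration window.

    ts: (N,) sample times; xs: (N, n) states; us: (N, m) inputs;
    f, F, g: callables evaluating the model terms of (4.1) at a state.
    """
    # the integral of xdot over the window is exactly x(t) - x(t - Delta t)
    fg = np.array([f(x) + g(x) @ u for x, u in zip(xs, us)])
    Y = (xs[-1] - xs[0]) - np.trapz(fg, ts, axis=0)
    F_int = np.trapz(np.array([F(x) for x in xs]), ts, axis=0)
    return Y, F_int   # Y ~= F_int @ theta, up to discretization error
```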

The parameters can then be recursively estimated online by storing values of Y and F at run-time in a history stack H = {(Yj, Fj)}_{j=1}^{M} using Algorithm 1 from Chap. 4, which can then be used in a parameter update law θ̂˙ = τ(θ̂, t) to improve the parameter estimates. For example, in previous chapters such update laws were derived by forming the prediction error

e(θ̂) = Σ_{j=1}^{M} ‖Yj − Fj θ̂‖2,

based on the data available in H and then recursively minimizing such an error online using
gradient descent. Rather than taking such an approach, we demonstrate how any estimation
algorithm satisfying the conditions of Lemma 6.1 can be used to endow the resulting adaptive
control system with stability and safety guarantees. To this end, we consider the following
class of update laws

θ̂˙ = −Γ(t)∇e(θ̂) = Γ(t) Σ_{j=1}^{N} Fjᵀ(Yj − Fj θ̂), (6.47)

which serves as a general template for particular update laws based on the properties of Γ(·) as follows:

Γ̇ = 0, (6.48a)
Γ̇ = −Γ( Σ_{j=1}^{N} Fjᵀ Fj )Γ, (6.48b)
Γ̇ = βΓ − Γ( Σ_{j=1}^{N} Fjᵀ Fj )Γ, (6.48c)
Γ̇ = β(1 − ‖Γ‖/Γ̄)Γ − Γ( Σ_{j=1}^{N} Fjᵀ Fj )Γ, (6.48d)

where β ∈ R>0 is a forgetting/discount factor and Γ̄ ∈ R>0 is a user-defined constant that bounds ‖Γ(t)‖. With Γ̇ as in (6.48), the update law in (6.47) corresponds to: (6.48a) gradient descent; (6.48b) recursive least squares (RLS); (6.48c) RLS with a forgetting/discount factor; (6.48d) RLS with a variable forgetting factor.
descent; (6.48b) recursive least squares (RLS); (6.48c) RLS with a forgetting/discount factor;
(6.48d) RLS with a variable forgetting factor. We emphasize that the purpose of our numerical
example is not necessarily to establish superiority of one algorithm over the others; rather, our
goal is to demonstrate that, under the assumptions posed in Lemma 6.1, the stability/safety
guarantees of the controller can be decoupled from the design of the parameter estimator,
which allows considerable freedom in selecting an estimation algorithm best suited for
the problem at hand. We demonstrate the modularity of our approach (i.e., the ability to
decouple the design of the estimator from the controller) by running a set of simulations
with randomly sampled initial conditions for the system state and estimated parameters
under each algorithm, and show that, for a given level of uncertainty, the ISSf guarantees are
invariant to the particular choice of parameter estimator. For each estimation algorithm we
produce 25 different trajectories by uniformly sampling the initial state from [−2.2, −1.8] × [1.8, 2.2] × {0} × {0} ⊂ R4 and the initial parameter estimates from [0, 3]2 ⊂ R2; the true parameters are set to θ = [0.8 1.4]ᵀ. The hyperparameters for the estimation algorithms are selected as N = 20, Γ(0) = 100·I2×2, β = 1, Γ̄ = 1000. The stabilization objective is
achieved by considering the eISS-CLF candidate V(x) = ½‖q‖2 + ½‖q + q̇‖2 with c3 = 1 and ε = 20. The safety objective is achieved by considering the constraint function h(x) = ‖q − qo‖2 − Ro2, where qo = [−1 1]ᵀ is the center of the circular obstacle and Ro = 0.5 its radius, which has relative degree 2 for (6.44) with respect to both u and θ. This constraint function is used to construct an ISSf-HOCBF candidate with α1(s) = s, α2(s) = ½s, and
ε = 1. To verify that h is a valid ISSf-HOCBF, we note that, since the parameters in (6.44) are matched, it suffices to show that h is a valid HOCBF for a two-dimensional double integrator without any uncertainty. To this end, we compute Lie derivatives of h along the system dynamics

∇h(x) = [ 2(q − qo) ; 02×1 ]
L f h(x) = [ 2(q − qo)ᵀ 01×2 ] [ q̇ ; 02×1 ] = 2(q − qo)ᵀ q̇
L g h(x) = [ 2(q − qo)ᵀ 01×2 ] [ 02×2 ; I2×2 ] = 01×2,
and see that h has relative degree larger than one. Computing second order Lie derivatives
yields

∇L f h(x) = [ 2q̇ ; 2(q − qo) ]

Fig. 6.1 Mean and standard deviation of the norm of the parameter estimation error over time generated by each parameter estimator. The solid lines indicate the average value of ‖θ̃(t)‖ across each simulation, and the ribbon surrounding each line corresponds to one standard deviation from the mean

 
L g L f h(x) = [ 2q̇ᵀ 2(q − qo)ᵀ ] [ 02×2 ; I2×2 ] = 2(q − qo)ᵀ,
which reveals that L g L f h(x) ≠ 0 for all states whose position does not lie at the center of
the obstacle. Hence, h has relative degree 2 for all x ∈ C , where C is defined recursively by
h as in (3.19), and h is thus a valid ISSf-HOCBF.
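The relative degree computation above is mechanical and easy to verify symbolically; the sketch below (using SymPy, a tooling choice of ours) reproduces these Lie derivatives for the nominal double integrator.

```python
import sympy as sp

q1, q2, qd1, qd2 = sp.symbols("q1 q2 qd1 qd2")
x = sp.Matrix([q1, q2, qd1, qd2])
f = sp.Matrix([qd1, qd2, 0, 0])                  # nominal double-integrator drift
g = sp.Matrix([[0, 0], [0, 0], [1, 0], [0, 1]])  # input matrix
qo = sp.Matrix([-1, 1])                          # obstacle center, radius Ro = 0.5
h = (sp.Matrix([q1, q2]) - qo).dot(sp.Matrix([q1, q2]) - qo) - sp.Rational(1, 4)

grad_h = sp.Matrix([h]).jacobian(x)
Lfh = (grad_h * f)[0]                                  # = 2 (q - qo)^T qdot
Lgh = sp.simplify(grad_h * g)                          # = 0_{1x2}: relative degree > 1
LgLfh = sp.simplify(sp.Matrix([Lfh]).jacobian(x) * g)  # = 2 (q - qo)^T: relative degree 2
print(Lgh, LgLfh)
```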
For each simulation, the closed-loop trajectory is generated by the ISSf-HOCBF con-
troller in (6.42), where the nominal adaptive policy is chosen as the eISS-CLF controller
from (6.12), the results of which are provided in Figs. 6.1 and 6.2. As shown in Fig. 6.2, the
trajectories under each update law remain safe and converge to the origin, whereas Fig. 6.1
illustrates the convergence of the parameter estimation error to zero for each estimation
algorithm as predicted by Lemma 6.1. The curves in Fig. 6.1 represent the mean and stan-
dard deviation of the parameter estimation error over time across all simulations for each
estimation algorithm. The results in Fig. 6.1 illustrate that, on average, the RLS with forget-
ting factor estimator (6.48c) produces the fastest convergence of the parameter estimates
while also exhibiting low variance across different trajectories. The standard RLS algorithm
(6.48b) produces the slowest convergence, which is expected given that, in general, this
algorithm cannot guarantee exponential convergence3 of the parameter estimates, whereas
the others can.

3 This also implies that (6.48b) does not satisfy all the conditions of Lemma 6.1. Despite this, note
that boundedness of the estimates is sufficient to establish ISS and ISSf.

Fig. 6.2 State trajectories generated by each estimation algorithm projected onto the x1 -x2 plane. In
each plot the gray disk denotes the obstacle. The colors in each plot share the same interpretation as
those in Fig. 6.1

Example 6.2 In the preceding examples, safety was enforced by choosing an appropriate
value of ε for the given level of uncertainty. In theory, Cδ → C as ε → 0; however, taking
ε very small may require a significant amount of control effort that could exceed physical
actuator limits. An alternative approach to reducing safety violations in this ISSf setting is
through fast adaptation - if the parameter estimates quickly converge to their true values then
the estimated dynamics used in (6.42) to generate control actions will be very close to the
true dynamics. In Fig. 6.3, we generated additional trajectories of the closed-loop system
under the gradient descent update law (6.48a) and the RLS update law with a forgetting factor
(6.48c) using the same setup as in the previous example, but with different levels of initial
parameter uncertainty. As demonstrated in Fig. 6.3, the trajectories under the RLS update
law avoid the obstacle for the given initial parameter estimation errors via fast adaptation,

Fig. 6.3 Comparison between trajectories generated by the gradient descent (GD) learning algorithm
(6.48a) and the recursive least squares algorithm (LS) with a forgetting factor (6.48c) for different
initial parameter estimates. The top plot displays the system trajectories, the middle illustrates the
control trajectories, and the bottom illustrates the parameter estimation error

whereas the trajectories under the gradient descent algorithm violate the safety constraint for
higher levels of uncertainty. Hence, rather than using a more robust controller (by decreasing
ε), which may be overly conservative if bounds on θ are unknown, one can endow the ISSf
controller with stronger safety guarantees through the use of a more efficient estimation
algorithm.

6.5 Notes

In this chapter we introduced a class of adaptive control techniques that are referred to as
modular in the sense that the design of the controller can be decoupled from the design of
the parameter estimator. Such a property allows for interchanging the parameter estimation
algorithm with minimal impact on the resulting stability/safety guarantees. Whereas the
adaptive control methods from Chap. 4 could be classified as Lyapunov-based, the approach
presented in the present chapter can generally be classified as an estimation-based approach
since the primary objective of the parameter estimator is often to reduce the prediction error.
The modular approach to adaptive stabilization in this chapter relies heavily on the notion
of input-to-state stability (ISS), a concept introduced by Sontag in [1]. Since its inception
in 1989, ISS has proven to be a powerful tool in nonlinear systems and control - a more
complete exposition on ISS and its applications can be found in [2]. Over the years, various
notions of ISS control Lyapunov functions (ISS-CLF) have been introduced, see, e.g., [3, 4]
for examples. Our definition of an exponential ISS-CLF is inspired by that from [5], where
such a function was used to control bipedal robots in the presence of uncertainty. Similar
to the CLFs from Chap. 2, it was shown in [3] that ISS-CLFs are also inverse optimal [6]
in that they solve a differential game with a meaningful cost function. The idea of using
ISS-CLFs for adaptive nonlinear stabilization can be traced back to [7, 8], in which they
were combined with general classes of parameter estimators to achieve ISS of nonlinear
systems with respect to the parameter estimation error. Extending such modular designs to
concurrent learning-based parameter estimators [9] was explored in [10].
Efforts towards extending the ISS paradigm from stability to safety were first explored in [11], where it was shown that, in the presence of disturbances, the stability guarantees
induced by CBFs can be given an ISS characterization. These ideas were formalized using
the concept of input-to-state safety (ISSf) in [12], which also introduced the notion of an
ISSf-CBF. Generalizations of the original ISSf framework along with different notions of
ISSf-CBFs have appeared in [13–15]. The idea of using ISSf to extend traditional modular
adaptive control approaches to a safety-critical setting first appeared in [10]. Beyond the

integration of ISSf and adaptive control, the ISSf framework has also shown promise in
other learning-based control frameworks [16–18] as well as in event-triggered control [19].

References

1. Sontag ED (1989) Smooth stabilization implies coprime factorization. IEEE Trans Autom Con-
trol 34(4):435–443
2. Sontag ED (2008) Input to state stability: basic concepts and results. In: Nistri P, Stefani G (eds)
Nonlinear and optimal control theory. Springer, pp 163–220
3. Krstic M, Li ZH (1998) Inverse optimal design of input-to-state stabilizing nonlinear controllers.
IEEE Trans Autom Control 43(3):336–350
4. Liberzon D, Sontag ED, Wang Y (2002) Universal construction of feedback laws achieving ISS and integral-ISS disturbance attenuation. Syst Control Lett 46:111–127
5. Kolathaya S, Reher J, Hereid A, Ames AD (2018) Input to state stabilizing control Lyapunov
functions for robust bipedal robotic locomotion. In: Proceedings of the American control con-
ference
6. Freeman RA, Kokotovic PV (1996) Inverse optimality in robust stabilization. SIAM J Control
Optim 34(4):1365–1391
7. Krstić M, Kokotović P (1995) Adaptive nonlinear design with controller-identification separation
and swapping. IEEE Trans Autom Control 40(3):426–440
8. Krstić M, Kokotović P (1996) Modular approach to adaptive nonlinear stabilization. Automatica
32(4):625–629
9. Chowdhary G (2010) Concurrent learning for convergence in adaptive control without persistency
of excitation. PhD thesis, Georgia Institute of Technology, Atlanta, GA
10. Cohen MH, Belta C (2023) Modular adaptive safety-critical control. In: Proceedings of the
American control conference
11. Xu X, Tabuada P, Grizzle JW, Ames AD (2015) Robustness of control barrier functions for
safety critical control. In: Proceedings of the IFAC conference on analysis and design of hybrid
systems, pp 54–61
12. Kolathaya S, Ames AD (2019) Input-to-state safety with control barrier functions. IEEE Control
Syst Lett 3(1):108–113
13. Taylor AJ, Singletary A, Yue Y, Ames AD (2020) A control barrier perspective on episodic
learning via projection-to-state safety. IEEE Control Syst Lett 5(3):1019–1024
14. Alan A, Taylor AJ, He CR, Orosz G, Ames AD (2022) Safe controller synthesis with tunable
input-to-state safe control barrier functions. IEEE Control Syst Lett 6:908–913
15. Alan A, Taylor AJ, He CR, Ames AD, Orosz G (2022) Control barrier functions and input-to-state
safety with application to automated vehicles. arXiv:2206.03568
16. Csomay-Shanklin N, Cosner RK, Dai M, Taylor AJ, Ames AD (2021) Episodic learning for safe
bipedal locomotion with control barrier functions and projection-to-state safety. In: Proceedings
of the 3rd annual conference on learning for dynamics and control, vol 144. Proceedings of
machine learning research, pp 1041–1053
17. Cosner RK, Tucker M, Taylor AJ, Li K, Molnar TG, Ubellacker W, Alan A, Orosz G, Yue Y, Ames
AD (2022) Safety-aware preference-based learning for safety-critical control. In: 4th annual
conference on learning for dynamics and control, vol 166. Proceedings of machine learning
research, pp 1–14

18. Cosner RK, Yue Y, Ames AD (2022) End-to-end imitation learning with safety guarantees
using control barrier functions. In: Proceedings of the IEEE conference on decision and control,
pp 5316–5322
19. Taylor AJ, Ong P, Cortés J, Ames AD (2021) Safety-critical event triggered control via input-
to-state safe barrier functions. IEEE Control Syst Lett 5(3):749–754
7 Robust Safety-Critical Control for Systems with Actuation Uncertainty

In earlier chapters, our main objective was to design controllers for nonlinear systems with
parametric uncertainty so that the closed-loop system satisfied desired performance require-
ments such as exponential stability or safety. We focused on the case when the parameters
entered the dynamics in an additive fashion, in the sense that the dynamics were affine in
both the control input and the uncertain parameters. In this chapter, we consider the case
when uncertainty enters the dynamics multiplicatively, which is relevant to many applica-
tion areas. We propose a duality-based approach to robust safety-critical control in Sect. 7.1,
which is based on robust control barrier functions (Sect. 7.1.1) and robust control Lyapunov
functions (Sect. 7.1.2). An online learning approach for uncertainty reduction based on
leveraging input-output data generated by the system at run-time is presented in Sect. 7.2.
Numerical examples are included in Sect. 7.3. We conclude with final remarks, references,
and suggestions for further reading in Sect. 7.4.

In previous chapters, we considered uncertain nonlinear systems of the form (4.1):

ẋ = f (x) + F(x)θ + g(x)u,

where θ ∈ R p are constant, but unknown parameters of the underlying dynamical system.
In the above system, the parameters enter the dynamics in an additive fashion in the sense
that the dynamics are affine in both the control input and uncertain parameters. A more
challenging situation arises when the uncertainty enters the dynamics multiplicatively in the
sense that the dynamics are bilinear in the control and parameters:

ẋ = f (x) + g(x)u + ϕ(x, u)θ, (7.1)

where ϕ : Rn × Rm → Rn× p is a locally Lipschitz mapping that is affine in u and θ ∈ R p


once again denotes the uncertain parameters. Allowing the uncertainty to enter the dynamics
multiplicatively as in (7.1) is crucial towards extending the ideas introduced thus far to larger

classes of uncertain systems. Even the parameters of relatively simple systems may fail to
obey the additive restriction imposed by (4.1).

Example 7.1 Let q ∈ R3 be the position of a particle with mass m ∈ R>0 moving in R3
acted upon by a control input u ∈ R3 whose equations of motion can be derived using
Newton’s Second Law:
mq̈ = u.

If the mass is unknown, then the above system cannot be put into the form of (4.1) since the uncertain parameter θ = 1/m will multiply the control input. This simple system does, however, fit into the model proposed in (7.1) with state x = [qᵀ q̇ᵀ]ᵀ as

ẋ = [ q̇ ; 0 ] + [ 0 ; u ] (1/m),
      f(x)       ϕ(x,u)   θ

Despite its relevance, system (7.1) presents challenges in designing controllers based on control Lyapunov functions (CLFs) or control barrier functions (CBFs) that are robust to the uncertain parameters. These challenges arise from enforcing the CLF or CBF conditions for all possible realizations of the uncertain parameters, a requirement that, as we will demonstrate shortly, does not preserve the quadratic programming (QP) structure typically used to synthesize such controllers.

7.1 A Duality-Based Approach to Robust Safety-Critical Control

To make the aforementioned challenges more precise, we first impose the following assump-
tion on the parameters:

Assumption 7.1 There exist known constants θ̲i, θ̄i ∈ R for all i ∈ {1, . . . , p} and a hyperrectangle Θ := [θ̲1, θ̄1] × · · · × [θ̲p, θ̄p] ⊂ R p such that θ ∈ Θ.

Assumption 7.1 implies the set of possible parameters Θ admits a halfspace representation as Θ = {θ ∈ R p | Aθ ≤ b}, where A, b capture linear halfspace constraints. As argued in
Chap. 5, such an assumption is not restrictive from a practical standpoint as it simply states
there exist known bounds on physical attributes of the system such as its inertia and damping
properties. In the proceeding sections, we detail the challenges that arise when directly
applying CLF/CBF controllers to (7.1) and how such challenges can be overcome using
ideas that exploit the duality of a particular class of convex optimization problems.

7.1.1 Robust Control Barrier Functions

In this section, we develop a CBF approach that robustly accounts for all possible realizations
of the system uncertainty to design controllers guaranteeing safety of (7.1). Importantly, we
show how this can be accomplished while retaining the traditional QP structure used in CBF
approaches by exploiting the dual of a particular linear program (LP). As is typically the
case when using CBFs, we consider a candidate safe set defined as the zero superlevel set
of a continuously differentiable function h : Rn → R as in (3.3):

C = {x ∈ Rn | h(x) ≥ 0}.

We begin by introducing the notion of a robust CBF (RCBF) for systems of the form (7.1).

Definition 7.1 (Robust CBF) A continuously differentiable function h : Rn → R is said to be a robust CBF (RCBF) for (7.1) on a set C ⊂ Rn as in (3.3) if there exists α ∈ K∞e such that for all x ∈ Rn

sup_{u∈U} inf_{θ∈Θ} ḣ(x, u, θ) > −α(h(x)), (7.2)

where ḣ(x, u, θ) = L f h(x) + L g h(x)u + L ϕ h(x, u)θ.

The above definition states that h is a RCBF for (7.1) if it is possible to enforce the standard CBF condition ḣ ≥ −α(h) for the worst-case scenario over the feasible parameter set Θ. Similar to the standard CBF case, let

K rcbf(x) := {u ∈ U | L f h(x) + L g h(x)u + inf_{θ∈Θ} L ϕ h(x, u)θ ≥ −α(h(x))}

be, for each x ∈ Rn , the set of control values satisfying the condition from (7.2). The
following lemma shows that any locally Lipschitz control policy k(x) ∈ K rcbf (x) renders C
forward invariant for the closed-loop system.

Lemma 7.1 If h is a RCBF for (7.1) on a set C as in (3.3) and Assumption 7.1 holds, then
any locally Lipschitz control policy u = k(x) satisfying k(x) ∈ K rcbf (x) for each x ∈ Rn
renders C forward invariant for the closed-loop system.

Proof The derivative of h along the closed-loop system is lower bounded as

ḣ(x) = L f h(x) + L g h(x)k(x) + L ϕ h(x, k(x))θ
     ≥ L f h(x) + L g h(x)k(x) + inf_{θ∈Θ} L ϕ h(x, k(x))θ
     ≥ −α(h(x)).

Hence, h is a barrier function for the closed-loop system and the forward invariance of C
follows from Theorem 3.2. 

Although the above lemma demonstrates that the class of CBF from Definition 7.1 pro-
vides sufficient conditions for safety, this formulation is not appealing from a control syn-
thesis perspective. In particular, the minimax nature and coupling of control and parameters
in Definition 7.1 will lead to bilinear constraints on the control and parameters and thus
cannot be directly cast as a QP. For example, directly embedding the conditions imposed by
Definition 7.1 into an optimization problem yields

min_{u∈U} ½‖u − k0(x)‖2
subject to L f h(x) + L g h(x)u + inf_{θ∈Θ} L ϕ h(x, u)θ ≥ −α(h(x)). (7.3)

This optimization problem requires solving simultaneously for both u and θ ; however, the
constraint is bilinear in u and θ and is thus not a QP. To remedy this, note that the inner
minimization problem from (7.2) can be written as the LP1 :

inf_θ L ϕ h(x, u)θ
subject to Aθ ≤ b. (7.4)

The dual of (7.4) is

sup_{μ≤0} bᵀμ
subject to μᵀA = L ϕ h(x, u), (7.5)

where μ is the dual variable. In light of (7.4) and (7.5), we show in Theorem 7.1 that one can solve the following QP

min_{u∈U, μ≤0} ½‖u − k0(x)‖2
subject to L f h(x) + L g h(x)u + bᵀμ ≥ −α(h(x)) (7.6)
           μᵀA = L ϕ h(x, u),

with decision variables u and μ, to compute a controller satisfying the RCBF conditions from Definition 7.1.

Theorem 7.1 Let the assumptions of Lemma 7.1 hold. Then any locally Lipschitz solution
to (7.6), u = k(x), renders C forward invariant for the closed-loop system.

1 Note that L ϕ h(x, u) is an affine function of u.

Proof The RCBF condition (7.2) is satisfied at a state x ∈ Rn if the value of the optimization
problem
sup_{u∈U} inf_{θ∈R p} L f h(x) + L g h(x)u + L ϕ h(x, u)θ + α(h(x))
subject to Aθ ≤ b, (7.7)

is greater than or equal to 0. It follows from the strong duality theorem of LPs that the values of the primal and dual LPs in (7.4) and (7.5), respectively, are equal, allowing the inner minimization in (7.7) to be replaced with its dual (7.5), yielding

sup_{u∈U, μ∈R2p} L f h(x) + L g h(x)u + bᵀμ + α(h(x))
subject to μᵀA = L ϕ h(x, u), μ ≤ 0. (7.8)

By the strong duality of LPs, the values of the optimization problems in (7.7) and (7.8) are
equivalent, implying that if the optimal value of (7.8) is greater than or equal to 0 for a given
x ∈ Rn , then the resulting input u satisfies (7.2). Embedding the conditions imposed by
(7.8) as constraints in an optimization problem yields the QP in (7.6). Under the assumption
that K rcbf (x) is nonempty for each x ∈ Rn , the optimal value of (7.7), and thus of (7.8)
by strong duality, is greater than or equal to 0, which implies that (7.6) is feasible for each
x ∈ Rn and that k(x) ∈ K rcbf (x) for each x ∈ Rn . It then follows from the assumption that
the resulting control policy u = k(x) is locally Lipschitz and Lemma 7.1 that such a policy
renders C forward invariant for the closed-loop system, as desired. 
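For completeness, the dual QP (7.6) can be handed directly to an off-the-shelf convex solver. The sketch below uses CVXPY (our own tooling choice) and assumes the affine decomposition L ϕ h(x, u) = c(x) + uᵀD(x), which exists because ϕ is affine in u; all names and shapes are illustrative.

```python
import numpy as np
import cvxpy as cp

def rcbf_qp(Lfh, Lgh, c, D, A, b, alpha_h, u_nom):
    """Solve the dual-based RCBF-QP (7.6) at a single state.

    Lfh: scalar L_f h(x); Lgh: (m,) row L_g h(x);
    c: (p,), D: (m, p) with L_phi h(x, u) = c + u^T D (assumed decomposition);
    A: (q, p), b: (q,) with Theta = {theta : A theta <= b}; alpha_h: alpha(h(x)).
    """
    m, q = D.shape[0], A.shape[0]
    u, mu = cp.Variable(m), cp.Variable(q)
    constraints = [
        Lfh + Lgh @ u + b @ mu >= -alpha_h,   # robustified CBF condition
        A.T @ mu == c + D.T @ u,              # dual feasibility: mu^T A = L_phi h(x, u)
        mu <= 0,
    ]
    prob = cp.Problem(cp.Minimize(0.5 * cp.sum_squares(u - u_nom)), constraints)
    prob.solve()
    return u.value
```

An analogous implementation applies to the robust CLF program of the next subsection, with the inequality reversed and the dual variable constrained to be nonnegative.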

Remark 7.1 An alternative way to replacing (7.4) with (7.5) would be to use the fact that,
for an LP, the optimum value is achieved at a vertex of the feasible set. Therefore, it is
possible to replace the constraint given by (7.4) with an enumeration of constraints obtained
by replacing θ with each corner of the feasible polyhedron Aθ ≤ b. In general, however,
this would result in a number of constraints that grows combinatorially in the number of
half spaces in Aθ ≤ b. Intuitively, this is avoided in (7.5) because the dual variable μ
automatically selects the worst-case corner.

Remark 7.2 Although we have stated all results here for relative degree one CBFs, the same
recipe outlined in previous chapters can be used to extend this approach to high order CBFs.
Similar to previous chapters, such an extension is contingent on the uncertain parameters θ
only appearing in the highest order Lie derivative of h along the dynamics (7.1).
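To make the construction concrete, the following Python sketch assembles and solves the QP (7.6) at a single state using the cvxpy modeling package. The function name and the affine decomposition L_φ h(x, u) = c0 + u⊤C1 supplied by the caller are illustrative assumptions rather than part of the formulation above; box constraints encoding u ∈ U can be appended to the constraint list as needed.

```python
# A minimal sketch of the duality-based RCBF-QP (7.6). The caller supplies
# h(x), L_f h(x), L_g h(x), a nominal input kd = k_d(x), the halfspace data
# (A, b) describing Theta, and the affine split L_phi h(x, u) = c0 + u^T C1.
import numpy as np
import cvxpy as cp

def rcbf_qp(kd, hx, Lfh, Lgh, c0, C1, A, b, alpha=lambda s: s):
    u = cp.Variable(kd.size)         # control input
    mu = cp.Variable(A.shape[0])     # dual variable, one entry per halfspace
    constraints = [
        Lfh + Lgh @ u + b @ mu >= -alpha(hx),  # robust CBF condition
        A.T @ mu == c0 + C1.T @ u,             # mu^T A = L_phi h(x, u)
        mu <= 0,                               # dual feasibility
    ]
    cp.Problem(cp.Minimize(0.5 * cp.sum_squares(u - kd)), constraints).solve()
    return u.value
```

The RCLF-QP (7.11) introduced in the next subsection admits the same structure with a dual variable λ ≥ 0, a flipped inequality, and a zero nominal input.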

7.1.2 Robust Control Lyapunov Functions

The duality-based approach developed for robust safety naturally extends to robust stabi-
lization problems using the notion of a robust CLF for systems of the form (7.1). For all
results in this section we make the following assumption:

Assumption 7.2 The uncertain system (7.1) satisfies f (0) = 0 and ϕ(0, 0) = 0 so that the
origin is an equilibrium point of the unforced system.

Definition 7.2 (Robust CLF) A Lyapunov function candidate V : Rⁿ → R≥0 is said to be a Robust CLF (RCLF) for (7.1) if there exists γ ∈ K such that for all x ∈ Rⁿ \ {0}

    inf_{u∈U} sup_{θ∈Θ} V̇(x, u, θ) < −γ(V(x)),     (7.9)

where V̇(x, u, θ) = L_f V(x) + L_g V(x)u + L_φ V(x, u)θ.

Now consider the set

    K_rclf(x) := {u ∈ U | L_f V(x) + L_g V(x)u + sup_{θ∈Θ} L_φ V(x, u)θ ≤ −γ(V(x))},

of all control values satisfying the condition from (7.9). The following lemma shows that
any locally Lipschitz controller satisfying the conditions of Definition 7.2 renders the origin
asymptotically stable for (7.1).

Lemma 7.2 If V is a RCLF for (7.1) and Assumptions 7.1–7.2 hold, then any control policy
u = k(x), locally Lipschitz on Rⁿ \ {0}, satisfying k(x) ∈ K_rclf(x) for each x ∈ Rⁿ renders
the origin asymptotically stable for (7.1).

Proof The derivative of V along the closed-loop system is upper bounded as

    V̇(x) = L_f V(x) + L_g V(x)k(x) + L_φ V(x, k(x))θ
         ≤ L_f V(x) + L_g V(x)k(x) + sup_{θ∈Θ} L_φ V(x, k(x))θ
         ≤ −γ(V(x)),

and asymptotic stability follows from Theorem 2.2. □

Following the same duality-based approach as in the previous section, we can make the synthesis of robust stabilizing controllers more tractable than working directly with Definition 7.2. The dual of the LP sup_{θ∈Θ} L_φ V(x, u)θ is given by

    inf_{λ≥0}  b⊤λ
    subject to  λ⊤A = L_φ V(x, u),     (7.10)

where λ is the dual variable. This allows one to generate inputs satisfying condition (7.9) by solving the following QP:

    min_{u∈U, λ≥0}  ½‖u‖²
    subject to  L_f V(x) + L_g V(x)u + b⊤λ ≤ −γ(V(x))     (7.11)
                λ⊤A = L_φ V(x, u),

as shown in the following theorem.

Theorem 7.2 Let the assumptions of Lemma 7.2 hold. Then, any solution to (7.11), u =
k(x), locally Lipschitz on Rn \ {0}, renders the origin asymptotically stable for the closed-
loop system.

Proof Follows the same steps as that of Theorem 7.1. □

7.2 Online Learning for Uncertainty Reduction

The previous section demonstrates how to robustly account for system uncertainty to guar-
antee stability and/or safety; however, the initial bounds on the system uncertainty may be
highly conservative, which could restrict the system from exploring much of the safe set and,
as illustrated in Sect. 7.3, could produce controllers that require large amounts of control
effort to enforce stability and safety. A more attractive approach is to leverage input-output
data generated by the system at run-time in an effort to identify the system uncertainty,
which can be used to reduce the conservatism of the approach outlined in the previous
section. Such an approach was leveraged using techniques from adaptive control in previous
chapters. Here, we present an alternative approach based on the idea of set membership iden-
tification (SMID), which is a model identification approach commonly used in the model
predictive control literature. Rather than maintaining a point-wise estimate of the uncertain
parameters (as in the adaptive control approach), the SMID approach maintains an entire
feasible set of parameters. We will effectively use this approach to shrink the hyperrectangle Θ containing θ down to a smaller set to reduce the conservatism of the robust approach
presented in the previous section.
Following the approach from Sect. 4.2, let Δt ∈ R>0 be the length of an integration window and note that over any finite time interval [t − Δt, t] ⊂ R≥0, the Fundamental Theorem of Calculus can be used to represent (7.1) as the linear regression model from (4.18):

    Y(t) = F(t)θ,

where

    Y(t) := x(t) − x(t − Δt) − ∫_{max{t−Δt, 0}}^{t} ( f(x(s)) + g(x(s))u(s) ) ds,
    F(t) := ∫_{max{t−Δt, 0}}^{t} φ(x(s), u(s)) ds.     (7.12)

Our goal is now to use the above relation to shrink the set of possible parameters Θ using input-output data collected online. To this end, let H(t) := {(Y_j(t), F_j(t))}_{j=1}^{M} be a history stack with M ∈ N entries. Letting {t_k}_{k∈Z≥0} be a strictly increasing sequence of times with t_0 = 0, consider the corresponding sequence of sets

    Θ_0 = Θ,
    Θ_k = {θ ∈ Θ_{k−1} | −ε1_n ≤ Y_j(t_k) − F_j(t_k)θ ≤ ε1_n, ∀j ∈ M},

where 1_n is an n-dimensional vector of ones; that is, Θ_k is the set of all parameters that approximately satisfy (4.18) for each j ∈ M with precision² ε ∈ R>0. In practice, the set Θ_k can be computed by solving, for each i ∈ {1, . . . , p}, the pair of LPs

    θ̲_i^k = arg min_θ  θ_i
    s.t.  Y_j(t_k) − F_j(t_k)θ ≤ ε1_n  ∀j
          Y_j(t_k) − F_j(t_k)θ ≥ −ε1_n  ∀j     (7.13)
          A_{k−1}θ ≤ b_{k−1},

    θ̄_i^k = arg max_θ  θ_i
    s.t.  Y_j(t_k) − F_j(t_k)θ ≤ ε1_n  ∀j
          Y_j(t_k) − F_j(t_k)θ ≥ −ε1_n  ∀j     (7.14)
          A_{k−1}θ ≤ b_{k−1},
where θ_i is the ith component of θ and A_{k−1}, b_{k−1} capture the halfspace constraints imposed by Θ_{k−1}. The updated set of possible parameters is then taken as

    Θ_k = [θ̲_1^k, θ̄_1^k] × · · · × [θ̲_p^k, θ̄_p^k].     (7.15)
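As an illustration, the pair of LPs (7.13)–(7.14) can be solved with an off-the-shelf LP solver; the sketch below uses scipy.optimize.linprog and represents Θ_{k−1} directly through per-coordinate bounds, which is one convenient way of encoding the constraint A_{k−1}θ ≤ b_{k−1} when the sets remain hyperrectangles. The function and variable names are illustrative.

```python
# A sketch of the SMID update (7.13)-(7.15). `history` is a list of pairs
# (Y_j, F_j) from (7.12); (lo, hi) are the coordinatewise bounds of the
# current hyperrectangle Theta_{k-1}; eps is the precision parameter.
import numpy as np
from scipy.optimize import linprog

def smid_update(history, lo, hi, eps):
    # Stack the 2M data constraints: -F theta <= eps - Y and F theta <= eps + Y.
    A_ub = np.vstack([np.vstack((-F, F)) for (Y, F) in history])
    b_ub = np.concatenate([np.concatenate((eps - Y, eps + Y)) for (Y, F) in history])
    bounds = list(zip(lo, hi))          # encodes Theta_{k-1}
    new_lo, new_hi = lo.copy(), hi.copy()
    for i in range(lo.size):
        c = np.zeros(lo.size); c[i] = 1.0
        res_min = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)   # LP (7.13)
        res_max = linprog(-c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)  # LP (7.14)
        if res_min.success and res_max.success:
            new_lo[i], new_hi[i] = res_min.x[i], res_max.x[i]
    return new_lo, new_hi               # Theta_k via (7.15)
```

Because the previous bounds enter as box constraints, the monotonicity Θ_k ⊆ Θ_{k−1} established in Lemma 7.3 is preserved by construction.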

The following result shows that the true parameters always belong to the set of possible
parameters generated by the SMID scheme.

Lemma 7.3 Provided that Assumption 7.1 holds and the sequence of sets {Θ_k}_{k∈Z≥0} is generated according to (7.13)–(7.15), then Θ_k ⊆ Θ_{k−1} ⊆ Θ and θ ∈ Θ_k for all k ∈ Z≥0.

² The constant ε can be seen as a parameter governing the conservativeness of the identification scheme, which can be used to account for disturbances, noise, unmodeled dynamics, and/or numerical integration errors.

Proof The observation that Θ_k ⊆ Θ_{k−1} for all k ∈ Z≥0 follows directly from (7.13) and (7.14), since the constraint A_{k−1}θ ≤ b_{k−1} ensures that θ̲_i^k, θ̄_i^k ∈ [θ̲_i^{k−1}, θ̄_i^{k−1}] for all i, implying [θ̲_i^k, θ̄_i^k] ⊆ [θ̲_i^{k−1}, θ̄_i^{k−1}] for all i. It then follows from (7.15) and Θ_0 = Θ that Θ_k ⊆ Θ_{k−1} ⊆ Θ for all k ∈ Z≥0. Our goal is now to show that θ ∈ Θ_{k−1} ⟹ θ ∈ Θ_k. For any k ∈ Z≥0, relation (4.18) implies that θ belongs to the set

    H_k = {θ ∈ Rᵖ | Y_j(t_k) − F_j(t_k)θ = 0}

for all j ∈ M. Additionally, for any k ∈ Z≥0 the constraints in (7.13) and (7.14) ensure that Θ_k ⊂ H_k⁻ ∩ H_k⁺, where

    H_k⁻ = {θ ∈ Rᵖ | Y_j(t_k) − F_j(t_k)θ ≥ −ε1_n},
    H_k⁺ = {θ ∈ Rᵖ | Y_j(t_k) − F_j(t_k)θ ≤ ε1_n},

for all j ∈ M. It then follows from θ ∈ H_k and H_k ⊂ H_k⁻ ∩ H_k⁺ that θ ∈ H_k⁻ ∩ H_k⁺. The last constraint in (7.13) and (7.14) ensures that Θ_k ⊂ H_k⁻ ∩ H_k⁺ ∩ Θ_{k−1}, which implies that θ ∈ Θ_k as long as θ ∈ Θ_{k−1}. Since θ ∈ Θ_0, it inductively follows from θ ∈ Θ_{k−1} ⟹ θ ∈ Θ_k for all k ∈ Z≥0 that θ ∈ Θ_k for all k ∈ Z≥0. □

The following propositions demonstrate that if h and V are a RCBF and RCLF, respectively, for (7.1) with respect to the original parameter set Θ, then they remain so for the parameter sets generated by the SMID algorithm.

Proposition 7.1 Let h be a RCBF for (7.1) on a set C ⊂ Rⁿ in the sense that there exists α ∈ K∞ᵉ such that (7.2) holds for all x ∈ Rⁿ. Provided the assumptions of Lemma 7.3 hold, then

    sup_{u∈U} inf_{θ∈Θ_k} ḣ(x, u, θ) ≥ −α(h(x)),

for all x ∈ Rⁿ and all k ∈ Z≥0.

Proof Let θ_k* ∈ Θ_k be the solution to the LP inf_{θ∈Θ_k} L_φ h(x, u)θ for some (x, u). Since Θ_{k+1} ⊆ Θ_k by Lemma 7.3, one of the following holds: either (i) θ_k* ∈ Θ_{k+1} or (ii) θ_k* ∈ Θ_k \ Θ_{k+1}. For case (i), if the infimum is achieved over the set Θ_{k+1}, then θ_k* would also be an optimal solution to the LP inf_{θ∈Θ_{k+1}} L_φ h(x, u)θ and

    inf_{θ∈Θ_{k+1}} L_φ h(x, u)θ = inf_{θ∈Θ_k} L_φ h(x, u)θ.

For case (ii), if θ_k* ∈ Θ_k \ Θ_{k+1}, then necessarily

    inf_{θ∈Θ_{k+1}} L_φ h(x, u)θ ≥ inf_{θ∈Θ_k} L_φ h(x, u)θ,

otherwise the infimum would have been achieved over Θ_{k+1} since Θ_k ⊇ Θ_{k+1}. Thus, since the RCBF condition (7.2) holds over Θ and Θ_k ⊆ Θ for all k ∈ Z≥0 by Lemma 7.3, we have

    inf_{θ∈Θ_k} L_φ h(x, u)θ ≥ inf_{θ∈Θ} L_φ h(x, u)θ,

for all k ∈ Z≥0. The preceding argument implies

    sup_{u∈U} inf_{θ∈Θ_k} ḣ(x, u, θ) ≥ sup_{u∈U} inf_{θ∈Θ} ḣ(x, u, θ) ≥ −α(h(x)),

for all x ∈ Rⁿ and k ∈ Z≥0, as desired. □

Proposition 7.2 Let V be a RCLF for (7.1) in the sense that there exists γ ∈ K such that (7.9) holds for all x ∈ Rⁿ. Provided the assumptions of Lemma 7.3 hold, then

    inf_{u∈U} sup_{θ∈Θ_k} V̇(x, u, θ) ≤ −γ(V(x)),

for all x ∈ Rⁿ and all k ∈ Z≥0.

Proof The proof parallels that of Proposition 7.1. □

7.3 Numerical Examples

Example 7.2 We first consider a scenario for a two-dimensional nonlinear system with naturally unsafe dynamics in the sense that trajectories of the system leave the safe set without intervention from a controller. The dynamics of the system are in the form of (7.1) and are given by f(x) = g(x) = 0 such that

    \begin{bmatrix} \dot{x}_1 \\ \dot{x}_2 \end{bmatrix} = \underbrace{\begin{bmatrix} x_1 & x_2 & 0 & 0 \\ 0 & 0 & x_1^3 & x_2 u \end{bmatrix}}_{\varphi(x,u)} \underbrace{\begin{bmatrix} \theta_1 \\ \theta_2 \\ \theta_3 \\ \theta_4 \end{bmatrix}}_{\theta}.

The uncertain parameters are assumed to lie in the set

    Θ = [−1.2, 0.2] × [−2, −0.1] × [0.5, 1.4] × [0.8, 1.2] ⊂ R⁴.

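For use in the QPs (7.6) and (7.11), this hyperrectangle can be written in the halfspace form Aθ ≤ b, with each coordinate contributing two rows; a brief sketch:

```python
# The hyperrectangle Theta from Example 7.2 in halfspace form A theta <= b.
import numpy as np

lo = np.array([-1.2, -2.0, 0.5, 0.8])
hi = np.array([ 0.2, -0.1, 1.4, 1.2])
A = np.vstack((np.eye(4), -np.eye(4)))   # theta_i <= hi_i and -theta_i <= -lo_i
b = np.concatenate((hi, -lo))
```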
The objective is to regulate the system to the origin while remaining in a set C ⊂ R² characterized as in (3.3) with

    h(x) = 1 − x₁ − x₂².
Fig. 7.1 Trajectory of the nonlinear system under various controllers. The solid blue curve depicts the trajectory with SMID, the dotted orange curve depicts the trajectory without SMID, the purple curve illustrates the trajectory under a standard CBF-QP with exact model knowledge, and the black curve denotes the boundary of the safe set

The regulation objective is achieved by considering the RCLF candidate

    V(x) = ¼x₁⁴ + ½x₂²,

with γ(s) = ½s, and the safety objective is achieved by considering the RCBF candidate with h as above and α(s) = s³. Given a RCLF, RCBF, and uncertainty set Θ, one can form a QP as noted after Theorem 7.2 to generate a closed-loop control policy that guarantees stability and safety provided the sufficient conditions of Theorems 7.1 and 7.2 are satisfied. To illustrate the impact of the integral SMID procedure, simulations are run with and without SMID active, the results of which are provided in Figs. 7.1 and 7.2. The parameters associated with the SMID simulation are Δt = 0.3, ε = 0.1, and M = 20. The M data points in LPs (7.13) and
(7.14) are collected using a moving window approach, where the M most recent data points
are used to update the uncertainty set. As illustrated in Fig. 7.1 the trajectory under the RCLF-
RCBF-QP achieves the stabilization and safety objective with and without SMID; however,
the trajectory without any parameter identification is significantly more conservative and is
unable to approach the boundary of the safe set. In contrast, the trajectory with SMID is able
to approach the boundary of the safe set as more data about the system becomes available.
In particular, both trajectories follow an identical path up until t = Δt, at which point the
set of possible parameters is updated, causing the blue curve (SMID) to deviate from the
orange curve (no SMID) in Fig. 7.1. In fact, even after the first SMID update the blue curve
closely resembles the purple curve, which corresponds to the trajectory under a CBF-QP
with perfect model knowledge. Although the parameters have not been exactly identified by
the end of the simulation (see Fig. 7.2), the modest reduction in uncertainty offered by the
SMID approach greatly reduces the conservatism of the purely robust approach.
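The moving-window bookkeeping mentioned above amounts to a fixed-capacity buffer; a minimal sketch, where the pair (Y, F) comes from evaluating (7.12) at the current time:

```python
# Keep only the M most recent integrated data pairs for the SMID LPs.
def push_data(history, Y, F, M=20):
    history.append((Y, F))
    if len(history) > M:
        history.pop(0)   # discard the oldest pair
```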

Fig. 7.2 Set-based estimate of the uncertain parameters for the nonlinear system. From left to right, the plots illustrate the uncertainty set Θ projected onto the θ₁ × θ₂, θ₁ × θ₃, and θ₁ × θ₄ axes,
respectively. In each plot the pale rectangle represents the original uncertainty set, the dark rectangle
represents the final uncertainty set generated by the SMID algorithm, and the dot represents the true
values of the parameters

Example 7.3 We now consider a robotic navigation task and demonstrate how to incorporate HOCBFs into the developed framework. The robot is modeled as a planar double integrator with uncertain mass and friction effects of the form

    m q̈ = u − c q̇,

where q ∈ R² denotes the robot's position, m ∈ R>0 its mass, c = diag([c₁ c₂]) ∈ R²ˣ² a diagonal matrix of friction coefficients, and u ∈ R² its commanded acceleration. Taking the state as x = [q⊤ q̇⊤]⊤ ∈ R⁴ and the uncertain parameters as θ = [c₁/m  c₂/m  1/m]⊤ ∈ R³ allows the system to be put into the form of (7.1) as

    \underbrace{\begin{bmatrix} \dot{x}_1 \\ \dot{x}_2 \\ \dot{x}_3 \\ \dot{x}_4 \end{bmatrix}}_{\dot{x}} = \underbrace{\begin{bmatrix} x_3 \\ x_4 \\ 0 \\ 0 \end{bmatrix}}_{f(x)} + \underbrace{\begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ -x_3 & 0 & u_1 \\ 0 & -x_4 & u_2 \end{bmatrix}}_{\varphi(x,u)} \underbrace{\begin{bmatrix} c_1/m \\ c_2/m \\ 1/m \end{bmatrix}}_{\theta}.

The objective is to drive the robot to the origin while avoiding a circular obstacle of radius r ∈ R>0 centered at [x_o y_o]⊤ ∈ R². The state constraint set can be described as the zero superlevel set of

    h(x) = (x₁ − x_o)² + (x₂ − y_o)² − r²,

which has relative degree 2 with respect to both the control input and uncertain parameters, as required by Remark 7.2.
as required by Remark 7.2.
To further demonstrate the advantage of reducing the level of uncertainty online, we
simulate the double integrator under a robust HOCBF-based policy with and without the
SMID algorithm running. For each simulation, the uncertain parameters are assumed to
7.4 Notes 129

Fig. 7.3 Evolution of the double integrator's position with the SMID algorithm active (blue) and inactive (orange). The gray disk denotes an obstacle of radius r = 1.5 centered at x_o = −2.5, y_o = 2.5

lie in the set Θ = [0, 5] × [0, 5] × [0.1, 2], and all extended class K∞ᵉ functions used in the HOCBF constraints are chosen as α(s) = s³. The stabilization objective is achieved by considering the same CLF candidate used in Example 5.1, and the controller ultimately applied to the system is computed by filtering the solution to the RCLF-QP (7.11) through a robust HOCBF-QP. The parameters for the SMID algorithm are chosen as M = 20, Δt = 0.1, and ε = 1, where data is recorded using the same technique as in the previous example.
The trajectory of the robot’s position with and without the SMID algorithm is illustrated in
Fig. 7.3, where each trajectory is shown to satisfy the stability and safety objective. Although
the trajectories appear very similar, the controller without SMID generates this trajectory
with significantly more control effort (see Fig. 7.4). In fact, within the first second of the
simulation such a controller requires control effort that is an order of magnitude higher than
that of the controller that reduces the uncertainty online to avoid collision with the obstacle.

Fig. 7.4 Evolution of the control input for the double integrator with the SMID algorithm active (top) and inactive (bottom)

7.4 Notes

In this chapter, we presented a duality-based approach to robust and data-driven safety-critical control that allows for the synthesis of CBF- and CLF-based controllers using quadratic programming (QP). An alternative to the duality-based approach to designing CBF/CLF-based controllers for systems with additive and multiplicative uncertainty, which first appeared in [1], involves reformulating the optimization problem (7.3) as a second order cone program (SOCP), which is a convex optimization problem and hence can be solved efficiently in real-time. Converting (7.3) to a SOCP generally involves assuming that the Lie derivative of h along the dynamics (7.1) can be lower-bounded as

    ḣ = L_f h(x) + L_g h(x)u + L_φ h(x, u)θ
      ≥ L_f h(x) + L_g h(x)u − (a(x) + b(x)‖u‖)θ̄,

for some locally Lipschitz functions a : Rⁿ → R, b : Rⁿ → R and some known bound ‖θ‖ ≤ θ̄. The above condition is still a nonlinear function of the control input because of the appearance of ‖u‖; however, it can be recast as a second order cone constraint with an explicit
appearance of u ; however, it can be recast as a second order cone constraint with an explicit
conversion detailed in [2]. Promising works that take the SOCP approach to accounting
for model uncertainty include [3–5]. The SOCP approach has also found applications in
developing CBF-based controllers that account for measurement uncertainty [2, 6] and
input delays [7].
As mentioned in Remark 7.1, it is also possible to convert the optimization problem from (7.3) into a QP by enumerating all the vertices of Θ as constraints - an approach taken in works such as [8, 9]. Although this approach leads to control synthesis using a QP, the number of constraints can grow rapidly in higher dimensions. For example, the vertex representation of a p-dimensional hyperrectangle results in 2ᵖ constraints, whereas the halfspace representation only results in 2p constraints. Although the duality-based approach presented in this chapter does not directly correspond to using the halfspace representation of Θ in a QP, it does produce a QP whose number of constraints scales linearly, rather than exponentially. The system dynamics and safety constraint used in Example 7.2 are taken from [10].

References

1. Cohen MH, Belta C, Tron R (2022) Robust control barrier functions for nonlinear control systems with uncertainty: a duality-based approach. In: Proceedings of the IEEE conference on decision and control, pp 174–179
2. Dean S, Taylor AJ, Cosner R, Recht B, Ames AD (2021) Guaranteeing safety of learned perception modules via measurement-robust control barrier functions. In: Proceedings of the 2020 conference on robot learning, vol 155. Proceedings of machine learning research, pp 654–670
3. Taylor AJ, Dorobantu VD, Dean S, Recht B, Yue Y, Ames AD (2021) Towards robust data-driven control synthesis for nonlinear systems with actuation uncertainty. In: Proceedings of the IEEE conference on decision and control, pp 6469–6476
4. Castaneda F, Choi JJ, Zhang B, Tomlin CJ, Sreenath K (2021) Pointwise feasibility of Gaussian process-based safety-critical control under model uncertainty. In: Proceedings of the IEEE conference on decision and control, pp 6762–6769
5. Dhiman V, Khojasteh MJ, Franceschetti M, Atanasov N (2021) Control barriers in Bayesian learning of system dynamics. IEEE Transactions on Automatic Control
6. Cosner RK, Singletary AW, Taylor AJ, Molnar TG, Bouman KL, Ames AD (2021) Measurement-robust control barrier functions: certainty in safety with uncertainty in state. In: Proceedings of the IEEE/RSJ international conference on intelligent robots and systems, pp 6286–6291
7. Molnar TG, Kiss AK, Ames AD, Orosz G (2022) Safety-critical control with input delay in dynamic environment. IEEE Transactions on Control Systems Technology
8. Dawson C, Qin Z, Gao S, Fan C (2021) Safe nonlinear control using robust neural Lyapunov-barrier functions. In: Proceedings of the 5th annual conference on robot learning
9. Emam Y, Glotfelter P, Wilson S, Notomista G, Egerstedt M (2021) Data-driven robust barrier functions for safe, long-term operation. IEEE Transactions on Robotics
10. Jankovic M (2018) Robust control barrier functions for constrained stabilization of nonlinear systems. Automatica 96:359–367
8 Safe Exploration in Model-Based Reinforcement Learning

In this chapter, we show how to use reinforcement learning (RL) to generate a control policy
for the same uncertain dynamical system considered in the previous chapters. Specifically,
we present an online model-based RL (MBRL) algorithm that balances the often competing
objectives of learning and safety by simultaneously learning the value function of an optimal
control problem and the uncertain parameters of a dynamical system using real-time data.
Central to our approach is a safe exploration framework based on the adaptive control barrier
functions introduced in Chap. 5. We start by formulating an infinite-horizon optimal control
problem as an RL problem in Sect. 8.1. Approximations for the value function used in the
RL algorithm are discussed in Sect. 8.2. The main part of this chapter is Sect. 8.3, where we
present the online MBRL algorithm. We illustrate the method with numerical examples in
Sect. 8.4 and conclude with final remarks, references, and suggestions for further reading in
Sect. 8.5.

Reinforcement learning (RL) is a machine learning technique used to solve sequential


decision-making problems in the face of uncertainty via function approximation. The
sequential decision-making problem we consider in this chapter is that of optimal con-
trol in which the goal is to construct a control policy that optimizes an objective function
over a long time horizon, whereas the uncertainty we consider is that of uncertainty in the
vector fields governing the system dynamics. Thus, in the context of this book, we simply
view RL as a technique used to solve optimal control problems when the system dynamics
are uncertain. The typical high-level approach taken in using RL to solve optimal control
problems for uncertain system dynamics is as follows:

1. A user specifies a cost function and an initial control policy;
2. The control policy is applied to the dynamical system (either through simulation or experiment) to generate a closed-loop system trajectory;


3. The data (e.g., state trajectory, control trajectory, total cost, etc.) from such a trajectory is
used to update the control policy offline in an attempt to improve the policy with respect
to the cost function;
4. Steps 2 and 3 are repeated until the user is satisfied with the control policy.

The above description is, of course, an extremely simplified depiction of the RL pipeline,
but captures the main essence of many standard RL techniques. Where many RL techniques
differ is in how the trajectory data is used to improve the policy. For example, model-free RL
does not assume any knowledge of the vector fields governing the system dynamics and,
instead, updates the policy directly from the observed trajectory data. An argument in favor
of model-free RL is that it may be more challenging to construct a complicated dynamics
model than simply learning a policy that maps states to control actions. By not explicitly1
using model knowledge in the control design, such approaches may also generalize well
beyond their training data. The drawback of model-free RL is that it is generally extremely
sample inefficient, and therefore may require generating many trajectories until convergence
to a suitable policy is obtained. On the other hand, model-based RL (MBRL) approaches
often use trajectory data to learn a dynamics model, which is then exploited in the process of
learning a better policy. Such approaches are often considered to be more sample efficient,
requiring less data before a suitable policy is obtained. An argument against MBRL is
that the resulting policy may be heavily biased towards the learned model, which, if not
representative of the true underlying dynamics, may perform poorly when deployed on a
system.
Regardless of the method that is ultimately used to update the policy, we refer to the above
paradigm as episodic or offline RL as one must run multiple trials or episodes to obtain data,
and the actual learning (i.e., updating the policy) is generally not done in real-time as the
system is being controlled. This offline approach has demonstrated a tremendous amount
of success in low-risk settings where generating an abundance of data can be done without
real-world consequences, but presents a number of challenges in the context of safety-
critical systems. These challenges mainly stem from the idea that learning and safety seem
to be fundamentally at odds with each other: learning in a RL context requires exploring
a variety of different actions to generate sufficiently rich data whereas enforcing safety
requires restricting actions to those that can only be certified as safe. The fact that a variety
of (potentially unsafe) actions may need to be explored before convergence to a suitable
policy is obtained using the aforementioned episodic approach may not be a limiting factor
in simulation; however, such an approach significantly limits the applicability of these ideas
to safety-critical systems where failures during training are deemed to be unacceptable.
These challenges can be partially mitigated if one has unlimited access to an accurate
simulator of the system under consideration, where mistakes made during learning are
largely inconsequential; however, if such a simulator is not fully representative of the physical

¹ Of course, any policy learned from simulation data will implicitly depend on the underlying model used to generate such data.

system, then policies trained on data generated by such a simulator may yield undesirable
performance when deployed on the physical system.
The aforementioned challenges motivate the development of RL algorithms that operate
online in which real-time data generated by the system is used to learn a desirable policy and
adapt to situations that may be difficult to account for using a policy trained using purely
offline data. In the present chapter we introduce an online MBRL algorithm that balances
the often competing objectives of learning and safety by simultaneously learning the value
function of an optimal control problem and the uncertain parameters of a dynamical system
safely using real-time data. In particular, we develop a safe exploration framework in which
the learned system dynamics are used to simulate on-the-fly exploratory actions needed
to generate sufficiently rich data for learning while simultaneously shielding the learning
policy from unsafe actions on the physical system using the adaptive control barrier functions
introduced in Chap. 5.

8.1 From Optimal Control to Reinforcement Learning

In this chapter, we once again consider the nonlinear control affine system with parametric
uncertainty (4.1) with dynamics

ẋ = f (x) + F(x)θ + g(x)u.

As in previous chapters, our main objective is to design an adaptive controller that stabi-
lizes the origin of (4.1) while satisfying some safety criteria in the sense that closed-loop
trajectories should remain in the set (3.3):

C = {x ∈ Rn | h(x) ≥ 0},

at all times. Rather than accomplishing the stabilization objective by constructing a control
Lyapunov function (CLF), we seek a control policy that minimizes the user-specified infinite-
horizon cost functional

    J(x₀, u(·)) = ∫₀^∞ ℓ(x(s), u(s)) ds,     (8.1)

where ℓ : Rⁿ × Rᵐ → R≥0 is the running cost, assumed to take the form

    ℓ(x, u) = Q(x) + u⊤Ru,     (8.2)

where Q : Rⁿ → R≥0 is continuously differentiable and positive definite, and R ∈ Rᵐˣᵐ is symmetric and positive definite. Framing the control objective as the optimization of a
cost functional provides a natural way of encoding desired performance specifications that
may be challenging to encode when constructing a CLF.
136 8 Safe Exploration in Model-Based Reinforcement Learning

Solutions to the optimal control problem described by the infinite-horizon cost (8.1) are
typically characterized in terms of the optimal value function or cost-to-go

V ∗ (x) = inf J (x, u(·)). (8.3)


u(·)

Note that the above minimization is an infinite-dimensional optimization performed over


the space of control functions u(·). For the remainder of our development, we impose the
following restriction on the value function.

Assumption 8.1 The optimal value function (8.3) is continuously differentiable, and its
gradient is locally Lipschitz.

Provided the above assumption holds, the value function can be shown to be the unique positive definite solution to the Hamilton–Jacobi–Bellman (HJB) partial differential equation

    0 = inf_{u∈Rᵐ} H(x, u, ∇V*(x)),  ∀x ∈ Rⁿ,     (8.4)

with a boundary condition of V*(0) = 0, where

    H(x, u, λ) := λ⊤( f(x) + F(x)θ + g(x)u ) + ℓ(x, u)     (8.5)

is the Hamiltonian, and λ ∈ Rⁿ is the costate vector. Provided there exists a continuously differentiable positive definite function V* satisfying the HJB, taking the minimum on the right-hand side of (8.4) yields the optimal feedback control policy as

    k*(x) = −½ R⁻¹ L_g V*(x)⊤.     (8.6)
The following fundamental result illustrates that any positive definite and continuously
differentiable function satisfying the HJB is also a CLF.

Theorem 8.1 (Value functions are CLFs) Under Assumption 8.1, the value function V ∗ is
a Lyapunov function for the closed-loop system (4.1) equipped with the optimal feedback
controller u = k ∗ (x), which asymptotically stabilizes the closed-loop system to the origin.

Proof We first note that the value function is positive definite by construction based on (8.1) and (8.2). Substituting the optimal policy from (8.6) back into the HJB (8.4) yields

    0 = L_f V*(x) + L_F V*(x)θ + L_g V*(x)k*(x) + ℓ(x, k*(x)).

Taking the Lie derivative of V* along the closed-loop vector field and bounding using the above relation yields

    V̇*(x) = L_f V*(x) + L_F V*(x)θ + L_g V*(x)k*(x) = −ℓ(x, k*(x)) ≤ −Q(x),     (8.7)

which implies V* is a Lyapunov function for the closed-loop system, and consequently, that the origin is asymptotically stable for the closed-loop system. □

Although the above result is appealing from a theoretical standpoint, it is of little practical
use since solving the HJB for the value function is a much more difficult problem than
constructing a CLF. The difficulty in solving the HJB generally arises from the fact that
(8.4) does not admit closed-form solutions, except in special cases. For example, when the
dynamics are linear and the cost is quadratic in the state and control, the HJB reduces to the
algebraic Riccati equation which can be solved easily either in closed-form or numerically.
Unfortunately, for nonlinear systems and/or nonquadratic cost functions such closed-form
solutions rarely exist and numerical approaches, typically based upon discretization of the
time domain, state space, and control space, tend to suffer from the well-known “curse
of dimensionality.” Moreover, even if computationally efficient solutions to the HJB were
available, the fact that the dynamics (4.1) under consideration are partially unknown makes
it challenging to guarantee that offline solutions obtained using a possibly inaccurate model
will yield desirable behavior when deployed in the actual system.
One way to overcome these challenges is through the use of function approximation
in which the value function and control policy are parameterized using a suitable class of
function approximators whose parameters are then updated to optimize a performance met-
ric/loss function. In what follows, we demonstrate how the adaptive control tools developed
in earlier chapters can be used to develop update laws for the parameters of such function
approximators. Taking this approach allows for learning the value function and policy online
in real-time (i.e., one-shot learning), rather than episodically as is typical in RL approaches,
and allows for making guarantees on convergence of the function approximation, and ulti-
mately, stability of the closed-loop system.

8.2 Value Function Approximation

Given that the optimal value function is difficult to compute in general, we seek a parametric
approximation of V ∗ over some compact set X ⊂ Rn containing the origin. In order to
derive convergence results for our approximations, we limit ourselves to approximation
architectures that are linear in the trainable parameters. This restriction prohibits the use
of powerful function approximators such as deep neural networks (DNNs) and places the
burden of choosing relevant features for function approximation on the user. That is, we
assume that over a given compact set X, the value function can be represented as

V ∗ (x) = W  φ(x) + ε(x), (8.8)

where W ∈ Rl is a vector of unknown ideal weights, φ : Rn → Rl is a continuously differ-


entiable feature vector, and ε : Rn → R is the unknown continuously differentiable function

reconstruction error. The assumption that V* can be represented as in (8.8) is justified by the universal function approximation theorem (see the notes in Sect. 8.5), which states that, given a continuously differentiable function V*, a compact set X, and a constant ε̄, there exist a continuously differentiable feature vector φ and a vector of weights W such that V* and its gradient can be ε̄-approximated over X in the sense that

    sup_{x∈X} ( ‖V*(x) − W⊤φ(x)‖ + ‖∇V*(x) − W⊤(∂φ/∂x)(x)‖ ) ≤ ε̄.

The universal function approximation theorem, however, does not state how the features
should be chosen or how many features are necessary to achieve a desired approximation
accuracy over the domain of interest. Despite this, the fact that the value function is a
CLF provides guidance towards what features may be relevant to produce an adequate
approximation of the value function. Common choices include polynomial features, radial
basis functions or other kernel functions, and pretrained DNNs with tunable outer-layer
weights.
Given that the weights in (8.8) are unknown, we develop an approximation of the value
function by replacing W with an estimate, denoted by Ŵc ∈ Rl , yielding the approximated
value function
    V̂(x, Ŵc) = Ŵc⊤φ(x),
    (∂V̂/∂x)(x, Ŵc) = Ŵc⊤(∂φ/∂x)(x).     (8.9)
For reasons that we will discuss shortly, we maintain a separate approximation of the ideal weights for use in the approximated optimal policy, denoted by Ŵa ∈ Rˡ, as

    k̂(x, Ŵa) = −½ R⁻¹ L_g V̂(x, Ŵa)⊤.     (8.10)
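As a concrete instance, the following sketch implements (8.9) and (8.10) with a hand-picked quadratic feature vector for a two-dimensional state; the features and the helper names are illustrative choices rather than prescriptions of the theory.

```python
# A linear-in-the-parameters value function approximation (8.9) and the
# corresponding approximate policy (8.10) for n = 2.
import numpy as np

def phi(x):                      # features phi : R^2 -> R^3
    return np.array([x[0]**2, x[0]*x[1], x[1]**2])

def dphi_dx(x):                  # Jacobian of phi, shape 3 x 2
    return np.array([[2*x[0], 0.0],
                     [x[1],   x[0]],
                     [0.0,    2*x[1]]])

def V_hat(x, Wc):                # approximated value function (8.9)
    return Wc @ phi(x)

def k_hat(x, Wa, g, R_inv):      # approximated policy (8.10)
    LgV = (Wa @ dphi_dx(x)) @ g(x)   # L_g V_hat(x, Wa)
    return -0.5 * R_inv @ LgV
```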
Given the approximated value function and control policy from (8.9) and (8.10), respectively,
our objective is now to develop a performance metric (i.e., “loss function” if one prefers
machine learning terminology) that quantifies the accuracy of such approximations. To this
end we first define
W̃c := W − Ŵc
(8.11)
W̃a := W − Ŵa ,

as the weight estimation errors. Since W is unknown we cannot simply compute W̃c and
W̃a and directly use such quantities to construct a meaningful performance metric. Instead,
we seek an indirect performance metric related to the quality of the approximations. Such
a development will be centered around the HJB (8.4), which provides a necessary and
sufficient condition for optimality. In particular, the HJB states that, for any given x ∈ Rn ,
the optimal value function and control policy satisfy the relation

    H(x, k*(x), ∇V*(x)) = 0.     (8.12)



Thus, we take as our performance metric the difference between the Hamiltonian evaluated with the approximated value function and approximated optimal policy and the optimal Hamiltonian:

    δ(x, Ŵc, Ŵa) := H(x, k̂(x, Ŵa), ∇V̂(x, Ŵc)) − H(x, k*(x), ∇V*(x))
                  = H(x, k̂(x, Ŵa), ∇V̂(x, Ŵc)),     (8.13)

where the second equality follows from (8.12). We refer to the above quantity as the Bellman
error (BE), which can be computed at any given state x ∈ X and for any given estimates
(Ŵc , Ŵa ) ∈ Rl × Rl provided the model parameters θ are known, and provides an indirect
metric related to the “distance” between the optimal solution and current approximations.
If the model parameters are unknown, however, the Hamiltonian cannot be computed
exactly, and we must instead work with an approximate Hamiltonian corresponding to an
estimate of the uncertain parameters θ̂ ∈ Rᵖ as

    Ĥ(x, u, λ, θ̂) := λ⊤( f(x) + F(x)θ̂ + g(x)u ) + ℓ(x, u).     (8.14)

Following the same steps as before, we can then define an approximated version of the BE as

    δ̂(x, Ŵc, Ŵa, θ̂) := Ĥ(x, k̂(x, Ŵa), ∇V̂(x, Ŵc), θ̂) − H(x, k*(x), ∇V*(x))
                      = Ĥ(x, k̂(x, Ŵa), ∇V̂(x, Ŵc), θ̂)
                      = ∇V̂(x, Ŵc)( f(x) + F(x)θ̂ + g(x)k̂(x, Ŵa) ) + ℓ(x, k̂(x, Ŵa))
                      = Ŵc⊤(∂φ/∂x)(x)( f(x) + F(x)θ̂ + g(x)k̂(x, Ŵa) ) + ℓ(x, k̂(x, Ŵa))
                      = Ŵc⊤ω(x, Ŵa, θ̂) + ℓ(x, k̂(x, Ŵa))
                      = ω(x, Ŵa, θ̂)⊤Ŵc + ℓ(x, k̂(x, Ŵa)),     (8.15)

where we have defined

    ω(x, Ŵa, θ̂) := (∂φ/∂x)(x)( f(x) + F(x)θ̂ + g(x)k̂(x, Ŵa) ),     (8.16)

to make the affine dependence of δ̂ on Ŵc explicit. Note that δ̂ is an affine function of Ŵc but
a nonlinear function of Ŵa . This affine dependence on Ŵc is the motivation for maintaining
separate approximations of the ideal weights in the value function and policy as this makes
minimizing a performance metric based on δ̂² much easier. Similar to (8.13), the approximate
BE (8.15) can be computed for any x ∈ X given an estimate of the model parameters θ̂ and
weights (Ŵc , Ŵa ) to indirectly quantify the performance of such weight estimates in terms
of approximation of the optimal value function and policy. Our objective is then to select
the weights (Ŵc , Ŵa ) to minimize the total squared BE over the approximation domain

    ∫_X δ̂²(x, Ŵc, Ŵa, θ̂) dx,     (8.17)

which we replace with the more tractable minimization of


    Σ_{i=1}^{N} δ̂²(xᵢ, Ŵc, Ŵa, θ̂),     (8.18)

where {xᵢ}_{i=1}^{N} is a collection of N ∈ N points sampled from X. For a fixed θ̂ such a minimization can be performed simply by sampling states from X; however, the resulting weight estimates (Ŵc, Ŵa) would be highly biased towards the estimated model parameters θ̂, which may be inaccurate. Hence, to obtain an accurate approximation of the value function, it is necessary to obtain a better estimate of the uncertain model parameters, which can be done using the adaptive control methods outlined in previous chapters. In the following section we provide a pathway towards accomplishing this objective using a model-based reinforcement learning (MBRL) approach that allows for learning the value function, control policy, and uncertain model parameters all simultaneously online.
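Before moving on, note that the quantities above are straightforward to evaluate numerically; the sketch below computes the approximate BE (8.15) at sampled states using the feature functions from the previous listing, with the model terms f, F, g, the cost terms Q, R, and the current estimates passed in by the caller.

```python
# Approximate Bellman error (8.15) and the sampled objective (8.18).
import numpy as np

def bellman_error(x, Wc, Wa, theta_hat, f, F, g, Q, R, R_inv):
    u = k_hat(x, Wa, g, R_inv)                                  # policy (8.10)
    omega = dphi_dx(x) @ (f(x) + F(x) @ theta_hat + g(x) @ u)   # omega from (8.16)
    return omega @ Wc + Q(x) + u @ R @ u                        # (8.15)

def total_squared_be(states, Wc, Wa, theta_hat, f, F, g, Q, R, R_inv):
    return sum(bellman_error(x, Wc, Wa, theta_hat, f, F, g, Q, R, R_inv)**2
               for x in states)                                 # objective (8.18)
```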

8.3 Online Model-Based Reinforcement Learning

In this section we introduce a safe exploration framework that allows for jointly learning
online the uncertain system dynamics and the optimal value function and policy while
guaranteeing safety. Our safe exploration architecture consists of two main components.
The first is an adaptive CBF scheme, similar to those outlined in the preceding chapters,
which allows for identifying the uncertain parameters θ online while guaranteeing safety.
The second is a safe exploration framework that leverages the learned model to simulate and
explore potentially unsafe actions to generate data for learning the value function without
risking safety violation of the physical system.

8.3.1 System Identification

The first component of our MBRL architecture is a system identification method for learning
the uncertain model parameters online, which we accomplish using the concurrent learning
technique introduced in previous chapters. To this end, recall that integrating the dynamics
(4.1) over some finite time interval yields the linear regression equation from (4.18)

Y(t) = F (t)θ,

where Y and F are defined as in (4.17). Given a history stack of input-output data, we
propose to update the parameters according to (5.25) as


    θ̂̇ = γ Σ_{j=1}^{M} F_j⊤ ( Y_j − F_j θ̂ ),

where γ ∈ R>0 is a learning gain and H = {(Y_j, F_j)}_{j=1}^{M} is a history stack. For the results
in this chapter, we assume that the history stack used in the above update law satisfies the
finite excitation condition (see Definition 4.5) so that the parameter estimates exponentially
converge to their true values.
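A minimal sketch of one explicit-Euler step of this update, with the integrated data pairs (Y_j, F_j) stored in a history stack of NumPy arrays, is given below; the step size dt and the storage format are implementation choices.

```python
# One Euler step of the concurrent-learning parameter update (5.25).
def parameter_update_step(theta_hat, history, gamma, dt):
    grad = sum(Fj.T @ (Yj - Fj @ theta_hat) for (Yj, Fj) in history)
    return theta_hat + dt * gamma * grad
```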

Assumption 8.2 The history stack H used in the update law (5.25) satisfies the finite excitation condition (Definition 4.5), which, by Theorem 4.3, ensures the existence of a Lyapunov-like function Vθ : Rᵖ × R≥0 → R≥0 satisfying

    c₁‖θ̃‖² ≤ Vθ(θ̃, t) ≤ c₂‖θ̃‖²,  ∀(θ̃, t) ∈ Rᵖ × R≥0,     (8.19a)
    V̇θ(θ̃, t) ≤ −c₃‖θ̃‖²,  ∀(θ̃, t) ∈ Rᵖ × R≥0,     (8.19b)

for some positive constants c₁, c₂, c₃ ∈ R>0.

Remark 8.1 Although Theorem 4.3 does not directly assert the existence of a Lyapunov-
like function satisfying (8.19b) for all t ∈ R≥0 , it does establish an exponential bound on the
parameter estimation error, which, by standard converse Lyapunov theorems, can be used
to show the existence of a Lyapunov-like function satisfying Assumption 8.2.

8.3.2 Safe Exploration via Simulation of Experience

We present in this section our safe exploration method for learning jointly online the value function, control policy, and uncertain system dynamics. Recall that our objective is to design a control policy that optimizes the cost functional from (8.1) while guaranteeing forward invariance of a set C ⊂ Rⁿ defined as the zero superlevel set of a continuously differentiable function h : Rⁿ → R. We accomplish the safety objective using a robust adaptive control barrier function (RaCBF) as introduced in Sect. 5.2. In particular, under the assumption that h is a RaCBF (see Definition 5.2), we shield the learned policy k̂ from (8.10) using the optimization-based controller

    k(x, θ̂, Ŵa) = argmin_{u∈Rᵐ}  ½‖u − k̂(x, Ŵa)‖²
                   s.t.  L_f h(x) + L_F h(x)θ̂ + L_g h(x)u ≥ −α(h(x)) + ‖L_F h(x)‖ϑ̃,     (8.20)

where ϑ̃ is a bound on the parameter estimation error from Assumption 5.2. Although the above policy guarantees safety by Theorem 5.2, it may restrict the system from taking actions
that are necessary to generate sufficiently rich data for learning. On the other hand, directly

applying the learned policy from (8.10) to the system may not enforce forward invariance
of C, which, in our safety-critical setting, would be unacceptable.
Our safe exploration framework is facilitated by the observation that the approximate BE
(8.15) need not be evaluated with the policy that is ultimately applied to the system. That
is, one can leverage the learned policy (8.10), which may be unsafe, to generate sufficiently
rich data for learning the value function while simultaneously shielding such a policy using
the RaCBF safety filter (8.20) to prevent the actual system from violating safety-critical
constraints. This idea manifests itself as simulation of experience in which the learned
model and policy are used to simulate potentially unsafe actions that may be beneficial for
learning, but that also may be unsafe. To this end, recall from the previous section that
our learning objective can be accomplished by minimizing the squared approximate BE
over the approximation domain X. To develop a learning algorithm that minimizes such a
performance metric, we introduce the normalized loss function

    L(Ŵc, Ŵa, θ̂) := (1/N) Σ_{i=1}^{N} δ̂²(xᵢ, Ŵc, Ŵa, θ̂) / (2ρ²(xᵢ, Ŵa, θ̂)),     (8.21)

where

    ρ(x, Ŵa, θ̂) := 1 + ω(x, Ŵa, θ̂)⊤ω(x, Ŵa, θ̂)

1  (Ŵc ω(xi , Ŵa , θ̂) + (xi , k̂(xi , Ŵa )))ω(xi , Ŵa , θ̂)
N
∂L
(Ŵc , Ŵa , θ̂) =
∂ Ŵc N
i=1 ρ 2 (xi , Ŵa , θ̂)

1  ω(xi , Ŵa , θ̂)


N
= δ̂(xi , Ŵc , Ŵa , θ̂).
i=1 ρ (x i , Ŵa , θ̂)
N 2

(8.22)
Using the above, we update the value function weights using a normalized version of the recursive least squares (RLS) update with forgetting/discount factor used in Sect. 6.4 as

    Ŵ̇c = −(κc/N) Γ Σ_{i=1}^{N} ( ω(xᵢ, Ŵa, θ̂) / ρ²(xᵢ, Ŵa, θ̂) ) δ̂(xᵢ, Ŵc, Ŵa, θ̂),     (8.23)

    Γ̇ = βΓ − (κc/N) Γ ( Σ_{i=1}^{N} ω(xᵢ, Ŵa, θ̂)ω(xᵢ, Ŵa, θ̂)⊤ / ρ²(xᵢ, Ŵa, θ̂) ) Γ,     (8.24)
where κc ∈ R>0 is a learning gain, Γ is a least-squares gain matrix, and β ∈ R>0 is the discount factor. Based on the forthcoming convergence analysis, we update the control policy weights as

    Ŵ̇a = −κa1(Ŵa − Ŵc) − κa2 Ŵa + (κc/N) Σ_{i=1}^{N} ( Gφ(xᵢ) Ŵa ω(xᵢ, Ŵa, θ̂)⊤ / (4ρ²(xᵢ, Ŵa, θ̂)) ) Ŵc,     (8.25)
8.3 Online Model-Based Reinforcement Learning 143

where
∂φ ∂φ
G φ (x) := (x)G R (x) (x) ,
∂x ∂x
−1 
G R (x) := g(x)R g(x) .

Remark 8.2 Although the update law (8.25) is helpful in establishing theoretical convergence results, the much simpler update law

    Ŵ̇a = proj_W( −κa(Ŵa − Ŵc) ),     (8.26)

where proj_W(·) is a projection operator that keeps the weights within a convex compact set W ⊂ Rˡ, tends to work well in practice. References for such a projection operator will be provided in the notes for this chapter.
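For implementation purposes, one forward-Euler step of the update laws (8.23)–(8.25) over the N sampled states can be written as below; omega_fn and delta_fn are assumed to evaluate ω and δ̂ at a sample using the current estimates (e.g., via the Bellman-error routine sketched earlier), and Gphi_fn evaluates G_φ. This is a sketch under those assumptions, not a prescribed implementation.

```python
# One Euler step of the critic/actor/least-squares updates (8.23)-(8.25).
import numpy as np

def critic_actor_step(Wc, Wa, Gamma, states, omega_fn, delta_fn, Gphi_fn,
                      kc, ka1, ka2, beta, dt):
    N = len(states)
    dWc = np.zeros_like(Wc)
    dGamma = beta * Gamma                        # forgetting term in (8.24)
    dWa = -ka1 * (Wa - Wc) - ka2 * Wa            # first terms of (8.25)
    for x in states:
        w = omega_fn(x)
        rho2 = (1.0 + w @ w)**2                  # squared normalization term
        dWc -= (kc / N) * (Gamma @ w) * delta_fn(x) / rho2           # (8.23)
        dGamma -= (kc / N) * Gamma @ np.outer(w, w) @ Gamma / rho2   # (8.24)
        dWa += (kc / N) * (Gphi_fn(x) @ Wa) * (w @ Wc) / (4.0 * rho2)  # (8.25)
    return Wc + dt * dWc, Wa + dt * dWa, Gamma + dt * dGamma
```

In practice, the simpler projection-based actor update from Remark 8.2 can replace the dWa computation above.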

The following lemma places bounds on the least-squares matrix Γ and will play an important role in ensuring convergence of the value function and control policy weights.

Lemma 8.1 Let t ↦ (Ŵc(t), Ŵa(t), Γ(t), θ̂(t)) be trajectories generated by the update laws in (8.23), (8.24), (8.25), and (5.25). Suppose that λmin(Γ(0)) > 0 and that the constant

    λc := inf_{t∈R≥0} λmin( (1/N) Σ_{i=1}^{N} ω(xᵢ, Ŵa(t), θ̂(t)) ω(xᵢ, Ŵa(t), θ̂(t))⊤ / ρ²(xᵢ, Ŵa(t), θ̂(t)) )     (8.27)

is strictly positive. Then, there exist positive constants Γ̲, Γ̄ ∈ R>0 such that Γ̲ I_l ≤ Γ(t) ≤ Γ̄ I_l for all t ∈ R≥0.

The condition in (8.27) is similar to the concurrent learning conditions needed for parameter
convergence in earlier chapters, and serves a similar purpose here as satisfaction of such
a condition will allow us to make guarantees about convergence of the weight estimates
(Ŵc , Ŵa ). Although the minimum eigenvalue in (8.27) can be computed at any given point
in time, verifying that the overall condition holds is challenging as this requires reasoning
about the future evolution of the weight estimates. We note that such a condition can be
heuristically satisfied by densely sampling X (i.e., choosing a large number N of distinct
extrapolation points). Given the preceding lemma we now have all the tools in place to
establish convergence guarantees for the weight estimates and stability of the closed-loop
system. We first show that, provided the hypotheses of Lemma 8.1 hold and the constant λc is sufficiently large, all estimated parameters converge to a neighborhood of their true
values. Before stating the result, for convenience we introduce the notation

    ‖f(x)‖_X := sup_{x∈X} ‖f(x)‖

for any continuous f : X → R^q and q ∈ N.



Theorem 8.2 Let z := [W̃c⊤ W̃a⊤ θ̃⊤]⊤ ∈ R^{2l+p} be a composite vector of estimation errors and suppose the estimated weights and parameters are updated according to (8.23), (8.24), (8.25), and (5.25). Provided the conditions of Lemma 8.1 are satisfied, Assumption 8.2 holds, and λmin(M) > 0, where

    M = \begin{bmatrix} κcλ̄/4 & −υac/2 & −υcθ/2 \\ −υac/2 & (κa1 + κa2)/4 − υa & 0 \\ −υcθ/2 & 0 & c₃/2 \end{bmatrix},     (8.28)

and

    λ̄ := β/(2κcΓ̄) + λc/2,
    υac := κa1 + (3√3κc/64)‖W‖‖Gφ(x)‖_X,
    υcθ := (3√3κc/16)‖W⊤(∂φ/∂x)(x)F(x)‖_X,     (8.29)
    υa := (3√3κc/64)‖W‖‖Gφ(x)‖_X,

then all estimated parameters exponentially converge to a neighborhood of their true values in the sense that for all t ∈ R≥0

    ‖z(t)‖ ≤ √( (κ₂/κ₁)‖z(0)‖² e^{−(κ₃/κ₂)t} + (1 − e^{−(κ₃/κ₂)t}) κ₂ι/(κ₁κ₃) ),     (8.30)

where

    κ₁ := min{ 1/(2Γ̄), 1/2, c₁ },
    κ₂ := max{ 1/(2Γ̲), 1/2, c₂ },
    κ₃ := min{ κcλ̄/4, (κa1 + κa2)/4, c₃/2 },
    ι := ιc²/(2λ̄κc) + ιa²/(2(κa1 + κa2)),     (8.31)
    ιc := (3√3κc/16)‖Δ(x)‖_X,
    ιa := κa2‖W‖ + (3√3κc/64)‖W‖²‖Gφ(x)‖_X,

and where Δ is the residual term defined in (8.37) below.

Before presenting the proof, we aim to provide some intuition for the sufficient conditions of the above theorem and the bound in (8.30). For the condition λmin(M) > 0 to hold, the constant λc from Lemma 8.1 must be sufficiently large, implying that the data generated through sampling must be sufficiently rich, which can be achieved by more densely sampling X. The bound in (8.30) implies that all estimation errors exponentially decay to some ultimate bound at a rate determined by κ₃. The size of this bound is determined by the constant ι, which is dependent on the function reconstruction error ε. Generally, choosing a more expressive basis for V* will decrease ε and therefore the ultimate bound.

Proof The proof is largely an exercise in bookkeeping: we perform some straightforward, but tedious, algebraic manipulations and then propose a Lyapunov function that certifies the stability of the weight estimates via the comparison lemma. We begin by deriving an alternate form of the approximate BE (8.15). We first note that the approximate Hamiltonian can be expressed as

    Ĥ = −ω(x, Ŵa, θ̂)⊤W̃c + W⊤ω(x, Ŵa, θ̂) + ℓ(x, k̂(x, Ŵa)).     (8.32)

The second term in (8.32) can be expressed as

    W⊤ω(x, Ŵa, θ̂) = W⊤(∂φ/∂x)f + W⊤(∂φ/∂x)Fθ̂ + W⊤(∂φ/∂x)g k̂
                   = W⊤(∂φ/∂x)f + W⊤(∂φ/∂x)Fθ̂ − ½ W⊤Gφ Ŵa
                   = W⊤(∂φ/∂x)f + W⊤(∂φ/∂x)Fθ̂ − ½ W⊤Gφ W + ½ W⊤Gφ W̃a,

where functional arguments are omitted for ease of readability. Similarly, the third term in
(8.32) can be expressed as

    ℓ(x, k̂(x, Ŵa)) = Q + k̂⊤Rk̂
                    = Q + ( −½ R⁻¹ g⊤(∂φ/∂x)⊤Ŵa )⊤ R ( −½ R⁻¹ g⊤(∂φ/∂x)⊤Ŵa )
                    = Q + ¼ Ŵa⊤Gφ Ŵa
                    = Q + ¼ W⊤Gφ Ŵa − ¼ W̃a⊤Gφ Ŵa
                    = Q + ¼ W⊤Gφ W − ¼ W⊤Gφ W̃a − ¼ W̃a⊤Gφ W + ¼ W̃a⊤Gφ W̃a
                    = Q + ¼ W⊤Gφ W − ½ W⊤Gφ W̃a + ¼ W̃a⊤Gφ W̃a.
Combining the preceding terms allows (8.32) to be expressed as

    Ĥ = −ω⊤W̃c + W⊤(∂φ/∂x)f + W⊤(∂φ/∂x)Fθ̂ − ¼ W⊤Gφ W + Q + ¼ W̃a⊤Gφ W̃a.     (8.33)

We now proceed with a similar analysis for the optimal Hamiltonian (8.5), which can be expressed as

    H(x, k*(x), ∇V*(x)) = (∂V*/∂x)( f + Fθ + gk* ) + Q + k*⊤Rk*
                        = (∂V*/∂x)f + (∂V*/∂x)Fθ + Q − ¼ (∂V*/∂x)G_R(∂V*/∂x)⊤.     (8.34)

Using the representation of the value function from (8.8) allows the above to be expressed as

    H(x, k*(x), ∇V*(x)) = W⊤(∂φ/∂x)f + (∂ε/∂x)f + W⊤(∂φ/∂x)Fθ + (∂ε/∂x)Fθ + Q − ¼ (∂V*/∂x)G_R(∂V*/∂x)⊤,

where the last term in the above equation can be expanded to obtain

    ¼ (∂V*/∂x)G_R(∂V*/∂x)⊤ = ¼ (∂V*/∂x)G_R(∂φ/∂x)⊤W + ¼ (∂V*/∂x)G_R(∂ε/∂x)⊤
                           = ¼ W⊤Gφ W + ¼ (∂ε/∂x)G_R(∂φ/∂x)⊤W + ¼ W⊤(∂φ/∂x)G_R(∂ε/∂x)⊤ + ¼ Gε
                           = ¼ W⊤Gφ W + ½ (∂ε/∂x)G_R(∂φ/∂x)⊤W + ¼ Gε,

where Gε(x) := (∂ε/∂x)(x)G_R(x)(∂ε/∂x)(x)⊤. Substituting the above expression back into H then yields

    H = W⊤(∂φ/∂x)f + (∂ε/∂x)f + W⊤(∂φ/∂x)Fθ + (∂ε/∂x)Fθ + Q
        − ¼ W⊤Gφ W − ½ (∂ε/∂x)G_R(∂φ/∂x)⊤W − ¼ Gε.     (8.35)

Recall that the approximate BE is defined as δ̂ = Ĥ − H; hence, subtracting (8.35) from (8.33) yields the alternate form of the BE

    δ̂ = −ω⊤W̃c + ¼ W̃a⊤Gφ W̃a − W⊤(∂φ/∂x)Fθ̃ + Δ,     (8.36)

where

    Δ(x) := ½ (∂ε/∂x)(x)G_R(x)(∂φ/∂x)(x)⊤W + ¼ Gε(x) − (∂ε/∂x)(x)f(x) − (∂ε/∂x)(x)F(x)θ.     (8.37)

Now consider the Lyapunov function candidate

    V(z, t) = ½ W̃c⊤Γ⁻¹(t)W̃c + ½ W̃a⊤W̃a + Vθ(θ̃, t),     (8.38)

where Vθ is from Assumption 8.2 and where we denote the first and second terms on the right-hand side by Vc and Va, respectively. Provided the conditions of Lemma 8.1 hold, then V can be bounded for all (z, t) ∈ R^{2l+p} × R≥0 as

    κ₁‖z‖² ≤ V(z, t) ≤ κ₂‖z‖².

Computing the derivative of V along the trajectory of z yields

    V̇ = W̃c⊤Γ⁻¹(t)W̃̇c − ½ W̃c⊤Γ⁻¹(t)Γ̇(t)Γ⁻¹(t)W̃c + W̃a⊤W̃̇a + V̇θ(θ̃, t),     (8.39)

where the first two terms constitute V̇c and the third term constitutes V̇a.

Before substituting in the expressions for the update laws, it will be convenient to express everything in terms of the estimation errors. We begin with W̃c and, for ease of exposition, define ωᵢ := ω(xᵢ, Ŵa, θ̂), ρᵢ := ρ(xᵢ, Ŵa, θ̂), and δ̂ᵢ := δ̂(xᵢ, Ŵc, Ŵa, θ̂), which, after using the alternate form of the approximate BE (8.36), gives us

    W̃̇c = −Ŵ̇c
        = (κc/N) Γ Σ_{i=1}^{N} (ωᵢ/ρᵢ²) δ̂ᵢ
        = (κc/N) Γ Σ_{i=1}^{N} (ωᵢ/ρᵢ²) ( −ωᵢ⊤W̃c + ¼ W̃a⊤Gφ,ᵢW̃a − W⊤(∂φ/∂x)ᵢ Fᵢ θ̃ + Δᵢ )
        = −κc Γ ( (1/N) Σ_{i=1}^{N} ωᵢωᵢ⊤/ρᵢ² ) W̃c + (κc/N) Γ Σ_{i=1}^{N} (ωᵢ/ρᵢ²) ( ¼ W̃a⊤Gφ,ᵢW̃a − W⊤(∂φ/∂x)ᵢ Fᵢ θ̃ + Δᵢ ),     (8.40)
where Gφ,ᵢ := Gφ(xᵢ), (∂φ/∂x)ᵢ := (∂φ/∂x)(xᵢ), Fᵢ := F(xᵢ), and Δᵢ := Δ(xᵢ). Using (8.40) to compute V̇c then yields

    V̇c = W̃c⊤Γ⁻¹W̃̇c − ½ W̃c⊤Γ⁻¹Γ̇Γ⁻¹W̃c
       = W̃c⊤( −(κc/N) Σ_{i=1}^{N} (ωᵢωᵢ⊤/ρᵢ²) W̃c + (κc/N) Σ_{i=1}^{N} (ωᵢ/ρᵢ²)( ¼ W̃a⊤Gφ,ᵢW̃a − W⊤(∂φ/∂x)ᵢ Fᵢ θ̃ + Δᵢ ) )
         − W̃c⊤( (β/2) Γ⁻¹ − (κc/(2N)) Σ_{i=1}^{N} ωᵢωᵢ⊤/ρᵢ² ) W̃c
       = −κc W̃c⊤( (β/(2κc)) Γ⁻¹ + (1/(2N)) Σ_{i=1}^{N} ωᵢωᵢ⊤/ρᵢ² ) W̃c + (κc/N) W̃c⊤ Σ_{i=1}^{N} (ωᵢ/ρᵢ²)( Δᵢ − W⊤(∂φ/∂x)ᵢ Fᵢ θ̃ )
         + (κc/N) W̃c⊤ Σ_{i=1}^{N} ωᵢ W̃a⊤Gφ,ᵢW̃a / (4ρᵢ²).     (8.41)
Provided the conditions of Lemma 8.1 hold, then V̇c can be upper bounded as
    V̇c ≤ −κc( β/(2κcΓ̄) + λc/2 )‖W̃c‖² + (3√3κc/16)‖Δ(x)‖_X ‖W̃c‖
         + (3√3κc/16)‖W⊤(∂φ/∂x)(x)F(x)‖_X ‖W̃c‖‖θ̃‖ + (κc/N) W̃c⊤ Σ_{i=1}^{N} ωᵢ W̃a⊤Gφ,ᵢW̃a / (4ρᵢ²)
       = −κcλ̄‖W̃c‖² + (3√3κc/16)‖Δ(x)‖_X ‖W̃c‖ + (3√3κc/16)‖W⊤(∂φ/∂x)(x)F(x)‖_X ‖W̃c‖‖θ̃‖     (8.42)
         + (κc/N) W̃c⊤ Σ_{i=1}^{N} ωᵢ W̃a⊤Gφ,ᵢW̃a / (4ρᵢ²),

where the bound follows from the fact that, for any ω ∈ Rˡ,

    ‖ω‖ / (1 + ω⊤ω)² ≤ (3√3)/16.

We now proceed to analyze W̃a:

    W̃̇a = −Ŵ̇a
        = κa1(Ŵa − Ŵc) + κa2 Ŵa − (κc/N) Σ_{i=1}^{N} ( Gφ(xᵢ) Ŵa ωᵢ⊤/(4ρᵢ²) ) Ŵc
        = κa1(W − W̃a − W + W̃c) + κa2 W − κa2 W̃a − (κc/N) Σ_{i=1}^{N} ( Gφ(xᵢ) Ŵa ωᵢ⊤/(4ρᵢ²) ) Ŵc
        = −(κa1 + κa2) W̃a + κa1 W̃c + κa2 W − (κc/N) Σ_{i=1}^{N} ( Gφ(xᵢ) Ŵa ωᵢ⊤/(4ρᵢ²) ) Ŵc.

The last term in the preceding equation can be expanded as

    ( Gφ Ŵa ω⊤/(4ρ²) ) Ŵc = ( Gφ Ŵa ω⊤/(4ρ²) ) W − ( Gφ Ŵa ω⊤/(4ρ²) ) W̃c
                           = ( Gφ W ω⊤/(4ρ²) ) W − ( Gφ W̃a ω⊤/(4ρ²) ) W − ( Gφ W ω⊤/(4ρ²) ) W̃c + ( Gφ W̃a ω⊤/(4ρ²) ) W̃c,
which implies that

    W̃̇a = −(κa1 + κa2) W̃a + κa1 W̃c + κa2 W − (κc/N) Σ_{i=1}^{N} ( Gφ,ᵢ W ωᵢ⊤/(4ρᵢ²) ) W
          + (κc/N) Σ_{i=1}^{N} ( Gφ,ᵢ W̃a ωᵢ⊤/(4ρᵢ²) ) W + (κc/N) Σ_{i=1}^{N} ( Gφ,ᵢ W ωᵢ⊤/(4ρᵢ²) ) W̃c − (κc/N) Σ_{i=1}^{N} ( Gφ,ᵢ W̃a ωᵢ⊤/(4ρᵢ²) ) W̃c.

Using the above equation to compute V̇a yields

    V̇a = W̃a⊤W̃̇a
       = −(κa1 + κa2)‖W̃a‖² + κa1 W̃a⊤W̃c + κa2 W̃a⊤W − (κc/N) Σ_{i=1}^{N} W̃a⊤( Gφ,ᵢ W ωᵢ⊤/(4ρᵢ²) ) W
         + (κc/N) Σ_{i=1}^{N} W̃a⊤( ( Gφ,ᵢ W̃a ωᵢ⊤/(4ρᵢ²) ) W + ( Gφ,ᵢ W ωᵢ⊤/(4ρᵢ²) ) W̃c − ( Gφ,ᵢ W̃a ωᵢ⊤/(4ρᵢ²) ) W̃c ).

Upper bounding the above expression then yields

    V̇a ≤ −(κa1 + κa2)‖W̃a‖² + κa1‖W̃a‖‖W̃c‖ + κa2‖W‖‖W̃a‖
         + (3√3κc/64)‖W‖²‖Gφ‖_X ‖W̃a‖ + (3√3κc/64)‖W‖‖Gφ‖_X ‖W̃a‖²
         + (3√3κc/64)‖W‖‖Gφ‖_X ‖W̃a‖‖W̃c‖ − (κc/N) W̃a⊤ Σ_{i=1}^{N} ( Gφ,ᵢ W̃a ωᵢ⊤/(4ρᵢ²) ) W̃c.

After grouping similar terms, the above bound can be expressed as

    V̇a ≤ ( (3√3κc/64)‖W‖‖Gφ‖_X − (κa1 + κa2) ) ‖W̃a‖²
         + ( κa2‖W‖ + (3√3κc/64)‖W‖²‖Gφ‖_X ) ‖W̃a‖     (8.43)
         + ( κa1 + (3√3κc/64)‖W‖‖Gφ‖_X ) ‖W̃a‖‖W̃c‖ − (κc/N) W̃a⊤ Σ_{i=1}^{N} ( Gφ,ᵢ W̃a ωᵢ⊤/(4ρᵢ²) ) W̃c.

Now, adding V̇c and V̇a, taking upper bounds using (8.42) and (8.43), and recognizing that the last terms in (8.42) and (8.43) cancel out yields

    V̇c + V̇a ≤ −κcλ̄‖W̃c‖² − ( (κa1 + κa2) − (3√3κc/64)‖W‖‖Gφ‖_X ) ‖W̃a‖²
               + (3√3κc/16)‖Δ‖_X ‖W̃c‖ + ( κa2‖W‖ + (3√3κc/64)‖W‖²‖Gφ‖_X ) ‖W̃a‖     (8.44)
               + ( κa1 + (3√3κc/64)‖W‖‖Gφ‖_X ) ‖W̃a‖‖W̃c‖ + (3√3κc/16)‖W⊤(∂φ/∂x)F‖_X ‖W̃c‖‖θ̃‖.
Defining

    ιc := (3√3κc/16)‖Δ‖_X,
    ιa := κa2‖W‖ + (3√3κc/64)‖W‖²‖Gφ‖_X,
    υac := κa1 + (3√3κc/64)‖W‖‖Gφ‖_X,
    υcθ := (3√3κc/16)‖W⊤(∂φ/∂x)F‖_X,
    υa := (3√3κc/64)‖W‖‖Gφ‖_X,
allows the bound in (8.44) to be compactly represented as
 
    V̇c + V̇a ≤ −κcλ̄‖W̃c‖² − ( (κa1 + κa2) − υa ) ‖W̃a‖² + ιc‖W̃c‖ + ιa‖W̃a‖
               + υac‖W̃a‖‖W̃c‖ + υcθ‖W̃c‖‖θ̃‖.

Combining the above bound with that on V̇θ from Assumption 8.2 allows V̇ to be bounded as

    V̇ ≤ −κcλ̄‖W̃c‖² − (κa1 + κa2)‖W̃a‖² + υa‖W̃a‖² − c₃‖θ̃‖²
        + ιc‖W̃c‖ + ιa‖W̃a‖ + υac‖W̃a‖‖W̃c‖ + υcθ‖W̃c‖‖θ̃‖.     (8.45)
The objective is now to complete squares in the above relation to show that V̇ < 0 for sufficiently large weight estimation errors. To this end, observe that

    −(κcλ̄/2)‖W̃c‖² + ιc‖W̃c‖ ≤ ιc²/(2κcλ̄),
    −½(κa1 + κa2)‖W̃a‖² + ιa‖W̃a‖ ≤ ιa²/(2(κa1 + κa2)),

which allows V̇ to be further bounded as

    V̇ ≤ −(κcλ̄/2)‖W̃c‖² − ((κa1 + κa2)/2)‖W̃a‖² + υa‖W̃a‖² − c₃‖θ̃‖²
        + υac‖W̃a‖‖W̃c‖ + υcθ‖W̃c‖‖θ̃‖ + ι,

where

    ι := ιc²/(2κcλ̄) + ιa²/(2(κa1 + κa2)).
Partitioning terms in the preceding bound allows the bound on V̇ to be expressed as

    V̇ ≤ −(κcλ̄/4)‖W̃c‖² − ((κa1 + κa2)/4)‖W̃a‖² − (c₃/2)‖θ̃‖² + ι
        − [ ‖W̃c‖  ‖W̃a‖  ‖θ̃‖ ] \underbrace{\begin{bmatrix} κcλ̄/4 & −υac/2 & −υcθ/2 \\ −υac/2 & (κa1 + κa2)/4 − υa & 0 \\ −υcθ/2 & 0 & c₃/2 \end{bmatrix}}_{M} \begin{bmatrix} ‖W̃c‖ \\ ‖W̃a‖ \\ ‖θ̃‖ \end{bmatrix}.     (8.46)

Provided that λmin(M) > 0, the above can be further bounded as

    V̇ ≤ −(κcλ̄/4)‖W̃c‖² − ((κa1 + κa2)/4)‖W̃a‖² − (c₃/2)‖θ̃‖² + ι
       ≤ −κ₃‖z‖² + ι     (8.47)
       ≤ −(κ₃/κ₂)V + ι.
Invoking the Comparison Lemma (Lemma 2.1) implies that for all t ∈ R≥0

    V(z(t), t) ≤ V(z(0), 0) e^{−(κ₃/κ₂)t} + (1 − e^{−(κ₃/κ₂)t}) κ₂ι/κ₃,     (8.48)
which, after combining with the bounds on V, implies that for all t ∈ R≥0

    ‖z(t)‖ ≤ √( (κ₂/κ₁)‖z(0)‖² e^{−(κ₃/κ₂)t} + (1 − e^{−(κ₃/κ₂)t}) κ₂ι/(κ₁κ₃) ),     (8.49)

which is exactly the bound from (8.30), as desired. □

Having established convergence of the estimated weights to a neighborhood of their ideal values, we now establish stability of the closed-loop system under the MBRL-based policy from (8.10), which is the learned policy used to generate data and not the safe policy (8.20) ultimately applied to the system.² The stability analysis is facilitated by the Lyapunov function candidate

    V_L(y, t) := V*(x) + V(z, t),     (8.50)

where y := [x⊤ z⊤]⊤ is a composite state vector with z ∈ R^{2l+p} defined as in Theorem 8.2 and V is the Lyapunov function candidate from the proof of Theorem 8.2. We recall that, since V_L is positive definite, there exist α₁, α₂ ∈ K satisfying

    α₁(‖y‖) ≤ V_L(y, t) ≤ α₂(‖y‖).     (8.51)

Before stating the result, we require one more Lyapunov theorem that we have not yet introduced.

² Results on stability using the policy from (8.20) are postponed until the next chapter.

Theorem 8.3 (Uniform ultimate boundedness) Let $f : \mathbb{R}^n\times\mathbb{R}_{\ge0}\to\mathbb{R}^n$ be a vector field,
locally Lipschitz in its first argument and piecewise continuous in its second argument,
that induces the non-autonomous dynamical system $\dot x = f(x,t)$ defined on some domain
$\mathcal{X}\subset\mathbb{R}^n$. Let $V : \mathbb{R}^n\times\mathbb{R}_{\ge0}\to\mathbb{R}_{\ge0}$ be a continuously differentiable function satisfying

$$\alpha_1(\|x\|) \le V(x,t) \le \alpha_2(\|x\|), \quad \forall (x,t)\in\mathcal{X}\times\mathbb{R}_{\ge0},\tag{8.52a}$$
$$\dot V(x,t) \le -W(x), \quad \forall \|x\| \ge \mu > 0,\ \forall t\in\mathbb{R}_{\ge0},\tag{8.52b}$$

where $\alpha_1,\alpha_2\in\mathcal{K}$ and $W : \mathbb{R}^n\to\mathbb{R}_{\ge0}$ is continuous and positive definite. Let $\bar B_r(0)\subset\mathcal{X}$ be a closed ball of radius $r\in\mathbb{R}_{>0}$ centered at the origin and contained in $\mathcal{X}$ such that
$\mu < \alpha_2^{-1}(\alpha_1(r))$. Then, for any initial condition $x_0\in\mathcal{X}$ satisfying $\|x_0\| \le \alpha_2^{-1}(\alpha_1(r))$ there
exist a $\beta\in\mathcal{KL}$ and a time $T\in\mathbb{R}_{\ge0}$ such that the trajectory $t\mapsto x(t)$ with $x(0) = x_0$
satisfies
$$\|x(t)\| \le \beta(\|x(0)\|, t), \quad \forall t\in[0,T],\tag{8.53a}$$
$$\|x(t)\| \le \alpha_1^{-1}(\alpha_2(\mu)), \quad \forall t\in[T,\infty).\tag{8.53b}$$

The above theorem allows one to establish asymptotic convergence of trajectories to a ball
about the origin and is very similar to the notion of input-to-state stability used in Chap. 6. A
trajectory satisfying (8.53) is said to be uniformly ultimately bounded. We now have all the
tools in place to provide conditions under which the learning-based controller guarantees
stability.

Theorem 8.4 Consider system (4.1) under the influence of the learning-based policy from
(8.10). Let $y := [x^\top\ \tilde W_c^\top\ \tilde W_a^\top\ \tilde\theta^\top]^\top\in\mathbb{R}^{n+2l+p}$ be a composite state vector and suppose
the estimated weights and parameters are updated according to (8.23), (8.24), (8.25), and
(5.25). Let $\bar B_r(0)\subset\mathcal{X}\times\mathbb{R}^{2l+p}$ be a closed ball of radius $r\in\mathbb{R}_{>0}$ contained in $\mathcal{X}\times\mathbb{R}^{2l+p}$.
Provided the conditions of Theorem 8.2 hold and

$$\mu := \alpha_3^{-1}(2\nu) < \alpha_2^{-1}(\alpha_1(r)),$$

where

$$\nu := \frac{\iota_c^2}{2\kappa_c\bar\lambda} + \frac{\iota_{a2}^2}{2(\kappa_{a1}+\kappa_{a2})} + \frac{1}{4}\|G_\varepsilon(x)\|_X,
\qquad
\iota_{a2} := \iota_a + \frac{1}{2}\left\|W^\top G_\phi(x) + \frac{\partial\varepsilon}{\partial x}(x)G_R(x)\frac{\partial\phi}{\partial x}(x)^\top\right\|_X,\tag{8.54}$$

and $\alpha_3\in\mathcal{K}$ satisfies

$$\alpha_3(\|y\|) \le Q(x) + \frac{\bar\lambda\kappa_c}{4}\|\tilde W_c\|^2 + \frac{\kappa_{a1}+\kappa_{a2}}{4}\|\tilde W_a\|^2 + \frac{c_3}{2}\|\tilde\theta\|^2,\tag{8.55}$$
then any trajectory t → y(t) with an initial condition such that

$$\|y(0)\| \le \alpha_2^{-1}(\alpha_1(r)),$$

satisfies
$$\limsup_{t\to\infty}\|y(t)\| \le \alpha_1^{-1}(\alpha_2(\mu)).\tag{8.56}$$

Proof The proof is facilitated by the Lyapunov function candidate $V_L$ from (8.50), composed of
the optimal value function and the Lyapunov function used in the proof of Theorem 8.2. We
begin by analyzing $V^*$, whose Lie derivative along the closed-loop dynamics (4.1) equipped
with the controller from (8.10) is

$$\begin{aligned}
\dot V^* &= \frac{\partial V^*}{\partial x}(f + F\theta + g\hat k) \\
&= \frac{\partial V^*}{\partial x}(f + F\theta) - \frac{1}{2}\frac{\partial V^*}{\partial x}G_R\frac{\partial\phi}{\partial x}^\top\hat W_a \\
&= \frac{\partial V^*}{\partial x}(f + F\theta) - \frac{1}{2}\frac{\partial V^*}{\partial x}G_R\frac{\partial\phi}{\partial x}^\top W + \frac{1}{2}\frac{\partial V^*}{\partial x}G_R\frac{\partial\phi}{\partial x}^\top\tilde W_a \\
&= \frac{\partial V^*}{\partial x}(f + F\theta) - \frac{1}{2}W^\top G_\phi W - \frac{1}{2}\frac{\partial\varepsilon}{\partial x}G_R\frac{\partial\phi}{\partial x}^\top W + \frac{1}{2}W^\top G_\phi\tilde W_a + \frac{1}{2}\frac{\partial\varepsilon}{\partial x}G_R\frac{\partial\phi}{\partial x}^\top\tilde W_a.
\end{aligned}\tag{8.57}$$

Using the alternate form of the optimal Hamiltonian from (8.35), we have that

$$\frac{\partial V^*}{\partial x}(f + F\theta) = -Q + \frac{1}{4}W^\top G_\phi W + \frac{1}{2}\frac{\partial\varepsilon}{\partial x}G_R\frac{\partial\phi}{\partial x}^\top W + \frac{1}{4}G_\varepsilon,$$

which, after substituting into the preceding expression and upper bounding, yields

$$\begin{aligned}
\dot V^* &= -Q - \frac{1}{4}W^\top G_\phi W + \frac{1}{4}G_\varepsilon + \frac{1}{2}\left(W^\top G_\phi + \frac{\partial\varepsilon}{\partial x}G_R\frac{\partial\phi}{\partial x}^\top\right)\tilde W_a \\
&\le -Q + \frac{1}{4}\|G_\varepsilon\|_X + \frac{1}{2}\left\|W^\top G_\phi + \frac{\partial\varepsilon}{\partial x}G_R\frac{\partial\phi}{\partial x}^\top\right\|_X\|\tilde W_a\|.
\end{aligned}\tag{8.58}$$

Now, taking the Lie derivative of $V_L$ along the composite system trajectory and then upper
bounding using the bounds on $\dot V$ and $\dot V^*$ from (8.45) and (8.58), respectively, yields

$$\begin{aligned}
\dot V_L &= \dot V^* + \dot V \\
&\le -Q + \frac{1}{4}\|G_\varepsilon\|_X + \frac{1}{2}\left\|W^\top G_\phi + \frac{\partial\varepsilon}{\partial x}G_R\frac{\partial\phi}{\partial x}^\top\right\|_X\|\tilde W_a\| \\
&\quad - \kappa_c\bar\lambda\|\tilde W_c\|^2 - (\kappa_{a1}+\kappa_{a2})\|\tilde W_a\|^2 + \upsilon_a\|\tilde W_a\|^2 - c_3\|\tilde\theta\|^2 \\
&\quad + \iota_c\|\tilde W_c\| + \iota_a\|\tilde W_a\| + \upsilon_{ac}\|\tilde W_a\|\|\tilde W_c\| + \upsilon_{c\theta}\|\tilde W_c\|\|\tilde\theta\| \\
&= -Q - \kappa_c\bar\lambda\|\tilde W_c\|^2 - (\kappa_{a1}+\kappa_{a2})\|\tilde W_a\|^2 + \upsilon_a\|\tilde W_a\|^2 - c_3\|\tilde\theta\|^2 \\
&\quad + \iota_c\|\tilde W_c\| + \iota_{a2}\|\tilde W_a\| + \upsilon_{ac}\|\tilde W_a\|\|\tilde W_c\| + \upsilon_{c\theta}\|\tilde W_c\|\|\tilde\theta\| + \frac{1}{4}\|G_\varepsilon\|_X,
\end{aligned}\tag{8.59}$$

where
$$\iota_{a2} := \iota_a + \frac{1}{2}\left\|W^\top G_\phi + \frac{\partial\varepsilon}{\partial x}G_R\frac{\partial\phi}{\partial x}^\top\right\|_X.$$

From this point the proof follows steps similar to those of Theorem 8.2: after separating
terms, completing squares, and further bounding, we obtain

$$\dot V_L \le -Q - \frac{\kappa_c\bar\lambda}{2}\|\tilde W_c\|^2 - \frac{\kappa_{a1}+\kappa_{a2}}{2}\|\tilde W_a\|^2 + \upsilon_a\|\tilde W_a\|^2 - c_3\|\tilde\theta\|^2 + \upsilon_{ac}\|\tilde W_a\|\|\tilde W_c\| + \upsilon_{c\theta}\|\tilde W_c\|\|\tilde\theta\| + \nu,$$

where

$$\nu := \frac{\iota_c^2}{2\kappa_c\bar\lambda} + \frac{\iota_{a2}^2}{2(\kappa_{a1}+\kappa_{a2})} + \frac{1}{4}\|G_\varepsilon(x)\|_X.$$

Partitioning terms in the preceding bound allows the bound on $\dot V_L$ to be expressed as

$$\dot V_L \le -Q - \frac{\kappa_c\bar\lambda}{4}\|\tilde W_c\|^2 - \frac{\kappa_{a1}+\kappa_{a2}}{4}\|\tilde W_a\|^2 - \frac{c_3}{2}\|\tilde\theta\|^2 + \nu
- \begin{bmatrix}\|\tilde W_c\| & \|\tilde W_a\| & \|\tilde\theta\|\end{bmatrix}
\underbrace{\begin{bmatrix}
\frac{\kappa_c\bar\lambda}{4} & -\frac{\upsilon_{ac}}{2} & -\frac{\upsilon_{c\theta}}{2} \\
-\frac{\upsilon_{ac}}{2} & \frac{\kappa_{a1}+\kappa_{a2}}{4}-\upsilon_a & 0 \\
-\frac{\upsilon_{c\theta}}{2} & 0 & \frac{c_3}{2}
\end{bmatrix}}_{M}
\begin{bmatrix}\|\tilde W_c\| \\ \|\tilde W_a\| \\ \|\tilde\theta\|\end{bmatrix}.\tag{8.60}$$

Provided the conditions of Theorem 8.2 hold, then $M$ is positive definite and $\dot V_L$ can be
further bounded as

$$\dot V_L \le -Q - \frac{\kappa_c\bar\lambda}{4}\|\tilde W_c\|^2 - \frac{\kappa_{a1}+\kappa_{a2}}{4}\|\tilde W_a\|^2 - \frac{c_3}{2}\|\tilde\theta\|^2 + \nu \le -\alpha_3(\|y\|) + \nu,\tag{8.61}$$

where $\alpha_3\in\mathcal{K}$ is any class $\mathcal{K}$ function satisfying

$$\alpha_3(\|y\|) \le Q(x) + \frac{\kappa_c\bar\lambda}{4}\|\tilde W_c\|^2 + \frac{\kappa_{a1}+\kappa_{a2}}{4}\|\tilde W_a\|^2 + \frac{c_3}{2}\|\tilde\theta\|^2.$$

The bound in (8.61) implies that

$$\dot V_L \le -\tfrac{1}{2}\alpha_3(\|y\|), \qquad \forall\, \|y\| \ge \underbrace{\alpha_3^{-1}(2\nu)}_{\mu}.\tag{8.62}$$

Hence, provided that $\mu < \alpha_2^{-1}(\alpha_1(r))$, Theorem 8.3 implies the existence of a $\beta\in\mathcal{KL}$ and a time $T\in\mathbb{R}_{\ge0}$ such that for any initial condition $y_0 := y(0)$ satisfying $\|y_0\| \le \alpha_2^{-1}(\alpha_1(r))$ the resulting solution $t\mapsto y(t)$ satisfies

$$\|y(t)\| \le \beta(\|y_0\|, t), \quad \forall t\in[0,T],
\qquad
\|y(t)\| \le \alpha_1^{-1}(\alpha_2(\mu)), \quad \forall t\in[T,\infty),$$

which implies
$$\limsup_{t\to\infty}\|y(t)\| \le \alpha_1^{-1}(\alpha_2(\mu)),$$
as desired. □

We again remark that the above theorem does not establish stability under the safe RL
policy from (8.20); rather, it establishes stability under the nominal (potentially unsafe) RL
policy from (8.10). We note that, similar to Theorem 8.2, verifying the conditions of the
preceding theorem is challenging, and, in practice, one must generally resort to trial-and-error
tuning of the hyperparameters associated with the algorithm (e.g., number of basis
functions, learning gains, number of sampling points, etc.) to produce a stabilizing control
policy. Despite this, note that the learning-based policy from (8.20) is safe by definition, as
the RaCBF conditions are independent of any conditions associated with the RL approach.
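
As a concrete illustration, the positive definiteness of M can be checked numerically once candidate gains and estimates of the bound constants are available. The following Python sketch is a minimal check under hypothetical values for υa, υac, and υcθ, which in practice depend on unknown quantities such as ||W|| and must be conservatively over-estimated:

import numpy as np

# Hypothetical gains and bound constants; the upsilon terms depend on the
# unknown ideal weights and basis bounds, so they must be over-estimated.
kappa_c, lam_bar, c3 = 1.0, 0.5, 2.0
kappa_a1, kappa_a2 = 1.0, 1.0
ups_a, ups_ac, ups_ctheta = 0.1, 0.3, 0.2

# The symmetric matrix M from (8.46) and (8.60).
M = np.array([[kappa_c * lam_bar / 4, -ups_ac / 2, -ups_ctheta / 2],
              [-ups_ac / 2, (kappa_a1 + kappa_a2) / 4 - ups_a, 0.0],
              [-ups_ctheta / 2, 0.0, c3 / 2]])

# A symmetric matrix is positive definite iff all its eigenvalues are positive.
print("lambda_min(M) =", np.linalg.eigvalsh(M).min())

If the minimum eigenvalue is not positive, the gains can be retuned and the check repeated.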

8.4 Numerical Examples

Example 8.1 Our first example reconsiders the two-dimensional nonlinear system and safe
set from Example 7.2. The system dynamics are in the form of (4.1) and are given by

$$\begin{bmatrix}\dot x_1 \\ \dot x_2\end{bmatrix}
= \underbrace{\begin{bmatrix}0 \\ 0\end{bmatrix}}_{f(x)}
+ \underbrace{\begin{bmatrix}x_1 & x_2 & 0 \\ 0 & 0 & x_1^3\end{bmatrix}}_{F(x)}
\underbrace{\begin{bmatrix}\theta_1 \\ \theta_2 \\ \theta_3\end{bmatrix}}_{\theta}
+ \underbrace{\begin{bmatrix}0 \\ x_2\end{bmatrix}}_{g(x)}u,$$

where the uncertain parameters are set to $\theta = [-0.6\ -1\ 1]^\top$. The main objective is to drive
the system to the origin while remaining in a safe set $\mathcal{C}\subset\mathbb{R}^2$ defined as the zero superlevel
set of
$$h(x) = 1 - x_1 - x_2^2,$$
and minimizing the infinite-horizon cost (8.1) with a running cost (8.2) defined as
$$\ell(x, u) = \tfrac{1}{2}\|x\|^2 + \|u\|^2.$$

The uncertain parameters are learned online using the update law from (5.25) with $\gamma_c = 1$ and
a history stack with $M = 20$ entries, where the integration window is chosen as $\Delta t = 0.5$.
To guarantee safety using the RaCBF-QP (8.20) we compute bounds on the initial estimation
error by assuming that the parameters belong to the set

$$\theta \in [-1, 0]\times[-1.5, -0.5]\times[0.5, 1.5] \subset \mathbb{R}^3,$$

and choose the extended class $\mathcal{K}_\infty$ function as $\alpha(s) = 10s^3$. The value function is approximated
using the quadratic basis

$$\phi(x) = \begin{bmatrix} x_1^2 & x_1x_2 & x_2^2 \end{bmatrix}^\top,$$

and the sample points used to evaluate the BE are chosen as the vertices of a uniform grid over
$[-4, 1]\times[-2, 2]\subset\mathbb{R}^2$ for a total of $N = 99$ samples. These samples are used to update the
weight estimates using the update laws from (8.23), (8.24), and (8.26) with $\kappa_c = 1$, $\beta = 1$, and
$\kappa_a = 1$.
To demonstrate the efficacy of the approach, we simulate the system with and without
shielding the learned policy with the RaCBF-QP (8.20). For each simulation, the initial state
is taken as $x(0) = [-4\ 1]^\top$, the initial value function and policy weights are drawn from a
uniform distribution3 between 0 and 1, and the initial model parameter estimate is taken as
$\hat\theta(0) = [-0.5\ -0.6\ 0.7]^\top$. The results of the simulations are provided in Figs. 8.1 and 8.2.

Fig. 8.1 Trajectories of the nonlinear system under the RL policy (8.10) with (solid blue curve) and
without (dotted green curve) intervention from the RaCBF-QP (8.20). The black curve denotes the
boundary of the safe set

3 The same random weights are used for both simulations.



Fig. 8.2 Evolution of the value and policy weights (top) as well as the parameter estimates (bottom)
for the nonlinear system under the RaCBF-QP based safety filter from (8.20), which corresponds
to the state trajectory given by the blue curve in Fig. 8.1. In the bottom plot, the dotted lines of
corresponding color indicate the true values of the model parameters and the solid curves indicate the
parameter estimates over time, which converge to their true values in just under 4 s

In particular, Fig. 8.1 illustrates the trajectories of the system under the controller with and
without shielding, whereas Fig. 8.2 illustrates the evolution of the value and policy weights,
as well as the estimated model parameters. As shown in Fig. 8.1, shielding the RL policy
keeps the system safe, whereas the system leaves the safe set under the RL policy without any
shielding. Moreover, similar to previous chapters, the adaptive CBF safety filter gradually
allows the system to come closer to the boundary of the safe set as the uncertain model
parameters are identified (see Fig. 8.2). As the value function for this problem is unknown,
the accuracy of the value function and policy weights is challenging to quantify; however,
in each of the simulations the weights converge on a policy that stabilizes the system to the
origin as predicted by Theorem 8.2.
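
For concreteness, the sketch below shows how the learned policy induced by the quadratic basis above might be evaluated and shielded in simulation. This is a simplified illustration rather than the exact implementation used to generate the figures: it plugs the current parameter estimate θ̂ directly into the constraint and omits the robustness margin of the RaCBF-QP (8.20), and the weight and parameter values shown are placeholders.

import numpy as np

def phi_grad(x):
    # Jacobian of the quadratic basis phi(x) = [x1^2, x1*x2, x2^2]^T.
    return np.array([[2 * x[0], 0.0],
                     [x[1], x[0]],
                     [0.0, 2 * x[1]]])

def learned_policy(x, W_a):
    # Approximate policy u = -(1/2) R^{-1} g(x)^T (dphi/dx)^T W_a, cf. (8.10),
    # with R = 1 for the scalar input of this example.
    g = np.array([0.0, x[1]])
    return -0.5 * W_a @ (phi_grad(x) @ g)

def shield(x, u_nom, theta_hat):
    # Simplified CBF filter for h(x) = 1 - x1 - x2^2 with alpha(s) = 10 s^3.
    h = 1.0 - x[0] - x[1]**2
    dh = np.array([-1.0, -2.0 * x[1]])
    F = np.array([[x[0], x[1], 0.0], [0.0, 0.0, x[0]**3]])
    Lfh = dh @ (F @ theta_hat)              # f(x) = 0 for this example
    Lgh = dh @ np.array([0.0, x[1]])
    psi = Lfh + Lgh * u_nom + 10.0 * h**3
    if psi >= 0.0 or abs(Lgh) < 1e-9:       # constraint inactive, or filter
        return u_nom                        # cannot act along g(x)
    return u_nom - psi * Lgh / Lgh**2       # closed-form QP correction

x = np.array([-4.0, 1.0])
u = shield(x, learned_policy(x, np.array([0.5, 0.2, 0.8])),
           np.array([-0.5, -0.6, 0.7]))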

Example 8.2 We now examine a simple scenario to demonstrate some interesting properties
of the proposed online learning approach. We reconsider the robot motion planning example
from Example 3.2 in which a mobile robot is modeled as a single integrator $\dot x = u$ and the
goal is to drive the robot from its initial position to the origin while avoiding an obstacle.
The safety objective can be considered by defining

$$h(x) = \|x - x_o\|^2 - r_o^2,$$

where $x_o\in\mathbb{R}^2$ is the location of the obstacle's center and $r_o\in\mathbb{R}_{>0}$ is its radius, which is
used to construct a safe set $\mathcal{C}\subset\mathbb{R}^2$ as in (3.3) as well as a CBF with $\alpha(s) = s^3$. To obtain
a stabilizing control policy we associate to the single integrator an infinite-horizon optimal
control problem as in (8.1) with a running cost of

$$\ell(x, u) = \|x\|^2 + \|u\|^2.$$

Fig. 8.3 Trajectories of the single integrator under the safe RL policy (blue curve) and safe LQR
policy (green curve). The gray disk denotes the obstacle

As the system is linear and the cost is quadratic, the HJB (8.4) reduces to the algebraic
Riccati equation, which can be solved using standard numerical tools to obtain the optimal
value function as
$$V^*(x) = x_1^2 + x_2^2.$$
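
This can be verified in a few lines; the sketch below, assuming SciPy is available, solves the continuous-time algebraic Riccati equation for the single integrator (A = 0, B = I) with the quadratic cost above:

import numpy as np
from scipy.linalg import solve_continuous_are

A, B = np.zeros((2, 2)), np.eye(2)   # single integrator: x_dot = u
Q, R = np.eye(2), np.eye(2)          # running cost ||x||^2 + ||u||^2
P = solve_continuous_are(A, B, Q, R)
print(P)  # identity matrix, so V*(x) = x^T P x = x1^2 + x2^2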
In principle, one could use the linear quadratic regulator (LQR) policy induced by the
above value function as the nominal controller in the standard CBF-QP (3.12) to solve this
problem; however, as demonstrated in the subsequent numerical results, such an approach
presents certain limitations. To compare the LQR solution with the RL solution presented
in this chapter, we approximate the value function using the same basis as in the preceding
example—this implies that the optimal value function weights are $W = [1\ 0\ 1]^\top$ and that
there exists no function reconstruction error ε. To learn the value function online, we eval-
uate the approximate BE (8.15) at N = 25 points sampled from a multivariate Gaussian
distribution using the current state as the mean and a covariance matrix of 0.1I2×2 at every
instant in time. The sampled BE is then used to update the value function weights using the
update law in (8.23) and (8.24) with κc = 1 and β = 0.001. The control policy weights are
updated once again using the simple projection update law from Remark 8.2 with κa = 1.
Since the dynamics for this example are trivial, no parameter identification is performed.
The results of the simulations comparing the performance of the RL-based policy and the
LQR policy, both of which are filtered through the CBF-QP (3.12), are provided in Figs. 8.3
and 8.4. The resulting system trajectories are illustrated in Fig. 8.3, where the trajectory
under the RL policy navigates around the obstacle and converges to the origin, whereas
the trajectory under the LQR policy gets stuck behind the obstacle. Note that the LQR
controller filtered through the CBF-QP is a continuous time-invariant feedback controller,
and, as discussed in Chap. 3, there are fundamental limitations to the behavior that can be
produced by continuous time-invariant vector fields. On the other hand, the RL controller is
continuous but is also a dynamic feedback controller as it explicitly depends on the control
policy weights Ŵa , which are updated based upon data observed online. As seen in Fig. 8.4
(top), the trajectories under both policies quickly approach the obstacle and initially fail to
make progress towards the goal. However, unlike the static LQR policy, the weights of the
RL controller continue to evolve, as shown in Fig. 8.4 (bottom) and eventually converge to
a policy that navigates the robot around the obstacle and to the goal. Furthermore, by the
end of the simulation the weights have converged extremely close to their optimal values in
line with the results of Theorem 8.2.

Fig. 8.4 Top: Evolution of the system states under the safe RL and safe LQR policy. Bottom: Evolution
of the estimated value function and control policy weights of the RL policy

8.5 Notes

In this chapter, we presented a safe online reinforcement learning (RL) framework that
allows for simultaneously learning the value function and dynamics of a nonlinear system in
real-time while satisfying safety constraints. The roots of the online RL method developed
in this chapter can be traced back to the work of Vamvoudakis and Lewis in [1]. This
work proposed an “actor-critic” structure in which two function approximators were tuned
simultaneously online using techniques from adaptive control to learn the value function
(critic) and controller (actor) that solve an undiscounted infinite-horizon optimal control
problem.4 The method from [1] was initially limited to systems with known dynamics;

4 This is the primary reason why the estimated value function weights are marked with the subscript
‘c’ (for critic) and the estimated policy weights are marked with the subscript ‘a’ (for actor).

however, extensions of this approach to uncertain systems were quickly achieved using
model free [2, 3] or model-based RL methods [4]. A key limitation of early works in
this area is that convergence of the weight estimates was contingent on the persistence of
excitation conditions from Chap. 4, which can be seen as an analogue of the exploration
paradigm often mentioned in the RL literature. The approach from [5] alleviated such a
restriction using ideas from concurrent learning adaptive control [6] by introducing the idea
of “simulation of experience” in which an estimated model of the system is used to generate
data for learning. Similar ideas that relax the PE requirement have also been used in model
free RL approaches [7]. The use of concurrent learning-based ideas in this setting often
draws analogues with the idea of “experience replay” from the RL literature. Surveys of
model-free RL methods stemming from the work of [1] can be found in [8, 9] whereas a
collection of MBRL methods can be found in [10].
Although the online RL method from [1] was originally developed for unconstrained
optimal control problems, recent works have begun to extend such ideas to those with input
and state constraints. Such extensions are typically facilitated by the inclusion of additional
terms in the cost function that penalize violation of such constraints. For example, works such
as [7, 11–13] include a tanh-based control input penalty in the cost function, which is shown
to guarantee satisfaction of hyper-rectangular actuation constraints. State constraints can be
handled by including reciprocal barrier-based terms in the cost function either implicitly
via coordinate transformation [14–17] or explicitly [18–20] by including terms that take
large/infinite value on the boundary of the state constraint set. A challenge with including
terms that take infinite values on the boundary of the constraint set is that the resulting
value function may be non-differentiable at such points, which makes it challenging to
approximate using smooth function approximators. A more fundamental challenge, however,
is that safety guarantees in such approaches generally use the value function as a safety
certificate. Since the ultimate objective of such approaches is to learn the value function,
and therefore the safety certificate, learning and safety become tightly coupled—safety
guarantees are contingent upon convergence of the RL algorithm, which, as discussed in
this chapter, relies on conditions that cannot be verified in practice.
The approach presented in this chapter was originally introduced in [21], where a recipro-
cal barrier function-based controller (based on the class of Lyapunov-like barrier functions
from [22]) was used as a shielding controller instead of the adaptive CBF-QP (aCBF-QP)
used here. However, reciprocal barrier functions (i.e., those that take unbounded values on
the boundary of the safe set) may require large control values near the boundary of the
safe set and are not well-defined outside the safe set. Fortunately, the method from [21] can
be easily extended to use zeroing-type barrier functions by simply replacing the reciprocal
barrier function-based shielding controller from [21] with an aCBF-QP filter from Chap. 5,
and is done so explicitly in this chapter as well as in [23]. Incorporating CBFs into more
traditional episodic RL frameworks to endow such controllers with safety properties has
also become popular recently [24, 25]. Standard references on RL include [26–28].

Lemma 8.1 is adapted from [29]. Further details on the projection operator can be found
in [30, Appendix E]. Similar to Chap. 7, the system dynamics and safety constraint from
Example 8.1 are taken from [31]. Further details on the universal function approximation
theorem can be found in [32, 33] and [10, Chap. 2].

References

1. Vamvoudakis KG, Lewis FL (2010) Online actor-critic algorithm to solve the continuous-time
infinite horizon optimal control problem. Automatica 46(5):878–888
2. Vrabie D, Pastravanu O, Abu-Khalaf M, Lewis FL (2009) Adaptive optimal control for
continuous-time linear systems based on policy iteration. Automatica 45:477–484
3. Vrabie D, Lewis FL (2009) Neural network approach to continuous-time direct adaptive optimal
control for partially unknown nonlinear systems. Neural Netw 22(3):237–246
4. Bhasin S, Kamalapurkar R, Johnson M, Vamvoudakis KG, Lewis FL, Dixon WE (2013) A
novel actor-critic-identifier architecture for approximate optimal control of uncertain nonlinear
systems. Automatica 49(1):82–92
5. Kamalapurkar R, Walters P, Dixon WE (2016) Model-based reinforcement learning for approx-
imate optimal regulation. Automatica 64:94–104
6. Chowdhary G, Johnson E (2010) Concurrent learning for convergence in adaptive control without
persistency of excitation. In: Proceedings of the IEEE conference on decision and control, pp
3674–3679
7. Modares M, Lewis FL, Naghibi-Sistani MB (2014) Integral reinforcement learning and experi-
ence replay for adaptive optimal control of partially-unknown constrained-input continuous-time
systems. Automatica 50(1):193–202
8. Lewis FL, Vrabie D, Vamvoudakis KG (2012) Reinforcement learning and feedback control:
Using natural decision methods to design optimal adaptive controllers. IEEE Control Syst
32(6):76–105
9. Kiumarsi B, Vamvoudakis KG, Modares H, Lewis FL (2017) Optimal and autonomous control
using reinforcement learning: a survey. IEEE Trans Neural Netw Learn Syst 29(6):2042–2062
10. Kamalapurkar R, Walters P, Rosenfeld JA, Dixon WE (2018) Reinforcement learning for optimal
feedback control: a Lyapunov-based approach. Springer
11. Abu-Khalaf M, Lewis FL (2005) Nearly optimal control laws for nonlinear systems with saturating
actuators using a neural network HJB approach. Automatica 41(5):779–791
12. Vamvoudakis KG, Miranda MF, Hespanha JP (2015) Asymptotically stable adaptive-optimal
control algorithm with saturating actuators and relaxed persistence of excitation. IEEE Trans
Neural Netw Learn Syst 27(11):2386–2398
13. Deptula P, Bell ZI, Doucette EA, Curtis JW, Dixon WE (2020) Data-based reinforcement learning
approximate optimal control for an uncertain nonlinear system with control effectiveness faults.
Automatica 116:1–10
14. Yang Y, Vamvoudakis KG, Modares H, He W, Yin Y, Wunsch D (2019) Safety-aware reinforce-
ment learning framework with an actor-critic-barrier structure. In: Proceedings of the American
control conference, pp 2352–2358
15. Yang Y, Vamvoudakis KG, Modares H (2020) Safe reinforcement learning for dynamical games.
Int J Robust Nonlinear Control 30(9):3706–3726
16. Greene ML, Deptula P, Nivison S, Dixon WE (2020) Sparse learning-based approximate dynamic
programming with barrier constraints. IEEE Control Syst Lett 4(3):743–748

17. Mahmud SMN, Hareland K, Nivison SA, Bell ZI, Kamalapurkar R (2021) A safety aware model-
based reinforcement learning framework for systems with uncertainties. In: Proceedings of the
American control conference, pp 1979–1984
18. Cohen MH, Belta C (2020) Approximate optimal control for safety-critical systems with control
barrier functions. In: Proceedings of the IEEE conference on decision and control, pp 2062–2067
19. Marvi Z, Kiumarsi B (2021) Safe reinforcement learning: a control barrier function optimization
approach. Int J Robust Nonlinear Control 31(6):1923–1940
20. Deptula P, Chen H, Licitra R, Rosenfeld JA, Dixon WE (2020) Approximate optimal motion
planning to avoid unknown moving avoidance regions. IEEE Trans Robot 32(2):414–430
21. Cohen MH, Belta C (2023) Safe exploration in model-based reinforcement learning using control
barrier functions. Automatica 147:110684
22. Panagou D, Stipanovic DM, Voulgaris PG (2016) Distributed coordination control for multi-robot
networks using Lyapunov-like barrier functions. IEEE Trans Autom Control 61(3):617–632
23. Cohen MH, Serlin Z, Leahy KJ, Belta C (2023) Temporal logic guided safe model-based rein-
forcement learning: a hybrid systems approach. Nonlinear Anal: Hybrid Syst 47:101295
24. Cheng R, Orosz G, Murray RM, Burdick JW (2019) End-to-end safe reinforcement learning
through barrier functions for safety-critical continuous control tasks. Proc AAAI Conf Artif
Intell 33:3387–3395
25. Choi J, Castaneda F, Tomlin CJ, Sreenath K (2020) Reinforcement learning for safety-critical
control under model uncertainty using control lyapunov functions and control barrier functions.
Robot: Sci Syst
26. Sutton RS, Barto AG (2018) Reinforcement learning: an introduction. MIT Press
27. Bertsekas DP, Tsitsiklis JN (1996) Neuro-dynamic programming. Athena Scientific
28. Bertsekas D (2019) Reinforcement learning and optimal control. Athena Scientific
29. Kamalapurkar R, Rosenfeld JA, Dixon WE (2016) Efficient model-based reinforcement learning
for approximate online optimal control. Automatica 74:247–258
30. Krstić M, Kanellakopoulos I, Kokotović P (1995) Nonlinear and adaptive control design. Wiley
31. Jankovic M (2018) Robust control barrier functions for constrained stabilization of nonlinear
systems. Automatica 96:359–367
32. Hornik K, Stinchcombea M, White H (1990) Universal approximation of an unknown mapping
and its derivatives using multilayer feedforward networks. Neural Netw 3:551–560
33. Hornik K (1991) Approximation capabilities of multilayer feedforward networks. Neural Netw
4:251–257
9 Temporal Logic Guided Safe Model-Based Reinforcement Learning

Temporal logics are formal, expressive languages traditionally used in the computer science
area of formal methods to specify the correctness of digital circuits and computer programs.
Safety, as defined in previous chapters, is just a particular case of a temporal logic formula. In
this chapter, we discuss adding general, temporal logic specifications to the control problem
considered previously. In Sect. 9.1, we introduce Linear Temporal Logic (LTL) and automata
accepting languages satisfying LTL formulas. Our approach to constructing controllers that
enforce the satisfaction of an LTL formula is to break down an LTL formula into a sequence
of reach-avoid subproblems. In Sect. 9.2, we formulate and solve these subproblems, and in
Sect. 9.3 we present a hybrid system approach to combine the above controllers in such a way
that the trajectories of the closed-loop system satisfy the formula. We extend this procedure
to systems for which CLFs are not known or for which the dynamics are uncertain in
Sect. 9.4, where we leverage the MBRL framework from Chap. 8. We illustrate the method
developed in this chapter with numerical examples in Sect. 9.5 and conclude with final
remarks, references, and suggestions for further reading in Sect. 9.6.

Until this chapter, our objective was to stabilize a control system, while satisfying safety
specifications and other control and state constraints. Safety can be informally stated as
“nothing bad ever happens”. In previous chapters, a system was considered safe if it was
guaranteed to stay inside a given set, i.e., the outside of this set was considered "bad".
Temporal logics, such as the Linear Temporal Logic (LTL) considered in this chapter, can
be used to express arbitrarily rich specifications, including liveness (“something good should
eventually happen”). In this chapter, we show how to accommodate specifications given in
a fragment of LTL to the control problem considered previously. As in the first part of the
book, we consider nonlinear control affine systems in the form (2.10), reproduced here for


convenience:
ẋ = f (x) + g(x)u.

9.1 Temporal Logics and Automata

An LTL formula is built from a finite set of observations1 O, logical connectives, and
temporal operators, such as eventually (♦) and always (□).

Definition 9.1 (LTL syntax) A linear temporal logic formula $\varphi$ over a finite set of observations
$O$ is recursively defined as

$$\varphi = \top \mid o \mid \varphi_1\wedge\varphi_2 \mid \neg\varphi \mid \bigcirc\varphi \mid \varphi_1\,\mathcal{U}\,\varphi_2,\tag{9.1}$$

where $o\in O$ is an observation, $\varphi, \varphi_1, \varphi_2$ are LTL formulas, $\top$ is the Boolean "true," the
symbols $\wedge$ and $\neg$ denote conjunction and negation, respectively, and the symbols $\bigcirc$ and $\mathcal{U}$
denote the temporal operators "next" and "until," respectively.

The above syntax can be used to derive other temporal operators, such as "eventually" ♦
and "always" □, as
$$\Diamond\varphi := \top\,\mathcal{U}\,\varphi, \qquad \square\varphi := \neg\Diamond\neg\varphi.$$
The semantics of LTL formulas are interpreted over infinite words composed of observations2
in $O$ (denoted by $O^\omega$), whose formal definition we omit here; we refer the reader to
the notes in Sect. 9.6 for further details.

Example 9.1 (LTL example) As an example of an LTL formula that expresses a control
specification richer than safety, we consider a surveillance task in which a mobile robot
must continuously gather data from a region of interest and bring it back to a base location
while remaining safe. Ultimately, our control objectives for the robot are as follows:

• The robot should continuously gather data;


• The robot should continuously return to the base to upload the data;
• The robot should continuously recharge itself;
• The robot should avoid all dangerous areas.

1 The observations in an LTL formula are typically referred to as atomic propositions.


2 Traditionally, the semantics of LTL formulas are interpreted over infinite words in 2 O ; however, as
will become clear shortly, each o ∈ O will correspond to mutually disjoint regions in the state space
of a dynamical system. Therefore, only a single observation can evaluate to true at a given instant
and there is no loss of generality in considering words only over O.

We formalize the above requirements with the LTL formula

$$\varphi = \square(\Diamond\,\texttt{gather} \wedge \Diamond\,\texttt{base} \wedge \Diamond\,\texttt{recharge} \wedge \neg\,\texttt{danger}),$$

defined over the set of observations

$$O = \{\texttt{gather}, \texttt{base}, \texttt{recharge}, \texttt{danger}\}.$$

In plain English, the above formula reads "Always eventually gather data and always eventually
return to base and always eventually recharge and always avoid dangerous regions."

The language of ϕ is the set of all words satisfying ϕ and is denoted by L(ϕ). For any
LTL formula ϕ, there exists a nondeterministic Büchi Automaton with input alphabet O
that accepts exactly the language of ϕ. Various off-the-shelf tools for converting an LTL
formula into a nondeterministic Büchi Automaton are discussed in the notes later on. In
this chapter, we focus on the subset of LTL formulas for which there exists a deterministic
Büchi Automaton (DBA) accepting exactly the language of ϕ. In short (and informally), an
LTL formula can be converted into a DBA if the formula does not contain an “eventually
always” sequence of temporal operators.3

Definition 9.2 (Deterministic Büchi Automaton) A deterministic Büchi Automaton (DBA)


is a tuple A = (Q, q0 , O, δA , Q f ), where Q is a finite set of states, q0 ∈ Q is the initial state,
O is the input alphabet, δA : Q × O → Q is a transition function, and Q f ⊂ Q is the set
of final/accepting states.

A run of a DBA A over a word wo = wo (0)wo (1)wo (2) · · · ∈ O ω is an infinite sequence of


states wq = wq (0)wq (1)wq (2) · · · ∈ Qω such that wq (i + 1) = δA (wq (i), wo (i)). A word
wo is said to be accepted by A if the corresponding run wq intersects with the set of accepting
states Q f infinitely often. The set of all words accepted by A is referred to as the language
of A and is denoted by L(A).

Example 9.2 (LTL example (continued)) The LTL formula from the proceeding example
can be translated into the DBA displayed in Fig. 9.1 with states Q = {q0 , q1 , q2 , q3 , q4 },
where the set of accepting states is Q f = {q2 }. An input word accepted by the DBA is
wo = (gather base recharge)ω , which produces the run wq = q0 (q1 q3 q2 )ω .
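
Computationally, a DBA is conveniently represented by its transition function. The Python sketch below encodes only the four transitions implied by the accepting run above (the full table of Fig. 9.1, including transitions on danger and null observations, is not reproduced here) and steps the automaton along a finite prefix of the word:

def dba_run(delta, q0, word):
    """Return the run of a DBA along a finite input word."""
    run = [q0]
    for o in word:
        run.append(delta[(run[-1], o)])
    return run

# Partial transition table implied by the run q0 (q1 q3 q2)^w in Example 9.2.
delta = {('q0', 'gather'): 'q1', ('q1', 'base'): 'q3',
         ('q3', 'recharge'): 'q2', ('q2', 'gather'): 'q1'}
Q_f = {'q2'}

run = dba_run(delta, 'q0', ['gather', 'base', 'recharge'] * 3)
print(run)                                 # the accepting state q2 recurs
print(sum(q in Q_f for q in run), "visits to Q_f")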

In this chapter, each o ∈ O in an LTL formula will correspond to a region of interest


(ROI) Ro ⊂ Rn in the state space of (2.10) satisfying the following conditions.

3 Note that this is not restrictive in practice. Even though ♦ resembles stability, we do not use LTL
specifications for it – we enforce stability with CLFs.

Fig. 9.1 DBA corresponding to the LTL formula from Example 9.1. If multiple transitions exist
between states, only one transition, labeled by all observations that enable that transition is illustrated

Assumption 9.1 Each region of interest (ROI) $\mathcal{R}_o\subset\mathbb{R}^n$, $o\in O$, in the state space of (2.10)
is closed, nonempty, contains no isolated points, and can be expressed as the zero sublevel
set of a continuously differentiable function $h_o : \mathbb{R}^n\to\mathbb{R}$, in the sense that

$$\begin{aligned}
\mathcal{R}_o &= \{x\in\mathbb{R}^n \mid h_o(x)\le 0\}, \\
\partial\mathcal{R}_o &= \{x\in\mathbb{R}^n \mid h_o(x) = 0\}, \\
\mathrm{Int}(\mathcal{R}_o) &= \{x\in\mathbb{R}^n \mid h_o(x) < 0\}.
\end{aligned}\tag{9.2}$$

Moreover, all ROIs are mutually disjoint, i.e., $\mathcal{R}_o\cap\mathcal{R}_{o'} = \emptyset$ for all $(o, o')\in O\times O$, $o\ne o'$.

The assumption that $h_o$ is continuously differentiable restricts the shape of each ROI
(e.g., it cannot be a rectangle); however, one can typically over/under-approximate such
a region using an ellipse, which can be expressed using a continuously differentiable $h_o$.
The assumption that all ROIs are disjoint implies that at most one observation can be
satisfied at a given state. To associate elements of $\mathbb{R}^n$ with the set of observations labeling
the ROIs, we introduce a labeling function $\ell : \mathbb{R}^n\to O\cup\{\texttt{null}\}$ such that

$$\ell(x) = \begin{cases} o, & \text{if } x\in\mathcal{R}_o, \\ \texttt{null}, & \text{otherwise}. \end{cases}\tag{9.3}$$

This labeling function induces an inverse map $\ell^{-1}(o) = \{x\in\mathbb{R}^n \mid \ell(x) = o\}$. The element
null acts as a null observation in the sense that $\ell^{-1}(\texttt{null})$ corresponds to the
space between the disjoint ROIs, which will be useful in defining the semantics related to
continuous-time trajectories. We now formalize what it means for a trajectory of (2.10) to
satisfy an LTL formula $\varphi$.

Definition 9.3 (Semantics of a continuous trajectory) Let $x : \mathbb{R}_{\ge0}\to\mathbb{R}^n$ be a continuous
curve. The word generated by $x(\cdot)$ is a sequence $w_x = w_x(0)w_x(1)w_x(2)\ldots$ recursively
defined as

1. $w_x(0) = \ell(x(0))$;
2. $w_x(i) = \lim_{t\to\tau_i^+}\ell(x(t))$ for all $i\ge1$ such that $\tau_i < \infty$, where $\tau_i$ is defined as $\tau_0 = 0$
and $\tau_i := \inf\{t \mid t > \tau_{i-1},\ \ell(x(t)) \ne w_x(i-1)\}$ for all $i\ge1$;
3. $w_x(i) = w_x(i-1)$ for all $i$ such that $\tau_i = \infty$.

The word $w_x$ is said to be well-defined if $\tau_i\to\infty$ as $i\to\infty$. The curve $x(\cdot)$ is said to satisfy
an LTL formula $\varphi$ if and only if the word it generates is well-defined and $\mathrm{proj}_O(w_x)\subset\mathcal{L}(\varphi)$,
where $\mathrm{proj}_O(w_x)$ removes all instances of null from $w_x$.

The above definition implies that the word generated by a continuous curve is exactly the
sequence of observations corresponding to the regions intersected by the curve over time,
where each observation is generated at the instant the curve switches from one region to
another.
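
The word generated by a sampled trajectory can be extracted directly from this definition. The Python sketch below implements the labeling function (9.3) for ROIs given as zero sublevel sets and records an observation whenever the label changes along the samples; the two disk-shaped regions are hypothetical:

import numpy as np

def label(x, regions):
    """Labeling function (9.3); ROIs are mutually disjoint (Assumption 9.1)."""
    for o, h in regions.items():
        if h(x) <= 0.0:
            return o
    return 'null'

def word_of(samples, regions):
    """Observation sequence along a sampled trajectory (cf. Definition 9.3)."""
    word = [label(samples[0], regions)]
    for x in samples[1:]:
        o = label(x, regions)
        if o != word[-1]:
            word.append(o)
    return word

# Two hypothetical disk-shaped ROIs: h_o(x) = ||x - c||^2 - r^2.
regions = {'gather': lambda x: np.sum((x - np.array([2.0, 0.0]))**2) - 0.25,
           'base': lambda x: np.sum((x + np.array([2.0, 0.0]))**2) - 0.25}

t = np.linspace(0.0, np.pi, 200)
samples = np.stack([2.0 * np.cos(t), 0.1 * np.sin(t)], axis=1)
print(word_of(samples, regions))  # ['gather', 'null', 'base']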

9.2 Simultaneous Stabilization and Safety

Our approach to constructing controllers that enforce the satisfaction of an LTL formula by
the closed-loop trajectory of (2.10) is to break down an LTL formula into a sequence of
reach-avoid sub-problems dictated by the structure of the formula's corresponding DBA.
Central to this approach is the ability to construct continuous controllers that allow for
stabilizing (2.10) to various ROIs while avoiding those that should not be visited. In previous
chapters, we have constructed controllers for (2.10) enforcing stability and safety using
CLFs and CBFs, respectively, by filtering a nominal stabilizing controller $k_0$ through the
CBF-QP (3.12)

$$\begin{aligned}
k(x) = \operatorname*{argmin}_{u\in\mathbb{R}^m}\ & \tfrac{1}{2}\|u - k_0(x)\|^2 \\
\text{subject to}\ & L_fh(x) + L_gh(x)u \ge -\alpha(h(x)),
\end{aligned}$$

where $h : \mathbb{R}^n\to\mathbb{R}$ is a CBF for (2.10) on a set $\mathcal{C}\subset\mathbb{R}^n$ as in (3.3) and $\alpha\in\mathcal{K}_\infty^e$. Recall
that, provided all the functions involved in (3.12) are Lipschitz continuous, the solution to
(3.12) is as well, and can be expressed in closed form as


$$k(x) = \begin{cases}
k_0(x), & \text{if } \psi(x)\ge 0, \\
k_0(x) - \dfrac{\psi(x)}{\|L_gh(x)\|^2}L_gh(x)^\top, & \text{if } \psi(x) < 0,
\end{cases}\tag{9.4}$$

where ψ(x) := L f h(x) + L g h(x)k0 (x) + α(h(x)). Empirically, the above controller per-
forms well at enforcing both (asymptotic) stability and safety; however, as stability is treated
as a “soft” constraint (encoded through the cost function) in the CBF-QP, there is no formal
guarantee, in general, that such a controller enforces asymptotic stability of the origin. The
question we provide an answer to in this section is as follows: When the nominal policy k0
enforces asymptotic stability of the closed-loop system using a Lyapunov function V , when
is V also a Lyapunov function for the closed-loop system under the CBF-QP controller
(3.12) using k0 as the nominal policy? We provide a (conservative) answer to this question
using the notion of a CBF-stabilizable set.
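
For reference, the closed-form expression (9.4) can be implemented directly, avoiding an online QP solver. The sketch below is generic: the caller supplies the Lie derivatives, and it is assumed that L_g h(x) is nonzero whenever the constraint is active, as required for (9.4) to be well defined:

import numpy as np

def cbf_qp(k0_x, Lfh, Lgh, alpha_h):
    """Closed-form solution (9.4) of the CBF-QP (3.12).

    k0_x:    nominal input k0(x), shape (m,)
    Lfh:     scalar L_f h(x)
    Lgh:     row vector L_g h(x), shape (m,)
    alpha_h: scalar alpha(h(x))
    """
    psi = Lfh + Lgh @ k0_x + alpha_h
    if psi >= 0.0:
        return k0_x                            # nominal input is already safe
    return k0_x - (psi / (Lgh @ Lgh)) * Lgh    # minimal correction along L_g h^T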

Definition 9.4 (CBF-stabilizable set) Let $h$ be a CBF and $V$ a CLF for (2.10) with an
associated locally Lipschitz controller $k_0 : \mathbb{R}^n\to\mathbb{R}^m$ satisfying

$$L_fV(x) + L_gV(x)k_0(x) \le -\gamma(V(x)), \quad \forall x\in\mathbb{R}^n,\tag{9.5}$$

where $\gamma\in\mathcal{K}$. Then, the set

$$V_l := \{x\in\mathbb{R}^n \mid V(x)\le l\}\tag{9.6}$$

is said to be CBF-stabilizable if either one of the following conditions holds:

1. For all $x\in\mathcal{C}\cap V_l$, we have
$$L_gV(x)L_gh(x)^\top \le 0.\tag{9.7}$$
2. Condition (9.7) only holds on some subset $S\subset\mathcal{C}\cap V_l$, and everywhere on the complement
of such a subset, we have
$$\underbrace{L_fh(x) + L_gh(x)k_0(x) + \alpha(h(x))}_{\psi(x)} \ge 0, \quad \forall x\in(\mathcal{C}\cap V_l)\setminus S.\tag{9.8}$$

Given a CBF-stabilizable set, the region of points where (9.7) holds represents the set of all
points where the CBF does not interfere with the stabilization objective. The second condition
from Definition 9.4 allows for the existence of points where the CBF could act to prevent
stabilization, but where the nominal controller k0 satisfies the CBF conditions without any
intervention and so such interference is not necessary. To study stability properties of the
closed-loop system under the CBF-QP controller, we must first ensure that the origin is
indeed an equilibrium point of the closed-loop system.

Lemma 9.1 Consider system (2.10) with $f(0) = 0$ and a locally Lipschitz nominal controller
$k_0 : \mathbb{R}^n\to\mathbb{R}^m$ satisfying $k_0(0) = 0$. Let $h : \mathbb{R}^n\to\mathbb{R}$ be a CBF for (2.10) on a
set $\mathcal{C}\subset\mathbb{R}^n$ as in (3.3) with $0\in\mathrm{Int}(\mathcal{C})$. Then, the origin is an equilibrium point for the
closed-loop system under the CBF-QP controller (3.12).
closed-loop system under the CBF-QP controller (3.12).

Proof Under the assumptions of the lemma, we have that

$$\psi(0) = \underbrace{L_fh(0)}_{=0} + \underbrace{L_gh(0)k_0(0)}_{=0} + \underbrace{\alpha(h(0))}_{>0} > 0.$$

It then follows from (9.4) that $k(0) = k_0(0)$, which implies that $f(0) + g(0)k(0) = 0$, so
that the origin is an equilibrium point for the closed-loop system, as desired. □

The following lemma will be helpful in proving the next result.

Lemma 9.2 Let f : Rn → Rn be a locally Lipschitz vector field and let ⊂ Rn be a


compact set. If the solution of the initial value problem

ẋ(t) = f (x(t)), ∀t ∈ I (x0 ),


x(0) = x0 ,

satisfies x(t) ∈ for all t ∈ I (x0 ), where I (x0 ) = [0, τmax ) is the solution’s maximal inter-
val of existence from an initial condition of x0 , then τmax = ∞.

We now show that the existence of a CBF-stabilizable set is sufficient to ensure asymptotic
stability of the origin under the CBF-QP controller.

Theorem 9.1 Suppose the conditions of Lemma 9.1 hold and let Vl ⊂ Rn be a CBF-
stabilizable set. Then, Vl ∩ C is forward invariant for the closed-loop system, closed-loop
trajectories exist for all time, and the origin is asymptotically stable for the closed-loop
system.

Proof It follows from Lemma 9.1 that the origin is an equilibrium point for the closed-loop
system. Now take the CLF $V$ from Definition 9.4 as a Lyapunov function candidate for the
closed-loop system. The Lie derivative of $V$ along the closed-loop vector field is

$$\dot V(x) = L_fV(x) + L_gV(x)k(x)
= \begin{cases}
L_fV(x) + L_gV(x)k_0(x), & \text{if } \psi(x)\ge0, \\
L_fV(x) + L_gV(x)k_0(x) - \dfrac{\psi(x)}{\|L_gh(x)\|^2}L_gV(x)L_gh(x)^\top, & \text{if } \psi(x)<0.
\end{cases}\tag{9.9}$$

Our objective is now to show that $\dot V(x) < 0$ for all $x\in(V_l\cap\mathcal{C})\setminus\{0\}$. If $V_l$ is CBF-stabilizable
and (9.7) holds for all $x\in V_l\cap\mathcal{C}$, then (9.9) in conjunction with (9.5) implies
that $\dot V(x) \le -\gamma(V(x))$ for all $x\in V_l\cap\mathcal{C}$. On the other hand, if (9.7) only holds on

some subset S ⊂ Vl ∩ C but for all points in (Vl ∩ C) \ S we have ψ(x) ≥ 0, then
V̇ (x) ≤ −γ (V (x)) still holds for all x ∈ Vl ∩ C by (9.9). Thus, in either case, if Vl is
CBF-stabilizable, we have

V̇ (x) ≤ −γ (V (x)), ∀x ∈ Vl ∩ C. (9.10)

We now show that Vl ∩ C is forward invariant for the closed-loop system and that system
trajectories are defined for all time. As V̇ (x) ≤ 0 for all x ∈ Vl ∩ C, then V̇ (x) ≤ 0 for
any sublevel set of $V$ contained in $V_l\cap\mathcal{C}$. This implies that a given trajectory $t\mapsto x(t)$
cannot exit Vl ∩ C via ∂Vl or via ∂C since h is a CBF and the controller in (9.4) ensures
ḣ(x) ≥ 0 for all x ∈ ∂C. As the closed-loop vector field is locally Lipschitz, given an initial
condition x0 ∈ Vl ∩ C there exists a maximal interval of existence I (x0 ) = [0, τmax ) and a
continuously differentiable trajectory x : I (x0 ) → Rn solving the initial value problem

ẋ(t) = f (x(t)) + g(x(t))k(x(t)), ∀t ∈ I (x0 ),


x(0) = x0 .

Based on the preceding argument, the solution of the initial value problem satisfies x(t) ∈
Vl ∩ C for all t ∈ I (x0 ), which implies Vl ∩ C is forward invariant for the closed-loop
system. To show that τmax = ∞ in the above initial value problem, we note that since C
is closed and Vl is compact, Vl ∩ C is a compact set, and it follows from Lemma 9.2
that $\tau_{\max} = \infty$. As the solution is defined for all time and, along such a solution, we have
$\dot V(x(t)) \le -\gamma(V(x(t)))$, $V$ is a Lyapunov function for the closed-loop system and the origin
is asymptotically stable. □

Provided 0 ∈ Int(C), h is a CBF, and V is a CLF, it is always possible to find a CBF stabi-
lizable set simply by taking l from (9.6) as an arbitrarily small positive constant. Clearly, such
an approach is highly conservative and finding large CBF-stabilizable sets is a challenging
problem. Despite the conservatism of these theoretical results, we demonstrate empirically
later on that simultaneous stabilization and safety can be achieved even in rather complex
environments. In what follows, we are not necessarily interested in stabilizing (2.10) to a
particular point asymptotically; rather, we are interested in ensuring that trajectories of the
closed-loop system reach a desired region in finite time. Fortunately, such a problem can be
addressed by rendering a point on the interior of such a set uniformly4 asymptotically stable.

Proposition 9.1 Let T ⊂ Rn be a non-empty set containing no isolated points and let
$0\in\mathrm{Int}(\mathcal{T})$. Provided the conditions of Theorem 9.1 hold, there exists a finite $T\in\mathbb{R}_{\ge0}$
such that $x(T)\in\mathcal{T}$.

4 Asymptotic stability is equivalent to uniform asymptotic stability for time-invariant systems.



Proof Since $0\in\mathrm{Int}(\mathcal{C})$, $0\in\mathrm{Int}(V_l)$, and $0\in\mathrm{Int}(\mathcal{T})$, then $0\in\mathrm{Int}(\mathcal{C})\cap\mathrm{Int}(V_l)\cap\mathrm{Int}(\mathcal{T})$.
As the intersection of a finite number of set interiors is equal to the interior of the set
intersection, we have $0\in\mathrm{Int}(\mathcal{C}\cap V_l\cap\mathcal{T})$. As the origin is contained in the interior of
$\mathcal{C}\cap V_l\cap\mathcal{T}$, there must exist a $\delta\in\mathbb{R}_{>0}$ such that $B_\delta(0)\subset\mathcal{C}\cap V_l\cap\mathcal{T}$. Now note
that the origin of the closed-loop system is asymptotically stable by Theorem 9.1. Since the
closed-loop system is time-invariant, this implies that the origin is uniformly asymptotically
stable, which implies that for each $\delta'\in\mathbb{R}_{>0}$ there exists a time $T = T(\delta')$ such that $x(t)\in B_{\delta'}(0)$ for all $t\ge T$. Taking $\delta' < \delta$, we have $B_{\delta'}(0)\subset B_\delta(0)$, which implies that $x(t)$ enters
$B_{\delta'}(0)$ by time $T < \infty$, which also implies $x(t)$ enters $\mathcal{T}$ by time $T < \infty$, as desired. □

Remark 9.1 Although the results of this section are tailored to the special case in which
the origin is the desired equilibrium point to be asymptotically stabilized, the same ideas
are applicable to an arbitrary equilibrium point via a coordinate transformation on the state
and, in certain cases, the control input. An explicit example of this is provided in Sect. 9.4.

9.3 A Hybrid Systems Approach to LTL Control Synthesis

In this section, we illustrate how the CBF controller from the previous section can be used
in a hybrid system framework to synthesize a control policy capable of satisfying an LTL
formula. To this end, consider an LTL formula ϕ defined over a finite set of observations
O corresponding to a collection of ROIs satisfying Assumption 9.1. Given a CBF stabiliz-
able set and a CBF-QP-based controller capable of steering (2.10) to a target set T while
remaining within a safe set C, our objective is to coordinate the sequence of sets that are
reached/avoided by (2.10) in order to generate a word that satisfies the specification. To
formally specify the switching logic used to select among a family of feedback controllers
for (2.10), we consider augmenting the continuous dynamics with the discrete dynamics of
the DBA corresponding to ϕ to form a hybrid control system. To enforce satisfaction of the
specification over the resulting product hybrid system, we leverage the notion of the distance
to acceptance function (DTA). The DTA function is, in essence, a Lyapunov-like function
for a given DBA A = (Q, q0 , O, δA , Q f ) corresponding to an LTL formula ϕ in that it is
non-negative for all q ∈ Q, positive for all q ∈ Q \ Q f , and zero for all q ∈ Q f . Rather than
enforcing convergence of trajectories to an equilibrium, however, the DTA will be used to
enforce convergence to the set of accepting states infinitely often.
To formalize the preceding discussion, let A = (Q, q0 , O, δA , Q f ) be a DBA corre-
sponding to ϕ and let P(qi , q j ) denote the set of all paths in A from qi ∈ Q to q j ∈ Q. More
formally,

$$P(q_i, q_j) := \{q_1\ldots q_n \mid q_1 = q_i,\ q_n = q_j,\ \forall k\in[1, n-1],\ \exists o\in O \text{ s.t. } q_{k+1} = \delta_{\mathcal{A}}(q_k, o)\}.\tag{9.11}$$
The distance between any two states in A is then defined as


$$d(q, q') := \begin{cases}
\min_{\mathbf{q}\in P(q,q')}\Upsilon(\mathbf{q}), & \text{if } P(q, q')\ne\emptyset, \\
\infty, & \text{if } P(q, q') = \emptyset,
\end{cases}\tag{9.12}$$

where $\mathbf{q}\in P(q, q')$ denotes a path from $q$ to $q'$ and $\Upsilon(\mathbf{q})$ denotes the number of states in
$\mathbf{q}$. The DTA can then be computed as

$$V_d(q) := \min_{q'\in Q_f} d(q, q'),\tag{9.13}$$

and represents the minimum number of transitions needed to reach an accepting state from
any given $q\in Q$. These properties imply that enforcing satisfaction of a given LTL formula
$\varphi$ is equivalent to ensuring runs of $\mathcal{A}$ reach a set of states where $V_d(q) = 0$ infinitely often.
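
Since (9.12) and (9.13) involve only shortest paths in a finite graph, the DTA can be computed by a breadth-first search from the accepting states over the reversed transition graph of A. The sketch below counts transitions, so that Vd vanishes exactly on Qf, and assigns infinity to states from which Qf is unreachable:

from collections import deque

def dta(Q, delta, Q_f):
    """Distance to acceptance (9.13) via BFS over the reversed DBA graph."""
    preds = {q: set() for q in Q}
    for (q, o), q_next in delta.items():
        preds[q_next].add(q)
    V = {q: float('inf') for q in Q}
    queue = deque(Q_f)
    for q in Q_f:
        V[q] = 0
    while queue:
        q_next = queue.popleft()
        for q in preds[q_next]:
            if V[q] == float('inf'):          # first visit = shortest distance
                V[q] = V[q_next] + 1
                queue.append(q)
    return V

# DTA for the partial DBA of Example 9.2: V_d(q2) = 0, V_d(q3) = 1,
# V_d(q1) = 2, and V_d(q0) = 3.
print(dta({'q0', 'q1', 'q2', 'q3'},
          {('q0', 'gather'): 'q1', ('q1', 'base'): 'q3',
           ('q3', 'recharge'): 'q2', ('q2', 'gather'): 'q1'}, {'q2'}))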
The following results outline some useful properties of the DTA.

Lemma 9.3 The distance to acceptance (DTA) function from (9.13) satisfies the following
properties:

(i) An accepting run wq of A cannot contain a state q ∈ Q such that Vd (q) = ∞.


(ii) For each $q\in Q$, if $V_d(q) > 0$ and $V_d(q) \ne \infty$, then there exist a state $q'$ and an observation
$o\in O$ such that $q' = \delta_{\mathcal{A}}(q, o)$ and $V_d(q') < V_d(q)$.

Proof (i) Recall that a run $w_q = w_q(0)w_q(1)w_q(2)\cdots\in Q^\omega$ is accepting if it intersects
with $Q_f$ infinitely many times. For the sake of contradiction, suppose there exists a $j\in\mathbb{N}$
such that $V_d(w_q(j)) = \infty$. Based on (9.12) and (9.13) this implies that $P(w_q(j), q') = \emptyset$
for all $q'\in Q_f$. That is, there exists no path in $\mathcal{A}$ starting from $w_q(j)$ and ending in an
accepting state. Thus, there exists no $j' > j$ such that $w_q(j')\in Q_f$, which contradicts the
initial assumption that $w_q$ is an accepting run of $\mathcal{A}$. Hence, there cannot exist an accepting
run containing a state $q\in Q$ such that $V_d(q) = \infty$, and (i) follows.

(ii) Since $q\in Q$ satisfies $V_d(q) > 0$, then $q\notin Q_f$. Moreover, since $V_d(q)\ne\infty$, (9.12)
implies there exists a finite shortest path $q_1q_2\ldots q_n$ such that $q_1 = q$ and $q_n\in Q_f$. It
follows from Bellman's Principle of Optimality that the finite path $q_2\ldots q_n$ is the shortest
path starting at $q_2$ to reach a state in $Q_f$, hence

$$V_d(q) = d(q, q_2) + V_d(q_2).$$

Since $d(q, q_2) > 0$, the preceding equality only holds if $V_d(q_2) < V_d(q)$. Since $P(q, q_2)\ne\emptyset$,
with $P$ as in (9.11), there must exist an observation $o\in O$ such that $q_2 = \delta_{\mathcal{A}}(q, o)$,
and (ii) follows from taking $q' = q_2$. □

The preceding lemma implies that for any $q\in Q\setminus Q_f$ such that $V_d(q)\ne\infty$, there always
exists a $q'\in Q$ such that $V_d(q') < V_d(q)$. This observation allows us to define the set $O_q :=
\{o\in O \mid \exists q'\in Q \text{ s.t. } q' = \delta_{\mathcal{A}}(q, o)\}$, which is the set of all admissible observations for state

q, and
$$\bar O_q := \operatorname*{argmin}_{o\in O_q} V_d(\delta_{\mathcal{A}}(q, o)).\tag{9.14}$$

For each $q\in Q$, the set $\bar O_q$ in (9.14) represents the set of all observations that force a
transition to the reachable state with the minimum DTA. For $q\in Q\setminus Q_f$ this guarantees that
$V_d(q') < V_d(q)$, where $q' = \delta_{\mathcal{A}}(q, o)$ and $o\in\bar O_q$, whereas for $q\in Q_f$ inputting an observation
$o\in\bar O_q$ forces a transition to the reachable state that incurs the smallest increase in DTA. We
now leverage the properties of the DTA to construct a product hybrid system that captures
the continuous behavior of (2.10) and the discrete behavior of $\mathcal{A}$. To formalize this idea, we
first introduce the notion of a hybrid system used in this chapter:

Definition 9.5 (Hybrid system) A hybrid system H is a tuple

H = (Q, X , Init, Dom, F , E, Guard),

where

• $Q$ is a finite set of discrete modes;
• $\mathcal{X}\subset\mathbb{R}^n$ is a set of continuous states;
• Init $\subset Q\times\mathcal{X}$ is a set of initial states;
• Dom $: Q \Rightarrow \mathcal{X}$ is a set-valued map that associates to each discrete mode a set Dom$(q)\subset\mathcal{X}$ that describes the set of admissible continuous states while in mode $q$;
• $F : Q\times\mathcal{X}\to\mathbb{R}^n$ is a mapping that associates to each discrete mode $q\in Q$ a vector field $F_q : \mathcal{X}\to\mathbb{R}^n$ that characterizes the evolution of the continuous states;
• $E\subset Q\times Q$ is a set of edges that describe the admissible transitions between discrete modes;
• Guard $: E \Rightarrow \mathcal{X}$ is a set-valued map that associates to each edge $(q, q')\in E$ a guard set Guard$(q, q')\subset\mathcal{X}$ whose attainment by the continuous state enables the transition from $q$ to $q'$.

Next, we define the semantics of a hybrid system:

Definition 9.6 (Semantics of a hybrid system) An execution of $\mathcal{H}$ is interpreted over a
hybrid time set, which is a finite or infinite sequence of intervals of the form $\tau = \{I_i\}_{i=0}^{N}$
such that: $I_i = [\tau_i, \tau_i']$ for all $i < N$; if $N < \infty$, then $I_N$ may be right-open or closed;
and $\tau_i \le \tau_i' = \tau_{i+1}$ for all $i$, where each $\tau_i'$ represents a time at which a discrete transition
takes place. Given a hybrid time set $\tau$, we denote by $\langle\tau\rangle$ the set $\{0, 1, \ldots, N\}$ if $N < \infty$
and $\{0, 1, \ldots\}$ if $N = \infty$. Given an initial time $t_0\in\mathbb{R}_{\ge0}$, an execution of $\mathcal{H}$ is a collection
$\xi = (\tau, q, x)$, where $\tau$ is a hybrid time set, $q : \langle\tau\rangle\to Q$ is a mapping from $\langle\tau\rangle$ to the
set of discrete modes, and $x = \{x_i : i\in\langle\tau\rangle\}$ is a collection of differentiable maps $x_i :
I_i\to\mathcal{X}$, such that: $(q(0), x_0(t_0))\in\text{Init}$; for all $t\in[\tau_i, \tau_i')$ we have $x_i(t)\in\text{Dom}(q(i))$
and $\dot x_i(t) = F_{q(i)}(x_i(t))$; for all $i\in\langle\tau\rangle\setminus\{N\}$ we have $(q(i), q(i+1))\in E$ and $x_i(\tau_i')\in
\text{Guard}(q(i), q(i+1))$.

To study the continuous system and the discrete DBA corresponding to an LTL formula
in a unified framework, let $\mathcal{H} = (Q, \mathcal{X}, \text{Init}, \text{Dom}, F, E, \text{Guard})$ be a hybrid system as in
Definition 9.5, where

• $Q$ is the set of discrete modes inherited from $\mathcal{A}$;
• $\mathcal{X} = \mathbb{R}^n$ is the continuous state space;
• $\text{Init} = \{q_0\}\times\mathcal{X}$ is the set of initial states, where $q_0$ is inherited from $\mathcal{A}$;
• $\text{Dom}(q) = \mathcal{X}\setminus\cup_{o\in O_q}\ell^{-1}(o)$ is the set of all states in $\mathcal{X}$ that lie outside the regions of interest corresponding to observations in $O_q$;
• $F(q, x) = f(x) + g(x)k(q, x)$, where $f$ and $g$ are the vector fields from (2.10) and $k : Q\times\mathcal{X}\to\mathbb{R}^m$ is a hybrid feedback law to be specified;
• $E = \{(q, q')\in Q\times Q \mid \exists o\in O_q \text{ s.t. } q' = \delta_{\mathcal{A}}(q, o)\}$ is a set of edges such that a transition from mode $q$ to mode $q'$ can be taken if there exists an observation in $O_q$ enabling such a transition in $\mathcal{A}$;
• $\text{Guard}(q, q') = \{x\in\ell^{-1}(o) \mid q' = \delta_{\mathcal{A}}(q, o)\}$ is the ROI corresponding to the observation $o$ such that $q' = \delta_{\mathcal{A}}(q, o)$.

The interpretation of the above definition is that for each $q\in Q$ the continuous state
evolves within the domain $\text{Dom}(q)$ until $\text{Guard}(q, q')$ is reached, at which point a discrete
transition governed by the dynamics of $\mathcal{A}$ takes place. Unfortunately, the structure of $\mathcal{H}$
does not take into account unsafe regions that correspond to observations that should not
appear in the accepting word $w_o$. To address this limitation we associate with each $q\in Q$
a safe set $\mathcal{C}_q\subset\mathcal{X}$ that the continuous state must remain in while in mode $q$. To formalize
this notion, consider the set $O_q^\neg := O_q\setminus\bar O_q$, which is the set of all observations that do not
minimize the DTA when transitioning out of the current mode. For each $o\in O_q^\neg$ define the
set $\mathcal{C}_{q,o} := \mathcal{X}\setminus\mathrm{Int}(\mathcal{R}_o) = \{x\in\mathcal{X} \mid h_o(x)\ge0\}$, which is used to construct the overall safe
set for mode $q$ as $\mathcal{C}_q = \cap_{o\in O_q^\neg}\mathcal{C}_{q,o}$, with the convention that points on the boundary of such
ROIs are considered safe.5 To ensure the safe set associated with each $q\in Q$ can be rendered
forward invariant we make the following assumption.

Assumption 9.2 For each q ∈ Q the set Oq¬ = {oq¬ } is a singleton and h oq¬ is a CBF for
(2.10) over Cq .

The above assumption is motivated by the fact that our results regarding CBF-stabilizable
sets are developed for only a single CBF. This assumption may appear very restrictive at
first glance: after all, one would likely need to consider multiple safe sets given a complex
LTL specification. However, multiple CBFs can be combined smoothly using an under-approximation
of the min operator to obtain a single CBF that captures multiple safe sets.
To this end, note that multiple barrier functions can be combined in a nonsmooth fashion as

5 If this is undesirable in practice one can always take $\mathcal{C}_{q,o} := \{x\in\mathcal{X} \mid h_o(x)\ge\epsilon\}$ for some $\epsilon\in\mathbb{R}_{>0}$.

$$h_1\wedge h_2 = \min\{h_1, h_2\},$$

where $h_1$ and $h_2$ are CBFs. To avoid the use of nonsmooth analysis, given a collection of
CBFs $h_i$, $i\in\{1,\ldots,N\}$, the min operator can be under-approximated as

$$-\ln\left(\sum_{i=1}^{N}\exp(-h_i(x))\right) \le \min_{i\in\{1,\ldots,N\}} h_i(x),\tag{9.15}$$

which provides a conservative sufficient condition for the satisfaction of multiple CBF
constraints. Of course, it may be challenging to verify that the above smooth combination
of CBFs is a CBF in its own right, and for the remainder of this chapter we leverage
Assumption 9.2 to ensure that such a smooth combination indeed produces a CBF.
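
Numerically, the under-approximation (9.15) is a negated log-sum-exp and can be evaluated stably with SciPy. The sketch below also includes a sharpness parameter κ, a common extension that is not part of (9.15) itself but tightens the bound as κ grows:

import numpy as np
from scipy.special import logsumexp

def smooth_min(h, kappa=1.0):
    """Under-approximation of min_i h_i; kappa = 1 recovers (9.15)."""
    return -logsumexp(-kappa * np.asarray(h)) / kappa

h = np.array([0.7, 0.3, 1.2])        # hypothetical CBF values h_i(x)
print(smooth_min(h), h.min())        # smooth_min(h) <= min_i h_i(x)
print(smooth_min(h, kappa=10.0))     # closer to the true min for larger kappa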
To specify a control policy for the closed-loop vector field associated with each $q\in Q$,
let $\{\Phi_q\}_{q\in Q}$ be a collection of diffeomorphisms such that $\Phi_q(x) := x - x_d(q)$, where $x_d :
Q\to\mathrm{Int}(\mathcal{R}_o)$ with $o\in\bar O_q$ is a mapping that associates to each $q\in Q$ a point in the interior
of the region that should be visited by the continuous trajectory of $\mathcal{H}$ to enable a transition to
a mode with the minimum DTA.6 To ensure that the continuous trajectory of $\mathcal{H}$ is regulated
to $\mathcal{R}_o$ for $o\in\bar O_q$, we associate to each $q\in Q$ a CLF $V_q$ satisfying

$$\gamma_{1,q}(\|\Phi_q(x)\|) \le V_q(\Phi_q(x)) \le \gamma_{2,q}(\|\Phi_q(x)\|),$$

for some $\gamma_{1,q}, \gamma_{2,q}\in\mathcal{K}_\infty$, a locally Lipschitz controller $k_{0,q}$ satisfying

$$L_fV_q(\Phi_q(x)) + L_gV_q(\Phi_q(x))k_{0,q}(\Phi_q(x)) \le -\gamma_{3,q}(V_q(\Phi_q(x))), \quad \forall x\in\mathbb{R}^n,\tag{9.16}$$

and a CBF-QP as in (3.12) that filters the nominal policy $k_{0,q}$ to ensure the system stays in the
safe set associated with mode $q$. Now let $\{V_q\}_{q\in Q}$ be the collection of CLFs associated with
each mode and, for each $q\in Q$ and $l_q\in\mathbb{R}_{>0}$, consider the set $V_{l,q} := \{x\in\mathbb{R}^n \mid V_q(\Phi_q(x)) \le l_q\}$. We now make the following assumption to ensure that the definition of $\mathcal{H}$ is well-posed
in the sense that there exists a suitable continuous feedback law that drives the continuous
trajectory of $\mathcal{H}$ from one ROI to another while remaining safe.

Assumption 9.3 Given an execution of H, ξ = (τ, q, x), the set Vl,q(i) is CBF-stabilizable
for all i ∈ Z≥0 and x i (τi ) ∈ Vl,q(i) ∩ Cq(i) for all i ∈ Z≥0 .

The above assumption ensures that there exists a CBF-stabilizable set for each mode which
the hybrid execution traverses and that, upon transitioning to each mode, the trajectory is
contained in the intersection of the CBF-stabilizable set and the safe set. The controller
ultimately applied to the system is

6 In the case when $\bar O_q$ is not a singleton, the particular $o\in\bar O_q$ used to define $q\mapsto x_d(q)$ can be
chosen arbitrarily from $\bar O_q$.

$$\begin{aligned}
k_q(x) = \operatorname*{argmin}_{u\in\mathbb{R}^m}\ & \tfrac{1}{2}\|u - k_{0,q}(x)\|^2 \\
\text{subject to}\ & L_fh_{o_q^\neg}(x) + L_gh_{o_q^\neg}(x)u \ge -\alpha(h_{o_q^\neg}(x)),
\end{aligned}\tag{9.17}$$

where $k_{0,q}$ is the nominal hybrid CLF policy from (9.16) that drives the trajectory to the
desired ROI and $h_{o_q^\neg}$ is the CBF from Assumption 9.2 that represents the safe set associated
with mode $q$. The following proposition shows that if there exists a suitable sequence of
CBF-stabilizable sets associated with the sequence of stabilization/safety problems dictated
by the structure of $\mathcal{H}$, then the word generated by the execution of $\mathcal{H}$ under the hybrid
CBF-based policy (9.17) satisfies $\varphi$.

Proposition 9.2 Consider system (2.10), an LTL formula ϕ defined over a finite set of
observations O, a hybrid product system H as in Definition 9.5, and assume there exists
an accepting run of A. Provided Assumptions 9.1–9.3 hold, then the word produced by the
execution of H under control (9.17) is accepted by A.

Proof We start by noting that any accepting run of A cannot contain a mode q ∈ Q for
which Vd(q) = ∞ by Lemma 9.3. Since A is deterministic and there is only one possible
initial state, then, by assumption, Vd(q(0)) < ∞. Hence, by Lemma 9.3, there exists
some o ∈ O such that Vd(δA(q(0), o)) < Vd(q(0)). Provided x0(τ0) ∈ Vl,q(0) ∩ Cq(0) and
Vl,q(0) is CBF-stabilizable, then Theorem 9.1 and Proposition 9.1 imply x0(t) reaches
Guard(q(0), q′) for some q′ such that Vd(q′) < Vd(q(0)) in finite time and remains in
Cq(0) for all times beforehand. Noting that Q is a finite set and repeatedly applying the same
argument under the hypothesis that xi(τi) ∈ Vl,q(i) ∩ Cq(i) and Vl,q(i) is CBF-stabilizable
for all i ∈ Z≥0 implies that there exists a finite sequence of modes q(0), q(1), . . . , q(N̄) such
that Vd(q(0)) > Vd(q(1)) > · · · > Vd(q(N̄)) = 0. The assumption that Vl,q(i) is CBF-
stabilizable for each i implies that these transitions occur without the continuous
trajectory generating any observations that should not appear in an accepting word. Since
it is assumed there exists an accepting run of A and such a run cannot contain any states
with an infinite DTA, there must exist some o ∈ O such that Vd(δA(q(N̄), o)) < ∞. Under
the hypothesis that Vl,q(N̄) is CBF-stabilizable and that xN̄(τN̄) ∈ Vl,q(N̄) ∩ Cq(N̄), then
Theorem 9.1 and Proposition 9.1 again imply xN̄(t′) ∈ Guard(q(N̄), q(N̄ + 1)) for some
finite t′. Repeatedly applying the same argument using q(N̄ + 1) as the initial mode implies
that modes with Vd(q) = 0 are visited infinitely often while always remaining in the cor-
responding safe set, which, by Definition 9.3, implies satisfaction of ϕ. □
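Before moving on, we note that the QP (9.17) has a single affine inequality constraint and therefore admits a closed-form solution. The sketch below is our own illustration (function names are ours, and α is taken to be linear, as in the numerical examples later in this chapter); it implements the projection of the nominal input onto the half-space of safe inputs.

import numpy as np

# A hedged sketch: the QP in (9.17) has one constraint a^T u >= b with
# a = (L_g h(x))^T and b = -alpha(h(x)) - L_f h(x), so its minimizer is
#   u* = k0(x)                                if a^T k0(x) >= b,
#   u* = k0(x) + (b - a^T k0(x)) a / ||a||^2  otherwise.

def cbf_qp_filter(k0_x, Lfh, Lgh, h, alpha=lambda s: 10.0 * s):
    a = np.asarray(Lgh, dtype=float)            # row vector L_g h(x)
    b = -alpha(h) - Lfh                         # constraint: a @ u >= b
    if a @ k0_x - b >= 0.0:                     # nominal input already safe
        return k0_x
    return k0_x + (b - a @ k0_x) * a / (a @ a)  # minimal correction

# Hypothetical example: nominal input pushes toward the unsafe set.
u = cbf_qp_filter(k0_x=np.array([1.0, 0.0]), Lfh=0.0,
                  Lgh=np.array([-1.0, 0.5]), h=0.05)
print(u)  # [0.6, 0.2], which satisfies the constraint with equality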

9.4 Temporal Logic Guided Reinforcement Learning

In the previous section, we illustrated how controllers based on CBFs and CLFs could be
coordinated to solve a sequence of reach-avoid problems towards ultimately satisfying an
LTL specification. A significant limitation of this approach is that it requires constructing a

potentially large number of CBFs and CLFs to accomplish the LTL objective. In this section,
we relax the requirement of knowledge of CLFs for the stabilization tasks by leveraging the
model-based reinforcement learning (MBRL) framework from Chap. 8 to solve a sequence of
optimal control problems (whose associated value functions are CLFs) to obtain stabilizing
control policies. Shifting to this MBRL setting also allows for considering systems with
uncertain dynamics using the adaptive CBFs (aCBFs) from Chap. 5.
To extend the framework of Chap. 8 to this setting, we first demonstrate how such a
framework can be used to steer the trajectory to a desired target set T ⊂ Rn while remaining
in a safe set C ⊂ Rn . To this end, we consider the uncertain system from (4.1)

ẋ = f (x) + F(x)θ + g(x)u,

restated above for convenience. We make the following assumption on the control directions
to ensure that the above system can be stabilized to arbitrary points.

Assumption 9.4 The matrix of control directions g is full row rank for all x ∈ Rn, which
guarantees existence of the Moore-Penrose pseudo-inverse g†(x) := g(x)⊤(g(x)g(x)⊤)⁻¹
for all x ∈ Rn.

Under Assumption 9.4, given a desired state xd ∈ T , where T ⊂ Rn is a target set we wish
to reach, selecting u d := −g † (xd )( f (xd ) + F(xd )θ ) ensures xd is an equilibrium point for
ẋ = f (x) + F(x)θ + g(x)u d . Given the feed-forward input u d , we consider decomposing
the control for (4.1) as u = u d + μ, where μ : Rn → Rm is a feedback law to be determined
that regulates (4.1) to T . To facilitate this approach, define the regulation error η := x − xd ,
let Φ : Rn → Rn be a diffeomorphism such that η = Φ(x) and x = Φ⁻¹(η), and consider
the auxiliary dynamical system

η̇ = f(η) + F(η)θ + g(η)μ,        (9.18)

where

f(η) := f(Φ⁻¹(η)) − g(Φ⁻¹(η)) g†(xd) f(xd),
F(η) := F(Φ⁻¹(η)) − g(Φ⁻¹(η)) g†(xd) F(xd),
g(η) := g(Φ⁻¹(η)).
Note that η̇ = ẋ − ẋd = ẋ, so that trajectories t ↦ x(t) of (4.1) can be uniquely recovered
from trajectories t ↦ η(t) of (9.18) as x(t) = Φ⁻¹(η(t)). Since, under Assumption 9.4,
system (4.1) can always be put into the form of (9.18) to achieve stabilization to a point other
than the origin, in what follows our development will focus on (4.1) for ease of exposition.
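As a concrete illustration of this construction, the following sketch (our own code, with a parameter estimate θ̂ standing in for the unknown θ, since θ itself is unavailable in practice) computes the feed-forward input and evaluates the shifted dynamics in η = x − xd.

import numpy as np

# A minimal sketch under Assumption 9.4 (g full row rank); all names are ours.

def feedforward(xd, f, F, g, theta_hat):
    gx = g(xd)
    g_pinv = gx.T @ np.linalg.inv(gx @ gx.T)    # Moore-Penrose pseudo-inverse
    return -g_pinv @ (f(xd) + F(xd) @ theta_hat)

def auxiliary_dynamics(eta, mu, xd, f, F, g, theta_hat):
    x = eta + xd                                # x = Phi^{-1}(eta)
    u = feedforward(xd, f, F, g, theta_hat) + mu
    return f(x) + F(x) @ theta_hat + g(x) @ u   # eta_dot = x_dot since xd is constant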
As a general approach to obtaining a stabilizing controller and stability certificate, we follow
the same procedure as in Sect. 8.1 by constructing the cost functional

J(x0, u(·)) = ∫₀^∞ ℓ(x(s), u(s)) ds,

where the running cost ℓ is of the same form as (8.2), which can be used to define the
function
V∗(x) = inf_{u(·)} J(x, u(·)),

which satisfies the Hamilton-Jacobi-Bellman equation (HJB)


 
0 = inf_{u∈Rm} [ L f V∗(x) + L F V∗(x)θ + L g V∗(x)u + ℓ(x, u) ].

Provided the value function is continuously differentiable (see Assumption 8.1), the optimal
policy can then be derived from the HJB as

k∗(x) = −½ R⁻¹ (L g V∗(x))⊤.        (9.19)

Since the value function V ∗ is a CLF (see Theorem 8.1) and the optimal policy k ∗ is an
explicit example of a stabilizing controller using V ∗ as a Lyapunov function, they can be
used within the framework from the previous section to develop a hybrid control scheme that
enforces satisfaction of an LTL specification. Of course, such an approach would require
solving multiple HJB equations (one for each reach-avoid problem), which would quickly
become computationally intractable for complex systems and specifications. Fortunately,
the value function and policy for each optimal control problem can be safely learned online
in real-time using the MBRL algorithm developed in Chap. 8.

Remark 9.2 In the special case that (4.1) is a linear system of the form ẋ = Ax + Bu,
where the entries of A are possibly unknown and are treated as uncertain parameters, the
value function of the optimal control problem for the resulting auxiliary system (9.18) is
invariant to the choice of xd ∈ Rn. To see this, note that the feed-forward input is given
by ud = −B⊤(BB⊤)⁻¹Axd, implying that the auxiliary system (9.18) becomes η̇ = ẋ = Ax −
BB⊤(BB⊤)⁻¹Axd + Bμ = Ax − Axd + Bμ = Aη + Bμ for any choice of xd. Provided
the running cost is the same for any two optimal control problems characterized by (8.1),
then the HJB is also the same for each problem, which implies they share the same value
function. This will become important later on as, rather than learning a family of value
functions, it is only necessary to learn one value function.
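A quick numerical sanity check of this remark is sketched below (illustrative only; the matrices are randomly generated and the names are ours): the shifted dynamics indeed do not depend on xd.

import numpy as np

# Verify that A x + B(ud + mu) = A(x - xd) + B mu for ud = -B^T (B B^T)^{-1} A xd.
rng = np.random.default_rng(0)
A = rng.standard_normal((2, 2))
B = rng.standard_normal((2, 3))                      # full row rank almost surely
for _ in range(3):
    xd = rng.standard_normal(2)
    ud = -B.T @ np.linalg.inv(B @ B.T) @ (A @ xd)    # feed-forward input
    x, mu = rng.standard_normal(2), rng.standard_normal(3)
    lhs = A @ x + B @ (ud + mu)                      # original closed-loop dynamics
    rhs = A @ (x - xd) + B @ mu                      # A*eta + B*mu with eta = x - xd
    assert np.allclose(lhs, rhs)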

Recall that the MBRL approach proceeds by parameterizing the value function and opti-
mal policy over a compact set X ⊂ Rn as

V̂(x, Ŵc) = Ŵc⊤φ(x),

k̂(x, Ŵa) = −½ R⁻¹ g(x)⊤ (∂φ/∂x)(x)⊤ Ŵa,

where Ŵc, Ŵa are the parameters and φ(x) is a vector of features, and then passing the
approximated policy through a robust adaptive CBF (RaCBF) QP (8.20) to guarantee forward
invariance of a safe set C. While the RaCBF-QP acts to shield the approximated policy
from taking unsafe actions, an estimated model of the system dynamics, which is learned
using the Concurrent Learning techniques from Sect. 4.2, is used to simulate potentially
unsafe actions at various sample points to generate data for updating the value function and
policy parameters. Conditions under which this MBRL algorithm converges to a neighborhood of the
true value function and policy were introduced in Theorem 8.2 and stability of the origin
under the learned policy is established in Theorem 8.4. A challenge with directly using the
Lyapunov function from Theorem 8.4 in the CBF-stabilizable set framework introduced in
the present chapter is that such a Lyapunov function guarantees convergence of a composite
state trajectory consisting of the system states and weight estimation errors, which makes it
challenging to develop precise bounds on the trajectory of the system itself.
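To ground this machinery, the following sketch (our own illustrative code, assuming the quadratic feature vector φ(x) = [x1², x1x2, x2²]⊤ used in the examples of Sect. 9.5; all names are ours) shows how the critic, the actor, and the Bellman error at a sample point might be evaluated.

import numpy as np

# A minimal sketch of the actor-critic parameterization and simulated experience.

def phi(x):
    return np.array([x[0]**2, x[0]*x[1], x[1]**2])

def grad_phi(x):                       # Jacobian of phi, shape (3, 2)
    return np.array([[2*x[0], 0.0],
                     [x[1],   x[0]],
                     [0.0,    2*x[1]]])

def V_hat(x, Wc):                      # critic: V_hat = Wc^T phi(x)
    return Wc @ phi(x)

def k_hat(x, Wa, g, R_inv):            # actor: -(1/2) R^{-1} g^T dphi^T Wa
    return -0.5 * R_inv @ g(x).T @ grad_phi(x).T @ Wa

def bellman_error(x, Wc, Wa, f, F, g, theta_hat, Q, R, R_inv):
    # Evaluate the Bellman residual at a sample point using the *estimated*
    # model, so no potentially unsafe action is applied to the physical system.
    u = k_hat(x, Wa, g, R_inv)
    xdot_hat = f(x) + F(x) @ theta_hat + g(x) @ u
    return Wc @ (grad_phi(x) @ xdot_hat) + Q(x) + u @ R @ u

Since the Bellman error is evaluated along the estimated dynamics, unsafe actions are only ever simulated, never applied to the plant. In the following result we provide an alternative method to establish stability of the closed-loop system under the learning-based policy k̂ using only the value function as a Lyapunov function.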

Theorem 9.2 Consider system (4.1) under the influence of the learning-based policy (8.10).
Suppose the estimated weights and parameters are updated according to (8.23), (8.24),
(8.25), and (5.25). Let B̄r (0) ⊂ X be a closed ball of radius r ∈ R>0 contained within
the compact set over which the value function parameterization (8.8) is valid. Provided the
conditions of Theorem 8.2 hold, and

ι := γ3−1 (2ν) < γ2−1 (γ1 (r )),

where γ1 , γ2 , γ3 ∈ K satisfy

γ1(‖x‖) ≤ V∗(x) ≤ γ2(‖x‖),   γ3(‖x‖) ≤ Q(x),

for all x ∈ X and


 
ν := ¼‖Gε(x)‖X + ½‖W⊤Gφ(x) + (∂ε/∂x)(x) GR(x) (∂φ/∂x)(x)⊤‖X W̄a,

where W̄a is a positive constant satisfying ‖W̃a(t)‖ ≤ W̄a for all t ≥ 0, then for any trajectory
t  → x(t) with an initial condition such that

‖x(0)‖ ≤ γ2⁻¹(γ1(r)),

there exists a time T ∈ R≥0 and a β ∈ KL such that

‖x(t)‖ ≤ β(‖x0‖, t),   ∀t ∈ [0, T],
‖x(t)‖ ≤ γ1⁻¹(γ2(ι)),   ∀t ∈ [T, ∞).

Proof Take V∗ as a Lyapunov function candidate for the closed-loop system under the
policy from (8.10). From (8.58), V̇∗ can be upper bounded as

 
V̇∗ ≤ −Q(x) + ¼‖Gε(x)‖X + ½‖W⊤Gφ(x) + (∂ε/∂x)(x) GR(x) (∂φ/∂x)(x)⊤‖X ‖W̃a‖.

If the conditions of Theorem 8.2 hold, then t ↦ W̃a(t) is bounded such that ‖W̃a(t)‖ ≤ W̄a
for all t ∈ R≥0 . Hence, V̇ ∗ can be further bounded as

V̇∗ ≤ −Q(x) + ν ≤ −γ3(‖x‖) + ν ≤ −½γ3(‖x‖),   ∀‖x‖ ≥ γ3⁻¹(2ν) = ι.        (9.20)

Hence, provided that ι < γ2−1 (γ1 (r )) then Theorem 8.3 implies the existence of a β ∈ KL
and a time T ∈ R≥0 such that for any initial condition x0 := x(0) satisfying ‖x0‖ ≤
γ2⁻¹(γ1(r)) the resulting solution t ↦ x(t) satisfies

‖x(t)‖ ≤ β(‖x0‖, t),   ∀t ∈ [0, T],
‖x(t)‖ ≤ γ1⁻¹(γ2(ι)),   ∀t ∈ [T, ∞),

as desired. □

The preceding theorem implies that all trajectories t ↦ x(t) starting in the set {x ∈
X | ‖x‖ ≤ γ2⁻¹(γ1(r))} converge to the smaller set {x ∈ X | ‖x‖ ≤ γ1⁻¹(γ2(ι))} in
finite time and remain there for all times thereafter. Since the inverse of a class K func-
tion is a class K function and the composition of class K functions is again a class K
function, the ultimate bound is a class K function of ι and therefore decreases with decreas-
ing ι. Based on the preceding theorem, the ultimate bound can be decreased by decreasing
‖∇ε‖X (by choosing a more expressive function approximator), by increasing λmin(R) in
the cost function, and by choosing a larger state penalty Q(x) in the cost function. Hence,
the ultimate bound can be made arbitrarily small through the appropriate selection of the
cost function and function approximator. By taking the ultimate bound sufficiently small,
the results of the previous sections can be directly extended to this setting, and then used
within a hybrid system framework to develop controllers satisfying an LTL formula. More
details on this extension are provided in the Notes at the end of this chapter.
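To make these quantities concrete, consider a purely hypothetical instance (the numbers below are ours, chosen only for illustration) with γ1(s) = ½s², γ2(s) = s², γ3(s) = s², approximation level ν = 0.02, and r = 1. Then ι = γ3⁻¹(2ν) = √0.04 = 0.2, which is smaller than γ2⁻¹(γ1(r)) = √0.5 ≈ 0.707, so Theorem 9.2 applies to any initial condition with ‖x(0)‖ ≤ 0.707, and the ultimate bound evaluates to γ1⁻¹(γ2(ι)) = √(2 · 0.04) ≈ 0.283. Halving ν, for instance by enriching the feature basis, shrinks the bound to γ1⁻¹(γ2(√0.02)) = 0.2, illustrating the dependence described above.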

9.5 Numerical Examples

Example 9.3 Our first example involves a persistent surveillance scenario for an uncertain
system as in (4.1) with

$$
\underbrace{\begin{bmatrix} \dot{x}_1 \\ \dot{x}_2 \end{bmatrix}}_{\dot{x}}
=
\underbrace{\begin{bmatrix} 0 \\ 0 \end{bmatrix}}_{f(x)}
+
\underbrace{\begin{bmatrix} x_1 & x_2 & 0 & 0 \\ 0 & 0 & x_1 & x_2 \end{bmatrix}}_{F(x)}
\underbrace{\begin{bmatrix} \theta_1 \\ \theta_2 \\ \theta_3 \\ \theta_4 \end{bmatrix}}_{\theta}
+
\underbrace{\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}}_{g(x)}
\underbrace{\begin{bmatrix} u_1 \\ u_2 \end{bmatrix}}_{u},
\qquad (9.21)
$$

Fig. 9.2 DBA representing the specification in (9.22). The state q0 is the initial state, q3 is the
accepting state, and q4 is a “trap” state. If multiple transitions exist between states, only one transition,
labeled by all observations that enable that transition, is illustrated. For this simple example, the DTA
for each state can be computed by inspection as Vd(q0) = 3, Vd(q1) = 2, Vd(q2) = 1, Vd(q3) = 0,
Vd(q4) = ∞

and an LTL specification

ϕ = □(♦(o2 ∧ ♦(o1 ∧ ♦o3))) ∧ □¬o4,        (9.22)

where the regions corresponding to o1, o2, o3 are areas of the state space that must be
visited infinitely often and the one corresponding to o4 is a dangerous area that must be
avoided at all times. In words, (9.22) reads “always eventually visit Ro2 and then Ro1 and
then Ro3 and always avoid Ro4 .” The DBA corresponding to (9.22) is displayed in Fig. 9.2. In
accordance with minimizing the DTA over the transitions of the DBA in Fig. 9.2, an accepting
word of ϕ can be computed as wo = (o2 o1 o3)ω. Thus, by Definition 9.3, to generate the
accepting word of ϕ one needs to design a collection of feedback controllers that drive
the continuous trajectory of the system through Ro2 , then Ro1 , then Ro3 infinitely often
without ever entering Ro4 . We consider each region as a circular disk that can be expressed
as the zero sublevel set of h o (x) = x − xo 2 − ro2 , where ro ∈ R>0 denotes the radius
of the region and xo ∈ R2 its center. The true values of θ = [0.2 − 0.3 0.5 − 0.5] are

assumed to be unknown to the controller, but the ranges of the parameters are assumed
to be known in the sense that θ ∈ Θ = [−1, 1]⁴. For simplicity, the uncertain parameters
are identified using a modified version of the concurrent learning estimator (5.25), where,
rather than integrating the dynamics to remove the dependence on ẋ, we simply assume
that ẋ is available for measurement. To synthesize a hybrid feedback controller capable
of driving the system through the regions corresponding to the observations in wo , we
formulate a collection of optimal control problems with a quadratic cost defined by Q(x) =
‖x − xd‖² and R = I2×2, where xd = xo for each o. We parameterize the value function
using a quadratic basis of the regulation error. Given that the system under consideration is
linear, the corresponding auxiliary system (9.18) is also linear, implying that the HJB (8.4)
simplifies to the algebraic Riccati equation, which can be solved using standard numerical
tools to yield the true weights corresponding to the selected quadratic basis of the optimal
control problem as W = [1.208, −0.047, 0.624]⊤. The parameters of the value function
are updated using (8.23), (8.24) with kc = 1, β = 0.001, whereas the parameters of the
policy are updated using the simple projection-based update law from (8.26) with ka = 1.
To densely sample the system’s operating region in accordance with the sufficient conditions
of Lemma 8.1, the sample points used in the update laws are selected as the vertices of a
5 × 5 grid laid over [−1.5, 1.5]2 ⊂ R2 yielding N = 25 extrapolation points. To guarantee
that the system trajectory remains outside Ro4 we construct an RaCBF by taking h(x) =
h o4 (x) and selecting the extended class K∞ function as α(h(x)) = 10h(x). The initial
value function and policy weights are selected as Ŵc(0) = Ŵa(0) = [0.5, 0.5, 0.5]⊤, the
least squares matrix is initialized as Γ(0) = 100I3×3, and the initial drift parameters are set
to zero. In the subsequent simulation we use a relatively low adaptation rate for updating
the drift parameters to better illustrate the relationship between safety and adaptation.
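Two ingredients of this setup are sketched below (our own illustrative code; all names, including the adaptation gain Gamma, are ours): a concurrent-learning update for θ̂ that uses measured state derivatives, as described above (f(x) = 0 for the system (9.21)), and the 5 × 5 grid of extrapolation points used to simulate experience.

import numpy as np

# A hedged sketch of the parameter estimator and sample-point grid.

def cl_update(theta_hat, history, F, g, Gamma, dt):
    grad = np.zeros_like(theta_hat)
    for x_k, u_k, xdot_k in history:                  # recorded data triples
        err = xdot_k - F(x_k) @ theta_hat - g(x_k) @ u_k   # prediction error
        grad += F(x_k).T @ err
    return theta_hat + dt * Gamma @ grad              # gradient-based step

pts = np.linspace(-1.5, 1.5, 5)
sample_points = np.array([[p, q] for p in pts for q in pts])   # N = 25 points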
The system is simulated for 100 s starting from an initial condition of x(0) = [1, 0]⊤, the
results of which are provided in Figs. 9.3 and 9.4. The plot in Fig. 9.3 illustrates the closed-
loop system trajectory (denoted by the curve with varying color), where the system can be
seen to visit the ROIs in the corresponding order without ever entering Ro4 (red disk). Note
that as time progresses the system is able to more closely approach the boundary of the safe
set as the uncertain model parameters are identified. This phenomenon is further highlighted
in Fig. 9.4b and c, which show the evolution of the estimated parameters and the value of
the CBF over the duration of the simulation, respectively. At the start of the simulation the
CBF-based controller accounts for the worst case model error, causing the safety margin to
remain relatively high; however, as the estimation error decreases the value of the CBF is
able to approach zero without crossing the boundary of the safe set as shown in Fig. 9.4e.
Furthermore, the value function approximation scheme is able to closely approximate the
solution to the optimal control problem as illustrated by the close convergence of the weights
to their true values in Fig. 9.4d. By the end of the simulation, one of the true weights has
been learned very closely, whereas the others can be seen to exhibit asymptotic convergence
to a neighborhood of their true values.

Fig. 9.3 Closed-loop trajectory of the uncertain linear system and regions of interest correspond-
ing to observations in the specification. The curve of varying color denotes the system trajectory,
where darker colors denote the system’s state early in the simulation and lighter colors denote the
system’s state towards the end of the simulation. The purple, green, yellow, and red disks denote
Ro1 , Ro2 , Ro3 , Ro4 , respectively. The initial condition of the system is represented as a blue hexagon

Example 9.4 Our next example involves a more complex nonlinear system in an obstacle-
scattered environment subject to the specification

ϕ = □(♦o1 ∧ ♦o2 ∧ ♦o3 ∧ ♦o4) ∧ □¬(o5 ∨ o6 ∨ o7 ∨ o8 ∨ o9),

which requires the system to visit the regions Ro1 , Ro2 , Ro3 , Ro4 infinitely often and always
avoid regions Ro5 , Ro6 , Ro7 , Ro8 , Ro9 , where the regions are assumed to be of the same form
as in the previous example. An accepting word for this specification can be computed as
wo = (o4 o3 o2 o1 )ω . The system under consideration is a two-dimensional nonlinear control-
affine system with parametric uncertainty and dynamics
$$
\underbrace{\begin{bmatrix} \dot{x}_1 \\ \dot{x}_2 \end{bmatrix}}_{\dot{x}}
=
\underbrace{\begin{bmatrix} 0 \\ 0 \end{bmatrix}}_{f(x)}
+
\underbrace{\begin{bmatrix} x_1 & x_2 & 0 & 0 \\ 0 & 0 & x_1 & x_2\left(1 - (\cos(2x_1) + 2)^2\right) \end{bmatrix}}_{F(x)}
\underbrace{\begin{bmatrix} -1 \\ 1 \\ -0.5 \\ -0.5 \end{bmatrix}}_{\theta}
+
\underbrace{\begin{bmatrix} 5 & 0 \\ 0 & 3 \end{bmatrix}}_{g(x)}
\underbrace{\begin{bmatrix} u_1 \\ u_2 \end{bmatrix}}_{u},
$$

where the set of possible parameter values Θ is chosen to be the same as in the previous
example. The construction of the optimal control problem and all parameters associated

Fig. 9.4 Additional results from the simulation involving the uncertain linear system. The plot in
Fig. 9.4a illustrates the evolution of the system states over time. Figure 9.4b displays the trajectory
of the estimated drift weights (solid lines), where the dotted lines of corresponding color denote the
true value of the weights. Figure 9.4c portrays the evolution of the CBF over time, where the blue
curve denotes the value of the CBF along the system trajectory and the dotted black line represents
h(x) = 0. A closer view of the CBF trajectory is provided in Fig. 9.4e, where the value is shown to be
greater than zero for all time, indicating constraint satisfaction. The trajectory of the estimated value
function and policy weights is provided in Fig. 9.4d, where the solid curves denote the estimated
weights and the dotted lines denote the value of the ideal weights

with value function and policy approximation remain the same as in the previous exam-
ple, where the initial weight estimates are chosen as Ŵc(0) = [1, 1, 1]⊤, Ŵa(0) = 0.7Ŵc(0)
and the least squares matrix is initialized as Γ(0) = 1000I3×3 to ensure fast convergence
to a stabilizing policy. To ensure the trajectory of the system satisfies the specification’s
safety requirements we take hoi, i = 5, . . . , 9, as RaCBFs with extended class K∞ func-
tions α(hoi(x)) = 12hoi(x) for all i = 5, . . . , 9, which are used to construct an RaCBF safety
filter. To demonstrate quick adaptation to uncertain dynamics, we use the same concurrent
learning scheme as in the previous example, but increase the adaptation gain to achieve
quicker parameter convergence.
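The safety filter in this example must handle several constraints at once, one per unsafe region. A minimal sketch is given below (our own illustration; cvxpy is used purely for convenience, and each tuple holds the Lie derivatives and robustly inflated value of the corresponding hoi at the current state).

import numpy as np
import cvxpy as cp

# A hedged sketch: filter the learned policy through a QP with one RaCBF
# constraint per unsafe region Ro5-Ro9, using the linear alpha(s) = 12 s above.

def racbf_filter(u_nom, constraints, alpha=12.0):
    u = cp.Variable(u_nom.shape[0])
    cons = [Lfh + Lgh @ u >= -alpha * h for (Lfh, Lgh, h) in constraints]
    prob = cp.Problem(cp.Minimize(cp.sum_squares(u - u_nom)), cons)
    prob.solve()
    return u.value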
The nonlinear system is simulated for 15 s from an initial condition of x(0) = [−2, 0]⊤, the
results of which are provided in Figs. 9.5–9.6. Specifically, Fig. 9.5 illustrates the trajectory
of the closed-loop system evolving in an obstacle-scattered environment, where the red disks
denote the unsafe regions, which are avoided at all times. Similar to the previous example,

varying the color of the curve in Fig. 9.5 is used to emphasize the passage of time along the
system’s trajectory, which is shown to visit the regions of interest infinitely often. The
periodic nature of the system’s trajectory is further highlighted in Fig. 9.6a. The evolution of
the estimated value function and policy weights is provided in Fig. 9.6d; however, a closed-
form solution to the HJB for this problem is unavailable and thus accurately quantifying
the quality of the approximation scheme is non-trivial. Despite this, the theoretical and
numerical results clearly indicate that the MBRL algorithm is able to quickly adapt to
each optimal control problem and safely navigate the system to a given target set in the
presence of model uncertainty. The evolution of the estimated drift parameters is illustrated
in Fig. 9.6b, where the estimated parameters are shown to rapidly converge to their true
values. As a result, the RaCBF allows the system to closely approach the boundary of the
safe set after about 1 s of simulation time. This behavior is further highlighted in Fig. 9.6c,
which illustrates the minimum value among all RaCBFs at each timestep of the simulation.
As shown in Fig. 9.6e, the minimum value among all RaCBFs remains non-negative for all
time, indicating satisfaction of all safety requirements.

Fig. 9.5 Trajectory of the uncertain nonlinear system under the safe MBRL policy. Similar to Fig. 9.3,
the curve of varying color denotes the trajectory of the system over time, and the disks in the state
space represent various regions of interest, with red disks denoting unsafe areas

Fig. 9.6 Additional results from the simulation involving the uncertain nonlinear system. The subplots
share the same interpretation as those in Fig. 9.4

9.6 Notes

In this chapter, we focused on designing controllers that enforce specifications richer than
traditional control-theoretic objectives such as stability or safety. In particular, we focused
on specifications given as linear temporal logic (LTL) formulas, which can be used to
express complex requirements for dynamical systems. Temporal logics are used heavily in
the formal methods community [1] in model checking problems in which the objective is
to verify whether the trajectories of a discrete system satisfy a temporal logic specification.
Recently, there has been an increasing interest from the control theoretic community in
formal synthesis in which control policies for continuous dynamical systems are synthesized
directly from a complex specification, such as an LTL formula. A central paradigm of early
works in this field [2, 3] is the notion of abstraction in which a continuous-state dynamical
system is viewed as a finite transition system whose dynamics capture only the essential
behavior pertinent to the given specification. With such an abstraction in hand, a discrete
plan can be obtained by solving a Büchi or Rabin game over a product system composed
of the abstraction and an automaton capturing the formal specification, which can then be
executed by continuous control policies for the original system to ensure satisfaction of the
formal specification. Tools that can be used to convert LTL formulas into a corresponding

Büchi automaton include LTL2BA [4] and Spot [5]. Comprehensive textbooks that cover
the abstraction-based approach to formal synthesis include [6, 7].
Although abstraction-based approaches provide strong formal guarantees of correctness,
the computation of such abstractions is generally expensive and the class of systems for
which continuous controllers can be synthesized is limited. To address these limitations,
various researchers have proposed optimization-based techniques for formal synthesis (see
[8] for a survey), in which the objective is to optimize a cost function subject to satisfaction of
the temporal logic formula. An important component of such approaches is the use of tempo-
ral logics with quantitative semantics, such as signal temporal logic (STL) [9], which provide a metric
of how well trajectories satisfy a given specification. In this setting, Boolean satisfaction of
the specification can often be translated into a set of mixed-integer constraints and controller
synthesis is performed by solving an optimal control problem with the objective of maximiz-
ing formula satisfaction subject to the aforementioned mixed integer constraints, resulting
in a mixed integer convex programming (MICP) problem [10–12]. Such optimization-based
approaches address some limitations of classical abstraction-based techniques, but can be
computationally intensive, as solving large MICPs in real-time is challenging.
To cope with challenges of existing optimization-based approaches to control synthesis,
various authors have attempted to leverage certificate-based functions, such as control barrier
functions (CBFs), to enforce satisfaction of temporal logic formulas. In [13], time-varying
CBFs are used to encode the satisfaction of an STL specification, allowing control inputs to
be computed in a computationally efficient quadratic programming (QP) framework. Various
extensions of the approach from [13] have been reported in [14–17]. Similar approaches to
satisfying STL specifications using CBFs that ensure convergence to a set in finite time were
developed in [18, 19]. Beyond STL, CBFs have also demonstrated success in developing
controllers for other temporal logics, such as LTL [20, 21], and at bridging the gap between
high-level planners outputting trajectories that satisfy an LTL specification and the low-level
controllers that must be executed on the system to ensure satisfaction of the specification
[22, 23].
Although the use of certificate-based functions has shown promise towards the devel-
opment of computationally efficient control strategies that enforce temporal logic specifi-
cations, constructing such certificate functions for complex tasks may be challenging. An
attractive alternative is to leverage learning-based techniques, such as reinforcement learn-
ing (RL), to obtain controllers capable of satisfying temporal logic specifications. Such
approaches are often based upon using an LTL formula’s corresponding automata to guide
the learning process [24–29] or leverage quantitative semantics of certain temporal logics,
such as STL, in the reward function of the RL problem to guide the learning process towards
maximal satisfaction of the specification [30, 31].
The aforementioned approaches that use RL for temporal logic tasks typically do so in
the offline episodic RL framework discussed in Chap. 8, which presents challenges in the
context of safety-critical systems that must remain safe during the learning process. Initial
attempts towards extending such ideas to online RL approaches were introduced in [32–34].

The method presented in the present chapter that extends the techniques from Chap. 8 and
[35] to temporal logic tasks was introduced in [36]. The notion of a CBF-stabilizable set,
which the development presented herein heavily relies upon, was first introduced in [37],
where Theorem 9.1 was first stated and proved. The distance to acceptance function (DTA)
was introduced in [38, 39] with similar approaches adopted in [40, 41].

References

1. Baier C, Katoen JP (2008) Principles of model checking. MIT Press


2. Tabuada P, Pappas G (2006) Linear time logic control of discrete-time linear systems. IEEE
Trans Autom Control 51(12):1862–1877
3. Kloetzer M, Belta C (2008) A fully automated framework for control of linear systems from
temporal logic specifications. IEEE Trans Autom Control 53(1):287–297
4. Gastin P, Oddoux D (2001) Fast LTL to Büchi automata translation. In: Proceedings of the inter-
national conference on computer aided verification, pp 53–65
5. Duret-Lutz A, Lewkowicz A, Fauchille A, Michaud T, Renault E, Xu L (2016) Spot 2.0 –
a framework for LTL and ω-automata manipulation. In: Proceedings of the international
symposium on automated technology for verification and analysis, pp 122–129
6. Tabuada P (2009) Verification and control of hybrid systems: a symbolic approach. Springer Science
& Business Media
7. Belta C, Yordanov B, Gol EA (2017) Formal methods for discrete-time dynamical systems.
Springer
8. Belta C, Sadraddini S (2019) Formal methods for control synthesis: an optimization perspective.
Ann Rev Control Robot Auton Syst 2:115–140
9. Maler O, Nickovic D (2004) Monitoring temporal properties of continuous signals. In: Lakhnech
Y, Yovine S (eds) Formal techniques, modelling and analysis of timed and fault-tolerant systems.
Springer, Berlin, Heidelberg, pp 152–166
10. Raman V, Donzé A, Maasoumy M, Murray RM, Sangiovanni-Vincentelli A, Seshia SA (2014)
Model predictive control with signal temporal logic specifications. In: Proceedings of the IEEE
conference on decision and control, pp 81–87
11. Raman V, Donzé A, Sadigh D, Murray RM, Seshia SA (2015) Reactive synthesis from signal
temporal logic specifications. In: Proceedings of the international conference on hybrid systems:
computation and control
12. Sadraddini S, Belta C (2015) Robust temporal logic model predictive control. In: Proceedings
of the 53rd annual Allerton conference on communication, control, and computing, pp 772–779
13. Lindemann L, Dimarogonas DV (2019) Control barrier functions for signal temporal logic tasks.
IEEE Control Syst Lett 3(1):96–101
14. Lindemann L, Dimarogonas DV (2019) Control barrier functions for multi-agent systems under
conflicting local signal temporal logic tasks. IEEE Control Syst Lett 3(3):757–762
15. Lindemann L, Dimarogonas DV (2019) Decentralized control barrier functions for coupled
multi-agent systems under signal temporal logic tasks. In: Proceedings of the European control
conference, pp 89–94
16. Lindemann L, Dimarogonas DV (2020) Barrier function based collaborative control of multiple
robots under signal temporal logic tasks. IEEE Trans Control Netw Syst 7(4):1916–1928
17. Gundana D, Kress-Gazit H (2021) Event-based signal temporal logic synthesis for single and
multi-robot tasks. IEEE Robot Autom Lett 6(2):3687–3694

18. Garg K, Panagou D (2019) Control-lyapunov and control-barrier functions based quadratic pro-
gram for spatio-temporal specifications. In: Proceedings of the IEEE conference on decision and
control, pp 1422–1429
19. Xiao W, Belta C, Cassandras CG (2021) High order control lyapunov-barrier functions for
temporal logic specifications. In: Proceedings of the American control conference, pp 4886–
4891
20. Srinivasan M, Coogan S (2021) Control of mobile robots using barrier functions under temporal
logic specifications. IEEE Trans Robot 37(2):363–374
21. Niu L, Clark A (2020) Control barrier functions for abstraction-free control synthesis under
temporal logic constraints. In: Proceedings of the IEEE conference on decision and control, pp
816–823
22. Nilsson P, Ames AD (2018) Barrier functions: bridging the gap between planning from spec-
ifications and safety-critical control. In: Proceedings of the IEEE conference on decision and
control, pp 765–772
23. Rosolia U, Singletary A, Ames AD (2022) Unified multirate control: from low-level actuation
to high-level planning. IEEE Trans Autom Control 67(12):6627–6640
24. Sadigh D, Kim ES, Coogan S, Sastry SS, Seshia SA (2014) A learning based approach to control
synthesis of markov decision processes for linear temporal logic specifications. In: Proceedings
of the IEEE conference on decision and control, pp 1091–1096
25. Li X, Serlin Z, Yang G, Belta C (2019) A formal methods approach to interpretable reinforcement
learning for robotic planning. Sci Robot 4(37)
26. Hasanbeig M, Kantaros Y, Abate A, Kroening D, Pappas GJ, Lee I (2019) Reinforcement learning
for temporal logic control synthesis with probabilistic satisfaction guarantees. In: Proceedings
of the IEEE conference on decision and control, pp 5338–5343
27. Bozkurt AK, Wang Y, Zavlanos MM, Pajic M (2020) Control synthesis from linear temporal logic
specifications using model-free reinforcement learning. In: Proceedings of the IEEE international
conference on robotics and automation, pp 10349–10355
28. Cai M, Hasanbeig M, Xiao S, Abate A, Kan Z (2021) Modular deep reinforcement learning for
continuous motion planning with temporal logic. IEEE Robot Autom Lett 6(4):7973–7980
29. Cai M, Xiao S, Li B, Li Z, Kan Z (2021) Reinforcement learning based temporal logic control
with maximum probabilistic satisfaction. In: Proceedings of the IEEE international conference
on robotics and automation, pp 806–812
30. Aksaray D, Jones A, Kong Z, Schwager M, Belta C (2016) Q-learning for robust satisfaction
of signal temporal logic specifications. In: Proceedings of the IEEE conference on decision and
control, pp 6565–6570
31. Li X, Vasile CI, Belta C (2017) Reinforcement learning with temporal logic rewards. In: Proceed-
ings of the IEEE/RSJ international conference on intelligent robots and systems, pp 3834–3839
32. Sun C, Vamvoudakis KG (2020) Continuous-time safe learning with temporal logic constraints
in adversarial environments. In: Proceedings of the American control conference, pp 4786–4791
33. Kanellopoulos A, Fotiadis F, Sun C, Xu Z, Vamvoudakis KG, Topcu U, Dixon WE (2021)
Temporal-logic-based intermittent, optimal, and safe continuous-time learning for trajectory
tracking. arXiv:2104.02547
34. Cohen MH, Belta C (2021) Model-based reinforcement learning for approximate optimal control
with temporal logic specifications. In: Proceedings of the international conference on hybrid
systems: computation and control
35. Cohen MH, Belta C (2023) Safe exploration in model-based reinforcement learning using control
barrier functions. Automatica 147:110684
36. Cohen MH, Serlin Z, Leahy KJ, Belta C (2023) Temporal logic guided safe model-based rein-
forcement learning: a hybrid systems approach. Nonlinear Anal: Hybrid Syst 47:101295

37. Cortez WS, Dimarogonas DV (2022) On compatibility and region of attraction for safe, stabi-
lizing control laws. IEEE Trans Autom Control 67(9):4924–4931
38. Ding X, Belta C, Cassandras CG (2010) Receding horizon surveillance with temporal logic
specifications. In: Proceedings of the IEEE conference on decision and control, pp 256–261
39. Ding X, Lazar M, Belta C (2014) LTL receding horizon control for finite deterministic systems.
Automatica 50:399–408
40. Bisoffi A, Dimarogonas DV (2018) A hybrid barrier certificate approach to satisfy linear temporal
logic specifications. In: Proceedings of the American control conference, pp 634–639
41. Bisoffi A, Dimarogonas DV (2021) Satisfaction of linear temporal logic specifications through
recurrence tools for hybrid systems. IEEE Trans Autom Control 66(2):818–825
Index

A
Adaptive control, 58
  concurrent learning, 65
  modular, 99
Algebraic Riccati equation, 137
Automaton, 167
  Büchi, 167
  deterministic Büchi, 167

B
Backstepping, 21
Barbalat's Lemma, 100
Barrier function, 31
Bellman error, 139
  approximate, 139

C
Comparison function, 7
  Class KL, 7
  Class K, 7
  Class K∞, 7
  Extended class K, 31
  Extended class K∞, 31
Comparison lemma, 10
Control barrier function, 33
  adaptive, 78
  high order, 42
  high order robust adaptive, 87
  input-to-state safe, 102
  input-to-state safe high order, 105
  robust, 119
  robust adaptive, 81
  stabilizable set, 170
Control Lyapunov function, 12
  adaptive, 58
  exponentially stabilizing, 13
  exponentially stabilizing adaptive, 69
  input-to-state stable, 97
  robust, 122
Controlled invariance, 33

D
Deep neural network, 137
Distance to acceptance, 173
Dynamic feedback, 58

E
Equilibrium point, 6

F
Feedback linearization, 19
Finite excitation, 66
Forward invariance, 29

H
Hamilton–Jacobi–Bellman, 136
History stack, 65
Hybrid system, 175
  semantics, 175

I
Input-to-state, 96
  exponential stability, 96
  Lyapunov function, 97
  safety, 101
  stability, 96

K
Karush-Kuhn-Tucker, 15

L
Labeling function, 169
LaSalle-Yoshizawa Theorem, 60
Lie derivative, 8
Linear program, 119
Linear regression, 63
Lipschitz, 6
  locally, 6
Lyapunov, 8
  direct method, 9
  function, 9
  function candidate, 9

N
Nagumo's Theorem, 30
Nonlinear, 6
  control affine system, 12
  dynamical system, 6

O
Optimal control, 135
  infinite-horizon, 136

P
Parameter, 57
  estimation, 58
  estimation error, 58
  matched, 61
Persistence of excitation, 63
Projection operator, 143

Q
Quadratic program, 14

R
Recursive least squares, 110
Region of interest, 167
Reinforcement learning, 133
  episodic, 134
  model-based, 134
  model free, 134
  online, 135
  online model-based, 140
Relative degree, 40

S
Safety, 29
  filter, 34
Set membership identification, 123
Simulation of experience, 141
Singular value maximizing algorithm, 67
Small control property, 18
Stability, 7
  asymptotic, 7
  exponential, 7
  set, 32
  uniform asymptotic, 172

T
Tangent cone, 30
  Bouligand, 30
Temporal logic, 166
  linear, 166
  semantics, 166
  syntax, 166

U
Uniformly ultimately bounded, 152
Universal function approximation, 138

V
Value function, 136
  approximation, 137
Vector field, 6
  forward complete, 6
Viability kernel, 50