Learning Barrier Functions With Memory For Robust Safe Navigation
sensor measurements Pk,i for k ∈ N and i = 1, . . . , K, design a feedback controller u(x) for (1) that ensures that the robot can move safely along the path r, i.e., y(t) ∈ v(S) for all t and y(t) → r(1) as t → ∞.

IV. OBSTACLE ESTIMATION VIA ONLINE SIGNED DISTANCE FUNCTION APPROXIMATION

Throughout the paper, we rely on the concept of a signed distance function to describe each unsafe region v(Oi), i = 1, . . . , K. For each i, the SDF ϕi : Y → R is:

$$\phi_i(y) := \begin{cases} -d(y, \partial v(\mathcal{O}_i)), & y \in v(\mathcal{O}_i), \\ \phantom{-}d(y, \partial v(\mathcal{O}_i)), & y \notin v(\mathcal{O}_i), \end{cases} \qquad (2)$$

where d denotes the Euclidean distance between a point and a set. In this section, we describe an approach to construct approximations to the obstacle SDFs ϕi using the point cloud observations {Pk,i}k at time step tk.
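As a concrete instance of (2), consider a disk-shaped obstacle, for which the signed distance has the closed form ||y − c|| − r. The following minimal NumPy sketch (illustrative, not from the paper) evaluates it:

```python
import numpy as np

def disk_sdf(y, center, radius):
    """SDF (2) of a disk obstacle: negative inside, zero on the boundary,
    positive outside, with magnitude d(y, boundary) = | ||y - c|| - r |."""
    return np.linalg.norm(np.asarray(y) - np.asarray(center)) - radius

print(disk_sdf([0.0, 0.0], [0.0, 0.0], 1.0))  # -1.0, inside
print(disk_sdf([2.0, 0.0], [0.0, 0.0], 1.0))  # +1.0, outside
```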
A. Data Pre-processing

Given the point cloud Pk,i ⊂ Rw on the surface of the i-th obstacle at time tk, we can regard Pk,i as points on the zero level-set of a distance function. To normalize the scale of the training data, we define the point coordinates with respect to the obstacle centroid. Since the centroid is unknown, we approximate it as the sample mean of the training points, p̄k,i := (1/m) Σ_{j=1}^{m} pk,i,j. The points pk,i,j − p̄k,i are centered around the origin and have a measured distance of 0 to the obstacle surface v(∂Oi). Let c = ||v(xk) − pk,i,j|| be the Euclidean distance between the robot output v(xk), with xk := x(tk), and pk,i,j. Let δ > 0 be a small positive constant. Define a point qk,i,j := (δ/c) v(xk) + (1 − δ/c) pk,i,j along the LiDAR ray from v(xk) to pk,i,j that is approximately a distance δ from the obstacle surface ∂v(Oi). We call the set {qk,i,j − p̄k,i}j a truncated SDF point set. The training set for each obstacle i is constructed as a union of the points on the boundary and the truncated SDF points. The training set at time tk is Dk,i := {(pk,i,j − p̄k,i, 0)}j ∪ {(qk,i,j − p̄k,i, δ)}j.
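A minimal sketch of this pre-processing step, under our reading of the text (NumPy; the variable names and the value of δ are ours):

```python
import numpy as np

def build_training_set(robot_pos, endpoints, delta=0.1):
    """Construct D_{k,i} = {(p - p_bar, 0)} U {(q - p_bar, delta)} from the
    LiDAR endpoints of one obstacle; robot_pos is v(x_k) with shape (w,),
    endpoints has shape (m, w)."""
    p_bar = endpoints.mean(axis=0)                      # sample-mean centroid
    c = np.linalg.norm(robot_pos - endpoints, axis=1)   # ranges ||v(x_k) - p||
    # q = (delta/c) v(x_k) + (1 - delta/c) p lies on the ray, ~delta off the surface.
    q = (delta / c)[:, None] * robot_pos + (1.0 - delta / c)[:, None] * endpoints
    return ([(p - p_bar, 0.0) for p in endpoints] +     # boundary points
            [(pt - p_bar, delta) for pt in q])          # truncated SDF points
```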
B. Loss Function

Inspired by the recent impressive results on multilayer perceptron approximation of SDFs [7], [14], we introduce a fully connected neural network ϕ̃i(y; θk) with parameters θk to approximate the SDF of each observed obstacle i at time step tk. Our approach relies on the fact that the norm of the SDF gradient satisfies the Eikonal equation ||∇ϕi(y)|| = 1 in its domain. We use a loss function that encourages ϕ̃i(y; θk) to approximate the measured distances in a training set D (distance loss ℓ_i^D) and to satisfy the Eikonal constraint on a set of points D0 (Eikonal loss ℓ_i^E). For example, ϕ̃i should equal 0 for points on the obstacle surface and δ for the truncated SDF points along the LiDAR rays. The loss function is defined as ℓi(θk; D, D0) := ℓ_i^D(θk; D) + λ ℓ_i^E(θk; D0), with a parameter λ > 0 and:
$$\ell_i^D(\theta_k; D) := \frac{1}{|D|}\sum_{(p,d)\in D} \bigl|\tilde\phi_i(p;\theta_k) - d\bigr|, \qquad \ell_i^E(\theta_k; D^0) := \frac{1}{|D^0|}\sum_{p\in D^0} \bigl(\|\nabla\tilde\phi_i(p;\theta_k)\| - 1\bigr)^2. \qquad (3)$$

The training set D0 for the Eikonal term can be generated arbitrarily in Y since ϕ̃i needs to satisfy the Eikonal constraint almost everywhere in Y. In practice, we generate D0 by sampling points from a mixture of a uniform distribution and Gaussians centered at the points in the distance training set D, with standard deviation equal to the distance to the k-th nearest neighbor (k = |D|/2). For the distance training set D, ideally, we should use as much data as possible to obtain an accurate approximation of the SDF ϕi. For example, at time tk, all observed data may be used as the distance training set D = ∪_{l=0}^{k} Dl,i. However, the online training time then grows as the robot accumulates LiDAR measurements, which makes this impractical for real-time mapping and navigation tasks. We introduce an "Incremental Training with Replay Memory" approach in Sec. IV-C, which obtains accurate SDF estimates with training times suitable for online learning.
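A PyTorch sketch of the loss in (3) is given below (our illustration, not the authors' code): the Eikonal term obtains ∇ϕ̃i via automatic differentiation, as in [14]; the network width, depth, and the 2-D input dimension are assumptions:

```python
import torch
import torch.nn as nn

# Small MLP with Softplus activations (the activation used in Sec. IV);
# the width, depth, and 2-D input dimension are our assumptions.
sdf_net = nn.Sequential(
    nn.Linear(2, 64), nn.Softplus(),
    nn.Linear(64, 64), nn.Softplus(),
    nn.Linear(64, 1),
)

def sdf_loss(net, pts, dists, eik_pts, lam=0.1):
    """Loss (3): mean |phi(p) - d| over D plus lam * Eikonal penalty over D0."""
    dist_loss = (net(pts).squeeze(-1) - dists).abs().mean()
    eik_pts = eik_pts.clone().requires_grad_(True)
    (grad,) = torch.autograd.grad(net(eik_pts).sum(), eik_pts, create_graph=True)
    eik_loss = ((grad.norm(dim=-1) - 1.0) ** 2).mean()
    return dist_loss + lam * eik_loss

# Usage on dummy data: surface points carry distance 0, truncated points delta.
pts, dists = torch.randn(128, 2), torch.zeros(128)
eik_pts = 4.0 * torch.rand(256, 2) - 2.0
sdf_loss(sdf_net, pts, dists, eik_pts).backward()
```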
C. Incremental Learning

When an obstacle i is first observed at time t0, we use the data set D0,i to train the SDF network ϕ̃i(y; θ0). When new observations Dk,i of obstacle i are obtained at k = 1, 2, . . ., we need to update the network parameters θk. We first consider an approach that updates ϕ̃i(y; θk−1) based on the new data set Dk,i and uses the previous parameters θk−1 as initialization. We call this approach "Incremental Training" (IT). Alternatively, we can update ϕ̃i(y; θk−1) using all prior data Di = ∪_{l=0}^{k} Dl,i observed up to time tk, with θk−1 as initialization. We call this approach "Batch Training" (BT).

The IT approach is efficient because, at each time tk, it uses data sets Dk,i of approximately constant cardinality, leading to approximately constant update times during online training. However, discarding the old data ∪_{l=0}^{k−1} Dl,i and using stochastic gradient descent to re-train the network parameters θk−1 on the new data Dk,i alone degrades the neural network's ability to represent the old data. In other words, the SDF approximation ϕ̃i(y; θk) at time tk is good at approximating the latest observed obstacle surface, but the approximation quality degrades at previously observed surfaces, as shown in Fig. 2. The BT approach does not have this limitation since it uses all data ∪_{l=0}^{k} Dl,i for training at time tk but, as a result, its training time increases over time. Hence, our motivation for introducing an "Incremental Training with Replay Memory" (ITRM) approach is to balance the trade-off between SDF estimation error and online training time, making it suitable for real-time robotic tasks.

Experience replay is an effective and commonly used technique in online reinforcement learning, where it enables convergence of stochastic gradient descent to a critical point in policy and value function approximation [10], [18]. This idea has not been explored for online supervised learning of geometric surfaces. The first contribution of this paper is to use replay memory for online incremental learning of SDFs. At each time step tk−1, we construct an experience replay memory Qk−1,i by utilizing the SDF approximation ϕ̃i(y; θk−1).

[Fig. 2 panels: (a) Training data at k = 0; (b) IT data at k = 70; (c) ITRM data at k = 70; (d) Estimation at k = 0; (e) IT estimation at k = 70; (f) ITRM estimation at k = 70.]

Fig. 2: Shape estimation with and without replay memory. The top row shows the training data used at time steps k = 0 and k = 70 by the IT and ITRM approaches. The purple points are the observed LiDAR end points, while the blue points are the truncated SDF points along the LiDAR rays. In (c), the green and red points are boundary and truncated SDF points obtained from the replay memory. The bottom row shows the SDF estimation of the obstacle surface at time steps k = 0 and k = 70 for the IT and ITRM approaches. The black rectangle shows the ground-truth obstacle boundary, while colored regions are level-sets of the SDF estimate. The white region denotes the estimated obstacle boundary. The blue (resp. red) region denotes negative (resp. positive) signed distance. IT and ITRM use the same data and lead to the same estimate at k = 0 because the replay memory set is empty. In (e), the SDF estimate of the top obstacle region at k = 70, without memory replay, degrades compared to (d). In (f), training with replay memory helps the neural network remember the overall obstacle shape.

Definition 1. The replay memory Q associated with a signed distance function ϕ and truncation parameter τ ≥ 0 is the set of points that are at most a distance τ from the zero-level set of ϕ: Q := {(q, ϕ(q)) ∈ (Rw, R) | |ϕ(q)| ≤ τ}.
To construct the replay memory set Qk−1,i, we need to generate samples from the level sets of the SDF approximation ϕ̃i(y; θk−1). In robotics applications, where the environments are commonly 2-D or 3-D, samples from the level sets of ϕ̃i(y; θk−1) can be obtained using the Marching Cubes algorithm [19]. In our experiments in Sec. VI, we used Marching Cubes to extract samples q0 and qδ from the zero and δ level-sets of ϕ̃i(y; θk−1), respectively, and construct the replay memory as Qk−1,i := {(q0, 0)} ∪ {(qδ, δ)}.
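For a 2-D environment, the analogous level-set extraction can be sketched with scikit-image's marching-squares routine find_contours (Marching Cubes' 2-D counterpart); the grid bounds and resolution below are assumptions:

```python
import numpy as np
import torch
from skimage import measure

def replay_memory_2d(net, level, bounds=(-2.0, 2.0), res=128):
    """Sample the `level` level-set of the learned SDF on a regular grid
    using marching squares, the 2-D counterpart of Marching Cubes [19]."""
    xs = np.linspace(bounds[0], bounds[1], res)
    xy = np.stack(np.meshgrid(xs, xs, indexing="ij"), axis=-1)  # (res, res, 2)
    with torch.no_grad():
        phi = net(torch.tensor(xy, dtype=torch.float32).reshape(-1, 2))
    grid = phi.reshape(res, res).numpy()
    step = (bounds[1] - bounds[0]) / (res - 1)
    pts = [bounds[0] + c * step for c in measure.find_contours(grid, level)]
    pts = np.concatenate(pts) if pts else np.empty((0, 2))
    return [(q, level) for q in pts]  # pairs (q, phi(q)) for Q_{k-1,i}

# Q = replay_memory_2d(sdf_net, 0.0) + replay_memory_2d(sdf_net, 0.1)  # 0- and delta-level sets
```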
Given the replay memory Qk−1,i, our ITRM approach constructs a training set at time tk by combining the latest observation Dk,i with a randomly sampled subset Q̄k−1,i of Qk−1,i. To make the algorithm efficient without losing significant information about prior data, we let |Q̄k−1,i| = |Dk,i|, i.e., we pick as many points from the replay memory as there are in the latest observation Dk,i.

The three training techniques, IT, BT, and ITRM, discussed in this section are formally defined as follows. In all approaches, for obstacle i, the parameters θk−1 obtained at the previous time step are used as an initial guess for the new parameters θk at time tk. The SDF approximations ϕ̃i(y; θk) are trained via the loss function in (3), but each method uses a different training set D:

IT: D = Dk,i,    BT: D = ∪_{l=0}^{k} Dl,i,    ITRM: D = Dk,i ∪ Q̄k−1,i.
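Combining the pieces above, one ITRM update at time tk could look as follows (a sketch reusing sdf_loss from the earlier snippet; the optimizer settings, step count, and the simplified D0 sampling are our assumptions):

```python
import random
import torch

def itrm_update(net, D_k, Q_prev, steps=100, lr=1e-3):
    """One ITRM update at time t_k: warm-start from theta_{k-1} (the net's
    current weights) and fit D = D_k U Q_bar with |Q_bar| = |D_k|."""
    Q_bar = random.sample(Q_prev, min(len(D_k), len(Q_prev))) if Q_prev else []
    data = list(D_k) + Q_bar
    pts = torch.tensor([p for p, _ in data], dtype=torch.float32)
    dists = torch.tensor([d for _, d in data], dtype=torch.float32)
    # Simplified D0: Gaussian perturbations of D (the paper also mixes in
    # uniform samples and uses a k-NN-based standard deviation).
    eik_pts = pts + 0.3 * torch.randn_like(pts)
    opt = torch.optim.Adam(net.parameters(), lr=lr)  # Adam as in [27]
    for _ in range(steps):
        opt.zero_grad()
        sdf_loss(net, pts, dists, eik_pts).backward()
        opt.step()
    return net
```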
We use the continuously differentiable Softplus function ln(1 + e^x) as the non-linear activation to ensure that ϕ̃i(y; θk) is continuously differentiable. The gradient ∇ϕ̃i(y; θk) is an input to the loss function in (3) and can be calculated via backpropagation [14].

V. SAFE NAVIGATION WITH ESTIMATED OBSTACLES

We rely on the estimated SDFs constructed in Sec. IV to formalize the synthesis of a controller that guarantees safety with respect to the exact obstacles, despite errors in the SDF approximation. Our analysis assumes error bounds are available, and we leave their actual computation for future work. In this regard, several recent works study the approximation power and error bounds of neural networks [20], [21]. In our evaluation in Sec. VI, we obtain SDF error bounds by comparing the estimated and ground-truth object SDFs.

A. Safe Control with Estimated Barrier Functions

A useful tool to ensure that the robot state remains in the safe set S throughout its evolution is the notion of a control barrier function (CBF).

Definition 2 ([22]). A continuously differentiable function h : Rn → R, with h(x) > 0 if x ∈ Int(S) and h(x) = 0 if x ∈ ∂S, is a zeroing control barrier function (ZCBF) on X ⊂ Rn if there exists an extended class K function αh such that, for each x ∈ X, there exists u ∈ U with

$$L_f h(x) + L_g h(x) u + \alpha_h(h(x)) \geq 0, \qquad (4)$$

where Lf h(x) is the Lie derivative of h(x) along f(x).

Any Lipschitz-continuous controller u : X → U such that u(x) satisfies (4) renders the set S forward invariant for the control-affine system (1) [6], [22]. If the exact SDFs ϕi describing the obstacles Oi were known, we could define ZCBFs hi(x) := ϕi(v(x)) that ensure the forward invariance of S. However, when the environment is observable through online range measurements as described in Sec. IV, we only have the estimated SDFs ϕ̃i(v(x); θk) at our disposal to define h̃i(x) := ϕ̃i(v(x); θk). Our next result describes how to use this information to ensure the safety of S. The statement is for a generic h̃ (e.g., a single obstacle). We later particularize our discussion to the case of multiple obstacles.

Proposition 1. Let e1(x) := h(x) − h̃(x) ∈ R be the error at x in the approximation of h, and likewise, let e2(x) := ∇h(x) − ∇h̃(x) ∈ Rn be the error at x in the approximation of its gradient. Assume there are available known functions eh : Rn → R≥0 and e∇h : Rn → R≥0 such that

$$|e_1(x)| \leq e_h(x), \qquad \|e_2(x)\| \leq e_{\nabla h}(x), \qquad (5)$$

with eh(x) → 0 and e∇h(x) → 0 as x → ∂S. Let

$$K_{\tilde h}(x) := \bigl\{ u \in U \mid L_f \tilde h(x) + L_g \tilde h(x) u - \|f(x) + g(x)u\|\, e_{\nabla h}(x) + \alpha_h(\tilde h(x) - e_h(x)) \geq 0 \bigr\}. \qquad (6)$$

Then, any locally Lipschitz continuous controller u : X → U such that u(x) ∈ Kh̃(x) guarantees that the safe set S is forward invariant.

Proof. We start by substituting h(x) = h̃(x) + e1(x) in (4):

$$L_f \tilde h(x) + L_g \tilde h(x) u + e_2(x)^\top f(x) + e_2(x)^\top g(x) u \geq -\alpha_h(\tilde h(x) + e_1(x)).$$

For any fixed x and any errors e1(x) and e2(x) satisfying (5), we need the minimum value of the left-hand side to be greater than the maximum value of the right-hand side to ensure that (4) still holds, namely:

$$\min_{\|e_2(x)\| \leq e_{\nabla h}(x)} \bigl\{ L_f \tilde h(x) + L_g \tilde h(x) u + e_2(x)^\top f(x) + e_2(x)^\top g(x) u \bigr\} \geq \max_{|e_1(x)| \leq e_h(x)} \bigl\{ -\alpha_h(\tilde h(x) + e_1(x)) \bigr\}. \qquad (7)$$

Note that since eh(x) ≥ 0 and αh is an extended class K function, the maximum value in (7) is obtained by:

$$\max_{|e_1(x)| \leq e_h(x)} -\alpha_h(\tilde h(x) + e_1(x)) = -\alpha_h(\tilde h(x) - e_h(x)).$$

The minimum value is attained when e2(x) points in the direction opposite to f(x) + g(x)u, namely

$$\min_{\|e_2(x)\| \leq e_{\nabla h}(x)} \bigl\{ e_2(x)^\top f(x) + e_2(x)^\top g(x) u \bigr\} = -\|f(x) + g(x)u\|\, e_{\nabla h}(x).$$

Therefore, the inequality condition (7) can be rewritten as Lf h̃(x) + Lg h̃(x)u − ||f(x) + g(x)u|| e∇h(x) ≥ −αh(h̃(x) − eh(x)), which is equivalent to the condition in (6).

Proposition 1 allows us to synthesize safe controllers even though the obstacles are not exactly known, provided that error bounds on the approximation of the barrier function and its gradient are available and get better as the robot state gets closer to the boundary of the safe set S.
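To see what membership in Kh̃(x) from (6) entails, consider a single integrator ẋ = u, i.e., f(x) ≡ 0 and g(x) = I, with αh(s) = s. The sketch below (illustrative numbers, not from the paper) checks the robust condition for two candidate inputs:

```python
import numpy as np

def in_K_htilde(u, grad_htilde, h_tilde, e_h, e_grad, alpha=lambda s: s):
    """Check (6) for a single integrator: Lf h = 0, Lg h(x) u = grad^T u,
    and ||f(x) + g(x) u|| = ||u||."""
    return grad_htilde @ u - np.linalg.norm(u) * e_grad + alpha(h_tilde - e_h) >= 0

grad = np.array([1.0, 0.0])  # estimated SDF gradient at the current state
print(in_K_htilde(np.array([-0.5, 0.2]), grad, 0.4, 0.05, 0.1))  # False: rejected
print(in_K_htilde(np.array([0.2, 0.2]), grad, 0.4, 0.05, 0.1))   # True: admissible
```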
B. Control Synthesis via Second-Order Cone Programming

We encode the control objective using the notion of a Lyapunov function. Formally, we assume the existence of a control Lyapunov function, as defined next.

Definition 3 ([6]). A control Lyapunov function (CLF) for the system dynamics in (1) is a continuously differentiable function V : Rn → R≥0 for which there exists a class K function αV such that, for all x ∈ X:

$$\inf_{u \in U} \bigl[ L_f V(x) + L_g V(x) u + \alpha_V(V(x)) \bigr] \leq 0.$$

The function V may be used to encode a variety of control objectives, including for instance path following. We present a specific Lyapunov function for this purpose in Sec. VI. When the barrier function is precisely known, one can combine CLF and CBF constraints to synthesize a safe controller via the following QP:

$$\begin{aligned} \min_{u \in U,\, \delta \in \mathbb{R}} \;\; & \|L(x)^\top (u - \tilde u(x))\|^2 + \lambda \delta^2 \\ \text{s.t.} \;\; & L_f V(x) + L_g V(x) u + \alpha_V(V(x)) \leq \delta \\ & L_f h(x) + L_g h(x) u + \alpha_h(h(x)) \geq 0, \end{aligned} \qquad (8)$$

where ũ(x) is a baseline controller, L(x) is a matrix penalizing control effort, and δ ≥ 0 (with the corresponding penalty λ) is a slack variable, introduced to relax the CLF constraint in order to ensure the feasibility of the QP. The baseline controller ũ(x) is used to specify additional control requirements such as a desirable velocity or orientation (see Sec. VI) but may be set to ũ(x) ≡ 0 if minimum control effort is the main objective. The QP formulation in (8) modifies ũ(x) online to ensure safety and stability via the CBF and CLF constraints.

Without exact knowledge of the barrier function h, we need to replace the CBF constraint in (8) by the condition in (6):

$$\begin{aligned} \min_{u \in U,\, \delta \in \mathbb{R}} \;\; & \|L(x)^\top (u - \tilde u(x))\|^2 + \lambda \delta^2 \\ \text{s.t.} \;\; & L_f V(x) + L_g V(x) u + \alpha_V(V(x)) \leq \delta \\ & L_f \tilde h(x) + L_g \tilde h(x) u + \alpha_h(\tilde h(x) - e_h(x)) \geq \|f(x) + g(x)u\|\, e_{\nabla h}(x). \end{aligned} \qquad (9)$$

The new CBF constraint makes the optimization problem in (9) no longer a quadratic program. However, the following result shows that (9) is a (convex) second-order cone program (SOCP).

Proposition 2. The optimization problem in (9) is equivalent to the following second-order cone program:

$$\begin{aligned} \min_{u \in U,\, \delta \in \mathbb{R},\, l \in \mathbb{R}} \;\; & l \\ \text{s.t.} \;\; & L_f V(x) + L_g V(x) u + \alpha_V(V(x)) \leq \delta \\ & \|f(x) + g(x)u\|\, e_{\nabla h}(x) \leq L_f \tilde h(x) + L_g \tilde h(x) u + \alpha_h(\tilde h(x) - e_h(x)) \\ & \left\| \begin{bmatrix} 2 L(x)^\top (u - \tilde u(x)) \\ 2\sqrt{\lambda}\,\delta \\ l - 1 \end{bmatrix} \right\| \leq l + 1. \end{aligned} \qquad (10)$$

Proof. We first introduce a new variable l so that the problem in (9) is equivalent to

$$\begin{aligned} \min_{u \in U,\, \delta \in \mathbb{R},\, l \in \mathbb{R}} \;\; & l \\ \text{s.t.} \;\; & L_f V(x) + L_g V(x) u + \alpha_V(V(x)) \leq \delta \\ & \|f(x) + g(x)u\|\, e_{\nabla h}(x) \leq L_f \tilde h(x) + L_g \tilde h(x) u + \alpha_h(\tilde h(x) - e_h(x)) \\ & \|L(x)^\top (u - \tilde u(x))\|^2 + \lambda \delta^2 \leq l. \end{aligned} \qquad (11)$$

The last constraint in (11) corresponds to a rotated second-order cone, Q^n_rot := {(xr, yr, zr) ∈ Rn+2 | ||xr||² ≤ yr zr, yr ≥ 0, zr ≥ 0}, which can be converted into a standard SOC constraint [23]:

$$\left\| \begin{bmatrix} 2 x_r \\ y_r - z_r \end{bmatrix} \right\| \leq y_r + z_r.$$

Let yr = l, zr = 1 and consider the constraint ||L(x)⊤(u − ũ(x))||² + λδ² ≤ l. Multiplying both sides by 4 and adding (l − 1)² to both sides makes the constraint equivalent to

$$4\|L(x)^\top (u - \tilde u(x))\|^2 + 4\lambda\delta^2 + (l - 1)^2 \leq (l + 1)^2.$$

Taking a square root on both sides, we end up with

$$\sqrt{\|2 L(x)^\top (u - \tilde u(x))\|^2 + (2\sqrt{\lambda}\,\delta)^2 + (l - 1)^2} \leq l + 1,$$

which is equivalent to the third constraint in (10).
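For completeness, here is a CVXPY sketch of the SOCP in (10), again for a single integrator with U = R²; all problem data are placeholders standing in for the Lie derivatives and error bounds evaluated at the current state:

```python
import cvxpy as cp
import numpy as np

# Placeholder problem data at the current state x (single integrator: f = 0, g = I).
LfV, LgV, aV = 0.0, np.array([0.5, 0.5]), 1.0   # CLF terms and alpha_V(V(x))
Lfh, Lgh = 0.0, np.array([1.0, 0.0])            # CBF terms with h-tilde
ah = 0.35                                       # alpha_h(h_tilde(x) - e_h(x))
e_grad = 0.1                                    # gradient error bound e_grad_h(x)
L, u_ref, lam = np.eye(2), np.array([1.0, 0.0]), 10.0

u, d, l = cp.Variable(2), cp.Variable(), cp.Variable()
constraints = [
    LfV + LgV @ u + aV <= d,                          # relaxed CLF constraint
    e_grad * cp.norm(u, 2) <= Lfh + Lgh @ u + ah,     # robust CBF constraint (SOC)
    cp.norm(cp.hstack([2 * (L.T @ (u - u_ref)), 2 * np.sqrt(lam) * d, l - 1]), 2)
        <= l + 1,                                     # SOC form of the cone in (11)
]
prob = cp.Problem(cp.Minimize(l), constraints)
prob.solve()
print(u.value, d.value)
```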
TABLE I: SDF estimation error (14) of the IT, ITRM, and BT training methods for LiDAR measurements with zero-mean Gaussian range noise with standard deviation σ = 0.01.

Case       | E in IT | E in ITRM | E in BT
1 (Ball)   | 0.0287  | 0.0148    | 0.0140
2 (Sofa)   | 0.0653  | 0.0212    | 0.0215
3 (Table)  | 0.0564  | 0.0230    | 0.0227
4 (Toy)    | 0.1064  | 0.0242    | 0.0198
5 (Dog)    | 0.0328  | 0.0211    | 0.0123
6 (Duck)   | 0.0751  | 0.0102    | 0.0111
7 (Rabbit) | 0.0603  | 0.0134    | 0.0109
8 (Cat)    | 0.0292  | 0.0149    | 0.0123
Average    | 0.0568  | 0.0179    | 0.0147

[Fig. 3 panels: (a) Ball; (b) Sofa; (c) Table; (d) Toy; (e) Dog; (f) Duck; (g) Rabbit; (h) Cat.]
TABLE II: Training time and validation error of different neural network configurations for SDF approximation. The number of network layers, the number of training epochs, and whether a GPU is used are varied. The results are obtained using the settings in Table I for the dog object shown in Fig. 3. The SDF training time includes both pre-processing of the LiDAR data and neural network training. We report the average time per LiDAR scan along the robot path. The SDF validation error is computed via (14).

Training Setting           | Average Training Time | SDF Validation Error
(number of layers, epochs) | GPU     | CPU         | GPU     | CPU
4, 5                       | 0.0744  | 0.197       | 0.0212  | 0.0266
4, 10                      | 0.143   | 0.388       | 0.0215  | 0.0207
4, 16                      | 0.224   | 0.716       | 0.0207  | 0.0218
6, 5                       | 0.0978  | 0.407       | 0.0132  | 0.0215
6, 10                      | 0.193   | 0.731       | 0.0136  | 0.0193
6, 16                      | 0.312   | 1.258       | 0.0149  | 0.0158
8, 5                       | 0.178   | 0.544       | 0.0181  | 0.0208
8, 10                      | 0.325   | 1.127       | 0.0201  | 0.0184
8, 16                      | 0.497   | 1.841       | 0.0189  | 0.0156

[Figure panels: (a) Environment 1; (b) Environment 2; (c) Environment 6; (d) Environment 7.]
The second set of experiments demonstrates safe trajectory tracking using online SDF obstacle estimates to define constraints in the CLF-CBF-SOCP control synthesis optimization in (10). The robot moves along a reference path while avoiding unknown obstacles in 8 different environments, similar to the one shown in Fig. 1. To account for the fact that the robot body is not a point mass, we subtract the robot radius τ = 0.177 from the SDF estimate when defining the CBF: h̃i(x) = ϕ̃i(y; θ) − τ. The error bounds eh(x) on the CBF and e∇h(x) on its gradient are approximated based on Table I. We compare our CLF-CBF-SOCP approach to a CLF-CBF-QP. To account for estimation errors, it is possible to inflate the CBFs in the CLF-CBF-QP formulation by both the robot radius and the SDF approximation error, h̃i(x) = ϕ̃i(y; θ) − τ − eh(x). This preserves linearity of the CBF constraints and leads to a more conservative controller. Comparing with (10), we can see that the CLF-CBF-SOCP formulation accounts for both direct and gradient errors in the CBF approximations, and naturally reduces to a QP if e∇h(x) is zero. To emphasize the importance of accounting for the estimation errors, we compare the trajectories generated by the two controllers using the Fréchet distance:

$$F(A, B) = \inf_{\alpha, \beta} \max_{t \in [0,1]} \bigl\{ d\bigl(A(\alpha(t)), B(\beta(t))\bigr) \bigr\}, \qquad (15)$$

where α, β : [0, 1] → [0, 1] are continuous, non-decreasing surjections and d is, in our case, the Euclidean distance.
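In practice, (15) is evaluated on sampled trajectories. A standard discrete Fréchet distance computation (our sketch, dynamic programming over the two point sequences) is:

```python
import numpy as np

def discrete_frechet(A, B):
    """Discrete Frechet distance between sampled trajectories A (m, 2) and
    B (n, 2), the standard dynamic-programming approximation of (15)."""
    m, n = len(A), len(B)
    ca = np.zeros((m, n))
    ca[0, 0] = np.linalg.norm(A[0] - B[0])
    for i in range(1, m):
        ca[i, 0] = max(ca[i - 1, 0], np.linalg.norm(A[i] - B[0]))
    for j in range(1, n):
        ca[0, j] = max(ca[0, j - 1], np.linalg.norm(A[0] - B[j]))
    for i in range(1, m):
        for j in range(1, n):
            ca[i, j] = max(min(ca[i - 1, j], ca[i - 1, j - 1], ca[i, j - 1]),
                           np.linalg.norm(A[i] - B[j]))
    return ca[-1, -1]
```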
In Fig. 5, the robot can follow the prescribed reference path if no obstacles are nearby. The green trajectory generated by CLF-CBF-SOCP is always more conservative than the pink trajectory since it takes the errors in the CBF estimation into account. When there is an obstacle near or on the reference path, the robot controlled by the CLF-CBF-SOCP controller stays further away from the obstacles than the robot controlled by the CLF-CBF-QP controller. This agrees with the Fréchet distance results presented in Table III. In Fig. 6, we see that the CLF-CBF-QP controller sometimes fails to avoid obstacles because it does not consider the errors in the CBF estimation. In contrast, the CLF-CBF-SOCP controller is guaranteed by Prop. 1 to remain safe as long as the CBF estimation errors are captured correctly in the SOC constraints.
[Fig. 6 panels: (a) Environment 3; (b) Zoom-in of (a); (c) Environment 8; (d) Zoom-in of (c).]

Fig. 6: Simulation results in environments where the CLF-CBF-SOCP controller succeeds but the CLF-CBF-QP controller fails. The robot is shown as a purple circle when crashing into an obstacle using the CLF-CBF-QP controller. Fig. 6b and Fig. 6d show enlargements of the crash regions in Fig. 6a and Fig. 6c, respectively.
TABLE III: Fréchet distance between the reference path and the robot trajectories generated by the CLF-CBF-SOCP and the CLF-CBF-QP controllers (smaller values indicate larger trajectory similarity; N/A indicates that the robot failed to reach the goal region).

Environment | CLF-CBF-SOCP | CLF-CBF-QP
1           | 0.3428       | 0.3442
2           | 0.5339       | 0.3971
3           | 0.8356       | N/A
4           | 0.3968       | 0.3492
5           | 1.0535       | N/A
6           | 0.3611       | 0.3444
7           | 0.4643       | 0.3493
8           | 0.8674       | N/A
In summary, Table III indicates that the trajectory mismatch with respect to the reference path is larger under our CLF-CBF-SOCP controller than under the CLF-CBF-QP controller if they both succeed. However, our approach guarantees safe navigation, while the CLF-CBF-QP controller sometimes causes collisions due to errors in the CBF estimation.

VII. CONCLUSION

We introduced an incremental training approach with replay memory to enable online estimation of signed distance functions and the construction of corresponding safety constraints in the form of control barrier functions. The use of replay memory balances training time with estimation error, making our approach suitable for real-time estimation. We showed that accounting for the direct and gradient errors in the CBF approximations leads to a new CLF-CBF-SOCP formulation for safe control synthesis. Future work will consider capturing localization and robot dynamics errors in the safe control formulation and will apply our techniques to real autonomous navigation experiments.
REFERENCES

[1] O. Khatib, "Real-time obstacle avoidance for manipulators and mobile robots," The International Journal of Robotics Research, vol. 5, no. 1, pp. 90–98, 1986.
[2] E. Rimon and D. E. Koditschek, "Exact robot navigation using artificial potential functions," IEEE Transactions on Robotics and Automation, vol. 8, no. 5, pp. 501–518, 1992.
[3] S. Prajna, "Barrier certificates for nonlinear model validation," in Conference on Decision and Control (CDC), pp. 2884–2889, 2003.
[4] S. Prajna and A. Jadbabaie, "Safety verification of hybrid systems using barrier certificates," in Hybrid Systems: Computation and Control, pp. 477–492, Springer Berlin Heidelberg, 2004.
[5] P. Wieland and F. Allgöwer, "Constructive safety using control barrier functions," in IFAC Proceedings Volumes, pp. 462–467, 2007.
[6] A. Ames, S. Coogan, M. Egerstedt, G. Notomista, K. Sreenath, and P. Tabuada, "Control barrier functions: Theory and applications," in European Control Conference (ECC), pp. 3420–3431, 2019.
[7] J. J. Park, P. Florence, J. Straub, R. Newcombe, and S. Lovegrove, "DeepSDF: Learning continuous signed distance functions for shape representation," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019.
[8] C.-H. Lin, C. Wang, and S. Lucey, "SDF-SRN: Learning signed distance 3D object reconstruction from static images," arXiv preprint arXiv:2010.10505, 2020.
[9] L. Han, F. Gao, B. Zhou, and S. Shen, "FIESTA: Fast incremental Euclidean distance fields for online motion planning of aerial robots," in IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2019.
[10] T. Schaul, J. Quan, I. Antonoglou, and D. Silver, "Prioritized experience replay," in International Conference on Learning Representations (ICLR), 2016.
[11] A. Hornung, K. Wurm, M. Bennewitz, C. Stachniss, and W. Burgard, "OctoMap: An efficient probabilistic 3D mapping framework based on octrees," Autonomous Robots, vol. 34, no. 3, pp. 189–206, 2013.
[12] H. Oleynikova, Z. Taylor, M. Fehr, R. Siegwart, and J. Nieto, "Voxblox: Incremental 3D Euclidean signed distance fields for on-board MAV planning," in IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 1366–1373, 2017.
[13] L. M. Mescheder, M. Oechsle, M. Niemeyer, S. Nowozin, and A. Geiger, "Occupancy networks: Learning 3D reconstruction in function space," in IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4455–4465, 2019.
[14] A. Gropp, L. Yariv, N. Haim, M. Atzmon, and Y. Lipman, "Implicit geometric regularization for learning shapes," in International Conference on Machine Learning (ICML), 2020.
[15] P. Glotfelter, J. Cortés, and M. Egerstedt, "Nonsmooth barrier functions with applications to multi-robot systems," IEEE Control Systems Letters, vol. 1, no. 2, pp. 310–315, 2017.
[16] X. Xu, T. Waters, D. Pickem, P. Glotfelter, M. Egerstedt, P. Tabuada, J. W. Grizzle, and A. D. Ames, "Realizing simultaneous lane keeping and adaptive speed regulation on accessible mobile robot testbeds," in IEEE Conference on Control Technology and Applications (CCTA), pp. 1769–1775, 2017.
[17] M. Srinivasan, A. Dabholkar, S. Coogan, and P. Vela, "Synthesis of control barrier functions using a supervised machine learning approach," arXiv preprint arXiv:2003.04950, 2020.
[18] D. Rolnick, A. Ahuja, J. Schwarz, T. Lillicrap, and G. Wayne, "Experience replay for continual learning," in AAAI Conference on Artificial Intelligence, 2018.
[19] W. E. Lorensen and H. E. Cline, "Marching cubes: A high resolution 3D surface construction algorithm," Computer Graphics, vol. 21, no. 4, pp. 163–169, 1987.
[20] B. Bailey, Z. Ji, M. Telgarsky, and R. Xian, "Approximation power of random neural networks," arXiv preprint arXiv:1906.07709, 2019.
[21] D. Yarotsky, "Error bounds for approximations with deep ReLU networks," Neural Networks, vol. 94, pp. 103–114, 2017.
[22] A. Ames, X. Xu, J. Grizzle, and P. Tabuada, "Control barrier function based quadratic programs for safety critical systems," IEEE Transactions on Automatic Control, vol. 62, no. 8, pp. 3861–3876, 2016.
[23] F. Alizadeh and D. Goldfarb, "Second-order cone programming," Mathematical Programming, vol. 95, no. 1, pp. 3–51, 2003.
[24] E. Coumans and Y. Bai, "PyBullet, a Python module for physics simulation for games, robotics and machine learning," http://pybullet.org, 2016.
[25] J. Cortés and M. Egerstedt, "Coordinated control of multi-robot systems: A survey," SICE Journal of Control, Measurement, and System Integration, vol. 10, no. 6, pp. 495–503, 2017.
[26] A. D. Ames, K. Galloway, K. Sreenath, and J. W. Grizzle, "Rapidly exponentially stabilizing control Lyapunov functions and hybrid zero dynamics," IEEE Transactions on Automatic Control, vol. 59, no. 4, pp. 876–891, 2014.
[27] D. P. Kingma and J. Ba, "Adam: A method for stochastic optimization," arXiv preprint arXiv:1412.6980, 2015.