IEEE ROBOTICS AND AUTOMATION LETTERS, VOL. 8, NO. 4, APRIL 2023 2397

Real-Time Neural MPC: Deep Learning Model Predictive Control for Quadrotors and Agile Robotic Platforms
Tim Salzmann, Elia Kaufmann, Member, IEEE, Jon Arrizabalaga, Graduate Student Member, IEEE, Marco Pavone, Member, IEEE, Davide Scaramuzza, Senior Member, IEEE, and Markus Ryll, Member, IEEE

Abstract: Model Predictive Control (MPC) has become a popular framework in embedded control for high-performance autonomous systems. However, to achieve good control performance using MPC, an accurate dynamics model is key. To maintain real-time operation, the dynamics models used on embedded systems have been limited to simple first-principle models, which substantially limits their representative power. In contrast to such simple models, machine learning approaches, specifically neural networks, have been shown to accurately model even complex dynamic effects, but their large computational complexity hindered combination with fast real-time iteration loops. With this work, we present Real-time Neural MPC, a framework to efficiently integrate large, complex neural network architectures as dynamics models within a model-predictive control pipeline. Our experiments, performed in simulation and the real world onboard a highly agile quadrotor platform, demonstrate the capabilities of the described system to run learned models with previously infeasible, large modeling capacity using gradient-based online optimization MPC. Compared to prior implementations of neural networks in online optimization MPC we can leverage models of over 4000 times larger parametric capacity in a 50 Hz real-time window on an embedded platform. Further, we show the feasibility of our framework on real-world problems by reducing the positional tracking error by up to 82% when compared to state-of-the-art MPC approaches without neural network dynamics.

Fig. 1. Embedded Model Predictive Control using a neural network as learned dynamics model. Naive integration of the neural network in the MPC optimization loop would lead to extensive optimization times (red) resulting in instabilities. Our approach can handle complex larger learning models while being real-time capable (green).
Index Terms: Machine learning for robot control, model learning for control, aerial systems: Mechanics and control.

Manuscript received 3 September 2022; accepted 24 January 2023. Date of publication 20 February 2023; date of current version 15 March 2023. This letter was recommended for publication by Associate Editor G. Loianno and Editor P. Pounds upon evaluation of the reviewers' comments. (Corresponding author: Tim Salzmann.)
Tim Salzmann and Jon Arrizabalaga are with the Technical University of Munich, 85521 Munich, Germany (e-mail: [email protected]; [email protected]).
Elia Kaufmann and Davide Scaramuzza are with the University of Zurich, 8050 Zurich, Switzerland (e-mail: [email protected]; davide.[email protected]).
Marco Pavone is with Stanford University and NVIDIA Research, Stanford, CA 94305 USA (e-mail: [email protected]).
Markus Ryll is with the Technical University of Munich, 85521 Munich, Germany, and also with the Munich Institute of Robotics and Machine Intelligence (MIRMI), 80992 Munich, Germany (e-mail: [email protected]).
This letter has supplementary downloadable material available at https://doi.org/10.1109/LRA.2023.3246839, provided by the authors.
Digital Object Identifier 10.1109/LRA.2023.3246839
2377-3766 © 2023 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See https://www.ieee.org/publications/rights/index.html for more information.

I. INTRODUCTION

Model Predictive Control (MPC) is one of the most popular frameworks in embedded control thanks to its ability to simultaneously address actuation constraints and performance objectives through optimization. Due to its predictive nature, the performance of MPC hinges on the availability of an accurate dynamics model of the underlying system. This requirement is exacerbated by strict real-time constraints, effectively limiting the choice of dynamics models on embedded platforms to simple first-principle models. Combining MPC with a more versatile and efficient dynamics model would allow for an improvement in performance, safety and operation closer to the robot's physical limits.

Precise dynamics modeling of autonomous systems is challenging, e.g. when the platform approaches high speeds and accelerations or when in contact with the environment. Accurate modeling is especially challenging for autonomous aerial systems, as high speeds and accelerations can lead to complex aerodynamic effects [1], and operating in close proximity to obstacles with an aerial vehicle requires modeling of interaction forces, e.g. ground effect. Data-driven approaches, in particular neural networks, demonstrated the capability to accurately model highly nonlinear dynamical effects [1], [2]. However, due to their large computational complexity, the integration of such models into embedded MPC pipelines remains challenging due to high frequency real-time requirements. To overcome this problem prior works have relied on one of two strategies:


I) Largely reducing the model's capacity to the point where a lot of the predictive performance is lost but real-time speeds can be achieved [2], [3], [4], [5], [6]. Commonly, the model is reduced to a Gaussian Process (GP) with few supporting points [3], [4] or small neural networks [5], [6], [7]. Still, these methods are exclusively applied off-device on a powerful CPU.
II) A control strategy different from online optimized MPC is used, which is either non-predictive [8], [9], does not use online optimization [7], [10], [11], or learns the controller end-to-end [12], [13], [14], [15], [16], [17].

In this letter, we present an efficient framework, Real-time Neural MPC (RTN-MPC), that allows for the integration of large-capacity, data-driven dynamics models in online optimization Model Predictive Control and its deployment in real-time on embedded devices (see Fig. 1). Specifically, the framework enables the integration of arbitrary neural network architectures as dynamics constraints into the MPC formulation. To this end, RTN-MPC leverages CPU or GPU parallelized local approximations of the data-driven model. Compared to a naive integration of a deep network into an MPC framework, our approach allows unconstrained model architecture selection, embedded real-time capability for larger models, and GPU acceleration, without a decrease in performance.

Contribution

Our contribution is threefold: First, we formulate the computational paradigm for RTN-MPC, an MPC framework which uses deep learning models in the prediction step. By separating the computationally heavy data-driven model from the MPC optimization we can leverage efficient online approximations which allow for larger, more complex models while retaining real-time capability. Second, we compare and ablate the MPC problem with and without our RTN-MPC paradigm, demonstrating improved real-time capability on CPU, which is further enhanced when GPU processing is available. Finally, we evaluate our approach on multiple simulation-based and real-world experiments using a high speed quadrotor in aggressive and close-to-obstacle maneuvers, all while running large models, multiple orders of magnitude higher in capacity than state-of-the-art algorithms, in a real-time window.

To the best of the authors' knowledge, this is the first approach enabling data-driven models in a real-time on-board gradient-based MPC setting on agile platforms. Further, it scales to large models, vastly extending simple two or three-layer networks, on an off-board CPU or GPU, enabling model sizes deemed unfit for closed-loop MPC [1], [2]. The introduced framework, while demonstrated on agile quadrotors, can be applied broadly and benefit any controlled agile system such as autonomous vehicles or robotic arms.

II. RELATED WORK

With the advent of deep learning, there has been a considerable amount of research that aims to combine the representational capacity of deep neural networks with system modeling and control. In the following, we provide a brief overview of prior work that focuses on learning-based dynamics modeling, and data-driven control.

Data-driven Dynamics Models: Thanks to their ability to identify patterns in large amounts of data, deep neural networks represent a promising approach to model complex dynamics. Previous works that leverage the representational power of deep networks for such modeling tasks include aerodynamics modeling of quadrotors [1], [2], [18] and helicopters [19], turbulence prediction [20], tire friction modeling [21], and actuator modeling [22]. Although these works demonstrated that neural networks can learn system models that capture the peculiarities of real-world robotic systems, they were restricted to simulation-only use cases or employed the network predictions as simple feedforward components in a traditional control pipeline: Saviolo et al. [2] had to revert their accurate physics-inspired model to a simple multi-layer-perceptron for closed-loop control.

Data-Driven Control: Leveraging the power of learned models in embedded control frameworks has been extensively researched in recent years. Most approaches have focused on combining the learned model with a simple reactive control scheme, such as the "Neural Lander" approach [8]. Neural Lander uses a learned model of the aerodynamic ground-effect to substantially improve a set-point controller in near-hover conditions. In [23], a learned recurrent dynamics model formulates a model-based control problem. While this approach allowed the system to adapt online to changing operating conditions, it cannot account for system constraints such as limited actuation input. Recent approaches that integrate the modeling strengths of data-driven approaches in the MPC framework propose the use of Gaussian Processes (GP) as a learned residual model for race cars [4] and quadrotors [3]. For Gaussian Processes, both their complexity and accuracy scale with the number of inducing points and with their dimensionality, limiting their performance on embedded systems. The approaches of Chee et al. [5], Williams et al. [7] and Spielberg et al. [6] follow the approach of [3] but model the quadrotor's dynamics residual using a small neural network for different applications. In Table I, we compare existing data-driven MPC approaches based on their modeling capacity. All state-of-the-art models are severely limited by the small modeling capacity of either GPs with a small number of supporting points or small two- or three-layer neural networks.

TABLE I: COMPARISON OF STATE-OF-THE-ART DATA-DRIVEN MPC ALGORITHMS AND THEIR MODELING CAPACITY USED FOR REAL-TIME (RT) APPLICATIONS

With the rise of deep reinforcement learning (RL), a new class of controllers for robotic systems has emerged that directly maps sensory observations to actions. Popular instances of such RL controllers are imitation learning of an expert controller [12], [13], [16] as well as Model-free and Model-based reinforcement learning [7], [10], [11], [17], [24]. Although such approaches achieve high control frequencies and may outperform online MPC approaches, they commonly require training in simulation, do not allow for tuning without costly retraining, and often discard the optimality, robustness and generalizability of an online optimized MPC framework.


Our work is inspired by [2], [3], [4], [5], [6], [7] but replaces the Gaussian Process dynamics of [3], [4] or the small neural networks of [5], [6] with networks of higher modeling capacity [1], [2] and uses gradient-based optimization as opposed to a sampling-based scheme [7]. The resulting framework allows a combination of the versatile modeling capabilities of deep neural networks with state-of-the-art embedded optimization software without tightly constraining the choice of network architecture.

III. PROBLEM SETUP

In its most general form, MPC solves an optimal control problem (OCP) by finding an input command $u$ which minimizes a cost function $L$ subject to its system dynamics model $\dot{x} = f(x, u)$ while accounting for constraints on input and state variables for current and future timesteps. Traditionally, the model $f$ is manually derived from first principles using "simple" differential-algebraic equations (DAE) which often neglect complicated dynamics effects such as aerodynamics or friction as they are hard or computationally expensive to formalize. Following prior works [2], [3], [4], [5], we partition $f$ into a mathematical combination of first principle DAEs $f_F$ and a learned data-driven model $f_D$. This enables more general models extending the capability of DAE dynamics models. To solve the aforementioned OCP, we approximate it by discretizing the system into $N$ steps of step size $\delta t$ over a time horizon $T$ using direct multiple shooting [25] which leads to the following nonlinear programming (NLP) problem

$$
\begin{aligned}
\min_{u} \quad & \sum_{k=0}^{N-1} L(x_k, u_k) \\
\text{subject to} \quad & x_{k=0} = x_0 \\
& x_{k+1} = \phi(x_k, u_k, f, \delta t) \\
& f(x_k, u_k) = f_F(x_k, u_k) + f_D(x_k, u_k) \\
& g(x_k, u_k) \le 0
\end{aligned}
\tag{1}
$$

where $x_0$ denotes the initial condition and $g$ can incorporate (in-)equality constraints, such as bounds in state and input variables. $\phi$ is the numerical integration routine to discretize the dynamics equation, where commonly a 4th order Runge-Kutta algorithm is used involving $E = 4$ evaluations of the dynamics function $f$. To leverage advancements in embedded solvers, the NLP is optimized using sequential quadratic programming (SQP) with $\omega$ being the SQP iterate $\omega^i = [x^i_0, u^i_0, \ldots, x^i_{N-1}, u^i_{N-1}]$.

IV. BRINGING NEURAL MPC TO ONBOARD REAL-TIME

In this section, we lay down the key concepts to speed up the optimization times of MPC control with neural networks. The key insight in Section IV-A is that local approximations of the learned dynamics are sufficient to keep alike performance while drastically improving the generation process of the optimization problem. This insight is utilized in a three-phased embedded real-time optimization procedure in Section IV-B.

A. Locally Approximated Continuity Quadratic Program

Due to advances in embedded optimization solvers, SQP has become a well-suited framework to efficiently solve NLPs resulting from multiple shooting approximations of OCPs. This involves repetitively approximating and solving (1) as a quadratic program (QP). The solution to the QP leads to an update on the iterate $\omega^{i+1} = \omega^i + \Delta\omega^i$ where the step $\Delta\omega^i$ is given by solving the following QP

$$
\min_{\Delta\omega^i} \; \sum_{k=0}^{N-1}
\begin{bmatrix} q_k \\ r_k \end{bmatrix}^{\top}
\begin{bmatrix} \Delta x_k \\ \Delta u_k \end{bmatrix}
+
\begin{bmatrix} \Delta x_k \\ \Delta u_k \end{bmatrix}^{\top}
H_k
\begin{bmatrix} \Delta x_k \\ \Delta u_k \end{bmatrix}
$$

subject to

$$
\Delta x_{k+1} = A_k \Delta x_k + B_k \Delta u_k + \bar{\phi}_k - x_{k+1}, \quad k = 0, \ldots, N-1,
\tag{2}
$$

$$
-\bar{g}_k \ge G^x_k \Delta x_k + G^u_k \Delta u_k, \quad k = 0, \ldots, N,
\tag{3}
$$

where $q_k = \frac{\delta}{\delta x^i_k} L(x^i_k, u^i_k)$ and $r_k = \frac{\delta}{\delta u^i_k} L(x^i_k, u^i_k)$ linearize the cost function and, under given circumstances, the Hessian $H_k$ can be approximated by the Gauss-Newton algorithm. $\bar{\phi}_k$ and $\bar{g}_k$ are shorthand notations for the function evaluations $\phi(x^i_k, u^i_k, f, \delta t)$ and $g(x^i_k, u^i_k)$. The main computational burden lies in the parameter computation of the continuity condition (2). Specifically, for each shooting node $k = 0, \ldots, N-1$ we need to compute

$$
A_k = \frac{\delta}{\delta x^i_k} \phi(x^i_k, u^i_k, f, \delta t), \quad
B_k = \frac{\delta}{\delta u^i_k} \phi(x^i_k, u^i_k, f, \delta t), \quad
\bar{\phi}_k = \phi(x^i_k, u^i_k, f, \delta t).
$$

This leads to $N \cdot E \cdot 2$ evaluations of the partial differentiations

$$
\delta f(x, u) = \delta f_F(x, u) + \delta f_D(x, u)
$$

and $N \cdot E$ function evaluations

$$
f(x, u) = f_F(x, u) + f_D(x, u)
$$

of the dynamics equation. For computationally heavy data-driven dynamics models $f_D$ this leads to extensive processing times generating the QP.

The learned data-driven dynamics $f_D$ are assumed to be accurate over the entire input space of states and controls present in the training dataset. However, to create the QP continuity condition we only require the model and its differentiations to be accurate in and around specific input values $\omega^i$. Thus, to speed up the QP generation we replace the computationally heavy globally valid data-driven dynamics equation $f_D$ with a computationally light locally valid approximation up to second order around the current iterate

$$
f^*_D(x, u) \approx \bar{f}^i_D
+ J^i_{D,k} \begin{bmatrix} x - x^i_k \\ u - u^i_k \end{bmatrix}
+ \frac{1}{2} \begin{bmatrix} x - x^i_k \\ u - u^i_k \end{bmatrix}^{\top} H^i_{D,k} \begin{bmatrix} x - x^i_k \\ u - u^i_k \end{bmatrix}.
\tag{4}
$$

The required differentiations are readily available as submatrices of $J^i_{D,k}$ for first-order approximations or as submatrices of a tensor multiplication and sum for second-order approximations. The induced error of this computational simplification is of second order for a first-order approximation and of third order for a second-order approximation in the size of state and control changes between nodes. We will experimentally demonstrate in Section VII that this error is negligible for agile platforms where $\delta t$ is small.

Applying (4), the QP creation becomes independent of the complexity and architecture of the data-driven dynamics model.
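The first-order case of the local approximation (4) can be illustrated with a small sketch. Everything in it is an assumption made for illustration: `f_D` is a toy stand-in for an expensive learned residual, and the Jacobian is obtained by central finite differences only to keep the example dependency-free, whereas the letter obtains it by batched differentiation in a machine learning framework. The sketch checks that the approximation is exact at the linearization point and that its error shrinks roughly quadratically with the distance from it.

```python
import math

def f_D(z):
    """Toy stand-in for a learned residual model; z stacks state and input."""
    return [math.tanh(0.5 * z[0] + 0.2 * z[1] * z[2]), math.sin(z[1]) * z[2]]

def jacobian_fd(f, z0, eps=1e-6):
    """Central finite-difference Jacobian of f at z0 (rows: outputs, columns: inputs)."""
    n_out = len(f(z0))
    jac = [[0.0] * len(z0) for _ in range(n_out)]
    for j in range(len(z0)):
        z_plus, z_minus = list(z0), list(z0)
        z_plus[j] += eps
        z_minus[j] -= eps
        f_plus, f_minus = f(z_plus), f(z_minus)
        for i in range(n_out):
            jac[i][j] = (f_plus[i] - f_minus[i]) / (2 * eps)
    return jac

def local_model(f, z0):
    """Build f*(z) = f(z0) + J(z0) (z - z0); evaluating it never calls f again."""
    f0, jac = f(z0), jacobian_fd(f, z0)

    def f_star(z):
        dz = [a - b for a, b in zip(z, z0)]
        return [f0[i] + sum(jac[i][j] * dz[j] for j in range(len(dz)))
                for i in range(len(f0))]

    return f_star

def max_error(f, f_star, z):
    """Largest component-wise gap between the full model and its local approximation."""
    return max(abs(a - b) for a, b in zip(f(z), f_star(z)))

z0 = [0.3, -0.2, 0.5]            # linearization point (x_k^i, u_k^i), hypothetical values
f_star = local_model(f_D, z0)
direction = [1.0, -1.0, 0.5]     # arbitrary direction of state/input change
err_big = max_error(f_D, f_star, [a + 0.10 * d for a, d in zip(z0, direction)])
err_small = max_error(f_D, f_star, [a + 0.05 * d for a, d in zip(z0, direction)])
print(err_big, err_small, err_big / err_small)
```

Halving the step should reduce the error by a factor close to four, which is the second-order behaviour stated above for a first-order approximation.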


Further, with $J^i_{D,k}$ and $H^i_{D,k}$ being the single interfaces between the SQP optimization and the data-driven dynamics model, we are free to optimize the approximation process independent of the NLP framework, passing them as parameters to the continuity condition procedure of the QP generation. As $f_D$ is a neural network model commonly consisting of large matrix multiplications, we are therefore free to use algorithms and hardware optimized for neural network evaluation and differentiation. Those capabilities are readily available in modern machine learning tools such as PyTorch [26] and TensorFlow [27]. This enables us to calculate the Jacobians and Hessians for all shooting nodes $N$ as a single parallelized batch on CPU or GPU.

B. Real-Time Neural MPC

Even without a data-driven dynamics model, solving the SQP until convergence is computationally too costly in real-time for agile robotic platforms. To account for this shortcoming, MPC applications subjected to fast dynamics are commonly solved using a real-time-iteration scheme (RTI) [28], where only a single SQP iteration is executed: one quadratic problem is constructed and solved, as a potentially sub-optimal but timely input command is preferred over an optimal late one. As shown in Fig. 2, RTN-MPC divides the real-time optimization procedure into three parts: QP Preparation Phase, Data-Driven Dynamics Preparation Phase and Feedback Response.

Fig. 2. Data flow for our RTN-MPC algorithm. The data-driven (DD) preparation phase is performed efficiently using optimized machine learning batch-differentiation tools on CPU or GPU.

With available iterate $\omega^i$, the data-driven dynamics preparation phase calculates $\bar{f}^i_D$ and $J^i_{D,k}$ using efficient batched differentiation of the data-driven dynamics on CPU or GPU. Meanwhile, the QP preparation phase constructs a QP by linearizing around state $x^i$ and control $u^i$ using a first-order approximation $f^*_D(x, u)$ for the continuity condition, parametrized by the result of the data-driven dynamics preparation phase.

Once a new disturbed state $x_{k=0}$ is sensed, the feedback response phase solves the pre-constructed QP using the disturbed state as input. The iterate $\omega$ is adjusted with the QP result and the optimized command $u$ is sent to the actuators.

C. Implementation

To demonstrate the applicability of the RTN-MPC paradigm, we provide an implementation¹ using CasADi [29] and acados [30] as the optimization framework and PyTorch [26] as ML framework. This enables the research community to use arbitrary neural network models, trainable in PyTorch and usable in CasADi.

Further, we will compare our RTN-MPC approach against a naive implementation of a neural network data-driven MPC as applied in [2], [5], [6]. Here, the learned model is directly constructed in CasADi in the form of trained weight matrices and activation functions. Subsequently, the QP generation and automatic differentiation engine in CasADi has to deal with the full neural-network structure, for which it is lacking optimized algorithms while being confined to the CPU.

V. RUNTIME ANALYSIS

We demonstrate the computational advantage of our proposed RTN-MPC paradigm compared to a naive implementation of a data-driven dynamics model in online MPC. Thus, we construct an experimental problem in which the nominal dynamics is trivial while the data-driven dynamics can be arbitrarily scaled in computational complexity. As such, the nominal dynamics model is a double integrator on a position $p$ while the data-driven dynamics is a neural network of variable architecture. To solely focus on the computational complexity of the data-driven dynamics, rather than modeling accuracy, the networks are not trained but weights are manually adjusted to force a zero output.

$$
\dot{x} = \begin{bmatrix} \dot{p} \\ \ddot{p} \end{bmatrix} = f_F(x, u) = \begin{bmatrix} \dot{p} \\ u \end{bmatrix}, \qquad
f(x, u) = f_F(x, u) + \overbrace{f_D(x, u)}^{0}.
\tag{5}
$$

We use an explicit Runge-Kutta method of 4th order $\phi(x, u, f, \delta t) = \mathrm{RK4}(x, u, f, \delta t)$ to numerically integrate $f$. In this experiment, we simulate the system without any model-plant-mismatch to focus solely on runtime. The optimization problem is solved by constructing the multiple shooting scheme with $N = 10$ nodes.

Fig. 3. Evaluation of real-time capability for different two-layer model parametric capacities. We evaluate on an embedded platform (Nvidia Jetson Xavier NX) and a laptop machine (Intel i7, Nvidia RTX 3000). Parametric model capacity is approximated by the squared number of neurons per layer. The RTN-MPC framework can run 4000 times larger models in parametric complexity compared to a naive implementation. To make the results comparable, we define a target run-time window of at least 50 Hz (dashed red line) and preferably over 100 Hz (dashed green line). However, in a real-world scenario the real-time window is specific to the use-case.

Fig. 3 compares two-layer networks with increasing neuron count for a naive implementation and our RTN-MPC framework.
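The experimental problem of (5) is small enough to sketch directly. The following is a minimal illustration, not the authors' benchmark code: the step size `dt`, the constant input sequence, and the call counter are assumptions, and `f_D` simply returns zeros in place of an untrained network whose weights force a zero output. It rolls out the $N = 10$ shooting nodes with RK4 and counts the $N \cdot E$ dynamics evaluations that one pass over the horizon requires.

```python
calls = {"f": 0}  # counts evaluations of the full dynamics f

def f_F(x, u):
    """Nominal double integrator: x = [p, p_dot], input u = p_ddot."""
    return [x[1], u]

def f_D(x, u):
    """Stand-in for the zero-output network of the runtime experiment."""
    return [0.0, 0.0]

def f(x, u):
    """Full dynamics f = f_F + f_D."""
    calls["f"] += 1
    return [a + b for a, b in zip(f_F(x, u), f_D(x, u))]

def rk4(x, u, f, dt):
    """Explicit 4th-order Runge-Kutta step, using E = 4 evaluations of f."""
    def shifted(scale, k):
        return [xi + scale * ki for xi, ki in zip(x, k)]
    k1 = f(x, u)
    k2 = f(shifted(dt / 2, k1), u)
    k3 = f(shifted(dt / 2, k2), u)
    k4 = f(shifted(dt, k3), u)
    return [xi + dt / 6 * (a + 2 * b + 2 * c + d)
            for xi, a, b, c, d in zip(x, k1, k2, k3, k4)]

N, dt = 10, 0.1                  # N from the text; dt is a hypothetical choice
x0 = [0.0, 0.0]
us = [1.0] * N                   # hypothetical constant acceleration command

# Forward rollout: x_{k+1} = phi(x_k, u_k, f, dt) for every shooting node.
xs = [x0]
for k in range(N):
    xs.append(rk4(xs[k], us[k], f, dt))
evals_rollout = calls["f"]       # N * E evaluations of f

# Continuity defects x_{k+1} - phi(x_k, u_k); zero for a consistent trajectory.
defects = [max(abs(a - b) for a, b in zip(xs[k + 1], rk4(xs[k], us[k], f, dt)))
           for k in range(N)]
print(xs[-1], evals_rollout, max(defects))
```

With a heavy network in place of `f_D`, each of these evaluations (and twice as many differentiations) would pass through the full model, which is the cost the local approximation avoids.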
¹ Framework Code: https://github.com/TUM-AAS/ml-casadi

On an embedded system, such as the Nvidia Jetson Xavier NX, our approach enables models that are larger by a factor of 60 in parametric complexity on CPU and by a factor of 4000 on GPU while staying


within a real-time window above 50 Hz. Running on a desktop, which is the current default in data-driven MPC research [2], [3], [5], [21], we can run two-layer models with more than 150 million parameters above 100 Hz on a low-end GPU (Nvidia RTX3000).

We further evaluate the runtime of a broad range of deep learning architectures in Table II. While the naive approach has better runtime for small networks, our approach dominates for larger and deeper networks, enabling a network with 12 layers of 512 neurons each to run above 50 Hz on an embedded CPU and above 500 Hz on a desktop CPU. To demonstrate that complex network architectures are easily integrated in the MPC loop using RTN-MPC, we run a full CNN ResNet model [31] with 18 convolutional layers in the optimization loop above 50 Hz when leveraging the GPU capabilities of our framework.

TABLE II: RUNTIME COMPARISON BETWEEN NAIVE IMPLEMENTATION AND RTN-MPC

VI. EXPERIMENTAL SETUP

While the RTN-MPC framework described in Section III can be applied to a variety of robotic applications, we will use agile quadrotor flight maneuvers to showcase its potential for real-world problems².

Notation: Scalars are denoted in lowercase $s$, vectors in lowercase bold $\mathbf{v}$, and matrices in uppercase bold $\mathbf{M}$. Coordinate frames such as the World $W$ and Body $B$ frames are defined with orthonormal basis, i.e. $\{x_W, y_W, z_W\}$, with the Body frame being located at the center of mass of the quadrotor (see Fig. 4). A vector from coordinate $p_1$ to $p_2$ expressed in the $W$ frame is written as ${}_W v_{12}$. If the vector's origin coincides with the frame it is described in, the frame index is dropped, e.g. the quadrotor position is denoted as $p_{WB}$. Orientations are represented using unit quaternions $q = (q_w, q_x, q_y, q_z)$ with $\|q\| = 1$, such as the attitude state of the quadrotor body $q_{WB}$. Finally, full SE(3) transformations, such as changing the frame of reference from Body to World for a point $p_{B1}$, are described by ${}_W p_{B1} = {}_W t_{WB} + q_{WB} \odot p_{B1}$. Note the quaternion-vector product denoted by $\odot$, representing a rotation of the vector $v$ by the quaternion $q$ as in $q \odot v = q v \bar{q}$, where $\bar{q}$ is the quaternion's conjugate.

Fig. 4. Quadrotor model with world and body frames and propeller numbering convention. Grey arrows indicate the spinning direction of the individual rotors.

Nominal Quadrotor Dynamics Model: The nominal dynamics assume the quadrotor to be a 6 degree-of-freedom rigid body of mass $m$ and diagonal moment of inertia matrix $J = \mathrm{diag}(J_x, J_y, J_z)$. Our model is similar to [3], [32], [33] as we write the nominal dynamics $\dot{x}$ up to second order derivatives, leaving the quadrotor's individual rotor thrusts $T_i \; \forall \, i \in (0, 3)$ as control inputs $u \in \mathbb{R}^4$. The state space is thus 13-dimensional and its dynamics can be written as:

$$
\dot{x} = \begin{bmatrix} \dot{p}_{WB} \\ \dot{q}_{WB} \\ \dot{v}_{WB} \\ \dot{\omega}_B \end{bmatrix}
= f_F(x, u) =
\begin{bmatrix}
v_W \\
q_{WB} \cdot \begin{bmatrix} 0 \\ \omega_B / 2 \end{bmatrix} \\
\frac{1}{m} \, q_{WB} \odot T_B + g_W \\
J^{-1} (\tau_B - \omega_B \times J \omega_B)
\end{bmatrix}
\tag{6}
$$

with $g_W = [0, 0, -9.81\,\mathrm{m/s^2}]$ denoting Earth's gravity, $T_B$ the collective thrust and $\tau_B$ the body torque. Again, an explicit Runge-Kutta integration of 4th order is used.

Augmented Aerodynamic Residual Models: Following previous works [3], [4], we use the data-driven model, in the form of a neural network $\mathcal{N}$, to complement the nominal dynamics by modeling a residual. In its full configuration, our residual dynamics model is defined as

$$
f(x, u) = f_F(x, u) + f_D(x, u), \qquad
f_D(x, u) = \begin{bmatrix} 0_2 \\ f_{D_\theta}(x, u) \\ f_{D_\psi}(x, u) \end{bmatrix},
\tag{7}
$$

where we individually account for disturbances in linear and angular accelerations unknown to the nominal dynamics, and $\theta$ and $\psi$ are the parameters of the neural networks modeling linear and angular disturbances respectively.

We also evaluate two simplified versions of the residual model:

$$
f_{D_a}(x, u) = \begin{bmatrix} 0_2 \\ f_{D_\theta}(v_B) \\ 0 \end{bmatrix}, \qquad
f_{D_{a,u}}(x, u) = \begin{bmatrix} 0_2 \\ f_{D_\theta}(v_B, u) \\ 0 \end{bmatrix}.
\tag{8}
$$

These simplified models only consider residual forces as a function of the platform's velocity (left), potentially accompanied by the commanded inputs (right).
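The nominal model (6) and the quaternion-vector product from the notation paragraph can be written out in a few lines. This is a sketch under stated assumptions rather than the authors' implementation: the body torque is passed in directly because the mapping from rotor thrusts to torque depends on the arm geometry, which is not given here, and the mass and inertia values in the usage lines are hypothetical.

```python
import math

G_W = (0.0, 0.0, -9.81)  # gravity in the world frame, m/s^2

def quat_mul(a, b):
    """Hamilton product of quaternions given as (w, x, y, z)."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (aw * bw - ax * bx - ay * by - az * bz,
            aw * bx + ax * bw + ay * bz - az * by,
            aw * by - ax * bz + ay * bw + az * bx,
            aw * bz + ax * by - ay * bx + az * bw)

def rotate(q, v):
    """Quaternion-vector product q (.) v = q v q_conj for a unit quaternion q."""
    w, x, y, z = q
    return quat_mul(quat_mul(q, (0.0, *v)), (w, -x, -y, -z))[1:]

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def f_F(state, rotor_thrusts, torque_B, m, J_diag):
    """State derivative of (6); state = [p (3), q (4), v (3), omega (3)]."""
    q, v, omega = tuple(state[3:7]), state[7:10], state[10:13]
    T_B = (0.0, 0.0, sum(rotor_thrusts))          # collective thrust along body z
    q_dot = [0.5 * c for c in quat_mul(q, (0.0, *omega))]
    thrust_W = rotate(q, T_B)
    v_dot = [thrust_W[i] / m + G_W[i] for i in range(3)]
    J_omega = [J_diag[i] * omega[i] for i in range(3)]
    gyro = cross(omega, J_omega)
    omega_dot = [(torque_B[i] - gyro[i]) / J_diag[i] for i in range(3)]
    return list(v) + q_dot + v_dot + omega_dot

# Usage: a level quadrotor whose rotors exactly cancel gravity should not accelerate.
m, J_diag = 1.0, (0.03, 0.03, 0.06)               # hypothetical mass and inertia
hover = [0.0] * 3 + [1.0, 0.0, 0.0, 0.0] + [0.0] * 6
x_dot = f_F(hover, [m * 9.81 / 4] * 4, (0.0, 0.0, 0.0), m, J_diag)

# A 90 degree yaw rotation maps the body x-axis onto the world y-axis.
yaw90 = (math.cos(math.pi / 4), 0.0, 0.0, math.sin(math.pi / 4))
x_axis_rotated = rotate(yaw90, (1.0, 0.0, 0.0))
```

A residual model as in (7) or (8) would add its output to the `v_dot` and `omega_dot` entries of this derivative.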
² Experimental Code: https://github.com/TUM-AAS/neural-mpc

Augmented Ground Effect Model: To show the strength of our approach, leveraging a complex arbitrary high level input, we extend the residual model using a height map under the quadrotor


as additional input to model the ground effect.

$$
f_{D_g}(x, u) = \begin{bmatrix} 0_2 \\ f_{N_\theta}\big(x, u, z_{WB} \cdot \mathbf{1} - h_l(p_{WB}, H_W)\big) \\ 0 \end{bmatrix}
$$

where $z_{WB}$ is the altitude of the quadrotor and $h_l$ is a mapping $h_l : \mathbb{R}^3 \times \mathbb{R}^{N \times M} \to \mathbb{R}^{3 \times 3}$ which takes the quadrotor's position $p_{WB}$ and a fixed or sensed global height map $H_W$ of size $N \times M$ as input. The function returns a $3 \times 3$ local patch of the height map around the quadrotor's position with a resolution of 10 cm.

TABLE III: RESULTS FOR THE SIMPLIFIED SIMULATION EXPERIMENT
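The local patch lookup $h_l$ and the network input $z_{WB} \cdot \mathbf{1} - h_l(p_{WB}, H_W)$ can be sketched as follows. The grid layout is an assumption: the height map is taken to be a list of rows with cell `[0][0]` at a hypothetical world `origin`, and indices are clamped at the map border, a choice the text does not specify.

```python
def local_patch(p_xy, height_map, resolution=0.10, origin=(0.0, 0.0)):
    """Return the 3 x 3 patch of height_map centred on the cell under p_xy."""
    rows, cols = len(height_map), len(height_map[0])
    r = int(round((p_xy[0] - origin[0]) / resolution))
    c = int(round((p_xy[1] - origin[1]) / resolution))

    def clamp(i, n):
        # Assumed border handling: repeat the outermost cell.
        return min(max(i, 0), n - 1)

    return [[height_map[clamp(r + dr, rows)][clamp(c + dc, cols)]
             for dc in (-1, 0, 1)]
            for dr in (-1, 0, 1)]

def relative_height(z, patch):
    """z * 1 - h_l: clearance between the quadrotor altitude and each patch cell."""
    return [[z - h for h in row] for row in patch]

# Usage: a flat 5 x 5 map (0.5 m x 0.5 m) with a 0.5 m high box in its centre cell.
H_W = [[0.0] * 5 for _ in range(5)]
H_W[2][2] = 0.5
patch = local_patch((0.2, 0.2), H_W)
clearance = relative_height(1.0, patch)   # quadrotor flying at 1 m altitude
corner_patch = local_patch((0.0, 0.0), H_W)
```

The nine clearance values would be appended to the state and input as additional entries of the residual network's input.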
MPC Cost Formulation: We specify the cost in (1) to be of quadratic form $L(x, u) = \|x - x_r\|^2_Q + \|u - u_r\|^2_R$, penalizing deviations from a reference trajectory $x_r$, $u_r$, and account for input limitations by constraining $0 \le u \le u_{\max}$.
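As a worked illustration of this stage cost, the sketch below evaluates the two weighted squared norms and checks the input bounds. Diagonal weight matrices $Q$ and $R$ are an assumption made to keep the example short, and all numeric values are hypothetical. In the MPC itself the bounds are constraints handled by the solver, so the feasibility check only shows what those constraints express.

```python
def weighted_sq_norm(error, weights_diag):
    """||e||^2_W for a diagonal weight matrix W given by its diagonal."""
    return sum(w * e * e for w, e in zip(weights_diag, error))

def stage_cost(x, u, x_ref, u_ref, q_diag, r_diag):
    """L(x, u) = ||x - x_r||^2_Q + ||u - u_r||^2_R with diagonal Q and R."""
    state_error = [a - b for a, b in zip(x, x_ref)]
    input_error = [a - b for a, b in zip(u, u_ref)]
    return (weighted_sq_norm(state_error, q_diag)
            + weighted_sq_norm(input_error, r_diag))

def inputs_feasible(u, u_max):
    """Check the box constraint 0 <= u <= u_max for every input component."""
    return all(0.0 <= ui <= u_max for ui in u)

# Usage: state error (1, 2) weighted by Q = diag(1, 2), input error 0.5 weighted by R = 4.
cost = stage_cost([1.0, 2.0], [0.5], [0.0, 0.0], [0.0], [1.0, 2.0], [4.0])
ok = inputs_feasible([0.0, 2.5, 5.0], u_max=5.0)
too_large = inputs_feasible([0.0, 5.5], u_max=5.0)
```

Here the cost evaluates to 1·1² + 2·2² + 4·0.5² = 10.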

VII. EXPERIMENTS

In our experiments we will re-validate the findings of previous works [2], [5] that using neural-network data-driven models in MPC improves tracking performance compared to no data-driven models or Gaussian Processes. More importantly, however, we will demonstrate that RTN-MPC enables the use of larger network capacities to fully exhaust possible performance gains while providing real-time capabilities.

All our experiments are divided into two phases: system identification and evaluation. During system identification, we collect data using the nominal dynamics model in the MPC controller. The state-control time series are further processed into tuples of subsequent states and controls. Each step is then re-simulated using the nominal controller and the error is used as the training label for the residual model.

During evaluation we track two fixed evaluation trajectories, Circle and Lemniscate, and measure the performance based on the reference position tracking error. As such, we report the (Mean) Euclidean Distance between the reference trajectory and the tracked trajectory as error.

To identify model architectures used in the experiments we use a naming convention stating the model type followed by the size and the implementation type, where we differentiate between our RTN-MPC approach (-Ours) and a naive integration (-Naive). N-3-32-Ours is a neural network model with 3 hidden layers of 32 neurons each using our RTN-MPC framework, and N-3-32-Naive is the same model using a naive integration. GP-20 is a Gaussian Process Model with 20 inducing points.

All of our learned dynamic models are trained with a batch size of 64 and a learning rate of 1e−4 using the Adam optimizer. We split all datasets into a training and validation part and train the models using early stopping on the validation set. Dataset sizes are 20 k datapoints for the simple simulation environment, 200 k for the BEM simulation environment, and we use the openly available dataset presented in [1] with 1.8 million datapoints for the real-world experiment.

When comparing against GPs we follow the original implementation of [3] for the $f_{D_a}$ model configuration where one single-input-single-output GP is trained per dimension. For the $f_{D_{a,u}}$ configuration, their implementation is extended to a multi-input-single-output GP per dimension.

A. Simulation

We use two simulation environments featuring varying mod- [...] a non-augmented MPC controller, a naive integration of data-driven dynamics [2], [5], [6], and GPs [3] with respect to real-time capability and model capacity.

Simplified Quadrotor Simulation: We use the simulation framework described in [3], where perfect odometry measurements and ideal tracking of the commanded single rotor thrusts are assumed. Drag effects by the rotors and fuselage are simulated, as well as zero mean ($\sigma = 0.005$) constant Gaussian noise on forces and torques, and zero mean Gaussian noise on motor voltage signals with standard deviation proportional to the input magnitude $\sigma = 0.02 \sqrt{u}$. There are no run-time constraints as controller and simulator are run sequentially in simulated time. Using the simplified simulation, we analyze the predictive performance and run-time of our approach for varying network sizes and directly compare to the naive implementation and Gaussian Process approach. We constrain the residual model to linear accelerations $f_{D_a}$ to facilitate comparison with prior work [3]. To fairly evaluate the run-times of our full and distributed approach, and considering the limited resources of embedded systems, this experiment was performed on a single CPU core. The results are depicted in Table III. We also compare with a Nominal model where no learned residuals are modeled in the dynamics function and with an oracle-like Perfect model which uses the same dynamics equations as the simulation (excluding noise). Neural networks which achieve accurate modeling performance on the simulated dynamics are integrated easily with real-time optimization times below 3 ms using our approach, while they have high optimization times (up to 36 ms) when a naive integration approach is used. The local approximations described in Section IV-A do not negatively influence performance compared to a naive implementation. Furthermore, we demonstrate that such modeling performance is not reachable with a GP even when using a large number of supporting points.

BEM Quadrotor Simulation: In addition to the simplified simulation setting, we also evaluate our approach in a highly accurate aerodynamics simulator based on Blade-Element-Momentum-Theory (BEM) [1]. In contrast to the simplified simulation setting, this simulation can accurately model lift and drag produced by each rotor from the current ego-motion of the platform and the individual rotor speeds. The simulator runs in real-time and communicates with the controller via the
eling accuracy and real-time requirements to compare against Robot Operating System (ROS). We target a real-time control

frequency of 100 Hz. We want to understand how our approach copes with increasing parameter count and model complexity of the learning task: First, we change the learned dynamics from just modeling linear acceleration residuals $f_{Da}$ with velocity as inputs to also accounting for rotor commands $f_{Da,u}$. In a second step, we model the full residual $f_D$, additionally outputting residuals on angular accelerations. The results obtained in each of these settings are illustrated in Fig. 5. While a naive approach can accurately model the residuals, its control frequency quickly declines for increasingly complex models. For larger networks, it has excessively high optimization times, leading to the controller becoming unstable even in simulation. In contrast, our RTN-MPC approach can leverage both higher modeling capacity and the most representative residual model $f_D$ for on-par performance while running above 200 Hz.

TABLE IV
RESULTS FOR THE REAL-WORLD EXPERIMENT

Fig. 5. Control Frequency over Tracking Error for Lemniscate trajectory in the realistic BEM simulation - top-left is desired. Our approach (blue) can leverage multidimensional inputs and large model capacities while being real-time capable. Increasing the naive approach to a four layer network (orange) leads to the controller becoming unstable for high-dimensional input. No additional noise is simulated, leading to error standard deviations within 1 mm over 5 trials per experiment, induced by non-deterministic ROS transportation times.
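
To put the timings reported for the simulation experiments in perspective, the following minimal Python sketch converts an MPC optimization time into the highest control rate it could sustain and checks it against the 100 Hz target. It is an illustration only: the helper names are hypothetical, the 3 ms and 36 ms inputs are the bounds quoted for the simplified simulation, and the check assumes that the solve time is the only cost per control step (no communication or state-estimation latency).

```python
def control_budget_ms(target_hz: float) -> float:
    """Time available per control step, in milliseconds, at a given control rate."""
    if target_hz <= 0:
        raise ValueError("target frequency must be positive")
    return 1000.0 / target_hz


def max_control_frequency_hz(optimization_time_ms: float) -> float:
    """Upper bound on the control rate if one MPC solve takes optimization_time_ms."""
    if optimization_time_ms <= 0:
        raise ValueError("optimization time must be positive")
    return 1000.0 / optimization_time_ms


def meets_target(optimization_time_ms: float, target_hz: float = 100.0) -> bool:
    """True if a single solve fits into the per-step budget of the target rate."""
    return optimization_time_ms <= control_budget_ms(target_hz)


# Illustrative inputs: optimization-time bounds quoted in the text
# (assumption: they are used here as if they were per-step solve times).
timings_ms = {"RTN-MPC (below 3 ms)": 3.0, "naive integration (up to 36 ms)": 36.0}

for label, t_ms in timings_ms.items():
    rate = max_control_frequency_hz(t_ms)
    print(f"{label}: at most {rate:.1f} Hz, meets 100 Hz target: {meets_target(t_ms)}")
```

Under these assumptions a 3 ms solve fits comfortably into the 10 ms budget of a 100 Hz loop, whereas a 36 ms solve caps the loop at roughly 28 Hz, which is consistent with the observation that the naive integration cannot keep up for larger networks.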
B. Real World

Finally, we perform experiments evaluating the real-world effectiveness of our approach by performing a set of agile trajectories using the physical quadrotor platform agilicious [34]. Control commands in the form of desired collective thrust and body rates are computed on a Jetson Xavier NX and are tracked by a low-level PID controller. All real-world flight experiments are performed in an instrumented tracking arena that provides accurate pose estimates at 400 Hz. As in the simulation experiments, we compare the tracking error along both circle and lemniscate trajectories at speeds up to 14 m/s. We evaluate our approach against the nominal controller, the naive integration, and the Gaussian Process configuration deployed in [3]. The results of these experiments are depicted in Table IV, where we improve positional tracking error by up to 82% compared to the nominal controller while the naive integration becomes unstable due to a long optimization time. Furthermore, we outperform Gaussian Processes by up to 55%.

Ground Effect: Finally, we demonstrate the generalizability of our approach to other use-cases, modeling the complex aerodynamics of the ground effect using a height map as input (see Section VI). We place a table of 70 cm height in the flight arena and collect data by repeatedly flying over the table in close proximity. During evaluation, we fly repeated trajectories over the table with a target altitude of 80 cm of the quadrotor's center of gravity, leaving approximately 2 cm between the table and the lowest point of the quadrotor (battery). To isolate the performance of our approach in compensating for ground effect, we evaluate the trained model in two configurations: first, one in which the height map information is unknown to the model (Baseline), and second, one in which the information is known to the model. On an evaluation trajectory with 8 flyovers we improve the tracking error in z direction by 72% in close proximity (table plane +10 cm in xy) above the table. A visualization of a single flyover can be seen in Fig. 6.

Fig. 6. (a) Quadrotor overflying the table in close proximity to the plane. (b) Vertical position error over distance. Vertical lines mark the position of the table. Our approach can model the aerodynamic effects in close proximity to the ground, substantially limiting the tracking error in z.

VIII. CONCLUSION

In this work we demonstrated an approach to scale the modeling capacity of data-driven MPC using neural networks to larger, more powerful architectures while being real-time capable on embedded devices. Our framework can improve new and existing applications of data-driven MPC by increasing the available real-time modeling capacity, making our approach generalizable to a variety of control applications.
An open challenge, which is not yet considered in this work, but the authors plan to tackle in the future, is to use a historic sequence of states and control inputs in a learned dynamics model. This would naturally lead to incorporating sequential and temporal models (such as LSTMs, GRUs, and TCNs) in the optimization loop using our approach and would give rise to running approaches currently only feasible in simulation [1] in real-time embedded MPC.

We experimentally show that the controller's performance is not negatively affected by the real-time inducing approximations. Thus, this method overcomes the limitation of having to sacrifice performance for efficiency as described in previous works [2], [5]. We demonstrate its usefulness by evaluating the isolated real-time capability of RTN-MPC on different devices and applying the framework to the challenging problem of trajectory tracking of a highly agile quadrotor, reducing the tracking error substantially while using powerful models on-device.

ACKNOWLEDGMENT

We would like to thank Matteo Zallio for his help in visually communicating our work.

REFERENCES

[1] L. Bauersfeld, E. Kaufmann, P. Foehn, S. Sun, and D. Scaramuzza, “NeuroBEM: Hybrid aerodynamic quadrotor model,” in Proc. Robot.: Sci. Syst., 2021. [Online]. Available: http://arxiv.org/abs/2106.08015
[2] A. Saviolo, G. Li, and G. Loianno, “Physics-inspired temporal learning of quadrotor dynamics for accurate model predictive trajectory tracking,” IEEE Robot. Automat. Lett., vol. 7, no. 4, pp. 10256–10263, Oct. 2022. [Online]. Available: https://ieeexplore.ieee.org/document/9834096/
[3] G. Torrente, E. Kaufmann, P. Foehn, and D. Scaramuzza, “Data-driven MPC for quadrotors,” IEEE Robot. Automat. Lett., vol. 6, no. 2, pp. 3769–3776, Apr. 2021. [Online]. Available: http://arxiv.org/abs/2102.05773
[4] J. Kabzan, L. Hewing, A. Liniger, and M. N. Zeilinger, “Learning-based model predictive control for autonomous racing,” IEEE Robot. Automat. Lett., vol. 4, no. 4, pp. 3363–3370, Oct. 2019.
[5] K. Y. Chee, T. Z. Jiahao, and M. A. Hsieh, “KNODE-MPC: A knowledge-based data-driven predictive control framework for aerial robots,” IEEE Robot. Automat. Lett., vol. 7, no. 2, pp. 2819–2826, Apr. 2022. [Online]. Available: https://ieeexplore.ieee.org/document/9691797/
[6] N. A. Spielberg, M. Brown, and J. C. Gerdes, “Neural network model predictive motion control applied to automated driving with unknown friction,” IEEE Trans. Control Syst. Technol., vol. 30, no. 5, pp. 1934–1945, Sep. 2022. [Online]. Available: https://ieeexplore.ieee.org/document/9638389/
[7] G. Williams et al., “Information theoretic MPC for model-based reinforcement learning,” in Proc. Int. Conf. Robot. Automat., 2017, pp. 1714–1721. [Online]. Available: https://ieeexplore.ieee.org/document/7989202/
[8] G. Shi et al., “Neural lander: Stable drone landing control using learned dynamics,” in Proc. Int. Conf. Robot. Automat., 2019, pp. 9784–9790. [Online]. Available: http://arxiv.org/abs/1811.08027 http://dx.doi.org/10.1109/ICRA.2019.8794351
[9] M. Faessler, A. Franchi, and D. Scaramuzza, “Differential flatness of quadrotor dynamics subject to rotor drag for accurate tracking of high-speed trajectories,” IEEE Robot. Automat. Lett., vol. 3, no. 2, pp. 620–626, Apr. 2018.
[10] K. Chua, R. Calandra, R. McAllister, and S. Levine, “Deep reinforcement learning in a handful of trials using probabilistic dynamics models,” in Proc. Neural Inf. Process. Syst., pp. 4754–4765, 2018. [Online]. Available: http://arxiv.org/abs/1805.12114
[11] N. O. Lambert, D. S. Drew, J. Yaconelli, S. Levine, R. Calandra, and K. S. J. Pister, “Low-level control of a quadrotor with deep model-based reinforcement learning,” IEEE Robot. Automat. Lett., vol. 4, no. 4, pp. 4224–4230, Oct. 2019. [Online]. Available: https://ieeexplore.ieee.org/document/8769882/
[12] E. Maddalena, C. da S. Moraes, G. Waltrich, and C. Jones, “A neural network architecture to learn explicit MPC controllers from data,” IFAC-PapersOnLine, vol. 53, no. 2, pp. 11362–11367, 2020. [Online]. Available: https://linkinghub.elsevier.com/retrieve/pii/S2405896320308442
[13] J. Nubert, J. Köhler, V. Berenz, F. Allgöwer, and S. Trimpe, “Safe and fast tracking on a robot manipulator: Robust MPC and neural network control,” IEEE Robot. Automat. Lett., vol. 5, no. 2, pp. 3050–3057, Apr. 2020.
[14] D. Wang et al., “Model predictive control using artificial neural network for power converters,” IEEE Trans. Ind. Electron., vol. 69, no. 4, pp. 3689–3699, Apr. 2022.
[15] R. Winqvist, A. Venkitaraman, and B. Wahlberg, “On training and evaluation of neural network approaches for model predictive control,” 2020, arXiv:2005.04112.
[16] E. Kaufmann, A. Loquercio, R. Ranftl, M. Müller, V. Koltun, and D. Scaramuzza, “Deep drone acrobatics,” in Proc. 13th Int. Joint Conf. Artif. Int., Z. H. Zhou, Ed. Aug. 2021, pp. 4780–4783, doi: 10.24963/ijcai.2021/650.
[17] M. Henaff, A. Canziani, and Y. LeCun, “Model-predictive policy learning with uncertainty regularization for driving in dense traffic,” in Proc. Int. Conf. Learn. Representations, 2019. [Online]. Available: https://openreview.net/forum?id=HygQBn0cYm
[18] S. Bansal, A. K. Akametalu, F. J. Jiang, F. Laine, and C. J. Tomlin, “Learning quadrotor dynamics using neural network for flight control,” in Proc. IEEE Conf. Decis. Control Inst. Elect. Electron. Eng. Inc, 2016, pp. 4653–4660.
[19] A. Punjani and P. Abbeel, “Deep learning helicopter dynamics models,” in Proc. Int. Conf. Robot. Automat., 2015, pp. 3223–3230.
[20] Z. Li, N. B. Kovachki, K. Azizzadenesheli, K. Bhattacharya, A. Stuart, and A. Anandkumar, “Fourier neural operator for parametric partial differential equations,” in Proc. Int. Conf. Learn. Representations, 2020.
[21] N. A. Spielberg, M. Brown, N. R. Kapania, J. C. Kegelman, and J. C. Gerdes, “Neural network vehicle models for high-performance automated driving,” Sci. Robot., vol. 4, no. 28, 2019, Art. no. eaaw1975.
[22] J. Hwangbo et al., “Learning agile and dynamic motor skills for legged robots,” Sci. Robot., vol. 4, no. 26, 2019, Art. no. eaau5872.
[23] I. Lenz, R. Knepper, and A. Saxena, “DeepMPC: Learning deep latent features for model predictive control,” in Proc. Robot.: Sci. Syst. Robot.: Sci. Syst. Found., 2015. [Online]. Available: http://www.roboticsproceedings.org/rss11/p12.pdf
[24] O. M. Andrychowicz et al., “Learning dexterous in-hand manipulation,” Int. J. Robot. Res., vol. 39, no. 1, pp. 3–20, 2020.
[25] H. Bock and K. Plitt, “A multiple shooting algorithm for direct solution of optimal control problems,” IFAC Proc. Volumes, vol. 17, no. 2, pp. 1603–1608, 1984. [Online]. Available: https://linkinghub.elsevier.com/retrieve/pii/S1474667017612059
[26] A. Paszke et al., “PyTorch: An imperative style, high-performance deep learning library,” in Proc. 33rd Int. Conf. Neural Inf. Process. Syst., 2019, pp. 8026–8037.
[27] M. Abadi et al., “TensorFlow: Large-scale machine learning on heterogeneous distributed systems,” 2016, arXiv:1603.04467.
[28] M. Diehl, H. Bock, J. P. Schlöder, R. Findeisen, Z. Nagy, and F. Allgöwer, “Real-time optimization and nonlinear model predictive control of processes governed by differential-algebraic equations,” J. Process Control, vol. 12, no. 4, pp. 577–585, 2002. [Online]. Available: https://linkinghub.elsevier.com/retrieve/pii/S0959152401000233
[29] J. A. E. Andersson, J. Gillis, G. Horn, J. B. Rawlings, and M. Diehl, “CasADi: A software framework for nonlinear optimization and optimal control,” Math. Program. Computation, vol. 11, no. 1, pp. 1–36, 2019. [Online]. Available: http://link.springer.com/10.1007/s12532-018-0139-4
[30] R. Verschueren et al., “acados: A modular open-source framework for fast embedded optimal control,” Math. Program. Comput., vol. 14, no. 1, pp. 147–183, 2021. [Online]. Available: http://arxiv.org/abs/1910.13753
[31] F. Ramzan et al., “A deep learning approach for automated diagnosis and multi-class classification of Alzheimer’s disease stages using resting-state FMRI and residual neural networks,” J. Med. Syst., vol. 44, pp. 1–16, 2020.
[32] D. Falanga, P. Foehn, P. Lu, and D. Scaramuzza, “PAMPC: Perception-aware model predictive control for quadrotors,” in Proc. IEEE/RSJ Int. Conf. Intell. Robots Syst., 2018, pp. 1–8. [Online]. Available: https://ieeexplore.ieee.org/document/8593739/
[33] M. Kamel, T. Stastny, K. Alexis, and R. Siegwart, “Model predictive control for trajectory tracking of unmanned aerial vehicles using robot operating system,” in Proc. Robot Operating Syst., 2017, pp. 3–39.
[34] P. Foehn et al., “Agilicious: Open-source and open-hardware agile quadrotor for vision-based flight,” Sci. Robot., vol. 7, no. 67, 2022, Art. no. eabl6259.