DeepLearningControl

This paper explores the use of neural networks to learn quadrotor dynamics for flight control, aiming to synthesize controllers for various trajectories beyond those used in training. It demonstrates that a neural network can effectively model the nonlinear couplings between translational and rotational motions, allowing for control of a quadrotor even on unfamiliar trajectories. The authors validate their approach through experiments, showing that the learned dynamics model can generalize well to new flight conditions.

Uploaded by

Dr. RAMPRASADH.C

Learning Quadrotor Dynamics Using Neural Network for Flight Control

Somil Bansal∗ Anayo K. Akametalu∗ Frank J. Jiang Forrest Laine Claire J. Tomlin

The authors are with the Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, CA 94720. {somil, kakametalu, forrest.laine, fjiang6o2, tomlin}@eecs.berkeley.edu
∗ Both authors contributed equally to this work. This work is supported by the NSF CPS project ActionWebs under grant number 0931843, NSF CPS project FORCES under grant number 1239166, and by ONR under the HUNT, SMARTS and Embedded Humans MURIs, and by AFOSR under the CHASE MURI. The research of A.K. Akametalu has received funding from the UC Berkeley Chancellor’s Fellowship.

Abstract: Traditional learning approaches proposed for controlling quadrotors or helicopters have focused on improving performance for specific trajectories by iteratively improving upon a nominal controller, for example learning from demonstrations, iterative learning, and reinforcement learning. In these schemes, however, it is not clear how the information gathered from the training trajectories can be used to synthesize controllers for more general trajectories. Recently, the efficacy of deep learning in inferring helicopter dynamics has been shown. Motivated by the generalization capability of deep learning, this paper investigates whether a neural network based dynamics model can be employed to synthesize control for trajectories different than those used for training. To test this, we learn a quadrotor dynamics model using only translational and only rotational training trajectories, each of which can be controlled independently, and then use it to simultaneously control the yaw and position of a quadrotor, which is non-trivial because of nonlinear couplings between the two motions. We validate our approach in experiments on a quadrotor testbed.

Fig. 1: A picture of Crazyflie 2.0 quadrotor flying during one of our experiments.

I. INTRODUCTION

System identification, the mathematical modeling of a system’s dynamics, is one of the most basic and important components of control. Constructing an appropriate model is often the first step in designing a controller. Modeling accuracy, therefore, directly impacts controller success and performance, as inaccuracies in the model appear to the controller as external disturbances.

Quadrotors have recently emerged as a popular platform for unmanned aerial vehicle (UAV) research, due to the simplicity of their construction and maintenance. Quadrotors can be highly maneuverable, and have the potential to hover, take off, fly and land in small areas due to a vertical take off and landing (VTOL) capability [8]. A quadrotor has four rotors located at the four corners of a cross frame, and is controlled by changing the speed of rotation of the four rotors [4], [21]. However, the system is under-actuated, nonlinear and difficult to control on aggressive trajectories. Most of the work in this area focuses on designing controllers that are derived from a linearization of the model around hover conditions and are stable only under reasonably small roll and pitch angles. While advanced control methods such as feedback linearization [20], adaptive control [13], sliding-mode control [22], H∞ robust control [16] have been developed, the performance of these control schemes depends heavily on the underlying model. In [8], the authors present an in-depth study of some of the advanced aerodynamics effects that can affect quadrotor flight, like blade flapping and effect of airflow. These effects, however, are hard to model and hence difficult to take into account while designing a controller. To circumvent these modeling issues, data-driven, learning-based control schemes have also been proposed (see [23], [6] and references therein). An interesting approach has been presented in [1] to successfully perform advanced aerobatics on a helicopter under autonomous control using apprenticeship learning. In this approach, a helicopter is flown on a trajectory repeatedly, and a target trajectory for control and time-varying dynamics are estimated from them. Together, these trajectories allow for successful control of the helicopter through advanced aerobatics. One limitation of the approaches above is that they are limited to designing a controller for specific trajectories; for a new trajectory, one has to learn the controller again from scratch.

The apprenticeship learning approach, however, indicates that the difficulty in modeling helicopter dynamics does not come from stochasticity in the system or unstructured noise in the demonstrations [1]. Rather, the presence of unobserved states causes simple models to be inaccurate, even though repeatability in the system dynamics is preserved across repetitions of the same maneuver. One can thus use system data to model these dynamics directly in the entire state space rather than for specific trajectories.

One potential approach can be to model such dynamics using neural networks. Neural networks (NN) are known to be universal function approximators; their structure allows them to model highly nonlinear functions and unobserved states directly from the observed data, which might in general be hard to model directly [12]. Moreover, they can learn a
generalized model that can be extended beyond the observed data. Motivated by this, the authors in [14] propose a NN model to learn local unmodeled dynamics for a helicopter in different parts of the state space; thus, one need not learn the unmodeled dynamics for a specific trajectory. However, it is not clear if the proposed NN-based model can be used to control the system, and if the learned dynamics accurately represent the system beyond the data it was trained on. In this paper, we answer these practically important questions and investigate (i) whether a highly nonlinear dynamics model given by a NN can be effectively used to design a controller for a quadrotor and (ii) whether it is general enough to be used to design a controller for the trajectories that the network was not trained on.

For this purpose, we collect state-input data of a nano-quadrotor Crazyflie 2.0 by flying it on the trajectories that consist of translational or rotational motion, but not both. We next train a feed-forward Rectified-Linear Unit (ReLU) NN to learn the state-space dynamics of Crazyflie. To test the generalization capabilities of the trained NN, we use the learned NN model to control the quadrotor on a trajectory that consists of a simultaneous translational and rotational motion. A non-zero yaw angle introduces highly nonlinear couplings in the rotational and translational dynamics, and is the primary reason why quadrotor position control is generally studied with yaw regulated to zero, and vice versa [2], [18]. Thus, to successfully perform such a motion, the NN needs to infer these couplings from the individual translational and rotational trajectories it was trained on, and yet the model should be simple enough to design a controller.

Our main contributions are:
• learning the dynamics of a quadrotor using a NN that is simple enough to be used for control purposes, but complex enough to accurately model system dynamics;
• demonstrating that the current state-input data is sufficient to learn the dynamics to a good accuracy;
• showing that a NN can generalize the dynamics to learn nonlinear couplings between translational and rotational motions, even when the training data does not capture these couplings significantly; thus, the NN model can be used to fly the trajectories it was not trained on.

II. QUADROTOR SYSTEM IDENTIFICATION

In this section, we introduce our general quadrotor model and formulate the system identification problem for a quadrotor system. Consider a dynamical system with state vector $s$ and control inputs $u$. The goal of the system identification process is to find a function $f$ which maps from state-control space to state-derivative:

\[ \dot{s} = f(s, u; \alpha), \]

where the system model is parameterized by $\alpha$. The system identification task then becomes to find, given input and state data, parameters $\alpha$ that minimize the prediction error. Note that for a physics-based model, $\alpha$ generally captures the physical properties of the system (for example, mass, moment of inertia, etc. for a quadrotor); however, for a NN based model, the parameters can be thought of as the degrees of freedom a NN has for learning different nonlinear functions.

The quadrotor system is modeled as a rigid body with a twelve dimensional state vector $s := \begin{bmatrix} p & v & \zeta & \omega \end{bmatrix}$, which includes the position $p = (x, y, z)$ in a North-East-Down inertial reference frame $I$, linear velocities $v = (\dot{x}, \dot{y}, \dot{z})$ in $I$, attitude (orientation) represented by Euler angles $\zeta = (\phi, \theta, \psi)$, and angular velocities $\omega = (\omega_x, \omega_y, \omega_z)$ expressed in the body-fixed coordinate frame $B$ of the quadrotor. The Euler angles parameterize the coordinate transformation from $I$ to $B$ with the standard yaw-pitch-roll convention, i.e. a rotation by $\psi$ about the z-axis in the inertial frame, followed by a rotation of $\theta$ about the y-axis of the body-fixed frame, and finally another rotation of $\phi$ about the x-axis in the new body-fixed frame. This is written compactly as

\[ {}^{B}_{I}R(\phi, \theta, \psi) = R_x(\phi) R_y(\theta) R_z(\psi), \tag{1} \]

where $R_x$, $R_y$, and $R_z$ are basic 3 × 3 rotation matrices about their respective axes.

The system is controlled via four inputs $u := \begin{bmatrix} u_1 & u_2 & u_3 & u_4 \end{bmatrix}$, where $u_1$ is the thrust along the z-axis in $B$, and $u_2$, $u_3$ and $u_4$ are rolling, pitching and yawing moments respectively, all in $B$.¹ The system evolves according to dynamics:

\[ \dot{s} = \begin{bmatrix} \dot{p} \\ \dot{v} \\ \dot{\zeta} \\ \dot{\omega} \end{bmatrix} = f(s, u; \alpha) = \begin{bmatrix} v \\ f_v(s, u; \alpha_1) \\ \hat{R}\omega \\ f_\omega(s, u; \alpha_2) \end{bmatrix}, \tag{2} \]

where the system model is parameterized by $\alpha := (\alpha_1, \alpha_2)$. In Section III, we explain how $f_v$ and $f_\omega$ are exactly parameterized by $\alpha_1$ and $\alpha_2$, and how we can determine these parameters. Note that $\dot{\zeta} \neq \omega$ in general. $\dot{\zeta}$, or Euler rates as they are called, can be obtained by rotating the angular velocities to the inertial frame [2], [8] and are given by:

\[ \dot{\zeta} = \hat{R}\omega, \quad \hat{R} = \begin{bmatrix} 1 & \sin\phi \tan\theta & \cos\phi \tan\theta \\ 0 & \cos\phi & -\sin\phi \\ 0 & \dfrac{\sin\phi}{\cos\theta} & \dfrac{\cos\phi}{\cos\theta} \end{bmatrix}. \tag{3} \]

The unknown components in (2) are $f_v$ and $f_\omega$, the linear (or translational) and angular (or rotational) acceleration that the quadrotor undergoes, which we aim to approximate with a NN as a function of state, control, and model parameters. The system identification task for the quadrotor is thus to determine $\alpha_1$ (resp. $\alpha_2$), given observed values of $f_v$ (resp. $f_\omega$), $s$, and $u$. In this work, we minimize mean squared prediction error (MSE) over a training set of collected data, solving

\[ \min_{\alpha_1} \sum_{t=1}^{T} \frac{1}{T} \left\| \tilde{f}_{v,t} - f_v(s_t, u_t; \alpha_1) \right\|^2, \tag{4} \]

where $\tilde{f}_{v,t}$ are the observed values of $f_v$. A similar optimization problem can be defined for $f_\omega$. Depending on the forms of $f_v$ and $f_\omega$, (4) results in a linear or a nonlinear least squares problem.

¹ These inputs are generated by varying the angular speeds of the four propellers, which map linearly to the inputs.
III. NEURAL NETWORK MODEL

In this section, we present a neural network architecture to solve the system identification problem in (4) and compute the parameters $\alpha_1$, $\alpha_2$ that minimize the MSE between predicted and observed data.

As more and more data is being produced, and more and more computational power continues to become available, an important opportunity lies in harnessing data towards autonomy. In recent years, the fields of computer vision and speech processing have not only made significant leaps forward, but also rapidly increased their rate of progress, largely thanks to developments in deep learning [10], [12]. Thus far the impact of deep learning has largely been in supervised learning. In supervised learning, each (training) example is a pair consisting of an input value (e.g., images) and a desired output value (e.g., ‘cat’, ‘dog’, etc. depending on what is in the image). After learning on the training data, the system is expected to make correct predictions for future (unseen) inputs. Supervised learning can thus also be thought of as a direct high dimensional regression (or classification).

Motivated by these advances, we train a multiple layer NN (i.e., “deep learn” a NN) using supervised learning to predict the next state of the system based on current state and input. Our design is motivated by [14], wherein the authors deep learn the helicopter dynamics with a Rectified Linear Unit (ReLU) Network Model. A ReLU network model is a two-layer NN, consisting of a hidden layer and an output layer, where the rectified-linear transfer function is used in the hidden layer. Algebraically, the model can be written as:

\[ f_v(\beta; \alpha_1) := w^T \phi(W^T \beta + B) + b, \tag{5} \]

where $f_v$ represents the unknown linear acceleration component in (2), which is modeled by a NN whose input is given by $\beta := (s, u) \in \mathbb{R}^{|\beta|}$. The NN has a hidden layer with $N$ units with weight matrix $W \in \mathbb{R}^{|\beta| \times N}$ and bias vector $B \in \mathbb{R}^{N}$, and a linear output layer of 3 units with weight matrix $w \in \mathbb{R}^{N \times 3}$ and bias vector $b \in \mathbb{R}^{3}$. $\phi$ represents the activation (or transfer) function of hidden units (also called ReLU activation function) and is given by $\phi(\cdot) = \max(0, \cdot)$.

The architecture of the NN is presented in Figure 2, which can be interpreted as follows: the input layer takes in the current state and input of the system. Each of $N$ hidden units computes the inner product of $\beta$ and one of the columns of $W$. The hidden units add a bias $B$ to the inner product and rectify this value at zero. The output layer is a linear combination of the hidden units, plus a final bias $b$. Intuitively, each hidden unit linearly partitions the input space into two parts based on $W$ and $B$. In one part the unit is inactive with zero output, while in the other it is active with positive output. Together, all hidden units partition the state space into polytopes. In each of these polytopes, the model has flexibility to learn the local dynamics.

[Figure 2: network diagram. The input $\beta$ passes through the Hidden Layer (weights $W^T$, bias $B$, giving $W^T \beta + B$) and the Output Layer (weights $w^T$, bias $b$, giving $w^T \phi(W^T \beta + B) + b$) to produce $f_v$.]

Fig. 2: The neural network architecture used to learn $f_v$. The NN consists of two layers, a hidden ReLU layer and an output layer. The parameters to be learned during the training process are $\alpha_1 = (W, w, B, b)$. A similar architecture was used to learn $\alpha_2$ (and hence $f_\omega$).

The goal of the training process is to determine (or “learn”) the parameters $\alpha_1 := (W, B, w, b)$ that minimize the MSE between the predicted acceleration $f_v$ and the observed acceleration $\tilde{f}_v$ subject to (5). A similar NN architecture is used to learn $f_\omega$. Once the training process is complete, we can obtain a model for $f_v$ and $f_\omega$ by plugging in the optimal $\alpha_1$ and $\alpha_2$, obtained during training, in (5). We, however, defer the exact details of the used hyperparameters and the training process until section V-B.

Remark 1: Note that we only feed the current state and input in the network, and not any information on the past states and inputs unlike [14]. Although giving the past state-input information will allow the NN to learn a more complex (and potentially more accurate) system dynamics model, it will also make it harder to design a controller for the resultant dynamics. So a simple input structure is chosen to make sure that the NN can be effectively employed to design a feedback controller.

IV. CONTROL DESIGN

In this section, we aim to design a controller for the quadrotor system in (2) to stabilize it on complex trajectories that involve both rotational and translational motions, such as a sinusoid-yaw trajectory (i.e., a trajectory where a quadrotor is flying on a sinusoid in the position coordinates (for example XY plane) while also yawing).

In general, in a trajectory tracking problem, it may be impossible to exactly track a given desired trajectory due to limits imposed by the constraints on the system or when the trajectory is not dynamically feasible. This can happen, for example, for complex high dimensional systems such as quadrotors where it is relatively straightforward to specify the desired position and angular trajectories, but non-trivial to specify the linear and angular velocities such that the overall trajectory satisfies the system dynamics. It is therefore common practice to first compute a reference trajectory, which is the closest trajectory to the given desired trajectory satisfying system dynamics, and then, the optimal reference trajectory is tracked instead [17]. We discuss the reference trajectory calculation in section IV-A. The stabilization of the quadrotor on the reference trajectory is discussed in IV-B.

A. Computation of a Feasible Reference

Once we have computed $f_v$ and $f_\omega$ during the NN training phase, the full model of the quadrotor can be obtained from
(2). Our goal in this section is to compute a dynamically feasible reference given a desired trajectory and the full system model (2). Since the system is controlled at discrete time points in our experiments, for ease of presentation we consider a discrete time approximation of (2):

\[ s(n+1) = s(n) + f(s(n), u(n); \alpha)\,\Delta t, \tag{6} \]

where $n$ indexes the time step, $\Delta t$ is the sampling period, $s(n)$ and $u(n)$ are the state and input of the quadrotor at time $n\Delta t$, and $\alpha := (\alpha_1, \alpha_2)$ are the parameters learned during the NN training. Given a horizon $N_H$ and a desired trajectory over that horizon $s_d^{N_H} := \{s_d(0), s_d(1), \ldots, s_d(N_H)\}$ our goal is to find a control signal $u^{N_H} := \{u(0), u(1), \ldots, u(N_H)\}$ that will achieve the desired trajectory when applied to the quadrotor.

In most cases the desired trajectory may not be dynamically feasible, so no such control signal exists. Instead, we look for a dynamically feasible trajectory that is “as close as possible” to the desired trajectory. We thus want to solve the following optimization problem:

\[ \begin{aligned} \underset{s^{N_H},\, u^{N_H}}{\operatorname{argmin}} \quad & \sum_{n=0}^{N_H} \left\| s(n) - s_d(n) \right\|^2 \\ \text{s.t.} \quad & s(n+1) - s(n) = f(s(n), u(n); \alpha)\,\Delta t, \quad n = 0, \ldots, N_H - 1 \end{aligned} \tag{7} \]

In words, we want to find the trajectory that minimizes the Euclidean distance to the desired trajectory, and the control that achieves such a trajectory. Since the NN output (5) is nonlinear, $f$ is nonlinear; therefore, the above optimization problem is a non-convex problem. In this paper, we use the sequential convex optimization (SCP) procedure proposed in [19] to solve this non-convex optimization problem. SCP solves a non-convex problem by repeatedly constructing a convex subproblem, an approximation to the problem around the current iterate $x$. A local convex approximation of the non-convex constraints is added along with a penalty coefficient in the objective function. This subproblem can be efficiently solved using convex solvers and used to generate a step $\Delta x$ that makes progress on the original problem. The penalty coefficient is then adjusted during the optimization to ensure that the constraint violation is driven to zero. For more details on the optimization procedure, we refer the interested readers to [19].

Remark 2: In our experience, solving the above optimization problem is very challenging even for relatively simpler NN structures (like in (5)) because of highly nonlinear outputs of neural networks. Moreover, this complexity increases further with more complex network structures. This is another reason for choosing a simple NN structure in our analysis.

B. Linear Quadratic Regulator (LQR)

Let us define the solution to (7) as $s_*^{N_H} := \{s_*(0), s_*(1), \ldots, s_*(N_H)\}$ and $u_*^{N_H} := \{u_*(0), u_*(1), \ldots, u_*(N_H - 1)\}$. In practice, applying the control signal $u_*^{N_H}$ could yield a trajectory that significantly differs from $s_*^{N_H}$ due to (any remaining) mismatch between the model used in the optimization and actual dynamics of the quadrotor and unmodeled disturbances. This can be mitigated via feedback. LQR is a well-known state feedback scheme for designing stabilizing controllers for trajectory tracking in linear systems. The technique can also be extended to nonlinear systems by linearizing the dynamics of the system about the desired trajectory. To stabilize the quadrotors on a (feasible) reference trajectory, we use an LQR feedback controller designed for the near hover model of quadrotor, along with a reference rotation before applying the feedback to correct for a non-zero yaw.

A quadrotor is said to be in the hover condition if the plane of the rotors is perpendicular to the vertical and it is at zero velocity relative to some inertial frame. The quadrotor position and orientation in space can be modified from the hover condition by varying the speeds of the motors from their hover speed. An approximate linear model can be derived that accurately represents the quadrotor dynamics for small perturbations from the hover state [3], [2]. Since the quadrotor hover dynamics are derived around zero yaw, they no longer accurately represent the dynamics of the quadrotor for non-zero yaw states. In fact, there are highly nonlinear couplings between translational and rotational accelerations for non-zero yaw [18]. However, for any given non-zero yaw, one can consider a different inertial frame such that the yaw is zero with respect to the new inertial frame. In this inertial frame, the quadrotor is in near hover condition and hence one can use the LQR controller designed for the near hover model to control the system. We formalize this control scheme below.

Given a dynamically feasible reference trajectory $s_*^{N_H}$, and nominal (open-loop) signal $u_*^{N_H}$, define the error dynamics $\bar{s}(n) = s(n) - s_*(n)$ and compensation input $\bar{u}(n) = u(n) - u_*(n)$. Also, let the system dynamics be given by $s(n+1) = As(n) + Bu(n)$. The dynamics of the error signal can thus be obtained by:

\[ \bar{s}(n+1) = A\bar{s}(n) + B\bar{u}(n). \tag{8} \]

Subject to the dynamic constraints (8), the objective in LQR is to find a controller that minimizes the following quadratic cost starting from a given error state $\bar{s}_0$,

\[ J_N(\bar{s}_0) = \sum_{n=0}^{N} \bar{s}(n)^T Q \bar{s}(n) + \bar{u}(n)^T R \bar{u}(n), \tag{9} \]

where $Q$ and $R$ are positive definite matrices of appropriate size, and $\bar{s}(0) = \bar{s}_0$. The minimum cost to go $J_N^*$ can be solved recursively via dynamic programming, which also yields a time-invariant state feedback matrix $K$ that takes in the error state and outputs the appropriate compensation control. The closed loop control is thus given by

\[ u(n) = u_*(n) + K(s(n) - s_*(n)). \tag{10} \]

For our system, $s_*^{N_H}$, $u_*^{N_H}$ are obtained from SCP in Section IV-A. $A$ and $B$ matrices are obtained from the linear near-hover model proposed in [3], [2]. A feedback controller can thus be obtained by solving (9) subject to (8), which
is a convex problem. However, when yaw is non-zero, the near hover model in (8) is no longer valid and hence state feedback law given in (10) will no longer be able to stabilize the system. So at every time-step, we first rotate our inertial frame to another inertial frame in which yaw is zero.

Let the error at step $n$ be given by $\bar{s}(n) := (\bar{p}(n), \bar{v}(n), \bar{\zeta}(n), \bar{\omega}(n))$, where $\bar{p}(n) = (\bar{x}(n), \bar{y}(n), \bar{z}(n))$ represents the error in position, and $(\bar{v}(n), \bar{\zeta}(n), \bar{\omega}(n))$ similarly represent errors in linear velocity, orientation and angular velocity respectively. Also, let the yaw at time $n$ be $\psi(n)$. The rotation of the inertial frame is thus equivalent to rotating the error in $(x, y)$ position and $(\dot{x}, \dot{y})$ by a rotation matrix as follows:

\[ \begin{bmatrix} \bar{x}_R(n) \\ \bar{y}_R(n) \end{bmatrix} = T_R \begin{bmatrix} \bar{x}(n) \\ \bar{y}(n) \end{bmatrix}, \quad \begin{bmatrix} \dot{\bar{x}}_R(n) \\ \dot{\bar{y}}_R(n) \end{bmatrix} = T_R \begin{bmatrix} \dot{\bar{x}}(n) \\ \dot{\bar{y}}(n) \end{bmatrix}, \tag{11} \]

where the rotation matrix is given by

\[ T_R = \begin{bmatrix} \cos(\psi(n)) & \sin(\psi(n)) \\ -\sin(\psi(n)) & \cos(\psi(n)) \end{bmatrix}. \tag{12} \]

The corresponding rotated error vector is $\bar{s}_R(n) := (\bar{p}_R(n), \bar{v}_R(n), \bar{\zeta}(n), \bar{\omega}(n))$, where $\bar{p}_R(n) = (\bar{x}_R(n), \bar{y}_R(n), \bar{z}(n))$ and $\bar{v}_R(n) = (\dot{\bar{x}}_R(n), \dot{\bar{y}}_R(n), \dot{\bar{z}}(n))$. Our overall state feedback control is thus given by:

\[ u(n) = u_*(n) + K \bar{s}_R(n). \tag{13} \]

Remark 3: Since the control law in (13) is derived using near-hover and only yaw rotation assumption, it is not the optimal feedback control when roll and pitch angles change significantly; however, this simple control scheme works well in practice and has been employed to fly an inverted pendulum in [5] as well as by us in all our experiments. Nevertheless, more sophisticated control techniques can be developed based on the system dynamics [18].

Remark 4: Note that the state feedback control law in (13) is good at error correction only when an accurate open-loop state $s_*^{N_H}$ and control $u_*^{N_H}$ are provided. Since the open-loop control depends heavily on the system model, the tracking with (13) can only be as good as the system dynamics model itself (for more details see Section V-C). In particular, for sinusoid-yaw reference trajectories, the NN should be able to learn the couplings between translational and rotational motion for a good tracking.

C. Crazyflie 2.0 and On-board PD Controller

To use the feedback control law in (13), we need the full state $s(n)$. However, in practice, this information is obtained from different sensors which might run at different frequencies and hence the control update rate is limited by the frequency of the slowest sensor. This frequency, however, might not be enough to effectively control the system, and a low-level controller is thus required in practice to control the system between the two updates of the feedback loop. In this section, we first provide more details about our experiment testbed Crazyflie 2.0 and different sensors, and then design a low-level PD controller to control the system between the feedback updates.

[Figure 3: Controller Block Diagram, showing the VICON system, IMU sensors, LQR controller, PD controller, motors, and reference trajectory, split between the Crazyflie 2.0 and the Ground Station.]

Fig. 3: Control block diagram used to stabilize Crazyflie during experiments. At the ground station, LQR is running at 100Hz. On-board the Crazyflie, PD controller is running at 250Hz. Together they are able to stabilize the Crazyflie.

The Crazyflie 2.0 is an open source nano quadrotor platform developed by Bitcraze. Its small size, low cost, and robustness make it an ideal platform for testing new control paradigms. Recently it has been used to exemplify aggressive flight in cluttered environments and for human robot interaction research [11], [9]. We use Crazyflie to collect training data as well as for the sinusoid-yaw experiments in this paper. We retrofit the quadrotor with reflective markers to allow for accurate position and velocity estimation via the VICON motion capture system at 100Hz. Furthermore, Crazyflie is equipped with an on-board inertial measurement unit (IMU) that provides orientation and angular velocity measurements at 250 Hz. VICON and IMU together thus provide the 12 dimensional state of the system and we can use the feedback control in (13) to stabilize the Crazyflie around the reference trajectory; however, our experiments indicate that this control scheme is not fast enough to keep the system stable. To overcome this problem, we implemented an on-board PD controller (proposed in [11]), which takes into account only the angular position and angular velocities that are available at a higher frequency of 250 Hz.

\[ \begin{bmatrix} u_2 \\ u_3 \\ u_4 \end{bmatrix} = K_p \begin{bmatrix} \phi - \phi_{des} \\ \theta - \theta_{des} \\ \psi - \psi_{des} \end{bmatrix} + K_d \begin{bmatrix} \omega_x - \omega_{x,des} \\ \omega_y - \omega_{y,des} \\ \omega_z - \omega_{z,des} \end{bmatrix}, \tag{14} \]

where $K_p$ and $K_d$ are 3 × 3 matrices, $(\phi_{des}, \theta_{des}, \psi_{des})$ is the desired attitude, and $(\omega_{x,des}, \omega_{y,des}, \omega_{z,des})$ is the desired angular rate. Together LQR (100Hz) and PD controller (250Hz) are able to stabilize the Crazyflie around most of the trajectories. Note that we also did the same augmentation on our system in (2) so that the new inputs to the system are now $\hat{u} := (u_1, \phi_{des}, \theta_{des}, \psi_{des}, \omega_{x,des}, \omega_{y,des}, \omega_{z,des})$, where mapping between inputs is given by (14). The reference trajectory as well as the feedback law is thus computed for the augmented system. The full block diagram of our controller is shown in Figure 3.

V. EXPERIMENTS

In this section, we present the results of our experiments. We also discuss the data collection process to train the neural networks and the hyper-parameters used for training.
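Since the experiments rely on the two-rate control scheme of Section IV, a minimal NumPy sketch of the yaw-rotated feedback law (11)-(13) and the on-board PD law (14) may help fix ideas. It is a sketch under stated assumptions rather than the authors' implementation: the function names, the flat state ordering (p, v, zeta, omega) with yaw at index 8, and any gain values passed in are hypothetical, and the sign convention for K simply follows (13) as written.

```python
import numpy as np

def rotate_error(s_bar, psi):
    """Apply T_R from (12) to the (x, y) position and velocity errors, as in (11)."""
    T_R = np.array([[np.cos(psi), np.sin(psi)],
                    [-np.sin(psi), np.cos(psi)]])
    s_bar_R = np.array(s_bar, dtype=float)  # copy; z, attitude and rate errors stay unchanged
    s_bar_R[0:2] = T_R @ s_bar_R[0:2]       # position error (x, y)
    s_bar_R[3:5] = T_R @ s_bar_R[3:5]       # velocity error (x_dot, y_dot)
    return s_bar_R

def lqr_feedback(s, s_ref, u_ref, K):
    """Control law (13): u = u_ref + K @ s_bar_R, using the current yaw from s."""
    s = np.asarray(s, dtype=float)
    s_bar = s - np.asarray(s_ref, dtype=float)
    psi = s[8]  # assumed layout: zeta = (phi, theta, psi) occupies indices 6..8
    return np.asarray(u_ref, dtype=float) + K @ rotate_error(s_bar, psi)

def pd_moments(zeta, zeta_des, omega, omega_des, K_p, K_d):
    """PD law (14): returns (u2, u3, u4) from attitude and angular-rate errors."""
    att_err = np.asarray(zeta, dtype=float) - np.asarray(zeta_des, dtype=float)
    rate_err = np.asarray(omega, dtype=float) - np.asarray(omega_des, dtype=float)
    return K_p @ att_err + K_d @ rate_err
```

In a setup like the one described above, `lqr_feedback` would run at the slower ground-station rate and `pd_moments` at the faster on-board rate; the gains themselves would have to come from an LQR design and from tuning, neither of which is shown here.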
A. Data Collection Model Error

Roll acceleration (rad/s 2)


2
To collect data for training, we flew Crazyflie au- 1
tonomously on a variety of trajectories, for example sinusoids
0
in XY, XZ and YZ planes (but no yaw), and fixed position
yaw-rotations, as well as manually on unstructured flights. -1

For these flights, we recorded the state (s) and input (û) data. -2
0 2 4 6 8 10
Note that since we do not necessarily care about how closely time(s) Measured
we track a trajectory during the data collection process, the NN output
4

Y acceleration (m/s 2)
feedback controller discussed in Section IV-B is sufficient
2
to fly Crazyflie directly on the desired trajectories (that is,
no feasible reference calculation is required); however, in 0
general, the entire data can also be collected manually with -2
experts flying the system.
-4
A picture of Crazyflie flying during one of our experiments 0 2 4 6 8 10
is shown in Figure 1. One of our experimental videos can be time(s)

found at: https://www.youtube.com/watch?v=QREeZvHg0lQ.


Fig. 4: Observed and predicted values for the roll and y
For communication between the ground station and
accelerations. The NNs are able to learn the acceleration
Crazyflie, we used the Robot Operating System (ROS)
models fairly accurately even with just the current state and
framework [15], [7]. In total there are 2400 seconds of
input, indicating that the past states and inputs may not be
flight time recorded, which correspond to 240, 000 (s, û) data
required to learn the dynamics, and hence are avoided in this
samples. Note that since we augment the system with a PD
work to keep the control design simple.
controller, we collect û during the flights, as opposed to u.
B. Neural Network Training
We next train two NNs (denoted as NN1 and NN2 from here on) using the collected data to learn the linear and angular acceleration components fv and fω such that the MSE in (4) is minimized. Before training the NNs, we follow a few data pre-processing steps:
• Since we collect û (input to the PD-augmented system), we first derive u (input to the quadrotor system in (2)) from û using (14), and use it as the input to the NNs along with the state information. This is to make sure that the learned system dynamics are independent of the control scheme.
• Instead of providing orientation angles as the input to the NNs, we provide sines and cosines of the angles. This is to make sure that the NNs do not differentiate between 0 and 2π radians.
• We do not provide position as the input to any of the neural networks, as the translational and rotational accelerations should be position independent.
• For each NN, we scale the observed outputs (for example, x, y, z components of translational acceleration) such that each of them has zero mean and unit standard deviation. This is to make sure that the NNs give equal weight to the MSE in the three components.

The NN structure for each acceleration component is given by (5). The objective (also called loss function) for each NN is to minimize the MSE between observed and predicted accelerations. The input to NN1 is (v, ω, sin(ζ), cos(ζ), u1) and the input to NN2 is (v, ω, sin(ζ), cos(ζ), u2, u3, u4). Note that we do not include u2, u3, u4 in the input to NN1 because our experiments indicate that providing them results in over-fitting. Moreover, the physics of a quadrotor hints that the translational acceleration should not depend on these inputs [2]. The same argument holds for u1 and NN2.

60% of the total collected data was used for training, 25% was used for validation purposes and for tuning hyperparameters, and the rest was used for testing purposes. All the weights (W, w) and biases (B, b) were initially sampled from a normal (Gaussian) distribution. For training the networks, we use the Neural Network Toolbox of MATLAB. We use the Resilient backpropagation learning algorithm. The learning rate, momentum constant, regularization factor and the number of hidden units were initially set at 0.01, 0.95, 0.1 and 100 respectively, and later tuned using the validation data. Overall, the learning algorithm makes about 100 passes through the data, and optimizes the weights and biases to minimize the loss function. Once the training is complete, the optimal weights and biases are obtained, which can be substituted in (5) to obtain the models for fv and fω, and finally in (2) to get the full dynamics model.

The (normalized) MSE numbers obtained for the training and the testing data after learning fv are 0.134 and 0.135 respectively, and those for fω are 0.341 and 0.344. Since the MSE numbers are very close for training and testing, the NNs predict the unseen data about as accurately as the training data, meaning that our NNs are not overfitting on the training data. In Figure 4, we show the observed values and the predicted outputs of the trained NNs for roll and y accelerations. As evident from the figure, the NNs have been able to learn the dynamics to a good accuracy. This indicates that the simple two-layer feed-forward NN structure used in this paper is sufficient to learn quadrotor dynamics to a good accuracy. Moreover, the current state and input information alone is sufficient to learn the dynamics models, and hence past state and input information, which would potentially make the model as well as the control design more complex, has not been used as an input to the NNs.

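The pre-processing steps and the 60/25/15 data split described in this subsection can be sketched in a few lines. This is only an illustrative sketch in plain Python, not the MATLAB pipeline used in the paper: the function names are hypothetical, ζ is assumed to hold three orientation angles, the split is assumed to be a random shuffle (the text does not say how samples were assigned), and the scaling statistics are computed on whatever rows are passed in.

```python
import math
import random


def build_features(v, omega, zeta, u):
    """Build one NN input row: (v, omega, sin(zeta), cos(zeta), u).

    Position is deliberately left out, and the angles enter only
    through their sines and cosines, so 0 and 2*pi give the same row.
    """
    return (list(v) + list(omega)
            + [math.sin(a) for a in zeta]
            + [math.cos(a) for a in zeta]
            + list(u))


def standardize_columns(rows):
    """Scale each output column to zero mean and unit standard deviation.

    Assumes no column is constant (its standard deviation would be zero).
    Returns the scaled rows plus the means and standard deviations needed
    to undo the scaling on predictions.
    """
    n = len(rows)
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / n)
            for c, m in zip(cols, means)]
    scaled = [[(x - m) / s for x, m, s in zip(row, means, stds)]
              for row in rows]
    return scaled, means, stds


def split_indices(n, train_frac=0.60, val_frac=0.25, seed=0):
    """Shuffle sample indices and split them into train/validation/test.

    The remaining fraction (15% with the defaults) is the test set.
    The random shuffle is an assumption of this sketch.
    """
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train = int(round(train_frac * n))
    n_val = int(round(val_frac * n))
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]
```

With three-component v, ω and ζ, the row built for NN1 (with u = [u1]) has 13 entries and the row for NN2 (with u = [u2, u3, u4]) has 15. Applied to the 240,000 recorded samples, the split above yields 144,000 training, 60,000 validation and 36,000 test samples.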
C. Sinusoid-yaw Trajectory Tracking Using NN Models
Once NN1 and NN2 are trained, the full quadrotor model is available through (2). In this section, our goal is to use this model to track a sinusoid-yaw trajectory, where the quadrotor undergoes a sinusoidal motion in the XY plane while yawing at the same time. Since the desired trajectory consists of a simultaneous translational and rotational movement, the learned NN models should be able to capture the nonlinear couplings between these two movements to accurately track the trajectory.

Using the full model, we first compute a dynamically feasible reference that is as close as possible to the desired sinusoid-yaw trajectory (shown in Figure 5), using the sequential convex programming method outlined in Section IV-A. The reference trajectory is then flown using the near-hover LQR scheme along with a yaw rotation as described in Section IV-B. One of the tracking videos recorded during our experiments can be found at: https://www.youtube.com/watch?v=AeIfZbkjWPA. We label the results corresponding to this experiment as the ‘NN model’ trajectory.

[Figure 5 plot: four panels showing x (m), y (m), z (m) and yaw (rad) versus time (s); legend: NN Model, Model Free, Desired.]
Fig. 5: The reference, NN model and model-free trajectories obtained during the experiments. The NN model tracks the desired trajectory closely even though it involves both translational and rotational motion at the same time, which the NNs were not explicitly trained on, indicating the generalization capabilities of deep neural networks.

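To make the shape of a sinusoid-yaw trajectory concrete, the sketch below samples one in plain Python. It is purely illustrative: the function names and every numeric parameter (forward speed, amplitude, frequency, altitude, yaw rate, sampling period) are assumptions chosen for the example, not the values flown in the experiments, and the parameterization of the sinusoid in time is also an assumption.

```python
import math


def sinusoid_yaw_point(t, x_speed=0.5, y_amp=0.5, y_freq=0.5,
                       z_const=-1.0, yaw_rate=1.5):
    """Return (x, y, z, yaw) of a hypothetical desired trajectory at time t.

    x advances at constant speed, y oscillates sinusoidally, z is held
    constant, and yaw grows linearly, so translation and rotation happen
    at the same time. All default values are illustrative assumptions.
    """
    x = x_speed * t
    y = y_amp * math.sin(2.0 * math.pi * y_freq * t)
    yaw = yaw_rate * t
    return x, y, z_const, yaw


def sample_trajectory(duration=4.0, dt=0.01):
    """Sample the desired trajectory every dt seconds over the duration."""
    n = int(round(duration / dt)) + 1
    return [sinusoid_yaw_point(i * dt) for i in range(n)]
```

A list of samples like this plays the role of the desired trajectory; in the paper it is then passed through the feasible-reference computation of Section IV-A before being flown.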
The desired and NN model trajectories are shown in Figure 5. As evident from the figure, the NN model trajectory is able to track the desired trajectory closely. This illustrates that:
• the trained NNs are able to generalize the dynamics beyond the training data. In particular, the NN models capture the nonlinear couplings between translational and rotational accelerations, and can be used to track trajectories they were not trained on.
• even simple NN architectures, such as the one used in this paper, have good generalization capabilities and can be used to control a quadrotor on complex trajectories.

From the results thus far it is not clear how much of the control performance is due to the open-loop signal derived from the NN model, since the LQR control may be correcting for model inconsistency. Therefore, we opted to fly Crazyflie using simply the LQR controller and the desired trajectory as a reference, with no open-loop control. We label the results corresponding to this experiment as the ‘model-free’ trajectory.

In Figure 6, we show the (absolute) tracking error for the NN model and model-free trajectories. The NN model has a significantly lower tracking error compared to the model-free trajectory, indicating that the open-loop control derived from the NN model results in better tracking of the desired trajectory. The reduced tracking error is thus the result of the availability of an accurate dynamics model. In general, to ensure a small tracking error, we need a good open-loop control, which in turn needs a good dynamics model that accurately represents the system dynamics around the desired trajectory.

Our experiments thus indicate that given the state-input data of a complex dynamical system (quadrotors in our case), deep neural networks are capable of learning the system dynamics to a good accuracy, and can represent the system behavior beyond the data they were trained on. Moreover, with a careful choice of the network architecture and its inputs, the NN model can be used effectively to control the system. Thus, deep neural networks seem to present a good alternative for the system identification of complex systems such as quadrotors, especially in scenarios in which it is hard to derive a physics-based model of the system.

Remark 5: Note that even though neural networks present one approach to identify the complex system dynamics, it is not the only approach. In our experience, one can use the general nonlinear model of a quadrotor derived using the Newton-Euler formalism [2], [8] along with the control scheme proposed in Section IV to get good tracking as well. Our goal in this paper, however, was to test the efficacy of neural networks in learning a dynamic model, and not to compare different system models. In practice, one can use the physics-based model to collect data about the system and then use a neural network to learn incremental unmodeled dynamics (on top of the physics-based model), which might lead to an improved control performance.

VI. CONCLUSION
Traditional learning approaches proposed for controlling quadrotors have focused on improving the control performance for specific trajectories. In this work, we use deep neural networks to generalize the dynamics of the system beyond the trajectories used for training. Our experiments indicate that even simple NNs such as feed-forward networks can have good generalization capabilities and can learn the dynamics of a quadrotor to good accuracy. More importantly, we demonstrate that the learned dynamics can be used effectively to control the system. Thus NNs are not only useful as good function approximators; in this instance we can actually exploit the function that they produce for control purposes. For future work, it will be interesting
to analyze whether combining a NN model with a physics-based model can lead to an improved control performance, thus utilizing both the known information about the system as well as the generalization capabilities of NNs.

[Figure 6 plot: four panels showing x error (m), y error (m), z error (m) and yaw error (rad) versus time (s); legend: NN Model, Model Free.]
Fig. 6: (Absolute) Tracking error for model-free and NN model trajectories. Model-free trajectory has a significantly higher tracking error compared to the NN model, especially in the translational motion, indicating that nonlinear coupling between translational and rotational motions should be taken into account while designing a controller, which in this work is captured by training a NN model that accurately represents the system dynamics.

REFERENCES
[1] Pieter Abbeel, Adam Coates, and Andrew Y Ng. Autonomous helicopter aerobatics through apprenticeship learning. The International Journal of Robotics Research, 2010.
[2] Randal Beard. Quadrotor dynamics and control Rev 0.1. 2008.
[3] Patrick Bouffard. On-board model predictive control of a quadrotor helicopter: Design, implementation, and experiments. Technical report, DTIC Document, 2012.
[4] Eli Brookner. Tracking and Kalman filtering made easy. John Wiley and Sons, Inc., NY, 1998.
[5] Markus Hehn and Raffaello D’Andrea. A flying inverted pendulum. In Robotics and Automation (ICRA), 2011 IEEE International Conference on, pages 763–770. IEEE, 2011.
[6] Markus Hehn and Raffaello D’Andrea. An iterative learning scheme for high performance, periodic quadrocopter trajectories. In European Control Conference (ECC). IEEE, pages 1799–1804, 2013.
[7] Wolfgang Hoenig, Christina Milanes, Lisa Scaria, Thai Phan, Mark Bolas, and Nora Ayanian. Mixed reality for robotics. In IEEE/RSJ Intl Conf. Intelligent Robots and Systems, pages 5382–5387, Hamburg, Germany, Sept 2015.
[8] Gabriel M Hoffmann, Haomiao Huang, Steven L Waslander, and Claire J Tomlin. Quadrotor helicopter flight dynamics and control: Theory and experiment. In Proc. of the AIAA Guidance, Navigation, and Control Conference, volume 2, 2007.
[9] Wolfgang Honig, Christina Milanes, Lisa Scaria, Thai Phan, Mark Bolas, and Nora Ayanian. Mixed reality for robotics. In Intelligent Robots and Systems (IROS), 2015 IEEE/RSJ International Conference on, pages 5382–5387. IEEE, 2015.
[10] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. Imagenet classification with deep convolutional neural networks. In Advances in neural information processing systems, pages 1097–1105, 2012.
[11] Benoit Landry. Planning and control for quadrotor flight through cluttered environments. Master’s thesis, Massachusetts Institute of Technology, 2015.
[12] Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. Deep learning. Nature, 521(7553):436–444, 2015.
[13] C Nicol, CJB Macnab, and A Ramirez-Serrano. Robust adaptive control of a quadrotor helicopter. Mechatronics, 21(6):927–938, 2011.
[14] Ali Punjani and Pieter Abbeel. Deep learning helicopter dynamics models. In Robotics and Automation (ICRA), 2015 IEEE International Conference on, pages 3223–3230. IEEE, 2015.
[15] Morgan Quigley, Ken Conley, Brian Gerkey, Josh Faust, Tully Foote, Jeremy Leibs, Rob Wheeler, and Andrew Y Ng. ROS: an open-source robot operating system. In ICRA workshop on open source software, volume 3, page 5, 2009.
[16] Guilherme V Raffo, Manuel G Ortega, and Francisco R Rubio. An integral predictive/nonlinear H∞ control structure for a quadrotor helicopter. Automatica, 46(1):29–39, 2010.
[17] J. B. Rawlings and D. Q. Mayne. Model predictive control: Theory and design. Nob Hill Pub., 2009.
[18] Anand Sanchez-Orta, Vicente Parra-Vega, Carlos Izaguirre-Espinosa, and Octavio Garcia. Position–yaw tracking of quadrotors. Journal of Dynamic Systems, Measurement, and Control, 137(6):061011, 2015.
[19] John Schulman, Jonathan Ho, Alex X Lee, Ibrahim Awwal, Henry Bradlow, and Pieter Abbeel. Finding locally optimal, collision-free trajectories with sequential convex optimization. In Robotics: science and systems, volume 9, pages 1–10. Citeseer, 2013.
[20] Holger Voos. Nonlinear control of a quadrotor micro-UAV using feedback-linearization. In Mechatronics, 2009. ICM 2009. IEEE International Conference on, pages 1–6. IEEE, 2009.
[21] Eric A Wan and Ronell Van Der Merwe. The unscented Kalman filter for nonlinear estimation. In Adaptive Systems for Signal Processing, Communications, and Control Symposium 2000. AS-SPCC. The IEEE 2000, pages 153–158. IEEE, 2000.
[22] Rong Xu and Ümit Özgüner. Sliding mode control of a quadrotor helicopter. In Decision and Control, 2006 45th IEEE Conference on, pages 4957–4962. IEEE, 2006.
[23] Ma Zhaowei, Hu Tianjiang, Shen Lincheng, Kong Weiwei, Zhao Boxin, and Yao Kaidi. An iterative learning controller for quadrotor UAV path following at a constant altitude. In Control Conference (CCC), 2015 34th Chinese, pages 4406–4411. IEEE, 2015.