Dynamical Movement Primitives Learning Attractor Models for Motor Behaviors
Jun Nakanishi
[email protected]
School of Informatics, University of Edinburgh, Edinburgh EH8 9AB, U.K.
Heiko Hoffmann
[email protected]
Peter Pastor
[email protected]
Computer Science, Neuroscience, and Biomedical Engineering, University
of Southern California, Los Angeles, CA 90089, U.S.A.
Stefan Schaal
[email protected]
Computer Science, Neuroscience, and Biomedical Engineering, University
of Southern California, Los Angeles, CA 90089, U.S.A.; Max-Planck-Institute
for Intelligent Systems, Tübingen 72076, Germany; and ATR Computational
Neuroscience Laboratories, Kyoto 619-0288, Japan
1 Introduction
1 With low-dimensional, we refer to systems with fewer than about 100 degrees of freedom.
330 Ijspeert et al.
literature (Schaal, Sternad, Osu, & Kawato, 2004) to denote point-to-point (nonperiodic
τ ÿ = αz (βz (g − y) − ẏ) + f, or, in the first-order form used throughout,

τ ż = αz (βz (g − y) − z) + f, (2.1)
τ ẏ = z,
where τ is a time constant and αz and βz are positive constants. If the forcing
term f = 0, these equations represent a globally stable second-order linear
system with (z, y) = (0, g) as a unique point attractor. With appropriate val-
ues of αz and βz , the system can be made critically damped (with βz = αz /4)
in order for y to monotonically converge toward g. Such a system imple-
ments a stable but trivial pattern generator with g as single point attractor.5
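With f = 0, this system can be integrated in a few lines to check the critically damped convergence to g. The parameter value αz = 25 (with βz = αz/4) is a common choice in practice, not one fixed by the text above:

```python
def point_attractor(y0=0.0, g=1.0, tau=1.0, alpha_z=25.0, dt=0.001, T=1.5):
    """Euler-integrate equation 2.1 with f = 0.

    alpha_z = 25 (with beta_z = alpha_z / 4 for critical damping) is a
    common choice in the literature, not a value fixed by the text.
    """
    beta_z = alpha_z / 4.0
    y, z = y0, 0.0
    for _ in range(int(T / dt)):
        z += dt * alpha_z * (beta_z * (g - y) - z) / tau
        y += dt * z / tau
    return y
```

For any start state y0 and goal g, the state y converges monotonically toward g, as described above.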
The choice of a second-order system in equation 2.1 was motivated
ẏ equation (instead of the ż equation), which is analytically less favorable. See section 2.1.8.
f (t) = (∑ᵢ₌₁ᴺ Ψᵢ(t) wᵢ) / (∑ᵢ₌₁ᴺ Ψᵢ(t)),

where Ψᵢ are fixed basis functions and wᵢ are adjustable weights. Representing arbitrary nonlinear functions as such a normalized linear combination
of basis functions has been a well-established methodology in machine
learning (Bishop, 2006) and also has similarities with the idea of popula-
tion coding in models of computational neuroscience (Dayan & Abbott,
2001). The explicit time dependence of this nonlinearity, however, creates
a nonautonomous dynamical system or, in the current formulation, more
precisely a linear time-variant dynamical system. However, such a system
does not allow straightforward coupling with other dynamical systems and
the coordination of multiple degrees of freedom in one dynamical system
(e.g., as in legged locomotion; cf. section 3.2).
Thus, as a novel component, we introduce a replacement of time by
means of the following first-order linear dynamics in x:
τ ẋ = −αx x, (2.2)
where αx is a positive constant; x = 0 is the attractor point of these equations. We call this equation the canonical system because
it models the generic behavior of our model equations, a point attractor
in the given case and a limit cycle in the next section. Given that equation
2.2 is a linear differential equation, there exists a simple exponential func-
tion that relates time and the state x of this equation. However, avoiding
the explicit time dependency has the advantage that we have obtained an
autonomous dynamical system now, which can be modified online with
additional coupling terms, as discussed in section 3.2.
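Because equation 2.2 is linear, the relation between time and x is an explicit exponential, x(t) = x(0) exp(−αx t/τ). A one-line sketch (x(0) = 1 and αx = 8 are illustrative choices, not values prescribed in the text):

```python
import math

def canonical_x(t, x0=1.0, alpha_x=8.0, tau=1.0):
    """Closed-form solution of tau * x_dot = -alpha_x * x (equation 2.2).

    x0 = 1 and alpha_x = 8 are illustrative choices, not values
    prescribed in the text.
    """
    return x0 * math.exp(-alpha_x * t / tau)
```

The state decays monotonically and asymptotically to zero, which is what lets x serve as both a phase and a gating signal below.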
With equation 2.2, we can reformulate our forcing term to become
f (x) = (∑ᵢ₌₁ᴺ Ψᵢ(x) wᵢ) / (∑ᵢ₌₁ᴺ Ψᵢ(x)) · x(g − y0 ), (2.3)

Ψᵢ(x) = exp(−(x − cᵢ)² / (2σᵢ²)), (2.4)
where σi and ci are constants that determine, respectively, the width and
centers of the basis functions and y0 is the initial state y0 = y(t = 0).
Note that equation 2.3 is modulated by both g − y0 and x. The modulation
by x means that the forcing term effectively vanishes when the goal g
has been reached, an essential component in proving the stability of the
attractor equations. The modulation of equation 2.3 by g − y0 will lead to
useful scaling properties of our model under a change of the movement
amplitude g − y0 , as discussed in section 2.1.4. At the moment, we assume
that g ≠ y0 , that is, that the total displacement between the beginning and
the end of a movement is never exactly zero. This assumption will be relaxed
later but allows a simpler development of our model. Finally, equation 2.3
is a nonlinear function in x, which renders the complete set of differential
equations of our dynamical system nonlinear (instead of being a linear time-
variant system), although one could argue that this nonlinearity is benign
as it vanishes at the equilibrium point.
The complete system is designed to have a unique equilibrium point at
(z, y, x) = (0, g, 0). It therefore adequately serves as a basis for constructing
discrete pattern generators, with y evolving toward the goal g from any ini-
tial condition. The parameters wi can be adjusted using learning algorithms
(see section 2.1.6) in order to produce complex trajectories before reaching
g. The canonical system x (see equation 2.2) is designed such that x serves
as both an amplitude and a phase signal. The variable x monotonically and
asymptotically decays to zero. It is used to localize the basis functions (i.e.,
as a phase signal) but also provides an amplitude signal (or a gating term)
that ensures that the nonlinearity introduced by the forcing term remains
Figure 1: Exemplary time evolution of the discrete dynamical system. The pa-
rameters wi have been adjusted to fit a fifth-order polynomial trajectory between
start and goal point (g = 1.0), superimposed with a negative exponential bump.
The upper plots show the desired position, velocity, and acceleration of this
target trajectory with dotted lines, which largely coincide with the realized
trajectories of the equations (solid lines). On the bottom right, the activation
of the 20 exponential kernels comprising the forcing term is drawn as a func-
tion of time. The kernels have equal spacing in time, which corresponds to an
exponential spacing in x.
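Equations 2.1 to 2.4 can be rolled out together with simple Euler integration. The sketch below is illustrative: the parameter values, the kernel spacing (equal in time, hence exponential in x, as in Figure 1), and the kernel widths are assumptions, not the settings used for the figure.

```python
import numpy as np

def dmp_rollout(w, y0=0.0, g=1.0, tau=1.0,
                alpha_z=25.0, alpha_x=8.0, dt=0.001, T=1.5):
    """Euler-integrate equations 2.1-2.4 for one degree of freedom.

    w: array of N weights w_i. The kernel centers are spaced equally in
    time (hence exponentially in x); the widths are a simple heuristic.
    All parameter values here are illustrative assumptions.
    """
    beta_z = alpha_z / 4.0
    c = np.exp(-alpha_x * np.linspace(0.0, 1.0, len(w)))   # centers in x
    sigma = np.abs(np.diff(c, append=c[-1] * 0.5)) + 1e-6  # widths
    y, z, x = y0, 0.0, 1.0
    traj = []
    for _ in range(int(T / dt)):
        psi = np.exp(-((x - c) ** 2) / (2.0 * sigma ** 2))      # eq. 2.4
        f = (psi @ w) / (psi.sum() + 1e-10) * x * (g - y0)      # eq. 2.3
        z += dt * (alpha_z * (beta_z * (g - y) - z) + f) / tau  # eq. 2.1
        y += dt * z / tau
        x += dt * (-alpha_x * x) / tau                          # eq. 2.2
        traj.append(y)
    return np.array(traj)
```

Because the forcing term is gated by x, the rollout ends at the goal regardless of the weights; the weights only shape the transient toward g.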
Figure 2: Vector plot for a 2D trajectory where y1 (top left) fits the trajectory of Figure 1 and y2 (bottom left) fits a minimum jerk
trajectory, both toward a goal g = (g1 , g2 ) = (1, 1). The vector plots show (ż1 , ż2 ) at different values of (y1 , y2 ), assuming that only
y1 and y2 have changed compared to the unperturbed trajectory (continuous line) and that x1 , x2 , ẏ1 , and ẏ2 are not perturbed. In
other words, it shows only slices of the full vector plot (ż1 , ż2 , ẏ1 , ẏ2 , ẋ1 , ẋ2 ) for clarity. The vector plots are shown for successive
values of x = x1 = x2 from 1.0 to 0.02 (i.e., from successive steps in time). Since τ ẏi = zi , such a graph illustrates the instantaneous
accelerations (ÿ1 , ÿ2 ) of the 2D trajectory if the states (y1 , y2 ) were pushed somewhere else in state space. Note how the system
evolves to a spring-damper model with all arrows pointing to the goal g = (1, 1) when x converges to 0.
τ φ̇ = 1, (2.5)
where φ ∈ [0, 2π] is the phase angle of the oscillator in polar coordinates
and the amplitude of the oscillation is assumed to be r.
Similar to the discrete system, the rhythmic canonical system serves to
provide both an amplitude signal (r) and a phase signal (φ) to the forcing
term f in equation 2.1:
f (φ, r) = (∑ᵢ₌₁ᴺ Ψᵢ wᵢ) / (∑ᵢ₌₁ᴺ Ψᵢ) · r, (2.6)

Ψᵢ = exp(hᵢ (cos(φ − cᵢ) − 1)), (2.7)

where the exponential basis functions in equation 2.7 are now von Mises
basis functions, essentially gaussian-like functions that are periodic. Note
that in case of the periodic forcing term, g in equation 2.1 is interpreted
as an anchor point (or set point) for the oscillatory trajectory, which can
be changed to accommodate any desired baseline of the oscillation. The
amplitude and period of the oscillations can be modulated in real time by
varying, respectively, r and τ .
Figure 3 shows an exemplary time evolution of the rhythmic pattern gen-
erator when trained with a superposition of several sine signals of different
frequencies. It should be noted how quickly the pattern generator con-
verges to the desired trajectory after starting from zero initial conditions.
The movement is started simply by setting r = 1 and τ = 1. The phase
variable φ can be initialized arbitrarily: we chose φ = 0 for our example.
More informed initializations are possible if such information is available
from the context of a task; for example, a drumming movement would
normally start with a top-down beat, and the corresponding phase value
could be chosen for initialization. The complexity of attractors is restricted
only by the abilities of the function approximator used to generate the
forcing term, which essentially allows almost arbitrarily complex (smooth)
attractors with modern function approximators.
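The periodic forcing term of equation 2.6 can be sketched with von Mises kernels. The kernel form exp(hᵢ(cos(φ − cᵢ) − 1)), the shared width h, and the uniform spacing of the centers over [0, 2π) are assumptions of this sketch:

```python
import numpy as np

def rhythmic_forcing(phi, w, r=1.0, h=2.5):
    """Periodic forcing term of equation 2.6 with von Mises kernels
    psi_i = exp(h * (cos(phi - c_i) - 1)); the width h and the uniform
    center spacing over [0, 2*pi) are assumptions of this sketch.
    """
    c = np.linspace(0.0, 2.0 * np.pi, len(w), endpoint=False)
    psi = np.exp(h * (np.cos(phi - c) - 1.0))
    return (psi @ w) / psi.sum() * r
```

The term is 2π-periodic in φ and scales linearly with r, matching the real-time amplitude modulation described above.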
(Figure 3: time evolution of the rhythmic dynamical system — desired and actual position, velocity, and acceleration (top row), and the phase φ and kernel activations (bottom row), over time.)
Another path to prove stability for our approach was suggested by Perk
and Slotine (2006), who proved that our dynamical systems equations are
two hierarchically coupled systems, each fulfilling the criterion of contraction stability (Lohmiller & Slotine, 1998). Contraction theory guarantees that
any parallel or serial arrangement of contraction stable systems will be
contraction stable too, which concludes the stability proof. This property
will be useful below, where we create multiple degree-of-freedom dynami-
cal systems, which inherit their stability proof from this contraction theory
argument.
This result can be verified by simply inserting equations 2.9 into the scaled
equations using the simple trick of k(g − y0 ) + ky0 as a goal. Similarly, when
scaling r → kr in the rhythmic system (see equations 2.1 and 2.5–2.7), the
same scaling law (see equation 2.9) allows proving topological equivalence
(for simplicity, assume that g = 0 in equation 2.1 for the rhythmic system,
which can always be achieved with a coordinate shift). For the scaling of
the time constant τ → kτ , topological equivalence for both the discrete and
ż → ż/k,  ẏ → ẏ/k,  ẋ → ẋ/k,  φ̇ → φ̇/k. (2.10)
Figure 4 illustrates the spatial (see Figure 4a) and temporal (see Figure 4b)
invariance using the example from Figure 1. One property that should be
noted is the mirror-symmetric trajectory in Figure 4a when the goal is at a
negative distance relative to the start state. We discuss the issue again in
section 3.4.
Figure 5 provides an example of why and when invariance properties
are useful. The blue (thin) line in all subfigures shows the same handwritten
cursive letter a that was recorded with a digitizing tablet and learned by a
two-dimensional discrete dynamical system. The letter starts at a StartPoint,
as indicated in Figure 5a, and ends originally at the goal point Target0 .
Superimposed on all subfigures in red (thick line) is the letter a generated
by the same movement primitive when the goal is shifted to Target1 . For
Figures 5a and 5b, the goal is shifted by just a small amount, while for
Figures 5c and 5d, it is shifted significantly more. Importantly, for Figures
5b and 5d, the scaling term g − y0 in equation 2.3 was left out, which destroys
the invariance properties as described above. For the small shift of the goal
in Figures 5a and 5b, the omission of the scaling term is qualitatively not
very significant: the red letter “a” in both subfigures looks like a reasonable
“a.” For the large goal change in Figures 5c and 5d, however, the omission of
the scaling term creates a different appearance of the letter “a,” which looks
almost like a letter “u.” In contrast, the proper scaling in Figure 5c creates
just a large letter “a,” which is otherwise identical in shape to the original letter.
Figure 5: Illustration of the significance of the invariance properties, exempli-
fied in a two-dimensional discrete dynamical system to draw a cursive letter
a. In all subfigures, the blue (thin) line denotes the letter “a” as taught from a
human demonstration using a digitizing tablet. The start point for all figures
is the same, while the goal is originally Target0 , and, for the purpose of testing
generalization, the goal is shifted to Target1 . In a and b, the shift of the goal is
small, while in c and d, the shift of the goal is much more significant. Subfigures
a and c use equations 2.1 to 2.4, the proper formulation of the discrete dynamical
system with invariance properties. As can be noted from the red (thick) lines,
the generalized letter “a” is always a properly uniformly zoomed version of
the original letter “a.” In contrast, in subfigures b and d, the scaling term g − y0
in equation 2.3 was left out, which destroys the invariance properties. While
for a small shift of the goal in b the distortion of the letter “a” is insignificant,
for a large shift of the goal in d, the distortion creates more of a letter “u” than a
letter “a.”
(Figure: one canonical system drives n transformation systems through individual forcing terms f1 , …, fn ; each transformation system outputs a desired position, velocity, and acceleration for one degree of freedom.)
2.1.6 Learning the Attractor Dynamics from Observed Behavior. Our systems
are constructed to be linear in the parameters wi , which allows applying
a variety of learning algorithms to fit the wi . In this letter, we focus on a
supervised learning framework. Of course, many optimization algorithms
could be used too if only information from a cost function is available.
We assume that a desired behavior is given by one or multiple de-
sired trajectories in terms of position, velocity, and acceleration triples
(ydemo (t), ẏdemo (t), ÿdemo (t)), where t ∈ [1, . . . , P].6 Learning is performed in
two phases: determining the high-level parameters (g, y0 , and τ for the dis-
crete system or g, r, and τ for the rhythmic system) and then learning the
parameters wi .
For the discrete system, the parameter g is simply the position at the
end of the movement, g = ydemo (t = P) and, analogously, y0 = ydemo (t = 0).
The parameter τ must be adjusted to the duration of the demonstration.
In practice, extracting τ from a recorded trajectory may require some
thresholding in order to detect the movement onset and end. For instance, a velocity threshold of 2% of the maximum velocity in the movement may be employed, and τ could be chosen as 1.05 times the duration between the detected onset and end.
6 We assume that the data triples are provided with the same time step as the integration
step for solving the differential equations. If this is not the case, the data are downsampled
or upsampled as needed.
τ ż − αz (βz (g − y) − z) = f. (2.11)
Ji = ∑ₜ₌₁ᴾ Ψᵢ(t)( ftarget (t) − wᵢ ξ (t))², (2.13)
where ξ (t) = x(t)(g − y0 ) for the discrete system and ξ (t) = r for the rhyth-
mic system. This is a weighted linear regression problem, which has the
solution
wᵢ = (sᵀ Ŵᵢ ftarget ) / (sᵀ Ŵᵢ s), (2.14)
where

s = (ξ (1), ξ (2), …, ξ (P))ᵀ,  Ŵᵢ = diag(Ψᵢ(1), Ψᵢ(2), …, Ψᵢ(P)),  ftarget = ( ftarget (1), ftarget (2), …, ftarget (P))ᵀ.
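Because Ŵᵢ is diagonal, equation 2.14 reduces to weighted sums over the P samples and can be evaluated for all kernels at once; the array layout below is an implementation choice of this sketch.

```python
import numpy as np

def fit_weights(f_target, xi, psi):
    """Solve equation 2.14 for all kernels at once.

    f_target: (P,) array of target forcing values; xi: (P,) array of
    xi(t) (x(t)*(g - y0) for the discrete system, r for the rhythmic
    one); psi: (P, N) array of kernel activations Psi_i(t).
    """
    num = psi.T @ (xi * f_target)   # s^T W_i f_target for each i
    den = psi.T @ (xi * xi)         # s^T W_i s for each i
    return num / den
```

If the target forcing is exactly proportional to ξ, every kernel recovers the same proportionality constant, which is a convenient sanity check for an implementation.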
(Figure: sketch of a movement primitive — the canonical system drives a discrete or rhythmic transformation system (simple dynamics + forcing term), with coupling terms entering the systems.)
2.1.8 Variations. Given the general design principle from the previous
section, we briefly discuss some variations of our approach that have been
useful in some applications.
When switching the goal g to a new value, equation 2.1 generates a
discontinuous jump in the acceleration ÿ. This can be avoided by filtering
the goal change with a simple first-order differential equation:
whose constants are chosen to be critically damped (see Ijspeert et al., 2003,
for an example). The forcing term is reformulated as
f (x) = (∑ᵢ₌₁ᴺ Ψᵢ(x) wᵢ) / (∑ᵢ₌₁ᴺ Ψᵢ(x)) · v(g − y0 ). (2.17)
gfit and y0,fit denote the goal and start point used to fit the weights of the
nonlinear function. Thus, during fitting, this quotient is always equal to +1
or −1, such that the forcing term is guaranteed to be active. The numerically
very small positive constant ǫ avoids dividing by zero. There is, however,
a numerical danger that after learning, a new goal different from y0 could
create a huge magnification of the originally learned trajectory. For instance,
if during learning we had gfit − y0,fit = 0 and ǫ = 0.0001, a change of the
goal offset from the start position to g − y0 = 0.01 would create a multiplier of
100 in equation 2.18. While theoretically fine, such a magnification may be
practically inappropriate. A way out is to modify the forcing term to use the
maximal amplitude of a trajectory as scaling term, A = max (y) − min (y):
f (x) = (∑ᵢ₌₁ᴺ Ψᵢ(x) wᵢ) / (∑ᵢ₌₁ᴺ Ψᵢ(x)) · xA. (2.19)
In this variant, the goal g becomes decoupled from the amplitude A, and
both variables can be set independently. While it is easy to lose the strict
property of structural equivalence in this way, it may be practically more
appropriate for certain applications. Hoffmann, Pastor, Park, and Schaal
(2009) suggested a similar approach.
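Equation 2.19 in code form; the kernel centers and widths passed in are placeholders (cf. equation 2.4), and the point of the sketch is only the scaling by the learned amplitude A instead of g − y0:

```python
import numpy as np

def forcing_amplitude(x, w, c, sigma, A):
    """Forcing term of equation 2.19: the learned amplitude A replaces
    g - y0, so the goal g and the amplitude A can be set independently.
    Kernel centers c and widths sigma are placeholders (cf. eq. 2.4).
    """
    psi = np.exp(-((x - c) ** 2) / (2.0 * sigma ** 2))   # eq. 2.4
    return (psi @ w) / psi.sum() * x * A                  # eq. 2.19
```

Changing A rescales the forcing term linearly while the gating by x still guarantees that it vanishes at the end of the movement.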
For the rhythmic system, equation 2.15 is equally useful as for the discrete
system if the midpoint of the oscillation is supposed to be changed during
the run of the dynamical systems and a discontinuity is to be avoided.
Similarly, the amplitude of the oscillation can be changed with a smooth
dynamics equation,
3 Evaluations
(Table 1: side-by-side summary of the discrete and rhythmic dynamical systems equations.)
Notes: The high-level design parameters of the discrete system are τ , the temporal scaling factor, and g, the goal position. The design parameters
of the rhythmic system are g, the baseline of the oscillation; τ , the period divided by 2π ; and r, the amplitude of oscillations. The terms Ct and Cc
are coupling terms that are application dependent and explained in section 3.2. The parameters wi are fitted to a demonstrated trajectory using
locally weighted learning. The parameters αz , βz , αx , αr , αg, hi , and ci are positive constants. Unless stated otherwise, the values at the bottom of
the table are used in this letter.
τ ż = αz (βz (g − y) − z) + f + Ct , (3.1)
τ ẏ = z.
The angle θ is interpreted as the angle between the velocity vector ẏ and the
difference vector (o − y) between the current position and the obstacle. The
vector r is the vector that is perpendicular to the plane spanned by ẏ and
(o − y), and serves to define a rotation matrix R, which causes a rotation
of 90 degrees about r (Sciavicco & Siciliano, 2000). Intuitively, the coupling
term adds a movement perpendicular to the current movement direction as
a function of the distance vector to the obstacle (see Hoffmann et al., 2009,
for more details). The constants are chosen as γ = 1000 and β = 20/π.
Figure 8 illustrates the behavior that the obstacle-avoidance coupling
term generates for various trajectories starting from different initial positions
around the origin y = [0 0 0]ᵀ but ending at the same goal state g = [1 1 1]ᵀ.
Depending on the start position, the coupling term creates more or less
curved movements around the obstacle at o = [0.5 0.5 0.5]ᵀ.
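A sketch of such an obstacle-avoidance coupling term in 3D. The closed form Ct = γ R ẏ θ exp(−βθ) is taken from Hoffmann et al. (2009) as cited above; the exact expression and the handling of degenerate cases should be treated as assumptions to be checked against that reference.

```python
import numpy as np

def obstacle_coupling(y, y_dot, o, gamma=1000.0, beta=20.0 / np.pi):
    """Coupling term C_t steering the movement around an obstacle at o.

    Implements C_t = gamma * R * y_dot * theta * exp(-beta * theta),
    following Hoffmann et al. (2009); theta is the angle between y_dot
    and (o - y), and R rotates y_dot by 90 degrees about
    r = (o - y) x y_dot. Degenerate-case handling is an assumption.
    """
    d = o - y
    speed, dist = np.linalg.norm(y_dot), np.linalg.norm(d)
    if speed < 1e-10 or dist < 1e-10:
        return np.zeros(3)
    theta = np.arccos(np.clip(d @ y_dot / (dist * speed), -1.0, 1.0))
    axis = np.cross(d, y_dot)
    if np.linalg.norm(axis) < 1e-10:     # moving straight at/away from o
        return np.zeros(3)
    axis /= np.linalg.norm(axis)
    # 90-degree rotation of y_dot about axis; since axis is perpendicular
    # to y_dot, Rodrigues' formula reduces to the cross product.
    rotated = np.cross(axis, y_dot)
    return gamma * rotated * theta * np.exp(-beta * theta)
```

The term pushes perpendicular to the current movement direction, away from the obstacle, and decays exponentially as the heading angle to the obstacle grows.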
3.2.2 Temporal Coupling. By modulating the canonical system, one can influence the temporal evolution of our dynamical systems without affecting their spatial evolution.
Figure 9: Sarcos slave robot placing a red cup on a green coaster. The first row
shows the placing movement on a fixed goal with a discrete dynamical system.
The second row shows the ability to adapt to changing goals (white arrow)
after movement onset. The third row shows the resulting movement as a blue
ball-like obstacle interferes with the placing movement, using the coupling term
from equation 3.2.
τ ẋ = −αx x + Cc (3.5)
τ φ̇ = 1 + Cc . (3.6)
Figure 10: Desired acceleration trajectory for the elbow flexion-extension joint
over time in a drumming movement, generated by a rhythmic dynamical sys-
tem. Vertical bars represent the external signal. Panels b and c are zoomed
versions of panel a in order to show more detail.
τ φ̇ = ω, (3.7)
τ ω̇ = kω (ωext − ω) + kφ (mod2π (φext − φ + φd )).
to which the slow drumbeat of the robot was to synchronize. In the beginning, the robot started from immobility (ω = 0). Within two beats (the time
needed to extract the frequency from the acoustic signal), perfect synchro-
nization and phase locking is achieved with a 0.15 Hz signal—very rapid
synchronization. Afterward, we had the external metronome pace increase
frequency slowly to 0.5 Hz to demonstrate the continuous adaptation abil-
ity of the oscillator. Figure 10 shows the elbow DOF angular acceleration
of the drumming pattern, which has the most significant contribution to
the whole arm movement. As can be seen, with increasing frequency, the
overall acceleration amplitude of the pattern changes but not the qualitative
waveform. This property is due to the invariance properties of the dynami-
cal systems. All other DOFs of the arm demonstrate the same behavior and
are equally phase-locked to the beat of the metronome.
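Equation 3.7 can be simulated against a fixed external beat to reproduce the frequency and phase locking described above. The gains kω and kφ, the time step, and the simulation length are illustrative values, not those used on the robot:

```python
import math

def wrap(a):
    """Map a phase difference into [-pi, pi)."""
    return (a + math.pi) % (2.0 * math.pi) - math.pi

def synchronize(omega_ext=2.0 * math.pi * 0.15, tau=1.0,
                k_omega=10.0, k_phi=10.0, phi_d=0.0, dt=0.001, T=20.0):
    """Euler-integrate equation 3.7 against an external metronome of
    constant frequency omega_ext. Gains and duration are illustrative.
    Returns the final frequency and the final phase error.
    """
    phi, omega, phi_ext = 0.0, 0.0, 0.0      # start from immobility
    for _ in range(int(T / dt)):
        omega += dt * (k_omega * (omega_ext - omega)
                       + k_phi * wrap(phi_ext - phi + phi_d)) / tau
        phi += dt * omega / tau
        phi_ext += dt * omega_ext            # the external beat advances
    return omega, wrap(phi_ext - phi + phi_d)
```

The frequency term pulls ω toward the external frequency while the phase term locks φ to the external beat, which is the behavior reported for the drumming experiment.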
3.2.3 Temporal and Spatial Coupling. In this section, we illustrate how both
temporal and spatial coupling can be used together to model disturbance
rejection, a property that is inherent in attractor systems. For this purpose,
we assume a simple control system:
Here, the position y and velocity ẏ of the 1 DOF discrete dynamical system
drive the time evolution of ya , which can be interpreted as the position
of a simple point mass controlled by a proportional-derivative controller
with gains kp and kv . We use the dynamics from Figure 1 to generate the
input y, ẏ. At time t = 0.35 s, the point mass is suddenly blocked from any
further motion and released again at t = 0.9 s. Thus, for t ∈ [0.35 s, 0.9 s] we
have ya = const, ẏa = ÿa = 0. Without coupling terms, the dynamical system
would just continue its time evolution, regardless of what happens to the
point mass. For a practical control system, this behavior is undesirable as
the desired trajectory y can move away significantly from the current state
ya , such that on release of the mass, it would create a dangerously large
motor command and jump to catch up with the desired target.
To counter this behavior, we introduce the coupling terms
Figure 11: Subjecting the discrete dynamical system from Figure 1 to “holding”
perturbation. At time t = 0.35 s, the actual movement system is blocked from
its time evolution: its velocity and acceleration are zero, and its position (dash-
dot line in the top-left figure) remains constant until t = 0.9 s (see the shaded
area). Due to the coupling terms, the time evolution of the dynamical system
decays to zero and resumes after the actual system is released. For comparison,
the unperturbed time evolution of the dynamics is shown in a dashed line.
Essentially the perturbation simply delays the time evolution of the dynamical
system without any large motor commands leading to possible harm.
system, that is, both the canonical and the transformation systems. This
modification of the time constant slows the temporal evolution of the dy-
namics in case of a significant tracking error. The constants are chosen to be
kp = 1000, kv = 125, αe = 5, kt = 1000, kc = 10,000. It should be noted that
many other coupling terms could be created to achieve similar behavior
and that our realization is just a simple and intuitive design of such a
coupling term.
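A minimal sketch of error-driven temporal coupling in this spirit: freezing the actual system ya during a hold interval and inflating the effective time constant by 1 + kc e² stalls the desired trajectory until release. This particular realization, including the perfect tracking of ya outside the hold, is an illustrative assumption and not the paper's exact coupling terms.

```python
def perturbed_rollout(g=1.0, tau=1.0, alpha_z=25.0, alpha_x=8.0,
                      k_c=10000.0, dt=0.001, T=3.0, hold=(0.35, 0.9)):
    """Hold the actual system y_a fixed during `hold` and let the squared
    tracking error e = y_a - y inflate the effective time constant,
    stalling both the canonical and the transformation system.
    """
    beta_z = alpha_z / 4.0
    y, z, x, ya = 0.0, 0.0, 1.0, 0.0
    for step in range(int(T / dt)):
        t = step * dt
        e = ya - y
        tau_eff = tau * (1.0 + k_c * e * e)   # error slows the clock
        z += dt * alpha_z * (beta_z * (g - y) - z) / tau_eff
        y += dt * z / tau_eff
        x += dt * (-alpha_x * x) / tau_eff    # canonical system, eq. 2.2
        if not (hold[0] <= t < hold[1]):
            ya = y                            # perfect tracking otherwise
    return y, ya
```

During the hold, the growing error makes tau_eff large and the desired trajectory nearly stops; after release, the error vanishes and the evolution resumes, merely delayed, as in Figure 11.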
Figure 11 illustrates the behavior due to these coupling terms in com-
parison to the unperturbed (dashed line) time evolution of the dynamics.
The top-left plot of Figure 11 also shows with the dash-dot line the position
ya . During the holding time period, the entire dynamics comes almost to a
stop and then smoothly resumes after the release of the mass roughly with
the same behavior from where the system had left off; the system continues
with the negative dip before moving toward the goal. Without the coupling
terms, y would already have evolved all the way to the goal position, and
the error between ya and y would have grown very large. These types of
couplings have been used successfully with the humanoid robot (see the
video at http://biorob.epfl.ch/dmps). It should also be emphasized that
many different behaviors could be generated with other coupling terms,
and it is really up to the modeler to decide which behavioral properties to
realize with a particular realization of coupling terms.
(Figure: a matrix plot indexed by the letters A to Z, followed by panels a to c of a further figure on generalization to new goals.)
coordinate, and only y1 has a difference. Thus, we use equation 2.19 for rep-
resenting y2 . In order to generalize to new goals, the movement is generated
in local coordinates and requires a subsequent coordinate transformation
to rotate the trajectory to global coordinates. With this strategy, the gen-
eralization pattern in Figure 14c looks the most appealing to the human
eye. Mathematically, all we did was change coordinates to represent our
dynamical systems model.
In conclusion, the invariance properties in our dynamical systems model
are mathematically well founded, but the choice of coordinates for repre-
senting a model can make a big difference in how generalization of the
dynamics systems appears. From a practical point of view, one should first
carefully investigate what properties a model requires in terms of temporal
and spatial invariance and then realize these properties by choosing the
most appropriate variant of the dynamical systems model and the most
appropriate coordinate system for modeling.9
4 Related Work
As mentioned in section 1, the central goal of our work was to derive learn-
able attractor dynamical systems models for goal-directed behavior and
to explore generalization and coupling phenomena with such systems as
models for biological and technical systems. Thus, we roughly classify re-
lated work according to more biological or more technical domains. Besides
general nonlinear systems theory, most related work comes from the field
of biological and artificial motor control.
Rewriting equation 2.1 in second-order form as

τ ÿ = αz (βz (g + f/(αz βz ) − y) − ẏ),

one can interpret the term g + f/(αz βz ) as a virtual trajectory or equilibrium point
trajectory. But in contrast to equilibrium point approaches, which were
meant to be a simplified computation to generate motor commands out
of the spring properties of the neuromuscular system, our work addresses
kinematic planning dynamics, which still requires a controller to convert
kinematic plans into motor commands. Gomi and Kawato (1996, 1997)
provide a useful discussion on this topic.
4.2 Robotics and Control Theory. Potential field approaches create vec-
tor fields according to which a movement system is supposed to move. This
idea has the same spirit as dynamical systems approaches in motor con-
trol. Deriving robot controllers from potential fields has a long tradition
in robotics (Khatib, 1986; Koditschek, 1987; Tsuji, Tanaka, Morasso, San-
guineti, & Kaneko, 2002; Okada, Tatani, & Nakamura, 2002; Li & Horowitz,
1999). Potential fields represent attractor landscapes, with the movement
goal acting as a point attractor. Designing such potential fields for a given
behavior is often a hard problem, with few analytically sound approaches
(Koditschek, 1987). Due to this lack of analytical tractability, some peo-
ple have suggested recurrent neural network (Paine & Tani, 2004; Jaeger
& Haas, 2004) or evolutionary methods (Ijspeert, Hallam, & Willshaw,
1999; Ijspeert, 2001) as a design strategy for nonlinear dynamical systems
controllers.
From a more theoretical side, Bühler and Koditschek (1990), Rizzi and
Koditschek (1994), Burridge, Rizzi, and Koditschek (1999), and Klavins and
Koditschek (2001) developed a variety of control algorithms in the context
of nonlinear dynamics that could be investigated both experimentally and
analytically and that demonstrated very good performance. The idea of
5 Conclusion
This letter presented a general design principle for learning and model-
ing with attractor dynamical systems for goal-directed behavior, which is
particularly useful for modeling motor behaviors in robotics but also for
modeling biological phenomena. Using nonlinear forcing terms that are
added to well-understood dynamical systems models, we can create a rich
variety of nonlinear dynamics models for both point attractive and limit
cycle systems. The nonlinear forcing term can be represented as an au-
tonomous coupling term that can be learned with standard machine learn-
ing techniques that are linear in the open parameters. We demonstrated the
properties of our approach, highlighting theoretical and practical aspects.
We illustrated our method in various examples from motor control.
To the best of our knowledge, the approach we have presented is the first
realization of a generic learning system for (weakly) nonlinear dynamical
systems that can guarantee basic stability and convergence properties of
the learned nonlinear systems and that scales to high-dimensional attractor
systems. Besides the particular realizations of nonlinear system models
presented in this letter, we believe it is of greater importance to highlight
the design principle that we employed. This design principle seems to be
applicable for many other nonlinear dynamical systems models, as well as
technical applications and computational neuroscience.
Several points of our approach require highlighting. First, the proposed
system of equations is not very complicated. We primarily make use of
linear spring-damper differential equations, while nonlinearities are in-
troduced with the help of standard kernel-based function approximators.
Acknowledgments
References
Atkeson, C. G., Hale, J., Kawato, M., Kotosaka, S., Pollick, F., Riley, M., et al. (2000).
Using humanoid robots to study human behaviour. IEEE Intelligent Systems, 15,
46–56.
Bernstein, N. A. (1967). The control and regulation of movements. London: Pergamon
Press.
Billard, A., Calinon, S., Dillmann, R., & Schaal, S. (2008). Robot programming by
demonstration. In B. Siciliano & O. Khatib (Eds.), Handbook of robotics. Cambridge,
MA: MIT Press.
Billard, A., & Mataric, M. (2001). Learning human arm movements by imitation:
Evaluation of a biologically-inspired architecture. Robotics and Autonomous Sys-
tems, 941, 1–16.
Bishop, C. M. (2006). Pattern recognition and machine learning. New York: Springer.
Buchli, J., Righetti, L., & Ijspeert, A. J. (2006). Engineering entrainment and adapta-
tion in limit cycle systems—from biological inspiration to applications in robotics.
Biological Cybernetics, 95(6), 645–664.
Bühler, M., & Koditschek, D. E. (1990). From stable to chaotic juggling: Theory,
simulation, and experiments. In Proceedings of the IEEE International Conference on
Robotics and Automation (pp. 845–865). Piscataway, NJ: IEEE.
Bullock, D., & Grossberg, S. (1989). VITE and FLETE: Neural modules for trajectory
formation and postural control. In W. A. Hersberger (Ed.), Volitional control (pp.
253–297). New York: Elsevier.
Burridge, R. R., Rizzi, A. A., & Koditschek, D. E. (1999). Sequential composition of
dynamically dexterous robot behaviors. International Journal of Robotics Research,
18(6), 534–555.
Chevallereau, C., Westervelt, E. R., & Grizzle, J. W. (2005). Asymptotically stable
running for a five-link, four-actuator, planar, bipedal robot. International Journal
of Robotics Research, 24(6), 431–464.
Dayan, P., & Abbott, L. F. (2001). Theoretical neuroscience. Cambridge, MA: MIT Press.
Dijkstra, T. M., Schoner, G., Giese, M. A., & Gielen, C. C. (1994). Frequency dependence of the action-perception cycle for postural control in a moving visual environment: Relative phase dynamics. Biological Cybernetics, 71(6), 489–501.
Fajen, B. R., & Warren, W. H. (2003). Behavioral dynamics of steering, obstacle avoidance, and route selection. Journal of Experimental Psychology: Human Perception and Performance, 29(2), 343–362.
Flash, T., & Hogan, N. (1985). The coordination of arm movements: An experimentally confirmed mathematical model. Journal of Neuroscience, 5(7), 1688–1703.
Flash, T., & Sejnowski, T. (2001). Computational approaches to motor control. Current
Opinion in Neurobiology, 11, 655–662.
Miyamoto, H., Schaal, S., Gandolfo, F., Koike, Y., Osu, R., Nakano, E., et al. (1996).
A kendama learning robot based on bi-directional theory. Neural Networks, 9,
1281–1302.
Mussa-Ivaldi, F. A. (1997). Nonlinear force fields: a distributed system of control
primitives for representing and learning movements. In Proceedings of the IEEE
International Symposium on Computational Intelligence in Robotics and Automation
(pp. 84–90). San Mateo, CA: IEEE Computer Society.
Mussa-Ivaldi, F. A. (1999). Modular features of motor control and learning. Current
Opinion in Neurobiology, 9(6), 713–717.
Nakanishi, J., Morimoto, J., Endo, G., Cheng, G., Schaal, S., & Kawato, M. (2004).
Learning from demonstration and adaptation of biped locomotion. Robotics and
Autonomous Systems, 47, 79–91.
Okada, M., Tatani, K., & Nakamura, Y. (2002). Polynomial design of the nonlinear
dynamics for the brain-like information processing of whole body motion. In
Proceedings of the IEEE International Conference on Robotics and Automation (pp.
1410–1415). Piscataway, NJ: IEEE.
Paine, R. W., & Tani, J. (2004). Motor primitive and sequence self-organization in a
hierarchical recurrent neural network. Neural Networks, 17(8–9), 1291–1309.
Pastor, P., Hoffmann, H., Asfour, T., & Schaal, S. (2009). Learning and generalization
of motor skills by learning from demonstration. In International Conference on
Robotics and Automation (pp. 763–768). Piscataway, NJ: IEEE.
Perk, B. E., & Slotine, J. J. E. (2006). Motion primitives for robotic flight control.
arXiv:cs/0609140v2 [cs.RO].
Peters, J., & Schaal, S. (2008). Reinforcement learning of motor skills with policy
gradients. Neural Networks, 21(4), 682–697.
Pongas, D., Billard, A., & Schaal, S. (2005). Rapid synchronization and accurate
phase-locking of rhythmic motor primitives. In Proceedings of the IEEE International
Conference on Intelligent Robots and Systems (pp. 2911–2916). Piscataway, NJ: IEEE.
Righetti, L., Buchli, J., & Ijspeert, A. J. (2006). Dynamic Hebbian learning in adaptive
frequency oscillators. Physica D, 216(2), 269–281.
Rimon, E., & Koditschek, D. (1992). Exact robot navigation using artificial potential
functions. IEEE Transactions on Robotics and Automation, 8(5), 501–518.
Rizzi, A. A., & Koditschek, D. E. (1994). Further progress in robot juggling: Solvable
mirror laws. In Proceedings of the IEEE International Conference on Robotics and
Automation (pp. 2935–2940). Piscataway, NJ: IEEE.
Rizzolatti, G., & Arbib, M. A. (1998). Language within our grasp. Trends in Neurosciences, 21(5), 188–194.
Sakoe, H., & Chiba, S. (1987). Dynamic programming algorithm optimization for spoken word recognition. IEEE Transactions on Acoustics, Speech and Signal Processing, 26(1), 43–49.
Schaal, S. (1999). Is imitation learning the route to humanoid robots? Trends in Cognitive Sciences, 3(6), 233–242.
Schaal, S., & Atkeson, C. G. (1994). Assessing the quality of learned local models. In J. Cowan, G. Tesauro, & J. Alspector (Eds.), Advances in neural information processing systems, 6 (pp. 160–167). San Mateo, CA: Morgan Kaufmann.
Schaal, S., & Atkeson, C. G. (1998). Constructive incremental learning from only local
information. Neural Computation, 10(8), 2047–2084.
Schaal, S., Ijspeert, A., & Billard, A. (2003). Computational approaches to motor
learning by imitation. Philosophical Transactions of the Royal Society of London: Series
B, Biological Sciences, 358(1431), 537–547.
Schaal, S., Mohajerian, P., & Ijspeert, A. (2007). Dynamics systems vs. optimal control—a unifying view. Progress in Brain Research, 165, 425–445.
Schaal, S., & Sternad, D. (1998). Programmable pattern generators. In Proceedings of
the International Conference on Computational Intelligence in Neuroscience (pp. 48–51).
Piscataway, NJ: IEEE.
Schaal, S., Sternad, D., Osu, R., & Kawato, M. (2004). Rhythmic movement is not
discrete. Nature Neuroscience, 7(10), 1137–1144.
Schöner, G. (1990). A dynamic theory of coordination of discrete movement. Biological
Cybernetics, 63, 257–270.
Schöner, G., & Kelso, J. A. S. (1988). Dynamic pattern generation in behavioral and
neural systems. Science, 239, 1513–1520.
Schöner, G., & Santos, C. (2001). Control of movement time and sequential action through attractor dynamics: A simulation study demonstrating object interception and coordination. In Proceedings of the 9th International Symposium on Intelligent Robotic Systems. http://spiderman-2.laas.fr/sirs2001/proceedings/
Sciavicco, L., & Siciliano, B. (2000). Modelling and control of robot manipulators. New
York: Springer.
Scott, A. (2005). Encyclopedia of nonlinear science. New York: Routledge.
Slotine, J. J. E., & Li, W. (1991). Applied nonlinear control. Upper Saddle River, NJ:
Prentice Hall.
Sternad, D., Amazeen, E., & Turvey, M. (1996). Diffusive, synaptic, and synergetic
coupling: An evaluation through inphase and antiphase rhythmic movements.
Journal of Motor Behavior, 28, 255–269.
Strogatz, S. H. (1994). Nonlinear dynamics and chaos: With applications to physics, biology,
chemistry, and engineering. Reading, MA: Addison-Wesley.
Swinnen, S. P., Li, Y., Dounskaia, N., Byblow, W., Stinear, C., & Wagemans, J. (2004). Perception-action coupling during bimanual coordination: The role of visual perception in the coalition of constraints that govern bimanual action. Journal of Motor Behavior, 36(4), 394–398, 402–407 (discussion 408–417).
Taga, G., Yamaguchi, Y., & Shimizu, H. (1991). Self-organized control of bipedal locomotion by neural oscillators in unpredictable environment. Biological Cybernetics, 65, 147–159.
Thelen, E., & Smith, L. B. (1994). A dynamical systems approach to the development of
cognition and action. Cambridge, MA: MIT Press.
Theodorou, E., Buchli, J., & Schaal, S. (2010). Reinforcement learning in high dimensional state spaces: A path integral approach. Journal of Machine Learning Research, 11, 3137–3181.
Todorov, E. (2004). Optimality principles in sensorimotor control. Nature Neuroscience, 7, 907–915.
Tsuji, T., Tanaka, Y., Morasso, P. G., Sanguineti, V., & Kaneko, M. (2002). Bio-mimetic trajectory generation of robots via artificial potential field with time base generator. IEEE Transactions on Systems, Man, and Cybernetics—Part C, 32(4), 426–439.
Turvey, M. T. (1990). Coordination. American Psychologist, 45(8), 938–953.
Ude, A., Gams, A., Asfour, T., & Morimoto, J. (2010). Task-specific generalization
of discrete and periodic dynamic movement primitives. IEEE Transactions on
Robotics, 26(5), 800–815.
Wada, Y., & Kawato, M. (2004). A via-point time optimization algorithm for complex
sequential trajectory formation. Neural Networks, 17(3), 353–364.
Wolpert, D. M. (1997). Computational approaches to motor control. Trends in Cognitive Sciences, 1(6), 209–216.
Wyffels, F., & Schrauwen, B. (2009). Design of a central pattern generator using reservoir computing for learning human motion. In ATEQUAL 2009: 2009 ECSIS Symposium on Advanced Technologies for Enhanced Quality of Life (LABRS and ARTIPED 2009): Proceedings (pp. 118–122). Los Alamitos, CA: IEEE Computer Society.