Generalized Learning To Create An Energy Efficient ZMP-Based Walking 2014
1. Introduction
Fig. 2. a) Frontal view of the NAO robot and the inverted pendulum model; b) a schematic view of the inverted pendulum model
In the sagittal plane, the horizontal and vertical positions of the CoM are denoted by $x$ and $z$, respectively. Gravity $g$, the horizontal CoM acceleration $\ddot{x}$, and the vertical CoM acceleration $\ddot{z}$ create a moment $T_p$ around the center of pressure (CoP) point $P_x$. Equation (1) gives this moment, where $M$ is the mass of the robot, concentrated at the CoM in the inverted pendulum model.
$$T_p = M(g + \ddot{z})(x - P_x) - M\ddot{x}z \tag{1}$$
It is known from [9] that when the robot is dynamically balanced, the ZMP and the CoP coincide; therefore, the moment about the CoP must be zero, $T_p = 0$. Setting the left-hand side of equation (1) to zero and solving for $P_x$ yields equation (2), which gives the position of the ZMP from the position and acceleration of the CoM. To generate a 3D walk, the CoM must also move in the frontal plane; hence, a second inverted pendulum is used in the $y$ direction. Under the same assumptions, equation (2) also gives the ZMP position $P_y$ for movements in the frontal plane.
$$P_x = x - \frac{z}{g + \ddot{z}}\,\ddot{x}, \qquad P_y = y - \frac{z}{g + \ddot{z}}\,\ddot{y} \tag{2}$$
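As a quick numerical check of equations (1) and (2), the sketch below computes the ZMP from a given CoM state and confirms that the moment of equation (1) vanishes at that point. The function names, the value g = 9.81 m/s², and the numbers in the example (including the 4.5 kg mass) are illustrative assumptions, not values from the original implementation.

```python
G = 9.81  # gravitational acceleration in m/s^2 (assumed value)

def zmp_from_com(x, z, x_ddot, z_ddot, g=G):
    """Equation (2): ZMP position from CoM position and acceleration.
    The same function gives P_y when called with (y, z, y_ddot, z_ddot)."""
    return x - z * x_ddot / (g + z_ddot)

def moment_about_p(mass, x, z, x_ddot, z_ddot, p, g=G):
    """Equation (1): moment T_p around the point p."""
    return mass * (g + z_ddot) * (x - p) - mass * x_ddot * z

# CoM 0.26 m high, accelerating forward at 0.5 m/s^2 (illustrative numbers)
p = zmp_from_com(x=0.02, z=0.26, x_ddot=0.5, z_ddot=0.1)
print(p, moment_about_p(4.5, 0.02, 0.26, 0.5, 0.1, p))
```

When the CoM does not accelerate, the ZMP lies directly below it; the larger the forward acceleration, the further the ZMP moves behind the CoM.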
To apply the inverted pendulum model to a biped walking problem, the positions of the support foot during the walk must first be determined. In a forward walk, the support foot positions are calculated from the desired input step length. Then, the ZMP trajectory is designed from the support foot positions and the input step period. The vertical CoM position and acceleration trajectories must also be determined as inputs of the inverted pendulum model; our approach to generating vertical CoM trajectories is explained in section 3. Next, the horizontal position of the CoM is calculated by solving the differential equations (2). The main issue in using the inverted pendulum model is how to solve these differential equations; the solution is explained in section 2.1. Finally, an inverse kinematics method is used to find the angular trajectory of each joint from the planned positions of the feet and the generated CoM position. We developed and used an inverse kinematics approach that was applied to the NAO humanoid soccer robot; see [11] for details.
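The footstep and ZMP planning steps above can be illustrated with a simple sagittal-plane reference. The sketch assumes, purely for illustration, that the ZMP stays at the center of the current support foot for a whole step period and that consecutive support feet are one step length apart; the text does not specify the ZMP profile, and practical planners often shift the ZMP smoothly during double support. `zmp_reference` is a hypothetical helper, and the default sampling time matches the 0.005 s used later for the CoM computation.

```python
def zmp_reference(step_length, step_period, n_steps, dt=0.005):
    """Piecewise-constant sagittal ZMP reference sampled every dt seconds.

    Assumption: during step k the ZMP sits at the support foot, located at
    k * step_length along the walking direction."""
    samples_per_step = int(round(step_period / dt))
    reference = []
    for k in range(n_steps):
        reference.extend([k * step_length] * samples_per_step)
    return reference

ref = zmp_reference(step_length=0.10, step_period=0.4, n_steps=3)
print(len(ref), ref[0], ref[80], ref[-1])  # 240 0.0 0.1 0.2
```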
Kagami et al. proposed an approach to generate walking patterns by solving the ZMP equations numerically [10]. Kajita et al. used this numerical approach and the inverted pendulum model to generate the horizontal CoM trajectory for ZMP-based biped running [3].
In this numerical approach, to generate the horizontal CoM trajectory, the position and acceleration of the CoM are first discretized with a small time step $\Delta t$:
$$x(i\Delta t) \rightarrow x(i), \qquad \ddot{x}(i\Delta t) \rightarrow \frac{x(i-1) - 2x(i) + x(i+1)}{\Delta t^2} \tag{3}$$
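Substituting the discretization (3) into equation (2) yields, for each time step $i$, a relation that couples only three neighboring samples:

$$P_x(i) = a_i\,x(i-1) + b_i\,x(i) + a_i\,x(i+1) \tag{4}$$

where

$$a_i = -\frac{1}{\Delta t^2}\left(\frac{z(i\Delta t)}{g + \ddot{z}(i\Delta t)}\right), \qquad b_i = 1 + \frac{2}{\Delta t^2}\left(\frac{z(i\Delta t)}{g + \ddot{z}(i\Delta t)}\right) \tag{5}$$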
Stacking equation (4) for $i = 1, \dots, n$ gives the linear system of equation (6), which relates the ZMP trajectory to the CoM trajectory through the tridiagonal matrix $K$. This system can be solved with the Thomas algorithm in O(n) operations, where $n = T_s/\Delta t$ and $T_s$ is the total time over which the CoM is calculated; in this study, $\Delta t$ is set to 0.005 s.
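$$\underbrace{\begin{bmatrix} P(1) \\ P(2) \\ P(3) \\ \vdots \\ P(n-1) \\ P(n) \end{bmatrix}}_{ZMP_x} = \underbrace{\begin{bmatrix} b_1 & a_1 & 0 & \cdots & & 0 \\ a_2 & b_2 & a_2 & & & \\ & a_3 & b_3 & a_3 & & \vdots \\ & & \ddots & \ddots & \ddots & \\ \vdots & & & a_{n-1} & b_{n-1} & a_{n-1} \\ 0 & \cdots & & & a_n & b_n \end{bmatrix}}_{K} \underbrace{\begin{bmatrix} x(1) \\ x(2) \\ x(3) \\ \vdots \\ x(n-1) \\ x(n) \end{bmatrix}}_{CoM_x} \tag{6}$$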
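Equations (4) to (6) translate directly into code. The sketch below builds the coefficients of equation (5) and solves $K\mathbf{x} = \mathbf{P}$ with the Thomas algorithm in O(n) operations. It is a minimal illustration that assumes the matrix rows are exactly those of equation (6); `tridiagonal_coeffs` and `com_from_zmp` are hypothetical names, not the authors' implementation, and the round-trip demo uses made-up data.

```python
G = 9.81  # gravitational acceleration in m/s^2 (assumed value)

def tridiagonal_coeffs(z, z_ddot, dt, g=G):
    """Equation (5): a_i = -c_i and b_i = 1 + 2 c_i with c_i = z_i / ((g + z_ddot_i) dt^2)."""
    c = [zi / ((g + zddi) * dt * dt) for zi, zddi in zip(z, z_ddot)]
    return [-ci for ci in c], [1.0 + 2.0 * ci for ci in c]

def com_from_zmp(p_ref, z, z_ddot, dt, g=G):
    """Solve equation (6), K x = P, for the horizontal CoM with the Thomas algorithm.

    Row i of K has a_i on both off-diagonals and b_i on the diagonal; the first
    row has no sub-diagonal entry and the last row no super-diagonal entry."""
    n = len(p_ref)
    a, b = tridiagonal_coeffs(z, z_ddot, dt, g)
    c_mod = [0.0] * n  # modified super-diagonal
    d_mod = [0.0] * n  # modified right-hand side
    c_mod[0] = a[0] / b[0]
    d_mod[0] = p_ref[0] / b[0]
    # Forward sweep: eliminate the sub-diagonal
    for i in range(1, n):
        denom = b[i] - a[i] * c_mod[i - 1]
        c_mod[i] = a[i] / denom if i < n - 1 else 0.0
        d_mod[i] = (p_ref[i] - a[i] * d_mod[i - 1]) / denom
    # Back substitution
    x = [0.0] * n
    x[-1] = d_mod[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d_mod[i] - c_mod[i] * x[i + 1]
    return x

# Round trip: build P = K x for a known CoM trajectory, then recover x
dt, n = 0.005, 200
z, z_ddot = [0.26] * n, [0.0] * n
x_true = [0.05 * (i * dt) ** 2 for i in range(n)]
a, b = tridiagonal_coeffs(z, z_ddot, dt)
p = [b[i] * x_true[i]
     + (a[i] * x_true[i - 1] if i > 0 else 0.0)
     + (a[i] * x_true[i + 1] if i < n - 1 else 0.0) for i in range(n)]
x = com_from_zmp(p, z, z_ddot, dt)
print(max(abs(u - v) for u, v in zip(x, x_true)))
```

Because $b_i = 1 + 2c_i$ exceeds $|a_i| + |a_i| = 2c_i$ whenever $z > 0$ and $g + \ddot{z} > 0$, $K$ is strictly diagonally dominant, so the elimination is stable without pivoting. Note also that the first and last rows of (6) drop the terms $a_1 x(0)$ and $a_n x(n+1)$, which amounts to assuming $x(0) = x(n+1) = 0$; the boundary values of the ZMP reference should be chosen with this in mind.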
We treat the height trajectory as a periodic movement. In this study, the vertical CoM trajectory is represented by the first five terms of the Fourier basis functions, so the vertical CoM trajectory generator is given by equation (7).
$$F(t) = C + \sum_{i=1}^{2}\left[\beta_i \cos\left(\frac{i\pi t}{L}\right) + \alpha_i \sin\left(\frac{(i-1)\pi t}{L}\right)\right] \tag{7}$$
The parameter $L$ is set equal to the step period, so the generator has five free parameters: $C$, $\beta_1$, $\beta_2$, $\alpha_1$, and $\alpha_2$. A black-box optimization approach can be applied to find the parameters of the hip height trajectory generator that yield an energy efficient walk; our optimization scenario is described in section 4. The trajectory generated by the Fourier basis functions is the input to the programmable CPGs.
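Equation (7) as printed can be evaluated directly. In the sketch below, `vertical_com` is a hypothetical name and the parameter values are placeholders, not optimized results from this study; $L$ is set to the step period as described above.

```python
import math

def vertical_com(t, C, beta, alpha, L):
    """Equation (7): F(t) = C + sum_{i=1}^{2} [beta_i cos(i pi t / L)
    + alpha_i sin((i - 1) pi t / L)].

    beta and alpha are length-2 sequences (beta_1, beta_2) and (alpha_1, alpha_2).
    As printed, the sine argument is zero for i = 1, so alpha_1 does not
    change the output of this function."""
    value = C
    for i in (1, 2):
        value += beta[i - 1] * math.cos(i * math.pi * t / L)
        value += alpha[i - 1] * math.sin((i - 1) * math.pi * t / L)
    return value

# Hypothetical parameters (metres) for a walk with step period L = 0.4 s
params = dict(C=0.26, beta=(0.004, -0.002), alpha=(0.0, 0.003), L=0.4)
heights = [vertical_com(k * 0.005, **params) for k in range(160)]  # 0.8 s of walking
print(min(heights), max(heights))
```

Every term repeats after $2L$, so the generated height profile is periodic over two step periods; the optimizer only has to search over the amplitudes and the offset $C$.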
In CPG implementations, nonlinear oscillators such as the Hopf oscillator are attractive because of their synchronization properties when they are coupled with other oscillators or with an external drive signal. Most CPGs use phase-locking behavior as their coupling method [12]: if the intrinsic frequency of an oscillator is close to a frequency component of the periodic input, phase locking appears and the oscillator synchronizes with that component. In 2006, Righetti et al. designed an adaptive oscillator based on the Hopf oscillator that is able to learn its frequency from the frequency of a periodic input signal [8]. They called this adaptive mechanism dynamic Hebbian learning because it shares similarities with correlation-based learning found in neural networks [13]. The structure of the network of adaptive Hopf oscillators is shown in figure 3.
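Dynamic Hebbian learning can be illustrated with a single adaptive-frequency Hopf oscillator driven by a sinusoid. The sketch uses the standard form of the adaptive Hopf oscillator from [13] (without the coupling and amplitude terms) and explicit Euler integration; `adaptive_hopf` is a hypothetical name, and the gains γ = 8, μ = 1, ε = 0.9, the step size, and the simulation length are assumed values chosen for illustration.

```python
import math

def adaptive_hopf(omega_teach, omega_init, t_end=100.0, dt=1e-3,
                  gamma=8.0, mu=1.0, eps=0.9):
    """Euler-integrate one adaptive-frequency Hopf oscillator driven by
    F(t) = sin(omega_teach * t) and return its learned frequency omega."""
    x, y, omega = 1.0, 0.0, omega_init
    for k in range(int(round(t_end / dt))):
        F = math.sin(omega_teach * k * dt)  # periodic teaching input
        r = math.hypot(x, y)                # r = sqrt(x^2 + y^2)
        dx = gamma * (mu - r * r) * x - omega * y + eps * F
        dy = gamma * (mu - r * r) * y + omega * x
        domega = -eps * F * y / r           # dynamic Hebbian frequency learning
        x, y, omega = x + dt * dx, y + dt * dy, omega + dt * domega
    return omega

print(adaptive_hopf(omega_teach=2 * math.pi, omega_init=5.5))
```

Starting from 5.5 rad/s, the learned frequency is expected to settle close to the 2π rad/s of the input, with a small residual ripple caused by the forcing term; the learned value stays in the oscillator state after the input is removed.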
In this study, the external drive signal, or teaching trajectory, is the output trajectory of the Fourier-based generator presented above. Each oscillator is responsible for learning one frequency component of the signal. The network consists of four oscillators, indexed by $i$. The output of the system is the weighted sum of the oscillator outputs, $Q_{learned}(t) = \sum_i a_i x_i$, where $a_i$ is the amplitude associated with each learned frequency. Through a negative feedback loop, the already learned frequencies are subtracted from the teaching signal, $F(t) = P_{teach}(t) - Q_{learned}(t)$, which drives the system to adapt to the remaining frequency components that have not yet converged. Because each oscillator has its own phase shift, each one is associated with a variable $\phi_i$ encoding its phase difference from the first oscillator of the network (oscillator 0). To reproduce any phase relationship between the oscillators, the Kuramoto coupling scheme [14] is used. The equations describing the learning and dynamics of the whole CPG are as follows.
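$$\dot{x}_i = \gamma(\mu - r_i^2)\,x_i - \omega_i y_i + \epsilon F(t) + \tau \sin(\theta_i - \phi_i) \tag{8}$$

$$\dot{y}_i = \gamma(\mu - r_i^2)\,y_i + \omega_i x_i \tag{9}$$

$$\dot{\omega}_i = -\epsilon F(t)\,\frac{y_i}{r_i} \tag{10}$$

$$\dot{a}_i = \beta x_i F(t) \tag{11}$$

$$\dot{\phi}_i = \sin\left(\frac{\omega_i}{\omega_0}\theta_0 - \theta_i - \phi_i\right) \tag{12}$$

$$\theta_i = \operatorname{sgn}(x_i)\cos^{-1}\left(-\frac{y_i}{r_i}\right) \tag{13}$$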
Equations (8), (9), and (10) describe the Hopf oscillator and its frequency learning, where $r_i = \sqrt{x_i^2 + y_i^2}$, $\mu$ sets the amplitude of the limit cycle (its radius is $\sqrt{\mu}$), $\epsilon$ is the coupling strength of the teaching signal, and $\gamma$ controls the speed of recovery after a perturbation. In equation (8), the Kuramoto coupling term $\tau \sin(\theta_i - \phi_i)$ achieves phase synchronization between the oscillators: each adaptive oscillator is coupled to oscillator 0 with strength $\tau$ to keep the correct phase relationships between oscillators, and $\phi_i$ is the phase difference between oscillator $i$ and oscillator 0.

Equations (12) and (13) show how $\phi_i$ converges to the difference between the instantaneous phase of oscillator 0, $\theta_0$, scaled to frequency $\omega_i$, and the instantaneous phase $\theta_i$ of oscillator $i$. The learning rule for updating $a_i$ is given by equation (11), where $\beta$ is the learning rate. This rule maximizes the correlation between $x_i$ and $F(t)$: the correlation is positive on average and stops increasing once the frequency component $\omega_i$ disappears from $F(t)$ because of the negative feedback loop. The negative feedback acts as an error signal and the learning rule works like the perceptron rule; since the input signal is linearly separable, the online algorithm converges.
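Equations (8) to (13), together with the negative feedback $F(t) = P_{teach}(t) - Q_{learned}(t)$, can be combined into one explicit Euler step for the whole network. This is a minimal sketch under stated assumptions: the gains are illustrative, and `CPGNetwork`, `phase`, and `cpg_step` are hypothetical names. Because the text states that each oscillator is coupled to oscillator 0, the sketch assumes the coupling term acts on the same phase error that drives equation (12), $\tau \sin\left(\frac{\omega_i}{\omega_0}\theta_0 - \theta_i - \phi_i\right)$; this is an interpretation of the coupling term in equation (8), not a confirmed detail of the original implementation.

```python
import math
from dataclasses import dataclass
from typing import List

@dataclass
class CPGNetwork:
    """State of N adaptive Hopf oscillators; index 0 is the phase reference."""
    x: List[float]
    y: List[float]
    omega: List[float]  # learned frequencies (rad/s)
    a: List[float]      # learned amplitudes
    phi: List[float]    # learned phase differences to oscillator 0

def phase(x, y):
    """Equation (13): theta = sgn(x) * acos(-y / r)."""
    r = math.hypot(x, y)
    sgn = (x > 0) - (x < 0)
    return sgn * math.acos(max(-1.0, min(1.0, -y / r)))

def cpg_step(net, p_teach, dt, gamma=8.0, mu=1.0, eps=0.9, tau=2.0, beta=0.5):
    """Advance the network by one explicit Euler step; returns Q_learned(t)."""
    q_learned = sum(ai * xi for ai, xi in zip(net.a, net.x))
    F = p_teach - q_learned  # negative feedback loop
    theta = [phase(xi, yi) for xi, yi in zip(net.x, net.y)]
    omega0 = net.omega[0]    # reference frequency before this step's update
    for i in range(len(net.x)):
        x, y, w, phi = net.x[i], net.y[i], net.omega[i], net.phi[i]
        r = math.hypot(x, y)
        # Assumed coupling phase error (same as equation (12)); zero for i = 0
        err = w / omega0 * theta[0] - theta[i] - phi
        dx = gamma * (mu - r * r) * x - w * y + eps * F + tau * math.sin(err)  # (8)
        dy = gamma * (mu - r * r) * y + w * x                                  # (9)
        dw = -eps * F * y / r                                                  # (10)
        da = beta * x * F                                                      # (11)
        dphi = math.sin(err)                                                   # (12)
        net.x[i], net.y[i] = x + dt * dx, y + dt * dy
        net.omega[i], net.a[i], net.phi[i] = w + dt * dw, net.a[i] + dt * da, phi + dt * dphi
    return q_learned
```

In use, `cpg_step` would be called once per control cycle with `p_teach` set to the output of the Fourier generator of equation (7); the variable `F` inside the function is the feedback error of equations (8) to (11), not the generator output. The state must be initialized with $r_i > 0$ (for example $x_i = 1$, $y_i = 0$) to avoid division by zero.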
In summary, by applying learning rules given as differential equations, parameters such as intrinsic frequencies, amplitudes, and phase-coupling weights can be adapted automatically to a teaching signal. An interesting aspect of this approach is that the learning is completely embedded in the dynamical system and does not require external optimization algorithms.
Using Fourier basis functions together with the presented CPG concept, the proposed trajectory generator model, or policy representation, has the following advantages:
Smoothness: Using CPGs increases the basin of stability of the walk. CPGs generate smooth and continuous trajectories without sudden accelerations, which keeps the robot from falling and also reduces its energy consumption. When the step length and period change during the walk, the energy efficient vertical CoM trajectory may also change; CPGs make this transition smooth. CPGs can also adapt their frequency when the walking step period, and hence the CoM trajectory period, changes.
Periodicity: Fourier basis functions can easily represent periodic or cyclic movements, and a biped walk mostly consists of periodic movements.
Convergence: The frequency of a walk equals the frequency of its vertical CoM trajectory. Because the period of the Fourier basis functions is tied to the step period, no frequency parameters need to be learned; therefore, the robot converges to the energy efficient walk faster than approaches that use spline basis functions [6].
4. Learning Scenario
In this study, the vertical CoM trajectory is represented by the first five terms of the Fourier basis functions. The optimized energy efficient vertical CoM motion must be found for different step periods and step lengths. Since the step length and step period are continuous variables, they are discretized with a suitable resolution. The ranges and resolutions of the step length and step period used in this work are:
Step length = [0.06, 0.18] m; resolution = 0.04 m
Step period = [0.4, 0.8] s; resolution = 0.2 s
This gives 12 combinations of step period and step length, and the energy optimization is performed for each of them. The optimized values of the Fourier basis function terms must be found with respect to minimizing the electrical power consumption of the actuators. Bipedal walking is a complicated motion, since many factors affect walking style and stability, such as the robot's kinematics and dynamics and the collisions between the feet and the ground. In such a complex motion, the relation between the gait trajectory and walking characteristics such as energy consumption is nonlinear. Stochastic optimization algorithms can therefore be applied to find the parameter values of the vertical CoM trajectory generator that produce an energy efficient walk.
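The discretization above can be enumerated directly. The sketch below, built around a hypothetical `grid` helper that is not part of the original code, reproduces the 12 step length and step period combinations, each of which gets its own optimization run.

```python
def grid(lo, hi, resolution, ndigits):
    """Values lo, lo + resolution, ..., hi, rounded to avoid float drift."""
    count = int(round((hi - lo) / resolution)) + 1
    return [round(lo + k * resolution, ndigits) for k in range(count)]

step_lengths = grid(0.06, 0.18, 0.04, 2)  # metres
step_periods = grid(0.4, 0.8, 0.2, 1)     # seconds
scenarios = [(length, period) for length in step_lengths for period in step_periods]
print(len(scenarios))              # 12
print(step_lengths, step_periods)  # [0.06, 0.1, 0.14, 0.18] [0.4, 0.6, 0.8]
```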
In this paper, the Covariance Matrix Adaptation Evolution Strategy (CMA-ES) is used as the stochastic optimization algorithm for gait optimization. CMA-ES is a population-based, stochastic, derivative-free method that can be used for black-box optimization problems or direct policy search reinforcement learning. It has previously been applied successfully to gait optimization [15][16], and it has been reported to achieve better results and faster convergence than other well-known stochastic optimization techniques such as particle swarm optimization (PSO) and genetic algorithms (GA) [16].
CMA-ES generates a set of candidates, the population, sampled from a multivariate Gaussian distribution, and evaluates each candidate with respect to a fitness measure. After all candidates in the population have been evaluated, the mean of the multivariate Gaussian distribution is recalculated as a weighted average of the candidates with the highest fitness. The covariance matrix of the distribution is also updated to bias the generation of the next set of candidates toward the directions of previously successful search steps. In this study, the population size is set to 8.
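The sample, evaluate, recombine, and adapt loop just described can be sketched in plain Python. This is deliberately not full CMA-ES: it keeps only Gaussian sampling, weighted recombination of the best half of the population, and a rank-μ covariance update, and it omits the evolution paths and step-size adaptation of the real algorithm (see Hansen's tutorial [7]). `simple_es`, the learning rate `c_mu`, the step size `sigma`, and the log-weights are assumptions; `popsize=8` matches the population size used here, and 30 iterations of 8 candidates give 240 evaluations. In the gait setting, `cost` would run a simulated walk with the five Fourier parameters and return the sum of squared joint torques; here a toy quadratic stands in.

```python
import math
import random

def cholesky(C):
    """Lower-triangular L with L L^T = C for a symmetric positive-definite C."""
    n = len(C)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = C[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def simple_es(cost, mean, sigma=0.3, popsize=8, iterations=30, c_mu=0.3, seed=0):
    """Sample, rank, recombine, adapt covariance; returns the final mean."""
    rng = random.Random(seed)
    n, mu = len(mean), popsize // 2
    raw = [math.log(mu + 0.5) - math.log(k + 1) for k in range(mu)]
    weights = [w / sum(raw) for w in raw]  # positive, decreasing, sum to 1
    C = [[float(i == j) for j in range(n)] for i in range(n)]
    m = list(mean)
    for _ in range(iterations):
        L = cholesky(C)
        steps = []
        for _ in range(popsize):
            z = [rng.gauss(0.0, 1.0) for _ in range(n)]
            steps.append([sum(L[i][k] * z[k] for k in range(i + 1)) for i in range(n)])
        # Rank candidates m + sigma * y by cost (lower is better) and keep the best mu
        best = sorted(steps, key=lambda y: cost([mi + sigma * yi for mi, yi in zip(m, y)]))[:mu]
        # Mean update: weighted average of the best candidates
        m = [m[i] + sigma * sum(w * y[i] for w, y in zip(weights, best)) for i in range(n)]
        # Rank-mu covariance update: bias future samples toward successful steps
        C = [[(1 - c_mu) * C[i][j] + c_mu * sum(w * y[i] * y[j] for w, y in zip(weights, best))
              for j in range(n)] for i in range(n)]
    return m

def sphere(v):
    """Toy stand-in for the walking cost (sum of squared joint torques)."""
    return sum((vi - 1.0) ** 2 for vi in v)

m = simple_es(sphere, [0.0] * 5)
print(sphere(m))
```

Mixing the old covariance with the weighted outer products of the selected steps keeps the matrix positive definite, so the Cholesky factor used for sampling always exists.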
To minimize electrical power and energy consumption, the electrical power must first be measured. The electrical power of a motor can be given in simple form by $P_m = I^2 R$, where $I$ is the current and $R$ the resistance. The motor torque is given by $\tau = K_t I$, where $K_t$ is the motor torque constant. Combining these expressions, the electrical power can be rewritten as

$$P_e = \frac{R}{K_t^2}\,\tau^2$$

Since $R/K_t^2$ is a constant motor parameter, the power is proportional to $\tau^2$; therefore, in this study, the cost metric is the sum of the squared joint torques.
In this study, a simulated NAO robot is used to test and verify the approach. The NAO is a kid-size humanoid robot that is 58 cm tall and has 21 degrees of freedom (DoF). The simulation is carried out using the RoboCup soccer simulator rcssserver3d, the official 3D simulator released by the RoboCup community for simulating humanoid soccer matches. The simulator is based on the Open Dynamics Engine (ODE). ODE can report the torque produced by each joint at every simulation step, so the sum of the squared joint torques can be computed as the cost function.
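The cost metric can be written in a few lines. The sketch assumes the torques reported by the simulator have been collected as one list of joint torques per simulation step; `torque_log` is a hypothetical structure and reading it from rcssserver3d is not shown. `electrical_power` spells out the single-motor relation $P_e = R\tau^2/K_t^2$ with illustrative values of $R$ and $K_t$.

```python
def torque_cost(torque_log):
    """Sum of squared joint torques over all simulation steps and joints."""
    return sum(tau * tau for step in torque_log for tau in step)

def electrical_power(tau, resistance, torque_constant):
    """Power of one motor: P = I^2 R with I = tau / K_t."""
    current = tau / torque_constant
    return current * current * resistance

log = [[1.0, -2.0], [0.5, 0.0]]         # two steps, two joints (illustrative)
print(torque_cost(log))                 # 5.25
print(electrical_power(2.0, 1.0, 0.5))  # 16.0
```

Summing squared torques across joints treats every joint as having the same $R/K_t^2$; with different motor types per joint, per-joint weights $R_j/K_{t,j}^2$ would make the cost exactly proportional to the electrical power.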
Using CMA-ES, after 30 iterations (240 trials), the robot reduced its energy usage by 25 percent on average across all learning scenarios. The optimization is performed on 10 seconds of walking for each of the step lengths and step periods presented in section 4. Figure 4 shows the convergence of the cost function for the walk with step length 0.1 m and step period 0.4 s, and the optimized vertical CoM trajectory for this walk is shown in figure 5.
Fig. 4. CMA-ES convergence for the walk with step length 0.10 m and step period 0.4 s
Fig. 6. Learning convergence for the walk with step length 0.10 m and step period 0.8 s
Fig. 7. Energy efficient vertical CoM trajectory during two step periods
Figures 8 and 9 show the convergence of the cost function and the optimized vertical CoM trajectory for the walk with step length 0.14 m and step period 0.4 s.
Fig. 8. CMA-ES convergence for the walk with step length 0.14 m and step period 0.4 s
Fig. 9. Optimized vertical CoM trajectory for the above walking scenario
As shown in figures 5, 7, and 9, the optimized vertical CoM trajectories differ between walking characteristics. By using programmable CPGs, the robot can change its walking speed while the CoM trajectory is modulated autonomously. Figure 10 shows the modulation of the CoM trajectory when the robot changes from a walk with step length 0.10 m and step period 0.8 s to a walk with step length 0.14 m and step period 0.4 s. The change occurs four seconds after the start of the walk.
Fig. 10. Modulation of the vertical CoM trajectory by using the CPGs
For the same walking scenario as in figure 10, we also tested the change of the vertical CoM trajectory using only the Fourier basis function generator. Figure 11 shows this experiment and illustrates that, at the moment of the change, the vertical CoM trajectory undergoes a sudden acceleration. In this scenario, the robot fell in four of 10 tests because of this sudden acceleration. In contrast, with the CPGs the robot did not fall in any of 10 tests, because the vertical CoM trajectory changes smoothly, as shown in figure 10.
Fig. 11. Changing the vertical CoM trajectory by the Fourier based generator
7. Conclusions
This paper presented an approach to create an energy efficient walk. The walking controller is ZMP based, and the ZMP dynamics are modeled by an inverted pendulum model. A numerical approach is used to generate the horizontal CoM trajectory. The main contribution of this paper is the use of the CPG approach with Fourier basis functions to formulate the vertical CoM trajectory generator. Using CMA-ES, an energy efficient walk is achieved for walks with different characteristics, including different step lengths and periods. The results show that optimizing the vertical CoM trajectory reduces the energy consumption of the walk by 25 percent on average compared to walking with a fixed height. For different step lengths and periods, the optimized vertical CoM trajectory differs in shape and characteristics. With CPGs, the online modulation of the vertical CoM trajectory is done smoothly and without jerk.
Since a ZMP-based approach is used and CPGs generate smooth trajectories, the generated walk is stable and the risk of hardware damage during gait learning is low. Therefore, future work will focus on performing the gait learning directly on a real NAO robot. To improve learning generalization, linear regression may also be used to predict the Fourier basis function terms for new walk parameter values.
Acknowledgements
The first author is supported by the Foundation for Science and Technology (FCT)
under grant SFRH/BD/66597/2009 and PEst-OE/EEI/UI0127/2014. This work was
funded by FCT in the context of the projects PEst-OE/EEI/UI0027/2014 and also
supported by project Cloud Thinking (funded by the QREN Mais Centro program, ref.
CENTRO-07-ST24-FEDER-002031).
References
1. Gordon, K. E., Ferris, D. P., and Kuo, A. D.: Metabolic and Mechanical Energy Costs of Reducing Vertical Center of Mass Movement During Gait. Arch. Phys. Med. Rehabil., vol. 90, pp. 136–144 (2009)
2. Kajita, S., Kanehiro, F., Kaneko, K., Yokoi, K., and Hirukawa, H.: The 3D linear inverted pendulum mode: a simple modeling for a biped walking pattern generation. In: IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 239–246 (2001)
3. Kajita, S., Nagasaki, T., Kaneko, K., and Hirukawa, H.: ZMP-Based Biped Running Control. In: IEEE/RSJ International Conference on Intelligent Robots and Systems (2007)
4. Kajita, S., Kanehiro, F., Kaneko, K., and Fujiwara, K.: Biped walking pattern generation by using preview control of zero-moment point. In: IEEE International Conference on Robotics and Automation, pp. 1620–1626 (2003)
5. Kuo, A. D., Donelan, J. M., and Ruina, A.: Energetic consequences of walking like an inverted pendulum: step-to-step transitions. Exerc. Sport Sci. Rev., vol. 33, no. 2, pp. 88–97 (2005)
6. Kormushev, P., Ugurlu, B., Calinon, S., Tsagarakis, N. G., and Caldwell, D. G.: Bipedal walking energy minimization by reinforcement learning with evolving policy parameterization. In: 2011 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 318–324 (2011)
7. Hansen, N.: The CMA evolution strategy: A tutorial (2005)
8. Righetti, L. and Ijspeert, A. J.: Programmable central pattern generators: an application to biped locomotion control. In: IEEE International Conference on Robotics and Automation, pp. 1585–1590 (2006)
9. Vukobratović, M. and Juricić, D.: Contribution to the synthesis of biped gait. IEEE Trans. Biomed. Eng., vol. 16, no. 1, pp. 1–6 (1969)
10. Kagami, S., Nishiwaki, K., Inaba, M., and Inoue, H.: A Fast Dynamically Equilibrated Walking Trajectory Generation Method of Humanoid Robot. Auton. Robots, vol. 12, no. 1, pp. 71–82 (2002)
11. Domingues, E., Lau, N., Pimentel, B., Shafii, N., Reis, L., and Neves, A.: Humanoid behaviors: from simulation to a real robot. In: EPIA'11, Springer, pp. 352–364 (2011)
12. Pikovsky, A., Rosenblum, M., and Kurths, J.: Synchronization: A Universal Concept in Nonlinear Sciences. Cambridge University Press (2003)
13. Righetti, L., Buchli, J., and Ijspeert, A.: Dynamic Hebbian learning in adaptive frequency oscillators. Phys. D Nonlinear Phenom., vol. 216, no. 2, pp. 269–281 (2006)
14. Acebrón, J., Bonilla, L., Pérez Vicente, C., Ritort, F., and Spigler, R.: The Kuramoto model: A simple paradigm for synchronization phenomena. Rev. Mod. Phys., vol. 77, no. 1, pp. 137–185 (2005)
15. MacAlpine, P., Barrett, S., Urieli, D., Vu, V., and Stone, P.: Design and optimization of an omnidirectional humanoid walk: A winning approach at the RoboCup 2011 3D simulation competition. In: Proceedings of the Twenty-Sixth AAAI Conference on Artificial Intelligence, pp. 1047–1053 (2012)
16. Farchy, A., Barrett, S., MacAlpine, P., and Stone, P.: Humanoid robots learning to walk faster: From the real world to simulation and back. In: International Conference on Autonomous Agents and Multi-Agent Systems, pp. 39–46 (2013)