Foothold Estimation
Dominik Belter, Jakub Bednarek, Hsiu-Chin Lin, Guiyang Xin, Michael Mistry
Our goal is to select online, from a set of potential footholds and using the information from the local map, an optimal foothold that satisfies the following constraints. First, the robot should avoid selecting footholds on sharp edges and/or steep slopes because they are potentially risky.

Fig. 3. Elevation map created offline to collect data for training the neural network.

B. Dataset
Training a model on the robot is expensive, time-consuming, and dangerous, since sensor data are prone to noise. Thus, we take a transfer learning approach, where data are gathered in simulation but used on the real robot.

To train the neural network, we collect samples on the 12×12 m elevation map presented in Fig. 3. The map was created offline by composing maps obtained during various experiments on the robot. We also added a flat region, steps of various heights, concavities, and bumps to increase the variation of the foothold examples.

Since the robot is symmetrical, we train models for the right legs and adapt them for the left legs. To this end, we flip the input terrain map horizontally and, after inference, we flip the obtained cost map horizontally. Therefore, we only need to collect data for two legs and train two separate models.
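This mirroring trick amounts to flipping the input before inference and flipping the prediction back. A minimal sketch with NumPy (the function and model names are illustrative, not from the paper):

```python
import numpy as np

def predict_left_leg_cost(elevation_map, right_leg_model):
    """Reuse a right-leg model for the mirrored left leg.

    elevation_map: 40x40 local height map for the left leg.
    right_leg_model: callable returning a 40x40 cost map.
    """
    mirrored = np.fliplr(elevation_map)  # flip the input horizontally
    cost = right_leg_model(mirrored)     # inference with the right-leg model
    return np.fliplr(cost)               # flip the cost map back
```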
To generate training data, we randomly select the position of the robot on the map (horizontal position and distance to the ground; see the sketch after the list below). The orientation of the robot on the horizontal plane (yaw angle) is randomly selected from the four main orientations n·π/2, for n = 0, 1, 2, 3. For the obtained pose of the robot, we compute the pose of the i-th leg and extract a 40×40 local map. For each cell of the map, we quantify the cost of a foothold c_f ∈ [0, 255] based on a set of constraints. This quantity is first evaluated based on the following hard constraints:
1) kinematic range of the leg: If the given position of the foot is outside the workspace of the considered leg, we set the cost to the maximal value c_f = 255 and do not check the other constraints.
2) self-collisions and collisions with the ground: We use the Flexible Collision Library [24] to determine whether there is any collision between any pair of rigid bodies or with the ground, except the contact between the foot and the terrain. A foothold is rejected (i.e., c_f = 255) if there is a collision, and we do not check further.
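The random pose sampling described above can be sketched as follows (Z_MIN and Z_MAX are assumed bounds on the body height; the paper does not specify them):

```python
import numpy as np

Z_MIN, Z_MAX = 0.4, 0.6  # assumed bounds on distance to the ground [m]

def sample_robot_pose(map_size=12.0):
    """Randomly draw a robot pose for training-data generation (a sketch)."""
    x, y = np.random.uniform(0.0, map_size, size=2)  # horizontal position [m]
    z = np.random.uniform(Z_MIN, Z_MAX)              # distance to the ground [m]
    yaw = np.random.randint(4) * np.pi / 2           # n*pi/2 for n in {0,1,2,3}
    return x, y, z, yaw
```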
If a given potential foothold is collision-free and the foot is within the leg's workspace, the quality of the foothold is evaluated using the following criteria:
1) kinematic margin: The kinematic margin c_k is the distance between the current position of the foot and the border of the workspace. The maximal value of c_k means that the leg has the maximal motion range. The c_k value is normalized.
2) terrain cost: The terrain cost c_m evaluates the properties of the local map. We use the same cost function as for the hexapod Messor robot [9], since both robots have a similar hemispherical foot. The c_m value is normalized.

Finally, we compute the final cost of the considered foothold c_f and scale it to the range [0, 255]:

c_f = (c_k + 2·c_m) / 3 · 255.    (1)

For each cell, we repeat the above procedure and save the input (elevation map) and the output (terrain cost). We collected 20000 training pairs for each leg.
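Combining the hard constraints with Eq. (1), the per-cell labeling can be sketched as follows. The callables are hypothetical stand-ins for the robot's kinematic model, the FCL queries [24], and the terrain cost function of [9]:

```python
def foothold_cost(cell, in_workspace, in_collision, kinematic_margin, terrain_cost):
    """Label one map cell with a foothold cost c_f in [0, 255]."""
    # Hard constraint 1: kinematic range of the leg.
    if not in_workspace(cell):
        return 255.0
    # Hard constraint 2: self-collisions and collisions with the ground
    # (the foot-terrain contact itself is excluded from the check).
    if in_collision(cell):
        return 255.0
    # Soft criteria: both values are normalized to [0, 1].
    c_k = kinematic_margin(cell)
    c_m = terrain_cost(cell)
    # Eq. (1): the terrain cost is weighted twice as strongly as the margin.
    return (c_k + 2.0 * c_m) / 3.0 * 255.0
```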
Examples of training data are presented in Fig. 4. The first two columns present the computed cost maps for flat terrain. In this case, the output depends on the leg's workspace and the kinematic margin. The distance between the terrain and the robot is larger on the map in Fig. 4a than in Fig. 4b, and the output cost maps differ accordingly. The obtained cost maps (Fig. 4f and Fig. 4g) represent the horizontal cross-section of the workspace of the robot's leg. The yellow cells represent positions of the foot which are outside the workspace and inaccessible for the robot (c_f = 255). Collisions and edges on obstacles or slopes also significantly increase the cost of footholds, and the neural network classifies them as inaccessible (yellow). The red cells correspond to acceptable footholds. In the remaining examples, Fig. 4c–e, the terrain is irregular and we can observe how the workspace of the robot is limited by the terrain shape.

C. Convolutional Neural Network

We aim to learn a mapping between the features extracted from a local elevation map and the quality of potential footholds. In this work, we choose a Convolutional Neural Network (CNN), since this architecture runs in real time on machines without GPUs.

Because a CNN is much more efficient at classification than at regression, we discretize the terrain costs into C classes (a binning sketch follows below). In this work, C is set to 14, since this is sufficient to distinguish between weak and good footholds, and we can easily provide a sufficient number of training samples for each class.
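Assuming uniform binning (the paper does not state the exact discretization), the mapping from continuous costs to class labels could look like this:

```python
import numpy as np

NUM_CLASSES = 14

def cost_to_class(cost_map):
    """Map continuous costs in [0, 255] to class indices 0..13."""
    bins = np.linspace(0.0, 255.0, NUM_CLASSES + 1)[1:-1]  # 13 interior edges
    return np.digitize(cost_map, bins)
```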
The proposed CNN architecture is the Efficient Residual Factorized ConvNet (ERF), first introduced in [25].
Fig. 4. Example training data: local elevation maps (a–e) and the corresponding terrain costs (f–j); red – acceptable footholds, yellow – unacceptable footholds. Note that both subfigures a and b represent flat terrain, but the elevation differs; thus, the acceptable region (the red area corresponding to the leg's workspace) obtained from the neural network differs between subfigures f and g.
Fig. 5. Model of the ERF network. Light blue blocks represent downsampling, dark blue – upsampling by transposed convolution, and white blocks show residual layers. The numbers below the blocks describe the number of feature maps used at specific levels (input 40×40×1, then 16, 64, and 128 feature maps, output 40×40×C). C denotes the number of classes (14 in the current implementation).

The characteristic feature of this model is a modification of the residual layer [26] called the non-bottleneck-1D residual layer. The 2D convolution with a 3×3 filter is replaced by two 2D convolutions with filter shapes 3×1 and 1×3. This approach reduces the number of variables and the computational complexity.
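A sketch of the non-bottleneck-1D layer in PyTorch, following the description above (the batch normalization and dropout used in [25] are omitted for brevity):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NonBottleneck1D(nn.Module):
    """Residual layer with 3x3 convolutions factorized into 3x1 and 1x3."""

    def __init__(self, channels, dilation=1):
        super().__init__()
        self.conv3x1_1 = nn.Conv2d(channels, channels, (3, 1), padding=(1, 0))
        self.conv1x3_1 = nn.Conv2d(channels, channels, (1, 3), padding=(0, 1))
        self.conv3x1_2 = nn.Conv2d(channels, channels, (3, 1),
                                   padding=(dilation, 0), dilation=(dilation, 1))
        self.conv1x3_2 = nn.Conv2d(channels, channels, (1, 3),
                                   padding=(0, dilation), dilation=(1, dilation))

    def forward(self, x):
        out = F.relu(self.conv3x1_1(x))
        out = F.relu(self.conv1x3_1(out))
        out = F.relu(self.conv3x1_2(out))
        out = self.conv1x3_2(out)
        return F.relu(out + x)  # residual connection
```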
The ERF model is shown in Fig. 5. First, the input data is processed twice by downsampling blocks. A downsampling block is created from the concatenation of max pooling and a 2D convolution with a 3×3 filter and a stride of 2; the concatenation is followed by the activation function. Then, five residual layers and another downsampling block are added. The output of the encoder part is processed by eight residual layers which are interwoven with different dilation rates applied to the convolutions. The decoder part of the model consists of two series of convolutional upsampling and two residual layers. The upsampling is performed by transposed convolution with a stride of 2. The output of the model is produced by an upsampling convolution with 2×2 filters and a stride of 2, where the number of filters is equal to the number of classes. The activation function used in each nonlinear layer is the rectified linear unit (ReLU).
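The overall layout can then be sketched as follows, reusing the imports and the NonBottleneck1D block from the previous listing. The channel counts follow Fig. 5, while details such as the exact dilation schedule are assumptions based on [25]:

```python
class Downsampler(nn.Module):
    """Concatenation of a strided 3x3 convolution and max pooling, then ReLU."""

    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch - in_ch, 3, stride=2, padding=1)
        self.pool = nn.MaxPool2d(2, stride=2)

    def forward(self, x):
        return F.relu(torch.cat([self.conv(x), self.pool(x)], dim=1))

def build_erf(num_classes=14):
    """Simplified ERF skeleton consistent with Fig. 5 (40x40x1 -> 40x40xC)."""
    return nn.Sequential(
        Downsampler(1, 16),                        # 40x40x1 -> 20x20x16
        Downsampler(16, 64),                       # -> 10x10x64
        *[NonBottleneck1D(64) for _ in range(5)],  # five residual layers
        Downsampler(64, 128),                      # -> 5x5x128
        # eight residual layers with interleaved dilation rates (assumed as in [25])
        *[NonBottleneck1D(128, d) for d in (2, 4, 8, 16, 2, 4, 8, 16)],
        nn.ConvTranspose2d(128, 64, 3, stride=2, padding=1, output_padding=1),
        NonBottleneck1D(64), NonBottleneck1D(64),  # first decoder series
        nn.ConvTranspose2d(64, 16, 3, stride=2, padding=1, output_padding=1),
        NonBottleneck1D(16), NonBottleneck1D(16),  # second decoder series
        nn.ConvTranspose2d(16, num_classes, 2, stride=2),  # 2x2 filters -> 40x40xC
    )
```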
The optimized objective of the model is composed of cross-entropy loss and regularization loss. In the training dataset, most examples belong to the class which represents footholds inaccessible for the robot. To handle the unbalanced data, the cross-entropy is additionally weighted [27] based on the frequency of occurrence. Namely, the weight w_i of the i-th class is defined by

w_i = 1 / log(c + p_i),    (2)

where c = 1.08 is a constant and p_i is the probability of occurrence of the i-th class in the entire training dataset.
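A sketch of this weighting scheme (`labels` is assumed to be the array of ground-truth class indices over the entire training set):

```python
import numpy as np

def class_weights(labels, num_classes=14, c=1.08):
    """Eq. (2): w_i = 1 / log(c + p_i), with p_i the class frequency."""
    counts = np.bincount(labels.ravel(), minlength=num_classes)
    p = counts / counts.sum()
    return 1.0 / np.log(c + p)
```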
We use the method presented in [28] for training the model, with an initial learning rate of 5e-4. Additionally, exponential decay with a factor of 0.98 is applied to the learning rate after each epoch. Because of the nature of the training examples, we cannot use any of the standard data augmentation methods.
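In PyTorch terms, this training setup corresponds roughly to the following sketch (`model`, `train_loader`, and the class weights are assumed to be defined as above; the paper's regularization term is omitted):

```python
import torch

def train(model, train_loader, weights, epochs=500):
    """Training sketch following the text: Adam [28], lr 5e-4, 0.98 decay/epoch."""
    criterion = torch.nn.CrossEntropyLoss(weight=weights)  # weighted cross-entropy
    optimizer = torch.optim.Adam(model.parameters(), lr=5e-4)
    scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.98)
    for _ in range(epochs):
        for maps, labels in train_loader:  # no data augmentation
            optimizer.zero_grad()
            loss = criterion(model(maps), labels)
            loss.backward()
            optimizer.step()
        scheduler.step()  # decay the learning rate after each epoch
```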
To measure the quality of the models, the Intersection over Union (IoU) metric was calculated. The learning process took 500 epochs. The results obtained by the two ERF models, for the front and rear legs, are shown in Tab. I. The IoU value is higher than 82%. Note that this value does not directly represent the quality of the foothold selection module. Although the learnt model misclassifies 22% of the footholds, most of the errors are between neighboring classes, which is not a crucial problem. For example, if a foothold is classified as class 13 instead of class 14, it is still considered a very weak foothold.
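The IoU metric itself can be computed per class and averaged; a minimal sketch:

```python
import numpy as np

def mean_iou(pred, target, num_classes=14):
    """Mean Intersection over Union over the classes present in the data."""
    ious = []
    for k in range(num_classes):
        inter = np.logical_and(pred == k, target == k).sum()
        union = np.logical_or(pred == k, target == k).sum()
        if union > 0:
            ious.append(inter / union)
    return float(np.mean(ious))
```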
D. Inference procedure

The inference procedure is presented in Fig. 7. In the first step, we get a submap from the global map built by
Fig. 6. Environment configuration during the experiments with the ANYmal robot on stairs (a) and on rough terrain (b) in the Gazebo simulator. Blue lines represent the feet trajectories.
REFERENCES

[1] M. Hutter, C. Gehring, D. Jud, A. Lauber, C. Bellicoso, V. Tsounis, J. Hwangbo, K. Bodie, P. Fankhauser, M. Bloesch, R. Diethelm, S. Bachmann, A. Melzer, M. Hoepflinger, ANYmal – a highly mobile and dynamic quadrupedal robot, Proc. IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 38–44, 2016
[2] D. Hyun, D. Jin, S. Seok, J. Lee, S. Kim, High speed trot running: Implementation of a hierarchical controller using proprioceptive impedance control on the MIT Cheetah, The International Journal of Robotics Research, Vol. 33(11), pp. 1417–1445, 2014
[3] D. Wooden, M. Malchano, K. Blankespoor, A. Howard, A. A. Rizzi, M. Raibert, Autonomous navigation for BigDog, IEEE International Conference on Robotics and Automation, pp. 4736–4741, 2010
[4] A. Hornung, K. Wurm, M. Bennewitz, C. Stachniss, W. Burgard, OctoMap: An efficient probabilistic 3D mapping framework based on octrees, Autonomous Robots, Vol. 34(3), pp. 189–206, 2013
[5] J. Saarinen, H. Andreasson, T. Stoyanov, A. J. Lilienthal, 3D normal distributions transform occupancy maps: An efficient representation for mapping in dynamic environments, International Journal of Robotics Research, Vol. 32(14), pp. 1627–1644, 2013
[6] D. Belter, P. Łabecki, P. Fankhauser, R. Siegwart, RGB-D terrain perception and dense mapping for legged robots, International Journal of Applied Mathematics and Computer Science, Vol. 26(1), pp. 81–97, 2016
[7] P. Fankhauser, M. Bloesch, C. Gehring, M. Hutter, R. Siegwart, Robot-Centric Elevation Mapping with Uncertainty Estimates, International Conference on Climbing and Walking Robots, pp. 433–440, 2014
[8] A. Krizhevsky, I. Sutskever, G. E. Hinton, ImageNet classification with deep convolutional neural networks, Proc. of the International Conference on Neural Information Processing Systems, pp. 1097–1105, 2012
[9] D. Belter, P. Skrzypczyński, Rough terrain mapping and classification for foothold selection in a walking robot, Journal of Field Robotics, Vol. 28(4), pp. 497–528, 2011
[10] M. Kopicki, R. Detry, M. Adjigble, R. Stolkin, A. Leonardis, J. L. Wyatt, One shot learning and generation of dexterous grasps for novel objects, International Journal of Robotics Research, Vol. 35(8), pp. 959–976, 2015
[11] J. Mahler, J. Liang, S. Niyaz, M. Laskey, R. Doan, X. Liu, J. Aparicio, K. Goldberg, Dex-Net 2.0: Deep Learning to Plan Robust Grasps with Synthetic Point Clouds and Analytic Grasp Metrics, Robotics: Science and Systems, 2017
[12] M. Gualtieri, A. ten Pas, K. Saenko, R. Platt, High precision grasp pose detection in dense clutter, IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 598–605, 2016
[13] E. Krotkov, R. Simmons, Perception, planning, and control for autonomous walking with the Ambler planetary rover, International Journal of Robotics Research, Vol. 15(2), pp. 155–180, 1996
[14] C.-H. Chen, V. Kumar, Motion planning of walking robots in environments with uncertainty, Proc. IEEE International Conference on Robotics and Automation, pp. 3277–3282, 2009
[15] M. A. Hoepflinger, M. Hutter, C. Gehring, M. Bloesch, R. Siegwart, Unsupervised identification and prediction of foothold robustness, IEEE International Conference on Robotics and Automation, pp. 3293–3298, 2013
[16] M. Focchi, R. Orsolino, M. Camurri, V. Barasuol, C. Mastalli, D. G. Caldwell, C. Semini, Heuristic Planning for Rough Terrain Locomotion in Presence of External Disturbances and Variable Perception Quality, arXiv:1805.10238, 2018
[17] V. Barasuol, M. Camurri, S. Bazeille, D. G. Caldwell, C. Semini, Reactive Trotting with Foot Placement Corrections through Visual Pattern Classification, IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5734–5741, 2015
[18] J. Rebula, P. Neuhaus, B. Bonnlander, M. Johnson, J. Pratt, A controller for the LittleDog quadruped walking on rough terrain, Proc. IEEE International Conference on Robotics and Automation, pp. 1467–1473, 2007
[19] M. Kalakrishnan, J. Buchli, P. Pastor, S. Schaal, Learning locomotion over rough terrain using terrain templates, Proc. IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 167–172, 2009
[20] J. Kolter, M. Rodgers, A. Ng, A control architecture for quadruped locomotion over rough terrain, Proc. IEEE International Conference on Robotics and Automation, pp. 811–818, 2008
[21] O. Villarreal, V. Barasuol, M. Camurri, M. Focchi, L. Franceschi, M. Pontil, D. G. Caldwell, C. Semini, Fast and Continuous Foothold Adaptation for Dynamic Locomotion through Convolutional Neural Networks, arXiv, 2019
[22] A. Roennau, T. Kerscher, M. Ziegenmeyer, J. M. Zöllner, R. Dillmann, Six-legged walking in rough terrain based on foot point planning, in: O. Tosun, M. Tokhi, G. Virk, H. L. Akin (Eds.), Mobile Robotics: Solutions and Challenges, World Scientific, pp. 591–598, 2009
[23] D. Belter, P. Łabecki, P. Skrzypczyński, Adaptive Motion Planning for Autonomous Rough Terrain Traversal with a Walking Robot, Journal of Field Robotics, Vol. 33(3), pp. 337–370, 2016
[24] J. Pan, S. Chitta, D. Manocha, FCL: A General Purpose Library for Collision and Proximity Queries, IEEE International Conference on Robotics and Automation, pp. 3859–3866, 2012
[25] E. Romera, J. M. Alvarez, L. M. Bergasa, R. Arroyo, ERFNet: Efficient Residual Factorized ConvNet for Real-Time Semantic Segmentation, IEEE Transactions on Intelligent Transportation Systems, Vol. 19(1), pp. 263–272, 2018
[26] K. He, X. Zhang, S. Ren, J. Sun, Deep Residual Learning for Image Recognition, IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778, 2016
[27] A. Paszke, A. Chaurasia, S. Kim, E. Culurciello, ENet: A Deep Neural Network Architecture for Real-Time Semantic Segmentation, arXiv:1606.02147, 2016
[28] D. P. Kingma, J. Ba, Adam: A Method for Stochastic Optimization, International Conference on Learning Representations, 2015
[29] P. Fankhauser, M. Bjelonic, C. D. Bellicoso, T. Miki, M. Hutter, Robust Rough-Terrain Locomotion with a Quadrupedal Robot, IEEE International Conference on Robotics and Automation, pp. 5761–5768, 2018