scirobotics.abk2822
Legged robots that can operate autonomously in remote and hazardous environments will greatly increase opportunities for exploration into underexplored areas. Exteroceptive perception is crucial for fast and energy-efficient locomotion: Perceiving the terrain before making contact with it enables planning and adaptation of the gait ahead of time to maintain speed and stability. However, using exteroceptive perception robustly for locomotion has remained a grand challenge in robotics. Snow, vegetation, and water visually appear as obstacles on which the robot cannot step or are missing altogether due to high reflectance. In addition, depth perception can degrade due to difficult lighting, dust, fog, reflective or transparent surfaces, sensor occlusion, and more. For this reason, the most robust and general solutions to legged locomotion to date rely solely on proprioception. This severely limits locomotion speed because the robot has to physically feel out the terrain before adapting its gait accordingly. Here, we present a robust and general solution to integrating exteroceptive and proprioceptive perception for legged locomotion. We leverage an attention-based recurrent encoder that integrates proprioceptive and exteroceptive input. The encoder is trained end to end and learns to seamlessly combine the different perception modalities without resorting to heuristics. The result is a legged locomotion controller with high robustness and speed.
Learning-based quadrupedal or bipedal locomotion for simulated characters has been achieved by using reinforcement learning (RL) (29–32), and realistic robot models were used in recent works (33). However, these works were only conducted in simulation. Recently, RL-based locomotion controllers have been successfully transferred to physical robots (3, 4, 34–40). Hwangbo et al. (3, 41) realized quadrupedal locomotion and recovery on flat ground with a physical robot by using learned actuator dynamics to facilitate simulation-to-reality (sim-to-real) transfer. Lee et al. (4) extended this approach and enabled rough-terrain locomotion by simulating challenging terrain in a privileged training setup with an adaptive curriculum. Peng et al. (35) used imitation learning to transfer animal motion to a legged robot. However, these methods do not use any visual information.

To add exteroceptive information to locomotion learning, Gangapurwala et al. (42) combined a learning-based foothold planner and a model-based whole-body motion controller to transfer policies to the real world in a laboratory setting. Their applications are limited to rigid terrain with mostly flat surfaces and are still constrained in their deployment range. Their performance is tightly bound to the quality of the map, which often becomes unreliable in the field. In both model-based and learning-based approaches, the assumption that exteroceptive perception is accurate and reliable often does not hold in the field.

Exteroception is encoded by leveraging a robot-centric elevation map. The elevation map serves as an abstraction layer between sensors and the locomotion controller, making our method independent of depth sensor choices. It works with no fine-tuning with different sensors, such as stereo cameras or LiDAR. Because the policy was trained to handle large noise, bias, and gaps in the elevation map, the robot can continue walking even when mapping fails or the sensors are physically broken.

The presented approach achieves substantial improvements over the state of the art (4) in locomotion speed and obstacle traversability while maintaining exceptional robustness. Our key contributions are a method for combining multimodal perception and a demonstration, through extensive hardware experiments, that the resulting control policy is robust against various exteroceptive failures. Handling exteroception failures has been a challenging problem in robotics. Our approach constitutes a general framework for robust deployment of complex autonomous machines in the wild.

RESULTS
Fast and robust locomotion in the wild
We deployed our controller in a wide variety of terrain, as shown in Movie 1. The robot was deployed to complete an hour-long hiking loop on the Etzel mountain in Switzerland. The hiking route was 2.2 km long, with an elevation gain of 120 m. Completing the trail required traversing steep inclinations, high steps, rocky surfaces, slippery ground, and tree roots (Fig. 2). As seen in Movie 2, ANYmal completed the entire hike without any failure, stopping only to fix a detached shoe and swap batteries. The robot was able to reach the summit in 31 min, which is faster than the expected human hiking duration indicated in the official signage (35 min as shown in Fig. 2), and finished the entire path in 78 min, virtually the same duration suggested by a hiking planner (76 min), which rates the hike “difficult” (47). The difficulty levels are chosen from “easy,” “moderate,” and “difficult,” calculated by combining the required fitness level, sport type, and the technical complexity (48).

During the hike, the controller faced various challenges. The ascending path reached inclinations of up to 38% with rocky and wet surfaces (Fig. 2, B and C). On the descent through a forest, tree roots presented further obstacles. In addition, because the sensors were only located on the robot itself, areas behind structures were occluded and not represented in the map, which was especially problematic during uphill walking (Fig. 3G).

Overall, our controller could handle all of these challenging conditions gracefully, without a single failure. The belief state estimator was trained to assess the reliability of exteroceptive information and made use of it to the extent possible. When exteroceptive information was incomplete, noisy, or misleading, the controller could always gracefully degrade to proprioceptive locomotion, which was shown to be robust (4). The controller thus achieves the best of both worlds: fast predictive locomotion when exteroceptive information is informative but a seamless fallback to robust proprioceptive control when it is not.

Movie 1. Wild ANYmal: Robust zero-shot perceptive locomotion.

Evaluating the contribution of exteroception
We conducted controlled experiments to quantitatively evaluate the contribution of exteroception. We compared our controller with a proprioceptive baseline (4) that does not use exteroception. First, we compared the success rate of overcoming fixed-height steps as shown in Fig. 4A. Wooden steps of various heights (from 12 to 36.5 cm) were placed ahead of the robot, which performed 10 trials for each height.
Evaluating robustness with belief state visualization
To examine how our controller integrates proprioception and exteroception, we conducted a number of controlled experiments. We tested with two types of obstacles that provide ambiguous or misleading exteroceptive input: an opaque foam obstacle that appears solid but cannot support a foothold and a solid but transparent obstacle. We placed each obstacle ahead of the robot and commanded the robot to walk forward at a constant velocity.

The sensors perceived the foam block as solid, and the robot consequently prepared to step on it but could not achieve a stable foothold due to the deformation of the foam. Figure 5A shows how the internal belief state (blue) was revised as the robot encountered the misleading obstacle: The controller initially trusted the exteroceptive input (red) but quickly revised its estimate of terrain height upon contact. Once the correct belief had been formed, it was retained even after the foot left the ground, showing that the controller retains past information due to its recurrent structure.

The transparent obstacle is a block made of clear acrylic plates that were not accurately perceived by the onboard sensors (Fig. 5B). The robot therefore walked as if it was on flat ground until it made contact with the step, at which point it revised its estimate of the terrain profile upward and changed its gait accordingly.

In the next experiment, we simulated complete exteroception failure by physically covering the sensors, thus making them fully uninformative (Fig. 5, C and D). The robot was commanded to walk
up and down two steps of stairs. With an unobstructed sensor, the
controller traversed the stairs gracefully, without any unintended
contact with the stair risers, adjusting its footholds and body pos
ture to step down the stairs softly. When the sensors were covered,
the map had no information, and the controller received random
noise as input. Under this condition, the robot made contact with
the riser of the first stair, which could not be perceived in advance,
revised its estimate of the terrain profile, adjusted its gait accordingly,
and successfully climbed the stairs. On the way down, the blinded
robot made a hard landing with its front feet but kept its balance
and stepped down softly with its hind legs.
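The robot’s behavior under covered sensors follows from how the exteroceptive input is constructed: heights are queried from the elevation map, and cells with no information are filled with random values, so the policy always receives a well-formed (if uninformative) input. Below is a minimal numpy sketch of that query step; the robot-centric grid layout, cell size, and fill range are illustrative assumptions, not the paper’s exact values.

```python
import numpy as np

def sample_heights(elevation_map, cell_size, query_xy, rng):
    """Query terrain heights at robot-frame (x, y) points from a 2.5D map.

    Cells with no map information (NaN) yield a random value instead, so
    the policy input stays well formed even when the sensors are covered.
    """
    h, w = elevation_map.shape
    samples = np.empty(len(query_xy))
    for k, (x, y) in enumerate(query_xy):
        i = int(x / cell_size) + h // 2  # robot sits at the map center
        j = int(y / cell_size) + w // 2
        if 0 <= i < h and 0 <= j < w and np.isfinite(elevation_map[i, j]):
            samples[k] = elevation_map[i, j]
        else:
            samples[k] = rng.uniform(-1.0, 1.0)  # assumed fill range
    return samples

rng = np.random.default_rng(0)
blind_map = np.full((64, 64), np.nan)  # covered sensors: no valid cells
queries = [(0.3, 0.0), (0.0, 0.3), (-0.3, 0.0)]
obs = sample_heights(blind_map, 0.04, queries, rng)  # pure random noise input
```

With a fully unknown map, `obs` is pure noise; with a populated map, the same call returns real terrain heights, so the policy sees one consistent interface in both cases.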
Last, we tested locomotion over an elevated slippery surface (Fig. 5E). After the robot stepped onto the slippery platform, it detected the low friction and adapted its behavior to step faster and keep its balance. The momentarily sliding feet violated the assumption of the kinematic pose estimator, which, in turn, destabilized the estimated elevation map and rendered exteroception uninformative during this time. The controller seamlessly fell back on proprioception until the estimated elevation map stabilized and exteroception became informative again.

Movie 2. Hiking at Etzel.

Fig. 3. Exteroceptive representation and challenges. Our locomotion controller perceives the environment through height samples (red dots) from an elevation map (A). The controller is robust to many perception challenges commonly encountered in the field: missing map information due to sensing failure (B, C, and G) and misleading map information due to nonrigid terrain (D and E) and pose estimation drift (F).

Fig. 5. Internal belief state inspection during perceptive failure using a learned belief decoder. Red dots indicate height samples given as input to the policy. Blue dots show the controller’s internal estimate of the terrain profile. (A) After stepping on a soft obstacle that cannot support a foothold, the policy correctly revises its estimate of the terrain profile downward. (B) A transparent obstacle is correctly incorporated into the terrain profile after contact is made. (C) With operational sensors, the robot swiftly and gracefully climbs the stairs, with no spurious contacts. (D) When the robot is blinded by covering the sensors, the policy can no longer anticipate the terrain but remains robust and successfully traverses the stairs. (E) When stepping onto a slippery platform, the policy identifies low friction and compensates for the induced pose estimation drift. The graph shows a decoded friction coefficient.

DISCUSSION
We have presented a fast and robust quadrupedal locomotion controller for challenging terrain. The controller seamlessly integrates exteroceptive and proprioceptive input. Exteroceptive perception enables the robot to traverse the environment quickly and gracefully by anticipating the terrain and adapting its gait accordingly before contact is made. When exteroceptive perception is misleading, incomplete, or missing altogether, the controller smoothly transitions to proprioceptive locomotion. The controller remains robust under all conditions, including when the robot is effectively blind. The integration of exteroceptive and proprioceptive inputs is learned end to end and does not require any hand-coded rules or heuristics. The result is a rough-terrain legged locomotion controller that combines the speed and grace of vision-based locomotion with the high robustness of proprioception.

The model is independent of the specific exteroceptive sensors. (We use LiDAR and stereo cameras in different deployments, with no retraining or fine-tuning.) However, the elevation map representation omits detail that may be present in the raw sensory input and may provide additional information concerning material and texture. Furthermore, our elevation map construction relies on a classical pose estimation module that is not trained jointly with the rest of the system. Appropriately folding the processing of raw sensory input into the network may further enhance the speed and robustness of the controller. In addition, an occlusion model could be learned, such that the policy understands that there is an occlusion behind a cliff and avoids stepping off it. Another limitation is the inability to complete locomotion tasks that would require maneuvers very different from normal walking, for example, recovering from a leg stuck in narrow holes or climbing onto high ledges.

MATERIALS AND METHODS
Overview
We train a neural network policy in simulation and then perform zero-shot sim-to-real transfer. Our method consists of three stages, as illustrated in Fig. 6.

First, a teacher policy is trained with RL to follow a random target velocity over randomly generated terrain with random disturbances. The policy has access to privileged information such as noiseless terrain measurements, ground friction, and the disturbances that were introduced.

In the second stage, a student policy is trained to reproduce the teacher policy’s actions without using this privileged information. The student policy constructs a belief state to capture unobserved information using a recurrent encoder and outputs an action based on this belief state. During training, we leverage two losses: a behavior cloning loss and a reconstruction loss. The behavior cloning loss aims to imitate the teacher policy. The reconstruction loss encourages the encoder to produce an informative internal representation.

Last, we transfer the learned student policy to the physical robot and deploy it in the real world with onboard sensors. The robot constructs an elevation map by integrating depth data from onboard sensors and samples height readings from the constructed elevation map to form the exteroceptive input to the policy. This exteroceptive input is combined with proprioceptive sensory data and is given to the neural network, which produces actuator commands.
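The second stage can be made concrete with a toy sketch of the two training losses. This is not the paper’s implementation: plain linear maps stand in for the teacher, the recurrent student encoder, and the belief decoder, and all dimensions, names, and the learning rate are illustrative assumptions; only the combination of a behavior cloning loss and a reconstruction loss follows the text.

```python
import numpy as np

rng = np.random.default_rng(0)

# Linear stand-ins for the actual networks (dimensions are assumptions).
W_teacher = rng.normal(size=(12, 32))  # teacher: privileged obs (32-D) -> action (12-D)
W_student = np.zeros((12, 24))         # student: non-privileged obs (24-D) -> action
W_decoder = np.zeros((8, 24))          # decoder: belief -> reconstructed privileged info

obs_priv = rng.normal(size=32)                         # noiseless, privileged observation
obs_noisy = obs_priv[:24] + 0.1 * rng.normal(size=24)  # what the student actually sees

def distillation_step(lr=1e-2):
    """One gradient step on the behavior cloning + reconstruction losses."""
    global W_student, W_decoder
    target = W_teacher @ obs_priv  # teacher action for the same state and command
    belief = obs_noisy             # placeholder belief state (identity "encoder")
    action = W_student @ belief
    recon = W_decoder @ belief
    bc_loss = np.mean((action - target) ** 2)        # imitate the teacher
    rec_loss = np.mean((recon - obs_priv[:8]) ** 2)  # reconstruct privileged info
    W_student -= lr * 2.0 / action.size * np.outer(action - target, belief)
    W_decoder -= lr * 2.0 / recon.size * np.outer(recon - obs_priv[:8], belief)
    return bc_loss, rec_loss

first = distillation_step()
for _ in range(200):
    last = distillation_step()
assert last[0] < first[0] and last[1] < first[1]  # both losses shrink
```

In the full method, training states are generated by rolling out the student policy itself, which this toy sketch (trained on a single fixed state) omits.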
The same reward is applied to the yaw command as well. We penalize the velocity component orthogonal to the desired velocity as well as the body velocity around roll, pitch, and yaw. In addition, we use shaping rewards for body orientation, joint torque, joint velocity, joint acceleration, and foot slippage as well as shank and knee collision. The body orientation reward was used to avoid strange postures of the body, the joint-related reward terms to avoid overly aggressive motion, and the foot slippage and collision reward terms to avoid slipping and collisions. We tuned the reward terms by observing the policy’s behavior in simulation; in addition to the traversal performance, we checked the smoothness of the locomotion. All reward terms are specified in section S7.

Curriculum
We use two curricula to ramp up the difficulty as the policy’s performance improves. One curriculum adjusts the terrain difficulty using an adaptive method (4), and the other changes elements such as reward or applied disturbances using a logistic function (3). For the terrain curriculum, a particle filter updates the terrain parameters such that they remain challenging but achievable at any point during policy training (4). The second curriculum multiplies the magnitude of domain randomization and some reward terms by a factor that increases logistically over the course of training.

The behavior cloning loss is defined as the squared distance between the student action and the teacher action given the same state and command. The reconstruction loss is the squared distance between the noiseless height samples and privileged information (o^e_t, s^p_t) and their reconstruction from the belief state. We generate samples by rolling out the student policy to increase robustness (60, 61).

Height sample randomization
During student training, we inject random noise into the height samples using a parameterized noise model n(õ^e_t | o^e_t, z), z ∈ ℝ^(8×4). We apply two different types of measurement noise when sampling the heights, as shown in Fig. 7A:
1) Shifting scan points laterally.
2) Perturbing the height values.
Each noise value is sampled from a Gaussian distribution, and the noise parameter z defines the variance. Both types of noise are applied in three different scopes, each with its own noise variance: per scan point, per foot, and per episode. The noise values per scan point and per foot are resampled at every time step, while the episodic noise remains constant for all scan points. In addition, we define three mapping conditions with associated noise parameters z to simulate changing map quality and error sources.
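The noise model n(õ^e_t | o^e_t, z) can be sketched in numpy. The structure (lateral shifts of the scan points plus additive height perturbations, each applied per scan point, per foot, and per episode) follows the text; the standard deviations, the number of scan points, and the flat `z` dict standing in for the paper’s z ∈ ℝ^(8×4) are illustrative assumptions.

```python
import numpy as np

def noisy_height_samples(sample_height, points, z, rng):
    """Apply the two noise types at three scopes to height samples.

    sample_height: lookup mapping (n, 2) xy positions to (n,) heights
    points: (n_feet, n_points, 2) nominal scan-point positions per foot
    z: dict of standard deviations per noise type and scope (assumed layout)
    Note: in training, the *_episode components would be drawn once per
    episode and held fixed; here they are redrawn per call for brevity.
    """
    n_feet, n_pts, _ = points.shape
    # 1) lateral shift of the scan points before the map lookup
    shift = (rng.normal(0.0, z["shift_point"], (n_feet, n_pts, 2))
             + rng.normal(0.0, z["shift_foot"], (n_feet, 1, 2))
             + rng.normal(0.0, z["shift_episode"], (1, 1, 2)))
    heights = sample_height((points + shift).reshape(-1, 2)).reshape(n_feet, n_pts)
    # 2) additive perturbation of the measured height values
    heights += (rng.normal(0.0, z["height_point"], (n_feet, n_pts))
                + rng.normal(0.0, z["height_foot"], (n_feet, 1))
                + rng.normal(0.0, z["height_episode"], (1, 1)))
    return heights

rng = np.random.default_rng(0)
flat_terrain = lambda xy: np.zeros(len(xy))  # trivial flat-ground lookup
points = np.zeros((4, 13, 2))                # e.g., 13 scan points per foot (assumed)
z = {"shift_point": 0.02, "shift_foot": 0.05, "shift_episode": 0.1,
     "height_point": 0.02, "height_foot": 0.05, "height_episode": 0.1}
h = noisy_height_samples(flat_terrain, points, z, rng)  # (4, 13) noisy samples
```

Sharing one draw across a scope is what makes the scopes meaningful: per-foot noise moves all of one foot’s samples together, mimicking a locally displaced map patch rather than independent sensor jitter.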
The same gate is used in the decoder, where it is used to reconstruct the privileged information and the height samples (Fig. 7D). This is used to calculate a reconstruction loss that encourages the belief state to capture veridical information about the environment. We use the GRU (62) as our RNN architecture. The evaluation of the effectiveness of the gate structure is presented in section S9.

Deployment
We deployed our controller on the ANYmal-C robot with two different sensor configurations, either using two Robosense Bpearl (67) dome LiDAR sensors or four Intel RealSense D435 depth cameras (68). We trained our policy in PyTorch (69) and deployed it on the robot zero-shot without any fine-tuning. We build a robot-centric 2.5D elevation map at 20 Hz by estimating the robot’s pose and registering the point-cloud readings from the sensors accordingly. The policy runs at 50 Hz and samples the heights from the latest elevation map, filling in a randomly sampled value if no map information is available at a query location. We developed an elevation mapping pipeline for fast terrain mapping on a graphics processing unit to parallelize point-cloud processing. We follow a similar approach to that used by Fankhauser et al. (17).

15. D. Belter, P. Skrzypczyński, Rough terrain mapping and classification for foothold selection in a walking robot, in 2010 IEEE Safety Security and Rescue Robotics, Bremen, Germany, 26 to 30 July 2010 (IEEE, 2010), pp. 1–6.
16. P. Fankhauser, M. Bloesch, C. Gehring, M. Hutter, R. Siegwart, Robot-centric elevation mapping with uncertainty estimates, in Mobile Service Robotics (World Scientific, 2014), pp. 433–440.
17. P. Fankhauser, M. Bloesch, M. Hutter, Probabilistic terrain mapping for mobile robots with uncertain localization. IEEE Robot. Autom. Lett. 3, 3019–3026 (2018).
18. M. Zucker, J. A. Bagnell, C. G. Atkeson, J. Kuffner, An optimization approach to rough terrain locomotion, in 2010 IEEE International Conference on Robotics and Automation (IEEE, 2010), pp. 3589–3595.
19. P. D. Neuhaus, J. E. Pratt, M. J. Johnson, Comprehensive summary of the Institute for Human and Machine Cognition’s experience with LittleDog. Int. J. Robot. Res. 30, 216–235 (2011).
20. J. Z. Kolter, Y. Kim, A. Y. Ng, Stereo vision and terrain modeling for quadruped robots, in 2009 IEEE International Conference on Robotics and Automation (IEEE, 2009), pp. 1557–1564.
21. I. Havoutis, J. Ortiz, S. Bazeille, V. Barasuol, C. Semini, D. G. Caldwell, Onboard perception-based trotting and crawling with the hydraulic quadruped robot (HyQ), in 2013 IEEE/RSJ International Conference on Intelligent Robots and Systems (IEEE, 2013), pp. 6052–6057.
22. C. Mastalli, M. Focchi, I. Havoutis, A. Radulescu, S. Calinon, J. Buchli, D. G. Caldwell, C. Semini, Trajectory and foothold optimization using low-dimensional models for rough terrain locomotion, in 2017 IEEE International Conference on Robotics and Automation (ICRA) (IEEE, 2017), pp. 1096–1103.
42. S. Gangapurwala, M. Geisert, R. Orsolino, M. Fallon, I. Havoutis, RLOC: Terrain-aware legged locomotion using reinforcement learning and optimal control. arXiv:2012.03094 (2020).
43. M. Focchi, R. Orsolino, M. Camurri, V. Barasuol, C. Mastalli, D. G. Caldwell, C. Semini, Heuristic planning for rough terrain locomotion in presence of external disturbances and variable perception quality, in Advances in Robotics Research: From Lab to Market (Springer, 2020), pp. 165–209.
44. Boston Dynamics, Spot user guide release 2.0 version A (2021); www.generationrobots.com/media/spot-boston-dynamics/spot-user-guide-r2.0-va.pdf [online; accessed June 2021].
45. D. Chen, B. Zhou, V. Koltun, P. Krähenbühl, Learning by cheating, in Conference on Robot Learning (PMLR, 2020), pp. 66–75.
46. M. Bloesch, M. Hutter, M. A. Hoepflinger, S. Leutenegger, C. Gehring, C. D. Remy, R. Siegwart, State estimation for legged robots-consistent fusion of leg kinematics and IMU. Robotics 17, 17–24 (2013).
47. Komoot, Etzel Kulm loop hike (2021); https://bit.ly/35bjfyE [online; accessed June 2021].
48. Komoot, Komoot help guides (2021); https://d21buns5ku92am.cloudfront.net/67683/documents/40488-Komoot [online; accessed December 2021].
49. R. C. Coulter, Implementation of the pure pursuit path tracking algorithm, Tech. rep., Carnegie Mellon University Robotics Institute (1992).
50. M. Tranzatto, F. Mascarich, L. Bernreiter, C. Godinho, M. Camurri, S. M. K. Khattak, T. Dang, V. Reijgwart, J. Loeje, D. Wisth, S. Zimmermann, H. Nguyen, M. Fehr, L. Solanka, R. Buchanan, M. Bjelonic, N. Khedekar, M. Valceschini, F. Jenelten, M. Dharmadhikari, T. Homberger, P. De Petris, L. Wellhausen, M. Kulkarni, T. Miki, S. Hirsch, M. Montenegro, C. Papachristos, F. Tresoldi, J. Carius, G. Valsecchi, J. Lee, K. Meyer, X. Wu, J. Nieto, A. Smith, M. Hutter, R. Y. Siegwart, M. Mueller, M. Fallon, K. Alexis, CERBERUS:
60. S. Ross, G. Gordon, J. D. Bagnell, A reduction of imitation learning and structured prediction to no-regret online learning, in Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics (JMLR Workshop and Conference Proceedings, 2011), pp. 627–635.
61. W. M. Czarnecki, R. Pascanu, S. Osindero, S. Jayakumar, G. Swirszcz, M. Jaderberg, Distilling policy distillation, in Proceedings of Machine Learning Research, K. Chaudhuri, M. Sugiyama, Eds. (PMLR, 2019), pp. 1331–1340.
62. K. Cho, B. Van Merriënboer, C. Gulcehre, D. Bahdanau, F. Bougares, H. Schwenk, Y. Bengio, Learning phrase representations using RNN encoder-decoder for statistical machine translation, in Conference on Empirical Methods in Natural Language Processing (EMNLP) (2014), pp. 1724–1734.
63. S. Hochreiter, J. Schmidhuber, Long short-term memory. Neural Comput. 9, 1735–1780 (1997).
64. T. Anzai, K. Takahashi, Deep gated multi-modal learning: In-hand object pose changes estimation using tactile and image data, in 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (IEEE, 2020), pp. 9361–9368.
65. J. Kim, J. Koh, Y. Kim, J. Choi, Y. Hwang, J. W. Choi, Robust deep multi-modal learning based on gated information fusion network, in Asian Conference on Computer Vision (Springer, 2019), pp. 90–106.
66. J. Arevalo, T. Solorio, M. Montes-y-Gómez, F. A. González, Gated multimodal units for information fusion, ICLR workshop (2017).
67. RS-Bpearl (April 2021); www.robosense.ai/en/rslidar/RS-Bpearl.
68. Intel RealSense (April 2021); www.intelrealsense.com/.
69. A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, A. Desmaison, A. Kopf, E. Yang, Z. DeVito, M. Raison, A. Tejani,
Science Robotics (ISSN ) is published by the American Association for the Advancement of Science, 1200 New York Avenue NW, Washington, DC 20005. The title Science Robotics is a registered trademark of AAAS.
Copyright © 2022 The Authors, some rights reserved; exclusive licensee American Association for the Advancement of Science. No claim to original U.S. Government Works.