Robust High-Speed Running For Quadruped Robots Via Deep Reinforcement Learning
Fig. 5: Motion snapshots of running over random rough terrain up to 0.04 m in height, while carrying a 6 kg load. The
quadruped is able to recover from feet catching on the box edges, shown here for the rear right foot.
TABLE III: Randomized physical parameters and their ranges.

Parameter                  Lower Bound               Upper Bound
Mass (each body link)      80%                       120%
Added mass                 0 kg                      15 kg
Added mass base offset     [-0.15, -0.05, -0.05] m   [0.15, 0.05, 0.05] m
Coefficient of friction    0.5                       1

TABLE IV: PPO Hyperparameters.

Parameter                                  Value
Horizon (T)                                4096
Optimizer                                  Adam
Learning rate                              1 · 10^-4
Number of epochs                           10
Minibatch size                             128
Discount (γ)                               0.99
GAE parameter (λ)                          0.95
Clipping parameter (ε)                     0.2
VF coeff. (c1)                             1
Number of hidden layers (all networks)     2
Number of hidden units per layer           [200, 100]
Nonlinearity                               tanh

perceptrons with two hidden layers of 200 and 100 neurons, respectively, with tanh activation. Other training hyperparameters are listed in Table IV.

Snapshots from executing the learned policy are shown over flat terrain in Figure 3, with a 10 kg load in Figure 4, over rough terrain with blocks up to 0.04 m high and a 6 kg load in Figure 5, and from transferring the policy to Gazebo in Figure 1, shown both with and without a 10 kg mass. The reader is encouraged to watch the supplementary video for clearer visualizations.

For our experiments, we are specifically interested in the following questions:

1. How does environmental noise affect the agent's ability to learn, as well as the resulting policies and gaits?
2. Which environmental noise parameters are most critical for a successful sim-to-sim transfer?
3. How does the choice of action space affect the sim-to-sim transfer?

1) Training curves: Figure 6 shows training curves for the environment described in Sec. III-D over flat terrain, as well as over randomly varying rough terrain. In the latter case, at each environment reset we generate 100 random boxes, up to 0.04 m high and up to 1 m wide, with random orientations. These are randomly distributed over a 20 × 6 m grid in front of the agent, and walls are added at ±3 m on the y axis so the agent cannot learn to avoid the rough terrain. Results show that training converges within a few million timesteps on both flat and rough terrain.

There is a classic trade-off between speed and robustness on the rough terrain: the agent needs to lift its feet higher to clear the terrain, which expends more energy and decreases forward velocity. We also note that the agent is still likely to catch its feet on the sides of the boxes, but it is able to recover in most cases, provided the added mass is not too large or the speed not too high. As a purely reactive controller can only do so much in such instances, in future work we plan to incorporate exteroceptive measurements such as vision to run over even rougher terrain.

2) Effects of Dynamics Randomization on Sim-to-Sim Transfer: To evaluate the necessity and usefulness of our dynamics randomization choices, we compare policies trained with systematically increasing environment noise parameters in PyBullet, and their performance at test time in a sim-to-sim transfer. Each trained policy is run in Gazebo for 10 trials, where a trial lasts for a maximum of 20 seconds.
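The per-reset dynamics randomization in Table III can be sketched as follows. This is a minimal illustration of the sampling step only; the function and dictionary key names are our own, not taken from the paper's codebase:

```python
import random

def sample_dynamics(nominal_link_masses):
    """Sample one set of randomized dynamics parameters per environment
    reset, following the ranges in Table III (illustrative sketch)."""
    # Scale each body link's mass uniformly within [80%, 120%] of nominal.
    link_masses = [m * random.uniform(0.8, 1.2) for m in nominal_link_masses]
    # Added payload mass between 0 and 15 kg.
    added_mass = random.uniform(0.0, 15.0)
    # Payload offset from the base origin, with per-axis bounds from Table III.
    lo = (-0.15, -0.05, -0.05)
    hi = (0.15, 0.05, 0.05)
    added_mass_offset = [random.uniform(l, h) for l, h in zip(lo, hi)]
    # Coefficient of friction between 0.5 and 1.
    friction = random.uniform(0.5, 1.0)
    return {
        "link_masses": link_masses,
        "added_mass": added_mass,
        "added_mass_offset": added_mass_offset,
        "friction": friction,
    }
```

In a PyBullet environment these sampled values would then be applied to the robot model (e.g. via `changeDynamics`) before the episode starts.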
Fig. 6: Episode reward mean while training in our proposed action space. We compare training over flat terrain only with training over randomly varying rough terrain, which consists of random boxes up to 0.04 m in height. (Axes: episode reward mean vs. number of training timesteps, in millions.)

TABLE VI: Sim-to-sim performance of 10 trials from policies learned in joint space with various joint PD gains. All joint gains can produce fast and stable (though unnatural-looking) locomotion in PyBullet, but do not transfer well to another simulator.

kp    kd    Mean Dist. (m)   Success Rate
30    0.1   59               0.9
50    0.1   20               0.2
20    0.5   11               0
30    0.5   24               0.1
40    0.5   56               0.7
50    0.5   37               0.9
60    0.5   26               0.7
70    0.5   15               0.4
50    1     36               0.4
60    1     24               0.3
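The gains in Table VI enter through the standard per-joint PD control law that converts a desired joint position into a torque. A minimal sketch, assuming the common form with zero desired joint velocity (the paper's exact implementation may differ):

```python
def joint_pd_torque(q_des, q, qdot, kp, kd):
    """Per-joint PD control law: tau = kp * (q_des - q) - kd * qdot.
    Small kd under-damps the joint; large kp amplifies tracking error
    into large torques, both of which can behave differently across
    simulators and hurt sim-to-sim transfer."""
    return kp * (q_des - q) - kd * qdot
```

For example, with kp = 30 and kd = 0.1 (the best-transferring row of Table VI), a 0.5 rad tracking error at zero joint velocity commands a 15 N·m torque.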
TABLE V: Sim-to-sim success rate for 10 trials from training with varying dynamics randomization. A successful trial represents running without falling for 20 seconds.

Dynamics Randomization             Mean Dist. (m)   Success Rate
None                               <2               0
Varying µ only                     5                0
Varying µ, added load              7                0
Varying µ, 10% mass                5                0
Varying µ, 20% mass                4                0
Varying µ, 10% mass, added load    14               0
Varying µ, 20% mass, added load    78.9             1
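The metrics reported in Tables V and VI can be aggregated from the raw trials as follows. This is an illustrative helper written for this description, not taken from the paper's codebase:

```python
def summarize_trials(trials, trial_length_s=20.0):
    """Aggregate sim-to-sim trials into (mean distance, success rate).
    A trial is a (distance_m, survived_s) pair; a success is a trial
    that runs the full 20 s without the robot falling."""
    n = len(trials)
    mean_dist = sum(dist for dist, _ in trials) / n
    success_rate = sum(1 for _, t in trials if t >= trial_length_s) / n
    return mean_dist, success_rate
```

For instance, a policy whose 10 trials all survive 20 s yields a success rate of 1, matching the last row of Table V.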