
Robotics

Science and Systems VIII

edited by Nicholas Roy, Paul Newman, and Siddhartha Srinivasa

The MIT Press


Cambridge, Massachusetts
London, England
© 2013 Massachusetts Institute of Technology

All rights reserved. No part of this book may be reproduced in any form by any electronic or mechanical means (including photocopying, recording, or information storage and retrieval) without permission in writing from the publisher.

For information about special quantity discounts, please email special [email protected] or
write to Special Sales Department, The MIT Press, 55 Hayward Street, Cambridge, MA 02142.

Printed and bound in the United States of America.

Library of Congress Cataloging-in-Publication Data

Robotics: Science and Systems Conference (8th : 2012 : Sydney)


Robotics : science and systems VIII / edited by Nicholas Roy, Paul Newman, and Siddhartha Srinivasa.
pages cm
Selected papers presented at Robotics: Science and Systems (RSS) 2012, held July 9-13 at the
University of Sydney, Sydney, NSW, Australia.
Includes bibliographical references and index.
ISBN 978-0-262-51968-7 (pbk. : alk. paper) 1. Robotics—Congresses. I. Roy, Nicholas. II.
Newman, Paul (Paul Michael). III. Srinivasa, Siddhartha. IV. Title.
TJ210.3.R6435 2012
629.8´92—dc23

2012049981

10 9 8 7 6 5 4 3 2 1
Contents

Preface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ix
Organizing Committee . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xi
Program Committee . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xiii
Workshop Evaluation Committee . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xvii
Sponsors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xix
Estimating Human Dynamics On-the-Fly Using Monocular Video For Pose Estimation
Priyanshu Agarwal, Suren Kumar, Julian Ryde, Jason J. Corso, and Venkat N. Krovi . . . . 1
Fast Weighted Exponential Product Rules for Robust General Multi-Robot Data Fusion
Nisar Ahmed, Jonathan Schoenberg, and Mark Campbell . . . . . . . . . . . . . . . . . . . 9
State Estimation for Legged Robots: Consistent Fusion of Leg Kinematics and IMU
Michael Bloesch, Marco Hutter, Mark A. Hoepflinger, Stefan Leutenegger, Christian Gehring,
C. David Remy, and Roland Siegwart . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
Extrinsic Calibration from Per-Sensor Egomotion
Jonathan Brookshire and Seth Teller . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
Colour-Consistent Structure-from-Motion Models Using Underwater Imagery
Mitch Bryson, Matthew Johnson-Roberson, Oscar Pizarro, and Stefan B. Williams . . . . . 33
M-Width: Stability and Accuracy of Haptic Rendering of Virtual Mass
Nick Colonnese and Allison Okamura . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
Contextual Sequence Prediction with Application to Control Library Optimization
Debadeepta Dey, Tian Yu Liu, Martial Hebert, and J. Andrew Bagnell . . . . . . . . . . . . 49
Physics-Based Grasp Planning Through Clutter
Mehmet R. Dogar, Kaijen Hsiao, Matei Ciocarlie, and Siddhartha S. Srinivasa . . . . . . . 57
FFT-Based Terrain Segmentation for Underwater Mapping
B. Douillard, N. Nourani-Vatani, M. Johnson-Roberson, S. Williams, C. Roman, O. Pizarro, I.
Vaughn, and G. Inglis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
Formalizing Assistive Teleoperation
Anca D. Dragan and Siddhartha S. Srinivasa . . . . . . . . . . . . . . . . . . . . . . . . . 73
Reducing Conservativeness in Safety Guarantees by Learning Disturbances Online: Iterated
Guaranteed Safe Online Learning
Jeremy H. Gillula and Claire J. Tomlin . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
What’s in the Bag: A Distributed Approach to 3D Shape Duplication with Modular Robots
Kyle W. Gilpin and Daniela Rus . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
v
An Object-Based Approach to Map Human Hand Synergies onto Robotic Hands with
Dissimilar Kinematics
G. Gioioso, G. Salvietti, M. Malvezzi, and D. Prattichizzo . . . . . . . . . . . . . . . . . . 97
What Types of Interactions Do Bio-Inspired Robot Swarms and Flocks Afford a Human?
Michael A. Goodrich, Brian Pendleton, Sean Kerman, and P. B. Sujit . . . . . . . . . . . . . 105
Real-Time Inverse Dynamics Learning for Musculoskeletal Robots Based on Echo State
Gaussian Process Regression
Christoph Hartmann, Joschka Boedecker, Oliver Obst, Shuhei Ikemoto, and Minoru Asada . 113
Recognition, Prediction, and Planning for Assisted Teleoperation of Freeform Tasks
Kris Hauser . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121
Hybrid Operational Space Control for Compliant Legged Systems
Marco Hutter, Mark A. Hoepflinger, Christian Gehring, Michael Bloesch, C. David Remy, and
Roland Siegwart . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
Modeling and Prediction of Pedestrian Behavior Based on the Sub-Goal Concept
Tetsushi Ikeda, Yoshihiro Chigodo, Daniel Rea, Francesco Zanlungo, Masahiro Shiomi, and
Takayuki Kanda . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137
Asymptotically-Optimal Path Planning on Manifolds
Léonard Jaillet and Josep M. Porta . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145
Minimal Coordinate Formulation of Contact Dynamics in Operational Space
Abhinandan Jain, Cory Crean, Calvin Kuo, Steven Myint, and Hubertus von Bremen . . . . 153
Nonparametric Bayesian Models for Unsupervised Scene Analysis and Reconstruction
Dominik Joho, Gian Diego Tipaldi, Nikolas Engelhard, Cyrill Stachniss, and Wolfram Burgard 161
Distributed Approximation of Joint Measurement Distributions Using Mixtures of Gaussians
Brian J. Julian, Stephen L. Smith, and Daniela Rus . . . . . . . . . . . . . . . . . . . . . . 169
Robust Object Grasping Using Force Compliant Motion Primitives
Moslem Kazemi, Jean-Sebastien Valois, J. Andrew Bagnell, and Nancy Pollard . . . . . . . 177
Multi-Stage Micro Rockets for Robotic Insects
Mirko Kovač, Maria Bendana, Rohit Krishnan, Jessica Burton, Michael Smith, and Robert J.
Wood . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 185
Feature-Based Prediction of Trajectories for Socially Compliant Navigation
Markus Kuderer, Henrik Kretzschmar, Christoph Sprunk, and Wolfram Burgard . . . . . . . 193
Variational Bayesian Optimization for Runtime Risk-Sensitive Control
Scott Kuindersma, Roderic Grupen, and Andrew Barto . . . . . . . . . . . . . . . . . . . . 201
Time-Optimal Trajectory Generation for Path Following with Bounded Acceleration and
Velocity
Tobias Kunz and Mike Stilman . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 209
Towards A Swarm of Agile Micro Quadrotors
Alex Kushleyev, Daniel Mellinger, and Vijay Kumar . . . . . . . . . . . . . . . . . . . . . . 217
CompAct™ Arm: a Compliant Manipulator with Intrinsic Variable Physical Damping
Matteo Laffranchi, Nikos G. Tsagarakis, and Darwin G. Caldwell . . . . . . . . . . . . . . 225
vi
Robust Loop Closing Over Time
Yasir Latif, Cesar Cadena, and José Neira . . . . . . . . . . . . . . . . . . . . . . . . . . . 233
Optimization-Based Estimator Design for Vision-Aided Inertial Navigation
Mingyang Li and Anastasios I. Mourikis . . . . . . . . . . . . . . . . . . . . . . . . . . . . 241
Practical Route Planning Under Delay Uncertainty: Stochastic Shortest Path Queries
Sejoon Lim, Christian Sommer, Evdokia Nikolova, and Daniela Rus . . . . . . . . . . . . . 249
A Distributable and Computation-Flexible Assignment Algorithm: From Local Task Swapping
to Global Optimality
Lantao Liu and Dylan A. Shell . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 257
The Banana Distribution Is Gaussian: A Localization Study with Exponential Coordinates
Andrew W. Long, Kevin C. Wolfe, Michael J. Mashner, and Gregory S. Chirikjian . . . . . . 265
Recognition and Pose Estimation of Rigid Transparent Objects with a Kinect Sensor
Ilya Lysenkov, Victor Eruhimov, and Gary Bradski . . . . . . . . . . . . . . . . . . . . . . 273
Towards Persistent Localization and Mapping with a Continuous Appearance-Based Topology
William Maddern, Michael Milford, and Gordon Wyeth . . . . . . . . . . . . . . . . . . . . 281
Robust Navigation Execution by Planning in Belief Space
Bhaskara Marthi . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 289
Visual Route Recognition with a Handful of Bits
Michael J. Milford . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 297
Exploiting Passive Dynamics with Variable Stiffness Actuation in Robot Brachiation
Jun Nakanishi and Sethu Vijayakumar . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 305
Inference on Networks of Mixtures for Robust Robot Mapping
Edwin Olson and Pratik Agarwal . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 313
Turning-Rate Selective Control: A New Method for Independent Control of Stress-Engineered
MEMS Microrobots
Igor Paprotny, Christopher G. Levey, Paul K. Wright, and Bruce R. Donald . . . . . . . . . 321
Affine Trajectory Deformation for Redundant Manipulators
Quang-Cuong Pham and Yoshihiko Nakamura . . . . . . . . . . . . . . . . . . . . . . . . 329
E-Graphs: Bootstrapping Planning with Experience Graphs
Mike Phillips, Benjamin Cohen, Sachin Chitta, and Maxim Likhachev . . . . . . . . . . . . 337
Walking and Running on Yielding and Fluidizing Ground
Feifei Qian, Tingnan Zhang, Chen Li, Pierangelo Masarati, Aaron M. Hoover, Paul Birkmeyer,
Andrew Pullin, Ronald S. Fearing, and Daniel I. Goldman . . . . . . . . . . . . . . . . . . 345
On Stochastic Optimal Control and Reinforcement Learning by Approximate Inference
Konrad Rawlik, Marc Toussaint, and Sethu Vijayakumar . . . . . . . . . . . . . . . . . . . 353
Failure Anticipation in Pursuit-Evasion
Cyril Robin and Simon Lacroix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 361
Tendon-Driven Variable Impedance Control Using Reinforcement Learning
Eric Rombokas, Mark Malhotra, Evangelos Theodorou, Emanuel Todorov, and Yoky Matsuoka 369
Guaranteeing High-Level Behaviors While Exploring Partially Known Maps
Shahar Sarid, Bingxin Xu, and Hadas Kress-Gazit . . . . . . . . . . . . . . . . . . . . . . 377
vii
Development of a Testbed for Robotic Neuromuscular Controllers
Alexander Schepelmann, Michael D. Taylor, and Hartmut Geyer . . . . . . . . . . . . . . . 385
Experiments with Balancing on Irregular Terrains Using the Dreamer Mobile Humanoid Robot
Luis Sentis, Josh Petersen, and Roland Philippsen . . . . . . . . . . . . . . . . . . . . . . . 393
Parsing Indoor Scenes Using RGB-D Imagery
Camillo J. Taylor and Anthony Cowley . . . . . . . . . . . . . . . . . . . . . . . . . . . . 401
Toward Information Theoretic Human-Robot Dialog
Stefanie Tellex, Pratiksha Thaker, Robin Deits, Dimitar Simeonov, Thomas Kollar, and Nicholas
Roy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 409
Efficiently Finding Optimal Winding-Constrained Loops in the Plane
Paul Vernaza, Venkatraman Narayanan, and Maxim Likhachev . . . . . . . . . . . . . . . . 417
On the Structure of Nonlinearities in Pose Graph SLAM
Heng Wang, Gibson Hu, Shoudong Huang, and Gamini Dissanayake . . . . . . . . . . . . 425
Probabilistic Modeling of Human Movements for Intention Inference
Zhikun Wang, Marc Peter Deisenroth, Heni Ben Amor, David Vogt, Bernhard Schölkopf, and
Jan Peters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 433
Optimization of Temporal Dynamics for Adaptive Human-Robot Interaction in Assembly
Manufacturing
Ronald Wilcox, Stefanos Nikolaidis, and Julie Shah . . . . . . . . . . . . . . . . . . . . . . 441
Optimal Control with Weighted Average Costs and Temporal Logic Specifications
Eric M. Wolff, Ufuk Topcu, and Richard M. Murray . . . . . . . . . . . . . . . . . . . . . . 449
Probabilistic Temporal Logic for Motion Planning with Resource Threshold Constraints
Chanyeol Yoo, Robert Fitch, and Salah Sukkarieh . . . . . . . . . . . . . . . . . . . . . . . 457
Hierarchical Motion Planning in Topological Representations
Dmitry Zarubin, Vladimir Ivan, Marc Toussaint, Taku Komura, and Sethu Vijayakumar . . . 465
Rigidity Maintenance Control for Multi-Robot Systems
Daniel Zelazo, Antonio Franchi, Frank Allgöwer, Heinrich H. Bülthoff, and Paolo Robuffo
Giordano . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 473

viii
Preface

This volume contains the 61 papers presented at Robotics: Science and Systems (RSS) 2012, held July 9-13 at the University of Sydney, Sydney, NSW, Australia. This year 177 papers were submitted, each of which was reviewed by four program committee members. Final decisions were made during an in-person area chair committee meeting, yielding an acceptance rate of 35%. Readers will recall that in RSS 2011, 45 papers were accepted from a field of 183.
Beyond doubt, RSS continues to represent the best work from across the whole spectrum of the robotics science endeavour, and as ever the diversity of papers submitted (and accepted) to RSS continues to astound; this volume bears ample witness to that. This, in conjunction with RSS's single-track format, furnishes the attendee and reader with a tour of what is judged by peer review to be new, exciting and sometimes challenging. This preface, then, is an appropriate place to thank you, the robotics community, for reviewing all the submitted papers and putting up with a stream of badgering emails from the area and program chairs. Without the hard work of reviewers, RSS would be greatly impoverished. Great thanks is due.
This year RSS took on a different format. In the hope that change is good, we tried to
integrate workshops, plenaries, early career spotlights, technical paper talks and e-poster
sessions. Every day began with a morning of workshops or tutorials. Here one could
find discussions and interactions covering the latest and best of robotics science. The
afternoon sessions began with a plenary from one of our five outstanding invited speakers.
They presented both visions of the future and revelations of the state of the art in fields
that one might not immediately associate with robotics but surely should or soon will:

• Dr. Anders Sandberg from the University of Oxford gave a talk titled “The Robot
and the Philosopher: Charting Progress at the Turing Centenary.”
• Prof. Mandyam Srinivasan FRS from the University of Queensland gave a talk titled
“Small Brains, Small Planes.”
• Prof. Michelle Simmons from the University of New South Wales gave a talk titled “Single Atom Devices for Quantum Computing.”
• Prof. Zoubin Ghahramani from Cambridge University gave a talk titled “Machine
Learning as Probabilistic Modeling.”
• Dr. Andrew Howard from SpaceX gave a talk titled “Enter the Dragon: The SpaceX COTS Missions.”¹

The afternoons proceeded with a mix of best paper finalist talks and early career spotlight talks, the latter given by three rising and recognised stars in the robotics community:

• Jan Peters from Technische Universitaet Darmstadt gave a talk titled “Towards Motor Skill Learning for Robotics.”
• Charlie Kemp from Georgia Institute of Technology gave a talk titled “Mobile Manipulation for Healthcare.”
¹ With perhaps the most astounding tales of engineering gusto heard for some time.
ix
• Cyrill Stachniss from the University of Freiburg gave a talk titled “Towards Lifelong
Navigation for Mobile Robots.”

A highlight every afternoon was the all-important rapid-fire, e-poster teaser talks punctuated and regulated by the RSS gong. This year, instead of having one gargantuan poster session, each day ended with a smaller, refreshment-enhanced e-poster session with around 15 presenters. We hoped that these sessions would allow authors and attendees to interact in a meaningful and two-way discussion without attendees being overwhelmed by 61 poster shows running in parallel. The jury is out, but the wine was good.
Of course RSS 2012 was only possible because of industrial and institutional sponsoring. Special thanks to NICTA for Platinum-level sponsorship, ONR Global and the Australian DSTO for Gold sponsorship, Google and NSW Trade and Investment for Silver, and Willow Garage, CSIRO, ARAA and Springer for Bronze sponsorship. Thanks also to Google for funding the best paper award and to Springer for funding the best student paper award. We would like to recognize our technical sponsors: the IEEE Robotics and Automation Society, the Association for the Advancement of Artificial Intelligence, the International Foundation of Robotics Research and the Robotics Society of Japan.
A huge thanks must go to the area chairs who once again brought panache, insight and professionalism to the paper reviewing process. There was a lot to do but it was worth it. They were: Pieter Abbeel (University of California, Berkeley), Tim Barfoot (University of Toronto), Han-Lim Choi (KAIST), Noah Cowan (Johns Hopkins University), Andrew Davison (Imperial College), Frank Dellaert (Georgia Institute of Technology), Ryan Eustice (University of Michigan), Seth Hutchinson (UIUC), Takayuki Kanda (ATR), George Kantor (Carnegie Mellon University), Hadas Kress-Gazit (Cornell University), Katherine Kuchenbecker (University of Pennsylvania), Ingmar Posner (University of Oxford), Cedric Pradalier (ETH Zurich), Luis Sentis (University of Texas at Austin), Siddhartha Srinivasa (Carnegie Mellon University), Cyrill Stachniss (University of Freiburg), Salah Sukkarieh (University of Sydney), Juan Tardos (University of Zaragoza), Sethu Vijayakumar (University of Edinburgh) and Gordon Wyeth (QUT).
We save the last and greatest thanks for Stefan Williams and Fabio Ramos for their
unflappable approach to local arrangements, to Dylan Shell for elegant and timely action
as Publicity Chair, to José Neira for building an outstanding program of workshops and
tutorials, to Siddhartha Srinivasa for being the final gatekeeper of quality as Publications
Chair, and finally, to Anca Dragan and Mehmet Dogar for dealing with the business of
getting the conference booklet and this tome together. If this conference was a success, it
was in no small part because of these folk.
Now please read on and enjoy what RSS 2012 brought to the community either here,
on these pages, or online (including color images) at:
http://www.roboticsproceedings.org/rss08/index.html

Nicholas Roy, Massachusetts Institute of Technology


Paul Newman, University of Oxford
Siddhartha Srinivasa, CMU
July 2012
x
Organizing Committee

General Chair Nicholas Roy, Massachusetts Institute of Technology


Program Chair Paul Newman, University of Oxford
Local Arrangement Co-Chairs Stefan B. Williams, University of Sydney
Fabio Ramos, University of Sydney
Workshops Chair José Neira, Universidad de Zaragoza
Publicity Chair Dylan Shell, Texas A&M
Publications Chair Siddhartha Srinivasa, Carnegie Mellon University
Area Chairs Pieter Abbeel, UC Berkeley
Tim Barfoot, University of Toronto
Han-Lim Choi, KAIST
Noah Cowan, Johns Hopkins University
Andrew Davison, Imperial College
Frank Dellaert, Georgia Institute of Technology
Ryan Eustice, University of Michigan
Seth Hutchinson, UIUC
Takayuki Kanda, ATR
George Kantor, Carnegie Mellon University
Hadas Kress-Gazit, Cornell University
Katherine Kuchenbecker, University of Pennsylvania
Ingmar Posner, Oxford University
Cédric Pradalier, ETH Zürich
Luis Sentis, University of Texas at Austin
Siddhartha Srinivasa, Carnegie Mellon University
Cyrill Stachniss, University of Freiburg
Salah Sukkarieh, University of Sydney
Juan Tardós, Universidad de Zaragoza
Sethu Vijayakumar, University of Edinburgh
Gordon Wyeth, QUT

xi
RSS Foundation Board
President Sebastian Thrun, Stanford University
Directors Oliver Brock, Technische Universität Berlin
Dieter Fox, University of Washington
Yoky Matsuoka, University of Washington
Stefan Schaal, University of Southern California
Gaurav Sukhatme, University of Southern California
Jeff Trinkle, Rensselaer Polytechnic Institute
Treasurer Wolfram Burgard, University of Freiburg
Secretary Nicholas Roy, Massachusetts Institute of Technology

xii
Program Committee

Abbott, Jake Chli, Margarita Gosselin, Clément


Adams, Martin Choi, Youngjin Gould, Stephen
Ahmed, Nisar Chong, NakYoung Grisetti, Giorgio
Akin, Emrah Choset, Howie Grizzle, Jessy
Andreasson, Henrik Christensen, Henrik Grocholsky, Ben
Asfour, Tamim Chung, Soon-Jo Grollman, Dan
Atkeson, Chris Chung, Timothy Gross, Roderich
Ayanian, Nora Civera, Javier Gutmann, Steffen
Bachrach, Abe Clark, Jonathan Ha, Quang
Bailey, Tim Clark, Chris Haddadin, Sami
Bajcsy, Ruzena Colas, Francis Hannaford, Blake
Balasubramanian, Ravi Comport, Andrew Hart, Stephen
Balkcom, Devin Conti, Francois Hauser, Kris
Beall, Chris Corke, Peter Haydar, Ali
Beer, Jenay Crandall, Jacob Henning, Philippe
Beetz, Michael Degani, Amir Hollerbach, John
Behnke, Sven Deisenroth, Marc Hollinger, Geoff
Bekris, Kostas Deng, Xinyan Hover, Franz
Belta, Calin Deshpande, Ashish How, Jonathan
Benenson, Rodrigo Doğar, Mehmet Howard, Matthew
Bennewitz, Maren Douillard, Bertrand Howard, Andrew
Berenson, Dmitry Dragan, Anca Howard, Ayanna
Bergbreiter, Sarah Duckett, Tom Hsiao, Kaijen
Bergerman, Marcel Dudek, Greg Hsieh, Ani
Bertuccelli, Luca Fairfield, Nathaniel Hsu, David
Bicchi, Antonio Fitch, Robert Huang, Haomiao
Billard, Aude Fraisse, Philippe Huang, Albert
Bosse, Mike Fraundorfer, Friedrich Hurst, Jonathan
Bretl, Tim Frazzoli, Emilio Hyun, Baro
Brock, Oliver Frew, Eric Ijspeert, Auke
Bryson, Mitch Frisoli, Antonio Ila, Viorela
Buchli, Jonas Fritz, Mario Imai, Michita
Burgard, Wolfram Full, Robert Indelman, Vadim
Calinon, Sylvain Furgale, Paul Jenkin, Michael
Carpin, Stefano Gassert, Roger Jensfelt, Patric
Castellanos, José Geyer, Hartmut Jo, Sungho
Çavuşoğlu, Cenk Geyer, Christopher Kaess, Michael
Censi, Andrea Giguere, Philippe Kallem, Vinutha
Chadwicke, Odest Gillespie, Brent Karaman, Sertac
Chaumette, Francois Glas, Dylan Katz, Dov
Chernova, Sonia Goldberg, Ken Kelly, Jonathan
Chirikjian, Greg Goldman, Dan Kelly, Alonzo
xiii
Kim, Jung Matthies, Larry Pomerleau, Francois
Kinsey, James Maye, Jerome Porta, Josep
Klein, Georg Mayol, Walterio Rajan, Kanna
Knepper, Ross McLurkin, James Ramamoorthy, Subramanian
Ko, SeongYoung Mei, Christopher
Kodagoda, Sarath Michael, Nathan Ramos, Fabio
Koditschek, Daniel Michaud, Francois Ranganathan, Ananth
Kolter, Zico Milford, Michael Rao, Rajesh
Kosecka, Jana Minato, Takashi Ratliff, Nathan
Köser, Kevin Minguez, Javier Ravela, Sai
Kroeger, Torsten Mistry, Michael Ren, Wei
Krüger, Norbert Mitra, Sayan Revzen, Shai
Kuemmerle, Rainer Moll, Mark Riviere, Cam
Kuhnlenz, Kolja Mombaur, Katja Rock, Steve
Kumar, Pawan Moon, Hyungpil Rogers, John
Kuwata, Yoshiaki Morgansen, Kristi Roman, Chris
Kyrki, Ville Morimoto, Jun Rusu, Radu
Lacroix, Simon Mountney, Peter Salichs, Miguel
Lai, Kevin Mourikis, Anastasios Saranlı, Uluç
Lamiraux, Florent Munich, Mario Sardellitti, Irene
Lane, David Myung, Hyun Sattar, Junaed
Lavalle, Steve Nakanishi, Jun Scaramuzza, Davide
Lee, Jusuk Napp, Nils Scerri, Paul
Lee, Joon-Yong Neira, José Schaal, Stefan
Leibe, Bastian Nelson, Brad Schwager, Mac
Leonard, John Neumann, Gerhard Secchi, Cristian
Leonardis, Ales Nuske, Stephen Shah, Julie
Levesque, Vincent Ny, Jerome Le Shibata, Tomohiro
Likhachev, Maxim O’Kane, Jason Shiller, Zvi
Lilienthal, Achim Oliver, Kai Shin, Jiwon
Little, Jim Olson, Edwin Shiomi, Masahiro
Liu, Dikai Oriolo, Giuseppe Sibley, Gabe
Lozano-Perez, Tomas Ott, Christian Simeon, Nicola
Lynch, Kevin Pacheco, Lluis Singh, Arjun
Maciver, Malcolm Pack, Leslie Smith, Christian
Manchester, Ian Padois, Vincent Smith, Stephen
Mansard, Nicolas Park, Jaeheung Smith, Ryan
Marchand, Eric Paz, Lina Sousa, João
Maria, Jose Peynot, Thierry Spaan, Matthijs
Marshall, Joshua Pezzementi, Zachary Spinello, Luciano
Marthi, Bhaskara Pfaff, Patrick Stilman, Mike
Martinelli, Agostino Philippsen, Roland Stoytchev, Alex
Martinez, Oskar Piater, Justus Strasdat, Hauke
Mason, Matt Piniés, Pedro Strobl, Klaus
Matthies, Larry Pizarro, Oscar Sturm, Juergen
xiv
Sukhatme, Gaurav Triebel, Rudolph Werfel, Justin
Takayama, Leila Ude, Ales Whitcomb, Louis
Tangorra, James Ueda, Jun Williams, Stephen
Tanner, Herbert Underwood, James Wongpiromsarn, Nok
Tapus, Adriana Upcroft, Ben Wood, Robert
Tedrake, Russ Valls, Jamie Yamane, Katsu
Teo, Rodney van den Berg, Jur Yguel, Manuel
Tews, Ashley Vaughan, Richard Yoshida, Eiichi
Tipaldi, Gian Vedaldi, Andrea Yoshikawa, Yuichiro
Todorov, Emanuel Venkadesan, Madhusudan Zavlanos, Michael
Toussaint, Marc Zhang, Fumin
Trawny, Nikolas Walter, Matt Zinn, Mike
Trevor, Alex Waslander, Steven

xv
Workshop Evaluation Committee

Balkcom, Devin Kragic, Danica Simeon, Nicola


Barfoot, Tim Kress-Gazit, Hadas Singh, Sanjiv
Castellanos, José Kumar, Vijay Stachniss, Cyrill
Dudek, Gregory Leonard, John Sukkarieh, Salah
Eustice, Ryan Morimoto, Jun Tardós, Juan
Grisetti, Giorgio Oh, Paul Tedrake, Russ
Howard, Ayanna Pineau, Joelle Thomaz, Andrea
Inamura, Tetsunari Posner, Ingmar Wood, Robert
Kanda, Takayuki Roumeliotis, Stergios Wyeth, Gordon
Kantor, George Saranlı, Uluç Xiao, Jing
Klavins, Eric Sentis, Luis Yamane, Katsu

xvii
Sponsors
The organizers of Robotics: Science and Systems 2012 gratefully acknowledge the following conference sponsors:

• Platinum Sponsors:

• Gold Sponsors:

• Silver Sponsors:

• Bronze Sponsors:

xix
• Sponsor of Student Travel Fellowships:

• Sponsor of the Best Paper Award:

• Sponsor of the Best Student Paper Award:

• Technical Sponsors:

• Organized by:

xx
Estimating Human Dynamics On-the-Fly Using
Monocular Video For Pose Estimation
Priyanshu Agarwal† , Suren Kumar† , Julian Ryde‡ , Jason J. Corso‡ and Venkat N. Krovi†
Mechanical and Aerospace Engineering Department†
Department of Computer Science and Engineering‡
State University of New York at Buffalo
Buffalo, New York 14260-1660
Email: {priyansh, surenkum, jryde, jcorso, vkrovi}@buffalo.edu

Abstract—Human pose estimation using uncalibrated monoc- style due to changes in speed, step-length, and mass leading to
ular visual inputs alone is a challenging problem for both the 3D person tracking with physically plausible poses. The Kneed
computer vision and robotics communities. From the robotics Walker extension, allowed for capture of subtle aspects of
perspective, the challenge here is one of pose estimation of
a multiply-articulated system of bodies using a single non- motion such as knee bend and torso lean, even when these are
specialized environmental sensor (the camera) and thereby, not strongly evident from the images [5]. Vondrak et al. [31]
creating low-order surrogate computational models for analysis employed a full body 3D simulation-based prior that explicitly
and control. incorporated motion control and dynamics into a Bayesian
In this work, we propose a technique for estimating the lower- filtering framework for human motion tracking. However, there
limb dynamics of a human solely based on captured behavior
using an uncalibrated monocular video camera. We leverage our is always one or more fundamental assumptions involved that
previously developed framework for human pose estimation to there is a priori knowledge about the physical properties (e.g.
(i) deduce the correct sequence of temporally coherent gap-filled mass, inertia, limb lengths, ground plane and/or collision ge-
pose estimates, (ii) estimate physical parameters, employing a ometries), the activity in the scene, calibrated camera, imagery
dynamics model incorporating the anthropometric constraints, from multiple cameras, availability of similar motion dataset
and (iii) filter out the optimized gap-filled pose estimates, using an
Unscented Kalman Filter (UKF) with the estimated dynamically- [24, 2, 3, 25, 16, 26, 32]. Furthermore, the obtained pose
equivalent human dynamics model. We test the framework on estimates are susceptible to the error in the physical parameters
videos from the publicly available DARPA Mind’s Eye Year 1 estimates of the system due to propagation of pose states via
corpus [8]. The combined estimation and filtering framework not a nonlinear dynamics model.
only results in more accurate physically plausible pose estimates, Brubaker et al. [4] proposed a framework for estimating
but also provides pose estimates for frames, where the original
human pose estimation framework failed to provide one. contact dynamics and internal torques using 3D pose estimates
obtained from two views (roughly sagittal and roughly frontal)
I. I NTRODUCTION of a subject using calibrated cameras, mass and inertial prop-
Estimating and tracking 3D pose of humans in uncon- erties determined by combining the body segment length esti-
strained environments using monocular vision poses several technical challenges due to high-dimensionality of human pose, self-occlusion, unconstrained motions, variability in human motion and appearance, observation ambiguities (left/right limb ambiguity), ambiguities due to camera viewpoint, motion blur and unconstrained lighting [13]. In the past, visual tracking algorithms have relied on kinematic prior models. While these exploit the motion constraints of articulated models, joint limits, and temporal smoothness for improved performance, they have proven insufficient to capture the nonlinearities of human motion [28, 27, 14]. More advanced activity-specific models learned from motion capture data have also been employed that yield approximately correct body configurations on datasets containing similar motions [9, 29, 30]. However, such models often lead to poses that are physically implausible (e.g. foot-skates, out-of-plane rotations) and face difficulty in generalization beyond the learned dataset. Brubaker et al. [6] introduced the notion of a physics-based biomechanical prior model characterizing lower-body dynamics. A key benefit is the natural accommodation of variations in

mated from the motion capture data with standard biomechanical data. However, well established system identification techniques fail to solve the problem of estimating physical human parameters solely using uncalibrated monocular imagery when only partial noisy pose estimates are available due to: (i) partial state knowledge (no pose velocity information); (ii) noise due to imaging inaccuracies; (iii) unknown and insufficient input excitation (both structure and frequency); (iv) low sampling frequency; and (v) inherent dynamics nonlinearity. Nevertheless, accurate and efficient kinematic and inertial parameter estimation of articulated multibody systems using noisy partial sensing has broader applications for enhanced telepresence [22], robot navigation [12], vision-based humanoid control, multi-robot cooperation [20] and imitation based robot control [21].

In this work, we build on our previous work [1] wherein a framework for human pose estimation in uncalibrated monocular videos is proposed to obtain the raw pose estimates of lower limbs. A Fast Fourier Transform (FFT) of the raw pose estimates provides information regarding the fundamental frequency

[Fig. 1 block diagram: Video Preprocessing (Silhouette Extraction, Person Detection by Tracking) → Optimization based 3D Heading/Pose Estimation → raw monocular video pose estimates → Frequency based Activity Inferencing and Parameter Estimation → Non-linear Least Squares Pose Sequence and Physical Parameter Estimation (dynamics model structure, no input/inertial parameter information; pose based) → pose sequence, parameter and input estimates → Pose filtering using Estimated Dynamically-Equivalent Model (Unscented Kalman Filter) → optimized pose sequence and dynamically equivalent model.]

Fig. 1: Overview of the overall system. The grayed out boxes represent previous work of the authors that has been used in this framework.
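As a rough illustration of how the stages in Fig. 1 compose, the sketch below chains placeholder stubs from video frames to filtered poses. Every function name and returned value here is a hypothetical stand-in, not the authors' implementation:

```python
# Hypothetical pipeline skeleton mirroring Fig. 1; each stage is a stub.

def extract_silhouettes(frames):
    # Video preprocessing: silhouette extraction + person tracking.
    return list(frames)

def raw_pose_estimates(silhouettes):
    # Optimization-based heading/pose estimation: two candidate hip-angle
    # estimates per frame (left-leg-forward vs right-leg-forward).
    return [((0.3, -0.3), (-0.3, 0.3)) for _ in silhouettes]

def phase1_frequency_fit(raw):
    # Frequency-based activity inference + NLS bootstrap of (m, h, kappa).
    return {"m": 70.0, "h": 1.7, "kappa": 20.0}  # placeholder values

def phase2_pose_fit(raw, params):
    # Pose-based NLS: per-frame candidate selection and gap filling.
    return [c[0] for c in raw], params

def ukf_filter(poses, params):
    # UKF smoothing with the dynamically-equivalent model.
    return poses

def run_pipeline(frames):
    sil = extract_silhouettes(frames)
    raw = raw_pose_estimates(sil)
    params = phase1_frequency_fit(raw)
    poses, params = phase2_pose_fit(raw, params)
    return ukf_filter(poses, params)

filtered = run_pipeline(range(5))
```

The value of this shape is that each stage can be replaced independently; the later sections fill in Phase I, Phase II, and the UKF.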

This fundamental frequency leads to inference about the current activity in the scene. We focus on inertial and kinematic parameter estimation for human lower-limbs from monocular video by setting up optimization subproblems employing the structure of the dynamics model. A simplified dynamics model, the Anthropomorphic Walker, is used along with the optimized anthropometric constraints tuned to generate stable gait, to identify the human physical parameters along with an optimized sequence of gap-filled pose estimates. Finally, an UKF is used to filter out the optimized gap-filled pose estimates using the continuous-discrete state-space model of the estimated dynamically-equivalent human dynamics model for walking.

II. SYSTEM OVERVIEW

Problem Statement: Given the noisy partial pose estimates (joint angles of hips) from an uncalibrated monocular video of a human, develop a dynamic (equivalent) model for human activity suitable for subsequent use to filter the pose estimates.

Fig. 1 provides an overview of the proposed system. A raw monocular video is processed to extract the human silhouette and track humans in the scene. The human pose estimation framework is then used to obtain the pose estimates. The lower limb pose information is used to extract the primary frequency component and to infer the activity in the scene. An anthropometrically constrained dynamic model with uncertain physical parameters is simulated to obtain a model-based frequency estimate. In Phase I, a non-linear least squares (NLS) estimator is used to estimate a subset of the parameters that minimizes the frequency disparity (between the measured and simulated frequencies). This is used to bootstrap a Phase II NLS estimator (with 6 uncertain parameters) to minimize the disparity between measured and simulated poses. These estimates are then used to obtain a dynamically equivalent system which is used as the process model for an UKF to obtain the filtered pose estimates.

III. HUMAN POSE ESTIMATION

The work on human pose estimation by the authors employs a model-based generative approach for the task of human pose estimation in unrestricted environments. Unlike many previous approaches, the framework is fully automatic, without any manual initialization for monocular video-based pose estimation. We do not use any camera calibration, prior motion (motion capture database), prior activity, appearance, body shape or size information about the scene. The generative model for estimating the heading direction of the subject in the video uses motion-based cues to constrain the pose search space. Thereafter, a sequential optimization framework estimates the remaining uncoupled pose states (camera parameters, body location, body joint angles) leveraging a combination of deterministic and probabilistic optimization approaches. The framework outputs two probable body pose estimates (human torso location (x, y, scaled z), right/left hip flexion angle, right/left knee flexion angle, human heading direction) for each frame in the video. Figs. 7(a) and 7(b) illustrate the two raw pose estimates obtained (right leg forward and left leg forward, respectively) for a few videos in the DARPA corpus [8]. However, determining the correct temporally coherent sequence of physically plausible pose estimates remained a challenge, which is addressed in this work for nearly periodic activities.

IV. HUMAN DYNAMICS MODEL

A. The Anthropomorphic Walker

The Anthropomorphic Walker, proposed and pioneered by McGeer [19], and later modified by Kuo [17] by introducing impulsive push-off combined with a torsional spring between the legs, permits the generation of a range of stable walking motions on level and slanted ground. We use this simplified dynamics model (Eqns. 1, 2) for modeling human walking.

Fig. 2: Parameters of the Anthropomorphic Walker. Please refer to [6] for details regarding the dynamics model.

TABLE I: Relative fractions and expressions obtained after optimizing the anthropometric relationships for various physical parameters. Relative fractions are expressed in terms of full body mass and length.

    Mass Relations
      Body Part           Notation   Relative fraction/Expression
      Torso/Upper body    m_t        0.678
      Thigh               m_th       0.1
      Shank               m_s        0.061
      Leg                 m_l        m_th + m_s
    Length Relations
      Upper body          l_t        0.47
      Thigh               l_th       0.23669
      Shank               l_s        0.24556
      Foot Radius         R_f        0.14466
    Center of Mass Relations
      Leg from foot end   C_l        0.553
    Radius of Gyration Relations
      Leg                 γ_l        0.326
      Torso               γ_t        0.496
    Inertia Relations
      Leg                 I_l        m_l × γ_l²
      Torso               I_t        m_t × γ_t²

    𝒯ᵀ M 𝒯 q̈ = f(t) + 𝒯ᵀ M (G − g)    (1)

where q = [φ1, φ2]ᵀ, T(q) is the kinematic transfer function, M is the mass matrix, 𝒯 is the Jacobian of T(q), f(t) is the generalized force vector, G is the gravitational acceleration vector, g is the convective acceleration, and γ is the ground slope angle, which is assumed to be zero for this work. The post-collision velocities are solved using

    𝒯₊ᵀ M 𝒯₊ q̇⁺ = 𝒯₊ᵀ (S + M 𝒯₋ q̇⁻)    (2)

where T⁺(q) is the kinematic transfer matrix immediately after the impact, 𝒯₊ is the Jacobian of T⁺(q), q̇⁻ and q̇⁺ are the pre- and post-collision velocities, respectively, and S is the impulse vector.

B. Incorporation of Empirical Anthropomorphic Constraints

In order to constrain the various physical parameters (lengths, mass, inertia) in the original model, we exploit the empirical anthropometric values provided in [7]. However, since the dynamics model is a simplified version of the actual human anthropometry, the values are not directly applicable. We set up an optimization problem (Eqn. 3) to estimate the optimal length relations such that minimum variation is obtained in step length across multiple steps (i.e. the gait is stable) by simulating the dynamics model for the current set of anthropometric relations. We initialize these relations with the corresponding relations in standard human anthropometry.

    min_{Θa} f(Θa) = Var(l_si)
    subject to: Θal^(j) ≤ Θa^(j) ≤ Θau^(j), l_si > 0, n_s > n_min    (3)
    Θa = [r_R, r_C, r_s, r_th, r_γl, r_γt]

where Var(·) stands for variance, and l_si, r_x, r_γx, n_s and n_min represent the step length of the ith step, the relative fraction of the parameter with respect to the total height of the human, the relative fraction of the radius of gyration with respect to the length of the body part, the number of steps, and the minimum number of foot steps required in simulation, respectively. We use Θal = [0.1, 0.6, 0.2, 0.2, 0.3, 0.4] and Θau = [0.2, 0.7, 0.3, 0.3, 0.4, 0.5] such that the standard relations are well within the bounds. In order to obtain global solutions in the multi-modal space, we use the standard genetic algorithm [11] to solve the optimization problem with termination criteria as tolerance on parameter values (0.001) and function value (0.01). The optimized length relationships along with other mass relationships are provided in Table I. The only unknowns for the model after incorporating these constraints are the total mass (m), total height (h), spring stiffness (κ) for the spring connecting the two legs, and the magnitude of the foot-ground collision impulse (ι) of the human. However, since the spring stiffness and the magnitude of the foot-ground collision impulse lead to the same effect in the dynamics, we treat the impulsive reaction force to be a constant. Please note that up to this point no pose information regarding the specific person in the scene has been exploited.

V. POSE AND PARAMETER ESTIMATION

Once a stable walking model is obtained, the parameters can now be optimized to generate the characteristic nearly periodic gait of the human in the scene using the raw pose estimates in a two stage approach.

A. Gait Frequency Based Estimation

Estimation of all parameters simultaneously based on pose alone leads to incorrect estimates at times due to aliasing.
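To make the bounded, simulation-in-the-loop search of Eqn. 3 concrete, the sketch below runs a minimal genetic algorithm (elitism, blend crossover, Gaussian mutation) over the six bounded ratios. The bounds and the variance-of-step-length objective shape are from the text, but the interior of the objective is a toy surrogate, since the real objective integrates the walker dynamics of Eqns. 1–2:

```python
import random

# Bounds on the six anthropometric ratios, from the text.
LO = [0.1, 0.6, 0.2, 0.2, 0.3, 0.4]
HI = [0.2, 0.7, 0.3, 0.3, 0.4, 0.5]

def step_length_variance(theta):
    # Toy surrogate for "simulate the walker, measure Var(step length)":
    # step lengths drift when two of the ratios are off-nominal.
    steps = [1.0 + (theta[0] - 0.12) * k + (theta[4] - 0.38) * (k % 2)
             for k in range(6)]
    mean = sum(steps) / len(steps)
    return sum((s - mean) ** 2 for s in steps) / len(steps)

def clip(x, lo, hi):
    return max(lo, min(hi, x))

def ga_minimize(obj, pop_size=30, gens=40, seed=0):
    rng = random.Random(seed)
    pop = [[rng.uniform(l, h) for l, h in zip(LO, HI)] for _ in range(pop_size)]
    for _ in range(gens):
        pop.sort(key=obj)
        new = pop[:2]                       # elitism: keep two best
        while len(new) < pop_size:
            a, b = rng.sample(pop[:10], 2)  # parents from the fittest pool
            w = rng.random()
            child = [clip(w * x + (1 - w) * y + rng.gauss(0, 0.01), l, h)
                     for x, y, l, h in zip(a, b, LO, HI)]
            new.append(child)
        pop = new
    return min(pop, key=obj)

best = ga_minimize(step_length_variance)
```

A real implementation would also enforce the l_si > 0 and n_s > n_min feasibility constraints (e.g. by penalizing simulations in which the walker falls before n_min steps).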

Fig. 3: Frequency estimation from raw pose estimates for a video in the DARPA corpus. (a) Time evolution of raw pose estimates, and (b) Single-sided amplitude spectrum of raw pose estimates.

Fig. 4: Optimized gap-filled and dynamics-simulation-based pose estimates for a video in the DARPA corpus. (a) Right hip angle, and (b) Left hip angle.
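The single-sided amplitude spectrum used in Fig. 3 can be reproduced with a plain DFT. The sketch below is illustrative only (a synthetic hip-angle trace stands in for real raw estimates); it finds the dominant frequency and applies the 2.0 Hz walking/running threshold reported in [10]:

```python
import cmath
import math

def dominant_frequency(signal, fps):
    """Return the peak frequency (Hz) of the single-sided DFT amplitude
    spectrum, ignoring the DC bin."""
    n = len(signal)
    amps = []
    for k in range(1, n // 2):            # single-sided, skip DC
        s = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(signal))
        amps.append((abs(s), k))
    _, k_peak = max(amps)
    return k_peak * fps / n

fps = 30.0
# Synthetic hip-angle trace: 1.0 Hz oscillation over 3 s (90 frames).
hip = [0.4 * math.sin(2 * math.pi * 1.0 * t / fps) for t in range(90)]
f_e = dominant_frequency(hip, fps)
activity = "walking" if f_e < 2.0 else "running"   # threshold from [10]
```

In practice one would use an FFT (O(n log n)) rather than this O(n²) DFT, and average the peak over the four hip-angle tracks as in Eqn. 4.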

Physical parameter estimation that first minimizes the difference between the simulated and estimated gait frequency reduces this error. Fig. 3 shows the raw pose estimates and the single-sided amplitude spectrum of the raw pose estimates obtained using the FFT. We take the average of the frequency estimates obtained using the right (f_rh1, f_rh2) and the left (f_lh1, f_lh2) hip raw pose estimates for both the raw estimates (Eqn. 4). The range in which the frequency of the raw pose estimates lies determines the periodic activity of the human in the scene. Fujiyoshi and Lipton [10] have shown that it is possible to classify motions into walking and running based on the frequency of the cyclic motion, and that a threshold of 2.0 Hz correctly classifies 97.5% of the target motions. In this work, we consider the case where the human activity is walking (f_e ≈ 1.17 Hz) to illustrate the concept. More general, complex dynamics models can be built to cater to multiple nearly periodic activities. In this Phase I NLS estimation problem (Eqn. 5), we seek to minimize the difference between the estimated frequency (f_e) and the frequency obtained using the dynamics simulation (f_s).

    f_e = (f_rh1 + f_lh1 + f_rh2 + f_lh2) / 4    (4)

    min_Θ f(Θ) = ‖f_s − f_e‖
    subject to: Θ_l ≤ Θ ≤ Θ_u, Θ = [m, h, κ]    (5)

We use Θ_l = [1, 0.1, 0.1] and Θ_u = [200, 2, 100] for the optimization problem to be generalizable.

B. Pose Based Estimation

In Phase II, we tune the full parameter set of the model to minimize the difference between the dynamics-simulation-based states and the estimated states (chosen based on proximity to the simulated state for the current frame). Initializing the dynamics model with the noisy raw pose states also results in unstable walking (due to noise in the estimation), so the initial states are also estimated during the parameter estimation. The objective is to minimize the difference between the dynamics-simulation-based states and the estimated raw states (Eqn. 6).

    min_Θ f(Θ) = min_i Σ_{j=1}^{N} ‖X_s − X_ei‖_j
    subject to: Θ_l ≤ Θ ≤ Θ_u,    (6)
    Θ = [m, h, κ, φ10, φ20, φ̇10, φ̇20]

where X = [φ1, φ2], X_s and X_ei, i = 1, 2, represent the simulated and the two estimated poses, respectively, and Θ represents the unknown parameters to be estimated. Furthermore, experiments showed that assuming φ20 = −φ10 resulted in a stable walking gait, which was otherwise difficult to obtain. We use Θ_l = [1, 0.1, 0.1, −π/2, −π/2, −3π/2, −3π/2] and Θ_u = [200, 2, 100, π/2, π/2, 3π/2, 3π/2]. We use the function-evaluation-based Nelder-Mead simplex method [18] (the absence of a closed form solution for the states makes derivative-based methods inapplicable) to solve both optimization problems, with termination criteria as tolerance on parameter values (0.01), function value (0.01) and maximum number of function evaluations (100000).

During the optimization for pose and physical parameters we obtain (i) the correct sequence of the pose estimates as well as (ii) the estimates for the physical parameters of the system, resulting in a dynamically-equivalent model for the human in the scene. Raw pose estimates for many frames are missing due to the inability of the pose estimation algorithm to provide reliable estimates for these frames because of: (i) improper silhouette segmentation due to insignificant lower limb movement, large shadows, high similarity in appearance of foreground and background, or merging of the human silhouette with other entities; (ii) partial occlusion of the human in the scene. We substitute the dynamic-simulation-based states for the states available from simulation (hip angles) and use linear interpolation for the remaining states. Fig. 4 shows the optimized gap-filled and dynamics-simulation-based pose estimates for the hip angles in a video.

In order to filter the optimized pose estimates using the estimated dynamically-equivalent model, we linearized the dynamics model. However, the linearization does not truly capture the non-linearity in the model, especially because a strong non-linearity is present in the system (due
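The per-frame bookkeeping described above — pick whichever of the two raw candidates is closer to the simulated state, substitute the simulated hip angles where raw estimates are missing, and linearly interpolate the remaining states — can be sketched as follows (an illustrative stand-in, not the authors' code):

```python
def assemble_pose_sequence(raw_pairs, simulated):
    """raw_pairs[t] is a (candidate1, candidate2) pair of hip angles, or
    None when the frame has no raw estimate; simulated[t] is the
    dynamics-based state. Returns the gap-filled sequence."""
    out = []
    for raw, sim in zip(raw_pairs, simulated):
        if raw is None:
            out.append(sim)                            # substitute simulation
        else:
            out.append(min(raw, key=lambda c: abs(c - sim)))
    return out

def linear_interpolate(values):
    """Fill None entries of a scalar track (e.g. knee angle, torso x)
    by linear interpolation between the nearest known neighbours."""
    vals = list(values)
    known = [i for i, v in enumerate(vals) if v is not None]
    for i, v in enumerate(vals):
        if v is None:
            left = max((j for j in known if j < i), default=None)
            right = min((j for j in known if j > i), default=None)
            if left is None:
                vals[i] = vals[right]
            elif right is None:
                vals[i] = vals[left]
            else:
                w = (i - left) / (right - left)
                vals[i] = (1 - w) * vals[left] + w * vals[right]
    return vals

hips = assemble_pose_sequence([(0.30, -0.28), None, (-0.1, 0.12)],
                              [0.29, 0.10, 0.11])
knees = linear_interpolate([0.0, None, None, 0.6])
```

Here frame 1 has no raw hip estimate, so the simulated value is used, and the two missing knee values are filled by interpolation between the neighbouring known frames.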

to impulsive collision of the foot with the ground). This leads to unstable walking, rendering the use of an Extended Kalman Filter infeasible. Hence, in lieu of computationally expensive particle filters, we explore the use of an UKF for filtering out the optimized states, using the dynamically equivalent model for the human in the scene.

VI. UNSCENTED KALMAN FILTERING

The original discrete-time Unscented Kalman filter [15] works satisfactorily when small time steps are considered. However, for the problem of human pose filtering the measurement update rate (Δt) is governed by the frame rate of the video being processed. This is typically poor even for high frame-rate cameras (∼100 Hz), especially given significant non-linearity in the dynamics. Furthermore, the presence of a discontinuity in the system dynamics requires a continuous-discrete Kalman filter to be employed, especially because the occurrence of the discontinuity significantly affects the stability of the system. The use of a continuous-time non-linear state-space model for the system ensures that the occurrence of the discontinuity is accurately captured in time, which is imperative for the stability of the system. In addition, the resulting change in the system model requires careful switching of the states and the entries in the associated state and error covariance matrices, as explained in Algorithm 1. We use the following continuous-discrete system model (Eqn. 7), derived from the dynamics model (Eqn. 1) for human walking, with the states switching roles (φ1, φ̇1 become φ2, φ̇2, respectively, and vice versa) when collision occurs:

    ẋ1(t) = φ̇1(t)
    ẋ2(t) = φ̇2(t)
    ẋ3(t) = f1(t, φ1(t), φ2(t), φ̇1(t), φ̇2(t)) + w1(t)
    ẋ4(t) = f2(t, φ1(t), φ2(t), φ̇1(t), φ̇2(t)) + w2(t)
    w(t) ∼ N(0, Q), w(t) = [w1(t) w2(t)]ᵀ

    y1(k) = φ1(k) + v1(k)
    y2(k) = φ2(k) + v2(k), v(k) ∼ N(0, R)    (7)

    E(w(t1)w(t2)ᵀ) = δ(t1 − t2)Q, E(v(i)v(j)ᵀ) = δ_ij R,
    E(v(k)w(t)ᵀ) = 0 ∀ i, j, v(k) = [v1(k) v2(k)]ᵀ

Algorithm 1 Pseudocode for the UKF implemented for a continuous-discrete system model.

Step 1. Initialization:
    x̂0 = [φ1, φ2, φ̇1, φ̇2]ᵀ_optimized, P_x0 = diag(0.01, 0.01, 0.5, 0.5),
    Q0 = diag(10, 10), R0 = diag(0.01, 0.01),
    P0^a = blkdiag(P_xk, Q_k, R_k)

Step 2. For k = 1, ..., N
  Step 2.1 Calculate sigma points {χ^a_{k,i} ∈ R^L | i = 0, ..., 2L}, where L = n_x(= 4) + n_w(= 2) + n_v(= 2):
    χ^a_{k−1} = [x̂^a_{k−1}, x̂^a_{k−1} + γ S_{x^a_{k−1}}, x̂^a_{k−1} − γ S_{x^a_{k−1}}],
    P^a_{k−1} = S_{x^a_{k−1}} (S_{x^a_{k−1}})ᵀ
  Step 2.2 Time-update equations for i = 0, ..., 2L:
    χ_{t|k−1,i} = f(t, χ^x_{k−1,i}, χ^w_{k−1,i}), t ∈ [0, Δt]
  Step 2.3 If collision occurred at t = t_c:
    φ̇⁻(t_c) = [χ_{t_c,i,3} χ_{t_c,i,4}]ᵀ
    φ̇⁺(t_c) = (𝒯₊ᵀ M 𝒯₊)⁻¹ 𝒯₊ᵀ (S + M 𝒯₋ φ̇⁻(t_c))
    Swap the states (stance leg becomes swing leg and vice versa):
    χ_{t_c|k−1,i} = [χ_{t_c,i,2} χ_{t_c,i,1} φ̇₂⁺(t_c) φ̇₁⁺(t_c)]ᵀ
    χ_{t|t_c,i} = f(t, χ^x_{t_c|k−1,i}, χ^w_{k−1,i}), t ∈ [t_c, Δt]
    χ_{k|k−1,i} = [χ_{Δt,i,2} χ_{Δt,i,1} χ_{Δt,i,4} χ_{Δt,i,3}]ᵀ
  Step 2.4 If no collision occurred:
    χ_{k|k−1,i} = [χ_{Δt,i,2} χ_{Δt,i,1} χ_{Δt,i,4} χ_{Δt,i,3}]ᵀ

    x̂⁻_k = Σ_{i=0}^{2L} w_i^m χ^x_{k|k−1,i}
    P⁻_{x_k} = Σ_{i=0}^{2L} w_i^c (χ^x_{k|k−1,i} − x̂⁻_k)(χ^x_{k|k−1,i} − x̂⁻_k)ᵀ
  Step 2.5 Measurement-update equations for i = 0, ..., 2L:
    χ^y_{k|k−1,i} = h(χ^x_{k−1,i}, χ^v_{k−1,i})
    ŷ⁻_k = Σ_{i=0}^{2L} w_i^m χ^y_{k|k−1,i}
    P_{ỹỹ} = Σ_{i=0}^{2L} w_i^c (χ^y_{k|k−1,i} − ŷ⁻_k)(χ^y_{k|k−1,i} − ŷ⁻_k)ᵀ
    P_{x̃ỹ} = Σ_{i=0}^{2L} w_i^c (χ^x_{k|k−1,i} − x̂⁻_k)(χ^y_{k|k−1,i} − ŷ⁻_k)ᵀ
  Step 2.6 If collision occurred, swap the entries in the state covariance matrices P⁻_{x_k}, P_{ỹỹ}, and P_{x̃ỹ} to accommodate the swapping of the states.
  Step 2.7 Kalman update equations:
    K_k = P_{x̃ỹ} P_{ỹỹ}⁻¹
    x̂_k = x̂⁻_k + K_k (y_k − ŷ⁻_k)
    P_{x_k} = P⁻_{x_k} − K_k P_{ỹỹ} K_kᵀ
End for

where {w_i} is a set of scalar weights given by:
    w_0^m = λ/(L + λ), w_0^c = λ/(L + λ) + (1 − α² + β),

    w_i^m = w_i^c = 1/(2(L + λ)), i = 1, ..., 2L,
    λ = α²(L + κ) − L, γ = √(L + λ)

It was observed that arbitrarily choosing the process and measurement covariances results in the state covariance matrix becoming negative definite due to round-off errors during the filtering process, in which case the filtering cannot continue. We provide more weight to the measurements and so choose large values for the process covariance.

Fig. 5: Results for the UKF filtered states for a video in the DARPA corpus (α = 2/L, β = 0, κ = L/2). (a) Right hip angle, and (b) Left hip angle.
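Two pieces of the filter above are easy to check in isolation: the sigma-point weights w_i^m must form a valid weighted average (they sum to one), and the continuous-time propagation must detect the impulsive discontinuity inside a step before applying the post-collision map. The toy below demonstrates both; the 1-D bouncing mass is only a stand-in for the foot-ground collision, and all numerical values are illustrative:

```python
def ut_weights(L, alpha, beta, kappa):
    """Scalar sigma-point weights: lam = alpha^2 (L + kappa) - L,
    w0m = lam/(L+lam), w0c = w0m + (1 - alpha^2 + beta),
    wi = 1/(2(L+lam)) for i = 1..2L."""
    lam = alpha ** 2 * (L + kappa) - L
    wm = [lam / (L + lam)] + [1.0 / (2 * (L + lam))] * (2 * L)
    wc = list(wm)
    wc[0] += 1.0 - alpha ** 2 + beta
    return wm, wc

def propagate_with_collision(x, v, dt, g=9.81, e=0.9, steps=2000):
    """Explicit-Euler integration of x'' = -g over [0, dt]; when x crosses
    zero, an impulsive post-collision map v <- -e*v is applied at the
    sub-step where the discontinuity is detected (cf. Step 2.3)."""
    h = dt / steps
    collided = False
    for _ in range(steps):
        x += h * v
        v -= h * g
        if x <= 0.0:            # discontinuity detected inside this step
            x = 0.0
            v = -e * v          # impulsive map, stands in for Eqn. 2
            collided = True
    return x, v, collided

wm, wc = ut_weights(8, 0.25, 0.0, 4.0)         # example: L = 8 augmented states
x1, v1, collided = propagate_with_collision(1.0, 0.0, 1.0)
```

A continuous-discrete UKF would propagate each sigma point this way between video frames, which is precisely why the collision time, and the associated state swap, must be located inside the propagation rather than at the measurement instants.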
VII. RESULTS
Fig. 5 shows the filtered pose estimates for the hip angles obtained using the UKF. As is evident, the higher frequency noise in the raw pose estimates is filtered out and the filtered estimates are temporally coherent. Fig. 7 shows the raw estimates corresponding to right/left leg forward, and the simulated, optimized and filtered estimates for the hip angles, rendered on the original video. Figs. 7(a) and 7(b) show the raw estimates, with the frames having missing pose estimates rendered with reduced color intensity. Please note that the limb identities are not maintained (the left leg becomes the right and vice versa) in these raw estimates. Fig. 7(c) shows that the hip angles generated using the estimated dynamics very closely explain the true hip angles, even for the frames where no raw pose estimates were available. Note that a subset of pose variables (left/right knee angles, torso location, and heading estimate) for the frames with missing pose estimates is obtained via linear interpolation.

Fig. 7(d) shows the optimized pose estimates obtained by choosing the correct raw pose estimates, where they were available, and substituting the dynamics-based pose estimates where the raw estimates were missing. Fig. 7(e) shows the pose estimates obtained after filtering the optimized pose estimates using an UKF with the estimated dynamics model as the process model. The filtered pose clearly shows that not only were the poor raw pose estimates replaced by better pose estimates, but the frames for which no pose estimates were present were also filled with very accurate pose estimates.

TABLE II: Error metric (L1 norm for 13 salient marker points in the image frame) for the raw and the filtered pose estimates for a video in the DARPA corpus. The original frame resolution is 1280 × 720 and the ground truth was obtained using manual annotation.

    Output     Average Error Per Marker Per Frame   Number of Frames Processed
    Raw        18.33 pixels                         52
    Filtered   16.54 pixels                         94

Table II presents the results obtained after filtering the optimized pose estimates for a video in the DARPA corpus. Note that the error metric [23] for the filtered pose should ideally not be compared with that of the raw estimates. This is because the frames where no pose estimates were available are simply filled with the dynamics estimates and interpolation, i.e. no evidence from the images is used. Nevertheless, the error metric considering the total number of frames is better than that of the raw estimates. Furthermore, the filtered pose estimates are now available for 94 frames as opposed to the 52 frames with noisy raw estimates.

Fig. 6 shows the error distribution across the 13 salient marker points considered for error metric evaluation for the same video. There is a significant reduction in the error corresponding to the left/right foot markers as there is no swapping of the limbs in the estimated pose. However, the estimates for the markers in general are not comparable as the number of frames considered is different for the two histograms.

VIII. CONCLUSION

In this work, we propose a technique for estimating the dynamics of a human, solely based on an uncalibrated monocular video. An UKF is used to filter out the optimized gap-filled pose estimates using the continuous-discrete state-space model of the estimated dynamically-equivalent human dynamics model. Results showed that the framework not only resulted in more accurate pose estimates, but also provided physically plausible pose estimates for frames where the original human pose estimation framework failed to provide one. In the future, we plan to build a multi-model adaptive Unscented Kalman filter with a more complex dynamics model (also simulating knee flexion, torso lean, and arms) that can cater to multiple human activities.

ACKNOWLEDGMENTS

The authors gratefully acknowledge the support from Defense Advanced Research Projects Agency Mind's Eye Program (W911NF-10-2-0062). The keen insight of the anonymous reviewers is also gratefully acknowledged.

REFERENCES

[1] P. Agarwal, S. Kumar, J. Ryde, J. Corso, and V. Krovi. An Optimization Based Framework for Human Pose Estimation in Monocular Videos. In International Symposium on Visual Computing, 2012.

Fig. 7: Human pose estimation results at various stages for videos in the DARPA corpus. (a) Raw estimates with left leg forward, (b) Raw estimates with right leg forward, (c) Optimized simulated dynamics, (d) Optimized estimates, and (e) Filtered estimates.
Fig. 6: Average error distribution for 13 markers for the obtained pose estimates for a video in the DARPA corpus. (a) Raw estimates averaged for both the outputs (with left leg forward, with right leg forward): average 18.33 pixels, median 18.55 pixels. (b) Filtered estimates: average 16.54 pixels, median 15.04 pixels.

[2] A.O. Balan and M.J. Black. An adaptive appearance model approach for model-based articulated object tracking. In CVPR, volume 1, pages 758–765, 2006.
[3] A.O. Balan, L. Sigal, M.J. Black, J.E. Davis, and H.W. Haussecker. Detailed human shape and pose from images. In CVPR, pages 1–8, 2007.
[4] M.A. Brubaker, L. Sigal, and D.J. Fleet. Estimating contact dynamics. In ICCV, pages 2389–2396, 2010.
[5] M.A. Brubaker and D.J. Fleet. The kneed walker for human pose tracking. In CVPR, pages 1–8, 2008.
[6] M.A. Brubaker, D.J. Fleet, and A. Hertzmann. Physics-based person tracking using simplified lower-body dynamics. In CVPR, pages 1–8, 2007.
[7] A. Bruderlin and T.W. Calvert. Goal-directed, dynamic animation of human walking. ACM SIGGRAPH Computer Graphics, 23(3):233–242, 1989.
[8] DARPA. "DARPA Mind's Eye Year 1 Videos", December 2011. www.visint.org.
[9] A. Elgammal and C.S. Lee. Inferring 3D body pose from silhouettes using activity manifold learning. In CVPR, volume 2, pages II-681, 2004.
[10] H. Fujiyoshi and A.J. Lipton. Real-time human motion analysis by image skeletonization. In IEEE Workshop on Applications of Computer Vision, pages 15–21, 1998.
[11] D.E. Goldberg. Genetic Algorithms in Search, Optimization, and Machine Learning. Addison-Wesley, 1989.
[12] O.K. Gupta and R.A. Jarvis. Robust pose estimation and tracking system for a mobile robot using a panoramic camera. In IEEE Conference on Robotics Automation and Mechatronics, pages 533–539, 2010.
[13] Y.W. Hen and R. Paramesran. Single camera 3D human pose estimation: A review of current techniques. In International Conference for Technical Postgraduates, pages 1–8, 2009.
[14] L. Herda, R. Urtasun, and P. Fua. Hierarchical implicit surface joint limits for human body tracking. Computer Vision and Image Understanding, 99(2):189–209, 2005.
[15] S.J. Julier and J.K. Uhlmann. Unscented filtering and nonlinear estimation. Proceedings of the IEEE, 92(3):401–422, 2004.
[16] H. Kjellstrom, D. Kragic, and M.J. Black. Tracking people interacting with objects. In CVPR, pages 747–754, 2010.
[17] A.D. Kuo. A simple model of bipedal walking predicts the preferred speed–step length relationship. Journal of Biomechanical Engineering, 123:264, 2001.
[18] J.C. Lagarias, J.A. Reeds, M.H. Wright, and P.E. Wright. Convergence properties of the Nelder-Mead simplex method in low dimensions. SIAM Journal on Optimization, 9:112–147, 1998.
[19] T. McGeer. Passive dynamic walking. The International Journal of Robotics Research, 9(2):62–82, 1990.
[20] M.B. Nogueira, A.A.D. Medeiros, P.J. Alsina, and N.R.N. Brazil. Pose estimation of a humanoid robot using images from a mobile external camera. In IFAC Workshop on Multivehicle Systems, 2006.
[21] J.P.B. Rubio, C. Zhou, and F.S. Hernández. Vision-based walking parameter estimation for biped locomotion imitation. Computational Intelligence and Bioinspired Systems, pages 677–684, 2005.
[22] M. Sarkis, K. Diepold, and K. Huper. Pose estimation of a moving humanoid using Gauss-Newton optimization on a manifold. In 7th IEEE-RAS International Conference on Humanoid Robots, pages 228–234, 2007.
[23] L. Sigal and M.J. Black. HumanEva: Synchronized video and motion capture dataset for evaluation of articulated human motion. Brown University TR, 120, 2006.
[24] L. Sigal and M.J. Black. Measure locally, reason globally: Occlusion-sensitive articulated pose estimation. In CVPR, volume 2, pages 2041–2048, 2006.
[25] L. Sigal, A. Balan, and M.J. Black. Combined discriminative and generative articulated pose and non-rigid shape estimation. Advances in Neural Information Processing Systems, 20:1337–1344, 2007.
[26] L. Sigal, M. Isard, H. Haussecker, and M.J. Black. Loose-limbed people: Estimating 3D human pose and motion using non-parametric belief propagation. IJCV, pages 1–34, 2011.
[27] C. Sminchisescu and A. Jepson. Variational mixture smoothing for non-linear dynamical systems. In CVPR, volume 2, pages II-608, 2004.
[28] C. Sminchisescu and B. Triggs. Kinematic jump processes for monocular 3D human tracking. In CVPR, volume 1, pages I-69, 2003.
[29] R. Urtasun, D.J. Fleet, A. Hertzmann, and P. Fua. Priors for people tracking from small training sets. In ICCV, volume 1, pages 403–410, 2005.
[30] R. Urtasun, D.J. Fleet, and P. Fua. 3D people tracking with Gaussian process dynamical models. In CVPR, volume 1, pages 238–245, 2006.
[31] M. Vondrak, L. Sigal, and O.C. Jenkins. Physical simulation for probabilistic motion tracking. In CVPR, pages 1–8, 2008.
[32] Y. Yang and D. Ramanan. Articulated pose estimation with flexible mixtures-of-parts. In CVPR, pages 1385–1392, 2011.

Fast Weighted Exponential Product Rules for
Robust General Multi-Robot Data Fusion
Nisar Ahmed* Jonathan Schoenberg* Mark Campbell
Autonomous Systems Laboratory, Cornell University
Ithaca, New York 14853
e-mail:[nra6,jrs55,mc288]@cornell.edu
*both authors contributed equally to this work

Abstract—This paper considers the distributed data fusion (DDF) problem for general multi-agent robotic sensor networks in applications such as 3D mapping and target search. In particular, this paper focuses on the use of conservative fusion via the weighted exponential product (WEP) rule to combat inconsistencies that arise from double-counting common information between fusion agents. WEP fusion is ideal for fusing arbitrarily distributed estimates in ad-hoc communication network topologies, but current WEP rule variants have limited applicability to general multi-robot DDF. To address these issues, new information-theoretic WEP metrics are presented along with novel optimization algorithms for efficiently performing DDF within a recursive Bayesian estimation framework. While the proposed WEP fusion methods are generalizable to arbitrary probability distribution functions (pdfs), emphasis is placed here on widely-used Bernoulli and Gaussian mixture pdfs. Experimental results for multi-robot 3D mapping and target search applications show the effectiveness of the proposed methods.

I. INTRODUCTION

The problem of fusing information from an ensemble of noisy data streams is critical to many existing and soon-to-be-realized robotic systems operating in uncertain dynamic environments. This is particularly true for distributed multi-robot systems requiring distributed perception for applications such as collaborative mapping for exploration, target search/tracking for surveillance, and futuristic unmanned urban transport systems. For sensor agents in a network to share information and perform distributed data fusion (DDF), it is most desirable to establish a scalable, flexible and robust network over which the robots can transmit and receive information. An ad-hoc, arbitrarily connected network provides scalability for fusion agents to join and drop off the network, flexibility to allow agents to join at any point, and robustness by ensuring that multiple links or agents must fail before the network becomes unconnected [5].

Implementation of DDF for general robot sensor networks thus requires conservative data fusion techniques to maintain estimates that avoid inconsistencies due to rumor propagation [3]. A common conservative fusion rule for estimates with Gaussian probability distribution functions (pdfs) with unknown correlation is Covariance Intersection (CI) [11]. This rule is appropriate for certain types of problems, but it is inadequate for handling non-Gaussian distributions that arise in applications such as target search and 3D mapping. A suitable conservative fusion rule for arbitrary pdfs with unknown correlation is the weighted exponential product (WEP) [1]. The WEP is a generalization of CI to non-Gaussian distributions [9], and different cost metrics to determine an optimal WEP fusion weight have been proposed [3, 9]. However, these existing WEP fusion approaches have drawbacks that can limit their usefulness in robotic DDF, including the theoretical nature of their cost metrics and difficult implementation for arbitrary pdfs.

This paper makes the following contributions: (1) it proposes novel information-theoretic metrics for performing WEP fusion that address these issues and are suitable for fusing arbitrary state estimates shared via ad-hoc network communication topologies in a wide range of robotic DDF applications; (2) it presents new formally consistent algorithms for online implementation of our proposed WEP fusion metrics that can be used to quickly and robustly combine information in a recursive Bayesian estimation framework, with emphasis on the Bernoulli and Gaussian mixture distribution functions used widely in robotics; (3) it demonstrates the effectiveness of our proposed WEP fusion methods for performing online collaborative 3D mapping and 2D target search with multi-robot networks in loopy communication topologies.

A. DDF Preliminaries

Formally, let x_k ∈ R^{n_x} be an n_x-dimensional state to be estimated at discrete time steps k = 0, 1, ... by N independent robotic sensor agents. Assume that each robot i ∈ {1, ..., N} obtains n_y local sensor measurements of x_k in the vector y_i^k with likelihood function p_i(y_i^k | x_k). Let p_i(x_k | Z_i^k) be the local posterior pdf for robot i given the set of all information Z_i^k = {Z_i^{k−1}, y_i^k} available to i before new information Z_j^k arrives from robot j ≠ i, such that local fusion of y_i^k is given by the posterior pdf from Bayes' rule,

    p_i(x_k | Z_i^k) ∝ p_i(x_k | Z_i^{k−1}) p_i(y_i^k | x_k).    (1)

For generality, the robot networks considered here are assumed to have arbitrary dynamic node-to-node communication topologies, such that: (i) N may vary; (ii) each robot is only aware of the nodes it is connected to; (iii) no robot ever knows the complete global network topology; (iv) no robot knows the receipt status of messages it has sent.

Centralized fusion or raw data sharing can fully recover the new information in ∪_{i=1}^{N} Z_i^k, but such methods scale poorly with

9
N and are vulnerable to node failures. DDF overcomes such bottlenecks and can recover the centralized Bayesian fusion result for a given pair of robots (i, j) [8], since

    p(x_k | Z_i^k ∪ Z_j^k) ∝ p_i(x_k | Z_i^k) p_j(x_k | Z_j^k) / p(x_k | Z_i^k ∩ Z_j^k),   (2)

where the denominator is the common information pdf for the local pdf p_i(x_k | Z_i^k) ∝ p(x_k | Z_i^k ∩ Z_j^k) · p(x_k | Z_{i\j}^k), where Z_{i\j}^k is the exclusive local information set for node i, and similarly for node j. Hence, the common information pdf must be explicitly known for DDF to optimally extract only the new information in Z_i^k ∪ Z_j^k. Note that if division by p(x_k | Z_i^k ∩ Z_j^k) were ignored in (2), the resulting 'Naive Bayes' fusion rule would always double-count p(x_k | Z_i^k ∩ Z_j^k) and induce inconsistencies via rumor propagation. However, tracking p(x_k | Z_i^k ∩ Z_j^k) in an arbitrary ad-hoc network requires pedigree-sharing about each distribution to fuse; this becomes computationally prohibitive for non-tree communication topologies with loops/cycles [3, 8]. The generality of assumptions (i)-(iv) above also leads to difficulties for implementing DDF via graphical model techniques.

An alternative method for consistent DDF considered here is the conservative weighted exponential product (WEP) fusion rule,

    p_WEP(x_k | Z_i^k ∪ Z_j^k) = (1/c) p_i^ω(x_k | Z_i^k) · p_j^{1−ω}(x_k | Z_j^k) ≡ p_WEP(x_k),   (3)

where ω ∈ [0, 1] and c is a normalizing constant. The WEP rule has two properties that make it well-suited to general multi-robot DDF problems: (1) it is applicable to arbitrary non-Gaussian pdfs, and (2) for any ω ∈ [0, 1], it is guaranteed to avoid double-counting of p(x_k | Z_i^k ∩ Z_j^k) [1]. However, this also means that some exclusive information from i and j is always discarded. Hence, the WEP fusion weight ω must be picked carefully to ensure useful information exchange between robots i and j.

B. WEP Fusion Metrics

Ref. [9] considers two popular cost metrics on (3) for optimal selection of ω ∈ [0, 1]. The first cost is the Shannon entropy of p_WEP(x_k), which is motivated by the idea that the absolute uncertainty of (3) should be minimized. Ref. [9] shows that the resulting Shannon fusion rule [3] is a direct non-Gaussian generalization of the Gaussian CI rule. The second cost selects the ω = ω* corresponding to the Chernoff Information between p_i(x_k | Z_i^k) and p_j(x_k | Z_j^k) [4], and is thus called Chernoff fusion [6, 10]. It can be shown that this equates the Kullback-Leibler divergences (KLDs) between (3) and the individual robot posteriors, which guarantees that the Chernoff fusion result is 'half-way' between the original local i and j posteriors in an information gain/loss sense. Both the Shannon and Chernoff fusion rules provide nice information-theoretic reasons for choosing ω, but each rule has potential limitations in the context of general multi-robot DDF, as explained in Section II.

C. WEP Fusion Algorithms

General WEP fusion can present challenging implementation issues when dealing with arbitrary non-Gaussian pdfs. For instance, both Shannon and Chernoff fusion require minimization of integral costs that are in general analytically intractable, making WEP fusion potentially unsuitable for online robotic DDF problems. Furthermore, (3) is generally not available in closed form. Heuristic WEP fusion approximations have been proposed to address these issues [6, 10], but these have no formal guarantees of either correctly minimizing the desired WEP cost or accurately approximating (3). Such heuristic approximations can thus lose information that should be preserved between robots, or introduce spurious information.

II. NEW INFORMATION-BASED WEP METRICS

A. Limitations of Existing Metrics

Shannon fusion is not always justified from a general Bayesian estimation perspective, since acquisition of new information can, in many cases, increase the entropy of the exact/centralized Bayesian posterior for non-Gaussian pdfs [1]. Chernoff fusion ignores the possibility that an imbalance in information gain may exist between robots i and j prior to fusion. To address such scenarios, ref. [9] suggests introducing weighting factors into Chernoff fusion that are equal to the number of measurements taken by each robot, in order to bias ω towards the better informed pdf. However, the number of measurements is not a complete reflection of information gain for each robot, as local differences in sensor quality/characteristics and information obtained from other neighbors in the robot network must also be taken into account.

B. Generalized Information Weighted Chernoff Fusion

The first new fusion rule modifies the measurement weighted Chernoff fusion [9] to apply weighting factors c_i, c_j ≥ 0 that reflect actual information gain by each robot i and j, instead of the number of measurements. This biases ω towards the constituent pdf with greater information content without explicitly counting measurements, so that ω satisfies

    c_i · D_KL[ p_WEP(x_k) || p_i(x_k | Z_i^k) ] = c_j · D_KL[ p_WEP(x_k) || p_j(x_k | Z_j^k) ],   (4)

where D_KL[·||·] is the KLD. For problems where the y_i^k are always guaranteed to reduce uncertainty (e.g. static linear Gaussian estimation), c_i and c_j can be conveniently estimated as the reciprocals of the entropies of the constituent distributions,

    c_i = 1 / H[p_i(x_k | Z_i^k)],   c_j = 1 / H[p_j(x_k | Z_j^k)],   (5)

where H[p(x)] is the Shannon entropy of p(x). This leads to an Entropy Weighted Chernoff Fusion solution that uses a weight ω* ∈ [0, 1] to equate the entropy weighted KLDs between the WEP and original distributions. This results in a bias towards the distribution with the lower entropy along the chord that connects the two distributions according to the KLDs.
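As a concrete illustration of the balance condition (4)-(5) (a sketch, not the authors' implementation), the weight can be found by simple bisection when both posteriors are one-dimensional Gaussians, since the WEP of two Gaussians is again Gaussian and both the KLD and the differential entropy have closed forms. The sketch assumes both variances are large enough that the differential entropies are positive, and that the balance residual crosses zero exactly once on [0, 1]; all function names are illustrative.

```python
import math

def gauss_kl(m0, v0, m1, v1):
    # D_KL[N(m0, v0) || N(m1, v1)] for 1D Gaussians (means m, variances v)
    return 0.5 * (v0 / v1 + (m0 - m1) ** 2 / v1 - 1.0 + math.log(v1 / v0))

def wep_gauss(mi, vi, mj, vj, w):
    # WEP of two Gaussians is Gaussian: precisions and precision-weighted
    # means combine, as in covariance intersection
    prec = w / vi + (1.0 - w) / vj
    mean = (w * mi / vi + (1.0 - w) * mj / vj) / prec
    return mean, 1.0 / prec

def entropy_weighted_chernoff(mi, vi, mj, vj, tol=1e-10):
    # c_i, c_j are reciprocal differential entropies, eq. (5);
    # assumes the entropies are positive (variance > 1/(2*pi*e))
    ci = 1.0 / (0.5 * math.log(2.0 * math.pi * math.e * vi))
    cj = 1.0 / (0.5 * math.log(2.0 * math.pi * math.e * vj))
    def balance(w):
        # residual of eq. (4); positive near w = 0, negative near w = 1
        m, v = wep_gauss(mi, vi, mj, vj, w)
        return ci * gauss_kl(m, v, mi, vi) - cj * gauss_kl(m, v, mj, vj)
    lo, hi = 0.0, 1.0
    while hi - lo > tol:  # bisection, assuming a single sign change
        mid = 0.5 * (lo + hi)
        if balance(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

With equal variances the two entropies match and the result reduces to the unweighted Chernoff weight, which is 0.5 by symmetry.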
However, as discussed in Section II-A, the Shannon entropy is not always a suitable indicator of information gain. It is also unsuitable for fusing distributions over continuous states x_k, since the differential Shannon entropy of probability density functions can be negative (whereas c_i, c_j should be non-negative). One possible alternative is to select c_i and c_j to be the information gained in the KLD sense with respect to a common reference distribution p_r(x_k) for robots i and j. If a common prior pdf p(x_0) is available to all agents at k = 0 for initializing Bayesian estimation, then this forms a convenient choice for p_r(x_k), so that

    c_i = D_KL[ p_i(x_k | Z_i^k) || p(x_0) ],   (6)

and similarly for c_j. These weights reflect the total information gain of each agent over time with respect to p(x_0) via the combined effects of local Bayesian fusion and DDF. Note that a fixed reference distribution must be used with the KLD, since the KLD is not a true distance metric and does not obey the triangle inequality. The KLD could be replaced by another divergence that is a true metric (e.g. the Hellinger divergence [2] or the symmetric KLD), which would remove the need for a common prior and permit c_i, c_j to be computed incrementally, but that is not demonstrated here.

C. Minimum Information Loss Weight Fusion

A second cost is now developed to minimize an approximation of the information lost as a result of WEP fusion. Eq. (3) discounts exclusive information at i and j to ensure that the unknown common information between the two is counted only once. Therefore, there is an information loss between the optimal fusion pdf (2) and the WEP fusion pdf (3), which is defined as

    I_LOSS ≜ D_KL[ p_OPTIMAL(x | Z_i^K ∪ Z_j^K) || p_WEP(x | Z_i^K ∪ Z_j^K) ],   (7)

which cannot be computed without knowing the optimally fused distribution. However, note that if i and j are known to have no information in common, then the optimal fusion rule simplifies to Naive Bayes (NB) fusion,

    p_NB(x | Z_i^K ∪ Z_j^K) = (1/c) p(x | Z_i^K) p(x | Z_j^K).   (8)

In the case of an unknown dependence between the i and j pdfs, the discounting of exclusive information resulting from WEP fusion leads to information loss if the true dependence between the distributions is zero. Therefore, an approximation to the information loss is the KLD between the NB and WEP fusion pdfs,

    I_LOSS ≈ Ī_LOSS(ω) ≜ D_KL[ p_NB(x | Z_i ∪ Z_j) || p_WEP(x | Z_i ∪ Z_j) ].   (9)

It is now possible to compute the fusion weight ω* ∈ [0, 1] that minimizes the approximate information loss,

    ω* = arg min_{ω ∈ [0,1]} Ī_LOSS(ω).   (10)

The advantage of this Minimum Information Loss Fusion scheme (10) is that ω is selected to minimize the possible information loss (7) should the two distributions be truly uncorrelated. In practice, Minimum Information Loss Fusion drives the solution towards Naive Bayes fusion when the distributions to fuse are significantly different (in a Kullback-Leibler sense), which could indicate that the two do not share a significant amount of common information. The Minimum Information Loss Fusion scheme thus provides an automatic method for trading between the Naive Bayes and conservative fusion rules, without resorting to a heuristic decision.

III. WEP FUSION FOR MULTI-ROBOT OCCUPANCY GRIDS AND GAUSSIAN MIXTURE BELIEFS

Proofs have been omitted due to limited space, but it can be shown that the newly proposed metrics of Section II are all convex in ω for arbitrary pdfs p_i(x_k | Z_i^k) and p_j(x_k | Z_j^k). In this section, the problem of implementing WEP fusion using the (un)weighted Chernoff and Minimum Information Loss cost metrics is addressed. The focus is on cases where both p_i(x_k | Z_i^k) and p_j(x_k | Z_j^k) are described by either discrete Bernoulli distributions or continuous finite Gaussian mixtures, which are commonly used in robotics, e.g. for occupancy grid mapping and object tracking/localization, respectively. With some minor modifications, the methods described here can be readily applied in other contexts and to other arbitrary discrete/continuous pdfs.

A. Fast WEP Fusion for Bernoulli Distributions

Applying the general WEP fusion rule (3) to two Bernoulli distributions with unknown correlation,

    p_i(x_k | Z_i^k) = p_i,   p_j(x_k | Z_j^k) = p_j,   (11)

results in a fused distribution that is also Bernoulli,

    p_WEP(x_k) = p_i^ω p_j^{1−ω} / [ p_i^ω p_j^{1−ω} + (1−p_i)^ω (1−p_j)^{1−ω} ].   (12)

In the case of Chernoff fusion, the optimal weight ω* is given in closed form,

    ω*_CF = { log[ (log(1−p_j) − log(1−p_i)) / (log p_i − log p_j) ] − log p_j + log(1−p_j) } / [ log p_i − log p_j − log(1−p_i) + log(1−p_j) ].   (13)

For Entropy Weighted Chernoff Fusion, ω* cannot be found in closed form, and the following equation must be solved numerically,

    (1/H[p_i]) [ p_ω log(p_ω / p_i) + (1−p_ω) log((1−p_ω)/(1−p_i)) ] − (1/H[p_j]) [ p_ω log(p_ω / p_j) + (1−p_ω) log((1−p_ω)/(1−p_j)) ] = 0,   (14)

where p_ω is given by (12) and the entropy of a Bernoulli distribution is H[p] = −p log p − (1−p) log(1−p). When the entropies of the two distributions are equal, Entropy Weighted Chernoff Fusion collapses to regular (i.e. unweighted) Chernoff fusion.
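For instance, the root of (14) can be found with a few lines of scalar bisection, since the left-hand side is positive at ω = 0 and negative at ω = 1. The sketch below is illustrative rather than the authors' code, and assumes a single sign change on [0, 1], consistent with the convexity of the proposed metrics noted above; it requires p_i, p_j ∈ (0, 1).

```python
import math

def bern_wep(pi, pj, w):
    # fused Bernoulli parameter p_w, eq. (12)
    num = pi ** w * pj ** (1.0 - w)
    return num / (num + (1.0 - pi) ** w * (1.0 - pj) ** (1.0 - w))

def bern_entropy(p):
    # Bernoulli entropy H[p], as defined after eq. (14)
    return -p * math.log(p) - (1.0 - p) * math.log(1.0 - p)

def bern_kl(p, q):
    # KLD between Bernoulli(p) and Bernoulli(q)
    return p * math.log(p / q) + (1.0 - p) * math.log((1.0 - p) / (1.0 - q))

def entropy_weighted_chernoff_bern(pi, pj, tol=1e-12):
    # solve eq. (14): KL(p_w || p_i)/H[p_i] - KL(p_w || p_j)/H[p_j] = 0
    def residual(w):
        pw = bern_wep(pi, pj, w)
        return bern_kl(pw, pi) / bern_entropy(pi) - bern_kl(pw, pj) / bern_entropy(pj)
    lo, hi = 0.0, 1.0
    while hi - lo > tol:  # residual is positive at 0 and negative at 1
        mid = 0.5 * (lo + hi)
        if residual(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

When H[p_i] = H[p_j] (e.g. p_i = 0.3, p_j = 0.7) the solution coincides with unweighted Chernoff fusion, giving ω* = 0.5 by symmetry.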
In the case of Minimum Information Loss fusion, ω* is again given in closed form,

    ω*_M-L = [ log p_j − log(1−p_j) − log( p_i p_j / ((p_i−1)(p_j−1)) ) ] / [ log p_j − log(1−p_j) − log p_i + log(1−p_i) ].   (15)

B. Fast WEP Fusion for Finite Gaussian Mixtures

Assume each robot's pdf is a finite Gaussian mixture (GM),

    p_i(x_k | Z_i^k) = p_i(x_k) = Σ_{m=1}^{M_i} w_m^i N(μ_m^i, Σ_m^i),   (16)

where (for agent i, and similarly for j) M_i is the number of mixands, μ_m^i and Σ_m^i are the mean and covariance matrix of component m, and w_m^i is the m-th mixing weight, such that w_m^i ∈ [0, 1] and Σ_{m=1}^{M_i} w_m^i = 1. The WEP cost metric integrals and the resulting fusion pdf (3) are not closed form for GMs. A new fast and consistent two-phase Monte Carlo-based optimization and learning procedure is presented to overcome these issues.

1) Particle-based optimization: The Information Weighted Chernoff fusion cost can be written as

    f_WEP(ω) = ∫ h_ij(x_k) · p_WEP(x_k) dx_k,   where h_ij(x_k) = [ c_i log p_i(x_k) − c_j log p_j(x_k) ] / [ log p_i(x_k) − log p_j(x_k) ]   (17)

(where unweighted Chernoff fusion uses c_i = c_j = 1); the Minimum Information Loss cost (9) can be written as

    f_WEP(ω) = ω · κ + log ∫ h_ij(x_k) · p_WEP(x_k) dx_k,   where h_ij(x_k) = 1,   κ = ∫ p_i(x_k) p_j(x_k) log[ p_j(x_k) / p_i(x_k) ] dx_k.   (18)

The integral terms with respect to p_WEP(x_k) in each of these analytically intractable costs are approximated via importance sampling,

    ∫ h_ij(x_k) · p_WEP(x_k) dx_k ≈ (1/N_s) Σ_{s=1}^{N_s} θ_s(ω) · h_ij(x_s),   (19)

    θ_s(ω) = p_i^ω(x_s) p_j^{1−ω}(x_s) / g(x_s; ω),   (20)

where the h_ij(x_s) are sample values of h_ij(x_k) and the θ_s(ω) are unnormalized non-negative weights for N_s i.i.d. samples {x_s}_{s=1}^{N_s} drawn from a fixed importance pdf g(x_k; ω). The constants c_i and c_j in (6) and κ in (18) are generally not available in closed form for GMs, but can be well-approximated with fast deterministic methods [7].

To ensure that (19) is well-behaved and easy to compute, the importance pdf g(x_k; ω) should be easy to sample from and evaluate, have non-zero support over the entire domain of integration, and resemble (3) closely enough to control the variance of (19). Many choices are possible for g(x_k; ω), but the following 'first-order' GM approximation to (3) proposed by [10] is particularly effective,

    g(x; ω) = Σ_{m=1}^{M_F} w_m^F N(μ_m^F, Σ_m^F),   (21)
    Σ_m^F = [ ω (Σ_q^i)^{−1} + (1−ω) (Σ_r^j)^{−1} ]^{−1},   (22)
    μ_m^F = Σ_m^F [ ω (Σ_q^i)^{−1} μ_q^i + (1−ω) (Σ_r^j)^{−1} μ_r^j ],   (23)
    w_m^F = (w_q^i)^ω (w_r^j)^{1−ω} / Σ_{q',r'} (w_{q'}^i)^ω (w_{r'}^j)^{1−ω},   (24)

which is a component-wise covariance intersection operation on the constituent GMs (with M_F resulting mixands).

Eq. (19) is unbiased and converges to the true value of the integral as N_s → ∞. Hence, a numerical 1D optimization routine can obtain an unbiased estimate ω̃* of ω* using (19) to evaluate (17) or (18). The 1D search is accelerated by using a fixed importance pdf g(x; ω) = g(x; ω̄), where ω̄ is the same in all optimization steps. As such, the samples {x_s}_{s=1}^{N_s} are drawn once, and the pdf values {p_i(x_s), p_j(x_s), g(x_s; ω̄)} are stored as constants for recalculating (20) at new ω values. The resulting optimization procedure is shown in Algorithm 1, where R_1D[f̃(ω_curr), ω_curr, ω_old] denotes an ω-update rule for any desired 1D optimization algorithm (e.g. bisection search or golden section search), and ω̄ = 0.5 is typically used in our applications as an initial estimate. Algorithm 1 typically converges quite quickly and may therefore be run again if it terminates with an effective sample size less than some prescribed threshold N_t. In this case, ω̄ is reset to the current best estimate of ω to restart the optimization.

Algorithm 1: GM WEP Fusion via Importance Sampling
Inputs: robot GM pdfs p_i(x_k) and p_j(x_k); number of samples N_s; initial guess ω_0; importance pdf exponent ω̄; update function [ω_old, ω_curr] ← R_1D[f̃(ω_curr), ω_curr, ω_old] for the desired 1D minimization rule; effective sample size threshold N_t.
Output: ω̃* ∈ [0, 1]; samples {x_s}_{s=1}^{N_s} with normalized weights {θ_s}_{s=1}^{N_s}.
1. Initialize ω_curr ← ω_0 and ω_old according to R_1D.
2. Construct the GM importance sampling pdf g(x; ω̄) via (21).
3. Draw N_s samples {x_s}_{s=1}^{N_s} ∼ g(x; ω̄).
4. Store the pdf values p_i(x_s), p_j(x_s), g(x_s; ω̄) for {x_s}_{s=1}^{N_s}.
while ω_curr not converged do
    5. Compute the unnormalized importance weights {θ_s}_{s=1}^{N_s} via (20).
    6. Using (19), compute f̃(ω_curr) via (17) or (18).
    7. Update [ω_old, ω_curr] ← R_1D[f̃(ω_curr), ω_curr, ω_old].
end while
8. If the effective sample size < N_t, set ω̄ = ω_curr and go to step 1.
9. Normalize {θ_s}_{s=1}^{N_s} s.t. Σ_{s=1}^{N_s} θ_s = 1 and return ω̃* = ω_curr.

2) WEM Learning: Immediately following optimization, density estimation procedures are applied to the importance samples to recover an accurate GM approximation to (3) at ω̃*. The Weighted EM (WEM) algorithm is used for GMs to exploit the normalized importance weights as part of the learning problem. Details for implementing WEM learning of GMs from weighted samples are given in [7]. The initial GM parameters for WEM are found by compressing (21) to a desired maximum number of GM terms, M_MAX, using [13].
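A minimal one-dimensional sketch of this procedure follows; it is illustrative only, and does not reproduce the paper's implementation, the weighted costs of (17)-(18), or the WEM learning phase. It estimates the unweighted Chernoff cost c(ω) = ∫ p_i^ω p_j^{1−ω} dx by importance sampling from the fixed first-order GM g(x; ω̄ = 0.5) of (21)-(24), drawing and caching the samples once, and then minimizes the cost with a golden-section search, mirroring steps 1-7 of Algorithm 1. Mixtures are represented as lists of (weight, mean, variance) triples; names and sample counts are assumptions.

```python
import math
import random

def gm_pdf(gm, x):
    # evaluate a 1D Gaussian mixture at x
    return sum(w * math.exp(-(x - m) ** 2 / (2.0 * v)) / math.sqrt(2.0 * math.pi * v)
               for w, m, v in gm)

def gm_sample(gm, rng):
    # draw one sample: pick a component by weight, then sample it
    r, acc = rng.random(), 0.0
    for w, m, v in gm:
        acc += w
        if r <= acc:
            return rng.gauss(m, math.sqrt(v))
    w, m, v = gm[-1]  # guard against floating-point round-off
    return rng.gauss(m, math.sqrt(v))

def first_order_gm(gmi, gmj, w):
    # component-wise covariance intersection, eqs. (21)-(24) in 1D
    comps, norm = [], 0.0
    for wi, mi, vi in gmi:
        for wj, mj, vj in gmj:
            v = 1.0 / (w / vi + (1.0 - w) / vj)
            m = v * (w * mi / vi + (1.0 - w) * mj / vj)
            wt = wi ** w * wj ** (1.0 - w)
            comps.append((wt, m, v))
            norm += wt
    return [(wt / norm, m, v) for wt, m, v in comps]

def chernoff_weight(gmi, gmj, ns=2000, seed=0):
    # unweighted Chernoff: minimize log c(w), with c(w) estimated via the
    # unnormalized importance weights of eqs. (19)-(20) and fixed g(x; 0.5)
    rng = random.Random(seed)
    g = first_order_gm(gmi, gmj, 0.5)
    xs = [gm_sample(g, rng) for _ in range(ns)]
    vals = [(gm_pdf(gmi, x), gm_pdf(gmj, x), gm_pdf(g, x)) for x in xs]
    def cost(w):
        return math.log(sum(pi ** w * pj ** (1.0 - w) / gx
                            for pi, pj, gx in vals) / ns)
    lo, hi = 0.0, 1.0  # golden-section search on the convex cost
    phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = hi - phi * (hi - lo), lo + phi * (hi - lo)
    for _ in range(60):
        if cost(a) < cost(b):
            hi, b = b, a
            a = hi - phi * (hi - lo)
        else:
            lo, a = a, b
            b = lo + phi * (hi - lo)
    return 0.5 * (lo + hi)
```

For two unit-variance Gaussians placed symmetrically about the origin the cost is symmetric in ω, so the estimated minimizer lies near 0.5; note that the samples and stored pdf values are reused at every candidate ω, exactly as in steps 3-4 of Algorithm 1.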
Fig. 1. (a) Mobile robot setup with Hokuyo lidar and on-board computer for the distributed occupancy grid map fusion experiment. (b) The laboratory environment for the occupancy grid mapping experiment is 15 × 8 meters and contains boxes of different sizes.

Fig. 2. The overhead paths of the eight mobile robots exploring the environment for the distributed occupancy grid mapping experiment. The data collection interval is 120 seconds, and different robots explore different regions of the environment with some overlap to demonstrate the benefit of the generalized fusion rules to distributed mapping. The optimal network topology for distributed mapping is shown as an inset.

IV. EXPERIMENTAL RESULTS

A. Multi-Robot Occupancy Grid Map Fusion

The generalized fusion of the Bernoulli distribution (Section III-A) is directly applicable to fusion of multi-robot occupancy grid maps. The probability of a voxel of space being occupied is represented as a Bernoulli probability, and each voxel is assumed independent. As a result, the fusion of occupancy grid maps from multiple robots is accomplished by sequentially performing fusion across each of the grid cells in the map. A laboratory experiment demonstrates the fusion of 3D occupancy grid maps based on the different fusion rules derived, including Chernoff Fusion, Entropy Weighted Chernoff Fusion and Minimum Information Loss Fusion, and compares the resulting approximate information loss as a function of map location for the different techniques.

Data was collected using the Pioneer P3-DX differential drive mobile robot from Mobile Robots Inc. shown in Fig. 1(a). The primary sensor for the occupancy grid mapping is the Hokuyo URG-04X laser scanner, which features a 240° field-of-view, an angular resolution of 0.36°, and a maximum range of ≈ 5 m. The laser is pitched downward 45° and scans along the ground as the robot moves forward.

The test environment is 15 × 8 meters and is instrumented with a Vicon MX+ precision tracking system for robot localization and 3D object position/attitude determination. The features in the environment consist of boxes between 10 and 25 cm tall that are meant to simulate traffic cones or other similarly sized obstacles for a full-size traffic vehicle. Eight robots are run in different paths around the environment for a 120 second data collection (Fig. 2). The robots are run sequentially to avoid sensing each other during map construction.

Fig. 3. (a) 3D centralized occupancy grid map displaying the occupied voxels with false coloring based on height. (b) 3D occupancy grid map from Agent 1 using only local updates. The occupied and empty cells (lightly shaded) are shown along with the final pose of the robot (large box). (c) 3D occupancy grid map from Agent 1 after Minimum Information Loss Fusion; this map is qualitatively similar to the centralized solution.

The robots each construct a 3D occupancy grid map using the Octomap [14] implementation. The occupancy grid resolution is 0.05 m. To establish a baseline occupancy grid map, the data from all the mobile robots is processed to make the centralized solution, which is equivalent to all agents sending each laser scan and position report to a central server for processing. The resulting centralized map is shown in Fig. 3(a). The map is rendered displaying only the voxels that are occupied or have p_OCC > 0.8; the empty and unknown voxels are not shown, and the voxels are falsely shaded according to height, following the scale in Fig. 3(b); a 0.25 m grid is shown for reference. The line of small skinny boxes to the left of the environment is clearly visible, along with the larger boxes towards the right of the environment and the outer walls. There are gaps in the center of the map where no robot explored. The centralized solution establishes a baseline map for use in the distributed occupancy grid mapping approaches.
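To make the per-voxel procedure concrete, the sketch below fuses two occupancy grids cell-by-cell with the closed-form Minimum Information Loss weight (15), written here in an algebraically equivalent logit form. It is illustrative only: the dictionary-based grid and all names are assumptions rather than the paper's Octomap-based implementation, and the clamp to [0, 1] is an added assumption for the combinations where the unconstrained minimizer falls outside the interval, consistent with the rule having zero loss for 50% of Bernoulli pairs.

```python
import math

def logit(p):
    # log-odds of an occupancy probability p in (0, 1)
    return math.log(p / (1.0 - p))

def min_info_loss_weight(pi, pj):
    # eq. (15) rewritten as w = logit(pi) / (logit(pi) - logit(pj)),
    # clamped to [0, 1]; interior values reproduce Naive Bayes exactly
    li, lj = logit(pi), logit(pj)
    if li == lj:
        return 0.5  # identical cells: any w gives the same fused value
    return min(1.0, max(0.0, li / (li - lj)))

def wep_bernoulli(pi, pj, w):
    # fused occupancy probability, eq. (12)
    num = pi ** w * pj ** (1.0 - w)
    return num / (num + (1.0 - pi) ** w * (1.0 - pj) ** (1.0 - w))

def fuse_grids(grid_i, grid_j):
    # independent per-voxel fusion of two occupancy grids,
    # each a dict mapping voxel coordinate -> P(occupied)
    fused = dict(grid_i)
    for vox, pj in grid_j.items():
        if vox in fused:
            w = min_info_loss_weight(fused[vox], pj)
            fused[vox] = wep_bernoulli(fused[vox], pj, w)
        else:
            fused[vox] = pj  # voxel seen only by the other agent
    return fused
```

When logit(p_i) and logit(p_j) have opposite signs the weight is interior and the fused cell exactly reproduces the Naive Bayes result, so the approximate loss (9) for that cell is zero.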
Distributed Occupancy Grid Fusion on an Optimally Connected Network: To evaluate the application of the distributed generalized fusion rules to occupancy grid mapping, the mobile robots are connected in a network. The network is connected according to the topology with maximum robustness [5]. The topology is symmetric, has equal node and link connectivity equal to the minimum degree, and each node is a central vertex of the graph. The topology is shown as an inset in Fig. 2. This makes it difficult to track common information during fusion without data tagging, so generalized data fusion for unknown correlations is used.

Fig. 4. (a) Approximate information loss Ī_LOSS on the occupancy grid after Chernoff Fusion for Agent 1. (b) Approximate information loss Ī_LOSS on the occupancy grid after Entropy Weighted Chernoff Fusion for Agent 1. (c) Approximate information loss Ī_LOSS on the occupancy grid after Minimum Information Loss Fusion for Agent 1.

The individual sensor nodes collect and process local scans to build a local occupancy grid map that is then updated with map data passed along the network from other agents. An example of a map constructed via local updates only is in Fig. 3(b), which shows the map from Agent 1 (cf. the path from Fig. 2). The map is rendered showing the occupied cells falsely shaded according to height, and the empty cells are shown lightly shaded; the unknown cells are not shown. The final pose of the robot is shown using a solid box. The map shows that Agent 1 explores only a portion of the map. To enable a full representation of the environment, the distributed data fusion techniques are used over the maximally robust sensor network.

The robots share map information across the bi-directional links connecting them. The agents are required to share the following information for each voxel: the center coordinate and the log-odds probability. This implies each voxel requires 32 bytes of data (if all numbers are in double precision) transmitted for each communication. The nodes communicate aperiodically as they collect information. The final occupancy grid map after Minimum Information Loss Fusion is shown in Fig. 3(c). The distributed data fusion via Minimum Information Loss Fusion has access to neither the centralized solution nor the optimal fusion distribution, but is successfully utilized to build a map that is qualitatively similar to the centralized solution (Fig. 3(a)). The maps for Chernoff Fusion and Entropy Weighted Chernoff Fusion are similar (not shown).

Approximate Information Loss Maps: The approximate information loss Ī_LOSS (9) is computed after fusion for each of the different fusion rules. The resulting information loss maps can be used for planning purposes to balance exploration vs. verification of cells that may have suffered substantial information loss as a result of fusion. The information loss map resulting from Chernoff Fusion is shown in Fig. 4(a). The map is falsely shaded according to the approximate information loss metric Ī_LOSS, and the scale is shown in Fig. 4(b), where the range reflects the maximum range of values according to the empirical CDF computed over all possible combinations of two Bernoulli distributions for the different fusion rules (not shown). The areas with the largest information loss occur where information coming from remote nodes clashes with the content estimated locally at Agent 1, and lie in the left hand portion of the map. The areas of information loss in the right hand portion of the map result from the sequential application of the fusion rule as information is received on the three links connected to Agent 1.

The resulting information loss map for Entropy Weighted Chernoff Fusion at Agent 1 is shown in Fig. 4(b). The results show an increase in the approximate information loss in some areas and a decrease in others as a result of Entropy Weighted Chernoff Fusion. The resulting fused map (not shown) is similar to Fig. 3(b). The Minimum Information Loss Fusion rule is significantly better than either approach in terms of reducing the need for verification due to potential information loss, especially along the ground.

The resulting information loss map for Minimum Information Loss Fusion at Agent 1 is shown in Fig. 4(c). The results show a dramatic improvement in the potential information loss, because the fusion rule has zero loss for 50% of the possible combinations of Bernoulli distributions. The Minimum Information Loss Fusion rule generates consistent, quality occupancy grid maps and has the lowest information loss; therefore, this fusion rule is the best for distributed data fusion of occupancy grid maps.

B. GM Fusion for Multi-Robot Target Search

In another experiment, three Pioneer P3-DX robots are tasked to search for a static target with location x_TARG in an environment similar to the one shown in Fig. 1. Each robot is equipped with a single forward-facing Unibrain Fire-I camera
In contrast, the final WEP DDF posterior GMs in Fig. 6(e)-
(h) and information gains in Fig. 7(d)-(f) all bear much closer
resemblance to the centralized fusion result (with KLDs to the
centralized result roughly the same between 0.4-0.55 nats).
The posterior GM plots for Information Weighted Chernoff
and Minimum Information Loss fusion show that these rules
(a) (b) (c) are better at retaining some of the subtler features of the
Fig. 5. (a) Example GM prior over xTARG (darker/lighter surface indicates centralized robot fusion GM pdf, such as the distinctness of the
lower/higher density); (b) model for camera-based binary ‘No Detection’ modes around the outskirts of the map and the ‘plowed lanes’
likelihood (P (‘No Detection’|xTARG ) shown zoomed in); (c) posterior GM created by robots 2 and 3. On the other hand, the posterior
after fusing GM prior in (a) with ‘No Detection’ report in (b) via eq.(1) using
Gaussian sum filter approximation. GMs for unweighted Chernoff and Bhattacharyya fusion are
smeared out, indicating that less novel information is retained
with a 42.5 deg field of view, which feeds onboard visual by WEP fusion at the expense of robot 1’s older outdated com-
object detection software that produces binary ‘detection’/‘no mon prior information (note the frequent information ‘dips’ for
detection’ outputs at 1 Hz with 1 m maximum detection range. unweighted Chernoff). Interestingly, Fig. 7(f) shows that each
The likelihood model for the visual object detectors is shown robot is more likely to gain (and less likely to lose) information
in Fig. 5(b) and the common GM prior pdf for xTARG at via Information Weighted Chernoff fusion, as the gains for
k = 0 is shown in Fig. 6(a). The sensor likelihood model for each robot rarely drop from their previous values, as they
each robot is highly non-Gaussian and induces non-Gaussian do near the end of the unweighted Chernoff and Minumum
posterior via (1). As shown in Fig. 5(c), this local Bayesian Information Loss results. Further analysis shows that this
posterior is well-approximated at each time step k by a finite behavior for Information Weighted Chernoff fusion stems from
GM using a variant of the Gaussian sum particle filter with an aggressive ‘winner takes all’ strategy for selecting ω in
1000 particles per mixand [12] and a standard GM reduction the initial stages of the search, so that values of ω close
algorithm to keep M MAX = 15 for each robot [13]. to 0 or 1 are observed frequently. As the robot information
The robots move along different pre-defined search trajecto- gains become similar after k = 30, the ω values selected
ries for 65 secs, during which they fuse their own camera ob- by Information Weighted Chernoff become less extreme. This
servations to modify their local copy of the GM pdf for xTARG suggests different approaches are more appropriate at different
(none of the agents finds the target). The robots communicate times during the search process.
5 times (once every 13 secs) using the cyclic communication
pattern 1 → 2 → 3 → 1 for time steps k = 26 and k = 39, and V. C ONCLUSION
1 → 3 → 2 → 1 for the remaining instances. The unweighted The WEP approach for robust fusing two arbitrary prob-
Chernoff, KLD-gain Information Weighted Chernoff, and Min- ability distributions with unknown common information was
imum Information Loss rules were all evaluated for DDF via addressed for general multi-robot applications. The nominal
Algorithm 1 with Ns = 1000. Non-DDF benchmark runs Chernoff fusion rule does not account for differences in con-
using no sharing and centralized fusion were also evaluated, tent or quality of information sources used to construct con-
alongside alternative DDF benchmark runs obtained via Naive Bayes and conservative Bhattacharyya fusion. Bhattacharyya fusion is another WEP fusion rule that always assigns ω = 0.5 [3]. This fusion rule requires no optimization but does require approximation of the WEP fusion pdf, which is accomplished by applying WEM to weighted importance samples, as done for the other WEP rules used here. Fig. 6(b)-(h) show the resulting fusion posterior GM pdfs at k = 65 for robot 1 (bottom left of map). Fig. 7 shows the KLD information gains (6) for each robot over time under each fusion scheme.

Fig. 6(b) and Fig. 7(a) show that robot 1 contributes the least amount of new information, as it remains completely still at the bottom left of the map throughout the search. When using DDF to update its local copy of the xTARG GM, robot 1 exchanges GMs with robots 2 and 3 without any of the robots being aware of the common information in the network. The Naive Bayes DDF results in Fig. 6(d) and Fig. 7(c) show that repeated double-counting of common information drives the final DDF posterior pdfs for all robots very far from the centralized fusion posterior in Fig. 6(c) and thus leads to severely inconsistent results.

stituent robot distributions; the Information Weighted Chernoff Fusion rule was developed to account for such potential disparities without explicitly counting the number of measurements. To approximately minimize losses incurred by WEP fusion relative to the exact fusion posterior, a second novel Minimum Information Loss Fusion rule was also developed, using the KLD between the Naive Bayes and WEP-fused distributions as the cost metric. Novel fast convex implementations of the proposed WEP fusion rules were presented for fusion of Bernoulli and arbitrary Gaussian mixture pdfs. These methods were experimentally demonstrated in loopy multi-robot communication topologies for fusion of 3D occupancy grids and 2D target location beliefs. Future work will investigate the theoretical properties and online performance of the proposed WEP methods across a variety of dynamic ad hoc fusion scenarios, spanning scales from dozens to thousands of robots.

ACKNOWLEDGMENTS

This work was supported by the Air Force Office of Scientific Research (AFOSR) (FA9550-08-1-0356) and the Army Research Office (ARO) (W911NF-09-1-0466).
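For the special case of two Gaussian components, the WEP fusion rule discussed above has a closed form in information (inverse-covariance) space, which makes the role of the weight ω easy to see. The sketch below is an illustration in NumPy (function name is ours), not the paper's Gaussian-mixture implementation; ω = 0.5 reproduces the Bhattacharyya rule, while a Chernoff rule would instead choose ω by optimizing an information criterion.

```python
import numpy as np

def wep_fuse_gaussian(m1, P1, m2, P2, omega=0.5):
    """Weighted exponential product fusion of two Gaussians:
    p(x) proportional to p1(x)**omega * p2(x)**(1 - omega)."""
    I1, I2 = np.linalg.inv(P1), np.linalg.inv(P2)
    # Fused information matrix is the omega-weighted combination
    I_fused = omega * I1 + (1.0 - omega) * I2
    P = np.linalg.inv(I_fused)
    # Fused mean combines the information vectors with the same weights
    m = P @ (omega * I1 @ m1 + (1.0 - omega) * I2 @ m2)
    return m, P
```

For Gaussian mixtures no such closed form exists, which is why the text approximates the WEP pdf via weighted importance sampling and WEM.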

[Figs. 6 and 7 appear here. The panels of Fig. 7 plot KLD from the prior (nats) against time step k (secs) for Robots 1-3 under each fusion rule; annotations note that the stationary Robot 1 receives no new information without fusion, that the Naive Bayes "gains" are roughly 10x larger than under centralized fusion, and that the Information Weighted Chernoff gains are more consistently positive for each robot at each fusion step.]

Fig. 6. (a) Prior GM pdf for xTARG at k = 0; (b)-(h) Robot 1's (bottom left) GM posterior pdf at k = 65 under various fusion rules: (b) no fusion, (c) centralized fusion, (d) Naive Bayes, (e) unweighted Chernoff, (f) Information Weighted Chernoff, (g) Minimum Information Loss, (h) Bhattacharyya fusion.

Fig. 7. Robot information gains vs. time relative to the k = 0 GM prior under different fusion rules: (a) no fusion, (b) centralized, (c) Naive Bayes (note larger vertical scale), (d) Minimum Information Loss, (e) unweighted Chernoff, (f) Information Weighted Chernoff. Results for Bhattacharyya fusion (not shown) are similar to unweighted Chernoff results. Dashed vertical lines denote interagent DDF time steps.

REFERENCES

[1] T. Bailey, S. Julier, and G. Agamennoni. On Conservative Fusion of Information with Unknown Non-Gaussian Dependence. Technical report, Australian Centre for Field Robotics, 2011.
[2] F. Bourgault. Decentralized Control in a Bayesian World. PhD thesis, University of Sydney, 2005.
[3] K. Chang, C.-Y. Chong, and S. Mori. Analytical and Computational Evaluation of Scalable Distributed Fusion Algorithms. IEEE Trans. on Aerospace and Electronic Systems, 46(4):2022–2034, 2010.
[4] T. M. Cover and J. A. Thomas. Elements of Information Theory. Wiley, 1991.
[5] A. H. Dekker and B. D. Colbert. Network Robustness and Graph Topology, 2004.
[6] W. J. Farrell and C. Ganesh. Generalized Chernoff Fusion Approximation for Practical Distributed Data Fusion. In 12th Int'l Conf. on Information Fusion, pages 555–562, 2009.
[7] J. Goldberger, H. K. Greenspan, and J. Dreyfuss. Simplifying Mixture Models Using the Unscented Transform. IEEE Trans. on Pattern Analysis and Machine Intelligence, 30(8):1496–1502, 2008.
[8] S. Grime and H. F. Durrant-Whyte. Data Fusion in Decentralized Sensor Networks. Control Engineering Practice, 2(5):849–863, 1994.
[9] M. B. Hurley. An Information-Theoretic Justification for Covariance Intersection and Its Generalization. Technical report, MIT Lincoln Lab, 2001.
[10] S. Julier. An Empirical Study into the Use of Chernoff Information for Robust, Distributed Fusion of Gaussian Mixture Models. In 9th Int'l Conf. on Information Fusion, 2006.
[11] S. J. Julier and J. K. Uhlmann. A Non-divergent Estimation Algorithm in the Presence of Unknown Correlations. In Proc. of the American Control Conference, volume 4, pages 2369–2373, 1997.
[12] J. H. Kotecha and P. M. Djuric. Gaussian Sum Particle Filtering. IEEE Trans. on Signal Processing, 51(10):2602–2612, 2003.
[13] A. R. Runnalls. Kullback-Leibler Approach to Gaussian Mixture Reduction. IEEE Trans. on Aerospace and Electronic Systems, 43(3):989–999, 2007.
[14] K. M. Wurm, A. Hornung, M. Bennewitz, C. Stachniss, and W. Burgard. OctoMap: A Probabilistic, Flexible, and Compact 3D Map Representation for Robotic Systems. In Proc. of the ICRA 2010 Workshop on Best Practice in 3D Perception and Modeling for Mobile Manipulation, 2010.

State Estimation for Legged Robots: Consistent
Fusion of Leg Kinematics and IMU
Michael Bloesch, Marco Hutter, Mark A. Hoepflinger, Stefan Leutenegger, Christian Gehring,
C. David Remy and Roland Siegwart
Autonomous Systems Lab, ETH Zürich, Switzerland, [email protected]

Abstract—This paper introduces a state estimation framework


for legged robots that allows estimating the full pose of the
robot without making any assumptions about the geometrical
structure of its environment. This is achieved by means of an
Observability Constrained Extended Kalman Filter that fuses
kinematic encoder data with on-board IMU measurements. By
including the absolute position of all footholds into the filter
state, simple model equations can be formulated which accurately
capture the uncertainties associated with the intermittent ground
contacts. The resulting filter simultaneously estimates the position
of all footholds and the pose of the main body. In the algorithmic
formulation, special attention is paid to the consistency of the
linearized filter: it maintains the same observability properties
as the nonlinear system, which is a prerequisite for accurate state
estimation. The presented approach is implemented in simulation
and validated experimentally on an actual quadrupedal robot.

I. INTRODUCTION

[Fig. 1. Experimental quadruped platform StarlETH [11]. The inertial and the body fixed coordinate frames I and B are depicted, as well as the absolute position vectors of the robot r and of the footholds p1, . . . , pN. The presented EKF includes all foothold positions into the estimation process.]

Particularly in rough and highly unstructured environments in which we ultimately want to employ autonomous legged robots, postural controllers require fast and precise knowledge of the state of the robots they are regulating. Especially for dynamic locomotion, the underlying state estimation can quickly become a bottleneck in terms of achievable bandwidth, robustness, and locomotion speed. To achieve the required performance, a state estimator for legged robots should explicitly take into account that such systems are interacting with their environment via multiple intermittent ground contacts. Ignoring or neglecting the ground interaction leads to computationally and sensorially more "expensive" approaches, ranging from vision-based [2, 16, 17] to GPS-supported [4, 6] methods. In contrast to such approaches, we will show in the following that in cases where on-board sensors fully measure the internal kinematics of the robot as well as its inertial acceleration and rotational rate, precise information on the robot's pose can be made readily available.

One of the earliest approaches exploiting information given by the leg kinematics was implemented by Lin et al. [13] in 2005 on a hexapod robot. Assuming that the robot is in contact with three of its six feet at all times and assuming completely flat terrain, they implemented a leg-based odometer. Their method requires the robot to follow a tripod gait and is affected by drift. In [14], the same group fused the leg-based odometer with data from an Inertial Measurement Unit (IMU) and thus is able to handle tripod running. Using the assumption of knowing the precise relief of the terrain, Chitta et al. [3] implemented a pose estimator based on a particle filter. It fuses leg kinematics and IMU in order to globally localize a robot.

Just very recently, three novel pose estimators have been presented that are all based on leg kinematics. Reinstein and Hoffmann [15] presented a data-driven approach using joint encoders, pressure sensors, and an on-board IMU. They searched for significant sensory-based indicators in order to determine the stride length when given a specific gait pattern. With this assumption, they successfully limited the position drift of the IMU and, by appropriate training of the filter, could additionally handle slippage. Chilian et al. [2] implemented a leg odometer based on point cloud matching for a hexapod robot, requiring a minimum of three feet in contact. It is based on a multisensor fusion algorithm that includes inertial measurements and visual odometry. Assuming planar spring-mass running, Gur and Saranli [7] proposed a generic, model-based state estimation technique.

In the presented approach we implement an Extended Kalman Filter (EKF) and choose an appropriate state vector in order to break down the estimation problem to the proper formulation of a few simple measurement equations. Without any assumption about the shape of the terrain, we are able to estimate the full state of the robot's main body, and we can provide an estimate of the ground geometry. By performing an observability analysis, we show that apart from the absolute

This research was supported by the Swiss National Science Foundation through the National Centre of Competence in Research Robotics.

position and yaw angle of the robot all other states can be precisely observed as long as at least one foot is in contact with the ground. This means that, after accumulating some drift during a flight phase, the pitch and roll angles become fully observable again when the robot regains ground contact, and the corresponding estimation errors will decrease.

Only proprioceptive sensors are required and no assumptions are made concerning the type of gait or the number of robot legs. Little foot slippage and uncertainties in the leg kinematics can be handled as well. Due to current limitations of the control approach, dynamic gaits are currently evaluated in simulation only. Still, results obtained from static walking sequences on an actual quadrupedal platform (see Fig. 1) are presented and compared with ground truth measurements from an external motion tracker.

The structure of the paper is as follows. In section II a short overview of the sensory devices is provided. Subsequently, section III presents the design of an Extended Kalman Filter. Section IV argues on the observability of the filter states and introduces observability constraints. Simulation and experimental validation are discussed in section V.

II. SENSOR DEVICES AND MEASUREMENT MODELS

This section discusses the required sensors and the corresponding stochastic measurement models for an N-legged robot. The particular model choices represent a trade-off between simplicity and accuracy. Throughout the paper, external disturbances and noise will be modeled as continuous white Gaussian noise or as discrete Gaussian noise processes. This is a coarse simplification, but can be handled by increasing the corresponding covariance matrix.

A. Forward Kinematics and Encoders

Incremental encoders provide access to the angular position of all joints. The corresponding encoder measurement vector α̃ of the joint angles vector α is assumed to be affected by discrete Gaussian noise nα with covariance matrix Rα:

α̃ = α + nα.    (1)

Based on the known leg kinematics, the location of each foot can be computed with respect to the main body. However, due to erroneous calibration and possible errors in the kinematic model lkin_i(·) of leg i, additive discrete Gaussian noise terms n_{s,i} are included in the model:

s_i = lkin_i(α) + n_{s,i},    (2)

where s_i represents the vector from the center of the main body to the contact point of leg i and where R_s is the covariance matrix of n_{s,i}. Both s_i and n_{s,i} are expressed in the body fixed frame B.

B. Inertial Sensors

The IMU measures the proper acceleration f and the angular rate ω of the robot's main body. The proper acceleration is related to the absolute acceleration a by

f = C(a − g),    (3)

where C is the matrix rotating coordinates of a vector expressed in the inertial coordinate frame I into the body coordinate frame B. The IMU quantities f and ω are assumed to be directly measured in the body coordinate frame B.

In order to describe the underlying stochastic process, the following continuous stochastic models are introduced:

f̃ = f + b_f + w_f,    (4)
ḃ_f = w_{bf},    (5)
ω̃ = ω + b_ω + w_ω,    (6)
ḃ_ω = w_{bω}.    (7)

The measured quantities f̃ and ω̃ are affected by additive white Gaussian noise processes w_f and w_ω and by bias terms b_f and b_ω. The bias terms are modeled as Brownian motions and their derivatives can be represented by white Gaussian noise processes w_{bf} and w_{bω}. The noise terms are specified by the corresponding covariance parameters Q_f, Q_{bf}, Q_ω, and Q_{bω}. Following the paper of El-Sheimy et al. [5], they can be evaluated by examining the measured Allan variances. For the sake of simplicity each covariance parameter is assumed to be a diagonal matrix with identical diagonal entries.

III. STATE ESTIMATION

As stated in the previous section, two different sources of data are available. Each of them provides information that can potentially contribute to the state estimate of the robot. In order to exploit this information an Extended Kalman Filter is designed. This section starts by defining the state vector of the filter and subsequently continues by formulating the filter models and equations.

A. Filter State Definition

The state vector of the filter has to be chosen such that the corresponding prediction and measurement equations can be stated in a clean and consistent manner. In this approach the state vector of the quadruped robot is composed of the position of the center of the main body r, of the corresponding velocity v, and of the quaternion q representing the rotation from the inertial coordinate frame I to the body coordinate frame B. In order to consider the kinematics of the legs, the absolute positions of the N foot contact points p_i are included into the state vector. Together with the accelerometer bias b_f and the gyroscope bias b_ω this yields the following state vector:

x := (r, v, q, p_1, · · ·, p_N, b_f, b_ω).    (8)

r, v and all contact positions p_i are expressed in the inertial coordinate frame I, whereas b_f and b_ω are expressed in the body coordinate frame B. Given a quaternion q the corresponding rotation matrix C can be easily determined.

The presented Extended Kalman Filter represents the uncertainties of the estimated state vector via the covariance matrix P of the corresponding state error vector δx:

P := Cov(δx),    (9)
δx := (δr, δv, δφ, δp_1, · · ·, δp_N, δb_f, δb_ω).    (10)
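The stochastic models (4)-(7) can be illustrated with a small discrete-time simulation of, e.g., the gyroscope channel: white measurement noise plus a bias that follows a random walk. This is a hedged sketch, not code from the paper; the function name and the scalar spectral densities q_omega and q_bomega are illustrative stand-ins for the diagonal entries of the covariance parameters.

```python
import numpy as np

def simulate_gyro(omega_true, dt, q_omega, q_bomega, rng=None):
    """Discretize the gyro model (6)-(7): meas = omega + bias + noise,
    where the bias is a discretized Brownian motion (random walk)."""
    rng = rng if rng is not None else np.random.default_rng(0)
    n = len(omega_true)
    bias = np.zeros(3)
    meas = np.empty((n, 3))
    for k in range(n):
        # White measurement noise: variance q_omega / dt after sampling
        w = rng.normal(0.0, np.sqrt(q_omega / dt), 3)
        meas[k] = omega_true[k] + bias + w
        # Bias random walk: incremental variance grows linearly with dt
        bias = bias + rng.normal(0.0, np.sqrt(q_bomega * dt), 3)
    return meas
```

In practice the two densities would be identified from the measured Allan variances, as the text notes following El-Sheimy et al. [5].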

For the orientation state q, special care has to be taken. It possesses 3 degrees of freedom and its covariance term should therefore also be represented by a 3 dimensional covariance matrix. Therefore the error of the pose is represented as a 3-dimensional rotation vector δφ. That is, if q̂ represents the estimate of the orientation quaternion, the error quaternion δq is defined by the relation

q = δq ⊗ q̂,    (11)

where ⊗ is the quaternion multiplication operator and where the quaternion error is related to the error rotation vector by means of the map ζ(·):

δq = ζ(δφ),    (12)

\zeta : v \mapsto \zeta(v) = \begin{pmatrix} \sin\!\left(\tfrac{1}{2}\|v\|\right)\frac{v}{\|v\|} \\ \cos\!\left(\tfrac{1}{2}\|v\|\right) \end{pmatrix}.    (13)

The inclusion of the foot contact positions into the filter state is the key point in the filter design, enabling a simple and consistent representation of the model equations. The leg kinematics measurements represent relative pose measurements between main body and foot contact, based on which the EKF is able to simultaneously correct the location of the foot contacts as well as the pose of the main body. In fact, the presented approach can be interpreted as a simultaneous localization and mapping (SLAM) algorithm, where the positions of the actual foot contacts build up the map in which the robot is localized.

B. Prediction Model

The prediction equations are responsible for propagating the state from one timestep to the next. The IMU measurements f̃ and ω̃ are directly included here. Using (3)-(7), a set of continuous time differential equations can be formulated:

ṙ = v,    (14)
v̇ = a = Cᵀ(f̃ − b_f − w_f) + g,    (15)
q̇ = ½ Ω(ω)q = ½ Ω(ω̃ − b_ω − w_ω)q,    (16)
ṗ_i = Cᵀ w_{p,i}  ∀i ∈ {1, . . . , N},    (17)
ḃ_f = w_{bf},    (18)
ḃ_ω = w_{bω},    (19)

where Ω(·) maps an arbitrary rotational rate ω to the 4x4 matrix used for representing the corresponding quaternion rate:

\Omega : \omega \mapsto \Omega(\omega) = \begin{bmatrix} 0 & \omega_z & -\omega_y & \omega_x \\ -\omega_z & 0 & \omega_x & \omega_y \\ \omega_y & -\omega_x & 0 & \omega_z \\ -\omega_x & -\omega_y & -\omega_z & 0 \end{bmatrix}.    (20)

While in principle the foot contacts are assumed to remain stationary, the white noise terms w_{p,i} in (17) with covariance parameter Q_{p,i} are added to the absolute foot positions in order to handle a certain amount of foot slippage. The noise is described in the body frame, which allows tuning its magnitude in the different directions relative to the quadruped orientation (21). Furthermore, the noise parameter of a certain foothold is set to infinity (or to a very large value) whenever it has no ground contact. This enables the corresponding foothold to relocate and reset its position estimate when it regains ground contact, whereby the old foothold position is dropped from the estimation process. This is all that is required in order to handle intermittent contacts when stepping.

Q_{p,i} = diag(w_{p,i,x}, w_{p,i,y}, w_{p,i,z}).    (21)

C. Measurement Model

Based on the kinematic model (2) a transformed measurement quantity is introduced for each leg i:

s̃_i := lkin_i(α̃)    (22)
    ≈ lkin_i(α) + J_{lkin,i} nα    (23)
    ≈ s_i − n_{s,i} + J_{lkin,i} nα,    (24)

with n_i := −n_{s,i} + J_{lkin,i} nα. The linearized noise effect from the encoders (1) and the noise from the foothold position are joined into this new measurement noise quantity n_i with covariance matrix R_i:

R_i = R_s + J_{lkin,i} Rα J_{lkin,i}ᵀ,    (25)

where J_{lkin,i} is the Jacobian of the kinematics of leg i with respect to the joint angles α_i of the same leg:

J_{lkin,i} := ∂lkin_i(α)/∂α_i,  i ∈ {1, . . . , N}.    (26)

s̃_i is the measurement of the position of foot contact i with respect to the body coordinate frame B, which can also be expressed as the absolute position of the foot contact minus the absolute position of the robot, rotated into the body frame:

s̃_i = C(p_i − r) + n_i.    (27)

D. Extended Kalman Filter Equations

For the subsequent linearization and discretization of the above models, the following auxiliary quantity is introduced:

\Gamma_n := \sum_{i=0}^{\infty} \frac{\Delta t^{\,i+n}}{(i+n)!} \left(\omega^{\times}\right)^i,    (28)

where the (·)× superscript is used to represent the skew-symmetric matrix obtained from a vector. It draws on the series expansion of the matrix exponential. For n = 0 it yields:

\Gamma_0 = \sum_{i=0}^{\infty} \frac{\left(\Delta t\,\omega^{\times}\right)^i}{i!} = \exp\!\left(\Delta t\,\omega^{\times}\right).    (29)

This means that Γ_0 represents the incremental rotation matrix if rotating an arbitrary coordinate frame with a rotational rate of −ω for Δt seconds. There exists a closed form expression for Γ_n that can be efficiently numerically evaluated (similar to Rodrigues' rotation formula).
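The error parametrization of (11)-(13) can be sketched as follows. Quaternion conventions vary between implementations; this snippet assumes a vector-first quaternion layout matching the stacking in (13) and one common choice of the ⊗ product, either of which may differ in sign convention from the authors' own code.

```python
import numpy as np

def zeta(v):
    """Rotation-vector-to-quaternion map of (12)-(13), vector part first."""
    angle = np.linalg.norm(v)
    if angle < 1e-12:
        # Small-angle limit of (13): the identity quaternion
        return np.array([0.0, 0.0, 0.0, 1.0])
    axis = v / angle
    return np.concatenate([np.sin(0.5 * angle) * axis,
                           [np.cos(0.5 * angle)]])

def quat_mul(p, q):
    """Quaternion product p x q (vector-first, Hamilton-style convention)."""
    pv, pw = p[:3], p[3]
    qv, qw = q[:3], q[3]
    return np.concatenate([pw * qv + qw * pv + np.cross(pv, qv),
                           [pw * qw - pv @ qv]])
```

With these two helpers, the multiplicative error relation (11) is simply `q = quat_mul(zeta(dphi), q_hat)`, which keeps the orientation covariance 3-dimensional as the text requires.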

1) Prediction Step: A standard filtering convention is employed: at time step k the a priori state estimate is represented by x̂⁻_k, the a posteriori state estimate by x̂⁺_k. Assuming zero-order hold for the measured quantities f̃_k and ω̃_k, and neglecting the effect of the incremental rotation, equations (14)-(19) can be discretized to:

\hat{r}^-_{k+1} = \hat{r}^+_k + \Delta t\,\hat{v}^+_k + \frac{\Delta t^2}{2}\left(\hat{C}^{+T}_k \hat{f}_k + g\right),    (30)
\hat{v}^-_{k+1} = \hat{v}^+_k + \Delta t\left(\hat{C}^{+T}_k \hat{f}_k + g\right),    (31)
\hat{q}^-_{k+1} = \zeta(\Delta t\,\hat{\omega}_k) \otimes \hat{q}^+_k,    (32)
\hat{p}^-_{i,k+1} = \hat{p}^+_{i,k} \quad \forall i \in \{1, \ldots, N\},    (33)
\hat{b}^-_{f,k+1} = \hat{b}^+_{f,k},    (34)
\hat{b}^-_{\omega,k+1} = \hat{b}^+_{\omega,k},    (35)

with the bias corrected IMU measurements

\hat{f}_k = \tilde{f}_k - \hat{b}^+_{f,k},    (36)
\hat{\omega}_k = \tilde{\omega}_k - \hat{b}^+_{\omega,k}.    (37)

In order to correctly propagate the covariance matrix through the state dynamics, a set of linear differential equations describing the error dynamics is derived from (14)-(19), where all higher order terms were neglected:

\dot{\delta r} = \delta v,    (38)
\dot{\delta v} = -C^T f^{\times} \delta\phi - C^T \delta b_f - C^T w_f,    (39)
\dot{\delta \phi} = -\omega^{\times} \delta\phi - \delta b_\omega - w_\omega,    (40)
\dot{\delta p}_i = C^T w_{p,i} \quad \forall i \in \{1, \ldots, N\},    (41)
\dot{\delta b}_f = w_{bf},    (42)
\dot{\delta b}_\omega = w_{b\omega}.    (43)

For the subsequent discretization, Van Loan's results [18] and the relation (28) can be applied to get the discrete linearized error dynamics matrix F_k and the discrete process noise covariance matrix Q_k (for readability only one foothold estimate is depicted):

F_k = \begin{bmatrix}
I & \Delta t\,I & -\frac{\Delta t^2}{2}\hat{C}^{+T}_k \hat{f}^{\times}_k & 0 & -\frac{\Delta t^2}{2}\hat{C}^{+T}_k & 0 \\
0 & I & -\Delta t\,\hat{C}^{+T}_k \hat{f}^{\times}_k & 0 & -\Delta t\,\hat{C}^{+T}_k & 0 \\
0 & 0 & \hat{\Gamma}^T_{0,k} & 0 & 0 & -\hat{\Gamma}^T_{1,k} \\
0 & 0 & 0 & I & 0 & 0 \\
0 & 0 & 0 & 0 & I & 0 \\
0 & 0 & 0 & 0 & 0 & I
\end{bmatrix},    (44)

Q_k = \begin{bmatrix}
\frac{\Delta t^3}{3}Q_f + \frac{\Delta t^5}{20}Q_{bf} & \frac{\Delta t^2}{2}Q_f + \frac{\Delta t^4}{8}Q_{bf} & 0 & 0 & -\frac{\Delta t^3}{6}\hat{C}^{+T}_k Q_{bf} & 0 \\
\frac{\Delta t^2}{2}Q_f + \frac{\Delta t^4}{8}Q_{bf} & \Delta t\,Q_f + \frac{\Delta t^3}{3}Q_{bf} & 0 & 0 & -\frac{\Delta t^2}{2}\hat{C}^{+T}_k Q_{bf} & 0 \\
0 & 0 & \Delta t\,Q_\omega + \left(\hat{\Gamma}_{3,k} + \hat{\Gamma}^T_{3,k}\right)Q_{b\omega} & 0 & 0 & -\hat{\Gamma}^T_{2,k} Q_{b\omega} \\
0 & 0 & 0 & \Delta t\,\hat{C}^{+T}_k Q_p \hat{C}^{+}_k & 0 & 0 \\
-\frac{\Delta t^3}{6}Q_{bf}\hat{C}^{+}_k & -\frac{\Delta t^2}{2}Q_{bf}\hat{C}^{+}_k & 0 & 0 & \Delta t\,Q_{bf} & 0 \\
0 & 0 & -Q_{b\omega}\hat{\Gamma}_{2,k} & 0 & 0 & \Delta t\,Q_{b\omega}
\end{bmatrix}.

By linearly combining two Gaussian distributions the Extended Kalman Filter stipulates the following a priori estimate of the covariance matrix at the timestep k + 1:

P^-_{k+1} = F_k P^+_k F^T_k + Q_k.    (45)

2) Update Step: The measurement residual, also called innovation, is the difference between actual measurements and their predicted value:

y_k := \begin{pmatrix} \tilde{s}_{1,k} - \hat{C}^-_k\left(\hat{p}^-_{1,k} - \hat{r}^-_k\right) \\ \vdots \\ \tilde{s}_{N,k} - \hat{C}^-_k\left(\hat{p}^-_{N,k} - \hat{r}^-_k\right) \end{pmatrix}.    (46)

Considering the error states and again neglecting all higher order terms, it can be derived that the errors of the predicted leg kinematics measurements are given by:

s_{i,k} - \hat{C}^-_k\left(\hat{p}^-_{i,k} - \hat{r}^-_k\right) \approx -\hat{C}^-_k \delta r^-_k + \hat{C}^-_k \delta p^-_{i,k} + \left(\hat{C}^-_k\left(p^-_{i,k} - r^-_k\right)\right)^{\times} \delta\phi^-_k.    (47)

With this the measurement Jacobian H_k can be evaluated:

H_k = \frac{\partial y_k}{\partial \hat{x}_k} = \begin{bmatrix}
-\hat{C}^-_k & 0 & \left(\hat{C}^-_k\left(\hat{p}^-_{1,k} - \hat{r}^-_k\right)\right)^{\times} & \hat{C}^-_k & \cdots & 0 & 0 & 0 \\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots \\
-\hat{C}^-_k & 0 & \left(\hat{C}^-_k\left(\hat{p}^-_{N,k} - \hat{r}^-_k\right)\right)^{\times} & 0 & \cdots & \hat{C}^-_k & 0 & 0
\end{bmatrix}.

Stacking the single measurement noise matrices (25) returns the total measurement noise matrix:

R_k = diag(R_{1,k}, . . . , R_{N,k}).    (48)

Finally the a priori state estimate can be merged with the current measurements, where the Extended Kalman Filter states the following update equations:

S_k := H_k P^-_k H^T_k + R_k,    (49)
K_k := P^-_k H^T_k S^{-1}_k,    (50)
Δx_k := K_k y_k,    (51)
P^+_k := (I − K_k H_k) P^-_k,    (52)

where S_k represents the innovation covariance, K_k the Kalman gain, Δx_k the resulting correction vector and P^+_k the a posteriori estimate of the state covariance matrix. Given Δx_k the state estimate can be updated. Again the orientation state requires special attention. Although the quaternion is of dimension 4, the extracted rotational correction Δφ_k has only 3 dimensions. It basically represents the 3D rotation vector that needs to be applied to correct the predicted quaternion:

\hat{q}^+_k = \zeta(\Delta\phi_k) \otimes \hat{q}^-_k.    (53)
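The update equations (49)-(52), together with the multiplicative orientation correction of (53), can be sketched generically. This is an illustrative NumPy version, not the paper's implementation; the apply_correction callback is our own device for keeping the quaternion-specific boxplus step out of the linear-algebra core.

```python
import numpy as np

def ekf_update(x_pred, P_pred, y, H, R, apply_correction):
    """Generic EKF measurement update following (49)-(52).

    apply_correction(x, dx) injects the correction vector into the state;
    for the orientation block it should apply the multiplicative
    quaternion correction instead of simple addition."""
    S = H @ P_pred @ H.T + R              # innovation covariance (49)
    K = P_pred @ H.T @ np.linalg.inv(S)   # Kalman gain (50)
    dx = K @ y                            # correction vector (51)
    P = (np.eye(P_pred.shape[0]) - K @ H) @ P_pred  # (52)
    return apply_correction(x_pred, dx), P
```

For a purely additive state one would simply pass `lambda x, dx: x + dx`; the quaternion block instead composes `zeta(dphi)` with the predicted quaternion as in (53).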


IV. OBSERVABILITY ANALYSIS

A. Nonlinear Observability Analysis

Analyzing the observability characteristics of the underlying nonlinear system reveals the theoretical limitations of state estimation and can validate the employed approach. Based on the paper of Hermann and Krener [8] a nonlinear observability analysis is performed. In order to remain analytically tractable, robocentric coordinates are introduced. The coordinate transformation is bijective and will thus not change the observability characteristics. Given the current operating point

x* := (r*, v*, q*, p*_1, · · ·, p*_N, b*_f, b*_ω),    (54)

the following coordinate transformation is introduced:

z := \begin{pmatrix} s_1 \\ \vdots \\ s_N \\ \bar{v} \\ \bar{b}_\omega \\ \bar{q} \\ \bar{b}_f \\ \bar{r} \end{pmatrix} = \begin{pmatrix} C(p_1 - r) \\ \vdots \\ C(p_N - r) \\ Cv \\ b_\omega - b^*_\omega \\ q \otimes q^{*-1} \\ b_f - b^*_f \\ Cr \end{pmatrix}.    (55)

The quantities in (55) are ordered such that a nice row echelon form results. The corresponding prediction model (14)-(19) and measurement equation (27) are transformed to

\dot{z} = \begin{pmatrix} (\omega - \bar{b}_\omega)^{\times} s_1 - \bar{v} \\ \vdots \\ (\omega - \bar{b}_\omega)^{\times} s_N - \bar{v} \\ (\omega - \bar{b}_\omega)^{\times} \bar{v} + f - \bar{b}_f + \bar{C}C^* g \\ 0 \\ \Omega(\omega - \bar{b}_\omega)\bar{q} \\ 0 \\ (\omega - \bar{b}_\omega)^{\times} \bar{r} + \bar{v} \end{pmatrix},    (56)

s̃_i = s_i,  i ∈ {1, . . . , N},    (57)

where all noise terms were disregarded as they have no influence on the observability, and where C̄ and C* represent the rotation matrices corresponding to q̄ and q*, respectively.

The observability of the transformed system can now be analyzed. In contrast to the linear case, Lie derivatives need to be computed in order to evaluate the observability matrix. By applying a few row operations and by directly including the transformed operating point

z* := (s*_1, · · ·, s*_N, C*v*, 0, (0 0 0 1), 0, C*r*),    (58)

the observability matrix can be converted into a row echelon form. For the sake of readability the bars are dropped again:

O = \begin{bmatrix}
I & \cdots & 0 & 0 & 0 & 0 & 0 & 0 \\
\vdots & \ddots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\
0 & \cdots & I & 0 & 0 & 0 & 0 & 0 \\
0 & \cdots & 0 & -I & 0 & 0 & s^{\times}_1 & 0 \\
0 & \cdots & 0 & 0 & I & -2(Cg)^{\times} & O_1 & 0 \\
0 & \cdots & 0 & 0 & 0 & 2\omega^{\times}(Cg)^{\times} & O_2 & 0 \\
0 & \cdots & 0 & 0 & 0 & 0 & \Delta s^{\times}_{i,j} & 0 \\
0 & \cdots & 0 & 0 & 0 & 0 & \Delta s^{\times}_{i,j}\,\omega^{\times} & 0 \\
0 & \cdots & 0 & 0 & 0 & 0 & \Delta s^{\times}_{i,j}\,\omega^{\times}\omega^{\times} & 0 \\
0 & \cdots & 0 & 0 & 0 & 0 & O_3 & 0 \\
0 & \cdots & 0 & 0 & 0 & 0 & \vdots & 0
\end{bmatrix},    (59)

O_1 = -s^{\times}_1 \omega^{\times} - 2(Cv)^{\times},    (60)
O_2 = \left(s^{\times}_1 \omega^{\times} + 3(Cv)^{\times}\right)\omega^{\times} - \omega^{\times}\left(s^{\times}_1 \omega^{\times} + 2(Cv)^{\times}\right) - (Cg)^{\times} - 2f^{\times},    (61)
O_3 = \omega^{\times}\left(s^{\times}_1 \omega^{\times}\omega^{\times} + 5(Cv)^{\times}\omega^{\times} - 4f^{\times} - 3(Cg)^{\times}\right) - \left(s^{\times}_1 \omega^{\times}\omega^{\times} + 4(Cv)^{\times}\omega^{\times} - 3f^{\times} - 2(Cg)^{\times}\right)\omega^{\times} - 4\omega^{\times}(Cv)\omega^T,    (62)
\Delta s_{i,j} := s_i - s_j.    (63)

A full interpretation of this matrix is not within the scope of this paper. However, two essential points are emphasized. The four dimensional manifold composed of robot position and yaw angle (rotation around the gravity vector g) is always unobservable. This can be verified by looking at the tangential space spanned by the matrix

\bar{U} = \begin{bmatrix} 0 & \cdots & 0 & 0 & 0 & 0 & 0 & I \\ 0 & \cdots & 0 & 0 & 0 & \tfrac{1}{2}(Cg)^T & 0 & 0 \end{bmatrix}^T,    (64)

0 = O\bar{U}.    (65)

Infinitesimal errors Δz = Ūε (for arbitrary ε) lying within the subspace of Ū cannot be detected. Transforming this back to our original coordinates yields the tangential space

U = \begin{bmatrix} C^T & 0 & 0 & C^T & \cdots & C^T & 0 & 0 \\ g^T r^{\times} & g^T v^{\times} & g^T C & g^T p^{\times}_1 & \cdots & g^T p^{\times}_N & 0 & 0 \end{bmatrix}^T,    (66)

where the upper row corresponds to a 3 dimensional translation of the inertial coordinate frame and where the lower row corresponds to a rotation of the inertial coordinate frame around the gravity axis g.

The second point is that in some cases the rank loss associated with the unobservable manifold can increase by up to 5 additional ranks. Table I depicts some of the cases. All cases which induce a rank loss require some singularities. It can thus be stated that statistically almost surely the unobservable space will be limited to absolute position and yaw angle (except for the case where there is no ground contact at all). Note that if the bias estimation is excluded, the unobservable subspace will be invariantly of rank four.

Unfortunately, typical operating points can lie very close to singular cases. The upper highlighted row in table I represents

ω f v s1 , . . . , sN Rank loss
ω · Cg =
 0 * * not co-linear 0 The above phenomenon has been observed earlier in the
ω · Cg = 0 det O 3 = 0 at least one contact 0 context of EKF-SLAM [9, 12]. Huang et al. [10] introduced
ω · Cg = 0 * * at least one contact ≥1 the Observability Constrained Extended Kalman Filter in order
0 * * at least one contact ≥2
to tackle this issue. The approach in this paper goes much
0 * * not co-linear 2
0 0 * s1 = . . . = s N 3 along their idea: the unobservable subspace of the nonlinear
0 −1/2Cg * s1 = . . . = s N 5 system (66) should also be unobservable in the linearized
TABLE I and discretized system (67)-(68). Mathematically, this can be
E STIMATION SCENARIOS AND CORRESPONDING RANK LOSS . imposed by adding the following constraint:
the case where the robot has at least 3 non co-linear ground M U = 0. (70)
contacts and where the rotation axis is not perpendicular to
the gravity vector. The lower highlighted row represents the In order to meet this constraint Huang et al. evaluate the
corresponding singular case where ω = 0 inducing a rank Jacobians at special operating points: instead of using the
loss of 2. This proximity to singular cases can cause bad actual state estimate they use slightly altered values.
convergence quality. For this reason the filter is implemented The approach in this paper tackles the observability problem
in such a manner that the estimation of the accelerometer by exploiting the following observation: the noiseless case
and gyroscope biases can be enabled or disabled at runtime. does meet the constraint (70) because it perfectly fulfills the
Thereby it is possible to disable the bias estimation for critical prediction equations (30)-(35) and thus the appropriate terms
tasks. On the other hand special maneuvers can be derived are canceled out. For the presented filter it suffices if the
from the conditions in table I which can properly estimate the following constraints are introduced (where a ∗ denotes the
bias states. states or measurements around which Jacobians are evaluated):
Δt2 ∗T ∗
B. Observability Analysis of the Extended Kalman Filter r ∗k+1 = r ∗k + Δtv ∗k + (C k f k,1 + g), (71)
2
The filter makes use of a linearized and discretized version ∗
v ∗k+1 = v ∗k + Δt(C ∗T
k f k,2 + g), (72)
of the nonlinear system model:
q ∗k+1 = ζ(ω ∗k ) ⊗ q ∗k , (73)
xk+1 = F k xk + wlin,k , (67)
p∗i,k+1 = p∗i,k ∀i ∈ {1, . . . , N }. (74)
y k = H k xk + nlin,k , (68)
Both, filter state and IMU measurements, are allowed to
where errors caused by linearization or discretization are differ from their actual estimated quantities. However, in order
incorporated in the noise terms wlin,k and nlin,k . The corre- to keep the linearization errors small the linearization point
sponding observability analysis will be performed by applying the concept of local observability matrices [1]: similar to the time-invariant case, the observable subspace can be derived by analyzing the subspace spanned by the rows of a local observability matrix:

$$\mathcal{M} = \begin{bmatrix} H_k \\ H_{k+1} F_k \\ H_{k+2} F_{k+1} F_k \\ H_{k+3} F_{k+2} F_{k+1} F_k \\ \vdots \end{bmatrix}. \quad (69)$$

The observability characteristics of the discrete linear time-varying system (67)-(68) can differ from those of the underlying nonlinear system (14)-(19),(27). This discrepancy can be caused by linearization/discretization effects as well as by noise effects. The effect of noise becomes particularly evident when contemplating the observability characteristics of a corresponding noiseless (ideal) system. For the presented system, the effect of noise renders the yaw angle observable by preventing the evaluation of the Jacobians F_k and H_k around the true state, thereby increasing the numerical rank of the local observability matrix M. The spurious appearance of new observable states is strongly objectionable, as it results in overconfident state estimation. The magnitude of this inconsistency depends on the noise ratio, but in the long run it will always deteriorate the state estimate.

To avoid this inconsistency, the linearization point should remain as close as possible to the estimated state. Thus, given the timestep l_i of the last touch-down event of foot i, the first available estimate is chosen for the linearization:

$$r_k^* = r_k^-, \qquad v_k^* = v_k^-, \qquad q_k^* = q_k^-, \quad (75)$$
$$p_{i,k}^* = p_{i,l_i}^- \qquad \forall i \in \{1, \dots, N\}. \quad (76)$$

This is in analogy to the First-Estimates Jacobian EKF of Huang et al. [9]. In general, however, the prediction constraints (71)-(73) are still not met. For this reason the additional terms f_{k,1}^*, f_{k,2}^*, and ω_k^* were introduced. Now, by choosing

$$f_{k,1}^* = C_k^{*T}\left(\frac{r_{k+1}^* - r_k^* - \Delta t\, v_k^*}{0.5\,\Delta t^2} - g\right), \quad (77)$$
$$f_{k,2}^* = C_k^{*T}\left(\frac{v_{k+1}^* - v_k^*}{\Delta t} - g\right), \quad (78)$$
$$\omega_k^* = \zeta^{-1}\left(q_{k+1}^* \otimes q_k^{*-1}\right), \quad (79)$$

all constraints can easily be met. The above quantities represent the IMU measurements that would arise when considering two subsequent filter prediction states at timesteps k and k+1. Because the acceleration-related measurements can differ depending on whether they are evaluated from the position prediction or from the velocity prediction, two separate terms were introduced. This keeps the computation of the linearization quantities simple and avoids complex optimization algorithms or oscillation-provoking couplings between subsequent linearization points.
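As a numerical illustration of the consistency-preserving accelerometer terms (77)-(78), the two pseudo-measurements can be computed directly from a pair of linearization states. This is only a sketch; the function name and the convention that C_k rotates body-frame vectors into the world frame are my assumptions, not code from the paper:

```python
import numpy as np

def linearization_imu_terms(r0, v0, r1, v1, C0, g, dt):
    """Accelerometer terms of Eqs. (77)-(78): the IMU measurements that would
    exactly reproduce the transition (r0, v0) -> (r1, v1) under the filter's
    prediction model.  C0 is assumed to rotate body-frame vectors into the
    world frame, so C0.T maps world-frame quantities into the body frame."""
    f1 = C0.T @ ((r1 - r0 - dt * v0) / (0.5 * dt**2) - g)  # position-based, Eq. (77)
    f2 = C0.T @ ((v1 - v0) / dt - g)                       # velocity-based, Eq. (78)
    return f1, f2
```

For a constant-acceleration segment both terms coincide, which is exactly the situation in which a single accelerometer reading can explain the position and the velocity prediction simultaneously; in general they differ, which is why the paper carries two separate terms.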

Fig. 2. 2D view of a 5 m trotting sequence in simulation. Dashed line (in the background): ground-truth body trajectory. Darker ellipses (red): successive position estimates of the robot's main body. Light grey ellipses: estimates of the foothold positions. In both cases the ellipses are scaled depending on the corresponding standard deviation (1σ). The position error at the end amounts to less than 5% of the traveled distance.

Computing the Jacobians F_k and H_k using the supplementary linearization quantities and evaluating the corresponding local observability matrix (69) yields:

$$\mathcal{M} = \begin{bmatrix}
-I & 0 & s_{1,k}^{\times} C_k^T & I & \cdots & 0 & 0 & 0 \\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots \\
-I & 0 & s_{N,k}^{\times} C_k^T & 0 & \cdots & I & 0 & 0 \\
0 & I & (v_k + \tfrac{\Delta t}{2} g)^{\times} C_k^T & 0 & \cdots & 0 & -\tfrac{\Delta t}{2} C_k^T & \# \\
0 & 0 & -g^{\times} & 0 & \cdots & 0 & \tfrac{1}{2}(C_{k+1}^T + C_k^T) & \# \\
0 & 0 & 0 & 0 & \cdots & 0 & \tfrac{1}{2}(C_{k+2}^T - C_k^T) & \# \\
0 & 0 & 0 & 0 & \cdots & 0 & \tfrac{1}{2}(C_{k+3}^T - C_{k+1}^T) & \# \\
0 & 0 & 0 & 0 & \cdots & 0 & \vdots & \#
\end{bmatrix}$$
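As a side note, the rank test behind a local observability matrix of the form (69) is easy to reproduce numerically. The helper below is a generic sketch (my own code, not from the paper): it stacks H_k, H_{k+1}F_k, H_{k+2}F_{k+1}F_k, ... and lets numpy decide the rank.

```python
import numpy as np

def local_observability_matrix(F_seq, H_seq):
    """Stack the block rows H_k, H_{k+1} F_k, H_{k+2} F_{k+1} F_k, ...
    (c.f. Eq. (69)).  F_seq = [F_k, F_{k+1}, ...], H_seq = [H_k, H_{k+1}, ...]."""
    blocks = [H_seq[0]]
    Phi = np.eye(F_seq[0].shape[0])  # running transition product F_{k+j-1} ... F_k
    for j in range(1, len(H_seq)):
        Phi = F_seq[j - 1] @ Phi
        blocks.append(H_seq[j] @ Phi)
    return np.vstack(blocks)

# Toy double integrator: measuring position makes the full state observable,
# while measuring only velocity leaves the position unobservable (rank deficit
# of one), analogous to the unobservable absolute position/yaw in the text.
dt = 0.1
F = np.array([[1.0, dt], [0.0, 1.0]])
H_pos = np.array([[1.0, 0.0]])
H_vel = np.array([[0.0, 1.0]])
M_pos = local_observability_matrix([F, F], [H_pos, H_pos, H_pos])
M_vel = local_observability_matrix([F, F], [H_vel, H_vel, H_vel])
```

Here `np.linalg.matrix_rank(M_pos)` is 2 (fully observable) while `np.linalg.matrix_rank(M_vel)` is 1, exposing the unobservable direction.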

From this form it is simple to verify that the observability constraint (70) is satisfied. As a last side note: similarly to the nonlinear case, observability rank loss will again be induced when ω ≡ 0 and thus

$$C_{k+2}^T - C_k^T = 0. \quad (80)$$

V. RESULTS AND DISCUSSION

Experiments are performed in simulation and on a real platform, whereby a series-elastic actuated quadruped is stabilized by a virtual model control approach [11] using the feedback of the pose estimator. The estimation of accelerometer and gyroscope biases is always enabled. In a first experiment the filter behavior is evaluated for a dynamic trotting gait within a simulation environment including realistic noise levels. Fig. 2 shows results from a 15 s trajectory with a reference forward speed of 0.4 m/s. The uncertainties of the robot and of the foothold positions are represented by the corresponding 1σ-ellipses. The effects of the unobservable absolute position and yaw angle can clearly be perceived. The leg kinematics measurements directly correlate the estimate of the main body position with the estimates of the foothold positions and thereby strongly limit the drift. Moreover, considering the correlations induced by the prediction model, the filter is able to properly correct the estimated quantities, rendering the inclination angles and the velocities fully observable. Based on the resulting state estimate, the quadruped can stabilize itself in a highly dynamic gait.

The second set of results is collected on a real platform. During the experiment, independent ground-truth measurements are provided by an external visual tracking system. A 60 s long static walking sequence in which the robot moves approximately one meter forward is evaluated. By pushing and tilting the robot, external disturbances are imposed on the slow locomotion pattern. Because the position is not fully observable, a slight drift occurs for the corresponding estimates (see Fig. 3); it can amount to roughly 10% of the traveled distance. Notable sources of drift are the inaccurate leg kinematics and the fault-prone contact detection. The slightly longer actual robot shank explains the shorter estimated traveled distance (x direction). On the other hand, small perturbations are closely tracked by the filter. This is attested by very precise velocity estimates, yielding RMS error values of less than 0.02 m/s (see Fig. 4). Like the velocity states, the roll and pitch angles are fully observable and also exhibit very small estimation errors (see Fig. 5). The drift of the yaw angle is almost imperceptible. For all estimates the corresponding 3σ covariance-hull is plotted. Except for the x-position estimate, where model inaccuracies induce a significant offset, all estimation errors remain within the covariance-hull and thus validate the consistency of the presented approach.

Fig. 3. Comparison between the estimated position and the motion capture system's position outputs. All three positions are affected by some drift, amounting up to 10% of the traveled distance.

Fig. 4. Comparison between the estimated velocity and the motion capture system's numerical position derivatives. All three velocity estimates are fully observable and consequently can be tracked very accurately. The resulting RMS error values are 0.0111 m/s, 0.0153 m/s and 0.0126 m/s.

Fig. 5. Comparison between the estimated roll, pitch and yaw angles and the motion capture system's orientation outputs. Roll and pitch angle are fully observable and the filter produces very precise corresponding estimates, with angular RMS errors of less than 0.5 deg (0.0088 rad and 0.0073 rad). The yaw angle drift is almost unnoticeable.

VI. CONCLUSION AND FUTURE WORK

This paper presents a pose estimator for legged robots. It fuses information from leg kinematics and IMU data, whereby the model equations are kept simple and precise, and only a minimum of assumptions is introduced (mainly limited foot slippage). The filter can handle unknown terrain and arbitrary locomotion gaits. Through an observability analysis, it was shown that for non-degenerate cases only the absolute position and the yaw angle are not observable. Consequently, the roll and pitch angles as well as the robot's velocity can be accurately tracked, which was confirmed by the experimental results. Compared to purely proprioceptive sensor setups, the obtained state estimate attains an unprecedented level of precision. The very generic formulation enables the filter to be extended with further sensory measurements and allows its implementation on various kinds of legged platforms.

Future work will include handling the unobservable states. Different approaches, like introducing coordinate transformations, partitioning the unobservable manifold, or implementing pseudo-measurements, could be evaluated. Fusion with exteroceptive sensors will also be investigated. More aggressive locomotion needs to be further tested: while it has been validated in simulation, future work will include dynamic walking on the real quadruped platform.

REFERENCES

[1] Z. Chen, K. Jiang, and J.C. Hung. Local observability matrix and its application to observability analyses. In Industrial Electronics Society, 16th Annual Conference of IEEE, Nov. 1990.
[2] A. Chilian, H. Hirschmüller, and M. Görner. Multisensor data fusion for robust pose estimation of a six-legged walking robot. In Intelligent Robots and Systems, IEEE/RSJ International Conference on, Sep. 2011.
[3] S. Chitta, P. Vernaza, R. Geykhman, and D.D. Lee. Proprioceptive localization for a quadrupedal robot on known terrain. In Robotics and Automation, IEEE International Conference on, Apr. 2007.
[4] J.A. Cobano, J. Estremera, and P. Gonzalez de Santos. Location of legged robots in outdoor environments. Robotics and Autonomous Systems, 56:751–761, 2008.
[5] N. El-Sheimy, Haiying Hou, and Xiaoji Niu. Analysis and modeling of inertial sensors using Allan variance. Instrumentation and Measurement, IEEE Transactions on, 57(1):140–149, Jan. 2008.
[6] B. Gassmann, F. Zacharias, J.M. Zöllner, and R. Dillmann. Localization of walking robots. In Robotics and Automation, IEEE International Conference on, Apr. 2005.
[7] O. Gur and U. Saranli. Model-based proprioceptive state estimation for spring-mass running. In Proceedings of Robotics: Science and Systems, Jun. 2011.
[8] R. Hermann and A. Krener. Nonlinear controllability and observability. Automatic Control, IEEE Transactions on, 22(5):728–740, Oct. 1977.
[9] G.P. Huang, A.I. Mourikis, and S.I. Roumeliotis. Analysis and improvement of the consistency of extended Kalman filter based SLAM. In Robotics and Automation, IEEE International Conference on, May 2008.
[10] Guoquan P. Huang, Anastasios I. Mourikis, and Stergios I. Roumeliotis. Observability-based rules for designing consistent EKF SLAM estimators. International Journal of Robotics Research, 29:502–528, Apr. 2010.
[11] M. Hutter, C. Gehring, M. Bloesch, M.A. Hoepflinger, C.D. Remy, and R. Siegwart. StarlETH: A compliant quadrupedal robot for fast, efficient, and versatile locomotion. In Climbing and Walking Robots, International Conference on, Jul. 2012.
[12] S.J. Julier and J.K. Uhlmann. A counter example to the theory of simultaneous localization and map building. In Robotics and Automation, IEEE International Conference on, May 2001.
[13] P.C. Lin, H. Komsuoglu, and D.E. Koditschek. A leg configuration measurement system for full-body pose estimates in a hexapod robot. Robotics, IEEE Transactions on, 21(3):411–422, Jun. 2005.
[14] P.C. Lin, H. Komsuoglu, and D.E. Koditschek. Sensor data fusion for body state estimation in a hexapod robot with dynamical gaits. Robotics, IEEE Transactions on, 22(5):932–943, Oct. 2006.
[15] M. Reinstein and M. Hoffmann. Dead reckoning in a dynamic quadruped robot: Inertial navigation system aided by a legged odometer. In Robotics and Automation, IEEE International Conference on, May 2011.
[16] S.P.N. Singh, P.J. Csonka, and K.J. Waldron. Optical flow aided motion estimation for legged locomotion. In Intelligent Robots and Systems, IEEE/RSJ International Conference on, Oct. 2006.
[17] F.-L. Tong and M.Q.-H. Meng. Localization for legged robot with single low-resolution camera using genetic algorithm. In Integration Technology, IEEE, Mar. 2007.
[18] C. Van Loan. Computing integrals involving the matrix exponential. Automatic Control, IEEE Transactions on, 23(3):395–404, Jun. 1978.
Extrinsic Calibration from Per-Sensor Egomotion

Jonathan Brookshire and Seth Teller
MIT Computer Science and Artificial Intelligence Laboratory, {jbrooksh, teller}@csail.mit.edu

Abstract—We show how to recover the 6-DOF transform between two sensors mounted rigidly on a moving body, a form of extrinsic calibration useful for data fusion. Our algorithm takes noisy, per-sensor incremental egomotion observations (i.e., incremental poses) as input and produces as output an estimate of the maximum-likelihood 6-DOF calibration relating the sensors, with accompanying uncertainty.

The 6-DOF transformation sought can be represented effectively as a unit dual quaternion with 8 parameters subject to two constraints. Noise is explicitly modeled (via the Lie algebra), yielding a constrained Fisher Information Matrix and Cramer-Rao Lower Bound. The result is an analysis of motion degeneracy and a singularity-free optimization procedure.

The method requires only that the sensors travel together along a motion path that is non-degenerate. It does not require that the sensors be synchronized, have overlapping fields of view, or observe common features. It does not require construction of a global reference frame or solving SLAM. In practice, from hand-held motion of RGB-D cameras, the method recovered inter-camera calibrations accurate to within ∼0.014 m and ∼0.022 radians (about 1 cm and 1 degree).

I. INTRODUCTION

Extrinsic calibration – the 6-DOF rigid body transform relating sensor coordinate frames – is useful for data fusion. For example, point clouds arising from different range sensors can be aligned by transforming one sensor's data into another sensor's frame, or all sensor data into a common body frame.

We show that the inter-sensor calibration and an uncertainty estimate can be accurately and efficiently recovered from the incremental poses (and uncertainties) observed by each sensor. Fig. 1 shows sensors r and s, each able to observe its own incremental motion, v_ri and v_si respectively, such that the calibration k satisfies:

$$v_{si} = g(v_{ri}, k) \quad (1)$$

where g simply transforms one set of motions according to k.

Fig. 1: The incremental motions of the r (red) and s (blue) sensors are used to recover the calibration between the sensors as the robot moves. The dotted lines suggest the incremental motions, v_ri and v_si, for sensors r and s, respectively. (In color; best viewed online.)

This paper describes an algorithm to find the k that best aligns two series of observed incremental motions. The algorithm takes as input two sets of 6-DOF incremental pose observations and a 6×6 covariance matrix associated with each incremental pose. It produces as output an estimate of the 6-DOF calibration and a Cramer-Rao lower bound (CRLB) [1] on the uncertainty of that estimate. (For source code and data, see http://rvsn.csail.mit.edu/calibration3d.)

Prior to describing our estimation method, we confirm that the calibration is in general observable, i.e., that there is sufficient information in the observations to define k uniquely. Observability analysis yields a characterization of singularities in the Fisher Information Matrix (FIM) arising from non-generic sensor motion. For example, singularity analysis reveals that the 6-DOF calibration cannot be recovered from planar-only motion, or when the sensors rotate only around a single axis. This confirms previous findings [3, 20] and provides a variance estimator useful in practice.

A key aspect of this work is the choice of representation for elements of the Special Euclidean group SE(3), each of which combines a translation in R³ with a rotation in SO(3). Ideally, we desire a representation that (1) supports vector addition and scaling, so that a principled noise model can be formulated, and (2) yields a simple form for g in Eq. 1, so that any singularities in the FIM can be readily identified.

We considered pairing translations with a number of rotation representations – Euler angles, Rodrigues parameters, and quaternions – but each lacks some of the criteria above. Instead, we compromise by representing each element of SE(3) as a unit dual quaternion (DQ) in the space H. Each DQ q ∈ H has eight parameters and can be expressed as:

$$q = q_r + \varepsilon q_\varepsilon \quad (2)$$

where q_r is a "real" unit quaternion representing the rotation, q_ε is the "dual part" representing the translation, and ε² = 0. An 8-element DQ is "over-parametrized" (and thus subject to two constraints) when representing a 6-DOF rigid body transform. Although DQ's are not minimal, they are convenient for this problem, combining in a way analogous to quaternion composition and yielding a simple form for g – about 20 lines of MatLab containing only polynomials and no trigonometric functions. An Euler-angle representation, by contrast, is minimal but produces much more complex expressions involving hundreds of lines of trigonometric terms. Homogeneous transformations yield a simple form of g, but require maintenance of many constraints. The DQ representation offers a good balance between compactness and convenience.

Ordinary additive Gaussian noise cannot be employed with DQ's, since doing so would produce points not on SE(3). Instead, we define a noise model using a projected Gaussian in the Lie algebra of DQ's [9], which is appropriate for this
over-parametrized form.

To identify singularities of the FIM, we adapt communication theory's "blind channel estimation" methods to determine the calibration observability. Originally developed to determine the CRLB on constrained imaginary numbers [18], these methods extend naturally to DQ's.

Background on calibration techniques and an introduction to DQ's is provided in § II. The problem is formally stated in § III, along with a noise model appropriate for the DQ representation. System observability is proven, and degenerate cases are discussed, in § IV. The optimization process for constrained parameters is described in § V, with techniques for resampling asynchronous data and converting between representations provided in § VI. Experimental results from simulated and real data are given in § VII.

II. BACKGROUND

A. Calibration

There are several ways to determine the calibration. One can attempt to physically measure the rigid body transform between sensors. However, manual measurement can be made difficult by the need to establish an origin at an inaccessible location within a sensor, or to measure around solid parts of the body to which the sensors are mounted.

Alternatively, the calibration can be mechanically engineered through the use of precision-machined mounts. This can work well for sensors in close proximity (e.g., stereo camera rigs), but is impractical for sensors placed far apart (e.g., on a vehicle or vessel).

The calibration could also be determined by establishing a global reference frame using Simultaneous Localization and Mapping (SLAM), then localizing each sensor within that frame (e.g., [11]). This approach has the significant disadvantage that it must invoke a robust SLAM solver as a subroutine.

Incremental motions have also been used to recover "hand-eye calibration" parameters. The authors in [5, 3, 20] recover the calibration between an end-effector and an adjacent camera by commanding the end-effector to move with some known velocity and estimating the camera motion. The degenerate conditions in [3, 20] are established through geometric arguments; we confirm their results via information theory and the CRLB. Further, the use of the CRLB allows our algorithm to identify a set of observations as degenerate, or nearly degenerate (resulting in a large variance), in practice. Dual quaternions were also used by [5]; we extend this notion and explicitly model the system noise via a projected Gaussian. Doing so allows us to confirm the CRLB in the simulated experiments of § VII-A.

B. Unit Dual Quaternions

Our calibration algorithm requires optimization over elements in SE(3). Optimization over rigid body transformations is not new (e.g., [8]) and is a key component of many SLAM solutions. In our setting, unit dual quaternions (DQ's) prove to be a convenient representation both because they yield a simple form for g (Eq. 1) and because they can be implemented efficiently [14]. Optimization with DQ's was also examined in [6], but their cost function included only translations; our optimization must simultaneously minimize rotation error. We review DQ's briefly here; the interested reader should see [15, p. 53-62] for details.

A DQ q can be written in several ways: as an eight-element vector [q0, ···, q7]; as two four-element vectors [q_r, q_ε] (c.f. Eq. 2); or as a sum of imaginary and dual components:

$$q = (q_0 + q_1 i + q_2 j + q_3 k) + \varepsilon(q_4 + q_5 i + q_6 j + q_7 k) \quad (3)$$

In this form, two DQ's multiply according to the standard rules for the imaginary numbers {i, j, k}.

We write DQ multiplication as a1 ◦ a2, where {a1, a2} ∈ H. When we have vectors of DQ's, e.g., a = [a1, a2] and b = [b1, b2], where {b1, b2} ∈ H, we write a ◦ b to mean [a1 ◦ b1, a2 ◦ b2].

A pure rotation defined by unit quaternion q_r is represented by the DQ q = [q_r, 0, 0, 0, 0]. A pure translation defined by t = [t0, t1, t2] can be represented by the DQ:

$$q = \left[1, 0, 0, 0, 0, \tfrac{t_0}{2}, \tfrac{t_1}{2}, \tfrac{t_2}{2}\right] \quad (4)$$

Given a rigid body transform q, the inverse transform q⁻¹ is:

$$q^{-1} = [q_0, -q_1, -q_2, -q_3, q_4, -q_5, -q_6, -q_7] \quad (5)$$

such that q ◦ q⁻¹ = q⁻¹ ◦ q = I = [1, 0, 0, 0, 0, 0, 0, 0]. A vector v = [v0, v1, v2] can be represented as a DQ by:

$$q_v = [1, 0, 0, 0, 0, v_0, v_1, v_2] \quad (6)$$

The DQ form q_v of vector v transforms according to q as:

$$q_v' = q \circ q_v \circ q^* \quad (7)$$

where q∗ is the dual-conjugate [14] of q:

$$q^* = [q_0, -q_1, -q_2, -q_3, -q_4, q_5, q_6, q_7] \quad (8)$$

DQ transforms can be composed as with unit quaternions; applying transform A, then transform B, to point v yields:

$$q_v' = q_B \circ (q_A \circ q_v \circ q_A^*) \circ q_B^* = q_{BA} \circ q_v \circ q_{BA}^* \quad (9)$$

where q_BA = q_B ◦ q_A. If the incremental motion v_ri and the calibration k are expressed as DQ's, then Eq. 1 becomes:

$$g(v_{ri}, k) := k^{-1} \circ v_{ri} \circ k \quad (10)$$

A DQ with q_rᵀ q_r = 1 and q_rᵀ q_ε = 0 (c.f. Eq. 2) has unit length. We impose these conditions as two constraints below.

C. Lie Groups & Lie Algebras

Lie groups are smooth manifolds for which associativity of multiplication holds, an identity exists, and the inverse is defined [7, 13]; examples include Rⁿ, SO(3), and SE(3). Rⁿ, however, is also a vector space (allowing addition, scaling, and commutativity), while SO(3) and SE(3) are not. For this reason, points in SO(3) and SE(3) cannot simply be interpolated or averaged. We use the Lie algebra to enable an optimization method that requires these operations.

The Lie algebra describes a local neighborhood (i.e., a tangent space) of a Lie group. So the Lie algebra of DQ's, h, can be used to express a vector space tangent to some point
in H. Within this vector space, DQ’s can be arithmetically be evaluated as:
manipulated. Once the operation is complete, points in the lim log q = [0, 0, 0, q5 , q6 , q7 ] (12)
Lie algebra can be projected back into the Lie group. θ→0

The logarithm maps from the Lie group to the Lie algebra; For compactness, we write log a to mean [log a1 , log a2 ].
the exponent maps from the Lie algebra to the Lie group. 2) Exponential of dual quaternions: If d ∈ h, w =
Both mappings are done at the identity. For example, if we [0, d0 , d1 , d2 ] is a quaternion and q = [w, 0, d3 , d4 , d5 ], we
have two elements of a Lie group {u, v} ∈ H, the “box” can exponentiate d as [17]:
notation of [9] expresses the difference between v and u as 1
exp d = (2 cos |w| + |w| sin |w|)I
d = v  u. (Here, {u, v} are each eight-element vectors and 2
d is a six-element vector.) That is, the box-minus operator 1 1
− (cos |w| − 3sinc|w|)q + (sinc|w|)q 2
connotes d = 2 log (u−1 ◦ v) where d ∈ h. In the Lie group, 2 2
u−1 ◦ v is a small transformation relative to the identity, i.e., −
1
(cos |w| − sinc|w|)q 3
(13)
relative to u−1 ◦ u (see Fig. 2). The factor of two before the 2|w|2
log is a result of the fact that DQ’s are multiplied on the left The singularity at w = 0 can be avoided by evaluating:
and right of the vector during transformation (c.f. Eq. 7). lim exp d = I + q (14)
Similarly, the box-plus addition operator [9] involves expo- |w|→0
nentiation. If d ∈ h, then exp d ∈ H. If d = 2 log (u−1 ◦ v),
III. P ROBLEM S TATEMENT
then u ◦ exp d2 = u ◦ exp (log (u−1 ◦ v)) = u ◦ u−1 ◦ v = v.
Since d applies a small transform to u, we use the box-plus Given the observed incremental motions and their covari-
operator to write v = u  d = u ◦ exp d2 . ances, the problem is to estimate the most likely calibration.
ʹŽ‘‰ሺ‫ିݑ‬ଵ ‫ ݑ ל‬ൌ ‫ܫ‬ሻ 2Ž‘‰ ‫ିݑ‬ଵ ‫ ݒ ל‬ൌ ݀ We formulate this task as a non-linear least squares optimiza-
ज़ tion with a state representation of DQ’s. DQ’s were chosen to
‫ିݑ‬ଵ ‫ݑ ל‬
Ž‘‰
verify the CRLB in § VII-A. In § V, we show how to perform
‡š’ the calculation in the over-parametrized state space. (We chose
an over-parametrization rather than a minimal representation,
‫ିݑ‬ଵ ‫ ݒ ל‬ൌ ‡š’ሺ݀ൗʹሻ in order to avoid singularities.)
The calibration to estimate is k = [k0 , · · · , k7 ], where
k ∈ H. The 6-DOF incremental poses from each sensor form
ԯ a series of DQ observations (our experiments use FOVIS
Fig. 2: The mapping between the Lie group, H, and the Lie [10] and KinectFusion [16]). Let zr = [zr1 , zr2 , · · · , zrN ]
algebra, h, is performed at the identity, u−1 ◦ u. and zs = [zs1 , zs2 , · · · , zsN ] be the observations from sensor
This definition of the difference between DQ’s yields a r and s, respectively. Note that {zri , zsi } ∈ H. Finally, let
Gaussian distribution as follows: imagine a Gaussian drawn on z = [zr , zs ] be the (2N ) observations. Both the incremental
the line h in Fig. 2. Exponentiating points on this Gaussian poses of sensor r and the calibration must be estimated [2],
“projects” the distribution onto H. This projected Gaussian so the state to estimate is x = [vr , k] consisting of (N + 1)
serves as our noise model (§ III). DQ’s, where vr = [vr1 , vr2 , · · · , vrN ] is the latent incremental
motion of sensor r.
Summarizing, the Lie group/algebra enables several key
operations: (1) addition of two DQ’s, (2) subtraction of two We then formulate the task as a maximum likelihood
DQ’s, and (3) addition of noise to a DQ. optimization [2]:
&N
x̂M L (z) = argmax P (zri |vri )P (zsi |vri , k) (15)
1) Logarithm of dual quaternions: The logarithm of some x=[vr ,k] i=1
q ∈ H can be calculated as [17]: under the constraint that {vri , k} ∈ H.
 1
log q = [(2θ − sin (2θ))q 3 The probability functions might be assumed to be Gaussian.
4(sin θ)3 However, it is clear that adding Gaussian noise to each term
+ (−6θ cos θ + 2 sin (3θ))q 2 of a DQ will not generally result in a DQ. Instead, we use the
+ (6θ − sin (2θ) − sin (4θ))q projected Gaussian.
+ (−3θ cos θ + θ cos (3θ) By comparison, other approaches [5, 3, 20] simply ignore
 noise and assume that observations and dimensions should
− sin θ + sin (3θ))I] 1:3,5:7 (11) be equally weighted. However, when observation uncertainty
where θ is the rotation angle associated with the DQ, and varies (e.g., when the number of features varies for a vision
exponentiation of a DQ is implemented through repeated algorithm) or when the uncertainty among dimensions varies
multiplication (◦). (This expression incorporates a correction, (e.g., translations are less accurate than rotations), explicit
provided by Selig, to that given in [17].) The (·)1:3,5:7 removes representation of noise minimizes the effects of inaccurate ob-
the identically zero-valued first and fifth elements from the 8- servations. Further, a principled noise model allows recovery
vector. To avoid the singularity at θ = 0, the limit of log q can of the CRLB and, thus, the calibration’s uncertainty.

27
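The DQ algebra of § II-B and the box operators of § II-C can be sketched in a few dozen lines of Python. The helpers below are my own minimal implementations: the dual part of a rigid transform is obtained by composing the pure-translation DQ of Eq. (4) with a pure rotation via Eq. (9), and the exp/log maps are replaced by their first-order small-transform limits (c.f. Eqs. (12) and (14)), so the box operators here are valid only for small increments, not a substitute for the full expressions (11) and (13).

```python
import numpy as np

def qmul(a, b):
    """Hamilton product of quaternions [w, x, y, z]."""
    w1, x1, y1, z1 = a
    w2, x2, y2, z2 = b
    return np.array([w1*w2 - x1*x2 - y1*y2 - z1*z2,
                     w1*x2 + x1*w2 + y1*z2 - z1*y2,
                     w1*y2 - x1*z2 + y1*w2 + z1*x2,
                     w1*z2 + x1*y2 - y1*x2 + z1*w2])

def dqmul(a, b):
    """DQ product: real parts multiply; the dual part is the epsilon cross-term."""
    return np.concatenate([qmul(a[:4], b[:4]),
                           qmul(a[:4], b[4:]) + qmul(a[4:], b[:4])])

def dq_from_rt(qr, t):
    """Unit DQ for 'rotate by qr, then translate by t' (Eq. (4) composed via Eq. (9))."""
    return np.concatenate([qr, 0.5 * qmul(np.array([0.0, *t]), qr)])

def dq_inv(q):        # Eq. (5): inverse of a unit DQ
    return q * np.array([1, -1, -1, -1, 1, -1, -1, -1.0])

def dq_dual_conj(q):  # Eq. (8): dual-conjugate used in the sandwich product
    return q * np.array([1, -1, -1, -1, -1, 1, 1, 1.0])

def dq_transform_point(q, v):
    """Eq. (7): embed v as in Eq. (6), sandwich, read back the last three entries."""
    qv = np.concatenate([[1.0, 0.0, 0.0, 0.0, 0.0], v])
    return dqmul(dqmul(q, qv), dq_dual_conj(q))[5:]

def boxplus(u, d):
    """v = u o exp(d/2), with exp replaced by its first-order limit (Eq. (14))."""
    e = np.array([1.0, d[0]/2, d[1]/2, d[2]/2, 0.0, d[3]/2, d[4]/2, d[5]/2])
    return dqmul(u, e)

def boxminus(v, u):
    """d = 2 log(u^-1 o v), with log replaced by a first-order version of Eq. (12)."""
    q = dqmul(dq_inv(u), v)
    return 2.0 * q[[1, 2, 3, 5, 6, 7]]
```

For example, the DQ built from a 90° rotation about z and t = (1, 2, 3) maps the point (1, 0, 0) to (1, 3, 3), satisfies the two unit constraints q_rᵀq_r = 1 and q_rᵀq_ε = 0, and `boxminus(boxplus(u, d), u)` recovers a small d.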
A. Process Model

The process model can be written as:

$$z = G(x) \circ \exp\frac{\delta}{2} \equiv G(x) \boxplus \delta \quad (16)$$
$$G(x) = [v_{r1}, \cdots, v_{rN}, g(v_{r1}, k), \cdots, g(v_{rN}, k)] \quad (17)$$

where δ ∈ h acts as a projected Gaussian: δ ∼ N(0, Σ_z). Here, the expected observations G(x) have been corrupted by a noisy transformation with δ. Notice that the process model is not the typical additive Gaussian noise, z = G(x) + δ, which would result in z ∉ H.

The difference between the observations, z, and the expected values, G(x), is λ = −2 log(z⁻¹ ◦ G(x)), where λ includes 12N parameters, six for each of the 2N observations. The posteriors in Eq. 15 can be written:

$$P(z|x) \sim f(z^{-1} \circ G(x))$$
$$f(z^{-1} \circ G(x)) = \frac{1}{\sqrt{(2\pi)^{12N}|\Sigma_z|}}\, e^{-\frac{1}{2}\lambda^T \Sigma_z^{-1} \lambda} \quad (18)$$

IV. OBSERVABILITY

Assume we have some estimate x̂ of the true parameters x0. We wish to know how the estimate varies, so we calculate the covariance E[(x̂ − x0)(x̂ − x0)ᵀ]. Cramer and Rao [19] showed that this quantity can be no smaller than the inverse of J, the Fisher Information Matrix (FIM):

$$E\left[(\hat{x} - x_0)(\hat{x} - x_0)^T\right] \geq J^{-1} \quad (19)$$

The CRLB is critical because (1) it defines a best-case covariance for our estimate, and (2) if J is singular, no estimate exists. If (and only if) x cannot be estimated, then x is unobservable. Indeed, we wish to identify the situations under which x is unobservable and arrange to avoid them in practice.

A. Shift Invariance

For an unbiased estimator, the FIM is:

$$J = E\left[(\nabla_x \ln P(z|x))(\nabla_x \ln P(z|x))^T\right] \quad (20)$$

The FIM, as it is based on the gradient, is invariant under rigid body transformations, so the FIM of a distribution f(z⁻¹ ◦ G(x)) equals that of the distribution f(G(x)). This is because the expectation is an integration over SE(3) and is unaffected by a shift [4]. Thus, we analyze P(z|x) ∼ f(G(x)) to produce an equivalent FIM.

Let H(y) = −2 log y and λ = H(G(x)). H accepts a 16N×1 parameter vector of DQ's in the Lie group, and returns a 12N×1 vector of differences in the Lie algebra.

B. Fisher Information Matrix

In order to calculate the FIM for our estimation problem, we first find the gradient of λ by applying the chain rule:

$$\nabla_x \lambda = \left[\nabla_p H(p)\big|_{p=G(x)}\right]^T \left[\nabla_x G(x)\right] = \underbrace{J_H}_{12N \times 16N}\,\underbrace{J_G}_{16N \times 8(N+1)} \quad (21)$$

Then, we calculate:

$$\nabla_x \ln P(z|x) = \nabla_x \ln c\, e^{-\frac{1}{2}\lambda^T \Sigma_z^{-1} \lambda} \quad (22)$$
$$= \nabla_x\left(-\frac{1}{2}\lambda^T \Sigma_z^{-1} \lambda\right) \quad (23)$$
$$= (J_H J_G)^T \Sigma_z^{-1} \lambda \quad (24)$$
$$= J_G^T J_H^T \underbrace{\Sigma_z^{-1}}_{12N \times 12N}\,\underbrace{\lambda}_{12N \times 1} \quad (25)$$

Substituting into Eq. 20:

$$J = E\left[(\nabla_x \ln P(z|x))(\nabla_x \ln P(z|x))^T\right] \quad (26)$$
$$= E\left[\left(J_G^T J_H^T \Sigma_z^{-1} \lambda\right)\left(J_G^T J_H^T \Sigma_z^{-1} \lambda\right)^T\right] \quad (27)$$
$$= E\left[J_G^T J_H^T \Sigma_z^{-1} \lambda \lambda^T \Sigma_z^{-1} J_H J_G\right] \quad (28)$$
$$= J_G^T J_H^T \Sigma_z^{-1}\, E\left[\lambda\lambda^T\right] \Sigma_z^{-1} J_H J_G \quad (29)$$
$$= J_G^T J_H^T \Sigma_z^{-1} \Sigma_z \Sigma_z^{-1} J_H J_G \quad (30)$$
$$= J_G^T J_H^T \Sigma_z^{-1} J_H J_G \quad (31)$$

Since each of the (N + 1) DQ's in x is subject to two constraints, J_G is always rank-deficient by at least 2(N + 1). Of course, our interest is not in rank deficiencies caused by over-parametrization but in singularities due to the observations. We must distinguish singularities caused by over-parametrization from those caused by insufficient or degenerate data.

C. Cramer-Rao Lower Bound

In the communications field, a related problem is to estimate complex parameters of the wireless transmission path. These complex "channel" parameters are usually unknown but obey some constraints (e.g., they have unit magnitude). Since the CRLB is often useful for system tuning, Stoica [18] developed techniques to determine the CRLB with constrained parameters.

Following [18], suppose the m constraints are expressed as f^c = [f_1^c, ···, f_m^c] such that f^c(x) = 0. Let F^c be the Jacobian of f^c and U be the null space of F^c such that F^c(x)U = 0. When K = UᵀJU is non-singular, the CRLB exists. In our case,

$$K = U^T J U = U^T J_G^T J_H^T \Sigma_z^{-1} J_H J_G U \quad (32)$$
$$= U^T J_G^T J_H^T L L^T J_H J_G U \quad (33)$$
$$= (L^T J_H J_G U)^T (L^T J_H J_G U) \quad (34)$$

where Σ_z⁻¹ = LLᵀ by the Cholesky factorization. In order to find the cases where J is singular, we examine the rank of K:

$$\operatorname{rank}(K) = \operatorname{rank}\left((L^T J_H J_G U)^T (L^T J_H J_G U)\right) \quad (35)$$

Since rank(AᵀA) = rank(A),

$$\operatorname{rank}(K) = \operatorname{rank}(L^T J_H J_G U) \quad (36)$$

Further, since each observation is full rank, Σ_z is full rank; Lᵀ is a 12N × 12N matrix with rank 12N, and rank(LᵀA) = rank(A). Thus (see Appendix A):

$$\operatorname{rank}(K) = \operatorname{rank}(J_H J_G U) = \operatorname{rank}(J_G U). \quad (37)$$
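The null-space construction of Eqs. (32)-(37) is straightforward to reproduce numerically. The sketch below is my own helper (it obtains U from an SVD rather than assuming a particular null-space basis); it returns the constrained CRLB when K = UᵀJU is non-singular, and None otherwise, mirroring the degeneracy test used here:

```python
import numpy as np

def constrained_crlb(J, Fc, tol=1e-12):
    """Constrained CRLB following the construction of Eqs. (32)-(37):
    U spans the null space of the constraint Jacobian Fc; the bound exists
    iff K = U^T J U is non-singular, in which case it equals U K^{-1} U^T."""
    _, s, Vt = np.linalg.svd(Fc)
    rank = int(np.sum(s > tol * s[0]))
    U = Vt[rank:].T                      # orthonormal basis with Fc @ U = 0
    K = U.T @ J @ U
    if np.linalg.matrix_rank(K) < K.shape[0]:
        return None                      # degenerate observations: unobservable
    return U @ np.linalg.inv(K) @ U.T
```

For a toy unit-norm constraint x0² + x1² = 1 evaluated at x = (1, 0), the bound collapses onto the single tangent direction; if J carries no information along that direction, the function reports degeneracy instead of an overconfident bound.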
1 5 10 15 18
1 1

10 10 Fig. 3: Visualization of the ma-


trix JG U shows that only the
first six columns can reduce.
Blank entries are zero; orange
are unity; red are more com-
20 20 plex quantities. (Full expres-
sions for the matrix elements
shown in red are included at
the URL given in § I.)

32 32
1 5 10 15 18
Fig. 4: Two robots driven along the suggested paths experience rotation about only one axis (green). As a result, the true calibration relating the two true sensor frames (red/blue) cannot be determined. The magenta lines and frames show ambiguous locations for the red sensor frame.

D. Degeneracies

As shown in [2], one observation is insufficient to recover the calibration. For two observations and a particular null space matrix, U, JG U has the form shown in Fig. 3. Notice that columns 7-18 are always linearly independent due to the unity entries. This is not surprising since vri, the estimated motion of sensor r, is directly observed by zri. Only the first six columns, corresponding to the six DOF's of the calibration, can possibly reduce. By examining linear combinations of these columns, we can identify a singularity which, if avoided in practice, will ensure that the calibration is observable.

Suppose the two incremental motions experienced by sensor r are {a, b} ∈ H and let ai be the i-th term of the 8-element DQ, a. When a1 b3 = a3 b1 and a2 b3 = a3 b2, JG U is singular and the calibration is unobservable. Since the 2nd-4th elements of the DQ correspond to the 2nd-4th elements of a unit quaternion representing the rotation, it follows that these relations hold only when there is no rotation or when the rotation axes of a and b are parallel. Thus, the calibration is unobservable when the sensors rotate only about parallel axes.

In principle, this analysis could have been completed using any representation for SE(3). However, attempting the analysis using Euler angles and Mathematica 8.0 exceeded 24 GB of memory without completing; manual analysis was equally difficult. By contrast, the DQ over-parametrization made both manual and automated analyses tractable to perform and simple to interpret, making readily apparent the degeneracy arising from a fixed axis of rotation.

The condition to avoid degeneracy has several common special cases:

1) Constant velocity. When the sensors move with constant velocity, the axes of rotation are constant.
2) Translation only. When the sensors only translate, no rotation is experienced.
3) Planar motion. When the robot travels only in a plane, the rotation axis is fixed (perpendicular to the plane).

These special cases, however, do not fully characterize the degeneracy. So long as the axis of rotation of the incremental poses remains fixed, any translations and any magnitude of rotation will not avoid singularity.

In Fig. 4, for example, a robot is traveling along the suggested terrain. Although the robot translates and rotates some varying amount between poses, it always rotates about the same axis (green lines). In such situations, the calibration is ambiguous at least along a line parallel to the axis of rotation (magenta line). That is, if the calibration is translated along such a line, the observations from sensor s remain fixed. Thus, because multiple calibrations result in the same observations, the calibration is unobservable.

V. OPTIMIZATION

In order to estimate the calibration in Eq. 15, we perform non-linear least squares optimization, using a modified Levenberg-Marquardt (L-M) algorithm [9]. The optimization proceeds as:

x_{t+1} = x_t ⊞ [ −(J_t^T Σ^{-1} J_t)^{-1} J_t^T Σ^{-1} (G(x_t) ⊟ z) ]   (38)

The term G(x_t) ⊟ z represents error elements in h, which are scaled by the gradient and added (via ⊞) to the current parameter estimate to produce a new set of DQ's. The method computes error, and applies corrections to the current estimates, in the tangent space via the Lie algebra. After each update, the parameters lie in H.

J_t is the analytic Jacobian at the current estimate [21]:

J_t = ∇_h [ z^{-1} ◦ G(x_t ◦ exp(h/2)) ] |_{h=0}   (39)

Essentially, the method shifts the parameters x_t via h ∈ h, then evaluates that shift at h = 0.
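The tangent-space update of Eq. 38 can be illustrated in isolation. The sketch below is not the paper's full L-M solver; it works on unit quaternions (the rotation part only, rather than full dual quaternions) and shows how the ⊞/⊟ machinery keeps every iterate on the manifold while corrections live in a plain R³ tangent space:

```python
import numpy as np

def qmul(a, b):
    """Hamilton product of quaternions stored as [w, x, y, z]."""
    w1, x1, y1, z1 = a
    w2, x2, y2, z2 = b
    return np.array([w1*w2 - x1*x2 - y1*y2 - z1*z2,
                     w1*x2 + x1*w2 + y1*z2 - z1*y2,
                     w1*y2 - x1*z2 + y1*w2 + z1*x2,
                     w1*z2 + x1*y2 - y1*x2 + z1*w2])

def qconj(q):
    return q * np.array([1.0, -1.0, -1.0, -1.0])

def qexp(h):
    """Map a tangent vector h in R^3 to a unit quaternion."""
    n = np.linalg.norm(h)
    if n < 1e-12:
        return np.array([1.0, 0.0, 0.0, 0.0])
    return np.concatenate(([np.cos(n)], np.sin(n) * h / n))

def qlog(q):
    """Inverse of qexp: unit quaternion -> tangent vector in R^3."""
    v = q[1:]
    n = np.linalg.norm(v)
    return np.zeros(3) if n < 1e-12 else np.arctan2(n, q[0]) * v / n

def boxplus(x, h):
    return qmul(x, qexp(h / 2.0))          # shift x by a tangent correction

def boxminus(z, x):
    return 2.0 * qlog(qmul(qconj(x), z))   # tangent-space error of z w.r.t. x

# Damped fixed-point iteration x <- x [+] 0.5*(z [-] x): the iterate remains
# a unit quaternion after every update, mirroring the Sec. V scheme in which
# corrections are computed in the Lie algebra and applied on the group.
x = np.array([1.0, 0.0, 0.0, 0.0])
z = qexp(np.array([0.0, 0.0, np.pi / 4]))  # target rotation
for _ in range(30):
    x = boxplus(x, 0.5 * boxminus(z, x))
```

The error halves at each step, so `x` converges to `z` without any explicit renormalization; the same pattern extends to the 8-element DQ case once the corresponding exp/log maps are substituted.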
VI. INTERPOLATION

Although the DQ representation facilitates the FIM analysis, and there are methods to develop a noise model, data and covariance matrices will typically be available in more common formats, such as Euler angles. Furthermore, sensors are rarely synchronized, so incremental motions may be observed over different sample periods. Following the process in [2], we use the Scaled Unscented Transform (SUT) [12] to (1) convert incremental motion and covariance data to the DQ representation and (2) resample data from different sensors at common times.

The SUT creates a set of sigma points, X, centered about the mean and spaced according to the covariance matrix. Each point is passed through the interpolation function f_i to produce a set of transformed points, Y. A new distribution is then created to approximate the weighted Y. We employ the process adapted by [9] to incorporate the Lie algebra.

First, f_i converts the Euler states to DQ's. Second, it accumulates the incremental motions into a common reference frame and resamples them at the desired instants, typically the sample times of the reference sensor. This resampling is done via the DQ SLERP operator [14] which interpolates between poses with constant speed and shortest path. The function then calculates the incremental motions.

The SUT requires that transformed sigma points, Y, be averaged according to some weights, w. Because averaging each individual element of a DQ would not result in a DQ, the mean of a set of DQ's must be estimated by optimization. We wish to find the mean b̂ that minimizes the error term in:

b̂(w, Y) = argmin_b Σ_{i=1}^{N} ‖ w_i log(b^{-1} ◦ Y_i) ‖   (40)

The averaging procedure given in [9], intended for non-negative weights, fails to converge when the Y_i's are similar (i.e., for small covariance). Since our SUT implementation can return negative weights, we use L-M optimization (Eq. 38) with the mean cost function given in Eq. 40.

VII. RESULTS

A. Simulation

We validated the calibration estimates and constrained CRLB by simulating robot motion along the path shown in Fig. 5. The N = 62 observations and covariances per sensor were simulated using a translation/Euler angle parametrization (instead of DQ's) to model data obtained from sensors in practice. Additive Gaussian noise is applied to this minimal representation with a magnitude 10-40% of the true value.

Thirty different calibrations were simulated, each with rigid body parameters k uniformly drawn from [±3 m, ±3 m, ±3 m, ±π, ±π, ±π]. The observations and covariances were then converted to DQ's using the interpolation method of § VI. We validated the CRLB by performing 400 Monte Carlo simulations [1] for each calibration, sampling each velocity vector from its Gaussian distribution.

Fig. 5: Motion simulated such that the red and blue sensors traveled the paths shown. The path is always non-degenerate; in this image k = [0.1, 0.05, 0.01, 0, 0, π/3].

Fig. 6: Histograms (gray) of calibration estimates from 400 simulations of the path in Fig. 5 match well with the true calibration (green triangles) and constrained CRLB (green diamonds). Black lines indicate the sample mean (solid) and one standard deviation (dashed); red lines show a fitted Gaussian.

Fig. 6 shows results for a sample calibration:

kEuler = [2.79 m, −2.79 m, −1.45 m, −0.51 r, 0.94 r, −1.22 r]

in meters (m) and radians (r) or, in DQ form, at:

kDQ = [0.77, 0.07, 0.49, −0.40, 0.30, 1.99, −0.57, 0.21]

As shown, the mean and standard deviations from the simulations match well with the true value and the CRLB, respectively. It is important to note that the covariance matrix corresponding to Fig. 6 is calculated on an over-parametrization; there are only six DOF's in the 8-element DQ representation. Due to these dependent (i.e., constrained) parameters, the covariance matrix is singular. However, because we use the Lie algebra during optimization and filtering, we can employ the DQ parametrization to avoid problems associated with singular covariances.

Fig. 7 shows the error between the thirty true calibrations and the mean of the estimated values for one of the DQ
parameters. The other DQ elements (not shown) behaved similarly; they were recovered to within about 0.01 of truth.

Fig. 7: The error between the known calibration and the mean estimate was less than ±0.01 for each DQ parameter.

Fig. 8 compares the standard deviation of the parameters resulting from the Monte Carlo experiments and the predicted CRLB. In general, the method came close (within about 0.01) to the best-case CRLB.

Fig. 8: The standard deviation of the simulations and predicted CRLB agreed to within ∼0.01 for each DQ parameter.

B. Real data

We further validated the estimator with two different types of depth sensors and motion estimation algorithms. First, we collected data with two Microsoft Kinect RGB-D cameras, mounted on three different machined rigs with known calibrations. The RGB-D data from each camera was processed using the Fast Odometry from VISion (FOVIS) [10] library, which uses image features and depth data to produce incremental motion estimates. Second, we collected data with two rigidly mounted Xtion RGB-D cameras and used the KinectFusion algorithm [16] for motion estimation. For all four calibrations, we moved each rig by hand along a path in 3D. The interpolation algorithm (§ VI) was used to synchronize the translations/Euler angles and convert to DQ's.

We characterized the noise in both systems using data from stationary sensors. We projected the noise into h and, using a chi-squared goodness-of-fit test, found the projected Gaussian to be a good approximation (at 5% significance) for both FOVIS and KinectFusion.

TABLE I: Calibrations recovered from real data

             x (m)    y (m)    z (m)    ρ (r)    ϑ (r)    ψ (r)
True 1 (a)   -0.045   -0.305   -0.572   -1.316    0.906   -1.703
Error (b)     0.000    0.011    0.011    0.022    0.002    0.009
True 2 (a)   -0.423   -0.004    0.006   -0.000    0.000    3.141
Error (b)    -0.007   -0.014   -0.003   -0.019    0.003   -0.007
True 3 (a)   -0.165    0.204   -0.244    1.316   -0.906    3.009
Error (b)    -0.007    0.003   -0.013    0.003    0.000   -0.005
True 4 (a)   -0.040    0.025    0.000   -0.052    0.000    3.141
Error (b)    -0.006    0.008    0.006    0.013   -0.017    0.005

(a) Ground truth calibration
(b) Difference between mean of the estimates and the true calibration

Table I summarizes the results of the four different calibrations. For clarity, the transforms are shown as translations and Euler angles, but all processing was done with DQ's. We assumed a constant variance for each DOF. The first three calibrations used Kinects and FOVIS with 2N ≈ 2000 observations; the last used Xtions and KinectFusion with 2N ≈ 400. In each case, the method recovered the inter-sensor translation to within about 1.4 cm, and the rotation to within about 1.26 degrees.

We assessed the estimator's consistency with three rigidly mounted depth cameras r, s, and t (Fig. 9). We estimated the pairwise calibrations k_{s,r}, k_{t,s}, and k_{r,t}, where, e.g., k_{s,r} is the calibration between r and s. The closed loop of estimated transformations should return to the starting sensor frame:

k_{s,r} ◦ k_{t,s} ◦ k_{r,t} = I   (41)

Fig. 9: We assess the method's consistency by recovering the loop of calibrations relating three RGB-D sensors.

The accumulated error was small: translation error was [−4.27, −2.66, 7.13] mm and rotation error (again in Euler angles) was [7.03, −5.20, −1.49] milliradians.

VIII. CONCLUSION

We described a practical method that recovers the 6-DOF rigid body transform between two sensors, from each sensor's observations of its 6-DOF incremental motion. Our contributions include treating observation noise in a principled manner, allowing calculation of a lower bound on the uncertainty of the
estimated calibration. We show that the system is unobservable when rotation occurs only about parallel axes.

Additionally, we illustrate the use of a constrained DQ parametrization which greatly simplified the algebraic machinery of degeneracy analysis. Such over-parametrizations are typically avoided in practice, however, because they make it difficult to perform vector operations (addition, scaling, averaging, etc.), develop noise models, and identify system singularities. We assemble the tools for each required operation, employing the Lie algebra to define local vector operations and a suitable projected Gaussian noise model. Finally, we demonstrated that the constrained form of the CRLB enables system observability to be shown.

The method accurately recovers the 6-DOF transformations relating pairs of asynchronous, rigidly attached sensors, requiring only hand-held motion of the sensors through space.

We gratefully acknowledge the support of the Office of Naval Research through award #N000141210071.

APPENDIX

Fig. 10: The matrix N(JH)^T, depicted here for N = 2, reveals 4N DOF's corresponding to the constraints of the 2N DQ's in z. Blank entries are zero; filled are unity.

If A is D × E, B is E × F, N(A) is the null space of A, R(B) is the column space of B, and dim A is the number of vectors in the basis of A, then rank(AB) = rank(B) − dim[N(A) ∩ R(B)]. Substituting from Eq. 37,

rank(JH JG U) = rank(JG U) − dim[N(JH) ∩ R(JG U)]

Intuitively, this means that if a column of JG U lies in the null space of JH, information is lost during the multiplication and the rank of the matrix product is reduced. In order to show that rank(JH JG U) = rank(JG U), there are two cases:

1) If JG U is singular, then rank(JG U) < 6(N + 1), where N is the number of observations. This implies rank(JH JG U) < 6(N + 1). Thus, JG U is singular implies JH JG U is singular.

2) If JH JG U is singular, then either JG U is singular or dim[N(JH) ∩ R(JG U)] > 0.
   • If JG U is singular, then this is the case above.
   • If JG U is not singular, then rank(JG U) = 6(N + 1). The task then becomes to determine dim[N(JH) ∩ R(JG U)]. Since JG U is full rank, R(JG U) is the columns of JG U. Furthermore, there are 4N columns in N(JH), one for each of the two constraints of the 2N DQ's. (Fig. 10 shows N(JH) for N = 2.) It can be shown that rank([JH, JG U]) = rank(JH) + rank(JG U). In other words, none of the columns of N(JH) will intersect with the columns of JG U. Thus, N(JH) ∩ R(JG U) = ∅ and rank(JH JG U) = rank(JG U). Since JG U is not singular, JH JG U is not singular, which is a contradiction. Only the former possibility remains, and JH JG U is singular implies JG U is singular.

JH JG U is singular if and only if JG U is singular. This is not a surprising result; the log function preserves information when mapping between the Lie group and the Lie algebra.

REFERENCES

[1] Y. Bar-Shalom, T. Kirubarajan, and X. Li. Estimation with Applications to Tracking and Navigation. John Wiley & Sons, Inc., New York, NY, USA, 2002.
[2] J. Brookshire and S. Teller. Automatic calibration of multiple coplanar sensors. RSS, 2011.
[3] H. H. Chen. A screw motion approach to uniqueness analysis of head-eye geometry. In CVPR, Jun 1991.
[4] G. Chirikjian. Information theory on Lie groups and mobile robotics applications. In ICRA, 2010.
[5] K. Daniilidis. Hand-eye calibration using dual quaternions. IJRR, 18, 1998.
[6] D. W. Eggert, A. Lorusso, and R. B. Fisher. Estimating 3-D rigid body transformations: a comparison of four major algorithms. Mach. Vision Appl., 9(5-6), March 1997.
[7] V. Govindu. Lie-algebraic averaging for globally consistent motion estimation. In CVPR, 2004.
[8] G. Grisetti, S. Grzonka, C. Stachniss, P. Pfaff, and W. Burgard. Efficient estimation of accurate maximum likelihood maps in 3D. In IROS, Nov 2007.
[9] C. Hertzberg, R. Wagner, U. Frese, and L. Schröder. Integrating generic sensor fusion algorithms with sound state representations through encapsulation of manifolds. CoRR, 2011.
[10] A. Huang, A. Bachrach, P. Henry, et al. Visual odometry and mapping for autonomous flight using an RGB-D camera. In ISRR, Aug 2011.
[11] E. Jones and S. Soatto. Visual-inertial navigation, mapping and localization: A scalable real-time causal approach. IJRR, Oct 2010.
[12] S. Julier. The scaled unscented transformation. In Proc. ACC, volume 6, pages 4555–4559, 2002.
[13] K. Kanatani. Group Theoretical Methods in Image Understanding. Springer-Verlag New York, Inc., 1990.
[14] L. Kavan, S. Collins, J. Zara, and C. O'Sullivan. Geometric skinning with approximate dual quaternion blending. ACM Transactions on Graphics, 27, 2008.
[15] J. McCarthy. An Introduction to Theoretical Kinematics. MIT Press, 1990.
[16] R. Newcombe, A. Davison, S. Izadi, et al. KinectFusion: real-time dense surface mapping and tracking. In ISMAR, Oct 2011.
[17] J. Selig. Exponential and Cayley maps for dual quaternions. Adv. in App. Clifford Algebras, 20, 2010.
[18] P. Stoica and B. C. Ng. On the Cramér-Rao bound under parametric constraints. IEEE Signal Processing Letters, 5(7):177–179, Jul 1998.
[19] H. Van Trees. Detection, Estimation, and Modulation Theory, Part I. John Wiley & Sons, New York, 1968.
[20] R. Y. Tsai and R. K. Lenz. A new technique for fully autonomous and efficient 3D robotics hand/eye calibration. IEEE Trans. Robot. Autom., 5(3), Jun 1989.
[21] A. Ude. Nonlinear least squares optimisation of unit quaternion functions for pose estimation from corresponding features. In ICPR, volume 1, Aug 1998.
Colour-Consistent Structure-from-Motion Models
Using Underwater Imagery
Mitch Bryson, Matthew Johnson-Roberson, Oscar Pizarro and Stefan B. Williams
Australian Centre for Field Robotics
University of Sydney, Australia
email: [email protected]

Abstract—This paper presents an automated approach to


correcting for colour inconsistency in underwater images col-
lected from multiple perspectives during the construction of 3D
structure-from-motion models. When capturing images under-
water, the water column imposes several effects on images that
are negligible in air such as colour-dependant attenuation and
lighting patterns. These effects cause problems for human inter-
pretation of images and also confound computer-based techniques
for clustering and classification. Our approach exploits the 3D
structure of the scene generated using structure-from-motion
and photogrammetry techniques accounting for distance-based
attenuation, vignetting and lighting pattern, and improves the
consistency of photo-textured 3D models. Results are presented
using imagery collected in two different underwater environments
using an Autonomous Underwater Vehicle (AUV).

I. I NTRODUCTION
In recent years, marine biologists and ecologists have in-
creasingly relied on remote video and imagery from Au-
tonomous Underwater Vehicles (AUVs) for monitoring marine
benthic habitats such as coral reefs, boulder fields and kelp
forests. Imagery from underwater habitats can be used to
classify and count the abundance of various species in an area.
Data collected over multiple sampling times can be used to
infer changes to the environment and population, for example
due to pollution, bio-invasion or climate change. To provide
a spatial context to collected imagery, techniques in structure-from-motion, photogrammetry and Simultaneous Localisation And Mapping (SLAM) have been applied to provide three-dimensional (3D) reconstructions of the underwater terrain using collected images from either monocular or stereo camera setups and other navigational sensor information (for example see [2, 3, 6, 9]).

Fig. 1: Example Images and a 3D Structure-from-Motion Model of an Underwater Environment: Top, example underwater images taken from an AUV. Bottom, a 3D terrain model derived from stereo image pairs in a structure-from-motion processing pipeline where textures on the right half of the model are created from the collected imagery. Vehicle-fixed lighting patterns and water-based light attenuation cause foreground objects to appear bright and red with respect to background objects that appear dark and blue/green resulting in texture inconsistencies.

Full colour paper available at: http://roboticsproceedings.org/rss08/p05.html

When capturing images underwater, the water column imposes several effects on images that are not typically seen when imaging in air. Water causes significant attenuation of light passing through it, reducing its intensity exponentially with the distance travelled. For this reason, sunlight, commonly used as a lighting source in terrestrial photogrammetry, is typically not strong enough to illuminate scenes below depths of approximately 20m, necessitating the use of artificial lighting on-board an imaging platform.

The attenuation of light underwater is frequency-dependent; red light is attenuated over much shorter distances than blue light, resulting in a change in the observed colour of an object at different distances from the camera and light source. In the context of structure-from-motion, the colour and reflectivity of objects is significantly different when imaged from different camera perspectives and distances (see Figure 1). This can cause problems for human interpretation of images and computer-based techniques for clustering and classification of image data based on colour intensities.

The aim of this paper is to develop a method for correcting
images so that they appear as if imaged in air, where effects
such as attenuation are negligible, and where observed image
intensities and colour are much less dependant on the per-
spective and distance from which the image was captured.
The context of this work is imaging in large-scale biological
habitats, which makes the use of colour-calibration tools such
as colour-boards undesirable due to logistical complexity and
the sensitivities of marine habitats to man-made disturbances
to the seafloor. Instead, our approach to colour correction focuses on colour consistency by utilising the assumption of a 'grey-world' colour distribution. We assume, just as is the case in air, that surface reflectances have a distribution that on average is grey and is independent of scene geometry. In air, given the benign imaging process, this assumption approximately holds at the image pixel intensities, which is why grey-world image correction is remarkably effective [1]. When underwater, range and wavelength dependent attenuation bias the distribution of image intensities and the grey-world correction cannot be applied naively. Our method exploits known structure of the 3D landscape and the relative position of the camera, derived using structure-from-motion and photogrammetry, to apply an image formation model including attenuation into the grey-world correction. Results of the method are illustrated in two different underwater environments using images collected from an AUV.

Fig. 2: Left, example of a colour image of an underwater scene and right, the corresponding range image derived from the 3D structure-from-motion. The range image is a map of the distance between the front of the camera lens and objects in the scene for each pixel in the image.

Section II provides an overview of past work in modelling underwater image formation and underwater image correction. Section III details our approach to underwater structure-from-motion and image correction. Section IV provides an overview of the context in which underwater images are acquired and the experimental setup used to demonstrate our image correction method. Section V provides results of our image correction method. Conclusions and future work are presented in Section VI.

II. BACKGROUND LITERATURE

The predominant effects of the water column on images taken underwater are scattering and attenuation [5]. Scattering occurs when light is reflected from microscopic particles in the water, causing a 'blurring' in images that increases with distance. The authors in [11] present an approach to reducing the effects of light scattering by taking images through a polarising filter at various angles.

Attenuation occurs when light is absorbed or diffracted by water molecules or other particles in the water [5] and can be affected by water temperature, salinity, water quality (i.e. from pollution, sediment suspension) and suspended microscopic life (plankton). Attenuation is the dominant cause of the colour imbalance often visible in underwater images. Several authors have proposed approaches to compensating for this effect.

In [14], the authors present an active imaging strategy that adaptively illuminates a scene during imaging based on the average depth from the camera. The approach alters the colour balance of the strobe lighting source (for example to increase red-light for scenes further from the camera) but uses only one depth value (derived from the camera focal length) per scene, neglecting the case where different objects in a single scene are observed at different ranges to the camera. In [15], the authors propose a method for correcting underwater images for colour-attenuation by estimating attenuation coefficients using either a single image captured both underwater and in air or two underwater images captured at different depth values. The method relies on a controlled setup for capturing images and is only demonstrated on three images. Similarly, the authors of [12] present a method for colour correcting underwater images for attenuation using known terrain structure under the assumption that a sufficient number of 'white' points can be identified in the data, either from known properties of the scene or by using calibration targets. While these approaches are insightful, they are typically impractical in large-scale, unstructured underwater environments.

In [13], the authors present a learning-based approach to colour correction that learns the average relationship between colours underwater and in air using training points of images of the same objects both in and out of water. The approach ignores the effect of distance-based attenuation, instead learning a single transformation that corresponds to an object at a fixed imaging distance.

Past approaches to underwater colour correction in the literature either ignore the explicit causes of attenuation or require complicated and limiting calibration setups for attenuation-compensation. The contribution of our work is thus a method of underwater image correction that explicitly accounts for distance-based attenuation and does not require an in-situ colour calibration setup, thus enabling large-scale surveying in unstructured underwater environments.

III. METHODOLOGY

This section of the paper provides a background in image transformation algorithms, underwater structure-from-motion and an underwater image formation model, and describes our approach to underwater image correction.

A. Underwater Structure from Motion

The colour correction strategy discussed throughout the rest of the paper exploits knowledge about precise distances between the camera and each point in an observed scene.
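Such per-pixel distances can be produced by projecting reconstructed 3D points through a pinhole camera model. The sketch below uses assumed intrinsics and conventions for illustration; it is not the authors' pipeline, which interpolates a full photo-textured surface model rather than scattering individual points:

```python
import numpy as np

def range_image(points_cam, K, shape):
    """Build a sparse range image from 3D points already expressed in the
    camera frame. points_cam: (N, 3); K: 3x3 intrinsics; shape: (height, width).
    Each covered pixel stores the metric distance from the lens to the point."""
    h, w = shape
    rng = np.full((h, w), np.nan)
    uvw = points_cam @ K.T                       # pinhole projection
    in_front = uvw[:, 2] > 0                     # keep points in front of camera
    uv = uvw[in_front, :2] / uvw[in_front, 2:3]  # perspective divide
    dist = np.linalg.norm(points_cam[in_front], axis=1)
    u = np.rint(uv[:, 0]).astype(int)
    v = np.rint(uv[:, 1]).astype(int)
    ok = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    for ui, vi, di in zip(u[ok], v[ok], dist[ok]):
        if np.isnan(rng[vi, ui]) or di < rng[vi, ui]:
            rng[vi, ui] = di                     # keep the nearest surface
    return rng

# A point 2 m straight ahead lands on the principal point with range 2 m.
K = np.array([[100.0, 0.0, 32.0], [0.0, 100.0, 24.0], [0.0, 0.0, 1.0]])
rng = range_image(np.array([[0.0, 0.0, 2.0]]), K, (48, 64))
```

Interpolating the gaps between projected points would yield a dense range image like the one shown in Figure 2.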
Fig. 3: Underwater Image Formation Model: Light is reflected from the surface of an object (R) from an artificial lighting source attached to the underwater platform. This light is attenuated as it passes through the water column and reaches the camera lens with a radiance (L). Irradiance (E) reaching the image sensor is affected by vignetting through the camera lens and transformed to an image brightness intensity (I) via the sensor response function of the camera.

This information is generated using an existing structure-from-motion processing pipeline; more details on the approach can be found in [6, 9].

Overlapping stereo-image pairs are collected over an underwater scene using an AUV. Scale Invariant Feature Transform (SIFT) feature points [8] are extracted from each pair and used to compute pose-relative 3D point features. Features are also matched and tracked across overlapping poses and used with additional navigation sensor information (a depth sensor, doppler-velocity sensor and attitude sensors) to compute the trajectory of the platform using a pose-based Simultaneous Localisation and Mapping (SLAM) algorithm [9]. The estimated trajectory and pose-relative 3D point features are then used to construct a global feature map of the terrain from which a photo-textured surface model is created by projecting colour camera images onto the estimated terrain structure [6]. The points in the estimated surface model are then back-projected into each stereo-image pair and interpolated to produce a range-image, i.e. an image whose intensity values correspond to the range from the camera lens to the observed scene points. Figure 2 illustrates an example underwater colour image of a boulder field with scene objects of varying distances from the camera and the corresponding range image derived from a 3D structure-from-motion model of the area.

B. Underwater Image Formation Model

The measured image intensity recorded by a camera at a point in an underwater scene is not directly proportionate to the intensity of light reflected from the point; instead, several factors act on the light path that make this relationship camera perspective-dependent. The light reflected from an object per unit area (reflectance, R) is attenuated through the water column, resulting in an exponential decrease in the magnitude of the light per unit area arriving at the front of the lens (radiance, L) as the distance between the object and the camera lens increases [5]. Equation 1 describes the relationship between object reflectance and radiance at the camera lens as a function of the object distance, d, and the attenuation coefficient, b(λ), which is a function of the wavelength of light λ (i.e. colour):

L = R e^(−b(λ)d)   (1)

Light travelling through the lens of the camera undergoes a fade-out in intensity towards the corners of the image via the effect of vignetting [7]. Vignetting is caused primarily by the geometry of light passing through the lens and aperture of the camera; light passing in from greater angles to the principal axis of the camera is partially shaded by the aperture and sometimes by the lens housing. Vignetting can be summarised by:

E = C(u, v) L   (2)

where E is the light per unit area arriving at the camera image sensor (irradiance) and C(u, v) is constant for a given horizontal and vertical position in the image [u, v]. C(u, v) may be parameterised in various ways under different camera setups; a typical model [7] for a simple lens has C(u, v) proportionate to θ, the angle of the image position from the principal axis of the camera:

C = (π r^2 cos^4 θ) / (4 l^2)   (3)

where r is the radius of the lens and l is the distance between the lens and the image plane. The term C(u, v) can also account for the effects of a camera-fixed lighting pattern, which is typical of underwater imaging, where artificial lighting is carried by the underwater platform and imposes an object illumination that is dependent on the position of the object in the image. The last step in image formation occurs when light arriving at the image sensor of the camera is converted into an image intensity value I, via the sensor response function of the camera f(.):

I = f(kE)   (4)

where k is the exposure constant, typically proportionate to the shutter speed of the camera. The sensor response function f(.) can take a variety of forms, for example a gamma curve, and is typically controlled by camera firmware. A detailed discussion of sensor response functions can be found in [4]. The overall image formation process is illustrated in Figure 3.

Under the assumption that the sensor response function is linear with the form:

I = AkE + c   (5)

where the parameters A and c are constant, Equations 1, 2 and 5 can be combined into a single equation:

I = A k C(u, v) R e^(−b(λ)d) + c   (6)
  = a(u, v) R e^(−b(λ)d) + c   (7)

where the parameter a(u, v) is a function of image position, b(λ) is a function of light wavelength (i.e. image colour channel) and c is a constant.

Our underwater image formation model assumes that ambient light (i.e. from the sun) is negligible compared to the
vehicle-carried artificial light source, which typically holds for the operating depths considered in this study. The effect of ambient light could be added as an additional parameter to the model in Equation 7; this is beyond the scope of this paper and will be considered in future work.

C. Image Transformation Algorithms

As a result of the effect of artificial lighting patterns and attenuation, underwater images typically exhibit a colour balance that is different from what would be observed in air, with much stronger blue and green responses. Image transformations such as contrast and brightness enhancements can be applied to each colour channel of the image to compensate; consider the effect of applying a linear transform to each pixel in the image:

Iy(u, v, λ) = m(λ)Ix(u, v, λ) + n(λ)    (8)

where Ix(u, v, λ) is the original intensity of a given pixel [u, v] for channel λ in image x, Iy(u, v, λ) is the image pixel intensity in image y, and m(λ) and n(λ) are scaling and offset constants applied across all pixels in a given colour channel λ (i.e. separate values for each of the red, green and blue channels). The mean μy(λ) and variance σy²(λ) of the resulting pixel histogram after transformation for a given channel can be calculated as a function of the original image channel histogram statistics and the transformation parameters:

μy(λ) = E[Iy(λ)] = m(λ)μx(λ) + n(λ)    (9)
σy²(λ) = E[(Iy(λ) − E[Iy(λ)])²]    (10)
       = m(λ)² E[(Ix(λ) − E[Ix(λ)])²]    (11)
       = m(λ)² σx²(λ)    (12)

where μx(λ) and σx²(λ) are the mean and variance of a given channel of the original image x and E[·] is the expectation operator. These relationships can then be used to derive a linear transform that results in an image channel with any desired mean and variance:

m(λ) = sqrt( σy² / σx²(λ) )    (13)
n(λ) = μy − m(λ)μx(λ)    (14)

where μy and σy² are the desired mean and variance. This method can be used to balance colour channels by setting the desired mean and variance equal for each channel and computing separate scale and offset parameters for each image channel. Figure 4 provides an example of this process. An uncompensated underwater image with corresponding histogram is shown along with the final, compensated image and image histogram after applying a linear transformation with parameters derived for each red-green-blue channel using Equations 13 and 14, with a desired mean of μy = 0.5 and a desired image variance of σy² = 0.16² for each channel.

Fig. 4: Image before and after grey-world correction: top left subfigure shows the original, uncorrected image, top right subfigure shows the colour-balanced image. Bottom subfigures illustrate the image colour intensity histograms.

This process is commonly referred to as a 'grey-world' transformation [1] and can also be applied across a collection of images taken in a given environment by computing μx(λ) and σx²(λ) for each red, green and blue image channel using the concatenation of data from several images (i.e. μx(λ) and σx²(λ) are computed using all pixels from all images in a given channel). The assumption behind the grey-world transformation is that the real colours in the environment are evenly distributed (i.e. grey on average).

D. Image Transformation Accounting for Image Spatial Effects

Camera-fixed lighting pattern and vignetting are two examples of effects that result in pixels in different regions of an image having different statistical distributions across multiple images in an environment. On average, pixels towards the centre of an image will appear brighter than those towards the periphery. To account for this, the grey-world image transformation can be computed separately for each spatial region of an image. Thus for each pixel location [u, v], the image correction is applied:

Iy(u, v, λ) = m(u, v, λ)Ix(u, v, λ) + n(u, v, λ)    (15)

where a separate set of scale and offset parameters is used for each pixel location and channel:

m(u, v, λ) = sqrt( σy² / σx²(u, v, λ) )    (16)
n(u, v, λ) = μy − m(u, v, λ)μx(u, v, λ)    (17)
where μy and σy² are the desired mean and variance and μx(u, v, λ) and σx²(u, v, λ) are the mean and variance of all intensities in a given set of N images at the pixel location [u, v] for a given channel λ. This process benefits from a large collection of N images within a given environment from different perspectives in order to provide a statistically significant measurement of μx(u, v, λ) and σx²(u, v, λ) over objects of various colours and intensities.

E. Image Transformation Accounting for Attenuation

In addition to image spatial effects, such as lighting pattern and vignetting, attenuation also serves to skew the distribution of intensities measured at a given pixel location over an image set, based on the range to the objects measured in each image. For example, Figure 5 illustrates a scatter plot of the red, green and blue image intensities measured from the center pixel (i.e. u = 680, v = 512) in a set of 2180 1360-by-1024 pixel images vs. the computed range in the scene of this image pixel, taken from various perspectives over an underwater environment. The distributions in pixel intensities exhibit considerable correlation to the range to objects, with an exponentially decaying relationship to range, indicative of the image formation model described in Equation 7.

Fig. 5: Scatter plot of measured pixel intensity vs. distance to camera for the center pixel (i.e. u = 680, v = 512) in a set of 2180 1360-by-1024 pixel images. The scatter data exhibit an exponential falloff in intensity with distance from the camera due to attenuation, and a dependency on colour.

As in the case of lighting pattern and vignetting, range-dependent attenuation can be accounted for by applying an image transformation that is a function of image channel, image pixel coordinate and range to the object. The complete image transformation is thus computed as:

Iy(u, v, λ) = m(u, v, λ, d)Ix(u, v, λ) + n(u, v, λ, d)    (18)
m(u, v, λ, d) = sqrt( σy² / σx²(u, v, λ, d) )    (19)
n(u, v, λ, d) = μy − m(u, v, λ, d)μx(u, v, λ, d)    (20)

where μx(u, v, λ, d) and σx²(u, v, λ, d) are the mean and variance of all intensities in a given set of N images at the pixel location [u, v] for a given depth d and for a given channel λ. One potential method for computing μx(u, v, λ, d) and σx²(u, v, λ, d) is to 'bin' measured pixel intensities into specified values of d (distance from the camera) and compute the mean and variance of each bin; however, this requires a large number of samples in each bin, resulting in certain ranges and pixel locations being under-sampled.

Since the distribution of intensities across different ranges is expected to follow the image formation relationship derived in Equation 7, an alternative approach is taken. For each pixel location, a scatter plot of image intensity vs. range is created, with one point for each of the N images in the underwater dataset. If μR is the expected mean reflectance of the surface, then the mean image intensity measured from pixel [u, v], channel λ and range d is:

μx(u, v, λ, d) = a(u, v, λ)μR e^(−b(u, v, λ)d) + c(u, v, λ)    (21)

where a(u, v, λ), b(u, v, λ) and c(u, v, λ) correspond to parameters for the considered pixel location and image channel. Let a∗(u, v, λ) = a(u, v, λ)μR; the parameters a∗(u, v, λ), b(u, v, λ) and c(u, v, λ) in Equation 21 can then be estimated by taking a non-linear least-squares fit of this function to the scatter data using a Levenberg-Marquardt optimisation [10]. An initial state estimate x = [a, b, c] = [1, 0, 0] was used and was found to provide stable convergence in all of the datasets examined. The mean intensity is then computed from the function μx(u, v, λ, d) = a∗(u, v, λ)e^(−b(u, v, λ)d) + c(u, v, λ) as a function of distance d.

To compute σx²(u, v, λ, d), the variance corresponding to pixel [u, v] and channel λ as a function of distance d, a similar approach is taken. If σR² is the expected variance of the reflectance of the surface, then σx²(u, v, λ, d) is:

σx²(u, v, λ, d) = E[(Iy(u, v, λ, d) − E[Iy(u, v, λ, d)])²] = [a(u, v, λ)e^(−b(u, v, λ)d)]² σR²    (22)

The expected variance of surface reflectance σR² can be computed using the parameters estimated in Equation 21; for each sample of the scatter data, the expected reflectance can be computed by taking the inverse of Equation 21, and from the resulting values of reflectance the variance σR² is calculated.

IV. EXPERIMENTAL SETUP

Data collected from an AUV deployed in two different types of underwater environments was used to test the image correction techniques described above. The first dataset was collected over an underwater boulder field populated by various species of sponge, algae and sea urchins. The data was collected by taking overlapping transects of a rectangular region approximately 25-by-20 m, and contained images collected at various altitudes as the vehicle moved over the rocky terrain. The second dataset was collected from a single line transect over an area with various types of benthic coverage, including rocky reef platforms with various species of coral, kelp and sandy bottom. In both datasets, the underwater terrain is rugose and results in images being collected from a range of different perspectives and camera-object distances.
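As an illustrative sketch of the curve-fitting step above (not the authors' code), the per-pixel exponential model of Equation 21 can be fitted to intensity-vs-range scatter data with a Levenberg-Marquardt least-squares routine, starting from the initial estimate [1, 0, 0] described in the text. The synthetic data, noise level and true parameter values below are assumptions for the example.

```python
import numpy as np
from scipy.optimize import curve_fit

def intensity_model(d, a_star, b, c):
    # Equation 21 with a* = a(u, v, lambda) * mu_R folded into one parameter
    return a_star * np.exp(-b * d) + c

# Synthetic scatter data for one pixel location and channel (assumed values)
rng = np.random.default_rng(0)
d = rng.uniform(1.0, 4.0, 500)                      # camera-object ranges (m)
I = intensity_model(d, 0.8, 0.5, 0.05) + rng.normal(0.0, 0.01, d.size)

# Levenberg-Marquardt fit with initial estimate [1, 0, 0], as in the paper
(a_star, b_fit, c_fit), _ = curve_fit(intensity_model, d, I,
                                      p0=[1.0, 0.0, 0.0], method='lm')

# mu_x(u, v, lambda, d): the fitted mean-intensity curve as a function of range
mu_x = lambda dist: intensity_model(dist, a_star, b_fit, c_fit)
```

In practice one such fit is performed for every pixel location and colour channel, yielding the range-dependent statistics used in Equations 19 and 20.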
The underwater images were collected using a stereo camera
setup with two 1360-by-1024 pixel cameras; the left camera
was a colour Bayer camera and the right camera monochrome
(only images from the left camera were used in the image
correction process). The cameras are mounted below the AUV
platform looking downwards so as to capture images of the
terrain as the vehicle passed over. Images were captured in a
raw 12-bit format and later converted to three colour channels.
The camera firmware used a linear sensor response function.
The aperture and shutter exposure times were fixed for each
data collection. Artificial lighting was provided by two strobes
attached to the AUV that provide white light to the scene.
Both datasets were processed using a structure-from-motion
pipeline [6, 9], and the derived 3D structure was used to create
range images for each left camera image collected.
Two sets of corrected images were obtained from the raw
data for both datasets for comparison. The correction was ap-
plied firstly using Equation 15 which only accounted for image
space effects such as vignetting/lighting pattern and colour
balance (i.e. not accounting for distance-based attenuation). A
second set of corrected images was obtained using Equation 18
which accounted for the full image formation model described
in Equation 7 including image space effects and distance-based
attenuation. Raw images were typically too dark to view with
the naked eye (see Figure 4 for an example) and thus were
not displayed in the comparisons shown.
The corrected images were assessed both qualitatively (by
examining consistency in appearance) and quantitatively by measuring the inconsistency in overlapping images of common objects. For each planar region of the surface model, a list of images that covered this region was compiled. The structure-from-motion estimated image poses were used via back-projection to compute the patch of each image corresponding to the given spatial region. As a measure of inconsistency, the variance in these image texture intensities at this planar region was computed and displayed spatially over the map for each region. When images were affected by depth-based attenuation, regions that were viewed from multiple perspectives (and thus multiple ranges) displayed widely different texture intensities (and thus high variance and inconsistency), whereas when attenuation was compensated for, images of a common region displayed lower variance in intensity. The spatial patterns seen in the variance (i.e. faint banding patterns) correspond to changes in the number of overlapping views of each point. Some of the sharp peaks in variance correspond to small registration errors in the structure-from-motion process.

Fig. 6: Scatter plot of measured pixel intensity vs. distance to camera and estimated mean and standard deviation curves for each red, green and blue channel for the center pixel, taken from 2180 images in dataset 1 (boulder field). The mean and variance (shown at three standard deviations from the mean) curves provide an approximation of the distribution of pixel intensities at a given distance from the camera.

V. RESULTS

Figure 6 illustrates examples of the mean/variance curve fitting using image intensity samples taken from the centre pixel (i.e. u = 680, v = 512) from the 2180 images in dataset 1 (boulder field). For a given value of distance d, the mean and variance values approximate the distribution of image intensities captured from various objects at this distance from the camera, and are used to derive image correction parameters. The use of a fitting approach to compute statistics allowed for a robust estimate at ranges where only a small amount of data is present (i.e. very close to or far away from the camera).

Figure 7 illustrates a comparison between corrected images using the correction methods described in Sections III-D and III-E. The left subfigure illustrates an image that has been corrected using Equation 15, where only the spatial effects and colour channel are considered in the correction (i.e. no distance-based attenuation model). Although the image now has colours that are balanced and does not exhibit spatial effects such as vignetting or lighting pattern, there is still significant distance-based attenuation. The image in the right subfigure has been corrected using Equation 18, where all the effects of image position, colour channel and distance-based attenuation are considered in the correction. Both the colour intensity and contrast in the more distant regions of the image have been increased to match the overall scene, while the less distant regions have been darkened, largely removing the effects of attenuation.

Once images were corrected across an entire dataset, they were applied as photo-textures for the structure-from-motion
Fig. 7: Left, colour compensated image using standard (non depth-based) correction and right, colour compensated image with full water
attenuation correction. Both the intensity and contrast in the more distant regions of the image have been increased to match the overall
scene in the right subfigure, while objects less distant have been darkened.
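For reference, the per-channel grey-world correction of Equations 13 and 14 (whose spatially varying form, Equations 16 and 17, underlies the corrections compared in Fig. 7) amounts to a few lines of array code. This is a minimal sketch with our own function name; the target statistics mirror the μy = 0.5 and σy = 0.16 used in Section III-C.

```python
import numpy as np

def grey_world_correct(img, target_mean=0.5, target_std=0.16):
    """Per-channel linear transform (Equations 8, 13, 14): scale m and
    offset n chosen so each channel has the desired mean and variance."""
    out = np.empty_like(img, dtype=float)
    for ch in range(img.shape[2]):
        mu_x = img[..., ch].mean()
        sigma_x = img[..., ch].std()
        m = target_std / sigma_x        # Eq. 13: sqrt(sigma_y^2 / sigma_x^2)
        n = target_mean - m * mu_x      # Eq. 14
        out[..., ch] = m * img[..., ch] + n
    return out
```

Computing the same scale and offset per pixel location (and per range bin) rather than per channel gives the spatial and attenuation variants of Equations 16-17 and 19-20.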
Fig. 8: Left, 3D photo-textured scene using standard colour corrected textures and right, 3D photo-textured scene using full water attenuation
colour corrected textures.
Fig. 9: First and second subfigures from the top: comparison of 3D photo-textured scenes using standard and full water attenuation colour
corrected textures. Third and fourth subfigures: comparison of image texture standard deviation for the same corrected textures (colourbar
units are normalised intensity). The attenuation-corrected textures display significantly reduced variance in the intensity of images at each
part of the map, in particular areas corresponding to large perspective changes by the vehicle.
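The inconsistency measure shown in the bottom subfigures of Fig. 9 is, in essence, the variance of the intensities that overlapping images assign to the same planar region. A minimal sketch, with a hypothetical per-region sample list as input:

```python
import numpy as np

def inconsistency_map(samples_per_region):
    """For each planar region, take the intensities back-projected from all
    images that observe it and return their variance (the inconsistency)."""
    return np.array([np.var(np.asarray(s)) for s in samples_per_region])

# Toy example: region 0 observed consistently; region 1 observed at widely
# varying intensities (e.g. uncorrected attenuation at different ranges)
regions = [[0.50, 0.51, 0.49], [0.20, 0.60, 0.90]]
variance = inconsistency_map(regions)
```

A well-corrected dataset should drive these per-region variances towards the sensor noise floor, which is what the attenuation-corrected maps in Fig. 9 illustrate.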
derived 3D surface model. Figure 8 shows a comparison of the photo-textured terrain model of dataset 1 (boulder field) when using image textures that have been corrected via the methods described in Sections III-D and III-E. The left subfigure illustrates the model using textures taken from images corrected using Equation 15 (i.e. no distance-based attenuation model) and the right subfigure illustrates the model using textures corrected using Equation 18 (i.e. full model with attenuation correction). The left model exhibits considerable correlation between the intensity/colour of image textures and the average distance from which each part of the surface was imaged, in particular a horizontal banding pattern that corresponds to overlapping swaths during image collection that occur at slightly different heights above the terrain. The distance and general spatial correlation has been essentially removed in the corrected-texture model to the right.

Figure 9 illustrates a similar comparison between photo-textured models from dataset 2 (reef), with the non-attenuation-corrected model shown in the first subfigure (from the top) and the attenuation-corrected model shown in the second subfigure. Inconsistencies are visible in the first model during passage over sections of rocky reef that sit above the sandy bottom and appear bright and red in the images, as the AUV changes its height above the terrain in order to clear the obstacle. The bottom two subfigures illustrate maps of the variance in overlapping image textures across the map for the two different correction schemes. The attenuation-corrected textures display significantly reduced variance in the intensity of images at each part of the map, in particular in areas corresponding to large perspective changes by the vehicle.

VI. CONCLUSIONS AND FUTURE WORK

This paper has developed an automated approach for correcting colour inconsistency in underwater images collected from multiple perspectives during the construction of 3D structure-from-motion models. Our technique exploits the 3D structure of the scene generated using structure-from-motion and photogrammetry techniques, accounting for distance-based attenuation, and improves the consistency of photo-textured 3D models. Results are presented using imagery collected in two different underwater environments and demonstrate both the qualitative and quantitative improvement of the imagery.

Our approach relies on the assumption of a 'grey-world' (i.e. one in which colours in the environment are grey on average and not biased in hue) and furthermore that this distribution is spatially consistent (in particular, with depth). Future work will consider extensions to our approach to account for environments where this assumption is violated, such as when objects of a particularly strong colour are only present at a biased depth in the dataset. One potential approach to this issue is to consider robustified fitting approaches or outlier rejection methods that allow for a select, 'well-behaved' subset of the data to be used during model fitting. Future work will also consider approaches for building consistency in image datasets collected over multiple collection times, for monitoring long-term changes to the environment.

REFERENCES

[1] G. Buchsbaum. A Spatial Processor Model for Object Color Perception. Journal of the Franklin Institute, 310(1):1–26, 1980.
[2] R. Campos, R. Garcia, and T. Nicosevici. Surface Reconstruction Methods for the Recovery of 3D Models from Underwater Sites. In IEEE OCEANS Conference, 2011.
[3] R. Eustice, O. Pizarro, and H. Singh. Visually Augmented Navigation for Autonomous Underwater Vehicles. IEEE Journal of Oceanic Engineering, 33(2):103–122, 2008.
[4] M. D. Grossberg and S. K. Nayar. Modelling the Space of Camera Response Functions. IEEE Trans. on Pattern Analysis and Machine Intelligence, 26(10):1272–1282, 2004.
[5] J. S. Jaffe. Computer Modeling and the Design of Optimal Underwater Imaging Systems. IEEE Journal of Oceanic Engineering, 15(2):101–111, 1990.
[6] M. Johnson-Roberson, O. Pizarro, S. B. Williams, and I. Mahon. Generation and Visualization of Large-scale Three-dimensional Reconstructions from Underwater Robotic Surveys. Journal of Field Robotics, 27(1):21–51, 2010.
[7] S. J. Kim and M. Pollefeys. Robust Radiometric Calibration and Vignetting Correction. IEEE Trans. on Pattern Analysis and Machine Intelligence, 30(4):562–576, 2008.
[8] D. G. Lowe. Distinctive Image Features from Scale-Invariant Keypoints. International Journal of Computer Vision, 60(2):91–110, 2004.
[9] I. Mahon, S. Williams, O. Pizarro, and M. Johnson-Roberson. Efficient View-based SLAM using Visual Loop Closures. IEEE Transactions on Robotics, 24(5):1002–1014, 2008.
[10] D. Marquardt. An Algorithm for Least-Squares Estimation of Nonlinear Parameters. SIAM Journal on Applied Mathematics, 11(2):431–441, 1963.
[11] Y. Y. Schechner and N. Karpel. Clear Underwater Vision. In IEEE International Conference on Computer Vision and Pattern Recognition, 2004.
[12] A. Sedlazeck, K. Koser, and R. Koch. 3D Reconstruction Based on Underwater Video from ROV Kiel 6000 Considering Underwater Imaging Conditions. In IEEE OCEANS Conference, 2009.
[13] L. A. Torres-Mendez and G. Dudek. A Statistical Learning-Based Method for Color Correction of Underwater Images. In Research on Computer Science Vol. 17, Advances in Artificial Intelligence Theory, 2005.
[14] I. Vasilescu, C. Detweiler, and D. Rus. Color-Accurate Underwater Imaging using Perceptual Adaptive Illumination. In Robotics: Science and Systems, 2010.
[15] A. Yamashita, M. Fujii, and T. Kaneko. Color Registration of Underwater Images for Underwater Sensing with Consideration of Light Attenuation. In IEEE International Conference on Robotics and Automation, 2007.
M-Width: Stability and Accuracy of
Haptic Rendering of Virtual Mass
Nick Colonnese and Allison Okamura
Mechanical Engineering, Stanford University, Stanford, California 94305-2232
Email: [email protected], [email protected]

Abstract—In many physical human-robot interaction scenarios, such as haptic virtual environments for training and rehabilitation, it is desirable to carefully control the apparent inertia of a robot. Inertia compensation can be used to mitigate forces felt by the user during free-space motion, and rendering of additional inertia is desired for particular rehabilitation and training procedures. Many factors influence the stability and accuracy of rendering for haptic display of a pure mass, including device mechanical properties, sample rate, control structure, and human behavior. Inspired by the “Z-Width” approach to haptic device stability and performance analysis, we introduce “M-Width”, which we define as the dynamic range of virtual masses renderable in a stable manner. We identify the important parameters for system stability, find stability boundaries, and describe the expected accuracy of the haptic rendering for a canonical haptic system. These results serve as a design tool for creating stable and accurate haptic virtual environments, establish limits of performance, and lay the groundwork for new controllers to improve mass rendering.

Fig. 1. Example application of inertia rendering in rehabilitation. (a) A user makes single-joint reaches to a 30° target in an exoskeleton robot, during which rendered inertia alters his movement. (b) A patient with hypermetria (tendency to overshoot) is predicted to benefit from decreased effective arm inertia. (c) A patient with hypometria (tendency to undershoot) is predicted to benefit from increased effective arm inertia. While neuromechanical models predict these effects, implementing inertia rendering requires improved understanding of system stability and accuracy.

I. INTRODUCTION

In many applications of haptic virtual environments, we desire the user to feel a very specific set of mechanical properties, which arise from inherent device dynamics, programmed
virtual environment dynamics, and (potentially unexpected)
interactions between the two that depend on control structure,
sampling rate, and other system properties. Consider the example application shown in Figure 1. Here, researchers wish to test the hypothesis that the cerebellum (a region of the brain that plays an important role in motor control) holds a dynamic model of the body used in feed-forward planning of movements, and that damage to it biases this model. Patients with cerebellar damage who exhibit ataxia (uncoordinated movements) are asked to make fast reaching movements in a backdrivable, planar exoskeleton robot. If a rendered inertia could be found that eliminates the ataxia, this would provide significant evidence toward the theory that cerebellar damage results in a specific bias in internal modeling of body dynamics, and it would open up a host of potential new rehabilitation therapies for patients with ataxia. However, modifying the effective inertia of the user/robot combination to be a specific value evokes fundamental challenges in haptic device control.

Haptic interfaces should be simultaneously stable and accurate. Unstable behavior can cause damage to a haptic device, injure the human user, and generate unrealistic and unexpected haptic sensations. Accurate haptic virtual environments should feel exactly as desired, with no unwanted effects of haptic device dynamics (e.g., the inherent inertia of the mechanism). In the rehabilitation example given above, stability and accuracy are both necessary for the scientific validity of the study. Yet, there exists a classic tradeoff between ensuring the stability of a haptic display and the range of impedances that it can render. Mass rendering (presenting a force proportional to acceleration) presents a particular challenge because measuring acceleration on a typical impedance-type haptic device is noisy; it involves double discrete differentiation of a noisy position signal. While system improvements (e.g., the addition of new sensors) can improve mass rendering, an improved understanding of the performance of a canonical haptic device will yield insight as to what modifications in hardware, sensing, and control will improve mass rendering. The long-term goal of this work is to develop stable and accurate mass rendering for rehabilitation applications, and predict the limits of performance. Here, we begin by developing a framework
for analyzing the stability and accuracy of mass rendering with
theory applicable to a one-degree-of-freedom linear system.
Colgate and Brown [5] define “Z-Width” as the dynamic
range of impedances that can be rendered passively. Passive
systems are incapable of generating a net amount of energy,
and the coupling of passive systems is guaranteed to be
stable. However, conditions for passivity can be conservative
compared to conditions for stability. Although Z-width is a
general term spanning all impedances, it has generally been
discussed as relating to haptic rendering of virtual springs
and dampers. In addition, Colgate and Schenkel [6] derived
a general theorem for the passivity of haptic interfaces.
This paper introduces “M-Width”, the dynamic range of
virtual mass renderable in a stable manner. We allow for
positive virtual mass, corresponding to mass rendering, as well
as negative virtual mass, corresponding to mass compensation.
The definition of M-width differs from Z-width in three
key ways: (1) we consider BIBO (bounded input, bounded
output) stability, not passivity – to avoid an overly conservative
implementation, (2) it models the human operator as a specific
impedance, not a general passive element, and (3) the desired
virtual environment is a pure mass with motion data filtering.
This work builds upon significant prior work related to
the stability of haptic displays, e.g. [1, 6, 7, 10]. Virtual
mass rendering has been previously explored for specific
medical robotics scenarios, including dynamic compensation
for a surgical teleoperator [14], and inertia compensation for
a lower-leg exoskeleton [3]. Gil et al. [11] explore inertia compensation by force feedforward for an impedance haptic device using additional sensors for acceleration estimation. Adams and Hannaford [2] present a general virtual coupling approach that could be used to render mass that is guaranteed to be passive, but do not consider mass explicitly. Hannaford and Ryu [12] extend the passivity domain, at the necessary expense of reducing the quality of haptic rendering. Brown and Colgate [4] establish conditions for passive positive mass simulations.

While recognizing that many scenarios require treatment of nonlinear models, in this paper we consider linear models, which are relevant to the problem of dysmetria (under- or overshooting targets) in single-jointed movements, and enable the use of classic linear control theory.

Fig. 2. (a) Schematic of a human interacting with a haptic device implementing virtual mass. (b) Block diagram for this system, considering human parameters, device dynamics, sampling, and ZOH.

II. SYSTEM MODELS

In this section we introduce system models for a human interacting with a haptic device, where the goal is to render a pure mass. There are many different haptic system architectures that can be used to render inertia, but in this paper we are interested in an impedance-type device with position sensing only. Three system models are introduced. One is a hybrid model, containing both continuous and discrete elements. This model is the closest to reality, in that it captures the control of a physical haptic device through a computer containing A/D (analog to digital) and D/A (digital to analog) components. Entirely continuous and discrete models are also introduced. The continuous model is used for the identification of important parameters with respect to stability, and for use in a quality of rendering analysis. We show that the discrete model results in stability bounds that are identical to the hybrid model stability bounds.

We consider a haptic device described by a mass, m, and viscous damper, b, acted upon by two external forces: the force applied by the operator, Fh(s), and the force applied by the actuator implementing the virtual environment, Fa(s) (Figure 2). The system is equipped with only one sensor measuring the position of the mass, X(s). An estimate of the acceleration of the device is formed by performing a double back difference on the sampled position signal and then low-pass filtering. The actuator force is the product of the acceleration estimate and the desired virtual mass.

A. Hybrid Model

The system equation of motion is:

Fh(s) − Fa(s) = X(s)(ms² + bs).    (1)

The force of the operator is determined by the motion of the operator Xh(s) via the hand impedance,

Fh(s) = bh(sXh(s) − sX(s)) + kh(Xh(s) − X(s)),    (2)

where bh and kh are positive constants corresponding to the hand damping and stiffness, respectively. The continuous
position signal X(s) is sampled with a constant sampling
period of T to get a sampled position X(k). The sampled
position measurements go through a discrete double back
differencing operator,

(z − 1)2
D(z) = , (3)
(T z)2
and then a discrete low-pass filter,

(1 − e−ω0 T )z
H(z) = , (4)
z − e−ω0 T Fig. 3. Steps for converting the hybrid system into an entirely discrete
system. A: From the hybrid model shown in Figure 2, we find a continuous
to form an estimate of the acceleration signal. The discrete transfer function G(s). B: Then, the continuous elements are converted into
low-pass filter has cut-off frequency ω0 (rad/s). The low-pass an equivalent element G(z), resulting in an entirely discrete system.
filter exists for practical considerations because the double
differentiation of the sampled position signal is expected to
be noisy. Other filters could be used; we chose one of the C. Discrete Model
simplest possible filters to provide information about baseline It is also possible to represent the system with an entirely
performance. In a Z-width analysis, larger virtual damping enables the stable display of larger virtual environment stiffness. Here, a greater degree of filtering enables the stable display of a larger span of virtual masses. We will see, however, that filtering can result in inaccurate mass rendering.

The force of the actuator is the product of the acceleration estimate and a desired virtual mass, M, which can be positive (mass rendering) or negative (mass compensation). The actuator force is held constant for the duration of the sampling period with a zero-order hold (ZOH), resulting in a continuous-time staircase actuator force. At sample time k the actuator force is [9]

    Fa(k) = M H(z) D(z) X(k).    (5)

B. Continuous Model

It is possible to convert the general hybrid system shown in Figure 2 into an entirely continuous one. The continuous model is desirable because of its tractability for finding important parameters with respect to stability using conventional linear control systems analysis. The two systems are similar, but not identical. To represent the system continuously, the discrete elements are converted to continuous ones. The continuous representation of D(z), D(s), is found using bilinear (Tustin) mapping:

    D(z) → D(s) = 4s² / (sT + 2)².    (6)

The continuous representation of H(z), H(s), is a first-order low-pass filter with cut-off frequency ω0 (rad/s) and unity gain at DC:

    H(z) → H(s) = ω0 / (s + ω0).    (7)

The zero-order hold is modeled as a time delay equal to half the sample period:

    ZOH → e^(−sT/2).    (8)

C. Discrete Model

The hybrid system can also be converted into an entirely discrete model. The discrete model is useful for stability analysis in which the discrete elements can be considered explicitly. Figure 3 illustrates the process. First, a transfer function from Fa(s) to X(s), G(s), is formed combining only the continuous elements into a single transfer function:

    X(s)/Fa(s) = G(s) = 1 / (m s² + s(b + bh) + kh).    (9)

Then, the ZOH, G(s), and the sample elements are converted into a discrete element, G(z), using a zero-order hold equivalent [9]:

    G(z) = ((z − 1)/z) Z{G(s)/s}.    (10)

The input-output characteristics of the continuous and discrete elements are the same. For this reason, the stability of the hybrid and discrete models is identical.

III. SYSTEM STABILITY

We now analyze the stability of the system shown in Figure 2, considering the all-continuous and discrete representations of the hybrid system, from which BIBO (bounded input, bounded output) stability can be determined. First, we identify important parameters with respect to stability using the continuous model. This is useful as a haptic system design tool because it identifies what key parameters determine mass rendering stability. Then, we form quantitative stability boundaries using the discrete model, equivalent to those of the hybrid model. This establishes the limits of a haptic system rendering mass using this architecture. Finally, we discuss how the continuous and discrete models differ.

A. Effects of Parameters on Stability

To find important parameters with respect to stability we analyze the characteristic polynomial of the continuous system. We can determine system stability using a Bode plot. By examining how the form of the Bode plot changes with
system parameters we observe the effect of the parameters on stability. The characteristic polynomial of the system is

    1 + L(s),    (11)

where

    L(s) = (M/m) · [s² / (s² + 2ζωn s + ωn²)] · [4s² / (sT + 2)²] · [ω0 / (s + ω0)] · e^(−sT/2),    (12)

    ωn = √(kh / m),    (13)

    ζ = (b + bh) / (2√(kh m)).    (14)

We see that L(s) can be formed by the product of the ratio of M to m, two poles from a second order system described by ωn and ζ, four zeros at the origin, two poles on the real axis at −2/T, a first-order low-pass filter with cut-off frequency ω0 (rad/s), and a linear phase delay depending on T. It is convenient to define ω* as the frequency at which the phase of L(s) is −180°. The stability of the system is determined by the value of the gain margin of L(s). The gain margin, GM, is defined to be

    GM = 1 / |L(ω*)|.    (15)

A gain margin greater than one corresponds to a stable system, and a gain margin of less than one corresponds to an unstable system. The Bode plots for positive and negative M have the same magnitude, and the phase for negative M is 180° lower than for positive M, at every frequency. The effects of these system parameters on the gain margin of L(s) are summarized in Table I.

TABLE I
EFFECT OF PARAMETERS ON GAIN MARGIN OF L(s)

    Parameter   Gain Margin (positive M)   Gain Margin (negative M)
    |M/m|       ⇓                          ⇓
    ωn          –*                         ⇑
    ζ           –                          ⇑
    T           ⇓                          –
    ω0          ⇓                          ⇓
    (* for ωn ≪ ω*)

• |M/m| is a gain of the system; changing it directly affects the gain margin. There exists a maximum stable value of |M/m|.
• ωn is the frequency at which the magnitude and phase of the second order system transition. For positive M, if ωn ≪ ω* then ωn has no effect on the gain margin. For negative M, ω* is nearly ωn, so increasing ωn increases the gain margin.
• ζ affects the magnitude and phase of the second order system around ωn. The lower the value of ζ, the higher the "spike" in magnitude around ωn, and the smaller the span for the phase to transition. For positive M, if ω0 ≪ ω* the value of ζ does not affect the gain margin. For negative M, increasing ζ increases the gain margin.
• ω0 affects the gain margin by introducing a pole at frequency ω0. Lower cut-off frequencies, corresponding to more aggressive filtering, increase the gain margin.
• T affects how quickly phase lag is added to the system. Larger sample periods result in more phase lag added at a given frequency. For positive M, T has a large effect on the gain margin; a smaller T corresponds to a higher gain margin. For negative M, T has little effect on the gain margin.

TABLE II
HUMAN AND DEVICE PARAMETERS

    Human Parameters [13]        Device Parameters [7]
    kh   700 (N/m)               m   70 (g)
    bh   5 (Ns/m)                b   0.005 (Ns/m)

Figure 4 shows how L(s) varies with the sample time, T, for positive desired virtual mass. (The Bode plots for negative desired virtual mass would have 180° lower phase at every frequency.) The parameter values are |M/m| = 1, ωn = 100 (rad/s), ζ = 0.5, and ω0 = 30 (rad/s). These values were chosen to represent a human interacting with a Phantom Premium haptic device (SensAble, Inc.) based on a human model from Kuchenbecker et al. [13] and a device model from Diolaiti et al. [7] (Table II). Figure 4 demonstrates that the sample rate significantly affects the gain margin for positive virtual mass because the value of ω* is mainly determined by the sample rate. However, for negative virtual mass, ω* is nearly ωn, so the sample rate has little effect on the gain margin.

Figure 5 shows how L(s) varies with ωn for positive virtual mass. For positive virtual mass, ωn and ζ do not affect the gain margin of L(s), as long as the phase transition of the second order system reaches its asymptote before ω*. However, for negative virtual mass they do affect the gain margin. Thus, the haptic device and operator parameters have more effect on stability for inertia compensation than inertia rendering.

The virtual inertia to device inertia ratio, M/m, is a gain of the system and directly affects the gain margin of L(s). If other system parameter values are set, we can determine the range of M/m for which the system is stable. This analysis shows how much inertia rendering is possible. For example, the Bode plot of Figure 4 at a sample rate of 1000 Hz is about −45 dB at ω*, so for positive M, values of M/m up to 45 dB are stable. For negative M, the magnitude of the system is about −15 dB at ω*, so values of |M/m| up to 15 dB are stable. This establishes a range of stable commanded virtual mass values, but we will see in Section IV that the rendered mass of the system may not actually match the commanded value. This analysis is based on the continuous model, whose stability boundaries are similar, but not identical, to the hybrid and discrete model stability boundaries.
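The gain-margin analysis above can be sketched numerically: assemble a loop transfer function from the continuous elements G(s) (9), D(s) (6), H(s) (7), and the half-sample delay (8), sweep it over frequency, and locate ω* and GM = 1/|L(ω*)| as in (15). The composition L = M·G(s)·D(s)·H(s)·e^(−sT/2) used here, which matches the characteristic polynomial denominator in (18), and the Table II parameter values with M/m = 1 and a 1 kHz sample rate, are assumptions for illustration rather than the authors' own code:

```python
import numpy as np

# Loop transfer assembled from the continuous elements (a modeling
# assumption for illustration): L = M * G(s) * D(s) * H(s) * e^{-sT/2}.
m, b = 0.070, 0.005          # device mass (kg) and damping (Ns/m), Table II
kh, bh = 700.0, 5.0          # human stiffness (N/m) and damping (Ns/m)
M = 1.0 * m                  # commanded virtual mass, M/m = 1
w0, T = 30.0, 1e-3           # filter cut-off (rad/s), sample period (s)

w = np.logspace(1, 4.5, 200000)
s = 1j * w
G = 1.0 / (m * s**2 + (b + bh) * s + kh)      # force -> position, Eq. (9)
D = 4 * s**2 / (s * T + 2)**2                 # acceleration estimator, Eq. (6)
H = w0 / (s + w0)                             # low-pass filter, Eq. (7)
L = M * G * D * H * np.exp(-s * T / 2)        # half-sample ZOH delay, Eq. (8)

phase = np.unwrap(np.angle(L))                # radians, continuous in frequency
idx = np.argmax(phase <= -np.pi)              # first crossing of -180 degrees
omega_star = w[idx]
gm_db = -20 * np.log10(np.abs(L[idx]))        # GM = 1/|L(w*)|, Eq. (15)
print(f"omega* = {omega_star:.0f} rad/s, gain margin = {gm_db:.1f} dB")
```

With these assumed values the crossover lands near the sample-rate-dominated region (roughly 10³ rad/s) with a comfortably positive gain margin, consistent with the qualitative claim that for positive virtual mass the sample rate sets ω*.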
Fig. 4. Bode plots of L(s) with positive virtual mass M for various sample rates. Here M/m = 1, ωn = 100 (rad/s), ζ = 0.5, and ω0 = 30 (rad/s). The Bode plots for negative virtual mass have 180° lower phase at every frequency. The sample rate affects the gain margin greatly for positive virtual mass, but not for negative virtual mass.

Fig. 5. Bode plots of L(s) with positive virtual mass M for various values of ωn. Here M/m = 1, ζ = 0.5, ω0 = 30 (rad/s), and the sample rate is 1000 Hz. The Bode plots for negative virtual mass M have 180° lower phase at every frequency. For positive virtual mass ωn and ζ do not affect the gain margin of L(s) as long as the phase transition of the second order system reaches its asymptote before ω*, but for negative virtual mass they do affect the gain margin.

Fig. 6. Examining M-width: Stability boundaries from analysis and numerical simulation. Lines mark boundaries between stable and unstable behavior; the origin side is stable. Given the parameters in Table II, a cut-off frequency of the low-pass filter, f0 (Hz), and sample rate, the plots display the range of desired virtual mass to device mass ratios, M/m, that are stable. The maximum minus the minimum stable value of virtual to device ratio scaled by the device mass is the M-width.

Fig. 7. Bode plots of the loop polynomials for the continuous and discrete systems. The plots are similar, but the discrete system has more phase lag at higher frequencies.

B. Stability Boundaries (M-width)

In this section, analytical stability boundaries are formed using the continuous and discrete models. Stability boundaries are also formed using numerical simulation. Figure 6 displays stability boundaries for a family of systems. The maximum minus the minimum stable value of the (desired) virtual to device mass ratio, M/m, scaled by the device mass, gives the M-width of the system: the dynamic range of virtual masses renderable in a stable manner. The maximum stable value for rendering positive mass is predominately determined by the sample time. For rendering negative inertia, the maximum stable value is mainly determined by ωn and ζ. Reducing the cut-off frequency of the filter increases the M-width.

C. Continuous and Discrete Model Comparison

How do the continuous and discrete models compare? We now examine whether the stability intuition that we get from the continuous model is applicable to the discrete model, which shares the same stability boundaries as the hybrid one. Figure 7 displays the Bode plots of L(s) and L(z) for the continuous and discrete models, respectively. The system parameter values are the same as in Figure 4 with a sample rate of 1000 Hz. The Bode plots are very similar, but the discrete model has slightly more phase lag at higher frequencies. Thus, the continuous model will appear stable for parameters at which the discrete model is unstable. However, the general effects of parameters for the models are similar, so the intuitive parameter guidelines apparent in the continuous model are still useful.
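The statement that the hybrid and discrete models share identical input-output characteristics rests on the zero-order-hold (step-invariant) equivalence in (10). A minimal numerical sketch, assuming the Table II parameters and a state-space realization of G(s), checks that the ZOH-discretized model reproduces the continuous step response exactly at the sample instants:

```python
import numpy as np

# ZOH (step-invariant) discretization of G(s) = 1/(m s^2 + (b+bh) s + kh),
# Eq. (10): the discrete model matches the continuous step response exactly
# at the sample instants. Parameter values follow Table II.
m, b, bh, kh = 0.070, 0.005, 5.0, 700.0
T = 1e-3

A = np.array([[0.0, 1.0], [-kh / m, -(b + bh) / m]])
B = np.array([[0.0], [1.0 / m]])

def expm(S, squarings=20, terms=25):
    """Matrix exponential by scaling-and-squaring with a Taylor series."""
    S = S / 2.0**squarings
    E = np.eye(S.shape[0])
    P = np.eye(S.shape[0])
    for k in range(1, terms):
        P = P @ S / k
        E = E + P
    for _ in range(squarings):
        E = E @ E
    return E

# Exact ZOH equivalent: Ad = e^{AT}, Bd = A^{-1}(Ad - I)B.
Ad = expm(A * T)
Bd = np.linalg.solve(A, (Ad - np.eye(2)) @ B)

# Discrete step response (unit force held constant by the ZOH).
N = 500
x = np.zeros((2, 1))
xd = []
for _ in range(N):
    xd.append(x[0, 0])
    x = Ad @ x + Bd

# Closed-form continuous step response of the underdamped second-order system.
wn = np.sqrt(kh / m)
zeta = (b + bh) / (2 * np.sqrt(kh * m))
wd = wn * np.sqrt(1 - zeta**2)
t = np.arange(N) * T
xc = (1 / kh) * (1 - np.exp(-zeta * wn * t)
                 * (np.cos(wd * t) + zeta * wn / wd * np.sin(wd * t)))

err = np.max(np.abs(np.array(xd) - xc))
print(f"max sample-instant mismatch: {err:.2e} m")
```

The mismatch is at numerical-precision level, which is exactly the "input-output characteristics are the same" property; the extra phase lag of the discrete model discussed above appears only between sample instants.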
IV. ACCURACY OF HAPTIC RENDERING

We are interested not only in stability, but also in the accuracy of the haptic rendering. We investigated this using two different approaches. The first approach is to analyze the input-output relationship from the position of the human Xh(s) to the force of the human Fh(s). The second approach is to perform a system identification on the simulated system assuming a model of pure mass. This results in an estimate for the mass of the system, and this estimate can be compared to the ideal case.

A. Accuracy Analysis by Transfer Function Comparisons

We now analyze the accuracy of the haptic rendering by comparing the admittance of three systems using the continuous model. The first system is the ideal case in which the operator feels only a mass. The value of this mass is the sum of the device mass and virtual mass:

    Gideal(s) = Fh(s)/Xh(s) = (m + M)s².    (16)

The second system is the transfer function of the device alone:

    Gdevice(s) = Fh(s)/Xh(s) = bh s (s + b/m)(s + kh/bh) / (s² + 2ζωn s + ωn²).    (17)

The third is the transfer function of the device with the haptic loop implementing a virtual mass:

    Gsys(s) = Fh(s)/Xh(s) = (bh s + kh)(m s² + b s + D(s)H(s)M) / (m s² + (b + bh)s + kh + D(s)H(s)M).    (18)

A reasonable metric for the accuracy of the rendering is how the Bode plot of the full system (the third case) compares to the other two. For high-quality haptic rendering, the system plot should be more similar to the first case (ideal). To aid this analysis, we introduce the effective mass ratio and effective damping ratio:

    effective mass ratio = (1/m) · (−Re{G(s)} / ω²),    (19)

    effective damping ratio = (1/(b + bh)) · (Im{G(s)} / ω).    (20)

These quantities can be plotted against frequency to identify components of the system response corresponding to mass and damping. The plots contain the same information as the Bode plots, but allow for an easier comparison.

Figure 8 shows the Bode and effective mass and damping ratio plots of a family of systems where the human and device parameters used are those listed in Table II, the cut-off frequency of the low-pass filter, f0, is 10 (Hz), and the sample rate is 1000 Hz. The system with full feedback behaves similarly to the ideal system for frequencies less than the cut-off frequency of the filter, and similarly to the device system for larger frequencies. This analysis shows that the cut-off frequency of the filter plays a powerful role in determining the accuracy of the rendering. The higher the cut-off frequency, corresponding to less aggressive filtering, the larger the frequency span in which the system behaves like the ideal.

Fig. 8. Measuring accuracy of haptic rendering using transfer functions. (a) Bode and effective mass and damping ratio plots of three systems with positive virtual mass M = 3m. (b) Bode and effective mass and damping ratio plots of three systems with negative virtual mass M = −0.5m. The system with full feedback behaves similar to the ideal system for frequencies less than the cut-off frequency of the filter, and similar to the device system for frequencies past the cut-off.

B. Accuracy Analysis by System Identification

Another way to measure the accuracy of haptic rendering is to compare the perceived mass of the system to the ideal case. Ideally, at every instant of time the system should follow

    fh(t) = (m + M)ẍ,    (21)

where ẍ is the acceleration of the device. We can form an estimate for the perceived mass of the system, m̂, by performing a system identification on simulated data assuming a model of pure mass. We can compare this estimate to the ideal case, (m + M). Numerical simulation of the system can yield numerous "measurements" for model fitting:

    [ẍ] m̂ = [Fh],    (22)

where m̂ is a scalar and [ẍ] and [Fh] are vectors of the samples. The best estimate of m̂, in the least squares sense, is

    m̂ = ([ẍ]ᵀ[Fh]) / ([ẍ]ᵀ[ẍ]).    (23)
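The least-squares identification of Eqs. (22)–(23) can be sketched on synthetic data. Here the "simulated" force signal is an assumption for illustration (ideal mass plus a small unmodeled damping component, excited by a 0.1–5 Hz chirp as in the paper's simulation), not the authors' haptic-loop simulation:

```python
import numpy as np

# Least-squares mass estimate, Eqs. (22)-(24), on synthetic measurements.
# The signal model below (pure mass plus small damping) is an assumption.
m, M = 0.070, 3 * 0.070            # device mass and commanded virtual mass (kg)
fs, dur = 1000, 10.0               # sample rate (Hz) and duration (s)
t = np.arange(0, dur, 1.0 / fs)

f_inst = 0.1 + (5.0 - 0.1) * t / dur          # 0.1 -> 5 Hz chirp frequency
phase = 2 * np.pi * np.cumsum(f_inst) / fs    # numerically integrated phase
x = 0.01 * np.sin(phase)                      # position (m)
xddot = np.gradient(np.gradient(x, t), t)     # acceleration (m/s^2)

# "Measured" human force: ideal mass term plus a small unmodeled damping term.
fh = (m + M) * xddot + 0.05 * np.gradient(x, t)

m_hat = (xddot @ fh) / (xddot @ xddot)        # Eq. (23), least squares
error = (m_hat - (m + M)) / (m + M)           # Eq. (24), signed relative error
print(f"m_hat = {m_hat * 1e3:.1f} g, error = {error:+.3%}")
```

Because the damping component is nearly orthogonal to the acceleration over the chirp, the estimator recovers m + M closely; larger unmodeled dynamics would bias m̂ in exactly the way Figure 9 quantifies for the full haptic loop.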
We define the error as the difference between the estimate and the ideal value scaled by the ideal value:

    error = (m̂ − (m + M)) / (m + M).    (24)

If the error is negative, the system would feel less massive than desired, and if the error is positive, the system would feel more massive than desired. This method outputs a scalar for the error by combining errors in magnitude and phase, and as such has no physical meaning. Despite this, it is a useful metric to establish conditions in which the commanded and rendered system mass are similar or different.

Figure 9 shows the error in mass rendering as a function of cut-off frequency for various mass ratios. The motion input to the simulated system was a chirp signal with frequencies of 0.1 to 5 Hz. For both positive and negative mass rendering, as the cut-off frequency of the filter is increased, corresponding to less aggressive filtering, the perceived mass approaches the ideal. Also, generally, the larger the mass ratio, the larger the error. For this system, the human parameters and sample period did not significantly affect the simulation.

Fig. 9. The error of the estimate of the mass of the system, m̂, with respect to the ideal case, (m + M), versus the cut-off frequency of the low-pass filter, f0 (Hz), for various mass ratios. (a) The case of positive virtual mass. (b) The case of negative virtual mass.

V. DISCUSSION

A. Design Trade-offs

Our results illustrate the classic trade-off between stability and accuracy of haptic rendering. For the case of mass rendering, we demonstrated that accurate rendering can be achieved for a small ratio of virtual mass to device mass and a high cut-off frequency of the low-pass filter. However, these requirements generate restrictive stability bounds, as shown in Figure 6. In practice, an upper bound for the cut-off frequency of the low-pass filter exists for noise reduction. This results in a rendering accuracy limit based on the noise present in acceleration measurement.

For systems in which ωn ≪ ω*, we established that sampling rate is an important parameter for implementing positive virtual mass, but not negative virtual mass. For positive virtual mass, the stability of the system is largely determined by the delay in the haptic feedback loop. For negative virtual mass, the stability of the system is largely determined by the user behavior, so the time delay introduced by sampling has a small relative effect.

The M-width of a haptic device scales with the device mass. This means that to display large virtual mass, large device mass is desired. This is at odds with a primary goal of haptic device design: to create mechanisms that result in minimal free-space dynamics. It is possible that the inertia of a more massive device could be canceled, but mass compensation stability is highly dependent on user behavior. For haptic environments that aim to render large virtual inertia, such as the rehabilitation scenario presented in Figure 1, this presents a design challenge, which we would like to circumvent through the use of clever controllers and improved sensing.

With the results of this paper, we can now consider various changes to the haptic system design, including mechanical parameters of the device, position and/or acceleration sensors, filters, and novel controllers, and predict their effects on stability and accuracy of mass rendering. For applications in which perception of inertia (as opposed to display of specific inertia values) is the goal, we can also consider the use of multimodal feedback to alter inertia perception. Dominjon et al. [8] found that the ratio of the amplitude of movements of a user's hand to the amplitude of a virtual cursor, as well as gravity, play a large role in inertia perception.

B. Influence of the Human on Stability

The parameters of the system affected by the human mechanical properties are ωn and ζ. These parameters, which also depend on the physical properties of the haptic device, increase with increased human impedance or co-contraction, as occurs when an operator strongly grasps a haptic device. For positive virtual mass, if ωn ≪ ω*, the human does not contribute to stability. As ωn gets closer to ω*, however, ωn and ζ affect stability more strongly. This can be interpreted in the following way: if the user is grasping weakly, then the stability of the system does not depend on the user. However, as the user grasps the haptic device more strongly, the values of ωn and ζ increase, and the system becomes more stable because of the contribution of the human. For positive virtual mass the system has a "baseline" stability without any interaction from the human, that can only be improved by the user. This is not the case for negative virtual mass, where ωn and ζ affect stability directly. This confirms the results of Gil et al. [11]:
that the mechanical properties of the operator strongly affect stability. For example, if the user grasps the haptic device weakly, the system can have a strong resonance peak at ωn which could make the system go unstable for very small amounts of mass compensation. This effect is shown in Figure 10, which displays the effect of ζ on the Bode plot of L(s).

Fig. 10. Bode plots of L(s) with positive virtual mass M for various values of ζ. Here M/m = 1, ωn = 100 (rad/s), ω0 = 30 (rad/s), and the sample rate is 1000 Hz. The value of ζ does not affect the stability of the system for positive virtual mass, but has an extreme effect on stability for negative virtual mass.

C. Model Choice

Many models could be used to describe a haptic system for inertia rendering; our choice of the model shown in Figure 2 was driven by relevance as a practical design tool, tractability for finding analytical solutions, and our ideas about the most significant factors in system stability and accuracy. In future work, we may wish to capture additional system properties, such as quantization of signals occurring due to A/D and D/A conversions and encoder position measurements. Quantization is a non-linear phenomenon, and in position measurement it can lead to sustained oscillations and instability in the system [1, 7]. Nonlinear Coulomb friction has also been shown to have a substantial effect on the passivity of haptic devices. Comparing our theoretical and simulation results to the stability and accuracy of haptic rendering on a variety of haptic devices will determine the broad applicability of our approach and whether improved models are needed.

VI. CONCLUSION

The main results of our analysis are the identification of important parameters for system stability, stability boundaries (M-width), and analysis of the accuracy of mass rendering using impedance-type haptic devices. We verify our analytical results in numerical simulations, while recognizing that experiments with a variety of haptic devices are needed to validate the broad applicability of our approach. Our results could serve as a design tool for creating virtual environments, controllers, and device/sensor designs to improve rendering of virtual mass. This could apply to inertia compensation for more massive haptic interfaces, as well as rendering increased or decreased inertia in rehabilitation and motor training scenarios.

REFERENCES

[1] J. J. Abbott and A. M. Okamura. Effects of position quantization and sampling rate on virtual-wall passivity. IEEE Transactions on Robotics, 21(5):952–964, 2005.
[2] R. J. Adams and B. Hannaford. Stable haptic interaction with virtual environments. IEEE Transactions on Robotics and Automation, 15(3):465–474, 1999.
[3] G. Aguirre-Ollinger, J. E. Colgate, M. A. Peshkin, and A. Goswami. Design of an active one-degree-of-freedom lower-limb exoskeleton with inertia compensation. The International Journal of Robotics Research, 30(4):486–499, 2010.
[4] J. M. Brown and J. E. Colgate. Minimum mass for haptic display simulations. In Proc. ASME Dynamic Systems and Control Division, pages 249–256, 1998.
[5] J. E. Colgate and J. M. Brown. Factors affecting the Z-width of a haptic display. In Proc. IEEE International Conference on Robotics and Automation, pages 3205–3210, 1994.
[6] J. E. Colgate and G. G. Schenkel. Passivity of a class of sampled-data systems: Application to haptic interfaces. Journal of Robotic Systems, 14(1):37–47, 1997.
[7] N. Diolaiti, G. Niemeyer, F. Barbagli, and J. K. Salisbury. Stability of haptic rendering: Discretization, quantization, time delay, and Coulomb effects. IEEE Transactions on Robotics, 22(2):256–268, 2006.
[8] L. Dominjon, A. Lécuyer, J. M. Burkhardt, P. Richard, and S. Richir. Influence of control/display ratio on the perception of mass of manipulated objects in virtual environments. In Proceedings of IEEE Virtual Reality, pages 19–25, 2005.
[9] G. F. Franklin, D. J. Powell, and A. Emami-Naeini. Feedback Control of Dynamic Systems. Pearson Prentice Hall, 5th edition, 2006.
[10] J. J. Gil, A. Avello, A. Rubio, and J. Florez. Stability analysis of a 1 DOF haptic interface using the Routh-Hurwitz criterion. IEEE Transactions on Control Systems Technology, 12(4):583–588, 2004.
[11] J. J. Gil, A. Rubio, and J. Savall. Decreasing the apparent inertia of an impedance haptic device by using force feedforward. IEEE Transactions on Control Systems Technology, 17(4):833–838, 2009.
[12] B. Hannaford and J.-H. Ryu. Time-domain passivity control of haptic interfaces. IEEE Transactions on Robotics, 18(1):1–10, 2002.
[13] K. J. Kuchenbecker, J. G. Park, and G. Niemeyer. Characterizing the human wrist for improved haptic interaction. In Proc. ASME Int. Mechanical Engineering Congress and Exposition, volume 2, 2003. Paper number 42017.
[14] M. Mahvash and A. M. Okamura. Enhancing transparency of a position-exchange teleoperator. In World Haptics, pages 470–475, 2007.
Contextual Sequence Prediction with
Application to Control Library Optimization
Debadeepta Dey∗, Tian Yu Liu∗, Martial Hebert∗ and J. Andrew Bagnell∗
∗The Robotics Institute
Carnegie Mellon University, Pittsburgh, PA
Email: (debadeep, tianyul, hebert, dbagnell)@ri.cmu.edu

Abstract—Sequence optimization, where the items in a list are ordered to maximize some reward, has many applications such as web advertisement placement, search, and control libraries in robotics. Previous work in sequence optimization produces a static ordering that does not take any features of the item or context of the problem into account. In this work, we propose a general approach to order the items within the sequence based on the context (e.g., perceptual information, environment description, and goals). We take a simple, efficient, reduction-based approach where the choice and order of the items is established by repeatedly learning simple classifiers or regressors for each "slot" in the sequence. Our approach leverages recent work on submodular function maximization to provide a formal regret reduction from submodular sequence optimization to simple cost-sensitive prediction. We apply our contextual sequence prediction algorithm to optimize control libraries and demonstrate results on two robotics problems: manipulator trajectory prediction and mobile robot path planning.

I. INTRODUCTION

Optimizing the order of a set of choices is fundamental to many problems such as web search, advertisement placements as well as in robotics and control. Relevance and diversity are important properties of an optimal ordering or sequence. In web search, for instance, if the search term admits many different interpretations then the results should be interleaved with items from each interpretation [19]. Similarly in advertisement placement on web pages, advertisements should be chosen such that within the limited screen real estate they are diverse yet relevant to the page content. In robotics, control libraries have the same requirements for relevance and diversity in the ordering of member actions [7]. In this paper, we apply sequence optimization to develop near-optimal control libraries taking into account the context of the environment. Examples of control actions include grasps for manipulation, trajectories for mobile robot navigation or seed trajectories for initializing a local trajectory optimizer.

Control libraries are a collection of control actions obtained by sampling a useful set of often high dimensional control trajectories or policies. In the case of control libraries, a sequence refers to a ranked list of control action choices rather than a series of actions to be taken. Examples of control libraries include a collection of feasible grasps for manipulation [5], a collection of feasible trajectories for mobile robot navigation [11], and a collection of expert-demonstrated trajectories for a walking robot (Stolle et al. [23]). Similarly, recording demonstrated trajectories of experts aggressively flying unmanned aerial vehicles (UAVs) has enabled dynamically feasible trajectories to be quickly generated by concatenating a suitable subset of stored trajectories in the control library [10].

Such libraries are an effective means of spanning the space of all feasible control actions while at the same time dealing with computational constraints. The performance of control libraries on the specified task can be significantly improved by careful consideration of the content and order of actions in the library. To make this clear let us consider specific examples:

Mobile robot navigation. In mobile robot navigation the task is to find a collision-free, low cost of traversal path which leads to the specified goal on a map. Since sensor horizons are finite and robots usually have constrained motion models and non-trivial dynamics, a library of trajectories respecting the dynamic and kinematic constraints of the robot are precomputed and stored in memory. This constitutes the control library. It is desired to sample a subset of trajectories at every time step so that the overall cost of traversal of the robot from start to goal is minimized.

Trajectory optimization. Local trajectory optimization techniques are sensitive to initial trajectory seeds. Bad initializations may lead to slow optimization, suboptimal performance, or even remain in collision. Here the control actions are end-to-end trajectory seeds that act as input to the optimization. Zucker [31], Jetchev et al. [13] and Dragan et al. [8] proposed methods for predicting trajectories from a precomputed library using features of the environment, yet these methods do not provide recovery methods when the prediction fails. Having a sequence of initial trajectory seeds provides fallbacks should earlier ones fail.

Grasp libraries. During selection of grasps for an object, a library of feasible grasps can be evaluated one at a time until a collision-free, reachable grasp is found. While a naïve ordering of grasps can be based on force closure and stability criteria [2], if a grasp is unachievable, then grasps similar to it are also likely to fail. A more principled ordering approach which takes into account features of the environment can reduce depth of the sequence that needs to be searched by having diversity in higher ranked grasps.

Current state-of-the-art methods in the problems we address either predict only a single control action in the library that has the highest score for the current environment, or use an
ad-hoc ordering of actions such as random order or by past rate of success. If the predicted action fails then systems (e.g. manipulators and autonomous vehicles) are unable to recover or have to fall back on some heuristic/hard-coded contingency plan. Predicting a sequence of options to evaluate is necessary for having intelligent, robust behavior. Choosing the order of evaluation of the actions based on the context of the environment leads to more efficient performance.

A naïve way of predicting contextual sequences would be to train a multi-class classifier over the label space consisting of all possible sequences of a certain length. This space is exponential in the number of classes and sequence length, posing information theoretic difficulties. A more reasonable method would be to use the greedy selection technique by Streeter et al. [24] over a hypothesis space of predictors, which is guaranteed to yield sequences within a constant factor of the optimal sequence. Implemented naïvely, this remains expensive as it must explicitly enumerate the hypothesis space. Our simple reduction based approach trains multiple multi-class classifiers/regressors to mimic greedy selection given features of the environment while efficiently retaining the performance guarantees of the greedy solution.

Perception modules using sensors such as cameras and lidars are part and parcel of modern robotic systems. Leveraging such information in addition to the feedback of success or failure is conceptually straightforward: instead of considering a sequence of control actions, we consider a sequence of classifiers which map features X to control actions A, and attempt to find the best such classifier at each slot in the control action sequence. By using contextual features, our method has the benefit of closing the loop with perception while maintaining the performance guarantees in Streeter et al. [24]. Alternate methods to produce contextual sequences include the independent work of Yue et al. [30] which attempts to learn submodular functions and then optimize them. Instead, our approach directly attempts to optimize the predicted sequence.

The outlined examples present loss functions that depend only on the "best" action in the sequence, or attempt to minimize the prediction depth to find a satisfactory action. Such loss functions are monotone, submodular – i.e., ones with diminishing returns.¹ We define these functions in Section II and review the online submodular function maximization approach of Streeter et al. [24]. We also describe our contextual sequence optimization (CONSEQOPT) algorithm in detail. Section III shows our algorithm's performance improvement over alternatives for local trajectory optimization for manipulation and in path planning for mobile robots.

Our contributions in this work are:
• We propose a simple, near-optimal reduction for contex-
search.
• The application of this technique to the contextual optimization of control libraries. We demonstrate the efficacy of the approach on two important problems: robot manipulation planning and mobile robot navigation. Using the sequence of actions generated by our approach we observe improvement in performance over sequences generated by either random ordering or decreasing rate of success of the actions.
• Our algorithm is generic and can be naturally applied to any problem where ordered sequences (e.g., advertisement placement, search, recommendation systems, etc.) need to be predicted and relevance and diversity are important.

II. CONTEXTUAL OPTIMIZATION OF SEQUENCES

A. Background

The set of all possible actions, which we term the mother set, is denoted V. Each action is denoted by a ∈ V.² Formally, a function f : S → ℜ⁺ is monotone submodular for any sequence S ∈ S, where S is the set of all sequences of actions, if it satisfies the following two properties:
• (Monotonicity) for any sequences S1, S2 ∈ S, f(S1) ≤ f(S1 ⊕ S2) and f(S2) ≤ f(S1 ⊕ S2);
• (Submodularity) for any sequences S1, S2 ∈ S and any action a ∈ V, f(S1 ⊕ S2 ⊕ a) − f(S1 ⊕ S2) ≤ f(S1 ⊕ a) − f(S1);
where ⊕ denotes order-dependent concatenation of sequences. These imply that the function always increases as more actions are added to the sequence (monotonicity), but the gain obtained by adding an action to a larger pre-existing sequence is less as compared to addition to a smaller pre-existing sequence (submodularity).

For control library optimization, we attempt to optimize one of two possible criteria: the cost of the best action a in a sequence (with a budget on sequence size) or the time (depth in sequence) to find a satisficing action. For the former, we consider the function

    f ≡ (No − min(cost(a1), cost(a2), . . . , cost(aN))) / No,    (1)

where cost is an arbitrary cost on an action (ai) given an environment and No is a constant, positive normalizer which is the highest cost.³ Note that f takes in as arguments the sequence of actions a1, a2, . . . , aN directly, but is also implicitly dependent on the current environment on which the actions are evaluated in cost(ai). Dey et al. [6] prove that this criterion is monotone submodular in sequences of control actions and can
be maximized– within a constant factor– by greedy approaches
tual sequence optimization. Our approach moves from similar to Streeter et al. [24].
predicting a single decision based on features to making 2 In this work we assume that each action choice takes the same time to
a sequence of predictions, a problem that arises in many execute although the proposed approach can be readily extended to handle
domains including advertisement prediction [25, 19] and different execution times.
3 For mobile robot path planning, for instance, cost(a ) is typically a
i
simple measure of mobility penalty based on terrain for traversing a trajectory
1 For more information on submodularity and optimization of submodular ai sampled from a set of trajectories and terminating in a heuristic cost-to-go
functions we refer readers to the tutorial [4]. estimate, compute by, e.g. A*.
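The two optimization criteria above can be made concrete with a small numeric sketch. The following toy example (the action names and costs are invented for illustration, not taken from the paper) implements the normalized objective of equation (1) and checks the monotonicity and diminishing-returns properties on a hand-built action library:

```python
# Toy illustration of the objective in equation (1):
# f(S) = (N_o - min_{a in S} cost(a)) / N_o.
# The action costs below are invented for illustration only.

def f(sequence, cost, N_o):
    """Normalized value of the best (lowest-cost) action seen so far."""
    if not sequence:
        return 0.0
    return (N_o - min(cost[a] for a in sequence)) / N_o

cost = {"a1": 9.0, "a2": 4.0, "a3": 6.0, "a4": 1.0}
N_o = max(cost.values())  # constant positive normalizer: the highest cost

S1 = ["a1"]
S2 = ["a1", "a3"]

# Monotonicity: appending actions never decreases f.
assert f(S1, cost, N_o) <= f(S2, cost, N_o) <= f(S2 + ["a4"], cost, N_o)

# Submodularity (diminishing returns): the marginal gain of appending "a4"
# to the longer sequence S2 is no larger than appending it to the prefix S1.
gain_on_S1 = f(S1 + ["a4"], cost, N_o) - f(S1, cost, N_o)
gain_on_S2 = f(S2 + ["a4"], cost, N_o) - f(S2, cost, N_o)
assert gain_on_S2 <= gain_on_S1
```

Greedy maximization of such an f, one slot at a time, is exactly the behavior that the per-slot classifiers and regressors of Section II-C are trained to imitate.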
For the latter optimization criterion, which arises in grasping and trajectory seed selection, we define the monotone submodular loss function f : S → [0, 1] as f ≡ P(S), where P(S) is the probability of successfully grasping an object using the sequence of grasps provided. It is easy to check [7] that this function is also monotone and submodular, as the probability of success always increases as we consider additional elements. Minimizing the depth in the control library to be evaluated becomes our goal. In the rest of the paper all objective functions are assumed to be monotone submodular unless noted otherwise.

While optimizing these over library actions is effective, the ordering of actions does not take into account the current context. People do not attempt to grasp objects based only on previous performance of grasps: they take into account the position and orientation of the object, the proximity and arrangement of clutter around the object, and also their own position relative to the object in the current environment.

B. Our Approach

We consider functions that are submodular over sequences of classifiers that take as input environment features X and map to control actions V. Additionally, by considering many environments, the expectation of f in equation (1) over these environments also maintains these properties. In our work, we always consider the expected loss averaged over a (typically empirical) distribution of environments.

In Algorithm 1, we present a simple approach for learning such a near-optimal contextual control library.

Fig. 1: Schematic of training a sequence of classifiers for regret reduction of contextual sequence optimization to multi-class, cost-sensitive classification. Each slot has a dedicated classifier which is responsible for predicting the action with the maximum marginal benefit for that slot.

C. Algorithm for Contextual Submodular Sequence Optimization

Figure 1 shows the schematic diagram for Algorithm 1, which trains a classifier for each slot of the sequence. Define matrix X to be the set of features from a distribution of example environments (one feature vector per row). Let each feature vector contain L attributes. Let D be the set of example environments containing |D| examples. The size of X is |D| × L. We denote the ith classifier by π_i. Define ML_i to be the matrix of marginal losses for each environment for the ith slot of the sequence. In the parlance of cost-sensitive learning, ML_i is the example-dependent cost matrix. ML_i is of dimension |D| × |V|. Each row of ML_i contains, for the corresponding environment, the loss suffered by the classifier for selecting a particular action a ∈ V. The most beneficial action has 0 loss while others have non-zero losses. These losses are normalized to be within [0, 1]. We detail how to calculate the entries of ML_i below. Classifier inputs are the set of feature vectors X for the dataset of environments and the marginal loss matrix ML_i.

For ease of understanding, let us walk through the training of the first two classifiers π_1 and π_2.

Consider the first classifier training in Figure 1 and its inputs X and ML_1, and consider the first row of ML_1. Each element of this row corresponds to the loss incurred if the corresponding action in V were taken on the environment whose features are in the first row of X. The best action has 0 loss while all the others have relative losses in the range [0, 1] depending on how much worse they are compared to the best action. This way the rest of the rows in ML_1 are filled out. The cost-sensitive classifier π_1 is then trained, attempting to find the classifier which minimizes the expected multi-class cost-sensitive loss. The set of features X from each environment in the training set is again presented to it to classify. The output is a vector Y_{π_1} of length |D| which contains the selected action for the 1st slot for each environment. As no element of the hypothesis class performs perfectly, this results in a Y_{π_1} where not every environment had the 0-loss action picked.

Consider the second classifier training in Figure 1, and the first row of ML_2. Suppose control action 13 was selected by classifier π_1 in the classification step for the first environment, providing a gain of 0.6 to the objective function f, i.e., f([13]) = 0.6. For each of the control actions a present in the library V, find the action which provides maximum marginal improvement, i.e., a_max = argmax_a (f([13, a]) − f([13])) = argmax_a (f([13, a]) − 0.6). Additionally, convert the marginal gains computed for each a in the library to proportional losses and store them in the first row of ML_2: if a_max is the action with the maximum marginal gain, then the loss for each of the other actions is f([13, a_max]) − f([13, a]). a_max has 0 loss while other actions have ≥ 0 loss. The rest of the rows
Algorithm 1 Algorithm for training ConSeqOpt using classifiers
Input: sequence length N, multi-class cost-sensitive classifier routine π, dataset D of |D| environments and associated features X, library of control actions V
Output: sequence of classifiers π_1, π_2, . . . , π_N
1: for i = 1 to N do
2:   ML_i ← computeTargetActions(X, Y_{π_1,π_2,...,π_{i−1}}, V)
3:   π_i ← train(X, ML_i)
4:   Y_{π_i} ← classify(X)
5: end for

Algorithm 2 Algorithm for training ConSeqOpt using regressors
Input: sequence length N, regression routine ℜ, dataset D of |D| environments, library of control actions V
Output: sequence of regressors ℜ_1, ℜ_2, . . . , ℜ_N
1: for i = 1 to N do
2:   X_i, MB_i ← computeFeatures&Benefit(D, Y_{ℜ_1,ℜ_2,...,ℜ_{i−1}}, V)
3:   ℜ_i ← train(X_i, MB_i)
4:   M̃B_i ← regress(X_i, ℜ_i)
5:   Y_{ℜ_i} ← argmax(M̃B_i)
6: end for
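The training loop of Algorithm 1 can be sketched in a few lines of code. Everything below is a hypothetical stand-in for illustration; the helper names and the nearest-neighbour "classifier" are ours, not the authors' implementation. Each slot i receives the marginal-loss matrix ML_i induced by the previous slots' predictions and trains its own cost-sensitive classifier:

```python
# Minimal sketch of Algorithm 1 (illustrative names, not the authors' code).
# Each slot i builds its marginal-loss matrix ML_i from the previous slots'
# predicted prefixes, then trains a dedicated cost-sensitive classifier.

def marginal_losses(f, chosen, library):
    """One row of ML_i: loss of each action relative to the action of
    maximum marginal benefit, given the already-chosen prefix."""
    gains = {a: f(chosen + [a]) - f(chosen) for a in library}
    best = max(gains.values())
    return {a: best - g for a, g in gains.items()}  # best action gets 0 loss

class NearestNeighbourCS:
    """Toy cost-sensitive classifier: memorize each training environment's
    lowest-loss action and predict via nearest features."""
    def fit(self, X, ML):
        self.examples = [(x, min(ml, key=ml.get)) for x, ml in zip(X, ML)]
    def predict(self, x):
        dist = lambda ex: sum((u - v) ** 2 for u, v in zip(ex[0], x))
        return min(self.examples, key=dist)[1]

def train_conseqopt(envs, X, library, make_f, N):
    """envs: per-environment data; X: one feature vector per environment;
    make_f(env) -> monotone submodular objective over action sequences."""
    classifiers, prefixes = [], [[] for _ in envs]
    for i in range(N):
        ML = [marginal_losses(make_f(e), p, library)
              for e, p in zip(envs, prefixes)]
        clf = NearestNeighbourCS()
        clf.fit(X, ML)
        classifiers.append(clf)
        for p, x in zip(prefixes, X):  # roll in with this slot's predictions
            p.append(clf.predict(x))
    return classifiers

# Two hypothetical environments whose best action differs with context:
costs = [{"a": 1.0, "b": 9.0}, {"a": 9.0, "b": 1.0}]
X = [(0.0,), (1.0,)]

def make_f(c):
    hi = max(c.values())
    return lambda seq: 0.0 if not seq else (hi - min(c[a] for a in seq)) / hi

clfs = train_conseqopt(costs, X, ["a", "b"], make_f, N=2)
print([clfs[0].predict(x) for x in X])  # first slot adapts to context: ['a', 'b']
```

Algorithm 2 replaces the per-slot classifier with a squared-loss regressor over per-action features and picks the argmax of the predicted marginal benefits, which is what allows new actions to be added to the library without retraining.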
are filled up similarly. π_2 is trained and evaluated on the same dataset to produce Y_{π_2}.

This procedure is repeated for all N slots, producing a sequence of classifiers π_1, π_2, . . . , π_N. The idea is that a classifier must suffer a high loss when it chooses a control action which provides little marginal gain when a higher-gain action was available. Any cost-sensitive multi-class classifier may be used [16].

At test time, for a given environment, features are extracted, and the classifier associated with each slot of the sequence outputs a control action to fill the slot.⁴

The training procedure is formalized in Algorithm 1. In computeTargetActions, the previously detailed procedure for calculating the entries of the marginal loss matrix ML_i for the ith slot is carried out, followed by the training step in train and the classification step in classify.

Algorithm 2 has a similar structure to Algorithm 1. This alternate formulation has the advantage of being able to add actions to the control library without retraining the sequence of classifiers. Instead of directly identifying a target class, we use a squared-loss regressor in each slot to produce an estimate of the marginal benefit of each action at that particular slot. Hence MB_i is a |D| × |V| matrix of the actual marginal benefits, computed in a similar fashion to ML_i of Algorithm 1, and M̃B_i is the estimate given by our regressor at the ith slot. In line 2 we compute the feature matrix X_i. In this case, a feature vector is computed per action per environment, and uses information from the previous slots' target choices Y_{ℜ_i}. For feature vectors of length L, X_i has dimensions |D||V| × L. The features and marginal benefits at the ith slot are used to train regressor ℜ_i, producing the estimate M̃B_i. We then pick the action a which produces the maximum M̃B_i to be our target choice Y_{ℜ_i}, a length-|D| vector of indices into V, one for each environment.

D. Reduction Argument

We establish a formal regret reduction [3] between cost-sensitive multi-class classification error and the resulting error on the learned sequence of classifiers. Specifically, we demonstrate that if we consider the control actions to be the classes and train a series of classifiers, one for each slot of the sequence, on the features of a distribution of environments, then we can produce a near-optimal sequence of classifiers. This sequence of classifiers can be invoked to approximate the greedy sequence constructed by allowing additive error in equation (3).

Theorem 1. If each of the classifiers (π_i) trained in Algorithm 1 achieves multi-class cost-sensitive regret of r_i, then the resulting sequence of classifiers is within at least (1 − 1/e) max_{S∈S} f(S) − ∑_{i=1}^N r_i of the optimal such sequence of classifiers S from the same hypothesis space.⁵

Proof: (Sketch) Define the loss of a multi-class, cost-sensitive classifier π over a distribution of environments D as l(π, D). Each example can be represented as (x_n, m_n^1, m_n^2, m_n^3, . . . , m_n^{|V|}), where x_n is the set of features representing the nth example environment and m_n^1, m_n^2, m_n^3, . . . , m_n^{|V|} are the per-class costs of misclassifying x_n. m_n^1, m_n^2, m_n^3, . . . , m_n^{|V|} are simply the nth row of ML_i (which corresponds to the nth environment in the dataset D). The best class has a 0 misclassification cost while the others are greater than or equal to 0 (there might be multiple actions which will

⁴Repetition of actions with respect to previous slots is prevented by using classifiers which individually output a ranked list of actions and filling the slot with the highest-ranked action which is not repeated in the previous slots. This sequence can then be evaluated as usual.
⁵When the objective is to minimize the time (depth in sequence) to find a satisficing element, the resulting sequence of classifiers satisfies ∫₀^∞ (1 − f(Ŝ_n)) dn ≤ 4 ∫₀^∞ (1 − max_{S∈S} f(S_n)) dn + ∑_{i=1}^N r_i.
yield equal marginal benefit). Classifiers generally minimize the expected loss

    l(π, D) = E_{(x_n, m_n^1, m_n^2, m_n^3, . . . , m_n^{|V|}) ∼ D} [C_{π(x_n)}],

where C_{π(x_n)} = m_n^{π(x_n)} denotes the example-dependent multi-class misclassification cost. The best classifier in the hypothesis space Π minimizes l(π, D):

    π* = argmin_{π∈Π} E_{(x_n, m_n^1, m_n^2, m_n^3, . . . , m_n^{|V|}) ∼ D} [C_{π(x_n)}]    (2)

The regret of π is defined as r = l(π, D) − l(π*, D). Each classifier associated with the ith slot of the sequence has a regret r_i.

Streeter et al. [24] consider the case where the ith decision made by the greedy algorithm is performed with additive error ε_i. Denote by Ŝ = ⟨ŝ_1, ŝ_2, . . . , ŝ_N⟩ a variant of the sequence S in which the ith argmax is evaluated with additive error ε_i. This can be formalized as

    f(Ŝ_i ⊕ ŝ_i) − f(Ŝ_i) ≥ max_{s_i∈V} [f(Ŝ_i ⊕ s_i) − f(Ŝ_i)] − ε_i,    (3)

where Ŝ_0 = ⟨⟩, Ŝ_i = ⟨ŝ_1, ŝ_2, ŝ_3, . . . , ŝ_{i−1}⟩ for i ≥ 1, and s_i is the control action predicted by classifier π_i. They demonstrate that, for a budget or sequence length of N,

    f(Ŝ_N) ≥ (1 − 1/e) max_{S∈S} f(S) − ∑_{i=1}^N ε_i,    (4)

assuming each control action takes equal time to execute. Thus the ith argmax in equation (3) is chosen with some error ε_i = r_i. An ε_i error made by classifier π_i corresponds to the classifier picking an action whose marginal gain is ε_i less than the maximum possible. Hence the performance bound on additive-error greedy sequence construction stated in equation (4) can be restated as

    f(Ŝ_N) ≥ (1 − 1/e) max_{S∈S} f(S) − ∑_{i=1}^N r_i    (5)

Theorem 2. The sequence of squared-loss regressors (ℜ_i) trained in Algorithm 2 is within at least (1 − 1/e) max_{S∈S} f(S) − ∑_{i=1}^N √(2(|V| − 1) r_{reg_i}) of the optimal sequence of classifiers S from the hypothesis space of multi-class cost-sensitive classifiers.

Proof: (Sketch) Langford et al. [16] show that the reduction from multi-class classification to squared-loss regression has a regret reduction of √(2(k − 1) r_reg), where k is the number of classes and r_reg is the squared-loss regret on the underlying regression problem. In Algorithm 2 we use squared-loss regression to perform multi-class classification, thereby incurring for each slot of the sequence a reduction regret of √(2(|V| − 1) r_{reg_i}), where |V| is the number of actions in the control library and r_{reg_i} is the regret of the regressor for the ith slot. Theorem 1 states that the sequence of classifiers is within at least (1 − 1/e) max_{S∈S} f(S) − ∑_{i=1}^N r_i of the optimal sequence of classifiers. Plugging in the regret reduction from [16], we get the result that the resulting sequence of regressors in Algorithm 2 is within at least (1 − 1/e) max_{S∈S} f(S) − ∑_{i=1}^N √(2(|V| − 1) r_{reg_i}) of the optimal sequence of multi-class cost-sensitive classifiers.

III. CASE STUDIES

A. Robot Manipulation Planning via Contextual Control Libraries

We apply ConSeqOpt to manipulation planning on a 7-degree-of-freedom manipulator.

Recent work [21, 13] has shown that by relaxing the hard constraint of avoiding obstacles into a soft penalty term on collision, simple local optimization techniques can quickly lead to smooth, collision-free trajectories suitable for robot execution. Often the default initialization trajectory seed is a simple straight-line initialization in joint space [21]. This heuristic is surprisingly effective in many environments, but suffers from local convergence and may fail to find a trajectory when one exists. In practice, this may be tackled by providing cleverer initialization seeds using classification [13, 31]. While these methods reduce the chance of falling into local minima, they do not have any alternative plans should the chosen initialization seed fail. A contextual ranking of a library of initialization trajectory seeds can provide feasible alternative seeds should earlier choices fail. We take an approach similar to Dragan et al. [8], where novel actions can be evaluated with respect to the environment. Proposed initialization trajectory seeds can be developed in many ways, including human demonstration [20] or use of a slow but complete planner [15].

For this experiment we attempt to plan a trajectory to a pre-grasp pose over the target object in a cluttered environment using the local optimization planner CHOMP [21], and to minimize the total planning and execution time of the trajectory. A training dataset of |D| = 310 environments and a test dataset of 212 environments are generated. Each environment contains a table surface with five obstacles and the target object randomly placed on the table. The starting pose of the manipulator is randomly assigned, and the robot must find a collision-free trajectory to an end pose above the target object. To populate the control library, we consider initialization trajectories that move first to an "exploration point" and then to the goal. The exploration points are generated by randomly perturbing the midpoint of the original straight-line initialization in joint space. The resulting initial trajectories are then piecewise straight lines in joint space from the start point to the exploration point, and from the exploration point to the goal.⁶ 30 trajectories generated with the above method form our control library. Figure 2a shows an example set for a particular environment. Notice that in this case the straight-line initialization of CHOMP goes through the obstacle and

⁶Half of the seed trajectories are prepended with a short path to start from an elbow-left configuration, and half are in an elbow-right configuration. This is because the local planner has a difficult time switching between configurations, while environmental context can provide much information about which configuration to use.
therefore CHOMP has a difficult time finding a valid trajectory using this initial seed.

Fig. 2: CHOMP initialization trajectories generated as control actions for ConSeqOpt. The end-effector path of each trajectory in the library is traced out. The trajectory in bold in each image traces the initialization seed generated by the default straight-line approach and by ConSeqOpt, respectively. (a) The default straight-line initialization of CHOMP is in bold. Notice this initial seed goes straight through the obstacle and causes CHOMP to fail to find a collision-free trajectory. (b) The initialization seed for CHOMP found using ConSeqOpt is in bold. Using this initial seed, CHOMP is able to find a collision-free path that also has a relatively short execution time.

In our results we use a small number (1-3) of slots in our sequence to ensure the overhead of ordering and evaluating the library is small. When CHOMP fails to find a collision-free trajectory for multiple initialization seeds, one can always fall back on slow but complete planners. Thus the contextual control sequence's role is to quickly evaluate a few good options and choose the initialization trajectory that will result in the minimum execution time. We note that in our experiments the overhead of ordering and evaluating the library is negligible, as we rely on a fast predictor and on features computed as part of the trajectory optimization; by choosing a small sequence length we can effectively compute a motion plan with expected planning time under 0.5 s. We can solve most manipulation problems that arise in our manipulation research very quickly, falling back to initializing the trajectory optimization with a complete motion planner only in the most difficult of circumstances.

For each initialization trajectory, we calculate 17 simple feature values which populate a row of the feature matrix X_i.⁷ During training, we evaluate each initialization seed in our library on all environments in the training set, and use their performance and features to train each regressor ℜ_i in ConSeqOpt. At test time, we simply run Algorithm 2 without the training step to produce Y_{ℜ_1,...,ℜ_N} as the sequence of initialization seeds to be evaluated. Note that while the first regressor uses only the 17 basic features, the subsequent regressors also include the difference in feature values between the remaining actions and the actions chosen by the previous regressors. These difference features improve the algorithm's ability to consider trajectory diversity in the chosen actions.

We compare ConSeqOpt with two methods of ranking the initialization library: a random ordering of the actions, and an ordering by sorting the output of the first regressor. Sorting by the first regressor is functionally the same as maximizing the absolute benefit rather than the marginal benefit at each slot. We compare the number of CHOMP failures as well as the average execution time of the final trajectory. For execution time, we assume the robot can be actuated at 1 rad/second for each joint, and use the shortest trajectory generated using the N seeds ranked by ConSeqOpt as the performance. If we fail to find a collision-free trajectory and need to fall back to a complete planner (RRT [15] plus trajectory optimization), we apply a maximum execution-time penalty of 40 seconds due to the longer computation time and resulting trajectory.

The results over 212 test environments are summarized in Figure 3. With only the simple straight-line initialization, CHOMP is unable to find a collision-free trajectory in 162/212 environments, with a resulting average execution time of 33.4 s. While a single regressor (N = 1) can reduce the number of CHOMP failures from 162 to 79 and the average execution time from 33.4 s to 18.2 s, when we extend the sequence length, ConSeqOpt is able to reduce both metrics faster than a ranking by sorting the output of the first regressor. This is because for N > 1, ConSeqOpt chooses a primitive that provides the maximum marginal benefit, which results

⁷Length of trajectory in joint space; length of trajectory in task space; the xyz values of the end-effector position at the exploration point (3 values); the distance-field values used by CHOMP at the quarter points of the trajectory (3 values); joint values of the first 4 joints at both the exploration point (4 values) and the target pose (4 values); and whether the initialization seed is in the same left/right kinematic arm configuration as the target pose.
in trajectory seeds that have very different features from the previous slots' choices. Ranking by the absolute benefit tends to pick trajectory seeds that are similar to each other, and thus are more likely to fail when the previous seeds fail. At a sequence length of 3, ConSeqOpt has only 16 failures and an average execution time of 8 seconds: a 90% improvement in success rate and a 75% reduction in execution time. Note that planning times are generally negligible compared to execution times for manipulation, hence this improvement is significant. Figure 2b shows the initialization seed found by ConSeqOpt for the same environment as in Figure 2a. Note that this seed avoids collision with the obstacle between the manipulator and the target object, enabling CHOMP to produce a successful trajectory.

Fig. 3: Results of ConSeqOpt for manipulation planning in 212 test environments. The top image shows the number of CHOMP failures for three different methods after each slot in the sequence. ConSeqOpt not only significantly reduces the number of CHOMP failures in the first slot, but also further reduces the failure rate faster than both of the other methods when the sequence length is increased. The same trend is observed in the bottom image, which shows the average time to execute the chosen trajectory. The 'Straight Seed' column refers to the straight-line heuristic used by the original CHOMP implementation.

B. Mobile Robot Navigation

An effective means of path planning for mobile robots is to sample a budgeted number of trajectories from a large library of feasible trajectories, traverse a small portion of the one which has the lowest cost of traversal, and repeat the process. The sub-sequence of trajectories is usually computed offline [11, 9]. Such methods are widely used in modern autonomous ground robots, including the two highest-placing teams in the DARPA Urban Challenge and Grand Challenge [28, 17, 27, 26] and the LAGR [12], UPI [1], and Perceptor [14] programs. We use ConSeqOpt to maximize this function and generate trajectory sequences taking into account the current environment features.

Figures 4a and 4b show a section of Fort Hood, TX, and the corresponding robot cost-map, respectively. We simulated a robot traversing between various random starting and goal locations using the maximum-discrepancy trajectory sequence [11] as well as sequences generated by ConSeqOpt using Algorithm 1. A texton library [29] of 25 k-means cluster centers was computed for the whole overhead map. At run-time the texton histogram of the image patch around the robot was used as features. Online linear support vector machines (SVMs) with slack re-scaling [22] were used as the cost-sensitive classifiers for each slot. We report a 9.6% decrease in the cost of traversal over 580 runs using N = 30 trajectories, as compared to offline precomputed trajectory sequences which maximize the area between selected trajectories [11]. Our approach is able to choose which trajectories to use at each step based on the appearance of the terrain (woods, brush, roads, etc.). As seen in Figure 4c, at each time-step ConSeqOpt selects trajectories such that most of them fall in the empty space around obstacles.

IV. DISCUSSION

We have presented a framework to predict action sequences with performance guarantees from a set of discrete control actions. Muelling et al. [18] show how to predict a single action which is a combination of actions in the library. Leveraging our framework to also predict such action sequences which are not explicitly in the library can prove to be a powerful extension leading to improved performance.

V. ACKNOWLEDGEMENTS

This work was funded by the Defense Advanced Research Projects Agency through the Autonomous Robotic Manipulation Software Track (ARM-S) and the Office of Naval Research through the "Provably-Stable Vision-Based Control of High-Speed Flight through Forests and Urban Environments" project.

REFERENCES

[1] J.A. Bagnell, D. Bradley, D. Silver, B. Sofman, and A. Stentz. Learning for autonomous navigation. Robotics & Automation Magazine, IEEE, June 2010.
Fig. 4: (a) Overhead color map of a portion of Fort Hood, TX. (b) Cost map of the corresponding portion. (c) Robot traversing the map using ConSeqOpt, generating trajectory sequences which try to avoid obstacles in the vicinity.

[2] D. Berenson, R. Diankov, K. Nishiwaki, S. Kagami, and J. Kuffner. Grasp planning in complex scenes. In IEEE-RAS Humanoids, December 2007.
[3] A. Beygelzimer, V. Dani, T. Hayes, J. Langford, and B. Zadrozny. Error limiting reductions between classification tasks. In ICML. ACM, 2005.
[4] C. Guestrin and A. Krause. Beyond convexity: Submodularity in machine learning. URL www.submodularity.org.
[5] E. Chinellato, R.B. Fisher, A. Morales, and A.P. del Pobil. Ranking planar grasp configurations for a three-finger hand. In ICRA. IEEE, 2003.
[6] D. Dey, T.Y. Liu, B. Sofman, and J.A. Bagnell. Efficient optimization of control libraries. Technical Report CMU-RI-TR-11-20, Robotics Institute, Pittsburgh, PA, June 2011.
[7] D. Dey, T.Y. Liu, B. Sofman, and J.A. Bagnell. Efficient optimization of control libraries. In AAAI, July 2012.
[8] A. Dragan, G. Gordon, and S. Srinivasa. Learning from experience in manipulation planning: Setting the right goals. In ISRR, July 2011.
[9] L.H. Erickson and S.M. LaValle. Survivability: Measuring and ensuring path diversity. In ICRA. IEEE, 2009.
[10] E. Frazzoli, M.A. Dahleh, and E. Feron. Robust hybrid control for autonomous vehicle motion planning. In Decision and Control, 2000.
[11] C. Green and A. Kelly. Optimal sampling in the space of paths: Preliminary results. Technical Report CMU-RI-TR-06-51, Robotics Institute, Pittsburgh, PA, November 2006.
[12] L.D. Jackel et al. The DARPA LAGR program: Goals, challenges, methodology, and phase I results. JFR, 2006.
[13] N. Jetchev and M. Toussaint. Trajectory prediction: learning to map situations to robot trajectories. In ICML. ACM, 2009.
[14] A. Kelly et al. Toward reliable off road autonomous vehicles operating in challenging environments. IJRR, 2006.
[15] J.J. Kuffner, Jr. and S.M. LaValle. RRT-Connect: An efficient approach to single-query path planning. In ICRA, 2000.
[16] J. Langford and A. Beygelzimer. Sensitive error correcting output codes. In Learning Theory, Lecture Notes in Computer Science. 2005.
[17] M. Montemerlo et al. Junior: The Stanford entry in the Urban Challenge. JFR, 2008.
[18] K. Muelling, J. Kober, and J. Peters. Learning table tennis with a mixture of motor primitives. In International Conference on Humanoid Robots, 2010.
[19] F. Radlinski, R. Kleinberg, and T. Joachims. Learning diverse rankings with multi-armed bandits. In ICML, 2008.
[20] N. Ratliff, J.A. Bagnell, and S. Srinivasa. Imitation learning for locomotion and manipulation. Technical Report CMU-RI-TR-07-45, Robotics Institute, Pittsburgh, PA, December 2007.
[21] N. Ratliff, M. Zucker, J.A. Bagnell, and S. Srinivasa. CHOMP: Gradient optimization techniques for efficient motion planning. In ICRA, May 2009.
[22] B. Scholkopf and A.J. Smola. Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. The MIT Press, 2002.
[23] M. Stolle and C.G. Atkeson. Policies based on trajectory libraries. In ICRA, 2006.
[24] M. Streeter and D. Golovin. An online algorithm for maximizing submodular functions. In NIPS, 2008.
[25] M. Streeter, D. Golovin, and A. Krause. Online learning of assignments. In NIPS, 2009.
[26] S. Thrun et al. Stanley: The robot that won the DARPA Grand Challenge: Research articles. J. Robot. Syst., 2006.
[27] C. Urmson et al. A robust approach to high-speed navigation for unrehearsed desert terrain. JFR, August 2006.
[28] C. Urmson et al. Autonomous driving in urban environments: Boss and the Urban Challenge. JFR, June 2008.
[29] J. Winn, A. Criminisi, and T. Minka. Object categorization by learned universal visual dictionary. In ICCV, 2005.
[30] Y. Yue and C. Guestrin. Linear submodular bandits and their application to diversified retrieval. In NIPS, 2011.
[31] M. Zucker. A data-driven approach to high level planning. Technical Report CMU-RI-TR-09-42, Robotics Institute, Pittsburgh, PA, January 2009.
Physics-Based Grasp Planning Through Clutter
Mehmet R. Dogar∗ , Kaijen Hsiao† , Matei Ciocarlie† , Siddhartha S. Srinivasa∗
∗ The Robotics Institute, Carnegie Mellon University, Pittsburgh, PA 15213
† Willow Garage, Inc., 68 Willow Road, Menlo Park, CA 94025

Abstract—We propose a planning method for grasping in


cluttered environments, where the robot can make simultaneous
contact with multiple objects, manipulating them in a deliberate
and controlled fashion. This enables the robot to reach for and
grasp the target while simultaneously contacting and moving
aside obstacles in order to clear a desired path. We use a physics-
based analysis of pushing to compute the motion of each object in
the scene in response to a set of possible robot motions. In order to
make the problem computationally tractable, we enable multiple
simultaneous robot-object interactions, which we pre-compute
and cache, but avoid object-object interactions. Tests on large
sets of simulated scenes show that our planner produces more
successful grasps in more complex scenes than versions that avoid
any interaction with surrounding clutter. Validation on a real
robot shows that our grasp evaluation method accurately predicts
the outcome of a grasp, and that our approach, in conjunction
with state-of-the-art object recognition tools, is applicable in real-
life scenes that are highly cluttered and constrained.

Fig. 1. Manipulation in a constrained and cluttered environment. Left: a robot tries to acquire a partly occluded object from a constrained shelf space. Right: the robot clears a path to the target by pushing obstacles in a controlled manner, and completes the grasp.
I. INTRODUCTION

Robots operating in our homes will inevitably be confronted with scenes that are congested, unorganized, and complex - or, simply put, cluttered. Consider, for example, the following representative problem: the robot must acquire an object from the back of a cluttered bookcase or fridge shelf (Fig. 1, Left). Approaching the target object from the top is impossible due to the constrained space inside the shelf; various obstacles block approaches from the front or side. Some of the obstacles are too large for the robot to grasp. How can the robot complete the task?

Planning for manipulation in clutter requires understanding the consequences of a robot's interaction with a complex scene. The most direct approach to grasping is object-centric, in that it separates the scene into two simple categories: the target object vs. everything else. The desired interaction with the target object is to touch it with the end-effector, instantly transitioning into a stable grasp where the object is rigidly attached to the arm. Sets of hand poses and configurations that create stable grasps for the target can be pre-computed [7], [11]. Any interaction with the rest of the scene is utterly avoided, and as such, no consequences must be predicted.

The object-centric approach works well in structured or semi-structured settings where objects are well-separated. However, unstructured and complex environments pose serious challenges: clutter often makes it impossible to guarantee avoiding the rest of the world while reaching for an object.

Recent studies address this problem by using reconfiguration planners that are able to move obstacles out of the way in order to reach the target [6], [19]. The robot uses richer semantic descriptions of a scene, differentiating between immovable and movable obstacles. However, these approaches are still object-centric, in that each individual robot action still addresses a single object, while avoiding all contact with the others. Essentially, the scene is treated like a game of chess, where the robot moves one piece at a time until the goal is achieved.

In this paper, we propose a clutter-centric perspective, where the fundamental action primitives enable simultaneous contact with multiple objects. For grasping, this enables the robot to reach for and grasp the target while simultaneously contacting and moving obstacles, in a controlled manner, in order to clear the desired path (Fig. 1, Right). In addition to simply closing the gripper on the target, interaction with objects in the scene is done through quasi-static pushing, recently used as the basis of the "push-grasping" [5] action. However, while push-grasping avoids all contact with any obstacles, we explicitly represent clutter and reason about how it will interact with the intended grasp trajectory. Rather than shying away from complex and sustained interactions with the world while grasping, the robot uses them to its advantage.

Instead of a static pose relative to a target object, we consider a grasp to also encapsulate a complete trajectory, possibly making and maintaining multiple contacts throughout its execution, and ending with the act of closing the robot's gripper. For predicting the outcome of an intended grasp, we use a physics-based simulation method that computes the motion of the objects in the scene in response to a set of possible robot motions. In order to make the problem computationally tractable, our method allows multiple simultaneous robot-

object interactions, which can be pre-computed and cached,
but avoids object-object interaction, which would have to be
computed at run-time.
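The pre-compute-and-cache idea described here can be illustrated with a small sketch. The grid resolution, the helper names, and the toy single-object "simulation" below are illustrative assumptions, not the implementation used in the paper:

```python
# Illustrative sketch of pre-computing and caching single-object push
# responses: the (expensive) simulation is run offline once per grid cell of
# initial object poses relative to the hand, and looked up at planning time.
import math

CELL, ANG = 0.005, math.radians(10)   # hypothetical SE(2) discretization

def key(x, y, theta):
    """Snap a relative pose to its grid cell."""
    return (round(x / CELL), round(y / CELL), round(theta / ANG))

def build_cache(simulate, poses):
    """Offline: run the single-object push simulation once per grid cell
    and store the resulting object trajectory."""
    return {key(*p): simulate(*p) for p in poses}

def lookup(cache, x, y, theta):
    """Online: O(1) retrieval instead of a physics rollout."""
    return cache.get(key(x, y, theta))
```

At planning time, only the cheap lookup is performed; object-object interactions are deliberately left out of the cache, since they would depend on the full scene configuration.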
We compare this approach against a “static” planner that
avoids all contact except for closing the gripper on the target.
We also compare it against the object-centric implementation
of the original push-grasp planner, which uses non-prehensile
manipulation on the target but avoids all obstacles. Our results
show that, for a large set of simulated cluttered scenarios,
our method returns more possible grasps for a given scene,
and succeeds in more scenes. We validate these findings on
a real robot, a PR2, showing that the predicted outcomes
of simulated tasks match the real-life results. We also show
that our approach can be used in conjunction with existing
object recognition tools operating on real sensor data, allowing
the robots to complete grasping tasks in challenging real-life
scenarios such as the above shelf.
Fig. 2. Left: Example objects and their trimesh models. Right: Examples of computed trajectories of objects as the hand pushes them. The hand motion is towards the top of the page in each diagram, and is not shown.

II. PHYSICS-BASED GRASPING

At its core, our approach relies on predicting the interactions
between the robot’s end-effector and the objects in a scene, as
the robot reaches for a target. For a given trajectory of the end-
effector, we would like to compute the resulting motion of all
contacted objects, including the target and potential obstacles.
We can predict these motions using physics-based simulations;
however, these simulations are prohibitively expensive from
a computational standpoint, and do not scale well with the
multiple objects involved: computing contact forces between
multiple rigid bodies is an NP-hard problem [1]. Additionally,
we need to evaluate many candidate grasps to find one
successful grasp, which further increases the computational
costs.

As the general space of possible robot-scene interactions is intractable to compute, we believe that a viable path forward is to identify assumptions and constraints that allow us to study parts of this space, increasing the capabilities of our robots. As a first step, we focus on a subset of all possible hand trajectories, comprising linear motion along a pre-defined approach direction relative to the palm. If an object is contacted, this motion results in quasi-static pushing, as the hand continues to advance, with a reduced velocity in order to avoid inertial effects. The mechanics of this interaction have been previously studied [9], [12], [14], [15], and used recently in [5] with the goal of predicting the final pose of the object relative to the hand.

A second step for reducing the computational complexity of physics-based planning is to pre-compute complete object trajectories in response to hand motion along the pushing direction. For each object in our database, we compute trajectories analogous to the ones shown in Fig. 2 for many possible initial object poses relative to the hand. It is important to note that these trajectories are computed for each object in isolation, in the absence of other obstacles.

Finally, at run-time, we evaluate a potential hand trajectory by checking its effect on each object in the scene. We base our approach on this observation: we can evaluate scenarios in which the hand touches multiple objects as simultaneous individual interactions that are pre-computed, as long as the objects do not touch each other and do not affect the hand's trajectory. We can also detect situations where our conditions are violated, and thus avoid executing trajectories whose side-effects are not fully accounted for.

Fig. 3. Notation used for hand-object contacts. The hand's velocity is given by vh. The friction cone edges, (fL, fR), and the normal n̂ at the contact are illustrated.

A. Quasi-Static Pushing Analysis

Using the notation in Fig. 3, consider a scene where the robot hand is moving in the direction vh and contacts an object. We further assume that the object is resting on a planar support surface parallel to the xy plane, and that both the initial pose of the hand and its velocity are parallel to this plane. We use n̂ to denote the normal to the contact, and CoM to denote the center of mass of the object. The hand applies a generalized force, q = (fx, fy, m), to the object, causing it to move. Our goal is to compute the resulting generalized object velocity, p = (vx, vy, ω), represented relative to CoM.

Given the coefficient of friction, μc, between the finger and the object, Coulomb friction restricts the tangential force ft as a function of the normal force fn at the contact: ft ≤ μc fn. It
follows that the possible directions of the force, f = (fx, fy), that is applied to the object by the contact are bounded by a friction cone [15], defined by edges fL and fR making an angle α = arctan μc with the contact normal n̂. If the object is sliding on the finger as it is being moved, the force applied at the contact lies at these extreme boundaries.

If we can compute the force applied to the object by the hand, we can then convert it into a velocity for the object using the limit surface [8], which takes into account the contact between the object and the support surface. This contact occurs over an area rather than a point, allowing for the transmission of both forces in the xy plane and a moment around an axis perpendicular to it. Given a generalized force on the object, we find the corresponding generalized velocity by taking the normal to the limit surface at the point where the generalized force intersects it. Our quasi-static assumption implies that the applied force always lies exactly on the limit surface.

In general, computing the limit surface of an object is not analytically solvable. Howe and Cutkosky [9], however, show that the limit surface can practically be approximated by an ellipsoid centered at the CoM. Given this model, we can express how the limit surface relates the generalized velocity of the object to the applied generalized force as:

    √(vx² + vy²) / ω = (√(fx² + fy²) / m) · (mmax / fmax)²

where fmax is the maximum force the object can apply to its support surface during sliding and mmax is the maximum moment the object can apply about CoM, which happens when the object is rotating around CoM. We find the ratio of fmax to mmax by:

    fmax = μs fns
    mmax = μs ∫A |x| p(x) dA

where μs is the coefficient of friction between the object and the support surface, fns is the normal force that the support surface applies on the object, A is the area between the object and the support surface, dA is a differential element of A, x is the position vector of dA, and p(x) is the pressure at x. Note that μs and the mass of the object cancel out when computing the ratio; therefore, they do not play a role in predicting the motion of the pushed object. We still need to know the pressure distribution at the contact, which we discuss in Section II-B.

Another constraint on the relation between the applied force to the object and its motion comes from the fact that the limit surface ellipsoid is a circle in the force plane: the linear velocity of the object, v = (vx, vy), is always parallel to the applied force f = (fx, fy) [9].

As mentioned above, when the contact between the finger and the object is sliding, the applied force is at the friction cone edge opposing the direction of relative motion, and the two constraints listed above are sufficient to solve for the corresponding generalized velocity. When the contact is sticking and the applied force is interior to the friction cone, an extra constraint is needed. We can derive one from the fact that, by definition, in the case of sticking the contact point on the object moves with the same velocity vh as its counterpart on the finger.

We determine whether the contact will be sticking or sliding by computing the velocity vectors at the contact point that correspond to the edges of the friction cone: i.e., if the force applied to the object were at a friction cone edge, how would the contact point on the object move? Solving the above constraints for fL and fR defines the motion cone [15]. If the velocity vector at the contact point on the hand, vh, lies inside the motion cone, the contact will stick; otherwise, it will slide. See [9], [12] for more details.

B. Pre-computed Object Trajectories

We use the quasi-static contact model outlined in the previous subsection to compute object trajectories as the hand performs a linear pushing motion. We perform the pre-computation for each object in isolation: throughout its motion, the object's only contacts are with the hand and the support surface. During a push the hand is always parallel to the support surface at a pre-specified height (set at 9 cm for all experiments in this study), and the object is resting on the support surface. Combined with the quasi-static assumption, this means that the resulting motion of the object depends on the geometry of the hand, as well as the object's geometry, frictional and mass properties, and pose relative to the hand at the onset of pushing.

Possible initial poses of an object in the hand's coordinate frame are given by the physically plausible subset of SE(2) for which the object and the hand do not interpenetrate. Furthermore, of interest to us are only the initial object poses for which the hand and the object actually make contact at some point during the hand trajectory. We refer to this set of object poses as the contact region:

    C(o, τ) = {c ∈ SE(2) | the hand contacts o during τ if o is initially at c}

where o is the object, τ is the hand trajectory, and c is the initial pose of the object in the hand's coordinate frame.

To precompute all possible motions of o in response to τ, one can discretize C(o, τ) (in this study, we used intervals of 5 mm and 10 degrees for the discretization), place the object at every resulting pose, simulate the motion of the hand during τ, and record the object trajectory. The resulting set To,τ contains the trajectory of the pushed object o for each starting pose c ∈ C(o, τ), assuming hand trajectory τ.

In this paper, we execute linear pushing actions with different pushing distances, push(d). We computed To,push(d) for a very large d; in this study, we used 0.5 m. Then, at planning time, we can extract the object trajectories for shorter distances from the trajectories of this long push.

The linearity of our actions simplifies the computation of To,push(d) and reduces the amount of data we need to store. We do not need to place the object at every pose in C(o, push(d)). Instead we placed the object only at the poses where the object is initially in contact with the hand. There will be cases where the hand contacts the object later during the push. However,
we represent these hand trajectories using their last part, during which the hand is in contact with the object. The linearity of our actions implies that this last part is also a linear push, for which we already have the object trajectory data. Examples of pre-computed object trajectories can be seen in Fig. 2.

The object's mass distribution also affects the trajectory of the object, as it determines the pressure profile of the object-surface contact. However, this property is difficult to determine accurately, as is the coefficient of friction between the hand and the object. For the hand-surface contact, we assumed that the pressure distribution can take one of two different forms, one where the mass is concentrated on the periphery of the object, and one concentrated on the center of the object; Fig. 4 shows the resulting object motion for each of these two cases (each trajectory stops when the object is inside the hand). For the experiments in this study, we used a fixed value for the hand-object friction coefficient of 0.6. Generalizing these assumptions, future implementations can further discretize the space of these parameters and compute separate trajectories for each resulting point in this space.

Fig. 4. Left: An example object and its trimesh model. Center: Simulated object trajectory when contact pressure distribution is concentrated at the object's CoM. Right: Simulated object trajectory when contact pressure distribution is concentrated at the object's periphery.

We set the simulated robot hand at a specific configuration when we are computing the object trajectories. We would need to compute a new set of object trajectories if we were to set the hand at a different configuration, or if we were to use a different hand.

C. Grasp Evaluation in Clutter

Armed with the pre-computed data described in the previous sub-section, we are ready to evaluate grasps on complete scenes. We consider a scene to be composed of the following elements: a set of objects of known identity comprising both the target and any movable obstacles, a planar surface that the objects are resting on, and additional obstacles for which no additional semantic information is available and which are thus treated as immovable.

We define a grasp as the following action: starting from a given gripper pose in the scene, the robot executes a linear gripper trajectory in the pre-defined direction normal to the palm, followed by closing the fingers. We parameterize the space of grasps by the initial pose of the gripper in the scene, as well as the length of the linear motion. A successful grasp will result in the target object stably enclosed in the gripper.

Given the pose of the target object in the scene, we first generate a set of possible grasps that approach the object from different approach directions, and at different lateral offsets. The grasps in this set are evaluated in random order using the method below until a feasible grasp is found. At that point, the robot computes a collision-free motion plan to bring the gripper to the initial pose in the linear trajectory, then executes the rest of the grasp open-loop.

We present an illustration of evaluating a grasp in Fig. 5, with an environment constrained by immovable walls on three sides, and populated with three objects, the one in the middle being the target. Let us assume the grasp we evaluate has an approach direction, a, pointing upwards, and a lateral offset, l, a few centimeters to the right (relative to the target object's CoM). Our evaluation algorithm performs the following steps:

1. Compute the initial pose: We place the hand on top of the target object, pointing in the approach direction and applying the given lateral offset. We then back up (in the opposite approach direction) until the hand is collision-free to find the initial pose, ph, for the grasp trajectory.

2. Compute pushing distance for successful grasp: Starting from ph, we draw on our database of pre-computed object motions to see what distance d traveled by the gripper along the approach direction, if any, will bring the target object into a position where it can be grasped. In this study, we use the following heuristic: if the center of mass of the object passes behind the line connecting the fingertips of the hand, it is considered to be in a graspable pose. We found this heuristic to work well in practice for parallel grippers; for dexterous hands, checks based on force- or form-closure can be used instead.

3. Check immovable obstacles: Once we know ph and d, we need to check if the grasp will succeed in the given environment. We first check if the hand will penetrate any non-movable objects (e.g. walls) during the grasp. If yes, we discard the grasp.

4. Check movable obstacles: We continue by identifying the movable objects the hand will make contact with during the grasp. For each of them, we use the respective pre-computed trajectories to predict motion in response to the grasp. However, since these trajectories were computed with the objects in isolation, their validity needs to be maintained. Therefore, if at any point during the grasp, the pre-computed trajectories show any object colliding with an obstacle (movable or immovable), the grasp is discarded, as we cannot safely predict the resulting object motions. Fig. 6 shows modifications of the original scene where our example grasp is discarded by this process.

5. Confirm grasp for execution: Once all the tests above have passed, the grasp is deemed safe to execute.

D. Handling Uncertainty

The evaluation algorithm described above is based on knowing the relative locations of the objects in the scene.
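The five evaluation steps above can be summarized in a toy sketch. The 1-D geometry along the approach direction, the cached push model, and all names below are simplifying assumptions made for illustration, not the authors' implementation:

```python
# A minimal 1-D sketch of the five-step grasp evaluation: back up to a
# collision-free start, check the fingertip-line heuristic for the target,
# then discard grasps that touch immovable obstacles or whose pushed
# obstacles are predicted to collide with other objects.
from dataclasses import dataclass

@dataclass
class Obj:
    name: str
    pos: float           # position along the approach direction, in meters
    movable: bool = True

def cached_push(obj_pos, palm_end):
    """Stand-in for a pre-computed trajectory: an object contacted by the
    advancing palm ends up riding on the palm face."""
    return max(obj_pos, palm_end)

def evaluate_grasp(target, obstacles, palm_start, finger_len=0.10, push_dist=0.20):
    scene = obstacles + [target]
    # Step 1: back up (opposite the approach direction) until collision-free.
    while any(palm_start <= o.pos <= palm_start + finger_len for o in scene):
        palm_start = round(palm_start - 0.01, 4)
    palm_end = palm_start + push_dist
    # Step 2: heuristic -- the target CoM must end up behind the fingertip line.
    if cached_push(target.pos, palm_end) > palm_end + finger_len:
        return None                       # no push brings the target into the hand
    # Steps 3 and 4: a swept immovable obstacle, or a pushed movable obstacle
    # predicted to end up against another object, invalidates the cached
    # single-object trajectories, so the grasp is discarded.
    for o in obstacles:
        if o.pos <= palm_end + finger_len:            # hand sweeps over o
            if not o.movable:
                return None
            o_final = cached_push(o.pos, palm_end)
            if any(abs(o_final - other.pos) < 0.02
                   for other in scene if other is not o):
                return None
    # Step 5: all checks passed; the grasp is deemed safe to execute.
    return {"palm_start": palm_start, "push_dist": push_dist}
```

A real implementation would of course evaluate full SE(2) poses against the cached trajectory database rather than this scalar stand-in.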
Fig. 5. Illustration of evaluating a grasp with a given approach direction, a, and lateral offset, l.

Fig. 6. Illustration of grasps that will be rejected. Left: The pushed object contacts another object. Right: The pushed object contacts a non-movable object (the side-wall).

However, when operating in unstructured environments, these poses are inevitably affected by uncertainty, due to imprecise localization, imperfect robot calibration, etc. Previous work has shown how pushing manipulation can be used to reduce uncertainty in the pose of the target object [5]. In our method, we use a similar principle, and extend it to surrounding obstacles.

If any of the parameters affecting the behavior of an object in the scene is uncertain, our planner will generate multiple possible samples for that object, sampling the space of possible values for the uncertain parameter. We refer to these as uncertainty samples of the object. We can predict how each of these uncertainty samples will react to gripper contact based on the pre-computed data described in the previous section. Our planner accepts a grasp only if it works for all samples of all objects in the environment.

For avoiding object-object collisions, this implies performing a number of collision tests on the order of (O × N)², where O is the number of objects in the scene and N is the number of uncertainty samples for each object. In practice, however, most of the samples in the environment are not moved by the hand, and so we do not need to collision check between them.

III. PLANNER PERFORMANCE

To better understand how the clutter-grasp planner performs, we quantified its ability to plan valid grasps in randomly-generated, cluttered scenes, and compared its performance to that of two other grasp planners using geometry-only simulation. The grasp planners that we used in our experiments were as follows:

• Clutter-Grasp: The planner presented in this paper.
• Push-Grasp: This planner, similar to the one presented in [5], is constrained to moving only the target object. No surrounding object is allowed to be contacted or moved.
• Static-Grasp: This planner is not allowed to move the target object or any of the other objects in the scene; it is restricted to checking whether the object can be grasped in its current pose without collisions.

We randomly generated 100 simulated scenes, each with 6 objects that were placed, uniformly at random, in an area of 0.4 m × 0.5 m. Scenes in which objects interpenetrated were rejected and replaced. All three planners use the same search space when planning grasps: all grasps are from the side, and the same search parameters are used for approach direction and lateral offset. Thus, the total number of evaluated grasps is the same for all three planners. For this experiment, we assume zero uncertainty; the main feature of the clutter-grasp planner over the push-grasp planner is its ability to deal with clutter, and so we are primarily concerned with the effect of increasing clutter. The clutter-grasp planner has the same ability as the push-grasp planner to deal with varying levels of uncertainty, as quantified in [5].

Fig. 7 shows how the number of grasps planned for each grasp planner changes when the 100 random scenes are sorted by scene complexity, defined here by the inverse of the number of grasps found by the static-grasp planner. The fewer grasps found by the static-grasp planner, the more cluttered and constrained the environment around the target object; the bottom images in Fig. 7 show example scenes for varying levels of scene complexity. For scenes of medium complexity, the clutter-grasp planner finds more grasps than the other two, although all three planners are able to find some valid grasps. However, for the very most cluttered scenes, only the clutter-grasp planner is able to find valid grasps that allow the robot to shove through the clutter surrounding the target object.

Fig. 7. Top: A comparison of number of grasps found, in randomly-generated scenes sorted by scene complexity (the 60 most complex in the set of 100 generated), for our three grasp planners. Bottom: Bird's eye view of example scenes for varying levels of complexity. The target object is shown in light gray.

TABLE I
PLANNING TIME COMPARISON

                Time per grasp      Time until first
                evaluation (sec)    successful grasp (sec)
Clutter-Grasp   0.14                0.62
Push-Grasp      0.08                2.43
Static-Grasp    0.03                2.44

Planning times for the three grasp planners are shown in Table I. The clutter-grasp planner takes longer to evaluate each grasp, but because a larger fraction of evaluated grasps are feasible, the average planning time until the first feasible grasp is found is lower than for the other two planners. Planning time until the first feasible grasp is found is also plotted for varying scene complexity in Fig. 8; for the most complex scenes where the push-grasp and static-grasp planners return no feasible grasps, the planning time reflects the time taken to check all grasps in the given search space and then return failure.

Fig. 8. Time until first grasp for all three planners, for the most complex 60 of the 100 random scenes, sorted by scene complexity.

Fig. 9. An example execution of a randomly-generated scene. Left: Object meshes in the randomly-generated scene configuration, along with point cloud from a Kinect™ camera. Middle: Actual scene with gripper at start of push. Right: Actual scene after object has been grasped.
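The sample-based acceptance rule of Section II-D (a grasp is accepted only if it is valid for every uncertainty sample of every object) can be sketched as follows; the Gaussian sampling model and all helper names are illustrative assumptions, not the authors' code:

```python
# Sketch of the uncertainty-sample acceptance rule: each uncertain object is
# expanded into N pose samples, and a single failing sample rejects the grasp.
import random

def uncertainty_samples(pose, std, n=20, seed=0):
    """Draw N hypothetical pose samples around the detected pose."""
    rng = random.Random(seed)
    return [pose + rng.gauss(0.0, std) for _ in range(n)]

def accept_grasp(grasp_valid_for, objects, std=0.01):
    """grasp_valid_for(name, sampled_pose) -> bool is a stand-in for the full
    evaluation (cached trajectories plus collision checks)."""
    for name, pose in objects.items():
        for sample in uncertainty_samples(pose, std):
            if not grasp_valid_for(name, sample):
                return False      # one failing sample rejects the grasp
    return True
```

As noted in Section II-D, samples that the hand never moves need no object-object collision checks, which keeps the test count well below the worst-case (O × N)².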

IV. REAL ROBOT EXPERIMENTS


We performed two sets of experiments on an actual two-armed mobile manipulator. The first set of experiments validated the grasps generated by the clutter-grasp planner in our geometry-only planning experiments by running them on a real robot; the second set of experiments tested the performance of the clutter-grasp planner using real-world uncertainty levels from an actual, state-of-the-art object recognition algorithm.

Fig. 10. More randomly-generated scenes with kinematically-feasible clutter-grasps; all attempted clutter-grasps were executed successfully.

In the first set of real-world experiments, we executed grasps planned by the clutter-grasp planner on a randomly-chosen set of 10 scenes from the set of 100 randomly-generated scenes used in the geometry-only experiments in the previous section. Only scenes with clutter-grasps that were reachable and that had feasible motion plans were chosen; of the 100 total scenes, 78 had grasps within reach of the robot, and all but 6 of those had feasible motion plans for at least one grasp. Real-world objects were placed at the randomly-generated locations on the table by carefully aligning their locations as seen by the robot's Kinect™ point cloud with the object meshes rendered at their desired locations using rviz [22], a 3D visualization tool. The resulting point clouds with aligned meshes are shown in Fig. 9 and Fig. 10, along with a sample grasp execution. For these experiments, grasps were generated for uncertainty standard deviations of 1 cm in both translation directions, and
0.1 radians in rotation, which is roughly appropriate for the accuracy of placement. 10 out of 10 grasps planned by the clutter-grasp planner succeeded in picking up the target object, while successfully moving aside obstacles as predicted.

In the second set of experiments, we used the textured object detector described in [17] to recognize objects placed on the shelf shown in Fig. 1. We only used movable objects. The resulting object poses were then fed to the clutter-grasp planner to use in planning grasps for a hand-selected target object. Scenes in which the object detector failed to recognize objects were rejected, but even when the objects are correctly identified, the poses returned by the object detector have significant amounts of pose uncertainty. The resulting object detection results are shown as point cloud object outlines overlaid on the images seen by the robot's Kinect™ camera, along with the evaluated grasp hypotheses, in the middle images in Fig. 11 and Fig. 12. The clutter-grasp planner succeeded in picking up the target object in all five scenes shown in Fig. 11, and failed to plan grasps in the two scenes shown in Fig. 12. The two shown failures are representative of the most typical failure modes seen by the clutter-grasp planner (not including object detection failure): if objects start out in contact with each other, or if there is no way to shove obstacle objects aside without their hitting other objects or static obstacles such as shelf walls, the clutter-grasp planner will not find any grasps. Also, if there is uncertainty in the object pose, as in this case, both the push-grasp planner and the clutter-grasp planner require a clear space behind the target object so that it can be pushed into the hand; if there is no such space, then no grasps will be found.

Videos of these and other real-world examples of the clutter-grasp planner being used to grasp objects in cluttered scenes can be seen here: www.cs.cmu.edu/%7Emdogar/RSS2012/

V. CONCLUSION AND FUTURE WORK

In this paper, we propose a clutter-centric perspective to grasp planning, where the action primitive enables simultaneous contact with multiple objects. The robot reaches for and grasps the target while simultaneously contacting and moving obstacles, in a controlled manner, in order to clear the desired path. Rather than shying away from complex and sustained interactions with the world while grasping, the robot uses them to its advantage. We compare our approach to other planners that avoid contact with clutter, and show that our planner is able to find more grasps in a given scene, and also that grasps that interact with the environment succeed in more scenes. Finally, we validate these findings by successfully executing such grasps on both randomly-generated scenes instantiated in real life, and also on scenes for which the input comes from existing object recognition results from real sensor data.

General physics-based analysis of robot interaction with complex scenes is still an open problem, especially when many scenarios must be evaluated in a short period of time. By identifying suitable assumptions and constraints, we expanded the range of situations a robot can handle. We outline some directions for future research below.

Object-object contacts. Our planner cannot find grasps where object-object contact will occur, because our pre-computed trajectories do not include these cases. We cannot enumerate all possible configurations of all objects and pre-compute how they move. On the other hand, simulating such interactions online is too time consuming. We believe that a grasp planner can allow object-object contact by using sensor feedback during execution to monitor the acceptability of object motions. Still, our open-loop planner will be essential to provide an initial plan of action and to work even when the sensor data flow is not robust.

Toppling. The current version of our planner assumes objects will not topple when they are pushed. Hence, during our experiments, we limited ourselves to objects that do not topple when they are pushed. A quasi-static analysis, similar to the one we use for pushing in this paper, can be made to analyze the conditions for toppling. Lynch [13] shows that the conditions for toppling can be reduced to geometric conditions, giving limits on the height at which the object is pushed. A similar reasoning can be incorporated into our planner, and objects that are likely to topple can be avoided by treating them as unmovable obstacles.

Jamming and contact forces. Moving rigid objects in constrained environments will eventually lead to jamming, and thus require control of contact forces. In this respect, a recent study [10] is complementary to our work: while shoving obstacles or the target object, force monitoring can be useful to ensure that object-sliding is continuing as expected, and that objects are not stuck on unseen barriers.

Reconfiguration planning. High degrees of clutter in very constrained spaces may require clever sequencing of reconfiguration actions. Several algorithms have been proposed [2]–[4], [6], [16], [18]–[20]. The general problem is shown to be NP-hard [21]. Our clutter-grasp planner can easily be added as an additional action primitive in such planners, and by allowing several objects to be manipulated simultaneously, would hopefully reduce both the planning and execution time required.

REFERENCES

[1] D. Baraff. Issues in computing contact forces for non-penetrating rigid bodies. Algorithmica, 10:292–352, 1993.
[2] O. Ben-Shahar and E. Rivlin. Practical pushing planning for rearrangement tasks. IEEE Transactions on Robotics and Automation, 14:549–565, 1998.
[3] O. Ben-Shahar and E. Rivlin. To push or not to push: on the rearrangement of movable objects by a mobile robot. IEEE Transactions on Systems, Man, and Cybernetics, Part B, 28(5):667–679, 1998.
[4] P. Chen and Y. Hwang. Practical path planning among movable obstacles. In IEEE International Conference on Robotics and Automation, pages 444–449, 1991.
[5] M. Dogar and S. Srinivasa. Push-Grasping with Dexterous Hands. In
Under a set of constraints, we have shown how we can make IEEE/RSJ International Conference on Intelligent Robots and Systems,
pages 2123–2130, October 2010.
this problem tractable for certain scenarios. Relaxing these [6] M. Dogar and S. Srinivasa. A Framework for Push-Grasping in Clutter.
constraints, or removing them altogether, can further extend In Robotics: Science and Systems VII, 2011.

63
(a) (b) (c) (d) (e)
Fig. 11. A wide range of scenes and objects for which our planner succeeds by moving obstacles in a controlled manner. Each column shows one grasping
task. For each column, top image: Initial scene; middle image: Scene as seen by the robot, showing an acquired point cloud superimposed with object
recognition results (red point object outlines) and evaluated grasping directions (arrows); bottom image: Final result of successful grasp execution.

Sensing By Pushing Using Tactile Feedback. In IEEE/RSJ International


Conference on Intelligent Robots and Systems, pages 416–421, 1991.
[13] K. M. Lynch. Toppling Manipulation. In IEEE/RSJ International
Conference on Intelligent Robots and Systems, pages 152–159, 1999.
[14] K. M. Lynch and M. T. Mason. Stable Pushing: Mechanics, Con-
trollability, and Planning. International Journal of Robotics Research,
15(6):533–556, 1996.
[15] M. T. Mason. Mechanics and Planning of Manipulator Pushing Opera-
tions. International Journal of Robotics Research, 5(3):53–71, 1986.
[16] M. H. Overmars, D. Nieuwenhuisen, A. Frank, and H. Overmars. An
effective framework for path planning amidst movable obstacles. In
In International Workshop on the Algorithmic Foundations of Robotics,
pages 87–102, 2006.
[17] E. Rublee, V. Rabaud, K. Konolige, and G. Bradski. ORB: An Efficient
Alternative to SIFT or SURF. In International Conference on Computer
Vision, pages 2564–2571, 2011.
[18] M. Stilman and J. J. Kuffner. Planning among movable obstacles with
(a) (b) artificial constraints. In In International Workshop on the Algorithmic
Foundations of Robotics, pages 1–20, 2006.
Fig. 12. Examples of cases where our planner cannot find a solution. In [19] M. Stilman, J.-U. Schamburek, J. Kuffner, and T. Asfour. Manipulation
(a), multiple objects start out in contact with each other and object-object planning among movable obstacles. In IEEE International Conference
interaction during pushing is inevitable. In (b), insufficient space behind the on Robotics and Automation, pages 3327–3332, 2007.
target object prevents the execution of sufficiently long pushes. [20] J. P. van den Berg, M. Stilman, J. Kuffner, M. C. Lin, and D. Manocha.
Path planning among movable obstacles: A probabilistically complete
approach. In In International Workshop on the Algorithmic Foundations
of Robotics, pages 599–614, 2008.
[7] C. Goldfeder, M. Ciocarlie, H. Dang, and P. Allen. The columbia [21] G. Wilfong. Motion planning in the presence of movable obstacles.
grasp database. In IEEE International Conference on Robotics and In Proceedings of the Fourth Annual Symposium on Computational
Automation, pages 1710–1716, Kobe, Japan, 05 2009. Geometry, pages 279–288, 1988.
[8] S. Goyal, A. Ruina, and J. Papadopoulos. Planar sliding with dry friction. [22] Willow Garage. rviz wiki page. http://www.ros.org/wiki/rviz, June 2012.
Part 1. Limit surface and moment function. Wear, (143):307–330, 1991.
[9] R. D. Howe and M. R. Cutkosky. Practical Force-Motion Models
for Sliding Manipulation. International Journal of Robotics Research,
15(6):557–572, 1996.
[10] A. Jain, M. D. Killpack, A. Edsinger, and C. C. Kemp. Manipulation
in clutter with whole-arm tactile sensing, 2011. Under review.
[11] T. Lozano-Perez and P. H. Winston. Lama: A language for automatic
mechanical assembly. In Proceedings of the 5th International Joint
Conference on Artificial Intelligence, pages 710–716, 1977.
[12] K. Lynch, H. Maekawa, and K. Tanie. Manipulation And Active

64
FFT-Based Terrain Segmentation for Underwater Mapping

B. Douillard*, N. Nourani-Vatani*, M. Johnson-Roberson*, S. Williams*, C. Roman†, O. Pizarro*, I. Vaughn†, G. Inglis†
* The Australian Centre for Field Robotics, The University of Sydney, Australia
† Department of Ocean Engineering, The University of Rhode Island, USA
[email protected]

Abstract—A method for segmenting three-dimensional scans of underwater unstructured terrains is presented. Individual terrain scans are represented as an elevation map and analysed using the fast Fourier transform (FFT). The segmentation of the ground surface is performed in the frequency domain. The lower frequency components represent the slower varying undulations of the underlying ground whose segmentation is similar to de-noising / low pass filtering. The cut-off frequency, below which ground frequency components are selected, is automatically determined using peak detection. The user can specify a maximum admissible size of objects (relative to the extent of the scan) to drive the automatic detection of the cut-off frequency. The points above the estimated ground surface are clustered via standard proximity clustering to form object segments. The approach is evaluated using ground truth hand labelled data. It is also evaluated for registration error when the segments are fed as features to an alignment algorithm. In both sets of experiments, the approach is compared to three other segmentation techniques. The results show that the approach is applicable to a range of different terrains and is able to generate features useful for navigation.

Fig. 1. Sub-maps which were manually labelled for the evaluation (colours mapped to height): (a) KnidosH sub-map 6; (b) KnidosH sub-map 16; (c) Pipewreck sub-map 13; (d) Pipewreck sub-map 41. The oblong shapes in the top row correspond to amphora on a shipwreck site. The cylindrical shapes in the bottom row correspond to pipe segments found on another shipwreck site. The data sets are further detailed in Sec. V-A. The extent of the top sub-maps is [...]; the extent of the bottom sub-maps is [...]. (Best seen in color; view color version online at http://roboticsproceedings.org/rss08/p09.html)

I. INTRODUCTION

This paper presents a method for segmenting 3D point clouds representing unstructured/natural terrains. The approach is applied here to underwater bathymetric point cloud data collected as successive range/angle scans using a downward looking structured light laser system on a moving vehicle [17]. Examples of such 3D scans are shown in Fig. 1. Fig. 2 shows an overview of one of the data sets used here. The obtained scan segments are used as key areas (by analogy to key points) in a feature based alignment process to register overlapping scans. The core of the segmentation method lies in a novel ground extraction process. The terrain is represented as an elevation map which is interpreted as a 2D (discrete) signal and analysed using the discrete Fourier transform (DFT). In the frequency domain, the lower frequency components represent the slower varying undulations of the underlying ground and their extraction is similar to denoising/low pass filtering of the overall signal. The cut-off frequency, below which the frequency components associated with the ground are selected, is automatically determined. This is done using peak detection on the frequency signal together with the insight that a given frequency directly maps to a feature size in the spatial domain. A user can also specify a maximum admissible object size (relative to the extent of the scan or as a metric unit) to drive the automatic detection of the cut-off frequency. The segmentation process is fast since a single DFT represents the bulk of the calculation. It also does not require any training.

The output of the process is an estimated ground surface. The notion of ground surface as used in this paper is to be understood as the background terrain undulations. Segments are formed from the non-ground points above this surface by applying a standard voxel based clustering (i.e. clustering by direct connectivity of non-empty voxels). These segments are used as features (or key areas) for alignment.

Underwater platforms are increasingly being used for high-resolution mapping tasks in scientific, archaeological, industrial and defence applications. Accurate 3D scan registration is critical to high resolution map building using dead reckoning navigation measurements. The footprint/aperture and effective range of underwater sensors is very limited relative to aerial and terrestrial equivalents. In many cases the survey areas contain a mix of man-made structures (e.g. shipwrecks, pipes) and natural/unstructured seafloor (e.g. algae, corals, sand, canyons) that do not satisfy typical assumptions such as the presence, type or density of features in 3D data observed from terrestrial and aerial robots.

This paper provides experimental evidence for the applicability of the proposed DFT based segmentation method to
a range of underwater terrains and for its ability to produce key areas that allow accurate 3D alignment. The segmentation is first validated using hand-labelled underwater scans. It is then evaluated in the context of an alignment pipeline and compared to three other segmentation techniques. The proposed segmentation process has potentially a number of applications in marine robotics.

Fig. 2. Overview of the KnidosH shipwreck. The site is approximately 16 meters long and 10 meters wide. The dataset is further described in Sec. V-A.

II. RELATED WORK

Segmentation has been studied for several decades and in particular in computer vision, where it is often formulated as graph clustering, such as in Graph Cuts [4]. Graph-Cuts segmentation has been extended to 3D point clouds by Golovinskiy and Funkhouser [8] using k-nearest neighbours (KNN) to build a 3D graph with edge weights assigned according to an exponential decay in length. The method requires prior knowledge on the location of the objects to be segmented.

In ground robot applications, the recent segmentation algorithm of P. Felzenszwalb [12] (FH) for natural images has gained popularity due to its efficiency [19, 21, 22, 24]. Zhu et al. [24] build a 3D graph with KNN while assuming the ground to be flat for removal during preprocessing. 3D partitioning is then obtained with the FH algorithm. Under-segmentation is corrected via a posterior classification that includes the class "under-segmented". Triebel et al. [22] explore unsupervised probabilistic segmentation in which the FH algorithm is modified for range images and provides an over-segmentation during preprocessing. Segmentation is cast into a probabilistic inference process both in the range image and in feature space using Conditional Random Fields. Their evaluation does not involve ground truth data. Schoenberg et al. [19] and Strom et al. [21] have applied the FH algorithm to coloured 3D data obtained from a co-registered camera/laser pair. The weights on the image graph are computed based on a number of features, including point distances, pixel intensity differences and angles between surface normals estimated at each 3D point. The FH algorithm is then run on a graph representing either the range image or the colour image. In practice the evaluation is only done on road segments, or visually. On the contrary, the evaluation presented here is performed on scans with point-wise labels assigned manually. All of the mentioned algorithms reason locally and attempt to identify boundaries between objects in the scene while the approach proposed here reasons globally across the whole depth image by analysing its overall frequency content to find underlying larger scale patterns. Three segmentation techniques for 3D scans acquired in urban environments are proposed in [6]. This suite of algorithms is able to process point clouds of different densities but the algorithms all assume the platform to be located on a drivable surface so that ground seeds are available; ground seeds are not always readily available in the underwater scans processed here (for instance see Fig. 1(b)).

Segmentation of underwater terrain based on acoustic and range data typically has focused on larger scale, lower resolution applications such as generating habitat maps of the seafloor based on multi-beam sonar data [5]. The objective in these cases is to group areas of the seafloor with similar textures (derived from bathymetry or backscatter) as representing a common habitat, each segment representing hundreds or thousands of square meters of seafloor. Another use for segmentation is in acoustic side-scan sonar imagery as part of mine-detection systems [15], where segmentation is used to separate the returns into ground, target and shadow. While the focus in these cases is the detection of particular man-made structures, side-scan data usually has swaths of hundreds of square meters in width and targets appear as clusters of relatively few pixels. Segmentation is also used to aid object detection and tracking for obstacle avoidance [13]. In this case the segmentation is usually applied at the individual sensor readings as a way of dealing with noise or reducing computational demands further along the processing pipeline, rather than considering 3D surfaces observed through multiple scans. To our knowledge this paper is the first to address segmentation of ground and non-ground objects in short-range, high resolution 3D point clouds from underwater scenes.

The contribution of this work lies in the definition of a novel ground extraction method for unstructured terrain. It is based on DFTs and only requires the user to set one parameter which specifies the maximum size of objects (relative to the scan extents) beyond which objects are identified as part of the underlying ground.

III. 3D ALIGNMENT

The segmentation process presented in this work is developed as a preprocessing step to allow accurate alignment of underwater scans. The survey of 3D alignment techniques presented by Salvi et al. [18] shows that most approaches involve two main steps: coarse alignment followed by fine alignment. Coarse alignment is the step this study is concerned with since fine alignment is usually addressed with the standard Iterative Closest Point (ICP) algorithm [1]. While this particular paper focuses on dense point clouds acquired with a structured light sensor (sensors and data sets described in Sec. V-A), the long term aim of the study is to develop methods also capable of
registering dense point clouds (i.e. obtained from structured light or multi-beam sonar scans) to sparser point clouds (e.g. shipborne acoustic surveys). Point feature based alignment techniques may not be suitable since the underlying features often require homogeneous sampling across the scan pair. The Spin Image [10] and the NARF [20] features, for instance, have this requirement. An alternative to point feature based approaches is to match segments of the data [7]. This makes such approaches more directly applicable to scan pairs of different densities. The alignment method in [7] is used here and will be referred to as S-ICP ('S' for segmentation-based).
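For concreteness, the fine-alignment step mentioned above can be sketched as a standard point-to-point ICP loop. This is a generic illustration of the classical algorithm [1] (nearest-neighbour correspondences, SVD-based rigid fit), not the S-ICP method used in this paper; the function name and parameters are ours:

```python
import numpy as np
from scipy.spatial import cKDTree

def icp(src, dst, iters=30):
    """Minimal point-to-point ICP: returns R, t such that R @ p + t maps
    points of `src` onto `dst`. Correspondences are nearest neighbours in
    the target cloud; each iteration solves the best rigid transform for
    the current pairs with the SVD-based (Kabsch) closed form."""
    R, t = np.eye(3), np.zeros(3)
    tree = cKDTree(dst)
    cur = src.copy()
    for _ in range(iters):
        # 1. Nearest-neighbour correspondences in the target cloud.
        _, idx = tree.query(cur)
        matched = dst[idx]
        # 2. Closed-form rigid fit for the current pairs.
        mu_s, mu_d = cur.mean(axis=0), matched.mean(axis=0)
        H = (cur - mu_s).T @ (matched - mu_d)
        U, _, Vt = np.linalg.svd(H)
        # Guard against a reflection in the SVD solution.
        D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
        R_step = Vt.T @ D @ U.T
        t_step = mu_d - R_step @ mu_s
        # 3. Apply the increment and accumulate the total transform.
        cur = cur @ R_step.T + t_step
        R, t = R_step @ R, R_step @ t + t_step
    return R, t
```

As with any ICP variant, convergence depends on a reasonable initial alignment, which is precisely what the coarse (segment-based) step is meant to provide.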

IV. GROUND-OBJECT SEGMENTATION

This section describes the core contribution of the paper. A mechanism to perform ground extraction in unstructured terrains is introduced. The proposed segmentation approach requires preprocessing of the point cloud to form an elevation map aligned with the DC component (the zero frequency term) of the elevation signal; this is described in Sec. IV-A. The resulting elevation map is transformed into the frequency domain via DFT and filtering operations are applied to its frequency content to extract the underlying ground surface; this is described in Sec. IV-B. Finally, ground and objects above the ground are separated following the process described in Sec. IV-C.

Fig. 3. Top row: examples of ground surfaces extracted with the proposed segmentation method, (a) KnidosH sub-map 6 and (b) KnidosH sub-map 16 (colours are mapped to height). The two sub-maps used in this example are also shown in Fig. 1; these have an extent of about [4 m × 4 m]. Bottom row: resulting segmentation of (c) KnidosH sub-map 6 and (d) KnidosH sub-map 16; grey indicates ground points, and blue non-ground points (better seen in colour). The estimated ground surfaces (top) are sparser than the point clouds (bottom) since they are built on the elevation grid. In both cases most of the parts of the terrain belonging to the underlying seafloor are correctly identified. This is further quantified in Sec. V-C.

A. Generation of elevation maps

The aim of the preprocessing steps presented next is to generate a discrete 2D signal on a regular grid whose DC component is aligned with the horizontal plane. The obtained 2D discrete signal can then be transformed to the frequency domain with the fast Fourier transform (FFT). Note that the requirement of a regular grid is core to FFT [11]; this has implications on the type of point clouds that can be processed with the proposed approach. We consider alternative methods for applying these techniques to irregular scans at the end of this section.

To obtain a 2D discrete signal from the input point cloud, the latter is first re-aligned so that the DC component of its elevation is brought into the xy-plane. To do so, the sub-map is moved to its centre of mass. It is then axis-aligned by using a process equivalent to rotating the sub-map onto its eigenvectors and ensuring that positive elevation is along the z-axis.

In this adjusted reference frame, an elevation grid is formed by voxelisation of the point cloud. The maximum height in each column of the voxel grid is kept as the elevation. A resolution of 2 cm is used in all the experiments reported here. Some of the cells in the elevation grid may not contain any points. The height in these cells is estimated via nearest neighbour interpolation. Interpolation in the data sets processed here is not likely to modify the frequency content of the original elevation signal since empty cells represent at most 10% of the elevation grid. However, this is not true in more sparse scans, such as side-looking Velodyne scans [7], where the polar pattern of the scans induces a significant number of empty cells in a regular grid. Non-trivial interpolation mechanisms would be required to estimate the height in these cells. Alternatively, non-uniform DFTs or Polar FFTs could be used [11]. This is left as future work.

B. Ground surface extraction

The formulation of the ground extraction process presented in this section is based on the observation that the ground surface in a scene can be interpreted as the underlying lower frequency undulations on which smaller and higher frequency shapes (e.g. objects on the ground) sit. This "ground" surface may not physically exist in all natural scenes. For instance, larger objects such as rocks often merge with their support at their boundary to form continuous surfaces. However, the estimation of a separation layer, which we refer to as ground, defined as the lower frequency content of the elevation signal (Figs. 3(a)-3(b)), allows us to identify distinct segments above the "ground" (Figs. 3(c)-3(d)). These segments are then used as landmarks for navigation (see Sec. V-D). Intuitively, this segmentation process also relates to a saliency detection mechanism (and thus information theory); however, the definition of the theoretical relationships between the two is beyond the scope of this paper. In the remainder of this paper, the word "ground" will be used to refer to both the separation layer estimated with the mechanisms described below and the points below this layer.

1) Frequency domain filtering: The aim of performing filtering in the frequency domain is to identify those lower frequency components that form the ground surface. An example of frequency response given by an elevation map is shown in Fig. 4(a). In this section we assume that the cut-off frequency (i.e. the frequency beyond which the frequency components are considered to be objects) is given. The next section explains how this cut-off frequency is automatically determined.
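The preprocessing of Sec. IV-A (centre-of-mass shift, eigenvector alignment, 2 cm voxelisation keeping the maximum height per column, and nearest-neighbour filling of empty cells) might be implemented along the following lines. This is a sketch under our own naming and conventions; in particular, the sign convention that guarantees positive elevation along z is not handled here:

```python
import numpy as np
from scipy.interpolate import NearestNDInterpolator

def elevation_map(points, res=0.02):
    """Sketch of the Sec. IV-A preprocessing: PCA-align an (N, 3) sub-map
    and rasterise it into a regular elevation grid of resolution `res`."""
    # Move the sub-map to its centre of mass (zeroes the DC component).
    pts = points - points.mean(axis=0)

    # Rotate onto the eigenvectors of the covariance so that the direction
    # of smallest spread (the elevation) maps to the z-axis.
    evals, evecs = np.linalg.eigh(np.cov(pts.T))   # ascending eigenvalues
    pts = pts @ evecs[:, ::-1]                     # largest spread first

    # Voxelise in x-y; keep the maximum height per column.
    ij = np.floor((pts[:, :2] - pts[:, :2].min(axis=0)) / res).astype(int)
    shape = tuple(ij.max(axis=0) + 1)
    grid = np.full(shape, -np.inf)
    np.maximum.at(grid, (ij[:, 0], ij[:, 1]), pts[:, 2])

    # Fill empty cells by nearest-neighbour interpolation.
    filled = np.isfinite(grid)
    if not filled.all():
        interp = NearestNDInterpolator(np.argwhere(filled), grid[filled])
        grid[~filled] = interp(np.argwhere(~filled))
    return grid
```

The resulting regular grid is what makes the FFT analysis of Sec. IV-B applicable.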
Applying a hard threshold in the frequency domain creates the well known ringing effect in the reconstructed spatial signal [14]. To avoid this ringing effect a low-pass filter with smooth cut-off is used to remove the higher frequency (i.e. non-ground) components. We employ a linear phase frequency domain filter with Butterworth-like magnitude response. Another advantage of this filter is that the magnitude in the passband region is almost constant [14], hence the ratios of the lower frequencies are maintained. The equation of this Butterworth-like filter is:

\[ |T(d)| = \frac{1}{\sqrt{1 + \epsilon^2 \, (d/d_c)^{2n}}}, \quad n \geq 1 \tag{1} \]

where $T$ is the filter response, $\epsilon = \sqrt{10^{A/10} - 1}$ is the band edge value with $A$ being the passband variation, $d$ is the distance from DC, $d_c$ is the cut-off point and $n$ is the filter order. The filter order defines how sharp the damping is; $n = 2$ is used in our implementation.

The 2D frequency response of an image has the same dimensions as the image itself. The frequencies evaluated by the FFT along one dimension of the image are $[0, 1, 2, \ldots, \frac{N}{2}] \cdot \frac{1}{N}$ where $N$ is the number of pixels along that dimension. In the case of non-square images (such as the elevation grids processed here) the range of frequencies evaluated along each dimension is different (as it depends on $N$). Hence, the cut-off frequency is here encoded as a ratio (a unitless number between [0, 1]) of the range of frequencies. This implies that consistent filtering of objects, independent of their orientation and location in the image, can be obtained in non-square images by modulating the cut-off frequency by an elliptical gain. The extents of the major and minor axes of the ellipse are given by: $a = f_c \cdot \frac{w}{2}$ and $b = f_c \cdot \frac{h}{2}$, where $f_c \subset [0, 1]$ is the cut-off frequency ratio and $w$/$h$ the image width/height. The 2D filter with elliptical weighting is then obtained by replacing the ratio $d/d_c$ in (1) by:

\[ \left( \frac{d}{d_c} \right)^2 = \left( \frac{x - x_o}{a} \right)^2 + \left( \frac{y - y_o}{b} \right)^2 \tag{2} \]

where $(x_o, y_o)$ is the ellipse origin, which is the centre of the shifted frequency response image (a shifted image is re-arranged so that the DC is in the centre of the image). An example of filtered frequency response image is shown in Fig. 4(b).

Fig. 4. Bode magnitude plots of KnidosH sub-map 16 (horizontal axes: frequencies $\omega_x$, $\omega_y$ from $-\omega_N$ to $\omega_N$; vertical axis: $20 \log |T|$): (a) the original frequency response image and (b) the filtered frequency response image. The frequencies above the cut-off (here set to 7.99% and shown with a magenta star) are suppressed by > 40 dB.

2) Cut-off frequency estimation: The estimation of the cut-off frequency is based on a peak detection process. Peaks in the frequency domain result from periodic structures in the original spatial-domain image and their radius and direction correspond to the spacing and the orientation of a periodic element [23]. A peak, $\Pi$, is found as a local maximum:

\[ \Pi(j, i) = \begin{cases} 1 & \text{if } \left( I_\omega(j, i) - I_\omega(l, k) \right) > c \\ 0 & \text{otherwise} \end{cases} \tag{3} \]

where $I_\omega(j, i)$ indicates the $(j, i)$-th frequency image pixel, and $l = \{j-n, j-n+1, \ldots, j+n\}$ and $k = \{i-n, i-n+1, \ldots, i+n\}$ are the indices of the local region. In an eight-connected region $n = 3$. The constant $c$ is a threshold defining how much larger the peak has to be compared to the values in the neighboring cells and is set to 0 in our implementation.

If the structure of the spatial domain is known [23] this information can be used to select the correct peak and set the cut-off frequency directly. Otherwise, the most salient peak has to be searched for. The location of the closest peak to the origin is used as an initial estimate of the cut-off frequency. The frequency components contained up to this peak may not be enough to reconstruct the underlying ground surface. This is for example the case when a sub-map contains slow undulating patterns or a constant curvature. Fig. 1(a) illustrates the case of an (approximately) constant curvature where the sub-map is located on the base of a mound. To assess whether more frequency components need to be included in the estimate of the ground surface, we develop a mechanism that estimates the maximum size of an object $m_c$ for a given cut-off frequency and uses $m_c$ to refine the cut-off frequency. Beyond $m_c$, an object has frequency components below the cut-off frequency and is identified as part of the ground.

The value $m_c$ is obtained by modelling an object in the spatial domain as a step function. The frequency response of a step function is a sinc function [14]. Correspondingly, the response to a 2D step (a cube) is $J_1(z)/z$, where $J_1(z)$ is the first order Bessel function [9]. Knowing the size of the cube $\tau$, the location of the first zero crossing of the frequency response is given by $2/\tau$. The wider the object, the closer the location of the first zero crossing to the DC, which is consistent with the analysis above identifying larger objects with lower frequency components. The first zero crossing of the first order Bessel function is followed by progressively decreasing undulations. However, the range of frequencies up to the first zero crossing contains more than 90% of the signal energy. Therefore, we assume that the overall shape of the object is maintained when applying a low-pass filter with a cut-off frequency at the first zero crossing. Given this modelling, a cut-off frequency $f_c > 2/\tau$ implies that an object of size $\tau$ is included in the ground. On the other hand, if $f_c \leq 2/\tau$, an object of size $\tau$ is partly filtered out during ground extraction. This implies that the maximum size of objects $m_c$ beyond which objects are included in the ground surface can be approximated as $2/f_c$.
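Equations (1) and (2) translate directly into a frequency-domain filtering routine. The sketch below applies the elliptically-weighted Butterworth-like low-pass to an elevation map and reconstructs the ground estimate by inverse FFT; the passband variation A = 0.1 dB is an assumed value (the paper does not report the one used), and the function name is ours:

```python
import numpy as np

def fft_ground(elev, fc, A=0.1, n=2):
    """Low-pass an elevation map with the Butterworth-like filter of
    Eq. (1), using the elliptical weighting of Eq. (2), and return the
    ground estimate via the inverse FFT. `fc` is the cut-off expressed
    as a ratio in [0, 1] of the frequency range; A (passband variation,
    dB) is an assumed value; n = 2 is the order used in the paper."""
    h, w = elev.shape
    F = np.fft.fftshift(np.fft.fft2(elev))        # DC moved to the centre

    # Eq. (2): elliptical normalised distance (d / d_c)^2 from the DC.
    a, b = fc * w / 2.0, fc * h / 2.0
    X, Y = np.meshgrid(np.arange(w) - w / 2.0, np.arange(h) - h / 2.0)
    d2 = (X / a) ** 2 + (Y / b) ** 2

    # Eq. (1): |T| = 1 / sqrt(1 + eps^2 (d/d_c)^(2n)).
    eps2 = 10.0 ** (A / 10.0) - 1.0
    T = 1.0 / np.sqrt(1.0 + eps2 * d2 ** n)

    return np.real(np.fft.ifft2(np.fft.ifftshift(F * T)))
```

Since |T| = 1 at the DC, a constant elevation map passes through unchanged, while ripples whose spatial period is small relative to the cut-off are strongly attenuated.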
as f2c . Strictly speaking, objects smaller than f2c are only partly A. Data sets
filtered out, the effect being stronger for smaller objects. The The 3D scans used here are produced by the structured
latter equation transposed from the image space to the metric light laser profile imaging device described in [17]. The basic
world gives: concept consists of a green laser sheet projected on the sea
mc = 2/fc · resg , (4) floor and a calibrated camera imaging the laser line. The
where resg is the resolution of the elevation grid. shipwreck sites were surveyed at approximately an altitude
This formula allows the ground extraction algorithm to of three meters with the vehicle traveling 10-15 cm/sec.
automatically adjust the detected cut-off frequency given a Once extracted from the collected images and projected, the
maximum object size. The ground extraction algorithm is resulting point density is approximately 7 points per square
formulated in such a way that the user provides a relative centimeter. The individual points have a range resolution better
value mu ⊂ [0, 1] which represents the maximum object size than one centimeter. The two following data sets are processed:
with respect to the extent of the sub-map. The sub-map extent (1) KnidosH, which contains 101 sub-maps; (2) Pipewreck,
e is obtained as the minimum of the extents in x and y. which contains 42 sub-maps. In each set, sub-maps contain
Once the initial peak is identified, the corresponding maxi- around 600,000 points.
mum object size is obtained using (4). If mc > mu · e, fc
is set to the next closest peak from DC. This is repeated B. Benchmark segmentation techniques
until mc ≤ mu · e. The outcome of this process is a cut- This sections presents three ground segmentation methods
off frequency fc which is selected in such a way that the which will be used as benchmarks in the evaluation of the
maximum size of objects considered as non-ground is mu · e. proposed approach.
In our implementation μ = 0.5. As an example, in the case of sub-map 6 from the KnidosH data set (shown in Fig. 3), the ground extraction algorithm sets fc to the second closest peak. Due to the overall constant upward curvature of this sub-map, additional frequency components are required to model the underlying ground. This means that the initial maximum object size given by the first peak was beyond half the sub-map extent (i.e. beyond μe).

3) Ground surface reconstruction: Once fc is defined, and filtering is applied to the frequency domain image, the estimate of the ground surface is built by applying the inverse FFT. Examples of reconstructed ground surfaces are shown in Fig. 3.

C. Ground/object separation

Given an estimated ground surface, the objects above this surface are obtained by comparing the height of the points in the original point cloud to the height of the surface. If a point is on or below the ground surface, it is labelled as ground; if it is above, it is labelled as non-ground. Once the non-ground points are identified, they are clustered using a standard proximity voxel based clustering: non-empty voxels in contact with each other are gathered into the same cluster. The resulting set of clusters forms the segments of non-ground points. These can be used as landmarks for navigation, as developed in Sec. V-D.

V. EXPERIMENTS

This section presents two sets of experimental results. First, an evaluation of the proposed ground extraction method is performed using ground-truth hand labelled data (Sec. V-C). Second, the proposed method is evaluated in terms of the alignment it leads to when the segments are fed to the S-ICP alignment algorithm (Sec. III). These results are presented in Sec. V-D. In both sets of experiments, the proposed approach is compared to three other techniques; these are described in Sec. V-B. All experiments are repeated on two data sets which are introduced in Sec. V-A.

1) Naive ground extraction: In this first approach, the mean height of the point cloud is computed, and the ground is simply defined as the set of points below the mean height. This approach will be referred to as the "Naive" method.

2) MLESAC-based plane extraction: In this second approach, a plane is fitted to the point cloud and the ground is defined as the points below this plane. The Point Cloud Library [2] implementation of a MLESAC (Maximum Likelihood Estimation SAmple Consensus) based plane fitter is used in the proposed experiments. This method will be referred to as the "Planar" method.

3) Grid-based ground extraction: In this third approach, the point cloud is voxelised, and the ground points are identified as the ones contained in the bottom voxel of each column of the voxel grid. The grid resolution relates, to some extent, to the value of the cut-off frequency automatically estimated by the proposed DFT-based method. A higher resolution (i.e. a smaller grid separation) will result in finer details of the terrain being included in the estimated ground surface, which, to some extent, corresponds to a larger cut-off frequency in the context of the DFT-based method. However, as developed in Sec. IV, the Fourier formalism allows the algorithm to reason about the range of scales contained in the terrain relief and, given a maximum admissible object size, makes it possible to automate the selection of the relief scales relevant to the definition of the ground. Such reasoning is not readily applicable in a grid based elevation segmentation, whose resolution would have to be manually adjusted as a function of the data set. The grid resolution is set to 2 cm in our experiments. This approach is referred to as the "Grid" method.

C. Comparisons to hand labelled segmentation

1) Segmentation quality metric: The segmentation methods are compared to hand labelled segmentation to determine which points are correctly or incorrectly labelled as object and ground. The evaluation is performed by quantifying the True Positive (TP), False Positive (FP), True Negative (TN),
and False Negative (FN) rates in the binary classification task of identifying the classes ground and object; where Positive refers to the object class and Negative to the ground class.

2) Results: From each of the KnidosH and Pipewreck data sets two sub-maps were chosen and manually labelled. For each point the labels ground / non-ground were assigned. These sub-maps (Fig. 1) were chosen since their content is representative of the two data sets.

The results are presented in Table I. For each segmentation method the true positive and true negative rates as well as the F1 score are shown. The best results in terms of the F1 score are indicated in bold.

From these results it is clear that the proposed method outperforms all the other approaches. The Planar method generally produced the lowest proportion of FNs. However, this came at the expense of having the largest proportion of FPs. On the contrary, our DFT method produced the lowest number of FPs while also having very low FNs, resulting in a significantly higher F1 score. These results show that this method provides the most consistent performance in identifying the underlying ground surface compared to the three benchmark techniques.

D. Comparisons of alignment results

This section presents an evaluation of the segmentation methods in terms of the quality of the alignment they lead to when used in conjunction with the S-ICP alignment method (Sec. III). The proposed segmentation approach plus the three benchmark techniques detailed in Sec. V-B are run on each sub-map of the two data sets. This allows us to evaluate the performance of each segmentation technique in an alignment pipeline.

An initial odometry-based navigation solution is provided using a Doppler Velocity Log, Fiber-Optic Gyrocompass, and depth sensor. Sub-maps are broken before the accumulated odometry error has become significant relative to the bathymetry sensor resolution [16]. An Ultra-Short Baseline acoustic positioning system is used to reference the survey to a geodetic datum, but is otherwise too inaccurate to localize the sub-maps or correct the navigation within a sub-map. The aim of the experiments presented here is to show that pairwise alignment through accurate terrain feature segmentation can locally improve sub-map registration. The use of these local pairwise registrations in a global scan alignment process (as in [3]) or in the online navigation algorithm of [16] is left for future work.

To define pairs of sub-maps for alignment, each sub-map set is traversed as follows. The first sub-map s1 is paired up with the closest neighbour sub-map (closest in terms of the distance between their centroids); this sub-map then becomes s2. s2 is paired up with the closest sub-map amongst the sub-maps not already in a pair. Each sub-map in a data set is assigned to a pair by repeating this process.

1) Alignment quality metric: To evaluate the quality of the alignment, the metric used here attempts to capture the crispness of aligned scan pairs. This metric consists of voxelising the aligned point cloud and returning the number of occupied voxels. It will be referred to as Nx. The lower the value produced by the metric, the crisper the point cloud and in turn the more accurate the alignment. The rationale for using this metric as opposed to the ICP residual is developed in [7]. The voxel resolution used in all the experiments presented here is 2 cm.

2) Results: The results of the alignments are summarised in Table II. Naive, Planar and Grid refer to the techniques described in Secs. V-B1-V-B3. FFT refers to the proposed segmentation algorithm. A number of quantities are used to evaluate the quality of the alignments associated with each segmentation method. The first is the mean of the variation of the Nx metric (ΔNx) before and after the alignment is applied. This is reported as a percentage in column 1. The associated standard deviation is given in column 2. A negative value of ΔNx corresponds to an improvement of the crispness of the point cloud. A positive value, on the contrary, corresponds to the point cloud losing sharpness during the alignment. This may, for instance, be due to segments being mismatched across sub-maps (i.e. a data association failure). S-ICP only computes an alignment if at least Nseg segments are matched across sub-maps. Nseg is specified by the user and is set to 3 in our experiments. If fewer than Nseg segments are associated during a run of S-ICP, the alignment is not computed and ΔNx is set to 0. The number of sub-maps where the alignment has degraded (ΔNx > 0), has not been calculated (ΔNx = 0), and has improved (ΔNx < 0) is reported in columns 3-5, respectively. The mean and the standard deviation provided in columns 1 and 2 are computed using only the sub-map pairs for which the alignment is computed (i.e. ΔNx > 0 or ΔNx < 0). In the case of rather flat terrains, only small segments will be extracted, which may not be large enough to be accepted for processing. The minimum segment extent is fixed to 2 cm in our implementation. The absence of large enough segments in any of the sub-maps in a pair results in the pair not being processed. These cases are counted in the last column of Table II. The results for each of the two data sets are separated in each cell of the table by "/": KnidosH/Pipewreck. The best results in each column and for each data set are indicated in bold.

The first column of Table II shows that for both data sets the proposed method (last row) outperforms the benchmark methods by providing on average the best improvement in crispness when used in conjunction with S-ICP. The associated standard deviation (column 2) is of the same order of magnitude as the mean, suggesting a non-negligible range of variation in ΔNx. This is due to the quality of the initial navigation solution, which varies across the data set and which is on average better in the KnidosH data set. In some cases, the provided navigation is accurate (in particular along the linear segments of the ROV's trajectory) and little improvement is obtained in terms of point cloud sharpness. In other cases, in particular for areas re-visited from different transects (near loop closures), the alignment provided by S-ICP leads to a more significant improvement in the point cloud crispness. Examples of alignment results are shown in Fig. 5. The
TABLE I
COMPARISON OF SEGMENTATION METHODS WITH GROUND TRUTH (TPR / TNR / F1)

         |          KnidosH data set         |         Pipewreck data set
         |    Sub-map 6    |    Sub-map 16   |   Sub-map 13    |   Sub-map 41
  Naive  | 0.57 0.79 0.70  | 0.60 0.60 0.58  | 0.77 0.83 0.82  | 0.74 0.87 0.82
  Planar | 0.48 0.99 0.64  | 0.21 0.84 0.31  | 0.68 0.92 0.78  | 0.76 0.89 0.83
  Grid   | 0.82 0.83 0.87  | 0.65 0.80 0.68  | 0.82 0.77 0.84  | 0.69 0.86 0.78
  FFT    | 0.90 0.85 0.92  | 0.94 0.76 0.82  | 0.98 0.95 0.97  | 0.96 0.88 0.95
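The TPR / TNR / F1 entries in Table I follow the standard binary-classification definitions, with object as the positive class (Sec. V-C1). A minimal sketch of the scoring (the function name is ours, not the authors'):

```python
def segmentation_scores(predicted, truth):
    """TPR, TNR and F1 for binary object(1) / ground(0) labels,
    with the object class taken as positive (Sec. V-C1)."""
    pairs = list(zip(predicted, truth))
    tp = sum(p == 1 and t == 1 for p, t in pairs)  # objects found
    tn = sum(p == 0 and t == 0 for p, t in pairs)  # ground found
    fp = sum(p == 1 and t == 0 for p, t in pairs)  # ground called object
    fn = sum(p == 0 and t == 1 for p, t in pairs)  # object called ground
    tpr = tp / (tp + fn) if tp + fn else 0.0       # recall on objects
    tnr = tn / (tn + fp) if tn + fp else 0.0       # recall on ground
    f1 = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0
    return tpr, tnr, f1
```

A Planar-like behaviour (few FNs but many FPs) depresses F1 exactly as seen in the table, since F1 penalises FPs and FNs symmetrically.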
TABLE II
DELTA CRISPNESS BEFORE / AFTER ALIGNMENT (KNIDOSH / PIPEWRECK DATA SETS)

         | μ(ΔNx) [%]    | σ(ΔNx) [%]  | #ΔNx > 0 | #ΔNx = 0 | #ΔNx < 0 | #unprocessed
  Naive  |  1.14 / NaN   | 2.37 / NaN  | 21 / 0   | 71 / 41  |  8 / 0   | 0 / 1
  Planar |  0.43 / -2.04 | 2.73 / 5.87 | 21 / 2   | 59 / 35  | 15 / 2   | 6 / 3
  Grid   | -0.25 / -3.47 | 1.20 / 2.13 | 22 / 0   | 49 / 34  | 26 / 5   | 4 / 3
  FFT    | -1.28 / -4.25 | 1.23 / 2.90 |  6 / 0   | 50 / 19  | 41 / 20  | 4 / 3
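The ΔNx entries in Table II derive from the voxel-occupancy crispness measure of Sec. V-D1. A minimal sketch, assuming an N x 3 point array and the paper's 2 cm axis-aligned voxelisation (function names are ours):

```python
import numpy as np

def occupied_voxels(points, res=0.02):
    """Nx metric: number of occupied voxels of an N x 3 point cloud
    at resolution `res` (2 cm in the experiments)."""
    cells = np.floor(np.asarray(points, float) / res).astype(int)
    return len({tuple(c) for c in cells})  # distinct voxel indices

def delta_nx(points_before, points_after, res=0.02):
    """Relative change in crispness, in percent, as in Sec. V-D:
    Delta Nx = (Nx_after - Nx_before) / Nx_after * 100.
    Negative values mean the aligned cloud became crisper."""
    nx_b = occupied_voxels(points_before, res)
    nx_a = occupied_voxels(points_after, res)
    return (nx_a - nx_b) / nx_a * 100.0
```

When an alignment collapses duplicated surfaces into shared voxels, Nx drops and ΔNx goes negative, matching the sign convention used throughout the table.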
other indicators confirm the relative performance of the four methods. For both data sets, the proposed approach leads to the smallest number of incorrect alignments (column 3), to the smallest number (modulo 1 sub-map pair) of non-aligned sub-map pairs (column 4), and to the largest number of improving alignments (column 5).

The percentages in the first column of Table II are small (in terms of their absolute values) for the following reason. ΔNx (expressed in %) is obtained as

    ΔNx = (Nx_after − Nx_before) / Nx_after · 100,

where the indices "before" and "after" refer to before and after applying the alignment. As can be seen in Fig. 5, an alignment results in overlapping segments being moved relative to one another compared with the original, uncorrected alignment provided by the navigation solution. The resulting change in the number of occupied cells corresponds to a small number of points compared to the number of points in the two sub-maps. Fig. 5 gives an intuition of how ΔNx values map to actual alignments.

The Naive segmentation method is not expected to provide high performance but is used here to gauge the merits of a low complexity approach. The corresponding results show that accurate alignment requires more terrain modelling than a simple threshold based separation. The Pipewreck data set, as can be seen in Figs. 5(a)-5(b), contains large sections of flat terrain. The mean of the terrain height tends to be pulled towards the flat sections, implying that large sections of flat terrain are identified as non-ground. These large segments cannot be matched across sub-maps since their extent and shape is less representative of the terrain itself than of the extents of the sub-map they were extracted from. A similar behaviour can be observed in the KnidosH data set.

The Planar segmentation method is well suited to the Pipewreck data, in which most of the sub-maps contain non-negligible visible portions of rather flat ground (Figs. 5(a)-5(b)). The extraction of a planar surface via a RANSAC method is in this case appropriate and leads to an accurate ground segmentation (see results in Sec. V-C). The application of the same method to the KnidosH data set, which contains more diverse relief (Figs. 5(c)-5(d)), shows that simple plane extraction is not general enough to provide accurate segmentation / alignment in a variety of terrains. This method provides on average an improvement of the point cloud crispness in the Pipewreck data set (ΔNx = −2.04), but tends to lead to point clouds with a degraded sharpness in the KnidosH data set (ΔNx = 0.43).

The next step up in terms of complexity of the terrain model corresponds to the Grid method. The more accurate modelling of the terrain relief obtained with the Grid method allows improved performance with respect to the two previous techniques. For both data sets, the Grid method leads to an improvement of the point cloud crispness: ΔNx = −0.25 for the KnidosH data set and ΔNx = −3.47 for the Pipewreck data set. As explained in Sec. V-B3, however, automating the definition of a grid resolution which allows the larger scales of the relief defining the underlying ground to be captured is not trivial without resorting to a spectral type of analysis.

The proposed FFT-based segmentation method is able to perform better than the methods making assumptions of a (partially) flat terrain (Naive and Planar) and methods requiring a resolution to be fixed a priori (Grid). It can reconstruct non-planar terrains (as illustrated in Fig. 3(c)) and reason globally on the point cloud (modulo some formatting into an elevation grid) as opposed to reasoning only locally in a column of heights. The set of results presented in Table II provides experimental evidence in favour of the applicability of the proposed ground segmentation method to a range of different terrains and its ability to generate features useful for navigation.

VI. CONCLUSION

This paper has introduced a method for segmenting 3D scans of underwater unstructured terrains. The method extracts a ground surface by selecting the lower frequency components in the frequency domain that define the slower varying undulations of the terrain. The points above the estimated ground surface are clustered via standard proximity clustering to form object segments. The experimental results show that the approach is applicable to a range of different terrains and is able to generate segments which are useful landmarks for navigation. Future work will integrate segmentation based registration into a global scan alignment process, and into online navigation algorithms. The DFT-based ground extraction mechanism will also be extended to perform non-regular DFTs or polar FFTs to process polar scans that generate irregular grids, which cannot be directly processed with standard FFTs. Finally, we will compare the proposed method to image segmentation based approaches such as the GraphCut algorithm.
(a) before alignment    (b) after alignment    (c) before alignment    (d) after alignment

Fig. 5. Examples of alignment results using the DFT ground segmentation method in conjunction with S-ICP (better seen in colour). (a) and (c): pairs of sub-maps before alignment; (b) and (d): the same sub-maps after alignment. In the two left plots, a pair of sub-maps from the Pipewreck data set. Both the reference sub-map (i.e. the sub-map not being aligned) and the test sub-map (the one being aligned) are shown in each sub-figure, with the segments generated by the DFT-based method shown with different levels of gray. The black segments in the aligned pair indicate the segments matched by S-ICP across the two point clouds and used to compute the alignment. As can be seen in (b), corresponding segments are brought right on top of each other after alignment. This alignment corresponds to ΔNx = −5.4%. Note that if the alignment were to be performed using standard (dense) ICP, the two sub-maps would be brought completely on top of each other due to the ground points dominating the cost function in the ICP optimisation. (c)-(d) show a pair of sub-maps that belongs to the KnidosH data set. These sub-figures are organised as (a) and (b). In (d) it can be seen that the corresponding segments are brought right on top of each other by the alignment. This alignment corresponds to ΔNx = −12.6%.
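For concreteness, the DFT-based ground extraction behind these segmentations (Sec. IV) amounts to a low-pass filter on a gridded elevation map followed by a height comparison. The sketch below is an illustrative reconstruction, not the authors' code: it assumes the point cloud has already been formatted into a regular elevation grid, and takes the cut-off fc as given (in the paper it is selected automatically from the spectrum's peaks):

```python
import numpy as np

def extract_ground(elevation, fc):
    """Estimate the ground surface of a gridded elevation map by
    zeroing all 2-D frequency components above the cut-off fc
    (in cycles per grid) and applying the inverse FFT."""
    spectrum = np.fft.fft2(elevation)
    rows, cols = elevation.shape
    fy = np.fft.fftfreq(rows)[:, None] * rows    # integer frequency indices
    fx = np.fft.fftfreq(cols)[None, :] * cols
    low_pass = np.sqrt(fx ** 2 + fy ** 2) <= fc  # keep slow undulations only
    return np.fft.ifft2(spectrum * low_pass).real

def label_non_ground(elevation, ground):
    """Cells strictly above the estimated ground surface are objects;
    the rest are ground (Sec. IV-C)."""
    return elevation > ground
```

Because the DC component is always kept, the reconstructed surface preserves the mean height of the terrain; only the short-scale relief (objects) is removed before the above/below comparison.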
VII. ACKNOWLEDGEMENTS

This research was supported by the Australian Research Council (ARC) through the Discovery programme, the Australian Government through the SIEF programme, and by the Australian Centre for Field Robotics at the University of Sydney. The authors would like to thank Alastair Quadros, Peter Morton and Vsevolod Vlaskine for valuable help with software, as well as James P. Underwood, Mitch Bryson and Donald Danserau for useful discussions.

REFERENCES

[1] P. J. Besl and H. D. McKay. A method for registration of 3-D shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 14(2):239-256, 1992.
[2] R. Bogdan Rusu and S. Cousins. 3D is here: Point Cloud Library (PCL). In IEEE International Conference on Robotics and Automation (ICRA), Shanghai, China, May 9-13 2011.
[3] D. Borrmann, J. Elseberg, K. Lingemann, A. Nüchter, and J. Hertzberg. Globally consistent 3D mapping with scan matching. Robotics and Autonomous Systems, 56(2):130-142, 2008.
[4] Y. Boykov and G. Funka-Lea. Graph cuts and efficient N-D image segmentation. International Journal of Computer Vision, 70(2):109-131, 2006.
[5] C. J. Brown, S. J. Smith, P. Lawton, and J. T. Anderson. Benthic habitat mapping: A review of progress towards improved understanding of the spatial ecology of the seafloor using acoustic techniques. Estuarine, Coastal and Shelf Science, 2011.
[6] B. Douillard, J. Underwood, N. Kuntz, V. Vlaskine, A. Quadros, P. Morton, and A. Frenkel. On the segmentation of 3D LIDAR point clouds. In IEEE International Conference on Robotics and Automation (ICRA), 2010.
[7] B. Douillard, A. Quadros, P. Morton, J. P. Underwood, M. De Deuge, S. Hugosson, M. Hallström, and T. Bailey. Scan segments matching for pairwise 3D alignment. In IEEE International Conference on Robotics and Automation (ICRA), 2012.
[8] A. Golovinskiy and T. Funkhouser. Min-cut based segmentation of point clouds. In IEEE International Conference on Computer Vision (ICCV) Workshops, pages 39-46, 2009.
[9] B. Horn. Robot Vision. The MIT Press, 1986.
[10] A. Johnson and M. Hebert. Using spin images for efficient object recognition in cluttered 3D scenes. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 25(1):433-449, 1999.
[11] S. Kunis. Nonequispaced FFT - Generalisation and Inversion. PhD thesis, Universität Lübeck, Germany, 2006.
[12] P. Felzenszwalb and D. Huttenlocher. Efficient graph-based image segmentation. International Journal of Computer Vision, 59(2):167-181, 2004.
[13] Y. Petillot, I.T. Ruiz, and D.M. Lane. Underwater vehicle obstacle avoidance and path planning using a multi-beam forward looking sonar. IEEE Journal of Oceanic Engineering, 26(2):240-251, Jan 2001.
[14] J. G. Proakis and D. G. Manolakis. Digital Signal Processing - Principles, Algorithms, and Applications. Prentice Hall International, 3rd edition, 1996.
[15] S. Reed, I.T. Ruiz, C. Capus, and Y. Petillot. The fusion of large scale classified side-scan sonar image mosaics. IEEE Transactions on Image Processing, 15(7):2049-2060, Jan 2006.
[16] C. Roman and H. Singh. A self-consistent bathymetric mapping algorithm. Journal of Field Robotics, 24(1-2):23-50, 2007.
[17] C. Roman, G. Inglis, and J. Rutter. Application of structured light imaging for high resolution mapping of underwater archaeological sites. In IEEE OCEANS, pages 1-9, Sydney, 2010.
[18] J. Salvi, C. Matabosch, D. Fofi, and J. Forest. A review of recent range image registration methods with accuracy evaluation. Image and Vision Computing, 25(5):578-596, 2007.
[19] J. Schoenberg, A. Nathan, and M. Campbell. Segmentation of dense range information in complex urban scenes. In IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2010.
[20] B. Steder, R.B. Rusu, K. Konolige, and W. Burgard. NARF: 3D range image features for object recognition. In IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) Workshops, 2010.
[21] J. Strom, A. Richardson, and E. Olson. Graph-based segmentation of colored 3D laser point clouds. In IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2010.
[22] R. Triebel, J. Shin, and R. Siegwart. Segmentation and unsupervised part-based discovery of repetitive objects. In Robotics: Science and Systems, Zaragoza, Spain, June 2010.
[23] B. Xu. Identifying fabric structures with fast Fourier transform techniques. Textile Research Journal, 66(8):496-506, 1996.
[24] X. Zhu, H. Zhao, Y. Liu, Y. Zhao, and H. Zha. Segmentation and classification of range image from an intelligent vehicle in urban environment. In IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 1457-1462, October 2010.
Formalizing Assistive Teleoperation
Anca D. Dragan and Siddhartha S. Srinivasa
The Robotics Institute, Carnegie Mellon University
{adragan,siddh}@cs.cmu.edu
Abstract—In assistive teleoperation, the robot helps the user accomplish the desired task, making teleoperation easier and more seamless. Rather than simply executing the user's input, which is hindered by the inadequacies of the interface, the robot attempts to predict the user's intent, and assists in accomplishing it. In this work, we are interested in the scientific underpinnings of assistance: we formalize assistance under the general framework of policy blending, show how previous work methods instantiate this formalism, and provide a principled analysis of its main components: prediction of user intent and its arbitration with the user input. We define the prediction problem, with foundations in Inverse Reinforcement Learning, discuss simplifying assumptions that make it tractable, and test these on data from users teleoperating a robotic manipulator under various circumstances. We propose that arbitration should be moderated by the confidence in the prediction. Our user study analyzes the effect of the arbitration type, together with the prediction correctness and the task difficulty, on the performance of assistance and the preferences of users.

I. INTRODUCTION
We address the problem of teleoperating dexterous robotic manipulators to perform everyday manipulation tasks (Fig. 1). In direct teleoperation, the user realizes their intent, for example grasping the bottle in Fig. 1, by controlling the robot via an interface. Direct teleoperation is limited by the inadequacies and noise of the interface, making tasks, especially complex manipulation tasks, often tedious and sometimes impossible to achieve. In assistive teleoperation, the robot attempts to predict the user's intent, and augments their input, thus simplifying the task. Here, the robot faces two challenges when assisting: 1) predicting what the user wants, and 2) deciding how to use this prediction to assist.

We contribute a principled analysis of these two challenges. We introduce policy blending, which formalizes assistance as an arbitration of two policies: the user's input and the robot's prediction of the user's intent. At any instant, given the input, U, and the prediction, P, the robot combines them using a state-dependent arbitration function α ∈ [0, 1] (Fig. 1, middle). Policy blending with accurate prediction has a strong corrective effect on the user input (Fig. 1, bottom). Of course, the burden is on the robot to predict accurately and arbitrate appropriately.

Prediction. Prior work in assistive teleoperation usually assumes that the robot knows the user's intent [1]-[9]. Other work assumes that the user is following one of a set of predefined paths or behaviors, and trains a classifier for prediction [10]-[14]. In many real-world scenarios, however, environments and goals change significantly, restricting the utility of fixed paths. For example, in the situation from Fig. 1, the user must adapt to various locations of the goal object and its surrounding clutter.

Fig. 1. (Top) The user provides an input U. The robot predicts their intent, and assists them in achieving the task. (Middle) Policy blending arbitrates user input and robot prediction of user intent, producing the robot action (1 − α)U + αP. (Bottom) Policy blending increases the range of feasible user inputs (here, α = 0.5).

In our work, we stray away from predefined paths, and instead formulate the prediction problem based on Inverse Reinforcement Learning [15]-[17]. We introduce several reductions that make it tractable, discuss how these reductions perform on real teleoperation data, and point out directions for improving prediction performance.

In this particular situation the user might be able to specify the intended goal (including the exact grasp) using other interfaces, like a GUI [18]. However, prediction via motion is often natural, faster, and seamless, enabling a user to, for example, easily change their mind and switch to another grasp or object, and can complement other interfaces.

Arbitration. Despite the diversity of methods proposed for assistance, from the robot completing the grasp when close to the goal [9], to virtual fixtures for following paths [14], to potential fields towards the goal [8], all methods can be seen as arbitrating user input and robot prediction. This common lens for assistance enables us to analyze the factors that affect its performance, and recommend design decisions for arbitration. Prior work (detailed in Sec. II) compared more manual vs. more autonomous assistance modes [4]-[6] with surprisingly conflicting results in terms of what users prefer. Rather than
TABLE I: PRIOR WORK.

  Method           Prediction                           Arbitration
  [1]-[6], [18]    no                                   (shown graphically)
  [10], [11]       predefined paths/behaviors           (shown graphically)
  [4], [6]-[8]     no                                   (shown graphically)
  [12]             predefined paths/behaviors           (shown graphically)
  [9], [20]-[22]   no                                   (shown graphically)
  [13]             predefined paths/behaviors           (shown graphically)
  [12], [14]       predefined paths/behaviors           (shown graphically)
  [23]             fixed environment, goals (2D)        no
  [24]             fully flexible (goal+policy) (2D)    no

using autonomy as a factor, we introduce aggressiveness: arbitration should be moderated by the robot's confidence in the prediction, leading to a spectrum from very timid to very aggressive assistance, from small augmentation of user input even when confident to large augmentation even when unsure. Rather than analyzing the effect of aggressiveness (or autonomy) alone on the performance of assistance, we conduct a user study that analyzes how aggressiveness interacts with new factors, like prediction correctness and task difficulty, in order to help explain the seemingly contradictory findings from above.

Our formalism and analysis build on machine learning, control theory, and human-robot interaction to provide a new understanding of assistive teleoperation. The challenges we identify, particularly in predicting and expressing intent, do not only form the basis of assistive teleoperation, but lie at the foundation of human-robot collaboration in general.

II. PRIOR WORK

In 1963, Goertz [19] proposed manipulators for handling radioactive material that are able to turn cranks based on imprecise operator inputs, introducing one of the first instances of assistive teleoperation. Since then, research on this topic has proposed a great variety of methods for assistance, ranging from the robot having full control over all or some aspect of the motion [1]-[6], [10], [11], to taking control (or releasing it) at some trigger [9], [13], [20], to never fully taking control [4], [6]-[8], [14]. For example, Debus et al. [3] propose that the robot should be in full control of the orientation of a cylinder while the user is inserting it into a socket. In [9], the robot takes over to complete the grasp when close enough to the target. Crandal et al. [7] propose to mix the user input with a potential field in order to avoid obstacles.

Attempts to compare different modes of assistance are sometimes contradictory. For example, You and Hauser [4] found that for a complex motion planning problem in a simulated environment, users preferred a fully autonomous mode, where they only clicked on the desired goal, to more reactive modes of assistance. On the other hand, Kim et al. [5] found that users preferred a manual mode and not the autonomous one for manipulation tasks like object grasping.

Policy blending provides a unifying view of assistance, leading to an analysis which helps reconcile these differences. Table I shows how various proposed methods arbitrate user input and robot prediction (or simply robot policy, in cases where intent is assumed to be known). For example, potential field methods (e.g. [7], [8], [12]) that help the user avoid obstacles become blends of the user input with a policy obtained from the repulsive force field, under a constant arbitration function that establishes a trade-off. Virtual fixture-based methods (e.g. [6], [12]-[14]) that are commonly used to guide the user along a predefined path become blends of the user input with a policy that projects this input onto the path. The arbitration function dictates the intensity of the fixture at every step, corresponding to a normalized "stiffness/compliance" gain. However, the same framework also allows for the less studied case in which the robot is able to generate a full policy for completing the task on its own, rather than an attractive/repulsive force or a constraint (e.g. [9], [20]). In this case, the arbitration is usually a switch from autonomous to manual, although stages that trade off between the two (not fully taking control but still correcting the user's input) are also possible [25]. Arbitration as a linear blend has also been proposed for unmanned ground vehicles [25], and outside the teleoperation domain for mediating between two human input channels [26].

Analyzing assistance based on how arbitration is done, together with new factors like prediction correctness and task difficulty, helps explain previously contradictory findings: our results show that aggressive assistance is preferable on hard tasks, like the ones from [4], where autonomy is significantly more efficient; opinions are split on easier tasks, like the ones from [5], where the autonomous and manual modes were comparable in terms of time to completion.

The same table shows how prior methods handle prediction of the user's intent. Aside from work that classifies which one of a predefined set of paths or behaviors the user is currently engaging [10], [11], most work assumes the robot has access to the user's intent, e.g. that it knows what object to grasp and how (except in [22], which deals with time delays in ball catching by projecting the input forward in time using a minimum-jerk model). Predicting or recognizing intent has received a lot of attention outside of the teleoperation domain, dating back to high-level plan recognition [27]. Predicting intended motion, however, is usually again limited to classifying behaviors, or is done in low-dimensional spaces [23], [24]. In the following section, which presents the building blocks of assistance, we present the general prediction problem, along with simplifying assumptions that make it tractable.
III. T HE C OMPONENTS OF A SSISTANCE S
In what follows, Q denotes the robot’s current configuration,
U denotes the desired next configuration inputed by the user
(e.g. through a GUI or a whole-body teleoperation interface, S
as in Fig.1, or by sending a velocity command to the robot1 ), U U
and P denotes the configuration the robot predicts it should
be at next. We denote the user’s starting input as S, and the G1 G1
trajectory of user inputs until U as ξS→U .
Each new scene has a (possibly continuous) set of accessible G2 G2
goals G, known only at runtime to both robot and user. The
robot does not know which goal the user is trying to reach. Fig. 2. Even though G1 is the closest goal to U in both cases, ξS→U
indicates that G2 is more likely in the situation on the right.
A. Prediction

The robot must predict where the user would like it to move next, given ξS→U, and any other cues, e.g. each goal's reachability, or a high-level description of the overall task. We break this problem down into two successive steps:

1) Goal Prediction, where we predict the most likely goal G∗ given available data.
2) Trajectory Prediction, where we predict how the user would want to move towards a predicted goal.

Goal Prediction. We formulate goal prediction as:

    G∗ = arg max_{G∈G} P(G | ξS→U, θ)    (1)

i.e. given ξS→U and any other available cues θ, the robot predicts the goal G∗ that maximizes posterior probability.

Several simplifying assumptions help us to solve this problem. The strongest is amnesic prediction, which ignores all information except the current input U: G∗ = arg max_{G∈G} P(G|U). There are many ways to estimate P(G|U). For example, given a distance metric on goals d, we can assume that closer goals have higher probability:

    G∗ = arg min_{G∈G} d(U, G)    (2)

Under the Euclidean metric, d(U, G) = ||U − G||, the method predicts the goal closest in the robot's configuration space. Under d(U, G) = ||φ(U) − φ(G)|| (with φ denoting the forward kinematics function), the method predicts the goal closest in the robot's workspace.

Although intuitive, amnesic prediction does suffer from its amnesia. Where the user came from is often a critical cue for where they want to go. Prediction can also be memory-based, taking into account the trajectory ξS→U of user inputs (Fig.2):²

    G∗ = arg max_{G∈G} P(G | ξS→U) = arg max_{G∈G} P(ξS→U | G) P(G)    (3)

In order to compute P(ξS→U|G), we need a model of how users teleoperate the robot to get to a goal. A possible assumption is that the user's input noisily optimizes a goal-dependent cost function CG (one that depends, as an example, on the dot product between the user's velocity and the direction to the goal). Using the principle of maximum entropy, we can use CG to induce a probability distribution over trajectories ξ ∈ Ξ given a goal as P(ξ|G) ∝ exp(−CG(ξ)), i.e. the probability of a trajectory decreases exponentially as its cost increases [16]. Given this distribution, and if the cost is additive along the trajectory,

    P(ξS→U | G) = [exp(−CG(ξS→U)) ∫_{ξU→G} exp(−CG(ξU→G))] / [∫_{ξS→G} exp(−CG(ξS→G))]    (4)

In low-dimensional spaces, (4) can be evaluated exactly through soft-maximum value iteration [24]. In high-dimensional spaces, where this is expensive, an alternative is to approximate the integral over trajectories using Laplace's method. First, we approximate C(ξX→Y) by its second-order Taylor series expansion around ξ∗X→Y = arg min_{ξX→Y} C(ξX→Y):

    C(ξX→Y) ≈ C(ξ∗X→Y) + ∇C(ξ∗X→Y)ᵀ (ξX→Y − ξ∗X→Y) + (1/2) (ξX→Y − ξ∗X→Y)ᵀ ∇²C(ξ∗X→Y) (ξX→Y − ξ∗X→Y)    (5)

Since ∇C(ξ∗X→Y) = 0 at the optimum, we get

    ∫_{ξX→Y} exp(−C(ξX→Y)) ≈ exp(−C(ξ∗X→Y)) ∫_{ξX→Y} exp(−(1/2) (ξX→Y − ξ∗X→Y)ᵀ HX→Y (ξX→Y − ξ∗X→Y))    (6)

with HX→Y the Hessian of the cost function around ξ∗X→Y. Evaluating the Gaussian integral leads to

    ∫_{ξX→Y} exp(−C(ξX→Y)) ≈ exp(−C(ξ∗X→Y)) √((2π)^k / |HX→Y|)    (7)

and the optimal prediction G∗ becomes

    G∗ = arg max_G [exp(−CG(ξS→U) − CG(ξ∗U→G)) √|HS→G|] / [exp(−CG(ξ∗S→G)) √|HU→G|] P(G)    (8)

If the cost function is quadratic, the Hessian is constant and (8) simplifies to

    G∗ = arg max_{G∈G} [exp(−CG(ξS→U) − CG(ξ∗U→G)) / exp(−CG(ξ∗S→G))] P(G)    (9)

¹ In the case of velocity inputs, the robot applies its motion model to obtain the configuration U.
² Although considering current velocity in amnesic prediction can help with this, this information is still local. As in Fig.3, incorporating global knowledge from the trajectory can be beneficial.
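To make the two predictors concrete, here is a minimal numerical sketch (our own illustration, not the paper's implementation): an amnesic predictor per (2) and a memory-based predictor per (9), using trajectory length as the cost CG (one of the simple cost choices discussed in the text), so that the optimal costs reduce to straight-line distances and the Hessian ratio in (8) drops out. All function names and the uniform prior are our assumptions.

```python
import numpy as np

def amnesic_predict(U, goals):
    """Eq. (2): predict the goal closest to the current input U."""
    U = np.asarray(U, dtype=float)
    return min(goals, key=lambda G: np.linalg.norm(U - np.asarray(G, dtype=float)))

def memory_predict(xi_S_to_U, goals):
    """Eq. (9) with a trajectory-length cost C_G: score each goal G by
    exp(-C(xi_{S->U}) - C(xi*_{U->G})) / exp(-C(xi*_{S->G})), where the
    optimal costs C(xi*) are straight-line distances. The Hessian ratio
    from (8) is dropped (constant-Hessian case) and the prior P(G) is
    taken to be uniform. Returns the index of the most likely goal."""
    xi = np.asarray(xi_S_to_U, dtype=float)
    S, U = xi[0], xi[-1]
    cost_so_far = float(np.sum(np.linalg.norm(np.diff(xi, axis=0), axis=1)))
    scores = []
    for G in goals:
        G = np.asarray(G, dtype=float)
        cost_to_go = float(np.linalg.norm(G - U))    # C(xi*_{U->G})
        cost_optimal = float(np.linalg.norm(G - S))  # C(xi*_{S->G})
        scores.append(np.exp(-(cost_so_far + cost_to_go) + cost_optimal))
    return int(np.argmax(scores))
```

On a straight trajectory that passes near an off-path goal (the situation of Fig.2), the amnesic rule picks the nearby goal, while the memory-based score favors the goal for which the motion so far has been near-optimal.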
Fig. 3. An example user trajectory leading to two possible goals. A simple prediction implementing (9) with a cost function based on length compares the two situations on the right and determines that the user's trajectory is a smaller deviation from the optimal trajectory to G2 than to G1.

This prediction method implements an intuitive principle: if the user appears to be taking (even in the optimistic case) a trajectory that is a lot costlier than the optimal one to that goal, the goal is likely not the intended one.

In low-dimensional spaces, it is possible to learn CG from user teleoperation data via Maximum Entropy Inverse Reinforcement Learning [16] (or via algorithms that assume no user noise, e.g. [15], [17]). In larger domains such as manipulation, simple guesses for CG can still be useful for prediction. This is illustrated in Fig.3, which shows a toy problem with two possible goals and a user trajectory ξS→U. Even with CG as the sum of squared velocity magnitudes, this method still holds an advantage over the amnesic version. When CG is the same for all goals, the cost of ξS→U is common across goals and does not affect the probability, leaving only the starting point S and the current input U as the crucial components of the trajectory so far. The comparison between G1 and G2 leads to the correct result, because the path to G2 through U is much shorter relative to the optimal, ξ∗S→G2, than the path to G1 through U is relative to ξ∗S→G1. Whether this very simplified memory-based prediction is still helpful in an analogous problem in the real world is one of the questions we answer in our user study.

A more sophisticated prediction method learns to choose G∗ by also considering θ, and using the goal probabilities as features. From labeled data of the form (F[ξS→U, θ, G] → 0/1) — features computed relative to G, paired with whether G was the desired goal — a range of methods will learn to predict a goal "score" given feature values for a new situation. See [28] for goal prediction based on such feature constructs.

Trajectory Prediction. Once the robot predicts a goal G∗, it must also compute how the user wants it to move towards G from the current state Q. It can do so by computing a policy or a trajectory based on a cost function. Note that the cost function the robot must optimize in this stage is not necessarily the same as the cost function the human is optimizing during teleoperation: the idea behind assistive teleoperation is that the robot executes what the user actually wants rather than what the user commands. Approximations for this function also range from very simple (e.g. the length of the trajectory, or a trade-off between the length and the distance from obstacles) to very complex (e.g. functions learned via Inverse Optimal Control from users physically moving the robot on a trajectory they would want the robot to take).

B. Arbitration

Given U and P, the robot must decide on what to do next. The arbitration function α, which makes this decision, can depend on a number of inputs, such as the distance to the goal or to the closest object, or even a binary switch operated by the user. We propose a simple principle: that arbitration must be moderated by how good the prediction is.

Timid vs. Aggressive. In trading off between not over-assisting (providing unwanted assistance) and not under-assisting (failing to provide needed assistance), the arbitration lies on a spectrum: on the one hand, the assistance could be very timid, with α taking small values even when the robot is confident in its prediction. On the other hand, it could be very aggressive: α could take large values even when the robot does not trust the predicted policy.

Inescapable Local Minima Do Not Occur. In general, when arbitrating between two policies, we need to guarantee that inescapable local minima do not occur. In our case, these are states at which the arbitration results in the same state as at the previous time step, regardless of the user input.

Theorem 1: Let Q be the current robot configuration. Denote the prediction velocity as p = P − Q, and the user input velocity as u = U − Q. Arbitration never leads to inescapable local minima, unless ∀u ≠ 0, p = −ku for some k ≥ 0 and α = 1/(k+1) (i.e. the policy is always chosen to directly oppose the user's input, and the arbitration is computed adversarially), or p = 0 and α = 1 for all user inputs.

Proof: Assume that at time t, a local minimum occurs in the arbitration, i.e. (1 − α)(Q + u) + α(Q + p) = Q. Further assume that this minimum is inescapable, i.e. (1 − α′)(Q + u′) + α′(Q + p′) = Q, ∀u′, where p′ and α′ are the corresponding prediction and arbitration if u′ is the next user input. ⇔ (1 − α′)u′ + α′p′ = 0, ∀u′.

Case 1: ∀u′ ≠ 0, the corresponding α′ ≠ 0 ⇒ p′ = −((1 − α′)/α′) u′, ∀u′ ≠ 0 ⇒ p′ = −ku′ and α′ = 1/(k+1), with k ≥ 0 (since α′ ∈ [0, 1]), ∀u′ ≠ 0. Contradiction with the problem statement.

Case 2: ∃u′ ≠ 0 s.t. the corresponding α′ = 0 ⇒ (1 − 0)u′ + 0 · p′ = 0 ⇒ u′ = 0. Contradiction with u′ ≠ 0.

⇒ ∃u′ s.t. (1 − α′)(Q + u′) + α′(Q + p′) ≠ Q. Therefore, with an adversarial exception, the user can always take a next action that escapes a local minimum.

Evaluating Confidence. Earlier, we had proposed that the arbitration should take into account how good the prediction is, i.e. a measure of the confidence in the prediction, c, that correlates to prediction correctness. One way to evaluate c is to assume that the closer the predicted goal gets, the more likely it becomes that it is the correct goal: c = max(0, 1 − d/D), with d the distance to the goal and D some threshold past which the confidence is 0. Alternately, confidence can be defined as the probability assigned to the prediction. If a cost function
is assumed, the match between the user's input and this cost should also factor in. If a classifier is used for prediction, then such a probability is obtained through calibration [29].

IV. A STUDY ON ASSISTANCE

Mathematically, arbitration can be any non-adversarial function of the robot's confidence in its prediction, from very timid to very aggressive. But assistive teleoperation is fundamentally a human-robot interaction task, and this interaction imposes additional requirements on arbitration: the robot must arbitrate in an efficient and user-preferred way. Therefore, we embarked upon a user study that analyzes the effect of the aggressiveness of arbitration on the performance of assistance – an analysis that we believe must incorporate other factors, like prediction correctness (users might not appreciate assistance if the robot is wrong) and task difficulty (users might appreciate assistance if the task is very hard for them). Although this analysis is our primary goal, we will also test the performance of the simplifying assumptions from Sec. III-A on real data of users teleoperating the robot through our whole-body interface.

We tasked 8 users with teleoperating the robot to grasp an object from a table, as in Fig.1. There were always two graspable objects, and we gave the user, for every trial, the farther of the two as goal (an analogous situation to the one from Fig.3). We implemented a whole-body interface that tracks their skeleton (OpenNI, www.openni.org), yielding an arm configuration which serves as the user input U. The robot makes a prediction of the goal and the policy to it (that minimizes length in configuration-space), leading to P, and combines the two via the arbitration function α.

A. Goal 1: Factors that affect assistance

Hypotheses. We test the following two hypotheses:

1) Main effects: Prediction correctness, task difficulty, and aggressiveness of assistance each has a significant effect on task performance.
2) Interaction effects: Aggressive assistance performs better on hard tasks if the robot is right, while timid assistance performs better on easy tasks if the robot is wrong.

Manipulated Variables. We manipulated prediction correctness by using a simple, easy to manipulate goal prediction method: the amnesic prediction based on workspace distance, which always selects the closest object. We set up wrong conditions at the limit of the robot being wrong yet rectifiable: we placed the intended object farther, guaranteeing wrong prediction until the user makes his preference clear by providing an input U closer to the correct goal. We set up right conditions by explicitly informing the robot of the user's intended goal.

We manipulated task difficulty by changing the location of the two objects and placing the target object in an easily reachable location (e.g. grasping the bottle in Fig.4(b) makes an easy task) vs. a location at the limit of the interface's reachability (e.g. grasping the box in Fig.4(b) is a hard task). This leads to four types of tasks: Easy&Right, Easy&Wrong, Hard&Right and Hard&Wrong.

Finally, we manipulated the aggressiveness of the assistance by changing the arbitration function, and used the distance-based measure of confidence from Sec. III-B. As the user makes progress towards the predicted object, the confidence increases. We had two assistance modes, shown in Fig.4(c): the timid mode increases the assistance with the confidence, but plateaus at a maximum value, never fully taking charge. On the other hand, the aggressive mode eagerly takes charge as soon as the confidence exceeds a threshold.

Subject Allocation. We chose a within-subjects design, enabling us to ask users to compare the timid and aggressive modes on each task. Each of our 8 participants (all students, 4 males and 4 females) executed both modes on each of the four types of tasks. To avoid ordering effects, we used a balanced Latin square for the task order, and balanced the order of the modes within each task.

Dependent Measures. We measure the performance of assistance in two ways: the amount of time each user took to complete the task under each condition, and each user's preference for the timid vs. the aggressive mode on each task type (on a 7-point Likert scale where the two ends are the two choices). We expect the two measures to be correlated: if an assistance mode is faster on a task, then the users will also prefer it for that task. We also asked the users additional questions for each condition, about how helpful the robot was, how much its motion matched the intended motion, and how highly they would rate the robot as a teammate.

Covariates. We identified the following confounds: the users' initial teleoperation skill, their rating of the robot without assistance, and the learning effect. To control for these, users went through a training phase, teleoperating the robot without assistance. This partially eliminated the learning effect and gave us a baseline for their timing and ratings. We used these as covariates, together with the number of tasks completed at any point — a measure of prior practice.

B. Goal 2: Prediction based on real teleoperation data

Hypothesis. We test the following hypothesis:

3) On tasks in which the target object is not the closest one to the original human input configuration, replicating the situation from Fig.3, the memory-based prediction will identify the correct goal faster, yielding a higher success rate despite the simplifying assumptions it makes.

Manipulated Variables. We used the amnesic prediction during the study for its transparency, which enabled us to manipulate prediction correctness. We compared amnesic vs. memory-based prediction on the same data of the users teleoperating the robot under the different conditions, in a post-experimental stage. For memory-based prediction, we use workspace sum squared velocity as the cost C, leading to the simplification from (9).

Dependent Measures. We took each user trajectory and applied each of the two prediction methods at every point along it. We measured the percent of time (success rate) across the trajectory the prediction identified the correct target object.
(a) Hard&Right (b) Hard&Wrong (c) Timid vs. Aggressive (d) Time and Preference Comparison

Fig. 4. From left to right: hard tasks, with the goal at the limit of the reachable area, for both right and wrong prediction, the arbitration functions, and the
results of the study.
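The distance-based confidence measure from Sec. III-B and the two arbitration modes used in the study can be sketched as follows. This is an illustration of ours: the plateau and threshold values are arbitrary choices, not the values used in the experiment.

```python
import numpy as np

def confidence(dist_to_goal, D=1.0):
    """Distance-based confidence from Sec. III-B: c = max(0, 1 - d/D)."""
    return max(0.0, 1.0 - dist_to_goal / D)

def alpha_timid(c, plateau=0.6):
    """Timid mode: assistance grows with confidence but plateaus below 1,
    never fully taking charge (plateau value is illustrative)."""
    return min(plateau, c)

def alpha_aggressive(c, threshold=0.4):
    """Aggressive mode: assistance takes full charge once confidence
    exceeds a threshold (threshold value is illustrative)."""
    return 1.0 if c > threshold else 0.5 * c

def blend(U, P, alpha):
    """Policy blending: execute (1 - alpha) * U + alpha * P, arbitrating
    between the user's input U and the robot's prediction P."""
    U, P = np.asarray(U, dtype=float), np.asarray(P, dtype=float)
    return (1.0 - alpha) * U + alpha * P
```

With these shapes, the timid mode always leaves the user a substantial share of control, while the aggressive mode hands control to the predicted policy as soon as it crosses the confidence threshold.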

V. ANALYSIS AND DISCUSSION

Our first goal with this study was to identify the effect of different factors on the performance of assistance, and we do so in the following sections. Our secondary goal was to analyze two simplistic prediction methods (an amnesic and a memory-based one) on teleoperation data under different assistance modes. We discuss our findings in Sec. V-B.

A. Arbitration

Teleoperation Timing. The average time per task was approximately 28s. We performed a factorial repeated-measures ANOVA with Bonferroni corrections for multiple comparisons and a significance threshold of p = 0.05, which resulted in a good fit of the data (R² = 0.66). In line with our first hypothesis, we found main effects for all three factors: hard tasks took 22.9s longer than easy ones (F(1, 53) = 18.45, p < .001), tasks where the policy was wrong took 30.1s longer than when right (F(1, 53) = 31.88, p < .001), and the aggressive mode took overall 19.4s longer than the timid (F(1, 53) = 13.2, p = .001). We found a significant interaction effect between aggressiveness and correctness, showing that when wrong, being timid is significantly better than being aggressive. This is confirmed in Fig.4(d), which compares the means and standard errors on each task: the timid mode is better on both Easy&Wrong and Hard&Wrong. The timid mode performed about the same on Easy&Right, and, as expected, worse on Hard&Right (the time taken for aggressive is smaller than for timid for every user). Surprisingly, the interaction effect among all factors was only marginally significant (F(1, 53) = 2.63, p = .11). We believe that increasing our user pool would strengthen this effect.

To conclude based on this regression that the timid mode is overall better would be misleading, because it would assume that the robot is wrong in 50% of the tasks (in general, either by predicting the wrong goal, or by computing a motion that, for example, collides with an unseen obstacle). Our data indicates that the aggressive mode is overall more efficient if the robot is wrong in less than 16% of the cases. However, efficiency is only part of the story: as the next section points out, some users are more negatively affected than others by a wrong robot policy.

User Preferences. Fig.4(d) also shows the users' preferences on each task, which indeed correlated to the timing results (Pearson's r(30) = .66, p < .001). The outliers were users with stronger preferences than the time difference would indicate. For example, some users strongly preferred the timid mode on Hard&Wrong tasks, despite the time difference not being as high as with other users. The opposite happened on Hard&Right tasks, on which some users strongly preferred the aggressive mode despite a small time difference, commenting that they appreciated the precision of the autonomy. On Easy&Right tasks, the opinions were split and some users preferred the timid mode despite a slightly longer time, motivating that they felt more in control of the robot. Despite the other measures (helpfulness, ranking as a teammate, etc.) strongly correlating to the preference rating (r(30) > .85, p < .001), they provided similar interesting nuances. For example, the users that preferred the timid mode on Easy&Right tasks because they liked having control of the robot were willing to admit that the aggressive mode was more helpful. On the other hand, we also encountered users that preferred the aggressive mode, and even users that followed the robot's motion while aggressive, not realizing that they were not in control and finding the motion of the robot to match their own very well (i.e. the predicted policy P matched what they intended, resulting in seamless teleoperation).

Overall, although the difference in timing is a good indicator of the preference, it does not capture a user's experience in its entirety. First, some users exaggerate the difference in preferences. Second, some users prefer the timid mode despite it being slightly less efficient. Third, assistance shouldn't just be quick – it should also be intent-transparent. Our users commented that "Assistance is good if you can tell that [the robot] is doing the right thing".

Detecting "Aggressive and Wrong". Being aggressive and wrong results in large penalties in time and user preference. Fortunately, it is also a state that can easily be identified and remedied by the user. Because prediction affects the user's behavior, when the robot eagerly starts heading towards the wrong target, the user rapidly attempts to get back control by moving against the robot's policy. This, in turn, decreases the predictor's confidence, causing the robot to follow the user. This state can be detected early (Fig.6) by comparing the
(a) Hard, Right, Aggressive (b) Hard, Wrong, Aggressive

Fig. 6. A comparison of the user input trajectories for right vs. wrong. The graphs show the dot product between the robot's policy and the robot's actual velocity.

Fig. 7. The user trajectory colored according to the prediction for this artificially generated environment, the trajectory optimizing configuration space length in gray spheres, and the one optimizing workspace length in darker spheres.

user's trajectory in the right and wrong case, along with the dot product between the robot's policy and its actual velocity.
B. Prediction

Results from the Study. A factorial ANOVA on the four manipulated factors obtained a clean fit of the data, with R² = 0.9. Confirming our hypothesis, the factor corresponding to the prediction was significant (F(1, 112) = 1020.95, p < 0.001): memory-based prediction was significantly better at identifying the correct goal for a longer amount of time along the trajectory. The assistance being timid also made it significantly easier for the prediction method to output the correct answer (F(1, 112) = 7.62, p < 0.01): users are not as precise about the motion in the aggressive mode, making prediction more difficult. Fig.5(a) shows the means and standard errors for the memory-based prediction vs. the amnesic one, for each assistance type. Fig.5(b) and 5(c) compare the two methods on one of the trajectories, with arrows indicating when the prediction starts being correct. While the amnesic variant only switches to the correct prediction at the very end, when the user input gets closer to the correct goal, the memory-based prediction is able to identify the goal much earlier.

In our study, the target object was always the farther one (to manipulate prediction correctness, i.e. create situations in which the robot is wrong until the user's input gets closer to the correct goal). However, this is disadvantageous to the amnesic method: if the target object were the closer one, this method would usually be 100% accurate. In fact, the amnesic success should be on average almost 0.5 higher. Nevertheless, even with this boost, memory-based remains the out-performer. Furthermore, prediction becomes more challenging as the number of goals increases, changing the results above.

A Thought Experiment. Despite the good performance of the memory-based prediction in the environment configurations from our experiment, we were also interested in exploring its limitations. We conducted a thought experiment by taking real environments and their trajectories from the users, and varying the location of the non-target object. This exposed a few failure modes of the workspace length-based prediction.

First, if the goals are collinear with the start, and the user trajectory deviates from this line, the method is biased towards the farther goal. We call this the "null space effect" of using the same C for each goal: given two goals G1 and G2 s.t. the optimal trajectory to G2 passes through G1, the two goals get equal probabilities until G1 along the optimal path:

    P(G1 | ξS→U) = (1/Z) exp(−C(ξS→U) − C(ξ∗U→G1)) / exp(−C(ξ∗S→G1)) = 1/Z

    P(G2 | ξS→U) = (1/Z) exp(−C(ξS→U) − C(ξ∗U→G2)) / exp(−C(ξ∗S→G2)) = 1/Z

When a small deviation arises, it decreases P(G1 | ...) more, and the method predicts G2. Even humans are often confused in such situations, for example when interacting in a crowded room: we move towards someone at the back of the room in order to speak with them, but are often intercepted along the way by others who predict that social motion incorrectly.

Second, given the trajectories in our data set, the method is in some cases incorrectly biased towards the rightmost object. An example is shown in Fig.7: the trajectory, collected in the environment from Fig.5, heads towards the artificially added obstacle and confuses the predictor. This example should be taken with a grain of salt, because the user would adapt the trajectory to the new environment. However, the more fundamental problem is the incorrect model of human behavior, as shown by Fig.7.

While the first issue could be addressed with priors that prefer closer goals in cases where the goals are "aligned", the second issue calls for learning a better model of how users teleoperate. From our experience, their cost function does not solely depend on lengths or increasing distance from obstacles: different users develop different strategies for achieving tasks. We saw some users developing a "move-one-joint-at-a-time" strategy, a way of dividing the problem. We saw differences in trajectories stemming from obstacle avoidance. These are challenges we need to address in order to improve both goal prediction, as well as prediction of the intended motion. Furthermore, we did not ask the users to move in a way that makes their intent clear to the robot. An interesting future experiment would consist of analyzing how prediction improves when users are trained to provide more intent-expressive inputs.
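The "null space effect" above can be checked numerically. A small sketch of ours, using C equal to path length, straight-line optima, and leaving the normalizer Z out: with S, G1 and G2 collinear, both goals receive identical scores while the input stays on the shared optimal path, and a small lateral deviation penalizes G1 more than G2.

```python
import numpy as np

def goal_score(S, U, G, cost_so_far):
    """Unnormalized P(G | xi_{S->U}) per the expressions above, with
    C = trajectory length: exp(-C(xi_{S->U}) - C(xi*_{U->G})) / exp(-C(xi*_{S->G}))."""
    S, U, G = (np.asarray(x, dtype=float) for x in (S, U, G))
    return float(np.exp(-(cost_so_far + np.linalg.norm(G - U)) + np.linalg.norm(G - S)))

S, G1, G2 = (0.0, 0.0), (2.0, 0.0), (4.0, 0.0)   # collinear goals

# On the shared optimal path (U = (1,0), path length 1): equal scores.
on_line = goal_score(S, (1.0, 0.0), G1, 1.0), goal_score(S, (1.0, 0.0), G2, 1.0)

# A small lateral deviation decreases P(G1 | ...) more than P(G2 | ...),
# biasing the prediction towards the farther goal G2.
U = (1.0, 0.3)
c = float(np.linalg.norm(np.asarray(U) - np.asarray(S)))  # straight path to the deviated U
off_line = goal_score(S, U, G1, c), goal_score(S, U, G2, c)
```

This reproduces the effect described in the text: the deviation hurts the near goal's score more, so the method flips to the farther goal.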
(a) Percent of Time Prediction is Successful (b) Amnesic Prediction (c) Memory-Based Prediction

Fig. 5. A comparison of amnesic vs. memory-based prediction on data from our study.
VI. CONCLUSION

In this work, we presented a principled analysis of assistive teleoperation. We formalized assistance as policy blending, unifying prior work and providing common ground for future methods and comparisons of assistive teleoperation. We introduced aggressiveness, prediction correctness and task difficulty as factors that affect assistance performance and user preferences, and analyzed their interaction in a user study: arbitration must take into account the robot's confidence in itself (i.e. in the correctness of the predicted policy) and in the user (i.e. in how easy the task is for the user). The challenges we identified lie at the base of human-robot collaboration in general. The robot's motion must make its intent clear to the user, reassuring him of the correct prediction. The user's input must also make its intent clear to the robot, simplifying the prediction task. Furthermore, unlike for typical intent prediction tasks, the robot's prediction directly affects the user's behavior. This gives the learner the opportunity to improve its predictions by explicitly incorporating the user's reaction, and even by taking purposeful information-gathering actions to disambiguate among its hypotheses.

ACKNOWLEDGMENTS

This material is based upon work supported by NSF-IIS-0916557, NSF-EEC-0540865, ONR-YIP 2012, DARPA-BAA-10-28, and the Intel Embedded Computing ISTC. We thank G. Gordon, J. Forlizzi, C. Rose, A. Collet, C. Dellin, M. Cakmak, M.K. Lee, M. Vazquez, and the members of the Personal Robotics Lab for very helpful discussion and advice.

REFERENCES

[1] L. Rosenberg, "Virtual fixtures: Perceptual tools for telerobotic manipulation," in Virtual Reality Annual International Symposium, 1993.
[2] P. Marayong, M. Li, A. Okamura, and G. Hager, "Spatial motion constraints: theory and demonstrations for robot guidance using virtual fixtures," in ICRA, 2003.
[3] T. Debus, J. Stoll, R. Howe, and P. Dupont, "Cooperative human and machine perception in teleoperated assembly," in ISER, 2000.
[4] E. You and K. Hauser, "Assisted teleoperation strategies for aggressively controlling a robot arm with 2d input," in R:SS, 2011.
[5] D.-J. Kim, R. Hazlett-Knudsen, H. Culver-Godfrey, G. Rucks, T. Cunningham, D. Portee, J. Bricout, Z. Wang, and A. Behal, "How autonomy impacts performance and satisfaction: Results from a study with spinal cord injured subjects using an assistive robot," IEEE Trans. on Systems, Man and Cybernetics, Part A: Systems and Humans, 2011.
[6] P. Marayong, A. M. Okamura, and A. Bettini, "Effect of virtual fixture compliance on human-machine cooperative manipulation," in IROS, 2002.
[7] J. Crandall and M. Goodrich, "Characterizing efficiency of human robot interaction: a case study of shared-control teleoperation," in IROS, 2002.
[8] P. Aigner and B. McCarragher, "Human integration into robot control utilising potential fields," in ICRA, 1997.
[9] J. Kofman, X. Wu, T. Luu, and S. Verma, "Teleoperation of a robot manipulator using a vision-based human-robot interface," IEEE Trans. on Industrial Electronics, 2005.
[10] Y. Demiris and G. Hayes, "Imitation as a dual-route process featuring predictive and learning components: a biologically plausible computational model," in Imitation in animals and artifacts, 2002.
[11] A. H. Fagg, M. Rosenstein, R. Platt, and R. A. Grupen, "Extracting user intent in mixed initiative teleoperator control," in AIAA, 2004.
[12] W. Yu, R. Alqasemi, R. Dubey, and N. Pernalete, "Telemanipulation assistance based on motion intention recognition," in ICRA, 2005.
[13] M. Li and A. Okamura, "Recognition of operator motions for real-time assistance using virtual fixtures," in HAPTICS, 2003.
[14] D. Aarno, S. Ekvall, and D. Kragic, "Adaptive virtual fixtures for machine-assisted teleoperation tasks," in IEEE ICRA, 2005.
[15] P. Abbeel and A. Y. Ng, "Apprenticeship learning via inverse reinforcement learning," in ICML, 2004.
[16] B. D. Ziebart, A. Maas, J. A. Bagnell, and A. Dey, "Maximum entropy inverse reinforcement learning," in AAAI, 2008.
[17] N. Ratliff, J. A. Bagnell, and M. Zinkevich, "Maximum margin planning," in ICML, 2006.
[18] A. E. Leeper, K. Hsiao, M. Ciocarlie, L. Takayama, and D. Gossow, "Strategies for human-in-the-loop robotic grasping," in HRI, 2012.
[19] R. Goertz, "Manipulators used for handling radioactive materials," Human factors in technology, 1963.
[20] J. Shen, J. Ibanez-Guzman, T. C. Ng, and B. S. Chew, "A collaborative-shared control system with safe obstacle avoidance capability," in RAM, 2004.
[21] S. Loizou and V. Kumar, "Mixed initiative control of autonomous vehicles," in ICRA, 2007.
[22] C. Smith, M. Bratt, and H. Christensen, "Teleoperation for a ball-catching task with significant dynamics," Neural Networks, vol. 21, no. 4, pp. 604–620, 2008.
[23] D. Vasquez, T. Fraichard, O. Aycard, and C. Laugier, "Intentional motion on-line learning and prediction," Machine Vision and Applications, 2005.
[24] B. D. Ziebart, N. Ratliff, G. Gallagher, C. Mertz, K. Peterson, J. A. Bagnell, M. Hebert, A. K. Dey, and S. Srinivasa, "Planning-based prediction for pedestrians," in IROS, 2009.
[25] S. J. Anderson, S. C. Peters, K. Iagnemma, and J. Overholt, "Semi-autonomous stability control and hazard avoidance for manned and unmanned ground vehicles," MIT, Dept. of Mechanical Eng., 2010.
[26] S. J. Glynn and R. A. Henning, "Can teams outperform individuals in a simulated dynamic control task," Proceedings of the Human Factors and Ergonomics Society Annual Meeting, vol. 44, no. 33, pp. 141–144, 2000.
[27] C. F. Schmidt and J. D'Addamio, "A model of the common-sense theory of intention and personal causation," in IJCAI, 1973.
[28] A. Dragan, G. Gordon, and S. Srinivasa, "Learning from experience in manipulation planning: Setting the right goals," in ISRR, 2011.
[29] J. C. Platt, "Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods," in Advances in Large Margin Classifiers, 1999.
Reducing Conservativeness in Safety Guarantees
by Learning Disturbances Online:
Iterated Guaranteed Safe Online Learning
Jeremy H. Gillula, Computer Science Dept., Stanford University ([email protected])
Claire J. Tomlin, Electrical Engineering and Computer Sciences Dept., UC Berkeley ([email protected])
Abstract—Reinforcement learning has proven itself to be a powerful technique in robotics; however, it has not often been employed to learn a controller in a hardware-in-the-loop environment due to the fact that spurious training data could cause a robot to take an unsafe (and potentially catastrophic) action. One approach to overcoming this limitation is known as Guaranteed Safe Online Learning via Reachability (GSOLR), in which the controller being learned is wrapped inside another controller based on reachability analysis that seeks to guarantee safety against worst-case disturbances. This paper proposes a novel improvement to GSOLR which we call Iterated Guaranteed Safe Online Learning via Reachability (IGSOLR), in which the worst-case disturbances are modeled in a state-dependent manner (either parametrically or nonparametrically), this model is learned online, and the safe sets are periodically recomputed (in parallel with whatever machine learning is being run online to learn how to control the system). As a result the safety of the system automatically becomes neither too liberal nor too conservative, depending only on the actual system behavior. This allows the machine learning algorithm running in parallel the widest possible latitude in performing its task while still guaranteeing system safety. In addition to explaining IGSOLR, we show how it was used in a real-world example, namely that of safely learning an altitude controller for a quadrotor helicopter. The resulting controller, which was learned via hardware-in-the-loop reinforcement learning, out-performs our original hand-tuned controller while still maintaining safety. To our knowledge, this is the first example in the robotics literature of an algorithm in which worst-case disturbances are learned online in order to guarantee system safety.

Fig. 1. The quadrotor helicopter on which IGSOLR was demonstrated.

I. INTRODUCTION

Reinforcement Learning (RL) is a branch of machine learning in which an agent attempts to learn which actions to take in order to maximize a reward. RL has proven itself to be a powerful technique in control applications for robotics, with successes ranging from flying complex maneuvers on an autonomous helicopter [2], to teaching a quadruped robot to jump up steps [8], to helping a robot car navigate at high speeds in an unstructured environment [9]. Despite this, RL has not often been employed to learn a controller in a hardware-in-the-loop environment, especially in situations where an unstable or poorly tuned controller could cause injury to humans or damage expensive hardware [13]. In fact, most applications of RL in robotics have been situations in which reference trajectories or cost functions were learned offline (either based on recorded data or a model) and were then used with standard control techniques to control a robot. One reason for this, as described by Roberts et al. [13], is that as they are being learned, natural parameterizations of controllers can behave poorly on systems near the edge of stability. As a result, spurious training data could cause a robot to take an unsafe (or even catastrophic) action. Thus despite their successes, online RL algorithms are limited to being used in scenarios where safety is not critical, or where a large number of trials can be conducted beforehand in a controlled environment to guarantee system safety.

One approach to overcoming this limitation is to wrap the controller being learned inside another controller that seeks to guarantee safety; this is the approach taken by Gillula and Tomlin [5], in which we proposed a framework known as Guaranteed Safe Online Learning via Reachability (GSOLR). This framework combines machine learning techniques with Hamilton-Jacobi-Isaacs (HJI) reachability analysis in a way that prevents the system being controlled from taking an unsafe action, while avoiding any changes to the machine learning algorithm being used [5]. In essence, GSOLR is a least-restrictive safe controller: it allows the machine learning algorithm to control the system (e.g. via RL) except when the state is detected as being near a safety threshold, at which time a safety-guaranteeing control (dictated by HJI reachability analysis) is used instead.

One of the limitations of GSOLR (as well as most HJI
reachability analysis) is that the computations for guaranteeing safety are assumed to have been done offline, prior to the online learning phase. As a result the system is unable to incorporate any new information it may learn about the disturbances while it is running; essentially its notion of safety cannot change based on what it experiences. Additionally, although not explicitly stated in [5], it is assumed that the worst-case disturbance is fixed over the entire state space, preventing the system from taking advantage of the fact that the range of disturbance values may be state-dependent and thus potentially causing the safety guarantees to be too conservative.

This paper overcomes these limitations by proposing an adapted form of GSOLR known as Iterated Guaranteed Safe Online Learning via Reachability (IGSOLR). In IGSOLR the worst-case disturbances are modeled in a state-dependent manner (either parametrically or nonparametrically) and these models are learned online, causing the resulting safe sets to be periodically recomputed (in parallel with whatever machine learning is being run online to learn how to control the system). As a result the safety of the system automatically becomes neither too liberal nor too conservative, depending only on the actual system behavior. This allows the machine learning algorithm running in parallel the widest possible latitude in performing its task while still guaranteeing system safety.

The proposed algorithm is demonstrated on the example of quadrotor altitude control. The resulting controller, which was learned via hardware-in-the-loop RL on the quadrotor pictured in Figure 1, out-performs the initial hand-tuned controller while still maintaining safety.

Of course many other methods for guaranteeing system safety while improving performance online exist in the robotics literature. The work of Aswani et al. [3] makes use of a Model Predictive Control (MPC) framework in which an a priori system model is used to verify that the constraints in the MPC optimization problem are met (safety) while a learned model is used to evaluate the MPC cost function (performance). The safety guarantees generated by this method are limited to systems with linear dynamics, however. Expanding beyond linear systems, Steinhardt and Tedrake [14] have developed a technique for bounding the probability of failure over a finite time for stochastic, nonlinear systems using exponential barrier functions. In a related manner, Perkins and Barto [12] have successfully used Lyapunov design methods to guarantee safety for high-level controllers which learn how to switch between a number of base-level controllers. However, the work that is most related to GSOLR (and thus IGSOLR) is that of Ng and Kim [11], in which they propose an algorithm for linear dynamical systems which "monitors" controllers suggested by a learning algorithm and rejects those that would lead to instability. Although this is just a brief overview of related work, to our knowledge IGSOLR is the first example in the robotics literature of an algorithm in which disturbances are learned online in order to automatically tune system safety.

The rest of this paper is organized as follows. Section II briefly reviews GSOLR and then elaborates on the limitations described above before describing IGSOLR in detail. Section III describes how IGSOLR was applied to the problem of learning an altitude controller for a quadrotor helicopter online. Section IV describes the experimental platform and illustrates the results obtained from running IGSOLR on the quadrotor. Finally, Section V concludes the paper and elaborates on open questions and future work.

II. ITERATED GUARANTEED SAFE ONLINE LEARNING VIA REACHABILITY

Before explaining IGSOLR, we begin with a review of GSOLR. Due to space constraints, the review of GSOLR here (and HJI reachability analysis in particular) is very brief and omits a great deal of detail. For a more in-depth explanation of GSOLR, as well as a discussion of its strengths and weaknesses, we refer the reader to Gillula and Tomlin [6]. For an easily understandable overview of HJI reachability, we refer the reader to Ding et al. [4].

A. GSOLR

Guaranteed Safe Online Learning via Reachability seeks to ensure system safety using HJI reachability analysis while simultaneously improving system performance online using machine learning.

1) Safety via HJI Reachability Analysis: HJI reachability analysis [10] assumes a system model of the form ẋ = f(x, u, d), where x ∈ Rⁿ is the system state, u is the control input to the system, and d is the disturbance to the system, with u and d bounded to be in some sets U and D, respectively. d can be used to represent a wide variety of disturbances, and typically is used to account for noise in the system (both in estimation and control), unmodeled system dynamics, and the potentially stochastic nature of the underlying system, as long as all of these quantities are bounded.

HJI reachability analysis assumes safety can be encoded by a keep-out set K ⊂ Rⁿ, i.e. the system is safe if x ∉ K. By solving the modified HJI partial differential equation

∂J(x, t)/∂t = min{ 0, max_{u∈U} min_{d∈D} [∂J(x, t)/∂x · f(x, u, d)] }    (1)

one can calculate the unsafe backwards reachable set Preτ(K) (given by the zero sub-level set of J(x, τ)), which is the set of all states x such that for all possible control strategies u(·), there exists a disturbance strategy d(·) such that x ∈ K at some point within τ units of time. Thus as long as we guarantee that x always remains outside Preτ(K), we can guarantee that there will exist control actions u that will allow the system to stay safe for the next τ units of time. In particular, the maximization in Equation 1 provides the optimal control u* which the system must take when the state is on the border of Preτ(K) in order to maintain safety.

2) Combining HJI Reachability with Machine Learning: Given this unsafe set Preτ(K) and the optimal control inputs u* to take to maintain safety, GSOLR is simply a least restrictive safe controller [5] of the form:

1) When the state x is on the border of Preτ(K), execute the optimal control action u*.
2) Otherwise, perform whatever control action the given machine learning algorithm dictates.

The resulting controller is guaranteed to be safe and is also able to accommodate a wide variety of machine learning algorithms, including online RL algorithms.

B. Limitations of GSOLR

As mentioned in the introduction, GSOLR suffers from two weaknesses which were not described in Gillula and Tomlin [5]. The first limitation (which actually applies to most HJI reachability analysis in the literature) is that GSOLR assumes the set D describing the disturbance does not vary over the state space. By not accounting for the fact that the disturbance might be smaller in some areas of the state space than others, and instead using a single worst-case range over the entire state space, the generated unsafe sets can be overly conservative. There are a variety of applications in which it makes sense to model the disturbance in a state-dependent manner, including autonomous aircraft flight, where different wind disturbances could be expected based on weather; ground vehicle control, in which different terrain types (e.g. pavement, mud, etc.) could lead to different types of disturbances; robotic arm control, where motor noise may be different depending on the load on the arm or the angle of its joints; and of course quadrotor altitude control, which will be explained in more detail in Section III-A.

The second weakness of GSOLR is that it does not learn new information online to update its models of the system and disturbances. Instead, GSOLR uses a set of worst-case values for the disturbances which are chosen a priori by the system designer. If these values do not match reality, however, then the resulting safety guarantees could be either too conservative (if they are too large), in which case the system would never explore parts of its state space that are in fact safe, or too liberal (if they are too small), in which case the system may venture into parts of the state space that are in fact unsafe. Both cases can have negative consequences, either in terms of limiting the operating envelope of the system, or in risking system safety.

Of course these two limitations are intertwined. When doing HJI reachability analysis it is sometimes improper to try to model the disturbance as varying with the system state. For example, in the canonical collision avoidance scenario [10] it would not make sense to vary the pursuer's possible range of inputs based on the state of the system. Fortunately, the sorts of situations in which it does make sense to model the disturbance as varying over the state space lend themselves to the idea of learning the disturbance model, which is precisely what IGSOLR seeks to do in a simple and straightforward manner.

C. IGSOLR

The first major difference between GSOLR and IGSOLR is that IGSOLR employs a model of the worst-case disturbances that varies over the state space. More specifically, we replace d ∈ D in the Hamiltonian with d ∈ D(x); the rest of the machinery for calculating the unsafe set Preτ(K) remains unchanged. We purposely do not specify what form D(x) should take (or even whether it should be parametric or nonparametric) in order to allow IGSOLR to apply to as broad a range of systems as possible.

The second major difference between GSOLR and IGSOLR is that a model of the system f̂(x, u, d) as well as D(x) is learned as the system runs. This model is initialized to be conservative (in order to guarantee safety) and is then updated based on data measured by the system (once a sufficiently large number of measurements have been taken to ensure that the probability of any spurious data points adversely biasing the model is sufficiently low). Periodically the most recently learned model is then used to recalculate the unsafe set Preτ(K), which is then used for future safety checks as described in Section II-A2. As mentioned in Gillula and Tomlin [6], one of the weaknesses of GSOLR is that it is limited to systems with low state dimension due to the curse of dimensionality present in the HJI reachability calculations. Unfortunately IGSOLR suffers even more from this limitation since the HJI reachable set calculations must be done in an online manner. Thus until faster methods for computing reachable sets are developed, IGSOLR is limited to systems of state dimension two or three.

It is important to note that because IGSOLR learns the disturbances in an online manner, the safety guarantees that are generated will only be as good as the data on which they are based. As a result IGSOLR cannot be thought of as giving a "true" worst-case safety guarantee, since there is always the possibility that a low-probability worst-case event could occur. However, it is worth noting that even with "true" worst-case safety guarantees (i.e. those generated by a method like GSOLR) the guarantee is only as good as the a priori choice of values for the worst-case disturbances. In particular, even a "true" worst-case method is susceptible to very low probability disturbances that the system designer did not take into account (e.g. an earthquake occurring while your autonomous vehicle is driving, a stray cosmic ray flipping a bit in your quadrotor's RAM [16], or other various "acts of God"). In this sense, no safety guarantee can ever truly be worst-case. Instead, the point of IGSOLR is to generate an approximate worst-case safety guarantee that is neither too liberal nor too conservative, based on a best-effort automatic analysis of the data. In this light, the reduction in conservativeness of the safety guarantees as IGSOLR learns the disturbances should not be viewed as diminishing the legitimacy of its safety guarantees, but instead strengthening them by basing them on the data gathered so as to bring them more in line with the actual system.

III. APPLICATION TO QUADROTOR ALTITUDE CONTROL

A. Why Use IGSOLR?

Because IGSOLR is such a broad framework, it is helpful to illustrate how it works with an example. To that end we will show how it was used to learn an altitude controller for a quadrotor vehicle. Quadrotor altitude control is ideally suited to IGSOLR for a multitude of reasons, including:
1) As described by Waslander et al. [15], classical linear control techniques are unable to provide a high degree of performance for altitude control due to the complex airflow interactions present in a quadrotor system. Thus RL is a strong candidate for learning a controller.

2) However, online RL would be unsafe to try, since learning with hardware-in-the-loop could be dangerous given that the hardware could be damaged by an unstable controller. Thus GSOLR is a strong candidate for ensuring safety while an online RL algorithm runs.

3) We have good reason to believe that the range of disturbances experienced by a quadrotor indoors is likely to be dependent on its state: near the ground the vehicle experiences ground effect, in which air forced through the propellers interacts with the ground, and at certain velocity ranges the vehicle may enter vortex ring state or windmill braking state [7], causing unmodeled changes in the dynamics. Since modeling the disturbance as a function of the state makes sense, we should use IGSOLR.¹

¹Fortunately, restricting our focus to altitude control means that the state space is two-dimensional, so it is reasonable to believe we will be able to recompute new unsafe sets based on learned data in a semi-online fashion. Thus it is feasible to use IGSOLR.

To demonstrate IGSOLR, we implemented the algorithms described below on board an Ascending Technologies Pelican quadrotor (pictured in Figure 1) [1]. The vehicle was flown indoors in a lab environment equipped with a VICON MX motion-capture system, which provided sub-centimeter accuracy position measurements at a rate of 120 Hz. To add additional disturbances (since ground effect proved to be too easy for the IGSOLR algorithm to compensate for), an industrial-strength fan was positioned to blow upwards at the quadrotor between approximately 0.2 m and 0.8 m above the ground, with average wind gust speeds of around 50 kph.

We now describe the implementation details of IGSOLR for the quadrotor system. Since IGSOLR treats safety and performance in a parallel manner, we will present these two aspects of the problem setup separately.

B. Safety via Reachability

We model the quadrotor's altitude dynamics as

ẍ = g + γu + d    (2)

where x is the quadrotor's altitude and x = [x ẋ]⊤ is its complete state; g is some constant term assumed to be related to gravity and the vehicle's mass; u ∈ U is the control input to the vehicle's rotors, measured in counts (the abstract unit used by the low-level interface to our quadrotor); γ is a thrust multiplier that converts from counts to m/s²; and d ∈ D(x) is a disturbance, due to any or all of the effects mentioned in the previous section. In our algorithm, we will attempt to learn g, γ, and D(x). (We assume that U is known, since the quadrotor's low-level interface automatically clips the input to a certain range of counts.) Because the system executes its control and measures its state at discrete intervals according to a fixed frequency, however, we must first re-write the dynamics as

(ẋᵢ₊₁ − ẋᵢ)/Δt = g + γuᵢ + d    (3)

where Δt is the discrete time step.

In order to learn the model parameters we begin with an initial estimate for g based on gravity and the quadrotor's measured mass and for γ based off of the specifications of the quadrotor, and D(x) is initialized to a conservative constant value over the entire state space. Then, as the quadrotor is flown under the IGSOLR controller described below, traces of state data, x₀, …, xᵣ₊₁, and input data, u₀, …, uᵣ, are recorded (where r is the horizon over which we periodically recompute the unsafe set). We then use linear regression based on this data to compute estimates ĝ and γ̂ by writing β̂ = (X⊤X)⁻¹X⊤Y, where

Y = [ẋ₁ − ẋ₀ ; … ; ẋᵣ₊₁ − ẋᵣ],   X = [1 u₀ ; … ; 1 uᵣ],   β̂ = Δt [ĝ ; γ̂].    (4)

To determine D(x) we first compute the residuals [ε̂₀ … ε̂ᵣ]⊤ = Y − Xβ̂. We then divide the state space into a regularly spaced grid based on x and ẋ. For each grid cell c(i, j) = [xᵢ, xᵢ₊₁) × [ẋⱼ, ẋⱼ₊₁), we calculate the mean μᵢ,ⱼ and standard deviation σᵢ,ⱼ of the residuals ε̂ᵢ that were generated by state data from that cell. Then we model

D(x ∈ c(i, j)) = [μᵢ,ⱼ − 3σᵢ,ⱼ, μᵢ,ⱼ + 3σᵢ,ⱼ]    (5)

for grid cells for which state data was measured.

In essence this approximation ignores the 0.3% of disturbances which will be larger than three standard deviations from the mean. While such an approximation is unfortunately necessary (as described in Section II-C), we feel it is a valid one. In particular, it is important to note that this does not mean that our safety guarantees are only valid 99.7% of the time: for the safety guarantee to be invalid it would be necessary to encounter a continuous series of these low-probability disturbances for τ straight seconds, or in our discrete case τ/Δt time steps. Assuming the disturbances are indeed Gaussian and IID, and given our particular values for τ and Δt, our safety guarantees have only a 5.2 × 10⁻⁵³ % probability of not holding.

Since the quadrotor's trajectory will not necessarily cover every grid cell in the state space, we must find a way to generalize the estimate of the disturbance from areas of the state space where measurements have been taken to areas where no measurements have yet been taken. One way to generalize would be to continue to use the a priori conservative estimate of the disturbance in grid cells in which no data has been measured. However, since a strong (conservative) disturbance just outside the initial safe set could keep that set from growing, this method could prevent the quadrotor from ever expanding its initial safe set and thus discovering the true nature of the disturbances in nearby states.

To avoid this problem we make the assumption that the variation in the range of disturbances over the state space is
smooth. In essence this smoothness assumption means that we expect disturbances in neighboring cells to be similar in range. This assumption not only appears to be valid in practice for this particular application, but makes sense for many other applications, including many of the ones mentioned in Section II-B. Thus once we have estimates of D(x) for cells in which measurements were taken, we set estimates for the cells in which no measurements were taken based on their distance from cells with measurements, increasing the possible disturbance range the further a cell is from a cell with data.

More precisely, for a cell c(k, l) in which no measurements have been taken, we set

D(x ∈ c(k, l)) = δ · D(x ∈ c(i, j)) · ‖c(i, j) − c(k, l)‖₂    (6)

where ‖c(i, j) − c(k, l)‖₂ is the Euclidean distance in the state space between cells c(i, j) and c(k, l) (i.e. √((xᵢ − xₖ)² + (ẋⱼ − ẋₗ)²)), c(i, j) is the cell closest to cell c(k, l) (in the Euclidean sense) in which measurements have been taken, D(x ∈ c(i, j)) is the disturbance range in that cell, and δ is a tuning parameter that adjusts how quickly we expect the disturbances to grow (in essence modeling how quickly we expect the disturbance estimates to change over the state space).

Fig. 2. The initial safe reachable set generated by a conservative estimate of the range of possible values for the disturbance. The safe area is the interior of the region; the unsafe area is the exterior. Note that the safe altitudes range from 0.2 m (since the quadrotor's origin is approximately 0.2 m above its lowest point) to 2.5 m (just below the maximum height at which the VICON cameras in our lab can accurately detect the vehicle).
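The estimation-and-propagation procedure above (Equations 4-6: regress ĝ and γ̂, compute per-cell residual statistics, then fill unmeasured cells by scaled distance to the nearest measured cell) can be sketched in Python as follows. This is an illustrative reconstruction, not the MATLAB code used in the experiments; the grid resolution, δ, and Δt values passed in are placeholders, and distances are measured in cell-index units for simplicity.

```python
import numpy as np

def learn_disturbance_model(x, xdot, u, dt, grid, delta):
    """Schematic version of Eqs. (4)-(6): regress (g, gamma), bin the
    residuals into grid cells, then propagate disturbance ranges to empty
    cells by distance. `grid` is a pair (x_edges, xdot_edges); all numeric
    choices here are placeholder values, not the experimental ones."""
    # Eq. (4): least-squares fit of Y = X * dt * [g; gamma]
    Y = xdot[1:] - xdot[:-1]                      # velocity differences
    X = np.column_stack([np.ones(len(u)), u])     # rows [1, u_i]
    beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
    g_hat, gamma_hat = beta / dt                  # undo the dt scaling
    resid = Y - X @ beta                          # residuals eps_i

    x_edges, v_edges = grid
    nx, nv = len(x_edges) - 1, len(v_edges) - 1
    D = np.full((nx, nv, 2), np.nan)              # per-cell [d_min, d_max]
    ci = np.clip(np.digitize(x[:-1], x_edges) - 1, 0, nx - 1)
    cj = np.clip(np.digitize(xdot[:-1], v_edges) - 1, 0, nv - 1)
    for i in range(nx):
        for j in range(nv):
            r = resid[(ci == i) & (cj == j)]
            if len(r) > 0:                        # Eq. (5): mean +/- 3 sigma
                D[i, j] = [r.mean() - 3 * r.std(), r.mean() + 3 * r.std()]
    # Eq. (6): fill each unmeasured cell from the closest measured cell,
    # scaled by delta times the cell-center distance.
    measured = np.argwhere(~np.isnan(D[:, :, 0]))
    for i in range(nx):
        for j in range(nv):
            if np.isnan(D[i, j, 0]):
                dists = np.linalg.norm(measured - [i, j], axis=1)
                k = measured[np.argmin(dists)]
                D[i, j] = delta * D[k[0], k[1]] * dists.min()
    return g_hat, gamma_hat, D
```

The per-cell intervals in D are what the subsequent HJI recomputation of Preτ(K) would consume in place of the single a priori range.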
Finally, given the full model of the disturbance, we can compute the unsafe set Preτ(K) and the optimal safe control actions u*, which can then be used in future iterations of the IGSOLR controller.

In summary, the algorithm for learning the worst-case disturbances online is:
1) Take r + 1 measurements of the state x and r measurements of the input u.
2) Use linear regression to generate an estimate of the model of the system dynamics, given by ĝ and γ̂.
3) Use ĝ, γ̂, and the measured data to generate the residuals ε̂ᵢ, and then use the means and standard deviations of the residuals to estimate the disturbance D(x) in cells where measurements were taken.
4) Propagate the information about the disturbance throughout the state space using Equation 6.
5) Use HJI reachability to generate the unsafe set Preτ(K) and the optimal safe control actions u*.

Of course an initial conservative safe set must be generated for use while the first data set is recorded; Figure 2 shows this initial safe reachable set, based on conservative estimates of the worst-case disturbances generated by prior experience with the vehicle. Intuition helps explain the shape of the set: it is safer to travel at positive velocities than negative velocities because the quadrotor can always command zero thrust and count on gravity to slow it down. Additionally, the faster the vehicle is traveling in the positive direction, the closer it can be to the ground (and the further it must be from the ceiling), and vice versa.

C. Performance via Reinforcement Learning

To learn a better altitude controller for the quadrotor we chose to use Policy Gradient with the Signed Derivative (PGSD) [8]. This choice was motivated by several reasons: the fact that PGSD has been demonstrated on hardware-in-the-loop systems in the past (albeit ones in which there was very little danger of severely damaging people or hardware); the wish to demonstrate the flexibility of IGSOLR with respect to the choice of learning algorithm (i.e. to show that IGSOLR works even when the learning algorithm is model-free, as PGSD is); and because of PGSD's ease of implementation.²

²It is undoubtedly true that other more advanced learning algorithms could have learned an even better altitude controller; we remind the reader that the purpose of this example was not to advance the state of the art in quadrotor altitude control, but to demonstrate how the IGSOLR framework functions, and we believe PGSD was sufficient for that purpose.

We will now briefly review PGSD before describing the experimental platform itself. As with the section on HJI reachability, our review of PGSD is necessarily brief; for a more detailed description of the algorithm we refer the reader to Kolter and Ng [8], from which this review is closely derived.

PGSD is a simple method for performing approximate gradient descent to learn a control law for a Markov Decision Process (MDP). In PGSD, we begin by assuming that the cost function of being in state xₜ and taking action uₜ is quadratic,

C(xₜ, uₜ) = (xₜ − x*ₜ)⊤ Q (xₜ − x*ₜ) + uₜ⊤ R uₜ    (7)

where x*ₜ is the desired state of the system at time t (i.e. a reference trajectory), and Q and R are diagonal positive-semidefinite matrices that penalize deviation from the reference trajectory and control input respectively.

PGSD also assumes the control law is linear in the state features, i.e. u = θ⊤φ(x), where θ is a vector of parameters that PGSD will attempt to learn, and φ(x) is a vector of features that map from the state to scalar values. If we define
the sum of one-step costs of executing the policy θ starting at x₀ over a horizon l as

J(x₀, θ) = Σₜ₌₁ˡ C(xₜ, uₜ),   uₜ = θ⊤φ(xₜ)    (8)

then the standard approach for performing policy gradient descent on the parameters θ is to update them according to

θ ← θ − α∇θ J(x₀, θ)    (9)

where α is the learning rate and ∇θ J(x₀, θ) is the gradient of the cost function with respect to θ. When computing ∇θ J(x₀, θ) one encounters Jacobians of the system model with respect to the control input, i.e. terms of the form ∂(xₜ)ᵢ/∂(uₜ)ⱼ. PGSD approximates these terms with a signed derivative matrix S, in which each entry, Sᵢ,ⱼ, is +1 if increasing the jth control will increase the ith state, or −1 if the opposite is true. Additionally, only one entry in each row of S is non-zero, corresponding to whichever control has the most dominant effect on the given state. Given this signed derivative matrix, PGSD then prescribes a way for computing the approximate gradient, ∇̃θ J(x, θ).

Fig. 3. Average cost of running the IGSOLR controller over the training and testing trajectories versus iteration number, along with 95% confidence intervals, averaged over ten runs. The average cost of running the initial hand-tuned controller on the testing trajectory is plotted as a horizontal line for comparison.
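For concreteness, one simplified scalar-control realization of this update (Equations 8 and 9, with S standing in for the state-control Jacobian and the control-cost term dropped, since R = [0] in our setup) might look like the following sketch. This is an illustrative reduction, not Kolter and Ng's exact PGSD formulation [8].

```python
import numpy as np

def pgsd_update(theta, xs, xref, feats, S, Q, alpha, dt):
    """One approximate policy-gradient step in the style of PGSD (Eq. 9).

    The true Jacobian d(x_t)/d(u_{t-1}) is replaced by the signed-derivative
    matrix S, and the chain rule is applied to the quadratic state cost of
    Eq. (7). `feats` maps (state, reference, dt) to the feature vector phi.
    This is a simplified illustration, not the exact published algorithm."""
    grad = np.zeros_like(theta)
    for t in range(1, len(xs)):
        phi = feats(xs[t - 1], xref[t - 1], dt)   # features at the step that
        dCdx = 2.0 * Q @ (xs[t] - xref[t])        # produced control u_{t-1}
        # chain rule with the signed derivative: dC/dtheta ~ phi * (S^T dC/dx)
        grad += phi * (S.T @ dCdx).item()
    return theta - alpha * grad                   # gradient descent, Eq. (9)
```

Taking, for example, S = [+1, +1]⊤ for the altitude system (a plausible choice, since more thrust raises both altitude and climb rate), a trajectory hovering above its reference drives the thrust-related parameters downward, as one would expect.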
IV. E XPERIMENTAL R ESULTS
In summary the algorithm for performing RL via PGSD is:
1) Execute the policy u = θ φ(x) for l steps and record Code for running IGSOLR and PGSD was developed in
the states x0 , . . . , xl and control inputs u0 , . . . , ul . MATLAB and controlled the quadrotor at a rate of 20Hz. (One
2) Compute the approximate gradient of the cost function important note is that it was not necessary for the learning
with respect to the controls, ∇ ˜ θ J(x, θ), using the signed horizon for the reachability recalculations, r, and the learning
derivative matrix S. horizon for the RL algorithm, l, to be the same; we chose to
3) Update the parameters θ according to Equation 9. set r = 100 and l = 7 seconds.) The algorithm was run on
a training trajectory (in altitude only) that lasted 60 seconds,
Of course as with most reinforcement learning algorithms, which was designed to cover as wide a range of the vehicle’s
design choices must be made before PGSD can be successfully state space as possible. (Whenever the training trajectory
executed, including what features φ(x) to use, what initial values θ0 to use, as well as what values to use for the cost matrices Q and R. For our experiment, we chose to use

\[
\phi(x_t) = \begin{bmatrix} 1 \\ \max(x_t - x_t^*, 0) \\ \min(x_t - x_t^*, 0) \\ \max(\dot{x}_t, 0) \\ \min(\dot{x}_t, 0) \\ \Delta t\,(x_t - x_t^*) \\ x_t \end{bmatrix},
\qquad
Q = \begin{bmatrix} 10 & 0 \\ 0 & 1 \end{bmatrix},
\qquad
R = \begin{bmatrix} 0 \end{bmatrix}.
\tag{10}
\]

This choice was motivated by the fact that the quadrotor's dynamics were likely to depend on the direction of its velocity [7], so it made sense to have different features (and thus different parameters) for positive and negative velocity values; similar reasoning motivated splitting the altitude error into positive and negative portions. Additionally, the last term in φ(x) was chosen to allow the controller to depend on the quadrotor's altitude in order to account for ground effect. Finally, the first term in φ(x) is intended to represent the nominal thrust required to keep the quadrotor aloft, and the second-to-last term is an approximation for integral error. Although past work has found the use of an acceleration term critical [7, 15], we did not find it necessary.

ventured outside the learned safe set, it was clipped to remain inside.) This training trajectory was repeated ten times for a total of ten minutes of training. (To account for noise, the results below show the average of ten runs over this ten-minute training period.) For comparison, the controllers that resulted after each iteration of training were also run on a randomly generated testing trajectory. Both the learned controllers and the initial controller (which is currently the default used on our quadrotors, and was hand-tuned for performance) were run on the testing trajectory ten times and their results recorded.

A. Results: Performance

Figure 3 shows the performance of the IGSOLR controller versus iteration number, where each iteration corresponds to one complete run of the training trajectory. As would be expected, the reinforcement learning quickly converges to a much better controller than the hand-tuned one, whose cost (averaged over ten runs) is also plotted for comparison. Note that even the first iteration of the IGSOLR controller does better than the hand-tuned controller since the learning horizon was much shorter than the entire trajectory, enabling the system to learn within the first few seconds of the run.

Figure 4 shows a sample run of the hand-tuned and IGSOLR controllers over one run of the testing trajectory. Despite a

[Figure 4 (altitude in m versus time in sec, with reference, initial controller, IGSOLR controller, and GSOLR reference curves) and Figure 6 (eight panels of velocity in m/sec versus altitude in m) appear here.]

Fig. 4. Sample trajectory of the quadrotor's altitude using the hand-tuned controller versus the IGSOLR controller. A reference trajectory truncated to be safe under GSOLR is also shown for comparison.

Fig. 6. Eight successive safe reachable sets learned during a typical run of the IGSOLR controller; the thinner lines are the traces of the state during the period in which the given safe set was used, and the asterisks indicate points along the state trace where IGSOLR used the safety-ensuring control instead of the controller being learned.

great deal of hand-tuning, the initial controller shows dramatic overshoot, both when ascending to a higher altitude and, to a lesser extent, when descending to a lower altitude; the learned IGSOLR controller does not show such dramatic errors. Additionally, the tracking performance of the IGSOLR controller appears to be better than that of the hand-tuned controller.

B. Results: Safety

Figure 5 shows the initial safe reachable set generated by a conservative estimate of the maximum possible range of the disturbances, as well as the safe reachable sets that were learned iteratively by the IGSOLR algorithm as the quadrotor explored its state space. As would be expected, the initial safe set is smaller than the first learned safe set since the conservative disturbances are larger than the learned disturbances. This trend continues (the first is smaller than the second, etc.) as the system learns that the disturbances are in fact smaller than its prior estimate throughout the state space.

[Figure 5 (velocity in m/sec versus altitude in m, showing the initial safe set and learned safe sets #1 through #3) appears here.]

Fig. 5. The initial safe reachable set and the first three safe reachable sets learned by the IGSOLR algorithm for one of the runs. Note that under GSOLR the quadrotor would only be allowed to operate in the interior of the initial safe set.

It is worth emphasizing that had GSOLR been used instead of IGSOLR, the vehicle would have been restricted to the initial safe set pictured in Figure 5. Although additional experiments showed that PGSD is able to learn a controller with comparable performance in this restricted state space, the vehicle itself would not be able to operate safely in as wide a range of the state space, as shown by the GSOLR reference trajectory in Figure 4, which has been truncated to always stay inside the initial safe set. This shows the major improvement of IGSOLR over GSOLR: by learning the disturbances online, the system is able to operate in a wider area of the state space.

Figure 6 shows a set of eight successive safe reachable sets generated by the IGSOLR algorithm, along with traces of the state during the period in which each safe set was being used. As would be expected, the state trace remains inside the safe set at all times, with occasional use of the safety-ensuring control when the controller being learned takes the system state too close to the edge of the safe set.

V. CONCLUSION

This paper has proposed a novel extension to GSOLR known as Iterated Guaranteed Safe Online Learning via Reachability. In IGSOLR the disturbances are modeled in a state-dependent manner and learned online, and the safe sets are periodically recomputed (in parallel with whatever machine learning is being run online to control the system). Additionally, IGSOLR was demonstrated in a real-world scenario in which RL (in particular, PGSD) was used to learn a policy for quadrotor altitude control, while HJI reachability analysis

guaranteed safety. PGSD was able to quickly converge to a controller which out-performed the hand-tuned controller, while the safety of the system was tuned automatically to match the actual performance of the system, resulting in a high-performance system which avoided putting expensive vehicle hardware at risk.

In the future we hope to expand on this work in a variety of ways. One area that we are beginning to investigate is how to do exploration expressly for the purpose of learning disturbances. In the example in this paper, the machine learning running in parallel was trying to learn a better controller; however, it would be interesting to use instead a controller that attempts to explore the state space to learn more about the disturbances. A related idea that we are exploring is the interplay between learning and safety, in particular how different models for the disturbance can affect the learned reachable safe sets, which can then affect where in the state space the system is able to explore and learn new information. Another facet of this interplay that we are looking into is the question of how to generate a controller that better synthesizes the two goals of safety and learning; instead of simply replacing the learning-based control when the reachable sets dictate, we would like to more closely couple the safety and learning so that the machine learning algorithm automatically weighs safety considerations when deciding how to control the vehicle.

As the future work described above indicates, there is clearly still a great deal to be done in order to truly integrate safety and learning. It is our hope that, by continuing to develop tools like IGSOLR, online machine learning will soon be able to be used in safety-critical situations.

VI. ACKNOWLEDGMENTS

The authors wish to thank the reviewers for their extremely detailed, insightful, and critical reviews, which helped a great deal in strengthening this paper. Additionally, the authors wish to thank the open source community for making the experiments in this paper much easier to implement than would otherwise be possible. This research was funded in part by the "Smart Adaptive Reliable Teams for Persistent Surveillance (SMARTS)" project administered by the ONR under MURI grant #N00014-09-1-1051.

REFERENCES

[1] AscTec Pelican — Ascending Technologies. URL http://www.asctec.de/asctec-pelican-3/.
[2] Pieter Abbeel, Adam Coates, and Morgan Quigley. An application of reinforcement learning to aerobatic helicopter flight. In Advances in Neural Information Processing Systems 19. MIT Press, 2007.
[3] Anil Aswani, Humberto Gonzalez, S. Shankar Sastry, and Claire Jennifer Tomlin. Provably Safe and Robust Learning-Based Model Predictive Control. ArXiv e-prints, 2011.
[4] Jerry Ding et al. Hybrid Systems in Robotics: Toward Reachability-Based Controller Design. IEEE Robotics & Automation Magazine (RAM), September 2011.
[5] Jeremy Hugh Gillula and Claire Jennifer Tomlin. Guaranteed Safe Online Learning of a Bounded System. In Proc. of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), San Francisco, CA, September 2011.
[6] Jeremy Hugh Gillula and Claire Jennifer Tomlin. Guaranteed Safe Online Learning via Reachability: Tracking a Ground Target Using a Quadrotor. In Proc. of the IEEE International Conference on Robotics and Automation (ICRA), St. Paul, MN, May 2012.
[7] Gabriel M. Hoffmann, Haomiao Huang, Steven L. Waslander, and Claire Jennifer Tomlin. Quadrotor helicopter flight dynamics and control: Theory and experiment. In Proc. of the AIAA Guidance, Navigation, and Control Conference (GNC), volume 4, Hilton Head, SC, August 2007.
[8] J. Zico Kolter and Andrew Y. Ng. Policy Search via the Signed Derivative. In Proc. of Robotics: Science and Systems (RSS), Seattle, WA, 2009.
[9] Jeff Michels, Ashutosh Saxena, and Andrew Y. Ng. High Speed Obstacle Avoidance using Monocular Vision and Reinforcement Learning. In Proc. of the International Conference on Machine Learning (ICML), pages 593–600, Bonn, Germany, 2005.
[10] Ian M. Mitchell, Alexandre M. Bayen, and Claire Jennifer Tomlin. A time-dependent Hamilton-Jacobi formulation of reachable sets for continuous dynamic games. IEEE Transactions on Automatic Control, 50(7):947–957, July 2005. doi: 10.1109/TAC.2005.851439.
[11] Andrew Y. Ng and H. Jin Kim. Stable adaptive control with online learning. In Proc. of Neural Information Processing Systems (NIPS), 2005.
[12] Theodore J. Perkins and Andrew G. Barto. Lyapunov Design for Safe Reinforcement Learning. Journal of Machine Learning Research, 3:803–832, 2002.
[13] John W. Roberts, Ian R. Manchester, and Russ Tedrake. Feedback Controller Parameterizations for Reinforcement Learning. In IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL), 2011.
[14] Jacob Steinhardt and Russ Tedrake. Finite-time Regional Verification of Stochastic Nonlinear Systems. In Proc. of Robotics: Science and Systems (RSS), June 2011.
[15] Steven L. Waslander, Gabriel M. Hoffmann, Jung Soon Jang, and Claire Jennifer Tomlin. Multi-agent quadrotor testbed control design: integral sliding mode vs. reinforcement learning. In Proc. of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 3712–3717, Edmonton, AB, 2005. doi: 10.1109/IROS.2005.1545025.
[16] J.F. Ziegler et al. IBM experiments in soft fails in computer electronics (1978–1994). IBM Journal of Research and Development, 40(1):3–18, 1996.
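As a concrete reading of the feature vector in Eq. (10) above, the sketch below (hypothetical Python, not the authors' implementation; all names are ours) computes φ(x_t). We assume, as in PGSD's linear parameterization, that the thrust command is the inner product of a learned weight vector θ with these features.

```python
# Hypothetical sketch (not the authors' implementation) of the feature
# vector phi(x_t) in Eq. (10): the altitude error and velocity are each
# split into positive and negative parts, Dt*(x_t - x_t*) approximates an
# integral-error term, and the raw altitude x_t captures ground effect.

def phi(x, x_ref, xdot, integral_err):
    """x: altitude, x_ref: reference altitude, xdot: vertical velocity,
    integral_err: accumulated Dt * (x - x_ref) from earlier time steps."""
    err = x - x_ref
    return [
        1.0,                 # constant term: nominal hover thrust
        max(err, 0.0),       # positive altitude error
        min(err, 0.0),       # negative altitude error
        max(xdot, 0.0),      # ascending velocity
        min(xdot, 0.0),      # descending velocity
        integral_err,        # approximate integral error
        x,                   # altitude itself, to account for ground effect
    ]

def control(theta, features):
    """Assumed linear-in-features command: the inner product theta . phi."""
    return sum(t * f for t, f in zip(theta, features))
```

Because positive and negative errors occupy separate feature slots, the learner is free to assign them different gains, which is exactly the asymmetry the text motivates for ascending versus descending flight.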

What’s in the Bag: A Distributed Approach to 3D
Shape Duplication with Modular Robots
Kyle W. Gilpin and Daniela Rus
Computer Science and Artificial Intelligence Lab
Massachusetts Institute of Technology
Cambridge, MA 02139
Email: {kwgilpin,rus}@csail.mit.edu

Abstract—Our goal is to develop an automated digital fabrication process that can make any object out of smart materials. In this paper, we present an algorithm for creating shapes by the process of duplication, using modules we have termed smart sand. The object to be duplicated is dipped into a bag of smart sand; the particles exchange messages to sense the object's shape; and then the particles selectively form mechanical bonds with their neighbors to form a duplicate of the original.
Our algorithm is capable of duplicating convex and concave 3D objects in a completely distributed manner. It uses O(1) storage space and O(n) inter-module messages per module. We perform close to 500 experiments using a realistic simulator with over 1400 modules. These experiments confirm the functionality and messaging demands of our distributed duplication algorithm while demonstrating that the algorithm can be used to form interesting and useful shapes.

Fig. 1. The distributed duplication algorithm is capable of duplicating arbitrary 3D objects like the coffee mug (left) using a collection of intelligent modules. The modules envelop and sense the shape of the original object before forming a duplicate (right) from spare modules. Any extra modules (white) are then brushed aside to reveal the completed object. (For full-color versions of all figures, please see the PDF version of this paper posted at http://www.roboticsproceedings.org/rss08/p12.pdf)

I. INTRODUCTION

In this paper we present a new distributed algorithm that enables the creation of complex 3D shapes from a collection of intelligent robotic particles. Our algorithm enables a collection of these small smart sand grains to form arbitrary 3D shapes through a process of distributed duplication. The modules have on-board computation, nearest-neighbor communication, and latching capabilities. Given a "bag" of smart sand, we envision dipping a scaled replica of the object we wish to duplicate into the bag. (See Figure 1 for an example of duplicating a mug.) The intelligent modules surrounding the object sense and learn its shape. Then, using programmed communication and connections, they replicate the object using the spare modules in the bag. Once the solid replica is created, all other inter-module connections are broken, and the user can retrieve the duplicate object from the bag.

The algorithm in this paper provides a solution to distributed duplication of arbitrary 3D objects (i.e., both convex and concave). The solution relies on local sensing of the boundary of the desired object and coordinated inference and planning to create a solid replica. Modules inside the replica form mechanical bonds with each other, and modules outside break all their bonds to surrounding neighbors. Because the algorithm is distributed and does not rely on a centralized, external controller, it provides a scalable solution. No module ever stores the complete goal shape nor the global state of the system; the memory required by each module is O(1). Furthermore, the number of inter-module messages exchanged is O(n) per module, where n is the number of modules in the system.

We have implemented and evaluated this algorithm in simulations with thousands of modules. The code is written in C and structured such that, by retargeting the lowest-level inter-module message passing and bonding functions, it could be deployed on any modular system whose unit modules can (1) form a regular cubic lattice around the object to be duplicated; (2) mechanically bond and communicate with their six nearest neighbors; (3) perform on-board computation; and (4) store unique identifiers (UIDs) which are assigned during fabrication.

II. RELATED WORK

While this paper is focused on algorithmic developments which enable efficient, programmatic shape formation, modular robotics [21] intimately couples hardware and algorithms. The Claytronics project envisions cylindrical [11, 10], and eventually spherical, modules, covered in magnetic or electrostatic actuators, that are capable of rolling relative to one another. Lipson et al. have developed a stochastic fluidic assembly system consisting of cubic modules whose assembly is controlled by pressure and suction [19, 17]. Klavins et al.

have developed a set of triangular tiles that circulate on an air table and magnetically bond with their neighbors [1]. The Kilobot system [16] consists of 1000 autonomous robots that use stick-slip locomotion to form planar shapes. The 45mm Miche cubes use mechanically switchable permanent magnets to bond together in 3D shapes [4]. The RaChET system is a chain of modules which, when bonded with mechanical latches, can exert practical torques [20].

To complement the variety of hardware, many shape formation algorithms have also been developed. Researchers have attempted to optimize the path planning process when reconfiguring sets of connected modules [18, 2]. Others have focused on encoding a desired shape in a set of rules that is executed by each module [9]. Shen et al. have developed an approach that conveys a complete description of the desired shape to every module in the system [15]. While expensive in terms of memory, the algorithm allows large collections of mobile modules to form scale-invariant versions of the desired shape. Funiak et al. published a localization algorithm based on SLAM that is capable of localizing tens of thousands of irregularly packed modules [3]. Goldstein et al. envisioned an algorithm which conveys a description of the desired shape to all modules on the exterior perimeter of a system [7]. Then, by injecting or absorbing holes, the border can be expanded or contracted until it matches the desired shape. Pillai et al. developed an algorithm which enables a collection of modules to act as a 3D fax machine [14]. In contrast to our work, their algorithm is centralized and performs almost all computation externally. Griffith et al. developed a duplication system that mimics DNA [8]. Despite these algorithmic advances, no solution exists if one wishes to form arbitrary 3D shapes, without the use of an external controller, while keeping the memory requirements of each module constant.

The remainder of this paper is organized as follows. In the next section, we explain the challenges of, and our approach to, distributed duplication. Section IV then demonstrates how we solve the duplication problem in 2D. Section V then extends these results to 3D. In Section VI, we present over 1000 experiments and analyze the performance of the algorithm. Finally, Section VII offers a brief discussion of our results.

III. FORMULATION OF DISTRIBUTED DUPLICATION

We assume that the object to be duplicated is tightly surrounded by a regular cubic lattice of modules. The user sends one module a start signal, and all of the modules form mechanical bonds with their neighbors to encase the object in a rigid block of material. Once solidified, the system senses the geometry of the passive object in a distributed manner. For each location occupied by the obstacle, the algorithm identifies a conjugate module some distance away (in a specified offset direction) that will become part of the duplicate shape. Once all of these duplicate modules have been notified, they remain solidified when all other modules self-disassemble.

Sensing the shape of the object being duplicated in a distributed manner that scales favorably is challenging. Figure 2 outlines the essence of our solution, which senses and identifies the perimeter of the void in the module lattice that is occupied by the object. The algorithm identifies the surface of the object by message passing and marks all the lattice modules on the object's perimeter. A shifted replica of this perimeter is created at a different location in the lattice, some automatically determined offset distance away from the original. Then, the algorithm uses a flood fill process to notify all the modules within this surface that they are a part of the duplicate object. The result is the desired one-to-one correspondence between voids in the lattice and conjugate duplicate modules. This approach works for arbitrarily complex surfaces, both convex and concave.

[Figure 2 (a 2D lattice showing the original shape, original border, duplicate border, duplicate shape, and a conjugate border notification message) appears here.]

Fig. 2. The distributed duplication algorithm works by sensing the border of the shape to be duplicated. Once the border is identified, each module on the border notifies a conjugate duplicate border module that is offset a fixed distance in a given direction. With all the modules on the duplicate border aware of their status, the algorithm notifies all modules inside the duplicate border that they are part of the duplicate shape.

Despite its relatively simple high-level description, there are many challenges when implementing the algorithm. The modules must (1) all agree on the offset distance between the original and the duplicate; (2) possess a way to differentiate between bordering on the obstacle to be duplicated and the very exterior of the initial block of material; (3) synchronize when each module's contribution to the duplicate border is complete so as to not start disassembling prematurely; and (4) be able to do all of the above while using a constant amount of memory and a number of messages that scales favorably.

A naive solution that considers the border as a set of individual modules instead of a closed surface will fail. First, if all border modules do not agree on the offset distance by which to shift their conjugates, the conjugate border will not be a closed surface that mirrors the border of the original. Second, if the system starts the flood fill process before the surface surrounding the duplicate is complete, the fill messages will escape from holes in the surface, and modules that should not be part of the duplicate will incorrectly remain bonded after self-disassembly. Third, if some module initiates the disassembly process before all modules inside the duplicate surface have received a fill message, some modules will disassemble instead of remaining bonded as part of the duplicate.

We developed a 2D solution [5] which addresses these challenges, but the extension to 3D is not simple. The primary additional challenge that the 3D algorithm must overcome is the fact that there is no efficient, distributed way to identify the perimeter of the passive shape. In particular, we need a distributed, message-based sensing algorithm that can completely envelop the 3D shape. Such an algorithm needs to identify all modules on the perimeter of the passive shape and inform those modules where to create the duplicate's perimeter.

To accomplish 3D duplication, we decompose the duplication process into 2D sub-problems using a layered approach. The initial block of material is cut into individual planes, and duplication proceeds semi-independently in each plane. In each plane, the obstacle perimeter identification problem uses the bug algorithm [13]. Any module on the perimeter of the obstacle (as determined by a missing neighbor) attempts to route a message to the unoccupied lattice location. In its futile attempt to reach its destination, the message circumnavigates the entire obstacle before returning to its sender.

The 3D duplication algorithm must synchronize all these planar processes. Concavities in the object to be duplicated need careful processing because they can create planes that have two or more disjoint groups of modules. To route messages from one group to the other, we use recent developments in 3D geographic routing [12].

IV. 2D DUPLICATION

To understand the 3D duplication algorithm, one must understand 2D duplication. For more detail, see [5]. The 2D algorithm (see Figure 3) is a five-step process:

1) Encapsulation and Localization–the modules solidify around the original object and establish a coordinate system.

2) Shape Sensing / Leader Election–each module bordering on the original object transmits a SENse message (which includes the module's hard-coded UID) that circumnavigates the object using the bug algorithm. As modules forward these messages to their neighbors, they discard messages with lower UIDs than their own. As a result, only a single SENse message completes its circumnavigation and returns to its sender. This module is elected the obstacle leader. The returning SENse message has learned the perimeter, area, and bounding box of the obstacle.

3) Border Notification–The obstacle leader sends another message around the border of the original shape which follows the same path taken by the SENse message. This DUPlication message prompts each module bordering on the original object to transmit a BORder message. These BORder messages are each addressed to a conjugate border module that is offset, in a user-specified direction, by an automatically calculated distance. The offset distance is determined by the size of the obstacle's bounding box as learned during shape sensing, and this distance is carried in the DUPlication message. When each conjugate border module receives its BORder message, it sends a CONfirmation message back to the obstacle leader. The obstacle leader compares the number of received CONfirmation messages to the perimeter of the obstacle to determine when the border notification phase is complete.

4) Shape Fill–With a closed duplicate border established, the obstacle leader floods the system with a FILl message. This message has an inside bit that is initially cleared. As the FILl message propagates, the inside bit is flipped every time the message passes through the duplicate border. As a result, every module inside the duplicate border will receive a FILl message with the inside bit set, and every module outside will receive the message with the bit cleared. Each module that receives a FILl message with the inside bit set sends a CONfirmation message to the obstacle leader (whose coordinates are provided as part of the FILl message). The obstacle leader compares the number of received CONfirmation messages to the area of the obstacle to determine when the fill phase is complete.

5) Self-Disassembly–Once all modules that comprise the duplicate shape have been notified of their status, the obstacle leader starts the disassembly process by flooding the system with a DISassemble message. Upon receiving this message, all modules except those in the duplicate structure break their bonds with their neighbors to reveal the duplicate shape.

The 2D duplication algorithm uses the bug algorithm [13] for all message routing. In particular, its ability to route SENse and DUPlication messages along the face of the obstacle is crucial to the algorithm's success. A SENse message routed with the bug algorithm determines the obstacle's perimeter by counting the number of times it collides with the obstacle (see Figure 5). It also determines the obstacle's area by integrating on a row-wise basis. While not shown in Figure 3, modules on the exterior border of the entire assembly also believe themselves to be bordering on an obstacle, so they also send SENse messages, which will circumnavigate the exterior border of the assembly. When one of these messages has completed its circuit, the area it carries will be negative, so the system can differentiate the exterior and interior borders.

V. 3D DUPLICATION

As illustrated in Figure 4, the key to the complete 3D duplication algorithm is to virtually cut the initial block of modules into individual planes. As shown in the z = 2 plane, a single cut plane may contain multiple distinct groups of modules. We call each of these groups a slice.

At a high level, each slice executes the basic 2D duplication process semi-independently, but the slices must synchronize and exchange information for the duplication to succeed. As a result, there are unique steps in the 3D algorithm that have no counterpart in the 2D case. The duplication process is initiated by sending one module on the exterior of the raw block of material a start message specifying (1) the slice plane; (2) the coordinate direction in which the duplicate should be formed; and (3) which of the module's faces is an exterior face of the initial block. This start module then assumes its position is (0, 0, 0) and that it has a standard orientation. Once begun, the 3D duplication algorithm has 10 steps:

1) Encapsulation and Localization–As in the 2D case, the modules in the system solidify around the shape to be duplicated and exchange messages to learn their positions and orientations relative to the start module.

2) Hull Tree Construction–For 3D routing, we employ an algorithm [12] based on convex hull trees.
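The inside-bit rule used during the fill phase can be illustrated with a one-dimensional sweep across a single lattice row. This is a simplified Python model, not the authors' C implementation, and it assumes a duplicate border that is one module thick:

```python
# Simplified parity model (not the authors' C code) of the FILl message's
# inside bit, shown as a sweep across one lattice row. The bit is flipped
# every time the message passes through the duplicate border, so modules
# enclosed by a one-module-thick closed border see the bit set.

def fill_bits(row):
    """row: booleans, True where a duplicate-border module sits.
    Returns True for the modules that should stay bonded as interior
    modules of the duplicate (border modules already know their status)."""
    inside, marked = False, []
    for is_border in row:
        if is_border:
            inside = not inside     # crossing the border flips the inside bit
        marked.append(inside and not is_border)
    return marked

# A border at positions 1 and 4 encloses modules 2 and 3:
print(fill_bits([False, True, False, False, True, False]))
```

In the real system the bit travels with the FILl flood through the 2D lattice rather than in a sweep, but the parity argument is the same: a module separated from the exterior by an odd number of border crossings is inside the duplicate.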

[Figure 3: eight lattice snapshots, (a) through (h). Legend: original shape; sense, duplication, border, fill, and disassembly messages received; leader module; module included in final structure; message source/destination.]

Fig. 3. After localization, the distributed duplication algorithm begins in (a) by routing a SENse message around the border of the obstacle. As shown in (b), the message sent by the module with the highest unique ID (marked with an asterisk) will eventually return to its sender, prompting that module to route a DUPlication message around the border of the obstacle (c). Upon receiving a DUPlication message, a module sends a BORder message to its conjugate that will become the border of the duplicate object. After all duplicate border modules have sent CONfirmation messages back to the obstacle leader (d), the leader broadcasts a FILl message (e) informing modules contained by the new border that they are part of the duplicate shape and causing them to send CONfirmation messages back to the leader (f). Upon receiving all confirmation messages, the leader broadcasts a DISassemble message (g) causing all modules except those in the duplicate shape to self-disassemble (h) [5].
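The leader election illustrated in Fig. 3(a) and (b) can be modeled in a few lines. This is a hypothetical Python sketch, not the authors' C implementation; the function and variable names are ours, and the real SENse messages also accumulate the obstacle's area and bounding box as they travel:

```python
# Hypothetical sketch (not the authors' C implementation) of the UID-based
# leader election: every border module launches a SENse message carrying
# its UID, and a module forwards a message only if the message's UID
# exceeds its own, so exactly one message survives a full circumnavigation.

def elect_obstacle_leader(border_uids):
    """border_uids: unique IDs of the border modules, in the order the bug
    algorithm visits them. Returns (leader_uid, perimeter_in_hops)."""
    n = len(border_uids)
    completed = []
    for start in range(n):                  # each border module sends a SENse message
        uid, hops = border_uids[start], 0
        for step in range(1, n + 1):        # forward the message around the border
            holder = border_uids[(start + step) % n]
            if holder > uid:                # a higher-UID module discards the message
                break
            hops += 1
        else:
            completed.append((uid, hops))   # the message returned to its sender
    assert len(completed) == 1              # only the highest-UID message survives
    return completed[0]

print(elect_obstacle_leader([3, 7, 2, 9, 4, 1]))  # (9, 6): module 9 becomes leader
```

The returned hop count plays the role of the perimeter field carried by the surviving SENse message.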

Each module stores a single convex hull that encompasses its position and the positions of all its descendant modules in the tree. During this step, the tree is built from the start module, which becomes the root of the tree, down to the leaves. Then, starting at the leaves, the convex hulls are constructed and propagated upward back to the start module, whose convex hull holds the positions of all modules in the system.

The 3D routing algorithm routes messages greedily, moving each message directly towards its destination whenever possible. When blocked by an obstacle, the message switches to traversing the convex hull tree. In particular, a message only descends into a node if that node's convex hull contains the message's destination. If the message exhausts all of the convex hulls associated with a module's children, the message then moves up the tree to the module's parent. We simplify the computation and storage requirements for the algorithm described in [12] by distilling each convex hull into a simple rectangular bounding box. Despite some theoretical loss in performance, the algorithm works well with this simplification.

Fig. 4. Here a 12x6x4 block of material encasing a 4x4x2 tube (transparent) is sliced along the x-y plane. We reference all coordinates to the start module in the lower-front-left corner. Each distinct group of modules within a slice plane (there are two in the z = 2 plane) is termed a slice and has a slice leader (blue) that is always on the slice's exterior border. Additionally, each slice has an inter-slice parent link module (green) that can be located arbitrarily. The arrows point from inter-slice parent link modules to their parent slices. Finally, each obstacle has an associated obstacle leader (red).

3) Shape Sensing–Within each slice, shape sensing operates nearly identically in 3D as it does in 2D, using the bug algorithm to route SENse messages around any apparent border. The only difference is that, in addition to electing an obstacle leader, the algorithm also elects a leader for the entire slice. This is illustrated in Figure 5. Just as each obstacle leader is the module on the border of the obstacle with the largest UID, the slice leader is the module on the border of the slice with the largest UID. Slice leader modules are differentiated from obstacle leaders because the SENse message that circumnavigates the exterior border of the slice will return to the slice leader indicating a negative area. The magnitude of this number is the actual area of the slice, including the space consumed by any obstacles.

4) Roll Call–The new slice leader broadcasts its position to all modules in the slice. Each module then replies with a RoLl Call message indicating whether it has zero, one, or two out-of-slice-plane neighbors. Obstacle leaders supplement their returned RoLl Call messages with the size of the obstacle they represent. By counting the returning RoLl Call messages, the slice leader can positively account for the entire area of the slice. Additionally, the slice leader learns how many out-of-slice-plane neighbors the slice has.

5) Slice Tree Construction–The algorithm constructs a tree in which each node is a slice. The root of the tree is the slice
92
containing the start module. During construction, each slice knows that it has accounted for all possible children when each of its out-of-slice-plane neighbor modules reports that it has a parent. This is detailed below. The slice tree is used to synchronize all slices after several of the following steps.

[Figure 5: annotated grid of the z = 2 plane (x = 0-11, y = 0-5). Annotations: small slice area: -4, small slice perimeter: 8; large slice area: -72, large slice perimeter: 36; obstacle area: 16, obstacle perimeter: 16. Legend: slice leader / SENse message, obstacle leader / SENse message.]

Fig. 5. Examining the z = 2 plane from Figure 4, we see that each of the two slices detects its exterior border by allowing a SENse message (in blue) from the border module with the highest UID (the slice leader) to trace the slice's exterior by virtually colliding with all of the missing modules. The larger outer slice contains an obstacle, so the obstacle leader also transmits a SENse message (in red) that makes a complete circuit around the obstacle. For each −x (+x) collision, the messages increment (decrement) their area count field by their current x-coordinate (x-coordinate +1). The messages increment their perimeter field after any collision.

[Figure 6: the modules of Figure 4 projected onto the x-z plane (x = 0-11, z = 0-4), with each of the six slices outlined. Legend: inter-slice parent link module, parent pointer, inter-slice parent tree root.]

Fig. 6. To aid inter-slice communication, the duplication algorithm forms a tree on the slices in the initial structure. Here, we see the modules of Figure 4 projected onto the x-z plane and each of the six slices outlined in a different color. The solid green modules are inter-slice parent link modules, and they serve as connection points from a slice to its parent. Every module in a slice learns the location of its inter-slice parent link module, so a message can be forwarded from any location (in this case, from the slice leader of the small slice in the z = 2 plane) to the root inter-slice parent link module.

6) Offset Distance Consensus–The slices need to agree on where the duplicate shape should be placed. To do so, each slice transmits the bounding box surrounding all obstacles in the slice to its parent slice. The union of all these bounding boxes propagates up the slice tree to the root slice. The slice tree root module can then determine the global offset necessary to prevent a duplicate object in which the slices are skewed relative to one another. The root slice broadcasts this offset so that it can be incorporated into the BORder messages sent to the conjugate border modules.

7) Exterior Face Determination–The 3D duplication algorithm must duplicate both the convex and concave portions of the original shape. For example, in Figure 4, the interior border of the tube must be duplicated along with the exterior border, or else the duplicate object will become a solid block instead of another tube. This means that in addition to duplicating the border of any obstacle contained within a slice, slices must also duplicate their exterior borders. The algorithm makes one exception to this rule: it does not duplicate any slice's exterior border if that border is also an exterior border of the entire block of material. We explain the differentiation process below.

8) Border Notification–As in the 2D case, all border modules (except modules on the exterior of the entire block of material) send BORder messages to their conjugate modules that will become the border of the duplicate shape. The conjugate border modules reply with CONfirmation messages that are counted by either the obstacle leader or slice leader (depending on the type of border being duplicated–interior or exterior, respectively). When the obstacle and slice leaders have received CONfirmation messages from each duplicate border module for which they are responsible, they each send secondary CONfirmation messages to their respective inter-slice parent link modules. Once the inter-slice parent link module has received secondary CONfirmation messages for each obstacle, the slice as a whole, and any child slices, it forwards a secondary CONfirmation message to its parent slice. Eventually, secondary CONfirmation messages will propagate to the root of the slice tree, and the root module will know that the border notification process is complete.

9) Shape Fill–The shape fill procedure in 3D is similar to the process in 2D. The FILl messages still carry the inside bit that is toggled every time the message crosses the duplicate border. The messages also need a live flag that is cleared when a message crosses a slice border. Until the live flag is again set–when the message crosses a border module–the inside flag is ignored. The reason for the live flag is that an included module in one slice has no way to determine whether a module in a neighboring slice is also included. Ignoring some caveats addressed below in Section V-C, the shape fill phase terminates just like the border notification phase. Each included module sends a CONfirmation message to the appropriate obstacle leader. Then the obstacle leader sends a secondary CONfirmation message to its inter-slice parent link module. The inter-slice parent link module waits for this and secondary CONfirmation messages from all child slices before propagating the secondary CONfirmation up the slice tree to the root.

10) Self-Disassembly–Once the slice tree root receives CONfirmation messages from all child slices, it floods the network with DISassemble messages causing all modules except those forming the duplicate shape to disassemble.

A. Synchronization

To enable synchronization before the border notification, shape fill, and disassembly steps, the algorithm needs a way to ensure that all slices have completed the active step. To do so, the system must determine the total number of slices, so it builds a tree of slices. Figure 6 shows an example. (Note: this is separate from the convex hull tree used for 3D routing.) The slice tree is built from the root downward. The module originally given the start signal by the user informs its slice's
leader that the leader should also be the root of the slice tree. (This is why, in Figure 4, the module at (5, 5, 0) is both blue and green.) Once the slice leader is told that it is also the root of the slice tree, it broadcasts its location to all other modules in its slice. As a result, all modules in the slice learn the location of what we term their inter-slice parent link module. Once a module knows the location of its inter-slice parent link module, it can service incoming requests from neighboring slices looking for a parent in the slice tree.

In the neighboring slices that are not yet incorporated in the slice tree, all modules send parent request messages to their out-of-slice-plane neighbors. Eventually, some out-of-slice-plane module (which already knows the position of its inter-slice parent link module) responds. The module in the unincorporated slice to which it responds becomes that slice's inter-slice parent link module (after checking with its slice's leader module to ensure that no other module has already been appointed as the inter-slice parent link). That is, the module is the location of the link to the parent slice. This process repeats until all slices have an inter-slice parent link module.

When a slice is incorporated into the tree, the modules in the slice inform all of their out-of-slice-plane neighbors that they are now a part of the tree. Because each slice knows (thanks to the roll call step) how many out-of-slice-plane neighbors it has, each slice can determine when all of its out-of-slice-plane neighbors have been incorporated into the tree. Therefore, we can guarantee that all slices are incorporated into the tree.

B. Exterior Face Identification

When duplicating even a simple 3D shape like a coffee mug, the algorithm must account for both the concave and convex parts of the object's border. In the case of a tube, the concave, or interior, part of the tube's face will correspond to the exterior border of multiple slices. The algorithm must duplicate the exterior border of these slices but not duplicate the exterior borders of slices that also serve as the exterior border of the initial block of material. Figure 5 provides an example: the algorithm should duplicate the exterior border of the inner 2-by-2 slice, but it should not duplicate the exterior border of the 12-by-6 slice that surrounds the smaller slice. We term the larger slice an exterior slice.

Our approach to differentiating exterior borders relies on the bug routing algorithm. By default, all exterior border modules assume that they should be duplicated. The start module initiates the exterior face identification process because it was told (as part of the start command) which of its faces was an exterior face. The start module uses the bug algorithm in an attempt to route two EXTerior messages in the direction of the specified exterior face. The first EXTerior message is routed in the slice plane, and the second is routed orthogonal to it. Because the bug algorithm is fundamentally a 2D algorithm, these messages will remain in their given planes. These EXTerior messages will circumnavigate the exterior border. As they pass through the modules on their routes, they notify those modules that they too are on the exterior of the whole block. Additionally, they prompt those modules to emit their own EXTerior messages. Specifically, a message arriving in the slice plane will prompt an out-of-slice-plane message, and vice versa.

When an in-slice-plane EXTerior message completes its circuit around an exterior slice, it sends a CONfirmation message to the root inter-slice parent module. This root can determine when the entire process is complete because it knows, as a result of the slice tree construction step, how many exterior slices comprise the entire structure.

C. Area Accounting During Shape Fill

One particular challenge of 3D duplication is that two slices occupying the same plane do not know of each other's existence. Consider again Figure 5. During the shape fill step, the obstacle leader in the outer slice expects to receive sixteen CONfirmation messages. It does not know that the innermost four of those lattice positions are not part of the duplicate obstacle. Our solution, illustrated in Figure 7, is to send a fake CONfirmation message from a conjugate border module corresponding to the leader of the interior slice.

[Figure 7: annotated grid (x = 0-11, y = 0-5). Legend: CON msg., BOR msg. rcvd., border pointer, obstacle leader, BOR msg., included, actual border, slice leader.]

Fig. 7. The obstacle leader of the exterior slice expects to receive sixteen CONfirmation messages after the shape fill process begins. Although the obstacle leader does not know it, the obstacle is hollow. To ensure that the obstacle leader still receives all CONfirmation messages, the slice leader of the inner slice sends a special BORder message to its conjugate instructing it to confirm four additional units of area despite the fact that the conjugate module is not included in the duplicate structure.

First, whenever an exterior border of a slice is duplicated, the slice leader, when sending a BORder message to its conjugate, includes its slice's area. This is shown in Figure 7. Second, during the shape fill process, the slice leader's conjugate border module sends a CONfirmation message to the outer slice's obstacle leader even though it is not part of the duplicate shape. This CONfirmation message is special because it accounts for four units of area, not just one like other messages. As a result of this two-step process, the outer slice's obstacle leader accounts for all sixteen units of area.

D. Storage and Message Scaling

The algorithm requires only O(1) storage per module. No part of the algorithm requires a module to amass any data that correlates with the number of modules in the system. (Note: we ignore the O(log n) scaling of the number of bits required to store variables whose sizes are proportional to n.) The key to this attribute is the one-to-one correspondence between modules on the border of the original shape and modules on
the border of the duplicate. When collecting CONfirmation messages, modules do not track the origins of the messages, only the total number received. The convex hull tree only requires a constant amount of storage per module because each module needs to store the rectangular hulls of at most six neighbors. Likewise, messages only move up the slice tree, so each module only needs to know the location of its inter-slice parent module. All other routing information is constant in size and stored in the messages themselves.

The number of messages exchanged scales as O(n²) in total, or O(n) per module, and it is dominated by the exterior notification, shape sensing, and border notification steps. The worst-case message scaling occurs when the initial block of material approaches a 1-by-n line of modules. In this case, there will be O(n) modules sending EXTerior messages, and each message will have to circumnavigate O(n) other modules before returning to its sender. The same scaling applies to the shape sensing phase if the object being duplicated also approaches a long rod: O(n) modules will each transmit a SENse message, and messages may travel O(n) hops before being discarded. During border notification, there will again be O(n) modules sending messages that each have to travel O(n) hops before reaching their conjugates.

VI. EXPERIMENTS

Using a custom simulator [6], we performed over 450 experiments duplicating rods, cubes, square tubes, the mug shown in Figure 1, a 103-module hammer, and a 128-module wrench. The results are listed in Table I. The overall success rate was 100%. A video of the system in action is linked from the Supplementary Material section below.

TABLE I
Experiments show the 3D duplication algorithm working correctly in a variety of test cases.

Original Shape   Encasing Shape   # Trials   # Successes   Avg. Msgs./Trial
2x1x1 Rod        7x3x3 Block         25          25              5409
3x1x1 Rod        9x3x3 Block         25          25              7413
4x1x1 Rod        11x3x3 Block        25          25              9788
5x1x1 Rod        13x3x3 Block        25          25             12170
6x1x1 Rod        15x3x3 Block        25          25             14757
7x1x1 Rod        17x3x3 Block        25          25             18487
1x1x1 Cube       5x3x3 Block         25          25              3602
2x2x2 Cube       7x4x4 Block         25          25             10199
3x3x3 Cube       9x5x5 Block         25          25             21768
4x4x4 Cube       11x6x6 Block        25          25             41725
5x5x5 Cube       13x7x7 Block        25          25             71698
6x6x6 Cube       15x8x8 Block        25          25            112410
7x7x7 Cube       17x9x9 Block        25          25            182720
4x4x4 Tube       11x6x6 Block        25          25             44381
5x5x5 Tube       13x7x7 Block        25          25             77990
6x6x6 Tube       15x8x8 Block        25          25            131670
7x7x7 Tube       17x9x9 Block        25          25            230320
Figure 1 Mug     17x7x7 Block        25          25            115020
Hammer           22x5x14 Block        5           5            290830
Wrench           20x4x18 Block        5           5            249098

Figure 8 shows a number of different statistics collected as we duplicated cubes with side lengths ranging from 1–7. In particular, Figure 8(a) plots the total number of inter-module messages exchanged as a function of the number of active modules in the system. The seven points along the x-axis correspond to cubes with side lengths 1–7. As expected, the total number of messages scales quadratically (0.266n² + 78.037n) with the number of active modules in the system. Even though the n² term will dominate past a few hundred modules, we expect the number of messages per module to scale as O(n). Figure 8(b) illustrates this: both the average number of messages per module and the maximum number of messages exchanged by any given module scale linearly. It is worth noting that the average number of messages per module is significantly less than the maximum as the size of the cube being duplicated grows. This relationship is better illustrated by Figure 8(c), which shows a histogram of the number of messages exchanged by all modules in the system as it is used to duplicate cubes. When similar plots are generated for the rods and tubes that we also duplicated, we see behavior similar to that in Figure 8.

While theory predicts that the O(n) scaling shown in Figure 8(b) will continue as the number of active modules grows indefinitely, simulations containing more than a few thousand modules are difficult. Each module is simulated as an independent process, and the modules use UDP packets to communicate with their neighbors. Several thousand concurrent processes begin to tax all but the most powerful PCs. Additionally, given that each module uses a unique UDP port to communicate with each neighbor, simulations with a few thousand modules quickly exhaust the available supply of port numbers. Neither of these issues will be a concern when running the algorithm on dedicated robotic hardware.

VII. DISCUSSION

We have demonstrated a distributed algorithm capable of autonomously duplicating arbitrary 3D shapes in a modular robot system. The algorithm and implementation are generic and lend themselves to easy hardware implementation. Additionally, we have shown the algorithm running correctly in hundreds of test cases, with some experiments using over 1200 simulated modules. The total memory required in each module is fixed, while the total number of messages exchanged scales as O(n²). We hope to reduce this bound so that the number of messages exchanged per module is sub-linear.

Our algorithm opens several important problems. It is sensitive to any dropped messages. It is not guaranteed to operate if there are modules missing from the lattice. It will also not work in completely irregular lattices where the modules are randomly arranged and may have more than six neighbors. In the future, we hope to develop extensions that solve both these challenges. Additionally, we hope to eliminate the need for the user to specify in which cardinal direction, with respect to the original object, the duplicate should be placed. Our work raises the question of how to create scaled duplicates or multiple copies of a single original. Finally, if the object to be duplicated cannot fit into the initial block of material, or if a scaled duplicate is too big to be created in one pass, we need to find a way to break the object into subcomponents that can be produced independently and then joined. This paper is
[Figure 8: three plots. (a) Total # Msgs. to Duplicate Size-n Cube, broken down by phase (Localization/Hull Tree, Sensing, Ext. Notification, Border Notification, Shape Fill, Disassembly, Total); y-axis in units of 10⁵. (b) # Msgs. per Module to Duplicate Size-n Cube (Average and Maximum). (c) Histogram of # Msgs. per Module for size-1 through size-7 cubes. The x-axes of (a) and (b) show Active Modules in System: 43, 96, 171, 268, 387, 528, 691.]

Fig. 8. When duplicating cubes with edge lengths 1–7, the total number of messages exchanged by all modules scales quadratically (a). The number of messages per module scales linearly (b). Furthermore, only a few modules exchange a relatively high number of messages; the numbers of messages exchanged by most modules are clustered near the average (c). In both subfigures (a) and (b), the 7 x-axis tick marks correspond to cubes with edge lengths 1–7.
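The quadratic scaling can be checked directly against the cube rows of Table I. The sketch below is ours, not part of the paper's tooling; it assumes the zero-intercept model form a·n² + b·n quoted in the text, and the recovered coefficients should land near (though not exactly at) the quoted 0.266 and 78.037, depending on the fitting procedure:

```python
import numpy as np

# Cube experiments from Table I: active-module counts for edge lengths
# 1 through 7, and the average number of messages per trial.
n = np.array([43, 96, 171, 268, 387, 528, 691], dtype=float)
msgs = np.array([3602, 10199, 21768, 41725, 71698, 112410, 182720], dtype=float)

# Least-squares fit of msgs ~ a*n^2 + b*n (no constant term, matching the
# form of the fit quoted in the text).
A = np.column_stack([n ** 2, n])
(a, b), *_ = np.linalg.lstsq(A, msgs, rcond=None)
print(f"total messages ~ {a:.3f}*n^2 + {b:.3f}*n")
```

Since the n² coefficient is small, the linear term dominates for the system sizes tested here, which is consistent with the near-linear per-module counts in subfigure (b).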
a first step, and with additional research, we hope to make distributed duplication a practical reality.

ACKNOWLEDGMENTS

Many thanks to our excellent reviewers. This work is supported by the US Army Research Office under grant number W911NF-08-1-0228, the NSF through Grants 0735953 and 1138967, and the NDSEG fellowship program.

SUPPLEMENTARY MATERIAL

http://www.youtube.com/watch?v=XmtXIHCRN c

REFERENCES

[1] J. Bishop, S. Burden, E. Klavins, R. Kreisberg, W. Malone, N. Napp, and T. Nguyen. Programmable Parts: A Demonstration of the Grammatical Approach to Self-Organization. In IROS, pages 3684–3691, August 2005.
[2] C. Chiang and G. S. Chirikjian. Modular Robot Motion Planning Using Similarity Metrics. Autonomous Robots, 10:91–106, 2001.
[3] S. Funiak, P. Pillai, M. P. Ashley-Rollman, J. D. Campbell, and S. C. Goldstein. Distributed Localization of Modular Robot Ensembles. IJRR, 28(8):946–961, 2009.
[4] K. Gilpin, K. Kotay, D. Rus, and I. Vasilescu. Miche: Modular Shape Formation by Self-Disassembly. IJRR, 27:345–372, 2008.
[5] Kyle Gilpin and Daniela Rus. A distributed algorithm for 2D shape formation with smart pebble robots. In ICRA, page In Press, 2012.
[6] Kyle W. Gilpin. Shape Formation by Self-Disassembly in Programmable Matter Systems. PhD thesis, MIT, 2012.
[7] S. C. Goldstein, J. Campbell, and T. Mowry. Programmable Matter. IEEE Computer, 38(6):99–101, 2005.
[8] Saul Griffith, Dan Goldwater, and Joseph M. Jacobson. Self-replication from random parts. Nature, 437:636, Sept 28 2005.
[9] Chris Jones and Maja J. Matarić. From Local to Global Behavior in Intelligent Self-Assembly. In IEEE International Conference on Robotics and Automation (ICRA), pages 721–726, 2003.
[10] M. E. Karagozler, S. C. Goldstein, and J. R. Reid. Stress-driven MEMS Assembly + Electrostatic Forces = 1mm Diameter Robot. In IROS, pages 2763–2769, 2009.
[11] B. T. Kirby, B. Aksak, J. D. Campbell, J. F. Hoburg, T. C. Mowry, P. Pillai, and S. C. Goldstein. A Modular Robotic System Using Magnetic Force Effectors. In IROS, pages 2787–2793, 2007.
[12] B. Leong, B. Liskov, and R. Morris. Geographic Routing without Planarization. In Networked Systems Design and Implementation, 2006.
[13] V. J. Lumelski and A. A. Stepanov. Dynamic path planning for a mobile automaton with limited information on the environment. Trans. on Automatic Control, 31(11):1058–1063, 1986.
[14] P. Pillai, J. Campbell, G. Kedia, S. Moudgal, and K. Sheth. A 3D Fax Machine based on Claytronics. In IROS, pages 4728–4735, 2006.
[15] Michael Rubenstein and Wei-Min Shen. Automatic Scalable Size Selection for the Shape of a Distributed Robotic Collective. In IROS, pages 508–513, Oct 2010.
[16] Mike Rubenstein, Christian Ahler, and Radhika Nagpal. Kilobot: A low cost scalable robot system for collective behaviors. In ICRA, page In Press, May 2012.
[17] M. T. Tolley, M. Kalontarov, J. Neubert, D. Erickson, and H. Lipson. Stochastic Modular Robotic Systems: A Study of Fluidic Assembly Strategies. Trans. on Robotics, 26(3):518–530, June 2010.
[18] J. Walter, E. Tsai, and N. Amato. Algorithms for Fast Concurrent Reconfiguration of Hexagonal Metamorphic Robots. Trans. on Robotics, 21(4):621–631, 2005.
[19] P. White, V. Zykov, J. Bongard, and H. Lipson. Three Dimensional Stochastic Reconfiguration of Modular Robots. In RSS, pages 161–168, June 2005.
[20] P. J. White, M. L. Posner, and M. Yim. Strength Analysis of Miniature Folded Right Angle Tetrahedron Chain Programmable Matter. In ICRA, pages 2785–2790, 2010.
[21] M. Yim, W.M. Shen, B. Salemi, D. Rus, M. Moll, H. Lipson, E. Klavins, and G. S. Chirikjian. Modular Self-Reconfigurable Robot Systems: Challenges and Opportunities for the Future. IEEE RAM, 14(1):43–52, 2007.
An Object-Based Approach to Map Human Hand Synergies onto
Robotic Hands with Dissimilar Kinematics
G. Gioioso, G. Salvietti, M. Malvezzi and D. Prattichizzo
Abstract—Robotic hands differ in kinematics, dynamics, programming, control, and sensing frameworks. Borrowing the terminology from software engineering, there is a need for middleware solutions to control robotic hands independently of their specific structure, focusing only on the task. Results in neuroscience concerning the synergistic organization of the human hand are the theoretical foundation of this work, which focuses on the problem of mapping human hand synergies onto robotic hands with dissimilar kinematic structures. The proposed mapping is based on the use of a virtual ellipsoid and is mediated by a model of an anthropomorphic robotic hand able to capture the idea of synergies in human hands. This approach has been tested on two different robotic hands with anthropomorphic and non-anthropomorphic kinematic structures.

This work was partially supported by the European Commission with the Collaborative Project no. 248587, "THE Hand Embodied", within the FP7-ICT-2009-4-2-1 program "Cognitive Systems and Robotics" and the Collaborative EU-Project "Hands.dvi" in the context of ECHORD (European Clearing House for Open Robotics Development).
G. Gioioso, G. Salvietti, M. Malvezzi and D. Prattichizzo are with the Department of Information Engineering, University of Siena, 53100 Siena, Italy [gioioso, salviettigio, malvezzi]@dii.unisi.it
G. Gioioso and D. Prattichizzo are also with the Department of Information Engineering, University of Siena and with the Department of Advanced Robotics, Istituto Italiano di Tecnologia, via Morego, 30, 16163 Genova [gioioso, prattichizzo]@dii.unisi.it

[Figure 1: a Mapping Algorithm block maps human hand synergies onto an Industrial Gripper, a 3-Finger Hand, a 4-Finger Hand, and a 5-Finger Hand.]

Fig. 1. Mapping between human synergies and robotic hands.

I. INTRODUCTION

Among the many approaches presented in the literature, bio-inspired control of robotic hands seems to be one of the more promising approaches [1], [2]. A deeper understanding of how the brain exploits the high redundancy of the human hand could represent a key issue in the next generation of control algorithms. In the last two decades, neuroscientists have studied the organization of the human hand in grasping and manipulation tasks using accurate hand tracking systems [3]. In particular, some studies demonstrated that, notwithstanding the complexity of the human hand, a few variables are able to account for most of the variance in the patterns of human hand configuration and movement [4]. These conclusions were based on the results of experimental tests in which subjects were asked to perform grasping actions on a wide variety of objects. Data were recorded by means of data gloves and were analysed with principal component analysis (PCA) techniques. The results showed that the first two principal components account for most of the variability in the data: more than 80% of the variance in the hand postures. In this context the principal components were referred to as synergies, to capture the concept that, in the sensorimotor system of the human hand, combined actions are favoured over individual component actions, with advantages in terms of simplification and efficiency of the overall system.

This reduction of Degrees of Freedom (DoFs) can be used to decrease the complexity of control algorithms for robotic hands. Intuitively, these findings in neuroscience can be used in robotics to control robotic hands with an anthropomorphic structure closely copying the structure of human hands. This is easy to understand for anthropomorphic robotic hands, but what happens if the robotic hand has a completely different kinematics? Can we still take advantage of the sensorimotor synergies as studied in neuroscience? This chapter provides an answer to these questions, focusing on the development of a unified framework for programming and controlling robotic hands with dissimilar kinematics.

This generalization can be achieved through a mapping algorithm able to reproduce the movements due to a synergistic control of a paradigmatic model of the human hand onto the robotic hands. This mapping function focuses on the task space, avoiding consideration of the peculiar kinematics of the robotic hands. In Fig. 1 the idea of the proposed approach is pictorially represented.

In this paper we present a method to map human synergies onto robotic hands by using a virtual object. The target is to reproduce deformations and movements exerted by a paradigmatic human-like hand on a virtual ellipsoid computed as the minimum volume ellipsoid containing suitable
reference points that lie on the hand structure. This makes it possible to work directly in the task space, avoiding a specific projection between different kinematics. The paper generalizes the main idea described in [5], introducing the possibility to model different types of virtual object shapes and deformations.

The paper is organized as follows. In Section II a review of the literature is presented. Section III summarizes the main results concerning grasp properties in synergy-actuated hands and describes the mapping method in detail. In Section IV some simulations are shown to confirm the effectiveness of the proposed approach, while the advantages and drawbacks of the proposed method, together with conclusions and future work, are outlined at the end.

II. RELATED WORK

Mapping algorithms are frequently used in several applications, in particular in tele-manipulation and learning by demonstration. The typical approach adopted to tele-manipulate robotic hands is to use datagloves to collect human hand motion data. In [6], for example, a DLR Hand is controlled through a CyberGlove, while in [7] a master-slave teleoperation system is developed to evaluate the effectiveness of tele-presence in tele-robotics applications.

In learning by demonstration applications, data collected by human actions are not directly used to control robotic hands, but to improve the grasping performance by teaching the robot the correct posture to obtain stable grasps. In [8], the authors evaluated human grasps during an arm transportation sequence in order to learn and represent grasp strategies for different robotic hands, while in [9] a programming by demonstration system for grasp recognition in manipulation tasks and robot pre-grasp planning is developed.

If the considered robotic hand is strongly anthropomorphic, the data collected from humans can be used straightforwardly in both tele-manipulation and learning by demonstration applications. When robotic hands have different kinematics, mapping strategies have to be taken into account. There are three main approaches in the literature to deal with this problem: joint-to-joint mapping, Cartesian space or fingertip mapping, and pose mapping.

The aim of the first approach is to make the poses of human and robotic hands look similar. Ciocarlie and Allen, for instance, used this method to take advantage of human hand synergies in robotic grasp pre-shaping [10]. They tested four different robotic hands with this procedure: the human hand joint values were directly mapped into the joints of the anthropomorphic hands, while some empirical solutions were adopted with the non-anthropomorphic hands. This represents the simplest way to map movements between hands. However, the joint-to-joint method has to be redefined according to the kinematic characteristics of the hands, making a generalization of the control strategy difficult, especially since performance notably decreases with non-anthropomorphic structures.

Cartesian space mappings focus on the geometric relations between the two workspaces. This solution is suitable for representing the fingertip positions and is a natural approach when, for example, precision grasps are considered. In [11] a point-to-point mapping algorithm is presented for a multi-fingered telemanipulation system where fingertip motion of the human hand is reproduced with a three-finger robotic gripper. In [12] the authors used a virtual finger solution to map movements of the human hand onto a four-fingered robotic hand. However, these solutions do not guarantee a correct mapping in terms of forces and movements exerted by the robotic hand on a grasped object.

Pose mapping can be considered a particular form of indirect joint angle mapping. Its basic idea is to establish a correlation between human hand poses and robotic hand poses. For example, Pao and Speeter [13] developed an algorithm that tries to translate human hand poses to corresponding robotic hand positions, without loss of functional information and without the overhead of kinematic calculations. In [14] neural networks were used to learn the hand grasping postures. However, the proposed solutions can produce unpredictable motions of the robot hand, and thus in our opinion are only exploitable in those cases where basic grasp postures are required.

Besides the above-mentioned methods, combinations of them and some original solutions, such as the method proposed in [15], are also present in the literature. In [16] a virtual object-based mapping is proposed. The object-based scheme assumes that a virtual circle is held between the user's thumb and index finger. Important parameters of the virtual object (the size, position and orientation) are scaled independently and non-linearly to create a transformed virtual object in the robotic hand workspace. This modified virtual object is then used to compute the fingertip locations of the robotic hand, which in this case is a simple two-finger, four-DoF gripper.

A 3D extension of the last method is presented in [17]. Even if this extension allows more cases to be analysed, the method is still not general enough for our purposes. In particular, it is constrained by the kinematics of the master and slave hand, and by the number of contact points (three) and their locations (the fingertips), which have to be the same for both hands. Thus, it can be used only for a given pair of human and robotic hands and for precision grasp operations.

The approach proposed in this paper is inspired by the last two mentioned methods. The main contributions of our work with respect to the methods proposed in [16], [17] are here summarized: our approach generalizes the mapping to a generic number of contact points, which can be different in the human and robotic hands, and there are no constraints on the positions of the contact points on the master and slave hands. Moreover, to the best of our knowledge, none of the cited papers deals with mapping synergies of the human hand, defined as a reduced dimension of the configuration space, onto robotic hands.

III. OBJECT-BASED MAPPING

The main equations necessary to study hands controlled by synergies from a kinematic point of view are here introduced.
98
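Before the derivation, the two linear relations that this section builds on (contact point velocities from joint velocities through a hand Jacobian, and joint velocities from synergy velocities through a synergy matrix) can be illustrated with toy numbers. All dimensions and matrix values below are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Illustrative dimensions (assumptions): nq = 4 actuated joints,
# nz = 2 synergies, nc = 3 contact velocity components.
rng = np.random.default_rng(0)
J = rng.standard_normal((3, 4))  # hand Jacobian: joint rates -> contact point rates
S = rng.standard_normal((4, 2))  # synergy matrix: synergy rates -> joint rates

z_dot = np.array([1.0, -0.5])    # synergy velocity input

q_dot = S @ z_dot                # joint velocities driven along the synergies
p_dot = J @ q_dot                # resulting contact point velocities

# The two maps compose into a single linear map from synergy space
# to the contact point velocity space.
assert np.allclose(p_dot, (J @ S) @ z_dot)
```

The mapping developed below composes several such configuration-dependent linear maps (Jacobians, grasp-like matrices and a scaling) in exactly this fashion.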
A more detailed presentation of the problem is given in [18]; further details on grasping can be found in [19].

Let us consider a hand grasping an object, with no distinction between human or robotic. The hand can be represented as a mechanical interface composed of a series of kinematic chains, the fingers, that share a common base, the palm. The fingers and/or the palm are connected to the manipulated object by some contact patches, usually represented as points (even though, due to finger and object compliance, they extend over finite areas).

Once the contact points on the hand are defined, conventional robotic analysis tools provide a relationship between their locations with respect to the hand palm and the finger joint variables (direct kinematics). This relationship can be differentiated with respect to time, giving the differential kinematic equation that relates the contact point velocities ṗ ∈ ℜ^{n_c} to the joint velocities q̇ ∈ ℜ^{n_q}:

    ṗ = J q̇    (1)

where J ∈ ℜ^{n_c × n_q} is the hand Jacobian, n_q is the number of actuated joints and n_c is the number of contact variables (components of the contact point twists) constrained by the contact model.

The transpose of the hand Jacobian relates the grasping contact forces λ ∈ ℜ^{n_c} to the hand joint torques τ ∈ ℜ^{n_q}:

    τ = J^T λ    (2)

The contact point twist components on the grasped object are related to the object reference frame twist ν_o = [ȯ^T ω^T]^T by the following relationship:

    ṗ = G^T ν_o    (3)

where G^T denotes the transpose of the grasp matrix G ∈ ℜ^{6 × n_c}. The grasp matrix relates the grasping contact forces λ to the external wrench w ∈ ℜ^6 on the object:

    w = G λ

For further details on the definition and computation of the hand Jacobian and grasp matrix, the reader is referred to [19].

According to a model inspired by human hand synergies, we suppose that the hand is actuated using a number of inputs whose dimension is lower than the number of hand joints. These inputs are collected in a vector z ∈ ℜ^{n_z} that parametrizes the hand motions along the synergies. In this paper we define the postural synergies as a joint displacement aggregation corresponding to a reduced-dimension representation of hand movements, according to a compliant model of joint torques.

In other terms, the reference vector q_ref ∈ ℜ^{n_q} for the joint variables is obtained as a function of the postural synergies z ∈ ℜ^{n_z}, with n_z ≤ n_q:

    q_ref = f_z(z)    (4)

This approach differs from other works in which the synergies are considered perfectly stiff and the actual joint variables are a linear combination of synergies [20], [21]. In a compliant actuation model, the torque applied by the joint motors is proportional to the difference between the reference and the actual joint displacement:

    τ = K_q (q_ref − q)    (5)

where K_q is the joint stiffness matrix. It represents the joint actuator stiffness, given by the static gain of the joint position control, and the structural compliance, due to the mechanical deformation of hand elements (joints, drive cables, links, etc.). The problem of computing the contact force distribution and the object movements that can be controlled by acting on the synergies has been studied in [18].

The relationship in eq. (4) can be differentiated with respect to time, obtaining

    q̇_ref = S ż    (6)

where the synergy matrix S ∈ ℜ^{n_q × n_z} is defined as

    S = ∂f_z / ∂z.

The columns of matrix S describe the shapes, or directions, of each synergy in the joint velocity space. It is worth noting that, if the relationship defined in eq. (4) is not linear, the synergy matrix S is not constant, but depends on the synergy value z.

In the following, the main contribution of the paper is presented. The target of this work is to determine a method to map a set of synergies defined on a reference human hand onto a generic robotic hand. A model of the human hand is thus defined; we refer to this model as the paradigmatic hand. The paradigmatic hand is a kinematic and dynamic model inspired by the human hand: it does not closely copy the kinematic and dynamic properties of the human hand, but rather represents a trade-off between the complexity of a human hand model accounting for the synergistic organization of the sensorimotor system and the simplicity, and accessibility, of the models of robotic hands available on the market. A detailed kinematic analysis of the paradigmatic hand with postural synergies is reported in [22].

Let the paradigmatic hand be described by the joint variable vector q_h ∈ ℜ^{n_qh}, and assume that the subspace of all configurations can be represented by a lower-dimensional input vector z ∈ ℜ^{n_z} (with n_z ≤ n_qh) which parametrizes the motion of the joint variables along the synergies, q_h = S_h z, where S_h ∈ ℜ^{n_qh × n_z} is the synergy matrix. In terms of velocities one gets

    q̇_h = S_h ż.    (7)

The ultimate goal of this work is to find a way of controlling the joint variables q̇_r ∈ ℜ^{n_qr} of the robotic hand in a synergistic way using the vector of synergies z of the paradigmatic hand. In other terms, we want to design a map S_r to steer the robotic joint variables as follows:

    q̇_r = S_r ż    (8)

where the map S_r depends on the synergy matrix S_h and on other variables, as explained in the following.

Remark 1: Actually, according to the compliant model previously summarized, and according to eq. (6), in eq. (7)
and (8) we should indicate the reference hand joint values q̇_{h,ref} and q̇_{r,ref}, respectively. It is worth noting that the mapping procedure explained in this paper considers a virtual grasp, i.e. a grasp in which no real contact forces are present; then, according to eqs. (2) and (5), there is no difference between the reference and the actual hand configuration. However, the mapping procedure can be applied even when a real grasp is considered: in this case it will provide the reference value of the joint variables, which will differ from the actual one, the difference depending both on the contact force magnitude and on the system compliance. In the following equations, for the sake of simplicity, we will not use the subscript ref to indicate the reference joint values.

In this work, we propose a method of projecting synergies from the paradigmatic to robotic hands which explicitly takes into account the task space. One of the main advantages of designing a mapping strategy in the task space is that the results can be used for robotic hands with very dissimilar kinematics. The idea is to replicate the task performed with the paradigmatic hand using the robotic hand with the projected synergies.

The mapping is defined assuming that both the paradigmatic and the robotic hands are in given configurations q_{0h} and q_{0r}.

Remark 2: Note that the mapping procedure can be applied for any pair of reference configurations q_{0h} and q_{0r}, i.e. the paradigmatic and robotic hand configurations can be set independently. However, the choice of very dissimilar initial configurations may lead to hand trajectories that appear very different in the configuration space, although they produce, on the virtual object, the same displacement and the same deformation. Very dissimilar initial configurations may also lead to other types of problems; for example, one of the hands may reach its joint limits or singular configurations while the other could still move further.

Given the configuration q_{0h}, a set of reference points p_h = [p_{1h}^T, ···, p_{ih}^T, ···]^T is chosen on the paradigmatic hand. In this work we considered the fingertips as reference points. However, the algorithm can also be applied choosing other reference points, for example on the intermediate phalanges or on the hand palm; furthermore, the number of reference points is not fixed a priori. Adding other reference points can be useful when a power grasp is taken into account.

The virtual ellipsoid is then computed as the minimum-volume ellipsoid containing the reference points in p_h (Fig. 2). Note that in general the reference points do not lie on the ellipsoid surface. Let us parametrize the virtual ellipsoid by its center o_h and by the lengths s_{jh} (j = 1, 2, 3) of its semi-axes. The motion of the hand due to the activation of the synergies could be described using a large set of parameters. In this paper we simplify the problem by assuming a rigid-body motion, defined by the linear and angular velocities of the ellipsoid center, ȯ_h and ω_h respectively, and a non-rigid strain represented by the variations of the semi-axes. Let ṡ_j be the derivative of the j-th semi-axis length.

Fig. 2. Mapping synergies from the human (paradigmatic) to the robotic hand: the fingertip positions of the paradigmatic hand define the virtual ellipsoid. Activating the human hand synergies, the ellipsoid is moved and strained; the same motion and strain are imposed on the virtual ellipsoid defined for the robotic hand.

Although the virtual ellipsoid does not represent an object grasped by the paradigmatic hand, it can easily be shown that, with a suitable model of joint compliance and contact compliance, the rigid-body motion of the virtual ellipsoid corresponds to the motion of a grasped ellipsoidal object, and that the non-rigid motion accounts for the normal components of the contact forces in an ellipsoidal object grasp. By representing the motion of the hand through the virtual object, the motion of the generic reference point p_{ih} can be expressed as

    ṗ_{ih} = ȯ_h + ω_h × (p_{ih} − o_h) + ∑_{j=1}^{3} ṡ_{jh} ((p_{ih} − o_h)^T ŝ_{jh}) ŝ_{jh},    (9)

where the ŝ_{jh} are the unit vectors (versors) along the ellipsoid semi-axes.

Grouping all the reference point motions one gets

    ṗ_h = A_h [ȯ_h^T  ω_h^T  ṡ_{1h}  ṡ_{2h}  ṡ_{3h}]^T,    (10)

where the matrix A_h ∈ ℜ^{n_ch × 9} is defined as

    A_h = [ I   −S(p_{1h} − o_h)   ((p_{1h} − o_h)^T ŝ_{1h}) ŝ_{1h}   ···
            ···        ···                      ···                 ···
            I   −S(p_{ih} − o_h)   ((p_{ih} − o_h)^T ŝ_{1h}) ŝ_{1h}   ···
            ···        ···                      ···                 ··· ]    (11)

with S(v) denoting the skew-symmetric matrix such that S(v)x = v × x.

Matrix A_h depends on the motion that has to be reproduced on the robotic hand, and consequently it depends on the task. Considering a hard-finger contact model [19], the matrix A_h can be seen as the grasp matrix for the virtual object.

From (1), (7) and (10) we can evaluate the virtual ellipsoid motion and deformation as a function of the synergy velocity vector ż of the paradigmatic hand:

    [ȯ_h^T  ω_h^T  ṡ_{1h}  ṡ_{2h}  ṡ_{3h}]^T = A_h^# ṗ_h = A_h^# J_h S_h ż,    (12)
where A_h^# denotes the pseudo-inverse of the matrix A_h. These motions and deformations have to be mapped onto the robotic hand. Let us consider the robotic hand in a given configuration q_{0r} ∈ ℜ^{n_qr}, with a vector of selected reference point locations p_r ∈ ℜ^{n_cr}. Note that no hypotheses were imposed on the number of reference points on the paradigmatic human and robotic hands (in general we can have n_ch ≠ n_cr), nor on their locations, nor on the initial configurations of the two hands. The same use of the virtual ellipsoid is applied here: find the minimum ellipsoid enclosing the reference points, and denote by o_r its center coordinates and by s_{jr} the lengths of its semi-axes.

In order to take into account the difference between the dimensions of the human and the robotic workspace, a scaling factor is introduced. This scaling factor is obtained by considering two virtual spheres, computed on the human and on the robotic hand as the minimum-volume sphere containing the reference points. The virtual object scaling factor is then defined as the ratio between the radii of the two spheres: k_sc = r_r / r_h. Note that the scaling factor depends on the hand dimensions, but also on their configurations.

The motion and deformation of the virtual ellipsoid generated by the paradigmatic hand are scaled and tracked by the virtual ellipsoid referred to the robotic hand:

    [ȯ_r^T  ω_r^T  ṡ_{1r}  ṡ_{2r}  ṡ_{3r}]^T = K_c [ȯ_h^T  ω_h^T  ṡ_{1h}  ṡ_{2h}  ṡ_{3h}]^T    (13)

where the scale matrix K_c ∈ ℜ^{9×9} is defined as

    K_c = [ k_sc I_{3,3}   0_{3,3}   0_{3,3}
            0_{3,3}        I_{3,3}   0_{3,3}
            0_{3,3}        0_{3,3}   I_{3,3} ].    (14)

According to eqs. (10) and (11), the corresponding robot reference point velocities are given by

    ṗ_r = A_r [ȯ_r^T  ω_r^T  ṡ_{1r}  ṡ_{2r}  ṡ_{3r}]^T,    (15)

where the matrix A_r ∈ ℜ^{n_cr × 9} is defined as

    A_r = [ I   −S(p_{1r} − o_r)   ((p_{1r} − o_r)^T ŝ_{1r}) ŝ_{1r}   ···
            ···        ···                      ···                 ···
            I   −S(p_{ir} − o_r)   ((p_{ir} − o_r)^T ŝ_{1r}) ŝ_{1r}   ···
            ···        ···                      ···                 ··· ]    (16)

Recalling eqs. (12) and (13) we can express the robotic hand reference point velocities ṗ_r as a function of the synergy velocities ż:

    ṗ_r = A_r K_c A_h^# J_h S_h ż    (17)

and, considering the robot hand differential kinematics ṗ_r = J_r q̇_r, where J_r ∈ ℜ^{n_cr × n_qr} is its Jacobian matrix, the following relationship between the robot hand joint velocities and the synergy velocities is obtained:

    q̇_r = J_r^# A_r K_c A_h^# J_h S_h ż.    (18)

The synergy mapping S_r in (8) for the robotic hand is then defined as

    S_r = J_r^# A_r K_c A_h^# J_h S_h.    (19)

Note that the paradigmatic hand synergy matrix S_h is mapped to the synergy matrix S_r for the robotic hand through the matrix J_r^# A_r K_c A_h^# J_h, which is a function of the paradigmatic and robotic hand configurations (q_{0h} and q_{0r}) and of the locations of the reference points on the paradigmatic and robotic hands (p_h and p_r).

The proposed algorithm is consequently a non-linear mapping between the paradigmatic human-like hand and the robotic hand. The obtained synergy matrix is not constant and depends on the hand configurations.

IV. SIMULATIONS

We validated the proposed approach on a modular three-fingered 9-DoF robotic hand and on a model of the DLR-HIT II Hand, with five fingers and 15 DoFs [23]. We compared our results with the joint-to-joint mapping and the fingertip mapping methods [10], [11]. Other mapping methods [14], [16] were not taken into account since they cannot easily be extended to kinematic structures that differ from those proposed in their original applications.

The grasp of two different objects was considered: a sphere and a cube. The paradigmatic and robotic hand joint variables, and the contact points in the initial configurations, were known. Starting from this initial given grasp, we modified the reference joint values according to the previously described methodology. Since the hand is grasping an object, by activating the synergies both the contact forces and the position and orientation of the grasped object vary. The relationships between the input synergy values and the output variable values are detailed in [18]. Algorithm performance was evaluated by comparing the object motion directions and the grasp quality obtained by controlling the robotic hands with the above-mentioned algorithms [18], [22].

Grasp quality evaluation was performed using both qualitative and quantitative metrics in order to evaluate the force-closure properties of the grasp, as described in [24]. The qualitative metric returns a boolean value indicating whether the obtained grasp is force-closure. The quantitative aspect of the grasp quality is expressed using a penalty function; the resulting index represents the inverse of the distance of the grasp from violating the contact constraints. All details of the indexes used can be found in [24].

In the first simulation, a spherical object was considered and the reference points for the human and robotic hands were chosen on their respective fingertips. We considered the paradigmatic hand grasping a sphere with the fingertips of the thumb, index, middle and ring fingers, while for the modular three-fingered hand and the DLR-HIT II hand we considered three and four contact
points, respectively. This emphasizes the independence of our method from the number of selected contact points. The paradigmatic and robot hand grasps that were analysed are shown in Fig. 3. The computed scaling factors k_sc were 1.7 and 1.9 for the modular and the DLR-HIT hand, respectively.

Fig. 3. The human-like hand (top-left), the modular three-finger robotic hand (top-right) and the DLR-HIT II Hand (bottom) grasping a sphere with four, three and four contact points, respectively.

The obtained results are summarized in Table I for the DLR hand and for the modular hand, respectively. Each row corresponds to controlling the hands with one synergy or a combination of synergies; the analysis was carried out considering the first three synergies and their combinations. The second column shows the grasp quality indexes for the human-like hand controlled with synergies, while the third one reports those of the robotic hand controlled with the reference joint values obtained with the proposed virtual ellipsoid mapping. The fourth and fifth columns refer to the joint-to-joint mapping and to the fingertip mapping, respectively [10], [11]. The performance is expressed by means of the cost function measuring the grasp quality described in [24]. The selected cost function basically represents a sort of distance between the contact forces and the direction normal to the contact surface; lower values thus represent sets of contact forces that are farther from the friction cone boundaries and are therefore better from the grasp stability point of view. This cost function can be evaluated only if the grasp is force-closure; empty values in the table mean that, for that method and those selected synergies, no force closure is achievable. More details on the evaluation of grasp quality measures in synergy-actuated robotic hands can be found in [22].

Let us analyse, for example, the results shown in Table I for the DLR hand. For the paradigmatic hand (second column), force closure can be obtained by activating the first, the first two, or the first three synergies; no force closure can be obtained by activating only the second or the third synergy. By increasing the number of synergies from one to three, the grasp quality clearly increases (the cost function value decreases). If we consider the virtual ellipsoid mapping method described in the preceding section, we obtain better results with respect to the joint-to-joint and fingertip methods. The same quality indexes were evaluated considering the grasp of an object with a different shape, a cube. The obtained results are summarized in Table II for the DLR hand (the modular hand presents qualitatively similar results). Observing the obtained results, we can conclude that, concerning the grasp quality index, the virtual ellipsoid mapping gets closer to the human-like grasp behaviour in all the analysed cases, both for spherical and cubic objects.

TABLE I
GRASP QUALITY EVALUATION FOR THE SPHERICAL OBJECT.

DLR-HIT II hand

Synergies  | P    | VE    | JtJ   | F
Syn 1      | 0.2  | −     | −     | −
Syn 2      | −    | −     | −     | −
Syn 3      | −    | −     | −     | −
Syn [1-2]  | 0.14 | 0.046 | −     | 0.079
Syn [1-3]  | 0.09 | 0.037 | 0.057 | 0.049

Modular hand

Synergies  | P    | VE   | JtJ   | F
Syn 1      | 0.2  | −    | −     | −
Syn 2      | −    | −    | −     | −
Syn 3      | −    | −    | −     | −
Syn [1-2]  | 0.14 | 0.02 | 0.113 | −
Syn [1-3]  | 0.09 | 0.02 | 0.020 | −

P: paradigmatic hand, VE: Virtual Ellipsoid, JtJ: Joint to Joint, F: Fingertip

TABLE II
GRASP QUALITY EVALUATION FOR THE CUBIC OBJECT.

DLR-HIT II hand

Synergies  | P    | VE    | JtJ | F
Syn 1      | 0.3  | −     | −   | −
Syn 2      | −    | −     | −   | −
Syn 3      | −    | −     | −   | −
Syn [1-2]  | 0.20 | 0.156 | −   | −
Syn [1-3]  | 0.14 | 0.146 | −   | −

P: paradigmatic hand, VE: Virtual Ellipsoid, JtJ: Joint to Joint, F: Fingertip

We further evaluated the error arising from the mapping of a rigid-body motion of an ellipsoidal object. The rigid-body motion was obtained by applying a constant synergy rate, i.e. ż = const. We first evaluated the trajectory of the virtual ellipsoid with the paradigmatic hand. We then compared this trajectory with those obtained with the DLR-HIT II hand by applying the proposed virtual ellipsoid (VE), the joint-to-joint (JtJ) and the fingertip (F) mapping procedures. In the virtual ellipsoid algorithm we used a scale factor k_sc = 1, in order to compare the trajectories of the paradigmatic and robotic hands directly. According to the geometric
8
 7
4.5
ïï 6
4
5
3.5

3 4

2.5 3

2 2
1.5 1
1
0
0.5 1.5

0
(a) (b) (c)
1.4
1.2 1 1
1
1 0.8
0.8
0.6
0.4
0.5 0.5
0.4
0.6 Fig. 5. The first synergy mapped onto the modular hand: (a) starting posi-
0.2
0
−0.2 −0.5
0

0
0.2 tion, (b) middle position, (c) end position. In the top-left the corresponding
0
paradigmatic hand postures.
Fig. 4. Trajectories in the workspace of the center of the ellipsoidal object
for the modular (left) and DLR-HIT II (right) hand. In the two figures
the object trajectory followed by the paradigmatic hand is shown. The
trajectory obtained on the robotic hand using the Virtual Ellipsoid mapping
is then compared with the ones obtained with the Fingertip and Joint-to-Joint
mapping as shown by the legend. Axis units are in cm.

analysis presented in [18], that relates the dimensions of (a) (b) (c)
controllable internal contact force and rigid body motion
subspaces to the number of actuated synergies, we observed Fig. 6. The second synergy mapped onto the modular hand: (a) starting
position, (b) middle position, (c) end position. In the bottom-right the
that, if an object rigid body motion has to be produced, corresponding paradigmatic hand postures.
at least four synergies have to be considered with three
contact points (with a Hard Finger contact model) and seven
synergies with four contact points. For the DLR-HIT II hand
analysis we then considered the first seven synergies. The (1) (2)
(1) (2)
arising trajectories of the center of the ellipsoidal object for (1) (2)

modular and DLR-HIT II hand are shown in Fig. 4. As it


can be seen from the plots, the trajectories obtained for the
robotic hand with the virtual ellipsoid procedure is very close (a) (b) (c)
to those obtained with the paradigmatic hand, with respect
to the other mapping methods. This result is clearly due to Fig. 7. Object rigid body motion obtained controlling the first four syner-
gies: (a) starting position, (b) middle position, (c) end position. The arrows
the fact that the mapping itself is based on the replication, labeled with (1) represent the movement imposed by the paradigmatic hand,
in the workspace, of the same rigid body motion of a virtual while the arrows labeled with (2) represent the motion of the object.
object.
V. E XPERIMENTS
The proposed mapping procedure has been validated by VI. D ISCUSSION
some experiments performed with a fully-actuated robotic The proposed mapping strategy, based on mimicking be-
hand with a modular structure. Each module (42 × 33 × haviour of human hand synergies, could be the basis of an
16mm) has one DoF and it can be easily connected to the interface between a higher level control, that defines the
others obtaining kinematic chains that we can consider as synergy reference values z, and the robotic hand. The high
fingers. These chains are connected to a common base that level can be thought as independent from the robotic hand
can be thought as a palm. In the proposed configuration each structure. The interface, based on the proposed mapping
finger has three DoFs, thus the hand has globally nine DoFs. strategy, represents the low level control stage whereby the
The first two synergies of the paradigmatic hand, mapped input synergies are translated into reference joint values
on the modular robotic hand according to eq. (19), are shown which actually control the robotic hand.
in Fig. 5, 6. This mapping has been tested in manipulation tasks. Work
Concerning the rigid body object motion obtained control- is in progress to validate the virtual ellipsoid mapping also
ling the robotic hand with synergies calculated according to for the approaching phase of grasps. Simulation results
the method described in the preceding sections and summa- are very interesting in terms of performances as shown in
rized in eq. (19), the results are shown in Fig. 7. the previous section. However this approach presents some
Although the used device represents a trivial example of drawbacks.
robotic hand, the complexity and the high number to DoFs The proposed mapping is based on a heuristic approach:
to control are, in our opinion, a possible benchmark to we choose to reproduce a part of the hand motion, which
validate our approach. Furthermore its kinematic structure practically corresponds to move and squeeze an ellipsoidal
is significantly different from the paradigmatic hand one, so object. Although squeezing and moving an object explains
it could be useful to test how the proposed mapping method a wide range of tasks, many other possibilities exist in
behaves with very dissimilar hand structures. manipulating objects which are not modelled with this map-

103
ping. Work is in progress to generalize the proposed method [6] M. Fischer, P. Van der Smagt, and G. Hirzinger, “Learning techniques
enriching the possible motions to be reproduced. in a dataglove based telemanipulation system for the dlr hand,” in
Robotics and Automation, Proceedings. 1998 IEEE Int. Conf. on,
Differently from the joint to joint mapping, with respect vol. 2, May 1998, pp. 1603–1608.
to which the proposed method gets better performances, here [7] H. Hu, J. Li, Z. Xie, B. Wang, B. Liu, and G. Hirzinger, “A
Sr is not a constant matrix but it depends on both the human robot arm/hand teleoperation system with telepresence and shared
control,” in Advanced Intellingent Mechatronics, Proceedings, 2005
and robotic hand configurations and by the reference position IEEE/ASME Int. Conf. on, 2005, pp. 1312–1317.
points of the human and robotic hands which should be given [8] S. Ekvall and D. Kragic, “Interactive grasp learning based on human
in this work. demonstration,” in Robotics and Automation,Proceedings, 2004 IEEE
Int. Conf. on, vol. 4, May 2004, pp. 3519–3524.
[9] J. Aleotti and S. Caselli, “Grasp recognition in virtual reality for robot
VII. C ONCLUSION AND F UTURE W ORK pregrasp planning by demonstration,” in Robotics and Automation,
Proceedings, 2006 IEEE Int. Conf. on, 2006, pp. 2801–2806.
[10] M. T. Ciocarlie and P. K. Allen, “Hand posture subspaces for dexterous
Designing synergy-based control strategies in the paradig- robotic grasping,” The International Journal of Robotics Research,
matic hand domain can dramatically reduce the dimension- vol. 28, no. 7, pp. 851–867, July 2009.
ality of the grasping and manipulation problems for robotic [11] A. Peer, S. Einenkel, and M. Buss, “Multi-fingered telemanipulation
- mapping of a human hand to a three finger gripper,” in Robot and
hands. However, an efficient mapping is needed to deal with Human Interactive Communication, Proceeding, 2008 IEEE Int. Symp.
robotic hands with dissimilar kinematics. We proposed a on, August 2008, pp. 465–470.
method for mapping synergies that using a virtual object [12] J. Liu and Y. Zhang, “Mapping human hand motion to dexterous
robotic hand,” Robotics and Biomimetics, Proceedings, 2007 IEEE
allows to work directly in the task space thus avoiding the Int. Conf. on, pp. 829–834, Dec. 2007.
problem of dissimilar kinematics between human-like hand [13] L. Pao and T. Speeter, “Transformation of human hand positions for
and robotic hands. We compared our solution to the most robotic hand control,” in Robotics and Automation, Proceedings, 1989
IEEE Int. Conf. on. IEEE, 1989, pp. 1758–1763.
used solutions existing in literature and we evinced that the [14] P. Gorce and N. Rezzoug, “A method to learn hand grasping posture
proposed method is more efficient in terms of mapped grasp from noisy sensing information,” Robotica, vol. 22, no. 03, pp. 309–
quality and direction of motion. Our preliminary results seem 318, 2004.
[15] S. B. Kang and K. Ikeuchi, “Robot task programming by human
to be very promising, they were performed on a robotic demonstration: mapping human grasps to manipulator grasps,” in
three fingers 9 DoFs hand, a modular hand with a kinematic Intelligent Robots and Systems, Proceedings, 1994 IEEE/RSJ Int. Conf.
structure very different from that of the human hand, and on, vol. 1, September 1994, pp. 97–104.
[16] W. Griffin, R. Findley, M. Turner, and M. Cutkosky, “Calibration
also on a 15 DoFs DLR HIT II robotic hand. The results and mapping of a human hand for dexterous telemanipulation,” in
showed the effectiveness of the proposed method. Further Haptic Interfaces for Virtual Environments and Teleoperator Systems,
investigation on different robotic hands is in progress. One Proceedings, 2000 ASME/IMECE Symp. on, 2000, pp. 1–8.
[17] H. Wang, K. Low, M. Wang, and F. Gong, “A mapping method for
of the main issues of our approach is that the mapping is non-linear and that its implementation could need a high computational burden. Ongoing research is focused on the evaluation of the conditions whereby some simplification can be applied to get constant or slowly varying mappings. As future work, moreover, an integration with a grasping simulator like Grasp-it! [25] is expected in order to use its grasp planner to determine initial configurations of the human and the robotic hands. The choice of the initial configurations will be based on performance indexes such as those used in manipulability analysis [26].

REFERENCES

[1] B. B. Edin, L. Ascari, L. Beccai, S. Roccella, J. J. Cabibihan, and M. Carrozza, "Bio-inspired sensorization of a biomechatronic robot hand for the grasp-and-lift task," Brain Research Bulletin, vol. 75, no. 6, pp. 785-795, 2008.
[2] L. Zollo, S. Roccella, E. Guglielmelli, M. Carrozza, and P. Dario, "Biomechatronic design and control of an anthropomorphic artificial hand for prosthetic and robotic applications," Mechatronics, IEEE/ASME Transactions on, vol. 12, no. 4, pp. 418-429, 2007.
[3] J. F. Soechting and M. Flanders, "Flexibility and repeatability of finger movements during typing: Analysis of multiple degrees of freedom," Journal of Computational Neuroscience, vol. 4, pp. 29-46, 1997.
[4] M. Santello, M. Flanders, and J. Soechting, "Postural hand synergies for tool use," The Journal of Neuroscience, vol. 18, no. 23, pp. 10105-10115, December 1998.
[5] G. Gioioso, G. Salvietti, M. Malvezzi, and D. Prattichizzo, "Mapping synergies from human to robotic hands with dissimilar kinematics: an object based approach," in IEEE ICRA 2011 Workshop on Manipulation Under Uncertainty, Shanghai, China, 2011.
... telemanipulation of the non-anthropomorphic robotic hands with initial experimental validation," in Robotics and Automation, Proceedings, 2005 IEEE Int. Conf. on, April 2005, pp. 4218-4223.
[18] D. Prattichizzo, M. Malvezzi, and A. Bicchi, "On motion and force controllability of grasping hands with postural synergies," in Proceedings of Robotics: Science and Systems, Zaragoza, Spain, June 2010.
[19] D. Prattichizzo and J. Trinkle, "Grasping," in Handbook on Robotics, B. Siciliano and O. Khatib, Eds. Springer, 2008, pp. 671-700.
[20] C. Brown and H. Asada, "Inter-finger coordination and postural synergies in robot hands via mechanical implementation of principal components analysis," in Intelligent Robots and Systems, Proceedings, 2007 IEEE/RSJ Int. Conf. on. IEEE, 2007, pp. 2877-2882.
[21] M. Ciocarlie, C. Goldfeder, and P. Allen, "Dimensionality reduction for hand-independent dexterous robotic grasping," in Intelligent Robots and Systems, Proceedings, 2007 IEEE/RSJ Int. Conf. on, 2007, pp. 3270-3275.
[22] M. Gabiccini, A. Bicchi, D. Prattichizzo, and M. Malvezzi, "On the role of hand synergies in the optimal choice of grasping forces," Autonomous Robots, Springer, vol. 31, pp. 235-252, 2011.
[23] H. Liu, K. Wu, P. Meusel, N. Seitz, G. Hirzinger, M. Jin, Y. Liu, S. Fan, T. Lan, and Z. Chen, "Multisensory five-finger dexterous hand: The DLR/HIT Hand II," in Intelligent Robots and Systems, Proceedings, 2008 IEEE/RSJ Int. Conf. on, 2008, pp. 3692-3697.
[24] A. Bicchi, "On the closure properties of robotic grasping," The Int. J. of Robotics Research, vol. 14, no. 4, pp. 319-334, 1995.
[25] A. Miller and P. Allen, "Graspit! a versatile simulator for robotic grasping," Robotics & Automation Magazine, IEEE, vol. 11, no. 4, pp. 110-122, 2005.
[26] A. Bicchi and D. Prattichizzo, "Manipulability of cooperating robots with passive joints," in Robotics and Automation, Proceedings, 1998 IEEE Int. Conf. on, Leuven, Belgium, 1998, pp. 1038-1044.

What Types of Interactions Do Bio-Inspired Robot Swarms and Flocks Afford a Human?

Michael A. Goodrich, Brian Pendleton, and Sean Kerman
Computer Science Department
Brigham Young University
Provo, Utah USA
email: [email protected]

P. B. Sujit
Electrical and Computer Engineering Department
University of Porto
Portugal
email: [email protected]

Abstract—This paper uses simulations to identify what types of human influence are afforded by the flocking and swarming structures that emerge from Couzin's bio-inspired model [4]. The goal is to allow a human to influence a decentralized agent collective without resorting to centralized human control. Evidence is provided that, when nominal agents use switching-based control to respond to human-guided predators and leaders, the resulting behavior is responsive to human input but is obtained at the cost of causing the dynamic structure of the collective to follow a single flocking structure. Leaders are more effective in influencing coherent flocks, but predators can be used to divide the flock into sub-flocks, yielding higher performance on some problems. Introducing a so-called "stakeholder" leadership style makes it possible for a human to guide the agents while maintaining several different types of structures; doing so requires more than one human-controlled agent. We then demonstrate that it is possible to produce potentially useful emergent dynamics without centralized human control, and identify an important type of emergent dynamics: automatic switches between structure types.

I. INTRODUCTION

Steinberg identified human-robot interaction (HRI) with bio-inspired robot teams (BIRT) as an important research area for producing responsive, robust systems for complex surveillance and reconnaissance problems [18]. This paper emphasizes that subset of HRI which helps humans manage multiple remote robots [10, 6]. Similarly, this paper emphasizes that subset of BIRT work which encodes principles of biological societies [20] in robots [16] to produce bio-inspired collective phenomena in the robot teams.

Appropriately combining elements of HRI with BIRT should yield robot teams that can be efficiently managed by humans but that retain robust qualities in the presence of unreliability. Such teams should help break through the fan-out barriers imposed by conventional methods of supervisory control, and help open up the possibility of large-scale robotic teams being used in surveillance, reconnaissance, fire suppression, and emergency response.

Many bio-inspired teams exhibit robust spatio-temporal structures as attractors induced by decentralized control [21]. When adding human influence to bio-inspired teams, it is imperative that decentralized implementations and distributed communication be preserved. Most existing approaches to human interaction with BIRT use centralized modes of influence such as broadcasting parameter changes to all agents or implementing design-time consensus algorithms to support very specific collective behavior. The work presented in this paper is novel because it allows for decentralized human influence, which may yield systems that are more robust to natural variations or communication dropouts than centralized approaches. Thus, we contribute a decentralized implementation that allows (a) the human to interact with only a subset of agents, (b) all robots to be responsive to human influence through this subset, and (c) the collective to be robust. In other words, we want the human to be able to influence which collective behavior emerges and then shape how that collective behavior evolves.

This paper does not present a novel robot or user interface design. Rather, the paper evaluates what types of human influence are afforded by bio-inspired robot swarms and flocks. This evaluation uses case studies with low numbers of humans augmented by so-called "Oz of Wizard" studies that simulate the worst-case performance of a human [19]. Since we are measuring what properties of bio-inspired teams afford human interaction rather than human performance per se, it is sufficient to use small numbers of real humans plus some simple simulated human inputs to help identify the impact of various human influences on the behavior of the collective. The example tasks used in this paper are abstract and simple, but they are sufficient to provide insight into how humans can influence bio-inspired teams. More specifically, the tasks represent a large set of tasks that require coordinated robot movement to a desired location while maintaining a robust collective structure.

II. RELATED LITERATURE

Miller et al. advocate the use of playbook-style interactions between a human and a team of multiple robots [13], wherein a human calls plays that trigger predictable patterns of behavior. Simple plays, like grouping and searching, have been used to manage several large, simulated teams (50-200 robots) [8]. More complicated plays, like coordinated rendezvous or formation-following, have been applied to smaller teams [11], with some work potentially scaling to large numbers of agents using torus-like behaviors encoded as cyclic pursuit attractors [17]. One simple play is for a human to control a leader agent who then influences other agents through conventional or bio-inspired means [7].

Other recent work explores centralized methods for influencing patterns of collective behavior so that a human can guide the team [9, 7, 12]. More decentralized approaches include methods to modify potential fields in response to directions from a human operator [2] or a virtual agent [15]. Shifting from conventional robot teams to swarm-like teams, Bashyal and Venayagamoorthy [3] presented a "human-swarm interaction" (HSI) approach that provided a human with a partial plan and global information, and then allowed the human to adjust the autonomy of a small subset of swarm members to influence swarm behavior. The GUARDIANS project seeks to use swarm robotic technology to support firefighters [1, 14].

HSI builds on the many formal analyses of bio-inspired and interacting particle systems. Of particular relevance is work that analyzes the emergence of collective structures, including "translating states" and "rotating states" that are very similar to the flocks and toroids used in Couzin's and other work [21]. Others have analyzed how various forms of external influence can shape the way a collective moves through space [15].

III. HOW CAN A HUMAN INFLUENCE?

Consider the following general discrete-time model for allowing humans to influence BIRT agents:

    x^i_{t+1} = f^i(x^i_t, x^{¬i}_t) + g^i(x^i_t, u^{op}_t)    (1)

In the model, x^i_t represents agent i's state at time t, x^{¬i}_t represents the states of every other agent in the collective except for agent i, f^i is the switching controller (defined below) that encodes how the states of other agents affect agent i, and g^i encodes how the operator input, u^{op}_t, influences agent i. A switching controller is a nonlinear controller in which the type of response can change depending on the relative values of x^i_t and x^{¬i}_t; for example, if a predator is close to an agent, then the agent ignores all other agents and flees, but if no predator is close then the agent switches to a controller that causes the agent to try to align with its neighbors.

We assume that g^i depends only on the state of the agent, meaning that agent i does not consider the state of other agents when it computes the influence from the human. Furthermore, we assume that local communication between agents is perfect. Future work should address these simplifying assumptions.

In bio-inspired robot teams, two leadership models have been studied by others: lead-by-attraction and lead-by-repulsion [15]. Lead-by-repulsion is more commonly referred to as predation, where the leader is a predator and agents are prey, so the leader influences the behavior of the agents by pursuing them. By contrast, lead-by-attraction is often associated with the colloquial use of the word leadership, meaning that a leader is one that gets ahead of a group and the group follows. For simplicity, we will call lead-by-repulsion models predator-style and lead-by-attraction models leader-style.

Under the predator and leader styles, the human influences leader agents via g^i_t(x^i_t, u^{op}_t), and then these agents influence other agents via f. In these models, agents are influenced either by the human or by other agents but not by both. Thus, agents will either be leader/predator agents or they will be nominal agents. For leader/predator agents, f^i = 0 and g^i ≠ 0; for nominal agents, f^i ≠ 0 and g^i = 0.

In 2005, Couzin performed an empirical study that explored how informed individuals can guide the behavior of "animal groups on the move" [5]. This analysis evaluated only the influence of informed individuals on what we identify below as parallel groups. We refer to this type of informed individual as a stakeholder because the individual is influenced both by knowledge about the location of a goal and by the behavior of the group; the individual has a stake in reaching the goal and in staying with the collective. Unlike Couzin, we evaluate how stakeholders affect the behavior of more than parallel groups.

A stakeholder tries to sustain influence by remaining in close proximity to the collective, allowing itself to be influenced by the collective as well as to influence it. As before, for nominal agents f^i ≠ 0 and g^i = 0. By contrast to the leader- and predator-styles, the stakeholder is influenced both by the human and by the other agents (both f^i ≠ 0 and g^i ≠ 0). We discuss the precise implementation of the stakeholder model when we present results for this leadership style.

In the next section, we will show how Couzin's specific model fits the general model from this section, but discussing the general model gives some confidence that the results of this paper may generalize to other systems. Moreover, the distinction between switching and non-switching controllers that is evident from the general model is probably useful for other implementations that use some or all of the zones described in the next section.

IV. COUZIN'S MODEL

Couzin's model is a well-studied bio-inspired model that defines three circular zones around an agent: repulsion, orientation, and attraction, with radii of R_rep, R_ori, and R_att, respectively [4]. If there are any other agents in the zone of repulsion, then those agents induce repulsion forces and any agents outside of the zone of repulsion are ignored. If there are no other agents in the zone of repulsion, then any agent within the zone of attraction and/or the zone of orientation induces attraction and/or alignment forces. Couzin's model is thus a nonlinear controller where agents make zone-dependent switches between control laws.

In addition to the radii of the three zones, Couzin's model has four other critical parameters: noise, blindspot angle (φ), speed (s), and maximum turning rate (ω). In the interest of space, we only summarize the model as follows: an agent computes a desired direction by being repelled from all agents within the zone of repulsion (or aligned to and attracted to all agents within the zones of orientation/attraction) and then moves toward that desired direction by turning no more than ω radians per second.

We analyze Couzin's model for three reasons. First, it exhibits four qualitatively different behaviors, two of which have been shown to be attractors in other decentralized systems [21]. The presence of attraction, repulsion, and orientation zones is either typical of, or a superset of, the influences often found in other decentralized systems. Second, the model is compatible with many modern robot systems. The model assumes a constant speed with the only control coming in the form of changes in direction. This closely mimics the qualitative dynamics of unmanned aerial vehicles and is also compatible with four-wheeled robots that must maintain a constant speed. Moreover, sensing and communication between robots is restricted to a constrained communication distance, avoiding power, bandwidth and sensing limitations. Third, Couzin's model is fully decentralized and behaves robustly in the presence of noise. This robustness to noise is essential because the actual dynamics of real robots will naturally have individual variations and will also naturally have different dynamics from Couzin's system.

Simulations in this paper use a sampling rate of Δt = 0.2 Hz; a noise value sampled uniformly from [−0.2, 0.2] radians is added to the angular velocity each second. Following Couzin's example, all spatial units are defined with respect to R_rep, which is set to unity. Speed is defined with respect to this unit as well; that is, s = 1 means that an agent moves one repulsion radius per second. In Couzin's model, forward speed is constant for all agents.

The blindspot needs further explanation. An agent traveling in direction θ can only sense agents that are within ±φ radians of the direction of travel. The blindspot is any area outside of ±φ radians from the direction of travel. The blindspot opens the possibility of using visual sensing for real robots, but clearly does not go far enough toward modeling real robots by including inevitable things like occlusion.

A. Details of Couzin's Model

We now summarize Couzin's model. Couzin's model is a switching controller with dynamics generically given by x^i_{t+1} = f^i(x^i_t, x^{¬i}_t). The states are the position vector, c^i_t, and the direction of travel, θ^i_t, yielding x^i_t = [c^i_t, θ^i_t]^T. The position vector evolves over time as

    c^i_{t+1} = c^i_t + Δt [ s cos θ^i_t , s sin θ^i_t ]^T,

where s is the speed of the agent. The direction of travel evolves by first identifying and then tracking the desired direction of travel, θ̂^i_{t+1}. The desired direction of travel is obtained using the switching control law subject to the blindspot constraints as follows: an agent is repelled by all agents within the radius of repulsion; if no agents are within the radius of repulsion, then the agent is attracted to all agents within the radius of attraction and tries to align with all agents within the radius of alignment. Tracking is then performed using

    θ^i_{t+1} = θ̂^i_{t+1}                                 if |θ^i_t − θ̂^i_{t+1}| ≤ ωΔt,
    θ^i_{t+1} = θ^i_t − sign(θ^i_t − θ̂^i_{t+1}) ωΔt       otherwise.

Naturally, this equation assumes appropriate modulo arithmetic in angle space. Importantly, the time-delayed coupling of position and angle means that the dynamics for the angle at time t + 1 depend both on the angle at time t and the angle at time t − 1. This makes it feasible to believe that the reason that Couzin's model exhibits both a parallel group and a torus phase is because these phases are attractors for a double integrator dynamics model with attractors characterized by the necessary (but not sufficient) conditions (θ̇ = 0, θ̈ = 0) and (θ̇ = α, θ̈ = 0) for some constant α, respectively.

B. The Collective Structures from Couzin's Model

Figure 1 illustrates the four phases identified by Couzin; the figures were generated using the parameters shown in Table I. The figures illustrate agents as circles with a line emanating from them in their direction of travel. Each subfigure in Figure 1 is a snapshot at one time instant. The swarm phase is a highly dynamic collective structure that tends to stay stationary and exhibits short bursts of agent alignment interspersed with apparently random milling about the swarm's center. The toroid phase has agents that circle in the same direction around a relatively stationary "hole" in the middle of the structure. The dynamic and highly parallel structures both exhibit collective movement in a unified direction, with the highly parallel structure exhibiting greater alignment and greater group velocity.

Fig. 1. Snapshots of various phases from Couzin's model: (a) Swarm. (b) Torus. (c) Dynamic parallel group. (d) Highly parallel group.

The title of this paper uses the terms "swarms" and "flocks" as colloquial terms that represent these four types of collective structures. These colloquial terms indicate that the collective structure exhibits a high degree of spatio-temporal correlation between agents, in contrast to hive- and colony-like behaviors where agents may work collectively but over an apparently decoherent spatio-temporal horizon.
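To make the update concrete, the zone-based switching rule and the rate-limited heading update of Section IV-A can be sketched in a few lines of NumPy. This is an illustrative reconstruction, not the authors' simulation code: the function names, the synchronous update, and the treatment of overlapping orientation/attraction zones are our assumptions, and the default parameters correspond to the dynamic parallel row of Table I.

```python
import numpy as np

def angle_wrap(a):
    """Wrap an angle (or array of angles) to (-pi, pi]."""
    return (a + np.pi) % (2 * np.pi) - np.pi

def desired_direction(i, pos, theta, R_rep=1.0, R_ori=7.0, R_att=17.0,
                      phi=np.radians(150)):
    """Desired heading for agent i under the zone-based switching rule."""
    rep = np.zeros(2)   # repulsion accumulator
    soc = np.zeros(2)   # attraction + alignment accumulator
    for j in range(len(pos)):
        if j == i:
            continue
        offset = pos[j] - pos[i]
        r = np.linalg.norm(offset)
        bearing = np.arctan2(offset[1], offset[0])
        # Blindspot: only neighbors within +/- phi of the heading are sensed.
        if abs(angle_wrap(bearing - theta[i])) > phi:
            continue
        if r < R_rep:
            rep -= offset / r                           # repulsion
        else:
            if r < R_att:
                soc += offset / r                       # attraction
            if r < R_ori:
                soc += np.array([np.cos(theta[j]),
                                 np.sin(theta[j])])     # alignment
    d = rep if np.any(rep) else soc  # any repulsion overrides the social terms
    if np.allclose(d, 0.0):
        return theta[i]              # nothing sensed: keep current heading
    return np.arctan2(d[1], d[0])

def step(pos, theta, s=3.0, omega=np.radians(100), dt=0.2, noise=0.0,
         rng=None):
    """One synchronous update with the turning-rate limit omega*dt."""
    rng = rng or np.random.default_rng(0)
    new_theta = np.empty_like(theta)
    for i in range(len(pos)):
        err = angle_wrap(desired_direction(i, pos, theta) - theta[i])
        err = np.clip(err, -omega * dt, omega * dt)     # rate-limited tracking
        new_theta[i] = angle_wrap(theta[i] + err + rng.uniform(-noise, noise))
    new_pos = pos + dt * s * np.stack([np.cos(new_theta),
                                       np.sin(new_theta)], axis=1)
    return new_pos, new_theta
```

Calling `step` repeatedly on randomly initialized positions and headings should, for suitable choices of the zone radii, settle into one of the four phases described above.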

Phase         R_ori  ΔR_att  s  ω (°/s)  φ (°)
Swarm           1      10    3    100     150
Torus           3      12    5    100     150
Dynamic Par.    7      10    3    100     150
Highly Par.    11      15    5     70     120

TABLE I. PARAMETER VALUES USED TO IDENTIFY COUZIN'S PHASES FOR 50 AGENTS. R_att = R_ori + ΔR_att.

Couzin used group polarization and group angular momentum as a useful way to understand and detect the collective phases. Toroids spin and stay in place, so they have high angular momentum and low polarization; individual parallel groups tend to align and travel in a straight line together, so they have low angular momentum and high polarization.

V. SWITCHING CONTROLLERS

Couzin's model requires agents to switch controllers depending on whether there are other agents within R_rep, R_ori, or R_att. In this section, we add another controller to which the nominal agents switch depending on whether they are near a predator or leader. We included the phrase "switching controllers" in the title of this section to emphasize that agents change their behavior whenever they are near a leader (within R_lead) or predator (within R_pred). In the next two sections, we explore an alternative approach.

The experiments presented in this section only consider how parallel groups with parameters (R_rep, R_ori, R_att, φ) = (1, 14, 14, 45°) can be augmented with the new switching controller to respond to leaders and predators. This restriction is natural because the switching behaviors in the presence of a leader or a predator cause swarms and toroids to act as if they were parallel phases. Simply put, adding a new switching controller and human influence causes all phases to act like parallel phases, at least while human input is present.

The human influences the collective using either a leader or a predator. As described above, adding human influence to Couzin's model sets g^i = 0 for nominal agents and f^i = 0 for leaders/predators. The state of nominal agents is augmented to include the type of agent (nominal or leader/predator), and the switching control law for nominal agents is augmented as described in the following sections.

Using a predator or leader allows a human to have a centralized(1) locus of human influence, namely the predator or leader, while allowing collective behavior to emerge in a decentralized fashion as a leader influences a handful of nominal agents, those nominal agents affect a handful of others, and so on.

A. Predator Management

Predator-style influence is created from the original model by causing agents (illustrated as fish) to be repelled by a predator if within R_pred = 30. The predator moves slightly faster than the fish and can turn much more sharply. If a predator nears a nominal agent then that nominal agent is repelled by the predator. When the radius of attraction is greater than the predator's radius, the fish tend to stay close together even when the predator starts to "chase" them. This is illustrated in Figure 2, which shows a predator (shark) steering a group of fish.

Fig. 2. Using a predator to steer a dynamic parallel group.

B. Leader Management

The leader model is similar to the predator-based model above, but the fish are now attracted to the leader, producing a tendency for fish to follow the leader. We set R_lead = 30, meaning that the predator-based and leader-based experiments use identical parameters, with the only difference being that predators repel and leaders attract nominal agents.

C. Case Study: Leaders vs. Predators

Differences between predator and leader styles were studied in a scenario with 100 agents (fish) placed in a 120 × 120 area. Quantities of food, represented graphically as barrels in the simulation, are placed around the map to represent the information to be gathered. The "food" is depleted at 1 unit per second per fish whenever a fish is within √10 units; that is, the initial quantity of food is measured in fish-seconds. For example, if N fish are within range of food for t seconds, then N × t food units are depleted. In the simulations, R_rep < R_pred and R_rep < R_lead, meaning that agents follow the leader or are repelled by the predator unless the agents are about to collide with each other. The fish were not aware of the food unless they were in proximity to the food, but the human could see all food sources; thus, fish could consume food if nearby, but could not autonomously find food without human influence.

We performed case studies using both a leader and a predator that were (a) controlled by a single human and (b) controlled by a simulated human that used a worst-case zig-zag path in an "Oz of Wizard" style experiment. Using a convenience sample of six student volunteers, we obtained informed consent and instructed the participants to guide the fish in such a way that the food was consumed as quickly as possible. In the first phase of the experiment, participants were more effective using the predator than the leader because they used the predator to divide the 100 fish into different groups and distribute them around the environment. Leader-style influence didn't allow this divide-and-conquer approach, so performance was lower. In the second phase of the experiment, participants were instructed to maintain a cohesive group.

(1) We assume that the human can see all agents. This limits results because, in practice, a human may only know the locations of a small subset of agents.
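The extra branch that Sections V-A and V-B add to each nominal agent's switching controller can be sketched as follows. This is a hedged illustration rather than the authors' code: the paper specifies the radii (R_pred = R_lead = 30) but not the exact flee/follow headings, so heading directly away from a nearby predator, or directly toward a nearby leader, is our assumption, and `couzin_direction` stands in for the unmodified zone-based controller.

```python
import numpy as np

R_PRED = 30.0   # predator influence radius (Sec. V-A)
R_LEAD = 30.0   # leader influence radius (Sec. V-B)

def manager_override(pos_i, managers, couzin_direction):
    """Desired heading for one nominal agent under the extra switch.

    managers: iterable of (kind, position) pairs, kind in {"predator", "leader"}.
    couzin_direction: zero-argument callable returning the fallback heading
    from the unmodified zone-based controller.
    """
    pos_i = np.asarray(pos_i, dtype=float)
    for kind, m in managers:
        offset = np.asarray(m, dtype=float) - pos_i
        r = np.linalg.norm(offset)
        if kind == "predator" and r < R_PRED:
            # Flee: head directly away from the predator (our assumption).
            return np.arctan2(-offset[1], -offset[0])
        if kind == "leader" and r < R_LEAD:
            # Follow: head directly toward the leader (our assumption).
            return np.arctan2(offset[1], offset[0])
    return couzin_direction()
```

A full simulation would embed this in the per-agent update and apply the same ωΔt turning-rate limit; note that in the paper inter-agent repulsion still takes precedence when a collision is imminent (R_rep < R_pred, R_lead), which this sketch omits.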

For each nominal agent, we can determine which other nominal agents are within one of the radii of attraction.(2) We can then define A as the histogram of the resulting interagent topology over 120 time steps.

(2) We constructed the connectivity matrix assuming that an agent was connected to any other agent within attraction or repulsion range, regardless of whether either agent was actually ignoring the other because of the presence of the leader/predator.

The zig-zag path simulates a type of worst-case human behavior wherein the leader/predator ignores the fish and instead shifts direction left and right while maintaining a constant average bearing. This is designed to minimize how long a human sustains influence over any particular subset of agents because the faster leader/predator shifts from one neighborhood of fish to another with each zig. There is a significant discrepancy in the A histograms when agents use autonomous zig-zag control in phase 2. Recall that the zig-zag controller is a type of worst-case operator input, because the leader/predator moves in a zig-zag pattern that disrupts the nominal topology as much as possible. Results, shown in Figure 3, suggest that leader-based influence maintains cohesion better than predator-based influence when the operator behaves in a somewhat worst-case manner.

Fig. 3. A using the zig-zag controller for (a) leader model and (b) predator model. The values on the axes are not particularly important since the amplitude is a function of the simulation duration, but it is important to note that the same scale is used for both plots.

Under human influence in phase 2, A was approximately uniform and approximately 120 units tall for both leader-based and predator-based control. This suggests that humans are clever enough to guide agents into a cohesive parallel group phase; the histograms for both leader-based and predator-based influence are very similar to the one shown in Figure 3(a).

To better understand this effect, we measured how long the leader and predator sustained influence over other agents for the case when a human controlled the leader/predator in phase 2 of the experiment. Results for 15 agents are (m = 47.9, s = 70.6) and (m = 42.2, s = 36) for leader and predator, respectively, where the sample mean values are units of food consumed during a fixed-time experiment. Results for 100 agents are (m = 53.8, s = 53.1) and (m = 53.7, s = 67), respectively. No participant was able to keep the 100 agents together using the predator, which accounts for the higher performance (inadvertent divide and conquer). Participants were able to keep 15 agents together, with resulting higher performance for leaders. All participants were more confident using leaders than using predators.

For each agent, we created a time series of the number of new agents the leader interacted with plus the number of agents the leader no longer interacted with at each time index. We then computed the power spectral densities for these time series and averaged these together across the six participants; see Figure 4, which shows only results for 100 agents (results for 15 agents are similar). Simply put, predators cause agents to scramble (as evidenced by more frequent switching), making it more difficult to sustain influence even when the human is trying to keep the group together. Results from the zig-zag do not add any insight so they are omitted.

Fig. 4. Typical power spectral densities for human influence styles: (a) Leader. (b) Predator.

D. Summary

Results from this section indicate that, for parallel group phases, collective structure can be maintained by either leader- or predator-style interaction but the human must be clever to maintain the structure. Simply put, switching controllers afford leader-based human influence when all agents must be guided to the same point, but when it is useful to divide agents then switching controllers afford predator-based influence.

VI. NON-SWITCHING CONTROLLERS

Switching controllers tend to preclude anything but parallel group structures. In this section, we evaluate what forms of human influence are afforded by non-switching controllers. We consider two types of non-switching controllers: one based on leadership and one that implements a stakeholder.

Under the leadership style, leader agents respond to human input and ignore nominal agents (f^i = 0, g^i ≠ 0); nominal agents ignore the human and add the leaders' influences to the influence from every other agent (f^i ≠ 0, g^i = 0). Under the stakeholder style, nominal agents behave in the same way, ignoring the human and adding stakeholders' influences to the influence of every other agent (f^i ≠ 0, g^i = 0); stakeholder agents are influenced both by the human and by the nominal agents (f^i ≠ 0, g^i ≠ 0). This is very similar to Couzin's use of informed individuals that were drawn to a goal but were also governed by other inter-agent dynamics [5].

We implement the stakeholder model by determining both the desired direction using Couzin's equations as well as the direction to a fixed food source placed by a human (a type of waypoint). Couzin's model produces a desired direction θ̂^i(t + Δt) computed from f^i(x^i_t, x^{¬i}_t). This is then added to θ̂^i_food(t + Δt), which is the influence caused by the human telling the stakeholder to be attracted to a point of interest through g^i(x^i_t, u^{op}_t); in this case, u^{op}_t is the location of the food source. Adding f to g yields θ̃^i(t + Δt) = θ̂^i(t + Δt) + 0.8 θ̂^i_food(t + Δt), where the 0.8 weighting is subjectively chosen to encourage the stakeholder to stay near the group more than going to the goal.

We conducted an experiment designed to evaluate if and when a stakeholder could influence the different collective phases of Couzin's model, and to compare a stakeholder's influence to that of a leader. To do this, we used the parameters given in Table I. We then use zero or more "Oz of Wizard"-style stakeholders and zero or more leaders of various speeds to see how the different combinations influenced the position of the group.

A. No Leaders

As a point of reference, it is useful to show the final positions of the centroids of each collective phase after 45 seconds in the absence of any type of leader. This is shown in Figure 5. The legend indicates which symbols correspond to which phase. Ideal agent speed for the various phases differs, so results are shown for agents traveling at speed s = 3 for swarms and dynamic parallel groups and s = 5 for toroids and highly parallel groups. The inner circle represents the maximum distance that an agent could travel at speed s = 3 over the 45 s simulation duration, and the outer circle is the maximum distance that a torus could travel when agents travel at speed s = 5. The radius of the outer circle is (π/4)s, which can be shown to be the maximum distance that a torus can travel when agents travel at speed s.

Fig. 5. Distribution of centroids for various phases with no leaders. The mean is approximately (0, 0) and σ = (43.05, 43.31).

It is useful to give a short description of the collective movement patterns of each phase in the absence of any type of leader or stakeholder.

• Swarms tend to stay closely coupled together and rarely move far from the origin.
• Toroids also tend to stay closely coupled together and rarely move far from the origin. Occasionally, a torus will take some time before it forms, causing it to locate far from the origin.
• Dynamic parallel groups quickly cohere and then start traveling in a somewhat direct line in some random direction. The blue triangles clustered in a circle around the origin suggest that these groups all travel approximately the same straight-line distance from the origin.
• Highly parallel groups quickly cohere, but rarely travel in a straight line from the origin, causing a distribution that is within the "maximum distance" circle.

B. Leaders and Stakeholders

Fifty agents were placed in the world and given parameters that caused them to form one of the four phases. Agents were initially distributed randomly around the origin with random initial directions. One or more agents in the group were selected to be leaders or stakeholders and then given information about a stationary food source placed at location (30, 30) in the world. Leader agents go directly toward the food, but stakeholder agents try to reach the food while maintaining their role in the group. Qualitatively, a stakeholder tends to occupy a location in a collective structure that is nearest the food. For example, the stakeholder joins the torus (when the group is in the torus phase) but tends to spend most of its time on the side of the torus nearest to the food. Similar behavior is observed for the swarm and parallel phases.

For the purposes of discussion, we will call leaders and stakeholders by the generic term "manager." The speed of a manager is determined by taking the speed of the nominal agents from Table I and multiplying by the speed scaling factors from the second line of Table II. For example, a speed scaling factor of 0 means that the leaders/stakeholders are stationary, and a speed scaling factor of 1 means that the leaders/stakeholders are traveling at the same speed as the other agents. The first row in Table II indicates the number of agents who are informed of the food source, ranging from 2% to 50% of the total agents.

Number of Leaders/Stakeholders: {1, 2, 3, 5, 10, 15, 25}
Speed scaling factors: {1/5, 1/3, 1/2, 1, 2}

TABLE II. PARAMETERS USED TO CREATE THE SCATTERPLOTS.

The centroids for the different groups are computed as a sum of the locations of each agent, including the locations of the leaders and stakeholders. Since the stakeholders are part of the collective, they contribute to the ultimate location of the centroid and should be used in computing the centroid of the collective at the end of the 40-second simulation. The leaders are not technically part of the collective since they ignore other agents, but we include their final location in the computation of the centroid so as to make sure that there is no unfair advantage given to the stakeholders in comparing scatterplots of stakeholders and leaders.

For both manager styles, one and two managers had only a modest impact on the ultimate distribution of the phase, but as the percentage of managers grows their influence also grows. Figure 6 shows how the error decreases much more rapidly for

110
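The centroid and error computations used for these scatterplots can be sketched as follows. This is our own illustrative NumPy sketch, not code from the paper; the function names and the (n, 2) array layout are assumptions:

```python
import numpy as np

def collective_centroid(agent_pos, manager_pos):
    """Centroid over all agents, with leader/stakeholder managers included.

    agent_pos, manager_pos: arrays of shape (n, 2) holding x/y locations.
    Managers are folded in so that neither management style gets an
    unfair advantage when the scatterplots are compared.
    """
    all_pos = np.vstack([agent_pos, manager_pos])
    return all_pos.mean(axis=0)

def influence_error(centroid, food=(30.0, 30.0)):
    """Euclidean distance between a final centroid and the food source."""
    return float(np.linalg.norm(np.asarray(centroid) - np.asarray(food)))
```

Repeating this over many trials per condition gives the mean m and standard deviation σ of the centroids reported in the figure captions.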
stakeholders than leaders, and also that the apparent plateau of error is smaller for stakeholders than leaders.

Fig. 6. Error as a function of the percentage of managers. Error is Euclidean distance between the final centroid position and the food source.

Once the number of leaders is five or more, constituting at least 10% of the number of agents in the collective, we start to see a difference in influence for stakeholders over leaders, even given the fact that centroids for leaders should be biased more toward the food location since all leaders end up at that location.

A sample of results is given in Figure 7, which illustrates the final distribution of collective centroids when there are ten managers. The scatterplots are results for all speed scaling factors. Compared to leaders, stakeholders yield (a) centroid locations that are closer to the food source and (b) smaller standard deviations of centroids over all the trials. To emphasize the differences in the means, two solid lines intersect at the location of the food source and two dashed lines intersect at the location of the mean of the centroids. These results extend Couzin's results [5] by showing that (a) large numbers of informed individuals have more influence over a group than small numbers, (b) informed individuals can influence more than just dynamic parallel groups, and (c) torus and flock attractors can be guided, causing toruses to translate and influencing flocks to travel in a desired direction.

Fig. 7. 20% of the agents try to influence the collective: (a) ten leaders (m = (20.56, 20.79), σ = (33.03, 33.21)); (b) ten stakeholders (m = (28.77, 30.59), σ = (23.16, 24.37)). The solid lines denote the location of the food, and the dashed lines the location of the mean of all types.

We conclude that non-switching controllers afford stakeholder-style influence, at least when it is important for the agents to maintain their collective structures. Moreover, regardless of the management style, multiple manager agents are required to produce satisfactory influence.

C. Emergent Influence Dynamics: Structure Switching

In the previous section, the radius of attraction was low enough that the collective structures sometimes fragmented, leaving groups of agents that were not within each other's radii of influence. In this section, we explore what happens when we guarantee that agents stay coherent by making the radius of attraction infinite and by eliminating the blind spot. This allows us to see what happens when the influence of leaders is sufficiently high to affect the behavior of all agents to some extent. Interestingly, potentially useful group behaviors emerge under these conditions.

Let s = 1 denote the speed of the leader and consider what happens when the speed of the nominal agents is faster than the leader. Set the parameters of Couzin's model to Rrep = 1 and Rori = 3 so that a toroid structure emerges. Consider a group of N = 80 agents, and consider groups of 30 and 50 leaders. In this set of simulations, leaders are added to the nominal group so, for example, when a group of N = 80 has 30 leaders there are a total of 110 agents. The "Oz of Wizard" leaders travel due east at constant speed, but travel more slowly than the centroid of the torus. Note that leaders are excluded from any calculations of centroid and moment.

When agent speeds are moderately higher than the speed of the leaders (s = 10 and s = 5), the torus tends to follow the leaders but has difficulty keeping up with them (this is true for ω ∈ {40, 70, 100}°/sec). When the torus gets behind the leaders, it breaks from the torus formation, temporarily forms a dynamic parallel group, catches up with the leaders, and then resumes the torus.

Fig. 8. When the distance of the group gets large, the group changes from a torus to a parallel group.
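The structure switching discussed here is conventionally quantified with the two order parameters of Couzin's model [4]: group polarization (near 1 for a parallel group) and group angular momentum (near 1 for a torus). A minimal sketch, assuming 2-D positions and nonzero heading vectors; the function names are our own:

```python
import numpy as np

def group_polarization(headings):
    """p_group = |sum_i v_i| / N over unit heading vectors v_i."""
    v = headings / np.linalg.norm(headings, axis=1, keepdims=True)
    return np.linalg.norm(v.sum(axis=0)) / len(v)

def group_angular_momentum(positions, headings):
    """m_group = |sum_i r_ic x v_i| / N, with r_ic the unit vector from
    the group centroid to agent i; high for a torus, low for a flock."""
    r = positions - positions.mean(axis=0)
    r = r / np.linalg.norm(r, axis=1, keepdims=True)
    v = headings / np.linalg.norm(headings, axis=1, keepdims=True)
    cross = r[:, 0] * v[:, 1] - r[:, 1] * v[:, 0]   # 2-D cross product
    return abs(cross.sum()) / len(positions)
```

Tracking m_group over time makes the torus-to-parallel-group switches of Fig. 8 visible as sharp drops.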
Figure 8 illustrates this emergent switching behavior, where the switching occurs at the level of the collective structure rather than at the individual. We choose to illustrate this switching using group angular momentum, since angular momentum is very high for a torus and very low for a parallel group. Although the individual agents are not programmed to switch behaviors when the leaders get too far away, the collective structure of the agents automatically changes when the leaders get too far away. The figure illustrates this in a series of changes in group angular momentum that correspond to large distances between the centroid of the toroid and the locations of the leaders.

This natural emergence of switches in collective structure could be very useful for the design of systems that enable a human to influence and manage a large team of robots. Simply put, emergent switches between collective structures could afford a human the flexibility to naturally manage collective phases without resorting to explicit centralized control. Moreover, this switching likely indicates that it should be possible for a human to change which attractor (torus or dynamic parallel group) a particular group exhibits without changing internal agent parameters; since the attractors tend to be robust to external disturbance, switching between attractors means that the human's influence can persist until a sufficient disturbance causes the agents to leave the attractor.

VII. CONCLUSIONS

Many bio-inspired models, such as Couzin's, can produce a variety of collective structures, but much prior work on influencing these models requires centralized forms of human influence. We have used small case studies and "Oz of Wizard"-style experiments to explore how a human can influence these collective structures using decentralized methods.

Augmenting the switching controllers with a new switch that responds to the presence of a human-controlled predator or leader allowed a human to guide the collective, but only for one type of collective structure. Moreover, the leader style afforded more sustainable influence over the collective and was compatible with leading the collective to a desired location, and the predator style afforded divide-and-conquer approaches to problem solving.

Non-switching controllers can be used to allow a human to influence a broader set of collective structures, but more than one human-influenced manager is needed. Stakeholder-style management, in which the manager both sought to influence and to be influenced, afforded more human control over the collective phases than a management style that did not allow the leader to be influenced by the group. Importantly, decentralized leaders can be used to induce emergent phenomena that may afford use by humans. These emergent phenomena include the ability to switch from one structure to another and back without changing the parameters of the agents. Future work should explore how decentralized human influence can enable a team of distributed humans to manage a bio-inspired collective.

ACKNOWLEDGMENTS

This work was funded by the Office of Naval Research and the Army Research Lab. Opinions do not necessarily reflect the opinions of the funding agencies.

REFERENCES
[1] L. Alboul, J. Saez-Pons, and J. Penders. Mixed human-robot team navigation in the GUARDIANS project. In Proc. of the Intl. Conf. on Safety, Security, and Rescue Robotics, Sendai, Japan, October 2008.
[2] L. Barnes, M. Fields, and K. Valavanis. Unmanned ground vehicle swarm formation control using potential fields. In Proc. of the Mediterranean Conf. on Control & Automation, Athens, Greece, July 2007.
[3] S. Bashyal and G. K. Venayagamoorthy. Human swarm interaction for radiation source search and localization. In IEEE Swarm Intelligence Symposium, St. Louis, Missouri, USA, 2008.
[4] I. D. Couzin, J. Krause, R. James, G. D. Ruxton, and N. R. Franks. Collective memory and spatial sorting in animal groups. Journal of Theoretical Biology, 218(1), September 2002.
[5] I. D. Couzin, J. Krause, N. R. Franks, and S. A. Levin. Effective leadership and decision-making in animal groups on the move. Nature, 433:513–516, 2005.
[6] M. L. Cummings, S. Bruni, S. Mercier, and P. J. Mitchell. Automation architecture for single operator, multiple UAV command and control. The International C2 Journal, 1(2):1–24, 2007.
[7] X. C. Ding, M. Powers, M. Egerstedt, S. Young, and T. Balch. Executive decision support: Single agent control of multiple UAVs. IEEE Robotics and Automation Magazine, 2009.
[8] B. Hardin and M. A. Goodrich. On using mixed-initiative control: A perspective for managing large-scale robotic teams. In Proc. of the ACM/IEEE Intl. Conf. on Human-Robot Interaction, March 2009.
[9] Z. Kira and M. A. Potter. Exerting human control over decentralized robot swarms. In Proc. of the Intl. Conf. on Autonomous Robots and Agents, pages 566–571, 2009.
[10] M. Lewis, H. Wang, S. Chien, P. Scerri, P. Velagapudi, K. Sycara, and B. Kane. Teams organization and performance in multi-human/multi-robot teams. In Proc. of the IEEE Intl. Conf. on Systems, Man, and Cybernetics, Istanbul, October 2010.
[11] T. McLain and R. W. Beard. Cooperative rendezvous of multiple unmanned air vehicles. In AIAA Guidance, Navigation and Control Conference, Denver, Colorado, USA, 2000.
[12] J. McLurkin, J. Smith, J. Frankel, D. Sotkowitz, D. Blau, and B. Schmidt. Speaking swarmish: Human-robot interface design for large swarms of autonomous mobile robots. In Proc. of the AAAI Spring Symposium, Stanford, CA, USA, 2006.
[13] C. A. Miller, H. B. Funk, M. Dorneich, and S. D. Whitlow. A playbook interface for mixed initiative control of multiple unmanned vehicle teams. In Proc. of the 21st Digital Avionics Systems Conf., volume 2, pages 7E4-1–7E4-13, November 2002.
[14] A. M. Naghsh, J. Gancet, A. Tanoto, and C. Roast. Analysis and design of human-robot swarm interaction in firefighting. In Proc. of the Intl. Symp. on Robot and Human-Interactive Communication, Munich, Germany, August 2008.
[15] R. Olfati-Saber. Flocking for multi-agent dynamic systems: Algorithms and theory. IEEE Transactions on Automatic Control, 51(3):401–420, 2006.
[16] E. Sahin. Swarm robotics: From sources of inspiration to domains of application. In Lecture Notes in Computer Science, volume 3342. Springer, 2005.
[17] A. Sinha and D. Ghose. Generalization of linear cyclic pursuit with application to rendezvous of multiple autonomous agents. IEEE Transactions on Automatic Control, 51(11):1819–1824, 2006.
[18] M. L. Steinberg. Biologically-inspired approaches for self-organization, adaptation, and collaboration of heterogeneous autonomous systems. In Proc. of SPIE, volume 8062, April 2011.
[19] A. Steinfeld, O. C. Jenkins, and B. Scassellati. The "Oz of Wizard": Simulating the human for interaction research. In Proc. of the ACM/IEEE Intl. Conf. on Human-Robot Interaction, 2009.
[20] D. J. T. Sumpter. The principles of collective animal behavior. Philosophical Transactions of the Royal Society B, 361:5–22, November 2006.
[21] L. Mier y Teran-Romero, E. Forgoston, and I. B. Schwartz. Noise, bifurcations, and modeling of interacting particle systems. Technical Report Rep US 2011, NIH, 2011.
Real-Time Inverse Dynamics Learning for
Musculoskeletal Robots Based on Echo State
Gaussian Process Regression
Christoph Hartmann∗† , Joschka Boedecker† , Oliver Obst‡§ , Shuhei Ikemoto¶ and Minoru Asada†
∗ Institute of Cognitive Science, University of Osnabrueck, 49069 Osnabrueck, Germany
Email: [email protected]
† Graduate School of Engineering, Osaka University, Suita 565-0871, Osaka, Japan
‡ CSIRO ICT Centre, PO Box 76, Epping NSW 1710, Australia
§ School of Information Technologies, The University of Sydney, NSW 2006, Australia
¶ Graduate School of Information Science and Technology, Osaka University, Suita 565-0871, Osaka, Japan

Abstract—A challenging topic in articulated robots is the control of redundantly many degrees of freedom with artificial muscles. Actuation with these devices is difficult to solve because of nonlinearities, delays and unknown parameters such as friction. Machine learning methods can be used to learn control of these systems, but are faced with the additional problem that the size of the search space prohibits full exploration in reasonable time. We propose a novel method that is able to learn control of redundant robot arms with artificial muscles online from scratch using only the position of the end effector, without using any joint positions, accelerations or an analytical model of the system or the environment. To learn in real time, we use the so-called online "goal babbling" method to effectively reduce the search space, a recurrent neural network to represent the state of the robot arm, and novel online Gaussian processes for regression. With our approach, we achieve good performance on trajectory tracking tasks for the end effector of two very challenging systems: a simulated 6 DOF redundant arm with artificial muscles, and a 7 DOF robot arm with McKibben pneumatic artificial muscles. We also show that the combination of techniques we propose results in significantly improved performance over using the individual techniques alone.

I. INTRODUCTION

For articulated robots, pneumatic actuators have grown in popularity and found increasing use in biorobotics, medical, industrial and aerospace applications [1]. Pneumatic artificial muscles (PAMs), such as the McKibben artificial muscle [16], are favourable for developing biologically inspired robots since they are backdriveable, have high flexibility, and a high power-to-weight ratio, similar to biological muscles. As a result, PAMs have often been used to drive the structural parts of complex anthropomorphic musculoskeletal robots. On the other hand, PAMs generally exhibit very strong nonlinearity, and their response is often delayed. For individual PAMs and single-DOF systems, there have been several proposals for analytical models [6, 12, 7]. When these are combined for use in a multi-DOF robot, it is, however, extremely hard to model the dynamics analytically for accurate motion control. In fact, related studies, using well-established methods from control theory, had to resort to limiting the number of DOF of the systems being modeled, and/or address the problem only in the context of static tasks like position control [3, 21, 2].

Machine learning techniques have played an important role in modeling the complex dynamics of a manipulator driven by PAMs in [10, 8]. In these studies, feedforward neural networks were used to model the complex inverse dynamics of the manipulator, and feedback error learning [11, 13] was employed to train the networks. However, as the structure of PAM-driven robots gets more complex, expressing their dynamics in the form of a static mapping becomes difficult. Additionally, in order to employ feedback error learning to realize trajectory tracking of an end effector, an initial feedback controller has to be provided, requiring an inverse kinematics model in advance. Here, we aim for a model-free approach to the control problem. Other relevant methods for learning dynamics of high-DOF systems in real time include for example "locally weighted learning" [20].

In order to deal with the increasing difficulty of learning the inverse dynamics of musculoskeletal robots driven by PAMs, in this work we combine a variant of goal babbling [19] with an echo state network (ESN) [9], a particular kind of recurrent neural network, to represent the state of the system. Gaussian process regression (GPR) is used to train the network in real time (see Sects. II-A, II-B, and II-C for more details on goal babbling, ESNs, and GPR, respectively).

ESNs [9] have recently become more prominent in the field of robotics, e.g., for learning the inverse kinematics of an industrial robot [18]. However, to the best of our knowledge, ESNs have not been successfully applied to musculoskeletal robots and inverse dynamics yet, despite their known ability to provide highly nonlinear mappings and a fading memory of past inputs. We employ these features of ESNs to represent the state of a redundant robot arm with artificial muscles, and thereby implicitly compute information such as velocity or acceleration relevant for inverse dynamics tasks.

Gaussian process regression has been applied in previous work in [4] as well as [14, 15] to learn robust inverse dynamics on industrial robot arms. The latter authors succeeded to
use GPR in a real-time setting, and demonstrated that this method outperforms other state of the art learning methods for inverse dynamics. A notable difference of these approaches to our work is that GPR was only used on industrial robot arms and did not have to face the challenges associated with musculoskeletal robots like elastic deformation of the actuator and long dead-times in the control. Additionally, features such as joint angles, velocities, and accelerations were directly available for the learning algorithm, which is not the case for our system.

We briefly introduce the individual methods that our approach builds on, as well as the learning task, in Sect. II. After reviewing basic GPR, we will present a modification for fast online GPR in Sect. III. In Sect. IV, we explain how the individual methods are combined in our approach, and how they can be applied to both a simulated 6 DOF redundant arm with 12 artificial muscles, and to a 7 DOF robot arm driven by 17 McKibben pneumatic artificial muscles. Results are presented in Sect. V. Finally, in Sect. VI, we briefly summarize and discuss our results, and give an outlook over possible future directions of our work.

II. BASIC METHODS

A. Goal Babbling

Goal babbling is based on work by Rolf et al. [19], where the comparison is made to infants who tend to make goal-directed movements from a very early age even if they do not succeed. Accordingly, only a smaller relevant subspace of possible motor commands is explored, as opposed to random motor babbling. By directly learning from the perturbed goal-directed movements, this bootstraps and increases the speed of learning.

In [19], the inverse kinematics is learned in 2D space by online construction of local linear regressions over positions that are weighted by prototype vectors. A similar local weighting is discussed in more detail in Sect. IV. Goal babbling is responsible for generating the training samples: At every step, the inverse kinematics for a point between the last goal and the next goal is computed according to the current weight matrix and then perturbed to facilitate exploration. Stability in learning is ensured by always returning to a "home posture" for which the inverse kinematics are known. Continuing the comparison to infants, this corresponds to the child relaxing its muscles and resting.

In order to transfer these ideas to inverse dynamics learning, a new state vector has to be generated that captures the dynamics of the robot arm. This is done using an echo state network as described below.

B. Echo State Networks

ESNs [9] have been introduced as an alternative to more traditional recurrent neural network (RNN) approaches. Two major differences that enable ESNs to overcome problems of some earlier algorithms for RNN training, such as slow convergence and instabilities, are:
• to initialize the network weights with random values and only adapt the weights from the hidden layer to the output units (output or readout weights);
• to scale hidden layer weights so that the network has the echo state property, i.e., it implements a fading memory of past inputs (see [9] for more details).

A random input matrix W_in combines input values u linearly and sends them to the units in the high-dimensional hidden layer, also referred to as the reservoir. The units in the reservoir also have recurrent connections amongst each other, collected in the matrix W_res. Through these loops, information can remain in the system for some time. In this context, the metaphor of a reservoir is often used since the hidden layer can be seen as a water reservoir that gets disturbed by a drop, but slowly returns to its initial state after the ripples from the input have decayed. This reservoir state r is mapped at timestep t + 1 by an activation function f(), such as a hyperbolic tangent or a Fermi function, in the following way:

r_{t+1} = f(W_res r_t + W_in u_{t+1})    (1)

This mapping has several consequences. First, it projects the input into a high-dimensional space so that regression gets easier, as is the case for several kernel methods. Second, the repeated mapping by the activation function of neurons in the reservoir leads to many nonlinearities that, together with the fading memory, provide an implicit representation of nonlinear input properties such as the velocity or acceleration of a robot arm. Third, the input decays slowly. This way, ESNs are a natural choice to represent delayed and nonlinear signals. Usually, the output weights are trained with some form of linear regression over the reservoir states. In our case, GPR is used to train the readout instead.

C. Gaussian Process Regression

As Chatzis and Demiris [5] demonstrated, GPR is a more powerful and robust tool for regression on ESNs than the linear regression or ridge regression that has been employed so far. Moreover, it seems to be a good choice for learning control with echo state networks, because Nguyen-Tuong and Peters [14] showed that this works very well with regular inverse dynamics learning and that it can be adjusted to learn in real time.

The idea behind GPR is to perform a parameter-free regression over the space of possible functions that gave rise to the training data. In order to do so, a kernel function defines the correlations or – more intuitively – distances between training points. When a test point is then evaluated, its function value should be close to the function values of the training points that covary most with the test point. The hyperparameters of this kernel function represent a prior distribution over possible functions. Conceptually, these hyperparameters correspond to different activation functions and connections of neural networks, while parameters would be the actual connection weights. When data points are added and correlations are evaluated, this prior is transformed to a posterior distribution which can be used to predict new targets.
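The reservoir update of Eq. (1) can be sketched as follows. This is our own minimal sketch; the reservoir size, input dimension, and the spectral-radius rescaling to 0.9 are illustrative assumptions (a common recipe for obtaining the echo state property), not values taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
n_res, n_in = 200, 12   # reservoir units and input channels (illustrative)

W_in = rng.uniform(-0.5, 0.5, (n_res, n_in))    # fixed random input weights
W_res = rng.uniform(-0.5, 0.5, (n_res, n_res))  # fixed random recurrent weights

# Rescale the recurrent weights so their spectral radius is below 1,
# giving the network a fading memory of past inputs.
W_res *= 0.9 / max(abs(np.linalg.eigvals(W_res)))

def esn_step(r_t, u_next):
    """One update r_{t+1} = f(W_res r_t + W_in u_{t+1}) with f = tanh."""
    return np.tanh(W_res @ r_t + W_in @ u_next)

r = np.zeros(n_res)
for _ in range(5):              # drive the reservoir with a constant input
    r = esn_step(r, np.ones(n_in))
```

Only the readout from r is trained; in the approach described here that readout is the Gaussian process regression introduced next.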
A Gaussian process is defined as a distribution over functions f(x) with the constraint that any subset of evaluated data points is always jointly Gaussian distributed. With a given kernel function k(x, x') and the unproblematic assumption of a zero mean, the distribution over functions can therefore be written as

p(f) = N(f | 0, C)    (2)

with C_{i,j} = k(x_i, x_j) + δ_{i,j} σ, where σ stands for the noise in the observed target values, and δ is the Kronecker delta. Given this distribution, a prediction for a new test point x_{N+1}, that is the mean function value m(x_{N+1}) given all previous N targets t and the correlations k between the previous N training points and the new one, can be calculated as follows:

m(x_{N+1}) = k^T C_N^{-1} t = k^T α    (3)

with α = C_N^{-1} t as the later precomputed prediction vector.

We have to address two issues to learn with GPRs: a kernel has to be selected and associated hyperparameters have to be learned. We will use both the exponential kernel

k(x, x') = θ exp(−0.5 Σ_{i=1}^{D} η_i (x_i − x'_i)²)    (4)

because it is known for its good predictions and robustness, as well as the linear kernel

k(x, x') = θ x^T x'.    (5)

It was shown in [5] that this kernel resembles ridge regression. Using that, we can easily compare the performance difference between GPR and ridge regression. The hyperparameters σ, θ and η_i are optimized by applying conjugate gradient descent to the log-likelihood of the data fit given the hyperparameters. The best way to think of η_i is to see it as a relevance measure for the ith dimension in the input data: if it is high, the corresponding data x_i has a high influence on the covariance, while if it is low the covariance is not affected. While η_i scales the resulting functions horizontally, θ can be seen as a vertical scale representing the standard deviation of the GP.

The runtime cost for making a prediction is dominated by the inversion of the covariance matrix C, which is O(N³). Obviously, this is not practical for online learning if a reasonable amount of data points is assumed. We propose a faster method to approach this problem.

III. ONLINE GAUSSIAN PROCESSES

To deal with the expensive matrix inversion at every step, Nguyen-Tuong and Peters [14] suggested to use many small local GPRs to keep N reasonably small and to linearly combine the predictions of the closest GPs. This approach reduces the runtime significantly, but it still grows cubically. In [15], the same authors further reduce the runtime to O(N²) using a Cholesky factorization. We achieve a similar speedup with our GPR modifications, presented below, using block inversions.

We take advantage of the incremental manner of online learning by exploiting the fact that a single data point is added to one covariance matrix at every step. For a matrix with sub-blocks A, B, D, and E, the block inversion formula states:

    [ A  B ]^{-1}     [ A^{-1} + A^{-1} B S^{-1} D A^{-1}    −A^{-1} B S^{-1} ]
    [ D  E ]       =  [ −S^{-1} D A^{-1}                      S^{-1}          ]    (6)

where S is the Schur complement of A, i.e., S = (E − D A^{-1} B). In the case of an incrementally growing covariance matrix C, it holds that A = C, B = k(x_n, x_{N+1}), D = k(x_{N+1}, x_n)^T ∀n ≤ N and E = k(x_{N+1}, x_{N+1}) + σ. Because A^{-1} = C^{-1} was already computed in the previous step and the Schur complement S is a single number, no matrix inversion is left and the runtime reduces to O(N²) for the matrix multiplications in every step. In order to make use of this incremental inversion, the hyperparameters have to be optimized beforehand and therefore represent a true prior.

IV. THE COMBINED MODEL

The combination of the methods described above is explained in the following paragraphs. Actual parameter settings can be found in Table I.

In order to formally describe the algorithm implementing the combined model, we define useful terms for online Gaussian processes with a fixed delay of delay timesteps:
• G_t is the set of all local GPs at time point t.
• c_t^g = ⟨D_t^g⟩ is defined as the center of g_t ∈ G_t.
• closeness(x, g_t) is adapted from [14] as k(x, c_t^g), which takes a high value for small distances.
• D_t^g = {x_i | (delay < i < t) ∧ (∀h ∈ G_i : closeness(x_i, g_i) ≥ closeness(x_i, h_i))} is the set of all data points belonging to a local GP.
• T_t^g = {y_i | (0 < i < t − delay) ∧ (x_{i+delay} ∈ g_{i+delay})} is the set of all target points belonging to a local GP. This is used as t in formula 3.

The general flow of information goes as follows: The sensor information (length or pressure) from a set of muscles is used to update an ESN every esntime milliseconds. Parallel to this, a growing group of local GPs, G, is used to map the reservoir state r of the ESN and information about the current and desired position of the arm to commands (Hill parameters or pressures) for muscle activation.

A. Predicting and Learning

The following paragraphs are a detailed description of the algorithms for predicting and learning for one muscle (Algorithms 1 and 2, respectively). Information flow in these is also visualized in Fig. 1.

The input to the system is provided by the current reservoir state r_t and by the position pos_t of the robot arm end-effector. Depending on this position, a random new goal point goal_t is selected from a larger version of the later trajectory if either the distance between the goal and the current position is smaller than a threshold (the reaching movement was successful), or if goal_{t−1} is older than 20 steps.
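The incremental inversion of Sect. III can be sketched as follows, using the exponential kernel of Eq. (4) with a single shared length scale. The function names, the noise value, and the verification against a direct inverse are our own illustrative choices, not code from the paper:

```python
import numpy as np

def k_exp(x, y, theta=1.0, eta=1.0):
    """Exponential kernel of Eq. (4) with one shared relevance eta."""
    d = np.asarray(x) - np.asarray(y)
    return theta * np.exp(-0.5 * eta * np.dot(d, d))

def grow_inverse(C_inv, X, x_new, sigma=0.1):
    """Update C^-1 after adding one training point, via formula (6).

    Sub-blocks: A = C (old covariance), B = k(X, x_new) as a column,
    D = B^T, E = k(x_new, x_new) + sigma.  The Schur complement
    S = E - D A^-1 B is a scalar, so no matrix inversion remains and
    each update costs O(N^2) instead of O(N^3).
    """
    B = np.array([[k_exp(x, x_new)] for x in X])   # (N, 1) column vector
    E = k_exp(x_new, x_new) + sigma
    AinvB = C_inv @ B
    S = E - (B.T @ AinvB).item()                   # scalar Schur complement
    top_left = C_inv + (AinvB @ AinvB.T) / S
    top_right = -AinvB / S
    return np.block([[top_left, top_right],
                     [top_right.T, np.array([[1.0 / S]])]])
```

Growing C^-1 one point at a time this way reproduces the direct inverse of the full covariance matrix up to numerical precision.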
Fig. 1. The flow of information in the combined model. Dotted arrows correspond to information flow during action selection whereas dashed arrows are only relevant for training the GPs. The subdivided rectangles present in the upper half of the figure are memory queues that save information between time t and t − delay. The GPR box subsumes the set of targets, the matrix inversion and the prediction of the mean.

Algorithm 1 Predicting in the combined model for a single muscle at step t. Symbols explained at the end of Sect. III. In case of trajectory tracking, a new goal is selected each step.
  Input: r_t, pos_t
  if (goal_{t−1} == goal_{t−20}) || (‖goal_{t−1} − pos_t‖ ≤ const) then
      goal_t ← goal_set[rand()]
  else
      goal_t ← goal_{t−1}
  x^pred_t ← [r_t, goal_t − pos_t]
  cls ← Array(|G_t|), cls_index ← Array(|G_t|)
  for i = 1 → |G_t| do
      cls[i] ← closeness(x^pred_t, G_t[i]), cls_index[i] = i
  sort cls_index according to cls in descending order
  pred ← Array(numpred), denom = 0
  for i = 1 → numpred do
      j ← cls_index(i)
      g ← G_t[j]
      covar ← covariances between D_t^g and x^pred_t
      pred[i] ← covar^T ∗ α    (formula 3)
      pred[i] ← pred[i] ∗ cls[j], denom ← denom + cls[j]
  y_t = sum(pred)/denom + noise
  Send: y_t

Next, a state vector is generated by concatenating the reservoir state r_t with the difference d^pred_t = goal_t − pos_t to the state x^pred_t. To predict the next action y_t, the covariance between x^pred_t and the observed states D_t of the closest numpred GPs is then computed and used in formula 3 to determine the prediction of each local GPR for the muscle. Our closeness measure is defined at the beginning of this section. These local approximations are weighted by the closeness of each GP to x^pred_t. The final prediction y_t is then perturbed by uniform noise to resemble the exploration in goal babbling. In our experiments we found that the properties of the noise highly determine the final outcome. Smoother trajectories were obtained by applying the same noise for several steps and to only a subset of the muscles at the same time.

After sending the action, the actual learning (Alg. 2) takes place. As in the prediction phase, x^learn_t is gained by combining a reservoir state with a position difference. However, for learning, r_{t−delay} and d^learn_t = pos_t − pos_{t−delay} are concatenated. The pair (x^learn_t, y_{t−delay}) is then used as an observation-target pair for training the closest GPR if the closeness is larger than mincloseness. By doing so, the position difference after delay steps is associated with the action and reservoir state at that time. When a new action is predicted, the desired effects should therefore be visible after delay steps. We are using the position difference instead of direct position information since this makes the algorithm a little more independent from the arm it is applied to. If the closest GPR is further away than mincloseness, a new local GPR is added and used for learning this state-action pair.

After training, the algorithm waits until the rest of steptime milliseconds has passed and starts again with new information.

B. Learning Stages

The actual learning is subdivided into three stages:
1) Hyperparameter optimization
2) Goal-directed learning
3) Trajectory tracking

Before both the hyperparameter optimization and the learning, a short phase of random activation is performed to "warm up" the echo state network.

The first stage makes use of random muscle activation to mimic the later activations and thereby explore sensorimo-
Algorithm 2 Learning in the combined model for a single Parameter Sim1 Sim2 Real Comments
learn steps 16000 30000 20000 ≈ max gprs ∗ gpr obsa
muscle at step t. Symbols explained at the end of Sect. III. step time 60 60 100 60 ms → 1500 GPR obs b

Input :rt , post esn time 30 30 50 should be step time /2


xlearn
t ← [rt−delay , post − post−delay ] esn units 200 180 200 improvement stops at 200
gpr obs 800 1500 1100 improves fit
for i = 1 → |Gt | do opt obs 800 600 400 many → var, few → bias
cls[i] ← closeness(xlearn
i , Gt [i]) max gprs 5 25 10 maximal number of GPs
(cls max, max index) ← max(cls) numpred 3 3 3 many → bias, few → var
delay 1 2 6 multiply with step time
if cls max > max closeness then min close. 0.7 0.3 0.35 inserts new GP
g ← Gt [max index] spect. rad. 0.7 0.6 0.7 ESN memory measure
covar ← covariances between Dtg and xlearn t
a Sim1,
covar self ← k(xlearn
t , xlearn
t )+σ b on
Real: substituting observations possible in second half
a Thinkpad x220t (2.7Ghz, 4GB memroy)
resize covariance matrix Cgt to |Cgt | + 1
add covar self and covar to Cgt TABLE I
PARAMETER VALUES USED FOR THE SHOWN TRAJECTORIES .
update (Cgt )−1 with formula 6 using B, D = covar and
E = covar self
α ← (Cgt )−1 ∗ Ttg (3)
else and cuts the runtime by the number of muscles, because at
g ← new GP with Cgt = k(xlearn t , xlearn
t ) + δi,j σ every step only one covariance matrix has to be inverted.
g learn g
Add yt−delay to Tt+1 and xt to Dt+1 Also, please note that the position vectors of the simulated
Return : g arms only consist of the 3D-position of a marker on the
“hand”. No metainformation such as positions, velocities or
accelerations of joints are given, since information about these
tor contingencies. A number of optobs training samples are are encoded in the memory of the ESN reservoir. In fact,
recorded and then used to optimize the hyperparameters for correlation analysis shows that although no joint information
the kernel of the GPR. This process is also called automatic is given to the ESN, in every simulation there are always
relevance determination (ARD) [17]. reservoir units that are correlated with the joint angles with
In the learning phase, the algorithm explained above is ρ ≈ 0.9, with the joint velocities with ρ ≈ 0.3 and with the
used to learn actions in the relevant subspace for the later joint accelarations with ρ ≈ 0.2. This enables us to use these
tracking. Also, every 500 steps the muscle activations are set features for learning on the real arm, even though it does not
to 0 for several steps to reach a “home posture” and stabilize have any joint sensors.
the learning process as described in Sect. II-A. V. R ESULTS
For this architecture, the tracking of a trajectory reduces The described model has been successfully applied to end-
to changing the goal to reach at every step to the next goal point effector tracking on simulated 2 and 6 DOF robot arms
on the trajectory. The speed can be adjusted by increasing with artificial muscles as well as a real musculoskeletal arm
or decreasing the distance between two succeeding goals. with 7 degrees of freedom. The parameters used for the
Because of the fixed and identical steptime in learning and trajectories (shown in Fig. 3) can be found in Table I. The
tracking, the muscle activation will then be adjusted to reach source code for the developed model and for the simulation
the required goal faster or slower. will be made available on: www.christophartmann.de

C. Notes A. Simulated Arm


The simulation has two main purposes: First, it should give
To limit complexity and thereby decrease the runtime, we
a proof of concept that the developed model is providing
set the maximal size of one GPR to maxgprobs observations.
reasonable results. Second, it can be used to investigate the
In the case that the closest GPR has reached maxgprobs, the
influence of the different methods employed by running many
information of a previous observation in the inverse covariance
different trials without wearing off the real robot arm.
matrix C−1 can be substituted with the new one according to
The arm used for the proof of concept is quite simple. It is
the Sherman-Morrison formula in O(N 2 ). In our trials with
made of two solid capped cylinders connected by a universal
steptime = 60 ms, this substitution is feasible up to a matrix
2 DOF joint. These segments are connected by four muscles
size of 1100 rows while the blockwise matrix inversion needs
spaced 90◦ apart around the cylinders. They are individually
the same time for 1500 rows. Therefore this method is only
controlled by setting a parameter x, which is then mapped to
used if maxgprobs is small enough.
a force by the following nonlinear modified Hill-equation.
Because the standard deviation between the optimized hy-
perparameters for each muscle in the simulation and on the real 750
force(x) = ( − 375) ∗ muscle length (7)
arm is about ten times smaller than the standard deviation in x+1
the hyperparameters itself, the same hyperparameters are used The force is applied at both fixpoints of a muscle in the
for all muscles. This leads to identical covariance matrices C direction towards the opposing point. The position information

117
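The muscle model of Eq. (7) is simple enough to state directly in code; a minimal sketch (hypothetical Python mirroring the formula; the function name is ours):

```python
def muscle_force(x, muscle_length):
    """Modified Hill-type force law of Eq. (7):
    force(x) = (750 / (x + 1) - 375) * muscle_length.
    x is the commanded activation parameter; the resulting force acts at
    both fixpoints of the muscle, directed towards the opposing point."""
    return (750.0 / (x + 1.0) - 375.0) * muscle_length

# At x = 0 the term in parentheses is maximal (375); at x = 1 it vanishes,
# so the force scales down to zero over that range of activations.
print(muscle_force(0.0, 1.0))  # 375.0
print(muscle_force(1.0, 1.0))  # 0.0
```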
[Figure 4: bar chart of RMSE (0 to 1.2) for the conditions Regular, NoGB, NoESN and NoGPR]

Fig. 2. Left: The simulated robot reaching for the goal visualized as a grey ball. Muscles are drawn in white and the center of the endpoint provides position information. Right: The musculoskeletal arm with a glove for position tracking.

Fig. 4. The effect of not using one part of the combined model on the root mean squared error. The error bars represent the standard deviation across 10 independently trained models. All differences to the regular condition have a significance > 0.99. The significance was computed with the two-sided Welch's t-test because the variance of the different samples cannot be assumed to be identical.
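The Welch's t-test named in the Fig. 4 caption drops the equal-variance assumption of the ordinary two-sample t-test. A plain-Python sketch of its statistic and Welch-Satterthwaite degrees of freedom (illustrative helper and data, not the paper's code or numbers):

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom for
    two independent samples with possibly unequal variances."""
    na, nb = len(a), len(b)
    va, vb = variance(a), variance(b)   # unbiased sample variances
    se2 = va / na + vb / nb             # squared standard error of the mean difference
    t = (mean(a) - mean(b)) / math.sqrt(se2)
    df = se2 ** 2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    return t, df

# Illustrative RMSE samples (NOT the paper's data), 10 models per condition:
regular = [0.30, 0.28, 0.33, 0.31, 0.29, 0.32, 0.30, 0.27, 0.34, 0.31]
no_esn = [0.62, 0.71, 0.58, 0.66, 0.74, 0.60, 0.69, 0.64, 0.72, 0.67]
t, df = welch_t(regular, no_esn)
# The two-sided p-value then follows from a Student-t survival function with
# df degrees of freedom, e.g. 2 * scipy.stats.t.sf(abs(t), df) if SciPy is available.
```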
The position information needed for the model comes from the position of the marker at the end of the arm. The length of each muscle serves as sensory information for the model. Results for tracking an ellipsoid are shown in Fig. 3. Since this shows that the tracking error is only small and that the model works, a next step is to apply the model to more difficult and redundant problems.

The arm used to explore the relative influence of the different methods is based, as depicted in Fig. 2, on the first arm but has two more universal joints, resulting in a highly redundant arm with 12 muscles. The influence of each method as well as the remaining parameters, such as the hyperparameters for the ESN, the kind of initial random exploration or the maximal size of individual GPRs, were explored by grid search. This led to the results in Fig. 4 for trajectories of 10 independently trained models for each condition.

These observations demonstrate that the novel combination of methods used in this project is indeed better than the previous approaches that made use of only one of the methods.

B. Real Arm

The musculoskeletal robot arm used for our experiments has a 7 DOF skeletal structure, comparable to a human arm in both the number of DOF and the configuration of bones and each joint's DOF: shoulder, elbow, forearm, and wrist are realized by using a 3 DOF, 1 DOF, 1 DOF and 2 DOF joint, respectively. In order to move this structure, 17 PAMs are used and configured schematically similar to a human arm. As a consequence, there are also PAMs which drive multiple joints at the same time. Each PAM has a pressure sensor and a tension sensor and is controlled by pressure feedback control. Two cameras are mounted on the head of the robot arm to measure the position of the end effector.

[Figure 3: goal trajectories and tracked end-effector trajectories for the two simulated arms]

Fig. 3. 10 successive sample trajectories for the simple (top) and complex (bottom) simulated arm. In each case, a completed trajectory takes 9 s. Parameters are shown in Table I for "Sim1" and "Sim2". The respective RMSEs are 0.15 and 0.3.
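The RMSE quoted here is the root mean squared distance between the goal points and the end-effector positions recorded at the corresponding steps; a minimal sketch (hypothetical Python, our own helper):

```python
import math

def tracking_rmse(goals, positions):
    """Root mean squared error between goal points and the end-effector
    positions recorded at the corresponding steps; points are tuples,
    e.g. (x, y, z) for the simulated arms or (x, y) for the real one."""
    squared = [sum((g - p) ** 2 for g, p in zip(goal, pos))
               for goal, pos in zip(goals, positions)]
    return math.sqrt(sum(squared) / len(squared))

# Illustrative check with a single 2D point at distance 5 from its goal:
print(tracking_rmse([(0.0, 0.0)], [(3.0, 4.0)]))  # 5.0
```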
The kinematics of this arm (and similar ones) is difficult



for several reasons: When using PAMs, multiple actuators are needed for one joint because PAMs are only capable of pulling. Therefore, the motor command space, such as the pressures of the PAMs, has to be redundant with respect to the joint angular space. In addition to this redundancy, there is also the well-known redundancy between the position space of the end effector and the joint angular space.

One of the properties of PAMs is their high level of backdrivability – in contrast to many traditional drive systems using electrical motors and decelerators (as found, e.g., in ASIMO and other robots). This also means, however, that the pressure control of individual PAMs influences the others through the dynamics of the body structure. Expressing a continuous motion by snapshots of postures has to take into account the dynamics of these systems. In other words, in order to realize trajectory tracking in a PAM-driven arm, inverse dynamics learning is necessary but has to face several challenges.

Due to the fact that the vision system of the arm can currently only detect the 2D position of the glove of the robot (see also Fig. 2), the tracking is performed in only two dimensions. Figure 5 shows the tracking performance of the algorithm on the musculoskeletal arm using a real-time Linux system to ensure proper synchronization. As can be seen, the tracking performance for the first coordinate matches well while the performance on the second dimension is more noisy. Given the fact that the algorithm worked on the simulated arms, we believe that the main reason for this noise lies in the structure of the arm in combination with the random exploration: The freedom of the arm in the y-direction is very limited and it requires quite some effort to maintain a particular pose. Additionally, because of the random exploration in the algorithm, we were only able to use moderate pressures in the muscles. Therefore, this dimension was rarely explored, which might have led to the noisy performance. Another point to make is that the delay of the muscles is quite variable since the pressure in the muscles builds up over time. For our algorithm, a fixed delay time of 6 steps (= 600 ms) worked well. However, this long and variable reaction time is surely one reason for the oscillating noise in the second dimension. Unfortunately, the action feedback to the ESN reservoir was not able to compensate for this issue. Since the tracking of the first dimension barely shows oscillations, this effect might not be as large as one would initially assume. Finally, the optimal parameters for the real arm might differ from the parameters optimized by grid search for the simulated one. Exhaustively grid-searching all parameters of the real arm is impossible. Results of our trials can be found in Table I.

[Figure 5: x (top) and y (bottom) coordinates of the tracked trajectory over time]

Fig. 5. The x and y coordinates of a sample trajectory tracking on the real arm when tracking 10 repetitions of a figure 8 using the parameters given in Table I. One figure consists of 150 goal points and therefore takes 15 s to complete.

VI. CONCLUSION

We present a combination of methods that is capable of learning inverse dynamics for end-effector trajectory tracking of redundant robot arms with artificial muscles in real time. We show that, using our approach, this can be achieved without any previous knowledge about the sensorimotor contingencies and the structure of the musculoskeletal arm. We also demonstrate that the incremental growth present in online learning tasks can be taken advantage of to reduce the runtime for GPR learning to O(N²) in a way that – to the best of our knowledge – has not been suggested in previous work yet.

While the proposed model leads to good tracking results on the simple 2 DOF arm with four muscles, the results gain more variance as the number of joints and muscles increases. This was to be expected for the challenging task of end-effector trajectory tracking without any previous knowledge on a highly redundant and nonlinear system. One of the contributions of the paper is to show that the combination of techniques we use is an improvement over using any of them individually. In the previous section, the performance substantially decreased when no ESN, no GPR, or only random exploration instead of goal babbling was used. Without the ESN the system lacks memory and a nonlinearly expanded feature space; without the goal babbling the search space would increase dramatically since learning is not focused on the subspace relevant for the trajectory tracking; and finally, the GPR facilitates the readout
to a level necessary to deal with the complex inverse dynamics present in the robot models.

We plan to investigate the causes of the tracking errors in more detail in future studies. Our experiments point in the direction that we are dealing with several problems. Regarding the real arm, the activation delay is not the same for every muscle and every state of the arm, which conflicts with the fixed delay assumed in our model. This could be approached by learning state-specific delays, for example using a GP-based technique. A possible reason for the error in the complex simulation, which does not suffer from delays, may be found in the dynamics that emerge from the interactions between the different joints. These might require a more complex, maybe hierarchical, ESN structure to represent the system state. This will also be a point for future research. Additionally, the uncertainty present in the predictions of a GP is not yet used although it can be obtained fairly cheaply. This information could be used to select which state-action pairs to learn and thereby reduce the computational load on the GPRs.

ACKNOWLEDGMENTS

The authors are grateful for machine time and help with the robotic arm provided by the Hosoda lab (http://www-hi.ise.eng.osaka-u.ac.jp). The authors also would like to thank the High Performance Computing and Communications Centre (http://www.hpccc.gov.au/) for the use of their supercomputer cluster in performing some of the experiments for this paper.

REFERENCES

[1] G. Andrikopoulos, G. Nikolakopoulos, and S. Manesis. A survey on applications of pneumatic artificial muscles. In Proceedings of the 10th Mediterranean Conference on Control Automation (MED), 2011, pages 1439–1446, June 2011. doi: 10.1109/MED.2011.5982983.
[2] J. E. Bobrow and B. W. McDonell. Modeling, identification, and control of a pneumatically actuated, force controllable robot. IEEE Transactions on Robotics and Automation, 14(5):732–742, 1998.
[3] B. Tondu, S. Ippolito, J. Guiochet, and A. Daidie. A seven-degrees-of-freedom robot-arm driven by pneumatic artificial muscles for humanoid robots. The International Journal of Robotics Research, 24(4):257–274, 2005.
[4] K. M. A. Chai, C. K. I. Williams, S. Klanke, and S. Vijayakumar. Multi-task Gaussian process learning of robot inverse dynamics. Advances in Neural Information Processing Systems, 21, 2009.
[5] S. P. Chatzis and Y. Demiris. Echo state Gaussian process. IEEE Transactions on Neural Networks, 22(9):1435–1445, 2011.
[6] C. P. Chou and B. Hannaford. Measurement and modeling of McKibben pneumatic artificial muscles. IEEE Transactions on Robotics and Automation, 12(1):90–102, 1996. ISSN 1042-296X.
[7] M. Doumit, A. Fahim, and M. Munro. Analytical modeling and experimental validation of the braided pneumatic muscle. IEEE Transactions on Robotics, 25(6):1282–1291, 2009. ISSN 1552-3098.
[8] H. Gomi and M. Kawato. Neural network control for a closed-loop system using feedback-error-learning. Neural Networks, 6(7):933–946, 1993.
[9] H. Jaeger and H. Haas. Harnessing nonlinearity: Predicting chaotic systems and saving energy in wireless communication. Science, 304(5667):78, 2004.
[10] M. Katayama and M. Kawato. Learning trajectory and force control of an artificial muscle arm by parallel-hierarchical neural network model. In Proceedings of the 1990 conference on Advances in Neural Information Processing Systems 3, pages 436–442. Morgan Kaufmann Publishers Inc., 1990.
[11] M. Kawato, K. Furukawa, and R. Suzuki. A hierarchical neural-network model for control and learning of voluntary movement. Biological Cybernetics, 57(3):169–185, 1987.
[12] G. K. Klute and B. Hannaford. Accounting for elastic energy storage in McKibben artificial muscle actuators. Journal of Dynamic Systems, Measurement, and Control, 122:386, 2000.
[13] J. Nakanishi and S. Schaal. Feedback error learning and nonlinear adaptive control. Neural Networks, 17(10):1453–1465, 2004.
[14] D. Nguyen-Tuong and J. Peters. Local Gaussian process regression for real time online model learning and control. In Advances in Neural Information Processing Systems 22, 2008.
[15] D. Nguyen-Tuong, M. Seeger, and J. Peters. Real-time local GP model learning. In From Motor Learning to Interaction Learning in Robots, volume 264. Springer Verlag, 2010.
[16] V. L. Nickel, J. Perry, and A. L. Garrett. Development of useful function in the severely paralyzed hand. Journal of Bone and Joint Surgery, 45A(5):933–952, 1963.
[17] C. E. Rasmussen. Gaussian processes in machine learning. Advanced Lectures on Machine Learning, pages 63–71, 2004.
[18] R. F. Reinhart and J. J. Steil. Attractor-based computation with reservoirs for online learning of inverse kinematics. In Proc. ESANN, pages 257–262. Citeseer, 2009.
[19] M. Rolf, J. J. Steil, and M. Gienger. Online goal babbling for rapid bootstrapping of inverse models in high dimensions. In Development and Learning (ICDL), 2011 IEEE International Conference on, volume 2, pages 1–8. IEEE, 2011.
[20] S. Schaal, C. G. Atkeson, and S. Vijayakumar. Real-time robot learning with locally weighted statistical learning. In Robotics and Automation, 2000. Proceedings. ICRA'00. IEEE International Conference on, volume 1, pages 288–293, 2000.
[21] X. Zhu, G. Tao, B. Yao, and J. Cao. Adaptive robust posture control of a parallel manipulator driven by pneumatic muscles. Automatica, 44(9):2248–2257, 2008.
Recognition, Prediction, and Planning for Assisted
Teleoperation of Freeform Tasks
Kris Hauser
School of Informatics and Computing, Indiana University at Bloomington
[email protected]

Abstract—This paper presents a system for improving the intuitiveness and responsiveness of assisted robot teleoperation interfaces by combining intent prediction and motion planning. Two technical contributions are described. First, an intent predictor estimates the user's desired task, and accepts freeform tasks that include both discrete types and continuous parameters (e.g., desired target positions). Second, a cooperative motion planner uses the task estimates to generate continuously updated robot trajectories by solving optimal control problems with time-varying objective functions. The planner is designed to respond interactively to changes in the indicated task, avoid collisions in cluttered environments, and achieve high-quality motions using a hybrid of numerical and sample-based techniques. The system is applied to the problem of controlling a 6D robot manipulator using 2D mouse input in the context of two tasks: static target reaching and dynamic trajectory tracking. Simulations suggest that it enables the robot to reach static targets faster and to track trajectories more closely than comparable techniques.

[Figure 1: raw user input → freeform task inference engine (Sec. III) → task distribution → cooperative motion planner (Sec. IV) → robot trajectory, with a GUI in the loop]

Fig. 1. The system integrates new contributions for prediction of intended tasks and real-time optimal control.

I. INTRODUCTION

This paper aims to develop novice-friendly interfaces for teleoperating complex robots using commodity input devices (e.g., computer mice). Safe, intuitive, inexpensive interfaces could bring robotics in touch with a broad user base and lead to new applications in teleoperated household tasks, remote medicine, and space exploration. But a major challenge is the extreme asymmetry between input devices and robots: the robot has many more degrees of freedom than a user can control at once. Modal interfaces provide users control over subsets of degrees of freedom at once (e.g., [7]), but mode switching makes seemingly simple tasks quite tedious. Moreover, human input is noisy and error-prone, and hence a robot should filter out unsafe commands. Hence, novice-friendly teleoperation systems will need to embed a great deal of intelligence in order to act fluidly, capably, and safely.

One promising approach is to understand the user's intent so that the teleoperation system can provide task-appropriate forms of assistance [1, 11, 16, 17, 19]. In this approach, the robot estimates the operator's intent using the raw input signals and then chooses its action in order to best achieve the desired task. This strategy is appealing because it decouples the overall problem into two subproblems — intent estimation and planning — and can also improve performance in the presence of time delays because the robot can anticipate the desired task in the midst of a partially-issued command [5]. In this paper we apply this approach to freeform tasks, such as reaching, pointing, or tracking a trajectory, in which the intent ranges over a continuous infinity of possibilities. Freeform tasks broaden the capabilities of intelligent teleoperated robots because they provide an operator with greater flexibility in unstructured scenarios. But they are more challenging because the robot must not only estimate the task but also several continuous parameters. Moreover, it is no longer practical to precompute task-specific motion strategies for the robot; instead the robot must optimize motions on the fly.

This paper presents new techniques for understanding freeform tasks as well as planning high-quality motions to achieve them (Figure 1). We introduce a Bayesian intent inference engine based on a Gaussian Mixture Autoregression framework that deduces a task type and parameters using statistical features of the input time series. Using this method we learn models for estimating static targets and dynamic trajectories in a unified framework, and we predict intended static and dynamic targets with 43% and 13% lower error, respectively, than using the cursor position alone.

Our system also enables the extrapolation of intended future time-varying goals. Our second contribution exploits this capability to improve trajectory tracking by anticipating future movements. We present a novel cooperative motion planner that optimizes the robot's trajectory to match the forecasted one, while also handling highly cluttered environments. A combination of sample-based planning [14] and numerical trajectory optimization techniques [4] is used to achieve responsive operation, and to produce high-quality, collision-free paths. The integrated system achieves fluid and real-time operation by interleaving inference, planning, and execution while streaming in user input. Experiments on a simulated 6DOF robot suggest that it improves tracking performance by up to 67% compared to systems that either do not perform

inference, or that use simpler planners.

II. RELATED WORK

Intent and activity recognition for robot teleoperation has been studied in the context of telemanipulation and surgical assist systems [1, 11, 16, 17, 19]. Past work has largely made use of the well-established machinery of Hidden Markov Models, and as a result is limited to a finite number of discrete tasks, such as navigating to a handful of target locations or picking and placing objects in a constrained manner. By contrast, our system learns and estimates models with continuous parameters, which makes it applicable to freeform tasks.

Several methods have been proposed for predictive modeling of cursor targets to help select icons in graphical user interfaces, including linear regression from peak velocity [2], directional characteristics [12], kinematic motion models [13], and inverse optimal control techniques [18]. Unlike [12, 18], our work is applicable to a fully continuous set of targets. It also achieves better prediction accuracy compared to linear models [2, 13] because the Gaussian Mixture Autoregression technique used here is capable of modeling nonlinear characteristics that are observed in user behavior.

Sample-based motion planners such as Probabilistic Roadmaps (PRMs) and Rapidly-Exploring Random Trees (RRTs) are successful at planning collision-free motion for high-dimensional robot systems [14], and have also been successfully applied to hard real-time planning for 2D helicopters [6] and ground vehicles [15]. Recently they have been applied to teleoperation interfaces for robot manipulator arms [8]. But these works have traditionally focused on finding feasible paths rather than optimal ones. Sample-based approaches have recently been developed for the optimal motion planning problem [10], but they have not yet been applied to time-varying cost functions and moreover converge too slowly for real-time use. An alternative approach is numerical optimization over a trajectory parameterization [4]. Optimization approaches can achieve optimality with a fast convergence rate, albeit only locally. Our hybrid planner combines the strengths of sample-based and numerical approaches and is designed specifically to produce high-quality paths quickly for a broad class of cost functions.

III. PROBLEM OVERVIEW

Our system implements an intelligent robot interface for controlling robots with simple and/or noisy input devices. We consider the setting of a user operating a mouse or other 2D input device to control a 6DOF industrial robot arm in a cluttered, known 3D environment. The user issues commands through a GUI that displays the robot and its environment, and can translate, rotate, and zoom the camera freely.

The operator uses a basic "click-and-drag" operation in which he/she clicks on a point to move and then drags it in the manner in which he/she wishes for it to move. This "manner" is stated intentionally vaguely so that the robot's motion is underspecified. For example, a user may wish to reach a target as quickly as possible (e.g., to grasp and move an object), or to perform a gesture, such as a wave hello or to indicate direction for a coworker. Another possibility might use the underspecification to achieve depth control, or actions upon specific objects in the environment. While executing a task, a robot may exploit underspecification to avoid obstacles or to optimize other criteria, such as energy consumption.

We use the notion of a task to act as an intermediary between user input and robot motion. We define a task as a sufficient representation of the optimality criterion for a robot to complete an action primitive, such as reaching a target, picking up an object, or pressing a button. In this paper we focus on freeform tasks, which include both a discrete task type it ∈ {1, . . . , M} and continuous task parameters zt ∈ R^N. (Note that N depends on the type.) Here we consider M = 2 task types:

1) Reach tasks. Four task parameters include the position of the goal relative to the cursor (gx, gy), size gr, and "urgency" u, which is approximated as the average mouse velocity before reaching the goal.
2) Trajectory following. Six task parameters include the goal position relative to the cursor (gx, gy), its velocity (ġx, ġy), and its acceleration (g̈x, g̈y).

The system (Figure 1) is composed of two major components: the Freeform Task Inference Engine (FTIE) (Sec. IV) and the Cooperative Motion Planner (CMP) (Sec. V). The FTIE attempts to recover the intended task from the manner in which the mouse is dragged, and infers a probabilistic distribution over tasks. The CMP uses the estimated task and the forecasted evolution of the task in the future to improve trajectory tracking. The following sections cover new technical contributions in each subsystem that improve performance relative to prior approaches.

IV. FREEFORM TASK INFERENCE ENGINE

The FTIE estimates a time series of unobserved task parameters (it, zt) from a streaming series of observations ot, t = 1, 2, . . ., obtained from the raw cursor movement. It uses a hybrid discrete/continuous Dynamic Bayesian Network (DBN) to infer a distribution over freeform task variables. We implement the DBN using a Gaussian mixture autoregression (GMA) technique, which uses Gaussian mixture models to learn and represent complex, multi-modal transition models. It is similar in spirit to the interacting multiple model (IMM) algorithm that extends the Kalman filter to handle multiple switching linear processes [3], but in our case the process models are more complex. Experiments indicate that GMA has superior performance to simpler models, such as linear regressions and Kalman filters.

A. Overview

Given observations ot of cursor velocities ot = (Δcx,t, Δcy,t), we wish to infer the distribution over the task parameters it and zt. We model the evolution of the parameters as a DBN, which is autoregressive because we include the observation history ht = (ot−1, . . . , ot−k) for the prior k time steps as part of the state variable xt = (it, zt, ht).

Fig. 2. Top left: the dynamic Bayesian network (DBN) denoting the temporal relationships between hidden task type it, hidden task parameters zt, history ht, and observation ot. Top right: the transition model represented as a hybrid graphical model with an unknown component ct that indexes into a set of linear process models (in-figure equation: zt+1 = A′t [zt; ht+1] + b′t + ε′t). Bottom: the observation model (in-figure equation: ot = At [zt; ht] + bt + εt, with a MUX selecting At, bt, εt by the component index). Dashed circles indicate discrete variables while solid circles indicate continuous ones.

Fig. 3. Gaussian mixture regressions can model complex nonlinear conditional distributions. This data comes from the reach task training set, with the horizontal target position on the x axis and the horizontal cursor velocity on the y axis (both normalized to unit variance). Curves indicate the mean and standard deviation of the relationship y(x) estimated by GMRs with a varying number m of components (m = 1, 3, 5, 10 shown).
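A scalar Gaussian mixture regression like the fits shown in Fig. 3 can be sketched as follows. This is an illustrative toy version, not the authors' implementation: each component is reweighted by the likelihood of the independent variable and contributes a linear fit y = a·x + b, anticipating Eqs. (5)-(7):

```python
import math

def gauss_pdf(x, mu, var):
    """Density of a scalar Gaussian N(mu, var) evaluated at x."""
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2 * math.pi * var)

def gmr_mean(x, weights, mus_x, vars_x, lin_a, lin_b):
    """Scalar GMR: reweight each component by the marginal likelihood
    of x, then blend the per-component linear fits y = a*x + b."""
    w = [wc * gauss_pdf(x, m, v)
         for wc, m, v in zip(weights, mus_x, vars_x)]
    z = sum(w)  # normalization factor Z
    return sum(wi / z * (a * x + b)
               for wi, a, b in zip(w, lin_a, lin_b))
```

With a single component this reduces to ordinary linear regression, matching the remark in the paper that a GMR with m = 1 is equivalent to a linear fit.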

The observation is assumed to be generated according to the probabilistic observation model P(ot | xt), while the state evolves according to the transition model P(xt+1 | xt, ot). Note that in traditional HMMs the transition model is typically not dependent on ot; this only requires a minor adjustment to the inference procedure. Our model is shown in Figure 2.

As usual in Bayesian filtering, the belief over xt+1 is derived from the prior belief over xt and the observation ot+1 in a recursive manner. Specifically, we maintain a belief bt(xt) = P(xt | o1, . . . , ot) and update it using the recursive filtering equation:

  bt+1(xt+1) = P(xt+1 | o1, . . . , ot+1)
             = ∫ P(xt+1 | xt, ot+1) P(xt | o1, . . . , ot) dxt      (1)
             = ∫ P(xt+1 | xt, ot+1) bt(xt) dxt.

This procedure is performed in two steps:

1) The predict step, which computes the unconditioned belief b̄t+1(xt+1) independently of the new observation:

  b̄t+1(xt+1) = ∫ P(xt+1 | xt, ot) bt(xt) dxt.      (2)

2) The update step, which conditions b̄t+1(xt+1) on the observation:

  bt+1(xt+1) ∝ P(ot+1 | xt+1) b̄t+1(xt+1).      (3)

The following sections will describe the GMA implementation of the filter.

B. Gaussian Mixture Regression

First we will provide some preliminaries on Gaussian mixture regression. A Gaussian mixture model (GMM) with m components describes a weighted combination of m Gaussian distributions. If X is distributed with respect to a GMM, the probability that X = x is given by

  P(x) = GMM(x; w1, . . . , wm, μ1, . . . , μm, Σ1, . . . , Σm)
       = Σ_{c=1}^{m} wc N(x; μc, Σc).      (4)

where N(x; μ, Σ) denotes the probability that a normally distributed variable X ∼ N(μ, Σ) takes on the value x. The values w1, . . . , wm are component weights that sum to one. GMMs can be interpreted as introducing an auxiliary hidden class variable C taking values in {1, . . . , m} that identifies the component of the GMM from which x is generated. The distribution over x conditional on C = c is N(x; μc, Σc), while the prior distribution over C is given by the weights P(c) = wc.

GMMs can be applied to nonlinear regression tasks under an operation known as a Gaussian mixture regression, or GMR. It is based on the application of Gaussian conditioning (see Appendix) to each component in the GMM, and reweighting components according to the probability of observing the independent variable. Suppose X and Y are jointly distributed according to a GMM with m components. Given the value of x, P(y|x) is a GMM with weights

  wc|x = (1/Z) wc P(x|c)      (5)

where Z is a normalization factor and

  P(x|c) = N(x; μc,x, Σc,x)      (6)

is the marginal probability that x is drawn from the c’th component. Each component in P(y|x) has a mean μc determined by a linear fit μc = Ac x + bc with a constant covariance Σc. The parameters Ac, bc, and Σc are determined in a straightforward manner from the joint GMM using the Gaussian conditioning equation (22). The resulting regression model is:

  GMR(y|x) = (1/Z) Σ_{c=1}^{m} wc N(x; μc,x, Σc,x) N(y; Ac x + bc, Σc).      (7)

Figure 3 shows that GMRs can model nonlinearities and nonuniform variance in data better than linear regression (which is equivalent to a GMR with m = 1). These models are fitted to a 2D projection of our cursor reaching dataset.

C. Transition and Observation Modeling

We use GMRs in both the observation model P(ot | it, zt, ht) and transition model P(xt+1 | xt, ot) (Figure 2). In the observation model, a GMR is estimated for each task type, each of which has independent variables zt and ht.

The transition model is factored into three parts: 1) the task type drift, 2) the deterministic history update, and 3) the z transition model as follows:

  P(xt+1 | xt, ot) = P(it+1 | it) P(ht+1 | ht, ot) P(zt+1 | it, zt, ht+1).      (8)

The type drift is given by a transition matrix, and in our implementation we simply use a uniform probability of switching: P(it+1 = it) = 1 − p, and P(it+1 = i′) = p/(M − 1) for each i′ ≠ it. The history update simply shifts ot into the first position of ht and drops the oldest observation. Finally, P(zt+1 | it, zt, ht+1) is encoded as a GMR specific to the task type it, with the independent variables zt and ht+1.

D. GMM Filtering and Forecasting

Given GMM belief representations and GMR transition and observation models, the filtering equation (1) has an exact closed form. For computational efficiency we represent bt(xt) in factored form as a distribution over task types P(it), the history ht (which is deterministic), and a set of type-conditioned GMMs bzt|i, i = 1, . . . , m, each denoting P(zt | it = i, o1, . . . , ot).

Predict. The predict step evaluates (8) in the context of (2). The history and task type are updated directly as usual, while the task parameters are updated via the propagation of each bzt|i through the transition GMR. Suppose the transition GMR is given by (7), and bzt|i has n components: bzt|i(zt) = GMM(zt; w1, . . . , wn, μ1, . . . , μn, Σ1, . . . , Σn). It is not hard to show that the distribution of zt+1 is a GMM with mn components, given by:

  bzt+1|i(zt+1) = (1/Z) Σ_{c=1}^{m} Σ_{d=1}^{n} wcd N(zt+1; Ac μd + bc, Σc + Ac Σd Acᵀ),      (9)

with

  wcd = wc wd N(μd; μc,x, Σc,x + Σd)      (10)

being the probability that the d’th component of bzt|i is observed in the c’th component of the transition GMR. The last term in this product is the probability that two variables x1 and x2, distributed respectively according to N(μd, Σd) and N(μc,x, Σc,x), are equal.

Update. The update step applies the GMR observation model (7) to the update equation (3) via the following derivation. Let the predicted, unconditioned belief b̄t+1(xt+1) be a GMM with weights wc, means μc, and covariances Σc, with c = 1, . . . , m. We also have P(ot|xt) as a GMR with weights wd, x-components N(μd,x, Σd,x), and linear fits Ad, bd, Σd,o with d = 1, . . . , n. We perform a Kalman observation update (23) conditional on each pair of components (c, d) from each of the state and observation models (see Appendix) to determine P(xt+1 | ot+1, c, d). Unconditional on the components, we have the update equation:

  bt+1(xt+1) = (1/Z) Σ_{c=1}^{m} Σ_{d=1}^{n} wcd P(xt+1 | ot+1, c, d)      (11)

where Z is a normalization term and the weights wcd indicate the probability that the observation was generated by the c’th component of the state prior and the d’th component of the observation GMR:

  wcd = wc wd N(ot+1; Ad μc + bd, Σd,o + Ad Σc Adᵀ).      (12)

Mixture collapse. Although these equations are exact and polynomial-time computable, their direct application would lead to exponential growth of the number of components in the belief state over time. Hence it is necessary to frequently collapse the belief state representation into a more manageable number of components. We collapse GMMs with n > k components into a constant number k of components after both the predict and update steps (k = 10 is used in our experiments). The collapse operation begins by sampling k components c1, . . . , ck without replacement proportionally to their weights. The n original components are partitioned into k subsets S1, . . . , Sk by assigning component d to subset i if i is the index for which the KL divergence between N(μd, Σd) and N(μci, Σci) is minimized. The output GMM contains one component for each subset Si, with weight wi, mean μi, and variance Σi matched to the subset using the method of moments:

  wi = (1/Z) Σ_{j∈Si} wj,   μi = (1/(Z wi)) Σ_{j∈Si} wj μj,
  Σi = (1/(Z wi)) Σ_{j∈Si} wj [(μj − μi)(μj − μi)ᵀ + Σj].      (13)

Efficient forecasting. Forecasting of future zt is performed via repeated application of the predict step, without an associated observation update. However our experiments determined that repeated propagation and collapsing for distant forecasts is too expensive for real time use. So, our method only propagates a few steps into the future (5 in our implementation) and for the remaining steps collapses the transition GMR into a single linear Gaussian process xt+1 = Axt + b + ε

TABLE I
MEAN SQUARED TARGET PREDICTION ERRORS OF SEVERAL TECHNIQUES, NORMALIZED TO CURSOR POSITION ERROR. BEST IN COLUMN IS BOLDED.

                           Reach   Tracking   Tracking forecast (1 s)
  Cursor Only               100%     100%      100%
  Linear Reg.                78%     111%       98%
  Linear Reg. (forecast)      —        —        84%
  Kalman Filter              71%      99%      109%
  Velocity Extrapolation      —        —       203%
  GMA (reach only)           57%     151%       86%
  GMA (track only)          112%      87%       60%
  GMA (reach+track)          83%     108%       73%

Fig. 4. Normalized and time-scaled distance-to-target in static target reaching tasks for the cursor position (solid lines) and the GMA estimated target (dotted lines). Mean and standard deviation are plotted.

linearized around the estimated state distribution. The n’th forecasted state is then computed easily in O(log₂ n) matrix multiplications through repeated squaring. Memoization is another effective approach if the forecasting horizon is fixed.
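The repeated-squaring propagation can be sketched for the deterministic part of the collapsed process, here a scalar affine map x ↦ Ax + b (an illustrative sketch, not the authors' code):

```python
def compose(f, g):
    """Compose two scalar affine maps x -> A x + b: returns f after g."""
    (a_f, b_f), (a_g, b_g) = f, g
    return (a_f * a_g, a_f * b_g + b_f)

def power(f, n):
    """n-fold composition of an affine map via repeated squaring,
    i.e. the n-step forecast in O(log n) compositions."""
    result = (1.0, 0.0)  # identity map x -> x
    while n:
        if n & 1:
            result = compose(f, result)
        f = compose(f, f)
        n >>= 1
    return result
```

In the matrix-valued case the same loop applies with matrix products in `compose`, which is where the logarithmic count of matrix multiplications quoted above comes from.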
E. Gathering Training Data

To acquire training data we constructed a GUI for each task type that instructs users to execute an explicitly given task by clicking and dragging an on-screen widget. In the reach GUI, users are asked to drag a widget to circular targets with center and radius chosen randomly from a uniform range. In the trajectory-tracking GUI, users are asked to drag a widget at the same pace as a reference widget that moved along a trajectory. We pick from triangular, rectangular, circular, and figure-eight patterns that were randomly stretched and compressed in the x and y directions and rotated at an arbitrary angle. The speed of the reference widget was also chosen at random.

These GUIs gathered mouse movements and task parameters at 50 Hz from five volunteers resulting in over 830 trials. Trials were mirrored in x and y directions to reduce the effects of asymmetric training data. We noticed that trackpads and mice produce very different cursor movements, so for consistency we gathered all data using a trackpad. The GMM transition and observation models were learned using the standard Expectation-Maximization (EM) algorithm. The history length and number of components were chosen through five-fold cross-validation based model selection, and learning was completed in approximately 10 hours on a 2.8 GHz PC.

Fig. 5. Tracking two test trajectories. Top row: target estimates produced by the trajectory tracking model are slightly more accurate than the cursor itself. Bottom row: GMA target forecasts 1 s in the future are substantially more accurate than using the cursor position alone (19% lower MSE on both figures), or velocity extrapolation (42% and 14% lower MSE on the triangle and circle, respectively).

F. Experiments

We find that the resulting task models infer desired goals in qualitatively different ways. The reach model predicts the goal location essentially as an extrapolation of the cursor velocity, with greater variance in the direction of travel. The trajectory following model essentially performs smoothing, and evens out jerkiness in the cursor motion.

Table I demonstrates that GMA dramatically reduces MSE of reach tasks compared to simply using the cursor position. It also performs better than simpler techniques of a linear regression and Kalman filter, both fit to the last 5 cursor velocities. Figure 4 plots the distance-to-target along all test examples for the cursor and the GMA prediction. It drops sharply as GMA extrapolates from the current mouse velocity. There is a curious jump in prediction variance once the cursor reaches approximately 20% of the original distance. It appears that this may be an artifact of the input device: in some of our training examples, users made an initial coarse motion toward the target, paused and possibly lifted their finger off of the trackpad, and then approached the target with a fine-grained motion. The pause causes GMA to be significantly thrown off, if only temporarily.

For trajectory tracking, GMA only estimates the current goal position with 13% lower MSE than simply using the cursor position. But its main strength is its ability to anticipate future target positions. GMA reduces the error in forecasts at 1 s by 40%, which is a significant improvement on Kalman filtering or velocity extrapolation. It also performs better than a linear regression trained specifically to perform 1 s forecasts.

This data also indicates that when GMA does not know the task type (reach+track row), its performance decreases as compared to when it is given perfect type information (reach only and track only rows). This suggests that better task classification accuracy would yield significant performance benefits. We leave this issue for future work.
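The velocity-extrapolation baseline referenced in Table I and Fig. 5 amounts to a constant-velocity prediction of the cursor; a trivial sketch (the paper does not give its exact implementation, so this is an assumed form):

```python
def extrapolate(position, velocity, horizon):
    """Constant-velocity extrapolation: predict the cursor position
    `horizon` seconds ahead from its current position and velocity."""
    return tuple(p + v * horizon for p, v in zip(position, velocity))
```

Such a baseline overshoots whenever the cursor turns or slows, which is consistent with its poor 1 s forecast error in Table I.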

V. COOPERATIVE MOTION PLANNER

The second component of our system is a cooperative motion planner that accepts predictive task estimates and avoids collision. As the task inference engine updates the task estimate in real-time, the trajectory is adjusted by replanning. Our planner incorporates several contributions to make it appropriate for real-time user control of robot arms. It handles time-varying probabilistic distributions over tasks as well as hard constraints like collision avoidance and actuator limits. A hybrid sampling-based and optimization-based scheme is used to quickly generate high-quality motions while also avoiding the local minima problems of optimization approaches. It is also implemented in a hard real-time framework that tolerates planning and communication delays [8].

A. Overview

The robot’s motion is subject to joint, velocity, and acceleration limits:

  qmin ≤ q ≤ qmax
  |q̇| ≤ q̇max      (14)
  |q̈| ≤ q̈max

where all inequalities are taken element-wise. A trajectory q(t) : [t1, t2] is said to be dynamically feasible if each of these constraints is met for all t. Collision constraints furthermore limit the set of valid configurations to the collision-free subset of the configuration space q ∈ F ⊆ C.

When the user clicks and drags a point on the robot, the motion defines a time-varying potential field P(x(q), t), where x(q) is the image-space position of the clicked point at the robot’s configuration q and P(x, t) is a screen-coordinate target distribution from the inferred task bt(zt). In this section we take t = 0 to be the current time, and estimate the distribution P(x, t) using the FTIE for all t from 0 to some maximum forecasting horizon.

At each time step the planner searches for a dynamically feasible, collision-free trajectory q(t) : [0, T] → F that optimizes a cost functional of the form

  J = ∫₀^T L(q(t), t) dt + Φ(q(T), T)      (15)

where L is an incremental cost and Φ is a terminal cost. Rather than optimizing to convergence, the planner stops when it finds a trajectory with lower cost than the robot’s current trajectory. If planning is successful then the new trajectory is sent to the robot. To guarantee safety, the trajectory is ensured to be completely collision free and to terminate in velocity zero. If planning fails, the planner simply begins anew on the next step. Before turning to the specification of the cost function in the teleoperation setting we will first describe the planner, which is more general purpose.

B. Hybrid Sample-Based/Numerical Planning

Our hybrid planner grows a tree of states from the start state forward in time in the configuration/velocity/time space C × Ċ × R⁺. Like the RRT planner, the tree is grown by sampling a configuration qd at random and picking an existing node to extend a new edge toward qd. Here, each edge is a relatively short collision-free trajectory, and we utilize a steering function to ensure that each extension is dynamically feasible and terminates at qd with zero velocity (a similar strategy was used in [6]). In our case, the steering function constructs a time-optimal acceleration-bounded curve using the method of [9].

Interleaved with random RRT-like expansion, our method also performs numerical optimization to construct trajectory extensions that locally optimize (15). The integral in (15) is evaluated using Gaussian quadrature, and the optimization parameters are the target configuration of the steering function qd ∈ C as well as a time scale α ≥ 1 that extends the time at which qd is reached. This parameter helps the planner follow slowly time-varying objectives more closely. A boundary constrained quasi-Newton optimization is run for a handful of iterations (10 in our implementation).

Several performance enhancements are used in our planner.
1) We initially seed the tree with the trajectory computed on the prior iteration and attempt to make improvements through local optimization. This approach helps the robot cope better with suboptimal paths caused by terminating queries early because subsequent iterations are likely to improve the path further.
2) Following [6] we produce more fluid paths by bisecting each edge in the tree to produce intermediate states with nonzero velocity.
3) To expand the tree toward the random state qd, we choose the closest state under the pseudometric that measures the optimal time to reach qd in the absence of obstacles. The combination of this metric and the prior bisection technique makes the planner less likely to find inefficient paths that repeatedly start and stop.
4) We prune branches of the tree that have greater incremental cost than the current best path found so far.
5) Lazy collision checking is used to delay expensive edge feasibility checks until the planner finds a path that improves the cost.
6) We devote 50% of each time step to trajectory smoothing using a shortcutting heuristic described in [9], with some minor modifications to ensure that each shortcut makes an improvement in the time-varying cost function.

To obey the hard real-time constraint, the planner should not be initialized at the current state of the robot because once planning is completed, the robot will have already moved. Instead, following [6, 15] the planner is initialized to the predicted state along the current trajectory, propagated forward in time by the planner’s time limit Δt. As in [8] we adapt the time step Δt to the difficulty of problems by increasing it if the planner failed on the prior time step (indicating a hard plan), or reducing it if the planner succeeded on the prior time step (indicating an easy plan). To improve responsiveness we also reduce Δt when the user changes goals by introducing a scaling factor e^(−cD), where D is the distance between the

TABLE II
MEAN TARGETING ERRORS FOR VARIOUS TEST TRAJECTORIES, IN PIXELS. RESULTS AVERAGED OVER 20 RUNS.

                        Reach   Tracking   Circle   Circle+Obst
  IK+S, cursor only       88       87        56        141
  CMP, cursor only        97       87        39         78
  CMP+FTIE                72       62        15         75
  CMP+FTIE+type           72       59        18         63

Fig. 6. Top: Uncluttered and cluttered test scenarios in which the end-effector is instructed to move along a circular trajectory. The trajectory is a priori unknown to the robot. Bottom: Traces of the end effector in the cluttered environment without (left) and with (right) predictive tracking. Because it is planning and executing in real-time, once the non-predictive planner finishes planning the target has already moved a great distance. Predictive tracking anticipates the target’s movements and leads to substantially lower error.

cursor position on the prior planning iteration and the current iteration. Here c is a sensitivity parameter chosen via a small amount of tuning.

C. Expected Cost Functionals and Introspective Discounting

This section converts the predictive task probability distribution P(x(q), t) to a cost functional of proper deterministic form (15). We also introduce an introspective cost functional that monitors both the task distribution and the likelihood of a successful replan. Let g denote a realization of the user-defined, time-dependent objective distributed according to the current belief. We optimize the expected cost:

  J = Eg [ ∫₀^T Ld(q(t), t | g(t)) dt + Φd(q(T), T | g) ].      (16)

By linearity of expectation, we have

  J = ∫₀^T Eg(t)[Ld(q(t), t | g(t))] dt + Eg[Φd(q(T), T | g)].      (17)

Now let the deterministic cost functionals Ld and Φd measure squared distance to the goal g(t). Specifically, reach and tracking tasks require that the robot’s end effector position projected to the screen x(t) ≡ x(q(t)) reaches the goal g(t):

  J = ∫₀^T λ(t) Eg[||x(t) − g(t)||²] dt + ∫_T^∞ λ(t) Eg[||x(T) − g(t)||²] dt      (18)

where λ(t) is a discount factor. Using the fact that E[||X||²] = tr(Var[X]) + ||E[X]||², this expression simplifies to

  J = ∫₀^∞ λ(t) tr(Var[g(t)]) dt + ∫₀^T λ(t) ||x(t) − E[g(t)]||² dt
      + ∫_T^∞ λ(t) ||x(T) − E[g(t)]||² dt.      (19)

Since the first term is independent of x it can be dropped from the optimization, resulting in an expression in the form of (15) as desired.

The introspective discount function weighs the contribution of each point in time to reflect its expected cost taking into account the fact that the path will be replanned in the future:

  λ(t) = E_{Xt} [ ||Xt(t) − E[g(t)]||² / ||x(t) − E[g(t)]||² ]      (20)

where Xt indicates the unknown trajectory actually executed by the robot, taking future replans into account. We estimate this expectation by gathering planning statistics on prior problems. We find that for a given problem, an exponentially decreasing function e^(−bt) provides a good fit to the empirical data. However, the rate parameter b is highly sensitive to the difficulty of the problem. Fortunately the current time step Δ provides a first order approximation of problem difficulty. So, we adaptively discount less in harder regions of the configuration space by scaling the t axis of λ proportionally to Δ and estimate a single b that provides the best fit to our training data.

D. Experiments

Table II lists mean distance from the robot’s end effector to the intended target for several planners and scenarios. The Reach and Tracking columns use the natural reach and tracking motions from our testing set as user input. The Circle column indicates a synthetic circle tracking task, while Circle+Obst introduces clutter into the environment (Figure 6).

For comparison we tested an inverse kinematics controller with a safety filter (IK+S), which prevents infeasible motions by simple rejection. We also tested a non-predictive “cursor only” implementation that simply optimizes the end effector’s distance to the cursor at every step. CMP improves upon IK+S by 47% in clutter because it uses trajectory optimization to circumvent obstacles. Furthermore, FTIE improves upon the cursor-only approach by 27% on the natural cursor trajectories and 23% on the circular trajectories.

We also examined whether FTIE discriminates well between tasks, and found out that classification performance is in fact

quite poor; the most likely task estimated by GMA is correct only 66% of the time. Surprisingly, this does not have a major effect on overall performance! The data in the last row of Table II suggest that even if perfect task knowledge is introduced into the estimation, targeting error does not change significantly.

VI. CONCLUSION

This paper presented a system for estimating, predicting, and planning freeform tasks for assisted teleoperation of a 6DOF robot manipulator using 2D cursor input. Our contributions are twofold. First, a freeform intent inference technique based on Gaussian mixture autoregression (GMA) was used to predict static targets, dynamic targets, and to distinguish between the two. Second, a cooperative motion planner was used to generate higher quality trajectories by anticipating users’ desired tasks. Results in simulation show improved task performance, and suggest that better task discrimination may yield even further benefits. We are interested in extending our work to handle freeform tasks that are contextualized to the robot’s environment, and to support object manipulation tasks. Finally, in the near future we intend to study the subjective experience of novice users using our system to control a real robot.

APPENDIX

Gaussian Conditioning. If X and Y are jointly normally distributed as follows:

  [X; Y] ∼ N( [μx; μy], [Σx, Σxy; Σyx, Σy] ),      (21)

then the conditional distribution over Y given the value of X is another Gaussian distribution N(μy|x, Σy|x) with

  μy|x = μy + Σyx Σx⁻¹ (x − μx)
  Σy|x = Σy − Σyx Σx⁻¹ Σxy.      (22)

This form is in fact equivalent to the ordinary least-squares fit y = Ax + b + ε with A = Σyx Σx⁻¹, b = μy − Σyx Σx⁻¹ μx, and where ε ∼ N(0, Σy|x) is an error term.

Kalman Update. Given a linear observation model o = Ax + b + ε with ε ∼ N(0, Q), and prior x ∼ N(μ, Σ), the posterior P(x|o) is a Gaussian with mean and covariance

  μx|o = μ + ΣAᵀ C⁻¹ (o − Aμ − b)
  Σx|o = Σ − ΣAᵀ C⁻¹ AΣ      (23)

where C = AΣAᵀ + Q.

REFERENCES

[1] D. Aarno and D. Kragic. Motion intention recognition in robot assisted applications. Robotics and Autonomous Systems, 56:692–705, 2008.
[2] T. Asano, E. Sharlin, Y. Kitamura, K. Takashima, and F. Kishino. Predictive interaction using the delphian desktop. In 18th ACM Symposium on User Interface Software and Technology, pages 133–141, 2005.
[3] H. A. P. Blom and Y. Bar-Shalom. The interacting multiple model algorithm for systems with Markovian switching coefficients. IEEE T. on Automatic Control, 33:780–783, 1988.
[4] J. E. Bobrow, B. Martin, G. Sohl, E. C. Wang, F. C. Park, and Junggon Kim. Optimal robot motions for physical criteria. J. of Robotic Systems, 18(12):785–795, 2001.
[5] R. R. Burridge and K. A. Hambuchen. Using prediction to enhance remote robot supervision across time delay. In Intl. Conf. Intel. Rob. Sys., pages 5628–5634, 2009. ISBN 978-1-4244-3803-7.
[6] E. Frazzoli, M. A. Dahleh, and E. Feron. Real-time motion planning for agile autonomous vehicles. In American Control Conf., volume 1, pages 43–49, 2001.
[7] D. Gossow, A. Leeper, D. Hershberger, and M. Ciocarlie. Interactive markers: 3-d user interfaces for ros applications. Robotics and Automation Magazine, 18(4):14–15, 2012.
[8] K. Hauser. On responsiveness, safety, and completeness in real-time motion planning. Autonomous Robots, 32(1):35–48, 2012.
[9] K. Hauser and V. Ng-Thow-Hing. Fast smoothing of manipulator trajectories using optimal bounded-acceleration shortcuts. In Intl. Conf. Rob. Automation, 2010.
[10] S. Karaman and E. Frazzoli. Incremental sampling-based algorithms for optimal motion planning. In Robotics: Science and Systems (RSS), Zaragoza, Spain, 2010.
[11] D. Kragic, P. Marayong, M. Li, A. M. Okamura, and G. D. Hager. Human-machine collaborative systems for microsurgical applications. Int. J. of Robotics Research, 24(9):731–741, 2005.
[12] D. Lane, S. Peres, A. Sándor, and H. Napier. A process for anticipating and executing icon selection in graphical user interfaces. Int. J. of Human-Computer Interaction, 19(2):241–252, 2005.
[13] E. Lank, Y. Cheng, and J. Ruiz. Endpoint prediction using motion kinematics. In SIGCHI Conf. on Human Factors in Computing Systems, pages 637–646, 2007.
[14] S. M. LaValle. Planning Algorithms. Cambridge University Press, 2006.
[15] S. Petti and T. Fraichard. Safe motion planning in dynamic environments. Intl. Conf. Intel. Rob. Sys., 2(6):2210–2215, 2005.
[16] O. C. Schrempf, D. Albrecht, and U. D. Hanebeck. Tractable probabilistic models for intention recognition based on expert knowledge. In Intl. Conf. Intel. Rob. Sys., pages 1429–1434, Nov. 2007.
[17] W. Yu, R. Alqasemi, R. Dubey, and N. Pernalete. Telemanipulation assistance based on motion intention recognition. In Intl. Conf. Rob. Automation, pages 1121–1126, 2005.
[18] B. Ziebart, A. K. Dey, and J. A. Bagnell. Probabilistic pointing target prediction via inverse optimal control. In Int. Conf. on Intelligent User Interfaces, 2012.
[19] R. Zollner, O. Rogalla, R. Dillmann, and M. Zollner. Understanding users intention: programming fine manipulation tasks by demonstration. In Intl. Conf. Intel. Rob. Sys., volume 2, pages 1114–1119, 2002.

Hybrid Operational Space Control for Compliant
Legged Systems
Marco Hutter, Mark A. Hoepflinger, Christian Gehring, Michael Bloesch, C. David Remy, Roland Siegwart
Autonomous Systems Lab, ETH Zurich, Switzerland, [email protected]

Abstract—This paper introduces the concept of hybrid operational space control, a method that unifies kinematic tracking of
individual joints with an inverse dynamics task space controller
for the remainder of the robot. The proposed control strategy
allows for a hierarchical task decomposition while simultaneously
regulating the inner forces between the contact points. At the
same time it improves fast tracking for compliant systems by
means of appropriate low level position controllers. Introducing
StarlETH, a compliant quadrupedal robot, the applicability
of the controller and the hardware is demonstrated in real-
time simulations and hardware experiments. We perform static
walking in challenging terrain and show how the controller
can combine precise and fast position control with robust and
compliant interaction with the environment.

I. INTRODUCTION

Legged systems should interact softly with their environment to ensure robustness against disturbances such as terrain irregularities or slippage, to protect their hardware from damage through unexpected collisions, and to safely work hand-in-hand with human collaborators. In contrast thereto, precise foot or hand placement as well as exact trajectory execution for robust and versatile walking or reaching requires accurate and fast tracking with their feet and hands. While the first scenario calls for compliant, lightweight, and torque controlled solutions, the latter is more easily fulfilled on stiff systems with good position control performance. Research has made significant progress in both directions, yet the unification in one single device and control framework is still posing fundamental challenges.

Inspired by biology, novel design and actuation concepts have increasingly found their way into robotics. Manipulators such as the WAM arm [2] pushed the state of the art with respect to decreasing the inertia of moving segments by using sophisticated cable pulley systems that allow concentrating all actuators in the robot’s base. A very promising approach to make systems additionally mechanically compliant, back-drivable, and torque controllable is the use of Series Elastic Actuators (SEA) [19]. In this context, ‘compliant’ means that elastic elements decouple the actuator from the joints (as done in the Meka robot arms [23]), therefore protect the actuator and gearboxes from unforeseen collisions, and make the systems inherently safe for interaction with humans. Pushing the state of the art with respect to compliant legged systems, we developed StarlETH (Fig. 1, Section II, [11]), a quadruped robot that combines the advantages of the WAM and Meka arms (lightweight segments, compliant, torque controllable) and that additionally has the capacity to utilize its highly compliant elastic elements to periodically store energy [1] and thereby increase the efficiency of locomotion [4].

Fig. 1. The quadrupedal platform StarlETH was developed to study fast, efficient, and versatile locomotion. It is actuated by highly compliant series elastic actuators in all joints (photo François Pomerleau).

The transition from classical walking machines [5] to such a compliant system is accompanied by a shift from traditional and well elaborated position based control approaches to novel torque control based strategies. In doing so, the ongoing improvement in computational power allows the integration of increasingly complex model based control strategies, primarily based on inverse dynamics. As an example, Sentis et al. [25, 26] elaborated a sophisticated framework for humanoid robot control that included a floating base description with varying support constraints. It was recently released for public domain [18]. Along the lines of Khatib’s seminal work on Operational Space Control (OSC) [12], it extends an inverse dynamics approach with a hierarchical task decomposition [27] in which multiple tasks are executed simultaneously in accordance with well-defined priorities. This results in a very powerful tool for controlling robots with a large number of actuators.

A different approach for inverse dynamics based on orthogonal projection was presented by Mistry et al. [15]. This method avoids the inversion of the inertia matrix, which makes it more robust against model uncertainty [17]. Recently, the same group summarized these different methods for inverse dynamics [21], showing that they are equivalent with respect to the minimization of different cost functions.

This research was supported by the Swiss National Science Foundation through the National Centre of Competence in Research Robotics.

129
Fig. 2. A position tracking task in an inverse dynamics framework requires joint torque controllability. Since an electric actuator with a high gearbox reduction has to be considered as a velocity/position source rather than a torque source, a cascaded control structure is applied, which can lead to a loss of position control performance.

Using OSC techniques has the great advantage that even for highly complex robots the dynamics become controllable in a very intuitive way. By defining the kinematic behavior of a set of distinguished task points and by choosing a task specific stiffness and damping behavior, position control tasks (such as following a certain foot-point trajectory or moving the Center of Gravity (CoG)) are mapped into task space accelerations and subsequently (via inverse dynamics methods) into joint torques. However, such a control framework assumes perfect torque actuators on joint level, which are generally not available in real robotic devices. Common electrical actuators in particular require high gearbox reductions and are consequently not backdrivable. For practical purposes, they must be considered as a velocity rather than a torque source, and only the integration of additional torque sensors at the gearbox output and a cascaded low-level regulator allows making the robot fully torque-controllable (Fig. 2). Since the achievable controller fidelity (in terms of bandwidth and feedback gains) is lowered within every additional loop of the cascaded structure, this setup can greatly limit performance in practice. For a mechanically stiff system (such as, for example, the DLR robot arm [6]) or 'soft' tasks (such as interacting with the environment) this is not too much of a problem, but for mechanically compliant systems (such as the Meka arm or similar robots with a torque bandwidth below 5 Hz [20]) and precise positioning tasks (such as foot placement) this can quickly become critical. Consequently, for purely kinematic tasks, a low level joint position controller that compensates for the known series elasticity (e.g., the LQG (Linear-Quadratic-Gaussian) structure presented in [10]) will always perform better than an inverse dynamics framework in a cascaded setup as depicted in Fig. 2.

We accordingly consider locomotion of compliant legged systems as a hybrid control problem. Parts of the robot that are conducting high performance tracking tasks (in particular swing leg control) will be locally position controlled, while the remainder of the robot will compliantly interact with the environment in an inverse dynamics framework to ensure a robust behavior of the system. In this paper we present an approach for hybrid OSC that combines these two requirements: Similar to [27], all dynamic tasks are brought into a prioritized task space control structure, while certain joints are position controlled based on traditional inverse kinematics control methods. This framework measures and estimates the influence of the position controlled joints on the torque controlled parts such that we can compensate for these effects and achieve exactly the same task space behavior as with a purely torque controlled system, while at the same time tracking performance can be improved. Several particular cases of legged locomotion or robot-human interaction fit perfectly into this scenario. Yet, in this paper we focus on quadrupedal walking using StarlETH in simulations and for experimental validation.

II. HARDWARE DEVELOPMENT

StarlETH (Springy Tetrapod with Articulated Robotic Legs, Fig. 1) is a quadruped robot that was built at the Autonomous Systems Lab to study fast, efficient, and versatile locomotion. The system has a size of 710 × 640 × 580 mm with segment lengths of 200 mm, a total weight of 23 kg (2.4 kg per leg, 13.2 kg for the main body), and 12 actuated degrees of freedom, all driven by highly compliant SEAs. The leg design is based on ScarlETH (Series Compliant Articulated Robotic Leg), a prototype leg that served as a test bench for single legged planar running, to evaluate hardware performance, and to design low level controllers [9, 10]. The main focus in terms of design was put on the highly compliant SEAs: The gearbox outputs of all three degrees of freedom (hip and knee flexion/extension, as well as hip abduction/adduction) are connected through cable pulley systems and chain drives to antagonistically pre-loaded compression springs. This allowed attaching all actuators closely to the main body and hence minimizing the mass and inertia of the moving segments.

Despite the loss in control bandwidth due to the high mechanical compliance, SEAs combine multiple advantages for legged locomotion. In addition to the decoupling mechanism of the elasticity that makes the system robust against impacts and hence well-suited for highly dynamic maneuvers, the spring substantially contributes to the passive dynamics of the system and allows for passive energy storage during stance. In single legged hopping, the amount of energy that is recovered amounts to approximately 70%, while only about 30% of the energy fluctuations actually come from the actuators to compensate for damping and impact losses [9]. Since the springs show a nearly perfect linear characteristic with minimal mechanical damping, accurate torque control can be achieved through deflection control [9]. In addition, it could be shown that an appropriate LQG control structure ensures fast joint tracking [10] with a position control bandwidth (9 Hz) approximately equal to the torque control bandwidth (≈11 Hz) [9]. It actively suppresses undesired oscillations that occur due to the passive dynamics of the spring, providing much better performance than the cascaded structure of Fig. 2.

III. INVERSE DYNAMICS

The equation of motion (EoM) for a walking machine can be stated in the form

M q̈ + b + g + J_s^T F_s = S^T τ,   (1)

with the mass matrix M(q), the Coriolis and centrifugal contribution b(q, q̇), the gravitational component g(q), the ground contact force F_s, its corresponding Jacobian J_s(q) as a function of the generalized coordinates q = [q_b; q_r] ∈ ℝ^n,
and the actuator torque τ ∈ ℝ^{n−6}. The selection matrix S = [0_{(n−6)×6}, I_{(n−6)}] separates the actuated joint coordinates q_r ∈ ℝ^{n−6} from the floating base coordinates q_b ∈ ℝ^6. The ground contact force F_s is the constraining force that appears due to the contact condition

ṙ_s = J_s q̇ = 0,   r̈_s = J_s q̈ + J̇_s q̇ = 0,   (2)

whereby r_s represents a stacked array of all active contact points. In order to eliminate the contact force for inverse dynamics, there exist different approaches which can be regarded as a support null space projection of the dynamics:

P (M q̈ + b + g) = P S^T τ   (3)
P J_s^T = 0   ∀q   (4)

As recently shown by Righetti et al. [21], OSC techniques for floating base systems [27] can be brought to this inverse dynamics description with

P_OSC = S N_s,   (5)

whereby N_s = M^{−1} [I − J_s^T (J_s M^{−1} J_s^T)^{−1} J_s M^{−1}] represents the dynamically consistent null space of the supporting contacts.

Similar to that, the QR decomposition [15] of the support Jacobian, J_s^T = Q [R; 0], with Q^T = Q^{−1} and the upper-right triangular matrix R, in combination with a selection matrix S_u = [0 I], can be brought to this form with

P_QR = S_u Q^T.   (6)

The same works also with a direct kinematic null space projection of the support Jacobian J_s that can be weighted, e.g., with the mass matrix:

P_dir = I − J_s^T (J_s M^{−1} J_s^T)^{−1} J_s M^{−1}.   (7)

The following sections are independent of the choice of this null space projection. Given a desired acceleration, all these methods allow directly calculating the required joint torques via a pseudo-inverse

τ_m = (P S^T)^+ P (M q̈ + b + g),   (8)

whereby the subscript m indicates that the torques induce motion (compare to Section VI). The pseudo-inverse of (8) minimizes the cost function τ_m^T τ_m. Changing the pseudo-inverse to the more generalized form (P S^T)^{W+} = W^{−1} S P^T (P S^T W^{−1} S P^T)^+ [21] allows changing the cost minimization of the inversion to τ_m^T W τ_m, which can account for different segment masses or the available actuator power.

A. Task prioritization

In contrast to the hierarchical task decomposition of [27], we use a purely kinematic task prioritization approach that has the benefit of avoiding any mass matrix inversions.

In general, the total generalized acceleration q̈ can be written as the sum of task-induced accelerations q̈_j projected into the null space N_{j−1} of all higher prioritized tasks

q̈ = Σ_{j=1}^{n} N_{j−1} q̈_j,   (9)

with N_0 being the unit matrix. The relation between joint space q̈ and task space r̈ is given through

r̈_i = J_i q̈ + J̇_i q̇.   (10)

Combining (9) with (10) allows solving for q̈_i:

J_i q̈ = J_i Σ_{j=1}^{i} N_{j−1} q̈_j = r̈_i − J̇_i q̇   (11)

q̈_i = (J_i N_{i−1})^+ (r̈_i − J̇_i q̇ − J_i Σ_{j=1}^{i−1} N_{j−1} q̈_j)   (12)

To ensure that the lower prioritized tasks do not influence higher prioritized tasks, it has to hold that

J_j N_i = 0   ∀j ≤ i.   (13)

This is fulfilled if N_i represents the null space of the stacked Jacobian matrix of all tasks with higher or equal priority:

N_i = N([J_1; ...; J_i])   (14)

In a very general way, the null space N can be found through a singular value decomposition [U, S, V] = svd(J), which shows better numerical stability than a kernel computation using an LU decomposition. Since V defines an orthonormal basis, the null space is defined through the basis vectors that belong to a zero singular value, N = V(:, diag(S) == 0).

The complete procedure can be stated in the following recursive algorithm for task prioritization:

Algorithm 1 Recursive task prioritization for accelerations
  n = number of tasks
  q̈_t = 0   % total acceleration
  N_0 = I   % initial null space
  for i = 1 → n do
    N_i = N([J_1; ...; J_i])
    q̈_i = (J_i N_{i−1})^+ (r̈_i − J̇_i q̇ − J_i q̈_t)
    q̈_t ← q̈_t + N_{i−1} q̈_i
  end for

Note: This section holds for floating as well as for fixed base systems. In the case of floating base systems, the highest priority task must include all support acceleration constraints (2). This is equivalent to the support null space description in the task hierarchy of [25]. Otherwise, the joint accelerations are not consistent with the support constraints.
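As an illustration, Algorithm 1 can be sketched numerically as follows. This is a minimal sketch assuming NumPy; the task Jacobians J_i, their time derivatives, the desired task accelerations r̈_i, and the function names are illustrative placeholders, not part of the original implementation:

```python
import numpy as np

def nullspace_basis(J, tol=1e-10):
    # Columns of V belonging to (numerically) zero singular values span null(J),
    # as described above (SVD is preferred over an LU-based kernel computation).
    U, s, Vt = np.linalg.svd(J)
    rank = int(np.sum(s > tol))
    return Vt[rank:].T                          # shape: n x (n - rank)

def prioritized_accelerations(tasks, dq):
    """Recursive task prioritization following Algorithm 1.

    tasks: list of (J_i, Jdot_i, rdd_i) tuples, highest priority first:
           task Jacobian, its time derivative, desired task acceleration.
    dq:    generalized velocities q_dot.
    Returns the total generalized acceleration consistent with the hierarchy.
    """
    n = dq.shape[0]
    qdd_t = np.zeros(n)                         # total acceleration
    N_prev = np.eye(n)                          # N_0 = I
    J_stack = np.empty((0, n))
    for J, Jdot, rdd in tasks:
        # eq. (12): acceleration of task i resolved in the remaining null space
        qdd_i = np.linalg.pinv(J @ N_prev) @ (rdd - Jdot @ dq - J @ qdd_t)
        qdd_t = qdd_t + N_prev @ qdd_i
        # eq. (14): null space of all tasks with higher or equal priority
        J_stack = np.vstack([J_stack, J])
        N_prev = nullspace_basis(J_stack)
    return qdd_t
```

With generic Jacobians whose stacked rows do not exhaust the degrees of freedom, the returned acceleration reproduces each task acceleration exactly while lower-priority tasks act only in the null space of the higher-priority ones.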
IV. ANALYTICAL GLOBAL KINEMATICS AND DYNAMICS

Model based control requires fast and simple methods to get continuously updated global kinematics and dynamics of the robotic system. In contrast to most approaches where this is provided by a simulation environment, we developed a tool [8] to get the global kinematics and dynamics (1) in an analytical representation. Using the MATLAB Symbolic Toolbox (www.mathworks.com/products/symbolic), a relative kinematic tree of the robot and locally defined force elements acting in the joints are translated into a global kinematic description using Euler rotations. The system dynamics, respectively the EoM in the form of (1), are established using projected Newton-Euler equations:

M = Σ_{i=1}^{N} J_Si^T m_i J_Si + J_Ri^T θ_Si J_Ri   (15)

b = Σ_{i=1}^{N} J_Si^T m_i J̇_Si q̇ + J_Ri^T (θ_Si J̇_Ri q̇ + Ω_i × θ_Si Ω_i)   (16)

g = Σ_{i=1}^{N} −J_Si^T F_gSi   (17)

with the translational Jacobians J_Si = ∂r_Si/∂q evaluated at the CoG of all segments, the rotational Jacobians J_Ri = ∂Ω_i/∂q̇, the rotational speed Ω_i, the mass m_i, the local body inertia matrix θ_Si, and the gravity force vector F_gSi. The code is open source (www.leggedrobotics.ethz.ch/software/proneu) and simple to adapt for every individual or controller specific purpose. Using an analytical representation allows very fast controller routines that are independent of a particular simulation environment.

V. HYBRID OPERATIONAL SPACE CONTROL

For high performance joint position or end-effector tracking, it is often beneficial to rely on inverse kinematics in combination with fast low-level position controllers for certain joints, while major parts of the system undergo a hierarchical task decomposition with inverse dynamics as presented in Section III. A classic example for such a scenario is quadrupedal walking as depicted in Fig. 3. The legs that are in ground contact (solid lines) remain torque controlled based on inverse dynamics to move/stabilize the main body and to modify contact forces, while fast and precise tracking of the swing foot (dotted line) is considered as an inverse kinematics problem. There are no direct means of regulating the acceleration of these position controlled joints, q̈_k = S_k q̈. For lightweight segments and slow maneuvers, coupling effects of the swing leg motion on the system dynamics are negligible (Section V-A). For faster maneuvers, it is required to compensate for these effects by measuring (Section V-B), estimating (Section V-C), or predicting (Section V-D) the corresponding joint acceleration q̈_k and to consider it as an additional task in Algorithm 1.

Fig. 3. Walking on a tree stem requires precise foot tracking, CoG stabilization, and internal force regulation to avoid slipping. For compliant systems this is considered as a combination of kinematic position tracking (swing leg) and inverse dynamics control with swing leg compensation.

A. No compensation

Since in most cases only lightweight end-effectors require accurate position tracking (walking, grasping, etc.), the corresponding influence on the remainder of the body can be neglected (q̈̂_k = 0). This method is appropriate for slow maneuvers.

B. Acceleration measurement

Using a joint encoder, the acceleration can be measured through double differentiation of the position signal (q̈̃_k). While being theoretically the best solution, this method is inapplicable in practice due to sensor noise.

C. Acceleration estimation

Torque sensing in SEAs is mostly done through a deflection (position) measurement of the series spring. Since this large compliance acts as a mechanical low-pass filter, torque signals τ̃ are in general very accurate and clean [10]. Hence, the joint acceleration can be estimated as

q̈̂_k = S_k (P M)^+ P (S^T τ̃ − b − g).   (18)

D. Acceleration prediction

In applications as presented here, only end-effector joints will be position controlled. Therefore, all other objectives in the dynamic task prioritization defined in Section III-A are independent of the position controlled joints (S_k J_i^T = 0). This relation allows encapsulating the EoM for the position controlled joints through pre-multiplying (1) with S_k (selecting only the dynamics of the position controlled joints) and through
inserting the separation q̈ = S_d^T q̈_d + S_k^T q̈_k of torque (S_d) and position (S_k) controlled joints:

q̈̂_k = (S_k M S_k^T)^{−1} (τ̃_k − S_k M S_d^T q̈_d − S_k (b + g))   (19)

In contrast to the estimation approach, the prediction of the acceleration q̈̂_k is purely based on swing leg torque measurements τ̃_k in combination with the desired joint acceleration q̈_d given by the task prioritization.

VI. GROUND CONTACT FORCES

In addition to the prioritized motion control of the task points, the presented framework allows advanced manipulation of the contact forces of multi-contact systems [22] via the null space of the pseudo-inversion (8), which represents the internal forces F_s0 that do not influence the task motion:

τ = τ_m + N(P S^T) τ_0 = τ_m + N_P τ_0   (20)

τ_0 allows augmenting the total contact forces by

F_s0 = (J_s^T)^+ S^T N_P τ_0 = A τ_0,   (21)

where A spans the subspace of F_s0 and rank(A) gives the number of internal directions. This allows modifying the total contact forces to

F_s = F_sm + F_s0,   (22)

with the motion implied ground contact forces F_sm given through

F_sm = (J_s^T)^+ (S^T τ_m − (M q̈ + g + b)).   (23)

There are several interesting and relevant cases where the contact forces can be adapted to minimize a certain cost criterion:

minimize_{τ_0} f(F_sm + A τ_0).   (24)

A. No internal forces

The system has no internal forces, which is equivalent to minimal total ground contact forces:

minimize_{τ_0} ‖F_sm + A τ_0‖^2   (25)

This is a very common approach for minimizing slippage when walking on flat ground [28]. Given that the pseudo-inverse solves the least square error problem, the solution is

τ_0 = −A^+ F_sm.   (26)

B. Preferred normal force directions

For walking in rough terrain it becomes very important to align all contact forces with the local surface normal directions n_i ∈ ℝ^{3×1}, respectively to minimize the forces in the corresponding tangential planes t_i ∈ ℝ^{3×2}. Using a tangential plane selection matrix D and a geometric weighting factor α_i that allows accounting for the stance geometry, the minimization problem (25) can be written as

minimize_{τ_0} ‖D_T (F_sm + A τ_0)‖^2   (27)

with the tangential projection matrix

D_T = blockdiagonal(α_i t_i^T).   (28)

The corresponding solution is:

τ_0 = −(D_T A)^+ D_T F_sm.   (29)

C. Constrained minimization problem

Ideally, one would like to adjust the ground contact forces in a way that the maximally required friction coefficient μ_max = ‖F_si^t / F_si^n‖_∞ is kept small, whereby the superscripts n and t indicate the normal respectively tangential force directions of the grounded legs i ∈ {1...4}. This can be either seen as a minimization of the friction coefficients under torque constraints:

minimize_{τ_0} μ_max(τ_0)
subject to τ_min < τ = τ_d + N_P τ_0 < τ_max,   (30)

or as a minimization of the joint torques while a certain safety against slipping has to be ensured:

minimize_{τ_0} (τ_m + N_P τ_0)^T W (τ_m + N_P τ_0)
subject to μ_max(τ_0) < μ_safety   (31)

D. Selective force manipulation

Let S_leg be a selection matrix of certain contact forces such that F_des = S_leg F_s can be freely chosen. In this case, τ_0 is given through

τ_0 = (S_leg A)^+ (F_des − S_leg F_sm).   (32)

As long as the matrix S_leg A has full rank, the desired forces are feasible; otherwise this results in the least square minimization of ‖S_leg F_s − F_des‖^2. This approach can be used to vary the force distribution between the different legs to achieve both smooth contact force and joint torque profiles. This is important for quadrupedal walking (Section VII-D2) when the support changes.

VII. SIMULATION

To demonstrate the performance of this control framework, we accomplish a static walking test on a tree stem (Fig. 3). This requires i) precise control of the swing foot trajectories, which is considered as a kinematic control task, ii) CoG control to ensure stability based on a hierarchical OSC implementation, and iii) internal force regulation to avoid slipping on the tree stem. The simulations are conducted both in an idealized MATLAB simulation with hard point contacts (impact at landing) as well as in a real-time control and simulation environment (Section VIII-A) that closely fits the actual system.

A. Foothold and CoG trajectories

The robot periodically executes a constant footfall sequence [14]: left-hind (LH), left-front (LF), right-hind (RH), right-front (RF). The way-points for the main body are planned by intersecting subsequent support triangles with a predefined safety margin for three steps ahead such that the body motion is minimized. Given the support center before stepping r_c^− =
mean(r_Fi^−) (the index − indicates before the step), the next contact position r_Fi^+ of leg i is given through r_Fi^+ = r_c^+ + r_s + r_Oi with the step vector r_s and the constant offset value r_Oi that describes the leg configuration with respect to the contact center in a predefined home position. As shown in [13], this converges to a symmetric pattern that staggers the footsteps in an optimal manner. The swing leg trajectory is defined through

r_Fi(t) = r_Fi^− + (r_Fi^+ − r_Fi^−)(1/2 − 1/2 cos(π t/t_step)),

whereby ground clearance is ensured through a superimposed vertical motion h_Fi(t) = h/2 − h/2 cos(2π t/t_step).

Fig. 4. Despite the low leg inertia, fast kinematic tracking of the swing leg introduces increased errors on the main body tracking. Through measuring joint accelerations or torques, these effects can be compensated.

Fig. 5. a) Required friction coefficient of different force distribution methods in the tree stem experiment: the best results were achieved with the normal least-square solution (black solid, Section VI-B) and the constrained minimization problem (gray dashed, Section VI-C). Both the joint torque minimization (black dotted, τ_0 = 0) and the contact force minimization (gray dash-dotted, Section VI-A) show unsatisfying results. b) Using (33), the contact force and joint torques are both smooth without any discontinuities.

B. Task hierarchy with kinematic foot tracking

To ensure stability, the highest priority task is the CoG or main body base (r_b, J_b) control. In addition to that, the swing leg contact point has to follow the predefined step trajectory (r_f, J_f), which is considered as a position control task, while the base orientation should be stabilized (r_o, J_o). The last task (although in this special case not necessary) in the hierarchy is always a null space joint damping or posture control (r_j, J_j). The desired task accelerations for the base position as well as the base orientation are determined based on a PD control law r̈_t = k_t (r_des − r_t) + d_t (ṙ_des − ṙ_t) + r̈_des with relatively low gains to achieve a robust and compliant behavior.

C. Compensation for swing leg motion

To demonstrate the applicability of the hybrid setup (Section V) and the need for swing leg compensation, a series of flat ground walking simulations in the real-time environment was conducted over a fixed distance (2 m). Thereby, only the swing duration of every step was modulated in the range of 0.2 s to 1.0 s, and the average absolute tracking error of the base position during the swing phase was recorded. The results depicted in Fig. 4 are as expected: While for slow tracking the error of assuming zero joint acceleration (Section V-A, crosses) is negligible due to the very low leg inertia, it becomes crucial as the swing time is shortened. All other methods (Sections V-B to V-D) perform equally well and (nearly) independent of the swing leg duration.

D. Contact force optimization

Optimizing the contact forces without influencing the task tracking performance (Section VII-A) has two main purposes. First, the forces should be optimally aligned with the surface normals, such that the risk of slipping and losing ground contact can be minimized (Section VII-D1). Second, the contact force distribution has to be changed to get smooth transitions before and after lift-off and landing of each leg (Section VII-D2).

1) Force alignment: In Section VI we presented three different methods to change the ground contact force distribution. The least square solutions to minimize the total force (26) and to minimize the tangential forces (29) can be directly implemented using a pseudo-inverse. The constrained minimization (30) is implemented using an SQP (sequential quadratic programming) solver which is initialized with the least-square solution (29). These last two methods both require knowledge about the ground contact surface normal directions. Various studies have shown that relatively simple methods allow reliable foot surface shape detection based on haptic [7] or on visual feedback [16]. In simulations of a static gait executed on a tree stem, the required friction coefficient can be significantly lowered by taking the local surface normal directions into account (Fig. 5a). The constrained optimization (gray dashed) distributes μ equally on all legs. The normal direction weighted least square solution (black solid) performs almost as well with significantly less computational effort.

2) Load distribution: During the CoG shifting phase from support triangle i to j (where i and j stand for two successive 3-sets of the 4 legs), the contact force is linearly modulated with the relative projected distance x ∈ [0...1] given through x = ((r_b − r_0) · n_01) / ((r_1 − r_0) · n_01) with the unit direction vector n_01 = (r_1 − r_0) / ‖r_1 − r_0‖. This results in a desired ground contact force

F_desired = { 2x F_s^{4} + (1 − 2x) F_s^{i}    x ≤ 0.5
            { 2(1 − x) F_s^{4} + (2x − 1) F_s^{j}    x > 0.5.   (33)

Without this contact force optimization (τ_0 = 0, black dotted line), the contact force (Fig. 5b) as well as the required joint torques show significant discontinuities when the contact situation changes. Adjusting τ_0 according to (33) allows making these transitions perfectly smooth.
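To illustrate the two contact-force adjustments, a small sketch of the tangential-force alignment (27)-(29) and the support-transition blending (33), assuming NumPy; the internal-force map A, the motion-implied forces F_sm, the contact normals, and all function names are illustrative placeholders:

```python
import numpy as np

def tangent_basis(n):
    # 3x2 basis t_i of the plane orthogonal to the contact normal n_i.
    n = n / np.linalg.norm(n)
    U, s, Vt = np.linalg.svd(n.reshape(1, 3))
    return Vt[1:].T                              # columns span the tangent plane

def align_internal_forces(A, F_sm, normals, alpha=None):
    """Internal forces minimizing tangential contact forces, eqs. (27)-(29)."""
    alpha = np.ones(len(normals)) if alpha is None else alpha
    # D_T = blockdiagonal(alpha_i * t_i^T), eq. (28)
    D_T = np.zeros((2 * len(normals), 3 * len(normals)))
    for i, (a, n) in enumerate(zip(alpha, normals)):
        D_T[2 * i:2 * i + 2, 3 * i:3 * i + 3] = a * tangent_basis(n).T
    # eq. (29): tau_0 = -(D_T A)^+ D_T F_sm
    return -np.linalg.pinv(D_T @ A) @ (D_T @ F_sm)

def blended_support_force(x, F_four, F_tri_i, F_tri_j):
    """CoG-shift blending of eq. (33) between support triangles i and j."""
    if x <= 0.5:
        return 2 * x * F_four + (1 - 2 * x) * F_tri_i
    return 2 * (1 - x) * F_four + (2 * x - 1) * F_tri_j
```

By construction, the least-squares choice of τ_0 can only reduce the tangential residual ‖D_T(F_sm + A τ_0)‖ relative to τ_0 = 0, and the blend reproduces the triangle-i distribution at x = 0, the four-leg distribution at x = 0.5, and the triangle-j distribution at x = 1.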
Fig. 6. (a) Step answer: The position control performance with a fast low-level LQG position controller based on an SEA model is significantly better than with the equivalent cascaded controller. (b) Linear chirp time series plotted against frequency: The position control bandwidth drops for large amplitudes due to saturation effects.

Fig. 7. The simulation and experimental results using exactly the same hierarchical OSC controller implementation with slippage minimization for flat ground agree to a large extent in terms of the base trajectory (a) and the quite accurate foot position tracking (b).

VIII. REAL-TIME IMPLEMENTATION AND HARDWARE EXPERIMENTS

A. Controller framework and real-time simulation

The presented controller framework is based on analytical kinematics and dynamics. This ensures a very clear separation between the controller and the simulation environment respectively the actual hardware [11]. The simulator part consists of the real-time multi-body-system simulation environment SL [24] that communicates through shared memory with the controller interface. The controller contains the analytical global kinematics and dynamics (Section IV), a realistic state estimator [3], and actuator/sensor models. A change in the communication layer allows switching to the actual robotic system or even running both simultaneously. Four parallel operating CAN bus systems connect to the low level motor controllers and sensor boards. An inertial measurement unit containing an accelerometer and a gyroscope delivers information about the current state of the main body. These signals are fused with the leg kinematics (contact conditions) to give precise information about the main body position/orientation as well as about the ground elevation [3].

In addition to speed, simplicity of the controller implementation, and independence of the simulation environment, using such a setup has several benefits. Probably the most important advantage when working with model based systems is that the dynamics behind the simulation are not exactly equal to the analytical model. Differences mainly arise due to the sensor/actuator models (noise, filtering, bandwidth/saturation limitations) and state estimation, different handling of contact interaction, or modeling errors/uncertainties (e.g. point feet). Thereby, plant inversion problems that are often hidden when simulation and control model are exactly equal can be avoided. Using this framework allowed a successful validation and robustness check of the controller properties.

B. Low-level position control

As motivated through rather theoretical considerations (Fig. 2), a low-level position controller performs better than the corresponding cascaded structure. This fact was investigated in joint step input experiments. As depicted in Fig. 6a, the LQG [10] structure shows the same rise time (saturation effects) as the cascaded structure but benefits from a much better damping behavior and a shorter settling time. Analyzing the linear chirp signal (Fig. 6b) shows a bandwidth of close to 10 Hz for the linear system, dropping down to about 4 Hz for very large amplitudes (0.6 rad) due to saturation effects.

C. Simultaneous walking and contact force optimization

To prove the applicability of the proposed methods, we finally performed a series of experiments with StarlETH. Static walking on flat terrain including ground contact force optimization showed qualitatively very good performance using the hybrid structure described in Section V with a combination of low level torque [9] and position [10] controllers. These experiments largely agree with the corresponding simulation of the same controller (Fig. 7a). In addition, the swing leg position tracking based on the low level position controller performs well in both the simulation and the experiment (Fig. 7b). The signals differ since the estimator [3] predicted a minimal ground elevation.

The contact force modulation (Section VI) was tested on a surface with approximately 40° inclination (Fig. 8). Ground truth data of the contact slippage was obtained through vision based foot point tracking in the movies collected during the experiments. As shown in the foot position plot of the left-front leg in Fig. 9, activating the contact force alignment significantly lowers slippage. Without an optimal force distribution as a function of the local contact normals, the system fails after a single step sequence (dotted), while adjusting the internal forces using the least square method (29) allows it to robustly walk on the tree stem (solid).

IX. CONCLUSION

There are two primary contributions of this paper. First of all, it introduces StarlETH, a compliant quadrupedal robot that was built to study fast, efficient, and versatile locomotion. Highly compliant series elastic actuation makes the system inherently safe for human and environment interaction, enables the temporary storage of energy, and allows for precise torque control. In return, torque bandwidth limitations arise due to the high compliance and call for appropriate control techniques to ensure fast and precise joint tracking [10]. The introduction
Fig. 8. StarlETH was successfully tested in experiments that required alignment of the contact forces with the actual ground contact normals to avoid slippage.

[Figure 9 plot: left-front (LF) foot position z [m] during the tree stem walking experiment over 0–18 s, comparing slippage with the local surface normal (solid) against failure without it (dotted).]
Fig. 9. Walking on the tree stem is only possible through accounting for local surface normal directions (solid). Using a flat ground assumption (dotted), the system fails after one full step sequence.

of a hybrid operational space control technique makes it possible to combine the advantages of such local joint position control with global inverse dynamics methods. In the example of quadrupedal walking, it was shown that the application of this method ensures fast tracking of the swing leg (position control) while the rest of the robot is undergoing a hierarchical task decomposition (similar to [27]) that utilizes inverse dynamics to ensure a robust and compliant behavior of the entire system. Additionally, we showed in simulations and hardware experiments how the contact forces can be aligned to the local surface geometry without altering or impairing the motion. This greatly reduces the risk of slippage, even in highly challenging terrain (Fig. 8).

REFERENCES

[1] R. M. Alexander. 3 uses for springs in legged locomotion. Int. Journal of Robotics Research, 9(2):53–61, 1990.
[2] Barrett Technology, Inc. WAM arm, 2010. URL http://www.barrett.com/robot/products-arm.htm.
[3] M. Bloesch, M. Hutter, M. H. Hoepflinger, C. D. Remy, C. Gehring, and R. Siegwart. State estimation for legged robots - consistent fusion of leg kinematics and IMU. In Robotics: Science and Systems (RSS), 2012.
[4] G. A. Cavagna, F. P. Saibene, and R. Margaria. Mechanical work in running. J Appl Physiol, 19(2):249–256, 1964.
[5] P. Gonzalez de Santos, E. Garcia, and J. Estremera. Quadrupedal locomotion: an introduction to the control of four-legged robots. Springer, 2006.
[6] G. Hirzinger, A. Albu-Schaffer, M. Hahnle, I. Schaefer, and N. Sporer. On a new generation of torque controlled light-weight robots. In Int. Conf. on Robotics and Automation (ICRA), 2001.
[7] M. H. Hoepflinger, C. David Remy, M. Hutter, L. Spinello, and R. Siegwart. Haptic terrain classification for legged robots. In Int. Conf. on Robotics and Automation (ICRA), 2010.
[8] M. Hutter, C. Gehring, and R. Siegwart. proNEu: Derivation of analytical kinematics and dynamics. Technical report, Autonomous Systems Lab, ETHZ, 2011.
[9] M. Hutter, C. David Remy, M. H. Hoepflinger, and R. Siegwart. High compliant series elastic actuation for the robotic leg ScarlETH. In Int. Conf. on Climbing and Walking Robots (CLAWAR), 2011.
[10] M. Hutter, C. David Remy, M. H. Hoepflinger, and R. Siegwart. ScarlETH: Design and control of a planar running robot. In Int. Conf. on Intelligent Robots and Systems (IROS), 2011.
[11] M. Hutter, C. Gehring, M. Bloesch, M. H. Hoepflinger, C. David Remy, and R. Siegwart. StarlETH: a compliant quadrupedal robot for fast, efficient, and versatile locomotion. In Int. Conf. on Climbing and Walking Robots (CLAWAR), 2012.
[12] O. Khatib. A unified approach for motion and force control of robot manipulators: The operational space formulation. IEEE Journal of Robotics and Automation, 3(1):43–53, 1987.
[13] J. Z. Kolter and A. Y. Ng. The Stanford LittleDog: A learning and rapid replanning approach to quadruped locomotion. The Int. Journal of Robotics Research, 30(2):150–174, 2011.
[14] R. B. McGhee. Some finite state aspects of legged locomotion. Mathematical Biosciences, 2:67–84, 1968.
[15] M. Mistry, J. Buchli, and S. Schaal. Inverse dynamics control of floating base systems using orthogonal decomposition. In Int. Conf. on Robotics and Automation (ICRA), 2010.
[16] N. J. Mitra, A. Nguyen, and L. Guibas. Estimating surface normals in noisy point cloud data. Int. Journal of Computational Geometry & Applications, 14(4-5):261–276, 2004.
[17] J. Nakanishi, R. Cory, M. Mistry, J. Peters, and S. Schaal. Operational space control: A theoretical and empirical comparison. The Int. Journal of Robotics Research, 27(6):737–757, 2008.
[18] R. Philippsen, L. Sentis, and O. Khatib. An open source extensible software package to create whole-body compliant skills in personal mobile manipulators. In Int. Conf. on Intelligent Robots and Systems (IROS), 2011.
[19] G. Pratt and M. Williamson. Series elastic actuators. In Int. Conf. on Intelligent Robots and Systems (IROS), 1995.
[20] M. Quigley, A. Asbeck, and A. Ng. A low-cost compliant 7-DOF robotic manipulator. In Int. Conf. on Robotics and Automation (ICRA), 2011.
[21] L. Righetti, J. Buchli, M. Mistry, and S. Schaal. Inverse dynamics control of floating-base robots with external constraints: A unified view. In Int. Conf. on Robotics and Automation (ICRA), 2011.
[22] L. Righetti, J. Buchli, M. Mistry, and S. Schaal. Control of legged robots with optimal distribution of contact forces. In Int. Conf. on Humanoid Robots (Humanoids), 2011.
[23] Meka Robotics. A2 compliant arm, 2011. URL http://www.mekabot.com/arm.html.
[24] Stefan Schaal. The SL simulation and real-time control software package. Technical report, USC, 2009.
[25] L. Sentis. Synthesis and Control of Whole-Body Behaviors in Humanoid Systems. Ph.D. thesis, 2007.
[26] L. Sentis. Compliant Control of Whole-Body Multi-Contact Behaviors in Humanoid Robots. Springer Global Editorial, 2009.
[27] L. Sentis and O. Khatib. Control of free-floating humanoid robots through task prioritization. In Int. Conf. on Robotics and Automation (ICRA), 2005.
[28] D. Zhou, K. H. Low, and T. Zielinska. An efficient foot-force distribution algorithm for quadruped walking robots. Robotica, 18(04):403–413, 2000.
Modeling and Prediction of Pedestrian Behavior
Based on the Sub-Goal Concept
Tetsushi Ikeda, Yoshihiro Chigodo, Daniel Rea, Francesco Zanlungo, Masahiro Shiomi, Takayuki Kanda
Intelligent Robotics and Communication Laboratories, ATR, Kyoto, Japan
E-mail: [email protected]
Abstract—This study addresses a method to predict pedestrians' long-term behavior in order to enable a robot to provide them services. In order to do that, we want to be able to predict their final goal and the trajectory they will follow to reach it. We attain this task borrowing from human science studies the concept of sub-goals, defined as points and landmarks of the environment towards which pedestrians walk or where they take directional choices before reaching the final destination. We retrieve the position of these sub-goals from the analysis of a large set of pedestrian trajectories in a shopping mall, and model the pedestrians' global behavior through transition probabilities between sub-goals. The method allows us to predict the future position of pedestrians on the basis of the observation of their trajectory up to the present moment.¹

Keywords—pedestrian models; sub-goal retrieval; behavior anticipation

¹ This research was supported by JST, CREST.

Figure 1. Sub-goal: a point in space pedestrians are walking towards. The sub-goals in the picture are retrieved with our method; see figure 10.

I. INTRODUCTION

A robot operating in an environment where a significant number of pedestrians is present, such as a shopping mall, needs a considerable amount of information about the current and general behavior of pedestrians in the environment. For the task of navigation, besides the obvious need to know the positions and velocities of nearby pedestrians for local planning, the global planner can also take advantage of prediction of human behavior. Knowledge of the current pedestrian density at larger distances, the average behavior of pedestrians in the environment, and even the use of a pedestrian simulator to predict the future behavior of the crowd as a whole may prevent the robot from ending up in highly populated areas where collision avoidance could not be performed effectively. To provide services such as guiding [1,2], surveillance, shopping assistance [3], giving information [4,5,6], or assisting people who get lost, the ability to recognize behavioral patterns and predict future behavior and position is needed.

In this work we tackle the problem of position prediction, which is of particular importance for a mobile robot providing services in a shopping mall. The ability to predict the position of pedestrians beyond velocity projection will enhance the collision avoidance of the robot [7], and enable it to approach humans to talk with them and provide services [8,9].

In particular, we develop an approach that gives us information useful not only for prediction but also for robot navigation, environment modeling and pedestrian simulation. To do that we use the concept of sub-goals, i.e. we assume that pedestrians have the tendency to segment the walking route to their final destination in a polygonal line passing through some points known as sub-goals [10,11] (Fig. 1). We assume that the position of these sub-goals is to a certain extent common to all pedestrians, i.e. that they are determined mainly by the environment. We retrieve the position of these sub-goals from the observation of a large number of trajectories, and build a probabilistic transition model between sub-goals that is used for motion prediction. The sub-goals could be used both as the nodes of the robot's global path planner and as the nodes of the planner in a pedestrian simulator, introducing a common knowledge of the environmental features based on the actual motion of pedestrians.

II. RELATED WORK

The simplest method to predict the motion of a pedestrian is to perform a velocity-based linear projection, i.e. to assume that people keep moving with a constant speed. This is known to be a reasonable approximation for short-term behavior, used for collision avoidance [12,13] and tracking algorithms [14]. The method does not need any knowledge about the environment or the previous trajectory, but it is not reliable for long-term prediction, since it ignores environmental constraints and influences.

In order to predict the future position of a walker using information based on the previously observed behavior of a large number of pedestrians, the motion can be modeled through transitions among grids or nodes [15,16]. Recent approaches often retrieve typical transition patterns (pattern-based approaches) in order to also use knowledge about the previous behavior of the pedestrian whose motion is being predicted [7]. The future course of a newly observed trajectory is estimated with transition patterns that resemble the new trajectory [17].

Pattern-based methods use a discrete space description. Usually, both for robot navigation and pedestrian simulation, a continuous description in which the nodes of the global path planner are given at arbitrary distance is preferred. The usual
Figure 2. Model of a global behavior: a pedestrian follows a series of sub-goals while avoiding collision with others.

way to do that in robotics is to build a topological map from the geometrical features of the environment. These topological methods do not use information coming from human behavior, and though effective for navigation, they do not provide any knowledge about human behavior to the robot system [18]. Bennewitz et al. [7] use a stop point approach, i.e. they retrieve the position of the nodes based on observation of human behavior, computing them as the places where people usually stop. This approach could be limited in the description of an environment in which people take decisions or just change direction without stopping. Once the graph is built with a topological or human-motion based approach, the nodes are used as the states of a hidden Markov model, and the transition probabilities are used to model and predict human behavior.

In this work, in order to build the nodes of the transition model, we use the social science concept of "sub-goal". The route of pedestrians can be modeled as a succession of decision points, which are influenced by the structure of the environment, for example by the presence of landmarks and architectural features [10,11,19]. By learning from people's trajectories, we retrieve the sub-goals from the directions pedestrians mostly head to. To reduce the effect of local dynamics, this direction to the sub-goal is computed by subtracting the effect of local collision avoidance as predicted by the Social Force Model specification introduced in [20].

III. OVERVIEW OF THE MODEL OF PEDESTRIAN BEHAVIOR

This section provides a short overview of the model of pedestrian behavior we propose (figure 2).

A. Final destination (goal)

We assume that pedestrians move towards a final destination, their goal. In an ideally uniform environment, without any kind of fixed or moving obstacle, or different architectural features and the like, pedestrians would move straight to their goal. In a real environment, the goal is usually not reachable on a straight path due to the presence of fixed obstacles, and paths different from the straight one could be preferred due to ease of walking (nature of terrain, architectural features, crowding).

B. Influence of the environment (sub-goals)

We assume that in a real environment pedestrians try to divide their path (global path planning) into straight segments joining their actual position with the goal [21], and we assume the nodes of these segments to be common to all pedestrians, i.e. to be determined only by the nature of the environment. Since the path to the next node is straight, these nodes are called sub-goals. These sub-goals can be determined by the presence of a visual landmark, and pedestrians may switch to their new sub-goal when it is visible to them, even before reaching the one they were walking to.

C. Influence from other pedestrians (local behavior)

Deviations from the straight path to the next sub-goal are due to the collision avoiding (local path planning) behavior. This interaction can be expressed in a simple mathematical form by applying the Social Force Model [22] (SFM), which expresses the acceleration of pedestrians as

    a(t) = (v_p − v(t)) / τ + Σ_j f_j(t)        (1)

Here v_p stands for the preferred velocity of the pedestrian, whose magnitude is determined by the most comfortable speed for the pedestrian, and whose direction is given by the unit vector pointing from the current position to the next sub-goal; τ is the time scale for the pedestrian to recover her preferred velocity, while f_j is the collision avoiding behavior towards pedestrian j (the force depends on the relative positions and velocities of the pedestrians). Different specifications of f have been proposed, and we adopt the specification and parameters proposed in [19], which have been shown to best describe pedestrian behavior in the SFM framework. We use the SFM framework because eq. (1) can be easily rearranged as

    v_p = v(t) + τ ( a(t) − Σ_j f_j(t) )        (2)

in order to obtain v_p, i.e. the direction to the next goal, from pedestrian positions, velocities and accelerations.

Figure 3. Overview of computation in the proposed method.

IV. PREDICTION ALGORITHM

A. Overview

Figure 3 shows the architecture of the proposed method, which consists of an offline analysis step and an online prediction step. In the offline analysis step, from the positions and velocities of a large number of pedestrians we compute the
preferred velocities using eq. (2). Then the sub-goals in the environment are estimated by extracting points toward which many pedestrians head. In the proposed model, each pedestrian is considered to head towards one of the sub-goals and sometimes to switch to a different one. So we represent each trajectory by a sequence of sub-goals, and obtain a probabilistic sub-goal transition model from the observed trajectories.

In the online step, after observing the initial portion of a trajectory, we compute the preferred velocity with eq. (2). On the basis of the preferred velocity and sub-goal sequence of a pedestrian up to the moment, we estimate future positions using the probabilistic transition model.

B. Estimating sub-goals

After collecting a large amount of trajectories in the environment and computing the preferred velocities, we divide the environment into a discrete grid and study the distribution of preferred directions for each grid cell. We retrieve sub-goals as points towards which a large number of directions converge, and in order to find them we define a "sub-goal" field extending the direction probabilities from each grid cell and defining the sub-goal set as the set of maxima of the sub-goal field.

1) Computing Directions of Pedestrian Motion on a Grid

We discretize the environment using a square grid with linear dimension 0.5 meters, which roughly corresponds to the size of a human being, and thus to the scale at which we can expect behavioral variation. For each grid cell the distribution of preferred directions θ_p is obtained from the velocities v_p. The distribution of directions at each grid square may consist of several different components. Figures 4(a), (b) show the observed direction distribution in a grid square. We call each component a flow, and we model its angle distribution using a von Mises distribution [23],

    M(θ | μ, σ) = exp(κ cos(θ − μ)) / (2π I_0(κ)),   κ = 1/σ²        (3)

where μ is the mean, σ is the variance and I_0(κ) is the 0-order modified Bessel function of the first kind [24]. Eq. (3) gives the approximation of a Gaussian wrapped on the circle.

Figure 4. Observed trajectories and the estimated flow directions in a grid square. (a) Observed trajectories of pedestrians that go across a grid square. (b) The solid line shows the histogram of directions θ, and the dashed lines show each component in the estimated von Mises mixture distribution. (c) Estimated major flow directions in the grid square. The directions of the arrows correspond to the means of the estimated von Mises distributions in (b).

Figure 5. Extension of flows from grid cells. (+) represents a point where two flows cross, which is a position many people head toward and is a sub-goal candidate.
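The density of eq. (3) can be evaluated directly; a minimal sketch follows (the function names are illustrative assumptions, and np.i0 is NumPy's order-0 modified Bessel function of the first kind):

```python
import numpy as np

def von_mises_pdf(theta, mu, sigma):
    """Eq. (3): von Mises density with concentration kappa = 1 / sigma**2."""
    kappa = 1.0 / sigma ** 2
    return np.exp(kappa * np.cos(theta - mu)) / (2.0 * np.pi * np.i0(kappa))

def flow_contribution(x, c, mu, sigma):
    """Eq. (4): contribution of a flow defined at grid centre c to the
    sub-goal field at point x, with theta = arg(x - c)."""
    dx, dy = x[0] - c[0], x[1] - c[1]
    return von_mises_pdf(np.arctan2(dy, dx), mu, sigma)
```

Summing `flow_contribution` over a set of flows gives the scalar sub-goal field described in the text, whose maxima are the sub-goal candidates sketched in figure 5.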
Subsequently we estimate each component using an EM algorithm. We first discretize and smooth the direction distribution on the grid cell, and then compute the number of local maxima of this distribution. We assume that to each local maximum corresponds a flow, and fix the parameter μ to the position of the local maximum, and σ to the width of the peak around the maximum. After this initialization process, each pedestrian direction is associated to one of the von Mises mixture components according to the probability the component assigns to that direction. Then each distribution component is estimated based on the assigned pedestrian directions. This process is repeated until the estimated components have converged [25].

This process defines the set of flows F = {M_j}, i.e. the set of all the von Mises components found in the environment, each one characterized by its μ and σ values. Figure 4(c) shows the estimated flows detected in an observed area.

Figure 6. Contribution of a flow to the field φ(x).

2) Estimating Sub-goals from Motion Directions

By extending flow directions from each grid cell, we can define a scalar "sub-goal" field on the environment. As shown in figure 5 for the case of two grid cells, a maximum of this field is a place in the environment towards which many people head, i.e. a sub-goal. We define the scalar field generated by the flow M_j in point x as

    φ_j(x) = M_j(θ | μ, σ) ;   θ = arg(x − c)        (4)
where μ and σ are the parameters of the flow according to eq. (3), θ is the direction of the displacement vector from the grid center c, where the flow is defined, to x (Figure 6), and arg() returns the angle of the given vector. The scalar field generated by a sub-set of flows F_i ⊆ F is defined as

    φ(x; F_i) = Σ_{j : M_j ∈ F_i} φ_j(x)        (5)

and, following the example with two flows of figure 6, a sub-goal s_i generated by a flow set F_i is the maximum of field (5),

    s_i = argmax_x φ(x; F_i).        (6)

Equation (6) defines a set S = {s_i} of sub-goals given a partition of flows {F_i}, with F_i ∩ F_j = ∅ and ∪_i F_i = F. At the same time, we define a flow partition from a sub-goal set by saying that

    F_i = {M_j | s_i = argmax_{s_k ∈ S} φ_j(s_k)}.        (7)

In the following we assume N sub-goals to be present in our environment, and thus the set of flows F to be divided into N subsets F_i. Our algorithm applies eqs. (6) and (7) iteratively in order to find the flow partition {F_i} and sub-goal set S that maximize

    Σ_{s_i ∈ S} φ(s_i; F_i)        (8)

The algorithm is initialized by searching for sub-goals on the discrete grid the flows are defined on. At the beginning the flow partition consists of the whole set, F_1 = F, and the first sub-goal s_1 is the maximum on the grid of φ(x; F). The second sub-goal s_2 is obtained as the point on the grid that, after obtaining the partition {F_1, F_2} from eq. (7), maximizes eq. (8), i.e. φ(s_1; F_1) + φ(s_2; F_2). The process is continued until N sub-goals and flow partitions are obtained, and at this point, using a local Monte Carlo search, eq. (6) is used to obtain the sub-goal positions in continuous space, which ends the initialization process. Subsequently, eqs. (7) and (6) are applied until convergence, i.e. until the total displacement of the sub-goals under application of eq. (6) is smaller than a given threshold.

Figure 7. Estimating the sub-goal that a pedestrian heads towards.
    sub-goal sequence at each time instant (●): 1 1 1 3 2 2 2
    smoothed sub-goal sequence: 1 1 1 1 2 2 2
    series of sub-goals: 1 2

C. Modeling Global Behavior

Given a set of sub-goals, we model each pedestrian's trajectory as a sequence of movements toward one of the sub-goals. So we model the long-term behavior of pedestrians by the conditional probability distribution of the transition to the next sub-goal.

1) Computing a series of past sub-goals from a fully-observed trajectory

To compute the series of sub-goals from a trajectory, first we estimate the sub-goal of the pedestrian at each instant t based on the preferred velocity. When the pedestrian is at position x with preferred velocity v_p, the sub-goal with the closest angle to v_p that exists in a visual cone area C is selected (y_1 in Figure 7) as

    s_t = argmin_{y_j in C} angle(v_p^t, y_j − x_t),        (9)

where angle(x, y) is a function that returns the angle between vectors x and y in [0, π]. The area of the visual cone is defined relative to x and v_p, and its size is set to 40 degrees in angle, a value close to the typical flow spread.

Then we smooth the sub-goal sequence {s_t} obtained at the previous step into {s̃_t} by selecting the most frequent index that appears in a fixed-length time window:

    s̃_t = mode(s_{t−Δ/2}, …, s_{t+Δ/2}),        (10)

where mode() is a function that returns the most frequent value and Δ is the length of the time window, set to 2000 msec, a value empirically chosen to smooth noise without losing information.

Finally we compute the sub-goal series, which is defined by the first sub-goal and the sub-goals that are different from the previous one (fig. 7):

    series of sub-goals = {s̃_1} ∪ {s̃_t | s̃_t ≠ s̃_{t−1}}.        (11)

2) Transition probability

To model common movements in an environment, we construct a statistical model of sub-goal transitions. We compute the following transition probabilities based on the observed trajectories:

    p(y_2 | y_1) = (# of pedestrians that go toward y_1 then y_2) / (# of pedestrians that go toward y_1)        (12)

We also compute a transition model with (n−1)-step histories (n-gram model):

    p(y_N | y_1, …, y_{N−1}) = (# of pedestrians that go toward y_1, …, y_N) / (# of pedestrians that go toward y_1, …, y_{N−1})        (13)

We computed these transitions for values of n up to 6, a compromise between computational economy and completeness of description of the environment.

D. Prediction of Future Movements

After observing the trajectories of pedestrians up to now, our algorithm predicts the future trajectory of each pedestrian.
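The sequence extraction of eqs. (10)–(11) and the first-order transition model of eq. (12) reduce to a few lines of counting; a sketch (the smoothing window is given in samples rather than milliseconds, and all names are illustrative assumptions):

```python
from collections import Counter

def smooth_subgoal_sequence(s, window):
    """Eq. (10): replace each raw sub-goal index by the most frequent
    index in a centered window of about `window` samples."""
    half = window // 2
    return [Counter(s[max(0, t - half): t + half + 1]).most_common(1)[0][0]
            for t in range(len(s))]

def subgoal_series(smoothed):
    """Eq. (11): keep the first sub-goal and every change of sub-goal."""
    series = [smoothed[0]]
    for v in smoothed[1:]:
        if v != series[-1]:
            series.append(v)
    return series

def transition_probabilities(all_series):
    """Eq. (12): first-order transition model p(y2 | y1) from counts."""
    pair, single = Counter(), Counter()
    for series in all_series:
        for y1, y2 in zip(series, series[1:]):
            pair[(y1, y2)] += 1
            single[y1] += 1
    return {k: v / single[k[0]] for k, v in pair.items()}
```

On the example of figure 7, smoothing the raw sequence 1 1 1 3 2 2 2 with a 3-sample window yields 1 1 1 1 2 2 2, and the resulting sub-goal series is 1, 2.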
1) Estimating the probability of the current sub-goal from a partially observed trajectory

From the observed trajectories, the preferred velocity vector v_p and the sub-goal sequence until now, {q_k}, are computed for each pedestrian. Suppose v_p = |v_p| is the preferred speed, and θ_p = arg(v_p) is the preferred direction. Based on Bayes' theorem, we model the probability that the pedestrian heads toward sub-goal y given the preferred velocity and the sub-goal sequence as:

    p(y | θ_p, {q_k}) ∝ p(θ_p | {q_k}, y) p(y | {q_k}).        (14)

The first term of the right hand side of the equation is the probability of the preferred direction θ_p given {q_k}, y, which is empirically derived from a large amount of observed trajectories. In this paper, we approximate this term assuming dependence on only the last sub-goal q_m, and model the term using a Gaussian distribution, whose mean and variance depend on q_m, y and are determined from empirical data. The second term comes from the probabilistic sub-goal transition model. Eq. (14) is computed for all sub-goals y in the visual cone.

Figure 8. Computation of the estimated position at time T (the sum of the lengths of the solid lines equals v_p T). The lines from x to z show the sequence of sub-goals with the highest probability. Note that the pedestrian changes sub-goals before reaching them. We use statistics of the distance from each sub-goal to change sub-goals.
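Eq. (14) can be sketched in a few lines, assuming the Gaussian parameters of p(θ_p | {q_k}, y) and the transition prior p(y | {q_k}) have already been estimated from data (the dictionary-based interface is an illustrative assumption, not the paper's implementation):

```python
import math

def current_subgoal_posterior(theta_p, candidates, transition_prior):
    """Eq. (14): p(y | theta_p, history) ∝ p(theta_p | history, y) p(y | history).

    candidates: {subgoal_id: (mu, sigma)} — Gaussian parameters of the
    preferred direction given that sub-goal (empirical in the paper).
    transition_prior: {subgoal_id: p(y | history)} from the transition model.
    Returns the normalized posterior over the candidate sub-goals.
    """
    scores = {}
    for y, (mu, sigma) in candidates.items():
        lik = (math.exp(-0.5 * ((theta_p - mu) / sigma) ** 2)
               / (sigma * math.sqrt(2 * math.pi)))
        scores[y] = lik * transition_prior.get(y, 0.0)
    z = sum(scores.values())
    return {y: s / z for y, s in scores.items()} if z > 0 else scores
```

With equal transition priors, the sub-goal whose empirical mean direction is closest to the observed preferred direction receives the highest posterior, as expected.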
2) Prediction of future positions

The probability that the pedestrian reaches a sub-goal z in the future via any other sub-goal y_i is given by

    p(z | θ_p, {q_k}) = Σ_i p(z | θ_p, {q_k}, y_i) p(y_i | θ_p, {q_k}).        (15)

In the following we will assume that the transition probability from y_i to the next sub-goal is independent of the current preferred direction, so that the first term in the summation is simplified as

    p(z | θ_p, {q_k}, y_i) ≈ p(z | {q_k}, y_i).        (16)

To compute this term we sum over all the possible routes from y_i to z, but since our goal is to estimate the position of a pedestrian T seconds after the last observation, we take into consideration only the routes that are compatible with the length a pedestrian can walk in time T. Explicitly we assume

    p(z | {q_k}, y_i) = Σ_{{r_k} ∈ L} p(z, {r_k} | {q_k}, y_i)        (17)

where
    {r_k}: sub-goal sequence from y_i to z
    L: set of {r_k} such that dist(x, y_i, r_1, …, r_n) < v_p T and dist(x, y_i, r_1, …, r_n, z) > v_p T
    dist(r_1, r_2, …): the distance along the route (r_1, r_2, …)
    x: current position of the pedestrian
    v_p: preferred speed
    T: time elapsed

and where the probabilities in the summation term are obtained recursively from eq. (14). Finally, the probability to reach sub-goal z given the observed sub-goal sequence and the present preferred velocity, eq. (15), is obtained by summing over all possible sub-goals y_i in the cone of vision (using eq. 14).

The future position of the pedestrian at time T after the observation is estimated assuming that, with probability (15) (i.e. the one that sums over the whole set L in (17)), the pedestrian will be located on the line connecting the sub-goal succession {r_k} that assumes maximum probability in L; more precisely, it will be located between sub-goal r_n and sub-goal z, at a distance v_p T (v_p is the preferred speed at x) from x along that line, see Figure 8.

Figure 9. Experimental environment (pictures of the environment are visible in figures 1 and 14). Number of pedestrians: 12003; length of observations: 7 hours; size of observation area: 860 m².

V. RESULTS

This section illustrates the model obtained with our method and compares its prediction results with those of other methods. In section V-B, to evaluate the locations of the nodes, we compared our method with other node generation methods by running the same prediction procedure described in sections IV-C and IV-D. Then in section V-C, we further compared our method and different prediction methods.

A. Environment and setup

We collected trajectories in a large shopping mall (Fig. 9) in which several restaurants and shops are present. Since it is located between another large building and a train station, it is a transition place for many pedestrians. The size of the observation area is 860 square meters, and we observed 12003 pedestrians in 7 hours, of which 85% were used for calibration and 15% for testing. To track the pedestrians we used twenty LRFs (Hokuyo Automatic UTM-30LX) and applied a tracking algorithm based on shape-matching at torso level [26]. This area is larger than those investigated in previous literature, and thus the prediction task is harder.
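A strongly simplified sketch of the position-prediction step described in section IV-D — walk the maximum-probability chain of sub-goals and place the pedestrian at distance v_p T along the resulting polyline — might look as follows. This greedy version ignores the early sub-goal switching statistics mentioned in the Figure 8 caption, and all names are illustrative assumptions:

```python
import math

def predict_position(x, v_p, first_subgoal, subgoal_xy, trans, horizon):
    """Place the pedestrian at distance v_p * horizon along the polyline
    through the greedily most probable chain of sub-goals.

    subgoal_xy: {subgoal_id: (x, y)} positions of the sub-goals.
    trans: {subgoal_id: {next_id: probability}} transition model.
    """
    budget = v_p * horizon  # walkable distance in `horizon` seconds
    pos, current = x, first_subgoal
    while True:
        target = subgoal_xy[current]
        seg = math.dist(pos, target)
        if seg >= budget or current not in trans:
            # place the prediction `budget` metres toward the current target
            # (extrapolates along the last segment if the budget exceeds it)
            f = budget / seg if seg > 0 else 0.0
            return (pos[0] + f * (target[0] - pos[0]),
                    pos[1] + f * (target[1] - pos[1]))
        budget -= seg
        pos = target
        current = max(trans[current], key=trans[current].get)
```

For instance, with one sub-goal 10 m ahead and a second one 10 m beyond it, a pedestrian walking at 1 m/s is predicted 5 m along the first segment after 5 s, and 5 m along the second segment after 15 s.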
B. Description of the environment through the sub-goal positions

In this section we compare the proposed method with other methods that provide a description of the environment based on a graph, such as topology based [18] methods and stop point [7] methods. The comparison is performed from a qualitative point of view, comparing the description of the environment given by the position of the nodes, and from a quantitative point of view, comparing the precision in the prediction of future positions.

The topology method is implemented by obtaining the map of the environment from the pedestrian movement data. The environment is discretized using a grid, and the grid cells are divided between those on which pedestrian data was recorded and those without pedestrian data. After a smoothing process, the boundaries between the two areas are used as the boundaries of the map of the environment. In order to build the map a Voronoi diagram method [27] is applied. Also, to implement the stop-point method we divide the environment using a discrete grid, and after defining the number of stop-points N, we check the positions of the N cells where pedestrians stop most often. After obtaining the position of the nodes, in order to predict the future position of the pedestrians we use the same method we describe in section IV-C. To estimate the future position of the pedestrian after T seconds, we select the sequence of sub-goals that has the highest probability according to Eq. (18). Then as in figure 8 we compute the point on the lines that connect the maximum probability sequence of sub-goals located at distance v_p T from the start point.

In figures 10, 11, and 12 we show the position of the nodes in, respectively, the proposed method, the topology method and the stop point method. From a qualitative analysis, we can see that the sub-goal method depicts that pedestrians mainly just pass through the environment (14-7-3-1-10-9-0-23-17-20-2), but some of them deviate from the main corridor to go to shops (8, 13, 18), exhibits (16, 21) or gates (4, 5). The topology method extracts nodes on crossing points between corridors (18, 22, 24), but other qualitatively meaningful points such as gates or entries of shops are not extracted. The stop point method has the opposite characteristic: it extracts points close to gates and shops (10, 20), but it cannot extract very well points on the corridors. We also notice that the topology method seems sometimes redundant in the description of the environment, often presenting points very close to each other, something that does not occur with the proposed method.

Figure 13 shows an example that illustrates this difference. The area shown in the picture connects the long corridor and the hallway space, and thus is a place where people are expected to change walking direction. The yellow lines visible on the floor are studded paving blocks located on the typical walking course to help the visually impaired, and we observe that usually also healthy people walk near these lines. Thus, the sub-goal point 0 is exactly the point at which we expect people to change their course. When a person comes from the corridor (right side in the picture), she typically changes her walking direction around point 0 towards the direction of sub-goal 9. Also people coming from the hallway (left side in the picture) directed to the corridor usually change their moving direction at the entrance of the corridor, i.e., around sub-goal 0. This illustrates that sub-goals are extracted in the areas people typically go towards, and where they change their course.

Figure 10. (a) Modeled global behavior in the shopping mall. Filled circles represent sub-goal positions. The sum of transition probabilities between two sub-goals is represented as the intensity and the width of the link. Only links with transition probability > 0.1 are shown. N=25. (b) Initial positions of sub-goals, placed at local peaks of eq. (6).

Figure 11. Graph nodes obtained with the topology method. N=33.

Figure 12. Graph nodes obtained with the stop point method. N=25.

Figure 13. Sub-goals and qualitative points on the main corridor.

On the other hand, the topology and stop point methods could not extract such points. The former method provides the connection points of each corridor in an environment, but not those that describe minor deviations in proximity of shops and the like. Similarly, the stop point method could not extract the points that represent walking routes through the corridor, and mostly extracted only points close to the shops (10 and 20 in Fig. 13). This method provides specific points where people rest, but not points people move towards. Judging from this qualitative comparison, we think that only the proposed method could extract all the important nodes of the environment in a compact (not redundant) way.

We also compare the precision in the prediction of future positions, which we believe to be a reasonable quantitative estimation of the capability of the methods to describe the motion of pedestrians. For this purpose, we measure the ratio with which the correct position of the pedestrian is within 5 m from the estimated position after T seconds (given the dimensions of the environment, a 5 meter error is compatible with a correct prediction of the area of the environment the pedestrian is located in, 5 meters being roughly the width of the long corridor). Each test trajectory is observed for 10 seconds (up to time t_0), and all methods are used to predict the position at t_0 + T, for different values of T (see figure 14 for a trajectory prediction using the proposed method¹).

Figure 14. Position and trajectory of an observed pedestrian (solid square and solid line) and estimation based on the proposed method (star and dashed line). The pedestrian entered from the right side (empty circle), and estimation is based on observation of 10 seconds' trajectory (up to the filled circle). Circles represent hypotheses of estimated positions and color intensity represents probability (deeper color corresponds to higher probability). The hypothesis with highest probability is the estimated position (star). a) after 8 seconds; b) after 32 seconds.
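This ratio metric is straightforward to compute; a small helper (hypothetical names, not the authors' code) might look like:

```python
import math

def prediction_ratio(predicted, actual, threshold=5.0):
    """Fraction of test trajectories whose true position after T seconds
    lies within `threshold` meters of the predicted position.

    predicted / actual: parallel lists of (x, y) positions in meters."""
    hits = sum(1 for p, a in zip(predicted, actual)
               if math.dist(p, a) <= threshold)
    return hits / len(predicted)
```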
Figure 15 shows the prediction ratios for all methods. The proposed method performs clearly better than the stop point method, with a difference that grows with T, and slightly better than the topology method. To compare the performance statistically, we used a Chi-square test. The comparison with the stop point method revealed significant differences (p<.05) in the ratios after 8 seconds. The comparison with the topology method revealed significant trends (p<.1) in the ratios at 16 and 20 seconds.

In summary, the proposed method not only could extract the qualitatively important points of the environment, but also outperformed the other methods from the point of view of the precision to predict future positions. The fact that the performance of the topology method was close to that of the proposed one suggests that, in the environment under investigation and for what regards the major movement along the corridor, the sub-goals people are heading to are close to the topologically significant points of the environment.

Figure 15. Ratio with which the estimated position is within 5 m from the correct position (proposed vs. topology and stop point methods).

Figure 16. Ratio with which the estimated position is within 5 m from the correct position (proposed vs. transition, pattern and linear methods).
C. Comparison with other prediction methods

We compare our method also with two estimation methods that do not provide a continuous graph-based description of the environment: a velocity-based linear extrapolation, and a pattern method using discrete transitions. The velocity-based method extrapolates the position at time t' based on v_0, the observed velocity vector at time t, as

    x_t' = x_t + v_0 (t' − t)    (18)

The transition method uses a probabilistic state transition approach (pattern method). The environment is divided using a discrete grid whose cells are denoted as {g_i}. A large amount of trajectories is observed and discretized as a succession of grid states {g_{i_j}} in order to compute the probability of the n-gram transition

    p(g_{i_{j+1}} | g_{i_j}, g_{i_{j−1}}, ..., g_{i_{j−n+2}})    (19)

Given the present position x_t (located in grid cell g_{i_j}), the current speed v_0 and the discretized trajectory g_{i_j}, g_{i_{j−1}}, ..., g_{i_{j−n+2}}, the grid location g_{i_{j+1}} that maximizes Eq. (19) is computed, and the new position is given by

¹ Video of the estimation results can be seen at http://www.youtube.com/watch?v=6salvfbZYI8

    x_{t+Δ} = x_t + Δ · v_0 · e    (20)

where e is the unit vector directed from x_t to the center of grid cell g_{i_{j+1}}, and Δ = 500 msec. We name this method "pattern" and use n = 6, the same value used for the n-grams in the proposed method. We tried different values for the size N of the grid and show the results with the best performance.

To compare the precision in the prediction of future positions between the proposed method and the other methods, we use the same metric of section V-B. Figure 16 shows the ratios for all methods. For comparison we used again a Chi-square test. The comparison with the velocity-based linear extrapolation method revealed significant differences in the ratios after 8 seconds. The comparison with the pattern method revealed significant differences in the ratios after 28 seconds (for T = 32 sec we have 43% correct predictions with the proposed method, 19% with the linear method and 35% with the pattern method). Thus, the results showed that the proposed method outperforms the other methods from the point of view of the precision to predict future positions.

VI. CONCLUSIONS

In this paper, we proposed a sub-goal based pedestrian behavior model and an algorithm that estimates sub-goals in an environment based on observed trajectories of pedestrians. The concept of sub-goals as local destinations of pedestrians is known in cognitive science, but it has not been applied to model pedestrian behavior in robotics systems, and algorithms have not been developed to estimate positions of sub-goals from observed trajectories. We proposed an algorithm to estimate the positions of sub-goals that best describe observed pedestrian trajectories. Based on the estimated set of sub-goals, we constructed a pedestrian model to describe the motion of the pedestrian towards her destination. Both the estimated positions of sub-goals and the pedestrian movement model are derived from real trajectories observed in a shopping mall. We used this model to predict the future position of pedestrians in a large environment, and compared it to other prediction methods. The proposed method describes the environment qualitatively well, spotting as sub-goals the points that also appear significant to a qualitative human analysis, a property that previous methods were missing. Furthermore, the prediction of the future position of pedestrians was better with the proposed method than with any other tested method.

REFERENCES

[1] S. Thrun et al., "Minerva: A Second-Generation Museum Tour-Guide Robot," Proc. IEEE Int. Conf. on Robotics and Automation (ICRA1999), pp. 1999-2005, 1999.
[2] R. Siegwart et al., "Robox at Expo.02: A Large Scale Installation of Personal Robots," Robotics and Autonomous Systems, vol. 42, pp. 203-222, 2003.
[3] H.-M. Gross et al., "Shopbot: Progress in Developing an Interactive Mobile Shopping Assistant for Everyday Use," Proc. IEEE Int. Conf. on Systems, Man, and Cybernetics (SMC2008), pp. 3471-3478, 2008.
[4] A. Weiss, J. Igelsböck, M. Tscheligi, A. Bauer, K. Kühnlenz, D. Wollherr and M. Buss, "Robots Asking for Directions: The Willingness of Passers-by to Support Robots," Proc. ACM/IEEE Int. Conf. on Human-Robot Interaction (HRI2010), pp. 23-30, 2010.
[5] S. Satake, T. Kanda, D. F. Glas, M. Imai, H. Ishiguro and N. Hagita, "How to Approach Humans?: Strategies for Social Robots to Initiate Interaction," Proc. ACM/IEEE Int. Conf. on Human-Robot Interaction (HRI2009), pp. 109-116, 2009.
[6] Y. Morales, S. Satake, T. Kanda and N. Hagita, "Modeling Environments from a Route Perspective," Proc. ACM/IEEE Int. Conf. on Human-Robot Interaction (HRI2011), pp. 441-448, 2011.
[7] M. Bennewitz, W. Burgard, G. Cielniak, and S. Thrun, "Learning Motion Patterns of People for Compliant Robot Motion," Int. J. Robotics Research (IJRR), Vol. 24, No. 1, 2005.
[8] T. Kanda, D. F. Glas, M. Shiomi, and N. Hagita, "Abstracting People's Trajectories for Social Robots to Proactively Approach Customers," IEEE Trans. on Robotics, Vol. 25, No. 6, 2008, pp. 1382-1396.
[9] Y. Iwamura, M. Shiomi, T. Kanda, H. Ishiguro and N. Hagita, "Do Elderly People Prefer a Conversational Humanoid as a Shopping Assistant Partner in Supermarkets?," Proc. ACM/IEEE Int. Conf. on Human-Robot Interaction (HRI2011), 2011.
[10] D. R. Montello, "The Perception and Cognition of Environmental Distance: Direct Sources of Information," Proc. Int. Conf. on Spatial Information Theory: A Theoretical Basis for GIS, pp. 297-311, 1997.
[11] A. Klippel, S. Hansen, J. Davies and S. Winter, "A High-Level Cognitive Framework for Route Directions," Proc. of SSC 2005 Spatial Intelligence, Innovation and Praxis: The National Biennial Conference of the Spatial Science Institute, 2005.
[12] M. Seder and I. Petrovic, "Dynamic Window Based Approach to Mobile Robot Motion Control in the Presence of Moving Obstacles," Proc. IEEE Int. Conf. on Robotics and Automation (ICRA2007), pp. 1986-1991, 2007.
[13] M. Göller, F. Steinhardt, T. Kerscher, J. M. Zöllner and R. Dillmann, "Proactive Avoidance of Moving Obstacles for a Service Robot Utilizing a Behavior-Based Control," Proc. IEEE/RSJ Int. Conf. on Intelligent Robots and Systems (IROS2010), pp. 5984-5989, 2010.
[14] N. Bellotto and H. Hu, "Computationally Efficient Solutions for Tracking People with a Mobile Robot: An Experimental Evaluation of Bayesian Filters," Autonomous Robots, vol. 28, pp. 425-438, 2010.
[15] D. Ellis, E. Sommerlade, and I. Reid, "Modelling Pedestrian Trajectories with Gaussian Processes," Proc. 9th Int. Workshop on Visual Surveillance, 2010, pp. 1229-1234.
[16] K. Nagel and M. Schreckenberg, "A cellular automaton model for freeway traffic," J. Phys. I France, vol. 2, pp. 2221-2229, 1992.
[17] B. D. Ziebart, N. Ratliff, G. Gallagher, C. Mertz and K. Peterson, "Planning-Based Prediction for Pedestrians," Proc. IEEE/RSJ Int. Conf. on Intelligent Robots and Systems (IROS2009), pp. 3931-3936, 2009.
[18] B. Kuipers, "The Spatial Semantic Hierarchy," Artificial Intelligence, vol. 119, pp. 191-233, 2000.
[19] J. Zacharias, "Path choice and visual stimuli: signs of human activity and architecture," Journal of Environmental Psychology, vol. 21, pp. 341-352, 2001.
[20] F. Zanlungo et al., "Social force model with explicit collision prediction," EPL, vol. 93, 68005, 2011.
[21] D. Helbing, I. J. Farkás, P. Molnár, and T. Vicsek, "Simulation of pedestrian crowds in normal and evacuation situations," in M. Schreckenberg and S. D. Sharma (eds.), Pedestrian and Evacuation Dynamics, Springer, Berlin, pp. 21-58, 2002.
[22] D. Helbing and P. Molnár, "Social force model for pedestrian dynamics," Physical Review E, Vol. 51, No. 5, 1995, pp. 4282-4286.
[23] K. V. Mardia and P. E. Jupp, Directional Statistics, John Wiley and Sons Inc., 2000.
[24] I. S. Dhillon and S. Sra, "Modeling Data using Directional Distributions," Technical Report TR-03-06, University of Texas, 2003.
[25] M. Abramowitz and I. Stegun, Handbook of Mathematical Functions, Dover, 1965.
[26] D. F. Glas, T. Miyashita, H. Ishiguro, and N. Hagita, "Laser-Based Tracking of Human Position and Orientation Using Parametric Shape Modeling," Advanced Robotics, Vol. 23, No. 4, 2009, pp. 405-428.
[27] A. Okabe and A. Suzuki, "Locational optimization problems solved through Voronoi diagrams," European Journal of Operational Research, Vol. 98, Issue 3, pp. 445-456, 1997.

Asymptotically-Optimal Path Planning on Manifolds
Léonard Jaillet Josep M. Porta
Institut de Robòtica i Informàtica Industrial, CSIC-UPC, Barcelona, Spain
Email: {ljaillet,porta}@iri.upc.edu

Abstract—This paper presents an approach for optimal path planning on implicitly-defined configuration spaces such as those arising, for instance, when manipulating an object with two arms or with a multifingered hand. In this kind of situation, the kinematic and contact constraints induce configuration spaces that are manifolds embedded in higher dimensional ambient spaces. Existing sampling-based approaches for path planning on manifolds focus on finding a feasible solution, but they do not optimize the quality of the path in any sense. Thus, the returned paths are usually not suitable for direct execution. Recently, RRT* and other similar asymptotically-optimal path planners have been proposed to generate high-quality paths in the case of globally parametrizable configuration spaces. In this paper, we propose to use higher dimensional continuation tools to extend RRT* to the case of implicitly-defined configuration spaces. Experiments in different problems validate the proposed approach.

Fig. 1. Two paths connecting the same points for a small ball moving on a torus while avoiding obstacles. Left: A low quality path obtained with AtlasRRT. Right: A close to optimal path obtained with the AtlasRRT* algorithm proposed in this paper when optimizing the path length.
I. INTRODUCTION

The last years have witnessed a significant growth in the complexity of robotic platforms. Nowadays, two-arm service robots [21], anthropomorphic hands [28], or even humanoid robots [23] are relatively common in research labs. The availability of these platforms allows addressing increasingly relevant tasks, but they also pose significant challenges. From the point of view of path planning, one of the most critical problems is how to deal with the kinematic and contact constraints arising when manipulating an object. These constraints reduce the dimensionality of the set of valid configurations, which is in principle beneficial for path planning. However, this comes at the expense of converting the configuration space into an implicitly-defined manifold embedded in the ambient space of the variables representing the degrees of freedom of the system.

Due to the relevance of the applications in and beyond Robotics [30], several approaches have addressed the problem of path planning on manifolds. Complete planners [4] are able to find a solution path whenever one exists, but they are only adequate for low-dimensional problems. A more practical approach is to adapt the successful sampling-based path planning algorithms developed in globally parametrizable spaces [16, 19]. The difficulty is that these approaches assume a parametrization of the configuration space in order to sample it uniformly. Since, except for some families of mechanisms [8], such a parametrization does not exist, most of the existing algorithms generate samples in the configuration space from samples in the parametrizable ambient space using inverse kinematic functions [7], or iterative techniques [3, 29]. Although probabilistically complete [3], these methods cannot guarantee a uniform distribution of samples on the configuration space, which hinders its effective exploration.

To solve this issue, approaches based on higher dimensional continuation have been recently proposed [11]. In these approaches, local parametrizations of the manifold are defined and coordinated, allowing an effective exploration of the configuration space. Despite their efficiency, these planners do not focus on the quality of the paths, and their outputs are not suitable for direct execution, as shown in Fig. 1-left.

When the cost is the length of the path, the determination of optimal paths on manifolds is related to the computation of geodesic distances. This is an active field of research in Computer Graphics, where the problem is addressed for triangulated meshes using variants of the fast marching method [17]. Even though some exceptions exist [22], these approaches are mainly limited to 3D surfaces and cannot be directly applied to the configuration spaces arising in path planning. In Robotics, a recent method addresses the problem of optimal path planning on manifolds [10], but it requires an exhaustive representation of the manifold using a dense set of samples, which negatively affects the scalability of the method.

In globally parametrizable spaces, some approaches have been proposed that are less affected by the curse of dimensionality and that can deal with generic cost functions. On the one hand, locally optimal paths can be obtained using smoothing techniques [6, 26]. On the other hand, globally optimal paths can be approximated by combining the construction of a RRT with stochastic optimization [12]. More recently, different sampling-based path planners have been introduced that asymptotically converge to the globally optimal path. The seminal work proposed the RRT* algorithm, which achieves optimality with a moderate computational cost [14]. Later on, many variants have been presented to speed up the convergence to the optimal path [1, 2, 15, 20].

The purpose of this paper is to extend sampling-based asymptotically-optimal path planners to the case of implicitly-defined configuration spaces. Section II describes the main challenges of adapting RRT* to path planning on manifolds. Those challenges are addressed in Section III using higher dimensional continuation tools [9]. This leads to the definition of the AtlasRRT* algorithm, which is described and analyzed in Section IV and which can approximate optimal paths as shown in Fig. 1-right. Section V experimentally evaluates this algorithm and, finally, Section VI summarizes the contributions of this paper and suggests points that deserve further attention.

II. CHALLENGES OF ASYMPTOTICALLY-OPTIMAL PATH PLANNING ON MANIFOLDS

Algorithm 1 gives the pseudo-code of the RRT* path planner [13, 14]. This algorithm takes as input a cost function and a pair of configurations xs, xg ∈ R^k and attempts to connect them with a minimum-cost, collision-free path in N iterations. The algorithm initializes a tree, T (Line 1), and extends it by adding nodes towards points selected at random (Line 4). Each node x has an associated value C(x) estimating the cost of the best path from xs to x at a given iteration. New nodes are connected to the tree, minimizing this cost (Lines 10 to 18), and they are used to rewire the tree, eventually reducing the cost for neighboring nodes (Lines 19 to 23). The algorithm is fairly general and it can optimize paths using any cost function that is non-negative, additive, and somehow bounded by the path length. To devise a version of the RRT* planner for manifolds, the basic functions of RRT* (SAMPLECONF, NEARESTNODE, NEAR, STEER, and PATH) must be generalized.

Algorithm 1: RRT* asymptotically-optimal path planner

    RRT*(COST, xs, xg)
    input : A cost function for paths, COST, and a pair of samples to connect, xs and xg.
    output: A minimum-cost path connecting the two samples.
    1   T ← {(∅, xs)}
    2   C(xs) ← 0
    3   for i ← 1 to N do
    4       xr ← SAMPLECONF
    5       xc ← NEARESTNODE(T, xr)
    6       xn ← STEER(xc, xr)
    7       if xn ≠ ∅ then
    8           γ ← γRRT* (log(|T|)/|T|)^(1/k)
    9           X ← NEAR(T, xn, γ)
    10          xm ← xc
    11          cm ← C(xc) + COST(PATH(xc, xn))
    12          for x ∈ X do
    13              c ← C(x) + COST(PATH(x, xn))
    14              if c < cm then
    15                  cm ← c
    16                  xm ← x
    17          T ← T ∪ {(xm, xn)}
    18          C(xn) ← cm
    19          for x ∈ X do
    20              c ← C(xn) + COST(PATH(xn, x))
    21              if c < C(x) then
    22                  C(x) ← c
    23                  PARENT(T, x) ← xn
    24  RETURN(PATHINTREE(T, xg))

Function SAMPLECONF (Line 4) generates a sequence of random samples in the configuration space. The distribution of the samples should be uniform, since this property is assumed when fixing γRRT*, a parameter of the algorithm that determines the span of the connections at each iteration. The asymptotic optimality of the algorithm basically depends on the value of this parameter. The generation of uniformly distributed samples in a Euclidean space is straightforward, but this is not the case when the configuration space is an implicit manifold, since a global parametrization of this space is generally not available.

The NEARESTNODE function (Line 5) identifies the node in T closest to xr. This should be done using the intrinsic metric of the configuration space. In a parametrizable space this metric is simple, but on a manifold it corresponds to the more complex geodesic distance. The implementation of an efficient nearest-neighbor procedure for implicitly-defined manifolds is difficult and has only been addressed recently, in an approximate way [5]. A usual solution, which will be adopted in this paper, is to resort to the ambient-space nearest neighbor as an approximation of the manifold one, although this may sometimes lead to inadequate tree extensions.

A related problem appears in function NEAR (Line 9), which identifies the set of nodes in T whose distance to xn is less than γ. In this case, however, the NEAR function based on the ambient space metric returns a conservative set of neighbors. Then, each one of them can be checked for connection to xn along geodesic curves to discard nodes that are actually too far away. This solution, though, relies on the ability to find geodesic paths between arbitrary points on the manifold.

Function STEER (Line 6) generates a new node moving on a collision-free path from xc towards xr. If the motion of the robot is subject to differential constraints, xr cannot be exactly reached and xn is generated with the control inputs that best approximate xr [15]. Without differential constraints, which is the case addressed in this paper, STEER follows the shortest path from xc to xr. In parametrizable spaces, this shortest path corresponds to the straight line between those points. When operating on a manifold, though, a procedure to approach xr from xc along a geodesic path is necessary.

Finally, function PATH (Lines 11, 13 and 20) should identify the lowest-cost path between nearby configurations. However, the determination of the lowest-cost path between apparently near configurations can be arbitrarily difficult, especially when operating on manifolds. Thus, at this local path planning step, the RRT* algorithm considers the shortest path between the two configurations to connect. Since the cost is bounded by the scaled path length, any optimal path can be piecewise linearly approximated with a bounded error that in the limit vanishes. In parametrizable spaces, the shortest path corresponds to the straight line but, on manifolds, this naturally translates to a geodesic path, which is more difficult to determine.
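For concreteness, the choose-parent and rewire steps (Lines 10 to 23 of Algorithm 1) can be sketched in the plain Euclidean setting, with straight-line local paths and path length as the cost. This is only an illustration: collision checking and all the manifold-specific machinery are omitted, and the function names are ours:

```python
import math

def near(tree, x_new, radius):
    """Nodes of the tree within `radius` of x_new (ambient-space metric)."""
    return [x for x in tree if math.dist(x, x_new) <= radius]

def extend_rrt_star(tree, cost, parent, x_new, radius):
    """Choose-parent and rewire steps of an RRT*-style extension.
    Assumes at least one tree node lies within `radius` of x_new."""
    candidates = near(tree, x_new, radius)
    # Choose-parent: connect x_new through the neighbor minimizing its cost.
    x_min = min(candidates, key=lambda x: cost[x] + math.dist(x, x_new))
    parent[x_new] = x_min
    cost[x_new] = cost[x_min] + math.dist(x_min, x_new)
    tree.append(x_new)
    # Rewire: route neighbors through x_new whenever that lowers their cost.
    for x in candidates:
        c = cost[x_new] + math.dist(x_new, x)
        if c < cost[x]:
            cost[x] = c
            parent[x] = x_new
```

On a manifold, the calls to math.dist and the implicit straight-line paths are exactly the pieces that must be replaced by chart-based geodesic approximations, as discussed in the following section.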

Fig. 2. Left: A generic approximation of the exponential map is obtained by orthogonally projecting on F a point x_ij on the tangent space at x_i. Right: When a new chart is defined at x_j, the applicability areas of the two charts, A_i and A_j, are coordinated to avoid overlaps.

III. EXPLOITING THE LOCAL PARAMETRIZATION OF MANIFOLDS

As seen before, the main challenges to develop a version of RRT* on manifolds are the need to uniformly sample on the manifold and the necessity to connect configurations along geodesic curves. Both problems would be trivially solved if a global isometric parametrization of the manifold was available, but such parametrizations do not exist even for simple manifolds such as a sphere in 3D. However, from Differential Geometry, it is well known that a manifold can be described by a collection of local parametrizations called charts, which can be coordinated within an atlas [25]. Higher-dimensional continuation techniques provide principled numerical tools to compute the atlas of an implicitly defined manifold reachable from a given point [9]. In this paper, we rely on such tools to define an asymptotically-optimal path planner on manifolds.

A. Local Parametrization of a Manifold

Let us consider an n-dimensional joint ambient space and a k-dimensional configuration space, F, implicitly defined by a set of constraints

    F(x) = 0,

with F : R^n → R^(n−k), n > k > 0, and where we assume that the configuration space is a smooth manifold everywhere, without considering the presence of singularities.

A chart, C_i, locally parametrizes the k-dimensional manifold around a given point, x_i, with a bijective map, x_j = ψ_i(u_ij), between parameters u_ij in R^k and n-dimensional points x_j on the manifold, with ψ_i(0) = x_i. The map from the parameter space to the manifold is the exponential map and the inverse is the logarithmic map. A generic approximation to the exponential map valid for any manifold can be implemented using T_{x_i}F, the k-dimensional space tangent at x_i (see Fig. 2-left). An orthonormal basis for this tangent space is given by the n × k matrix, Φ_i, satisfying

    [ J(x_i) ; Φ_i⊤ ] Φ_i = [ 0 ; I ],

i.e., J(x_i) Φ_i = 0 and Φ_i⊤ Φ_i = I, with J(x_i) the Jacobian of F evaluated at x_i, and I the identity matrix. Using this basis, the mapping ψ_i is computed by first computing the mapping φ_i from parameters in the tangent space to coordinates in the joint ambient space,

    x_ij = φ_i(u_ij) = x_i + Φ_i u_ij,    (1)

and then orthogonally projecting this point on the manifold to obtain x_j. This projection can be computed by solving

    F(x_j) = 0,
    Φ_i⊤ (x_j − x_ij) = 0,

using a Newton procedure [27].

The logarithmic mapping, ψ_i^(−1), can be approximated as the projection of a point on the tangent subspace,

    u_ij = ψ_i^(−1)(x_j) = Φ_i⊤ (x_j − x_i).

B. Defining a RRT on a Manifold

Using the exponential and logarithmic maps, a full atlas of the manifold can be defined [9]. Thus, in principle, one could use such a full atlas to determine an optimal path between any two given configurations using a fast-marching-like algorithm. However, the construction of a full atlas is computationally demanding, especially in high dimensions. Therefore, we depart from our previous work [11], where the atlas construction is intertwined with the definition of a RRT.

In this approach, the RRT is initiated on a manifold by sampling on T_{x_s}F and projecting on the manifold when necessary using the exponential map. Formally, if x_c is a point on F already included in the RRT (initially x_s) and u_c are the parameters of this point in chart C_c, then a new vector of parameters u_n is generated with a small displacement from u_c towards u_r, a random vector of parameters on C_c. Then, the next point to add to the RRT is obtained as x_n = ψ_c(u_n). However, the area of the manifold properly parametrized by a given chart is limited. As the norm of u_n increases, the distance to the manifold and the curvature typically increase too, and the Newton process implementing ψ_c could even diverge. Thus, a new chart is added to the atlas whenever there is a large error with respect to the manifold, i.e., when

    ‖φ_c(u_n) − x_n‖ > ε,    (2)

or when the curvature of the manifold with respect to C_c is large, i.e., when

    ‖u_c − u_n‖ / ‖x_c − x_n‖ < cos(α),    (3)

for user-defined parameters ε and α. Finally, a new chart is also added when the tree expands too far away from the chart center, i.e., when

    ‖u_n‖ > R,    (4)

for a given R. This maximum span for a chart helps to obtain a regular covering of the manifold.

The area of the manifold parametrized by a new chart overlaps with those of charts already in the atlas. To reduce the overlaps, the area of applicability A_i of a given chart C_i is bounded by a set of linear inequalities, as illustrated in Fig. 2-right. These inequalities are defined in the tangent

space associated with each chart, and points not fulfilling them correspond to points on the manifold that are parametrized by a neighboring chart. The set of inequalities bounding A_i is initially empty and is enlarged as new charts are created around chart C_i. If a new chart C_j is created on a point x_j, then

    2 u⊤ u_ij ≤ ‖u_ij‖²,

is added to the set of inequalities bounding A_i, with u_ij = ψ_i^(−1)(x_j). This inequality bisects the vector u_ij, keeping the half-space including the origin. When a chart C_i is fully surrounded by other charts, A_i becomes a convex polytope.

Algorithm 2: Sampling on an atlas.

    SampleConf(A)
    input : An atlas, A.
    output: A random point.
    1   repeat
    2       r ← RANDOMCHARTINDEX(A)
    3       ur ← RANDOMINBALL(Rs)
    4   until ur ∈ Ar
    5   RETURN(φr(ur))
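A minimal sketch of this rejection-sampling scheme for k = 2 follows; the chart objects, with their in_area and phi methods, are a hypothetical stand-in for the applicability-area test and the exponential map:

```python
import random

def sample_conf(charts, Rs):
    """Rejection sampling on an atlas (cf. Algorithm 2): pick a chart
    uniformly at random, draw a parameter vector u inside a ball of
    radius Rs, and retry until u falls inside that chart's applicability
    area.  Accepted samples are then uniformly distributed over the
    union of the applicability areas of all charts."""
    while True:
        chart = random.choice(charts)
        # Uniform point in a 2-D disc of radius Rs (rejection from the square).
        u = (random.uniform(-Rs, Rs), random.uniform(-Rs, Rs))
        if u[0] ** 2 + u[1] ** 2 > Rs ** 2:
            continue
        if chart.in_area(u):
            return chart.phi(u)  # map the accepted parameters to the manifold
```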
C. Defining a RRT* on a Manifold

The local parametrization provided by the charts can be exploited to address the issues raised in Section II. First, the parametrization allows obtaining a close to uniform distribution of samples in the part of the manifold covered by the charts at a given moment. To this end, we select a chart at random and we sample a vector of parameters u, rejecting the samples that are not in the corresponding applicability area. In this way, the probability of generating a valid sample in a given chart is proportional to the size of its applicability area. Therefore, the distribution of samples will be uniform in the union of the applicability areas for all charts. Moreover, using Eq. (3), we have that for any two points x_i and x_j parametrized by u_i and u_j in the same chart

    ‖x_i − x_j‖ ≤ sec(α) ‖u_i − u_j‖.    (5)

Thus, there is a bounded distortion between points in the tangent space and the associated points on the manifold. Therefore, the volume of the patch of the manifold covered by a given chart is a scaled factor of the volume of the corresponding applicability area. With this, when sampling on the atlas, the critical value for γRRT* (Theorem 38 in [13]) is

    γRRT* > 2 [ (1 + 1/k) μ(A_free)/ζ_k ]^(1/k) sec(α),    (6)

where k is the dimension of the manifold, μ(A_free) is the Lebesgue measure of the applicability areas of the charts that correspond to collision-free configurations, and ζ_k is the volume of the unitary k-dimensional ball.

The local parametrization provided by the charts can also be exploited to approximate geodesic paths. In particular, consider a linear interpolation, (u_1, u_2, ..., u_m), between two points, u_1 and u_m, in the tangent space of a given chart C_c and the corresponding path on the manifold, (x_1, x_2, ..., x_m), with x_i = ψ_c(u_i), i ∈ {1, ..., m}. Then, the length of the path can be approximated by

    p = Σ_{i=2..m} ‖x_{i−1} − x_i‖,

and its length in parameter space is

    l = Σ_{i=2..m} ‖u_{i−1} − u_i‖ = ‖u_m − u_1‖.

Note that l ≤ p and that, using Eq. (5), p ≤ sec(α) l. Moreover, assume that p* is the length of the geodesic path connecting the two points and l* is its length in parameter space. Since paths are defined as straight lines in parameter space, we have that l ≤ l* and, thus,

    l ≤ l* ≤ p* ≤ p ≤ sec(α) l ≤ sec(α) l*.

With this, the relative error of a path generated from a straight line in parameter space with respect to the geodesic path is

    (p − p*)/p* ≤ (sec(α) l − l)/p* ≤ sec(α) − 1.

In practice, this upper bound is overly conservative since, as samples get denser, the relative error tends to vanish. In any case, α should always be below ±π/2 to get a bounded error. Geodesic paths can be approximated in large areas by generating new charts as the curvature grows.

As in the Euclidean case [13], the asymptotic optimality of RRT* on manifolds is given by the ability to generate samples close to the optimal path and also to connect samples close to this path, at a given sample density. The first property is guaranteed by the atlas-based sampling and the second is given by the bound for γRRT* in Eq. (6). The on-line generation of the atlas has an effect on the distribution and density of the samples. However, optimality can be achieved irrespective of these aspects and of whether some connections between samples fail at the beginning of the process. These elements influence the convergence rate, but not the long-term optimality.

IV. ATLASRRT* ALGORITHM

Using the tools described above, we define the AtlasRRT* algorithm, an adaptation of Algorithm 1 to operate on manifolds. AtlasRRT* has the same structure as RRT*, but it additionally maintains an atlas. The atlas is initialized with one chart at x_s and new charts are added taking into account Eqs. (2) to (4). Moreover, AtlasRRT* keeps track of the chart parametrizing each node of the tree. Besides this, the only difference between AtlasRRT* and RRT* is the implementation of the functions SAMPLECONF, STEER, and PATH.

Using the atlas, SAMPLECONF is implemented as described in Algorithm 2. A chart is selected at random with uniform distribution and then a point, ur, is sampled within a ball of radius Rs > R. The process is repeated until ur is inside

148
Algorithm 3: The S TEER/PATH functions. The main difference between the S TEER and PATH functions
is that S TEER aims towards a point that is not on the manifold,
Steer/Path(A, xn , xr )
but on the tangent space of a chart, giving a direction to
input : The atlas A and two points, xn and xr . expand the tree instead of a particular point to reach. Thus,
output: In the case of PATH, a collision-free path connecting
the two samples, if it exists. In the case of S TEER, a Algorithm 3 presents the two functions in a compact way.
point on the manifold as close as possible to xr . Both functions get as input the atlas maintained by AtlasRRT*
1 c ← C HART I NDEX(xn ) and two points, xn and xr . First, they determine the chart
2 un ← ψc−1 (xn ) parametrizing xn (Line 1) and compute the parameters of xn
3 ur ← ψc−1 (xr ) and xr in this chart (Lines 2 and 3). In the case of S TEER, we
4 d ← xn − xr  ensure that ur is at least at distance d from un (Lines 6 and 7),
5 if S TEER then with d the original distance between xn and xr (Line 4). Then,
6 ur ← un + (ur − un )(d/ur − un )
the functions proceed to move from un towards ur in small
7 xr ← φc (ur )
steps of size δ (Lines 11 and 12). If the new configuration, xj ,
8 BLOCKED ← FALSE
is in collision, the expansion is stopped (Line 14). Otherwise,
9 P ←∅
10 while not BLOCKED and un − ur  > δ and d > 0 do the algorithm checks if the new point triggers the creation of
11 uj ← (ur − un ) δ/ur − un  a new chart (Line 17) or if it is in the applicability area of a
12 xj ← ψc (uj ) neighboring chart (Line 21). In any of these cases, uj and ur
13 if C OLLISION(xj ) then are recomputed projecting xj and xr (Lines 25 and 26) on
14 BLOCKED ← T RUE
the new or neighbor chart determined at Lines 18 and 22,
15 else respectively. In the case of S TEER, the random sample is also
16 NEW ← FALSE
projected on the new chart, checking again that it is far enough
17 if φc (uj ) − xj  >  or
un − uj /xn − xj  < cos(α) or uj  > R then from the previous point (Lines 28 and 29). Finally, the new
18 c ← N EW C HART(A, xn ) configuration is added to the computed path, P (Line 30)
19 NEW ← T RUE and it is set as the point from where to continue the path
20 else (Lines 33 and 34). In the case of the S TEER function, the
21 if uj ∈
/ Ac then distance already travelled is discounted (Line 32) to avoid
22 c ← N EIGHBOR C HART(c, uj ) growing an infinite branch. At the end of the PATH procedure,
23 NEW ← T RUE
the computed path is returned, unless the goal configuration is
24 if NEW then not actually reached. The S TEER function always returns the
25 uj ← ψc−1 (xj ) last element included in the path, if any.
26 ur ← ψc−1 (xr ) The additional computational complexity of AtlasRRT*
27 if S TEER then with respect to RRT* concentrates in the computation of the
28 ur ← uj + (ur − uj )(d/ur − uj )
29 xr ← φc (ur )
mapping ψc (Line 12) and in the addition of new charts to the
atlas (Line 18). The first operation scales with O(n3 ) since it is
30 P ← P ∪ {xj } implemented as a Newton process with a bounded number of
31 if S TEER then iterations, where at each iteration a QR decomposition is used.
32 d ← d − xn − xj 
The second operation, which is executed less often, requires
33 xn ← x j to generate the new chart, that is O(n3 ), and to identify the
34 un ← u j
neighboring charts in the atlas to avoid the overlaps. This last
35 if PATH then operation can be implemented using hierarchical structures
36 if un − ur  > δ then reducing their cost to logarithmic in the number of charts in
37 R ETURN(∅) the atlas.
38 else
39 R ETURN(P) V. E XPERIMENTS AND R ESULTS
40 else Figure 3 shows the four benchmarks used to evaluate the
41 R ETURN(L AST(P)) AtlasRRT* algorithm. The first one is a small ball moving on a
implicitly-defined torus with two obstacles forming a narrow
corridor. This example is used since, due to its simplicity,
the results are easy to visualize. The second test case is
the applicability area Ar , i.e., until it fulfills the inequalities the cyclooctane, a molecule that can be modelled with eight
created by neighboring charts, if any. Finally, the process re- revolute joints forming a kinematic loop. In this problem,
turns the ambient space coordinates for ur computed using φr there is a collision whenever two hydrogen atoms (shown in
defined in Eq. (1). To accelerate the convergence to a solution, white in the figure) are closer than the sum of their Van der
a bias towards xg is typically used. In our implementation, xg Waals radii. This example is used to illustrate the ability of
has a 1% chance of being selected as random sample, as long AtlasRRT* to determine the optimal path among many feasible
as a path to the goal has not been found. ones. The third example is the Barret arm solving a simple

149
(a) (b) (c) (d)
Fig. 3. The four benchmarks used in this paper. (a) A ball moving on a torus. (b) The cyclooctane molecule. (c) The Barret arm and hand solving a maze.
(d) The PR2 service robot moving a box with the two arms.
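The bounded-distortion property of Eq. (5) is easy to check numerically on a one-dimensional example. The sketch below is our own illustration, not part of the paper's code: it uses an orthographic chart of the unit circle centered at (1, 0), for which, inside the validity radius R, the angle between the curve tangent and the chart's tangent line stays below α = asin(R), and it verifies that the chord distance between lifted points never exceeds sec(α) times the parameter distance.

```python
import math
import random

def psi(u):
    """Lift the chart parameter u onto the unit circle (orthographic chart at (1, 0))."""
    return (math.sqrt(1.0 - u * u), u)

def dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

R = 0.4                         # chart validity radius, |u| <= R
alpha = math.asin(R)            # largest tangent/chart angle inside the chart
bound = 1.0 / math.cos(alpha)   # sec(alpha)

random.seed(0)
for _ in range(10000):
    ui, uj = random.uniform(-R, R), random.uniform(-R, R)
    if ui != uj:
        # Eq. (5): ||x_i - x_j|| <= sec(alpha) ||u_i - u_j||
        assert dist(psi(ui), psi(uj)) <= bound * abs(ui - uj) + 1e-12

print("sec(alpha) =", round(bound, 4))
```

For R = 0.4 the distortion factor sec(α) is roughly 1.09, of the same order as the sec(α) = 1.1 error factor used in the paper's experiments (there obtained with α = 0.45 rad).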
maze. The figure shows the initial configuration and the goal is marked with a circle. The task is constrained because the peg must remain in contact with and orthogonal to the maze plane without rotating about its axis. Finally, the last test case is the PR2 robot executing a coordinated manipulation task consisting of placing a box on a table, moving it from underneath the table. The two last problems are used to test the scalability of the method. In all cases, the cost to optimize is the path length and the experiments are carried out with δ = 0.05, ε = 0.1, R = 0.4, Rs = 2, and α = 0.45 rad. With such parameters, the error factor with respect to the geodesic is below sec(α) = 1.1, which is reasonably small. The value of γRRT∗ depends on the volume of the free space, which is different for each problem. In a simple problem such as the torus, it can be determined by building the full atlas and evaluating Eq. (6). In the rest of the problems the value for this parameter was determined experimentally. All the experiments were executed on an Intel Core i7 at 2.93 GHz running Mac OS X and averaged over 25 repetitions. The source code together with the described benchmarks can be downloaded from [18].

To the best of our knowledge, there are no other sampling-based asymptotically-optimal path planners for manifolds. Thus, for the purpose of comparison, we implemented what we will call C-RRT*, an alternative version of RRT* for manifolds that samples in the ambient space and uses a Jacobian pseudo-inverse strategy to connect different samples [3]. In this algorithm, the sampling is not uniform in the configuration space and the connection between samples is not guaranteed to be close to the geodesic path. Comparisons are also done with respect to AtlasRRT∗0, i.e. AtlasRRT* with γRRT∗ = 0.

TABLE I
Dimension of the configuration and ambient spaces, and relative errors with respect to the optimal path obtained with AtlasRRT∗0 and with AtlasRRT* using the γRRT∗ values given in parenthesis. In both cases, collisions are not active.

    Benchmark    | k | n  | AtlasRRT∗0 | AtlasRRT*
    Torus        | 2 | 3  | 82.54 %    | 0.25 % (10)
    Cyclooctane  | 2 | 8  | 40.32 %    | 8.75 % (12)
    Barret       | 3 | 9  | 98.23 %    | 0 % (2.5)
    PR2          | 4 | 16 | 127.08 %   | 0 % (4)

Table I shows the dimension of the configuration space, k, the dimension of the ambient space, n, and the average relative error of the path obtained with AtlasRRT∗0 and AtlasRRT* after 1000 iterations with respect to the optimum. For AtlasRRT*, the values of γRRT∗ are indicated in parenthesis. The optimal path is obtained by smoothing the best trajectory returned by AtlasRRT*. Exceptionally, in these experiments collisions are not considered and, thus, the only constraint arises from the manifold structure of the configuration spaces. The results show that the proposed algorithm reasonably converges to the optimal path in all cases, while AtlasRRT∗0 generates paths that have significantly higher cost.

Figure 4 compares the cost of the paths obtained with AtlasRRT*, C-RRT*, and AtlasRRT∗0 for the four benchmarks used in Fig. 3, considering collisions. In all cases, the path obtained with AtlasRRT∗0 has a high cost and it is not improved once discovered. The path obtained with C-RRT* progressively improves, but slowly due to the non-uniform sampling, even in simple problems like the torus one. In the cyclooctane problem the path obtained by C-RRT* is sometimes worse than the one obtained with AtlasRRT∗0. The reason is that the optimal path goes through an inner part of the manifold where C-RRT* has difficulties sampling [3]. Whereas AtlasRRT* converges to the optimal path in all cases, C-RRT* does not find a solution in all of the repetitions in the Barret and the PR2 problems. Note that in the figure, costs are plotted when at least half of the repetitions are successful, and the eventual increments in the curves are caused by the different data averaged at each iteration. In the PR2 experiment, though, the original AtlasRRT* is too slow and we enhance it with two heuristics proposed in [24] and in [1, 15]. The first one sets γRRT∗ = 0 until a first path to the goal is found. The second heuristic prevents expanding nodes that cannot be part of the optimal path, i.e., nodes for which the cost of the path to the node plus a conservative estimate of the cost to the goal is larger than the cost of the best path to the goal found so far. The same heuristics are used for C-RRT* in this benchmark.

Figure 5 shows the trees generated in a typical run of AtlasRRT* on the torus at two different stages. While in the left tree the narrow corridor is not yet discovered and, thus,
(a) (b) (c) (d)

Fig. 4. Path cost versus iterations for AtlasRRT*, C-RRT*, and AtlasRRT∗0 in (a) the torus, (b) the cyclooctane, (c) the Barret, and (d) the PR2 problems. Costs are given from the moment a solution is found in at least half of the repetitions. The bars give the standard deviation.
the path is sub-optimal, in the right tree the optimal path is finally found. This latter tree actually provides optimal paths to all regions of the configuration space.

The selection of the right value for γRRT∗ is critical for the performance of the algorithm. For the case of the torus, the value for γRRT∗ given by Eq. (6) can be numerically computed and is about 8.1. Figure 6 shows the performance of AtlasRRT* with different values for γRRT∗. Clearly, the larger the value, the lower the cost of the final path after 10000 iterations. However, with γRRT∗ below 8.1, the convergence seems stalled before reaching the optimum.

From the previous plot, one could think that using a large value for γRRT∗ is a good strategy since in this way AtlasRRT* would always converge to the optimal path. However, Fig. 7 shows that the larger γRRT∗, the longer it takes to complete the 10000 iterations due to the increment of nearby samples checked for connection. Thus, with a limited time, it is advantageous to use a value of γRRT∗ just above the threshold that guarantees convergence to the optimum.

Fig. 5. Two stages of the AtlasRRT* tree construction for the torus example.

Fig. 6. Path cost versus number of iterations for AtlasRRT* on the torus and for different values of γRRT∗.

Fig. 7. Path cost versus execution time for AtlasRRT* on the torus for 10000 iterations and for different values of γRRT∗.

VI. CONCLUSIONS

In this paper we have demonstrated the feasibility of generalizing asymptotically-optimal sampling-based path planners to operate on implicitly-defined configuration spaces. This is achieved by resorting to higher dimensional continuation tools. Thanks to these tools, the manifold can be properly sampled and we can determine close-to-geodesic paths for generic manifolds. Both properties are fundamental to obtain an efficient, asymptotically-optimal planner.

The introduced path planner can find optimal paths in configuration spaces with moderate dimension embedded in high dimensional ambient spaces. However, as the dimension of the configuration space increases, so does the computational complexity, and we have to resort to heuristics to speed up the convergence. In the future, we would like to explore new heuristic strategies exploiting the atlas structure to improve the performance of the algorithm, possibly accepting sub-optimal solutions. The atlas structure could also be used to estimate the local density of samples. This might be used to automatically adjust a particular γRRT∗ for each chart in order to obtain a good trade-off between efficiency and optimality. Finally, the use of a bi-directional RRT instead of a one-directional tree could also result in a significant performance improvement.

ACKNOWLEDGMENTS

This work has been partially supported by the Spanish Ministry of Economy and Competitiveness under project DPI2010-18449. Léonard Jaillet was supported by the CSIC under a JAE-Doc fellowship partially funded by the ESF.
REFERENCES

[1] B. Akgun and M. Stilman. Sampling heuristics for optimal motion planning in high dimensions. In IEEE/RSJ International Conference on Intelligent Robots and Systems, pages 2640–2645, 2011.
[2] R. Alterovitz, S. Patil, and A. Derbakova. Rapidly-exploring roadmaps: Weighing exploration vs. refinement in optimal motion planning. In IEEE International Conference on Robotics and Automation, pages 3706–3712, 2011.
[3] D. Berenson, S. Srinivasa, and J. Kuffner. Task Space Regions: A Framework for Pose-Constrained Manipulation Planning. International Journal of Robotics Research, 30(12):1435–1460, 2011.
[4] J. Canny. The Complexity of Robot Motion Planning. MIT Press, 1988.
[5] R. Chaudhry and Y. Ivanov. Fast approximate nearest neighbor methods for non-Euclidean manifolds with applications to human activity analysis in videos. In European Conference on Computer Vision, pages 735–748, 2010.
[6] P. C. Chen and Y. K. Hwang. SANDROS: a dynamic graph search algorithm for motion planning. IEEE Transactions on Robotics and Automation, 14(3):390–403, 1998.
[7] L. Han and N. M. Amato. A Kinematics-Based Probabilistic Roadmap Method for Closed Chain Systems. In Algorithmic and Computational Robotics - New Directions, pages 233–246, 2000.
[8] L. Han, L. Rudolph, J. Blumenthal, and I. Valodzin. Convexly Stratified Deformation Spaces and Efficient Path Planning for Planar Closed Chains with Revolute Joints. International Journal of Robotics Research, 27(11-12):1189–1212, 2008.
[9] M. E. Henderson. Multiple Parameter Continuation: Computing Implicitly Defined k-Manifolds. International Journal of Bifurcation and Chaos, 12(3):451–476, 2002.
[10] T. Igarashi and M. Stilman. Homotopic Path Planning on Manifolds for Cabled Mobile Robots. In International Workshop on the Algorithmic Foundations of Robotics, pages 1–18, 2010.
[11] L. Jaillet and J. M. Porta. Path Planning with Loop Closure Constraints using an Atlas-based RRT. In International Symposium on Robotics Research, 2011.
[12] L. Jaillet, J. Cortés, and T. Siméon. Sampling-based Path Planning on Configuration-Space Costmaps. IEEE Transactions on Robotics, 26(4):635–646, 2010.
[13] S. Karaman and E. Frazzoli. Sampling-based Algorithms for Optimal Motion Planning. International Journal of Robotics Research, 30(7):846–894, 2011.
[14] S. Karaman and E. Frazzoli. Incremental Sampling-based Algorithms for Optimal Motion Planning. In Robotics: Science and Systems, pages 267–274, 2011.
[15] S. Karaman, M. R. Walter, A. Perez, E. Frazzoli, and S. Teller. Anytime Motion Planning using the RRT*. In IEEE International Conference on Robotics and Automation, pages 1478–1483, 2011.
[16] L. E. Kavraki, P. Svestka, J.-C. Latombe, and M. H. Overmars. Probabilistic roadmaps for path planning in high-dimensional configuration spaces. IEEE Transactions on Robotics and Automation, 12(4):566–580, 1996.
[17] R. Kimmel and J. A. Sethian. Computing geodesic paths on manifolds. Proceedings of the National Academy of Sciences, 95(15):8431–8435, 1998.
[18] KRD Group. The CuikSuite software. http://www.iri.upc.edu/people/porta/Soft/CuikSuite2-Doc/html, 2012.
[19] S. M. LaValle and J. J. Kuffner. Rapidly-exploring random trees: Progress and prospects. In Algorithmic and Computational Robotics - New Directions, pages 293–308, 2000.
[20] J. D. Marble and K. E. Bekris. Asymptotically Near-Optimal is Good Enough for Motion Planning. In International Symposium on Robotics Research, 2011.
[21] E. Marder-Eppstein, E. Berger, T. Foote, B. Gerkey, and K. Konolige. The Office Marathon: Robust Navigation in an Indoor Office Environment. In IEEE International Conference on Robotics and Automation, 2010.
[22] F. Mémoli and G. Sapiro. Fast computation of weighted distance functions and geodesics on implicit hyper-surfaces. Journal of Computational Physics, 173:730–764, 2001.
[23] C. Ott, O. Eiberger, W. Friedl, B. Bauml, U. Hillenbrand, C. Borst, A. Albu-Schafer, B. Brunner, H. Hirschmuller, and G. Hirzinger. A Humanoid Two-arm System for Dexterous Manipulation. In IEEE-RAS International Conference on Humanoid Robots, pages 276–283, 2006.
[24] A. Perez, S. Karaman, A. Shkolnik, E. Frazzoli, S. Teller, and M. R. Walter. Asymptotically-optimal path planning for manipulation using incremental sampling-based algorithms. In IEEE/RSJ International Conference on Intelligent Robots and Systems, pages 4307–4313, 2011.
[25] A. Pressley. Elementary Differential Geometry. Springer Verlag, 2001.
[26] N. Ratliff, M. Zucker, J. A. Bagnell, and S. Srinivasa. CHOMP: Gradient optimization techniques for efficient motion planning. In IEEE International Conference on Robotics and Automation, pages 489–494, 2009.
[27] W. C. Rheinboldt. MANPACK: A Set of Algorithms for Computations on Implicitly Defined Manifolds. Computers and Mathematics with Applications, 32(12):15–28, 1996.
[28] Schunk GmbH & Co. KG. Schunk Anthropomorphic Hand. http://www.schunk.com, 2006.
[29] M. Stilman. Global Manipulation Planning in Robot Joint Space With Task Constraints. IEEE Transactions on Robotics, 26(3):576–584, 2010.
[30] W. J. Wedemeyer and H. Scheraga. Exact Analytical Loop Closure in Proteins Using Polynomial Equations. Journal of Computational Chemistry, 20(8):819–844, 1999.
Minimal Coordinate Formulation of Contact
Dynamics in Operational Space
Abhinandan Jain, Cory Crean, Calvin Kuo, Steven Myint
Jet Propulsion Laboratory, California Institute of Technology
Pasadena, CA 91109

Hubertus von Bremen
California State Polytechnic University
Pomona, CA 91768
Abstract—In recent years, complementarity techniques have been developed for modeling non-smooth contact and collision dynamics problems for multi-link robotic systems. In this approach, a linear complementarity problem (LCP) is set up using 6n non-minimal coordinates for a system with n links, together with all the unilateral constraints and inter-link bilateral constraints on the system. In this paper, we use operational space dynamics to develop a complementarity formulation for contact and collision dynamics that uses minimal coordinates. The use of such non-redundant coordinates results in much smaller LCP problems and the automatic enforcement of the inter-link bilateral constraints. Furthermore, we exploit operational space low-order algorithms to overcome some of the computational bottlenecks in using minimal coordinates.

I. INTRODUCTION

For more than a decade, researchers [3, 11, 13] have been developing complementarity based approaches for formulating and solving the equations of motion of systems with contact and collision dynamics. Examples of such dynamics for robotic systems include manipulation and grasping tasks, and legged locomotion. This approach models bodies as rigid, and uses impulsive dynamics to handle non-smooth collision, contact interactions and state transitions. By essentially impulsively "stepping" over non-smooth events, complementarity methods avoid the small step size and stiff dynamics problems encountered with penalty based methods, which model surface deformation dynamics during contact [9].

The complementarity approach involves setting up a linear complementarity problem (LCP) that depends on the link mass and inertia properties, contact friction parameters, inter-link bilateral constraints, and contact and collision unilateral constraints [3, 11, 13]. The LCP solution identifies the unilateral constraints that are active, and solves for the impulsive forces and velocity changes that are consistent with the constraints on the system. Variants of the complementarity approach to handle elastic and inelastic collisions have also been developed [3]. While the standard LCP formulation uses a discretized approximation for the friction cones, other researchers [12] have explored non-linear cone complementarity approaches that avoid such approximations.

For a multi-link system with n links, the conventional LCP problem setup uses 6n non-minimal coordinates [13] together with the bilateral constraints associated with the inter-link hinges and the unilateral contact constraints. This approach has a mass matrix that is block diagonal and constant. Such redundant coordinates lead to a large LCP problem size, and require the use of differential-algebraic equation (DAE) like techniques for managing error drift in the bilateral constraints when integrating the equations of motion.

An alternative approach uses minimal hinge coordinates that automatically eliminate the bilateral constraints for the inter-link hinges [14]. While the underlying physics remains unchanged, this formulation reduces the size of the LCP problems, and avoids the need for DAE techniques for controlling bilateral constraint violation errors for the inter-link hinges. However, the use of minimal coordinates leads to a dense and configuration dependent mass matrix. The consequent complexity and computational setup expense for the LCP problem has been a significant hurdle in the use of such a minimal coordinate approach.

In this paper, we focus on the analytical and computational aspects of the minimal coordinate formulation of the complementarity approach to contact and collision dynamics for multi-link systems. We adopt the complementarity based physics models from [3, 13], but reformulate the system dynamics and the associated LCP problem using minimal coordinates. While [14] uses a similar minimal coordinate formulation, it is limited to just contact dynamics, and uses the non-optimal divide-and-conquer (DCA) technique [4] as part of its solution technique. This paper goes beyond contact dynamics to also develop the LCP variants for handling elastic and inelastic collision dynamics. Moreover, for closed-chain topology systems, we describe a uniform way to incorporate the bilateral constraints naturally into the LCP problem. The size of the resulting LCP problem is independent of the number of links and generalized coordinates, and only depends on the number of contact nodes. Our minimal coordinates approach adopts an operational space [7] perspective, which in turn allows us to take advantage of low-order spatial operator algorithms [5, 6, 8] for computing the operational space compliance matrix (OSCM) needed for setting up the LCP problem. Taken together, the methods described here provide a comprehensive, and computationally tractable, solution to using minimal coordinates for contact and collision dynamics problems. We conclude with an illustrative multi-link pendulum numerical problem to benchmark the performance improvements from the minimal coordinates approach.
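The LCP structure that underlies this approach can be made concrete with a toy example. The sketch below is our own illustration with made-up numbers (the paper's LCPs include friction and are built from the operational space quantities derived later): for a frictionless multi-contact problem, the normal impulses p and the post-impact normal velocities w = Ap + b must satisfy p ≥ 0, w ≥ 0, and pᵀw = 0, where A is an effective contact-space mass matrix and b collects the pre-impact normal velocities. For tiny problems the LCP can be solved by enumerating active sets:

```python
import itertools
import numpy as np

def solve_lcp_enumerate(A, b):
    """Solve w = A p + b, p >= 0, w >= 0, p^T w = 0 by active-set enumeration.

    Only practical for tiny problems; real solvers use Lemke's method or
    projected Gauss-Seidel. A is assumed symmetric positive definite, so
    every principal subproblem is solvable.
    """
    n = len(b)
    for active in itertools.product([False, True], repeat=n):
        idx = [i for i in range(n) if active[i]]
        p = np.zeros(n)
        if idx:
            p[idx] = np.linalg.solve(A[np.ix_(idx, idx)], -b[idx])
        w = A @ p + b
        if (p >= -1e-12).all() and (w >= -1e-12).all():
            return p, w
    raise ValueError("no solution found")

# Two contacts: an arbitrary SPD effective mass (Delassus-type) matrix
A = np.array([[2.0, 0.5],
              [0.5, 1.0]])
b = np.array([-1.0, 0.3])   # contact 1 approaching, contact 2 separating

p, w = solve_lcp_enumerate(A, b)
print("impulses:", p, "post-impact normal velocities:", w)
```

Here only the first contact is approaching (b[0] < 0), so the enumeration returns a positive impulse at contact 1 and none at contact 2, with the post-impact normal velocity at contact 1 driven exactly to zero.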
II. EQUATIONS OF MOTION

A. Minimal coordinate dynamics

The minimal coordinate equations of motion for a tree-topology multi-link robotic system with n links and N degrees of freedom have the form

    T = Mθ̈ + C − J∗fc   (1)

where θ ∈ RN denotes the generalized coordinates, T ∈ RN the generalized forces, M ∈ RN×N the system mass matrix, C ∈ RN the vector of Coriolis, gyroscopic and gravitational forces, fc ∈ R6nc the stacked vector of nodal forces for nc nodes on the system, and J ∈ R6nc×N the Jacobian matrix for these nodes. In a minimal coordinates formulation, bilateral constraints associated with inter-link hinges are eliminated by using hinge coordinates that directly parameterize the permissible hinge motion. Thus minimal coordinate models of tree-topology systems have no bilateral constraints.

Bilateral constraints, on the other hand, are unavoidable even when using minimal coordinates for closed-chain topology systems. Such systems are decomposed into a tree-topology system (formed by a spanning tree) together with a minimal set of bilateral closure constraints. While the closure constraints define the bilateral constraints, contact constraints lead to a set of unilateral constraints on the system.

While the values of the fc nodal force components that arise from control actuators may be known, the ones associated with system constraints are not explicitly available. In this paper, we will assume that fc only contains such implicit constraint forces, since the effect of the explicit ones is easily handled by absorbing them into C. In view of this, all nodes in this paper will refer to constraint nodes. With fc denoting the implicit constraint forces, Eq. 1 extends to also be the smooth equations of motion for non-tree systems.

The θ̈ accelerations solution of Eq. 1 can be expressed as

    θ̈ = M−1J∗fc + θ̈f   where θ̈f = M−1(T − C)   (2)

θ̈f represents the free generalized accelerations, i.e. the generalized accelerations in the absence of the nodal forces. Eq. 2 expresses the overall θ̈ generalized acceleration for the system as the sum of the θ̈f free generalized accelerations and the M−1J∗fc correction acceleration contribution from the implicit non-zero nodal forces. The next section describes the operational space form of the equations of motion that describes the mapping between the nodal forces and the nodal spatial accelerations.

B. Operational space dynamics

The operational space for the multi-link system is defined by the configuration of the set of constraint nodes on the system. Let Vc ∈ R6nc denote the stacked vector of spatial velocities [6] of the nc nodes. The relationship between Vc and the θ̇ joint velocities is given by

    Vc = Jθ̇   (3)

The constraint nodes' spatial acceleration αc ∈ R6nc is the time derivative of Vc which, from Eq. 3, is given by

    αc = dVc/dt = Jθ̈ + aos   with aos = J̇θ̇   (4)

Denoting the node accelerations for the free system (i.e. without the fc nodal forces) by αf, the following analog of Eq. 4 defines αf in terms of the θ̈f generalized accelerations:

    αf = Jθ̈f + aos   (5)

Pre-multiplying Eq. 2 by J and using Eq. 4 yields:

    αc = Λfc + αf   where Λ = JM−1J∗ ∈ R6nc×6nc   (6)

Λ is referred to as the operational space compliance matrix (OSCM). The invertibility of Λ does not depend on J being invertible – only that J have full row-rank. When it exists, the inverse of Λ is referred to as the operational space inertia.

C. Impulsive dynamics

The differential form of the equations of motion in Eq. 2 can be discretized using an Euler step to obtain a form that maps a p impulse stacked vector at the nodes (over a Δt time interval) into the resulting change in generalized velocities¹

    θ̇+ − θ̇− = M−1J∗p + θ̈f Δt   with p = fc Δt   (7)

Multiplying both sides with J leads to the following expression for the change in nodal spatial velocities

    V+c − V−c = Λp + αf Δt   (8)

Collision events are impulsive and lead to instantaneous changes in the system velocities, and in this case Δt = 0 in the above equation.

III. UNILATERAL CONTACT CONSTRAINTS

Unilateral constraints are defined by inequality relationships of the form

    d(θ, t) ≥ 0   (9)

As an example, the non-penetration condition for rigid bodies can be stated as an inequality relationship requiring that the distance between the surfaces of rigid bodies be non-negative. d(θ, t) is generally referred to as the distance or gap function for unilateral constraints.

Contact occurs at the constraint boundary, i.e., when d(θ, t) = 0. For bodies in contact, the surface normals at the contact point are parallel. The existence of contact is typically determined using geometric or collision detection techniques. For a pair of bodies A and B in contact, we use a convention where the ith contact normal n̂(i) is defined as pointing from body B towards body A, so that motion of A in the direction of the normal leads to a separation of the bodies.

A unilateral constraint is said to be in an active state when

    d(θ, t) = ḋ(θ, t) = d̈(θ, t) = 0   (10)

¹The − and + superscripts denote the respective value of a quantity just before and after the application of an impulse.
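The quantities in Eqs. 6–8 are easy to exercise on a toy system. In the sketch below (our own example: the matrices are arbitrary stand-ins, and planar 2-vectors replace the 6-dimensional spatial quantities), we form the OSCM Λ = JM−1J∗, apply a nodal impulse p with Δt = 0 as in a collision event, and confirm that the generalized velocity change of Eq. 7 maps through the Jacobian to the nodal velocity change Λp predicted by Eq. 8.

```python
import numpy as np

# Toy 2-DOF system: mass matrix M and the Jacobian J of a single constraint
# node (here 2 x 2 for a planar point; the numbers are arbitrary).
M = np.array([[3.0, 0.4],
              [0.4, 1.2]])
J = np.array([[1.0, 0.5],
              [0.0, 1.0]])

Minv = np.linalg.inv(M)
Lam = J @ Minv @ J.T          # operational space compliance matrix (OSCM)

p = np.array([0.2, -0.1])     # nodal impulse (collision case, dt = 0)

dtheta = Minv @ J.T @ p       # Eq. 7: change in generalized velocities
dV = J @ dtheta               # resulting change in nodal velocity

assert np.allclose(dV, Lam @ p)   # Eq. 8: dV = Lambda p
print("Lambda =\n", Lam)
print("nodal velocity change:", dV)
```

Note that Λ is symmetric and, for a full-row-rank J, positive definite, which is what makes it usable as the contact-space mass information in the LCP setup.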
Thus, a unilateral constraint is active when there is contact, and the contact persists. Only active constraints generate constraint forces on the system.

A constraint that is not active is said to be inactive. Contact separation occurs when the relative linear velocity of the contact points along the normals becomes positive and the contact points drift apart. A separating constraint is in the process of losing contact and transitioning to an inactive state, and at the start of a separation event, we have

    d(θ, t) = ḋ(θ, t) = 0 and d̈(θ, t) > 0    (11)

A. Contact impulse for an active contact constraint

We now describe contact force modeling using the approach in references [3, 13]. Denote the number of unilateral contact nodes by nu. The 6-dimensional spatial impulse at the ith active contact constraint node has a zero angular moment component. Its non-zero linear impulse component Fu(i) ∈ R^3 can be decomposed into normal and tangential (friction impulse) components

    Fu(i) = Fn(i) n̂(i) + Ft(i) t̂(i)    (12)

t̂(i) denotes a tangent plane vector for the ith contact pair. Assuming that the friction coefficient is μ(i), the magnitude of the tangential Coulomb frictional impulse is bounded by the magnitude of the normal component as follows:

    Ft(i) ≤ μ(i) Fn(i)    (13)

When the relative linear velocity between the contact nodes is non-zero, the tangential frictional impulse is in a direction opposing the linear velocity vector (which necessarily lies in the contact tangent plane) and Eq. 13 holds with an equality. When the bodies have non-zero relative linear velocities at the contact point, the contact is said to be a sliding contact. Otherwise, when the relative linear velocity is zero, the contact is said to be a rolling contact. Thus, the tangential friction impulse is on the boundary of the cone defined by Eq. 13 when sliding, and in the interior of the cone when rolling.

For the purpose of numerical computation, the friction cone at the ith contact is approximated by a friction polyhedron consisting of a finite number, nf, of unit direction vectors d̂j(i) in the tangent plane. It is assumed that for each direction vector, its opposite direction vector is also in the set. For notational simplicity, we assume that nf is the same across all contact points. The ith contact tangential frictional impulse is expressed as the linear combination of these direction vectors as follows:

    Ft(i) t̂(i) = Σ_{j=1}^{nf} βj(i) d̂j(i) = D(i) β(i)
    where D(i) ≜ [d̂1(i), · · · , d̂nf(i)] ∈ R^{3×nf}
    and β(i) ≜ col {βj(i)}_{j=1}^{nf} ∈ R^{nf}    (14)

Combining Eq. 12 and Eq. 14 we have

    Fu(i) = D̄(i) β̄(i), where β̄(i) ≜ [Fn(i); β(i)] ∈ R^{nf+1}
    and D̄(i) ≜ [n̂(i), D(i)] ∈ R^{3×(nf+1)}    (15)

During sliding, the βj(i) component is non-zero and equal to μ(i) Fn(i) for just the single j that corresponds to the closest direction opposing the (tangential) relative linear velocity. In other words, with σ(i) denoting the magnitude of the contact relative linear velocity,

    βk(i) = { μ(i) Fn(i) χ[k=j]   if σ(i) > 0
            { 0                   if σ(i) = 0    (16)

In the above, χ[cond] denotes the indicator function whose value is 1 if the condition is true, and 0 otherwise.

B. Complementarity relationship for a unilateral contact

The sliding/rolling contact relationships can be stated equivalently as the following standard complementarity conditions²:

    n̂*(i) vu⁺(i) ⊥ Fn(i)                  (separation)
    σ(i) E(i) + D*(i) vu⁺(i) ⊥ β(i)       (friction force direction)
    μ(i) Fn(i) − E*(i) β(i) ⊥ σ(i)        (friction force magnitude)    (17)

    where E(i) ≜ col {1}_{j=1}^{nf} ∈ R^{nf}    (18)

vu⁺(i) ∈ R^3 denotes the linear relative velocity of the first body A with respect to the second body B. The component of this relative velocity along the contact normal is n̂*(i) vu⁺(i), and when positive, indicates increasing separation between the bodies, while a negative value indicates that the bodies are approaching each other. The complementarity conditions in Eq. 17 enforce the no inter-penetration constraint at the velocity instead of at the gap level. Hence they are valid only when the gap is zero, i.e., when contact exists [3]. Using Eq. 15, Eq. 17 can be alternatively expressed as

    Ê(i) σ(i) + D̄*(i) vu⁺(i) ⊥ β̄(i)
    Ē(i) β̄(i) ⊥ σ(i)    (19)

    where Ê(i) ≜ [0; E(i)] ∈ R^{nf+1}
    and Ē(i) ≜ [μ(i), −E*(i)] ∈ R^{1×(nf+1)}    (20)

² A complementarity condition, f(x) ⊥ x, holds for a function f(x) ∈ R^n of a vector x ∈ R^n, whose xi elements have lower and upper bounds li and ui respectively, when the following properties apply: (a) fi(x) ≥ 0 when xi = li; (b) fi(x) ≤ 0 when xi = ui; and (c) fi(x) = 0 when li < xi < ui. For the standard complementarity condition, li = 0 and ui = ∞. It is a linear complementarity condition when f(x) has the form M(x)x + q(x), and a mixed complementarity condition when li = −∞ and ui = ∞ for one or more of the elements.
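The friction polyhedron construction behind Eqs. 14-15 amounts to choosing nf tangent-plane directions in opposing pairs and prepending the contact normal to obtain D̄(i). A minimal NumPy sketch; the function name and the seed-vector choice for building the tangent frame are our own, not from the paper:

```python
import numpy as np

def friction_basis(n_hat, nf=4):
    """Return D(i) (3 x nf tangent directions in opposing pairs) and
    Dbar(i) = [n_hat, D(i)] (3 x (nf+1)), per Eqs. 14-15."""
    assert nf % 2 == 0, "opposing-pair assumption needs an even nf"
    n_hat = np.asarray(n_hat, dtype=float)
    n_hat = n_hat / np.linalg.norm(n_hat)
    # any vector not parallel to n_hat seeds an orthonormal tangent frame
    seed = np.array([1.0, 0.0, 0.0])
    if abs(n_hat @ seed) > 0.9:
        seed = np.array([0.0, 1.0, 0.0])
    t1 = np.cross(n_hat, seed); t1 /= np.linalg.norm(t1)
    t2 = np.cross(n_hat, t1)
    # nf equally spaced unit directions in the tangent plane
    angles = 2.0 * np.pi * np.arange(nf) / nf
    D = np.stack([np.cos(a) * t1 + np.sin(a) * t2 for a in angles], axis=1)
    Dbar = np.hstack([n_hat[:, None], D])
    return D, Dbar

D, Dbar = friction_basis([0.0, 0.0, 1.0], nf=4)
print(Dbar.shape)  # (3, 5)
```

Because each direction appears with its opposite, the nonnegative coefficients βj(i) can represent a tangential impulse in any direction of the plane, which is exactly what the polyhedral cone approximation requires.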
C. Aggregated complementarity relationships

For now we assume that there are no bilateral constraints on the system, and thus nc = nu. The stacked vector of relative linear velocities at the contacts is denoted vu ∈ R^{3nu}. It is related to the stacked vector of node spatial velocities Vc ∈ R^{6nc} via the following

    vu = Qu Vc    (21)

where the Qu ∈ R^{3nu×6nc} matrix contains one block-row per contact node-pair, with each row mapping the spatial velocities of the node pair into the relative linear velocity across the contact. Defining and using the Qu matrix will later allow us to include unilateral constraints in a manner similar to bilateral constraints in the operational space formulation. Qu has the same structure as the Qb constraint mapping matrix for bilateral constraints described later in Eq. 32 for three degree of freedom spherical hinges.

The Qu matrix also relates the Fu ∈ R^{3nu} equal and opposite impulses at contact node-pairs with the corresponding spatial impulses at the nodes, p ∈ R^{6nc}, via

    p = Qu* Fu    (22)

Define the stacked vectors

    β̄ ≜ col {β̄(i)}_{i=1}^{nu} ∈ R^{nu(nf+1)}
    and σ ≜ col {σ(i)}_{i=1}^{nu} ∈ R^{nu}    (23)

From Eq. 15, we have

    Fu = col {Fu(i)}_{i=1}^{nu} = D̄ β̄ ∈ R^{3nu}
    where D̄ ≜ diag {D̄(i)}_{i=1}^{nu} ∈ R^{3nu×nu(nf+1)}    (24)

Thus, from Eq. 22 and Eq. 24,

    p = Qu* D̄ β̄    (25)

Now we examine the effect of the contact node impulses on the node spatial velocities. From Eq. 8 and Eq. 24,

    Vc⁺ − Vc⁻ = Λ p + Δt αf = Λ Qu* D̄ β̄ + Δt αf    (26)

Pre-multiplying Eq. 26 by D̄* Qu and using Eq. 21, we obtain

    D̄* vu⁺ = (D̄* Qu Λ Qu* D̄) β̄ + Δt D̄* Qu αf + D̄* vu⁻    (27)

Combining equations 19 and 27, the overall complementarity conditions can be rephrased as:

    [ D̄* Qu Λ Qu* D̄   Ê ] [β̄]   [ D̄* (Δt Qu αf + vu⁻) ]      [β̄]
    [ Ē                0 ] [σ] + [ 0                     ]  ⊥  [σ]    (28)

    where Ê ≜ diag {Ê(i)}_{i=1}^{nu} ∈ R^{nu(nf+1)×nu}
    and Ē ≜ diag {Ē(i)}_{i=1}^{nu} ∈ R^{nu×nu(nf+1)}    (29)

The structure of the standard LCP problem in Eq. 28 is different from that normally found in the literature [3, 13, 14]. This difference arises from the way we have defined β̄(i) in Eq. 15 and used it to arrange the coordinates in the LCP problem. The combined organization of the coordinates on a per contact constraint basis allows us to handle unilateral and bilateral constraints in a uniform manner. The difference is only structural and not in the underlying physics of the problem. The solution to this nu(nf + 2) dimensional LCP yields β̄ and σ, and these can be used to propagate the system state as follows:

    p = Qu* D̄ β̄
    fc = p / Δt
    θ̈ = θ̈f + M⁻¹ J* fc    (30)

The equation for θ̈ can be numerically integrated until there is a new collision event during the Δt propagation interval. Observe that the p contact impulse is averaged over the Δt time interval to get the fc spatial forces at the contact nodes. An Euler step integration based discretization of Eq. 30 is:

    θ̇⁺ = θ̇⁻ + M⁻¹ J* p + Δt θ̈f    (31)

An LCP solution with a positive Fn(i) indicates that the ith contact is active. Furthermore, a zero σ(i) implies that the ith contact is a rolling contact, while a non-zero value implies that it is a sliding contact.

D. Including bilateral constraints

As discussed earlier, in the minimal coordinate formulation, bilateral constraints arise from loop closure constraints in closed-chain topology systems. We now describe extensions to the complementarity formulation when the system has such bilateral constraints, and thus now nc ≥ nu.

Assuming that nb denotes the dimension of the bilateral constraints on the system, a bilateral constraint matrix Qb ∈ R^{nb×6nc} can be used to characterize the loop constraints as follows:

    Qb Vc = 0    (32)

Eq. 22 and Eq. 25 for the node impulses generalize to

    p = Qu* Fu + Qb* λ = Qu* D̄ β̄ + Qb* λ    (33)

where λ ∈ R^{nb} denotes the Lagrange multipliers associated with the bilateral constraints. p now contains contributions from the unilateral as well as the bilateral constraints.

To handle the bilateral constraints, Λ in Eq. 27 expands to be the OSCM for the combined set of unilateral and bilateral constraint nodes. Using Eq. 33 in place of Eq. 25, the generalization of the complementarity condition in Eq. 28
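Under these definitions, the LCP matrix of Eq. 28 is assembled from the per-contact Ê(i) and Ē(i) blocks of Eq. 20, stacked block-diagonally (Eq. 29) around the D̄*QuΛQu*D̄ core. A dense NumPy sketch follows; the function name and argument layout are our own, and a real implementation would exploit sparsity and the low-order Λ algorithm of Section V rather than dense products:

```python
import numpy as np

def assemble_lcp(Lam, Qu, Dbar, mu, nf):
    """Assemble the LCP matrix of Eq. 28 for nu contacts.
    Lam:  (6nc x 6nc) OSCM for the contact nodes
    Qu:   (3nu x 6nc) contact mapping matrix
    Dbar: (3nu x nu(nf+1)) block diagonal of the Dbar(i) of Eq. 15
    mu:   friction coefficients, one per contact."""
    nu = len(mu)
    # principal block: Dbar* Qu Lam Qu* Dbar   (Eq. 27)
    A = Dbar.T @ Qu @ Lam @ Qu.T @ Dbar
    # Ehat(i) = [0; 1...1], Ebar(i) = [mu_i, -1...-1]   (Eqs. 20, 29)
    Ehat = np.zeros((nu * (nf + 1), nu))
    Ebar = np.zeros((nu, nu * (nf + 1)))
    for i in range(nu):
        r = i * (nf + 1)
        Ehat[r + 1 : r + nf + 1, i] = 1.0
        Ebar[i, r] = mu[i]
        Ebar[i, r + 1 : r + nf + 1] = -1.0
    top = np.hstack([A, Ehat])
    bot = np.hstack([Ebar, np.zeros((nu, nu))])
    return np.vstack([top, bot])          # nu(nf+2) square, as in the text

# toy check: one contact, nf = 2 polyhedron directions
M = assemble_lcp(np.eye(6), np.ones((3, 6)), np.eye(3), [0.5], 2)
print(M.shape)  # (4, 4)
```

The resulting matrix has the nu(nf + 2) dimension quoted in the text, independent of the number of links.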
to include bilateral constraints is as follows (with [a; b] denoting vertical and [a, b] horizontal stacking):

    w = M z + q ⊥ z

    with M = [ [D̄*Qu; Qb] Λ [Qu*D̄, Qb*]   [Ê; 0] ]
             [ [Ē, 0]                       0      ]

    z = [β̄; λ; σ],   q = [ [D̄*Qu; Qb] Δt αf + [D̄* vu⁻; 0] ]
                          [ 0                               ]    (34)

Here M is a (nu(nf + 2) + nb) size square matrix, with z and q being (nu(nf + 2) + nb) size vectors. The middle λ row corresponds to the bilateral constraints, and its left hand side is required to be exactly zero to satisfy the algebraic bilateral constraints. Thus this represents a mixed LCP, with the lower bounds on the λ variables being −∞. Analogous to Eq. 31, the solution to this LCP can be used to propagate the system state as follows:

    p = Qu* D̄ β̄ + Qb* λ
    θ̇⁺ = θ̇⁻ + M⁻¹ J* p + Δt θ̈f    (35)

IV. COLLISION DYNAMICS

During inelastic collisions some of the impact energy is lost. The coefficient of restitution, ε(i), defines the fraction of the energy that remains after a collision. The complementarity approach to modeling collisions breaks up the collision event into instantaneous compression and decompression phases [3]. During the compression phase, collision energy is stored, and during decompression, a fraction of the collision impulse is recovered.

A. Compression

At the ith contact undergoing collision, the compression phase is instantaneous and impulsively changes the relative linear contact velocity from vu⁻(i) to a new vc⁺(i) value with a non-negative normal component. The compression impulse is denoted pc(i). The mixed LCP problem for the compression phase is obtained by setting Δt = 0 in Eq. 34 to obtain

    w = M z + qc ⊥ z   with qc = [ D̄* vu⁻ ; 0 ; 0 ]    (36)

The LCP solution is used to instantaneously (i.e. impulsively) propagate the state for the compression phase as follows:

    pc = Qu* D̄ β̄ + Qb* λ
    θ̇c = θ̇⁻ + M⁻¹ J* pc
    vc⁺ = J θ̇c    (37)

B. Decompression

The decompression phase applies an impulse of magnitude ε(i) n̂*(i) pc(i) for the ith contact along the normal using the impulse stored during the compression phase. An extra contact impulse term ensures that the normal component of the relative linear velocity at the end of the decompression step remains non-negative. The recovered ϑ decompression impulse is

    ϑ = col {(ε(i) n̂*(i) pc(i)) n̂(i)}_{i=1}^{nu} ∈ R^{3nu}    (38)

The decompression phase LCP has the form:

    w = M z + qd ⊥ z
    with qd = [ [D̄* vc⁺; 0] + [D̄*Qu; Qb] Λ Qu* ϑ ]
              [ 0                                 ]    (39)

The decompression LCP problem is the mixed LCP in Eq. 34 with Δt = 0, the contact linear velocity vu replaced with vc⁺, and an additional [D̄*Qu; Qb] Λ Qu* ϑ term for the recovered impulse in the q LCP vector term. The LCP solution is used to instantaneously propagate the state for the decompression phase as follows:

    p = Qu* D̄ β̄ + Qb* λ + Qu* ϑ
    θ̇⁺ = θ̇c + M⁻¹ J* p    (40)

When ε(i) = 0, there is no decompression phase. For collision dynamics, the numerical state propagation process consists of the following steps:

1) Use Eq. 35 to obtain the new [θ(t + Δt), θ̇(t + Δt)] state from the [θ(t), θ̇(t)] state.
2) If the new [θ(t + Δt), θ̇(t + Δt)] state involves a new collision, then estimate the Δt time for collision, and redo Step (1) with this new time interval. This should bring the system into contact for the new collision. Now use the compression and decompression LCPs in Eq. 37 and Eq. 40 to propagate the system through the collision.

An alternative to this discrete event state propagation approach is a time-stepping one, where the LCPs are solved only at the end of fixed time steps. This approach is faster, but at the cost of allowing some inter-penetration errors among colliding bodies.

V. SPATIAL OPERATOR COMPUTATIONAL ALGORITHM

The key implementation and computational challenge with using minimal coordinates is the need to compute the Λ required for setting up and solving the LCP problems in Eq. 28, Eq. 34, Eq. 36 and Eq. 39. As seen in Eq. 6, Λ involves the configuration dependent matrix products of the Jacobian matrix and the mass matrix inverse. A direct evaluation of this expression requires O(N³) computations. However, references [5, 6, 8] have used spatial operators to develop simpler and recursive computational algorithms for Λ that are of only O(N) complexity. We briefly sketch out below the underlying analysis and structure of this algorithm, and refer the reader to [5, 6, 8] for notation and derivation details.
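The compression/decompression split of Section IV can be illustrated on a single point mass moving along a contact normal, where the LCPs of Eqs. 36-39 reduce to simple impulse arithmetic. The toy below is entirely our own (not the paper's multi-link formulation): compression removes the approach velocity and stores the impulse, and decompression reapplies the fraction ε of it:

```python
# Toy 1-DOF collision: compression stores impulse p_c, decompression
# recovers eps * p_c along the normal (the idea behind Eq. 38).

def collide(m, v_minus, eps):
    """Return (compression impulse, post-collision normal velocity)."""
    p_c = -m * v_minus           # compression: drives normal velocity to 0
    v_c = v_minus + p_c / m      # = 0 at the end of compression
    p_d = eps * p_c              # recovered decompression impulse
    v_plus = v_c + p_d / m       # non-negative separation velocity
    return p_c, v_plus

p_c, v_plus = collide(m=1.0, v_minus=-2.0, eps=0.5)
print(p_c, v_plus)   # 2.0 1.0
```

With ε = 0 the second phase vanishes and the bodies simply stop approaching, matching the "no decompression phase" remark in the text; in the full formulation the same two-phase logic is carried out through the mixed LCPs so that frictional and bilateral coupling across all contacts is respected.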
A. Spatial operator factorization of M⁻¹

We begin with the following key spatial operator based analytical results that provide explicit, closed-form expressions for the factorization and inversion of a tree mass matrix [6, 10]:

    M = H φ M φ* H*
    M = [I + HφK] D [I + HφK]*
    [I + HφK]⁻¹ = [I − HψK]
    M⁻¹ = [I − HψK]* D⁻¹ [I − HψK]    (41)

The first expression defines the Newton-Euler operator factorization of the mass matrix M in terms of the H hinge articulation, the φ rigid body propagation and the M link spatial inertia operators. While this factorization has non-square factors, the second expression describes an alternative factorization involving only square factors, with a block diagonal D and a block lower-triangular [I + HφK] matrix. This factorization involves new spatial operators that are associated with the articulated body (AB) forward dynamics algorithm [4, 5] for the system. The next expression describes an analytical expression for the inverse of the [I + HφK] operator. Using this leads to the final analytical expression for the inverse of the mass matrix. These operator expressions hold generally for tree-topology systems irrespective of the number of bodies, the types of hinges, the specific topological structure, and even for non-rigid links [6].

B. The Ω extended operational space compliance matrix

With V ∈ R^{6n} denoting the stacked vector of link spatial velocities, its spatial operator expression is [6]

    V = φ* H* θ̇    (42)

Bundling together the rigid body transformations for all nodes, we define the B ∈ R^{6n×6nc} pick-off matrix such that

    Vc = B* V = B* φ* H* θ̇   ⇒   J = B* φ* H*    (43)

using Eq. 42 and Eq. 3. This is the spatial operator expression for the J Jacobian matrix. Using this expression and Eq. 41 for the mass matrix inverse within Eq. 6 leads to the following expression for Λ:

    Λ = J M⁻¹ J* = B* φ* H* (I − HψK)* D⁻¹ (I − HψK) H φ B    (44)

Using the following spatial operator identity from [6, 10]

    (I − HψK) H φ = H ψ    (45)

in Eq. 44 leads to the following simpler expression for Λ:

    Λ = B* Ω B   with   Ω ≜ ψ* H* D⁻¹ H ψ ∈ R^{6n×6n}    (46)

We have arrived at an expression for Λ that, unlike Eq. 6, involves neither the mass matrix inverse nor the node's Jacobian matrix! We refer to Ω as the extended operational space compliance matrix. This terminology is based on Eq. 46, which shows that the OSCM Λ can be obtained by a reducing transformation of the full, all body Ω matrix by the B pick-off operator involving just the matrix sub-blocks associated with the parent links of the nodes. From its definition, it is clear that Ω is symmetric and positive semi-definite, since D⁻¹ is a symmetric positive-definite matrix.

While the explicit computation of M⁻¹ or J is not needed to obtain Λ, the direct evaluation of Eq. 46 still remains of O(N³) complexity due to the need for carrying out the multiple matrix/matrix products. The next section shows that these matrix/matrix products can be avoided by exploiting a decomposition of the Ω matrix.

C. Decomposition of Ω

The following lemma describes a decomposition of Ω into simpler component terms and an expression for its block elements. The ψ̃ and Eψ terms used below are defined in references [5, 6]. Furthermore, ℘(k) denotes the parent link for the kth link, and the i ≺ j notation implies that the jth link is an ancestor of the ith link in the tree.

Lemma 1 (Decomposition of Ω): Ω can be decomposed into the following disjoint sum:

    Ω = Υ + ψ̃* Υ + Υ ψ̃ + R
    where R = Σ_{∀i,j: i⊀j, j⊀i} e_i ψ*(k, i) Υ(k) ψ(k, j) e_j*,   k = ℘(i, j)    (47)

Υ ∈ R^{6n×6n} is a block-diagonal operator, referred to as the operational space compliance kernel, satisfying the following backward Lyapunov equation:

    H* D⁻¹ H = Υ − diagOf{Eψ* Υ Eψ}    (48)

diagOf{Eψ* Υ Eψ} represents just the block-diagonal part of the (generally non block-diagonal) Eψ* Υ Eψ matrix. The 6 × 6 dimensional, symmetric, positive semi-definite Υ(k) diagonal matrices satisfy the following parent/child recursive relationship:

    Υ(k) = ψ*(℘(k), k) Υ(℘(k)) ψ(℘(k), k) + H*(k) D⁻¹(k) H(k)    (49)

This relationship forms the basis for the following O(N) base-to-tips scatter recursion for computing the Υ(k) diagonal elements:

    for all nodes k (base-to-tips scatter)
        Υ(k) = ψ*(℘(k), k) Υ(℘(k)) ψ(℘(k), k) + H*(k) D⁻¹(k) H(k)
    end loop    (50)

While Υ defines the block-diagonal elements of Ω, the following recursive expressions describe its off-diagonal terms:

    Ω(i, j) = { Υ(i)              for i = j
              { Ω(i, k) ψ(k, j)   for j ≺ k ≼ i, k = ℘(j)
              { Ω*(j, i)          for i ≺ j
              { Ω(i, k) ψ(k, j)   for i ⊀ j, j ⊀ i, k = ℘(i, j)    (51)

Proof: See [5, 6].
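The base-to-tips scatter recursion of Eq. 50 is straightforward to sketch for a serial chain, where ℘(k) = k − 1. The list-based layout and names below are our own; the ψ(℘(k), k) transforms and the H*(k)D⁻¹(k)H(k) terms are taken as given 6×6 arrays, with the matrix transpose standing in for the adjoint:

```python
import numpy as np

def compute_upsilon(psi, HDinvH):
    """Eq. 50 scatter recursion for a serial chain.
    psi[k]:    6x6 psi(parent(k), k) transform (psi[0] unused at the base)
    HDinvH[k]: 6x6 H*(k) D^{-1}(k) H(k) term for link k
    Returns the Upsilon(k) compliance-kernel blocks in O(N) work."""
    n = len(HDinvH)
    Ups = [None] * n
    Ups[0] = HDinvH[0]                 # base link has no parent term
    for k in range(1, n):              # single base-to-tips sweep
        Ups[k] = psi[k].T @ Ups[k - 1] @ psi[k] + HDinvH[k]
    return Ups

# demo: 3-link chain with identity transforms and unit hinge terms
psi = [None, np.eye(6), np.eye(6)]
Ups = compute_upsilon(psi, [np.eye(6)] * 3)
```

One pass over the links suffices because each Υ(k) depends only on its parent's value, which is exactly the parent/child relationship of Eq. 49.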
Eq. 47 shows that Ω can be decomposed into the sum of simpler terms consisting of the block diagonal Υ, the upper-triangular ψ̃* Υ, the lower triangular Υ ψ̃, and the sparse R matrices. Furthermore, Eq. 51 reveals that all of the block-elements Ω(i, j) can be obtained from the Υ(i) elements of the Υ block-diagonal operational space compliance kernel. Since only a small subset of the elements of Ω are needed for computing Λ, the next section exploits this to avoid the expensive computation of the full Ω matrix.

D. Computing Λ

From the Λ = B* Ω B expression, and the sparse structure of B, it is clear that only a subset of the elements of Ω are needed to compute Λ. The B pick-off operator has one column for each of the nodes, with each such column having only a single non-zero 6 × 6 matrix entry at the kth parent link slot. Only as many elements of Ω as there are elements in Λ are needed. Thus, just nc × nc number of 6 × 6 sub-block matrices of Ω are required. In view of the symmetry of the matrices, we actually need just nc(nc + 1)/2 such sub-block matrices. The overall cost of this algorithm is linearly proportional to the number of degrees of freedom, and a quadratic function of the number of nodes. This is much lower than the O(N³) cost implied by Eq. 6.
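As an illustration of the on-demand block evaluation just described, Eq. 51 lets any required Ω(i, j) sub-block be generated recursively from the Υ kernel without forming the full Ω. A small sketch for a serial chain, with our own indexing convention (parent ℘(k) = k − 1, so link j is an ancestor of link i iff j < i, and the fourth case of Eq. 51 never occurs):

```python
import numpy as np

def omega_block(i, j, Ups, psi):
    """Omega(i, j) on a serial chain via the recursion of Eq. 51.
    Ups[k]: Upsilon(k) kernel blocks; psi[k]: psi(k-1, k) transforms."""
    if i == j:
        return Ups[i]                              # diagonal: Upsilon(i)
    if i < j:                                      # i is an ancestor of j
        return omega_block(i, j - 1, Ups, psi) @ psi[j]
    return omega_block(j, i, Ups, psi).T           # symmetry: Omega*(j, i)

# toy 3-link chain with scalar-like 2x2 blocks
Ups = [np.eye(2) * (k + 1) for k in range(3)]
psi = [None, 0.5 * np.eye(2), 0.5 * np.eye(2)]
print(omega_block(0, 2, Ups, psi)[0, 0])   # 0.25
```

Evaluating only the nc(nc + 1)/2 blocks that Λ actually touches is what keeps the overall cost linear in the degrees of freedom and quadratic in the number of nodes.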
E. Simulation example

We use a simulation of a multi-link pendulum colliding with itself and the environment to measure the performance of our minimal coordinate formulation. The environment consists of a floor and a wall located 4 m away. The multi-link pendulum consists of n identical 1 kg mass spherical bodies connected with pin hinges. The radius of the sphere is scaled based on the number of links to maintain a 12 m overall length of the pendulum. The pendulum base is located at a height of 10 m. The open source Bullet software [1] is used for collision detection, and the PATH software [2] for solving mixed complementarity problems.

The simulation lasts for 5 seconds with a time step of 0.1 ms. A coefficient of restitution of 0.5 is used to simulate inelastic collisions. The pendulum starts at an angle of π/2 radians with an initial angular velocity of 1 radian/s and a gravitational acceleration of 9.8 m/s².

As the pendulum swings from left to right, it collides with the ground, bounces off of the ground, and eventually collides with the wall on the right. In the course of the sequence, multiple links are at times in collision with the ground, the wall and with each other, as seen in Figure 1 for a 12-link pendulum.

Fig. 1. Time series capture of swinging pendulum simulation with 12 links.

We simulate the contact and collision dynamics using two different techniques. In the first minimal coordinate (MC) technique, the serial-chain pendulum is modeled using minimal coordinates and only unilateral contact constraints. The second redundant coordinate (RC) technique uses non-minimal coordinates where each link is treated as an independent body, and the hinges are handled as bilateral constraints between the neighboring links. We have verified good agreement between the solutions from these two methods; Figure 2 shows the similar time history of the height and normal velocity of the last link of a 12-body pendulum from the two methods.

Fig. 2. Comparisons of the height and normal velocity of the last link using the MC and RC methods for a 12-body pendulum.

Table I compares the computational cost of the MC and RC methods for pendulums containing 3 to 30 links. We observe
TABLE I
A comparison of the computational time for the minimal coordinate (MC) and the redundant coordinate (RC) techniques for the multi-link pendulum example with different number of links.

Number of Bodies | Minimal Coords (s) | Redundant Coords (s)
3                | 13.3258            | 47.7948
6                | 10.5026            | 68.7122
12               | 19.0632            | 305.2764
15               | 21.7605            | 558.0966
24               | 38.1993            | 1899.8575
30               | 73.7386            | 4100.5061

that the MC method is 3 to 50 times computationally faster than the RC method. Moreover, the performance gap widens substantially as the number of links in the system is increased.

VI. CONCLUSIONS

In this article we have described the formulation of the contact and collision dynamics for multi-link systems using minimal coordinates. With minimal coordinates, the size of the LCP problem in Eq. 34 is (nu(nf + 2) + nb), which is independent of the number of links and the number of degrees of freedom in the system. In contrast, the size of the corresponding LCP problem using redundant coordinates has dimension larger by 6n − N or more depending on the specific formulation. For a 6-link manipulator with 6 degrees of freedom, this amounts to a dimensional difference of 30. The use of minimal coordinates also results in the automatic enforcement of the bilateral constraints such as from inter-link hinges, and avoids the need for DAE type error control schemes for inter-link hinge bilateral constraints.

We have described the variants of the LCP problem needed to handle effects such as bilateral constraints associated with closed-chain topologies as well as elastic and inelastic collision dynamics. Our treatment of unilateral constraints allows us to handle them in the same way as bilateral constraints in the operational space based formulation. Finally, we have shown that existing low order computational algorithms for computing the OSCM can be used to make the computation of the LCP matrices tractable. We have used a multi-link pendulum contact and collision dynamics simulation example to illustrate the computational speedup achieved by the minimal coordinate approach.

Acknowledgments

The research described in this paper was performed at the Jet Propulsion Laboratory (JPL), California Institute of Technology, under a contract with the National Aeronautics and Space Administration, and funded through the internal Research and Technology Development program.³ This project was also supported in part by the DARPA Autonomous Robotic Manipulation Software Track (ARM-S) program.

³ © 2012 California Institute of Technology. Government sponsorship acknowledged.

REFERENCES

[1] Bullet Physics Library, 2012. URL http://bulletphysics.org.
[2] The PATH Solver, 2012. URL http://pages.cs.wisc.edu/~ferris/path.html.
[3] M. Anitescu and F. A. Potra. Formulating dynamic multi-rigid-body contact problems with friction as solvable linear complementarity problems. Nonlinear Dynamics, 14(3):231-247, 1997.
[4] Roy Featherstone. Rigid Body Dynamics Algorithms. Springer Verlag, 2008.
[5] Abhinandan Jain. Graph theoretic foundations of multibody dynamics part II: Analysis and algorithms. Multibody System Dynamics, 26(3):335-365, October 2011.
[6] Abhinandan Jain. Robot and Multibody Dynamics: Analysis and Algorithms. Springer, 2011.
[7] Oussama Khatib. A unified approach for motion and force control of robot manipulators: The operational space formulation. IEEE Journal of Robotics and Automation, RA-3(1):43-53, February 1987.
[8] K. Kreutz-Delgado, Abhinandan Jain, and Guillermo Rodriguez. Recursive formulation of operational space control. International Journal of Robotics Research, 11(4):320-328, 1992.
[9] F. Pfeiffer. Mechanical System Dynamics. Springer, 2005.
[10] Guillermo Rodriguez, Abhinandan Jain, and K. Kreutz-Delgado. A spatial operator algebra for manipulator modeling and control. International Journal of Robotics Research, 10(4):371, 1991.
[11] D. Stewart and Jeffrey C. Trinkle. An implicit time-stepping scheme for rigid body dynamics with Coulomb friction. In Proceedings of the 2000 IEEE International Conference on Robotics and Automation (ICRA), pages 162-169, 2000.
[12] A. Tasora and M. Anitescu. A matrix-free cone complementarity approach for solving large-scale, nonsmooth, rigid body dynamics. Computer Methods in Applied Mechanics and Engineering, 200(5-8):439-453, January 2011.
[13] Jeffrey C. Trinkle. Formulation of multibody dynamics as complementarity problems. In ASME International Design Engineering Technical Conference, Chicago, IL, September 2003.
[14] Katsu Yamane and Yoshihiko Nakamura. A numerically robust LCP solver for simulating articulated rigid bodies in contact. In Robotics: Science and Systems IV, pages 89-104. MIT Press, 2009.
Nonparametric Bayesian Models for Unsupervised Scene Analysis and Reconstruction

Dominik Joho, Gian Diego Tipaldi, Nikolas Engelhard, Cyrill Stachniss, Wolfram Burgard
Department of Computer Science, University of Freiburg, Germany
{joho, tipaldi, engelhar, stachnis, burgard}@informatik.uni-freiburg.de

Abstract—Robots operating in domestic environments need to deal with a variety of different objects. Often, these objects are neither placed randomly, nor independently of each other. For example, objects on a breakfast table such as plates, knives, or bowls typically occur in recurrent configurations. In this paper, we propose a novel hierarchical generative model to reason about latent object constellations in a scene. The proposed model is a combination of a Dirichlet process and beta processes, which allows for a probabilistic treatment of the unknown dimensionality of the parameter space. We show how the model can be employed to address a set of different tasks in scene understanding including unsupervised scene segmentation and completion of partially specified scenes. We describe how to sample from the posterior distribution of the model using Markov chain Monte Carlo (MCMC) techniques and present an experimental evaluation with simulated as well as real-world data obtained with a Kinect camera.

Fig. 1. A scene typically contains several observable objects, and the task is to infer the latent meta-objects, where a meta-object is considered to be a constellation of observable objects. At a breakfast table, for example, the meta-objects might be the covers that consist of the observable objects plate, knife, fork, and cup.

I. INTRODUCTION

Imagine a person laying a breakfast table and the person gets interrupted so that she cannot continue with the breakfast preparation. A service robot such as the one depicted in Fig. 1 should be able to proceed laying the table without receiving specific instructions. It faces a series of questions: how to infer the total number of covers, how to infer which objects are missing on the table, and how should the missing parts be arranged? For this, the robot should not require any user-specific pre-programmed model but should ground its decision based on the breakfast tables it has seen in the past.

In this paper, we address the problem of scene understanding given a set of unlabeled examples and generating a plausible configuration from a partially specified scene. The key contribution of this paper is the definition of a novel hierarchical nonparametric Bayesian model to represent the scene structure in terms of object groups and their spatial configuration. We show how to infer the scene structure in an unsupervised fashion by using Markov chain Monte Carlo (MCMC) techniques to sample from the posterior distribution of the latent scene structure given the observed objects.

In our model, each scene contains an unknown number of latent object constellations or meta-object instances. In the breakfast table example, a place cover can be seen as a latent meta-object instance of a certain type that, for example, consists of the objects plate, knife, and cup. An instance of a different type might consist of a cereal bowl and a spoon. The meta-object instances are sampled from a distribution over object constellations (Fig. 2). Thus, not all instances are the same; they differ in the sense that some objects may be missing and that the objects may not be arranged in the same way.

When specifying a generative model for our problem, we have the difficulty that the dimensionality of the model is part of the learning problem. This means that, besides learning the parameters of the model, like the pose of a meta-object, we additionally need to infer the number of involved meta-objects, meta-object parts, etc. The standard solution would be to follow the model selection approaches, for example, learning several models and then choosing the best one. Such a comparison is typically done by trading off the data likelihood with the model complexity as, for example, done for the Bayesian information criterion (BIC). The problem with this approach is the huge number of possible models, which renders this approach intractable in our case.

To avoid this complexity, we follow another approach, motivated by recent developments in the field of hierarchical nonparametric Bayesian models based on the Dirichlet process and the beta process. These models are able to adjust their complexity according to the given data, thereby sidestepping the need to select among several finite-dimensional model alternatives. Based on a prior over scenes, which is updated by observed training scenes, the model can be used for parsing new scenes or completing partially specified scenes by sampling the missing objects.

Whereas in this paper we consider the problem of learning the object constellations on a breakfast table as depicted in Fig. 1, our model is general and not restricted to this scenario.
Fig. 2. A meta-object type is a distribution over object constellations. It is modeled as a collection of parts, each having a Gaussian distribution, a multinomial distribution over object types, and a binary activation probability.

II. RELATED WORK

In this section we describe the relevant works on unsupervised scene analysis. A first family of approaches, and the most related to our model, employs nonparametric Bayesian models to infer spatial relations. Sudderth et al. [15] introduced the transformed Dirichlet process, a hierarchical model which shares a set of stochastically transformed clusters among groups of data. The model has been applied to improve object detection in visual scenes, given the ability of the model to reason about the number of objects and their spatial configuration. Following this paper, Austerweil and Griffiths [2] presented the transformed Indian buffet process, where they model both the features representing an object and their position relative to the object center. Moreover, the set of transformations that can be applied to a feature depends on the object's context.

A complementary approach is the use of constellation models [3, 4, 11]. These models explicitly consider parts and structure of objects and can be learned efficiently and in a semi-supervised manner. The star model [4], which is the more efficient variant of constellation models, uses a sparse representation of the object consisting of a star topology configuration of parts modeling the output of a variety of feature detectors. The main limitation of these methods, however, lies in the fact that the number of objects and parts must be defined beforehand, and thus they cannot be trivially used for scene understanding and object discovery. Ranganathan and Dellaert [11] used a 3D constellation model for representing indoor environments as object constellations. A closely related approach [10] to constellation models uses a hierarchical rule-based approach for learning the pairwise structure of the underlying graphical model.

Triebel et al. [19] presented an unsupervised approach to segment 3D range scan data and to discover objects by the frequency of the appearance of their parts. The data is first segmented using graph-based clustering; then each segment is treated as a potential object part. The authors used CRFs to represent both the part graph, to model the interdependence of parts with object labels, and a scene graph, to smooth the object labels. Spinello et al. [14] proposed an unsupervised detection technique based on a voting scheme of image descriptors. They introduced the concept of latticelets: a minimal set of pairwise relations that can generalize the pattern's connectivity. Conditional random fields are then used to couple low-level detections with high-level scene descriptions. Jiang et al. [8] used an undirected graphical model to infer the best placement for multiple objects in a scene. Their model considers several features of object configurations, such as stability, stacking, and semantic preferences.

A different approach is the one of Fidler and Leonardis [5]. They construct a hierarchical representation of visual input using a bottom-up strategy. They learn the statistically most significant compositions of simple features with respect to more complex higher-level ones. Parts are learned sequentially, layer after layer. Separate classification and grouping techniques are used for the bottom and top layers to account for the numerical difference (sensor data) and the semantic one (object category).

The novelty of the approach presented in this paper lies in the combination of a Dirichlet process and beta-Bernoulli processes, which provides us with a prior for sampling or completing entire scenes.

III. GENERATIVE SCENE MODEL

In this section, we describe the proposed generative scene model. We assume that the reader is familiar with the basics of nonparametric Bayesian models [6], especially with the Chinese restaurant process (CRP) and the Dirichlet process (DP) [16], the (two-parameter) Indian buffet process (IBP) and the beta process (BP) [7, 18], and the concepts of hierarchical [17] and nested [12] processes in this context.

In the following, we consider a scene as a collection of observable objects represented as labeled points in the 2D plane. The 2D assumption is due to our motivation to model table scenes. However, the model is not specifically geared towards 2D data and could in principle also be applied to
based model to capture spatial relations. It also employs 3D data. Basically, we assume that each scene contains an
a star constellation model and a variant of the expectation unknown number of latent object constellations (place covers).
maximization (EM) algorithm to infer the structure and the An object constellation is called a meta-object instance (or
labels of the objects and parts. simply meta-object) and corresponds to a sample from a meta-
Another family of approaches relies on discriminative learn- object type, which is a distribution over object constellations
ing and unsupervised model selection techniques. One ap- and is represented as a part-based model with infinitely many
proach is to automatically discover object part representa- parts. As illustrated in Fig. 2, each part has a binary activation
tions [13]. In this work, the authors introduced a latent con- probability, a Gaussian distribution over the relative object
ditional random field (CRF) based on a flexible assembly of position, and a multinomial distribution over the object type
parts. Individual part labels are modeled as hidden nodes and (knife, fork, etc.). To sample from a meta-object type, one first
a modified version of the EM algorithm has been developed samples the activation of each part. For each activated part,

Fig. 3. Basic structure of our model: The Indian buffet processes (IBP) on top are nested within the tables (represented as circles) of the Chinese restaurant process (CRP) below them. The blocks within the IBP frames represent the relevant part of the IBP matrix (see Fig. 4). Clusters in the scenes are meta-object instances, and the objects (colored/gray points) of the clusters need to be associated to entries in the IBPs shown on top. This is explicitly shown for one cluster in the first scene. For visibility reasons, the rest of the associations are drawn as a single thick line. At the lowest level, each scene has a meta-object IBP (shown on the left) and a noise IBP (on the right) from which the meta-object instances and the noise objects of a scene are drawn.

Fig. 4. A more detailed view of a part IBP representing a meta-object type. This would correspond to one of the IBPs at the very top in Fig. 3. The rows represent customers and the columns represent dishes (parts). The customers of this IBP are the meta-object instances associated to this meta-object type in any of the scenes. The objects of the instances must be associated to one of the parts. They thereby update the posterior predictive distribution (illustrated at the bottom) over the objects of a new customer. Remember that the actual part parameters are integrated out due to the usage of conjugate priors.
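To make the nested construction of Figs. 3 and 4 concrete, the following sketch samples one meta-object instance: first a type from a CRP over cover types, then a set of active parts from that type's IBP. This is our own illustrative Python code, not the authors' implementation; the function names and the bookkeeping of counts are assumptions.

```python
import math
import random

rng = random.Random(0)

def sample_type_crp(type_counts, alpha_t):
    """CRP predictive step: an existing type i is chosen with probability
    n_i / (alpha_t + sum(n)), a brand-new type with probability
    alpha_t / (alpha_t + sum(n)). Returns len(type_counts) for "new type"."""
    weights = list(type_counts) + [alpha_t]
    r = rng.uniform(0.0, sum(weights))
    acc = 0.0
    for i, w in enumerate(weights):
        acc += w
        if r <= acc:
            return i
    return len(type_counts)

def sample_parts_ibp(part_counts, n_customers, c_p, alpha_p):
    """Two-parameter IBP step for one new meta-object instance: each
    existing part k is activated with probability n_k / (n + c_p), and a
    Poisson number of brand-new parts with mean c_p * alpha_p / (n + c_p)
    is activated in addition."""
    active = [bool(rng.random() < n_k / (n_customers + c_p)) for n_k in part_counts]
    mean_new = c_p * alpha_p / (n_customers + c_p)
    # Poisson sample via CDF inversion (adequate for small means).
    n_new, p = 0, math.exp(-mean_new)
    u, cdf = rng.random(), p
    while u > cdf:
        n_new += 1
        p *= mean_new / n_new
        cdf += p
    return active + [True] * n_new
```

Calling these two steps repeatedly while incrementing the type and part counts reproduces the rich-get-richer behavior of the nested prior: popular cover types and popular parts are reused, while new ones appear at rates controlled by the mass and concentration parameters.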

one then samples the relative position and the object type to be generated at this location. Each activated part generates exactly one object per meta-object instance. Thus, the objects of a scene can be grouped into clusters. Each cluster corresponds to a meta-object instance, and the objects of a cluster can be associated to the parts of the corresponding meta-object type.

A. Description of the Generative Process

Following the example given in the introduction, imagine that our robot's goal is to set a table for a typical family breakfast. At the beginning, it enters a room with an empty breakfast table and an infinite number of side tables, each holding a prototypical cover. It estimates the area A of the breakfast table surface and boldly decides that n ∼ Poisson(Aλ) covers are just right. The robot chooses one of the side tables and finds a note with the address of a Chinese restaurant where it can get the cover. Arriving there, it again sees infinitely many tables, each corresponding to a particular cover type and each displaying a count of how often someone has taken a cover from this table. It is fine with just about any cover type and decides randomly based on the counts displayed at the tables, even considering a previously unvisited table. At that table there is another note redirecting to an Indian restaurant. In this restaurant, it is told that the cover needs to be assembled by choosing the parts that make up this cover. The robot randomly selects the parts based on their popularity and even considers using a few parts no one has ever used before. For each chosen part, its relative position is sampled from the part's Gaussian distribution, and then the robot samples the object from the part's multinomial distribution over types (plate, knife, fork, etc.). Having assembled the cover this way, the robot returns to the breakfast table and puts it randomly on it. The process is then repeated for the remaining n − 1 covers. See Fig. 3 for an overview of the model structure.

More formally, we have a hierarchical model with a high-level Dirichlet process DPt, a low-level beta process BPc, and a further independent beta process BPℓ. First, we draw Gt from the high-level DPt:

    Gt ∼ DPt(αt, BPp(cp, αp, Dir × NW)),    (1)

where the base distribution of DPt is a beta process BPp modeling the parts' parameters. This is done only once, and all scenes to be generated will make use of the same draw Gt. This draw describes the distribution over all possible meta-object types (cover types) and corresponds to the Chinese restaurant mentioned above. The base distribution of the beta process is the prior distribution over the part parameters. The parameters are a 2D Gaussian distribution over the relative location and a multinomial over the observable object types. The parameters are sampled independently, and we use their conjugate priors in the base distribution, i.e., a (symmetric) Dirichlet distribution Dir for the multinomial and the normal-Wishart distribution NW [9] for the 2D Gaussian. The mass parameter αp of BPp is our prior over the number of activated parts of a single meta-object instance, and the concentration parameter cp influences the total number of instantiated parts across all instances of the same type. Likewise, the parameter αt influences the expected number of meta-object types.

Each scene s has its own meta-object IBP, and the meta-object instances are determined by a single draw from the corresponding beta-Bernoulli process as follows:

    Gc(s) ∼ BPc(1, |As| αc, Gt × U(As × [−π, π]))    (2)
    {Gtj, Tj}j ∼ BeP(Gc(s))    (3)
    {μk, Σk, γk}k ∼ BeP(Gtj)    for each j    (4)
    {x, ω} ∼ p(z | μk, Σk, γk, Tj)    for each k    (5)

In Eq. (2), the concentration parameter is irrelevant and arbitrarily set to one. The mass parameter |As| αc is the expected number of meta-object instances in a scene. The base distribution of BPc samples the parameters of an instance j: its type tj and its pose Tj. The type tj is drawn from the distribution over meta-object types Gt from Eq. (1). The pose Tj is drawn from a uniform distribution U over the pose space As × [−π, π], where As is the table area. Each atom selected by the Bernoulli process in Eq. (3) corresponds to a meta-object instance, and this selection process corresponds to the side table metaphor mentioned above. The meta-object type tj basically references a draw Gtj from the nested beta process in Eq. (1), which models the parts of a meta-object type. Thus, in Eq. (4) we need another draw from a Bernoulli process to sample the activated parts for this instance, which yields the Gaussians (μk, Σk) and the multinomials (γk) for each active part k. Finally, in Eq. (5) we draw the actual observable data from the data distribution as realizations from the multinomials and the (transformed) Gaussians, which yields an object z = {x, ω} on the table with location x and type ω for each activated part.

Each scene has an additional independent beta process BPℓ

    Gℓ(s) ∼ BPℓ(1, αℓ, M × U(As))    (6)
    {xi, ωi}i ∼ BeP(Gℓ(s))    (7)

that directly samples objects (instead of meta-objects) at random locations in the scene. Here, M is a multinomial over the observable object types and U(As) is the uniform distribution over the table area As. This beta-Bernoulli process will mainly serve as a "noise model" during MCMC inference to account for yet unexplained objects in the scene. Accordingly, we set the parameter αℓ to a rather low value to penalize scenes with many unexplained objects. Figs. 3 and 4 illustrate the overall structure of the model.

B. Posterior Inference in the Model

In this section, we describe how to sample from the posterior distribution over the latent variables {C, a} given the observations z. We use the following notation. The observations are the objects of all scenes, and a single object zi = {xi, ωi} has a 2D location xi on the table and a discrete object type ωi. A meta-object instance j has parameters Cj = {Tj, tj, dj}, where Tj denotes the pose and tj denotes the meta-object type, which is an index to a table in the type CRP, and dj denotes part activations and associations to the observable objects. If dj,k = 0 then part k of meta-object instance j is inactive, where k is an index to a dish in the corresponding part IBP (which is nested in the CRP table tj). Otherwise, the part is active and dj,k ≠ 0 is a reference to the associated observable object, i.e., dj,k = i if it generated the object zi. Next, for each scene we have the associations a to the noise IBP, i.e., the list of objects currently not associated to any meta-object instance. For ease of notation, we will use index functions [·] in an intuitive way; for example, t[−j] denotes all type assignments except the type assignment tj of meta-object j, T[tj] denotes all poses of meta-objects that have the same type tj as meta-object j, and z[d] are all observations referenced by the associations d, etc.

We employ Metropolis-Hastings (MH) moves to sample from the posterior [1], which allows for big steps in the state space by updating several strongly correlated variables at a time. In the starting state of the Markov chain, all objects are assigned to the noise IBP of their respective scene and thus are interpreted as yet unexplained objects. We sample for a fixed yet sufficiently high number of iterations to be sure that the Markov chain has converged. We use several types of MH moves, which we will explain in the following after describing the joint likelihood.

Joint Likelihood: p(C, a, z) = p(T, t, d, a, z). The joint likelihood is

    p(C, a, z) = [ ∏s=1..S p(ns,ℓ) p(ns,m) ] · p(t) · [ ∏t=1..nt p(d[t] | t) ] · p(T) · [ ∏t=1..nt ∏k=1..Kt p(z[d[t,k]] | T[t], d[t,k], t) ].    (8)

Here, ns,m is the number of meta-object instances and ns,ℓ is the number of noise objects in scene s. Each dish parameter of the noise IBP directly corresponds to the parameters zi = {xi, ωi} of the associated noise object, as we assumed that there is no data distribution associated with these dishes. The base distribution for the dish parameters consists of independent and uniform priors over the table area and the object types, and so each dish parameter has the likelihood (nω |As|)^−1, where nω is the number of observable object types and |As| is the area of the table in scene s. Thus, the probability of the objects z[a[s]] associated to a scene's noise IBP only depends on the number of noise objects and not on the particular type or position of these objects. However, it would be straightforward to use a non-uniform base distribution. Denoting the Poisson distribution with mean λ as Poi(· | λ), we thus have

    p(ns,ℓ) = p(z[a[s]], a[s]) = ns,ℓ! Poi(ns,ℓ | αℓ) (nω |As|)^−ns,ℓ.    (9)

Next, p(ns,m) = ns,m! Poi(ns,m | αm |As|) is the prior probability for having ns,m meta-objects in scene s. The dish parameters of a meta-object IBP are the meta-object parameters Cj, consisting of the pose, type, and part activations. The likelihood for sampling a pose, p(Tj) = (|As| 2π)^−1, is uniform over the table surface and uniform in orientation, hence p(T) = ∏s=1..S (|As| 2π)^−ns,m. Next, p(t) is the CRP prior for the meta-object types of all meta-object instances. The factors p(d[t] | t) are the IBP priors for the part activations

for all meta-object instances of type t, and there are nt different types currently instantiated. During MCMC sampling, we will only need the conditional for a single meta-object j:

    p(tj, dj | t[−j], d[−j]) = p(dj | d[−j,tj], t) p(tj | t[−j]).    (10)

Here, p(tj | t[−j]) is the CRP predictive distribution

    p(tj = i | t[−j]) = ni / (αc + Σi′ ni′)    if i is an existing type,
                        αc / (αc + Σi′ ni′)    if i is a new type,    (11)

and ni is the number of meta-object instances of type i (not counting instance j) and αc is the concentration parameter of the CRP. Further, p(dj | d[−j,tj], t) is the predictive distribution of the two-parameter IBP, which factors into activation probabilities for each of the existing parts and an additional factor for the number of new parts, denoted as n+. An existing part is a part that has been activated by at least one other meta-object instance of this type in any of the scenes. The activation probability for an existing part k is

    p(dj,k ≠ 0 | d[−j,tj], t) = nk / (ntj + cp),    (12)

where cp is the concentration parameter of the part IBP, nk is the number of meta-object instances that have part k activated in any of the scenes, and ntj is the total number of meta-object instances of type tj in all scenes (the counts exclude the meta-object j itself). The probability for having n+ associations to new parts is

    p(d[j,+] | d[−j,tj], t) = n+! Poi(n+ | cp αp / (ntj + cp)),    (13)

where αp is the mass parameter of the part IBP, and d[j,+] denotes the associations to new parts.

As stated in Eq. (8), the data likelihood p(z[d] | T, d, t) for the objects associated to meta-objects factors into likelihoods for each individual part k. Further, it factors into a spatial component and a component for the observable object type. As the meta-object poses T are given, we can transform the absolute positions x[d[t,k]] of the objects associated to a certain part k of meta-object type t into relative positions x̃[d[t,k]] with respect to a common meta-object reference frame. The relative positions are assumed to be sampled from the part's Gaussian distribution, which in turn is sampled from a normal-Wishart distribution. As the Gaussian and the normal-Wishart distribution form a conjugate pair, we can analytically integrate out the part's Gaussian distribution, which therefore does not have to be explicitly represented. Hence, the joint likelihood for x̃[d[t,k]] of part k is computed as the marginal likelihood under a normal-Wishart prior. During MCMC inference, we only need to work with the posterior predictive distribution

    p(x̃[dj,k] | x̃[d[−j,tj,k]]) = tν(x̃[dj,k] | μ, Σ)    (14)

for a single relative position given the rest. This is a multivariate t-distribution tν with parameters μ, Σ depending both on x̃[d[−j,tj,k]] and the parameters of the normal-Wishart prior – for details see [9]. Similarly, the part's multinomial distribution over the observable object types can be integrated out, as it forms a conjugate pair with the Dirichlet distribution. The posterior predictive distribution for a single object type is

    p(ω[dj,k] | ω[d[−j,tj,k]]) = (nω + αω) / Σω′ (nω′ + αω′),    (15)

where nω is the number of times an object of type ω has been associated to part k of this meta-object type tj, and αω is the pseudo-count of the Dirichlet prior. When describing the MCMC moves, we will sometimes make use of the predictive likelihood for all objects z[dj] associated to a single meta-object instance j. This likelihood factors into the posterior predictive distributions of the individual parts and their spatial and object type components, as described above. In the following, we will describe the various MCMC moves in detail.

Death (birth) move: (Tj, tj, dj, a) → (a′). A death move selects a meta-object j uniformly at random, adds all of its currently associated objects to the noise process, and removes the meta-object j from the model. The proposal probability for this move is qd(C−j, a′ | Cj, C−j, a) = (nm)^−1, where nm denotes the number of instantiated meta-objects in this scene before the death move. To simplify notation, we will just write qd(Cj) for the probability of deleting meta-object j. The reverse proposal is the birth proposal qb(Cj, C−j, a | C−j, a′, z), which proposes new parameters Cj = {Tj, tj, dj} for an additional meta-object: the pose Tj, the type tj, and the associations dj. The new meta-object may reference any of the objects previously associated to the noise process, and any non-referenced noise objects remain associated to the noise process. We will describe the details of the birth proposal later on. To simplify notation, we will just write qb(Cj) for the birth proposal. Plugging the model and proposal distributions into the MH ratio and simplifying, we arrive at the acceptance ratio of the death move

    Rd = 1 / [ p(z[dj] | z[d[−j,tj]], T[tj], d[tj], t) · p(tj, dj | t[−j], d[−j]) · p(Tj) ] · [ p(nm − 1) / p(nm) ] · [ p(nℓ + nj) / p(nℓ) ] · [ qb(Cj) / qd(Cj) ].    (16)

The counts nj, nm, and nℓ refer to the state before the death move: nm denotes the number of meta-objects in this scene, nj is the number of objects currently associated to meta-object j, and nℓ is the number of noise objects. The ratio of a birth move is derived similarly.

Switch move: (Tj, tj, dj, a) → (Tj′, tj′, dj′, a′). This move is a combined death and birth move. It removes a meta-object and then proposes a new meta-object using the birth proposal. Thus, the number of meta-objects remains the same, but one meta-object simultaneously changes its type tj, pose Tj, and part associations dj. The death proposals cancel out, and the acceptance ratio of this move is

    Rs = [ p(z[dj′] | z[d[−j,tj′]], Tj′, T[−j,tj′], dj′, d[−j,tj′], tj′, t[−j]) · p(tj′, dj′ | t[−j], d[−j]) · p(Tj′) · p(nℓ′) · qb(Cj) ] / [ p(z[dj] | z[d[−j,tj]], Tj, T[−j,tj], dj, d[−j,tj], tj, t[−j]) · p(tj, dj | t[−j], d[−j]) · p(Tj) · p(nℓ) · qb(Cj′) ].    (17)
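All of these moves share the same accept/reject skeleton: compute the ratio of posterior terms times the ratio of reverse and forward proposal probabilities, and accept with probability min(1, R). The sketch below is a generic, illustrative Metropolis-Hastings step in Python (ours, not the paper's implementation); the paper's moves plug in closed-form ratios such as Rd and Rs instead of evaluating a generic target.

```python
import math
import random

def mh_step(state, propose, log_target, rng):
    """One generic Metropolis-Hastings step: propose a new state, accept
    with probability min(1, R), where log R = [log p(new) - log p(old)]
    + [log q(old | new) - log q(new | old)]; otherwise keep the old state."""
    new_state, log_q_fwd, log_q_rev = propose(state, rng)
    log_R = (log_target(new_state) - log_target(state)) + (log_q_rev - log_q_fwd)
    if math.log(rng.random()) < log_R:
        return new_state
    return state

# Toy usage: sample from a standard normal with a symmetric random-walk
# proposal, for which the forward and reverse proposal terms cancel.
def random_walk(x, rng):
    return x + rng.gauss(0.0, 0.5), 0.0, 0.0

def log_normal(x):
    return -0.5 * x * x
```

The same template accommodates the asymmetric birth/death pair above: there, the forward and reverse proposal probabilities differ, which is exactly where the qb/qd factor in the acceptance ratio comes from.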

Shift move: (Tj) → (Tj′). This move disturbs the pose Tj of a meta-object by adding Gaussian noise to it, while the type and part associations remain unchanged. The acceptance ratio depends only on the spatial posterior predictive distributions of the objects associated to this meta-object. The proposal likelihoods cancel due to symmetry, and the final ratio is

    RT = [ p(x[dj] | x[d[−j,tj]], Tj′, T[−j,tj], d[tj], t) · p(Tj′) ] / [ p(x[dj] | x[d[−j,tj]], Tj, T[−j,tj], d[tj], t) · p(Tj) ].    (18)

Association move (existing part): (dj, a) → (dj′, a′). This move samples the part activation and object association of an existing part k of a single meta-object j. In the IBP metaphor, this corresponds to sampling the selection of a single existing dish (part) for a single customer (meta-object instance). If the existing part is already associated with an object, we consider this object to be temporarily re-associated to the noise process (such that there are now nℓ noise objects). We then use Gibbs sampling to obtain one of nℓ + 1 possible associations: either the part k is inactive (dj,k = 0) and not associated with any object, or it is active (dj,k ≠ 0) and associated with one out of the nℓ currently available noise objects (e.g. zi when dj,k = i). The probabilities for these cases are proportional to

    p(dj,k = i) ∝ p(dj,k = 0 | d[−j,tj,k], t) · p(nℓ)    if i = 0,
    p(dj,k = i) ∝ p(zi | z[d[−j,tj,k]], T[tj], dj,k, d[−j,tj,k], t) · p(dj,k ≠ 0 | d[−j,tj,k], t) · p(nℓ − 1)    if i ≠ 0.    (19)

Association move (new parts): (dj, a) → (dj′, a′). This move samples the associations of objects to new parts. For this, we use two complementary MH moves: one move increases the number of new parts by one by assigning a noise object to a new part, while the other move decreases the number of new parts by one by assigning an associated object (of a new part) to the noise process. The acceptance ratio for removing object zi from a new part k of meta-object j that currently has n+ new parts is

    R− = [ 1 / p(zi | Tj, dj,k) ] · [ p(d[j,+]′ | d[−j,tj], t) / p(d[j,+] | d[−j,tj], t) ] · [ p(nℓ + 1) / p(nℓ) ] · [ q+ / q− ].    (20)

The proposal q− = (n+)^−1 chooses one of the new parts to be removed uniformly at random, while the reverse proposal q+ = (nℓ + 1)^−1 chooses uniformly at random one of the nℓ + 1 noise objects to be associated to a new part. The MH move that increases the number of new parts is derived similarly.

Birth proposal: qb(Tj, tj, dj). The birth, death, and switch moves rely on a birth proposal that samples new meta-object parameters Cj = {Tj, tj, dj}. The general idea is to sample the pose Tj and type tj in a first step. We then proceed to sample the associations dj given Tj and tj, i.e., the potential assignment of noise objects to the parts of this meta-object. Sampling Tj and tj is done in either of two modes: the object mode or the matching mode. In object mode, we choose an object uniformly at random, center the meta-object pose Tj at the object's location (with random orientation), and add Gaussian noise to it. The type tj is sampled from the current predictive distribution of the type CRP. In contrast, the matching mode was inspired by bottom-up top-down approaches and aims to propose Tj and tj in a more efficient way by considering the currently instantiated meta-object types in the model. However, in contrast to the object mode, it cannot propose new meta-object types (new tables in the type CRP). It selects two objects and associates them to a suitable part pair. This suffices to define the pose Tj as the corresponding transformation of these parts into the scene. In detail, we first sample one of the objects zi at random and choose its nearest neighbor zj. We match this ordered pair ⟨zi, zj⟩ against all ordered part pairs ⟨ki′, kj′⟩ of the meta-object types to obtain their matching probabilities pm with respect to the parts' posterior predictive means, μi′ and μj′, of the spatial distribution, and their posterior predictive distributions, Mi′ and Mj′, over the observable object types ωi and ωj:

    pm(⟨ki′, kj′⟩ | zi, zj) ∝ N(dΔ | 0, σm²) · Mi′(ωi) · Mj′(ωj).    (21)

Here, dΔ = ‖xi − xj‖ − ‖μi′ − μj′‖ is the residual of the objects' relative distance w.r.t. the distance of the posterior means, and σm² is a fixed constant. We then sample a part pair ⟨ki′, kj′⟩ according to its matching probability to define, together with the objects zi and zj, the pose Tj. Finally, we add Gaussian noise to Tj. The sampled part pair implicitly defines the meta-object type tj.

After having sampled Tj and tj using either of these two modes, the next step is to sample the associations dj. For this, we randomly choose, without repetition, a noise object and use Gibbs sampling to obtain its association to either: (a) a yet unassociated part of the meta-object instance; (b) a new part of the meta-object instance; or (c) remaining a noise object. The probabilities for (a) and (b) are proportional to the spatial and object type posterior predictive distributions of the respective parts, while (c) is based on the noise IBP's base distribution.

Besides sampling from the proposal distribution, we also need to be able to evaluate the likelihood qb(Cj) of sampling a given parameter set Cj. For this, we need to marginalize over the latent variables of the proposal, e.g. the binary mode variable (object mode or matching mode), the chosen object zi, and the chosen part pair of the matching mode.

IV. EXPERIMENTS

We tested our model on both synthetic data and real-world data acquired with a Kinect camera.

A. Synthetic Data

In the synthetic data experiment, table scenes were generated automatically using a different, hand-crafted generative model. We generated 25 training scenes ranging from two to six covers. The cover types represent two different breakfast types. The first type consists of a cereal bowl and a spoon, while the second type consists of a plate with a glass, a fork, and a knife. A fork can be randomly placed at the right or the left of a plate or be absent.
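Such a hand-crafted generator can be sketched as follows. This is our own illustrative code; the coordinate offsets, jitter magnitudes, and the crude cover-placement rule are assumptions and not the values used for the actual data set.

```python
import random

# Hypothetical object offsets (in meters) relative to a cover's reference
# point; the paper does not specify these values.
COVER_TYPES = {
    "cereal": [("bowl", (0.0, 0.0)), ("spoon", (0.12, 0.0))],
    "dinner": [("plate", (0.0, 0.0)), ("glass", (0.10, 0.15)), ("knife", (0.12, 0.0))],
}

def sample_scene(rng):
    """Generate one synthetic table scene: two to six covers, each drawn
    from a cover type with jittered object positions. Dinner covers get a
    fork on the left or right of the plate, or no fork at all."""
    scene = []
    for c in range(rng.randint(2, 6)):
        anchor = (rng.uniform(0.0, 1.0), 0.35 * c)  # crude cover placement
        ctype = rng.choice(list(COVER_TYPES))
        parts = list(COVER_TYPES[ctype])
        if ctype == "dinner":
            side = rng.choice(["left", "right", "absent"])
            if side != "absent":
                parts.append(("fork", (-0.12 if side == "left" else 0.18, 0.0)))
        for name, (dx, dy) in parts:
            x = anchor[0] + dx + rng.gauss(0.0, 0.01)
            y = anchor[1] + dy + rng.gauss(0.0, 0.01)
            scene.append((name, x, y))
    return scene
```

Note that this generator is structurally different from the model of Sec. III (fixed cover types, no IBP), which is exactly the point of the experiment: the learned model must recover the latent structure from data it did not itself generate.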

Fig. 5. The inferred meta-objects with their part relationships. Each picture shows a different scene from a different MCMC run (best viewed in color). (a) Examples of parsed scenes of the synthetic data set. (b) Examples of parsed scenes of the real-world data set.

Fig. 6. Examples of spatial posterior predictive distributions of several parts of a cover type, together with their activation probabilities and most likely observable object types. The cover type on the left was learned on synthetic data, while the one on the right was learned on the real-world data set.

Fig. 7. Segmentation (left) and classification (right) results on the point cloud segments visualized in the Kinect image.

Fig. 8. Inferred meta-objects in an incomplete scene of (a) the synthetic data set and (b) the real-world data set (best viewed in color). The color (gray-scale value) of the spheres indicates the meta-object type and the line colors (gray-scale values) indicate the part of the meta-object type. In both scenes one plate has been removed. The model is still able to correctly segment the incomplete scenes and infer the missing objects. To enhance readability, the pictures have been modified with human-readable labels.

The latent parameters are inferred using all the generated training scenes and setting the hyperparameters to αℓ = 0.5, cp = 0.25, αp = 2.5, αt = 5, and αc = 1. As a first test, we wanted to show that the model is able to segment the scenes in a consistent and meaningful way. The results of this test are shown in Fig. 5a. A set of different scenes is segmented using the learned model. We see that each scene has been segmented with the same cover types and that the objects are correctly clustered. Note that the colors (gray-scale values) of the meta-objects and of the parts can change in every run, since the ordering of types and meta-objects is not relevant in our model. What is important is that the topological and metrical configurations are respected.

A second test is to see if the model can infer the meta-objects in incomplete scenes and if it is able to complete them. To this end, we artificially eliminated objects from already generated scenes and segmented the altered scenes again using the learned model. We discovered that the approach was able to infer the correct cover type even in the absence of the missing object. As can be seen in Fig. 8a, the approach correctly segmented and recognized the meta-objects and is aware of the missing plate (the red branch of the hierarchy that is not grounded on an existing object). The missing object can then be inferred by sampling from the part's posterior predictive distribution over the location and type of the missing object, like the one shown in Fig. 6 (left).

B. Real-world Data

We used a Microsoft Kinect depth camera to identify the objects on the table by, first, segmenting the objects by subtracting the table plane in combination with a color-based segmentation. Second, we detect the objects based on the segmented point cloud using a straightforward feature-based

object detection system with a cascade of one-vs-all classifiers. An example of the segmentation and object identification is shown in Fig. 7. Note that there are likely to be better detection systems; however, the task of detecting the objects on the table is orthogonal to the scientific contribution of this paper.

The same set of tests performed on the synthetic data has also been performed in this case, showing basically the same results. In particular, Fig. 5b shows the segmentation results, Fig. 8b shows a modified scene where a plate has been removed, and Fig. 6 (right) shows the posterior predictive distribution over location and type that, as in the synthetic case, can be used to complete an incomplete scene.

To better illustrate the inference process, we plot the log-likelihood, the number of parts, and the number of meta-object types in Fig. 9 as they evolve during MCMC sampling.

Fig. 9. The plots depict the log-likelihood, the total number of meta-object parts (summed over all meta-object types), and the number of meta-object types as they evolve during MCMC sampling. The red (dark gray) lines correspond to the synthetic data set and the blue (light gray) lines to the real-world data set.

V. CONCLUSION

This paper presents a novel and fully probabilistic generative model for unsupervised scene analysis. Our approach is able to model the spatial arrangement of objects. It maintains a nonparametric prior over scenes, which is updated by observing scenes and can directly be applied for parsing new scenes and for model completion by inferring missing objects in an incomplete scene. Our model applies a combination

Cognition (R6-[SpaceGuide]) and by the EC under contract number FP7-ICT-248258-First-MM.

REFERENCES

[1] C. Andrieu, N. de Freitas, A. Doucet, and M. I. Jordan. An Introduction to MCMC for Machine Learning. Machine Learning, 50(1–2):5–43, 2003.
[2] J. L. Austerweil and T. L. Griffiths. Learning invariant features using the Transformed Indian Buffet Process. In Advances in Neural Information Processing Systems (NIPS), 2010.
[3] R. Fergus, P. Perona, and A. Zisserman. Object class recognition by unsupervised scale-invariant learning. In Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2003.
[4] R. Fergus, P. Perona, and A. Zisserman. A Sparse Object Category Model for Efficient Learning and Exhaustive Recognition. In Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2005.
[5] S. Fidler and A. Leonardis. Towards Scalable Representations of Object Categories: Learning a Hierarchy of Parts. In Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2007.
[6] S. J. Gershman and D. M. Blei. A tutorial on Bayesian nonparametric models. Journal of Mathematical Psychology, 56(1):1–12, 2012.
[7] T. L. Griffiths and Z. Ghahramani. Infinite latent feature models and the Indian buffet process. In Advances in Neural Information Processing Systems (NIPS), 2006.
[8] Y. Jiang, M. Lim, C. Zheng, and A. Saxena. Learning to Place New Objects in a Scene. Int. Journal of Robotics Research (IJRR), 2012. To appear.
[9] K. P. Murphy. Conjugate Bayesian analysis of the Gaussian distribution. Self-published notes, 2007.
[10] D. Parikh, C. L. Zitnick, and T. Chen. Unsupervised Learning of Hierarchical Spatial Structures in Images. In Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2009.
[11] A. Ranganathan and F. Dellaert. Semantic Modeling of Places using Objects. In Proc. of Robotics: Science and Systems (RSS), Atlanta, GA, USA, 2007.
[12] A. Rodríguez, D. B. Dunson, and A. E. Gelfand. The Nested Dirichlet Process. Journal of the American Statistical Association, 103(483):1131–1154, 2008.
[13] P. Schnitzspan, S. Roth, and B. Schiele. Automatic Discovery of Meaningful Object Parts with Latent CRFs. In Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2010.
[14] L. Spinello, R. Triebel, D. Vasquez, K. O. Arras, and R. Siegwart. Exploiting Repetitive Object Patterns for Model Compression and Completion. In Proc. of the European Conf. on Computer Vision (ECCV), 2010.
[15] E. B. Sudderth, A. Torralba, W. T. Freeman, and A. S. Willsky. Describing Visual Scenes Using Transformed Objects and Parts. Int. Journal of Computer Vision, 77(1–3):291–330, 2008.
[16] Y. W. Teh and M. I. Jordan. Hierarchical Bayesian Nonpara-
of a Dirichlet process and beta processes, allowing for a metric Models with Applications. In Bayesian Nonparametrics:
Principles and Practice. Cambridge University Press, 2010.
probabilistic treatment of the model complexity. In this way,
[17] Y. W. Teh, M. I. Jordan, M. J. Beal, and D. M. Blei. Hierar-
we avoid a model selection step, which is typically intractable chical Dirichlet Processes. Journal of the American Statistical
for the models considered here. To evaluate our approach, Association, 101(476):1566–1581, 2006.
we successfully used our approach to infer missing objects [18] R. Thibaux and M. I. Jordan. Hierarchical Beta Processes and
in complex scenes. the Indian Buffet Process. In Proc. of the Int. Conf. on Artificial
Intelligence and Statistics (AISTATS), 2007.
ACKNOWLEDGMENTS [19] R. Triebel, J. Shin, and R. Siegwart. Segmentation and Unsu-
pervised Part-based Discovery of Repetitive Objects. In Proc. of
This work has been supported by the German Research Robotics: Science and Systems (RSS), 2010.
Foundation (DFG) under contract number SFB/TR 8 Spatial

168
Distributed Approximation of Joint Measurement
Distributions Using Mixtures of Gaussians
Brian J. Julian∗† , Stephen L. Smith‡ , and Daniela Rus∗

Abstract—This paper presents algorithms to distributively approximate the continuous probability distribution that describes the fusion of sensor measurements from many networked robots. Each robot forms a weighted mixture of scaled Gaussians to represent the continuous measurement distribution (i.e., likelihood) of its local observation. From this mixture set, each robot then draws samples of Gaussian elements to enable the use of a consensus-based algorithm that evolves the corresponding canonical parameters. We show that these evolved parameters form a distribution that converges weakly to the joint of all the robots' unweighted mixture distributions, which itself converges weakly to the joint measurement distribution as more system resources are allocated. The innovation of this work is the combination of sample-based sensor fusion with the notion of pre-convergence termination without the risk of 'double-counting' any single observation. We also derive bounds and convergence rates for the approximated joint measurement distribution, specifically the elements of its information vectors and the eigenvalues of its information matrices. Most importantly, these performance guarantees do not come at a significant cost of complexity, since the computational and communication complexity of the canonical parameters scales quadratically with respect to the Gaussian dimension, linearly with respect to the number of samples, and constant with respect to the number of robots. Results from numerical simulations for object localization are discussed using both Gaussians and mixtures of Gaussians.

I. INTRODUCTION

We wish to develop scalable approaches to state estimation tasks such as tracking, surveillance, and exploration using large teams of autonomous robots equipped with sensors. Consider the task of using many aerial robots to monitor the flow of objects into and out of a major seaport (e.g., ships, containers, ground vehicles). To collectively estimate the objects' positions, one approach is to wirelessly communicate all sensor measurements to a data fusion center, perform the estimation calculations in a centralized manner, and then globally broadcast the results to enable the robots to better position their sensors. For large systems, the central processor quickly becomes a computational and communication bottleneck, and thus is not considered to be scalable [4].

Fig. 1: This figure shows the distributed approximation of the joint measurement distribution using mixtures of Gaussians. Top Right: Ten robots (dark circles) are located in three-dimensional space with their î and ĵ positions shown and their k̂ positions equal to their corresponding indices. The robots use range-only sensors and communicate on an undirected graph to locate an object of interest (center star). Top Left: Each robot forms an unweighted mixture of 1000 Gaussians to represent its independent measurement distribution of the object's position given its local observation. Here we show the 1st robot's normalized mixture distribution on two-dimensional slices that intersect at the object's true location of (5, 5, 5) meters. Bottom: The 1st robot's normalized approximation of the joint measurement distribution becomes more informative as more consensus iterations are performed. Note that the shading scales vary among the plots to highlight the distribution's structure.

We propose to solve this problem in a purely decentralized manner. Each robot (i) independently makes a local observation using its imperfect sensors, (ii) represents the corresponding measurement likelihoods with a weighted mixture of scaled Gaussians, (iii) draws samples from this mixture set to form an unweighted mixture set, and lastly (iv) runs a consensus-based algorithm to approximate the distribution describing the joint of all robots' observations (Figure 1). This approximation can then be used in a sequential Bayesian filter to update the robots' belief of the continuous state of a finite

∗Brian J. Julian and Daniela Rus are with the Computer Science and Artificial Intelligence Laboratory, MIT, Cambridge, MA 02139, USA, [email protected] and [email protected]
†Brian J. Julian is also with MIT Lincoln Laboratory, 244 Wood Street, Lexington, MA 02420, USA
‡Stephen L. Smith is with the Department of Electrical and Computer Engineering, University of Waterloo, Waterloo, ON N2L 3G1, Canada, [email protected]
The authors would like to thank Dan Feldman for his contributions to the proof for Theorem 2.
This work is sponsored by the Department of the Air Force under Air Force contract number FA8721-05-C-0002. The opinions, interpretations, recommendations, and conclusions are those of the authors and are not necessarily endorsed by the United States Government. This work is also supported in part by ONR grant number N00014-09-1-1051, NSF grant number EFRI-0735953, MIT Lincoln Laboratory, and the Boeing Company.
extent of the world. Building on our prior work [8] that only considered a discrete set of probabilities, each robot uses the consensus-based algorithm to evolve its representation of the independent measurement distribution into an approximation of the joint measurement distribution. This allows for resource-adaptive state estimation, for which the computation and communication complexities do not depend on the number of robots.

We prove for all robots on a static, connected, undirected communication graph that the approximation of the joint measurement distribution converges weakly¹ to the joint of all the robots' unweighted mixture distributions. The given restrictions on the graph are used to derive bounds and convergence rates for the approximated joint measurement distribution, specifically elements of its information vectors and eigenvalues of its information matrices. Yet, the implementation works on arbitrary networks without risk of catastrophic failures (e.g., it is robust against robot failures), and without restriction on the number of communication rounds that the robots need to use for the consensus-based algorithm. An extremely attractive aspect of this work is that expected performance provably improves as more system resources are allocated. We believe these theoretical contributions can drive the development of application-specific sensor fusion approaches that are unbiased, convergent, and scalable.

We have been inspired by over two decades' worth of advancements in distributed estimation algorithms. An approach to compute locally optimal estimators from many independent sensor measurements at a central fusion center was described in detail by Gubner [6]. Concerning decentralization, the early work of Durrant-Whyte et al. [4] with decentralized Kalman filters laid the basis for the Decentralized Data Fusion architecture [10]. Extensions incorporating consensus averaging algorithms [3, 16] have been used for maximum-likelihood parameter estimation [15], maximum a-posteriori estimation [11], and distributed Kalman filtering [1, 17].

One of the most relevant works in distributed Kalman filtering is by Ren et al. [13], who showed the convergence of a filter incorporating information-based states. The proof of convergence for each Gaussian element in the joint distribution approximation closely follows in our work, even though our algorithms apply to a larger class of Bayesian filters (e.g., map merging [2]). This generality is shared by the approach of Fraser et al. [5] using hyperparameters. However, our work enables the early termination of the consensus-based algorithm without the risk of 'double-counting' any single observation, even when the maximum in/out degree and the number of robots are unknown.

This paper is organized as follows. In Section II we formalize the problem of distributively approximating the joint measurement distribution within a multi-robot system, then discuss the use of a consensus-based algorithm to calculate products of Gaussian distributions in Section III. Also in Section III are our main results on the convergence of distributively formed Gaussian mixtures to representations of the joint measurement distribution. In Section IV we derive bounds and convergence rates for the elements of the joint mixture set, followed by numerical simulations in Section V to illustrate these performance guarantees and motivate the work.

II. PROBLEM FORMULATION

A. General Setup

Consider a system of robots, where each robot has a belief of the continuous-valued state concerning the same finite extent of the world. We model the world state² as a random variable, $X$, that takes values from a continuous alphabet, $\mathcal{X}$. Each robot cannot perfectly measure the world state, but instead makes an observation with its sensors that are influenced by noise. The robots' synchronous observations together form a joint observation, which we also model as a random variable, $Y$. Sensing may be interpreted as using a noisy channel, and thus the relationship between the true world state and the noisy observation is described by the joint measurement distribution³ (JMD), $P(Y = y \mid X)$, where $y$ is the value the joint observation takes.

Our goal is to enable each robot to independently perform the posterior calculations,

$$P^{[i]}(X \mid Y = y) = \frac{P^{[i]}(X)\, P(Y = y \mid X)}{\int_{x \in \mathcal{X}} P^{[i]}(X = x)\, P(Y = y \mid X = x)\, dx} \qquad (1)$$

needed for sequential Bayesian filter predictions and mutual information gradient-based control [8], where $P^{[i]}(X)$ is the prior distribution that describes the $i$th robot's belief of the world state. Since the sensors of any two robots are physically detached from one another, we assume that the errors on the observations are uncorrelated between robots. In other words, a random variable that describes the $i$th robot's observation, $Y^{[i]}$, is conditionally independent of any other random variable that describes another robot's observation, $Y^{[v]}$ with $v \neq i$, given the world state. This assumption for a system of $n_r$ robots gives a JMD of $P(Y = y \mid X) = \prod_{i=1}^{n_r} P(Y^{[i]} = y^{[i]} \mid X)$ when we model the joint observation as an $n_r$-tuple random variable, $Y = (Y^{[1]}, \ldots, Y^{[n_r]})$, where $y^{[i]}$ is the value that the $i$th robot's observation takes and $P(Y^{[i]} = y^{[i]} \mid X)$ is the $i$th robot's independent measurement distribution (IMD).

Thus, the posterior calculations from (1) become

$$P^{[i]}(X \mid Y = y) = \frac{P^{[i]}(X) \prod_{v=1}^{n_r} P(Y^{[v]} = y^{[v]} \mid X)}{\int_{x \in \mathcal{X}} P^{[i]}(X = x) \prod_{v=1}^{n_r} P(Y^{[v]} = y^{[v]} \mid X = x)\, dx} \qquad (2)$$

and our goal of approximating the JMD over all possible continuous-valued world states becomes equivalent to approximating the product of all the IMDs. It is this equivalence that enables our distributed algorithm to reasonably approximate arbitrary continuous distributions using independently formed mixtures of multivariate Gaussian distributions. By 'reasonably' we mean that the approximation has the performance characteristics discussed in Section IV, such as convergence to the true distribution as certain system resources increase (e.g., computational capacity, network bandwidth, etc.).

¹Also known as convergence in distribution.
²We use the term world state as shorthand for state of the finite extent of the world.
³Also known as measurement likelihood.
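Since the alphabet is continuous, (2) is in practice often evaluated numerically on a discretized grid. The following sketch (a hypothetical one-dimensional example with three Gaussian-shaped IMDs; not taken from the paper's experiments) computes the posterior as the prior times the product of the IMDs, renormalized as in the denominator of (2):

```python
import numpy as np

# Discretize the alphabet X into grid points and evaluate eq. (2):
# posterior ∝ prior(x) * ∏_v P(Y^[v] = y^[v] | X = x), normalized over X.
x = np.linspace(-5.0, 5.0, 1001)
dx = x[1] - x[0]

prior = np.exp(-0.5 * x**2)              # robot i's prior belief (unnormalized)
prior /= prior.sum() * dx

# Hypothetical IMDs of three robots, each peaking near the true state x* = 1.
imds = [np.exp(-0.5 * ((x - mu) / s)**2)
        for mu, s in [(0.8, 1.0), (1.3, 0.7), (0.9, 1.2)]]

joint = np.prod(imds, axis=0)            # JMD = product of IMDs
posterior = prior * joint
posterior /= posterior.sum() * dx        # normalize over the alphabet
```

Note how the product of the three IMDs concentrates the posterior more sharply than any single observation would, which is the effect the distributed approximation aims to reproduce without a fusion center.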
B. Decentralized System

Let the $n_r$ robots communicate according to an undirected communication graph, $\mathcal{G}$, with a corresponding unordered edge set, $\mathcal{E}$; that is, $\{i, v\} \in \mathcal{E}$ if the $i$th and $v$th robots are neighbors. We use $\mathcal{N}^{[i]}$ to denote the set of neighbors of the $i$th robot, which has an in/out degree of $|\mathcal{N}^{[i]}|$. In addition, we consider the corresponding Metropolis-Hastings weight matrix [16], $W$, which has the form

$$[W]_{iv} = \begin{cases} 1 - \sum_{v' \in \mathcal{N}^{[i]}} \frac{1}{\max\{|\mathcal{N}^{[i]}|, |\mathcal{N}^{[v']}|\} + 1}, & i = v, \\ \frac{1}{\max\{|\mathcal{N}^{[i]}|, |\mathcal{N}^{[v]}|\} + 1}, & \{i, v\} \in \mathcal{E}, \\ 0, & \text{otherwise}, \end{cases}$$

where $[\cdot]_{iv}$ denotes the matrix entry $(i, v)$. For vectors, $[\cdot]_m$ denotes the $m$th row entry.

It was previously shown [9] that by initializing the consensus states $\psi_k^{[i]}$ and $\pi_k^{[i]}$ to a basis vector and a probability vector, respectively, each robot can run the discrete-time average consensus algorithm

$$\psi_{k+1}^{[i]} = [W]_{ii}\, \psi_k^{[i]} + \sum_{v \in \mathcal{N}^{[i]}} [W]_{iv}\, \psi_k^{[v]} \qquad (3)$$

in parallel with its exponential form

$$\pi_{k+1}^{[i]} = \big(\pi_k^{[i]}\big)^{[W]_{ii}} \prod_{v \in \mathcal{N}^{[i]}} \big(\pi_k^{[v]}\big)^{[W]_{iv}} \qquad (4)$$

to distributively approximate a finite set of joint measurement probabilities, where $k \in \mathbb{Z}_{\geq 0}$ denotes the communication round and the above algorithms are understood to be element-wise. In this paper, we extend this work to multivariate Gaussian distributions, then show that this extension supports the approximations of arbitrary JMDs using mixtures of Gaussians⁴. Note that many other algorithms of this form yielding asymptotic average consensus are also appropriate (e.g., [11]).

III. DISTRIBUTED APPROXIMATIONS

A. Product of Scaled Gaussian IMDs

Consider all robots $i \in \{1, \ldots, n_r\}$ having IMDs that are accurately represented by non-degenerate⁵ $n_g$-dimensional scaled Gaussians, $p^{[i]} \mathcal{N}_c(\xi_0^{[i]}, \Omega_0^{[i]})$⁶, where for each robot $\xi_0^{[i]} \in \mathbb{R}^{n_g}$ is its information vector, $\Omega_0^{[i]} \in \mathbb{R}^{n_g \times n_g}$ is its information matrix, and $p^{[i]} = P(Y^{[i]} = y^{[i]})$ is its scaling factor. For such IMDs, the JMD from (2) is equal to $p\, \mathcal{N}_c(\xi, \Omega)$, where $\xi = \sum_{i=1}^{n_r} \xi_0^{[i]}$ is the joint information vector, $\Omega = \sum_{i=1}^{n_r} \Omega_0^{[i]}$ is the joint information matrix, and $p = \frac{1}{\eta(\xi, \Omega)} \prod_{i=1}^{n_r} p^{[i]} \eta(\xi_0^{[i]}, \Omega_0^{[i]})$ is the joint scaling factor.

For a given world state $x \in \mathcal{X}$, let $\pi_k^{[i]} \in \mathbb{R}_{\geq 0}$ be initialized to the corresponding robot's IMD evaluated at $x$. In addition, let $\psi_k^{[i]} \in [0, 1]^{n_r}$ be initialized to the standard basis $e_i$ pointing in the $i$th direction of $\mathbb{R}^{n_r}$. On a connected graph $\mathcal{G}$, we can use (4) and (3) at each communication round to have $(\pi_k^{[i]})^{\beta_k^{[i]}}$ converge to the JMD evaluated at $x$ in the limit as $k \to \infty$, where $\beta_k^{[i]} = \|\psi_k^{[i]}\|_\infty^{-1}$ is a scalar exponential factor that converges to $n_r$ [9]. The expansion of $(\pi_k^{[i]})^{\beta_k^{[i]}}$ leads to the following.

Theorem 1 (Consensus of a Scaled Gaussian JMD). For all robots let $\psi_k^{[i]} \in [0, 1]^{n_r}$, $\xi_k^{[i]} \in \mathbb{R}^{n_g}$, and $\Omega_k^{[i]} \in \mathbb{R}^{n_g \times n_g}$ be initialized to $e_i$, $\xi_0^{[i]}$, and $\Omega_0^{[i]}$, respectively, and have all evolve according to (3) on a connected graph $\mathcal{G}$. In addition, let $\alpha_k^{[i]} \in \mathbb{R}_{>0}$ be initialized to $p^{[i]} \eta(\xi_0^{[i]}, \Omega_0^{[i]})$ and evolve according to (4). Then we have that

$$\frac{\big(\alpha_k^{[i]}\big)^{\beta_k^{[i]}}}{\eta\big(\beta_k^{[i]} \xi_k^{[i]}, \beta_k^{[i]} \Omega_k^{[i]}\big)}\, \mathcal{N}_c\big(\beta_k^{[i]} \xi_k^{[i]}, \beta_k^{[i]} \Omega_k^{[i]}\big) \to p\, \mathcal{N}_c(\xi, \Omega), \quad \forall x \in \mathcal{X},$$

as $k$ tends to infinity. In other words, the approximation distribution converges weakly to the JMD.

Proof (Theorem 1). We first note for all robots and communication rounds that $\pi_k^{[i]}$ is a product of values taken from unnormalized Gaussians. Hence $(\pi_{k+1}^{[i]})^{\beta_{k+1}^{[i]}}$ is itself a value that is taken from an unnormalized Gaussian proportional to

$$\exp\Big(\beta_{k+1}^{[i]} \sum_{v \in \{i\} \cup \mathcal{N}^{[i]}} [W]_{iv} \big(x^T \xi_k^{[v]} - \tfrac{1}{2} x^T \Omega_k^{[v]} x\big)\Big),$$

which gives us the desired consensus update expressions for $\xi_{k+1}^{[i]}$ and $\Omega_{k+1}^{[i]}$. Lastly, from [16, 9] we have for every $x \in \mathcal{X}$ that $\pi_k^{[i]}$, $\beta_k^{[i]}$, and $\alpha_k^{[i]}$ converge to $\mathcal{N}_c(\xi, \Omega)^{\frac{1}{n_r}}$, $n_r$, and $\big(p\, \eta(\xi, \Omega)\big)^{\frac{1}{n_r}}$, respectively. ∎

Remark 1 (Complexity of Consensus States). Even though $\pi_0^{[i]}$ is dependent on a given world state, we have that $\alpha_0^{[i]}$, $\xi_0^{[i]}$, and $\Omega_0^{[i]}$ are not. Thus, Theorem 1 implies that we can run our consensus-based algorithm on parameters of sizes $O(1)$, $O(n_g)$, and $O(n_g^2)$, respectively, to reasonably approximate a JMD of Gaussian form over all world states.

B. Mixtures of Gaussians

As previously stated, our goal is to enable each robot to independently perform the posterior calculations (2) by distributively approximating the product of all the IMDs. With Theorem 1, we have presented sufficient tools to enable these approximations if each IMD can be accurately represented by a single scaled Gaussian. Performance guarantees for this particular case will be discussed in Section IV. For arbitrary continuous distributions, we now complete the approach using mixtures of Gaussians.

Let each robot form a weighted mixture set $\mathcal{M}^{[i]} := \big\{(w^{[i,j]}, \xi_0^{[i,j]}, \Omega_0^{[i,j]}) : j \in \mathcal{I}(i)\big\}$ composed of triples that have scalar weights⁷ $w^{[i,j]} \in (0, 1]$, information vectors $\xi_0^{[i,j]} \in \mathbb{R}^{n_g}$, and information matrices $\Omega_0^{[i,j]} \in \mathbb{R}^{n_g \times n_g}$. For simplicity we assume that the weighted summation of the corresponding $n_g$-dimensional scaled Gaussians accurately represents the IMD, but also note that an approximation of this

⁴We use the term Gaussians as shorthand for multivariate Gaussian distributions.
⁵By non-degenerate we mean that the information matrix of a Gaussian is a real positive-definite symmetric matrix.
⁶We define $\mathcal{N}_c(\xi, \Omega) := \eta(\xi, \Omega) \exp\big(x^T \xi - \frac{1}{2} x^T \Omega x\big)$ with $\eta(\xi, \Omega) := \det(\Omega)^{\frac{1}{2}} / \big((2\pi)^{n_g} \exp(\xi^T \Omega^{-1} \xi)\big)^{\frac{1}{2}}$.
⁷We have for all robots that $\sum_{j \in \mathcal{I}(i)} w^{[i,j]} = 1$.
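To make the weight matrix and update (3) concrete, the following sketch (a hypothetical four-robot path graph with scalar, i.e. $n_g = 1$, canonical parameters; not from the paper's experiments) runs the consensus iterations and checks that the scaled parameters $\beta_k^{[i]} \xi_k^{[i]}$ and $\beta_k^{[i]} \Omega_k^{[i]}$ recover the joint information vector and matrix, as Theorem 1 predicts:

```python
import numpy as np

def metropolis_hastings_weights(adj):
    """Metropolis-Hastings weight matrix W for an undirected graph
    given as a 0/1 adjacency matrix with no self-loops."""
    n = adj.shape[0]
    deg = adj.sum(axis=1)
    W = np.zeros((n, n))
    for i in range(n):
        for v in range(n):
            if i != v and adj[i, v]:
                W[i, v] = 1.0 / (max(deg[i], deg[v]) + 1)
        W[i, i] = 1.0 - W[i].sum()  # diagonal absorbs the remaining mass
    return W

# Hypothetical path graph on four robots: 0-1-2-3.
adj = np.zeros((4, 4), dtype=int)
for i, v in [(0, 1), (1, 2), (2, 3)]:
    adj[i, v] = adj[v, i] = 1
W = metropolis_hastings_weights(adj)

# Scalar canonical parameters of each robot's Gaussian IMD.
xi0 = np.array([0.5, 1.0, -0.3, 0.2])   # information vectors
om0 = np.array([1.0, 2.0, 0.5, 1.5])    # information matrices

Psi = np.eye(4)                          # row i is psi_0^[i] = e_i
xi, om = xi0.copy(), om0.copy()
for _ in range(200):                     # consensus rounds, update (3)
    Psi, xi, om = W @ Psi, W @ xi, W @ om

beta = 1.0 / np.abs(Psi).max(axis=1)     # beta_k^[i] = ||psi_k^[i]||_inf^{-1}
# After convergence, beta -> n_r = 4 and the scaled parameters beta*xi and
# beta*om recover the joint parameters sum(xi0) and sum(om0) at every robot.
```

The same linear update applied to $\xi$ and $\Omega$ is exactly what the proof of Theorem 1 extracts from the exponential form (4).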
form converges weakly⁸ to the true IMD in the limit as the size of the weighted mixture set tends to infinity. Hence, the JMD is represented by the product of these weighted summations across all robots,

$$\Lambda(\mathcal{M}) := \prod_{i=1}^{n_r} \sum_{j \in \mathcal{I}(i)} w^{[i,j]}\, p^{[i]}\, \mathcal{N}_c\big(\xi_0^{[i,j]}, \Omega_0^{[i,j]}\big).$$

Unfortunately, the computational complexity of computing the JMD scales exponentially with respect to the number of robots, and thus is intractable even for moderately sized systems. We now describe algorithms that form an unbiased⁹ approximation of the JMD, for which computation is tractable and readily distributed among the robots. Let each robot draw $n_m$ samples from its weighted mixture set with probabilities proportional to the corresponding weights. The ordered pairings of drawn information vectors and information matrices form an unweighted mixture set, $\check{\mathcal{M}}_0^{[i]} := \big\{(\check{\xi}_0^{[i,j]}, \check{\Omega}_0^{[i,j]}) : j \in \{1, \ldots, n_m\}\big\}$, from which the average over all scaled Gaussians,

$$\Lambda\big(\check{\mathcal{M}}_0^{[i]}\big) := \frac{1}{n_m} \sum_{j=1}^{n_m} p^{[i]}\, \mathcal{N}_c\big(\check{\xi}_0^{[i,j]}, \check{\Omega}_0^{[i,j]}\big),$$

approximates the robot's IMD. We then define the joint mixture set, $\check{\mathcal{M}}$, to be the unweighted set of canonical parameter pairs resulting from the product of the robots' unweighted independent mixture distributions having equal indices. More formally, we have $\check{\mathcal{M}} := \big\{(\check{\xi}^{[j]}, \check{\Omega}^{[j]}) : j \in \{1, \ldots, n_m\}\big\}$, where $\check{\xi}^{[j]} = \sum_{i=1}^{n_r} \check{\xi}_0^{[i,j]}$ and $\check{\Omega}^{[j]} = \sum_{i=1}^{n_r} \check{\Omega}_0^{[i,j]}$. We are interested in each robot independently forming the joint mixture set to approximate the JMD, leading to the following.

Lemma 1 (Properties of the Joint Mixture Distribution). Define the joint mixture distribution to be the average over scaled Gaussians formed from the joint mixture set,

$$\Lambda(\check{\mathcal{M}}) := \frac{1}{n_m} \sum_{j=1}^{n_m} \check{p}^{[j]}\, \mathcal{N}_c\big(\check{\xi}^{[j]}, \check{\Omega}^{[j]}\big),$$

where $\check{p}^{[j]} = \frac{1}{\eta(\check{\xi}^{[j]}, \check{\Omega}^{[j]})} \prod_{i=1}^{n_r} p^{[i]}\, \eta\big(\check{\xi}_0^{[i,j]}, \check{\Omega}_0^{[i,j]}\big)$. Then the joint mixture distribution is an unbiased approximation of the JMD that converges weakly as the number of samples $n_m$ tends to infinity.

Proof (Lemma 1). We first prove that the joint mixture distribution converges weakly to the JMD. Consider any tuple $a \in \prod_{i=1}^{n_r} \mathcal{I}(i)$, where for each $i \in \{1, \ldots, n_r\}$ the $i$th entry is $[a]_i \in \mathcal{I}(i)$. Let us define $w^{[a]} := \prod_{i=1}^{n_r} w^{[i,[a]_i]}$. For a given unweighted sample $j \in \{1, \ldots, n_m\}$, let $\check{A}^{[j]} = (\check{A}^{[1,j]}, \ldots, \check{A}^{[n_r,j]})$ be an $n_r$-tuple random variable, for which each element $\check{A}^{[i,j]}$ takes samples from $\mathcal{I}(i)$ with probability $P(\check{A}^{[i,j]} = [a]_i) = w^{[i,[a]_i]}$. Hence, the probability that the $j$th sample is $a$ is given by

$$P\big(\check{A}^{[j]} = a\big) = \prod_{i=1}^{n_r} P\big(\check{A}^{[i,j]} = [a]_i\big) = \prod_{i=1}^{n_r} w^{[i,[a]_i]} = w^{[a]}.$$

Next, let us define an indicator random variable $\chi_{\check{A}^{[j]}}$ for the event $\{\check{A}^{[j]} = a\}$. We have that

$$\Lambda(\check{\mathcal{M}}) = \frac{1}{n_m} \sum_{j=1}^{n_m} \check{p}^{[j]}\, \mathcal{N}_c\big(\check{\xi}^{[j]}, \check{\Omega}^{[j]}\big) = \frac{1}{n_m} \sum_{j=1}^{n_m} \sum_{a \in \prod_{i=1}^{n_r} \mathcal{I}(i)} \chi_{\check{A}^{[j]}} \prod_{v=1}^{n_r} p^{[v]}\, \mathcal{N}_c\big(\xi_0^{[v,[a]_v]}, \Omega_0^{[v,[a]_v]}\big). \qquad (5)$$

By the Strong Law of Large Numbers [14], we have that

$$\lim_{n_m \to \infty} \frac{1}{n_m} \sum_{j=1}^{n_m} \chi_{\check{A}^{[j]}} = E\big(\chi_{\check{A}^{[j]}}\big) = w^{[a]},$$

with probability one. Therefore, exchanging the order of the summations in (5) and taking the limit as the number of samples tends to infinity, we have with probability one that

$$\lim_{n_m \to \infty} \Lambda(\check{\mathcal{M}}) = \sum_{a \in \prod_{i=1}^{n_r} \mathcal{I}(i)} w^{[a]} \prod_{v=1}^{n_r} p^{[v]}\, \mathcal{N}_c\big(\xi_0^{[v,[a]_v]}, \Omega_0^{[v,[a]_v]}\big) = \sum_{a \in \prod_{i=1}^{n_r} \mathcal{I}(i)} \prod_{v=1}^{n_r} w^{[v,[a]_v]}\, p^{[v]}\, \mathcal{N}_c\big(\xi_0^{[v,[a]_v]}, \Omega_0^{[v,[a]_v]}\big) = \prod_{i=1}^{n_r} \sum_{j \in \mathcal{I}(i)} w^{[i,j]}\, p^{[i]}\, \mathcal{N}_c\big(\xi_0^{[i,j]}, \Omega_0^{[i,j]}\big) = \Lambda(\mathcal{M}).$$

We now prove that the joint mixture distribution is an unbiased approximation of the JMD. Let the following random variables take values according to the corresponding (assumed to be normalized) distributions:

$$\check{Z} \sim \Lambda(\check{\mathcal{M}}), \quad \check{Z}^{[j]} \sim \mathcal{N}_c\big(\check{\xi}^{[j]}, \check{\Omega}^{[j]}\big), \quad Z^{[a]} \sim \prod_{i=1}^{n_r} \mathcal{N}_c\big(\xi_0^{[i,[a]_i]}, \Omega_0^{[i,[a]_i]}\big), \quad Z \sim \Lambda(\mathcal{M}).$$

Considering the expected value of $\check{Z}$, we have that

$$E(\check{Z}) = \frac{1}{n_m} \sum_{j=1}^{n_m} E\big(\check{Z}^{[j]}\big) = E\big(\check{Z}^{[1]}\big) = \sum_{a \in \prod_{i=1}^{n_r} \mathcal{I}(i)} w^{[a]}\, E\big(Z^{[a]}\big) = E(Z),$$

where the equalities in order are due to (i) the linearity of the expected value function, (ii) the independence of the drawn samples, (iii) the conditional independence of the robots' observations, and (iv) the definition of the joint mixture set. Thus, the joint mixture distribution is unbiased. ∎

C. Approximation of the Joint Mixture Distribution

Revisiting Remark 1, we can run a consensus-based algorithm on canonical parameters of sizes $O(n_g)$ and $O(n_g^2)$ to reasonably approximate a JMD of Gaussian form over all world states. This capability combined with the independence of the joint mixture set size with respect to the number of robots is the key to enabling distributed and scalable approximations of the JMD. The following formalizes the approach and its convergence, while the subsequent remarks discuss its significance and limitations.

⁸For weak convergence, we are assuming that the robot's IMD belongs to a certain reproducing kernel Hilbert space. See [12] for more detail.
⁹By unbiased we mean that the expected first moment of the approximation and the true distribution are equal.
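The expectation identity $E(\check{Z}) = E(Z)$ at the heart of Lemma 1 can be spot-checked numerically. The sketch below (a hypothetical two-robot, one-dimensional example; the mixture parameters are illustrative, not from the paper) draws equal-index samples from two weighted mixtures in canonical form, sums the canonical parameters to form the joint mixture set, and compares the average of the joint component means $\check{\xi}^{[j]} / \check{\Omega}^{[j]}$ against the exact value $\sum_a w^{[a]} E(Z^{[a]})$:

```python
import numpy as np

rng = np.random.default_rng(0)
n_m = 200_000  # samples per robot

# Hypothetical weighted mixtures of triples (w, xi0, om0) in canonical form.
robot1 = [(0.3, -1.0, 1.0), (0.7, 2.0, 2.0)]
robot2 = [(0.5, 0.5, 1.0), (0.5, -0.5, 4.0)]

def draw(mix, n):
    """Draw n component indices with probability proportional to the weights
    and return the sampled canonical parameters."""
    w, xi, om = (np.array(t) for t in zip(*mix))
    idx = rng.choice(len(mix), size=n, p=w)
    return xi[idx], om[idx]

xi1, om1 = draw(robot1, n_m)
xi2, om2 = draw(robot2, n_m)

# Joint mixture set: equal-index pairing, canonical parameters add.
xi_j, om_j = xi1 + xi2, om1 + om2

# Each (normalized) joint component has mean xi/om; average over components.
approx_mean = np.mean(xi_j / om_j)

# Exact: sum over all index tuples a with probability w^[a] = w1 * w2.
exact_mean = sum(w1 * w2 * (x1 + x2) / (o1 + o2)
                 for w1, x1, o1 in robot1
                 for w2, x2, o2 in robot2)
```

With enough samples the two quantities agree, mirroring the Strong Law of Large Numbers step in the proof, while the per-robot sample count stays independent of the number of robots.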
Corollary 1 (Distributed Approximation and Convergence). For all robots and samples $j \in \{1, \ldots, n_m\}$, let $\check{\xi}_k^{[i,j]}$ and $\check{\Omega}_k^{[i,j]}$ be initialized to $\check{\xi}_0^{[i,j]}$ and $\check{\Omega}_0^{[i,j]}$, respectively, and have both evolve according to (3) on a connected graph $\mathcal{G}$. Define the $i$th robot's approximation of the joint mixture set to be $\check{\mathcal{M}}_k^{[i]} := \big\{(\beta_k^{[i]} \check{\xi}_k^{[i,j]}, \beta_k^{[i]} \check{\Omega}_k^{[i,j]}) : j \in \{1, \ldots, n_m\}\big\}$. In addition, let $\check{\alpha}_k^{[i,j]} \in \mathbb{R}_{>0}$ be initialized to $p^{[i]} \eta\big(\check{\xi}_0^{[i,j]}, \check{\Omega}_0^{[i,j]}\big)$ and evolve according to (4). We have that

$$\Lambda\big(\check{\mathcal{M}}_k^{[i]}\big) := \frac{1}{n_m} \sum_{j=1}^{n_m} \frac{\big(\check{\alpha}_k^{[i,j]}\big)^{\beta_k^{[i]}}}{\eta\big(\beta_k^{[i]} \check{\xi}_k^{[i,j]}, \beta_k^{[i]} \check{\Omega}_k^{[i,j]}\big)}\, \mathcal{N}_c\big(\beta_k^{[i]} \check{\xi}_k^{[i,j]}, \beta_k^{[i]} \check{\Omega}_k^{[i,j]}\big)$$

converges weakly to the joint mixture distribution as $k \to \infty$.

Proof (Corollary 1). The proof follows directly from Theorem 1 and Lemma 1. ∎

Remark 2 (Complexity and Scalability). We have that the communication complexity for broadcasting the canonical parameters is $O(n_g^2 n_m)$ for each communication round, while the memory and computational complexity for calculating $\Lambda\big(\check{\mathcal{M}}_k^{[i]}\big)$ is also $O(n_g^2 n_m)$. Thus, this distributed algorithm scales quadratically with respect to the Gaussian dimension and linearly with respect to the number of samples. Most importantly, the overall complexity remains constant with respect to the number of robots. Please see [9] for further discussions on the complexity of $\psi_k^{[i]}$, which naïvely computed would at worst scale linearly with respect to the number of robots.

Remark 3 (Significance and Interpretation). The concatenation of the robot mixture sets (versus a Cartesian product) is allowed due to the fact that the corresponding samples are both unweighted (i.e., all samples are equally likely) and conditionally independent across the robots. Without these properties, Lemma 1 would not necessarily be true, and thus the robot's approximation of the JMD could be an arbitrarily poor representation. Instead, this approximation is guaranteed to converge weakly to the JMD as the communication round and the number of samples tend to infinity, or in words, as certain system resources increase.

Remark 4 (Limitations). One should immediately recognize that as the number of robots increases, each mixture sample becomes less informative about the JMD of the entire system. Simultaneously increasing the robot mixture set size to retain the approximation's global precision can be exponential with respect to the number of robots. In our work using gradient-based information theoretic controllers, this limitation is not significant since we typically want to retain precision with respect to the JMD of a local neighborhood in physical proximity to the robot.

IV. PERFORMANCE GUARANTEES

A. Non-Degeneracy and Disagreement

We begin to characterize the joint mixture approximation by proving that the corresponding Gaussians 'make sense'. Since we are forming these distributions from the scaled canonical parameters $\beta_k^{[i]} \check{\xi}_k^{[i,j]}$ and $\beta_k^{[i]} \check{\Omega}_k^{[i,j]}$, this 'making sense' objective is equivalent to proving for all robots, samples, and communication rounds that $\beta_k^{[i]} \check{\xi}_k^{[i,j]}$ is a real vector and $\beta_k^{[i]} \check{\Omega}_k^{[i,j]}$ is a real positive-definite symmetric matrix. Since the collection of real vectors and the collection of positive-definite symmetric matrices are both closed under addition and positive scalar multiplication ($\beta_k^{[i]} \in [1, n_r]$ from the upcoming Lemma 3), it holds that the joint mixture set is composed of non-degenerate Gaussians.

The guarantee of non-degeneracy is fundamental to many of the claims to come. More interestingly, the mathematical structure of (3) that allows this guarantee also allows for intuitive interpretations of how the approximations evolve over time, especially concerning the rate of convergence of the scaled canonical parameters. These will be discussed shortly, but first we review the concept of exponentially decreasing disagreement [11].

Lemma 2 (Exponentially Decreasing Disagreement). For all robots and communication rounds, we have that

$$\Big\|\psi_k^{[i]} - \mathbf{1}\tfrac{1}{n_r}\Big\|_2 \leq U_k^{\psi} := \Big\|W - \tfrac{1}{n_r}\mathbf{1}\mathbf{1}^T\Big\|_2^k \Big(1 - \tfrac{1}{n_r}\Big)^{\frac{1}{2}},$$

where the lefthand side of the inequality is termed the disagreement and $\|\cdot\|_2$ for a matrix denotes the spectral norm.

Proof (Lemma 2). The proof follows from Xiao et al. [16] with $\psi_0^{[i]} = e_i$, since

$$\Big\|\psi_0^{[i]} - \mathbf{1}\tfrac{1}{n_r}\Big\|_2^2 = \Big(1 - \tfrac{1}{n_r}\Big)^2 + \tfrac{n_r - 1}{n_r^2} = 1 - \tfrac{1}{n_r}. \qquad \blacksquare$$

Lemma 3 (Properties of $\psi_k^{[i]}$). For all robots and communication rounds, we have that $\psi_k^{[i]} \in [0, 1]^{n_r}$, $\|\psi_k^{[i]}\|_1 = 1$, and $\|\psi_k^{[i]}\|_\infty \geq 1/n_r$.

Proof (Lemma 3). Since for all robots $\|\psi_0^{[i]}\|_1 = \|e_i\|_1 = 1$, we have that

$$\|\psi_1^{[i]}\|_1 = [W]_{ii}\|\psi_0^{[i]}\|_1 + \sum_{v \in \mathcal{N}^{[i]}} [W]_{iv}\|\psi_0^{[v]}\|_1 = [W]_{ii} + \sum_{v \in \mathcal{N}^{[i]}} [W]_{iv} = 1 - \sum_{v \in \mathcal{N}^{[i]}} [W]_{iv} + \sum_{v \in \mathcal{N}^{[i]}} [W]_{iv} = 1.$$

In addition, $\psi_1^{[i]}$ is nonnegative since it is an element-wise summation of nonnegative terms, which from the previous equation implies $\psi_1^{[i]} \in [0, 1]^{n_r}$. Lastly, we have from the relationship between 1- and ∞-norms that $\|\psi_1^{[i]}\|_\infty \geq \|\psi_1^{[i]}\|_1 / n_r = 1/n_r$. The proof follows by induction on $k$. ∎

B. Bounds on Scaled Canonical Parameters

In the following subsection, we simplify notation by dropping the overhead check $\check{\cdot}$ and the sample index $j$. For example, we have that $\xi_k^{[i]}$ and $\Omega_k^{[i]}$ denote $\check{\xi}_k^{[i,j]}$ and $\check{\Omega}_k^{[i,j]}$, respectively.

It was discussed in [9] that the exponential factor $\beta_k^{[i]}$ accounts for the fact that the approximation of the JMD may be used before the consensus-based algorithm has converged.
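Both Lemma 2's disagreement bound and the resulting lower bound on the exponential factor, $\beta_k^{[i]} \geq \big(U_k^{\psi}(1 - 1/n_r)^{1/2} + 1/n_r\big)^{-1}$ (derived as Theorem 2 in the following subsection), can be verified numerically. The sketch below (a hypothetical six-robot ring; not from the paper's experiments) iterates (3) from $\psi_0^{[i]} = e_i$ and checks both inequalities at every round:

```python
import numpy as np

n_r = 6
adj = np.zeros((n_r, n_r), dtype=int)
for i in range(n_r):                     # hypothetical ring topology
    adj[i, (i + 1) % n_r] = adj[(i + 1) % n_r, i] = 1

deg = adj.sum(axis=1)
W = np.zeros((n_r, n_r))                 # Metropolis-Hastings weights
for i in range(n_r):
    for v in range(n_r):
        if i != v and adj[i, v]:
            W[i, v] = 1.0 / (max(deg[i], deg[v]) + 1)
    W[i, i] = 1.0 - W[i].sum()

# Spectral norm ||W - 11^T/n_r||_2, the contraction factor in Lemma 2.
rho = np.linalg.norm(W - np.ones((n_r, n_r)) / n_r, 2)

Psi = np.eye(n_r)                        # row i is psi_k^[i], starting at e_i
bound_holds, beta_bound_holds = True, True
for k in range(60):
    U_k = rho**k * np.sqrt(1.0 - 1.0 / n_r)
    disagreement = np.linalg.norm(Psi - 1.0 / n_r, axis=1)
    bound_holds &= bool(np.all(disagreement <= U_k + 1e-12))
    beta = 1.0 / np.abs(Psi).max(axis=1)                 # beta_k^[i]
    L_beta = 1.0 / (U_k * np.sqrt(1.0 - 1.0 / n_r) + 1.0 / n_r)
    beta_bound_holds &= bool(np.all(beta >= L_beta - 1e-9))
    Psi = W @ Psi
```

At $k = 0$ the bound holds with equality ($\beta_0^{[i]} = 1$), and as the disagreement decays geometrically with $\rho^k$ the lower bound climbs toward $n_r$, matching the behavior described above.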
In our case, we expect this algorithm to prematurely terminate before the Gaussian parameters converge, and thus the exponential factor indicates how ‘close’ the approximated information vector and information matrix are to the true joint canonical parameters. In the following, we provide a strictly increasing lower bound for the exponential factor that equals one at k = 0 and converges to n_r in the limit as k tends to infinity.

Theorem 2 (Lower Bound for the Exponential Factor). For all robots and communication rounds, we have that

    β_k^[i] ≥ Lβ_k := ( U_k^ψ √(1 − 1/n_r) + 1/n_r )^(−1).

Proof (Theorem 2). Consider the optimization problem of maximizing ‖ψ_k^[i]‖_∞ with ψ_k^[i] being a free variable subject to the constraints in Lemma 3 and subject to

    ‖ψ_k^[i] − (1/n_r)1‖_2 ≤ U_k^ψ ∈ [0, √(1 − 1/n_r)] = [0, U_0^ψ].

Note that an optimal solution ψ_k^* always exists. Put c ≥ 0, and without loss of generality suppose ‖ψ_k^*‖_∞ = [ψ_k^*]_1 and ‖ψ_k^* − (1/n_r)1‖_2^2 = c^2. We define f(ψ_k, μ_1, μ_2) to be

    [ψ_k]_1 + μ_1 ( ‖ψ_k − (1/n_r)1‖_2^2 − c^2 ) + μ_2 ( ‖ψ_k‖_1 − 1 ),

and by using Lagrange multipliers obtain

    ([ψ_k]_1 − 1/n_r)^2 / (n_r − 1) + ([ψ_k]_1 − 1/n_r)^2 − c^2 = 0.

Thus, we have that [ψ_k]_1 = c √(1 − 1/n_r) + 1/n_r and c ≤ √(1 − 1/n_r) since [ψ_k]_1 ∈ [0, 1]. By the last equality, [ψ_k]_1 is proportional to c, and by the last inequality we conclude that c = U_k^ψ. Thus, the corresponding value of [ψ_k]_1 = ‖ψ_k^*‖_∞ is U_k^ψ √(1 − 1/n_r) + 1/n_r.

Lastly, consider ψ_k^[i] as a consensus term rather than a free variable. From above, we can interpret ‖ψ_k^*‖_∞^(−1) as a lower bound for β_k^[i] given U_k^ψ, which gives Lβ_k.

We now shift our attention to the geometric interpretation of the information matrix β_k^[i] Ω_k^[i], which describes ellipsoidal contours of equal density for the corresponding scaled Gaussian. The squared lengths of the contours’ principal axes are given by the inverse of the information matrix eigenvalues, with larger values representing distribution axes of higher certainty. As more communication rounds are performed and the information matrix converges element-wise, we expect this certainty to increase and also converge. This is in fact the case, and by using the lower bound for the exponential factor, we provide a strictly increasing lower bound for the information matrix eigenvalues.

Theorem 3 (Lower Bound for the Information Matrix Eigenvalues). Let λ_1 ≤ λ_2 ≤ ··· ≤ λ_{n_g}. Then for all robots, samples, communication rounds, and m ∈ {1, . . . , n_g}, we have that

    λ_m(β_k^[i] Ω_k^[i]) ≥ L^Ω_{k,m} := max{ L^{Ω−}_{k,m}, L^{Ω+}_{k,m} },

where

    L^{Ω−}_{k,m} := Σ_{ℓ=1}^{⌊Lβ_k⌋} λ_1(Ω_0^[ℓ]) + (Lβ_k − ⌊Lβ_k⌋) λ_1(Ω_0^[⌈Lβ_k⌉])

with the robot indices ordered such that λ_1(Ω_0^[1]) ≤ λ_1(Ω_0^[2]) ≤ ··· ≤ λ_1(Ω_0^[n_r]), and where

    L^{Ω+}_{k,m} := λ_m(Ω) − Σ_{ℓ=⌈Lβ_k⌉+1}^{n_r} λ_{n_g}(Ω_0^[ℓ]) − (⌈Lβ_k⌉ − Lβ_k) λ_{n_g}(Ω_0^[⌈Lβ_k⌉])

with λ_{n_g}(Ω_0^[1]) ≤ λ_{n_g}(Ω_0^[2]) ≤ ··· ≤ λ_{n_g}(Ω_0^[n_r]).

Proof (Theorem 3). We first prove that λ_m(β_k^[i] Ω_k^[i]) is bounded below by L^{Ω−}_{k,m}. Note that

    β_k^[i] Ω_k^[i] = β_k^[i] Σ_{v=1}^{n_r} [ψ_k^[i]]_v Ω_0^[v].

Recursively applying Weyl’s Theorem [7], we have that

    λ_m(β_k^[i] Ω_k^[i]) ≥ β_k^[i] Σ_{v=1}^{n_r} [ψ_k^[i]]_v λ_1(Ω_0^[v]).   (6)

Ordering the robot indices for λ_1 and using the lower bound from Theorem 2, we have that

    β_k^[i] Σ_{v=1}^{n_r} [ψ_k^[i]]_v λ_1(Ω_0^[v]) ≥ Σ_{ℓ=1}^{⌊Lβ_k⌋} λ_1(Ω_0^[ℓ]) + (Lβ_k − ⌊Lβ_k⌋) λ_1(Ω_0^[⌈Lβ_k⌉]).

Substituting this inequality into (6) gives L^{Ω−}_{k,m}.

Lastly we prove in similar fashion that λ_m(β_k^[i] Ω_k^[i]) is bounded below by L^{Ω+}_{k,m}. Note that

    β_k^[i] Ω_k^[i] = Ω − Σ_{v=1}^{n_r} (1 − β_k^[i] [ψ_k^[i]]_v) Ω_0^[v].

Recursively applying Weyl’s Theorem, we have that

    λ_m(Ω) ≤ λ_m(β_k^[i] Ω_k^[i]) + Σ_{v=1}^{n_r} (1 − β_k^[i] [ψ_k^[i]]_v) λ_{n_g}(Ω_0^[v]).   (7)

Ordering the robot indices for λ_{n_g} and using the lower bound from Theorem 2, we have that

    Σ_{v=1}^{n_r} (1 − β_k^[i] [ψ_k^[i]]_v) λ_{n_g}(Ω_0^[v]) ≤ (⌈Lβ_k⌉ − Lβ_k) λ_{n_g}(Ω_0^[⌈Lβ_k⌉]) + Σ_{ℓ=⌈Lβ_k⌉+1}^{n_r} λ_{n_g}(Ω_0^[ℓ]).

Subtracting the summation term from both sides of (7) and substituting the result into the previous inequality gives L^{Ω+}_{k,m}.

Remark 5 (Maximum of Two Bounds). The use of both L^{Ω−}_{k,m} and L^{Ω+}_{k,m} yields an intuitive bound on λ_m(β_k^[i] Ω_k^[i]) in the instances where k = 0 and k → ∞, respectively. The former implies λ_m(Ω_0^[i]) ≥ min_v λ_m(Ω_0^[v]) and the latter with Lemma 3 implies lim_{k→∞} λ_m(β_k^[i] Ω_k^[i]) = λ_m(Ω), both of which are obvious. In addition, the two bounds are equivalent for univariate Gaussians (i.e., n_g = 1).

Lastly, we derive the strictly shrinking range for the information vector elements, which, when combined with the bounds on the information matrix eigenvalues, well characterizes the convergence behavior of the resulting scaled Gaussians. We believe such characterizations can lead to bounds on information-theoretic metrics such as the Kullback-Leibler divergence of the mixture of Gaussians; however, such efforts are reserved for future work.
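The bounds above are straightforward to evaluate numerically. The following Python sketch (our illustration, not code from the paper; the function and variable names are our own) computes the exponential-factor bound Lβ_k of Theorem 2 and the L^{Ω−}_{k,m} eigenvalue bound of Theorem 3, and checks the endpoint behavior noted above: Lβ_k equals one when U_k^ψ = U_0^ψ = √(1 − 1/n_r), and Lβ_k tends to n_r as U_k^ψ tends to zero.

```python
import numpy as np

def lower_bound_beta(U_psi, n_r):
    """Theorem 2: L_beta_k = (U_k^psi * sqrt(1 - 1/n_r) + 1/n_r)^(-1)."""
    return 1.0 / (U_psi * np.sqrt(1.0 - 1.0 / n_r) + 1.0 / n_r)

def lower_bound_eig_minus(L_beta, lam1_ascending):
    """Theorem 3, L^{Omega-}: sum the floor(L_beta_k) smallest lambda_1
    values, plus the fractional remainder times the ceil(L_beta_k)-th
    smallest (lam1_ascending holds lambda_1(Omega_0^[v]) in ascending order)."""
    f = int(np.floor(L_beta))
    bound = float(np.sum(lam1_ascending[:f]))
    if f < len(lam1_ascending):
        bound += (L_beta - f) * lam1_ascending[f]
    return bound

n_r = 10
U_0 = np.sqrt(1.0 - 1.0 / n_r)          # initial value of U_k^psi
print(lower_bound_beta(U_0, n_r))       # approx. 1 at k = 0
print(lower_bound_beta(0.0, n_r))       # approx. n_r in the consensus limit
```

With, e.g., λ_1 values (1, 2, 3) and Lβ_k = 2.5, the fractional weighting yields 1 + 2 + 0.5·3 = 4.5, matching the floor/ceiling structure of the bound.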

Corollary 2 (Bounds on the Information Vector Elements). For all robots, samples, communication rounds, and m ∈ {1, . . . , n_g}, we have that L^ξ_{k,m} ≤ [β_k^[i] ξ_k^[i]]_m ≤ U^ξ_{k,m}, where

    L^ξ_{k,m} := Σ_{v=1}^{⌊Lβ_k⌋} [ξ_0^[v]]_m + (Lβ_k − ⌊Lβ_k⌋) [ξ_0^[⌈Lβ_k⌉]]_m

with the robot indices arranged such that [ξ_0^[1]]_m ≤ [ξ_0^[2]]_m ≤ ··· ≤ [ξ_0^[n_r]]_m, and where U^ξ_{k,m} is defined the same as L^ξ_{k,m} but with [ξ_0^[1]]_m ≥ [ξ_0^[2]]_m ≥ ··· ≥ [ξ_0^[n_r]]_m.

Proof (Corollary 2). The proof follows from applying Theorem 2 to Theorem 1.

V. NUMERICAL SIMULATIONS

A. Consensus of Univariate Gaussians

We first consider an object localization task using ten robots with IMDs that can be accurately represented by univariate scaled Gaussians. Such a simplified task best illustrates how each robot’s JMD approximation converges weakly to the true one, which is analogous to how each Gaussian element of a mixture converges. Figure 2 shows the normalized distributions with corresponding parameters for the ten IMDs. Note that we selected the canonical parameters to separate the distributions for illustrative purposes, as one should not expect such initial disagreement within a fielded robot system. Figure 2 also shows the normalized JMD of nonzero mean, since the assumption of zero mean can lead to misleadingly tight bounds (e.g., bounds that are not invariant under translation).

Fig. 2: This figure shows the one dimensional normalized IMDs of the ten robots with respect to the normalized JMD.

We evaluated the performance of our consensus-based algorithm on the connected communication graph shown in Figure 1. Figure 3 shows the evolution of each robot’s normalized JMD approximation with respect to a strictly shrinking envelope derived from bounds given in Theorem 3 and Corollary 2. These envelopes can be interpreted as feasible regions within which the peaks of all the robots’ normalized JMD approximations must lie, intuitively highlighting the performance guarantees discussed in Section IV. We note that these bounds for this particular communication graph are conservative; we found that graphs with higher algebraic connectivity tend to produce tighter bounds.

Fig. 3: This figure shows the evolution of each robot’s normalized JMD approximation on a connected communication graph topology at communication rounds of k = {1, 10, 20, 30}. The dashed envelope represents the feasible region within which the peak of every robot’s approximation must lie.

B. Consensus of Mixtures of Gaussians

We now focus on an object localization task where the world state is three dimensional and each robot’s IMD cannot be accurately represented by a single scaled Gaussian. Consider the ten robots in Figure 1 tasked to localize an object by taking range only measurements that obey a Gaussian sensor model. The resulting IMDs are three dimensional distributions with contours of equal density being spheres centered at the corresponding robot (Figure 4). To represent the IMD with a weighted Gaussian mixture given an observation, each robot forms a weighted mixture set of three dimensional Gaussian elements using a dodecahedron-based geometric layout.

Fig. 4: This figure shows the process of representing a robot’s IMD with a weighted mixture of Gaussians. A range only sensor model of Gaussian distribution was assumed to have one standard deviation accuracy equal to five percent of the received sensor measurement. Left: Given a relative observation distance of 4.5 meters, the robot’s three dimensional IMD is generated for all relative world states. Here we show the normalized mixture distribution on two dimensional slices that intersect at the robot’s location of (7, 6, 1) meters. Right: A weighted mixture of 32 Gaussian elements is formed to represent the IMD, where each element is located at a vertex or a face centroid of a dodecahedron concentric with the robot.

Using a computer cluster, Monte Carlo simulations employing various mixture sizes were performed in a distributed fashion, meaning that every robot was simulated on an independent computer cluster node and communicated using MatlabMPI. For a given simulation run, each robot (i) drew an observation from the previously described range only measurement model, (ii) represented the resulting IMD with a weighted Gaussian mixture, (iii) drew unweighted samples to form the unweighted mixture set, and finally (iv) ran the consensus-based algorithm
to approximate the JMD. Figure 1 illustrates one particular evolution of the 1st robot’s three dimensional normalized JMD approximation, which becomes more informative as more consensus rounds are performed.

[Plot: average KL-divergence in bits (0–10) versus communication round k (0–30), one curve per mixture size n_m ∈ {100, 200, 500, 1000, 2000, 5000, 10000}.]
Fig. 5: This figure shows the average Kullback-Leibler divergence of the robots’ normalized JMD approximations with respect to the joint mixture distribution for various mixture sizes.

Figure 5 shows the average Kullback-Leibler divergence over 1000 simulations with respect to the joint mixture distribution. Not surprisingly, the divergence at all communication rounds is smaller for larger mixture sizes; however, this behavior clearly exhibits diminishing returns. In addition, more than 500 samples are needed to prevent the divergence from initially increasing, although again all JMD approximations by Corollary 1 are guaranteed to converge weakly to the joint mixture distribution as the number of communication rounds tends to infinity.

VI. CONCLUSIONS

We present scalable, decentralized algorithms that enable robots within a large team to independently perform posterior calculations needed for sequential Bayesian filter predictions and mutual information-based control. We focused on distributively approximating the joint of continuous-valued measurement distributions, providing performance guarantees and complexity analyses. We are currently investigating the concept of diminishing returns that was highlighted in our numerical simulations. Lastly, we wish to adapt the algorithms for specific types of Bayes filters (e.g., Kalman) for which we can benchmark against much prior work. Our paradigm of sample-based sensor fusion combined with the notion of pre-convergence termination has the potential to impact how the research community defines scalability in multi-robot systems.

REFERENCES

[1] M. Alighanbari and J. P. How. An unbiased Kalman consensus algorithm. In Proceedings of the American Control Conference, pages 4753–4766, 2006.
[2] R. Aragués, J. Cortés, and C. Sagués. Dynamic consensus for merging visual maps under limited communications. In Proceedings of the IEEE International Conference on Robotics and Automation, pages 3032–3037, May 2010.
[3] J. Cortés. Distributed algorithms for reaching consensus on general functions. Automatica, 44(3):727–737, 2008.
[4] H. F. Durrant-Whyte, B. Y. S. Rao, and H. Hu. Toward a fully decentralized architecture for multi-sensor data fusion. In Proceedings of the IEEE International Conference on Robotics and Automation, volume 2, pages 1331–1336, 1990.
[5] C. S. R. Fraser, L. F. Bertuccelli, H. L. Choi, and J. P. How. A hyperparameter consensus method for agreement under uncertainty. Automatica, 48(2):374–380, February 2012.
[6] J. A. Gubner. Distributed estimation and quantization. IEEE Transactions on Information Theory, 39(4):1456–1459, 1993.
[7] R. A. Horn and C. R. Johnson. Matrix Analysis. Cambridge University Press, 1985.
[8] B. J. Julian, M. Angermann, M. Schwager, and D. Rus. A scalable information theoretic approach to distributed robot coordination. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, pages 5187–5194, 2011.
[9] B. J. Julian, M. Angermann, and D. Rus. Non-parametric inference and coordination for distributed robotics. In Proceedings of the IEEE Conference on Decision and Control, December 2012.
[10] J. Manyika and H. F. Durrant-Whyte. Data Fusion and Sensor Management: A Decentralized Information-Theoretic Approach. Prentice Hall, 1998.
[11] R. Olfati-Saber, E. Franco, E. Frazzoli, and J. S. Shamma. Belief consensus and distributed hypothesis testing in sensor networks. In Proceedings of the Network Embedded Sensing and Control Workshop, pages 169–182, 2005.
[12] R. B. Platte and T. A. Driscoll. Polynomials and potential theory for Gaussian radial basis function interpolation. SIAM Journal on Numerical Analysis, 43(2):750–766, 2006.
[13] W. Ren, R. W. Beard, and D. B. Kingston. Multi-agent Kalman consensus with relative uncertainty. In Proceedings of the American Control Conference, pages 1865–1870, 2005.
[14] D. Williams. Probability With Martingales. Cambridge Mathematical Textbooks. Cambridge University Press, 1991.
[15] L. Xiao, S. Boyd, and S. Lall. A scheme for robust distributed sensor fusion based on average consensus. In Symposium on Information Processing of Sensor Networks, pages 63–70, 2005.
[16] L. Xiao, S. Boyd, and S.-J. Kim. Distributed average consensus with least-mean-square deviation. Journal of Parallel and Distributed Computing, 67:33–46, January 2007.
[17] P. Yang, R. A. Freeman, and K. M. Lynch. Distributed cooperative active sensing using consensus filters. In Proceedings of the IEEE International Conference on Robotics and Automation, volume 1, pages 405–410, May 2006.
Robust Object Grasping Using
Force Compliant Motion Primitives
Moslem Kazemi, Jean-Sebastien Valois, J. Andrew Bagnell, and Nancy Pollard
The Robotics Institute, Carnegie Mellon University, Pittsburgh, PA 15213 USA
{moslemk, valois, dbagnell, nsp}@andrew.cmu.edu

Abstract—We address the problem of grasping everyday objects that are small relative to an anthropomorphic hand, such as pens, screwdrivers, cellphones, and hammers, from their natural poses on a support surface, e.g., a table top. In such conditions, state of the art grasp generation techniques fail to provide robust, achievable solutions due to either ignoring or trying to avoid contact with the support surface. In contrast, we show that contact with support surfaces is critical for grasping small objects. This also conforms with our anecdotal observations of human grasping behaviors. We develop a simple closed-loop hybrid controller that mimics this interactive, contact-rich strategy by a position-force, pre-grasp and landing strategy for finger placement. The approach uses a compliant control of the hand during the grasp and release of objects in order to preserve safety. We conducted extensive grasping experiments on a variety of small objects with similar shape and size. The results demonstrate that our approach is robust to localization uncertainties and applies to many everyday objects.

Fig. 1: Compliant landing of fingers followed by compliant grasping of a hammer.

Fig. 2: The anthropomorphic Barrett hand grasps small objects: a cellphone, hammer, pen, and screwdriver from a table top.

I. INTRODUCTION

Grasping constitutes an essential component in an autonomous robotic manipulation system operating in human environments. The wide variety of everyday objects and various environment settings requires grasping strategies that are robust to variations in shape, size, and pose, as well as uncertainties in perception and robot kinematics. Although advances in perception provide accurate object pose estimation, calibration and kinematic factors affect the accuracy at which an end-effector can be controlled for grasping. Because of such uncertainties, contact information such as tactile and force/torque feedback is necessary to achieve robust grasps.

Much research in robotic manipulation is focused on geometric grasp generation and planning. Researchers study finger and object interactions and develop grasp quality metrics based on form/force closure [2]. Despite promising progress in geometric grasp planning and the efforts devoted to analyzing grasp properties, challenges still remain in real-world manipulation tasks because the existing techniques fail to consider opportunities presented by contacts with the environment. In our view, supported by our empirical studies, two factors play a crucial role in achieving more robust grasps: dealing with positioning uncertainties [6], and using compliant motions [11] to handle contacts between the robot and the environment.

Our work (Fig. 1) is motivated by the task of grasping everyday objects that are small relative to an anthropomorphic hand, including a pen, screwdriver, cellphone, and hammer, from their natural poses on a support surface, e.g., a table top (see Fig. 2). Note that existing techniques for grasping larger objects generally tend to perform well when reliable grasp points can be found away from the support surfaces. However, that is rarely the case for small objects. In addition, because existing grasp planning tools (such as OpenRAVE [4] and GraspIt! [12]) rely on precise finger-to-object contact points while avoiding the surrounding environment, they are impractical for grasping small objects. Figure 3 illustrates some of the failure modes of a geometric grasp planner.

In our experience, geometric techniques are only effective when precise calibration is performed to position the robot relative to the object. However, limitations in sensing and control uncertainty make grasps that rely on precise calibration impractical. Thus, it is important to devise grasping strategies that are robust to these errors.

Most approaches to grasping attempt to model uncertainty. In contrast, we believe that the effect of uncertainty can be ignored in many compliant interactions between the robot and the environment. Because we allow contact with the environment, compliant motions are crucial to ensure safety of the robot and successful execution of the task. Our approach is motivated by anecdotal experiments with human subjects which demonstrate similar behavior in human grasping of small objects (see Section III). The use of compliance for grasping small objects is the main contribution of this paper.

We present three simple, yet effective, manipulation primitives for robust grasping (and releasing) of small objects from support surfaces: (1) Compliant Finger Placement for bringing all fingers safely in contact with the support surface, (2) Compliant Object Grasping for maintaining the contact between the fingertips and the support surface during the finger closure, and (3) Compliant
Object Release. An example of compliant landing and grasping primitives is shown in Fig. 1.

Fig. 3: Failure modes of a geometric grasp planner: (a) no contact point found since the hammer is sunken into the table due to localization error, (b) computed contact points are out of collision but too close to the table, (c) the planner does not allow for the contact with the table and therefore enveloping grasps are not considered, resulting in an unstable grasp, (d) releasing the object fails since the planner does not account for collision between the fingers and support surface.

II. RELATED WORK

The literature on robotic grasping is vast. Here we only refer to the most recent related developments in the area, but we also encourage the reader to consult the extensive review by Bicchi and Kumar [2].

We believe uncertainty is the key challenge for grasping small objects because of limitations in perception and calibration. Many techniques attempt to handle uncertainty by explicitly modeling the contact between the fingers and the object (e.g. [14]). In practice, for small objects such precise modeling is very difficult.

Deliberate interaction with the environment can be used to reduce uncertainty. Active sensing has been proposed to reduce uncertainties in object pose estimation. For example, Hsiao et al. [8] use pre-grasp interactions to estimate the object pose without disturbing it. Other methods such as [3, 5, 6] use deliberate interaction with the object to reduce uncertainty, e.g. in [5] a push-grasping mechanism is used to align and bring the object inside the capture envelope. Platt et al. [15] devised null space grasp controllers to achieve frictionless equilibrium grasp configurations by displacing the fingers over the object surface and aiming to regulate contact force/moment error residuals to zero.

The aforementioned techniques, along with other geometric grasping strategies (see [2]), either neglect or try to avoid contact with the support surface during grasp execution, and hence often fail to successfully execute a planned grasp, particularly for small objects. In contrast, we argue that a reliable grasp of small objects must consider contacts with support surfaces. Our approach leverages this insight. In order to allow contact with the environment, compliant motion is necessary.

In concurrent related work, Kalakrishnan et al. [9] recently presented a learning strategy to acquire manipulation and grasping skills where an initial position control policy for the manipulation task is initialized through kinesthetic demonstration. The learned policy is then augmented with a force/torque profile that is controlled in combination with position trajectories using a force compliant strategy in a closed-loop scheme that is similar to ours. The force/torque profile is learned through demonstration by optimizing a cost function which measures the task success. In contrast to their technique, our approach requires no learning and implements a simple closed-loop hybrid position-force controller that generates the compliant motions necessary to maintain the proper contact between the fingertips and the supporting surface, and is operational across a broad range of conditions.

We propose a compliant grasping strategy which performs pinching as well as enveloping grasps [17]. In contrast to pinching grasps, where the object is restrained by the fingertips at certain contact points, enveloping grasps are formed by wrapping the fingers and the palm around the object. Theoretical analysis (e.g. as in [17] and [7]) can be used for pulling objects from a surface into an enveloping grasp when identifying grasps for new objects. The choice of the grasp is affected by various parameters including the task and the size, shape and weight of the object. Enveloping grasps are shown to be superior in terms of restraining objects as expressed in [2]. Although preferable, they often are more challenging to perform in scenarios where the object is lying down on a supporting surface, e.g., grabbing a hammer lying on a table (Fig. 1). In fact, in such scenarios, the fingers need to come in full contact with the support surface and then slip underneath the object while the hand is pushed downwards to maintain the proximity to the support surface. Proper control of the hand to achieve such motion without breaking or stalling the fingers is the motivation for our work.

III. PARALLEL WITH HUMAN GRASPING

Complete results from human subject studies aimed at comparing our proposed robotic grasping approach to the way humans grasp small objects are not the focus of this paper. Nonetheless, our inspiration came from careful recording of human grasping activities. The intent of our anecdotal experiments was to validate our belief that people make extensive use of contacts with the environment to reduce hand position uncertainty and move fingers into position when grasping small objects from support surfaces.

We present results from a typical human experiment trial in Fig. 4, in which a human subject was asked to grasp a marker. A custom in-house developed iPad application was used to carefully measure both location and duration of contacts while subjects grasped the marker from the iPad surface. We note that in every one of over 50 trials, human subjects maintain extensive contact with the iPad surface while grasping the marker, in contrast with the classical grasping techniques of precise placement of fingers on grasp points while avoiding contact with the support surface.

Figure 4(b) shows the 2-D histogram of fingertip trajectories for 50 grasping experiments performed by 5 human subjects in which they were asked to grab a marker located at a predefined location on the iPad surface (repeatedly for 10 times each). The center of the figure shows the highest density of contacts due to subjects dragging their fingers towards the center of the object. For all the test subjects, we noticed that they were sometimes touching the surface over 2 to 4 centimeters prior to grasping the marker. These early results appear to validate our intuition
for developing the simple contact-rich primitives in this work. A formal analysis of human grasping activities is the subject of our current research.

Fig. 4: Experiments studying human grasping strategies. (a) Left: fingertip trajectories; right: examples of robot and human grasps. (b) 2-D histogram of fingertip trajectories (axes in cm).

Fig. 5: A grasping task decomposed into four distinct steps. [Diagram: grasp launch pose → (Compliant Landing) → all fingers touching → (Compliant Grasping) → fingers reached closed position → (Lifting & Transportation) → release launch pose reached → (Compliant Release) → fingers reached open position.]

IV. HARDWARE AND CONTROL REQUIREMENTS

A. Hardware Requirements

The compliant motion primitives presented in this work use two sensing modalities: (1) a 3-axis force/torque sensor positioned at the wrist between the manipulator arm and the hand, and (2) strain gauges between the proximal and distal finger segments. Moreover, the proposed primitives assume knowledge of fingertip positions from forward kinematics. Force feedback from the wrist closes the loop for performing compliant motions as described below. Finger strain gauges are solely used to detect when fingers are in contact with the support surface.

Ideally, either of the above sensing modalities can provide the necessary feedback to implement the proposed primitives. However, we found the finger strain gauges hard to calibrate and very noisy, so we only used them as a binary sensor to detect individual finger contacts. At the same time, the 3-axis force/torque sensor was noisy enough that looking at the direction of the force vector was impractical to determine which finger had made contact, especially in the presence of kinematic modeling errors. Other sensory hardware can be used as long as the above feedback is provided. For example, tactile sensors at fingertips can be used to detect the finger contacts.

B. Low-Level Controllers for Hand and Fingers

1) Velocity-based Operational Space Control of Hand: We employ a velocity-based operational space formulation to generate compliant motion of the hand in response to forces seen at the wrist. This formulation is fairly straightforward to implement and demonstrates an overall good performance compared to other variations of operational space control [13]. Desired joint velocities to track a given hand velocity ẋ_d, determined based on the task primitive, are calculated using Liegeois’ resolved motion rate control approach [10] as

    q̇_d = J⁺ ẋ_d + λ (I − J⁺J) ∇H(q),   (1)

where J is the robot Jacobian with its pseudo-inverse denoted J⁺, λ is a gain value, and H(q) is a null-space cost/utility function. Different criteria can be used to define H(q) depending on the objective, e.g., avoiding joint limits or kinematic singularities.

The desired motor torque command is calculated using the computed torque control method with an added velocity feedback [13] to track the desired joint velocities in (1),

    τ = M(q) q̈_d + C(q, q̇) + g(q) + K_{q,d} (q̇_d − q̇),   (2)

where M(q) is the inertia matrix, C(q, q̇) is the Coriolis/centrifugal vector, g(q) is the gravity vector, K_{q,d} is a gain matrix, and τ is the joint torque vector. The desired joint acceleration q̈_d is obtained by differentiating q̇_d.

In the following sections, we describe how the desired hand velocity ẋ_d is computed to achieve a compliant motion behavior in response to contact forces.

2) Position-based Control of Fingers: The fingers are controlled along their pre-defined trajectories using a position-based method. The trajectory is defined as a sequence of waypoints based on the given task primitives. This controller is used to coordinate the position of fingers along their desired trajectories during the grasping or releasing of objects.

V. FORCE COMPLIANT GRASPING PRIMITIVES

A grasping task can be decomposed into four distinct, sequentially executed steps as illustrated in Fig. 5: (1) Compliant Landing: fingers are placed in a pre-defined grasp pre-shape and the hand is maneuvered downward until all fingers fully rest on the support surface, (2) Compliant Grasping: force feedback is used to maintain a desired contact force at the fingertips while the finger joints are synchronously closed to capture the object, (3) Lift and Transportation: the object is lifted away from the surface and carried to the destination, (4) Compliant Release: the object is gently deposited on the support surface using a method similar to step 2. A similar sequence of grasping controls has been suggested in [16] inspired by neuroscience studies, but it lacks the finger landing step and the compliant primitives proposed in this paper.
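As a concrete illustration of the resolved motion rate law in Eq. (1), the following Python sketch (our own, not the authors’ code; the Jacobian, desired velocity, and cost gradient are random placeholders) computes q̇_d with numpy and verifies that the null-space term does not disturb the commanded hand velocity for a redundant 7-DOF arm:

```python
import numpy as np

def resolved_rate_qdot(J, xdot_d, grad_H, lam=0.1):
    """Eq. (1): qdot_d = J+ xdot_d + lam * (I - J+ J) * grad_H(q).
    The second term moves the joints within the Jacobian null space
    (e.g., away from joint limits) without changing the hand velocity."""
    J_pinv = np.linalg.pinv(J)                    # J+
    null_proj = np.eye(J.shape[1]) - J_pinv @ J   # I - J+ J
    return J_pinv @ xdot_d + lam * null_proj @ grad_H

np.random.seed(0)
J = np.random.randn(6, 7)       # placeholder 6 x 7 Jacobian (full row rank)
xdot_d = np.random.randn(6)     # desired hand velocity screw
grad_H = np.random.randn(7)     # gradient of a null-space cost H(q)
qdot_d = resolved_rate_qdot(J, xdot_d, grad_H)
print(np.allclose(J @ qdot_d, xdot_d))   # True
```

Because J(I − J⁺J) = 0, any choice of H(q) leaves the task-space tracking unaffected, which is why criteria such as joint-limit avoidance can be swapped in freely.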
Algorithm 1: Force Compliant Finger Landing
1 begin
2   Record the current finger strains, S_i^r;
3   Set the finger contact flags to zero, C_i ← 0;
4   repeat
5     Get the current finger strains, S_i;
6     Get the current wrist forces, F;
7     Update the finger contact flags C_i based on the strains;
8     Determine the control point and axis, and the control point velocity screw cp ẋ_d;
9     Compute the desired hand velocity screw ẋ_d; apply the hand velocity ẋ_d to the joints using Eq. 2;
10  until C_i == 1 for i = 1, 2, 3;
11 end

Fig. 6: Compliant finger landing sequence: the circles indicate which new finger has made contact with the support surface.

Fig. 7: Control points and axes selected based on the fingertips. [Diagram: fingertip control points CP100, CP010, CP001; mid-point control points CP110, CP101, CP011; palm control point CPpalm; fingers F1, F2, F3 with their X, Y, Z axes.]

A. Force Compliant Fingers Landing

Landing begins by positioning the palm at a safe hovering distance above the surface, in the direction of the plane’s normal. We assume that the location of the support surface and its normal are roughly known from perception.

We achieve safe finger landing by controlling the hand towards the support surface using a compliant controller. We use a velocity-based controller (Eq. 2) to generate the compliant motions. This controller brings all fingers into contact with the support surface while protecting them from damage. Landing is achieved as soon as all fingertips establish contact with the surface. Because the fingers do not necessarily contact the surface simultaneously, we continuously update the servo control point and axis to correct the hand’s motion. This process terminates when landing is achieved (i.e., all fingers are in contact with the support surface). Figure 6 illustrates a finger landing sequence. Our landing strategy can be adapted to support various grasp pre-shapes.

Our approach relies on feedback from the finger strain gauges to determine whether a finger is in touch with the support surface. It sets contact flags C_i for fingers i = 1, 2, 3. These contact flags are updated in real-time during landing, and are used to compute the axis around which the hand needs to be rotated. The control axes are predefined based on the selection of control points at the fingertips and their location with respect to the hand’s (or the end-effector’s) frame. We choose the control points at the fingertips, i.e., CP100, CP010 or CP001, as shown in Fig. 7. For a single finger contact, the control axis passes through the fingertip and is defined parallel with the line passing through the other two fingertips. When two fingers contact the surface, the mid-point between the fingertips defines the control point, i.e., CP110, CP101 or CP011, and the control axis is specified by the two control points at the fingertips. The landing primitives continuously observe the contact status of the fingers and calculate the appropriate control axis around which an angular velocity cp ω_d (with a constant magnitude) is applied to land the non-touching fingers (see Algorithm 1).

In our experiments, uncertainties in localizing the surface normal and/or modeling errors in the robot kinematics resulted in an offset in the computation of the control axes. As a result, rotation is performed around slightly different control axes, causing the contacting fingers to either lose contact with the surface or apply significant force onto the surface. The former leads to failure, and the latter endangers the fingers.

We rely on force compliance to avoid these risks. We move the hand in compliance with the forces exerted onto the fingers, which ensures a proper contact between the fingertips and the surface. The compliant motion introduces a linear velocity only at the control point and along the palm’s normal, in response to forces measured at the wrist.

The desired force to servo the hand is calculated as

    F_d = F(t) − F_r − F_t,   (3)

where F(t) is the current force seen at the wrist, and F_r denotes the reference force recorded a priori before the fingers touch the surface. The parameter F_t is an attractive force value used to ensure downward motion of the hand (along its palm normal) when none of the fingers is in contact with the support surface, i.e., when F(t) ≈ F_r.

Finally, the linear velocity at the control point is given as cp v_d = K_{f,p} F_d, where K_{f,p} is a positive gain. In practice we observed that passing F_d through a deadband filter helps to decrease the oscillation effect when F_d approaches zero.

The velocity screw at the control point cp ẋ_d is composed of a linear force compliant component cp v_d and an angular velocity cp ω_d around the corresponding control axis as explained above, i.e., cp ẋ_d = [cp v_dᵀ, cp ω_dᵀ]ᵀ. The control point velocity screw cp ẋ_d is transformed to the corresponding hand velocity screw ẋ_d given the kinematics of the fingers and the current position of the control point. Finally, the hand velocity screw is then applied using the computed torque method as in (2). Algorithm 1 summarizes the proposed landing strategy.

B. Force Compliant Grasping

Our compliant grasping strategy begins with the assumption that all fingertips are in contact with the support surface. This strategy consists of two closed-loop controllers that run independently and in parallel (see Algorithm 2). The first servos the hand using a velocity-based operational space controller. The
180
Algorithm 2: Force Compliant Grasping
1 begin
2 Record the current wrist forces, Fr ;
3 Start fingers trajectory tracking by sending the first waypoints
Fig. 8: Force compliant grasping primitive: the hand is controlled to fingers controller;
in compliance with contact forces exerted from the support 4 repeat
5 Get the current wrist forces, F;
surface to the fingertips while the fingers follow their pre-defined 6 Compute the desired hand velocity ẋd using Eq.4;
trajectory to reach the object. 7 Apply the hand velocity ẋd to joints using Eq.2;
8 if all fingers reached their waypoints then
second is a position-based controller that moves the fingers along 9 Send the next fingers waypoint positions;
their pre-defined trajectories (see Section IV-B). 10 end
The velocity-based operational space controller is similar to 11 until all fingers reached their desired final positions;
(2), which is described in the previous section. This controller 12 end
generates the compliant motion of the hand in response to forces
exerted by the support surface to the fingertips. The goal of this
controller is to maintain contact between the fingers tips and the
support surface, while closing the fingers towards the desired
object-caging configuration (see Fig. 8)
The compliant motion applied to the hand is composed of
only a linear motion calculated at the hand frame as
Fig. 9: Force compliant release/placement primitive
v̇d = Kf,p Fd (4)
release/placement primitive is inspired by human release skills
where Fd is defined in (3). The hand velocity screw is then and utilizes the same methodology we employed to develop the
given as ẋd = [vdT , 0T ]T which is applied to the joints using compliant grasping primitive. The main idea is to servo control
the computed torque in (2). the hand in compliance to forces exerted on the fingers as the
While the fingers move along their predefined trajectories, our they open to release the object (see Fig.9).
compliant velocity controller responds to forces that are due to Assuming the hand (with the object grasped) is located above
the fingers’ contact with the support surface. These forces are the support surface, the release primitive begins by servoing the
measured at the wrist. The fingers’ positions are coordinated hand downward until contact with the surface is detected via
to ensure proper caging of the object, without missing it. To continuous thresholding of the force seen at the wrist. Relying
coordinate the fingers’ positions, as soon as all fingers reach on the contact as a signal to stop the hand motion is inspired by
their waypoints, new waypoints are provided to the position- human release strategy and has been used previously to trigger
based controller (our second controller). Due to uneven contacts, releasing objects (e.g., [16]).
some fingers may be lagging behind. This coordination strategy Once contact between the hand/fingers and the support surface
is key for successful grasping of small objects. is established, we proceed by opening the fingers while compli-
Our compliant grasping strategy can be used for performing antly servo-controlling the hand from the support surface. This
both pinching and enveloping grasps. In a pinching grasp, the is achieved by running two concurrent controllers: a velocity-
object is restrained by the fingertips only. We achieve this based operational space controller to control the hand, and a
behavior by stopping the fingers as soon as contact with the position-based controller to open the fingers along their pre-
object is detected via strain feedback. The enveloping grasp defined trajectories. The proposed strategy follows the same
continues after pinching the object. It applies additional torque scheme presented in Algorithm 2 with a minor difference in
to the fingers while pushing hand downward, which in practice the desired force Fd which is calculated as Fd = F(t) − Fr ,
encourages the object to slip towards the palm. At the same time, where the reference force Fr is recorded at the time of the initial
we close the fingers to fully capture the object. contact between the hand/fingers and the support surface.
C. Force Compliant Object Release VI. E XPERIMENTS AND R ESULTS
The ability to accurately place an object and release it from To validate and demonstrate the robustness and effectiveness
grasp is as important as the ability to grasp and lift it, particularly of the proposed grasping primitives we have performed extensive
for tasks such as stacking objects, assembly, or exchanging experiments on a fully integrated manipulation system: a com-
objects between hands in bi-manual manipulation. We devise a pliant 7-DOF Barrett Whole-Arm Manipulator (WAM) equipped
compliant strategy for gentle release and placement of a grasped with a Barrett 3-finger dexterous hand BH-280, and an integrated
object on a support surface. The proposed approach effectively perception system. The positioning accuracy of the perception
avoids abrupt release of the object from the grasp and ensures system varies depending on the object of interest. Overall, our
gentle placement on the support surface from both precision system is capable of providing object pose with an accuracy
and enveloping grasps. Release from enveloping grasps is more of about 1-2 centimeters and 5-10 degrees error in position
challenging due to the inevitable, extensive contact between the and orientation, respectively. In Section IV-A we described the
fingers and support surface during the release. The proposed minimum hardware/Control requirements.
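The force error of Eq. (3), the deadband filtering mentioned above, and the proportional compliant velocity of Eq. (4) can be sketched as follows. This is only an illustration of the equations, not the authors' implementation; the gain and deadband width are placeholder values.

```python
import numpy as np

def desired_force(F, F_r, F_t):
    """Eq. (3): force error driving the compliant hand motion.
    F   -- current force measured at the wrist
    F_r -- reference force recorded before the fingers touch the surface
    F_t -- attractive force ensuring downward motion while no finger
           is in contact (active when F is close to F_r)"""
    return F - F_r - F_t

def deadband(F_d, width):
    """Suppress small force errors to reduce oscillation near F_d = 0."""
    F_d = np.asarray(F_d, dtype=float)
    return np.where(np.abs(F_d) < width, 0.0, F_d - np.sign(F_d) * width)

def compliant_velocity(F_d, K_fp=0.002, width=0.5):
    """Eq. (4): linear hand velocity v_d = K_fp * F_d with K_fp > 0,
    applied here to the deadband-filtered force error."""
    return K_fp * deadband(F_d, width)
```

A positive force error along the palm normal then commands a proportional retreat of the hand, which is what keeps the fingertips from being overloaded during finger closure; in practice K_fp and the deadband width would be tuned on the robot.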
All objects in our experiments were located in their natural
poses on a table top localized by the perception system. For each
of the objects in our grasping experiments, the vision system
provides a “launch” pose for the hand where the
grasp is to be initiated. The launch pose is calculated in a way
to restrict the plane specified by the fingertips parallel to the
support surface, with the hand centered above the object (or the
target location for the release of the object).
To grasp the object, the hand assumes a predefined pre-shape
weakly dependent on object geometry. Based on our
empirical observations and given the flexibility of our proposed
grasping approach, one could choose the same grasp pre-shape
for objects with similar geometries. For example, for all objects
with a cylindrical shape (e.g., pen, screwdriver, hammer, pipe,
etc.) we used a cup-like grasp pre-shape (as shown in Fig. 1).
In fact, we have observed that the cup-like pre-shape works
remarkably well for grasping many of the small objects we used
in our experiments. This also conforms closely with our anecdotal
observations of human behavior. The width of the finger opening
to form the cup pre-shape can be chosen arbitrarily wide, as long
as it satisfies the localization accuracy of the perception system,
to make sure that the object can be caged.
We present the experimental results and our empirical ob-
servations in three categories: finger placement/landing, ob-
ject grasping, and object release/placement experiments, as
follows. A video of the grasp sequence is available online
at http://youtu.be/gxaXCYY87Z0.

A. Fingers Landing Experiments
An example of a finger landing/placement experiment for grasp-
ing a hammer from a table top is shown in Fig. 10. Initially
the hand is at its launch pose centered above the hammer, see
Fig. 10(a). As can be seen, the fingertip plane (or the hand palm)
is not parallel to the table due to misalignment of the hand caused
by combined uncertainties in perception and the robot kinemat-
ics. Clearly, approaching the hammer along this orientation will
not place all fingers in contact with the support surface (see
Fig. 10(b)), and hence will not yield a stable and robust grasp if
executed. To fix the hand orientation and ensure all fingers are
in contact with the support surface, the proposed finger landing
approach servo controls the hand around appropriate control axes
(as described in Section V-A) based on the current finger touch
states determined from the finger strain feedback. For example,
in this experiment, the hand is initially servo controlled along the
normal to the fingertip plane (Fig. 10(a)) until the touch between
finger F2 and the support surface is detected (Fig. 10(b)). The
strain threshold to identify the touch is 50, as indicated in the
plot. Next, the hand is servo controlled around the control
axis at finger F2 (Fig. 10(c)) until finger F3 reaches the support
surface. Finally, the hand rotates around the control axis between
fingers F2 and F3 until finger F1 contacts the surface.
The plots in Fig. 10 show the fingers' current contact status
and the fingers which are expected to contact the table next at
each instant of time. The finger strain values, along with the
hand linear (along the normal of the fingertip plane) and angular
velocities during the whole landing process, are also shown.

[Fig. 10 panels: (a) Descending, (b) F2 touching, (c) Rotating,
(d) F3 touching, (e) Rotating, (f) F1 touching; plots over time of
the fingers' expected and current touch states, the tared finger
strains (threshold sr = 50), and the hand linear and angular
velocities.]
Fig. 10: A typical compliant fingers landing experiment: the
fingers' (current and expected) touch states are continuously
updated based on the strain gauge feedback as the hand is
servo controlled to land the fingers.

Due to kinematic errors in calculating the control axis, extra
force might be exerted on a finger as the hand rotates to land
the other fingers. For example, the strain in finger F2 increases
even after its first contact with the support surface, as shown
in Fig. 10. However, the force compliant motion incorporated
in the proposed landing technique prevents excessive force from
being applied to the fingers by moving the hand away from the
surface to decrease the strain on the fingers and avoid damaging
them.
The proposed landing primitive can be used for landing fingers
from different pre-shapes. For example, Fig. 11 shows landing
experiments from two different grasp pre-shapes. For every
grasping experiment, finger landing is first executed to ensure
contact between all the fingers and the supporting surface. This is a
key prerequisite to the robustness and success of our proposed
compliant grasping approach, as shown in the experiments which
follow. This is also a unique and novel strategy which may find
applications in other robotic manipulation tasks to place fingers
on the surface of objects in a compliant scheme.

Fig. 11: Compliant fingers landing/placement experiments from
different hand pre-shapes

B. Object Grasping Experiments
Each grasping experiment starts off with all the fingers ini-
tially in contact with the supporting surface. This is ensured
using the finger landing strategy presented above. Figure 12
shows snapshots of a grasping experiment to grab a pipe from
a table top. The plot shows a number of parameters, including:
the status of the fingers (stuck or reached their waypoints), the
finger trajectories, the compliant linear velocity of the hand along
its palm normal, and the force applied to the hand along its palm
normal. As shown, the hand responds compliantly to the wrist
forces while the fingers close along their trajectories. For example,
between seconds 6 and 8 all the fingers have reached the body of
the pipe (Fig. 12(c)) and apply forces to the pipe body as well
as the table to manage to go underneath the pipe (about second
8). At this moment the forces applied by the fingers push
the hand upward while lifting the pipe and managing to fully
cage it. The compliant motion of the hand plays a crucial role
in preventing damage to the fingers. For example, there are other
instances (about seconds 2 and 4) where the fingers are stuck,
but the compliant motion prevents the robot from exerting
excessive pressure on the fingertips. Many state-of-the-art grasping
techniques either try to avoid the contact between the robot and
the support surface or are not capable of accounting for such
contact, and hence fully ignore it.

[Fig. 12 plots: finger status (stuck / reached waypoint) for
F1-F3, finger trajectories, wrist force (tared), and hand linear
velocity over time.]
Fig. 12: Force compliant grasping of a pipe from a table top

The experiments presented here show that the coordination be-
tween the fingers and the hand movement plays a crucial role in
achieving robust grasps of small objects. The simple yet effective
compliant grasping primitive presented in this work successfully
achieves this goal, as shown through numerous experiments.
Figure 13 shows representative examples of experiments we have
performed to grasp a variety of small objects: a screwdriver, a
pen, a cellphone, etc. As shown, the grasp used is a precision
grasp using the fingertips to restrain the object. These successful
results were not achievable without maintaining contact between
the fingertips and the support surface and coordinating finger tra-
jectories, behaviors which are fully integrated into our compliant
grasping approach. Moreover, to compensate for uncertainties in
placing the fingertips on the support surface, the finger landing
primitive was used prior to grasp execution.

C. Object Release Experiments
The last set of our experiments demonstrates the compliant
object release and placement. The proposed release strategy
has been successfully applied to release objects from both
enveloping as well as precision (fingertip) grasps. Due to space
limitations we do not provide the plots explaining the details of
the release strategy. However, as we noted before, the release
strategy heavily borrows from the compliant grasping technique,
and one can view it as grasp execution in reverse order. A
representative example from the numerous experiments which we
performed is shown in Fig. 14. The proposed release technique
is highly robust to the uncertainties in localizing the height and
orientation of the support surface, and manages to gently release
and replace objects on the support surface without damaging the
fingers.
We conducted a series of grasp repeatability experiments on
common objects found in our lab; namely a D-battery Maglite,
a foot-long screwdriver, and a standard hammer. The objects
were randomly positioned on a table, within the robot's reach.
The perception system was first used to localize the object,
followed by the sequence of motions to position the hand at
the grasp launch pose. From that point, the sequence of landing,
grasping, transportation and release was executed, and the success
rate measured. The results are shown in Table I. Success rates
of 92% and above show that the approach is robust. Note
that the vast majority of the failures were due to early testing
failures, where faults in our finger reset procedures caused
fingers to jam when contacting the table. The aforementioned
compliant grasping primitives were devised and successfully
implemented to address grasping tasks as a part of the DARPA
Autonomous Robotic Manipulation challenge [1]. A number
of grasping tests (hammer, screwdriver, Maglite, and shovel)
were performed independently by DARPA on a different robot
using the software we provided. For those tests, careful attention
was given to finger calibration, and our success rate for the
objects mentioned was 100%, even on objects not previously
encountered.

TABLE I: Compliant grasp performance on sample objects

Statistics      Maglite   Screwdriver   Hammer
Num. Trials     24        30            35
Success Rate    92%       93%           97%

Fig. 13: Examples of force compliant grasping experiments

Fig. 14: An example of compliant object release/placement

VII. CONCLUSION
We addressed the problem of grasping objects that are small
relative to an anthropomorphic hand, including a pen, screw-
driver, cellphone, and hammer, from their natural poses on a
supporting surface, e.g., a table top. We argue that contact
with support surfaces is critical for grasping small objects. We
devised three simple, yet effective, manipulation primitives for
robust grasping (and releasing) of small objects from support
surfaces: (1) Compliant Finger Placement for bringing all fingers
safely into contact with the support surface, (2) Compliant Object
Grasping for maintaining the contact between the fingertips and
the support surface during the finger closure, and (3) Compliant
Object Release. We conducted extensive grasping experiments
on a variety of small objects with similar shape and size. The
results demonstrate that our approach is robust to localization
uncertainties and highlight the benefits of compliant, contact
driven control strategies for grasping tasks.

ACKNOWLEDGMENTS
The authors gratefully acknowledge funding under the
DARPA Autonomous Robotic Manipulation Software Track
(ARM-S) program.

REFERENCES
[1] J. A. Bagnell, F. Cavalcanti, L. Cui, T. Galluzzo, M. Hebert,
    M. Kazemi, M. Klingensmith, J. Libby, T. Y. Liu, N. Pollard,
    M. Pivtoraiko, J.-S. Valois, and R. Zhu. System design and
    implementation for autonomous robotic manipulation. In IROS,
    2012.
[2] A. Bicchi and V. Kumar. Robotic grasping and contact: a review.
    In ICRA, pages 348–353, 2000.
[3] R. C. Brost. Automatic grasp planning in the presence of
    uncertainty. Int. J. Robot. Res., 7(1):3–17, 1988.
[4] R. Diankov. Automated Construction of Robotic Manipulation
    Programs. PhD thesis, Carnegie Mellon University, Robotics
    Institute, August 2010.
[5] M. R. Dogar and S. S. Srinivasa. Push-grasping with dexterous
    hands: Mechanics and a method. In IROS, pages 2123–2130, 2010.
[6] M. A. Erdmann and M. T. Mason. An exploration of sensorless
    manipulation. IEEE J. Robot. Autom., 4(4):369–379, 1988.
[7] K. Harada, M. Kaneko, and T. Tsujii. Rolling-based manipulation
    for multiple objects. IEEE Trans. Robot. Autom., 16(5):457–468,
    2000.
[8] K. Hsiao, L. Kaelbling, and T. Lozano-Perez. Robust grasping
    under object pose uncertainty. Autonomous Robots, 31(2-3):1–16,
    2011.
[9] M. Kalakrishnan, L. Righetti, P. Pastor, and S. Schaal. Learning
    force control policies for compliant manipulation. In IROS, pages
    4639–4644, 2011.
[10] A. Liegeois. Automatic supervisory control of the configuration
    and behavior of multibody mechanisms. IEEE Trans. Syst. Man
    Cybern., SMC-7(12):868–871, 1977.
[11] M. T. Mason. Compliance and force control for computer
    controlled manipulators. IEEE Trans. Syst. Man Cybern.,
    11(6):418–432, 1981.
[12] A. T. Miller and P. K. Allen. Graspit! A versatile simulator for
    robotic grasping. IEEE Robot. Autom. Mag., 11(4):110–122, 2004.
[13] J. Nakanishi, R. Cory, M. Mistry, J. Peters, and S. Schaal.
    Operational space control: a theoretical and empirical comparison.
    Int. J. Robot. Res., 27(6):737–757, 2008.
[14] V.-D. Nguyen. Constructing stable grasps. Int. J. Robot. Res.,
    8(1):26–37, 1989.
[15] R. Platt Jr., A. H. Fagg, and R. A. Grupen. Null-space grasp
    control: Theory and experiments. IEEE Trans. Robot.,
    26(2):282–295, 2010.
[16] J. M. Romano, K. Hsiao, G. Niemeyer, S. Chitta, and K. J.
    Kuchenbecker. Human-inspired robotic grasp control with tactile
    sensing. IEEE Trans. Robot., 27(6):1067–1079, 2011.
[17] J. C. Trinkle, J. M. Abel, and R. P. Paul. An investigation of
    frictionless enveloping grasping in the plane. Int. J. Robot. Res.,
    7(3):33–51, June 1988.
Multi-Stage Micro Rockets for Robotic Insects
Mirko Kovač, Maria Bendana, Rohit Krishnan, Jessica Burton, Michael Smith, Robert J. Wood
[email protected], [email protected]
Micro Robotics Laboratory (http://micro.seas.harvard.edu)
Harvard University

Abstract—One of the main challenges for sustained flight of
current aerial micro robots is the low energy density available
from common power sources. In this paper we propose solid
rocket fuel powered micro thrusters as a high energy density
actuation method for aerial micro robots. In a multi-stage
configuration these thrusters can be used for intermittent flight,
which can decrease the energetic cost of locomotion. In particular
we focus on the fabrication method and characterization of multi-
stage micro thrusters with diameters of 3mm and 6.4mm.
We demonstrate a sustained and repeatable thrust force of
up to 35mN for a duration of up to 42s, and multi-stage
designs with a time delay of up to 4.7s between the propulsion
phases. Furthermore, we present a take-off trajectory of a 10cm
rocket glider with an integrated micro thruster as propulsion
mechanism, showing that the technologies developed can be used
to successfully power micro robots in flight. Future work will
focus on control and flight dynamics of micro thruster powered
gliders. Wider applications of similar thrusters include other
robotic applications where low weight and high force are
important, such as for jumping or running robots.1

1 Video footage of the micro thrusters and the rocket glider in operation can
be found at http://www.youtube.com/watch?v=bDRd8-xq0P4

Fig. 1. Rocket glider with 10cm wing span and a mass of 2g with an
integrated multi-stage micro thruster.1

I. INTRODUCTION
The development of flying robotic insects is one of the
grand challenges in robotics research. The applications of such
robots include mobile distributed sensor networks, assisted
agriculture, hazardous environment exploration, and search
and rescue. Several projects are underway which intend to
build insect size flying robots [1, 2, 3], but none of those
systems has demonstrated sustained self-powered flight. One
of the main reasons for this limitation is the high energy
demand for sustained flight at small scales as well as the very
low energy density of currently available power sources, such
as small Lithium Polymer batteries. In order to increase the
range of insect-size flying robots, we propose a hybrid flight
mode as well as a novel implementation of high energy density
rocket propulsion. We then integrate these micro thrusters on
a new generation of robotic butterflies, as illustrated in Figure 1.
Based on observations of flying animals and mathematical
modeling of their flight energetics, it has been suggested that
adopting a hybrid flapping-gliding flight mode, as opposed
to continuous propulsion, can be a promising locomotion
strategy to minimize the energetic cost of transport [4, 5].
In fact, many animals such as birds, dragonflies, locusts,
bats and butterflies employ intermittent flight, i.e. alternating
propelled and gliding flight, as their aerial locomotion mode.
Although recent projects have explored gliding flight for
aerial micro robots [6, 7, 8, 9, 10], very little research has
addressed intermittent flight and its implications for the design
of flying micro robots. In this publication, we present our
first steps towards the development of a family of multi-stage
micro thrusters using solid rocket fuel that can be used to
study intermittent locomotion in aerial microrobots. As micro
thrusters we define combustion based jet propulsion systems
with a total mass of less than 10g. Micro thrusters are very
interesting for the propulsion of flying robots due to their
compact and mechanically simple design, lacking moving parts,
which is very important for fabrication at the micro scale.
Furthermore, multi-stage micro thrusters can be used to
conveniently study optimal duty cycles for energy efficient
intermittent flight.
The technology of micro thrusters has been developed
largely with the motivation of space applications, such as for
nano satellites with masses in the range of 20kg. The goal is
the generation of a very precise and repetitive propulsion that
can be used to position and propel satellites [11, 12, 13] or
deploy Smart Dust sensors [14]. Other projects have focused
on optimizing combustion chamber designs of micro thrusters
for use with liquid fuel [15, 16] and solid propellant [17]. Fur-
ther studies have optimized wall losses and nozzle geometries
for micro thrusters using numerical simulations [18, 19]. An
overview of small scale combustion-based propulsion can be
found in the recent review article by Ju et al. [20].
Recently, Bergbreiter et al. [21] have demonstrated a jumping
micro robot which uses on-board combustion to initiate a jump,
with the propellant embedded in nano-structured silicone.
Although these projects are promising and successfully
apply the technology of micro thrusters, no implementation has
been presented yet which applies these technologies to flying
robots. In particular, the integration of a multi-stage micro
thruster is a novelty in the field, combining modulated thrust
propulsion with a biologically inspired locomotion strategy.
In the coming sections, we start by presenting an overview
of different options for the fuel that we considered for the
micro thrusters and the rationale for the nozzle geometry
that we chose. We then compare five fabrication procedures
for the combustion chamber and nozzle and describe the
implementation of the ignition mechanism. Furthermore, we
present the micro thruster prototypes in single and multi-stage
configurations and characterize their force profiles. Finally, we
demonstrate successful take-off and flight of a micro thruster
powered rocket glider.

II. ROCKET PROPULSION BASICS
Rocket propulsion engineering is a well developed and very
complex field which is largely motivated by space travel and
military applications. From an engineering perspective it is
particularly challenging to design optimal rockets due to the
interrelation of combustion physics, propellant chemistry, ma-
terial science, heat transfer and phase change effects, as well as
fluid dynamics, including compressive flows in the supersonic
nozzles. In this section we provide the basic definitions which
will be referred to later when presenting the design and the
results of our micro thrusters. More details on the state of the
art and history of rocket design can be found in [22].
The thrust F produced by a rocket depends on the mass flow,
the exit velocity, the pressure inside the combustion chamber and
in the environment, as well as the cross sectional area of the
nozzle. This relationship can be expressed as:

F = ṁ ve + (pe − p0) Ae    (1)

where ṁ is the changing mass due to the propellant being
burned and exiting the nozzle, ve is the exit velocity, pe is
the pressure of the exiting gas, p0 is the pressure outside the
rocket, and Ae is the area ratio from the throat to the exit
[22]. The first term is the momentum thrust and the second
term is referred to as the pressure thrust. The throat of a
rocket nozzle is defined as the smallest cross-sectional area
of the nozzle. Figure 2 depicts a schematic overview of the
combustion chamber and nozzle of a rocket engine. At the
throat, the flow has a Mach number of one, indicating that
the flow velocity equals the speed of sound, which is further
accelerated in the diverging section of the nozzle and typically
reaches velocities of several kilometers per second.

Fig. 2. Schematic representation of the combustion chamber of a rocket. Tc,
Ac and pc refer to the temperature, cross sectional area and pressure inside
of the combustion chamber. Ae and pe refer to the cross sectional area and
the pressure at the end of the nozzle. p0 is the environmental pressure.

A critical factor used to analyze the effectiveness of a rocket
engine and/or a propellant is the total impulse, which is defined
as the thrust force integrated over the burning time:

Ist = ∫₀ᵗ F dt.    (2)

Another important factor used to analyze the effectiveness of a
rocket propulsion system is the 'specific impulse'. It is defined
as the total impulse per unit weight of propellant:

Isp = F / (ṁ · g)    (3)

with ṁ as the mass flow through the nozzle and g as the
gravitational constant. The specific impulse indicates how
much propulsion was transferred from the propellant to useful
thrust, and it includes losses of heat through the walls, flow
effects, etc. The impulse to weight ratio will be used in the
coming sections to evaluate different rocket body designs. It
is defined as:

Iw = Ist / (mc · g)    (4)

with mc as the combustion chamber mass of the micro
thruster. The impulse to weight ratio indicates which rocket
construction method for the combustion chamber leads to the
highest thrust produced for its weight and burn time.

A. Challenges at small scales
The most prominent effect of small scale is that the heat
transfer through the walls is much more pronounced as com-
pared to larger rockets. For larger rockets it corresponds to
around 2-5% [22] of the total energy, whereas for micro thrusters
it can reach more than 10% [23, 24, 25]. This is attributed
to the unfavorable scaling of volume to surface area with
decreasing size and has been found to be as dramatic as
extinguishing the combustion due to the large heat transfer
through the micro thruster walls. This effect is referred to as
flame quenching and has prevented the reliable operation of
silicon wafer based micro thrusters in the Smart Dust project
[18]. Another effect that is pronounced at small scales is
the boundary layer phenomenon in the supersonic nozzle area,
which can lead to partial blockage of the flow and a decrease
in efficiency [26]. A further challenge is the fabrication and
assembly of the micro thrusters. Conventional large scale
systems use a variety of materials and integration methods
to ensure the structural and thermal integrity of the rocket.
For example, they use bolts and screws, large composite fiber
layups, polymer liners to seal the inside of the combustion
chamber, as well as graphite inserts at the nozzle throat. Many
of those fabrication methods and material choices are not
possible at the scale of several millimeters or have not been
developed yet. A major part of the research needed in the
development of micro thrusters is therefore the material choice
for the combustion chamber and nozzles as well as the choice
of appropriate micro fabrication techniques.
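Eqs. (1)-(4) can be collected into a small set of helper functions for working with a measured thrust curve. This is a sketch under our own assumptions: the trapezoidal integration of sampled data is our choice, not the authors', and the sample numbers in the note below come from the abstract.

```python
import numpy as np

G = 9.81  # gravitational constant g [m/s^2]

def thrust(mdot, v_e, p_e, p_0, A_e):
    """Eq. (1): momentum thrust plus pressure thrust."""
    return mdot * v_e + (p_e - p_0) * A_e

def total_impulse(t, F):
    """Eq. (2): I_st = integral of the thrust F over the burn time,
    approximated here with the trapezoidal rule on sampled data."""
    t, F = np.asarray(t, float), np.asarray(F, float)
    return float(np.sum(0.5 * (F[:-1] + F[1:]) * np.diff(t)))

def specific_impulse(F, mdot):
    """Eq. (3): I_sp = F / (mdot * g), thrust per propellant weight flow."""
    return F / (mdot * G)

def impulse_to_weight(I_st, m_c):
    """Eq. (4): I_w = I_st / (m_c * g), with m_c the combustion chamber
    mass; used to compare chamber construction methods."""
    return I_st / (m_c * G)
```

For example, a constant 35mN thrust held for 42s (the figures quoted in the abstract) integrates to a total impulse of about 1.47 N·s.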
III. PROPELLANT

Various rocket propulsion technologies have been developed, including liquid bi-propellants, nuclear thermal, and solid rocket propulsion. The most widely used rocket fuels for small scale systems are solid composite propellants. Their main advantage compared to liquid multicomponent fuel systems is that they are relatively easy to implement and control. The oxidizer and the fuel are both embedded in the grain of a castable solid material, eliminating the need for moving parts or complex valves.

The chemical composition of the solid rocket fuel can consist of a variety of materials using different oxidizer-fuel combinations and additives which act as binders and catalysts. An overview of different chemical compositions that are commonly used in existing solid fuel rocket systems can be found in [22, 27]. We choose to use Ammonium Perchlorate Composite Propellant (APCP) because of its high specific impulse, constant burning rate, good curing properties, relative ease of fabrication, and commercial availability. Higher energy fuels such as HMX are based on solid rocket fuels like APCP but increase the specific impulse by adding high-energy explosives to the mix, which leads to increased potential hazards during fabrication and operation. Their use is therefore typically limited to high performance military missiles. The specific APCP composition that we use is described in detail in [27]. Each batch contained 12% R45-HLO resin (HTPB) as the fuel, 6% 2-ethylhexyl acrylate (EHA) as the plasticizer, 0.5% Tepanol (HX-878) as the bonding agent, 70% ammonium perchlorate (AP, 200μm grain size) as the oxidizer, 1% ferric oxide (Fe2O3) as the combustion catalyst, and 10.5% isophorone diisocyanate (IPDI) as the curing agent. The combustion temperature, and therefore the specific impulse, can be increased by adding aluminum particles to the composition, which act as fuel. However, this is discouraged [27] for micro thrusters because of the risk of clogging the nozzle with melted aluminum particles. Additionally, a higher operation temperature would be more detrimental to the combustion chamber and would increase the risk of mechanical failure.

IV. ROCKET MOTOR DESIGN

The design requirements for the micro thrusters are dictated by the strong weight limitations and high propulsion needs of flying micro robots. The goal is to develop a combination of fabrication methods and material choices which are relatively easy to apply, allow sufficient structural and thermal stability, are light-weight, and achieve a high thrust performance. In the following section we present this optimization on the design of the combustion chamber, and we introduce the design choices and construction methods of the nozzle and the ignition system.

A. Combustion chamber

Combustion of the rocket fuel takes place in the combustion chamber, where the chemical energy is converted to kinetic energy. The mechanical and thermal strain on the combustion chamber walls is extreme by any standards. Model rocket engines produce pressures of up to 40bar [28] at temperatures which can reach up to 2500°C. The main requirement for the combustion chamber of the micro thruster is to be structurally resistant to this high pressure and temperature combination and to have a low heat transfer coefficient in order to reduce energy losses through the combustion chamber wall. At the same time, the combustion chamber has to be as lightweight as possible in order to allow for usage in flight. These design requirements lead to a trade-off between structural strength and low weight.

We optimize this trade-off by testing five different combustion chamber fabrication methods. The materials employed are carbon fiber composites and castable ceramics. As a first test, we fabricate the micro thrusters with an inner diameter of 6.4mm, although the same fabrication methods and materials can be used for diameters less than 2mm.

The first design consists of a commercially purchased protruded carbon fiber (DPP) tube with a particularly high fiber content of 63%. The fiber type is Torayca T700, which is bonded using Bisphenol A Epoxy Resin. The wall thickness is 0.53mm at a weight of 0.176g/cm. These carbon fiber tubes are, to the best of our knowledge, the structurally strongest that can be purchased off the shelf at this scale and weight. However, the fibers in these tubes are oriented longitudinally at 0°, which is not the optimal orientation based on pressure vessel theory. In fact, it can be shown theoretically [29] that 54.74° is the best fiber orientation angle with respect to the long-axis of the cylinder to sustain high internal pressures. In order to calculate this angle, let us assume a cylindrical pressure vessel which is constructed by filament winding, where the fibers are laid down at a helical angle α as shown in figure 3. To balance the hoop and axial stresses, the tension in an infinitesimal fiber element must satisfy the following two relations, which need to be solved simultaneously:

nT sin α = (pr/b)(1)(b)    (5)

nT cos α = (pr/2b)(tan α)(b)    (6)

with n being the number of fibers, T the tension in the fibers, p the internal pressure, and b the wall thickness. After dividing the first by the second equation and rearranging, we obtain:

tan²α = 2,  αopt = 54.74°    (7)

where αopt is the angle that renders the vessel twice as strong circumferentially as it is axially. More advanced models of fiber-reinforced composite pressure vessels have been developed by Parnas et al. [30], who consider thermal loading in addition to internal pressure. The optimal angle in this case ranges between 52.1° and 54.2° depending on the vessel geometry and failure criteria.

The second design is a custom fabricated carbon fiber layup with an optimal winding angle of 54°. The fabrication method is a four step process: First, a 15.0 cm long sheet of 23.5gsm (grams per square meter) prepreg carbon fiber and C3C Cyanate
Fig. 3. The optimal fiber angle based on pressure vessel theory is α = 54.7°. (A) Schematic for the fiber layup of the micro thrusters. (B) Fabricated micro thruster with a 54° carbon fiber layup.

Fig. 4. The fabricated micro thruster has built-in ignition and a high strength composite nozzle. The fabrication method allows fabricating micro thrusters with sizes of less than 1.5mm.
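The optimal winding angle in the Fig. 3 caption can be reproduced directly from equations (5) through (7): dividing the hoop balance by the axial balance cancels n, T, p, r, and b, leaving tan²α = 2. A minimal numerical check:

```python
import math

# Netting analysis of a filament-wound cylinder (equations 5-7 in the text):
#   hoop:  n*T*sin(a) = (p*r/b) * (1) * (b)
#   axial: n*T*cos(a) = (p*r/(2*b)) * tan(a) * (b)
# Dividing the two balances eliminates n, T, p, r, and b, so tan(a)^2 = 2
# regardless of pressure, radius, or wall thickness.

alpha_opt = math.degrees(math.atan(math.sqrt(2.0)))
print(round(alpha_opt, 2))  # 54.74 degrees, as quoted in the text
```

The result is purely geometric, which is why the same 54.7° figure appears across filament-winding references; thermal loading shifts it slightly, as noted for [30].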
Ester Epoxy is cut to size, laid in a 54° fiber orientation, and rolled 7 times on a metal rod which has previously been coated with Fluorinated ethylene propylene (FEP). Second, the carbon fiber sheet is placed into an FEP shrink wrap tube, secured at one end, and the air is removed. Third, the assembly is heated using a heat gun to hold the carbon fiber in place. The assembly is then placed into a high temperature oven and heat cured at 235°C for 2.5 hours. Finally, the carbon fiber tube is released from the FEP shrink tube and the metal rod and is cut to the desired length. The result is a carbon tube with an inner diameter of 6.4mm, a wall thickness of 0.42mm, and a mass of 0.133g/cm. The fiber content is 66%, leading to a very resilient and lightweight rocket combustion chamber.

The third design is the same as the second, but with a 45° fiber angle in order to account for the effects of a possible fabrication variability on the layup angle during fabrication and to test the calculations from pressure vessel theory.

The fourth design is based on a commercial DPP tube which is dip coated on the inside in a fiber base Al2O3 ceramic (Cotronics Resbond 901). The properties of this ceramic are a very low thermal conductivity of only 0.22W/m·K and a high service temperature of 1430°C. The resulting tube has a ceramic layer of 0.12mm on the inside and the outside of the DPP tube. The final mass is 0.208g/cm.

The fifth design is based on a commercial DPP tube which is spin coated on the inside wall with a single component ceramic composite based on Mica platelets and proprietary Al2O3 ceramic binders (Cotronics Resbond 907). Compared to Resbond 901, the 907 offers a lower service temperature of 1260°C and a higher thermal conductivity of 0.865W/m·K. The fabricated combustion chamber has a total wall thickness of 0.53mm and a mass of 0.175g/cm.

B. Ignition

The thermodynamics of igniting composite propellants is very complex and is summarized in [22]. The combustion dynamics depends on the grain size, the mixture ratios of the different components, and local heating and phase transition effects. A detailed characterization is outside the scope of this paper, and the interested reader is referred to [12], which discusses this topic in detail and reviews different ignition technologies. The ignition method used for the micro thrusters in this paper is based on Joule heating with three integrated small surface mount resistors (figure 4). The complete igniter has a mass of only 0.12g. Reliable ignition is achieved and requires a heating power of 2W for a duration of 1s. A 0.3g 10mAh LiPo battery can theoretically sustain 2W for 7.2s and could be used to ignite the micro thrusters. More energy efficient igniter solutions are possible by using electrically activated pyrotechnic initiators. Common compositions of such initiators consist of zirconium-potassium perchlorate or boron-potassium nitrate.

C. Nozzle

The purpose of the expanding section of the nozzle is to further accelerate the flow after the throat area and create positive pressure which can be used to increase thrust. The main requirement for the nozzle wall is structural and thermal stability during operation and a smooth surface allowing for sustained flow attachment. The nozzle wall should therefore be smooth and have a smooth transition from the throat to the nozzle in order to avoid abrupt pressure changes, which could lead to shock waves that can choke the flow through the nozzle and decrease thrust. Traditionally, rocket nozzles are made from high temperature steel or fiber composite materials, with graphite insertions at high abrasion areas such as the throat [22]. A nozzle opening angle of 30° is an empirical design standard and has been determined to be optimal for micro thrusters [19] using numerical simulations.

The nozzles for the micro thrusters presented in this paper have been designed to be (i) light weight, (ii) easy to fabricate at small scales, (iii) structurally resistant at high temperatures, and (iv) easy to integrate with the combustion chamber. The chosen design uses layered glass fiber as the structural material with a high thermal resistance, high temperature epoxy as the bonding agent to integrate the glass fiber mesh, and graphite grains to increase the thermal resistance of the composite. A cut CAD design of a nozzle is depicted in figure 5.A. The fabrication process begins by machining twelve rings of 0.128mm wear resistant FR4 sheets using a CO2 laser (Versa Laser 2.3). The rings are assembled using a precision pin alignment method [31]. The layers are laminated using high temperature epoxy with 20% graphite powder for two hours at 120°C and four hours at 235°C under a weight of 50kN/m². Subsequently, the nozzle is released from its pin alignment structure and is ready to be integrated with the combustion
Fig. 5. The nozzle consists of a combination of glass fiber laminate with embedded high temperature epoxy and graphite for high strength and structural integrity at high temperatures. A: CAD design, B: Fabricated nozzles with a U.S. quarter coin for size comparison, C: Micro thruster in operation.

Fig. 6. Micro thruster in a multi-stage configuration.
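As a geometric aside to the nozzle description, the exit diameter of a conical diverging section follows directly from the 30° opening angle mentioned in the text. The throat diameter and section length below are assumed illustration values, not the paper's actual geometry:

```python
import math

# Exit diameter of a conical nozzle with a 30 degree total opening angle
# (15 degree half-angle), as used for the micro thrusters in the text.
# Throat diameter and diverging-section length are hypothetical values.

def exit_diameter(throat_d_mm, length_mm, opening_angle_deg=30.0):
    """Diameter at the end of a conical diverging section, in mm."""
    half_angle = math.radians(opening_angle_deg / 2.0)
    return throat_d_mm + 2.0 * length_mm * math.tan(half_angle)

# Assume a 1 mm throat and a diverging section as long as the twelve-layer
# FR4 stack (12 x 0.128 mm, about 1.54 mm) purely for illustration.
d_exit = exit_diameter(throat_d_mm=1.0, length_mm=12 * 0.128)
print(round(d_exit, 2))  # about 1.82 mm
```

Such a quick estimate is useful when checking whether a laminated nozzle stack can accommodate a target expansion ratio at this scale.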
chamber. The final fabricated nozzles can be seen in figure 5.B on a U.S. quarter coin for size reference. A Scanning Electron Microscope (SEM) image of the fabricated nozzle is depicted in figure 5.C. Once the nozzle is fabricated, it is embedded into the combustion chamber using the same high temperature epoxy graphite mix adhesive. The final rocket motor in operation is shown in figure 5.D.

D. Multi-stage design

The multi-stage design enables thrust modulation by reducing the thrust force temporarily over a fraction of a second to several seconds. We considered several methods where the burning rate of the propellant is modulated using a combustion cooler, such as sodium bicarbonate, which reduced the thrust output but at the cost of a less stable combustion. Other methods include hybrid propellant designs with a low performance propellant inserted between the stages. In particular, we tested potassium nitrate mixed with sulfur and charcoal compositions, which are also used as the delay charge in model rockets. We also tested potassium nitrate sucrose composite propellant, which is a common hobbyist rocket fuel and provides a lower specific impulse than APCP. The main challenge in such a design was to ensure flame transition between the different propellants, where each has a different flame and melting temperature and burning rate. In addition, the drawback is that a fraction of the propellant is intentionally designed to give little thrust, which adds weight to the micro thruster without adding impulse to it.

After evaluating the methods described above, we decided to opt for a novel multistage design where we modify the geometry of the combustion chamber by introducing combustible delay charges which allow for temporal thrust modulation. The delay charge consists of layered cellulose fiber rings which are glued together using acetone-based spray adhesive. A three stage micro thruster design is illustrated in figure 6. The inner ring of the delay charge is 2mm in diameter and filled with APCP, which ensures that the combustion continues between the two stages. The delay charge itself is highly combustible and reduces the thrust force for a limited time due to the temporal change in combustion chamber geometry. As soon as the second stage ignites, the delay charge burns away and allows the second stage to use the entire combustion chamber. The main advantage of this method for the delay charge design is that it is very simple to integrate and that it allows for a large range of thrust modulation by changing the inner ring diameter and thickness of the delay charge. Further, it does not require a flame transition between different propellant compositions, which reduces the risk of flame extinction at the interface of the delay charge and the main propellant.

Fig. 7. The experimental setup allows measuring the thrust force of the micro thrusters in a controlled and enclosed environment.

V. RESULTS

The experimental setup (figure 7) is enclosed in a fume hood in order to contain the exhaust gases of the micro thruster. The micro thruster is clamped on a carbon fiber lever arm which rotates around its axis on a low friction ball bearing. The lever arm is connected to a computer controlled single axis force sensor which has a resolution of 0.1mN and a measurement bandwidth of 10Hz. The lever arm and the force sensor are encased in a transparent protective cage to reduce the measurement noise due to the air flow caused by the ventilation system of the fume hood. The impulse is calculated by integrating the thrust force over time as described in section II. In this section we start by comparing the performance of the different combustion chamber fabrication methods and materials. Further, we characterize the scaling behavior of the micro thrusters by fabricating and testing different lengths and diameters. Finally, we characterize multi-stage configurations and demonstrate successful take-off and flight of a 10cm rocket glider with micro thruster based propulsion.

A. Combustion chamber materials

In order to decide which combustion chamber fabrication method and material is best for the application of intermittent
Fig. 8. Force profiles for the combustion chambers fabricated with different materials and fiber layup angles. The force profiles are more erratic for the ceramic liners, indicating irregular combustion.

Fig. 9. The design with a 54.7° fiber orientation exhibits the best impulse to mass ratio, 21% higher than the commercially purchased carbon fiber tubes.

Fig. 10. Force profiles for different lengths for the micro thrusters with 6.4mm diameter (A) and 3mm diameter (B). The maximal thrust duration of a 60mm micro thruster is 42s.

flight for small robots, we measure the impulse per mass for every motor using the same amount of propellant. Each combustion chamber is filled with 1.4g of APCP. The force profiles for the different designs are shown in figure 8 and the impulse to mass ratio in figure 9. It can be seen that the design with a 54.7° fiber orientation exhibits the best impulse to mass ratio, 21% higher than the commercially purchased carbon fiber tubes. The ceramic coatings do not improve the impulse to mass ratio. Furthermore, their force profiles are more erratic, indicating irregular combustion, which we suspect to be caused by flaking of the ceramic liner during combustion.

B. Scaling

In order to characterize the scaling behavior and repeatability of the micro thrusters, we fabricate samples with lengths between 15mm and 60mm for the micro thrusters with 6.4mm diameter (figure 10.A) and with 3mm diameter (figure 10.B). It can be seen that all lengths have a similar maximal thrust plateau of around 20mN for the larger thrusters and 8mN for the smaller design, with an initial force peak at the start of the burn. This initial force overshoot is a known phenomenon and is attributed to the surface area of the flame front. It peaks as the flame front becomes a cone coming out of the nozzle into the body and then flattens out as it hits the body casing and burns at a constant rate. The maximal burn duration for the
Fig. 11. The multi-stage configuration allows thrust modulation with a delay time of up to 4.7s between the propulsion phases. A: Thrust force results for different delay charge thicknesses (4 layers, 1.29mm; 8 layers, 2.48mm; 12 layers, 3.67mm). Each thickness has been repeated three times; the filled line represents the mean value, the shaded area the standard deviation. B: Delay charge thickness vs. measured time delay. C: Thrust profile of a three stage micro thruster.
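The quantities reported around Fig. 11 (total impulse, and the delay time during which the thrust stays below 10mN) can be recovered from a sampled force profile by numerical integration and thresholding. The sketch below uses a synthetic two-stage profile sampled at the force sensor's 10Hz rate; the profile values are illustrative, not the measured data:

```python
# Impulse and inter-stage delay from a sampled thrust profile (10 Hz sensor).
# The profile below is synthetic: 2 s at 20 mN, 3 s below 10 mN, 2 s at 20 mN.

def impulse(forces_mN, dt):
    """Trapezoidal integral of thrust over time, returned in mN*s."""
    total = 0.0
    for f0, f1 in zip(forces_mN, forces_mN[1:]):
        total += 0.5 * (f0 + f1) * dt
    return total

def delay_time(forces_mN, dt, threshold_mN=10.0):
    """Total time the thrust stays below the threshold (sample counting)."""
    return sum(dt for f in forces_mN if f < threshold_mN)

dt = 0.1  # 10 Hz force sensor
profile = [20.0] * 20 + [2.0] * 30 + [20.0] * 20  # thrust samples in mN

print(round(delay_time(profile, dt), 2))  # 3.0 s below the 10 mN threshold
```

Dividing the integrated impulse by the propellant weight (m·g) would then give the specific impulse quoted later in the paper.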
single stage micro thruster to a micro glider with a wingspan of 10cm (figure 1). The wing loading of this glider is 5.5N/m², which is in the same order of magnitude as large butterflies or bats. As wing material we use glass fiber reinforced epoxy FR4 laminate sheets with a thickness of 250μm. A sample flight trajectory is illustrated in figure 12, where we test the robustness of the glider material to the very high thermal load on the micro thruster body as well as the force acting on the glider. The glider performs a successful and stable take-off from a starting ramp, although it is not yet fully balanced, which leads to a pitch instability after around 500ms flight time. These results indicate that the chosen glider materials and integration methods are suitable for micro thruster powered gliders and that the developed micro thrusters have largely sufficient thrust to propel flying micro robots with a mass of several grams. Future work will focus on multi stage designs and control of the rocket gliders using piezo or Shape Memory Alloy based steering actuators such as in [8, 10].

Fig. 12. Successful micro thruster powered take-off trajectory of a 10cm rocket glider with a mass of 2g.

VI. CONCLUSION AND FUTURE WORK

In this paper we presented the development and integration of novel multi-stage micro thrusters which can be used for the propulsion of the next generation of energy efficient micro robots. We demonstrate a sustained and repeatable thrust force of up to 35mN for a duration of up to 42s, and multi-stage designs with a time delay of up to 4.7s between the propulsion phases. The average specific impulse for the developed micro thrusters is 27.2s. Based on the characterization and the successful integration and take-off trajectory of these micro thrusters on a glider testbed, we conclude that solid rocket propulsion is a promising actuation method for robotic insects. It can as well be used for other systems which require high force and low weight, such as jumping or running robots. Future work will focus on the mathematical modeling and experimental characterization of the flight phases of rocket gliders to study optimal duty cycles for intermittent flight.

ACKNOWLEDGMENTS

The authors would like to thank Prof. J. C. Oxley and Dr. Patrick Bowden from Rhode Island University for the helpful discussions on solid rocket fuel chemistry, as well as Nelly Briquet for the assistance with the thruster experiments. Furthermore, we thank Steve Sansone for the conduct of our flight tests in the particle accelerator facility. We would also like to thank Dr. James Weaver for taking the SEM images of the nozzles. This work is partially funded by the Wyss Institute for Biologically Inspired Engineering (award number ONR N00014-10-1-0684).

REFERENCES

[1] R. J. Wood, "The first takeoff of a biologically inspired at-scale robotic insect," IEEE Transactions on Robotics, vol. 24, no. 2, pp. 341–347, 2008.
[2] V. Arabagi and M. Sitti, "Simulation and analysis of a passive pitch reversal flapping wing mechanism for an aerial robotic platform," in IEEE/RSJ International Conference on Intelligent Robots and Systems, 2008, pp. 1260–1265.
[3] H. Tanaka, K. Matsumoto, and I. Shimoyama, "Design and performance of micromolded plastic butterfly wings
on butterfly ornithopter," in IEEE/RSJ International Conference on Intelligent Robots and Systems, 2008, pp. 3095–3100.
[4] J. Rayner, P. W. Viscardi, S. Ward, and J. R. Speakman, "Aerodynamics and energetics of intermittent flight in birds," American Zoologist, vol. 41, no. 2, pp. 188–204, 2001.
[5] J. J. Videler, D. Weihs, and S. Daan, "Intermittent gliding in the hunting flight of the kestrel Falco Tinnunculus," Journal of Experimental Biology, vol. 102, pp. 1–12, 1983.
[6] K. Peterson, P. Birkmeyer, R. Dudley, and R. S. Fearing, "A wing-assisted running robot and implications for avian flight evolution," Bioinspiration & Biomimetics, vol. 6, 2011.
[7] M. Kovac, W. Hraiz, O. Fauria, J.-C. Zufferey, and D. Floreano, "The EPFL jumpglider: A hybrid jumping and gliding robot," in IEEE International Conference on Robotics and Biomimetics, 2011, pp. 1503–1508.
[8] R. J. Wood, S. Avadhanula, E. Steltz, M. Seeman, J. Entwistle, A. Bachrach, G. L. Barrows, S. Sanders, and R. S. Fearing, "Design, fabrication and initial results of a 2g autonomous glider," in IEEE Industrial Electronics Society Meeting, 2005.
[9] M. Woodward and M. Sitti, "Design of a miniature integrated multi-modal jumping and gliding robot," in IEEE/RSJ International Conference on Robotics and Automation, 2011, pp. 556–561.
[10] M. Kovac, A. Guignard, J.-D. Nicoud, J.-C. Zufferey, and D. Floreano, "A 1.5g SMA-actuated microglider looking for the light," in IEEE International Conference on Robotics and Automation, 2007, pp. 367–372.
[11] B. Larangot, V. Conedera, P. Dubreuil, T. Do Conto, and C. Rossi, "Solid propellant microthruster: An alternative propulsion device for nanosatellite," in Aerospace Energetic Equipment Conference, 2002.
[12] D. A. De Koninck, U. Bley, V. Gass, D. Briand, and N. F. De Rooij, "PyroMEMS igniter based on a temperature gradient: Concept, fabrication and characterization," in Proceedings of the 9th International Workshop on Micro and Nanotechnology for Power Generation and Energy Conversion Applications, vol. 1, 2009, pp. 108–111.
[13] K. L. Zhang, S. K. Chou, S. S. Ang, and X. S. Tang, "A MEMS-based solid propellant microthruster with Au/Ti igniter," Sensors and Actuators A: Physical, vol. 122, no. 1, pp. 113–123, 2005.
[14] D. Teasdale and K. S. J. Pister, "Solid propellant microrockets," Master's thesis, University of California at Berkeley, 2000.
[15] M. Wu, Y. Wang, R. Yetter, and V. Yang, "Liquid monopropellant combustion in mesoscale vortex chamber," Journal of Propulsion and Power, vol. 25, no. 3, 2009.
[16] L. Sitzki, K. Borer, E. Schuster, P. D. Ronney, and S. Wussow, "Combustion in microscale heat-recirculating burners," in The Third Asia-Pacific Conference on Combustion, vol. 6, 2001, pp. 11–14.
[17] E. A. Parra, K. S. J. Pister, and C. Fernandez-Pello, "A practical solid-propellant micro-thruster," in ASME International Mechanical Engineering Congress and Exposition, 2006.
[18] E. Rudnyi, T. Bechtold, J. Korvink, and C. Rossi, "Solid propellant microthruster: Theory of operation and modelling strategy," Proc. Nanotech'02, AIAA paper 2002, vol. 5755, 2002.
[19] I. Kim, J. Lee, M. Choi, and S. Kwon, "Optimum nozzle angle of a micro solid-propellant thruster," Nanoscale and Microscale Thermophysical Engineering, vol. 15, no. 3, pp. 165–178, 2011.
[20] Y. Ju and K. Maruta, "Microscale combustion: Technology development and fundamental research," Progress in Energy and Combustion Science, vol. 1, no. 1, pp. 1–47, 2011.
[21] W. A. Churaman, A. P. Gerratt, and S. Bergbreiter, "First leaps toward jumping microrobots," in IEEE/RSJ International Conference on Intelligent Robots and Systems, 2011, pp. 1680–1686.
[22] G. P. Sutton, Rocket Propulsion Elements: An Introduction to the Engineering of Rockets. New York: Wiley-Interscience, 1992, vol. 1.
[23] V. Shirsat and A. K. Gupta, "A review of progress in heat recirculating meso-scale combustors," Applied Energy, vol. 88, no. 12, pp. 4294–4309, 2011.
[24] K. L. Zhang, S. K. Chou, and S. S. Ang, "Performance prediction of a novel solid-propellant microthruster," Journal of Propulsion and Power, vol. 22, no. 1, pp. 56–64, 2006.
[25] H. Mirels, "Effect of wall on impulse of solid propellant driven millimeter-scale thrusters," AIAA Journal, vol. 37, no. 12, pp. 1617–1624, 1999.
[26] R. L. Bayt and K. S. Breuer, "Viscous effects in supersonic MEMS-fabricated micronozzles," in Proceedings of the 3rd ASME Microfluids Symposium, 1998, pp. 117–123.
[27] G. W. Purrington, Plastic Resin Bonded High Energy Rocket Fuel Systems. Basic Ingredient Study and Small Motor Production, Vol. III. Firefox Enterprises, 2001.
[28] L. H. Hertz, The Complete Book of Model Aircraft, Spacecraft and Rockets. Crown Publishers, 1967.
[29] P. M. Wild and G. W. Vickers, "Analysis of filament-wound cylindrical shells under combined centrifugal, pressure and axial loading," Composites Part A: Applied Science and Manufacturing, vol. 28, no. 1, pp. 47–55, 1997.
[30] L. Parnas and N. Katirci, "Design of fiber-reinforced composite pressure vessels under various loading conditions," Composite Structures, vol. 58, no. 1, pp. 83–95, 2002.
[31] J. P. Whitney, P. S. Sreetharan, K. Y. Ma, and R. J. Wood, "Pop-up book MEMS," Journal of Micromechanics and Microengineering, vol. 21, no. 11, pp. 115021–115027, 2011.
Feature-Based Prediction of Trajectories
for Socially Compliant Navigation
Markus Kuderer Henrik Kretzschmar Christoph Sprunk Wolfram Burgard
Department of Computer Science, University of Freiburg, Germany
{kudererm, kretzsch, sprunkc, burgard}@informatik.uni-freiburg.de
Abstract—Mobile robots that operate in a shared environment with humans need the ability to predict the movements of people to better plan their navigation actions. In this paper, we present a novel approach to predict the movements of pedestrians. Our method reasons about entire trajectories that arise from interactions between people in navigation tasks. It applies a maximum entropy learning method based on features that capture relevant aspects of the trajectories to determine the probability distribution that underlies human navigation behavior. Hence, our approach can be used by mobile robots to predict forthcoming interactions with pedestrians and thus react in a socially compliant way. In extensive experiments, we evaluate the capability and accuracy of our approach and demonstrate that our algorithm outperforms the popular social forces method, a state-of-the-art approach. Furthermore, we show how our algorithm can be used for autonomous robot navigation using a real robot.

Fig. 1. Encounter of a robot and a pedestrian. A robot that is able to predict the behavior of the pedestrian can navigate in a socially more compliant way.

I. INTRODUCTION

As the scope of application of robots is expanding from industrial labor to domestic services, mobile robots are more and more expected to share their environment with people. As a result, new challenges for their navigation systems arise. The ability to perceive the intention and to predict the behavior of people is of pivotal importance for efficient and socially compliant navigation in human environments.

Common approaches to mobile robot navigation in populated environments compute a path to the target position without taking into account the typical collaborative collision avoidance behavior of humans. Many approaches rather rely on reactive collision avoidance methods [3, 25, 22]. Other methods evaluate trajectories based on the risk of motion conflicts, given predictions about potential future places covered by people [2, 16, 29]. Although such methods effectively avoid collisions with humans, the resulting trajectories are often suboptimal with respect to the expectations of the humans due to awkward and unexpected evasive movements. In this paper, we present an algorithm for learning typical human navigation behavior. Predicting human behavior and the ability to react in a natural way will enable robots to become socially more compatible, which is relevant for mobile robots operating in populated environments including urban scenes and traffic.

Our approach is motivated by recent studies that suggest that the navigation behavior of humans is based on multiple criteria, including the time of travel and comfort, which can be expressed in terms of quantities including accelerations and proximity to other humans. These findings suggest that one can accurately model human behavior if the generated trajectories are similar to those of humans with respect to these criteria. In this paper, we present an approach to learning typical human trajectories emerging from interactive behavior. Therefore, our method considers features that correspond to natural properties of planned trajectories of several people to learn a probability distribution over possible joint trajectories that fits previously observed human behavior. We apply two methods for online trajectory planning during navigation. First, we exploit the fact that the modes of the probability distribution mainly correspond to different topological variants of the trajectories. This provides a natural separation of the search space. Second, we present a hierarchical and efficient optimization scheme to calculate the most likely trajectory.

The contribution of this paper is a novel approach to learn a model of human navigation behavior that is based on the principle of maximum entropy. The approach allows a robot to elicit the underlying probability distribution that captures human navigation behavior from the observations of pedestrians. We believe that, given this model, mobile robots will be enabled to predict the navigation behavior of people in their vicinity and to react appropriately.

II. RELATED WORK

After Goffman [8] had outlined how mutually accepted rules allow humans to navigate safely in shared environments, several authors proposed rule-based approaches that describe human behavior. Recently, Müller et al. [20] presented an application of this in the context of navigation where a mobile robot

193
detects and follows people that walk in the same direction as the robot. In contrast to this, Lerner et al. [17] infer a database of navigation rules from video data.

A more general and highly popular approach to model pedestrian motion behavior is the social forces model by Helbing and Molnar [11], which suggests that human motion can be described by forces that correspond to internal objectives of humans, such as the desire to reach a target and to avoid obstacles. Several authors used parameter learning methods to fit the model to observed crowd behavior [10, 15]. Although the social forces model performs well at simulating crowds, we found that the model poorly predicts the movements of individual pedestrians, particularly during evasive maneuvers.

Several reactive collision avoidance approaches were successfully applied to mobile robot navigation for robots that navigate in crowded or dynamic environments. Such methods include the dynamic window approach by Fox et al. [6] and the velocity obstacles by Fiorini and Shiller [4]. In 2009, the latter approach was extended to reciprocal velocity obstacles (RVO), a local reactive collision avoidance method that is guaranteed to produce collision-free paths if all agents apply this algorithm. To achieve more human-like behavior, Guy et al. [9] extend RVO by introducing response and observation time to other agents. Whereas the approaches described above seek to avoid dynamic obstacles such as pedestrians, they do not consider cooperation and human predictive abilities, which sometimes results in unnatural movements.

For this reason, several authors proposed utility-based optimization approaches that consider properties of possible trajectories in a given period of time. For instance, Arechavaleta et al. [1] present a nonholonomic motion model that minimizes the norm of the accelerations along the trajectory, assuming humans navigate as efficiently as possible. Mombaur et al. [19] add an additional cost component, penalizing the time to reach the target. Furthermore, they use inverse optimal control to learn the model parameters that best explain observed data. While these two approaches do not consider interactions between humans, Hoogendoorn and Bovy [13] propose a model for multi-agent behavior. They argue that cooperative human behavior can be modeled by optimizing a joint cost function, which is similar to the assumptions in the work presented here. Pellegrini et al. [21] present a motion model to improve the prior estimate of a tracking system. They assume that humans seek to minimize an energy function comprising the predicted minimal distance between two agents. As in the work described here, their approach takes into account the multiple topological structures of the trajectories of the different agents.

Trautman and Krause [26] point out that joint collision avoidance is crucial for mobile robot navigation to prevent the robot from "freezing" and getting stuck in densely populated environments. Their work assumes humans to be utility-optimizing agents that prefer trajectories with low cost. In contrast to many other approaches, their method is the only work that takes into account mutual interaction to plan joint trajectories. In this paper, we are also interested in the most likely joint trajectories of robots and humans. To tackle the optimization problem, the authors propose sampling from a joint density function, whereas we exploit the topological structure of the problem. Furthermore, as opposed to Trautman and Krause [26], we learn our model from observations of pedestrians to achieve more natural and human-like trajectories.

The work presented in this paper is also related to the elastic bands approach, which was introduced by Quinlan and Khatib [23] and also considers continuous trajectories to the goal position. Elastic bands aim to compute collision-free, smooth paths by gradually deforming an initially coarse path to the goal. Fraichard and Delsart [7] extend elastic bands to deform the entire trajectory in the configuration-time space, similar to our optimization approach, although they assume constant velocity models for all obstacles.

The learning approach described in this paper is inspired by maximum entropy inverse reinforcement learning, which was introduced by Ziebart et al. [28] and then used for learning pedestrian behavior in grid worlds [29, 12]. In contrast to these works, we do not assume an underlying Markov decision process. We rather consider continuous trajectories, which allows us to introduce features that capture physical aspects of the trajectories as well as topological properties.

III. LEARNING HUMAN-LIKE NAVIGATION

The objective of this work is to accurately learn human navigation behavior from observations. Some authors [1, 19] suggest minimizing the Euclidean distance between the generated and the observed trajectories. According to our experience in practical experiments, this approach often leads to overfitting since the algorithms try to match the exact shape of the demonstrated trajectories and often do not generalize seamlessly to different situations. Lerner et al. [17] try to overcome this problem by using a huge amount of training data. In contrast to that, we use features that capture important properties of human motion behavior to generate behavior that is similar with respect to these features. In particular, the features may capture principles underlying human navigation behavior found by psychological studies, such as the attempt to move efficiently. Therefore, our approach is based on a continuous representation of the trajectories that allows us to define appropriate features. Furthermore, our model explicitly accounts for the decisions of the agents regarding which sides they choose when passing each other to avoid collisions by reasoning about topological variants of the trajectories.

A. Preliminaries

The trajectory x^a of agent a is defined as a continuous function

    t \mapsto x^a(t) \in \mathcal{X}    (1)

that maps each point in time t to a configuration x \in \mathcal{X}. We furthermore consider the joint trajectory x that defines the behavior of all the agents as the Cartesian product of the trajectories of all N agents:

    x(t) = x^1(t) \times x^2(t) \times \ldots \times x^N(t) \in \mathcal{X}^N.    (2)
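A joint trajectory in the sense of Eqs. (1) and (2) is simply a time-indexed tuple of agent configurations, which can be sketched in a few lines of Python (a hypothetical illustration of the definition; the function name is ours, not from the paper):

```python
def joint_trajectory(agent_trajectories):
    """Eq. (2): map time t to the tuple x^1(t) x ... x x^N(t).

    Each element of agent_trajectories is a callable t -> configuration,
    i.e., a trajectory x^a in the sense of Eq. (1).
    """
    def x(t):
        return tuple(x_a(t) for x_a in agent_trajectories)
    return x
```

For example, with two agents walking along the coordinate axes, `joint_trajectory([lambda t: (t, 0.0), lambda t: (0.0, t)])` evaluated at t = 1 yields the joint configuration ((1.0, 0.0), (0.0, 1.0)).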
We refer to a topological variant ψ of the joint trajectories as a subset of \mathcal{X}^N in which all pairs of agents choose the same sides when passing each other to avoid collisions. With this definition, we can partition the space \mathcal{X}^N of joint trajectories into its topological variants \psi \in \Psi such that

    \bigcup_{\psi \in \Psi} \psi = \mathcal{X}^N, \quad \forall \psi_i, \psi_j \in \Psi : \psi_i \cap \psi_j = \emptyset.    (3)

A feature f_i is a function that maps joint trajectories to feature values

    f_i : \mathcal{X}^N \to \mathbb{R}.    (4)

We refer to a feature vector comprising several features as f.

B. Principle of Maximum Entropy and Feature Matching

We assume that the behaviors of the agents can be described by a probability distribution of the joint trajectories, and, in particular, that it is a function of the feature values. Therefore, our goal is to find the probability distribution p that induces behavior such that the expected feature values of the agents match the empirical features f_D of the demonstrated human joint trajectories D. Hence, we want

    E_{p(x)}[f(x)] = f_D = \frac{1}{|D|} \sum_{x_k \in D} f(x_k).    (5)

Following Ziebart et al. [28], we are interested in the distribution that matches the feature expectations, as given in Eq. (5), without implying any further assumptions. In this section, we outline how their approach can be applied to continuous spaces. The principle of maximum entropy states that the desired distribution maximizes the differential entropy

    \operatorname{argmax}_p H(p) = \operatorname{argmax}_p \int_x -p(x) \log p(x) \, dx    (6)

subject to the constraints

    \int_x p(x) \, dx = 1,    (7)

    \forall i \quad \langle f_i \rangle_D = E_{p(x)}[f_i(x)] = \int_x p(x) f_i(x) \, dx.    (8)

Introducing Lagrangian multipliers α and θ_i for these constraints yields the maximization problem

    p^*, \alpha^*, \theta^* = \operatorname{argmax}_{p, \alpha, \theta} \int_x -p(x) \log p(x) \, dx - \alpha \Big( \int_x p(x) \, dx - 1 \Big) - \sum_i \theta_i \Big( \int_x p(x) f_i(x) \, dx - \langle f_i \rangle_D \Big).    (9)

Applying the Euler-Lagrange equation from calculus of variations (see [5, Sec. 1.4]) to Eq. (9) implies that the probability distribution p^*(x) has the structure

    p_\theta(x) = \frac{1}{Z(\theta)} e^{-\theta^T f(x)},    (10)

where Z(θ) is a normalization factor to satisfy Eq. (7). Interestingly, the term \theta^T f(x) = \sum_i \theta_i f_i(x) can be interpreted as a cost function that depends on a weighted sum of feature values. Hence, the model assumes that the agents are exponentially more likely to choose a trajectory with lower cost. In the following, we refer to θ as the feature weights.

The log-likelihood of the observed behavior D is given by

    L_{p_\theta}(D) = \log \frac{1}{Z(\theta)} e^{-\theta^T f_D},    (11)

and its derivative with respect to θ is given by

    \frac{\partial}{\partial \theta} L_{p_\theta}(D) = \int_x p(x) f(x) \, dx - f_D.    (12)

Eq. (12) reveals that the derivative of the log-likelihood with respect to θ vanishes when the expected feature values match the empirical feature values. In other words, we obtain the constraints that we introduced in Eq. (5). Consequently, the problem of finding the maximum entropy distribution of joint trajectories subject to feature matching is equivalent to maximizing the likelihood of the training data when assuming an exponential family distribution [14].

Maximizing the log-likelihood of the demonstrations to find the distribution p^*(x) translates to computing the weights θ_i. The resulting optimization problem cannot be solved analytically. However, using the gradient given in Eq. (12), we can apply gradient-based optimization techniques.

C. Learning the Feature Weights of Human Navigation

Given the continuous representation of joint trajectories (see Sec. IV-A), it is not feasible to compute the expected feature values

    E_{p_\theta(x)}[f(x)] = \int_x p_\theta(x) f(x) \, dx    (13)

for a given θ analytically by integrating over the high-dimensional space of all trajectories. Furthermore, our experiments suggest that sampling techniques such as Markov chain Monte Carlo methods are not suitable for this task. Hence, we need to resort to an efficient approximation of Eq. (13).

Our experiments suggest that, given the features we propose, the modes of the probability distribution coincide with the different topological variants. We therefore propose to approximate the feature expectations using the partition given in Eq. (3). For each topological variant \psi \in \Psi, we compute the most likely joint trajectory x_\psi using Eq. (10). We estimate the feature expectations as

    E_{p_\theta(x)}[f(x)] \approx \hat{f}_\theta = \frac{1}{\sum_\psi p_\theta(x_\psi)} \sum_{\psi \in \Psi} p_\theta(x_\psi) f(x_\psi),    (14)

where

    x_\psi = \operatorname{argmax}_x p_\theta(x), \quad \text{s.t. } x \in \psi.    (15)

This means that we approximate the distribution of joint trajectories using a weighted sum of Dirac delta functions at the modes of the distribution that correspond to the topological variants. This approximation can be interpreted as assuming that the agents reason about a topological variant in terms of its most likely joint trajectory. This is a rather coarse approximation of the feature expectations. However, in our practical experiments, we never observed that this approximation had a negative influence on the overall result.
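Once the mode trajectories x_ψ and their feature vectors are available, the learning scheme of Secs. III-B and III-C reduces to a simple loop: weight the variants by exp(−θᵀf) as in Eq. (10), average their features as in Eq. (14), and follow the gradient of Eq. (12). The following Python sketch illustrates this under strong simplifying assumptions — the per-variant feature vectors are treated as given and fixed, whereas the paper re-optimizes each x_ψ for every θ — and all function names are ours, not the authors':

```python
import itertools
import math

def enumerate_variants(num_pairs):
    # one passing side per pair of agents, as in Sec. IV-C
    return list(itertools.product(("left", "right"), repeat=num_pairs))

def variant_probabilities(theta, variant_features):
    # p_theta(x_psi) proportional to exp(-theta^T f(x_psi)), Eq. (10)
    costs = [sum(t * f for t, f in zip(theta, f_psi)) for f_psi in variant_features]
    lo = min(costs)  # subtract the minimum cost for numerical stability
    weights = [math.exp(-(c - lo)) for c in costs]
    z = sum(weights)
    return [w / z for w in weights]

def expected_features(theta, variant_features):
    # hat-f_theta = sum_psi p_theta(x_psi) f(x_psi), Eq. (14)
    p = variant_probabilities(theta, variant_features)
    dim = len(variant_features[0])
    return [sum(p_i * f_psi[d] for p_i, f_psi in zip(p, variant_features))
            for d in range(dim)]

def update_weights(theta, variant_features, f_demo, step=0.1):
    # ascend the log-likelihood gradient E[f] - f_D, Eq. (12)
    f_hat = expected_features(theta, variant_features)
    return [t + step * (fh - fd) for t, fh, fd in zip(theta, f_hat, f_demo)]
```

For instance, with two variants whose single feature values are 1 and 3 and demonstrations with f_D = 1, iterating `update_weights` drives the expected feature value toward 1 by making the low-cost variant dominate; for three agents, `enumerate_variants(3)` yields the eight topological variants that appear in the experiments.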
Fig. 2. Recording real-world training data: The figures show snapshots of three human subjects avoiding each other in order to proceed to their next target location. Using a motion capture system, we recorded one hour of human interactions by tracking markers that were attached to hats worn by the subjects. To prevent the subjects from focussing on their walking behavior, which may result in awkward trajectories, we made them read and memorize newspaper articles at various locations saying that we were conducting psychological research on the effects on attention when frequently changing locations.

IV. EFFICIENT REPRESENTATION AND LEARNING OF TRAJECTORIES IN CONTINUOUS ENVIRONMENTS

A. Spline-Based Representation

In this paper, we utilize a continuous representation of joint trajectories that represents the positions of the agents over time. More precisely, we utilize cubic splines in \mathbb{R}^2 to represent the x and y positions of the agents, as splines have several properties that are desirable for representations of human trajectories and at the same time have a finite dimensionality. Each spline segment

    s^a_j : [t_j, t_{j+1}] \to \mathbb{R}^2    (16)

defines the position of the agent a in the time interval [t_j, t_{j+1}]. Therefore, the position of the agent at time t is

    x^a(t) = s^a_j(t), \quad \text{where } t \in [t_j, t_{j+1}].    (17)

The control points x^a_j := x^a(t_j) and v^a_j := \dot{x}^a(t_j) fully specify the trajectory since cubic splines have four degrees of freedom. This allows us to compute the positions x^a(t), the velocities \dot{x}^a(t), and the accelerations \ddot{x}^a(t) in closed form, which is convenient for efficiently computing the values of the features that we will describe in the following section.
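Since each segment of Eq. (16) is fully specified by its boundary positions and velocities, Eq. (17) and its time derivative can be evaluated in closed form with the cubic Hermite basis. A minimal sketch of one such segment (our own illustration under the assumption of 2-D tuples, not the authors' code):

```python
def hermite_state(x0, v0, x1, v1, t0, t1, t):
    """Evaluate one cubic Hermite segment, cf. Eqs. (16)-(17).

    The segment is fully specified by the control points x_j = x(t_j)
    and tangents v_j = x'(t_j); returns (position, velocity) at time t.
    """
    h = t1 - t0
    s = (t - t0) / h
    # Hermite basis functions and their derivatives with respect to s
    h00, h10 = 2*s**3 - 3*s**2 + 1, s**3 - 2*s**2 + s
    h01, h11 = -2*s**3 + 3*s**2, s**3 - s**2
    d00, d10 = 6*s**2 - 6*s, 3*s**2 - 4*s + 1
    d01, d11 = -6*s**2 + 6*s, 3*s**2 - 2*s
    pos = tuple(h00*a + h10*h*b + h01*c + h11*h*d
                for a, b, c, d in zip(x0, v0, x1, v1))
    vel = tuple((d00*a + d10*h*b + d01*c + d11*h*d) / h
                for a, b, c, d in zip(x0, v0, x1, v1))
    return pos, vel
```

For a segment from (0, 0) to (1, 0) with unit tangents on [0, 1], the state at t = 0.5 is ((0.5, 0.0), (1.0, 0.0)), i.e., straight-line motion at constant velocity, and the endpoints interpolate the control points exactly.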
B. Features

According to recent studies, pedestrians seem to navigate in crowded environments considering various criteria, such as time of travel, accelerations, and proximity to other pedestrians [13]. In our case, we express these criteria in terms of features that map joint trajectories to real numbers. To account for cooperative human behavior, we sum up the features of the individual agents according to

    f_i(x) = \sum_a f^a_i(x).    (18)

In our current implementation, we use the following features to capture human navigation behavior:

1) Time: The incentive of a pedestrian typically is to reach a certain target position as fast as possible [19]. We therefore introduce a feature

    f^a_{\text{travel time}} = t^a_{\text{travel time}}    (19)

that is given by the time of travel to the target position. Note that to compute this feature, we need to estimate the target positions of all agents considered.

2) Acceleration: Pedestrians typically aim to walk efficiently, avoiding unnecessary accelerations [13, 19]. Integrating the squared acceleration over the trajectory yields the feature

    f^a_{\text{acceleration}} = \int_t \|\ddot{x}^a(t)\|^2 \, dt.    (20)

3) Velocity: Pedestrians have a desired walking velocity that is uncomfortable to exceed [11]. We therefore use the feature

    f^a_{\text{velocity}} = \int_t \|\dot{x}^a(t)\|^2 \, dt.    (21)

4) Collision Avoidance: A further typical objective of human navigation is to evade other pedestrians. We assume that evasive maneuvers depend on the distance between the agents and their velocities:

    f^a_{\text{distance}} = \sum_{b \neq a} \int_t \frac{v^a_t}{\|x^a(t) - x^b(t)\|^2} \, dt,    (22)

where the pairwise components are summed up for all agents b \neq a. A similar feature could account for static obstacles. Note that these features give rise to the modes of the probability distribution corresponding to the topological variants.

5) Topological Variants: Our model also allows for using high-level features that explicitly account for the decisions of the agents which sides they choose when passing each other to avoid collisions. For instance, a feature may count the frequency of passing left versus passing right to account for cultural differences. A feature that captures groups of people may prevent the robot from moving between associated people, if this was observed in human behavior.
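Given trajectories sampled at a fixed time step, the integral features of Eqs. (20)-(22) can be approximated numerically. A sketch under our own assumptions (uniform sampling, Riemann-sum integration, 2-D points, and the speed magnitude as v^a_t; all names are ours):

```python
import math

def integrate(vals, dt):
    # simple left Riemann sum over uniformly sampled values
    return sum(vals) * dt

def acceleration_feature(acc, dt):
    # Eq. (20): integral of the squared acceleration magnitude
    return integrate([ax*ax + ay*ay for ax, ay in acc], dt)

def velocity_feature(vel, dt):
    # Eq. (21): integral of the squared velocity magnitude
    return integrate([vx*vx + vy*vy for vx, vy in vel], dt)

def distance_feature(pos_a, vel_a, others, dt, eps=1e-6):
    # Eq. (22): speed over squared distance, summed over all agents b != a
    total = 0.0
    for pos_b in others:
        vals = []
        for (xa, ya), (vx, vy), (xb, yb) in zip(pos_a, vel_a, pos_b):
            v = math.hypot(vx, vy)
            d2 = (xa - xb)**2 + (ya - yb)**2
            vals.append(v / max(d2, eps))  # eps guards against near-collisions
        total += integrate(vals, dt)
    return total
```

For example, an agent moving at unit speed for one second contributes a velocity feature of 1, zero acceleration contributes nothing, and the collision-avoidance feature grows as agents come closer to each other.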
C. Computing the Maximum Likelihood Trajectories

As mentioned above, we approximate the distribution of joint trajectories using a weighted sum of Dirac delta functions at the modes of the distribution that correspond to the topological variants. We generate one representative of each \psi \in \Psi by listing all combinations of sides the agents pass each other in a pairwise fashion, i.e., left or right. To compute the maximum likelihood joint trajectory for each of the topological variants, we apply gradient-based optimization techniques. We therefore computed the partial derivatives of p_\theta(x) with respect to x^a_j and v^a_j introduced in Sec. IV-A and evaluated several gradient-based optimization techniques, namely gradient descent, the
Fig. 3. Comparison of recorded human behavior and the corresponding predictions of our approach and the social forces method in six scenarios. For each scenario, the plots show representative trajectories, where the agents' target positions are indicated by dots. The bar plots visualize the probability of the eight possible topological variants ordered by the probability estimated by our approach.

Levenberg-Marquardt algorithm [18], and RPROP [24]. In empirical experiments, we found that RPROP performed best. To further improve convergence speed, we apply a hierarchical optimization scheme that first runs on a coarse discretization of the trajectories based on just a few spline control points, which quickly computes an approximate shape of the maximum likelihood joint trajectory. Our algorithm then subsequently refines the estimate using increasingly finer-grained discretizations in time. During optimization, we must take care to not violate the constraint x \in \psi given in Eq. (15). In other words, we must not change x such that it belongs to a different topological variant. We therefore check in each step of the optimization whether the integral over the derivative of the angle of the vectors between all pairs of agents changes the sign. Hence, we have

    \kappa(a, b) = \int_t \frac{d}{dt} \alpha_a^b(t) \, dt,    (23)

where \alpha_a^b refers to the angle between the vector x^b - x^a and the vector (1, 0)^T. We reject optimization steps that violate the constraint x \in \psi, as indicated by \kappa, and continue the optimization with a decreased step size.

V. NAVIGATION IN POPULATED ENVIRONMENTS

A socially compliant robot that navigates in a populated environment has to reason about future paths the humans are likely to follow. The humans, however, will react to the actions of the robot themselves, thus the robot has to adapt its behavior, which in turn affects the humans. To break up this infinite loop, our approach reasons about the joint trajectories that are likely to be followed by all the agents, including the robot itself. To do so, the robot incorporates the current poses of the humans and itself into the prediction process and implicitly "predicts" its own trajectory, which it then follows. Our approach enables the robot to efficiently evaluate multiple topological variants of the trajectories online. Hence, it allows the robot to choose the most appropriate one given the current situation.

However, humans are likely to react differently to robots than to other humans, depending on the characteristics of the robot. To solve this problem, we could record large data sets of people interacting with an already installed robot. Alternatively, the robot could be tele-operated by a human to teach it an appropriate navigation behavior.

VI. EXPERIMENTAL EVALUATION

Our experimental evaluation demonstrates that our approach accurately predicts human navigation behavior, generalizing to different scenarios. We conduct an experiment with a real robot that uses our approach for socially-compliant robot navigation. We furthermore present a comparison that suggests that our approach outperforms the popular social forces method.

A. Acquisition of Real-World Training Data

To fit our motion model to human navigation behavior, we recorded trajectories of human subjects using a motion capture system (see Fig. 2). To prevent the subjects from focussing on their walking behavior, which may result in awkward trajectories, we made them read and memorize newspaper articles at various locations saying that we were conducting psychological research on the effects on attention when frequently changing locations.

We set up eight differently numbered locations in an area of 7 m × 7 m, where each location was equipped with a different
newspaper article. We repeatedly asked the three subjects to read the articles and simultaneously change their locations such that their paths crossed each other, which forced the subjects to avoid each other. Our setup gave rise to eight different scenarios that repeatedly occurred after each cycle. In total, we recorded one hour of human interaction, resulting in 96 runs, i.e., 12 runs per scenario. Fig. 3 shows the trajectories recorded during the first occurrence of the scenarios. We then applied the learning algorithm described in Sec. III to estimate the feature weights that best capture the navigation behavior of the human subjects.

B. Cross Validation on Real-World Data and Comparison to the Social Forces Method

We performed a 10-fold cross validation on the recorded data set described in the previous section to evaluate how well our approach generalizes to different situations and how it compares to the social forces method [10].

To allow for a fair comparison, we trained and evaluated both models given the same data sets in each fold of the cross validation. We applied stochastic gradient descent to compute the parameters of the social forces model that minimize the norm of the distance between the resulting feature values and the feature values of the behavior that was to be imitated.

We analyzed the norm of the discrepancy between the behavior predicted by the two methods and the recorded human behavior averaged over all ten folds and obtained \|f_D - f_\theta\|_2 = 1.7 and \|f_D - f_{\text{social forces}}\|_2 = 7.2. This indicates that our approach has a substantially better capability to capture and reproduce human behavior given the features we suggest. Fig. 3 provides insight into the results, which suggest that the trajectories predicted by our approach are substantially more human-like than the predictions of the social forces method. For instance, the jerky evasive movements predicted by the social forces method do not replicate well the recorded behavior of the humans. These movements can be attributed to the fact that the social forces method is reactive. In contrast to that, our approach also reasons about future interactions of the agents and is thus able to accurately predict the human behavior.

Our model explicitly accounts for the topological variants the humans are likely to choose. The bar plots in Fig. 3 suggest that our approach is able to predict a probability distribution over the outcomes of these decisions. In contrast to that, the social forces method only yields a single estimate.

Fig. 4. An example encounter of a robot and a human where the robot can pursue its initial intent to pass on the left side.

Fig. 5. The robot changes its initial intent to pass on the left side because the human's actions differ from the robot's initial predictions.

C. Real-World Robot Experiments

We implemented our approach on a mobile robot and carried out experiments in which a human and a robot pass each other while moving to their target positions. We tracked the poses of the robot and the humans using a visual motion capture system. Fig. 4 and Fig. 5 visualize the recorded trajectories of the robot and the humans. The bottom plot of each figure visualizes how the estimated probability distribution of the topological variants evolves during the encounters.

In the first image of Fig. 4, the belief of the robot reflects the symmetry of the situation. Both topological variants, i.e., passing left or passing right, seem more or less equally likely to the robot in terms of its probability distribution estimate over joint trajectories. Nevertheless, a slight tendency towards the left side makes the robot commit to this topological variant. As a consequence, the robot expects the human to evade to the opposite side. The human, however, refuses to evade to any side but instead walks on a straight line towards his goal. The robot adapts its expectations to the new situation but still decides to pass on the left side. After the situation has been resolved, the robot directly proceeds to its target position.

Fig. 5 illustrates a similar initial situation, although the human now behaves differently from the previous example as the human insists on passing on the right side regardless of the robot's visible intent to pass left. The robot thus re-evaluates the probability distribution over the topological variants and decides to pass on the other side.

D. Simulation Experiments

In addition to the experiments based on real-world data, we present a series of simulation experiments that aims to evaluate the performance of our learning algorithm based on synthetic data that adhere to our assumptions, in particular, the approximation expressed in Eq. (14). We therefore randomly generated a set of 100 scenarios each with three agents by uniformly sampling the start and goal positions of the agents in an area of 10 m × 10 m. We randomly set the feature weights θ that subsequently gave rise to the behavior that is to be imitated. In the following, we demonstrate that our method was able to
accurately recover the generating model.

Fig. 6 evaluates our approach during learning when the exact feature expectations of the behavior to be imitated are known. We express the performance in terms of the norm of the difference between the feature values of the observed behavior and the expected feature values of our approach. The figure furthermore visualizes the Kullback-Leibler divergence between the estimated and the true probability distribution of the topological variants. Both errors converge to zero, which suggests that our approach is able to exactly reproduce the behavior in terms of the features and the decisions of which sides the agents choose to pass each other.

Fig. 6. Learning the feature weights averaged over 10 randomly generated data sets. Top: The evolution of the norm of the difference between the feature values of the behavior that is to be imitated and the expected feature values of our approach. Bottom: The evolution of the Kullback-Leibler divergence between the true probability distribution P of the topological variants and the probability distribution Q estimated by our approach. The plots illustrate that both error measures converge to zero.

We conducted another simulation experiment to evaluate the performance of our approach for situations in which only few observations of human behavior are available. We therefore sampled a set D of n joint trajectories from the synthetic data set described above, where n \in \{10, 25, 50, 75, 100\}. To generate the training samples, we first uniformly sampled one of the scenarios and then sampled a topological variant from the corresponding probability distribution. We then applied our approach to recover the feature weights θ based on the sampled observations D and their empirical feature values f_D. We performed a 10-fold cross validation, which, averaged over all folds, resulted in the following values: \|f_{D_{10}} - f_\theta\|_2 \approx 0.64, \|f_{D_{25}} - f_\theta\|_2 \approx 0.44, \|f_{D_{50}} - f_\theta\|_2 \approx 0.33, \|f_{D_{75}} - f_\theta\|_2 \approx 0.25, and \|f_{D_{100}} - f_\theta\|_2 \approx 0.21.

E. Computing the Maximum Likelihood Joint Trajectories Corresponding to the Topological Variants

We require our gradient-based optimization approach described in Sec. IV-C to converge to a unique maximum within each topological variant. To evaluate the convergence of our approach, we computed the maximum likelihood joint trajectories of the real-world data set given a set of randomly sampled initial guesses. For each topological variant, our algorithm found a unique solution irrespective of the initial guess with a maximal deviation of less than 2 cm. This suggests that our algorithm indeed finds the modes of the probability distribution corresponding to the topological variants.

Fig. 7. Computing the maximum likelihood trajectories given the feature weights θ. The plot shows the evolution of the mean and the standard deviation of the distance between the current feature values and the resulting feature values after convergence. Our approach carries out each iteration to optimize 60 spline control points in approximately 1 ms on a standard desktop computer.

F. Runtime

To use our approach for autonomous robot navigation, it is necessary to compute the maximum likelihood trajectories in an efficient manner. We evaluated the convergence behavior of our approach and provide results averaged over 100 random scenarios. Fig. 7 visualizes the distance between the current feature values and the feature values of the maximum likelihood trajectory after convergence. On a standard desktop computer, our implementation optimizes several topological variants in parallel and carries out one iteration to optimize 60 spline control points in approximately 1 ms. This suggests that our approach is able to predict human behavior online in situations similar to our experiments. In more densely populated environments, however, it is not feasible to compute all topological variants. We are currently investigating means for efficiently pruning the graph of plausible topological variants.

VII. CONCLUSION

In this paper, we presented a novel approach to predicting socially compliant trajectories in populated environments. Our approach is based on features that capture relevant aspects of the trajectories. It applies the principle of maximum entropy to elicit the probability distribution that governs the navigation behavior of people from real-world observations. We have implemented and tested our method with extensive data sets and real robots. The experimental results demonstrate the efficacy of our method and suggest that it leads to natural, socially compliant trajectories. In particular, our algorithm seems to better capture characteristics of human trajectories than the social forces method, a state-of-the-art approach.

ACKNOWLEDGMENTS

This work has partly been supported by the EC under FP7-ICT-248873-RADHAR, by the German Research Foundation (DFG) under contract number SFB/TR-8, and by the Hans L. Merkle-Stiftung.
REFERENCES

[1] G. Arechavaleta, J.-P. Laumond, H. Hicheur, and A. Berthoz. An optimality principle governing human walking. IEEE Transactions on Robotics (T-RO), 24(1):5–14, 2008.
[2] M. Bennewitz, W. Burgard, G. Czielniak, and S. Thrun. Learning motion patterns of people for compliant robot motion. International Journal of Robotics Research (IJRR), 24(1):31–48, 2005.
[3] W. Burgard, A.B. Cremers, D. Fox, D. Hähnel, G. Lakemeyer, D. Schulz, W. Steiner, and S. Thrun. Experiences with an interactive museum tour-guide robot. Artificial Intelligence (AIJ), 114(1-2):3–55, 2000.
[4] P. Fiorini and Z. Shillert. Motion planning in dynamic environments using velocity obstacles. International Journal of Robotics Research (IJRR), 17:760–772, 1998.
[5] C. Fox. An Introduction to the Calculus of Variations. Courier Dover Publications, 1987.
[6] D. Fox, W. Burgard, and S. Thrun. The dynamic window approach to collision avoidance. IEEE Robotics & Automation Magazine (RAM), 4(1):23–33, 1997.
[7] T. Fraichard and V. Delsart. Navigating dynamic environments with trajectory deformation. Journal of Computing and Information Technology (CIT), 17(1):27–36, 2009.
[8] E. Goffman. Relations in Public: Microstudies of the Public Order. New York: Basic Books, 1971.
[9] S.J. Guy, M.C. Lin, and D. Manocha. Modeling collision avoidance behavior for virtual humans. In Proceedings of the Ninth International Conference on Autonomous Agents and Multi-Agent Systems (AAMAS), pages 575–582, 2010.
[10] D. Helbing and A. Johansson. Pedestrian, crowd and evacuation dynamics. In Robert A. Meyers, editor, Encyclopedia of Complexity and Systems Science, pages 6476–6495. Springer New York, 2009.
[11] D. Helbing and P. Molnar. Social force model for pedestrian dynamics. Physical Review E (PRE), 51:4282–4286, 1995.
[12] P. Henry, C. Vollmer, B. Ferris, and D. Fox. Learning to navigate through crowded environments. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), pages 981–986, 2010.
[13] S. Hoogendoorn and P.H.L. Bovy. Simulation of pedestrian flows by optimal control and differential games. Optimal Control Applications and Methods, 24(3):153–172, 2003.
[14] E.T. Jaynes. Where do we stand on maximum entropy. Maximum Entropy Formalism, pages 15–118, 1978.
[15] A. Johansson, D. Helbing, and P. Shukla. Specification of the social force pedestrian model by evolutionary adjustment to video tracking data. Advances in Complex Systems (ACS), 10:271–288, 2007.
[16] E. Kruse and F.M. Wahl. Camera-based monitoring system for mobile robot guidance. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 1248–1253, 1998.
[17] A. Lerner, Y. Chrysanthou, and D. Lischinski. Crowds by example. Computer Graphics Forum, 26(3):655–664, 2007.
[18] D.W. Marquardt. An algorithm for least-squares estimation of nonlinear parameters. Journal of the Society for Industrial and Applied Mathematics, 11(2):431–441, 1963.
[19] K. Mombaur, A. Truong, and J.-P. Laumond. From human to humanoid locomotion – an inverse optimal control approach. Autonomous Robots, 28:369–383, 2010.
[20] J. Müller, C. Stachniss, K.O. Arras, and W. Burgard. Socially inspired motion planning for mobile robots in populated environments. In Proceedings of the International Conference on Cognitive Systems (COGSYS), pages 85–90, 2008.
[21] S. Pellegrini, A. Ess, M. Tanaskovic, and L. Van Gool. Wrong turn – no dead end: A stochastic pedestrian motion model. In Proceedings of the Computer Vision and Pattern Recognition Workshops (CVPRW), pages 15–22, 2010.
[22] R. Philippsen and R. Siegwart. Smooth and efficient obstacle avoidance for a tour guide robot. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), pages 446–451, 2003.
[23] S. Quinlan and O. Khatib. Elastic bands: Connecting path planning and control. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), pages 802–807, 1993.
[24] M. Riedmiller and H. Braun. A direct adaptive method for faster backpropagation learning: The RPROP algorithm. In Proceedings of the IEEE International Conference on Neural Networks (ICNN), pages 586–591, 1993.
[25] S. Thrun, M. Beetz, M. Bennewitz, W. Burgard, A.B. Cremers, F. Dellaert, D. Fox, D. Haehnel, C. Rosenberg, N. Roy, J. Schulte, and D. Schulz. Probabilistic algorithms and the interactive museum tour-guide robot Minerva. International Journal of Robotics Research (IJRR), 19(11):972–999, 2000.
[26] P. Trautman and A. Krause. Unfreezing the robot: Navigation in dense, interacting crowds. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 797–803, 2010.
[27] J. van den Berg, S.J. Guy, M.C. Lin, and D. Manocha. Reciprocal n-body collision avoidance. In Proceedings of the International Symposium of Robotics Research (ISRR), pages 3–19, 2009.
[28] B.D. Ziebart, A. Maas, J.A. Bagnell, and A.K. Dey. Maximum entropy inverse reinforcement learning. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), pages 1433–1438, 2008.
[29] B.D. Ziebart, N. Ratliff, G. Gallagher, C. Mertz, K. Peterson, J.A. Bagnell, M. Hebert, A.K. Dey, and S. Srinivasa. Planning-based prediction for pedestrians. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 3931–3936, 2009.
Variational Bayesian Optimization for Runtime Risk-Sensitive Control

Scott Kuindersma, Roderic Grupen, and Andrew Barto
Computer Science Department, University of Massachusetts Amherst
[email protected] [email protected] [email protected]
Abstract—We present a new Bayesian policy search algorithm suitable for problems with policy-dependent cost variance, a property present in many robot control tasks. We extend recent work on variational heteroscedastic Gaussian processes to the optimization case to achieve efficient minimization of very noisy cost signals. In contrast to most policy search algorithms, our method explicitly models the cost variance in regions of low expected cost and permits runtime adjustment of risk sensitivity without relearning. Our experiments with artificial systems and a real mobile manipulator demonstrate that flexible risk-sensitive policies can be learned in very few trials.

I. INTRODUCTION

Experiments on physical robot systems are typically associated with significant practical costs, such as experimenter time, money, and robot wear and tear. However, such experiments are often necessary due to the extreme difficulty associated with constructing simulated systems of sufficiently high fidelity that behaviors translate to hardware without performance loss. For many nonlinear systems, it can even be infeasible to perform simulations or construct a reasonable model.

For this reason, model-free policy search methods have become one of the standard tools for constructing controllers for robot systems [27, 23, 12, 29, 17, 11]. These algorithms are designed to minimize the expected value of a noisy cost signal, Ĵ(θ), by adjusting policy parameters, θ, for a fixed class of policies. By considering only the expected cost of a policy and ignoring cost variance, the solutions found by these algorithms are by definition risk-neutral, where the term risk is equivalent to cost variance. However, for systems that operate in a variety of contexts, it can be advantageous to have a more flexible attitude toward risk. For example, a humanoid otherwise capable of a fast and energy efficient gait might adopt a more predictable, possibly less energy efficient gait when operating near a large crater. Indeed, studies in human motor control and animal behavior suggest that variable risk sensitivity may be pervasive in nature [2, 1].

Recently there has been increased interest in applying Bayesian optimization algorithms to solve model-free policy search problems [17, 19, 20, 33, 28, 13]. In contrast to well-studied policy gradient methods [23], Bayesian optimization algorithms perform policy search by building a distribution of cost in policy parameter space and applying a selection criterion to globally select the next policy. Selection criteria are typically designed to balance exploration and exploitation with the intention of minimizing the total number of policy evaluations. These properties make Bayesian optimization attractive for robotics since cost functions often have multiple local minima and policy evaluations are typically expensive. It is also straightforward to incorporate approximate prior knowledge about the distribution of cost (such as could be obtained from simulation) and enforce hard constraints on the policy parameters.

Previous implementations of Bayesian optimization have assumed the variance of the cost signal is the same for all policies in the search space, which is not true in general. In this work, we propose a new type of Bayesian optimization algorithm that relaxes this assumption and efficiently captures both the expected cost and cost variance in regions of low cost. Specifically, we extend recent work developing a variational Gaussian process model for problems with input-dependent noise (or heteroscedasticity) [15] to the optimization case by deriving an expression for expected improvement [22], a commonly used criterion for selecting the next policy, and incorporating log priors into the optimization to improve numerical performance. We also consider the use of confidence bounds to produce runtime changes to risk sensitivity and derive a generalized expected risk improvement criterion that balances exploration and exploitation in the risk-sensitive setting. We evaluate the algorithm in experiments with synthetic systems and a dynamic stabilization task using a real mobile manipulator.

II. BACKGROUND

A. Bayesian Optimization

Bayesian optimization algorithms are a family of general stochastic optimization techniques that are well suited to problems where noisy samples of a cost function, Ĵ(θ), are expensive to obtain [17, 6, 3, 28, 13]. In the control context, Bayesian optimization methods use data from previous policy evaluations to compute a nonparametric distribution of cost in policy parameter space. Given this distribution, a decision-theoretic selection criterion is used to globally select policy parameter values, θ, that, e.g., have a high probability of having low cost or have high cost uncertainty.

Most Bayesian optimization implementations represent the prior over cost functions as a Gaussian process (GP). To fully
specify the GP prior, J(θ) ∼ GP(m(θ), kf(θ, θ′)), one must define a mean function and a covariance (kernel) function, m(θ) = E[J(θ)] and kf(θ, θ′) = E[(J(θ) − m(θ))(J(θ′) − m(θ′))]. Typically, we set m(θ) = 0 and let kf(θ, θ′) take on one of several standard forms. A common choice is the anisotropic squared exponential kernel,

kf(θ, θ′) = σf² exp(−½ (θ − θ′)ᵀ M (θ − θ′)),   (1)

where σf² is the signal variance and M = diag(ℓf⁻²) is a diagonal matrix of length scale hyperparameters. Intuitively, the signal variance captures the overall magnitude of the cost function variation and the length scales capture the sensitivity of the cost with respect to changes in each policy parameter. If prior information regarding the shape of the cost distribution is available, e.g., from simulation experiments, the mean function and kernel hyperparameters can be set accordingly [17]. However, in many cases such information is not available, so these quantities are optimized using maximum likelihood or maximum a posteriori techniques [24].

Samples of the latent cost function are assumed to have additive i.i.d. noise:

Ĵ(θ) = J(θ) + ε,   ε ∼ N(0, σn²).   (2)

Given a GP prior and data, Θ = [θ1, θ2, . . . , θN]ᵀ ∈ R^(N×|θ|), y = [Ĵ(θ1), Ĵ(θ2), . . . , Ĵ(θN)]ᵀ ∈ R^N, one can compute the posterior (predictive) cost distribution for a policy parameterized by θ∗ as Ĵ∗ ≡ Ĵ(θ∗) ∼ N(E[Ĵ∗], s∗²),

E[Ĵ∗] = kf∗ᵀ (Kf + σn² I)⁻¹ y   (3)
s∗² = kf(θ∗, θ∗) − kf∗ᵀ (Kf + σn² I)⁻¹ kf∗,   (4)

where kf∗ = [kf(θ1, θ∗), kf(θ2, θ∗), . . . , kf(θN, θ∗)]ᵀ and Kf is the positive-definite kernel matrix, [Kf]ij = kf(θi, θj). When the hyperparameters are unknown, the log marginal likelihood, log p(y | Θ, σf, ℓf), is commonly used to perform an optimization before computing the posterior [24]. It is straightforward to compute log p(y | Θ, σf, ℓf) and its gradients, so we are free to choose from existing nonlinear optimization methods to perform the optimization.

To select the (N+1)th policy parameters, we optimize a selection criterion computed on the posterior. A common choice is expected improvement (EI) [22, 3], which is defined as the expected value of the improvement, I, over the expected cost of the best policy previously evaluated. Since the predictive distribution under the GP model is Gaussian, the improvement for policy, θ∗, is also Gaussian, I∗ ∼ N(μbest − E[Ĵ∗], s∗²), where μbest = min_{i=1,...,N} E[Ĵ(θi)]. Considering only non-negative improvements, the expected improvement is

EI(θ∗) = ∫₀^∞ I∗ p(I∗) dI∗ = s∗ (u∗ Φ(u∗) + φ(u∗)),   (5)

where u∗ = (μbest − E[Ĵ∗])/s∗, and Φ(·) and φ(·) are the CDF and PDF of the normal distribution, respectively. If s∗ = 0, the expected improvement is defined to be 0. Both (5) and its gradient, ∂EI(θ)/∂θ, are efficiently computable, so we can apply standard nonlinear optimization methods to maximize EI to select the next policy. In practice, a parameter ξ is often used to adjust the balance of exploration and exploitation, u∗ = (μbest − E[Ĵ∗] + ξ)/s∗, where ξ > 0 leads to an optimistic estimate of improvement and tends to encourage exploration of regions of high uncertainty. Cost scale invariance can be achieved by multiplying ξ by the signal standard deviation, σf [18].

B. Variational Heteroscedastic Gaussian Process Regression

One limitation of the standard regression model is the assumption of i.i.d. noise over the input space (see equation (2)). Many data do not adhere to this simplification and models capable of capturing heteroscedasticity are required. The heteroscedastic regression model takes the form

Ĵ(θ) = J(θ) + ε(θ),   ε(θ) ∼ N(0, r(θ)),   (6)

where the noise variance, r(θ), is dependent on the input. In the Bayesian setting, a second GP prior, g(θ) ∼ GP(μ0, kg(θ, θ′)), is placed over the unknown log variance function, g(θ) ≡ log r(θ) [7, 10, 15]. This heteroscedastic Gaussian process (HGP) model has the unfortunate property that the computations of the posterior distribution and the marginal likelihood are intractable, thus making hyperparameter optimization and prediction difficult.

In the variational heteroscedastic Gaussian process (VHGP) model [15], a variational lower bound on the marginal likelihood of the HGP model serves as a tractable surrogate function for optimizing the hyperparameters. Let g = [g(θ1), g(θ2), . . . , g(θN)]ᵀ be the vector of latent log noise variances for the N data points. By defining a normal variational probability density, q(g) ∼ N(μ, Σ), the marginal variational bound can be derived [15],

F(μ, Σ) = log N(y | 0, Kf + R) − ¼ tr(Σ) − KL( N(g | μ, Σ) ‖ N(g | μ0 1, Kg) ),   (7)

where R is a diagonal matrix with elements [R]ii = e^([μ]i − [Σ]ii/2). Intuitively, by maximizing equation (7) with respect to the variational parameters, μ and Σ, we maximize the log marginal likelihood under the variational approximation while minimizing the distance (in the Kullback-Leibler sense) between the variational distribution and the distribution implied by the GP prior. By exploiting properties of F(μ, Σ) at its maximum, one can write μ and Σ in terms of N variational parameters [15]:

μ = Kg (Λ − ½ I) 1 + μ0 1,   Σ⁻¹ = Kg⁻¹ + Λ,

where Λ is a positive semidefinite diagonal matrix of variational parameters. F(μ, Σ) can be simultaneously maximized with respect to the variational parameters and the HGP model hyperparameters, Ψf and Ψg. If the kernel functions kf(θ, θ′) and kg(θ, θ′) are squared exponentials (1), then Ψf = {σf, ℓf} and Ψg = {μ0, σg, ℓg}.
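As a concrete illustration of the standard machinery in equations (3)-(5), the GP posterior and expected improvement can be sketched in a few lines of NumPy. This is our own illustrative sketch, not code from the paper; the function names and hyperparameter values (σf = 1, σn = 0.1, unit length scales) are assumptions:

```python
import numpy as np
from scipy.stats import norm

def sq_exp_kernel(A, B, sigma_f=1.0, lengths=1.0):
    # Anisotropic squared exponential kernel, cf. equation (1).
    d = (A[:, None, :] - B[None, :, :]) / lengths
    return sigma_f**2 * np.exp(-0.5 * (d**2).sum(axis=-1))

def gp_posterior(theta_star, Theta, y, sigma_n=0.1):
    # Predictive mean and variance, cf. equations (3)-(4).
    K = sq_exp_kernel(Theta, Theta) + sigma_n**2 * np.eye(len(Theta))
    k_star = sq_exp_kernel(Theta, theta_star)            # N x M
    mean = k_star.T @ np.linalg.solve(K, y)
    var = (sq_exp_kernel(theta_star, theta_star).diagonal()
           - np.einsum('nm,nm->m', k_star, np.linalg.solve(K, k_star)))
    return mean, np.maximum(var, 0.0)

def expected_improvement(theta_star, Theta, y, xi=0.0):
    # Expected improvement, cf. equation (5); EI is defined as 0 when s = 0.
    mean, var = gp_posterior(theta_star, Theta, y)
    mu_best = gp_posterior(Theta, Theta, y)[0].min()     # best expected cost so far
    s = np.sqrt(var)
    safe_s = np.where(s > 0, s, 1.0)
    u = (mu_best - mean + xi) / safe_s
    return np.where(s > 0, s * (u * norm.cdf(u) + norm.pdf(u)), 0.0)
```

Maximizing this criterion over θ with a nonlinear optimizer and random restarts (the paper uses SQP) would then select the next policy to evaluate.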
The VHGP model yields a non-Gaussian variational predictive density,

q(Ĵ∗) = ∫ N(Ĵ∗ | a∗, c∗² + e^(g∗)) N(g∗ | μ∗, σ∗²) dg∗,   (8)

where

a∗ = kf∗ᵀ (Kf + R)⁻¹ y,
c∗² = kf(θ∗, θ∗) − kf∗ᵀ (Kf + R)⁻¹ kf∗,
μ∗ = kg∗ᵀ (Λ − ½ I) 1 + μ0,
σ∗² = kg(θ∗, θ∗) − kg∗ᵀ (Kg + Λ⁻¹)⁻¹ kg∗.

Although this predictive density is still intractable, its mean and variance can be calculated in closed form [15]:

E_q[Ĵ∗] = a∗,   (9)
V_q[Ĵ∗] = c∗² + exp(μ∗ + σ∗²/2) ≡ s∗².   (10)

III. VARIATIONAL BAYESIAN OPTIMIZATION

There are two practical motivations for capturing policy-dependent variance structure during optimization. First, metrics computed on the predictive distribution, such as EI and probability of improvement, will return more meaningful values for the system under consideration. Second, it creates the opportunity to employ policy selection criteria that take cost variance into account, i.e., that are risk-sensitive.

We extend the VHGP model to the optimization case by deriving the expression for expected improvement and its gradient and show that both can be efficiently approximated to several decimal places using Gauss-Hermite quadrature (as is the case for the predictive distribution itself [15]). We show how efficiently computable confidence bound selection criteria can be used to select risk-sensitive policies and generalize the expected improvement criterion. To address numerical issues that arise when N is small (i.e., in the early stages of optimization), we incorporate independent log priors into the marginal variational bound and identify heuristic sampling strategies that perform well empirically.

A. Expected Improvement

The improvement, I∗, of a policy, θ∗, under the variational predictive distribution (8) is

q(I∗) = ∫ N(I∗ | μbest − a∗, v∗²) N(g∗ | μ∗, σ∗²) dg∗,   (11)

where v∗² = c∗² + e^(g∗). The expression for EI then becomes

EI(θ∗) = ∫₀^∞ I∗ q(I∗) dI∗
        = ∫ v∗ [u∗ Φ(u∗) + φ(u∗)] N(g∗ | μ∗, σ∗²) dg∗,   (12)

where u∗ = (μbest − a∗)/v∗. Although this expression is not analytically tractable, it can be efficiently approximated using Gauss-Hermite quadrature. This can be made clear by setting w = (g∗ − μ∗)/(√2 σ∗) and replacing all occurrences of g∗ in the expressions for v∗ and u∗,

EI(θ∗) = ∫ e^(−w²) (v∗/√π) [u∗ Φ(u∗) + φ(u∗)] dw ≡ ∫ e^(−w²) h(w) dw.   (13)

Similarly, the gradient ∂EI(θ)/∂θ can be computed under the integral (12) and the result is of the desired form:

∂EI(θ∗)/∂θ = ∫ e^(−w²) z(w) dw,   (14)

where

z(w) = (1/√π) [ v∗ (u∗ Φ(u∗) + φ(u∗)) (1/σ∗) ( −∂σ∗/∂θ + √2 w ∂μ∗/∂θ + 2w² ∂σ∗/∂θ ) + (u∗ Φ(u∗) + φ(u∗)) ∂v∗/∂θ + v∗ Φ(u∗) ∂u∗/∂θ ].

As in the standard Bayesian optimization setting, we can easily incorporate an exploration parameter, ξ, by setting u∗ = (μbest − a∗ + ξ)/v∗. EI can be maximized using standard nonlinear optimization algorithms. Since flat regions and multiple local maxima may be present, it is common practice to perform random restarts during EI optimization to avoid low-quality solutions. In our experiments, we use the NLOPT [8] implementation of sequential quadratic programming (SQP) with 25 random restarts to optimize EI.

B. Confidence Bound Selection

In order to exploit cost variance information for policy selection, we must consider selection criteria that flexibly take cost variance into account. Although EI performs well during learning by balancing exploration and exploitation, it falls short in this regard since it always favors high variance or high uncertainty among solutions with equivalent expected cost. In contrast, confidence bound (CB) selection criteria allow one to directly specify the sensitivity to cost variance.

The family of confidence bound selection criteria has the general form

CB(θ∗, κ) = E[Ĵ∗] + b(V[Ĵ∗], κ),   (15)

where b(·, ·) is a function of the cost variance and a constant κ that controls the system's sensitivity to risk. Such criteria have been extensively studied in the context of statistical global optimization [5, 26] and economic decision making [16]. Favorable regret bounds for sampling with CB criteria of the form b(V[Ĵ∗], κ) = κ √(V[Ĵ∗]) ≡ κ s∗ have also been derived for certain types of Bayesian optimization problems [26]. Interestingly, CB criteria have a strong connection to the exponential utility functions of risk-sensitive optimal control (RSOC) [32, 31]. Consider the standard RSOC objective function,

γ(κ) = −2κ⁻¹ log E[e^(−½ κ Ĵ∗)].   (16)
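The Gauss-Hermite approximation of the expected improvement integral in equation (13) is straightforward to implement. The sketch below is our own illustration, not the authors' code; the function name is an assumption and the VHGP predictive quantities a∗, c∗², μ∗, and σ∗² are taken as precomputed inputs:

```python
import numpy as np
from scipy.stats import norm

def vhgp_expected_improvement(a_star, c2_star, mu_star, sigma2_star,
                              mu_best, xi=0.0, n_quad=30):
    # Approximate EI under the VHGP predictive density, cf. equation (13).
    # The substitution w = (g* - mu*) / (sqrt(2) * sigma*) puts the Gaussian
    # integral over the latent log noise g* into the form
    # integral of exp(-w^2) h(w) dw, handled by Gauss-Hermite quadrature.
    nodes, weights = np.polynomial.hermite.hermgauss(n_quad)
    g = mu_star + np.sqrt(2.0 * sigma2_star) * nodes  # quadrature nodes in g-space
    v = np.sqrt(c2_star + np.exp(g))                  # v* at each node
    u = (mu_best - a_star + xi) / v
    h = v * (u * norm.cdf(u) + norm.pdf(u)) / np.sqrt(np.pi)
    return float(weights @ h)
```

As a sanity check, when σ∗² → 0 this reduces to the homoscedastic EI of equation (5) with s∗² = c∗² + e^(μ∗), and a modest number of quadrature nodes already matches that closed-form value to several decimal places, consistent with the paper's observation about quadrature accuracy.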
Taking the second order Taylor expansion of e^(−½ κ Ĵ∗) about E[Ĵ∗] yields

γ(κ) ≈ E[Ĵ∗] − ¼ κ V[Ĵ∗].   (17)

This approximation exposes the role of the parameter κ in determining the risk sensitivity of the system: κ < 0 is risk-averse, κ > 0 is risk-seeking, and κ = 0 is risk-neutral [31]. Thus, policies selected according to a CB criterion with b(V[Ĵ∗], κ) = −¼ κ V[Ĵ∗] can be viewed as approximate RSOC solutions. Furthermore, since the selection is performed with respect to the predictive distribution (8), policies with different risk characteristics can be selected on-the-fly, without having to perform separate optimizations that require additional policy executions on the system.

We can also apply confidence bound criteria to generalize EI to an expected risk improvement (ERI) criterion. We define risk improvement for a policy θ∗ as I∗^κ = μbest + κ sbest − Ĵ∗ − κ s∗, where μbest and sbest are found by minimizing E_q[Ĵθi] + κ √(V_q[Ĵθi]) over all previously evaluated policies. Thus, ERI can be viewed as a generalization of EI where u∗ = (μbest − a∗ + κ(sbest − s∗))/v∗ and ERI = EI if κ = 0.

C. Coping with Small Sample Sizes

1) Log Hyperpriors: To avoid numerical instability in the hyperparameter optimization when N is small, we augment F(μ, Σ) with independent log-normal priors [18] for each hyperparameter in the VHGP model,

F̂(μ, Σ) = F(μ, Σ) + Σ_{ψk ∈ Ψ} log N(log ψk | μk, σk²),   (18)

where Ψ = Ψf ∪ Ψg is the set of all hyperparameters. In practice, these priors can be quite vague and thus do not require significant experimenter insight. For example, in our experiments we set the log prior on length scales so that the width of the 95% confidence region is at least 20 times the actual policy parameter range.

As is the case with standard marginal likelihood maximization, F̂(μ, Σ) may have several local optima. In practice, performing random restarts helps avoid low-quality solutions (especially when N is small). In our experiments, we perform 10 random restarts using SQP for policy selection.

2) Sampling: It is well known [9] that selecting policies based on distributions fit using very little data can lead to myopic sampling and premature convergence. Incorporating external randomization is one way to help alleviate this problem. For example, it is common to obtain a random sample of N0 initial policies prior to performing optimization. We have found that sampling according to EI with probability 1 − ε and randomly otherwise performs well empirically. In the standard Bayesian optimization setting with model selection, ε-random EI selection has been shown to yield near-optimal global convergence rates [4]. Randomized CB selection with, e.g., κ ∼ N(0, 1) can also be applied when the policy search is aimed at identifying a spectrum of policies with different risk sensitivities. However, since this technique relies completely on the estimated cost distribution, it is most appropriate to apply after a reasonable initial estimate of the cost distribution has been obtained.

The Variational Bayesian Optimization (VBO) algorithm is outlined in Algorithm 1.

Algorithm 1 Variational Bayesian Optimization
Input: Previous evaluations: Θ, y; Iterations: n
1) for i := 1 : n
   a) Maximize equation (18) given Θ, y:
      Ψf⁺, Ψg⁺, Λ⁺ := arg max F̂(μ, Σ)
   b) Optimize the selection criterion (EI, ERI, or CB) w.r.t. the optimized model:
      θ′ := arg max_θ S(θ, Ψf⁺, Ψg⁺, Λ⁺)
   c) Execute θ′, observe cost Ĵ(θ′)
   d) Append Θ := [Θ; θ′], y := [y; Ĵ(θ′)]
2) Return Θ, y

IV. EXPERIMENTS

A. Synthetic Data

As an illustrative example, in Figure 1 we compare the performance of VBO to standard Bayesian optimization in a simple 1-dimensional noisy optimization task. For this task, the true underlying cost distribution (Figure 1(a)) has two global minima with different cost variances. Both algorithms begin with the same N0 = 10 random samples and perform 10 iterations of EI selection (ξ = 1.0, ε = 0.25). In Figure 1(b), we see that Bayesian optimization succeeds in identifying the regions of low cost, but it cannot capture the policy-dependent variance characteristics.

In contrast, VBO reliably identifies the minima and approximates the local variance characteristics. Figure 1(d) shows the result of applying two different confidence bound selection criteria to vary risk sensitivity. Here we maximized −CB(θ∗, κ), where

CB(θ∗, κ) = E_q[Ĵ∗] + κ s∗   (19)

and κ = −1.5 and κ = 1.5 were used to select risk-seeking and risk-averse policy parameters, respectively.

B. Noisy Pendulum

As another simple example, we considered a swing-up task for a noisy pendulum system. In this task, the maximum torque output of the pendulum actuator is unknown and is drawn from a normal distribution at the beginning of each episode. As a rough physical analogy, this might be understood as fluctuations in motor performance that are caused by unmeasured changes in temperature. The policy space consisted of "bang-bang" policies in which the maximum torque is applied in the positive or negative direction, with switching times specified by two parameters, 0 ≤ t1, t2 ≤ 1.5 sec. Thus, θ = [t1, t2]. The cost function was defined as

J(θ) = ∫₀ᵀ ( 0.01 α(t) + 0.0001 u(t)² ) dt,   (20)
where 0 ≤ α(t) ≤ π is the pendulum angle measured from upright vertical, T = 3.5 sec, and u(t) = τmax if 0 ≤ t ≤ t1, u(t) = −τmax if t1 < t ≤ t1 + t2, and u(t) = τmax if t1 + t2 < t ≤ T. The system always started in the downward vertical position with zero initial velocity and the episode terminated if the pendulum came within 0.1 radians of the upright vertical position. The parameters of the system were l = 1.0 m, m = 1.0 kg, and τmax ∼ N(4, 0.3²) Nm. With these physical parameters, the pendulum must (with probability ≈ 1.0) perform at least two swings to reach vertical in less than T seconds.

Fig. 1. (a) A noisy cost function sampled during 10 iterations (N0 = 10) of (b) Bayesian optimization and (c) the VBO algorithm. Bayesian optimization succeeded in identifying the minima, but it cannot distinguish between high and low variance solutions. (d) Confidence bound selection criteria are used to select risk-seeking and risk-averse policy parameter values.

Fig. 2. The cost distribution for the simulated noisy pendulum system obtained by a 20x20 discretization of the policy space. Each policy was evaluated 100 times to estimate its mean and variance (N = 40000).

The cost function (20) suggests that policies that reach vertical as quickly as possible (i.e., using the fewest swings) are preferred. However, the success of an aggressive policy depends on the torque generating capability of the pendulum. With a noisy actuator, we expect aggressive policies to have higher variance. An approximation of the cost distribution obtained via discretization (N = 40000) is shown in Figure 2. Here we indeed see that regions around policies that attempt two-swing solutions (θ = [0.0, 1.0], θ = [1.0, 1.5]) have low expected cost, but high cost variance.

Figure 3 shows the results of 25 iterations of VBO using EI selection (N0 = 15, ξ = 1.0, ε = 0.2) in the noisy pendulum task. After N = 40 total evaluations, the expected cost and cost variance are sensibly represented in regions of low cost. Figure 4 illustrates two policies selected by minimizing the CB criterion (19) on the learned distribution with κ = ±2.0. The risk-seeking policy (θ = [1.03, 1.5]) makes a large initial swing, attempting to reach the vertical position in two swings. The risk-averse policy (θ = [0.63, 1.14]) always produces three swings and exhibits low cost variance, though it has higher cost than the risk-seeking policy when the maximum torque is large.

Fig. 3. Estimated cost distribution after 25 iterations of VBO (N = 40). The optimization algorithm focuses modeling effort on regions of low cost.

It is often easy to understand the utility of risk-averse and risk-neutral policies, but the motivation for selecting risk-seeking policies may be less clear. The above result suggests one possibility: the acquisition of specialized, high-performance policies. For example, in some cases risk-seeking policies could be chosen in an attempt to identify observable initial conditions that lead to rare low-cost events. Subsequent optimizations might then be performed to direct the system to these initial conditions. One could also imagine situations where the context might demand performance that lower risk policies are very unlikely to generate. For example, the minimum time to goal might be reduced so that only two-swing policies have a reasonable chance of succeeding. In such instances it may be desirable to select higher risk policies, even if the probability of succeeding is quite low.

C. Balance Recovery with the uBot-5

The uBot-5 is an 11-DoF mobile manipulator that has two 4-DoF arms, a rotating trunk, and two wheels in a differential drive configuration. The robot has a mass of 19 kg and stands 60 cm from the ground with a torso that is roughly similar to a small adult human in scale and geometry (Figure 5). The robot balances using a linear-quadratic regulator (LQR) with feedback from an onboard inertial measurement unit (IMU).

In our previous experiments [13], the energetic and stabilizing effects of rapid arm motions on the LQR stabilized system were evaluated in the context of recovery from impact
Fig. 4. Performance of risk-averse (a)-(c) and risk-seeking (d)-(f) policies as the maximum pendulum torque is varied: (a) τmax = 3.4 Nm, J(θ) = 18.2; (b) τmax = 4.0 Nm, J(θ) = 17.0; (c) τmax = 4.6 Nm, J(θ) = 15.9; (d) τmax = 3.4 Nm, J(θ) = 19.9; (e) τmax = 4.0 Nm, J(θ) = 17.7; (f) τmax = 4.6 Nm, J(θ) = 13.0.

perturbations. One observation we made was that high energy impacts caused a subset of possible recovery policies to have high cost variance: successfully stabilizing in some trials, while failing to stabilize in others. We extend these experiments by considering larger impact perturbations, increasing the set of arm initial conditions, and defining a policy space that permits more flexible, asymmetric arm motions.

The robot was placed in a balancing configuration with its upper torso aligned with a 3.3 kg mass suspended from the ceiling (Figure 5). The mass was pulled away from the robot to a fixed angle and released, producing a controlled impact between the swinging mass and the robot. The pendulum momentum prior to impact was 9.9 ± 0.8 Ns and the resulting impact force was approximately equal to the robot's total mass in earth gravity. The robot was consistently unable to recover from this perturbation using only the wheel LQR (see the rightmost column of Figure 6).

This problem is well suited for model-free policy optimization since there are several physical properties, such as joint friction, wheel backlash, and tire slippage, that make the system difficult to model accurately. In addition, although the underlying state and action spaces are high dimensional (22 and 8, respectively), low-dimensional policy spaces that contain high-quality solutions are relatively straightforward to identify.

The parameterized policy controlled each arm joint according to an exponential trajectory, τi(t) = e^(−λi t), where 0 ≤ τi(t) ≤ 1 is the commanded DC motor power for joint i at time t. The λ parameters were paired for the shoulder/elbow pitch and the shoulder roll/yaw joints. This pairing allowed the magnitude of dorsal and lateral arm motions to be independently specified. We commanded the pitch (dorsal) motions separately for each arm and mirrored the lateral motions, which reduced the number of policy parameters to 3. The range of each λi was constrained: 1 ≤ λi ≤ 15. At time t, if ∀i τi(t) < 0.25, the arms were retracted to a nominal fixed, low-gain linear position controller.

Fig. 5. The uBot-5 situated in the impact pendulum apparatus.

The cost function was designed to encourage energy efficient solutions that successfully stabilized the system,

J(θ) = h(x(T)) + ∫₀ᵀ (1/10) I(t) V(t) dt,   (21)

where I(t) and V(t) are the total absolute motor current and voltage at time t, respectively, T = 3.5 sec, and h(x(T)) = 5 if x(T) ∈ FailureStates, otherwise h(x(T)) = 0.

After 15 random initial trials, we applied VBO with EI selection (ξ = 1.0, ε = 0.2) for 15 episodes and randomized CB selection (κ ∼ N(0, 1)) for 15 episodes, resulting in a total of N = 45 policy evaluations. Since the left and right pitch parameters are symmetric with respect to cost, we imposed an arbitrary ordering constraint, λleft ≥ λright, during policy selection.

After training, we evaluated four policies with different risk sensitivity selected by minimizing the CB criterion (19) with κ = 2, κ = 0, κ = −1.5, and κ = −2. Each selected policy was evaluated 10 times and the results are shown in Figure 6. The sample statistics confirm the algorithmic predictions about the relative riskiness of each policy. In this case, the risk-averse and risk-neutral policies were very similar (no statistically significant difference between the mean or variance), while the two risk-seeking policies had higher variance (for κ = −2, the differences in both the sample mean and variance were statistically significant).

For κ = −2, the selected policy produced an upward laterally-directed arm motion that failed approximately 50% of the time. In this case, the standard deviation of cost was sufficiently large that the second term in equation (19) dominated, producing a policy with high variance and poor average performance. A slightly less risk-seeking selection (κ = −1.5) yielded a policy with conservative low-energy arm movements that was more sensitive to initial conditions than the lower risk policies. This exertion of minimal effort could be viewed as a kind of gamble on initial conditions. Figure 7 shows two successful trials executing risk-averse and
configuration (the mean of the initial configurations) using a risk-seeking policies.
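The exponential arm policy described above is simple to state in code. The sketch below is an illustration only (the function names are ours, not the authors' implementation); it evaluates τi(t) = e^(−λi t) and the retraction condition ∀i τi(t) < 0.25:

```python
import math

def tau(lam, t):
    # Commanded DC motor power for a joint with rate parameter lam at time t:
    # tau_i(t) = exp(-lam_i * t), so 0 <= tau_i(t) <= 1 for t >= 0.
    return math.exp(-lam * t)

def should_retract(lams, t, threshold=0.25):
    # The arms are retracted to the nominal configuration once every joint
    # command has decayed below the threshold (0.25 in the paper).
    return all(tau(lam, t) < threshold for lam in lams)
```

With each λi constrained to [1, 15], the retraction condition is met at the latest at t = ln 4 ≈ 1.39 s (the λi = 1 case).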

Fig. 7. Time series (duration: 1 second) showing two successful trials executing low-risk (top, κ = 2) and high-risk (bottom, κ = −2) policies selected using
confidence bound criteria on the learned cost distribution. The low-risk policy produced an asymmetric dorsally-directed arm motion with reliable recovery
performance. The high-risk policy produced an upward laterally-directed arm motion that failed approximately 50% of the time.
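Selecting among candidate policies with a confidence-bound rule can be sketched as follows. This is an illustration under an assumption: we take the criterion to have the form μ(θ) + κσ(θ) (the paper's equation (19) is not reproduced in this excerpt), so positive κ penalizes predicted cost variance (risk-averse) and negative κ rewards it (risk-seeking):

```python
def select_policy(means, stds, kappa):
    # Minimize the assumed confidence-bound criterion CB = mean + kappa * std
    # over a finite set of candidate policies; returns the winning index.
    scores = [m + kappa * s for m, s in zip(means, stds)]
    return min(range(len(scores)), key=scores.__getitem__)
```

For example, with predicted mean costs [1.0, 1.1] and standard deviations [0.5, 0.1], κ = 2 prefers the low-variance policy while κ = −2 prefers the high-variance one.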

Fig. 6. Data collected over 10 trials using policies identified as risk-averse, risk-neutral, and risk-seeking after performing VBO. The policies were selected using confidence bound criteria with κ = 2, κ = 0, κ = −1.5, and κ = −2, from left to right. The sample means and two times the sample standard deviations are shown. The shaded region on the top part of the plot contains all trials that resulted in failure to stabilize. Ten trials with a fixed-arm policy are plotted on the far right to serve as a baseline level of performance for this impact magnitude.

V. DISCUSSION AND FUTURE WORK

In many systems, it may be advantageous to adjust risk sensitivity based on runtime context, for example, when the environment changes in ways that make failures more or less costly (such as operating around catastrophic obstacles or in a safety harness), or when the context demands that the system seek out a low-probability high-performance event. Perhaps not surprisingly, this variable risk property has been observed in a variety of animal species, from simple motor tasks in humans to foraging birds and bees [2, 1].

However, most methods for learning policies by interaction focus on the risk-neutral minimization of expected cost. Extending Bayesian optimization methods to capture policy-dependent cost variance creates the opportunity to select policies with different risk sensitivity. Furthermore, the ability to change risk sensitivity at runtime offers an advantage over existing risk-sensitive control techniques, e.g., [21, 30], that require separate optimizations and policy executions to produce policies with different risk.

There are several properties of VBO that should be considered when determining its suitability for a particular problem. First, although the computational complexity is the same as Bayesian optimization, O(N³), the greater flexibility of the VHGP model means that VBO tends to require more initial policy evaluations than standard Bayesian optimization. In addition, many model-free policy search algorithms, such as Bayesian optimization, VBO, and stochastic gradient descent [25], are sensitive to the number of policy parameters: high-dimensional policies can require many trials to optimize. Thus, these algorithms are most effective in problems where low-dimensional policy spaces are available, but accurate system models are not. However, there is evidence that policy spaces of up to at least 15 dimensions can be efficiently explored with Bayesian optimization if estimates of the GP hyperparameters can be obtained a priori [17].

In contrast to local methods, such as policy gradient, Bayesian optimization and VBO can produce large changes in policy parameters between episodes, which could be undesirable in some situations. One approach to alleviating this potential problem is to combine VBO with local gradient methods. For example, one could imagine collecting data by performing gradient descent, rather than randomly selecting policies initially. In this case, both the samples obtained and the gradient estimates could be used to constrain the posterior cost distribution. In turn, the learned local cost distribution could act as a critic structure to reduce the variance of the policy update. Local offline optimization could be interwoven with the local policy updates to select greedy policies or change risk sensitivity using CB criteria. Some of these ideas have been explored in our recent work [14].

Another important consideration is the choice of kernel functions in the GP priors. In this work, we used the anisotropic squared exponential kernel to encode our prior assumptions regarding the smoothness and regularity of the underlying cost function. However, for many problems the underlying cost function is not smooth or regular; it contains flat regions and sharp discontinuities that can be difficult to represent. An interesting direction for future work is the use of kernel functions with local support, i.e., kernels that are not invariant to shifts in policy space [24].

VI. CONCLUSION

Varying risk sensitivity based on runtime context is a potentially powerful way to generate flexible control in robot systems. We considered this problem in the context of model-free policy search, where risk-sensitive policies can be selected based on an efficiently learned cost distribution. Our experimental results suggest that VBO is an efficient and plausible method for achieving risk-sensitive control.

ACKNOWLEDGMENTS

The authors would like to thank George Konidaris and the anonymous reviewers for their helpful comments. Scott Kuindersma was supported by a NASA GSRP Fellowship from Johnson Space Center. Roderic Grupen was supported by the ONR MURI award N00014-07-1-0749. Andrew Barto was supported by the AFOSR under grant FA9550-08-1-0418.

REFERENCES

[1] Melissa Bateson. Recent advances in our understanding of risk-sensitive foraging preferences. Proceedings of the Nutrition Society, 61:1–8, 2002.
[2] Daniel A. Braun, Arne J. Nagengast, and Daniel M. Wolpert. Risk-sensitivity in sensorimotor control. Frontiers in Human Neuroscience, 5:1–10, January 2011.
[3] Eric Brochu, Vlad Cora, and Nando de Freitas. A tutorial on Bayesian optimization of expensive cost functions, with application to active user modeling and hierarchical reinforcement learning. Technical Report TR-2009-023, University of British Columbia, Department of Computer Science, 2009.
[4] Adam D. Bull. Convergence rates of efficient global optimization algorithms. Journal of Machine Learning Research, 12:2879–2904, October 2011.
[5] Dennis D. Cox and Susan John. A statistical method for global optimization. In IEEE International Conference on Systems, Man and Cybernetics, volume 2, pages 1241–1246, 1992. doi: 10.1109/ICSMC.1992.271617.
[6] Marcus Frean and Phillip Boyle. Using Gaussian processes to optimize expensive functions. In AI 2008: Advances in Artificial Intelligence, pages 258–267, 2008.
[7] Paul W. Goldberg, Christopher K. I. Williams, and Christopher M. Bishop. Regression with input-dependent noise: A Gaussian process treatment. In Advances in Neural Information Processing Systems 10 (NIPS), pages 493–499, 1998.
[8] Steven G. Johnson. The NLopt nonlinear-optimization package. http://ab-initio.mit.edu/nlopt, 2011.
[9] Donald R. Jones. A taxonomy of global optimization methods based on response surfaces. Journal of Global Optimization, 21:345–383, 2001.
[10] Kristian Kersting, Christian Plagemann, Patrick Pfaff, and Wolfram Burgard. Most likely heteroscedastic Gaussian process regression. In Proceedings of the International Conference on Machine Learning (ICML), pages 393–400, 2010.
[11] Jens Kober and Jan Peters. Policy search for motor primitives in robotics. In Advances in Neural Information Processing Systems 21. MIT Press, 2009.
[12] J. Zico Kolter and Andrew Y. Ng. Policy search via the signed derivative. In Robotics: Science and Systems V (RSS), 2010.
[13] Scott Kuindersma, Roderic Grupen, and Andrew Barto. Learning dynamic arm motions for postural recovery. In Proceedings of the 11th IEEE-RAS International Conference on Humanoid Robots, pages 7–12, Bled, Slovenia, October 2011.
[14] Scott Kuindersma, Roderic Grupen, and Andrew Barto. Variable risk dynamic mobile manipulation. In RSS 2012 Workshop on Mobile Manipulation, Sydney, Australia, 2012.
[15] Miguel Lázaro-Gredilla and Michalis K. Titsias. Variational heteroscedastic Gaussian process regression. In Proceedings of the International Conference on Machine Learning (ICML), 2011.
[16] H. Levy and H. M. Markowitz. Approximating expected utility by a function of mean and variance. The American Economic Review, 69(3):308–317, June 1979.
[17] Daniel Lizotte, Tao Wang, Michael Bowling, and Dale Schuurmans. Automatic gait optimization with Gaussian process regression. In Proceedings of the Twentieth International Joint Conference on Artificial Intelligence (IJCAI), 2007.
[18] Daniel James Lizotte. Practical Bayesian Optimization. PhD thesis, University of Alberta, Edmonton, Alberta, 2008.
[19] Ruben Martinez-Cantin, Nando de Freitas, Arnaud Doucet, and José A. Castellanos. Active policy learning for robot planning and exploration under uncertainty. In Proceedings of Robotics: Science and Systems, 2007.
[20] Ruben Martinez-Cantin, Nando de Freitas, Eric Brochu, José A. Castellanos, and Arnaud Doucet. A Bayesian exploration-exploitation approach for optimal online sensing and planning with a visually guided mobile robot. Autonomous Robots, 27:93–103, 2009.
[21] Oliver Mihatsch and Ralph Neuneier. Risk-sensitive reinforcement learning. Machine Learning, 49:267–290, 2002.
[22] J. Močkus, V. Tiesis, and A. Žilinskas. The application of Bayesian methods for seeking the extremum. In Toward Global Optimization, volume 2, pages 117–128. Elsevier, 1978.
[23] Jan Peters and Stefan Schaal. Policy gradient methods for robotics. In Proceedings of the IEEE International Conference on Intelligent Robots and Systems (IROS), pages 2219–2225, 2006.
[24] Carl Edward Rasmussen and Christopher K. I. Williams. Gaussian Processes for Machine Learning. MIT Press, 2006.
[25] John W. Roberts and Russ Tedrake. Signal-to-noise ratio analysis of policy gradient algorithms. In Advances in Neural Information Processing Systems 21 (NIPS), 2009.
[26] Niranjan Srinivas, Andreas Krause, Sham Kakade, and Matthias Seeger. Gaussian process optimization in the bandit setting: No regret and experimental design. In Proceedings of the 27th International Conference on Machine Learning (ICML), 2010.
[27] Russ Tedrake, Teresa Weirui Zhang, and H. Sebastian Seung. Stochastic policy gradient reinforcement learning on a simple 3D biped. In Proceedings of the IEEE International Conference on Intelligent Robots and Systems (IROS), volume 3, pages 2849–2854, Sendai, Japan, September 2004.
[28] Matthew Tesch, Jeff Schneider, and Howie Choset. Using response surfaces and expected improvement to optimize snake robot gait parameters. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), San Francisco, CA, 2011.
[29] Evangelos Theodorou, Jonas Buchli, and Stefan Schaal. Reinforcement learning of motor skills in high dimensions: A path integral approach. In Proceedings of the IEEE International Conference on Robotics and Automation, Anchorage, Alaska, May 2010.
[30] Bart van den Broek, Wim Wiegerinck, and Bert Kappen. Risk sensitive path integral control. In Proceedings of the 26th Conference on Uncertainty in Artificial Intelligence (UAI), pages 615–622, 2010.
[31] Peter Whittle. Risk-sensitive linear/quadratic/Gaussian control. Advances in Applied Probability, 13:764–777, 1981.
[32] Peter Whittle. Risk-Sensitive Optimal Control. John Wiley & Sons, 1990.
[33] Aaron Wilson, Alan Fern, and Prasad Tadepalli. A behavior based kernel for policy search via Bayesian optimization. In Proceedings of the ICML 2011 Workshop: Planning and Acting with Uncertain Model, Bellevue, WA, 2011.

Time-Optimal Trajectory Generation for Path
Following with Bounded Acceleration and Velocity
Tobias Kunz and Mike Stilman
School of Interactive Computing, Center for Robotics and Intelligent Machines
Georgia Institute of Technology
Atlanta, GA 30332
Email: [email protected], [email protected]

Abstract—This paper presents a novel method to generate the time-optimal trajectory that exactly follows a given differentiable joint-space path within given bounds on joint accelerations and velocities. We also present a path preprocessing method to make nondifferentiable paths differentiable by adding circular blends. We introduce improvements to existing work that make the algorithm more robust in the presence of numerical inaccuracies. Furthermore, we validate our methods on hundreds of randomly generated test cases on simulated and real 7-DOF robot arms. Finally, we provide open source software that implements our algorithms.

Fig. 1. Circular blend around waypoint qi (labeled quantities: ℓi, αi, ri, ci, x̂i, ŷi, and the waypoints qi−1, qi, qi+1).
I. INTRODUCTION

To deal with the complexity of planning a robot motion, the problem is often subdivided into two or more subproblems. The first subproblem is that of planning a geometric path through the environment which does not collide with obstacles. In an additional step this path is converted into a time-parametrized trajectory that follows the path within the capabilities of the robot. Preferably, we are looking for a near-optimal trajectory according to some optimality criterion. In practice the capabilities of the robot cannot be expressed exactly. Thus, the capabilities of the robot are normally approximated by using some model of the robot and limiting certain quantities. One option is to model the dynamics of the robot and limit the torques that can be applied by the joints. In this paper we are using limits on joint accelerations and velocities, which are often available.

This paper presents a method to generate the time-optimal trajectory along a given path within given bounds on accelerations and velocities. The path can be given in any arbitrary configuration space. However, we assume that the acceleration and velocity of individual coordinates are limited. Thus, the configuration coordinates need to be chosen such that they match the quantities that need to be limited. For a robot manipulator the coordinates are typically chosen to match the joints of the robot. Thus, we will refer to the limits as joint acceleration and joint velocity limits. We assume that the path is differentiable with respect to some path parameter s. This is a weak assumption. If the path were not differentiable at a point, the robot would have to come to a complete stop at that point in order to follow the path. Then the path could be split at that point and the method presented here could be applied to each part of the path. We also assume that the path curvature is piecewise-continuous and that for every piece the path is coplanar.

The output of a typical geometric path planner is a path in configuration space consisting of continuous straight-line segments between waypoints. Such a path is not differentiable at the waypoints. Thus, we also present a preprocessing step to make such a path differentiable by adding circular blends.

II. RELATED WORK

One approach described in standard textbooks [1, 2] to generate a trajectory that satisfies acceleration and velocity constraints from a list of waypoints is to use linear segments with parabolic blends. However, the approach is not directly applicable to automatically generated paths with potentially dense waypoints. It assumes that the timing between the waypoints, and thus the velocities for the linear segments, is already known. However, a path normally does not include timing for the waypoints. Choosing the timing is not trivial. Generally we prefer to move fast, but if we move too fast, neighboring blend segments might overlap and render the trajectory invalid.

In [3] a very simple method for choosing the timing is presented. This method tries to resolve overlaps of blend segments locally by slowing down neighboring linear segments until no blend segments overlap. Yet, this method can lead to very slow trajectories, especially if the waypoints are close together.

The general approach used in this paper was introduced by Bobrow et al. [4] and by Shin and McKay [5]. We build on the latter. They propose to convert all joint constraints into constraints on the acceleration and velocity along the

path, which reduces the problem to one dimension, and then search in the two-dimensional phase plane for the optimal trajectory. They consider not only acceleration constraints but use a more general model. They are able to limit torque in the presence of robot dynamics or even to limit the voltage in the presence of motor dynamics. Partially due to this generality, their derivation is more complex than ours. We are therefore able to more thoroughly consider all possible special cases. [5] falsely claims that at every point along the limit curve there is only a single feasible acceleration. It also does not consider limits on joint velocities. Finally, Shin searches for switching points solely numerically, which is less efficient and makes the algorithm more likely to fail, especially in the presence of discontinuities in the path curvature.

The following papers have built on top of this general approach. All of them assume torque constraints. A couple of papers analyse conditions for switching points in order to be able to calculate them explicitly instead of searching for them numerically.

[6] notes that points where the limit curve is non-differentiable are potential switching points and gives a necessary condition for identifying them. It also states that there is more than one valid acceleration at these switching points.

[7] lists all possible cases for switching points and gives necessary conditions for some of them that can be used to explicitly calculate switching point candidates. However, they do not give sufficient conditions for switching points. Also, one of the cases still requires a numerical search. We will show that the case that requires a numerical search in [7] cannot happen in our case with acceleration limits instead of torque limits and a piecewise-planar path. We only use a numerical search for switching points caused by joint velocity limits.

[8] deals with those switching points along the limit curve where the acceleration is not uniquely determined. However, we will show that the acceleration they claim to be optimal at such a point is incorrect.

Finally, [9] considers joint velocity constraints in addition to torque constraints.

None of the previous work deals with numerical issues or reports how robustly their algorithms perform.

III. CONTRIBUTION

Our algorithms build on existing work. The novel contributions of this paper are the following:
• Combining path-following with a preprocessing step that converts the output of typical path planners to a differentiable path.
• A more thorough derivation of the constraints in the phase plane than [5]. At the same time our derivation is simpler, since we only cover a special case of [5].
• We show that for the case of constraints on only acceleration and piecewise-planar paths, the limit curve is never continuous and differentiable at a switching point. Thus, we can calculate all switching points along the acceleration limit curve explicitly. Previous work relied at least partially on numerical search, but also handled the more general case of torque constraints.
• [8] applies an incorrect optimal acceleration at switching points where the limit curve is non-differentiable. We demonstrate this and provide an alternative solution.
• We provide sufficient conditions for switching points instead of just necessary ones.
• An improvement to the algorithm that makes it more robust in the presence of numerical inaccuracies.
• Our algorithm is demonstrated to be sufficiently robust to follow over 100 randomly generated paths without failing.
• We provide open-source software, available for download at http://www.golems.org/node/1570.

IV. PATH PREPROCESSING

Common geometric path planners like PRMs or RRTs usually output the resulting path as a list of waypoints, which are connected by straight lines in configuration space. At a waypoint the path changes its direction instantaneously and thus is not differentiable. In order to follow such a path exactly, the robot would have to come to a complete stop at every waypoint. This would make the robot motion very slow and look unnatural. Therefore, we add circular blends around the waypoints, which make the path differentiable. If the path is already differentiable, the preprocessing described in this section can be omitted.

We are looking for a circular segment that starts tangential to the linear path segment before the waypoint and ends tangential to the linear path segment after the waypoint. We also need to ensure that the circular segment does not replace more than half of each of the neighboring linear segments. Otherwise we might not get a continuous path. In addition, we allow the enforcement of a maximum deviation from the original path.

First, we define some quantities that are helpful to define the circle (see also Figure 1). The unit vector ŷi pointing from waypoint qi−1 to qi is given by

    ŷi = (qi − qi−1) / ||qi − qi−1||   (1)

The angle αi between the two adjoining path segments of waypoint qi is given by

    αi = arccos(ŷi · ŷi+1)   (2)

The distance ℓi between waypoint qi and the points where the circle touches the linear segments is given as

    ℓi = min( ||qi − qi−1|| / 2,  ||qi+1 − qi|| / 2,  δ sin(αi/2) / (1 − cos(αi/2)) )   (3)

where the first two elements give the maximum possible distances such that the circular segment does not replace more than half of the adjoining linear segments, and the last element limits the radius to make sure the circular segment stays within a distance δ from waypoint qi.

Given the quantities above, we can now define the circular segment. The circle is defined by its center ci, its radius ri, and

two vectors x̂i and ŷi spanning the plane in which the circle lies. The vectors x̂i and ŷi are orthonormal. x̂i points from the center of the circle to the point where the circle touches the preceding linear path segment. ŷi is the previously defined direction of the preceding linear segment.

    ri = ℓi / tan(αi/2)   (4)

    ci = qi + (ŷi+1 − ŷi) / ||ŷi+1 − ŷi|| · ri / cos(αi/2)   (5)

    x̂i = (qi − ℓi ŷi − ci) / ||qi − ℓi ŷi − ci||   (6)

Given these quantities specifying the circle, we can calculate the robot configuration q for any point on the circular segment as a function f(s) of the arc length s traveled from the start of the path. As we are currently only considering the current circular path segment, we assume si ≤ s ≤ si + αi ri, where si specifies the start of the circular segment. Similarly, we can calculate the first and second derivatives of the function f(s). Note that these are not time derivatives but derivatives by s. We will make use of these derivatives later.

    q = f(s) = ci + ri ( x̂i cos(s/ri) + ŷi sin(s/ri) )   (7)

    f′(s) = −x̂i sin(s/ri) + ŷi cos(s/ri)   (8)

    f″(s) = −(1/ri) ( x̂i cos(s/ri) + ŷi sin(s/ri) )   (9)

V. REDUCTION TO ONE DIMENSION

The approach we are using was originally proposed in [5], which finds a minimum-time trajectory that satisfies torque limits on the joints. In contrast, we assume acceleration and velocity limits on the joints. Acceleration limits are a special case of torque limits. Using acceleration instead of torque limits results in a simpler problem and a simpler derivation of the following equations than in [5]. Unlike [5] and like [9], we are also considering velocity limits on the joints.

Because the solution is constrained to follow a given path exactly, the problem can be reduced to one dimension: choosing the velocity ṡ = ds/dt for every position s along the path. A path of length sf is given as a function f : [0, sf] → Rⁿ. The configuration q at a point s along the path is given by

    q = f(s),  0 ≤ s ≤ sf   (10)

where s can be an arbitrary parameter. We will assume it is the arc length traveled since the start of the path. We can also define the joint velocities and accelerations with respect to the parameter s:

    q̇ = (d/dt) f(s) = (d/ds) f(s) · (ds/dt) = f′(s) ṡ   (11)

    q̈ = f′(s) s̈ + f″(s) ṡ²   (12)

If s is the arc length and ṡ and s̈ are the velocity and acceleration along the path, then f′(s) is the unit vector tangent to the path and f″(s) is the curvature vector [8].

The constraints in the high-dimensional joint space need to be converted into constraints on the scalar path velocity ṡ(s) and path acceleration s̈(s, ṡ). The constraints on joint accelerations result in constraints on the acceleration and velocity along the path, as shown in Section V-A. The constraints on joint velocities result in constraints on the velocity along the path, as presented in Section V-B.

A. Joint Acceleration Limits

We have constraints on the joint accelerations given as

    −q̈imax ≤ q̈i ≤ q̈imax   ∀i ∈ [1, ..., n]   (13)

where q̈i is the ith component of the vector q̈. Although the universal quantifier is omitted in the following, all of the following inequalities have to hold for all i ∈ [1, ..., n].

    (13) ⇔ −q̈imax ≤ fi′(s) s̈ + fi″(s) ṡ² ≤ q̈imax   (14)

If fi′(s) > 0:

    (14) ⇔ −q̈imax/fi′(s) − (fi″(s)/fi′(s)) ṡ² ≤ s̈ ≤ q̈imax/fi′(s) − (fi″(s)/fi′(s)) ṡ²   (15)

         ⇔ −q̈imax/|fi′(s)| − (fi″(s)/fi′(s)) ṡ² ≤ s̈ ≤ q̈imax/|fi′(s)| − (fi″(s)/fi′(s)) ṡ²   (16)

If fi′(s) < 0:

    (14) ⇔ −q̈imax/fi′(s) − (fi″(s)/fi′(s)) ṡ² ≥ s̈ ≥ q̈imax/fi′(s) − (fi″(s)/fi′(s)) ṡ²   (17)

         ⇔ q̈imax/|fi′(s)| − (fi″(s)/fi′(s)) ṡ² ≥ s̈ ≥ −q̈imax/|fi′(s)| − (fi″(s)/fi′(s)) ṡ²   (18)

If fi′(s) = 0 and fi″(s) ≠ 0:

    (14) ⇔ −q̈imax/|fi″(s)| ≤ ṡ² ≤ q̈imax/|fi″(s)|   (19)

         ⇔ ṡ ≤ √( q̈imax / |fi″(s)| )   (20)

If fi′(s) = 0 and fi″(s) = 0, Eq. 14 is always satisfied.

Eq. 16 and Eq. 18 are equivalent. Thus, the limits on the path acceleration s̈ are

    s̈min(s, ṡ) ≤ s̈ ≤ s̈max(s, ṡ)   (21)

with

    s̈min(s, ṡ) = max over i ∈ [1, ..., n] with fi′(s) ≠ 0 of ( −q̈imax/|fi′(s)| − (fi″(s)/fi′(s)) ṡ² )   (22)

    s̈max(s, ṡ) = min over i ∈ [1, ..., n] with fi′(s) ≠ 0 of ( q̈imax/|fi′(s)| − (fi″(s)/fi′(s)) ṡ² )   (23)

The limit on the path acceleration also constrains the path velocity, because in order for a path velocity to be feasible we need s̈min(s, ṡ) ≤ s̈max(s, ṡ). We now derive the path velocity limit ṡmax_acc(s) caused by acceleration constraints. We get rid of the min and max functions in Eq. 22 and 23 by requiring the inequality to hold for all possible combinations of arguments to the min and max functions.
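Equations 22 and 23 translate into a short routine. The following sketch is our own illustration (not the authors' released library); it computes the feasible range of s̈ at a path point, given the joint-space derivatives fi′(s), fi″(s), the joint acceleration limits, and the current path velocity ṡ:

```python
def path_accel_bounds(f1, f2, qdd_max, sd):
    # Feasible interval [sdd_min, sdd_max] for the path acceleration
    # (Eqs. 22-23). Joints with f_i'(s) == 0 impose no bound on sdd here
    # (they bound the path velocity instead, Eq. 20).
    sdd_min, sdd_max = float('-inf'), float('inf')
    for fp, fpp, amax in zip(f1, f2, qdd_max):
        if fp == 0.0:
            continue
        shift = -fpp * sd * sd / fp   # common term of Eqs. 22 and 23
        sdd_min = max(sdd_min, -amax / abs(fp) + shift)
        sdd_max = min(sdd_max, amax / abs(fp) + shift)
    return sdd_min, sdd_max
```

An empty interval (sdd_min > sdd_max) signals that the chosen ṡ exceeds the acceleration limit curve ṡmax_acc(s).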

    s̈min(s, ṡ) ≤ s̈max(s, ṡ)   (24)

    ⇔ q̈imax/|fi′(s)| − (fi″(s)/fi′(s)) ṡ² − ( −q̈jmax/|fj′(s)| − (fj″(s)/fj′(s)) ṡ² ) ≥ 0
      ∀i, j ∈ [1, ..., n], fi′(s) ≠ 0, fj′(s) ≠ 0   (25)

    ⇔ −( fi″(s)/fi′(s) − fj″(s)/fj′(s) ) ṡ² + q̈imax/|fi′(s)| + q̈jmax/|fj′(s)| ≥ 0
      ∀i, j ∈ [1, ..., n], fi′(s) ≠ 0, fj′(s) ≠ 0   (26)

    ⇔ −| fi″(s)/fi′(s) − fj″(s)/fj′(s) | ṡ² + q̈imax/|fi′(s)| + q̈jmax/|fj′(s)| ≥ 0
      ∀i ∈ [1, ..., n], j ∈ [i + 1, ..., n], fi′(s) ≠ 0, fj′(s) ≠ 0   (27)

This gives a set of downward-facing parabolas in ṡ horizontally centered around the origin. Each parabola is positive within an interval around 0, which is the interval the feasible velocities may lie in. The positive bound of the interval can be found by setting the parabola equation to zero:

    −| fi″(s)/fi′(s) − fj″(s)/fj′(s) | ṡ² + q̈imax/|fi′(s)| + q̈jmax/|fj′(s)| = 0   (28)

    ⇔ ṡ = √( ( q̈imax/|fi′(s)| + q̈jmax/|fj′(s)| ) / | fi″(s)/fi′(s) − fj″(s)/fj′(s) | )   (29)

The interval of feasible path velocities ṡ is defined by the intersection of the feasible intervals of all parabolas. Thus, the upper bound for the path velocity is the minimum of all upper bounds given in Eq. 29. Combining this with the case from Eq. 20, the constraint on the path velocity caused by joint acceleration limits is given by

    ṡ ≤ ṡmax_acc(s)   (30)

with

    ṡmax_acc(s) = min{
      min over i ∈ [1, ..., n], j ∈ [i + 1, ..., n] with fi′(s) ≠ 0, fj′(s) ≠ 0 and fi″(s)/fi′(s) − fj″(s)/fj′(s) ≠ 0 of
        √( ( q̈imax/|fi′(s)| + q̈jmax/|fj′(s)| ) / | fi″(s)/fi′(s) − fj″(s)/fj′(s) | ),
      min over i ∈ [1, ..., n] with fi′(s) = 0, fi″(s) ≠ 0 of
        √( q̈imax / |fi″(s)| )
    }   (31)

B. Joint Velocity Limits

Constraints on the joint velocities are given as

    −q̇imax ≤ q̇i ≤ q̇imax   ∀i ∈ [1, ..., n]   (32)

Plugging Eq. 11 into Eq. 32 yields

    −q̇imax ≤ fi′(s) ṡ ≤ q̇imax   ∀i ∈ [1, ..., n]   (33)

If fi′(s) = 0, then Eq. 33 is always satisfied. Otherwise, because ṡ > 0, Eq. 33 is equivalent to

    ṡ ≤ q̇imax / |fi′(s)|   ∀i ∈ [1, ..., n]   (34)

Thus, the constraint on the path velocity caused by limits on the joint velocities is given by

    ṡ ≤ ṡmax_vel(s)   (35)

with

    ṡmax_vel(s) = min over i ∈ [1, ..., n] with fi′(s) ≠ 0 of q̇imax / |fi′(s)|   (36)

We will make use of the slope of this limit curve in the phase plane, which is the derivative by s, given by

    (d/ds) ṡmax_vel(s) = −q̇imax fi″(s) / ( fi′(s) |fi′(s)| ),  where i = arg min over i ∈ [1, ..., n] with fi′(s) ≠ 0 of q̇imax / |fi′(s)|   (37)

VI. ALGORITHM

Figure 2 shows the s-ṡ phase plane. The start point is in the bottom-left corner and the end point in the bottom-right corner. The two limit curves limiting the path velocity, which are caused by joint acceleration and joint velocity constraints respectively, are shown. The trajectory must stay below these two limit curves. We want to find a trajectory that maximizes the path velocity ṡ at every point along the path. While not on the limit curve, the path acceleration must be at one of its limits, i.e. s̈min or s̈max. Thus, at every point of the phase plane below the limit curve there are two possible directions to move in: one applying minimum acceleration and the other one applying maximum acceleration. The algorithm needs to find the switching points between minimum and maximum acceleration. In Figure 2 switching points are marked by arrows. Switching points from minimum to maximum acceleration must be on the limit curve, because otherwise we could find a faster trajectory above the solution trajectory [5].

The high-level algorithm is described below. It differs slightly from [5]. We integrate backward from the end point as the very last step. This makes the algorithm slightly simpler to implement, because while integrating forward we can never intersect with another trajectory part, and while integrating backward we can never reach the limit curve. Section VIII gives some more implementation details. The numbers in Figure 2 correspond to the steps of the algorithm.

1) Start from the start of the path, i.e. s = 0 and ṡ = 0.
2) Integrate forward with maximum acceleration s̈ = s̈max(s, ṡ) until one of the following conditions is met.
   • If s ≥ sf, continue from the end of the path, i.e. s = sf and ṡ = 0, and go to step 5.
   • If ṡ > ṡmax_acc(s), go to step 4.
   • If ṡ > ṡmax_vel(s), go to step 3.

212
[Fig. 2. Phase-plane trajectory: path velocity ṡ over path position s, showing the acceleration limit curve, the velocity limit curve and the time-optimal trajectory; the numbered markers 1)–5) correspond to the steps of the algorithm.]
3) Follow the limit curve ṡmax_vel(s) until one of the following conditions is met.
   • If s̈max(s, ṡmax_vel(s)) / ṡmax_vel(s) < (d/ds) ṡmax_vel(s), go back to step 2.
   • If s̈min(s, ṡmax_vel(s)) / ṡmax_vel(s) > (d/ds) ṡmax_vel(s), go to step 4.
4) Search along the combined limit curve for the next switching point. See Section VII.
   • Continue from the switching point and go to step 5.
5) Integrate backward with minimum acceleration until the start trajectory is hit. (If the limit curve is hit instead, the algorithm failed. We used this condition to detect the failures shown in Table I.) The point where the start trajectory is intersected is a switching point. Replace the part of the start trajectory after that switching point with the trajectory just generated.
   • If we transitioned into this step from step 2, halt. The start trajectory has reached the end of the path with s = sf and ṡ = 0.
   • Otherwise, continue from the end of the start trajectory, which is the switching point found in step 4, and go to step 2.

VII. SWITCHING POINTS

With the exception of a finite number of points, at every point on the acceleration limit curve there is only a single feasible acceleration. If this acceleration leads into the feasible region, the point on the limit curve is called a trajectory source [6]. If this acceleration leads out of the feasible region, the point is called a trajectory sink. We define trajectory sources and sinks similarly for the velocity limit curve when all feasible accelerations lead into or out of the feasible region. If the interval of feasible accelerations allows following the velocity limit curve, we call the point singular.

A switching point along a limit curve is a point where the limit curve switches from being a trajectory sink to being a trajectory source or to being singular. The limit curve is the curve given by the minimum of the acceleration limit curve and the velocity limit curve. A switching point of the acceleration or velocity limit curve is a switching point of the combined limit curve if the limit curve the switching point is on is the lower one. The following two sections deal with finding the switching points on both the acceleration and the velocity limit curves.

A. Caused by Acceleration Constraints

This section describes how to find switching points along the path velocity limit curve ṡmax_acc(s) caused by constraints on the joint accelerations. We distinguish three cases of these switching points, depending on whether the curve is continuous and/or differentiable at the switching point.

1) Discontinuous: ṡmax_acc(s) is discontinuous if and only if the path curvature f''(s) is discontinuous [7]. For a path generated as described in Section IV, the discontinuities are exactly the points where the path switches between a circular segment and a straight-line segment. If the path's curvature f''(s) is continuous everywhere, this case of switching points does not occur.

At a discontinuity s there are two path velocity limits, ṡmax_acc(s−) and ṡmax_acc(s+). The switching point is always at the smaller one of the two. If the discontinuity is a positive step and there is a trajectory sink in the negative direction of the discontinuity, then the discontinuity is a switching point. Equally, if the discontinuity is a negative step and there is a trajectory source in the positive direction of the discontinuity, then the discontinuity is a switching point. More formally, a discontinuity of ṡmax_acc(s) is a switching point if and only if

( ṡmax_acc(s−) < ṡmax_acc(s+) ∧ s̈max(s−, ṡmax_acc(s−)) / ṡmax_acc(s−) ≥ (d/ds) ṡmax_acc(s−) )
∨ ( ṡmax_acc(s−) > ṡmax_acc(s+) ∧ s̈max(s+, ṡmax_acc(s+)) / ṡmax_acc(s+) ≤ (d/ds) ṡmax_acc(s+) )   (38)
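Finding these switching points requires evaluating the acceleration limit curve ṡmax_acc(s) of Eq. 31 pointwise. The following is a minimal sketch of that evaluation, our own illustration with hypothetical function and variable names, not the authors' implementation; it assumes the per-joint path derivatives f'_i(s) and f''_i(s) are available as arrays at a fixed s:

```python
import numpy as np

def max_path_velocity_acc(f1, f2, qdd_max, eps=1e-9):
    """Pointwise evaluation of the acceleration limit curve (Eq. 31).

    f1, f2  : per-joint first and second path derivatives f'_i(s), f''_i(s)
              at a fixed path position s
    qdd_max : per-joint acceleration limits
    Returns an upper bound on the path velocity s-dot at s.
    """
    n = len(f1)
    bound = np.inf
    # Pairwise case: both f'_i(s) and f'_j(s) nonzero (Eq. 29).
    for i in range(n):
        if abs(f1[i]) < eps:
            continue
        for j in range(i + 1, n):
            if abs(f1[j]) < eps:
                continue
            denom = f2[i] / f1[i] - f2[j] / f1[j]
            if abs(denom) < eps:
                continue
            num = qdd_max[i] / abs(f1[i]) + qdd_max[j] / abs(f1[j])
            bound = min(bound, np.sqrt(num / abs(denom)))
    # Zero-inertia case: f'_i(s) = 0 (second min in Eq. 31).
    for i in range(n):
        if abs(f1[i]) < eps and abs(f2[i]) > eps:
            bound = min(bound, np.sqrt(qdd_max[i] / abs(f2[i])))
    return bound
```

The double loop realizes the pairwise bound of Eq. 29, while the second loop handles the zero-inertia joints with f'_i(s) = 0.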
[Fig. 3. Minimum and maximum acceleration trajectories near a switching point where the limit curve is nondifferentiable.]

[Fig. 4. Joint acceleration space (q̈1, q̈2, q̈3) for n = 3 with box constraints and a trajectory for a switching point that meets the limit curve tangentially.]
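The discontinuity test of Eq. 38 can be phrased as a small predicate. The sketch below is illustrative only: the callables standing in for s̈max(s, ṡ) and for the one-sided slope of ṡmax_acc are assumptions of ours, and the comparison is written in phase-slope form (s̈/ṡ against dṡ/ds):

```python
def is_discontinuity_switching_point(s, v_minus, v_plus, sdd_max, dv_acc, eps=1e-6):
    """Test Eq. 38 at a discontinuity s of the acceleration limit curve.

    v_minus, v_plus : limit-curve values just before/after s
    sdd_max(s, v)   : maximum feasible path acceleration at phase point (s, v)
    dv_acc(s, side) : one-sided slope d/ds of the acceleration limit curve,
                      side being '-' or '+'
    (All names and interfaces are hypothetical.)
    """
    if v_minus < v_plus - eps:
        # positive step: switching point if, on the left side, even the
        # maximum acceleration keeps the trajectory on or above the curve
        return sdd_max(s, v_minus) / v_minus >= dv_acc(s, '-')
    if v_minus > v_plus + eps:
        # negative step: symmetric test on the right side of the step
        return sdd_max(s, v_plus) / v_plus <= dv_acc(s, '+')
    return False
```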
2) Continuous and Nondifferentiable: The original paper [5] introducing the approach claims that every point along the limit curve only has a single allowable acceleration, with s̈min(s, ṡmax_acc(s)) = s̈max(s, ṡmax_acc(s)). However, this is inaccurate. As [6] notes, there are points along the limit curve where there is an interval of allowable accelerations, i.e. s̈min(s, ṡmax_acc(s)) < s̈max(s, ṡmax_acc(s)). These points are called critical points in [8] and zero-inertia points in [7]. At these points the limit curve is continuous but nondifferentiable [8]. As noted in [6], a necessary condition for such switching points is

∃i : f'i(s) = 0   (39)

For a path generated from waypoints as described in Section IV, points satisfying Eq. 39 can be easily calculated. There are at most n per circular segment.

[6] does not note that choosing the correct acceleration at such a switching point might be an issue and proposes to integrate backward from such a switching point with minimum acceleration and forward with maximum acceleration as usual. [8] notices that the acceleration at such a switching point needs special consideration: the maximum or minimum acceleration cannot always be followed, because it would lead into the infeasible region of the phase plane. [8] calls such a point a singular point and proposes to follow the limit curve tangentially. However, both [6] and [8] are in error.

Figure 3 shows a switching point at a point where the limit curve is nondifferentiable. At the switching point two arrows indicate the direction of motion in the phase plane when applying minimum or maximum acceleration, respectively. In the feasible region of the phase plane, gray curves visualize two vector fields showing minimum- and maximum-acceleration trajectories. Minimum-acceleration trajectories are dashed, maximum-acceleration trajectories are solid. In the case shown here, integrating backwards with minimum acceleration would enter the infeasible region. Thus, according to [8] we would have to follow the red limit curve tangentially backwards from the switching point. For integrating forward, [6] and [8] propose to follow the upward arrow. However, both of these motions are not possible, as they would not move along the vector field surrounding the switching point. Zero is the only acceleration at the switching point that conforms with the vector field around it, resulting in a trajectory moving horizontally in the phase plane.

We claim that the optimal acceleration is zero at every such switching point. We leave the proof for future work, but all of our experimental data supports this claim.

In order for the zero acceleration to be feasible, the limit curve must switch from a negative to a positive slope at the switching point. We claim that the limit curve switching from a negative to a positive slope is a necessary and sufficient condition for a switching point at a point of nondifferentiability of the limit curve. Together with the necessary condition stated in Eq. 39, this allows for an explicit enumeration of all such switching points.

3) Continuous and Differentiable: According to the following lemma, this case does not exist.

Lemma 1: If only joint accelerations are constrained, the path is piecewise coplanar, and the curvature f''(s) is discontinuous at the points where the pieces are stitched together, then there are no switching points at which the limit curve is continuous and differentiable.

Proof: Before the switching point the path acceleration is at its lower limit. This implies that at least one joint is at its acceleration limit. After the switching point the path acceleration is at its upper limit. This implies that at least one joint distinct from the previous one is at its acceleration limit. At the switching point both joints are at their acceleration limits. The limit curve, the phase-plane trajectory and the joint-space trajectory are continuous and differentiable near the switching point. One joint approaches its acceleration limit at the switching point and then stays at the limit. Another, distinct joint is at its acceleration limit and leaves the limit at the switching point. Figure 4 shows the joint acceleration space for a 3-joint system with the joint acceleration limits shown as a box. At the switching point two joints are at their acceleration limits; thus, the trajectory is on an edge of the box at the switching point. Before the switching point the trajectory stays on one surface of the box and approaches the edge tangentially. After the switching point the trajectory leaves the edge tangentially while staying on another surface of the constraint box. This trajectory lives in at least a three-dimensional subspace of the joint acceleration space: there is no two-dimensional subspace that contains the trajectory in the vicinity of the switching point. However, because the path is piecewise planar, the accelerations in joint space are also piecewise planar and must be contained in a two-dimensional subspace. This is a contradiction, so a switching point where the limit curve is differentiable does not exist. The only case where such a switching point can exist
is where two planar parts of the path are stitched together. However, these points are those where the curvature f''(s) is discontinuous, and thus the limit curve is discontinuous there as well.

B. Caused by Velocity Constraints

This section describes how to find switching points along the path velocity limit curve ṡmax_vel(s) caused by constraints on the joint velocities. We distinguish two cases of these switching points, depending on whether s̈min(s, ṡmax_vel(s)) is continuous.

1) Continuous: s is a possible switching point if and only if

s̈min(s, ṡmax_vel(s)) / ṡmax_vel(s) = (d/ds) ṡmax_vel(s)   (40)

We search for these switching points numerically by stepping along the limit curve until we detect a sign change. Then we use bisection to determine the switching point more accurately. See also Section VIII-B.

2) Discontinuous: f'i(s) being discontinuous is a necessary condition for s̈min(s, ṡmax_vel(s)) being discontinuous. s is a possible switching point if and only if

( s̈min(s−, ṡmax_vel(s−)) / ṡmax_vel(s−) ≥ (d/ds) ṡmax_vel(s−) )   (41)
∧ ( s̈min(s+, ṡmax_vel(s+)) / ṡmax_vel(s+) ≤ (d/ds) ṡmax_vel(s+) )   (42)

VIII. NUMERICAL CONSIDERATIONS

When doing floating-point calculations we must accept approximate results. However, we cannot accept the algorithm failing completely due to numerical inaccuracies. We now describe how numerical inaccuracies can make the algorithm described in Section VI fail and the measures we have taken to avoid that. None of the previous papers on this approach [5–9] have dealt with numerical issues.

A. Integration

The algorithm described in Section VI assumes that we can integrate the trajectory exactly. Under this assumption it is impossible to hit the limit curve at a trajectory source when integrating forward. However, since we have to do the integration numerically with some non-zero step size, it is not exact. This can lead to the limit curve being hit at a trajectory source.

Figure 5 shows an example of the trajectory hitting the limit curve at a trajectory source. The figure shows the limit curve together with an accurate trajectory generated using a small step size and a less accurate trajectory generated with a 10 times larger step size (1 ms). The trajectories enter the figure on the left at minimum acceleration until they touch the limit curve. Then they switch to maximum acceleration. In the following, the accurate trajectory gets close to the limit curve but does not hit it. The inaccurate trajectory, however, because of the larger step size, misses the bend shortly before the limit curve and hits the limit curve. According to the algorithm described in Section VI and the algorithms described in previous work [5, 6, 8, 9], we would have to stop the forward integration, search for the next switching point along the limit curve and integrate backward from there until we hit the forward trajectory. However, when integrating backward from the next switching point, we might hit the limit curve instead of the forward trajectory. This is because the algorithm relies on the fact that there are no trajectory sources between the point where the forward trajectory hits the limit curve and the next switching point, which is not the case if the forward trajectory hits the limit curve at a trajectory source.

[Fig. 5. Dealing with integration inaccuracies. Phase plane (path position s vs. path velocity ṡ) showing the limit curve, an accurate trajectory and an inaccurate trajectory: (a) failure without the source/sink check; (b) success with the source/sink check.]

To deal with this issue, we determine the intersection point with the limit curve by bisection and then check whether this point is a trajectory source by comparing the slope of the limit curve with the slope of the maximum-acceleration trajectory. Figure 5 shows the result of this measure. The inaccurate trajectory does not follow the accurate one exactly, but it does not stop when the limit curve is hit. Thus the algorithm succeeds.

Note that although the smaller step size solved the problem in the example shown in Figure 5, it does not do so in general. In general, a smaller step size or a better integration method only makes it less likely that the limit curve is hit at a trajectory source. Thus, the method described above is necessary to ensure completeness of the algorithm.
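This countermeasure, bisecting for the limit-curve intersection and then testing whether it is a trajectory source, can be sketched as follows. This is our own illustrative Python, not the authors' implementation; the callables standing in for the limit curve and for the phase slope s̈max/ṡ of the maximum-acceleration trajectory are hypothetical:

```python
def hit_limit_curve(s_lo, s_hi, v_of_s, limit_curve, tol=1e-8):
    """Bisect for the path position where the forward trajectory
    crosses the limit curve; v_of_s(s) is the trajectory's path
    velocity and limit_curve(s) the limit-curve value."""
    while s_hi - s_lo > tol:
        s_mid = 0.5 * (s_lo + s_hi)
        if v_of_s(s_mid) < limit_curve(s_mid):
            s_lo = s_mid      # still below the limit curve here
        else:
            s_hi = s_mid      # already on or above the limit curve
    return 0.5 * (s_lo + s_hi)

def is_trajectory_source(s, limit_curve, phase_slope, h=1e-6):
    """A limit-curve point is a trajectory source if its feasible
    acceleration leads back into the feasible region, i.e. the
    resulting phase slope does not exceed the limit curve's slope."""
    curve_slope = (limit_curve(s + h) - limit_curve(s - h)) / (2 * h)
    return phase_slope(s, limit_curve(s)) <= curve_slope
```

If the intersection is found to be a source, the forward integration can continue from it instead of triggering the backward-integration step.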
B. Switching Point Search

All previous work at least partially relies on numerical search to find switching points, but none notes the potential issues caused by using it. We also use numerical search, but only along the velocity limit curve. If we only had acceleration constraints, numerical search would not be necessary. We try to avoid numerical search as much as possible, because it is slower, less accurate and might make the algorithm fail.

Our algorithm relies on the fact that we can find the next switching point along the limit curve. However, by searching numerically and stepping along the limit curve, we could theoretically miss a switching point and find one that is not the next one. The algorithm could potentially be adapted to work with any switching point without requiring it to be the next one. However, during our experiments we did not encounter a failure caused by missing a switching point.

IX. EXPERIMENTAL RESULTS

We evaluated our approach by generating trajectories for a 7-DOF robot arm that is given the task of picking and placing a bottle from or onto a shelf or table. The results are averages over 100 pick-and-place operations. For each pick-and-place operation we randomly selected a pick and a place location. Figure 6 shows the 200 pick and place locations. We used a bidirectional RRT to plan a configuration-space path for the arm. Each pick-and-place operation consists of 3 path segments: from the initial arm configuration to the pick configuration, to the place configuration, and back to the initial configuration. In our evaluation we ignore the finger motion necessary to grasp the object. The robot shown in Figure 6 uses only its left arm. The arm configuration shown is the initial arm configuration, which is reached at the beginning and end of every pick-and-place operation. The paths generated by the RRT planner are shortened using random shortcutting. The result is a set of paths given by waypoints at most 0.1 apart. Here we evaluate the trajectory generation from these waypoints. This method of generating example paths is very real-world oriented and does not allow us to handcraft waypoints until the trajectory generation accidentally succeeds.

[Fig. 6. 200 start and goal locations and a few example pick-and-place operations.]

Table I shows computation and execution times for different time steps used for the integration of the trajectory. The computation time was achieved on a single core of an Intel Core i5 at 2.67 GHz. Smaller time steps lead to a longer computation time and a slightly closer-to-optimal path. What is not shown here is that a smaller step size also leads to the constraints being satisfied more accurately. Related to that is the fact that a larger time step makes the algorithm more likely to fail if the countermeasures for numerical inaccuracies described in Section VIII-A are omitted. Our algorithm is robust enough to successfully generate trajectories for 100 random pick-and-place operations using different time steps.

TABLE I
RESULTS FOR VARYING TIME STEP

Time Step | Computation Time | Execution Time | Failure rate (without source/sink check) | Failure rate (with source/sink check)
10 ms     | 0.2 s            | 4.74 s         | 13 %                                     | 0 %
1 ms      | 0.9 s            | 4.71 s         | 3 %                                      | 0 %
0.1 ms    | 31.8 s           | 4.70 s         | 0 %                                      | 0 %

We also ran our algorithm on a real robot arm following a path defined by hand-crafted waypoints. A video of this is available at http://www.golems.org/node/1570.

X. CONCLUSION

We presented an algorithm to follow a path in a time-optimal manner while satisfying joint acceleration and velocity constraints. We added blends to a list of waypoints in order to be able to use the output of standard path planners. Our improvements to existing work make the algorithm more robust and less likely to fail. We showed the robustness of our implementation by following paths planned by an RRT planner for 100 random pick-and-place operations.

ACKNOWLEDGEMENTS

This work is supported by Toyota Motor Engineering & Manufacturing North America (TEMA). We appreciate their great contribution to our robotics research and education.

REFERENCES

[1] J. J. Craig. Introduction to Robotics: Mechanics and Control (3rd Edition). Prentice Hall, 2004.
[2] B. Siciliano, L. Sciavicco, and L. Villani. Robotics: Modelling, Planning and Control. Springer, 2009.
[3] T. Kunz and M. Stilman. Turning Paths Into Trajectories Using Parabolic Blends. Technical Report GT-GOLEM-2011-006, Georgia Institute of Technology, 2011.
[4] J. E. Bobrow, S. Dubowsky, and J. S. Gibson. Time-optimal control of robotic manipulators along specified paths. The Int. Journal of Robotics Research, 4(3):3–17, 1985.
[5] K. Shin and N. McKay. Minimum-time control of robotic manipulators with geometric path constraints. IEEE Transactions on Automatic Control, 30(6):531–541, 1985.
[6] F. Pfeiffer and R. Johanni. A concept for manipulator trajectory planning. IEEE Journal of Robotics and Automation, 3(2):115–123, 1987.
[7] J.-J. E. Slotine and H. S. Yang. Improving the efficiency of time-optimal path-following algorithms. IEEE Transactions on Robotics and Automation, 5(1):118–124, 1989.
[8] Z. Shiller and H. H. Lu. Computation of path constrained time optimal motions with dynamic singularities. Journal of Dynamic Systems, Measurement, and Control, 114:34–40, 1992.
[9] L. Zlajpah. On time optimal path control of manipulators with bounded joint velocities and torques. In Proc. of IEEE Int. Conf. on Robotics and Automation, 1996.
Towards A Swarm of Agile Micro Quadrotors

Alex Kushleyev, Daniel Mellinger, Vijay Kumar
GRASP Lab, University of Pennsylvania

Abstract—We describe a prototype 73 gram, 21 cm diameter micro quadrotor with onboard attitude estimation and control that operates autonomously with an external localization system. We argue that the reduction in size leads to agility and the ability to operate in tight formations, and we provide experimental arguments in support of this claim. The robot is shown to be capable of 1850°/sec roll and pitch rates, performs a 360° flip in 0.4 seconds, and exhibits a lateral step response of 1 body length in 1 second. We describe the architecture and algorithms to coordinate a team of quadrotors, organize them into groups, and fly through known three-dimensional environments. We provide experimental results for a team of 20 micro quadrotors.

[Fig. 1. A formation of 20 micro quadrotors in flight. See video at http://youtu.be/50Fdi7712KQ]

I. INTRODUCTION
The last decade has seen rapid progress in micro aerial robots, autonomous aerial vehicles that are smaller than 1 meter in scale and 1 kg or less in mass. Winged aircraft range from fixed-wing vehicles [14] to flapping-wing vehicles [6], the latter mostly inspired by insect flight. Rotorcraft, including helicopters, coaxial rotorcraft [9], ducted fans [22], quadrotors [10] and hexarotors, have proved to be more mature [15], with quadrotors being the most commonly used aerial platform in robotics research labs. In this class, the Hummingbird quadrotor sold by Ascending Technologies, GmbH [2], with a tip-to-tip wingspan of 55 cm, a height of 8 cm, a mass of about 500 grams including a Lithium Polymer battery, and a power consumption of about 75 Watts, is a remarkably capable and robust platform, as shown in [16, 17].

Of course, micro aerial robots have a fundamental payload limitation that is difficult to overcome in many practical applications. However, larger payloads can be manipulated and transported by multiple UAVs, either using grippers or cables [20]. Applications such as surveillance or search and rescue that require coverage of large areas or imagery from multiple sensors can be addressed by coordinating multiple UAVs, each with different sensors.

Our interest in this paper is scaling down the quadrotor platform to develop a truly small micro UAV. The most important and obvious benefit of scaling down in size is the ability of the quadrotor to operate in tightly constrained environments in tight formations. While the payload capacity of the quadrotor falls dramatically, it is possible to deploy multiple quadrotors that cooperate to overcome this limitation. Again, the small size benefits us because smaller vehicles can operate in closer proximity than large vehicles. Another interesting benefit of scaling down is agility. As argued later and illustrated with experimental results, smaller quadrotors exhibit higher accelerations, allowing more rapid adaptation to disturbances and higher stability.

II. AGILITY OF MICRO QUADROTORS

It is useful to develop a simple physics model to analyze a quadrotor's ability to produce linear and angular accelerations from a hover state. If the characteristic length is L, the rotor radius R scales linearly with L. The mass scales as L³ and the moments of inertia as L⁵. On the other hand, the lift or thrust, F, and drag, D, from the rotors scale with the cross-sectional area and the square of the blade-tip velocity, v. If the angular speed of the blades is defined by ω = v/L, then F ∼ ω²L⁴ and D ∼ ω²L⁴. The linear acceleration a therefore scales as a ∼ ω²L⁴/L³ = ω²L. Thrusts from the rotors produce a moment with a moment arm L. Thus the angular acceleration α ∼ ω²L⁵/L⁵ = ω².

The rotor speed ω also scales with length, since smaller motors produce less torque, which limits their peak speed because of the drag resistance that scales the same way as the lift. There are two commonly accepted approaches to studying scaling in aerial vehicles [28]. Mach scaling is used for compressible flows and essentially assumes that the tip velocities are constant, leading to ω ∼ 1/R. Froude scaling is used for incompressible flows and assumes that for similar aircraft configurations the Froude number, v²/(Lg), is constant. Here g is the acceleration due to gravity. This yields ω ∼ 1/√R. However, neither Froude nor Mach number similitude takes motor characteristics or battery properties into account. While motor torque increases with length, the operating speed of the rotors is determined by matching the torque-speed characteristics of the motor to the drag-versus-speed characteristics of the propellers. Further, the motor torque depends on the ability of the battery to source the required current. All these variables are tightly coupled for smaller designs, since there are fewer choices available at smaller length scales. Finally, the assumption that propeller blades are rigid may be wrong and the performance of the blades can be very different at smaller scales; the quadratic scaling of the lift with speed may
not be accurate. Nevertheless, these two cases are meaningful since they provide some insight into the physics underlying the maneuverability of the craft.

[Fig. 2. A prototype micro quadrotor.]

[Fig. 3. Attitude controller performance data: (a) pitch angle step input response (pitch angle in degrees over time in seconds); (b) data for the flipping maneuver.]

Froude scaling suggests that the acceleration is independent of length, while the angular acceleration α ∼ L⁻¹. On the other hand, Mach scaling leads to the conclusion that a ∼ L⁻¹ while α ∼ L⁻². Since quadrotors must rotate (exhibit angular accelerations) in order to translate, smaller quadrotors are much more agile.

There are two design points that are illustrative of the quadrotor configuration. The Pelican quadrotor from Ascending Technologies [2] equipped with sensors (approx. 2 kg gross weight, 0.75 m diameter, and 5400 rpm nominal rotor speed at hover) consumes approximately 400 W of power [25]. The Hummingbird quadrotor from Ascending Technologies (500 grams gross weight, approximately 0.5 m diameter, and 5000 rpm nominal rotor speed at hover) without additional sensors consumes about 75 W. In this paper, we outline a design for a quadrotor which is approximately 40% of the size of the Hummingbird, 15% of its mass, and consumes approximately 20% of the power for hovering.

III. THE MICRO QUADROTOR

A. The Vehicle

The prototype quadrotor is shown in Figure 2. Its booms are made of carbon fiber rods which are sandwiched between a custom motor controller board on the bottom and the main controller board on the top. To produce lift the vehicle uses four fixed-pitch propellers with diameters of 8 cm. The vehicle's propeller-tip-to-propeller-tip distance is 21 cm and its weight without a battery is 50 grams. The hover time is approximately 11 minutes with a 2-cell 400 mAh Li-Po battery that weighs 23 grams.

B. Electronics

Despite its small size, this vehicle contains a full suite of onboard sensors. An ARM Cortex-M3 processor, running at 72 MHz, serves as the main processor. The vehicle contains a 3-axis magnetometer, a 3-axis accelerometer, a 2-axis 2000 deg/sec rate gyro for the roll and pitch axes, and a single-axis 500 deg/sec rate gyro for the yaw axis. The vehicle also contains a barometer which can be used to sense a change in altitude. For communication the vehicle contains two Zigbee transceivers that can operate at either 900 MHz or 2.4 GHz.

C. Software Infrastructure

The work in this paper uses a Vicon motion capture system [5] to sense the position of each vehicle at 100 Hz. This data is streamed over a gigabit ethernet network to a desktop base station. High-level control and planning is done in MATLAB on the base station, which sends commands to each quadrotor at 100 Hz. The software for controlling a large team of quadrotors is described later in Sec. V (see Fig. 7). Low-level estimation and control loops run on the onboard microprocessor at a rate of 600 Hz.

Each quadrotor has two independent radio transceivers, operating at 900 MHz and 2.4 GHz. The base station sends, via custom radio modules, the desired commands, containing orientation, thrust, angular rates and attitude controller gains, to the individual quadrotors. The onboard rate gyros and accelerometer are used to estimate the orientation and angular velocity of the craft. The main microprocessor runs the attitude controller described in Sec. IV and sends the desired propeller speeds to each of the four motor controllers at full rate (600 Hz).

D. Performance

Some performance data for the onboard attitude controller is shown in Fig. 3. The small moments of inertia of the vehicle enable it to create large angular accelerations. As shown in Fig. 3(a), the attitude control is designed to be approximately critically damped with a settling time of less than 0.2 seconds. Note that this is twice as fast as the settling time for the attitude controller of the AscTec Hummingbird reported in [18]. Data for a flip is presented in Fig. 3(b). Here the vehicle completes a full flip about its y axis in about 0.4 seconds and reaches a maximum angular velocity of 1850 deg/sec.

The position controller described in Sec. IV uses the roll and pitch angles to control the x and y position of the vehicle. For this reason, a stiff attitude controller is required for stiff position control. Responses to step inputs in the lateral and vertical directions are shown in Fig. 4(b). For the hovering performance data shown in Fig. 4, the standard deviations of the error for x and y are about 0.75 cm, and about 0.2 cm for z.
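The scaling relations argued in Sec. II reduce to simple power laws in L. As a quick numeric sanity check of the quoted exponents (our own illustration; the function and its name are not from the paper), using a ∼ ω²L and α ∼ ω² with ω ∼ L^k:

```python
def accel_exponents(omega_exp):
    """Given rotor-speed scaling omega ~ L**omega_exp, return the
    length exponents of the linear acceleration (a ~ omega^2 * L)
    and the angular acceleration (alpha ~ omega^2)."""
    a_exp = 2 * omega_exp + 1
    alpha_exp = 2 * omega_exp
    return a_exp, alpha_exp

# Mach scaling,   omega ~ 1/L:       a ~ L^-1, alpha ~ L^-2
# Froude scaling, omega ~ 1/sqrt(L): a ~ L^0,  alpha ~ L^-1
```

Both regimes give angular acceleration growing faster than linear acceleration as L shrinks, which is the agility argument made in Sec. II.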
[Fig. 4. Hovering and step-response performance: (a) position error, the three curves representing the x, y, and z errors while hovering; (b) step response for the position controller in x (top) and z (bottom).]

[Fig. 5. The reference frames and propeller numbering convention.]

IV. DYNAMICS AND CONTROL

The dynamic model and control for the micro quadrotor are based on the approach in [17]. As shown in Figure 5, we consider a body-fixed frame B aligned with the principal axes of the quadrotor (unit vectors bi) and an inertial frame A with unit vectors ai. B is described in A by a position vector r to the center of mass C and a rotation matrix R. In order to avoid singularities associated with parameterization, we use the full rotation matrix to describe orientations. The angular velocity of the quadrotor in the body frame, ω, is given by ω̂ = RᵀṘ, where ˆ denotes the skew-symmetric matrix form of the vector.

As shown in Fig. 5, the four rotors are numbered 1-4, with odd-numbered rotors having a pitch that is opposite to that of the even-numbered rotors. The angular speed of rotor i is ωi. The resulting lift, Fi, and reaction moment, Mi, are given by

Fi = kF ωi²,   Mi = kM ωi²,

where the constants kF and kM are determined empirically. For our micro quadrotor, the motor dynamics have a time constant of less than 10 msec and are much faster than the time scales of the rigid body dynamics and aerodynamics. Thus we neglect the motor dynamics and assume Fi and Mi can be changed instantaneously. Therefore the control input to the system, u, consists of the net thrust in the b3 direction, u1 = Σ⁴i=1 Fi, and the moments in B, [u2, u3, u4]ᵀ, given by:

    ⎡ kF     kF    kF    kF   ⎤ ⎡ ω1² ⎤
u = ⎢ 0      kF L  0    −kF L ⎥ ⎢ ω2² ⎥   (1)
    ⎢ −kF L  0     kF L  0    ⎥ ⎢ ω3² ⎥
    ⎣ kM    −kM    kM   −kM   ⎦ ⎣ ω4² ⎦

where L is the distance from the axis of rotation of the propellers to the center of the quadrotor.

The Newton-Euler equations of motion are given by:

m r̈ = −m g a3 + u1 b3   (2)

ω̇ = I⁻¹ ( −ω × Iω + [u2, u3, u4]ᵀ )   (3)

where I is the moment of inertia matrix along bi.

We specify the desired trajectory using a time-parameterized position vector and yaw angle. Given a trajectory σ(t) : [0, tf] → ℝ³ × SO(2), the controller derives the input u1 based on position and velocity errors:

u1 = (−Kp ep − Kv ev + m g a3) · b3   (4)

where ep = r − rT and ev = ṙ − ṙT. The other three inputs are determined by computing the desired rotation matrix. First, we want to align the thrust vector u1 b3 with (−Kp ep − Kv ev + m g a3) in (4). Second, we want the yaw angle to follow the specified yaw ψT(t). From these two pieces of information we can compute Rdes and the error in rotation according to

eR = ½ (Rdesᵀ R − Rᵀ Rdes)∨

where ∨ represents the vee map, which takes elements of so(3) to ℝ³. The desired angular velocity is computed by differentiating the expression for R, and the desired moments can be expressed as a function of the orientation error, eR, and the angular velocity error, eω:

[u2, u3, u4]ᵀ = −KR eR − Kω eω,   (5)

where KR and Kω are diagonal gain matrices. Finally, we compute the desired rotor speeds to achieve the desired u by inverting (1).

V. CONTROL AND PLANNING FOR GROUPS

A. Architecture

We are primarily interested in the challenge of coordinating a large team of quadrotors. To manage the complexity that results from the growth of the state space dimensionality and to limit the combinatorial explosion arising from interactions between labeled vehicles, we consider a team architecture in which the team is organized into labeled groups, each with labeled vehicles. Formally, we can define a group of agents as a collection of agents which work simultaneously to complete a single task. Two or more groups act in a team to complete a task which requires completing multiple parallel subtasks [7]. We assume that vehicles within a group can communicate at high data rates with low latencies, while the communication requirements for coordination across groups are much less stringent. Most importantly, vehicles within a group are labeled. The small group size allows us to design controllers and planners that provide global guarantees on
shapes, communication topology, and relative positions of individual, agile robots.

Fig. 6. The team of quadrotors is organized into m groups. While vehicles within the group are tightly coordinated and centralized control and planning is possible, the inter-group coordination need not be centralized.

Our approach is in contrast to truly decentralized approaches, which are necessary in swarms with hundreds and thousands of agents [21]. While models of leaderless aggregation and swarming with aerial robots are discussed in the robotics community [11, 26, 19], in these works the challenge of enumerating labelled interactions between robots is circumvented by controlling such aggregate descriptors of formation as statistical distributions. These methods cannot provide guarantees on shape or topology. Reciprocal collision avoidance algorithms [27] have the potential to navigate robots to goal destinations, but no guarantees are available for transient performance and no proof of convergence is available.

On the other hand, the problem of designing decentralized controllers for trajectory tracking for three-dimensional rigid structures is now fairly well understood [12, 13, 8], although few experimental results are available for aerial robots. Our framework allows the maintenance of such rigid structures in groups.

B. Formation Flight

Flying in formation reduces the complexity of generating trajectories for a large team of vehicles to generating a trajectory for a single entity. If the controllers are well-designed, there is no need to explicitly incorporate collision avoidance between vehicles. The position error for quadrotor q at time t can be written as

epq(t) = ef(t) + elq(t)   (6)

where ef(t) is the formation error describing the error of the position of the group from the prescribed trajectory, and elq(t) is the local error of quadrotor q within the formation of the group. As we will show in Sec. VI, the local error is typically quite small even for aggressive trajectories, even though the formation error can be quite large.

A major disadvantage of formation flight is that the rigid formation can only fit through large gaps. This can be addressed by changing the shape of the formation of the team or dividing the team into smaller groups, allowing each group to negotiate the gap independently.

C. Time-Separated Trajectory Following

Another way to reduce the complexity of the trajectory generation problem is to require all vehicles to follow the same team trajectory but be separated by some time increment. Here we let the trajectory for quadrotor q be defined as

rTq(t) = rTT(t + Δtq)   (7)

where rTT is the team trajectory and Δtq is the time shift for quadrotor q from some common clock, t. If the team trajectory does not intersect or come within an unsafe distance of itself, then vehicles simply need to follow each other at a safe time separation. Large numbers of vehicles can follow team trajectories that intersect themselves if the time separations, Δtq, are chosen so that no two vehicles are at any of the intersection points at the same time. An experiment for an intersecting team trajectory is shown in Sec. VI.

D. Trajectory Generation with MIQPs

Here we describe a method for generating smooth, safe trajectories through known 3-D environments satisfying specifications on intermediate waypoints for multiple vehicles. Integer constraints are used to enforce collision constraints with obstacles and other vehicles and also to optimally assign goal positions. This method draws from the extensive literature on mixed-integer linear programs (MILPs) and their application to trajectory planning from Schouwenaars et al. [23, 24].

1) Basic Method: As described in [17], an optimization program can be used to generate trajectories that smoothly transition through nw desired waypoints at specified times, tw. The optimization program to solve this problem while minimizing the integral of the square of the krth derivative of position for nq quadrotors is shown below:

min Σ_{q=1}^{nq} ∫_{t0}^{tnw} || d^{kr} rTq / dt^{kr} ||^2 dt   (8)
s.t. rTq(tw) = rwq,   w = 0, ..., nw;  ∀q
     d^j xTq / dt^j |_{t=tw} = 0 or free,   w = 0, nw;  j = 1, ..., kr;  ∀q
     d^j yTq / dt^j |_{t=tw} = 0 or free,   w = 0, nw;  j = 1, ..., kr;  ∀q
     d^j zTq / dt^j |_{t=tw} = 0 or free,   w = 0, nw;  j = 1, ..., kr;  ∀q

Here rTq = [xTq, yTq, zTq] represents the trajectory for quadrotor q and rwq represents the desired waypoints for quadrotor q. We enforce continuity of the first kr derivatives of rTq at t1, ..., tnw−1. As shown in [17], writing the trajectories as piecewise polynomial functions allows (8) to be written as a quadratic program (or QP) in which the decision variables are the coefficients of the polynomials.

For quadrotors, since the inputs u2 and u3 appear as functions of the fourth derivatives of the positions, we generate trajectories that minimize the integral of the square of the norm of the snap (the second derivative of acceleration, kr = 4). Large-order polynomials are used to satisfy such additional trajectory constraints as obstacle avoidance that are not explicitly specified by intermediate waypoints.

2) Integer Constraints for Collision Avoidance: For collision avoidance we model the quadrotors as rectangular prisms oriented with the world frame with side lengths lx, ly, and lz. These lengths are large enough so that the quadrotor can roll, pitch, and yaw to any angle and stay within the prism. We consider navigating this prism through an environment
with no convex obstacles (here no denotes the number of obstacles). Each convex obstacle o can be represented by a convex region in configuration space with nf(o) faces. For each face f, the condition that the quadrotor's desired position at time tk, rTq(tk), be outside of obstacle o can be written as

nof · rTq(tk) ≤ sof,   (9)

where nof is the normal vector to face f of obstacle o in configuration space and sof is a scalar that determines the location of the plane. If (9) is satisfied for at least one of the faces, then the rectangular prism, and hence the quadrotor, is not in collision with the obstacle. The condition that quadrotor q does not collide with an obstacle o at time tk can be enforced with binary variables, bqofk, as

nof · rTq(tk) ≤ sof + M bqofk   ∀f = 1, ..., nf(o)   (10)
bqofk = 0 or 1   ∀f = 1, ..., nf(o)
Σ_{f=1}^{nf(o)} bqofk ≤ nf(o) − 1

where M is a large positive number [23]. Note that if bqofk is 1 then the inequality for face f is always satisfied. The last inequality in (10) requires that the non-collision constraint be satisfied for at least one face of the obstacle, which implies that the prism does not collide with the obstacle. We can then introduce (10) into (8) for all nq quadrotors for all no obstacles at nk intermediate time steps between waypoints. The addition of the integer variables into the quadratic program causes this optimization problem to become a mixed-integer quadratic program (MIQP).

3) Inter-Quadrotor Collision Avoidance: When transitioning between waypoints, quadrotors must stay a safe distance away from each other. We enforce this constraint at nk intermediate time steps between waypoints, which can be represented mathematically for quadrotors 1 and 2 by the following set of constraints:

∀tk :  xT1(tk) − xT2(tk) ≤ dx12   (11)
   or  xT2(tk) − xT1(tk) ≤ dx21
   or  yT1(tk) − yT2(tk) ≤ dy12
   or  yT2(tk) − yT1(tk) ≤ dy21

Here the d terms represent safety distances. For axially symmetric vehicles dx12 = dx21 = dy12 = dy21. Experimentally we have found that quadrotors must avoid flying in each other's downwash because of a decrease in tracking performance and even instability in the worst cases. Therefore we do not allow vehicles to fly underneath each other here. Finally, we incorporate constraints (11) between all nq quadrotors in the same manner as in (10) into (8).

4) Integer Constraints for Optimal Goal Assignment: In many cases one might not care that a certain quadrotor goes to a certain goal but rather that any vehicle does. Here we describe a method for using integer constraints to find the optimal goal assignments for the vehicles. This results in a lower total cost compared to fixed goal assignment and often a faster planning time because there are more degrees of freedom in the optimization problem. For each quadrotor q and goal g we introduce the integer constraints:

xTq(tnw) ≤ xg + M βqg
xTq(tnw) ≥ xg − M βqg
yTq(tnw) ≤ yg + M βqg   (12)
yTq(tnw) ≥ yg − M βqg
zTq(tnw) ≤ zg + M βqg
zTq(tnw) ≥ zg − M βqg

Here βqg is a binary variable used to enforce the optimal goal assignment. If βqg is 0 then quadrotor q must be at goal g at tnw. If βqg is 1 then these constraints are satisfied for any final position of quadrotor q. In order to guarantee that at least ng quadrotors reach the desired goals, we introduce the following constraint:

Σ_{q=1}^{nq} Σ_{g=1}^{ng} βqg ≤ nq ng − ng   (13)

Note that this approach can be easily adapted if there are more quadrotors than goals or vice versa.

5) Relaxations for Large Teams: The solving time of the MIQP grows exponentially with the number of binary variables that are introduced into the MIQP. Therefore, the direct use of this method does not scale well for large teams. Here we present two relaxations that enable this approach to be used for large teams of vehicles.

a) Planning for Groups within a Team: A large team of vehicles can be divided into smaller groups. We can then use the MIQP method to generate trajectories to transition groups of vehicles to group goal locations. This reduces the complexity of the MIQP because instead of planning trajectories for all nq vehicles we simply plan trajectories for the groups. Of course, we are making a sacrifice here by not allowing the quadrotors the flexibility to move independently.

b) Planning for Sub-Regions: In many cases the environment can be partitioned into nr convex sub-regions where each sub-region contains the same number of quadrotor start and goal positions. After partitioning the environment, the MIQP trajectory generation method can be used for the vehicles inside each region. Here we require quadrotors to stay inside their own regions using linear constraints on the positions of the vehicles. This approach guarantees collision-free trajectories and allows quadrotors the flexibility to move independently. We are gaining tractability at the expense of optimality, since the true optimal solution might actually require quadrotors to cross region boundaries while this relaxed version does not. Also, it is possible that no feasible trajectories exist inside a sub-region but feasible trajectories do exist which cross region boundaries. Nonetheless, this approach works well in many scenarios and we show its application to formation transitions for teams of 16 vehicles in Sec. VI.
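The big-M logic of (9)-(10) can be made concrete with a toy feasibility check. The sketch below is illustrative only: the box geometry is an assumption, and a real implementation would add these rows to the MIQP rather than test sample points. Since setting a binary to 1 relaxes its face entirely (M large), a feasible binary assignment exists exactly when at least one face satisfies the unrelaxed condition (9):

```python
import numpy as np

def outside_by_big_m(r, normals, offsets):
    """Feasibility of (10) for one quadrotor, one obstacle, one time step.

    (10) requires n_of . r <= s_of + M*b for every face, with the sum of
    the binaries at most nf(o) - 1.  Because b = 1 relaxes a face entirely
    when M is large, a feasible assignment exists iff at least one face
    satisfies the unrelaxed condition (9), i.e. iff r is outside."""
    violated = sum(1 for n, s in zip(normals, offsets) if np.dot(n, r) > s)
    return violated <= len(normals) - 1  # at least one binary can remain 0

# Assumed obstacle: unit cube centered at the origin, with each face written
# in the form (9) so that n . r <= s holds only on the outside of that face.
normals = [np.array(v, dtype=float) for v in
           [(-1, 0, 0), (1, 0, 0), (0, -1, 0), (0, 1, 0), (0, 0, -1), (0, 0, 1)]]
offsets = [-0.5] * 6

print(outside_by_big_m(np.array([2.0, 0.0, 0.0]), normals, offsets))  # True: outside
print(outside_by_big_m(np.array([0.1, 0.0, 0.2]), normals, offsets))  # False: inside
```

Note that M itself never appears in the check: once M dominates the obstacle dimensions, only the count of relaxed faces matters, which is why the paper's last inequality in (10) is sufficient.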
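The goal-assignment constraints can be checked the same way. By (12), βqg may be 0 only if quadrotor q's final position coincides with goal g, and (13) caps the sum of the binaries at nq·ng − ng, i.e. it demands at least ng zeros and hence at least ng occupied goals. A small counting check of that argument, with toy positions rather than the solver formulation:

```python
import itertools
import numpy as np

def at_least_ng_goals_reachable(final_pos, goals, n_g):
    """Check whether binaries beta[q][g] satisfying (12)-(13) can exist.

    beta[q][g] = 0 is admissible only when quadrotor q ends exactly at
    goal g; constraint (13) then forces at least n_g such zeros."""
    possible_zeros = sum(
        1 for q, g in itertools.product(range(len(final_pos)), range(len(goals)))
        if np.allclose(final_pos[q], goals[g]))
    return possible_zeros >= n_g

# Hypothetical example: two goals, but only one is occupied.
goals = [np.array([0.0, 0.0, 1.0]), np.array([1.0, 0.0, 1.0])]
final = [np.array([1.0, 0.0, 1.0]), np.array([5.0, 5.0, 1.0])]

print(at_least_ng_goals_reachable(final, goals, 1))  # True
print(at_least_ng_goals_reachable(final, goals, 2))  # False
```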
Fig. 7. Software Infrastructure.

Fig. 8. Formation following for a 4-quadrotor trajectory. (a) Top view; (b) error time histories. In (a) the gray lines represent the desired trajectories for each of the four vehicles and the black lines represent the actual trajectories. The errors are shown in (b). Here the black line represents the formation error, ef(t), from the desired trajectory and the gray lines represent the local errors, eli(t), for each quadrotor.

VI. EXPERIMENTAL RESULTS

A. Software Infrastructure for Groups

Our architecture (Sec. V-A) is important for a very practical reason. For a large team of quadrotors it is impossible to run a single loop that can receive all the Vicon data, compute the commands, and communicate with each quadrotor at a fast enough rate. As shown in Fig. 7, each group is controlled by a dedicated software node, running in an independent thread. These control nodes receive vehicle pose data from a special Vicon node via shared memory. That node connects to the Vicon tracking system, receives marker positions for each subject, performs a 6D pose fit to the marker data and additional processing for velocity estimation. Finally, the processed pose estimates are published to the shared memory using the Boost C++ library [3]. Shared memory is the fastest method of inter-process communication, which ensures the lowest latency for the time-critical data.

The control nodes, implemented in Matlab, read the pose data directly from shared memory and compute the commanded orientation and net thrusts for several quadrotors based on the controller described in Sec. IV. For non-time-critical data sharing we use Inter Process Communication (IPC) [4]. For example, high-level user commands such as desired vehicle positions are sent to a planner which computes the trajectories for the vehicles, which are sent to the Matlab control nodes via IPC. IPC provides flexible message passing and uses TCP/IP sockets to send data between processes.

Each Matlab control node is associated with a radio module containing 900 MHz and 2.4 GHz Zigbee transceivers, which is used to communicate with all the vehicles in its group. The radio module sends control commands to several vehicles, up to five in this work. Each vehicle operates on a separate channel and the radio module hops between the frequencies for each quadrotor, sending out commands to each vehicle at 100 Hz. The radio modules can also simultaneously receive high-bandwidth feedback from the vehicles, making use of two independent transceivers.

B. Formation Flight

In Fig. 8 we present data for a team of four quadrotors following a trajectory as a formation. The group formation error is significantly larger than the local error. The local x and y errors are always less than 3 cm, while the formation x error is as large as 11 cm. This data is representative of all formation trajectory following data because all vehicles are nominally the same and are running the same controller with the same gains. Therefore, even though the deviation from the desired trajectory may be large, the relative position error within the group is small.

In Fig. 9(a) we show average error data for 20 vehicles flying in the grid formation shown in Fig. 1. For this experiment the vehicles were controlled to hover at a height of 1.3 meters for at least 30 seconds at several quadrotor-center-to-quadrotor-center grid spacing distances. The air disturbance created from the downwash of all 20 vehicles is significant and causes the tracking performance to be worse for any vehicle in this formation than for an individual vehicle in still air as presented in Fig. 4. However, as shown in Fig. 9(a), the separation distance did not have any effect on the hovering performance. Note that at 35 cm grid spacing the nominal distance between the propeller tips is about 14 cm.

Fig. 9. (a) Error data: the average standard deviation for x, y, and z for 20 quadrotors in a grid formation. (b) Time-separated trajectory following: 16 quadrotors following a figure-eight pattern. See the video at http://youtu.be/50Fdi7712KQ

C. Time-Separated Trajectory Following

In Fig. 9(b) we show a team of 16 vehicles following a cyclic figure-eight pattern. The time to complete the entire cycle is tc and the vehicles are equally spaced in time along the trajectory at time increments of tc/16. In order to guarantee collision-free trajectories at the intersection, vehicles spend (15/32)tc in one loop of the trajectory and (17/32)tc in the other. A trajectory that satisfies these timing constraints and has
some specified velocity at the intersection point (with zero acceleration and jerk) is generated using the optimization-based method for a single vehicle described in [17].

D. MIQP trajectories

In this paper, we use a branch-and-bound solver [1] to solve the MIQP trajectory generation problem. The solving time for the MIQP is an exponential function of the number of binary constraints and also of the geometric complexity of the environment. The first solution is often delivered within seconds, but finding the true optimal solution and a certificate of optimality can take as long as 20 minutes on a 3.4 GHz Core-i7 Quad-Core desktop machine for the examples presented here.

Fig. 10. Four groups of four quadrotors flying through a window.

1) Planning for Groups within a Team: In Fig. 10 we show snapshots from an experiment for four groups of four quadrotors transitioning from one side of a gap to the other. Note that in this example the optimal goal assignment is performed at the group level.

2) Planning for Sub-Regions: In Fig. 11 we show snapshots from experiments with 16 vehicles transitioning from a planar grid to a three-dimensional helix and pyramid. Directly using the MIQP approach to generate trajectories for 16 vehicles is not practical. Therefore, in both experiments the space is divided into two regions and separate MIQPs with 8 vehicles each are used to generate trajectories for vehicles on the left and right sides of the formation. Note that, in general, the formations do not have to be symmetric, but here we exploit the symmetry and only solve a single MIQP for 8 vehicles for these examples. Optimal goal assignment is used so that the vehicles collectively choose their goals to minimize the total cost.

Fig. 11. A team of sixteen vehicles transitioning from a planar grid to a three-dimensional helix (top) and pyramid (bottom).

VII. CONCLUSION

In this paper we describe the design, manufacture, modeling and control of a micro quadrotor that has a 73 gram mass and is 21 cm in diameter, and the architecture and software for coordinating a team of micro quadrotors, with experimental results. While our quadrotors rely on an external localization system for position estimation and therefore cannot be truly decentralized at this stage, these results represent the first step toward the development of a swarm of micro quadrotors. The small size is shown to facilitate agility and the ability to fly in close proximity with less than one body length separation. Mixed-integer quadratic programming techniques are used to coordinate twenty micro quadrotors in known three-dimensional environments with obstacles.

The videos of all experiments are available at http://youtu.be/50Fdi7712KQ

REFERENCES

[1] IBM ILOG CPLEX V12.1: User's manual for CPLEX. International Business Machines Corporation, 2009.
[2] Ascending Technologies, GmbH. http://www.asctec.de.
[3] Boost C++ Libraries. http://www.boost.org.
[4] Inter Process Communication. http://www.cs.cmu.edu/~ipc/.
[5] Vicon Motion Systems, Inc. http://www.vicon.com.
[6] AeroVironment. AeroVironment nano hummingbird, August 2011. Online: http://www.avinc.com/nano.
[7] C. Anderson and N. R. Franks. Teams in animal societies. Behavioral Ecology, 12(5):534-540, 2001.
[8] R. W. Beard, J. Lawton, and F. Y. Hadaegh. A coordination architecture for spacecraft formation control. IEEE Trans. Control Syst. Technol., 9(6):777-790, November 2001.
[9] C. Bermes. Design and dynamic modeling of autonomous coaxial micro helicopters. PhD thesis, ETH Zurich, Switzerland, 2010.
[10] S. Bouabdallah. Design and Control of Quadrotors with Applications to Autonomous Flying. PhD thesis, Ecole Polytechnique Federale de Lausanne, Lausanne, Switzerland, February 2007.
[11] F. Bullo, J. Cortés, and S. Martínez. Distributed Control of Robotic Networks: A Mathematical Approach to Motion Coordination Algorithms. Applied Mathematics Series. Princeton University Press, 2009.
[12] J. P. Desai, J. P. Ostrowski, and V. Kumar. Modeling and
control of formations of nonholonomic mobile robots. IEEE Trans. Robot., 17(6):905-908, December 2001.
[13] M. Egerstedt and X. Hu. Formation constrained multi-agent control. IEEE Trans. Robot. Autom., 17(6):947-951, December 2001.
[14] Dario Floreano, Jean-Christophe Zufferey, Adam Klaptocz, Jürg Markus Germann, and Mirko Kovac. Aerial Locomotion in Cluttered Environments. In Proceedings of the 15th International Symposium on Robotics Research, 2011.
[15] V. Kumar and N. Michael. Opportunities and challenges with autonomous micro aerial vehicles. In International Symposium on Robotics Research, 2011.
[16] S. Lupashin, A. Schollig, M. Sherback, and R. D'Andrea. A simple learning strategy for high-speed quadrocopter multi-flips. In Proc. of the IEEE Intl. Conf. on Robot. and Autom., pages 1642-1648, Anchorage, AK, May 2010.
[17] D. Mellinger and V. Kumar. Minimum snap trajectory generation and control for quadrotors. In Proc. of the IEEE Intl. Conf. on Robot. and Autom., pages 2520-2525, Shanghai, China, May 2011.
[18] D. Mellinger, N. Michael, and V. Kumar. Trajectory generation and control for precise aggressive maneuvers. In Int. Symposium on Experimental Robotics, December 2010.
[19] N. Michael and V. Kumar. Control of ensembles of aerial robots. Proc. of the IEEE, 99(9):1587-1602, September 2011.
[20] N. Michael, J. Fink, and V. Kumar. Cooperative manipulation and transportation with aerial robots. Auton. Robots, 30(1):73-86, January 2011.
[21] J. Parrish and W. Hamner, editors. Animal Groups in Three Dimensions. Cambridge University Press, New York, 1997.
[22] D. Pines and F. Bohorquez. Challenges facing future micro air vehicle development. AIAA J. of Aircraft, 43(2):290-305, 2006.
[23] Tom Schouwenaars, Bart DeMoor, Eric Feron, and Jonathan How. Mixed integer programming for multi-vehicle path planning. In European Control Conference, pages 2603-2608, 2001.
[24] Tom Schouwenaars, Andrew Stubbs, James Paduano, and Eric Feron. Multi-vehicle path planning for non-line of sight communication. In American Control Conference, 2006.
[25] S. Shen, N. Michael, and V. Kumar. Autonomous multi-floor indoor navigation with a computationally constrained MAV. In Proc. of the IEEE Intl. Conf. on Robot. and Autom., pages 20-25, Shanghai, China, May 2011.
[26] H. Tanner, A. Jadbabaie, and G. J. Pappas. Flocking in fixed and switching networks. IEEE Trans. Autom. Control, 52(5):863-868, May 2007.
[27] Jur van den Berg, Stephen J. Guy, Ming C. Lin, and Dinesh Manocha. Reciprocal n-body collision avoidance. In International Symposium on Robotics Research, 2009.
[28] C. H. Wolowicz, J. S. Bowman, and W. P. Gilbert. Similitude requirements and scaling relationships as applied to model testing. Technical report, NASA, August 1979.
CompAct™ Arm: a Compliant Manipulator
with Intrinsic Variable Physical Damping

Matteo Laffranchi, Nikos G. Tsagarakis and Darwin G. Caldwell
Fondazione Istituto Italiano di Tecnologia, Via Morego 30, 16163, Genova, Italy
{matteo.laffranchi; nikos.tsagarakis; darwin.caldwell}@iit.it

Abstract — Humans exploit compliance in their biomechanical muscle-tendon-bone actuation structure to enable robust and safe interaction with the environment, and utilize the elastic energy stored in muscles and tendons to obtain large energy efficiency or high output mechanical power peaks at their limbs. From the robotic/mechatronic point of view it is clear that emulating such a property in robotic actuation systems enables the achievement of performance which is not possible with classical stiff designs. In contrast to this, transmission compliance introduces some disadvantages, e.g. typically underdamped modes which reduce the achievable control bandwidth, stability margin and accuracy of the controlled system. These limitations are solved in mammals by means of physical damping, which clarifies why these biological systems are capable of performing fast and smooth yet accurate motions with their limbs. This motivates this work, which consists in the analysis and development of the CompAct™ Arm, a novel compliant manipulator with intrinsic variable damping. This is probably the first robotic system to exhibit these diverse bio-inspired characteristics. A motivation analysis is initially presented to show how the drawbacks introduced by compliance can be overcome by means of physical damping. The second part of the paper presents the mechatronic development of the robotic manipulator and preliminary experimental results.

Keywords – Mechatronic Design, Compliant assembly, Bio-Inspired Design, Dynamics

Figure 1. The CompAct™ Arm: (a) external layout semi-transparent CAD view and (b) fully assembled prototype.

I. INTRODUCTION

Improvements in production time and the quality of products lie at the heart of modern manufacturing, and the use of robots has formed an increasingly important aspect of the drive for efficiency. These robots are designed for precision, speed and repeatability and usually work in restricted areas to prevent any harmful interaction with humans. However, new opportunities are arising in houses and offices that mean that robots will not be confined to these relatively restricted factory environments, and this sets new demands in terms of the ability to interact with these less structured and more uncertain environments. It is evident that classical heavy and stiff manipulators with high control gains are not suitable to cooperate and work within these new operational areas. In fact, the typical design/control approaches mentioned previously make the resulting robots present serious interaction limitations and large output mechanical impedance, which means that robot and human safety are compromised when unexpected events occur. Yet, aspects other than safety can be jeopardized by this actuation approach, e.g. mechanical robustness and energy efficiency [1-4]. The reason for this is that rigid systems do not permit mechanical energy storage, while compliant systems such as humans' biological structures can exploit this property to improve the mentioned characteristics. These bio-inspired drives motivate scientists to attempt to replicate compliance within robots with a variety of solutions [1-3, 5].

Although compliance can effectively make the difference, from the engineering point of view it also introduces some side effects. It typically induces underdamped dynamics which deteriorate the stability margin and the accuracy of the robot (due to the induced oscillations) and limit the bandwidth which can be set in the controlled system [6-8]. These drawbacks are tackled in mammals, which exhibit joint compliance, by the synergetic presence in their body of both compliance and physical damping. The properties of passive compliance and damping typical of biological joints explain why e.g. humans are able to safely interact and be robust during collision due to compliance, while at the same time having the potential of achieving fast, smooth yet accurate motion thanks to physical damping [9-11].

From the design perspective, it can be concluded that the introduction of damping in the mechatronics of compliant robot joints (possibly with the ability of varying its level) can be a solution to overcome the drawbacks introduced by compliance. Apart from this, previous work demonstrated that physical damping can enhance the dynamic performance and accuracy of force-torque controlled systems [12]. Although some of the mentioned advantages are obvious, a rigorous
analysis is necessary to clarify in depth the role of physical
damping in compliant robotic joints.
This paper proposes in its first part an analysis on the
effects of physical damping in compliant robotic joints
assessing how this parameter affects the dynamic (a) (b)
performance, energy efficiency and safety of compliant
Figure 2. (a) Equivalent model of a SEA with variable physical damping and
actuators. The positive outcome of this study motivated the (b) model of the constrained collision scenario
design and development of the CompAct™ Arm, i.e. a
manipulator powered by actuators with intrinsic rich skills B. Dynamic performance
related to the synergetic combination of passive compliance Assuming no interaction is involved (Fout = 0N), the
and variable physical damping. To the authors knowledge this following transfer function between rotor and link velocity (Vθ
is the first robotic systems to exhibit these combined and Vq) can be obtained from (1):
properties. We believe that such a system is step towards the (2)
achievement of functional performance of natural systems and V Ds  K
P( s )  q

particularly the human in terms of motion agility, safety, V M L s 2  Ds  K
energy efficiency and power. The mechatronic details of this
highly integrated system are presented in the second part of this paper.

The paper is structured as follows: Section II presents an analysis to evaluate and quantify how physical damping affects the characteristics of a generic compliant actuator, while Section III introduces the mechatronic details of the actuation units employed for the development of the manipulator. Section IV defines the kinematic layout of the robot and analyses a dynamic simulation involving an interactive task for determining the preliminary specifications of the manipulator and related actuation systems. The robot assembly and design are analysed in Section V, with Section VI presenting preliminary results obtained with the prototype. Finally, Section VII addresses the conclusions and future work.

II. DYNAMICS OF COMPLIANT ACTUATION SYSTEMS WITH INTRINSIC VARIABLE DAMPING

A. Mechanical model

Considering the basic equivalent linear model of a compliant actuator, Fig. 2a, the system dynamics can be expressed by the following set of equations:

M_R·ẍ_θ + D·(ẋ_θ − ẋ_q) + K·(x_θ − x_q) = F_in
M_L·ẍ_q − D·(ẋ_θ − ẋ_q) − K·(x_θ − x_q) = F_out    (1)

where M_R = 0.5 kg and M_L = 1 kg are the reflected rotor and equivalent link inertias, D and K = 100 N/m are the joint viscous damping and stiffness, x_θ and x_q are the rotor and link equivalent linear displacements, and F_in and F_out are the effort provided by the actuation system and the force applied to the outer link, respectively. The basic model presented in Fig. 2a makes some assumptions, i.e. it does not consider any dissipative effect (such as friction/damping due to the gearbox, bearings, etc., or Joule losses in the resistance of the drivers/windings) apart from that of the transmission damper D, and it neglects the electrical (or hydraulic) dynamics of the driving system. These hypotheses are made on purpose, to keep the analysis independent of the specific nature of the actuator and thus make the results more general.

In fact, the variable joint viscous damping D of the system in Fig. 2a can be increased to shift the zero of (2) to lower frequencies to introduce a phase lead action (typically from resonance to approximately 2 decades after it), avoiding the sharp and significant phase lag introduced by the complex conjugate poles caused by compliance(1). This improves the stability of the system, hence facilitating link position/velocity control. At the same time, the damping D can be set sufficiently large so as to reduce the amplitude of the oscillations of the vibration modes and facilitate accurate position tracking when needed. In addition to this, the magnitude of (2) increases with joint damping in the high frequency domain above resonance. From the performance perspective this is a further attractive feature, since it means that for a given actuation system the maximum output speed limit increases at operating frequencies above resonance.

1 In fact, compliance acts as a limiting factor on the dynamic performance which can be achieved in the controlled system. As a rule of thumb, the bandwidth of closed-loop (lightly damped) systems should be placed below 1/3 of the frequency of the mechanical poles introduced by compliance, [6].

The high frequency performance improvement gained by a joint damping increase can be quantified by the following coefficient:

G_D = lim_{ω→∞} |P⁺(s = jω)| / |P⁻(s = jω)|    (3)

where P⁺ and P⁻ are plants having the form of (2) and featuring two different levels of joint damping D⁺ and D⁻ such that D⁺ > D⁻. Equation (3) shows that G_D increases proportionally with joint damping, meaning that dynamic performance can be augmented by means of damping.

C. Energy efficiency

Considering the system in Fig. 2a, the rotor velocity v_θ required to produce a certain link motion can be computed using (2), while the required actuator force F_in is obtained using (1). The corresponding mechanical power input P_in required to provide a desired link velocity can therefore be computed in the frequency domain as:
P_in(s) = F_in(s)·V_θ(s)    (4)

Considering a required velocity output of amplitude 0.1 rad/s, the required RMS power input as a function of the frequency is shown in Fig. 3 for different physical viscous damping levels. It can be observed that the effect of damping on the power consumption varies as a function of the operating frequency: in the low frequency domain (approximately in the range [0, 1] Hz for this specific system) there is no substantial change when varying the damping, whereas systems with small damping are much more energy efficient in the proximity of resonance.

Figure 3. Input mechanical power consumption required to provide an output velocity of amplitude 0.1 rad/s for different damping ratio values, as a function of the operating frequency.

This last effect is quite obvious, since the system is exploiting its natural dynamics to perform the motion. In contrast, it is quite surprising that systems with high joint physical damping feature much higher energy efficiency than compliant actuators with low joint viscous friction in the high frequency domain. This is due to the fact that motor and link in compliant actuators are highly decoupled in the high frequency domain, and therefore in this region much more effort is required on the motor side to compensate for this decoupling effect when damping is low [13]. In other words, compliance can be beneficial from the efficiency perspective when the system operating frequency matches the natural dynamics (as in e.g. [1]); however, when the task requires higher frequency motions the system has to work against its own physics, hence compromising energy efficiency(2). This in turn means that physical damping can be attractive especially in fixed stiffness designs (e.g. SEAs) where the dynamics cannot be tuned: a tunable damper placed in parallel to the transmission compliance can be an effective means for the development of compliant actuators that perform well and are efficient in the high frequency spectrum.

2 It is worth mentioning that this effect can be observed also in the time domain for tracking profiles whose dominant spectral components are well below the system natural frequency [13].

D. Safety

The higher dynamic performance and energy efficiency achieved in plants with a larger physical viscous damping level, demonstrated previously, is due to the increased coupling between rotor and link. This is on one hand an attractive property of physical viscous damping; on the other hand, this quantity introduces the side effect of a decreased safety level due to the generation of higher forces at impact. To evaluate this effect, we consider here the risky case of a constrained collision of the system in Fig. 2b, [14]. This scenario is modelled as in Fig. 2b with the purpose of characterizing the joint force generated at collision. A transfer function which can be used for the evaluation of safety is the force transmissibility between the input and joint force(3). For the system in Fig. 2b, this function is:

T_j(s) = F_j/F_in = (D·s + K)/(M_R·s² + D·s + K)    (5)

This transfer function has the same form as (2), apart from the fact that the reflected mass appearing at the denominator is that of the rotor instead of that of the link. This means that (3) is valid also for (5); that is, the same amount of improvement gained in performance is lost in safety. In fact, this does not represent a problem, because if the actuator has the possibility of varying the joint physical damping level, as in the systems of Fig. 2, this parameter can be adjusted to an optimal value as a function of the system states and of the required safety level (as in e.g. the fast and soft paradigm, [5]) to obtain the best safety-performance tradeoff.

3 In this work we do not consider the contribution to the impact force of the reflected link mass, since this variable is independent from the transmission viscous damping and stiffness.

III. MECHATRONICS OF THE ACTUATION SYSTEMS

A. Damping systems

Dampers can be categorized in three main classes: passive, active [15] and semi-active [7, 12], based on the amount of external power required for the damping control system to perform its function. Purely passive dampers are not able to adapt the damping value as a function of the system configuration/state, while active dampers can be employed to suppress oscillations but do not offer the benefits analysed in the previous section. On the other hand, semi-active solutions offer the reliability of passive devices while at the same time maintaining the versatility and adaptability of active systems, without requiring large amounts of energy [7]. In addition, differently from active implementations, they cannot inject mechanical power into the system and therefore they do not have the potential of destabilizing the controlled actuator. When appropriately controlled, semi-active systems perform significantly better than passive devices and have the potential of achieving the majority of the performance of the fully active configuration (e.g. the ability to regulate the damping level over wide ranges) [7]. Furthermore, a semi-active damping system can be completely disconnected in particularly critical conditions to maximize the decoupling effect of compliance, such as when the robot is working in close proximity of the human/environment. This suggests that an actuation system which can provide passive compliance and variable physical damping (using e.g. a semi-active damper) can effectively be employed to develop human-friendly and still well performing robots, and such a system is therefore employed in the actuation modules used for the development of the manipulator presented in this work.
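The power computation of (4) behind Fig. 3 can be sketched under the same assumption about the form of (2); the model parameters are those of Section II-A, and the damping levels are illustrative:

```python
M_R, M_L, K = 0.5, 1.0, 100.0   # Section II-A parameters; resonance at 10 rad/s
V_Q = 0.1                        # required link velocity amplitude, as in Fig. 3

def input_power(omega, D):
    """Apparent mechanical input power |F_in * V_theta| / 2 for a sinusoidal
    link velocity of amplitude V_Q, combining Eqs. (1), (2) and (4)."""
    s = 1j * omega
    # invert the assumed link-side dynamics (2): V_th/V_q = (M_L s^2+D s+K)/(D s+K)
    V_th = V_Q * (M_L * s**2 + D * s + K) / (D * s + K)
    # rotor side of (1): F_in = M_R s^2 x_theta + (D s + K)(x_theta - x_q), x = v/s
    F_in = M_R * s * V_th + (D * s + K) * (V_th - V_Q) / s
    return abs(F_in * V_th) / 2

for w in (1.0, 100.0):   # one decade below resonance and one above
    print(w, input_power(w, 0.5), input_power(w, 5.0))
```

At 1 rad/s the two damping settings cost almost the same input power, while a decade above resonance the heavily damped system is far cheaper to drive, matching the trend described for Fig. 3.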
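The oscillation-suppression role that the damper has to play can be illustrated on the Fig. 2a model via the link-side step response at two damping-ratio settings, with ζ = D/(2·√(K·M_L)) for the locked model. A minimal sketch that holds the motor side fixed at the step value, ignoring its own transient:

```python
import math

M_L, K = 1.0, 100.0   # link inertia and joint stiffness of the Fig. 2a model
X_M = 0.15            # motor-side step amplitude [rad]

def peak_link_position(zeta, dt=1e-5, T=2.0):
    """Semi-implicit Euler integration of M_L*x'' + D*x' + K*(x - X_M) = 0,
    i.e. the link driven through the damped spring by a fixed motor position."""
    D = 2.0 * zeta * math.sqrt(K * M_L)   # damping ratio -> viscous coefficient
    x = v = peak = 0.0
    for _ in range(int(T / dt)):
        v += dt * (-D * v - K * (x - X_M)) / M_L
        x += dt * v
        peak = max(peak, x)
    return peak

print(peak_link_position(0.3))   # underdamped: overshoots the 0.15 rad target
print(peak_link_position(1.0))   # critically damped: no overshoot
```

With ζ = 0.3 the link overshoots the 0.15 rad target by roughly 37%, while ζ = 1 settles without overshoot; being able to switch between such behaviours on demand is exactly what a semi-active damper provides.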
B. The modular design concept

Looking at the kinematics of existing serial robots, e.g. the Comau Smart NM [16] and the Barrett WAM [17], but also humanoid robots such as the Honda ASIMO [18], Figs. 4a, b, it can be observed that their kinematic structure is mainly composed of revolute joints placed in two different configurations.

Figure 4. Conceptual schemes showing the kinematics of (a) Comau Smart NM, (b) Honda ASIMO and Barrett WAM manipulators.

The first actuator kinematic configuration has the joint axis aligned with the robot structure (this configuration is defined as the "longitudinal joint" in this work), whereas the second arrangement has the joint axis placed perpendicularly to the robot structure, which is instead defined as the "transverse joint" configuration. For instance, the Comau robot shown in Fig. 4a is made of a longitudinal-transverse-transverse configuration, while the ASIMO robot arm, Fig. 4b, presents a kinematic chain made of a longitudinal-transverse-longitudinal-transverse arrangement.

From the analysed examples of Fig. 4, it can be concluded that from the kinematic perspective a serial robot uses mostly revolute joints placed in two basic configurations: the longitudinal and the transverse joint. A serial robot design can therefore be developed by means of an appropriate serial combination of these two joint configurations. The employed actuation system is a series elastic actuator with the ability to regulate the physical damping in parallel to the joint compliance, and it is therefore suited for the development of this manipulator [8]. The adjustment of the damping is realized by a semi-active friction damper actuated by light and compact piezoelectric stack actuators(4), Fig. 5b. Based on the modular design concept, the actuation units were laid out following the longitudinal and transverse kinematic schemes for developing the full arm. To satisfy the lightweight requirement for the overall arm, a carbon fibre frame was used in this design. This structure is used not only as a frame but also as a cover preventing the human/environment from coming into contact with the mechatronics of the manipulator, therefore improving its robustness and safety properties. A further attractive aspect of the use of composite material is that, in contrast to traditional manufacturing processes such as milling or sheet metal forming, it facilitates the development of parts without sharp edges, which may be unsafe during interaction. The highly integrated modules embed all the mechatronics of the system, including the motor-gearbox group, series compliant joint, semi-active damping system, and control and conditioning electronics.

4 The use of semi-active damping systems placed between the actuator and the load is not a new concept in robotics, see e.g. [12]; however, these works do not make use of mechanical compliance. In fact, dampers in these works are employed as transmission systems (similarly to clutches) and are hence used to deliver the actuator effort to the load. This means that these systems typically suffer from low energy efficiency when compared to SEAs/VSAs or hybrid solutions such as that presented in this work, [12].

1) Longitudinal actuation module

The design of this module is shown in Fig. 5. The actuation system is fixed to the carbon fibre frames at two points, i.e. at the base and front support flanges of the actuator, Fig. 5a. This is to avoid the application of thrust moments to the actuator, which may result in malfunction due to misalignment between the frictional surfaces of the damping system, Fig. 5b. The input interface is an aluminium disk which is directly connected to the frame by means of structural adhesive, while the output interface is an aluminium component machined with a geometry which is the negative of that of the input disk. The thickness of the carbon fibre frame has been dimensioned to 1.5 mm, which guarantees sufficient strength of the structure when the nominal load is applied to the fully extended arm, and has been validated by means of finite element analysis (FEM). The achievable range of motion for this joint is [0, 2π] rad, although its motion can be constrained on demand to shorter ranges by means of mechanical locks. The total weight of the unit is mS = 2.2 kg. The fully assembled module is shown in Fig. 7a.

Figure 5. (a) Open view (greyed components: actuation system, electronics, and input-output interfaces) and (b) semi-exploded cross section view of the longitudinal actuation module (greyed components: piezoelectric stacks of the VPDA, springs of the compliant module, frictional surfaces of the VPDA). For more details on this design, please refer to [8].

2) "Transverse" actuation module

In the transverse configuration the frame has been designed in order to allow the replication of the kinematic configuration mentioned previously. Similarly to the longitudinal module, the actuator is supported on both sides by means of the base and front flanges, for the same reasons previously explained, Fig. 6. The input and output mechanical interfaces present the same geometry as those of the longitudinal module in order to fulfil the modularity property and permit arbitrary interconnections between the modules. In this case the thickness of the carbon fibre structure is 1.5 mm, as in the previous case, whereas the maximum achievable range of motion for this joint is [-π/2, π/2] rad and its total weight is mT = 2.4 kg. The fully assembled prototype is shown in Fig. 7a, whereas its specifications are reported in Tab. I together with those of the longitudinal module.
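As a minimal sketch (not the authors' software) of how the two module types compose kinematically, one plausible convention is to model a longitudinal joint as a roll about the local x axis and a transverse joint as a pitch about the local y axis; chaining them, e.g. in a roll-pitch-roll shoulder plus pitch elbow arrangement, always yields a proper rotation:

```python
import math

def rot_x(q):
    # longitudinal joint: roll about the local x (link) axis
    c, s = math.cos(q), math.sin(q)
    return [[1, 0, 0], [0, c, -s], [0, s, c]]

def rot_y(q):
    # transverse joint: pitch about the local y axis, perpendicular to the link
    c, s = math.cos(q), math.sin(q)
    return [[c, 0, s], [0, 1, 0], [-s, 0, c]]

def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

# longitudinal-transverse-longitudinal shoulder plus a transverse elbow
q = [0.3, -0.5, 0.7, 1.0]   # illustrative joint angles [rad]
R = rot_x(q[0])
for link in (rot_y(q[1]), rot_x(q[2]), rot_y(q[3])):
    R = mat_mul(R, link)

# chaining revolute modules keeps the end-effector orientation a proper rotation
RRt = mat_mul(R, [list(col) for col in zip(*R)])
print(RRt)
```

Real modules add translations and the DH offsets given later in Section V; the point here is only that longitudinal and transverse units can be freely chained because each contributes one revolute DOF.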
Figure 6. Open view of the transverse actuation module (greyed components: actuation system, electronics, input-output interfaces). For the mechanical design of the actuation system refer to Fig. 5b and to [8].

TABLE I
SPECIFICATIONS OF THE LONGITUDINAL AND TRANSVERSE ACTUATION MODULES

Gear ratio – N: 100
Power: 190 W
Maximum output continuous torque: 40 Nm
Maximum joint velocity: 10.7 rad/s (≈ 610 °/s)
Maximum rotary passive deflection: ±0.18 rad
Maximum damping torque: 9 Nm
Joint stiffness: 188 Nm/rad
Joint damping range: [0, 9] Nms/rad
Inherent joint viscous damping: 0.25 Nms/rad
Total mass of the longitudinal actuator – mS: 2.2 kg
Total mass of the transverse actuator – mT: 2.4 kg

Figure 7. (a) Longitudinal (left) and transverse (right) actuation modules, (b) actuator control board and (c) piezo amplifier board.

C. Sensing and electronics

Regarding the available sensors, both modules are equipped with an incremental 12-bit position encoder placed on the motor shaft and two magnetic 12-bit absolute encoders to monitor the compliant module deflection angle and the mechanical angle of the motor after the reduction drive. A strain-gauge-based custom made force sensor is machined on the actuator structure to permit the measurement of the force applied by the piezo actuators, whereas a custom made torque sensor which exploits the same sensing principle is assembled at the output of the harmonic drive gear to measure the joint torque. Each module is provided with a motor control board, Fig. 7b, used for the sensor data acquisition, control of the whole unit and generation of the command voltages for the brushless DC motor and the piezoelectric driver amplifier board, Fig. 7c. This latter board is used for amplifying the control signals for the piezo stack actuators of the friction damper to the nominal voltage range of [0, 150] V required by the piezos. The communication with the controller is performed through a 100 Mbit Ethernet interface utilizing both TCP and UDP protocols.

IV. MANIPULATOR SPECIFICATIONS

Among the most important issues to be addressed when considering an anthropomorphic/humanoid design is the definition of the basic layout of the robot. This phase includes the definition of the kinematics and dimensions of the arm which guarantee an appropriate nominal payload while at the same time permitting an overall natural and proportioned design. The size of the manipulator is approximately that of an adult human arm. Regarding the kinematics, three degrees of freedom are located at the shoulder complex, complemented by an additional degree of freedom for the implementation of the elbow flexion-extension DOF. The first three DOFs reproduce, in sequence, the arm extension-flexion, adduction-abduction and humeral roll degrees of freedom, in accordance with the ball-socket kinematic shoulder model, [19]. The elbow pronation-supination degree of freedom is not implemented in this first manipulator version, since this degree of freedom plays a less important role in physical human-robot interaction, and therefore the inclusion of compliance (and related variable physical damping) is likely not strictly necessary for this specific degree of freedom. The average ranges of motion of the adult human arm are used as a starting point for the selection of the moveable ranges of each joint of the manipulator. These are reported in Tab. II, [19](5).

5 Note that the ranges of motion reported in Tab. II have been slightly extended with respect to those of the human arm.

A. Static analysis

Although an arm with dimensions exactly equal to those of a 50th percentile male can be designed by means of the modules presented in Section III, these dimensions have been increased with the purpose of making the manipulator layout proportioned while at the same time guaranteeing that the actuators are capable of delivering the required torque. Statically, the configurations with the fully extended arm on the transverse plane are the worst case layouts, since the arm centre of mass is distal, hence maximizing the lever arm of the moment produced by gravity. The worst case load scenarios of the manipulator are shown in Fig. 8. The two cases of Fig. 8 are equivalent due to the properties of the roll-pitch-roll joint employed for the development of the shoulder, and this allows finding a unique solution to the dimensioning problem.
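The dimensioning problem just mentioned reduces to a worst-case torque balance at the critical shoulder actuator. A numerical sketch using the module data of Tab. I; the humeral-roll lever arm L_1COG below is an assumed value, not given in this excerpt:

```python
G = 9.81                # gravity [m/s^2]
T_MAX = 40.0            # maximum continuous module torque [Nm] (Tab. I)
M_S, M_T = 2.2, 2.4     # longitudinal / transverse module masses [kg] (Tab. I)
M_PLOAD = 4.0           # target nominal payload at the end effector [kg]
P = 1.8                 # total-arm to humerus length ratio, 50th percentile male
L_1COG = 0.15           # ASSUMED lever arm of the humeral-roll module [m]

# fully extended arm: the critical actuator must hold the gravity moment of the
# humeral-roll module, the elbow module and the payload at the end effector:
#   T = L_1COG*m_S*g + L_SE*m_T*g + (L_SE + L_EW)*m_pload*g,
# with L_SE + L_EW = P * L_SE; solve for the shoulder-elbow distance L_SE
L_SE = (T_MAX - L_1COG * M_S * G) / ((M_T + P * M_PLOAD) * G)
print(L_SE)   # ~0.39 m
```

With the assumed 0.15 m lever arm this gives L_SE ≈ 0.39 m, consistent with the 393 mm reported for the built arm in Tab. II.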
Figure 8. Worst case load configurations of the arm. In the configuration of (a) the shoulder adduction-abduction actuator receives the maximum load, while for (b) the most stressed actuator is that of the flexion-extension DOF.

The equilibrium equation of the static torque due to gravitational forces at the critical actuators (greyed in Fig. 8) is:

T = L_1COG·m_S·g + L_SE·m_T·g + (L_SE + L_EW)·m_pload·g    (6)

where T is the torque applied to the critically loaded actuators, L_1COG is the distance between the point of intersection of the axes of the DOFs of the shoulder (defined as p_SH) and the centre of gravity of the humeral roll actuator, and is given by the dimensions of the mechanical assembly of the longitudinal and transverse actuation modules. L_SE represents the distance between p_SH and the centre of gravity of the elbow flexion-extension DOF, g is the acceleration of gravity, and P = 1.8 is the ratio between the total arm and humerus lengths (P = (L_SE + L_EW)/L_SE using the notation of Fig. 8) of a 50th percentile adult male, [20].

TABLE II
SPECIFICATIONS AND RANGES OF MOTION (ROM) OF THE MANIPULATOR

Arm base – shoulder axes intersection distance (LBS): 193·10^-3 m
Shoulder axes intersection – elbow axis distance (LSE): 393·10^-3 m
Elbow axis – wrist axes intersection distance (LEW): 360·10^-3 m
Total weight (marm): 9.9 kg
Nominal static load at the EE @ extended arm (mpload): 4 kg
Arm extension-flexion ROM: [-45, 180]°
Arm adduction-abduction ROM: [0, 170]°
Humeral roll internal-external rotation ROM: [-90, 90]°
Elbow flexion-extension ROM: [0, 145]°

Setting the payload to mpload = 4 kg, considering that the critical actuators are loaded at their maximum torque value (Tab. I), and solving (6) with respect to L_SE, it is possible to obtain the value of this parameter, while the total arm length can be set by means of the scaling factor P:

L_SE + L_EW = L_SE·P    (7)

where L_EW is the distance between the elbow and the wrist axes. Considering the specified payload, the layout of the arm is proportionally scaled by a factor of approximately 40% with respect to the dimensions of a 50th percentile human arm, [20]. At the same time, such a nominal payload will permit the assembly of a hand/gripper at the robot end effector while still leaving some load margin. The overall dimensions, mass and nominal payload of the manipulator are reported in Tab. II.

B. Dynamic and interaction analysis

Using the determined kinematic layout and the physical properties (e.g. mass, stiffness) reported in Tabs. I and II as a baseline, dynamic simulations have been carried out to give an indication of the typical torque/velocity profiles required of the employed actuation system when the arm is performing a task which requires physical interaction. A further mass of weight 2 kg has been applied to the end effector to simulate a load and/or the mass of a robotic hand/gripper. As an example case, we analyse here the previously studied task of a manipulator writing on a blackboard, [21], Fig. 9.

Figure 9. (a) Snapshot sequence of the simulation animation (manipulator 3D model in black and blackboard in grey) and (b) torques required for the execution of the task. The acronyms SFE, SAA, HRO, EFE represent the Shoulder Flexion-Extension, Shoulder Abduction-Adduction, Humeral Roll and Elbow Flexion-Extension DOFs, while ti is the instant when interaction begins.

The joint trajectory data obtained from the experiments in [21](6) were imposed on the motor positions of the manipulator model to obtain the required joint torques (which result from gravity, dynamics and interaction). A virtual rigid wall simulating the blackboard was implemented in the simulator to generate normal reaction forces on the manipulator end effector when interaction occurred. The joint torques required for the execution of this task are shown in Fig. 9b. Figure 9b shows that before interaction (this occurs at ti = 2.2 s) the required torques reach relatively small magnitude levels, i.e. a maximum amplitude of approximately 5 Nm in the shoulder abduction-adduction joint. This occurs because in this phase the actuators compensate for the gravity and inertial torques only. However, after this instant the torques increase in magnitude due to the increased effort required by the interaction. It is important to notice that the torques of Fig. 9b correspond to the effort required to perfectly track the desired motor positions, and therefore can represent a scenario

6 We use this example case as the Barrett WAM arm employed for the experiments in [21] presents a kinematic layout which is very similar to that of the arm presented in this paper.
where high gain motor position controllers are implemented. This hence constitutes a worst case situation, since robots designed for interaction should employ safety-oriented controllers which can adapt the robot configuration to limit the interaction force/torque.

The maximum output velocity required by the actuators in the execution of this task is 0.42 rad/s, reached at the elbow joint. The obtained maximum velocities and torques are much lower than the nominal values which can be generated by the actuators, Tab. I, demonstrating that such an interaction task can be executed by the presented system with large torque and velocity safety margins.

V. MANIPULATOR ASSEMBLY AND KINEMATICS

A roll-pitch-roll "spherical" joint made of the series longitudinal-transverse-longitudinal forms the shoulder of the manipulator. Assembled to this, a hollow carbon fibre part defined as the "Humerus" is used to connect the shoulder to the elbow. The Humerus output interface has been inclined by 50 degrees with respect to its input flange to respect the set range of motion specifications, Tab. II. The output flange of the Humerus is connected to the input interface of another transverse actuator, which implements the elbow flexion-extension degree of freedom. Finally, a hollow part implements the manipulator forearm, hence completing the design of the arm.

Figure 10. Kinematics of the manipulator and reference frames according to the Denavit-Hartenberg convention. The frame shown at the manipulator base is used as the "home" configuration for the DH parameters.

TABLE III
D-H PARAMETERS OF THE MANIPULATOR

Link ID | d | θ | a | α
1 | LBS | (3/2)π+q2 | 0 | (3/2)π
2 | L12 | (3/2)π-q2 | 0 | (3/2)π
3 | LSE | (3/2)π+q2 | LSE_H | (3/2)π
4 | 0 | q4 | LEW | 0

This latter part is designed so as to permit the future integration of a robotic hand. The full arm design is shown in Fig. 1. A schematic summarizing the kinematics of the manipulator is shown in Fig. 10, whereas the Denavit-Hartenberg parameters are reported in Table III. The ranges of motion reported in Tab. II are implemented by means of a mechanical pin-based locking mechanism which is employed in each joint of the arm.

VI. EXPERIMENTAL RESULTS

Preliminary results are presented in this section as a proof of concept of the employed actuation system applied to the presented design. A weight of mass 3 kg was assembled at the link extremity of an experimental prototype version of the actuation modules shown in Fig. 7 in order to replicate a moment of inertia of JL = 0.13 kg·m2, which simulates the load of a possible robot configuration (it is about one third of the highest moment of inertia which can be reached by the manipulator, i.e. that seen by the proximal actuator for the fully extended arm, Jarm = 0.42 kg·m2 ≈ 3JL). A step position reference of amplitude 0.15 rad (8.6°) was sent to the position-controlled motor in order to induce oscillations to be damped by the semi-active damper while the link position was recorded, Fig. 11. It is important to remark that no controllers to shape the dynamics (e.g. impedance/admittance control) were implemented on the motor, which was controlled using pure position control, in order to show the action of the full module with the variable physical damper, which is regulated using the viscous damping control scheme proposed in [8]. At the same time, two levels of desired joint physical damping (corresponding to damping ratios of ζ1 = 0.3, ζ2 = 1(7)) were sent to the damping system in order to show the ability of the damping module in the dynamic replication of the desired joint physical damping, Fig. 11. The experimental data obtained in Fig. 11a has been compared with simulation results to validate the effectiveness of the system mechatronics.

7 Note that this quantity is intended in this paper as the equivalent damping ratio of the mass-spring-damper system formed by the damped compliant joint and the link inertia (locked system).

Figure 11. (a) Recorded link position step responses of the actuation system for different levels of joint physical damping, and overplots of the experimental results shown in (a) with the simulation results of the (b) high and (c) low damping cases.

The simulated link position data was obtained by using the motor trajectory recorded in the experiments of Fig. 11a as a source of flow for an equivalent mass-spring-variable damper system which simulates the joint-link assembly. These have been plotted together with the experimental link position data both for a low (D1 = 1.48 Nms/rad, ζ1 = 0.3) and high
(D2 = 9.89 Nms/rad, ζ2 = 1) joint viscous damping level, Figs. 11c and 11b, respectively. The resemblance between simulated and experimental data is clear; however, nonlinearities such as dry friction due to the bearings and assembly have not been considered in the model used for the simulation, and this creates slight estimation errors.

VII. CONCLUSIONS AND FUTURE WORK

This paper presented the analysis and development of the CompAct™ Arm, i.e. a highly integrated compliant manipulator with intrinsic variable physical damping. The positive results obtained in the first part of the paper demonstrated that joint physical damping has a significant positive effect on the dynamic performance, accuracy and efficiency (in the high frequency domain) of compliant robots. This motivated the employment of a piezoelectrically actuated semi-active friction damper in parallel to the transmission compliance of the actuation system. The resultant manipulator is probably the first robotic system to exhibit these rich bio-inspired physical properties, making a step towards the development of robots with enhanced capabilities and performance approaching those of biological systems, and in particular of the human.

In addition, this anthropomorphic manipulator presents low weight (thanks to the compact design and carbon fibre frame) and joint compliance at all DOFs, with the objective of realising high interaction-related performance and enhanced robustness and safety. It is developed following a modular design concept, i.e. the use of highly integrated actuation modules which embed the whole actuator mechatronics. The obvious advantage of such a design approach is its versatility, which facilitates the development of further robots (e.g. full humanoids) by employing the two presented modules. The intrinsic characteristics of the presented system (light weight, joint compliance and variable physical damping) will make this robot able to interact with the environment while presenting at the same time robustness, good accuracy and dynamic performance.

Further experiments will be carried out on the full manipulator to show the benefits of the introduced damping as demonstrated in the first part of the paper, and to fully characterise the effects of damping on safety. Future developments will also include the design of novel control strategies intended to fully exploit the functionalities and advantages of the incorporated semi-active damping system.

ACKNOWLEDGMENT

This work is supported by the European Commission project SAPHARI FP7-ICT-287513.

REFERENCES

[1] A. Jafari, N. G. Tsagarakis, and D. G. Caldwell, "Exploiting Natural Dynamics for Energy Minimization using an Actuator with Adjustable Stiffness (AwAS)," presented at the International Conference on Robotics and Automation, Shanghai, China, 2011.
[2] M. G. Catalano, G. Grioli, M. Garabini, F. Bonomo, M. Mancini, N. G. Tsagarakis, and A. Bicchi, "VSA - CubeBot. A modular variable stiffness platform for multi degrees of freedom systems," in International Conference on Robotics and Automation, Shanghai, China, 2011.
[3] A. M. Dollar and R. D. Howe, "A robust compliant grasper via shape deposition manufacturing," Mechatronics, IEEE/ASME Transactions on, vol. 11, pp. 154-161, 2006.
[4] M. Laffranchi, N. G. Tsagarakis, F. Cannella, and D. G. Caldwell, "Antagonistic and series elastic actuators: a comparative analysis on the energy consumption," in Intelligent Robots and Systems (IROS), IEEE/RSJ International Conference on, St. Louis, USA, 2009, pp. 5678-5684.
[5] A. Bicchi and G. Tonietti, "Fast and soft arm tactics," IEEE Robotics & Automation Magazine, vol. 11, 2004.
[6] A. De Luca and W. Book, "Robots with Flexible Elements," in Handbook of Robotics, B. Siciliano and O. Khatib, Eds. Springer, 2008.
[7] M. Laffranchi, N. G. Tsagarakis, and D. G. Caldwell, "A variable physical damping actuator (VPDA) for compliant robotic joints," in Robotics and Automation (ICRA), IEEE International Conference on, Anchorage, USA, 2010, pp. 1668-1674.
[8] M. Laffranchi, N. Tsagarakis, and D. G. Caldwell, "A compact compliant actuator (CompAct™) with variable physical damping," in Robotics and Automation (ICRA), 2011 IEEE International Conference on, Shanghai, China, 2011, pp. 4644-4650.
[9] T. E. Milner and C. Cloutier, "Damping of the wrist joint during voluntary movement," Experimental Brain Research, vol. 122, pp. 309-317, 1998.
[10] H. Gomi and R. Osu, "Task-Dependent Viscoelasticity of Human Multijoint Arm and its Spatial Characteristics for Interaction with Environments," J. Neuroscience, vol. 18, pp. 8965-8978, 1998.
[11] T. Tsuji, Y. Takeda, and Y. Tanaka, "Analysis of mechanical impedance in human arm movements using a virtual tennis system," Biological Cybernetics, vol. 91, pp. 295-305, 2004.
[12] C. Chee-Meng, H. Geok-Soon, and Z. Wei, "Series damper actuator: a novel force/torque control actuator," presented at the Humanoid Robots, 2004 4th IEEE/RAS International Conference on, 2004.
[13] M. Laffranchi, N. G. Tsagarakis, and D. G. Caldwell, "Analysis and Development of a Semiactive Damper for Compliant Actuation Systems," IEEE/ASME Transactions on Mechatronics, 2012.
[14] M. Laffranchi, N. G. Tsagarakis, and D. G. Caldwell, "Safe human robot interaction via energy regulation control," in Intelligent Robots and Systems (IROS), IEEE/RSJ International Conference on, St. Louis, USA, 2009, pp. 35-41.
[15] J. Y. Lew and S. M. Moon, "A simple active damping control for compliant base manipulators," Mechatronics, IEEE/ASME Transactions on, vol. 6, pp. 305-310, 2001.
[16] http://www.comau.it/.
[17] http://www.barrett.com/robot/index.
[18] M. Hirose and K. Ogawa, "Honda humanoid robots development," Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, vol. 365, pp. 11-19, 2007.
[19] V. M. Zatsiorsky, Kinematics of Human Motion. Leeds, UK: Human Kinetics Pub., 1998.
[20] A. R. Tilley and H. D. Associates, The Measure of Man and Woman: Human Factors in Design. Whitney Library of Design, 1993.
[21] A. Pistillo, S. Calinon, and D. G. Caldwell, "Bilateral Physical Interaction with a Robot Manipulator through a weighted Combination of Flow Fields," presented at the International Conference on Intelligent Robots and Systems, San Francisco, CA, USA, 2011.

232
Robust Loop Closing Over Time
Yasir Latif, Cesar Cadena and José Neira
Instituto de Investigación en Ingeniería de Aragón, I3A
Universidad de Zaragoza
Zaragoza, Spain
Email: {ylatif, ccadena, jneira}@unizar.es

Abstract—Long term autonomy in robots requires the ability to reconsider previously taken decisions when new evidence becomes available. Loop closing links generated by a place recognition system may become inconsistent as additional evidence arrives. This paper is concerned with the detection and exclusion of such contradictory information from the map being built, in order to recover the correct map estimate. We propose a novel consistency based method to extract the loop closure regions that agree both among themselves and with the robot trajectory over time. We also assume that the contradictory loop closures are inconsistent among themselves and with the robot trajectory. We support our proposal, the RRR algorithm, on well-known odometry systems, e.g. visual or laser, using the very efficient graph optimization framework g2o as back-end. We back our claims with several experiments carried out on real data.

I. INTRODUCTION

The ability to detect that past place recognition decisions (also known as the loop closing problem) were incorrect, and to recover the accurate state estimate once they are removed, is crucial for lifelong mobile robotic operations. No matter how robust a place recognition system might be, there always exists the possibility of getting just one false positive loop closing hypothesis. This can be catastrophic for any estimation algorithm. Only by collecting more evidence over time can we detect and correct these mistakes in place recognition. Reliability increases with redundancy during the verification process.

Consider long term navigation, see Fig. 4(b) (best viewed in color), where information is collected session by session and place recognition is used to improve the estimation after every session. Failures may happen, but with the arrival of new evidence and the reconsideration of decisions taken earlier, the estimation can be corrected.

To achieve this, we need to realize that the place recognition system has generated wrong constraints, remove them if necessary, and recompute the estimation. We propose a novel consistency method based on consensus: wrong loop closure constraints are usually in contradiction among themselves, with the correct ones, and with the sequential movement constraints (odometry) of the robot. At the same time, the correct ones form a consensus among themselves and with the odometry.

Our method, the Realizing, Reversing, Recovering (RRR) algorithm, works with the pose graph formulation. It is part of the back-end of a SLAM system and is therefore independent of the type of sensor used for odometry or place recognition. All our method needs is a system that is able to generate a pose graph for the sequential poses and a place recognition system that can provide constraints for loop closure.

In the next section, we discuss relevant work and highlight the need for a robust loop closing method. In section III we detail our proposal, the RRR algorithm, and carry out real experiments in section IV. Finally, in sections V and VI discussion and conclusions about the work are presented, along with possible future work.

II. RELATED WORK

Several approaches to persistent mapping have appeared in the robotics literature in the recent past. Konolige and Bowman [8] presented a stereo visual odometry system that works together with a bag of words place recognition system to build multiple representations of dynamic environments over time. Multiple map stitching, recovery from odometry failures, loop closing, and global localization all rely on the robustness of the place recognition system. "Weak links" are removed when place recognition is able to close loops, making it prone to errors when the place recognition closes loops wrongly. Similarly, McDonald et al. [11] presented a multi-session 6 DOF Visual SLAM system using "anchor nodes". In their approach place recognition is assumed to be perfect and its output is trusted every time. Sibley et al. [17] presented a relative bundle adjustment approach for large scale topological mapping. They show an example of mapping from London to Oxford, over a trajectory of about 121 km. They also use appearance based place recognition and are therefore in danger of making wrong decisions in case of incorrect place recognition. All of these approaches to large scale and persistent mapping rely on a place recognition system with zero false positives.

Appearance based loop closing approaches that work in real time usually follow the bag of words approach [18]. FabMap, from Cummins and Newman [3], uses a bag of words approach with probabilistic reasoning that has been shown to work robustly over a long trajectory. The BoW-CRF system [1] generates loop closings using a bag-of-words and carries out verifications using conditional random fields. This improves the robustness of the system. In any case, neither approach can guarantee 100% accuracy. Olson [12] proposed a hypothesis verification method for loop closure constraints using graph partitioning based on spectral clustering. Another approach is to delay decision making and maintain multiple

topologies of the map with an associated belief for each one. Ranganathan and Dellaert [14] follow this approach using a Rao-Blackwellized particle filter. However, they do not explicitly show how their system is affected by and recovers from wrong loop closures. Their method is also unique in the sense that it uses the estimation process itself to reason about possible loop closures. A similar estimation based reasoning approach using the pose graph formulation was presented by Sünderhauf and Protzel [20], in which a robust SLAM back end using "switch factors" was proposed. The central idea is to penalize those loop closure links during graph optimization that deviate from the constraints they suggest between two nodes. Similar to our approach, they change the topological structure of the graph based on identification and rejection of wrong loop closures. In contrast, however, they assess the validity of every loop closure on its own, without forming a general consensus using all the available information. In cases where there are a number of hypotheses that suggest the same but wrong loop closings (for example due to perceptual aliasing in long corridors), overall consensus helps us in the rejection of such outliers. Their method suggests a continuous function governing the state of the "switch factors", which does not make sense in all the cases.

III. OUR PROPOSAL: THE RRR ALGORITHM

Since place recognition is a very important step in map estimation, there is a need for a robust mechanism that can identify incorrect place recognition links. We propose a robust consistency-based loop closure verification method using the pose graph formulation, based on the observation that correct loop closures in conjunction with odometry can help in the detection of wrong loop closures. Our method follows the line of work in which the estimation process itself is used in making the distinction between correct and false loop closures. Thanks to the maturity of the Simultaneous Localization and Mapping (SLAM) problem, in the last years several very efficient estimation methods have been published, as for instance iSAM by Kaess et al. [7], HOG-Man by Grisetti et al. [6], and g2o by Kümmerle et al. [10].

The graph based formulation for SLAM, the so-called "graph-SLAM", models robot poses as nodes in a graph where links from odometry or loop closures form edges or "constraints". Let x = (x_1 ... x_n)^T be a vector of parameters that describe the configuration of the nodes. Let \omega_{ij} and \Omega_{ij} be the mean and the information of the observation of node j from node i. Given the state x, let f_{ij}(x) be a function that calculates the perfect observation according to the current state. The residual r_{ij} can then be calculated as:

    r_{ij}(x) = \omega_{ij} - f_{ij}(x)    (1)

Constraints can either be introduced by odometry, which are sequential constraints (j = i + 1), or by the place recognition system, which are non-sequential. The amount of error introduced by each constraint, weighted by its information, can be calculated as:

    d_{ij}(x)^2 = r_{ij}(x)^T \Omega_{ij} r_{ij}(x)    (2)

and therefore the overall error, assuming all the constraints to be independent, is given by:

    D^2(x) = \sum_{(i,j)} d_{ij}(x)^2 = \sum_{(i,j)} r_{ij}(x)^T \Omega_{ij} r_{ij}(x)    (3)

The solution to the graph-SLAM problem is to find a state x^* that minimizes the overall error:

    x^* = \arg\min_x \sum_{(i,j)} r_{ij}(x)^T \Omega_{ij} r_{ij}(x)    (4)

Iterative approaches such as Gauss-Newton or Levenberg-Marquardt can be used to compute the optimal state estimate [10].

We divide the constraints into two sets: S contains sequential links and R contains links from place recognition. Since all constraints are mutually independent, the error in (3) can be written as:

    D^2(x) = \sum_{(i,j) \in S} d_{ij}(x)^2 + \sum_{(i,j) \in R} d_{ij}(x)^2    (5)

We further divide the set R into n disjoint subsets R_k, where each subset only contains topologically related constraints (sequences of links that relate similar portions of the robot trajectory), such that R = \cup_{k=1}^{n} R_k and \forall (i \neq j), R_i \cap R_j = \emptyset. We term each of these subsets "clusters". Then the error for the set R can be written as:

    \sum_{(i,j) \in R} d_{ij}(x)^2 = \sum_{c=1}^{n} \sum_{(i,j) \in R_c} d_{ij}(x)^2 = \sum_{c=1}^{n} d_{R_c}(x)^2    (6)

where d_{R_c}(x)^2 is the error contributed by the c-th subset. This simply means that the overall error introduced due to place recognition constraints is the sum of the individual errors of each cluster.

Assuming that we do not have any outliers in odometry, the error in (3) is caused practically only by the loop closing links. Once we iterate to find the optimal state, the error in the odometry is no longer zero. This increase in odometry error gives us a measure of the metric change that must take place so that the graph conforms to the place recognition constraints. This error will be smaller when the corresponding place recognition constraints are correct, because a comparatively smaller metric change is needed as opposed to the change required by wrong constraints. Moreover, clusters that suggest the same change would cause smaller errors among them as compared to conflicting clusters. By measuring how much error the clusters introduce, we can detect which clusters agree with each other. A point to note here is that even though odometry drifts with time, it is still a useful measure of the underlying topology of the graph.

Consider the case when each of these clusters is the only place recognition constraint in the graph. In the absence of other clusters, this cluster can make the odometry converge to an acceptable solution, where acceptable means that the overall errors are within some statistical threshold. If this cluster alone deforms the graph enough to make the solution unacceptable,

it is highly likely that this cluster is suggesting a wrong place recognition decision. This forms the basis of our intra-cluster test. Mathematically, for any cluster R_i to be individually consistent, the following two conditions must hold:

    D_G^2(x) = \sum_{(i,j) \in R_i} r_{ij}(x)^T \Omega_{ij} r_{ij}(x) + \sum_{(i,j) \in S} d_{ij}(x)^2 < \chi^2_{\alpha,d_G}    (7)

where d_G are the degrees of freedom of the whole graph. Moreover,

    D_l^2(x) = r_{ij}(x)^T \Omega_{ij} r_{ij}(x) < \chi^2_{\alpha,d_l}, \quad (i,j) \in R_i    (8)

ensures that if there are any outliers within the cluster they are omitted; d_l are the degrees of freedom of each constraint. If the criterion in (7) is not met, the whole cluster is rejected. Constraints not meeting the criterion in (8) are removed from the cluster, and the rest of the links are accepted as being individually consistent.

Similarly, for a selected subset of clusters C, we term the clusters in C jointly consistent if:

    D_C^2(x) = \sum_{c=1}^{|C|} \sum_{(i,j) \in R_c} r_{ij}(x)^T \Omega_{ij} r_{ij}(x) < \chi^2_{\alpha,d_C}    (9)

and

    D_G^2(x) = D_C^2(x) + \sum_{(i,j) \in S} r_{ij}(x)^T \Omega_{ij} r_{ij}(x) < \chi^2_{\alpha,d_G}    (10)

The first criterion ensures that the present clusters are consistent with each other, while the second one ensures the consistency of the clusters with the odometry links.

We also advocate that rather than testing a single hypothesis on its own, clusters are more robust and identify regions of loop closures rather than single pose-to-pose correspondences. This is the case when appearance based place recognition systems generate hypotheses. Having said that, our method does not pose any restriction on the minimum number of links in a cluster and can handle clusters composed of just single links.

An overview of our method is provided in the following section.

A. Method

1) Clustering: Our method starts by collecting topologically related loop closing hypotheses into clusters. Clusters represent sequences of loop closures that relate similar portions of the trajectory. We use a simple incremental way to group them using timestamps. We proceed as follows: with the first loop closure that arrives, we initialize the first cluster R_1. Then, we decide according to (11) whether the next loop closure arriving belongs to the same cluster or a new cluster needs to be initialized.

    \omega_{i,j} \in R_k \iff \exists \omega_{p,q} \in R_k \mid \|t_i - t_p\| \le t_g \wedge \|t_j - t_q\| \le t_g    (11)

where t_i denotes the timestamp related to node i, and t_g can be selected according to the rate at which the place recognition system runs. This threshold defines the cluster neighbourhood. In our experiments, we consider loop closing hypotheses less than t_g = 10 s apart to be part of the same cluster. A cluster is considered closed if for t_g time no new links are added to it.

2) Intra-Cluster Consistency: After clustering hypotheses together, the next step is to compute the internal consistency of each cluster. This involves optimizing the pose graph with respect to just this single cluster and checking which links satisfy the \chi^2_{\alpha,d_l} bound. The links inside the cluster that do not pass this test are removed from the cluster and are no longer considered in the optimization. Algorithm 1 describes the intra-cluster consistency check. This procedure is carried out for each cluster, and it can be performed as soon as a cluster is closed.

Algorithm 1 Intra Cluster Consistency
    Input: poses, slinks, cluster of rlinks
    Output: cluster
    add poses, slinks to PoseGraph
    PoseGraphIC ← PoseGraph
    add cluster to PoseGraphIC
    optimize PoseGraphIC
    if D_G^2 < \chi^2_{\alpha,d_G} then
        for each rlink_l ∈ cluster do
            if D_l^2 < \chi^2_{\alpha,d_l} then
                Accept rlink_l
            else
                Reject rlink_l
            end if
        end for
    else
        Reject cluster
    end if

3) The RRR algorithm: Having established intra-cluster consistency, we now look for clusters that are mutually consistent. We initially assume that all the clusters are consistent and carry out optimization by including all of them in the optimization problem. Once optimized, we check for any links whose residual error satisfies the \chi^2 test. The clusters to which these links belong are then selected and added to the candidate set to be evaluated for joint consistency according to eqs. (9) and (10).

We accept the clusters that are jointly consistent and term them the goodSet. At this point, we remove the goodSet from the optimization and try to re-optimize with the remaining clusters. The idea behind doing so is that in the absence of the good clusters, other correct clusters will be able to pass the threshold tests. A good analogy would be that of athletes running in a special kind of race in which the observer is only able to see who crosses the finish line first. In order to

Algorithm 2 Inter Cluster Consistency
    Input: goodSet, candidateSet, PoseGraph
    Output: goodSet, rejectSet
    PoseGraphJC ← PoseGraph
    add (goodSet, candidateSet) to PoseGraphJC
    rejectSet ← {}
    optimize PoseGraphJC
    if D_C^2 < \chi^2_{\alpha,d_C} ∧ D_G^2 < \chi^2_{\alpha,d_G} then
        goodSet ← {goodSet, candidateSet}
    else
        find the cluster_i ∈ candidateSet with largest CI
        remove cluster_i from candidateSet
        rejectSet ← cluster_i
        if ¬isempty(candidateSet) then
            (goodSet, rSet) ← Inter Cluster Consistency(goodSet, candidateSet)
            rejectSet ← {rejectSet, rSet}
        end if
    end if

Algorithm 3 RRR
    Input: poses, slinks, R (set of clusters containing rlinks)
    Output: goodSet of rlinks
    add poses, slinks to PoseGraph
    goodSet ← {}
    rejectSet ← {}
    loop
        PoseGraphPR ← PoseGraph
        currentSet ← R \ {goodSet ∪ rejectSet}
        candidateSet ← {}
        add currentSet to PoseGraphPR
        optimize PoseGraphPR
        for each cluster_i ∈ currentSet do
            if ∃ rlink_j ∈ cluster_i with D_l^2 < \chi^2_{\alpha,d_l} then
                candidateSet ← {candidateSet, cluster_i}
            end if
        end for
        if isempty(candidateSet) then
            STOP
        else
            s ← goodSet.size
            (goodSet, rSet) ← Inter Cluster Consistency(goodSet, candidateSet)
            if goodSet.size > s then
                rejectSet ← {}
            else
                rejectSet ← {rejectSet, rSet}
            end if
        end if
    end loop

establish which runner would end up in the second position, we ask the winner not to participate in the race and conduct the race again. We continue asking the winner each time to join the previous winners, to see who will take the next spot. The winners in our case are the members of the goodSet. This is shown in Algorithm 2. As long as we keep finding clusters that are jointly consistent with the goodSet, it will grow.

An important point to mention here is the use of the rejectSet. The rejectSet contains all the clusters that we checked in the last iteration but found to be inconsistent with the goodSet. We omit them from the optimization process until something is added to the goodSet. The main algorithm is given in Algorithm 3. The algorithm terminates when we are no longer able to find any clusters to add to the candidateSet.

Long term operation: The algorithm described above works on data from a single session. In order to integrate multiple sessions, we run the algorithm on the first session, select the goodSet and optimize the graph using these correct loop closing hypotheses. The optimized graph is then used in conjunction with the second session. The loop closing hypotheses from the first are carried over to the second session, and we now look for a goodSet that is correct for both sessions. In this way, we can extend the algorithm to n sessions, at the end of which we have a goodSet consisting of hypotheses that have support from all the n sessions.

IV. EXPERIMENTS

We have evaluated the RRR algorithm on an indoor environment (Bicocca campus) from the RAWSEEDS project [15], an outdoor environment (Bovisa campus) from the same source, and on the NewCollege dataset [19]. Also, we show two multi-session experiments: 5 large scale sessions on Bicocca, and 36 sessions on a changing environment of an office floor at the University of Lincoln [4]. As front-end we use the BoW algorithm of [1], and as back-end we use the g2o framework, configured with the Gauss-Newton method and four iterations. The code along with the datasets used in the paper can be found at https://github.com/ylatif/rrr.

A. Comparisons

In the experiments, the RRR algorithm is used in combination with a weak place recognition algorithm (BoW) that is known to exhibit many false positives. In this way, we test the robustness of RRR to incorrect loop closing links. We compare our method with a branch and bound search (BBS) traversal of the whole solution space, computationally very demanding but guaranteed to find the largest subset of hypotheses that is jointly consistent (according to Algorithm 2). We also compare with Olson's method [12] and a modified version (see below). This is for verification within a single session.

Olson's method is mainly used for outlier rejection and finding inliers that are in some sense consistent with each other. Without any modifications, and calculating the consistency just within the "groups" (that we call clusters), the method has very low recall. We believe this happens because a few very well connected clusters prevent other correct but comparatively weakly connected clusters from entering the final solution. In the modified version, we calculate the consistency between all the given loop closure links. This makes the algorithm unreliable because Dijkstra links are derived from long trajectory links. At the same time, the adjacency criterion takes a long time to compute. The only advantage is that it dilutes the effect of a few very well connected clusters. The results for this method are shown in Fig. 1. The method accepts some false positive loop closures, mainly because of the aforementioned reason.
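The winner-by-winner selection described above can be condensed into a short sketch. This is not the authors' implementation: here `joint_error(subset)` is a stand-in for optimizing the pose graph with that subset of clusters (plus odometry) in g2o and returning the resulting error, `threshold` stands in for the \chi^2_{\alpha,d} bounds of Eqs. (9)–(10), and the function name and the single greedy rejection rule are our simplification of Algorithms 2 and 3.

```python
def rrr_select(clusters, joint_error, threshold):
    """Greedy consensus selection in the spirit of Algorithms 2-3 (a sketch).

    `clusters`: candidate clusters surviving the intra-cluster test.
    `joint_error(subset)`: stand-in for optimizing the pose graph with that
        subset of clusters plus odometry and returning its chi-square error.
    `threshold`: stand-in for the chi2(alpha, dof) consistency bound.
    """
    good, remaining = [], list(clusters)
    while remaining:
        candidate = good + remaining
        if joint_error(candidate) < threshold:
            # all remaining clusters are jointly consistent: accept them all
            return candidate
        # otherwise reject the cluster that contributes the largest error
        # when added on its own to the accepted set
        worst = max(remaining, key=lambda c: joint_error(good + [c]))
        remaining.remove(worst)
    return good
```

For instance, with toy scalar per-cluster errors and `joint_error = sum`, `rrr_select([1.0, 2.0, 100.0], sum, 10.0)` rejects the gross outlier and returns `[1.0, 2.0]`; the real algorithm additionally maintains the rejectSet so that rejected clusters are reconsidered once the goodSet grows.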

[Figure 1: four panels. (a) Loop closures from a BoW system (Pr = 0.69, Re = 0.85, RMSE = 55.5 m). (b) Output of Olson's method (Pr = 1.0, Re = 0.42, RMSE = 3.0 m). (c) Modified Olson's method's output (Pr = 0.98, Re = 0.78, RMSE = 1.8 m). (d) Output of our proposal (Pr = 1.0, Re = 0.65, RMSE = 1.0 m).]

Fig. 1. Experiment used for comparisons against other possible approaches, over one session of the Bicocca campus (Fig. 4(b)). Inset: result of optimization carried out using the accepted links. We show the precision (Pr), recall (Re) and root mean square error (RMSE) for each method. (Figure best viewed in colour online at http://roboticsproceedings.org/rss08/p30.html)
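The precision and recall figures quoted in the captions follow the usual definitions over accepted versus ground-truth loop closure links, and the RMSE is taken over pose residuals; a minimal sketch (the function names and the set-based link representation are ours, not part of the paper's pipeline):

```python
import math

def precision_recall(accepted, ground_truth):
    """Precision and recall of accepted loop-closure links against the
    ground-truth links; links can be any hashable identifiers."""
    accepted, ground_truth = set(accepted), set(ground_truth)
    tp = len(accepted & ground_truth)  # true positives
    precision = tp / len(accepted) if accepted else 1.0
    recall = tp / len(ground_truth) if ground_truth else 1.0
    return precision, recall

def rmse(residuals):
    """Root mean square error of per-pose position residuals (in metres)."""
    return math.sqrt(sum(r * r for r in residuals) / len(residuals))
```

For example, accepting links {1, 2, 3, 4} when the true set is {1, 2, 3, 5, 6} gives Pr = 0.75 and Re = 0.6.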

Fig. 2. Effect of the RRR algorithm over the precision (top), recall (middle), and root mean square error (bottom) compared with the base approach (BoW+GC) as we sweep the parameter α− of the BoW system. This is over the same Bicocca session used in Fig. 1. The curves for BBS in each case are superimposed with those of RRR.

BBS traverses the solution space to compute the largest set of clusters of loop closing hypotheses that are jointly consistent. As can be seen in Fig. 2, BBS selects the same clusters as our method, but at the expense of much higher computational complexity.

To further evaluate the performance of the RRR algorithm, we sweep the minimum confidence level of acceptance α− of the BoW system, and compute the precision, recall, and root mean square error of the optimized graph for the constraints for each value of α−. In Fig. 2, we show the result without our method (base) and with our RRR algorithm. RRR obtains full precision in the whole range and better root mean square error than the base approach. It is important to note that although sacrificing recall, we obtain better metric accuracy. This shows the disastrous effect of just a single false positive on the optimization process. When the base approach reaches full precision, our method accepts all the constraints, obtaining the same recall and errors.

We test the RRR algorithm on several real datasets. The odometry information comes from visual+wheel, laser, and stereo odometry, for outdoors-Bovisa (Fig. 3(a)), mixed-Bovisa (Fig. 3(b)), and NewCollege (Fig. 3(c)) respectively. We use BoW as place recognition with geometrical checking (GC) based on finding the rigid-body transformation with stereo correspondences. In Fig. 3 we show the results. It can be seen that our method successfully rejects all the wrong loop closing links.

B. Long term operation

The task of long term navigation involves robustly joining multiple sessions into a single map. For this purpose, we use multiple sessions from RAWSEEDS [15]. The dataset contains five overlapping runs in an indoor environment and provides wheel odometry, laser scans, and images from a camera mounted on the robot.

In order to use the RRR algorithm, we need some form of odometry as well as the loop closing hypotheses. We use laser scans from the dataset to compute the laser odometry using a simple scan matcher, see Fig. 4(a), and use BoW plus GC, see Fig. 4(b), to compute appearance based place recognition links. These are the inputs to our method.

We run our method incrementally on the five sessions. The first iteration of the algorithm works with data from the first session and computes the set of consistent clusters. The optimized first session is shown at the top-most position in Fig. 5. In the inset, the input to this step can be seen. This optimized session is used in conjunction with the second session and combined reasoning is done on both. During the second session, due to lack of evidence and odometry error, we accept some false positives, the effect of which can be seen in the 3D plot. During the next session, as more evidence becomes available, we are able to recover from the previous mistake. As can be seen from the successive improvement in the overall estimate, our method is able to take into account

[Figure 3: three panels. (a) Outdoors (Bovisa campus). (b) Mix outdoors-indoors (Bovisa campus). (c) Outdoors (New College).]

Fig. 3. Results of the RRR algorithm over different environments. The loop closures are obtained from an appearance based place recognition algorithm. Odometry is in blue, accepted links in green and rejected ones in red. (Figure best viewed in colour)

[Figure 4: two panels. (a) Laser odometry given by a simple scan matching. (b) Loop closing constraints from BoW+GC, α− = 0.15.]

Fig. 4. Bicocca multi-session experiment in an indoor environment. (Figure best viewed in colour)

more evidence as it becomes available and incorporate it into the overall map. The last (bottom most) plot in Fig. 5 is the final map estimate based on evidence and combined reasoning on all the sessions.

In the same way, we run our method on the 36 sessions of the Lincoln dataset. This dataset provides omnidirectional images and laser scans. We compute the odometry by simple scan matching of the laser returns, see Fig. 6(a), and the loop closure constraints taking the best match comparing the GIST descriptors of the omnidirectional images as described in [16], see Fig. 6(b). The final result after our proposal is shown in Fig. 7. The algorithm takes 40.5 s to complete, with no false positives.

V. DISCUSSION

The method proposed in this work is applicable to the metric pose graph formulation. Any SLAM system that constructs a metric map can be used in conjunction with our system for loop closure verification over time. The method is not restricted to any type of sensor. The pose graph, for example, can be constructed from wheel odometry, laser odometry, visual odometry or any other sensor. Similarly, the method is independent of the mechanism used for place recognition. This gives our method the flexibility to be used together with any SLAM front end that can generate a pose graph and the corresponding place recognition constraints.

For timing information, we compare our method again with BBS and Olson's method. BBS explores an interpretation tree which, in the case of our extension to clusters, is a binary tree. The worst case complexity of BBS is therefore O(2^n), where n is the number of clusters. In our experiment, after clustering and intra-cluster consistency (6 s), the algorithm took 12 s for a binary tree of depth 20. There are randomized versions to check joint consistency, such as the RJC algorithm proposed by Paz et al. [13] or the 1-point RANSAC algorithm proposed by Civera et al. [2]. They are based on the random sample consensus paradigm [5], and need a minimum number of data points to fit a model. In the case addressed here, with arbitrary trajectories and without prior knowledge of the number of loop closure areas, there is no clear way to determine the minimum number of data points needed to fit the model.

For Olson's method, the most expensive operation is the computation of the consistency matrix. For N hypotheses the complexity of calculating the consistency matrix is O(N^2). It should be noted that N, the number of hypotheses, is much larger than the number of clusters n. In our experiment, the running time for this method was 8 s.

For n clusters, our method needs to carry out n intra-cluster consistency tests. Moreover, in the worst case, if all the clusters are considered for joint consistency after the first optimization and are not jointly consistent, we need n joint consistency checks. This makes our method linear, O(n), in the number of clusters. The actual cost of carrying out each step, however, depends on the underlying graph and the optimization method

[Figure 6: two panels. (a) Laser odometry. (b) Loop closure constraints.]

Fig. 6. Lincoln dataset. (Figure best viewed in colour)

Fig. 7. Results of our RRR algorithm for long term operation using the 36 sessions of the Lincoln dataset. Left: Loop closure constraints accepted. Center and right: Optimized graph as output of our method. (Figure best viewed in colour)

Fig. 5. Results for long term operation using Bicocca Feb-25a, Feb-25b, Feb-26a, Feb-26b and Feb-27a. Inset: Input to current step. 3D plot: Optimized graph as output of our method. (Figure best viewed in colour)

used. For the experiment in Fig. 1, our method takes 8 s to calculate the solution, spending 6 of them in clustering and intra-cluster consistency checks. As we mentioned above, these two operations can be carried out in real time.

As we mentioned earlier, we currently consider that links less than t_g = 10 s apart belong to the same cluster, in accordance with the usual frequency of the place recognition algorithm, 1 fps. A smaller value for t_g results in more clusters, and a corresponding increase in computational cost, with precision-recall not being affected. A large value can result in low recall, because large clusters containing valid loop closing links could be rejected more frequently.

One of the main advantages of our method is that, while finding a solution similar to BBS, the algorithm works in O(n) as opposed to O(2^n). For the first two sessions of the long term experiment in Fig. 5, BBS takes 1537.6 s compared to our method, which runs in 34.8 s. For the complete experiment, BBS takes over two days to come up with a solution, while our method takes 314.5 s.

The long term experiments demonstrate the importance of incorporating new evidence session by session to realize possible previous failures. It is important to mention here that as the number of sessions continues increasing, further criteria need to be taken into account in order to join and/or forget past evidence. For instance, if we track the performance history of the clusters, we can decide to forget some of them if they are always rejected (as would be the case for perceptual aliasing). Also, we can fuse verified clusters from different times if they are metrically close after the optimization is carried out.

Here, we assumed that the errors in odometry links are small. We think that it is a plausible assumption given the higher frame rate of odometry w.r.t. the place recognition rate and the state of the art in odometry techniques. Nevertheless, if a failure occurs, the approaches of "weak links" [9] or "anchor nodes" [11] can be used to separate the experiment into different sessions.

VI. CONCLUSION

The task of long term operation requires that at any given time, the map be updated using all the available information.
Therefore, a method is required that can reason based on all the available evidence and, as a result, produce the best map estimate in light of that evidence. We therefore use the term "loop closing over time" for methods that are able to reason about the consistency of loop-closing hypotheses not only in space but also along the time dimension. In this work, we have presented a novel consistency-based approach to this problem. We support with evidence our claim that the estimation process itself can be used to establish the validity of loop-closing hypotheses. We back up our claims with experiments showing the long-term operation abilities of our system, as well as its robustness when compared to state-of-the-art data association methods. As future work, we will explore extending our algorithm to consider the possibility of incorrect odometry links.

ACKNOWLEDGEMENTS

This research has been funded by the Dirección General de Investigación of Spain under projects DPI2009-13710 and DPI2009-07130, and by DGA-FSE (group T04).

REFERENCES

[1] C. Cadena, D. Gálvez-López, J. D. Tardós, and J. Neira. Robust place recognition with stereo sequences. IEEE Transactions on Robotics, 28(4), 2012. doi: 10.1109/TRO.2012.2189497. To appear.
[2] J. Civera, O. G. Grasa, A. J. Davison, and J. M. M. Montiel. 1-Point RANSAC for extended Kalman filtering: Application to real-time structure from motion and visual odometry. Journal of Field Robotics, 27:609–631, September 2010. doi: 10.1002/rob.v27:5.
[3] M. Cummins and P. Newman. Appearance-only SLAM at large scale with FAB-MAP 2.0. The International Journal of Robotics Research, 2010. doi: 10.1177/0278364910385483.
[4] F. Dayoub, G. Cielniak, and T. Duckett. Long-term experiments with an adaptive spherical view representation for navigation in changing environments. Robotics and Autonomous Systems, 59(5):285–295, 2011.
[5] M. A. Fischler and R. C. Bolles. Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography. Communications of the ACM, 24(6):381–395, 1981.
[6] G. Grisetti, R. Kümmerle, C. Stachniss, U. Frese, and C. Hertzberg. Hierarchical optimization on manifolds for online 2D and 3D mapping. In Proc. of the IEEE Int. Conf. on Robotics and Automation (ICRA), pages 273–278, May 2010. doi: 10.1109/ROBOT.2010.5509407.
[7] M. Kaess, H. Johannsson, R. Roberts, V. Ila, J. Leonard, and F. Dellaert. iSAM2: Incremental smoothing and mapping with fluid relinearization and incremental variable reordering. In Proc. of the IEEE Int. Conf. on Robotics and Automation (ICRA), Shanghai, China, May 2011.
[8] K. Konolige and J. Bowman. Towards lifelong visual maps. In Proc. of the IEEE/RSJ Int. Conf. on Intelligent Robots and Systems (IROS), pages 1156–1163, 2009.
[9] K. Konolige, J. Bowman, J. D. Chen, P. Mihelich, M. Calonder, V. Lepetit, and P. Fua. View-based maps. The International Journal of Robotics Research, 29(8):941–957, 2010. doi: 10.1177/0278364910370376.
[10] R. Kümmerle, G. Grisetti, H. Strasdat, K. Konolige, and W. Burgard. g2o: A general framework for graph optimization. In Proc. of the IEEE Int. Conf. on Robotics and Automation (ICRA), Shanghai, China, May 2011.
[11] J. McDonald, M. Kaess, C. Cadena, J. Neira, and J. J. Leonard. 6-DOF multi-session visual SLAM using anchor nodes. In Proc. of the European Conference on Mobile Robotics (ECMR), 2011.
[12] E. Olson. Recognizing places using spectrally clustered local matches. Robotics and Autonomous Systems, 57(12):1157–1172, December 2009.
[13] L. M. Paz, J. Guivant, J. D. Tardós, and J. Neira. Data association in O(n) for divide and conquer SLAM. In Proc. Robotics: Science and Systems, Atlanta, GA, USA, June 2007.
[14] A. Ranganathan and F. Dellaert. Online probabilistic topological mapping. The International Journal of Robotics Research, 30(6):755–771, May 2011. doi: 10.1177/0278364910393287.
[15] RAWSEEDS. Robotics Advancement through Web-publishing of Sensorial and Elaborated Extensive Data Sets (project FP6-IST-045144), 2009. http://www.rawseeds.org/rs/datasets.
[16] A. Rituerto, A. C. Murillo, and J. J. Guerrero. Semantic labeling for indoor topological mapping using a wearable catadioptric system. Robotics and Autonomous Systems, special issue on Semantic Perception, Mapping and Exploration, 2012. To appear.
[17] G. Sibley, C. Mei, I. Reid, and P. Newman. Vast-scale outdoor navigation using adaptive relative bundle adjustment. The International Journal of Robotics Research, 29(8):958–980, 2010.
[18] J. Sivic and A. Zisserman. Video Google: A text retrieval approach to object matching in videos. In Proceedings of the International Conference on Computer Vision, volume 2, pages 1470–1477, October 2003.
[19] M. Smith, I. Baldwin, W. Churchill, R. Paul, and P. Newman. The New College vision and laser data set. The International Journal of Robotics Research, 28(5):595–599, May 2009. doi: 10.1177/0278364909103911. URL http://www.robots.ox.ac.uk/NewCollegeData/.
[20] N. Sünderhauf and P. Protzel. BRIEF-Gist — Closing the loop by simple means. In Proc. of the IEEE/RSJ Int. Conf. on Intelligent Robots and Systems (IROS), pages 1234–1241, September 2011.
Optimization-Based Estimator Design for
Vision-Aided Inertial Navigation
Mingyang Li and Anastasios I. Mourikis
Dept. of Electrical Engineering, University of California, Riverside
E-mail: [email protected], [email protected]
Abstract—This paper focuses on the problem of real-time pose tracking using visual and inertial sensors in systems with limited processing power. Our main contribution is a novel approach to the design of estimators for these systems, which optimally utilizes the available resources. Specifically, we design a hybrid estimator that integrates two algorithms with complementary computational characteristics, namely a sliding-window EKF and EKF-SLAM. To decide which algorithm is best suited to process each of the available features at runtime, we learn the distribution of the feature number and of the lengths of the feature tracks. We show that, using this information, we can predict the expected computational cost of each feature-allocation policy, and formulate an objective function whose minimization determines the optimal way to process the feature data. Our results demonstrate that the hybrid algorithm outperforms each individual method (EKF-SLAM and sliding-window EKF) by a wide margin, and allows processing the sensor data at real-time speed on the processor of a mobile phone.

I. INTRODUCTION

In this paper we address the problem of tracking the 3D pose of small, resource-constrained systems in unknown environments. Specifically, we are interested in estimating the motion of miniature devices, similar in size to a mobile phone. In contrast to medium- and large-scale systems (e.g., mobile robots, UAVs, autonomous cars), small devices have limited computational capabilities and battery life, factors which make the pose-estimation problem challenging. In the absence of GPS, the types of sensors that can be used for pose estimation in small-scale systems are quite restricted. In our work, we opt for using visual and inertial measurements. This is motivated by the fact that cameras and inertial measurement units (IMUs) are small, lightweight, and inexpensive sensors, which can operate in almost any environment. Our goal is to estimate the moving platform's trajectory only, using the inertial measurements and the observations of naturally occurring point features. We do not assume that a map of the area is available, nor do we aim to build such a map. Thus, the problem we address is analogous to visual odometry, with the added characteristic that an IMU is available. We term the approach visual-inertial odometry.

The key challenge that needs to be addressed when designing an estimator for this task is the limited availability of processing power. Feature detection algorithms can track hundreds of features in images of natural scenes, but processing all this data in real time is challenging, particularly for a small, resource-constrained device. To date, the existence of resource limitations has typically been addressed by proposing a motion estimation algorithm (e.g., EKF-SLAM, sliding-window iterative estimation), and testing whether it can operate within the given resource constraints. If it can, success is declared. However, this design paradigm has the shortcoming that it cannot ensure optimal use of the available resources: even though one algorithm may operate using a certain amount of CPU time, it is possible that a different algorithm can either (i) produce a more accurate estimate in the same amount of time, or (ii) compute an estimate of the same precision faster. Clearly, this situation is undesirable.

An additional difficulty that arises when designing an estimator for any localization task is that, typically, the computational efficiency of different methods depends strongly on the nature of each particular dataset. For instance, one algorithm may outperform all others when the environment is feature-rich and the vehicle is moving slowly, while a different algorithm may be the fastest in feature-sparse environments under fast motion. This makes algorithm selection a difficult task, for which no general, systematic methods exist to date.

To address the limitations described above, in this work we propose a new paradigm for the design of motion estimation algorithms. Specifically, we propose designing a hybrid estimator that incorporates two (or more, in general) algorithms with complementary computational characteristics. This makes it possible to decide, in real time, to process different measurements (e.g., different features) with a different algorithm. Since the optimal choice for each measurement will depend on the characteristics of the sensor data, in the proposed framework we employ statistical learning to learn these characteristics. Gathering statistical information allows us to compute the expected cost of any strategy for allocating measurements to algorithms. To identify the optimal strategy we solve, in real time, an optimization problem whose objective function is the expected computation time.

In this paper, we apply the approach described in the preceding paragraph to the problem of visual-inertial odometry. Specifically, we design a hybrid extended Kalman filter (EKF), which integrates EKF-SLAM with a sliding-window EKF estimator. As explained in Section III, these two estimators process the same measurement information in different ways, and have complementary computational characteristics. We show that the optimal choice of algorithm to process each individual feature depends on the distribution of the feature-track lengths of all features. This distribution is not known in
advance (it depends on the environment, the camera motion, as well as the feature tracker used), and therefore we learn it from the image sequence. Using this information, the optimal strategy for processing the feature measurements can be computed by solving a one-variable optimization problem, as shown in Section V. Our results demonstrate that the hybrid algorithm outperforms each individual method (EKF-SLAM and sliding-window EKF) by a wide margin. In fact, the hybrid filter allows processing the sensor data at real-time speed on a mobile phone, something that is not possible with either of the individual algorithms.

II. RELATED WORK

A huge number of methods have been proposed for pose estimation, and thus any attempt at a general literature survey in the limited space available would be incomplete. We therefore focus here on methods that seek to optimize, in some sense, the computational efficiency of localization. To the best of our knowledge, no prior work exists that employs learning of the feature tracks' characteristics to explicitly optimize the computational cost of estimation. Past work has focused primarily on SLAM, and can broadly be categorized as:

a) Exact reformulations of the SLAM equations: Typical methods decompose the computations into smaller parts, and selectively carry out only the necessary computations at each time instant (see e.g., [1]–[4]). Typically, in these methods the currently visible features are updated normally at every time step, while the remaining ones are updated only "on demand" or when re-visited. In the case of visual-inertial odometry, however, where the state vector contains only the actively tracked features and no loop closure is considered, these methods are not applicable.

b) Approximations of the SLAM equations: On the other hand, several methods exist that employ approximations to the SLAM equations (e.g., [5]–[7] and references therein) to reduce the required computations. In contrast to these methods, which trade off information for efficiency, our proposed approach involves no information loss, and no approximations other than the inaccuracies due to the EKF's linearization.

c) Feature selection methods: A different family of approaches seeks to reduce the computational cost of localization by processing only the most valuable measurements, in terms of uncertainty reduction (e.g., [8], [9]). Processing only a small subset of carefully selected measurements can often result in high accuracy. However, since the remaining measurements are simply discarded, loss of information inevitably occurs. In our proposed approach, all the available measurements are processed, and no localization information is lost.

III. ESTIMATORS FOR VISUAL-INERTIAL ODOMETRY

We now examine the possible choices of algorithms for processing the feature observations in visual-inertial odometry. Since real-time performance is necessary, any candidate algorithm must have bounded computational complexity, irrespective of the duration of the trajectory. Within this class, practically all algorithms proposed to date employ either an EKF (e.g., [10]–[12]), or iterative minimization over a window of states (e.g., [13], [14]). The latter approaches iteratively re-linearize the measurement equations to better deal with their nonlinearity, which, however, incurs a high computational cost. In our recent work [15] we showed that, by choosing linearization points that preserve the system's observability properties, EKF-based visual-inertial odometry can attain better accuracy than iterative-minimization methods, at only a fraction of the computational cost. Therefore, in this paper we focus only on EKF-based algorithms.

Within the class of EKF methods, there are two possible formulations of the estimator. Specifically, we can employ the EKF-SLAM approach (e.g., [16]–[18] and references therein), in which the state vector contains the current IMU state and feature positions. To keep the computations bounded, features that leave the field of view must be removed from the state vector, leaving only the currently observed ones [18]. Other EKF algorithms instead maintain a sliding window of camera poses in the state vector, and use the feature observations to apply probabilistic constraints between these poses (e.g., [10], [19], [20]). Out of this class of methods, the multi-state-constraint Kalman filter (MSCKF) algorithm [10] uses all the information provided by the feature measurements, which makes it our preferred method.

Both the MSCKF and EKF-SLAM use exactly the same information, and differ only in the way they organize the computations, and in linearization (more on this in Section VI). If the measurement models were linear, both methods would yield exactly the same result, equal to the MAP estimate of the IMU pose [21]. With respect to computational cost, however, the two methods differ dramatically. For the EKF-SLAM algorithm, the cost at each time step is cubic in the number of features (since all features in the state vector are observed). In contrast, the MSCKF has computational cost that scales linearly in the number of features, but cubically in the length of the feature tracks (see Section IV-B). Therefore, if many features are tracked, each in a small number of frames, the MSCKF approach is preferable, but if few features are tracked over long image sequences, EKF-SLAM results in lower computational cost.1 Clearly, EKF-SLAM and the MSCKF algorithm are complementary, with each being superior in different circumstances. This motivates us to integrate both algorithms in a single, hybrid filter, as described next.

IV. THE HYBRID MSCKF/SLAM ALGORITHM

In this section we present the hybrid MSCKF/SLAM estimator for visual-inertial odometry. We begin our presentation by briefly outlining the MSCKF algorithm, which was originally proposed in [10]. In our work, we employ the modified MSCKF algorithm, which attains increased accuracy and consistency by choosing linearization points that ensure the correct observability properties [15].

1 Similar conclusions can be reached from the analysis of [22], which compares EKF-SLAM to iterative optimization over a window of poses.
A. The MSCKF algorithm for visual-inertial odometry

The MSCKF does not include the feature positions in the state vector, but instead uses the feature measurements to directly impose constraints between the camera poses. The state vector of the MSCKF is:

    x_k = \begin{bmatrix} x_{I_k}^T & x_{C_1}^T & x_{C_2}^T & \cdots & x_{C_m}^T \end{bmatrix}^T    (1)

where x_{I_k} is the current IMU state, and x_{C_i}, i = 1...m, are the camera poses (positions and orientations) at the times the last m images were recorded. The IMU state is defined as:

    x_I = \begin{bmatrix} \bar{q}^T & p^T & v^T & b_g^T & b_a^T \end{bmatrix}^T    (2)

where \bar{q} is the unit quaternion describing the rotation from the global frame to the IMU frame, p and v denote the IMU position and velocity, respectively, while b_g and b_a are the gyroscope and accelerometer biases.

The MSCKF uses the IMU measurements to propagate the current IMU state and the covariance matrix, P_{k+1|k}, while the feature measurements are used for EKF updates. Let us assume that the i-th feature, which has been observed in \ell images, has just been lost from tracking (e.g., it went out of the field of view). At this time, we use all the measurements of the feature to carry out an EKF update. Let the observations of the feature be described by the nonlinear equations z_{ij} = h(x_{C_j}, f_i) + n_{ij}, for j = 1...\ell, where f_i is the feature position (described by the inverse-depth parameterization [23]) and n_{ij} is the noise vector, modelled as zero-mean Gaussian with covariance matrix \sigma^2 I_2. Using all the measurements we compute a feature position estimate \hat{f}_i, and then compute the residuals \tilde{z}_{ij} = z_{ij} - h(\hat{x}_{C_j}, \hat{f}_i), j = 1...\ell. By linearizing, these residuals can be written as:

    \tilde{z}_{ij} \simeq H_{ij} \tilde{x}_{C_j} + H_{f_{ij}} \tilde{f}_i + n_{ij}, \quad j = 1 \ldots \ell    (3)

where \tilde{x}_{C_j} and \tilde{f}_i are the estimation errors of the j-th camera pose and i-th feature, respectively, and the matrices H_{ij} and H_{f_{ij}} are the corresponding Jacobians. Since the feature is not included in the MSCKF state vector, we proceed to marginalize it out. For this purpose, we first form the vector containing the \ell residuals from all the feature's measurements:

    \tilde{z}_i \simeq H_i \tilde{x} + H_{f_i} \tilde{f}_i + n_i    (4)

where \tilde{z}_i and n_i are block vectors with elements \tilde{z}_{ij} and n_{ij}, respectively, and H_i and H_{f_i} are matrices with block rows H_{ij} and H_{f_{ij}}, for j = 1...\ell. Subsequently, we define the residual vector \tilde{z}_i^o = V^T \tilde{z}_i, where V is a matrix whose columns form a basis of the left nullspace of H_{f_i}. From (4), we obtain:

    \tilde{z}_i^o = V^T \tilde{z}_i \simeq V^T H_i \tilde{x} + V^T n_i = H_i^o \tilde{x} + n_i^o    (5)

Once \tilde{z}_i^o and H_i^o are available, we proceed to carry out a Mahalanobis gating test for the residual \tilde{z}_i^o, by computing:

    \gamma = (\tilde{z}_i^o)^T \left( H_i^o P_{k+1|k} (H_i^o)^T + \sigma^2 I \right)^{-1} \tilde{z}_i^o    (6)

and comparing it against a threshold given by the 95th percentile of the \chi^2 distribution. By stacking together the residuals of all features that pass this gating test, we obtain:

    \tilde{z}^o = H^o \tilde{x} + n^o    (7)

where \tilde{z}^o, H^o, and n^o are block vectors/matrices, with block rows \tilde{z}_i^o, H_i^o, and n_i^o, for i = 1...n, respectively. We can now use the above residual, which involves only the camera poses, for an EKF update. However, if a large number of features are processed at each time instant (the common situation), further computational savings can be achieved. Specifically, in [10] it is shown that, instead of using the above residual, we can equivalently compute the thin QR factorization of H^o, written as H^o = Q H^r, and then employ the residual \tilde{z}^r for updates, defined as:

    \tilde{z}^r = Q^T \tilde{z}^o = H^r \tilde{x} + n^r    (8)

where n^r is the r×1 noise vector, with covariance matrix \sigma^2 I_r. Once the residual \tilde{z}^r and the matrix H^r have been computed, we compute the state correction and update the covariance matrix via the standard EKF equations:

    \Delta x = K \tilde{z}^r    (9)
    P_{k+1|k+1} = P_{k+1|k} - K S K^T    (10)
    S = H^r P_{k+1|k} (H^r)^T + \sigma^2 I_r    (11)
    K = P_{k+1|k} (H^r)^T S^{-1}    (12)

B. Computational complexity of the MSCKF

The way in which the feature measurements are processed in the MSCKF is optimal, in the sense that no approximations are used, except for the EKF's linearization [21]. This is true, however, only if the sliding window of states, m, contains at least as many states as the longest feature track. If it does not, then the measurements that occurred more than m timesteps in the past cannot be processed. Therefore, to use all the available feature information, the MSCKF must maintain a window of states long enough to include the longest feature tracks.

We now examine the dependence of the computational requirements of the MSCKF on the number of features and the lengths of the feature tracks. The computation time of the MSCKF is dominated by the following operations:
1) The computation of the residual and Jacobian matrix in (5) for each feature. If n features are processed, with feature track lengths \ell_i, i = 1...n, this requires O(\sum_{i=1}^{n} \ell_i^2) operations.
2) The Mahalanobis test, requiring O(\sum_{i=1}^{n} \ell_i^3) operations.
3) The computation of the residual and the Jacobian matrix in (8), which, by exploiting the structure in H^o, can be implemented in approximately O(\sum_{i=1}^{n} \ell_i^3) operations.
4) The computation of the Kalman gain and the update of the covariance matrix, which require O(r(15+6m)^2 + r^2(15+6m) + r^3) operations. Here 15+6m is the size of the state covariance matrix, and r is the number of rows of H^r. It can be shown that, in general, r (which equals the number of independent constraints for the camera poses) is given by [23]:

    r = 2(\ell_{(1)} + \ell_{(2)} + \ell_{(3)}) - 7    (13)
[Figure: histogram of probability vs. feature track length (5 to 40 images) for the "Canyon area" and "Town streets" sequences.]
Fig. 1. Distribution of the feature track lengths in two parts of the Cheddar Gorge dataset [24]. The features detected were Harris corners, matched by normalized cross-correlation.

where \ell_{(1)}, \ell_{(2)}, and \ell_{(3)} are the three longest feature tracks.

From the above we see that, while the computational cost of the MSCKF is linear in the number of features, it is at least quadratic in the size of the sliding window, m. In fact, if the size of the sliding window is chosen to be equal to the longest feature tracks, then r is also O(m), and the overall complexity becomes cubic in m. This demonstrates a shortcoming of the MSCKF: to preserve all measurement information, the complexity of the algorithm scales cubically as a function of the longest feature track length.

In real-world datasets the distribution of the feature-track lengths is non-uniform, with many features tracked for short durations, and very few, stabler features tracked over longer periods. For instance, Fig. 1 shows the distribution of the feature track lengths in two parts of a real-world dataset [24]. It can be seen that, even though the longest feature tracks reach 40 images, the vast majority of features is tracked for a small number of frames (the percentage of features tracked in five or fewer images is 88% and 69% in the two cases shown, respectively). To be able to process the small percentage of features with long track lengths, the MSCKF must maintain a long sliding window, which is computationally inefficient. In what follows, we show how we can integrate the MSCKF algorithm with EKF-SLAM, to address this limitation.

C. The hybrid MSCKF/SLAM algorithm

The alternative way of processing feature measurements in the EKF is to include them in the state vector, and use their observations as in the standard visual-SLAM formulation. As discussed in Section III, this approach has computational properties complementary to those of the MSCKF: while the MSCKF is better at processing many features tracked for a few frames, EKF-SLAM is faster when few features are tracked for many frames. This motivates us to combine both approaches in a single, "hybrid" estimator. Specifically, we formulate a filter whose state vector at time-step k contains the current IMU state, a sliding window of m camera poses, and s_k features:

    x_k = \begin{bmatrix} x_{I_k}^T & x_{C_1}^T & \cdots & x_{C_m}^T & f_1^T & \cdots & f_{s_k}^T \end{bmatrix}^T    (14)

This provides us with the ability to choose whether a feature will be processed using the MSCKF approach, or whether it will be included in the state vector and processed as in EKF-SLAM. By analyzing in detail the computational requirements of each EKF update, it can be shown that, when many features are present, there is nothing to be gained by initializing in the state vector any feature observed fewer than m times [23]. Thus, the optimal (in terms of computational requirements) strategy for using the features turns out to be a simple one: if feature i's track is lost after fewer than m frames (i.e., \ell_i < m), then the feature is processed using the MSCKF equations, as described in Section IV-A. On the other hand, if a feature is still actively being tracked after m images, it is initialized into the state vector, and used for SLAM. Thus, the only choice that one needs to make is the size of the sliding window, m. In Section V we show how this can be chosen optimally.

At a given time step, the EKF update is carried out using a residual vector constructed by stacking together the residual computed from the "MSCKF features" and the residuals from the s_k observations of the "SLAM features":

    \tilde{z}_k = \begin{bmatrix} \tilde{z}^r \\ \tilde{z}_{1m} \\ \vdots \\ \tilde{z}_{s_k m} \end{bmatrix} \simeq \begin{bmatrix} H^r \\ H_{1m} \\ \vdots \\ H_{s_k m} \end{bmatrix} \tilde{x}_k + n_k = H_k \tilde{x}_k + n_k    (15)

In the above equation, \tilde{z}_{jm}, for j = 1...s_k, are the residuals of the observations of the s_k SLAM features from the latest camera state (state m), and H_{jm}, for j = 1...s_k, are the associated Jacobian matrices (see (3)). Each of these residuals is a 2×1 vector, while each H_{jm} is a 2×(15+6m+3s_k) matrix (here 15+6m+3s_k is the size of the state covariance matrix). The residual \tilde{z}_k and the Jacobian H_k are used for updating the state and covariance matrix, similarly to (9)-(12).

For initializing new features, the m measurements of a feature are used to triangulate it and to compute its initial covariance and the cross-correlation with other filter states. In our implementation, we use the inverse-depth feature parameterization, due to its superior linearity properties. The latest camera clone, x_{C_m}, is used as the "anchor" state, based on which the inverse-depth parameters are computed. If the feature is still actively being tracked at the time its anchor state needs to be removed from the sliding window, the anchor is changed to the most recent camera state, and the covariance matrix of the filter is appropriately modified. The details of feature initialization, as well as the "anchor change" process, are shown in [23]. Finally, the hybrid MSCKF/SLAM algorithm is described in Algorithm 1.

V. OPTIMIZING THE PERFORMANCE OF THE HYBRID EKF

In this section we show how the size of the sliding window, m, can be selected so as to minimize the computational cost of the hybrid MSCKF/SLAM filter. As the results in Section VI show, the choice of m has a profound effect on the time requirements of the algorithm. With a suitable choice, the hybrid method can significantly outperform each of its individual components.
Algorithm 1 Hybrid MSCKF/SLAM algorithm
Propagation: Propagate the state vector and covariance matrix using the IMU readings.
Update: Once camera measurements become available:
• Augment the state vector with the latest camera pose.
• For features to be processed in the MSCKF (feature tracks of length smaller than m), do the following:
  – For each feature to be processed, calculate the residual and Jacobian matrix in (5).
  – Perform the Mahalanobis gating test in (6).
  – Using all features that passed the gating test, form the residual vector and the Jacobian matrix in (8).
• For features that are included in the state vector, compute the residuals and measurement Jacobian matrices, and form the residual \tilde{z}_k and matrix H_k in (15).
• Update the state vector and covariance matrix, via (9)-(12), using the residual \tilde{z}_k and Jacobian matrix H_k.
• Initialize features tracked in all m images of the sliding window.
State Management:
• Remove SLAM features that are no longer tracked, and change the anchor pose for SLAM features anchored at the oldest pose.
• Remove the oldest camera pose from the state vector. If no feature is currently tracked for more than m_o poses (with m_o < m − 1), remove the oldest m − m_o poses.

A. Operation count for each update

By carefully analyzing the computations needed, we calculate the number of floating-point operations per update of the hybrid algorithm:

    f_k = \alpha_1 \sum_{i=1}^{n} \ell_i^3 + \alpha_2 r m^2 + \alpha_3 r^2 m + \alpha_4 r m s_k + \alpha_5 (r + 2s_k)^3
          + \alpha_6 (r + 2s_k)^2 (6m + 15 + 3s_k) + \alpha_7 (r + 2s_k)(6m + 15 + 3s_k)^2 + \text{l.o.t.}    (16)

where the \alpha_i's are known constants, n is the number of features used in the MSCKF, r is defined in (13), and l.o.t. stands for lower-order terms.2 The terms shown above correspond to the computations for the MSCKF features (1st term), for the matrix S of the hybrid filter (2nd-4th terms), and for the EKF update (5th-7th terms). Note that (16) also models the probability of failure for the Mahalanobis test.

It is interesting to examine the properties of (16). First, we note that since r represents the number of constraints provided by the MSCKF features for the poses in the sliding window, it is bounded above by 6m − 7: the total number of unknowns in the sliding window is 6m, and the feature measurements cannot provide any information about the global pose or scale, which correspond to 7 degrees of freedom. If many features are available, we will have r ≈ 6m − 7, and thus:

    f_k \approx \alpha_1 \sum_{i=1}^{n} \ell_i^3 + \alpha_2 m^3 + \alpha_3 m^2 s_k + \alpha_4 m s_k^2 + \alpha_5 s_k^3

This approximate expression makes it possible to gain intuition about the behavior of the computational cost of the hybrid method: as the size of the sliding window, m, increases, more features will be processed by the MSCKF, and fewer features will be included in the state vector of the filter. Thus, as m increases, the term \sum_{i=1}^{n} \ell_i^3 will also increase, but s_k will decrease rapidly. These two opposing trends result in the performance curves shown in Section VI (e.g., Fig. 3).

B. Minimizing the average operation count

We now turn our attention to determining the optimal value of m, in order to minimize the runtime of the hybrid EKF estimator. Equation (16) provides us with the operation count of one filter update, so at first glance it may appear that one needs to minimize this equation with respect to m at each time instant. However, that would be an ill-posed problem. To see why, consider the case where, at time step k, the sliding window has length m = 20, and ten features exist that have been continuously tracked in 20 images. At this time, we have the option of either increasing the size of the sliding window, or including the ten features in the state vector. Which is best depends on the future behavior of the feature tracks: if these features end up being tracked for a very large number of frames (≫ 20), then it would be preferable to include the features in the state vector. If, on the other hand, the features end up being tracked for only 21 frames, it would be preferable to increase m by one.

Clearly, it is impossible to obtain future information about any particular feature track. We can, however, collect statistical information about the properties of the feature tracks, and use this information to minimize the expected cost of the EKF updates. This is precisely the approach we implement. Specifically, during the filter's operation, we collect statistics to learn the probability mass function (pmf) of the feature track lengths, p(\ell_i), the probability of failure of the Mahalanobis gating test, as well as the pmf of the number of features tracked in the images. Using the learned pmfs, we compute the average number of operations needed for each EKF update, \bar{f}(m), by direct application of the definition of the expected value of a function of random variables.

The value of m that yields the minimum \bar{f}(m) can be found by exhaustive search among all possible values. However, we have found that the cost curve in practical cases is quasiconvex, which means that it has a unique minimum (see Fig. 2). Therefore, to reduce the time needed for the optimization, we perform the optimization by local search starting from a known good initial guess (e.g., the last computed value of m). Since the statistical properties of the feature tracks can change over time (see Fig. 1), we perform the learning of the pmfs as well

2 The full expression for the operation count consists of tens of individual terms, whose inclusion would merely complicate the presentation, and not add much insight. Note, however, that in the optimization described in Section V-B, the complete expression is used.

245
as the selection of the optimal m in consecutive time windows spanning a few seconds (15 sec in our implementation).

It is worth noting that in modern computers the use of flop counts to model computational cost is not always suitable, as performance is affected by several factors including vectorization, cache access patterns, data locality, etc. However, we have experimentally verified that in the algorithms considered here, and for our implementation, the computed flop counts closely follow the observed runtimes. Specifically, Fig. 2 shows the actual runtime of the hybrid filter, as well as the value f̄(m) calculated analytically, for a specific trajectory. We can observe that the two curves are very similar, with the only significant differences observed at the two "extremes" of very small or very large m. These regions are less important, however, as they fall outside the typical operational region of the hybrid filter. Thus, using f̄(m) as the objective function to minimize is appropriate.

Fig. 2. Comparison of actual runtime vs. expected flop count. Since the curves have different units, each has been normalized with respect to its minimum value.

VI. EXPERIMENTAL RESULTS

A. Simulated data

We generate simulation data that closely match real-world datasets, to have as realistic a test environment as possible. Our first simulation environment is based on the dataset of [24], which consists of a 29.6-km long trajectory through canyons, forested areas, and a town. We generate a ground truth trajectory (position, velocity, orientation) that matches the vehicle's actual trajectory, as computed by a high-precision INS system. Using this trajectory, we subsequently generate IMU measurements corrupted with noise and bias characteristics identical to those of the Xsens MTi-G sensor used in the dataset. Moreover, we generate monocular feature tracks with statistical characteristics similar to those of the features extracted in the actual dataset. The number of features observed in each image and the feature track distribution change in each part of the trajectory, as in the actual dataset (see Fig. 1). Overall, there are 232 features observed in each image on average, and the average feature track length is 5.6 frames. The IMU measurements are available at 100 Hz, while the camera frame rate is 20 Hz, as in the actual dataset. All the results reported are averages over 50 Monte-Carlo trials.

Fig. 3 plots the results from the application of the hybrid filter in this setting. Specifically, the top plot shows the average runtime for each update of the hybrid algorithm. The solid blue line is the average time when m is chosen in advance, and kept constant for the entire trajectory. The red dashed line denotes the runtime achieved when applying the optimization process described in Section V, which optimally chooses m in each time window, to adapt to local feature properties. This plot shows that, by optimizing the value of m in real time, we can attain performance higher than that of any fixed value. Note that when m is large, no features are initialized, and thus the right-most part of the plot gives the performance of the plain MSCKF (similarly, for small m we obtain pure EKF-SLAM). Therefore, from this plot we can see that the optimal hybrid filter has a runtime 37.17% smaller than that of the MSCKF, and 72.8% smaller than EKF-SLAM.

In the second subplot of Fig. 3 we plot the RMS position error, averaged over all Monte-Carlo trials and over the duration of the trajectory. We can observe that the plain MSCKF results in the highest accuracy. We attribute this to two causes. First, in the MSCKF features are explicitly marginalized, and thus no Gaussianity assumptions are needed for the pdf of the feature position errors (as is the case in SLAM). Second, all the measurements of each feature are used jointly in the MSCKF, which means that outliers can be more easily detected, and better linearization points can be computed. By combining the MSCKF with EKF-SLAM some accuracy may be lost, as the errors for the features included in the state vector are now assumed to be Gaussian. However, we can see that if the size of the sliding window increases above a moderate value (e.g., 9 in this case), the change in the accuracy is almost negligible. Intuitively, when a sufficient number of observations is used to initialize features, the feature errors' pdf becomes "Gaussian enough" and the accuracy of the hybrid filter is very close to that of the MSCKF. Based on these results, in our optimization we do not allow the value of m to fall below a certain threshold (set to 7 in our implementation).

The timing results presented in Fig. 3 are obtained on a laptop computer with a Core i7 processor at 2.13GHz, and a single-threaded C++ implementation. Clearly, if the data were to be processed on this computer, the timing performance would easily allow for real-time implementation (the hybrid filter requires fewer than 4 msec per update with optimal m). However, our primary interest is in implementing pose estimation on small portable devices. For this reason, we conducted a second set of tests, in which the data were processed on a Samsung Galaxy S2 mobile phone, equipped with a 1.2-GHz Exynos 4210 processor. For these tests, we ported our C++ implementation to Android using the Android NDK. The simulation data were produced in a similar way as described above, by emulating a real-world dataset collected while driving in a residential area of Riverside, CA. The vehicle trajectory and statistics of the dataset (e.g., feature distribution, feature track length, and so on) are different, allowing us to test the proposed algorithm in different situations. Fig. 4 shows the results of the hybrid filter in this setting.
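The simulated-data generation used in these experiments (a ground-truth trajectory turned into corrupted inertial readings) can be sketched as follows. The noise model here, white measurement noise plus a random-walk bias, and all magnitudes are simplified, hypothetical stand-ins rather than the Xsens MTi-G characteristics emulated in the paper:

```python
import random

def corrupt_imu(true_signal, noise_std, bias_rw_std, dt, seed=0):
    """Turn ground-truth gyro/accel samples into simulated sensor readings:
    reading = truth + bias + white noise, where the bias performs a random
    walk with per-step increments of std bias_rw_std * sqrt(dt)."""
    rng = random.Random(seed)
    bias = 0.0
    readings = []
    for w in true_signal:
        bias += rng.gauss(0.0, bias_rw_std) * dt ** 0.5  # bias random walk
        readings.append(w + bias + rng.gauss(0.0, noise_std))
    return readings
```

Setting both noise magnitudes to zero returns the ground truth unchanged, which is a convenient sanity check for such a generator.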
These results are very similar to what was observed in the first simulation, with the optimal hybrid filter outperforming each of the individual algorithms by a wide margin (runtime 45.8% smaller than the MSCKF, and 55.6% smaller than SLAM). More importantly, however, we observe that the hybrid filter is capable of processing the data at real-time speeds, even on the much less capable processor of the mobile phone. Specifically, the average time needed for each update of the hybrid filter with optimally chosen thresholds is 33.78 msec, corresponding to a rate of 30 images per second. Since the images are recorded at 20 Hz, this means that the proposed hybrid estimator is capable of real-time processing, something that is not possible with any of the individual methods.

Fig. 3. Monte-Carlo simulation results: Timing performance and rms position errors of the hybrid filter, for changing values of m. Timing measured on a laptop computer.

Fig. 4. Monte-Carlo simulation results: Timing performance and rms position errors of the hybrid filter, for changing values of m. Timing measured on a Samsung Galaxy S2 mobile phone. Note that the dataset used here is different from the one used in Fig. 3, hence the different accuracy.

B. Real-world data

In addition to the synthetic datasets described above, we employed our proposed algorithm for processing the feature measurements recorded during a real-world experiment. During this experiment an IMU/camera platform was mounted on top of a car and driven on city streets. The sensors consisted of an Inertial Science ISIS IMU and a PointGrey Bumblebee2 stereo pair (only a single camera's images are used). The IMU provides measurements at 100 Hz, while the camera images were stored at 10 Hz. Harris feature points are extracted, and matching is carried out by normalized cross-correlation. The vehicle trajectory is approximately 5.5 km long, and a total of 7922 images are processed.

In this dataset, to compensate for the low frame rate of the images, we use lower feature-detection thresholds, which leads to a larger number of features. Specifically, 540 features are tracked in each image on average, and the average track length is 5.06 frames. This substantially increases the overall computational requirements of all algorithms. When run on the mobile phone's processor, the average time per update is 139 msec for the MSCKF, 774 msec for EKF-SLAM, and 77 msec for the hybrid filter with optimal m. In Fig. 5 the trajectory estimates for each of the three methods are plotted on a map of the area where the experiment took place. We observe that the accuracy of the MSCKF and the hybrid filter are similar, and substantially better than that of EKF-SLAM.

Fig. 5. The trajectory estimates of the MSCKF, EKF-SLAM, and the hybrid algorithm in the real-world experiment.

In Fig. 6 we plot the computed values for the optimal m during the experiment. This figure shows that, due to the changing properties of the feature tracks' distribution, the optimal value varies considerably over time, justifying the need for periodic re-optimization. As a final remark, we note that the optimization process itself is computationally inexpensive. In our implementation, the optimal threshold is re-computed every 15 sec, and this process takes 0.31 msec, on average, on the mobile phone. Therefore, the optimization takes up fewer than 0.003% of the overall processing time, while resulting in substantial performance gains.
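The periodic re-optimization just described (learn the track-length pmf over a statistics window, then re-select m by minimizing the expected per-update cost) can be sketched as follows. The cost model below is a deliberately simplified stand-in for Eq. (16): tracks of length ℓ ≤ m are marginalized at roughly ℓ³ flops, longer tracks are charged a flat SLAM cost, and the EKF update itself is charged roughly m³ flops; all constants are hypothetical.

```python
from collections import Counter

def learn_track_pmf(track_lengths):
    """Empirical pmf of feature track lengths over one statistics window."""
    counts = Counter(track_lengths)
    total = sum(counts.values())
    return {length: c / total for length, c in counts.items()}

def expected_flops(m, track_pmf, slam_cost=9000.0, window_weight=1.0):
    """Average per-update cost f_bar(m): expectation of the (stand-in)
    per-update operation count over the learned track-length pmf."""
    cost = window_weight * m ** 3                  # EKF update on the window
    for length, prob in track_pmf.items():
        if length <= m:
            cost += prob * length ** 3             # MSCKF marginalization
        else:
            cost += prob * slam_cost               # feature enters the state
    return cost

def optimal_window(track_pmf, m_min=2, m_max=20, **kw):
    """Minimize f_bar(m) by exhaustive search; since the curve is
    quasiconvex in practice, a local search started from the previous
    optimum would find the same m more cheaply."""
    return min(range(m_min, m_max + 1),
               key=lambda m: expected_flops(m, track_pmf, **kw))
```

With a pmf concentrated on short tracks plus a tail of longer ones, the minimizer sits just past the longest common track length: growing m converts expensive SLAM features into cheap MSCKF marginalizations until the window cost dominates.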
Fig. 6. The length of the sliding window selected via the optimization process for the real-world experiment.

VII. DISCUSSION AND CONCLUDING REMARKS

In this work, we presented a hybrid MSCKF/SLAM estimator for visual-inertial odometry, as well as a methodology for optimizing the estimator's computational efficiency. The results presented in the preceding section demonstrate that the proposed algorithm offers dramatic performance gains over both the MSCKF and EKF-SLAM individually. In all the cases examined, the algorithm was able to attain runtimes that allow for real-time performance, even on the low-end (by today's standards) processor of a mobile phone. These results validate the learning-based optimization approach for designing pose estimation algorithms outlined in Section I.

In this paper we have exclusively focused on optimizing the performance of the estimator (the estimation "back end" as it is sometimes called). The "front end" of visual feature extraction was not considered in this work. Obviously, for real-time operation on a resource-constrained device both of these components must be optimized, and this is the subject of our ongoing work.

As a final remark, we would like to stress that the learning-based performance optimization approach proposed in this paper is general, and can be applied to different estimators, and augmented with additional information. For instance, one can model the correlations between the track lengths of different features, which are now assumed independent. Moreover, one can use additional information (e.g., the distinctiveness of features) to predict the track length of each feature individually, or its likelihood of passing the Mahalanobis gating test. Such improvements will allow further optimization of the algorithm's performance.

ACKNOWLEDGMENTS

This work was supported by the National Science Foundation (grant no. IIS-1117957), the UC Riverside Bourns College of Engineering, and the Hellman Family Foundation.

REFERENCES

[1] J. E. Guivant and E. M. Nebot, "Optimization of the simultaneous localization and map building algorithm for real time implementation," IEEE Transactions on Robotics and Automation, vol. 17, no. 3, pp. 242–257, June 2001.
[2] L. M. Paz, J. D. Tardos, and J. Neira, "Divide and conquer: EKF SLAM in O(n)," IEEE Transactions on Robotics, vol. 24, no. 5, pp. 1107–1120, Oct. 2008.
[3] M. Walter, R. Eustice, and J. Leonard, "Exactly sparse extended information filters for feature-based SLAM," International Journal of Robotics Research, vol. 26, no. 4, pp. 335–359, Apr. 2007.
[4] M. Kaess, A. Ranganathan, and F. Dellaert, "iSAM: Incremental smoothing and mapping," IEEE Transactions on Robotics, vol. 24, no. 6, pp. 1365–1378, Dec. 2008.
[5] E. D. Nerurkar and S. I. Roumeliotis, "Power-SLAM: a linear-complexity, anytime algorithm for SLAM," International Journal of Robotics Research, vol. 30, no. 6, pp. 772–788, May 2011.
[6] S. J. Julier, "A sparse weight Kalman filter approach to simultaneous localisation and map building," in Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, Maui, HI, Oct. 29–Nov. 3 2001, pp. 1251–1256.
[7] S. Thrun, Y. Liu, D. Koller, A. Ng, Z. Ghahramani, and H. Durrant-Whyte, "Simultaneous localization and mapping with sparse extended information filters," International Journal of Robotics Research, vol. 23, no. 7-8, pp. 693–716, Aug. 2004.
[8] H. Strasdat, C. Stachniss, and W. Burgard, "Which landmark is useful? Learning selection policies for navigation in unknown environments," in Proceedings of the IEEE International Conference on Robotics and Automation, Kobe, Japan, May 2009, pp. 1410–1415.
[9] A. J. Davison and D. W. Murray, "Simultaneous localisation and map-building using active vision," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 7, pp. 865–880, Jul. 2002.
[10] A. I. Mourikis and S. I. Roumeliotis, "A multi-state constraint Kalman filter for vision-aided inertial navigation," in Proceedings of the IEEE International Conference on Robotics and Automation, Rome, Italy, Apr. 2007, pp. 3565–3572.
[11] J. Kelly and G. Sukhatme, "Visual-inertial sensor fusion: Localization, mapping and sensor-to-sensor self-calibration," International Journal of Robotics Research, vol. 30, no. 1, pp. 56–79, Jan. 2011.
[12] E. Jones and S. Soatto, "Visual-inertial navigation, mapping and localization: A scalable real-time causal approach," International Journal of Robotics Research, vol. 30, no. 4, pp. 407–430, Apr. 2011.
[13] T. Dong-Si and A. I. Mourikis, "Motion tracking with fixed-lag smoothing: Algorithm and consistency analysis," in Proceedings of the IEEE International Conference on Robotics and Automation, Shanghai, China, May 2011, pp. 5655–5662.
[14] G. Sibley, L. Matthies, and G. Sukhatme, "Sliding window filter with application to planetary landing," Journal of Field Robotics, vol. 27, no. 5, pp. 587–608, Sep. 2010.
[15] M. Li and A. I. Mourikis, "Improving the accuracy of EKF-based visual-inertial odometry," in Proceedings of the IEEE International Conference on Robotics and Automation, St Paul, MN, May 2012, pp. 828–835.
[16] A. Davison, I. Reid, N. Molton, and O. Stasse, "MonoSLAM: Real-time single camera SLAM," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 29, no. 6, pp. 1052–1067, Jun. 2007.
[17] P. Pinies and J. D. Tardos, "Scalable SLAM building conditionally independent local maps," in Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, San Diego, CA, Nov. 2007, pp. 3466–3471.
[18] R. Munguia and A. Grau, "Monocular SLAM for visual odometry," in Proceedings of the IEEE International Symposium on Intelligent Signal Processing, Alcala Henares, Spain, Oct. 2007, pp. 1–6.
[19] D. D. Diel, P. DeBitetto, and S. Teller, "Epipolar constraints for vision-aided inertial navigation," in IEEE Workshop on Motion and Video Computing, Breckenridge, CO, Jan. 2005, pp. 221–228.
[20] S. I. Roumeliotis, A. E. Johnson, and J. F. Montgomery, "Augmenting inertial navigation with image-based motion estimation," in Proceedings of the IEEE International Conference on Robotics and Automation, Washington, D.C., May 2002, pp. 4326–4333.
[21] M. Li and A. I. Mourikis, "Consistency of EKF-based visual-inertial odometry," University of California Riverside, Tech. Rep., 2011, www.ee.ucr.edu/~mourikis/tech reports/VIO.pdf.
[22] H. Strasdat, J. M. M. Montiel, and A. J. Davison, "Real-time monocular SLAM: Why filter?" in Proceedings of the IEEE International Conference on Robotics and Automation, Anchorage, AK, May 2010, pp. 2657–2664.
[23] M. Li and A. I. Mourikis, "Optimization-based estimator design for vision-aided inertial navigation: Supplemental materials," 2012, www.ee.ucr.edu/~mli/SupplMaterialsRSS2012.pdf.
[24] R. Simpson, J. Cullip, and J. Revell, "The Cheddar Gorge data set," Tech. Rep., 2011.
Practical Route Planning Under Delay Uncertainty:
Stochastic Shortest Path Queries

Sejoon Lim∗, Christian Sommer∗, Evdokia Nikolova†, Daniela Rus∗
∗Computer Science and Artificial Intelligence Lab, MIT
Email: {sjlim, csom, rus}@csail.mit.edu
†Department of Computer Science and Engineering, Texas A&M University
Email: [email protected]

Abstract—We describe an algorithm for stochastic path planning and applications to route planning in the presence of traffic delays. We improve on the prior state of the art by designing, analyzing, implementing, and evaluating data structures that answer approximate stochastic shortest-path queries. For example, our data structures can be used to efficiently compute paths that maximize the probability of arriving at a destination before a given time deadline.

Our main theoretical result is an algorithm that, given a directed planar network with edge lengths characterized by expected travel time and variance, pre-computes a data structure in quasi-linear time such that approximate stochastic shortest-path queries can be answered in poly-logarithmic time (actual worst-case bounds depend on the probabilistic model).

Our main experimental results are two-fold: (i) we provide methods to extract travel-time distributions from a large set of heterogeneous GPS traces and we build a stochastic model of an entire city, and (ii) we adapt our algorithms to work for real-world road networks, we provide an efficient implementation, and we evaluate the performance of our method for the model of the aforementioned city.

Fig. 1. Path query example for maximizing the probability of reaching the destination within a user-specified deadline. Our system finds such a path within milliseconds for a city-sized network. When a user at origin 'O' wants to get to destination 'D' within 30 minutes between 5pm and 6pm on a weekday, Path 1 offers the best chance of reaching the destination within the deadline. However, when the deadline changes to 20 minutes, Path 1 has less than a 50% chance of making the deadline. In this case, Path 2 is the best route, offering an 88.5% chance of on-time arrival.

I. INTRODUCTION

We present and evaluate a new algorithm for planning the path of vehicles on roadways in the presence of uncertainty in traffic delays. We build on prior work on stochastic path planning [15, 19, 20], which introduces stochastic path planning algorithms in this model. This prior work presents novel provably correct algorithms that are well suited for planning the path of a vehicle on small-scale graphs; however, the computational complexity of this prior suite of stochastic path planning algorithms makes these algorithms impractical to use for real-time path planning on city-scale road networks.

The stochastic path planning model incorporates traffic history in the form of probability distributions of delays observed on actual road segments. The probability distributions capture the historical delay observations that are assigned as weights on the edges of the road network. The stochastic path planning algorithm minimizes a user-specified cost function of the delay distribution. Current route planning methods can produce routes with shortest expected time (based on historical traffic data) and some recent commercial systems, using almost-real-time data based on cell-phone locations [22, 26], can also compute "smartest" routes in dynamic traffic scenarios. The algorithm in this paper improves on and extends previous work by enabling fast routes guaranteed to maximize the likelihood of arriving at a destination within a specified deadline in city-scale route graphs. This work provides a planning system that can be used by autonomous as well as human-driven vehicles for traffic routing applications that meet travel goals such as "when should you leave, and what path should you take, to reach the destination by the given deadline with high probability?" Preserving the inherently non-deterministic nature of actual traffic, we work with accurate probabilistic models of traffic-intense areas; in our work, we actually build a probabilistic model of an entire city based on a large set of heterogeneous GPS traces.

Our Contributions: The algorithm in this paper improves the state of the art on stochastic path planning [18, 19, 20] and its application to traffic routing [15]. The running time of the state-of-the-art stochastic-shortest-path algorithm is approximately quadratic in the number of nodes, rendering it too slow for real-time performance on city-scale networks. We improve upon their running time by using efficient data structures. In contrast to previous work [18], we provide a two-phase algorithm, consisting of preprocessing and query algorithms, better suited for practical implementations. Our preprocessing algorithm runs in quasi-linear time and it does not depend on the origin-destination pair or on the deadline,
which is the main theoretical challenge. Our query algorithm runs in sublinear time (whereas the algorithms in [15] run in polynomial time). The running time is polylogarithmic, offering exponential speedups over previous algorithms and allowing for almost instantaneous response times (see Lemma 1 and Theorem 3 for details).

We have implemented the algorithms described in this paper and packaged them as a route planning system. We evaluated the system using the road map of an entire city. Using GPS traces from a large fleet of taxis (approximately 16,000 vehicles), we extract a time-dependent, probabilistic model for each individual road segment. Our algorithms preprocess the road network equipped with probabilistic edge lengths into a data structure. Using this data structure, our query algorithm can efficiently answer approximate stochastic shortest-path queries. Our experiments validate the theoretical performance predictions, showing improvements upon existing query algorithms by several orders of magnitude.

II. BACKGROUND

A. Stochastic Shortest Paths

In this section, we provide a brief overview of the stochastic shortest-paths problem and known results, extracted from [18]. Consider a graph G with a given source node s and destination node t with stochastic edge delays we that come from independent distributions with means and variances (μe, τe), respectively. Denote the vector of edge delays by w = (w1, ..., w|E|) and the corresponding vectors of edge means and variances by μ = (μ1, ..., μ|E|) and τ = (τ1, ..., τ|E|). Denote the incidence vector of a path by x ∈ {0, 1}^|E| with ones corresponding to edges present in the path and zeros otherwise. Let the feasible set of s-to-t paths be F ⊂ {0, 1}^|E|.

Given the uncertainty of edge delays, it is not immediate how to define a stochastic shortest path: is it the path that minimizes mean delay? Or should it minimize path variance or some other metric? Risk-averse users would naturally care about the variability in addition to mean delays. Two risk-averse models that have been previously considered are the mean-risk model and the probability-tail model, which we describe below.

The algorithms for both models make black-box calls to the underlying deterministic shortest-path algorithm, which computes

    min wᵀx subject to x ∈ F.        (1)

a) A Mean-Risk Model: In this model, we seek the path minimizing mean delay plus a risk-aversion coefficient c ≥ 0 times the path standard deviation, more formally:

    minimize μᵀx + c·√(τᵀx) subject to x ∈ F.        (2)

This is a non-convex combinatorial problem, which can be solved exactly by enumerating all paths that minimize some positive combination of mean and variance (the latter is a deterministic shortest-path problem with respect to edge lengths equal to the corresponding mean–variance linear combination) and selecting the one with minimal objective (2) (see [18, 20] for more details). The number of deterministic shortest-path iterations is at most n^O(log n) in the worst case [20]. Furthermore, there is a fully-polynomial-time approximation scheme (FPTAS) for the problem, as stated in the following theorem:

Theorem 1 ([18, Theorem 4.1]): Suppose we have a δ-approximation algorithm for solving the deterministic shortest-path problem (1). For any ε > 0, the mean-risk model (2) can be approximated to a multiplicative factor of δ(1 + ε) by calling the algorithm for the deterministic shortest-path problem polynomially many times in the input size and 1/ε.

In other words, one can use both exact (δ = 1) and δ-approximate (for any δ > 1) shortest-path subroutines and obtain a corresponding approximate risk-averse solution.

b) Probability-Tail Model: In this model, we are given a deadline d and we seek the path maximizing the probability of arriving before the deadline: maximize Pr(wᵀx ≤ d) subject to x ∈ F. This model assumes normally-distributed edge delays in order to obtain a closed-form expression for the probability tail. In particular, when w is Gaussian, the problem is transformed into the following formulation [18]:

    maximize (d − μᵀx) / √(τᵀx) subject to x ∈ F.        (3)

Similarly to the mean-risk model, problem (3) can be solved exactly in time n^O(log n) [20] and it can be approximated as follows:

Theorem 2 ([18, Theorem 4.2]): Suppose we have a δ-approximation algorithm for solving the deterministic shortest-path problem (1). For any ε > 0, the probability-tail model (3) has a (1 − (δ − (1 − ε²/4)) / ((2 + ε)ε/4))-approximation algorithm that calls the algorithm for the deterministic shortest-path problem polynomially many times in 1/ε and the input size, assuming the optimal solution to problem (3) satisfies μᵀx* ≤ (1 − ε)d for a user-specified deadline d.

In our scenario, one main objective is to minimize the running time of the overall procedure at query time and hence the running times of these shortest-path subroutines are of high priority.

B. Approximate Distance Oracles

An (approximate) distance oracle is a data structure that efficiently answers shortest-path and/or distance queries in graphs. Relevant quantities of a distance oracle include the preprocessing time, which is the time required to construct the data structure, the space consumption, the query time, and, for approximate distance oracles, the stretch, which is the worst-case ratio among all pairs of query nodes of the query result divided by the actual shortest-path distance. Approximate distance oracles using quasi-linear space for poly-logarithmic query time and stretch (1 + ε) are known for planar [24] (implementation in [16]), bounded-genus [14], minor-free [1], geometric [13], and bounded-doubling-dimension graphs [4]. For realistic traffic scenarios, we need the distance oracle to work with directed graphs (also called digraphs). For planar
digraphs, Thorup [24] gave an approximate distance oracle that occupies space O(nε⁻¹ log(nL) log(n)) and answers (1 + ε)-approximate distance queries in time O(ε⁻¹ + log log(nL)), where L denotes the largest integer length (see [14] for different tradeoffs between space and query time).

At a high level, Thorup's method works as follows. The preprocessing algorithm recursively separates the graph into subgraphs of approximately equal size. Shortest paths are used as separators. For planar graphs, it is possible to separate the graph into two roughly balanced pieces by using three shortest paths Q1, Q2, Q3. Next, the preprocessing algorithm considers each node v ∈ V and it computes O(1/ε) portals for v on each separator path Qi such that the shortest path from v to each node q ∈ Qi is approximated within a (1 + ε)-factor by going through one of the portals. Since each node only needs to remember O(1/ε) distances to portals per level, and since each node participates in O(log n) levels, the overall space complexity per node is bounded by O(ε⁻¹ log n). For directed planar graphs, the construction is more involved and the space and time complexities are slightly higher.

Fig. 2. For each node v we have O(1/ε) portals on each separator path Q such that the shortest path from v to each node q ∈ Q is approximated within a (1 + ε)-factor by going through one of the portals. Note that two nodes u and v may not necessarily use the same set of portals.

Furthermore, many efficient practical methods have been devised [5, 10, 21]; their time and space complexities are, however, difficult to analyze. Competitive worst-case bounds have been achieved under the assumption that actual road networks have small highway dimension [2, 3]. We point out one particular method, which is in some sense related to our construction. Geisberger, Kobitzsch, and Sanders [11] provide a heuristic shortest-path query method with edge lengths defined as a customizable linear combination of two edge length functions (this linear combination, however, is not defined by a deadline or a risk-aversion coefficient). We develop a method that finds a set of parameter values for the entire network, independent of the specific problem, and that provides guaranteed approximation accuracy for all choices of parameters simultaneously. We show that it suffices to maintain only a small set of parameters, which allows for small storage and efficient preprocessing. We preprocess the network using this set of parameter values.

We implement the deterministic shortest-path algorithms required in [18] using a distance oracle, as described in the previous section. This allows for fast query times. The preprocessing algorithm, while taking quasi-linear time, is executed only once and the data structure can be computed on a powerful machine (not necessarily the machine where the query algorithm is run on). One important technical difference compared to the method described in [15] is that we cannot use an adaptive algorithm at query time, since the preprocessing algorithm must prepare for each step that might get executed at query time. In other words, the preprocessing algorithm must prepare for all possible query pairs (s, t) ∈ V × V and, potentially, deadline parameter values d ∈ R. While such preprocessing algorithms are standard for classical graph distances, adaptations are necessary for stochastic shortest paths.

B. Preprocessing Algorithm

We adapt the algorithms in Theorem 1 and Theorem 2 to our scenario, where we have access to an approximate oracle for the deterministic shortest-path problem with edge lengths e ↦ k · μ(e) + τ(e) for some k ∈ K. In previous work [18], the aim was a polynomial-time algorithm and shortest-path problems could be computed in quasi-linear time for any value k. Since we aim at sublinear query time, the queries made to the oracle need to be non-adaptive, or at least restricted to a small set of possible queries K. We can then precompute distance oracles for all these values k ∈ K. On a high level, the preprocessing algorithm is rather simple.

Algorithm 1: PREPROCESS
  Data: G = (V, E) with given lengths μ, τ : E → R⁺
  Result: distance oracles orc_k for each k ∈ K
  compute the set K;
  for k ∈ K do
      orc_k ← build (approximate) distance oracle for length function e ↦ k · μ(e) + τ(e);
arbitrary, which is why we cannot use their data structure). end
III. S TOCHASTIC M OTION P LANNING WITH S PARSE
P RECOMPUTED DATA
Note that this preprocessing algorithm can be parallelized
A. Overview in a straightforward way. Since there are no dependencies
We provide a new stochastic path planning algorithm and between the oracles constructed for different values of k,
its implementation in a real traffic network. The main research the preprocessing algorithm can also make use of multi-core
objective is to minimize the time for querying a path. Algo- architectures.
rithms studied in [15, 20] find an exact solution by iteratively As an immediate consequence, we obtain the following:
determining the parameter value k that governs the edge length Claim 1: The preprocessing time is bounded by O(|K | ·
(e) := k · μ(e) + τ(e) in terms of mean μ and variance τ. P), where P denotes the preprocessing time of the deter-
Such parameter values depend on a specific problem instance, ministic distance oracle.

251
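The loop in Algorithm 1 can be sketched as follows. This is a minimal illustration, not the paper's implementation: a textbook Dijkstra stands in for the (approximate) planar distance oracle, and the graph, `mu`, `tau`, and function names are hypothetical.

```python
import heapq

def dijkstra(adj, src):
    # Single-source shortest distances; a stand-in for the planar distance oracle.
    dist = {src: 0.0}
    pq = [(0.0, src)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist.get(u, float("inf")):
            continue
        for v, w in adj[u]:
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(pq, (nd, v))
    return dist

def preprocess(nodes, edges, mu, tau, K):
    # For each k in K, build an "oracle" for the scalarized lengths k*mu(e) + tau(e).
    # The iterations are independent, so they could run in parallel, as noted above.
    oracles = {}
    for k in K:
        adj = {v: [] for v in nodes}
        for (u, v) in edges:
            w = k * mu[(u, v)] + tau[(u, v)]
            adj[u].append((v, w))
            adj[v].append((u, w))  # undirected, as in the paper's experiments
        oracles[k] = {s: dijkstra(adj, s) for s in nodes}
    return oracles
```

Replacing the all-pairs Dijkstra table with a true (1+ε)-approximate planar oracle is exactly what gives the O(|K| · P) preprocessing bound of Claim 1.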
A main technical contribution of this work is showing how to compute (and analyze) this set K. The size of K mostly depends on the user's choice of ε.

We run the deterministic preprocessing algorithm for the graph with edge lengths ℓ_k(e) = k·μ(e) + τ(e) for k ∈ {L, (1+ξ)L, (1+ξ)²L, ..., U}, for ξ, L, U to be defined for each model. Also, the approximation parameter ε we use for the preprocessing algorithm in the distance oracle changes with the model.

1) A Mean-Risk Model: Using a δ-approximate distance oracle, the end result is a δ(1+ε)-approximation. Therefore, when asking for a (1+ε)-approximate answer, we internally set ε_int := ε/3, since, for ε < 1/2, we have that (1+ε/3)² ≤ 1 + 2ε/3 + ε²/9 ≤ 1+ε. In the following, we write ε instead of ε_int. Next, we set ξ, L, and U as follows:

    ξ = ε/(1+ε),
    L = min_e τ(e),
    U = Σ_e √τ(e) ≤ n · max_e √τ(e).

Alternatively, U can be set to the largest possible variance of any path. We have that the total number of values k is at most

    |K| = ⌈log_{1+ξ}(U/L)⌉ = O( log( n · max_e τ(e) / min_e τ(e) ) / √ε ).

As a consequence, we obtain the following lemma.

Lemma 1: For any graph G = (V, E) and for any two length functions μ : E → R+ and τ : E → R+, there exists a set K of size |K| = O( ε^{-1/2} log( n · max_e τ(e) / min_e τ(e) ) ) such that mean-risk shortest paths can be approximated by a shortest path in G with length function ℓ_k(e) = k·μ(e) + τ(e) for some k ∈ K.

2) Probability-Tail Model: For the probability-tail model, we follow the same strategy as in the previous section, based on the proof of [18, Lemma 3.4], showing that, in the mean-variance plane, any fixed parabola (also termed level set) with apex (d, 0) can be ε-approximated by a piecewise linear curve (see [18, Figure 1(b)] for an illustration) using few line segments. We exploit this ε-approximation in our algorithm. Furthermore, we define level sets irrespective of the deadline d: we may actually define K for any value of d, since for a given query deadline d the actual parabolas are just shifted to the left or to the right. Each segment defines a slope value k, and the union of all these slope values (their absolute value, actually) is our set of slopes. Note that shifting the parabola to the left or to the right does not affect these slopes. We set ξ, L, and U as follows:

    ξ = ε/2,
    L = 2 min_e τ(e) / d_L,
    U = 2 Σ_e τ(e) / (ε·d_U) ≤ 2n·max_e τ(e) / (ε·d_U) ≤ 2n·max_e τ(e) / (ε·min_e μ(e)).

Although the user may choose to query the data structure for an arbitrarily large deadline d, we can give an upper bound on d_L = max d as follows. As soon as the deadline is so large that shortest paths do not change anymore, we do not have to consider larger values of d. In a straightforward way, d_L can be computed by computing all-pairs shortest paths for increasing values of d. Alternatively, again for increasing values of d, a 2-approximation of the diameter can be found in linear time.

C. Query Algorithm

The simplest version just tries all values k ∈ K.

Algorithm 2: QUERY
  Data: G = (V, E) with given lengths μ, τ : E → R+; distance oracles orc_k for each k ∈ K; origin O, destination D, and mean-risk coefficient c or deadline d, depending on the risk model
  Result: approximately optimal path p
  p ← NIL;
  for k ∈ K do
      q ← find_path(O, D, orc_k);
      if ( meanRisk and q.μ + c·√q.τ < p.μ + c·√p.τ )
         or ( probTail and (d − q.μ)/√q.τ > (d − p.μ)/√p.τ ) then
          p ← q;
      end
  end
  return p;

Claim 2: The query time is bounded by O(|K| · Q), where Q denotes the query time of the deterministic distance oracle.

Combined with Claim 1, we obtain our main theorem.

Theorem 3: There is a data structure that can be computed in time O(|K| · P), occupying space O(|K| · S), and answering approximate stochastic shortest-path queries in time O(|K| · Q), where P, S, Q denote the preprocessing, space, and query complexities of the deterministic distance oracle, respectively.

In the practical part, we also have a heuristic that binary searches for the optimal value in K, resulting in faster query times.

Plugging in the distance oracle of Thorup [24], we obtain our main result. We state it in the mean-risk model. An analogous bound holds for the probability-tail model. To apply [24, Theorem 3.16], we set the largest integer length N(k) := max_{e∈E} k·μ(e) + τ(e) for each preprocessing step k ∈ K. Also, let τ̄ = max τ(e) / min τ(e).

Corollary 1: For any directed planar graph, there exists a data structure that can be computed in time O(n (log(nτ̄)) (lg(nN)) (lg n)³ ε^{-5/2}), occupying space O(n ε^{-3/2} (lg n) (lg(nN)) (log(nτ̄))), such that approximate stochastic shortest-path queries can be answered in time O(ε^{-1/2} log(nτ̄) lg lg(nN) + ε^{-3/2} log(nτ̄)).

The space can be reduced at the cost of increasing the query time by using the construction in [14]. In particular, for scenarios where memory is scarce (e.g. on mobile devices),
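The two ingredients above, the geometric grid K and the query loop of Algorithm 2, can be sketched as follows. This is a hedged illustration for the mean-risk branch only: each candidate path is summarized by its totals (μ, τ) and stands in for the result of find_path(O, D, orc_k); the oracle plumbing and all names are hypothetical.

```python
import math

def build_K(xi, L, U):
    # Geometric grid K = {L, (1+xi)L, (1+xi)^2 L, ..., U}; |K| is about log_{1+xi}(U/L).
    K, k = [], L
    while k < U:
        K.append(k)
        k *= 1.0 + xi
    K.append(U)
    return K

def mean_risk_query(candidates, c):
    # Mean-risk branch of Algorithm 2: among candidate paths (mu, tau),
    # keep the one minimizing mu + c*sqrt(tau).
    best = None
    for q in candidates:
        if best is None or q[0] + c * math.sqrt(q[1]) < best[0] + c * math.sqrt(best[1]):
            best = q
    return best
```

With c = 0 the objective degenerates to the expected travel time; larger c buys lower variance at the price of a higher mean, which is exactly the trade-off the set K discretizes.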
one can obtain a linear-space data structure. Note that the preprocessing algorithm can be executed on a powerful computer and not necessarily on the mobile device. The query time remains poly-logarithmic.

IV. STOCHASTIC TRAFFIC MODEL BASED ON GPS TRACES

We compute a stochastic model of an entire city using the following process. We start with a heterogeneous set of approximately 500M anonymized GPS points. Grouped by individual vehicle, we extract those traces (subpaths) for which the sampling frequency is sufficiently high (at least one data point every thirty seconds). For each trace, we employ map-matching algorithms to compute the sequence of road segments that has the largest likelihood to have been used by this particular trace. The process of matching GPS trace data onto a map has been investigated before [12, 27]. To overcome the noise and sparsity of GPS data, a map-matching method based on the Viterbi algorithm was suggested in [17]. We use their method for our dataset. Since we do not have the ground truth for our dataset, the correctness of the map-matching step cannot be verified. It is actually rather unlikely that all traces are mapped to the correct sequence of road segments. We assume that our dataset is large enough that the majority of traces is mapped to correct road segments. We determine the speed of each vehicle per road segment, and we sort all the values for each road segment by time. We then bin speed values into one-hour intervals for each day of the week. Using standard statistics, we compute the sample mean and standard deviation of the speed and travel time for each bin.

For examples of distributions for different road segments and different times of the day, see Fig. 3.

V. PRACTICAL CONSIDERATIONS

We adapt our method for planar networks so that it can efficiently handle the actual road network models we use in our experiments.

A. Non-planarities

Our network is not exactly planar, and many real-world road networks may be non-planar [8]. The internal algorithms of the distance oracle use separator paths, which are sensitive to non-planarities (crossings). We propose to address most of these non-planarities as follows (inspired by similar constructions for minor-free graphs [1]). On the highest level of the recursion, instead of using two shortest paths to separate the graph, (i) we interpret the set of express ways (cf. highways) as a set of separator paths, (ii) we compute portals for each separator path Q and each node v (distances to portals are computed with respect to the entire graph), and (iii) we remove the set of express ways from the graph. The highest level of the recursion handles all paths that use any part of an express way. The remaining graph has only few further crossings.

Turn restrictions: For some intersections, particularly in urban settings, it may not be possible to make a left turn due to a turn restriction, or it may just be more time-consuming to make a left turn, which can be modeled by a turn cost. Turn costs can be taken into account by modeling each road segment by a node of the graph, and then connecting only those nodes whose corresponding road segments can be traversed in sequence [7, 28]. Such a transformation increases the number of nodes [9] and, more importantly for our oracles, it also violates planarity. We argue that planarity is not just violated in a moderate way but that the violation is actually such that it cannot possibly be fixed. If arbitrary turn restrictions were allowed, we could just (i) draw any graph in the plane (for example, the large-girth graphs in [25] or the expander graphs in [23]), (ii) add a node for each intersection, (iii) apply left-turn and right-turn restrictions (i.e., apply the supposedly-existing transformation without violating planarity), (iv) compute a planar distance oracle [24], and (v) store only the distance labels for the actual nodes of the graph. If the transformation in step (iii) could be done, the resulting oracle would contradict known lower bounds.

In our implementation, we connect each node to additional portals as follows: on any separator path, we deem each turn-restricted intersection a portal (assuming that those intersections are not ubiquitous). The upper bound of O(1/ε) portals per separator path cannot be guaranteed anymore. The bound in the implementation is O(1/ε + R), where R is the number of intersections with turn restrictions on a given separator path.

B. Directions

Realistic modeling of traffic requires us to consider directed graphs.¹ The existing implementation [16] is for undirected graphs only, and the directed oracle is significantly more involved [24]. Existing methods for directed road networks [5, 6, 10, 21] usually do not have theoretical worst-case bounds close to [24] (some theoretical results are known [2, 3]). For our scenario, we integrate the undirected implementation into our code. While our theoretical results hold for directed networks, our experimental evaluation is for undirected networks.

¹ The shortest-path problem for directed graphs is strictly more general. In a straightforward way, any undirected graph can be represented by a directed graph with bidirectional edges. For arbitrary directed graphs, even if they are guaranteed to be sparse, not much is known with respect to shortest-path queries, while the undirected variant is quite well understood.

C. Reduced Set of Portals on Separator Paths

On the highest level, we cannot use planar-graph algorithms to efficiently compute portals and distances to portals for each node. Instead, we propose to relax the approximation guarantee in a controlled way. We heuristically treat each separator path (or express way) as in Algorithm 3.

D. Deadline values

In the probability-tail model, the user may specify an arbitrary deadline d. As a consequence, the ratio of the smallest and the largest possible d can be arbitrarily large. As argued in
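The per-segment, per-hour binning described above can be sketched as follows; the record layout (`segment_id`, weekday, hour, speed) is an assumption for illustration, and the sample statistics use the standard n−1 denominator.

```python
import math
from collections import defaultdict

def bin_travel_stats(records):
    # records: (segment_id, weekday, hour, speed) observations from map-matched traces.
    # Returns {(segment_id, weekday, hour): (sample_mean, sample_std)} per one-hour bin.
    bins = defaultdict(list)
    for seg, weekday, hour, speed in records:
        bins[(seg, weekday, hour)].append(speed)
    stats = {}
    for key, speeds in bins.items():
        n = len(speeds)
        mean = sum(speeds) / n
        var = sum((s - mean) ** 2 for s in speeds) / (n - 1) if n > 1 else 0.0
        stats[key] = (mean, math.sqrt(var))
    return stats
```

These per-bin means and standard deviations are what populate μ(e) and τ(e) for the edge-length functions used throughout the preceding sections.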
Fig. 3. The travel time distribution of two different road segments for 0∼1 am, 8∼9 am, and 5∼6 pm on weekdays.

Algorithm 3: PORTALS(H, Q, m)
  Data: a path Q and a parameter m
  Result: a set of portals for each v ∈ V(H)
  SINGLESOURCESHORTESTPATHS(Q);   /* connect dummy source to each q ∈ Q; computes first portal q_0(v) and distance d(v, q_0(v)) for each v ∈ V(H) */
  for q_i ∈ Q do
      for j ∈ {0, 1, 2, ..., m−1} do
          Q^(j) ← subset of nodes q_i where i ≡ j (mod m);
          SINGLESOURCESHORTESTPATHS(Q^(j));   /* dummy source connected to all q ∈ Q^(j); computes another portal q_j(v) and distance d(v, q_j(v)) for each v ∈ V(H) */
          if (1+ε)·d(v, q_j(v)) > d(v, q_i(v)) + d(q_i(v), q_j(v)) for all portals q_i(v) added previously then
              add portal;
          end
      end
  end

the theoretical sections, this can be controlled for effectively. In practical implementations, one could also limit the range of user input values by restricting the deadline values available.

VI. EXPERIMENTS

We evaluate an implementation of our algorithm on a real-world network dataset. We show that the query time is several orders of magnitude faster than that of previous algorithms, and we also show that approximately optimal paths are reasonably close to optimal paths.

A. Setup

The tests were run on a machine with 2 GB of main memory and an Intel Xeon 1.86 GHz processor.

Our graph represents a city-scale network with 40,000+ nodes and 70,000+ edges. As described in the previous section, we adapted the worst-case-proven algorithm and tailored it to fit our real-world scenario. We implemented the distance oracle for undirected graphs based on the implementation of [16].

Since our network is mostly bidirectional (meaning that there are only few one-way streets), our graph could potentially be considered undirected. However, in time-dependent scenarios, the traffic may still flow in one direction, while a road may potentially be jammed in the other direction. Using the following heuristic, we can use the undirected variant of the distance oracle without significant impact on path quality. To compile the different travel time distributions of bidirectional road segments, we built distance oracles for two different edge length functions: one with inbound driving time statistics and one with outbound driving time statistics. The geometric orientation of an edge with respect to the city center was used as the reference to indicate which of the two travel time statistics should be used when constructing the distance oracle. At query time, for example, if the route query overall corresponds to an inbound movement, the distance oracles built based on the inbound statistics are used.

We examined the pre-processing times, query times, and approximation ratios for different ε values for the two different models: specifically, we used ε ∈ {0.1, 0.2, 0.3, 0.4, 0.5, 1} for the Mean-Risk model, and ε ∈ {0.5, 1} for the Probability-Tail model.

B. Pre-processing

Depending on the user's desired accuracy level ε, we find the values ξ, L, and U (as outlined in the theoretical part). The sizes of the set K for different values of ε are shown in Fig. 4 (Left). The pre-processing time depends mostly on |K| (and therefore on ε). For each value of ε, we used c ∈ {0, 0.5, 1, ..., 6} for the Mean-Risk model, and d ∈ {1.1·m*, 1.2·m*, ..., 2·m*} for the Probability-Tail model, where m* is the minimum expected travel time for a given origin-destination pair.

1) Computation time: The pre-processing times for different ε values are shown in Fig. 4 (Middle). For example, for the Mean-Risk model, if we set ε = 0.1, 82 different k values are used. The overall pre-processing time is 5,159 seconds. The average pre-processing time per oracle is about 63 seconds. Note that the number of distance oracles to pre-process depends only on the desired accuracy of the solution paths defined by ε. It is independent of the user's other query parameters, such as the origin, the destination, and the risk-aversion coefficient c in the Mean-Risk model or the deadline d in the Probability-Tail model.

2) Data space: Depending on ε, the overall space consumption varies greatly (see Fig. 4 (Right)), mostly for two reasons. First, the number of k values increases as the accuracy is increased (meaning that ε is decreased), see Fig. 4 (Left), and second, the number of connections in the distance oracle grows linearly with 1/ε.
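The portal-selection step that Algorithm 3 relaxes can be sketched as a greedy scan, under one plausible reading of the acceptance test: a path node becomes a portal for v only if no previously kept portal already (1+ε)-covers it. The inputs (a distance vector from v to the path nodes and pairwise distances along the path) and all names are hypothetical.

```python
def select_portals(d_vq, d_qq, eps):
    # Greedy portal selection for one node v against one separator path Q.
    # d_vq[j]: distance from v to path node q_j (through the graph);
    # d_qq[i][j]: distance from q_i to q_j along Q.
    # Keep q_j as a portal unless a kept portal q_i already satisfies
    #   d(v, q_i) + d(q_i, q_j) <= (1 + eps) * d(v, q_j).
    portals = []
    for j in range(len(d_vq)):
        covered = any(d_vq[i] + d_qq[i][j] <= (1.0 + eps) * d_vq[j] for i in portals)
        if not covered:
            portals.append(j)
    return portals
```

The standard covering argument bounds the number of kept portals by O(1/ε) per separator path; with turn-restricted intersections forced in as extra portals, the bound becomes the O(1/ε + R) stated in Section V-A.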
Fig. 4. (Left) The number of k values needed for the road network: the growth of |K| for the mean-risk and the probability-tail model with respect to the user-specified level of accuracy ε. (Middle) Pre-processing time for each ε. (Right) For each ε value (1/ε on the x-axis), the total storage requirement for all the distance oracles (y-axis).

Fig. 5. The histogram of the query time for 1,000 random OD-pairs. "Pruning algorithm" refers to Algorithm 1 in [15]. Our new method is faster by several orders of magnitude. The speed-up is from tens of seconds or several minutes to tens or hundreds of microseconds.

Fig. 6. The histogram of the absolute performance guarantee (error) for 1,000 random OD-pairs. We used c ∈ {0, 0.5, ..., 6} for the Mean-Risk model and d ∈ {1.1·m*, 1.2·m*, ..., 2·m*} for the Probability-Tail model, where m* is the minimum expected travel time for a given origin-destination pair.

C. Path Queries

To measure the query time, we compute stochastic shortest paths between 1,000 random origin-destination (OD) pairs. To evaluate the quality and accuracy of the reported paths, we query 1,000 random OD pairs.

1) Query time: The time required to answer a route query for different values of ε is illustrated in Fig. 5. The query time decreases as ε increases. We implemented [15, Algorithm 1] as a comparison to our method in this paper. The query time results are plotted in Fig. 5.

2) Approximation quality: We compare the quality of the path output by our algorithm to the optimal path length, computed by the algorithm described in [15]. As a quality measure, we list the approximation ratio, defined as the difference between the optimal solution (as defined in equations (2) and (3), respectively) and our solution, divided by the optimal solution (see Fig. 6). Our worst-case guarantee says that this ratio can never be larger than the user-specified level of accuracy ε. Our experimental results show that the ratio is smaller than ε, and, moreover, the ratio is substantially smaller than ε. Even for relatively large values such as ε = 1, we still obtain an approximation ratio less than 0.02. Based on this experimental result, we suggest using larger values of ε, saving both computation time (both pre-processing and query times) and storage (as outlined in the previous section).

D. Path Examples

We developed a web interface that responds to user inputs. Users can select the origin and destination, as well as the travel time of the day and the day of the week. The system computes the stochastic shortest path and displays it on the map. Fig. 1 shows our different solution paths for different deadlines for the Probability-Tail model with ε = 0.1.

VII. CONCLUSION

We improve upon the state of the art in stochastic path planning, providing a method that can answer stochastic shortest-path queries in poly-logarithmic time using a data structure that occupies space essentially proportional to the size of the
network. Our method has proven worst-case guarantees, and it might be applicable to real-world scenarios.

VIII. ACKNOWLEDGMENTS

Support for this research has been provided by the Singapore-MIT Alliance for Research and Technology (The Future of Urban Mobility project), NSF awards CPS-0931550 and 0735953, and ONR awards N00014-09-1-1051 (MURI SMARTS) and N00014-09-1-1031. C.S. was partially supported by the Swiss National Science Foundation. We are grateful for this support. Many thanks to Oliver Senn for connecting the first two authors.

REFERENCES

[1] I. Abraham and C. Gavoille. Object location using path separators. In 25th ACM Symposium on Principles of Distributed Computing (PODC), pages 188–197, 2006.
[2] I. Abraham, A. Fiat, A. Goldberg, and R. Werneck. Highway dimension, shortest paths, and provably efficient algorithms. In 21st ACM-SIAM Symposium on Discrete Algorithms (SODA), 2010.
[3] I. Abraham, D. Delling, A. Fiat, A. Goldberg, and R. Werneck. VC-dimension and shortest path algorithms. In 38th Int. Colloquium on Automata, Languages, and Programming (ICALP), pages 690–699, 2011.
[4] Y. Bartal, L.-A. Gottlieb, T. Kopelowitz, M. Lewenstein, and L. Roditty. Fast, precise and dynamic distance queries. In 22nd ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 840–853, 2011.
[5] H. Bast, S. Funke, P. Sanders, and D. Schultes. Fast routing in road networks with transit nodes. Science, 316(5824):566, 2007.
[6] R. Bauer and D. Delling. SHARC: Fast and robust unidirectional routing. ACM Journal of Experimental Algorithmics, 14, 2009.
[7] T. Caldwell. On finding minimum routes in a network with turn penalties. Comm. of the ACM, 4(2), 1961.
[8] D. Eppstein and M. T. Goodrich. Studying (non-planar) road networks through an algorithmic lens. In 16th ACM SIGSPATIAL Int. Symposium on Advances in Geographic Information Systems (GIS), page 16, 2008.
[9] R. Geisberger and C. Vetter. Efficient routing in road networks with turn costs. In 10th Int. Symposium on Experimental Algorithms (SEA), pages 100–111, 2011.
[10] R. Geisberger, P. Sanders, D. Schultes, and D. Delling. Contraction hierarchies: Faster and simpler hierarchical routing in road networks. In 7th Int. Workshop on Experimental Algorithms (WEA), pages 319–333, 2008.
[11] R. Geisberger, M. Kobitzsch, and P. Sanders. Route planning with flexible objective functions. In 12th Workshop on Algorithm Engineering and Experiments (ALENEX), pages 124–137, 2010.
[12] J. Greenfeld. Matching GPS observations to locations on a digital map. In 81st Annual Meeting of the Transportation Research Board, volume 1, 2002.
[13] J. Gudmundsson, C. Levcopoulos, G. Narasimhan, and M. Smid. Approximate distance oracles for geometric spanners. ACM Transactions on Algorithms, 4(1), 2008.
[14] K. Kawarabayashi, P. N. Klein, and C. Sommer. Linear-space approximate distance oracles for planar, bounded-genus and minor-free graphs. In 38th Int. Colloquium on Automata, Languages and Programming (ICALP), pages 135–146, 2011.
[15] S. Lim, H. Balakrishnan, D. Gifford, S. Madden, and D. Rus. Stochastic motion planning and applications to traffic. International Journal of Robotics Research, 30(6):699–712, 2011. Announced at WAFR 2008.
[16] L. F. Muller and M. Zachariasen. Fast and compact oracles for approximate distances in planar graphs. In 15th European Symposium on Algorithms (ESA), pages 657–668, 2007.
[17] P. Newson and J. Krumm. Hidden Markov map matching through noise and sparseness. In 17th ACM SIGSPATIAL Int. Conference on Advances in Geographic Information Systems (GIS), pages 336–343, 2009.
[18] E. Nikolova. Approximation algorithms for reliable stochastic combinatorial optimization. In 13th Int. Workshop on Approximation, Randomization, and Combinatorial Optimization (APPROX), pages 338–351, 2010.
[19] E. Nikolova, M. Brand, and D. R. Karger. Optimal route planning under uncertainty. In 16th Int. Conference on Automated Planning and Scheduling (ICAPS), pages 131–141, 2006.
[20] E. Nikolova, J. A. Kelner, M. Brand, and M. Mitzenmacher. Stochastic shortest paths via quasi-convex maximization. In 14th European Symposium on Algorithms (ESA), pages 552–563, 2006.
[21] P. Sanders and D. Schultes. Highway hierarchies hasten exact shortest path queries. In 13th European Symposium on Algorithms (ESA), pages 568–579, 2005.
[22] R.-P. Schäfer. IQ routes and HD traffic: technology insights about TomTom's time-dynamic navigation concept. In Foundations of Software Engineering (SIGSOFT FSE), pages 171–172, 2009.
[23] C. Sommer, E. Verbin, and W. Yu. Distance oracles for sparse graphs. In 50th IEEE Symposium on Foundations of Computer Science (FOCS), pages 703–712, 2009.
[24] M. Thorup. Compact oracles for reachability and approximate distances in planar digraphs. Journal of the ACM, 51(6):993–1024, 2004.
[25] M. Thorup and U. Zwick. Approximate distance oracles. Journal of the ACM, 52(1):1–24, 2005.
[26] TomTom. How TomTom's HD Traffic and IQ Routes data provides the very best routing: travel time measurements using GSM and GPS probe data. White paper.
[27] C. White, D. Bernstein, and A. Kornhauser. Some map matching algorithms for personal navigation assistants. Transportation Research Part C: Emerging Technologies, 8(1-6):91–108, 2000.
[28] S. Winter. Modeling costs of turns in route planning. GeoInformatica, 6(4):363–380, 2002.
A Distributable and Computation-Flexible Assignment Algorithm: From Local Task Swapping to Global Optimality

Lantao Liu (Dept. of Computer Science and Engineering, Texas A&M University, College Station, USA; Email: [email protected]) and Dylan A. Shell (Dept. of Computer Science and Engineering, Texas A&M University, College Station, USA; Email: [email protected])

Abstract— This paper introduces an algorithm for solving the assignment problem with several appealing features for online, distributed robotics applications. The method can start with any initial matching and incrementally improve the solution to reach the global optimum, producing valid assignments at any intermediate point. It is an any-time algorithm with an attractive performance profile (quality improves linearly) that, additionally, is comparatively straightforward to implement and is efficient both theoretically (its O(n³ lg n) complexity is better than that of widely used solvers) and practically (comparable to the fastest implementation, for up to hundreds of robots/tasks). We present a centralized version and two decentralized variants that trade between computational and communication complexity.

Inspired by techniques that employ task exchanges between robots, our algorithm guarantees global optimality while using generalized "swap" primitives. The centralized version turns out to be a computational improvement and reinterpretation of the little-known method of Balinski-Gomory, proposed half a century ago. Deeper understanding of the relationship between approximate swap-based techniques (developed by roboticists) and combinatorial optimization techniques, e.g., the Hungarian and Auction algorithms (developed by operations researchers but used extensively by roboticists), is uncovered.

I. INTRODUCTION

A common class of multi-robot task-allocation mechanisms involves estimating the expected cost of each robot's performance of each available task, and matching robots to tasks in order to minimize overall cost. By allocating robots to tasks repeatedly, a team can adapt as circumstances change and demonstrate fluid coordination. A natural tension exists between two factors: running time is important as it determines how dynamic the team can be, while the quality of the allocation reflects the resultant total cost and hence the performance of the team. While the importance of solutions that trade the quality of results against the cost of computation has been established for some time (e.g., review in [1]), the assignment problem underlying efficient task-allocation has received little attention in this regard.

This paper introduces an algorithm that yields a feasible allocation at any point in its execution and an optimal assignment when it runs to completion. The results give an easily characterizable relationship between running time and allocation quality, allowing one factor to be traded for the other, and even for the marginal value of computation to be estimated. Additionally, the algorithm may start from any initial matching, so it can easily be used to refine sub-optimal assignments computed by other methods.

But the flexibility afforded by an any-time algorithm will be counterproductive if it comes at too high a cost. The method we describe has strongly polynomial running time, and we show that it can be competitive with the fastest existing implementation even for hundreds of robots and tasks. Additionally, the cost can be borne by multiple robots because variants of the algorithm can be executed in a decentralized way. We are unaware of another solution to the assignment problem with these features.

II. RELATED WORK

Task allocation is one of the fundamental problems in distributed multi-robot coordination [2]. This paper draws a connection between (A) methods for improving local performance, e.g., via incremental clique preferences improvement, and (B) allocation methods which seek to solve (or approximate) the global optimum of the assignment.

A. Local Task Exchanges in Task-Allocation

Several researchers have proposed opportunistic methods in which pairs of robots within communication range adjust their workload by redistributing or exchanging tasks between themselves [3, 4, 5], also called O-contracts [6]. These intuitively appealing methods allow for a form of localized, light-weight coordination of the flavor advocated by [7]. Zheng and Koenig [8] recently explored a generalization of the idea in which an exchange mechanism involving K robots (called K-swaps) improves solution quality. They theoretically analyzed the method and illustrated its properties empirically. This paper gives new insight into how generalized swap-like mechanisms can ensure optimality, in our case through something analogous to automatic computation of the necessary value of K. Also, we have characterized the running time of our method.

B. Optimal Assignment in Task-Allocation

The first and best-known optimal assignment method is Kuhn's O(n³) Hungarian algorithm [9]. It is a dual-based algorithm because the variables in the dual program are maintained as feasible during each iteration in which a primal solution is sought. Many other assignment algorithms have been developed subsequently (see the review in [10]); however, most are dual-based methods, including augmenting-path [11], auction [12], and pseudo-flow [13] algorithms. These (and approximations to them) underlie many examples of robot task-allocation, e.g., see [14, 15, 16]. Special mention must be made of market-based methods ([17, 18]), as they have proliferated presumably on the basis of inspiration from real markets and their naturally distributed operation, and Bertsekas's economic interpretation of dual variables as prices [12]. Fully distributing such methods sacrifices optimality: [19] gives bounds for some auction strategies, and [20] introduces a distributed auction algorithm in the context of networked systems.

Little work reports using primal approaches for task-allocation; researchers who solve the (relaxed) Linear Program directly likely use the popular (and generally non-polynomial-time) simplex method [21]. The primal assignment algorithm proposed by Balinski and Gomory [22] is an obscure method that appears entirely unknown within robotics. The relationship is not obvious from their presentation, but their chaining sequence of alternating primal variables is akin to the swap-loop transformation we have identified. Our centralized algorithm improves on their run-time performance (they require O(n⁴) time). Also, the data structures we employ differ

Fig. 1. Primal transformations are task swaps. (a) A cost matrix with two independent swap loops, where shaded and bold-edged squares represent old and new assigned entries, respectively; (b) task swapping from an independent swap loop (e.g., the left closed loop in (a)) among four robots and tasks.

The duality theorems show that a pair of feasible primal and dual solutions is optimal iff the following is satisfied:

    x_ij (c_ij − u_i − v_j) = x_ij c̄_ij = 0,  ∀(i, j).   (4)

Equation (4) is called complementary slackness. It also indicates that, if a robot-task pair (i, j) is assigned, i.e., x_ij = 1, then the corresponding reduced cost c̄_ij must be equal to 0.

C. Transformations and Admissibilities
as they were selected to reduce communication cost in the
Primal and dual transformations and, in particular, their
decentralized versions, which is not something they consider.
admissibilities are used later in the paper.
III. PROBLEM DESCRIPTION & PRELIMINARIES • Admissible Primal Transformation: Map Zp : X → X
We consider the multi-robot task assignment problem in is an admissible primal transformation if the primal solution
which the solution is an association of each robot to exactly quality is better after the transformation, i.e. X = Zp (X) is
one task , denoted SR - ST- IA by [14]. An assignment A = admissible iff f (X ) < f (X) for a minimization problem.
R, T  consists a set of robots R and a set of tasks T . Given • Admissible Dual Transformation: Zd : (u, v) → (u , v )
matrix C = (cij )n×n , where cij : R × T → R+ represents is an admissible dual transformation if the size for the set of
the cost of having robot i perform task j. In our work, feasible reduced costs increases, i.e., (u , v ) = Zd (u, v) is
n = |R| = |T |, the number of robots is identical to the number admissible iff |{(i, j) | c̄ij ≥ 0}| > |{(i, j) | c̄ij ≥ 0}|.
of tasks (otherwise dummy rows/columns can be inserted).
IV. TASK SWAPPING AND OPTIMALITY
A. Formulations Any primal transformation X = Zp (X) is easily visualized
This problem can be formulated with an equivalent pair of 
by superimposing both X and X on an assignment matrix.
linear programs. The primal is a minimization formulation:
 Shown as shaded and bold-edged entries in Fig. 1(a), the
minimize f = cij xij , transformations can be interpreted as row-wise and column-

i,j wise aligned arcs, each of which bridges exactly one shaded
subject to xij = 1, ∀i, entry (old assignment) and exactly one bold-edged entry (new
j (1) assignment). Note that both types of such entries have reduced

xij = 1, ∀j, costs equal to 0s. Connecting the beginning to the end closes
i the path to form what we call a swap loop, which is easily
xij ≥ 0, ∀(i, j). imagined as a subset of robots handing over tasks in a chain.
where an optimal solution eventually is an extreme point of If a swap loop shares no path segment with any other, it is
its feasible set (xij equals to 0 or 1). Let binary matrix X = termed independent.
{xij }, ∀(i, j) contain
  the primal variables. The constraints
x = 1 Theorem 4.1: A primal transformation X = Zp (X) where
ij and i xij = 1 enforce a mutual exclusion
j
property. There are corresponding dual vectors u = {ui } and (X = X ) forms a (non-empty) set of independent swap loops.
v = {vj }, with dual linear program: Proof: The mutual exclusion property proves both parts.
  Independence: if a path is not independent, there must be at
maximize g = ui + vj , least one segment that is shared by multiple paths. Any such
i j (2)
segment  contradicts the
 mutual exclusion constraints since
subject to ui + vj ≤ cij , ∀(i, j). either i xij > 1, or j xij > 1, or both.
B. Reduced Cost, Complementary Slackness, and Feasibility Closeness: a non-closed
 path has
end entries that are exposed;
Reduced costs are defined as follows: but this leads to j xij = 0 or i xij = 0.
c̄ij = cij − ui − vj , ∀(i, j). (3) Assume Sswp = {swpλ } (λ ∈ [1, m]) is a set of swap
loops where swpλ denotes the λth swap loop. Let primal
For a maximization dual as shown in Program (2), its 
transformation X → X with specific set of swap loops Sswp
constraint shows that an assignment pair (i, j) is feasible when 

and only when c̄ij ≥ 0. also be denoted as X = Zp Sswp (X).

258
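To make the swap-loop picture concrete, the cyclic task hand-over described above can be sketched in a few lines of Python. This is an illustrative sketch, not the paper's implementation; the cost matrix, loop, and helper names are chosen arbitrarily for the example:

```python
# Hypothetical illustration: a swap loop is a cyclic hand-over of tasks
# among a subset of robots. Applying it keeps the assignment a permutation
# (mutual exclusion), and an admissible loop decreases the total cost.
def apply_swap_loop(assignment, loop):
    """assignment[i] is the task held by robot i.
    loop is a list of robot indices; robot loop[k] takes the task
    currently held by robot loop[k+1] (cyclically)."""
    new = list(assignment)
    for k, r in enumerate(loop):
        new[r] = assignment[loop[(k + 1) % len(loop)]]
    return new

def total_cost(cost, assignment):
    return sum(cost[i][assignment[i]] for i in range(len(assignment)))

cost = [[4, 1, 3],
        [2, 0, 5],
        [3, 2, 2]]
a = [0, 1, 2]                   # initial assignment: robot i -> task i
b = apply_swap_loop(a, [0, 1])  # robots 0 and 1 exchange tasks

assert sorted(b) == [0, 1, 2]               # still a permutation
assert total_cost(cost, b) < total_cost(cost, a)  # this swap is admissible
```

A swap loop of length two is the familiar pairwise exchange; longer loops rotate tasks along a chain of robots, which is exactly the structure Theorem 4.1 extracts from any improving primal transformation.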
Fig. 2. Amalgamation allows synthesis of complex swap loops from multiple dependent swap loops. Overlapped path segments cancel each other out.

Theorem 4.2: A primal transformation involving mutually independent swap loops Sswp = {swp1, swp2, · · · , swpm} can be separated and chained in any random order, i.e., X′ = Zp^{swp1}(Zp^{swp2} · · · Zp^{swpm}(X)).
Proof: A primal transformation is isomorphic to a set of row and column permutations. Assume the row and column permutation matrices (each a square orthogonal binary doubly stochastic matrix) corresponding to set Sswp are P and Q, so that PXQ permutes the rows and columns of X appropriately. If row i is unaffected, the ith column of P is pi = ei (the ith column of the identity matrix), and then P = Π_{λ=1}^{m} Pλ, where Pλ represents the separated permutation matrix for the λth swap loop, will have a non-interfering form so that the order of the product does not matter. Thus we have X′ = PXQ = P1 P2 · · · Pm X Qm Qm−1 · · · Q1 (the order of the Pλ's does not matter, nor do the Qλ's analogously), which is equivalent to X′ = Zp^{swp1}(Zp^{swp2} · · · Zp^{swpm}(X)).

However, many times independent swap loops cannot be directly obtained. Instead, an independent swap loop may be composed of multiple dependent swap loops that share rows/columns on some path segments.

Theorem 4.3: Two dependent swap loops with overlapping, reversed segments can be amalgamated into a new swap loop, and vice versa.
Proof: A directed path segment can be conveniently represented as a vector π. Path segments π1 and π2 sharing the same rows or columns, but with different directions, cancel via π = π1 + π2, which has the interpretation of a task (robot) handed from one robot (task) to another, but then passed back again. Such cancellation must form a loop because each merger collapses one pair of such segments, consistently connecting two partial loops. The opposite operation (decomposition) involves analogous reasoning.

While the ordering of independent swap loops is unimportant, the number, size, and order of dependent loops matter.

Theorem 4.4: When K < n, K-swaps are susceptible to local minima.
Proof: A K-swap loop involves at most K robots and K assigned tasks. Quiescence results by reaching equilibrium after sufficient K-swaps so that no more swaps can be executed. Robots and their assigned tasks involved in the K-swaps can form a smaller sub-assignment of size K. Thus, we have (n choose K) possible such sub-assignments, and all of them are optimal at equilibrium. Assume the set of these sub-assignments is SA = {Aγ}, where γ ∈ [1, (n choose K)]. Aγ = {Rγ, Tγ} represents the sub-assignment with robot (task) index sets |Rγ| = K (|Tγ| = K). Therefore, the dual program for each sub-assignment is:

  max g(Aγ) = Σ_{i∈Rγ} ui + Σ_{j∈Tγ} vj,                      (5)
  subject to ui + vj ≤ cij, ∀i ∈ Rγ, j ∈ Tγ.                  (6)

If we put all the sub-assignments together, the objective of the whole assignment problem can be written in the form

  (n−1 choose K−1)⁻¹ Σ_{γ∈[1,|SA|]} max g(Aγ),                (7)

where the first term in the product accounts for the repeated summation of each dual variable. By the fact that Σ_{∀i} (max zi) ≥ max Σ_{∀i} zi (the zi are arbitrary functions), we have

  (n−1 choose K−1)⁻¹ Σ_{γ∈[1,|SA|]} max g(Aγ) ≥ max g(A),     (8)

where A is the original n × n assignment. With the duality theorems, this is equivalent to

  (n−1 choose K−1)⁻¹ Σ_{γ∈[1,|SA|]} min f(Aγ) ≥ min f(A).     (9)

So even completing every possible K-swap, and doing so until equilibrium is reached, may still end sub-optimally.

V. AN OPTIMAL SWAP-BASED PRIMAL METHOD

The preceding results suggest that to obtain the optimal primal transformation, one seeks a set of independent swap loops, but that these can be equivalently sought as a series of dependent swap loops. The primal assignment method we describe achieves this iteratively and avoids local minima because later swaps may correct earlier ones based on "enlarged" views that examine increasing numbers of rows and columns. The essence of the primal assignment method is that, at any time, the primal solution's feasibility is maintained (i.e., the mutual exclusion property is satisfied), while infeasible dual variables are manipulated under the complementary slackness condition. At each iteration either an admissible primal transformation is found, or a new improved set of dual variables is obtained. Once all reduced costs are feasible, the primal and dual solutions simultaneously reach their (equal valued) optimum.

The whole algorithm is organized in Algorithm V.1, and critical steps are described in Algorithms V.2–V.4 in some detail to ensure that the pseudo-code is appropriate for straightforward implementation.

Algorithm V.1 PRIMAL(C)
 1: init arrays u[n] := {0}, v[n] := diag(C), C̄[n][n] := {0}
 2: for i := 1 to n do
 3:   update matrix C̄ with c̄i′j′ = ci′j′ − ui′ − vj′, ∀i′, j′
 4:   if min{C̄[:][i]} < 0 then
 5:     array δ[n] := {0}
 6:     heap h[n] := PRE-PROCESS(C̄, u, v)
 7:     check the ith column of C̄, get smallest-valued entry (x, y)
 8:     SWAP LOOP(C̄, δ, h, x, y)
 9:     for j := 1 to n do
10:       u[j] := u[j] + δ[a(j)], v[j] := v[j] − δ[j]
11:     v[y] := v[y] − |C[x][y] − ux − vy| so that C̄[x][y] = 0
12:     if a swap loop was found, swap tasks to augment the solution
Note: Variables ui and u[i] are equivalent; vector v ≡ v[n], matrix C̄ ≡ C̄[n][n].
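As a sketch of the bookkeeping behind Steps 1 and 4 of Algorithm V.1 (an illustration with assumed data, not the authors' C++ implementation): initializing u to zero and v to diag(C) makes every entry of an initial identity assignment have zero reduced cost, so complementary slackness holds from the outset, and a stage fires whenever some column still contains a negative reduced cost:

```python
import numpy as np

# Hypothetical cost matrix; robot i initially assigned to task i.
C = np.array([[4., 1., 3.],
              [2., 0., 5.],
              [3., 2., 2.]])
n = C.shape[0]

u = np.zeros(n)            # Step 1: u starts at zero
v = np.diag(C).copy()      # Step 1: v starts at the diagonal of C
C_bar = C - u[:, None] - v[None, :]   # reduced costs, Eq. (3)

# Assigned (diagonal) entries satisfy complementary slackness, Eq. (4).
assert np.allclose(np.diag(C_bar), 0)

# Step 4: columns still holding a negative reduced cost are the ones a
# swap-loop stage must repair.
infeasible_cols = [j for j in range(n) if C_bar[:, j].min() < 0]
```

With the sample matrix above only column 0 is infeasible (robot 1 would rather do task 0 than its current task at these duals), so a single stage would start its swap-loop search there.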
A. Algorithm V.2: Pre-processing

At each stage, the reduced cost matrix C̄ is pre-processed before searching for a swap loop: a separate min-heap is used to maintain the feasible reduced costs in each row, such that the smallest values (root elements) can be extracted or removed efficiently.

Algorithm V.2 PRE-PROCESS(C̄, u, v)
1: initiate n min-heaps h[n] := {null}
2: for i := 1 to n do
3:   for j := 1 to n do
4:     if C̄[i][j] ≥ 0 AND j ≠ a(i) then
5:       make pair p := ⟨label = j, value = C̄[i][j]⟩
6:       insert p into h[i]
7: return min-heaps h

B. Algorithm V.3: Searching for Swap Loops

Any swap loop yields an admissible primal transformation. Loops are sought by bridging path segments in the reduced cost matrix. A horizontal path segment is built from a currently assigned entry to a new entry with reduced cost of zero in the same row. Vertical path segments are implicitly identified as running from unassigned entries equal to zero to the unique assigned entry in the respective column. Fig. 3(a) shows the process. The search uses a tree, expanded in a breadth-first fashion, to find the shortest loop; a dead-end (i.e., an empty queue) triggers the dual adjustment step.

Algorithm V.3 SWAP LOOP(C̄, h, δ, x, y)
 1: starting row rs := x, column ts := y, SR := ST := ∅
 2: initiate r := a⁻¹(y), t := y, Vpath(t : rs → r)
 3: push r into queue Q, color(SR ∪ {r}; ST ∪ {ts})
 4: while Q not empty AND Q.front ≠ rs do
 5:   r := Q.front, Q.pop once
 6:   initiate set Sδ := {r}
 7:   for each r ∈ Sδ do
 8:     t := h[r].extract.label
 9:     while p(r, t)† = 0 do
10:       if t ∉ ST then
11:         Hpath(r : a(r) → t), Vpath(t : r → a⁻¹(t))
12:         push a⁻¹(t) into Q, color(SR ∪ {a⁻¹(t)}; ST ∪ {t})
13:       h[r].remove root element and update root
14:       update t := h[r].extract.label
15:   if Q empty then
16:     DUAL ADJ(C̄, Q, h, δ, Sδ, rs, ts)
17:     if updated Sδ not empty then
18:       go to STEP 7
19:     return
20: Hpath(rs : t → ts), form a loop
†: p(·) is a projection of reduced cost, defined in (11).

In Algorithm V.3, the function t = a(r) denotes that the assignment for r is t and is thus used to extract the column index for a given row index; the inverse does the reverse. Horizontal (vertical) segments are constructed via Hpath(cur row : col1 → col2) (Vpath(cur col : row1 → row2)), where the three arguments represent the current row (column) containing the path, the starting column (row), and the ending column (row) of the segment, respectively. The coloring of visited rows/columns is merely the set union operation.

Fig. 3. (a) Path segments are bridged with one another while searching for swap loops. Shaded entries are currently assigned, and bold-edged entries have reduced costs equal to zero. Waved lines represent the paths found after dual adjustments; (b) The associated tree data structure that aids efficient searching.

C. Algorithm V.4: Dual Adjustments

If all branches hit dead-ends, the dual adjustment introduces entries with 0-valued reduced costs so that the tree can be further expanded. This is done by changing the values of dual variables, which indirectly changes the reduced costs of the corresponding entries. Doing so can only increase the size of the set of feasible reduced costs; thus the dual adjustment will never deteriorate the current result. The method subtracts the smallest feasible reduced cost from all visited (colored) rows and adds it to every visited column, producing at least one new 0-valued reduced cost. Waved arrows in Fig. 3 illustrate this procedure.

Algorithm V.4 DUAL ADJ(C̄, Q, h, δ, Sδ, rs, ts)
 1: array top d[n] := {∞}, col d[n] := {0}
 2: for i := 1 to n do
 3:   if row i ∈ SR then
 4:     top d[i] := p(i, h[i].extract.label)
 5: min δ := min{top d[i]}
 6: if min δ > 0 then
 7:   update Sδ := {i | top d[i] = min δ}
 8: else
 9:   min δ := −p(rs, ts)
10: for i := 1 to n do
11:   if row i ∈ SR then
12:     update δ[a(i)] := δ[a(i)] + min δ
13:     col d[i] := p(i, ts)
14: if min{col d[i]} ≥ 0 then
15:   terminate current stage
16: update starting row rs := argmin_i {col d[i]}
17: if rs ∈ SR then
18:   Hpath(rs : a(rs) → ts), form a loop
19:   terminate current swap loop searching

Next, we return to the relation of this method to Balinski and Gomory's primal technique [22]. Theoretical complexity and empirical results below show the superiority of the swap-based approach. Nevertheless, it is worthwhile to address the conceptual differences in detail as a common underlying idea is involved: they employ an iterative labelling and updating technique to seek a chaining sequence of alternating primal variables, which are used to adjust and augment the primal solutions. Three aspects worth highlighting are:
1) The swap loop search incorporates the dual adjustment procedure. Balinski-Gomory's method may require n rounds of traversals and cost O(n) times more than the traversal based on building and maintaining our tree. This modification is most significant for the decentralized context as each traversal involves communication overhead.
2) Instead of directly updating u, v, the δ array accumulates the dual variable adjustments during each stage. All updates
are transferred to u and v after the whole stage is terminated:

  u′i = ui + Σω δ(ω)_{a(i)} if i ∈ SR, and u′i = ui otherwise;
  v′j = vj − Σω δ(ω)_j if j ∈ ST, and v′j = vj otherwise,          (10)

where SR and ST are the index sets of colored rows and columns, respectively, and ω is the index of iterations. The benefit lies in that the reduced costs in the whole matrix need not be updated on each dual variable adjustment. Instead, a query of the reduced cost c̄ij for an individual entry (i, j) during an intermediate stage can be obtained via a projection p(i, j):

  c̄′ij = p(i, j) = c̄ij + δj − δa(i).                              (11)

3) Swap loops are found more efficiently: for example, the heaps, coloring sets, and tree with alternating tree nodes (assigned entries with n-ary branches, and unassigned entries with unary branches) quickly track the formation of loops even when the root is modified (Step 16 of Algorithm V.4).

D. Correctness

Assume the starting infeasible entry of the matrix is (k, l) with reduced cost c̄kl = ckl − uk − vl < 0.

Theorem 5.1: Once a task swap loop starting from entry (k, l) is obtained, the task swaps must lead to an admissible primal transformation.
Proof: The term cij xij contributes to f(X) = Σ_{i,j} cij xij only when the binary variable xij = 1 and cij = ui + vj (see (4)). With (10):

  f(X′) − f(X) = Σ_{i,j} (u′i + v′j) − Σ_{i,j} (ui + vj)
               = Σ_i (u′i − ui + v′a(i) − va(i))
               = Σ_i (Σω δ(ω)_{a(i)} − Σω δ(ω)_{a(i)}) − |c̄kl|    (12)
               = c̄kl < 0,

(see Step 11 in Algorithm V.1). So after a swap, the value of the primal objective must decrease.

Theorem 5.2: If no task swap loop starting from entry (k, l) is found, an admissible dual transformation must be produced.
Proof: First, feasible reduced costs remain feasible:

  c̄′ij = cij − u′i − v′j
       = cij − ui − Σω δ(ω)_{a(i)} − vj + Σω δ(ω)_j
       = c̄ij ≥ 0,                 ∀i ∈ SR, j ∈ ST                  (13)
       = c̄ij − Σω δ(ω)_{a(i)} ≥ 0, ∀i ∈ SR, j ∉ ST.

Second, at least c̄kl will become feasible, which leads to termination before the formation of a swap loop, even in the sophisticated strategy allowing dynamic updating of the starting entry (see Step 16 of Algorithm V.4). This proves that the set of feasible reduced costs must increase.

E. Time Complexity

Algorithm V.1 requires at most n stages because in each stage the smallest infeasible reduced cost in each column is selected (Step 7 of Algorithm V.1); all other infeasible entries in the same column will thus also become feasible. The pre-processing using min-heaps for any stage requires O(n² lg n). During each stage, there are at most n DUAL ADJs in the worst case and each needs O(n) time to obtain min δ via the heaps. Visited columns are colored in a sorted set and are never considered for future path bridging in any given stage. There are at most n² entries to color and check, each costing O(lg n), yielding O(n² lg n) per stage. Therefore, the total time complexity of the whole algorithm is O(n³ lg n), and the light-weight operations lead to a small constant factor.

By way of comparison, Balinski-Gomory's primal method [22] uses O(n²) searching steps with O(n²) time complexity for each step. Some researchers [23, 24] have suggested that it may be possible to further improve the time complexity to O(n³) using techniques such as the blossom method [11]. To the best of our knowledge, no such variant has been forthcoming.

In addition, although the min-heaps in Algorithm V.2 are created in a separate step for reasons of algorithmic description, in practice they can be constructed on the fly only when required, through which a better practical running time can be obtained although the time complexity is unchanged. Experimental results also show that using a fast approximation algorithm for initialization produces running times close to the fastest existing assignment algorithms with O(n³) time complexity.

Fig. 4. Swap loop searching in a multi-robot system using Euclidean distance as the cost metric. Circles represent robots and triangles denote the tasks. The graphs can also be interpreted as Hypergraphs [25].

VI. DISTRIBUTED VARIANTS

Distributed variants of our primal method are easily obtained. Swap loops are searched via message passing: messages carrying the dual variables (u, v) and dual updates δ are passed down the tree while searching progresses. The idea is illustrated in Fig. 4 for a single swap loop searching stage with four robots. The lines in Fig. 4(a) show the initial pairwise robot-task assignment; the arrows in Fig. 4(b) show bridging edges found by searching for a swap loop starting from a selected pair. If the path's ending pair connects to the starting pair, then a swap loop has been found (Fig. 4(c)) and tasks may be exchanged among the robots in the loop. The new assignment is finally shown in Fig. 4(d).

Unlike centralized algorithms, the cost matrix may not be globally visible. Instead, each robot maintains and manipulates its own cost vector associated with all tasks. A noteworthy feature is that a robot need not know the cost information of other robots, since the two arrays of dual variables are shared. We do assume that the initial assignment solution and the corresponding costs for the assigned robot-task pairs are known by all robots, so the initial reduced costs for each robot may be calculated locally.

The algorithm has two roles: an organizer robot that holds the starting infeasible entry, and the remainder being member
robots (but with unique IDs). The organizer initiates a swap loop search iteration at stage m by communicating a message containing the dual information u^{m−1}, v^{m−1} obtained from stage m−1, as well as a newly created dual increment vector δ^m. A successor robot is located from either the assignment information or the newly found feasible and orthogonal entries satisfying the complementary slackness, as presented in the centralized version. When a path can no longer be expanded, member robots at the respective "dead-ends" request a dual adjustment from the organizer. Once the organizer has collected requests equal to the number of branches, it computes and transmits δ^m. The process continues until a swap loop is found and tasks are exchanged. At this point, the organizer either re-elects itself as the next stage's organizer, or hands over the role to another robot, based on the different strategies discussed below. The roles are described in Algorithms VI.1 and VI.2.

Algorithm VI.1 Organizer(u^{m−1}, v^{m−1}, δ^m)
 1: initiate: only once
 2:   decide starting entry ⟨xm, ym⟩ for current stage m
 3:   send msg(u^{m−1}, v^{m−1}) to member with ID a⁻¹(y)
 4: listening:
 5:   if all involved IDs request dual adjustments then
 6:     compute δ^m, send it to the corresponding ID(s)
 7:   endif
 8:   if swap loop formed then
 9:     with δ^m, update u^{m−1}, v^{m−1} to u^m, v^m for the next stage
10:     decide next organizer j and send msg(u^m, v^m) to ID j

Algorithm VI.2 Member[i](organizer ID, u^{m−1}, v^{m−1}, δ^m)
1: update c̄ij ∀j with received u^{m−1}, v^{m−1}, δ^m
2: if {j | c̄ij = 0} ≠ ∅ then
3:   for each j of {j | c̄ij = 0, j ≠ a(i)} do
4:     send (u^{m−1}, v^{m−1}, δ^m) to ID a⁻¹(j)
5:   send newly involved IDs and No. of new branches to organizer
6: else
7:   send min{c̄ij | c̄ij > 0, ∀j} to organizer, request dual adjustment

Once a reduced cost becomes feasible it never becomes infeasible again (see Theorem 5.2), so the algorithm needs to iteratively transform each infeasible reduced cost to approach global optimality. Two different approaches for locating and transforming the infeasible values lead to two versions of the algorithm: task-oriented and robot-oriented variants.

A. Task Oriented Variant

The task oriented approach attempts to cover all infeasible reduced costs of one task before moving to the costs of other tasks; it thus operates column-wise in the cost matrix. The task oriented approach mimics the procedure of the centralized version: for any given task (column), the robot holding the smallest projected infeasible reduced cost is elected as organizer. During the swap loop searching stages, it is possible that after some DUAL ADJs a member can hold a "worse" projected infeasible reduced cost. Therefore, after each update of δ, the organizer must check all involved members within the current tree, and hand over the organizer role if necessary.

Fig. 5. Illustrations of the task oriented (a) and robot oriented (b) strategies. Here shaded entries have infeasible reduced costs. Solid and void stars represent the current starting entry and the (possibly) next starting entry, respectively.

B. Robot Oriented Variant

The robot oriented method aims to cover all infeasible reduced costs of one robot before transferring to another robot; it works in a row-wise fashion. The organizer is randomly selected from all members that hold infeasible reduced costs, and keeps the role for the whole stage. Monitoring of "worse" projected costs is not required, but each stage only guarantees that the starting entry will become feasible, not others. This means the organizer may need to iteratively fix all its associated infeasible reduced costs at each stage before transferring the role to a successor organizer.

To compare: (A.) the advantage of the task-oriented scheme lies in that at most n stages are needed to reach global optimality, since each stage turns all infeasible reduced costs associated with a task feasible. Its disadvantage is the extra communication, because at the beginning of each stage the member holding the smallest reduced cost for the chosen task has to be determined; additional communications are involved in the monitoring aspect too. (B.) The robot oriented strategy has greater decentralization and eliminates the extra monitoring communication (the disadvantage mentioned for the task oriented scheme). At any stage only a subset of robots need be involved and no global communication is required. The disadvantage of this variant is that a total of O(n²) stages (note, each stage is still equivalent to O(n) steps/stages of Balinski-Gomory's method) is needed.

VII. EXPERIMENTS

Three forms of experiment were conducted: run-time performance of the centralized algorithm, access pattern analysis, and comparison of the decentralized variants.

A. Algorithmic Performance Analysis

Fig. 6. Comparison of running times: (a) Times for an optimized Hungarian method, Balinski-Gomory's method, and the swap-based algorithm. The primal methods start with random initial solutions; (b) Running time is improved when the algorithm is combined with a fast approximation method.

We implemented both our swap-based algorithm and Balinski-Gomory's method in C++ (with STL data structures), and used an optimized implementation of the Hungarian algorithm (O(n³) complexity) available in the dlib library
(a) (b) (a) (b)
Fig. 7. (a) Linear solution quality and running time from different initial Fig. 8. Quantities of involved rows (robots) and lengths of swap loops over
solutions (matrix size: 100×100); (b) Entries traversed during stages. stages (matrix size is 100×100). (a) Results from random initial solutions;
(b) Results from greedy initial solutions.
(http://dlib.net) for comparison. The experiments were
run on a standard dual-core desktop with 2.7GHz CPU with results in a large reduction in accesses: the average is ∼ 100
3GB of memory. Fig. 6(a) shows the performance results. for each stage, in contrast with Balinski-Gomory’s method
We can see that the swap-based algorithm has a significantly requiring ∼ 700 with larger standard deviations (actually,
improved practical running time over the Balinski-Gomory’s reaching more than 8, 000 traversals when many dual ad-
method. The flexibility of the algorithm allowed for further justments occur). The results quantify claims made about the
improvement: fast approximation algorithms can give a rea- swap-based method fitting a decentralized paradigm.
sonable initial assignment. Fig. 6(b) shows the improvement We also investigated the total number of rows (and, cor-
using an extremely cheap greedy assignment that assigns the respondingly, columns) involved during each stage, which
robot-task pairs with lowest costs first, in a greedy manner. reflects the number of involved robots in decentralized ap-
This reduces the practical running time to be very close to the plications, as well as the size of swap loop formed at the end
Hungarian algorithm, especially for matrices with n < 300. of the stages (defined as the number of colored rows). Fig. 8
To analyze solution quality as a function of running-time, show results from random (left plot) and greedily (right plot)
we computed scenarios with 100 robots and 100 tasks with initiated solutions. We see that the number of involved rows
randomly generated cij ∈ [0, 104 ] ∀i, j. The solution qualities can be significantly reduced given better initial solutions, and
and consumed time for individual stages is illustrated in loops are comparatively small for either cases.
Fig. 7(a). The solution quality is measured by parameter η More detailed statistics are given in the table below. We
calculated as a ratio of current solution αi at current stage i conclude that improving initial assignment solutions, not only
to the final optimum αn , i.e., η = αi /αn ≥ 1. In each figure, improves running time, but also the degree of locality in
the three series represent initial assignments with different communication and computation. The averaged longest swap
“distances” to the optimal solution. A 60% processed initial lengths show that the admissible primal transformations are
solution means the initial solution is α60 (the solution output a series of small swaps (one can regard the longest length
at 60th stage from a random initialization). The matrix is equivalent to K of K-swaps), but which still attains optimality.
column-wise shuffled before the input of a processed solution S TATISTICS OF S WAP L OOPS AMONG S TAGES (M ATRIX SIZE 100)
such that a new re-computation from scratch can be executed Initial solution No. loops Avg. length Avg. longest Avg. involved
(otherwise it is equivalent to the continuing computation). We can see that the solution qualities for all three scenarios change approximately linearly with the number of stages, which indicates the "step length" for the increment is a constant.

From this observation, computational resources and solution accuracy are fungible, as each is controllable in terms of the other. Given a current solution αm at the mth stage (m ≥ 1) as well as an initial solution α0, the optimum can be estimated:

    α̂n = α0 + (n/m)(αm − α0) = α0 − nΔs,    (14)

where Δs = (α0 − αm)/m ≥ 0 is the step length of the per-stage improvement. To bound the accuracy within 1 + ε, where ε ≥ 0, assume we need to stop at the θth stage; then

    (α0 − θΔs)/α̂n ≤ 1 + ε  =⇒  θ ≥ [α0 − (1 + ε)(α0 − nΔs)]/Δs.    (15)

random initiation    97.12   10.16   21.06   46.72
30% processed        71.90    7.34   19.97   34.29
60% processed        47.20    4.46   14.56   23.92
greedy initiation    24.86    2.30   11.80   16.14

Note: The last three columns denote the averaged lengths, the averaged longest lengths of swap loops, and the averaged number of colored rows in single stages, respectively.

B. Access Patterns Imply Suitability for Distribution

Intuitively, entries traversed in the spanning tree during each stage reflect the cost of communication.* Thus, we compared the access pattern of our swap-based algorithm with Balinski–Gomory's method on 100 × 100 matrices with random initial assignments. Fig. 7(b) shows that swap-loop traversal

*Every traversed entry on the path segments, whether it is assigned or unassigned, must connect to a new entry in other rows, requiring a message to be passed. The number is approximately half of all the traversed entries, since each entry is counted twice in the analysis of communication complexity.

C. Results from Decentralized Variants

We also implemented both variants of the decentralized algorithm and distributed them over five networked computers for testing. The implementations can be directly applied to distributed multi-robot task assignment, e.g., to the test routing problems in [26, 27]. The hosts were given unique IDs from 1 to 5, and communication was performed via UDP, with each host running a UDP server to listen for the messages sent by its peers. Information such as the IDs of machines, values of dual variables, and requests for dual adjustments was encoded via simple protocols over the message passing. To initiate the

system, we inject 5 tasks with IDs from 1 to 5, and each machine randomly generates an array of cost values associated with these 5 tasks. The initial allocation assigns every machine to the task with the same ID as its own; the corresponding costs for these assigned pairs are communicated. An initial organizer is randomly selected.

Both distributed variants of the algorithm were tested. Fig. 9(a) shows the number of stages used for the two schemes (average and variance for 10 separate instances). Fig. 9(b) and Fig. 9(c) show the communication cost (number of messages) and the robots involved (those ever having received/processed messages) per stage, respectively. These empirical results also validate the claims made above: (i) the task-oriented scheme requires fewer stages, but has greater communication per stage; (ii) although the robot-oriented method uses more stages, less communication is required and fewer robots are involved, indicating more local computation and communication.

Fig. 9. Performance of the task-oriented (T-O) and robot-oriented (R-O) decentralized implementations. Measurements of 5 hosts to 5 tasks.

VIII. CONCLUSION

Strategies of task swaps are a natural paradigm for decentralized optimization and have been used for years (and identified independently by several groups). It is now, using the algorithm we present, that optimality can be guaranteed with these same primitive operations. Additionally, we have sought to emphasize the useful any-time aspect of primal techniques. In summary, we highlight features of the introduced method:

Natural primitives and optimality: the method is based on swap loops, a generalization of O-contracts, task-exchanges, and K-swaps; these are techniques which have intuitive interpretations in distributed systems and natural implementations. However, unlike other swap-based methods, global optimality can be guaranteed.

Computational flexibility and modularity: the algorithm can start with any feasible solution and can stop at any non-decreasing feasible solution. It can be used as a portable module to improve non-optimal assignment methods, e.g., some variants of market-based, auction-like methods.

Any-time and efficiency: unlike primal techniques for general LPs, optimality is reached within strongly polynomial time. Initialization with fast approximation methods makes it competitive practically, and it can potentially be further accelerated. Additionally, the linear increase in solution quality makes balancing between computation time and assignment accuracy possible.

Ease of implementation: the algorithm uses simple data structures with a straightforward implementation that is much simpler than comparably efficient techniques.

Ranked solutions: assignments are found with increasing quality, allowing fast transitions to good choices without recomputation if commitment to the optimal assignment fails.

Decentralized variants, local computation and communication: a small subset of robots is found to be typically involved. The decentralized variants of the algorithm require no single privileged global controller. They allow one to choose to trade between decentralization (communication) and running time (number of stages).

REFERENCES

[1] S. Zilberstein, "Using Anytime Algorithms in Intelligent Systems," AI Magazine 17(3), 1996.
[2] L. E. Parker, "Multiple Mobile Robot Systems," in Handbook of Robotics, B. Siciliano and O. Khatib, Eds. Springer, 2008, ch. 40.
[3] M. Golfarelli, D. Maio, and S. Rizzi, "Multi-agent path planning based on task-swap negotiation," in Proc. UK Planning and Scheduling Special Interest Group Workshop, 1997, pp. 69–82.
[4] M. B. Dias and A. Stentz, "Opportunistic optimization for market-based multirobot control," in Proc. IROS, 2002, pp. 2714–2720.
[5] L. Thomas, A. Rachid, and L. Simon, "A distributed tasks allocation scheme in multi-UAV context," in Proc. ICRA, 2004, pp. 3622–3627.
[6] T. Sandholm, "Contract types for satisficing task allocation: I Theoretical results," in AAAI Spring Symp: Satisficing Models, 1998, pp. 68–75.
[7] P. Stone, G. A. Kaminka, S. Kraus, and J. S. Rosenschein, "Ad Hoc Autonomous Agent Teams: Collaboration without Pre-Coordination," in Proc. AAAI, 2010.
[8] X. Zheng and S. Koenig, "K-swaps: cooperative negotiation for solving task-allocation problems," in Proc. IJCAI, 2009, pp. 373–378.
[9] H. W. Kuhn, "The Hungarian Method for the Assignment Problem," Naval Research Logistic Quarterly 2:83–97, 1955.
[10] R. Burkard, M. Dell'Amico, and S. Martello, Assignment Problems. New York, NY: Society for Industrial and Applied Mathematics, 2009.
[11] J. Edmonds and R. M. Karp, "Theoretical Improvements in Algorithmic Efficiency for Network Flow Problems," J. ACM 19(2):248–264, 1972.
[12] D. P. Bertsekas, "The auction algorithm for assignment and other network flow problems: A tutorial," Interfaces 20(4):133–149, 1990.
[13] A. V. Goldberg and R. Kennedy, "An Efficient Cost Scaling Algorithm for the Assignment Problem," Math. Program. 71(2):153–177, 1995.
[14] B. P. Gerkey and M. J. Matarić, "A formal analysis and taxonomy of task allocation in multi-robot systems," IJRR 23(9):939–954, 2004.
[15] M. Nanjanath and M. Gini, "Dynamic task allocation for robots via task auctions," in Proc. ICRA, 2006, pp. 2781–2786.
[16] S. Giordani, M. Lujak, and F. Martinelli, "A Distributed Algorithm for the Multi-Robot Task Allocation Problem," LNCS: Trends in Applied Intelligent Systems, vol. 6096, pp. 721–730, 2010.
[17] M. B. Dias, R. Zlot, N. Kalra, and A. Stentz, "Market-Based Multirobot Coordination: A Survey and Analysis," Proc. of the IEEE, 2006.
[18] S. Koenig, P. Keskinocak, and C. A. Tovey, "Progress on Agent Coordination with Cooperative Auctions," in Proc. AAAI, 2010.
[19] M. G. Lagoudakis, E. Markakis, D. Kempe, P. Keskinocak, A. Kleywegt, S. Koenig, C. Tovey, A. Meyerson, and S. Jain, "Auction-based multi-robot routing," in Robotics: Science and Systems, 2005.
[20] M. M. Zavlanos, L. Spesivtsev, and G. J. Pappas, "A Distributed Auction Algorithm for the Assignment Problem," in Proc. CDC, 2008.
[21] G. Dantzig, Linear Programming and Extensions. Princeton University Press, Aug. 1963.
[22] M. L. Balinski and R. E. Gomory, "A primal method for the assignment and transportation problems," Management Sci. 10(3):578–593, 1964.
[23] W. Cunningham and A. B. Marsh, III, "A Primal Algorithm for Optimum Matching," Mathematical Programming Study, pp. 50–72, 1978.
[24] M. Akgül, "The linear assignment problem," Combinatorial Optimization, pp. 85–122, 1992.
[25] L. Liu and D. Shell, "Large-scale multi-robot task allocation via dynamic partitioning and distribution," Autonomous Robots 33(3):291–307, 2012.
[26] M. Berhault, H. Huang, P. Keskinocak, S. Koenig, W. Elmaghraby, P. Griffin, and A. J. Kleywegt, "Robot Exploration with Combinatorial Auctions," in Proc. IROS, 2003, pp. 1957–1962.
[27] L. Liu and D. Shell, "Assessing Optimal Assignment under Uncertainty: An Interval-based Algorithm," IJRR 30(7):936–953, 2011.

The Banana Distribution Is Gaussian:
A Localization Study with Exponential Coordinates
Andrew W. Long ∗ , Kevin C. Wolfe † , Michael J. Mashner † , Gregory S. Chirikjian †
∗ Northwestern University, Evanston, IL 60208
† Johns Hopkins University, Baltimore, MD 21218

Abstract—Distributions in position and orientation are central to many problems in robot localization. To increase efficiency, a majority of algorithms for planar mobile robots use Gaussians defined on positional Cartesian coordinates and heading. However, the distribution of poses for a noisy two-wheeled robot moving in the plane has been observed by many to be a "banana-shaped" distribution, which is clearly not Gaussian/normal in these coordinates. As uncertainty increases, many localization algorithms therefore become "inconsistent" due to the normality assumption breaking down. We observe that this is because the combination of Cartesian coordinates and heading is not the most appropriate set of coordinates to use, and that the banana distribution can be described in closed form as a Gaussian in an alternative set of coordinates via the so-called exponential map. With this formulation, we can derive closed-form expressions for propagating the mean and covariance of the Gaussian in these exponential coordinates for a differential-drive car moving along a trajectory constructed from sections of straight segments and arcs of constant curvature. In addition, we detail how to fuse two or more Gaussians in exponential coordinates together with given relative pose measurements between robots moving in formation. The propagation and fusion formulas utilized here reduce uncertainty in localization better than traditional methods. We demonstrate with numerical examples dramatic improvements in the estimated pose of three robots moving in formation when compared to classical Cartesian-coordinate-based Gaussian fusion methods.

Fig. 1. (a) Banana-shaped distribution for position of a differential-drive robot moving along a straight line with noisy wheel speeds. (b) Notation for differential-drive robot.

I. INTRODUCTION

A rich area of robotics research known as SLAM, or simultaneous localization and mapping, consists of a robot mapping its environment while estimating where it may be in this map. To incorporate uncertainty, one strategy represents all possible poses of the robot with a probability density function (pdf). The goal then is to maintain and update this pdf as the robot moves. One solution is to propagate the entire pdf, such as with Fourier transform techniques [26]. However, these techniques can be too numerically intensive for real-time SLAM applications.

According to Durrant-Whyte and Bailey, the two most common tools used to increase efficiency for SLAM algorithms are the extended Kalman filter (EKF) and the Rao-Blackwellized (RB) particle filter [8]. EKF-SLAM is based on the classic work of Smith et al., in which linearized models are used to propagate uncertainty of the non-linear motion and observations [21]. Several strategies for improving the efficiency for large-scale SLAM problems using this technique are documented by Bailey and Durrant-Whyte [2]. By using the RB particle filter, Murphy observed that the SLAM problem could be decomposed into a robot localization problem and several independent landmark estimation problems [18]. With this factorization, many efficient SLAM algorithms known as FastSLAM have been proposed [16, 17, 11].

An inherent technique utilized in EKF-SLAM and many FastSLAM algorithms for robots operating in the plane is to represent all distributions with Gaussians in Cartesian coordinates (x, y) and orientation heading angle (θ). This Gaussian/normal representation can be fully parameterized by a mean and covariance in these variables. Equal probability density contours of these distributions are described by ellipses. However, if we command a differential-drive robot to drive along a straight path, the resulting distribution from many sample paths may look similar to Fig. 1(a). Originally described by Thrun et al. [23], [24], this distribution is generally referred to as the "banana-shaped" distribution, which does not have elliptical probability density contours in the variables (x, y, θ). As the uncertainty grows, this assumption of normality breaks down and the maps inevitably become inconsistent [14].

Many have studied the problem of inconsistency for EKF-SLAM [14], [3], [5], [13], [6]. Julier and Uhlmann proved that this inconsistency is a direct result of linearization of the non-linear models in EKF-SLAM [14]. The extent of the inconsistency was studied by Bailey et al. in the context of heading uncertainty. They showed that if the standard deviation on the heading was larger than one or two degrees the map ultimately failed, specifically with regards to excessive information gain and jagged vehicle trajectories [3]. In addition, the quality of the normal assumption in these coordinates was

studied for the RB particle filter in [22], which demonstrated that several real-world examples were not well described by Gaussians.

In the majority of existing algorithms, there is an inherent assumption that the distribution should be represented by Gaussians in Cartesian coordinates. In this paper, we propose to use exponential coordinates and Lie groups to represent the robot's pose. We demonstrate that a Gaussian in these coordinates for planar robots provides a much better fit than a Gaussian in Cartesian coordinates as the uncertainty increases. By using this approach, we can derive closed-form propagation and fusion formulas that can be used to estimate the mean and covariance of the robot's pose.

In the last few years, the idea of using the Lie group nature of rigid-body transformations has drawn some attention for the SLAM problem, particularly in the area of monocular camera SLAM [15], [20], [10]. General robot motion utilizing rigid-body transformations has been applied with Consistent Pose Registration (CPR) [1] and complementary filters [4].

The outline of the paper is as follows. In Section II, we derive the stochastic differential equation for the differential-drive robot operating in the plane. We review rigid-body motions and their relationship to exponential coordinates in Section III. We provide the definitions for the mean, covariance and the Gaussian probability density function in these coordinates in Section IV. A comparison between the Gaussian in Cartesian coordinates and the Gaussian in exponential coordinates is detailed in Section V. In Section VI, we derive closed-form expressions that can be used to propagate the mean and covariance of the Gaussian in exponential coordinates. We conclude by deriving closed-form formulas for fusing two (or more) Gaussians in exponential coordinates in Section VII. These fusion formulas are then applied to multiple robots moving in formation.

II. STOCHASTIC DIFFERENTIAL EQUATION

The example we will be considering in this paper is the kinematic cart or differential-drive robot. This robot has two wheels, each of radius r, which can roll but not slip. The two wheels share an axis of rotation and are separated by a length ℓ. The configuration of the robot can be represented as x = [x, y, θ]^T, where (x, y) is the Cartesian position in the plane of the center of the axle and θ is the orientation of the robot, as shown in Fig. 1(b). For the configuration coordinates x, the governing differential equations are

    dx = (r/2) cos θ (dφ1 + dφ2),  dy = (r/2) sin θ (dφ1 + dφ2),  and  dθ = (r/ℓ)(dφ1 − dφ2),

where the rate of spinning of each wheel is given by ωi = dφi/dt for i = 1 or 2. If the wheel speeds are governed by stochastic differential equations (SDEs) as

    dφi = ωi(t) dt + √D dwi  for i = 1 or 2,    (1)

where the dwi are unit-strength Wiener processes and D is a noise coefficient, then one obtains the SDE (with matrix rows separated by semicolons)

    dx = [dx; dy; dθ] = [(r/2)(ω1 + ω2) cos θ; (r/2)(ω1 + ω2) sin θ; (r/ℓ)(ω1 − ω2)] dt
         + √D [(r/2) cos θ, (r/2) cos θ; (r/2) sin θ, (r/2) sin θ; r/ℓ, −r/ℓ] [dw1; dw2].    (2)

In this paper, we focus on the distributions for the robot, which deterministically would be driving straight or driving along an arc of constant curvature. For deterministic driving straight forward at speed v, the constant wheel speeds are

    ω1 = ω2 = v/r.    (3)

For deterministic driving along an arc of radius a counterclockwise at rate α̇, the wheel speeds can be shown to be

    ω1 = (α̇/r)(a + ℓ/2),  ω2 = (α̇/r)(a − ℓ/2).    (4)

III. REVIEW OF RIGID-BODY MOTIONS

The planar special Euclidean group, G = SE(2), is the semidirect product of the plane, R², with the special orthogonal group, SO(2). The elements of SE(2) can be represented with a rotational part R ∈ SO(2) and a translational part t = [t1, t2]^T as 3 × 3 homogeneous transformation matrices

    g = [R, t; 0^T, 1] ∈ SE(2),

where the group operator ◦ is matrix multiplication.

We will also make use of the Lie algebra se(2) associated with SE(2). For a vector x = [v1, v2, α]^T, an element X of the Lie algebra se(2) can be expressed as

    X = x̂ = [0, −α, v1; α, 0, v2; 0, 0, 0]  and  X∨ = x,    (5)

where the ∧ and ∨ operators allow us to map from R³ to se(2) and back.

By using the matrix exponential exp(·) on elements of se(2), we can obtain group elements of SE(2) as

    g(v1, v2, α) = exp(X) = [cos α, −sin α, t1; sin α, cos α, t2; 0, 0, 1],

where

    t1 = [v2(−1 + cos α) + v1 sin α]/α  and    (6)
    t2 = [v1(1 − cos α) + v2 sin α]/α.    (7)

Since we are using the exponential in this formulation, we will refer to the coordinates (v1, v2, α) as exponential coordinates.
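The SDE (2) can be integrated numerically with a simple Euler-Maruyama step. The sketch below is illustrative only: the paper uses a modified Euler-Maruyama method [12], and the parameter values here are chosen for the example, not taken from the paper's experiments.

```python
# Plain Euler-Maruyama integration of the differential-drive SDE (2).
# A sketch only; parameter values are illustrative.
import math
import random

def simulate_pose(omega1, omega2, r=0.033, ell=0.2, D=1.0, T=1.0, dt=1e-3, rng=random):
    """Integrate one sample path of Eq. (2); returns the end pose (x, y, theta)."""
    x = y = theta = 0.0
    sqrt_D_dt = math.sqrt(D * dt)
    for _ in range(int(round(T / dt))):
        # Wheel-angle increments dphi_i = omega_i dt + sqrt(D) dw_i, per Eq. (1)
        dphi1 = omega1 * dt + rng.gauss(0.0, 1.0) * sqrt_D_dt
        dphi2 = omega2 * dt + rng.gauss(0.0, 1.0) * sqrt_D_dt
        x += 0.5 * r * math.cos(theta) * (dphi1 + dphi2)
        y += 0.5 * r * math.sin(theta) * (dphi1 + dphi2)
        theta += (r / ell) * (dphi1 - dphi2)
    return x, y, theta

# Noise-free straight driving at unit speed (omega1 = omega2 = v/r, Eq. (3))
# advances the robot one unit along x.
x, y, theta = simulate_pose(1.0 / 0.033, 1.0 / 0.033, D=0.0)
```

Repeating `simulate_pose` many times with D > 0 yields end-pose clouds like the banana of Fig. 1(a); such samples are the raw material for the comparisons in Section V.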

We can obtain the vector of exponential coordinates x from the group element g ∈ SE(2) via

    x = (log(g))∨,    (8)

where log(·) is the matrix logarithm.

Given a time-dependent rigid-body motion g(t), the spatial velocity as seen in the body-fixed frame is given as the quantity

    g⁻¹ġ = [R^T Ṙ, R^T ṫ; 0^T, 0] ∈ se(2),    (9)

where the dot represents a time derivative.

We define the adjoint operator Ad(g) to satisfy [7]

    Ad(g)x = log∨(g ◦ exp(X) ◦ g⁻¹).    (10)

For SE(2), the adjoint matrix is given by

    Ad(g) = [R, Mt; 0^T, 1],  where  M = [0, 1; −1, 0].    (11)

We define another adjoint operator ad(X) to satisfy [7]

    ad(X)y = ([X, Y])∨,    (12)

where [X, Y] = XY − YX is the Lie bracket. This adjoint matrix for se(2) is given by

    ad(X) = [−αM, Mv; 0^T, 0],    (13)

where v = [v1, v2]^T. The two adjoint matrices are related by

    Ad(exp(X)) = exp(ad(X)).    (14)

The Baker-Campbell-Hausdorff (BCH) formula [9] is a useful expression that relates the matrix exponential with the Lie bracket. The BCH formula is given by

    Z(X, Y) = log(exp(X) ◦ exp(Y)) = X + Y + (1/2)[X, Y] + ··· .    (15)

If the ∨ operator is applied to this formula, we obtain

    z = x + y + (1/2) ad(X) y + ··· .    (16)

IV. MEANS, COVARIANCES AND GAUSSIAN DISTRIBUTIONS

For a vector x ∈ R^n, the mean μ̃ and covariance Σ̃ about the mean for the pdf f(x) are given respectively as¹

    0 = ∫_{R^n} (x − μ̃) f(x) dx,    (17)
    Σ̃ = ∫_{R^n} (x − μ̃)(x − μ̃)^T f(x) dx.    (18)

A multidimensional Gaussian probability density function with mean μ̃ and covariance matrix Σ̃ is defined as

    f(x; μ̃, Σ̃) = (1/c̃(Σ̃)) exp(−(1/2)(x − μ̃)^T Σ̃⁻¹ (x − μ̃)),    (19)

where exp(·) is the usual scalar exponential function and c̃(Σ̃) is a normalizing factor to ensure f(x; μ̃, Σ̃) is a pdf. In (19),

    c̃(Σ̃) = (2π)^{n/2} |det Σ̃|^{1/2}.    (20)

¹We will use a 'tilde' (˜) to represent quantities associated with the Gaussian in Cartesian coordinates.

These definitions can be naturally extended to matrix Lie groups as in [25]. Given a group G with operation ◦, the mean μ ∈ G of a pdf f(g) is defined to satisfy

    ∫_G log∨(μ⁻¹ ◦ g) f(g) dg = 0.    (21)

The covariance about the mean is defined as

    Σ = ∫_G log∨(μ⁻¹ ◦ g)[log∨(μ⁻¹ ◦ g)]^T f(g) dg.    (22)

A multidimensional Gaussian for Lie groups can be defined as

    f(g; μ, Σ) = (1/c(Σ)) exp(−(1/2) y^T Σ⁻¹ y),    (23)

where y = log(μ⁻¹ ◦ g)∨ and c(Σ) is a normalizing factor. For highly concentrated distributions (i.e., when the distribution decays rapidly as you move away from the mean), c(Σ) ≈ c̃(Σ̃).

V. GAUSSIAN COMPARISONS

In this section, we compare the Gaussian in Cartesian coordinates defined in (19) to the one defined in (23) in exponential coordinates for the differential-drive car moving along a straight path and moving along an arc. For all the examples in the remainder of this paper, the wheel base ℓ is 0.200 and the radius r of each wheel is 0.033. To obtain a set of sample frames representing the true distribution of this system, we numerically integrated the stochastic differential equation given in (2) 10,000 times using a modified version of the Euler-Maruyama method [12] with a time step of dt = 0.001 for a total time of T = 1 second.

Using these sample data points, we can approximate the mean and covariance for Cartesian coordinates with

    μ̃ = (1/N) Σ_{i=1}^{N} xi,    (24)
    Σ̃ = (1/N) Σ_{i=1}^{N} (xi − μ̃)(xi − μ̃)^T,    (25)

where xi = [xi, yi, θi]^T and N is the number of sample points.

The mean for exponential coordinates defined in (21) can be approximated with the recursive formula

    μ = μ ◦ exp((1/N) Σ_{i=1}^{N} log(μ⁻¹ ◦ gi)).    (26)

The covariance matrix in exponential coordinates defined in (22) can be estimated with

    Σ = (1/N) Σ_{i=1}^{N} yi yi^T,    (27)

where yi = [log(μ⁻¹ ◦ gi)]∨.
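The closed-form exponential and logarithm of Eqs. (6)-(8), and one fixed-point iteration of the mean update (26), can be sketched with plain 3×3 lists. This is a minimal illustration under the stated formulas, not an optimized implementation; the helper names are our own.

```python
# SE(2) exp/log (Eqs. 5-8) and the sample-mean update of Eq. (26); a sketch.
import math

def se2_exp(v1, v2, alpha):
    """Exponential coordinates (v1, v2, alpha) -> homogeneous matrix, Eqs. (6)-(7)."""
    if abs(alpha) < 1e-12:            # small-angle limit of Eqs. (6)-(7)
        t1, t2 = v1, v2
    else:
        t1 = (v2 * (math.cos(alpha) - 1.0) + v1 * math.sin(alpha)) / alpha
        t2 = (v1 * (1.0 - math.cos(alpha)) + v2 * math.sin(alpha)) / alpha
    c, s = math.cos(alpha), math.sin(alpha)
    return [[c, -s, t1], [s, c, t2], [0.0, 0.0, 1.0]]

def se2_log(g):
    """Inverse map, Eq. (8): group element -> [v1, v2, alpha]."""
    alpha = math.atan2(g[1][0], g[0][0])
    t1, t2 = g[0][2], g[1][2]
    if abs(alpha) < 1e-12:
        return [t1, t2, alpha]
    A = math.sin(alpha) / alpha       # invert the linear map (t1; t2) = [A,-B; B,A](v1; v2)
    B = (1.0 - math.cos(alpha)) / alpha
    det = A * A + B * B
    return [(A * t1 + B * t2) / det, (-B * t1 + A * t2) / det, alpha]

def se2_mul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]

def se2_inv(g):
    c, s, t1, t2 = g[0][0], g[1][0], g[0][2], g[1][2]
    return [[c, s, -(c * t1 + s * t2)], [-s, c, s * t1 - c * t2], [0.0, 0.0, 1.0]]

def mean_update(mu, frames):
    """One fixed-point iteration of Eq. (26) toward the SE(2)-mean of `frames`."""
    acc = [0.0, 0.0, 0.0]
    for g in frames:
        y = se2_log(se2_mul(se2_inv(mu), g))
        acc = [s + yk / len(frames) for s, yk in zip(acc, y)]
    return se2_mul(mu, se2_exp(*acc))
```

Iterating `mean_update` until the averaged log vanishes realizes the defining condition (21); the covariance (27) then follows from the per-sample vectors yi = se2_log(se2_mul(se2_inv(mu), gi)).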

Fig. 2. Distributions for the differential-drive robot moving ideally along a straight line with diffusion constant DT = 1 (left) and DT = 7 (right). Both plots have pdf contours of Gaussians of best fit in exponential coordinates and Cartesian coordinates marginalized over the heading.

Fig. 3. Distributions for the differential-drive robot moving ideally along a constant-curvature arc with diffusion constant DT = 1 (left) and DT = 4 (right). Both plots have pdf contours of Gaussians of best fit in exponential coordinates and Cartesian coordinates marginalized over the heading.

The sample mean and covariance for each method parameterize the associated pdf. These pdfs were marginalized over orientation, and contours of equal pdf values were plotted along with the sampled end positions of the simulations. Fig. 2 and Fig. 3 present these contours for various values of DT for the driving-straight and driving-along-an-arc examples, where D is a diffusion coefficient and T is the total drive time. Here T is equal to one. It is evident that the Gaussian in exponential coordinates fits the sampled data more accurately for both cases, especially as uncertainty increases. If we take the poses of the sampled data, represent them in exponential coordinates, and look at the (v1, v2) values as shown in Fig. 4, it is clear that the result is approximately elliptical. To numerically verify that the Gaussian in exponential coordinates is a better fit for the sample data than the Gaussian in Cartesian coordinates, we performed a log-likelihood ratio test for various values of the diffusion coefficient D, as shown in Fig. 5. Note that for small diffusion values the log-likelihood ratio is close to one, indicating that both Gaussian models perform approximately the same. However, as the uncertainty is increased, the Gaussian in (23) in exponential coordinates performs better than the Gaussian in Cartesian coordinates.
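For reference, the Cartesian-coordinate sample statistics of Eqs. (24)-(25) used for the Gaussian of best fit in (x, y, θ) amount to the ordinary empirical mean and covariance. A minimal sketch:

```python
# Empirical mean and covariance in Cartesian coordinates, Eqs. (24)-(25).
def cartesian_stats(samples):
    """samples: list of (x, y, theta) end poses; returns (mu, Sigma)."""
    N = len(samples)
    mu = [sum(s[k] for s in samples) / N for k in range(3)]
    Sigma = [[sum((s[i] - mu[i]) * (s[j] - mu[j]) for s in samples) / N
              for j in range(3)] for i in range(3)]
    return mu, Sigma
```

Note that this treats θ as an ordinary real coordinate, which is precisely the assumption that degrades as the heading spread grows.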

VI. PROPAGATION WITH EXPONENTIAL COORDINATES

In this section, we describe how to propagate the mean and covariance defined in (21) and (22) with closed-form estimation formulas. By using the SDE (2), we can derive another stochastic differential equation as

    (g⁻¹ġ)∨ dt = [cos(α)dx + sin(α)dy; −sin(α)dx + cos(α)dy; dθ]
               = [(r/2)(ω1 + ω2); 0; (r/ℓ)(ω1 − ω2)] dt + √D [r/2, r/2; 0, 0; r/ℓ, −r/ℓ] [dw1; dw2].    (28)

This will be written in shorthand as

    (g⁻¹ġ)∨ dt = h dt + H dw.    (29)

When rigid-body transformations are close to the identity, the SE(2)-mean and SE(2)-covariance defined in (21) and (22) can be approximated with [19]

    μ(t) = exp(∫₀ᵗ ĥ dτ),    (30)

and

    Σ(t) = ∫₀ᵗ Ad(μ⁻¹(τ)) H H^T Ad^T(μ⁻¹(τ)) dτ,    (31)

where HH^T is a constant diffusion matrix.

For the straight-driving example (ω1 = ω2 = ω), we have

    μ(t) = [1, 0, rωt; 0, 1, 0; 0, 0, 1].    (32)

We can solve the integral in (31) in closed form as

    Σ(t) = [Dr²t/2, 0, 0; 0, 2Dω²r⁴t³/(3ℓ²), Dωr³t²/ℓ²; 0, Dωr³t²/ℓ², 2Dr²t/ℓ²].    (33)

For the constant-curvature case (ω1 ≠ ω2) with (4), we have

    μ(t) = [cos(α̇t), −sin(α̇t), a sin(α̇t); sin(α̇t), cos(α̇t), a(1 − cos(α̇t)); 0, 0, 1].    (34)

The closed-form SE(2)-covariance matrix is then

    Σ(t) = [σ11, σ12, σ13; σ21, σ22, σ23; σ31, σ32, σ33],    (35)

where

    σ11 = (c/8)[(4a² + ℓ²)(2α̇t + sin(2α̇t)) + 16a²(α̇t − 2 sin(α̇t))],
    σ12 = σ21 = −(c/2)[4a²(−1 + cos(α̇t)) + ℓ²(1 + cos(α̇t))] sin²(α̇t/2),
    σ13 = σ31 = 2ca(α̇t − sin(α̇t)),
    σ22 = −(c/8)(4a² + ℓ²)(−2α̇t + sin(2α̇t)),
    σ23 = σ32 = −2ca(−1 + cos(α̇t)),
    σ33 = 2cα̇t,  and
    c = Dr²/(ℓ²α̇).

Fig. 4. Distributions as a function of exponential coordinates (v1, v2) with pdf contours marginalized over the heading for the Gaussian in exponential coordinates for driving straight (left) and driving along an arc (right) with diffusion DT = 1.

Fig. 5. Log-likelihood ratio test of the Gaussian in exponential coordinates to the Gaussian in Cartesian coordinates as a function of the diffusion DT.

To demonstrate the accuracy of this propagation method, we calculated the SE(2)-mean and SE(2)-covariance from 10,000 sample data frames using (26) and (27) and from the propagation formulas. For the straight-driving example with
rω = 1 and DT = 1, the mean and covariance of the data are

    μ_data = [1.0000, 0.0003, 1.0000; −0.0003, 1.0000, 0.0000; 0, 0, 1]  and    (36)
    Σ_data = [0.0006, 0.0000, −0.0001; 0.0000, 0.0184, 0.0276; −0.0001, 0.0276, 0.0551].    (37)

The propagated mean and covariance for these parameters are

    μ_prop = [1, 0, 1; 0, 1, 0; 0, 0, 1]  and    (38)
    Σ_prop = [0.0006, 0, 0; 0, 0.0184, 0.0276; 0, 0.0276, 0.0553].    (39)

If we increase the noise to DT = 7, the mean and covariance from sample data are

    μ_data = [1.0000, 0.0011, 1.0009; −0.0011, 1.0000, −0.0002; 0, 0, 1]  and    (40)
    Σ_data = [0.0068, 0.0000, 0.0002; 0.0000, 0.1278, 0.1943; 0.0002, 0.1943, 0.3883].    (41)

The propagated mean and covariance are

    μ_prop = [1, 0, 1; 0, 1, 0; 0, 0, 1]  and    (42)
    Σ_prop = [0.0039, 0, 0; 0, 0.1290, 0.1935; 0, 0.1935, 0.3869].    (43)

Similar accuracy can be seen for the constant-curvature case.

VII. FUSION WITH EXPONENTIAL COORDINATES

Here we consider the fusion of banana distributions for N robots moving in formation. To begin, consider only two such robots, i and j. The actual (unknown) poses of these robots at time t are gi and gj. At time t = 0, the known initial poses are ai and aj, and the banana distributions for each robot diffuse so as to result in distributions f(ai⁻¹ ◦ gi; μi, Σi) and f(aj⁻¹ ◦ gj; μj, Σj). The means and covariances of these two distributions can be estimated using the propagation formulas from the previous section.

Let us assume that an exact relative measurement between i and j is taken so that

    mij = gi⁻¹ ◦ gj    (44)

is known. Then the distribution of gi becomes, with a form of Bayesian fusion,

    pij(gi) = f(ai⁻¹ ◦ gi; μi, Σi) · f(aj⁻¹ ◦ gi ◦ mij; μj, Σj).    (45)

Noting that mji = mij⁻¹ and switching the roles of i and j gives the same expression for pji(gj). Our goal in this fusion problem is to write this new fused distribution in the form

    pij(gi) = f(ai⁻¹ ◦ gi; μij, Σij),    (46)

and more generally, p1,2,...,N(gi). To save space in this section, we will occasionally remove the ◦ when the operation is clear. We can rewrite pij(gi) in (45) as

    pij(gi) = f(μi⁻¹ ai⁻¹ gi; I, Σi) · f(μj⁻¹ aj⁻¹ gi mij; I, Σj).    (47)

Making the substitution h = μi⁻¹ ◦ ai⁻¹ ◦ gi, we have

    pij(ai μi h) = f(h; I, Σi) · f(mij⁻¹ q h mij; I, Σj),    (48)

where q = mij ◦ μj⁻¹ ◦ aj⁻¹ ◦ ai ◦ μi. We rewrite (46) as

    pij(ai μi h) = f(μi h; μij, Σij) = f(h; μ′ij, Σij),    (49)

where μ′ij = μi⁻¹ ◦ μij. Our goal is to find closed-form expressions for μ′ij and Σij.

The exponents of all three Gaussian distributions (f) in (48) and (49) are all scalars, and thus we have, to within a constant C,

    [log∨(e^{Xij} e^Y)]^T Σij⁻¹ [log∨(e^{Xij} e^Y)]
        = C + [log∨(e^{Xi} e^Y)]^T Σi⁻¹ [log∨(e^{Xi} e^Y)]
        + [log∨(mij⁻¹ e^{Xj} e^Y mij)]^T Σj⁻¹ [log∨(mij⁻¹ e^{Xj} e^Y mij)],    (50)

where e^{Xij} = exp(Xij) = (μ′ij)⁻¹, e^{Xi} = exp(Xi) = I, e^{Xj} = exp(Xj) = q, and e^Y = exp(Y) = h. Note that

    log∨(mij⁻¹ e^{Xj} e^Y mij) = Ad(mij⁻¹) log∨(e^{Xj} e^Y).    (51)

We assume that μ′ij, q and h are small such that the BCH expansion can be approximated with only the first three terms,

    log∨(exp(X) exp(Y)) ≈ x + y + (1/2) ad(X) y.    (52)

Since ad(X)x = 0, we can rewrite (52) as

    log∨(exp(X) exp(Y)) = (I + (1/2) ad(X))(x + y).    (53)

Since we renormalize, we rewrite (50), ignoring the constant C, as

    (xij + y)^T Γij^T Σij⁻¹ Γij (xij + y)
        = (xi + y)^T Γi^T Σi⁻¹ Γi (xi + y)
        + (xj + y)^T Γj^T Ad^{−T}(mij) Σj⁻¹ Ad⁻¹(mij) Γj (xj + y),

where Γk = I + (1/2) ad(Xk). Let

    Sij = Γij^T Σij⁻¹ Γij,  Si = Γi^T Σi⁻¹ Γi,  and
    Sj = Γj^T Ad^{−T}(mij) Σj⁻¹ Ad⁻¹(mij) Γj.

We can then combine like terms and solve for the following closed-form formulas:

    Sij = Si + Sj,    (54)
    xij = Sij⁻¹ (Si xi + Sj xj),  and    (55)
    Σij = Γij Sij⁻¹ Γij^T,    (56)

which have a similar form to the product of Gaussians on Rn . Fusion DT = 3
2
We can then extract the desired mean μij as

μij = μi ◦ exp(−x̂ij ). (57) 1.5

1
These equations can be used recursively for multiple robot
systems with multiple measurements.
0.5
These formulas can be contrasted with those used tradition-

Y Position
ally for Bayesian fusion of multivariate Gaussians on R3 , 0

f (xi ; ãi + μ̃ ij , Σ̃ij ) =


(58) −0.5
f (xi ; ãi + μ̃ i , Σ̃i ) · f (xi ; ãj + μ̃ j − m̃ij , Σ̃j )
−1 Sample Data
where m̃ij = xj − xi , Fusion PDFs
−1.5 Ideal Paths
Σ̃−1
ij = Σ̃−1 −1
i + Σ̃j , and True Poses

Σ̃ij Σ̃−1 −1 −2
μ̃ ij = i (ãi + μ̃ i ) + Σ̃j (ãj + μ̃ j − m̃ij ) − ãi . 0 0.5 1 1.5 2 2.5 3
X Position

The propagation and fusion formulas were tested on a three


robot system driving in a triangular formation with diffusion Fig. 6. Fusion of Gaussian in exponential coordinates for three robots driving
in formation with true positions marked with a filled in circle.
constant DT = 3 as shown in Fig. 6. Sample data from integrating the SDE in (2) many times are represented by light gray dots similar to Fig. 1(a). We integrated the SDE in (2) once from each starting position to represent a set of three true poses, represented by black circles. From these true poses, we generated the exact measurements between each robot. The pdf contours from the fused Gaussians in exponential coordinates, marginalized over the heading, are also shown in the figure. Note that the fused pdf contours are not trying to match the banana distributions; instead, they give a better estimate of the location of the robots based on the exact measurements.

These results, obtained using propagation and fusion of Gaussians in exponential coordinates, were compared with the fusion of Gaussians in Cartesian coordinates. For comparison, we let $\mathbf{x}_i^*$ be the true pose $(x_i, y_i, \theta_i)^T$ in Cartesian coordinates for the $i$th robot. Then, $\mathbf{x}_i^e$ and $\mathbf{x}_i^c$ are the poses in Cartesian coordinates associated with the means from the fused Gaussians in exponential and Cartesian coordinates, respectively. For Fig. 6, these poses are given as:

$$
\mathbf{x}_1^* = \begin{pmatrix} 2.043 \\ 0.090 \\ -0.034 \end{pmatrix}, \quad
\mathbf{x}_1^e = \begin{pmatrix} 2.010 \\ 0.039 \\ -0.054 \end{pmatrix}, \quad
\mathbf{x}_1^c = \begin{pmatrix} 1.976 \\ 0.146 \\ 0.132 \end{pmatrix},
$$

$$
\mathbf{x}_2^* = \begin{pmatrix} 1.001 \\ 0.840 \\ -0.128 \end{pmatrix}, \quad
\mathbf{x}_2^e = \begin{pmatrix} 0.993 \\ 0.836 \\ -0.182 \end{pmatrix}, \quad
\mathbf{x}_2^c = \begin{pmatrix} 0.934 \\ 0.896 \\ 0.037 \end{pmatrix},
$$

$$
\mathbf{x}_3^* = \begin{pmatrix} 1.033 \\ -1.077 \\ -0.283 \end{pmatrix}, \quad
\mathbf{x}_3^e = \begin{pmatrix} 0.977 \\ -1.106 \\ -0.303 \end{pmatrix}, \quad
\mathbf{x}_3^c = \begin{pmatrix} 0.966 \\ -1.021 \\ -0.117 \end{pmatrix}.
$$

It is clear that the means for the Gaussian model in exponential coordinates are closer to the true values. Moreover, if we examine the covariances associated with both models, it is apparent that the covariances of the exponential model have smaller magnitude than those of the Cartesian Gaussian. For example, the covariances of the fused distributions for the first robot are

$$
\Sigma_1 = \begin{pmatrix} 0.0006 & -0.0000 & -0.0001 \\ -0.0000 & 0.0060 & 0.0011 \\ -0.0001 & 0.0011 & 0.0008 \end{pmatrix}
\quad \text{and} \quad
\tilde{\Sigma}_1 = \begin{pmatrix} 0.0013 & -0.0002 & -0.0003 \\ -0.0002 & 0.0166 & 0.0267 \\ -0.0003 & 0.0267 & 0.0571 \end{pmatrix}.
$$

Thus, the results from the fused Gaussians in exponential coordinates are more highly focused than the Cartesian ones.

VIII. CONCLUSION

A comparison between the commonly used multivariate Gaussian model on (x, y, θ) and one based on the coordinates of the Lie algebra (or exponential coordinates) has been provided. These models were examined in the context of the stochastic kinematic model of a differential-drive robot. Through numerical examples, we demonstrated that the model based on exponential coordinates is able to capture the underlying distribution more accurately and more precisely. This held true for trajectories that were both straight and circular. In addition, a closed-form approximation was given for propagating the mean and covariance for paths consisting of straight segments and arcs of constant curvature. This allows the mean and covariance to be estimated quickly.

Finally, we presented a closed-form approximation for multiplying two Gaussian pdfs in exponential coordinates. Multiplying two pdfs allows us to effectively fuse information obtained from two different sources. In the case of the example provided, these sources were forward propagation of the stochastic kinematic model and relative measurements between robots. The pdfs obtained using this method provided significantly increased accuracy when compared with those from forward propagation alone.
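As a side note, the fusion step referred to here is, at its core, a product of two Gaussian densities. A minimal sketch in information form follows; this is the generic product in a fixed coordinate chart, not the paper's closed-form exponential-coordinate approximation, which additionally involves the Lie-group log and exp maps:

```python
import numpy as np

def fuse_gaussians(mu1, Sigma1, mu2, Sigma2):
    """Multiply two Gaussian pdfs N(mu1, Sigma1) * N(mu2, Sigma2).

    Returns the mean and covariance of the (renormalized) product,
    computed in information (inverse-covariance) form.
    """
    info1 = np.linalg.inv(Sigma1)
    info2 = np.linalg.inv(Sigma2)
    Sigma = np.linalg.inv(info1 + info2)   # fused covariance shrinks
    mu = Sigma @ (info1 @ mu1 + info2 @ mu2)
    return mu, Sigma
```

With equal covariances the fused mean is the average of the two means and the fused covariance is halved, which mirrors the tighter contours reported above.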
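For readers unfamiliar with the exponential coordinates used throughout, a minimal SE(2) exponential map can be sketched as follows. The twist ordering (vx, vy, ω) and the small-angle cutoff are our own conventions, not taken from the paper:

```python
import numpy as np

def se2_exp(twist):
    """Map an se(2) twist (vx, vy, w) to a pose (x, y, theta).

    Uses the closed-form SE(2) exponential; near w = 0 the map
    reduces to the identity on the translational part.
    """
    vx, vy, w = twist
    if abs(w) < 1e-9:
        return np.array([vx, vy, 0.0])
    V = np.array([[np.sin(w), -(1.0 - np.cos(w))],
                  [1.0 - np.cos(w), np.sin(w)]]) / w
    x, y = V @ np.array([vx, vy])
    return np.array([x, y, w])
```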
The methods and models described above can be used to improve current state-of-the-art SLAM algorithms. We hope to use these methods to develop a new filter for SE(2) similar to the Kalman filter. We would also like to develop new approaches for fusing data from different models together. For example, if a Gaussian in exponential coordinates is used to model forward propagation, we want to fuse it with noisy measurements from a Gaussian in Cartesian coordinates.

ACKNOWLEDGMENTS

This work was funded in part by NSF grant IIS-0915542 "RI: Small: Robotic Inspection, Diagnosis, and Repair" and Andrew Long's NDSEG fellowship.
Recognition and Pose Estimation of Rigid
Transparent Objects with a Kinect Sensor
Ilya Lysenkov (Itseez), Victor Eruhimov (Itseez), Gary Bradski (Willow Garage)
Abstract—Recognizing and determining the 6DOF pose of transparent objects is necessary in order for robots to manipulate such objects. However, it is a challenging problem for computer vision. We propose new algorithms for segmentation, pose estimation and recognition of transparent objects from a single RGB-D image from a Kinect sensor. Kinect's weakness in the perception of transparent objects is exploited in their segmentation. Following segmentation, edge fitting is used for recognition and pose estimation. A 3D model of the object is created automatically during training and is required for pose estimation and recognition.

The algorithm is evaluated in different conditions of a domestic environment within the framework of a robotic grasping pipeline, where it demonstrates high grasping success rates compared to state-of-the-art results. The method does not currently handle occlusions or overlapping transparent objects, but it is robust against non-transparent clutter.

I. INTRODUCTION

Transparent objects are a common part of domestic and industrial environments. Recognition and 6-degrees-of-freedom (6DOF) pose estimation of objects is required for robotic grasping and manipulation. However, transparent objects are very challenging for robot perception with RGB images as well as with modern 3D sensors.

• 2D Computer Vision. The appearance of a transparent object strongly depends on its background and lighting. Transparent objects usually do not have their own texture features, their edges are typically weak, and intensity gradient features are heavily influenced by see-through background clutter as well as by specularity edges induced by lighting. Classical computer vision algorithms for recognition and pose estimation are therefore difficult to apply to transparent objects.

• 3D Computer Vision. 3D point clouds are successfully used for object recognition and pose estimation (Hinterstoisser et al. [10]). However, modern sensors (Kinect, ToF cameras, stereo cameras, laser scanners) cannot estimate depth reliably or produce point clouds for transparent and specular objects, so these algorithms cannot be applied. Cross-modal stereo can be used to get depth estimates on transparent objects with Kinect (Chiu et al. [4]), but its quality is far from that achieved on Lambertian objects. Acquisition of 3D data and reconstruction of transparent objects is a challenging unsolved problem (Ihrke et al. [11], Mériaudeau et al. [19]) and an active area of research (Jiang and Smith [12]).

Fig. 1. Example of test data with 3 transparent objects, with segmentation, recognition and pose estimation results using the proposed algorithms: (a) Kinect RGB image; (b) Kinect depth image; (c) segmentation; (d) recognition and pose estimation.

We address these challenges and propose an algorithm for segmentation, pose estimation and recognition of transparent objects. Unknown transparent objects are segmented from a single image of a Kinect sensor by exploiting its failures on specular surfaces. 3D models of objects created at the training stage are fitted to extracted edges. A cost function value is used to make a decision about an instance of the object and determine its 6DOF pose relative to the robot.

The proposed algorithm was integrated into the object recognition stack (Rublee et al. [26]) and connected with the robotic grasping pipeline (Ciocarlie [5]) in the Robot Operating System (ROS, Quigley and WillowGarage [24]) and evaluated on a Willow Garage PR2 robot. The grasping is robust to non-transparent clutter and the success rate is over 80%.

II. RELATED WORK

Transparent objects and the challenges they bring in perception were rarely addressed in research papers until recently. McHenry et al. [18] derive features from inherent properties of transparent objects (transparency, reflection and refraction). Usually these properties are obstacles for computer vision
algorithms, and features like specular highlights are considered as noise. The situation is different for transparent objects: these features are characteristic of transparency, and so they are used to segment unknown transparent objects from a single RGB image by McHenry et al. [18], McHenry and Ponce [17]. This problem is also addressed by Kompella and Sturm [14] in the robotic context of transparent obstacle detection. These approaches are based on finding correspondences between the deformed background behind a transparent object and its unoccluded counterpart available as foreground in the same image. This correspondence between regions is used to segment the transparent object. However, these counterpart regions should be large enough to estimate features such as blurring, texture distortion and other transparent overlay effects. This is not the case in highly cluttered scenes or, for example, if a transparent glass is filled with a colored drink like milk. The algorithm proposed in this paper allows segmentation even in such cases.

Lei et al. [15] use light detection and ranging (LIDAR) data to segment unknown transparent objects: highlight spots from an RGB image are used to find candidate areas of transparent objects, and then the GrabCut segmentation algorithm (Rother et al. [25]) is applied to a depth image and a laser reflectance intensity image to compute the final segmentation. The experimental setup consists of a 2D LIDAR device, a registered RGB camera and a pan-tilt unit, so it is more complex and expensive than the Kinect used in the current paper.

Osadchy et al. [22] utilize specular highlights of transparent objects in the recognition problem. Successful recognition of 9 transparent objects was demonstrated with this single feature. Unfortunately, a test scene must have a dominant light source with known position, so the applicability of the approach is limited.

Fritz et al. [8] use an additive model of latent factors to learn the appearance of transparent objects and remove the influence of the background behind them. This allows recognition of transparent objects against varying backgrounds. The algorithm was evaluated on the problem of transparent object detection in the challenging conditions of a domestic environment with complex backgrounds, using a dataset of 4 transparent objects.

Klank et al. [13] detect unknown transparent objects and reconstruct them using two views of a test scene from a ToF camera. The problem is especially challenging because no training is involved and a 3D model of an object is reconstructed directly from test data. The algorithm is tolerant to difficult lighting conditions thanks to data from a ToF camera. Transparent objects are assumed to stay on a flat surface, but the algorithm is tolerant to violations of this assumption. The algorithm is not misled by opaque objects and distinguishes them from transparent objects correctly. The algorithm was evaluated on a dataset of 5 transparent objects with uniform backgrounds behind them; the objects were placed on a dark table separately from each other. Transparent objects were reconstructed successfully in 55% of 105 attempts. Transparent object grasping with a robot was also evaluated: it was successful in 41% of the cases where reconstruction was successful, so the robot grasped 23% of all transparent objects overall.

Phillips et al. [23] propose to use inverse perspective mapping for detection and pose estimation of transparent objects. This cue can be computed if two views of a test scene are available and objects stay on a support plane. The algorithm was evaluated on a dataset of 5 transparent objects that were isolated from each other in a test scene. Poses of the objects are estimated with an accuracy of 2.7-11.1 millimeters, with a mean error of 7.8 millimeters. However, pose is estimated with an exhaustive coarse-to-fine search in the 2D space of poses on a table (with refinement of the best candidate), so this is not scalable to 6DOF pose estimation. The approach was demonstrated to be applicable to the detection problem.

III. PROPOSED APPROACH

We consider problems of segmentation, pose estimation, recognition and grasping of transparent objects in this paper. Our main focus is accurate pose estimation of transparent objects, which is used to enable robotic grasping. The previous works by Klank et al. [13] and Phillips et al. [23] are the most relevant here because other papers do not deal with 3D pose estimation of transparent objects. These works require two views of a test scene; in contrast, we use a single image from a Kinect at the test stage to recognize and estimate the pose of transparent objects. Kinect uses structured light and projects an IR pattern onto objects to estimate their depth. However, such a technique does not work with specular, refractive, or translucent objects, or objects with very low surface albedo (Ihrke et al. [11]). So Kinect cannot produce point clouds of transparent objects, but we take advantage of this fact. The regions where Kinect fails to estimate depth are likely to be produced by specular or transparent surfaces, and we use invalid areas in a Kinect depth map as an interest area operator. Using this cue and a corresponding RGB image, we segment transparent objects from the background and extract their silhouettes. 3D models of transparent objects are created during a training stage by painting objects with color and scanning them. These learned models are fitted to the extracted RGB silhouettes on the test image by varying poses of the object. We treat this as an optimization problem with regard to the 6 parameters that define the 3D pose of a rigid object relative to the robot's camera. This multi-dimensional optimization problem is decomposed into several parts: some of them have closed-form solutions, and others can be solved with standard iterative algorithms like Levenberg-Marquardt. The inherent ambiguity in pose estimation of transparent objects is discussed. This ambiguity is resolved with a support plane assumption, and the correct pose is returned, which can be used for robotic grasping.

Our main contributions are:
• A model of transparent objects that takes into account both silhouette and surface edges.
• An algorithm for 6DOF pose estimation of transparent objects. It is based on existing CAD-based approaches to pose estimation (Ulrich et al. [28], Liu et al. [16]), but our algorithm does not need a CAD model and utilizes the specifics of transparent objects to meet practical requirements in performance and accuracy.
• A complete system for grasping transparent objects with Kinect as input, going from model capture up to segmentation, pose estimation and grasping.

The proposed algorithm was evaluated on 13 transparent objects in different conditions. As stated above, our algorithm was integrated into a robotic grasping pipeline and evaluated on a PR2 robot. Grasping was successful in 80% of cases, and this result is robust even to challenging complex backgrounds behind the transparent objects.

IV. TRAINING STAGE

Our method creates a 3D model of an object that allows us to generate edgels (pixels belonging to edges) in a 2D image given the object's 6DOF pose. Transparent, as well as any untextured, objects create two types of edges: surface and silhouette edges. A surface edge is produced by a discontinuity in surface orientation; image edgels corresponding to a surface edge are made visible by lighting discontinuities. A silhouette edge is produced by the object border, and the corresponding image edgels are made visible by both lighting and contrast with the background. Examples of surface edges are the edges where the stem joins the rest of the glass and the back edge of the cup lip (Fig. 5), while silhouette edges are the side edges of the glass. These two types of edges have a different dependence on the object pose. However, they are equally important for pose estimation, so we have to take both into account.

We define an object model that contains a silhouette model and a surface edge model. The silhouette model is a 3D point cloud of the whole object surface and is used to generate silhouette edges. The surface edge model is a set of 3D points corresponding to surface curvature discontinuities.

The creation of a silhouette model requires 3D scanning of a transparent object. However, there are no robust algorithms to do this (Ihrke et al. [11]). One possible solution is to create the model manually, but that is a time-consuming process. Another approach is to find and download a similar 3D model from the Internet, as done by Phillips et al. [23] for 5 transparent objects. Unfortunately, existing databases of 3D models are limited, and problems arise if there are no similar models available. Finally, one can cover transparent objects with powder or paint and use standard scanning algorithms on these now-Lambertian objects, as done by Osadchy et al. [22]. We use the last approach because it allows us to create high-quality models. We paint a transparent object white and use KinectFusion (Newcombe et al. [20]) to compute a clean point cloud of the object. The KinectFusion algorithm creates a single point cloud of a scene by registering point clouds from different viewpoints. A modeled object was put on a transparent stand on a table to enable easy 3D segmentation of the model as a cluster of points above the table plane. KinectFusion does not use 2D photometric data and so relies on tracking/registering 3D structures only. To ensure good 3D registration, we placed a 3D calibration rig near the object to be modeled. This allows robust and precise camera tracking that results in accurate 3D models. One of the created models is shown in Fig. 2.

Fig. 2. A 3D model of a transparent object created with KinectFusion at the training stage. The transparent object was sprayed white so that a Kinect sensor could be used to estimate surface depth.

The surface edge model is created from the silhouette model. All points of the silhouette model are projected onto the training images. Points that are often projected onto Canny edges are used to form the surface edge model.

The models are used to compute edges of the object for given poses. Points of the silhouette model are projected onto an image for a given pose, and then morphological operations are applied to get a single blob. The contour of this blob is the silhouette of the object. Surface edges are computed by projecting the surface edge model. Objects are transparent, so there are no self-occlusions and we get surface edges directly.

Template-based approaches are popular in computer vision; a recent example is Hinterstoisser et al. [10]. Templates are created for several poses by sampling views from a viewing sphere of the object. These templates are matched to a test image and a full pose is estimated. Standard template matching algorithms are not suitable for transparent objects, because these objects have weak gradients that can be severely disturbed by gradients of the background behind the object. These algorithms degrade significantly in such cases and should be improved by using a depth map (Hinterstoisser et al. [10]), but that is not available for transparent objects.

We adopt a template-based approach for pose estimation of transparent objects. Pose estimation of a rigid object requires estimation of 6 parameters: 3 for rotation and 3 for translation. A rotation matrix can be decomposed into a superposition of three rotations around coordinate axes: R = Rz Rx Ry, where we consider fixed coordinate axes. The optical axis is z, and the upright direction of the object is aligned with the y-axis for the identity transformation. We will find the translation parameters and Rz at the testing stage, so we create templates of the object for different Ry and Rx during the training stage. We sample possible rotations Ry and Rx with some discrete steps, and for each step we project our model to get a silhouette of the object for this pose. The y-axis corresponds to the upright direction, and many transparent objects common in the household environment (plates, glasses) are rotationally symmetric around this direction. So the algorithm checks whether an object has rotational symmetry around the y-axis, and templates are sampled for Rx only if that is the case. When evaluating the algorithm, 100 templates were sampled for non-symmetric objects and 10 for symmetric objects.

As a result, after the training stage we have a 3D model of the object and silhouettes for the possible rotations Ry and Rx.
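The rotation-sampling scheme described in this section can be sketched as follows. The uniform grids and function names are illustrative; the paper does not specify its exact sampling pattern beyond the template counts:

```python
import numpy as np

def rot_x(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])

def rot_y(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

def sample_template_rotations(n_x=10, n_y=10, symmetric=False):
    """Sample Rx * Ry rotations for template rendering (Rz and the
    translation are recovered at test time). For objects rotationally
    symmetric around the y-axis, Ry is irrelevant and only Rx is sampled."""
    ys = [0.0] if symmetric else np.linspace(0.0, 2 * np.pi, n_y, endpoint=False)
    xs = np.linspace(0.0, 2 * np.pi, n_x, endpoint=False)
    return [rot_x(ax) @ rot_y(ay) for ax in xs for ay in ys]
```

With these defaults a non-symmetric object yields 10 × 10 = 100 templates and a symmetric one yields 10, matching the counts used in the evaluation.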
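The initial pose estimation in the testing stage below aligns training and test silhouettes by their centroids and point scatters (Eqs. (1)-(2)). That alignment can be sketched in a few lines; the rotation refinement by 2D ICP is omitted:

```python
import numpy as np

def align_similarity_2d(train_pts, test_pts):
    """Estimate the 2D translation and uniform scale mapping a training
    silhouette onto a test silhouette by matching centroids and scatters.

    Points are (n, 2) arrays; returns scale s and translation t such that
    s * train_pts + t has the test centroid and scatter.
    """
    def scatter(p, c):
        # root-mean-square distance of the points from their centroid
        return np.sqrt(((p - c) ** 2).sum(axis=1).mean())

    c_train = train_pts.mean(axis=0)
    c_test = test_pts.mean(axis=0)
    s = scatter(test_pts, c_test) / scatter(train_pts, c_train)
    t = c_test - s * c_train
    return s, t
```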
V. TESTING STAGE

A Kinect RGB image of two transparent objects is shown in Fig. 3a. Kinect has a low-quality RGB camera, and it is difficult to detect objects and estimate pose from such images robustly, especially when the background is not uniform. However, Fig. 3b shows the corresponding Kinect depth map, where invalid depth, that is, regions where Kinect fails to estimate depth, is colored black. Kinect is not able to estimate depth on transparent objects, but we can take advantage of this fact: regions where Kinect does not work are likely to belong to transparent objects. So this cue can be used to segment transparent objects in an image.

Fig. 3. Kinect images of two transparent objects: (a) Kinect RGB image; (b) Kinect depth image. It is hard to detect transparent objects from the RGB image, but the depth map provides an informative cue for segmentation: Kinect can't estimate depth in regions where transparent objects are located.

A. Transparent objects segmentation

Kinect can fail to estimate depth not only on transparent objects; it often fails on contours of objects. Sometimes it can also return valid depth for the background behind transparent objects. To clean this up, morphological operations (closing and opening) are applied to find and eliminate small regions. The result is a proposed mask representing transparent objects. However, invalid depth does not always correspond to transparent objects, and noise distorts the masks, so these operations can only produce approximate regions of transparent objects. To refine these masks, we use them as an initialization and constraints for the GrabCut segmentation algorithm (Rother et al. [25]) on the corresponding Kinect RGB image. Regions with transparent objects differ from the surrounding background only slightly, so many segmentation algorithms fail to segment them without additional information. However, the mask derived from a Kinect depth map provides good initialization and enough constraints to allow GrabCut to segment transparent objects accurately.

Results of segmentation are shown in Fig. 4a. Segmentation of the left object is accurate, but the right object is segmented incorrectly due to the non-uniform background. However, subsequent steps of our algorithm are robust to such errors in the segmentation. There is also a third, false segmentation mask, because Kinect failed on a non-transparent object. This is a rare case, but the pose estimation algorithm is robust to such situations too, since such a mask is unlikely to match any of the models well.

Fig. 4. (a) Results of glass segmentation: the algorithm uses simple morphological operations to get approximate masks from a Kinect depth map and then refines them by applying the GrabCut algorithm to a Kinect RGB image. Pose estimation is robust to errors of the segmentation. (b) Projection of the estimated pose: the algorithm recognized the object correctly and its pose was estimated accurately.

B. Initial pose estimation

We find the initial pose estimate using a test image silhouette. We already have silhouettes of the object for various Rx and Ry from the training stage, so we need to estimate the translation parameters and Rz for each silhouette. In order to do that, we first find a two-dimensional similarity transformation between the training and test silhouettes. The transformation consists of a 2D translation, uniform scaling and rotation in the image plane. This is done using Procrustes Analysis (2D shape matching, Dryden and Mardia [6]).

The 2D translation is estimated by aligning centroids:

$$(\bar{x}, \bar{y}) = \frac{1}{n} \sum_{i=1}^{n} (x_i, y_i), \qquad (1)$$

where n is the number of points in a silhouette with points $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$.

The 2D scale is estimated by aligning the scatters of the points:

$$s = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left[ (x_i - \bar{x})^2 + (y_i - \bar{y})^2 \right]}. \qquad (2)$$

Correspondences between points in the training and test silhouettes are not known, so 2D-2D ICP (Besl and McKay [1]) is used to estimate the rotation between these silhouettes and refine the other parameters of the similarity transformation.

Now we need to compute a corresponding 3D transformation that maps points of a 3D model to the same locations as this 2D similarity transformation. Unfortunately, there is no such transformation under the perspective camera model in the general case. So we use a weak perspective projection model (Hartley and Zisserman [9]) that assumes all object points have the same depth. This assumption is valid if the object size is small compared to the distance to the camera. This model produces the following equation, which must be solved for all x, y simultaneously with regard to the translation $(t_x, t_y, t_z)$ and rotation $R_z$:

$$\frac{1}{\bar{z}} S K \begin{pmatrix} x \\ y \\ \bar{z} \end{pmatrix} = \frac{1}{\bar{z} + t_z} K \begin{pmatrix} r_{11} & r_{12} & 0 & t_x \\ r_{21} & r_{22} & 0 & t_y \\ 0 & 0 & 1 & t_z \end{pmatrix} \begin{pmatrix} x \\ y \\ \bar{z} \\ 1 \end{pmatrix}, \qquad (3)$$
where K is the camera matrix of intrinsic parameters, S is the similarity transformation written as a 3 × 3 matrix, and

$$R_z = \begin{pmatrix} r_{11} & r_{12} & 0 \\ r_{21} & r_{22} & 0 \\ 0 & 0 & 1 \end{pmatrix}. \qquad (4)$$

There are two solutions to this problem, but one of them places the object behind the camera. Denoting the matrix $A = K^{-1} S K$ with elements $A = (a_{ij})$, the unique physically realizable solution is given by the equations:

$$t_z = \left( \frac{1}{\sqrt{\det A}} - 1 \right) \bar{z}, \qquad
t_x = a_{13} (\bar{z} + t_z), \qquad
t_y = a_{23} (\bar{z} + t_z), \qquad
\begin{pmatrix} r_{11} & r_{12} \\ r_{21} & r_{22} \end{pmatrix} = \left( 1 + \frac{t_z}{\bar{z}} \right) \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}. \qquad (5)$$

The translation parameters and Rz are computed by these equations for each Ry and Rx from the training stage. If Ry and Rx are close to the correct values of the test object pose, then the training silhouette and the test silhouette will be matched well, but incorrect Rx and Ry will produce bad matches. So we need a function that measures the quality of silhouette matches in order to find a correct pose. We use the Chamfer Distance (Borgefors [2]) for this task. Its main weakness is low accuracy in cluttered scenes, and various modifications have been proposed (Olson and Huttenlocher [21], Shotton et al. [27]), but we have segmented silhouettes, so this is not a problem in our pipeline. Poses with a large Chamfer Distance are discarded, and non-maximum suppression is applied to remove similar transformations. Several plausible poses remain after these procedures (usually 1-3 per test silhouette). The computed poses are not very accurate, due to the discrete sampling of Ry and Rx and the weak perspective assumption, so they need to be refined.

C. Pose refinement

Surface and silhouette edges are represented by 3D point clouds that should be aligned with a segmented silhouette and the Canny edges of a test image. However, we do not know correspondences between 3D points and edge points. So this is a problem of 3D-2D registration, and we use a robust variant of Levenberg-Marquardt Iterative Closest Points (LM-ICP, Fitzgibbon [7]) to solve it. The result of this algorithm is the refined pose.

D. Ambiguity of poses

The problem of pose estimation from one monocular image has an inherent ambiguity for transparent objects: there exist significantly different poses that have very similar projections onto a test image. For example, see Fig. 5. After a while you can see two different poses in this image: either the wineglass is upside-down or it is lying on its side on the table.

Fig. 5. Ambiguous pose. There are two possible interpretations: either the wine glass is turned upside-down or it is laid on its side. This ambiguity wouldn't arise if the object was opaque.

Also see the projected object model for a pose in Fig. 6a. It appears to be a good pose because model points are projected onto the object and model edges are aligned with test edges too. But a point cloud of this scene, presented in Fig. 6b, shows that in fact this pose is not correct.

Fig. 6. (a) Ambiguous projection: the projected model is aligned well with edges of the object, so the projection looks good. (b) Point cloud of this test scene: a view from the side shows that in fact the model is far away from the true pose (the table is below and the object model is above).

Additional information is required to resolve this ambiguity. For example, one solution is to use a second view of the scene from a different viewpoint. However, this solution may require the robot to move, and it complicates the pose estimation pipeline. So instead, we use the assumption that the transparent object stays on a support plane. This allows us to resolve the pose ambiguity, and it can also be used to refine computed poses. Note that this assumption is not inherent to the approach and can easily be relaxed.

This assumption implies that we need to transform a computed pose in such a way that the object is put on the support plane while the projection of the model with the new pose remains similar to the projection with the old pose. Denoting the vector of residual errors for a pose T as err(T) and the starting pose as T0, we need to solve the following

277
problem:
min ||err(T ) − err(T0 )||, (6)
T

with the constraint on the pose T that the object in this pose must stay on the support plane.

We expand err(T) in a Taylor series at the point T_0 and discard the higher-order terms, because T_0 should be close to the correct pose. So ||err(T) − err(T_0)|| ≈ ||JΔT||, where J is the Jacobian and ΔT = T − T_0. Minimization of ||JΔT|| is equivalent to minimization of the dot product (JΔT, JΔT) = ΔT^T J^T J ΔT = ΔT^T Q ΔT, where Q = J^T J.
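This linearized, equality-constrained correction (the support-plane constraints EΔT = d and the KKT system appear as Eqs. 7-8) can be sketched with numpy. The Jacobian J, constraint matrix E and vector d below are illustrative stand-ins, not values from the paper:

```python
import numpy as np

def constrained_pose_update(J, E, d):
    """Solve min ||J @ dT||^2 subject to E @ dT = d by forming the
    KKT system [[Q, E^T], [E, 0]] [dT; lam] = [0; d], with Q = J^T J."""
    n, m = J.shape[1], E.shape[0]
    Q = J.T @ J
    K = np.block([[Q, E.T], [E, np.zeros((m, m))]])
    rhs = np.concatenate([np.zeros(n), d])
    sol = np.linalg.solve(K, rhs)
    return sol[:n]  # dT; the trailing entries are the Lagrange multipliers

# Toy example: a 6-DOF pose increment with two linear constraints
rng = np.random.default_rng(0)
J = rng.standard_normal((20, 6))   # stand-in Jacobian of the residuals
E = np.eye(2, 6)                   # stand-in support-plane constraints
d = np.array([0.01, -0.02])
dT = constrained_pose_update(J, E, d)
assert np.allclose(E @ dT, d)      # the constraints hold exactly
```

The KKT matrix is invertible whenever E has full row rank and Q is positive definite on the null space of E, which holds for the toy Jacobian above.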
Constraints on the pose T consist of constraints on the rotation of the object (the upright direction must be aligned with the plane normal) and on its location (the bottom of the object must belong to the support plane). These are linear constraints on ΔT, which we denote as EΔT = d. So the final problem is:

    min_{ΔT} ΔT^T Q ΔT    (7)
    subject to EΔT = d.

This is a quadratic programming problem with linear equality constraints. The solution of the problem is derived with Lagrange multipliers as the solution to the linear system (Boyd and Vandenberghe [3]):

    [ Q   E^T ] [ ΔT ]   [ 0 ]
    [ E    0  ] [ λ  ] = [ d ].    (8)

Estimated poses are updated with the computed ΔT and further refined by LM-ICP (Fitzgibbon [7]) with the constraint that the object stays on the support plane. The pose that achieves the minimum value of the cost function in LM-ICP is returned as the pose of the object. An example of an estimated pose is shown in Fig. 4b.

E. Recognition technique
Each silhouette can belong to only one object, because we assume transparent objects do not intersect each other in the test image. So we recognize objects by the final values of the cost function in LM-ICP: its minimum value indicates the correct object, the one most similar to this silhouette.

VI. EXPERIMENTS AND RESULTS
We scanned 13 transparent objects and created 3 datasets to evaluate the algorithm. The first dataset had 5 transparent objects and was used to measure the accuracy of pose estimation and recognition. The second and third datasets had 8 other objects and were used to evaluate robotic grasping with uniform and textured backgrounds. The open-source implementation of the system can be obtained from https://github.com/wg-perception/transparent_objects.

A. Pose estimation accuracy
Each of the 5 transparent objects from the first dataset was placed on a table with a uniform background, one object at a time. The table had fiducial markers attached to it, which were used to estimate the ground truth pose. An example of a test image is shown in Fig. 7. About 500 test images were captured (70-120 images per object) at a resolution of 640 × 480.

Fig. 7. Example of a test image used to evaluate pose estimation accuracy.

The results are shown in Fig. 8. Poses are estimated with an accuracy of several millimeters. The ground truth data can have a systematic error of about 4 millimeters due to approximate measurement of the distance to a test object, so we also report the relative translation error to exclude this bias in the ground truth data. Rotation error is not reported because all objects in this dataset have rotational symmetry, and the support-plane constraints then define the rotation of an object completely.

    Object       Success rate   Mean translation error (cm)
                                absolute   relative
    bank         0.99           0.3        0.2
    bottle       0.99           1.2        0.5
    bucket       1.00           0.5        0.5
    glass        0.94           0.4        0.3
    wine glass   0.97           1.1        0.5
    mean         0.98           0.7        0.4

Fig. 8. Accuracy of pose estimation. Success rate is the proportion of cases in which the final pose was closer than 2 cm to the ground truth pose. Translation error is the distance between the estimated location of the object and (a) the ground truth location (absolute error) or (b) the mean estimated location computed from all test images (relative error).

B. Recognition
Recognition was evaluated on the same dataset as pose estimation. However, the algorithm now had to estimate the poses of all objects and return the one with the lowest cost function. Results are shown in Fig. 9. Bank, bucket and glass are correctly recognized in 93% of cases on average. However, the wine glass is not recognized at all. This is due to failures at the glass segmentation step: the wine glass has a very thin leg which is discarded by the morphological operations, so a segmentation of the wine glass without the leg looks like a regular glass. It is interesting to note that pose estimation is still robust to such segmentation errors and has good accuracy for the wine glass, as reported in Fig. 8.

Recognition results show that our algorithm can be applied to the problem of transparent object recognition. However, a more robust segmentation algorithm is required to use it in practice for this task. For example, the segmentation could be improved by using both the invalid-depth cue proposed in this paper and the complementary algorithms proposed by McHenry et al. [18], McHenry and Ponce [17].
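The recognition rule described above (return the object with the lowest final cost) can be sketched as a minimum over per-model fitting costs. The `fit_pose` interface here is hypothetical, standing in for the segmentation-plus-LM-ICP pipeline:

```python
def recognize(silhouette, models, fit_pose):
    """Return (name, pose) of the model whose refined pose gives the
    lowest final cost. fit_pose(silhouette, model) -> (pose, cost) is a
    hypothetical interface, not the paper's actual API."""
    results = {name: fit_pose(silhouette, model)
               for name, model in models.items()}
    best = min(results, key=lambda name: results[name][1])
    return best, results[best][0]

# Toy stand-in: pretend the fitting stage returned these final costs
costs = {"glass": 2.0, "bottle": 0.5, "bucket": 3.1}
name, pose = recognize(None, {n: n for n in costs},
                       lambda s, m: (None, costs[m]))
# name == "bottle": the model with the minimum cost is reported
```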

                bank   bottle   bucket   glass   wine glass
    bank        0.97   0        0        0.03    0
    bottle      0.01   0.57     0        0.42    0
    bucket      0      0        0.84     0.16    0
    glass       0.01   0        0.01     0.98    0
    wine glass  0      0        0        1       0

Fig. 9. Normalized confusion matrix. Actual classes are in the rows and predicted classes are in the columns. The algorithm recognizes bank, bucket and glass in 93% of cases on average but fails to recognize the wine glass due to problems with segmentation of its thin leg.

Fig. 11. Typical images of a transparent object (in the middle) which were used to evaluate robotic grasping: (a) uniform background, (b) cluttered background.

C. Grasping

    Object           Uniform       Non-transparent
                     background    clutter
    tall glass       8             8
    small cup        10            8
    middle cup       10            10
    wine glass       9             4
    yellow cup       9             10
    perfume bottle   9             10
    perfume ball     3             7
    perfume box      8             7
    mean             8.25          8.00

Fig. 12. Number of successful grasps for each object from 10 attempts. Grasps are successful in 80% of cases and this result is robust to complex non-transparent clutter.
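The per-class rates in the normalized confusion matrix of Fig. 9 are its diagonal entries; a quick numpy check reproduces the 93% average reported for bank, bucket and glass:

```python
import numpy as np

# Normalized confusion matrix from Fig. 9 (rows: actual, cols: predicted)
labels = ["bank", "bottle", "bucket", "glass", "wine glass"]
C = np.array([
    [0.97, 0.00, 0.00, 0.03, 0.00],
    [0.01, 0.57, 0.00, 0.42, 0.00],
    [0.00, 0.00, 0.84, 0.16, 0.00],
    [0.01, 0.00, 0.01, 0.98, 0.00],
    [0.00, 0.00, 0.00, 1.00, 0.00],
])
recall = np.diag(C)              # per-class recognition rate
avg = recall[[0, 2, 3]].mean()   # bank, bucket, glass
# avg == 0.93, matching the 93% reported in the text
```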

Fig. 10. Dataset of 8 transparent objects with different shapes, materials and characteristics (one of them is translucent). Test objects for evaluating robotic grasping are at left. Identical copies of these objects, painted with color, are on the right; they were used to create the 3D models with KinectFusion at the training stage.

The dataset of 8 transparent objects was used to evaluate grasping (Fig. 10). A transparent object was placed in front of the robot on a table, in different places. Our pose estimation algorithm was then executed to get the pose of this object, and the computed pose was used to grasp it. The robot had to lift the object by 10 cm; grasping was considered successful if the robot could hold the object for at least 30 seconds.

We evaluated the algorithm in two different conditions. The first experiment had a simple uniform background (Fig. 11a). The second experiment had a challenging background with non-transparent clutter (Fig. 11b): we put the objects on one of the paintings lying on the table.

We made 10 grasp attempts for each object in each experimental setup. Results are presented in Fig. 12. The robot can grasp a transparent object in 80% of cases, which means the algorithm is stable to cluttered backgrounds. However, the results in the two setups differ for the wine glass. There are two possible ways to grasp the objects from the database: from the side or from the top. Grasps from the top are easy because they will be successful even if the gripper is shifted slightly from the correct position. Grasps from the side are very difficult for some objects: for example, the wine glass is almost as big as the maximum width of the PR2 grip, which means that even small errors at any stage of the grasping pipeline (model capture, pose estimation, PR2 calibration) will result in a failed grasp. The wine glass was grasped mostly from the top when the background was uniform, but the robot mostly attempted grasps from the side when the background was cluttered, because the additional objects changed the collision map and other grasp points were selected in the PR2 grasping pipeline (Ciocarlie [5]). So this difference in results is explained not by the difference in background but by these two types of grasps. Results also differ for the perfume ball in the two setups. The perfume ball is a challenging object to grasp because it is heavy and can easily slip out of the gripper, resulting in less stable grasps than for the other objects used; the difference in results is explained by the higher variance of successful grasps for this object.

The results of our algorithm do not change with different backgrounds. This property is very important when dealing with transparent objects, because the various cluttered backgrounds behind transparent objects are one of the main challenges for computer vision algorithms. Our algorithm is robust to clutter because a Kinect depth map is used for segmentation, and the depth map is not affected by the background behind transparent objects.

VII. CONCLUSION
The paper presents new algorithms for segmentation, pose estimation and recognition of transparent objects. A robot is able to grasp 80% of known transparent objects with the proposed algorithm, and this result is robust across non-specular backgrounds behind the objects. Our approach and other existing algorithms for pose estimation (Klank et al. [13], Phillips et al. [23]) cannot handle overlapping transparent objects, so this is the main direction for future work.

ACKNOWLEDGMENTS
We would like to thank Vincent Rabaud (Willow Garage) and Ethan Rublee (Willow Garage) for help with the integration of the algorithm into the object recognition stack and the connection with the PR2 grasping pipeline. We also thank the anonymous reviewers for their useful comments and suggestions that helped to improve the paper.

REFERENCES
[1] P.J. Besl and N.D. McKay. A method for registration of 3-D shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1992.
[2] G. Borgefors. Hierarchical chamfer matching: A parametric edge matching algorithm. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1988.
[3] S.P. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, 2004.
[4] Walon Chiu, Ulf Blanke, and Mario Fritz. Improving the Kinect by Cross-Modal Stereo. In 22nd British Machine Vision Conference (BMVC), 2011.
[5] M. Ciocarlie. Object manipulator package (ROS). http://www.ros.org/wiki/object_manipulator, 2012.
[6] I.L. Dryden and K.V. Mardia. Statistical Shape Analysis, volume 4. John Wiley & Sons, New York, 1998.
[7] A.W. Fitzgibbon. Robust registration of 2D and 3D point sets. Image and Vision Computing, 2003.
[8] M. Fritz, M. Black, G. Bradski, and T. Darrell. An additive latent feature model for transparent object recognition. Advances in Neural Information Processing Systems (NIPS), 2009.
[9] R. Hartley and A. Zisserman. Multiple View Geometry in Computer Vision. Cambridge University Press, 2000.
[10] S. Hinterstoisser, S. Holzer, C. Cagniart, S. Ilic, K. Konolige, N. Navab, and V. Lepetit. Multimodal Templates for Real-Time Detection of Texture-Less Objects in Heavily Cluttered Scenes. In IEEE International Conference on Computer Vision, 2011.
[11] Ivo Ihrke, Kiriakos N. Kutulakos, Hendrik P. A. Lensch, Marcus Magnor, and Wolfgang Heidrich. State of the Art in Transparent and Specular Object Reconstruction. In STAR Proceedings of Eurographics, pages 87-108, 2008.
[12] L.T. Jiang and J.R. Smith. Seashell Effect Pretouch Sensing for Robotic Grasping. In Robotics and Automation (ICRA), IEEE International Conference on, 2012.
[13] U. Klank, D. Carton, and M. Beetz. Transparent Object Detection and Reconstruction on a Mobile Platform. In Robotics and Automation (ICRA), 2011 IEEE International Conference on. IEEE, 2011.
[14] V.R. Kompella and P. Sturm. Detection and avoidance of semi-transparent obstacles using a collective-reward based approach. In Robotics and Automation (ICRA), 2011 IEEE International Conference on. IEEE, 2011.
[15] Zhong Lei, K. Ohno, M. Tsubota, E. Takeuchi, and S. Tadokoro. Transparent object detection using color image and laser reflectance image for mobile manipulator. In Robotics and Biomimetics (ROBIO), 2011 IEEE International Conference on, pages 1-7, Dec. 2011.
[16] M.Y. Liu, O. Tuzel, A. Veeraraghavan, R. Chellappa, A. Agrawal, and H. Okuda. Pose estimation in heavy clutter using a multi-flash camera. In IEEE International Conference on Robotics and Automation (ICRA), 2010.
[17] K. McHenry and J. Ponce. A geodesic active contour framework for finding glass. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2006). IEEE, 2006.
[18] K. McHenry, J. Ponce, and D. Forsyth. Finding glass. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2005). IEEE, 2005.
[19] F. Mériaudeau, R. Rantoson, D. Fofi, and C. Stolz. Review and comparison of non-conventional imaging systems for three-dimensional digitization of transparent objects. Journal of Electronic Imaging, 21:021105, 2012.
[20] R.A. Newcombe, S. Izadi, O. Hilliges, D. Molyneaux, D. Kim, A.J. Davison, P. Kohli, J. Shotton, S. Hodges, and A. Fitzgibbon. KinectFusion: Real-time dense surface mapping and tracking. In 10th IEEE International Symposium on Mixed and Augmented Reality, 2011.
[21] C.F. Olson and D.P. Huttenlocher. Automatic target recognition by matching oriented edge pixels. Image Processing, IEEE Transactions on, 6(1):103-113, 1997.
[22] M. Osadchy, D. Jacobs, and R. Ramamoorthi. Using specularities for recognition. In 9th IEEE International Conference on Computer Vision. IEEE, 2003.
[23] C.J. Phillips, K.G. Derpanis, and K. Daniilidis. A Novel Stereoscopic Cue for Figure-Ground Segregation of Semi-Transparent Objects. In 1st IEEE Workshop on Challenges and Opportunities in Robot Perception, 2011.
[24] M. Quigley and Willow Garage. Robot Operating System (ROS). http://www.ros.org/wiki/, 2012.
[25] C. Rother, V. Kolmogorov, and A. Blake. GrabCut: interactive foreground extraction using iterated graph cuts. ACM Transactions on Graphics (TOG), 2004.
[26] E. Rublee, T. Straszheim, and V. Rabaud. Object recognition. http://www.ros.org/wiki/object_recognition, 2012.
[27] J. Shotton, A. Blake, and R. Cipolla. Multiscale categorical object recognition using contour fragments. IEEE Transactions on Pattern Analysis and Machine Intelligence, pages 1270-1281, 2007. ISSN 0162-8828.
[28] M. Ulrich, C. Wiedemann, and C. Steger. CAD-based recognition of 3D objects in monocular images. In International Conference on Robotics and Automation, volume 1191, page 1198, 2009.
Towards Persistent Localization and Mapping with a
Continuous Appearance-Based Topology
Will Maddern, Michael Milford and Gordon Wyeth
School of Electrical Engineering and Computer Science
Queensland University of Technology, Brisbane, Australia
{w.maddern, michael.milford, gordon.wyeth}@qut.edu.au

Abstract— Appearance-based localization can provide loop closure detection at vast scales regardless of accumulated metric error. However, the computation time and memory requirements of current appearance-based methods scale not only with the size of the environment but also with the operation time of the platform. Additionally, repeated visits to locations will develop multiple competing representations, which will reduce recall performance over time. These properties impose severe restrictions on long-term autonomy for mobile robots, as loop closure performance will inevitably degrade with increased operation time. In this paper we present a graphical extension to CAT-SLAM, a particle filter-based algorithm for appearance-based localization and mapping, to provide constant computation and memory requirements over time and minimal degradation of recall performance during repeated visits to locations. We demonstrate loop closure detection in a large urban environment with capped computation time and memory requirements and performance exceeding previous appearance-based methods by a factor of 2. We discuss the limitations of the algorithm with respect to environment size, appearance change over time and applications in topological planning and navigation for long-term robot operation.

Keywords— Localization, Mapping, SLAM, Vision

This research was supported by an Australian Research Council Special Research Initiative on Thinking Systems, TS0669699, and an Australian Postgraduate Award.

I. INTRODUCTION
Appearance based mapping provides the means to create useful topological and metric maps on resource limited robots by using appearance signatures to identify places, rather than relying on accurate metric sensing. Appearance based methods are popular as a method of detecting loop closure in range based metric maps [1, 2], and also for generating complete topological maps that can be used for path planning and navigation [3]. Appearance may refer more broadly to a robot's sensor signatures [4], but most often refers to a snapshot image of a location from a camera mounted on the robot.

The computer vision community has provided much of the initial impetus in the advent of appearance based SLAM. Advances in image retrieval techniques, such as visual bag-of-words [5], have produced impressive results, but there is an opportunity to take advantage of the robotic context of the appearance based mapping problem. Robots on the move typically have readily available odometric information, or can easily derive odometry from visual information. The incorporation of a motion model into the image retrieval process has the potential to greatly enhance the performance of an appearance based system. In CAT-SLAM [6, 7], the local movement information of the robot is fused with appearance information using a particle filter to dramatically improve the recall of location over the use of appearance information alone.

In this paper, we address the problem of using appearance based methods when revisiting the same location multiple times. Typical appearance-based methods have linear growth in the number of representations as locations are re-visited over and over. Consequently, both computation time and memory usage have unbounded growth over time. This problem is compounded by a corresponding fall in recall performance as multiple representations of a single place compete to be the "correct" or best representation.

The new method, called CAT-Graph, uses a combination of visual appearance and local odometry data as in CAT-SLAM, but fuses multiple visits to the same location into a topological graph-based representation. Localization is performed by propagating particles along the edges in the graph using local motion information and updating particle weights based on local appearance information. Mapping is performed by adding new motion and observation information to the graph as the robot visits new locations. Locations that are revisited are explicitly connected to the most recent location, and particles can traverse these connections to simultaneously evaluate multiple representations of a single place. The number of nodes in the graph is limited by a pruning scheme that keeps the map at constant memory requirements while minimizing information loss and maintaining performance.

Importantly, the algorithm does not rely on global metric information, or attempt to relax the measurements between nodes for global metric consistency. Instead, the topology maintains relative representations between nodes, providing local metric information that improves recall performance. We illustrate that global metric accuracy is not required to incorporate appearance and motion into our mapping and localization algorithm.

The paper proceeds with a review of recent work in appearance-based mapping and localization systems, before presenting the details of the CAT-Graph algorithm. The performance of the algorithm is demonstrated using the well-known New College dataset [8], illustrating marked
improvements in recall performance with capped memory and computation costs.

II. RELATED WORK
Graphical representations in metric SLAM have been studied extensively for many years [9], and pose-graph optimization remains an active area of research [10, 11]. GraphSLAM [12] and other well-known topological SLAM methods form a pose graph of observations connected by edges constrained using relative motion information. However, the goal of these graphical methods is to create an optimal metric map in a global Euclidean frame. The use of relative representations without global embedding has been explored in the VO community [13], where only local metric accuracy is required. Notably, [14] combines local relative geometry without a global frame with topological appearance-based localization using FAB-MAP.

A number of recent algorithms in the field of probabilistic topological mapping approach loop closure and map construction as two parts of the same problem. The approach of [15] finds the optimal set of local metric and appearance information in the current map that best matches the current set of observations and local motion. [16] describes a system where both local metric maps and topological position are used to determine the current location within the hybrid map. A general formulation of this approach is presented in [17] using a Rao-Blackwellised particle filter across a probabilistic topology. However, these approaches have only been demonstrated in small environments.

Appearance-based localization systems do not typically address data association when revisiting locations multiple times, instead creating multiple representations for each location. [18, 19] describe a data association procedure upon loop closure detection, but the later approach adopted in [20] simply adds multiple representations. Large scale appearance-based localization is typically only demonstrated on trajectories that feature one loop closure for each location [21], which does not address the persistence problem. Attempts to improve the recall performance of appearance-based SLAM algorithms such as FAB-MAP typically require additional information not provided by descriptor-based image similarity alone: [1] uses RANSAC to compare feature geometry, [6] requires vehicle odometry information and [22] uses additional laser or stereo image sensors for 3D geometric verification.

Constant computation time and memory requirements for mapping systems have been addressed most extensively in the metric mapping domain. Submaps have been used to achieve constant time map convergence in the approach of [23], with the assumption of repeated robot visits to all parts of the environment. Dynamic submaps have also been used to achieve a constant map update time in the approach of [24]. Occupancy grid mapping approaches typically scale linearly in memory requirements with the area of the map: [25] builds on the occupancy grid approach by forming multiple occupancy maps in parallel, each representing a different timescale, and demonstrated it over a period of five weeks.

Relatively little work on constant memory or computation time mapping has occurred in the appearance-based mapping space. [26] describes a short/long term memory approach to limiting visual bag-of-words location growth, and [27] outlines a clustering-based approach to identify unnecessary views and remove them from the map. [28] presented an information-based approach to node pruning, which removes nodes based on their local visual saliency relative to their neighbors. This method provided constant-memory loop closure detection using CAT-SLAM when operating in a fixed size environment, but did not address frequent location revisiting.

In this paper we explore the challenges of achieving constant computation time and constant memory requirements for mapping and localization during repeated revisits in a fixed size environment using an appearance based system.

III. ALGORITHM DETAILS
The proposed algorithm outlined in this section extends the linear 'trajectory-based' representation of [7] to a generalised graph-based representation. The steps of the algorithm for each new update of control input u_k and visual bag-of-words observation z_k are as follows:
1. Add u_k and z_k to the graph G as node N_k.
2. Update all particle locations in the graph using control input u_k; match to the best existing location in the graph.
3. Update all particle weights using observation z_k: match to the expected appearance at the particle location.
4. Normalise particle weights along with an 'unvisited location' weight to represent the likelihood of a new place.
5. Determine if the particles indicate a loop closure has occurred; if so, create a link from N_k to the particle location.
6. Resample the particles if necessary. Distribute 'unvisited location' particles to random locations in the graph (to combat particle deprivation).
7. If the number of nodes exceeds the maximum, determine node information content and remove the least informative node.

The graph G defines a connected manifold which is locally Euclidean but not embedded in a global Euclidean frame. Nodes N_i in the graph are associated with observations z_i, which take the form of a visual bag-of-words representation of the features visible at location i. Edges e_ij connect node N_i to N_j and are associated with the control input u_ij (and associated covariance matrix Σ_ij) experienced at location i. Localization is performed using a set of n_p particles p_k^(i) which represent the likelihood of revisiting the previously visited location at fraction α^(i) (between 0 and 1) along associated edge e_ij^(i) at time k. Each particle is also associated with a Boolean direction variable d^(i) along with a weight w_k^(i).

A. Local Graph Construction
To perform local operations on the graph we adopt a relative representation, which constructs the local neighbourhood of the graph from the reference frame of particle p_k^(i) to a maximum radius of r. A standard non-linear motion model f(x, u) generates 3DOF Euclidean changes in local pose x from control input u, defined as follows:
    x = [x y θ]^T,  u_k = [Δx Δy Δθ]^T    (1)

The first step in constructing the local graph is to determine the local position x_i^L of node N_i relative to particle p_k^(i). From there a breadth-first search (with fringe F) is performed to find all nodes N_n (and associated local positions x_n^L) and edges e_mn and e_nm within a distance r of particle p_k^(i), as outlined in Algorithm 1. Note that the first node and associated edge to fall outside distance r is still added to the list; this ensures that all edges even partly within the local graph radius are included in E^L. The direction variable d^(i) reverses the initial position x_i^L to represent a particle in a reverse orientation along edge e_ij. A sample local graph is illustrated in Fig. 1(a).

Figure 1 – Local graph and particle update diagram. (a) illustrates the local graph of particle p_k^(i). The graph is constructed from the reference frame of the particle to a fixed radius r. (b) illustrates the local particle update procedure for particle p_k^(i). The particle is locally propagated by control input u_k (plus additive Gaussian noise w_k) to generate the proposed local position x̂_k^(i). Each edge is then tested to find the location x_k^(i) which minimizes the Mahalanobis distance to the proposed particle location x̂_k^(i) using covariance matrix Σ_k. The updated particle location is set to the most likely edge e_jk at fraction α̂.

Algorithm 1 Local Graph Construction
function localGraph( p_k^(i), r ):
    N^L is a list of nodes and local positions
    E^L is a list of edges
    x_i^L := −α^(i) f(0, u_ij), or x_i^L := −α^(i) f([0 0 π], u_ij) if d^(i) = 1
    push {N_i, x_i^L} to fringe F
    until fringe F is empty
        pop {N_n, x_n^L} from fringe F
        add {N_n, x_n^L} to N^L (if N_n not already present)
        for every edge e_mn and e_nm linked to N_n do
            x_m^L := f(x_n^L, u_nm) or x_m^L := f(x_n^L, −u_mn)
            push e_mn or e_nm to E^L
            if |x_m^L| < r then
                push {N_m, x_m^L} to fringe F
            else
                add {N_m, x_m^L} to N^L
            end if
        end for
    end until
    return N^L, E^L

B. Particle Position Update
To follow a localization hypothesis, each particle is propagated along edges in the graph according to the current control input u_k. The proposed local change in pose x̂_k^(i) for each particle is generated by adding Gaussian noise w_k with covariance Σ_k to the position generated by the non-linear motion model f(0, u_k). However, the proposed current location will rarely correspond to a location along an existing edge in the local graph; therefore particles are assigned to the most likely location along an edge in the local graph. Formally, we seek a location x̂_k^L at fraction α̂ along edge e_ij in E^L which minimizes the Mahalanobis distance (using covariance matrix Σ_k) to the proposed particle location x̂_k^(i). We solve for the minimum by differentiating the Mahalanobis distance along an edge e_ij with respect to fraction α̂ and solving for ∂/∂α̂ = 0: if α̂ is between 0 and 1, a local minimum lies along edge e_ij. The weight of each particle is updated based on the Gaussian likelihood function P(x̂_k^L | x̂_k^(i), Σ_k). This process is illustrated in Fig. 1(b) and outlined in Algorithm 2. To reduce computational complexity the graph is only built to the size required to include any edge near the most likely particle location x̂_k^(i): the local graph radius is set to the length of the proposed pose change x̂_k^(i) plus a multiple s of the maximum eigenvalue λ_1 of covariance matrix Σ_k (typically set to 3 standard deviations or more). This represents a worst-case scenario for the difference between the proposed pose change x̂_k^(i) and the nearest edge location x̂_k^L.

Algorithm 2 Particle Position Update
function updateParticlePosition( p_{k−1}^(i), u_k, Σ_k ):
    x̂_k^(i) := f(0, u_k) + N(0, Σ_k)
    N^L, E^L := localGraph( p_{k−1}^(i), |x̂_k^(i)| + s λ_1(Σ_k) )
    for every edge e_ij in E^L from x_i^L to x_j^L in N^L do
        solve ∂/∂α̂ [x_i^L + α̂(x_j^L − x_i^L) − x̂_k^(i)]^T Σ_k^{−1} [x_i^L + α̂(x_j^L − x_i^L) − x̂_k^(i)] = 0 for α̂
        if 0 ≤ α̂ ≤ 1 then
            x̂_k^L := x_i^L + α̂(x_j^L − x_i^L)
            P(x̂_k^L | x̂_k^(i), Σ_k) := exp( −(1/2) [x̂_k^L − x̂_k^(i)]^T Σ_k^{−1} [x̂_k^L − x̂_k^(i)] )
            store α̂, i, j for maximum P
        end if
    end for
    α_k^(i) := α̂ at maximum P,  e_k^(i) := e_ij at maximum P,  ŵ_k^(i) := w_{k−1}^(i) · max P(x̂_k^L | x̂_k^(i), Σ_k)

C. Appearance-based Observation Update
The observation update weights particles based on the likelihood of the current observation given the expected local appearance. The likelihood is calculated by comparing the current visual bag-of-words z_k to the appearance generated by interpolating between observations z_i and z_j, given particle p_k^(i)
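The closed-form minimizer inside Algorithm 2 can be sketched with numpy: differentiating the Mahalanobis distance along the edge gives α̂ = −(vᵀΣ⁻¹r)/(vᵀΣ⁻¹v) with v = x_j − x_i and r = x_i − x̂. The function and variable names below are illustrative, not from the paper's implementation:

```python
import numpy as np

def project_to_edge(x_i, x_j, x_hat, Sigma):
    """Fraction alpha minimizing the Mahalanobis distance from the
    proposed pose x_hat to points x_i + alpha * (x_j - x_i) on the edge,
    plus the resulting edge location and Gaussian likelihood weight."""
    Sinv = np.linalg.inv(Sigma)
    v = x_j - x_i                   # edge direction
    r = x_i - x_hat                 # offset of the edge start from x_hat
    alpha = -(v @ Sinv @ r) / (v @ Sinv @ v)
    x_L = x_i + alpha * v           # candidate location on the edge
    d2 = (x_L - x_hat) @ Sinv @ (x_L - x_hat)
    return alpha, x_L, np.exp(-0.5 * d2)

x_i = np.array([0.0, 0.0, 0.0])     # edge start (x, y, theta)
x_j = np.array([2.0, 0.0, 0.0])     # edge end
x_hat = np.array([1.0, 0.5, 0.0])   # proposed particle pose
Sigma = np.diag([0.1, 0.1, 0.05])
alpha, x_L, w = project_to_edge(x_i, x_j, x_hat, Sigma)
# alpha == 0.5: the closest edge point lies directly beside x_hat
```

Algorithm 2 additionally keeps only solutions with 0 ≤ α̂ ≤ 1 and retains the maximum-likelihood edge over all edges in the local graph.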
is at fraction α (i ) on edge eij at time k. The weight of each
particle is updated as follows:

ŵk(i ) := ŵk(i ) P ( z k | z i , z j , α (i ) ) (3)

The form of this likelihood function is derived from FAB-


MAP and illustrated in full in [7].
D. New Place Detection
Since the particle set pk is constrained to exist only along edges e in the graph, it can only represent location hypotheses for previously visited locations. To determine whether the current observation and motion information indicates the vehicle is in a previously unvisited location, we sample from an ‘unvisited’ location Ḡ not on the graph G.

    P(Ḡ | zk, uk) = P(zk | Ḡ) P(Ḡ | uk)    (4)

The observation and motion distributions for an unvisited location Ḡ can be approximated using information from training data as follows:

    P(zk | Ḡ) P(Ḡ | uk) ≈ P(zk | zavg) P(uavg | uk)    (5)

zavg represents an ‘average’ observation and uavg an ‘average’ control input. These are determined using the mean field approximation or a sampling method, both presented in [8]. The proposed weighting assigned to a location not on the graph is given as follows:

    ŵknew = (1/n) P(zk | zavg) P(uavg | uk)    (6)

The new location weight is denoted by wnew. Note that it is not recursively updated; this represents a uniform likelihood of departing the graph at any point in time.

E. Resampling and Loop Closure Detection
Particle resampling is performed using the Select with Replacement method as in [7]. Any particles selected to replace the new location weight are sampled to a random edge on the graph (with a random direction). This serves to counteract the effects of particle deprivation, since the proportion of particles sampled to random locations on the graph increases as the new place likelihood increases, thereby increasing the probability of detecting loop closures without requiring evaluation of every previously visited location.

To determine the most likely location hypothesis from the distribution of particles a spatially selective method is used, equivalent to integrating the probability distribution over a local area in the graph. For every particle pk(i), the location hypothesis P(pk(i)) is equal to the sum of the weights of all particles within distance dh of the current particle within its local graph. The value of dh is selected based on the desired resolution of loop closure detection, and as such the location hypothesis is not subject to arbitrary location discretization due to local visual saliency. If the maximum location hypothesis exceeds a threshold T, a new graph edge is added between the current and matched location. This process is outlined in Algorithm 3.

Figure 2 – Node information content and node pruning diagram. (a) illustrates the calculation of most likely local position x̂iL at fraction α̂ on the hypothetical edge ejk between node Nj and Nk. This location is used to generate an expected appearance which is compared to observation zi at node Ni to determine the information content Ii. (b) illustrates the connectivity of the local graph after node Ni has been removed.

Algorithm 3 Loop Closure Detection
for every particle pk(i) do
    NL, EL := localGraph(pk(i), dh)
    for every particle pk(j) on every edge eij in EL do
        P(pk(i)) := Σ wk(j)
    end for
end for
if max P(pk(i)) > T then
    create edge eki using uk − αk(i) uij, Σk + Σij
end if

The addition of a new edge on a loop closure event is crucial for increasing recall on repeated traverses of a route; it allows particles to simultaneously evaluate multiple representations of a location while recognising that all representations correspond to the same physical location (since the local graph construction will connect both locations).

F. Local Information-based Pruning
To limit the map to a fixed maximum size we extend the information-based trajectory pruning approach of [28] to a graphical representation. The pruning stage is performed before each particle update process. For each new node added to the graph, if the total number of nodes in the graph exceeds a preset number, the node with the lowest information content relative to its neighbors is removed and replaced with a direct link between its neighbors.

To find the information content of a node Ni relative to its neighbors Nj and Nk we compare the observation zi to that generated by interpolating neighboring observations zj and zk along a proposed edge ejk that bypasses node Ni. If the proposed edge ejk produces an adequate representation for observation zi, then node Ni can be removed with minimal loss of information. This process is illustrated in Fig. 2 (a) and outlined in Algorithm 4.
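The pruning stage described above can be sketched in simplified form: we score each interior node by how far it deviates from the straight bypass edge between its neighbours, and remove the lowest-scoring node. This sketch keeps only the odometric term of the information content (the appearance term P(zi | zj, zk, α̂) is omitted) and assumes an isotropic position covariance; function names are ours.

```python
import numpy as np

def bypass_info(p_prev, p, p_next, sigma2=1.0):
    """Odometric part of the information content I_i: squared Mahalanobis
    deviation (isotropic covariance sigma2 assumed) of node p from the
    proposed bypass edge p_prev -> p_next."""
    d = p_next - p_prev
    alpha = float(d @ (p - p_prev)) / float(d @ d)  # closed-form minimiser
    alpha = min(max(alpha, 0.0), 1.0)               # clamp to the edge
    r = p_prev + alpha * d - p
    return 0.5 * float(r @ r) / sigma2

def prune_lowest(nodes, sigma2=1.0):
    """Drop the interior node whose removal loses the least information,
    linking its neighbours directly; returns the kept nodes and the index
    that was removed."""
    infos = [(bypass_info(nodes[i - 1], nodes[i], nodes[i + 1], sigma2), i)
             for i in range(1, len(nodes) - 1)]
    _, i = min(infos)
    return nodes[:i] + nodes[i + 1:], i
```

In the full algorithm the score also includes the appearance match against the interpolated observation, and nodes with multiple neighbours require evaluating every bypass-edge pair.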
Algorithm 4 Node Information Content
for every node Ni do
    for every edge pair eji from Nj and eik to Nk do
        xjL := f(0, −uji), xkL := f(0, uik), Σjk := Σji + Σik
        solve ∂/∂α̂ [xjL + α̂ (xkL − xjL)]T Σjk−1 [xjL + α̂ (xkL − xjL)] = 0
        x̂iL := xjL + α̂ (xkL − xjL)
        P(x̂iL | xjL, xkL) := exp(−½ [x̂iL]T Σjk−1 [x̂iL])
        Ii = −log P(zi | zj, zk, α̂) P(x̂iL | xjL, xkL)
    end for
end for

Algorithm 5 Node Pruning
if k > nn then
    find Ni with minimum Ii between nodes Nj and Nk
    create edge ejk between Nj and Nk using uji + uik, Σji + Σik
    for every edge eni from Nn to Ni where n ≠ j do
        create edge enj using uji − uni, Σji + Σni
    end for
    for every edge eim from Ni to Nm where m ≠ k do
        create edge ekm using uim − uik, Σim + Σik
    end for
    delete node Ni and observation zi, edges eji and eik
end if

The information content Ii of node Ni is defined as the negative log-likelihood of the odometric and appearance-based match between node Ni and the proposed location x̂iL along the proposed edge between Nj and Nk. Unlike in [28], nodes may have multiple neighbors due to explicit loop closure events, and therefore the information content of all proposed edges between neighbors must be evaluated to find the bypassing edge with highest information content.

To avoid unbounded growth in storage requirements, nodes are removed from the graph once the total number of nodes exceeds a threshold nn. The node Ni with minimum information content Ii is deleted from the graph, and the proposed edge ejk between neighbors Nj and Nk is added. All other nodes connected to Ni are re-routed to neighbors Nj and Nk to preserve the connectivity of the graph. Particles previously on edges connected to Ni are relocated to the new edges. This process is illustrated in Fig. 2 (b) and outlined in Algorithm 5.

IV. EXPERIMENTAL SETUP

To facilitate a direct comparison to FAB-MAP and previous implementations of CAT-SLAM, we use the same evaluation dataset as that presented in [6]. To differentiate the algorithms we refer to the novel representation presented in this paper as a Continuous Appearance-based Topological Graph, or CAT-Graph.

A. Testing Environment
The urban dataset used for this evaluation is presented in [8]. It consists of 8127 panoramic images from a Point Grey Ladybug2 camera with accompanying wheel odometry (from shaft encoders on the Segway RMP) and GPS data logged at 5Hz. The route taken is a 2.5km tour of the grounds of New College, Oxford, pictured in Fig. 3, with multiple traversals of each location in both forward and reverse directions (a total of 5 traversals of the quadrangle area). Ground truth is nominally provided by GPS locations; however, the signal is degraded in many locations throughout the urban dataset (particularly through a tunnel between courtyards). Approximately 45% of the panoramic images have an associated valid GPS position; recall data for the precision-recall curves is instead based on a hand-corrected trajectory from wheel odometry and manual loop closures, which provides a topologically correct location for every frame.

Figure 3 – Aerial view of test environment with route overlaid.

TABLE 1 – ALGORITHM PARAMETERS
openFABMAP:
    p(zi = 1 | ei = 0)              0
    p(zi = 0 | ei = 1)              0.61
    p(Lnew | Zk−1)                  0.9
CAT-SLAM:
    p(zi = 1 | ei = 0)              0
    p(zi = 0 | ei = 1)              0.61
    Number of Particles N           2000
    ESS Threshold                   0.25
    Distribution Radius r           2.5m
CAT-Graph:
    p(zi = 1 | ei = 0)              0
    p(zi = 0 | ei = 1)              0.61
    Number of Particles np          1000
    Number of Nodes nn              2000
    Particle Update Graph Size s    3σ
    ESS Threshold                   0.25
    Distribution Radius dh          2.5m
    Hypothesis Threshold T          0.9

B. Algorithm Parameters
As FAB-MAP, CAT-SLAM and CAT-Graph only require appearance information, no camera calibration or image registration is required. Feature descriptors are extracted using SURF [29], and a fast approximate nearest neighbour algorithm [30] was used to find the corresponding visual word for each descriptor.

The FAB-MAP implementation used for comparison is openFABMAP [31], which produces results comparable to those presented in [2]. Parameters for the detector functions of FAB-MAP were taken from [19]. The parameters and results for CAT-SLAM on the same dataset are taken from [6]. The
codebook was generated using modified sequential clustering
[32] yielding 5000 visual words. Parameters for the three
algorithms are presented in Table 1.
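The descriptor-to-visual-word step can be sketched as a nearest-neighbour assignment against the codebook. Brute-force exact search is shown here for clarity; the systems above use an approximate method [30], and the function names are ours.

```python
import numpy as np

def quantize(descriptors, codebook):
    """Map each feature descriptor to the index of its nearest visual word
    (exact nearest neighbour by squared Euclidean distance)."""
    # pairwise squared distances: |d|^2 - 2 d.c + |c|^2
    d2 = (np.sum(descriptors ** 2, axis=1)[:, None]
          - 2.0 * descriptors @ codebook.T
          + np.sum(codebook ** 2, axis=1)[None, :])
    return np.argmin(d2, axis=1)

def bag_of_words(word_ids, vocab_size):
    """Binary word-occurrence vector of the kind used by FAB-MAP-style
    observation likelihoods."""
    z = np.zeros(vocab_size, dtype=bool)
    z[word_ids] = True
    return z
```

With a 5000-word codebook this produces the binary observation vector zk that the detector functions operate on.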
V. RESULTS

A. Precision-Recall Performance
To assess the performance of the CAT-Graph algorithm in comparison to openFABMAP and CAT-SLAM, we examine the precision-recall curves they produce for the test environment. Expected matches are defined as previously visited locations within 7.5m of the current location. The desired performance is high recall at 100% precision.

Fig. 4 presents the precision-recall curves produced by openFABMAP, CAT-SLAM and two variants of the CAT-Graph algorithm; one using 1000 particles and one with 1000 particles and a limit of 2000 nodes. Both CAT-Graph variants provide almost double the recall of CAT-SLAM and 7 times the recall of openFABMAP at 100% precision, but do not differ significantly from each other despite the difference in memory scaling. This demonstrates the effectiveness of the information-based node-pruning algorithm; for this environment, localization performance is not significantly affected despite the removal of approximately 5000 observations from the graph.

Figure 4 – Precision-Recall curve for four algorithm variants on the New College dataset. The two CAT-Graph variants provide greatly increased recall performance over CAT-SLAM and openFABMAP despite the differences in computational and memory scaling.

B. Loop Closure Distribution
To assess the effects of using loop closure events to inform the topological graph construction (and therefore to examine the improvement gained over other appearance-based SLAM algorithms which do not explicitly perform data association) we examine the loop closure distribution for all four algorithms. Fig. 6 shows the loop closures detected at 100% precision projected onto the hand-corrected ground truth.

Both variants of CAT-Graph detect significantly more loop closures than openFABMAP and CAT-SLAM, and the distribution of loop closures is uniform (they are not concentrated at any particular location but rather spread across the environment).

C. Node Information Content
Fig. 5 illustrates a pair of frame sequences at different locations in the environment. The first sequence involves the robot moving under an overhanging tree and passing a distinctive building, while the second involves the robot travelling through a dark, featureless tunnel. As shown below the sequences, the information content for the central frame in the second sequence is significantly lower than that calculated for the central frame in the first sequence.

Figure 5 – Information content illustration. (a) illustrates a sequence of nodes with high information content (Ii = 27.64). In this case, close proximity to buildings and trees causes a high degree of difference between sequential frames, yielding high relative information. (b) illustrates a sequence through a tunnel (Ii = 6.71), with very little visual change between successive frames and therefore a low information content.

D. Computational and Memory Scaling
Fig. 7 presents the computation and storage requirements for openFABMAP, CAT-SLAM and the two CAT-Graph variants. The computation time does not include feature extraction and visual word classification (on average 800ms per frame), as these will be identical for all three algorithms. The difference in computational scaling is clear, with openFABMAP reduced to update rates below 1Hz by the end of the dataset due to the linear increase in computational requirements. The CAT-Graph configuration with 2000 nodes requires the greatest amount of time per update (as the information content of each node is assessed), but still remains approximately constant over the length of the dataset. Spikes in the computation graph are due to the evaluation of nodes with multiple neighbors, as the information content between every combination of neighbors is assessed.

The storage requirements scale linearly with operation time for all but the CAT-Graph configuration with limited nodes. All algorithms require the codebook and Chow-Liu tree from training data, which contribute to the initial ~4MB at the start of the dataset. Storage requirements initially increase linearly for both algorithms; CAT-SLAM requires additional storage for location and odometry information for the continuous trajectory as well as particle locations, weights and directions. However, once the number of locations reaches 2000, CAT-Graph with limited nodes does not require more storage, whereas openFABMAP and CAT-SLAM memory requirements continue to increase linearly with the number of frames.
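The evaluation metric used throughout, recall at 100% precision, can be computed directly from per-frame match scores and ground-truth labels. A minimal sketch (the scores, labels, and threshold sweep are illustrative, not the paper's data):

```python
def precision_recall(scores, labels, thresholds):
    """Precision/recall pairs for the decision rule score >= t, plus the
    maximum recall achievable with zero false positives (100% precision)."""
    curve, best_recall = [], 0.0
    n_true = sum(labels)  # number of ground-truth positive matches
    for t in thresholds:
        tp = sum(1 for s, l in zip(scores, labels) if s >= t and l)
        fp = sum(1 for s, l in zip(scores, labels) if s >= t and not l)
        if tp + fp == 0:
            continue  # no detections at this threshold
        p, r = tp / (tp + fp), tp / n_true
        curve.append((p, r))
        if fp == 0:
            best_recall = max(best_recall, r)
    return curve, best_recall
```

Sweeping the hypothesis threshold over the full dataset in this way yields curves like those in Fig. 4.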
Figure 7 – Computation time and memory requirements for openFABMAP, CAT-SLAM and CAT-Graph. All CAT-SLAM and CAT-Graph variants provide constant computational time scaling, and the CAT-Graph variant with limited nodes provides constant memory scaling over time.

Figure 6 – Loop closure distribution at 100% precision overlaid on hand-corrected ground truth based on wheel odometry information. CAT-Graph detects consistently more loop closures than both openFABMAP and CAT-SLAM in all areas of the environment.

Loop closure events in CAT-Graph also increase the number of edges in the map and therefore memory requirements. However, the worst case memory requirement for storage of edges arises when an edge exists for every node pair, making it O(N2) in the number of nodes; the worst case edge storage requirements do not grow over time if the number of nodes is fixed.

VI. DISCUSSION

The use of a topological graph and local metric representations along with loop-closure-informed graph links provides CAT-Graph with significant performance increases over CAT-SLAM and openFABMAP. In this section we discuss the key insights gained from this work and directions for future work.

A. Improved Loop Closure Through Hypothesis Aggregation
The ability to combine multiple hypotheses for multiple representations of the same location is crucial for long-term operation in a fixed-size environment. CAT-Graph consistently detects more loop closures over successive revisits to the same location, whereas CAT-SLAM develops multiple separate representations of the same trajectory but does not identify that all correspond to the same location. Unlike CAT-SLAM, where location hypotheses compete upon revisiting a location, by creating explicit links between locations when loop closures are detected, multiple hypotheses from multiple representations of the same location support, rather than compete with, each other.

B. Saliency-based Map Pruning
The node pruning scheme enables localization and mapping in a fixed-size environment without memory requirements increasing linearly with operation time. Nodes are removed based on visual saliency, with less distinctive areas of the environment generating fewer nodes. The selection of the maximum number of nodes nn depends on the memory available to the robot and the visual saliency of the environment. Experiments presented in [28] for the CAT-SLAM algorithm illustrate a graceful degradation of recall performance with reduced numbers of nodes and particles; however, determining the absolute minimum number of nodes
sufficient to fully represent a particular fixed-size environment remains an open problem, and the constant-memory approach will fail if the robot continuously explores new locations. When maintaining a constant-memory map in a fixed-size environment, removing the least informative nodes ensures future localization performance is minimally affected.

C. Future Work
Incorporating loop closure events into graph construction provides performance benefits when repeatedly traversing the same location. However, in order for the loop closure events to be detected, the appearance of the environment cannot change significantly between visits. While FAB-MAP provides a level of robustness, matching still fails over large changes in environmental appearance, such as experienced during the course of a day [33]. Explicit modeling of appearance change over time as in [34] could enable persistent localization and mapping over longer time periods.

Along with its mapping and localization capabilities, the graph-based representation of CAT-Graph forms a suitable basis for mobile robot path planning and navigation. A world representation which is globally topological and locally metric, such as that provided by CAT-Graph, is suitable for autonomous robot operations as demonstrated in [3]. To use the global topological plan for robot navigation, it must be integrated with a local planning mechanism. This can be accomplished by generating the local metric graph representation at the current location, and planning a local metric path towards the nodes selected by the global topological planner, such as in [28].

In conclusion, we believe that the use of loop closure events to inform future appearance-based matches, along with a framework for constant computational and memory resource usage, are important steps towards persistent robot autonomy.

REFERENCES
[1] K. Konolige et al., "View-based maps." The International Journal of Robotics Research, 2009. 29(8): p. 1-17.
[2] P. Newman et al., "Navigating, Recognizing and Describing Urban Spaces With Vision and Lasers." The International Journal of Robotics Research, 2009. 28(11-12): p. 1406.
[3] M. Milford and G. Wyeth, "Persistent Navigation and Mapping using a Biologically Inspired SLAM System." The International Journal of Robotics Research, 2010. 29: p. 1131-1153.
[4] P.E. Rybski et al., "Appearance-based minimalistic metric SLAM." in IEEE International Conference on Intelligent Robots and Systems, Las Vegas, NV, 2003.
[5] J. Sivic and A. Zisserman, "Video Google: A text retrieval approach to object matching in videos." in IEEE International Conference on Computer Vision. 2003. Nice, France.
[6] W. Maddern, M. Milford, and G. Wyeth, "Continuous Appearance-based Trajectory SLAM." in IEEE International Conference on Robotics and Automation. 2011. Shanghai, China.
[7] W. Maddern, M. Milford and G. Wyeth, "CAT-SLAM: Probabilistic localization and mapping using a continuous appearance-based trajectory." The International Journal of Robotics Research, 2012. 31(4): p. 429-451.
[8] M. Smith et al., "The new college vision and laser data set." The International Journal of Robotics Research, 2009. 28(5): p. 595.
[9] S. Thrun and J. Leonard, "Simultaneous Localization and Mapping," in Springer Handbook of Robotics, B. Siciliano, Editor. 2008, Springer Berlin Heidelberg.
[10] M. Kaess et al., "iSAM2: Incremental smoothing and mapping with fluid relinearization and incremental variable reordering." in IEEE International Conference on Robotics and Automation. 2011. Shanghai, China.
[11] R. Kummerle et al., "g2o: A general framework for graph optimization." in IEEE International Conference on Robotics and Automation. 2011. Shanghai, China.
[12] S. Thrun and M. Montemerlo, "The graph SLAM algorithm with applications to large-scale mapping of urban structures." The International Journal of Robotics Research, 2006. 25(5-6): p. 403.
[13] D. Nister, O. Naroditsky, and J. Bergen, "Visual odometry." 2004: IEEE.
[14] G. Sibley et al., "Vast-scale Outdoor Navigation Using Adaptive Relative Bundle Adjustment." The International Journal of Robotics Research, 2010.
[15] E. Olson, "Robust and efficient robotic mapping." PhD Thesis. 2008, Massachusetts Institute of Technology.
[16] J. Blanco, J. Fernandez-Madrigal, and J. Gonzales, "Toward a Unified Bayesian Approach to Hybrid Metric-Topological SLAM." IEEE Transactions on Robotics, 2008. 24(2): p. 259-270.
[17] A. Ranganathan and F. Dellaert, "Online Probabilistic Topological Mapping." The International Journal of Robotics Research, 2011.
[18] M. Cummins and P. Newman, "Probabilistic appearance based navigation and loop closing." in IEEE International Conference on Robotics and Automation. 2007. Rome, Italy.
[19] M. Cummins and P. Newman, "FAB-MAP: Probabilistic localization and mapping in the space of appearance." The International Journal of Robotics Research, 2008. 27(6): p. 647.
[20] M. Cummins and P. Newman, "Appearance-only SLAM at large scale with FAB-MAP 2.0." The International Journal of Robotics Research, 2010.
[21] M. Cummins and P. Newman, "Highly scalable appearance-only SLAM – FAB-MAP 2.0." in Robotics: Science and Systems Conference. 2009. Seattle, Washington.
[22] R. Paul and P. Newman, "FAB-MAP 3D: Topological Mapping with Spatial and Visual Appearance." in IEEE International Conference on Robotics and Automation. 2010. Anchorage, Alaska.
[23] J. Leonard and P. Newman, "Consistent, convergent, and constant-time SLAM." in IJCAI, Acapulco, Mexico, 2003.
[24] S. Thrun et al., "SLAM updates require constant time." in Workshop on the Algorithmic Foundations of Robotics, 2002.
[25] P. Biber and T. Duckett, "Experimental analysis of sample-based maps for long-term SLAM." The International Journal of Robotics Research, 2009. 28(1): p. 20-33.
[26] M. Labbe and F. Michaud, "Memory management for real-time appearance-based loop closure detection." in IEEE/RSJ International Conference on Intelligent Robots and Systems, San Francisco, CA, 2011.
[27] K. Konolige and J. Bowman, "Towards lifelong visual maps." in IEEE/RSJ International Conference on Intelligent Robots and Systems, St. Louis, MO, 2009.
[28] W. Maddern, M. Milford and G. Wyeth, "Capping computation time and storage requirements for appearance-based localization with CAT-SLAM." in IEEE International Conference on Robotics and Automation, Minneapolis, MN, 2012.
[29] H. Bay, T. Tuytelaars, and L. Van Gool, "SURF: Speeded Up Robust Features." in European Conference on Computer Vision. 2006. Graz, Austria: Springer.
[30] S. Arya et al., "An optimal algorithm for approximate nearest neighbor searching fixed dimensions." Journal of the ACM, 1998. 45(6): p. 891-923.
[31] A. Glover et al., "OpenFABMAP: An Open Source Toolbox for Appearance-based Loop Closure Detection." in IEEE International Conference on Robotics and Automation. 2012. St Paul, United States: IEEE.
[32] A. Teynor and H. Burkhardt, "Fast Codebook Generation by Sequential Data Analysis for Object Classification." Advances in Visual Computing, 2007: p. 610-620.
[33] A. Glover et al., "FAB-MAP + RatSLAM: Appearance-based SLAM for Multiple Times of Day." in IEEE International Conference on Robotics and Automation. 2010. Anchorage, Alaska.
[34] F. Dayoub and T. Duckett, "An adaptive appearance-based map for long-term topological localization of mobile robots." IEEE/RSJ International Conference on Intelligent Robots and Systems, 2008.

Robust Navigation Execution by Planning in Belief
Space
Bhaskara Marthi
Willow Garage, Inc.
Menlo Park, CA 94025
Email: [email protected]

Abstract—We consider robot navigation in environments given a known static map, but where dynamic obstacles of varying and unknown lifespans appear and disappear over time. We describe a roadmap-based formulation of the problem that takes the sensing and transition uncertainty into account, and an efficient online planner for this problem. The planner displays behaviors such as persistence and obstacle timeouts that would normally be hardcoded into an executive. It is also able to make inferences about obstacle types even with impoverished sensors. We present empirical results on simulated domains and on a PR2 robot.

I. INTRODUCTION

Navigation given a known map is a fundamental capability for mobile robots, and there exist practical and efficient methods for subproblems such as localization and path-planning. Less well-understood is how to put these pieces together such that the overall system acts robustly in the world. Define a navigation system to be a program that takes in a known static map, localization information, sensor data, and navigation goals (represented as positions in space), and controls lower level systems (say via velocity commands of the form (ẋ, ẏ, θ̇)). Given a correct map of a static indoor environment (i.e., with no new obstacles besides the ones in the map), a navigation system is easy to write. For example, we could take localization input from the AMCL algorithm [24], call a planner such as A* [21] on a discretized grid to produce a path, then use a trajectory following algorithm such as DWA [1] to generate velocity commands that will follow that path. An autonomous robot in an unconstrained real-world environment such as an office must, however, also deal with various sorts of changes to the map. A person might be standing in a doorway, a couch may have temporarily been moved into a corridor, rendering it impassable, and so on. Navigation systems should behave efficiently and sensibly given that such dynamic obstacles appear (and disappear) over time.

A standard approach is to maintain a map that is updated in some manner given obstacles. Getting this to work robustly is surprisingly tricky. Consider the prototypical example shown in Figure 1. Here the shortest way to the goal is to go down the narrow hallway. The next best alternative is to go around the building, which takes ten times as long. Now suppose a set of obstacle points appear in the hallway at position A, making it impossible to traverse. Certainly, the robot should not stand and wait forever. The map must therefore be updated with this obstacle, causing the robot to eventually choose the long path.

Fig. 1. An example navigation problem. The goal is to get to G from S.

Also, the robot must not suddenly change its mind when it is, say, at position B, and turn back around. This requires either making obstacles fairly long-lived, or hardcoding “persistence” of some sort into the system. In the first case, there needs to be some scheme for eventually timing out obstacles, to avoid the possibility, e.g., of a corridor being considered permanently out of bounds due to seeing a person blocking it once. Finally, the algorithm should deal intelligently with different types of obstacles. It might, for example, make sense to wait for a person to move, but not for a couch. Ideally, perception would give us this information. But even if, as is currently the norm, perception is fairly impoverished/noisy, there is still the possibility of a kind of implicit sensing of the obstacle type: in the example, it might make sense to just wait for a few seconds. If the obstacle disappears, the robot can take the short path after all. If it stays where it is, it is likely to be static and the robot should take the long route.

The usual way to achieve robust and efficient behavior in such cases is to have a program known as an executive sitting above the planners [16]. The executive maintains its own local state and contains various hand-coded procedures, which are intended to avoid infinite loops and dead-ends and to appear intelligent and goal-directed. As a typical example, we consider the executive from the ROS navigation stack [11], a popular open source navigation system. Table I shows the set of local state variables maintained by this executive.
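A mode-based executive of the kind Table I supports can be sketched as a small state machine. This is a schematic illustration only, with timeout values and names of our choosing, not the actual ROS implementation:

```python
import time

# Illustrative timeout values; the real navigation stack's parameters differ.
PLANNER_PATIENCE_S = 5.0

class Executive:
    """Schematic three-mode executive (planning / controlling / clearing)
    in the style described in the text; not the actual ROS code."""
    def __init__(self, recovery_behaviors):
        self.mode = "planning"
        self.recovery_behaviors = recovery_behaviors
        self.recovery_index = 0
        self.last_valid_plan = time.time()

    def step(self, plan_found, control_found):
        now = time.time()
        if self.mode == "planning":
            if plan_found:
                self.mode = "controlling"
                self.last_valid_plan = now
            elif now - self.last_valid_plan > PLANNER_PATIENCE_S:
                self.mode = "clearing"   # recovery-cause: no-valid-plan
        elif self.mode == "controlling":
            if not control_found:
                self.mode = "clearing"   # recovery-cause: no-valid-control
        elif self.mode == "clearing":
            # run the next user-specified recovery behavior, then replan
            self.recovery_behaviors[self.recovery_index]()
            self.recovery_index = (self.recovery_index + 1) % len(self.recovery_behaviors)
            self.mode = "planning"
        return self.mode
```

Even this stripped-down version shows how behavior depends on hidden local state, which is the complexity the paper argues against.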
State variable Meaning
mode Overall mode: planning, controlling, or clearing we will show, the optimal policy for this POMDP automati-
time The current time cally exhibits the various behaviors discussed above, such as
last-valid-plan Timestamp when planning last succeeded
last-valid-control Timestamp when a valid velocity last found
persistence, patience, and implicit sensing, without having to
oscillation-pose Reference pose used to detect oscillations hardcode them into an executive.
oscillation-timestamp Timestamp for oscillation pose
recovery-index Which recovery behavior was last tried
The scientific contributions include efficient algorithms for
recovery-cause Cause of last failure. One of no-valid-plan, state-updates (Section IV) and planning (Section V) in this
no-valid-control, or oscillation POMDP. At the systems level, the main contribution is show-
ing by example that it is possible to build a robust navigation
TABLE I
S TATE VARIABLES OF THE EXECUTIVE IN THE ROS NAVIGATION STACK system with failure recovery using a principled decision-
theoretic planning approach, rather than scripted behaviors.
Section VI describes the implemented system and empirical
results on both simulated and real-world domains.
idea is that the overall system can be in one of three modes.
In “planning” mode, the robot is stationary and waiting for a II. BACKGROUND
plan. In “controlling” mode, the robot has a plan and is sending A. MDPs
velocity commands at some rate. Finally, in “clearing” mode, An undiscounted Markov decision process [17], or MDP,
something has gone wrong, and one of a user-specified set consists of a state space S, action set A, transition model
of recovery behaviors is being executed. These may include P (s |s, a), reward function R(s, a, s ), and terminal states
removing obstacles from the map beyond some distance, or T ⊂ S. A stationary policy is a function π : S → A. A
rotating in place. Several other variables govern the transitions policy induces a distribution over state-action trajectories that
between these modes, as shown in Table I. continue until reaching a terminal state. The value function of
The ROS navigation system has achieved impressive perfor- a policy V π (s) is the expected total reward for following π
mance in real-world demonstrations, such as navigating with starting at s, and the Q-function Qπ (s, a) is the expected total
no human intervention for eleven days in an office environ- reward for doing a in s, then following π. The optimal policy
ment. But there are a few disadvantages to using a complicated π ∗ maximizes the value at all states, and we write V and Q for
and stateful executive. First, there is significant programmer its value and Q-functions. In our examples, the set of actions
effort involved in coming up with the execution strategy, varies depending on the state, but this can be represented by
which is usually done in a trial-and-error fashion. Second, making nonapplicable actions have reward −∞.
understanding the system’s behavior requires knowing all the
state variables in Table I and their interactions with each other B. POMDPs
and with the recovery behaviors. This makes it hard to debug or modify. It also means the planning algorithms have an inaccurate model of how their plans will be executed. Finally, the executive is not capable of certain types of behavior, such as information-seeking actions.

We propose an alternative way of structuring navigation systems. The idea is to explicitly model and plan for the uncertainty in the world, then use a simple stateless executive that just follows the resulting plan. There are many ways to formulate the planning problem, some of which we discuss in Section III. One possibility is to model the transition uncertainty with costs or probabilities, while still pretending that the state is fully observable. These approaches are systematically suboptimal, though, as they will not take information-gathering actions. Alternatively, we can explicitly model the sensing uncertainty, using a partially observable Markov decision process (POMDP). POMDPs are notoriously difficult to solve, however, and the POMDP in our case has size exponential in the number of possible obstacle locations.

In Section III, we present a particular formulation of the problem as a POMDP, and an algorithm for solving it efficiently enough to run online on a mobile robot. The formulation is based on a roadmap over the static map, where nodes represent particular locations and edges are local paths that can become blocked and unblocked over time. Obstacles may belong to different classes, with varying lifetimes.

A partially observable Markov decision process [6], or POMDP, is like an MDP except that states are not directly observed. Instead, there is an observation distribution Z(o|s, a, s′) over the observation that is received when making the transition from s to s′ via a.

In the context of POMDPs, we call a distribution over the state space a belief state. Given a belief state b about the current state s, if we do an action a, the marginal distribution over the next state is the result of the projection operator b′ = P(b, a), where

  b′(s′) = Σ_{s∈S} b(s) P(s′|s, a)

Also, given an observation o, the conditional distribution is b″ = C(b′, a, o), where

  b″(s′) = b′(s′) Z(o|s, a, s′) / Σ_{s″∈S} b′(s″) Z(o|s, a, s″)

The filtering operator F consists of projection followed by conditioning, and is used to update the distribution over the current state. A key fact about POMDPs is that the sequence of beliefs itself forms an MDP (with the same action space). To sample from the transition model of this MDP given belief b and action a, first sample s from b, then sample s′ from P(·|s, a) and o from Z(·|s, a, s′), and set b′ = F(b, a, o) (s′ is discarded). The reward function is R(b, a, b′) = Σ_{s,s′} b(s) P(s′|s, a) R(s, a, s′).
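As a concrete illustration of these operators (a sketch, not code from the paper), projection, conditioning, and filtering can be written over discrete beliefs represented as dictionaries. For simplicity, the observation distribution here is assumed to depend only on the action and next state, and the names `trans` and `obs` are hypothetical:

```python
def project(belief, a, trans):
    """Projection: b'(s') = sum_s b(s) P(s'|s,a).

    `trans[s][a]` maps next states to probabilities (illustrative layout)."""
    b2 = {}
    for s, p in belief.items():
        for s2, pt in trans[s][a].items():
            b2[s2] = b2.get(s2, 0.0) + p * pt
    return b2

def condition(b2, a, o, obs):
    """Conditioning: reweight the projected belief by the observation
    likelihood and renormalize. `obs[(a, s2)]` maps observations to
    probabilities (a simplification of Z(o|s,a,s'))."""
    w = {s2: p * obs[(a, s2)].get(o, 0.0) for s2, p in b2.items()}
    z = sum(w.values())
    return {s2: p / z for s2, p in w.items() if p > 0.0}

def filter_belief(belief, a, o, trans, obs):
    """F(b, a, o): projection followed by conditioning."""
    return condition(project(belief, a, trans), a, o, obs)
```

Applied to a two-state example where action "a" either stays or advances, an observation that is only possible in the advanced state collapses the projected 50/50 belief onto that state.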

The belief state summarizes the relevant information about the action–observation history; in particular, any optimal policy for the belief-state MDP is also optimal for the POMDP, assuming the belief state is maintained exactly.

C. Solution algorithms

The literature has tended to focus on the offline planning problem of finding a full policy π over the belief space. Due to the constraints of running on a robot in real time, we consider instead the online planning problem [20], where the agent is repeatedly given an observation and just returns an action for the current belief state. Most online algorithms are based on forward search. A forward search tree for (belief-state) MDPs consists of alternating layers of action nodes and chance nodes. The root of the tree is an action node labelled with the initial (belief) state. An action node has children corresponding to the possible actions. The chance node corresponding to a state-action pair has children corresponding to the possible successor states, labelled with the probability of that particular state. A search tree can be used to estimate the value of taking each action at the root by repeated backups. The leaf action nodes are given values according to some heuristic estimate of the value function at their state. The value of a chance node is the average of the child values weighted by the probabilities. The value of an action node is the maximum of the child values.

There are various choices in how to generate search trees. First, the children of a chance node can be generated either exhaustively, based on an explicitly given transition model of the MDP, or indirectly, by sampling repeatedly from it, which only requires a simulator. Second, there is the choice of which nodes to expand, given finite total computation time. Apart from simple fixed-depth strategies [7], there are various more sophisticated methods [20, 9]. Finally, the choice of evaluation function at the leaves is of key importance, especially when the tree depth is much lower than the expected time to termination.

III. MODELING THE PROBLEM

Our goal is to build a robotic system that navigates between locations in an indoor environment efficiently. We now consider several ways to model this as a planning problem, in order to motivate our chosen formulation in Section III-E. All models will be defined with respect to a roadmap over the static map. This is generated as follows from a known static map: first, waypoints are sampled from the free space. This can be done using simple uniform tiling sampling, or more intelligently, e.g., based on the Voronoi graph of free space [3]. The main constraint is that from every point in free space, there is a path of length l < R leading to a waypoint, where R is a fixed radius threshold. Next, given the waypoints, standard path planning is run offline on pairs of nearby waypoints to generate edges. If obstacles are not considered, this graph can be used for planning by adding the start and goal positions to the graph, then searching for a path between them. When the robot is at a waypoint, it does a quick local reachability check to the neighbors in the graph. This process acts as a deterministic virtual sensor [10] that allows us to know the state of all edges incident to the node of the graph the robot is currently at.

Fig. 2. Problem in which two paths are possible, but edge A was more recently observed blocked than edge B.

Fig. 3. Problem in which the top path is better because there is more scope to move around obstacles.

A. Deterministic

A simple scheme for updating the roadmap is to maintain a list of blocked edges. An edge is added to this list whenever it is observed blocked and removed when it is observed free. An immediate problem is that the robot can eventually get into a situation where no path exists. To avoid getting stuck, whenever a path cannot be found, all edges that are not currently observed blocked are unblocked. If a path still cannot be found, a wait action is chosen.

This scheme does not take any account of the likelihood of an edge being blocked, or of the relative cost of alternative paths. It therefore makes various kinds of systematic errors.

Example 1: (Block probabilities) In Figure 2, there are two paths, both blocked, but edge A has been observed blocked much more recently than edge B. It therefore makes sense to take the top path.

Example 2: (Prediction) In Figure 3, no edges have been observed blocked. It nevertheless makes sense to take the top path, because a single blocked edge on that path can be circumvented, while a single blocked edge on the bottom path requires going back to the start.

Example 3: (Patience) In Figure 4, if an obstacle is observed on the edge between S and G, it might make sense (assuming a dynamic model of obstacles) to wait rather than taking the longer path, since the obstacle could disappear.

In each of these cases, the deterministic algorithm will choose the wrong action either always or often.

B. Deterministic with blocked-edge costs

Rather than viewing blocked edges as completely impassable, we could give them an extra cost proportional to their probability of being blocked. In other words, given an edge e that was last observed T seconds ago and was blocked at that time, its cost is c(e) + e^{−aT} B.
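As an illustration of this scheme (not the paper's implementation), the decayed blocked-edge cost can be computed as below; the decay rate `a` and penalty `B` values are placeholder choices:

```python
import math

def edge_cost(base_cost, last_obs_age, last_obs_blocked, a=0.1, B=100.0):
    """Base traversal cost plus a penalty B * exp(-a * T) that decays
    with the time T (seconds) since the edge was last observed blocked.
    `a` and `B` are the parameters from the text; values are illustrative."""
    if not last_obs_blocked:
        return base_cost
    return base_cost + B * math.exp(-a * last_obs_age)
```

With these numbers, an edge seen blocked a minute ago is penalized far less than one seen blocked a second ago, which is what lets the scheme handle Example 1.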
It therefore deals correctly with Example 1. It still fails on Examples 2 and 3. While this algorithm takes more account of uncertainty, it still neglects the fact that actions can add information.

Fig. 4. Problem in which the shortest path is currently blocked, but it might make sense to wait and see if it disappears.

Fig. 5. Problem in which it may make sense to resense the obstacle before committing to the longer path.

Example 4: (Value of information) In Figure 5, suppose the robot is at S, and the short path to G has been recently observed blocked. Given the length of the long path, it might still make sense to check again if the path has become free before deciding on the long path. It is not possible to achieve this in general by adjusting the a and B parameters, for that would prevent the robot from ever considering the long path.

Having time-varying costs can also lead to another problem.

Example 5: (Persistence) In Figure 5, suppose now that the robot is at R, proceeding down the long path, which was chosen because the short path to G from S had a high initial probability of being blocked. The probability of being blocked will decrease exponentially, though, so it is possible that the edge's blocked cost decreases enough that the robot will stop in the middle of the path and go back down the other way, which leads to behavior that is suboptimal (and looks strange).

C. Most likely state

A related method is to maintain a probability distribution over the true state of the world and plan assuming the most likely graph. As above, this method fails to take sensing actions in Example 4.

D. MDP

Rather than using costs, we can model the uncertainty using an MDP. A straightforward way to do this would be to have each edge's status be sampled afresh each time the robot is adjacent to it. The problem this runs into is that, since the edges have no hidden state, a behavior such as "wait for 10 seconds, then choose another path if this edge is still blocked" would never be followed in cases like Example 4 — the robot would either leave immediately or wait forever.

E. POMDP

POMDPs model both the transition and sensing uncertainty in the domain. We use the following POMDP model:

• There is one state variable for the current position in the roadmap and, for each edge, a status, which can either be free or blocked. In the latter case it belongs to one of a predefined set of classes. In our examples, the obstacle classes are temporary, person, and static.
• The actions at a state are to take one of the outgoing edges from that node, or to wait for 3 seconds at the current position.¹
• The transition model is that a move succeeds iff the edge is not blocked (at the start of the move). If so, it has a duration depending on the edge length, and the robot ends up at the other node incident to the edge. Additionally, each edge status evolves according to an independent continuous-time Markov chain. The parameters are:
  – the block rate (expected time till an obstacle appears);
  – the prior distribution over obstacle classes;
  – for each obstacle class, the unblock rate, or expected time till it disappears.
• The observation model is that the current position is known with certainty and, for each adjacent edge, the robot observes whether it is free or blocked (but not which class the obstacle belongs to).
• The cost of an action is the time it takes.

¹ A 3 second interval was chosen as the minimum reasonable wait time in our environment; longer waits can be obtained by repeating the action.

Optimal solutions to (instances of) this POMDP avoid the problems listed above. For example, in Figure 5, the belief will include a distribution on whether the obstacle is static, temporary, or a person. If the prior probability of being temporary or a person is high enough, the optimal plan will be to move to the blocked edge and wait for some amount of time. If the path clears, take it. If not, then over time the probability of being a static obstacle will increase. This falls out of the Bayesian updating formula P(E|h) ∝ P(E)P(h|E), where h is the observation history and E is the event that the obstacle is static. Even if the prior probability P(E) is low, the longer the robot waits without the obstacle moving, the higher the likelihood term P(h|E) becomes, until it eventually becomes optimal to take an alternate path.

IV. STATE ESTIMATION

A belief update given belief b, action a, and observation o consists of a projection through a followed by conditioning on o. In our case, the transition model is

  P(s′|a, s) = I{s′_p = f_p(s, a)} ∏_{(u,v)∈E} P(s′_uv | s_uv, t(s, a))

where:

• f_p(s, a) is the deterministic transition function of position, which results in moving to the other incident node of a if the move is legal, and staying at the current position for 1 second otherwise. Wait actions also result in staying at the current position.
• t(s, a) is the duration of the action, which is just the edge length.
• P(s′_uv | s_uv, t) is given by the continuous-time Markov chain transition distribution.
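For a single obstacle class, each edge-status chain reduces to a two-state continuous-time Markov chain with block rate λ and unblock rate μ, whose transition distribution has a simple closed form. The following sketch (an illustration under that one-class assumption, not code from the paper) propagates an edge's blocked probability over a time interval:

```python
import math

def edge_status_transition(p_blocked, t, lam, mu):
    """Probability the edge is blocked after t seconds, given current
    blocked probability p_blocked, for a two-state continuous-time
    Markov chain with block rate lam and unblock rate mu (a single
    obstacle class; the full model uses one chain per class)."""
    pi_b = lam / (lam + mu)            # stationary blocked probability
    decay = math.exp(-(lam + mu) * t)  # mixing factor
    # Every start distribution relaxes toward pi_b at rate lam + mu.
    return pi_b + (p_blocked - pi_b) * decay
```

This relaxation toward the stationary probability is exactly what the factored belief update applies to every edge that is not currently being observed.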
We represent our belief states in factored form, as consisting of a known position b_p and a distribution b_uv over each edge's status. Given such a belief, since the transition model above factors over terms, each of which depends on one of the belief variables above, the projected distribution P(b, a) will also have this factored form. Similarly, the observation model is deterministic, and can be written

  Z(o|s, a, s′) = ∏_{(u,v)∈E} I{o_uv = g(s′_uv, s′_p)}

where the function g returns free or blocked if s_p equals u or v, and unobserved otherwise. Since the position is known, the update can once again be done in factored form: for edges adjacent to the current position, if the edge is observed free the conditional distribution is free with probability 1, and if it is observed blocked, we make the probability of it being free 0 and reweight the remaining statuses to sum to 1.

V. EFFICIENT PLANNING

The state space of the POMDP model is exponential in the number of edges, which is beyond the capabilities of current general-purpose offline planners. Online planners based on forward search do not directly depend on the state space size, but do have exponential dependence (for a fixed bound on error) on the search horizon, which can be large in our case, since plan sizes can be on the order of the graph diameter.

A. The Abstract Graph

Fig. 6. Original problem. Dashed edges have length 2 and have block probability above the threshold. All other edges have length 1, and have block probability below the threshold.

Fig. 7. Abstracted version of the problem in Figure 6. In the initial belief state, dashed edges are possibly blocked, and all others are definitely free.

We can take advantage of structure in the problem. Figure 6 shows an example instance. The dashed edges are two locations where the robot has recently observed obstacles. Intuitively, (the nodes incident to) these edges are the important ones to reason about during planning. The remaining nodes (apart from the start and goal) are just intermediate locations we pass through — assuming the block rate is not high, the optimal conditional plan will look something like: 1) visit the left dashed edge; 2) if it is free, take it; 3) otherwise, wait for some number of seconds; 4) if the edge is still blocked, move to the right dashed edge; and so on. Figure 7 shows an abstracted version of the problem based on this intuition. The edge lengths have been computed using Dijkstra's algorithm in the original graph, implicitly assuming no new obstacles will appear. Note that Figure 7 is still nontrivial to solve optimally, because it is partially observable and obstacles may still appear and disappear on the dashed edges. Nevertheless, it has a shorter effective horizon.

More precisely, we define the abstract graph G_a given a problem and belief state. First, let B be the set of edges whose probability of being blocked exceeds some threshold (we use (1 + p_β)/2, where p_β is the stationary probability of an unobserved edge being blocked). Let the cut graph be the original graph with the edges in B removed. Given current position v_s and goal v_g, the abstract graph has vertices Ṽ = V_B ∪ {v_s, v_g}, where V_B are the incident vertices to edges in B. For each edge in B, there is a corresponding edge in G_a with the same length. Also, for every pair of vertices u, v ∈ Ṽ, if the distance d_uv between them in the cut graph is finite, add an edge in G_a between u and v with length d_uv.

We can then construct the reduced POMDP based on this abstract graph. The initial belief b_0 of the reduced problem is the same as b, except that edges not in B are not present, and the newly added abstract edges are included and are considered free with probability 1. The unblocking rates are the same as in the original problem, but the block rate is 0. The proposed solution algorithm is to plan in the reduced POMDP and follow the first concrete edge of the first action in that plan (this is also computed as part of the Dijkstra search to find the edge costs).
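The construction of G_a can be sketched as follows (illustrative code with hypothetical names, not the paper's implementation; edges are keyed by vertex pairs (u, v) with u < v):

```python
import heapq
from collections import defaultdict

def dijkstra(adj, src):
    """Shortest-path lengths from src in an undirected weighted graph."""
    dist = {src: 0.0}
    pq = [(0.0, src)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist.get(u, float("inf")):
            continue
        for v, w in adj[u]:
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(pq, (nd, v))
    return dist

def abstract_graph(edges, p_blocked, threshold, vs, vg):
    """edges: {(u, v): length}; p_blocked: {(u, v): blocked probability}.
    Keep likely-blocked edges as-is; join the remaining vertices of
    interest by their shortest-path distance in the cut graph."""
    B = {e for e in edges if p_blocked[e] >= threshold}
    # Cut graph: the original graph with the likely-blocked edges removed.
    adj = defaultdict(list)
    for (u, v), w in edges.items():
        if (u, v) not in B:
            adj[u].append((v, w))
            adj[v].append((u, w))
    verts = {vs, vg} | {x for e in B for x in e}
    ga = {e: edges[e] for e in B}  # blocked edges keep their length
    for u in verts:
        dist = dijkstra(adj, u)
        for v in verts:
            if u < v and v in dist and (u, v) not in ga:
                ga[(u, v)] = dist[v]  # d_uv, when finite
    return verts, ga
```

On a triangle where only edge (0, 1) is likely blocked, the abstract graph keeps that edge and adds the cut-graph distances between the remaining vertex pairs.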
The assumption made in the reduced POMDP, that new obstacles do not appear on abstract edges, may not hold. Therefore, we replan after traversing each edge of the original graph, using a new abstract graph constructed based on the latest observations.

B. Analysis

We can analyze the quality of this approximation as a function of the domain parameters. Specifically, let p = (p_1, ..., p_K) be the stationary probability distribution for the edge obstacle Markov chains, where p_1 corresponds to no obstacle, and let e = (e_1, ..., e_K) be the expected unblocking time for each obstacle type (with e_1 = 0).

Theorem 1: Let s and g be nodes such that the optimal expected travel time between them is T. Suppose the robot follows an optimal policy with respect to the abstract graph. Then the expected time T′ to travel from s to g satisfies T′ ≤ (1 + pᵀe)T.

Intuitively, the approximation quality depends on how frequent obstacles, especially long-lived ones, are. The bound is loose because, in fact, unforeseen obstacles mainly matter in cases like Example 2, where there are bottlenecks and narrow hallways. In practice, the approximation error is unlikely to be substantial unless most of the domain consists of narrow paths.

C. Planning with the Abstract Graph

The reduced POMDP still has a large state space, but a shorter horizon, making it a good candidate for forward search. We use simple fixed-depth search with fixed-width sampling, as described in Section II-C. More intelligent strategies [9] would likely improve performance.

Leaves of the search tree are evaluated using the following heuristic function. Given a belief state b, sample some number of graphs; in each graph, compute the minimum of the shortest-path distance to the goal and the graph diameter (caching within and across leaf nodes is used to make this efficient), and average the results. Intuitively, this corresponds to taking full account of partial observability until the search horizon, then making a full-observability approximation.

At state nodes of the lookahead tree, since the current position in the graph and the neighboring edge statuses are known with certainty, the number of children is bounded by the out-degree of the roadmap. For action nodes, suppose we are at a node corresponding to doing action a after belief state b. A naive way to generate a child would be:

1) Sample a state s from the distribution b.
2) Sample a next state and observation from P(s′, o|s, a).
3) Set b′ ← F(b, a, o).
4) If there is already a child node corresponding to b′, increase its count; otherwise create a new node.

This procedure results in a large number of belief state updates in the inner loop. Instead, we use a trick that we have not previously seen in the literature, based on the fact that F is deterministic given b, a, o: label each outgoing edge with an observation, and only compute F when the observation has not been seen before. This helps in our setting because the observation space is much smaller than the state space.

VI. EXPERIMENTS

We evaluated the various formulations and the ROS navigation stack on simulated graph instances as well as on a PR2 robot. The parameters for the models (e.g., block rates) were chosen based on the known domain model in the simulated case, and set by hand for the physical robot experiments.

A. Simulated experiments

We first compared several of the approaches described in Section III on a set of simulated instances. There are two metrics of interest: planning time and plan quality, which we model as total execution time.² In our evaluations, we bounded planning time and compared the total execution time across algorithms. The scale of our roadmaps is such that traversing each edge typically takes several seconds; we therefore placed a conservative limit of one second of planning time per action, and compared the following approaches:

• DA is the deterministic agent from Section III-A.
• BA1 and BA2 are the block-cost agents from Section III-B, using block costs of 10 and 1000.
• ALA1 through ALA3 are abstract POMDP agents from Section III-E, using search depths 1 to 3 and a sampling width of 100.

² The framework can be straightforwardly extended to more complex cost functions including, e.g., proximity to obstacles.

We used a set of instances with graph sizes ranging from 20 to 1000. The results are shown in Table II. The POMDP-based algorithms were the best by a significant margin on each domain. Interestingly, ALA2 occasionally did better than ALA3. This phenomenon is known in the search literature as a lookahead pathology [2]. A more intelligent search strategy than uniform-depth search should mitigate this problem [9].

B. Experiments on robotic platform

We also evaluated our algorithm on a Willow Garage PR2 robot. A roadmap was generated offline from a static map of our environment by starting with uniformly spaced nodes, which were then perturbed locally to move away from obstacles. The environment is about 40 by 40 meters, and the roadmap had about 200 nodes. We implemented a bridge ROS node to connect our graph-based system to the continuous state of the robot, as shown in Figure 8. Pose messages from the AMCL localization system are mapped to the nearest node on the roadmap, while actions (edges of the roadmap) are mapped to goal commands for a trajectory planner based on the elastic band algorithm [19]. A local occupancy grid is maintained in the odometric frame, based on laser range readings; when reaching a new node, a shortest-path computation is done in this grid (with obstacles inflated by the robot radius) to determine the neighboring edge statuses. An action is run either until the trajectory follower reports success or failure, or the current nearest node changes. The planner runs asynchronously, using increasing tree depths, while the robot is navigating, assuming the action succeeds, i.e., that we end up at the node on the other side of the given edge.
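Putting the pieces together, the replan-per-edge execution loop might look like the following sketch (purely illustrative; `nearest_node`, `plan`, `follow_edge`, and `wait` are hypothetical placeholders, not actual ROS interfaces):

```python
def execute(roadmap, planner, robot, goal_node):
    # Map the continuous pose to a roadmap node, act, and replan
    # after every traversed edge (or unexpected arrival).
    node = roadmap.nearest_node(robot.pose())
    while node != goal_node:
        action = planner.plan(node, robot.edge_observations())
        if action == "wait":
            robot.wait(3.0)            # the model's fixed wait primitive
        else:
            robot.follow_edge(action)  # e.g., an elastic-band trajectory goal
        node = roadmap.nearest_node(robot.pose())
```

The planner is consulted again after every traversed edge, matching the replanning strategy described in Section V-A.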
If this does happen, the next action can be selected immediately without pausing. If the action terminates at an unexpected node (usually because a person was in the robot's way), the robot pauses and the planner is rerun for one second. This only happens occasionally, as the elastic band planner has some degree of reactivity to unforeseen obstacles.

TABLE II
Results of evaluating different agents on a set of benchmark environments. Each cell entry reports cost till reaching the goal and sample standard deviation, given 30 independent trials. Planning time was bounded at one second.

Instance DA ALA1 ALA2 ALA3 BA1 BA2
1 161 ± 23 186 ± 36 132 ± 7 142 ± 12 272 ± 57 285 ± 94
2 758 ± 207 225 ± 59 228 ± 65 149 ± 38 522 ± 207 380 ± 123
3 1203 ± 64 936 ± 58 907 ± 54 1025 ± 48 1064 ± 73 1134 ± 63
4 353 ± 14 361 ± 33 191 ± 22 178 ± 12 259 ± 33 286 ± 40
5 897 ± 97 853 ± 122 639 ± 43 625 ± 37 914 ± 103 1321 ± 91
6 1115 ± 80 1026 ± 56 915 ± 54 924 ± 51 1079 ± 88 1226 ± 76
7 441 ± 39 493 ± 58 410 ± 44 378 ± 36 408 ± 37 461 ± 36
8 732 ± 101 857 ± 122 684 ± 55 621 ± 51 817 ± 62 808 ± 53

Fig. 8. Screenshot from visualization during execution on PR2 (on a subset of the overall environment used in the experiments). The roadmap graph is overlaid on a static occupancy grid. The polygon shows the robot's current position, which is mapped to a particular node of the roadmap. An obstacle is visible in the hallway, resulting in the right outgoing edge from the robot's position being blocked. Thus the only available action is to follow the left edge.

We compared the performance of our navigation system against the ROS navigation stack [11] by measuring the time taken to navigate to a sequence of waypoints. Five trials were run for each system. The trials were conducted during the day, when there is a reasonably high density of people. Our environment is fairly typical of academic or office buildings, with a mixture of open spaces, rooms, and hallways of varying width. The people in the building are familiar with navigating robots; they do not try to interact with or move out of the way for them.

Table III shows navigation times. In the best case, the two systems performed similarly. This corresponds to situations where no unforeseen obstacles were encountered in hallways or doorways. However, the worst-case cost of the ROS navigation stack was much higher. This seems to be mainly because it does not model the probability of different observation types or take the length of the second-best path into account when deciding how long to wait given an unforeseen obstacle; it therefore often gives up prematurely in favour of a significantly longer path.

VII. RELATED WORK

Navigation using roadmaps has been well studied. Most existing work has assumed the graph is known. [13] is an exception, in which a PRM planner is modified to accept or reject sampled points based on the probability of collision, and then looks for an unconditional plan with a high chance of success. In contrast, since our approach has an observation model, it will return conditional plans that base future actions on the results of observations.

Ong et al. [15] studied mixed observability Markov decision processes (MOMDPs), in which part of the state is perfectly observed, and showed how to speed up offline dynamic programming algorithms in this case. Our problem is a MOMDP, since the robot position in the roadmap is perfectly observed. Unfortunately, the unobserved portion of the state space is still exponentially large.

Closely related to this research is that of Kneebone and Dearden [8]. Like us, they consider navigation in a partially observed graph, and represent the uncertainty using a POMDP. A difference is that in their framework, obstacles do not appear and disappear over time. This makes a qualitative difference in the kinds of policies that are found: there is no longer any utility in taking wait actions, nor are there useful inferences to be made about obstacle classes (effectively, all obstacles belong to a single class whose unblocking rate is 0). Also related is the literature on the Canadian Traveller's Problem [14]. Again, the obstacles here are either static or Markovian, so that a bounded wait will never be optimal.
Trial ROS Navigation Stack ALA
1 436 460
2 844 605
3 1118 549
4 491 534
5 520 502

TABLE III
Execution time (seconds) for the PR2 using the ROS navigation stack and our system on a fixed set of waypoints in our environment.

The idea of using an abstract graph to solve the planning problem is reminiscent of approaches based on subgoals and macroactions [4]. Subgoal choice tends to be a challenging question for these approaches [22]. We make use of the structure of the problem to choose subgoals as places where obstacles are likely to be.

Theocharous and Kaelbling [23] also describe a macro-action-based POMDP planning algorithm for robot navigation, but they were concerned with uncertainty about the robot's position rather than the obstacles. Another complementary line of research is on motion planning among movable obstacles [5]. Unlike our work, which considers an unknown and changing set of static obstacles, this work considers a known, fixed set of moving obstacles.

Execution systems for robotics have been widely studied [16], but many implemented systems still use finite state machines or scripts in a general-purpose programming language. TREX [18] is a sophisticated executive based on temporal planning that was recently used in a navigation task involving failure recovery [12]. The executive in that case still required significant procedural knowledge to be prespecified in the form of temporal constraints, and planning was mainly used to ensure that the constraints were met.

VIII. CONCLUSION

We view the main contribution of this work as showing by example that it is possible to build robotic systems that reason about and react to execution failures using planning. Our formulation deals with uncertainty in a principled way, and optimal solutions to it exhibit various behaviors that are normally hardcoded into an executive, such as persistence and inference about obstacles. From a software engineering point of view, the resulting systems are less stateful, hence easier to understand and modify. We also described an efficient approximate solution algorithm that can feasibly be run online on a robot, and showed improved performance compared with traditional deterministic planners.

REFERENCES

[1] Oliver Brock and Oussama Khatib. High-speed navigation using the global dynamic window approach. In ICRA, 1999.
[2] Vadim Bulitko and Mitja Lustrek. Lookahead pathology in real-time path-finding. In AAAI, 2006.
[3] Howie Choset and Joel W. Burdick. Sensor-based exploration: The hierarchical generalized Voronoi graph. IJRR, 19(2), 2000.
[4] Thomas G. Dietterich. Hierarchical reinforcement learning with the MAXQ value function decomposition. J. Artif. Intell. Res. (JAIR), 13, 2000.
[5] David Hsu, Robert Kindel, Jean-Claude Latombe, and Stephen Rock. Randomized kinodynamic motion planning with moving obstacles. IJRR, 21(3), 2002.
[6] Leslie Kaelbling, Michael Littman, and Anthony Cassandra. Planning and acting in partially observable stochastic domains. Artif. Intell., 101, 1998.
[7] Michael J. Kearns, Yishay Mansour, and Andrew Y. Ng. A sparse sampling algorithm for near-optimal planning in large Markov decision processes. In IJCAI, 1999.
[8] Michael Kneebone and Richard Dearden. Navigation planning in probabilistic roadmaps with uncertainty. In ICAPS, 2009.
[9] Levente Kocsis and Csaba Szepesvári. Bandit based Monte-Carlo planning. In ECML, 2006.
[10] Steve LaValle. Filtering and planning in information spaces. IROS tutorial notes, 2009.
[11] Eitan Marder-Eppstein, Eric Berger, Tully Foote, Brian Gerkey, and Kurt Konolige. The office marathon: Robust navigation in an indoor office environment. In ICRA, 2010.
[12] Wim Meeussen, Melonee Wise, Stuart Glaser, Sachin Chitta, Conor McGann, Patrick Mihelich, Eitan Marder-Eppstein, Marius Muja, Victor Eruhimov, Tully Foote, John Hsu, Radu Rusu, Bhaskara Marthi, Gary Bradski, Kurt Konolige, Brian Gerkey, and Eric Berger. Autonomous door opening and plugging in with a personal robot. In ICRA, 2010.
[13] Patrycja Missiuro and Nicholas Roy. Adapting probabilistic roadmaps to handle uncertain maps. In ICRA, 2006.
[14] Evdokia Nikolova and David R. Karger. Route planning under uncertainty: The Canadian traveller problem. In AAAI, 2008.
[15] S.C.W. Ong, S.W. Png, D. Hsu, and W.S. Lee. POMDPs for robotic tasks with mixed observability. In Proc. Robotics: Science and Systems, 2009.
[16] Ola Pettersson. Execution monitoring in robotics: A survey. Robotics and Autonomous Systems, 53(2), 2005.
[17] M.L. Puterman. Markov Decision Processes. Wiley-Interscience, 2005.
[18] Frederic Py, Kanna Rajan, and Conor McGann. A systematic agent framework for situated autonomous systems. In AAMAS, 2010.
[19] Sean Quinlan and Oussama Khatib. Elastic bands: Connecting path planning and control. In ICRA, 1993.
[20] Stéphane Ross, Joelle Pineau, Sébastien Paquet, and Brahim Chaib-draa. Online planning algorithms for POMDPs. J. Artif. Intell. Res. (JAIR), 32, 2008.
[21] Stuart J. Russell and Peter Norvig. Artificial Intelligence: A Modern Approach. Pearson, 2010.
[22] Özgür Simsek, Alicia P. Wolfe, and Andrew G. Barto. Identifying useful subgoals in reinforcement learning by local graph partitioning. In ICML, 2005.
[23] Georgios Theocharous and Leslie Kaelbling. Approximate planning in POMDPs with macro-actions. In NIPS, 2003.
[24] Sebastian Thrun. Probabilistic robotics. Commun. ACM, 45(3), 2002.
Visual Route Recognition with a Handful of Bits
Michael J. Milford, Member, IEEE

Abstract—In this paper we use a sequence-based visual localization algorithm to reveal surprising answers to the question, how much visual information is actually needed to conduct effective navigation? The algorithm actively searches for the best local image matches within a sliding window of short route segments or 'sub-routes', and matches sub-routes by searching for coherent sequences of local image matches. In contrast to many existing techniques, the technique requires no pre-training or camera parameter calibration. We compare the algorithm's performance to the state-of-the-art FAB-MAP 2.0 algorithm on a 70 km benchmark dataset. Performance matches or exceeds the state of the art feature-based localization technique using images as small as 4 pixels, fields of view reduced by a factor of 250, and pixel bit depths reduced to 2 bits. We present further results demonstrating the system localizing in an office environment with near 100% precision using two 7 bit Lego light sensors, as well as using 16 and 32 pixel images from a motorbike race and a mountain rally car stage. By demonstrating how little image information is required to achieve localization along a route, we hope to stimulate future 'low fidelity' approaches to visual navigation that complement probabilistic feature-based techniques.

Index Terms—localization, route recognition, visual navigation, featureless, SeqSLAM

I. INTRODUCTION

Current state of the art visual navigation systems are dominated by probabilistic feature-based techniques such as FAB-MAP [1], FrameSLAM [2], and MonoSLAM [3]. These techniques have displayed impressive performance in a range of experiments, the largest of which have occurred over distances of 1000 km [1]. These feature-based approaches have desirable properties such as easy integration with metric pose estimation and semantic mapping techniques, and the ability to localize off a single image. However, such approaches also have shortcomings. Many require training on a suitable dataset to develop a visual 'codebook' before they can be applied in an environment, and using an inappropriate codebook can result in poor system performance. Feature-based techniques also rely on being able to reliably detect features, a requirement that is difficult to satisfy under the changing environmental conditions caused by weather, day-night cycles and seasons. The quest to map increasingly impressive datasets has been accompanied by a trend towards increasingly sophisticated algorithms, burgeoning sensor megapixel counts and large camera fields of view. In this quest, one very important question has been largely neglected: what visual information is actually needed to conduct effective vision-based navigation?

In this paper we present evidence to suggest that, at least for localizing along a route, a simple sequence-based localization algorithm is able to match or surpass the performance of a state of the art algorithm while using images with resolutions as small as one millionth the size and less than one two-hundredth the field of view. Specifically, we make the following contributions:

- a route recognition algorithm incorporating whole-of-appearance image comparison with Dynamic Time Warping [4] sequence recognition, which requires no training and is not reliant on feature recognition,

- extensive experimental results showing the effect of image resolution, camera field of view, pixel bit depth and sequence length, with comparison to a state of the art method on a publicly available, modern benchmark dataset, and

- further experimental results from rally car, motorbike and office datasets, including localization using two 7 bit Lego light intensity sensors.

M. J. Milford is with the School of Electrical Engineering and Computer Science at the Queensland University of Technology, Brisbane, Australia (e-mail: michael.milford AT qut.edu.au). This work was supported by an Australian Research Council Fellowship DE120100995 to MM.

II. BACKGROUND

The most relevant use of image sequences to localize was in work by [5], in which loop closure was performed by comparing sequences of images based on the similarity of 128D vectors of SIFT descriptors. Due to its reliance on visual features, the method required the development of additional algorithms to address visual ambiguity caused by repetitive foliage or architectural features. Image sequence information has also been used to geo-locate a person based on a sequence of photos they have taken, even when none of the individual images contain recognizable features [6]. In contrast, the technique presented here forgoes the use of features and uses a novel image difference normalization scheme to partially address visual ambiguity.

While there are a large number of vision-based mapping systems [1-3, 7, 8], few current implementations use low resolution images. Earlier research did use relatively low resolution visual images to perform navigation (for reasons of computation as much as anything else), including numerous early systems such as ALVINN [9] and insect-inspired algorithms [10]. More recently, low resolution approaches have been deployed on Sony AIBO robot dogs [11] and Pioneer

robots [12], including the biologically inspired RatSLAM system [13]. In research fields outside of localization such as face recognition [14] and object recognition, matching using low resolution images has been found to be highly effective [15]. In this work we use image snapshots with up to two orders of magnitude less information than in these previous studies. The research presented here also builds on related work by the authors on localization using 'whole of image' appearance-based methods under extreme environmental change [16], in which it was shown that routes could be recognized over day to night, sunny to stormy and summer to winter transitions, albeit with image sizes of approximately 1000 pixels. Therefore we do not specifically address extreme environmental change in this paper, but rather focus on pushing the boundaries even further on the minimum resolution, pixel bit depth, and field of view required to recognize a route under more modest change.

III. APPROACH

The algorithm consists of two primary modules, the image comparison algorithm and the sequence recognition algorithm.

A. Image Similarity

For panoramic images, the difference D_j between the current image i and each stored image j is calculated using the mean absolute intensity difference, minimized over a range of horizontal offsets:

    D_j = \min_{\Delta x \in \sigma} g(\Delta x, i, j)    (1)

where \sigma is the offset range, and g( ) is given by:

    g(\Delta x, i, j) = \frac{1}{s} \sum_{x} \sum_{y} \left| p^{i}_{x+\Delta x, y} - p^{j}_{x, y} \right|    (2)

where s is the area in pixels of the image. Setting \sigma = [0, S] enables recognition when traversing a route in reverse. For perspective cameras, \sigma can be set to span a range of offset angles to provide some invariance (assuming mostly distal features) to camera yaw. However, for the perspective camera datasets in this paper only the no-offset case was used (\sigma = [0]).

B. Sequence Matching

Comparisons between the current image and all stored images yield a vector of image differences, as in [5]. The matrix M of image differences for the n most recent frames forms the space within which the search for matching sub-routes is performed. The key processing step is to normalize the image difference values within their (spatially) local image neighborhoods, similar to the creation of standard scores in statistics (Figure 1a). The updated image difference vectors (Figure 1b) are given by:

    \hat{D}_i = \frac{D_i - \bar{D}_l}{\max(\sigma_l, \sigma_{min})}    (3)

where \bar{D}_l is the local mean and \sigma_l is the local standard deviation, computed over a distance of R_w images acquired before and after the current image i, and \sigma_{min} is a minimum standard deviation constant used to avoid undefined output, set to 1/256 of the intensity range in this paper. R_w was set to 10 frames for all experiments in this paper. Normalizing the difference values in local image neighborhoods is a process that would be counterproductive when localizing off single frames. However, in the context of recognizing sequences of images, this process ensures there are clear locally best matching images in every sub-route along the entire stored set of routes, to some extent negating the effect of global biases such as lighting changes and image commonalities.

To find likely route matches, we perform a continuous version of the Dynamic Time Warping (DTW) method proposed by Sakoe and Chiba [4]. We impose continuity and slope constraint conditions to constrain the search space. The boundary condition and monotonically increasing constraints are not applicable due to uncertainty in velocity and the need to match both forward and reverse traverses of a route. The search is continuous in that searches are started at every element in the left column of the image difference matrix (shown by the small solid circles in Figure 1).

Figure 1. The image difference matrix M (a) before and (b) after normalization, with small circles showing the elements where each search originates. Only the bottom section of the difference matrix is shown.

The output of the DTW search produces a vector of sub-route matching scores for each search origin and each slope condition. The best matching sub-route is determined as:

    i_{min} = \arg\min_{1 \le i \le m} s(i)    (4)

where m is the number of stored images and s(i) is the normalized sub-route difference score for sub-route i over all slope constraints:

    s(i) = \min_{k} d_i^k    (5)

The vector d_i contains the difference scores for sub-route i over all slope possibilities k:

    d_i^k = \frac{1}{n} \sum_{j=1}^{n} \hat{D}_{j, u(i, j, k)}    (6)

where n is the sub-route length and u(i, j, k) provides the element row index in the image difference matrix:

    u(i, j, k) = i - j \tan v_k    (7)

where v_k is a specific slope constraint. The slope constraint is set to span a range of values that encompass possible frame rate variations.

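The pipeline of Section III — offset-minimized mean absolute differences (Equations 1-2), local normalization (Equation 3), and the continuous slope-constrained sub-route search (Equations 4-7) — can be sketched in Python/NumPy. This is an illustrative sketch only: function and variable names are ours, slope constraints are passed directly as gradients tan(v_k), and the paper's actual Matlab/C implementation is not reproduced here.

```python
import numpy as np

def image_difference(img_i, img_j, offsets=(0,)):
    """Mean absolute intensity difference between two equal-sized
    grayscale images, minimized over horizontal pixel offsets
    (wrap-around shifting suits panoramic images)."""
    s = img_i.size  # image area in pixels
    return min(
        np.abs(np.roll(img_i.astype(float), dx, axis=1) - img_j).sum() / s
        for dx in offsets
    )

def normalize_differences(D, Rw=10, sigma_min=1.0):
    """Turn raw difference values into local standard scores over a
    neighborhood of Rw frames either side (the paper uses Rw = 10 and
    sigma_min = 1/256 of the intensity range)."""
    D = np.asarray(D, dtype=float)
    D_hat = np.empty_like(D)
    for i in range(len(D)):
        local = D[max(0, i - Rw):i + Rw + 1]
        D_hat[i] = (D[i] - local.mean()) / max(local.std(), sigma_min)
    return D_hat

def best_subroute(D_hat, gradients, s_m=-1.0):
    """Continuous DTW-style search over a normalized difference matrix
    D_hat (rows: stored images, columns: the n most recent frames).
    Each candidate sub-route is a straight line of gradient tan(v_k)
    launched from row i in the left column; its score is the mean
    normalized difference along the line. Returns (origin row, score)
    if the best score beats the threshold s_m, else (None, score)."""
    m, n = D_hat.shape
    scores = np.full(m, np.inf)
    for i in range(m):
        for g in gradients:  # g = tan(v_k) for each slope constraint
            rows = [int(round(i - j * g)) for j in range(n)]
            if all(0 <= r < m for r in rows):
                d = np.mean([D_hat[r, j] for j, r in enumerate(rows)])
                scores[i] = min(scores[i], d)
    i_min = int(np.argmin(scores))
    return (i_min if scores[i_min] < s_m else None), scores[i_min]
```

A genuinely matching sub-route shows up as a run of strongly negative normalized scores along one line, so its mean drops well below the threshold even when individual frames are ambiguous.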
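The Gaussian argument Section III uses to set the matching threshold can be checked numerically: under the null hypothesis the mean of n independent standard scores is distributed N(0, 1/n), so the probability of a length-50 sub-route scoring below -1 is the normal CDF evaluated at -sqrt(50). This worked check reproduces the 7.7×10^-13 figure quoted in Section III (the function name is ours):

```python
import math

def chance_match_probability(s_m, n):
    """P(mean of n independent standard-normal scores < s_m):
    the standard normal CDF evaluated at s_m * sqrt(n)."""
    return 0.5 * math.erfc(-s_m * math.sqrt(n / 2.0))

# For the paper's sub-route length n = 50 and threshold s_m = -1,
# the probability of a match arising by chance is about 7.7e-13.
p = chance_match_probability(-1.0, 50)
```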
For constant frame rate scenarios, such as the Eynsham dataset or datasets with odometry, it is possible to use a small range or even a single value of v_k.

By considering the sum of sub-route difference scores s(i) as a sum of normally distributed random variables, each with the same mean and variance, the sum of normalized differences over a sub-route of length n frames has mean zero and variance n, assuming that frames are captured far enough apart to be considered independent. Dividing by the number of frames produces a normalized route difference score with mean zero and variance 1/n. Percentile rank scores can then be used to determine an appropriate sub-route matching threshold. For example, for the primary sub-route length n = 50 used in this paper, a sub-route threshold of -1 yields a 7.7×10^-13 probability of the match occurring by chance.

To determine whether the current sub-route matches any stored sub-routes, the minimum matching score is compared to a matching threshold s_m. If the minimum score is below the threshold, the sub-route is deemed to be a match; otherwise the sub-route is assigned as a new sub-route. An example of the minimum matching scores over every frame of a dataset (the Eynsham dataset described in this paper) is shown in Figure 2. In the second half of the dataset the route is repeated, leading to lower minimum matching scores.

Figure 2. Normalized sub-route difference scores for the Eynsham dataset with the matching threshold s_m that yields 100% precision performance.

IV. EXPERIMENTAL SETUP

In this section we describe the four datasets used in this work and the image pre-processing for each study.

A. Datasets

A total of four datasets were processed, each of which consisted of two traverses of the same route. The datasets were: a 70 km road journey in Eynsham, the United Kingdom; 2 km of motorbike circuit racing in Rowrah, the United Kingdom; 40 km of off-road racing up Pikes Peak in the Rocky Mountains, the United States; and 100 meters in an Office building (italics indicate dataset names). The Eynsham route was the primary dataset on which extensive quantitative analysis was performed. The other datasets were added to provide additional evidence for the general applicability of the algorithm. Key dataset parameters are provided in Table I, including the storage space required to represent the entire dataset using low resolution images.

Figure 3 shows aerial maps and imagery of the Eynsham, Rowrah and Pikes Peak datasets, with lines showing the route that was traversed twice. The Eynsham dataset consisted of high resolution image captures from a Ladybug2 camera (circular array of five cameras) at 9575 locations spaced along the route. The Rowrah dataset was obtained from an onboard camera mounted on a racing bike. The Pikes Peak dataset was obtained from cameras mounted on two different racing cars racing up the mountain, with the car dashboard and structure cropped from the images. This cropping process could most likely be automated by applying some form of image matching process to small training samples from each of the camera types. The route consisted of heavily forested terrain and switchbacks up the side of a mountain, ending in rocky open terrain partially covered in snow.

TABLE I. DATASETS

Dataset Name | Distance | Number of frames | Distance between frames | Image Storage
Eynsham      | 70 km    | 9575 | 6.7 m (median) | 306 kB
Rowrah       | 2 km     | 440  | 4.5 m (mean)   | 7 kB
  http://www.youtube.com/watch?v=_UfLrcVvJ5o
Pikes Peak   | 40 km    | 4971 | 8 m (mean)     | 159 kB
  http://www.youtube.com/watch?v=4UIOq8vaSCc
  http://www.youtube.com/watch?v=7VAJaZAV-gQ
Office       | 53 m     | 832  | 0.13 m (mean)  | 1.6 kB
  http://df.arcs.org.au/quickshare/790eb180b9e87d53/data3.mat

Figure 3. The (a) 35 km Eynsham, (b) 1 km Rowrah and (c) 20 km Pikes Peak routes, each of which was repeated twice. Copyright DigitalGlobe, GeoEye, Getmapping plc, The GeoInformation Group, USDA Farm Service Agency, Infoterra Ltd & Bluesky, Map data ©2012 Google.

Figure 4. (a) The Lego Mindstorms dataset acquisition rig with 2 sideways facing light sensors and a GoPro camera for evaluation of matched routes. (b) The 53 meter long route, which was repeated twice to create the dataset.

B. Image Pre-Processing

1) Eynsham Resolution Reduced Panoramic Images

For the Eynsham dataset, image processing consisted of image concatenation and resolution reduction (Figure 5). The raw camera images were crudely cropped to remove overlap between images. No additional processing such as camera undistortion, blending or illumination adjustment was performed. The subsequent panorama was then resolution reduced (re-sampling using pixel area relation in OpenCV 2.1.0) to the resolutions shown in Table II.

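The pre-processing used in these experiments — resolution reduction by pixel-area resampling and, for the bit depth experiments, even redistribution of grayscale intensities — can be approximated with plain NumPy. Both functions are illustrative sketches: the paper used OpenCV's pixel-area-relation resampling, and the level placement 256(k+1)/(2^b+1) is our assumption, chosen because it reproduces the quoted intensity values (85 and 171 at 1 bit; 51, 102, 154 and 205 at 2 bits).

```python
import numpy as np

def downsample(img, out_h, out_w):
    """Reduce resolution by averaging equal-sized pixel blocks; this
    matches pixel-area-relation resampling when the source dimensions
    are integer multiples of the target dimensions."""
    h, w = img.shape
    return img.reshape(out_h, h // out_h, out_w, w // out_w).mean(axis=(1, 3))

def requantize(img, bits):
    """Requantize an 8-bit grayscale image to the given bit depth,
    spreading the 2**bits output intensities evenly over the 0-255
    range (level placement is our assumption, see above)."""
    levels = 2 ** bits
    values = np.round(256.0 * (np.arange(levels) + 1) / (levels + 1))
    idx = np.clip(img.astype(np.int32) * levels // 256, 0, levels - 1)
    return values[idx].astype(np.uint8)
```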
Figure 5. Image pre-processing for the full panoramic images consisted of a crude image stitching stage followed by a reduction in image resolution. The current image was compared with 0° and 180° offsets to all stored images on a pixel by pixel basis to form the image difference matrix described in Section III.B.

2) Reduced Field of View

For the reduced field of view experiments, a small area representing 0.4% of the total panoramic image was extracted from the centre of the forward facing image (Figure 6). The resultant 80×60 pixel image was then resolution reduced to the sizes shown in Table II.

Figure 6. To evaluate the effect of drastically reducing the field of view, an area representing 0.4% of the original panoramic image was extracted and resolution reduced.

3) Reduced Pixel Depth

Reduced pixel depths were obtained by reducing the bit depths of each pixel in the resolution reduced images (Figure 7). Grayscale image intensities were evenly distributed over the 256 values possible in an 8 bit intensity range, such that a 1 bit image had intensity values 85 or 171, a 2 bit image had intensity values 51, 102, 154 or 205, and so on.

Figure 7. To evaluate the effect of reduced pixel bit depth, the resolution reduced panoramic images were resampled at 1 bit, 2 bit, 4 bit and 8 bit grayscale pixel depths.

TABLE II. IMAGE SIZES

Dataset & Image Type                     | Reduced Resolution | Width×Height
Eynsham panoramic images                 | 4 pixels           | 2×2
(original image 829440 pixels, 1620×512) | 8 pixels           | 4×2
                                         | 32 pixels          | 8×4
                                         | 128 pixels         | 16×8
                                         | 512 pixels         | 32×16
Eynsham cropped images                   | 2 pixels           | 2×1
(original crop 4800 pixels, 80×60)       | 4 pixels           | 2×2
                                         | 16 pixels          | 4×4
                                         | 64 pixels          | 8×8
                                         | 256 pixels         | 16×16
Rowrah                                   | 16 pixels          | 4×4
Pikes Peak                               | 32 pixels          | 8×4
Office NXT                               | 2×7 bit pixels     | 2×1

C. Precision-Recall Calculation

To generate precision-recall curves, we used the manually corrected GPS data provided by the authors of the original study [1]. Detected route segment matches were classified as correct if the spatial distance separating the central frames of each route was less than 40 meters, as in the original study. Matches outside this distance were classified as false positives, with missed matches classified as false negatives. Matches were assessed for both traverses of the route, rather than just the second traverse. To generate each precision-recall curve, we conducted a sweep over the range of matching threshold s_m values. The range of values was chosen such that for most experiments a complete range of recall rates from 0% to 100% was obtained.

V. RESULTS

We present a range of results evaluating the performance of the system with varying image resolution, sequence length, pixel bit depth, and field of view. This extensive testing is performed on the 70 km Eynsham dataset, for which both ground truth and a state of the art comparison are available. We also present additional results demonstrating qualitatively the applicability of the technique to three other varied datasets.

Figure 8. Precision-recall curves for a range of reduced resolution panoramic images, with performance compared to four FAB-MAP implementations. Note the axis ranges.

A. Precision-Recall

The precision-recall performance using panoramic images from the Eynsham dataset is shown in Figure 8. At high precision levels, 4 pixel images produce superior performance to the baseline FAB-MAP performance. Increasing the resolution to 8 pixels enables the system to overtake the FAB-MAP with motion model results.

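The precision-recall bookkeeping of Section IV.C can be sketched as follows. The data layout (a query-to-match mapping plus per-frame ground truth positions) and the function name are our own simplifications; in particular, treating every unmatched query as a false negative assumes every query frame has a true correspondence.

```python
import math

def precision_recall(matches, positions, tol=40.0):
    """Score detected route segment matches against ground truth: a
    detection is correct if the central frames of the two segments lie
    within `tol` meters; a frame with no reported match counts as a
    false negative. `matches` maps query frames to matched frames (or
    None); `positions` maps frames to (x, y) coordinates in meters."""
    tp = fp = fn = 0
    for query, matched in matches.items():
        if matched is None:
            fn += 1
            continue
        (qx, qy), (mx, my) = positions[query], positions[matched]
        if math.hypot(qx - mx, qy - my) < tol:
            tp += 1
        else:
            fp += 1
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

Sweeping the matching threshold s_m and re-scoring yields one (precision, recall) point per threshold value, tracing out curves like those shown here.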
With 32 pixels, performance is superior (50% recall at 100% precision) to FAB-MAP with motion model and epipolar geometry, except between 86% and 89% recall rates. The sequence-based technique is also able to attain 100% recall, at 24%, 39%, and 46% precision levels for 4, 8 and 32 pixel images respectively. Figure 9 shows a zoomed-in comparison of the techniques. Performance gains are minimal above an image size of 32 pixels.

Figure 9. Enlarged plot of the high precision and recall performance curves. Note the rapidly reducing performance gains for image sizes above 32 pixels.

B. Loop Closure Locations

Although the precision performance using 32 pixel images is higher for a given recall rate compared to FAB-MAP, this is counteracted partly by inferior spatial loop closure coverage. Consequently, the algorithm requires a higher recall rate to achieve loop closure coverage equivalent to FAB-MAP. Figure 10 shows the loop closures achieved at a 99% precision level, showing comparable loop closure coverage to the FAB-MAP algorithm. The sections of the route where the algorithm failed to match were mostly due to the linear search constraints not finding sequence matches when frame capture spatial frequencies varied significantly. The reverse route is also only thoroughly recognized at higher recall rates (and precision levels below 100%). Further discussion of this issue is provided at the end of the paper.

Figure 10. Loop closure map for the Eynsham dataset using 32 pixel images at 99% precision and 82% recall. Loop closures are shown by the large circles, with false positive matches shown by small crosses.

C. Sample Route Matches and Speed Ratios

Figures 11-12 show matched sub-routes for both a forward (11) and reverse (12) sub-route match. The section of the image difference matrix in which the sub-route match was found is shown in panel (a), with white circles indicating the matching locations of five representative images from the matched sub-route. Panels (b-c) show the corresponding images.

Figure 11. (a) Image difference matrix for a matched sub-route, with white circles showing the corresponding matching frame pairs. The matching gradient was approximately 1, indicating this route segment was traversed at the same speed both times. (b) Frames from the second traverse and (c) matching frames from the first traverse of the route. The full frames are shown for visibility, although the actual processed images were low resolution versions of the top 75% of the frame.

Figure 13 shows a histogram of the relative frame sampling speed calculated for all matched sub-routes at the maximum recall, 100% precision point. The peak around 1 shows that frames were spatially sampled at similar rates during the second traverse of the route, while the small peak at -0.8 indicates the sub-routes matched in reverse.

Figure 12. (a) Image difference matrix for the reverse traversal along a section of route. (b) Frames from the second traverse and (c) matching frames from the first traverse of the route. The matching route segment had a gradient of approximately -0.8, indicating that the route was traversed approximately 20% more slowly in reverse.

D. Sequence Length

Increasing sequence length had a positive effect on performance up to a point (Figure 14). Matching 10 frame sequences was clearly inadequate, but 20 frames provided performance superior at high precision levels to both the baseline and motion model FAB-MAP 2.0 performance. 50 frame sequences provided the best performance at high precision levels, while 100 frame sequences provided the highest recall at lower precision levels. The 50 frame sub-route length (335 meters) is consistent with the warm start localization times of commercial GPS navigation systems of 28.5 seconds [17] (380 meters at 30 miles per hour).

E. Pixel Bit Depth

The pixel bit depth had little effect on system performance beyond 2 bits (4 possible intensities). At a pixel depth of 2 bits,
performance was superior to the best FAB-MAP 2.0 performance at high precision levels, but had lower recall rates at lower precision levels. There was negligible difference in performance between 4 and 8 bit depths. A sequence length of 50 frames was used for all the pixel bit depth results.

Figure 13. Relative speed ratios calculated for matching Eynsham sub-routes at the 100% precision, maximum recall point.

Figure 14. Precision-recall performance using 32 pixel images to match sub-routes of 10, 20, 50 and 100 frames in length.

Figure 15. Precision-recall performance using 32 pixel images with 1, 2, 4 and 8 bit grayscale pixel depths. There was no significant improvement in performance above 4 bit pixel depths.

F. Limited Field of View

When limiting the field of view to 0.4% of the original panoramic image, the recall rate at 100% precision improved to 56% for 16 pixel images, 67% for 64 pixel images, and 69% for 256 pixel images. The effect of increasing resolution was broadly the same as for the full field of view images, with gains rapidly diminishing after reaching about 32 pixels in resolution.

Figure 16. Precision-recall curves with a limited field of view equivalent to 0.4% of the original panoramic image (see Figure 6).

G. Rowrah Track Bike Dataset

For the Rowrah, Pikes Peak and Office NXT datasets we did not have metric ground truth. Instead, we present the sub-route recognition graphs at a performance level that was qualitatively assessed to be near 100% precision, by comparing the image sequences associated with the matched sub-routes. For the Rowrah dataset, all sub-routes were matched within a few frames of the correct location (each frame separated by an average of 8 meters), as shown in Figure 17a. The vertical axis shows the central frame number of the matching sub-route from the first traverse of the route. Figure 17b shows five frames obtained by evenly sampling a sub-route from the second traverse of the circuit, with Figure 17c showing the corresponding frames from the matching sub-route during the first circuit traverse.

Figure 17. (a) Sub-route localization for the Rowrah dataset. (b) Frames from the second route traverse and (c) matching frames from the first traverse.

H. Pikes Peak Dataset

The recall level at near 100% precision was around 50% for the Pikes Peak dataset (Figure 18). The lower recall rate was most likely due to significant changes in the racing-line taken by the car, larger variations in vehicle speed and the relatively bland nature of the environment, especially in the later mountainous stages.

I. Office NXT Dataset

The route matching performance for the NXT dataset is shown in Figure 19a. Figures 19d and 19e show the NXT light

sensor readings, with illustrative camera frames shown in Figures 19b-c. The first column (d) shows the pairs of light sensor readings obtained at that location during the second traverse, and the second column (e) shows the matched light sensor readings from the first route traverse. Manual inspection of the camera frames verified that every matched route segment was accurate to within approximately a meter.

Figure 18. (a) Sub-route recognition for the Pikes Peak dataset. Recall was not achieved in sections of the 20 km route, most likely due to large variations in racing-line. (b) Frames from the second traverse and (c) matching frames from the first traverse of the route.

Figure 19. (a) Sub-route recognition for the Office NXT dataset. (b) Frames from the second traverse and (c) matching frames from the first traverse of the route. (d) Light sensor readings from the second traverse and (e) matching light sensor readings from the first traverse of the route.

J. Computation and Storage

In this section we describe the storage and computational requirements of the system and present a range of computational scenarios for the proposed visual GPS application.

1) Storage

At a pixel depth of 4 bits and an image size of 32 pixels, each stored image takes a total of 16 bytes. It is an informative exercise to calculate the storage required to hold imagery from the entire global road network. According to the CIA World Factbook [18], the 191 countries surveyed have a total of approximately 70 million kilometers of paved and unpaved roads. Storing images every 5 meters of road would result in 224 gigabytes of data, easily stored on a hard drive dating from 2006 and on current solid state media. Therefore it would be possible to store a global database of images either locally on a device, or to easily download imagery for local areas. For example, a 10 megabyte download would provide 3000 km of road data, enough for a regional area.

2) Computation

All experiments in this paper were performed at real-time or better speed using unoptimized Matlab and C code. Computation is dominated by the search for matching sub-routes. The primary factor affecting computation is the allowance for varying velocities when conducting the matching route segment search. Here we discuss the localization-only scenario where a library of images already exists, starting with the situation where self-motion information (such as car odometry) enables the search space to be constrained to a simple linear search.

For a sub-route length of n frames, the dominant calculation is the nm frame comparisons that must be performed, where m is the number of visual templates stored in the template library. Each frame comparison constitutes s byte-wise comparisons, or 2s comparisons to enable forward and reverse route matching. For 32 pixel panoramic images, this constitutes 64 byte-wise comparisons. Table III presents a number of computation scenarios, assuming 5 meter frame spacing and a camera speed of 15 meters per second (54 km/hr). With single instruction, multiple data (SIMD) instructions the large city scenario is achievable on a current desktop machine. To achieve real-time performance during initial localization within an entire country or the world, significant optimizations would need to be implemented. One straightforward method is to cache frame by frame comparisons, comparing new images in the current sub-route as they are seen, leading to an n times speed up at the cost of needing more fast memory. However, the fast memory requirement at the large city size is well within all current device capabilities, including mobile phones and most other portable devices. Modern graphics card architectures and growing CPU core counts even on mobile devices offer the potential for further significant speed ups through parallel processing. In addition, the implementation of optimized data structure methods could also remove the barrier to achieving country-wide localization. Once localized, search spaces could also be massively constrained, as is done with current GPS systems. If a spatially regular frame rate cannot be guaranteed, then the search space must expand to incorporate multiple possible velocities. This increases the compute by a factor dependent on the range of possible velocities. For the Eynsham dataset, allowing for a frame rate variation of 19% increased computation time by a factor of 10.

TABLE III. COMPUTATION SCENARIOS

Route Length | Qualitative Description | Template Storage | Byte-wise calculations per second | Calculations with caching | Cache fast memory requirement
100 km       | Local area          | 320 kB | 192×10^6   | 3.84×10^6  | 1 MB
10000 km     | Large city          | 32 MB  | 19.2×10^9  | 384×10^6   | 100 MB
1×10^6 km    | Medium country      | 3.2 GB | 1.92×10^12 | 38.4×10^9  | 10 GB
7×10^7 km    | World road network  | 224 GB | 134×10^12  | 2.69×10^12 | 700 GB

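The storage and computation figures in Section V.J can be reproduced with a few lines of arithmetic. This sketch covers the linear-search, odometry-assisted case; the constant and function names are ours.

```python
BYTES_PER_IMAGE = 32 * 4 // 8       # 32 pixels at 4 bit depth = 16 bytes
FRAME_SPACING_M = 5                 # one stored image every 5 m of road
FRAMES_PER_SECOND = 15 / 5          # 15 m/s (54 km/h) camera -> 3 frames/s
OPS_PER_COMPARISON = 2 * 32         # 2s byte-wise ops: forward + reverse
N = 50                              # sub-route length in frames

def scenario(route_km):
    """Template storage, byte-wise operations per second, and the
    per-second operation count with frame-by-frame caching (an n-times
    speed-up) for a road network of the given length."""
    m = int(route_km * 1000 / FRAME_SPACING_M)   # stored templates
    storage_bytes = m * BYTES_PER_IMAGE
    ops_per_sec = FRAMES_PER_SECOND * N * m * OPS_PER_COMPARISON
    cached_ops_per_sec = FRAMES_PER_SECOND * m * OPS_PER_COMPARISON
    return storage_bytes, ops_per_sec, cached_ops_per_sec
```

scenario(100) reproduces the local-area row of Table III (320 kB, 192×10^6 and 3.84×10^6), and scenario(70e6) reproduces the 224 GB world road network storage figure.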
VI. DISCUSSION

The presented approach sacrifices single frame matching capability to achieve robust localization along a route, without the need for prior training or feature detection. While the method is not suitable for single snapshot mobile phone localization, a large range of potential applications involve navigation along routes, such as street navigation and indoor mobile robots. The experimental results show that in the context of navigation along routes, vision-based localization can be achieved with remarkably few pixels, tiny fields of view and reduced pixel bit depths.

Localization was achievable through two key, interdependent measures. Performing localization using a sequence of images rather than a single image removes the requirement that the image matching scheme be able to reliably calculate a single global image match. Instead, the image matching front end must only on average report matches better than chance. How much better depends on how long a matching sequence is used, with longer matching sequences reducing the performance requirements for the image matcher but increasing the computation of the sequence matching algorithm. This trade-off is avoided in a subset of real world navigation applications, such as domestic car travel, where translational speed information is available from On Board Diagnostic (OBD) systems.

The use of sequences rather than individual images also introduces two types of lag: a delay in initial localization upon startup, and a delay when the route taken consists of several fragmented previously traversed sequences. Variable sequence length matching could partially address the initial localization lag problem, by localizing more rapidly when the environment is easily recognizable. To adapt the system to deal with fragmented sequences, we are pursuing three approaches. The first is to use traditional probabilistic filters, which also potentially removes the need for a sequence length parameter. The second is to expand the local best matching and search from one-dimensional routes to two-dimensional areas. The third is to maintain localization using odometry in situations where a previously localized system briefly loses localization while traversing several fragmented sequences (such as passing through a complex intersection).

Matching using sequences rather than individual frames allows the image matching algorithm to be modified in ways that would render it useless as a global image matcher. We exploit this ability by normalizing the image difference scores

ACKNOWLEDGMENT

Author MM thanks Peter Corke, Gordon Wyeth and Liz Murphy for their helpful comments on the draft manuscript.

REFERENCES

[1] M. Cummins and P. Newman, "Highly scalable appearance-only SLAM - FAB-MAP 2.0," in Robotics: Science and Systems, Seattle, United States, 2009.
[2] K. Konolige and M. Agrawal, "FrameSLAM: From bundle adjustment to real-time visual mapping," IEEE Transactions on Robotics, vol. 24, pp. 1066-1077, 2008.
[3] A. J. Davison, I. D. Reid, N. D. Molton, and O. Stasse, "MonoSLAM: Real-time single camera SLAM," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 29, pp. 1052-1067, 2007.
[4] H. Sakoe and S. Chiba, "Dynamic programming algorithm optimization for spoken word recognition," IEEE Transactions on Acoustics, Speech and Signal Processing, vol. 26, pp. 43-49, 1978.
[5] P. Newman, D. Cole, and K. Ho, "Outdoor SLAM using visual appearance and laser ranging," in International Conference on Robotics and Automation, Florida, United States, 2006.
[6] E. Kalogerakis, O. Vesselova, J. Hays, A. A. Efros, and A. Hertzmann, "Image sequence geolocation with human travel priors," in International Conference on Computer Vision, Kyoto, Japan, 2009, pp. 253-260.
[7] P. Newman, G. Sibley, M. Smith, M. Cummins, A. Harrison, C. Mei, I. Posner, R. Shade, D. Schroeter, L. Murphy, W. Churchill, D. Cole, and I. Reid, "Navigating, recognizing and describing urban spaces with vision and lasers," The International Journal of Robotics Research, 2009.
[8] R. Sim, P. Elinas, M. Griffin, and J. J. Little, "Vision-based SLAM using the Rao-Blackwellised particle filter," in International Joint Conference on Artificial Intelligence, Edinburgh, Scotland, 2005.
[9] D. A. Pomerleau, "Neural network perception for mobile robot guidance," DTIC Document, 1992.
[10] M. O. Franz, B. Schölkopf, H. A. Mallot, and H. H. Bülthoff, "Learning view graphs for robot navigation," Autonomous Robots, vol. 5, pp. 111-125, 1998.
[11] D. Q. Huynh, A. Saini, and W. Liu, "Evaluation of three local descriptors on low resolution images for robot navigation," in Image and Vision Computing New Zealand, Wellington, New Zealand, 2009, pp. 113-118.
[12] V. N. Murali and S. T. Birchfield, "Autonomous navigation and mapping using monocular low-resolution grayscale vision," in Conference on Computer Vision and Pattern Recognition, Alaska, United States, 2008, pp. 1-8.
[13] M. Milford and G. Wyeth, "Persistent navigation and mapping using a biologically inspired SLAM system," International Journal of Robotics Research, vol. 29, pp. 1131-1153, 2010.
[14] P. J. Phillips, P. J. Flynn, T. Scruggs, K. W. Bowyer, J. Chang, K. Hoffman, J. Marques, J. Min, and W. Worek, "Overview of the face recognition grand challenge," in International Conference on Control, Automation, Robotics and Vision, Singapore, 2005, pp. 947-954, vol. 1.
within local sub-routes, forcing the algorithm to calculate a best [15] A. Torralba, R. Fergus, and Y. Weiss, "Small codes and large image
image match candidate within every section of route stored in databases for recognition," in Computer Vision and Pattern Recognition,
the image database. While this measure would produce large Anchorage, United States, 2008, pp. 1-8.
numbers of false positive loop closures were single image [16] M. Milford and G. Wyeth, "SeqSLAM: Visual Route-Based Navigation
for Sunny Summer Days and Stormy Winter Nights," in IEEE
localization used, by matching over a sequence of images it International Conference on Robotics and Automation, St Paul, United
enhances the ability of the algorithm to localize by removing States, 2012.
the effect of systematic biases, in much the same way that [17] J. Mahaffey, "TTFF Comparisons," ed, 2003.
applying patch normalization to an image removes some of the [18] CIA, The World Factbook, 2012.
effect of illumination change [19]. This combination of [19] A. M. Zhang and L. Kleeman, "Robust Appearance Based Visual Route
sequence matching and local image comparison provides a Following for Navigation in Large-scale Outdoor Environments," The
basis for future development of sequence-based, featureless International Journal of Robotics Research, vol. 28, pp. 331-356, 2009.
localization techniques with capabilities complementary to
single frame feature-based techniques.
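The sub-route score normalization and diagonal sequence search described above can be illustrated with a small sketch (our illustrative Python, not the authors' implementation; the matrix layout, window size and sequence length are assumptions):

```python
import numpy as np

def normalize_local(D, window=10):
    """Normalize image difference scores within local sub-routes.

    D[i, j] is the difference score between query frame i and database
    location j; each column is normalized against the statistics of its
    local neighbourhood of database locations, so a best match candidate
    is forced within every section of the stored route."""
    Dn = np.empty_like(D, dtype=float)
    for j in range(D.shape[1]):
        lo, hi = max(0, j - window // 2), min(D.shape[1], j + window // 2 + 1)
        patch = D[:, lo:hi]
        Dn[:, j] = (D[:, j] - patch.mean()) / (patch.std() + 1e-9)
    return Dn

def best_sequence_match(Dn, seq_len=4):
    """Score constant-velocity candidate routes: sum the normalized
    scores along each diagonal of length seq_len and return the database
    start index with the lowest total score."""
    rows, cols = Dn.shape
    n = min(seq_len, rows)
    best_score, best_start = np.inf, -1
    for j in range(cols - n + 1):
        score = sum(Dn[i, j + i] for i in range(n))
        if score < best_score:
            best_score, best_start = score, j
    return best_start
```

Because scores are summed over a sequence, individual columns only need to rank the true match better than chance on average, which is the point made in the text.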

304
Exploiting Passive Dynamics with Variable Stiffness
Actuation in Robot Brachiation
Jun Nakanishi and Sethu Vijayakumar
School of Informatics, University of Edinburgh, United Kingdom
Email: [email protected], [email protected]

Abstract—This paper explores a passive control strategy with variable stiffness actuation for swing movements. We consider brachiation as an example of a highly dynamic task which requires exploitation of gravity in an efficient manner for successful task execution. First, we present our passive control strategy considering a pendulum with variable stiffness actuation. Then, we formulate the problem based on an optimal control framework with temporal optimization in order to simultaneously find an appropriate stiffness profile and movement duration such that the resultant movement will be able to exploit the passive dynamics of the robot. Finally, numerical evaluations on a two-link brachiating robot with a variable stiffness actuator (VSA) model are provided to demonstrate the effectiveness of our approach under different task requirements, modelling errors and switching in the robot dynamics. In addition, we discuss the issue of task description in terms of the choice of cost function for successful task execution in optimal control.

I. INTRODUCTION

In recent years, there has been growing effort in the development of variable stiffness actuators. Various designs of actuators with mechanically adjustable stiffness/compliance composed of passive elastic elements such as springs have been proposed [4, 6, 11, 12, 14, 15]. In contrast to conventional stiff actuators, one of the motivations to develop variable stiffness actuators is that such actuators are expected to have desirable properties such as compliant actuation and energy storage capability, with potential applications in human-robot interaction and improvements of task performance in dynamic tasks.

This paper explores a control strategy for exploiting passive dynamics in tasks involving swing movements with variable stiffness actuation based on optimal control. Despite the potential benefits of variable stiffness joints, finding an appropriate control strategy to fully exploit the capabilities of variable stiffness actuators (VSAs) is challenging due to the increased complexity of mechanical properties and the number of control variables. Taking an optimal control approach, recent studies in [3, 8, 10] have investigated the benefits of variable stiffness actuation, such as energy storage in explosive movements, from a viewpoint of performance improvement. Braun et al. [3] have demonstrated such benefits of VSAs by simultaneously optimizing time-varying torque and stiffness profiles of the actuator in a ball throwing task. In [8, 10], an optimal control problem of maximizing link velocity with variable stiffness actuator models has been investigated. It is shown that a much larger link velocity can be achieved than that of the motor in the VSA with the help of appropriate stiffness adjustment during a hitting movement. In a similar problem, Hondo and Mizuuchi [13] have discussed the issue of determining the inertia parameter and spring constant in the design of series elastic actuators to increase the peak velocity. In robot running, Karssen and Wisse [16] have presented numerical studies to demonstrate that an optimized nonlinear leg stiffness profile could improve robustness against disturbances.

In this paper, we focus on the passive control strategy with variable stiffness actuation for swing movements in a brachiation task. Indeed, the importance of exploitation of the intrinsic passive dynamics for efficient actuation and control has been discussed in the study of passive dynamic walking, where biped robots with no actuation or minimal actuation can exhibit human-like natural walking behavior [5]. In this study, we consider brachiation as an example of a dynamic task involving swing movement. Brachiation is an interesting form of locomotion of an ape swinging from handhold to handhold like a pendulum [7, 29], which requires explicit exploitation of the passive dynamics with the help of gravity to achieve the task. From a control point of view, designing a brachiating controller is a challenging problem since the system is underactuated, i.e., there is no actuation at the gripper. Efforts have been made to develop a control law for a class of underactuated systems from a control theoretic view,1 e.g., [18, 19, 22, 32].

In our previous study [23], we have proposed a method of describing the task using a dynamical system based on a nonlinear control approach, and derived a nonlinear control law for a joint torque controlled two-link brachiating robot. The control strategy in [23] uses an active cancellation of the plant dynamics using input-output linearization to force the robot to mimic the specified pendulum-like motion described in terms of target dynamics. In contrast, Gomes and Ruina [9] studied brachiation with zero-energy-cost motions using only the passive dynamics of the body. They sought numerical solutions for the initial conditions which lead to periodically continuous locomotion without any joint torques. By extending the (unstable) fixed point solutions in unactuated horizontal brachiation found in [9], Rosa et al. [28] numerically studied open-loop stable (unactuated downhill and powered uphill) brachiation of a two-link model from a viewpoint of hybrid systems control including switching and discontinuous transitions.

1 Much of the related work has focused on the motion planning of underactuated manipulators in a horizontal plane (not necessarily under the influence of gravity). In such a case, dynamic coupling of link inertia is exploited rather than the passive dynamics due to gravity.

Motivated by the work in [9], our goal in this study is to demonstrate that highly dynamic tasks such as brachiation can be achieved by fully exploiting passive dynamics with simultaneous stiffness and temporal optimization. In our recent work [24], the effectiveness of temporal optimization and stiffness optimization in periodic movements has been discussed. However, temporal optimization and stiffness optimization were treated separately, and rather simplified, ideal actuator models were used in the evaluation. In this study, numerical evaluations of our approach on a two-link brachiating robot with a realistic MACCEPA (Mechanically Adjustable Compliance and Controllable Equilibrium Position Actuator) VSA model [11] (see the motivation for this particular VSA in Section III-B) are provided to show the effectiveness of our approach under different task requirements, modelling errors and variations of the robot dynamics. Furthermore, we also discuss the issue and effect of task encoding via an appropriate choice of the cost function for successful task execution.

II. PASSIVE CONTROL STRATEGY IN SWING MOVEMENT WITH VARIABLE STIFFNESS ACTUATION

Our goal in this paper is to devise a control strategy to achieve the desired swing maneuver in brachiation by exploiting the natural dynamics of the system. To begin with, we discuss our approach to implementing a passive control strategy, considering a pendulum with variable stiffness actuation.

A natural and desirable strategy would be to make good use of gravity by making the joints passive and compliant. For example, in walking, unlike many high gain position controlled biped robots with stiff joints, humans seem to let the lower leg swing freely by relaxing the muscles controlling the knee joint during the swing phase, and increase stiffness only when necessary. In fact, stiffness modulation is observed during the walking cycle in a cat locomotion study2 [1].

Consider the dynamics of a simplified single-link pendulum under the influence of gravity. If we consider an idealized VSA model of the form τ = −k(q − qm), where q is the joint angle, τ is the joint torque, k is the stiffness and qm is the equilibrium position of the elastic actuator, then the dynamics can be written as:

ml²q̈ + mgl sin q = τ = −k(q − qm)   (1)

where m is the mass, l is the length and g is the gravitational constant. In this idealized VSA model, we assume that k and qm are the control variables. From a viewpoint of position control, one way of looking at this system is as a manipulator with a flexible (elastic) joint, where we solve a tracking control problem [31]. Recently, Palli et al. [25] proposed a tracking control algorithm for such a flexible joint manipulator with variable stiffness actuation to achieve asymptotic tracking of the desired joint and stiffness trajectories based on input-output linearization, effectively an active cancellation of the intrinsic robot dynamics. Note that the main focus of [25] is the tracking control of the given joint and stiffness trajectories, and the problem of generating such desired trajectories for a given specific task is not addressed.

On the other hand, if we rearrange the linearized dynamics of (1) (sin q ≈ q) as

ml²q̈ + (mgl + k)q = v   (2)

where v = kqm, another view of the control problem could be that varying the stiffness of the actuator k in the second term of the left hand side effectively changes the dynamics property, e.g., the natural frequency of the pendulum. From this perspective, the control problem can be framed as finding an appropriate (preferably small) stiffness profile k to modulate the system dynamics (only when necessary) and computing the virtual equilibrium trajectory qm [30] to fulfill the specified task requirement while maximally exploiting the natural dynamics.

In a realistic situation, it is not straightforward to compute a control command for the actuator to realize such an idea due to the complexity of the system dynamics, the actuator mechanisms, the requirement of coordination of multiple degrees of freedom and redundancy in actuation. Next, we exploit the framework of optimal control and spatiotemporal optimization of variable stiffness actuation to find appropriate control commands to implement the brachiation task.

III. PROBLEM FORMULATION

A. Robot Dynamics

The equation of motion of the two-link brachiating robot shown in Fig. 1 takes the standard form of rigid body dynamics where only the second joint has actuation:

M(q)q̈ + C(q, q̇)q̇ + g(q) + Dq̇ = [0, τ]ᵀ   (3)

where q = [q1, q2]ᵀ is the joint angle vector, M is the inertia matrix, C is the Coriolis term, g is the gravity vector, D is the viscous damping matrix, and τ is the joint torque acting on the second joint.

B. Variable Stiffness Actuation

We consider a MACCEPA model [11] as our VSA implementation of choice. MACCEPA is one of the designs of mechanically adjustable compliant actuators with a passive elastic element (see Fig. 1). This actuator design has the desirable characteristic that the joint can be very compliant and mechanically passive/back-drivable: this allows free swinging with a large range of movement by relaxing the spring, highly suitable for the brachiation task we consider. MACCEPA is equipped with two position controlled servo motors, qm1 and qm2, each of which controls the equilibrium position and the spring pre-tension, respectively. The parameters of the robot we use in this study (Fig. 1(c)) are based on a 2-link MACCEPA joint (Fig. 1(b)) constructed in our lab [3].

The joint torque for this actuator model is given by

τ = κBC sin(qm1 − q) (1 + (rd qm2 − (C − B)) / √(B² + C² − 2BC cos(qm1 − q)))   (4)

2 To our knowledge, there are a large number of studies of stretch reflex modulation in human walking; however, work that specifically addresses stiffness modulation is very limited. In human arm cyclic movement, Bennett et al. [2] reported time-varying stiffness modulation in the elbow joint.

[Fig. 1 diagrams: (a) two-link robot model with VSA, showing joint angles q1, q2, servo angles qm1, qm2, target bar and gripper; (b) photograph of the 2-link MACCEPA joint; (c) parameter tables reproduced below.]

Robot parameters                     i = 1     i = 2
Mass mi (kg)                         0.42      0.23
Moment of inertia Ii (kgm²)          0.0022    0.0017
Link length li (m)                   0.25      0.25
COM location lci (m)                 0.135     0.0983
Viscous friction di (Nm/s)           0.01      0.01

MACCEPA parameters                   value
Spring constant κ (N/m)              771
Lever length B (m)                   0.03
Pin displacement C (m)               0.0125
Drum radius rd (m)                   0.01

(a) Brachiating robot model with VSA (b) 2-link MACCEPA joint (c) Model parameters
Fig. 1. Model of a two-link brachiating robot with a MACCEPA variable stiffness actuator.

and the joint stiffness can be computed as

k = −∂τ/∂q = κ cos(α)BC(1 + β/γ) − κ sin²(α)B²C²β/γ³   (5)

where κ is the spring constant, rd is the drum radius, α = qm1 − q, β = rd qm2 − (C − B) and γ = √(B² + C² − 2BC cos(qm1 − q)) (see Fig. 1(a) and (c) for the definition of the model parameters and variables).

The spring tension is given by

F = κ(l − l0)   (6)

where l = √(B² + C² − 2BC cos(qm1 − q)) + rd qm2 is the current spring length and l0 = C − B is the spring length at rest. The joint torque equation (4) can also be rearranged in terms of the moment arm and the spring tension as

τ = (BC sin(qm1 − q)/γ) F.   (7)

Note that MACCEPA has a relatively simple configuration in terms of actuator design compared to other VSAs; however, the torque and stiffness relationships in (4) and (5) depend on the current joint angle and the two servo motor angles in a complicated manner, and its control is not straightforward.

In addition, we include realistic position controlled servo motor dynamics, approximated by a second order system with a PD feedback control

q̈mi + 2ai q̇mi + ai²(qmi − ui) = 0, (i = 1, 2)   (8)

where ui is the motor position command, ai determines the bandwidth of the actuator, and the range of the servo motors is limited as qmi,min ≤ qmi ≤ qmi,max and ui,min ≤ ui ≤ ui,max [3]. In this study, we use ai = 50.

C. Augmented Plant Dynamics

The plant dynamics composed of the robot dynamics (3) and the servo motor dynamics (8) can now be formulated as

ẋ = f(x, u)   (9)

where

f = [ x2,
      M⁻¹(x1)(−C(x1, x2)x2 − g(x1) − Dx2 + [0, τ(x1, x3)]ᵀ),
      x4,
      −a²x3 − 2ax4 + a²u ]   (10)

x = [x1, x2, x3, x4]ᵀ = [q, q̇, qm, q̇m]ᵀ ∈ R⁸, q = [q1, q2]ᵀ, qm = [qm1, qm2]ᵀ and u = [u1, u2]ᵀ. Note that a in (10) denotes a = diag{ai} and a² is defined as a² = diag{ai²} for notational convenience.

D. Optimal Feedback Control with Temporal Optimization

For plant dynamics

ẋ = f(x, u),   (11)

the objective of optimal control [33] is to find a control law

u* = u(x, t)   (12)

which minimizes the cost function

J = Φ(x(T)) + ∫₀ᵀ h(x(t), u(t)) dt   (13)

for a given movement duration T, where Φ(x(T)) is the terminal cost and h(x(t), u(t)) is the running cost. We employ the iterative linear quadratic Gaussian (ILQG) algorithm [17] to obtain a locally optimal feedback control law

u(x, t) = uopt(t) + L(t)(x(t) − xopt(t)).   (14)

In addition to obtaining an optimal control law, we simultaneously optimize the movement duration T using the temporal optimization algorithm proposed in [27]. In [27], a mapping β(t) from the real time t to a canonical time t'

t' = ∫₀ᵗ (1/β(s)) ds,   (15)

is introduced and β(t) is optimized to yield the optimal movement duration T. In this study, we simplify the temporal

optimization algorithm by discretizing (15) with the assumption that β(t) is constant during the movement as

Δt' = (1/β)Δt.   (16)

By updating β using gradient descent

βnew = β − η∇β J   (17)

where η > 0 is a learning rate, we obtain the movement duration T' = (1/β)T, where T = NΔt (N is the number of discrete time steps). In the complete optimization procedure, ILQG and the update of β in (17) are iterated in an EM (Expectation-Maximization)-like manner until convergence to obtain the final optimal feedback control law (14) and the associated movement duration T*. Depending on the task objective, it is further possible to augment the cost by including the time explicitly as

J' = J + wT T   (18)

where J is the cost (13) and wT is the weight on the time cost, which determines the trade-off between the original cost J and the movement duration T.

IV. EVALUATIONS3

3 A video clip summarizing the results is available at http://goo.gl/iYrFr

A. Optimization Results in Brachiation Task

In this paper, we consider the task of swing locomotion from handhold to handhold on a ladder, and swinging up from the suspended posture to catch the target bar. Motivated by the discussions on our passive control strategy in Section II, we consider the following cost function to encode the task (the specific reason will be explained below)

J = (y(T) − y*)ᵀ QT (y(T) − y*) + ∫₀ᵀ (uᵀR1u + R2F²) dt   (19)

where y = [r, ṙ]ᵀ ∈ R⁴ are the position and the velocity of the gripper in the Cartesian coordinates, y* = [r*, 0]ᵀ is the target value when reaching the target, and F is the spring tension in the VSA given in (6). This objective function is designed in order to reach the target located at r* at the specified time T while minimizing the spring tension F in the VSA. Note that the main component in the running cost is to minimize the spring tension F by the second term, while the first term uᵀR1u is added for regularization with a small choice of the weights in R1. In practice, this is necessary since F is a function of the state and ILQG requires a control cost in its formulation to compute the optimal control law.

Notice that the actuator torque (7) can be expressed in the form

τ = −F sin(q − qm1)/γ'   (20)

where γ' = √(B² + C² − 2BC cos(qm1 − q))/BC. In this equation (20), it can be conceived that F has a similar role to the stiffness parameter k in the simplified actuator model

τ = −k(q − qm).   (21)

[Fig. 2 panels: (a) movement of the robot with optimized duration T = 0.606 (sec); (b) joint trajectories q1, q2 and servo motor positions qm1, qm2; (c) joint torque, spring tension and joint stiffness, which are nearly zero at the beginning and end of the swing.]

Fig. 2. Optimization of the locomotion task using the cost (19). In (b) and (c), gray thin lines show the plots for non-optimized T in the range of T = [0.5, · · · , 0.7] (sec) and thick lines show the plots for optimized Topt = 0.606. Note that especially at the beginning and the end of the movement, joint torque, spring tension and joint stiffness are kept small, allowing the joint to swing passively.

Another interpretation can be considered in such a way that if we linearize (4) around the equilibrium position assuming that α = qm1 − q ≪ 1, the relationship between the joint stiffness k in (5) and the spring tension F in (6) can be approximated as

k ≈ F/√(B² + C² − 2BC).   (22)

Thus, effectively, minimizing the spring tension F corresponds
[Fig. 3 plots: servo motor commands u1, u2 and servo angles qm1, qm2 over the movement.]

Fig. 3. Servo motor commands ui (dotted line) and actual angle qmi (solid line) for the results with optimal movement duration T = 0.606 (cf. Fig. 2 (b) bottom). Servo motor response delay can be observed, characterized by the servo motor dynamics (8). The proposed optimal control framework finds appropriate control commands taking this effect into account.

[Fig. 4 plots: robot movements with optimal feedback and with feedforward only under model mismatch.]

Fig. 4. Effect of optimal feedback under the presence of parameter mismatch between the nominal model used to compute optimal control and the actual robot. A comparison between the movement with the optimal feedback control and the movement with only the feedforward command is shown. The result demonstrates the effectiveness of optimal feedback control under model uncertainty.

to minimizing the stiffness k in an approximated way. Note that it is possible to directly use k in the cost function. However, in practice, first and second derivatives of k are needed to implement the ILQG algorithm, which become significantly more complex than those of F since the joint stiffness k is already the first derivative of τ as described in (5). Thus, it is preferable to use the spring tension F. This close relationship between F and k in the general nonlinear case can be observed in the plots, for example, in Fig. 2 (middle and bottom in (c)). In fact, the appropriate choice of the cost function is critical for successful task execution. We discuss the issue of task encoding via cost selection in Section IV-D in more detail.

1) Swing Locomotion: Consider the case where the target bar is located at d = 0.3 (m). We optimize both the control command u and the movement duration T. We use QT = diag{10000, 10000, 10, 10}, R1 = diag{0.0001, 0.0001} and R2 = 0.1 for the cost function in (19). As mentioned above, R1 is chosen to be a small value for the regularization needed for the ILQG implementation. The optimized movement duration was T = 0.606 (sec).

Fig. 2 shows (a) the optimized robot movement, (b) joint trajectories and servo motor positions, and (c) joint torque, spring tension and joint stiffness. In the plots, trajectories of the fixed time horizon ranging T = 0.5 ∼ 0.7 (sec) are also overlayed for comparison in addition to the case of the optimal movement duration T = 0.606 (sec). In the optimized movement, the spring tension and the joint stiffness are kept small at the beginning and end of the movement, resulting in nearly zero joint torque, which allows the joint to swing passively. The joint torque is exerted only during the middle of the swing by increasing the spring tension as necessary. This result suggests that the natural plant dynamics are fully exploited for the desirable task execution based on the control strategy discussed in Section II with simultaneous stiffness and temporal optimization.

In order to illustrate the effect of the servo motor dynamics characterized by (8), Fig. 3 shows the servo motor position commands and actual motor angles with the optimal movement duration (cf. Fig. 2 (b) bottom). Delays in the servo motor response can be observed in this plot. This suggests that the proposed optimal control framework can find appropriate control commands taking this effect into account.

B. Effect of Optimal Feedback under Modelling Error

One of the benefits of using the optimal feedback control framework is that, in addition to computing the optimal feedforward control command, it provides a locally optimal feedback control in the neighborhood of the optimal trajectory, which allows the controller to make corrections if there is a small deviation from the nominal optimal trajectory. In this section, we present numerical studies of the effect of optimal feedback control (14) under the influence of model mismatch between the nominal model and the actual robot parameters. We introduce a modelling error as mi,nominal = 1.05mi (link mass) and lc,i,nominal = 1.1lc,i (location of center of mass on the link) for i = 1, 2.

Fig. 4 shows the comparison between the movement using the optimal feedback control law (14) obtained in the simulation in Section IV-A1 above and with only the feedforward (open loop) optimal control command u = uopt(t) under the presence of modelling error. Using only feedforward control, the robot deviates from the target bar due to the model mismatch. However, with the optimal feedback control law, the robot is able to get closer towards the target with the help of the feedback term. These results suggest the effectiveness of the optimal feedback control. In future work, we are interested in on-line learning of the plant dynamics to address the issue of model uncertainty [20, 21].

C. Switching Dynamics and Task Parameters

In this section, we explore different task requirements with switching dynamics. In the following simulation, we use the robot model with the link lengths l1 = 0.2 (m) and l2 = 0.35 (m), introducing an asymmetric configuration in the robot structure. We consider the task of first swinging up from the suspended posture to the target at d = 0.45 (m), then subsequently continuing to locomote twice to the target bars at d = 0.4 (m) and d = 0.42 (m) (irregular intervals). Note that every time the robot grasps the target and starts swinging for the next target, the robot configuration is interchanged, which significantly changes the dynamic properties for each swing movement due to the asymmetric structure of the robot. Thus, the stiffness and movement duration need to be adjusted
           
appropriately to fulfill the desired task requirement. The cost function (19) with the same parameters is used as in the previous simulations. For the swing-up task, we add the time cost wT T with wT = 5 (see (18)), i.e., the task requirement in swing-up is to swing up quickly while minimizing the control cost.

Fig. 5 shows (a) the sequence of the optimized robot movement, (b) joint trajectories and servo motor positions, and (c) joint torque, spring tension and joint stiffness. The obtained optimal movement durations were 1) T = 2.071 (sec) for swing up, 2) T = 0.778 (sec) for the locomotion with the interchanged (upside down) robot model and 3) T = 0.611 (sec) for the last swing movement, respectively. Notice the significant difference in the optimized movement time for the swing locomotions 2) and 3) (more than 25%). In 2), the lower link is heavier and in 3) the top link is heavier due to the mass of the VSA model. Thus, the effective natural frequency of the pendulum movement is different, which resulted in different movement durations. The results highlight that our approach can find appropriate movement durations and command sequences to achieve the task under different requirements and conditions (locomotion, swing-up, robot dynamics change and target distance change). In this example, each maneuver is optimized separately. Optimization over multiple swing movements including transitions will be of future interest.

[Fig. 5 panels: (a) sequence of the movement of the robot – 1) swing up (T = 0 ∼ 2.071), 2) 1st locomotion with interchanged configuration (T = 2.071 ∼ 2.849, 0.778 s), 3) 2nd locomotion (T = 2.849 ∼ 3.460, 0.611 s) – with targets at d = 0.45, 0.4 and 0.42 (m); (b) joint trajectories and servo motor positions; (c) joint torque, spring tension and joint stiffness.]

Fig. 5. Simulation results of the sequence of movements. Note that the robot configuration is asymmetric with the link lengths l1 = 0.2 (m) and l2 = 0.35 (m). When the robot swings after grasping the bar, the robot configuration is interchanged, which significantly changes the dynamic characteristics.

D. Design and Selection of a Cost Function

In optimal control, generally, a task is encoded in terms of a cost function, which can be viewed as an abstract representation of the task. From our point of view and experience, the design and selection of the cost function is one of the most important and difficult parts of a successful application of such an optimal control framework. For a simple task and plant dynamics, an intuitive choice (typically a quadratic cost in the state and control as in an LQR setting) would suffice (still it is necessary to adjust the weights). However, for a highly dynamic task with complex plant dynamics, this becomes increasingly difficult, and an appropriate choice of the cost function which best encodes the task still remains an open issue.

In this section, we explore a few more candidate cost functions. In addition to the cost function (19), consider the following running cost functions h = h(x, u) in (13):

• quadratic cost with the control command (servo motor position command):

h = uᵀRu   (23)

• quadratic cost with the joint torque. The main term is the cost associated with the joint torque τ, and uᵀR1u is added for regularization (small R1):

h = uᵀR1u + R2τ²   (24)

Figure 6 shows the results using the running cost h = uᵀRu in (23) with R = diag{1, 1}. The obtained optimal movement duration is T = 0.604 (sec). Figure 7 shows the results using the running cost h = uᵀR1u + R2τ² in (24) with R1 = diag{0.0001, 0.0001} and R2 = 100. The obtained optimal movement duration is T = 0.620 (sec). In both of these two cases, the same terminal cost parameters are used as in the case of (19).

As demonstrated in Figs. 6 and 7, the robot is also successfully able to reach the target bar by minimizing each specific cost in addition to the case of the cost (19) presented in Section IV-A above. However, with the choice of the running cost (23), a significant difference in the resultant robot movement and much higher spring tension and joint stiffness can be observed in Fig. 6. As can be seen in Fig. 7, with the choice of the cost associated with the joint torque in (24), the resultant movement looks almost identical to the one with the cost (19) (see Fig. 2) and the joint torque profile is comparable. However, we can
310
[Figure: robot movement with servo motor position command cost; panels show joint angles, servo motor positions, joint torque, spring tension and joint stiffness vs. time.]

Fig. 6. Optimization of the locomotion task using the running cost h = u^T R u in (23) with R = diag{1, 1}. Left: Movement of the robot with optimized duration T = 0.604 (sec). Center: Joint angles and servo motor angles. Right: Joint torque, spring tension and joint stiffness. Note that while the task itself is achieved, the movement looks very different from the one in Fig. 2 and much higher spring tension and joint stiffness during the swing movement can be observed.
[Figure: robot movement with joint torque cost; panels show joint angles, servo motor positions, joint torque, spring tension and joint stiffness vs. time.]

Fig. 7. Optimization of the locomotion task using the running cost h = u^T R_1 u + R_2 τ^2 in (24) with R_1 = diag{0.0001, 0.0001} and R_2 = 100. Left: Movement of the robot with optimized duration T = 0.620 (sec). Center: Joint angles and servo motor angles. Right: Joint torque, spring tension and joint stiffness. The resultant movement looks almost identical to the one in Fig. 2 and the joint torque profile is comparable. However, we can observe that spring tension and joint stiffness are larger than those of Fig. 2.
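As a concrete illustration of the two running costs compared in the captions above, the sketch below evaluates (23) and (24) at a single time step with the weighting matrices used in the simulations. The function names and the example input values are ours, chosen for illustration only.

```python
import numpy as np

def running_cost_command(u, R):
    """Eq. (23): h = u^T R u, a quadratic cost on the servo position commands."""
    u = np.asarray(u)
    return float(u @ R @ u)

def running_cost_torque(u, tau, R1, R2):
    """Eq. (24): h = u^T R1 u + R2 * tau^2, a joint-torque cost with a small
    regularization term on the commands (R1 small)."""
    u = np.asarray(u)
    return float(u @ R1 @ u + R2 * tau**2)

# Weighting parameters from the simulations of Figs. 6 and 7.
R = np.diag([1.0, 1.0])
R1 = np.diag([0.0001, 0.0001])
R2 = 100.0
```

With these weights, the torque term R_2 τ^2 dominates (24) while the command term is nearly negligible, which is consistent with (24) penalizing joint torque rather than the stiffness-setting commands themselves.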

observe that spring tension and joint stiffness are larger than those of the cost (19). This is due to the redundancy in the variable stiffness actuation, and the results depend on how we resolve it by an appropriate choice of the cost function. These results suggest that the choice of the cost function is crucial; however, its selection is still non-intuitive.

Note that other considerations of cost functions could be possible, e.g., energy consumption. In the brachiation task, without friction, the mechanical energy of the rigid body dynamics, E = ∫_0^T τ q̇_2 dt, is conserved for the swing locomotion with the same intervals at the same height starting and ending at zero velocity (if no potential energy is stored in the spring of the VSA at the end of the swing). Thus, if we wish to consider true energy consumption, it would be necessary to evaluate the electrical energy consumed at the motor level. However, this is not straightforward since we need a precise model of the mechanical and electrical motor dynamics including all the factors such as motor efficiency and transmission loss, which could be rather complex to model in practice, and the control strategy would largely depend on the properties of the actual motors used.

E. Remarks on other Joint Actuation and Controller Design Approaches

In this paper, we explore variable stiffness actuation to exploit passive dynamics in swing movement. One of the desirable properties of the variable stiffness actuation we consider is that the joint can be fully mechanically passive by appropriately adjusting the spring tension in the actuator mechanisms. This is in contrast to joint actuation with geared electric motors in many existing robotic systems. Typically, joints with geared motors with a high gear ratio aimed for position control cannot be fully back-drivable, i.e., joints cannot be made passive to exploit the natural dynamics of the link. For example, the brachiating robot in [23] uses a DC motor with a harmonic drive gear and exhibited complex and relatively high friction. Thus, in this design, it is not possible to exploit the passive dynamics of the second link since the motor is not fully back-drivable by gravity, and it is necessary to actively drive the joint to achieve the swing movement. To make the joint fully back-drivable without passive components, we may need to use high-performance direct drive motors, which would typically require precise torque control mechanisms.

From the viewpoint of a different controller design approach, the target dynamics method [23] uses input-output linearization to actively cancel the plant dynamics. While its effectiveness has been demonstrated in the torque controlled robot hardware, it is not straightforward to apply this method to the control of robots with general variable stiffness mechanisms since the system dynamics are not easily input-output linearizable due to redundancy and complex nonlinearity in the actuator dynamics. Furthermore, it turned out that for the parameter setting used in Section IV-C, the target dynamics controller becomes singular at some joint angle q_2 within the range of the movement even for the torque controlled case. With the link mass parameters used in this paper, we did not find problems with the same link length l_1 = l_2; however, typically, we numerically found that when l_2 > l_1, the
target dynamics method encounters an ill-posedness problem of invertibility in the derivation of the control law (cf. Equation (15) in [23]).

V. CONCLUSION

In this paper, we have presented an optimal control framework for exploiting passive dynamics of the system for swing movements. As an example, we considered brachiation on a two-link underactuated robot with a variable stiffness actuation mechanism, which is a highly dynamic and challenging task. Numerical simulations illustrated that our framework was able to simultaneously optimize the time-varying joint stiffness profile and the movement duration exploiting the passive dynamics of the system. These results demonstrate that our approach can deal with different task requirements (locomotion in different intervals, swing-up), modelling errors and switching in the robot dynamics. In addition, we empirically explored the issue of the design and selection of an appropriate cost function for successful task execution.

The approach presented in this paper to exploit the passive dynamics with VSA contrasts to the nonlinear controller design with active cancellation of the plant dynamics using input-output linearization for the same task [23]. However, we feel that it shares an important issue of task encoding (or description) either in the form of target dynamics or in terms of a cost function based on physical understanding and insight into the task. We aim to extend our approach to include variable damping [26] for dynamic tasks involving interactions with environments.

ACKNOWLEDGMENTS

This work was funded by the EU Seventh Framework Programme (FP7) as part of the STIFF project.

REFERENCES

[1] K. Akazawa, et al. Modulation of stretch reflexes during locomotion in the mesencephalic cat. J. of Physiology, 329(1):553–567, 1982.
[2] D. J. Bennett, et al. Time-varying stiffness of human elbow joint during cyclic voluntary movement. Exp. Brain Res., 88:433–442, 1992.
[3] D. Braun, M. Howard, and S. Vijayakumar. Optimal variable stiffness control: Formulation and application to explosive movement tasks. Autonomous Robots, 2012. doi: 10.1007/s10514-012-9302-3.
[4] M. G. Catalano, R. Schiavi, and A. Bicchi. Mechanism design for variable stiffness actuation based on enumeration and analysis of performance. In IEEE ICRA, 2010.
[5] S. Collins, A. Ruina, R. Tedrake, and M. Wisse. Efficient bipedal robots based on passive-dynamic walkers. Science, 307(5712):1082–1085, 2005.
[6] O. Eiberger, et al. On joint design with intrinsic variable compliance: derivation of the DLR QA-joint. In IEEE ICRA, 2010.
[7] S. Eimerl and I. DeVore. The Primates. Time-Life Books, 1966.
[8] M. Garabini, A. Passaglia, F. Belo, P. Salaris, and A. Bicchi. Optimality principles in variable stiffness control: The VSA hammer. In IEEE/RSJ IROS, 2011.
[9] M. W. Gomes and A. L. Ruina. A five-link 2D brachiating ape model with life-like zero-energy-cost motions. J. of Theoretical Biology, 237(3):265–278, 2005.
[10] S. Haddadin, M. Weis, S. Wolf, and A. Albu-Schäffer. Optimal control for maximizing link velocity of robotic variable stiffness joints. In IFAC World Congress, 2011.
[11] R. Van Ham, et al. MACCEPA, the mechanically adjustable compliance and controllable equilibrium position actuator: Design and implementation in a biped robot. Rob. and Aut. Sys., 55(10):761–768, 2007.
[12] R. Van Ham, et al. Compliant actuator designs. IEEE Robotics and Automation Mag., 16(3):81–94, 2009.
[13] T. Hondo and I. Mizuuchi. Analysis of the 1-joint spring-motor coupling system and optimization criteria focusing on the velocity increasing effect. In IEEE ICRA, 2011.
[14] J. W. Hurst, J. E. Chestnutt, and A. A. Rizzi. The actuator with mechanically adjustable series compliance. IEEE Trans. on Robotics, 26(4):597–606, 2010.
[15] A. Jafari, N. G. Tsagarakis, B. Vanderborght, and D. G. Caldwell. A novel actuator with adjustable stiffness (AwAS). In IEEE/RSJ IROS, 2010.
[16] J. G. D. Karssen and M. Wisse. Running with improved disturbance rejection by using non-linear leg springs. Int. J. of Robotics Research, 30(13):1585–1595, 2011.
[17] W. Li and E. Todorov. Iterative linearization methods for approximately optimal control and estimation of non-linear stochastic system. Int. J. of Control, 80(9):1439–1453, 2007.
[18] A. De Luca and G. Oriolo. Trajectory planning and control for planar robots with passive last joint. Int. J. of Robotics Research, 21(5-6):575–590, 2002.
[19] K. M. Lynch, et al. Collision-free trajectory planning for a 3-DOF robot with a passive joint. Int. J. of Robotics Research, 19(12):1171–1184, 2000.
[20] D. Mitrovic, S. Klanke, R. Osu, M. Kawato, and S. Vijayakumar. A computational model of limb impedance control based on principles of internal model uncertainty. PLoS ONE, 2010.
[21] D. Mitrovic, S. Klanke, and S. Vijayakumar. Learning impedance control of antagonistic systems based on stochastic optimization principles. Int. J. of Robotics Research, 2010.
[22] Y. Nakamura, et al. Nonlinear behavior and control of a nonholonomic free-joint manipulator. IEEE Trans. on Rob. and Aut., 13(6):853–862, 1997.
[23] J. Nakanishi, T. Fukuda, and D. E. Koditschek. A brachiating robot controller. IEEE Trans. on Robotics and Automation, 16(2):109–123, 2000.
[24] J. Nakanishi, K. Rawlik, and S. Vijayakumar. Stiffness and temporal optimization in periodic movements: An optimal control approach. In IEEE/RSJ IROS, 2011.
[25] G. Palli, C. Melchiorri, and A. De Luca. On the feedback linearization of robots with variable joint stiffness. In IEEE ICRA, 2008.
[26] A. Radulescu, M. Howard, D. J. Braun, and S. Vijayakumar. Exploiting variable physical damping in rapid movement tasks. In IEEE/ASME AIM Conf., 2012.
[27] K. Rawlik, M. Toussaint, and S. Vijayakumar. An approximate inference approach to temporal optimization in optimal control. In NIPS, 2010.
[28] N. Rosa, A. Barber, R. D. Gregg, and K. Lynch. Stable open-loop brachiation on a vertical wall. In IEEE ICRA, 2012.
[29] F. Saito, T. Fukuda, and F. Arai. Swing and locomotion control for a two-link brachiation robot. IEEE Cont. Sys. Mag., 14(1):5–12, 1994.
[30] R. Shadmehr. Learning virtual equilibrium trajectories for control of a robot arm. Neural Computation, 2(4):436–446, 1990.
[31] M. W. Spong. Modeling and control of elastic joint robots. Trans. of ASME, J. of Dynamic Systems, Measurement, and Control, 109(6):310–319, 1987.
[32] M. W. Spong. The swing up control problem for the acrobot. IEEE Control Sys. Mag., 15(1):49–55, 1995.
[33] R. F. Stengel. Optimal Control and Estimation. 1994.
Inference on Networks of Mixtures for Robust Robot Mapping
Edwin Olson
Computer Science and Engineering, University of Michigan
2260 Hayward Street, Ann Arbor, Michigan
Email: [email protected]

Pratik Agarwal
Computer Science and Engineering, University of Michigan
2260 Hayward Street, Ann Arbor, Michigan
Email: [email protected]

Abstract— The central challenge in robotic mapping is obtaining reliable data associations (or "loop closures"): state-of-the-art inference algorithms can fail catastrophically if even one erroneous loop closure is incorporated into the map. Consequently, much work has been done to push error rates closer to zero. However, a long-lived or multi-robot system will still encounter errors, leading to system failure.
We propose a fundamentally different approach: allow richer error models that allow the probability of a failure to be explicitly modeled. In other words, we optimize the map while simultaneously determining which loop closures are correct from within a single integrated Bayesian framework. Unlike earlier multiple-hypothesis approaches, our approach avoids exponential memory complexity and is fast enough for real-time performance.
We show that the proposed method not only allows loop closing errors to be automatically identified, but also that in extreme cases, the "front-end" loop-validation systems can be unnecessary. We demonstrate our system both on standard benchmarks and on the real-world datasets that motivated this work.

Fig. 1. Recovering a map in the presence of erroneous loop closures. We evaluated the robustness of our method by adding erroneous loop closures to the Intel data set. The top row reflects the posterior map as computed by a state-of-the-art sparse Cholesky factorization method with 1, 10, and 100 bad loop closures. The bottom row shows the posterior map for the same data set using our proposed max-mixture method. While earlier methods produce maps with increasing global map deformation, our proposed method is essentially unaffected by the presence of the incorrect loop closures.

I. INTRODUCTION

Robot mapping problems are often formulated as an inference problem on a factor graph: variable nodes (representing the location of robots or other landmarks in the environment) are related through factor nodes, which encode geometric relationships between those nodes. Recent Simultaneous Localization and Mapping (SLAM) algorithms can rapidly find maximum likelihood solutions for maps, exploiting both fundamental improvements in the understanding of the structure of mapping problems [1], [2], [3], and the computational convenience afforded by assuming that error models are simple uni-modal Gaussian [4].

Despite their convenience, Gaussian error models often poorly approximate the truth. In the SLAM domain, perceptual aliasing can lead to incorrect loop closures, and the resulting error can lead to divergence of the map estimate. Similarly, the wheels of a robot may sometimes grip and sometimes slip, leading to a bi-modal motion model. Similar challenges arise throughout robotics, including sonar and radar (with multi-path effects), target-tracking (where multiple disjoint hypotheses may warrant consideration), etc.

In the specific case of SLAM, it has become standard practice to decompose the problem into two halves: a "front-end" and "back-end". The front-end is responsible for identifying and validating loop closures and constructing a factor graph; the back-end then performs inference (often maximum likelihood) on this factor graph. In most of the literature, it is assumed that the loop closures found by the front-end have noise that can be modeled as Gaussian.

For example, the front-end might assert that the robot is now at the same location that it was ten minutes ago (it has "closed a loop"), with an uncertainty of 1 meter. Suppose, however, that the robot was somewhere else entirely, a full 10 meters away. The back-end's role is to compute the maximum likelihood map, and an error of ten standard deviations is so profoundly unlikely that the back-end will almost certainly never recover the correct map: it is compelled to distort the map so as to make the erroneous loop closure more probable (see Fig. 1).

The conventional strategy is to build better front-end systems. Indeed, much effort has been devoted to creating better front-end systems [5], [6], [7], and these approaches have succeeded in vastly reducing the rate of errors. But
for systems that accumulate many robot-hours of operation, or robots operating in particularly challenging environments, even an extremely low error rate still results in errors. These errors lead to divergence of the map and failure of the system.

Our recent efforts at building a team of robots that can cooperatively explore and map an urban environment [8] illustrate the challenges, and motivated this work. At the time, we modeled the uncertainty of odometry and loop closing edges with simple Gaussians, but despite extensive validation of these edges prior to optimization, some of these edges had large errors that were virtually impossible given their noise model. Even with a novel interface allowing a human to help untangle the resulting map [8], errors were still evident (see Fig. 6). Our subsequent analysis revealed that odometry edges were often to blame. We had assumed a 15% noise model, but our robots, driving under autonomous control, would occasionally get caught on small, unsensed obstacles. As a result, the robot actually encountered 100% error, five standard deviations given our prior noise model. The resulting error in our position estimates exacerbated the perceptual aliasing problem: our incorrect position prior would argue against correct loop closure hypotheses, and would favor some incorrect hypotheses.

In this paper, we propose a novel approach that allows efficient maximum-likelihood inference on factor graph networks that contain arbitrarily complex probability distributions. This is in contrast to state-of-the-art factor graph based methods, which are limited to uni-modal Gaussian distributions, and which suffer from the real-world problems described above. Specifically, we propose a new type of mixture model, a max-mixture, which provides similar expressivity as a sum-mixture, but avoids the associated computational costs. With such a mixture, the "slip or grip" odometry problem can be modeled as a multi-modal distribution, and loop closures can be accompanied by a "null" hypothesis. In essence, the back-end optimization system serves as a part of the front-end, playing an important role in validating loop closures and preventing divergence of the map.

We will demonstrate our system on real data, showing that it can easily handle the error rates of current front-end data validation systems, allowing robust operation even when these systems produce poor output. We will also illustrate that, in extreme cases, no front-end loop validation is required at all: all candidate loop closures can simply be added to the factor graph, and our approach simultaneously produces a maximum likelihood map while identifying the set of edges that are correct. This is an interesting development, since it provides a fully integrated Bayesian treatment of both mapping and data association, tasks which are usually decoupled.

It has previously been shown that exact inference on even poly-trees of mixtures is NP-hard [9]. Our method avoids exponential complexity at the expense of guaranteed convergence to the maximum likelihood solution. In this paper, we explore the robustness of our method, and characterize the error rates that can be handled.

In short, the contributions of this paper are:

• We formulate a new mixture model that provides significant computational advantages over the more traditional sum-of-Gaussians mixtures, while retaining similar expressive power.
• We develop an algorithm for fast maximum-likelihood inference on factor graph networks containing these max-mixtures.
• We demonstrate how robot mapping systems can use these methods to robustly handle errors in odometry and loop-closing systems.
• We characterize the robustness of our method to local minima, identifying factors (like error rate and overall graph degree) and their impact.
• We evaluate our algorithm on real-world datasets to demonstrate its practical applicability both in terms of the quality of results and the computation time required.

II. RELATED WORK

We are not the first to consider estimation in the presence of non-Gaussian noise. Two well-known methods allow more complex error models to be used: particle filter methods and multiple hypothesis tracking (MHT) approaches.

Particle filters, perhaps best exemplified by FastSLAM [10], approximate arbitrary probability distributions through a finite number of samples. Particle filters attempt to explicitly (though non-parametrically) describe the posterior distribution. Unfortunately, the posterior grows in complexity over time, requiring an ever-increasing number of particles to maintain the quality of the posterior approximation. This growth quickly becomes untenable, forcing practical implementations to employ particle resampling techniques [11], [12], [13]. Unavoidably, resampling leads to a loss of information, since areas with low probability density are effectively truncated to zero. This loss of information can make it difficult to recover the correct solution, particularly after a protracted period of high uncertainty [14], [15].

Multiple Hypothesis Tracking approaches provide an alternative approach more closely related to mixture models [16]. These explicitly represent the posterior using an ensemble of Gaussians that collectively encode a mixture. However, the number of potential hypotheses tends to grow exponentially, forcing pruning of hypotheses and thus information loss. A similar problem occurs when explicitly performing inference on mixtures; Whyte et al. approximate terrain using mixtures [17] but must resample the terrain in order to prevent an intractable number of mixture components. In both cases, information is lost that can result in lower-quality solutions.

In the special case where errors are modeled as uni-modal Gaussians, the maximum likelihood solution of the factor graph network can be found using non-linear least squares. Beginning with the observation that the information matrix is sparse [18], [19], [20], efforts to exploit that sparsity resulted in rapid improvements to map inference by leveraging sparse factorization and good variable-ordering heuristics [21], [22], [23]. While the fastest of these methods generally provide only maximum-likelihood inference (a shortcoming shared by our proposed method), approximate marginal estimation methods are fast and easy to implement [24], [25]. It is highly desirable for new methods to be able to leverage the same insights that made these approaches so effective.

One method similar to our own explicitly introduces switching variables whose value determines whether or not a loop closure is accepted [26]. This work is notable for being the first to propose a practical way of dealing with front-end errors during inference. In comparison to our approach, the activation/deactivation of a loop closure is penalized through a separate cost function (as opposed to being integrated into a mixture model), they must assign initial values to these variables (as opposed to our implicit inference over the latent variables), and are limited to either "on" or "off" (as opposed to being able to model mixture models with multiple distinct modes).

Robustified cost functions [27] provide resilience to errors by reducing the cost associated with outliers, and have been widely used in the vision community [28], [29]. We show that robustified cost functions are subsumed by our mixture approach.

[Figure: panels labeled "Original bi-modal mixture", "Max-mixture", and "Sum-mixture".]

Fig. 2. Mixture Overview. Given two mixture components (top left), the max- and sum-mixtures produce different distributions. In both cases, arbitrary distributions can be approximated. A robustified cost function (in this case a corrupted Gaussian, bottom) can be constructed from two Gaussian components with equal means but different variances.

Our proposed method avoids the exponential growth in memory requirements of particle filter and MHT approaches by avoiding an explicit representation of the posterior density. Instead, like other methods based on sparse factorization, our method extracts a maximum likelihood estimate. Critically, while the memory cost of representing the posterior distribution grows exponentially, the cost of storing the underlying factor graph network (which implicitly encodes the posterior) grows only linearly with the size of the network. In other words, our method (which only stores the factor graph) can recover solutions that would have been culled by particle and MHT approaches. In addition, our approach benefits from the same sparsity and variable-ordering insights that have recently benefited uni-modal approaches.

III. APPROACH

Our goal is to infer the posterior distribution of the state variable (or map) x, which can be written in terms of the factor potentials in the factor graph. The probability is conditioned on sensor observations z; with an application of Bayes' rule and by assuming an uninformative prior p(x), we obtain:

    p(x|z) ∝ ∏_i p(z_i | x)    (1)

Prior to this work, it is generally assumed that the factor potentials p(z_i|x) can be written as Gaussians:

    p(z_i | x) = (2π)^{-n/2} |Λ_i^{-1}|^{-1/2} exp( -(1/2) (f_i(x) - z_i)^T Λ_i (f_i(x) - z_i) )    (2)

Further, while f_i(x) is generally non-linear, it is assumed that it can be approximated using a first-order Taylor expansion such that f_i(x) ≈ J_i Δx + f_i(x_0).

The posterior maximum likelihood value can be easily solved in such cases by taking the logarithm of Eqn. 1, differentiating with respect to x, then solving for x. This classic least-squares approach leads to a simple linear system of the form Ax = b. Critically, this is possible because the logarithm operator can be "pushed" inside the product in Eqn. 1, reducing the product of N terms into a sum of N simple quadratic terms. No logarithms or exponentials remain, making the resulting expression easy to solve.

We might now consider a more complicated function p(z_i|x), such as a sum-mixture of Gaussians:

    p(z_i | x) = Σ_j w_j N(μ_j, Λ_j^{-1})    (3)

In this case, each N(μ_j, Λ_j^{-1}) represents a different Gaussian distribution. Such a sum-mixture allows great flexibility in specifying the distribution p(z_i|x). For example, we can encode a robustified cost function by using two components with the same mean, but with different variances. Or, we can encode a bi-modal distribution.

The problem with a sum-mixture is that the maximum likelihood solution is no longer simple: the logarithm can no longer be pushed all the way into the individual Gaussian components: the summation in Eqn. 3 prevents it. As a result, the introduction of a sum-mixture means that it is no longer possible to derive a simple solution for x.

A. Max-Mixtures

Our solution to this problem is a new mixture model type, one based on a max operator rather than a sum:

    p(z_i | x) = max_j w_j N(μ_j, Λ_j^{-1})    (4)

While the change is relatively minor, the implications to optimization are profound. The logarithm can be pushed inside a max mixture: the max operator acts as a selector, returning a single well-behaved Gaussian component. The optimization process will be more thoroughly described in the following section.
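To make the selector idea concrete, here is a small sketch (our own illustration, not the authors' implementation) of evaluating a max-mixture: every component is scored in log space, and only the single best component is kept, so no sum of exponentials is ever formed and the log-likelihood stays a simple quadratic in the residual.

```python
import numpy as np

def component_log_likelihood(x, w, mu, Lam):
    """Log of w * N(mu, Lam^-1) at x, where Lam is the information matrix."""
    r = np.asarray(x) - np.asarray(mu)
    n = r.size
    log_norm = 0.5 * np.log(np.linalg.det(Lam)) - 0.5 * n * np.log(2.0 * np.pi)
    return np.log(w) + log_norm - 0.5 * float(r @ Lam @ r)

def max_mixture_log_likelihood(x, components):
    """log max_j w_j N(mu_j, Lam_j^-1): score every component, keep the best.

    Returns (log-likelihood, index of the selected component)."""
    scores = [component_log_likelihood(x, w, mu, Lam)
              for (w, mu, Lam) in components]
    j = int(np.argmax(scores))
    return scores[j], j

# Example: a confident component plus a broad, low-weight one (1-D).
components = [
    (1.0 - 1e-5, [0.0], np.array([[1.0]])),   # tight: sigma = 1
    (1e-5,       [0.0], np.array([[1e-4]])),  # broad: sigma = 100
]
```

A residual near zero selects the tight component, while a ten-sigma residual selects the broad one, whose slowly decaying cost keeps a bad measurement from dominating the solution; any downstream optimization then only needs the derivatives of the selected component.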
A max mixture has much of the same character as a sum mixture and retains a similar expressivity: multi-modal distributions and robustified distributions can be similarly represented (see Fig. 2). Note, however, that when fitting a mixture to a desired probability distribution, different components will result for sum- and max-mixtures. Assuring that the distributions integrate to one is also handled differently: for a sum mixture, Σ_j w_j = 1 is a necessary and sufficient condition; for a max mixture, proper normalization is generally more difficult to guarantee. Usefully, for maximum likelihood inference, it is inconsequential whether the distribution integrates to 1. Specifically, suppose that some normalization factor γ is required in order to ensure that a max mixture integrates to one. Since γ is used to scale the distribution, the log of the resulting max mixture is simply the log of the un-normalized distribution plus a constant. The addition of such a constant does not change the solution of a maximum-likelihood problem, and thus it is unnecessary for our purposes to compute γ.

B. Cholesky-MM

We now show how max mixture distributions can be incorporated into existing graph optimization frameworks. The principal step in such a framework is to compute the Jacobian, residual, and information matrix for each factor potential. As we described previously, these are trivial to compute for a uni-modal Gaussian distribution.

For the max-mixture case, it might seem that computing the needed derivatives for the Jacobian is difficult: the max-mixture is not actually differentiable at the points where the maximum-likelihood component changes. While this makes it difficult to write a closed-form expression for the derivatives, they are nonetheless easy to compute.

The key observation is that the max operator serves as a selector: once the mixture component with the maximum likelihood is known, the behavior of the other mixture components is irrelevant. In other words, the solver simply iterates over each of the components, identifies the most probable, then returns the Jacobian, residual, and information matrix for that component scaled according to the weight w_j of that component. If the likelihoods of two components are tied (an event which corresponds to evaluating the Jacobian at a non-differentiable point), we pick one of the components arbitrarily. However, these boundary regions comprise an area with zero probability mass.

The resulting Jacobians, residuals, and information matrices are combined into a large least-squares problem which we subsequently solve with a minimum-degree variable ordering heuristic followed by sparse Cholesky factorization, in a manner similar to that described by [3]. With our modifications to handle max-mixtures, we call our system Cholesky-MM.

It is often necessary to iterate the full least-squares prob-

Even in the non-mixture case, this sort of non-linear optimization cannot guarantee convergence to the global optimal solution. It is useful to think of a given inference problem as having a "basin of convergence", a region that contains all the initial values of x that would ultimately converge to the global optimal solution. For most well-behaved problems with simple Gaussian distributions, the basin of convergence is large. Divergence occurs when the linearization error is so severe that the gradient points in the wrong direction.

The posterior distribution for a network with N mixtures, each with c components, is a mixture with as many as c^N components. In the worst case, these could be non-overlapping, resulting in c^N local minima. The global optimal solution still has a basin of convergence: if our initial solution is "close" to the optimal solution, our algorithm will converge. But if the basin of convergence is extremely small, then the practical utility of our algorithm will be limited.

In other words, the key question to be answered about our approach is whether the basin of convergence is usefully large. Naturally, the size of this basin is strongly affected by the properties of the problem and the robustness of the algorithm used to search for a solution. One of the main results of this paper is to show that our approach yields a large basin of convergence for a wide range of useful robotics problems.

IV. APPLICATIONS AND EVALUATION

In this section, we show how our approach can be applied to several real-world problems. We include quantitative evaluations of the performance of our algorithm, as well as characterize its robustness and runtime performance.

A. Uncertain loop closures

We first consider the case where we have a front-end that produces loop closures with a relatively low, but non-zero, error rate. For each uncertain loop closure, we introduce a max-mixture consisting of two components: 1) the front-end's loop closure and 2) a null hypothesis. The null hypothesis, representing the case when the loop closure is wrong, is implemented using a mixture component with a very large covariance. Weights are assigned to these two components in accordance with the error rate of the front-end.

In practice, the behavior of the algorithm is not particularly sensitive to the weights associated with the null hypotheses. The main benefit of our approach arises from having a larger probability associated with incorrect loop closures, as opposed to the exponentially-fast decreasing probability specified by the loop closer's Gaussian. Even if the null hypothesis has a very low weight (we used 10^-5), it will provide a sufficiently plausible explanation of the data to prevent a radical distortion of the graph. Second, once the
lem several times. Each time, the best component in each in a very weak gradient for the edge. As a result, the edge
max-mixture is re-evaluated: in essence, as the optimization plays virtually no role in subsequent optimization. We set
proceeds, we dynamically select the best mixture component the mean of the null hypothesis equal to that of the front-
as an integral part of the optimization process. end’s hypothesis so that the small amount of gradient that
remains produces a slight bias back towards the front-end's hypothesis. If subsequent observations re-affirm the front-end's hypothesis, it can still become active in the future. Unlike particle filter or MHT methods, which must eventually truncate unlikely events, no information is lost.

A two-component mixture model in which both components have identical means but different variances can be viewed as a robustified cost function. In particular, parameters can be chosen so that a two-component max mixture closely approximates a corrupted Gaussian [27] (see Fig. 2).

To evaluate the performance of our approach, we added randomly-generated loop closures to two standard benchmark datasets: the 3500-node Manhattan set [25] and the Intel data set. These were processed in an online fashion, adding one pose and its edges (both correct and incorrect). This mimics real-world operation better than a batch approach, and is more challenging due to the fact that it is easier to become caught in a local minimum since fewer edges are available to guide the optimization towards the global optimum.

For a given number of randomly-generated edges, we compute the posterior map generated by our method and a standard non-mixture method, using a laser-odometry solution as the initial state estimate. The mean-squared error of this map is compared to the global optimal solution [30], and listed in Fig. 3.

Our proposed method achieves dramatically lower mean-squared errors than standard non-mixture versions. While the true positive rate is nearly perfect in both experiments, some randomly-generated edges (labeled false positives) are accepted by our system. However, since the false positives are randomly generated, some of them (by chance) are actually close to the truth. Such "accidentally correct" edges should be accepted by our algorithm.¹

¹We favor generating "false positives" in a purely random way, even though it leads to "accidentally" correct edges. Any filtering operation to reject these low-error edges would introduce a free parameter (the error threshold) whose value could be tuned to favor the algorithm.

We can evaluate the quality of the accepted edges by comparing the error distribution of the true positives and false positives (see Fig. 4). As the histogram indicates, the error distribution is similar, though the error distribution for the false positives is slightly worse than for the true positives. Still, no extreme outliers (the type that cause divergence of the map) are accepted by our method.

Fig. 4. Error distribution for true and false positives. Our method accepts some randomly-generated "false positives", but an analysis of the error of those edges indicates that they (left) are only slightly worse than the error of true edges (right). (Axes: total number of false/true positives vs. difference between true distance and edge length, in meters.)

B. Multi-modal distributions

In the previous section, we demonstrated that our method can be used to encode null hypotheses, or equivalently, implement robustified cost functions, a capability similar to earlier work [26]. In that case, the probability distributions being approximated by each mixture have only a single maximum. Our use of a mixture model, however, also allows multi-modal distributions to be encoded. The ability to directly represent multi-modal distributions is a feature of our approach.

For example, one of the original motivating problems of this work was dealing with the "slip or grip" problem: the case where a robot's wheels occasionally slip catastrophically, resulting in near-zero motion. With a typical odometry noise model of 10-20%, such an outcome would wreak havoc on the posterior map.

Our approach to the "slip or grip" problem is to use a two-component mixture model: one component (with a large weight) corresponds to the usual 15% noise model, while the second component (with a low weight) is centered around zero. No changes to our optimization algorithm are required to handle such a case. However, since the distribution now has multiple local maxima, it poses a greater challenge in terms of robustness.

Of course, without some independent source of information that contradicts the odometry data, there is no way to determine that the wheels were slipping. To provide this independent information, we used a state-of-the-art scan matching system [31] to generate loop closures. We manually induced wheel slippage by pulling on the robot. Despite the good loop closures, standard methods are unable to recover the correct map. In contrast, our method determines that

Fig. 5. Slip or Grip Example. We evaluate the ability of our algorithm to recover a good map in the presence of catastrophically slipping wheels. In this case, the robot is obtaining loop closures using a conventional laser scanner front-end. These loop closures are of high quality, but the odometry edges still cause significant map distortions when using standard methods (left). When a small probability is added to account for slippage, our mixture approach recovers a much improved map (right).
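The slip-or-grip construction described above can be made concrete with a short sketch. This is an illustrative 1-D reconstruction, not the authors' implementation; the component weights and the 15% noise figure are hypothetical parameters standing in for the ones named in the text.

```python
import numpy as np

def slip_or_grip_cost(measured_motion):
    """Two-component max-mixture odometry edge (1-D for clarity):
    a high-weight 'grip' component around the measured motion with
    ~15% noise, and a low-weight 'slip' component centered at zero."""
    components = [
        (0.95, measured_motion, 0.15 * abs(measured_motion)),  # grip
        (0.05, 0.0, 0.01),                                     # slip
    ]
    def nll(true_motion):
        # max mixture: keep only the cost of the most likely component
        return min(-np.log(w) + np.log(s) + 0.5 * ((true_motion - m) / s) ** 2
                   for w, m, s in components)
    return nll

cost = slip_or_grip_cost(1.0)  # odometry reports 1 m of travel
print(cost(1.0) < cost(0.5))   # True: normal motion, 'grip' explains it
print(cost(0.0) < cost(0.5))   # True: wheels slipped, 'slip' mode is cheaper
```

The resulting cost has two local minima, one near the measured motion and one near zero, which is exactly the multi-modality (and the attendant robustness challenge) discussed above.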
Manhattan Data Set

True Loop Edges  False Edges  True Positive  False Positive  Average FP Error  MSE (Our method)  MSE (Non-mixture)
2099             0            2099           0               NaN               0.6726            0.6726
2099             10           2099           0               NaN               0.6713            525.27
2099             100          2099           1               0.0208            0.6850            839.39
2099             200          2099           2               0.5001            0.6861            874.67
2099             500          2099           3               0.6477            0.6997            888.82
2099             1000         2099           10              0.7155            0.7195            893.98
2099             2000         2099           22              0.5947            0.7151            892.54
2099             3000         2099           36              0.5821            0.7316            896.01
2099             4000         2099           51              0.6155            0.8317            896.05

Intel Data Set

True Loop Edges  False Edges  True Positive  False Positive  Average FP Error  MSE (Our method)  MSE (Non-mixture)
14731            0            14731          0               NaN               7.122x10^-10      1.55x10^-9
14731            10           14731          0               NaN               7.123x10^-10      0.044
14731            100          14731          2               0.1769            4.431x10^-6       2.919
14731            200          14731          9               0.1960            5.583x10^-6       8.810
14731            500          14731          19              0.1676            1.256x10^-5       34.49
14731            1000         14731          29              0.1851            5.840x10^-5       71.86
14731            2000         14731          64              0.1937            2.543x10^-4       86.37
14731            3000         14731          103             0.1896            3.307x10^-4       91.04
14731            4000         14217          146             0.1699            0.014             95.36

Fig. 3. Null-hypothesis robustness. We evaluate the robustness of our method and a standard Gaussian method to the presence of randomly-generated edges. As the number of randomly-generated edges increases, the mean squared error (MSE) of standard approaches rapidly degenerates; our proposed method produces good maps even when the number of randomly-generated edges is large in comparison to the number of true edges. Our approach does accept some randomly-generated edges (labeled "false positives" above), however the error of these accepted edges is comparable to that of the true positives. In each case, the initial state estimate is that from the open-loop odometry.
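The rejection of the bad edges tallied above comes from the null-hypothesis component winning the max. The selection rule can be sketched in a few lines; this is an illustrative Python reconstruction, not the authors' Cholesky-MM code, and all numeric parameters are hypothetical.

```python
import numpy as np

def select_component(residuals, infos, weights):
    """Max-mixture selection: return the index of the component with the
    highest un-normalized log-likelihood,
        log w_j + 0.5 log |Lambda_j| - 0.5 r_j^T Lambda_j r_j.
    The solver then uses only that component's Jacobian, residual, and
    information matrix; the other components are ignored."""
    best_j, best_ll = 0, -np.inf
    for j, (r, info, w) in enumerate(zip(residuals, infos, weights)):
        ll = np.log(w) + 0.5 * np.log(np.linalg.det(info)) - 0.5 * r @ info @ r
        if ll > best_ll:
            best_j, best_ll = j, ll
    return best_j

# Two-component edge: a confident loop closure vs. a wide null hypothesis.
infos = [np.eye(2) * 100.0,   # loop-closure component (small covariance)
         np.eye(2) * 1e-6]    # null hypothesis (very large covariance)
weights = [1.0 - 1e-5, 1e-5]

r_small = np.array([0.1, 0.0])   # residual consistent with the closure
r_large = np.array([5.0, 0.0])   # residual that contradicts the closure
print(select_component([r_small, r_small], infos, weights))  # 0: closure wins
print(select_component([r_large, r_large], infos, weights))  # 1: null wins
```

With a tight loop-closure component and a wide null hypothesis, a contradictory residual makes the low-weight null component the most probable, so the edge contributes almost nothing to the subsequent optimization.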
the "slip" mode is more likely than the "grip" mode, and recovers the correct map (see Fig. 5).

As part of our earlier multi-robot mapping work [8], we employed a team of 14 robots to explore a large urban environment. Wheel slippage contributed to a poor map in two ways: 1) the erroneous factor potentials themselves, and 2) the inability to identify good loop closures due to a low quality motion estimate. By using a better odometry model, our online system produced a significantly improved map (see Fig. 6).

Fig. 6. Online results using odometry mixture model. The left figure shows a map of a 30m × 25m area in which our multi-robot urban mapping team produced a poor map due to wheel slippage and the ensuing inability to find loop-closures. With our odometry mixture model (right), the wheel slippage is (implicitly) detected, and we find additional loop closures. The result is a significantly improved map.

C. Eliminating the Front End

In current approaches, front-end systems are typically responsible for validating loop closures prior to adding them to the factor graph network. However, if the back-end can recover from errors, is it possible to omit the filtering entirely?

In certain cases, our inference method can eliminate the need for loop validation by the front-end. This is desirable from a conceptual standpoint: in principle, a better map should result from handling loop closing and mapping from within an integrated Bayesian framework. The conventional decoupling of mapping into a front-end and back-end, while practical, prevents a fully Bayesian solution.

We evaluated this possibility using the Intel data set. At every pose, a laser scan matcher attempts a match to every previous pose. The top k matches (as measured by overlap of the two scans) are formed into a mixture containing k + 1 components. (The extra component remains a null hypothesis to handle the case where all k matches are incorrect.) To push the experiment as far as possible, no position information was used to prune the set of k matches. Larger values of k provide robustness against perceptual aliasing, since it increases the likelihood that the correct match is present somewhere within the set of k components.

Note that this approach is significantly different than adding k conventional edges to the graph. When edges are added "in parallel", the posterior reflects a weighted average of all of them. In our mixture formulation, only the selected mode has an effect on the solution: the others are effectively "switched off".

An example of one mixture with k = 4 putative matches is shown in Fig. 7. The weight of the components is set in proportion to the score of the scan matcher.

Running our system in an online fashion, we obtain the final map shown in Fig. 8. Online operation is more difficult than batch operation, since there is less information available early on to correct erroneous edges. Our system recovers a consistent global map despite the lack of any front-end loop validation.

The quality of the open-loop trajectory estimate plays an important role in determining whether the initial state estimate is within the basin of convergence. In this case, our open-loop trajectory estimate is fairly good, and our method is able to infer the correct mode for each mixture despite the
lack of any front-end loop validation.

The robustness of our method is amplified by better front-end systems: with better quality loop closures, the basin of convergence is enlarged, allowing good maps to be computed even when the open-loop trajectory is poor.

Fig. 7. Data-association as a mixture. Given a query pose (R), we perform a brute-force scan matching operation to all previous poses. The best 4 scan match results, based on overlap, are added to a max-mixture model that also includes a null hypothesis. The positions of the best matches are shown as circles numbered (1-4), and the corresponding scan matches are shown at the bottom. The similarity in appearance represents a significant degree of perceptual aliasing. The scan matcher finds two correct matches and two incorrect matches. The two correct matches are the matches numbered 1 and 2 at the bottom of the map and the first two scan alignments.

Fig. 8. Intel without front-end loop validation. Our system can identify correct loop closures and compute a posterior map from within a single integrated Bayesian framework (right); the typical front-end loop validation has been replaced with a mixture of k + 1 components containing the k best laser scan matches (based purely on overlap) plus a null hypothesis. In this experiment, we used k = 5. For reference, the open-loop trajectory of the robot is given on the left.

D. Robustness

We have identified two basic factors that have a significant influence on the success of our method: the number of incorrect loop closures and the node degree of the graph. The node degree is an important factor because it determines how over-determined the system is: it determines the degree to which correct edges can "overpower" incorrect edges.

To illustrate the relationship between these factors and the resulting quality of the map (measured in terms of mean squared error), we considered a range of loop-closing error rates (ranging from 0% to 100%) for graphs with an average node degree of 4, 8, and 12. Note that an error rate of 80% means that incorrect loop closures outnumber correct loop closures by a ratio of 4:1. In each case, the vehicle's noisy odometry is also provided. For each condition, we evaluate the performance of our method on 100,000 randomly-generated Manhattan-world graphs (see Fig. 9). Our method produces good maps even when the error rate is very high, and the performance improves further with increasing node degree. In contrast, a standard non-mixture approach diverges almost immediately.

Fig. 9. Effect of error rate and node degree on robustness. We evaluate the quality of posterior maps (in terms of mean squared error) as a function of the percentage of bad edges and the node degree of the graph. Each data point represents the average of 3,000 random trials; the standard error is also plotted, showing that the results are significant. The quality of the posterior graph is robust to even high levels of error, and is improved further by problems with a high node degree. Our methods, regardless of settings, dramatically out-perform non-mixture methods (black dotted line). (Axes: MSE error vs. error percentage in loop closure edges; curves for node degree 4, 8, and 12, plus a non-mixture baseline.)

E. Runtime Performance

The performance of our method is comparable to existing state-of-the-art sparse factorization methods (see Fig. 10). It takes additional time to identify the maximum likelihood mode for each mixture, but this cost is minor in comparison to the cost of solving the resulting linear system.

Fig. 10. Runtime performance. Using the Intel dataset, we plot the time required to compute a posterior map after every pose, using a batch solver. Our Intel dataset contains 875 nodes and 15605 edges, and each edge is modeled as a two-component max-mixture with a null hypothesis. The additional cost of handling mixtures is quite small in comparison to the total computation time. (Axes: processing time in seconds vs. node processed; curves for Cholesky-MM and normal Cholesky.)
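The interaction between per-iteration mode selection and the linear solve can be sketched as a Gauss-Newton loop on a scalar state. This is an illustrative reconstruction under simplified 1-D assumptions, not the Cholesky-MM implementation; it shows why re-selecting the dominant component of each edge is a cheap linear pass compared to the factorization.

```python
import numpy as np

def refine(x0, edges, iters=10):
    """Gauss-Newton on a scalar state x with max-mixture measurement edges.
    Each edge is a list of (weight, mean, info) components; the dominant
    component is re-selected before every linearization."""
    x = x0
    for _ in range(iters):
        H, b = 0.0, 0.0
        for components in edges:
            # mode selection: a cheap pass over the components of this edge
            w, m, info = max(components,
                             key=lambda c: np.log(c[0]) + 0.5 * np.log(c[2])
                             - 0.5 * c[2] * (x - c[1]) ** 2)
            H += info             # scalar analog of J^T Lambda J (J = -1)
            b += info * (m - x)   # scalar analog of -J^T Lambda r
        x += b / H                # stand-in for the sparse Cholesky solve
    return x

# Two consistent measurements plus one outlier paired with a wide null
# hypothesis; the outlier's null mode dominates and x converges to ~2.05.
edges = [[(1.0, 2.0, 100.0)],
         [(1.0, 2.1, 100.0)],
         [(1 - 1e-5, 9.0, 100.0), (1e-5, 9.0, 1e-6)]]
print(round(refine(0.0, edges), 2))  # 2.05
```

Here the null mode of the outlier edge is selected throughout, so the estimate converges to roughly the mean of the two consistent measurements, mirroring the behavior of the full system.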
V. CONCLUSION

We have described a method for performing inference on networks of mixtures, describing an application of our method to robot mapping. Our method consists of a novel mixture model based on a "max" operator that makes the computation of the Jacobian and residual fast, and we show how existing sparse factorization methods can be extended to incorporate these mixtures. We believe that such an approach is necessary for long-lived systems, since any system that relies on a zero error rate will fail.

We demonstrate how the mixture model allows null hypotheses and robustified cost functions to be incorporated into a maximum likelihood inference system. We show that our system is robust to error rates far in excess of what can be achieved with existing front-end loop validation methods. We further demonstrate a multi-modal formulation, addressing the "slip or grip" problem and showing that our system can make loop validation unnecessary in some cases.

Our algorithm cannot guarantee convergence to the global optimum, but we characterized the basin of convergence, demonstrating the relationship between error rate, node degree, and convergence to a good solution.

Finally, we demonstrate that the performance of our algorithm is similar to that of existing state-of-the-art maximum likelihood systems, and can be easily integrated into state-of-the-art methods [22].

REFERENCES

[1] P. M. Newman, "On the structure and solution of the simultaneous localisation and map building problem," Ph.D. dissertation, University of Sydney, Australia, 1999.
[2] U. Frese, "A proof for the approximate sparsity of SLAM information matrices," in Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), Barcelona, Spain, April 2005, pp. 331–337.
[3] F. Dellaert, "Square root SAM," in Proceedings of Robotics: Science and Systems (RSS), Cambridge, USA, June 2005.
[4] R. Smith, M. Self, and P. Cheeseman, "A stochastic map for uncertain spatial relationships," in Proceedings of the International Symposium of Robotics Research (ISRR), O. Faugeras and G. Giralt, Eds., 1988, pp. 467–474.
[5] J. Neira and J. D. Tardos, "Data association in stochastic mapping using the joint compatibility test," IEEE Transactions on Robotics and Automation, vol. 17, no. 6, pp. 890–897, December 2001.
[6] T. Bailey, "Mobile robot localisation and mapping in extensive outdoor environments," Ph.D. dissertation, Australian Centre for Field Robotics, University of Sydney, August 2002.
[7] E. Olson, "Recognizing places using spectrally clustered local matches," Robotics and Autonomous Systems, 2009.
[8] E. Olson, J. Strom, R. Morton, A. Richardson, P. Ranganathan, R. Goeddel, M. Bulic, J. Crossman, and B. Marinier, "Progress towards multi-robot reconnaissance and the MAGIC 2010 competition," Journal of Field Robotics, September 2012.
[9] U. Lerner and R. Parr, "Inference in hybrid networks: Theoretical limits and practical algorithms," in Proceedings of the Seventeenth Annual Conference on Uncertainty in Artificial Intelligence (UAI-01), San Francisco, CA: Morgan Kaufmann, 2001, pp. 310–331.
[10] M. Montemerlo, "FastSLAM: A factored solution to the simultaneous localization and mapping problem with unknown data association," Ph.D. dissertation, Robotics Institute, Carnegie Mellon University, Pittsburgh, PA, July 2003.
[11] D. Hähnel, W. Burgard, D. Fox, and S. Thrun, "A highly efficient FastSLAM algorithm for generating cyclic maps of large-scale environments from raw laser range measurements," in Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2003, pp. 206–211.
[12] N. Kwak, I.-K. Kim, H.-C. Lee, and B. H. Lee, "Analysis of resampling process for the particle depletion problem in FastSLAM," in IEEE International Symposium on Robots and Human Interactive Communications (RO-MAN), 2007, pp. 200–205.
[13] C. Stachniss, G. Grisetti, and W. Burgard, "Recovering particle diversity in a Rao-Blackwellized particle filter for SLAM after actively closing loops," in Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), Barcelona, Spain, 2005, pp. 667–672.
[14] T. Bailey, J. Nieto, and E. Nebot, "Consistency of the FastSLAM algorithm," in Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), IEEE, 2006, pp. 424–429.
[15] G. Grisetti, C. Stachniss, and W. Burgard, "Improving grid-based SLAM with Rao-Blackwellized particle filters by adaptive proposals and selective resampling," in Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), Barcelona, April 2005, pp. 2432–2437.
[16] S. S. Blackman, "Multiple hypothesis tracking for multiple target tracking," IEEE Aerospace and Electronic Systems Magazine, vol. 19, pp. 5–18, 2004.
[17] H. Durrant-Whyte, S. Majumder, S. Thrun, M. de Battista, and S. Scheding, "A Bayesian algorithm for simultaneous localisation and map building," in Robotics Research, ser. Springer Tracts in Advanced Robotics, R. Jarvis and A. Zelinsky, Eds., Springer Berlin/Heidelberg, 2003, vol. 6, pp. 49–60.
[18] S. Thrun and Y. Liu, "Multi-robot SLAM with sparse extended information filters," in Proceedings of the International Symposium of Robotics Research (ISRR), Sienna, Italy, 2003.
[19] M. Walter, R. Eustice, and J. Leonard, "A provably consistent method for imposing exact sparsity in feature-based SLAM information filters," in Proceedings of the International Symposium of Robotics Research (ISRR), S. Thrun, R. Brooks, and H. Durrant-Whyte, Eds., San Francisco, CA: Springer, October 2005, pp. 214–234.
[20] R. Eustice, H. Singh, J. Leonard, and M. Walter, "Visually mapping the RMS Titanic: Conservative covariance estimates for SLAM information filters," International Journal of Robotics Research, vol. 25, no. 12, pp. 1223–1242, December 2006.
[21] F. Dellaert and M. Kaess, "Square root SAM: Simultaneous localization and mapping via square root information smoothing," International Journal of Robotics Research, vol. 25, no. 12, pp. 1181–1203, December 2006.
[22] M. Kaess, A. Ranganathan, and F. Dellaert, "iSAM: Incremental smoothing and mapping," IEEE Transactions on Robotics, vol. 24, no. 6, pp. 1365–1378, 2008.
[23] K. Konolige, "Sparse sparse bundle adjustment," in British Machine Vision Conference, Aberystwyth, Wales, August 2010.
[24] M. Bosse, P. Newman, J. Leonard, and S. Teller, "Simultaneous localization and map building in large-scale cyclic environments using the Atlas framework," International Journal of Robotics Research, vol. 23, no. 12, pp. 1113–1139, December 2004.
[25] E. Olson, "Robust and efficient robotic mapping," Ph.D. dissertation, Massachusetts Institute of Technology, Cambridge, MA, USA, June 2008.
[26] N. Sünderhauf and P. Protzel, "Towards a robust back-end for pose graph SLAM," in Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), 2012.
[27] R. Hartley and A. Zisserman, Multiple View Geometry in Computer Vision, 2nd ed., Cambridge University Press, 2004.
[28] H. Strasdat, J. Montiel, and A. Davison, "Scale drift-aware large scale monocular SLAM," in Proceedings of Robotics: Science and Systems (RSS), vol. 2, no. 3, p. 5, 2010.
[29] G. Sibley, C. Mei, I. Reid, and P. Newman, "Adaptive relative bundle adjustment," in Proceedings of Robotics: Science and Systems (RSS), Seattle, USA, June 2009.
[30] E. Olson, "Evaluating back-ends: Metrics," in Automated SLAM Evaluation Workshop, Robotics: Science and Systems, Los Angeles, USA, 2011.
[31] E. Olson, "Real-time correlative scan matching," in Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), Kobe, Japan, June 2009.
Turning-Rate Selective Control: A New Method for Independent Control of Stress-Engineered MEMS Microrobots

Igor Paprotny∗†∗∗††, Christopher G. Levey‡, Paul K. Wright†§ and Bruce R. Donald¶‖∗∗
∗ Berkeley Sensor & Actuator Center (BSAC), University of California, Berkeley, CA, USA.
† Center for Information Technology Research in the Interest of Society (CITRIS), University of California, Berkeley, CA, USA.
‡ Thayer School of Engineering, Dartmouth College, Hanover, NH, USA.
§ Department of Mechanical Engineering, University of California, Berkeley, CA, USA.
¶ Department of Computer Science, Duke University, Durham, NC, USA.
‖ Department of Biochemistry, School of Medicine, Duke University Medical Center, Durham, NC, USA.
∗∗ Duke Institute for Brain Sciences, Duke University Medical Center, Durham, NC, USA.
†† Email: [email protected]
Abstract—We present a novel method for independently controlling multiple stress-engineered MEMS microrobots called MicroStressBots through a single, global, control signal. We call this technique Turning-rate Selective Control (TSC). To implement TSC, we fabricated MicroStressBots that exhibit different turning rates. TSC specifically exploits these designed variations in turning rates between individual microrobots to differentiate their motion, and thereby obtain individual control. Thus, even though all robots move simultaneously, and are identical except for their turning rates, TSC can individually and independently position the robots' centers of rotation within a planar configuration space. This allows the individual robots to be independently maneuverable to within a distance r from an arbitrary desired goal point in the plane. The distance r is the turning radius (approximately half of a microrobot width). We describe the theory behind TSC and, by using fabricated microrobots, show experimental results that confirm the feasibility of TSC for controlling multiple MicroStressBots through a single, global, control signal. Our paper further validates the multi-microrobot control paradigm called Global Control Selective Response that we first introduced in 2007. We conclude by discussing how TSC can extend the maximum number of independently controllable MicroStressBots.

I. INTRODUCTION

The last decade has seen the development of several novel microscale mobile robotic systems, such as electrostatically-driven stress-engineered MEMS microrobots (MicroStressBots) [6], resonating stepping robots [10], stick-slip magnetic walkers [9], and microscrew-based swimmers [11]. Virtually all envisioned microrobotic applications (for example, neurological [1]) rely on the combined actions of many microrobots. However, control of many microrobots through a single global control signal has been demonstrated by only a few groups [7, 4]. Simultaneous control of several microrobots is significantly more challenging than control of single microrobots because one must overcome the high level of underactuation present in such systems. The high level of underactuation is a result of the limited ability of the microrobots to decode a global control signal.

Donald et al. [8] analyzed the control voltage complexity of stress-engineered MEMS microrobots, defined as the number of distinct voltage levels of the control waveform required to independently maneuver n robots. It was shown that MicroStressBots can exhibit sub-linear control voltage complexity (O(√n)) if their steering arms are designed to have Symmetric Electromechanically Saturated (SESat) hysteresis gaps. In this work, we report on a new method to independently maneuver (independently control) MicroStressBots on a planar substrate. Called Turning-rate Selective Control (TSC), this method allows for independent control of microrobots that are only differentiated by their turning rates. Because MicroStressBots can be designed to have different turning rates, TSC fits within the paradigm of Global Control Selective Response (GCSR) [5], where independent control of multiple microrobots is enabled by engineering the robots to exhibit different behavior during portions of the global control signal. Our TSC idea is very simple, yet highly effective, as our experimental results show (Sec. IV). The mechanism of TSC is both different from and independent of the mechanism of SESat-based control, allowing TSC to be used to complement SESat-based control and increase the maximum number of independently controllable MicroStressBots.

We present the concept behind TSC, and show experimental results confirming the feasibility of TSC for enabling independent control of stress-engineered MEMS microrobots. There are no simulations in this paper; as described below, the presented control method was validated (up to proof-of-concept) using physically fabricated MicroStressBots.

The paper is structured as follows: in Sec. II, we introduce the design, the kinematics, and notation for MicroStressBots. Sec. III describes the theory behind TSC, while experimental results confirming its feasibility are presented in Sec. IV. A
concluding discussion summarizing the benefits and limita- one side and cannot back up. MicroStressBots can be either
tions of TSC is included in Sec. V. right- or left-handed, depending on whether the steering-arm
is attached to the right of left side of the USDA, respectively.
II. S TRESS - ENGINEERED MEMS M ICROROBOT
(M ICRO S TRESS B OT )
A scanning-electron microscope (SEM) image of a Mi- WUDMHFWRU\ZKHQDUPUDLVHG
VWUDLJKWOLQHPRWLRQ
croStressBot is shown in Fig. 1. The robot is approximately
260 μm × 60 μm × 10 μm in size, and has two actuated WUDMHFWRU\ZKHQDUPORZHUHG
WXUQLQJ
internal degrees of freedom: an untethered scratch-drive actu- T  [\θ) T

ator (USDA) that provides forward motion, and a steering-arm 


ȝP
actuator that determines whether the robot moves in a straight- Z) 
y
r
line or turns. The steering arm actuator consists of a cantilever
beam (80-150 μm long) with a circular pad at its end (20- USDA
50 μm in diameter). The robot chassis, including the USDA 
ȝP
Z
and the steering arm actuator, is fabricated from polycrystaline y
θ
x)
VWHHULQJDUP
x cr
silicon using a MEMS foundry process [3]. Post-processing is
used to add a stress-layer to curve the initially planar steering
arm out-of-plane. The stressor layer (consisting of evaporated Fig. 2. Kinematics of the stress-engineered MEMS microrobot. When the
chromium) is lithographically patterned to produce the exact steering arm is lowered, the robot turns around a center of rotation cr .
amount of steering-arm deflection.

Fig. 1. SEM image of a MicroStressBot. The robot consists of an untethered scratch-drive actuator (USDA) that provides forward motion (propulsion), and a steering-arm actuator that controls whether the robot moves in a straight line or turns.

A. Turning Mechanism

The interaction between the steering arm and the underlying substrate as the USDA is powered causes the MicroStressBots to follow a curved trajectory, i.e., to turn. The turning mechanism is illustrated in Fig. 3. During pull-down, a portion s of the steering arm comes into flat contact with the substrate (Fig. 3.a). When the USDA is subsequently actuated, s acts as a temporary anchor, restricting the motion of the tip of the steering arm. The robot follows a curved trajectory, flexing the steering arm until the restoring force of the arm equals the force applied by the USDA (Fig. 3.b). When the arm is released during the periodic polarity reversal of the AC waveform, the flexure in the arm is relieved, resulting in a net change in the heading θ of the microrobot (Fig. 3.c). The amount of steering-arm flexure is highly dependent on the geometry of the steering-arm actuator, making the corresponding turning rate design-specific.
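The per-cycle mechanism implies that the average turning rate is simply the net heading change per AC cycle multiplied by the drive frequency. The toy calculation below is our own illustration; the specific increment and drive frequency are hypothetical values, chosen only to land near the magnitude of the rates later measured in Table I.

```python
def turning_rate(dtheta_per_cycle, drive_freq_hz):
    """Average turning rate from a fixed net heading increment per
    actuation cycle, accumulated at the drive frequency."""
    return dtheta_per_cycle * drive_freq_hz

# Hypothetical numbers for illustration only: a 0.1 mrad net heading
# change per cycle at a 4.6 kHz drive would yield 0.46 rad/s.
print(turning_rate(1e-4, 4600.0))
```

Since both factors are fixed by the steering-arm geometry and the drive waveform, the turning rate is a design constant for each robot, which is exactly what TSC exploits below.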

MicroStressBots are actuated on a field of zirconia-insulated interdigitated electrodes. When a voltage waveform is applied between the opposing pairs of these electrodes, an alternating electric potential is capacitively coupled to the microrobot chassis, supplying both power for the actuation of the USDA (propulsion) and a control signal for the actuation of the steering arm. An alternating voltage waveform containing both a steering-arm control signal and a USDA power-delivery waveform is called a control primitive [7]. When the steering arm is elevated and the USDA receives power, the robot moves along a straight-line trajectory. When the steering arm is lowered as the USDA is powered, the robot turns with a turning rate ω around its center of rotation, cr.

The kinematics of the MicroStressBots are summarized in Fig. 2. Although a single robot is globally controllable within its R2 × S1 configuration space (C-space) [12], the robot is not small-time locally controllable, since it can only turn to one side.

B. Notation for MicroStressBot Control

Consider a system of n MicroStressBots Ri, i ∈ {1, . . . , n}. Let Pj be a control primitive, drawn from an alphabet Σ of m available control primitives; Σ = {P1, . . . , Pm}. Application of each control primitive Pj to the power-delivery substrate will cause all the MicroStressBots on that substrate to move simultaneously. However, differences in the designs of their steering-arm actuators may cause individual robots to move along different trajectories (exhibit different behavior).

Because each robot Ri can exhibit different behavior during each control primitive Pj, we use a control matrix to define the relationship between the applied control primitives and the resulting microrobot behavior. The m × n control matrix is M = (aji), where aji is a pair containing the velocity vji and the turning radius rji of robot Ri during the application of control primitive Pj; aji = (vji, rji). If robot Ri moves forward in a straight line during the application of Pj, rji

approaches infinity. While turning, the velocity vji and turning radius rji do not vary much between the control primitives that cause turning, and for the remainder of this paper we shall assume that vi = vji and ri = rji for all j that correspond to turning control primitives. The turning rate of the robots is consequently ωi = vi/ri. We define the center of rotation cr of robot Ri as cr(Ri).

To maneuver the microrobots on the power-delivery substrate, control primitives will be applied in some sequence to actuate the robots. All the MicroStressBots on the substrate receive the same single global power-delivery and control signal, and all the robots will move during the application of the control primitives. We define a control sequence S to be a sequence of control primitives

S = {P1^t1, P2^t2, . . . , Pk^tk},     (1)

where Pl^tl, l ∈ {1, . . . , k}, is a control primitive drawn from the alphabet of available control primitives Σ, and tl denotes the duration for which the voltage waveform described by the control primitive is applied to the substrate. Consequently, the control sequence S defines nominal (i.e., error-free) trajectories Ti, i ∈ {1, . . . , n}, for all n microrobots on the substrate. The nominal trajectories described by S are especially useful for planning the motion of microrobots. The relationship between the control sequence and the trajectories of a system of three MicroStressBots subject to a hypothetical control sequence S and a control matrix M is illustrated in Fig. 4.

An orbit is a trajectory that returns the microrobot to its initial configuration. An example of a circular orbit is illustrated in Fig. 4. A cr-orbit is a trajectory that returns the center of rotation of a microrobot to its initial location. All orbits are cr-orbits, but the inverse is not necessarily the case.

Fig. 3. Turning mechanism of the MicroStressBot (top and cross-section views). (a) The steering arm is pulled in contact with the substrate, and a flat region s is temporarily anchored in place by electrostatic forces. (b) The USDA is actuated, causing the steering arm to flex while the robot follows a trajectory with the radius of curvature r. (c) As the arm is released during a polarity reversal of the drive voltage waveform, the flexure of the steering arm is released. This cycle is continuously repeated, causing the robot to turn.

Fig. 4. Relationship between the control sequence S and trajectories for a group of three microrobots, R1, R2, and R3. Nominal trajectories T1, T2, and T3 for a control sequence S = {P3^a, P1^b, P2^c} are shown. Note that the behavior of R1 and R2 only differs during the application of control primitive P2, while robot R3 always turns, completing an orbit.

C. Interpolated Turning

Interpolated turning is a technique that allows a MicroStressBot to follow a curved trajectory with an effective radius of curvature (r') that can be set to any value between the turning radius (r) and infinity by interleaving straight-line and turning trajectory segments. The concept of interpolated turning is illustrated in Fig. 5. Let an interpolated turning trajectory TI be defined by the control sequence SI = {P1^a, P2^b, · · · , P1^a, P2^b}, where P1 and P2 are straight-motion and turning control primitives, respectively. Let ρa and ρb be the fractions of the time primitives P1 and P2 are applied in the control sequence SI; ρa = a/(a+b) and ρb = b/(a+b).

The trajectory TI will now approximate a turning trajectory with a radius of curvature r', r ≤ r' < ∞, according to

r' = r (1 + ρa/ρb).     (2)
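Equation (2) follows from simple bookkeeping: straight segments add arc length without adding heading change, so the effective radius is the total arc length divided by the total heading change. The sketch below (our own illustrative code, not from the paper; the unicycle model, time discretization, and parameter values are all assumptions) checks the relation numerically.

```python
import math

def effective_radius(r, rho_a, rho_b, v=1.0, cycles=100, dt=1e-3):
    """Simulate a unicycle that alternates straight segments (time
    fraction rho_a) with turning segments of radius r (fraction rho_b),
    then measure the effective radius as arc length / heading change."""
    x = y = theta = 0.0
    length = 0.0
    for _ in range(cycles):
        for turning, frac in ((False, rho_a), (True, rho_b)):
            t = 0.0
            while t < frac:
                omega = v / r if turning else 0.0
                x += v * math.cos(theta) * dt
                y += v * math.sin(theta) * dt
                theta += omega * dt
                length += v * dt
                t += dt
    return length / theta  # arc length over total heading change

# With rho_a = rho_b = 0.5, Eq. (2) predicts r' = r(1 + 1) = 2r:
print(effective_radius(150.0, 0.5, 0.5))  # close to 300.0
```

Because the straight segments contribute length but no rotation, the measured ratio matches r(1 + ρa/ρb) up to discretization error, regardless of how finely the P1-P2 pairs are interleaved.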

Fig. 5. Interpolated turning. The robot follows an interpolated trajectory TI defined by the control sequence SI = {P1^a, P2^b, · · · , P1^a, P2^b}, where P1 and P2 are straight-motion and turning control primitives, respectively. ρa = a/(a+b) and ρb = b/(a+b), and the trajectory of SI approximates a turning trajectory with turning radius r' according to Eq. (2).

Fig. 6. Orbit trajectories. (a) During an orbit following a turning trajectory, the microrobot rotates with radius r around its center of rotation cr. The location of cr does not change (the orientation θ does change; cr is turning in place). (b) During an orbit based on an interpolated turning trajectory, the microrobot follows a trajectory with a radius of curvature r' = r + rc. The center of turning cr follows a circular trajectory (its own orbit) with radius rc, and returns to its starting location.

The approximation improves as the number of primitive pairs P1-P2 repeated within SI increases. In our experiments, the P1-P2 pair was merged into a new control primitive when the pair was repeated more than 20 times, producing a good approximation to r'.

III. TURNING-RATE SELECTIVE CONTROL (TSC)

The ability to vary the turning rates by varying the steering-arm designs, as well as to adjust the effective turning radius r' during interpolated turning, allows us to propose a new method for independent control of MicroStressBots. Called Turning-rate Selective Control (TSC), this method uses a novel mechanism to achieve independent control of MicroStressBots, different from our previously presented approaches. In TSC, the microrobots are differentiated only through design-induced variation of their turning rates, and not through selective pull-down and release of their steering-arm actuators [7]. This implies that during TSC, all the robots are either turning or moving straight at any given time.

TSC is based on engineering orbit trajectories that maneuver some robots towards their goals, while other robots are returned to their starting configurations. By combining several orbits, TSC can independently translate the centers of rotation cr of the individual robots in R2. (During TSC all robots are turning at exactly the same time, and hence TSC can independently maneuver only the centers of rotation.)

Consider two orbits that follow a turning trajectory and an interpolated turning trajectory, shown in Figs. 6.a and 6.b, respectively. When the robot is turning, the center of rotation cr remains in the same location in R2, as illustrated in Fig. 6.a. However, when the robot is following an interpolated turning trajectory, the center of rotation cr is translated along a curved path with radius of curvature rc = r' − r. When the robot completes this orbit, cr returns to its starting location (Fig. 6.b).

Because each MicroStressBot makes a full rotation when it completes an orbit, MicroStressBots with different turning rates will complete their orbits at different times. Robots that are maneuvered along interpolated turning trajectories and do not complete an orbit are translated in C-space. Fig. 7 illustrates the concept of TSC. Consider robots R1 and R2 designed such that ω2 = (1/3)ω1. An orbit for R1 based on an interpolated turning trajectory translates the center of rotation (cr(R2)) of robot R2 by a vector γ. The vector γ can be expressed as

γ = rc ( sin θ2 − sin(θ2 + 2πh ω2/ω1),
         cos(θ2 + 2πh ω2/ω1) − cos θ2 )^T,     (3)

where θ2 is the orientation of robot R2, h denotes the side the steering arm is attached to (h = 1 when the robot is left-handed, h = −1 when the robot is right-handed), rc = r2' − r2, where r2 and r2' are the turning radius and the interpolated turning radius (from Eq. 2), respectively, of robot R2, and ω1 and ω2 are the turning rates of robots R1 and R2, respectively.

Note that in Fig. 7, as well as in the other figures in this section, the lengths of the trajectories are not to scale. The starting and final locations of the centers of rotation cr of the robots are marked by (S)- and (E)-labeled dots, respectively. Overlapping dots indicate no change in location. In addition, where appropriate, grey color denotes past trajectories and intermediate configurations.

After translating the center of rotation cr(R2) by displacement vector γ, cr(R2) can be translated by a second linearly independent displacement vector γ2 (see Fig. 8). The vectors γ and γ2 effectively allow cr(R2) to be translated anywhere within R2 without moving the center of rotation cr(R1). Correspondingly, the center of rotation cr(R1) of robot R1 can be translated while robot R2 completes an orbit and cr(R2) returns to its starting location. One approach is to initially move robot R2 using interpolated turning, and then construct a non-orbit trajectory that maneuvers cr(R2) back to its original location, while displacing robot R1 by vector η. This approach is illustrated in Fig. 9.

Fig. 7. The concept of Turning-rate Selective Control (TSC). Trajectories of two robots R1 and R2 with different turning rates: ω2 = (1/3)ω1. As R1 follows a circular orbit trajectory based on interpolated turning, the center of rotation (cr(R2)) of robot R2 is displaced by vector γ.

Fig. 8. Translating the center of rotation cr(R2) by two displacement vectors that span R2. (a) Translating cr(R2) by vector γ, while robot R1 completes an orbit. (b) Both robots turn (non-interpolated) to an arbitrary angle, and cr(R2) is translated by another vector γ2, while robot R1 completes another orbit. The angle between the vectors, as well as their lengths, can be varied arbitrarily, allowing cr(R2) to be translated anywhere in R2.

Fig. 9. Translating robot R1 and not robot R2. (a) Initially, the center of rotation cr(R2) is translated by vector γ using an interpolated turning trajectory while robot R1 completes an orbit. (b) Both robots are then rotated such that the center of rotation cr(R2) can return to its starting location using straight-line motion along vector γ' (bottom), while robot R1 and center of rotation cr(R1) are both translated by vector η (top).

TABLE I
EXPERIMENTALLY MEASURED TURNING RATES ω AND TURNING RADII r FOR FABRICATED MICROROBOTS R1 AND R2

        ω [rad/s]      r [μm]
R1      0.46 (0.05)    136.66 (7.6)
R2      0.20 (0.01)    177.50 (5.0)

A. Extension to n Microrobots

TSC can be extended to independently control n MicroStressBots by individually translating a single robot while the remaining n − 1 robots complete their orbits. Consider adding a third microrobot R3 with turning rate ω3 = (2/3)ω1 to the system shown in Fig. 7. The center of rotation cr(R3), and not cr(R2), is translated by engineering the displacement vectors γ and γ2 such that they cancel each other, resulting in a non-circular cr-orbit returning the center of rotation cr(R2) to its starting location. Because all the trajectories complete orbits for robot R1, its center of rotation, cr(R1), also remains in its starting location. The center of rotation cr(R3) is translated by a vector ζ3. The trajectories for this implementation of TSC on three robots are shown in Fig. 10. By induction, two displacement vectors ζ3 and ζ3' can be combined to form a cr-orbit for robot R3, allowing the translation of the center of rotation of a fourth robot R4, and so on. We will use the shorthand TSC(n) to denote TSC of n > 2 microrobots.

IV. EXPERIMENTAL RESULTS

We fabricated two MicroStressBots designed to exhibit different turning rates. The designs and optical micrographs of the two robots are shown in Figs. 11 and 12, respectively. L and w correspond to the length and width of the steering arm, respectively; d is the diameter of the steering-arm pad, and c is the length of the stressor layer applied to the steering arm.

The large difference in the turning rates of the robots was experimentally verified. Measured average turning rates ω and turning radii r for both robots are shown in Tab. I. Values in parentheses show the standard deviations. Differences in turning rates are also clearly visible in the optical micrographs in Fig. 13. The four images were obtained at times t = 2, 6, 8, 12 seconds from the start of the experiment and include traces of the completed trajectories (black) for the two robots turning.

Fig. 14 shows the basic implementation of TSC on MicroStressBots R1 and R2, where one of the robots is displaced

while the second robot orbits back to its initial configuration, analogous to the concept illustrated in Fig. 7. The center of rotation cr(R2) is translated by vector γ using an interpolated turning trajectory while robot R1 completes an orbit. Control error contributed to a slight displacement of the center of rotation cr(R1).

Fig. 10. Extending TSC to three MicroStressBots. A third robot R3 (ω3 = (2/3)ω1) is added to the system from Fig. 7. (a) The robots are actuated along interleaved trajectories that result in an orbit of R1 (top), while the centers of rotation cr(R2) and cr(R3) are translated by vectors γ (middle) and ζ1 (bottom), respectively. (b) The robots rotate (solid-line trajectory) such that an interleaved turning trajectory translates the center of rotation cr(R2) back to its starting location (middle). The center of rotation cr(R3) is translated by a vector ζ2 (bottom), while robot R1 completes another orbit. The overall displacement of cr(R3) is ζ3 = ζ1 + ζ2, while the locations of cr(R1) and cr(R2) remain unchanged.

Fig. 11. Design and optical micrograph (inset) of robot R1.

Fig. 12. Design and optical micrograph (inset) of robot R2.

Fig. 13. Optical micrographs of robots R1 and R2 at times t = 2, 6, 8, 12 seconds while following turning trajectories. Traces of the completed trajectories for both robots are shown in black.

Fig. 15 shows the implementation of TSC where the center of rotation of one robot is translated in R2 by two independent vectors, analogous to the concept illustrated in Fig. 8. The center of rotation of robot R2, cr(R2), is translated by two vectors γ and γ2. Robot R1 completes two closed-loop orbits during this experiment. Again, a slight drift in the center of rotation cr(R1) is apparent due to control error. During the experiment, robot R2 moves partially outside the view of the optical microscope. A movie of this experiment is available online [13].

Fig. 16 shows the implementation of TSC to independently translate the robot with the larger turning rate, analogous to the implementation illustrated in Fig. 9. The steps in this particular implementation were conducted in the reverse order from what was suggested in Sec. III. In this implementation,

robot R2 was first translated using a combination of turning and straight-line trajectories, displacing the center of rotation cr(R1) by vector η (Fig. 16.a). Consequently, the center of rotation cr(R2) was translated back to its starting location, completing a cr-orbit for R2. Also in this experiment, a slight drift in the location of cr(R2) can be observed. In practice, this control error will define the ultimate accuracy with which TSC can be implemented, analogous to the forward projections described in [8]. A movie of this experiment is available online [13].

Fig. 14. Composite optical micrograph showing the basic implementation of TSC. Robots R1 and R2 follow interpolated turning trajectories. As robot R1 completes an orbit and returns to its starting configuration, the center of rotation of robot R2, cr(R2), is translated by a vector γ.

Fig. 15. Composite optical micrographs showing the translation of the center of rotation of robot R2, cr(R2), by two linearly independent vectors. (a) Both robots follow interpolated turning trajectories. The center of rotation cr(R2) is displaced by vector γ while robot R1 completes an orbit. (b) Both robots turn (turning trajectories are denoted by dotted lines). During turning, the centers of rotation of both robots remain in place, allowing the angle between γ and γ2 to be set to an arbitrary value. The center of rotation of robot R2, cr(R2), is then translated using interpolated turning by a second vector γ2, as R2 moves outside of the view of the microscope (intermediate configurations of robot R2 are included for clarity). The (E) dot and arrow point to the approximate location of cr(R2) after the completion of the experiment.

V. CONCLUSION

We presented a new method for independent control of stress-engineered MEMS microrobots (MicroStressBots). Called Turning-rate Selective Control (TSC), this method allows for the independent translation of the centers of rotation cr of MicroStressBots that are identical except for their turning rates. This allows the individual robots to be maneuvered independently to within the turning radius (r) of arbitrary locations in R2. The concept of TSC was experimentally validated using physically fabricated MicroStressBots.

The mechanism of TSC is both different from, and independent of, SESat-based control [8]. This allows TSC to augment SESat and increase the number of microrobots that can be independently controlled using a single, global control signal. When properly designed, n MicroStressBots can be controlled using 2√n independent control voltage levels [8]. Consider a hybrid SESat-TSC(u) system containing n = u × w robots, where u is the number of groups of w SESat MicroStressBots that are differentiated through their u different turning rates. The w robots within a group are controlled using SESat, while the robots between the u groups are controlled using TSC. Such a SESat-TSC(u) system can independently control n robots using the same number of control voltage levels as a w-robot SESat system, and has a control voltage complexity of 2√(n/u).

For example, suppose one wants to independently control 100 MicroStressBots through a single global control signal. According to SESat-based control, one would need 20 control voltage levels to achieve independent control. Suppose we divide the 100 microrobots into four groups, each containing 25 robots that are independently controlled using SESat, while robots between the groups are differentiated using TSC. The robots within this hybrid SESat-TSC(4) system can now be controlled using only 10 independent control voltage levels. TSC can only maneuver the centers of rotation of the robots in R2; however, the SESat system can decouple the rotations of the individual robots. It would be interesting to analyze whether a SESat-TSC(n) system can be engineered to independently maneuver the robots in R2 × S1.

Because TSC cannot selectively snap-down or release the steering arms of the individual robots, all of the robots either move in a straight line or turn at the same time. Consequently, the orientations θ of the robots controlled by TSC remain coupled, i.e., they cannot be independently controlled.
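The voltage-level counts in the example above reduce to a few lines of arithmetic. The sketch below is our own illustration of the counting argument (it assumes, as in the text's example, that the group size n/u is a perfect square):

```python
import math

def sesat_levels(n):
    """Control-voltage levels for SESat control of n robots: 2*sqrt(n)."""
    return 2 * math.isqrt(n)

def hybrid_levels(n, u):
    """SESat-TSC(u): u turning-rate groups of w = n/u SESat robots each.
    The voltage-level count depends only on the group size w = n/u."""
    return sesat_levels(n // u)

print(sesat_levels(100))      # 20 levels for pure SESat
print(hybrid_levels(100, 4))  # 10 levels for SESat-TSC(4)
```

Since TSC differentiates the groups "for free" through the turning-rate designs, increasing u lowers the electrical complexity without reducing the total number of independently controlled robots.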

Fig. 16. Composite optical micrographs showing the implementation of TSC to independently translate the robot with the larger turning rate (R1). (a) The center of rotation of robot R2, cr(R2), is translated using a combination of turning and straight-line trajectories (denoted by dotted lines), and is displaced by a vector γ. The resulting trajectory of robot R1 does not form an orbit, causing the center of rotation cr(R1) to be translated by vector η. (b) The center of rotation of robot R2, cr(R2), is translated back to its starting location by vector γ' using interpolated turning, completing a cr-orbit for R2. Because the interpolated turning trajectory completes a cr-orbit for robot R1, the center of rotation cr(R1) maintains its displacement by vector η from step (a).

Because r > 0, the MicroStressBots do not turn in place. To approximate desired configurations in R2 × S1, the robots must orbit around their centers of rotation (cr) until mutual differences in their turning rates cause their orientations to align close to the desired configurations. Hence, there is a trade-off between the lengths of the robot trajectories and the resulting proximity to the goal configurations, which is especially important when we consider perturbations due to the effects of control error. It would be interesting to fully investigate such trade-offs in future work.

Finally, related theoretical ideas have also been presented in, e.g., [2], and it would be interesting to investigate how these methods fit with the SESat-TSC framework.

ACKNOWLEDGMENTS

This work was supported by a grant to B.R.D. from the Duke Institute of Brain Sciences, grant numbers GM-65982 to B.R.D. from NIH, and 2000-DT-CX-K001 to B.R.D. from the Office for Domestic Preparedness, Department of Homeland Security, USA, and funded in part by the Center for Information Technology Research in the Interest of Society (CITRIS). The authors wish to thank D. Balkcom, C. Bailey-Kellogg, and Kris Pister for their advice and suggestions. The electron micrograph was taken at the Dartmouth Ripple Electron Microscopy Laboratory, with the help of C.P. Daghlian.

REFERENCES

[1] I. V. Borzenets, I. Yoon, M. W. Prior, B. R. Donald, R. D. Mooney, and G. Finkelstein. Ultra-sharp metal and nanotube-based probes for applications in scanning microscopy and neural recording. Journal of Applied Physics, 111(7):074703, 2012.
[2] T. Bretl. Control of many agents using few instructions. In Proceedings of Robotics: Science and Systems (RSS), Atlanta, GA, 2007.
[3] A. Cowen, B. Hardy, R. Mahadevan, and S. Wilcenski. PolyMUMPs Design Handbook. MEMSCAP, 2011.
[4] E. Diller, C. Pawashe, S. Floyd, and M. Sitti. Assembly and disassembly of magnetic mobile micro-robots towards deterministic 2-D reconfigurable micro-systems. The International Journal of Robotics Research, 30(14):1667-1680, December 2011.
[5] B. R. Donald. Building very small mobile micro robots. Inaugural Lecture, Nanotechnology Public Lecture Series, MIT (Research Laboratory for Electronics, Electrical Engineering and Computer Science, and Microsystems Technology Laboratories), Cambridge, MA, April 2007. Available online at: http://video.mit.edu/watch/building-very-small-mobile-microrobots-9273/.
[6] B. R. Donald, C. G. Levey, C. McGray, I. Paprotny, and D. Rus. An untethered, electrostatic, globally-controllable MEMS micro-robot. Journal of Microelectromechanical Systems, 15(1):1-15, January 2006.
[7] B. R. Donald, C. G. Levey, and I. Paprotny. Planar microassembly by parallel actuation of MEMS microrobots. Journal of Microelectromechanical Systems, 17(4):789-808, August 2008.
[8] B. R. Donald, C. G. Levey, I. Paprotny, and D. Rus. Simultaneous control of multiple MEMS microrobots. In Proceedings of the 9th International Workshop on the Algorithmic Foundations of Robotics (WAFR), December 2008; also in Springer Tracts in Advanced Robotics, 57, G. S. Chirikjian, H. Choset, M. Morales, and T. Murphey (eds.), Springer-Verlag (Berlin), 2010.
[9] S. Floyd, C. Pawashe, and M. Sitti. An untethered magnetically actuated micro-robot capable of motion on arbitrary surfaces. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), 2008.
[10] D. R. Frutiger, B. E. Kratochvil, K. Vollmers, and B. J. Nelson. MagMites - wireless resonant magnetic microrobots. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), May 2008.
[11] A. Ghosh and P. Fischer. Controlled propulsion of artificial magnetic nanostructured propellers. Nano Letters, 9(6):2243-2245, 2009.
[12] S. LaValle. Planning Algorithms. Cambridge University Press, 2006.
[13] I. Paprotny, C. G. Levey, P. K. Wright, and B. R. Donald. Turning-rate Selective Control (TSC). [Online]. Available: http://youtu.be/aZMbf6lrEMA, 2012.

Affine Trajectory Deformation for Redundant Manipulators
Quang-Cuong Pham and Yoshihiko Nakamura
Department of Mechano-Informatics
University of Tokyo, Japan

Abstract—We propose a new method to smoothly deform trajectories of redundant manipulators in order to deal with unforeseen perturbations or to retarget captured motions into new environments. This method is based on the recently-developed affine deformation framework, which offers such advantages as closed-form solutions, one-step computation and no trajectory re-integration. Satisfaction of inequality constraints and dynamics optimization are seamlessly integrated into the framework. Applications of the method to interactive motion editing and motion transfer to humanoid robots are presented. Building on these developments, we offer a brief discussion of the concept of redundancy from the viewpoint of group theory.

I. INTRODUCTION

Planning trajectories for highly redundant manipulators is challenging and time-consuming because of the large number of degrees of freedom associated with these systems [7]. As a consequence, in order to deal with unforeseen obstacles or perturbations of the target or of the system state, it is sometimes more advantageous to deform a previously planned trajectory rather than to re-compute entirely a new one. In motion-capture-based applications, deforming captured trajectories – e.g. to adapt them to a different environment, to retarget them to a different character [10, 2, 6], or to transfer them to a humanoid robot [16, 14] – is the only viable option, as one cannot reasonably record beforehand all the motions with the desired kinematic and dynamic properties.

A. Some desirable properties of the deformations

Our aim in the present article is to design a trajectory deformation algorithm that satisfies the three following requirements: accuracy, smoothness and optimality. "Accuracy" simply means that the deformed trajectory should attain the objective for which the deformation has been conceived: for instance, to reach exactly a new configuration, with a specified velocity, acceleration, etc.

By "smoothness", we mean that the deformed trajectory should conserve the regularity properties (such as being C^1, C^2, or more generally C^p) of the original trajectory. This is particularly critical, for instance, in the context of computer graphics to avoid motion jerkiness or, in the context of robot control, to avoid infinite torques.

By "optimality", we mean that one should have the possibility to choose a deformation that optimizes certain criteria, so long as this optimization does not interfere with the previous requirements of accuracy and smoothness. One typical objective can be, for instance, to minimize the distance between the deformed trajectory and the original one. This is obviously desirable in the case of motion retargeting [10, 2, 6] or motion transfer to humanoid robots [16, 14]. In the context of robot control, the original trajectory is often obtained through extensive computations (in order, e.g., to minimize energy consumption, to avoid obstacles, to respect joint limits, etc.); therefore, if the deformed trajectory is "close" to the original one, one can expect the former to keep the good properties of the latter. The deformation algorithm should also leave the possibility for specifying explicitly other optimization objectives, such as integrated torque, energy consumption, distance to obstacles, etc.

B. Existing approaches

Two main approaches exist in the literature for handling trajectory deformations. In spline-based approaches, a deformation is made by altering the coefficients multiplying the basis splines [13] or by adding to the original trajectory a displacement map – which is a sum of splines [2, 6]. These modifications can furthermore be done in a coarse-to-fine manner using wavelet bases [13] or through hierarchical approximations [6].

Another approach is based on the encoding of the original trajectory by an autonomous nonlinear dynamical system [3, 14]. A deformation is then made by altering the coefficients multiplying the basis functions that appear in the definition of the dynamical system. This approach yields a robust execution-time behavior thanks to the autonomous nature of the dynamical system. At the same time, because of this dynamical-system representation, inequality (such as joint limits, obstacle avoidance) and equality (such as specified final velocity, acceleration, etc.) constraints at specific time instants cannot be taken into account without integrating the whole trajectory up to these time instants, which can be very costly.

The above two approaches are similar in that they make use of exogenous basis functions: splines in the spline-based approach, and Gaussian kernel functions in the dynamical-system-based approach. A first, pervasive, difficulty then consists of choosing the appropriate bases for a particular task. Second, adding artificial functions to a natural movement can produce undesirable behaviors, such as large undulations in the case of splines [6, 13], lack of smoothness, etc. – which call for supplementary and often costly efforts to correct.

C. Our approach

Unlike the spline-based and the dynamical-system-based approaches, the method we propose makes use of no exogenous basis function. Indeed, inspired by the recent finding that human movements are – to some extent – invariant under

329
affine transformations [1], we deform a given trajectory by applying affine transformations on parts of it. Thus, the only "basis functions" are the original joint angle time series and, as a consequence, any deformed trajectory under this scheme automatically preserves some properties of the original one. Such invariant properties may include for instance: smoothness, absence of large undulations, piece-wise parabolicity (which is particularly important for industrial robots), affine velocity, frequency spectrum, etc., or, more qualitatively, motion "naturalness" [18], which is a desirable feature in motion retargeting applications, yet difficult to quantify. We shall show that, despite this small basis of functions, it is still possible to satisfy the requirements of "accuracy", "smoothness" and "optimality" stated earlier by leveraging the extra redundancy offered by the affine transformations.

More precisely, our approach is based on the affine trajectory deformation framework first developed for nonholonomic mobile robots [8, 9]. In contrast with previous trajectory deformation methods for mobile robots [5, 11], affine-geometry-based algorithms are exact (given by closed-form expressions), can be executed in one step, and do not require any trajectory re-integration. The present manuscript focuses on manipulators with many degrees of freedom, but some of the new developments presented here (about e.g. inequality constraints or dynamics optimization) should benefit all classes of systems for which affine deformations are applicable.

In section II, we present the main algorithms of affine trajectory deformation for redundant manipulators. Closed-form expressions of the deformations that optimize the closeness between the original and the deformed trajectories are given, and explicit bounds on the dissimilarity between the original and the deformed trajectories or between their derivatives are presented. Inequality constraints, dynamics optimization, and kinematics filtering are integrated into the affine deformation framework. In section III, we illustrate these results with two simple concrete examples: interactive motion editing and motion transfer to humanoid robots. In section IV, we give a characterization of trajectory redundancy by the group of admissible deformations, revisiting thereby the concept of kinematic redundancy and suggesting a new theoretical approach to motion planning.

II. AFFINE TRAJECTORY DEFORMATIONS

A. Affine spaces and affine transformations

An affine space is a set A together with a group action of a vector space W. An element w ∈ W transforms a point θ ∈ A into another point θ′ by θ′ = θ + w, which can also be noted θ′ − θ = w.

Given a point θ_0 ∈ A (the origin), an affine transformation F of the affine space can be defined by a couple (w, M), where w ∈ W and M is a non-singular linear map W → W. The transformation F acts on A by ∀θ ∈ A, F(θ) = θ_0 + M(θ − θ_0) + w. Note that, if θ_0 is a fixed-point of F, then F can be written in the form

    ∀θ ∈ A   F(θ) = θ_0 + M(θ − θ_0).

B. Equality constraints on the deformations

1) Smoothness constraints at the deformation time instant: Consider a C^p trajectory θ(t)_{t∈[0,T]} and an affine transformation F that deforms θ(t)_{t∈[0,T]} into θ′(t)_{t∈[0,T]} at a time instant τ, i.e.

    ∀t < τ   θ′(t) = θ(t)
    ∀t ≥ τ   θ′(t) = F(θ(t)).

We say that F is C^p-preserving if the resulting θ′(t)_{t∈[0,T]} is also C^p. From [8], we know that F is C^p-preserving if and only if (i) θ(τ) is a fixed-point of F and (ii) the first p derivatives of θ′ are continuous at τ. Condition (i) implies that F is of the form

    ∀φ   F(φ) = θ(τ) + M(φ − θ(τ)),   (1)

where M is a non-singular linear map A → A.

Next, for i = 1…p, let us note u_i = d^iθ/dt^i(τ) (u_1 is the velocity vector at time τ, u_2 is the acceleration vector, etc.). Then condition (ii) can be formulated as

    ∀i = 1…p   M(u_i) = u_i.   (2)

Let us now consider the non-degenerate case when u_1, …, u_p are linearly independent. In this case, it is possible to construct an orthonormal basis B whose first p vectors form a basis of U = Span(u_1, …, u_p), for instance using a Gram-Schmidt orthonormalization procedure. From equation (2), the matrix of M in this basis is of the form

    M = [ 1                     m_{1,p+1}        …  m_{1,n}     ]
        [     ⋱                 ⋮                   ⋮           ]
        [         1             m_{p,p+1}        …  m_{p,n}     ]
        [ 0   …   0             1 + m_{p+1,p+1}  …  m_{p+1,n}   ]
        [ ⋮       ⋮             ⋮                ⋱  ⋮           ]
        [ 0   …   0             m_{n,p+1}        …  1 + m_{n,n} ],

which shows that the space of C^p-preserving affine transformations at τ forms a Lie subgroup of the General Affine group GA_n of dimension n(n − p) = n^2 − pn, parameterized by the m_{i,j}.

2) Constraints on the final configuration: Regarding the "accuracy" requirement, let us now characterize, within the space of C^p-preserving deformations, those that allow attaining a desired final position θ_d, i.e., such that θ′(T) = F(θ(T)) = θ_d.

Let (λ_1, …, λ_n) and (μ_1, …, μ_n) denote respectively the coordinates of θ(T) − θ(τ) and of θ_d − θ(τ) in the basis B. Then, the condition for the deformed trajectory to reach the desired configuration θ_d at time T is (see Fig. 1A for an illustration)

    M (λ_1 … λ_n)ᵀ = (μ_1 … μ_n)ᵀ.   (3)

Equation (3) can actually be transformed into

    V m = δ,   (4)

where V is the n × n(n − p) matrix defined by

    V = [ λ_{p+1} … λ_n   0 … 0           …   0 … 0           ]
        [ 0 … 0           λ_{p+1} … λ_n   …   0 … 0           ]   (5)
        [ ⋮               ⋮               ⋱   ⋮               ]
        [ 0 … 0           0 … 0           …   λ_{p+1} … λ_n   ],
Fig. 1. A: Illustration for the case n = 2 (system of dimension 2) and p = 1 (C^1 continuity, i.e. continuity of the velocity, is required). B: Flowchart of the deformation algorithm.

and m and δ are the vectors of sizes n(n − p) × 1 and n × 1 defined by

    m = (m_{1,p+1}, …, m_{1,n}, …, m_{n,p+1}, …, m_{n,n})ᵀ

and

    δ = (δ_1, …, δ_n)ᵀ = (μ_1 − λ_1, …, μ_n − λ_n)ᵀ.

Note from the above definition that δ represents in fact the coordinates of θ_d − θ(T) in the basis B.

3) Constraints on the final derivatives: Assume for instance that one wishes to arrive at the final configuration with a desired velocity v_d. Then, the matrix of the transformation must satisfy, in addition to equation (3), the following equation

    M (λ^v_1 … λ^v_n)ᵀ = (μ^v_1 … μ^v_n)ᵀ,   (6)

where (λ^v_1, …, λ^v_n) and (μ^v_1, …, μ^v_n) denote respectively the coordinates of θ̇(T) and of v_d in the basis B.

Next, equation (4) becomes

    V_g m = δ_g,   (7)

where V_g is a 2n × n(n − p) matrix constructed by adding below V an n × n(n − p) matrix similar to V but containing the λ^v_i instead of the λ_i, and where δ_g is a 2n × 1 vector defined by δ_g = (μ_1 − λ_1, …, μ_n − λ_n, μ^v_1 − λ^v_1, …, μ^v_n − λ^v_n)ᵀ.

More generally, satisfying k final constraints (k = 3 for instance if one wants to constrain the final position, velocity, and acceleration) will give rise to a matrix V_g of size kn × n(n − p). From the dimensions of the matrices of the linear system (7), one can thus draw the following conclusions:

• If n^2 − pn ≥ kn (i.e. n ≥ k + p), then it is possible to deform θ into a θ′ that satisfies k final constraints while guaranteeing C^p continuity.
• Furthermore, if n > k + p, then such a θ′ is not unique. In fact, the space of affine transformations that satisfy the above conditions constitutes a Lie subgroup of GA_n of dimension n^2 − (k + p)n. Within this space, one can then choose the deformations that optimize certain criteria, as detailed in the following sections and summarized in Fig. 1B.

C. Minimum-norm deformations

1) Closed-form solution: As stated in the Introduction, one important optimization objective for the deformed trajectory can consist of being the "closest" possible to the original trajectory. One way to achieve this is to minimize the distance between the transformation F and the identity transformation, which in turn can be quantified by the Frobenius distance between the matrix M and I. The Frobenius distance between M and I is given by ‖M − I‖_F = √(Σ_{i,j} m_{i,j}^2) = ‖m‖_2, where ‖·‖_2 denotes the 2-norm of vectors.

On the other hand, the m of minimum 2-norm that satisfies equation (4) is given by

    m = V⁺ δ,

where V⁺ denotes the Moore-Penrose pseudo-inverse of V (for simplicity, we treat the case when only the final configuration is constrained).

Observe next that, since the rows of V are linearly independent, one can in fact compute explicitly V⁺ by

    V⁺ = Vᵀ (V Vᵀ)⁻¹ = (1/S) Vᵀ,

with S = Σ_{i=p+1}^n λ_i^2 = ‖θ̂_proj(T)‖_2^2, where θ̂_proj(T) is the orthogonal projection of θ(T) − θ(τ) on the orthogonal complement of U (see Fig. 1A for an illustration). One can thus compute m by

    m = V⁺ δ = (1/S) Vᵀ δ = (1/S) (λ_{p+1} δ_1, …, λ_n δ_1, …, λ_{p+1} δ_n, …, λ_n δ_n)ᵀ,

which leads to the explicit form of M as M = I + (1/S) W, where

    W = [ 0 … 0   λ_{p+1} δ_1   …   λ_n δ_1 ]
        [ ⋮   ⋮   ⋮                 ⋮       ]   (8)
        [ 0 … 0   λ_{p+1} δ_n   …   λ_n δ_n ].

2) Bound on the dissimilarity of the trajectories: Let us now compute the distance between the deformed and the original trajectories. First, observe from the form of F [cf. equation (1)] that, for all t ≥ τ,

    θ′(t) − θ(t) = (M − I)(θ(t) − θ(τ)) = (M − I) θ̂_proj(t).   (9)

Letting ‖·‖_2 also denote the induced 2-norm of linear maps and matrices, one has, from the explicit form (8) of M,

    ‖M − I‖_2 = (1/S) ‖W‖_2 ≤ (1/S) ‖W‖_F = (1/S) √( (Σ_{i=1}^n δ_i^2) (Σ_{i=p+1}^n λ_i^2) )
              = ‖δ‖_2 √S / S = ‖θ_d − θ(T)‖_2 / √S.   (10)
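To make the closed-form solution above concrete, here is a minimal NumPy sketch (our own illustration, not code from the paper; all names are ours) that builds the minimum-norm matrix M = I + W/S of equation (8) directly in the coordinates of the basis B, and whose accompanying checks verify equation (3) together with the operator-norm bound (10):

```python
import numpy as np

def min_norm_M(lam, mu, p):
    """Minimum-norm C^p-preserving map reaching theta_d (Eqs. 4-8).

    lam, mu : coordinates of theta(T) - theta(tau) and theta_d - theta(tau)
              in the orthonormal basis B.
    p       : number of derivatives preserved at tau.
    """
    n = lam.size
    S = np.sum(lam[p:] ** 2)             # S = squared norm of the projection on U-perp
    delta = mu - lam                      # coordinates of theta_d - theta(T) in B
    W = np.zeros((n, n))
    W[:, p:] = np.outer(delta, lam[p:])   # W_{i,j} = lam_j * delta_i for j > p
    return np.eye(n) + W / S

# Demo in dimension n = 6 with C^2 preservation (p = 2).
rng = np.random.default_rng(1)
lam, mu = rng.standard_normal(6), rng.standard_normal(6)
M = min_norm_M(lam, mu, p=2)
```

Since W is rank one, its spectral and Frobenius norms coincide here, so the bound (10) is attained with equality for this minimum-norm choice.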
Next, from equation (9), one has

    ‖θ′(t) − θ(t)‖_2 ≤ ‖M − I‖_2 · ‖θ̂_proj(t)‖_2
                     = (‖θ_d − θ(T)‖_2 / √S) ‖θ̂_proj(t)‖_2
                     = ‖θ_d − θ(T)‖_2 · ‖θ̂_proj(t)‖_2 / ‖θ̂_proj(T)‖_2.   (11)

Note that inequality (11) is in fact an equality for t = T. This inequality provides a bound on the distance between the deformed and the original trajectories at each time instant. Since θ̂_proj(t) and θ̂_proj(T) do not depend on θ_d, one can in fact write

    max_{t∈[τ,T]} ‖θ′(t) − θ(t)‖_2 ≤ K ‖θ_d − θ(T)‖_2,

where K is a constant independent of θ_d (for instance, K = 1 in Fig. 1A). In particular, this shows a desirable "continuity" property of our deformation algorithm: when θ_d → θ(T), one has θ′(t)_{t∈[0,T]} → θ(t)_{t∈[0,T]} in L^2.

3) Bounds on the dissimilarity of the derivatives: One can also compute the distance between the derivatives (velocity, acceleration, etc.) of the deformed and the original trajectories in a similar way as above. One has indeed, for all levels q of differentiation,

    ∀t ∈ [τ, T]   d^qθ′/dt^q(t) − d^qθ/dt^q(t) = (M − I)(d^qθ/dt^q(t)),

which, together with (10), allows obtaining the following bound for all t ∈ [τ, T]:

    ‖d^qθ′/dt^q(t) − d^qθ/dt^q(t)‖_2 ≤ (‖θ_d − θ(T)‖_2 / ‖θ̂_proj(T)‖_2) ‖d^qθ/dt^q(t)‖_2.

D. Inequality constraints

In addition to the equality constraints of section II-B, most applications also require the satisfaction of inequality constraints, such as joint limits, upper bounds on the velocities, accelerations or torques, avoidance of obstacles, etc. In many cases, these constraints can be expressed by

    A_i θ^Can(t_i) ≤ b_i,   (12)

where t_i ∈ [τ, T] is a specific time instant, A_i is a c × n matrix, b_i is a c × 1 vector and θ^Can(t_i) is the n × 1 vector containing the coordinates of θ(t_i) in the canonical basis. To enforce joint limits, one can for example choose several t_i that sample the region where the joint values are expected to be large. Note also that constraints on higher-order derivatives such as θ̇, θ̈, etc. can be treated using the formulae of section II-B3.

Next, let P denote the basis matrix of B (cf. section II-B1). Equation (4) can then be equivalently written as

    V_i m = P⁻¹ (θ′^Can(t_i) − θ^Can(t_i)),

which yields

    θ′^Can(t_i) = θ^Can(t_i) + P V_i m,

where V_i has been constructed as in equation (5). Equation (12) is then equivalent to

    A_i P V_i m ≤ b_i − A_i θ^Can(t_i).

One can finally construct a matrix A_g and a vector b_g by piling vertically all the A_i P V_i on one hand and all the b_i − A_i θ^Can(t_i) on the other hand.

Now, to find the deformation of minimum norm (cf. section II-C) while satisfying the inequalities (12), it suffices to solve the Quadratic Program

    min (1/2) ‖m‖_2^2

with the equality and inequality constraints

    V_g m = δ_g,   A_g m ≤ b_g.   (13)

Note finally that the system of equality and inequality constraints can be fully prioritized and solved efficiently using recent algorithms [4].

As an illustration, consider a planar 3-link manipulator. The original trajectory of the end-effector is a straight line between the initial position and the final position. However, the corresponding joint angle trajectory violates several joint limit constraints. A deformed C^1 trajectory is then computed that connects the initial and final positions while respecting the constraints and staying the closest possible to the original trajectory, see Fig. 2.

Fig. 2. Deformation under inequality constraints. The original trajectory (gray) is deformed into the black one in order to respect the constraints θ_1 ≥ 0 and θ_2 + θ_3 ≤ 2.8 rad (black horizontal lines in the bottom plots). The gray dots correspond to the time instant of the deformation. Note that the deformed trajectory remains C^1. The time instants when the constraints are enforced (black dots in the bottom plots) were chosen near the minimum of θ_1 and the maximum of θ_2 + θ_3 in the original trajectory.

E. Optimization of dynamic quantities

As stated in the Introduction, it is sometimes necessary to specify explicitly the objective of the optimization, such as energy consumption or integrated square torque. However, in contrast with the minimum-norm deformation case, closed-form solutions cannot, in general, be obtained for such optimizations. One must then resort to iterative methods.
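Before turning to such iterative schemes, the closed-form machinery of sections II-B and II-C can be assembled into a complete numerical pipeline on a sampled trajectory. The sketch below is our own illustration (not code from the paper): the basis B is built by a QR-based Gram-Schmidt step, the velocity at τ is estimated by a central difference, and the checks verify that the deformed trajectory reaches θ_d, is unchanged before τ, and satisfies the per-instant dissimilarity bound (11):

```python
import numpy as np

def deform_trajectory(theta, k_tau, theta_d):
    """Affinely deform samples theta[k] (shape (K, n)) at index k_tau so
    that the last sample equals theta_d, with C^1 preservation (p = 1)
    and the minimum-norm choice of M (Section II-C)."""
    K, n = theta.shape
    u1 = theta[k_tau + 1] - theta[k_tau - 1]          # velocity direction at tau
    Q, _ = np.linalg.qr(np.column_stack([u1, np.eye(n)]))
    P = Q[:, :n]                                      # basis B; first column spans U
    lam = P.T @ (theta[-1] - theta[k_tau])            # coords of theta(T) - theta(tau)
    mu = P.T @ (theta_d - theta[k_tau])               # coords of theta_d - theta(tau)
    S = np.sum(lam[1:] ** 2)
    W = np.zeros((n, n))
    W[:, 1:] = np.outer(mu - lam, lam[1:])            # Eq. (8) with p = 1
    M = P @ (np.eye(n) + W / S) @ P.T                 # back to canonical coordinates
    out = theta.copy()
    out[k_tau:] = theta[k_tau] + (theta[k_tau:] - theta[k_tau]) @ M.T
    return out, M, P

# Demo: a smooth polynomial trajectory in R^3, deformed at k_tau = 30.
K = 101
t = np.linspace(0.0, 1.0, K)[:, None]
theta = np.hstack([t ** 2, np.sin(t), t ** 3 - t])    # (K, 3) samples
theta_d = theta[-1] + np.array([0.1, -0.05, 0.2])     # new desired endpoint
out, M, P = deform_trajectory(theta, 30, theta_d)
```

Because M(u_1) = u_1 by construction, the deformed samples keep the original velocity direction at τ, and the deviation from the original trajectory stays within the bound of equation (11) at every instant.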
Typically, consider a cost function of the type

    C(θ) = ∫_0^T c(θ(t), θ̇(t), θ̈(t)) dt.

Then the optimization problem can be formulated as

    min_M ∫_τ^T c(θ(τ) + M(θ(t) − θ(τ)), M(θ̇(t)), M(θ̈(t))) dt   (14)

subject to the linear equality and inequality constraints of equation (13).

Note that, if an iterative method is used, a very convenient candidate for an initial guess is the M of minimum norm, which can be computed very quickly following sections II-C and II-D.

As an illustration, consider again the 3-link planar manipulator of the previous section. The deformation was made here in order to minimize the integrated squared torque

    C(θ) = ∫_0^T ρ_1^2(t) + ρ_2^2(t) + ρ_3^2(t) dt,

where ρ_j is the torque applied at joint j. Note that the above equation can be easily put in the form of equation (14) by expressing the ρ_j as functions of θ, θ̇, θ̈ and the geometric and inertial parameters of the system. An example is given in Fig. 3.

Fig. 3. Minimization of squared torque. The original trajectory (gray) is deformed into a new trajectory (black) whose integrated squared torque is smaller. The dots correspond to the time instant of the deformation. Note that the deformed joint angle trajectories are C^1 but not C^2. Note also that, in agreement with classical results (see e.g. [15]), the torque-optimal trajectory of the end-effector in Cartesian space is not straight but slightly curved.

F. Fine-grained kinematics filtering

In the previous sections, we have presented a method to find trajectory deformations that suit global, coarse-grained objectives, such as reaching a desired final configuration, avoiding obstacles at specific time instants or optimizing a global trajectory cost. However, a trajectory obtained from such deformations may not satisfy local, fine-grained constraints. For instance, when modifying a walking pattern, one needs to make sure that the stance foot neither leaves the ground nor slips. This is expressed by maintaining the foot position constant throughout a continuous time interval. It is clear that a single affine transformation cannot satisfy such a constraint that applies on a continuous time interval. Thus, to take into account this type of fine-grained constraints, we propose to subsequently "filter" the deformed trajectory. More precisely, consider for instance the following constraint

    g(θ(t)) = 0,   ∀t ∈ [t_0, t_1].   (15)

Let us denote by J_g the Jacobian matrix of g and by N_g(φ) the nullspace of J_g at point φ. One first needs to make sure that the deformed trajectory θ′ satisfies the constraint at t_0 by requiring g(θ′(t_0)) = 0 (and θ̇′(t_0) ∈ N_g(θ′(t_0)) if one needs C^1 continuity). This can be done by using the affine deformation method presented earlier. Next, differentiating (15) yields the constraint J_g(θ)θ̇ = 0, ∀t ∈ [t_0, t_1].

Let us now consider the filtered trajectory θ″ defined by

    ∀t ≤ t_0         θ″(t) = θ′(t)
    ∀t ∈ [t_0, t_1]   θ̇″(t) = proj_{N_g(θ″(t))} θ̇′(t),   (16)

where proj is the operator of orthogonal projection. Equation (16) is a differential equation very simple to integrate and whose solution θ″ satisfies the constraint g for all t ∈ [t_0, t_1]. While in general there is no guarantee that θ″ will stay close to θ′, in practice, for highly redundant manipulators and short constraint time intervals, this procedure can yield satisfying results, as illustrated in Fig. 4. Finally, if needed, one can make a final affine deformation to reconnect θ′(t)_{t∈(t_1,T]} with θ″(t)_{t∈[t_0,t_1]} at time t_1.

Fig. 4. Kinematics filtering to fulfill fine-grained constraints. The original trajectory (dark gray) allows connecting the initial and final end-effector positions by a straight line in the end-effector space. The joint angle trajectory is next deformed in order to reach the final end-effector position but using a different joint angle configuration (black trajectory, top plot). However, the black end-effector trajectory (black thick line in the top plot) is no longer a straight line. The black joint angle trajectory is then "kinematically filtered" into the light gray one whose end-effector trajectory is a straight line in Cartesian space, similarly to the original dark gray trajectory, but whose joint angle trajectory is close to the black one (see bottom plots). In particular, the final black and light gray configurations are almost identical.

The previous development constitutes in some sense a "kinematics filter". A more sophisticated approach – the "dy-
namics filter" – allows filtering a non-dynamically-consistent trajectory into a dynamically-consistent one by using a fast dynamics computation algorithm for structure-varying kinematic chains [16].

III. EXAMPLES OF APPLICATIONS

A. Interactive motion editing

In typical computer graphics applications, the system under study is so complicated that interactive manipulations are the only way to modify the motion to suit one's needs. In this perspective, a pin-and-drag algorithm that allows interactively altering a particular configuration while respecting kinematic and contact constraints using differential inverse kinematics was designed in [17]. Based on the previous development, we propose here a way to extend this algorithm to handle interactive modifications of the entire trajectory.

Assume that a joint angle trajectory θ(t)_{t∈[0,T]} is given, and that we are interested in modifying the position of a certain landmark point r = f(θ). For instance, if θ represents the joint angles of a human leg, r can represent the position of the right toe, and we want to modify r so that the toe passes above an obstacle at a time instant t_obs. This can be done as follows:

1) First, "drag" the position of r(t_obs) towards a higher position r_d and compute the corresponding θ_d using the pin-and-drag algorithm [17];
2) Next, choose an appropriate τ < t_obs and deform θ(t)_{t∈[0,T]} at τ to obtain θ′(t)_{t∈[0,T]} with θ′(t_obs) = θ_d;
3) Finally, choose an appropriate τ′ ≥ t_obs and deform θ′(t)_{t∈[0,T]} at τ′ to obtain θ″(t)_{t∈[0,T]} with θ″(T) = θ(T).

Note that, at steps 2 and 3, one may also impose velocity, acceleration, etc. constraints to guarantee C^1, C^2, etc. continuity.

The new trajectory θ″(t)_{t∈[0,T]} can be computed very quickly, even for systems with large numbers of degrees of freedom, thanks to the efficiency of the pin-and-drag and affine deformation algorithms. One can thus visualize the whole deformed trajectory in real time as one drags the landmark point, which greatly enhances the editing experience. This is illustrated in Fig. 5.

B. Motion transfer to humanoid robots

We first recorded a hand waving movement (four degrees of freedom: elbow flexion, shoulder pitch, roll, yaw) of a human subject (see the first row of snapshots in Fig. 6). Optical markers were placed on the subject's shoulder, elbow and wrist, which allow reconstructing the joint angles by inverse kinematics. These joint angles were then transferred to a humanoid robot (second row of snapshots in Fig. 6). Another robot movement was obtained by smoothly deforming the joint angle trajectories in order to reach a new final arm configuration corresponding to: elbow angle − 0.2 rad, shoulder pitch + 0.2 rad, roll + 0.2 rad, yaw − 0.2 rad (third row of snapshots in Fig. 6).

Fig. 5. Interactive editing of a human walking pattern. The original trajectory is reconstructed from motion-capture data and is plotted in dark gray. The intermediate configuration at time t_obs is highlighted by bold dark gray sticks (top plot) and the landmark point is represented by the leftmost black disk. First, differential inverse kinematics is performed to compute a desired configuration θ_d (highlighted by the super bold gray sticks in the top plot) corresponding to a higher position r_d of the landmark point (rightmost black disk). Then a first deformation (black) is made to bring the configuration at time t_obs towards the desired configuration θ_d. Finally a second deformation (light gray) is made to bring the configuration at T back to that of the original trajectory, with the original velocity, ensuring thereby C^1 continuity with the final part of the trajectory. Note that the time instant T corresponds here to the time instant of the first heel strike after t_obs.

The smoothness of the deformed robot joint angles and of the velocity profiles can be observed in the plots of Fig. 6. Note also that, although noisy (because of motor and sensor noises), the accelerations remained small at the global scale thanks to the C^1 continuity guaranteed by the affine deformation algorithm. Finally, note that it is possible to implement online feedback control by applying this algorithm in a reactive manner [9].

IV. A CHARACTERIZATION OF TRAJECTORY REDUNDANCY BY THE GROUP OF ADMISSIBLE DEFORMATIONS

Building from the previous development, we now discuss the concept of redundancy from the viewpoint of group theory.

A. Configuration and velocity redundancies

A manipulator is said to be kinematically redundant with respect to a task when more degrees of freedom than the minimum number required to execute that task are available, see e.g. [7, 12]. As in section III-A, consider the system

    r = f(θ),   (17)

where r is a vector of dimension m representing the configuration of the end-effector and θ is a vector of dimension
Fig. 6. Transferring a human hand waving motion to a humanoid robot. Top row: the original human movement (snapshots taken every 500 ms). Second row (condition NORMAL): the joint angles obtained from inverse kinematics are transferred on the humanoid robot after making one affine deformation at t = 1 s to align the initial posture of the motion with the standard initial posture of the robot (snapshots taken every 2 s). Third row (condition MODIFIED): a second affine deformation is made at t = 1.5 s to alter the final posture of the robot. In the last three snapshots, the robot's right arm in condition NORMAL was superimposed for comparison. Note that the robot movement speed was reduced to 1/4th of the original human speed to comply with the hardware limitations. The graphics show the joint angles and their derivatives for the original human movement (dark gray), and for the scaled robot movements (NORMAL in black and MODIFIED in light gray).

n representing the joint angles. If n > m, then there generally exist infinitely many θ that correspond to a given r, which constitutes the notion of configuration redundancy, see Fig. 7A.

Fig. 7. Configuration redundancy (A) and trajectory redundancy (B). For simplicity, we have sketched in plot B the end-effector trajectory redundancy, but this notion applies more generally to the joint angle trajectories.

Redundancy can also be studied from a differential viewpoint, which we call velocity redundancy. Differentiating (17) indeed yields ṙ = J_f(θ)θ̇, where J_f = ∂f/∂θ is the Jacobian matrix of f, of dimension m × n. If n > m and J_f(θ(t)) is non-singular, then a given desired instantaneous velocity v_r^d of the end-effector can be achieved by infinitely many different instantaneous velocities v_θ of the joint angles (for notational simplicity, we have dropped the time index t). More precisely, let S represent the null-space of J_f(θ) and v_θ* = J_f(θ)⁺ v_r^d. Then any joint angle velocity in the affine subspace {v_θ* + S} will achieve the desired end-effector velocity v_r^d [7, 12].

From a group-theoretic viewpoint, which will be convenient later on, let T_S denote the space of the translations whose vectors belong to S. This set can actually be viewed as a Lie subgroup of dimension n − m of the general affine group GA_n. The space of all joint angle velocities v_θ corresponding to a single end-effector velocity v_r^d described above can then be seen as the orbit of v_θ* under the action of T_S, and the "degree of velocity redundancy" of the system at θ as the dimension of T_S as a Lie subgroup of GA_n.

B. Trajectory redundancy

The developments of sections II and III have highlighted another type of redundancy, namely trajectory redundancy: once a particular joint configuration θ_d has been chosen from the many possible joint configurations that achieve a given end-effector configuration, there still exist infinitely many joint angle trajectories that can bring the manipulator from the initial configuration θ_0 towards θ_d with a specified velocity, acceleration, etc. while respecting the system kinematic and dynamic constraints, see Fig. 7B.

Unlike configuration/velocity redundancies, trajectory redundancy is generally of infinite dimension. Finding a convenient way to parameterize a subset of admissible trajectories is then of particular interest. We have seen, from the development of section II, that, given a time instant τ, the set of admissible trajectories that can be obtained by affinely deforming an original trajectory θ(t)_{t∈[0,T]} is – roughly – the orbit of that trajectory under the action of a subgroup of dimension n^2 − (k + p)n of GA_n. To obtain a more convenient description, we now propose a construction that removes the dependence upon τ, allowing thereby the composition of deformations at different time instants.

For a given τ, let G(τ) denote the group of admissible affine transformations at time τ. As we remarked previously, G(τ) is of dimension n^2 − (k + p)n if θ is not singular at τ.

Consider now two finite sequences τ̂ = {τ_1, …, τ_l} (with 0 ≤ τ_1 < … < τ_l < T) and ĝ = {g_1, …, g_l} ∈ G(τ_1) × … × G(τ_l) = Ĝ(τ̂). We define the deformation f_{τ̂,ĝ} of θ by

    ∀t ∈ [0, τ_1],                      f_{τ̂,ĝ}(θ)(t) = θ(t)
    ∀i ∈ [1, l], ∀t ∈ (τ_i, τ_{i+1}],   f_{τ̂,ĝ}(θ)(t) = (g_1 ∘ … ∘ g_i)(θ(t)),

with the convention τ_{l+1} = T.

Given two sequences τ̂ and ĝ ∈ Ĝ(τ̂), we define the inverse of f_{τ̂,ĝ}, denoted f_{τ̂,ĝ}^{-1}, by the deformation associated with the sequences τ̂ and ĝ^{-1} = {g_1^{-1}, …, g_l^{-1}}.
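The inverse just defined, and the product of sequences introduced next, can be sketched in code. The following is our own simplification for illustration: each admissible g_i is represented merely as an invertible matrix (the fixed-point offsets of the affine maps are ignored), and a sequence is a mapping from time instants to maps:

```python
import numpy as np

def inverse_seq(seq):
    """Inverse deformation sequence: same time instants, inverted maps."""
    return {tau: np.linalg.inv(M) for tau, M in seq.items()}

def product_seq(sa, sb):
    """Merge rule for the product of two deformation sequences: keep the
    map at instants belonging to a single sequence, and compose the two
    maps (g_i o g'_j) at shared instants."""
    out = {}
    for tau in sorted(set(sa) | set(sb)):
        if tau in sa and tau in sb:
            out[tau] = sa[tau] @ sb[tau]
        elif tau in sa:
            out[tau] = sa[tau]
        else:
            out[tau] = sb[tau]
    return out

# Demo: two sequences sharing the instant 0.5.
rng = np.random.default_rng(2)
g = lambda: 3.0 * np.eye(3) + 0.1 * rng.standard_normal((3, 3))  # invertible maps
sa = {0.2: g(), 0.5: g()}
sb = {0.5: g(), 0.8: g()}
prod = product_seq(sa, sb)
ident = product_seq(sa, inverse_seq(sa))
```

The product of a sequence with its inverse composes g_i ∘ g_i^{-1} = id at every shared instant, which is the sense in which f_{τ̂,ĝ} · f_{τ̂,ĝ}^{-1} is the identity element of the group defined below.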
Given τ̂, τ̂′, ĝ ∈ Ĝ(τ̂) and ĝ′ ∈ Ĝ(τ̂′), we define the product f_{τ̂,ĝ} · f_{τ̂′,ĝ′} by the deformation associated with the sequences τ̂ ∪ τ̂′ and ĝ + ĝ′, where the latter sequence is constructed as follows:

• For all τ_i ∈ τ̂ \ τ̂′, put g_i into ĝ + ĝ′ at the position corresponding to τ_i;
• For all τ′_i ∈ τ̂′ \ τ̂, put g′_i into ĝ + ĝ′ at the position corresponding to τ′_i;
• For all σ ∈ τ̂ ∩ τ̂′, say σ = τ_i = τ′_j, put g_i ∘ g′_j into ĝ + ĝ′ at the position corresponding to σ.

We can now state the following proposition.

Proposition 1. The set of all f_{τ̂,ĝ} endowed with the inverse and product operations as defined above is a Lie group of dimension n^2 − (k + p)n + 1. We call this group the affine deformation group of θ(t)_{t∈[0,T]} and denote it by A(θ).

Recall that we have previously identified the redundancy of velocities with a certain group of translations T_S of dimension n − m. Similarly, we identify here part of the redundancy of trajectories with the group A(θ), in the sense that the orbit of θ under the action of A(θ) is the set of all admissible trajectories that can be obtained from θ by finite sequences of affine deformations. Note that, in the limit of large n – as in the case of highly redundant manipulators or of humanoid robots – the dimension of A(θ) grows linearly with n^2, allowing thereby more "freedom" than configuration redundancy alone (the dimension of T_S grows linearly with n only).

The group property and the matrix representation of the admissible deformations allow searching efficiently (using random sampling, optimization, etc.) within the space of trajectory redundancy, as partially illustrated in sections II and III. We also believe that a better understanding of A(θ) – in particular that of its associated Lie algebra – may provide powerful methods to make trajectory deformations.

V. CONCLUSION

We have presented a new method of trajectory deformation for redundant manipulators based on affine transformations. This method is exact, fast, and guarantees the smoothness of the resulting trajectories. Furthermore, the resulting trajectories can be chosen so as to remain close to the original trajectory

nature of the proposed method allows the deformed motions to preserve quantitative and qualitative properties, such as e.g. the difficult-to-quantify "naturalness" [18], of the original motions with no extra effort.

Our current research focuses on developing this method for full-scale applications in character animation [10, 6, 17, 18] and humanoid robot control [16, 14].

Acknowledgments

We would like to thank Mr. Hamano for the human experiment data, Dr. Srinivasa and Mr. Santacruz for help with the humanoid robot experiment, Prof. Bennequin and Dr. Diankov for helpful discussions. This work was supported by "Grants-in-Aid for Scientific Research" for JSPS fellows and by a JSPS postdoctoral fellowship.

REFERENCES

[1] D. Bennequin, R. Fuchs, A. Berthoz, and T. Flash. Movement timing and invariance arise from several geometries. PLoS Comput Biol, 5(7):e1000426, Jul 2009. doi: 10.1371/journal.pcbi.1000426.
[2] M. Gleicher. Retargetting motion to new characters. In ACM SIGGRAPH, pages 33–42. ACM, 1998.
[3] A.J. Ijspeert, J. Nakanishi, and S. Schaal. Movement imitation with nonlinear dynamical systems in humanoid robots. In IEEE International Conference on Robotics and Automation, 2002.
[4] O. Kanoun, F. Lamiraux, and P.-B. Wieber. Kinematic control of redundant manipulators: Generalizing the task-priority framework to inequality tasks. IEEE Transactions on Robotics, 27(4):785–792, 2011. doi: 10.1109/TRO.2011.2142450.
[5] F. Lamiraux, D. Bonnafous, and O. Lefebvre. Reactive path deformation for nonholonomic mobile robots. IEEE Transactions on Robotics, 20(6):967–977, 2004. doi: 10.1109/TRO.2004.829459.
[6] J. Lee and S.Y. Shin. A hierarchical approach to interactive motion editing for human-like figures. In ACM SIGGRAPH, pages 39–48. ACM, 1999.
[7] Y. Nakamura. Advanced Robotics: Redundancy and Optimization. Addison-Wesley, 1990.
[8] Q.-C. Pham. Fast trajectory correction for nonholonomic mobile robots using affine transformations. In Robotics: Science and Systems, 2011.
[9] Q.-C. Pham and Y. Nakamura. Regularity properties and deformation of wheeled robots trajectories. In IEEE International Conference on Robotics and Automation, 2012.
[10] C. Rose, B. Guenter, B. Bodenheimer, and M.F. Cohen. Efficient generation of motion transitions using spacetime constraints. In ACM SIGGRAPH, pages 147–154. ACM, 1996.
[11] K. Seiler, S. Singh, and H. Durrant-Whyte. Using Lie group symmetries for fast corrective motion planning. In Algorithmic Foundations of Robotics IX, 2010.
[12] B. Siciliano and J.-J. E. Slotine. A general framework for managing multiple tasks in highly redundant robotic systems. In International
(with explicit bounds), to optimize a given cost function, or Conference on Advanced Robotics, pages 1211–1216, 1991. doi: 10.
1109/ICAR.1991.240390.
to satisfy inequality constraints at specific time instants. Com- [13] A. Ude, C.G. Atkeson, and M. Riley. Planning of joint trajectories
bined with further kinematics or dynamics filtering, the method for humanoid robots using B-spline wavelets. In IEEE International
can also yield trajectories that satisfy constraints applying on Conference on Robotics and Automation, volume 3, pages 2223–2228.
IEEE, 2000.
continuous time intervals, in a coarse-to-fine manner. [14] A. Ude, A. Gams, T. Asfour, and J. Morimoto. Task-specific general-
This method is advantageous with respect to spline- or ization of discrete and periodic dynamic movement primitives. IEEE
dynamic-system-based approaches in that it does not require Transactions on Robotics, 26(5):800–815, 2010.
[15] Y Uno, M Kawato, and R Suzuki. Formation and control of optimal
choosing an exogenous functions basis (such as hierarchical trajectory in human multijoint arm movement. minimum torque-change
spline basis [6], wavelet spline basis [13], Gaussian ker- model. Biol Cybern, 61(2):89–101, 1989. ISSN 0340-1200.
nels [3, 14], etc.) : indeed, the only “basis functions” we [16] K. Yamane and Y. Nakamura. Dynamics filter – concept and implemen-
tation of online motion generator for human figures. IEEE Transactions
use are the original joint angle trajectories. In particular, on Robotics and Automation, 19(3):421–432, 2003.
there is no need to fine-tune the basis functions or to put [17] K. Yamane and Y. Nakamura. Natural motion animation through con-
extra constraints on the coefficients multiplying the basis straining and deconstraining at will. IEEE Transactions on visualization
and computer graphics, pages 352–360, 2003.
functions in order to avoid undesirable behaviors (such as [18] K. Yamane, J.J. Kuffner, and J.K. Hodgins. Synthesizing animations of
spline trajectories that undulate too much [6] or wavelets that human manipulation tasks. In ACM Transactions on Graphics (TOG),
have too much energy [13]). Finally, the purely endogenous volume 23, pages 532–539. ACM, 2004.
E-Graphs: Bootstrapping Planning
with Experience Graphs
Mike Phillips Benjamin Cohen Sachin Chitta Maxim Likhachev
[email protected] [email protected] [email protected] [email protected]
Carnegie Mellon University University of Pennsylvania Willow Garage Carnegie Mellon University

Abstract—Human environments possess a significant amount of underlying structure that is under-utilized in motion planning and mobile manipulation. In domestic environments, for example, walls and shelves are static, large objects such as furniture and kitchen appliances most of the time do not move and do not change, and objects are typically placed on a limited number of support surfaces such as tables, countertops or shelves. Motion planning for robots operating in such environments should be able to exploit this structure to improve its performance with each execution of a task. In this paper, we develop an online motion planning approach which learns from its planning episodes (experiences) a graph, an Experience Graph. This graph represents the underlying connectivity of the space required for the execution of the mundane tasks performed by the robot. The planner uses the Experience Graph to accelerate its planning efforts whenever possible. On the theoretical side, we show that planning with Experience Graphs is complete and provides bounds on sub-optimality with respect to the graph that represents the original planning problem. On the experimental side, we show in simulations and on a physical robot that our approach is particularly suitable for higher-dimensional motion planning tasks such as planning for single-arm manipulation and two-armed mobile manipulation. The approach provides significant speedups over planning from scratch and generates predictable motion plans: motions planned from start positions that are close to each other to goal positions that are also close to each other tend to be similar. In addition, we show how Experience Graphs can incorporate solutions from other approaches such as human demonstrations, providing an easy way of bootstrapping motion planning for complex tasks.

Fig. 1. Motion planning is often used to compute motions for repetitive tasks such as dual-arm mobile manipulation in a kitchen.

I. INTRODUCTION

Motion planning is essential for robots operating in dynamic human environments. Tasks like picking and placing objects and mobile manipulation of larger boxes require the robot to approach and pick up (or place) objects with minimal collision with the rest of the environment. Fast performance of motion planning algorithms in such tasks is critical, to account for the speed of operation expected by humans and to account for sudden changes in the environment. This is especially true of tasks involving higher-dimensional configuration spaces, e.g. for a two-armed mobile manipulation system (Figure 1).

At the same time, many mundane manipulation tasks such as picking and placing various objects in a kitchen are highly repetitive. It is therefore expected that robots should be capable of learning and improving their performance with every execution of these repetitive tasks. In particular, robots should be capable of exploiting learned knowledge about the underlying geometric structure in tasks and human environments. While human environments can be very dynamic, e.g. with people walking around, large parts of the environment are still static for significant periods of time. Similarly, tasks tend to have some form of spatial structure, e.g. objects are often found on support surfaces like tables and desks.

This work focuses on learning from experience for motion planning. Our approach relies on a graph-search method for planning that builds an Experience Graph online to represent the high-level connectivity of the free space used for the encountered planning tasks. New motion planning requests reuse this graph as much as possible, accelerating the planning process significantly by eliminating the need for searching large portions of the search-space. While previously encountered motion planning problems can speed up the planner dramatically, in their absence, it gracefully falls back to searching the original search-space, adding the newly generated motion to the Experience Graph. Planning with Experience Graphs is therefore complete. Furthermore, we show that it provides bounds on sub-optimality with respect to the graph that represents the original planning problem. Planning with Experience Graphs leads to consistent and predictable solutions for motion plans requested in similar (but not the same) scenarios, e.g. when the goal states of the robot are close to each other. Our approach is particularly useful when the tasks are somewhat repeatable spatially, e.g. in moving a set of dishes off a particular counter into a dishwasher. Although the start and goal states would be different for each motion plan, the general motion of moving a dish from the counter to the washer would essentially be the same each time.

We provide experimental results demonstrating the use of our approach both in simulation and on a real robot. A full-body planner for the PR2 robot was developed for dual-arm mobile manipulation tasks using a search-based planner.

We thank Willow Garage for their support of this work. This research was also sponsored by ARL, under the Robotics CTA program grant W911NF-10-2-0016.
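As a concrete picture of the bookkeeping behind "adding the newly generated motion to the Experience Graph," here is a minimal sketch in Python. The class and method names are ours, not from the paper; they only illustrate the idea of folding each new solution path into an explicit edge set with a disabled list for edges invalidated by obstacles.

```python
class EGraph:
    """Minimal experience-graph bookkeeping: edge costs mirror c(u, v) in G."""
    def __init__(self):
        self.edges = {}     # (u, v) -> cost of a previously executed edge
        self.disabled = {}  # edges currently invalidated by new obstacles

    def add_path(self, path, cost_fn):
        # Fold every edge of a newly found solution path into the graph.
        for u, v in zip(path, path[1:]):
            self.edges[(u, v)] = cost_fn(u, v)

    def disable(self, u, v):
        if (u, v) in self.edges:
            self.disabled[(u, v)] = self.edges.pop((u, v))

    def enable(self, u, v):
        if (u, v) in self.disabled:
            self.edges[(u, v)] = self.disabled.pop((u, v))

# Hypothetical usage: plan once, remember the edges forever.
eg = EGraph()
eg.add_path([(0, 0), (0, 1), (1, 1)], lambda u, v: 1.0)
eg.disable((0, 0), (0, 1))   # a new obstacle blocks this edge...
eg.enable((0, 0), (0, 1))    # ...and is later removed
```

The design point is simply that the graph grows monotonically from task-driven queries, which is what distinguishes it from a sampled roadmap.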

We show how the use of Experience Graphs improves the performance of the planner in this high-dimensional state space. We also present results comparing our planner against sampling-based planning approaches. Finally, we show a preliminary application of Experience Graphs to learning by demonstration.

II. RELATED WORK

Initial approaches to motion planning focused on planning from scratch, i.e. there was no reuse of the information from previous plans. Recently, there has been more work on reuse of previous information for motion planning, especially in the context of performance optimization for motion planning in realtime dynamic environments. Lien et al. [12] presented an approach that involved constructing roadmaps for obstacles, storing them in a database, and reusing them during motion planning. Bruce et al. [3] extended the traditional RRT approach to reuse cached plans and bias the search towards waypoints from older plans. Extensions of this approach can be found in [20, 6].

In [15], an evolutionary algorithm approach was used to bias RRT search for replanning in dynamic environments towards the edges of the explored areas, intended to reduce the time spent on searching parts of the space that have already been explored. In [20], workspace probability distributions were automatically learned for certain classes of motion planning problems. The distributions attempted to capture a locally-optimal weighting of workspace features.

Trajectory libraries have seen use for adapting policies to new situations [18], especially for control of underactuated systems and high-dimensional systems. In [1], new trajectories in a state space were generated by combining nearby trajectories, including information about local value function estimates at waypoints. In [14], a trajectory library was used in combination with an optimal control method for generating a balance controller for a two-link robot. Transfer of policies across tasks, where policies designed for a particular task were adapted and reused in a new task, was discussed in [19].

A learning-based approach to reuse information from previous motion plans, the environment, and the types of obstacles was presented by Jetchev [7]. Here, a high-dimensional feature vector was used to capture information about the proximity of the robot to obstacles. After a dimensionality reduction in the feature space, several learning methods including nearest neighbor, locally weighted regression and clustering are used to predict a good path from the database in a new situation.

The work in [8] is most similar to our work and involves the use of a database of older motion plans. The approach uses a bi-directional RRT and tries to draw the search towards a path which is most similar to the new motion planning problem (based on distances to the start, goal and obstacles). Some more recent work [2] also attempts to repair paths from a database of past paths using sampling-based planners. It is future work to compare against this approach.

The use of a database of motion plans is a key feature of our approach. However, we differ from other approaches in that we attempt to use all the information from previous searches instead of attempting to pick the most similar or best path. Our approach can reuse parts of the Experience Graph even when the start and goal states change. Contrary to other approaches, our approach also gracefully degenerates to planning from scratch (with no reused information) to deal with new scenarios which might be completely different from the ones in the database. Our approach also does not rely on object or shape recognition and is thus agnostic to the representation of the environment, e.g. as a voxel grid or individual objects represented using meshes. Although our approach is also comparable to Probabilistic Roadmaps [11], a crucial difference is that Experience Graphs are generated from task-based requests instead of sampling the whole space. We are also able to provide a bound on the quality of the returned solution, with respect to the discretization of the action and state spaces. Finally, our approach tends to generate consistent solutions, as we confirm in our experimental analysis.

III. ALGORITHM

A. Overview

An Experience Graph or E-Graph is a graph formed from the solutions found by the planner for previous planning queries or from demonstrations. We will abbreviate this graph as GE. The graph GE is incomparably smaller than the graph G used to represent the original planning problem. At the same time, it is representative of the connectivity of the space exercised by the previously found motions. The key idea of planning with GE is to bias the search efforts, using a specially constructed heuristic function, towards finding a way to get onto the graph GE and to remain searching GE rather than G as much as possible. This avoids exploring large portions of the original graph G. In the following we explain how to do this in a way that guarantees completeness and bounds on solution quality with respect to the original graph G.

B. Definitions and Assumptions

First, we will list some definitions and notations that will help explain our algorithm. We assume the problem is represented as a graph where a start and goal state are provided (sstart, sgoal) and the desired output is a path (sequence of edges) that connects the start to the goal.

  • G(VG, EG) is a graph modeling the original motion planning problem, where VG is the set of vertices and EG is the set of edges connecting pairs of vertices in VG.
  • GE(VE, EE) is the E-Graph that our algorithm builds over time (GE ⊆ G).
  • c(u, v) is the cost of the edge from vertex u to vertex v.
  • cE(u, v) is the cost of the edge from vertex u to vertex v in graph GE and is always equal to c(u, v).

Edge costs in the graph are allowed to change over time (including edges being removed and added, which happens when new obstacles appear or old obstacles disappear). The more static the graph is, the more benefits our algorithm provides. The algorithm is based on heuristic search and is

therefore assumed to take in a heuristic function hG(u, v) estimating the cost from u to v (u, v ∈ VG). We assume hG(u, v) is admissible and consistent for any pair of states u, v ∈ VG. An admissible heuristic never overestimates the minimum cost from any state to any other state. A consistent heuristic hG(u, v) is one that satisfies the triangle inequality, hG(u, v) ≤ c(u, s) + hG(s, v) and hG(u, u) = 0, ∀u, v, s ∈ VG and ∀(u, s) ∈ EG.

(a) εE = 1  (b) εE → ∞
Fig. 2. Effect of εE. The dark solid lines are paths in GE while the dark dotted lines are best paths from s0 to sgoal according to hE. Note that as εE increases, the heuristic prefers to travel on GE. The light gray circles and lines show the graph G and the filled-in gray circles show the expanded states when planning with E-Graphs. The dark gray arrow shows the returned path.

C. Algorithm Detail

The planner maintains two graphs, G and GE. At the high level, every time the planner receives a new planning request the findPath function is called. It first updates GE to account for edge cost changes and performs some precomputations. Then it calls the computePath function, which produces a path π. This path is then added to GE. The updateEGraph function works by updating any edge costs in GE that have changed. If any edges are invalid (e.g. they are now blocked by obstacles) they are put into a disabled list. Conversely, if an edge in the disabled list now has finite cost it is re-enabled. At this point, the graph GE should only contain finite edges. A precomputeShortcuts function is then called which can be used to compute shortcut edges before the search begins. Ways to compute shortcuts are discussed in Section V. Finally, our heuristic hE, which encourages path reuse, is computed†.

findPath(sstart, sgoal)
1: updateEGraph(sgoal)
2: π = computePath(sstart, sgoal)
3: GE = GE ∪ π

updateEGraph(sgoal)
1: updateChangedCosts()
2: disable edges that are now invalid
3: re-enable disabled edges that are now valid
4: precomputeShortcuts()
5: compute heuristic hE according to Equation 1

Our algorithm's speed-up comes from being able to reuse parts of old paths and avoid searching large portions of graph G. To accomplish this we introduce a heuristic which intelligently guides the search toward GE when it looks like following parts of paths in GE will help the search get close to the goal. We define a new heuristic hE in terms of the given heuristic hG and edges in GE for any state s0 ∈ VG:

  hE(s0) = min_π Σ_{i=0}^{N−1} min{εE·hG(si, si+1), cE(si, si+1)}    (1)

where π is a path ⟨s0, . . . , sN−1⟩ with sN−1 = sgoal, and εE is a scalar ≥ 1.

Equation 1 returns the cost of the minimal path from the queried state s0 to the goal, where the path consists of an arbitrary number of two kinds of segments. The first type of segment corresponds to an instantaneous jump between si and si+1 at a cost equal to the original heuristic inflated by εE (this is equivalent to saying that all states can reach all of the other states according to the original heuristic, but inflated by εE). The second kind of segment is an edge from GE, and it uses its actual cost. As the penalty term εE increases, the heuristic path from s0 to the goal will go farther out of its way to travel toward the goal using E-Graph edges. The larger εE is, the more the actual search avoids exploring G and focuses on traveling on paths in GE. Figure 2 demonstrates how this works. As εE increases, the heuristic guides the search to expand states along parts of GE as shown in Figure 2. In Figure 2a, the heuristic causes the search to ignore the graph GE because without inflating hG at all (εE = 1), the heuristic will never favor following edges in GE. This figure also shows how during the search, by following GE paths, we can avoid obstacles and have far fewer expansions. The expanded states are shown as filled-in gray circles, which change based on εE.

The computePath function (shown below) runs Weighted A* without re-expansions [13]. Weighted A* uses a parameter ε > 1 to inflate the heuristic used by A*. The solution cost is guaranteed to be no worse than ε times the cost of the optimal solution, and in practice it runs dramatically faster than A*. The main modification to Weighted A* is that in addition to using the edges that G already provides (getSuccessors), we add two additional types of successors: shortcuts and snap motions (line 9). The only other change is that instead of using the heuristic hG, we use our new heuristic hE (lines 4 and 16).

Shortcut successors are generated when expanding a state s ∈ GE. A shortcut successor uses GE to jump to a place much closer to sgoal (closer according to the heuristic hG). This shortcut may use many edges from various previous paths. The shortcuts allow the planner to quickly get near the goal without having to re-generate paths in GE. Possible shortcuts are discussed in Section V.

Finally, for environments that can support it, we introduce snap motions. Sometimes, the heuristic may lead the search to a local minimum at the "closest" point to GE with respect to the heuristic hG, but it may not be a state on GE. For example, in (x, y, θ) navigation, a 2D (x, y) heuristic will create a minimum for two states with the same x, y but different

† This can be done with one search prior to planning or can be done on the fly for states generated by the planner. The first option is feasible whenever the heuristic is computed for lower-dimensional projections of the original problem, such as 3D space for the end-effector of a high-dimensional arm.
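Equation 1 can be evaluated as an ordinary shortest-path problem: over the node set {s0} ∪ VE ∪ {sgoal}, connect every pair at cost εE·hG and discount pairs joined by an E-Graph edge to their true cost cE, then run Dijkstra. A toy sketch of that reading, using 2D states with a Manhattan hG (all function and variable names are ours, not from the paper):

```python
import heapq

def h_E(s0, s_goal, egraph_edges, h_G, eps_E):
    """Equation 1: cheapest mix of inflated-heuristic jumps and E-Graph edges."""
    nodes = {s0, s_goal}
    for u, v in egraph_edges:
        nodes.update((u, v))

    def weight(u, v):
        w = eps_E * h_G(u, v)                  # instantaneous "jump" segment
        if (u, v) in egraph_edges:
            w = min(w, egraph_edges[(u, v)])   # real E-Graph edge, true cost
        return w

    dist = {s0: 0.0}
    heap = [(0.0, s0)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == s_goal:
            return d
        if d > dist.get(u, float("inf")):
            continue
        for v in nodes:
            if v != u and d + weight(u, v) < dist.get(v, float("inf")):
                dist[v] = d + weight(u, v)
                heapq.heappush(heap, (dist[v], v))
    return float("inf")

manhattan = lambda a, b: abs(a[0] - b[0]) + abs(a[1] - b[1])
E = {((0, 0), (0, 9)): 9.0}   # one previously planned edge, nearly reaching the goal
with_e = h_E((0, 0), (0, 10), E, manhattan, eps_E=5.0)
without = h_E((0, 0), (0, 10), {}, manhattan, eps_E=5.0)
```

With one cheap E-Graph edge covering most of the distance, the heuristic value drops from 5·10 = 50 to 9 + 5·1 = 14, which is the bias toward GE that Figure 2 illustrates.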

computePath(sstart, sgoal)
1: OPEN = ∅
2: CLOSED = ∅
3: g(sstart) = 0
4: f(sstart) = ε·hE(sstart)
5: insert sstart into OPEN with f(sstart)
6: while sgoal is not expanded do
7:   remove s with the smallest f-value from OPEN
8:   insert s in CLOSED
9:   S = getSuccessors(s) ∪ shortcuts(s) ∪ snap(s)
10:  for all s′ ∈ S do
11:    if s′ was not visited before then
12:      f(s′) = g(s′) = ∞
13:    end if
14:    if g(s′) > g(s) + c(s, s′) and s′ ∉ CLOSED then
15:      g(s′) = g(s) + c(s, s′)
16:      f(s′) = g(s′) + ε·hE(s′)
17:      insert s′ into OPEN with f(s′)
18:    end if
19:  end for
20: end while

θ. A problem then arises because there isn't a useful heuristic gradient to follow, and therefore, many states will be expanded blindly. We borrow the idea of adaptive motion primitives [4] to generate a new action which can snap to a state on GE whenever states si, sj have hG(si, sj) = 0, sj ∈ GE, and si ∉ GE. The action is only used if it is valid with respect to the current planning problem (e.g. doesn't collide with obstacles). As with any other action, it has a cost that is taken into account during the search.

IV. THEORETICAL ANALYSIS

Our planner provides a guarantee on completeness with respect to G (the original graph representation of the problem).

Theorem 1. For a finite graph G, our planner terminates and finds a path in G that connects sstart to sgoal if one exists.

Since no edges are removed from the graph (we only add) and we are searching the graph with Weighted A* (a complete planner), if a solution exists on the original graph, our algorithm will find it.

Our planner provides a bound on the sub-optimality of the solution cost. The proof for this bound depends on our heuristic function hE being εE-consistent.

Lemma 1. If the original heuristic function hG(u, v) is consistent, then the heuristic function hE is εE-consistent.

From Equation 1, for any s, s′ ∈ VG with (s, s′) ∈ EG:

  hE(s) ≤ min{εE·hG(s, s′), cE(s, s′)} + hE(s′)
  hE(s) ≤ εE·hG(s, s′) + hE(s′)
  hE(s) ≤ εE·c(s, s′) + hE(s′)

The argument for the first line comes from Equation 1 by contradiction: suppose the line is not true; then, during the computation of hE(s), a shorter path π could have been found by connecting s to s′ at cost min{εE·hG(s, s′), cE(s, s′)} and continuing with the path that achieves hE(s′). The last step follows from hG being admissible. Therefore, hE is εE-consistent.

Theorem 2. For a finite graph G, the planner terminates, and the solution it returns is guaranteed to be no worse than ε·εE times the optimal solution cost in graph G.

Consider h′(s) = hE(s)/εE. h′(s) is clearly consistent. Then, ε·hE(s) = ε·εE·h′(s). The proof that ε·εE·h′(s) leads to Weighted A* (without re-expansions) returning paths bounded by ε·εE times the optimal solution cost follows from [13].

V. IMPLEMENTATION DETAIL

In this section we discuss how various parts of the algorithm could be implemented.

A. Heuristic

Some heuristics hG are derived using dynamic programming in a lower-dimensional state space, such as a 2D Dijkstra search for an (x, y, θ) navigation problem. Alternatively, some heuristics hG can be computed in O(1) upon request (e.g. Euclidean distance).

In the first case, we can compute hE by running a Dijkstra search on the low-dimensional projection of G, with additional edges from GE connecting the low-dimensional projections of the states in GE. This can be computed with similar efficiency to the original heuristic hG, so it doesn't hurt the planning times (this is what we used in our experiments).

In the second case, hE can be computed by constructing a fully connected graph on VE and sgoal (using both εE·hG and cE). A Dijkstra search is used to find the heuristic hE from each of these states to the goal. Then, during the search, for any state s, hE(s) = min_{s′ ∈ {VE ∪ sgoal}} (hG(s, s′) + hE(s′, sgoal)). In our implementation, we compute the heuristic hE in an on-demand fashion. Our computation runs Dijkstra's algorithm until the heuristic of the requested state is computed and then suspends until another un-computed heuristic is requested.

B. Shortcuts

Shortcuts accelerate the search by allowing it to bypass retracing an old path (re-expanding the states on it) in GE. The algorithm works with or without the shortcuts. Basically, the shortcuts are pre-computed edges that connect all states in GE to a very small set of states in GE. Shortcut successors can only be generated when expanding a state s ∈ GE. There are several obvious choices for this subset. For example, it can contain all states s in GE that are closest to the goal within each connected component of GE. The closeness can be defined by hG or hE. In our experiments we used hG. Other ways can also be used to compute this subset of states. It is future work to explore these options.

C. Pre-Computations

Some of the computations on GE can be done before the goal is known. In particular, shortcuts(s) (line 9 of computePath) needs to know the costs of least-cost paths between pairs of states in GE. If state u ∈ GE is being expanded and has a shortcut to the state v in the same component of GE, then we need to assign an edge cost c(u, v). In order to do that we need to know the cost of a least-cost path on GE from u to v.
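Stepping back to the computePath routine of Section III-C: it is standard Weighted A* without re-expansions and translates nearly line-for-line into runnable code. A sketch on a toy 4-connected grid follows; shortcut and snap successors are omitted for brevity, and all helper names are ours, not from the paper.

```python
import heapq

def compute_path(s_start, s_goal, successors, h, eps=2.0):
    """Weighted A* without re-expansions: f = g + eps*h; CLOSED states are frozen."""
    g = {s_start: 0.0}
    parent = {s_start: None}
    closed = set()
    heap = [(eps * h(s_start), s_start)]
    while heap:
        _, s = heapq.heappop(heap)
        if s in closed:
            continue                      # never re-expand a closed state
        if s == s_goal:                   # reconstruct path via parent pointers
            path = []
            while s is not None:
                path.append(s)
                s = parent[s]
            return path[::-1]
        closed.add(s)
        for s2, c in successors(s):
            if s2 not in closed and g[s] + c < g.get(s2, float("inf")):
                g[s2] = g[s] + c
                parent[s2] = s
                heapq.heappush(heap, (g[s2] + eps * h(s2), s2))
    return None

# Toy environment: an obstacle column at x == 2 for y < 3 forces a detour.
blocked = {(2, y) for y in range(3)}
def succ(s):
    x, y = s
    for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
        if 0 <= nx <= 4 and 0 <= ny <= 4 and (nx, ny) not in blocked:
            yield (nx, ny), 1.0

goal = (4, 0)
path = compute_path((0, 0), goal,
                    succ, lambda s: abs(s[0] - goal[0]) + abs(s[1] - goal[1]))
```

The returned cost is bounded by ε times optimal (here ε = 2); with the E-Graph heuristic hE in place of the plain Manhattan h, the bound becomes ε·εE as Theorem 2 states.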

These costs can be computed before knowing the goal by using
an all-pairs shortest path algorithm like Floyd-Warshall. This
can be done in a separate thread between planning queries (as
well as adding the path from the previous query into GE ). To
make Floyd-Warshall run faster and to save memory, we can
also exploit the fact that most of the paths in GE don’t intersect
each other in many places. We can therefore compress it into a
much smaller graph containing only vertices of degree = 2 and
run Floyd-Warshall on it. Then, the cost of a path between any
pair x, y ∈ GE is given by min{c(x, xi )+c(xi , yi )+c(yi , y)}. (a) Bootstrap goals (b) One of the test sets
xi ,yi
Where xi ∈ {x1 , x2 } and yi ∈ {y1 , y2 }. x1 and x2 are states Fig. 3. Full-body planning in a warehouse
with degree = 2 that contain the path on which x resides. y1 TABLE I
and y2 are defined similarly. WAREHOUSE E NVIRONMENT: E-G RAPH P LANNING T IME
successes(of 500) mean time(s) std dev(s) max(s)
VI. E XPERIMENTAL R ESULTS 500 0.30 0.43 3.37
A variety of experiments were run in multiple domains to
verify the performance gains of our algorithm. We compared following it can get it closer to the goal. Setting ε to 2 inflates
our approach against Weighted A* without using GE as well the whole heuristic including the part of the hE using paths
as with a variety of sampling-based planners. The test domains in GE . This encourages the planner to use shortcuts as soon
include planning for a 7 degree of freedom arm as well as full- as they become available, preventing it from re-expanding an
body planning for the PR2 robot (two arms, the telescoping old path.
spine and the navigation of the base). The results were compared against regular Weighted A*
with ε = 20 so that both approaches would have the same sub-
A. Full Body Planning optimality bound. We also compared against several sampling
Planning in higher-dimensional spaces can be challenging techniques from the OMPL (Open Motion Planning Library),
for search-based planners. A full-body planning scenario for including RRT-Connect (a bi-directional RRT), PRM, and
a PR2 robot is thus a good test of the capabilities developed RRT* [5, 9, 11, 10]. Since the PRM is a multi-query approach,
in this work. Our test involves the PR2 carrying objects in we allowed it to keep its accumulated roadmap between
a large environment. We restrict the objects to be upright in queries like our method does. Since RRT* is an anytime
orientation, a constraint that often presents itself in real-world method that improves as more time is allowed we provide
tasks like carrying large objects, trays, or liquid containers. results for the first solution found as well as the solution after
We assume that the two end-effectors are rigidly attached to our timeout (we gave all planners up to 2 minutes to plan).
the object. The state space is 10 dimensional. Four define the We post-process all sampling planner paths using OMPL’s
position and yaw of the object in a frame attached to the robot. short-cutter (E-Graph and Weighted A* paths have no post-
The planner operates directly in this workspace (instead of processing).
joint space) and uses inverse kinematics to map movements 1) Warehouse scenario: Our first scenario is modeled on an
back onto the joint angles of both arms. To restrict the number of inverse kinematics solutions to one, the redundant degree of freedom in each arm (the upper arm roll joint) is part of our state. The robot's position and yaw in the map frame and the height of the telescoping spine are the last four dimensions. A 3D Dijkstra heuristic is used to plan (backwards) for a sphere inscribed in the carried object from its goal position to the start position (this is the low-dimensional projection for the heuristic). The heuristic is useful in that it accounts for collisions between the object and obstacles. However, it does not handle the complex kinematic constraints on the motion of the object due to the arms, spine, and base, and does not account for collisions between the body of the robot and the environment. In all experiments, ε = 2 and εE = 10, resulting in a sub-optimality bound of 20. We chose these parameter values (manually) as they provided a good combination of speed-up and sub-optimality bound. In future work, we will look into ways to automatically reduce the sub-optimality bound as planning time allows. Setting εE to 10 greatly encourages the search to go to GE if it looks like

industrial warehouse where the robot must pick up objects off a pallet and move them onto a shelving unit (Figure 3). The goals alternate between the pallet and the shelves. Since our planner improves in performance with repeated attempts in a particular region of space, we first bootstrap it with 45 uniformly distributed goals (split between the pallet and the shelves). The bootstrap goals and the resultant GE (projected onto 3D end effector position) after processing them are shown in Figure 3a. We bootstrapped the PRM with the same 45 goals. A set of 100 random goals (with varying positions and yaws of the object), alternating between the pallet and the shelves, was then specified to the planner. This entire process was repeated 5 times, with GE cleared before running each new set of 100 goals. The planning times for E-Graphs for all 500 random goals are shown in Table I. On average, 94% of the edges on a path produced by our planner were recycled from GE. The mean time to update GE between queries (add the new path and compute Floyd-Warshall) was 0.31 seconds.

In Table II we compare E-Graphs to several other methods. We show the average speed-up by computing the average E-
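The interplay of the two inflation factors above (ε on the search and εE on the heuristic, for a combined sub-optimality bound of ε · εE = 2 · 10 = 20) can be illustrated with a simplified sketch of the E-Graph heuristic. This is an illustrative simplification, not the paper's implementation; the function and variable names are our own:

```python
import heapq

def egraph_heuristic(state, goal, e_edges, h, eps_e):
    """Sketch: cheapest cost from `state` to `goal` where each hop either
    pays the inflated original heuristic (eps_e * h) or follows a recorded
    experience edge at its stored cost. Large eps_e makes recycled edges
    look cheap, which is what draws the search onto the experience graph."""
    nodes = {goal}
    adj = {}
    for u, v, c in e_edges:              # recorded experience edges
        nodes.update((u, v))
        adj.setdefault(u, []).append((v, c))
        adj.setdefault(v, []).append((u, c))
    # Dijkstra seeded by direct inflated-heuristic hops from `state`.
    dist = {n: eps_e * h(state, n) for n in nodes}
    pq = [(d, n) for n, d in dist.items()]
    heapq.heapify(pq)
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist[u]:
            continue
        if u == goal:
            return d
        for v, c in adj.get(u, ()):
            nd = d + min(c, eps_e * h(u, v))   # cheaper of stored edge vs inflated h
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(pq, (nd, v))
    return dist[goal]
```

With no experience edges the heuristic is simply εE · h; with a recorded edge near the goal, its value drops to the much smaller recycled cost, biasing expansions toward GE.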
TABLE II
WAREHOUSE ENVIRONMENT: PLANNING TIME COMPARISON
method                 successes (of 500)  mean speed-up  std dev speed-up  max speed-up
Weighted A*            497                 15.80          54.05             625.54
RRT-Connect            500                 0.59           0.42              3.08
PRM                    500                 2.16           14.79             226.51
RRT* (first solution)  440                 11.90          45.57             594.45

TABLE III
WAREHOUSE ENVIRONMENT: PATH QUALITY COMPARISON
method                 object XYZ path length ratio  std dev ratio  base XY path length ratio  std dev ratio
Weighted A*            0.70                          0.15           1.36                       2.89
RRT-Connect            0.94                          0.59           1.55                       3.31
PRM                    0.78                          0.33           0.64                       0.39
RRT* (first solution)  0.86                          0.37           1.47                       3.46
RRT* (final solution)  0.87                          0.43           1.41                       2.84

Fig. 4. Full-body planning in a kitchen scenario. (a), (b) GE after bootstrap goals; (c), (d) GE after test goals.

TABLE IV
KITCHEN ENVIRONMENT: E-GRAPH PLANNING TIME
successes (of 40)  mean time (s)  std dev (s)  max (s)
40                 0.33           0.25         1.00

TABLE V
KITCHEN ENVIRONMENT: PLANNING TIME COMPARISON
method                 successes (of 40)  mean speed-up  std dev speed-up  max speed-up
Weighted A*            37                 34.62          87.74             506.78
RRT-Connect            40                 1.97           2.35              11.32
PRM                    25                 16.52          74.25             372.90
RRT* (first solution)  23                 50.99          141.35            613.54

TABLE VI
KITCHEN ENVIRONMENT: PATH QUALITY COMPARISON
method                 object XYZ path length ratio  std dev ratio  base XY path length ratio  std dev ratio
Weighted A*            0.91                          0.68           1.14                       1.40
RRT-Connect            2.54                          4.67           3.45                       9.67
PRM                    0.85                          0.32           0.88                       0.48
RRT* (first solution)  1.08                          0.60           1.39                       1.79
RRT* (final solution)  1.03                          0.48           1.36                       1.96

Fig. 5. An example comparing an RRT-Connect path to an E-Graph path (path waypoints move from black to white). The E-Graph path is shorter and more goal directed. (a) RRT-Connect; (b) E-Graph.

Graph to other method ratio (across the cases where both methods found a solution to the query). We can see that generally E-Graphs are significantly faster than the other methods (except for bi-directional RRT). In Table III we can see results comparing path quality on two metrics: the length of the path the carried object travels and the length of the path the base travels. The results in this table are also ratios, and we can see that all methods produce shorter paths than E-Graphs. This is expected for Weighted A*, since E-Graph paths sometimes go out of their way a bit to find solutions more quickly. Sampling planners generally have poor path quality, but the shortcutting post-process step works well on this relatively simple experiment, where most paths are very close to straight lines in configuration space. Our next experiment is a more difficult kitchen scenario, where the more complicated paths cause shortcutting to be less effective.

2) Kitchen Environment: A second set of tests was run in a simulated kitchen environment. 50 goals were chosen in locations where objects are often found (e.g., tables, countertops, cabinets, refrigerator, dishwasher). 10 representative goals were chosen to bootstrap our planner, which was then tested against the remaining 40 goals. Figure 4 shows GE both after bootstrapping and after all the goals have been processed.

Table IV shows the planning times for E-Graphs. On average, 80% of the edges on a path produced by our planner were recycled from GE. The mean time to update GE between queries (add the new path and compute Floyd-Warshall) was 0.34 seconds. In Table V we can see that E-Graphs provide a significant speed-up over all the other methods, generally over 30 times (though only a factor of 2 for bi-directional RRT). The PRM and RRT* also failed to solve many of the queries within the 2 minute limit.

In Table VI we can see that E-Graphs provide significantly shorter paths than bi-directional RRT (Figure 5 shows an example). While RRT* path quality does improve with time, after 2 minutes it is still no better than E-Graphs. The PRM still does better than E-Graphs in the kitchen scenario; however, PRMs only solved 25 of the 40 cases and are therefore only compared against E-Graphs on easier queries, where shortcutting still works well.

We ran an additional experiment in the kitchen scenario to show the consistency of E-Graph solutions. Consistency measures how similar the output of a planner is, given similar inputs (start and goal). In many domains, this kind of path
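The per-query bookkeeping reported above (append the new path's edges to GE, then recompute all-pairs shortest paths with Floyd-Warshall, at roughly 0.3 s per update) can be sketched as follows. This is a minimal illustration with hypothetical data structures, not the authors' code:

```python
def add_path_and_update(dist, path_edges):
    """Add a new path's edges to the experience graph and refresh the
    all-pairs shortest-path table `dist` (dict of dicts) with a
    Floyd-Warshall pass; O(|V|^3) in the number of E-Graph vertices."""
    INF = float("inf")
    for u, v, c in path_edges:
        for n in (u, v):
            if n not in dist:
                dist[n] = {}
        # keep the cheaper edge if one already exists (undirected here)
        dist[u][v] = min(dist[u].get(v, INF), c)
        dist[v][u] = min(dist[v].get(u, INF), c)
    verts = list(dist)
    for v in verts:
        dist[v].setdefault(v, 0.0)
    for k in verts:                      # standard Floyd-Warshall relaxation
        for i in verts:
            dik = dist[i].get(k, INF)
            if dik == INF:
                continue
            for j in verts:
                alt = dik + dist[k].get(j, INF)
                if alt < dist[i].get(j, INF):
                    dist[i][j] = alt
    return dist
```

Because the table is rebuilt after every query, later searches can read off shortest distances along previously planned paths in constant time.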
Fig. 6. Consistency experiment in a kitchen scenario. 58 goals were split into two groups and the various methods were asked to plan between them.

Fig. 7. Learning by demonstration in a more difficult warehouse scenario. (a) A demonstrated path; (b) GE after 12 goals.

TABLE VII
KITCHEN ENVIRONMENT: PATH CONSISTENCY COMPARISON
method                 dynamic time warping similarity
E-Graphs               407
RRT-Connect            1512
PRM                    748
RRT* (first solution)  1432
RRT* (final solution)  1034

Fig. 8. Tabletop manipulation experiments. (a) Grasping pipeline setup; (b) GE partway through the experiment.

TABLE VIII
RESULTS ON TABLETOP MANIPULATION (411 GOALS)
method           mean time (s)  std dev time (s)  mean expands  mean cost
E-Graphs (E)     0.13           0.07              4             117349
Weighted A* (W)  0.26           0.10              145           109297
SBL (S)          0.24           0.09              N/A           N/A
Ratio (W/E)      2.50           1.23              66            1.03
Ratio (S/E)      2.44           1.50              N/A           N/A

predictability is critical for people to be comfortable around robots. We tested this by placing two groups of goals in the kitchen environment, as shown in Figure 6. There are 58 goals split evenly between a cabinet and a refrigerator shelf. We ran through the goals alternating between the two groups so that, while no two queries have the same start and goal, the paths should be mostly the same for each. The results of the experiment are shown in Table VII. We used the dynamic time warping similarity metric [16] to compare the methods. A value closer to 0 means the paths are more similar. Since this method is for comparing pairs of paths, we performed an all-pairs comparison within each group and then took the average path similarity. We can see that E-Graphs have by far the most consistent paths, due to the deterministic nature of the planner and the reuse of previous path segments.

B. Learning by Demonstration

The high dimensionality of the full-body domain makes it easy for the planner to get stuck in large local minima. This is especially true when the heuristic is misleading, such as in Figure 7a. After picking up an object at the pallet and given a goal in the shelf, the heuristic guides the search around the right side of the pole, where the robot's body won't fit through. The search without Experience Graphs then takes over 30 minutes to solve the problem.

However, if a path were demonstrated and then added to GE, our approach could harness this information to reach all the desired goals. We demonstrated such a path through teleoperation in the simulated world (Figure 7a). We added this demonstrated path to GE by having each recorded waypoint be a vertex, with a cost represented by the cost function used in our approach. After the demonstration, our approach had an average planning time of 1.08 seconds to all 12 goals, as shown in Figure 7b. While learning by demonstration is not the focus of this work, these preliminary results show it as a promising application. It is also interesting to note that any path can be demonstrated regardless of how sub-optimal the quality is. Our planner will use parts of it if possible and return a solution with a bound on the resulting solution quality.

C. Single Arm Planning

The planner's performance was also tested for tabletop manipulation using the PR2 robot (Figure 8a). A search-based planner [4] generates safe paths for each of the PR2's 7-DOF arms separately. The goals for the planner are specified as the position and orientation of the end-effector. Our implementation builds on the ROS grasping pipeline to pick up and put down objects on the table in front of the robot. Our approach is used during both the pick and place motions to plan paths for the robot's arms (individually).

In the experiments, statistics for 411 planning requests were recorded using Weighted A*, our approach, and a randomized planner (SBL [17]). The results are shown in Table VIII, and Figure 8b shows GE part-way through the experiment. We set ε = 2 and εE = 50 for a sub-optimality bound of 100. We ran the regular Weighted A* planner with ε = 100.

Table VIII shows that we have a speed increase of about 2.5 over both methods. The heuristic computation time (0.1 s) dominates the planning times for both our approach and the Weighted A* approach, resulting in a smaller speedup than expected from the ratio of expansions. On average, 95% of the edges on a path produced by our planner were recycled from
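The dynamic time warping similarity metric [16] used in the consistency comparison can be sketched as below. Euclidean waypoint distance is an assumption on our part, as the exact distance function is not specified:

```python
def dtw_similarity(path_a, path_b):
    """Dynamic time warping cost between two waypoint sequences;
    values closer to 0 mean more similar paths."""
    INF = float("inf")
    n, m = len(path_a), len(path_b)
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Euclidean distance between the two waypoints (assumed metric)
            d = sum((a - b) ** 2 for a, b in zip(path_a[i - 1], path_b[j - 1])) ** 0.5
            D[i][j] = d + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]

def avg_all_pairs(paths):
    """All-pairs average within a group, as in the consistency experiment."""
    pairs = [(p, q) for i, p in enumerate(paths) for q in paths[i + 1:]]
    return sum(dtw_similarity(p, q) for p, q in pairs) / len(pairs)
```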
Fig. 9. E-Graphs provide similar solutions to similar problems. (a) 40 end effector trajectories with similar starts and goals; our approach is in black, while the sampling approach is in gray. (b) One visualized path.

TABLE IX
LENGTH OF 40 SIMILAR QUERIES IN TABLETOP MANIPULATION
method    mean length (m)  std dev length (m)
E-Graphs  1.378            0.012
SBL       1.211            0.178

GE. The mean time to update GE (add the new path and compute Floyd-Warshall) was 0.12 seconds.

These results show that our approach is competitive with sampling-based methods in terms of planning times. However, our approach provides explicit cost minimization and therefore some notion of consistency (for similar goals we provide similar paths). Figure 9a shows the paths of the end effector for 40 paths that had similar starts and goals (within a 3x3x3 cm cube). The gray paths are from the randomized approach while the black are from our approach. Notice that the randomized approach produces highly varying paths even after shortcutting. On the other hand, our approach consistently provides almost the same path each time (we also applied a simple short-cutter to eliminate artifacts from the discretized environment). Table IX shows that our approach has only a slightly longer path length (for the end effector distance) but a significantly lower standard deviation. While our planner's cost function is actually trying to minimize the change in joint angles, our average end effector path length is still relatively competitive with the sampling-based approach. Figure 9b shows one of these paths visualized in more detail.

VII. CONCLUSION

In this paper we have presented planning with Experience Graphs, a general search-based planning method for reusing parts of previous paths in order to speed up future planning requests. Our approach is able to do this while still providing theoretical guarantees on completeness and a bound on the solution quality with respect to the graph representing the planning problem. The paths our planner uses can be fed back from each planning episode in an online fashion to allow the planner to get better over time. The paths can also be demonstrated by a human (or other planner) to allow our planner to improve (while still providing a bound on solution quality). We provided experiments in the robotics domains of planning for a 7 DoF arm in tabletop manipulation tasks and full-body planning for the PR2 robot performing mobile manipulation tasks in warehouse and kitchen scenarios. Our comparison against the same search-based planning method without E-Graphs showed a speed-up of 1 to 2 orders of magnitude. Our comparison against a state-of-the-art sampling-based approach showed that we are competitive in terms of speed, but yield more consistent paths.

As future work, we would like to use heuristics to prune GE as it gets large, as well as to take a deeper look at applications in the field of learning by demonstration. We are also interested in making this into an anytime search so that the solution could approach optimality as more time is allowed.

REFERENCES

[1] C. Atkeson and J. Morimoto. Nonparametric representation of policies and value functions: A trajectory-based approach. In Advances in Neural Information Processing Systems (NIPS), 2003.
[2] Dmitry Berenson, Pieter Abbeel, and Ken Goldberg. A robot path planning framework that learns from experience. In ICRA, 2012.
[3] J. Bruce and M. Veloso. Real-time randomized path planning for robot navigation. In IEEE/RSJ International Conference on Intelligent Robots and Systems, 2002.
[4] B. J. Cohen, G. Subramanian, S. Chitta, and M. Likhachev. Planning for manipulation with adaptive motion primitives. In IEEE International Conference on Robotics and Automation (ICRA), 2011.
[5] Ioan A. Şucan, Mark Moll, and Lydia E. Kavraki. The Open Motion Planning Library. IEEE Robotics & Automation Magazine, 2012. URL http://ompl.kavrakilab.org. To appear.
[6] D. Ferguson, N. Kalra, and A. T. Stentz. Replanning with RRTs. In IEEE International Conference on Robotics and Automation, May 2006.
[7] Nikolay Jetchev and Marc Toussaint. Trajectory prediction: Learning to map situations to robot trajectories. In IEEE International Conference on Robotics and Automation, 2010.
[8] Xiaoxi Jiang and Marcelo Kallmann. Learning humanoid reaching tasks in dynamic environments. In IEEE International Conference on Intelligent Robots and Systems, 2007.
[9] James J. Kuffner Jr. and Steven M. LaValle. RRT-Connect: An efficient approach to single-query path planning. In ICRA, 2000.
[10] S. Karaman and E. Frazzoli. Incremental sampling-based algorithms for optimal motion planning. In Robotics: Science and Systems (RSS), Zaragoza, Spain, June 2010.
[11] L. E. Kavraki, P. Svestka, J.-C. Latombe, and M. H. Overmars. Probabilistic roadmaps for path planning in high-dimensional configuration spaces. IEEE Transactions on Robotics and Automation, 12(4):566-580, 1996.
[12] J. Lien and Y. Lu. Planning motion in environments with similar obstacles. In Proceedings of the Robotics: Science and Systems Conference, 2005.
[13] M. Likhachev, G. Gordon, and S. Thrun. ARA*: Anytime A* with provable bounds on sub-optimality. In Advances in Neural Information Processing Systems (NIPS) 16. Cambridge, MA: MIT Press, 2003.
[14] C. Liu and C. G. Atkeson. Standing balance control using a trajectory library. In IEEE/RSJ International Conference on Intelligent Robots and Systems, 2009.
[15] S. Martin, S. Wright, and J. Sheppard. Offline and online evolutionary bi-directional RRT algorithms for efficient re-planning in environments with moving obstacles. In IEEE Conference on Automation Science and Engineering, 2007.
[16] Hiroaki Sakoe and Seibi Chiba. Dynamic programming algorithm optimization for spoken word recognition. IEEE Transactions on Acoustics, Speech, and Signal Processing, 26, 1978.
[17] Gildardo Sanchez and Jean-Claude Latombe. A single-query bi-directional probabilistic roadmap planner with lazy collision checking. In International Symposium on Robotics Research, 2001.
[18] M. Stolle and C. Atkeson. Policies based on trajectory libraries. In IEEE International Conference on Robotics and Automation, 2006.
[19] M. Stolle, H. Tappeiner, J. Chestnutt, and C. Atkeson. Transfer of policies based on trajectory libraries. In IEEE/RSJ International Conference on Intelligent Robots and Systems, 2007.
[20] M. Zucker, J. Kuffner, and M. Branicky. Multipartite RRTs for rapid replanning in dynamic environments. In IEEE International Conference on Robotics and Automation, 2007.
Walking and Running on Yielding and Fluidizing
Ground
Feifei Qian1, Tingnan Zhang1, Chen Li1, Pierangelo Masarati4, Aaron M. Hoover3, Paul Birkmeyer2,
Andrew Pullin2, Ronald S. Fearing2 & Daniel I. Goldman1
1 Georgia Institute of Technology, Atlanta, Georgia, USA, 30332–0250
2 University of California at Berkeley, Berkeley, USA, 94720
3 Franklin W. Olin College of Engineering, Needham, MA, USA, 02492-1200
4 Dipartimento di Ingegneria Aerospaziale, Politecnico di Milano, Milano, Italy, 20156

Abstract—We study the detailed locomotor mechanics of a small, lightweight robot (DynaRoACH, 10 cm, 25 g) which can move on a granular substrate of closely packed 3 mm diameter glass particles at speeds up to 50 cm/s (5 body length/s), approaching the performance of small, high-performing, desert-dwelling lizards. To reveal how the robot achieves this high performance, we use high speed imaging to capture kinematics, and develop a numerical multi-body simulation of the robot coupled to an experimentally validated discrete element method (DEM) simulation of the granular media. Average forward speeds measured in both experiment and simulation agreed well, and increased non-linearly with stride frequency, reflecting a change in the mode of propulsion. At low frequencies, the robot used a quasi-static "rotary walking" mode, in which the granular material yielded as the legs penetrated and then solidified once vertical force balance was achieved. At high frequencies, duty factor decreased below 0.5 and aerial phases occurred. The propulsion mechanism was qualitatively different: the robot ran rapidly by utilizing the speed-dependent, fluid-like inertial response of the material. We also used our simulation tool to vary substrate parameters that were inconvenient to vary in experiment (e.g., granular particle friction) to test performance and reveal limits of stability of the robot. Using small robots as physical models, our study reveals a mechanism by which small animals can achieve high performance on granular substrates, which in turn advances the design and control of small robots in deformable terrains.

Keywords: bio-inspired robot; legged locomotion; granular media; lightweight

I. INTRODUCTION

There is an increasing need for robots to traverse a diversity of complex terrain. Platforms have been developed that can effectively run on fractured rigid ground [1], [2], crawl within concave surfaces [3], and climb on walls [4]. However, relative to biological organisms [5], manmade devices often have poor locomotor ability on granular substrates like sand and gravel. For example, in wheeled and tracked vehicles, wheel slippage and sinkage can cause significant performance loss [6].

Granular media are collections of particles that interact through dissipative, repulsive contact forces [7]. Forced granular media remain solid below the yield stress but flow like a fluid when the yield stress is exceeded [8]. The solid-fluid transition presents great challenges for terrestrial devices moving on granular media. For example, previous studies [9] demonstrated that a bio-inspired RHex-class legged robot, SandBot (30 cm, 2.3 kg), walked effectively at up to 1 body length/s on granular media at low to intermediate stride frequencies, where the granular material behaved like a yielding solid. The granular material yielded as the legs penetrated until vertical force balance was achieved. The granular material then solidified under the legs while the body lifted and moved forward, as if the robot were walking on a solid. At high stride frequencies, however, because the legs encountered previously disturbed ground, the granular material around the legs became continuously fluidized, and the robot "swam" forward slowly (~ 0.01 body length/s) using drag on the legs to overcome belly drag.

In contrast, a variety of animals live in the deserts and move rapidly across granular surfaces. For example, the zebra-tailed lizard (Callisaurus draconoides, ~ 10 cm, ~ 10 g) can run at speeds over 100 cm/s (10 body length/s) on sand. Unlike SandBot, which must penetrate a large portion (> 70%) of its legs to move on granular media, the lizard is light enough that even while running it only penetrates a small portion (< 30%) of its legs into granular substrates to generate force [5]. This suggests that a small, lightweight body may confer advantages for locomotion on deformable surfaces such as granular media.

Recent advances in smart composite microstructure (SCM) [10] have enabled the development of small, lightweight robots (~ 10 cm, ~ 20 g) [11], [12] like DynaRoACH (Fig. 1A). These robots are similar in size to the zebra-tailed lizard (among other myriad desert vertebrates and invertebrates [13], [14]) and can achieve performance approaching animals (~ 10 body length/s) on solid surfaces. Therefore, in addition to advancing locomotor capabilities of devices on complex terrain [15], these lightweight robots provide promising physical models to study how effective legged locomotion can be achieved on granular substrates in small, high-performing animals.
We acknowledge funding from the Burroughs Wellcome Fund, the Army
Research Laboratory (ARL) Micro Autonomous Systems and Technology
(MAST) Collaborative Technology Alliance (CTA), the National Science
Foundation (NSF) Physics of Living Systems (PoLS), and the Army Research
Office (ARO).
Another challenge for studying locomotion on granular media is the lack of comprehensive force models at the level of the Navier-Stokes equations for fluids [16]. Recently, an experimentally validated discrete element method (DEM) simulation (described below) of a model granular medium (3 mm diameter glass particles) was developed and successfully captured the locomotor mechanics of a sand-swimming lizard moving within granular media [17]. The DEM simulation provides a tool to obtain accurate, detailed information such as force and flow of the granular media during intrusions relevant to locomotion. Such information is challenging to obtain in experiments, since force platforms [18] and 3D particle image velocimetry (PIV) are not yet developed for deformable opaque ground.

In this paper we reveal basic principles of movement of lightweight locomotors on granular media using a combination of laboratory experiment and computer simulation. We perform studies of DynaRoACH on a medium of small glass particles. To obtain estimates of ground reaction forces that result in high performance, we integrate the DEM simulation with a multi-body dynamic simulation of the robot. Our study reveals for the first time that qualitatively different propulsion mechanisms exist for low and high frequency movement on granular media. While the low frequency locomotion of DynaRoACH can be understood using a previously introduced "rotary walking" model, at higher frequency the robot utilizes a hydrodynamic mechanical response of the granular medium to achieve high performance through sustained fluidization of the ground. We use the simulation to systematically vary parameters like particle friction that are inconvenient to modify in experiment, and demonstrate performance and stability limits. We expect that the mechanics discovered here and the tools we have developed should be applicable to other devices and provide a starting point to understand biological locomotion and develop robot designs on more complex deformable substrates, like leaf litter and mud.

II. MATERIALS AND METHODS

A. Experiments

1) Robotic platform
The DynaRoACH robot used in this study (Fig. 1A) is a small, lightweight (10 cm, 25 g), bio-inspired hexapedal robot [11]. It has six c-shaped legs (radius 1 cm) and uses an alternating tripod gait. All six legs are driven by a single motor through body linkages. The motor is controlled by a centralized controller mounted close to the center of mass (CoM) of the robot. Control parameters like stride frequency, running time, and PID control gains are set on a PC and communicated to the controller through a Bluetooth wireless interface.

2) Model granular media
We used 3.0 ± 0.2 mm diameter glass particles (density = 2.47 g/cm3) as the granular medium (see Fig. 1C). The large size of the particles reduces computation time in the simulation portion of the study, facilitating a direct comparison between experiment and simulation. While these particles are larger than most natural sand grains, they have similar qualitative behavior in response to intrusion; a previous study [17] also demonstrated that these particles were a good model for studying locomotion within granular media.

In nature, granular media exist in a range of compactions, measured by the volume fraction φ (the ratio between the solid volume and the occupied volume). For dry granular media, φ can vary over 0.57 < φ < 0.64 [19], although this range is influenced by particle friction [20]. The yield strength of a granular medium generally increases with φ [9], [21] and affects locomotor performance on the granular medium [9]. In our study, we prepared the granular medium into a closely packed state (φ = 0.63). However, we found that our results did not qualitatively change for different φ (e.g., robot speed was insensitive to φ), likely because the robot penetrated its legs into the granular medium to depths of only a few particle diameters and the range of achievable φ was small (0.61 < φ < 0.63) in the low-friction 3 mm particles.

Figure 1. Locomotion experiment and simulation. (A) The lightweight, hexapedal DynaRoACH robot resting on a bed of 3 mm diameter glass particles. (B) Leg tip trajectories from top view and side view. Blue and green trajectories denote the two alternating tripods. (C) High speed video experimental setup. (D) Simulation of the robot using MBDyn. (E) Leg tip trajectories in simulation. (F) Simulation of the robot running on a bed of 3 mm particles.
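The volume fraction φ discussed above follows directly from its definition (total solid volume over occupied volume); a small sketch, using the 1.5 mm particle radius implied by the 3 mm diameter particles:

```python
import math

def volume_fraction(n_particles, radius, occupied_volume):
    """phi = total solid (sphere) volume / occupied volume."""
    return n_particles * (4.0 / 3.0) * math.pi * radius ** 3 / occupied_volume

# Illustrative: particle count needed to fill a 1-liter box at phi = 0.63
phi_target = 0.63
r = 1.5e-3                      # particle radius (m)
box = 1e-3                      # occupied volume (m^3), i.e., 1 liter
n = phi_target * box / ((4.0 / 3.0) * math.pi * r ** 3)
```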
3) Locomotion experiments
We ran the DynaRoACH robot on a 75 cm long, 30 cm wide trackway filled to a depth of 6 cm (Fig. 1C). We pressed the bed using a flat plate before each trial to prepare the particles to a closely packed state. Running kinematics were captured by two high speed video cameras (AOS X-PRI) from both top and side views at a frame rate of 200 fps. One high contrast dorsal marker was bonded above the robot center of mass to obtain average forward speed; two lateral markers were bonded on the front and rear of the robot body to obtain CoM height (approximated by the average vertical position of the two markers). Stride frequency was determined from the videos.

4) Leg trajectories
To capture prescribed leg trajectories, the robot was suspended in the air and the trajectories of the leg tips were recorded as the motor rotated. During a cycle, each leg rotated backward about the hip (retraction), lifted up sideways, and swung forward (protraction) (Fig. 1B). Leg kinematic parameters such as the fore-aft swing angle and lateral lifting angle were determined by tracking the markers on the legs and were used to guide tuning of leg trajectories in simulation. We define a stride period T as the time between the starts of two consecutive retraction phases, and stance as the period when the leg generates ground reaction force (determined from simulation).

B. Simulation

1) Discrete element method to model contact forces
To investigate locomotion of DynaRoACH in more detail, a simulation of the robot was developed and coupled to a granular media simulation. This granular simulation used the discrete element method (DEM) to compute the particle-particle and particle-intruder interactions for the 3 mm diameter glass particles. As in previous work [17], the DEM simulation was validated by matching the forces on intruders moving in the granular medium (e.g., a rod dragged horizontally) with experimental measurements.

In the DEM simulation, the normal contact force between an interacting pair of particles is given by a standard force law [22], a Hertzian repulsion and a velocity-dependent dissipation (to account for a coefficient of restitution):

F_n = k_n δ^(3/2) − G_n v_n δ^(1/2);

the tangential contact force is modeled as Coulomb friction:

F_s = μ F_n;

where δ is the deformation (virtual overlap) between contacting particle pairs or particle-intruder pairs, v_n the normal component of relative velocity, k_n = 2 × 10^6 kg s^−2 m^−1 and G_n = 15 kg s^−1 m^−1/2 the contact stiffness and viscoelastic dissipation coefficient, and μ_pp, μ_pi = 0.1, 0.3 the particle-particle and particle-intruder friction coefficients. The restitution and friction coefficients were experimentally measured and validated in a rod drag experiment [17]. Once the parameters were set in the DEM simulation, the robot locomotion could be accurately predicted over a wide range of conditions. The simulated granular bed (3 × 10^5 particles) was 60 PD (particle diameter) in width, 15 PD in depth, and 290 PD in length, and had frictionless boundaries (Fig. 1F). At low frequencies we used a shorter granular bed (90 PD) containing 1 × 10^5 particles to save computation time.

2) Dynamic simulation of the robot
To model the robot we used a multi-body dynamic simulator, MBDyn [23], which allows time domain simulation of multi-body mechanical systems from first-principle equations. MBDyn features a full 3D simulation with six translational and rotational degrees of freedom. This is essential for locomotion on the surface, during which pitch, roll, and yaw are often present [15].

In the dynamic simulation, the robot was constructed with similar body and leg geometries as the actual robot (Fig. 1D). The simulated robot was composed of 13 individual rigid parts: one box-shaped body, six c-shaped legs, and six linking plates between legs and body. The legs of the actual robot were not perfectly rigid, but experimental observations showed little leg deformation during locomotion on granular media. The joints between the link plates and c-legs allowed front-back swing of the legs, while the plate-body joints allowed sideways lifting of the legs. Tuning kinematic parameters for the joint movements produced leg trajectories that resembled experimental measurements (Fig. 1E) without mimicking the internal linkage of the actual robot.

3) Integration of DEM with dynamic simulation
We combined MBDyn with the DEM code to simulate robot locomotion on granular media via a communication interface (a UNIX socket using C++). At each time step, MBDyn integrated the equations of motion for the robot combined with the forces and torques calculated from the DEM code. The updated kinematics, including position, orientation, velocity, and angular velocity of each part of the robot, were then passed back to the DEM code to compute the force and torque on all interacting elements at the next time step. The time step was 1 μs, set by the particle collision time in DEM.

In addition to the kinematics during locomotion (e.g., CoM position and velocity, stride length, limb penetration depth), the dynamics during locomotion (e.g., net ground reaction force on each limb and tripod) were also determined from the simulation.

C. Kinematic predictions based on previous work

Because the small DynaRoACH robot has similarly shaped c-legs to the larger SandBot, we use the rotary walking model developed for SandBot [9] to make two kinematic predictions for the locomotion of the DynaRoACH robot on granular media. We will then test these predictions in both experiment and simulation.

First, we predict that both the body height and forward speed will increase during stance and decrease between stances. For walking at low frequencies, leg intrusion speeds will be small enough that granular forces will be dominated by friction, and will therefore be independent of speed and increase with depth (hydrostatic-like) [24], [25]. As a result, during each step, the legs will initially slowly penetrate into the granular
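The DEM contact force law above can be evaluated directly; a sketch using the stated parameters (the function name and structure are illustrative, not taken from the authors' DEM code):

```python
def dem_contact_forces(delta, v_n, mu=0.1, k_n=2e6, g_n=15.0):
    """Normal force and maximum tangential (Coulomb) force for one contact:
    F_n = k_n * delta^(3/2) - G_n * v_n * delta^(1/2),  F_s = mu * F_n.
    delta: virtual overlap (m); v_n: normal relative speed (m/s).
    Defaults use the stated parameters with the particle-particle mu."""
    if delta <= 0.0:              # no overlap, no contact force
        return 0.0, 0.0
    f_n = k_n * delta ** 1.5 - g_n * v_n * delta ** 0.5
    f_n = max(f_n, 0.0)           # contacts push, never pull
    return f_n, mu * f_n
```

For example, a static contact (v_n = 0) with 0.1 mm of overlap carries a Hertzian normal force of 2 N under these parameters.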
medium while the body rests on the surface. As the legs penetrate deeply enough for the lift on the legs to balance the weight and vertical inertial force of the body, the legs should stop penetrating and rotate atop solidified granular media, lifting the body and kinematically propelling it forward. As the legs withdraw from the granular medium, the body should fall and forward speed will decrease to zero. We refer to this as rotary walking in SandBot, and expect to see these features in the small robot.

Second, based on the rotary walking model, we predict that stride length should decrease with stride frequency. In the quasi-static rotary walking mode, stride length is inversely related to leg penetration depth by geometry. As stride frequency increases, because the inertial force of the body during body lift-up increases, the legs should penetrate more deeply, and therefore the stride length will decrease. In addition, the transition from walking to swimming should be triggered by the reduction in stride length—at high enough stride frequency, stride length should become small enough that the legs will encounter previously disturbed material during each step.

III. RESULTS AND DISCUSSION

A. Kinematics

The time-averaged forward speed of the robot measured in both experiment and simulation (Fig. 3A) agreed well, and increased monotonically with stride frequency. At the highest stride frequency (12 Hz) tested in experiment, the robot reached a speed of 50 cm/s (5 body lengths/s), comparable to slow runs of the zebra-tailed lizard. Calculated stride length (Fig. 3B) decreased with stride frequency from low (0-3 Hz) to intermediate frequencies (4-6 Hz) but increased with stride frequency at high frequencies (7-12 Hz). Duty factor (the percentage of the total stride period during which the limb is in contact with the ground) measured in simulation (Fig. 2E,J) fell below 0.5 at intermediate frequencies of ~6 Hz (Fig. 3C), indicating the onset of aerial phases. Closer examination of the kinematics revealed that the robot displayed a transition in locomotor mode as stride frequency increased:

1) Walking at low stride frequencies

At low stride frequencies (e.g., 3 Hz, Fig. 2A-E), as predicted, the DynaRoACH robot used a quasi-static rotary walking locomotor mode, where forward speed increased sub-linearly with stride frequency (i.e., stride length decreased; Fig. 3B). Instantaneous forward speed also increased from 0 to 25 cm/s during most of stance and then dropped to zero (Fig. 2C). Vertical position of the CoM tracked in simulation (Fig. 2A,B,D) showed that average body height was 1.28 ± 0.03 cm during stance, increased by 0.46 ± 0.03 cm (38% of the standing body height 1.2 cm) during most of stance, and then decreased by the same amount.
Figure 2. Two locomotion modes observed for DynaRoACH moving on granular media. (A-E) Walking at low frequencies (e.g., 3 Hz). (F-J) Running at high
frequencies (e.g., 11 Hz). (A,F) Sideview of the robot in experiment. (B,G) Sideview of the robot in simulation. (C,H) Instantaneous forward speed vs. time. (D,I)
Body height vs. time. (E,J) Vertical ground reaction force on a tripod vs. time. Dashed curve and solid curve are for the two alternating tripods. Black horizontal
line indicates body weight (mg). In (A,B) and (F,G) the three time instants shown are the start, middle, and end of three different stances. In (C-E) and (H-J) data
are shown for simulation only. Duty factor is stance duration t divided by stride period T.
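The duty factor defined in the caption above is the quantity that separates the two gaits: above 0.5 the support phases of the alternating tripods overlap (a walk), below 0.5 the cycle contains aerial phases (a run). A minimal sketch (helper names are ours, not the paper's):

```python
def duty_factor(stance_duration, stride_period):
    """Fraction of the stride period the limb is in ground contact (t/T)."""
    return stance_duration / stride_period

def gait_label(d):
    """Duty factor above 0.5 implies overlapping support phases (a walk);
    below 0.5 the legs cannot cover the whole cycle, so aerial phases
    appear (a run)."""
    return "walk" if d > 0.5 else "run"

# Illustrative numbers: a stance lasting 0.2 s within a 0.333 s stride
# (3 Hz) gives a duty factor of ~0.6, i.e. a walking gait.
print(gait_label(duty_factor(0.2, 0.333)))   # -> walk
```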

The observed decrease in stride length with stride frequency, increase in body height and forward speed during most of stance, and decrease in body height and forward speed between stances were in accord with the rotary walking model [9] (Fig. 3A,B, red curve). This suggests that like SandBot, the small DynaRoACH robot also rotary-walked on solidified granular media at low frequencies.

The decrease in body height and forward speed occurred at the start of stance; however, this does not contradict the rotary walking model, but is a consequence of the different leg trajectories of the small robot and SandBot. Because SandBot rotates its legs in circular trajectories, its body must rest on the surface between two tripods, resulting in stance phases that begin during the retraction phases of the legs. The small DynaRoACH robot instead uses protraction-retraction leg trajectories, which result in stance phases that begin during the protraction phases of the legs.

2) Running at high stride frequencies

At high stride frequencies (Fig. 2F-J, 11 Hz) the DynaRoACH robot exhibited a different locomotor mode than predicted by the rotary walking model. The forward speed of DynaRoACH increased super-linearly with stride frequency (i.e., stride length increased; Fig. 3B). Instantaneous forward speed was always greater than zero and decreased during the first half of stance. It then increased during the second half of stance (Fig. 2H). The average body height measured in simulation was 1.86 ± 0.02 cm above the surface (Fig. 2F,G,I), which was 0.58 cm (48% of the standing body height 1.2 cm) higher than that at low frequencies. Body height decreased by 0.19 ± 0.02 cm (16% of the standing body height 1.2 cm) during the first half of stance and increased by the same amount during the second half of stance. Simulation revealed that grains around the intruding legs remained fluidized throughout the stance phase.

While these kinematics were different from those predicted by the rotary walking model (Fig. 3A,B, red curve), they also differed from the slow surface swimming which SandBot used at high stride frequencies [9], in which the body height remained constant while the belly lay on the surface and forward speed was small (~1 cm/s). The decrease in body height and instantaneous forward speed during the first half of stance, the monotonic increase of average forward speed with stride frequency, and the aerial phases observed in the small robot at high stride frequencies resembled those observed in the zebra-tailed lizard running on granular substrates [5], which follows a spring-loaded inverted pendulum (SLIP) model [26]. This suggested that unlike SandBot but like the zebra-tailed lizard, the small robot used a SLIP-like running mode at high frequencies.

B. Vertical ground reaction force

To understand the mechanism of the transition in locomotor mode from walking at low frequencies to running (but not swimming) at high frequencies, we examined in simulation the vertical ground reaction force, Fz, on a tripod of legs. For animals and legged robots moving on deformable or yielding substrates such as the surface of water [27] or granular surfaces [5], [9], it is critical to generate sufficient Fz to balance the weight and inertial force of the body before the legs sink too deeply into the substrate. Averaged over a cycle, lift must equal the body weight, i.e.,

    (1/(DT)) ∫ Fz dt = mg ,

where mg is the body weight of the robot, T the cycle period, and D the duty factor defined as the stance duration divided by T.

Figure 3. Performance and gait parameters. (A) Average forward speed vs. stride frequency (circles: experiment; squares: simulation). (B) Stride length vs. stride frequency (circles: experiment; squares: simulation). (C) Duty factor vs. stride frequency (simulation only). Dashed curves in (A,B) are predictions from the rotary walking model [9]. Error bars indicate standard deviation.

At low stride frequencies (e.g., 3 Hz), because duty factor was greater than (but close to) 0.5, the Fz on both tripods was close to the body weight for most of the cycle (Fig. 2E). As duty factor decreased below 0.5 with increasing stride frequency, the Fz on both tripods no longer overlapped, and the magnitude of Fz on each tripod increased. The peak of Fz increased from ~1 mg at 3 Hz (Fig. 2E) to ~6-7 mg at 12 Hz (Fig. 2J). Peak torque on a tripod about the hips measured in simulation at 12 Hz was 10 mN-m; this was less than the stall torque of the motor-gearbox system.

The large increase in Fz at high frequencies was likely a result of a material response to leg intrusion and differed from the friction-dominated yielding observed in the SandBot study. This is because the SandBot walking model based on the friction-dominated hydrostatic-like forces could only explain the mechanism governing the increase in Fz at low frequencies. At low frequencies (< 3 Hz), the granular force was friction-dominated, and therefore was assumed to be independent of intrusion speed and to increase with penetration depth. As stride frequency increased, this depth-dependent granular force increased due to the increasing inertial force associated with lifting of the body [9], resulting in an increasing penetration depth (Fig. 4A). At high stride frequencies (> 3 Hz), however, the measured leg penetration depth decreased (Fig. 4A), counter to the rotary walking model prediction. In this case, the walking model predicted a decrease in the lift force on the legs, contrary to observations (Fig. 2E,J).

This discrepancy suggests that there must be additional contribution to the force at high frequencies. Examination of leg kinematics in simulation revealed that the vertical penetration speed of the legs increased with stride frequency and reached nearly 1 m/s at 12 Hz (Fig. 4B). It is known that the granular forces during high speed impact are hydrodynamic-like and increase quadratically with impact speed [28], [29]. Our data of lift vs. vertical leg penetration speed (Fig. 5, green squares) can be described approximately by a quadratic with a non-zero y-intercept (Fig. 5, red curve), due to the finite yield stress of the medium. We hypothesize that as the vertical leg penetration speeds increase, the inertial force of the grains being accelerated by the legs becomes important and contributes significantly to the vertical ground reaction force. In other words, at high frequencies, instead of swimming, the robot runs on granular material that behaves like an "inertial fluid", much like the basilisk lizard (Basiliscus), the so-called "Jesus Christ lizard" that runs on the surface of water [27].

We used previous intrusion studies in granular media to estimate the transition frequency for the DynaRoACH robot. Studies of horizontal drag and vertical impact in granular media [24], [28], [29] suggest that inertial effects become important for intrusion speeds beyond vc ~ (2gd)^{1/2}, where d is the particle diameter and g the gravitational acceleration. For 3 mm glass particles, vc = 25 cm/s. This indicates that hydrodynamic-like force should become significant as the vertical leg penetration speed increases beyond ~25 cm/s, or as stride frequency increases past ~6 Hz (Fig. 4B). This matches the observed transition in locomotor mode around 6 Hz (Fig. 3). We posit that this transition of the propulsion mechanism from low frequency (i.e., low vertical penetration speed) walking on yielding ground to high frequency (i.e., high vertical penetration speed) running on fluidizing ground is generic to locomotion on granular media. However the frequency at which the walk-to-run transition occurs should depend on parameters associated with the granular media as well as the robot morphology and kinematics.

The capability of the small robot to run rapidly at high frequencies on granular media by using hydrodynamic-like forces, in contrast to SandBot's slow swimming, suggests that lightweight locomotors have an advantage when moving on granular surfaces. Indeed, the small robot's legs are relatively large (~1.4 cm^2) compared to its body weight (25 g) (each leg applies a pressure of 800 Pa when standing), and can generate enough hydrodynamic-like lift by paddling its legs rapidly to maintain the body well above the surface. By contrast, SandBot's legs are relatively small (~5 cm^2) compared to its body weight (2300 g) (each leg applies a pressure of ~1.5 × 10^4 Pa when standing), and cannot generate enough hydrodynamic-like lift to support the body before the legs sink deeply enough to encounter previously disturbed material over steps and trigger swimming. This may explain why the zebra-tailed lizard, the highest-performing among desert lizards of similar size, has the largest hind feet [5].

Figure 4. Limb penetration and intrusion speed measured in simulation. (A) Maximal leg penetration depth (measured in particle diameters) vs. stride frequency. (B) Maximum leg vertical penetration speed vs. stride frequency. Error bars indicate standard deviation.

Figure 5. Vertical ground reaction force vs. peak leg penetration speed measured in simulation. Dashed red curve indicates quadratic fit with fitting parameter α = 5.7×10^-3 cm^-2 s^2 and with non-zero intercept at mg. Error bars indicate standard deviation.

C. Parameter variation and broader applicability

Since both the DEM simulation of the granular medium and the MBDyn multi-body analysis are based on first principles (i.e., Newton-Euler equations of motion) and validated empirical models (i.e., Hertzian contact theory, Coulomb model of friction), our results should be applicable to a broad class of granular substrates. The simulation tool therefore allows rapid and accurate systematic variation of the properties of the locomotor (e.g., leg geometry, leg trajectories) as well as the granular substrate (e.g., size, density, friction, and hardness of particles). For example, while in experiment it is difficult to alter only one property while not changing others, we can test in simulation the effects of each substrate property. This can provide insight for both design of future multi-terrain robots, and allows testing of biological hypotheses.

To demonstrate the capabilities of our simulation, we analyzed the effect of the coefficient of particle-particle friction, μpp. Dramatic changes in locomotor performance were observed when μpp was varied (Fig. 6). For walking at low frequencies (e.g., 3 Hz, Fig. 6, filled green squares), the robot speed increased monotonically with μpp, and saturated at high μpp. In this regime, forces were dominated by friction. As μpp increased, the legs needed to penetrate less to balance body weight and inertial force. As a result, stride length and speed increased with μpp. The saturation of speed at high μpp was a consequence of leg penetration depth approaching zero with increasing μpp. Thus stride length could not increase further.

For running at high frequencies (e.g., 12 Hz, Fig. 6, empty green squares), the robot speed increased with μpp for μpp < 0.2 and decreased with μpp for μpp > 0.2. The increase of speed with μpp for μpp < 0.2 is likely because the inertial force provided by the particles being accelerated by the legs increased with μpp. We observed that multiple layers of particles directly under and around the legs were fluidized during leg penetration. We hypothesize that the size of the region of accelerating particles increases with μpp because larger μpp facilitates interlocking of the particles.

The decrease of speed with μpp between 0.2 < μpp < 0.4 for running at high frequencies is likely a result of reduction in propulsion due to an asymmetric gait. We observed that in this regime, the particles became more tightly interlocked and lost fluidity. The lift force on one tripod alone became sufficient to accelerate the robot up such that the body travelled in the air during the remainder of the cycle and the other tripod never touched the ground.

As μpp increased above 0.4, the particles became so tightly interlocked that the granular substrate behaved like rigid ground. Locomotion became unstable as the robot bounced erratically, and its movement direction changed randomly. This is similar to the unstable motion of the robot on rigid ground [15]. Forward speed was not well defined in this regime. The perfectly rigid legs of the robot used in simulation may have contributed to this instability by generating force spikes during substrate impact [30].

IV. CONCLUSIONS

Inspired by the high performing desert animals moving on granular media, we studied the locomotion of a lightweight, bio-inspired, legged robot on a granular substrate and developed an experimentally validated computer simulation of the locomotion. Kinematics measured in simulation matched experiment and enabled examination of ground reaction forces responsible for the high locomotor performance. The small robot displayed a transition in locomotor mode from walking at low frequencies to running at high frequencies. At low frequencies, hydrostatic-like forces generated during the yielding of the granular material ultimately led to solidification of the material, and the robot moved as if it were walking on a solid surface. At high frequencies, however, the inertia of the grains being accelerated became important and forces became hydrodynamic-like. In this regime the robot ran rapidly by paddling the legs on fluidized granular material which behaved like an "inertial fluid".

Figure 6. Robot average speed upon variation of particle-particle friction in simulation (filled squares: 3 Hz, empty squares: 12 Hz).
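The quadratic lift law fitted in Fig. 5 can be written down explicitly. The normalisation below (force in units of body weight mg, speed in cm/s, so that the fitted α = 5.7×10^-3 cm^-2 s^2 makes the quadratic term dimensionless) is our reading of the caption, not something stated outright:

```python
def lift_vs_speed(v_cm_s, alpha=5.7e-3, mg=1.0):
    """Quadratic fit of vertical ground reaction force vs. peak leg
    penetration speed. The non-zero intercept mg reflects the finite
    yield stress of the medium; the v**2 term models grain-inertia
    (hydrodynamic-like) forces. alpha has units cm^-2 s^2; force is
    returned in units of mg (an assumed normalisation)."""
    return mg * (1.0 + alpha * v_cm_s**2)

# A quasi-static intrusion still feels the yield-stress force:
print(lift_vs_speed(0.0))    # -> 1.0 (i.e. mg)
```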

Our results reveal that lightweight robots can achieve high locomotor performance on granular media by exploiting fluid properties of the granular material. This locomotion mode is distinct from the previously observed low frequency yielding walking strategy, and provides a better understanding of fundamental locomotive modes for a broad class of granular substrates. The lightweight robot platform enables detailed examination of legged locomotion, and provides likely the best model to date of a robot running on granular media. In addition, the integrated simulation tool we developed can be used to systematically test the effect of both locomotor and substrate properties on locomotor performance, which can guide the design, control and power consumption estimates for high-performing multi-terrain robot platforms.

Finally, we note that while experiment and simulation allow detailed investigation of mechanics of movement on granular media, a complementary approach is needed, that of low order dynamical models that can be used to gain insight into the critical mechanics of dynamical running. In our future work, we will investigate if a dynamic force law that describes the hydrodynamic-like forces during high speed leg intrusions can be obtained from measurements in DEM simulation. We posit that a generalized locomotion model similar to the Spring-Loaded Inverted Pendulum (SLIP) [26] can be developed based on the force law, and can extend our current study to more generalized conditions. This generalized model will shed light on the locomotor dynamics of legged animals and robots on granular media, as well as guide development for analytically tractable low order models.

ACKNOWLEDGMENTS

We thank Jeff Shen for the help with data collection, and Yang Ding and Paul Umbanhowar for helpful discussion.

REFERENCES

[1] U. Saranli, M. Buehler, and D. E. Koditschek, "RHex: A simple and highly mobile hexapod robot," The International Journal of Robotics Research, vol. 20, no. 7, pp. 616-631, 2001.
[2] R. T. Schroer, M. J. Boggess, R. J. Bachmann, R. D. Quinn, and R. E. Ritzmann, "Comparing cockroach and Whegs robot body motions," in Proceedings of the 2004 IEEE International Conference on Robotics and Automation, vol. 4, pp. 3288-3293, 2004.
[3] C. Wright et al., "Design of a modular snake robot," in Proceedings of the 2007 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 2609-2614, Oct. 2007.
[4] S. Kim, M. Spenko, S. Trujillo, et al., "Whole body adhesion: hierarchical, directional and distributed control of adhesive forces for a climbing robot," in Proceedings of the 2007 IEEE International Conference on Robotics and Automation, pp. 1268-1273, April 2007.
[5] C. Li, S. T. Hsieh, and D. I. Goldman, "Multi-functional foot use during running in the zebra-tailed lizard (Callisaurus draconoides)," Journal of Experimental Biology, vol. 215, no. 18, 2012.
[6] J. Matson, "Unfree Spirit: NASA's Mars Rover Appears Stuck for Good," Scientific American, vol. 302, no. 4, p. 16, 2010.
[7] H. M. Jaeger, S. R. Nagel, and R. P. Behringer, "The physics of granular materials," Physics Today, vol. 49, p. 32, 1996.
[8] R. M. Nedderman, Statics and Kinematics of Granular Materials. Cambridge: Cambridge University Press, 1992.
[9] C. Li, P. B. Umbanhowar, H. Komsuoglu, D. E. Koditschek, and D. I. Goldman, "Sensitive dependence of the motion of a legged robot on granular media," Proceedings of the National Academy of Sciences of the United States of America, vol. 106, no. 9, pp. 3029-3034, 2009.
[10] R. J. Wood, R. Sahai, E. Steltz, and R. S. Fearing, "Microrobot design using fiber reinforced composites," Journal of Mechanical Design, vol. 130, no. 5, pp. 1-11, 2008.
[11] A. M. Hoover, S. Burden, X. Fu, S. S. Sastry, and R. S. Fearing, "Bio-inspired design and dynamic maneuverability of a minimally actuated six-legged robot," in Biomedical Robotics and Biomechatronics (BioRob), pp. 869-876, 2010.
[12] P. Birkmeyer, K. Peterson, and R. S. Fearing, "DASH: A dynamic 16g hexapedal robot," in IEEE International Conference on Intelligent Robots and Systems, pp. 2683-2689, 2009.
[13] W. Mosauer, "Adaptive convergence in the sand reptiles of the Sahara and of California: A study in structure and behavior," Copeia, vol. 1932, no. 2, pp. 72-78, Jul. 1932.
[14] C. S. Crawford, Biology of Desert Invertebrates. New York: Springer, 1981.
[15] C. Li, A. M. Hoover, P. Birkmeyer, P. B. Umbanhowar, R. S. Fearing, and D. I. Goldman, "Systematic study of the performance of small robots on controlled laboratory substrates," Proceedings of SPIE Defense, Security, and Sensing Conference, 7679:76790Z (1-13), 2010.
[16] S. Vogel, Life in Moving Fluids: The Physical Biology of Flow. Princeton University Press, 1996.
[17] R. D. Maladen, Y. Ding, P. B. Umbanhowar, A. Kamor, and D. I. Goldman, "Mechanical models of sandfish locomotion reveal principles of high performance subsurface sand-swimming," Journal of The Royal Society Interface, vol. 8, no. 62, pp. 1332-1345, Mar. 2011.
[18] A. A. Biewener and R. J. Full, "Force platform and kinematic analysis," in Biomechanics: Structures and Systems: A Practical Approach, Oxford University Press, pp. 45-73, 1992.
[19] W. W. Dickinson and J. D. Ward, "Low depositional porosity in eolian sands and sandstones, Namib Desert," Journal of Sedimentary Research, vol. 64, no. 2a, pp. 226-232, 1994.
[20] M. Jerkins, M. Schröter, T. J. Senden, M. Saadatfar, and T. Aste, "Onset of mechanical stability in random packings of frictional spheres," Physical Review Letters, vol. 101, no. 1, pp. 1-4, 2008.
[21] N. Gravish, P. B. Umbanhowar, and D. I. Goldman, "Force and flow transition in plowed granular media," Physical Review Letters, vol. 105, no. 12, pp. 208301(1-4), Sep. 2010.
[22] J. Schäfer, S. Dippel, and D. Wolf, "Force schemes in simulations of granular materials," Journal de Physique I, vol. 6, no. 1, pp. 5-20, 1996.
[23] G. L. Ghiringhelli, P. Masarati, and P. Mantegazza, "Multi-body analysis of a tiltrotor configuration," Nonlinear Dynamics, vol. 19, no. 4, pp. 333-357, 1999.
[24] R. Albert, M. A. Pfeifer, A.-L. Barabási, and P. Schiffer, "Slow drag in a granular medium," Physical Review Letters, pp. 205-208, 1999.
[25] G. Hill, S. Yeung, and S. A. Koehler, "Scaling vertical drag forces in granular media," Europhysics Letters (EPL), vol. 72, no. 1, pp. 137-143, Oct. 2005.
[26] R. Blickhan, "The spring-mass model for running and hopping," Journal of Biomechanics, vol. 22, no. 11-12, pp. 1217-1227, 1989.
[27] J. W. Glasheen and T. A. McMahon, "A hydrodynamic model of locomotion in the basilisk lizard," Nature, vol. 380, no. 6572, pp. 340-341, 1996.
[28] H. Katsuragi and D. J. Durian, "Unified force law for granular impact cratering," Nature Physics, vol. 3, no. 6, pp. 420-423, Apr. 2007.
[29] D. I. Goldman and P. B. Umbanhowar, "Scaling and dynamics of sphere and disk impact into granular media," Physical Review E, vol. 77, no. 2, pp. 021308(1-14), Feb. 2008.
[30] S. Kim, J. E. Clark, and M. R. Cutkosky, "iSprawl: Design and tuning for high-speed autonomous open-loop running," The International Journal of Robotics Research, vol. 25, no. 9, pp. 903-912, 2006.

On Stochastic Optimal Control and Reinforcement
Learning by Approximate Inference
Konrad Rawlik∗ , Marc Toussaint† and Sethu Vijayakumar∗
∗School of Informatics, University of Edinburgh, UK
† Department of Computer Science, FU Berlin, Germany

Abstract—We present a reformulation of the stochastic optimal control problem in terms of KL divergence minimisation, not only providing a unifying perspective of previous approaches in this area, but also demonstrating that the formalism leads to novel practical approaches to the control problem. Specifically, a natural relaxation of the dual formulation gives rise to exact iterative solutions to the finite and infinite horizon stochastic optimal control problem, while direct application of Bayesian inference methods yields instances of risk sensitive control. We furthermore study corresponding formulations in the reinforcement learning setting and present model free algorithms for problems with both discrete and continuous state and action spaces. Evaluation of the proposed methods on the standard Gridworld and Cart-Pole benchmarks verifies the theoretical insights and shows that the proposed methods improve upon current approaches.

I. INTRODUCTION

In recent years the framework of stochastic optimal control (SOC) [20] has found increasing application in the domain of planning and control of realistic robotic systems, e.g., [6, 14, 7, 2, 15] while also finding widespread use as one of the most successful normative models of human motion control [23]. In general, SOC can be summarised as the problem of controlling a stochastic system so as to minimise expected cost. A specific instance of SOC is the reinforcement learning (RL) formalism [21] which does not assume knowledge of the dynamics or cost function, a situation that may often arise in practice. However, solving the RL problem remains challenging, in particular in continuous spaces [16].

A recent, promising direction in the field has been the application of inference methods [1] to these problems, e.g., [10, 22, 24]. In this context, we introduce a generic formulation of the SOC problem in terms of Kullback-Leibler (KL) divergence minimisation. Although the arising KL divergences can, in general, not be minimised in closed form, we provide a natural iterative procedure that results in algorithms that we prove to asymptotically converge to the exact solution of the SOC problem. Specifically, algorithms for both finite and infinite horizon problems are derived and their corresponding formulations in the RL setting are introduced. We show that the latter corresponds to the independently derived result of [5] for the specific case of infinite horizon discrete problems; here, we extend this to problems with continuous actions.

Formulation of SOC problems in terms of KL minimisation has been previously studied by, amongst others, [22], [11] and [10], leading to efficient methods for both stochastic optimal control [22] and RL [7]. However, as we will discuss, these cases make restrictive assumptions about the problem dynamics and costs which can be relaxed under our framework, besides providing a unifying and generic formalism.

Finally, we are able to clarify the relation of SOC and the inference control formulation by [24, 17, 26], which allows for arbitrary problems, showing it to be an instance of risk sensitive control. The generalisation of this relation given by our approach makes it possible to apply out of the box inference methods to obtain approximate optimal policies. This is of particular interest in the case of continuous problems – here approximations are unavoidable since explicit representations are often not available.

II. PRELIMINARIES

A. Stochastic Optimal Control

We will consider control problems which can be modeled by a Markov decision process (MDP). Using the standard formalism, see also e.g., [21], let x_t ∈ X be the state and u_t ∈ U the control signals at times t = 1, 2, . . . , T. To simplify the notation, we shall denote complete state and control trajectories x_{1...T}, u_{0...T} by x̄, ū. Let P(x_{t+1}|x_t, u_t) be the transition probability for moving from x_t to x_{t+1} under control u_t and let C_t(x, u) ≥ 0 be the cost incurred per stage for choosing control u in state x at time t. Let policy π(u_t|x_t) denote the conditional probability of choosing the control u_t given the state x_t. In particular a deterministic policy is given by a conditional delta distribution, i.e. π(u_t|x_t) = δ_{u_t = τ(x_t)} for some function τ. The SOC problem consists of finding a policy which minimises the expected cost, i.e., solving

    π* = argmin_π ⟨ Σ_{t=0}^T C_t(x_t, u_t) ⟩_{q_π} ,    (1)

where ⟨·⟩_{q_π} denotes the expectation with respect to

    q_π(x̄, ū|x_0) = π(u_0|x_0) Π_{t=1}^T π(u_t|x_t) P(x_{t+1}|x_t, u_t) ,    (2)

the distribution over trajectories under policy π.

In the case of infinite horizon problems, i.e. we let T → ∞, we will consider the discounted cost setting and specifically assume that C_t(x_t, u_t) = γ^t C•(x_t, u_t), where C• is a time stationary cost and γ ∈ [0, 1] a discount factor.
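Equations (1) and (2) can be made concrete on a toy finite MDP. The two-state chain, transition model, and cost below are invented for illustration; the evaluation simply enumerates all trajectories, weights them by q_π, and sums the stage costs:

```python
import itertools

# Toy MDP: states {0,1}, controls {0,1}, horizon T = 2.
T = 2
states, controls = (0, 1), (0, 1)

def P(x_next, x, u):
    """Transition probability: control u pushes the state towards u."""
    return 0.8 if x_next == u else 0.2

def C(t, x, u):
    """Cost per stage: being in state 1 or applying control 1 is penalised."""
    return x + 0.5 * u

def expected_cost(policy, x0=0):
    """Brute-force evaluation of eq. (1): enumerate every trajectory,
    weight it by the trajectory distribution q_pi of eq. (2), and
    accumulate the summed stage costs."""
    total = 0.0
    for us in itertools.product(controls, repeat=T + 1):
        for xs in itertools.product(states, repeat=T):
            traj_x = (x0,) + xs
            prob = policy(us[0], traj_x[0])
            for t in range(1, T + 1):
                prob *= policy(us[t], traj_x[t]) * P(traj_x[t], traj_x[t - 1], us[t - 1])
            total += prob * sum(C(t, traj_x[t], us[t]) for t in range(T + 1))
    return total

uniform = lambda u, x: 0.5                       # fully stochastic policy
stay_at_0 = lambda u, x: 1.0 if u == 0 else 0.0  # deterministic policy
print(expected_cost(stay_at_0) < expected_cost(uniform))   # -> True
```

Even on this tiny problem the deterministic policy that always pushes towards the cheap state beats the uniform policy, matching the intuition that the SOC optimum in eq. (1) is attained by a deterministic policy.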

Fig. 1: The graphical model for the Bayesian formulation of the control problem in the finite horizon case, with control variables u_0, . . . , u_T, states x_0, . . . , x_T, and task variables r_0, . . . , r_T. In the infinite horizon case we obtain a stochastic Markov process.

B. Inference Control Model

A Bayesian inference based approximation of the above control problem can be formulated [24] as illustrated in Fig. 1. In addition to the state and control variables of classical SOC, a binary dynamic random task variable r_t is introduced and the task likelihood is related to the classical cost by choosing P(r_t = 1|x_t, u_t) = exp{−ηC(x_t, u_t)}, where η > 0 is some constant in analogy with the inverse temperature of a Boltzmann distribution. For some given policy π and assuming the artificial observations r_{0...T} = 1, we denote the unnormalised posterior by p_π(x̄, ū):

    p_π(x̄, ū) = P(x̄, ū, r̄ = 1|x_0)
              = q_π(x̄, ū) Π_{t=0}^T exp{−ηC_t(x_t, u_t)} .    (3)

C. General Duality

While the Bayesian model has been employed successfully for trajectory planning, e.g., in [24], its relation to the classical SOC problem remained unclear. Although a specific subset of SOC problems, studied by [11] and [22], can be formulated in a similar Bayesian model, as explicitly done by [10] (we discuss the relation to this work further in III-D3), here, we establish the formal correspondence between the two formalisms in the general case with the following result:

Theorem 1. Let π^0 be an arbitrary stochastic policy and D the set of deterministic policies, then the problem

    π* = argmin_{π ∈ D} KL(q_π(x̄, ū) || p_{π^0}(x̄, ū))    (4)

is equivalent to the stochastic optimal control problem (1) with cost per stage

    Ĉ_t(x_t, u_t) = C_t(x_t, u_t) − (1/η) log π^0(u_t|x_t) .    (5)

Proof: see Supplementary Material^1.

As an immediate consequence we may recover any given stochastic optimal control problem with cost C_t by choosing π^0(·|x) to be the uniform distribution over U^2.

III. ITERATIVE SOLUTIONS

Although Theorem 1 provides the correspondence between the SOC formulation and the computationally attractive inference control approach, due to the constraint π ∈ D, (4) remains as intractable as the classical formulation via the Bellman equation. However relaxation of this constraint to allow minimisation over arbitrary stochastic policies provides a closed form solution, and although it does not directly lead to an optimal policy, we have the following result:

Theorem 2. For any π ≠ π^0, KL(q_π || p_{π^0}) ≤ KL(q_{π^0} || p_{π^0}) implies ⟨C(x̄, ū)⟩_{q_π} < ⟨C(x̄, ū)⟩_{q_{π^0}}.

Proof: see Supplementary Material.

Consequently, with some initial π^0, the iteration

    π^{n+1} ← argmin_π KL(q_π || p_{π^n}) ,    (6)

where π is an arbitrary^3 conditional distribution over u, gives rise to a chain of stochastic policies with ever decreasing expected costs.

We would like to note that the conditions imposed by the above result, in order to guarantee a policy improvement, are relatively weak. By exploiting this, in addition to the iteration arising from (6), we present a relaxation which satisfies Theorem 2 and leads to practical algorithms for infinite horizon problems, and the related iteration of Bayesian inference which leads to risk-sensitive control.

A. Exact Minimisation - Finite Horizon Problems

The general minimisation in iteration (6) can, as previously indicated, be performed in closed form and the new policy (for derivation, see Supplementary Material) is given by the Boltzmann like distribution

    π^{n+1}(u_t|x_t) = exp{Ψ_t^{n+1}(x_t, u_t) − Ψ̄_t^{n+1}(x_t)} ,    (7)

with energy

    Ψ_t^{n+1}(x_t, u_t) = log π^n(u_t|x_t) + log P(r_t = 1|x_t, u_t)
                          + Σ_{x_{t+1}} P(x_{t+1}|x_t, u_t) Ψ̄_{t+1}^{n+1}(x_{t+1})    (8)

and log partition function

    Ψ̄_t^{n+1}(x_t) = log Σ_u exp{Ψ_t^{n+1}(x_t, u)} .    (9)

In the finite horizon case, the policy can therefore be computed backwards in time.

1) Convergence Analysis: Following [12], we bound the progress of the trajectory posterior under policy π^n towards the corresponding distribution under some chosen π̂, obtaining

Lemma 3. Let the sequence {π^n} be generated by (6) and let π̂ be an arbitrary (stochastic) policy. Then

    KL(q_π̂ || q_{π^{n+1}}) − KL(q_π̂ || q_{π^n}) ≤ η⟨C(x̄, ū)⟩_{q_π̂} − η⟨C(x̄, ū)⟩_{q_{π^{n+1}}} .    (10)

1 Supplementary Material can be found at http://arxiv.org/abs/1009.3958
2 n.b., formally we require U to be finite or bounded
3 n.b., formally certain assumptions have to be made to ensure the support of q_π is a subset of the support of p_{π^n}

354
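As an illustration of Section III-A, the backward recursion (7)–(9) can be written down directly for the tabular case. The following is a minimal sketch (ours, not from the paper; the array shapes, names, and the terminal condition Ψ̄ ≡ 0 beyond the horizon are our own assumptions):

```python
import numpy as np

def backward_policy_update(P, R, log_pi):
    """One exact backward sweep of (7)-(9) for a finite-horizon tabular MDP.

    P:      (S, A, S) array, P[x, u, y] = P(x_{t+1}=y | x_t=x, u_t=u)
    R:      (S, A) array, R[x, u] = log P(r_t=1 | x_t=x, u_t=u) = -eta * C(x, u)
    log_pi: (T, S, A) array, log of the current policy pi_t^n(u | x)
    Returns the improved log-policy, same shape as log_pi.
    """
    T, S, A = log_pi.shape
    new_log_pi = np.empty_like(log_pi)
    psi_bar_next = np.zeros(S)  # assume \bar{Psi}_{T+1}^{n+1} = 0 beyond the horizon
    for t in reversed(range(T)):
        # Energy (8): log pi_t^n + log P(r_t=1|x,u) + E_{x_{t+1}}[ \bar{Psi}_{t+1}^{n+1} ]
        psi = log_pi[t] + R + P @ psi_bar_next
        # Log partition function (9), computed stably per state
        psi_bar = np.logaddexp.reduce(psi, axis=1)
        # New Boltzmann-like policy (7); normalised over actions by construction
        new_log_pi[t] = psi - psi_bar[:, None]
        psi_bar_next = psi_bar
    return new_log_pi
```

Iterating such sweeps realises the minimisation (6) in closed form; each sweep proceeds backwards in time, as noted after (9).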
Proof: See Supplementary Material.

Summing the above bound over 0 . . . N, we can compute the bound

  (1/N) Σ_{n=1}^{N+1} ⟨C(x̄, ū)⟩_{q_{π^n}} ≤ ⟨C(x̄, ū)⟩_{q_π̂} + (1/(ηN)) KL(q_π̂ || q_{π^0}) ,   (11)

on the average expected cost of the policies π^1 . . . π^{N+1}. Now, since Theorem 2 guarantees that the expected cost for each π^n is non increasing with n, using (11), we can obtain the following stronger convergence result.

Theorem 4. Let {π^n} be a sequence of policies generated by (6), with π^0 s.t. π^0(·|x) has support U for all x ∈ X. Then

  lim_{n→∞} ⟨C(x̄, ū)⟩_{q_{π^n}} = min_π ⟨C(x̄, ū)⟩_{q_π} .   (12)

Proof: See Supplementary Material.

B. Asynchronous Updates - Infinite Horizon Problems

In the infinite horizon setting, discussed in II-A, it is easy to show that the time stationary analog of (8) can be obtained as

  Ψ^{n+1}(x, u) = log π^n(u|x) + log P(r = 1|x, u) + γ Σ_y P(y|x, u) Ψ̄^{n+1}(y) .   (13)

However, due to the form of Ψ̄^{n+1}, this does not yield Ψ^{n+1} in closed form. To obtain a practical solution we make use of the relatively weak conditions given by Theorem 2 for obtaining a lower expected cost, which allow us to consider the minimisation in (6) over some iteration dependent subset P_n of the set of all (stochastic) policies. Then, Theorem 2 guarantees the expected costs to be non increasing, if for all n, π^n ∈ P_n.

Such iterations admit asynchronous updates as an interesting case, i.e., updating one or several time steps of the policy at each iteration in any particular order. Formally, we choose a schedule of time step sets T_0, T_1, . . . and let P_n = {π : ∀t ∉ T_n, π_t = π_t^n}. Specifically, we will consider the schedule for such updates given by T_n = {0, . . . , n − 1}, i.e., in each iteration we consider finite horizon problems with increasing horizon. Such a schedule leads to the update π_{t+1}^{n+1} = π_t^n for all t > 0, while the new first step policy, π_0^{n+1}, is of the form (7) and obtained via

  Ψ_0^{n+1}(x, u) = Ψ_0^n(x, u) − Ψ̄_0^n(x) + log P(r = 1|x, u) + γ Σ_{x'} P(x'|x, u) Ψ̄_0^n(x') ,   (14)

hence yielding a practical iteration which has a strong analogy to value iteration, see e.g., [21].

1) Convergence Analysis: Essentially equivalent convergence results to the finite horizon case can be obtained for the asynchronous algorithm (14) in the infinite horizon setting. Informally, we proceed by assuming that the cost is bounded and consider finite horizon problems with growing horizon, bounding the expected cost of the infinite tail. Due to the assumption that the cost is discounted, the expected cost of the tail goes to zero as the horizon increases, leading to a result analogous to Theorem 4 (see Supplementary Material for formal proof).

C. Posterior Policy Iteration

Since our starting point was the relaxation of the relation between SOC and inference control, it is interesting to consider sequential inference of the posterior policy, which is the natural iteration arising in the latter framework. Such an iteration is of particular interest as posterior inference is a well studied problem with a large range of approximate algorithms [1] which could be exploited for practical implementations. Although unconstrained minimisation of the KL divergence is achieved by the posterior, in our case the specific form of q_π in (6) is, as can be seen in (2), restricted by the prescribed system dynamics, leading to the results presented in the last sections. Nonetheless, we may consider the iteration

  π^{n+1} = p_{π^n}(u_t|x_t) ,   (15)

which, as we show (see Supplementary Material), will converge to the policy

  π̃ = argmin_π −(1/η) log ⟨exp{−η C(x̄, ū)}⟩_{q_π} .   (16)

The objective being minimized is exactly the risk sensitive objective of [8], which has recently also been used in the path integral approach to SOC [3]. In particular, note that for η → 0, we obtain the classical risk neutral controls, allowing near optimal policies for arbitrary SOC problems to be computed by iterated Bayesian inference.

D. Relation to Previous Work

1) Dynamic Policy Programming (DPP): The recently introduced DPP algorithm [5] is closely related to the formalism described here. Specifically, while the update equations (14) coincide, we provide a more general view of DPP by deriving it as a special case of the novel result in Theorem 2. In addition, III-A provides the direct extension of DPP to finite horizon problems, while the convergence proofs of III-B extend those given by [5] to continuous state and action spaces.

2) Approximate Inference Control (AICO): The AICO [24] approach to trajectory optimisation shares the same Bayesian model used as a starting point here (cf. II-B). However, although using local LQG approximations AICO converges to locally optimal trajectories, the relation to the classical SOC problem remained unclear. We not only establish such a formal relation, but also note that AICO can be interpreted as one step of the posterior policy iteration introduced in III-C. More specifically, if one were to use the maximum likelihood policy obtained by AICO one would obtain (approximate) optimal risk seeking controls.
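The risk-sensitive objective in (16) is easy to probe numerically. Below is a small illustrative sketch (ours, not from the paper) of a Monte-Carlo estimate of −(1/η) log⟨exp{−ηC(x̄, ū)}⟩ from sampled trajectory costs; as η → 0 it approaches the risk-neutral expected cost, while larger η increasingly favours the cheapest outcomes, i.e. risk seeking behaviour:

```python
import numpy as np

def risk_sensitive_cost(costs, eta):
    """Monte-Carlo estimate of -(1/eta) * log < exp(-eta * C) >, cf. (16).

    costs: samples of the total trajectory cost C(xbar, ubar) under q_pi.
    A log-sum-exp is used for numerical stability.
    """
    costs = np.asarray(costs, dtype=float)
    log_mean = np.logaddexp.reduce(-eta * costs) - np.log(costs.size)
    return -log_mean / eta

costs = [1.0, 2.0, 3.0, 10.0]            # hypothetical sampled trajectory costs
print(risk_sensitive_cost(costs, 1e-6))  # ≈ 4.0, the risk-neutral mean cost
print(risk_sensitive_cost(costs, 5.0))   # ≈ 1.28, pulled towards min(costs) = 1
```

The estimate is non-increasing in η, consistent with the remark that η → 0 recovers the classical risk-neutral controls.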
3) Path Integral Control: Let us briefly recall the KL control framework [10], the alternative formulations in [22] being equivalent for our purposes. Choose some free dynamics ν_0(x_{t+1}|x_t) and let the cost be given as

  C(x̄) = ℓ(x̄) + log( ν(x̄) / ν_0(x̄) )

where ν(x_{t+1}|x_t) is the controlled process under some policy. Then

  ⟨C(x̄)⟩_ν = KL( ν(x̄) || ν_0(x̄) exp{−ℓ(x̄)} )   (17)

is minimised w.r.t. ν by

  ν(x_{1:T}|x_0) = (1/Z(x_0)) exp{−ℓ(x_{1:T})} ν_0(x_{1:T}|x_0)   (18)

and one concludes that the optimal control is given by ν(x_{t+1}|x_t), where the implied meaning is that ν(x_{t+1}|x_t) is the trajectory distribution under the optimal policy.

Although (18) gives a process which minimises (17), it is not obvious how to compute the actual controls u_t. Specifically, when given a model of the dynamics, i.e., P(x_{t+1}|x_t, u_t), and having chosen some ν_0, a non trivial, yet implicitly made, assumption is that there exists a policy implementing the required transitions ν(x_{t+1}|x_t), i.e., ∃π s.t.

  KL( Σ_{u_t} P(x_{t+1}|x_t, u_t) π(u_t|x_t) || ν(x_{t+1}|x_t) ) = 0 .   (19)

However, in general, such a π will not exist. This is made very explicit for the discrete MDP case in [22], where it is acknowledged that the method is only applicable if the dynamics are fully controllable, i.e., P(x_{t+1}|x_t, u_t) can be brought into any required form by the controls. Although in the same paper it is suggested that solutions to classical problems can be obtained by continuous embedding of the discrete MDP, such an approach has several drawbacks. For one, it requires solving a continuous problem even for cases which could have been otherwise represented in tabular form; but more importantly, such an approach is obviously not applicable to problems which already have continuous state or action spaces.

In the case of problems with continuous states and actions we may consider the specific form

  x_{t+1} = F(x_t) + B(u_t + ξ),  ξ ∼ N(0, Q) ,
  C_t(x_t, u_t) = ℓ(x_t) + u_t^T H u_t ,   (20)

with F, B and ℓ having arbitrary form, but H, Q such that H ∝ B^T Q^{-1} B. This is of interest, as it is the discrete time form of the fully controllable continuous time problem which underlies the path integral approach [11]. It has also been claimed, e.g., [10], that, analogously to the continuous time case, the solution of this problem is given by (18). However, considering the simple instance of a one step LQG problem, we see that (19) will not hold, as in this case the variance of P(x_1|x_0, u_0) is uncontrolled. Hence ν is not the trajectory distribution under the optimal policy. Furthermore, it is straightforward to convince oneself that attempting to find the policy implementing the best realisable transition, i.e., the relaxation of (19) to

  argmin_{π ∈ D} KL( Σ_{u_t} P(x_{t+1}|x_t, u_t) π(u_t|x_t) || ν(x_{t+1}|x_t) ) ,

also does not lead to the desired result.

However, for problems of the specific form (20), a closer relation between Theorem 1 and (17) does indeed exist. To illustrate this, we write the KL divergence of Theorem 1 in terms of the state trajectory (x̄) marginals as

  KL(q_π(x̄, ū) || p_{π^0}(x̄, ū)) = KL(q_π(x̄) || ν(x̄)) − ⟨ (1/2) m_t^T Q^{-1} B u_t − u_t^T H u_t ⟩_{q_π(x̄, ū)} ,

where m_t = x_{t+1} − F(x_t). Furthermore, since for a deterministic policy, i.e. π(u_t|x_t) = δ_{u_t = τ(x_t)},

  ⟨m_t⟩_{q_π} = ⟨B u_t⟩_{q_π} = B τ(x_t) ,

the second term is zero under the condition required, i.e., H = (1/2) B^T Q^{-1} B, and, analogous to (17), it is sufficient to consider the distributions over state trajectories only.

In conclusion, for discrete time problems, the work of [10, 22] constitutes special cases of Theorem 1, which either assume fully controllable dynamics or where the control trajectories can be marginalised from Theorem 1.

4) Expectation Maximisation: Several suggestions for mapping the SOC problem onto a maximum likelihood problem and using Expectation Maximization (EM) have been made recently in the literature, e.g., [25], and, going further back, the probability matching approach [4, 19] also has close links with EM. Considering (6), the proposed approach has a close relation to the free energy view of EM. Given a free energy

  F(q̃, π) = log P(r̄ = 1; π) − KL(q̃ || p_π)   (21)
          = ⟨log P(r̄ = 1, x̄, ū; π)⟩_q̃ + H(q̃) ,   (22)

EM alternates between minimizing KL(q̃ || p_π) w.r.t. q̃ in (21) and maximising the free energy w.r.t. the potentially infinite parameter vector π in (22). Our iteration of (6) deviates from this standard EM in that the KL-minimization in (6) is w.r.t. a constrained q̃, namely one which can be generated by a control π. The M-step is then trivially assigning the new π to the one corresponding to q̃. The constrained E-step departs from standard EM but is a special case of the alternating minimisation procedures of [9]. Importantly however, unlike the previously mentioned EM based approaches, which can only guarantee convergence to a local extremum, we have demonstrated algorithms with guaranteed convergence to the global optimum.

IV. REINFORCEMENT LEARNING

We now turn to the RL setting [21], where one aims to learn a good policy given only samples from the transition probability and associated incurred costs. As RL usually considers the discounted cost infinite horizon setting we concentrate on this case, with the understanding that equivalent steps can
be taken in the finite horizon case. We note that for any given x, u the update of (14) can be written as an expectation w.r.t. the transition probability P(y|x, u), and hence may be approximated from a set of sampled transitions. In particular, given a single sample (x, u, R, y) of a transition from x to y under control u, obtaining reward R = log P(r = 1|x, u), we may perform the approximate update

  Ψ(x, u) ← Ψ(x, u) + α ( R + γ Ψ̄(y) − Ψ̄(x) ) ,   (23)

with α a learning rate, and for trajectory data applying such an update individually for each tuple (x_t, u_t, R_t, x_{t+1}).

A. Relation to Classical Algorithms

Before proceeding let us highlight certain similarities and differences between (23) and two classical algorithms, Q-learning and TD(0) [21].

The Q-learning algorithm learns the state-action value function. We note that Ψ has certain similarities to a Q function, in the sense that a higher value of Ψ for a certain control in a given state indicates that the control is 'better' – in fact, for the optimal controls the Q function and Ψ converge to the same absolute value (see Supplementary Material). However, unlike the Q function, which also converges to the expected cost for the sub-optimal controls, Ψ goes to −∞ for sub-optimal actions. A potentially more insightful difference between the two algorithms is the nature of the updates employed. The Q-learning algorithm uses updates of the form

  Q(x, u) ← Q(x, u) + α ( R + γ max_{u'} Q(y, u') − Q(x, u) ) ,

where α is a learning rate. Note that it employs information from the current command and the single best future command under current knowledge. The proposed algorithm, on the other hand, uses a soft-max operation by employing Ψ̄, averaging over information about the future according to the current belief about the control distribution, hence taking uncertainty arising from, e.g., sampling into account.

On the other hand, the TD(0) algorithm, which learns through value function approximation, has updates of the form

  V(x) ← V(x) + α [ R + γ V(y) − V(x) ] ,

with α again a learning rate. Since it can be shown that Ψ̄ converges to the value function of the optimal policy (cf. Supplementary Material), the proposed update converges towards the TD(0) update for samples generated under the optimal policy. In particular, while TD(0) is an on-policy method and learns the value function of the policy used to generate samples, the proposed method learns the value function of the optimal policy directly.

B. RL with continuous states and actions

One needs to use parametric representations [21] to store Ψ when tabular means are no longer viable or efficient, as is the case with high dimensional, large discrete [5] or continuous state and control spaces. Similar to numerous previous approaches, e.g., [5], we used a linear basis function model to approximate Ψ, i.e.,

  Ψ(x, u) ≈ Ψ̃(x, u, w) = Σ_{m=0}^{M} w_m φ_m(x, u)   (24)

where φ_m : X × U → R are a set of given basis functions and w = (w_1, . . . , w_M) is the vector of parameters that are optimised. For such an approximation, and given a set of samples (x_{1...K}, u_{1...K}, R_{1...K}, y_{1...K}), the updates (8) and (23) can be written in matrix notation as

  Φ w^{n+1} = Φ w^n + z ,   (25)

where Φ is the K × M matrix with entries Φ_{i,j} = φ_j(x_i, u_i) and z is the vector with elements

  z_k = γ Ψ̄(y_k) + R_k − Ψ̄(x_k) .   (26)

This suggests an update rule of the form

  w ← w + (Φ^T Φ)^{-1} Φ^T z .   (27)

The choice of basis functions is somewhat complicated by the need to evaluate the log partition function of the policy Ψ̄, i.e. log ∫_u exp{Ψ̃(x, u)}, when forming the vector z. In cases where U is a finite set, arbitrary basis functions can be chosen, as the integral reduces to a finite sum. However, for problems with infinite (or continuous) control spaces, bases need to be chosen such that the resulting integral is analytically tractable, i.e. the partition function of the stochastic policy can be evaluated. One class of such basis sets is given by those Ψ̃(x, u, w) that can be brought into the form

  Ψ̃(x, u, w) = −(1/2) u^T A(x, w) u + u^T a(x, w) + a_0(x, w) ,   (28)

where A(x, w) is a positive definite matrix-valued function, a(x, w) is a vector-valued function and a_0(x, w) a scalar function. For such a set, the integral is of the Gaussian form and the closed form solution

  log ∫_u exp{Ψ̃} = −(1/2) log |A| + (1/2) a^T A^{-1} a + a_0 + constant   (29)

is obtained. This gives us a recipe to employ basis functions that lead to tractable computations, and the policy can be computed as π(u|x, w) = N(u | A^{-1} a, A^{-1}).

V. EXPERIMENTS

A. Gridworld - Analytical Infinite Horizon RL

We start by evaluating the proposed algorithm (23) on a problem used in [22], with finite state and action spaces, which allows a tabular representation of Ψ. The state space is given by an N × N grid (see Fig. 2(b)) with some obstacles. The control can move the state to any adjacent ones not occupied by an obstacle, and the move succeeds with a probability of 0.8. Additionally, a set A ⊆ X of absorbing target states was defined and the agent incurs a cost of 1 at all states other than the target, i.e., C(x, u) = δ_{x ∉ A} with δ the Kronecker delta. The cost was not discounted. We benchmark performance against tabular Q-learning [21]. Both algorithms were given
data from episodes generated with controls sampled from an uninformed policy. Once a target state was reached, or if the target wasn't reached within 100 steps, the state was reset randomly. The learning rate for Q-learning decayed as α = c/(c + k), with k the number of transitions sampled and c a constant which was optimised manually. Representative results are illustrated in Fig. 2. We plot the approximation error

  e_J = max_x |J(x) − Ĵ(x)| / max_x J(x)   (30)

between the true value function J, obtained by value iteration, and its estimate Ĵ, given by Ψ̄ and max_u Q(x, u) respectively.

Fig. 2: Results from the Gridworld problem. (a) Evolution of the mean error in (30) averaged over 10 trials, with error bars indicating the s.d. (b) Optimal value function (white low expected cost - black high expected cost) of the problem. Obstacles are black and the target state is indicated by *.

Both algorithms achieved the same error at convergence, but the proposed algorithm (Ψ-learning) consistently required fewer samples than Q-learning for convergence – this is consistent with the discussion in IV-A. We additionally considered an online variant of Ψ-learning where the controls are sampled from the policy given by the current Ψ, i.e. π(u|x) = exp{Ψ(x, u) − Ψ̄(x)}. As expected, the online version outperformed sampling using an uninformed policy. The aim of this evaluation, besides providing a sanity check of the working of the algorithm, was to illustrate that the proposed method provides similar performance advantages as obtained for the restricted class of problems considered in [22], despite working in the product space of states and actions, as necessitated by considering the unrestricted SOC problem.

B. Cart-Pole System

We now move on to problems with continuous state and action spaces which will make approximations necessary, demonstrating that the theoretical results presented in III can lead to practical algorithms. Specifically, we will consider both an approximate inference approach for implementing the posterior policy iteration of III-C on a finite horizon problem, and the basis function based approach, discussed in IV-B, to the RL version of the asynchronous updates for infinite horizon problems derived in III-B.

We have chosen the classical Cart-Pole problem [21], which has been repeatedly used as a benchmark in reinforcement learning, e.g., [18]. This plant, illustrated in Fig. 3a, consists of an inverted pendulum which is mounted on a cart and is controlled by exerting forces on the latter. Formally, the state space is given by x = (x, ẋ, θ, θ̇), with x the position of the cart, θ the pendulum's angular deviation from the upright position and ẋ, θ̇ their respective temporal derivatives. Neglecting the influence of friction, the continuous time dynamics of the state are given by

  θ̈ = [ g sin(θ) + cos(θ) (−c_1 u − c_2 θ̇² sin(θ)) ] / [ (4/3) l − c_2 cos²(θ) ]
  ẍ = c_1 u + c_2 ( θ̇² sin(θ) − θ̈ cos(θ) )   (31)

with g = 9.8 m/s² the gravitational constant, l = 0.5 m the pendulum length, and constants c_1 = (M_p + M_c)^{-1} and c_2 = l M_p (M_p + M_c)^{-1}, where M_p = 0.1 kg, M_c = 1 kg are the pendulum and cart masses, respectively. The control interval was 0.02 s and the dynamics were simulated using the fourth order Runge-Kutta method. Stochasticity was introduced by adding zero mean Gaussian noise, with small diagonal covariance, to the new state. These settings correspond to those used by comparative evaluations in [18].

1) Model Based Posterior Policy Iteration: First, we consider a finite horizon optimal control problem, assuming we have access to both the plant dynamics and cost function. The exact algorithm of III-A does not lend itself easily to this setting, due to the intractable integrals arising in (8) as a consequence of the nonlinear dynamics – although we note that by taking local LQG approximations of the problem, closed form updates can be derived. However, we demonstrate that by using standard approximate inference techniques, the alternative posterior policy iteration in III-C can yield good approximate optimal policies.

Specifically, we consider the swing up task in which the pendulum has to be moved from a hanging down to an upright position and balanced. The per-step cost for this task is given by

  C_t(x_t, u_t) = ω_1 θ² + ω_2 θ̇² + ω_3 u_t²  ∀t ∈ [0, T] ,   (32)

where ω is a vector of weights. The time horizon was T = 3 s, but note that, since a cost is incurred in each time step for pendulum positions away from rest in the upright position, a rapid swing up followed by holding is encouraged.

As the posterior p_π(x̄, ū) is not tractable in this setting, we use an extended Kalman smoother [20] to estimate a Gaussian approximation to the full posterior, leading to a Gaussian posterior policy. As a consequence of the Gaussian approximation and inference method chosen, inference is required to be performed only once, for p_{π^0}(x̄, ū), and the eventual result of the iteration (15) can be obtained as the linear policy given by the mean of the posterior policy.

In Fig. 3, we plot the expected costs and the cost variances, both estimated by sampling under the obtained policies, for different values of the parameter η. For reference, we also
show the expected cost from the policy obtained using the iLQG algorithm [13], which also computes an approximately optimal linear policy.

Fig. 3: Results for model based approximate posterior policy iteration on the Cart-Pole swing-up task. (a) Schematic of the pole on cart plant used in the experiments. (b) Expected cost achieved by policies obtained for different values of the parameter η. The dashed line indicates the expected cost of the policy obtained using iLQG. All values estimated from 1000 trajectories sampled using the respective policy. (c) Variance of the costs achieved by the same policies as in (b).

Fig. 4: Results for RL with continuous state and action spaces. (a) Length of training episodes, averaged over blocks of 25 episodes, for Ψ-Learning, when initialized with an uninformed policy. The vertical dashed line indicates the point at which initial policies for the results in the subsequent comparison experiment were picked. Error bars indicate s.d. (b) Comparison of the evolution of the expected cost between eNAC and Ψ-Learning. Both methods are initialised with the same stabilising policies (cf. (a)) and results averaged over 10 trials with error bars indicating s.d.

We first observe that, as predicted by III-C, η acts to control the risk seeking behavior of the policy, and for increasing values of η the cost variance increases substantially. Furthermore, we note that the choice of η = 1, which, as discussed, corresponds to the AICO setting, leads to results substantially different from the case of classical (risk neutral) optimal control. However, reducing η rapidly leads to policies obtained by approximate inference which exhibit similar performance to those obtained by classical approximate methods.

2) RL with approximations: To evaluate the RL approach proposed in IV-B we consider the balancing task, following closely the procedures in [18], where this task was used for evaluation of policy gradient methods.

The task, which consists of stabilising the pendulum in the upright position while simultaneously keeping the cart at the center of the track, had the cost function

  C(x, u) = 0 if (x, θ) in target set;  C(x, u) = ω_θ θ² + ω_x x² otherwise ,   (33)

where the target was given by x ∈ [−0.05 m, 0.05 m] and θ ∈ [−0.05 rad, 0.05 rad], and the discount rate was γ = 0. We chose this cost as we found it to give better results for uninformed initial policies, for which the piecewise constant cost of [18] provided little information.

The linear policy learned in [18] corresponds to a second order polynomial basis for Ψ in the proposed method (Ψ-Learning). Specifically, we used the basis set

  {u², ux, uẋ, uθ, uθ̇, x², xẋ, xθ, xθ̇, ẋ², ẋθ, ẋθ̇, θ², θθ̇, θ̇²} ,

which is of the form (28) and indeed only constitutes an approximation to the true Ψ, as the problem is non-LQG.

Episodes were sampled with starting states drawn such that θ ∈ [−0.2 rad, 0.2 rad] and x ∈ [−0.5 m, 0.5 m], and controls were sampled from the stochastic policy given by the current parameters. During training, episodes were terminated if the plant left the acceptable region θ ∈ [−0.2 rad, 0.2 rad] and x ∈ [−0.5 m, 0.5 m], or after 200 steps. Policy parameters were updated every 10 episodes, and every 5 updates policies were evaluated by sampling 50 episodes of 500 step length using the mean of the policy. All results were averaged over 10 trials. The learning rate parameter for policy gradient methods was adjusted manually for best results.

Despite the change in cost function, like [18], we were not able to reliably obtain good policies from uninformed initialisation when using policy gradient methods. Our method, on the other hand, when initialised with an uninformed policy, i.e., zero mean and a variance of 10, was able to learn a stabilising policy within 400 training episodes. This is illustrated in Fig. 4a, where the average length of training episodes is shown. In order to be able to compare to the episodic Natural Actor Critic (eNAC) method, which produced the best result in [18], we used the policies obtained by Ψ-Learning after 400 training episodes as initial policies. By this stage, the average expected cost of the policies was 239.35, compared to the initial cost which had been of the order 3 × 10^5. Fig. 4b shows the evolution of the expected cost for both methods with such an initialisation, and as can be seen, Ψ-Learning outperformed eNAC both in terms of convergence speed and attained expected cost.

As the quality of the obtained policy will depend on how well the basis set can approximate the true Ψ, we also considered a more complex set of bases. Specifically, while keeping A in (28) a set of non-zero constant basis functions,
359
we represented a(x, w) and a(x, w) using the general and [5] A.M. Gheshlaghi et al. Dynamic policy programming
commonly used squared exponential bases which are of the with function approximation. In AISTATS, 2011.
form [6] D. Mitrovic et al. Optimal feedback control for anthro-
φ(x) = exp{−(x − mφ )T Σφ (x − mφ )} (34) pomorphic manipulators. In ICRA, 2010.
[7] E. A. Theodorou et al. Learning policy improvements
with center mφ and metric Σφ . The centers were sampled ran-
with path integrals. In AISTATS, 2010.
domly from a region given by the acceptable region specified
[8] S.I. Marcus et al. Risk sensitive markov decision pro-
earlier and ẋ ∈ [−1m/s, 1m/s], θ̇ ∈ [−1rad/s, 1rad/s] and
cesses. Systems and control in the 21st century, 1997.
Σφ was chosen to be diagonal. For this setting we were not
[9] A. Gunawardana and W. Byrne. Convergence theorems
able to obtain good policies using eNAC, while in the case
for generalized alternating minimization procedures. J.
of Ψ-Learning this choice did not outperform the polynomial
of Machine Learning Research, 6:2049–2073, 2005.
basis, yielding a best policy with expected cost 26.4.
[10] B. Kappen, V. Gomez, and M. Opper. Opti-
VI. C ONCLUSION mal control as a graphical model inference problem.
arXiv:0901.0633v2, 2009.
We have presented a general relation between stochastic
[11] H. J. Kappen. Path integrals and symmetry breaking
optimal control problems and minimisation of KL diver-
for optimal control theory. J. of Statistical Mechanics:
gences of the form (4). This allowed us to derive iterative
Theory and Experiment, page 11011ff, 2005.
algorithms for obtaining both risk neutral and risk sensitive
[12] J. Kivinen and M. Warmuth. Exponentiated gradient
optimal controls for finite and infinite horizon MDPs. We
versus gradient descent for linear predictors. Information
show that these algorithms, although instances of generalised
and Computation, 132:1–64, 1997.
EM procedures, enjoy guaranteed convergence to the global
[13] W. Li and E. Todorov. An iterative optimal control and
optimum. Further, we discuss the connections of our work to
estimation design for nonlinear stochastic system. In
previous approaches in this area, highlighting that many of
CDC, 2006.
these arise in our formulation as special cases which either
[14] D. Mitrovic, S. Klanke, and S. Vijayakumar. Adaptive
require restrictions on the class of problems (e.g., [22, 10]),
optimal control for redundantly actuated arms. In SAB,
or for which the relation to SOC was previously unclear (e.g.,
2008.
[24]). The formalism is then extended to the model free RL
[15] J. Nakanishi, K. Rawlik, and S. Vijayakumar. Stiffness
setting in both the finite and infinite horizon case. In the case of
and temporal optimization in periodic movements: An
finite state and action spaces, using a tabular representation, we
optimal control approach. In IROS, 2011.
obtain an exact algorithm with interesting relations to Q- and
[16] J. Peters, S. Vijayakumar, and S. Schaal. Reinforcement
TD(0) learning. We also present an approximation, based on
learning for humanoid robotics. In Humanoids, 2003.
basis function representations, which extends [5] to problems with continuous state and action spaces.

Our approach is verified in the discrete setting, and we highlight the novel aspects of our work in experiments on a problem with continuous states and actions in the form of the standard Cart-Pole benchmark. On the one hand we show that, by employing standard out-of-the-box approximate inference methods, optimal policies can be computed for model-based finite horizon problems, improving on the shortcomings of [24]. On the other hand, we consider an infinite horizon problem in the model-free RL setting, demonstrating that the proposed approximate algorithm shows performance competitive with the well-established eNAC algorithm. We also provide a recipe for selecting appropriate basis functions that lead to efficient, tractable solutions.

REFERENCES

[1] C. M. Bishop. Pattern Recognition and Machine Learning. Springer, 2006.
[2] D. Braun, M. Howard, and S. Vijayakumar. Exploiting variable stiffness in explosive movement tasks. In R:SS, 2011.
[3] J. L. van den Broek, W. A. J. J. Wiegerinck, and H. J. Kappen. Risk sensitive path integral control. In UAI, 2010.
[4] P. Dayan and G. E. Hinton. Using EM for reinforcement learning. Neural Computation, 9:271–278, 1997.
[17] K. Rawlik, M. Toussaint, and S. Vijayakumar. An approximate inference approach to temporal optimization in optimal control. In NIPS, 2010.
[18] M. Riedmiller, J. Peters, and S. Schaal. Evaluation of policy gradient methods and variants on the cart-pole benchmark. In IEEE ADPRL, 2007.
[19] P. N. Sabes and M. I. Jordan. Reinforcement learning by probability matching. In NIPS, 1996.
[20] R. F. Stengel. Optimal Control and Estimation. Dover Publications, 1986.
[21] R. S. Sutton and A. G. Barto. Reinforcement Learning. MIT Press, Cambridge, 1998.
[22] E. Todorov. Efficient computation of optimal actions. PNAS, 106:11478–11483, 2009.
[23] E. Todorov and M. Jordan. Optimal feedback control as a theory of motor coordination. Nature Neuroscience, 5:1226–1235, 2002.
[24] M. Toussaint. Robot trajectory optimization using approximate inference. In ICML, 2009.
[25] M. Toussaint and A. Storkey. Probabilistic inference for solving discrete and continuous state Markov decision processes. In ICML, 2006.
[26] D. Zarubin, V. Ivan, M. Toussaint, T. Komura, and S. Vijayakumar. Hierarchical motion planning in topological representations. In R:SS, 2012.

Failure Anticipation in Pursuit-Evasion
Cyril Robin∗† and Simon Lacroix∗†
∗ CNRS, LAAS, 7 avenue du colonel Roche, F-31400 Toulouse, France
† Univ de Toulouse, LAAS, F-31400 Toulouse, France
[email protected], [email protected]

Abstract—This paper presents a new approach for the pursuit of targets by a team of aerial and ground robots under realistic conditions. Mobile target pursuit is often a sub-task of more general scenarios that call for environment exploration or monitoring activities. Since most of the time a single robot is sufficient to ensure the pursuit of a target, our approach aims at minimizing the team resources devoted to the pursuit: while ensuring the pursuit, a single pursuer evaluates its own potential failures on the basis of the situation defined by the target evolution and the environment structure. It thus assesses its need for team support. When support is necessary to keep the target in view, one or more additional robots are involved, according to a task allocation scheme. We provide mathematical bounds on the complexity of the approach, which ensure that the system has real-time performance. Simulations in a variety of realistic situations illustrate the efficiency of the proposed solution.

I. INTRODUCTION – CONTEXT

Numerous mobile robotics applications are related to "targets", be they adversaries to detect and chase, flocks of animals to monitor, or other robots to follow or assist. Detecting, localizing or tracking targets raises a large variety of problems, to which the research community has devoted a lot of work. Figure 1 summarizes the primary problems encountered in target-related applications. One may first distinguish two different missions: detecting one or more targets, and tracking them.

The first mission aims to control an area, and ends with the detection or the capture of the targets inside this area, using several agents, robots or fixed sensors. The historical problem is known as the art gallery problem [16], which considers fixed sensors. More recent variations are the patrolling problem [14] and the surveillance problem, which consider either mobile sensors only or a combination of mobile and fixed sensors. Besides, most problems known as pursuit-evasion or search problems are actually "capture" problems, where the aim is to surround a target, avoiding both the contamination of previously cleared areas and the evasion of the targets. It is assumed that the number of robots is sufficient for this purpose. Chung et al. [5] propose a good survey of existing mathematical and robotics-oriented work in this area.

Discovering or detecting a target is one thing, but in real applications this is often only a part of the whole scenario, either because the robots cannot neutralize the targets or because it is not desired or expected. Once the targets are designated, the robots often have to track them. This is the second kind of mission, which starts upon target detection. One may then want to localize the targets [10] or to monitor them. Localization may use different sensors and vantage points [19] or some target specificities like group coherence [17], while monitoring is achieved through a direct view of the targets. With several targets, the problem often comes down to trading the targets between the observers, depending on their relative positions [7, 8]. When there is only one target, a single robot may perform the tracking task alone – which we call target "following", but which is often referred to as "pursuit-evasion". This latter problem often assumes that a single robot is enough to perform the target monitoring task.

Fig. 1. Overview of the target management problems: target detection (art gallery [16], search [5]) and target tracking (localizing and monitoring [10, 19, 17], observing multiple targets [7, 8], following a single target [13, 3, 11, 2, 12, 1]).

For all these problems, whatever the application context and the target type considered, multi-robot teams naturally extend the capacity of a single robot to manage the targets. A heterogeneous team provides even more solutions and opportunities to perform a mission, e.g. by multiplying the vantage points, as in cooperating aerial and ground teams. Also, while a single kind of robot can perform well in specific cases, a heterogeneous team can achieve multi-objective missions, some robots being more adapted to target tracking, others being better fitted to target detection for instance.

We hereby consider a multi-objective scenario where a heterogeneous team of robots is in charge of controlling a predefined known area. The robots are not numerous enough to statically cover the entire area (this is not an art gallery problem), and several targets can be present in the environment. The robots are initially engaged in an exploration phase, according to a pre-planned strategy. Once a mobile target is discovered, the team must ensure that the target remains visible, while

pursuing the environment exploration. The scenario naturally imposes the satisfaction of real-time constraints.

The paper focuses on the target pursuit phase, under the realistic constraints raised by this multi-objective scenario. The only hypothesis on the target is that it is not harmful to the robots: besides this, it may be adversarial, evasive, or move according to any other strategy. The success of the tracking task is defined according to some visibility criteria (in terms of distance and continuity, from a 100% target visibility constraint to a more relaxed one).

As the team performs both exploration and one or more tracking tasks, we want to minimize the resources required by the management of one target: the objective is to satisfy visibility constraints on the target, while minimizing the number of robots required for this purpose. As a single pursuer is most of the time sufficient to perform the tracking task, this is the favored tracking configuration. However a single pursuer can sometimes fail, in which case it asks for support from other robots. The two main issues raised here and tackled in this paper are (i) how to predict the single pursuer's failures and (ii) how to anticipate and prevent them with support robots.

The next section reviews the main results of the literature on target following, formalizes the problem, and presents our approach and the models used. Section III is the heart of the paper: it depicts how the pursuer can assess future tracking failures while the target is being tracked. Results in various realistic situations are presented and analyzed. Section IV exploits these potential tracking failures to define cooperative support tasks, using a task allocation scheme, and a discussion concludes the paper.

II. THE PURSUIT PROBLEM

Much work on the target following problem can be found in the literature. Eaton and Zadeh [18] first exposed the search for an optimal strategy as an optimization problem in discrete probabilistic systems, working with huge search space sizes. More recently, contributions by Hutchinson et al. [13, 3, 11, 2] thoroughly analyze the problem and some of its variations: the following problem is fully decidable, but its complexity hinders the definition of optimal solutions under real-time constraints. Besides, some single-robot local following strategies exhibit great results in both simulations and actual experiments (e.g. [12, 1]). Note that these approaches exploit poor environment models, usually 2D binary free/obstacle models in which the obstacle areas coincide with visibility masks.

To minimize the resources allocated to the target pursuit, our approach is to mainly rely on one robot (the pursuer) to perform the pursuit task: the state of the art shows that this should be sufficient most of the time. But due to its own capacities, the target capacities and the environment configuration, the pursuer may fail to ensure the visibility constraint, be its strategy optimal or not. The pursuer therefore constantly evaluates the potential upcoming failures and their associated risk, and asks for support from other robots only when required. For instance "the target is rather far and about to enter a maze" or "the target is about to cross an area the pursuer can not cross" are situations that do call for support from other robots, whereas the situation "the target is moving behind a small building located in a wide open area" does not, if temporary occlusions are allowed (figure 2).

Fig. 2. Illustration of our approach: (a) pursuit with AAV; (b) risks evaluation; (c) ask for support; (d) pursuit with AGV. In (a) the helicopter is the pursuer. In (b), the target is about to enter a building in which the helicopter can not proceed: the pursuer asks for support (c), and the ground robot becomes the pursuer (d).

A. Problem statement

According to an economy of means principle, we want to minimize the number of robots requisitioned for the tracking task. Let R be the set of robots. The problem is formally defined as follows:

    ∀τ, minimize Σ_{r∈R} a(r, τ)    (1)

where

    a(r, τ) = { 1 if the robot r is active at time τ for tracking or support in pursuit
              { 0 else    (2)

while satisfying the visibility criteria:

    ∫_{τ−Tmax}^{τ} h({r, a(r) = 1}, target, θ) dθ ≤ Tmax    (3)

with

    h(S, target, θ) = { 1 if the target is hidden from the set of robots S at time θ
                      { 0 else    (4)

More specifically, the target is hidden if none of the active robots sees it (i.e. it is occluded by the environment or beyond the sensor maximal range). Tmax is a criterion specified by the operator, which allows relaxing the visibility conditions: the target may be out of sight, but no longer than Tmax seconds (Tmax can also be set to 0).

B. Realistic Models

To be able to deal with a whole variety of realistic environments, different types of targets and heterogeneous robots, we use environment models that make explicit the traversability properties of both the ground robots and the target, and on which visibility constraints can be assessed.

A layered model of the environment allows to recover inter-visibility and traversability information (namely, a 2.5D elevation map and a symbolic layer that expresses the terrain type, e.g. roads, grass, obstacles, buildings, rivers...). Motion models are associated to each vehicle (robots and targets); they define their movement capacities in terms of attainable speeds.¹ The robot sensors are modeled as omnidirectional cameras, to which a maximal range of detection is associated.

By convolving the vehicle motion models with the terrain type layer, we end up with a multi-layered map for traversability, used for target motion prediction and robot planning. A priori target visibility information is also computed for each robot (e.g. an AAV will never be able to view a target in a building), and the 2.5D elevation map is exploited to assess visibilities (see figure 3). All these layers are represented by Cartesian grids, and are exploited online to compute the motion and sensing possibilities using the available information on the current state (such as target speed and position). Each robot embeds the models of its motion and sensing capacities, and the various target motion models.

Fig. 3. Environment models used: (a) satellite view of the area (orthoimage); (b) elevation model (derived from an airborne lidar survey); (c) 2.5D multi-channel traversability map for the AGV: here different gray values correspond to different attainable speeds, along a light-gray/dark-gray fast/slow scale; (d) 2.5D multi-channel visibility map for the AGV.

III. ASSESSING THE NEED FOR TEAM SUPPORT

Our approach relies on the fact that the robot which tracks the target, called the pursuer, is able to predict tracking failures, that is losses of target visibility for a duration longer than Tmax. For this purpose, we evaluate all the possible pursuer/target situations over a temporal horizon h to isolate the tracking failures that can occur between the current time τ and the time τ + h. We hereby depict how this can be achieved in real time – the predicted failures generate support task requests, whose processing is depicted in section IV.

A. Failure evaluation

As failures mainly depend on the environment and on the relative positions of the target and the pursuer, there is no real general shortcut for their assessment, and so one has to compute all the possible future states over the time horizon h. A future state is a failure state if constraint (3) is not satisfied, which for the pursuer is equivalent to:

    ∀θ ∈ [τ; τ + Tmax], h({pursuer}, target, θ) = 1    (5)

i.e. if the target is hidden from the pursuer's sensors for Tmax seconds or more.

To compute all the possible states, we exploit the discrete structure of the traversability models, and define a tree using the pursuer and target motion models². At every time step, the tree grows with a branching factor of mt·mr, where mt and mr are respectively the number of possible motions for the target and the robot. With a temporal horizon h, the complexity of the tree building is O((mt·mr)^h).

Usual values for the parameters are mt = mr = 9 (9-connexity in a 2D grid), and h = 20 at least. h must be large enough so that the other robots of the team can address the requests for support, without a priori constraining them too much to remain in the vicinity of the pursuer: indeed the smaller h is, the quicker and closer the supporting robots must be. Using a value of h of 20 and considering 9-connexity, the tree complexity is O(81^h), i.e. near 10^38, which is by no means tractable.

B. Introducing the tracking strategy

Besides the combinatorial problem of the failure evaluation, the pursuer's tracking strategy has to be efficient to minimize the needs for team support. Efficient stands here for "keep the target in sight", while being predictable and fast to compute. Indeed we seek to evaluate all the possible outcomes with unpredictable target motions, hence numerous strategies must be evaluated within the time horizon h to cope with this unpredictability.

¹ We assume that once detected, a target is identified – hence its capacities are known to the pursuer.
² For the sake of simplicity, we consider throughout this section that the pursuer and the target evolve at the same speeds. This does not cause any loss of genericity, but only slightly influences the overall complexity – see appendix.
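The order of magnitude quoted above can be checked in a couple of lines; this is a sketch of the counting argument, not the authors' code.

```python
# Naive prediction tree: branching factor mt*mr at each of h time steps.
mt = mr = 9   # 9-connexity motions for both target and robot
h = 20        # temporal horizon (time steps)

leaves = (mt * mr) ** h            # O((mt*mr)^h) = O(81^h) states
print(len(str(leaves)) - 1)        # order of magnitude: 38, i.e. ~10^38
```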
As computing an optimal tracking strategy raises a combinatorial problem, we use a local greedy strategy. One may want to use a pre-computed optimal strategy, but this would not be robust to differences between the a priori environment model and the actual environment. Real-time state-of-the-art local strategies are efficient, but still require much computing resources and often make strong assumptions about the target motions. Our local strategy applies the two following rules: (1) if the target is visible, try to get closer to the target while maintaining visibility; (2) else, find the shortest path to the closest position which satisfies the target visibility criteria. While clearly sub-optimal, this simple local strategy shows acceptable tracking performances, is extremely quickly computed, and can be applied to either an AAV or an AGV.

This local strategy also brings an important advantage: the branching factor of the tree is directly reduced to mt, the number of available positions for the target, because the strategy associates a single robot position to a given target position. The tree complexity becomes O(mt^h) (see figure 4), that is near 10^19 with mt = 9 and h = 20. Note that mt could be reduced by making assumptions on the target motions, but this is not desired, and it does not drastically reduce the exponential complexity of the problem.

Fig. 4. Reducing the complexity: the impact of a local strategy.

C. From a tree to a cyclic pursuit graph

The previous complexity gain is not enough to allow the satisfaction of real-time constraints: the complexity needs to be further reduced, while still ensuring the detection of all the potential failures. Various structure transformations of the tree allow to find and exploit state redundancies, which are brought by the fact that the tree is built upon a grid structure.

First, note that several positions pt of the target may be handled by only one robot position pr. The former state-nodes (τ, pt, pr) become new nodes of type (τ, {pt}, pr), with τ the considered time. This leaves the tree structure unchanged (figure 5).

Fig. 5. Reducing the complexity: merging spatial redundancies.

Second, for a same stage in the tree, there are several similar nodes which only differ by their parents. However the evaluation of a node only depends on spatial (and sometimes temporal) considerations, not on the past positions of the target or the robot. So similar nodes can be gathered, even if they have different parents. The tree structure can be changed into a graph structure, but the information carried by the nodes is the same (figure 6).

Fig. 6. Reducing the complexity: using a graph structure to exploit spatial redundancies.

For a 9-connexity grid, these two structure transformations drastically reduce the complexity, down to O(h^5) in the worst case (see appendix for details). Temporal redundancies allow to further reduce the complexity, by introducing temporal loops in the graph: some situations are indeed encountered several times, e.g. when none of the vehicles move, or after one loop around a building (figure 7). The nodes are embedded with the following data: (τ̂, {pt}, pr) where τ̂ = min{τ | (τ, {pt}, pr)}, i.e. τ̂ is the first temporal occurrence of the considered spatial state. The resulting structure is referred to as the pursuit graph.

Fig. 7. Reducing the complexity: introducing temporal loops.

Last, we introduce an iterative algorithm to build the graph, which allows to only consider the really new nodes at each temporal step, which are referred to as the front nodes. This shrinks the complexity down to O(h^3), still without any loss of information nor precision in the prediction (see appendix for details).

The resulting complexity is polynomial of low degree, which allows to assess the potential failures, and hence the needs for team support, in real time. Each potential failure generates a task, which has the form (τ, p) where τ is the date of the failure and p its position. In other words, the support task is defined as "watch the position p at date τ". The result is a small set of tasks, which is empty most of the time (when the target is fully under the control of the pursuer), and which constitutes the seeds for multirobot cooperation.
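The expansion with duplicate merging and front nodes can be sketched as follows. This is a deliberately minimal illustration: it keys nodes on a single (target, robot) pair instead of the paper's (τ̂, {pt}, pr) grouping, and `target_moves` / `robot_policy` are stand-ins for the grid motion model and the local greedy strategy.

```python
def expand_pursuit_graph(start_target, start_robot, target_moves, robot_policy, h):
    """Iteratively expand states up to horizon h, merging duplicates and
    keeping, for each spatial state, the first time it was reached."""
    nodes = {(start_target, start_robot): 0}   # spatial state -> first time seen
    front = {(start_target, start_robot)}
    for t in range(1, h + 1):
        new_front = set()
        for target, robot in front:
            for nxt in target_moves(target):
                # the local strategy yields one robot position per target move
                state = (nxt, robot_policy(nxt, robot))
                if state not in nodes:          # merge redundant states
                    nodes[state] = t
                    new_front.add(state)
        front = new_front                       # only expand genuinely new nodes
    return nodes

# Toy 1-D world: the target may stay or move +/-1; the robot greedily steps
# toward the target's new position.
moves = lambda p: {p - 1, p, p + 1}
policy = lambda target, robot: robot + (target > robot) - (target < robot)
g = expand_pursuit_graph(0, 3, moves, policy, h=5)
print(len(g))   # node count grows polynomially with h, not as 3^h
```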
D. Results

The bounded tractable complexity, and the fact that the elaborated solution guarantees the assessment of all the potential failures, are satisfying; but in practice, how does the pursuer perform, and are the complexity bounds low enough for real-time applications?

Three examples are hereby presented to appreciate the performance of the solution: one takes place in a Manhattan-like environment, one is a "river-crossing" situation, and one exploits models from an actual experimental field.

Fig. 8. Pursuit in a Manhattan environment: (a) traversability for the pursuer (AAV); (b) traversability for the target (ground target); (c) visibility map: elevations of the buildings, and presence of a covered area in the bottom center of the map (in black). The vehicle trajectories are displayed (dark grey = target; light grey = pursuer), and the elliptical dark areas indicate places where the target is temporarily hidden. The target escapes in the covered area (which is a definite failure, denoted as fail). In phase I the target is under control, while in phase II the target is too close to the covered area (danger).

Fig. 9. Evolution of the number of dangers (thin line) and of the temporal horizon (thick line) as a function of time steps for the three considered cases: (a) Manhattan environment; (b) river-crossing situation; (c) realistic environment. The time lapses during which the target is temporarily hidden are indicated with thin vertical lines. Phases are distinguished by thick vertical lines.

1) Manhattan situation: Figure 8 shows the traversability and visibility maps, and the trajectories of both the target and the robot pursuer. The situation is a typical pursuit in a Manhattan-like town center, with a ground target and an AAV pursuer. There are buildings and, in the middle, a covered area which the AAV can neither enter nor see through (it can only fly above). The buildings do not constrain the pursuer's movement, but its visibility. The motions of both the target and pursuer are holonomic.

Figure 9-a displays the number of dangers (in dark gray) and the temporal horizon (in light gray) over time. Dangers are states that do require a request for support from other robots of the team. One can distinguish two phases: in phase I the target is under control (no predicted failures), even if the target is sometimes temporarily hidden. The temporal horizon increases at the beginning (easy environment) and then lowers as more buildings obstruct the visibility. This is the result of the real-time constraints and the adjustment of the computation time required by the building of the pursuit graph (see paragraph III-D4). In phase II, the target approaches the covered area, which generates several danger states – and in the absence of support by another robot, the target eventually enters the covered area.

The figures highlight that the greedy strategy, although not optimal, behaves quite well. Furthermore, the system generates temporally and spatially gathered tasks (remember each danger generates one task). This is really important because on the one hand it will avoid "goings and comings" from the support robots, and on the other hand the support robots will probably be able to handle several tasks at the same time, which will help to minimize the number of robots required for the pursuit mission (see section IV).

2) River-crossing situation: Figure 10 shows a situation where the target and the robot do not have the same traversability map, with an advantage for the former – imagine for example that the target is waterproof whereas the robot is not. While the target crosses the river, the robot has to take the bridge instead of directly following the target (phase I), which will probably lead to a temporary occlusion (phase II), until the robot reaches the target again (phase III).
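The adjustment of the temporal horizon to the available computation time, mentioned above and detailed in paragraph III-D4, could be sketched as follows. This is a hedged illustration only: `expand_one_step`, the cycle budget and the grow/back-off rule are our assumptions, not the authors' implementation.

```python
import time

def adapt_horizon(expand_one_step, h, budget_s=1.0):
    """Expand the pursuit graph step by step within the cycle budget;
    return the horizon to attempt on the next cycle."""
    start = time.monotonic()
    steps = 0
    while steps < h and time.monotonic() - start < budget_s:
        expand_one_step()   # grow the graph by one temporal step
        steps += 1
    # slack left over: try a deeper horizon next cycle; overrun: back off
    return h + 1 if steps == h else max(1, steps)
```

With a fast enough expansion step, the horizon grows by one per cycle; when a cycle overruns, the next horizon shrinks to what was actually computed in time.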

Fig. 10. Pursuit in a river-crossing situation: (a) traversability for the pursuer (AAV), the river is at the bottom of the terrain, a bridge is on the left; (b) traversability for the target (ground target); (c) visibility map. Difficulties arise when the target crosses the river, as the pursuer cannot follow it.

Figure 9-b displays the number of dangers and the temporal horizon evolution over the pursuit. One can again distinguish three phases: at the beginning there are risks that the target escapes (by breaking some visibility constraints); then, as the target trajectory is not optimal, the risks disappear, even if the target hides behind the building; at the end the pursuer catches up with the target again. As the surrounding environment forms a dead-end, which is easy to handle, the temporal horizon reaches high values (over sixty time steps).

Fig. 11. Pursuit in a realistic environment: (a) the pursuer (AGV) and ground target traversability; (b) the pursuer visibility. Difficulties arise in cluttered areas (phases I and III).

3) Experimental field situation: The third example is a realistic environment for ground target and pursuer; the vehicles evolve in both cluttered and rather clear areas (figure 11).

As figure 9-c shows, dangers exist near and within the cluttered areas, whereas in less difficult areas the greedy strategy performs well enough to prevent risks of failure (no need for support). In cluttered areas the greedy strategy performs quite well too. It cannot prevent the target from escaping with an optimal strategy, but is able to follow quite easily a not-so-adversarial target.

4) Real-time performances: The previous sections state that, considering a 9-connexity grid, the upper bound for the complexity of the graph construction is O(h^3) in the worst case. It actually appears that the complexity is rather near quadratic (figure 12). This allows real-time computation, considering a value of 1 or 2 seconds for the time steps (figure 13), while keeping a reasonably acceptable temporal horizon (above 15). As a matter of fact, the temporal horizon is constantly adjusted to fit the computation time to the real-time constraint – that is, expand the graph by at least one time step in no longer than one time step.

Fig. 12. Typical evolution of the computation time with the temporal horizon. Polynomial regression gives a near quadratic complexity (here: F(h) ≈ 2·h^2.2, with a very confident value for the coefficient of determination R²).

Note on figure 13 that a few times the computation time exceeds the 1 sec cycle criterion. This is actually because the system currently computes the states not one by one, but one horizon step at a time. As the prediction of the computational cost is not very precise, sometimes the system asks for one more horizon step and exceeds the time constraint. This is always balanced by the next computation cycles, which are very short. It would be solved by a slight adjustment of the implementation to compute the states one by one, improving both the respect of the time constraints and the overall performance (since more states would be computed).

Fig. 13. Computation cycle at each time step for the three previous examples. The thick horizontal line represents the "one second" limit.

IV. SUPPLYING TEAM SUPPORT

While a pursuer performs the following task, it predicts all its potential failures up to a temporal horizon τ + h, τ being the current time. These predictions straightforwardly generate a set of support tasks, defined as "watch place p at time θ", with p a cell of the spatial grid and τ ≤ θ ≤ τ + h. By construction, the pursuer which issues these tasks cannot handle them. But it manages them: it allocates them to other robots according to their availability, capacities, priorities and the global objective of minimizing the number of required robots. Note that communication issues are currently not considered – they mainly lead to situational awareness inconsistencies between robots,

and may also lead the pursuer to have no (or not enough) support.

The set of support tasks is updated at each time step. Depending on the target motions, new tasks may occur and previous ones may become deprecated. The pursuer broadcasts the updates to the surrounding robots, which in turn update their current task lists and evaluate whether they could be of any help for the non-allocated tasks: this is a classical multiple task assignment problem.

Task allocation problems have given rise to a large amount of work; they raise combinatorial issues, and finding an optimal solution is still very difficult for non-trivially sized problems. Nevertheless many solutions have been proposed, for example using stochastic methods [15]. Auction-based approaches have been largely studied (see Dias et al. [6] for a survey) and have been shown to efficiently produce acceptable solutions. Choi et al. [4] present decentralized algorithms which lead to good solutions with some guarantees of optimality. The authors also highlight the combinatorial issue raised by the construction of the bundles of tasks (how to choose which subset of tasks an agent will handle) and the conflicts that result from the iterative construction of those bundles.

We have chosen to implement an auction process, as such approaches have shown good results and can handle inconsistencies within the situational awareness of agents. But in our case, costs are quite different from usual: as we aim to minimize the number of required robots, performing one or several support tasks has the same cost for a given robot, whereas the reward is linear in the number of handled tasks.

The auction process is one-turn, with multiple bids allowed: after the update of the task lists, each available robot computes several bundles of tasks it can handle together, and proposes them to the auctioneer (the pursuer robot). Then the auctioneer combines those bundles and allocates subsets of tasks (either a bundle or a sub-part of a bundle) to the robots, trying to minimize the number of robots involved.

The support robots compute the bundles of tasks they can handle according to their capacities, the other tasks they already have, and their goals and priorities. As they cannot be exhaustive (n tasks lead to Σ_{p=1}^{n} (n choose p) = 2^n − 1 possible bundles), they only compute the biggest bundles in an iterative way. We also want each task to be part of at least one bundle (if the robot is indeed able to perform the task). The result is an overlapping partition of the task set. Recalling the results of part III, the generated support tasks are spatially and temporally gathered, which helps to keep a small number of task bundles (usual values are 0 to 3). Tasks are also independent from each other (except for the temporal constraints) and thus can be handled separately.

Once these bundles are computed, the auctioneer gathers them and allocates tasks. It optimally combines the different bundles according to the following criteria:
(i) all the support tasks must be handled if possible;
(ii) the number of required robots must be minimized.
This directly comes from the objective function (1). Finding the optimal combination of bundles could be expensive: considering Nt the average number of bundles for each robot and Nr the number of bidding robots, there are O((Nt + 1)^Nr) possible combinations³. Typical maximum values are Nt = 3 and Nr = 4, which leads to 4^4 = 256 possible combinations. This is still easy and quick to compute for the auctioneer using a Boolean representation and bit-to-bit computation. Finally, combinations are ranked by the number of handled tasks (max) and the number of support robots involved (min), including the previously required robots for non-deprecated former tasks. Then the auctioneer sends the support robots the support tasks they have been allocated.

Note that the heavy computational load of the auction phase is deported onto the bundle construction, sparing the auctioneer, which is also the pursuer. It can thus spend more time computing the pursuit graph and expanding the temporal horizon, which is the heart of the system. For the same reason we also introduce a temporal shift (of one time step) between the time tasks are issued and the time associated bids are computed. Doing so allows support robots to compute bundles while the pursuer computes the next pursuit graph, avoiding the robots having to wait for each other. This shift is negligible compared to the expected horizon size (1 ≪ h).

Work is still in progress for this part, but first results are promising and show that a minimum number of robots is involved, while still respecting our real-time constraints.

V. DISCUSSION

We have presented a new approach for realistic target tracking problems. Keeping in mind that a team of robots often has several distinct objectives, we provide a new pursuer-centered cooperation which minimizes the number of required robots at any time, thus satisfying an economy of means principle. The approach is based on a single pursuer which calls for team support only when required, through the evaluation of future failures it can not handle alone. We exploit realistic multi-layered 3D models, which allow to consider different capacities (such as being waterproof or not) for both targets and robots. Kolling et al. [9] recently introduced 2.5D visibility in pursuit-evasion, but to the best of our knowledge it is the first time realistic multi-layered models such as ours are used for such problems.

As our local following strategy performs quite well, only one robot is required most of the time, but the team provides support when needed to prevent the target loss. The resulting system shows promising results in realistic simulations, and integration within a team composed of one AAV and two UGVs is under way. We intend to extend the models by considering communication constraints and failures: this is not a trivial issue, as communication constraints threaten the whole cooperation. Possible solutions are to extend the anticipation from the pursuer, and to integrate communication rendez-vous for the support robots.

³ "+1" is for the empty bundle: each robot can either handle a bundle of tasks, or none. This is important to consider as we want to minimize the number of required robots.

367
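The auctioneer's combination step can be illustrated with a small brute-force search. This sketch is our own (the function and the toy bids are hypothetical, not the authors' implementation): it selects one bundle per robot, possibly the empty bundle noted in the footnote, so that criteria (i) and (ii) hold.

```python
# Illustrative sketch of the auctioneer's bundle combination (our own code,
# not the paper's implementation). Each robot bids a few task bundles; the
# auctioneer assigns one bundle per robot -- or the empty bundle, the "+1"
# of the footnote -- covering all support tasks with as few robots as
# possible. Brute force suffices because robots bid only 0 to 3 bundles.
from itertools import product

def allocate(tasks, bids):
    """tasks: set of support tasks; bids: {robot: [frozenset bundle, ...]}."""
    robots = list(bids)
    best = None
    for choice in product(*[bundles + [frozenset()] for bundles in bids.values()]):
        covered = set().union(*choice)
        if not tasks <= covered:
            continue  # criterion (i): every support task must be handled
        used = sum(1 for bundle in choice if bundle)
        if best is None or used < best[0]:
            best = (used, dict(zip(robots, choice)))  # criterion (ii)
    return best

# Hypothetical example: three robots bid overlapping bundles.
tasks = {"t1", "t2", "t3"}
bids = {"r1": [frozenset({"t1", "t2"})],
        "r2": [frozenset({"t3"}), frozenset({"t2", "t3"})],
        "r3": [frozenset({"t1"})]}
used, assignment = allocate(tasks, bids)  # two robots suffice; r3 stays free
```

The paper additionally allows sub-parts of bundles to be allocated; the exhaustive product above is tractable only because the spatio-temporal gathering of support tasks keeps the number of bundles per robot small.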
Problems may also occur if the pursuer does not behave as planned (i.e. according to its pursuing strategy). However, in such situations the system is not totally failing, as re-planning and re-computing the risks on-the-go remain tractable. Besides, adjusting the spatial and temporal resolutions will lead to better anticipation and to more flexibility in the execution of the plans.

We also plan to integrate finer probabilistic motion models and thus to handle potential failures according to both their probability and their temporal proximity.

Finally, our longer-term perspective is to merge our target tracking system within an overall surveillance scheme, aiming at having a whole team of robots performing several tasks under realistic conditions and achieving different goals in parallel.

ACKNOWLEDGMENTS

This research is partially funded by the DGA.

APPENDIX

We develop here the mathematical bounds presented in Section III. The proof stands for an environment free of any obstacle and in 9-connexity (using a 2D grid). Unexpectedly this is an upper bound for the complexity, as obstacles, view length and lower connexity lead to motion restrictions, and thus reduce the number of possible spatial states.

Under such hypotheses, we have, for each stage s in the graph obtained after spatial redundancies treatment, at most n_s nodes with n_s ≤ |Acc(target, s)| · |Acc(robot, s)|, where Acc(i, s) = A_{i,s} is the set of all positions accessible for i in time s. The total size N of the graph is bounded by

N ≤ Σ_{s=1}^{h} |A_{t,s}| · |A_{r,s}|.

Besides, in 9-connexity we have |A_{i,s}| ≤ 1 + Σ_{σ=1}^{s} 8σ = (2s + 1)², therefore

N ≤ Σ_{s=1}^{h} (2s + 1)⁴ = O(h⁵).

Here we consider that the pursuer and the target have the same velocity (|A_{t,s}| = |A_{r,s}|): this does not restrict the generality of our proof, as different velocities only change the resulting complexity, not the reasoning.

Adding temporal loops and iterative construction of the graph, one only develops the new nodes at each stage, referred to as the front nodes. Under the same hypotheses, we now have a number of front nodes N_f:

N_f ≤ (|A_{t,h}| − |A_{t,h−1}|) · |A_{r,h−1}| + (|A_{r,h}| − |A_{r,h−1}|) · |A_{t,h−1}|
    ≤ 8h · O(h²) + 8h · O(h²)
N_f ≤ O(h³)

REFERENCES

[1] T. Bandyopadhyay, Yuanping Li, M.H. Ang, and D. Hsu. A greedy strategy for tracking a locally predictable target among obstacles. In ICRA 2006, IEEE International Conference on Robotics and Automation, pages 2342–2347, May 2006.
[2] S. Bhattacharya and S. Hutchinson. On the Existence of Nash Equilibrium for a Two Player Pursuit-Evasion Game with Visibility Constraints. In Algorithmic Foundation of Robotics VIII, volume 57 of Springer Tracts in Advanced Robotics, pages 251–265, 2009.
[3] S. Bhattacharya, S. Candido, and S. Hutchinson. Motion strategies for surveillance. In Proceedings of Robotics: Science and Systems, 2007.
[4] Han-Lim Choi, Luc Brunet, and Jonathan P. How. Consensus-based decentralized auctions for robust task allocation. IEEE Transactions on Robotics, 25(4):912–926, 2009.
[5] T. Chung, G. Hollinger, and V. Isler. Search and pursuit-evasion in mobile robotics. Autonomous Robots, 31:299–316, 2011.
[6] M.B. Dias, R. Zlot, N. Kalra, and A. Stentz. Market-based multirobot coordination: A survey and analysis. Proceedings of the IEEE, 94(7):1257–1270, July 2006.
[7] B. Jung and G.S. Sukhatme. Tracking Targets Using Multiple Robots: The Effect of Environment Occlusion. Autonomous Robots, 13:191–205, 2002.
[8] A. Kolling and S. Carpin. Cooperative Observation of Multiple Moving Targets: an algorithm and its formalization. Int. J. Rob. Res., 26:935–953, September 2007.
[9] A. Kolling, A. Kleiner, M. Lewis, and K. Sycara. Pursuit-evasion in 2.5D based on team-visibility. In IROS 2010, IEEE/RSJ International Conference on Intelligent Robots and Systems, pages 4610–4616, October 2010.
[10] R. Mottaghi and R. Vaughan. An integrated particle filter and potential field method applied to cooperative multi-robot target tracking. Autonomous Robots, 23:19–35, 2007.
[11] R. Murrieta-Cid, R. Monroy, S. Hutchinson, and J.-P. Laumond. A Complexity result for the pursuit-evasion game of maintaining visibility of a moving evader. In IEEE International Conference on Robotics and Automation.
[12] R. Murrieta-Cid, B. Tovar, and S. Hutchinson. A Sampling-Based Motion Planning Approach to Maintain Visibility of Unpredictable Targets. Autonomous Robots, 19:285–300, 2005.
[13] R. Murrieta-Cid, R. Sarmiento, S. Bhattacharya, and S. Hutchinson. Surveillance strategies for a pursuer with finite sensor range. International Journal on Robotics Research, 2007.
[14] D. Portugal and R. Rocha. A Survey on Multi-robot Patrolling Algorithms. In Technological Innovation for Sustainability, volume 349 of IFIP Advances in Information and Communication Technology, pages 139–146. Springer Boston, 2011.
[15] Sertac Karaman, Tal Shima, and Emilio Frazzoli. Task assignment for complex UAV operations using genetic algorithms. In AIAA Proceedings, 2009.
[16] Jorge Urrutia. Art Gallery and Illumination Problems. In Handbook of Computational Geometry, pages 973–1027, 2000.
[17] Christopher Vo and Jyh-Ming Lien. Visibility-Based Strategies for Searching and Tracking Unpredictable Coherent Targets Among Known Obstacles. In ICRA 2010 Workshop: Search and Pursuit/Evasion in the Physical World: Efficiency, Scalability, and Guarantees, Anchorage, Alaska, May 2010.
[18] L.A. Zadeh and J.H. Eaton. Optimal pursuit strategies in discrete-state probabilistic systems. Trans. ASME Ser. D, J. Basic Eng., 1962.
[19] Ke Zhou and S.I. Roumeliotis. Multirobot Active Target Tracking With Combinations of Relative Observations. IEEE Transactions on Robotics, 27(4):678–695, August 2011.
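The 9-connexity accessibility count used in the appendix bound above is easy to verify numerically; the following check is our own, written for illustration.

```python
# Numerical check (ours, not from the paper) of the appendix bound: on an
# obstacle-free 2D grid in 9-connexity (stay in place or move to any of the
# 8 neighbouring cells), the positions accessible in s steps form the
# square of side 2s + 1, so |A_{i,s}| = 1 + sum_{sigma=1..s} 8*sigma
# = (2s + 1)^2, and a stage holds at most (2s + 1)^4 pursuer-target pairs.

def accessible(steps):
    """Cells reachable from (0, 0) in `steps` moves under 9-connexity."""
    cells = {(0, 0)}
    for _ in range(steps):
        cells = {(x + dx, y + dy)
                 for (x, y) in cells
                 for dx in (-1, 0, 1)
                 for dy in (-1, 0, 1)}
    return cells

sizes = [len(accessible(s)) for s in range(6)]
# sizes == [1, 9, 25, 49, 81, 121], i.e. (2s + 1)^2 for s = 0..5
```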
Tendon-Driven Variable Impedance Control
Using Reinforcement Learning

Eric Rombokas, Mark Malhotra, Evangelos Theodorou, Emanuel Todorov, and Yoky Matsuoka

Abstract—Biological motor control is capable of learning complex movements containing contact transitions and unknown force requirements while adapting the impedance of the system. In this work, we seek to achieve robotic mimicry of this compliance, employing stiffness only when it is necessary for task completion. We use path integral reinforcement learning, which has been successfully applied on torque-driven systems to learn episodic tasks without using explicit models. Applying this method to tendon-driven systems is challenging because of the increase in dimensionality, the intrinsic nonlinearities of such systems, and the increased effect of external dynamics on the lighter tendon-driven end effectors.

We demonstrate the simultaneous learning of feedback gains and desired tendon trajectories in a dynamically complex sliding-switch task with a tendon-driven robotic hand. The learned controls look noisy but nonetheless result in smooth and expert task performance. We show discovery of dynamic strategies not explored in a demonstration, and that the learned strategy is useful for understanding difficult-to-model plant characteristics.

Fig. 1: The ACT hand is a tendon-driven robot designed to mimic the tendons and joints of the human hand.

I. INTRODUCTION

Though significant progress has been made toward controlling tendon-driven robotic systems, learning control of such systems remains a challenging task. Most of the difficulties arise from the existence of strong nonlinearities imposed by the tendon-hood structure, model uncertainty, and task complexity due to the need for simultaneous movement and force control.

Additionally, fine object manipulation using tendon-driven systems may require sudden changes in gains and/or tendon excursions. Smooth controls can be used when the mass of a manipulated object is small compared to that of the manipulator, but as the dynamics of the external object become significant, dexterous contact with the object requires more abrupt changes in control. Tendon-driven systems allow lightweight manipulators, which accentuates this problem. Previous work has learned manipulation tasks by learning parameters of smooth trajectory basis functions [19] [3], but the imposed control smoothness limits the dynamic interaction that can occur with the manipulated object.

Optimal control provides a principled approach to determining control policies that are compliant and dynamic [2], and can also handle contact dynamics [27], but requires identified models of the robot and task. Instead, we use state-of-the-art model-free reinforcement learning to directly learn policies for movement control without having to explicitly model uncertain robot and task dynamics. The challenges that come with manipulators using tendon actuation constitute an important and previously unmet challenge for model-free reinforcement learning.

In this work we use Policy Improvement with Path Integrals (PI², Section IV) [24, 22] for learning complex manipulation tasks with a tendon-driven biomimetic robotic hand (Section II). PI² concentrates sampling of the state space around a rough initial demonstration or previously learned strategy, making it effective in high-dimensional problems. We introduce a structure by which PI² can learn a discontinuous variable impedance control policy that enables tasks requiring contact, motion, and force control during object interaction. With respect to previous results on variable stiffness control with reinforcement learning [3], here we are not using any policy parameterizations that are based on function approximation. Instead, we represent trajectories and control gains as Markov diffusion processes. This choice expands the dimensionality of the controllable space and allows for better exploration of the nonlinear dynamics of the ACT hand and the task.

The learned strategies can yield insight into subtleties of the plant, showing how biomimetic robotics can not only use inspiration from nature to achieve robotic goals, but can provide insights into the systems which they mimic (Section VII-D).

Videos of the experiments can be found at http://tendondriven.pbworks.com/.

II. TENDON-DRIVEN BIOMIMETIC HAND

The robotic hand mimics the interaction among muscle excursions and joint movements produced by the bone and tendon geometries of the human hand, as in [6]. The index finger has the full 4 degree-of-freedom joint mobility and is controlled by six motor-driven tendons acting through a
crocheted tendon-hood. Two tendons, the FDS and FDP, act as flexors; the EI, RI, and PI act as extensors and ab/adductors; the LUM is an abductor but switches from extensor to flexor depending on finger posture. By sharing the redundancies and nonlinearities of human hands [5], the system constitutes a challenging testbed for model identification, control, and task learning, while also providing a unique perspective for the study of biomechanics and human motor control.

The experiments presented here use only the index finger, with a silicon rubber skin on the palmar surface of the distal segment. The brushless DC motors that actuate the 6 tendons are torque-controlled at 200 Hz and measure tendon displacements at a resolution of 2.30 μm; the tendon lengths alone are used for feedback control, as there is no direct measurement of joint kinematics. Successfully performing manipulation tasks thus requires a control policy that can handle the nonlinear dynamics and high dimensionality of the robot as well as the dynamics of the task itself.

Fig. 2: Playback of the demonstration trajectories using a constant-gain proportional controller fails to achieve task-performing tendon trajectories, and does not exhibit compliance for task-irrelevant time periods. Acronyms are anatomical tendon names, e.g. Flexor Digitorum Profundis (FDP).

III. SLIDING SWITCH TASK

The kinematically simple task of sliding a switch is difficult to perform expertly with a tendon-driven finger; contact and task dynamics constitute a large part of the force required from the controlling tendons. An important research topic in neuromuscular control is how humans achieve such hybrid control, transitioning from motion to force control, as in tapping a finger on a rigid surface [29]. Even for isometric tasks it is nontrivial to decode the recorded activations of muscles and understand how these act through tendons to the end effector [28]. In this paper we examine the task of contacting a sliding switch and pushing it down (see Figure 3). The switch in our apparatus is coupled to a belt and motor which allow the imposition of synthetic dynamics. The position of the switch x is measured with a linear potentiometer. Importantly, the finger loses contact with the switch at x_reach before reaching the bottom of the possible range, denoted x_min.

Fig. 3: Hybrid Switch Task. During the third experiment, spring dynamics are added to the natural switch dynamics, in the form of a virtual spring force F. x_min is the physical extent of the sliding switch, but the finger loses contact before reaching it, at a point x_reach which is dependent on finger movement.

We begin with a single demonstration of the desired task in which a human holds the finger and moves it through a motion of pushing the switch down. The tendon excursions produced by this externally-powered example grossly resemble those required for the robot to complete the task, but simply replaying them using a general-purpose PID controller would not result in successful task completion for two main reasons. Firstly, during demonstration the tendons are not loaded, which changes the configuration of the tendon network in comparison to when it is actively moving. Secondly, and more importantly, the tendon trajectories encountered during a demonstration do not impart any information about the forces required to accommodate the dynamics of the task. For instance, at the beginning of the task, the finger must transition from moving through air freely to contacting and pushing the switch. A feedback controller following a reference trajectory has no way of anticipating this contact transition, and therefore will fail to initially strike the switch with enough force to produce the desired motion.

Without any prior system identification of the robot or sliding switch, PI² can directly learn a control policy that minimizes a cost associated with the position of the switch, feedback gain, and tendon travel.

IV. POLICY IMPROVEMENT WITH PATH INTEGRALS: PI²

In this section we review the framework of reinforcement learning based on path integrals. The stochastic optimal control problem is a constrained optimization problem formulated as follows:

V(x) = min_{u(x,t)} J(x, u) = min_u ∫_t^{t_N} L(x, u, t) dt    (1)
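A toy simulation can make the hybrid nature of the task concrete. The overdamped plant below, its constants, and the contact rule are our own stand-ins for the belt-and-motor apparatus, not a model of it.

```python
# Toy sliding-switch plant (our own stand-in for the apparatus, not a model
# of it). The finger can only push while in contact, contact is lost at
# x_reach before the physical limit x_min, and Experiment 3's virtual
# spring f = k_spring * (x_max - x) drives the switch back toward the top.

def step_switch(x, finger_force, dt=0.005, x_min=0.0, x_max=4.0,
                x_reach=0.5, k_spring=0.0, damping=5.0):
    """Overdamped switch: velocity proportional to the net force."""
    push = finger_force if x > x_reach else 0.0   # contact rule
    spring = k_spring * (x_max - x)               # restoring force (Exp. 3)
    v = (spring - push) / damping                 # positive v moves the switch up
    return min(max(x + v * dt, x_min), x_max)

x = 4.0
for _ in range(400):              # constant push, no spring (Experiments 1-2)
    x = step_switch(x, finger_force=40.0)
# The switch stalls just below x_reach once contact is lost, above x_min.

x_up = step_switch(1.0, finger_force=0.0, k_spring=10.0)
# With the spring active and no push, the switch moves back up.
```

The two runs mirror the task structure described above: without the spring the switch simply stops where the finger loses contact, while the spring makes overshooting costly because the switch returns toward its home position.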
subject to the nonlinear stochastic dynamics:

dx = α(x)dt + C(x)u dt + B(x)δω    (2)

with x ∈ ℝ^{n×1} denoting the state of the system, u ∈ ℝ^{p×1} the control vector and δω ∈ ℝ^{p×1} Brownian noise. The function α(x) ∈ ℝ^{n×1} is the drift, which can be a nonlinear function of the state x. The matrix C(x) ∈ ℝ^{n×p} is the control transition matrix and B(x) ∈ ℝ^{n×p} is the diffusion matrix.

TABLE I: Policy Improvement with Path Integrals (PI²).

• Given:
  – An immediate state-dependent cost function q(x_t)
  – The control weight R ∝ Σ⁻¹
• Repeat until convergence of the trajectory cost:
  – Create K roll-outs of the system from the same start state x₀ using stochastic parameters u + δu_s at every time step t_i
  – For k = 1...K, compute:
    ∗ P(τ_{i,k}) = e^{−(1/λ)S(τ_{i,k})} / Σ_{k=1}^{K} e^{−(1/λ)S(τ_{i,k})}
    ∗ S(τ_{i,k}) = φ_{t_N} + Σ_{j=i}^{N−1} q_{t_j} dt + ½ Σ_{j=i}^{N−1} ‖u + δu_s‖²_M
  – For i = 1...(N−1), compute:
    ∗ δu(x_{t_i}) = Σ_{k=1}^{K} P(τ_{i,k}) δu_s(t_i, k)
  – Update u ← max(u_min, min(u + δu, u_max))

Under the optimal controls u∗ the cost function is equal to the value function V(x). The immediate cost L(x, u, t) is expressed as:

L(x, u, t) = q(x, t) + ½ uᵀRu    (3)

The running cost L(x, u, t) has two terms: the first, q(x, t), is an arbitrary state-dependent cost, and the second is the control cost with weight R > 0. The optimal controls u∗(x, t) are a function of the cost-to-go V(x, t) as follows:

u∗(x, t) = −R⁻¹C(x)ᵀ∇ₓV(x, t)    (4)

The role of optimal control is to drive the system towards parts of the state space with small values of V(x, t). The value function V(x, t) satisfies the Hamilton-Jacobi-Bellman (HJB) partial differential equation [7, 20]. Recent work on path integral control [24, 11] exploits the logarithmic transformation of the value function V(x, t) = −λ log Ψ(x, t) to turn the HJB equation into a linear PDE whose solution [8, 13] is represented via forward sampling of diffusion processes. The outcome is the expression that follows:

V(x, t_i) = −λ log ∫ P_path e^{−(1/λ)(φ_{t_N} + Σ_{j=i}^{N−1} q_{t_j} dt)} dx_N    (5)

where the probability P_path = P(x_N, t_N | x_i, t_i) has the form of a path integral starting from the initial state x_i and ending in state x_N under the uncontrolled (u = 0) stochastic dynamics in (2). In [24], [12] it has been shown that under the solution of the value function in (5) the path integral optimal control takes the form:

u_PI(x_{t_i}) dt = lim_{dt→0} ∫ P(τ_i) δω_{t_i}    (6)

where τ_i is a trajectory in state space starting from x_{t_i} and ending in x_{t_N}, so τ_i = (x_{t_i}, ..., x_{t_N}). The probability P(τ_i) is defined as

P(τ_i) = e^{−(1/λ)S̃(τ_i)} / ∫ e^{−(1/λ)S̃(τ_i)} dτ_i    (7)

The term S̃(τ_i) is defined as:

S̃(τ_i) ∝ φ(x_{t_N}) + ∫_{t_i}^{t_N} q(x_{t_j}) δt + ∫_{t_i}^{t_N} ‖ (x_c(t_j + δt) − x_c(t_j))/δt − α_c(x(t_j)) ‖²_{Σ_{t_j}⁻¹} δt    (8)

which in discrete time is approximated by:

S(τ_i) ∝ φ(x_{t_N}) + Σ_{j=i}^{N−1} q(x_{t_j}) dt + ½ Σ_{j=i}^{N−1} δω_{t_j}ᵀ M δω_{t_j}    (9)

with M = (B(x)ᵀC(x)R⁻¹C(x)ᵀB(x))⁻¹. Essentially the optimal control is an average of variations δω weighted by their probabilities. This probability is inversely proportional to the path cost according to (7) and (9): low-cost paths have high probability and vice versa. Table I illustrates the path integral control in iterative form as applied to the ACT Hand. Alternative iterative formulations based on Girsanov's theorem [13], resulting in small differences in the biasing terms of the path cost S̃, are available in [23].

V. PI² LEARNING VARIABLE TENDON IMPEDANCE

A. Variable Impedance

If interaction forces with the environment were negligible, a standard approach would be to apply negative feedback control with as aggressive gains as possible while still maintaining stability. High feedback gains, however, are dangerous for robots interacting with humans and are potentially wasteful when perturbations are not task-relevant. The idea of impedance control is to control the dynamic behavior of the manipulator in addition to commanding a reference state trajectory [10] [3] [4]. Variable stiffness impedance control specifies a schedule of gains emphasizing uncertain or unforgiving components of the task, while allowing compliance elsewhere.

For complex tasks and unknown environmental dynamics, a suitable target impedance schedule is difficult to specify a priori. For biological and some biomimetic systems, stiffness is achieved by coactivation of antagonist muscles, introducing further challenge for impedance scheduling due to complex actuator dynamics [9] [16]. Here we define the basic structure of our controller, then describe how learning is applied.
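One iteration of the Table I update can be sketched as follows. The vectorized form, variable names, and toy costs are our own; the plant itself is omitted, so only the reweighting of Eq. (7) and the bounded parameter update are shown.

```python
# Minimal sketch of one PI^2 improvement step from Table I (our own code).
# Rollout path costs S(tau_{i,k}) are mapped to probabilities via the
# exponentiation of Eq. (7); the policy update is the probability-weighted
# average of the sampled variations, then clipped to [u_min, u_max].
import numpy as np

def pi2_update(u, delta_us, costs, lam=1.0, u_min=-1.0, u_max=1.0):
    """u: (N,) controls; delta_us, costs: (K, N) per-rollout noise and cost-to-go."""
    S = costs - costs.min(axis=0, keepdims=True)   # stabilize the exponentials
    P = np.exp(-S / lam)
    P /= P.sum(axis=0, keepdims=True)              # P(tau_{i,k}) of Eq. (7)
    delta_u = (P * delta_us).sum(axis=0)           # weighted average of variations
    return np.clip(u + delta_u, u_min, u_max)      # bounded update, as in Table I

rng = np.random.default_rng(0)
K, N = 10, 50                                      # rollouts per trial, time steps
delta_us = 0.1 * rng.standard_normal((K, N))
costs = rng.random((K, N))                         # stand-in rollout costs
u_new = pi2_update(np.zeros(N), delta_us, costs)
```

Low-cost rollouts dominate the average, so the update drifts toward the variations that performed best, while the exploration noise itself provides the sampling over trajectories.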
Consider motor commands τ calculated via a proportional controller:

τ_i = −k_i(l_i − l̂_i)    (10)

where k_i is the proportional gain and l̂_i is the desired reference length for tendon i. By varying k in time according to a gain schedule, we may achieve variable impedance specific to each tendon. The reference trajectories may differ from the demonstration to anticipate task dynamics.

B. Learning Variable Impedance Using PI²

We apply PI² to learning controls with no time-averaging of the learned parameter vector. We use a time-varying impedance controller with a differential equation on the gains and reference lengths:

dk(t) = (−αk(t) + u_k(t)) dt + σ_k δu_k    (11)

dl̂(t) = (−αl̂(t) + u_l̂(t)) dt + σ_l̂ δu_l̂    (12)

where u_k and u_l̂ are the change in gains and reference trajectories. δu_k and δu_l̂ are sampled from a zero-mean Normal distribution as in the PI² algorithm, Table I.

The smoothing effect of these differential equations provides a means for using anything from unsmoothed (α → ∞) to highly smoothed (α = 0) controls. In either case, no time averaging acts on the level of the learning algorithm [19]. The experiments presented here use α = 0.9/dt = 180, providing a very slight smoothing effect to the control outputs. Section VII presents learned controls which, although quite noisy, produce smooth tendon trajectories with contact-produced discontinuities, as illustrated in Figure 5.

The controls vector u = u(t) + δu(t) optimized by the learning algorithm (see Table I) corresponds to a concatenation of gain and reference length controls: u = [u_k(t), u_l̂(t)]ᵀ. PI² learns the optimal gain and reference length for each tendon at each timestep, a considerable number of parameters. To combat this high dimensionality without adversely affecting the learning, we subsample the injected noise δu to change every 50 timesteps during a rollout. Such windowing is especially important for systems like ours, whose high-order dynamics filter out much of the injected noise.

Learning proceeds through iterative revision of the policy parameters. Each of these revisions we refer to as a trial. A sample trajectory is queried from the system by sampling δu, and actually performing a switch-slide using the resulting u + δu. We refer to one of these exploratory executions of the task as a rollout. To revise u at the end of a trial, each sampled control strategy is weighted according to the cost encountered by the corresponding rollout (Table I). The results reported here use σ_k = 50 for sampling gains and σ_l̂ = 0.05 for sampling reference trajectories. The smaller this exploration variance is, the more similar rollouts are, so the magnitude of σ should depend on the natural stochasticity of the plant, though here it is set by hand. Convergence is qualitatively insensitive to the exact value of σ, and high variance results in low control cost (Section IV) and therefore increased control authority. Each trial consists of ten rollouts, and after every third trial performance is evaluated by executing three exploration-free rollouts (σ = 0).

The controls learned for each tendon are linked to the others only through their shared effect on the system reflected in the rollout costs. In this way, each tendon-specific PI² can be considered a parallel agent operating on its own.

VI. COST FUNCTION AND EXPERIMENTS

A. Cost Function

We conduct three experiments using the same setup and demonstration. For all experiments, the cost-to-go function for a rollout having duration T may be expressed as:

C_t = q_terminal(x_T) + Σ_{t}^{T} (q(x_t) + u_tᵀRu_t)    (13)

Here x_t corresponds to the position of the switch at time t, q(x_t) is the cost weighting on the switch state, q_terminal(x_T) is the terminal cost at the end of the rollout, and R is the cost weighting for controls. The experiments presented here use q(x_t) = 20000 x_t and q_terminal(x_T) = 300 q(x_T), meaning the terminal state cost is as costly as 300 time steps.

B. Gain Scheduling: Experiment One

For the first experiment PI² learns only the gains K. Therefore the sampling noise σ_l̂ = 0. This means that the learned strategy will always kinematically resemble the demonstration, but will optimize compliant control of the dynamics.

C. Gains and Reference Lengths: Experiment Two

The second experiment is to learn both gains and reference lengths simultaneously. The learned strategy may take advantage of kinematic poses not experienced in the demonstration.

D. Stabilizing Spring Dynamics: Experiment Three

In the third experiment PI² learns both gains and reference lengths, and we also introduce spring behavior to the switch dynamics, requiring a qualitatively different learned optimal strategy. The intrinsic switch dynamics are retained, but the motor coupled to the switch also resists the finger with a stiffness force f_spring = k(x_max − x). This force springs the switch back up toward its home position if the finger pushes too far and loses contact. For the first two experiments, the optimal strategy could include pushing past the point of losing contact with the switch, but for the third the finger must proceed quickly to the edge of losing contact but go no further.

VII. RESULTS

The task is successfully learned for each experiment. Figure 6 depicts the sum cost-to-go results as learning proceeds for 100 trials (revisions of control) for two separate executions of each experiment. Every third trial, three unexploratory rollouts are performed, and their means and standard deviations are denoted by the solid line and shaded width in the figure. Cost
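The controller of Eqs. (10)-(12) can be written out as a short Euler discretization. This sketch is ours (the integration scheme and the toy inputs are assumptions, not the authors' code), including the 50-step noise windowing described above.

```python
# Sketch of the variable-impedance controller of Eqs. (10)-(12) (our own
# Euler discretization, not the authors' code). Gains k and references
# l_hat evolve as smoothed diffusions driven by the learned controls plus
# exploration noise held constant over 50-step windows; the motor command
# is the proportional law of Eq. (10), tau_i = -k_i * (l_i - l_hat_i).
import numpy as np

def rollout_controls(u_k, u_l, dt=0.005, alpha=180.0, sigma_k=50.0,
                     sigma_l=0.05, window=50, rng=None):
    """u_k, u_l: (N,) learned controls; returns gain and reference schedules."""
    if rng is None:
        rng = np.random.default_rng()
    N = len(u_k)
    n_win = -(-N // window)                        # ceil(N / window)
    noise_k = np.repeat(sigma_k * rng.standard_normal(n_win), window)[:N]
    noise_l = np.repeat(sigma_l * rng.standard_normal(n_win), window)[:N]
    k = np.zeros(N)
    l_hat = np.zeros(N)
    for t in range(1, N):                          # Euler steps of Eqs. (11)-(12)
        k[t] = k[t-1] + (-alpha * k[t-1] + u_k[t-1]) * dt + noise_k[t-1]
        l_hat[t] = l_hat[t-1] + (-alpha * l_hat[t-1] + u_l[t-1]) * dt + noise_l[t-1]
    return k, l_hat

def motor_command(k, l, l_hat):
    return -k * (l - l_hat)                        # Eq. (10), per tendon

k, l_hat = rollout_controls(np.full(200, 100.0), np.zeros(200),
                            rng=np.random.default_rng(1))
tau = motor_command(k, np.zeros(200), l_hat)
```

With α·dt = 0.9, each step retains only a tenth of the previous gain, matching the very slight smoothing reported in the text; setting the σ values to zero recovers the exploration-free evaluation rollouts.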
Fig. 4: Final posture attained for early, middle, and late learning (rows) for each experiment (columns). In pane (f) we observe the switch being thrown past the last point of finger contact to its physical limit. Pane (h) depicts a strategy which has pushed the finger too far, and the switch has bounced upward due to the added spring force (experiment three, Section V). In pane (i) we see the very tip of the finger just preventing the switch from returning.

Fig. 5: Results after learning (trial 100) for all three experiments. The first row depicts the torque commands τ (mA) learned for all tendons; dynamic requirements of the tasks are apparent in the learned strategies. For instance, Experiment 2 appears to require two bursts of torque to achieve vigorous flexion, but Experiment 3 requires another in order to arrest finger motion before losing contact. The middle row depicts the resulting tendon trajectories L (cm), and the third row depicts the position of the switch x over time.
Fig. 6: Learning as trials progress. In experiment one (left), only the gains are learned and the reference lengths recorded during the demonstration are used. In experiment two (middle), both the gains and reference lengths of the tendons are learned. In experiment three (right), additional spring dynamics resist switch displacement from the initial position. The solid line indicates mean cost for three unexploratory rollouts every third trial; the shaded area is the standard deviation. Each of the two curves corresponds to a repetition of the experiment beginning with the same initial control strategy.

magnitude is different for each experiment due to different control exploration (Equation 13), so the sum cost-to-go for the first noise-free rollout is normalized for comparison.

Figure 5 shows the commanded torques, resulting tendon trajectories, and switch position trajectories for the final learned control strategies for each experiment.

A. Gain Scheduling: Experiment One Results

In Experiment 1, the optimized gain schedule produces a movement which smoothly pushes the switch to near the position attained in the demonstration. As can be seen in Figure 5, top left pane, the learned strategy uses the two primary flexors very similarly. After 100 trials, the total cost is reduced by about 15% from the initial policy.

B. Gains and Reference Lengths: Experiment Two Results

The strategy found for Experiment 2, learning both the gains and references, differed between the two executions of the experiment. Both learned strategies quickly move the switch down, past the kinematic pose observed in the demonstration. During the second execution, however, PI² learned that the switch could be thrown down, using the inertia of the switch to carry it past the last point of finger contact, x_reach. The result of this highly dynamic behavior can be seen in Figure 4, pane (f). The switch is thrown to its physical limit x_min in a single ballistic motion, as evidenced in the switch position trajectory pictured in Figure 5, middle-bottom.

C. Stabilizing Spring Dynamics: Experiment Three Results

Experiment 3 presents the most dynamically challenging task. In this experiment, the motor resisting switch displacement implicitly imposes a task constraint: the finger must move quickly to push the switch as before, but now it must stop quickly to prevent overshoot. As in the other experiments, Figure 5, bottom-right, presents the switch trajectory for the learned strategy. The learned strategy makes a large initial switch movement followed by a burst of flexion and then co-contraction of flexors and extensors. In order to better illustrate the effect of the additional spring dynamics, Figure 7 presents a suitably stopping (top) and a failed stop (bottom) switch trajectory. Recall that the control policy incorporates feedback of tendon lengths only; measurement of the switch position is used in evaluating a rollout but not within the control loop.

Fig. 7: Above: Switch position learned for Experiment 3, in which additional spring dynamics resist switch displacement from the initial position. Below: Switch position for a middle-learning failure to stop before leaving contact with the switch.

Fig. 8: Torques resulting from the learned control strategy for experiment three, in which additional spring dynamics resist switch displacement from the initial position.
D. The Role of the Lumbrical Tendon

Experiment 3 yields insight into the role of the lumbrical tendon. In that experiment the finger must move quickly but then stabilize the switch at the extent of the finger's reach, corresponding to switch position x_reach in Figure 3. This requires more elaborate use of the extensor tendons than in the first two experiments, in which extensors primarily stabilize the abduction-adduction motion of the finger.

Examining the strategy learned by the controller for this tendon serves as a novel way of illuminating the complex role it plays. The tendon extensor mechanism matches the human tendon network in important ways [30], including finger lumbricals. Lumbricals are smaller tendons which attach to other tendons, both flexors and extensors, instead of bone. This produces a moment arm relationship to joint torques which is complex and highly pose-dependent. Roughly, it is a flexor of the metacarpophalangeal (MCP) joint, the "knuckle", while being an extensor of the two distal joints. With the hand outstretched, thumb up, the lumbrical would be used to push the fingertip as far as possible from the palm, straightening the finger while flexing the knuckle. This is exactly the motion necessary for the finger to contact the switch for the longest amount of time during the downstroke.

By examining the control strategy learned for the lumbrical tendon in concert with the more straightforward tendons, we observe its protean nature. As illustrated in Figure 8, it acts as a flexor at the beginning of the motion, with a clear peak of torque being delivered in concert with the FDP/FDS flexors at the moment of greatest switch movement. However, after motion has ceased we observe cocontraction of all tendons, but the pattern of activation of the lumbrical more closely matches the torque profile of the extensors.

VIII. RELATED WORK AND DISCUSSION

A. Optimal Control for Tendon-Driven Systems

Optimal control provides a principled approach for robots to expertly perform tasks without relying on pre-programmed or stereotyped motions. Recent successes in applying algorithms like iLQG [17, 15, 18] suggest that difficult control tasks, like dexterous manipulation, may be better achieved by formulating models and cost functions than by laboriously hand-tuning reference trajectories or feedback gains directly. Notably, optimal control can solve for time-varying torque and stiffness profiles to achieve dynamic tasks compliantly [2].

In contrast, however, with [17, 15, 2], we do not use state-space dynamical models. PI² is a derivative-free method in the sense that it does not require linearization of dynamics and quadratic approximations of cost functions on state-space trajectories. More importantly, the optimal control in our approach is an average over sampled controls based on how well they performed on the real system, and thus no differentiation of the value functions is performed. With respect to previous work on variable stiffness control with reinforcement learning [4, 3], our approach does not use function approximators to represent gains or desired trajectories. This lack of policy parameterization allows us to learn non-smooth trajectories and control gains required for tasks that involve contact, which is particularly important when controlling tendon-actuated manipulators. Tendon-driven systems are particularly advantageous in applications like dexterous hands, where it is desirable to have compact and low-inertia end effectors [14]. The associated increase in control dimensionality and nonlinearity, as well as the relatively greater prominence of task dynamics, makes system identification of the task and the robot less tractable. Model-based control methods for systems reflecting the biomechanical complexities of hands are very much an open research topic, due mostly to the interaction of tendons in the network.

This paper shows that by using a model-free reinforcement learning algorithm like PI², we can bridge the gap between optimal control and tendon-driven systems: we simultaneously enjoy the benefit of choosing cost functions instead of reference trajectories or gains while also circumventing the need for an accurate model of the robot and environment. Future work will investigate the generalizability of learned policies and the possibility of using this algorithm also as a data collection mechanism for system identification.

The absence of models for the underlying hand and task dynamics, as well as the lack of policy parameterization, does not come without cost. As demonstrated in this paper, our learning approach converges to an optimal solution for all experiments; nevertheless, this may require the execution of many rollouts on the real system. This is not a surprising observation, since it is related to the exploitation-exploration trade-off in the reinforcement learning literature [21], in correspondence to how much information about the underlying dynamics is initially provided to the learning algorithm. Clearly, once a model is known, its use could speed up learning. Future work will explore the use of Stochastic Differential Dynamic Programming (SDDP) [25] or iLQG on contact-less models and integrate the resulting control policies with PI² to learn tasks with contact and motion-to-force control transitions.

B. Discovery of System Phenomena

The second execution of Experiment 2 discovered that the switch could be thrown beyond x_reach, and the learned behavior for Experiment 3 involved cocontraction of antagonists, as seen in biological tendon actuation, without explicit instruction. By interacting directly with the environment, the robotic system learned to perform dynamic behavior never encountered in the demonstration, exploiting subtle phenomena without system ID. The advantages of this kind of embodied learning have been explored in a variety of experiments, for example learning circuit configurations [26] or robot control and morphology [1]. Manipulators incorporating complex actuation and eventually sensory capabilities could benefit from learning from task-relevant experience as opposed to expert knowledge.

Similarly, analysis of the effects of varying aspects of complex systems like tendon networks can be difficult due to dimensionality, model mismatch, and nonlinear phenomena.
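The model-free update described above — averaging sampled controls according to how well they performed, with no differentiation of a value function — can be sketched in a simplified, parameter-space form. The temperature `lam`, noise scale, and rollout interface below are illustrative choices, not the exact PI² formulation used in the paper:

```python
import numpy as np

def pi2_style_update(theta, rollout_cost, n_rollouts=20, noise_std=0.1,
                     lam=1.0, rng=None):
    """One derivative-free update: perturb the policy parameters, evaluate
    each perturbation on the (real or simulated) system, then average the
    perturbations weighted by exp(-cost / lam)."""
    rng = rng or np.random.default_rng(0)
    eps = rng.normal(0.0, noise_std, size=(n_rollouts, theta.size))
    costs = np.array([rollout_cost(theta + e) for e in eps])
    # softmax over negative costs: low-cost rollouts dominate the average
    w = np.exp(-(costs - costs.min()) / lam)
    w /= w.sum()
    return theta + w @ eps
```

Iterating this update on a toy quadratic cost drives the parameters toward the minimum without ever linearizing the dynamics or approximating the cost, which is the property the discussion above relies on.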
As described in Section VII-D, the learned policy produces torques which validate the biomimetic anatomical properties of the tendon hood extensor mechanism. The approach outlined in these experiments is valid for a variety of complex systems which might otherwise be difficult to measure or simulate. By observing that the learned usage of the lumbrical tendon produces dynamic consequences similar to those known for humans, we can now more confidently assert the biomimicry of that aspect of the tendon network.

REFERENCES

[1] J. Bongard. The utility of evolving simulated robot morphology increases with task complexity for object manipulation. Artificial Life, 16(3):201–223, 2010.
[2] D.J. Braun, M. Howard, and S. Vijayakumar. Exploiting variable stiffness in explosive movement tasks. In Robotics: Science and Systems Conference, 2011.
[3] J. Buchli, E. Theodorou, F. Stulp, and S. Schaal. Variable impedance control - a reinforcement learning approach. In Robotics: Science and Systems Conference, 2010.
[4] J. Buchli, F. Stulp, E. Theodorou, and S. Schaal. Learning variable impedance control. International Journal of Robotics Research, 30(7):820, 2011.
[5] A.D. Deshpande, J. Ko, D. Fox, and Y. Matsuoka. Anatomically correct testbed hand control: muscle and joint control strategies. In IEEE International Conference on Robotics and Automation, pages 4416–4422, 2009.
[6] A.D. Deshpande, Z. Xu, M. J. V. Weghe, L. Y. Chang, B. H. Brown, D. D. Wilkinson, S. M. Bidic, and Y. Matsuoka. Mechanisms of anatomically correct testbed (ACT) hand. IEEE Trans. Mechatronics, to appear, 2012.
[7] P. Dorato, V. Cerone, and C. Abdallah. Linear Quadratic Control: An Introduction. Krieger Publishing Co., Inc., 2000.
[8] A. Friedman. Stochastic Differential Equations and Applications. Academic Press, 1975.
[9] N. Hogan. Adaptive control of mechanical impedance by coactivation of antagonist muscles. IEEE Transactions on Automatic Control, 29(8):681–690, 1984.
[10] N. Hogan. Impedance control: An approach to manipulation. Journal of Dynamic Systems, Measurement, and Control, 107(2):17, 1985.
[11] H. J. Kappen. Linear theory for control of nonlinear stochastic systems. Physical Review Letters, 95:200201, 2005.
[12] H. J. Kappen. Path integrals and symmetry breaking for optimal control theory. Journal of Statistical Mechanics: Theory and Experiment, 11:P11011, 2005.
[13] I. Karatzas and S.E. Shreve. Brownian Motion and Stochastic Calculus. Springer, 2nd edition, August 1991.
[14] H. Kobayashi, K. Hyodo, and D. Ogane. On tendon-driven robotic mechanisms with redundant tendons. The International Journal of Robotics Research, 17(5):561–571, 1998.
[15] P. Kulchenko and E. Todorov. First-exit model predictive control of fast discontinuous dynamics: Application to ball bouncing. In IEEE International Conference on Robotics and Automation, pages 2144–2151, 2011.
[16] S.A. Migliore, E.A. Brown, and S.P. DeWeerth. Biologically inspired joint stiffness control. In IEEE International Conference on Robotics and Automation, pages 4508–4513, 2005.
[17] D. Mitrovic, S. Nagashima, S. Klanke, T. Matsubara, and S. Vijayakumar. Optimal feedback control for anthropomorphic manipulators. In IEEE International Conference on Robotics and Automation, pages 4143–4150.
[18] D. Mitrovic, S. Klanke, and S. Vijayakumar. Learning impedance control of antagonistic systems based on stochastic optimization principles. International Journal of Robotics Research, 30(5):556–573, 2010.
[19] E. Rombokas, E. Theodorou, M. Malhotra, E. Todorov, and Y. Matsuoka. Tendon-driven control of biomechanical and robotic systems: A path integral reinforcement learning approach. 2012.
[20] R.F. Stengel. Optimal Control and Estimation. Dover Publications, New York, 1994.
[21] R. S. Sutton and A. G. Barto. Reinforcement Learning: An Introduction. Adaptive Computation and Machine Learning. MIT Press, Cambridge, 1998.
[22] E. Theodorou. Iterative Path Integral Stochastic Optimal Control: Theory and Applications to Motor Control. PhD thesis, University of Southern California, 2011.
[23] E. Theodorou and E. Todorov. Relative entropy and free energy dualities: Connection to path integral and KL control. Submitted, 2012.
[24] E. Theodorou, J. Buchli, and S. Schaal. A generalized path integral approach to reinforcement learning. Journal of Machine Learning Research, 11:3137–3181, 2010.
[25] E. Theodorou, Y. Tassa, and E. Todorov. Stochastic differential dynamic programming. In American Control Conference, 2010.
[26] A. Thompson. An evolved circuit, intrinsic in silicon, entwined with physics. Evolvable Systems: From Biology to Hardware, pages 390–405, 1997.
[27] E. Todorov, T. Erez, and Y. Tassa. MuJoCo: A physics engine for model-based control.
[28] F.J. Valero-Cuevas, F.E. Zajac, C.G. Burgar, et al. Large index-fingertip forces are produced by subject-independent patterns of muscle excitation. Journal of Biomechanics, 31(8):693–704, 1998.
[29] M. Venkadesan and F.J. Valero-Cuevas. Neural control of motion-to-force transitions with the fingertip. Journal of Neuroscience, 28(6):1366–1373, 2008.
[30] D.D. Wilkinson, M.V. Weghe, and Y. Matsuoka. An extensor mechanism for an anatomical robotic hand. In IEEE International Conference on Robotics and Automation, 2003.
Guaranteeing High-Level Behaviors
While Exploring Partially Known Maps
Shahar Sarid Bingxin Xu Hadas Kress-Gazit
Sibley School of Mechanical and Aerospace Engineering
Cornell University
Ithaca, New York 14853, USA
Email: {sarid, bx38, hadaskg}@cornell.edu
Abstract—This paper presents an approach for automatically synthesizing and re-synthesizing a hybrid controller that guarantees a robot will exhibit a user-defined high-level behavior while exploring a partially known workspace (map). The approach includes dynamically adjusting the discrete abstraction of the workspace as new regions are detected by the robot's sensors, automatically rewriting the specification (formally defined using Linear Temporal Logic) and re-synthesizing the control while preserving the robot state and its history of task completion. The approach is implemented within the LTLMoP toolkit and is demonstrated using a Pioneer 3-DX in the lab.

This work was supported by NSF CNS-0931686 and NSF CAREER CNS-0953365.

I. INTRODUCTION

Consider the case of a "Search and Rescue" scenario where a mobile robot is patrolling inside a collapsed building with unknown rooms. The robot is required to explore all the accessible rooms and search for survivors. The robot's high-level behavior must be guaranteed to ensure safety and mission completion while expanding the map to include unknown regions.

Recently, several approaches have been suggested for generating correct high-level robot behavior [14, 12, 8, 15, 11, 2, 20] from an abstract description of a task; these approaches tackle the inherently continuous problem of robot sensing, motion and action by creating a discrete abstraction of the task, generating a provably-correct discrete solution and implementing the solution by composing a set of continuous low-level controllers such that the overall continuous behavior of the robot is the desired one.

The specification formalisms are usually a variant of temporal logic, with Linear Temporal Logic (LTL) being the most commonly used. The creation of the discrete solution is based on ideas from formal methods, mainly model checking [3] and synthesis [17] when assuming no noise on sensors and actuators (e.g. [14, 12, 20, 2]), and probabilistic model checking and policy synthesis for Markov Decision Processes (MDPs) when uncertainty is taken into account (e.g. [15, 10]). When implementing the continuous behavior, researchers have considered potential-field-type motion planners (e.g. [14, 12]) as well as sampling-based approaches (e.g. [11, 2]).

What is common to all aforementioned approaches is that the workspace is assumed to be known; that is, the robot is required to perform its high-level task within a known map. In the process of generating the control, the map is abstracted to a graph where the nodes are regions in the map and edges in the graph indicate that the robot can move between the two regions without going through a third region. In general, when considering a convex partition of a polygonal map, the edges represent the adjacency of the regions (an edge exists between nodes representing two regions that share a common face), since there are several low-level controllers that have been developed over the years that are guaranteed to drive a robot from any point within the convex polygon to one of its faces, and thus to the adjacent region (e.g. [4, 1, 16]).

This paper relaxes the assumption of an a priori completely known map; instead, an initial map is assumed which may contain as little as one region, and as the robot is performing its task, new regions are discovered and the task is adjusted accordingly in an automatic and provably-correct manner. The new regions are detected using an occupancy grid [7] and then added to the known map, gradually expanding and creating a larger map.

This work builds on the approach of [14] where a fragment of LTL is the formalism used to capture tasks that are reactive, meaning that the robot behavior depends on the occurrence of environmental events, for example "continually patrol all the rooms, stop if you see a person and ask them if they need help". Given a formula in the logic, a tractable synthesis technique is used to generate an automaton such that all its executions satisfy the LTL formula. The complexity of synthesizing an arbitrary LTL formula is double exponential in the size of the formula, but if the specification is restricted to a specific fragment of LTL, the complexity becomes polynomial in the state space [17].

The contribution of this paper is in providing guaranteed high-level behavior of robots operating in unknown workspaces. Specifically, this paper describes: (i) the set of user-specified high-level tasks, which is enriched by allowing quantifiers ('all', 'any') over regions; this allows a user to say "visit all offices"; (ii) the discrete abstraction of the workspace, which is updated on-the-fly based on sensor information; and (iii) a re-synthesis algorithm. The algorithm preserves the task
history and automatically updates the instructions of the robot behaviors, according to the new workspace.

Testing the re-synthesis method and algorithms in real environments with a physical robot equipped with range sensors gives rise to questions regarding when and how to decide a new region is detected. As a first step toward a generalized solution to the problem, assumptions regarding the environment are made (Section III).

The work described in this paper was implemented on a physical robot in the lab, where the onboard laser range scanner was used to detect previously unknown regions while the robot was performing a high-level task. The task described in Section VI required the robot to collect assignments from students in classrooms and hand them to professors in their offices. When the robot saw a hazardous item in front of a classroom (a cone), it avoided entering that classroom and raised a flag of warning (Fig. 1). The robot detected new rooms, distinguished between classrooms and offices according to their size, and continued behaving according to the specification in the a priori unknown workspace.

Fig. 1: The robot is visiting the classrooms. The cone represents a hazardous item in front of a classroom and the raised flag was the robot's response to detecting the cone. The cup contained a touch sensor and was used to collect assignments.

The paper is structured as follows: Section II presents background information regarding the approach and formalisms for generating high-level correct robot control. Section III defines the problem this paper is focusing on and the assumptions that are made. Section IV discusses the enriched grammar containing region quantifiers, and Section V describes the re-synthesis process. Section VI describes the experiments, and the paper concludes in Section VII.

II. BACKGROUND

This section presents the background information needed for the remainder of the paper. It includes the definition of the underlying logical formalism, an overview of the process of generating provably-correct robot control from logical formulas, and a description of LTLMoP [9], the toolbox that is used to generate the control and perform experiments.

A. Linear Temporal Logic (LTL) specifications

Linear Temporal Logic (LTL) is a modal logic that includes, in addition to Boolean operators (such as 'not', 'and', etc.), temporal operators which allow formulas to capture truth values of atomic propositions (π) as they evolve over time. LTL formulas are constructed from atomic propositions π ∈ AP according to the following recursive rules:

ϕ ::= π | ¬ϕ | ϕ ∨ ϕ | ○ϕ | ◇ϕ | □ϕ    (1)

where ○ is 'Next', ◇ is 'Eventually', and □ is 'Always'.

This work considers robot specifications that are defined over a discrete abstraction of the robot motion and action and are captured in a fragment of LTL. Specifically, the atomic propositions comprise a set X = {x_1, ..., x_m} of environment propositions corresponding to abstract sensor information (e.g. "object detected"), and a set Y = {r_1, ..., r_n, a_1, ..., a_k} of robot propositions that correspond to the location of the robot (if r_i is true then the robot is currently in region i) and its actions a_j (e.g. "flag is raised").

The fragment of LTL considered in this work follows [17, 14], where formulas are of the form ϕ = (ϕ^e ⇒ ϕ^s); ϕ^e is an assumption about the sensor propositions, and thus about the behavior of the environment, and ϕ^s represents the desired behavior of the robot. ϕ^e and ϕ^s have the following structure: ϕ^e = ϕ^e_i ∧ ϕ^e_t ∧ ϕ^e_g; ϕ^s = ϕ^s_i ∧ ϕ^s_t ∧ ϕ^s_g, where:

• ϕ^e_i, ϕ^s_i are non-temporal Boolean formulas constraining the initial value(s) for the environment and robot, respectively. They are of the form ∧_i B_i, where B_i is a Boolean formula over the set X ∪ Y.

• ϕ^e_t, ϕ^s_t represent safety assumptions on the environment and safety requirements on the robot's behavior. Safety includes all behaviors that the environment or the robot must always satisfy. ϕ^e_t constrains the next environment state based on the current environment state and current robot state, and ϕ^s_t constrains the possible next robot state based on the current environment, current robot state, and the next environment state. The formulas are of the form ∧_i □B_i, where B_i is a Boolean formula over the set X ∪ Y ∪ ○X for ϕ^e_t and X ∪ Y ∪ ○X ∪ ○Y for ϕ^s_t.

• ϕ^e_g, ϕ^s_g represent fairness assumptions for the environment and liveness requirements for the robot. These include goals the environment or the robot should always eventually satisfy. The formulas are of the form ∧_i □◇B_i, where B_i is a Boolean formula over the set X ∪ Y.

One property of the synthesis algorithm is that the order of the formulas in ϕ^s_g determines the sequence in which the robot will fulfill its liveness requirements. For a liveness requirement ϕ_i, the goal number gNum(ϕ_i) = i indicates that ϕ_i is in the i-th position in the sequence of goals. For example, in ϕ^s_g = □◇r_a ∧ □◇(r_b ∨ flag), the requirement that the robot eventually go to r_a is of gNum = 0 (the first liveness) and the requirement of going to r_b or raising the flag is of gNum = 1.

B. Discrete abstraction of the workspace

The robot's workspace is assumed to be a 2-dimensional polygonal environment. The motion of the robot in the
workspace is abstracted by a graph where each node represents a region and the edges represent adjacency relations between the regions. Leveraging controllers such as those of [4, 1, 16], given a set of convex polygons, a robot can move between any adjacent regions if there are no obstacles.

For example, from region r2 shown in Figure 2 the robot can move to adjacent regions r0, r1, r3 or stay in r2. The motion constraints are captured by:

□(r2 ⇒ ○(r2 ∨ r0 ∨ r1 ∨ r3))

C. Control generation

Given a robot task as an LTL formula belonging to the fragment described above, if the task is synthesizable [18], an automaton whose behaviors satisfy the formula will be automatically generated (see [17, 14] for details).

The hybrid controller used to continuously control the robot is based on the execution of the automaton. An admissible input sequence is a sequence X_1, X_2, ... s.t. X_j ∈ 2^X is the set of environment propositions that are true at time step j, that satisfies ϕ^e. A run of the automaton under an admissible input sequence is a sequence of states q_0, q_1, ..., which starts at a possible initial state of the automaton: q_0 ∈ Q_0. At each time step, the robot sensor information is used to determine the truth values of the environment propositions X, and together with the current state q the next state q′ is determined following the transition relation δ, i.e., q′ = δ(q, X). γ is the state labeling function where γ(q) = y and y ⊆ Y is the set of robot propositions that are true in state q. Based on the labels of q′, the next region and the next actions are performed and the appropriate low-level controllers are executed. The reader is referred to [14] for more details.

Every state q ∈ Q has an associated goal number, which indicates the current goal (liveness requirement) the robot is heading toward. This goal number is denoted by γ_r(q).

D. LTL MissiOn Planner (LTLMoP)

Linear Temporal Logic MissiOn Planning (LTLMoP) [9] is a Python-based, open-source toolkit that allows users to control physical and simulated robots by specifying high-level instructions in structured English. Furthermore, if a specification contains behaviors that cannot be guaranteed (they may be inconsistent or there may be an environment in which the robot fails), no automaton will be synthesized, and LTLMoP facilitates the process of understanding the problem in the given specification [18]. The experiments described in this paper were executed using LTLMoP, as further discussed in Section VI.

III. PROBLEM FORMULATION AND ASSUMPTIONS

This paper focuses on guaranteeing the execution of high-level tasks by a mobile robot operating in an a priori unknown environment. While executing its mission, the robot detects regions that are not defined in its current map. The basic assumptions regarding the unknown regions are as follows. First, the robot starts the execution with a partially known map. The initially known map can be arbitrarily small, on the order of the size of the robot. This assumption is essential, since the robot must be positioned in a free area first. Second, the robot must have the capabilities to detect new regions, i.e., it must have adequate sensors that can sense obstacles or lack thereof.

The robot is assumed to have the appropriate sensors and actuators to perform the high-level tasks it is instructed to do. These sensors and actuators are assumed to be binary, as described in Section II.

Definition 1: Robot model. The robot is assumed to be of size D, where size can be the diameter of a circular robot, or the longest dimension in the plane of movement. We assume that the robot's structure and dynamic properties constrain its movement such that it has a minimal operating circular area of radius sD (for example, for turning), where s ≥ 1 is a safety factor which depends on the robot's structure and dynamic properties. The robot's position and orientation relative to a fixed frame are assumed to be known.

Definition 2: Region. A region p with index i, denoted p_i, is a polygon in the plane, defined as a circuit of j line segments, sides [5], or edges [6], p_i1 p_i2, p_i2 p_i3, ..., p_ij p_i1, joining j points, or vertices, p_i1, p_i2, p_i3, ..., p_ij. The edges of p_i must not cross each other. The polygon is regarded as consisting of its vertices, edges, and the interior area bounded by it. Each region is simply connected, meaning there are no holes within it. The region p_i can be non-convex. A region is free from obstacles, and it contains a bounded circle r_i whose diameter d_ri satisfies d_ri ≥ sD to comply with Definition 1.

Definition 3: Initial region. The robot starts from initial region p, which conforms to Definition 2.

Definition 4: Workspace. The workspace is the map the robot uses, in which its operation can be guaranteed, and is composed of a set of regions as defined above. The current workspace is assumed to be known in each step i, and is denoted P^i. A step is defined as a stage of execution for which the map does not change. If the map changes, as explained later, the step is incremented. Each workspace has a boundary defined as follows.

Definition 5: Boundary. The boundary B^i of a workspace P^i is defined as a polygon, or a set of polygons, composed of all the edges of the regions of P^i which are not shared between two regions. If the workspace is a simple polygon, its boundary is composed of only one polygon. However, if the workspace contains holes, the boundary is composed of an outer polygon, and another polygon for each hole. Within each step, the workspace is assumed to be static inside its boundary, but dynamic in the sense that a previously occupied boundary edge can become unoccupied and serve as the edge of a newly detected region.

Definition 6: Expansion. The workspace can be dynamically expanded in each step. The robot starts in step 0 with initial workspace P^0, where P^0 contains at least the initial region p, and is possibly composed of a total of g regions, such that P^0 = p_1 ∪ p_2 ∪ p_3 ... ∪ p_g. In each step i, a new region p^i is added. Each newly added region p^i conforms to
Definition 2. Moreover, each p^i must be adjacent to at least one known region p_h, h = 1...i−1, or p_h ∈ P^0. The workspace is recursively defined as follows: P^0 = ∪_{d=1}^{g} p_d, and P^i = P^{i−1} ∪ p^i, ∀i > 0.

Fig. 2: Left: Initial known map. The robot position is marked with a white circle. Right: Two new, unknown regions, safe1 and danger1, are updated when the robot visits r2 and r3.

Definition 7: New-region sensor. The robot is assumed to have a new-region sensor, which can detect new regions. The new-region sensor is capable of distinguishing, up to its operating range, between free and occupied space¹. According to Definition 2, the dimensions of a new region must have minimal values; therefore, the sensor's operating range must be greater than sD.

Problem 1: (Operating in an unknown environment) Given an initial known workspace P^0, an initial region p ∈ P^0, a mobile robot equipped with a new-region sensor, initial conditions and task specifications, construct a controller such that the robot's behavior satisfies the user specification, in any admissible, possibly unknown environment.

The problem formulation is illustrated in the following example, which demonstrates a simple high-level task defined over an unknown workspace and executed using LTLMoP.

Example 1: Search and Rescue mission. Consider the case of a "Search and Rescue" scenario where a mobile robot is placed in a building which collapsed due to an earthquake. The robot must explore the inner parts of what is left of the building and search for survivors. The building is partially ruined, such that the original blueprint cannot be used. The robot is placed in a partially clear area, shown in Figure 2 (Left). There are two groups of regions: dangerous and safe. The robot is required to visit all the dangerous regions, r2 and r3, and to search for survivors (people). If the robot finds people, it will guide them back to one of the safe places: r0 or r1. If a new region is detected, it is identified by the sensor as safe or dangerous, and then it is added to the map.

IV. GRAMMAR

The grammar [13, 9] used by LTLMoP has been enriched with quantifiers and reactions to new regions needed for defining tasks in unknown maps.

¹The new-region sensor should be some sort of a range sensor (could be infrared, ultrasonic, laser scanner or camera using vision techniques).

A. Quantifiers

Consider the scenario in Example 1, where the same task is assigned over multiple regions. When a new region is detected, no matter whether safe or dangerous, extra specification is needed for defining the tasks over the new region. Therefore an automatic process for assigning tasks over multiple regions is necessary. To deal with such conditions, quantifiers are introduced in this section. To apply quantifiers over multiple regions, one defines groups as follows:

• Group groupName is region1, region2, region3...

If the groupName is used together with the quantifiers "all" or "any", the sentences are automatically converted into LTL formulas over the regions. The translation process is as follows:

if quantifier 'any':
    result = join region_i[:] with ∨
    substitute groupName with result
if quantifier 'all':
    result_i = substitute groupName with region_i
    join result_i[:] with ∧

Revisiting Example 1, the new specification would be:
• Group Safe is r0, r1
• Group Dangerous is r2, r3
• If you are not activating guide then visit all Dangerous
• If you are activating guide then visit any Safe

which is translated into

∧_{i∈{2,3}} □◇((¬a_guide) ⇒ r_i) ∧ □◇((a_guide) ⇒ (r_0 ∨ r_1))

These quantifiers facilitate writing specifications for unknown workspaces, as they do not require the explicit enumeration of all regions. When a new region is detected, the user is not required to manually write the additional specification, and the robot is able to automatically re-synthesize the controller and resume the execution. The grammar for re-synthesis is introduced in Section IV-B, and the algorithm is introduced in Section V.

B. Specification for re-synthesis

Now assume that in Example 1, when the robot enters r2, it detects a new dangerous region, danger1, adjacent to r2. Then, when it enters r3, it detects a new safe region, safe1, adjacent to r3. The new workspace is shown in Figure 2 (Right).

We define a special robot action "re-synthesize", which terminates the execution of the hybrid controller and re-generates the controller. "re-synthesize" is used in the same manner as a robot proposition in the requirements specification:

• If you are sensing regionSensor then do re-synthesize.

The re-synthesize action is activated when the regionSensor returns true. If the user explicitly indicates which group to add the new region to, the extra indication is allowed as follows:

• If regionSensor then do re-synthesize and add into groupName.

The following relation is defined: regionSensor → groupName. Later, during the re-synthesis process, if regionSensor = True,

380
the corresponding groupName is returned to ensure that the correct group is updated with the new region. Furthermore, only specifications applied to this group will need to be rewritten as new LTL formulas. If the robot is capable of distinguishing between different features of the new regions, the detected regions can be added into different groups accordingly. One sensor proposition for each of the groups is necessary, as in Example 1:
• If you are sensing safe-new-region then do re-synthesize and add it into Safe.
• If you are sensing dangerous-new-region then do re-synthesize and add it into Dangerous.

Algorithm 1 Low-level controller for the re-synthesize action
1: Break execution, reset s_new-region, a_re-synthesize
2: CurrRobotState ⇐ γ(q)
3: CurrEnvState ⇐ values of sensor propositions
4: NewInitState: q̂0 ⇐ CurrRobotState ∧ CurrEnvState
5: Modify discrete abstraction in workspace, get Ŷ (see Section V-B)
6: CurrGoalNum ⇐ γr(q)
7: Modify liveness conditions in LTL (see Algorithm 2)
8: Re-synthesize the automaton
9: Load new automaton and resume execution
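The control flow of Algorithm 1 can be sketched in code. The following is a minimal, self-contained illustration using hypothetical data structures (plain dictionaries and a caller-supplied synthesis function), not LTLMoP's actual API:

```python
# Hypothetical sketch of Algorithm 1 (not LTLMoP's actual API):
# the executor state and the specification are plain dictionaries,
# and `synthesize` stands in for the GR(1) synthesis call.

def resynthesize(state, spec, synthesize):
    """Interrupt execution, fold the current state into the spec,
    and return a freshly synthesized automaton."""
    # Line 1: break execution and reset the trigger propositions.
    state["running"] = False
    state["sensors"]["new_region"] = False
    state["actions"]["re_synthesize"] = False

    # Lines 2-4: the recorded robot and sensor values become the
    # initial condition of the new automaton.
    spec["init"] = dict(state["robot"], **state["sensors"])

    # Line 5: add the detected region to the discrete abstraction.
    spec["regions"].append(state["detected_region"])

    # Lines 6-7: remember which goal was being pursued; goals with a
    # lower number are complete and are moved behind the pending ones
    # (refined further by Algorithm 2).
    g = state["current_goal"]
    spec["goals"] = spec["goals"][g:] + spec["goals"][:g]

    # Lines 8-9: re-synthesize, load, and resume execution.
    automaton = synthesize(spec)
    state["running"] = True
    return automaton
```

With a stub such as `synthesize = lambda s: s`, calling `resynthesize` on a state whose current goal number is 1 rotates the completed goal behind the pending ones and records the new region, mirroring lines 1-9 above.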
V. CONTROLLER RE-SYNTHESIS DURING EXECUTION

This section describes the algorithms and automated process for automatically re-synthesizing the high-level controller when new areas are found. It addresses the special robot action re-synthesize, the need for capturing the current robot state and goal, the detection of a new region using sensors, the process of modifying the discrete abstraction of the workspace, and the algorithm for creating and synthesizing a modified LTL formula.

A. The robot action "re-synthesize"

Detection of a new region is captured by a Boolean environment proposition, regionSensor. The re-synthesis action is captured by a robot proposition. The two propositions are denoted as s_new-region and a_re-synthesize, respectively. The specification "If you are sensing regionSensor then do re-synthesize" is captured by the LTL formula:

□(s_new-region ⇒ ○(a_re-synthesize))   (2)

The "re-synthesize" action is a special action since the low-level controller associated with it terminates the execution of the automaton, and calls the module to rewrite the LTL formula and re-synthesize an appropriate automaton.
The re-synthesis process is shown in Algorithm 1. In line 1, the execution of the current automaton is terminated. The following two propositions are reset: s_new-region = False and a_re-synthesize = False. In lines 2-3 the environment propositions and robot propositions of the current state are recorded, in order to describe the initial state of the robot when resuming the execution. In line 4, the initial condition for the new automaton is obtained and the LTL formulas ϕ^s_i, ϕ^e_i for initial conditions are updated. In line 5, the new region proposition is added as: Ŷ = Y ∪ {r_new}. The process of modifying the workspace and rewriting the LTL formula ϕ^s_t is discussed in Section V-B. In line 6, the goal number of the liveness requirement toward which the robot is currently moving is recorded. In line 7, re-ordering of the goal requirements is achieved by rewriting the LTL formula ϕ^s_g as described in Algorithm 2 and Section V-C. In lines 8-9, the updated LTL formula is synthesized and the new automaton is executed.

B. Modifying the discrete abstraction

As defined in Section III, the robot maintains a map of the workspace and expands it on-the-fly. The map maintained by the robot is composed of a list of polygonal regions. At step i, the robot is equipped with a map of the already known area, P^i, as well as its outer boundary, B^i.
The new-region sensor repeatedly checks each edge of the current region p_c, the region where the robot is currently located. If the edge also belongs to the boundary, the new-region sensor tests whether there is a new region emanating from it. Defining a new region is possible only if it fulfills the following conditions: the length and the width of the new region, and the width of the edge the new region is emanating from, must be greater than sD.
Since the new region is adjacent to the known map, there must exist a transition boundary between the new region and the known regions. The motion constraint graph is re-generated from the new adjacency relationships between the regions, which is written into the robot safety assumption ϕ^s_t. Revisiting Example 1, the change in the known workspace, as shown in Fig. 2, results in an updated ϕ^s_t:

□(r0 ⇒ ○(r0 ∨ r2))
∧ □(r1 ⇒ ○(r1 ∨ r2))
∧ □(r2 ⇒ ○(r2 ∨ r0 ∨ r1 ∨ r3 ∨ danger1))
∧ □(r3 ⇒ ○(r3 ∨ r2 ∨ safe1))
∧ □(danger1 ⇒ ○(danger1 ∨ r2))
∧ □(safe1 ⇒ ○(safe1 ∨ r3))

C. Rearranging the liveness conditions

Given the current goal number, the robot is able to determine which liveness requirement it was pursuing. Therefore, it is able to distinguish between the complete and incomplete goals. The history of the completed high-level tasks is captured by re-assigning the order of the goals. The robot is allowed to first address incomplete liveness requirements and, in addition, to choose the exploration strategy by inserting the new goal at different positions: either first, thereby creating a depth-first strategy, or afterwards, creating a breadth-first strategy.
The detailed process is shown in Algorithm 2. In lines 1-2, the liveness requirements are classified as complete or incomplete goals. In line 3, the user's predefined group for the
new region is loaded. In lines 4-5, the liveness with the relevant groupName is translated into LTL, as described in Section IV-A. In lines 6-8, the goals are re-ordered: if following the Depth First strategy, the new goal is put before the incomplete goals. In lines 9-11, if following the Breadth First strategy, the new goal is added after the set of incomplete goals.

Algorithm 2 Rewriting the LTL formula for liveness requirements
1: CompGoals ⇐ {ϕ ∈ ϕ^s_g : gNum(ϕ) < CurrGoalNum}
2: IncompGoals ⇐ {ϕ ∈ ϕ^s_g : gNum(ϕ) ≥ CurrGoalNum}
3: groupName ← regionSensor
4: Translate specs with groupName into LTL
5: newLiveness ⇐ liveness with newRegion in new LTL
6: if Depth First Order then
7:   ϕ^s_g ⇐ newLiveness ∧ IncompGoals ∧ CompGoals
8: end if
9: if Breadth First Order then
10:  ϕ^s_g ⇐ IncompGoals ∧ newLiveness ∧ CompGoals
11: end if

Listing 1 Specification for the experiments.
Environment starts with false
Robot starts with false
Always not (newClassroom and newOffice)
Always not ((newClassroom or newOffice) and hazardous)
Group Classrooms is classroom1
Group Offices is office1
Do flag if and only if you are sensing hazardous
If you are sensing newClassroom then do re-synthesize and add to Classrooms
If you are sensing newOffice then do re-synthesize and add to Offices
If you are sensing newClassroom or you are sensing newOffice then stay
If you are not sensing newClassroom and you are not sensing newOffice then do not re-synthesize
If you are not sensing assignment and you are not activating re-synthesize then visit all Classrooms
If you are sensing assignment and you are not activating re-synthesize then visit all Offices
If you are not sensing hazardous and you are not sensing assignment and you are not activating re-synthesize then visit classroom2
If you are sensing hazardous then always not classroom2
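The goal re-ordering in Algorithm 2 amounts to splitting the goal list at the current goal number and choosing an insertion point for the new goal. A minimal sketch, assuming the liveness conjunction is kept as a list of goal strings indexed by gNum (a hypothetical representation, not LTLMoP's internal one):

```python
# Sketch of Algorithm 2's re-ordering (hypothetical representation:
# the liveness conjunction is a list of goals, gNum is the index).

def reorder_goals(goals, curr_goal_num, new_goal, depth_first=True):
    comp = goals[:curr_goal_num]     # line 1: gNum < CurrGoalNum
    incomp = goals[curr_goal_num:]   # line 2: gNum >= CurrGoalNum
    if depth_first:
        # Lines 6-8: pursue the new region before the pending goals.
        return [new_goal] + incomp + comp
    # Lines 9-11: finish the pending goals before the new region.
    return incomp + [new_goal] + comp

# Example 1, with goals ordered (r2, r3) and r2 already completed:
goals = ["visit r2", "visit r3"]
print(reorder_goals(goals, 1, "visit danger1", depth_first=True))
# -> ['visit danger1', 'visit r3', 'visit r2']
print(reorder_goals(goals, 1, "visit danger1", depth_first=False))
# -> ['visit r3', 'visit danger1', 'visit r2']
```

Either way, completed goals move to the back, so the robot attends to pending requirements first; only the position of the new goal differs between the two strategies.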
Revisiting Example 1, assume that when the new region danger1 is detected, the robot has already visited r1 and r2, so the priority of the original goals has been changed. The robot is also capable of choosing between either a depth-first order or a breadth-first order, which is achieved as follows:

Depth First (Alg. 2, ln 6-8):
ϕ^s_g = □◇((¬a_guide) ⇒ danger1)
∧ □◇((¬a_guide) ⇒ r3)
∧ □◇((¬a_guide) ⇒ r2)
∧ □◇((a_guide) ⇒ (r0 ∨ r1))

Breadth First (Alg. 2, ln 9-11):
ϕ^s_g = □◇((¬a_guide) ⇒ r3)
∧ □◇((¬a_guide) ⇒ danger1)
∧ □◇((¬a_guide) ⇒ r2)
∧ □◇((a_guide) ⇒ (r0 ∨ r1))

VI. EXPERIMENTS

The new algorithms and procedures for detecting and adding unknown regions to the synthesized controller were validated in a real environment with a physical robot. The experiments illustrate how high-level tasks are automatically adjusted while the environment is being expanded.

Example 2: Classroom assistant
The robot is looking for students who need to submit assignments to their professors in the workspace depicted in Fig. 3. The robot wanders through the classrooms until it finds such a student. Once it is handed an assignment, it searches for a professor in the offices until it finds one. If a door opens (new classroom or new office), the robot explores that region, classifies it, and continues executing its task accordingly. If the robot detects a hazardous item in front of classroom2, it avoids entering that classroom and raises its flag as a warning. The specifications, given in structured English, are presented in Listing 1.

Fig. 3: Left: The initial map is composed of hall1, hall2, classroom1, classroom2, office1. Right: field top view.

A. Experimental Setup

The experimental test bed includes a mobile robot which can move autonomously in the environment and detect new regions. Additionally, the robot is able to perform other actions such as sensing objects (e.g., cones) and performing actions (e.g., raising a flag).
The mobile robot used is based on the Pioneer 3-DX mobile robot platform, upgraded with a compact PC on top, and sensors and actuators as follows. For sensing new regions, a Hokuyo URG-04LX Scanning Laser Range Finder was used. The range finder scans at 10 Hz with a field of view of 240 degrees and an angular step size of 0.36 degree. The range is approximately 4 meters. The scanner data was overlaid on an occupancy grid with a resolution of 10 [cm]. LTLMoP communicated with the robot via wireless communication.
The experiments were conducted in an indoor environment. The Vicon Motion Capture system was used for obtaining accurate pose information for the robot. The Vicon system consists of 24 infrared cameras mounted on a truss attached to the ceiling, allowing for accurate tracking of rigid bodies marked with reflective markers.
The new-region sensor detects new regions as explained in Section V-B. The new region must be at least sD wide and sD deep relative to the current region edge it is emanating from, and it is assumed to be rectangular. The new-region sensor scans the occupancy grid for unoccupied cells, starting from the edges of the current region. It only scans edges which are contained in boundary edges. If the sensor finds an opening
larger than a minimal width, it continues to scan parallel lines on the grid, advancing outward from the current region. The scan continues until a line is not wide enough. During the scan, the start and end points of each line are recorded, along with its depth, such that at the end of the scan, the new region can be defined according to the maximal width, depth or area, as long as it fulfills the minimal size requirements. The new-region sensor distinguishes between a small new region and a large new region by checking whether either the width or the depth of the new region exceeds a certain threshold. In the reported experiment, the threshold was set to 2sD, with D = 0.46 [m] and s = 1.5. In the experiment scenario, the small regions are referred to as offices and the large regions as classrooms. In order to maintain the guarantees, the new-region sensor is implemented conservatively. The sensor and robot propositions used are:

Sensors: newClassroom, newOffice, hazardous, assignment
• newClassroom/newOffice: The two propositions are attached to the same new-region sensor. If the detected new region is larger than the threshold, the sensor returns true in newClassroom. Otherwise newOffice returns true.
• hazardous: A red cone served as a hazardous item. The cone detector is composed of a software blob detector, which receives the image frames from a video camera mounted on the mobile robot.
• assignment: The object sensor, which is a cup that can sense the presence or absence of objects within, was used to sense the presence of an assignment.

Actuators: flag, re-synthesize
• flag: The robot action is raising a flag if the proposition value is true, or lowering a flag if false.
• re-synthesize: When the proposition is true, it activates the local controller that terminates the execution and starts the re-synthesis process.

B. Description of the experiment

The robot starts in classroom1 and heads toward hall1. As soon as it enters hall1, it detects new1 (Fig. 4a, 4b) and classifies it as a large room, thus adding it to the group classrooms and then re-synthesizing. It visits region new1 and classroom1 according to the specifications, then it heads toward classroom2. On its way, the robot passes through hall2, where it detects new2 (large) (Fig. 4c, 4d), adds it to group classrooms, and re-synthesizes. Immediately afterwards, it detects new3 (small) (Fig. 4c, 4d), adds it to group offices and re-synthesizes. Afterward, it continues to classroom new2, where a student hands the robot an assignment (Fig. 4e), and the robot starts searching, in the offices new3 and office1, for a professor to whom it can submit the assignment. Since the robot does not find the professor in those offices, it continues to search the offices until a door to a new office is opened; the robot detects new4 (Fig. 4c, 4d), adds it to group offices and re-synthesizes. The robot moves to the new4 office, where it finds the professor and hands him the assignment (Fig. 4f). Since the robot now has no assignments to deliver, it continues to search for students in the classrooms, starting with classroom2, which it has not yet visited.
The robot continues its correct execution according to the specifications, continuously searching for students with assignments and handing them to professors. After some time, a hazardous item appears in front of classroom2 (Fig. 4g). When the robot is in hall2 and heading to classroom2, it detects the hazardous item; consequently, it raises the warning flag (Fig. 4g), skips classroom2 and continues to new2 instead.

Fig. 4: Assignment collecting and handing experiment. (a) After detecting the classroom new1. The boundary is in yellow, the regions are separated by turquoise lines. (b) LTLMoP map after adding classroom new1. (c) After adding classroom new2 and offices new3, new4. (d) LTLMoP map after the addition of the office new4. (e) A student submits an assignment to the robot in classroom2. (f) The robot submits the assignment to a professor in office new4. (g) Avoiding classroom2 and raising the warning flag as a response to the hazardous cone. (h) Visiting the office new3 with the assignment in the cup.
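The conservative opening scan described in Section VI-A can be sketched on a toy occupancy grid. This is an illustration under assumed conventions (0 = free cell, 1 = occupied; each row is one scan line parallel to the boundary edge, advancing outward), not the exact sensor implementation:

```python
# Sketch of the new-region scan and size classification (assumed
# grid convention: 0 = free, 1 = occupied; each row is one scan line
# parallel to the boundary edge, advancing outward).

def scan_opening(grid, min_cells):
    """Scan lines outward until one is too narrow; return the
    (width, depth) of the opening, or None if it is too small."""
    width = depth = 0
    for line in grid:
        run = best = 0
        for cell in line:                 # longest run of free cells
            run = run + 1 if cell == 0 else 0
            best = max(best, run)
        if best < min_cells:              # line not wide enough: stop
            break
        width = max(width, best)
        depth += 1
    if width >= min_cells and depth >= min_cells:
        return width, depth
    return None

def classify(width, depth, threshold_cells):
    """Openings exceeding the threshold in width or depth are
    reported as classrooms; smaller ones as offices."""
    large = width > threshold_cells or depth > threshold_cells
    return "newClassroom" if large else "newOffice"

grid = [[1, 0, 0, 0, 0, 1],
        [1, 0, 0, 0, 0, 1],
        [1, 0, 0, 1, 1, 1]]
print(scan_opening(grid, 2))             # -> (4, 3)
```

On the 10 [cm] grid of the reported experiment, the minimum size sD = 1.5 × 0.46 [m] corresponds to roughly 7 cells, and the 2sD classification threshold to roughly 14 cells.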
After the hazardous item exits the view of the sensor, the robot lowers the flag and continues execution with correct behavior.
From the experiment execution [19], it is evident that, under the assumptions of Section III, correct behavior is achieved even when adding regions that are unknown a priori. Moreover, the robot correctly classified the new regions according to their sizes, and added them to the specifications accordingly.

VII. CONCLUSIONS AND FUTURE WORK

This paper describes an approach to guaranteeing high-level robot behaviors in partially known and continuously updated workspaces through re-synthesis. This includes modifying the discrete abstraction of the workspace on-the-fly, preserving the robot's state and history of task progress, and sequencing the liveness requirements such that the robot exhibits the desired behavior. The proposed methods and algorithms were successfully tested in experiments. Moreover, it was shown that specific features of the newly detected regions can be used to classify them into different groups, enabling the automated redefinition of the high-level task accordingly, and maintaining the correct execution.
The process of changing the controller is automated and a new controller is guaranteed to be correct with respect to the specifications. Furthermore, if a controller cannot be generated, it means that the task can no longer be achieved in the modified workspace, and LTLMoP can provide explanations for the failure.
Ongoing research is focused on improving the new-region detection, such that the under-approximation will be closer to the exact free area. Another research direction aims to deal with changes in known regions, such as the unpredictable appearance of obstacles, which may result in re-decomposition of the regions, or even in an unrealizable specification.

ACKNOWLEDGMENTS

The authors gratefully acknowledge the contribution of Cameron Finucane, Eric Sample, and Nyk Lotocky to the development process and to the implementation.

REFERENCES

[1] C. Belta and L.C.G.J.M. Habets. Constructing decidable hybrid systems with velocity bounds. In IEEE Conference on Decision and Control, Bahamas, 2004.
[2] A. Bhatia, L.E. Kavraki, and M.Y. Vardi. Sampling-based motion planning with temporal goals. In IEEE International Conference on Robotics and Automation (ICRA), pages 2689–2696, 2010.
[3] E.M. Clarke, O. Grumberg, and D.A. Peled. Model Checking. MIT Press, Cambridge, Massachusetts, 1999.
[4] D.C. Conner, A.A. Rizzi, and H. Choset. Composition of Local Potential Functions for Global Robot Control and Navigation. In IEEE/RSJ Int'l. Conf. on Intelligent Robots and Systems, pages 3546–3551, Las Vegas, NV, October 2003.
[5] H.S.M. Coxeter. Regular Polytopes. Dover Publications, 1973.
[6] M. de Berg, M. van Kreveld, M. Overmars, and O. Schwarzkopf. Computational Geometry: Algorithms and Applications. Springer-Verlag, second edition, 2000.
[7] A. Elfes. Sonar-based real-world mapping and navigation. IEEE Journal of Robotics and Automation, 3(3):249–265, June 1987.
[8] G.E. Fainekos, A. Girard, H. Kress-Gazit, and G.J. Pappas. Temporal logic motion planning for dynamic robots. Automatica, 45(2):343–352, 2009.
[9] C. Finucane, G. Jing, and H. Kress-Gazit. LTLMoP: Experimenting with Language, Temporal Logic and Robot Control. In IEEE/RSJ International Conference on Intelligent Robots and Systems, pages 1988–1993, Taipei, Taiwan, 2010.
[10] B. Johnson and H. Kress-Gazit. Probabilistic Analysis of Correctness of High-Level Robot Behavior with Sensor Error. In Proceedings of Robotics: Science and Systems, Los Angeles, CA, USA, June 2011.
[11] S. Karaman and E. Frazzoli. Sampling-based motion planning with deterministic μ-calculus specifications. In IEEE Conference on Decision and Control, pages 2222–2229, 2009.
[12] M. Kloetzer and C. Belta. A fully automated framework for control of linear systems from LTL specifications. In Hybrid Systems: Computation and Control, volume 3927, pages 333–347, 2006.
[13] H. Kress-Gazit, G.E. Fainekos, and G.J. Pappas. Translating Structured English to Robot Controllers. Advanced Robotics, 22(12):1343–1359, 2008.
[14] H. Kress-Gazit, G.E. Fainekos, and G.J. Pappas. Temporal Logic based Reactive Mission and Motion Planning. IEEE Transactions on Robotics, 25(6):1370–1381, 2009.
[15] M. Lahijanian, J. Wasniewski, S.B. Andersson, and C. Belta. Motion planning and control from temporal logic specifications with probabilistic satisfaction guarantees. In IEEE International Conference on Robotics and Automation (ICRA), pages 3227–3232, May 2010.
[16] S.R. Lindemann and S.M. LaValle. Smooth Feedback for Car-Like Vehicles in Polygonal Environments. In IEEE Conference on Robotics and Automation, 2007.
[17] N. Piterman, A. Pnueli, and Y. Sa'ar. Synthesis of Reactive(1) Designs. In VMCAI, pages 364–380, Charleston, SC, January 2006.
[18] V. Raman and H. Kress-Gazit. Analyzing unsynthesizable specifications for high-level robot behavior using LTLMoP. In CAV, pages 663–668, July 2011.
[19] S. Sarid, B. Xu, and H. Kress-Gazit. Guaranteed high-level robot behavior while exploring unknown workspace, 2012. URL http://www.youtube.com/watch?v=oiF4Wt8xgio. Video (YouTube).
[20] T. Wongpiromsarn, U. Topcu, and R.M. Murray. Receding Horizon Temporal Logic Planning. In IEEE Conference on Decision and Control (CDC), pages 5997–6004, December 2009.
Development of a Testbed for Robotic Neuromuscular Controllers

Alexander Schepelmann, Michael D. Taylor, Hartmut Geyer
The Robotics Institute
Carnegie Mellon University
Pittsburgh, Pennsylvania 15213
Email: [email protected], [email protected], [email protected]

Abstract—Current control approaches to robotic legged locomotion rely on centralized planning and tracking or motion pattern matching. Central control is not available to robotic assistive devices that integrate with humans, and matching predefined patterns severely limits user dexterity. By contrast, biological systems show substantial legged dexterity even when their central nervous system is severed from their spinal cord, indicating that neuromuscular feedback controls can be harnessed to encode stability, adaptability, and maneuverability into legged systems. Here we present the initial steps to develop a robotic gait testbed that can implement and verify neuromuscular controls for robotic assistive devices. The initial stage consists of an antagonistically actuated two-segment leg with a floating compliant joint. We detail its electromechanical design and low-level, velocity-based torque control. Additionally, we present experiments that test the leg's performance during human-like high-fidelity motions. The results show that the robot can track fast motions corresponding to 87% of the maximum performance limit of human muscle. The experiments also reveal limitations of our current implementation and we discuss solutions to overcoming them.

I. INTRODUCTION

Current approaches to leg control in locomotion either use centralized planning and tracking or mimic predefined joint motion patterns extracted from normal human gait. The first approach is used in humanoids including Honda's ASIMO and AIST's HRP-4 [11][18][27], but cannot be applied to robotic assistance wherein the central human user's state is unknown. As a result, the second approach prevails in rehabilitation robotics [17][2][12][13]. For instance, exoskeletons developed for paralyzed patients enforce a limited set of pre-defined motion patterns of normal human gait [21][1], which severely constrains their functional dexterity. In part, this problem can be overcome by combining motion libraries and pattern recognition, as Sup et al. [30] demonstrated with a powered legged prosthesis for speed and slope adaptation. However, control strategies that generate the stability, maneuverability, and adaptability needed for truly high mobility have not been identified with this approach.
We seek to develop an alternative control approach to powered legged systems that builds on decentralized neuromuscular control strategies of human locomotion. Animal and human legs possess remarkable autonomy in behavior and control. For example, decerebrate cats and rats have no brain control over their legs, yet they seamlessly adapt to different locomotion speeds on a treadmill and autonomously transition between gaits [28][4][20]. Similar neuro-scientific experiments reveal that, in biological systems, dexterous performance of segmented legs is realized to a large extent by local feedback controls that bypass central processing, and by biomechanical designs that hardcode functional leg responses [5][24][14][7]. Recently, neuromuscular models of human locomotion were developed, which are controlled by autonomous local feedbacks without central planning, yet adapt to their environment and show substantial robustness of locomotion [8][29]. Eilenberg et al. [6] have implemented part of this feedback control in a powered ankle-foot prosthesis, resulting in a system that adapts to the environment without requiring explicit terrain sensing.
To generalize this approach to segmented powered legs, we here present our initial steps of developing a robotic gait testbed that can implement and test neuromuscular controllers for robotic assistive devices. The testbed currently consists of a half-human sized, two-segment leg with two antagonistic actuators and a compliant floating joint. We detail the electromechanical design and control of this robotic neuromuscular leg (RNL) (sect. II and III), and present and discuss experimental results that test the performance of the leg during human-like, high-fidelity motions (sect. IV and V).

II. ELECTROMECHANICAL DESIGN

RNL is a half-human sized, two-segmented, antagonistically actuated robotic leg with joint compliance (Fig. 1). The electromechanical design of RNL is driven by three themes: dynamic similarity, antagonistic actuation, and leg compliance.

A. Dynamic Similarity

We aim to build a testbed that matches human leg performance to develop neuromuscular controllers for powered segmented legs. For cost and safety considerations we build a robotic leg that is half the size and a quarter of the weight of a human leg. To ensure that the dynamic behavior of our robot matches human legs, we use dynamic scaling. Dynamic scaling uses fundamental physical variables to define relationships between a system's quantities at different scales. This approach was formalized by Buckingham [3] and is often applied in aerospace and fluid engineering applications. In mechanical systems, fundamental units are mass, length, and time. The robot's mass and length targets define these scaling factors
Fig. 1: Robotic testbed development. (a) Major human leg muscles with monoarticular knee extensor and biarticular knee flexor in dark grey. (b) Testbed concept. (c) CAD assembly of initial stage. (d) Robotic implementation. For experiments, thigh was rigidly coupled to mounting cage.

                        Human    RNL    Scaling Factor
Leg Length (m)            1      0.5        1/2
Thigh Length (m)         0.46    0.23       1/2
Shank Length (m)         0.54    0.27       1/2
Knee Radius (m)          0.06    0.03       1/2
Total Mass (kg)          80      20         1/4
Thigh Mass (kg)           8       2         1/4
Shank Mass (kg)          3.7     0.9        1/4
Foot Mass (kg)           1.2     0.3        1/4
Vas. Max Force (N)      6000    1500        1/4
Vas. Max Vel (m/s)       0.96    0.68       √2/2
Ham. Max Force (N)      3000     750        1/4
Ham. Max Vel (m/s)       1.2     0.84       √2/2
Max Joint Torque (Nm)   368      45         1/8
Max Joint Vel (rpm)     153     217         √2

TABLE I: Mechanical performance goals. Human mechanical properties dynamically scaled to RNL dimensions. Human data taken from Winter [32].

as m_robot/m_human = 1/4 and l_robot/l_human = 1/2, respectively. Since the robot is exposed to the same gravitational field as humans, the relationship g_robot = g_human must hold, resulting in a time scale of t_robot/t_human = √2/2. Given these fundamental unit scales, force, torque, and velocity scale as F_robot/F_human = 1/4, τ_robot/τ_human = 1/8, and v_robot/v_human = √2/2.
The performance envelope of human leg actuation is defined by the maximum contraction velocity and maximum isometric force that muscles can develop. We dynamically scale these two parameters to identify the equivalent actuator no-load speed and torque requirements of RNL. The largest human leg muscle is the vastus, a knee extensor located in the front of the thigh (Fig. 1a). The dominating muscle used in knee flexion is the hamstring, whose performance requirements are lower than those of the vastus (Tab. I).

B. Antagonistic Actuation

RNL uses antagonistic actuators, realized by cable-driven series elastic actuators (SEAs), which simultaneously meet several design requirements: the ability to actively command zero joint torque to allow passive dynamics, satisfying human mass distributions, and actuation across floating joints.
SEAs were originally developed by Pratt and Williamson [25] and are common in bipedal robots such as Spring Flamingo developed by Pratt and Pratt [26] and M2 developed by Paluska [22]. SEAs are characterized by a compliant element between motor and load. This element decouples load and motor inertia, which enables precise torque control with highly geared motors (at the expense of system bandwidth), including zero torque. SEAs have been combined with cable drives in legged systems [10]. Using nondirect elements in the drivetrain, like cable, belt, and chain drives, allows actuators to be located away from the joint, ideal for realizing human-like segment mass distributions. Unlike belt and chain drives, cable drives can also go slack like human muscles, which allows them to act across compliant segments.

Fig. 2: RNL SEAs. (a) Drivetrain schematic (b) Drivetrain assembly (c) Prototype

The drivetrain of RNL's SEAs is shown in Fig. 2b. With a later technology transfer to prosthetic and orthotic devices in mind, we limit ourselves to electric DC motors, focusing on the Maxon RE line. Several motor configurations match our design targets (Tab. II). The optimal combination of low weight, low rotor inertia, torque, and speed is reached by four mechanically coupled RE 30 motors. However, we instead opt for two mechanically coupled RE 40 motors to limit actuator complexity, at the expense of increased rotor inertia. For compactness, we incorporate a custom three-stage
drivetrain into our actuation system (Fig. 2a). The first stage mechanically couples the motors with a 4:1 reduction; the second stage orients the output shaft's axis of rotation with a 3:1 reduction. We locate a rotary spring coupling between these stages. Currently, we use an off-the-shelf spring coupler (Stock Drive Products) with a stiffness of 1.75 Nm/rad and a maximum torque rating of 1.4 Nm. An additional 3:1 reduction is located between the SEA output shaft and joint, connected via cable drive, for a total 36:1 gear reduction in our drivetrain. The gear ratio of the external stage can be modified, which allows us to use the same SEA for muscles with different properties. We realize force measurements via two absolute rotary encoders (RM22B: 9-bit, magnetic, analogue encoders; Renishaw PLC) located on either side of the spring. Additionally, an incremental encoder on the shaft of one RE 40 measures motor velocity (RM22I: 9-bit, magnetic, digital encoder; Renishaw PLC). One motor controller (Solo Whistle, Elmo MC) supplies the same current to both RE 40s. Currently RNL uses two SEAs. One is located in the robot's thigh, simulating the vastus muscle; the other is located on the mounting frame, simulating the biarticular hamstring muscle (Fig. 1c).

                       4x RE 30   2x RE 40
Gear Ratio                 40         36
Total Weight (g)          952        960
Rotor Inertia (kgm²)        0.0221     0.03
Nominal Torque (Nm)        14.1       13.2
Stall Torque (Nm)         163        180
No Load Speed (rpm)       212        211
Nominal Speed (rpm)       194        194

TABLE II: SEA motor configurations. The optimal configuration is comprised of four RE 30s. Two RE 40s meet the same performance criteria with lower mechanical complexity, at the expense of increased rotor inertia.

C. Leg Compliance

Humans are not rigidly coupled kinematic chains, possessing interjoint cartilage and soft tissue around bones. To capture this aspect in RNL, we incorporate a floating joint design

which hold the compliant element. The compliant element is made from a two-part PMC-744 urethane rubber mix (Smooth-On Inc.), which is cast into the clamping plates. A compliance retainer, constructed from a brass spur gear, couples the joint's shaft and compliant element, and defines the floating center of rotation of the knee joint's axis. The retainer's set screw rigidly couples the thigh and joint shaft. This design restricts rotational motion to only occur in the robot's shank. An RM22B encoder head is mounted on the joint shaft. The corresponding encoder body is located on the shank. Relative motion between the encoder head and body measures knee position during leg movement.

Fig. 3: Floating compliant knee joint. (a) Concept (b) Prototype (c) Close-up of implementation

III. VELOCITY-BASED SEA CONTROL

Our actuators generate desired torques using a velocity-based SEA control scheme [31]. Pratt and Williamson [25] originally approached SEA control from a torque-based perspective. Recently, an alternative SEA controller was proposed by Wyeth [33], which modulates load torque, τ_l, using motor velocity, ω_m, as a control target instead of motor torque,

ω_m = τ_l (1/(J_l s) + s/k_s),   (1)

where J_l is the load inertia and k_s is the stiffness of the compliant element. This approach is advantageous because losses due to gear dynamics between the motor and spring do not need to be considered. Since motor velocity corresponds exactly to velocity at the drivetrain output, the velocity loop automatically compensates for losses without additional tuning of the outer control loop. In addition, the controller's inverse dynamics terms (Fig. 4) only require the first derivative of motor position, which leads to increased system bandwidth, since low-pass filters with higher cutoff frequencies can be used.
Wyeth's [33] formulation of velocity-based SEA control (eq. 1) requires the load inertia to be known. For legged systems, it is not clear what the load inertia is, as load dynamics constantly change due to joint position and gait phase. However, knowledge of load inertia is not necessary in the formulation of a velocity-based SEA controller. Starting with the derivation of Pratt and Williamson [25], the torque exerted by a spring due to angular deflection is given as

τ_l = −k_s (θ_l − θ_m)   (2)

where τ_l is the torque applied to the load, k_s is the spring stiffness, and (θ_l − θ_m) defines the deflection of the spring between the load and motor side. To formulate torque control as velocity-based control, we write motor position as a function of motor velocity, θ_m = ω_m/s, and resolve eq. 2 to

ω_m = (τ_l/k_s) s + θ_l s.   (3)

Here J_l does not need to be known. Fig. 4 shows a schematic implementation of this velocity control, in which eq. 3 has
into the robot (Fig. 3). The floating joint design connects been implemented as feedforward compensation, P (s) is PD
the robot’s thigh and shank. The joint is composed of two feedback compensation for model uncertainty, and C(s) rep-
rapid prototyped clamping plates (VeroWhitePlus, Objet Ltd.), resents the motor controller.

387
  

 ‚ƒ
„        



  €    

Fig. 4: Velocity-based SEA control system model.

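To make the role of eq. 3 concrete, the following toy simulation closes the loop of Fig. 4 with the motor idealized as a perfect velocity source and a simple proportional gain standing in for P(s). Only ks = 1.75 Nm/rad comes from the text; the load inertia, gain, and time step are illustrative assumptions.

```python
# Toy closed-loop simulation of velocity-sourced SEA torque control (eq. 3):
# the motor is an ideal velocity source, the spring couples it to an inertial
# load, and the velocity command is the load-velocity feedforward plus a
# proportional term on the spring-torque error. Illustrative values only.

K_S = 1.75   # spring stiffness, Nm/rad (value used on RNL)
J_L = 0.005  # load inertia, kg m^2 (assumed, shank-like)
DT = 0.001   # control period, s (assumed)
KP = 5.0     # torque-error gain (assumed)

def simulate(tau_d=0.5, steps=2000):
    """Track a constant desired torque tau_d; return the final spring torque."""
    theta_m = theta_l = omega_l = tau = 0.0
    for _ in range(steps):
        omega_l += (tau / J_L) * DT       # load responds to spring torque
        theta_l += omega_l * DT
        # eq. 3 with constant tau_d (its derivative term vanishes),
        # plus proportional feedback standing in for P(s)
        omega_m = omega_l + KP * (tau_d - tau)
        theta_m += omega_m * DT           # ideal velocity-source motor
        tau = K_S * (theta_m - theta_l)   # spring torque, eq. 2
    return tau

print(round(simulate(), 3))  # → 0.5
```

In this idealized loop the torque error decays at a rate set only by ks and the gain; Jl never enters the torque dynamics, which is precisely the advantage of the velocity-based formulation claimed above.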
Fig. 5: High fidelity motion experiments. RNL knee position, velocity, and antagonistic actuator torques for walking trajectories
corresponding to (a) 1.0x, (b,c) 1.6x, and (d) 1.8x nominal walking speed trajectories. In experiments, motion occurred between
4s and 8s. Plots show mean±s.d. for 10 repetitions. Dark grey lines: commanded trajectories. Black lines: measured trajectories.
Light grey lines: ±s.d. of measured trajectories.
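The tracking controller behind these trajectories (detailed in Section IV-A) sums feedforward, gravity, and PD terms and splits the net torque over the antagonistic SEAs. In the sketch below the gains, shank parameters, and 0.5 Nm cable pre-tension follow the text, while the point-mass gravity model and the torque-splitting rule are our assumptions.

```python
import math

# Sketch of the knee tracking controller of Section IV-A. The point-mass
# gravity model and the antagonistic torque split are illustrative
# assumptions; gains and shank parameters are taken from the text.

JS, LCOM, MASS, G = 0.005, 0.107, 1.1, 9.81   # shank inertia, CoM, mass
KP, KD = 15.0, 0.01 * 15.0                    # kd = 0.01 kp

def knee_torque(phi_ref, dphi_ref, ddphi_ref, phi, dphi):
    """Net knee torque: feedforward + gravity compensation + PD feedback."""
    tau_ff = JS * ddphi_ref
    tau_grav = MASS * G * LCOM * math.sin(phi)   # assumed point-mass model
    tau_fb = KP * (phi_ref - phi) + KD * (dphi_ref - dphi)
    return tau_ff + tau_grav + tau_fb

def split_antagonistic(tau_net, tau_min=0.5):
    """Distribute net torque over extensor/flexor, keeping >= 0.5 Nm tension."""
    tau_ext = max(tau_min, tau_min + tau_net)   # vastus side
    tau_flex = tau_ext - tau_net                # hamstring side
    return tau_ext, tau_flex
```

The split guarantees that both cables stay taut (each torque at least 0.5 Nm) while their difference equals the commanded net joint torque.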
IV. EXPERIMENTS

The electromechanical design of RNL includes custom built antagonistic actuators, an alternative velocity-based SEA controller, and a floating compliant knee joint. RNL is intended as a platform to develop neuromuscular controllers that embed human-like performance. To verify if our design and control
Fig. 6: RNL knee position and antagonistic actuator torque during co-contraction experiments for (a) 0% (b) 5% (c) 10% (d)
15%. Dark grey lines: commanded trajectory. Black lines: measured trajectory. Light grey lines: External impulse disturbances
to instigate shank oscillation.
implementations meet this performance, we devise two experiments. The first experiment tests if the robot can generate fast, high fidelity motions that characterize human locomotion. The second experiment targets the ability to execute antagonistic co-contraction often seen in neuromuscular control.

Since RNL currently only has actuators simulating the hamstring and vastus, we focus on the robot's knee for this evaluation. To simulate the inertial effects of a foot segment, we attach a weight to the bottom of RNL's shank, increasing its total mass to 1.1 kg. The resulting mechanical properties of the shank segment are lcom = 0.107 m and Js = 0.005 kgm², where lcom is the distance of the shank's center of mass position from the knee pivot and Js is the inertia.

A. High Fidelity Motion Experiments

We evaluate RNL's motion fidelity using dynamically scaled human joint trajectories. We first numerically differentiate knee angular position data, φk^ref, tabulated in Winter [32], to generate reference trajectories for joint velocity, φ̇k^ref, and acceleration, φ̈k^ref, observed in human walking. The trajectories are median filtered (filter order = 10) to eliminate artifacts resulting from the differentiation. Next, we use dimensional scaling to adapt the trajectories to RNL's scale. The resulting joint trajectories correspond to normal human walking speed (1.0x nominal trajectory). We scale these trajectories to range between 1.0x and 2.0x speed of the nominal trajectory, the latter of which reaches maximum knee joint velocity (Tab. I).

With these references, we implement a tracking control that outputs desired actuator torques. The controller includes feedforward torque trajectories, Js φ̈k^ref, gravity compensation, and PD feedback compensation, kp (φk^ref − φk) + kd (φ̇k^ref − φ̇k), where kp = 15 and kd = 0.01 kp are the position and velocity feedback gains. The gains are kept constant throughout all experiments. φk and φ̇k are measured by the absolute encoder on RNL's knee joint. The four components are summed to generate net joint torque. The resulting actuator flexion and extension torques are commanded to the corresponding SEA via the velocity-based control scheme described in section III. To avoid cable slack, each SEA applies a minimum 0.5 Nm torque to the joint at all times. Fig. 5 summarizes the results of the motion experiments. Tabs. III and IV list the mean cross-correlation coefficients and signal time delays over the trials for all traces. We observe that the executed joint position and velocity trajectories closely follow the desired trajectories for trials up to 1.6x, with position and velocity correlation coefficients greater than 0.90 and 0.80, respectively. The largest differences between desired and commanded position occur during periods of knee extension at 1.6x, with a maximum error of 7.3°. Velocity shows a similar tracking quality. Top speeds achieved 160 rpm in 1.6x trials. In addition, we observe
a high degree of repeatability throughout all trials, with a maximum standard deviation of 7.4° and 44 rpm from the mean values over all trials¹. The results indicate that human-like segment motions can be reliably and accurately replicated for speeds up to 160 rpm.

At higher speed trials, the quality of position and velocity tracking declines (Tabs. III and IV). Nevertheless, velocity targets were still reached with top speeds of at least 190 rpm (Fig. 5d), or about 87% of our desired performance goals outlined in Tab. I. We could not test higher speeds, because the compliance retainer in the knee joint failed during the high speed trials.

        Rθk        Rθ̇k        Rτvas      Rτham
1.0x    0.97±0.01  0.87±0.04  0.71±0.06  0.90±0.02
1.2x    0.97±0.01  0.88±0.02  0.67±0.04  0.90±0.02
1.4x    0.94±0.02  0.82±0.04  0.80±0.02  0.82±0.05
1.6x    0.95±0.02  0.85±0.02  0.80±0.01  0.80±0.03
1.8x    0.80±0.06  0.64±0.08  0.77±0.04  0.62±0.04

TABLE III: Mean correlation coefficients (R) for walking trials. n = 10 for all speeds.

        tθk        tθ̇k        tτvas      tτham
1.0x    9.5±1.5    18.4±2.6   13.1±1.7   4.8±0.9
1.2x    7.2±0.9    11±0.7     10.3±1.2   5.1±0.7
1.4x    15.9±1.4   19.5±1.8   9.1±1.0    9.0±1.6
1.6x    12.3±2.4   15.2±2.3   10.7±0.8   9.3±1.7
1.8x    25.0±3.9   25.2±4.4   9.5±2.1    14.9±0.9

TABLE IV: Mean time delays (t) in ms for walking trials. n = 10 for all speeds.

B. Co-Contraction Experiments

Antagonistic muscle co-contraction is characterized by equal pre-load torques of the muscles, τPL, and an equivalent rotational joint stiffness, kknee. The corresponding desired SEA torques are given by

    τvas/ham = τPL ± kknee (φk − φ0),    (4)

where φ0 is the joint reference position. To estimate appropriate values for τPL and kknee, we use a Hill-type muscle model as described in Geyer et al. [9]. In these models, the muscle force, F, is given by

    F = A fl fv Fmax^iso,    (5)

where A is the percentage of muscle activation, Fmax^iso is the muscle's maximum isometric force, and fl and fv are the force-length and force-velocity relationships. Combined with the knee joint radius, the muscle force defines τPL. For calculating the local stiffness, we neglect modulations by fv and model fl as

    fl = exp( ln(0.05) |(lCE − lopt)/(0.056 lopt)|³ ),    (6)

where lCE is the length of the muscle's contractile element and lopt is its optimal length. The resulting muscle stiffness is

    kmuscle = A (dfl/dlCE) Fmax^iso.    (7)

Using scaled values of the vastus complex (lopt = 0.04, w = 0.056), we calculate kmuscle, τPL, and kknee for four levels of A, ranging from 0% to 15% muscle activation (Tab. V).

A (%)   kmuscle (N/m)   τPL (Nm)   kknee (Nm/rad)
0       0               0          0
5       7310            1.22       6.59
10      14640           2.44       13.17
15      21950           3.66       19.79

TABLE V: Equivalent stiffnesses & SEA pre-load torques at different muscle activation levels. Values calculated at lCE = 1/2 lopt.

If RNL can successfully execute co-contraction, the motion resulting from the torques described by eq. 4 matches the motion of a physical driven pendulum whose center of mass and inertia properties correspond to those of the shank. We test the quality of co-contraction control by deflecting the shank from its rest position φ0 = 0 and comparing the resulting behavior to that of a simulated, equivalent driven physical pendulum.

                   0%          5%          10%
T Simulation (s)   0.78        0.30        0.22
T RNL (s)          0.75±0.04   0.25±0.01   0.25±0.02

TABLE VI: Period of oscillation at different levels of co-contraction.

Fig. 6 shows the behavior of RNL during the co-contraction experiments. Fig. 6a shows the observed motion and torques for 0% co-contraction, which evaluates the quality of RNL's zero torque control. The shank follows the motion of a damped pendulum with a period T that matches the simulated period (Tab. VI). For 0% co-contraction, the desired τvas/ham is zero. The measured torques track the commanded torque within the torque resolution limits of the SEAs (±0.1 Nm joint torque resolution for a SEA spring stiffness of 1.75 Nm/rad and 9 bit encoders). Higher levels of co-contraction are possible (Fig. 6b-d), but the commanded torques are not well tracked, as large oscillations result from the antagonistic actuators counteracting each other with large pre-load torques. Because the oscillations exceed the maximum torque rating of our spring at approximately 15% co-contraction, we cannot command higher levels of co-contraction. This shortcoming of our current implementation of antagonistic control is visible in the position trajectories as well.
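The Hill-model pipeline of eqs. 4, 6, and 7 can be evaluated numerically as sketched below. The maximum isometric force is a placeholder (it is not listed in this excerpt), lopt and w follow the quoted vastus values, and a central difference stands in for the analytic derivative dfl/dlCE.

```python
import math

# Numerical sketch of eqs. 4, 6, 7: force-length curve, muscle stiffness,
# and co-contraction SEA torques. f_iso_max is a placeholder argument.

L_OPT, W = 0.04, 0.056   # scaled vastus values quoted in the text
C = math.log(0.05)

def fl(l_ce):
    """Force-length relationship, eq. 6."""
    return math.exp(C * abs((l_ce - L_OPT) / (W * L_OPT)) ** 3)

def k_muscle(activation, l_ce, f_iso_max, h=1e-7):
    """Muscle stiffness, eq. 7, with a central-difference dfl/dl_CE."""
    dfl = (fl(l_ce + h) - fl(l_ce - h)) / (2.0 * h)
    return activation * dfl * f_iso_max

def cocontraction_torques(tau_pl, k_knee, phi, phi0=0.0):
    """Desired SEA torques, eq. 4: equal pre-load, opposing stiffness terms."""
    tau_vas = tau_pl + k_knee * (phi - phi0)
    tau_ham = tau_pl - k_knee * (phi - phi0)
    return tau_vas, tau_ham

print(round(fl(L_OPT), 3), round(fl(L_OPT * (1 + W)), 3))  # → 1.0 0.05
```

At the optimal length the curve peaks at 1, and one width w away it has fallen to 0.05, matching the role of the ln(0.05) constant in eq. 6; at the rest position φ0 both actuators carry exactly the pre-load torque τPL.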
¹ The first leg swing was not included in this calculation as impulse accelerations from rest are not representative of system dynamics.

V. DISCUSSION & CONCLUSION

Our goal is to develop neuromuscular controllers for powered segmented legs. To realize this goal, we presented the
initial design and validation stage of a robotic test platform targeting human-like performance. The platform currently consists of RNL, a half-human sized, two segment robotic leg with a floating compliant knee joint (Fig. 1d). The electromechanical design of the leg meets the physical weight-size properties and actuation performance defined by human physiology. To meet these criteria, we developed a modular, compact SEA (Fig. 2) that matches the performance of the vastii, the largest muscle group in human legs. In addition, we formulated an SEA control that takes advantage of velocity-based torque control without having to know load inertia (Fig. 4).

In two experiments, we tested if our design and control implementation can deliver fast motions that characterize human locomotion and can generate antagonistic co-contraction seen in neuromuscular systems. The experimental results show that we can reliably generate human-like leg motions with high positional accuracy for joint speeds up to 160 rpm and with lesser positional accuracy for joint speeds up to 190 rpm (Fig. 5), approximately 90% of the maximum designed-for joint velocity (Tab. I). The failure of the knee joint's clamping plates during the high speed experiments indicates that the prototyped plates will need to be replaced by stronger material versions to achieve these maximum speeds. On the other hand, the compliant part of the floating joint did not show any wear throughout the experiments, suggesting that its design works very well. The co-contraction experiments revealed that the antagonistic actuators can command zero torque within the torque resolution limits of the SEAs, enabling near passive motions at the joint level (Fig. 6a). Higher levels of co-contraction up to 15% of human muscle activations were possible, but produced oscillations in the SEA torque patterns which limited the performance of co-contraction control.

The experiments revealed two shortcomings of our current implementation, which we are working on overcoming. First, the actuators' bandwidth limitation creates about 25 ms of time delay during high speed position and velocity tracking. Scaled to human dimensions, this delay would be about 35 ms, which is similar to the feedback delays observed in humans. However, the SEA delay will increase substantially with the larger actuation levels necessary for stance motions. To diminish the delay, we are looking to replace the linear rotary spring in the actuators with a nonlinear stiffening spring. Stiffening springs in SEAs enable high precision zero torque control with high bandwidth responses at large commanded torques. The advantages of nonlinear springs for series elastic actuation are widely recognized (Pratt and Williamson [25], Hurst et al. [15], Migliore et al. [19]), although a systematic method for developing compact springs with custom force deflection profiles has not been proposed. In addition, a custom nonlinear spring coupler will also allow us to overcome the current torque limit of 12.6 Nm at the output that is defined by the maximum torque capacity of the off-the-shelf springs in the SEAs.

The second shortcoming is the 15% limit on co-contraction control. The impact of this limit on implementing neuromuscular control strategies is unclear. While humans can reach up to 30% of antagonistic co-contraction at the leg joints [23], these levels are observed during the stance phase in which the legs are loaded by body weight, which is about 17 times larger than the weight of the shank. In the experiments, by contrast, the leg was unloaded and resistance to motion was entirely due to the mass properties of the shank. In particular, already at 5% co-contraction the peak torques of the actuators reached 3.2 Nm, which corresponds to the torque created by one body weight. Hurst et al. [16] suggest that damping elements parallel to the SEA spring may attenuate oscillations when load inertia is low. We plan to test if adding such elements into the SEA drivetrain will improve the quality of co-contraction control.

The presented robot validates our initial design and control implementation, but does not represent a full neuromuscular leg. We are currently expanding RNL into a multi-degree of freedom leg with hip, knee, and ankle joints as well as an adaptive, compliant foot. This robot will incorporate 7 actuators based on our presented SEA design and control implementation. These actuators represent the major mono- and bi-articular muscles that propel human legs during walking and running. The actuators will be controlled by reflexive neuromuscular models [8], [29], which will generate desired torques for each SEA unit (τd in Fig. 4). With this robot, the immediate goal will be to realize reflexive leg controls during swing, crucial for autonomous balance recovery in amputee locomotion. Ultimately, we expect this work to result in a generalized gait testbed to develop and test neuromuscular controllers for multi-articulated powered robotic limbs for rehabilitation and humanoid robotics.

ACKNOWLEDGMENTS

This work is supported by the Richard King Mellon Foundation. A.S. is supported by the National Science Foundation Graduate Research Fellowship Program. The authors wish to thank Larry Hayhurst and David Matten for their help with robot construction. We also wish to thank Daniel Häufle for his help with hardware communication.

REFERENCES

[1] S.K. Agrawal, S.K. Banala, K. Mankala, V. Sangwan, J.P. Scholz, V. Krishnamoorthy, and W.L. Hsu. Exoskeletons for gait assistance and training of the motor impaired. Proc IEEE Int Conf Rehab Rob, pages 1108–1113, 2007.
[2] H.K. Au, H. Herr, J. Weber, and E.C. Martinez-Villalpando. Powered ankle-foot prosthesis for the improvement of amputee ambulation. Conf Proc IEEE Eng Med Biol Soc, pages 3020–3026, 2007.
[3] E. Buckingham. On physically similar systems; illustrations of the use of dimensional equations. Phys Rev, 4:345–376, 1914.
[4] G. Courtine, Y. Gerasimenko, R. van den Brand, A. Yew, P. Musienko, H. Zhong, B. Song, Y. Ao, R.M. Ichiyama, I. Lavrov, R.R. Roy, M.V. Sofroniew, and V.R. Edgerton. Transformation of nonfunctional spinal circuits into functional states after the loss of brain input. Nat Neurosci, 12(10):1333–1342, 2009.
[5] V. Dietz. Proprioception and locomotor disorders. Nat Roy, Y. Gerasimenko, V.R. Edgerton, and G. Cour-
Rev Neurosci, 3(10):781–790, 2002. tine. Controlling specific locomotor behaviors through
[6] M.F. Eilenberg, H. Geyer, and H. Herr. Control of a multidimensional monoaminergic modulation of spinal
powered ankle-foot prosthesis based on a neuromuscular circuitries. J Neurosci, 31(25):9264–9278, 2011.
model. IEEE Trans Neural Sys Rehab Eng, 18(2):164– [21] P.D. Neuhaus, J.H. Noorden, T.J. Craig, J. Kirschbaum,
173, 2010. and J.E. Pratt. Design and evaluation of mina: A robotic
[7] Y. Gerasimenko, R.R. Roy, and V.R. Edgerton. Epidural orthosis for paraplegics. Proc IEEE ICORR, pages 1–8,
stimulation: comparison of the spinal circuits that gener- 2011.
ate and control locomotion in rats, cats and humans. Exp [22] D.J. Paluska. Design of a humanoid biped for walking
Neurology, 209(2):417–425, 2008. research. MS Thesis, Massachusetts Institute of Technol-
[8] H. Geyer and H. Herr. A muscle-reflex model that ogy, 2000.
encodes principles of legged mechanics produces human [23] J. Perry and J.M. Burnfield. Gait analysis: Normal and
walking dynamics and muscle activities. IEEE Trans pathological function. SLACK Incorporated, 2, 2010.
Neural Sys Rehab Eng, 18(3):263–273, 2010. [24] E. Pierrot-Desseilligny and D. Burke. The circuitry of
[9] H. Geyer, Seyfarth A., and R. Blickhan. Positive force the human spinal cord: its role in motor control and
feedback in bouncing gaits? Proc R Soc B, 270:2173– movement disorders. Cambridge University Press, 2005.
2183, 2002. [25] G.A. Pratt and M.M. Williamson. Series elastic actuators.
[10] J.W. Grizzle, J. Hurst, B. Morris, H.W. Park, and Proc IEEE/RSJ IROS, 1:399–406, 1995.
K. Sreenath. Mabel, a new robotic bipedal walker and [26] J.E. Pratt and G.A. Pratt. Intuitive control of a planar
runner. American Control Conference, pages 2030–2036, bipedal walking robot. Proc IEEE ICRA, 3:2014–2021,
2009. 1998.
[11] M. Hirose and K. Ogawa. Honda humanoid robots [27] M. Raibert. Legged robots that balance. MIT Press,
development. Phil Trans R Soc A, 365(1850):11–19, Cambridge, 1986.
2007. [28] M.L. Shik, F.V. Severin, and G.N. Orlovski. Control of
[12] J. Hitt, A.M. Oymagil, T. Sugar, K. Hollander, Boehler walking and running by means of electric stimulation of
A., and J. Fleeger. Dynamically controlled ankle-foot or- the midbrain. Biofizika, 11(4):659–666, 1966.
thosis (dco) with regenerative kinematics: incrementally [29] S. Song and H. Geyer. Regulating speed and generating
attaining user portability. Proc IEEE ICRA, pages 1541– large speed transitions in a neuromuscular human walk-
1546, 2007. ing model. Proc IEEE ICRA, pages 511–516, 2012.
[13] M.A. Holgate, A.W. Boehler, and T. Sugar. Control algo- [30] F. Sup, H.A. Varol, and M. Goldfarb. Upslope walking
rithms for ankle robots: a reflection on the state-of-the- with a powered knee and ankle prosthesis: initial results
art and presentation of two novel algorithms. IEEE/RAS- with an amputee subject. IEEE Trans Neural Sys Rehab
EMBS Int Conf Biomed Rob and Biomech, pages 97–102, Eng, 19(1):71–78, 2011.
2008. [31] M.D. Taylor. A compact series elastic actuator for
[14] H. Hultborn and J.B. Nielsen. Spinal control of loco- bipedal robots with human-like dynamic performance.
motion - from cat to man. Acta Physiologica, 189(2): MS Thesis, Carnegie Mellon University, 2011.
111–121, 2007. [32] D.A. Winter. Biomechanics and motor control of human
[15] J.W. Hurst, J.E. Chestnutt, and A.A. Rizzi. An actuator movement. Wiley, (4), 2009.
with physically variable stiffness for highly dynamic [33] G. Wyeth. Control issues for velocity sourced series
legged locomotion. Proc IEEE ICRA, pages 4662–4667, elastic actuators. Proc IEEE ICRA, 2006.
2004.
[16] J.W. Hurst, A.A. Rizzi, and D. Hobbelen. Series elastic
actuation: Potential and pitfalls. Proc Int Conf Climb
Walk Robots, pages 1–6, 2004.
[17] R. Jimenez-Fabian and O. Verlinden. Review of control
algorithms for robotic ankle systems in lower-limb or-
thoses, prostheses, and exoskeletons. Med Eng Phys, 34
(4):397–408, 2012.
[18] S. Kajita, T. Nagasaki, K. Kaneko, and H. Hirukawa.
Zmp-based biped running control. IEEE Robotics and
Automation Magazine, 14(2):63–72, 2007.
[19] S.A. Migliore, E.A. Brown, and S.P. DeWeerth. Novel
nonlinear elastic actuators for passively controlling
robotic joint compliance. J Mech Des, 129(4):406–412,
Experiments with Balancing on Irregular Terrains
Using the Dreamer Mobile Humanoid Robot
Luis Sentis, University of Texas at Austin, USA
Josh Petersen, University of Texas at Austin, USA
Roland Philippsen, Halmstad University, Sweden

Abstract—We investigate models and controllers for mobile humanoid robots that maneuver in irregular terrains while performing accurate physical interactions with the environment and with human operators, and test them on Dreamer, our new robot with a humanoid upper body (torso, arm, head) and a holonomic mobile base (triangularly arranged Omni wheels). All its actuators are torque controlled, and the upper body provides redundant degrees of freedom. We developed new dynamical models and created controllers that stabilize the robot in the presence of slope variations, while it compliantly interacts with humans. This paper considers underactuated free-body dynamics with contact constraints between the wheels and the terrain. Moreover, Dreamer incorporates a biarticular mechanical transmission that we model as a force constraint. Using these tools, we develop new compliant multiobjective skills and include self-motion stabilization for the highly redundant robot.

Fig. 1. Experimental setup with an indication of the coordinates used by the physical interaction skill described in the experimental section.

I. INTRODUCTION

Real-world terrains exhibit curvature changes, discrete slabs, granular obstacles, etc. Any robot that is supposed to help automate industrial and urban related chores, in the presence of irregular terrains and human operators, has to be able to maneuver, manipulate and interact safely, accurately and robustly in these types of environments. And it will have to do so while safely and physically interacting with human users.

This paper contributes a theoretical and experimental study of a control approach with such capabilities. It is based on the whole-body control framework, which provides effective tools to incorporate constraints and build robot skills as hierarchies of motion and force tasks. We formulate unified dynamics for tasks and constraints and use them to construct compliant controllers for our humanoid torque-controlled mobile robot Dreamer. We describe, model, and control two fundamental constraints, prove the stability of our method for controlling motion redundancy, and show how our control approach can naturally implement kinesthetic interaction with the robot. The experiments provide an empirical proof of concept for the developed concepts and methodology.

One of the most fundamental physical constraints for any mobile platform is the contact between its wheels and the ground. It is common to build mobile bases that can be treated as planar joints, but this requires them to be heavy and large to create passive righting moments that exceed any expected tipping moment. This requirement severely limits areas for potential deployment, which is why we designed Dreamer with a smaller and more lightweight base, and chose to address stability using an active approach inspired by our earlier work on multicontact control. By incorporating the full mobility of the base, as opposed to the frequently unstated planarity assumption, it becomes possible to estimate balance coordinates due to variations in slope, account for free-body dynamic effects, and create an infrastructure that is dynamically correct in 3D space. For Dreamer, there is another fundamental constraint due to the mechanical coupling of two motion axes in its torso. This biarticular joint provides a large range of motion while keeping the mechatronics very compact.

Stable control of redundancies is a general issue for manipulators with more degrees of freedom (DOF) than required for fulfilling a particular skill. In this paper, we contribute a detailed stability analysis of our method of addressing such redundancies.

A. Related Work

Our line of research is a direct successor of [11]. Here, a mobile omnidirectional manipulator for flat terrain was developed using caster wheels and a Puma industrial manipulator with an Operational Control layer [14]. Our work is a departure from the previous one in various respects: (1) we implement whole-body dynamics that incorporate the free floating underactuated effects, (2) we describe the dependencies between wheels as contact and rolling constraints, (3) we implement a dynamically correct prioritized control strategy, and (4) we control balance to prevent falling in rough terrains. As such, we provide a platform that is more capable to adapt to outdoor
environments and is more dexterous in executing whole-body skills.

Although whole-body compliant control in mobile robots and humanoids has been thoroughly studied by us [15], [21] and by our predecessors [3], [1], more recent implementations include [17], [9], [6]. In these works whole-body torque control of the base and the upper body are considered. In particular, in [9], [6] the coordination of whole-body torques is accomplished by separating upper-body torques and an admittance control strategy for the velocity controlled base. Limitations of these works include flat terrain assumptions, lacking balancing capabilities and considering static contact conditions on the base wheels.

Balancing is an important component of our approach to handling rough terrains. In [8], a large wheeled robot with reconfigurable pose is used to maneuver in outdoor back-terrains. Compared to our work, the previous one does not utilize whole-body dynamics nor contact state estimation. In turn, the robot will perform poorly when executing fast dynamic maneuvers. A simple wheeled robot that balances to operate on rough roads has been proposed in [16]. They use wheeled inverted pendulum dynamics to stabilize balance through state feedback control. This work shares balance capabilities with our approach but is limited in that it does not have manipulation capabilities.

As for the control architecture, our software relates (within a limited architectural scope) to [2], but in contrast we provide whole-body dynamic models apt for the control of mobile humanoids under contact constraints. When compared to [13], our software provides a multi-objective torque control layer that enables utilizing the robot's motion and contact redundancy more efficiently.

B. The Dreamer Humanoid Robot Hardware

The main hardware tool that we use for this study is a new mobile dexterous humanoid robot. It includes a torso, an arm, a hand, an anthropomorphic head developed by Meka Robotics, and a torque-controlled holonomic base developed by our lab. The actuators for the base and the upper body, except for the head, contain torque/force sensors that enable Elmo amplifiers to implement current or torque feedback. An EtherCAT serial bus communicates with sensors and motor amplifiers from a single computer system. A PC running Ubuntu Linux with the RTAI Realtime Kernel runs the models and control infrastructure described in this project. The holonomic base contains rotary torque sensors as well as the inertial measurement unit (IMU) 3DM-GX3-25 from MicroStrain. It achieves holonomic motion and force capabilities by utilizing Omni wheels located in an equilateral triangular configuration.

Fig. 2. "Free-body" degrees of freedom, reaction forces, and robot's kinematic chain: in order to adapt to irregular terrains, we model the robot kinematics and dynamics with respect to an inertial reference frame. This step entails adding 6 kinematic degrees of freedom of free-body movement to the kinematic description of the robot. Furthermore, rolling and contact constraints on the wheels are modeled for contact-based control.

II. CONSTRAINED, UNDERACTUATED DYNAMICS

The model of Dreamer's dynamic behavior is formulated using Lagrangian rigid multibody dynamics. A diagram of the modeled entities is shown in Figure 2. We start by introducing the generalized coordinate vector

    q ≜ [qfreebody  qwheels  qtorso  qarm]^T ∈ R^ndofs,    (1)

where qfreebody ∈ R^6 corresponds to the passive free-body linear and angular DOF of the base, qwheels ∈ R^3 corresponds to the actuated base wheels, qtorso ∈ R^3 corresponds to the robot's upper torso with the biarticular constraint, and qarm ∈ R^7 corresponds to the robot's actuated right arm. The equation of motion underlying the model is

    A(q) q̈ + b(q, q̇) + g(q) + Jconstr^T λconstr = U^T τcontrol,    (2)

where {A, b, g} are the inertial, Coriolis-centrifugal, and gravity terms; Jconstr is the constraint Jacobian (sections II-B and II-C); λconstr are Lagrangian multipliers; U^T maps actuation forces into the system (section II-A); and τcontrol ∈ R^(ndofs−(1+6)) is the vector of motor torques. We assume {A, b, g} known, solve for λconstr here, and develop the expressions for the remaining terms in the following subsections. λconstr can be found by left-multiplying (2) by Jconstr A^−1 and applying the time-derivative of the constraint equation Jconstr q̇ = 0 to yield

    λconstr = J̄constr^T (U^T τcontrol − b − g) + Λconstr J̇constr q̇,    (3)

where J̄constr ≜ A^−1 Jconstr^T Λconstr is the dynamically consistent generalized inverse of Jconstr and Λconstr ≜ (Jconstr A^−1 Jconstr^T)^−1 is the inertial matrix projected in the manifold of the constraints. These steps eliminate the constraint forces from (2), and we obtain the equation of motion of the constrained underactuated system as

    A q̈ + Nconstr^T (b + g) + Jconstr^T Λconstr J̇constr q̇ = (U Nconstr)^T τcontrol,    (4)

where Nconstr ≜ I − J̄constr Jconstr is the dynamically consistent null space of the constraint Jacobian.
In the following subsection, we develop U and J_constr. The latter collects various constraints. In this paper, it is defined as

    J_{constr} \triangleq \begin{bmatrix} J_{biart} \\ J_{contact} \end{bmatrix} \in \mathbb{R}^{(1+6) \times n_{dofs}}.    (5)

A. Underactuation Model

There are two sources of underactuation on Dreamer, namely the six free-floating DOF (modeled as virtual unactuated prismatic and revolute joints), and the fact that the two Sagittal joints in the torso are driven by a single motor. We express these dependencies as q_act = U q ∈ R^{n_dofs − (1+6)}, where

    U \triangleq \begin{pmatrix} 0_{3\times6} & 1_{3\times3} & 0_{3\times3} & 0_{3\times7} \\ 0_{2\times6} & 0_{2\times3} & \begin{smallmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \end{smallmatrix} & 0_{2\times7} \\ 0_{7\times6} & 0_{7\times3} & 0_{7\times3} & 1_{7\times7} \end{pmatrix}    (6)

is the (n_dofs − (1+6)) × n_dofs matrix that maps the biarticular and free-body DOF into the equation of motion.

B. Biarticular Constraint

Biarticular joints in our robot are located between the hip and thorax segments of the torso. In particular, the torso has 3 degrees of freedom (DOF) — waist rotation, hip flexion/extension, thorax flexion/extension — but only the waist rotation and hip flexion/extension are directly actuated by independent motors. A steel wire runs between the hip and thorax DOF to provide actuation to the two Sagittal joints of the upper torso. This type of transmission constraint represents an underactuated DOF and creates tension forces across the wire. Because it creates position constraints, it is of a holonomic nature.

In particular, Dreamer's biarticular constraint can be modeled as q_hip = q_thorax. It follows that the instantaneous velocities of both joints are also equal (q̇_hip = q̇_thorax). Let us define the coordinate δ_biart ≜ q_hip − q_thorax. Then, a differential relation with respect to the generalized coordinates can be expressed as

    \dot{\delta}_{biart} = J_{biart}\, \dot{q} = 0,    (7)

where q̇ is the vector of generalized joint velocities (6 for the free-body base, 3 for the wheels, 3 for the upper torso and 7 for the right arm), and the Jacobian of the biarticular constraint is expressed as

    J_{biart} \triangleq \begin{bmatrix} 0_{1\times6} & 0_{1\times3} & 0 & 1 & -1 & 0_{1\times7} \end{bmatrix} \in \mathbb{R}^{1 \times n_{dofs}}.    (8)

C. Rolling and Contact Constraints

The relative velocity of the ground contact point of the i-th wheel, v_contact[i], depends on the velocity of the wheel center, the angular velocity of the wheel, and the relative position of the contact point:

    v_{contact[i]} = v_{wheel[i]} + \omega_{wheel[i]} \times \delta_{contact[i]} \in \mathbb{R}^3,    (9)

where v_wheel[i] and ω_wheel[i] are linear and angular velocities of the wheel, and δ_contact[i] ≜ p_contact[i] − p_wheel[i] is the displacement from the wheel center to the contact point. The rolling constraint implies that the velocity of the contact point in the direction of the wheel is zero, i.e. u_rolling[i]^T v_contact[i] = 0, where u_rolling[i] ∈ R^3 is the direction of the wheel axis. We rewrite this condition as J_rolling[i] q̇ = 0, where

    J_{rolling[i]} \triangleq u_{rolling[i]}^T \left( J_{v,wheel[i]} - \delta_{contact[i]} \times J_{\omega,wheel[i]} \right),    (10)

and J_{wheel[i]} \triangleq \begin{bmatrix} J_{v,wheel[i]}^T & J_{\omega,wheel[i]}^T \end{bmatrix}^T \in \mathbb{R}^{6 \times n_{dofs}} collects the linear and rotational parts of the wheel's basic Jacobian. The screw vector \begin{bmatrix} v_{wheel[i]}^T & \omega_{wheel[i]}^T \end{bmatrix}^T = J_{wheel[i]} \dot{q} corresponds to the linear and angular velocity of the wheel.

In addition, there is a contact constraint with respect to the wheel's normal direction to the ground: J_normal[i] q̇ = 0, with J_normal[i] ≜ u_normal[i]^T J_wheel[i]. Therefore the combined rolling constraints of the three wheels correspond to the aggregation of the slip and the normal constraints for each wheel, i.e.

    J_{contact} \triangleq \begin{bmatrix} J_{rolling[1]} \\ J_{rolling[2]} \\ J_{rolling[3]} \\ J_{normal[1]} \\ J_{normal[2]} \\ J_{normal[3]} \end{bmatrix} \in \mathbb{R}^{6 \times n_{dofs}}.    (11)

III. COMPLIANT WHOLE-BODY CONTROL

The whole-body control framework expresses controllers in a representation that is the most natural for a given task, with a generic coordinate transformation x_task = T_task(q) ∈ R^{n_task}, e.g. a homogeneous transformation for a Cartesian point, or a rotation and translation in the group SE(3) for a spatial transformation of a frame. It follows that instantaneous task kinematics can be expressed as ẋ_task = J_task q̇, where J_task = ∂x_task/∂q ∈ R^{n_task × n_dofs} is the geometric Jacobian matrix with respect to the generalized coordinates.

In order to express the task with respect to the actuated joints, section III-A summarizes earlier results on a dynamically consistent generalized inverse of U N_constr. Then, section III-B proceeds to develop a prioritized operational space task/posture controller for Dreamer, and section III-C contributes a new proof of stability for our posture control approach.

The sum of all tasks may not fully determine the motor torques for all actuated joints. In fact, this is a very common circumstance with redundant manipulators such as Dreamer. In order to stabilize the motions that remain in the null space of all tasks, we define a posture as a special kind of task which optimizes a performance criterion, as opposed to the usual position or force tracking. We usually place at least one posture at the end of any task hierarchy.

A. Actuated-Joint Kinematics

The number of constraints (one for the biarticulate joint plus two for each of the three wheels in contact) is equal to the
number of unactuated DOF (one for the biarticulate joint and six for the free-floating formulation). Thus, we can solve the robot's kinematics from the actuated joints alone. In particular, we consider decomposing the generalized coordinates into actuated and unactuated parts, i.e.

    q = D_{kin} \begin{bmatrix} q_{act} \\ q_{unact} \end{bmatrix},    (12)

where D_kin is a matrix that distributes generalized coordinates to their respective positions in the robot's kinematic chain.

Because of the already encountered condition J_constr q̇ = 0, we can state that q̇* ∈ Null(J_constr), where q̇* expresses the set of generalized velocities that fulfills all of the constraints. In other words, the constrained velocities can be expressed as the self-motion manifold \dot{q}^* = (I - J_{constr}^{\#} J_{constr})\, \dot{q}, where J_{constr}^{\#} is any right inverse of the constraint Jacobian (i.e. J_{constr} J_{constr}^{\#} = I) and q̇ represents unconstrained velocities. The constrained generalized velocities can be reconstructed from the velocities of the actuated DOF alone using the following formula

    \dot{q}^* = \overline{UN}_{constr}\, \dot{q}_{act},    (13)

where

    \overline{UN}_{constr} \triangleq A^{-1} (U N_{constr})^T (\phi^*)^+,    (14)

is the dynamically consistent generalized inverse of U N_constr, (·)^+ is the Moore–Penrose pseudoinverse operator, and

    \phi^* \triangleq U N_{constr} A^{-1} (U N_{constr})^T,    (15)

is a projection of the inertia matrix in the reduced constrained manifold. See [20] for a proof of (13).

B. Prioritized Control Structure

Based on (14), the differential kinematics can now be expressed with respect to actuated joints alone:

    \dot{x}_{task} = J^*_{task}\, \dot{q}_{act},    (16)

where J^*_task is the reduced constraint-consistent Jacobian

    J^*_{task} \triangleq J_{task}\, \overline{UN}_{constr},    (17)

with J^*_task ∈ R^{n_task × n_act} a reduced form of the task Jacobian consistent with the general constraint conditions. To prove (16), consider the constrained generalized velocities q̇*, and apply (13) to the instantaneous task kinematics, thus getting \dot{x}_{task} = J_{task} \dot{q}^* = J_{task} \overline{UN}_{constr} \dot{q}_{act}.

Now, consider the skill of touching objects using our robot, encoded as a control hierarchy with one task and one posture. This skill uses two coordinate systems, x_skill ≜ {x_hand ∈ T(3), x_posture = q_act ∈ R^{n_act}}, where T(3) is the group of translations. The experiments in section IV employ a controller which follows the same principle, but it adds a balancing task at the top of the control hierarchy.

In general, an arbitrary task(k) can be kinematically characterized by its mapping with respect to the generalized coordinates and by its generalized constrained Jacobian, i.e.

    x_{task(k)} = T_{task(k)}(q),    (18)
    \dot{x}_{task(k)} = J^*_{task(k)}\, \dot{q}_{act},    (19)

with

    J^*_{task(k)} \triangleq J_{task(k)}\, \overline{UN}_{constr}.    (20)

It can be demonstrated [20] that the dynamics of the actuated joints can be expressed as

    \ddot{q}_{act} + \phi^* (b^* + g^*) = \phi^* \tau_{control},    (21)

where

    \phi^* (b^* + g^*) \triangleq U A^{-1} \left( N_{constr}^T (b + g) + J_{constr}^T \Lambda_{constr} \dot{J}_{constr} \dot{q} \right).    (22)

By left multiplying the constrained actuated dynamics by \bar{J}^*_{task(k)} \triangleq \phi^* J^{*T}_{task(k)} ( J^*_{task(k)} \phi^* J^{*T}_{task(k)} )^{-1}, and using the equality \ddot{x}_{task(k)} = J^*_{task(k)} \ddot{q}_{act} + \dot{J}^*_{task(k)} \dot{q}_{act}, the constrained task dynamics can be expressed as

    \Lambda^*_{task(k)} \ddot{x}_{task(k)} + \mu^*_{task(k)} + p^*_{task(k)} = \bar{J}^{*T}_{task(k)} \tau_{control},    (23)

with {Λ*_task(k), μ*_task(k), p*_task(k)} being inertial, Coriolis-centrifugal, and gravitational terms (not derived here).

If J*_task(k) is full rank, the following control structure yields full control of the task dynamics,

    \tau_{control} = J^{*T}_{task(k)} F_{task(k)}.    (24)

This statement can be proven by applying the above torques to (23) and using the property of the generalized inverse J^*_{task(k)} \bar{J}^*_{task(k)} = I, thus getting

    \Lambda^*_{task(k)} \ddot{x}_{task(k)} + \mu^*_{task(k)} + p^*_{task(k)} = F_{task(k)},    (25)

which delivers full control of the task dynamics ẍ_task(k) by using feedforward control and feedback linearization by means of the control input F_task(k), i.e.

    F_{task(k)} = \Lambda^*_{task(k)} a^{ref}_{task(k)} + \mu^*_{task(k)} + p^*_{task(k)},    (26)

where a^{ref}_task(k) is an acceleration control input. The above controller yields \ddot{x}_{task(k)} = a^{ref}_{task(k)}.

For the case of one task and one posture, we define the following prioritized control structure that provides optimal projection of the posture gradient,

    \tau_{control} = J^{*T}_{task} F_{task} + N^{*T}_{task} \tau_{posture},    (27)

where N^*_{task} \triangleq I - \bar{J}^*_{task} J^*_{task} is the dynamically consistent null-space matrix of the reduced task Jacobian.

C. Posture Control Stability

Uncontrolled null-space tasks are known to produce unstable behaviors [5]. Therefore, we investigate the stabilizing properties of feedback controllers for the posture. We consider the following energy function to be minimized in posture space

    V_{posture} = \tfrac{1}{2}\, e_{q_{act}}^T K\, e_{q_{act}},    (28)

where e_{q_{act}} \triangleq q_{act} - q_{act}^{goal} is a feedback error function, q_act^goal is a postural attractor goal and K is a gain matrix. A naïve approach is to project the gradient of (28) directly into the control structure (27), i.e.

    \tau_{control} = J^{*T}_{task} F_{task} + N^{*T}_{task} K e_{q_{act}}.    (29)
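The dynamic consistency that makes the decomposition in (27) work — the null-space-projected posture torque causes no task-space acceleration — can be verified numerically. In the sketch below, φ* and the reduced task Jacobian are random stand-ins with illustrative dimensions (not Dreamer's actual quantities):

```python
import numpy as np

rng = np.random.default_rng(1)
n_act, n_task = 7, 3   # illustrative sizes only

# Stand-in for phi* of (15): a positive-definite "inverse inertia" in actuated space.
L = rng.standard_normal((n_act, n_act))
phi = L @ L.T + n_act * np.eye(n_act)
Jt = rng.standard_normal((n_task, n_act))   # stand-in reduced task Jacobian J*_task

# Dynamically consistent inverse and null-space projector used in (27).
Jt_bar = phi @ Jt.T @ np.linalg.inv(Jt @ phi @ Jt.T)
N = np.eye(n_act) - Jt_bar @ Jt

F_task = rng.standard_normal(n_task)
tau_posture = rng.standard_normal(n_act)
tau = Jt.T @ F_task + N.T @ tau_posture     # prioritized torque of (27)

# The posture term is invisible at the task level: J*_task phi* N*^T = 0,
# so task accelerations depend on F_task alone.
assert np.allclose(Jt @ phi @ N.T @ tau_posture, 0)
print("posture torque produces no task-space acceleration")
```

The same identity explains why the naïve policy (29) can distort posture convergence without ever disturbing the task: all cross-coupling introduced by N*ᵀ_task lives entirely inside the task's null space.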
If we apply the above torque to the posture dynamics (which are equal to the actuated dynamics) of Equation (21), and neglect the task torques, i.e. F_task = 0, we get \ddot{q}_{act} + \phi^*(b^* + g^*) = \phi^* N^{*T}_{task} K e_{q_{act}}.

The problem with this policy is that N^{*T}_{task} in the above equation introduces cross-coupling of the error coordinates. Such distortion leads to a system of Partial Differential Equations (PDEs) for which convergence to the minimum energy cannot be assured. A detailed study on convergence and stability of the above distortion should be conducted. However, non-convergence to the minimum energy was empirically demonstrated in [15].

Fig. 3. Balance: The upper body is controlled such that, when projected onto a horizontal plane, its center of mass (COM) remains as close as possible to the projected center of the support triangle formed by the three wheels. This is demonstrated by manually tilting the base. Notice that the COM task controls two DOF, and the posture control takes care of stabilizing the joints at a position that minimizes the posture error while not interfering with the projected COM position.

This type of deficiency has been studied before, for instance in [18], by proposing a time-varying feedback stabilization scheme which augments the gain matrix K. However, such a solution does not take into account the null-space characteristics, and therefore cannot optimize the gradient descent. In [15], [21], we extensively investigated this problem in the context of the dynamic behavior of posture control, and proposed a control strategy equivalent to choosing a gain matrix that rotates the gradient to the best possible direction away from Null(N^{*T}_{task}). It was found that the following postural control structure yields favorable gradient descent of postural energies

    \tau_{p|t} = J^{*T}_{p|t} F_{posture},    (30)

where \tau_{p|t} \triangleq N^{*T}_{task} \tau_{posture} is the right-most term of (27),

    J^*_{p|t} \triangleq J^*_{posture} N^*_{task},    (31)

is the task-consistent posture Jacobian matrix, and J^*_{posture} \triangleq U\, \overline{UN}_{constr} is the reduced constraint-consistent Jacobian of the posture. To prove (30) let us consider the following steps:

1. If we plug (30) into (21) we get

    \ddot{q}_{act} + \phi^* (b^* + g^*) = \phi^* J^{*T}_{p|t} F_{posture}.    (32)

However, in [20] it was demonstrated that the right-hand terms satisfy \phi^* J^{*T}_{p|t} = \phi^*_{p|t}, where

    \phi^*_{p|t} \triangleq J^*_{p|t}\, \phi^*\, J^{*T}_{p|t},    (33)

is an inverse inertial matrix in a space that is simultaneously consistent with the robot constraints and with the task priority.

2. As such, (32) becomes

    \ddot{q}_{act} + \phi^* (b^* + g^*) = \phi^*_{p|t} F_{posture}.    (34)

Choosing the feedback control policy

    F_{posture} = \left( \phi^*_{p|t} \right)^+ K e_{q_{act}} + \left( \phi^*_{p|t} \right)^+ \phi^* (b^* + g^*)    (35)

leads to [20]

    U^{rank\,T}_{p|t} \left( \ddot{q}_{act} = K e_{q_{act}} \right),    (36)

where

    \phi^*_{p|t} = U^{rank}_{p|t}\, \Sigma^{rank}_{p|t}\, U^{rank\,T}_{p|t}    (37)

is the minimal (rank) SVD decomposition of \phi^*_{p|t}, and

    U^{rank}_{p|t} \in \mathbb{R}^{n_{act} \times (n_{act} - n_{task})}    (38)

is the set of controllable eigenvectors of \phi^*_{p|t}.

The feedback behavior of Equation (36) is now linear in the controllable directions U^{rank\,T}_{p|t} and therefore leads to a system of Ordinary Differential Equations (ODEs). It follows that it must minimize the energy function of Equation (28). The resulting performance improvement in posture space is substantial [15].

As a new contribution, we study in more detail the stability of self-motions of our proposed controller. The general controller F_{posture} = ( \phi^*_{p|t} )^+ v + ( \phi^*_{p|t} )^+ \phi^* (b^* + g^*) yields the closed-loop behavior [20]

    \ddot{q}_{act} = \Phi^*_{p|t}\, v,    (39)

where \Phi^*_{p|t} \triangleq \phi^*_{p|t} \phi^{*+}_{p|t} = U^{rank}_{p|t} U^{rank\,T}_{p|t} is a singular and symmetric positive semidefinite matrix that represents the self-motion manifold of the task coordinates, and v is a closed-loop controller. Let us define a projected state error

    E \triangleq \Phi^*_{p|t} \left( -k_p e_{q_{act}} - k_v \dot{q}_{act} \right),    (40)

with k_p > 0, k_v > 0. The controller choice

    v = \frac{1}{k_v} K_E E - \frac{k_p}{k_v} \dot{q}_{act} - \frac{1}{k_v} \dot{\Phi}^*_{p|t} \left( -k_p e_{q_{act}} - k_v \dot{q}_{act} \right),    (41)

where K_E is a gain matrix, will render stable behavior of the self-motions. To demonstrate the stability, let us consider the Lyapunov candidate function

    V = \tfrac{1}{2} E^T E \ge 0.    (42)

Practically, the above energy function is strictly positive definite, as desired postures are normally uncorrelated with the task. The time derivative yields

    \dot{V} = E^T \dot{E} = -E^T \left( k_p \dot{q}_{act} + k_v v \right) - E^T \dot{\Phi}^*_{p|t} \left( k_p e_{q_{act}} + k_v \dot{q}_{act} \right) = -E^T K_E E \le 0,    (43)

where we have used the equality

    \dot{E} = \Phi^*_{p|t} \left( -k_p \dot{q}_{act} - k_v \ddot{q}_{act} \right) + \dot{\Phi}^*_{p|t} \left( -k_p e_{q_{act}} - k_v \dot{q}_{act} \right)
            = \Phi^*_{p|t} \left( -k_p \dot{q}_{act} - k_v \Phi^*_{p|t} v \right) + \dot{\Phi}^*_{p|t} \left( -k_p e_{q_{act}} - k_v \dot{q}_{act} \right),    (44)
the property of idempotence \Phi^*_{p|t} \Phi^*_{p|t} = \Phi^*_{p|t} (which follows from the property of the generalized inverse \phi^*_{p|t} \phi^{*+}_{p|t} \phi^*_{p|t} = \phi^*_{p|t}) and the closed-loop input of (41). Because Equation (43) is essentially the negative equivalent of Equation (42), the proposed self-motion controller renders stable behavior of the posture if (42) is chosen to be strictly positive.

Fig. 4. Prioritized whole-body compliant controller.

Tracking under model uncertainties remains an open question for our posture controller. We are planning to conduct such an analysis inspired by [4], [23], [10].

IV. EXPERIMENTS

We performed an experiment involving maneuvering on irregular terrain while physically interacting with a human user. It serves as empirical validation for the approach, by focusing on one essential irregular-terrain feature, namely a change in slope between two planar surfaces.

This relatively simple setup, shown in Figure 1, already contains many of the challenges that Dreamer will have to face on more irregular terrain: excessive tilting or upper-body accelerations may topple the robot; feed-forward gravity compensation, which is crucial for accurate yet compliant task control, depends on orientation; and forces due to interaction with the environment can lift wheels off the ground or lead to unwanted slippage of the base.

The objective of the experiment is to demonstrate that: (1) the robot can transition safely between the flat and the sloping portions at various speeds; (2) the IMU unit and the models of the robot produce a good estimation of the constrained free-body dynamics; (3) the base follows the body as a result of the compliant interactions and because of the contact models; (4) the robot can control balance to prevent falling; (5) accurate manipulation and physical interaction can take place on the irregular terrain; (6) posture energies are minimized given the control objectives; and (7) all of this is accomplished while complying with the biarticular and rolling/contact constraints.

To show the desired capabilities, we create a skill for interacting with a human user on the irregular terrain. It is very similar to the one discussed in section III-B, with two modifications.

First of all, a balance task prevents falling when transitioning from flat to sloping terrain. This task controls the position of the center of mass (COM) projected onto a horizontal plane. Second, the posture is modified to act only on the upper-body DOF, which naturally implements kinesthetic control of the base: a human user simply needs to gently push or pull the hand in the direction where the robot should go, and without further modifications to the system model or the control formulation, the three base DOF will provide the required motion. This complex coordination works because the whole-body formulation includes all DOF in all task representations, and the hand goal is defined relative to the body, thus moving the base closer to the hand.

The coordinates of this Physical Human-Robot Interaction skill are thus x_skill(q) ≜ {x_com, x_hand, x_posture} ∈ R^{2+3+n_posture}. In the following lines, we describe the task kinematics and then formulate the prioritized whole-body control structure. All the controllers and related software were implemented using the open-source realtime software infrastructure described in [19].

The balance task coordinates are

    x_{com} \triangleq \frac{1}{M} \sum_{i=i_{base}}^{n_{dofs}} m_i\, x_{com[i]},    (45)

where M is the total mass of the robot and i_base = n_freebody + n_wheels is the first joint attached to the upper-body base in the kinematic chain depicted in Figure 2. The COM Jacobian can thus be given as

    J_{com}(q) = \frac{1}{M} \sum_{i=i_{base}}^{n_{dofs}} m_i\, J_{com[i]}(q),    (46)

where J_com[i] is the Jacobian of the i-th linkage in the kinematic chain.

The hand coordinate system can be derived from the homogeneous transformation from the Inertial Frame I to the Hand Frame H, both depicted in Figure 2. Thus, the hand frame is given by

    \begin{bmatrix} x_{hand} \\ 1 \end{bmatrix} = \left( \prod_{i=0}^{n_{hand}} {}^{i+1}T_i(q) \right) \begin{bmatrix} {}^{H}x_{hand} \\ 1 \end{bmatrix},    (47)

where x_hand is the vector of hand translations with respect to the inertial frame I, {}^{i+1}T_i(q) is the matrix describing the homogeneous transformation from frame i+1 to frame i, and {}^H x_hand is the hand translation expressed in its local frame H. The Jacobian of the hand is thus the geometric operator J_hand(q) ≜ ∂x_hand/∂q.

The posture coordinates correspond to the actuated joints of the upper body. Notably, for reasons described above, the wheels are not included. Therefore x_posture = S^{upper}_{act} q, where S^{upper}_{act} is a selection matrix corresponding to the actuated joints of the upper body only. It follows that the posture Jacobian is J_posture = S^{upper}_{act}.

The prioritized whole-body compliant controller described earlier endows Dreamer with the physical human-robot interaction skill for this experiment. The control structure can be expressed as

    \tau_{control} = J^{*T}_{com} F_{com} + J^{*T}_{hand} F_{hand} + J^{*T}_{posture} F_{posture},    (48)
where the center of mass operates with higher priority than the hand task and the hand task operates with higher priority than the posture task. Priorities are enforced by recursively defining the differential manifold of the self-motion of higher-priority tasks, i.e.

    J^*_{com} = J_{com}\, \overline{UN}_{constr},    (49)
    J^*_{hand} = J_{hand}\, \overline{UN}_{constr}\, N^*_{com},    (50)
    J^*_{posture} = J_{posture}\, \overline{UN}_{constr}\, N^*_{hand,com},    (51)

with null spaces of self-motion expressed as [20]

    N^*_{com} = I - \bar{J}^*_{com} J^*_{com},    (52)
    N^*_{hand,com} = I - \bar{J}^*_{hand} J^*_{hand} - \bar{J}^*_{com} J^*_{com}.    (53)

The above control structures leverage the developments in sections II and III to achieve dynamic consistency. F_com, F_hand, and F_posture constitute the input commands.

Fig. 5. Experimental Results. Sequence of video snapshots, and data taken during the experiment on the irregular terrain. Snapshots (A), (B) and (C) correspond to maneuvers on a terrain of 7° slope. The corresponding data is shown in images (D), (E), (F) and (G). At instant (1) the upper-body motors are enabled. At (2) the base motors are enabled. At (3) the robot starts ascending the slope.

We now focus on the feedback control policies for the overall controller. Figure 4 shows the flow diagram. The goal of the COM task is to move the robot's center of mass to a goal position above the center of the triangle defined by the wheels (see Figure 3). The goal of the hand task is to reach a position in front of the robot's body; compliance with respect to the human is achieved by reducing the control gains. The posture goal is to move the upper-body actuated joints to the zero position, except for the elbow joint, which should bend 90°. Summarizing,

    x^{goal}_{com} = \text{projected center of the base} \in \mathbb{R}^2,    (54)
    x^{goal}_{hand} = \text{position relative to body} \in \mathbb{R}^3,    (55)
    x^{goal}_{posture} = \text{posture with bent elbow} \in \mathbb{R}^{n_{upperbody}-1}.    (56)

We assume that the center-of-mass and hand tasks are nonsingular in their priority spaces. Therefore, the objectives of these tasks are to converge asymptotically to the goal. In contrast, the posture is required to be locally stable only. The proposed feedback/feedforward control laws are

    F_{com} = \Lambda^*_{com} \left( -k_{p,com}\, e^{goal}_{com} - k_{v,com}\, \dot{x}_{com} \right) + p_{com},    (57)
    F_{hand} = \Lambda^*_{hand} \left( -K_{p,hand}\, e^{goal}_{hand} - K_{v,hand}\, \dot{x}_{hand} \right) + p_{hand},    (58)
    F_{posture} = \Lambda^*_{posture} \left( -k_{p,posture}\, e^{goal}_{posture} - k_{v,posture}\, \dot{x}_{posture} \right) + p_{posture}.    (59)

Given the above linearizing control laws, the resulting closed-loop dynamics of the task coordinates become

    \ddot{x}_{com} + k_{p,com}\, e^{goal}_{com} + k_{v,com}\, \dot{x}_{com} = 0,    (60)
    \ddot{x}_{hand} + K_{p,hand}\, e^{goal}_{hand} + K_{v,hand}\, \dot{x}_{hand} = 0,    (61)

which are globally asymptotically stable, while the closed-loop dynamics of the posture become

    U^{T}_{r,posture} \left( \ddot{x}_{posture} + k_{p,posture}\, e^{goal}_{posture} + k_{v,posture}\, \dot{x}_{posture} \right) = 0,    (62)

which is locally stable.

Note that there are alternative approaches for creating such a skill, for example inverse kinematics [22], Resolved Momentum [12], or a whole-body impedance controller [7]. To the best of our knowledge, however, none of these is adept at handling irregular terrains.

Figure 5 shows a sequence of snapshots and the accompanying data of the experiment. The robot starts on flat terrain. Then, the human operator pulls gently backward on the hand, and the robot transitions to the slope. After approximately 1 meter, the robot is pushed back to the horizontal area. Looking at the data plots, various phases can be discerned.

During the first 8 seconds, the emergency stop button was engaged while the operator moved from the command interface to the robot. Fluctuations of hand position and postural error are due to the operator grasping Dreamer's upper arm.
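The closed-loop task dynamics (60)–(61) are decoupled linear second-order systems, so their asymptotic convergence can be confirmed with a short numerical integration. The gains below are illustrative placeholders (the experiment's actual gain values are not stated here):

```python
import numpy as np

# Euler-integrate the closed-loop COM dynamics (60): xdd + kp*e + kv*xd = 0.
kp, kv = 100.0, 20.0            # illustrative, critically damped: kv = 2*sqrt(kp)
x_goal = np.array([0.05, -0.02])  # hypothetical projected-COM goal offset [m]
x = np.zeros(2)                  # projected COM starts at the support center
xd = np.zeros(2)
dt = 0.001
for _ in range(5000):            # 5 s of simulated time
    e = x - x_goal
    xdd = -kp * e - kv * xd      # closed-loop acceleration of (60)
    xd += dt * xdd               # semi-implicit Euler step
    x += dt * xd
assert np.linalg.norm(x - x_goal) < 1e-4   # error has decayed to the goal
print("closed-loop COM error:", np.linalg.norm(x - x_goal))
```

The posture dynamics (62) obey the same error equation, but only within the controllable directions U_{r,posture}, which is why the posture is claimed to be locally rather than globally stable.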
After 8 seconds, the emergency button of the upper body is released, the hand position stabilizes, and the postural error gets minimized.

The independent power of the base is turned on after 10 seconds. Consequently, the COM relative error gets minimized.

After 15 seconds, the operator starts to guide the robot up the slope. It can be seen that the base moves for approximately 0.9 m at a speed of about 0.35 m/s. During this time, the balance task maintains the COM within 1 cm of its goal. The relative hand position also remains stable and close to its goal. Note that the operator keeps pulling on the hand during all of this, and notice that the postural error is minimized in the task's null space throughout the interaction.

V. CONCLUSION

Controlling mobile humanoid robots on irregular terrains entails modeling the free-body dynamics constrained by wheel contacts and, in our case, a biarticular transmission in the torso. Based on these foundations, we have presented a prioritized compliant controller which precisely tracks a hierarchy of tasks and optimizes a posture in the remaining degrees of redundancy. The general applicability of our approach has been demonstrated with a detailed study of closed-loop stability and an experiment that reflects some of the key challenges while remaining sufficiently simple to serve as an illustrative example.

A remarkable result is that base movement can emerge quasi-spontaneously, adapting to the terrain without an explicit representation of the feasible directions of movement. This is enabled by the properly modeled contact constraints and the use of an IMU to estimate spatial orientation. Another important consequence is that we do not require an explicit model of the wheel geometry and the direction of base motion in the controller.

The whole-body skill used in the experiment is simple yet powerful, which demonstrates the synergistic capabilities of the framework and the robot. Firstly, the posture behavior coupled with the compliant hand task ensures that the robot kinesthetically follows the human around the terrain. Secondly, the IMU sensor feedback and the balance task ensure that the center of mass always stays very close to the center of the horizontal projection of the base, even while transitioning between horizontal and sloped sections. Lastly, tracking and optimizing the various goals takes advantage of all degrees of freedom and constraints.

Future directions of this work include detecting wheel contacts due to disturbances from terrain obstacles, dealing with wheel traction and slip, and improving the base hardware with respect to actuator friction, transmission alignment, backlash and shock absorption.

REFERENCES

[1] O. Brock, O. Khatib, and S. Viji. Task-consistent obstacle avoidance and motion behavior for mobile manipulation. In Proceedings of the IEEE International Conference on Robotics and Automation, pages 388-393, Washington, USA, 2002.
[2] H. Bruyninckx. Open robot control software: the OROCOS project. In Robotics and Automation, 2001. Proceedings 2001 ICRA. IEEE International Conference on, volume 3, pages 2523-2528, 2001.
[3] K.S. Chang, R. Holmberg, and O. Khatib. The augmented object model: Cooperative manipulation and parallel mechanism dynamics. In Robotics and Automation, 2000. Proceedings. ICRA'00. IEEE International Conference on, volume 1, pages 470-475. IEEE, 2000.
[4] C.C. Cheah, C. Liu, and J.J.E. Slotine. Adaptive tracking control for robots with unknown kinematic and dynamic properties. The International Journal of Robotics Research, 25(3):283-296, 2006.
[5] A. De Luca. Zero dynamics in robotic systems. Progress in Systems and Control Theory, Proceedings of the IIASA Workshop, 9:70-87, 1991.
[6] A. Dietrich, T. Wimbock, and A. Albu-Schaffer. Dynamic whole-body mobile manipulation with a torque controlled humanoid robot via impedance control laws. In Intelligent Robots and Systems (IROS), 2011 IEEE/RSJ International Conference on, pages 3199-3206, Sept. 2011.
[7] A. Dietrich, T. Wimbock, and A. Albu-Schaffer. Dynamic whole-body mobile manipulation with a torque controlled humanoid robot via impedance control laws. In Intelligent Robots and Systems (IROS), 2011 IEEE/RSJ International Conference on, pages 3199-3206. IEEE, 2011.
[8] Gustavo Freitas, Fernando Lizarralde, Liu Hsu, and Ney R. Salvi dos Reis. Kinematic reconfigurability of mobile robots on irregular terrains. In Robotics and Automation, 2009. ICRA '09. IEEE International Conference on, pages 1340-1345, May 2009.
[9] M. Fuchs, C. Borst, P.R. Giordano, A. Baumann, E. Kraemer, J. Langwald, R. Gruber, N. Seitz, G. Plank, K. Kunze, R. Burger, F. Schmidt, T. Wimboeck, and G. Hirzinger. Rollin' Justin - design considerations and realization of a mobile platform for a humanoid upper body. In Robotics and Automation, 2009. ICRA '09. IEEE International Conference on, pages 4131-4137, May 2009.
[10] M. Galicki. An adaptive regulator of robotic manipulators in the task space. Automatic Control, IEEE Transactions on, 53(4):1058-1062, 2008.
[11] Robert Holmberg and Oussama Khatib. Development and control of a holonomic mobile robot for mobile manipulation tasks. The International Journal of Robotics Research, 19(11):1066-1074, 2000.
[12] S. Kajita, F. Kanehiro, K. Kaneko, K. Fujiwara, K. Harada, K. Yokoi, and H. Hirukawa. Resolved momentum control: Humanoid motion planning based on the linear and angular momentum. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, pages 1644-1650, Las Vegas, USA, October 2003.
[13] Fumio Kanehiro, Hirohisa Hirukawa, and Shuuji Kajita. OpenHRP: Open architecture humanoid robotics platform. The International Journal of Robotics Research, 23(2):155-165, 2004.
[14] O. Khatib. A unified approach for motion and force control of robot manipulators: The operational space formulation. International Journal of Robotics Research, 3(1):43-53, 1987.
[15] O. Khatib, L. Sentis, J.H. Park, and J. Warren. Whole body dynamic behavior and control of human-like robots. International Journal of Humanoid Robotics, 1(1):29-43, March 2004.
[16] O. Matsumoto, S. Kajita, K. Tani, and M. Oooto. A four-wheeled robot to pass over steps by changing running control modes. In Robotics and Automation, 1995. Proceedings., 1995 IEEE International Conference on, volume 2, pages 1700-1706. IEEE, 1995.
[17] K. Nagasaka, Y. Kawanami, S. Shimizu, T. Kito, T. Tsuboi, A. Miyamoto, T. Fukushima, and H. Shimomura. Whole-body cooperative force control for a two-armed and two-wheeled mobile robot using generalized inverse dynamics and idealized joint units. In Robotics and Automation (ICRA), 2010 IEEE International Conference on, pages 3377-3383, May 2010.
[18] G. Oriolo. Stabilization of self-motions in redundant robots. In Robotics and Automation, 1994. Proceedings., 1994 IEEE International Conference on, pages 704-709. IEEE, 1994.
[19] R. Philippsen, L. Sentis, and O. Khatib. An open source extensible software package to create whole-body compliant skills in personal mobile manipulators. In Intelligent Robots and Systems (IROS), 2011 IEEE/RSJ International Conference on, pages 1036-1041. IEEE, 2011.
[20] L. Sentis. Synthesis and Control of Whole-Body Behaviors in Humanoid Systems. PhD thesis, Stanford University, Stanford, USA, 2007.
[21] L. Sentis and O. Khatib. Synthesis of whole-body behaviors through hierarchical control of behavioral primitives. International Journal of Humanoid Robotics, 2(4):505-518, December 2005.
[22] R. Tellez, F. Ferro, S. Garcia, E. Gomez, E. Jorge, D. Mora, D. Pinyol, J. Oliver, O. Torres, J. Velazquez, and D. Faconti. Reem-B: An autonomous lightweight human-size humanoid robot. In Humanoid Robots, 2008. Humanoids 2008. 8th IEEE-RAS International Conference on, pages 462-468, Dec. 2008.
[23] N.D. Vuong, M.H. Ang, T.M. Lim, and S.Y. Lim. An analysis of the operational space control of robots. In Robotics and Automation (ICRA), 2010 IEEE International Conference on, pages 4163-4168. IEEE, 2010.
Parsing Indoor Scenes Using RGB-D Imagery
Camillo J. Taylor and Anthony Cowley
GRASP Laboratory
University of Pennsylvania
Philadelphia, PA 19104
Email: [cjtaylor,acowley]@cis.upenn.edu

Abstract—This paper presents an approach to parsing the Manhattan structure of an indoor scene from a single RGB-D frame. The problem of recovering the floor plan is recast as an optimal labeling problem which can be solved efficiently using Dynamic Programming.

I. INTRODUCTION

This paper considers the problem of parsing RGB-D images of indoor scenes, such as the one shown in Figure 1, to extract an underlying floor plan defined by the delimiting walls.

Fig. 1. a. RGB image b. Depth Image c. Inferred floor plan

It is important to note that this parsing problem involves more than simply identifying the wall surfaces in the scene. While this is a necessary first step, the extracted walls are effectively infinite planes supported by a finite set of depth measurements. In order to produce a floor plan one must reason about the extent of each plane and how the walls interact with each other to form corners, occlusion edges and other structures.

Furthermore, in a typical scene one must contend with the fact that wall segments are often occluded by furniture, pedestrians and general clutter, and that the depth measurements may be inaccurate or missing entirely. An effective parsing procedure needs to be able to infer the extents of the wall surfaces even in the presence of ambiguous, uncertain and incorrect input measurements. The approach proposed in this paper deals with all of these problems by formulating the parsing process as an optimal labeling problem which can be solved exactly and efficiently using Dynamic Programming.

A number of researchers have addressed the related, but more challenging, problem of inferring the 3D structure of a scene from monocular imagery. Gupta et al. [5] describe an interesting system for reasoning about scene structure using qualitative geometric and structural constraints. Hedau and his colleagues [6, 7] have explored algorithms for inferring the layout of indoor scenes based on vanishing points and other cues. Lee et al. [9] present an interesting approach to reasoning about scene structure using volumetric constraints. Saxena and Ng [12] recast the problem in terms of a Markov Random Field and infer the scene structure based on learned models. Furukawa et al. [4] describe an impressive system for recovering the structure of indoor scenes that utilizes a sequence of monocular images and employs a sophisticated but expensive volumetric analysis procedure. In this paper we make use of the range data provided by the Kinect sensor, which simplifies the interpretation problem and provides more accurate parsing results.

Recently Silberman and Fergus [13] have addressed the problem of scene analysis using RGB-D data. In this work the analysis problem is framed in terms of pixel labeling, where the goal is to assign each pixel in the frame an appropriate class label. The goal in our work is to go beyond a labeling of the visible pixels and to instead propose a coherent floor plan that accurately extrapolates the underlying structure of the scene even in the face of clutter. In this paper we choose to focus on extracting the floors and walls since these represent the major structures which delimit the extent of the scene and provide the underlying context for other structures such as doors, walls, tables and chairs.

The approach to interpretation taken in this paper is most similar to the one given by Lee, Hebert and Kanade [10], who propose an approach to extracting the Manhattan structure of a frame based on vanishing points and an analysis of possible corners in the scene. Our approach makes use of the 2.5D structure of the image in the same way that they do, but takes a different approach to formulating and solving the parsing problem. The proposed approach is also similar to the parsing scheme described by Flint et al. [3, 2], who also make use of Dynamic Programming to efficiently propose interpretations of indoor scenes from monocular imagery. The principal differences between this work and that of Flint et al. are that it takes as input an RGB-D frame and begins by explicitly extracting planar surfaces, and that the subsequent dynamic programming optimization procedure is phrased in terms of the RGB-D measurements as opposed to monocular and stereo cues.

Several schemes have been proposed to address the problem of interpreting point cloud data. Rusu et al. [11] describe an impressive system for parsing range scans acquired from indoor kitchen scenes. Toshev et al. [14] describe a scheme that has been used to automatically parse range scans to produce building models. Anguelov et al. [1] and Lalonde et al. [8] describe schemes for classifying regions in point cloud data sets to identify buildings, trees and other structures.

Most of these schemes were designed to work offline in a batch fashion. In this context one can afford to make several passes over the data to identify nearest neighbors, or to fuse neighboring regions. The goal in this work is to develop a scheme that can ultimately be run in an online fashion so that it can be used to parse the data as it is being acquired. Another salient difference is the fact that this approach seeks to exploit the relationship between the 2D range image and the associated imagery to accelerate the interpretation process.

II. TECHNICAL APPROACH

Fig. 2. Flowchart of overall parsing procedure: Segment RGB Image → Extract Planar Segments → Find Dominant Rectilinear Structure → Identify Candidate Wall Segments → Divide Image into Intervals → Extract Wall Layout.

The overall procedure for parsing the scene based on an RGB-D image is outlined in the flowchart given in Figure 2. The first stage in the pipeline segments the input RGB image into regions based on extracted image edges. The second stage of processing uses the image segmentation as a prior to search for planar surfaces in the scene. The third stage identifies the floor of the scene and estimates the dominant rectilinear orientation. The fourth stage considers the set of extracted planes and identifies segments that could serve as walls. The fifth stage breaks the image up into intervals based on the extracted wall segments. The final stage estimates the layout of the scene by labeling each interval with the index of the underlying wall.

Each of these stages is explained in more detail in the following subsections using the scene shown in Figure 1 as a running example.

A. Image Segmentation

Fig. 3. a. Extracted Intensity Edges b. Image Segmentation

The first step in our analysis is an edge-accurate segmentation scheme that breaks the RGB imagery into coherent, disjoint regions. The image segmentation procedure begins with a standard Canny edge extraction step which finds significant discontinuities in the intensity image as shown in Figure 3a. The detected edgels are then passed to a Delaunay Triangulation procedure which produces a triangular tessellation of the image. The resulting triangular graph is then segmented using an agglomerative merging procedure that repeatedly merges the two regions with the lowest normalized boundary cost. These merging costs are computed by considering the average HSV color in each region. This procedure can be efficiently implemented using a heap data structure, and the entire segmentation procedure can be carried out in 0.1 seconds on a typical image in Matlab.

B. Extracting Planar Surfaces

Fig. 4. a. Extracted Planes b. Result after coplanar segments are merged

The regions derived from the image segmentation are used to suggest groupings of the depth samples from the range imagery. More specifically, the depth samples associated with each of the image regions are passed to a RANSAC routine which is used to recursively divide the point set into planar regions. A key advantage of the proposed approach is that the image segmentation procedure is very effective at suggesting useful groupings, so very few RANSAC iterations are needed to discover the structures of interest. Effectively, the image segmentation serves to focus the computational effort of the procedure on groupings that are likely to yield fruitful interpretations, so the procedure is able to discover relevant groupings quickly even in complex environments with several surfaces.
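The per-region RANSAC plane fit described above can be sketched as follows. This is an illustrative sketch, not the authors' Matlab implementation: the function name, thresholds and iteration count are assumptions, and the input is simply the N x 3 array of 3D samples belonging to one image region.

```python
import numpy as np

def fit_plane_ransac(points, n_iters=50, tol=0.02, min_inliers=100):
    """Fit a single plane n.p = c to an N x 3 point set with RANSAC.

    Because the candidate point sets come from coherent image regions,
    a small number of iterations usually suffices.
    """
    best_inliers = None
    rng = np.random.default_rng(0)
    for _ in range(n_iters):
        sample = points[rng.choice(len(points), 3, replace=False)]
        n = np.cross(sample[1] - sample[0], sample[2] - sample[0])
        norm = np.linalg.norm(n)
        if norm < 1e-9:          # degenerate (collinear) sample
            continue
        n /= norm
        c = n @ sample[0]
        inliers = np.abs(points @ n - c) < tol
        if best_inliers is None or inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    if best_inliers is None or best_inliers.sum() < min_inliers:
        return None
    # Refine with a least-squares fit (SVD) to the inlier set.
    P = points[best_inliers]
    centroid = P.mean(axis=0)
    _, _, vt = np.linalg.svd(P - centroid)
    n = vt[-1]
    return n, n @ centroid, best_inliers
```

The recursive subdivision mentioned in the text would amount to re-running this fit on the outliers of each accepted plane until too few points remain.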
It is important to keep in mind that the depth measurements produced by the Kinect sensor are derived from structured light via triangulation, as opposed to time of flight. As such, the device is best thought of as measuring disparity, which is inversely related to depth. One practical consequence is that the error in the depth estimates increases rapidly as one gets further away from the sensor, which argues against standard approaches to fitting planes to points based on the residual error in 3D.

In the proposed scheme the planar surfaces are fit to the RGB-D measurements by exploiting the observation that planar surfaces in the scene will project to planar regions in the disparity image. This can be seen by taking the standard equation for a plane in the coordinate frame of the sensor:

n_x X + n_y Y + n_z Z = c

dividing through by the scene depth, Z, to obtain:

n_x (X/Z) + n_y (Y/Z) + n_z (1/Z) = c

and noting that u = X/Z and v = Y/Z correspond to the normalized image coordinates, while w = 1/Z denotes the measured disparity at that coordinate. This means that planar regions in the scene can be extracted by fitting affine models to the disparity in each image region.

Figure 4 shows the results of the planar interpretation procedure on the sample scene. Here the different colors correspond to different planar segments that were recovered. These planar segments are then passed to a greedy merging procedure which seeks to group coplanar segments into extended regions as shown in Figure 4b.

C. Finding Dominant Rectilinear Structure

The planar extraction procedure returns a set of segments which can then be analyzed to identify salient structures. The analysis procedure assumes that the vertical axis of the image is roughly aligned with the gravity vector. Given this assumption, the first step in the interpretation procedure is to identify the floor plane by searching for a large planar region near the bottom of the image whose normal is approximately vertical and which appears to underlie most of the other 3D points. The normal to this plane defines the direction of the gravity vector in the RGB-D sensor's frame of reference.

Candidate wall segments are identified by looking for extended planar segments whose normals are perpendicular to this gravity direction. Each candidate wall segment effectively defines an associated rectilinear orientation for the scene where the z-axis corresponds to the gravity direction, the x-axis corresponds to the normal to the wall segment, and the y-axis is simply the cross product of these normal vectors. This rectilinear orientation can be compactly represented with a single rotation matrix R_cw ∈ SO(3), R_cw = [x̂ ŷ ẑ], which captures the orientation of the RGB-D sensor with respect to the Manhattan structure of the scene.

The interpretation system cycles through each of the candidate wall segments and scores the associated rectilinear orientation by determining how many of the other planar surfaces are aligned with one of the cardinal axes. The candidate rotation matrix with the most support is chosen as the dominant rectilinear orientation. The fourth column of Figure 8 depicts the extracted rectilinear structure by showing how various segments are aligned with the dominant axes.

In addition to the walls extracted by the fitting procedure, 4 additional walls are added to form an axis-aligned bounding box that surrounds all of the recovered points. This bounding box serves as a 'backdrop' providing a default interpretation for every point in the scene.

D. Identify Candidate Walls and Wall Segments

Once the dominant rectilinear structure of the scene has been established, the system identifies extracted planar segments that may be walls in the scene. This is accomplished by finding planar structures in the scene whose normals are aligned with either the x or y axis and that have an appropriate horizontal and vertical extent. Note that in practice the walls are often partially occluded by furniture and other clutter, so there is no requirement that the wall segment extend from the floor to the ceiling. The third column of Figure 8 shows the results of the analysis that identifies wall segments. Note that for the purposes of our experiments tall cabinets and office partitions can serve as walls since they have an appropriate extent and serve to delimit the floor plan of the scene.

Once the candidate walls have been identified in the image, the points that make up each wall are further divided into contiguous sections called wall segments. This partitioning accounts for the fact that a wall may stop and start in the scene, as the dominant wall in Figure 1 does. These wall segments represent continuous stretches of wall surface observed in the RGB-D image. Figure 5 makes the distinction between walls, which are modeled as infinite planes, and wall segments, which are thought of as finite sections of wall surface observed in the RGB-D image.

E. Divide Image into Intervals

Figure 6 shows an idealized image of an indoor scene where the vertical axis of the camera is aligned with the gravity direction. In this case vertical lines in the scene will project to vertical lines in the image. In particular, the vertical lines corresponding to corners in the scene, or to points where one wall segment occludes another, would effectively subdivide the horizontal field of view into a sequence of disjoint intervals as shown. Each interval in the image would be associated with a wall. This structure was noted and exploited by Lee, Hebert and Kanade [10] in their work on parsing indoor scenes from single images.

The parsing scheme proposed in this paper proceeds by breaking the image into a sequence of intervals and then associating a wall with each span to produce an interpretation of the scene. The endpoints of the intervals are derived from the extents of the extracted wall segments and from the inferred intersections between all pairs of perpendicular walls. These intersections are depicted by circles in Figure 5. All of these putative endpoints are projected into the horizontal field of view of the sensor and then sorted from right to left to define image intervals as shown in Figure 6. Including more endpoints than needed is not a problem since the subsequent labeling stage can consolidate neighboring intervals as needed. Note that we do not require our input images to be perfectly aligned with gravity as shown in this idealized view, since the rotation matrix R_cw recovered in the rectilinear analysis stage allows us to effectively rectify the input image to account for the tilt and roll of the sensor.

Fig. 5. An overhead view of a floor plan of the scene indicating the walls, which are considered to have infinite extent and are depicted by dashed lines, and the wall segments, which are depicted by the solid line segments. These wall segments correspond to contiguous planar regions in the RGB-D image. The interpretation system also considers all of the possible intersections between perpendicular walls, denoted by circles in the figure, since these may correspond to corners in the scene.

Fig. 6. Idealized figure of a frame showing how the horizontal field of view can be divided into non-overlapping intervals, each of which will be associated with an underlying wall.

Fig. 7. For each pixel in the frame we can compute the index of the wall that best explains the observed disparity and orientation. We can then evaluate how well a particular wall explains a given image interval by projecting the wall into the frame at that interval and determining how many pixels in that region agree with the proposed label. The quadrilateral demarcated on the frame corresponds to the projection of one of the prospective walls into one of the image intervals.

F. Extract Wall Layout

As noted in the previous subsection, the scene interpretation procedure is rephrased as a labeling problem where the goal is to label each interval in the horizontal field of view with an associated wall segment. The extents of the intervals would define the extents of the walls from the point of view of the observer.

This can be viewed as a graph labeling problem where the nodes in the graph are the intervals and neighboring intervals are connected by edges to form a simple 1D chain. We associate a cost with assigning wall labels to each of the intervals and a cost to assigning different wall labels to neighboring segments, and then find a labeling which minimizes the overall cost.

In order to assign costs to each interval, the procedure first determines the optimal wall segment label for each pixel in the frame. This is accomplished by projecting each wall into the RGB-D frame and comparing the disparity and orientation predicted at each pixel with the observed disparity and orientation. The predicted depth or disparity at each pixel can easily be calculated based on the best fit normal to the plane. The system considers each of the extracted walls in turn, and each pixel in the frame retains the index of the wall that comes closest to predicting the observed disparity and surface normal at that point.

Once this initial labeling step has been completed, we can evaluate the cost of assigning a particular image interval to a particular wall candidate by projecting a quadrilateral defined by the interval endpoints and the ceiling and floor heights into the frame as shown in Figure 7. The system considers all of the pixels that lie within that region and measures what fraction of these pixels do not have the wall in question as their first choice. This fraction indicates the cost of assigning the wall label to the interval. The lower the number, the better the match.

This score is also weighted by the apparent size of the interval in the image. More specifically, a normalized interval size is computed by taking the angle subtended by the interval in the image and dividing it by the effective horizontal field of view of the image to produce a number between 0 and 1.
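The 1D-chain labeling that these interval costs feed into can be solved exactly with a Viterbi-style dynamic program. The sketch below is illustrative rather than the authors' implementation: the cost arrays are hypothetical placeholders for the pixel-vote costs and depth-discontinuity transition penalties described in the text.

```python
import numpy as np

def label_intervals(unary, transition):
    """Exactly minimize sum_i f_i(l_i) + e(l_i, l_{i-1}) over a 1D chain.

    unary:      (n_intervals, n_walls) array of interval-to-wall costs.
    transition: (n_walls, n_walls) array of costs for adjacent intervals
                taking labels (k, j); zero on the diagonal.
    Returns the optimal label sequence and its total cost.
    """
    n, w = unary.shape
    D = np.zeros((n, w))                  # D[i, j]: best cost of labeling 0..i ending in j
    back = np.zeros((n, w), dtype=int)    # best predecessor label for backtracking
    D[0] = unary[0]
    for i in range(1, n):
        # tot[k, j]: cost of label k at interval i-1 followed by label j at i
        tot = D[i - 1][:, None] + transition
        back[i] = np.argmin(tot, axis=0)
        D[i] = unary[i] + np.min(tot, axis=0)
    labels = [int(np.argmin(D[-1]))]
    for i in range(n - 1, 0, -1):
        labels.append(int(back[i][labels[-1]]))
    return labels[::-1], float(np.min(D[-1]))
```

With the fixed penalty described in the text, `transition` would simply be a constant (e.g. 0.03) off the diagonal wherever a label change induces a depth discontinuity; the cost of the sweep is O(n_i n_w^2), matching the complexity stated in the paper.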
The product of this normalized interval size and the fractional score described above is used as the final label score for the interval.

Unfortunately, it is typically not sufficient to simply assign each interval the index of the wall with the lowest cost. Missing depth measurements, occlusion and inherent ambiguities serve to introduce spurious labelings. The proposed scheme makes use of a global optimization procedure to enforce smoothness and produce a more coherent interpretation in the face of these uncertainties. The labeling procedure exploits the fact that the nodes in the graph we are labeling form a simple chain, which allows us to find the globally optimal assignment efficiently via Dynamic Programming.

In order to encourage continuity in the labeling process, a cost is associated with each of the edges in the chain. This transition cost is designed to capture the cost associated with assigning adjacent intervals to different wall segments. If the proposed labeling induces a significant change in depth at the junction between the two image intervals then a fixed transition penalty is applied; in our experiments this transition penalty was fixed at 0.03. Note that if two perpendicular walls meet to form a corner there is no depth discontinuity and the associated transition penalty would be zero.

The objective function that we are interested in minimizing takes the following form:

O(l) = \sum_{i=1}^{n_i} ( f_i(l_i) + e_i(l_i, l_{i-1}) )    (1)

where n_i denotes the number of intervals, f_i(l_i) denotes the cost of associating interval i with wall label l_i, and e_i(l_i, l_{i-1}) represents the transition cost associated with assigning interval i the label l_i while interval i-1 is assigned the label l_{i-1}.

The optimization problem can be tackled in stages, where at each stage the system considers a series of optimization problems of the form:

D_{i,j} = min_k [ f_i(j) + e_i(j, k) + D_{i-1,k} ]    (2)

where D_{i,j} represents the cumulative cost of assigning interval i the label j at this stage. The Dynamic Programming procedure systematically considers all of the legal labels that can be assigned to each interval. Note that at each stage this optimization problem uses the results from the previous optimization stage. The computational complexity of the optimization procedure is O(n_i n_w^2), where n_w denotes the number of wall labels. On a typical problem n_i is on the order of 20 while n_w is on the order of 10, so the computational cost of the optimization is usually quite small. The final output of the optimization procedure is a labeling of each of the image intervals. This labeling can be used to construct the 2.5D scene models shown in Figure 8.

The overall strategy of phrasing the interpretation process as an optimal labeling problem that can be solved with Dynamic Programming is similar to the approach proposed by Flint et al. [3, 2]; however, the schemes used to define the interpretation costs are quite different because this approach exploits RGB-D imagery.

III. EXPERIMENTAL RESULTS

In order to evaluate the effectiveness of the proposed algorithm it was applied to a data set of 38 RGB-D images taken from various vantage points in a typical office environment. For each of the test images an interpretation score was manually generated by counting the number of image intervals where the wall assignment suggested by the automatic procedure differed from the human's assessment. A score of zero would indicate that the system had correctly labeled all of the wall surfaces in the scene, while a score of 1 would indicate that one of the image intervals was labeled incorrectly.

On 20 of these images the procedure produced a completely correct interpretation, extracting all wall surfaces even in the presence of significant clutter; on sixteen of the images one of the recovered wall segments was labeled incorrectly. In the vast majority of these cases the erroneous parse covers a small section of the frame and the error is caused by incomplete range data. On two of the frames the procedure failed to produce an intelligible result. The same parameter settings were used on all of the examples.

Figure 8 shows samples of the scenes that the system parsed correctly. Note that the scheme was able to handle situations with significant amounts of clutter such as the third and fifth case. It can also correctly handle cases with relatively complex occlusion structures as in the second example. Note that the system correctly recovers small features like the edges of the doorway in the second example and the structure of the water cooler alcove in the third example. It is also able to deal correctly with the clutter in the fourth and fifth examples.

Figure 9 shows examples of the cases where one of the planes is labeled incorrectly. Note that in all of these cases the errors are fairly subtle and the gross structure of the scene is actually recovered quite well. For example, in the fifth case the person in the foreground is labeled as a wall; in the fourth example the end of the corridor is not parsed correctly because it is beyond the range of the sensor.

The entire segmentation and analysis procedure is implemented in Matlab and it takes approximately 6 seconds to run the complete analysis on a typical RGB-D image on a Macbook Pro laptop. No attempt has been made to optimize the code and we expect that a more efficient implementation would run significantly faster.

All of the code and datasets used in this paper are freely available online at the following website: http://www.cis.upenn.edu/~cjtaylor/RESEARCH/projects/RGBD/RGBD.html.

IV. CONCLUSIONS AND FUTURE WORK

This paper has presented an approach to parsing the floor plan of an indoor scene from a single RGB-D frame by finding a set of candidate walls and delimiting their extent in the image. The problem of parsing the scene is recast as an optimal labeling problem which can be solved efficiently using Dynamic Programming. In this sense, the method exploits the 2.5D structure of the image to simplify the scene interpretation problem.

The analysis provides a compact description of the overall structure of the scene in terms of a floor plane and wall segments. This representation can serve as a basis for further semantic analysis which would identify other structures such as doors, tables, chairs and windows. We note that while the Manhattan structure is a convenient cue, it is not essential to this approach. The same basic scheme could be employed to parse environments where some of the walls do not adhere to the rectilinear model.

Future work will seek to merge parse results from a sequence of RGB-D frames into larger floor plans. Here the Manhattan structure provides a convenient framework for accumulating information about recovered wall segments into a coherent map in an incremental fashion. Such an approach could be used to produce semantic decompositions of extended regions which could be useful in a variety of robotic applications.

ACKNOWLEDGMENTS

This research was partially sponsored by the Army Research Laboratory Cooperative Agreement Number W911NF-10-2-0016 and by the National Science Foundation through an I/UCRC grant on Safety, Security, and Rescue Robotics.

REFERENCES

[1] D. Anguelov, B. Taskar, V. Chatalbashev, D. Koller, D. Gupta, G. Heitz, and A. Ng. Discriminative learning of Markov random fields for segmentation of 3D scan data. In IEEE Conference on Computer Vision and Pattern Recognition, volume 2, pages 169–176, June 2005.
[2] A. Flint, D. Murray, and I. Reid. Manhattan scene understanding using monocular, stereo, and 3D features. In Computer Vision (ICCV), 2011 IEEE International Conference on, pages 2228–2235, November 2011.
[3] Alex Flint, Christopher Mei, David Murray, and Ian Reid. A dynamic programming approach to reconstructing building interiors. In Kostas Daniilidis, Petros Maragos, and Nikos Paragios, editors, Computer Vision – ECCV 2010, volume 6315 of Lecture Notes in Computer Science, pages 394–407. Springer Berlin / Heidelberg, 2010.
[4] Yasutaka Furukawa, Brian Curless, Steven M. Seitz, and Richard Szeliski. Reconstructing building interiors from images. In International Conference on Computer Vision, pages 80–87, Kyoto, October 2009.
[5] Abhinav Gupta, Alexei A. Efros, and Martial Hebert. Blocks world revisited: image understanding using qualitative geometry and mechanics. In Proceedings of the 11th European Conference on Computer Vision: Part IV, ECCV'10, pages 482–496, Berlin, Heidelberg, 2010. Springer-Verlag.
[6] Varsha Hedau, Derek Hoiem, and David Forsyth. Recovering the spatial layout of cluttered rooms. In Proceedings of the IEEE International Conference on Computer Vision (ICCV '09), 2009.
[7] Varsha Hedau, Derek Hoiem, and David Forsyth. Thinking inside the box: using appearance models and context based on room geometry. In Proceedings of the 11th European Conference on Computer Vision: Part VI, ECCV'10, pages 224–237, Berlin, Heidelberg, 2010. Springer-Verlag.
[8] J. F. Lalonde, N. Vandapel, D. F. Huber, and M. Hebert. Natural terrain classification using three-dimensional ladar data for ground robot mobility. Journal of Field Robotics, 23(10):839–861, 2006.
[9] David Changsoo Lee, Abhinav Gupta, Martial Hebert, and Takeo Kanade. Estimating spatial layout of rooms using volumetric reasoning about objects and surfaces. Advances in Neural Information Processing Systems (NIPS), 24, November 2010.
[10] D.C. Lee, M. Hebert, and T. Kanade. Geometric reasoning for single image structure recovery. In Computer Vision and Pattern Recognition, 2009. CVPR 2009. IEEE Conference on, pages 2136–2143, June 2009.
[11] Radu Bogdan Rusu, Zoltan Csaba Marton, Nico Blodow, Mihai Dolha, and Michael Beetz. Towards 3D point cloud based object maps for household environments. Robotics and Autonomous Systems Journal (Special Issue on Semantic Knowledge), 2008.
[12] Ashutosh Saxena, Min Sun, and Andrew Y. Ng. Make3D: learning 3D scene structure from a single still image. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31:824–840, 2009.
[13] N. Silberman and R. Fergus. Indoor scene segmentation using a structured light sensor. In Proceedings of the International Conference on Computer Vision – Workshop on 3D Representation and Recognition, 2011.
[14] A. Toshev, P. Mordohai, and B. Taskar. Detecting and parsing architecture at city scale from range data. In IEEE Conference on Computer Vision and Pattern Recognition, pages 398–405, 2010.
Fig. 8. Figure showing some successful scene parsing results. The second column shows the planar segments extracted from the scene; different colors are used to label points belonging to different planes. The third column shows the points belonging to candidate wall segments; the fourth column shows the results of the analysis that finds the dominant rectilinear structure. The final column shows the results of parsing the image into wall segments; each pixel is assigned a label corresponding to the underlying wall.
Fig. 9. Figure showing cases where one of the recovered planes has an error. The second column shows the planar segments extracted from the scene; different colors are used to label points belonging to different planes. The third column shows the points belonging to candidate wall segments; the fourth column shows the results of the analysis that finds the dominant rectilinear structure. The final column shows the results of parsing the image into wall segments; each pixel is assigned a label corresponding to the underlying wall.
Toward Information Theoretic Human-Robot Dialog
Stefanie Tellex1†, Pratiksha Thaker1†, Robin Deits‡, Dimitar Simeonov†, Thomas Kollar§, Nicholas Roy†
† MIT Computer Science and Artificial Intelligence Laboratory, Cambridge, MA
Email: {stefie10, prthaker, mitko, nickroy}@csail.mit.edu
‡ Battelle Memorial Institute, Columbus, Ohio
Email: [email protected]
§ Carnegie Mellon University, Pittsburgh, PA
Email: [email protected]

Abstract—Our goal is to build robots that can robustly interact with humans using natural language. This problem is challenging because human language is filled with ambiguity, and furthermore, due to limitations in sensing, the robot's perception of its environment might be much more limited than that of its human partner. To enable a robot to recover from a failure to understand a natural language utterance, this paper describes an information-theoretic strategy for asking targeted clarifying questions and using information from the answer to disambiguate the language. To identify good questions, we derive an estimate of the robot's uncertainty about the mapping between specific phrases in the language and aspects of the external world. This metric enables the robot to ask a targeted question about the parts of the language for which it is most uncertain. After receiving an answer, the robot fuses information from the command, the question, and the answer in a joint probabilistic graphical model in the G3 framework. When using answers to questions, we show the robot is able to infer mappings between parts of the language and concrete object groundings in the external world with higher accuracy than by using information from the command alone. Furthermore, we demonstrate that by effectively selecting which questions to ask, the robot is able to achieve significant performance gains while asking many fewer questions than baseline metrics.

(b) Example commands:
Move the pallet from the truck.
Remove the pallet from the back of the truck.
Offload the metal crate from the truck.

Fig. 1: Sample natural language commands collected from untrained users, commanding the forklift to pick up a pallet in (a).
I. I NTRODUCTION in (a).
Our aim is to make robots that can naturally and flexibly
interact with a human partner via natural language. An espe-
cially challenging aspect of natural language communication is language. The robot first identifies the most ambiguous parts
the use of ambiguous referring expressions that do not map to of a command, then asks a targeted question to try to reduce
a unique object in the external world. For instance, Figure 1 its uncertainty about which aspects of the external world
shows a robotic forklift in a real-world environment paired correspond to the language. For example, when faced with
with instructions created by untrained users to manipulate a command such as “Pick up the pallet on the truck” in the
one of the objects in the scene. These instructions contain situation shown in Figure 1, the robot can infer that because
ambiguous phrases such as “the pallet” which could refer there is only one truck in the scene, but two pallets, the
equally well to multiple objects in the environment. Even phrase “the pallets” is the most ambiguous and ask a question
if the person gives a command that would be unambiguous like, “What do you mean by ‘the pallet’?” Then it can use
to another person, they might refer to aspects of the world information from the answer to disambiguate which object is
that are not directly accessible to the robot’s perceptions. For being referenced in order to infer better actions in response to
example, one of the commands in Figure 1 refers to “the the natural language command.
metal crate.” If a robot does not have access to perceptual Previous approaches to robotic question-asking do not
features corresponding to the words “metal” or “crate,” it directly map between natural language and perceptually-
cannot disambiguate which object is being referenced. grounded aspects of the external world or incorporate addi-
In this paper, we present an approach for enabling robots to tional information from free-form natural language answers in
recover from failures like these by asking a clarifying question, order to disambiguate the command. [3, 11, 1]. As a result, the
the same strategy that humans use when faced with ambiguous robot cannot take advantage of its model of the environment
to determine the most ambiguous parts of an arbitrary natural
1 The first two authors contributed equally to this paper. language command and identify a question to ask.
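The selection strategy just described, asking about the phrase whose grounding is most uncertain, can be illustrated with a small entropy computation. The sketch below is illustrative only (the scores and names are hypothetical, and the real system derives its distributions as described in Section III): each phrase gets a normalized distribution over candidate groundings, and the robot asks about the phrase whose distribution has the highest Shannon entropy.

```python
import math

def entropy(probs):
    """Shannon entropy (in bits) of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def most_ambiguous_phrase(candidate_scores):
    """Pick the phrase whose normalized grounding distribution has the
    highest entropy -- the best target for a clarifying question."""
    def phrase_entropy(scores):
        total = sum(scores.values())
        return entropy([s / total for s in scores.values()])
    return max(candidate_scores,
               key=lambda p: phrase_entropy(candidate_scores[p]))

# Hypothetical scores for the scene of Figure 1: two equally plausible
# pallets, but only one truck.
scores = {
    "the pallet": {"pallet_1": 0.5, "pallet_2": 0.5},
    "the truck": {"truck_1": 1.0},
}
print(most_ambiguous_phrase(scores))  # the pallet
```

Here "the pallet" splits its probability mass across two objects (entropy of 1 bit), while "the truck" grounds deterministically (entropy of 0 bits), so the question targets "the pallet."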
In order to derive an expression for the robot's uncertainty about groundings in the external world, our approach builds on the Generalized Grounding Graph (G3) framework [18, 17]. The G3 framework defines a probabilistic model that maps between parts of the language and groundings in the external world, which can be objects, places, paths, or events. The model factors according to the linguistic structure of the natural language input, enabling efficient training from a parallel corpus of language paired with corresponding groundings. However, this factorization requires introducing a new correspondence variable, which leads to difficulties when estimating the entropy over grounding variables. In this paper we derive a metric based on entropy using the G3 framework. The robot uses this metric to identify the most uncertain random variables in the model in order to select a question to ask. Once the robot has asked a question, we show that it can exploit information from an answer produced by an untrained user by merging variables in the grounding graph based on linguistic coreference. By performing inference in the merged model, the robot infers the best set of groundings corresponding to the command, the question, and the answer.

We evaluate the system by collecting answers to questions created by the robot using crowdsourcing. We demonstrate that the system is able to incorporate information from the answers in order to more accurately ground concrete noun phrases in the language to objects in the external world. Furthermore, we show that our metric for identifying uncertain variables to ask questions about significantly reduces the number of questions the robot needs to ask in order to resolve its uncertainty. This work expands on previous work presented in [14].

II. BACKGROUND

We briefly review grounding graphs, which were introduced by [18], giving special attention to the motivation for the correspondence variable, Φ. The correspondence variable makes it possible to efficiently train the model using local normalization at each factor, but complicates the calculation of entropy described in Section III-A.

In order for a robot to understand natural language, it must be able to map between words in the language and corresponding groundings in the external world. The aim is to find the most probable groundings γ1 . . . γN given the language Λ and the robot's model of the environment m:

argmax_{γ1...γN} p(γ1 . . . γN | Λ, m)    (1)

For brevity, we omit m from future equations. Groundings are the specific physical concepts that are meant by the language and can be objects (e.g., a truck or a door), places (e.g., a particular location in the world), paths (e.g., a trajectory through the environment), or events (e.g., a sequence of robot actions).

To learn this distribution, one standard approach is to factor it based on certain independence assumptions, then train models for each factor. Natural language has a well-known compositional, hierarchical argument structure [6], and a promising approach is to exploit this structure in order to factor the model. However, if we define a directed model over these variables, we must assume a possibly arbitrary order for the conditional γi factors. For example, for a phrase such as "the tire pallet near the other skid," we could factorize in either of the following ways:

p(γtires, γskid | Λ) = p(γskid | γtires, Λ) × p(γtires | Λ)    (2)
p(γtires, γskid | Λ) = p(γtires | γskid, Λ) × p(γskid | Λ)    (3)

Depending on the order of factorization, we will need different conditional probability tables that correspond to the meanings of words in the language. To resolve this issue, another approach is to use Bayes' rule to estimate p(Λ | γ1 . . . γN), but this distribution would require normalizing over all possible words in the language Λ. Another alternative is to use an undirected model, but this would require normalizing over all possible values of all γi variables in the model.

To address these problems, the G3 framework introduces a correspondence vector Φ to capture the dependency between γ1 . . . γN and Λ. Each entry φi ∈ Φ corresponds to whether linguistic constituent λi ∈ Λ corresponds to grounding γi. We assume that γ1 . . . γN are independent of Λ unless Φ is known. Introducing Φ enables factorization according to the structure of language with local normalization at each factor over a space of just the two possible values for φi.

1) Inference: In order to use the G3 framework for inference, we want to infer the groundings γ1 . . . γN that maximize the distribution

argmax_{γ1...γN} p(γ1 . . . γN | Φ, Λ),    (4)

which is equivalent to maximizing the joint distribution of all groundings γ1 . . . γN, Φ, and Λ,

argmax_{γ1...γN} p(γ1 . . . γN, Φ, Λ).    (5)

We assume that Λ and γ1 . . . γN are independent when Φ is not known, yielding:

argmax_{γ1...γN} p(Φ | Λ, γ1 . . . γN) p(Λ) p(γ1 . . . γN)    (6)

This independence assumption may seem unintuitive, but it is justified because the correspondence variable Φ breaks the dependency between Λ and γ1 . . . γN. If we do not know whether γ1 . . . γN correspond to Λ, we assume that the language does not tell us anything about the groundings.

Finally, we assume a constant prior on γ1 . . . γN and ignore p(Λ) since it does not depend on γ1 . . . γN, leading to:

argmax_{γ1...γN} p(Φ | Λ, γ1 . . . γN)    (7)

The G3 framework described by [18] trains the model in a discriminative fashion. However, it is not a conventional conditional random field in that the correspondence vector Φ = True is observed, and the conditioning variables γi are hidden, preventing the use of standard inference techniques. To compute the maximum value of the objective in Equation 7, the system performs beam search over γ1 . . . γN, computing
[Fig. 2: Grounding graphs for a three-turn dialog (command "Pick up the pallet." / question "Which one?" / answer "The one near the truck."), showing language variables λ, grounding variables γ, and correspondence variables φ, before and after merging based on coreference. The robot merges the three shaded variables.
(a) Unmerged grounding graphs for the three dialog acts. The noun phrases "the pallet," "one" and "the one near the truck" refer to the same grounding in the external world, but initially have separate variables (γ2, γ3, γ5) in the grounding graphs.
(b) The grounding graph after merging γ2, γ3 and γ5 based on linguistic coreference.]

the probability of each assignment from Equation 7 to find the maximum probability assignment. Although we are using p(Φ | Λ, γ1 . . . γN) as the objective function, Φ is fixed, and the γ1 . . . γN are unknown. This approach is valid because, given our independence assumptions, p(Φ | Λ, γ1 . . . γN) corresponds to the joint distribution over all the variables given in Equation 5.

In order to perform beam search, we factor the model according to the hierarchical, compositional linguistic structure of the command:

p(Φ | Λ, γ1 . . . γN) = ∏_i p(φi | λi, γi1 . . . γik)

This factorization can be represented graphically; we call such a graphical model the grounding graph for a natural language command. We can draw it as a factor graph, but in this paper we represent it as a directed graphical model to emphasize the role of the correspondence variable and independence assumptions in the factorization. The directed model for the command "Pick up the pallet" appears in Figure 2. The λ variables correspond to language; the γ variables correspond to groundings in the external world, and the φ variables are True if the groundings correspond to the language, and False otherwise.

2) Training: We learn model parameters from a corpus of labeled examples. The G3 framework assumes a log-linear parametrization with feature functions fj and feature weights μj:

p(Φ | Λ, γ1 . . . γN) = ∏_i (1/Z) exp( ∑_j μj fj(φi, λi, γi1 . . . γik) )    (8)

Features map between words in the language and corresponding groundings in the external world. For example, features include object class, whether one grounded object is physically supported by another grounded object, or whether the robot is approaching or moving away from a landmark object. The training set consists of an aligned, parallel corpus of language paired with positive and negative examples of groundings in the external world. The alignment annotations consist of a mapping between each natural language constituent and a corresponding grounding in the external world. We use the corpus and alignment annotations described in [18].

III. TECHNICAL APPROACH

When faced with a command, the system extracts grounding graphs from the natural language input and performs inference to find the most likely set of values for the grounding variables
γ1 . . . γN. Next, it identifies the most uncertain grounding variable γj and asks a question about that variable, as described in Section III-A. After receiving an answer from a human partner, the robot merges grounding graphs from the command, question, and answer into a single graphical model, described in Section III-B. Finally, it performs inference in the merged graph to find a new set of groundings that incorporates information from the answer as well as information from the original command. Figure 3 shows dataflow in the system. Grayed-out blocks in the figure show pre-existing components, not novel to this paper. Black blocks show the question-asking feedback system that is the contribution of this work.

[Fig. 3: System diagram. Grayed-out blocks show pre-existing components; black parts show the question-asking feedback system new to this paper.]

A. Generating a Question

Our approach to asking questions is to first identify grounding variables whose values are most uncertain, then generate a question to try to disambiguate the value of that variable. The system iteratively asks questions about the most uncertain grounding variable γj until it is sufficiently confident about having inferred the correct groundings.

One intuitive estimate for the uncertainty of a grounding variable γj is to look at the probability of the correspondence variable φk for each factor it participates in:

argmin_{γj} ∏_{k ∈ factors(γj)} p(φk | γ1 . . . γN, Λ)    (9)

If the system was unable to find a high-probability grounding for a variable γj, then it could ask a question to collect more information. We refer to this approach as Metric 1. (We use confidence instead of the entropy H_{p(Φ|γ1...γN,Λ)}(Φ), since that entropy is low when p(φk | γ1 . . . γN, Λ) is either very high or very low.)

However, this metric will not perform well if there are several objects in the external environment that could correspond equally well to the words in the language. As an example, a vague expression such as "the pallet" would have high confidence for any pallet that was grounded to it. But if there were many pallets in the environment, the robot might be very uncertain about which one was meant.

A more principled approach is to formally derive an expression for the entropy of the distribution over grounding variables, then ask a question about the variable with maximum entropy. We begin by defining a family of marginal distributions for each grounding variable γj conditioned on Φ and Λ:

p(γj | Φ, Λ)    (10)

To find the most uncertain grounding variable γj, we find the distribution in this family with the highest entropy:

argmax_j H_{p(γj|Φ,Λ)}(γj)    (11)

The system can collect more information from the human partner in order to disambiguate the command by asking a question about the most uncertain variable. For example, if a command like "bring the pallet on the truck to receiving" were issued in a context with two trucks and one pallet, the entropy would be higher for the phrase "the truck" and the system could ask a question such as "Which truck?" On the other hand, if there were two pallets and one truck, the entropy would be higher for the phrase "the pallet" and the system would ask a question such as "Which pallet?"

We can expand the entropy function as:

H_{p(γj|Φ,Λ)}(γj) = − ∑_{γj} p(γj | Φ, Λ) log p(γj | Φ, Λ)    (12)

Unfortunately, p(γj | Φ, Λ) cannot be directly computed in the G3 framework, because we only have p(Φ | Λ, γ1 . . . γN), which the system maximizes with respect to the unknown grounding variables γ1 . . . γN. Instead, we rewrite it as a marginalization over the joint:

p(γj | Φ, Λ) = ∑_{γ1...γj−1, γj+1...γN} p(γ1 . . . γN | Φ, Λ)    (13)

We use Bayes' rule to rewrite it:

p(γj | Φ, Λ) = ∑_{γ1...γj−1, γj+1...γN} p(Φ | γ1 . . . γN, Λ) p(γ1 . . . γN | Λ) / p(Φ | Λ)    (14)

Next, we assume that γ1 . . . γN are independent of Λ when we do not know Φ, as we did in Equation 6, yielding:

p(γj | Φ, Λ) = ∑_{γ1...γj−1, γj+1...γN} p(Φ | γ1 . . . γN, Λ) p(γ1 . . . γN) / p(Φ | Λ)    (15)

Finally, we assume a constant prior p(γ1 . . . γN) = C and define a constant K = C / p(Φ | Λ):

p(γj | Φ, Λ) = K ∑_{γ1...γj−1, γj+1...γN} p(Φ | γ1 . . . γN, Λ)    (16)

We can efficiently approximate this summation based on the results of the inference process. After running inference, the system saves all M sets of values that it considered for grounding variables in the model; we call each sample sm. Each sample consists of bindings for grounding variable γj
in the grounding graph, and we denote the value of variable γj in sm as sm[γj]. When performing inference using beam search, we set the beam width to be large so that the samples are diverse enough to enable an accurate estimate of the entropy. We approximate Equation 16 with:

p̂(γj = g | Φ, Λ) = c(γj = g, Φ, Λ) / ∑_x c(γj = x, Φ, Λ)    (17)

where c is

c(γj = g, Φ, Λ) = K ∑_{ sm | sm[γj] = g } p(Φ | sm[γ1] . . . sm[γN], Λ) × p(sm[γ1] . . . sm[γN])    (18)

[Fig. 4: Precision vs. recall at predicting whether an inferred grounding is incorrect.]

We can substitute this equation into Equation 12 to obtain an estimate for the entropy. We refer to this approximation as Metric 2.

After identifying a variable to ask about, the robot asks a question using a template-based algorithm. It finds the text associated with the grounding variable and generates a question of the form "What do the words X refer to?" Once a question has been generated, the system asks it and collects an answer from the human partner. In general, answers could take many forms. For example, Figure 5 shows commands, questions generated using the template-based algorithm, along with corresponding answers collected from untrained users.

B. Merging Graphs

Once a question has been chosen and an answer obtained, the robot incorporates information from the answer into its inference process. It begins by computing separate grounding graphs for the command, the question, and the answer according to the parse structure of the language. Next, variables in separate grounding graphs are merged based on linguistic coreference. Finally, the system performs inference in the merged graph to incorporate information from the command, question, and answer.

Resolving linguistic coreferences involves identifying linguistic constituents that refer to the same entity in the external world. For example, in the command, "Pick up the tire pallet and put it on the truck," the noun phrases "the tire pallet" and "it" refer to the same physical object in the external world, or corefer. Coreference resolution is a well-studied problem in computational linguistics [7]. Although there are several existing software packages to address this problem, most are developed for large corpora of newspaper articles and generalize poorly to the language in our corpus. Instead, we created a coreference system which is trained on language from our corpus. Following typical approaches to coreference resolution [16], our system consists of a classifier that predicts coreference between all pairs of noun phrases in the language, combined with a clustering algorithm that enforces transitivity and finds antecedents for all pronouns. For the pairwise classifier we used a log-linear model with bag-of-words features. The model is trained using an annotated corpus of positive and negative pairs of coreferences. We set the classification threshold of the model to 0.5 so that it chooses the result with the most probability mass. Once coreferring variables have been identified, a merging algorithm creates a single unified grounding graph. The coreference resolution algorithm identifies pairs of γ variables in the grounding graph that corefer; the merging algorithm combines all pairs of coreferring variables. Figure 2 shows a merged graph created from a command, a question, and an answer.

C. Deciding When to Ask a Question

Section III-A described how to generate a question in response to a natural language command. However, at a higher level, the robot needs to decide whether to ask a question or take an action. For example, if the robot is very confident about all grounding variables, it would be better to ask no questions at all. If it is uncertain about just one variable, a single question might suffice to disambiguate the command. Or it might ask a question, get an answer that it cannot understand, then choose to ask an additional question about the same grounding variable to receive further clarification. Our approach to this problem is to ask questions until the entropy of the most uncertain variable is below a certain threshold. To avoid going into an infinite loop, we prohibit the robot from asking a question about the same variable more than two times.

IV. RESULTS

To evaluate the system, we use a corpus of 21 manually created commands given to a simulated robotic forklift. The commands were designed to be deliberately ambiguous in order to provide an opportunity for clarifying questions and answers. In Section IV-A, we assess the performance of Metric 1 (Confidence) and Metric 2 (Entropy) at identifying incorrect grounding variables after inference has been performed for commands in the dataset. Second, we assess the end-to-end performance of the question-asking framework at increasing the number of correctly grounded concrete noun phrases.

A. Predicting Incorrect Examples

To directly assess the performance of the metrics defined in Section III-A, we measure their performance at identifying
grounding variables in the corpus that are incorrect. Figure 4 shows the effect of this process on the commands from the corpus as a precision vs. recall curve. Here, a true positive is an incorrect grounding variable that was predicted to be incorrect; a false positive is a correct grounding variable that was predicted to be incorrect. Metric 1 (Confidence) performs quite poorly, even having positive slope. This is not due to a bug; instead, there is a group of low-confidence grounding variables which are in fact correct. For example, for the command "Lift it from the truck," the factor for "the truck" has high confidence, but "from the truck" has lower confidence. Metric 1 multiplies the two values together, yielding an overall low estimate for confidence. Metric 2, in contrast, gives this variable much lower entropy compared to other commands, since there is little uncertainty about the grounding for "from the truck."

B. Asking Questions

Next, we assess the performance of the system at using answers to questions to disambiguate ambiguous phrases in the corpus. To collect a corpus of questions and answers, we first generated questions for each concrete noun phrase in the corpus, then collected answers to those questions using crowdsourcing. For example, for a command like "Take the pallet and place it on the trailer to the left," the question-generation algorithm could ask about "the pallet," "it," or "the trailer to the left." By collecting answers for all concrete noun phrases in the dataset in advance, we can compare different question-selection strategies offline, without collecting new data.

To collect an answer to a question, we showed annotators a natural language command directing the robot to perform an action in the environment, such as "Pick up the pallet," paired with a question such as "What do you mean by 'the pallet'?" In addition, annotators saw a video of the simulated robot performing the action sequence, such as picking up a specific tire pallet in the environment. We instructed them to provide an answer to the question in their own words, assuming that what they saw happening in the video represented the intended meaning of the command. We collected two answers from different annotators for each question. Example commands, questions, and answers from the corpus appear in Figure 5.

To measure the performance of the system, we report the fraction of correctly grounded concrete noun phrases in the original command, not including the question and answer. A concrete noun phrase is one which refers to a specific single object in the external world, such as "the skid of tires." An example of a non-concrete noun phrase is "your far left-hand side." A noun phrase such as "the skid of tires" is considered to be correct if the inference maps it to the same tire pallet that the human user referenced. It is considered to be incorrect if the inference maps it to some other object, such as a trailer.

We evaluate our system in several different conditions, using both automatic coreference resolution and oracle coreference resolution. As baselines, we present the performance using only information from the commands, without asking any questions, as well as performance when asking a question about each concrete noun phrase. The baseline results in Table I show that the system realizes a large improvement in performance when using information from commands, questions, and answers compared to information from the commands alone.

Next, we assess the performance of the two metrics at selecting questions to ask. We report performance for three conditions: selecting just one question to ask about a command, selecting two questions, and selecting questions until uncertainty is below a specified, hand-tuned threshold. We also compare to selecting a question at random. The system may ask up to two questions about each concrete noun phrase. When asking about a noun phrase for a second time, it generates the same question but receives a different answer. The system could ask a total of 76 possible questions over the 21 commands in the corpus; for each approach we also report the fraction of questions from this space that were actually asked. Table I shows the performance of the system in these conditions.

When asking one question and using automatic coreference resolution, Metric 2 (Entropy) slightly outperforms Metric 1 (Confidence), but does no better than randomly choosing a question to ask. When asking a second question, Metric 2 achieves better accuracy than Metric 1 or random question selection, nearly matching the 71% correct achieved by asking all possible questions. This is despite the fact that it only asks 55% of the available questions. These results show clear improvement from the additional information in the questions and answers, and some advantage from Metric 2 over Metric 1 and random selection of questions. Note that when asking a second question, the system may choose to ask again about the previously encountered phrase; it may also choose to ask about another phrase. This mechanism means that if the answer to the first question was ambiguous, the system can recover by asking again to collect more information.

In order to focus on the results of the question-selection process, we repeat the analysis of the two metrics using oracle coreference, which determines linguistic coreference directly from the mapping between constituents in the language and groundings in the real world. This eliminates automatic coreference as a source of error in these results. We see a significant improvement using oracle coreference compared to automatic coreference; this is due to errors in which the automatic resolver merges variables that do not actually refer to the same object.

When using oracle coreference, asking all possible questions shows a dramatic improvement over asking no questions (92% from 58%). With one and two questions being asked, Metric 2 (Entropy) again outperforms Metric 1 (Confidence) and random selection. Notably, question selection with Metric 2 (Entropy) is able to achieve the same 92% accuracy as asking all questions, despite only asking 55% of the questions available to it. Improved coreference resolution could be achieved by training on a larger corpus of examples, as well as by adding additional features to the coreference resolver, especially including information from groundings.
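The Metric 2 scores evaluated above can be computed from the saved beam-search samples, following Equations 17 and 18. The sketch below is illustrative (the function name, variable encoding, and toy beam output are assumptions of ours, not the system's implementation); note that the constant K of Equation 16 cancels in the normalization of Equation 17, so it never needs to be computed.

```python
import math
from collections import defaultdict

def metric2_entropy(samples, j):
    """Entropy of the sample-based estimate of p(gamma_j | Phi, Lambda)
    (Equations 12, 17, 18).  Each sample is a (bindings, weight) pair:
    `bindings` maps variable indices to groundings, and `weight` is
    p(Phi | s[gamma_1..N], Lambda) for that sample."""
    mass = defaultdict(float)
    for bindings, weight in samples:
        mass[bindings[j]] += weight       # unnormalized c(gamma_j = g)
    total = sum(mass.values())
    probs = [m / total for m in mass.values()]
    return 0.0 - sum(p * math.log2(p) for p in probs if p > 0)

# Hypothetical beam output for a command mentioning one pallet phrase
# (variable 0) and one truck phrase (variable 1), with two plausible
# pallets and a single truck in the scene:
beam = [
    ({0: "pallet_1", 1: "truck_1"}, 0.4),
    ({0: "pallet_2", 1: "truck_1"}, 0.4),
    ({0: "pallet_1", 1: "truck_1"}, 0.2),
]
print(round(metric2_entropy(beam, 0), 2))  # 0.97 -- "the pallet" is ambiguous
print(metric2_entropy(beam, 1))            # 0.0  -- "the truck" is not
```

With a wide beam, the surviving samples spread their weight over both pallets but agree on the truck, so the pallet variable gets the higher entropy and becomes the question target.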
[Fig. 5: Sample commands, questions, and answers from the corpus.

Command: Move your pallet further right.
Question: What do the words your pallet refer to?
Answer: Your pallet refers to the pallet you are currently carrying.

Command: Move closer to it.
Question: What does the word it refer to?
Answer: It refers to the empty truck trailer.

Command: Take the pallet and place it on the one to the left.
Question: What do the words the one refer to?
Answer: The one refers to the empty trailer.

Command: Place the pallet just to the right of the other pallet.
Question: What do the words the pallet refer to?
Answer: The wooden crate that the merchandise sits on top of.]

TABLE I: Performance at Grounding Concrete Noun Phrases

                          % Correct,    % Correct,    % Questions
                          automatic     oracle        Asked
                          coreference   coreference
Baselines
  No Questions            47%           58%           0%
  All Questions           71%           92%           100%
Asking One Question
  Random                  55%           74%           28%
  Metric 1 (Confidence)   53%           74%           28%
  Metric 2 (Entropy)      55%           79%           28%
Asking Two Questions
  Random                  61%           82%           55%
  Metric 1 (Confidence)   63%           82%           55%
  Metric 2 (Entropy)      68%           92%           55%

Next, we assess the system's performance using the algorithm described in Section III-C in order to decide when to ask a question, in addition to deciding what question to ask. Table II shows the performance of Metric 1 and Metric 2. Metric 2 significantly outperforms Metric 1 while asking only a few more questions. Although it asks less than half of the possible questions, it approaches the performance of the system which asks all possible questions.

As an example of the system's operation, for a command such as "Move the pallet over to it," the entropy according to Metric 2 is 1.82 for the phrase "it" and 1.00 for "the pallet." The system identifies "it" as the most uncertain variable, then asks "What does the word 'it' refer to?" The answer was "It refers to the empty trailer to the left of the two pallets." After incorporating the answer into the model and performing inference, the system correctly grounds the phrase "it" to the trailer and computes a new estimate for the entropy. The entropy for "it" has now been reduced, and the robot next asks "What do the words 'the pallet' refer to?" and receives the answer "The pallet is the piece of wood with the orange and grey boxes that is directly in front of the forklift." After this process, the robot has inferred correct values for all grounding variables in the grounding graph for this command.

Finally, we assess the system's performance at producing better action sequences using answers to questions. For each top-level clause in the corpus, we generated a sequence of robot actions and manually assessed whether those actions matched the actions in the original video. When using information from commands only, the system correctly executes 37% of the top-level actions. In many cases, for a command such as "pick up the pallet" it does pick up a pallet, but a different one from the original video. In contrast, after incorporating information from the answers to questions, it correctly executes 66% of the commands.

Our framework provides first steps toward an information-theoretic approach for enabling the robot to ask questions about objects in the environment. Failures occurred for a number of reasons. Sometimes, the answer obtained from the human user was not useful. For example, one user answered "What do the words the pallet refer to?" with a definition of a pallet, "The wooden crate that the merchandise sits on top of," rather than specifying which pallet was being referenced (e.g., something like "the pallet with tires"). Other failures occurred in more complex environments because the robot failed to understand the disambiguating answer, as in "the object on the far left," when the system did not have a good model of left versus right. Our entropy-based approach is limited to types of questions that target specific phrases in the answer. More general algorithms are needed to handle a wider array of dialog strategies, such as asking yes/no questions, such as "Do you mean this one?" or more open-ended questions such as "Now what?"

V. RELATED WORK

Many have created systems that exploit the compositional structure of language in order to follow natural language commands [21, 9, 4]. Previous probabilistic approaches have used generative and discriminative models for understanding route instructions but did not make interactive systems that can use dialog to resolve ambiguity [10, 19, 8, 18, 13]. Our work instead focuses on using an induced probabilistic model over natural language commands and groundings in order to incorporate information from questions and answers.

Others have created robotic systems that interact using dialog [5, 15, 2]. Bauer et al. [1] built a robot that can find its way through an urban environment by interacting with pedestrians using a touch screen and gesture recognition system. Our approach differs in that it focuses on simple three-turn dialogs but is able to understand language from untrained users rather than a predefined, fixed vocabulary or fixed types of answers. Furthermore, it chooses targeted questions based on the robot's model of the external world.

Existing work in dialog systems [3, 12, 20, 22] uses MDP and POMDP models with a fixed, predefined state space to represent the user's intentions. In contrast, the space of possible groundings is defined by the objects and actions available to the robot's perception; the user's intentions are defined by the grounding graph and vary according to the structure of the language.
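Table II below evaluates the decision rule of Section III-C. As a minimal sketch of that rule (the function names, stubbed components, and threshold value are illustrative assumptions, and we read the "no more than two questions per variable" constraint as stopping once the most uncertain variable has been queried twice), the loop amounts to:

```python
def dialog_loop(ground, entropy_of, ask, threshold=0.5, max_repeats=2):
    """Stopping rule of Section III-C (sketch): ask about the
    highest-entropy grounding variable until its entropy falls below
    `threshold`, querying any one variable at most `max_repeats` times.
    `ground` runs inference given the dialog history, `entropy_of`
    scores each variable, and `ask` returns the user's answer."""
    history = []   # (variable, answer) pairs merged into the graph
    asked = {}     # number of questions asked so far, per variable
    while True:
        groundings = ground(history)          # inference on merged graph
        var, h = max(entropy_of(groundings).items(), key=lambda kv: kv[1])
        if h < threshold or asked.get(var, 0) >= max_repeats:
            return groundings
        asked[var] = asked.get(var, 0) + 1
        history.append((var, ask(var)))

# Toy run mirroring the example in the text: "it" starts out most
# uncertain, and each answer lowers its entropy.
answers = iter(["the empty trailer", "the tire pallet"])
h_it = [1.82, 0.9, 0.1]   # hypothetical entropy of "it" after 0, 1, 2 answers
result = dialog_loop(
    ground=lambda hist: {"n_answers": len(hist)},
    entropy_of=lambda g: {"it": h_it[g["n_answers"]], "the pallet": 0.2},
    ask=lambda var: next(answers),
)
print(result)  # {'n_answers': 2}
```

In this toy run the loop asks twice about "it" and then stops, since its entropy (0.1) has dropped below the threshold.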
TABLE II: Performance Deciding When to Ask a Question

                          % Correct, oracle coreference   % Questions Asked
  Metric 1 (Confidence)   71%                             38%
  Metric 2 (Entropy)      87%                             40%

VI. CONCLUSION

In this paper, we presented results for a robot dialog understanding system based on a probabilistic graphical model that factors according to the structure of language. This system is able to ask the human user targeted questions about parts of a command that it failed to understand and incorporate information from an open-ended space of answers into its model, iteratively improving its confidence and accuracy.

Our next steps are to scale the algorithm to more complex dialogs, expanding the repertoire of questions and answers that the system can understand. Integrating nonverbal backchannels and gesture into the framework as new types of factors in the grounding graph remains an open problem. Finally, we aim to extend the framework to support active learning, enabling the robot to learn new word meanings based on answers it has received to questions.

VII. ACKNOWLEDGEMENTS

This work was sponsored by the U.S. Army Research Laboratory under the Robotics Collaborative Technology Alliance, by the Office of Naval Research under MURI N00014-07-1-0749, and by Battelle under Robert Carnes and Scott Sheaf. Their support is gratefully acknowledged. We would like to thank Javier Velez, Ashis Banerjee, and Josh Joseph for helpful discussions about this work.

REFERENCES

[1] Andrea Bauer, Klaas Klasing, Georgios Lidoris, Quirin Mühlbauer, Florian Rohrmüller, Stefan Sosnowski, Tingting Xu, Kolja Kühnlenz, Dirk Wollherr, and Martin Buss. The Autonomous City Explorer: Towards natural human-robot interaction in urban environments. International Journal of Social Robotics, 1(2):127–140, April 2009.
[2] Rehj Cantrell, Matthias Scheutz, Paul Schermerhorn, and Xuan Wu. Robust spoken instruction understanding for HRI. In Proceedings of the 5th ACM/IEEE International Conference on Human-Robot Interaction, pages 275–282, 2010.
[3] F. Doshi and N. Roy. Spoken language interaction with model uncertainty: An adaptive human-robot interaction system. Connection Science, 20(4):299–319, 2008.
[4] J. Dzifcak, M. Scheutz, C. Baral, and P. Schermerhorn. What to do and how to do it: Translating natural language directives into temporal and dynamic logic representation for goal management and action execution. In Proc. IEEE Int'l Conf. on Robotics and Automation (ICRA), pages 4163–4168, 2009.
[5] Kai-yuh Hsiao, Stefanie Tellex, Soroush Vosoughi, Rony Kubat, and Deb Roy. Object schemas for grounding language in a responsive robot. Connection Science, 20(4):253–276, 2008.
[6] Ray S. Jackendoff. Semantics and Cognition, pages 161–187. MIT Press, 1983.
[7] Daniel Jurafsky and James H. Martin. Speech and Language Processing. Pearson Prentice Hall, 2nd edition, May 2008. ISBN 0131873210.
[8] Thomas Kollar, Stefanie Tellex, Deb Roy, and Nicholas Roy. Toward understanding natural language directions. In Proc. ACM/IEEE Int'l Conf. on Human-Robot Interaction (HRI), pages 259–266, 2010.
[9] Matt MacMahon, Brian Stankiewicz, and Benjamin Kuipers. Walk the talk: Connecting language, knowledge, and action in route instructions. In Proc. Nat'l Conf. on Artificial Intelligence (AAAI), pages 1475–1482, 2006.
[10] Cynthia Matuszek, Dieter Fox, and Karl Koscher. Following directions using statistical machine translation. In Proc. ACM/IEEE Int'l Conf. on Human-Robot Interaction (HRI), pages 251–258, 2010.
[11] Stephanie Rosenthal, Manuela Veloso, and Anind K. Dey. Learning accuracy and availability of humans who help mobile robots. In Proc. AAAI, 2011.
[12] N. Roy, J. Pineau, and S. Thrun. Spoken dialogue management using probabilistic reasoning. In Proceedings of the 38th Annual Meeting of the Association for Computational Linguistics (ACL-2000), 2000.
[13] Nobuyuki Shimizu and Andrew Haas. Learning to follow navigational route instructions. In Proceedings of the 21st International Joint Conference on Artificial Intelligence, pages 1488–1493, 2009.
[14] Dimitar Simeonov, Stefanie Tellex, Thomas Kollar, and Nicholas Roy. Toward interpreting spatial language discourse with grounding graphs. In 2011 RSS Workshop on Grounding Human-Robot Dialog for Spatial Tasks, 2011.
[15] M. Skubic, D. Perzanowski, S. Blisard, A. Schultz, W. Adams, M. Bugajska, and D. Brock. Spatial language for human-robot dialogs. IEEE Trans. on Systems, Man, and Cybernetics, Part C: Applications and Reviews, 34(2):154–167, 2004. ISSN 1094-6977.
[16] Veselin Stoyanov, Claire Cardie, Nathan Gilbert, Ellen Riloff, David Buttler, and David Hysom. Reconcile: A coreference resolution research platform. Technical report, Cornell University, April 2010.
[17] S. Tellex, T. Kollar, S. Dickerson, M. R. Walter, A. G. Banerjee, S. Teller, and N. Roy. Approaching the symbol grounding problem with probabilistic graphical models. AI Magazine, 32(4):64–76, 2011.
[18] S. Tellex, T. Kollar, S. Dickerson, M. R. Walter, A. Banerjee, S. Teller, and N. Roy. Understanding natural language commands for robotic navigation and mobile manipulation. In Proc. AAAI, 2011.
[19] Adam Vogel and Dan Jurafsky. Learning to follow navigational directions. In Proc. Association for Computational Linguistics (ACL), pages 806–814, 2010.
[20] J. D. Williams and S. Young. Scaling POMDPs for spoken dialog management. IEEE Transactions on Audio, Speech, and Language Processing, 15(7):2116–2129, September 2007.
[21] Terry Winograd. Procedures as a representation for data in a computer program for understanding natural language. PhD thesis, Massachusetts Institute of Technology, 1970.
[22] S. Young. Using POMDPs for dialog management. In IEEE Spoken Language Technology Workshop, 2006, pages 8–13, 2006.
Efficiently Finding Optimal Winding-
Constrained Loops in the Plane
Paul Vernaza, Venkatraman Narayanan, and Maxim Likhachev
[email protected], [email protected], [email protected]
The Robotics Institute
Carnegie Mellon University
Pittsburgh, PA 19104

Abstract—We present a method to efficiently find winding-
constrained loops in the plane that are optimal with respect
to a minimum-cost objective and in the presence of obstacles.
Our approach is similar to a typical graph-based search for an
optimal path in the plane, but with an additional state variable
that encodes information about path homotopy. Upon finding a
loop, the value of this state corresponds to a line integral over
the loop that indicates how many times it winds around each
obstacle, enabling us to reduce the problem of finding paths
satisfying winding constraints to that of searching for paths to
suitable states in this augmented state space. We give an intuitive
interpretation of the method based on fluid mechanics and show
how this yields a way to perform the necessary calculations
efficiently. Results are given in which we use our method to
find optimal routes for autonomous surveillance and intruder
containment.
I. INTRODUCTION
Fig. 1: An optimal plan (solid line) for a hypothetical UAV surveillance mission. The UAV is constrained to take off and return to a particular location, winding twice around each of three regions of interest. Regions of interest are designated by polygonal dashed lines. Striped polygons denote high-traffic regions that the UAV must avoid. Circular dashed lines denote location and ranges of radar installations, with radar power decreasing with distance to center.

The subject of this work is finding optimal constrained loops in the plane. More specifically, out of all paths starting and ending at a specific location in the plane and satisfying certain winding constraints, we would like to find one such path that minimizes a given location-dependent cost accumulated along the path, while also avoiding some regions entirely.

Figure 1 depicts a practical situation in which this type of problem arises. Here, an unmanned aerial vehicle (UAV) is tasked with flying a surveillance mission to photograph certain regions of interest (ROI) located within hostile territory. The UAV must find a route that begins and ends at its home base. It must additionally fly a route that minimizes a given cost functional, determined by distance traveled and proximity to hostile radar installations, while entirely avoiding certain high-traffic regions deemed too risky to traverse. Of prime concern is the need to photograph the ROI from all perspectives. Formally, we can ensure this by requiring that the generated path satisfy topological winding constraints with respect to the ROI: if the UAV winds at least once around each ROI, it will be able to view each ROI from every angle.

The rest of this paper is organized as follows. First, we give an intuitive description of our approach, followed by a brief discussion of the relation of our work to other methods. An in-depth description of technical details of our method follows in Section IV. Experimental results are given in Section V, followed by discussion.

II. METHOD OVERVIEW

The basic idea of our method is very simple and best motivated by a fluid analogy. Suppose we wish to find a planar loop that encloses some set of regions. We imagine that within each region is a source that produces fluid at a certain rate. By the divergence theorem (illustrated in Fig. 2), we know that the net rate at which fluid escapes any loop is equal to the rate at which it is produced by sources enclosed by the loop. Therefore, we can find a loop enclosing the desired regions by searching for a loop through which fluid escapes at a rate equal to the sum of the rates assigned to those regions. So long as there are no two sets of regions whose rates sum to the same quantity, we can be assured that any such loop must contain only the desired regions.

We can find such a loop via a straightforward modification of any standard graph-based method for navigation in the
Fig. 2: Illustration of divergence theorem: total flow rate exiting the boundary of any region is equal to the sum of source flow rates contained within, even in the presence of other sources. Locations of flow sources are indicated by faucet icons, with their flow rates indicated by numbers adjacent to these icons.

Fig. 3: Illustration of homotopy-augmented search. Shown is a 2x2 grid on which a graph search is performed, with a single obstacle in the center. Graph search begins at point A (marked F = 0). Directed edges are marked with signed F-value increments (in italics). Other values denote F-values at grid vertices associated with one or more paths discovered during search. Solid lines show paths found passing the obstacle to the left, while dashed lines show paths found passing the obstacle to the right.
plane. This is accomplished by treating the rate at which fluid passes through a path as a dependent variable whose state is tracked along with the usual state necessary for navigation, such as position and orientation. Any given state of the vehicle can therefore be associated with any number of different flow values, depending on how the search reached that state. When the search visits the goal configuration, we can check the flow value to verify whether the desired regions were enclosed; if they have been, then we are assured that the loop thus found is optimal with respect to all other loops satisfying our constraints, as long as the search algorithm employed is admissible.

A. Example with one region

Fig. 3 gives a simple illustration of how our method might proceed given just one region of interest around which to loop. Specifically, we consider the problem of finding an optimal path that begins at the flow-augmented state ((x, y), F) = (A, 0) and winds around the obstacle one or more times before returning to the location A. We will refer to the flow state F as the F-value.

The figure shows a number of possible actions that the search could take, for illustrative purposes. Each arc is marked by the quantity by which the F-value is incremented traversing the arc in the direction shown, given that the region of interest contains a source of rate one. Each location is marked by the F-values found at those locations by exploring different illustrated paths. For instance, the search begins by expanding A, generating B with F = 1/2, and generating C with F = −1/2. Next, the search expands B, generating D with F = 3/4 and E with F = 1, followed by a similar expansion of C.

At this point, we note that E has been reached via two (non-homotopic) paths, each yielding a distinct F-value. Therefore, we next expand the location E twice: once for the state (E, 1) and once for the state (E, −1). If we continue this process, the reader may verify that we will eventually expand the state (A, 2) and/or (A, −2), though this is not explicitly illustrated for purposes of clarity. Note that this will require traversing arcs in directions contrary to those illustrated, in which case the arc's F-value is negated before adding it to the state's F-value.

Upon expanding the state (A, 2), it is apparent that the search has found a path winding clockwise around the obstacle exactly once. Furthermore, it may be verified that any other path that winds once, clockwise, around the obstacle also has F = 2. However, if we expand the states in an admissible ordering, such as that provided by A* [9], we are guaranteed that the first path via which we have expanded (A, 2) is the optimal one to that state. Since all paths starting at A and winding clockwise once around the obstacle arrive at (A, 2), we can in that case conclude that this path is one that achieves the minimum cost out of all such paths.

We finally observe that any loop winding around the obstacle k times arrives at the state (A, ±2k) for k > 0, with sign determined by the winding direction. If the loop does not wind around the obstacle, however, the path arrives at the state (A, 0). The converse statements are also true.
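The flow-augmented search just illustrated can be sketched in a few lines. The following is a minimal, hypothetical Python sketch (not the authors' implementation): it runs Dijkstra's algorithm over flow-augmented states (cell, F) on a 4-connected unit grid with a single unit-rate source, using the raw winding angle about the source as the F-value, so a loop that winds once counterclockwise reaches F = 2π (the paper's Fig. 3 example uses a different normalization). The function names and the grid discretization are illustrative assumptions.

```python
import heapq
import math

def winding_increment(p, q, o):
    # Signed angle (radians) swept about the source o when moving p -> q.
    # For a unit-rate source this is proportional to the flux increment
    # through the edge (cf. the relation flux = r * theta / (2*pi)).
    a0 = math.atan2(p[1] - o[1], p[0] - o[0])
    a1 = math.atan2(q[1] - o[1], q[0] - o[0])
    d = a1 - a0
    if d > math.pi:
        d -= 2 * math.pi
    elif d < -math.pi:
        d += 2 * math.pi
    return d

def loop_search(start, source, winds, passable):
    # Dijkstra over flow-augmented states (cell, F): returns the cost of a
    # minimum-length loop starting and ending at `start` that winds `winds`
    # times counterclockwise around `source`. Hypothetical sketch only.
    goal_F = 2 * math.pi * winds
    dist = {(start, 0.0): 0.0}
    pq = [(0.0, start, 0.0)]
    while pq:
        c, x, F = heapq.heappop(pq)
        if x == start and abs(F - goal_F) < 1e-6:
            return c                      # first pop is optimal (Dijkstra)
        if c > dist.get((x, round(F, 6)), float("inf")):
            continue                      # stale queue entry
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            y = (x[0] + dx, x[1] + dy)
            if not passable(y):
                continue
            F2 = F + winding_increment(x, y, source)
            key = (y, round(F2, 6))
            if c + 1.0 < dist.get(key, float("inf")):
                dist[key] = c + 1.0
                heapq.heappush(pq, (c + 1.0, y, F2))
    return None
```

On the 2x2 grid of Fig. 3, with the source at the center of the four cells, the optimal once-winding loop found by this sketch is the perimeter, with cost 4.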
B. Winding around multiple regions

In the case that there are multiple regions of interest, we proceed as previously mentioned, placing a fluid source inside each region and computing F-values via superposition. We may then specify how many times the path should wind around each region, where winding is defined in the topological sense [10]. Assuming we assign to each region a fluid source of rate log z_i, where z_i is the ith prime number, a loop satisfying the winding constraints may be found by searching for a path to a state with

    exp F = Π_i z_i^{W_i},    (1)

where W_i is the winding number associated with the ith region. The fundamental theorem of arithmetic implies that this constitutes an invertible map between winding numbers and F-values.

III. RELATED WORK

Our work was inspired by the work of Bhattacharya et al. [3, 4], who first proposed the general idea described in Section II of performing graph search for navigation with a state vector augmented by homotopy information. Our work differs principally in two ways. First, we derive a version of homotopy-augmented graph search that is suitable for finding optimal loops with arbitrary winding constraints. Furthermore, we employ a real-valued encoding of homotopy information, as opposed to the complex-valued encoding derived in [3]. Although it is not the focus of this work, we note that the real-valued encoding of homotopy information derived here may also be applied to the problem described in [3], which might be characterized as point-to-point planning in 2D with homology constraints on the generated paths. In addition to having an intuitive interpretation in terms of fluids, applying the encoding we derive here would yield a method arguably simpler to implement.

Also noteworthy is recent work in computer vision that employs planning with winding angles in order to perform tracking with occlusion [8]. Planning in this way is essentially equivalent to the method described here, a fact that we show formally in Sec. IV-C. Additionally, we give an intuitive fluid-based interpretation of the method, derive an appropriate heuristic, and apply the method to surveillance and optimal confinement problems.

The problem of loop planning discussed here also bears some similarity to several problems in computational geometry. In the geometric knapsack problem [1], the objective is to find a simple loop enclosing discrete reward locations of varying reward, such that the enclosed reward minus the length of the loop is maximized. Arkin et al. [1] give algorithms to solve certain variants of this problem in time polynomial in the number of vertices of the polygons given as input. Our method is similar if the reward regions are thought of as regions around which the path should wind at least once. However, our method appears to be more flexible in a variety of ways, due to our use of graph-based search. We need not assume the environment is polygonal, nor do we assume the cost of the path is uniform across free space, or that the vehicle dynamics are trivial, as in [1]. Moreover, we can leverage problem-specific heuristics in the context of A* in order to accelerate the search. A final difference between our work and that of [1] is that we are able to find solutions with arbitrary winding constraints.

As surveillance constitutes a principal application of our method, our method is related to another problem from computational geometry known as the optimum watchman route problem. This problem is defined as that of generating a path of minimum length through an environment such that every point in the environment is visible from some point along the path. We motivate the UAV surveillance problem considered here from an aim of rendering each point along the ROI visible, but the nature of our solutions is such that this would be guaranteed only in certain cases, such as the case where the ROI are flat enough to guarantee that no ROI occludes any other, or the case where the ROI are spaced sufficiently such that winding twice around each ROI is sufficient to guarantee full visibility. Therefore, it is unlikely that optimum watchman routes are equivalent to optimum winding-constrained loops.

UAV planning in itself has been the focus of a significant amount of work in engineering disciplines. Typical goals of these approaches include cooperative planning, enforcing nonholonomic constraints, and planning in real time [6, 5, 2]. Other work focuses on ensuring geometric visibility in urban environments [7]. However, it appears that comparatively little work has focused on planning optimal loop paths for UAVs. An exception is the work of [11], who devised a suboptimal algorithm to find looping paths visiting a number of ROI while satisfying path curvature constraints. This method is again less flexible than ours in that we allow an arbitrary spatially-dependent cost function to take into account factors such as spatially-varying risk.

IV. TECHNICAL DETAILS

We now clarify some of the technical points of the method presented in Sec. II.

A. Graph construction

An informal description of our graph construction was given in Section II. That description may be formalized in the following way. First, we assume we are given a description of the graph corresponding to a point-to-point navigation problem in the plane, consisting of a state space X and a successor function succ : X → 2^X, where 2^X is the power set of X. Typical examples of X include X = R^2 and X = R^2 × S^1, the latter corresponding to navigation with heading dependence.

We then define a new state space X′ = X × R, with elements denoted by (x, f). Let ρ : X × X → (I → R^2) denote the function that generates the Cartesian path taken from a node in the original graph to a successor. The augmented successor function succ′ is then defined in the following way:

    succ′ : (x, f) → {(x′, f′) | x′ ∈ succ(x), f′ = f + F(ρ(x, x′))},
where F : (I → R^2) → R is a line integral that will be defined formally in the next section.

The start state for graph search is defined by (x_0, 0), where x_0 is the start state in the pre-augmented graph. The goal state is defined as (x_0, Σ_i W_i log z_i), where W_i is the desired winding number of the solution path around the ith region.

B. Line integral construction

Aspects of the line integral referred to in the last section are now detailed. First, we note that a suitable flow field is given by

    V(x) = Σ_i (r_i / 2π) (x − o_i) / ‖x − o_i‖²,    (2)

where o_i ∈ R^2 is an arbitrary source location inside the ith region for which we are given a winding constraint, and r_i = log z_i is the source rate assigned to the ith source. This may be established by considering, for example, the case where o_i is located at the center of a circle and devising an integrand that will integrate to r_i. Denoting by n̂(s) the normal to x̃ at s, we define

    F(x̃) = ∫ V(x̃(s)) · n̂(s) ds.    (3)

In order to compute the value by which F increments along an edge, we must evaluate this integral with respect to the path taken by the edge. Assuming this path is a straight line, the integral can be computed analytically. The integral is simplified considerably by taking into consideration its rotational, translational, and scaling invariances, as depicted in Fig. 4. By further applying linearity, we may therefore reduce the general case to the line integral of V over (0, 0) → (0, 1) due to a single obstacle of rate r located at position o = (o_x, o_y). Assuming the segment normal is given by n̂ = (−1, 0), the absolute value of the line integral is in this case given by

    ∫_{s=0}^{1} V(x) · n̂ ds = −(r / 2π) ∫_{y=0}^{1} −o_x / (o_x² + (y − o_y)²) dy    (4)
                            = (r / 2π) [arctan((1 − o_y)/o_x) − arctan(−o_y/o_x)],    (5)

which is equal to rθ/(2π), with θ as defined in Fig. 4. The general case may then be derived via similar triangles and superposition. For a line segment with generic endpoints p_0 and p_1, this yields

    F = Σ_i s_i (r_i / 2π) arccos⟨(p_1 − o_i)/‖p_1 − o_i‖, (p_0 − o_i)/‖p_0 − o_i‖⟩,    (6)

where the sign s_i ∈ {−1, 1} is determined by whether o_i falls on the left or right side of the directed segment p_0 → p_1, according to the desired convention.

Fig. 4: Illustration of symmetries in line integration of flow field: the flux through the segment L due to o is equal to the flux through the segment (0, 0) → (0, 1) due to o.

C. Equivalence with winding-vector approach

Summing Equation (6) along a path reveals the F-value to be a linear projection of the vector of winding angles associated with the path around each region; i.e., F = ⟨w, θ⟩, where θ is a vector of winding angles, and w is a vector with w_i = log z_i / (2π). The correctness of Eq. (1) follows immediately from this observation. This raises the question of whether an equivalent method may be obtained by augmenting the state vector with a vector of winding angles as opposed to the single F-value. Here we show that the two methods are indeed equivalent, in the sense that the entire vector of winding angles can be uniquely recovered from any F-value.

The key observation is that each valid winding angle at each location in the plane can be expressed in the form θ_i = θ̄_i + 2πk_i, where k_i ∈ Z and θ̄_i is a location-dependent offset angle. We then observe that

    exp F = exp( Σ_i log z_i (k_i + θ̄_i / 2π) ),    (7)

which implies that

    exp( F − Σ_i (log z_i / 2π) θ̄_i ) = Π_i z_i^{k_i}.    (8)

Therefore, given F, the winding angles θ_i can be obtained by finding the prime factors of the left-hand side of the previous expression.

An interesting geometric interpretation of this fact is that at each location, the winding angles are determined by the intersection of the hyperplane F = ⟨w, θ⟩ and the discrete, infinite lattice of valid winding angles. The slopes of this hyperplane are ratios of logs of prime numbers, and are therefore irrational; consequently, the hyperplane can intersect the infinite lattice in at most one location. This is illustrated in Fig. 5.

A practical consequence of this analysis concerns the memory efficiency of the method. Given many winding constraints, it is clearly inefficient to take the naive approach of storing each winding angle as an individual floating-point value, since the set of valid winding angles for each location is a discrete set. Storing a single F-value as a floating-point value constitutes a memory-efficient alternative to this approach. Alternatively, one may store a single floating-point offset angle
vector θ̄ per location in addition to a vector of integers k_i per state; for any reasonable problem, only a few bits per state would be necessary to represent each k_i.

Fig. 5: Illustration of invertibility of map from winding angles to F-values for an example with two regions of interest. The set of valid winding angles (θ_1, θ_2) consists of the set of lattice points (intersections of gray lines). The only valid pair of winding angles that also satisfies the F equation is the unique intersection of the pictured line with the lattice (circled). The uniqueness of the intersection point follows from the irrationality of the slope −log 2 / log 3.

Fig. 6: Illustration of heuristic. Query state indicates state for which heuristic is to be evaluated. Solid line shows a path that may have been taken to this state. Regions with positive winding constraints are depicted as solid circles. Striped area is an obstacle. Shaded area encloses regions whose winding constraints would be satisfied if the loop were closed with a straight line to the goal. Dashed line shows the maximum-length excursion to reach any winding-constrained region not in the shaded area. The value of the heuristic is the length of the dashed line.
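The prime-rate encoding of Eq. (1) and the decoding argument of Eq. (8) can be made concrete with a small sketch. Below is a hypothetical Python illustration (not the authors' code): the goal F-value is Σ_i W_i log z_i, and the winding numbers are recovered by factoring round(exp F) over the chosen primes. For simplicity it assumes all offset angles θ̄_i are zero (as at the start location) and non-negative windings; with many regions or large windings, floating-point precision would eventually limit the round trip.

```python
import math

PRIMES = [2, 3, 5, 7, 11]  # z_i: one prime per winding-constrained region

def goal_F(windings):
    # Eq. (1): exp F = prod_i z_i ** W_i, i.e. F = sum_i W_i * log z_i.
    return sum(W * math.log(z) for W, z in zip(windings, PRIMES))

def decode_windings(F, n_regions):
    # Recover winding numbers from an F-value by prime factorization of
    # round(exp F) (Eq. (8) with all offset angles zero). Assumes the
    # windings are non-negative integers -- an illustrative restriction.
    N = round(math.exp(F))
    ks = []
    for z in PRIMES[:n_regions]:
        k = 0
        while N % z == 0:
            N //= z
            k += 1
        ks.append(k)
    return ks
```

For example, windings (2, 0, 1) encode to F = log(2² · 5) = log 20, and factoring 20 over the primes recovers them uniquely, as the fundamental theorem of arithmetic guarantees.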
D. Search heuristic

The combinatorial nature of winding constraints necessitates a careful choice of search strategy. For this reason, we employ A* search with the heuristic illustrated in Fig. 6. Given a query state for which the heuristic is to be computed, we calculate, of the regions with winding constraints, which regions would have their constraints satisfied if the loop were closed with a straight line to the goal. For each of the remaining regions with non-zero winding constraints, we compute the length of the minimum-length path beginning at the query state, touching the region, and returning to the goal location; this constitutes the minimum-length excursion necessary to satisfy that region's winding constraint. Of all the distances so computed, we choose the greatest to be the value of the heuristic.
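Under simplifying assumptions (each winding-constrained region summarized by a single point, obstacles ignored, so straight-line distances lower-bound true path lengths), this heuristic can be sketched as follows. This is an illustrative sketch, not the authors' implementation; in particular, the fallback to the plain distance-to-goal when every constraint is already satisfied is our assumption.

```python
import math

def dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

def loop_heuristic(query, goal, regions, satisfied):
    # For each region whose winding constraint would NOT be satisfied by
    # closing the loop with a straight line to the goal, compute the
    # shortest excursion query -> region -> goal; the heuristic is the
    # greatest such excursion (an admissible lower bound on the remaining
    # path length). `satisfied` is a parallel list of booleans.
    excursions = [dist(query, r) + dist(r, goal)
                  for r, ok in zip(regions, satisfied) if not ok]
    return max(excursions) if excursions else dist(query, goal)
```

Because each excursion uses straight-line distances, it never exceeds the true remaining cost, preserving the admissibility A* requires.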
V. EXPERIMENTS

We implemented the method and applied it to several test scenarios described here, with the aim of understanding the qualitative characteristics of planning with winding constraints, the applicability of the method to realistic problems, and the method's computational efficiency. For simplicity, the examples presented here only consider curves with positive winding, though constraints with negative winding present no obstacle to the method.

A. Synthetic environment

Fig. 7 shows the result of a simple synthetic experiment showing optimal plans generated by our method winding around three regions (referred to henceforth as ROI) in varying ways, and in the presence of obstacles. The purpose of this experiment was to examine the qualitative aspects of paths with different winding constraints.

In the simplest case, the path is constrained to wind around each ROI exactly once. Since we simulated no vehicle dynamics in this experiment, the optimal solutions must always be Jordan [10] in this case, as depicted. Paths constrained to wind twice around some ROI tend to consist of a large loop winding once around each obstacle with offshoots that loop around each ROI once more. However, if any pair of ROI are sufficiently close, the optimal path may contain a Jordan subloop containing both, as depicted in the lower-right portion of the figure. From an applications perspective, this type of behavior may be desirable in the case of surveillance with limited sensing range. In this case, it may be sufficient to wind around groups of ROI (instead of winding around each individually), provided that each group is contained within a circle of radius smaller than the agent's sensing range.

Note also that it is possible to specify a zero winding number for certain ROI. In the case that all the windings are either zero or one, the ROI with winding one must be contained within the region enclosed by the loop (which must be Jordan), while the ROI with winding zero must fall outside the loop.

B. UAV surveillance

We now return to the UAV surveillance problem mentioned in the introduction. For these experiments, we simulated hypothetical UAV missions by annotating aerial imagery of military bases with ROI, hypothetical radar installations, and regions of excessive risk to be avoided entirely. Radar installations were modeled as overlapping ellipses, each containing a cost attaining its maximum value at the center of the ellipse and diminishing gradually to zero at the boundary of the ellipse. Our objective was to find minimum-cost loops subject to winding constraints and entirely avoiding regions of excessive risk. We approximated nontrivial dynamics for the vehicle by planning
for its orientation as well, limiting the maximum change in orientation between straight segments to 22.5 degrees.

Fig. 7: Demonstration showing loops found by the method to satisfy different winding constraints. Numbers indicate winding number constraints enforced for circular regions located directly beneath. Obstacles depicted by rectangular regions.

Fig. 8: Result of path planning for hypothetical UAV surveillance mission. UAV is constrained to wind exactly twice around each ROI. Annotations are as described in Fig. 1.

Fig. 9: Results of path planning for UAV surveillance with time-varying cost function. Curved, dashed line shows planned path. Colored, striped wedges show fields of view of enemy patrollers. Three frames are shown, each showing the position of the UAV and patrollers at a particular point in time.

Figure 1 shows the result of finding an optimal path winding around each ROI exactly twice. The optimal path in this case has the previously discussed characteristic of having a large outer loop with smaller loops branching off of it. It is observed that the path generally stays just outside the region of radar visibility, straying inside only briefly in order to wind around the ROI. As expected, the path also never wanders inside the forbidden regions.

Figure 8 shows a result obtained in another scenario. Again, the path was constrained to wind twice around each ROI. An interesting observation here is that upon visiting the last ROI, the UAV has a choice of either winding around the ROI and roughly retracing its path in the opposite direction, or penetrating radar range briefly in order to return to its home base. We observe that the optimal path indeed penetrates radar range via a short path near the boundaries of two overlapping radar installations before completing the loop.

We additionally attempted to generate plans for a scenario in which enemy patrol units were present. These were modeled by augmenting the state with a time variable; high costs were assigned to those positions and times representing states in which the UAV could be observed by an enemy patrol unit. Fig. 9 shows an example result. The optimal path successfully allows the UAV to evade the patrol units. An interesting feature of the plan is a small loop that effectively slows the UAV in order to avoid detection by patrol unit 2.

C. Intruder confinement

We finally studied the application of our method to a problem we will refer to as intruder confinement. This problem consists of finding an optimal way to confine several intruders in a maze-like environment, subject to the constraint that innocent bystanders should not be confined. Confinement is achieved by surrounding the intruders with a team of robots that deploy from a central location. We assume that the robots have limited communication range, such that we can define a connectivity graph having an edge between two robots iff they are within communication range of each other. In order to
ensure that the robots can coordinate, this graph should remain
connected at all times.
A feasible deployment strategy may be obtained by finding a loop that winds around each intruder exactly once without winding around a bystander. Robots may then incrementally deploy along the loop at intervals smaller than their communication radius in order to surround the intruders while maintaining full connectivity and allowing bystanders to escape.

The results shown in Fig. 10 demonstrate that the method is able to find optimal enclosures in highly irregular, maze-like indoor environments of varying topologies. As was expected, we observed that the complexity of planning did not scale significantly with the complexity of the environment, allowing us to solve problems such as those depicted. However, the complexity was significantly affected by the number of winding constraints imposed, as will be discussed shortly.

Fig. 10: Results of intruder confinement experiments. Locations of intruders are marked by circles below exclamation symbols. Locations of bystanders are marked by circles below question marks. Paths generated by the method are shown superimposed on a floor plan of the environment.

D. Computational efficiency

In order to study the computational efficiency of the method, we applied it to a synthetic example where we varied just the number of regions with winding constraints. The experimental scenario is depicted in Fig. 11. In the nth trial, we used our method to find a loop around the regions labeled 1, 2, . . . , n, winding once in the positive direction around each. This was conducted both using the heuristic described in Sec. IV-D—which we will refer to as the loop heuristic—and using the Euclidean distance-to-goal heuristic.

Fig. 11b shows clearly asymptotically exponential scaling of the run time as a function of the number of winding constraints for both heuristics. The loop heuristic, however, exhibits a significantly shallower slope in the log plot; for 10 constraints, the Euclidean distance heuristic is slower by a factor of 10. An interesting observation is that the performance of the loop heuristic was relatively insensitive to the number of constraints up to about six, at which point planning times increased significantly. For a very small number of constraints, the Euclidean distance heuristic outperformed the loop heuristic by a large margin, which we attribute to the smaller overhead of the Euclidean heuristic.
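A winding constraint of the kind imposed in these trials can be verified numerically for any closed polygonal path: the winding number about a region's reference point is the accumulated signed angle between successive vertices divided by 2π. The sketch below is illustrative only (it is not the planner's implementation, and the square path is a made-up example):

```python
import math

def winding_number(path, point):
    """Integer number of times the closed polygonal path winds around `point`
    (counter-clockwise positive), computed as the accumulated signed angle
    between successive vertices divided by 2*pi."""
    total = 0.0
    n = len(path)
    for i in range(n):
        ax, ay = path[i][0] - point[0], path[i][1] - point[1]
        bx, by = path[(i + 1) % n][0] - point[0], path[(i + 1) % n][1] - point[1]
        total += math.atan2(ax * by - ay * bx, ax * bx + ay * by)
    return round(total / (2.0 * math.pi))

# A square loop winds once (positively) around an enclosed point,
# zero times around a point outside it, and twice when traversed twice.
square = [(1, 1), (-1, 1), (-1, -1), (1, -1)]
assert winding_number(square, (0.0, 0.0)) == 1
assert winding_number(square, (5.0, 5.0)) == 0
assert winding_number(square * 2, (0.0, 0.0)) == 2
```

A constraint such as "wind once in the positive direction around region k" then amounts to requiring this integer to equal 1 at region k's reference point.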
The reason for the observed exponential scaling is that the search ultimately begins to explore many different combinations of windings around different obstacles. As we increase the number of constraints, the number of plausible winding states (and by extension, the open list) grows exponentially. The loop heuristic was able to compensate for this effect somewhat, but not completely. We also observed in experiments with constraints winding multiple times around each region that the performance of the loop heuristic deteriorated. This is to be expected, as it does not take into account the effect of multiple windings.

We are currently investigating different avenues to enable scaling to yet larger numbers of winding constraints. One simple enhancement would be to prune the search when winding angles become too large (a similar strategy was used in [8]), at the expense of losing the completeness guarantee of the search. Also, as noted in Section III, certain variants of the geometric knapsack problem in computational geometry may be considered relaxations of the winding-constrained planning problem, and may be solved efficiently. An admissible heuristic might therefore be obtained as the solution to such a problem. Alternatively, it is possible to construct an admissible heuristic as a graph search problem on a vastly reduced graph that assumes a uniform cost function. We leave implementation of these ideas as future work.

VI. CONCLUSION

We have demonstrated a method for efficiently finding optimal loops in the plane subject to winding constraints. The method accomplishes this by collapsing together all paths sharing a common F-value, which encodes homotopy-type information about the paths. The F-value is constructed in such a way as to reversibly encode winding data when a
loop is completed, allowing us to reduce a search for cyclic, winding-constrained paths to a search for an acyclic path to a specific state encoding the desired winding. Computation of the F-value additionally has an intuitive fluid-based interpretation and may be performed efficiently. Finally, our method leverages standard graph-based search methods to achieve flexibility in problem representation, vehicle dynamics, and selection of domain-specific admissible heuristics to accelerate the search process.

A particular challenge for the method as implemented up to this point is the case where many winding constraints are to be enforced. In the near-term, we anticipate that employing straightforward heuristics (as described in Section V-D) will greatly enhance the ability of the method to solve problems of this type.

An additional area of future research is using our method as a stepping stone to solving problems with constraints related to, but stronger than, winding. For instance, in the UAV surveillance problem, we may wish to enforce the constraint that the UAV fully observe one location before proceeding to another. We are currently investigating how this type of constraint might be enforced by augmenting the state vector with state that keeps track of such ordering information. Though the variations are endless, we believe that winding-constrained planning yields fundamental insight that will prove useful in a variety of related endeavors.

(a) Experimental setup. (b) Run time scaling.
Fig. 11: Synthetic experiment to determine empirical scaling of computational complexity as a function of number of winding constraints. Fig. 11a shows the experimental setup with point obstacles around which winding constraints were imposed, along with the order in which these constraints were introduced (labeled 1-10, with the start/goal location) and the solution path for 10 obstacles. Fig. 11b shows the time to find the optimal solution versus the number of winding constraints (note log scale), with and without the heuristic described in Sec. IV-D (labeled loop heuristic).

ACKNOWLEDGMENTS

This work was supported by ONR DR-IRIS MURI grant #N00014-09-1-1052 and ONR ANTIDOTE MURI grant #N00014-09-1-1031.

REFERENCES

[1] E.M. Arkin, S. Khuller, and J.S.B. Mitchell. Geometric knapsack problems. Algorithmica, 10(5):399-427, 1993. URL http://www.springerlink.com/content/g007w81p153h3326/.
[2] J.S. Bellingham, M. Tillerson, M. Alighanbari, and J.P. How. Cooperative path planning for multiple UAVs in dynamic and uncertain environments. In IEEE Conference on Decision and Control, volume 3, Dec. 2002.
[3] S. Bhattacharya, V. Kumar, and M. Likhachev. Search-based path planning with homotopy class constraints. In Third Annual Symposium on Combinatorial Search, 2010. URL http://www.aaai.org/ocs/index.php/AAAI/AAAI10/paper/view/1920.
[4] Subhrajit Bhattacharya, Maxim Likhachev, and Vijay Kumar. Identification and representation of homotopy classes of trajectories for search-based path planning in 3D. In Proceedings of Robotics: Science and Systems, 27-30 June 2011. URL https://www.aaai.org/ocs/index.php/SOCS/SOCS10/paper/view/2089/0.
[5] S.A. Bortoff. Path planning for UAVs. In American Control Conference, volume 1, pages 364-368, Sep. 2000. doi: 10.1109/ACC.2000.878915.
[6] P.R. Chandler, M. Pachter, and S. Rasmussen. UAV cooperative control. In American Control Conference, volume 1, pages 50-55, 2001. doi: 10.1109/ACC.2001.945512.
[7] Peng Cheng, J. Keller, and V. Kumar. Time-optimal UAV trajectory planning for 3D urban structure coverage. In IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2008), pages 2750-2757, Sept. 2008. doi: 10.1109/IROS.2008.4650988.
[8] H. Gong, J. Sim, M. Likhachev, and J. Shi. Multi-hypothesis motion planning for visual object tracking. In ICCV, 2011.
[9] P.E. Hart, N.J. Nilsson, and B. Raphael. A formal basis for the heuristic determination of minimum cost paths. IEEE Transactions on Systems Science and Cybernetics, 4(2):100-107, July 1968. ISSN 0536-1567. doi: 10.1109/TSSC.1968.300136. URL http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=4082128.
[10] J.R. Munkres. Topology: A First Course. Prentice Hall, 1975.
[11] Zhijun Tang and U. Ozguner. Motion planning for multitarget surveillance with mobile sensor agents. IEEE Transactions on Robotics, 21(5):898-908, Oct. 2005. ISSN 1552-3098. doi: 10.1109/TRO.2005.847567.
On the Structure of Nonlinearities
in Pose Graph SLAM
Heng Wang∗, Gibson Hu†, Shoudong Huang† and Gamini Dissanayake†

∗ College of Electronic Information and Control Engineering, Beijing University of Technology, Beijing, China. Email: [email protected]
† Centre for Autonomous Systems, Faculty of Engineering and Information Technology, University of Technology, Sydney, Australia. Email: {Gibson.Hu, Shoudong.Huang, Gamini.Dissanayake}@uts.edu.au
Abstract—Pose graphs have become an attractive representation for solving Simultaneous Localization and Mapping (SLAM) problems. In this paper, we analyze the structure of the nonlinearities in the 2D SLAM problem formulated as the optimizing of a pose graph. First, we prove that finding the optimal configuration of a very basic pose graph with 3 nodes (poses) and 3 edges (relative pose constraints) with spherical covariance matrices, which can be formulated as a six dimensional least squares optimization problem, is equivalent to solving a one dimensional optimization problem. Then we show that the same result can be extended to the optimizing of a pose graph with "two anchor nodes" where every edge is connecting to one of the two anchor nodes. Furthermore, we prove that the global minimum of the resulting one dimensional optimization problem must belong to a certain interval and there are at most 3 minima in that interval. Thus the globally optimal pose configuration of the pose graph can be obtained very easily through the bisection method and closed-form formulas.

I. INTRODUCTION

Recently, SLAM techniques based on pose graphs have become very popular because of their robustness and computational efficiency. Solving the SLAM problem using the pose graph representation typically includes two stages. In the first stage (the SLAM front-end), relative pose constraints are obtained (e.g. from odometry and/or scan matching) to construct a pose graph where each node is a robot pose and each edge is a relative pose constraint. In the second stage (the SLAM back-end), an optimization technique is applied to find the optimal pose configuration.

The concept of the pose graph SLAM formulation was introduced in [1]. Recent insights into the sparse structure of the SLAM problem have boosted the research in this direction. The work includes multi-level relaxation [2], stochastic gradient descent (SGD) [3], sparse Pose Adjustment (SPA) [4], and preconditioned conjugate gradient with a subgraph strategy [5]. A number of 3D pose graph SLAM algorithms have also been developed that can be applied to visual SLAM and SLAM with 3D laser data (http://openslam.org/).

The optimization problem involved in pose graph SLAM has a very high dimension because all the poses involved are variables. It is expected that such a high dimensional nonlinear optimization problem could have a very large number of local minima, and general iteration-based optimization algorithms could frequently be trapped in one of these local minima unless a very good initial value is available. However, the results in [3][4][5] demonstrate that sometimes the algorithms can converge to a good solution with poor or even zero initial values. In particular, the Tree-based network optimizer (TORO) [6], which applies an incremental pose parametrization and a tree structure on top of SGD, has been reported to be extremely robust to poor initial values, especially when the covariance matrix of the relative pose is close to spherical.

These phenomena show that the nonlinear optimization problem involved in pose graph SLAM appears to have a very special underlying structure that requires further investigation. Since loop closing is an important feature in SLAM optimization problems, we start from analyzing the "minimal pose graph SLAM problem with loop closure", namely, the pose graph with 3 poses and 3 constraints. Analyzing minimal problems (e.g. http://cmp.felk.cvut.cz/minimal/ and [7]) can provide us some insights into the fundamental structure and properties of the large problems.

In this paper, based on our initial work on the analysis of the number of minima for point feature based SLAM problems [8], we prove that solving the basic 3-pose 3-constraint pose graph as a least squares problem is equivalent to solving a one dimensional optimization problem. Furthermore, we extend the result to pose graphs with "two anchor nodes" and show how to obtain the global minimum of the one dimensional optimization problem.¹

¹ The MATLAB source code for solving this kind of pose graph SLAM problems is available at http://services.eng.uts.edu.au/~sdhuang/research.htm.

The paper is organized as follows. Section II presents the least squares formulation of the 3-pose 3-constraint pose graph SLAM problem and shows its equivalence to a one dimensional optimization problem. In Section III, the result is extended to the pose graph with only two anchor nodes. Section IV presents some properties of the one dimensional optimization problem. Section V discusses the related work and potential applications of the results. Finally, Section VI concludes the paper. The proofs of the two main results in this paper are given in the Appendix.

II. POSE GRAPH WITH 3 POSES AND 3 CONSTRAINTS

Suppose there are three 2D robot poses $r_0, r_1, r_2$ and three constraints $Z_1^0, Z_2^0, Z_2^1$, where $Z_j^i$ is the relative pose constraint
from pose $r_i$ to pose $r_j$ ($i = 0, 1$, $j = 1, 2$). The corresponding pose graph is shown in Fig. 1 with 3 nodes (poses) and 3 edges (relative pose constraints).

Fig. 1. Pose graph with 3 poses and 3 constraints.

A. Least Squares Problem Formulation

We choose pose $r_0$ as the origin of the global coordinate frame. That is, $r_0 = (x_{r_0}, y_{r_0}, \phi_{r_0})^T = (0, 0, 0)^T$. The least squares problem is to minimize

$$F(X) = \sum_{i,j} \big(Z_j^i - H_{Z_j^i}(X)\big)^T P_{Z_j^i}^{-1} \big(Z_j^i - H_{Z_j^i}(X)\big) \quad (1)$$

where $Z_j^i$ is the relative pose constraint from pose $r_i$ to pose $r_j$, and $P_{Z_j^i}$ is the corresponding covariance matrix. The function $H_{Z_j^0}(X)$ is given by

$$H_{Z_j^0}(X) = \begin{bmatrix} x_{r_j} & y_{r_j} & \phi_{r_j} \end{bmatrix}^T \quad (2)$$

and the function $H_{Z_j^1}(X)$ is given by²

$$H_{Z_j^1}(X) = \begin{bmatrix} c_{\phi_{r_1}}(x_{r_j} - x_{r_1}) + s_{\phi_{r_1}}(y_{r_j} - y_{r_1}) \\ -s_{\phi_{r_1}}(x_{r_j} - x_{r_1}) + c_{\phi_{r_1}}(y_{r_j} - y_{r_1}) \\ \phi_{r_j} - \phi_{r_1} \end{bmatrix}. \quad (3)$$

For the 3-pose 3-constraint problem, the state vector $X$ is

$$X = \begin{bmatrix} x_{r_2} & y_{r_2} & \phi_{r_2} & x_{r_1} & y_{r_1} & \phi_{r_1} \end{bmatrix}^T.$$

Suppose the relative pose constraints are

$$Z_1^0 = (z_{x_{r_1}}, z_{y_{r_1}}, z_{\phi_{r_1}})^T, \quad Z_2^0 = (z_{x_{r_2}}, z_{y_{r_2}}, z_{\phi_{r_2}})^T, \quad Z_2^1 = (z_{x_{r_2}}^{r_1}, z_{y_{r_2}}^{r_1}, z_{\phi_{r_2}}^{r_1})^T. \quad (4)$$

Assume the covariance matrices $P_{Z_j^i}$ are spherical³

$$P_{Z_1^0} = \mathrm{diag}(\gamma_{r_1}^{-1}, \gamma_{r_1}^{-1}, \gamma_{\phi_{r_1}}^{-1}), \quad P_{Z_2^0} = \mathrm{diag}(\gamma_{r_2}^{-1}, \gamma_{r_2}^{-1}, \gamma_{\phi_{r_2}}^{-1}), \quad P_{Z_2^1} = \mathrm{diag}(\gamma_{r_2 r_1}^{-1}, \gamma_{r_2 r_1}^{-1}, (\gamma_{\phi_{r_2}}^{r_1})^{-1}), \quad (5)$$

then the objective function is

$$F(X) = \gamma_{r_1}[(z_{x_{r_1}} - x_{r_1})^2 + (z_{y_{r_1}} - y_{r_1})^2] + \gamma_{\phi_{r_1}}(z_{\phi_{r_1}} - \phi_{r_1})^2 + \gamma_{r_2}[(z_{x_{r_2}} - x_{r_2})^2 + (z_{y_{r_2}} - y_{r_2})^2] + \gamma_{\phi_{r_2}}(z_{\phi_{r_2}} - \phi_{r_2})^2 + \gamma_{r_2 r_1}[z_{x_{r_2}}^{r_1} - c_{\phi_{r_1}}(x_{r_2} - x_{r_1}) - s_{\phi_{r_1}}(y_{r_2} - y_{r_1})]^2 + \gamma_{r_2 r_1}[z_{y_{r_2}}^{r_1} + s_{\phi_{r_1}}(x_{r_2} - x_{r_1}) - c_{\phi_{r_1}}(y_{r_2} - y_{r_1})]^2 + \gamma_{\phi_{r_2}}^{r_1}(z_{\phi_{r_2}}^{r_1} - \phi_{r_2} + \phi_{r_1})^2. \quad (6)$$

Note that $F(X)$ can also be written in a new form

$$F(X) = \gamma_{r_1}[(z_{x_{r_1}} - x_{r_1})^2 + (z_{y_{r_1}} - y_{r_1})^2] + \gamma_{\phi_{r_1}}(z_{\phi_{r_1}} - \phi_{r_1})^2 + \gamma_{r_2}[(z_{x_{r_2}} - x_{r_2})^2 + (z_{y_{r_2}} - y_{r_2})^2] + \gamma_{\phi_{r_2}}(z_{\phi_{r_2}} - \phi_{r_2})^2 + \gamma_{r_2 r_1}[A(\phi_{r_1}) - (x_{r_2} - x_{r_1})]^2 + \gamma_{r_2 r_1}[B(\phi_{r_1}) - (y_{r_2} - y_{r_1})]^2 + \gamma_{\phi_{r_2}}^{r_1}(z_{\phi_{r_2}}^{r_1} - \phi_{r_2} + \phi_{r_1})^2 \quad (7)$$

where⁴

$$A(\phi_{r_1}) = z_{x_{r_2}}^{r_1} c_{\phi_{r_1}} - z_{y_{r_2}}^{r_1} s_{\phi_{r_1}}, \quad B(\phi_{r_1}) = z_{x_{r_2}}^{r_1} s_{\phi_{r_1}} + z_{y_{r_2}}^{r_1} c_{\phi_{r_1}}. \quad (8)$$

Remark 1. In (7), the terms $c_{\phi_{r_1}}$ and $s_{\phi_{r_1}}$ are separated from the terms $(x_{r_2} - x_{r_1})$ and $(y_{r_2} - y_{r_1})$, which makes the computation of the gradient and the analysis of the stationary points easier. It is clear from our results that spherical covariance in the form (5) can simplify the problem significantly.

B. The Equivalence to a One Dimensional Problem

The following theorem shows that the problem of minimizing the objective function (7) is equivalent to an optimization problem with one variable.

The results in this paper hold for general spherical covariance matrices as given in (5). However, in order to simplify the formulas and make the paper more readable, in the remainder of this paper we assume

$$\gamma_{r_1} = \gamma_{\phi_{r_1}} = \gamma_{r_2} = \gamma_{\phi_{r_2}} = \gamma_{r_2 r_1} = \gamma_{\phi_{r_2}}^{r_1} = 1. \quad (9)$$

Theorem 1: Given data $z_{x_{r_1}}, z_{y_{r_1}}, z_{\phi_{r_1}}$, $z_{x_{r_2}}, z_{y_{r_2}}, z_{\phi_{r_2}}$, and $z_{x_{r_2}}^{r_1}, z_{y_{r_2}}^{r_1}, z_{\phi_{r_2}}^{r_1}$, we have that minimizing the objective function (7) under assumption (9) is equivalent to minimizing the following function of one variable $\phi$:

$$f(\phi) = \phi^2 + \frac{1}{2}\big(\phi + \Delta z_{\phi_{r_2}}^{r_1}\big)^2 - 2a\cos(\phi + \alpha) + b \quad (10)$$

where $\Delta z_{\phi_{r_2}}^{r_1}$, $a$, $\alpha$, $b$ are constants that can be computed from the data by

$$\Delta z_{\phi_{r_2}}^{r_1} = z_{\phi_{r_2}}^{r_1} - z_{\phi_{r_2}} + z_{\phi_{r_1}} \in [-\pi, \pi), \quad a = \sqrt{p^2 + (d + q)^2}, \quad \alpha = \mathrm{atan2}(p, d + q), \quad b = \tfrac{1}{3}\big[z_{x_{r_1}}^2 + z_{y_{r_1}}^2 + (z_{x_{r_2}} - z_{x_{r_1}})^2 + (z_{y_{r_2}} - z_{y_{r_1}})^2\big] \quad (11)$$

² In this paper, we use $c_{\phi_{r_1}}, s_{\phi_{r_1}}, c_{z_{\phi_{r_1}}}, s_{z_{\phi_{r_1}}}$ to denote $\cos(\phi_{r_1}), \sin(\phi_{r_1}), \cos(z_{\phi_{r_1}}), \sin(z_{\phi_{r_1}})$ respectively.
³ Here spherical means diagonal with the first two elements (corresponding to x and y) being the same. Most of the publicly available datasets (e.g. [9]) have the covariance matrices in this format.
⁴ In the following, we will simply use $A, B$ to denote the functions $A(\phi_{r_1}), B(\phi_{r_1})$. This also applies to $A_i, B_i, A_{2k}, B_{2k}$ etc.
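Theorem 1 can be exercised numerically. The sketch below is our own illustrative Python reimplementation (the authors provide MATLAB code at the URL in footnote 1), using the unit-weight assumption (9), the constants from (11)—with $d$, $p$, $q$, $\delta_a$, $\delta_b$ as defined in (12)—a dense grid search over $[-2\pi - \alpha, 2\pi - \alpha]$ refined by a few Newton steps, and the closed-form pose recovery (13). On noise-free data it reproduces the ground-truth poses of Example 1:

```python
import math

def theorem1_solve(Z10, Z20, Z21):
    """Solve the 3-pose 3-constraint pose graph via Theorem 1 (unit weights)."""
    zx1, zy1, zp1 = Z10
    zx2, zy2, zp2 = Z20
    zxr, zyr, zpr = Z21
    c1, s1 = math.cos(zp1), math.sin(zp1)
    # Constants from (11) and (12).
    dz = zpr - zp2 + zp1
    dzx = zxr - (zx2 - zx1) * c1 - (zy2 - zy1) * s1
    dzy = zyr + (zx2 - zx1) * s1 - (zy2 - zy1) * c1
    da = -dzx * (-zy1 + zy2) / 3.0 + dzy * (-zx1 + zx2) / 3.0
    db = dzx * (-zx1 + zx2) / 3.0 + dzy * (-zy1 + zy2) / 3.0
    p = da * c1 + db * s1
    q = -da * s1 + db * c1
    d = ((zx2 - zx1) ** 2 + (zy2 - zy1) ** 2) / 3.0
    a = math.hypot(p, d + q)
    alpha = math.atan2(p, d + q)
    b = (zx1 ** 2 + zy1 ** 2 + (zx2 - zx1) ** 2 + (zy2 - zy1) ** 2) / 3.0

    def f(t):  # the one dimensional objective (10)
        return t * t + 0.5 * (t + dz) ** 2 - 2 * a * math.cos(t + alpha) + b

    # Theorem 3 (later in the paper) shows the global minimum lies in
    # [-2*pi - alpha, 2*pi - alpha]: coarse grid, then Newton refinement on
    # f'(phi) = 3*phi + dz + 2*a*sin(phi + alpha), assuming the grid argmin
    # is already near the global minimum.
    lo, hi = -2 * math.pi - alpha, 2 * math.pi - alpha
    phi = min((lo + (hi - lo) * k / 4000 for k in range(4001)), key=f)
    for _ in range(30):
        g = 3 * phi + dz + 2 * a * math.sin(phi + alpha)
        h = 3 + 2 * a * math.cos(phi + alpha)
        if abs(h) < 1e-9:
            break
        phi -= g / h
    # Closed-form pose recovery (13), with A, B from (8) at phi_r1.
    p1 = phi + zp1
    p2 = 0.5 * (zp2 + zpr + p1)
    A = zxr * math.cos(p1) - zyr * math.sin(p1)
    B = zxr * math.sin(p1) + zyr * math.cos(p1)
    x1 = 2 * zx1 / 3 + zx2 / 3 - A / 3
    y1 = 2 * zy1 / 3 + zy2 / 3 - B / 3
    x2 = zx1 / 3 + 2 * zx2 / 3 + A / 3
    y2 = zy1 / 3 + 2 * zy2 / 3 + B / 3
    return (x1, y1, p1), (x2, y2, p2), f(phi)

# Noise-free data (the "zero" row of Table I):
# ground truth r1 = (1, 0.5, pi/2), r2 = (0, 1, 3*pi/4).
r1, r2, fmin = theorem1_solve((1.0, 0.5, math.pi / 2),
                              (0.0, 1.0, 3 * math.pi / 4),
                              (0.5, 1.0, math.pi / 4))
assert fmin < 1e-9
assert max(abs(u - v) for u, v in zip(r1, (1.0, 0.5, math.pi / 2))) < 1e-6
assert max(abs(u - v) for u, v in zip(r2, (0.0, 1.0, 3 * math.pi / 4))) < 1e-6
```

The grid-plus-Newton step stands in for the bisection procedure suggested in Remark 4; any one dimensional minimizer over the stated interval would serve.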
TABLE I
Example 1: four datasets with different levels of noise. The ground truth of the poses is $r_0 = (0, 0, 0)$, $r_1 = (1, 0.5, \frac{\pi}{2})$, $r_2 = (0, 1, \frac{3}{4}\pi)$.

| noise | $(z_{x_{r_1}}, z_{y_{r_1}}, z_{\phi_{r_1}})$ | $(z_{x_{r_2}}, z_{y_{r_2}}, z_{\phi_{r_2}})$ | $(z_{x_{r_2}}^{r_1}, z_{y_{r_2}}^{r_1}, z_{\phi_{r_2}}^{r_1})$ | $\Delta z_{\phi_{r_2}}^{r_1}$ | $\alpha$ | No. | $\phi^*$ | $f(\phi^*)$ | $F(X^*)$ |
| zero | (1.0000, 0.5000, 1.5708) | (0, 1.0000, 2.3562) | (0.5000, 1.0000, 0.7854) | 0 | 0 | 1 | 0 | 0 | 0 |
| small | (1.1022, 0.3967, 1.5383) | (0.0805, 0.9991, 2.5293) | (0.6335, 0.9682, 0.8705) | -0.1204 | -0.0793 | 1 | 0.0493 | 0.0057 | 0.0057 |
| large | (0.9368, 0.6861, 2.1437) | (-0.2468, 1.0940, 2.3901) | (0.5627, 1.3630, 1.1482) | 0.9018 | 0.5132 | 1 | -0.3623 | 0.3073 | 0.3073 |
| huge | (1.1674, 6.1375, 0.0755) | (-6.6684, 2.7509, 2.4706) | (-0.8100, -7.7511, -0.6429) | -3.0380 | 1.1342 | 3 | -0.9978 | 9.7355 | 9.7355 |

'No.' means "the number of minima in $[-2\pi - \alpha, 2\pi - \alpha]$".

The constants $d$, $p$, $q$, $\delta_a$, $\delta_b$ appearing in (11) are given by

$$d = \tfrac{1}{3}\big[(z_{x_{r_2}} - z_{x_{r_1}})^2 + (z_{y_{r_2}} - z_{y_{r_1}})^2\big]$$
$$p = \delta_a c_{z_{\phi_{r_1}}} + \delta_b s_{z_{\phi_{r_1}}}$$
$$q = -\delta_a s_{z_{\phi_{r_1}}} + \delta_b c_{z_{\phi_{r_1}}}$$
$$\delta_a = -\Delta z_{x_{r_2}}^{r_1}\big(-\tfrac{1}{3}z_{y_{r_1}} + \tfrac{1}{3}z_{y_{r_2}}\big) + \Delta z_{y_{r_2}}^{r_1}\big(-\tfrac{1}{3}z_{x_{r_1}} + \tfrac{1}{3}z_{x_{r_2}}\big)$$
$$\delta_b = \Delta z_{x_{r_2}}^{r_1}\big(-\tfrac{1}{3}z_{x_{r_1}} + \tfrac{1}{3}z_{x_{r_2}}\big) + \Delta z_{y_{r_2}}^{r_1}\big(-\tfrac{1}{3}z_{y_{r_1}} + \tfrac{1}{3}z_{y_{r_2}}\big)$$
$$\Delta z_{x_{r_2}}^{r_1} = z_{x_{r_2}}^{r_1} - (z_{x_{r_2}} - z_{x_{r_1}})c_{z_{\phi_{r_1}}} - (z_{y_{r_2}} - z_{y_{r_1}})s_{z_{\phi_{r_1}}}$$
$$\Delta z_{y_{r_2}}^{r_1} = z_{y_{r_2}}^{r_1} + (z_{x_{r_2}} - z_{x_{r_1}})s_{z_{\phi_{r_1}}} - (z_{y_{r_2}} - z_{y_{r_1}})c_{z_{\phi_{r_1}}}. \quad (12)$$

In fact, once the solution $\phi$ to the one variable optimization problem is obtained, the solution to the problem of minimizing $F(X)$ in (7) can be obtained in closed form by the following formulas:

$$\phi_{r_1} = \phi + z_{\phi_{r_1}}, \quad \phi_{r_2} = \tfrac{1}{2}\big(z_{\phi_{r_2}} + z_{\phi_{r_2}}^{r_1} + \phi_{r_1}\big), \quad x_{r_1} = \tfrac{2}{3}z_{x_{r_1}} + \tfrac{1}{3}z_{x_{r_2}} - \tfrac{1}{3}A, \quad y_{r_1} = \tfrac{2}{3}z_{y_{r_1}} + \tfrac{1}{3}z_{y_{r_2}} - \tfrac{1}{3}B, \quad x_{r_2} = \tfrac{1}{3}z_{x_{r_1}} + \tfrac{2}{3}z_{x_{r_2}} + \tfrac{1}{3}A, \quad y_{r_2} = \tfrac{1}{3}z_{y_{r_1}} + \tfrac{2}{3}z_{y_{r_2}} + \tfrac{1}{3}B. \quad (13)$$

Proof: See Appendix A.

Remark 2. The constants $\Delta z_{\phi_{r_2}}^{r_1}$, $a$, $\alpha$, $b$ in $f(\phi)$ summarize some important information from the relative pose constraint data. For example, $\Delta z_{\phi_{r_2}}^{r_1}$ indicates the level of consistency among the three angles $z_{\phi_{r_2}}^{r_1}$, $z_{\phi_{r_2}}$, and $z_{\phi_{r_1}}$. If the three angles are compatible with each other (e.g. when the relative orientation estimates are accurate), then $\Delta z_{\phi_{r_2}}^{r_1}$ is close to 0. Similarly, if the relative pose data are accurate, then $\alpha$ is close to 0 and $a$ is proportional to the square of the distance between $r_1$ and $r_2$. The constant $b$ does not have any impact on the solution to the optimization problem.

Example 1. To illustrate the results in Theorem 1, consider the following example. Assume the ground truth of the poses is $r_0 = (0, 0, 0)$, $r_1 = (1, 0.5, \frac{\pi}{2})$, $r_2 = (0, 1, \frac{3}{4}\pi)$, which is similar to the configuration in Fig. 1. Four examples of relative constraint datasets randomly generated with 4 different levels of noise are listed in Table I. The corresponding constants $\Delta z_{\phi_{r_2}}^{r_1}$, $a$, $\alpha$, $b$ can then be computed for each dataset using the formulas in (11), and the function $f(\phi)$ in (10) can be obtained. The two key constants $\Delta z_{\phi_{r_2}}^{r_1}$ and $\alpha$, the number of local minima within $[-2\pi - \alpha, 2\pi - \alpha]$, the global minimum $\phi^*$ and $f(\phi^*)$, as well as the objective function value obtained through solving the least squares problem using Gauss-Newton ($F(X^*)$) are all listed in Table I. Fig. 2 shows the function $f(\phi)$ for the 4 different datasets. It can be seen that more than one minimum exists only when the noise is unrealistically large.

Fig. 2. The function $f(\phi)$ in Theorem 1 for datasets with different levels of noise: pose graph with 3 poses and 3 constraints.

III. POSE GRAPH WITH TWO ANCHOR POSES

Now consider a more general pose graph as shown in Fig. 3. This pose graph has a special structure. Namely, all the edges connect to one of the two nodes $r_0$ and $r_1$. Here we call nodes $r_0$ and $r_1$ "anchor nodes" or "anchor poses".

Assume there are:
• $n$ poses that have two edges linked to both $r_0$ and $r_1$: $r_2, \cdots, r_n, r_{n+1}$;
• $n_1$ poses that only have one edge linked to $r_0$: $r_{11}, r_{12}, \cdots, r_{1n_1}$;
• $n_2$ poses that only have one edge linked to $r_1$: $r_{21}, r_{22}, \cdots, r_{2n_2}$.

Assume the relative pose constraints are:
• from $r_0$ to $r_1$: $(z_{x_{r_1}}, z_{y_{r_1}}, z_{\phi_{r_1}})^T$;
• from $r_0$ to $r_i$: $(z_{x_{r_i}}, z_{y_{r_i}}, z_{\phi_{r_i}})^T$, $i = 2, \ldots, n+1$;
• from $r_1$ to $r_i$: $(z_{x_{r_i}}^{r_1}, z_{y_{r_i}}^{r_1}, z_{\phi_{r_i}}^{r_1})^T$, $i = 2, \ldots, n+1$;
• from $r_0$ to $r_{1j}$: $(z_{x_{r_{1j}}}, z_{y_{r_{1j}}}, z_{\phi_{r_{1j}}})^T$, $j = 1, \ldots, n_1$;
• from $r_1$ to $r_{2k}$: $(z_{x_{r_{2k}}}^{r_1}, z_{y_{r_{2k}}}^{r_1}, z_{\phi_{r_{2k}}}^{r_1})^T$, $k = 1, \ldots, n_2$.

In the example shown in Fig. 3, $n = 3$, $n_1 = 2$, $n_2 = 4$. It is obvious that the 3-pose 3-constraint pose graph in Fig. 1 is a special case with $n = 1$, $n_1 = 0$, $n_2 = 0$.

The covariance matrices of the relative pose constraints can be assumed to have the same format as in (5). However, to simplify the formulas, we assume the covariance matrices are all identity matrices.

To find the optimal configuration of the pose graph, the state vector $X$ contains all the robot poses except pose $r_0$, which is the origin of the coordinate frame. Similar to (7), the objective function of the least squares optimization problem
can be written as

$$F(X) = (z_{x_{r_1}} - x_{r_1})^2 + (z_{y_{r_1}} - y_{r_1})^2 + (z_{\phi_{r_1}} - \phi_{r_1})^2 + \sum_{i=2}^{n+1}\big[(z_{x_{r_i}} - x_{r_i})^2 + (z_{y_{r_i}} - y_{r_i})^2 + (z_{\phi_{r_i}} - \phi_{r_i})^2 + (A_i - (x_{r_i} - x_{r_1}))^2 + (B_i - (y_{r_i} - y_{r_1}))^2 + (z_{\phi_{r_i}}^{r_1} - (\phi_{r_i} - \phi_{r_1}))^2\big] + \sum_{j=1}^{n_1}\big[(z_{x_{r_{1j}}} - x_{r_{1j}})^2 + (z_{y_{r_{1j}}} - y_{r_{1j}})^2 + (z_{\phi_{r_{1j}}} - \phi_{r_{1j}})^2\big] + \sum_{k=1}^{n_2}\big[(A_{2k} - (x_{r_{2k}} - x_{r_1}))^2 + (B_{2k} - (y_{r_{2k}} - y_{r_1}))^2 + (z_{\phi_{r_{2k}}}^{r_1} - (\phi_{r_{2k}} - \phi_{r_1}))^2\big] \quad (14)$$

where for $i = 2, \ldots, n+1$, $k = 1, \ldots, n_2$,

$$A_i = z_{x_{r_i}}^{r_1} c_{\phi_{r_1}} - z_{y_{r_i}}^{r_1} s_{\phi_{r_1}}, \quad B_i = z_{x_{r_i}}^{r_1} s_{\phi_{r_1}} + z_{y_{r_i}}^{r_1} c_{\phi_{r_1}},$$
$$A_{2k} = z_{x_{r_{2k}}}^{r_1} c_{\phi_{r_1}} - z_{y_{r_{2k}}}^{r_1} s_{\phi_{r_1}}, \quad B_{2k} = z_{x_{r_{2k}}}^{r_1} s_{\phi_{r_1}} + z_{y_{r_{2k}}}^{r_1} c_{\phi_{r_1}}.$$

Fig. 3. Pose graph with two anchor poses $r_0$ and $r_1$.

The following theorem demonstrates that minimizing (14) is equivalent to a one dimensional optimization problem.

Theorem 2: The problem of minimizing the objective function (14) is equivalent to minimizing the following function with one variable $\phi$:

$$f(\phi) = \phi^2 + \frac{1}{2}\sum_{i=2}^{n+1}\big[\phi + \Delta z_{\phi_{r_i}}^{r_1}\big]^2 - 2a\cos(\phi + \alpha) + b \quad (15)$$

where $\Delta z_{\phi_{r_i}}^{r_1}$, $a$, $\alpha$ and $b$ can be computed from the relative constraint data by

$$\Delta z_{\phi_{r_i}}^{r_1} = z_{\phi_{r_i}}^{r_1} - (z_{\phi_{r_i}} - z_{\phi_{r_1}}) \in [-\pi, \pi], \; i = 2, \cdots, n+1, \quad a = \sqrt{p^2 + (d + q)^2}, \quad \alpha = \mathrm{atan2}(p, d + q), \quad (16)$$

$$b = \Big(\frac{1}{n+2}\sum_{i=2}^{n+1} z_{x_{r_i}} + \big(\tfrac{2}{n+2} - 1\big)z_{x_{r_1}}\Big)^2 + \Big(\frac{1}{n+2}\sum_{i=2}^{n+1} z_{y_{r_i}} + \big(\tfrac{2}{n+2} - 1\big)z_{y_{r_1}}\Big)^2 + \sum_{j=2}^{n+1}\Big[\frac{1}{n+2}z_{x_{r_1}} - \frac{1}{2}z_{x_{r_j}} + \frac{1}{2(n+2)}\sum_{i=2}^{n+1} z_{x_{r_i}}\Big]^2 + \sum_{j=2}^{n+1}\Big[\frac{1}{n+2}z_{y_{r_1}} - \frac{1}{2}z_{y_{r_j}} + \frac{1}{2(n+2)}\sum_{i=2}^{n+1} z_{y_{r_i}}\Big]^2 + \frac{n+1}{2(n+2)}\sum_{i=2}^{n+1}\big((z_{x_{r_i}}^{r_1})^2 + (z_{y_{r_i}}^{r_1})^2\big) - \frac{1}{n+2}\sum_{2 \le i < j \le n+1}\big(z_{x_{r_i}}^{r_1} z_{x_{r_j}}^{r_1} + z_{y_{r_i}}^{r_1} z_{y_{r_j}}^{r_1}\big) \quad (17)$$

where

$$d = \frac{1}{2}\sum_{2 \le i \le n+1}\big[(z_{x_{r_i}} - z_{x_{r_1}})^2 + (z_{y_{r_i}} - z_{y_{r_1}})^2\big] - \frac{1}{2(n+2)}\sum_{2 \le i,j \le n+1}\big[(z_{x_{r_j}} - z_{x_{r_1}})(z_{x_{r_i}} - z_{x_{r_1}}) + (z_{y_{r_j}} - z_{y_{r_1}})(z_{y_{r_i}} - z_{y_{r_1}})\big],$$
$$p = \delta_a c_{z_{\phi_{r_1}}} + \delta_b s_{z_{\phi_{r_1}}}, \quad q = -\delta_a s_{z_{\phi_{r_1}}} + \delta_b c_{z_{\phi_{r_1}}},$$
$$\delta_a = \sum_{j=2}^{n+1} \begin{bmatrix} \Delta z_{x_{r_j}}^{r_1} \\ \Delta z_{y_{r_j}}^{r_1} \end{bmatrix}^T \begin{bmatrix} \frac{1}{n+2} z_{y_{r_1}} - \frac{1}{2} z_{y_{r_j}} + \frac{1}{2(n+2)}\sum_{i=2}^{n+1} z_{y_{r_i}} \\ \frac{-1}{n+2} z_{x_{r_1}} + \frac{1}{2} z_{x_{r_j}} - \frac{1}{2(n+2)}\sum_{i=2}^{n+1} z_{x_{r_i}} \end{bmatrix},$$
$$\delta_b = \sum_{j=2}^{n+1} \begin{bmatrix} \Delta z_{x_{r_j}}^{r_1} \\ \Delta z_{y_{r_j}}^{r_1} \end{bmatrix}^T \begin{bmatrix} \frac{-1}{n+2} z_{x_{r_1}} + \frac{1}{2} z_{x_{r_j}} - \frac{1}{2(n+2)}\sum_{i=2}^{n+1} z_{x_{r_i}} \\ \frac{-1}{n+2} z_{y_{r_1}} + \frac{1}{2} z_{y_{r_j}} - \frac{1}{2(n+2)}\sum_{i=2}^{n+1} z_{y_{r_i}} \end{bmatrix},$$
$$\Delta z_{x_{r_i}}^{r_1} = z_{x_{r_i}}^{r_1} - \big[(z_{x_{r_i}} - z_{x_{r_1}})c_{z_{\phi_{r_1}}} + (z_{y_{r_i}} - z_{y_{r_1}})s_{z_{\phi_{r_1}}}\big],$$
$$\Delta z_{y_{r_i}}^{r_1} = z_{y_{r_i}}^{r_1} - \big[-(z_{x_{r_i}} - z_{x_{r_1}})s_{z_{\phi_{r_1}}} + (z_{y_{r_i}} - z_{y_{r_1}})c_{z_{\phi_{r_1}}}\big], \quad i = 2, \cdots, n+1.$$

In fact, once the solution $\phi$ to the one variable optimization problem is obtained, the $X$ that minimizes (14) can be obtained in closed form by the following formulas:

(i) For pose $r_{1j}$ ($1 \le j \le n_1$),
$$\phi_{r_{1j}} = z_{\phi_{r_{1j}}}, \quad x_{r_{1j}} = z_{x_{r_{1j}}}, \quad y_{r_{1j}} = z_{y_{r_{1j}}}. \quad (18)$$

(ii) For pose $r_1$,
$$\phi_{r_1} = \phi + z_{\phi_{r_1}},$$
$$x_{r_1} = \frac{1}{n+2}\sum_{i=2}^{n+1}(z_{x_{r_i}} + A_i) + \frac{2}{n+2}\Big(z_{x_{r_1}} - \sum_{i=2}^{n+1} A_i\Big),$$
$$y_{r_1} = \frac{1}{n+2}\sum_{i=2}^{n+1}(z_{y_{r_i}} + B_i) + \frac{2}{n+2}\Big(z_{y_{r_1}} - \sum_{i=2}^{n+1} B_i\Big). \quad (19)$$
(iii) For pose $r_i$ ($2 \le i \le n+1$),
$$\phi_{r_i} = \frac{1}{2}\big(z_{\phi_{r_i}} + z_{\phi_{r_i}}^{r_1} + \phi_{r_1}\big),$$
$$x_{r_i} = \frac{1}{2}(z_{x_{r_i}} + A_i) + \frac{1}{2(n+2)}\sum_{j=2}^{n+1}(z_{x_{r_j}} + A_j) + \frac{1}{n+2}\Big(z_{x_{r_1}} - \sum_{j=2}^{n+1} A_j\Big),$$
$$y_{r_i} = \frac{1}{2}(z_{y_{r_i}} + B_i) + \frac{1}{2(n+2)}\sum_{j=2}^{n+1}(z_{y_{r_j}} + B_j) + \frac{1}{n+2}\Big(z_{y_{r_1}} - \sum_{j=2}^{n+1} B_j\Big). \quad (20)$$

(iv) For pose $r_{2k}$ ($1 \le k \le n_2$),
$$\phi_{r_{2k}} = z_{\phi_{r_{2k}}}^{r_1} + \phi_{r_1}, \quad x_{r_{2k}} = x_{r_1} + A_{2k}, \quad y_{r_{2k}} = y_{r_1} + B_{2k}. \quad (21)$$

Proof: The proof is similar to that of Theorem 1 and is omitted. One of the technical challenges involved is the computation of the inverse of a $(2n+2) \times (2n+2)$ matrix (see the proof of Theorem 1 in [8]) in order to solve the linear equations to get the closed-form formulas in (19) and (20).

Remark 3. The objective functions considered in (7) and (14) treat the orientation variables as real numbers instead of angles. Since the error in angles is limited to $[-\pi, \pi)$, a regularization term might be needed in some situations. The problem of obtaining the correct regularization term is discussed in detail in [10]. Note that for the problem considered in Theorem 2, because there are only two anchor poses and each relative angle is within $[-\pi, \pi)$, the regularization term $2k\pi$ with $k = \pm 1$ will be enough if needed. Once the regularization term is determined, minimizing the new objective function is still a one dimensional problem.

IV. PROPERTIES OF THE ONE DIMENSIONAL PROBLEM

Now we consider the properties of the one dimensional function in Theorem 2 (Theorem 1 is a special case of Theorem 2 with $n = 1$, $n_1 = 0$, $n_2 = 0$).

Theorem 3: Assume that $a > 0$, $\alpha \in [-\pi, \pi)$ are constants, $n \ge 1$ is a positive integer and $\Delta z_{\phi_{r_i}}^{r_1} \in [-\pi, \pi)$, $i = 2, \cdots, n+1$ are all constants. Then we have that
(i) The optimal solution $\phi^*$ of minimizing the function $f(\phi)$ in (15) satisfies $\phi^* \in [-2\pi - \alpha, 2\pi - \alpha]$.
(ii) If $a < 1 + \frac{n}{2}$, then $f(\phi)$ in (15) has one and only one minimum, which is the global minimum.
(iii) If $a \ge 1 + \frac{n}{2}$, then $f(\phi)$ in (15) has at least one and at most three local minima in $[-2\pi - \alpha, 2\pi - \alpha]$, and the exact number of local minima depends on the values of $f'(\phi_1), f'(\phi_2), f'(\phi_3), f'(\phi_4)$, where⁵

$$\phi_1 = -2\pi + \arccos\big(-\tfrac{n+2}{2a}\big) - \alpha, \quad \phi_2 = -\arccos\big(-\tfrac{n+2}{2a}\big) - \alpha, \quad \phi_3 = \arccos\big(-\tfrac{n+2}{2a}\big) - \alpha, \quad \phi_4 = 2\pi - \arccos\big(-\tfrac{n+2}{2a}\big) - \alpha \quad (22)$$

and $f'(\phi)$ is the first derivative of $f(\phi)$ given by

$$f'(\phi) = 2a\sin(\phi + \alpha) + (n+2)\phi + \sum_{i=2}^{n+1}\Delta z_{\phi_{r_i}}^{r_1}. \quad (23)$$

All the 8 different cases are listed in Table II.

⁵ The $\phi_1, \phi_2, \phi_3, \phi_4$ in the one dimensional problem considered in Example 2 are shown in Fig. 4(d).

TABLE II
The number of local minima for $f(\phi)$ in (15) when $a \ge 1 + \frac{n}{2}$.

| $f'(-2\pi - \alpha)$ | <0 | <0 | <0 | <0 | <0 | <0 | <0 | <0 |
| $f'(\phi_1)$ | >0 | >0 | >0 | >0 | >0 | ≤0 | ≤0 | ≤0 |
| $f'(\phi_2)$ | <0 | <0 | <0 | ≥0 | ≥0 | <0 | <0 | <0 |
| $f'(\phi_3)$ | >0 | >0 | ≤0 | >0 | >0 | >0 | >0 | ≤0 |
| $f'(\phi_4)$ | <0 | ≥0 | <0 | <0 | ≥0 | <0 | ≥0 | <0 |
| $f'(2\pi - \alpha)$ | >0 | >0 | >0 | >0 | >0 | >0 | >0 | >0 |
| No. | 3 | 2 | 2 | 2 | 1 | 2 | 1 | 1 |

'1, 2, 3' denote the number of minima in $[-2\pi - \alpha, 2\pi - \alpha]$.

Proof: See Appendix B.

Remark 4. Theorem 3 shows that the global minimum of the one dimensional problem is in $[-2\pi - \alpha, 2\pi - \alpha]$ and there are at most three minima in this interval. So the one dimensional problem can be solved easily using a bisection algorithm.

Example 2. We use the Freiburg Indoor Building 079 dataset [9] to construct an example. There are 989 poses (nodes) and 1314 constraints (edges) in this dataset. The data is separated into two parts to construct two pose graphs. Each pose graph is optimized by performing a Gauss-Newton least squares optimization. The optimized configurations of the two pose graphs and the edges involved are shown in Fig. 4(a) and Fig. 4(b), respectively.

For each pose graph, the resulting optimized pose configuration is the relative pose estimate with respect to the origin of the pose graph. We simply ignore the correlation among the pose estimates and treat the optimized configuration as independent relative pose constraints from the origin to the other poses in the graph. We also assume the covariance matrices of the relative pose constraints are all identity.

Now we have a pose graph with two anchor nodes; one anchor is the origin of the first pose graph ($r_0$) and the other anchor is the origin of the second pose graph ($r_1$). There are $n = 39$ nodes which are linked with both $r_0$ and $r_1$, as shown in Fig. 4(c). Applying Theorem 2, we can get the function $f(\phi)$ as shown in Fig. 4(d). This function has three local minima in $[-2\pi - \alpha, 2\pi - \alpha]$ and the global minimum is $\phi^* = 0.0078$. The objective function value $f(\phi^*) = 0.06009$ is the same as that obtained by solving the constructed pose graph as a $988 \times 3$ dimensional least squares problem. Fig. 4(c) shows the result of computing all the 988 nodes using $\phi^*$ by the closed-form formulas in Theorem 2.

V. RELATED WORK AND DISCUSSIONS

The pose graph optimization problem considered in Theorem 2 has some similarity with the problem of estimating the relative transformation between two coordinate frames given two corresponding point sets [11]. The major differences are that our problem includes relative pose information between
that our problem includes relative pose information between r0 and r1, and we are optimizing all the variables, not only the relative pose.

Recently, an approximate solution to the pose graph SLAM problem using linear estimation methods was proposed in [12] and then in [10]. The first step is to obtain a suboptimal estimate of all the robot orientations using a linear method; all the variables are then estimated using a linear method. In our current work, we are proposing exact solutions instead of approximate solutions: we only need to solve a one-dimensional nonlinear problem to obtain one variable, and all the other variables can then be calculated using closed-form formulas. However, our current results only hold for specific pose graphs.

The closest related work is [13], where it was attempted to compute a closed-form solution for pose-graph SLAM with identity covariance matrices by using cφ = cos(φ) and sφ = sin(φ) as variables with the additional constraint cφ² + sφ² = 1. However, only some initial results on computing the robot positions were derived in [13].

[Figure 4 plots omitted: (a), (b) two pose graphs (nodes and edges, axes X (m) and Y (m)); (c) the optimized pose graph with anchors r0 and r1; (d) f(φ), f′(φ), and f″(φ) over φ, with φ1, φ2, φ3, φ4, −2π − α, and 2π − α marked.]
Fig. 4. Example of Theorem 2 using the Freiburg Indoor Building 079 dataset: pose graph with two anchor poses. (a) The first pose graph. (b) The second pose graph. (c) The optimized pose graph with two anchor nodes (only edges connecting the 39 common nodes are shown: solid edges linked to r0 and dashed edges linked to r1). (d) The objective function of the one-dimensional problem and its derivatives.

While the results in this paper provide some interesting properties of pose graph SLAM problems, they raise a number of questions in SLAM which require further investigation.

In pose graph SLAM, the property that the Jacobian and information matrix are exactly sparse [14] has been exploited in most of the recent efficient pose graph SLAM algorithms. Can we further exploit the special structure of the nonlinearity in pose graph SLAM such that more efficient algorithms can be developed? Or has this structure already been implicitly used in the existing algorithms?

For example, SGD was proposed in [3][6] to solve pose graph SLAM problems. This approach solves the optimization problem by dealing with each constraint one by one and can converge to good solutions most of the time, even when the initial value is very poor. There might be some relationship between the 3-pose 3-constraint problem in our paper and the problem of dealing with one constraint in SGD. Moreover, since our results can be extended to pose graphs with multiple constraints but only two anchor poses, it might be possible to deal with a set of constraints at a time when using SGD, to speed up the algorithm further.

In practice, covariance matrices obtained from odometry and scan matching can have more general forms; however, most of the publicly available datasets have spherical covariances in the format of equation (5). Our results show that spherical covariances can simplify the problem significantly. It might be valuable to introduce some datasets with non-spherical covariance matrices as benchmark datasets for testing different algorithms in the SLAM community.

The pose graph with two anchor nodes shown in Fig. 3 has a similar structure to the problem of combining two optimized pose graphs using the map joining idea [15][16], which can be applied to large-scale SLAM or multi-robot SLAM [17]. However, in each of the optimized pose graphs, all the poses are fully correlated, and they cannot simply be treated as independent relative pose constraints as we did in Example 2.⁶ So the result in Theorem 2 cannot be directly applied to combining two pose graphs, although it can serve as a quick way to obtain a good initial value.

⁶ The "good looking" result in Example 2 is probably due to the high quality of the data and the high quality of the two optimized pose graphs.

VI. CONCLUSION AND FUTURE WORK

This paper provides some insights into the structure of the nonlinearities in the pose graph SLAM optimization problem. In particular, it shows that under the spherical covariance matrix assumption, optimizing a pose graph with only two anchor nodes is equivalent to a one-dimensional optimization problem which has at most three local minima. The globally optimal pose configuration can thus be obtained by simply computing all the minima using a bisection algorithm.

In the next step, we would like to analyze the properties of the SGD algorithm proposed in [3][6], where the constraints are dealt with one at a time. We will also consider the extension of the results to deal with more general covariance matrices
and more general pose graph SLAM. Extension of the work to 3D scenarios will also be a nontrivial contribution.

ACKNOWLEDGMENTS

This work has been supported in part by the Funds of National Science of China (Grant No. 61004061) and the Australian Research Council Discovery project (DP120102786).

APPENDIX

A. Proof of Theorem 1

Since the objective function F(X) is smooth, the optimal solution that minimizes F(X) must be a stationary point with zero gradient, that is, ∇F(X) = 0.

By the definition of A, B in (8), we have

        dA/dφr1 = −B,    dB/dφr1 = A.

Thus for F(X) in (7) with assumption (9), we have

        ∇F(X) = [∂F(X)/∂xr2, ∂F(X)/∂yr2, ∂F(X)/∂φr2, ∂F(X)/∂xr1, ∂F(X)/∂yr1, ∂F(X)/∂φr1]ᵀ

              = 2 [ −(zxr2 − xr2) − (A − xr2 + xr1);
                    −(zyr2 − yr2) − (B − yr2 + yr1);
                    −(zφr2 − φr2) − (zφr2^{r1} − φr2 + φr1);
                    −(zxr1 − xr1) + (A − xr2 + xr1);
                    −(zyr1 − yr1) + (B − yr2 + yr1);
                    Θ ],                                      (24)

where

        Θ = 2φr1 − zφr1 + zφr2^{r1} − φr2 + B(xr2 − xr1) − A(yr2 − yr1).

For any stationary point X, we can simplify the function F(X) using the first 5 equations of ∇F(X) = 0 to get

        F(X) = (zφr1 − φr1)² + 2(zφr2 − φr2)² + 3(zxr2 − xr2)² + 3(zyr2 − yr2)².     (25)

The 1st, 2nd, 4th, and 5th equations of ∇F(X) = 0 can be written as

        M1 X1 = N1,

where X1 = [xr2, yr2, xr1, yr1]ᵀ, and

        M1 = [  2   0  −1   0;
                0   2   0  −1;
               −1   0   2   0;
                0  −1   0   2 ],    N1 = [ zxr2 + A;  zyr2 + B;  zxr1 − A;  zyr1 − B ].   (26)

Then we can compute

        X1 = [xr2; yr2; xr1; yr1] = M1⁻¹ N1
           = [ (1/3)zxr1 + (2/3)zxr2 + (1/3)A;
               (1/3)zyr1 + (2/3)zyr2 + (1/3)B;
               (2/3)zxr1 + (1/3)zxr2 − (1/3)A;
               (2/3)zyr1 + (1/3)zyr2 − (1/3)B ].              (27)

From the 3rd equation of ∇F(X) = 0, we have

        φr2 = (1/2)(zφr2 + zφr2^{r1} + φr1).                  (28)

Now we have obtained the last 5 equations in (13).

Substituting (27) and (28) into (25), and using the definition of A, B, we have

        F(X) = (zφr1 − φr1)² + 2(zφr2 − φr2)²
               + 3((1/3)zxr1 − (1/3)zxr2 + (1/3)A)² + 3((1/3)zyr1 − (1/3)zyr2 + (1/3)B)²
             = (zφr1 − φr1)² + (1/2)(−zφr2 + zφr2^{r1} + φr1)²
               + (1/3)(zxr2 − zxr1 − A)² + (1/3)(zyr2 − zyr1 − B)²
             = (zφr1 − φr1)² + (1/2)(−zφr2 + zφr2^{r1} + φr1)²
               + (1/3)[A² + B² + (zxr2 − zxr1)² + (zyr2 − zyr1)²]
               − (2/3) [zxr2 − zxr1; zyr2 − zyr1]ᵀ [cφr1  −sφr1; sφr1  cφr1] [zxr2^{r1}; zyr2^{r1}].   (29)

By (11) and (12), we have

        zxr2^{r1} = (zxr2 − zxr1) c_{zφr1} + (zyr2 − zyr1) s_{zφr1} + Δzxr2^{r1},
        zyr2^{r1} = −(zxr2 − zxr1) s_{zφr1} + (zyr2 − zyr1) c_{zφr1} + Δzyr2^{r1},
        zφr2^{r1} = zφr2 − zφr1 + Δzφr2^{r1}.                 (30)

Substituting (30) into (29), denoting

        φ = φr1 − zφr1,                                       (31)

and noticing that A² + B² = (zxr2^{r1})² + (zyr2^{r1})², we can have

        F(X) = φ² + (1/2)(φ + Δzφr2^{r1})² − 2d cos φ − 2δb cφr1 + 2δa sφr1 + b
             = φ² + (1/2)(φ + Δzφr2^{r1})² − 2(d + q) cos φ + 2p sin φ + b
             = φ² + (1/2)(φ + Δzφr2^{r1})² − 2a cos(φ + α) + b,                        (32)

where d, δa, δb, b, p, q, a, α are given by (11) and (12).

Now we have proved that F(X) = f(φ) for all the stationary points X. Furthermore, it can be proved that for stationary points X, ∇F(X) = 0 is equivalent to f′(φ) = 0, and ∇²F(X) > 0 is equivalent to f″(φ) > 0. The details are omitted due to the space limit.

Thus the minimization of F(X) in (7) is equivalent to minimizing f(φ) in (10). This completes the proof.

B. Proof of Theorem 3

First, note that f(φ) can be divided into two parts,

        f1(φ) = φ² + (1/2) Σ_{i=2}^{n+1} (φ + Δzφri^{r1})²    (33)

and

        f2(φ) = −2a cos(φ + α) + b.                           (34)

Obviously, f1 is convex and the minimum of f1 is φmin = −(1/(n+2)) Σ_{i=2}^{n+1} Δzφri^{r1}. Since Δzφri^{r1} ∈ [−π, π], we have φmin ∈ [−nπ/(n+2), nπ/(n+2)] ⊂ (−π, π). Then f1(φ) must be monotone decreasing in (−∞, −π] and monotone increasing in [π, ∞).

On the other hand, f2 is periodic and its minimal value is achieved at −α, −2π − α, and 2π − α. Since α ∈ [−π, π], we have −2π − α ≤ −π and 2π − α ≥ π. So f1(φ) is monotone
decreasing in (−∞, −2π − α] and is monotone increasing in [2π − α, +∞). Thus we have

        f(φ) ≥ f(2π − α),     ∀φ ≥ 2π − α,
        f(φ) ≥ f(−2π − α),    ∀φ ≤ −2π − α,                   (35)

and thus the global minimum of f(φ) is in the interval [−2π − α, 2π − α].

Second, denote e = Σ_{i=2}^{n+1} Δzφri^{r1} ∈ [−nπ, nπ]; then

        f′(φ) = 2a sin(φ + α) + (n + 2)φ + e                  (36)

and

        f″(φ) = 2a cos(φ + α) + (n + 2).                      (37)

If a < 1 + n/2, we have f″(φ) > 0 for any φ. Thus f(φ) is convex and there is one and only one minimum.

Third, consider the case when a ≥ 1 + n/2. Setting f″(φ) = 0, we get four roots φ1, φ2, φ3, φ4 as given in (22). Note that φ1, φ2, φ3, φ4 divide the interval [−2π − α, 2π − α] into 5 intervals in each of which f′ is monotone, i.e., [−2π − α, φ1), [φ1, φ2), [φ2, φ3), [φ3, φ4), and [φ4, 2π − α]. So the number of minima of f(φ) can be analyzed by observing the values f′(−2π − α), f′(φ1), f′(φ2), f′(φ3), f′(φ4), f′(2π − α). For example, if f′(φ4) < 0 and f′(2π − α) > 0, then there is a minimum in the interval [φ4, 2π − α] (see Fig. 4(d)).

Note that e ∈ [−nπ, nπ], so we have

        f′(−2π − α) = (2 + n)(−2π − α) + e
                    = −(2 + n)(π + α) − (2 + n)π + e
                    ≤ −(2 + n)π + e ≤ −2π < 0,

        f′(2π − α) = (2 + n)(2π − α) + e
                   = (2 + n)(π − α) + (2 + n)π + e
                   ≥ (2 + n)π + e ≥ 2π > 0.

Since f′(φ1) ≤ 0 ⇒ f′(φ2) < 0, f′(φ2) ≥ 0 ⇒ f′(φ3) > 0, and f′(φ3) ≤ 0 ⇒ f′(φ4) < 0, there are only 8 possible cases, which are listed in Table II. So there are at most three minima in [−2π − α, 2π − α]. This completes the proof.

REFERENCES

[1] F. Lu and E. Milios. Globally consistent range scan alignment for environment mapping. Autonomous Robots, 4(4): 333-349, 1997.
[2] U. Frese, P. Larsson, and T. Duckett. A multilevel relaxation algorithm for simultaneous localization and mapping. IEEE Transactions on Robotics, 21(2): 196-207, 2005.
[3] E. Olson, J. Leonard, and S. Teller. Fast iterative optimization of pose graphs with poor initial estimates. IEEE International Conference on Robotics and Automation (ICRA), pp. 2262-2269, 2006.
[4] K. Konolige, G. Grisetti, R. Kummerle, B. Limketkai, and R. Vincent. Efficient sparse pose adjustment for 2D mapping. IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 22-29, 2010.
[5] F. Dellaert, J. Carlson, V. Ila, K. Ni, and C. E. Thorpe. Subgraph-preconditioned conjugate gradients for large scale SLAM. IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 2566-2571, 2010.
[6] G. Grisetti, C. Stachniss, and W. Burgard. Non-linear constraint network optimization for efficient map learning. IEEE Transactions on Intelligent Transportation Systems, 10(3): 428-439, 2009.
[7] X. S. Zhou and S. I. Roumeliotis. Determining the robot-to-robot 3D relative pose using combinations of range and bearing measurements: 14 minimal problems and closed-form solutions to three of them. IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 2983-2990, 2010.
[8] S. Huang, H. Wang, U. Frese, and G. Dissanayake. On the number of local minima to the point feature based SLAM problem. IEEE International Conference on Robotics and Automation (ICRA), pp. 2074-2079, 2012.
[9] R. Kummerle, B. Steder, C. Dornhege, M. Ruhnke, G. Grisetti, C. Stachniss, and A. Kleiner. SLAM benchmarking webpage. http://kaspar.informatik.uni-freiburg.de/~slamEvaluation/datasets.php, 2009.
[10] L. Carlone, R. Aragues, J. Castellanos, and B. Bona. A linear approximation for graph-based simultaneous localization and mapping. Robotics: Science and Systems (RSS), 2011.
[11] K. S. Arun, T. S. Huang, and S. D. Blostein. Least squares fitting of 3-D point sets. IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-9(5): 698-700, 1987.
[12] J. L. Martinez, J. Morales, A. Mandow, and A. Garcia-Cerezo. Incremental closed-form solution to globally consistent 2D range scan mapping with two-step pose estimation. The 11th IEEE International Workshop on Advanced Motion Control, pp. 252-257, 2010.
[13] D. L. Rizzini. Towards a closed-form solution of constraint networks for maximum likelihood mapping. The 14th International Conference on Advanced Robotics (ICAR), pp. 1-5, 2009.
[14] V. Ila, J. M. Porta, and J. Andrade-Cetto. Information-based compact pose SLAM. IEEE Transactions on Robotics, 26(1): 78-93, 2010.
[15] K. Ni, D. Steedly, and F. Dellaert. Tectonic SAM: Exact, out-of-core, submap-based SLAM. IEEE International Conference on Robotics and Automation (ICRA), pp. 1678-1685, 2007.
[16] S. Huang, Z. Wang, G. Dissanayake, and U. Frese. Iterated SLSJF: A sparse local submap joining algorithm with improved consistency. Australasian Conference on Robotics and Automation, 2008.
[17] B. Kim, M. Kaess, L. Fletcher, J. Leonard, A. Bachrach, N. Roy, and S. Teller. Multiple relative pose graphs for robust cooperative mapping. IEEE International Conference on Robotics and Automation (ICRA), pp. 3185-3192, 2010.
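Two steps of the analysis above lend themselves to a direct numerical check. The sketch below is our illustration only, not the authors' code, and all function and variable names are hypothetical: it first solves the stationarity system (26) to reproduce the closed-form positions of Eq. (27), and then implements the bisection search over the intervals on which f′ is monotone, as used in the proof of Theorem 3, to locate all candidate minima of f(φ).

```python
import math
import numpy as np

# --- Eq. (26)-(27): closed-form positions from the stationarity conditions ---
M1 = np.array([[ 2,  0, -1,  0],
               [ 0,  2,  0, -1],
               [-1,  0,  2,  0],
               [ 0, -1,  0,  2]], dtype=float)

def positions(zx_r2, zy_r2, zx_r1, zy_r1, A, B):
    """Solve M1 X1 = N1 for X1 = [x_r2, y_r2, x_r1, y_r1]; by Eq. (27) this
    equals [zx_r1/3 + 2 zx_r2/3 + A/3, ..., 2 zy_r1/3 + zy_r2/3 - B/3]."""
    N1 = np.array([zx_r2 + A, zy_r2 + B, zx_r1 - A, zy_r1 - B])
    return np.linalg.solve(M1, N1)

# --- Theorem 3: locate all candidate minima of f(phi) by bisection on f' ---
def global_minimum(a, alpha, deltas):
    """Minimize f(phi) = phi^2 + 0.5*sum_i (phi + d_i)^2 - 2a cos(phi + alpha)
    (the additive constant b is dropped; it does not affect the argmin).
    The global minimum lies in [-2pi - alpha, 2pi - alpha]; f' is monotone
    between the roots of f''(phi) = 2a cos(phi + alpha) + (n + 2), so each
    upward zero crossing of f' (a local minimum) is isolated by bisection."""
    n, e = len(deltas), sum(deltas)
    f = lambda p: p*p + 0.5*sum((p + d)**2 for d in deltas) - 2*a*math.cos(p + alpha)
    df = lambda p: 2*a*math.sin(p + alpha) + (n + 2)*p + e   # Eq. (36)
    lo, hi = -2*math.pi - alpha, 2*math.pi - alpha
    knots = [lo, hi]                       # convex case: f'' > 0 everywhere
    if a >= 1 + n/2:                       # f'' has the four roots of Eq. (22)
        c = math.acos(-(n + 2)/(2*a))
        knots = [lo, -2*math.pi + c - alpha, -c - alpha,
                 c - alpha, 2*math.pi - c - alpha, hi]
    candidates = list(knots)
    for l, r in zip(knots, knots[1:]):
        if df(l) < 0 < df(r):              # f' crosses zero upward here
            for _ in range(100):           # bisection on f'
                m = 0.5*(l + r)
                l, r = (m, r) if df(m) < 0 else (l, m)
            candidates.append(0.5*(l + r))
    return min(candidates, key=f)
```

A brute-force grid search over [−2π − α, 2π − α] returns the same minimizer, which is a cheap sanity check for both the convex case (a < 1 + n/2) and the multi-minimum case.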
Probabilistic Modeling of Human Movements for Intention Inference

Zhikun Wang¹,², Marc Peter Deisenroth², Heni Ben Amor², David Vogt³, Bernhard Schölkopf¹, Jan Peters¹,²
¹ MPI for Intelligent Systems, Spemannstr. 38, 72076 Tübingen, Germany.
² TU Darmstadt, Hochschulstr. 10, 64289 Darmstadt, Germany.
³ TU Freiberg, Bernhard-von-Cotta Str. 2, 09596 Freiberg, Germany.
{zhikun, bs}@tuebingen.mpg.de   {marc, amor, peters}@ias.tu-darmstadt.de

Abstract—Inference of human intention may be an essential step towards understanding human actions and is hence important for realizing efficient human-robot interaction. In this paper, we propose the Intention-Driven Dynamics Model (IDDM), a latent variable model for inferring unknown human intentions. We train the model based on observed human movements/actions. We introduce an efficient approximate inference algorithm to infer the human's intention from an ongoing movement. We verify the feasibility of the IDDM in two scenarios, i.e., target inference in robot table tennis and action recognition for interactive humanoid robots. In both tasks, the IDDM achieves substantial improvements over state-of-the-art regression and classification.

[Two photographs omitted: (a) Robot table tennis. (b) Interactive humanoid robot.]
Fig. 1: Two representative HRI scenarios where intention inference plays an important role: (a) target inference in robot table tennis games. (b) action recognition for human-robot interaction.

I. INTRODUCTION

Recent advances in sensors and algorithms allow for robots with improved perception abilities. For example, robots can now recognize human poses in real time using depth cameras [22], which can enhance their ability to interact with humans. However, effective perception alone may not be sufficient for Human-Robot Interaction (HRI), since the robot's reactions should depend on understanding the human's action. An important understanding problem is inferring others' intentions (also referred to as goals, targets, desires, or plans) [23], which humans heavily rely on, for example, in sports, games, and social activities. Humans can learn and improve the ability of prediction by training. For example, skilled tennis players are usually trained to possess better anticipation than amateurs. This observation raises the question of how a robot can also learn to infer the underlying intention from human movements.

In order to infer the intention from an ongoing movement, we first address its inverse problem, i.e., modeling how the movement is governed by the intention. This idea is built upon the hypothesis that a human movement usually follows a goal-directed policy [4, 10]. The human movement considered here is represented by a time series of observations. This makes discrete-time dynamics models a straightforward choice for movement modeling and intention inference. In a robotic scenario, we often rely on high-dimensional and noisy sensor data. However, the intrinsic dimensionality is typically much smaller. Therefore, we seek a low-dimensional latent representation of the relevant information in the data, and then model how the intention governs the dynamics in this low-dimensional state space. Jointly considering both the low-dimensional representation and the latent dynamics leads to smooth trajectories in latent space with respect to the intention.

In this paper, we exploit this smoothness property and propose the Intention-Driven Dynamics Model (IDDM), in which the dynamics in the latent states are driven by the intention of the human action/behavior. The IDDM can simultaneously find a good low-dimensional representation of high-dimensional and noisy observations and describe the dynamics in the low-dimensional latent space. Using the learned model, human intention can be inferred from an ongoing movement. As exact inference is not tractable in our model, we propose an approximate inference algorithm and show that it can efficiently infer the intention of a human partner.

To verify the feasibility of the proposed methods, we discuss two representative scenarios where intention inference plays an important role in human-robot interactions:

(1) Target inference in robot table tennis. We consider human-robot table tennis games, as shown in Fig. 1a. The robot's hardware constraints impose strong limitations on its flexibility. It requires sufficient time to execute a ball-hitting plan: movement initiation to an appropriate preparation pose is needed before the opponent returns the ball, to achieve the required velocity for returning the ball [27]. The robot player uses different preparation poses for forehand and backhand hitting plans. Hence, it is necessary to choose between them based on inference of the opponent's target location.

(2) Action recognition for an interactive humanoid robot. In this setting, we use our technique to improve the interaction capabilities of a NAO humanoid robot, as shown in Fig. 1b. In
order to realize natural and compelling interactions, the robot needs to correctly recognize the actions of its human partner. This ability, in turn, allows it to act in a proactive manner. We show that the IDDM has the potential to identify the intention of an action from movements in a simplified scenario.

The paper is organized as follows. We present the IDDM and address the problem of training in Section II. In Section III, we study efficient approximate algorithms for intention inference. We verify the feasibility of the proposed methods in the two scenarios in Sections IV and V. Finally, we provide a brief review of related work in Section VI, and conclude in Section VII.¹

¹ Supplementary technical details, demos and results can be found at http://www.ias.tu-darmstadt.de/Research/ProbabilisticMovementModeling

[Two graphical-model diagrams omitted: (a) GPDM, a Markov chain x1 → x2 → x3 → · · · with observations z1, z2, z3; (b) IDDM, the same chain with the intention g as an additional input to every transition.]
Fig. 2: Graphical models of the Gaussian process dynamical model (GPDM) and the proposed intention-driven dynamics model (IDDM). The proposed model explicitly incorporates the intention as an input to the transition function.

II. INTENTION-DRIVEN DYNAMICS MODEL

We propose the Intention-Driven Dynamics Model (IDDM), which is an extension of the Gaussian Process Dynamical Model (GPDM) [25]. The GPDM provides a nonparametric approach to learning the transition function in the latent state space and the measurement mapping from states to observations simultaneously. As shown in Fig. 2a, the transition function in the GPDM is only determined by the latent state. However, in the cases considered in this paper, the underlying intention, as an important drive of human movements, can hardly be discovered directly from the observations or the estimated latent states. Considering that the dynamics can be substantially different when the actions are based on different intentions, we propose the IDDM, the graphical model of which is shown in Fig. 2b. The IDDM explicitly incorporates the intention into the transition function in the latent state space. The dynamics model is inspired by the hypothesis that the human action is directed by the goal [4, 10]. For example, in table tennis, the player swings the racket in order to return the ball to the intended target.

The observations of a movement comprise a time series of observations z1:T ≜ [z1, . . . , zT]. For notational simplicity and without loss of generality, we consider that all the observations of movements have the same length T. In the proposed generative model, we assume that the observations zt are generated by a measurement mapping from the d-dimensional state space X to the D-dimensional observation space Z, given by

        zt = W z̃t,    z̃t = h(xt) + nz,t,    nz,t ∼ N(0, Σz,t),        (1)

where W = diag(w1, . . . , wD), i.e., the i-th dimension of the observations zt is the scaled i-th dimension of the outputs z̃t with a parameter wi. The scaling parameters W allow for dealing with raw features that are measured in different units, such as positions and velocities. We place a Gaussian process (GP) prior distribution on every dimension of the unknown function h [20], which can be marginalized out during learning and inference. A GP is fully specified by a mean function mz(·) and a positive semidefinite covariance (kernel) function kz(·, ·). The predictive probability of the observations zt is given by a Gaussian distribution zt ∼ N(mz(xt), Σz(xt)).

We consider first-order Markovian dynamics, see Fig. 2b, which is modeled by a latent transition function f, given by

        xt = f(xt−1, g) + nx,t,    nx,t ∼ N(0, Σx,t).                  (2)

The state at time t is determined by the latent state at time t − 1 as well as by the intention g. We place a GP prior GP(mx(·), kx(·, ·)) on every dimension of f and marginalize it out. Then, the predictive distribution of the latent state xt conditioned on xt−1 and the intention g is a Gaussian distribution given by xt ∼ N(mx(xt−1, g), Σx(xt−1, g)).

To summarize, in the proposed IDDM, one set of GPs models the transition function in the latent space conditioned on the intention g. A second set of GPs models the measurement mapping from the latent states x to the observations z.

A. Covariance Functions

For simplicity, we use GP prior mean functions that are zero everywhere, i.e., mz(·) ≡ 0 and mx(·) ≡ 0. Hence, the model is determined by the covariance functions kz(·, ·) and kx(·, ·), which will be motivated in the following.

The underlying dynamics of human motion are usually nonlinear. To account for nonlinearities, we use a flexible Gaussian tensor-product covariance function with automatic relevance determination for the dynamics, i.e.,

        kx([x, g], [x′, g′]; α) = kx(x, x′; α) kx(g, g′; α)            (3)
            = α1 exp(−½‖x − x′‖²_{Λx} − ½‖g − g′‖²_{Λg}) + α4 δ_{x,x′} δ_{g,g′},

where the shorthand notation ‖a‖²_Λ ≜ aᵀΛa, the diagonal matrices Λx = diag(α21, . . . , α2d) > 0 and Λg = diag(α31, . . . , α3|g|) > 0 weight the corresponding input dimensions, α is the set of all hyperparameters, and δ is the Kronecker delta function. When the intention g is a discrete variable, we set the hyperparameter α3 = ∞ such that kx(g, g′; α) ≡ δ_{g,g′}.

The covariance function for the measurement mapping from the state space to the observation space is chosen depending on the task. For example, the GPDM in [25] uses an isotropic Gaussian covariance function parameterized by the hyperparameters β,

        kz(x, x′; β) = exp(−(β1/2)‖x − x′‖²) + β2⁻¹ δ_{x,x′},          (4)
as intuitively the latent states that generate human poses lie on a nonlinear manifold. Note that the hyperparameters β do not contain the signal variance, which is parameterized by the scaling factors W. When applying the model to target prediction in table tennis games, we use the linear kernel

        kz(x, x′; β) = xᵀx′ + β1⁻¹ δ_{x,x′},                           (5)

as the observations are already low-dimensional, but subject to substantial noise. Empirical comparisons of different covariance functions for the measurement mapping, i.e., the isotropic Gaussian in Eq. (4) and the linear covariance function in Eq. (5), will be shown in Section IV.

B. Learning the IDDM

The proposed model can be learned from a training data set D = {Z, g} of J movements and corresponding intentions. Each movement Zj consists of a time series of observations given by Zj = [zj1, . . . , zjT]ᵀ. We construct the overall observation matrix Z by vertically concatenating the observation matrices Z1, . . . , ZJ, and the overall intention matrix g from g1, . . . , gJ. In the robot table tennis example, one movement corresponds to a stroke of the opponent, represented by a time series of observed racket and ball configurations. We assume the intention g can be obtained in the training data, for example by post-processing the data. In the robot table tennis example, the observed intention corresponds to the target to which the opponent returns the ball (see Fig. 3 for an illustration).

Similar to the work by [25], we find the maximum a posteriori (MAP) estimate of the latent states X. Alternative learning methods and an empirical comparison can be found in [24, 5]. Given the model hyperparameters, the posterior distribution of latent states X can be decomposed into the probability of observations given states and the probability of states given intention:

        p(X|Z, g, α, β, W) ∝ p(Z|X, β, W) p(X|g, α),                   (6)

both obtained by the GP marginal likelihood [20].

The GP marginal probability of the observations Z given the latent states X is given by a Gaussian distribution

        p(Z|X, β, W) = |W|^M / √((2π)^{MD} |Kz|^D) · exp(−½ tr(Kz⁻¹ Z W² Zᵀ)),   (7)

where M ≜ JT is the length of the observations Z, and Kz is the covariance matrix of X computed by the kernel function kz(·, ·).

Given the intention g, the sequence of latent states X has a Gaussian probability

        p(X|g, α) = p(X1) p(X2:T | X1:T−1, g, α)
                  = p(X1) / √((2π)^{md} |Kx|^d) · exp(−½ tr(Kx⁻¹ X2:T X2:Tᵀ)),   (8)

where Xindices is constructed from the vertically concatenated state matrices x1indices, . . . , xJindices, m ≜ J(T − 1) is the length of X2:T, and Kx is the covariance matrix of X1:T−1 computed by the kernel function kx(·, ·). We use a Gaussian prior distribution on the initial states X1.

Algorithm 1: The algorithm learns the model hyperparameters α, β, and W. They are obtained by maximizing the marginal likelihood using the Monte Carlo EM algorithm.
    Input: Data D = {Z, g}
    Input: Number of EM iterations L
    Output: Model hyperparameters Θ = {α, β, W}
    1 for l ← 1 to L do
    2     for i ← 1 to I do
    3         Initialize X by its MAP estimate;
    4         Draw sample X(i) from p(X|Z, g, Θ) using HMC;
    5     Maximize (1/I) Σ_{i=1}^{I} p(Z, X(i)|g, Θ) w.r.t. Θ using SCG;

Based on Eqs. (7)-(8), the MAP estimate of the states is learned by maximizing the posterior in Eq. (6). We minimize the negative log-posterior

        L(X) = (D/2) log|Kz| + ½ tr(Kz⁻¹ Z W² Zᵀ) − M log|W|
               + (d/2) log|Kx| + ½ tr(Kx⁻¹ X2:T X2:Tᵀ) + ½ tr(X1 X1ᵀ) + const

with respect to the states X, using the Scaled Conjugate Gradient (SCG) method.

C. Learning Hyperparameters

A reliable approach to learning the hyperparameters Θ = {α, β, W} is to maximize the marginal likelihood p(Z|g, Θ) = ∫ p(Z, X|g, Θ) dX, which can be achieved approximately by using the Expectation-Maximization (EM) algorithm. The EM algorithm computes the posterior distribution of states q(X) = p(X|Z, g, Θ) in the Expectation (E) step, and updates the hyperparameters by maximizing the expected data likelihood Eq[p(Z, X|g, Θ)] in the Maximization (M) step. However, the posterior distribution q(X) is difficult to obtain in the IDDM. Following [25], we draw samples of the states X(1), . . . , X(I) from the posterior distribution using Hybrid Monte Carlo (HMC) [2], and hence the data likelihood is estimated via Monte Carlo integration given by Eq[p(Z, X|g, Θ)] ≈ (1/I) Σ_{i=1}^{I} p(Z, X(i)|g, Θ). In the M step, we use SCG to update the hyperparameters. In practice, we choose the number of samples I = 100 and the number of EM iterations L = 10. Although this procedure, as described in Algorithm 1, is time demanding, this is not a big issue in practice as we learn the hyperparameters off-line.

The model also depends on the hyperparameter d, the dimensionality of the latent state space. Choosing an appropriate d is important. If the dimensionality is too small, the latent states cannot recover the observations, which leads to significant prediction errors. On the other hand, a high-dimensional state space results in redundancy and can cause a drop in performance and computational efficiency.
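Two pieces of Section II translate directly into code: the tensor-product ARD kernel of Eq. (3) and the negative log-posterior L(X) of Section II-B. The sketch below is our illustration only, not the authors' implementation; it is simplified to a single movement, the argument names are hypothetical, and Kz, Kx are assumed to be precomputed kernel matrices.

```python
import numpy as np

def k_x(x, g, x2, g2, alpha1, lambda_x, lambda_g, alpha4):
    """Tensor-product ARD dynamics kernel, Eq. (3):
    alpha1 * exp(-0.5||x - x2||^2_Lx - 0.5||g - g2||^2_Lg) + alpha4 * delta,
    with diagonal Lx = diag(lambda_x) and Lg = diag(lambda_g)."""
    x, g, x2, g2 = map(np.atleast_1d, (x, g, x2, g2))
    sq = np.sum(lambda_x * (x - x2)**2) + np.sum(lambda_g * (g - g2)**2)
    same = float(np.array_equal(x, x2) and np.array_equal(g, g2))
    return alpha1 * np.exp(-0.5 * sq) + alpha4 * same

def neg_log_posterior(X, Z, W, Kz, Kx):
    """L(X) up to an additive constant, combining the GP marginals (7)-(8):
    D/2 log|Kz| + 1/2 tr(Kz^-1 Z W^2 Z') - M log|W|
    + d/2 log|Kx| + 1/2 tr(Kx^-1 X_{2:T} X_{2:T}') + 1/2 tr(x_1 x_1')."""
    M, D = Z.shape
    d = X.shape[1]
    X2T, x1 = X[1:], X[:1]          # x_2..x_T, and x_1 with a Gaussian prior
    _, logdet_Kz = np.linalg.slogdet(Kz)
    _, logdet_Kx = np.linalg.slogdet(Kx)
    return (0.5 * D * logdet_Kz
            + 0.5 * np.trace(np.linalg.solve(Kz, Z @ W**2 @ Z.T))
            - M * np.sum(np.log(np.diag(W)))
            + 0.5 * d * logdet_Kx
            + 0.5 * np.trace(np.linalg.solve(Kx, X2T @ X2T.T))
            + 0.5 * np.trace(x1 @ x1.T))
```

For a discrete intention, the paper sets α3 = ∞, which reduces the intention factor of the kernel to the Kronecker delta; in code, that amounts to multiplying the state factor by float(g == g2) instead of using lambda_g.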
Nevertheless, model selection, for example based on cross- A. Filtering and Smoothing in the IDDM
validation, is helpful and should be conducted before learning Given a prior distribution p(x1 ) and the current estimate
and applying the model. of g, we are interested in computing the posterior distribution
To summarize, the model M = {X, Θ} can be learned p(x1:T |z1:T , g0 ).
from a data set D. It is used to infer the unobserved intention A Gaussian approximation of the joint distribution
of a new ongoing movement. p(x1:T |z1:T , g0 ) is computed explicitly by computing
III. A PPROXIMATE I NTENTION I NFERENCE the marginals p(xt |z1:T , g0 ) and the cross-covariances
p(xt−1 , xt |z1:T , g0 ). These steps yield a Gaussian approxi-
After learning the model M from the data set D, the
mation with a block-tri-diagonal covariance matrix. For the
stationary intention g can be inferred from a sequence of
computations, we employ forward-backward smoothing (GP-
new observations z1:T at test time. In this section, we omit
RTSS) in GPDM, see [9].
the model M and the data set D for notation simplicity.
The computation of the joint distributions
Exact inference is not tractable as the intention g is connected
p(xt−1 , xt |z1:t−1 , g0 ) and p(xt , zt |z1:t−1 , g0 ) suffices
to all the unobserved latent states x1:T in the graphical
for forward-backward smoothing in the IDDM, as the
model. Therefore, we use EM algorithm to find the maximum-
Gaussian filter/smoothing updates can be expressed solely
likelihood estimate (MLE) of the intention g∗ , given by
 in terms of means and (cross-)covariances of these joint
g∗ = arg maxg p(z1:T |g) = p(z1:T , x1:T |g)dx1:T , (9) distributions [7].
We outline the computations required for the Gaussian
which is obtained by maximizing the marginal likelihood with approximation of p(xt−1 , xt |z1:t−1 , g0 ) using moment match-
the latent states x1:T integrated out. To find the MLE, we apply ing; the Gaussian approximation of p(xt , zt |z1:t−1 , g0 ) fol-
the EM algorithm. lows analogously.
In the E step, we calculate the posterior distribution of the We approximate the joint distribution p(xt−1 , xt |z1:t−1 , g0 )
latent states p(x1:T |z1:T , g0 ) based on the current estimate of by the Gaussian
the intention g0 . This problem corresponds to the filtering and .1 2 /
μxt−1|t−1 Σxt−1|t−1 Σxt−1,t|t−1
smoothing problems in nonlinear dynamical systems. For the N , . (13)
GPDM and the IDDM, approximate filtering and smoothing is μxt|t−1 Σxt,t−1|t−1 Σxt|t−1
necessary, using particle filtering or Gaussian approximations, for example. In [13], particle filters (GP-PF), extended Kalman filters (GP-EKF), and unscented Kalman filters (GP-UKF) for approximate filtering with GPs were proposed. Assumed density filtering (GP-ADF) for efficient GP filtering and smoothing based on moment matching was proposed in [8, 9]. We follow the work in [8, 9], as the approximated posterior distribution q(x_{1:T}) ≈ p(x_{1:T}|z_{1:T}, g_0) provides credible error bars, i.e., it is robust to incoherent estimates. We describe our approach to approximate filtering and smoothing in the intention-driven dynamics model in Section III-A.

In the M step, we update the intention based on the approximated posterior distribution q(x_{1:T}) ≈ p(x_{1:T}|z_{1:T}, g_0) by maximizing the expected data log-likelihood

  Q(g|g_0) = E_q[log p(z_{1:T}, x_{1:T} | g)]
           = E_q[log p(z_{1:T} | x_{1:T})] + E_q[log p(x_{1:T} | g)],   (10)

where the first term, Q_z, is independent of the intention g and hence can be omitted; decomposing the second term, Q_x(g), leads to

  arg max_g Q_x(g) = arg max_g Σ_{t=1}^{T−1} Q_t(g),   (11)

where

  Q_t(g) = ∫∫ q(x_t, x_{t+1}) log p(x_{t+1} | x_t, g) dx_{t+1} dx_t.   (12)

This optimization problem can be solved approximately. We present the M step in detail in Sec. III-B.

We use the shorthand notation a^d_{b|c}, where a = μ denotes the mean and a = Σ denotes the covariance, b denotes the time step of interest, c denotes the time step up to which we consider measurements, and d ∈ {x, z} denotes either the latent space (x) or the observed space (z).

Without loss of generality, in Eq. (13) we assume that the marginal distribution N(x_{t−1} | μ^x_{t−1|t−1}, Σ^x_{t−1|t−1}) is known (it corresponds to the filter distribution at time t−1). We compute the remaining elements of Eq. (13) explicitly using moment matching.

Using iterated expectations, the a-th dimension of the mean of the marginal p(x_t | z_{1:t−1}, g_0) is

  (μ^x_{t|t−1})_a = E_{x_{t−1}}[ E_{f_a}[f_a(x_{t−1}, g_0) | x_{t−1}] | g_0, z_{1:t−1} ]   (14)
                 = ∫ E_{f_a}[f_a(x_{t−1}, g_0) | x_{t−1}] p(x_{t−1} | z_{1:t−1}, g_0) dx_{t−1},

where we plug in the posterior GP mean function for the inner expectation. Writing out the posterior mean function and defining γ_a := K_x^{−1} y_a, with y_{ai}, i = 1, …, M, being the training targets of the GP with target dimension a, we obtain

  (μ^x_{t|t−1})_a = q^T γ_a,   (15)

where

  q = ∫ k_x([x_{t−1}, g_0], X̄) p(x_{t−1} | z_{1:t−1}, g_0) dx_{t−1}.   (16)

Here, X̄ denotes the set of the M GP training inputs (x_{ij}, g_j), i = 1, …, N, j = 1, …, J. The intention g_j is assumed stationary within a single trajectory x_{1j}, …, x_{Nj}. For notational convenience, we assume that all training sequences are N time steps long.
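The outer expectation in Eq. (14) is an integral of the GP posterior mean function against a Gaussian. As a quick numerical illustration (not the paper's analytic moment matching), it can be approximated by Monte Carlo; the linear mean function below is a hypothetical stand-in chosen so the exact answer is known:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for the GP posterior mean function: a linear map,
# chosen so that the exact expectation E[m(x)] = A @ mu is known in closed form.
A = np.array([[0.9, 0.1],
              [-0.2, 0.8]])

# Gaussian filter distribution p(x_{t-1} | z_{1:t-1}, g_0) = N(mu, Sigma)
mu = np.array([1.0, -0.5])
Sigma = np.array([[0.3, 0.1],
                  [0.1, 0.2]])

# Monte Carlo approximation of the outer expectation in Eq. (14)
samples = rng.multivariate_normal(mu, Sigma, size=200_000)
mc_mean = (samples @ A.T).mean(axis=0)

print(mc_mean)  # close to A @ mu
```

For a Gaussian kernel, the paper solves this integral exactly (Eqs. (15)–(16)); the sampling here is only a sanity check of what the moment-matched mean computes.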

If k_x is a Gaussian kernel, we can solve the integral in Eq. (16) analytically and obtain the vector q with entries q_i, i = 1, …, M,

  q_i = σ_f^2 |Σ^x_{t−1|t−1} Λ_x + I|^{−1/2} exp(−½ ζ_i^T Ω^{−1} ζ_i),
  ζ_i = x_i − μ^x_{t−1|t−1},   Ω = Σ^x_{t−1|t−1} + Λ_x^{−1}.

The computations of the (cross-)covariances Σ^x_{t−1,t|t−1} and Σ^x_{t|t−1} in Eq. (13) follow the same scheme: we have to solve integrals of a Gaussian prior p(x_{t−1}|z_{1:t−1}, g_0) times two Gaussian kernels (instead of a single one for the mean, see Eq. (16)). The computations are performed analytically but are omitted here. We refer to [8, 9] for details.

With the Gaussian kernel as in Eq. (4), a Gaussian approximation to the second joint p(x_t, z_t | z_{1:t−1}, g_0) can be computed analogously. For the linear measurement kernel in Eq. (5), we can also solve the integration corresponding to Eq. (16) analytically and obtain

  q = β_1 X̄ μ^x_{t−1|t−1}.   (17)

Due to space restrictions, we omit further details, but the required computations are straightforward.

Following [7], we obtain the updates for the latent state posteriors (filter and smoothing distributions) as

  μ^x_{t|t}   = μ^x_{t|t−1} + Σ^{xz}_{t|t−1} (Σ^z_{t|t−1})^{−1} (z_t − μ^z_{t|t−1}),   (18)
  Σ^x_{t|t}   = Σ^x_{t|t−1} − Σ^{xz}_{t|t−1} (Σ^z_{t|t−1})^{−1} Σ^{zx}_{t|t−1},   (19)
  μ^x_{t−1|T} = μ^x_{t−1|t−1} + J_{t−1} (μ^x_{t|T} − μ^x_{t|t−1}),   (20)
  Σ^x_{t−1|T} = Σ^x_{t−1|t−1} + J_{t−1} (Σ^x_{t|T} − Σ^x_{t|t−1}) J_{t−1}^T,   (21)

where

  J_{t−1} = Σ^x_{t−1,t|t−1} (Σ^x_{t|t−1})^{−1}.   (22)

These smoothing updates yield the marginals of p(x_{1:T}|z_{1:T}, g_0). The missing cross-covariances Σ^x_{t−1,t|T} of p(x_{1:T}|z_{1:T}, g_0) that yield a block-tri-diagonal covariance matrix are

  Σ^x_{t−1,t|T} = J_{t−1} Σ^x_{t|T},   (23)

where J_{t−1} is given in Eq. (22). For more technical details, we refer to [6] and the supplementary material (see Footnote 1).

B. Maximization

With these computations, we obtain a Gaussian approximation to the posterior q(x_{1:T}) ≈ p(x_{1:T}|z_{1:T}, g_0) with a block-tri-diagonal covariance matrix. The result

  q(x_t, x_{t+1}) = N( [μ^x_{t|T}; μ^x_{t+1|T}],  [Σ^x_{t|T}, Σ^x_{t,t+1|T}; Σ^x_{t+1,t|T}, Σ^x_{t+1|T}] )   (24)

is used in the M step. Here, we first consider continuous intention variables g. The integration in Eq. (12) can be rewritten as

  Q_t(g) = ∫∫ q(x_t, x_{t+1}) log( p(x_{t+1}|x_t, g) q(x_t) ) dx_{t+1} dx_t
         − ∫ q(x_t) log q(x_t) dx_t.   (25)

Algorithm 2: The EM algorithm finds the maximum-likelihood estimate of the continuous intention g.
  Input: observations z_{1:T}
  Output: MLE of the continuous intention g
  1: initialize g
  2: repeat
  3:   g_0 ← g
  4:   E step: approximate p(x_{1:T}|z_{1:T}, g_0) with q(x_{1:T}) by forward-backward smoothing (GP-RTSS)
  5:   M step: update g by optimizing (11) using SCG with J steps, based on Eq. (26)
  6: until ‖g − g_0‖ < ε

We can approximate p(x_{t+1}|x_t, g) q(x_t) with a joint Gaussian distribution using the same technique as in the E step, and obtain q̃(x_t, x_{t+1}|g) = N(μ̃, Σ̃), as well as the gradients ∂μ̃/∂g and ∂Σ̃/∂g. As a result, Eq. (12) can be approximated as

  Q_t(g) ≈ −D_KL(q_{t,t+1} ‖ q̃_{t,t+1}) − H(q_{t,t+1}) + H(q(x_t)),   (26)

with the shorthand notation q_{t,t+1} ≜ q(x_t, x_{t+1}) and q̃_{t,t+1} ≜ q̃(x_t, x_{t+1}|g). In Eq. (26), the entropy of a Gaussian distribution and the KL divergence between two Gaussians can be computed analytically. Therefore, we can analytically compute the approximate value of Q_t(g) in Eq. (26) and the corresponding gradients with respect to the intention g. The M step, i.e., the optimization problem in (10) and (11), is then solved using SCG. Algorithm 2 describes the inference algorithm for continuous intentions g. We set the number of SCG steps to J = 20 in the experiments.

When the intention g is a discrete variable, for example the type of action, a more efficient algorithm is to enumerate all possible values of g and choose the one with the maximum approximated value (Jensen's lower bound) of the log marginal likelihood in Eq. (9), given by

  log p(z_{1:T}|g) ≈ H(q(x_{1:T})) + E_q[log p(z_{1:T}, x_{1:T}|g)],   (27)

where q(x_{1:T}) ≈ p(x_{1:T}|z_{1:T}, g) is the approximated posterior distribution given the current value of the intention g.

To summarize, we proposed an efficient approximate approach to intention inference from a new movement. The method can deal with both continuous (Algorithm 2) and discrete (Eq. (27)) intention variables.

IV. ROBOT TABLE TENNIS

Playing table tennis is a challenging task for robots, as it requires accurate prediction of the ball's movement and a very fast response. Hence, robot table tennis has been used by many groups as a benchmark task in robotics [16, 17]. Thus far, none of the groups which have worked on robot table tennis ever got to the level of a young child, despite having robots that can see and move faster and more accurately than humans [17]. Likely explanations for this performance gap are (i) the human ability to predict hitting points from opponent movements and (ii) the robustness of human hitting movements [17]. In
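The closed-form entry q_i is just the expectation of a Gaussian kernel under a Gaussian. A small sketch (with Λ_x playing the role of a kernel precision matrix and all numbers hypothetical) reproduces that formula and checks it against Monte Carlo integration:

```python
import numpy as np

rng = np.random.default_rng(1)

def kernel_expectation(mu, Sigma, x_i, Lam, sf2):
    """Analytic E[k(x, x_i)] for k(x, x_i) = sf2 * exp(-0.5 (x-x_i)^T Lam (x-x_i))
    with x ~ N(mu, Sigma); mirrors the q_i expression, where Lam plays the
    role of the precision Lambda_x and Omega = Sigma + Lam^{-1}."""
    d = len(mu)
    zeta = x_i - mu
    Omega = Sigma + np.linalg.inv(Lam)
    norm = np.linalg.det(Sigma @ Lam + np.eye(d)) ** -0.5
    return sf2 * norm * np.exp(-0.5 * zeta @ np.linalg.solve(Omega, zeta))

# Hypothetical filter moments, kernel precision, and training input
mu = np.array([0.5, -1.0])
Sigma = np.array([[0.4, 0.1], [0.1, 0.3]])
Lam = np.diag([2.0, 0.5])
x_i = np.array([1.0, 0.0])
sf2 = 1.5

analytic = kernel_expectation(mu, Sigma, x_i, Lam, sf2)

# Monte Carlo check of the same integral
xs = rng.multivariate_normal(mu, Sigma, size=200_000)
diff = xs - x_i
mc = np.mean(sf2 * np.exp(-0.5 * np.einsum('ni,ij,nj->n', diff, Lam, diff)))
```

The determinant identity |ΣΛ + I| = |Λ| · |Σ + Λ^{-1}| is what turns the Gaussian-times-kernel integral into the normalizer above.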
and (ii) the robustness of human hitting movements [17]. In

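Eq. (26) relies on two standard closed-form identities: the differential entropy of a Gaussian and the KL divergence between two Gaussians. A minimal NumPy sketch of both (variable names are illustrative, not from the paper):

```python
import numpy as np

def gauss_entropy(Sigma):
    """Differential entropy of N(mu, Sigma): 0.5 * log det(2*pi*e*Sigma)."""
    d = Sigma.shape[0]
    return 0.5 * (d * np.log(2 * np.pi * np.e) + np.linalg.slogdet(Sigma)[1])

def gauss_kl(mu0, S0, mu1, S1):
    """KL( N(mu0, S0) || N(mu1, S1) ) in closed form."""
    d = len(mu0)
    S1_inv = np.linalg.inv(S1)
    diff = mu1 - mu0
    return 0.5 * (np.trace(S1_inv @ S0) + diff @ S1_inv @ diff - d
                  + np.linalg.slogdet(S1)[1] - np.linalg.slogdet(S0)[1])

mu0, S0 = np.zeros(2), np.array([[1.0, 0.2], [0.2, 0.5]])
mu1, S1 = np.array([0.3, -0.1]), np.eye(2)

print(gauss_entropy(S0), gauss_kl(mu0, S0, mu1, S1))
```

Because both quantities are differentiable in the Gaussian moments, gradients of the approximate Q_t(g) with respect to g follow by the chain rule, which is what makes the SCG-based M step practical.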
this paper, we used a Barrett WAM robot arm to play table tennis against human players. The robot's hardware constraints impose strong limitations on its flexibility.

Fig. 3: The robot's hitting point is the intersection of the incoming ball's trajectory and the virtual hitting plane 80 cm behind the table.

[Figure: mean absolute error in cm plotted against time in ms before the opponent returns the ball (320, 240, 160, 80), for IDDM and GPR.]

Fig. 4: Mean absolute error of the ball's target with standard error of the mean. These errors are before the opponent has hit the ball and are based only on his movements.

The robot requires sufficient time to execute a ball-hitting plan: to achieve the required velocity for returning the ball, movement initiation to an appropriate preparation pose is needed before the opponent hits the ball. The robot player uses different preparation poses for forehand and backhand hitting plans. Hence, it is necessary to choose between them based on modeling the opponent's preference [26] and inference of the opponent's target location for the ball [27].

The robot perceives the ball and the opponent's racket in real-time, using seven Prosilica GE640C cameras [27]. These cameras were synchronized and calibrated to the coordinate system of the robot. The ball tracking system uses four cameras to capture the ball on both courts of the table. The racket tracking system provides the information of the opponent's racket, i.e., position and orientation. As a result, the observation z_t includes the ball's position and velocity, and the opponent's racket position, velocity, and orientation before the human plays the ball. We downsampled the observations to 12 Hz. Here, the position and velocity of the ball were processed online with an extended Kalman filter. However, the same smoothing method cannot be applied to the racket's trajectory, as its dynamics are unknown. Therefore, the obtained states of the racket were subject to substantial noise, and the model has to be robust to this noise.

The robot always chooses its hitting point on a virtual hitting plane (80 cm behind the table), as shown in Fig. 3. We define the human's intended target g as the intersection of the returned ball's trajectory with the virtual hitting plane. As the X-coordinate (see Fig. 3) is most important for choosing either forehand or backhand hitting plans, the intention g considered here is the X-coordinate of the hitting point. Physical limitations of the robot restrict the X-coordinate to a range of ±1.2 m from the robot's base (the table is 1.52 m wide).

To evaluate the performance of target prediction, we collected a data set of recorded stroke movements from different human players. The true targets were obtained from the ball tracking system. The data set was divided into a training set with 100 plays and a test set with 126 plays. The standard deviation of the target coordinate in the test set is 102.2 cm. A straightforward approach to prediction is to learn a mapping from the features z_t to the target g. Hence, we compare our model to Gaussian Process Regression (GPR) using a Gaussian kernel with automatic relevance determination.

For every recorded play, we compared the performance of the proposed IDDM intention inference and the GPR prediction at 80 ms, 160 ms, 240 ms, and 320 ms before the opponent hits the ball. Note that this time step was only used so that the algorithms could be compared; the algorithms were not aware of the hitting time of the opponent in advance. As demonstrated in Fig. 4, the proposed model outperformed the GPR. At 80 ms before the opponent hit the ball, the proposed model achieved a mean absolute error of 31.55 cm, a 13.3% improvement over the GPR (p-value: 0.016), whose average error is 36.4 cm. A naive, model-free intention prediction would be to always predict the center of the intentions in the training set. This naive prediction model caused an error of 81.9 cm. Hence, both the GPR and the IDDM substantially outperformed naive goal prediction.

Note that the prediction was made before the opponent hits the ball, and was only used to choose between forehand and backhand hitting plans. More fine-tuning can follow later when the returned ball's trajectory has been observed. Hence, a certain amount of error is tolerable, since the robot can apply small changes to its hitting plan based on the ball's trajectory. However, it cannot switch between forehand and backhand hitting plans because of torque/temporal constraints.

We performed model selection to determine the covariance function k_z, which can be either an isotropic Gaussian kernel, see Eq. (4), or a linear covariance function, see Eq. (5). Furthermore, we performed model selection to find the dimension d of the latent states. In the experiments, the model was selected by cross-validation on the training set. The best model under consideration was with a linear covariance function and

TABLE I: The mean absolute errors (in cm) of the goal inference made 80 ms before the opponent hits the ball.

  kernel     d=3     d=4     d=5     d=6
  linear     41.52   31.55   35.36   37.04
  Gaussian   38.49   34.16   34.44   37.28
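The GPR baseline maps the features directly to the target using the standard GP regression equations. A minimal NumPy sketch on synthetic data (hand-fixed hyperparameters rather than the evidence maximization and ARD selection used in the experiments; all data and values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)

def ard_kernel(X1, X2, lengthscales, sf2):
    """Gaussian (squared-exponential) kernel with per-dimension lengthscales (ARD)."""
    D = (X1[:, None, :] - X2[None, :, :]) / lengthscales
    return sf2 * np.exp(-0.5 * np.sum(D ** 2, axis=-1))

def gpr_predict(X, y, X_star, lengthscales, sf2, noise):
    """Posterior GP mean at X_star: k(X*, X) (K + noise I)^{-1} y."""
    K = ard_kernel(X, X, lengthscales, sf2) + noise * np.eye(len(X))
    K_star = ard_kernel(X_star, X, lengthscales, sf2)
    return K_star @ np.linalg.solve(K, y)

# Synthetic stand-in for (features z_t, target g) pairs; the target depends
# only on the first two feature dimensions, which ARD lengthscales can reflect.
X = rng.uniform(-1, 1, size=(80, 3))
y = np.sin(2 * X[:, 0]) + 0.3 * X[:, 1] + 0.01 * rng.standard_normal(80)

X_test = rng.uniform(-1, 1, size=(40, 3))
y_test = np.sin(2 * X_test[:, 0]) + 0.3 * X_test[:, 1]

pred = gpr_predict(X, y, X_test,
                   lengthscales=np.array([0.5, 0.5, 1.0]), sf2=1.0, noise=1e-2)
mae = np.mean(np.abs(pred - y_test))
```

A per-dimension lengthscale is what "automatic relevance determination" refers to: dimensions with large lengthscales contribute little to the kernel distance and are effectively pruned.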

four-dimensional latent state space. Experiments on the test set verified the model selection result, as shown in Table I.

Our results demonstrated that the IDDM can improve the target prediction in robot table tennis and choose the correct hitting plan. We have verified the model in a simulated environment, but using data from real human movements recorded from a human playing against another human. The simulation showed that the robot could successfully return the ball when given a prediction by the IDDM model. For a demo, see Footnote 1.

V. ACTION RECOGNITION FOR INTERACTIVE ROBOTS

To realize safe and meaningful HRI, it is important that robots can recognize the human's action. The advent of robust, marker-less motion capture techniques [22] has provided us with the technology to record the full skeletal configuration of the human during HRI. Yet, recognition of the human's action from this high-dimensional data poses serious challenges.

In this paper, we show that the IDDM has the potential to recognize the intention of an action from movements in a simplified scenario. Using a Kinect camera, we recorded the 32-dimensional skeletal configuration of a human during the execution of a set of actions, namely: crouching (C), jumping (J), kick-high (KH), kick-low (KL), defense (D), punch-high (PH), punch-low (PL), and turn-kick (TK). For each type of action, we collected a training set consisting of ten repetitions and a test set of three repetitions. The system downsampled the output of the Kinect and processed three skeletal configurations per second. The actions were performed slowly in the data set, for example one jumping action took about 1.5 seconds, so that the NAO robot has sufficient time to respond.

In this task, the intention g is a discrete variable and corresponds to the type of action. Action recognition can be regarded as a classification problem. We compared the proposed algorithm to Support Vector Machines (SVM) [21] and multi-class Gaussian Process Classification (GPC) [20, 12]. The algorithms made a prediction after observing a new skeletal configuration. The proposed IDDM used a sliding window of length n = 5, i.e., it recognized actions based on the most recent n observations. We chose the IDDM with a linear covariance function for k_z and a two-dimensional state space. The IDDM achieved a precision of 83.8%, which outperformed SVM (77.5%) and GPC (79.4%) using the same sliding windows. We observed that the SVM and GPC confused crouching and jumping, as they are similar in the early and late stages. In contrast, the IDDM could distinguish crouching (C) and jumping (J) from their different dynamics, as shown in Fig. 5. The distinction between C and J became significant while the human performed the actions: the longer the movement was observed, the more evidence was used by the IDDM.

[Figure: trajectories of latent dimension 1 and latent dimension 2 over time, with bar charts of the likelihoods of C, J, KH, and TK at different stages.]

Fig. 5: The trajectories of the 2D latent states for two consecutive Jumping actions are obtained by smoothing. The error bars represent the corresponding standard deviations. The bar charts correspond to the likelihood of Crouching, Jumping, Kick-High, and Turn-Kick at different stages of an action.

In order to apply the IDDM to real-time action recognition, we need to trade off accuracy against time complexity by varying the size of the sliding window. As shown in Table II, the proposed model could yield real-time action recognition at 3 Hz with a small sacrifice in accuracy. Note that this was only a preliminary experiment to show the feasibility of the IDDM in action recognition. We will actively work on more efficient algorithms and implementations. For future experimental results and a demo, see Footnote 1.

TABLE II: Trade-off between accuracy and time complexity, by varying the size of the sliding window n.

  n          4       5       6
  Time (s)   0.27    0.32    0.39
  Accuracy   79.0%   83.8%   86.6%

VI. RELATED WORK

Inference of intention, referred to in some contexts as goal, target, desire, plan, etc., has been investigated under different settings. For example, an early work [18] used Hidden Markov Models to model and predict human behavior, where different dynamics models are adopted for the corresponding behaviors. Bayesian models were used for inferring goals from behavior in [19], where a policy conditional on the agent's goal is learned to represent the behavior. Bayesian models can also interpret the agent's behavior and predict its behavior in a similar environment with the learned model [3]. Inverse Reinforcement Learning (IRL) [1] assumes a rational agent that maximizes expected utility, and infers the underlying utility function from its behavior. In a recent work [10], a computational framework is proposed to model gaze following, where GPs are used to model the policies with actions directed by a goal.

Observations of human movement often consist of high-dimensional features. Determining the low-dimensional latent state space is an important issue for understanding observed actions. The Gaussian Process Latent Variable Model (GPLVM) [15] finds the most likely latent variables by marginalizing out the mapping function from latent to observed space. Its extension, Gaussian Process Dynamical Models [25], can be used to model the dynamics of human motion while simultaneously finding low-dimensional latent states. For example, it can model and extrapolate the appearance of human walking.

Nonlinear dynamics models have successful applications in robotics. For example, Dynamic Motion Primitives [11]
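The discrete-intention variant scores each candidate action model on the current sliding window and picks the argmax. The toy below replaces the IDDM's GP dynamics with known linear dynamics per action (all names and numbers are hypothetical), but it mirrors that selection rule:

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical per-action linear dynamics x_{t+1} = A_g x_t + noise;
# in the IDDM these roles are played by GP dynamics conditioned on g.
dynamics = {
    "crouch": np.array([[0.95, 0.0], [0.0, 0.8]]),
    "jump":   np.array([[0.8, -0.3], [0.3, 0.8]]),
}
noise_var = 0.01

def window_loglik(window, A):
    """Gaussian log-likelihood (up to a constant) of the observed transitions
    in the window under the candidate dynamics A."""
    resid = window[1:] - window[:-1] @ A.T
    return -0.5 * np.sum(resid ** 2) / noise_var

def classify(window):
    """Enumerate candidate intentions and return the highest-scoring one."""
    scores = {g: window_loglik(window, A) for g, A in dynamics.items()}
    return max(scores, key=scores.get)

# Simulate a sliding window of n = 5 states from the "jump" dynamics
x = np.array([1.0, 0.5])
window = [x]
for _ in range(4):
    x = dynamics["jump"] @ x + 0.1 * rng.standard_normal(2)
    window.append(x)
window = np.array(window)
```

Enumerating discrete candidates like this is what makes the discrete case cheaper than the continuous case: no gradient-based M step is needed, only one score per candidate per window.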

were used in imitation learning, parametrizing the dynamics as differential equations to achieve robust dynamics and fast learning. Dynamics models are also helpful, for example, in tracking a small robotic blimp using two cameras [13], where GP-Bayes filters were proposed for efficient filtering. In a follow-up work, the model is learned using the GPLVM [14], so that the latent states need not be provided for learning.

VII. DISCUSSIONS

In this paper, we proposed the intention-driven dynamics model (IDDM). Our contributions include: (1) suggesting the IDDM, which simultaneously finds a good low-dimensional representation of high-dimensional and noisy observations and models the dynamics that are driven by the intention; (2) introducing an approximate algorithm to efficiently infer the human's intention from an ongoing movement; and (3) verifying the proposed model in two human-robot interaction scenarios, i.e., target inference in robot table tennis and action recognition for interactive robots. In these two scenarios, we show that modeling the intention-driven dynamics can achieve better prediction than algorithms without modeling dynamics.

There are many interesting directions that we are actively exploring. The model suffers somewhat from its computational complexity. The acceleration of the inference algorithm for its real-time application in robot table tennis can be achieved by a more efficient implementation, e.g., using parallel computing. Moreover, the discovery of action types without supervision can be interesting for HRI.

ACKNOWLEDGMENTS

Part of the research leading to these results has received funding from the European Community's Seventh Framework Programme under grant agreements no. ICT-270327 (CompLACS) and no. ICT-248273 (GeRT). We thank Abdeslam Boularias for valuable discussions on this work.

REFERENCES

[1] P. Abbeel and A.Y. Ng. Apprenticeship learning via inverse reinforcement learning. In ICML, 2004.
[2] C. Andrieu, N. De Freitas, A. Doucet, and M.I. Jordan. An introduction to MCMC for machine learning. Machine Learning, 50(1):5–43, 2003.
[3] C. Baker, J. Tenenbaum, and R. Saxe. Bayesian models of human action understanding. In NIPS, 2006.
[4] C.L. Baker, R. Saxe, and J.B. Tenenbaum. Action understanding as inverse planning. Cognition, 113(3), 2009.
[5] A.C. Damianou, M.K. Titsias, and N.D. Lawrence. Variational Gaussian process dynamical systems. In NIPS, 2011.
[6] M.P. Deisenroth. Efficient Reinforcement Learning using Gaussian Processes. KIT Scientific Publishing, 2010.
[7] M.P. Deisenroth and H. Ohlsson. A general perspective on Gaussian filtering and smoothing. In ACC, 2011.
[8] M.P. Deisenroth, M.F. Huber, and U.D. Hanebeck. Analytic moment-based Gaussian process filtering. In ICML, 2009.
[9] M.P. Deisenroth, R. Turner, M. Huber, U.D. Hanebeck, and C.E. Rasmussen. Robust filtering and smoothing with Gaussian processes. IEEE Trans. on Automatic Control, 2012.
[10] A.L. Friesen and R.P.N. Rao. Gaze following as goal inference: A Bayesian model. In CogSci, 2011.
[11] A.J. Ijspeert, J. Nakanishi, and S. Schaal. Movement imitation with nonlinear dynamical systems in humanoid robots. In ICRA, 2002.
[12] M.E. Khan, S. Mohamed, B.M. Marlin, and K.P. Murphy. A stick-breaking likelihood for categorical data analysis with latent Gaussian models. In AISTATS, 2012.
[13] J. Ko and D. Fox. GP-BayesFilters: Bayesian filtering using Gaussian process prediction and observation models. Autonomous Robots, 27(1), 2009.
[14] J. Ko and D. Fox. Learning GP-BayesFilters via Gaussian process latent variable models. Autonomous Robots, 2009.
[15] N.D. Lawrence. Gaussian process latent variable models for visualization of high dimensional data. In NIPS, 2004.
[16] M. Matsushima, T. Hashimoto, M. Takeuchi, and F. Miyazaki. A learning approach to robotic table tennis. IEEE Trans. on Robotics, 21(4), 2005.
[17] K. Mülling, J. Kober, and J. Peters. A biomimetic approach to robot table tennis. Adaptive Behavior, 19(5):359–376, 2011.
[18] A. Pentland and A. Liu. Modeling and prediction of human behavior. Neural Computation, 11(1), 1999.
[19] R.P.N. Rao, A.P. Shon, and A.N. Meltzoff. A Bayesian model of imitation in infants and robots. In Imitation and Social Learning in Robots, Humans, and Animals, 2004.
[20] C.E. Rasmussen and C.K.I. Williams. Gaussian Processes for Machine Learning. MIT Press, 2006.
[21] B. Schölkopf and A.J. Smola. Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. MIT Press, 2001. ISBN 0262194759.
[22] J. Shotton, A. Fitzgibbon, M. Cook, T. Sharp, M. Finocchio, R. Moore, A. Kipman, and A. Blake. Real-time human pose recognition in parts from single depth images. In CVPR, 2011.
[23] M.A. Simon. Understanding Human Action: Social Explanation and the Vision of Social Science. State University of New York Press, 1982.
[24] R. Turner, M.P. Deisenroth, and C.E. Rasmussen. State-space inference and learning with Gaussian processes. In AISTATS, 2010.
[25] J.M. Wang, D.J. Fleet, and A. Hertzmann. Gaussian process dynamical models for human motion. IEEE Trans. on Pattern Analysis and Machine Intelligence, 2008.
[26] Z. Wang, A. Boularias, K. Mülling, and J. Peters. Balancing safety and exploitability in opponent modeling. In AAAI, 2011.
[27] Z. Wang, C.H. Lampert, K. Mülling, B. Schölkopf, and J. Peters. Learning anticipation policies for robot table tennis. In IROS, 2011.

Optimization of Temporal Dynamics for Adaptive
Human-Robot Interaction in Assembly
Manufacturing
Ronald Wilcox, Stefanos Nikolaidis, and Julie Shah
Massachusetts Institute of Technology
Cambridge, Massachusetts 02139
Email: [email protected]; [email protected]; julie a [email protected]

Abstract—Human-robot collaboration presents an opportunity to improve the efficiency of manufacturing and assembly processes, particularly for aerospace manufacturing, where tight integration and variability in the build process make physical isolation of robotic-only work challenging. In this paper, we develop a robotic scheduling and control capability that adapts to the changing preferences of a human co-worker or supervisor while providing strong guarantees for synchronization and timing of activities. This innovation is realized through dynamic execution of a flexible optimal scheduling policy that accommodates temporal disturbance. We describe the Adaptive Preferences Algorithm that computes the flexible scheduling policy and show empirically that execution is fast, robust, and adaptable to changing preferences for workflow. We achieve satisfactory computation times, on the order of seconds for moderately-sized problems, and demonstrate the capability for human-robot teaming using a small industrial robot.

I. INTRODUCTION

Traditionally, industrial robots in manufacturing and assembly perform work in isolation from people. When this is not possible, the work is done manually. We envision a new class of manufacturing processes that achieve significant economic and ergonomic benefit through robotic assistance in manual processes. For example, mechanics in aircraft assembly spend a significant portion of their time retrieving and staging tools and parts for each job. A robotic assistant can provide productivity benefit by performing these non-value-added tasks for the worker. Other concepts for human and robot co-work envision large industrial robotic systems (examples in Fig. 1) that operate in the same physical space as human mechanics.

Fig. 1. (Left) Coriolis Composite Placement Robot; (Right) ElectroImpact Robotic Drilling Arm

This work is supported by Boeing Research and Technology and ABB, Inc. Mechatronics US Corporate Research.

In this paper, we present a robotic scheduling and control capability for human-robot collaborative work that addresses two key challenges in the manufacturing environment. First, preferences about task completion are prone to change, since the ordering and timing of activities in many manual processes are left to the discretion of the human workers. A high level of adaptability and robustness must therefore be built into any robotic system that works in close collaboration with people.

Second, human and robotic work in manufacturing and assembly must meet hard scheduling constraints, including pulse rates between build stations and flow rates for end-to-end assembly. The changing preferences of a human co-worker or supervisor must be accommodated while preserving strong guarantees for synchronization and timing of activities.

Our approach generalizes from dynamic scheduling methods [2, 6, 13] first developed to perform scheduling onboard a deep-space satellite [6]. Dynamic scheduling is domain independent and has been successfully applied to scheduling within the avionics processor of commercial aircraft [13], autonomous air vehicles [12], robot walking [3], and, recently, human-robot teaming [8, 11]. We leverage prior art that addresses efficient real-time scheduling of plans whose temporal constraints are described as Simple Temporal Problems (STPs) [2, 6, 13]. STPs compactly encode the set of feasible scheduling policies for plan events that are related through simple interval temporal constraints. Temporal flexibility in the STP provides robustness to disturbances at execution.

We make use of this simple yet powerful framework to model joint human-robot work as a Simple Temporal Problem with soft constraints (called preferences). The preferences encode person-specific workflow patterns and human operator input for suggested workflow. Simple Temporal Problems with Preferences (STPPs) have been studied previously [4, 5, 7] for weakest-link optimization criteria, but these solution techniques do not generalize to optimization criteria relevant to manufacturing applications. Alternatively, an STPP with an arbitrary objective function may be formulated and solved as a non-linear program (NLP), where the solution is an assignment of execution times to each event in the plan. This approach results in brittle solutions; any disturbance in execution time requires time-consuming recalculation of the schedule.

In this work, we describe a robotic scheduling capabil-

ity that leverages the strengths of the STP and NLP solution methods: flexibility in execution and optimization of arbitrary objective functions, respectively. We present the Adaptive Preferences Algorithm (APA), which uses the output of a non-linear program solver to compute a flexible optimal scheduling policy that accommodates temporal disturbance. The algorithm also supports on-the-fly optimization in response to changing preferences. In Section II we discuss examples of human-robot interaction that motivate our work. Next, in Section III we briefly review STP dynamic scheduling and optimization of preferences, since this prior art forms the basis for our approach. In Sections IV and V we present the Adaptive Preferences Algorithm (APA) and its evaluation. We show empirical results indicating both that APA provides significant robustness to temporal disturbance and that a robot using APA can adaptively schedule its actions over a horizon of 75 activities with sub-second speed. Finally, in Section VI, we apply APA to perform human-robot teaming using a small industrial robot.

II. MOTIVATING EXAMPLES

In this section, we discuss two types of applications that motivate our work: one-on-one robotic assistance for a worker, and single-operator orchestration of robot teams.

A. Robotic assistant to assembly mechanic

We aim to develop a capability that supports efficient and productive interaction between a worker and a robotic assistant, such as the FRIDA robot shown in Fig. 2. Although important aspects like tolerances and completion times are well defined, many details of assembly tasks are left largely up to the mechanic.

Fig. 2. ABB FRIDA robot acting as a robotic assistant

Assembly of airplane spars is one example of a manual process where mechanics develop highly individualized styles for performing the task. Fig. 3 shows a mechanic assembling a spar. The spar is composed of two pieces that must be physically manipulated into alignment. After alignment, wet sealed bolts are hammered into pre-drilled holes and are fastened with collars. Excess sealant is removed, and the collars are re-torqued to final specifications. Sequencing of these tasks is flexible, subject to the constraint that the sealant is applied within a specified amount of time after opening it.

A robot such as FRIDA can assist a mechanic by picking bolts and fasteners from a singulator, rotating them in front of a stationary sealant end-effector, and inserting them into the bores. This would allow the mechanic to focus on wiping sealant, hammering the bolts, and placing and torquing the collars. This division of labor would provide productivity benefit through parallelization of tasks.

Fig. 3. Spar assembly is a manual process that could be improved by a robotic assistant (image courtesy of Boeing Research and Technology)

Our aim is to enable a robotic assistant to adapt to person-specific workflow patterns. If most mechanics like to hammer all bolts before torquing collars, the robot will support this approach by placing all bolts in a pattern that anticipates the mechanic's actions. When the robot is paired with a mechanic who instead prefers to hammer and torque the collar for each bolt as it is placed, the robot will quickly perceive this difference and reoptimize its schedule to converge on a turn-taking pattern with the mechanic. The robot will adapt according to the mechanic's preferences, subject to the constraint that the sealant is utilized within the specified window.

B. Robotic Team Orchestration

We also aim for our capability to enable a single operator to direct a team of robots, while ensuring that hard scheduling deadlines such as mandated flow rates are met. Work will be shifted according to operator preferences through fast re-computation of the robots' schedule, while preserving guarantees that assembly will finish within specified deadlines.

Unscheduled maintenance is frequently required for the new, specialized robots that perform traditionally manual work, including drilling and composite lay-down. Current practices require that all robots halt while one robot is repaired, or while a quality assurance agent inspects the work. These slowdowns and subsequent workflow recalculations cost the facilities hours of productivity that can be avoided with the quick re-computation and flexible schedules provided by our approach.

III. BACKGROUND

The robotic scheduling and control capability we develop builds on previous work in STP dynamic scheduling and optimization of STPs with Preferences (STPPs).

A. Dynamic Scheduling of STPs

A Simple Temporal Problem (STP) [2] consists of a set of executable events, X. These events are connected via binary temporal constraints (intervals) b_ij that indicate a range for the temporal distance between events X_i and X_j. Fig. 4 (left) presents the constraint-form graphical depiction of a binary temporal constraint. Events are represented as nodes, and the temporal constraint is depicted with an arrow and an assigned interval.

Fig. 4. Left: Constraint form representation; indicates that event B must occur at least 3 time units after event A but no more than 5 time units after it, or 3 ≤ XB − XA ≤ 5. Right: Distance graph representation; indicates the same interval as the constraint, but yields two equivalent inequalities, XB − XA ≤ 5 and XA − XB ≤ −3.

The STP constraint form may be mapped to an equivalent distance graph form to support efficient inference [2]. Fig. 4 (right) presents the distance graph form of the temporal constraint. The interval upper bound is mapped to a positive arc from XA to XB, and the lower bound is mapped to a negative arc from XB to XA.

A solution to an STP is a time assignment to each event such that all constraints are satisfied. An STP is said to be consistent if at least one solution exists. Checking an STP for consistency can be cast as an all-pairs shortest path problem. The STP is consistent if and only if there are no negative cycles in the all-pairs distance graph. This check can be performed in O(n³) time by applying the Floyd-Warshall algorithm [2].

The all-pairs shortest path graph of a consistent STP is also a dispatchable form of the STP, enabling flexible real-time scheduling [6]. The dispatchable STP provides a compact representation of the set of feasible schedules. Dynamic scheduling of the dispatchable STP provides a strategy that schedules events online just before they are executed, with a guarantee that the resulting schedule satisfies the temporal constraints of the plan. Scheduling events on-the-fly allows the robot to adapt to temporal disturbances associated with past events through fast linear-time constraint propagation. More formally, a network is dispatchable if for each variable XA it is possible to arbitrarily pick a time t within its timebounds and find feasible execution windows in the future for other variables through one-step constraint propagation of the XA temporal commitment. Dispatching of STPs is described in more detail with examples in Section IV.

B. Prior Art in STP Preference Optimization

An STP with Preferences (STPP) [5] is a Simple Temporal Problem with the addition of soft binary constraints, or preference functions, fbij(t), relating the temporal durations between events. The global preference function, F, of an STPP represents the objective function derived from the preference values based on a time assignment to each event. An optimal solution to the STPP is consistent with the temporal constraints bij and optimizes the global preference function F.

Preferences provide an expressive and natural framework for encoding human input. A supervisor may apply preference functions to specify the most effective timing for an activity without providing hard constraints that lead to schedule brittleness. For example, a supervisor may specify the desire for painting to take four hours, but allow any time up to six hours as acceptable.

Preference functions may also be applied to encode statistical information about likely execution times for human actions, so as to drive the robot schedule to conform to human behavior. Data mining of typical human workflows can provide the statistical information necessary to infer preference functions. In addition, preference functions may be used to model the effect of implicit communications; recent studies indicate that gestures induce preferences over execution sequence and timing in human teams [9]. This effect may be reproduced in human-robot teams using preference functions.

STPPs were originally developed to perform scheduling for Earth observation satellites [4]. Scientists were asked to provide preferences indicating the most effective times for them to access the satellite. The STPP framework was applied to solve the scheduling problem, using an objective function that maximized the preferences of the least satisfied scientist. Solution methods, including a slow constraint propagation technique and a fast binary chop method [7], have been designed for this weakest-link optimization criterion but do not readily generalize to other objective functions.

Fairness is not a concern in the optimization of a manufacturing process. It is acceptable to sacrifice one interval's preference value to improve the preference values for many other intervals (e.g., slow down one robot so that it does not block the path for the other robots). For example, for many manufacturing applications, an approach that optimizes the STPP with respect to the sum of preference values, Σbij fbij(t), is more appropriate. An STPP with an arbitrary objective function may be formulated and solved as a non-linear program (NLP), where the solution is an assignment of execution times to each event in the plan. However, this approach results in brittle solutions; any disturbance in execution time requires time-consuming re-calculation of the schedule. In the next sections, we present a method for computing a temporally flexible optimal scheduling policy that leverages the strengths of STP and NLP solution methods. The Adaptive Preferences Algorithm computes a flexible optimal scheduling policy that accommodates fluctuations in execution time and supports robust online optimization in response to changing preferences.

IV. PROBLEM FORMULATION

The Adaptive Preferences Algorithm (APA) takes as input a Simple Temporal Problem with Preferences (STPP), composed of
• a set of variables, X1, ..., Xn, representing executable events,
• a set of binary temporal constraints of the form bij encoding activity durations and qualitative and quantitative temporal relations between events Xi and Xj,
• a set of preference functions of the form fbij(t) encoding preference values over the temporal interval bij, and
• a global objective criterion F defined as a function of the preference functions fbij(t). We use F = Σbij fbij(t) for prototyping of the described manufacturing applications, although we note APA generalizes to other forms of the objective function.
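The four inputs just listed fit in ordinary containers. The sketch below is our own illustration (the paper prescribes no data layout): the interval bounds on AD, BC, and AC mirror the worked example of Section IV-A, the A-B bound is filler of our own, and each preference is written as a quadratic of its interval's duration. It checks feasibility of a candidate time assignment and evaluates the prototyping objective F = Σbij fbij(t).

```python
# Illustrative STPP encoding; names and the A-B bound are assumptions.
# Binary constraints b_ij: lb <= X_j - X_i <= ub.
constraints = {("A", "D"): (5.0, 10.0), ("B", "C"): (0.0, 2.0),
               ("A", "B"): (0.0, 10.0), ("A", "C"): (1.0, 5.0)}
# Preference functions f_bij(t) over the duration t of each interval.
prefs = {("A", "D"): lambda t: -t**2 + 15.0*t - 50.0,
         ("B", "C"): lambda t: -t**2 + 2.0*t}

def objective(times):
    """Global criterion F: sum of f_bij(t) over the preferred intervals."""
    return sum(f(times[j] - times[i]) for (i, j), f in prefs.items())

def feasible(times):
    """True when every binary constraint is satisfied by the assignment."""
    return all(lb - 1e-9 <= times[j] - times[i] <= ub + 1e-9
               for (i, j), (lb, ub) in constraints.items())

times = {"A": 0.0, "B": 3.0, "C": 4.0, "D": 7.5}
print(feasible(times), objective(times))  # True 7.25
```

An optimal solution to the STPP is then a feasible assignment maximizing this F; APA's contribution is to encode the whole set of such assignments rather than a single one.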
The output of the algorithm is a dispatchably optimal (DO) form of the STPP that supports fast dynamic scheduling. We define an STPP as dispatchably optimal if it is possible to maximize the global preference function F through the following procedure: for each variable Xi it is possible to arbitrarily pick a time t within the DO form's timebounds and find feasible execution windows in the future for other variables through fast one-step constraint propagation of the Xi temporal commitment.

Notice that the proposed problem may be formulated as a non-linear optimization problem to solve for event execution times. This approach provides a solution that is brittle to disturbance, requiring recomputation when an event does not execute at precisely the specified time. In contrast, our approach compiles a temporally flexible optimal scheduling policy that accommodates fluctuations in execution time. This method leverages the insight that there are many potential schedules that are consistent with an optimal time assignment to preference functions. Section IV-A presents the compilation algorithm that computes the DO form for the STPP. Section IV-B presents the dispatcher algorithm that generates a schedule using the STPP DO form, and supports robust online reoptimization in response to changing preferences.

A. Compiler for STPP DO Form

The Compiler takes as input an STPP composed of events Xi, constraints bij, and preference functions fbij(t). It then reformulates and optimizes the STPP as a non-linear program. The resulting optimal timestamps are used to modify the network so that intervals with preference functions are tightened to the values returned by the optimizer; intervals without preferences retain their flexibility. After an all-pairs-shortest-path computation, the resulting output is a DO plan, which encodes a flexible scheduling policy that maximizes the global preference function F subject to the given binary temporal constraints bij.

Fig. 5. Pseudocode for the compilation algorithm

Pseudocode for the compilation algorithm is provided in Fig. 5. The first step (Line 1) of APAcompilePlan is to compute the all-pairs-shortest-path form of the STP using the Floyd-Warshall algorithm [2]. This process exposes implicit constraints and is necessary to ensure events are scheduled in the proper order with requisite temporal distances between events. The result of the APSP computation is a fully-connected network, with binary constraints relating each pair of events. Many of the added constraints are redundant and can be removed from the problem (Line 2) without loss of information [6]. Our empirical investigations indicate that pruning of redundant constraints reduces the total number of constraints by 40-50%. The resulting network is the most compact representation of the binary temporal constraints that still contains all feasible solutions present in the original problem [6].

In Line 3, we use the resulting representation as input to a standard, third-party optimization solver [1]. The STPP is formulated as a nonlinear program as follows. Events are encoded as variables with ranges that span the possible execution times computed by the APSP computation. Binary constraints are formulated as linear inequality constraints relating the variables. For the manufacturing applications we are interested in, the objective function is defined as Σbij fbij(t), the sum of the preference values evaluated across each binary interval constraint. The preference functions are permitted to be nonlinear, resulting in the nonlinear formulation. The solver returns an assignment of event execution times that optimizes the global preference value F subject to the given constraints bij.

Note that we do not use the output of the nonlinear optimizer directly to set the schedule, as this would provide no robustness to uncertainty and disturbance in the execution. Instead, we use the output as follows to reformulate the STPP and compute a temporally flexible, optimal scheduling policy.

In Line 4, the algorithm iterates through all constraints in the original STPP and makes a list given_prefs of those that have preference functions associated with them. Line 5 searches through each constraint bij in the partially compiled plan. If bij also exists in given_prefs, then bij is updated, setting both the upper and lower bounds of the constraint to the optimized time of execution (with a small tolerance built in). Finally, in Line 10, the APSP network is computed to expose implicit constraints of the tightened network. The result (Line 11) is a DO form of the STPP that preserves temporal flexibility in the network where there is no impact on the time assignments to preference values.

We now walk through an illustrative example for applying the compilation algorithm. Consider the STPP shown in Fig. 6. This network is an all-pairs-shortest-path graph (Line 1), with all implicit constraints exposed, and does not contain any redundant constraints (Line 2). Line 3 generates a list containing the following constraints with preference functions: bAD and bBC.

Line 4 creates a solver with variables for each event: A, B, C, D. All six intervals act as inequality constraints (e.g., for interval AC, we have 1 ≤ C − A ≤ 5). The objective function is given by

fglobal = −(D − A)² + 15(D − A) − 50 − (B − C)² + 2(B − C).   (1)
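Because the preferences in Eq. 1 are separable concave quadratics of interval durations, and nothing here couples the two preferred intervals away from their optima, the per-interval answer of the solver can be reproduced without Gurobi by clamping each quadratic's vertex into the interval's bounds. This is our simplification for illustration, not the paper's pipeline, with each preference written over its interval's duration t.

```python
def argmax_concave_quadratic(a, b, lb, ub):
    """Maximize a*t^2 + b*t (+ const), a < 0, over lb <= t <= ub.
    The unconstrained maximum is at the vertex -b/(2a); clamping it
    into [lb, ub] is optimal because the function is concave."""
    assert a < 0
    vertex = -b / (2.0 * a)
    return min(max(vertex, lb), ub)

# Preference intervals of the example, written over durations:
# t_AD = D - A in [5, 10] with preference -t^2 + 15t - 50,
# t_BC = duration of interval BC in [0, 2] with preference -t^2 + 2t.
intervals = {"AD": (-1.0, 15.0, 5.0, 10.0), "BC": (-1.0, 2.0, 0.0, 2.0)}

# Compile step: pin each preference interval to its optimal duration
# (APAcompilePlan then re-runs Floyd-Warshall on the tightened network).
tightened = {name: argmax_concave_quadratic(a, b, lb, ub)
             for name, (a, b, lb, ub) in intervals.items()}
print(tightened)  # {'AD': 7.5, 'BC': 1.0}
```

The tightened values match the AD[7.5, 7.5] and BC[1, 1] replacement performed in the compilation walkthrough.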
Fig. 6. Example STPP to illustrate compilation

The non-linear program is solved, and yields optimal execution times of A = 0, B = 3, C = 4, D = 7.5. Next, we create a new copy of the plan and replace intervals AD[5, 10] with AD[7.5, 7.5] and BC[0, 2] with BC[1, 1]. Performing Floyd-Warshall on this new network then produces the DO form of the STPP, given in Fig. 7. Any choice of times satisfying the constraints in Fig. 7 produces a solution that maximizes the global preference value fglobal.

Fig. 7. DO Form of the STPP in Fig. 6

Next, we provide a proof that the STPP-DO form computed by APAcompilePlan encodes all feasible solutions in the original STPP that are consistent with a given optimal time assignment to preference functions. In the next section, we discuss the method for dispatching the DO form of the STPP.

Lemma (STPP-DO Form): Given an STPP with an optimal time assignment tbij → fbij(t) to each preference function, (i) the STPP-DO form encodes all feasible solutions in the STPP that are consistent with tbij → fbij(t), and (ii) the STPP-DO form supports dispatchable scheduling.

Proof: (i) Lines 5-8 in APAcompilePlan tighten constraints in the original STPP, ensuring that any solution satisfies tbij → fbij(t) and achieves the optimal global preference value. Line 10 computes the all-pairs-shortest-path form of the resulting STP, which by definition contains all feasible solutions present in the original problem [6] that also satisfy tbij → fbij(t). (ii) The resulting STPP-DO form returned at Line 11 is an all-pairs-shortest-path STP, which by definition is also a dispatchable STP [6].

B. Dispatcher

In this section, we present a dispatcher algorithm that supports two functions: the dispatcher (Function 1) generates a schedule using the STPP DO form, and (Function 2) supports robust online reoptimization in response to changing preferences. The dispatcher takes as input an STP compiled_plan that encodes the DO form of an STPP S. The dispatcher schedules events on-the-fly just before they are executed while guaranteeing that the resulting schedule satisfies the temporal constraints of the plan. This guarantee is achieved through constraint propagation of temporal commitments to executed events. The output of the dispatcher is an assignment of event execution times that optimizes the global preference value F of the STPP S, subject to the given temporal constraints of S.

The dispatcher also supports robust online replanning of the DO form, in response to changing preference functions and disturbances in the optimal execution. In these situations, the DO form must be recompiled by calling the algorithm APAcompilePlan with the modified STPP S′. As discussed in Section V, this recompilation takes on the order of seconds for moderately-sized real-world problems.

Function 1 of the dispatcher is achieved using the standard STP dispatching algorithm [6]. Function 2 is achieved by augmenting the STP dispatching algorithm with two additional methods. The first method triggers recompilation for changing preference functions or deviations from the optimal schedule; the second method runs concurrently to ensure the dispatcher makes progress during recompilation and that the execution schedule satisfies the hard constraints of the STPP S.

Fig. 8 presents the STPP dispatching algorithm. Augmentations to the standard STP dispatching algorithm are highlighted. We walk through the dispatch of the DO plan in Fig. 7 to illustrate the algorithm.

First, in Line 1, all events without predecessors are added to the Enabled list. In our example from Fig. 7, event A is initially added to the Enabled list. In Line 2, the current time is set to zero. Line 3 contains the first major change to the standard dispatching algorithm. Here a concurrent thread is started to shadow-dispatch the STP associated with orig_plan. This thread is used to ensure the dispatcher makes progress during recompilation and that the execution schedule satisfies the hard constraints of orig_plan.

Dispatching continues until there are no unexecuted events in the plan (Line 4). If new preference functions are made available or the execution deviates from the optimal scheduling policy, recompilation is triggered (Line 5). Execution control is switched to the STPdispatch thread (Line 6). The orig_plan is updated with execution commitments (Line 7) and is compiled (Line 8). Next, execution control is transferred back to STPPdispatch (Line 9), and execution proceeds in Lines 11-25 according to the standard STP dispatching algorithm.

The dispatcher listens for notice of successful event executions from the robot (Line 13). Executed events are recorded in the Executed list and removed from the Enabled list (Lines 14-17). In Lines 18-23, the dispatcher commands an event to be executed if it is both enabled, meaning all predecessors have been executed, and alive, meaning the current time is within the event's feasible window of execution. In our example, at t = 0 event A is enabled and alive, and is executed.

If an event is executed (Line 24), the Enabled list is updated (Line 26), and the commitment is propagated through the network compiled_plan to update liveness windows for
all connected unexecuted events. With event A successfully executed, the liveness windows for events B, C, and D are updated to B: [3.5, 4.5], C: [2.5, 3.5], D: [7.5, 7.5]. Once A executes, event C is added to the Enabled list. Event C is alive when the current time is between 2.5 seconds and 3.5 seconds. Executing event C at 2.5 seconds then leads to the situation shown in Fig. 9; executed events are shown with squares around letters.

Fig. 8. Pseudocode for the dispatching algorithm

Fig. 9. Dispatching & propagation status after event A has been executed at t = 0 and event C has been executed at t = 2.5

With events A and C in the Executed list, event B becomes enabled and is executed at t = 3.5. This commitment is propagated forward, and event D is executed at t = 7.5. The resulting schedule maximizes the global preference value and satisfies the temporal constraints of the problem.

The signal-and-response structure (signal in Line 21 and robot response in Line 13) provides robustness in execution by allowing for situations that prevent the robot from completing the task at precisely the specified time. For example, consider event C is commanded at t = 2.5 but is delayed at execution until t = 3.0. The STPP DO form accommodates this disturbance on-the-fly through one-step constraint propagation. The liveness windows for events B and D are updated to B: [4.0, 4.0], D: [7.5, 7.5].

The compiler and dispatcher presented in Figs. 5 and 8 have been implemented and tested successfully. Section V presents an empirical evaluation of APA, and Section VI describes a robot demonstration applying APA to one-to-one human-robot teaming.

V. EMPIRICAL EVALUATION

The STPP DO form is designed to be temporally flexible, reducing the impact of disturbance on the schedule. In this section, we empirically investigate the benefit of this flexibility in two ways and compare the results to the non-linear programming (NLP) solution. We also present computation times for on-demand recompilation of the plan, showing that a robot using APA can quickly adapt its schedule in response to changing preferences.

Empirical results are produced using a random problem generator that creates structured problems in the same manner as prior art [10, 14]. The generator takes as input the number n of events, the number of user-specified constraints c, and the set P of preference functions. Each temporal constraint relating plan events is generated by randomly selecting two events from an array and connecting them with a binary interval constraint. Constraint upper and lower bounds are set randomly and then scaled by the difference in array indices between the two events. This creates a network that has a natural structure, with more distant events related through longer temporal durations than local events. Each preference function in P is assigned to a binary constraint in the order the constraints are generated. Following the precedent of previous work in STPPs [7], we consider preference functions of constant, linear, and quadratic form only. Only positive-valued preference functions are permitted. A randomized multiplier is applied to distinguish relative importance among preference functions. The output of the generator is an STPP, which is provided as input to the compiler. The APA compiler, dispatcher, and random problem generator are implemented in Java, and non-linear (here, quadratic) programs are solved using the Java implementation of Gurobi [1]. Results are generated using an Intel Core i7-2620M 2.70 GHz processor.

First, we run simulations to evaluate the cumulative time a robot spends re-computing the schedule in response to frequent small disturbances, for example, from a human co-worker that does not precisely follow the optimal scheduling policy. This measure represents the total execution time the robot spends unresponsive to the human co-worker's preferences for workflow. Fig. 10 presents results showing the worst-case cumulative compilation time for randomly-generated structured problems, in response to frequent small disturbances in the
optimal schedule. Each data point signifies the average and standard deviation across fifty randomly generated problems. Results were computed for problem sizes ranging from 25 to 250 events. The number of preference functions was set at 20% of the number of events, based on the observation that real-world problems typically have many fewer preference functions than events. Cumulative compilation time for the inflexible NLP approach scales with the number of events in the plan, whereas the STPP DO approach scales with the number of preference functions. The result is that the STPP DO form provides on average an 80% reduction in cumulative compilation time.

Fig. 10. Cumulative Compilation Time as a function of the number of events in the plan

Next, we compute a comparative measure of the temporal flexibility of both the STPP-DO form and the NLP solution relative to the original STPP. We compiled 50 random problems and compared the resulting interval durations to the original problems' interval durations. This ratio then represents the percentage of flexibility retained from the original problem; higher values of this ratio correspond to an increased robustness to disturbances during execution. We compare this to the flexibility ratio for the NLP-specified schedule on the same 50 problems; Fig. 11 presents the results. The DO form captures on average more than 70% of the temporal flexibility in the original plan, whereas the NLP solution captures less than 1%. The DO form provides a marked improvement in robustness to disturbance over the NLP solution while achieving global optimization of the schedule.

Fig. 11. Plan Flexibility of DO Form and NLP Solution

Finally, we present the computation times for single on-demand recompilation of the plan. These results simulate the execution latency associated with operator-specified changes to the workflow. Fig. 12 presents the compilation time results for randomly-generated structured problems ranging in size from 50 to 1000 events. The number of preference functions is set at 20% of the number of events. We empirically analyzed the impact of the number of preference functions, ranging from 20% to 80% of the number of events, and found no significant effect on performance. Instead, the number of temporal constraints appears to be the primary driver of computation time.

Fig. 12. Compilation Time as a function of the number of events in the plan

The results show satisfactory compilation times on the order of seconds for problems with hundreds of events. Compilation time is less than five seconds for problems with 400 events and less than 1 second for 150 events or less. These results provide sufficient capability for one-to-one human-robot collaboration, indicating a robot can adaptively schedule its actions over a horizon of approximately 75 activities with sub-second speed. Multi-robot orchestration involves problems one to two orders of magnitude larger, with thousands to tens of thousands of events. The current non-linear programming approach does not scale to these sizes, and we are exploring techniques to improve the efficiency of the compilation. The bottleneck in the algorithm's speed lies in the non-linear optimizer. We are currently investigating the reformulation of the non-linear program as an approximate linear program. This approach is feasible for non-linear programs where the non-linear terms are found only in the objective function, and the objective function is composed of separable, concave terms. Both of these conditions hold for the described approach, if the preference functions strictly relate event execution times to the plan start.

VI. ROBOTIC DEMONSTRATION

We have applied the Adaptive Preferences Algorithm to perform human-robot teaming using a small ABB IRB 120 industrial robot (set-up shown in Fig. 13). This demonstration is based on the spar building application described in Section II. The robot's job is to apply sealant to each hole, and the mechanic places and torques the fasteners. The mechanic and robot must work together to ensure that each fastener is placed within three seconds of sealant application. This requires that the robot adapt to the timing of the mechanic's actions to avoid applying the sealant too early. One set of workers, group A, likes to place all fasteners before torquing them. The other set, group B, likes to place and torque each fastener before moving on to the next. The robot uses APA to
adaptively schedule its actions based on the type of worker it is paired with; worker-type is inferred from the timing of the mechanic's actions. Specifically, APA tracks the two different sets of preference functions and switches to the set that achieves the maximum possible global preference value. The STPP representation of this joint human-robot plan is shown in Figure 14. Video of the demonstration can be found at http://tinyurl.com/7n439eg.

Fig. 13. Demonstration Set-up

Fig. 14. STPP for the Robotic Demonstration; the f preference functions correspond to group A workers, while the preferences g correspond to group B workers

Trials of human-robot teaming demonstrated that the robot was successfully able to adapt its schedule to both types of workers. When a group A mechanic performed the assembly task, the robot applied the sealant at regular intervals every 3 seconds to keep just ahead of the mechanic, allowing the mechanic to place the fasteners in the holes before the sealant dried. When a group B mechanic performed the task, the robot began by applying the sealant every 3 seconds. However, once it sensed that the mechanic had torqued the first fastener before inserting the second, the robot recompiled its schedule using group B preferences. The robot changed its pace to match the mechanic's using the newly computed flexible optimal scheduling policy. This required slowing down the rate of sealant application to every 7 seconds. Using the Adaptive Preferences Algorithm, the robot was able to make on-the-fly decisions about how to most effectively aid each worker.

VII. CONCLUSIONS AND FUTURE WORK

In this paper, we introduce the Adaptive Preferences Algorithm (APA) that computes a flexible optimal policy for robot scheduling and control in assembly manufacturing. We show through empirical evaluation that the method is fast, adaptable, and robust to uncertainty in execution and supports interaction with human co-workers and supervisors whose preferences about task completion are prone to change. We also demonstrate the use of APA in human-robot teaming trials using a small industrial robot and show that the robot successfully adapts to the task preferences of different types of workers. As future work, we are investigating an approximate linear program formulation of the problem, with the aim of scaling up APA to multi-robot orchestration problems involving thousands to tens of thousands of events.

REFERENCES

[1] Gurobi optimizer version 5.0, April 2012.
[2] R. Dechter, I. Meiri, and J. Pearl. Temporal constraint networks. Artif. Intell., 49(1):61–91, 1991.
[3] A. Hofmann and B. Williams. Robust Execution of Temporally Flexible Plans for Bipedal Walking Devices. In Proc. ICAPS, pages 386–389, 2006.
[4] L. Khatib, P. Morris, R. Morris, and F. Rossi. Temporal constraint reasoning with preferences. In Proc. IJCAI, pages 322–327, 2001.
[5] P. Morris, R. Morris, L. Khatib, S. Ramakrishnan, and A. Bachmann. Strategies for global optimization of temporal preferences. In Proc. CP, pages 408–422. Springer, 2004.
[6] N. Muscettola, P. Morris, and I. Tsamardinos. Reformulating Temporal Plans for Efficient Execution. In Proc. KRR, 1998.
[7] F. Rossi, K. B. Venable, L. Khatib, P. Morris, and R. Morris. Two Solvers for Tractable Temporal Constraints with Preferences. In Proc. AAAI Workshop on Preference in AI and CP, 2002.
[8] J. Shah. Fluid Coordination of Human-Robot Teams. PhD thesis, MIT, Cambridge, Massachusetts, 2010.
[9] J. Shah and C. Breazeal. An Empirical Analysis of Team Coordination Behaviors and Action Planning with Application to Human-Robot Teaming. Human Factors, 52, 2010.
[10] J. Shah, J. Stedl, B. Williams, and P. Robertson. A Fast Incremental Algorithm for Maintaining Dispatchability of Partially Controllable Plans. In Proc. ICAPS, 2007.
[11] J. Shah, J. Wiken, B. Williams, and C. Breazeal. Improved Human-Robot Team Performance Using Chaski, a Human-inspired Plan Execution System. In Proc. ACM/IEEE HRI, pages 29–36, 2011.
[12] J. Stedl. Managing Temporal Uncertainty Under Limited Communication: A Formal Model of Tight and Loose Team Communication. Master's thesis, MIT, 2004.
[13] I. Tsamardinos and N. Muscettola. Fast transformation of temporal plans for efficient execution. In Proc. AAAI, 1998.
[14] I. Tsamardinos and M. Pollack. Efficient solution techniques for disjunctive temporal reasoning problems. Artif. Intell., 151:43–89, December 2003.
Optimal Control with Weighted Average Costs and
Temporal Logic Specifications
Eric M. Wolff, Ufuk Topcu, and Richard M. Murray
Control and Dynamical Systems, California Institute of Technology, Pasadena, California 91125
Email: [email protected]
Abstract—We consider optimal control for a system subject to temporal logic constraints. We minimize a weighted average cost function that generalizes the commonly used average cost function from discrete-time optimal control. Dynamic programming algorithms are used to construct an optimal trajectory for the system that minimizes the cost function while satisfying a temporal logic specification. Constructing an optimal trajectory takes only polynomially more time than constructing a feasible trajectory. We demonstrate our methods on simulations of autonomous driving and robotic surveillance tasks.

I. INTRODUCTION

As the level of autonomy expected of robots, vehicles, and other cyberphysical systems increases, there is a need for expressive task-specification languages that can encode desired behaviors. Temporal logics such as linear temporal logic (LTL) are promising formal languages for robotics. LTL provides a natural framework to specify desired properties such as response (if A, then B), liveness (always eventually A), safety (always not B), stability (eventually always A), and priority (first A, then B, then C).

Temporal logics have been used in the robotics and control communities to reason about system properties. A framework for automatically generating controllers for linear systems that satisfy a given LTL specification is presented in [16]. Sampling-based approaches for more general dynamical systems are given in [11, 19]. Controllers can be constructed that satisfy a specification in the presence of an adversarial environment [17], and receding horizon control can reduce the resulting computational complexity of synthesizing such control policies [23]. While these approaches all generate feasible control policies that satisfy a temporal logic specification, no practical optimality notions can be imposed in their settings.

Often there are numerous control policies for a system that satisfy a given temporal logic specification, so it is desirable to select one that is optimal with respect to some cost function, e.g., time or fuel consumption. Since temporal logic specifications include properties that must be satisfied over infinite state sequences, it is important that the form of the cost function is also well-defined over infinite sequences.

repeatedly visit a set of regions, but visit a high-weight region more often than others. Thus, we minimize a weighted average cost function over system trajectories subject to the constraint that a given temporal logic specification is satisfied. This cost function generalizes the average cost-per-stage cost function commonly studied in discrete-time optimal control [4].

Optimality has been considered in the related area of vehicle routing [21]. Vehicle routing problems generalize the traveling salesman problem, and are thus NP-complete. A different approach to control with LTL specifications converts the controller design problem into a mixed-integer linear program [12]. However, this approach is restricted to properties specified over a finite horizon. Chatterjee et al. [5] create control policies that minimize an average cost function in the presence of an adversary. The approach in [20] is the most closely related to our work. Motivated by surveillance tasks, they minimize the maximum cost between visiting specific regions. Our work is complementary to [20] in that we instead minimize a weighted average cost function.

The main contribution of this paper is a solution to the problem of, given a transition system model, creating a system trajectory that minimizes a weighted average cost function subject to temporal logic constraints. We solve this problem by searching for system trajectories in the product automaton, a lifted space that contains only behaviors that are valid for the transition system and also satisfy the temporal logic specification. An optimal system trajectory corresponds to a cycle in the product automaton, which is related to the well-studied cost-to-time ratio problem in operations research. We give computationally efficient dynamic programming algorithms for finding the optimal system trajectory. In fact, it takes only polynomially more effort to calculate an optimal solution than a feasible solution, i.e., one that just satisfies the specification.

We present preliminaries on system models, task-specification languages, and graph theory in Section II. The main problem is formulated in Section III and reformulated as an equivalent graph problem in Section IV. We solve this equivalent problem for finite-memory and infinite-memory policies in Sections V-A and V-B respectively. These tech-
We consider an average cost, which is bounded under certain niques are demonstrated on numerical examples motivated by
mild assumptions discussed in Section 1. Additionally, it may autonomous driving and surveillance tasks in Section VI. We
be desired to give varying weights to different behaviors, i.e., conclude with future directions in Section VII.

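As a concrete preview of the cost criterion developed in Section III, the weighted average cost of a periodic run can be approximated by truncating the lim sup at a large horizon. The following minimal sketch is ours, not the authors' code; it reproduces the two-state example that Section III works analytically (J = 1.5 with unit weights, and J = 3 when w(s1, s0) = 0):

```python
from itertools import islice


def weighted_average_cost(run, c, w, horizon=100000):
    """Approximate J(sigma) = limsup sum(c)/sum(w) by truncating the run.

    `run` is an infinite iterator of states; `c` and `w` map transitions
    (s, t) to their cost and weight.
    """
    states = list(islice(run, horizon + 1))
    transitions = list(zip(states, states[1:]))
    total_c = sum(c[t] for t in transitions)
    total_w = sum(w[t] for t in transitions)
    return total_c / total_w


def periodic(cycle):
    """The run (cycle)^omega as an infinite iterator."""
    while True:
        yield from cycle


c = {("s0", "s1"): 1, ("s1", "s0"): 2}
w = {("s0", "s1"): 1, ("s1", "s0"): 1}
print(weighted_average_cost(periodic(["s0", "s1"]), c, w))  # -> 1.5

w[("s1", "s0")] = 0  # only count transitions out of s0
print(weighted_average_cost(periodic(["s0", "s1"]), c, w))  # -> 3.0
```

Because the run is periodic and the horizon is a multiple of the period, the truncated ratios here are exact rather than approximate.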
II. PRELIMINARIES

We now provide preliminaries on the modeling and specification languages, weighted transition systems and linear temporal logic respectively, used throughout the paper.

An atomic proposition is a statement that has a unique truth value (True or False).

A. Modeling Language

We use finite transition systems to model the system behavior. In robotics, however, one is usually concerned with continuous systems that may have complex dynamic constraints. This gap is partially bridged by constructive procedures for exactly abstracting relevant classes of continuous systems, including unicycle models, as finite transition systems [3, 2, 9]. Additionally, sampling-based methods, such as rapidly-exploring random trees [18] and probabilistic roadmaps [15], gradually build a finite transition system that approximates a continuous system, and have been studied in this context [11, 19]. Examples of how one can abstract continuous dynamics by discrete transition systems are given in [2, 3, 9, 11, 19].

Definition 1. A weighted (finite) transition system is a tuple T = (S, R, s0, AP, L, c, w) consisting of (i) a finite set of states S, (ii) a transition relation R ⊆ S × S, (iii) an initial state s0 ∈ S, (iv) a set of atomic propositions AP, (v) a labeling function L : S → 2^AP, (vi) a cost function c : R → R, (vii) and a weight function w : R → R≥0.

We assume that the transition system is non-blocking, so for each state s ∈ S, there exists a state t ∈ S such that (s, t) ∈ R.

A run of the transition system is an infinite sequence of its states, σ = s0 s1 s2 . . . where si ∈ S is the state of the system at index i (also denoted σi) and (si, si+1) ∈ R for i = 0, 1, . . .. A word is an infinite sequence of labels L(σ) = L(s0)L(s1)L(s2) . . . where σ = s0 s1 s2 . . . is a run.

Let Sk+ be the set of all runs up to index k. An infinite-memory control policy is denoted by π = (μ0, μ1, . . .) where μk : Sk+ → R maps a partial run s0 s1 . . . sk ∈ Sk+ to a new transition. A policy π = (μ, μ, . . .) is finite-memory if μ : S × M → R × M, where the finite set M is called the memory.

For the deterministic transition system models we consider, the run of a transition system implicitly encodes the control policy. An infinite-memory run is a run that can be implemented by an infinite-memory policy. Similarly for finite-memory runs.

When possible, we will sometimes abuse notation and refer to the cost c(t) of a state t ∈ S, instead of a transition between states. In this case, we enforce that c(s, t) = c(t) for all transitions (s, t) ∈ R, i.e., the cost of the state is mapped to all incoming transitions. Similar notational simplification is used for weights and should be clear from context.

The cost function c can be viewed concretely as a physical cost of a transition between states, such as time or fuel. This cost can be negative for some transitions, which could, for example, correspond to refueling if the cost is fuel consumption. The weight function w can be viewed as the importance of each transition, which is a flexible design parameter. In the sequel, we will create a run with the minimal weighted average cost. Thus, the designer can give transitions that she thinks are preferable a higher weight than the rest. As an example, consider an autonomous car that is supposed to visit different locations in a city while obeying the rules-of-the-road. In this case, a task specification would encode the locations that should be visited and the rules-of-the-road. Costs might be the time required to traverse different roads. Weights might encode preferences such as visiting certain landmarks. An example scenario is discussed in detail in Section VI.

B. Specification Language

We are interested in using linear temporal logic (LTL) to concisely and unambiguously specify desired system behavior. LTL is a powerful formal language that is relevant to robotics because it allows system behaviors such as response, liveness, safety, stability, priority, and guarantee to be specified. However, the syntax and semantics of LTL are not relevant for the theory developed in this paper, so we only mention them as needed for the examples in Section VI. The interested reader can find a comprehensive treatment of LTL in [1]. Instead, we follow the automata-based approach of Vardi and Wolper [22], and consider non-deterministic Buchi automata (hereafter called Buchi automata), which accept the class of languages equivalent to ω-regular languages. Thus, our results hold for any property that can be specified as an ω-regular language, which is a regular language extended by infinite repetition (denoted by ω). In particular, LTL is a subset of ω-regular languages, so an equivalent Buchi automaton can be constructed for any LTL formula ϕ [1].

Definition 2. A Buchi automaton is a tuple A = (Q, Σ, δ, Q0, Acc) consisting of (i) a finite set of states Q, (ii) a finite alphabet Σ, (iii) a transition relation δ ⊆ Q × Σ × Q, (iv) a set of initial states Q0 ⊆ Q, (v) and a set of accepting states Acc ⊆ Q.

Let Σ^ω be the set of infinite words over Σ. A run for σ = A0 A1 A2 . . . ∈ Σ^ω denotes an infinite sequence q0 q1 q2 . . . of states in A such that q0 ∈ Q0 and (qi, Ai, qi+1) ∈ δ for i ≥ 0. Run q0 q1 q2 . . . is accepting (accepted) if qi ∈ Acc for infinitely many indices i ∈ N appearing in the run.

Intuitively, a run is accepted by a Buchi automaton if a state in Acc is visited infinitely often.

We use the definition of an accepting run in a Buchi automaton and the fact that every LTL formula ϕ can be represented by an equivalent Buchi automaton Aϕ to define satisfaction of an LTL formula ϕ.

Definition 3. Let Aϕ be a Buchi automaton corresponding to the LTL formula ϕ. A run σ = s0 s1 s2 . . . in T satisfies ϕ, denoted by σ ⊧ ϕ, if the word L(σ) is accepted by Aϕ.

C. Graph Theory

This section lists basic definitions for graphs that will be necessary later. Let G = (V, E) be a directed graph (digraph) with |V| vertices and |E| edges. Let e = (u, v) ∈ E denote an edge from vertex u to vertex v. A walk is a finite edge
sequence e0, e1, . . . , ep, and a cycle is a walk in which the initial vertex is equal to the final vertex. A path is a walk with no repeated vertices, and a simple cycle is a path in which the initial vertex is equal to the final vertex.

A digraph G = (V, E) is strongly connected if there exists a path between each pair of vertices s, t ∈ V. A digraph G′ = (V′, E′) is a subgraph of G = (V, E) if V′ ⊆ V and E′ ⊆ E. A digraph G′ ⊆ G is a strongly connected component if it is a maximal strongly connected subgraph of G.

III. PROBLEM STATEMENT

In this section, we formally state the main problem of the paper and give an overview of our solution approach. Let T = (S, R, s0, AP, L, c, w) be a weighted transition system and ϕ be an LTL specification defined over AP.

Definition 4. Let σ be a run of T where σi is the state at the i-th index of σ. The weighted average cost of run σ is

J(σ) := lim sup_{n→∞} ( ∑_{i=0}^{n} c(σi, σi+1) ) / ( ∑_{i=0}^{n} w(σi, σi+1) ),   (1)

where J maps runs of T to R ∪ {∞}.

Since LTL specifications are typically defined over infinite sequences of states, we consider the (weighted) average cost function in (1) to ensure that the cost function is bounded. This cost function is well-defined when (i) c(σi, σi+1) < ∞ for all i ≥ 0, and (ii) there exists a j ∈ N such that w(σi, σi+1) > 0 for infinitely many i ≥ j, which we assume is true for the sequel. Assumption (ii) enforces that a run does not eventually visit only states with zero weights.

To better understand the weighted average cost function J, consider the case where w(s, t) = 1 for all transitions (s, t) ∈ R. Let a cost c(s, t) be arbitrarily fixed for each transition (s, t) ∈ R. Then, J(σ) is the average cost per transition between states (or average cost per stage). If w(s, t) = 1 for states s, t ∈ S′ ⊂ S and w(s, t) = 0 for states in S − S′, then J(σ) is the mean time per transition between states in S′.

As an example, consider σ = (s0 s1)ω where s0 s1 repeats indefinitely. Let c(s0, s1) = 1, c(s1, s0) = 2, w(s0, s1) = 1, and w(s1, s0) = 1. Then, J(σ) = 1.5 is the average cost per transition. Now, let w(s1, s0) = 0. Then, J(σ) = 3 is the average cost per transition from s0 to s1.

The weighted average cost function is more natural than the minimax cost function of Smith et al. [20] in some applications. For example, consider an autonomous vehicle repeatedly picking up people and delivering them to a destination. It takes a certain amount of fuel to travel between discrete states, and each discrete state has a fixed number of people that need to be picked up. A natural problem formulation is to minimize the fuel consumption per person picked up, which is a weighted average cost where fuel is the cost and the number of people is the weight. The cost function in [20] cannot adequately capture this task.

Definition 5. An optimal satisfying finite-memory run of T is a run σ∗ such that

J(σ∗) = inf {J(σ) | σ is a finite-memory run of T, σ ⊧ ϕ},   (2)

i.e., run σ∗ achieves the infimum in (2).

An optimal satisfying infinite-memory run is defined similarly for infinite-memory runs of T.

Although we show in Section V-B that infinite-memory runs are generally necessary to achieve the infimum in (2), we focus on finite-memory runs, as these are more practical than their infinite-memory counterparts. However, finding an optimal satisfying finite-memory run is potentially ill-posed, as the infimum might not be achieved due to the constraint that the run must also satisfy ϕ. This happens when it is possible to reduce the cost of a satisfying run by including an arbitrarily long, low weighted average cost subsequence. For instance, consider the run σ = (s0 s0 s1)ω. Let c(s0, s0) = 1, c(s1, s0) = c(s0, s1) = 2 and the weights equal 1 for each transition. Assume that a specification is satisfied if s1 is visited infinitely often. Then, J(σ) can be reduced by including an arbitrarily large number of self transitions from s0 to s0 in σ, even though these do not affect satisfaction of the specification. Intuitively, one should restrict these repetitions to make finding an optimal satisfying finite-memory run well-posed. We will show that one can always compute an ε-suboptimal finite-memory run by restricting the length of these repetitions. We defer the details to Section IV, when we will have developed the necessary technical machinery.

Problem 1. Given a weighted transition system T and an LTL specification ϕ, compute an optimal satisfying finite-memory run σ∗ of T if one exists.

Remark 1. For completeness, we show how to compute optimal satisfying infinite-memory runs in Section V-B. These runs achieve the minimal weighted average cost, but do so by adding arbitrarily long progressions of states that do not change whether or not the specification is satisfied.

IV. REFORMULATION OF THE PROBLEM

We solve Problem 1 by first creating a product automaton that represents runs that are allowed by the transition system T and also satisfy the LTL specification ϕ. We can limit our search for finite-memory runs, without loss of generality, to runs in the product automaton that are of the form σP = σpre(σsuf)ω. Here σpre is a finite walk and σsuf is a finite cycle that is repeated infinitely often. Runs with this structure are said to be in prefix-suffix form. We observe that the weighted average cost only depends on σsuf, which reduces the problem to searching for a cycle σsuf in the product automaton. This search can be done using dynamic programming techniques for finite-memory runs. The optimal accepting run σP∗ is then projected back on T as σ∗, which solves Problem 1.

A. Product Automaton

We use the standard product automaton construction, due to Vardi and Wolper [22], to represent runs that are allowed by the transition system and satisfy the LTL specification.
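This construction, formalized in Definition 6 below, is mechanical enough to sketch in a few lines. The following is our illustration, not the authors' implementation: T and A are plain dictionaries and sets, labels are frozensets of atomic propositions, and only product states incident to a transition are kept (Definition 6 formally takes SP = S × Q):

```python
def product_automaton(R, L, c, w, s0, delta, Q0, Acc):
    """Product P = T x A: pair T-transitions with A-transitions on L(s).

    R: set of T-transitions (s, s'); L: state -> frozenset of propositions;
    c, w: dicts mapping T-transitions to cost/weight;
    delta: set of A-transitions (q, label, q'); Q0, Acc: state sets of A.
    """
    trans = {}
    for (s, s2) in R:
        for (q, label, q2) in delta:
            if label == L[s]:
                # The product edge ((s,q), (s2,q2)) inherits cost/weight from T.
                trans[((s, q), (s2, q2))] = (c[(s, s2)], w[(s, s2)])
    states = {p for edge in trans for p in edge}
    init = {(s0, q0) for q0 in Q0}
    accepting = {(s, q) for (s, q) in states if q in Acc}
    return states, trans, init, accepting


# Tiny example: T cycles s0 -> s1 -> s0 with proposition "a" holding at s1;
# A is a two-state Buchi automaton whose accepting state q1 is entered on "a".
a, na = frozenset({"a"}), frozenset()
R = {("s0", "s1"), ("s1", "s0")}
L = {"s0": na, "s1": a}
c = {("s0", "s1"): 1, ("s1", "s0"): 2}
w = {("s0", "s1"): 1, ("s1", "s0"): 1}
delta = {("q0", na, "q0"), ("q0", a, "q1"), ("q1", na, "q0"), ("q1", a, "q1")}
states, trans, init, accepting = product_automaton(
    R, L, c, w, "s0", delta, {"q0"}, {"q1"})
```

In this toy product, the cycle (s0, q1) → (s1, q0) → (s0, q1) visits the accepting state (s0, q1) infinitely often and projects to the run (s0 s1)ω in T.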
Definition 6. Let T = (S, R, s0, AP, L, c, w) be a weighted transition system and A = (Q, 2^AP, δ, Q0, Acc) be a Buchi automaton. The product automaton P = T × A is the tuple P := (SP, δP, AccP, sP,0, APP, LP, cP, wP), consisting of

(i) a finite set of states SP = S × Q,
(ii) a transition relation δP ⊆ SP × SP, where ((s, q), (s′, q′)) ∈ δP if and only if (s, s′) ∈ R and (q, L(s), q′) ∈ δ,
(iii) a set of accepting states AccP = S × Acc,
(iv) a set of initial states SP,0, with (s0, q0) ∈ SP,0 if q0 ∈ Q0,
(v) a set of atomic propositions APP = Q,
(vi) a labeling function LP : S × Q → 2^Q,
(vii) a cost function cP : δP → R, where cP((s, q), (s′, q′)) = c(s, s′) for all ((s, q), (s′, q′)) ∈ δP, and
(viii) a weight function wP : δP → R≥0, where wP((s, q), (s′, q′)) = w(s, s′) for all ((s, q), (s′, q′)) ∈ δP.

A run σP = (s0, q0)(s1, q1) . . . is accepting if (si, qi) ∈ AccP for infinitely many indices i ∈ N.

The projection of a run σP = (s0, q0)(s1, q1) . . . in the product automaton P is the run σ = s0 s1 . . . in the transition system. The projection of a finite-memory run in P is a finite-memory run in T [1].

The following proposition relates accepting runs in T and P and is due to Vardi and Wolper [22].

Proposition 1. ([22]) Let Aϕ be a Buchi automaton corresponding to the LTL formula ϕ. For any accepting run σP = (s0, q0)(s1, q1) . . . in the product automaton P = T × Aϕ, its projection σ = s0 s1 . . . in the transition system T satisfies ϕ. Conversely, for any run σ = s0 s1 . . . in T that satisfies ϕ, there exists an accepting run σP = (s0, q0)(s1, q1) . . . in the product automaton.

Lemma 1. For any accepting run σP in P and its projection σ in T, J(σP) = J(σ). Conversely, for any σ in T that satisfies ϕ, there exists an accepting run σP in P with J(σP) = J(σ).

Proof: Consider a run σP = (s0, q0)(s1, q1) . . . in P. By definition, for states (si, qi), (si+1, qi+1) ∈ SP and si, si+1 ∈ S, the cost cP((si, qi), (si+1, qi+1)) = c(si, si+1) and the weight wP((si, qi), (si+1, qi+1)) = w(si, si+1) for all i ≥ 0, so J(σP) = J(σ). Now consider a run σ = s0 s1 . . . in T that satisfies ϕ. Proposition 1 gives the existence of an accepting run σP = (s0, q0)(s1, q1) . . . in P, and so J(σP) = J(σ).

By Lemma 1, an accepting run σP∗ with minimal weighted average cost in the product automaton has a projection in the transition system σ∗ that is a satisfying run with minimal weighted average cost.

B. Prefix-Suffix Form

We show that Problem 1 is equivalent to finding a run of the form σP = σpre(σsuf)ω in the product automaton P that minimizes the weighted average cost function (1). We equivalently treat the product automaton as a graph when convenient. Our analysis and notation in this section is similar to that of [20]; we optimize a different cost function on the Vardi and Wolper [22] product automaton construction.

Definition 7. Let σpre be a finite walk in P and σsuf be a finite cycle in P. A run σP is in prefix-suffix form if it is of the form σP = σpre(σsuf)ω.

It is well-known that if there exists an accepting run in P for an LTL formula ϕ, then there exists an accepting run in prefix-suffix form for ϕ [1]. This can be seen since the product automaton P is finite, but an accepting run is infinite and visits an accepting state infinitely often. Thus, at least one accepting state must be visited infinitely often, and this can correspond to a repeated cycle including the accepting state. For an accepting run σP, the suffix σsuf is a cycle in the product automaton P that satisfies the acceptance condition, i.e., it includes an accepting state. The prefix σpre is a finite run from an initial state sP,0 to a state on an accepting cycle.

The following lemma shows that a minimum weighted average cost run can be found searching over finite-memory runs of the form σP = σpre(σsuf)ω.

Lemma 2. There exists at least one accepting finite-memory run σP of P that minimizes J and is in prefix-suffix form, i.e., σP = σpre(σsuf)ω.

Proof: Let σgen be an accepting finite-memory run in P that is not in prefix-suffix form and has weighted average cost J(σgen). Since σgen is accepting, it must visit an accepting state sacc ∈ SP infinitely often. Let the finite walk σpre be from an initial state sP,0 to the first visit of sacc. Now consider the set of walks between successive visits to sacc. Each walk starts and ends at sacc (so it is a cycle), is finite with bounded length, and has a weighted average cost associated with it. For each cycle τ, compute the weighted average cost J(τ^ω). Let σsuf be the finite cycle with the minimum weighted average cost over all τ. Then, J(σP) = J(σpre(σsuf)ω) ≤ J(σgen). Since σgen was arbitrary, the claim follows.

The next proposition shows that the weighted average cost of a run does not depend on any finite prefix of the run.

Proposition 2. Let σ = s0 s1 . . . be a run (in T or P) and σk:∞ = sk sk+1 . . . be the run σ starting at index k ∈ N. Then, their weighted average costs are equal, i.e., J(σ) = J(σk:∞).

Proof: From Definition 1, costs and weights depend only on the transition, not the index. Also, from the assumptions that directly follow equation (1), transitions with positive weight occur infinitely often. Thus,

J(σ) := lim sup_{n→∞} ( ∑_{i=0}^{n} c(σi, σi+1) ) / ( ∑_{i=0}^{n} w(σi, σi+1) )
      = lim sup_{n→∞} ( ∑_{i=0}^{k−1} c(σi, σi+1) + ∑_{i=k}^{n} c(σi, σi+1) ) / ( ∑_{i=0}^{k−1} w(σi, σi+1) + ∑_{i=k}^{n} w(σi, σi+1) )
      = lim sup_{n→∞} ( ∑_{i=k}^{n} c(σi, σi+1) ) / ( ∑_{i=k}^{n} w(σi, σi+1) ) = J(σk:∞).

From Proposition 2, finite prefixes do not contribute to the weighted average cost function, so
J(σpre(σsuf)ω) = J((σsuf)ω). Thus, one can optimize over the suffix σsuf, which corresponds to an accepting cycle in the product automaton. Given an optimal accepting cycle σsuf∗, one then computes a walk from an initial state to σsuf∗.

We now define a weighted average cost function for finite walks in the product automaton that is analogous to (1).

Definition 8. The weighted average cost of a finite walk σP = (s0, q0)(s1, q1) . . . (sm, qm) in the product automaton is

J̃(σP) := ( ∑_{i=0}^{m−1} cP(σi, σi+1) ) / ( ∑_{i=0}^{m−1} wP(σi, σi+1) ),   (3)

with similar assumptions on c and w as for equation (1).

Problem 2. Let acc(P) be the set of all accepting cycles in the product automaton P reachable from an initial state. Find a suffix σsuf∗ where J̃(σsuf∗) = inf_{σP ∈ acc(P)} J̃(σP) if it exists.

Proposition 3. Let σP∗ = σpre(σsuf∗)ω be a solution to Problem 2. The projection to the transition system of any optimal accepting run σP∗ is a solution to Problem 1.

Proof: From Lemma 2, there exists an accepting run σP = σpre(σsuf)ω that minimizes J. From Proposition 2 and equation (3), J(σP) = J((σsuf)ω) = J̃(σsuf).

We now pause to give a high-level overview of our approach to solving Problem 1, using its reformulation as Problem 2. The major steps are outlined in Algorithm 1. First, a Buchi automaton Aϕ corresponding to the LTL formula ϕ is created. Then, we create the product automaton P = T × Aϕ. Reachability analysis on P determines, in linear time in the size of P, all states that can be reached from an initial state, and thus guarantees existence of a finite prefix σpre to all remaining states. Next, we compute the strongly connected components (scc) of P, since two states can be on the same cycle only if they are in the same strongly connected component. This partitions the original product automaton into sub-graphs, each of which can be searched independently for optimal cycles.

For each strongly connected component of P, we compute the cycle σsuf with the minimum weighted average cost, regardless of whether or not it is accepting (see Section V-B). This is the infimum of the minimum weighted average cost over all accepting cycles. If this cycle is accepting, then the infimum is achieved by a finite-memory run. If not, then the infimum is not achieved by a finite-memory run and thus we must further constrain the form of the suffix σsuf to make the optimization well-posed.

A natural choice is finite-memory policies, which correspond to bounding the length of σsuf. We can solve for the optimal accepting σsuf subject to this additional constraint using dynamic programming techniques. The optimal accepting σsuf over all strongly connected components is σsuf∗. Given σsuf∗, we compute a finite walk σpre from an initial state to any state on σsuf∗. The finite walk σpre is guaranteed to exist due to the initial reachability computation. The optimal run in the product automaton is then σP∗ = σpre(σsuf∗)ω. The projection of σP∗ to the transition system as σ∗ solves Problem 1, given the additional constraint that σsuf has bounded length.

Algorithm 1 Overview of solution approach
Input: Weighted transition system T and LTL formula ϕ
Output: Run σ∗, a solution to Problem 1
  Create Buchi automaton Aϕ
  Create product automaton P = T × Aϕ
  Compute states in P reachable from an initial state
  Calculate strongly connected components (scc) of P
  for scc ∈ P do
    Let σsuf∗ = arg inf {J̃(σ) | σ is a cycle in the scc}
    if σsuf∗ is an accepting cycle then
      break {finite-memory run achieves infimum}
    end if
    Find best bounded-length accepting σsuf∗ over all sacc ∈ scc (Section V)
  end for
  Take optimal σsuf∗ over all sccs
  Compute finite prefix σpre from initial state to σsuf∗
  Project run σP∗ = σpre(σsuf∗)ω to T as σ∗

Remark 2. In Section V, we treat the product automaton as a graph GP = (VP, EP), with the natural bijections between states SP and vertices VP and between edges (u, v) ∈ EP and transitions in δP. We further assume that a reachability computation has been done, so that GP only includes states reachable from an initial state sP,0. We assume that GP is strongly connected. If not, the strongly connected components of P can be found in O(|VP| + |EP|) time with Tarjan's algorithm [6]. To compute the optimal cycle for the entire graph, one finds the optimal cycle in each strongly connected component and then selects the optimal over all strongly connected components. We denote each strongly connected component of GP by G = (V, E), where n = |V| and m = |E|.

V. SOLUTION APPROACH

In this section, we give algorithms for computing optimal finite-memory and infinite-memory runs. We assume that G = (V, E) is a strongly connected component of the product automaton P and has at least one accepting state. The techniques we adapt were originally developed for the minimum cost-to-time ratio problem [7, 8, 10, 13, 14].

A. Computing finite-memory runs

We present two related algorithms (that find an optimal accepting cycle σsuf∗) in increasing levels of generality. While the algorithm in Section V-A2 subsumes the first algorithm, the first one is more intuitive and computationally efficient when the weight function is a constant function.

1) Minimum mean cycle: We first investigate the case where w(e) = 1 for all e ∈ E, so the total weight of a walk is equivalent to the number of transitions. This is similar to the problem investigated by Karp [13], with the additional constraint that the cycle must be accepting. This additional constraint prevents a direct application of Karp's theorem [13], but our approach is similar. The intuition is that, conditional
on the weight of a walk, the minimum cost walk gives the minimum average cost walk.

Let s ∈ V be an accepting vertex (i.e., accepting state). For every v ∈ V, let Fk(v) be the minimum cost of a walk of length k ∈ N from s to v. Thus, Fk(s) is the minimum cost cycle of length k, which we note is accepting by construction. We compute Fk(v) for all v ∈ V and k = 1, . . . , n by the recurrence

Fk(v) = min_{(u,v)∈E} [ Fk−1(u) + c(u, v) ],   (4)

where F0(s) = 0 and F0(v) = ∞ for v ≠ s.

It follows from equation (4) that Fk(v) can be computed for all v ∈ V in O(|V||E|) operations. To find the minimum mean cycle cost with fewer than M transitions (i.e., bounded-length suffix), we simply compute min_k Fk(s)/k over k = 1, . . . , M. If there are multiple cycles with the optimal cost, pick the cycle corresponding to the minimum k.

We repeat the above procedure for each accepting vertex s ∈ V. The minimum mean cycle value is the minimum of these values. We record the optimal vertex s∗ and the corresponding integer k∗. To determine the optimal cycle corresponding to s∗ and k∗, we simply determine the corresponding transitions from (4) for Fk∗(s∗) from vertex s∗. The repeated application of recurrence (4) takes O(na|V||E|) operations, where na is the number of accepting vertices, which is typically significantly smaller than |V|.

2) Minimum cycle ratio: We now discuss a more general case, which subsumes the discussion in Section V-A1. Our approach is based on that of Hartmann and Orlin [10], who consider the unconstrained case.

Let the possible weights be in the integer set Val = {1, . . . , wmax}, where wmax is a positive integer. Let E′ ⊆ E, and define weights as

w(e) = { x ∈ Val if e ∈ E′;  0 if e ∈ E − E′ }.

The setup in Section V-A1 is when E′ = E and Val = {1}. Let Tu := max_{(u,v)∈E} w(u, v) for each vertex u ∈ V. Then, T := ∑_{u∈V} Tu is the maximum weight of a path.

Let s ∈ V be an accepting state. For each v ∈ V, let Gk(v) be the minimum cost walk from s to v that has total weight equal to k. This definition is similar to Fk(v) in Section V-A1 except now k is the total weight w of the edges, which is no longer simply the number of edges. Let G′k(v) be the minimum cost walk from s to v that has total weight equal to k and with the last edge of the walk in E′. Finally, let d(u, v) be the minimum cost of a path from u to v in G consisting solely of edges of E − E′. The costs d(u, v) are pre-computed using an all-pairs shortest paths algorithm, which assumes there are no negative-cost cycles in E − E′ [6].

The values Gk(v) can be computed for all v ∈ V and k = 1, . . . , T by the recurrence:

G′k(v) = min_{(u,v)∈E′} [ Gk−w(u,v)(u) + c(u, v) ],   (5)
Gk(v) = min_{u∈V} [ G′k(u) + d(u, v) ],

where G0(v) = d(s, v).

The optimal cycle cost and the corresponding cycle are recovered in a similar manner as described in Section V-A1, and are accepting by construction. The recurrence in (5) requires O(na T|V|²) operations, where na is the number of accepting vertices. This algorithm runs in pseudo-polynomial time, as T is an integer, and so its binary description length is O(log(T)). The recurrence for Gk can be computed more efficiently if the edge costs are assumed to be non-negative, as explained in [10]. This improves the overall complexity of the recurrence to O(na T(|E| + |V|log|V|)) time [10].

Remark 3. Consider the special case where weights are restricted to be 0 or 1. Then, the total weight T is O(|V|) and the above algorithm has polynomial time complexity O(na|V|³), or O(na(|V||E| + |V|²log|V|)) if edge costs are assumed to be non-negative.

Remark 4. Although finite prefixes do not affect the cost (cf. Proposition 2), it may be desired to create a "good" finite prefix. The techniques described in Section V-A2 (and similarly Section V-A1) can be adapted to create these finite prefixes. After computing the optimal accepting cycle C∗, one can compute the values Gk(v) and corresponding walk defined with respect to the initial state s0 for all states v ∈ C∗.

B. Computing infinite-memory runs

Infinite-memory runs achieve the minimum weighted average cost J∗. However, their practical use is limited, as they achieve J∗ by increasingly visiting states that do not affect whether or not the specification is satisfied. This is unlikely to be the designer's intent, so we only briefly discuss these runs for completeness. A related discussion on infinite-memory runs, but in the adversarial environment context, appears in [5].

Let σopt be the (possibly non-accepting) cycle with the minimum weighted average cost J(σopt^ω) over all cycles in G. Clearly, the restriction that a cycle is accepting can only increase the weighted average cost. Let σacc be a cycle that contains both an accepting state and a state in σopt. Let σacc,i denote the i-th state in σacc. For symbols α and β, let (αβ^k)ω for k = 1, 2, . . . denote the sequence αβαββαβββ . . ..

Proposition 4. Let σP = (σacc σopt^k)ω, where k = 1, 2, . . .. Then, σP is accepting and achieves the minimum weighted average cost (1).

Proof: Run σP is accepting because it repeats σacc infinitely often. Let αc = ∑_{i=0}^{p} c(σacc,i, σacc,i+1) and αw = ∑_{i=0}^{p} w(σacc,i, σacc,i+1), where integer p is the length of σacc. Define βc and βw similarly for σopt. Then,

J(σ) := lim sup_{n→∞} ( ∑_{k=1}^{n} (αc + kβc) ) / ( ∑_{k=1}^{n} (αw + kβw) )
      = lim sup_{n→∞} ( nαc + βc ∑_{k=1}^{n} k ) / ( nαw + βw ∑_{k=1}^{n} k )
      = βc/βw = J((σopt)ω).
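The limit in the proof above is easy to check numerically. In the sketch below (ours, with assumed block costs), αc = 4, αw = 2 for the accepting cycle and βc = 1, βw = 2 for σopt, so truncating (σacc σopt^k)ω after n blocks gives a ratio that falls toward βc/βw = 0.5 as the σopt repetitions dominate:

```python
def partial_ratio(alpha_c, alpha_w, beta_c, beta_w, n):
    """Weighted average cost of the first n blocks sigma_acc sigma_opt^k, k = 1..n."""
    num = sum(alpha_c + k * beta_c for k in range(1, n + 1))
    den = sum(alpha_w + k * beta_w for k in range(1, n + 1))
    return num / den


# The accepting cycle alone costs 4/2 = 2.0 per unit weight; sigma_opt achieves 0.5.
print(partial_ratio(4, 2, 1, 2, 10))      # still noticeably above 0.5
print(partial_ratio(4, 2, 1, 2, 100000))  # approaches beta_c / beta_w = 0.5
```

This also mirrors Proposition 5 below: truncating the σopt repetitions at a fixed length gives a finite-memory suffix whose cost is within ε of the infimum.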
A direct application of a minimum cost-to-time ratio algo-
rithm, e.g., [10], can be used to compute σopt since there is no
constraint that it must include an accepting state. Also, given
σopt , σacc always exists as there is an accepting state in the
same strongly connected component as σopt by construction.
The next proposition shows that finite-memory runs can be
arbitrarily close to the optimal weighted average cost J ∗ .
Proposition 5. Given any ε > 0, a finite-memory run σP exists with J((σP)^ω) < J((σopt)^ω) + ε = J* + ε.

Proof: Construct a finite-memory run of the form σP = σpre(σsuf)^ω, where σsuf has fixed length. In particular, let σsuf = σacc(σopt)^M for a large (fixed) integer M. By picking M large enough, the error between J((σsuf)^ω) and J((σopt)^ω) can be made arbitrarily small.

Thus, finite-memory runs can approximate the performance of infinite-memory runs arbitrarily closely. This allows a designer to trade off between runs with low weighted average cost and runs with short lengths.

C. Complexity

We now discuss the complexity of the entire procedure, i.e., Algorithm 1. The number of states and transitions in the transition system is nT and mT, respectively. The ω-regular specification is given by a Büchi automaton Aϕ. The product automaton has nP = nT × |Aϕ| states and mP edges. For finite-memory runs, the dynamic programming algorithms described in Section V-A take O(na nP mP) and O(na T(mP + nP log nP)) operations, assuming non-negative edge weights for the latter bound. Here, na is the number of accepting states in the product automaton. Usually, na is significantly smaller than nP. For infinite-memory runs, there is no accepting state constraint for the cycles, so standard techniques [10, 13] can be used that take O(nP mP) and O(T(mP + nP log nP)) operations, again assuming non-negative edge weights for the latter bound. The algorithms in Section V are easily parallelizable, both between strongly connected components of P and for each accepting state.

In practice, an LTL formula ϕ is used to automatically generate a Büchi automaton Aϕ. The length of an LTL formula ϕ is the number of symbols. A corresponding Büchi automaton Aϕ has size 2^O(|ϕ|) in the worst case, but this behavior is rarely encountered in practice.

VI. EXAMPLES

The following examples demonstrate the techniques developed in Section V in the context of autonomous driving and surveillance. Each cell in Figures 1 and 2 corresponds to a state, and each state has transitions to its four neighbors. We specify costs and weights over states, as discussed in Section II. Tasks are formally specified by LTL formulas and informally in English. We use the following LTL symbols without definition: negation (¬), disjunction (∨), conjunction (∧), always (◻), and eventually (◇) [1].

The first example is motivated by autonomous driving. The weighted transition system represents an abstracted car that can transition between neighboring cells in the grid (Figure 1). The car's task is to repeatedly visit the states labeled a, b, and c while always avoiding states labeled x. Formally, ϕ = ◻◇a ∧ ◻◇b ∧ ◻◇c ∧ ◻¬x. Costs are defined over states to reward driving in the proper lane (the outer boundary) and penalize leaving it. Weights are zero for all states except states labeled a, b, and c, which each have weight of one.

Fig. 1. Driving task, with optimal run (solid, dark) and feasible run (dashed, light).

The second example, Figure 2, is motivated by surveillance. The robot's task is to repeatedly visit states labeled either a, b, c or d, e, f. States labeled x should always be avoided. Formally, ϕ = ((◻◇a ∧ ◻◇b ∧ ◻◇c) ∨ (◻◇d ∧ ◻◇e ∧ ◻◇f)) ∧ ◻¬x. Costs vary with the state as described in Figure 2, and might describe the time to navigate different terrain. The weight is zero at each state, except states a and f, where the weight is one.

Fig. 2. Surveillance task, with optimal run (solid, dark) and feasible run (dashed, light).

Numerical results are in Table I. Computation times for optimal and feasible runs are given by topt and tfeas respectively. All computations were done using Python on a Linux desktop with a dual-core processor and 2 GB of memory. The feasible satisfying runs were generated with depth-first search. The optimal satisfying runs were generated with the algorithm from Section V-A2. Since it was possible to decrease the weighted average cost by increasing the length of the cycle (i.e., the infimum was not achieved by a finite-memory satisfying run), we used the shortest cycle such that J(σopt) < ∞. Thus, the optimal values J(σopt) are conservative. The improvement of the optimal runs over a feasible run is evident from Figures 1 and 2. In Figure 1, the optimal run immediately heads back to its lane to reduce costs, while the feasible run does not. In Figure 2, the optimal run avoids visiting high-cost regions.
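The product automaton P used throughout Section V (the transition system composed with the Büchi automaton Aϕ) can be enumerated breadth-first over reachable state pairs. A minimal sketch with an assumed dictionary-based interface; ts_succ, ts_label and buchi_delta are illustrative names, not from the paper:

```python
from collections import deque

def build_product(ts_succ, ts_label, buchi_delta, s0, q0):
    """Reachable part of the product automaton.  ts_succ[s]: successor
    states of s; ts_label[s]: set of atomic propositions at s;
    buchi_delta(q, props): Buchi successors of q when reading props."""
    init = (s0, q0)
    states, edges = {init}, []
    frontier = deque([init])
    while frontier:
        s, q = frontier.popleft()
        for s2 in ts_succ[s]:
            for q2 in buchi_delta(q, ts_label[s2]):
                edges.append(((s, q), (s2, q2)))
                if (s2, q2) not in states:
                    states.add((s2, q2))
                    frontier.append((s2, q2))
    return states, edges
```

A pair (s, q) steps to (s′, q′) when the transition system allows s → s′ and the automaton can read the label of s′ from q, which is why only the reachable portion of the nT × |Aϕ| state space is ever materialized.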
TABLE I
NUMERICAL RESULTS
Example T (nodes/edges) Aϕ P P (reachable) # SCC # acc. states Jopt (units) Jfeas (units) topt (sec) tfeas (sec)
Driving 300 / 1120 4 / 13 1200 / 3516 709 / 2396 1 1 49.3 71.3 2.49 0.68
Surveillance 400 / 1520 9 / 34 3600 / 14917 2355 / 8835 2 2 340.9 566.3 21.9 1.94
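The feasible runs reported above were generated with depth-first search. One simple check that a feasible (accepting) run exists at all is to ask whether some accepting product state is both reachable from the initial state and able to reach itself; a sketch under assumed adjacency-dict inputs, not the authors' implementation:

```python
def has_accepting_lasso(succ, init, accepting):
    """True iff some accepting state is reachable from init and lies on a
    cycle, i.e. an accepting lasso (finite prefix + repeatable cycle)
    exists.  succ: dict mapping a state to its successor list."""
    def reachable_from(src):
        seen, stack = set(), [src]
        while stack:
            u = stack.pop()
            for v in succ.get(u, []):
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        return seen
    from_init = reachable_from(init) | {init}
    return any(a in from_init and a in reachable_from(a) for a in accepting)
```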

VII. CONCLUSIONS

We created optimal runs of a weighted transition system that minimized a weighted average cost function subject to ω-regular language constraints. These constraints include the well-studied linear temporal logic as a subset. We showed that optimal system runs correspond to cycles in a lifted product space, which includes behaviors that both are valid for the system and satisfy the temporal logic specification. Dynamic programming techniques were used to solve for an optimal cycle in this product space.

Future directions include investigating notions of optimality with both non-deterministic transition systems and adversarial environments. Additionally, better computational complexity may be achievable for fragments of ω-regular languages.

ACKNOWLEDGEMENTS

The authors thank Pavithra Prabhakar and the anonymous reviewers. This work was supported by an NDSEG Fellowship, the Boeing Corporation, and NSF Grant CNS-0911041.

REFERENCES

[1] C. Baier and J.-P. Katoen. Principles of Model Checking. MIT Press, 2008.
[2] C. Belta and L. C. G. J. M. Habets. Control of a class of non-linear systems on rectangles. IEEE Transactions on Automatic Control, 51:1749–1759, 2006.
[3] C. Belta, V. Isler, and G. Pappas. Discrete abstractions for robot motion planning and control in polygonal environments. IEEE Transactions on Robotics, 21:864–874, 2004.
[4] D. P. Bertsekas. Dynamic Programming and Optimal Control (Vol. I and II). Athena Scientific, 2001.
[5] K. Chatterjee, T. A. Henzinger, and M. Jurdzinski. Mean-payoff parity games. In Annual Symposium on Logic in Computer Science (LICS), 2005.
[6] T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein. Introduction to Algorithms, 2nd ed. MIT Press, 2001.
[7] G. B. Dantzig, W. O. Blattner, and M. R. Rao. Finding a cycle in a graph with minimum cost to time ratio with application to a ship routing problem. In P. Rosenstiehl, editor, Theory of Graphs, pages 77–84. Dunod, Paris and Gordon and Breach, New York, 1967.
[8] A. Dasdan and R. K. Gupta. Faster maximum and minimum mean cycle algorithms for system performance analysis. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 17:889–899, 1998.
[9] L. Habets, P. J. Collins, and J. H. van Schuppen. Reachability and control synthesis for piecewise-affine hybrid systems on simplices. IEEE Transactions on Automatic Control, 51:938–948, 2006.
[10] M. Hartmann and J. B. Orlin. Finding minimum cost to time ratio cycles with small integral transit times. Networks, 23:567–574, 1993.
[11] S. Karaman and E. Frazzoli. Sampling-based motion planning with deterministic μ-calculus specifications. In Proc. of IEEE Conference on Decision and Control, 2009.
[12] S. Karaman, R. G. Sanfelice, and E. Frazzoli. Optimal control of mixed logical dynamical systems with linear temporal logic specifications. In Proc. of IEEE Conference on Decision and Control, pages 2117–2122, 2008.
[13] R. M. Karp. A characterization of the minimum cycle mean in a digraph. Discrete Mathematics, 23:309–311, 1978.
[14] R. M. Karp and J. B. Orlin. Parametric shortest path algorithms with an application to cyclic staffing. Discrete and Applied Mathematics, 3:37–45, 1981.
[15] L. E. Kavraki, P. Svestka, J. C. Latombe, and M. H. Overmars. Probabilistic roadmaps for path planning in high-dimensional configuration spaces. IEEE Transactions on Robotics and Automation, 12:566–580, 1996.
[16] M. Kloetzer and C. Belta. A fully automated framework for control of linear systems from temporal logic specifications. IEEE Transactions on Automatic Control, 53(1):287–297, 2008.
[17] H. Kress-Gazit, G. E. Fainekos, and G. J. Pappas. Temporal logic-based reactive mission and motion planning. IEEE Transactions on Robotics, 25:1370–1381, 2009.
[18] S. LaValle and J. J. Kuffner. Randomized kinodynamic planning. International Journal of Robotics Research, 20:378–400, 2001.
[19] E. Plaku, L. E. Kavraki, and M. Y. Vardi. Motion planning with dynamics by a synergistic combination of layers of planning. IEEE Transactions on Robotics, 26:469–482, 2010.
[20] S. L. Smith, J. Tumova, C. Belta, and D. Rus. Optimal path planning for surveillance with temporal-logic constraints. The International Journal of Robotics Research, 30:1695–1708, 2011.
[21] P. Toth and D. Vigo, editors. The Vehicle Routing Problem. SIAM, Philadelphia, PA, 2001.
[22] M. Y. Vardi and P. Wolper. An automata-theoretic approach to automatic program verification. In Logic in Computer Science, pages 322–331, 1986.
[23] T. Wongpiromsarn, U. Topcu, and R. M. Murray. Receding horizon control for temporal logic specifications. In Proc. of the 13th International Conference on Hybrid Systems: Computation and Control, 2010.
Probabilistic Temporal Logic for Motion Planning
with Resource Threshold Constraints
Chanyeol Yoo, Robert Fitch and Salah Sukkarieh
Australian Centre for Field Robotics (ACFR)
The University of Sydney
NSW, 2006, Australia
{c.yoo, rfitch, salah}@acfr.usyd.edu.au

Abstract—Temporal logic and model-checking are useful theoretical tools for specifying complex goals at the task level and formally verifying the performance of control policies. We are interested in tasks that involve constraints on real-valued energy resources. In particular, autonomous gliding aircraft gain energy in the form of altitude by exploiting wind currents and must maintain altitude within some range during motion planning. We propose an extension to probabilistic computation tree logic that expresses such real-valued resource threshold constraints, and present model-checking algorithms that evaluate a piecewise control policy with respect to a formal specification and hard or soft performance guarantees. We validate this approach through simulated examples of motion planning among obstacles for an autonomous thermal glider. Our results demonstrate probabilistic performance guarantees on the ability of the glider to complete its task, following a given piecewise control policy, without knowing the exact path of the glider in advance.

I. INTRODUCTION

Task-level programming is a long-standing, decades-old problem in robotics. The goal is to command complex tasks using simple, natural language. Progress towards this goal has been achieved recently using temporal logic to specify rich tasks [3, 15] or complex goals [4]. In addition to providing powerful means for representing tasks, temporal logic specification offers the additional benefit of proving the correctness of a plan or policy that completes a task. By verifying a plan or policy against its logical specification while considering uncertainty, we can provide probabilistic performance guarantees. We are interested in developing provably correct control policies with expressive task-level specifications.

Formal verification is important for safety-critical systems and currently is widely employed in embedded systems such as pipelined CPUs [7] and medical monitoring systems [8]. In robotics, the significance of a probabilistic performance guarantee is that, given a stochastic transition model, it is possible to provide a probability of success independent of the path eventually executed. This principled understanding of level of confidence is necessary for robots operating in complex environments, especially outdoor environments, where failure in the worst case leads to catastrophic loss of the platform and compromises the safety of nearby humans.

This work is supported in part by the Australian Centre for Field Robotics and the New South Wales State Government. We thank Christopher J. Langmead and Nicholas Lawrance for helpful discussions.

One issue in applying formal methods is choosing the appropriate form of temporal logic [3]. Since we are interested in systems with stochasticity, probabilistic computation tree logic (PCTL) is the natural choice [17]. But PCTL as defined cannot fully express the task-level specifications required in robotics. In particular, we are interested in tasks that involve capacity constraints on energy resources. These energy resources are continuous quantities subject to upper and lower bounds. We would like to specify these bounds in continuous form, but PCTL specifies constraints as boolean propositions over discrete states. It is common in applications outside of robotics to extend temporal logic to increase expressive power; in this paper we propose such an extension to PCTL to model resource capacity as a real-valued reward threshold and demonstrate its use in the context of an autonomous gliding aircraft.

An intuitive application of a resource bound would be to constrain fuel or battery level above some minimum. However, in the case of an autonomous thermal glider the 'resource' of interest is altitude. The glider gains altitude, and hence energy, by exploiting favourable wind currents. Gliders must operate within a fixed altitude range for several reasons, including maintaining the safety of the platform and complying with government regulations for autonomous flight. It may seem reasonable to model altitude discretely, but as with all discrete approximation it is then necessary to choose an appropriate resolution. Even with a very fine resolution, discrete approximation can lead to inaccurate evaluation of the safety criteria. For example, there may exist conditions where the length scale of wind features is less than the discretisation of altitude. A policy could then lead the glider into an unsafe wind gust yet be evaluated as safe. From the perspective of formal methods, there is no strong guarantee in the evaluation since such approximation may find the presence of certain behaviour, but not the absence of such behaviour.

Our approach is to extend PCTL to admit resource threshold constraints in continuous form. We present Resource Threshold-PCTL (RT-PCTL), in which the logic is not only able to formally represent high-level symbolic specifications, but also a constraint on an accumulated real-valued resource. RT-PCTL includes a set of operators to formally represent hard guarantees and soft guarantees by considering the probability of mission success in all possible immediate transitions from a
state. For a hard guarantee, all possible immediate transitions from a state lead to a successor state that satisfies a given probability requirement for mission success. For a soft guarantee, there exists at least one immediate transition that leads to a successor state that satisfies the requirement.

Because control actions in our approach depend on temporal conditions (the value of accumulated resource at the time a state is entered), we define a piecewise control policy, where the value of a control action varies with accumulated resource. We also define a piecewise probability function (PPF) that represents the probability of mission success in a given state with respect to the value of accumulated resource. A set of PPFs, one for each state, represents the formal performance guarantee for a given control policy over all possible paths. Finally, we present algorithms for model-checking the control policy and associated PPF set against a given safety-critical requirement (hard or soft guarantee).

We validate our approach by developing an example with an autonomous thermal glider [19] in a hexagonal grid environment. The glider flies non-holonomically with a priori knowledge of the thermal airflow distribution. The resource threshold constraint requires the glider to maintain its altitude between given upper and lower bounds. We investigate two scenarios specified in RT-PCTL: 1) reaching the goal while avoiding danger states, and 2) reaching the goal while traversing either only safe states, or semi-safe states where all successor states are not danger states. The tasks are model-checked against the hard and soft constraints defined. We also compare the PPF evaluation with discrete evaluation at several fixed resolutions to demonstrate the need for representing resource values in continuous form.

II. RELATED WORK

Temporal logic has been used extensively in embedded systems to specify required system properties over all possible paths, which is not possible using traditional propositional logic. System properties include functional correctness, liveness, safety, fairness, reachability and real-time properties [5, 7, 8, 20]. With the formal specification, model-checking is then used to systematically determine if the specification holds [2]. The critical difference between model-checking and testing/simulation is that model-checking is capable of detecting the absence of error whereas testing/simulation is only able to detect the presence of error. Therefore the formalism plays an important role in safety-critical systems.

Various forms of temporal logic have been proposed, including linear temporal logic (LTL) [21] and computation tree logic (CTL) [11]. Neither is defined to include stochastic transition models in their basic forms. For temporal logic to represent real-world environments with sensor noise and actuation error, probabilistic computation tree logic (PCTL) [17] was introduced to replace the non-determinism of CTL with probabilities.

The application of temporal logic has recently become an important topic in robotics where a control policy generated is formally guaranteed to ensure safety [3, 13, 14]. The focus has been on systems where temporal logic is used for high-level discrete mission planning complemented by low-level planners or controllers operating in continuous space [4], and with approximation methods [1]. LTL is often used since stochasticity is not directly considered in the discrete abstraction layer [6, 9, 16], and there have been attempts to use LTL for motion-planning in uncertain environments with probabilistic guarantees [10]. Our work is distinct in that we focus on probabilistic transitions in the high-level discrete layer, with real-valued resource threshold constraints.

Other related work using MDPs and PCTL for complex mission specifications includes extending the MDP formulation to maintain the probability of entering undesired states below a given constant [22], extending PCTL for more complex missions [18], and approximating PCTL [12]. Kwiatkowska et al. [17] have shown that PCTL can specify the reward structure such that a mission satisfies reward-related requirements such as 'the expected amount of energy that an agent acquires during the mission'. However, the reward structure in PCTL only considers the expected value at the end of the time horizon and thus is not suitable for a mission where success depends not only on the mission specification but also on the accumulated reward within the path. Our approach addresses this case directly.

III. PROBLEM FORMULATION

A labelled discrete-time Markov chain (DTMC) M is a tuple < S, s0, rs, rss′, hs, P, AP, L > with a finite set of states S, an initial state s0 ∈ S, and a set of transition probabilities P : S × S → [0, 1] where ∑_{s′∈S} Pss′ = 1, ∀s ∈ S. Scalar rs ∈ R is an instant resource gained when entering the state s and rss′ ∈ R is a transition resource gained while transiting from state s to state s′. Scalar set hs = [hs^l, hs^u] represents the lower and upper resource bounds respectively at state s. Function L : S → 2^AP is a labelling function that assigns atomic propositions AP for each state s ∈ S.

To illustrate the need for resource threshold constraints, a simple environment using the DTMC M is given in Fig. 1(a) with three states, where state s3 is the goal state. The agent starting from s1 must maintain its accumulated resource between 0 and 5 until it reaches the goal state. The formulation using PCTL [17] allows us to compute the satisfaction of a property over an indefinite number of paths as shown in Fig. 1(b), without considering the resource bounds, where the probability is computed to be 0.9728 after four time steps. However, the resource structure in standard PCTL is only able to model-check against 'expected accumulated resource after k time steps', 'expected instantaneous state resource at k time steps', and 'expected accumulated state resource before satisfying a formula'. Since these all compute the expectation of the resource at or after a certain number of time steps, standard PCTL is unable to determine if the accumulated resource within a path ever violates the bounds. As a result, the computation tree keeps branching from the state within the path that already went below threshold. Note that although the final resource at the end is above threshold, the mission
is considered to be unsuccessful if any state within the path does not satisfy the constraint.

The computation tree with a threshold constraint (Fig. 1(c)) shows that branching terminates at a state when the accumulated resource goes below zero. The probability of success in this case is 0.1536. From Fig. 1(c) it is shown intuitively that the successful path within four time steps is the one in which the agent stays in state s1 for two time steps until its accumulated resource exceeds +1. Hence there is a need for a control policy structure and evaluation function that depend on the accumulated resource. This will be discussed in Sec. IV.

Fig. 1. (a) Simple environment; (b) probabilistic computation tree without resource constraint; (c) probabilistic computation tree with resource constraint. Subfigs. 1(b) and 1(c) show computation trees for 1(a), without and with resource constraints.

A. Resource Threshold-Probabilistic Computation Tree Logic (RT-PCTL)

The syntax for RT-PCTL is defined as

    Φ   ::= true | a | ¬Φ | Φ1 ∧ Φ2 | Φ1 ∨ Φ2 | Aφ | Eφ | P∼λ^{x:h}[φ^k]
    φ^k ::= XΦ | F≤k Φ | Φ1 U≤k Φ2                                  (1)

where Φ is a state formula, φ^k is a path formula, a is an atomic proposition, ∼ ∈ {<, ≤, ≥, >}, x ∈ R, λ ∈ [0, 1] and k ∈ N.

A and E are quantifiers over a path and they state 'with all paths' and 'with at least one path' respectively. In this paper, the quantifiers A and E are used to evaluate hard and soft guarantees. X, F and U are path-specific quantifiers. [Xp], [F≤k p] and [p1 U≤k p2] represent 'the property p is satisfied in the next time step', 'the property p is satisfied in future within k time steps' and 'the property p1 holds until p2 is satisfied within k time steps'. A transition from s to s′ (s → s′) is said to be valid if Pss′ > 0. Path(s) is defined as a set of all infinite sequences of states starting from state s in which all consecutive transitions are valid. ω ∈ Path(s) is a path in the set. For the evaluation of state formulae Φ, we use the notation Sat(Φ) = {s ∈ S | s ⊨ Φ}, which denotes 'a set of states that satisfy the formula Φ'. We use the notation P∼λ^{x:hs}[Φ] to specify probability inequality [∼ λ], initial resource x and upper and lower resource bounds hs = [hs^l, hs^u] for a state formula Φ that is related to either Aφ^k or Eφ^k. Prob_s(x, hs, φ^k) indicates the probability of satisfying the path formula φ^k with initial resource of x over the set of paths Path(s).

The satisfaction relation ⊨ is defined for state formulae by:

    s ⊨ a                    ⟺  a ∈ L(s)
    s ⊨ ¬Φ                   ⟺  s ⊭ Φ
    s ⊨ Φ1 ∧ Φ2              ⟺  s ⊨ Φ1 and s ⊨ Φ2
    s ⊨ Φ1 ∨ Φ2              ⟺  s ⊨ Φ1 or s ⊨ Φ2
    s ⊨ Aφ^{k+1}             ⟺  ω′ ∈ Path(s′) ⊨ φ^k, ∀(s → s′)
    s ⊨ Eφ^{k+1}             ⟺  ω′ ∈ Path(s′) ⊨ φ^k, ∃(s → s′)
    s ⊨ P∼λ^{x:h}[Aφ^{k+1}]  ⟺  Prob_{s′}(x′, h_{s′}, φ^k) ∼ λ, ∀(s → s′)
    s ⊨ P∼λ^{x:h}[Eφ^{k+1}]  ⟺  Prob_{s′}(x′, h_{s′}, φ^k) ∼ λ, ∃(s → s′)
                                 and Prob_s(x, hs, φ^{k+1}) ∼ λ
    where x′ = x + r_{s′} + r_{ss′}                                  (2)

Given a path ω in M, the satisfaction relation is defined:

    ω ⊨ XΦ         ⟺  ω[1] ⊨ Φ
    ω ⊨ F≤k Φ      ⟺  ω ⊨ true U≤k Φ                                (3)
    ω ⊨ Φ1 U≤k Φ2  ⟺  ω[i] ⊨ Φ2 ∧ ω[j] ⊨ Φ1, ∃ 0 ≤ i ≤ k, ∀ 0 ≤ j < i

Two examples are shown below:
• P<0.05^{2:[0,10]}[AX danger] = 'with all possible paths from the starting state, the probability of reaching the danger state in the next time step is less than 0.05, given the initial accumulated resource of 2, while maintaining the accumulated resource between 0 and 10'.
• P>0.8^{0:[2,4]}[E[¬danger U≤20s goal]] = 'there exists at least one path from the starting state where the probability of reaching the goal state while avoiding the danger states within 20 seconds is greater than 0.8, given the initial accumulated resource of 0 while maintaining the accumulated resource between 2 and 4'.
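As a concrete check of the bounded-eventually semantics, the unconstrained value quoted in Sec. III for Fig. 1(b), a probability of 0.9728 of reaching s3 from s1 within four steps with 0.8 intended-move and 0.2 self-transition probabilities, can be reproduced by standard value iteration over the DTMC. A minimal sketch, with resource bounds deliberately ignored as in Fig. 1(b):

```python
def bounded_reach(P, states, goal, k):
    """P(F<=k goal) per state, by k steps of value iteration; P maps
    (s, s') pairs to transition probabilities (goal made absorbing)."""
    V = {s: 1.0 if s == goal else 0.0 for s in states}
    for _ in range(k):
        V = {s: 1.0 if s == goal else
             sum(p * V[t] for (u, t), p in P.items() if u == s)
             for s in states}
    return V

# Three-state chain of Fig. 1(a): moving right succeeds w.p. 0.8, else stay.
P = {('s1', 's1'): 0.2, ('s1', 's2'): 0.8,
     ('s2', 's2'): 0.2, ('s2', 's3'): 0.8,
     ('s3', 's3'): 1.0}
V = bounded_reach(P, ['s1', 's2', 's3'], 's3', 4)
# V['s1'] is approximately 0.9728, the value quoted in Sec. III
```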
B. Definition of Piecewise Function

This section defines the properties and operations of piecewise functions that are used later.

1) Piecewise Function: Suppose f(x) is a piecewise function,

    f(x) = { p1^f  if x > c1^f                    { p1^f  if x > c1^f
             ...                                    ...
             pk^f  if ck−1^f ≥ x > ck^f       =     pk^f  elseif x > ck^f
             ...                                    ...
             pn^f  if cn−1^f ≥ x > cn^f             pn^f  elseif x > cn^f
             0     else                             0     else

    where k ∈ N, pk^f ∈ [0, 1] and ck+1^f < ck^f, ∀k ∈ N.           (4)

2) Addition:

    f(x) + g(x) = { ...
                    f(c) + g(c)  elseif x > c
                    ...
                    0            else
    where c ∈ (c^f ∪ c^g)                                           (5)

3) Shift:

    f(x + c) = { p1  if x > c1 − c
                 ...
                 pn  elseif x > cn − c
                 0   else                                           (6)

4) Multiplication:

    c · f(x) = { c · p1  if x > c1
                 ...
                 c · pn  elseif x > cn
                 0       else                                       (7)

5) Conditioning:

    f(x) ▷ c = { f(x)  if x > c
                 0     else                                         (8)

6) Merging:

    f(x) ⊕ g(x) = { f(x)  if x > cn^f
                    g(x)  else                                      (9)

7) Bound Function: Returns zero if x is above cu or below cl; otherwise returns f(x).

    Γ_{cl}^{cu}(f(x)) = { 0     if x > cu
                          f(x)  elseif x > cl
                          0     else                                (10)

IV. PERFORMANCE EVALUATION OF CONTROL POLICY

To evaluate a control policy, we analytically solve a piecewise probability function (PPF) at each state. In this section, we show how to compute PPFs with respect to quantifiers X (next), U (until), and F (future).

A. PPF Solutions for Quantifiers X, U, and F

The quantifier X specifies a path property from a state, where [Xp] denotes 'property p holds in the next transition'. The solution for computing the PPF is shown in (11) for a single-action control policy π(s). Note that Prob_s(x, h, XΦ) is a piecewise function as defined in III-B that returns the probability of holding the property Φ in the next transition starting from the state s with the entering accumulated resource of x while maintaining the accumulated resource between hs = [hs^l, hs^u], ∀s ∈ S. The algorithm for solving (11) is shown in Alg. 1. P_{ss′}^{π(s)} denotes the transition probability from s to s′ for an action π(s).

    Prob_s(x, hs, XΦ)
      = { 0                                                        if x > hs^u − rs
          ∑_{s′∈S} P_{ss′}^{π(s)} · Prob_{s′}(x′, h_{s′}, X^0 Φ)   elseif x > hs^l − rs
          0                                                        else
      = Γ_{hs^l − rs}^{hs^u − rs}( ∑_{s′∈S} P_{ss′}^{π(s)} · Prob_{s′}(x′, h_{s′}, X^0 Φ) ),

    where Prob_{s′}(x, h_{s′}, X^0 Φ) = Γ_{h_{s′}^l − r_{s′}}^{h_{s′}^u − r_{s′}}(1), ∀s′ ∈ Sat(Φ),
    and x′ = x + r_{s′} + r_{ss′}.                                  (11)

The quantifier U specifies the satisfaction of a property Φ along the path until it ends with another property Ψ, formally written as [ΦUΨ]. The PPF is shown in (12) for a single-action control policy π(s) and the algorithm is defined in Alg. 2. The formula [F≤k Ψ] is identical to [true U≤k Ψ].

    Prob_s(x, hs, ΦU≤k+1 Ψ) = Γ_{hs^l − rs}^{hs^u − rs}( ∑_{s′∈S} P_{ss′}^{π(s)} · Prob_{s′}(x′, h_{s′}, ΦU≤k Ψ) ),
        ∀s ∈ Sat(Φ ∧ ¬Ψ), where x′ = x + r_{s′} + r_{ss′}
    Prob_s(x, hs, ΦU≤k Ψ) = Γ_{hs^l − rs}^{hs^u − rs}(1),  ∀s ∈ Sat(Ψ)         (12)
    Prob_s(x, hs, ΦU≤k Ψ) = 0,  ∀s ∈ Sat(¬Φ ∧ ¬Ψ)

B. Piecewise Control Policy

The choice of action at a given state s, in our formulation, depends on the value of the accumulated resource x. We represent this as a piecewise control policy π(s, x), where A(s) is a set of possible actions at state s:

    π(s, x) = { a1 ∈ A(s)  if x > x_{a1}^s
                ...
                an ∈ A(s)  elseif x > x_{an}^s
                ∅          else                                     (13)
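The piecewise operations (4)-(10) are straightforward to realize on a list-of-thresholds representation. A minimal sketch covering evaluation, shift (6), multiplication (7) and the bound function Γ (10); the class and method names are illustrative, not the authors' implementation:

```python
class PW:
    """Piecewise-constant function as in (4): value vals[i] on the piece
    x > cuts[i], with cuts strictly decreasing, and 0 below the last cut."""
    def __init__(self, cuts, vals):
        self.cuts, self.vals = list(cuts), list(vals)

    def __call__(self, x):
        for c, v in zip(self.cuts, self.vals):  # cuts decrease; first hit wins
            if x > c:
                return v
        return 0.0

    def shift(self, c):
        """f(x + c), eq. (6): every threshold moves down by c."""
        return PW([ci - c for ci in self.cuts], self.vals)

    def scale(self, c):
        """c * f(x), eq. (7): every piece value is scaled."""
        return PW(self.cuts, [c * v for v in self.vals])

    def bound(self, cl, cu):
        """Gamma_{cl}^{cu}(f), eq. (10): zero outside (cl, cu]."""
        f = self
        return lambda x: f(x) if cl < x <= cu else 0.0
```

With this representation, the sums and merges in (5), (8) and (9) reduce to walking two sorted threshold lists, which keeps the PPF computations of Section IV exact rather than discretised.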
Algorithm 1 Solve for Prob_s(x, hs, XΦ), ∀s ∈ S
 1: Prob_s(x, hs, X^0 Φ) ← 0, ∀s ∈ Sat(¬Φ)
 2: Prob_s(x, hs, X^0 Φ) ← Γ_{hs^l − rs}^{hs^u − rs}(1), ∀s ∈ Sat(Φ)
 3: for s ∈ S \ Sat(Φ) do
 4:   Prob_s^a(x, hs, XΦ) ← Λ(s, a), ∀a ∈ A(s)
 5:   Prob_s(x, hs, XΦ) ← (Prob_s^{a1}(· · ·) ▷ x_{a1}^s) ⊕ . . . ⊕ (Prob_s^{an}(· · ·) ▷ x_{an}^s)
 6: end for
 7: return Prob_s(x, hs, XΦ), ∀s ∈ S
 8: where Λ(s, a) = Γ_{hs^l − rs}^{hs^u − rs}( ∑_{s′∈S} P_{ss′}^a · Prob_{s′}(x + r_{s′} + r_{ss′}, h_{s′}, X^0 Φ) )

Algorithm 2 Solve for Prob_s(x, hs, ΦU≤K Ψ), ∀s ∈ S
 1: Prob_s(x, hs, ΦU≤0 Ψ) ← 0, ∀s ∈ Sat(¬Ψ)
 2: Prob_s(x, hs, ΦU≤0 Ψ) ← Γ_{hs^l − rs}^{hs^u − rs}(1), ∀s ∈ Sat(Ψ)
 3: for k = 1 to K do
 4:   for s ∈ Sat(Φ ∧ ¬Ψ) do
 5:     Prob_s^a(x, hs, ΦU≤k Ψ) ← Λ(s, a, k − 1), ∀a ∈ A(s)
 6:     Prob_s(x, hs, ΦU≤k Ψ) ← (Prob_s^{a1}(· · ·) ▷ x_{a1}^s) ⊕ . . . ⊕ (Prob_s^{an}(· · ·) ▷ x_{an}^s)
 7:   end for
 8: end for
 9: return Prob_s(x, hs, ΦU≤K Ψ), ∀s ∈ S
10: where Λ(s, a, k) = Γ_{hs^l − rs}^{hs^u − rs}( ∑_{s′∈S} P_{ss′}^a · Prob_{s′}(x + r_{s′} + r_{ss′}, h_{s′}, ΦU≤k Ψ) )

For evaluation of the control policy, we define Prob_s^a(x, hs, · · ·≤k · · ·) as the PPF for the state s of taking action a at time k; this function comes from IV-A. The PPF with respect to a piecewise control policy π(s, x) is shown in (14). It is important to note that Prob_s(x, hs, · · ·≤k · · ·) and Prob_s^a(x, hs, · · ·≤k · · ·) should always be computed in the same iteration; they cannot be computed separately.

    Prob_s(x, hs, · · ·≤k · · ·)
      = (Prob_s^{π(s,x1)}(x, hs, · · ·≤k · · ·) ▷ x_{a1}^s) ⊕ . . . ⊕ (Prob_s^{π(s,xn)}(x, hs, · · ·≤k · · ·) ▷ x_{an}^s)
      = { Prob_s^{π(s,x1)}(x, hs, · · ·≤k · · ·)  if x > x_{a1}^s
          ...
          Prob_s^{π(s,xn)}(x, hs, · · ·≤k · · ·)  elseif x > x_{an}^s
          0                                       else              (14)

C. Hard and Soft Guarantees

For a given control action at a state, there may be a number of paths that an agent may take due to uncertainty in transition. In RT-PCTL, we add hard guarantee constraints and soft guarantee constraints with quantifiers A and E to the conventional PCTL for richer expressivity of safety-critical requirements. Along with quantifiers φ≤k ∈ {Xp, F≤k p, p1 U≤k p2}, [Aφ≤k] and [Eφ≤k] define the hard and soft satisfaction guarantees that specify 'all transitions from the initial state should lead to states that satisfy the formula φ≤k' and 'at least one transition from the initial state should lead to states that satisfy it'.

The solutions to computing the set of states that satisfy the hard guarantee and soft guarantee are shown in (15) and (16), where φ≤k ∈ {Xp, F≤k p, p1 U≤k p2}. Note that Sat(P∼λ^{x:hs}[Aφ≤k]) ⊆ Sat(P∼λ^{x:hs}[Eφ≤k]). This is shown algorithmically in Algs. 3 and 4.

    Sat(P∼λ^{x:hs}[Aφ≤k+1]) = {s ∈ S | ∀Pss′ > 0,
                               Prob_{s′}(x + r_{s′} + r_{ss′}, h_{s′}, φ≤k) ∼ λ}        (15)

    Sat(P∼λ^{x:hs}[Eφ≤k+1]) = {s ∈ S | Prob_s(x, hs, φ≤k+1) ∼ λ
                               and (Prob_{s′}(x + r_{s′} + r_{ss′}, h_{s′}, φ≤k) ∼ λ, ∃Pss′ > 0)}   (16)

Algorithm 3 Solve for P∼λ^{x:hs}[Aφ≤k+1]
1: P ← {∅}
2: Get Prob_s(x, hs, φ≤k), ∀s ∈ S
3: for s ∈ S do
4:   if Prob_{s′}(x + r_{s′} + r_{ss′}, h_{s′}, φ≤k) ∼ λ, ∀Pss′ > 0 then
5:     P ← s ∪ P
6:   end if
7: end for
8: return P

Algorithm 4 Solve for P∼λ^{x:hs}[Eφ≤k+1]
1: P ← {∅}
2: Get Prob_s(x, hs, φ≤k), ∀s ∈ S
3: for s ∈ S do
4:   if Prob_{s′}(x + r_{s′} + r_{ss′}, h_{s′}, φ≤k) ∼ λ, ∃Pss′ > 0 and Prob_s(x, hs, φ≤k+1) ∼ λ then
5:     P ← s ∪ P
6:   end if
7: end for
8: return P

D. Calculation Example

We illustrate the calculation of a PPF using the simple scenario shown earlier in Fig. 1(a). The agent can move left or right with probability 0.8 of moving as intended, and probability 0.2 of self-transition. Attempting to move past the left or right boundary always results in self-transition. The agent starts in state s1 with entering resource value 0 and attempts to reach goal state s3 while maintaining the value of accumulated
resource between the resource bound hs = [0, 5], ∀s ∈ S.

  π(s1) = Right,  π(s2) = Right

  Prob_1(x, h_s, F≤4 s3) = { 0.0000 if x > +3.79
                             0.7680 if x > +3.11
                             0.6400 if x > +2.58
                             0.7936 if x > +1.90
                             0.7680 if x > +1.37
                             0.7936 if x > +0.95
                             0.1536 if x > −0.26
                             0.0256 if x > −1.21
                             0      else }            (17)

A purely state-based policy for this example is shown in (17), where Prob_1(x, h_s, F≤4 s3) and π(s1) are the PPF and policy at state s1 within four time steps. Prob_1(x, h_s, F≤4 s3) represents the PPF of the logic specification to reach state s3 with accumulated resource x within four time steps with resource constraint h_s at state s1. The policy is simply to move right at states s1 and s2. Since we constrain the accumulated resource to be above zero and below five, the agent is likely to fail if it enters state s2 with zero accumulated resource; transitioning from state s1 to state s2 results in accumulated resource −0.95.

In contrast, evaluation of the piecewise control policy is shown in (18). Here, the action mapping varies with the initial accumulated resource and the agent attempts to go left until its accumulated resource exceeds +1.0. With such a policy, the probability of mission success has jumped from 0.1536 to 0.768. Note that the resource values can be any real number.

  π(s1, x) = { Right if x > +1.0
               Left  else }
  π(s2, x) = Right, ∀x ∈ R

  Prob_1(x, h_s, F≤4 s3) = { 0.0000 if x > +3.79
                             0.7680 if x > +3.11
                             0.6400 if x > +2.58
                             0.7936 if x > +1.90
                             0.7680 if x > +1.37
                             0.7936 if x > +1.00
                             0.7680 if x > −0.21
                             0.6400 if x > −1.21
                             0      else }            (18)

Fig. 2. Scenario environment.

Fig. 3. Glider dynamics in a hexagonal grid where (a) shows possible transitions and (b) defines the glider's non-holonomic behaviour.

V. TASK PLANNING FOR AN AUTONOMOUS GLIDER

Suppose there is an autonomous thermal glider in a hexagonal grid-based environment that gains or loses altitude based on instantaneous thermal wind energy. The glider has precise a priori knowledge of the time-invariant thermal energy distribution as shown in Fig. 2. The environment is discretised into a 10x10 grid with six possible glider orientation values. A state-direction pair is denoted (s, d). Each state is labelled with S, D, S̄ or G, denoting safe, danger, semi-safe and goal respectively. The task is formulated based on the labels. The glider fails its mission when: 1) it hits the boundary of the environment, 2) it enters a forbidden state, or 3) its altitude goes out of resource bounds h_s = [0, 30], ∀s ∈ S. The glider satisfies the mission when it completes its task without violating any constraints.

Fig. 3(a) shows the dynamics of the glider in this scenario. There are three possible actions defined relative to the current orientation. The two actions that correspond to 60-degree turns lead to an altitude reduction of 0.1m, whereas maintaining the previous direction does not incur any loss of altitude. Transitions are stochastic; the glider moves in its intended direction with probability 0.8 and moves 60 degrees to either side of the intended direction with probability 0.1.

For the purpose of evaluation, a piecewise control policy is generated based on a heuristic that is divided into risky and conservative actions based on the agent's altitude when it enters a region of interest. Note that in general, a control policy in RT-PCTL is capable of having any number of actions based on accumulated resources. The risky action gives the most direct path to the goal position without considering the energy distribution along the path, whereas the conservative action gives the greatest expected return of instantaneous altitude increase. The risky action is taken when the accumulated resource exceeds the amount of resource required to take the most probable path to the goal state; otherwise the conservative action is taken. We do not attempt to generate the optimal policy in this paper.
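The glider dynamics described above can be sketched as a stochastic transition model on a hexagonal grid. The axial-coordinate convention and helper names below are our own; only the three relative actions, the 0.1m turn cost and the 0.8/0.1/0.1 transition probabilities come from the text, and we additionally assume the final orientation follows the direction actually flown.

```python
# Sketch of the glider's stochastic hex-grid dynamics (axial coordinates).
# Orientations d = 0..5 index the six hex directions; actions are defined
# relative to the current orientation: turn left (-1), straight (0), right (+1).

HEX_DIRS = [(+1, 0), (+1, -1), (0, -1), (-1, 0), (-1, +1), (0, +1)]
TURN_COST = {-1: 0.1, 0: 0.0, +1: 0.1}   # 60-degree turns cost 0.1m altitude

def transitions(q, r, d, action):
    """Return [(q2, r2, d2, altitude_delta, prob)] for one step.

    The glider first orients itself (intended direction d + action, paying
    the turn cost), then moves one cell: in the intended direction with
    prob 0.8, or 60 degrees to either side with prob 0.1 each.
    """
    intended = (d + action) % 6
    cost = TURN_COST[action]
    out = []
    for dev, p in [(0, 0.8), (-1, 0.1), (+1, 0.1)]:
        d2 = (intended + dev) % 6
        dq, dr = HEX_DIRS[d2]
        out.append((q + dq, r + dr, d2, -cost, p))
    return out
```

Failure conditions (leaving the grid, entering a forbidden state, or the altitude leaving h_s = [0, 30]) would be checked on the resulting (q2, r2) cell and accumulated altitude, exactly as in the mission description.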
Fig. 4. The most probable path of a glider launched from minimum altitude at (s68, d1) to reach the goal without entering danger states while maintaining altitude between 0 and 30m.

Fig. 5. Probabilities of mission success with respect to entering altitudes at state (s68, d1) after a 30-step time horizon for the approximation cases and the PPF.

The heuristic policy is defined as:

  π(s, x) = { {d ∈ A(s) | 'Most direct route to G'}          if x > α
              {d ∈ A(s) | max_d Σ_{s'∈S} P^d_ss' · r_s'}     else }        (19)

where A(s) is the set of actions available at state s and α is the negative sum of rewards along the most direct path to G from s that avoids forbidden states.

A. Reach the Goal and Avoid Danger

We consider an initial scenario where the glider is 'to reach the goal state within 30 time steps while avoiding any danger states, and maintaining minimum altitude of zero metres'. The RT-PCTL path formula for the specification is [¬D U≤30 G]. The environment and most probable path from (s68, d1) are shown in Fig. 4. The PPF represents success probability over all possible paths from the given state while following the given piecewise control policy, not just for the path in Fig. 4. Generally, the probability of success increases with altitude. However, there are certain altitude ranges where success probability decreases with increasing altitude. This is due to the heuristic control policy, which is not capable of finding the optimal solution for this formulation.

Fig. 5 shows probability of mission success evaluated by approximating altitude using discretisation versus the PPF based on real-valued altitude at (s68, d1). Probabilities were calculated using Alg. 2. In the discrete cases, accumulated reward was rounded to the nearest value discretised at uniform resolution (0.5m, 1.0m, and 2.0m) starting from 0.0m. Discrete approximation evaluates success probability at discrete altitude values, whereas the PPF is a piecewise-constant function. This difference is important because success probability can change abruptly with altitude. In Fig. 5, the PPF value has a dip centred at 9.7m corresponding to a decision boundary in the policy. Above this range, the glider has sufficient energy to fly directly to the goal. Below this range, the policy directs the glider along a more energy-conservative path. Within this range, the policy takes the direct route but has high probability of failure. This is a good illustration of the benefit of the PPF representation because although the glider has reasonable direct control over lateral position, it has less direct control over altitude. The PPF captures safety-critical altitude conditions exactly, but discrete approximation in general does not.

We also illustrate the use of hard and soft guarantee constraints in this scenario. For the hard guarantee, we would like to find the set of states that satisfies the requirement that the probability of mission success is greater than 0.28 in all potential immediate stochastic transitions. The RT-PCTL formula for the requirement is written in (20), where 7 states satisfy the requirement, given resource bounds h_s = [0, 30].

  Sat(P≥0.28^{0m:h_s}[A[¬D U≤30 G]]) = {(s29, d1), . . . , (s38, d4)}        (20)

Consider state (s38, d3), which belongs to the set in (20). The control policy at the state with altitude of zero metres is d3, which leads to (s37, d2), (s28, d3) and (s29, d4), where the probabilities of satisfying the specification are 0.2843, 0.2841 and 0.3165 at time step 29. As expected from (20), all paths from the state (s38, d3) hold true for the mission requirement. As opposed to state (s38, d3), the state (s58, d3) does not satisfy the hard constraint although the state itself has the success probability of 0.2812, because one of the successor states has success probability less than 0.28 (0.2387).

B. A Complex High-Level Mission Specification

In this scenario, the glider has a more complex high-level mission specification as shown in Fig. 6. The mission is 'to reach the goal position within 30 time steps such that the glider always travels along safe states, or semi-safe states if all next locations from the state are not dangerous'. The RT-PCTL path formula for the specification is [(S ∨ (S̄ ∧ AX¬D)) U≤30 G]. The most probable path is shown in Fig. 6.

We also consider an example of a soft guarantee where the glider is to have probability greater than 0.20 of holding the
mission specification in at least one direction. The RT-PCTL formula is given in (21) and there are 36 states satisfying the requirement. Consider the state (s58, d3), which belongs to the set in (21). The successor states from the starting state have success probabilities of 0.2061, 0.2064 and 0.1812. Therefore the soft guarantee requirement is met.

  Sat(P≥0.20^{0m:h_s}[E[(S ∨ (S̄ ∧ AX¬D)) U≤30 G]]) = {(s17, d2), . . . , (s78, d4)}        (21)

Fig. 6. The most probable path of a glider launched from minimum altitude at (s68, d1) to reach the goal while traversing safe states, or semi-safe states if all successor states are not danger states, and maintaining altitude above zero.

VI. CONCLUSIONS AND FUTURE WORK

We have presented an extension to PCTL for systems with real-valued resource threshold constraints and stochastic transitions. We introduced the piecewise control policy and presented algorithms for model-checking a given policy against a formal specification and performance guarantee. We validated our theoretical results through a simulated example of motion planning among obstacles for an autonomous gliding aircraft.

The glider example demonstrates the significance of our results. We provide a level of confidence in the glider's ability to complete its mission without knowing in advance the exact path the glider will follow through discrete high-level states. Model-checking in our method provides a performance guarantee that applies to a piecewise control policy where real-valued energy resources are represented exactly.

RT-PCTL represents an important step towards the grand goal of complex mission specifications for stochastic systems, but many open problems remain. We have shown how to evaluate a given control policy, but generating an optimal piecewise control policy is an important area of future work. Understanding the scalability of model-checking with RT-PCTL and developing computationally efficient algorithms for formal verification is an open problem in general. We are also interested in extending RT-PCTL for dynamic environments where state labels change over time, which would be useful in tasks such as searching and tracking. Allowing for Gaussian-distributed and partially observable resource values are further major avenues of future work.

REFERENCES

[1] A. Abate, J.-P. Katoen, J. Lygeros, and M. Prandini. Approximate model checking of stochastic hybrid systems. Eur. J. Contr., 16(6):624–641, 2010.
[2] C. Baier and J.-P. Katoen. Principles of model checking. The MIT Press, 2008.
[3] C. Belta, A. Bicchi, M. Egerstedt, E. Frazzoli, E. Klavins, and G.J. Pappas. Symbolic planning and control of robot motion. IEEE Rob. Autom. Mag., 14(1):61–70, 2007.
[4] A. Bhatia, M.R. Maly, L.E. Kavraki, and M.Y. Vardi. Motion planning with complex goals. IEEE Rob. Autom. Mag., 18(3):55–64, 2011.
[5] A. Biere, C. Artho, and V. Schuppan. Liveness checking as safety checking. Electron. Notes Theor. Comput. Sci., 66(2):160–177, 2002.
[6] L. Bobadilla, O. Sanchez, J. Czarnowski, K. Gossman, and S.M. LaValle. Controlling wild bodies using linear temporal logic. In Proc. of Robotics: Science and Systems, 2011.
[7] J.R. Burch and E.M. Clarke. Sequential circuit verification using symbolic model checking. In Proc. of ACM/IEEE DAC, pages 46–51, 1990.
[8] L.A. Cortes, P. Eles, and Z. Peng. Formal coverification of embedded systems using model checking. In Proc. of EUROMICRO, pages 106–113, 2000.
[9] X.C. Ding, M. Kloetzer, Y. Chen, and C. Belta. Automatic deployment of robotic teams. IEEE Rob. Autom. Mag., 18(3):75–86, 2011.
[10] X.C. Ding, S.L. Smith, C. Belta, and D. Rus. LTL control in uncertain environments with probabilistic satisfaction guarantees. 18th IFAC World Congress, 18, 2011.
[11] E.A. Emerson and E. Clarke. Using branching time temporal logic to synthesize synchronization skeletons. Sci. Comput. Programming, 2(3):241–266, 1982.
[12] M. Groesser. Reduction methods for probabilistic model checking. PhD thesis, TU Dresden, 2008.
[13] B. Johnson and H. Kress-Gazit. Probabilistic analysis of correctness of high-level robot behavior with sensor error. In Proc. of Robotics: Science and Systems, 2011.
[14] H. Kress-Gazit. Ensuring correct behavior: Formal methods for hardware and software systems [Special Issue]. IEEE Rob. Autom. Mag., 18(3), 2011.
[15] H. Kress-Gazit, G.E. Fainekos, and G.J. Pappas. Translating structured English to robot controllers. Adv. Robot., 22(12):1343–1359, 2008.
[16] H. Kress-Gazit, T. Wongpiromsarn, and U. Topcu. Correct, reactive, high-level robot control. IEEE Rob. Autom. Mag., 18(3):65–74, 2011.
[17] M. Kwiatkowska, G. Norman, and D. Parker. Stochastic model checking. In Formal Methods for the Design of Computer, Communication and Software Systems: Performance Evaluation, volume 4486 of LNCS, pages 220–270. Springer, 2007.
[18] M. Lahijanian, J. Wasniewski, S.B. Andersson, and C. Belta. Motion planning and control from temporal logic specifications with probabilistic satisfaction guarantees. In Proc. of IEEE ICRA, pages 3227–3232, 2010.
[19] N.R. Lawrance and S. Sukkarieh. Autonomous exploration of a wind field with a gliding aircraft. J. Guid. Control Dynam., 34(3):719–733, 2011.
[20] G. Madl, S. Abdelwahed, and D.C. Schmidt. Verifying distributed real-time properties of embedded systems via graph transformations and model checking. Real-Time Systems, 33(1):77–100, 2006.
[21] A. Pnueli. The temporal logic of programs. In Proc. of FOCS, pages 46–57, 1977.
[22] A. Undurti and J.P. How. An online algorithm for constrained POMDPs. In Proc. of IEEE ICRA, pages 3966–3973, 2010.
Hierarchical Motion Planning in Topological
Representations
Dmitry Zarubin† , Vladimir Ivan∗ , Marc Toussaint† , Taku Komura∗ , Sethu Vijayakumar∗
∗School of Informatics, University of Edinburgh, UK
† Department of Computer Science, FU Berlin, Germany
Abstract—Motion can be described in alternative representations, including joint configuration or end-effector spaces, but also more complex topological representations that imply a change of Voronoi bias, metric or topology of the motion space. Certain types of robot interaction problems, e.g. wrapping around an object, can suitably be described by so-called writhe and interaction mesh representations. However, considering motion synthesis solely in topological spaces is insufficient since it does not cater for additional tasks and constraints in other representations. In this paper we propose methods to combine and exploit different representations for motion synthesis, with specific emphasis on generalization of motion to novel situations. Our approach is formulated in the framework of optimal control as an approximate inference problem, which allows for a direct extension of the graphical model to incorporate multiple representations. Motion generalization is similarly performed by projecting motion from topological to joint configuration space. We demonstrate the benefits of our methods on problems where direct path finding in joint configuration space is extremely hard whereas local optimal control exploiting a representation with different topology can efficiently find optimal trajectories. Further, we illustrate the successful online motion generalization to dynamic environments on challenging, real world problems.

Fig. 1. KUKA LWR 4 robotic arm reaching through a hollow box with the task being defined in combined Writhe and interaction mesh space, showing an example of planning and dynamic remapping using topological representations as described in Section V-B.

I. INTRODUCTION

Many relevant robotic tasks concern close interactions with complex objects. While standard motion synthesis methods describe motion in configuration space, tasks that concern the interaction with objects can often more appropriately be described in representations that reflect the interaction more directly. For instance, consider the wrapping of arms around an object, e.g. embracing a human. Described in joint space, such a motion is complex and varies greatly depending on the embraced object. When describing the motion more directly in terms of the interaction of arm segments with object parts (e.g. using the interaction mesh representation that we will introduce below) we gain better generalization to objects of different shape or position and online adaptation to dynamic objects. Similar arguments apply to other scenarios, e.g. multi-link articulated robots reaching through small openings and complex structures, surfaces wrapping around objects and fingers grasping and manipulating objects. In such cases, the alternate representations greatly simplify the problems of motion generalization as well as planning.

There are several formal views on the implication of an alternate abstract representation: 1) In the context of randomized search, such representations alter the Voronoi bias and therefore, the efficiency of RRTs or randomized road maps. [9] demonstrate this effect in the case of RRTs. 2) An alternate representation may imply a different metric, such that a trajectory that is a complex path in one space becomes a simple geodesic in another. 3) An alternate representation may change the topology of the space such that local optimization in one space is sufficient for finding a solution whereas global optimization (randomized search) would be needed in the other. 4) Finally, different representations may allow us to express different motion priors; for instance, a prior preferring "wrapping-type motions" can be expressed as a simple Brownian motion or Gaussian process prior in one space, whereas the same Brownian motion prior in configuration space renders wrapping motions extremely unlikely.

As opposed to the computer animation domain, where topological representations have recently been used [5], synthesizing motion in such abstract spaces for planning and control of robotic systems comes with additional challenges. Typically, control tasks are specified in world (or end effector) coordinates, the obstacles may be observed in visual (or camera) coordinates, and the joint limits of the actuators are typically described in joint coordinates. Therefore, the general challenge is to devise motion synthesis methods that combine the benefits of reasoning in topological coordinates while preserving consistency across the control coordinates and managing to incorporate dynamic constraints from alternate representations seamlessly.

In this paper, we propose methods that can combine the different representations at the abstract and lower level and plan simultaneously at these different levels to efficiently generate optimal, collision free motion that satisfies constraints.
Each representation has its own strengths and weaknesses, and coupling them can help solve a wider range of problems. More specifically, our contributions are:
• The introduction of topological representations tuned to the domain of robot motion synthesis and manipulation, with a strong focus on the interaction of manipulation and the environment.
• An extension of a stochastic optimal control framework that combines various representations for motion synthesis. This is expressed in a graphical model that couples motion priors at different levels of representations.
• A method for motion generalization (remapping) to novel situations via a topological representation.
• Experiments that validate the benefit of Bayesian inference planning with topological representations in problems involving complex interactions.

In the rest of this section, we will first review previous work on the use of topological representations for character animation and abstract spaces used for robot motion synthesis. Section II then introduces two specific types of alternate representations, one of which has its origin in topological properties of strings—the writhe—and the other capturing the relative distances between interacting parts. Section III presents our approach to combining topological and configuration space representations in an optimal control setting through the Approximate Inference Control (AICO) [17, 13, 12] framework. This naturally leads to an extension that includes random variables for both the topological and configuration space representations, with their specific motion priors coupled via the graphical model. Section IV addresses using topological representations to generalize motion to novel situations by "remapping" it using this more abstract space. Finally, in Section V we present experiments on using the proposed methods to solve motion synthesis problems like unwrapping that are infeasible (e.g. for RRT methods) without exploiting alternate representations and demonstrate generalizability to novel, dynamic situations.

A. Previous Work

There is strong interest in reducing the dimensionality of the state space of robots to simplify the control strategy. In [14], the dimensionality is reduced by projecting the states into the task space and biasing the exploration of random trees towards the goal in the task space. Although such an approach is valid if the work space is an open area with a low number of obstacles, it is difficult to use when the task involves close interactions between the robot and objects in the scene.

Machine learning techniques have been used to reduce the dimensionality of the task space. In [2], a latent manifold in joint space is computed using a Gaussian process from sample configurations produced by an expert. This manifold is, however, defined by samples from a valid trajectory in joint space and it does not capture the state of the environment directly.

In order to cope with problems of close interactions, it is necessary to abstract the state space based on the spatial relations between the body parts and objects. Several robotics researchers have developed knotting robots that abstract the status of the strands and plan the maneuvers by probabilistic road maps [15, 18]. These approaches represent the rope state based on how it is overlapping with itself when viewing it from a specific direction [4]. Then, the state transition is achieved by moving the end points toward a specific direction. Such a representation is not very convenient due to the view-dependence and the difficulty of manipulating the rope.

Alternate topological representations that describe the relationships between 1D curves using their original configurations have been proposed for motion synthesis [5, 16] and for classifying paths into homotopy groups [1]. In [5], a representation called Topology Coordinates that is based on the Gauss Linking Integral is proposed to synthesize tangling movements. Bodies are mapped from the new representation to the joint angles by the least squares principle. In [16], the same representation is applied for controlling the movement of a robot that puts a shirt on a human. The mapping from the new representation to the low level representation is learned through demonstrations by humans. Such an approach is applicable only when the desired movement is consistent and simple, but not when the planning needs to be done between arbitrary sample points in the state space. An idea to abstract the paths connecting a start point and an end point using homotopy classes, in 2D based on complex numbers and in 3D based on Ampère's law [1], has also been proposed. Here the paths are only classified into homotopy groups and there is no discussion about the mapping from the topological representation to the low level control coordinates. These representations are only applicable to 1D curves and not to describing the relationship between 2D surfaces, which is often needed for controlling robots.

Another representation called interaction mesh that abstracts the spatial relationship of the body parts is introduced in [6]. The relationship is quantified by the Laplacian coordinates of the volumetric mesh whose points are sampled on the body parts composing the scene. The advantage of this approach is that it is not restricted to describing the relationship between 1D curves but is extendable to relationships between 2D surfaces. On the contrary, it is a discrete representation which is only valid in the neighborhood of the posture in which the volumetric mesh is computed.

In summary, using a representation based on the spatial relations is a promising direction to reduce the dimensionality of the control, although little work has been done for path planning or optimal control in such state spaces.

II. TOPOLOGICAL REPRESENTATIONS

The topological spaces we will introduce significantly modify the metric and topology of the search space. For instance, points that are near in the topology space might be far in configuration space, or a linear interpolation in topology space may translate to complex non-linear motion in configuration space. The motion synthesis methods proposed in the next section are independent of the specific choice of representation. Here we will introduce two specific examples of topological representations which we will also use in the experiments.
Fig. 2. Illustration of the definition of writhe for two segments. One - ab - belongs to the manipulator and another - cd - is a part of the obstacle.

Fig. 3. Interaction mesh created from the obstacles (dark balls), the goal (bright ball) and the KUKA LWR 4 arm. Edges of the interaction mesh are shown as lines connecting the balls. Edges between the obstacles have been removed.

These representations have previously been used in the context of computer animation, namely the writhe representation [5] that captures the "windedness" of one object around another, and the interaction mesh representation [6] that captures the relative positioning of keypoints between interacting objects.

Both types of representations can be formalized by a mapping φ : q → y from configuration space to the topological space, where q ∈ R^n is the configuration state with n degrees of freedom. To be applicable within our motion synthesis system, we need to compute φ and its Jacobian J for any q.

A. Writhe matrix and scalar

The writhe [7] is a property of the configuration of two kinematic chains (or in the continuous limit, of two strings). Intuitively the writhe describes to what degree (and how and where) the two chains are wrapped around each other.

Let us describe two kinematic chains by positions p^{1,2}_{1:K} of their joints, where p^i_k ∈ R^3 is the kth point of the ith chain. Using standard kinematics, we know how these points depend on the configuration q ∈ R^n, that is, we have the Jacobian J^i_k := ∂p^i_k/∂q for each point. The writhe matrix is a function of the link positions p^{1,2}_{1:K}.

More precisely, the writhe matrix entry W_ij describes the relative configuration of two points (p^1_i, p^1_{i+1}) on the first chain and two points (p^2_j, p^2_{j+1}) on the second, where i, j are indexes of points along the first and the second chain respectively. For brevity, let us denote these points by (a, b) = (p^1_i, p^1_{i+1}) and (c, d) = (p^2_j, p^2_{j+1}), respectively (see Fig. 2). Then

  W_ij = sin⁻¹( n_a·n_d / (|n_a||n_d|) ) + sin⁻¹( n_b·n_c / (|n_b||n_c|) )
       + sin⁻¹( n_c·n_a / (|n_c||n_a|) ) + sin⁻¹( n_d·n_b / (|n_d||n_b|) )        (1)

where n_a, n_b, n_c, n_d are normals at the points a, b, c, d with respect to the opposing segment (cf. Fig. 2),

  n_a = ac × ad,  n_b = bd × bc,  n_c = bc × ac,  n_d = ad × bd.        (2)

The above equations for computing the writhe are an analytical expression for the Gauss integral along two segments, which is the solid angle formed by all view directions in which segments (a, b) and (c, d) intersect [7].

Since the writhe matrix is a function of the link positions p^{1,2}_{1:K} we can compute its Jacobian using the chain rule

  ∂W_ij/∂q = (∂W_ij/∂p^1_i) J^1_i + (∂W_ij/∂p^1_{i+1}) J^1_{i+1}
           + (∂W_ij/∂p^2_j) J^2_j + (∂W_ij/∂p^2_{j+1}) J^2_{j+1}.        (3)

The writhe matrix W is a detailed description of the relative configuration of two chains. Fig. 5 illustrates two configurations together with their writhe matrix representation. Roughly, the amplitude of the writhe (shading) along the diagonal illustrates which segments are wrapped around each other.

From the full writhe matrix we can derive simpler metrics, usually by summing over writhe matrix elements. For instance, the Gauss linking integral, which counts the mean number of intersections of two chains when projecting from all directions, is the sum of all elements of the writhe matrix. In our experiments, we will also use the vector w_j = Σ_i W_ij as a representation of the current configuration. Writhe, however, does not provide a unique mapping to joint angles, which is why we require additional constraints and cost terms, especially in scenarios where wrapping motion is not dominant.

B. Interaction mesh

An interaction mesh describes the spatial interaction of the robot with its environment [6]. The interaction mesh is a function of a graph connecting a set of landmark points on the robot and in the environment. More precisely, assume we have a set of points P = {p_i}_i including landmarks on a kinematic robot configuration and on objects in the environment. Let G be a (fully or partially connected) graph on P. The selection of landmarks and the graph connectivity should reflect the desired interaction of the environment and the robot. To each vertex p ∈ G in the graph, we associate the Laplace coordinate

  L_G(p) = p − Σ_{r ∈ ∂_G p}  r / (|r − p| W)        (4)

where ∂_G p is the neighborhood of p in the graph G and W = Σ_{s ∈ ∂_G p} |s − p|^{-1} is the normalization constant for the weighting inversely proportional to |r − p|. This constant ensures that the weights sum up to one and therefore makes the representation invariant to scale. The collection of Laplace coordinates of all points,

  M = (L_G(p))_{p ∈ P},        (5)

is a 3|P|-dimensional vector which we denote as interaction mesh. As with the writhe matrix, we assume the Jacobian of all robot landmarks in P is given and the Jacobian of other
environmental landmarks is zero. The Jacobian ∂M/∂q of the interaction mesh is given via the chain rule.

We would like to point out that the squared metric in M-space has a deformation energy interpretation [6]. To see this, consider a change of position of a single vertex p to a new position p'. The deformation energy associated with such a change in position is defined based on the neighborhood in a tetrahedronisation T of the point set:

  E_T(p') = (1/2) ||L_T(p') − L_T(p)||²        (6)

where L_T(p) are the Laplace coordinates of p w.r.t. the tetrahedronisation T. The difference to our definition of the interaction mesh is that we consider Laplace coordinates L_G w.r.t. the fully connected graph G instead of only T, which is a subgraph of the fully connected graph G. Since different configurations lead to topologically different T, using L_G has the benefit of more continuous measures (deformation energies as well as Jacobians). Neglecting this difference, minimizing squared distances in M-space (as is implicit, e.g., in inverse kinematics approaches as well as the optimal control approaches detailed below) therefore corresponds to minimizing deformation energies.

III. OPTIMAL CONTROL COMBINING TOPOLOGICAL AND CONFIGURATION SPACE REPRESENTATIONS

The motivation for introducing topological spaces is that they may provide better metrics or topology for motion synthesis, ideally such that local optimization methods within topological space can solve problems that would require more expensive global search in configuration space. In this section, we describe our method for exploiting topological spaces for motion synthesis in an optimal control context. We formulate the approach within the framework of Approximate Inference Control [13, 17], which is closely related to differential dynamic programming [10] or iLQG [8, 11, 3] (see details below), and which will allow us to use a graphical model to describe the coupling of motion estimation on both representations. The main difference between iLQG and AICO is that iLQG (and other classical methods) do not have a counterpart of the forward messages (see [17] for more details). AICO also

where c_u describes costs for the control and c_x describes task costs depending on the state (usually a quadratic error in some task space). The robot dynamics are described by the transition probabilities P(x_{t+1} | u_t, x_t). The AICO framework translates this to the graphical model

  p(x_{0:T}, u_{0:T}) ∝ P(x_0) ∏_{t=0}^{T} P(u_t) ∏_{t=1}^{T} P(x_t | u_{t-1}, x_{t-1}) · ∏_{t=0}^{T} exp{−c_x(x_t)}.        (8)

The control prior P(u_t) = exp{−c_u(u_t)} reflects the control costs, whereas the last term exp{−c_x(x_t)} reflects the task costs and can be interpreted as "conditioning on the tasks" in the following sense: we may introduce an auxiliary random variable z_t with P(z_t = 1 | x_t) ∝ exp{−c_x(x_t)}, that is, z_t = 1 if the task costs c_x(x_t) are low in time slice t. The above defined distribution is then the posterior p(x_{0:T}, u_{0:T}) = P(x_{0:T}, u_{0:T} | z_{0:T} = 1). AICO in general tries to estimate p, in particular the posterior trajectory and controls. In [17], this is done using Gaussian message passing (comparable to Kalman smoothing) based on local Gaussian approximations around the current belief model. In [13], theory on the general equivalence of this framework with stochastic optimal control is detailed. Generally, the approach is very similar to differential dynamic programming [10] or iLQG [8] methods, with the difference that not only backward messages or cost-to-go functions are propagated but also forward messages ("cost-to-reach functions"), which allows AICO to compute a local Gaussian belief estimate b(x_t) ∝ α(x_t)β(x_t) as the product of forward and backward messages and utilize it to iterate message optimization within each time slice.

B. Expressing motion priors in topological spaces and coupling spaces

To estimate the posterior, the controls u_t can be marginalized, implying the following motion prior:

  P(x_{t+1} | x_t) = ∫_u P(x_{t+1} | u_t, x_t) P(u_t) du.        (9)
allows for easy definition of the problem by extending the This motion prior arises as the combination of the system
graphical model. In the following, we first briefly introduce dynamics and our choice of control costs cu (ut ) in x-space;
this framework before we explain how to couple additional for LQ systems, it is a linear Gaussian.
representations in the probabilistic inference framework. This motion prior is a unique view on our motivation for
topological representations. In the introduction, we mentioned
A. Approximate Inference Control
the impact of representations on the Voronoi bias, the metric,
Approximate Inference Control (AICO) frames the problem or the topology. In other words, successful trajectories are
of optimal control as a problem of inference in a dynamic likely to be “simpler” (easier to find, shorter, local) in an
Bayesian network. Let xt be the state of the system—we appropriate space. In Machine Learning terms, this is ex-
will always consider the dynamic case where xt = (qt , q̇t ). pressed in terms of a prior. In this view, topological spaces
Consider the problem of minimizing (the expectation of) the are essentially a means to express priors about potentially
cost successful trajectories—in our case we employ the linear

T Gaussian prior in a topological space to express the belief
C(x0:T , u0:T ) = [cx (xt ) + cu (ut )] (7) that trajectories may appear “simpler” in a suitable topological
t=0 space.

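For LQ systems, the marginalization in Eq. (9) has a closed form. The following sketch (Python/NumPy; the dynamics, noise, and control-cost values are illustrative choices, not taken from this paper) computes the resulting linear Gaussian motion prior for a zero-mean Gaussian control prior P(u_t) = exp{−c_u(u_t)} = N(0, H⁻¹) and checks it by Monte-Carlo sampling:

```python
import numpy as np

# Sketch of the LQ case of Eq. (9): for linear dynamics
#     x_{t+1} = A x_t + B u_t + w,   w ~ N(0, W),
# and a control prior P(u_t) = N(0, H^{-1}) coming from a quadratic control
# cost c_u(u) = (1/2) u^T H u, marginalizing over u_t gives
#     P(x_{t+1} | x_t) = N(A x_t, B H^{-1} B^T + W).
# All numerical values below are illustrative, not the paper's.

def motion_prior(A, B, H_inv, W, x_t):
    """Mean and covariance of P(x_{t+1} | x_t) after marginalizing u_t."""
    return A @ x_t, B @ H_inv @ B.T + W

A = np.array([[1.0, 0.1],
              [0.0, 1.0]])            # double-integrator-like dynamics
B = np.array([[0.005],
              [0.1]])
H_inv = np.array([[4.0]])             # control-prior covariance H^{-1}
W = 0.01 * np.eye(2)                  # process-noise covariance
x_t = np.array([0.5, -0.2])

mean, cov = motion_prior(A, B, H_inv, W, x_t)

# Monte-Carlo cross-check: sample controls and noise, propagate, compare moments.
rng = np.random.default_rng(0)
u = rng.normal(0.0, np.sqrt(H_inv[0, 0]), size=(200_000, 1))
w = rng.multivariate_normal(np.zeros(2), W, size=200_000)
x_next = A @ x_t + u @ B.T + w
```

On the x-layer this prior is implied by the dynamics and control costs; on the topological y-layer the analogous linear Gaussian prior P(y_{t+1} | y_t) is simply postulated.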
[Fig. 4 graphical model: a chain y_0, y_1, y_2, …, y_T (topology-based space) on top, coupled to the chains z_0, …, z_T (tasks), x_0, …, x_T (configuration space), and u_0, …, u_T (control); a grey arc leads to b̂(x_T).]
Fig. 4. AICO in configuration and topological space. The grey arcs represent the approximation used in the end-state posterior estimation.

However, using AICO with a linear Gaussian motion prior in topology space is not sufficient to solve general motion synthesis problems: 1) The computed posterior in topology does not directly specify an actual state trajectory or control law on the joint level. 2) We neglect the problem of minimization of control and task costs originally defined on the joint level. To address these issues, we need mechanisms to couple inference in topological and state space. We do so by coupling topological and joint state representations in AICO's graphical model framework.

Fig. 4 displays a corresponding graphical model. The bottom layer corresponds to the standard AICO setup, with the motion prior P(x_{t+1} | x_t) = ∫_u P(x_{t+1} | u_t, x_t) P(u_t) du_t implied by the system dynamics and control costs. Additionally it includes the task costs represented by P(z_t = 1 | x_t) = exp{−c_x(x_t)}. The top layer represents a process in topological space with an a priori given linear Gaussian motion prior P(y_{t+1} | y_t). Both layers are coupled by introducing additional factors f(x_t, y_t) = exp{−(1/2) ρ ||φ(q_t) − y_t||²}, which essentially aim to minimize the squared distance between the topological state y_t and the one computed from the joint configuration φ(q_t), weighted by a precision constant ρ. Note that for Gaussian message passing between levels, using a local linearization of φ (having the Jacobian of the topological space) is sufficient. These factors essentially treat the topological state y_t as an additional task variable for the lower level inference, analogous to other potential task variables like end-effector position or orientation.

C. End-state posterior estimation

Probabilistic inference in the full factor graph given in Fig. 4 would require joint messages over configuration and topological variables or loopy message passing. We approximate this inference problem by a stage-wise inference process:
1) We first focus on approximating directly an end-state posterior b̂(x_T) for the end-state x_T which fulfils the task defined in configuration space. We explain the method used for this below.
2) Accounting for the coupling of this end-state posterior to the topological representation, we compute a trajectory posterior in topological space.
3) We then project this down to the joint level, using AICO in configuration space coupled to the topological space via the factors introduced above.

Clearly this scheme is limited in that the initial inference in topology space only accounts for the task at the final time step. To overcome this limitation, we would have to iterate inference between levels. For the problems investigated in our experiments, the approximation scheme above is sufficient.

End-state posterior estimation computes an approximate belief b̂(x_T) ≈ P(x_T | x_0, z_{0:T} = 1) about the final state given the start state and conditioned on the task. This approximation neglects all intermediate task costs and assumes linear Gaussian system dynamics of the form

    P(x_t | x_{t−1}) = N(x_t | A_t x_{t−1} + a_t, W_t) .   (10)

We integrate the system dynamics,

    P(x_T | x_0) = ∫_{x_{1:T−1}} ∏_{t=1}^{T} P(x_t | x_{t−1}) ,   (11)

which corresponds to the grey arc in Fig. 4. For stationary linear Gaussian dynamics, we have

    P(x_T | x_0) = N(x_T | A^T x_0 + Σ_{i=0}^{T−1} A^i a, Σ_{i=0}^{T−1} A^i W (A^i)ᵀ) ,   (12)

where the superscript on A stands for a matrix power, defined iteratively as A^i = A·A^{i−1}. To estimate b̂(x_T), we condition on the task,

    P(x_T | x_0, z_T = 1) = P(z_T = 1 | x_T) P(x_T | x_0) / P(z_T = 1 | x_0) .   (13)

Since we assume P(x_T | x_0) to be Gaussian, using a local Gaussian approximation of the task P(z_T = 1 | x_T) around the current mode of b̂(x_T), P(x_T | x_0, z_T = 1) can be approximated with a Gaussian as well. We iterate this by alternating between updating the end-state estimate b̂(x_T) and re-computing the local Gaussian approximation of the task variable. A Levenberg-Marquardt type damping (depending on monotonous increase of the likelihood) ensures robust and fast convergence.

IV. GENERALIZATION AND REMAPPING USING A TOPOLOGICAL SPACE

The optimal control approach presented in the previous section exploits the additional topological space to generate an optimal trajectory (and controller). However, the computed optimal trajectories may no longer be valid when there is a change in the environment, e.g. the obstacles have moved. In order to cope with this issue, we propose a per-frame re-mapping approach in which the optimal trajectory in the topological representation y*_{0:T} is inverse-mapped to the configuration space according to the novel condition of the environment. Technically, this is done by computing the configuration of the system per-frame such that the remapping error from the original optimal topological trajectory y*_{0:T} is minimized. However, topological representations such as the writhe matrix or the interaction mesh are very high-dimensional—often higher dimensional than the configuration space itself. This is in strong contrast to thinking of y*_{0:T} as a lower dimensional task space like an endeffector space.

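The end-state estimation of Eqs. (10)–(13) can be sketched numerically as follows (Python/NumPy; the system matrices, start state, and Gaussian task message are illustrative values, not the paper's). The closed form of Eq. (12) is cross-checked against step-wise propagation of Eq. (10), and the propagated Gaussian is then conditioned on a Gaussian task likelihood as in Eq. (13):

```python
import numpy as np

def propagate(A, a, W, x0, T):
    """Closed form of Eq. (12) for stationary linear Gaussian dynamics:
    mean A^T x0 + sum_i A^i a, covariance sum_i A^i W (A^i)^T."""
    n = len(x0)
    mean = np.linalg.matrix_power(A, T) @ x0
    cov = np.zeros((n, n))
    Ai = np.eye(n)
    for _ in range(T):
        mean = mean + Ai @ a
        cov = cov + Ai @ W @ Ai.T
        Ai = A @ Ai
    return mean, cov

def condition_on_task(mean, cov, m_task, S_task):
    """Eq. (13) with a Gaussian approximation of the task likelihood:
    posterior proportional to N(x_T | mean, cov) * N(x_T | m_task, S_task)."""
    P_prior, P_task = np.linalg.inv(cov), np.linalg.inv(S_task)
    cov_post = np.linalg.inv(P_prior + P_task)
    mean_post = cov_post @ (P_prior @ mean + P_task @ m_task)
    return mean_post, cov_post

# Illustrative system (not the paper's values).
A = np.array([[0.9, 0.2], [0.0, 0.9]])
a = np.array([0.1, 0.0])
W = 0.05 * np.eye(2)
x0 = np.array([1.0, 0.0])
T = 10

mean_T, cov_T = propagate(A, a, W, x0, T)

# Cross-check the closed form against step-by-step propagation of Eq. (10).
m, S = x0.copy(), np.zeros((2, 2))
for _ in range(T):
    m, S = A @ m + a, A @ S @ A.T + W

b_mean, b_cov = condition_on_task(mean_T, cov_T, np.array([2.0, 0.0]), 0.1 * np.eye(2))
```

Conditioning on the task message always tightens the end-state belief, which is what the alternating update described above exploits.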
Therefore, following y*_{0:T} exactly is generally infeasible and requires a regularisation procedure that minimizes the 1-step cost function:

    f(q_{t+1}) = ||q_{t+1} − q_t − h||² + ||φ(q_{t+1}) − y*||²_C ,   (14)

    argmin_{q_{t+1}} f(q_{t+1}) = q_t + J♯(y* − y_t) + (I − J♯J) h ,   (15)

    with J♯ = Jᵀ(JJᵀ + C⁻¹)⁻¹

where C describes a cost metric in y-space.

For the case of the interaction mesh, we mentioned the relation of a squared metric C in M-space to the deformation energy. Therefore, using the per-frame remapping to follow an interaction mesh reference trajectory M*_{0:T} essentially tries to minimize the deformation energy between the reference M*_t and the actual φ(q_t) at each time step. This implies generalizing to new situations by approximately preserving relative distances between interacting objects instead of directly transferring joint angles. In conjunction with the use of feedback gains, the methodology proposed here is able to cope with dynamic environments (see Section V-C) and bounded unpredictable changes.

V. EXPERIMENTS

A. Rope unwrapping and reaching using writhe space

As an example of a possible application of the writhe space abstraction we simulated a manipulator, consisting of 20 segments and a hand with three fingers, making in total 29 DoF. Initially this rope-like manipulator is twisted two and a half times around a pole, giving us approximately 900° of writhe density (see Fig. 5(a)).

[Fig. 5 images: (a) initial configuration, (b) final configuration, (c) initial writhe matrix, (d) final writhe matrix.]
Fig. 5. The experimental task is to grasp the object without collisions. Corresponding writhe matrices are depicted on the lower plots - the darkness corresponds to the amplitude of the writhe value.

[Fig. 6 plot: writhe value of each chain segment (1–15) over time steps y_0 … y_T; value scale 0.2–1.8.]
Fig. 6. Each column represents a state in writhe space - values show how much each segment of the manipulator is twisted around the stick at each time step.

[Fig. 7 plot: "Comparison of planning algorithms" - number of collision detection calls (×10⁴) vs. angle to unwrap (0–900 degrees) for unidirectional RRT, bi-directional RRT, and AICO with WritheTV.]
Fig. 7. Performance of planning algorithms for unwrapping task. Computational time is proportional to the number of collision detection calls.

The task is to plan a trajectory which should grasp the black cylinder without colliding with the pole (Fig. 5(a), 5(b) and video available at http://goo.gl/fwSxG). Clearly, a local feedback approach using Inverse Kinematics will experience failure in this task. On the other hand, a successful trajectory can be well captured as a linear interpolation in writhe space and projected back to the configuration space using the coupling described in Section II-A. Fig. 6 illustrates an example of an unwrapping trajectory in topological space when all rows of the writhe matrix are summed up into one column, representing the current state.

Hierarchical AICO conditioned on the end-state in writhe space y_T was able to generate locally optimal trajectories, consisting of 50 time steps, in only a few iterations, requiring a relatively small number of expensive collision checks (less than 1000). Comparison with Rapidly-exploring Random Tree (RRT) planning for this reaching task revealed a dependence of the performance on the distance between end-effector and object position. Moreover, the total costs of the obtained trajectories were on average 100 times higher than those generated with the local optimizer. Here, the end-state in configuration space q_T was given as a target for the RRTs.

For a more systematic evaluation of our planning platform we have designed a sequence of unwrapping trajectories - gradually increasing the relative angle to be unwrapped. This sequence of final states was given as goals to uni- and bi-directional RRT planners. The results demonstrate that for simple trajectories (e.g. in case of nearby lying objects) all methods have no difficulties, whereas starting with one and a half full twists, unidirectional search fails and bi-directional search significantly slows down (see Fig. 7).

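The per-frame remapping update of Eq. (15) above reduces to a regularized pseudo-inverse step. A minimal sketch (Python/NumPy): here φ is a toy task map, the end point of a planar 3-link arm, standing in for the writhe or interaction mesh maps, and C, h, and the target y* are illustrative choices, not the paper's:

```python
import numpy as np

# One per-frame remapping step, Eq. (15):
#     q_{t+1} = q_t + J#(y* - phi(q_t)) + (I - J# J) h,
#     J# = J^T (J J^T + C^{-1})^{-1}.
# phi below is a TOY task map (end point of a planar 3-link arm), a stand-in
# for the writhe / interaction mesh maps; C, h, y* are illustrative.

def phi(q):
    """End point of a planar 3-link arm with unit link lengths."""
    a = np.cumsum(q)
    return np.array([np.sum(np.cos(a)), np.sum(np.sin(a))])

def jacobian(f, q, eps=1e-6):
    """Finite-difference Jacobian of f at q."""
    f0 = f(q)
    J = np.zeros((len(f0), len(q)))
    for i in range(len(q)):
        dq = np.zeros_like(q)
        dq[i] = eps
        J[:, i] = (f(q + dq) - f0) / eps
    return J

def remap_step(q, y_star, h, C):
    J = jacobian(phi, q)
    J_sharp = J.T @ np.linalg.inv(J @ J.T + np.linalg.inv(C))
    return q + J_sharp @ (y_star - phi(q)) + (np.eye(len(q)) - J_sharp @ J) @ h

q = np.array([0.5, 0.5, 0.5])
y_star = np.array([1.3, 2.0])   # reference value y*_t of the topological trajectory
h = np.zeros(3)                 # no preferred joint-space step in this sketch
C = np.eye(2)                   # cost metric in y-space

for _ in range(30):             # iterating the 1-step update drives phi(q) -> y*
    q = remap_step(q, y_star, h, C)
```

The C⁻¹ term damps the update near singular configurations, and the (I − J♯J)h term realizes the preferred step h in the nullspace of the task.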
In this comparison, the RRTs solved a somewhat simpler problem than our system: For the RRTs we assumed to know the final state q_T in configuration space – we take our final pose estimate from Section III-C as the target q_T for the RRTs. This is in contrast to our planning platform, where we use the final pose estimate only to estimate a final topological state y_T and then use the hierarchical AICO to compute an optimal trajectory (including an optimal q_T) conditioned on this final topological state. Therefore, the RRT's problem is reduced to growing to a specific end state. We applied the standard method of biasing RRT search towards q_T by growing the tree 10% of the time towards q_T instead of a random sample of the configuration space. Knowing q_T also allowed us to test bi-directional RRTs, each with a 10% bias to grow towards a random node of the other tree. Further, RRTs output non-smooth paths whereas AICO produces (locally) optimal dynamic trajectories since it minimizes dynamic control costs¹.

¹AICO can also be applied on the kinematic level, where x_t = q_t and with the control u_t = q_{t+1} − q_t being just the joint angle step. This would reduce the computational cost further.

B. Dynamic reaching through a loop using writhe and interaction mesh

Writhe space is a suitable representation for tasks that involve interactions with chains—or loops—of obstacles. As a second demonstration, we have altered the task from Experiment V-A to reaching through a hollow box, where the rim of the box forms a loop of segments, see Fig. 8(b). The interaction of the manipulator with the box can be described by the writhe representation between this loop of box segments and the manipulator. Classically this problem would be addressed using only an end-effector task variable and collision checks in joint configuration space. The advantage of using writhe as a description of the interaction is in defining the task as a relative configuration of the robot and the loop—this relative description remains effective also when the situation changes or the box is moved dynamically. The writhe matrix corresponding to the final configuration contains a peak around the last link which passes through the box (see Fig. 1 and 8). In addition to this, the target in writhe space does not uniquely define the task for all arbitrary positions of the box (unlike the unwrapping task in Experiment V-A). Fig. 8(b) bottom shows a configuration that does not reach through the loop but reaches the writhe space target. The writhe is computed as the absolute value of the Gauss linking integral, which means that the direction of wrapping is not clearly defined. This is the reason why one of the arm segments wraps around the rim of the box in the wrong direction.

[Fig. 8(a) plot: "Target for entering loop in writhe space" - writhe values (0–0.2) between loop segments 1–4 and robot segments 1–3.]
Fig. 8. (a) writhe space target for passing the last link of a robotic arm through a loop. (b) the configuration corresponding to the target in writhe space. The loop was built by connecting the black cylinders attached to the rim of the hollow box.

In this demonstration we use an additional interaction mesh representation, which is well suited to represent movement in a way that keeps appropriate distance between robot links and the obstacle and is mostly invariant w.r.t. the concrete position of the box. Explicit collision avoidance is then superfluous. We coupled all three representations—writhe, interaction mesh, and joint configuration space—using AICO by extending the graphical model to efficiently generate motions for varying positions of the obstacles. For this, the position of a hollow box defining the loop was tracked using a magnetic motion tracking system in our experiment on the real robot. The computation of locally optimal trajectories using this method required 3 to 6 AICO iterations. We compared to generating trajectories using only classical end-effector position and collision avoidance methods. This required 20 to 30 iterations, which shows that the combined writhe and interaction mesh space is a better choice of representation than end-effector position with collision avoidance.

We also tested online remapping as described in Section IV using both the writhe and interaction mesh space, to test the behavior of the system when the box position is changed dynamically on the fly. A video of this experiment can be found at http://goo.gl/fwSxG.

C. Dynamic obstacle avoidance using interaction mesh space

Finally we present an experiment in which the robot manipulator reaches a target object in an environment where spherical obstacles dynamically move around. We show that the cost for the movement stayed close to the optimal value when the movement is adjusted through the local re-mapping scheme described in Section IV in response to the unpredictable random displacement of the obstacles.

The experiment proceeded as follows. We used AICO with a work-space target for the end-effector and collision avoidance task variables to compute the optimal trajectory under a given obstacle configuration. The obstacles are then randomly moved in the workspace and the trajectories were recomputed by (1) re-mapping using the interaction mesh and (2) re-planning using AICO with complete knowledge of the obstacle motion. The costs of the two methods with respect to the amplitude of the obstacle displacement are compared in Fig. 9. While the re-mapping approach (which does not have the benefit of knowledge of the updated obstacle locations in the remaining time steps) results in a higher cost, it stays within a comparable range of the full AICO re-planning. If we simply rely on a generic collision detection / response scheme based on inverse kinematics to amend the movements, close to an imminent collision the cost increases rapidly, as shown in the black plot in Fig. 9.
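The goal-biased RRT search used as a baseline above can be sketched as follows (Python/NumPy; obstacle-free toy configuration space with illustrative bounds and step size — only the 10% goal bias is taken from the text, the rest is not the paper's implementation):

```python
import numpy as np

# Goal-biased RRT sampling sketch: with probability `bias` the tree is grown
# towards the known goal state q_T, otherwise towards a uniform random
# configuration. Toy 4-DoF space without obstacles; bounds/step illustrative.

def biased_sample(rng, q_goal, lo, hi, bias=0.1):
    if rng.random() < bias:
        return q_goal
    return rng.uniform(lo, hi)

def extend(nodes, q_target, step=0.2):
    """Grow the node nearest to the sample a fixed step towards it."""
    q_near = min(nodes, key=lambda q: np.linalg.norm(q - q_target))
    d = q_target - q_near
    n = np.linalg.norm(d)
    q_new = q_target if n <= step else q_near + step * d / n
    nodes.append(q_new)
    return q_new

rng = np.random.default_rng(1)
lo, hi = -np.pi * np.ones(4), np.pi * np.ones(4)
q_start, q_goal = np.zeros(4), 0.5 * np.ones(4)
nodes = [q_start]
for _ in range(500):
    q_new = extend(nodes, biased_sample(rng, q_goal, lo, hi))
    if np.linalg.norm(q_new - q_goal) < 1e-6:
        break
```

In the bi-directional variant described above, each tree instead applies the 10% bias towards a random node of the other tree.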

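The Gauss linking integral underlying the writhe representation (cf. Klenin and Langowski [7]) can be evaluated by a discretized double sum over segment pairs. The sketch below (Python/NumPy; the discretization scheme and test curves are illustrative, not the paper's implementation) verifies that two singly-linked circles yield a linking number of ±1, while unlinked curves yield approximately 0:

```python
import numpy as np

# Discretized Gauss linking integral between two closed curves:
#     Lk = 1/(4*pi) * sum_i sum_j ((r_i - s_j) . (dr_i x ds_j)) / |r_i - s_j|^3
# evaluated at segment midpoints (a simple midpoint-rule quadrature).

def linking_number(P, Q):
    """P, Q: (n,3) arrays of points sampling two closed curves."""
    dP = np.roll(P, -1, axis=0) - P        # segment vectors (with wrap-around)
    dQ = np.roll(Q, -1, axis=0) - Q
    mP = P + 0.5 * dP                      # segment midpoints
    mQ = Q + 0.5 * dQ
    r = mP[:, None, :] - mQ[None, :, :]    # (n, m, 3) midpoint differences
    cross = np.cross(dP[:, None, :], dQ[None, :, :])
    num = np.sum(r * cross, axis=-1)
    dist = np.linalg.norm(r, axis=-1)
    return np.sum(num / dist**3) / (4.0 * np.pi)

t = np.linspace(0.0, 2.0 * np.pi, 400, endpoint=False)
circle1 = np.stack([np.cos(t), np.sin(t), np.zeros_like(t)], axis=1)
circle2 = np.stack([1.0 + np.cos(t), np.zeros_like(t), np.sin(t)], axis=1)  # linked
lk = linking_number(circle1, circle2)

far_circle = circle2 + np.array([5.0, 0.0, 0.0])  # translated away: unlinked
lk_far = linking_number(circle1, far_circle)
```

Taking the absolute value of such pairwise contributions, as the text notes, discards the direction of wrapping, which is why the writhe target alone cannot distinguish the two wrapping directions.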
[Fig. 9 plot: "Replanning cost" - average total cost (×10⁴, 0–6) vs. obstacle displacement (0–0.5 m) for AICO re-plan, interaction mesh, and inverse kinematics.]
Fig. 9. The average total cost of (1) AICO re-planning, (2) re-mapping the trajectory using the interaction mesh and (3) reaching using inverse kinematics to control the end-effector position. The running cost consists of an end-effector position term, a collision term and a control cost term.

This illustrates that the topological representation generalizes the state space well such that the local re-mapping scheme can produce an adequate response in real-time with low computational effort while achieving the overall task aims.

We demonstrate the results on a real-time manipulation using the KUKA LWR 4 robotic manipulator and a vision tracking system. The movement of the obstacles was individually controlled by three uncorrelated individuals, showing that this method can deal with unpredictable movement of obstacles in real time (video is available at http://goo.gl/fwSxG).

VI. CONCLUSIONS

Different motion representations have different strengths and weaknesses depending on the problem. For certain interaction problems there exist suitable topological representations in which the interaction can be described in a way that generalizes well to novel or dynamic situations (as with the interaction mesh), or where local optimization methods can find solutions that would otherwise require inefficient global search (as with the writhe representations). However, considering motion planning only in a topological representation is insufficient for additionally accounting for tasks and constraints in other representations.

Previous work with such representations has only tested basic approaches for inverse mapping of fixed topological trajectories to the joint configuration [4, 5, 16]. In contrast, in this paper we presented methods that combine the different representations at the abstract and lower level for motion synthesis. For instance, the reaching task in an endeffector space is coupled with a writhe space that allows a local optimization method to generate an unwrapping-and-reaching motion. Considering such a problem only in joint configuration and endeffector space leads to "deep local minima" that are practically infeasible to solve—as our comparison to RRTs in Section V-A demonstrated. Considering such a problem only in writhe space would not address the actual reaching task.

We chose to formulate our approach in the framework of optimal control as an approximate inference problem since this allows for a direct extension of the graphical model to incorporate multiple representations. Alternative formulations are possible, for instance as a structured constraint optimization problem (MAP inference in our graphical model) that could be solved by methods such as SNOPT. What we coined as a motion prior in topological spaces would here correspond to pseudo control costs for transitions in topological space. Which formulation will eventually lead to the computationally most efficient algorithms is a matter of future research. As an outlook, we aim to apply the proposed methods to dexterous robot manipulation (including grasping) of more complex, articulated or flexible objects, where we believe that multiple parallel representations will enable more robust and generalizing motion synthesis strategies.

Acknowledgements: This work was funded by the EU FP7 project TOMSY and EPSRC project EP/H012338/1.

REFERENCES

[1] S. Bhattacharya, M. Likhachev, and V. Kumar. Identification and representation of homotopy classes of trajectories for search-based path planning in 3D. In Proc. of RSS, 2011.
[2] S. Bitzer and S. Vijayakumar. Latent spaces for dynamic movement primitives. In Proc. of 9th IEEE-RAS International Conference on Humanoid Robots, pages 574–581. IEEE, 2009.
[3] D. J. Braun, M. Howard, and S. Vijayakumar. Optimal variable stiffness control: Formulation and application to explosive movement tasks. Autonomous Robots (in press), 2012.
[4] C. H. Dowker and M. B. Thistlethwaite. Classification of knot projections. Topology and its Applications, 16(1):19–31, 1983.
[5] Edmond S. L. Ho and T. Komura. Character motion synthesis by topology coordinates. In Computer Graphics Forum (Proc. Eurographics 2009), volume 28, Munich, Germany, 2009.
[6] Edmond S. L. Ho, T. Komura, and Chiew-Lan Tai. Spatial relationship preserving character motion adaptation. ACM Transactions on Graphics, 29(4):1–8, 2010.
[7] K. Klenin and J. Langowski. Computation of writhe in modeling of supercoiled DNA. Biopolymers, 54(5):307–317, 2000.
[8] Weiwei Li and E. Todorov. An iterative optimal control and estimation design for nonlinear stochastic systems. In Proc. of the 45th IEEE Conference on Decision and Control, 2006.
[9] S. R. Lindemann and S. M. LaValle. Incrementally reducing dispersion by increasing Voronoi bias in RRTs. In Proc. of International Conference on Robotics and Automation, 2004.
[10] D. M. Murray and S. J. Yakowitz. Differential dynamic programming and Newton's method for discrete optimal control problems. Journal of Optimization Theory and Applications, 43(3):395–414, 1984.
[11] J. Nakanishi, K. Rawlik, and S. Vijayakumar. Stiffness and temporal optimization in periodic movements: An optimal control approach. In Proc. of IROS, pages 718–724, 2011.
[12] K. Rawlik, M. Toussaint, and S. Vijayakumar. An approximate inference approach to temporal optimization in optimal control. In Proc. of Neural Information Processing Systems, 2010.
[13] K. Rawlik, M. Toussaint, and S. Vijayakumar. On stochastic optimal control and reinforcement learning by approximate inference. In Proc. of RSS, 2012.
[14] A. Shkolnik and R. Tedrake. Path planning in 1000+ dimensions using a task-space Voronoi bias. In Proc. of ICRA, pages 2892–2898. IEEE, 2009.
[15] J. Takamatsu et al. Representation for knot-tying tasks. IEEE Transactions on Robotics, 22(1):65–78, 2006.
[16] T. Tamei, T. Matsubara, A. Rai, and T. Shibata. Reinforcement learning of clothing assistance with a dual-arm robot. In Proceedings of Humanoids, pages 733–738, 2011.
[17] M. Toussaint. Robot trajectory optimization using approximate inference. In Proc. of ICML, pages 1049–1056. ACM, 2009.
[18] H. Wakamatsu, E. Arai, and S. Hirai. Knotting/unknotting manipulation of deformable linear objects. International Journal of Robotics Research, 25(4):371–395, 2006.

Rigidity Maintenance Control for Multi-Robot Systems
Daniel Zelazo∗ , Antonio Franchi† , Frank Allgöwer∗ , Heinrich H. Bülthoff†‡ , and Paolo Robuffo Giordano†
∗ Institute
for Systems Theory and Automatic Control, University of Stuttgart, Germany.
Email: {daniel.zelazo, allgower}@ist.uni-stuttgart.de
† Max Planck Institute for Biological Cybernetics, Tübingen, Germany.
Email: {antonio.franchi, prg}@tuebingen.mpg.de
‡ Department of Brain and Cognitive Engineering, Korea University. Email: [email protected]

Abstract—Rigidity of formations in multi-robot systems is important for formation control, localization, and sensor fusion. This work proposes a rigidity maintenance gradient controller for a multi-agent robot team. To develop such a controller, we first provide an alternative characterization of the rigidity matrix and use that to introduce the novel concept of the rigidity eigenvalue. We provide a necessary and sufficient condition relating the positivity of the rigidity eigenvalue to the rigidity of the formation. The rigidity maintenance controller is based on the gradient of the rigidity eigenvalue with respect to each robot position. This gradient has a naturally distributed structure, and is thus amenable to a distributed implementation. Additional requirements such as obstacle and inter-agent collision avoidance, as well as typical constraints such as limited sensing/communication ranges and line-of-sight occlusions, are also explicitly considered. Finally, we present a simulation with a group of seven quadrotor UAVs to demonstrate and validate the theoretical results.

I. INTRODUCTION

Multi-robot systems provide advantages over their monolithic counterparts for many reasons. Perhaps of greatest interest is their resilience against system failures and their ability to adapt to highly dynamic and unknown environments. Applications that rely on multi-agent solutions vary from interferometry in deep space, distributed sensing and data collection, surveillance, construction and transportation, and search and rescue operations [1, 2, 7, 22, 23, 24, 25]. Despite the many advantages of multi-agent systems, there remain several open problems towards the complete autonomy of such systems, among which control of the agents' collective motion is of paramount importance. In this context, the alignment of theoretical and analytical tools with the constraints of real-world systems remains challenging. Limited sensing and communication ranges, inter-agent and obstacle collision avoidance, and connectivity maintenance all represent fundamental requirements for the successful implementation of any collective motion control strategy. In this direction, there has been a focused research effort on the development of distributed algorithms for collective motion control while guaranteeing maintenance of such mission objectives.

An important objective for collective motion control of multiple agents that has not received much attention in the robotics community is that of rigidity and rigid formations. The study of rigidity has a long history with contributions from both pure mathematics and engineering disciplines [9, 18, 20, 21, 29, 31]. One benefit from rigidity theory in the context of motion control of multiple robots is that it allows for formation control using only relative distance measurements, as opposed to relative position measurements from a global or relative inertial frame [2, 3, 5, 20, 26, 30]. For example, in [20] it was shown that formation stabilization using only distance measurements can be achieved only if rigidity of the formation is maintained. In fact, rigidity represents a necessary condition for estimating relative positions using only relative distance measurements [4, 8]. Therefore, in the context of formation control and estimation, the notion of rigidity plays the same role as that of connectivity when distance measurements are the only available information (rather than relative or absolute position measurements).

In a broader context, rigidity turns out to be an important architectural property of many multi-agent systems when a common inertial reference frame is unavailable. Applications that rely on sensor fusion for localization, exploration, mapping and cooperative tracking of a target all can benefit from notions in rigidity theory [4, 29, 32]. While equipping each robot with a GPS-like sensor may eliminate many difficulties, these solutions would not work in harsher environments, such as under-water, indoors, or any GPS-denied environments. Furthermore, coordination via relative sensing inherently provides greater accuracy and reliability in addition to more resilience and robustness against failures.

A goal of this paper, therefore, is to present a distributed gradient-based control strategy for a group of mobile robots for maintaining formation rigidity at all times while navigating in cluttered environments. In particular, we do not aim for a solution constraining the robots to keep a given fixed rigid formation, e.g., by choosing beforehand a particular topology for the interaction graph. Rather, we are interested in devising a flexible strategy that can allow, for instance, creation or disconnection of interaction links among robots, while still ensuring rigidity of the entire formation. In this way, the robot motion will not be overly constrained: this is a desirable feature when, for example, establishment/maintenance of an interaction link among two agents conflicts with typical sensing/communication issues such as limited ranges or line-of-sight occlusions.

In general, rigidity as a property of a given formation (i.e., of the robot spatial arrangement) has been studied from either a purely combinatorial perspective [21], or by providing an algebraic characterization via the state-dependent rigidity matrix [31].

development of an alternative representation of the rigidity matrix that separates the underlying graph structure from the position of each agent. This is instrumental to create a companion matrix, the symmetric rigidity matrix, which will be shown to resemble a weighted Laplacian matrix. Using the symmetric rigidity matrix, we are then able to determine whether a formation is rigid by examining a particular eigenvalue of this matrix, named the rigidity eigenvalue. This quantity is in fact, in the planar case, identical to the worst-case rigidity index introduced in [34]. However, contrary to the results presented in [19, 34], the structure of the symmetric rigidity matrix allows for using the rigidity eigenvalue and its associated eigenvector in a distributed control strategy for ensuring rigidity under time-varying topologies and additional sensing constraints. In particular, we will show how to embed into a unified gradient-based rigidity maintenance action the fulfillment of a number of additional constraints, such as inter-robot and obstacle avoidance, limited communication and sensing ranges, and line-of-sight occlusions. Our approach, therefore, can be seen as the suitable recasting into the rigidity framework of those works related to decentralized connectivity maintenance based on the second smallest eigenvalue of the graph Laplacian matrix; see, e.g., [27, 33].

The organization of this paper is as follows. Section I-A provides a brief overview of some notation and fundamental theoretical properties of graphs. In Section II, the theory of rigidity is introduced. This section also introduces the first main results of this paper, namely the characterization of the symmetric rigidity matrix and of the rigidity eigenvalue. Then, Section III shows how to exploit these results in order to derive the sought gradient-based rigidity maintenance controller. The validity of this controller is further illustrated in Section IV by means of a realistic simulation involving 7 quadrotor UAVs navigating in a cluttered environment. During the simulation, the motion of two of these UAVs is also partially controlled by two human operators through two joysticks in order to stress the action of the controller in maintaining formation rigidity. Finally, Section V concludes the paper and provides some final remarks.

A. Preliminaries and Notations

The notation employed is standard. Matrices are denoted by capital letters (e.g., A), and vectors by lower-case letters (e.g., x). The rank of a matrix A is denoted rk[A]. Diagonal matrices will be written as D = diag{d_1, . . . , d_n}; this notation will also be employed for block-diagonal matrices. A matrix and/or a vector consisting of all zero entries will be denoted by 0, whereas '0' will simply denote the scalar zero. Similarly, the vector 1 denotes the vector of all ones. The n × n identity matrix is denoted I_n. The set of real numbers will be denoted R, and ‖·‖ denotes the standard Euclidean 2-norm for vectors. The Kronecker product of two matrices A and B is written A ⊗ B [15].

Graphs and the matrices associated with them will be widely used in this work. The reader is referred to [12] for a detailed treatment of the subject. An undirected (simple) weighted graph G is specified by a vertex set V, an edge set E whose elements characterize the incidence relation between distinct pairs of V, and a diagonal |E| × |E| weight matrix A, with A_ii ≥ 0 the weight on edge e_i ∈ E. Two vertices i and j are called adjacent (or neighbors) when {i, j} ∈ E; we denote this by writing i ∼ j. An orientation of an undirected graph G is the assignment of directions to its edges, i.e., an edge e_k is an ordered pair (i, j) such that i and j are, respectively, the initial and the terminal nodes of e_k.

The incidence matrix E(G) ∈ R^{|V|×|E|} is a {0, ±1}-matrix with rows and columns indexed by the vertices and edges of G, such that [E(G)]_ik has the value '+1' if node i is the initial node of edge e_k, '−1' if it is the terminal node, and '0' otherwise. The degree of vertex i, d_i, is the cardinality of the set of vertices adjacent to it. The degree matrix, Δ(G), and the adjacency matrix, A(G), are defined in the usual way [12]. The (graph) Laplacian of G, L(G) = E(G)E(G)^T = Δ(G) − A(G), is a rank-deficient positive semi-definite matrix. One of the most important results from algebraic graph theory in the context of collective motion control states that a graph is connected if and only if the second smallest eigenvalue of the Laplacian is positive [12].

II. RIGIDITY AND THE RIGIDITY EIGENVALUE

In this Section we review the fundamental concepts of graph rigidity. The main construct from rigidity theory that we focus on is the rigidity matrix. A main contribution of this work is the presentation of a companion matrix to the rigidity matrix, and the characterization of what we term the rigidity eigenvalue. For a more detailed treatment of the theory of graph rigidity, the reader is referred to [13, 17].

A. Graph Rigidity and the Rigidity Matrix

We consider graph rigidity from what is known as a d-dimensional bar-and-joint framework. A framework is the pair (G, p), where G = (V, E) is a graph, and p : V → R^d maps each vertex to a point in R^d. For simplicity of exposition, we only consider the planar case, d = 2, and note that all the following results can be extended to d = 3 in a straightforward way. Therefore, for node u ∈ V, p(u) = [p_u^x  p_u^y]^T is the position vector in R^2 for the mapped node. We denote by p^(x,y) = [p(v_1) · · · p(v_|V|)]^T the aggregate position vector of all agents. In fact, although the following derivations only consider the planar case, the simulations presented in Section IV are run by taking into account full 3-dimensional agents and their associated 3-dimensional controller. We now provide some basic definitions.

Definition 2.1: Frameworks (G, p_0) and (G, p_1) are equivalent if ‖p_0(u) − p_0(v)‖ = ‖p_1(u) − p_1(v)‖ for all {u, v} ∈ E, and are congruent if ‖p_0(u) − p_0(v)‖ = ‖p_1(u) − p_1(v)‖ for all u, v ∈ V.

Definition 2.2: (G, p_0) is globally rigid if every framework which is equivalent to (G, p_0) is congruent to (G, p_0).

Definition 2.3: (G, p_0) is rigid if there exists an ε > 0 such that every framework (G, p_1) which is equivalent to (G, p_0)
and satisfies ‖p_0(v) − p_1(v)‖ < ε for all v ∈ V, is congruent to (G, p_0).

Fig. 1. Examples of rigid and infinitesimally rigid frameworks. (a) Two equivalent minimally rigid graphs. (b) An infinitesimally and globally rigid graph. (c) A non-infinitesimally rigid graph (note that vertices v_1 and v_3 are connected).

Definition 2.4: A minimally rigid graph is a rigid graph such that the removal of any edge results in a non-rigid graph.

Figure 1 shows three graphs illustrating the above definitions. The graphs in Figure 1(a) are both minimally rigid and are equivalent to each other, but are not congruent, and therefore not globally rigid. By adding an additional edge, as in Figure 1(b), the graph becomes globally rigid. The key feature of global rigidity, therefore, is that the distances between all node pairs are maintained for different framework realizations, and not just those defined by the edge set.

Using the above definitions, a framework (G, p) can be described as the set

    { p(u) ∈ R^2 | ‖p(u) − p(v)‖^2 = ℓ_uv^2, ∀ {u, v} ∈ E },   (1)

where the Euclidean distances between nodes, ℓ_uv, are specified. At a particular point in the configuration space, one can assign velocity vectors ξ(u) ∈ R^2 to each vertex u ∈ V such that

    (ξ(u) − ξ(v))^T (p(u) − p(v)) = 0, ∀ {u, v} ∈ E.   (2)

Note that this relation can be obtained by differentiating the length constraint described in (1). These motions are referred to as infinitesimal motions of the mapped vertices p(u). If the mapping p is further parameterized by a positive scalar representing time, then we can consider the infinitesimal motions at each time, and define

    ṗ(u, t) = ξ(u),   (3)

where ξ(u) will be treated as the agent velocity input for control purposes (see Section III).

Definition 2.5: A framework is called infinitesimally rigid if every possible motion that satisfies (2) is trivial (i.e., consists of only rotations and translations of the entire framework).

An example of an infinitesimally rigid graph is shown in Figure 1(b). Furthermore, note that infinitesimal rigidity implies rigidity, but the converse is not true [31]; see Figure 1(c) for a rigid graph that is not infinitesimally rigid.

Infinitesimal rigidity can also be determined by examining the rank of the rigidity matrix, R(p) ∈ R^{|E|×2|V|} [31]. In fact, the rigidity matrix can be constructed from the system of equations (2). This can be thought of as a way to generate all possible infinitesimal rigid motions of a formation. The quantity (p(u) − p(v)), therefore, represents the coefficients of that system of equations, and each row of the matrix corresponds to an edge e = {u, v}; for example, the row corresponding to edge e has the form

    [ 0 · · · (p(u) − p(v))^T · · · 0 · · · (p(v) − p(u))^T · · · 0 ],

with the two nonzero blocks placed in the columns associated with vertex u and vertex v, respectively. The definition of infinitesimal rigidity can then be restated in the following form:

Lemma 2.1 ([31]): A framework (G, p) is infinitesimally rigid if and only if rk[R(p)] = 2|V| − 3.

Note that, as expected from Definition 2.5, for an infinitesimally rigid graph the three-dimensional kernel of R(p) only allows for three independent feasible framework motions, that is, the above-mentioned collective roto-translations on the plane. Note also that, despite its name, the rigidity matrix actually characterizes infinitesimal rigidity rather than rigidity of a framework.

A first contribution of this work is to provide an alternative representation of the rigidity matrix that separates the underlying graph from the relative positions of each agent. In this direction, we first define the notion of a directed local graph at node v_i.

Definition 2.6: Consider a graph G = (V, E) and its associated incidence matrix with arbitrary orientation E(G). The directed local graph at node v_j is the sub-graph G_j = (V, E_j) induced by node v_j such that

    E_j = {(v_j, v_i) | e_k = {v_i, v_j} ∈ E}.

The local incidence matrix at node v_j is the matrix

    E_l(G_j) = E(G) diag{s_1, . . . , s_|E|} ∈ R^{|V|×|E|},

where s_k = 1 if e_k ∈ E_j and s_k = 0 otherwise.

Note, therefore, that the local incidence matrix will contain columns of all zeros in correspondence to those edges not adjacent to v_j.

Proposition 1: Let p^(x,y) ∈ R^{|V|×2} be the position vector for all agents. The rigidity matrix R(p) can be defined as

    R(p) = [ E_l(G_1)^T · · · E_l(G_|V|)^T ] ( I_|V| ⊗ p^(x,y) ),   (4)

where E_l(G_i) is the local incidence matrix for node v_i.

This observation allows us to define the rigidity matrix in a way that separates the underlying graph and the actual position of each node. In the sequel, we explore the ramifications of Proposition 1 via the introduction of a symmetric companion matrix, termed the symmetric rigidity matrix, and the notion of the rigidity eigenvalue.
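The rank test of Lemma 2.1 is straightforward to check numerically. The following sketch is illustrative and not from the paper: the positions and edge sets are made up, and the matrix is assembled row by row following the row structure shown above, rather than via the local incidence matrices of Proposition 1.

```python
import numpy as np

def rigidity_matrix(edges, p):
    """One row per edge {u, v}: (p_u - p_v)^T in the two columns of u,
    (p_v - p_u)^T in the two columns of v, and zeros elsewhere."""
    R = np.zeros((len(edges), 2 * len(p)))
    for row, (u, v) in enumerate(edges):
        d = p[u] - p[v]
        R[row, 2 * u:2 * u + 2] = d
        R[row, 2 * v:2 * v + 2] = -d
    return R

def is_infinitesimally_rigid(edges, p):
    # Lemma 2.1: rk[R(p)] = 2|V| - 3 in the plane.
    return np.linalg.matrix_rank(rigidity_matrix(edges, p)) == 2 * len(p) - 3

# Four agents in generic position (illustrative values).
p = np.array([[0.0, 0.0], [1.0, 0.0], [0.5, 1.0], [1.5, 1.0]])

# A minimally rigid framework on 4 nodes needs 2|V| - 3 = 5 edges.
rigid_edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3)]
print(is_infinitesimally_rigid(rigid_edges, p))       # True

# Dropping any edge leaves a flexible framework (cf. Definition 2.4).
print(is_infinitesimally_rigid(rigid_edges[:-1], p))  # False
```

The same rank can equivalently be obtained from the factorization (4); the edge-by-edge assembly above simply instantiates the row description given earlier.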
B. The Rigidity Eigenvalue

Lemma 2.1 relates the property of infinitesimal rigidity for a given framework to the rank of a corresponding matrix. A main contribution of this work is the translation of the rank condition into a condition on the spectrum of a corresponding matrix that we term the symmetric rigidity matrix.

The symmetric rigidity matrix is a symmetric and positive-semidefinite matrix defined as

    R := R(p)^T R(p) ∈ R^{2|V|×2|V|}.   (5)

An immediate consequence of the construction of the symmetric rigidity matrix is that rk[R] = rk[R(p)] [14], leading to the following corollary.

Corollary 2.2: A framework (G, p) is infinitesimally rigid if and only if rk[R] = 2|V| − 3.

The rank condition of Corollary 2.2 can be equivalently stated in terms of the eigenvalues of R. Denoting the eigenvalues of R as λ_1 ≤ λ_2 ≤ . . . ≤ λ_2|V|, note that infinitesimal rigidity is equivalent to λ_1 = λ_2 = λ_3 = 0 and λ_4 > 0. We will now show that, in fact, for any connected graph, the first three eigenvalues are always 0. This places an additional emphasis on the eigenvalue λ_4 as a measure of infinitesimal rigidity; consequently, we term λ_4 the rigidity eigenvalue.

The first result in this direction shows that the symmetric rigidity matrix is similar to a weighted Laplacian matrix.

Proposition 2: The symmetric rigidity matrix is similar to the weighted Laplacian matrix via a permutation of the rows and columns as

    P R P^T = (I_2 ⊗ E(G)) [ W_x  W_xy ; W_xy  W_y ] (I_2 ⊗ E(G)^T),   (6)

where the middle factor is a 2 × 2 block matrix and W_x, W_y, and W_xy are diagonal weighting matrices for each edge in G such that, for the edge e_k = (v_i, v_j),

    [W_x]_kk = (p_i^x − p_j^x)^2,  [W_y]_kk = (p_i^y − p_j^y)^2,  [W_xy]_kk = (p_i^x − p_j^x)(p_i^y − p_j^y),

and p_i^x (p_i^y) represents the x-coordinate (y-coordinate) of the position of agent i.

Proof: The proof is by direct construction using Proposition 1 and (5). Consider the permutation matrix P as

    P = [ I_|V| ⊗ [1 0] ; I_|V| ⊗ [0 1] ].   (7)

Then,

    P R P^T = [ (I_|V| ⊗ p_x^T) Ê Ê^T (I_|V| ⊗ p_x)   (I_|V| ⊗ p_x^T) Ê Ê^T (I_|V| ⊗ p_y) ;
                (I_|V| ⊗ p_y^T) Ê Ê^T (I_|V| ⊗ p_x)   (I_|V| ⊗ p_y^T) Ê Ê^T (I_|V| ⊗ p_y) ],

where Ê = [ E_l(G_1)^T · · · E_l(G_|V|)^T ]. Observe that each block matrix above is of size |V| × |V|. Furthermore, note that

    (I_|V| ⊗ p_x^T) Ê = E(G) diag{ . . . , (p_i^x − p_j^x), . . . },

where the right-hand factor is a diagonal matrix of size |E| × |E| and E(G) is simply the incidence matrix of the underlying graph G (with arbitrary orientation). From this we are able to construct the diagonal weight matrices as stated in the proposition, concluding the proof.

The representation of the symmetric rigidity matrix as a weighted Laplacian allows for a more transparent understanding of certain eigenvalues related to this matrix. The next result shows that the first three eigenvalues of R must equal zero for any connected graph G.

Theorem 2.3: If the underlying graph G in a framework is connected, then the symmetric rigidity matrix has at least three eigenvalues at the origin; that is, λ_1 = λ_2 = λ_3 = 0. Furthermore, the eigenvectors associated with each eigenvalue are, respectively, v_1 = P^T [1^T 0^T]^T, v_2 = P^T [0^T 1^T]^T, and v_3 = P^T [(p^y)^T −(p^x)^T]^T, where P is defined in (7).

Proof: Recall that for any connected graph one has E(G)^T 1 = 0 [12]. Therefore, P R P^T must have two eigenvalues at the origin, with eigenvectors u_1 = [1^T 0^T]^T and u_2 = [0^T 1^T]^T. To show that there is also a third eigenvalue at the origin, we constructively build another eigenvector corresponding to the zero eigenvalue that is linearly independent of u_1 and u_2. Let u_3 ∈ R^{2|V|} be the eigenvector corresponding to λ_3, and construct it as

    [u_3]_i = 0                  for i ∈ {1, |V| + 1},
    [u_3]_i = p_i^y − p_1^y      for 1 < i ≤ |V|,
    [u_3]_i = p_1^x − p_{i−|V|}^x  for i > |V| + 1.

Next, observe that (I_2 ⊗ E(G)^T) u_3 = [b_1^T b_2^T]^T is such that b_1 is ±(p_i^y − p_1^y) only for edges incident to node v_i, and 0 otherwise. Similarly, b_2 is ±(p_1^x − p_i^x) only for edges incident to node v_i. It can now be verified that from this construction one has

    [ W_x  W_xy ; W_xy  W_y ] (I_2 ⊗ E(G)^T) u_3 = 0.

To conclude the proof, observe that v_3 = u_3 + p_1^y u_1 − p_1^x u_2 is a linear combination of the three eigenvectors.

Theorem 2.3 provides a precise characterization of the first three eigenvalues of the symmetric rigidity matrix. The explicit characterization of the corresponding eigenvectors is given for completeness. It should be noted, however, that the description of the eigenvectors might be of utility when exploring distributed strategies for computing the rigidity eigenvalue, analogously to, e.g., the distributed power iteration developed for the connectivity eigenvalue in [33]. Theorem 2.3 can be used to arrive at the main result relating infinitesimal rigidity to the rigidity eigenvalue.

Theorem 2.4: A framework (G, p) is infinitesimally rigid if and only if the rigidity eigenvalue is strictly positive, i.e., λ_4 > 0.

Proof: The proof is a direct consequence of Corollary 2.2 and Theorem 2.3.

Another useful observation relates infinitesimal rigidity of a framework to connectedness of the underlying graph.
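The spectral tests of Theorems 2.3 and 2.4, and their relation to graph connectivity, can be illustrated numerically. In the sketch below (illustrative positions and edge sets, not from the paper), the eigenvalues of R are computed directly in node-interleaved coordinates, which is equivalent to computing those of P R P^T since the two matrices are similar. A rigid framework yields λ_1 = λ_2 = λ_3 = 0 and λ_4 > 0, while removing an edge drives λ_4 to zero even though the second-smallest Laplacian eigenvalue still certifies connectivity:

```python
import numpy as np

def rigidity_matrix(edges, p):
    R = np.zeros((len(edges), 2 * len(p)))
    for row, (u, v) in enumerate(edges):
        d = p[u] - p[v]
        R[row, 2 * u:2 * u + 2] = d
        R[row, 2 * v:2 * v + 2] = -d
    return R

def spectra(edges, p):
    """Sorted eigenvalues of the symmetric rigidity matrix R(p)^T R(p),
    together with the second-smallest Laplacian eigenvalue of G."""
    n = len(p)
    R = rigidity_matrix(edges, p)
    rigidity_eigs = np.sort(np.linalg.eigvalsh(R.T @ R))
    L = np.zeros((n, n))                      # graph Laplacian of G
    for u, v in edges:
        L[u, u] += 1; L[v, v] += 1
        L[u, v] -= 1; L[v, u] -= 1
    laplacian_eigs = np.sort(np.linalg.eigvalsh(L))
    return rigidity_eigs, laplacian_eigs[1]

p = np.array([[0.0, 0.0], [1.0, 0.0], [0.5, 1.0], [1.5, 1.0]])
rigid = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3)]

eigs, lam2 = spectra(rigid, p)                # infinitesimally rigid
print(np.allclose(eigs[:3], 0.0), eigs[3] > 1e-9, lam2 > 1e-9)  # True True True

eigs, lam2 = spectra(rigid[:-1], p)           # flexible, still connected
print(abs(eigs[3]) < 1e-8, lam2 > 1e-9)       # True True
```

The second case is exactly the one-way nature of the connection to connectivity: λ_2 of the Laplacian stays positive while λ_4 vanishes, so connectivity alone does not imply infinitesimal rigidity.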
Corollary 2.5: Rigidity of the framework (G, p) implies connectedness of the graph G.

Having established strong connections between the infinitesimal rigidity of a framework and spectral properties of the symmetric rigidity matrix, we can proceed to demonstrate how the rigidity eigenvalue can be used in a gradient control law for rigidity maintenance.

III. RIGIDITY MAINTENANCE

In a dynamic and uncertain environment, a team of mobile robots may not be able to maintain the same communication and sensing graph throughout the duration of a mission. Indeed, line-of-sight occlusions and sensing range limitations can cause links between agents to drop out. Furthermore, collisions with obstacles and among robots must be avoided during motion. In order to take all these constraints into account, we use results from [27, 33] and propose the following definition of neighboring agents:

Definition 3.1: Two agents u and v are considered neighbors if and only if (i) their relative distance ℓ_uv is smaller than D ∈ R_+ (the sensing range) and larger than d_min ∈ [0, D) (the safety range), (ii) their line of sight is not occluded by an obstacle, and (iii) neither u nor v is closer than d_min to any other agent or obstacle.

Conditions (i) and (ii) are meant to take into account the typical sensing constraints in multi-robot applications (minimum/maximum communication and sensing range, occlusions), while the purpose of condition (iii) is to force disconnection from the group if an agent is colliding with any other agent or obstacle in the environment. By virtue of this definition, we are then able to embed into a unique rigidity maintenance action all the requirements stated above.¹

A way to encode the neighboring relationship of Definition 3.1 into the interaction graph G is to suitably shape the weights A_uv ≥ 0 of the edges joining neighboring agents. In particular, following [27], we define the weights A_uv as the product of four terms

    A_uv = a_uv b_uv c_u c_v,   (8)

with the following properties: the weights a_uv(ℓ_uv) ≥ 0 stay constant if ℓ_uv < d_0, where 0 < d_0 < D is a desired inter-robot distance, and smoothly vanish as ℓ_uv → D (Figure 2(a) shows an example for d_0 = 8 and D = 16). As for the weights b_uv(ℓ_uvo) ≥ 0, let ℓ_uvo be the distance between the segment joining agents u and v and the closest obstacle point, and let 0 ≤ d_omin < d_omax ≤ D be a desired minimum and maximum link/obstacle distance. Then, b_uv(ℓ_uvo) is designed to stay constant if ℓ_uvo ≥ d_omax and to smoothly vanish as ℓ_uvo → d_omin (Figure 2(b) shows an example for d_omin = 0.3 and d_omax = 3). Finally, the weights c_u(ℓ_uv) ≥ 0 (respectively, c_v(ℓ_vu) ≥ 0) are designed to stay constant if ℓ_uv ≥ d_0 for any other agent v, and to smoothly vanish as ℓ_uv → d_min for any other agent v (see, again, Figure 2(b)). With these settings, the total weight A_uv in (8) of the edge between agents u and v will vanish whenever any of the conditions of Definition 3.1 is not met.

¹ In fact, whenever an agent gets too close to an obstacle or another agent, all its edges will disappear, thus leading to a disconnected graph. Since infinitesimal rigidity implies rigidity which, in turn, implies connectivity, this will also cause loss of infinitesimal rigidity.

Fig. 2. Left: shape of the weights a_uv with d_0 = 8 and D = 16. Right: shape of the weights b_uv with d_omin = 0.3 and d_omax = 3. The shape of the weights c_u and c_v is equivalent to that of b_uv.

We wish to stress that the weights A_uv are functions only of the relative distances among agents and of the relative distances between the segment joining two agents and the closest obstacles. Therefore, this is consistent with the assumption of relying solely on relative measurements for the purpose of formation control; see the discussion in the Introduction.

A. Gradient of the Rigidity Eigenvalue

In order to use the rigidity eigenvalue as a feedback mechanism for rigidity maintenance, it is first required that an expression for its gradient with respect to the positions of each agent be determined. In this subsection, we compute this gradient and demonstrate that it can in fact be described entirely by the relative positions of the agents in the formation.

In this direction, first recall that the rigidity eigenvalue can be expressed as

    λ_4 = v_4^T P R P^T v_4,

where v_4 is the normalized eigenvector associated with λ_4, which we call the rigidity eigenvector. For notational convenience, we denote v_4 = [(v^x)^T (v^y)^T]^T, and we index each element of this vector with a subscript. Expanding this expression, we obtain

    λ_4 = Σ_{i∼j} (p_i^x − p_j^x)^2 (v_i^x − v_j^x)^2
        + Σ_{i∼j} (p_i^y − p_j^y)^2 (v_i^y − v_j^y)^2
        + 2 Σ_{i∼j} (p_i^x − p_j^x)(p_i^y − p_j^y)(v_i^x − v_j^x)(v_i^y − v_j^y).

The above expression lends itself to a straightforward calculation of the gradient with respect to each agent's position.
    ∂λ_4/∂p_i^x = 2 Σ_{i∼j} [ (p_i^x − p_j^x)(v_i^x − v_j^x)^2 + (p_i^y − p_j^y)(v_i^x − v_j^x)(v_i^y − v_j^y) ]   (9)

    ∂λ_4/∂p_i^y = 2 Σ_{i∼j} [ (p_i^y − p_j^y)(v_i^y − v_j^y)^2 + (p_i^x − p_j^x)(v_i^x − v_j^x)(v_i^y − v_j^y) ]   (10)

The key feature of (9) and (10) is that the gradients are expressed entirely in terms of the relative positions of neighboring pairs. It is important to emphasize here that, despite this distributed structure, each agent must have access to some global information, namely, the relative components of the rigidity eigenvector.

B. Gradient of the Rigidity Eigenvalue with Weighted Links

The previous derivation of the gradient of the rigidity eigenvalue can be easily extended to the case of weighted links, that is, the case under consideration in this paper. To this end, we redefine the elements of the matrices W_x, W_y, W_xy in (6) as

    [W_x]_kk = (p_i^x − p_j^x)^2 A_ij,  [W_y]_kk = (p_i^y − p_j^y)^2 A_ij,  [W_xy]_kk = (p_i^x − p_j^x)(p_i^y − p_j^y) A_ij,

where A_ij are the weights defined in (8). With this formulation, the weighted rigidity eigenvalue can now be expressed as

    λ_4 = Σ_{i∼j} (p_i^x − p_j^x)^2 A_ij (v_i^x − v_j^x)^2
        + Σ_{i∼j} (p_i^y − p_j^y)^2 A_ij (v_i^y − v_j^y)^2
        + 2 Σ_{i∼j} (p_i^x − p_j^x)(p_i^y − p_j^y) A_ij (v_i^x − v_j^x)(v_i^y − v_j^y).

It can be verified that the computation of the gradient of λ_4 w.r.t. the pair (p_i^x, p_i^y) is a straightforward extension of the previous case. In fact,

    ∂λ_4/∂p_i^x = Σ_{i∼j} [ 2 (p_i^x − p_j^x) A_ij (v_i^x − v_j^x)^2
                + (p_i^x − p_j^x)^2 (∂A_ij/∂p_i^x)(v_i^x − v_j^x)^2
                + 2 (p_i^y − p_j^y) A_ij (v_i^x − v_j^x)(v_i^y − v_j^y)
                + 2 (p_i^x − p_j^x)(p_i^y − p_j^y)(∂A_ij/∂p_i^x)(v_i^x − v_j^x)(v_i^y − v_j^y) ],   (11)

and analogously for ∂λ_4/∂p_i^y.

C. The Rigidity Potential

As a final step, we define a scalar potential function V_λ(λ_4) > 0 with the properties of growing unbounded as λ_4 → λ_4^min > 0 and vanishing (with vanishing derivative) as λ_4 → ∞. A possible shape for V_λ is illustrated in Figure 3 with λ_4^min = 5. In order to maintain infinitesimal rigidity of the formation (while adopting the neighboring definition in Definition 3.1), it is then sufficient that every agent follows the anti-gradient of V_λ, that is, applies the control

    ξ_i = − ∂V_λ/∂p_i = − (∂V_λ/∂λ_4)(∂λ_4/∂p_i),   (12)

where ξ_i is the agent velocity input defined in (3), and p_i = [p_i^x p_i^y]^T is the position vector of the i-th agent. We note, again, that the agent control action (12) is almost decentralized, since it is a function of the relative positions among neighbors and closest obstacles. The only pieces of global information required in (12) are the current value of λ_4 (needed for evaluating ∂V_λ/∂λ_4), and the current entries of the eigenvector v_4 (needed for evaluating ∂λ_4/∂p_i). We are currently working towards a decentralized algorithm for estimating λ_4 and v_4 online, similar to what has been done for the connectivity eigenvalue and eigenvector obtained from the graph Laplacian matrix; see, e.g., [33].

Fig. 3. A possible shape for the rigidity potential function V_λ with λ_4^min = 5.

As a final remark, note that the control action (12), while ensuring rigidity maintenance for the graph G and, implicitly, inter-robot and obstacle collision avoidance, will not prevent the creation or disconnection of individual links (as per Definition 3.1). Therefore, the graph topology will be allowed to change over time whenever needed, for example because of sensing limitations or autonomous split/join decisions. This constitutes an additional feature of our approach, as the robots will not be overly constrained, e.g., by requiring maintenance of a given fixed graph topology.

IV. SIMULATIONS

In this last Section we report the simulation results of a case study involving 7 quadrotor UAVs employed as mobile robotic platforms. Assuming the availability of standard trajectory tracking controllers [6, 16], we model the closed-loop quadrotor behavior as a single integrator in R^3; see also [10, 11, 28] for similar working assumptions. As explained in Section II-A, here, for the sake of illustration, we consider tridimensional agents (and the related tridimensional version of the rigidity maintenance controller (12)), although the previous developments have been worked out only for the planar case. In fact, extension of the proposed machinery to R^3 is a straightforward exercise.
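The core of such a control loop can be sketched in a few lines for the unweighted planar case. This is an illustrative sketch, not the authors' implementation: it performs plain gradient ascent on λ_4 via (9) and (10) instead of descending the potential V_λ, and it recomputes the rigidity eigenvector v_4 with a central eigendecomposition at every step, whereas the paper targets a decentralized estimation of λ_4 and v_4. Positions, edges, and the step size are made up.

```python
import numpy as np

def rigidity_matrix(edges, p):
    R = np.zeros((len(edges), 2 * len(p)))
    for row, (u, v) in enumerate(edges):
        d = p[u] - p[v]
        R[row, 2 * u:2 * u + 2] = d
        R[row, 2 * v:2 * v + 2] = -d
    return R

def lambda4_and_gradient(edges, p):
    """Rigidity eigenvalue and its gradient from (9)-(10); both depend
    only on relative positions and relative eigenvector components."""
    S = rigidity_matrix(edges, p)
    eigval, eigvec = np.linalg.eigh(S.T @ S)  # symmetric rigidity matrix
    lam4, v4 = eigval[3], eigvec[:, 3]        # rigidity eigenvalue/eigenvector
    v = v4.reshape(-1, 2)                     # per-node (v^x, v^y)
    grad = np.zeros_like(p)
    for u, w in edges:
        dp, dv = p[u] - p[w], v[u] - v[w]
        # Per-edge terms of (9)-(10); the contribution to w is the negative.
        g = 2.0 * (dp * dv**2 + dp[::-1] * dv[0] * dv[1])
        grad[u] += g
        grad[w] -= g
    return lam4, grad

p = np.array([[0.0, 0.0], [1.0, 0.0], [0.5, 1.0], [1.5, 1.0]])
edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3)]

lam_before, _ = lambda4_and_gradient(edges, p)
for _ in range(50):                           # single-integrator agents, p_dot = xi
    _, g = lambda4_and_gradient(edges, p)
    p += 0.01 * g                             # ascend lambda_4
lam_after, _ = lambda4_and_gradient(edges, p)
print(lam_after > lam_before)
```

In the controller (12), each agent i would instead apply ξ_i = −(∂V_λ/∂λ_4)(∂λ_4/∂p_i), so that the formation is only actively pushed when λ_4 approaches the threshold λ_4^min.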
Fig. 4. Snapshots of a simulation with 7 quadrotor UAVs. The two UAVs partially controlled by two human operators are marked with two semi-transparent spheres.

In order to better illustrate the overall behavior of the system (maintenance of infinitesimal rigidity despite the time-varying topology due to the neighboring conditions of Definition 3.1), we added an exogenous bounded term u*_i ∈ R^3 to the control action (12) for two of the 7 quadrotors, namely quadrotors 1 and 2. This way, two human operators could independently add two velocity commands to quadrotors 1 and 2 by acting on two joysticks during the simulation. Since the commands u*_i are bounded, their effect does not threaten rigidity maintenance: intuitively, the feedback term (12) is eventually always dominant, as the slope of V_λ grows unbounded when approaching loss of infinitesimal rigidity. Figure 4 shows eight snapshots taken during the simulation; a video of this simulation run can be found at http://youtu.be/Ni6rIrcA5Hw. The quadrotors controlled by the two human operators are marked with two semi-transparent spheres, while the links joining any two agents change brightness from light (yellow) to dark (red) as the weights A_uv drop from their maximum value to zero. The quadrotors were flown in the cluttered environment shown in Figure 4 with the aim of triggering as many split and join events as possible because of the constraints of Definition 3.1, namely, maximum communication/sensing range, line-of-sight occlusion, and obstacle and inter-robot minimum distance.

Figure 5(a) reports the behavior of the rigidity eigenvalue² during the simulation: one can appreciate that, being positive at all times, infinitesimal rigidity is never compromised despite the complex maneuvering among the obstacles. Furthermore, we show in Figure 5(b) the behavior of the connectivity eigenvalue, the eigenvalue associated with the connectivity of the graph G. This is meant to show that, as expected from Corollary 2.5, connectivity of the graph is also automatically preserved when ensuring graph rigidity. Finally, Figure 6 reports the total number of edges |E| of the graph G over time in order to show that the graph topology is actually varying because of the frequent split and join events.

² Note that for d = 3, the rigidity eigenvalue is λ_7 of the symmetric rigidity matrix.

Fig. 5. A plot of the rigidity eigenvalue and connectivity eigenvalue during the simulation. Note how both keep positive at all times, confirming that infinitesimal rigidity and, as a consequence, connectivity of the graph were always maintained.

Fig. 6. The total number of edges |E| over time. Note how the graph topology changes over time as edges are created and disconnected.

V. CONCLUSION

This work presented a gradient controller for the maintenance of infinitesimal rigidity in a multi-robot team. A key component of this work was the identification of an important parameter for ensuring infinitesimal rigidity that we termed the rigidity eigenvalue. We provided necessary and sufficient conditions that relate the positivity of the rigidity eigenvalue
to the infinitesimal rigidity of the formation. This, in turn, was used to develop a gradient controller for rigidity maintenance, which was shown to have a distributed structure depending only on the relative positions of each robot and the relative values of the eigenvector associated with the rigidity eigenvalue. We also augmented this controller with obstacle and collision avoidance features. The results were demonstrated with a simulation example. An important extension of this work that we are considering is the distributed determination of the rigidity eigenvalue and eigenvector.

ACKNOWLEDGMENTS

This research was partly supported by the WCU (World Class University) program funded by the Ministry of Education, Science and Technology through the National Research Foundation of Korea (R31-10008).

REFERENCES

[1] I. F. Akyildiz, Y. Sankarasubramaniam, and E. Cayirci. A survey on sensor networks. IEEE Communications Magazine, 40(8):102–114, 2002.
[2] B. D. O. Anderson, B. Fidan, C. Yu, and D. van der Walle. UAV formation control: Theory and application. In V. D. Blondel, S. P. Boyd, and H. Kimura, editors, Recent Advances in Learning and Control, volume 371 of Lecture Notes in Control and Information Sciences, pages 15–34. Springer, 2008.
[3] B. D. O. Anderson, C. Yu, B. Fidan, and J. M. Hendrickx. Rigid graph control architectures for autonomous formations. IEEE Control Systems Magazine, 28(6):48–63, 2008.
[4] J. Aspnes, T. Eren, D. K. Goldenberg, A. S. Morse, W. Whiteley, Y. R. Yang, B. D. O. Anderson, and P. N. Belhumeur. A theory of network localization. IEEE Trans. on Mobile Computing, 5(12):1663–1678, 2006.
[5] J. Baillieul and L. McCoy. The combinatorial graph theory of structured formations. In 46th IEEE Conference on Decision and Control, pages 3609–3615, December 2007.
[6] S. Bouabdallah and R. Siegwart. Backstepping and sliding-mode techniques applied to an indoor micro. In 2005 IEEE Int. Conf. on Robotics and Automation, pages 2247–2252, May 2005.
[7] J. Bristow, D. Folta, and K. Hartman. A formation flying technology vision. In AIAA Space 2000 Conference and Exposition, volume 21, Long Beach, CA, April 2000.
[8] G. C. Calafiore, L. Carlone, and M. Wei. A distributed gradient method for localization of formations using relative range measurements. In 2010 IEEE Int. Symp. on Computer-Aided Control System Design, pages 1146–1151, Yokohama, Japan, Sep. 2010.
[9] R. Connelly and W. J. Whiteley. Global rigidity: The effect of coning. Discrete Computational Geometry, 43(4):717–735, 2009.
[10] J. Fink, N. Michael, S. Kim, and V. Kumar. Planning and control for cooperative manipulation and transportation with aerial robots. In 14th Int. Symp. on Robotics Research, Lucerne, Switzerland, Sep. 2009.
[11] A. Franchi, C. Secchi, M. Ryll, H. H. Bülthoff, and P. Robuffo
[14] R. Horn and C. Johnson. Matrix Analysis. Cambridge University Press, New York, 1985.
[15] R. A. Horn and C. R. Johnson. Topics in Matrix Analysis. Cambridge University Press, New York, NY, 1991.
[16] M.-D. Hua, T. Hamel, P. Morin, and C. Samson. A control approach for thrust-propelled underactuated vehicles and its application to VTOL drones. IEEE Trans. on Automatic Control, 54(8):1837–1853, 2009.
[17] B. Jackson. Notes on the rigidity of graphs. In Levico Conference Notes, 2007.
[18] D. Jacobs. An algorithm for two-dimensional rigidity percolation: The pebble game. Journal of Computational Physics, 137(2):346–365, November 1997.
[19] Y. Kim, G. Zhu, and J. Hu. Optimizing formation rigidity under connectivity constraints. In 49th IEEE Conf. on Decision and Control, pages 6590–6595, Atlanta, GA, Dec. 2010.
[20] L. Krick, M. E. Broucke, and B. A. Francis. Stabilisation of infinitesimally rigid formations of multi-robot networks. International Journal of Control, 82(3):423–439, 2009.
[21] G. Laman. On graphs and rigidity of plane skeletal structures. Journal of Engineering Mathematics, 4(4):331–340, 1970.
[22] Q. Lindsey, D. Mellinger, and V. Kumar. Construction of cubic structures with quadrotor teams. In 2011 Robotics: Science and Systems, Los Angeles, CA, Jun. 2011.
[23] M. Mesbahi and M. Egerstedt. Graph Theoretic Methods in Multiagent Networks. Princeton Series in Applied Mathematics. Princeton University Press, 1st edition, 2010. ISBN 9780691140612.
[24] N. Michael, J. Fink, and V. Kumar. Cooperative manipulation and transportation with aerial robots. In 2009 Robotics: Science and Systems, Seattle, WA, Jun. 2009.
[25] R. M. Murray. Recent research in cooperative control of multi-vehicle systems. ASME Journal on Dynamic Systems, Measurement, and Control, 129(5):571–583, 2006.
[26] R. Olfati-Saber and R. M. Murray. The combinatorial graph theory of structured formations. In 41st IEEE Conf. on Decision and Control, pages 3609–3615, Las Vegas, NV, Dec. 2002.
[27] P. Robuffo Giordano, A. Franchi, C. Secchi, and H. H. Bülthoff. Passivity-based decentralized connectivity maintenance in the bilateral teleoperation of multiple UAVs. In 2011 Robotics: Science and Systems, Los Angeles, CA, Jun. 2011.
[28] M. Schwager, B. Julian, M. Angermann, and D. Rus. Eyes in the sky: Decentralized control for the deployment of robotic camera networks. Proceedings of the IEEE, 99(9):1541–1561, 2011.
[29] I. Shames, B. Fidan, and B. D. O. Anderson. Minimization of the effect of noisy measurements on localization of multi-agent autonomous formations. Automatica, 45(4):1058–1065, 2009.
[30] B. Smith, M. Egerstedt, and A. Howard. Automatic generation of persistent formations for multi-agent networks under range constraints. In 1st Int. Conf. on Robot Communication and Coordination, pages 1–8, 2007.
[31] T. Tay and W. Whiteley. Generating isostatic frameworks. Structural Topology, 11(1):21–69, 1985.
[32] C. Wu, Y. Zhang, W. Sheng, and S. Kanchi. Rigidity guided localisation for mobile robotic sensor networks. International Journal of Ad Hoc and Ubiquitous Computing, 6(2):114, 2010.
[33] P. Yang, R. A. Freeman, G. J. Gordon, K. M. Lynch, S. S. Srinivasa, and R. Sukthankar. Decentralized estimation and control
Giordano. Bilateral shared control of multiple quadrotors: of graph connectivity for mobile sensor networks. Automatica,
Balancing autonomy and human assistance with a group of 46(2):390–396, 2010.
UAVs. IEEE Robotics & Automation Magazine, 19(3), 2012. [34] G. Zhu and J. Hu. Stiffness matrix and quantitative measure of
[12] C. D. Godsil and G. Royle. Algebraic Graph Theory. Springer, formation rigidity. In 48th IEEE Conf. on Decision and Control,
2001. ISBN 978-0-387-95241-3. pages 3057–3062, Shanghai, China, Dec. 2009.
[13] J. Graver, B. Servatius, and H. Servatius. Combinatorial Rigid-
ity. Graduate Studies in Mathematics. American Mathematical
Society, Providence, RI, 1993. ISBN 0821838016.
