Manipulation
Russ Tedrake
Note: These are working notes used for a course being taught at MIT. They will be
updated throughout the Fall 2020 semester.
The PDF version of these notes is autogenerated from the HTML version. There
are a few conversion/formatting artifacts that are easy to fix (please feel free to point
them out). But there are also interactive elements in the HTML version that are not easy
to put into the PDF. When possible, I try to provide a link. But I consider the online
HTML version to be the main version.
TABLE OF CONTENTS
• Preface
• Chapter 1: Introduction
◦ Manipulation is more than pick-and-place
◦ Open-world manipulation
◦ Simulation
◦ These notes are interactive
◦ Model-based design and analysis
◦ Organization of these notes
• Chapter 2: Let's get you a robot
◦ Arms
▪ Position-controlled robots
▪ Torque-controlled robots
▪ A proliferation of hardware
▪ Simulating the Kuka iiwa
◦ Hands
▪ Dexterous hands
▪ Simple grippers
▪ Soft/underactuated hands
▪ Other end effectors
▪ If you haven't seen it...
◦ Sensors
◦ Putting it all together
◦ Exercises
• Chapter 3: Basic Pick and Place
◦ Monogram Notation
◦ Pick and place via spatial transforms
◦ Spatial Algebra
◦ Forward kinematics
▪ The kinematic tree
▪ Forward kinematics for pick and place
◦ Differential kinematics (Jacobians)
◦ Differential inverse kinematics
▪ The Jacobian pseudo-inverse
▪ Invertibility of the Jacobian
◦ Defining the grasp and pre-grasp poses
◦ A pick and place trajectory
◦ Putting it all together
◦ Differential inverse kinematics with constraints
▪ Pseudo-inverse as an optimization
▪ Adding velocity constraints
▪ Adding position and acceleration constraints
▪ Joint centering
▪ Alternative formulations
◦ Exercises
• Chapter 4: Geometric Pose Estimation
◦ Cameras and depth sensors
▪ Depth sensors
▪ Simulation
◦ Representations for geometry
◦ Point cloud registration with known correspondences
◦ Iterative Closest Point (ICP)
◦ Dealing with partial views and outliers
▪ Detecting outliers
▪ Point cloud segmentation
▪ Generalizing correspondence
▪ Soft correspondences
▪ Nonlinear optimization
▪ Global optimization
◦ Non-penetration and "free-space" constraints
▪ Free space constraints as non-penetration constraints
◦ Putting it all together
◦ Looking ahead
◦ Exercises
• Chapter 5: Bin Picking
◦ Generating random cluttered scenes
▪ Falling things
▪ Static equilibrium with frictional contact
▪ A few of the nuances of simulating contact
▪ Static equilibrium as an optimization constraint
◦ Grasp selection
▪ Point cloud pre-processing
▪ Estimating normals and local curvature
▪ Evaluating a candidate grasp
▪ Generating grasp candidates
◦ The corner cases
◦ Putting it all together
▪ A simple state machine
◦ Exercises
• Chapter 6: Object detection and segmentation
◦ Getting to big data
▪ Crowd-sourced annotation datasets
▪ Segmenting new classes via fine tuning
▪ Annotation tools for manipulation
▪ Synthetic datasets
◦ Object detection and segmentation
◦ Putting it all together
◦ Exercises
• Chapter 7: Force Control
◦ A simple model
▪ (Direct) force control
▪ A force-based "flip-up" strategy
▪ Indirect force control
▪ A stiffness-control-based "flip-up" strategy
▪ Hybrid position/force control
◦ Peg in hole
◦ Manipulator control
▪ Joint stiffness control
▪ Cartesian stiffness control
▪ Some implementation details on the iiwa
◦ Putting it all together
◦ Exercises
• Chapter 8: Motion Planning
◦ Inverse Kinematics
▪ Closed-form solutions
▪ IK as constrained optimization
▪ Grasp planning as IK
◦ Kinematic trajectory optimization
◦ Sample-based motion planning
◦ Exercises
• Chapter 9: Reinforcement Learning
◦ Exercises
APPENDIX
• Appendix A: Drake
◦ Pydrake
◦ Online Jupyter notebooks
◦ Running on your own machine
▪ Install Drake
◦ Getting help
• Appendix B: Setting up your own "Manipulation Station"
◦ Message Passing
◦ Kuka LBR iiwa + Schunk WSG Gripper
◦ Intel Realsense D415 Depth Cameras
◦ Miscellaneous hardware.
• Appendix C: Miscellaneous
◦ How to cite these notes
◦ Annotation tool etiquette
◦ Please give me feedback!
PREFACE
I've always loved robots, but it's only relatively recently that I've turned my attention
to robotic manipulation. I particularly like the challenge of building robots that can
master physics to achieve human/animal-like dexterity and agility. It was passive
dynamic walkers and the beautiful analysis that accompanies them that first helped
cement this centrality of dynamics in my view of the world and my approach to
robotics. From there I became fascinated with (experimental) fluid dynamics, and
the idea that birds with articulated wings actually "manipulate" the air to achieve
incredible efficiency and agility. Humanoid robots and fast-flying aerial vehicles in
clutter forced me to start thinking more deeply about the role of perception in
dynamics and control. Now I believe that this interplay between perception and
dynamics is truly fundamental, and I am passionate about the observation that
relatively "simple" problems in manipulation (how do I button up my dress shirt?)
expose the problem beautifully.
In the past few years, deep learning has had an unquestionable impact on robotic
perception, unblocking some of the most daunting challenges in performing
manipulation outside of a laboratory or factory environment. We will discuss relevant
tools from deep learning for object recognition, segmentation, pose/keypoint
estimation, shape completion, etc. Now relatively old approaches to learning control
are also enjoying an incredible surge in popularity, fueled in part by massive
computing power and increasingly available robot hardware and simulators. Unlike
learning for perception, learning control algorithms are still far from a technology,
with some of the most impressive looking results still being hard to understand and
to reproduce. But the recent work in this area has unquestionably highlighted the
pitfalls of the conservatism taken by the controls community. Learning researchers
are boldly formulating much more aggressive and exciting problems for robotic
manipulation than we have seen before -- in many cases we are realizing that
some manipulation tasks are actually quite easy, but in other cases we are finding
problems that are still fundamentally hard.
Finally, it feels that the time is ripe for robotic manipulation to have a real and
dramatic impact in the world, in fields from logistics to home robots. Over the
last few years, we've seen UAVs/drones transition from academic curiosities into
consumer products. Even more recently, autonomous driving has transitioned from
academic research to industry, at least in terms of dollars invested. Manipulation
feels like the next big thing that will make the leap from robotic research to practice.
It's still a bit risky for a venture capitalist to invest in, but nobody doubts the size of
the market once we have the technology. How lucky are we to potentially be able to
play a role in that transition?
A MANIPULATION TOOLBOX
If you would like to replicate any or all of the hardware that we use for these notes,
you can find information and instructions in the appendix.
As you use the code, please consider contributing back (especially to the mature
code in DRAKE). Even questions/bug reports can be important contributions. If you
have questions/find issues with these notes, please submit them here.
CHAPTER 1
Introduction
It's worth taking time to appreciate just how amazingly well we are able to perform
tasks with our hands. Tasks that often feel mundane to us -- loading the dishwasher,
chopping vegetables, folding laundry -- remain as incredibly challenging tasks for
robots and are at the very forefront of robotics research.
Consider the problem of picking up a single plate from a stack of plates in the sink
and placing it into the dishwasher. Clearly you first have to perceive that there is
a plate in the sink and that it is accessible. Getting your hand to the plate likely
requires navigating your hand around the geometry of the sink and other dishes.
The act of actually picking it up might require a fairly subtle maneuver where you
have to tip up the plate, sliding it along your fingers and then along the sink/dishes
in order to get a reasonable grasp on it. Presumably as you lift it out of the sink,
you'd like to mostly avoid collisions between the plate and the sink, which suggests
a reasonable understanding of the size/extent of the plate (though I actually think
robots today are too afraid of collisions). Even placing the plate into the dishwasher
is pretty subtle. You might think that you would align the plate with the slats and
then slide it in, but I think humans are more clever than that. A seemingly better
strategy is to loosen your grip on the plate, come in at an angle and intentionally
contact one side of the slat, letting the plate effectively rotate itself into position as
you set it down. But the justification for this strategy is subtle -- it is a way to achieve
the kinematically accurate task without requiring much kinematic accuracy on the
position/orientation of the plate.
Figure 1.2 - A robot picking up a plate from a potentially cluttered sink (left: in
simulation, right: in reality). This example is from the manipulation team at the
Toyota Research Institute.
Perhaps one of the reasons that these problems remain so hard is that they require
strong capabilities in numerous technical areas that have traditionally been
somewhat disparate; it's challenging to be an expert in all of them. More so than
robotic mapping and navigation, or legged locomotion, or other great areas in
robotics, the most interesting problems in manipulation require significant
interactions between perception, planning, and control. This includes both
geometric perception to understand the local geometry of the objects and
environment and semantic perception to understand what opportunities for
manipulation are available in the scene. Planning typically includes reasoning about
the kinematic and dynamic constraints of the task (how do I command my rigid
seven degree-of-freedom arm to reach into the drawer?). But it also includes higher-
level task planning (to get milk into my cup, I need to open the fridge, then grab
the bottle, then unscrew the lid, then pour... and perhaps even put it all back
when I'm done). The low-level begs for representations using real numbers, but
the higher levels feel more logical and discrete. At perhaps the lowest level, our
hands are making and breaking contact with the world either directly or through
tools, exerting forces, rapidly and easily transitioning between sticking and sliding
frictional regimes -- these alone are incredibly rich and difficult problems from the
perspective of dynamics and control.
There are a large number of applications for manipulation. Picking up an object from
one bin and placing it into another bin -- one version of the famous "pick and place"
problem -- is a great application for robotic manipulation. Robots have done this for
decades in factories with carefully curated parts. In the last few years, we've seen
a new generation of pick-and-place systems that use deep learning for perception,
and can work well with much more diverse sets of objects, especially if the location/
orientation of the placement need not be very accurate. This can be done with
conventional robot hands or more special-purpose end-effectors that, for instance,
use suction. It can often be done without having a very accurate understanding of
the shape, pose, mass, or friction of the object(s) to be picked.
The goal for these notes, however, is to examine the much broader view of
manipulation than what is captured by the pick and place problem. Even our thought
experiment of loading the dishwasher -- arguably a more advanced type of pick and
place -- requires much more from the perception, planning, and control systems.
But the diversity of tasks that humans (and hopefully soon robots) can do with their
hands is truly remarkable. To see one small catalog of examples that I like, take a
look at the Southampton Hand Assessment Procedure (SHAP), which was designed
as a way to empirically evaluate prosthetic hands. Matt Mason also gave a broad and
thoughtful definition of manipulation in the opening of his 2018 review paper[1].
It's also important to recognize that manipulation research today looks very different
than it did in the 1980s and 1990s. During that time
there was a strong and beautiful focus on "manipulation as grasping," with seminal
work on, for instance, the kinematics/dynamics of multi-fingered hands assuming a
stable grasp on an object. Sometimes still, I hear people use the term "grasping"
as almost synonymous with manipulation. But please realize that the goals of
manipulation research today, and of these notes, are much broader than that. Is
grasping a sufficient description of what your hand is doing when you are buttoning
your shirt? Making a salad? Spreading peanut butter on toast?
The idea that the world has infinite variability (there will never be a point at which
you have seen every possible kitchen) is often referred to as the "open-world" or
"open-domain" problem -- a term popularized first in the context of video games.
It can be tough to strike a balance between rigorous thinking about aspects of the
manipulation problem, and trying to embrace the diversity and complexity of the
entire world. But it's important to walk that line.
There is a chance that the diversity of manipulation in the open world might actually
make the problem easier. We are just at the dawn of the era of big data in robotics;
the impact this will have cannot be overstated. But I think it is deeper than that.
As an example, some of our optimization formulations for planning and control
might get stuck in local minima now because narrow problem formulations can have
many quirky solutions; but once we ask a controller to work in a huge diversity of
scenarios then the quirky solutions can be discarded and the optimization landscape
may become much simpler. But to the extent that is true, then we should aim to
understand it rigorously!
1.3 SIMULATION
There is another reason that we are now entering a golden age for manipulation
research. Our software tools have (very recently) gotten much better!
It was only about three years ago that I remember talking to my students, who
were all quite adept at using simulation for developing control systems for walking
robots, about using simulation for manipulation. "You can't work on manipulation in
simulation" was their refrain, and for good reason. The complexity of the contact
mechanics in manipulation has traditionally been much harder to simulate than a
walking robot that only interacts with the ground and through a minimal set of
contact points. Looming even larger, though, was the centrality of perception for
manipulation; it was generally accepted that one could not simulate a camera well
enough to be meaningful.
How quickly things can change! The last few years have seen a rapid adoption of
video-game quality rendering by the robotics and computer vision communities.
The growing consensus now is that game-engine renderers can model cameras
well enough not only to test a perception system in simulation, but even to train
perception systems in simulation and expect them to work in the real world! This is
fairly amazing, as we were all very concerned before that training a deep learning
perception system in simulation would allow it to exploit any quirks of the simulated
images that could make the problem easier.
We have also seen dramatic improvements in the quality and performance of contact
simulation. Making robust and performant simulations of multi-body contact
involves dealing with complex geometry queries and stiff (measure-) differential
equations. There is still room for fundamental improvements in the mathematical
formulations of the governing equations, but today's solvers are good enough to be
extremely useful.
By leveraging the power of simulation, and the new availability of free online
interactive compute, I am trying to make these notes more than what you would find
in a traditional text. Each chapter will have working code examples, which you can
run immediately (no installation required) on Google's Colaboratory, or download/
install and run on your local machine (see the appendix for more details). It uses an
open-source library called DRAKE which has been a labor of love for me over the last
7 years or so. Because all of the code is open source, it is entirely up to you how
deep into the rabbit hole you choose to go.
Many of you will be familiar with ROS (the Robot Operating System). I believe that
ROS was one of the best things to happen to robotics in the last decade. It meant that
experts from different subdisciplines could easily share their expertise in the form of
modular components. Components (as ROS packages) simply agree on the messages
that they will send and receive on the network; packages can inter-operate even if
they are written in different programming languages or even on different operating
systems.
Although ROS makes it relatively easy to get started with manipulation, it doesn't
serve my pedagogical goal of thinking clearly about manipulation. The modular
approach to authoring the components is extremely good, and we will adopt it here.
But in DRAKE we ask for a little bit more from each of the components -- essentially
that they declare their states, parameters, and timing semantics in a consistent way
-- so that we have a much better chance of understanding the complex relationships
between systems. This has great practical value as well; the ability to debug a full
manipulation stack with repeatable deterministic simulations (even if they include
randomness) is surprisingly rare in the field but hugely valuable.
The key building block in our work will be DRAKE Systems, and systems can be
combined in complex combinations into Diagrams. System diagrams have long
been the modeling paradigm used in controls, and the software aspect of it will be
very familiar to you if you've used tools like Simulink, LabView, or Modelica. These
software tools refer to the block-diagram design paradigm as "model-based design".
This graphic is interactive. Make sure you zoom in and click around to get a sense
for the amount of complexity that we can abstract away in this framework. For
instance, try expanding the iiwa_controller block.
Whenever you are ready to learn more about the Systems Framework in DRAKE, I
recommend starting with the "Modeling Dynamical Systems" tutorial linked from
the main Drake website.
Let me be transparent: not everybody likes this systems framework. Some people
are just trying to move fast, and don't see the benefits of slowing down to declare
their state variables, etc. It will likely feel like a burden to you the first time you
go to write a new system. But it's precisely because we're trying to build complex
systems quickly that I advocate this more rigorous approach. I believe that getting
to the next level of maturity in our open-world manipulation system requires more
maturity from our building blocks.
The remaining chapters of these notes are organized around the component-level
building blocks of manipulation. Many of these components each individually build
on a wealth of literature (e.g. from computer vision, or dynamics and control).
Rather than be overwhelmed, I've chosen to focus on delivering a consistent
coherent presentation of the most relevant ideas from each field as they relate to
manipulation, and pointers to more literature. Even finding a single notation across
all of the fields can be a challenge!
The next few chapters will give you a minimal background on the relevant robot
hardware that we are simulating, on (some of) the details about simulating them,
and on the geometry and kinematics basics that we will use heavily through the
notes.
For the remainder of the notes, rather than organize the chapters into "Perception",
"Planning", and "Control", I've decided to spiral through these topics. In the first
part, we'll do just enough perception, planning, and control to build up a basic
manipulation system that can perform pick-and-place manipulation of known
isolated objects. Then I'll try to motivate each new chapter by describing what our
previous system cannot do, and what we hope it will do by the end of the chapter.
CHAPTER 2
Let's get you a robot
Although we are going to focus on one particular set of hardware for these notes,
one of the great things about modern robotics is that many of the tools we will
develop over the course of these notes are quite general, and can be transferred
from one robot to another easily. I could imagine a future version of these notes
where you really do get to build out your robot in this chapter, and use your
customized robot for the remaining chapters!
2.1 ARMS
There appear to be a lot of robot arms available on the market. So how does
one choose? Cost, reliability, usability, payload, range of motion, ...; there are many
important considerations. And the choices we might make for a research lab might
be very different than the choices we might make as a startup.
Swapping robots in and out is made possible by the proliferation of some common file formats for
describing our robots. Unfortunately, the field has not converged on a single
preferred format (yet), and each of them has its quirks. Drake currently loads
Unified Robot Description Format (URDF) and Simulation Description Format
(SDF); the Drake developers have been trying to upstream improvements to SDF
rather than start yet another format.
There is one particular requirement which, if we insist on it, quickly winnows the
field to only a few viable platforms: that requirement is joint-torque
sensing and control. It's not absolutely clear that this feature is required, even for
very advanced applications, but as a researcher who cares a great deal about the
contact interactions between my robots and the world, I prefer to have the capability
and explore whether I need it rather than wonder what life might have been like.
Figure 2.1 - Kuka LBR iiwa robot. This one has a 7kg payload.
Out of the torque-controlled robots on the market, I've chosen the Kuka LBR
iiwa robot to use throughout these notes (I will try to use the lower case "iiwa"
to be consistent with the manufacturer, but it looks wrong to me every time!). To
better understand why, let us start by understanding the difference between most
robots, which are position-controlled, and the handful of robots that have accepted
the additional cost and complexity to provide torque sensing and control.
Figure 2.2 - Two popular position-controlled manipulators. (Left) The UR10 from
Universal Robots. (Right) The ABB YuMi.
Most robot arms today are "position controlled" -- given a desired joint position
(or joint trajectory), the robot executes it with relatively high precision. Basically
all arms can be position controlled -- if the robot offers a torque control interface
(with sufficiently high bandwidth) then we can certainly regulate positions, too. In
practice, calling a robot "position controlled" is a polite way of saying that it does
not offer torque control. Do you know why position control and not torque control is
the norm?
Lightweight arms like the examples above are actuated with electric motors. For a
reasonably high-quality electric motor (with windings designed to minimize torque
ripple, etc), we expect the torque that the motor outputs to be directly proportional
to the current that we apply:
$$\tau_{motor} = k_t i,$$
where $\tau_{motor}$ is the motor torque, $i$ is the applied current, and $k_t$ is the "motor
torque constant". (Similarly, applied voltage has a simple (affine) relationship with
the motor's steady-state velocity). If we can control the current, then why can we not
control the torque?
The short answer is that to achieve reasonable cost and weight, we typically choose
small electric motors with large gear reductions, and gear reductions come with
a number of dynamic effects that are very difficult to model -- including backlash,
vibration, and friction. So the simple relationship between current and torque breaks
down. Conventional wisdom is that for large gear ratios (say ≫ 10), the unmodeled
terms are significant enough that they cannot be ignored, and torque is no longer
simply related to current.
Position Control.
How can we overcome this challenge of not having a good model of the transmission
dynamics? Regulating the current or speed of the motor only requires sensors on
the motor side of the transmission; if we add more sensors on the output side of
the transmission, then we can use high-gain feedback to try to directly regulate the
joint, instead of the motor. Importantly, although the torques due to the transmission
are not known precisely, they are also not arbitrary -- for instance they will never
add energy into the system. Most importantly, we can be confident that there is
a monotonically increasing relationship between the current that we put into the
motor and the torque at the joint, and ultimately the acceleration of the joint.
Note that I chose the term monotonic carefully, meaning "non-decreasing" but not
implying "strictly increasing", because, for instance, when a joint is starting from
rest, static friction will resist small torques without having any acceleration at the
output.
The most common sensor to add to the joint is a position sensor -- typically an
encoder or potentiometer -- these are inexpensive, accurate, and robust. In practice,
we think of these as providing (after some signal filtering/conditioning) accurate
measurements of the joint position and joint velocity via differentiation -- joint
accelerations can also be obtained via differentiating twice but are generally
considered more noisy and less suitable for use in tight feedback loops. This is
sufficient for accurately tracking desired position trajectories of the arm. For each
joint, if we are given a desired trajectory $q_d(t)$, then I can track this using
proportional-integral-derivative (PID) control:
$$\tau = k_p (q_d - q) + k_d (\dot{q}_d - \dot{q}) + k_i \int (q_d - q)\, dt,$$
with $k_p$, $k_d$, and $k_i$ being the position, velocity, and integral gains. PID control has a
rich theory, and a trove of knowledge about how to choose the gains, which I will
not reproduce here. I will note, however, that when we simulate position-controlled
robots we use the equation above, commanding a torque directly, but on hardware
our PID controller typically outputs voltage commands (via pulse-width modulation).
Closing this modeling gap has traditionally not been a priority for robot simulation,
but does mean that we often have one list of gains for the physical robot and a
separate list for the simulation.
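To make the control law above concrete, here is a minimal numerical sketch of PID position control of a single joint, commanding a torque directly as we do in simulation. It is plain Python/NumPy, not Drake-specific, and the inertia, gains, and setpoint are made-up illustrative values.

import numpy as np

# Illustrative single-joint simulation with PID position control.
I, dt = 1.0, 1e-3                      # joint inertia [kg m^2], timestep [s]
kp, kd, ki = 100.0, 20.0, 10.0         # PID gains (made up)
q, qdot, integral = 0.0, 0.0, 0.0      # initial state
q_d, qdot_d = 1.0, 0.0                 # desired (constant) position and velocity

for _ in range(5000):
    error = q_d - q
    integral += error * dt
    tau = kp * error + kd * (qdot_d - qdot) + ki * integral  # the PID law above
    qddot = tau / I                    # f = ma (no gravity in this sketch)
    qdot += qddot * dt
    q += qdot * dt

print(q)  # should have converged close to q_d = 1.0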
Electric motors are most efficient at high speeds (often > 100 or 1,000 rotations
per minute). We probably don't actually want our robots to move that fast even if
they could! So nearly all electric robots have fairly substantial gear reductions, often
on the order of 100:1; the transmission output turns one revolution for every 100
rotations of the motor, and the output torque is 100 times greater than the motor
torque. For a gear ratio, $n$, actuating a joint $q$, we have
$$q_{motor} = n q, \quad \dot{q}_{motor} = n \dot{q}, \quad \ddot{q}_{motor} = n \ddot{q}, \quad \tau_{motor} = \frac{1}{n}\tau.$$
Interestingly, this has a fairly profound impact on the resulting dynamics (given
by $f = ma$), even for a single joint. Writing the relationship between joint torque
and joint acceleration (no motors yet), we can write $ma = \sum f$ in the rotational
coordinates as
$$I_{arm} \ddot{q} = \tau_{gravity} + \tau,$$
where $I_{arm}$ is the moment of inertia. For example, for a simple pendulum of mass $m$
and length $l$, we might have $I_{arm} = ml^2$ and $\tau_{gravity} = -mgl\sin q$.
But the applied joint torque $\tau$ actually comes from the motor -- if we write this
equation in terms of motor coordinates we get:
$$\frac{I_{arm}}{n} \ddot{q}_{motor} = \tau_{gravity} + n \tau_{motor}.$$
If we divide through by $n$, and take into account the fact that the motor itself has
inertia (e.g. from the large spinning magnets) that is not affected by the gear ratio,
then we obtain:
$$\left(I_{motor} + \frac{I_{arm}}{n^2}\right) \ddot{q}_{motor} = \frac{\tau_{gravity}}{n} + \tau_{motor}.$$
It's interesting to note that, even though the mass of the motors might make up only
a small fraction of the total mass of the robot, for highly geared robots they can play
a significant role in the dynamics of the joints. We use the term reflected inertia to
denote the inertial load that is felt on the opposite side of a transmission, due to
the scaling effect of the transmission. The "reflected inertia" of the arm at the motor
is cut by the square of the gear ratio; or the "reflected inertia" of the motor at the
arm is multiplied by the square of the gear ratio. This has interesting consequences
-- as we move to the multi-link case, we will see that the inertia of the arm is a
state-dependent function that captures the inertial coupling of the other joints in the
manipulator. The motor inertia, on the other hand, is constant. For large gear ratios,
this means the dynamics of the joint are less dependent on the nonlinear effects of
joint position and velocity, making it relatively easy to tune constant feedback gains
that perform well in all configurations.
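To see how dominant the motor inertia can become, here is a quick back-of-the-envelope calculation; the numbers are purely illustrative, not specifications of any particular robot.

# Reflected inertia example with made-up numbers.
I_motor = 1e-4   # rotor inertia [kg m^2]
I_arm = 1.0      # link inertia seen at the joint [kg m^2]
n = 100          # gear ratio

# Total inertia seen at the motor side of the transmission:
I_at_motor = I_motor + I_arm / n**2     # = 2.0e-4 kg m^2
# Equivalently, the total inertia seen at the joint side:
I_at_joint = n**2 * I_motor + I_arm     # = 2.0 kg m^2

# Even though the rotor inertia is 10,000x smaller than the link inertia,
# after the 100:1 gearbox it contributes as much as the entire arm.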
Although not as common, there are a number of robots that do support direct control
of the joint torques. There are a handful of ways that this capability can be realized.
It is possible to actuate a robot using electric motors that require only a small gear
reduction (e.g. ≤ 10:1) where the frictional forces are negligible. In the past, these
"direct-drive robots"[1] had enormous motors and limited payloads. More recently,
robots like the Barrett WAM arm used cable drives to keep the arm light but having
large motors in the base. And just in the last few years, we've seen progress in high-
torque outrunner and frameless motors bringing in a new generation of low-cost,
"quasi-direct-drive" robots: e.g. MIT Cheetah [2], Berkeley Blue, and Halodi Eve.
Hydraulic actuators provide another solution for generating large torques without
large transmissions. Sarcos had a series of torque-controlled arms (and humanoids),
and many of the most famous robots from Boston Dynamics are based on hydraulics
(though there is an increasing trend towards electric motors). These robots typically
have a single central pump and each actuator has a (lightweight) valve that can
shunt fluid through the actuator or through a bypass; the differential pressure across
the actuator is at least approximately related to the resulting force/torque.
Another approach to torque control is to keep the large gear-ratio motors, but add
sensors to directly measure the torque at the joint side of the actuator. This is
the approach used by the Kuka iiwa robot that we use in the example throughout
this text; the iiwa actuators have strain gauges integrated into the transmission.
However there is a trade-off between the stiffness of the transmission and the
accuracy of the force/torque measurement [3] -- the iiwa transmission includes an
explicit "Flex Spline" with a stiffness around 5000 Nm/rad [4]. Taking this idea
to an extreme, Gill Pratt proposed "series-elastic actuators" that have even lower
stiffness springs in the transmission, and proposed measuring joint position on
both the motor and joint sides of the transmission to estimate the applied torques
[5]. For example, the Baxter and Sawyer robots from Rethink use series elastic
actuators; I don't think they've published the spring stiffness values but similarly-
motivated series-elastic actuators from HEBI robotics are closer to 100 Nm/rad.
Even for the iiwa actuators, the joint elasticity is significant enough that the low-
level controllers go to great lengths to take it into account explicitly in order to
achieve high-performance control of the joints[6]. We will discuss these details when
we get to the chapter covering force control.
The low-cost torque-controlled arms that I mentioned above are just the beginnings
in what promises to be a massive proliferation of robotic arms. As demand increases,
costs will continue to come down.
Let me just say that, compared to working on legged robots, where for decades
we did our research on laboratory prototypes built by graduate students (and
occasionally professors!) in the machine shop down the hall, the availability of
professionally engineered, high-quality, high-uptime hardware is an absolute treat.
This also means that we can test algorithms in one lab and have another lab perhaps
at another university testing algorithms on almost identical hardware; this facilitates
levels of repeatability and sharing that were impossible before. The fact that the
prices are coming down, which will mean many more similar robots in many more
labs/environments, is one of the big reasons why I am so optimistic about the next
few years in the field.
It's time to simulate our chosen robotic arm. The first step is to obtain a robot
description file (URDF or SDF). For convenience, we ship the models for a few
robots, including iiwa, with Drake. If you're interested in simulating a different
robot, you can find either a URDF or SDF describing most commercial robots
somewhere online. But a word of warning: the quality of these models can vary
wildly. We've seen surprising errors in even the kinematics (link lengths, geometries,
etc), but the dynamics properties (inertia, friction, etc) in particular are often not
accurate at all. Sometimes they are not even mathematically consistent (e.g. it is
possible to specify an inertial matrix which is not physically realizable by any rigid
body). Drake will complain if you ask it to load a file with this sort of violation; we
would rather alert you early than start generating bogus simulations. There is also
increasingly good support for exporting to a robot format directly from CAD software
like Solidworks.
Now we have to import this robot description file into our physics engine. In Drake,
the physics engine is called MultibodyPlant. The term "plant" may seem odd -- it
is the word used in the controls literature to represent the physical system, with
origins in the control of chemical plants -- but it is pervasive. This connection is
also important to me; not many physics engines in the world go to the lengths that
Drake does to make the physics engine compatible with system-theoretic design and
analysis. You can access the methods of MultibodyPlant through its class methods,
e.g. for use in designing a model-based controller. But in order to work with our
model-based design philosophy, MultibodyPlant also provides an interface as a
System, with input and output ports.
The MultibodyPlant system has input ports applied_spatial_force, model_instance_name[i]_actuation, and geometry_query, and output ports reaction_forces, contact_results, model_instance_name[i]_continuous_state, model_instance_name[i]_generalized_acceleration, model_instance_name[i]_generalized_contact_force, and geometry_pose.
As you might expect for something as complex and general as a physics engine,
it has many input and output ports; most of them are optional. I'll illustrate the
mechanics of using these in the following example.
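Since the Colab notebook cannot be reproduced here, the following is a rough sketch of what such an example looks like in pydrake. Treat it as a sketch: the resource path for the iiwa model file is my assumption and may differ in your Drake version, and the timestep is arbitrary.

import numpy as np
from pydrake.common import FindResourceOrThrow
from pydrake.multibody.parsing import Parser
from pydrake.multibody.plant import AddMultibodyPlantSceneGraph
from pydrake.systems.framework import DiagramBuilder

builder = DiagramBuilder()
# Add the physics engine (MultibodyPlant) and the geometry engine (SceneGraph).
plant, scene_graph = AddMultibodyPlantSceneGraph(builder, time_step=1e-4)
# Load an iiwa model that ships with Drake (path is an assumption; check your install).
iiwa = Parser(plant).AddModelFromFile(FindResourceOrThrow(
    "drake/manipulation/models/iiwa_description/sdf/iiwa14_no_collision.sdf"))
# Weld the base to the world so the arm doesn't float freely.
plant.WeldFrames(plant.world_frame(), plant.GetFrameByName("iiwa_link_0"))
plant.Finalize()

diagram = builder.Build()
context = diagram.CreateDefaultContext()
plant_context = plant.GetMyContextFromRoot(context)
# The actuation input port must be connected or fixed before simulating;
# here we fix it to zero torques (and, as described below, the arm will fall).
plant.get_actuation_input_port(iiwa).FixValue(plant_context, np.zeros(7))

To actually advance time you would wrap the diagram in a Simulator (pydrake.systems.analysis.Simulator); the notebook versions also connect a visualizer.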
The best way to visualize the results of a physics engine is with a 2D or 3D visualizer.
For that, we need to add the system which curates the geometry of a scene; in Drake
we call it the SceneGraph. Once we have a SceneGraph, then there are a number of
different visualizers and sensors that we can add to the system to actually render
the scene.
The SceneGraph system has input ports source_pose{0} through source_pose{N-1} (one per geometry source) and output ports lcm_visualization and query.
You might wonder why MultibodyPlant doesn't handle the geometry of the scene
as well. Well, there are many applications where we'd like to render complex scenes,
and use complex sensors, but supply custom dynamics instead of the default physics
engine. Autonomous driving is a great example; in that case we want to populate a
SceneGraph with all of the geometry of the vehicles and environment, but we often
want to simulate the vehicles with very simple vehicle models that stop well short of
adding tire mechanics into our physics engine.
We now have a basic simulation of the iiwa, but already some subtleties emerge. The
physics engine needs to be told what torques to apply at the joints. In our example,
we apply zero torque, and the robot falls down. In reality, that never happens; in
fact there is essentially never a situation where the physical iiwa robot experiences
zero torque at the joints, even when the controller is turned off. Like many mature
industrial robot arms, iiwa has mechanical brakes at each joint that are engaged
whenever the controller is turned off. To simulate the robot with the controller
turned off, we would need to tell our physics engine about the torques produced by
these brakes.
In fact, even when the controller is turned on, and despite the fact that it is a
torque-controlled robot, we can never actually send zero torques to the motors. The
iiwa software interface accepts "feed-forward torque" commands, but it will always
add these as additional torques to its low-level controller which is compensating
for gravity and the motor/transmission mechanics. This often feels frustrating, but
probably we don't actually want to get into the details of simulating the drive
mechanics.
As a result, the simplest reasonable simulation we can provide of the iiwa must
include a simulation of its low-level controller. We will use the iiwa's "joint
impedance control" mode, and will describe the details of that once they become
important for getting the robot to perform better. For now, we can treat it as given,
and produce our simplest reasonable iiwa simulation.
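As a stand-in for the notebook, here is a hedged sketch of the idea: wire a pydrake InverseDynamicsController (a PID controller with inverse-dynamics compensation, not Kuka's actual impedance controller) between the plant's state output and its actuation input. It assumes the builder, plant, and iiwa variables from the earlier sketch (before builder.Build(), and without fixing the actuation port to zero), and the gains are made-up values.

import numpy as np
from pydrake.systems.controllers import InverseDynamicsController

# Made-up gains for the 7 iiwa joints.
kp = np.full(7, 100.0)
ki = np.full(7, 1.0)
kd = 2 * np.sqrt(kp)

# The controller needs a model of the robot; passing `plant` works here only
# because this plant contains nothing but the welded iiwa.
controller = builder.AddSystem(InverseDynamicsController(plant, kp, ki, kd, False))
builder.Connect(plant.get_state_output_port(iiwa),
                controller.get_input_port_estimated_state())
builder.Connect(controller.get_output_port_control(),
                plant.get_actuation_input_port(iiwa))
# The desired-state input (7 positions followed by 7 velocities) still needs to
# be fixed or connected, e.g. to hold the arm at a nominal configuration.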
As a final note, you might think that simulating the dynamics of the robot is overkill,
if our only goal is to simulate manipulation tasks where the robot is moving only
relatively slowly, and effects of mass, inertia and forces might be less important than
just the positions that the robot (and the objects) occupy in space. I would actually
agree with you. But it's surprisingly tricky to get a kinematic simulation to respect
the basic rules of interaction; e.g. to know when the object gets picked up or when
it does not (see, for instance [7]). Currently, in Drake, we mostly use the full physics
engine for simulation, but often use simpler models for manipulation planning and
control.
2.2 HANDS
You might have noticed that the iiwa model does not actually have a hand attached;
the robot ships with a mounting plate so that you can attach the "end-effector" of
your choice (and some options on access ports so you can connect your end-effector
to the computer without wires running down the outside of the robot). So now we
have another decision to make: what hand should we use?
It is interesting that, when it comes to robot end effectors, researchers in
manipulation tend to partition themselves into a few distinct camps.
Figure 2.3 - Dexterous hands. Left: the Shadow Dexterous Hand. Right: the Allegro
Hand.
Of course, our fascination with the human hand is well placed, and we dream of
building robotic hands that are as dexterous and sensor-rich. But the reality is that
we aren't there yet. Some people choose to pursue this dream and work with the
best dexterous hands on the market, and struggle with the complexity and lack of
robustness that ensues. The famous "learning dexterity" project from OpenAI used
the Shadow hand for playing with a Rubik's cube, and the work that had to go into
the hand in order to support the endurance learning experiments was definitely a
part of the story. There is a chance that new manufacturing techniques could really
disrupt this space -- videos like this one of FLLEX v2 look amazing[8] -- and I am
very optimistic that we'll have more capable and robust dexterous hands in the not-
so-distant future.
Figure 2.4 - This video of tele-operation with the PR1 from Ken Salisbury's group is
now a classic example of doing amazingly useful things with a very simple hand.
Check out their website for more videos, including sweeping, fetching a beer, and
unloading a dishwasher.
Another important argument in favor of simple hands is the elegance and clarity
that comes from reducing the complexity. If thinking clearly about simple grippers
helps us understand more deeply why we need more dexterous hands (I think it will),
then great. For most of these notes, a simple two-fingered gripper will serve our
pedagogical goals the best. In particular, I've selected the Schunk WSG 050, which
we have used extensively in our research over the last few years. We'll also explore a
few different end-effectors in later chapters, when they help to explain the concepts.
To be clear: just because a hand is simple (few degrees of freedom) does not
mean that it is low quality. On the contrary, the Schunk WSG is a very high-quality
gripper with force control and force measurement at its single degree of freedom
that surpasses the fidelity of the Kuka. It would be hard to achieve the same in a
dexterous hand with many joints.
Finally, the third and newest camp is promoting clever mechanical designs for
hands, which we'll call "underactuated hands", that can achieve some of the
advantages of a human hand based on mechanics alone. These hands are often
amazingly good at a particular range of tasks (most often "enveloping grasps"), but are not
as general purpose. It would be very hard to use one of these to, for instance, button
my shirt.
I particularly liked the response that Matt Mason, one of the main advocates for
simple grippers throughout the years, gave to a question at the end of one of our
robotics seminars: he argued that useful robots in e.g. the kitchen will probably have
special purpose tools that can be changed quickly.
One time I was attending an event where the registration form asked us "what is
your favorite robot of all time, real or fictional". That is a tough question for someone
who loves robots! But the answer I gave was a super cool "high-speed multifingered
hand" by the Ishikawa group; a project that started turning out amazing results back
in 2004! They "overclocked" the hand -- sending more current for short durations
than would be reasonable for any longer applications -- and also used high-speed
cameras to achieve these results. And they had a Rubik's cube demo, too, in 2017.
So good!
2.3 SENSORS
I haven't said much yet about sensors. In fact, sensors are going to be a major topic
for us when we get to perception with (depth) cameras, and when we think about
tactile sensing. But I will defer those topics until we need them.
For now, let us focus on the joint sensors on the robot. Both the iiwa and the Schunk
WSG provide joint feedback -- the iiwa driver gives "measured position", "estimated
velocity", and "measured torque" at each of its seven joints; remember that joint
accelerations are typically considered too noisy to rely on. Similarly the Schunk WSG
outputs "measured state" (position + velocity) and "measured force". We can make
all of these available as ports in a block diagram.
2.4 PUTTING IT ALL TOGETHER
If you've worked through the examples, you've seen that a proper simulation of our
robot is more than just a physics engine -- it requires assembling physics, actuator
and sensor models, and low-level robot controllers into a common framework. In
practice, in Drake, that means that we are assembling increasingly sophisticated
block diagrams.
One of the best things about the block-diagram modeling paradigm is the power
of abstraction and encapsulation. We can assemble a Diagram that contains all of
the components necessary to simulate our hardware platform and its environment,
which we will refer to affectionately as the "Manipulation Station". This diagram
itself can then be used as a System in additional diagrams, which can include
our perception, planning, and higher-level control systems. All together, the
ManipulationStation system looks like this:
The ManipulationStation system has input ports iiwa_position, iiwa_feedforward_torque (optional), wsg_position, and wsg_force_limit (optional), and output ports iiwa_position_commanded, iiwa_position_measured, iiwa_velocity_estimated, iiwa_state_estimated, iiwa_torque_commanded, iiwa_torque_measured, iiwa_torque_external, wsg_state_measured, wsg_force_measured, camera_[NAME]_rgb_image, camera_[NAME]_depth_image, and camera_[NAME]_label_image (for each camera), plus query_object, contact_results, plant_continuous_state, and geometry_poses.
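For orientation (and for the exercises below), here is a minimal sketch of constructing the simulated station in pydrake and probing one of its ports. The module path, the SetupManipulationClassStation() method name, and the fixed input values are my best understanding of the pydrake API at the time of writing; double-check them against the current Drake documentation.

import numpy as np
from pydrake.examples.manipulation_station import ManipulationStation

station = ManipulationStation()
station.SetupManipulationClassStation()   # adds the iiwa, WSG, cameras, and table
station.Finalize()

context = station.CreateDefaultContext()
# Input ports must be fixed (or connected) before evaluating outputs.
station.GetInputPort("iiwa_position").FixValue(context, np.zeros(7))
station.GetInputPort("wsg_position").FixValue(context, np.array([0.1]))
print(station.GetOutputPort("iiwa_position_measured").Eval(context))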
This model also defines the abstraction between the simulation and the real
hardware. We offer an almost identical system, the
ManipulationStationHardwareInterface. If you use it directly in place of the
ManipulationStation, then the same code you've developed in simulation can be
run directly on the real robot. The ports that are available only in simulation, but not
in reality, are colored orange on the ManipulationStation system.
The ManipulationStationHardwareInterface system has input ports iiwa_position, iiwa_feedforward_torque, wsg_position, and wsg_force_limit (optional), and output ports iiwa_position_commanded, iiwa_position_measured, iiwa_velocity_estimated, iiwa_torque_commanded, iiwa_torque_measured, iiwa_torque_external, wsg_state_measured, wsg_force_measured, camera_[NAME]_rgb_image, and camera_[NAME]_depth_image (for each camera).
If you do have your own similar robot hardware available, and want to run the
hardware interface on your machines, I've started putting together a list of drivers
and bill of materials in the appendix.
2.5 EXERCISES
a. Learn how to probe the input and output ports of the manipulation
station and evaluate their contents.
b. Explore what different ports correspond to by probing their values.
REFERENCES
[1] Haruhiko Asada and Kamal Youcef-Toumi. Direct-Drive Robots - Theory and
Practice. The MIT Press, 1987.
[2] P. M. Wensing, A. Wang, S. Seok, D. Otten, J. Lang, and S. Kim. Proprioceptive
actuator design in the mit cheetah: Impact mitigation and high-bandwidth
physical interaction for dynamic legged robots. IEEE Transactions on Robotics,
33(3):509--522, June 2017.
[3] Navvab Kashiri, Jörn Malzahn, and Nikos Tsagarakis. On the sensor design of
torque controlled actuators: A comparison study of strain gauge and encoder
based principles. IEEE Robotics and Automation Letters, PP, 02 2017.
[4] Armin Wedler, Maxime Chalon, Klaus Landzettel, Martin Görner, Erich Krämer,
Robin Gruber, Alexander Beyer, Hans-Jürgen Sedlmayr, Bertram Willberg,
Wieland Bertleff, Josef Reill, Markus Schedl, Alin Albu-Schaffer, and Gerd
Hirzinger. DLR's dynamic actuator modules for robotic space applications. In
Proceedings of the 41st Aerospace Mechanisms Symposium, May 16-18 2012.
[5] G. A. Pratt and M. M. Williamson. Series elastic actuators. In Proceedings 1995
IEEE/RSJ International Conference on Intelligent Robots and Systems. Human
Robot Interaction and Cooperative Robots, volume 1, pages 399--406 vol.1, Aug
1995.
[6] Alin Albu-Schaffer, Christian Ott, and Gerd Hirzinger. A unified passivity-based
control framework for position, torque and impedance control of flexible joint
robots. The international journal of robotics research, 26(1):23--39, 2007.
[7] Tao Pang and Russ Tedrake. A robust time-stepping scheme for quasistatic rigid
multibody systems. In IEEE/RSJ International Conference on Intelligent Robots
and Systems (IROS), 2018.
[8] Yong-Jae Kim, Junsuk Yoon, and Young-Woo Sim. Fluid lubricated dexterous
finger mechanism for human-like impact absorbing capability. IEEE Robotics and
Automation Letters, 4(4):3971--3978, 2019.
CHAPTER 3
Basic Pick and Place
Figure 3.1 - Your challenge: command the robot to pick up the brick and place it in
a desired position/orientation.
The stage is set. You have your robot. I have a little red foam brick. I'm going to
put it on the table in front of your robot, and your goal is to move it to a desired
position/orientation on the table. I want to defer the perception problem for one
chapter, and will let you assume that you have access to a perfect measurement of
the current position/orientation of the brick. Even without perception, completing
this task requires us to build up a basic toolkit for geometry and kinematics; it's a
natural place to start.
First, we will establish some terminology and notation for kinematics. This is one
area where careful notation can yield dividends, and sloppy notation will inevitably
lead to confusion and bugs. The Drake developers have gone to great lengths to
establish and document a consistent multibody notation, which we call "Monogram
Notation". The documentation even includes some of the motivation/philosophy
behind that notation. I'll use the monogram notation throughout this text.
If you'd like a more extensive background on kinematics than what I provide here,
my favorite reference is still [1]. For free online resources, Chapters 2 and 3 of the
1994 book by Murray et al. (now free online)[2] are also excellent, as are the first
seven chapters of Modern Robotics by Lynch and Park[3] (they also have excellent
accompanying videos). Unfortunately, with three different references you'll get three
(slightly) different notations; ours is most similar to [0].
Please don't get overwhelmed by how much background material there is to know!
I am personally of the opinion that a clear understanding of just a few basic ideas
should make you very effective here. The details will come later, if you need them.
Perhaps the most fundamental concept in geometry is the concept of a point. Points
occupy a position in space, and they can have names, e.g. point $A$, $C$, or more
descriptive names like $B_{cm}$ for the center of mass of body $B$. We'll denote the position
of the point by using a position vector $p^A$; that's $p$ for position, and not for point,
because other geometric quantities can also have a position.
But let's be more careful. Position is actually a relative quantity. Really, we should
only ever write the position of two points relative to each other. We'll use e.g. ${}^Ap^C$ to
denote the position of $C$ relative to $A$. The left superscript looks mighty strange, but
we'll see that it pays off once we start transforming points.
That is seriously heavy notation. I don't love it myself, but it's the most durable I've
got, and we'll have shorthand for when the context is clear.
There are a few very special frames. We use $W$ to denote the world frame. We think
about the world frame in Drake using vehicle coordinates (positive $x$ to the front,
positive $y$ to the left, and positive $z$ is up). The other particularly special frames are
the body frames: every body in the multibody system has a unique frame attached to
it. We'll typically use $B_i$ to denote the frame for body $i$.
Frames have a position, too -- it coincides with the frame origin. So it is perfectly
valid to write ${}^Wp^A_W$ to denote the position of point $A$ relative to the origin of the world
frame, expressed in the world frame. Here is where the shorthand comes in. If the
position of a quantity is relative to a frame, and expressed in the same frame, then
we can safely omit the subscript: ${}^Fp^A \equiv {}^Fp^A_F$. Furthermore, if the "relative to" field is
omitted, then we assume that the point is relative to $W$, so $p^A \equiv {}^Wp^A_W$.
Frames also have an orientation. We'll use $R$ to denote a rotation, and follow the
same notation, writing ${}^BR^A$ to denote the rotation of frame $A$ relative to frame $B$.
Unlike vectors, pure rotations do not have an additional "expressed in" frame.
The Drake documentation also discusses how to use this notation in code. In short,
${}^Bp^A_C$ is written p_BA_C. It works, I promise.
Now that we have the notation, we can formulate our approach to the basic pick and
place problem. Let us call our object, $O$, and our gripper, $G$. Our idealized perception
sensor tells us ${}^WX^O$. Let's create a frame $O_d$ to describe the "desired" pose of the
object, ${}^WX^{O_d}$. So pick and place manipulation is simply trying to make $X^O = X^{O_d}$.
To accomplish this, we will assume that the object doesn't move relative to the world
(${}^WX^O$ is constant) when the gripper is open, and the object doesn't move relative to
the gripper (${}^GX^O$ is constant) when the gripper is closed. Then we can:
• move the gripper in the world, $X^G$, to an appropriate pose relative to the object: ${}^OX^{G_{grasp}}$.
• close the gripper.
• move the gripper+object to the desired pose, $X^O = X^{O_d}$.
• open the gripper, and retract the hand.
To simplify the problem of approaching the object (without colliding with it) and
retracting from the object, we will insert a "pregrasp pose", ${}^OX^{G_{pregrasp}}$, above the
object as an intermediate step.
Clearly, programming this strategy requires good tools for working with these
transforms, and for relating the pose of the gripper to the joint angles of the robot.
Here are the basic rules of algebra for our spatial quantities:
• Positions expressed in the same frame can be added when their reference
and target symbols match:
$${}^Ap^B_F + {}^Bp^C_F = {}^Ap^C_F. \qquad (1)$$
Those should be pretty intuitive; make sure you confirm them for yourself.
• Multiplication by a rotation can be used to change the "expressed in" frame:
$${}^Ap^B_G = {}^GR^F\, {}^Ap^B_F. \qquad (3)$$
• Transforms compose:
$${}^AX^B\, {}^BX^C = {}^AX^C. \qquad (7)$$
In practice, transforms are implemented using homogeneous coordinates, but for now
I'm happy to leave that as an implementation detail.
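As a quick illustration of the composition rule in code, here is a sketch using pydrake's RigidTransform; the particular rotations and translations are arbitrary numbers chosen for the example.

from pydrake.math import RigidTransform, RollPitchYaw, RotationMatrix

# Arbitrary example poses: frame A in the world, and frame B in A.
X_WA = RigidTransform(RollPitchYaw(0, 0, 0.5), [1.0, 0.0, 0.0])
X_AB = RigidTransform(RotationMatrix.MakeZRotation(-0.5), [0.0, 0.2, 0.0])
# Transforms compose (equation (7)): X_WB = X_WA X_AB.
X_WB = X_WA.multiply(X_AB)
# The inverse swaps the superscripts: X_BW = (X_WB)^{-1}.
X_BW = X_WB.inverse()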
A depth camera returns points in the camera frame. Therefore, we'll write this
position of point $P_i$ with ${}^Cp^{P_i}$. If we want to convert the point into the world frame,
we simply have
$$p^{P_i} = X^C\, {}^Cp^{P_i}.$$
This is a work-horse operation for us. We often aim to merge points from multiple
cameras (typically in the world frame), and always need to somehow relate the
frames of the camera with the frames of the robot.
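In code, this work-horse operation looks something like the following sketch; the camera pose and the points are made up for illustration.

import numpy as np
from pydrake.math import RigidTransform, RollPitchYaw

# A made-up camera pose in the world, X_WC.
X_WC = RigidTransform(RollPitchYaw(-np.pi / 2, 0, 0), [0.5, 0.0, 1.0])
# Two points from the depth camera, expressed in the camera frame (3xN array).
p_CP = np.array([[0.1, -0.2],
                 [0.0,  0.1],
                 [0.8,  0.9]])
# Transform them into the world frame: p_WP = X_WC * p_CP.
p_WP = X_WC.multiply(p_CP)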
The spatial algebra gets us pretty close to what we need for our pick and place
algorithm. But remember that the interface we have with the robot reports
measured joint positions, and expects commands in the form of joint positions. So
our remaining task is to convert between joint angles and Cartesian frames. We'll do
this in steps; the first step is to go from joint positions to Cartesian frames: this is
known as forward kinematics.
Throughout this text, we will refer to the joint positions of the robot (also known
as "configuration" of the robot) using a vector q. If the configuration of the scene
includes objects in the environment as well as the robot, we would use q for
the entire configuration vector, and use e.g. qrobot for the subset of the vector
corresponding to the robot's joint positions. Therefore, the goal of forward
kinematics is to produce a map:
$$X^G = f^G_{kin}(q). \qquad (9)$$
Moreover, we'd like to have forward kinematics available for any frame we have
defined in the scene. Our spatial notation and spatial algebra make this
computation relatively straightforward.
Every Joint and "floating base" has some number of position variables associated
with it -- a subset of the configuration vector q -- and knows how to compute the
configuration dependent transform across the joint from the child joint frame JC
to the parent joint frame JP : JP X JC (q). Additionally, the kinematic tree defines the
(fixed) transforms from the joint frame to the child body frame, C X JC , and from the
joint frame to the parent frame, P X JP . Altogether, we can compute the configuration
transform between any one body and its parent,
P
X C (q) = P X JP JP X JC (q)JC X C .
You might be tempted to think that every time you add a joint to the
MultibodyPlant, you are adding a degree of freedom. But it actually works the other
way around. Every time you add a body to the plant, you are adding many degrees of
freedom. But you can then add joints to remove those degrees of freedom; joints are
constraints. "Welding" the robot's base to the world frame removes all of the floating
degrees of freedom of the base. Adding a rotational joint between a child body and a
parent body removes all but one degree of freedom, etc.
Interestingly, SceneGraph (the geometry engine) does not maintain any tree
structure. All that SceneGraph knows about is a list of frames specified relative to
the world frame, W , and a list of geometries attached to each frame.
In order to compute the pose of the gripper in the world, $X^G$, we simply query
the parent of the gripper frame in the kinematic tree, and recursively compose the
transforms until we get to the world frame.
Figure 3.3 - Kinematic frames on the iiwa (left) and the WSG (right). For each
frame, the positive x axis is in red, the positive y axis is in green, and the positive z
axis is in blue. It's (hopefully) easy to remember: XYZ ⇔ RGB.
Open in Colab
gripper = plant.GetBodyByName("body")
pose = plant.EvalBodyPoseInWorld(context, gripper)
Behind the scenes, the MultibodyPlant is doing all of the spatial algebra we
described above to return the pose (and also some clever caching because you can
reuse much of the computation when you want to evaluate the pose of another
frame on the same robot).
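If you want the pose of any other frame, or the relative pose between two frames, the same machinery applies. Here is a small sketch, reusing the plant and context from the snippet above (the frame names are those used in the standard Drake iiwa/WSG models):

# Pose of the gripper body frame in the world; equivalent to the call above.
gripper_frame = plant.GetBodyByName("body").body_frame()
X_WG = plant.CalcRelativeTransform(context, plant.world_frame(), gripper_frame)

# Relative pose between any two frames in the tree, e.g. the gripper relative
# to the iiwa base link.
X_BG = plant.CalcRelativeTransform(
    context, plant.GetFrameByName("iiwa_link_0"), gripper_frame)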
$$X^B = f^B_{\rm kin}(q),$$
This gets into the subtle points of how we represent transforms, and how we
represent rotations in particular. There are many possible representations of 3D
rotations, they are good for different things, and unfortunately, there is no one
representation to rule them all. (This is one of the many reasons why everything
is better in 2D!) Common representations include 3x3 rotation matrices, Euler
angles (e.g. Roll-Pitch-Yaw), axis angle, and unit quaternions. In Drake, we provide
all of these representations, and make it easy to convert back and forth between
them. In order to make the spatial algebra efficient, we use rotation matrices
in our RigidTransform, but in order to have a more compact representation of
configuration we use unit quaternions in our configuration vector, q. You can think
of unit quaternions as a form of axis angles that have been carefully normalized
to be unit length and have magical properties. My favorite careful description of
quaternions is probably chapter 1 of [4].
As a result, for this example, the software implementation of the function $f^B_{\rm kin}$ is precisely the function that converts the position × unit quaternion representation into the position × rotation matrix representation.
All of the subtlety, again, comes in because of the multiple representations that
we have for 3D rotations (rotation matrix, unit quaternions, ...). While there is
no one best representation for 3D rotations, it is possible to have one canonical
representation for differential rotations. Therefore, without any loss of generality,
we can represent the rate of change in pose using a six-component vector for spatial
velocity:
$${}^A V^B_C = \begin{bmatrix} {}^A\omega^B_C \\ {}^A v^B_C \end{bmatrix}. \qquad (11)$$
${}^A V^B_C$ is the spatial velocity of frame B relative to frame A expressed in frame C, where ${}^A\omega^B_C \in \mathbb{R}^3$ is the angular velocity (of frame B relative to frame A expressed in frame C) and ${}^A v^B_C \in \mathbb{R}^3$ is the translational velocity (with the same frame semantics as positions). Translational and rotational velocities are simply vectors, so they can be operated on just like positions:
• Velocities expressed in the same frame can be added, and have the additive inverse. For the translational velocity, the addition is perhaps known from high-school physics; for the angular velocity I consider it quite surprising, and it deserves to be verified.
• Rotation matrices can be used to change between the "expressed-in" frames
when those frames are fixed:
  $${}^A v^B_G = {}^G R^F \, {}^A v^B_F, \quad {}^A \omega^B_G = {}^G R^F \, {}^A \omega^B_F, \quad \text{when } {}^G\dot{R}^F = 0. \qquad (13)$$
• If the relative pose between two frames is not fixed, then we get slightly
more complicated formulas involving vector cross products; they are
derived from the derivative of the transform equation and the chain rule.
(Click the triangle to expand those details, but consider skipping them
for now!) The angular velocity vector is related to the time-derivative of a rotation matrix via the skew-symmetric matrix (as can be verified by differentiating $RR^T = I$):
$$\dot{R}R^T = \dot{R}R^{-1} = \begin{bmatrix} 0 & -\omega_z & \omega_y \\ \omega_z & 0 & -\omega_x \\ -\omega_y & \omega_x & 0 \end{bmatrix}, \qquad (14)$$
$$\dot{R}R^T p = \dot{R}R^{-1} p = \omega \times p. \qquad (15)$$
Therefore, differentiating
$${}^G p^A = {}^G X^F \, {}^F p^A = {}^G p^F + {}^G R^F \, {}^F p^A,$$
yields
$${}^G v^A = {}^G v^F + {}^G \dot{R}^F \, {}^F p^A + {}^G R^F \, {}^F v^A = {}^G v^F + {}^G \omega^F \times {}^G R^F \, {}^F p^A + {}^G R^F \, {}^F v^A. \qquad (16)$$
There is one more velocity to be aware of: I'll use v to denote the generalized
velocity vector of the plant. While a spatial velocity ${}^A V^B$ has six components, and a translational or angular velocity, ${}^B v^C$ or ${}^B \omega^C$, has three components, the generalized
velocity vector is whatever size it needs to be to encode the time derivatives of the
configuration variables, q. For the iiwa welded to the world frame, that means it has
seven components. I've tried to be careful to typeset each of these v's differently
throughout the notes. Almost always the distinction is also clear from the context.
But there is one minor nuisance that this causes. We tend to want to think of the
generalized velocity as the time derivative of the generalized positions. This works
when we have only our iiwa in the model, and it is welded to the world frame. But
we cannot assume this in general; not when floating-base rotations are concerned.
As evidence, here is a simple example that loads exactly one rigid body into the
MultibodyPlant, and then prints its Context.
Open in Colab
Context
--------
Time: 0
States:
13 continuous states
1 0 0 0 0 0 0 0 0 0 0 0 0
plant.num_positions() = 7
plant.num_velocities() = 6
You can see that this system has 13 total state variables. 7 of them are positions, q;
we use unit quaternions in the position vector. But we have just 6 velocities, v; we
use angular velocities in the velocity vector. Clearly, if the lengths of the vectors don't even match, we do not have $\dot{q} = v$.
It's not really any harder; Drake provides the MultibodyPlant methods
MapQDotToVelocity and MapVelocityToQDot to go back and forth between them.
But you have to remember to use them!
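As a small sketch (reusing, say, the single-rigid-body plant and context from the example above), the conversion back and forth looks like this:

import numpy as np

# A generalized velocity: angular velocity (3) followed by translational velocity (3).
v = np.array([0.1, 0.0, 0.0, 0.0, 0.0, 0.2])

# Map to the time derivative of the configuration (7 components here, because the
# configuration uses a unit quaternion), and back again.
qdot = plant.MapVelocityToQDot(context, v)
v_again = plant.MapQDotToVelocity(context, qdot)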
The MultibodyPlant class provides methods for computing these Jacobians:
• CalcJacobianAngularVelocity,
• CalcJacobianTranslationalVelocity, and
• CalcJacobianSpatialVelocity,
with each taking an argument to specify whether you'd like the Jacobian with respect to $\dot{q}$ or v. If you really like the analytical Jacobian, you could get it (much less efficiently) using our support for automatic differentiation.
Open in Colab
There is important structure in Eq (10). Make sure you didn't miss it. The
relationship between joint velocities and end-effector velocities is (configuration-
dependent) linear:
$$V^G = J^G(q)\, v. \qquad (17)$$
Maybe this gives us what we need to produce changes in gripper frame G? If I have
a desired gripper frame velocity $V^{G_d}$, then how about commanding a joint velocity $v = [J^G(q)]^{-1} V^{G_d}$?
Any time you write a matrix inverse, it's important to check that the matrix is
actually invertible. As a first sanity check: what are the dimensions of $J^G(q)$? We know the spatial velocity has six components. Our gripper frame is welded directly to the last link of the iiwa, and the iiwa has seven positions, so we have $J^G(q_{iiwa}) \in \mathbb{R}^{6\times 7}$. The matrix is not square, so it does not have an inverse. But having
more degrees of freedom than the desired spatial velocity requires (more columns
than rows) is actually the good case, in the sense that we might have many solutions
for v that can achieve a desired spatial velocity. To choose one of them (the minimum-
norm solution), we can consider using the Moore-Penrose pseudo-inverse, $J^+$, instead:
$$v = [J^G(q)]^+ V^{G_d}. \qquad (18)$$
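Here is a minimal sketch of one step of this pseudo-inverse controller (not the notebook's implementation); the desired spatial velocity below is a made-up number, and if your plant contains more than just the arm you would select the columns of the Jacobian corresponding to the iiwa joints:

import numpy as np
from pydrake.multibody.tree import JacobianWrtVariable

G = plant.GetBodyByName("body").body_frame()
W = plant.world_frame()

# 6 x num_velocities spatial-velocity Jacobian of frame G, measured in and
# expressed in the world frame, taken with respect to the generalized velocity v.
J_G = plant.CalcJacobianSpatialVelocity(
    context, JacobianWrtVariable.kV, G, np.zeros(3), W, W)

# Desired spatial velocity [angular; translational] of G in W.
V_Gd = np.array([0.0, 0.0, 0.0, 0.1, 0.0, 0.05])

# Minimum-norm joint velocity command, Eq. (18).
v_command = np.linalg.pinv(J_G) @ V_Gd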
To keep things simple for this first example, we'll just command a constant gripper
spatial velocity, and only run the simulation for a few seconds.
Note that we do have to add one additional system into the diagram. The output
of our controller is a desired joint velocity, but the input that the iiwa controller is
expecting is a desired joint position. So we will insert an integrator in between.
I don't expect you to understand every line in this example, but it's worth finding
the important lines and making sure you can change them and see what happens!
Open in Colab
There is a simple check to understand when the pseudo-inverse can give an exact
solution for any spatial velocity (achieving exactly the desired spatial velocity): the
Jacobian must be full row-rank. In this case, we need rank(J) = 6. But assigning
an integer rank to a matrix doesn't tell the entire story; for a real robot with
(noisy) floating-point joint positions, as the matrix gets close to losing rank, (pseudo-
)inverse starts to "blow-up". A better metric, then, is to watch the smallest singular
value; as this approaches zero, the norm of the pseudo-inverse will approach infinity.
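A quick sketch of that check, assuming J_G was computed as in the snippet above:

# The smallest singular value measures how close J_G is to losing rank; as it
# approaches zero, the norm of the pseudo-inverse blows up.
sigma = np.linalg.svd(J_G, compute_uv=False)
print("smallest singular value:", sigma.min())
print("2-norm of pseudo-inverse:", np.linalg.norm(np.linalg.pinv(J_G), ord=2))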
Open in Colab
Here's a hint: try some configurations where the arm is very straight (e.g., driving joints 2 and 4 close to zero).
Another good way to find the singularities is to use your pseudo-inverse
controller to send gripper spatial velocity commands that drive the gripper to the
limits of the robot's workspace. Try it and see! In fact, this is the common case,
and one that we will work hard to avoid.
Configurations q for which rank(J(qiiwa )) < 6 for a frame of interest (like our gripper
frame G) are called kinematic singularities. Try to avoid going near them if you can!
The iiwa has many virtues, but admittedly its kinematic workspace is not one of
them. Trust me, if you try to get a big Kuka to reach into a little kitchen sink all day,
every day, then you will spend a non-trivial amount of time thinking about avoiding
singularities.
In practice, things can get a lot better if we stop bolting our robot base to a fixed
location in the world. Mobile bases add complexity, but they are wonderful for
improving the kinematic workspace of a robot.
Example 3.11 (Are kinematic singularities real?)
A natural question when discussing singularities is whether they are somehow
real, or whether they are somehow an artifact of the analysis. Perhaps it is useful
to look at an extremely simple case.
Imagine a two-link arm, where each link has length one. Then the kinematics reduce to
$$p^G = \begin{bmatrix} c_0 + c_{0+1} \\ s_0 + s_{0+1} \end{bmatrix},$$
where I've used the (very common) shorthand $s_0$ for $\sin(q_0)$ and $s_{0+1}$ for $\sin(q_0 + q_1)$, etc. The Jacobian is
$$J(q) = \begin{bmatrix} -s_0 - s_{0+1} & -s_{0+1} \\ c_0 + c_{0+1} & c_{0+1} \end{bmatrix},$$
and as expected, it loses rank when the arm is at full extension (e.g. when $q_0 = q_1 = 0$, which implies the first row is zero).
Let's move the robot along the x-axis, by taking $q_0(t) = 1 - t$ and $q_1(t) = -2 + 2t$. This clearly visits the singularity $q_0 = q_1 = 0$ at time 1, and then leaves again without trouble. In fact, it does all this with a constant joint velocity ($\dot{q}_0 = -1$, $\dot{q}_1 = 2$)! The resulting trajectory is shown in the animation in the online notes.
There are a few things to understand. At the singularity, there is nothing that
the robot can do to move its gripper farther in positive x -- that singularity is
real. But it is also true that there is no way for the robot to move instantaneously
back in the direction of −x. The Jacobian analysis is not an approximation, it
is a perfect description of the relationship between joint velocities and gripper
velocities. However, just because you cannot achieve an instantaneous velocity in
the backwards direction, it does not mean you cannot get there! At t = 1, even
though the joint velocities are constant, and the Jacobian is singular, the robot is
accelerating in $-x$: $\ddot{p}^G_{W_x}(t) = -2\cos(1 - t)$.
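A quick numerical check of this example (plain numpy, not from the companion code):

import numpy as np

def fk(q0, q1):
    # Forward kinematics of the planar two-link arm with unit link lengths.
    return np.array([np.cos(q0) + np.cos(q0 + q1),
                     np.sin(q0) + np.sin(q0 + q1)])

def jacobian(q0, q1):
    return np.array([[-np.sin(q0) - np.sin(q0 + q1), -np.sin(q0 + q1)],
                     [ np.cos(q0) + np.cos(q0 + q1),  np.cos(q0 + q1)]])

for t in [0.5, 1.0, 1.5]:
    q0, q1 = 1 - t, -2 + 2 * t
    print(f"t={t}: p_G={fk(q0, q1)}, rank(J)={np.linalg.matrix_rank(jacobian(q0, q1))}")

# At t=1 the rank drops to 1, yet p_Gx(t) = 2 cos(1-t) keeps moving (and
# accelerating) right through the singularity.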
I'm going to put my red foam brick on the table. Its geometry is defined as a 7.5cm
x 5cm x 5cm box. For reference, the distance between the fingers on our gripper in
the default "open" position is 10.7cm. The "palm" of the gripper is 3.625cm from the
body origin, and the fingers are 8.2cm long.
To make things easy to start, I'll promise to set the object down on the table with the
object frame's z-axis pointing up (aligned with the world z axis), and you can assume
it is resting on the table safely within the workspace of the robot. But I reserve the
right to give the object arbitrary initial yaw. Don't worry, you might have noticed that
the seventh joint of the iiwa will let you rotate your gripper around quite nicely (well
beyond what my human wrist can do).
Observe that visually the box has rotational symmetry -- I could always rotate the
box 90 degrees around its x-axis and you wouldn't be able to tell the difference.
We'll think about the consequences of that more in the next chapter when we start
using perception. But for now, we are ok using the omniscient "cheat port" from the
simulator which gives us the unambiguous pose of the brick.
Figure 3.5 - The gripper frame and the object frame. For each frame, the positive x
axis is in red, the positive y axis is in green, and the positive z axis is in blue (XYZ ⇔
RGB).
Take a careful look at the gripper frame in the figure above, using the colors to
understand the axes. Here is my thinking: Given the size of the hand and the object,
I want the desired position (in meters) of the object in the gripper frame,
$${}^{G_{grasp}} p^O = \begin{bmatrix} 0 \\ 0.12 \\ 0 \end{bmatrix}, \qquad {}^{G_{pregrasp}} p^O = \begin{bmatrix} 0 \\ 0.2 \\ 0 \end{bmatrix}.$$
Recall that the logic behind a pregrasp pose is to first move safely above the object: if the only gripper motion that happens very close to the object is a straight translation from the pregrasp pose to the grasp pose and back, then we can mostly avoid having to think about collisions (for now). I want the orientation of my
gripper to be set so that the positive z axis of the object aligns with the negative y
axis of the gripper frame, and the positive x axis of the object aligns with the positive
z axis of the gripper. We can accomplish that with
$${}^{G_{grasp}} R^O = \text{MakeXRotation}\!\left(\tfrac{\pi}{2}\right) \text{MakeZRotation}\!\left(\tfrac{\pi}{2}\right).$$
I admit I had my right hand in the air for that one! Our pregrasp pose will have the
same orientation as our grasp pose.
Open in Colab
I hope you can see the value of having good notation at work here. My right hand
was in the air when I was deciding what a suitable relative pose for the object in
the hand should be (when writing the notes). But once that was decided, I went to
type it in and everything just worked.
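As a sketch of what "typing it in" might look like (the variable names are illustrative, not necessarily those used in the notebook):

import numpy as np
from pydrake.math import RigidTransform, RotationMatrix

# Desired pose of the object O in the gripper's grasp and pregrasp frames.
R_GgraspO = RotationMatrix.MakeXRotation(np.pi / 2).multiply(
    RotationMatrix.MakeZRotation(np.pi / 2))
X_GgraspO = RigidTransform(R_GgraspO, [0, 0.12, 0])
X_GpregraspO = RigidTransform(R_GgraspO, [0, 0.2, 0])

# Given the (cheat-port) pose of the object in the world, X_WO, the desired
# gripper poses follow directly from the spatial algebra:
#   X_WGgrasp    = X_WO.multiply(X_GgraspO.inverse())
#   X_WGpregrasp = X_WO.multiply(X_GpregraspO.inverse())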
We're getting close. We know how to produce desired gripper poses, and we know
how to change the gripper pose instantaneously using spatial velocity commands.
Now we need to specify how we want the gripper poses to change over time, so that
we can convert our gripper poses into spatial velocity commands.
Let's define all of the "keyframe" poses that we'd like the gripper to travel through,
and time that it should visit each one. The following colab example does precisely
that.
Figure 3.7 - Keyframes of the gripper. The robot's base will be at the origin, so
we're looking over the (invisible) robot's shoulder here. The hand starts in the
"initial" pose near the center, moves to the "prepick" to "pick" to "prepick" to
"clearance" to "preplace" to "place" and finally back to "preplace".
How did I choose the times? I started everything at time t = 0, and listed the rest
of our times as absolute (time from zero). That's when the robot wakes up and
sees the brick. How long should we take to transition from the starting pose to the
pregrasp pose? A really good answer might depend on the exact joint speed limits
of the robot, but we're not trying to move fast yet. Instead I chose a conservative
time that is proportional to the total Euclidean distance that the hand will travel,
say $k = 10$ s/m (aka 10 cm/s):
$$t_{pregrasp} = k \, \| {}^{G_0} p^{G_{pregrasp}} \|_2.$$
I just chose a fixed duration of 2 seconds for the transitions from pregrasp to grasp
and back, and also left 2 seconds with the gripper stationary for the segments
where the fingers needs to open/close.
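A small sketch of this timing rule (the keyframe pose variables are assumed to be RigidTransforms computed as above):

import numpy as np

k = 10.0  # seconds per meter of hand travel, i.e. 10 cm/s.

times = {"initial": 0.0}
times["prepick"] = times["initial"] + k * np.linalg.norm(
    X_WGprepick.translation() - X_WGinitial.translation())
# Fixed 2-second segments for the prepick <-> pick moves, and 2 seconds with the
# gripper stationary while the fingers open or close.
times["pick_start"] = times["prepick"] + 2.0
times["pick_end"] = times["pick_start"] + 2.0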
To fill in the details -- defining a gripper pose for all t -- we'll convert these keyframes into a proper trajectory, $X^G(t), \forall t \in [t_{initial}, t_{postpick}]$. Actually, we will produce a handful of trajectories: one for the gripper position, one for the gripper orientation, and one for the gripper (finger) command.
Open in Colab
Figure: The x, y, and z components of the gripper position trajectory, plotted against time t (from roughly 0 to 35 seconds).
With 3D data, you can plot it in 3D. But my favorite approach is as an animation in
our 3D renderer! Make sure you try the "Open controls>Animation" interface. You
can pause it and then scrub through the trajectory using the time slider.
System diagram: the PseudoInverseController takes omega_WG, v_WG, and iiwa_position as inputs and produces iiwa_velocity_command as its output.
So far, our trajectories are position and orientation trajectories. Our controller needs
translational velocity and angular velocity commands. Fortunately, the trajectory
classes we have used support differentiating the trajectories. In fact, the
PiecewiseQuaternionSlerp is clever enough to return the derivative of the
4-component quaternion trajectory as a 3-component angular velocity trajectory. The
rest is just a matter of wiring up the system diagram.
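Here is a sketch of building and differentiating those trajectories from the keyframes (the keyframe lists are assumptions, and the exact class usage may differ slightly from the notebook and across Drake versions):

import numpy as np
from pydrake.trajectories import PiecewisePolynomial, PiecewiseQuaternionSlerp

# times: list of keyframe times; X_WG_keyframes: list of RigidTransforms.
positions = np.column_stack([X.translation() for X in X_WG_keyframes])
rotations = [X.rotation().ToQuaternion() for X in X_WG_keyframes]

traj_p_WG = PiecewisePolynomial.FirstOrderHold(times, positions)
traj_R_WG = PiecewiseQuaternionSlerp(times, rotations)

# Differentiate to get the controller's commands: a 3-component translational
# velocity trajectory and a 3-component angular velocity trajectory.
traj_v_WG = traj_p_WG.MakeDerivative()
traj_w_WG = traj_R_WG.MakeDerivative()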
Open in Colab
System diagram for the full pick-and-place example: the trajectory sources (labeled pick_and_place) provide omega_WG, v_WG, and wsg_position; the PseudoInverseController consumes omega_WG, v_WG, and iiwa_position and outputs iiwa_velocity, which passes through an integrator into the manipulation_station's iiwa_position input, while wsg_position drives the station's wsg_command input. The station exposes its many other ports (iiwa_position_commanded, iiwa_position_measured, iiwa_velocity_estimated, iiwa_state_estimated, iiwa_torque_commanded, iiwa_torque_measured, iiwa_feedforward_torque, iiwa_torque_external, wsg_state_measured, wsg_force_measured, wsg_force_limit, camera_0_rgb_image, camera_0_depth_image, camera_0_label_image, query_object, contact_results, plant_continuous_state, geometry_poses), with the meshcat_visualizer and lcm_visualization connected for visualization.
It's worth scrutinizing the result. For instance, if you examine the context at the
final time, how close did it come to the desired final gripper position? Are you
happy with the joint positions? If you ran that same trajectory in reverse, then back
and forth (as an industrial robot might), would you expect errors to accumulate?
Our solution above works in many cases. We could potentially move on. But with just
a little more work, we can get a much more robust solution... one that we will be
happy with for many chapters to come.
So what's wrong with the pseudo-inverse controller? You won't be surprised when I
say that it does not perform well around singularities. When the minimum singular
value of the Jacobian gets small, that means that some values in the inverse get very
large. If you ask our controller to track a seemingly reasonable end-effector spatial
velocity, then you might have extremely large velocity commands that result.
There are other important limitations, though, which are perhaps more subtle.
The real robot has constraints, very real constraints on the joint angles, velocities,
accelerations, and torques. If you, for instance, send a velocity command to the iiwa
controller that cannot be followed, that velocity will be clipped. In the mode we
are running the iiwa (joint-impedance mode), the iiwa doesn't know anything about
your end-effector goals. So it will very likely simply saturate your velocity commands
independently joint by joint. The result, I'm afraid, will not be as convenient as a
slower end-effector trajectory. Your end-effector could run wildly off course.
Since we know the limits of the iiwa, a better approach is to take these constraints
into account at the level of our controller. It's relatively straight-forward to take
position, velocity, and acceleration constraints into account; torques would require
a full dynamics model so we won't worry about them yet here.
When I write an optimization of this form, e.g.
$$\min_v \| J^G(q)\, v - V^{G_d} \|^2_2,$$
I will refer to v as the decision variable(s), and I will use $v^*$ to denote the optimal solution (the value of the decision variables that minimizes the cost). Here we have
$$v^* = [J^G(q)]^+ V^{G_d}.$$
Optimization is an incredibly rich topic, and we will put many tools from optimization
theory to use over the course of this text. For a beautiful rigorous but accessible
treatment of convex optimization, I highly recommend [5]; it is free online and even
reading the first chapter can be incredibly valuable. For a very short introduction
to using optimization in Drake, please take a look at the tutorials on "Solving
Mathematical Programs" linked from the Drake front page. I use the term
"mathematical program" almost synonymously with "optimization problem".
Mathematical program is slightly more appropriate if we don't actually have an
objective; only constraints.
You can read this as "find me the joint velocities that achieve my desired gripper
spatial velocity as closely as possible, but satisfy my joint velocity constraints."
The solution to this can be much better than what you would get from solving the
unconstrained optimization and then simply trimming any velocities to respect the
constraints after the fact.
This is, admittedly, a harder problem to solve in general. The solution cannot be
described using only the pseudo-inverse of the Jacobian. Rather, we are going to
solve this (small) optimization problem directly in our controller every time it is
evaluated. This problem has a convex quadratic objective and linear constraints, so
it falls into the class of convex Quadratic Programming (QP). This is a particularly
nice class of optimization problems where we have very strong numerical tools.
We can easily add more constraints to our QP, without significantly increasing the
complexity, as long as they are linear in the decision variables. So how should we
add constraints on the joint position and acceleration?
The controller already accepts the current measured joint positions q as an input;
let us now also take the current measured joint velocities v as a second input. And
we'll use vn for our decision variable -- the next velocity to command. Using a simple
Euler approximation of position and first-order derivative for acceleration gives us
the following optimization problem:
$$\min_{v_n} \ \| J^G(q)\, v_n - V^{G_d} \|^2_2, \qquad (21)$$
$$\text{subject to} \quad v_{min} \le v_n \le v_{max},$$
$$q_{min} \le q + h\, v_n \le q_{max},$$
$$\dot{v}_{min} \le \frac{v_n - v}{h} \le \dot{v}_{max}.$$
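Here is a minimal sketch of that QP using Drake's MathematicalProgram (the limit vectors and the timestep h are placeholders, the import path may differ across Drake versions, and this is not Drake's built-in differential inverse kinematics implementation):

import numpy as np
from pydrake.solvers import MathematicalProgram, Solve

def diff_ik_qp(J_G, V_Gd, q, v, h, q_min, q_max, v_min, v_max, vd_min, vd_max):
    prog = MathematicalProgram()
    v_n = prog.NewContinuousVariables(J_G.shape[1], "v_n")

    # Least-squares tracking of the desired gripper spatial velocity, Eq. (21).
    error = J_G @ v_n - V_Gd
    prog.AddQuadraticCost(error.dot(error))

    # Velocity, position, and acceleration limits; all are simple bounds on v_n.
    prog.AddBoundingBoxConstraint(v_min, v_max, v_n)
    prog.AddBoundingBoxConstraint((q_min - q) / h, (q_max - q) / h, v_n)
    prog.AddBoundingBoxConstraint(v + h * vd_min, v + h * vd_max, v_n)

    result = Solve(prog)
    return result.GetSolution(v_n)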
Here is one way to formulate it. If we had no end-effector goals, then we might write
the controller v = K(q0 − q), a proportional control that would drive the robot to its
nominal configuration. In the unconstrained control case, there is a very elegant way
of implementing this controller as a "secondary" control, guaranteed not to interfere
with our primary end-effector spatial velocity controller, by projecting it into the
nullspace of J G . We can use almost the same trick here.
There are more sophisticated methods if one wishes to establish a strict task prioritization in the presence of constraints (e.g. [6, 7]), but for the simple prioritization we have formulated here, the penalty method is quite reasonable.
Once we have embraced the idea of solving a small optimization problem in our
control loop, many other formulations are possible, too. You will find many in the
literature. Minimizing the least-squares distance between the commanded spatial
velocity and the resulting velocity might not actually be the best solution. The
formulation we have been using heavily in Drake adds an additional constraint that
our solution must move in the same direction as the commanded spatial velocity.
If we are up against constraints, then we may slow down, but we will not deviate
(instantaneously) from the commanded path. It would be a pity to spend a long
time carefully planning collision-free paths for your end-effector, only to have your
controller treat the path as merely a suggestion. Note, however, that your plan
playback system still needs to be smart enough to realize that the slow-down
occurred (open-loop velocity commands are not enough).
$$\max_{v_n, \alpha} \ \alpha, \qquad (23)$$
$$\text{subject to} \quad J^G(q)\, v_n = \alpha V^{G_d},$$
$$0 \le \alpha \le 1,$$
plus the additional constraints above.
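A sketch of that variation, in the same style as the QP above (again illustrative, not Drake's shipped implementation):

from pydrake.solvers import MathematicalProgram, Solve

def diff_ik_scaled(J_G, V_Gd, v_min, v_max):
    prog = MathematicalProgram()
    v_n = prog.NewContinuousVariables(J_G.shape[1], "v_n")
    alpha = prog.NewContinuousVariables(1, "alpha")[0]

    # Move as fast as possible along the commanded direction...
    prog.AddLinearCost(-alpha)
    # ...but only along that direction: J v_n must be a scaling of V_Gd.
    for i in range(J_G.shape[0]):
        prog.AddLinearConstraint(J_G[i, :].dot(v_n) == alpha * V_Gd[i])
    prog.AddBoundingBoxConstraint(0, 1, alpha)
    prog.AddBoundingBoxConstraint(v_min, v_max, v_n)

    result = Solve(prog)
    return result.GetSolution(v_n)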
I've rendered the gripper and the brick with their corresponding body frames.
Given the configuration displayed in the figure, which is a possible value for G pO ?
a. [0.2, 0, -.2]
b. [0, 0.3, .1]
c. [0, -.3, .1]
figures/scaling_spatial_velocity.py
I think you'll find that the solution is quite close to $x^* = 5$, but also that $y^*$ is quite different from the $y^* = 0$ one would obtain if the "secondary" objective were omitted completely.
Note that if the constraint were not active at the optimal solution (e.g. if x = 5, y = −3 satisfied the constraints), then the objectives would not compete.
This seems like an overly simple example. But I think you'll find that it is actually
quite similar to what is happening in the nullspace objective formulation above.
The approach that we'll take here is very geometric. This is in contrast to, and very complementary with, approaches that are more fundamentally driven by data. There
is no question that deep learning has had an enormous impact in perception -- it's
fair to say that it has enabled the current surge in manipulation advances -- and we
will certainly cover it in these notes. But when I've heard colleagues say that "all
perception these days is based on deep learning", I can't help but cry foul. There
has been another surge of progress in the last few years that has been happening in
parallel: the revolution in geometric reasoning, fueled by applications in autonomous
driving and virtual/augmented reality in addition to manipulation. Most advanced
manipulation systems these days combine both "deep perception" and "geometric
perception".
Just as we had many choices when selecting the robot arm and hand, we have many
choices for instrumenting our robot/environment with sensors. Even more so than
our robot arms, the last few years have seen incredible improvements in the quality
and reductions in cost and size for these sensors. This is largely thanks to the cell
phone industry, but the race for autonomous cars has been fueling high-end sensors
as well.
One might think that the most important sensors for manipulation are the touch
sensors (you might even be right!). But in practice today, most of the emphasis is
on camera-based and/or range sensing. At very least, we should consider this first,
since our touch sensors won't do us much good if we don't know where in the world
we need to touch.
Primarily RGB-D (ToF vs projected texture stereo vs ...) cameras and Lidar
The cameras we are using in this course are Intel RealSense D415....
Monocular depth.
4.1.2 Simulation
There are a number of levels of fidelity at which one can simulate a camera like the
D415. We'll start our discussion here using an "ideal" RGB-D camera simulation --
the pixels returned in the depth image represent the true geometric depth in the
direction of each pixel coordinate. In Drake that is represented by the RgbdSensor
system, which can be wired up directly to the SceneGraph.
System diagram: the RgbdSensor takes a geometry_query input and produces color_image, depth_image_32f, depth_image_16u, label_image, and X_WB outputs.
In the ManipulationStation simulation of the entire robot system for class, the
RgbdSensors have already been added, and their output ports exposed as outputs
for the station.
Real depth sensors, of course, are far from ideal -- and errors in depth returns
are not simple Gaussian noise, but rather are dependent on the lighting conditions,
the surface normals of the object, and the visual material properties of the object,
among other things. The color channels are an approximation, too. Even our high-
end renderers can only do as well as the geometries, materials and lighting sources
that we add to our scene, and it is very hard to capture all of the nuances of a
real environment. We will examine real sensor data, and the gaps between modeled
sensors and real sensors, after we start understanding some of the basic operations.
Figure 4.2 - Real depth sensors are messy. The red pixels in the far right image are
missed returns -- the depth sensor returned "maximum depth". Missed returns,
especially on the edges of objects (or on reflective objects) are a common sighting
in raw depth data. This figure is reproduced from [1].
The view of the mustard bottle in the example above makes one primary challenge
of working with cameras very clear. Cameras don't get to see everything! They only
get line of sight. If you want to get points from the back of the mustard bottle, you'll
need to move the camera. And if the mustard bottle is sitting on the table, you'll have
to pick it up if you ever want to see the bottom. This situation gets even worse in
cluttered scenes, where we have our views of the bottle occluded by other objects.
Even when the scene is not cluttered, a head-mounted or table-mounted camera is
often blocked by the robot's hand when it goes to do manipulation -- the moments when you would like the sensors to be at their best are often exactly when they give no information!
It is quite common to mount a depth camera on the robot's wrist in order to help
with the partial views. But it doesn't completely resolve the problem. All of our depth
sensors have both a maximum range and a minimum range. The RealSense D415
returns matches from about 0.3m to 1.5m. That means for the last 30cm of the
approach to the object -- again, where we might like our cameras to be at their best
-- the object effectively disappears!
Figure 4.3 - When luxury permits, we try to put as many cameras as we can into the
robot's environment. In this dish-loading testbed at TRI, we put a total of eight cameras into the scene (it still doesn't resolve the occlusion problem).
Depth image, point cloud, triangulated mesh, signed distance functions, (+ surfels,
occupancy maps, ...)
The data returned by a depth camera takes the form of an image, where each
pixel value is a single number that represents the distance between the camera
and the nearest object in the environment along the pixel direction. If we combine
this with the basic information about the camera's intrinsic parameters (e.g. lens
parameters, stored in the CameraInfo class in Drake) then we can transform this
depth image representation into a collection of 3D points, si . I use s here because
they are commonly referred to as the "scene points" in the algorithms we will
present below. These points have a pose and (optionally) a color value or other
information attached. This is known as a point cloud representation. By default,
their pose is described in the camera frame, C X si , but the DepthImageToPointCloud
system in Drake also accepts P X C to make it easy to transform them into another
frame (such as the world frame).
System diagram: the DepthImageToPointCloud system takes depth_image (and, optionally, color_image and camera_pose) as inputs and outputs point_cloud.
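To make the conversion concrete, here is a small numpy sketch of the pinhole back-projection (the intrinsics values are placeholders; in Drake the same information lives in CameraInfo and the conversion is done for you by DepthImageToPointCloud):

import numpy as np

def depth_to_points(depth, fx, fy, cx, cy):
    # Back-project an (H, W) depth image into an (N, 3) point cloud in the
    # camera frame, dropping invalid (non-finite or non-positive) returns.
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    valid = np.isfinite(points[:, 2]) & (points[:, 2] > 0)
    return points[valid]

# Example call with made-up, roughly D415-like intrinsics:
# p_C = depth_to_points(depth_image, fx=616.0, fy=616.0, cx=320.0, cy=240.0)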
As depth sensors have become so pervasive the field has built up libraries of tools
for performing basic geometric operations on point clouds, and that can be used to
transform back and forth between representations. We can implement many of
the basic operations directly in Drake, but we will use the Open3D library if we need
more. Many older projects used the Point Cloud Library (PCL), which is now defunct
but still has some very useful documentation.
It's important to realize that the conversion of a depth image into a point cloud does
lose some information -- specifically the information about the ray that was cast from
the camera to arrive at that point. In addition to declaring "there is geometry at
this point", the depth image combined with the camera pose also implies that "there
is no geometry in the straight line path between camera and this point". We will
make use of this information in some of our algorithms, so don't discard the depth
images completely! More generally, we will find that each of the different geometry
representations have strengths and weaknesses -- it is very common to keep multiple
representations around and to convert back and forth between them.
4.3 POINT CLOUD REGISTRATION WITH KNOWN CORRESPONDENCES
Let us begin to address the primary objective of the chapter -- we have a known
object somewhere in the robot's workspace, we've obtained a point cloud from our
depth cameras. How do we estimate the pose of the object, X O ?
Our second point cloud will be the scene points, si , that we obtain from our depth
camera, transformed (via the camera intrinsics) into the camera coordinates, C X si .
I'll use Nm and Ns for the number of model and scene points, respectively.
Let us assume, for now, that we also know X C . Is this reasonable? In the case of
a wrist-mounted camera, this would amount to solving the forward kinematics of
the arm. In an environment-mounted camera, this is about knowing the pose of the
cameras in the world. Small errors in estimating those poses, though, can lead to
large artifacts in the estimated point clouds. We therefore take great care to perform
camera extrinsics calibration; I've linked to our calibration code in the appendix.
Note that if you have a mobile manipulation platform (an arm mounted on a moving
base), then all of these discussions still apply, but you likely will perform all of your
registration in the robot's frame instead of the world frame.
The second major assumption that we will make in this section is "known
correspondences". This is not a reasonable assumption in practice, but it helps us
get started. The assumption is that we know which scene point corresponds with
which model point. I'll use a correspondence vector c ∈ [1, Nm ]Ns , where ci = j denotes
that scene point si corresponds with model point mj . Note that we do not assume
that these correspondences are one-to-one. Every scene point corresponds to some
model point, but the converse is not true, and multiple scene points may correspond
to the same model point.
leaving a single unknown transform, X O , for which we must solve. In the previous
chapter I argued for using differential kinematics instead of inverse kinematics; why
is my story different here? Differential kinematics can still be useful for perception,
for example if we want to track the pose of an object after we acquire its initial pose.
But unlike the robot case, where we can read the joint sensors to get our current
state, in perception we need to solve this harder problem at least once.
What does a solution for X O look like? Each model point gives us three constraints,
when psi ∈ R3 . The exact number of unknowns depends on the particular
representation we choose for the pose, but almost always this problem is
dramatically over-constrained. Treating each point as hard constraints on the
relative pose would also be very susceptible to measurement noise. As a result, it will
serve us better to try to find a pose that describes the data, e.g., in a least-squares
sense:
$$\min_{X^O \in SE(3)} \sum_{i=1}^{N_s} \| X^O \, {}^O p^{m_{c_i}} - X^C \, {}^C p^{s_i} \|^2.$$
Here I have used the notation SE(3) for the "special Euclidean group," which is the
standard notation saying the X O must be a valid rigid transform.
To proceed, let's pick a particular representation for the pose to work with. I will use
3x3 rotation matrices here; the approach I'll describe below also has an equivalent
in quaternion form[2]. To write the optimization above using the coefficients of the
rotation matrix and the translation as decision variables, I have:
$$\min_{p^O, R^O} \sum_{i=1}^{N_s} \| p^O + R^O \, {}^O p^{m_{c_i}} - X^C \, {}^C p^{s_i} \|^2, \qquad (1)$$
$$\text{subject to} \quad R^T = R^{-1}, \quad \det(R) = +1.$$
The constraints are needed because not every 3x3 matrix is a valid rotation matrix;
satisfying these constraints ensures that R is a valid rotation matrix. In fact, the
constraint RT = R−1 is almost enough by itself; it implies that det(R) = ±1. But a
matrix that satisfies that constraint with det(R) = −1 is called an "improper rotation",
or a rotation plus a reflection across the axis of rotation.
Let's think about the type of optimization problem we've formulated so far. Given our decision to use rotation matrices, the term $p^O + R^O\,{}^Op^{m_{c_i}} - X^C\,{}^Cp^{s_i}$ is linear in the decision variables (think $Ax \approx b$), making the objective a convex quadratic. How about the constraints? The first constraint is a quadratic equality constraint (to see it, rewrite it as $RR^T = I$, which gives 9 constraints, each quadratic in the elements of R). The determinant constraint is cubic in the elements of R.
In 2D, we can actually linearly parameterize a rotation matrix with just two
variables:
$$R = \begin{bmatrix} a & -b \\ b & a \end{bmatrix}.$$
Is that what you expected? I generated this plot using just two model/scene points,
but adding more will only change the shape of the quadratic, but not its minimum.
And on the interactive version of the plot, I've added a slider so that you control
the parameter, θ, that represents the ground truth rotation described by R. Try it
out!
In the case with no noise in the measurements (e.g. the scene points are exactly
the model points modified by a rotation matrix), then the minimum of the objective
function already satisfies the constraint. But if you add just a little noise to the
points, then this is no longer true, and the constraint starts to play an important
role.
The geometry should be roughly the same in 3D, though clearly in much higher
dimensions. But I hope that the plot makes it perhaps a little less surprising that
this problem has an elegant numerical solution based on the singular value
decomposition (SVD).
If you are curious, these come from taking the gradient of the Lagrangian with
respect to p, and setting it equal to zero:
$$\sum_{i=1}^{N_s} 2\left(p + R\,{}^Op^{m_{c_i}} - p^{s_i}\right) = 0 \quad \Rightarrow \quad p^* = \frac{1}{N_s}\sum_{i=1}^{N_s} p^{s_i} - R^*\left(\frac{1}{N_s}\sum_{i=1}^{N_s} {}^Op^{m_{c_i}}\right),$$
but don't forget the geometric interpretation above. This is just another way to see
that we can substitute the right-hand side in for p in our objective and solve for R by
itself.
$$R^* = UDV^T, \qquad p^* = \bar{p}^s - R^*\,{}^O\bar{p}^m,$$
where D is the diagonal matrix with entries $[1, 1, \det(UV^T)]$ [3]. This may seem like magic, but replacing the singular values in the SVD with ones gives the optimal projection back onto the orthonormal matrices, and the diagonal matrix ensures that we do not get an improper rotation. There are many derivations available in the literature, but many of them drop the determinant constraint instead of handling the improper rotation. See [4] (section 3) for one of my early favorites.
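Here is a compact numpy sketch of that solution for known correspondences (a generic implementation of the standard SVD procedure, not the notebook's exact code):

import numpy as np

def register_known_correspondences(p_Om, p_s):
    # p_Om: 3 x N model points (object frame), p_s: 3 x N corresponding scene
    # points (world frame). Returns R, p minimizing sum_i ||p + R m_i - s_i||^2.
    m_bar = p_Om.mean(axis=1, keepdims=True)
    s_bar = p_s.mean(axis=1, keepdims=True)
    W = (p_s - s_bar) @ (p_Om - m_bar).T
    U, _, Vt = np.linalg.svd(W)
    D = np.diag([1.0, 1.0, np.linalg.det(U @ Vt)])  # guard against reflections
    R = U @ D @ Vt
    p = s_bar.flatten() - R @ m_bar.flatten()
    return R, p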
In the example code, I've made a random object (based on a random set of points
in the x-y plane), and perturbed it by a random 2D transform. The blue points are
the model points in the model coordinates, and the red points are the scene points.
The green dashed lines represent the (perfect) correspondences. On the right, I've
plotted both points again, but this time using the estimated pose to put them both
in the world frame. As expected, the algorithm perfectly recovers the ground truth
transformation.
Open in Colab
What is important to understand is that once the correspondences are given we have
an efficient and robust numerical solution to estimating the pose.
4.4 ITERATIVE CLOSEST POINT (ICP)
So how do we get the correspondences? In fact, if we were given the pose of the
object, then figuring out the correspondences is actually easy: the corresponding
point on the model is just the nearest neighbor / closest point to the scene point
when they are transformed into a common frame.
This observation suggests a natural iterative scheme, where we start with some
initial guess of the object pose and compute the correspondences via closest points,
then use those correspondences to update the estimated pose. This is one of the
famous and often used (despite its well-known limitations) algorithms for point cloud
registration: the iterative closest point (ICP) algorithm.
In words, we want to find the point in the model that is the closest in Euclidean
distance to the transformed scene points. This is the famous "nearest neighbor"
problem, and we have good numerical solutions (using optimized data structures)
for it. For instance, Open3D uses FLANN[5].
Although solving for the pose and the correspondences jointly is very difficult,
ICP leverages the idea that if we solve for them independently, then both parts
have good solutions. Iterative algorithms like this are a common approach taken
in optimization for e.g. bilinear optimization or expectation maximization. It is
important to understand that this is a local solution to a non-convex optimization
problem. So it is subject to getting stuck in local minima.
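Here is a minimal ICP sketch built on the registration routine above (it uses scipy for the nearest-neighbor queries and is for intuition only, not the notebook's implementation):

import numpy as np
from scipy.spatial import cKDTree

def icp(p_Om, p_s, R0, p0, max_iterations=30):
    # p_Om: 3 x Nm model points (object frame), p_s: 3 x Ns scene points (world
    # frame), (R0, p0): initial guess for the object pose.
    R, p = R0, p0
    model_tree = cKDTree(p_Om.T)
    for _ in range(max_iterations):
        # Correspondences: the nearest model point to each scene point, found by
        # pulling the scene points back into the object frame.
        p_Os = R.T @ (p_s - p.reshape(3, 1))
        _, c = model_tree.query(p_Os.T)
        # Pose update with the correspondences held fixed (SVD solution above).
        R, p = register_known_correspondences(p_Om[:, c], p_s)
    return R, p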
Here is ICP running on the random 2D objects. Blue are the model points, red are
the scene points, green dashed lines are the correspondences. I encourage you to
run the code yourself.
I've included one of the animation where it happens to converge to the true
optimal solution. But it gets stuck in a local minima more often than not! I
hope that stepping through will help your intuition. Remember that once the
correspondences are correct, the pose estimation will recover the exact solution.
So every one of those steps before convergence has at least a few bad
correspondences. Look closely!
Open in Colab
Intuition about these local minima has motivated a number of ICP variants, including
point-to-plane ICP, normal ICP, ICP that use color information, feature-based ICP, etc.
The example above is sufficient to demonstrate the problems that ICP can have
with local minima. But we have so far been dealing with unreasonably perfect point
clouds. Point cloud registration is an important topic in many fields, and not all of the
approaches you will find in the literature are well suited to dealing with the messy
point clouds we have to deal with in robotics.
Example 4.5 (ICP with messy point clouds)
I've added a number of parameters for you to play with in the notebook to add
outliers (scene points that do not actually correspond to any model point), to add
noise, and/or to restrict the points to a partial view. Please try them by themselves
and in combination.
Click here for the animation.
Open in Colab
Figure 4.7 - A sample run of ICP with outliers (modeled as additional points
drawn from a uniform distribution over the workspace).
Figure 4.8 - A sample run of ICP with Gaussian noise added to the positions of the
scene points.
Figure 4.9 - A sample run of ICP with scene points limited to a partial view. They
aren't all this bad, but when I saw that one, I couldn't resist using it as the
example!
This leaves us with a "chicken or the egg" problem -- we need a reasonable estimate
of the pose to detect outliers, but for very messy point clouds it may be hard to
get a reasonable estimate in the presence of those outliers. So where do we begin?
One common approach that was traditionally used with ICP is an algorithm called
RANdom SAmple Consensus (RANSAC) [6]. In RANSAC, we select a number of
random (typically quite small) subsets of "seed" points, and compute a pose estimate
for each. The subset with the largest number of "inliers" -- points that are below the
maximum correspondence distance -- is considered the winning set and can be used
to start the ICP algorithm.
There is another important and clever idea to know. Remember the important
observation that we can decouple the solution for rotation and translation by
exploiting the fact that the relative position between points depends on the rotation
but is invariant to translation. We can take that idea one step further and note
that the relative distance between points is actually invariant to both rotation and
translation. This has traditionally been used to separate the estimation of scale from
the estimation of translation and rotation (I've decided to ignore scaling throughout
this chapter, since it doesn't make immediate sense for our proposed problem). In
[7] they proposed that this can also be used to prune outliers even before we have an
initial pose estimate. If the distance between a pair of points in the scene is unlike
any distance between any pair of points in the model, then one of those two points
is an outlier. Finding the maximum set of inliers by this metric would correspond
to finding the maximum clique in a graph. Unfortunately, finding optimal cliques is
probably too expensive for large point clouds, so in [6] conservative approximations
are used instead.
4.5.2 Point cloud segmentation
Not all outliers are due to bad returns from the depth camera, very often they are
due to other objects in the scene. Even in the most basic case, our object of interest
will be resting on a table or in a bin. If we know the geometry of the table/bin
a priori, then we can subtract away points from that region. Alternatively, we can
crop the point cloud to a region of interest. This can be sufficient for very simple or
uncluttered scenes, and will be just enough for us to accomplish our goal for this
chapter.
More generally, if we have multiple objects in the scene, we may still want to find the
object of interest (mustard bottle?). If we had truly robust point cloud registration
algorithms, then one could imagine that the optimal registration would allow us
to correspond the model points with just a subset of the scene points; point cloud
registration could solve the object detection problem! Unfortunately, our algorithms
aren't strong enough.
As a result, we almost always use some other algorithm to "segment" the scene in
a number of possible objects, and run registration independently on each segment.
There are numerous geometric approaches to segmentation[9]. But these days we
tend to use neural networks for segmentation, so I won't go into those details here.
The problem of outliers or multiple objects in the scene challenges our current
approach to correspondences. So far, we've used the notation ci = j to represent the
correspondence of every scene point to a model point. But note the asymmetry in our
algorithm so far (scene ⇒ model). I chose this direction because of the motivation of
partial views from the mustard bottle example from the chapter opening. Once we
start thinking about tables and other objects in the scene, though, maybe we should
have done model ⇒ scene? You should verify for yourself that this would be a minor
change to the ICP algorithm. But what if we have multiple objects and partial views?
It would be simple enough to allow ci to take a special value for "no correspondence",
and indeed that helps. In that case, we would hope that, at the optimum, the scene
points corresponding to the model of interest would be mapped to their respective
model points, and the scene points from outliers and other objects get labelled as "no
correspondence". The model points that are from the occluded parts of the object
would simply not have any correspondences associated with them.
There is an alternative notation that we can use which is slightly more general.
Let's denote the correspondence matrix C ∈ {0, 1}Ns ×Nm , with Cij = 1 iff scene point i
corresponds with model point j. Using this notation, we can write our estimation
step (given known correspondences) as:
$$\min_{X^O \in SE(3)} \sum_{i=1}^{N_s} \sum_{j=1}^{N_m} C_{ij}\, \| X^O\, {}^O p^{m_j} - p^{s_i} \|^2.$$
This subsumes our previous approach: if ci = j then set Cij = 1 and the rest of the
row equal to zero. If the scene point i is an outlier, then set the entire row to zero.
The previous notation was a more compact representation for the true asymmetry
in the "closest point" computation that we did above, and made it clear that we only
needed to compute Ns distances. But this more general notation will take us to the
next round.
What if we relax our strict binary notion of correspondences, and think of them
instead as correspondence weights Cij ∈ [0, 1] instead of Cij ∈ {0, 1}? This is the
main idea behind the "coherent point drift" (CPD) algorithm [10]. If you read the
CPD literature you will be immersed in a probabilistic exposition using a Gaussian
Mixture Model (GMM) as the representation. The probabilistic interpretation is
useful, but don't let it confuse you; trust me that it's the same thing. To maximize
the likelihood on a Gaussian, step one is almost always to take the log because it is a
monotonic function, which gets you right back to our quadratic objective.
The probabilistic interpretation does give a natural mechanism for setting the
correspondence weights on each iteration of the algorithm, by thinking of them as
the probability of the scene points given the model points:
$$C_{ij} = \frac{1}{a_i} \exp\left( -\frac{1}{2} \frac{\| X^O\, {}^O p^{m_j} - p^{s_i} \|^2}{\sigma^2} \right), \qquad (4)$$
which is the standard density function for a Gaussian, where σ is the standard deviation and $a_i$ is the proper normalization constant to make the probabilities sum to one. It is also natural to encode the probability of an outlier in this formulation (it simply modifies the normalization constant). Click to see the normalization constant, but it's really an implementation detail. The normalization works out to be
$$a_i = \sum_{j=1}^{N_m} \exp\left( -\frac{1}{2} \frac{\| X^O\, {}^O p^{m_j} - p^{s_i} \|^2}{\sigma^2} \right) + (2\pi\sigma^2)^{\frac{3}{2}} \frac{w}{1-w} \frac{N_m}{N_s}, \qquad (5)$$
where w is the probability of an outlier.
The CPD algorithm is now very similar to ICP, alternating between assigning the
correspondence weights and updating the pose estimate. The pose estimation step
is almost identical to the procedure above, with the "central" points now
$${}^O\bar{p}^m = \frac{1}{N_C}\sum_{i,j} C_{ij}\, {}^O p^{m_j}, \qquad \bar{p}^s = \frac{1}{N_C}\sum_{i,j} C_{ij}\, p^{s_i}, \qquad N_C = \sum_{i,j} C_{ij},$$
$$W = \sum_{i,j} C_{ij}\, (p^{s_i} - \bar{p}^s)({}^O p^{m_j} - {}^O\bar{p}^m)^T.$$
The rest of the updates, for extracting R using SVD and solving for p given R, are the
same. You won't see the sums explicitly in the code, because each of those steps has
a particularly nice matrix form if we collect the points into a matrix with one point
per column.
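As a sketch of one iteration of those updates in matrix form (a simplified CPD-style step with a fixed σ and outlier weight; the full algorithm also updates σ between iterations):

import numpy as np

def soft_correspondences(p_Om, p_s, R, p, sigma, w=0.1):
    # Returns the Ns x Nm matrix of correspondence weights C_ij, Eqs. (4)-(5).
    p_Wm = R @ p_Om + p.reshape(3, 1)                      # 3 x Nm
    diff = p_s.T[:, None, :] - p_Wm.T[None, :, :]          # Ns x Nm x 3
    K = np.exp(-0.5 * np.sum(diff**2, axis=2) / sigma**2)  # Ns x Nm
    Nm, Ns = p_Om.shape[1], p_s.shape[1]
    outlier = (2 * np.pi * sigma**2) ** 1.5 * w / (1 - w) * Nm / Ns
    return K / (K.sum(axis=1, keepdims=True) + outlier)

def cpd_pose_update(p_Om, p_s, C):
    # Weighted version of the SVD pose solution, using the weights C.
    N_C = C.sum()
    m_bar = (p_Om @ C.sum(axis=0)) / N_C      # weighted model centroid
    s_bar = (p_s @ C.sum(axis=1)) / N_C       # weighted scene centroid
    W = (p_s - s_bar.reshape(3, 1)) @ C @ (p_Om - m_bar.reshape(3, 1)).T
    U, _, Vt = np.linalg.svd(W)
    D = np.diag([1.0, 1.0, np.linalg.det(U @ Vt)])
    R = U @ D @ Vt
    p = s_bar - R @ m_bar
    return R, p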
The word on the street is that CPD is considerably more robust than ICP with its hard correspondences and quadratic objective; the Gaussian model mitigates the effect of
outliers by setting their correspondence weight to nearly zero. But it is also more
expensive to compute all of the pairwise distances for large point clouds. In [11],
we point out that a small reformulation (thinking of the scene points as inducing a
distribution over the model points, instead of vice versa) can get us back to summing
over the scene points only. It enjoys many of the robustness benefits of CPD, but also
the performance of algorithms like ICP.
All of the algorithms we've discussed so far have exploited the SVD solution to
the pose estimate given correspondences, and alternate between estimating the
correspondences and estimating the pose. There is another important class of
algorithms that attempt to solve for both simultaneously. This makes the
optimization problem nonconvex, which suggests they will still suffer from local
minima like we saw in the iterative algorithms. But many authors argue that the
solution times using nonlinear solvers can be on par with the iterative algorithms
(e.g. [12]).
Figure 4.11 - A variety of objective functions for robust registration. Click here for
the interactive version.
Another major advantage of using nonlinear optimization is that these solvers can
accommodate a wide variety of objective functions to achieve the same sort of
robustness to outliers that we saw from CPD. I've plotted a few of the popular
objective functions above. The primary characteristic of these functions is that they
taper, so that larger distances eventually do not result in higher cost. Importantly,
like in CPD, this means that we can consider all pairwise distances in our objective,
because if outliers add a constant value (e.g. 1) to the cost, then they have no
gradient with respect to the decision variables and no impact on the optimal
solution. We can therefore write, for our favorite loss function, $l(x)$, an objective of the form, e.g.
$$\min_{X^O} \sum_{i=1}^{N_s} \sum_{j=1}^{N_m} l\left( \| X^O\, {}^O p^{m_j} - p^{s_i} \| \right).$$
And what are the decision variables? We can also exploit the additional flexibility of
the solvers to use minimal parameterizations -- e.g. θ for rotations in 2D, or Euler
angles for rotations in 3D. The objective function is more complicated but we can
get rid of the constraints.
And in the case of the circle, every other model point contributes the same cost. This is not a convex function, but every minimum is a globally optimal solution (just wrapped around by 2π).
In fact, even if we have a more complicated, non-circular, geometry, then this same argument holds. Every point will incur an error as it rotates around the circle, but may have a different radius. The error for all of the model points will decrease as $\cos\theta_{err}$ goes to one. So every minimum is a globally optimal solution. We haven't actually introduced local minima (yet)!
It is the correspondences, and other constraints that we might add to the problem,
that really introduce local minima.
There is an exceptionally nice trick for speeding up the computation in these
nonlinear optimizations, but it does require us to go back to thinking about the
minimum distance from a scene point to the model, as opposed to the pairwise
distance between points. This, I think, is not a big sacrifice now that we have stopped enumerating correspondences explicitly. Let's slightly modify our objective above to be of the form
$$\min_{X^O} \sum_{i=1}^{N_s} l\left( \min_j \| X^O\, {}^O p^{m_j} - p^{s_i} \| \right).$$
In words, we can apply an arbitrary robust loss function, but we will only apply it to the minimum distance between the scene point and the model.
The nested min functions look a little intimidating from a computational perspective.
But this form, coupled with the fact that in our application the model points are
fixed, actually enables a fantastic optimization. First, let's move our optimization into
the model frame by changing the inner term to
Now realize that for any 3D point x, we can precompute the minimum distance
from x to any point on our model, call it ϕ(x). The term above is just ϕ(O X W psi ).
This function of 3D space, sometimes called a distance field, and the closely related
signed distance function (SDF) and level sets are a common representation in
geometric modeling. And they are precisely what we need to quickly compute the
cost from many scene points.
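A small sketch of that idea, approximating the precomputed distance field φ with a KD-tree over the model points (a truly precomputed grid or SDF would avoid even the tree query; the truncated loss here is just one choice of robust loss):

import numpy as np
from scipy.spatial import cKDTree

# "Precompute" the distance-to-model function once; only the pose changes
# during the optimization, never the model.
model_tree = cKDTree(p_Om.T)   # p_Om: 3 x Nm model points, object frame

def robust_cost(R, p, p_s, loss=lambda d: np.minimum(d**2, 0.05**2)):
    # Sum of a truncated least-squares loss over the scene points p_s (3 x Ns),
    # evaluated by pulling the scene into the object frame and querying phi.
    p_Os = R.T @ (p_s - p.reshape(3, 1))
    phi, _ = model_tree.query(p_Os.T)   # minimum distance to the model
    return loss(phi).sum()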
Figure 4.12 - Contour plot of the distance function for our point-based
representation of the rectangle (left), and the signed distance function for our
"mesh-based" 2D rectangle. The smoother distance contours in the mesh-based
representation can help alleviate some of the local minima problems we observed
above.
As the figure I've included above suggests, we can use the precomputed distance
functions to represent the minimum distance to a mesh model instead of just a
point cloud. And with some care, we can define alternative distance-like functions
in 3D space that have smooth subgradients which can work better with nonlinear
optimization. There is a rich literature on this topic; see [14] for a nice reference.
Is there any hope of exploiting the beautiful structure that we found in the pose
estimation with known correspondences with an algorithm that searches for
correspondences and pose simultaneously?
There are some algorithms that claim global optimality for the ICP objective,
typically using a branch and bound strategy [15, 16, 17], but we have been
disappointed with the performance of these on real point clouds. We proposed
another, using branch and bound to explicitly enumerate the correspondences with a
truncated least-squares objective, and a tight relaxation of the SE(3) constraints[18],
but this method is still limited to relatively small problems.
In recent work, TEASER [6] takes a different approach, using the observation
that truncated least squares can be optimized efficiently for the case of scale and
translation. For rotations, they solve a semi-definite relaxation of the truncated least-
squares objective. This relaxation is not guaranteed to be tight, but (like all of the
convex relaxations we describe in this chapter), it is easy to certify after the fact
whether the optimal solution satisfied the original constraints.
4.6 NON-PENETRATION AND "FREE-SPACE" CONSTRAINTS
We've explored two main lineages of algorithms for the pose estimation -- one based
on the beautiful SVD solutions and the other based on nonlinear optimization. As we
will see, non-penetration and free-space constraints are, in most cases, non-convex
constraints, so are a better match for the nonlinear optimization approach. But
there are some examples of convex non-penetration constraints (e.g., when points
must not penetrate a half-plane) and it is possible to include these in our convex
optimization approach. I'll just show it in a simple example.
$$R = \begin{bmatrix} a & -b \\ b & a \end{bmatrix}.$$
Recall that the rotation matrix constraints for this form reduce to $a^2 + b^2 = 1$. These are non-convex constraints, but we can relax them to $a^2 + b^2 \le 1$. This is almost
exactly the problem we visualized in this earlier example, except that we will add
translations here. The relaxation is exactly changing the non-convex unit circle
constraint into the convex unit disk constraint. Based on that visualization, we
have reason to be optimistic that the unit disk relaxation could be tight.
Open in Colab
Please be careful. Now that we have a constraint that depends on p, our original
approach to solving for R independently is no longer valid (at least not without
modification). I've provided a simple implementation in colab.
Figure 4.14 - The convex approximation to the constrained pose estimation
problem with known correspondences. I have added constraints that both the x
and y positions of all of the points in the estimated pose must be greater than 0 in
the world frame (as illustrated by the green lines).
This is a slightly exaggerated case, where the scene points are really pulling the
box into the constraints. We can see that the relaxation is not tight in this situation;
the box is being shrunk slightly to explain the data.
Constrained optimizations like this can be made relatively efficient, and are
suitable for use in an ICP loop for modest-sized point clouds. But the major
limitation is that we can only take this approach for convex non-penetration
constraints (or convex approximations of the constraints), like the "half-plane"
constraints we used here. It is probably not very suitable for non-penetration
between two objects in the scene.
For the record, the fact that I was able to easily add this example here is actually
pretty special. I don't know of another toolbox that brings together the advanced
optimization algorithms with the geometry / physics / dynamical systems in the
way that Drake does.
Figure 4.15 - The rays cast from the camera to the depth returns define a "free
space obstacle" that other objects should not penetrate.
Ultimately, as much as I prefer the convex formulations with their potential for
deeper understanding (by us) and for stronger guarantees (for the algorithms), the
ability to add non-penetration constraints and free-space constraints is simply too
valuable to ignore. I hope that, if you come back to these notes in a year or two, I will
be able to report that we have strong results for these more complex formulations.
But for now, in most applications, I will steer you towards the nonlinear optimization
approaches that can take advantage of these constraints.
Although I very much appreciate the ability to think rigorously about these
geometric approaches, we need to think critically about whether we are solving for
the right objectives.
Even if we stay in the realm of pure geometry, it is not clear that a least-squares
objective with equal weights on the points is correct. Imagine a tall skinny book
lying flat on the table -- we might only get a very small number of returns from the
edges of the book, but those returns contain proportionally much more information
than the slew of returns we might get from the book cover. It is no problem to include
relative weights in the estimation objectives we have formulated here, but we don't
yet have very successful geometry-based algorithms for deciding what those weights
should be in any general way. (There has been a lot of research in this direction, but
it's a hard problem.)
Please realize, though, that as beautiful as geometry is, we are so far all but ignoring
the most important information that we have: the color values! While it is possible
to put color and other features into an ICP-style pipeline, it is very hard to write a
simple distance metric in color space (the raw color values for a single object might
be very different in different lighting conditions, and the color values of different
objects can look very similar). Advances in computer vision, especially those based
on deep learning over the last few years, have brought new life to this problem.
When I asked my students recently "If you had to give up one of the channels, either
depth or color, which would you give up?" the answer was a resounding "I'd give up
depth; don't take away my color!" That's a big change from just a few years ago.
But deep learning and geometry can (should?) work nicely together. For example,
identifying correspondences between point clouds has been a major theme in this
chapter -- and we haven't completely solved it. This is one of the many places where
the color values and deep learning have a lot to offer [20]. We'll turn our attention to
them soon!
4.9 EXERCISES
rotation. If you only keep the best one (lowest estimation error), then how much
does it improve the performance over a single ICP run?
The model points and the scene points are given as:
The true correspondence is given by their numbering, but note that we don't
know the true correspondence - ICP simply determines it based on nearest
neighbors. Given our vanilla-ICP cost (sum of pairwise distances squared), we
can parametrize the 2D pose $X^O$ with $p_x, p_y, a, b$ and write down the resulting
optimization as:
$$\min_{p_x, p_y, a, b} \; \sum_i \left\| \begin{bmatrix} p_x \\ p_y \end{bmatrix} + \begin{bmatrix} a & -b \\ b & a \end{bmatrix} p^{m_i} - p^{s_i} \right\|^2 \quad \text{s.t. } a^2 + b^2 = 1.$$
a. When the initial guess for the pose results in the correct
correspondences based on nearest-neighbors, show that the ICP cost is
minimum when $p_x, p_y, b = 0$ and $a = 1$. Describe the set of initial poses
that results in convergence to the true solution.
b. When the initial guess results in a flipped correspondence, show that
the ICP cost is minimum when $p_x, p_y, b = 0$ and $a = -1$. Describe the set of
initial poses that results in convergence to this incorrect solution.
c. Remember that correspondences need not be one-to-one. In fact, they are
often not when computed based on nearest-neighbors. By constructing
the data matrix W , show that when both scene points correspond to one
model point, ICP gets stuck and does not achieve zero-cost. (HINT: You
may assume that doing SVD on a zero matrix leads to identity matrices
for U and V ).
For this exercise, you will remove the environmental background from the
Stanford bunny pointcloud using the RANSAC algorithm. You will work
exclusively in this notebook. You will be asked to complete the following steps:
[1] Chris Sweeney, Greg Izatt, and Russ Tedrake. A supervised approach to
predicting noise in depth images. In Proceedings of the IEEE International
Conference on Robotics and Automation. IEEE, May 2019. [ www: ]
[2] Berthold K.P. Horn. Closed-form solution of absolute orientation using unit
quaternions. Journal of the Optical Society of America A, 4(4):629--642, April
1987.
[3] Andriy Myronenko and Xubo Song. On the closed-form solution of the rotation
matrix arising in computer vision problems. arXiv preprint arXiv:0904.1613,
2009.
[4] Robert M Haralick, Hyonam Joo, Chung-Nan Lee, Xinhua Zhuang, Vinay G
Vaidya, and Man Bae Kim. Pose estimation from corresponding point data. IEEE
Transactions on Systems, Man, and Cybernetics, 19(6):1426--1446, 1989.
[5] Marius Muja and David G Lowe. Flann, fast library for approximate nearest
neighbors. In International Conference on Computer Vision Theory and
Applications (VISAPP’09), volume 3. INSTICC Press, 2009.
[6] M.A. Fischler and R.C. Bolles. Random sample consensus: A paradigm for
model fitting with applications to image analysis and automated cartography.
In Communications of the Association for Computing Machinery (ACM),
volume 24, pages 381--395, 1981.
[7] Heng Yang, Jingnan Shi, and Luca Carlone. Teaser: Fast and certifiable point
cloud registration. arXiv preprint arXiv:2001.07715, 2020.
[8] Pat Marion, Peter R. Florence, Lucas Manuelli, and Russ Tedrake. A pipeline
for generating ground truth labels for real RGBD data of cluttered scenes.
In International Conference on Robotics and Automation (ICRA), Brisbane,
Australia, May 2018. [ www: ]
[9] Anh Nguyen and Bac Le. 3d point cloud segmentation: A survey. In 2013
6th IEEE conference on robotics, automation and mechatronics (RAM), pages
225--230. IEEE, 2013.
[10] Andriy Myronenko and Xubo Song. Point set registration: Coherent point drift.
IEEE transactions on pattern analysis and machine intelligence,
32(12):2262--2275, 2010.
[11] Wei Gao and Russ Tedrake. Filterreg: Robust and efficient probabilistic point-
set registration using gaussian filter and twist parameterization. In
Proceedings of the Conference on Computer Vision and Pattern Recognition
(CVPR), June 2019. [ www: ]
[12] Andrew W Fitzgibbon. Robust registration of 2d and 3d point sets. Image and
vision computing, 21(13-14):1145--1153, 2003.
[13] Jeong Joon Park, Peter Florence, Julian Straub, Richard Newcombe, and Steven
Lovegrove. DeepSDF: Learning continuous signed distance functions for shape
representation. In Proceedings of the IEEE International Conference on
Computer Vision and Pattern Recognition (CVPR), June 2019.
[ www: ]
[14] Stanley Osher and Ronald Fedkiw. Level Set Methods and Dynamic Implicit
Surfaces. Applied Mathematical Sciences. Springer, 2003.
[15] Natasha Gelfand, Niloy J Mitra, Leonidas J Guibas, and Helmut Pottmann.
Robust global registration. In Symposium on geometry processing, volume 2,
page 5. Vienna, Austria, 2005.
[16] Nicolas Mellado, Dror Aiger, and Niloy J Mitra. Super 4pcs fast global
pointcloud registration via smart indexing. In Computer Graphics Forum,
volume 33, pages 205--215. Wiley Online Library, 2014.
[17] Jiaolong Yang, Hongdong Li, Dylan Campbell, and Yunde Jia. Go-ICP: A globally
optimal solution to 3d icp point-set registration. IEEE transactions on pattern
analysis and machine intelligence, 38(11):2241--2254, 2015.
[18] Gregory Izatt, Hongkai Dai, and Russ Tedrake. Globally optimal object pose
estimation in point clouds with mixed-integer programming. International
Symposium on Robotics Research, Dec 2017. [ www: ]
[19] Tanner Schmidt, Richard Newcombe, and Dieter Fox. Dart: dense articulated
real-time tracking with consumer depth cameras. Autonomous Robots,
39(3):239--258, 2015.
[20] Peter R. Florence*, Lucas Manuelli*, and Russ Tedrake. Dense object nets:
Learning dense visual object descriptors by and for robotic manipulation. In
Conference on Robot Learning (CoRL), October 2018. [ www: ]
CHAPTER 5 Open in Colab
Bin Picking
Our initial study of geometric perception gave us some powerful tools, but also
revealed some major limitations. In the next chapter, we will begin applying
techniques from deep learning to perception. Spoiler alert: those methods are going
to be insatiable in their hunger for data. So before we get there, I'd like to take a
brief excursion into a nice subproblem that might help us feed that need.
In this chapter we'll consider the simplest version of the bin picking problem: the
robot has a bin full of random objects and simply needs to move those objects from
one bin to the other. We'll be agnostic about what those objects are and about where
they end up in the other bin, but we would like our solution to achieve a reasonable
level of performance for a very wide variety of objects. This turns out to be a pretty
convenient way to create a training ground for robot learning -- we can set the robot
up to move objects back and forth between bins all day and night, and intermittently
add and remove objects from the bin to improve diversity. Of course, it is even easier
in simulation!
If our goal is to test a diversity of bin picking situations, then the first task is to figure
out how to generate diverse simulations. How should we populate the bin full of
objects? So far we've set up each simulation by carefully setting the initial positions
(in the Context) for each of the objects, but that approach won't scale.
The first change that you'll see in this chapter is that I will start using Drake's
"Model Directives" for parsing more complex environments from yaml files. That
will help us set up the bins, cameras, and the robot which can be welded in fixed
positions. But generating the distributions over objects and object initial conditions
is more bespoke, and we need an algorithm for that.
In the real world, we would probably just dump the random objects into the bin.
That's a decent strategy for simulation, too. We can roughly expect our simulation
to faithfully implement multibody physics as long as our initial conditions (and time
step) are reasonable; the physics isn't well defined if we initialize the Context with
multiple objects occupying the same physical space. The simplest and most common
way to avoid this is to generate a random number of objects in random poses, with
their vertical positions staggered so that they trivially start out of penetration.
If you look for them, you can find animations of large numbers of falling objects in
the demo reels for most advanced multibody simulators. But we're doing it for a
purpose!
I've set the initial positions for each object in the Context to be uniformly
distributed over the horizontal position, uniformly rotated, and staggered every
0.1m in their initial vertical position. We only have to simulate for a little more than
a second for the bricks to come to rest and give us our intended "initial conditions".
Open in Colab
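For reference, the pose-staggering idea is only a few lines; here is a minimal sketch (assuming `plant`, `plant_context`, and the list of free bodies come from the scene setup in the notebook; the ranges and seed are placeholders):

```python
import numpy as np
from pydrake.all import (RandomGenerator, RigidTransform,
                         UniformlyRandomRotationMatrix)

# Place each free body at a uniformly random horizontal position and
# orientation, with the vertical positions staggered by 0.1m so that no two
# objects start out in penetration.
def set_staggered_initial_poses(plant, plant_context, bodies, seed=42):
    generator = RandomGenerator(seed)
    rng = np.random.default_rng(seed)
    z = 0.2  # start a little above the bin floor
    for body in bodies:
        translation = [rng.uniform(-0.15, 0.15), rng.uniform(-0.2, 0.2), z]
        X_WO = RigidTransform(UniformlyRandomRotationMatrix(generator),
                              translation)
        plant.SetFreeBodyPose(plant_context, body, X_WO)
        z += 0.1  # stagger the next object 0.1m higher
```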
It's not really any different to do this with any random objects -- here is what it looks
like when I run the same code, but replacing the brick with a random draw from a
few objects from the YCB dataset. It somehow amuses me that we can see the central
limit theorem hard at work, even when our objects are slightly ridiculous.
Open in Colab
I had to decide how to visualize the results of this one for you. The mesh and
texture map files for the YCB objects are very large, so downloading many of
them to your browser from Colab felt a bit too painful. If you've decided to run
the notebooks on your local machine, then go ahead and run drake_visualizer
before running this test to see the live simulation (drake visualizer will load the
mesh files directly from your disk, avoiding the download). For Colab, I've decided
to add a camera to the scene and take a picture after simulating for a few
seconds. After all, that's perhaps the data that we're actually looking for.
Please appreciate that bins are a particularly simple case for generating random
scenarios. If you wanted to generate random kitchen environments, for example,
then you wouldn't be as happy with a solution that drops refrigerators, toasters, and
stools from uniformly random i.i.d. poses. In those cases, authoring reasonable
distributions gets much more interesting; we will revisit the topic of generative
scene models later in the notes.
Even in the case of bins, we should try to think critically about whether dropping
objects from a height is really the best solution. Given our discussion in the last
chapter about writing optimizations with non-penetration constraints, I hope you are
already asking yourself: why not use those constraints again here? Let's explore that
idea a bit further.
I won't dive into a full discussion of multibody dynamics nor multibody simulation,
though I do have more notes available here. What is important to understand here
is that the equations of motion of our multibody system are described by differential
equations of the form:
$$M(q)\dot{v} + C(q, v)v = \tau_g(q) + \sum_i J_i^T(q) F^{c_i}.$$
The left side of the equation is just a generalization of "mass times acceleration",
with the mass matrix, $M$, and the Coriolis terms $C$. The right-hand side is the sum
of the (generalized) forces, with $\tau_g(q)$ capturing the terms due to gravity, and $F^{c_i}$
the spatial force due to the $i$th contact. $J_i(q)$ is the $i$th "contact Jacobian" -- it is the
Jacobian that maps from the generalized velocities to the spatial velocity of the $i$th
contact frame.
But we still need to understand where the contact forces come from.
Open in Colab
When I play with the GUI above, I feel almost overwhelmed. First, overwhelmed by
the sheer number of cases that we have to get right in the code; it's unfortunately
extremely common for open-source tools to have bugs in here. But second,
overwhelmed by the sense that we are asking the wrong question. Asking to
summarize the forces between two bodies in deep penetration with a single
Cartesian force applied at a point is fraught with peril. As you move the objects,
you will find many discontinuities; this is a common reason why you sometimes
see rigid body simulations "explode". It might seem natural to try to use multiple
contact points instead of a single contact point to summarize the forces, and some
simulators do, but it is very hard to write an algorithm that only depends on the
current configurations of the geometry which applies forces consistently from one
time step to the next with these approaches.
I currently feel that the only fundamental way to get around these numerical
issues is to start reasoning about the entire volume of penetration. The Drake
developers have recently proposed a version of this that we believe we can make
computationally tractable enough to be viable for real-time simulation [1], and are
working hard to bring this into full service in Drake. You will find many references
to "hydroelastic" contact throughout the code. We hope to turn it on by default soon.
In the meantime, we can make simulations robust by carefully curating the contact
geometry...
Figure 5.4 - The (exaggerated) contact geometry used for robust simulation of
boxes. We add contact "points" (epsilon radius spheres) to each corner, and have a
slightly inset box for the remaining contacts. Here is the interactive version.
Now that we have that ability to populate our bins full of random clutter, we need
to determine where we will grasp. The pipeline that one might expect, and indeed
the pipeline that robotics was somewhat stuck on for decades, is that the first step
should be to write some form of perception system to look into the bin and segment
the point cloud into individual objects. These days that's not impossible, but it's still
hard!
The newer approach that has proven incredibly useful for bin picking is to just look
for graspable areas directly on the (unsegmented) point cloud. Some of the most
visible proponents of this approach are [2, 3, 4]. If you look at those references,
you'll see that they all use learning to somehow decide where in the point cloud to
grasp. And I do think learning has a lot to offer in terms of choosing good grasps
-- it's a nice and natural learning formulation and there is significant information
besides just what appears in the immediate sensor returns that can contribute to
deciding where to grasp. But often you will see that underlying those learning
approaches is an approach to selecting grasps based on geometry alone, if only to
produce the original training data. I also think that the community doesn't regularly
acknowledge just how well the purely geometric approaches still work. I'd like to
discuss those first.
To get a good view into the bin, we're going to set up multiple RGB-D cameras. I've
used three per bin in all of my examples here. And those cameras don't only see the
objects in the bin; they also see the bins, the other cameras, and anything else in the
viewable workspace. So we have a little work to do to merge the point clouds from
multiple cameras into a single point cloud that only includes the area of interest.
First, we can crop the point cloud to discard any points that are from outside the
area of interest (which we'll define as an axis-aligned bounding box immediately
above the known location of the bin).
As we will discuss in some detail below, many of our grasp selection strategies will
benefit from estimating the "normals" of the point cloud (a unit vector that estimates
the normal direction relative to the surface of the underlying geometry). It is actually
better to estimate the normals on the individual point clouds, making use of the
camera location that generated those points, than to estimate the normal after the
point cloud has been merged.
For sensors mounted in the real world, merging point clouds requires high-quality
camera calibration and must deal with the messy depth returns. All of the tools
from the last chapter are relevant, as the tasks of merging the point clouds is
another instance of the point-cloud-registration problem. For the perfect depth
measurements we can get out of simulation, given known camera locations, we can
skip this step and simply concatenate the list of points in the point clouds together.
Finally, the resulting raw point clouds likely include many more points than we
actually need for our grasp planning. One of the standard approaches for down-
sampling the point clouds is using a voxel grid -- regularly sized cubes tiled in 3D.
We then summarize the original point cloud with exactly one point per voxel (see, for
instance Open3D's note on voxelization). Since point clouds typically only occupy a
small percentage of the voxels in 3D space, we use sparse data structures to store
the voxel grid. In noisy point clouds, this voxelization step is also a useful form of
filtering.
Example 5.4 (Mustard bottle point clouds)
Open in Colab
I've produced a scene with three cameras looking at our favorite YCB mustard
bottle. I've taken the individual point clouds (already converted to the world frame
by the DepthImageToPointCloud system), converted them into Open3D's point
cloud format, cropped the point clouds (to get rid of the geometry from the other
cameras), estimated their normals, merged the point clouds, then down-sampled
the point clouds. The order is important!
I've pushed all of the point clouds to meshcat, but with many of them set to not be
visible by default. Use the drop-down menu to turn them on and off, and make sure
you understand basically what is happening on each of the steps. For this one, I
can also give you the meshcat output directly, if you don't want to use Colab.
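Here is a rough Open3D sketch of that pipeline (the crop box, normal-estimation radius, and voxel size are placeholder values, and the notebook's implementation differs in its details):

```python
import numpy as np
import open3d as o3d

# Crop each cloud to the region above the bin, estimate normals using the
# known camera position, merge the clouds, then voxel down-sample.
# `clouds_W` is a list of Nx3 arrays of points in the world frame, and
# `camera_positions_W` the corresponding camera origins.
def process_point_clouds(clouds_W, camera_positions_W, voxel_size=0.005):
    crop = o3d.geometry.AxisAlignedBoundingBox([-0.3, -0.3, 0.0],
                                               [0.3, 0.3, 0.3])
    merged = o3d.geometry.PointCloud()
    for points, camera_W in zip(clouds_W, camera_positions_W):
        cloud = o3d.geometry.PointCloud(
            o3d.utility.Vector3dVector(points)).crop(crop)
        cloud.estimate_normals(
            search_param=o3d.geometry.KDTreeSearchParamHybrid(radius=0.01,
                                                              max_nn=30))
        # Flip normals to point back towards the camera that saw them.
        cloud.orient_normals_towards_camera_location(camera_W)
        merged += cloud
    return merged.voxel_down_sample(voxel_size)
```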
It's worth going through the basic estimation of the local geometry in a point cloud.
You'll see it has a lot in common with the point cloud registration we studied in the
last chapter. Let's think about the problem of fitting a plane, in a least-squares sense,
to a set of points[5]. We can describe a plane in 3D with a position p and a unit
length normal vector, $n$. The distance between a point $p^i$ and the plane is simply given
by the magnitude of the inner product, $(p^i - p)^T n$. So our least-squares optimization
becomes
$$\min_{p, n} \sum_{i=1}^{N} \left[ (p^i - p)^T n \right]^2, \quad \text{subject to } \|n\| = 1.$$
Taking the gradient of the Lagrangian with respect to p and setting it equal to zero
gives us that
$$p^* = \frac{1}{N} \sum_{i=1}^{N} p^i.$$
Inserting this back into the objective, we can write the problem as
$$\min_{n} \; n^T W n, \quad \text{subject to } \|n\| = 1, \quad \text{where } W = \sum_i (p^i - p^*)(p^i - p^*)^T.$$
Geometrically, this objective is a quadratic bowl centered at the origin, with a
unit-norm constraint. So the optimal solution is given by the (unit-length) eigenvector
corresponding to the smallest eigenvalue of the data matrix, W . And for any optimal
n, the "flipped" normal −n is also optimal. We can pick arbitrarily for now, and then
flip the normals in a post-processing step (to make sure that the normals all point
towards the camera origin).
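In code, this computation is only a few lines; here is a minimal sketch (the notebook below implements a version of this, plus the visualization):

```python
import numpy as np

# Given a query point's nearest neighbors (an Nx3 array), the plane center is
# the mean and the normal is the eigenvector of the data matrix W associated
# with the smallest eigenvalue.  `camera_position` is used only for the flip.
def estimate_normal(neighbors, camera_position):
    p_star = neighbors.mean(axis=0)
    centered = neighbors - p_star
    W = centered.T @ centered
    eigenvalues, eigenvectors = np.linalg.eigh(W)  # ascending eigenvalues
    normal = eigenvectors[:, 0]
    # Flip so the normal points back towards the camera that saw the point.
    if normal.dot(camera_position - p_star) < 0:
        normal = -normal
    return p_star, normal
```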
Open in Colab
I've coded up the basic least-squares surface estimation algorithm, with the query
point in green, the nearest neighbors in blue, and the local least squares
estimation drawn with our RGB⇔XYZ frame graphic. You should definitely slide it
around and see if you can understand how the axes line up with the normal and
local curvature.
You might wonder where you can read more about algorithms of this type. I don't
have a great reference for you. But Radu Rusu was the main author of the point
cloud library[6], and his thesis has a lot of nice summaries of the point cloud
algorithms of 2010[7].
Now that we have processed our point cloud, we have everything we need to start
planning grasps. I'm going to break that discussion down into two steps. In this
section we'll come up with a cost function that scores grasp candidates. In the
following section, we'll discuss some very simple ideas for trying to find grasp
candidates that have a low cost.
There is a massive literature in robotics on "grasp metrics" and "grasp analysis". The
Handbook of Robotics[8] has an entire chapter on grasping devoted to it (which I
recommend)[9]. For a course on manipulation (especially from someone who cares
a lot about dynamics) to spend so little time on this topic is perhaps surprising,
but a great deal of that literature is focused on analyzing point contacts between a
dexterous hand and an object with known geometry. That's simply not the situation
we find ourselves in for a lot of manipulation work today. We have a simple
two-fingered gripper with a relatively large grasping force (> 80 N) and only local
point clouds to represent our geometry.
But there is one key idea that I do want to extract from that literature: the value
of antipodal grasps. Roughly speaking, once we pick up an object -- or whatever
happens to be between our fingers when we squeeze -- then we will expect the
contact forces between our fingers to have to resist at least the gravitational wrench
(the spatial force due to gravity) of the object. The closing force provided by our
gripper is in the gripper's x-axis, but if we want to be able to pick up the object
without it slipping from our hands, then we need forces inside the friction cones of
our contacts to be able to resist the gravitational wrench. Since we don't know what
that wrench will be (and are somewhat constrained by the geometry of our fingers
from tilting our contact normals into the vertical direction), a reasonable strategy
is to look for places on the point cloud where the normals will align with the x-axis of
the gripper. That means finding a pair of points on the point cloud with normals
that are collinear with the line connecting the points and are pointing in opposite
directions; these are antipodal points. In a real point cloud, we are unlikely to find
perfect antipodal pairs, but finding areas with normals pointing in nearly opposite
directions is a good strategy for grasping!
where $n^i_{G_x}$ is the $x$ component of the normal of the $i$th point in the cropped point
cloud, expressed in the gripper frame.
There are other considerations for what might make a good grasp, too. For our
kinematically limited robot reaching into a bin, we might favor grasps that put the
hand in a favorable orientation for the arm. In the grasp metric I've implemented
in the code, I added a cost for the hand deviating from vertical. I can reward the
dot product of the world $-z$ vector, $[0, 0, -1]$, with the gripper-frame $y$-axis rotated
into the world frame:
$$\text{cost} \mathrel{+}= -\alpha \begin{bmatrix} 0 & 0 & -1 \end{bmatrix} R^G \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix} = \alpha R^G_{3,2}.$$
Finally, we need to consider collisions between the candidate grasp and both the
bins and with the point cloud. I simply return infinite cost when the gripper is
in collision. I've implemented all of those terms in the notebook, and given you
sliders to move the hand around and see how the cost changes.
Open in Colab
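To make the pieces of that cost concrete, here is a rough sketch, definitely not the notebook's exact implementation: the crop box, the $\alpha$ weight, and the `in_collision` helper are all placeholders.

```python
import numpy as np

# Score a candidate gripper pose X_WG (a Drake RigidTransform) against the
# merged point cloud.  `cloud_points_W` and `cloud_normals_W` are Nx3 arrays
# in the world frame; `in_collision` checks the gripper model against the
# bins and the point cloud.
def grasp_cost(X_WG, cloud_points_W, cloud_normals_W, in_collision,
               alpha=20.0):
    if in_collision(X_WG):
        return np.inf
    R_WG = X_WG.rotation().matrix()
    p_WG = X_WG.translation()
    # Express the cloud in the gripper frame (row-wise: R_GW @ (p_WC - p_WG)).
    p_GC = (cloud_points_W - p_WG) @ R_WG
    n_GC = cloud_normals_W @ R_WG
    # Keep only the points in a (hypothetical) box swept by the fingers.
    between_fingers = (np.abs(p_GC[:, 0]) < 0.055) & \
                      (np.abs(p_GC[:, 1]) < 0.02) & \
                      (np.abs(p_GC[:, 2]) < 0.02)
    # Antipodal term: reward normals aligned with the gripper x-axis.
    cost = -np.sum(n_GC[between_fingers, 0] ** 2)
    # Penalize the hand deviating from vertical (the equation above).
    cost += alpha * R_WG[2, 1]
    return cost
```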
We've defined a cost function that, given a point cloud from the scene and a model
of the environment (e.g. the location of the bins), can score a candidate grasp pose,
$X^G$. So now we would like to solve the optimization problem: find $X^G$ that minimizes
the cost subject to the collision constraints.
One heuristic, used for instance in [2], is to use the local curvature of the point
cloud to propose grasp candidates that have the point cloud normals pointing into
the palm, and to orient the hand so that the fingers are aligned with the direction
of maximum curvature. One can move the hand in the direction of the normal until
the fingers are out of collision, and even sample nearby points. We have written an
exercise for you to explore this heuristic. But for our YCB objects, I'm not sure it's
the best heuristic; we have a lot of boxes, and boxes don't have a lot of information
to give in their local curvature.
Another heuristic is to find antipodal point pairs in the point cloud, and then
sample grasp candidates that would align the fingers with those antipodal pairs.
Many 3D geometry libraries support "ray casting" operations at least for a voxel
representation of a point cloud; so a reasonable approach to finding antipodal pairs
is to simply choose a point at random in the point cloud, then ray cast into the point
cloud in the opposite direction of the normal. If the normal of the voxel found by ray
casting is sufficiently antipodal, and if the distance to that voxel is smaller than the
gripper width, then we've found a reasonable antipodal point pair.
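Here is a simple sketch of that idea, substituting a nearest-neighbor query for the voxel ray cast; the step size, distance tolerance, and normal-alignment threshold are placeholder values.

```python
import numpy as np
from scipy.spatial import cKDTree

# Sample an antipodal pair: pick a random point, step along the negative
# normal, and look for a nearby point whose normal is nearly opposite.
def sample_antipodal_pair(points, normals, max_width=0.08, rng=None):
    rng = rng or np.random.default_rng()
    tree = cKDTree(points)
    i = rng.integers(len(points))
    p, n = points[i], normals[i]
    for d in np.arange(0.005, max_width, 0.005):
        j = tree.query(p - d * n)[1]  # nearest point to the "ray" sample
        if np.linalg.norm(points[j] - (p - d * n)) > 0.005:
            continue
        if n.dot(normals[j]) < -0.9:  # nearly antipodal normals
            return i, j
    return None
```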
I do implement one more heuristic -- once I've found the points in the point cloud
that are between the finger tips, then I move the hand out along the gripper $x$-axis
so that those points are centered in the gripper frame. This helps prevent us
knocking over objects as we close our hands to grasp them.
But that's it! It's a very simple strategy. I sample a handful of candidate grasps and
just draw the top few with the lowest cost. If you run it a bunch of times, I think
you will find it's actually quite reasonable. Every time it runs, it is simulating the
objects falling from the sky; the grasp evaluation itself is actually quite fast.
Open in Colab
If you play around with the grasp scoring I've implemented above a little bit, you will
find deficiencies. Some of them are addressed easily (albeit heuristically) by adding
a few more terms to the cost. For instance, I didn't check collisions of the pre-grasp
configuration, but this could be added easily.
There are other cases where grasping alone is not sufficient as a strategy. Imagine
that you place an object right in one of the corners of the bin. It might not be possible
to get the hand around both sides of the object without being in collision with either
the object or the side. The strategy above will never choose to try to grab the very
corner of a box (because it always tries to align the sample point normal with the
gripper x), and it's not clear that it should. This is probably especially true for our
relatively large gripper. In the setup we used at TRI, we implemented an additional
simple "pushing" heuristic that would be used if there were point clouds in the sink,
but no viable grasp candidate could be found. Instead of grasping, we would drive
the hand down and nudge that part of the point cloud towards the middle of the bin.
This can actually help a lot!
There are other deficiencies to our simple approach that would be very hard to
address with a purely geometric approach. Most of them come down to the fact that
our system so far has no notion of "objects". For instance, it's not uncommon to see
this strategy result in "double picks" if two objects are close together in the bin. For
heavy objects, it might be important to pick up the object close to the center of mass,
to improve our chances of resisting the gravitational wrench while staying in our
friction cones. But our strategy here might pick up a heavy hammer by squeezing
just the very far end of the handle.
Interestingly, I don't think the problem is necessarily that the point cloud
information is insufficient, even without the color information. I could show you
similar point clouds and you wouldn't make the same mistake. These are the types
of examples where learning a grasp metric could potentially help. We don't need to
achieve artificial general intelligence to solve this one; just the experience of knowing
that when we tried to grasp someplace before and failed would be enough to improve
our grasp heuristics significantly.
5.5 EXERCISES
REFERENCES
[1] Ryan Elandt, Evan Drumwright, Michael Sherman, and Andy Ruina. A pressure
field model for fast, robust approximation of net contact force and moment
between nominally rigid objects. pages 8238--8245, November 2019.
[2] Andreas ten Pas, Marcus Gualtieri, Kate Saenko, and Robert Platt. Grasp pose
detection in point clouds. The International Journal of Robotics Research,
36(13-14):1455--1473, 2017.
[3] Jeffrey Mahler, Jacky Liang, Sherdil Niyaz, Michael Laskey, Richard Doan, Xinyu
Liu, Juan Aparicio Ojea, and Ken Goldberg. Dex-net 2.0: Deep learning to plan
robust grasps with synthetic point clouds and analytic grasp metrics. arXiv
preprint arXiv:1703.09312, 2017.
[4] Andy Zeng, Shuran Song, Kuan-Ting Yu, Elliott Donlon, Francois R Hogan, Maria
Bauza, Daolin Ma, Orion Taylor, Melody Liu, Eudald Romo, et al. Robotic pick-
and-place of novel objects in clutter with multi-affordance grasping and cross-
domain image matching. In 2018 IEEE international conference on robotics and
automation (ICRA), pages 1--8. IEEE, 2018.
[5] Craig M Shakarji. Least-squares fitting algorithms of the nist algorithm testing
system. Journal of research of the National Institute of Standards and
Technology, 103(6):633, 1998.
[6] Radu Bogdan Rusu and Steve Cousins. 3D is here: Point Cloud Library (PCL). In
Proceedings of the IEEE International Conference on Robotics and Automation
(ICRA), Shanghai, China, May 9-13 2011.
[7] Radu Bogdan Rusu. Semantic 3D Object Maps for Everyday Manipulation in
Human Living Environments. PhD thesis, Institut für Informatik der Technischen
Universität München, 2010.
[8] Bruno Siciliano and Oussama Khatib. Springer handbook of robotics. Springer,
2016.
[9] Domenico Prattichizzo and Jeffrey C Trinkle. Grasping. In Springer Handbook of
Robotics, pages 671--700. Springer Berlin Heidelberg, 2008.
CHAPTER 6
Object detection and
segmentation
Our study of geometric perception gave us good tools for estimating the pose of
a known object. These algorithms can produce highly accurate estimates, but are
still subject to local minima. When the scenes get more cluttered/complicated, or if
we are dealing with many different object types, they really don't offer an adequate
solution by themselves.
Deep learning has given us data-driven solutions that complement our geometric
approaches beautifully. Finding correlations in massive datasets has proven to be
a fantastic way to provide practical solutions to these more "global" problems like
detecting whether the mustard bottle is even in the scene, segmenting out the
portion of the image / point cloud that is relevant to the object, and even in providing
a rough estimate of the pose that could be refined with a geometric method.
There are many sources of information about deep learning on the internet, and I
have no ambition of replicating nor replacing them here. But this chapter does begin
our exploration of deep perception in manipulation, and I feel that I need to give just
a little context.
... annotations fall into one of two categories: (1) image-level annotation of
a binary label for the presence or absence of an object class in the image,
e.g., "there are cars in this image" but "there are no tigers," and (2) object-
level annotation of a tight bounding box and class label around an object
instance in the image, e.g., "there is a screwdriver centered at position
(20,25) with width of 50 pixels and height of 30 pixels".
Figure 6.1 - A sample annotated image from the COCO dataset, illustrating the
difference between image-level annotations, object-level annotations, and
segmentations at the class/semantic- or instance-level.
Instance segmentation turns out to be a very good match for the perception needs
we have in manipulation. In the last chapter we found ourselves with a bin full
of YCB objects. If we want to pick out only the mustard bottles, and pick them
out one at a time, then we can use a deep network to perform an initial instance-
level segmentation of the scene, and then use our grasping strategies on only the
segmented point cloud. Or if we do need to estimate the pose of an object (e.g.
in order to place it in a desired pose), then segmenting the point cloud can also
dramatically improve the chances of success with our geometric pose estimation
algorithms.
The ImageNet and COCO datasets contain labels for a variety of interesting classes,
including cows, elephants, bears, zebras and giraffes. They have a few classes that
are more relevant to manipulation (e.g., plates, forks, knives, and spoons), but they
don't have a mustard bottle nor a can of potted meat like we have in the YCB dataset.
So what are we to do? Must we produce the same image annotation tools and pay
for people to label thousands of images for us?
One of the most amazing and magical properties of the deep architectures that
have been working so well for instance-level segmentation is that they appear to
have some modularity in them. A network that was pre-trained on a large dataset
like ImageNet or COCO can be fine-tuned with a much smaller amount of
labeled data to a new set of classes that are relevant to our particular application.
In fact, the architectures are often referred to as having a "backbone" and a "head"
-- in order to train a new set of classes, it is often possible to just pop off the
existing head and replace it with a new head for the new labels. A relatively small
amount of training with a relatively small dataset can still achieve surprisingly
robust performance. Moreover, it seems that training initially on the diverse dataset
(ImageNet or COCO) is actually important to learn the robust perceptual
representations that work for a broad class of perception tasks. Incredible!
This is great news! But we still need some amount of labeled data for our objects
of interest. The last few years have seen a number of start-ups based purely on the
business model of helping you get your dataset labeled. But thankfully, this isn't our
only option.
Just as projects like LabelMe helped to streamline the process of providing pixel-
wise annotations for images downloaded from the web, there are a number of tools
that have been developed to streamline the annotation process for robotics. Our
group introduced a tool called LabelFusion, which combines geometric perception
of point clouds with a simple user interface to very rapidly label a large number of
images[4].
Figure 6.2 - A multi-object scene from LabelFusion [4]. (Mouse over for animation)
Tools like LabelFusion can be used to label large numbers of images very quickly
(three clicks from a user produces ground truth labels in many images).
All of this real world data is incredibly valuable. But we have another super powerful
tool at our disposal: simulation! Computer vision researchers have traditionally
been very skeptical of training perception systems on synthetic images, but as
game-engine quality physics-based rendering has become a commodity technology,
roboticists have been using it aggressively to supplement or even replace their real-
world datasets. The annual robotics conferences now feature regular workshops
and/or debates on the topic of "sim2real". It is still not clear whether we can
generate accurate enough art assets (with tuned material properties) and
environment maps / lighting conditions to represent even specific scenes to the
fidelity we need. An even bigger question is whether we can generate a diverse
enough set of data with distributions representative of the real world to train robust
feature detectors in the way that we've managed to train with ImageNet. But for
many serious robotics groups, synthetic data generation pipelines have significantly
augmented or even replaced real-world labeled data.
For the purposes of this chapter, I aim to train an instance-level segmentation system
that will work well on our simulated images. For this use case, there is (almost) no
debate! Leveraging the pretrained backbone from COCO, I will use only synthetic
data for fine tuning.
You may have noticed it already, but the RgbdSensor that we've been using in Drake
actually has a "label image" output port that we haven't used yet.
[System diagram: RgbdSensor, with output ports color_image, depth_image_32f, label_image, and X_WB.]
This output port exists precisely to support the perception training use case we have
here. It outputs an image that is identical to the RGB image, except that every pixel
is "colored" with a unique instance-level identifier.
Figure 6.3 - Pixelwise instance segmentation labels provided by the "label image"
output port from RgbdSensor. I've remapped the colors to be more visually distinct.
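For training, each label image needs to be converted into per-instance binary masks. A minimal sketch (the `background_ids` argument is a placeholder for the labels belonging to the bins, cameras, and other fixed geometry):

```python
import numpy as np

# Turn an HxW integer label image from the "label_image" output port into the
# per-instance binary masks that Mask R-CNN training expects.
def label_image_to_masks(label_image, background_ids=()):
    masks, ids = [], []
    for instance_id in np.unique(label_image):
        if instance_id in background_ids:
            continue
        masks.append(label_image == instance_id)
        ids.append(int(instance_id))
    return np.stack(masks), ids
```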
I've verified that this code can run on Colab, but to make a dataset of 10k
images using this un-optimized process takes about an hour on my big development
desktop. And curating the files is just easier if you run it locally. So I've provided
this one as a python script instead.
python3 manipulation/clutter_maskrcnn_data.py
You can also feel free to skip this step! I've uploaded the 10k images that I
generated here. We'll download that directly in our training notebook.
6.2 OBJECT DETECTION AND SEGMENTATION
There is a lot to know about modern object detection and segmentation pipelines. I'll
stick to the very basics.
For image recognition (see Figure 1), one can imagine training a standard
convolutional network that takes the entire image as an input, and outputs a
probability of the image containing a sheep, a dog, etc. In fact, these architectures
can even work well for semantic segmentation, where the input is an image and the
output is another image; a famous architecture for this is the Fully Convolutional
Network (FCN) [5]. But for object detection and instance segmentation, even the
number of outputs of the network can change. How do we train a network to output
a variable number of detections?
The mainstream approach to this is to first break the input image up into many
(let's say on the order of 1000) overlapping regions that might represent interesting
sub-images. Then we can run our favorite image recognition and/or segmentation
network on each subimage individually, and output a detection for each region
that is scored as having a high probability. In order to output a tight bounding box,
the detection networks are also trained to output a "bounding box refinement" that
selects a subset of the final region for the bounding box. Originally, these region
proposals were done with more traditional image preprocessing algorithms, as in
R-CNN (Regions with CNN Features)[6]. But the "Fast" and "Faster" versions of R-
CNN replaced even this preprocessing with learned "region proposal networks" [7,
8].
For instance segmentation, we will use the very popular Mask R-CNN network which
puts all of these ideas together, using region proposal networks and fully
convolutional networks for the object detection and for the masks [9]. In Mask R-CNN, the
masks are evaluated in parallel from the object detections, and only the masks
corresponding to the most likely detections are actually returned. At the time of this
writing, the latest and most performant implementation of Mask R-CNN is available
in the Detectron2 project from Facebook AI Research. But that version is not quite
as user-friendly and clean as the original version that was released in the PyTorch
torchvision package; we'll stick to the torchvision version for our experiments
here.
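The head swap itself is only a few lines with torchvision; here is the standard fine-tuning recipe (the number of classes is a placeholder for however many YCB objects you include, plus the background class):

```python
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
from torchvision.models.detection.mask_rcnn import MaskRCNNPredictor

# Swap the COCO-pretrained "heads" of Mask R-CNN for new ones sized for our
# classes.  num_classes is a placeholder (our objects plus background).
num_classes = 7

model = torchvision.models.detection.maskrcnn_resnet50_fpn(pretrained=True)

# Replace the box-prediction head.
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)

# Replace the mask-prediction head.
in_features_mask = model.roi_heads.mask_predictor.conv5_mask.in_channels
hidden_layer = 256
model.roi_heads.mask_predictor = MaskRCNNPredictor(in_features_mask,
                                                   hidden_layer, num_classes)
```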
Open in Colab
(Training Notebook)
Training a network this big (it will take about 150MB on disk) is not fast. I strongly
recommend hitting play on the cell immediately after the training cell while you
are watching it train so that the weights are saved and downloaded even if your
Colab session closes. But when you're done, you should have a shiny new network
for instance segmentation of the YCB objects in the bin!
I've provided a second notebook that you can use to load and evaluate the trained
model. If you don't want to wait for your own to train, you can examine the one
that I've trained!
Open in Colab
(Inference Notebook)
Figure 6.4 - Outputs from the Mask R-CNN inference. (Left) Object detections.
(Right) One of the instance masks.
We can use our Mask R-CNN inference in a manipulation pipeline to do selective picking from
the bin...
6.4 EXERCISES
REFERENCES
[1] Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean
Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, et al.
Imagenet large scale visual recognition challenge. International journal of
computer vision, 115(3):211--252, 2015.
[2] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva
Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft coco: Common objects
in context. In European conference on computer vision, pages 740--755.
Springer, 2014.
[3] Bryan C Russell, Antonio Torralba, Kevin P Murphy, and William T Freeman.
Labelme: a database and web-based tool for image annotation. International
journal of computer vision, 77(1-3):157--173, 2008.
[4] Pat Marion, Peter R. Florence, Lucas Manuelli, and Russ Tedrake. A pipeline
for generating ground truth labels for real RGBD data of cluttered scenes.
In International Conference on Robotics and Automation (ICRA), Brisbane,
Australia, May 2018. [ www: ]
[5] Jonathan Long, Evan Shelhamer, and Trevor Darrell. Fully convolutional
networks for semantic segmentation. In Proceedings of the IEEE conference on
computer vision and pattern recognition, pages 3431--3440, 2015.
[6] Ross Girshick, Jeff Donahue, Trevor Darrell, and Jitendra Malik. Rich feature
hierarchies for accurate object detection and semantic segmentation. In
Proceedings of the IEEE conference on computer vision and pattern recognition,
pages 580--587, 2014.
[7] Ross Girshick. Fast r-cnn. In Proceedings of the IEEE international conference
on computer vision, pages 1440--1448, 2015.
[8] Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun. Faster r-cnn: Towards
real-time object detection with region proposal networks. In Advances in neural
information processing systems, pages 91--99, 2015.
[9] Kaiming He, Georgia Gkioxari, Piotr Dollár, and Ross Girshick. Mask R-CNN.
In Proceedings of the IEEE international conference on computer vision, pages
2961--2969, 2017.
CHAPTER 7 Open in Colab
Force Control
Over the last few chapters, we've developed a great initial toolkit for perceiving
objects (or piles of objects) in a scene, planning grasps, and moving the arm to
grasp. But there is a lot more to manipulation than grasping!
In fact, the signs have already been there if you were paying close attention. I'm sure
that you noticed image number 9954 in my clutter segmentation dataset. Good old
09954.png. Just in case you happened to miss it, here it is again.
I don't know how many of the other 9999 random scenes that we generated are
going to have this problem, but 9954 makes it nice and clear. How in the world are
we going to pick up that "Cheez-it" cracker box with our two-fingered gripper? (I
know there are a few of you out there thinking about how easy that scene is for your
suction gripper, but suction alone isn't going to solve all of our problems either.)
Figure 7.2 - Just for fun, I asked my daughter to try a similar task with the "two-
fingered gripper" I scrounged from my kitchen. How are we to program something
like that!
As always in our battle against complexity, I want to find a setup that is as simple
as possible (but no simpler)! Here is my proposal for the box-flipping example.
First, I will restrict all motion to a 2D plane; this allows me to chop off two sides
of the bin (for you to easily see inside), but also drops the number of degrees of
freedom we have to consider. In particular, we can avoid the quaternion floating
base coordinates, by adding a PlanarJoint to the box. Instead of using a complete
gripper, let's start even simpler and just use a "point finger". I've visualized it as a
small sphere, and modeled two control inputs as providing force directly on the x
and z coordinates of the point mass.
Figure 7.3 - The simple model: a point finger, a cracker box, and the bin all in 2D.
The green arrows are the contact forces.
Even in this simple model, and throughout the discussions in this chapter, we will
have two dynamic models that we care about. The first is the full dynamic model
used for simulation: it contains the finger, the box, and the bin, and has a total of
5 degrees of freedom (x, z, θ for the box and x, z for the finger). The second is the
robot model used for designing the controllers: this model has just the two degrees
of freedom of the finger, and experiences unmodelled contact forces. By design, the
equations of this second model are particularly simple:
$$m \begin{bmatrix} \ddot{x} \\ \ddot{z} \end{bmatrix} = u + f^c + mg,$$
where $m$ is the mass, $g = [0, -9.81]^T$ is the gravity vector, $u$ is the control input vector,
and $f^c$ is the contact force (from the world to the finger). Notably absent is the
contact Jacobian which normally pre-multiplies the contact forces, $\sum_i J_i^T f^{c_i}$; in the
case of a point finger this Jacobian is always the identity matrix, and there is only one
point at which contact can be made, so there is no meaningful reason to distinguish
between multiple forces.
[System diagram: a ForceControl system with input ports desired_contact_force, robot_state, and (robot_accelerations OR measured_contact_force), and output port robot_actuation.]
Let us assume that the robot is already in contact. What information do we need
to regulate the contact forces? Certainly we need the desired contact force, f cd .
In general, we will need the robot's state (though in the immediate example, the
dynamics of our point mass are not state dependent). But we also need to either (1)
measure the robot accelerations (which we've repeatedly tried to avoid), (2) assume
the robot accelerations are zero, or (3) provide a measurement of the contact force
so that we can regulate it with feedback.
Let's consider the case where the robot is already in contact with the box. Let's
also assume for a moment that the accelerations are (nearly) zero. This is actually
not a horrible assumption for most manipulation tasks, where the movements are
relatively slow. In this case, our equations of motion reduce to
$$f^c = -mg - u,$$
so if we want to regulate the contact force to a desired value, $f^{c_d}$, we can simply
command $u = -mg - f^{c_d}$.
What happens when the robot is not in contact? In this case, we cannot reasonably
ignore the accelerations, and applying the same control results in $m\dot{v} = -f^{c_d}$. That's
not all bad. In fact, it's one of the defining features of force control that makes
it very appealing. When you specify a desired force and don't get it, the result is
accelerating the contact point in the (opposite) direction of the desired force. In
practice, this (locally) tends to bring you into contact when you are not in contact.
Figure 7.4 - This is a plot of the horizontal position of the finger as a function of
time, given different constant desired force commands.
For all strictly positive desired force commands, the finger will accelerate at a
constant rate until it collides with the box (at $x = 0.089$). For small $f^{c_d}$, the box
barely moves. For an intermediate range of $f^{c_d}$, the collision with the box is enough
to start it moving, but friction eventually brings the box and therefore also the
finger to rest. For large values, the finger will keep pushing the box until it runs
into the far wall.
Open in Colab
This is one of the reasons that researchers working on legged robots also like force
control. On a force-capable walking robot, we might mimic position control during
the "swing phase", to get our foot approximately where we are hoping to step. But
then we switch to a force control mode to actually set the foot down on the ground.
This can significantly reduce the requirements for accurate sensing of the terrain.
Here is my proposal for a simple strategy for flipping up the box. Once in contact,
we will use the contact force from the finger to apply a moment around the bottom
left corner of the box to generate the rotation. But we'll add constraints to the forces
that we apply so that the bottom corner does not slip.
Figure 7.5 - Flipping the box. Click here for an animation of the controller we'll
write in the example below. But it's worth running the code, where you can see the
contact force visualization, too!
These conditions are very natural to express in terms of forces. And once again, we
can implement this strategy with very little information about the box. The position
of the finger will evolve naturally as we apply the contact forces. It's harder for
me to imagine how to write an equally robust strategy using a (high-gain) position-
controlled finger; it would likely require many more assumptions about the geometry
(and possibly the mass) of the box.
with
$${}^W\!R^C(\theta) = \begin{bmatrix} -\sin\theta & -\cos\theta \\ \cos\theta & -\sin\theta \end{bmatrix}.$$
To understand this controller, let's break down the dynamics of the task.
Once the box has rotated up off the bin bottom, the only forces on the box are
the gravitational force, the force that the bin applies on the box corner, call it $f^A$,
and the contact force, $f^C$, that the finger applies on the box. By convention, the
contact frames $A$ and $C$ have the $z$ axis aligned with the contact normal (pointing
into the box). Remember the contact geometry we use for the cracker box has
small spheres in the corners; so both of these contacts are "sphere on box", with
the box normals defining the contact normals (e.g. the $x$ axis is aligned with the
wall of the box), and ${}^W\!R^A = I$. The friction cone constraints are:
$$f^A_{W_z} \ge 0, \quad |f^A_{W_x}| \le \mu_A f^A_{W_z}, \qquad f^C_{C_z} \ge 0, \quad |f^C_{C_x}| \le \mu_C f^C_{C_z}.$$
The rotational dynamics of the box are
$$I \ddot{\theta} = {}^A p^{CM}_W \times \begin{bmatrix} 0 \\ -mg \end{bmatrix} + {}^A p^C_W \times f^C_W,$$
where $I$ is the moment of inertia of the box taken around the top corner, and $p^{CM}$
is the position of the center of mass. We don't want our controller to depend on
most of these parameters, but it helps to understand them! If we move slowly (the
velocities and accelerations are nearly zero), then we can write
$$f^A_{W_x} = -f^C_{W_x}, \qquad f^A_{W_z} = mg - f^C_{W_z}.$$
Taken together, the friction cone constraints on the bin imply the following
constraints on our finger contact:
$$|f^C_{W_x}| \le \mu_A \left( mg - f^C_{W_z} \right).$$
Note that the full dynamic derivation is not too much harder; I did it on paper by
writing ${}^{CM}p^A = [-\frac{h}{2}, -\frac{w}{2}]^T$, with the height, $h$, and width, $w$, of the box. Then one
can write the no-slip-at-the-bin constraint as $\phi(q) = {}^{CM}p^A_{W_x} = \text{constant}$, which implies
that $\ddot{\phi} = 0$, which gives the full expression for $f^A_{W_x}$ (as long as we're in the friction
cone).
Now here's a nice approximation. By choosing to make contact near the bottom of
the box (since we know where the bin is), we can approximate
$${}^A p^C_W \times f^C_W \approx -\,{}^A p^C_{C_z}\, f^C_{C_x}.$$
So commanding the finger contact force based on the orientation error of the box
(subject to the friction-cone constraints above) is a good strategy for regulating the
orientation of the box. It does not assume we know the box geometry nor inertial properties.
That's a lot of details, but all to justify a very simple and robust controller.
We have multiple controllers in this example. The first is the low-level force
controller that takes a desired contact force and sends commands to the robot
to attempt to regulate this force. The second is the higher-level controller that is
looking at the orientation of the box and deciding which forces to request from the
low-level controller.
Please also understand that this is not some unique or optimal strategy for box
flipping. I'm simply trying to demonstrate that sometimes controllers which might
be difficult to express otherwise can easily be expressed in terms of forces!
This approach is conceptually very nice. With only knowledge of the parameters of
the robot itself (not the environment), we can write a controller so that when we
push on the end-effector, the end-effector pushes back (using the entire robot) as
if you were pushing on, for instance, a simple spring-mass-damper system. Rather
than attempting to achieve manipulation by moving the end-effector rigidly through
a series of position commands, we can move the set points (and perhaps stiffness)
of a soft virtual spring, and allow this virtual spring to generate our desired contact
forces.
This approach can also have nice advantages for guaranteeing that your robot won't
go unstable even in the face of unmodeled contact interactions. If the robot acts like
a dissipative system and the environment is a dissipative system, then the entire
system will be stable. Arguments of this form can ensure stability for even very
complex systems, building on the rich literature on passivity theory or more generally
Port-Hamiltonian systems[5].
Our simple model with a point finger is ideal for understanding the essence of
indirect force control. The original equations of motion of our system are
$$m \begin{bmatrix} \ddot{x} \\ \ddot{z} \end{bmatrix} = u + f^c + mg.$$
We can write a simple controller to make this system act, instead, like a (passive)
mass-spring-damper system:
$$m \begin{bmatrix} \ddot{x} \\ \ddot{z} \end{bmatrix} = -k_p \begin{bmatrix} x - x_d \\ z - z_d \end{bmatrix} - k_d \begin{bmatrix} \dot{x} \\ \dot{z} \end{bmatrix} + f^c,$$
with the rest position of the spring at $(x_d, z_d)$. The controller that implements this
follows easily:
$$u = -mg - k_p \begin{bmatrix} x - x_d \\ z - z_d \end{bmatrix} - k_d \begin{bmatrix} \dot{x} \\ \dot{z} \end{bmatrix};$$
in the point finger case this has exactly the familiar form of a proportional-derivative
(PD) controller (plus a gravity feed-forward term)!
Technically, if we are just programming the stiffness and damping, as I've written
here, then a controller of this form would commonly be referred to as "stiffness
control", which is a subset of impedance control. We could also change the effective
mass of the system; this would be impedance control in its full glory. My impression,
though, is that changing the effective mass is most often considered not worth the
complexity that comes from the extra sensing and bandwidth requirements.
The literature on indirect force control has a lot of terminology and implementation
details that are important to get right in practice. Your exact implementation will
depend on, for instance, whether you have access to a force sensor and whether you
can command forces/torque directly. The performance can vary significantly based
on the bandwidth of your controller and the quality of your sensors. See e.g. [6] for
a more thorough survey (and also some fun older videos), or [7] for a nice earlier
perspective.
Open in Colab
To help your intuition, I've made the bin and the box slightly transparent, and
added a visualization (in orange) of the virtual finger or gripper that you are
moving with the sliders.
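Here is a minimal sketch of the point-finger stiffness controller described above (the mass and gains are illustrative only):

```python
import numpy as np

# Stiffness control for the point finger, under the stated assumptions
# (dynamics m*a = u + f_c + m*g): gravity compensation plus a virtual
# spring-damper pulling the finger toward p_desired.
m = 1.0
g = np.array([0.0, -9.81])
kp, kd = 100.0, 20.0

def stiffness_control(p, v, p_desired):
    # u = -m g - kp (p - p_d) - kd v makes the closed loop act like a
    # (passive) mass-spring-damper with rest position p_desired.
    return -m * g - kp * (p - p_desired) - kd * v
```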
Let's embrace indirect force control to come up with another approach to flipping up
the box. Flipping the box up in the middle of the bin required very explicit reasoning
about forces in order to stay inside the friction cone in the bottom left corner of
the box. But there is another strategy that doesn't require as precise control of the
forces. Let's push the box into the corner, and then flip it up.
To make this one happen, I'd like to imagine creating a virtual spring -- you can think
of it like a taut rubber band -- that we attach from the finger to a point near the
wall just a little above the top left corner of the box. The box will act like a pendulum,
rotating around the top corner, with the rubber band creating the moment. At some
point the top corner will slip, but the very same rubber band will have the finger
pushing the box down from the top corner to complete the maneuver.
Consider the alternative of writing an estimator and controller that needed to detect
the moment of slip and make a new plan. That is not a road to happiness. By using
only our model of the robot to make the robot act like a different dynamical system
at the contact point, we can accomplish all of that!
Open in Colab
There are a number of applications where we would like to explicitly command force in one direction, but command positions in another. One classic example is wiping or polishing a surface -- you might care about the amount of force you are applying normal to the surface, but use position feedback to follow a trajectory in the directions tangent to the surface. In the simplest case, imagine controlling force in z and position in x for our point finger:
$$u = -mg + \begin{bmatrix} k_p (x_d - x) + k_d (\dot{x}_d - \dot{x}) \\ -f^{C_d}_{W_z} \end{bmatrix}.$$
More generally, expressed in a contact frame $C$ (with ${}^W R^C$ the rotation from $C$ to the world frame $W$):
$$u = -mg + {}^W R^C \begin{bmatrix} k_p (p^{C_d}_{C_x} - p^C_{C_x}) + k_d (v^{C_d}_{C_x} - v^C_{C_x}) \\ -f^{C_d}_{C_z} \end{bmatrix}.$$
By commanding the normal force, you not only have the benefit of controlling
how hard the robot is pushing on the surface, but also gain some robustness to
errors in estimating the surface normal. If a position-controlled robot estimated
the normal of a wall badly, then it might leave the wall entirely in one direction,
and push extremely hard in the other. Having a commanded force in the normal
direction would allow the position of the robot in that direction to become whatever
is necessary to apply the force, and it will follow the wall.
The choice of position control or force control need not be a binary decision. We can
simply apply both the position command and a "feed-forward" force command:
$$u = -mg + {}^W R^C \left[ K_p (p^{C_d}_C - p^C_C) + K_d (v^{C_d}_C - v^C_C) + f^{ff}_C \right].$$
As we'll see, this is quite similar to the interface provided by the iiwa (and many
other torque-controlled robots).
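As a sketch (my own variable names and gains, not from the text), the combined position/force law above might look like this for the planar point finger, with the x-axis of frame C position-controlled and the z-axis carrying only a feed-forward force:

import numpy as np

def hybrid_position_force(p_C, v_C, p_Cd, v_Cd, f_ff_C, R_WC,
                          m=1.0, g=np.array([0.0, -9.81])):
    """Force command combining position feedback along the x-axis of frame C
    with a feed-forward force along its z-axis.  All arguments except R_WC
    are 2-vectors expressed in frame C; R_WC is the 2x2 rotation matrix from
    C to the world frame W."""
    Kp = np.diag([100.0, 0.0])   # zero position gain in the force direction
    Kd = np.diag([20.0, 0.0])
    u_C = Kp @ (p_Cd - p_C) + Kd @ (v_Cd - v_C) + f_ff_C
    return -m * g + R_WC @ u_C

Following the sign convention in the equations above, pressing on a surface along the z-axis of C with desired contact force magnitude $f_d$ corresponds to passing f_ff_C = [0, -f_d].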
One of the classic problems for manipulator force control, inspired by robotic
assembly, is the problem of mating two parts together under tight kinematic
tolerances. Perhaps the cleanest and most famous version of this is the problem
of inserting a (round) peg into a (round) hole. If your peg is square and your hole
is round, I can't help. Interestingly, much of the literature on this topic in the late
1970's and early 1980's came out of MIT and the Draper Laboratory.
Figure 7.7 - (Left) the "peg in hole" insertion problem with kinematic tolerance c.
(Middle) the desired response to a contact force. (Right) the desired response to a
contact moment. Figures reproduced (with some symbols removed) from [8].
For many peg insertion tasks, it's acceptable to add a small chamfer to the part
and/or the top of the hole. This allows small errors in horizontal alignment to be
accounted for with a simple compliance in the horizontal direction. Misalignment in
orientation is even more interesting. Small misalignments can cause the object to
become "jammed", at which point the frictional forces become large and ambiguous;
we really want to avoid this. Again, we can avoid jamming by allowing a rotational
compliance at the contact. I think you can see why this is a great example for force
control!
The key insight here is that the rotational stiffness that we want should result in
rotations around the part/contact, not around the gripper. We can program this
with stiffness control; but for very tight tolerances the sensing and bandwidth
requirements become a real limitation. One of the coolest results in force control
is the realization that a clever physical mechanism can be used to produce a
remote-centered compliance completely mechanically, with infinite bandwidth and
no sensing required [7]!
Using the floating finger/gripper is a good way to understand the main concepts
of force control without the details. But now it's time to actually implement those
strategies using joint commands that we can send to the arm.
Our starting point is understanding that the equations of motion for a fully-actuated
robotic manipulator have a very structured form:
$$M(q)\ddot{q} + C(q, \dot{q})\dot{q} = \tau_g(q) + u + \sum_i J_i^T(q)\, f^{c_i}. \qquad (1)$$
The left side of the equation is just a generalization of "mass times acceleration", with the mass matrix, $M$, and the Coriolis terms $C$. The right-hand side is the sum of the (generalized) forces, with $\tau_g(q)$ capturing the forces due to gravity, $u$ the joint torques supplied by the motors, and $f^{c_i}$ the Cartesian force due to the $i$th contact, where $J_i(q)$ is the $i$th "contact Jacobian". I introduced a version of these briefly when we described multibody dynamics for dropping random objects into the bin, and have more notes available here. For the purposes of the remainder of this chapter, we can assume that the robot is bolted to the table and so does not have any floating joints; I've therefore used $\dot{q}$ and $\ddot{q}$ instead of $v$ and $\dot{v}$.
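If you want to inspect these terms for the simulated iiwa, Drake's MultibodyPlant can evaluate each of them directly. A minimal sketch, assuming you already have a finalized plant and a context holding the current positions and velocities:

import numpy as np
# `plant` is a pydrake MultibodyPlant (e.g. the iiwa welded to the world) and
# `context` holds the current q and qdot.
M = plant.CalcMassMatrixViaInverseDynamics(context)    # M(q)
Cv = plant.CalcBiasTerm(context)                       # C(q, qdot) * qdot
tau_g = plant.CalcGravityGeneralizedForces(context)    # tau_g(q)

# With no contact forces, the joint accelerations satisfy
#   M(q) qddot + C(q, qdot) qdot = tau_g(q) + u:
u = np.zeros(plant.num_velocities())
qddot = np.linalg.solve(M, tau_g + u - Cv)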
In practice, the way that we most often interface with the iiwa is through its "joint-impedance control" mode, which is written up nicely in [10, 11]. For our purposes, we can view this as a stiffness control in joint space:
$$u = -\tau_g(q) + K_p (q_d - q) - K_d \dot{q}.$$
Check yourself: What is the difference between traditional position control with
a PD controller and joint-stiffness control?
The difference in the algebra is quite small. A PD controller would typically have the form
$$u = K_p (q_d - q) - K_d \dot{q}.$$
In other words, stiffness control tries to cancel out the gravity and any other
estimated terms, like friction, in the model. As written, this obviously requires an
estimated model (which we have for iiwa, but don't believe we have for most arms
with large gear-ratio transmissions) and torque control. But this small difference
in the algebra can make a profound difference in practice. The controller is no
longer depending on error to achieve the joint position in steady state. As such
we can turn the gains way down, and in practice have a much more compliant
response while still achieving good tracking when there are no external torques.
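Here is a minimal sketch of the joint-stiffness law, again assuming a pydrake MultibodyPlant `plant` with an up-to-date `context`, and diagonal gain matrices Kp, Kd of my own naming. The only difference from a PD controller is the model-based gravity cancellation:

def joint_stiffness_control(plant, context, q_d, Kp, Kd):
    q = plant.GetPositions(context)
    v = plant.GetVelocities(context)
    # Cancel the gravity term from the manipulator equations (tau_g enters
    # the right-hand side of (1) with a plus sign, so we subtract it here),
    # then add the virtual spring-damper in joint space.
    tau_g = plant.CalcGravityGeneralizedForces(context)
    return -tau_g + Kp @ (q_d - q) - Kd @ v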
A better analogy for the control we were doing with the point finger example is to write a controller so that the robot acts like a simple dynamical system in the world frame. To do that, we have to identify a frame, $C$, on the robot where we want to impose these simple dynamics -- the origin of this frame is typically the expected point of contact. Following our treatment of kinematics and differential kinematics, we'll define the forward kinematics of this frame as $p^C = f^C_{kin}(q)$, with the corresponding Jacobian $J^C(q)$ so that $v^C = J^C(q)\dot{q}$.
One of the big ideas from manipulator control is that we can actually write the dynamics of the robot in the frame $C$, by writing the joint torques in terms of a spatial force command, $F^u$: $u = J^T(q) F^u$. By substituting this into the manipulator equations (1) and assuming that the only external contacts happen at $C$, we can write the dynamics in the frame $C$, with a task-space mass matrix and bias terms playing the roles of $M$ and $C$. Now we can simply apply the controller we used in joint space, this time acting on the position and velocity of the frame $C$. Please don't forget to add some terms to stabilize the null space of your Jacobian if necessary.
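Here is a rough sketch of such a controller for the translational part of frame C, including a simple joint-centering term projected into the null space of the Jacobian. It assumes a pydrake MultibodyPlant `plant` welded to the world (so $\dot{q} = v$) with an up-to-date `context`; the names Kp, Kd, q_nominal and the gains are my own, not from the text.

import numpy as np
from pydrake.all import JacobianWrtVariable

def cartesian_stiffness_control(plant, context, frame_C, p_WC_desired,
                                Kp, Kd, q_nominal, k_null=1.0, d_null=0.1):
    W = plant.world_frame()
    q = plant.GetPositions(context)
    v = plant.GetVelocities(context)

    # Position and translational Jacobian of the origin of frame C, in world.
    p_WC = plant.CalcPointsPositions(context, frame_C, np.zeros((3, 1)),
                                     W).ravel()
    J = plant.CalcJacobianTranslationalVelocity(
        context, JacobianWrtVariable.kV, frame_C, np.zeros((3, 1)), W, W)
    v_WC = J @ v

    # Task-space (translational) force command: a virtual spring-damper at C.
    F_u = Kp @ (p_WC_desired - p_WC) - Kd @ v_WC

    # Gravity compensation, the task-space command mapped through J^T, and a
    # joint-centering spring-damper acting only in the null space of J.
    tau_g = plant.CalcGravityGeneralizedForces(context)
    N = np.eye(plant.num_velocities()) - np.linalg.pinv(J) @ J
    tau_null = k_null * (q_nominal - q) - d_null * v
    return -tau_g + J.T @ F_u + N @ tau_null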
The implementation of the low-level impedance controllers has many details that
are explained nicely in [9, 10]. In particular, the authors go to quite some length
to implement the impedance law in the actuator coordinates rather than the joint
coordinates (remember that they have an elastic transmission in between the two).
I suspect there are many subtle reasons to prefer this, but they go to lengths to
demonstrate that they can make a true passivity argument with this controller.
The iiwa interface offers a Cartesian impedance control mode. If we want high
performance stiffness control in end-effector coordinates, then we should definitely
use it! The iiwa controller runs at a much higher bandwidth (faster update rate)
than the interface we have over the provided network API, and there are many implementation details that they have gone to great lengths to get right. But in practice we do not
use it, because we cannot convert between joint impedance control and Cartesian
impedance control without shutting down the robot. Sigh. In fact we cannot even
change the stiffness gains or the frame $C$ (aka the "end-effector location") without stopping the robot. So we stay in joint impedance mode and command some Cartesian forces through $\tau_{ff}$ if we desire them. (If you are interested in the driver
details, then I would recommend the documentation for the Franka Control Interface
which is much easier to find and read, and is very similar to functionality provided
by the iiwa driver.)
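Concretely (with hypothetical variable names, and following the sign convention used in the equations earlier in this chapter), the feed-forward torques for a desired contact force at frame C are just the force mapped through the transpose of the contact Jacobian:

# J_C: translational Jacobian of frame C in world coordinates (e.g. computed
# as in the sketch above); f_C_desired: the contact force we would like the
# environment to apply on the robot at C, expressed in world coordinates.
tau_ff = J_C.T @ (-f_C_desired)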
You might also notice that the interface we provide to the ManipulationStation
takes a desired joint position for the robot, but not a desired joint velocity. That is
because we cannot actually send a desired joint velocity to the iiwa controller. In
practice, we believe that they are numerically differentiating our position commands
to obtain the desired velocity, which adds a delay of a few control timesteps (and
sometimes non-monotonic behavior). I don't really know why we aren't allowed to
send our own joint velocity commands.
7.4 PUTTING IT ALL TOGETHER
7.5 EXERCISES
Consider a point mass with dynamics
$$m\ddot{x} = u + f^c,$$
where $x$ is the position, $u$ the input force applied on the point mass, and $f^c$ the contact force from the environment. In order to control the position and force from the robot at the same time, the following controller is proposed:
$$u = \underbrace{k_p (x_d - x) - k_d \dot{x}}_{\text{feedback force for position control}} \;\; \underbrace{-\, f^c_d}_{\text{feedforward force}}.$$
We will evaluate the controller using the error
$$e = (x - x_d)^2 + (f^c - f^c_d)^2.$$
a. In free space (i.e. $f^c = 0$), can we achieve zero error for $x_d = 0$ and a non-zero desired force $f^c_d$? Show your answer by writing down the steady-state ($\dot{x} = 0$, $\ddot{x} = 0$) error as a function of the gains and the desired force, thus arguing that both terms of the error do not achieve zero (i.e. we cannot achieve the desired position or force).
b. Assume a rigid-body contact with a wall located at $x_d = 0$, where a desired non-zero force of $f^c_d$ is commanded. By writing down the error at equilibrium, show that we are able to achieve zero error.
(The remaining parts refer to the force-based "flip-up" strategy from earlier in the chapter: the finger pushes on the box at the contact point C while the box pivots about its corner A.)
a. If we do not push hard enough (i.e. $f^C_{C_z}$ is too small), then we will not be able to create enough friction ($f^C_{C_x}$) to counteract gravity. By summing the torques around the pivot point A, derive a lower bound for the required normal force $f^C_{C_z}$, as a function of $m$, $g$, and $\mu_C$. You may assume that the moment arm of gravity is half that of the moment arm on C, and that the box is square.
b. If we push too hard (i.e. $f^C_{C_z}$ is too big), then the pivot point will end up sliding to the left. By writing the force balance along the axes, derive an upper bound for the required normal force $f^C_{C_z}$, as a function of $m$, $g$, and $\mu_A$. (HINT: It will be necessary to use the torque balance equation from before.)
c. By combining the lower and upper bounds from the previous problems, derive a relation between $\mu_A$ and $\mu_C$ that must hold in order for this motion to be possible. If $\mu_A \approx 0.25$ (wood-metal), how high does $\mu_C$ have to be in order for the motion to be feasible? What if $\mu_A = 1$ (concrete-rubber)?
REFERENCES
CHAPTER 8
Motion Planning
There are a few more essential skills that we need in our toolbox. In this chapter, we
will explore some of the powerful methods of kinematic trajectory motion planning.
I'm actually almost proud of making it this far into the notes without covering this
topic yet. Writing a relatively simple script for the pose of the gripper, like we did in
the bin picking chapter, really can solve a lot of interesting problems. But there are
a number of reasons that we might want a more automated solution.
We'll focus on the problem of moving an arm through space in this chapter, because
that is the classical and very important motivation. But I need to cover this now
for another reason, too: it will also soon help us think about programming a more
dexterous hand.
I do need to make one important caveat. Despite having done some work in this
field myself, I actually really dislike the problem formulation of collision-free motion
planning. I think that on the whole, robots are too afraid of bumping into the world
(because things still go wrong when they do). I don't think humans are solving these
complex geometric problems every time we reach... even when we are reaching in
dense clutter (I actually suspect that we are very bad at it). I would much rather
see robots that perform well even with very coarse / approximate plans for moving
through a cluttered environment, that are not afraid to make incidental contacts,
and that can still accomplish the task when they do!
We know that the forward kinematics give us a (nonlinear) mapping from joint angles to e.g. the pose of the gripper: $X^G = f_{kin}(q)$. So, naturally, the problem of inverse kinematics (IK) is about solving for the inverse map, $q = f^{-1}_{kin}(X^G)$. Indeed,
that is perhaps the most common and classical question studied in inverse
kinematics. But I want you to think of inverse kinematics as a much richer topic than
that.
For example, when we were evaluating grasp candidates for bin picking, we had only
a soft preference on the orientation of the hand relative to some antipodal grasp. In
that case, a full 6 DOF pose would have been an overly constrained specification.
And often we have many constraints on the kinematics: some in joint space (like
joint limits) and others in Cartesian space (like non-penetration constraints). So
really, inverse kinematics is about solving for joint angles in a very rich landscape of
objectives and constraints.
Click here to watch the video.
Figure 8.1 - We made extensive use of rich inverse kinematics specifications in our
work on humanoid robots. The video above is an example of the interactive inverse
kinematics interface (here to help us figure out how to fit our big humanoid robot into the little Polaris). Here is another video of the same tool being used for the Valkyrie humanoid, where we do specify an end-effector pose, but we also add
a joint-centering objective and static stability constraints [1, 2].
With its obvious importance in robotics, you probably won't be surprised to hear
that there is an extensive literature on inverse kinematics. But you may be surprised
at how extensive and complete the solutions can get. The forward kinematics, fkin
, is a nonlinear function in general, but it is a very structured one. In fact, with
rare exceptions (like if your robot has a helical joint, aka screw joint), the equations
governing the valid Cartesian positions of our robots are actually polynomial. "But
wait! What about all of those sines and cosines in my kinematic equations?" you say.
The trigonometric terms come when we want to relate joint angles with Cartesian
coordinates. In $\mathbb{R}^3$, for two points, $A$ and $B$, on the same rigid body, the (squared) distance between them, $\|p^A - p^B\|^2$, is a constant. And a joint is just a polynomial
constraint between positions on adjoining bodies, e.g. that they occupy the same
point in Cartesian space. See [3] for an excellent overview.
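For example (my notation, not from the text), for a single planar revolute joint with link length $l$, introducing the indeterminates $s = \sin\theta$ and $c = \cos\theta$ turns the kinematics into polynomial equations:
$$p^B = p^A + l \begin{bmatrix} c \\ s \end{bmatrix}, \qquad s^2 + c^2 = 1.$$
Chaining such substitutions for every joint gives a system of polynomial equations in the $(s_i, c_i)$ variables, which is exactly what the algebraic-geometry machinery operates on.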
The algebraic-geometry methods are mostly targeted at offline global analysis; they are not designed for the fast, real-time inverse kinematics solutions needed in a control loop. The most popular tool these days for real-time inverse kinematics
for six- or seven-DOF manipulators is a tool called "IKFast", described in Section 4.1
of [4], that gained popularity because of its effective open-source implementation.
Rather than focus on completeness, IKFast uses a number of approximations to
provide fast and numerically robust solutions to the "easy" kinematics problems. It
leverages the fact that a six-DOF pose constraint on a six-DOF manipulator has closed-form solutions, with a finite number of joint-space configurations that produce
the same end-effector pose, and for seven-DOF manipulators it adds a layer of
sampling in the last degree of freedom on top of the six-DOF solver.
These closed-form solutions are important to understand because they provide deep
insight into the equations, and because they can be fast enough to use inside a more
sophisticated solution approach. But the closed-form solutions don't provide the rich
specification I advocated for above; in particular, they break down once we have
inequality constraints instead of equality constraints. For those richer specifications,
we will turn to optimization.
8.1.2 IK as constrained optimization
Figure 8.2 - A richer inverse kinematics problem: solve for the joint angles, q, that
allow the robot to reach into the shelf and grab the object, while avoiding
collisions.
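As a sketch of how the shelf-reaching problem in Figure 8.2 might be posed with Drake's InverseKinematics class: the plant, context, gripper frame, numerical bounds, and q_nominal below are all placeholders/assumptions of mine, and you should consult the Drake documentation for the authoritative API.

import numpy as np
from pydrake.all import InverseKinematics, RotationMatrix, Solve

# `plant` is a MultibodyPlant (with the shelf geometry registered in a
# SceneGraph), `plant_context` its context, and `gripper_frame` the frame G.
ik = InverseKinematics(plant, plant_context)
q = ik.q()   # the joint angles are the decision variables

# Keep the gripper origin inside a small box at the target location in world.
ik.AddPositionConstraint(gripper_frame, [0, 0, 0], plant.world_frame(),
                         [0.45, -0.05, 0.25], [0.55, 0.05, 0.35])
# Keep the gripper orientation within ~18 degrees of a nominal rotation.
ik.AddOrientationConstraint(gripper_frame, RotationMatrix(),
                            plant.world_frame(),
                            RotationMatrix.MakeXRotation(np.pi / 2),
                            0.1 * np.pi)
# Stay at least 1 cm away from collisions (robot-shelf and self-collisions).
ik.AddMinimumDistanceConstraint(0.01)

prog = ik.prog()
prog.AddQuadraticErrorCost(np.eye(len(q)), q_nominal, q)  # joint centering
prog.SetInitialGuess(q, q_nominal)
result = Solve(prog)
if result.is_success():
    q_solution = result.GetSolution(q)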
Global IK.
REFERENCES
[1] Maurice Fallon, Scott Kuindersma, Sisir Karumanchi, Matthew Antone, Toby
Schneider, Hongkai Dai, Claudia Pérez D'Arpino, Robin Deits, Matt DiCicco,
Dehann Fourie, Twan Koolen, Pat Marion, Michael Posa, Andrés Valenzuela,
Kuan-Ting Yu, Julie Shah, Karl Iagnemma, Russ Tedrake, and Seth Teller. An
architecture for online affordance-based perception and whole-body planning.
Journal of Field Robotics, 32(2):229--254, September 2014.
[2] Pat Marion, Maurice Fallon, Robin Deits, Andrés Valenzuela, Claudia Pérez
D'Arpino, Greg Izatt, Lucas Manuelli, Matt Antone, Hongkai Dai, Twan Koolen,
John Carter, Scott Kuindersma, and Russ Tedrake. Director: A user interface
designed for robot operation with shared autonomy. Journal of Field Robotics,
1556-4967, 2016.
[3] Charles W. Wampler and Andrew J. Sommese. Numerical algebraic geometry and
algebraic kinematics. Acta Numerica, 20:469--567, 2011.
[4] Rosen Diankov. Automated Construction of Robotic Manipulation Programs. PhD
thesis, Carnegie Mellon University, August 2010.
CHAPTER 9
Reinforcement Learning
This is a placeholder chapter. Content to be filled in. In the meantime, I have some relevant notes here.
9.1 EXERCISES
A.1 PYDRAKE
DRAKE is primarily a C++ library, with rigorous coding standards and a maturity
level intended to support even professional applications in industry. In order to
provide a gentler introduction, and to facilitate rapid prototyping, I've written these
notes exclusively in python, using Drake's python bindings (pydrake). These bindings are less mature than the C++ backend, but they are improving rapidly; your feedback (and even contributions) is very welcome.
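If you want a quick check that the bindings are installed and importable, a trivial sketch (not from the text) is:

from pydrake.all import MultibodyPlant

plant = MultibodyPlant(time_step=0.0)   # an empty continuous-time plant
plant.Finalize()
print(plant.num_positions())            # prints 0, but confirms pydrake works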
There are also a number of tutorials in DRAKE that can help you get started.
I will provide nearly all examples and exercises in the form of a Jupyter Notebook
so that we can leverage the fantastic and relatively recent availability of (free)
cloud resources. In particular I have chosen to focus on using Google's Colaboratory
(Colab) as our primary platform. Most of the chapters have a corresponding
notebook that you can run on the cloud without installing anything on your machine!
As you get more advanced, you will want to run (and extend) these examples on your
own machine. The DRAKE website has a number of installation options, including
precompiled binaries and Docker instances. Here I provide an example workflow of
setting up drake from the precompiled binaries and running the textbook examples.
The links below indicate the specific distributions that the examples in this text have
been tested against.
and install the prerequisites using the platform-specific installation script provided:
cd manipulation
sudo scripts/setup/ubuntu/18.04/install_prereqs.sh
pip3 install --requirement requirements.txt
export PYTHONPATH=`pwd`:${PYTHONPATH}
You'll likely want to start from the manipulation root directory. Then launch your
notebook with:
jupyter notebook
The examples for each chapter that has them will be in a .ipynb file right alongside the chapter's html file, and the notebook exercises are all located in the exercises subdirectory.
If you have trouble with DRAKE, please follow the advice here. If you have trouble
with the manipulation repo, you can check for known issues (and potentially file a
new one) here.
APPENDIX B
Setting up your own
"Manipulation Station"
Simulation is an extremely powerful tool, and the fact that we can give it away
freely makes it a powerful equalizer. Just a few years ago, nobody believed that you
could develop a manipulation system in simulation and expect it to work in reality.
Nowadays, though, we are seeing consistent evidence that our simulation fidelity
is high enough that if we can achieve robust performance in simulation, then we are
optimistic that it can transfer to reality.
As a result, the primary drivers we list/provide below are based on LCM. I'll try
to point to ROS equivalents when possible. If you already have an established ROS
workflow and still want to use our drivers, then it is also possible to use simple
lcm2ros/ros2lcm "bridge" executables to e.g. publish the LCM messages on a ROS
topic (for logging, etc).
We've been quite happy with the Schunk WSG 50 gripper as a reliable and accurate
two-fingered gripper solution.
Note: We are evaluating the Franka Panda as an alternative platform (and are likely
to provide drivers for it here soon).
For the purposes of having a common workstation that we can use in the labs, we
have included here the simple environment in which we mount the robot. The robot
just barely fits in the arena -- it was designed with the dimensions limited so that it
can be moved (via a manual pallet jack) through standard office doors.
Thank you for citing these notes in your work. Please use the following citation:
My primary goal for the annotation tool is to host a completely open dialogue on
the intellectual content of the text. However, it has turned out to serve an additional
purpose: it's a convenient way to point out my miscellaneous typos and grammatical
blips. The only problem is that if you highlight a typo, and I fix it 10 minutes
later, your highlight will persist forevermore. Ultimately this pollutes the annotation
content.
• You can make a public editorial comment, but must promise to delete that comment once it has been addressed.
• You can join my "editorial" group and post your editorial comments using
this group "scope".
Ideally, once I mark a comment as "done", I would appreciate it if you can delete that
comment.
I highly value both the discussions and the corrections. Please keep them coming,
and thank you!
I'm very interested in your feedback. The annotation tool is one mechanism, but you
can also comment directly on the YouTube lectures, or even add issues to the github
repo that hosts these course notes.
I will also post a proper survey questionnaire here once the notes/course has existed
long enough for that to make sense.