A ROS 2 Interface
for the UMI-RTX robotic arm
Théo MASSA
ENSTA Bretagne, France
Under the supervision of Arnoud Visser
Intelligent Robotics Lab, Universiteit van Amsterdam, The Netherlands
https://github.com/gardegu/LAB42_RTX_control
List of Figures
3.1 The banana plush on the dark horizontal plane that supports the arm
3.2 Example of detected banana and its contour with OpenCV
3.3 The ZED Mini stereo device
3.4 One of the views of the chessboard used to calibrate the ZED Mini device
3.5 Detected corners on the previous left view, associated with the declared pattern
3.6 Work scene view and associated rectified image
3.7 Left view of the scene and the associated depth map
3.8 Left view with target detection on and depth map
3.9 3D point cloud of the scene, front and side view; on the bottom of the front view, one can see the 3D model of the stereo camera
Acknowledgement
I would like to deeply thank LAB42 and its team, especially my supervisor, Arnoud, and his colleague, Joey, for their contribution to this work and for the help they always provided when needed. This internship has been a great opportunity. It has been a privilege to be welcomed by the University of Amsterdam, a top-rated centre of scientific knowledge, and to have access to state-of-the-art technology that has allowed me to go further in my research and to broaden my horizons. I have had great working conditions here and really enjoyed my stay.
Résumé
The aim of this project is to work with an old robotic arm called the UMI-RTX (created in the 1980s) and make it grab objects on a plane with its gripper. Work has already been done on this robot by students, but mainly with old tools. The idea of this project is to implement a new way of making it work and to use more recent tools. More specifically, my objective is to set up a ROS 2 environment and build an interface that allows image analysis, trajectory planning, and target grabbing.
A plush banana was chosen as the target. With the computer vision library OpenCV and the software development kit of Stereolabs (the manufacturer of our camera), we managed, in a dedicated ROS 2 node, to detect the banana on the field and compute its 3D position.
Once this is done, this information is sent to a node dedicated to inverse kinematics. In this node, the joint states required to reach the aimed pose are computed and sent both to a simulation and to the real arm. For this, we have two nodes, each dedicated to its own part: one for the simulation, the other for the real arm.
Finally, we designed a custom graphical user interface (GUI) in which the simulation, the processed image, and the depth map are integrated, and in which we can choose between automatic control of the arm and a manual mode, where we can choose our own target pose.
Abstract
The aim of this project is to work with an old robotic arm called the UMI-RTX (created in the 1980s) and make it grab objects on a plane with its gripper. Some work has already been done on this robot by students, but mainly with old tools. The idea of this project is to implement a new way of making it work and to use more recent tools. More specifically, my goal is to set up a ROS 2 environment and build an interface that allows us to perform image analysis, trajectory planning, and target grabbing.
A plush banana has been chosen as a target. With the computer vision library OpenCV and the software development kit of Stereolabs (the manufacturer of our camera), we managed, in a dedicated ROS 2 node, to detect the banana on the field and compute its 3D position.
Once this is done, this information is sent to a node dedicated to inverse kinematics.
In this node, the joints’ states required to reach the aimed pose are processed and sent
both to a simulation and the real arm. For this, we have two nodes, each dedicated to its
own part, one for the simulation, the other for the real arm.
Finally, we designed a custom Graphical User Interface (GUI) in which the simulation, the processed image, and the depth map are integrated, and in which we are able to choose between automatic control of the arm and a manual mode, where we can choose our own target pose.
Thanks to this process, the arm is now able to follow and grab the banana. We can
display the result in our GUI and choose to manipulate the arm or let it go through the
automatic process.
Contents
1 Introduction
   1.1 Context
   1.2 Objectives
   1.3 Contribution
2 Arm manipulation
   2.1 Description
   2.2 URDF description
   2.3 Communication with the arm
   2.4 Inverse kinematics
4 Docker image
   4.1 Docker
   4.2 Necessity of Docker
   4.3 Building the bespoke project's image
5 ROS 2 Interface
   5.1 Configuration and ROS 2 presentation
   5.2 ROS 2 Architecture
   5.3 Simulation
   5.4 Custom GUI
6 Conclusion
Chapter 1
Introduction
1.1 Context
This project was carried out during an internship, mainly for educational purposes, at Lab42 at the University of Amsterdam, the Netherlands.
It is important to note that this internship was done in collaboration with Guillaume Garde, another ENSTA Bretagne student in robotics, so we worked together on this project. However, I will focus on my own contributions and describe Guillaume's work only in broad terms, as it also matters for the overall purpose of the internship. The parts covering Guillaume's work in less detail will be indicated.
The subject of this paper is an old industrial arm, the UMI-RTX, which was created in the 1980s [7][8]. Despite its age, the arm is still compatible with recent software and hardware. Its educational appeal is obvious considering the amount of work that has already been done on it, especially by Van Der Borght [1] and Dooms [3], who worked on a ROS 1 interface to control it.
1.2 Objectives
This project has several objectives. The first is to improve our ROS 2 skills and acquire new knowledge. We have studied, at various levels, a wide range of robotic disciplines such as computer vision, simulation, C++, networking, and embedded Linux, and our goal, shared by ENSTA Bretagne, our school, is to consolidate them. For students, this kind of internship is rich in learning opportunities. More importantly, the main objective of this project is to create a ROS 2 interface in order to control the arm. The robot should be able to detect a recognizable object, a yellow banana plush in our case, move to this object, and grab it. To this end, we rely on previous works [1][3] and create, starting from little more than the hardware drivers, a full ROS 2 interface.
1.3 Contribution
This project has been split in two. Guillaume's objective was to handle the computer vision part needed to achieve our goals. I personally worked on the arm manipulation, created a simulation, managed the ROS 2 architecture, and created a working custom GUI that makes all those aspects work together. Both of us worked on building the Docker image for this project.
Chapter 2
Arm manipulation
2.1 Description
This project uses the UMI-RTX arm, which is quite basic in its composition [7]. It consists of a column that translates along the z-axis and a three-part arm in which each part is connected to the next through revolute joints. These joints can be controlled in both position and velocity, but in this project they are only controlled in position, which is better suited to our task of grabbing a target at a specific location. Each motor has encoders [7] that allow it to be controlled and to report its state.
As shown in Figure 2.1, this arm can be compared to a human arm. Joint 1 corresponds to the shoulder, joint 2 to the elbow, and the 3-4-5 assembly to the wrist. For the rest of this document, they will be referred to as in the following table:
One typical characteristic of this arm is how the roll and pitch of the hand work. They are not controlled separately but together by two motors, one on each side, which produces two rotation axes with the same origin. A view of this system can be seen in the following figure:
This particular system has to be taken into account when controlling the arm, and the two motors will be referred to as WRIST1 and WRIST2.
2.2 URDF description
Listing 2.1 shows an extract of the URDF description of the arm. The entire description can be found in Appendix C.¹
Listing 2.1: URDF Description of the arm
<robot name="umi-rtx">
<link name="base_link">
<visual>
<geometry>
<box size="1.252 0.132 0.091"/>
</geometry>
<origin rpy="0 -1.57 1.57" xyz="0 -0.0455 0"/>
<material name="blue">
<color rgba="0 0 .8 1"/>
</material>
</visual>
</link>
<link name="shoulder_link">
<visual>
<geometry>
<box size="0.278 0.132 0.091"/>
</geometry>
<origin rpy="0 -1.57 0" xyz="0 0 0"/>
<material name="white">
<color rgba="1 1 1 1"/>
</material>
</visual>
</link>
This description will be particularly useful for displaying the virtual model in our simulation and for processing the inverse kinematics (see Section 2.4).
There is only one main difference between this description and reality, which concerns the wrist, particularly its pitch and roll. In the description there are two independent joints dedicated to pitch and roll, whereas in reality, as said before, two motors work together to handle those angles. Therefore, one has to be careful when converting this representation into commands for the two real wrist motors.
¹ The URDF description comes from https://github.com/LHSRobotics/hsrdp/blob/master/hsrdp_bridge/umi_rtx_100.urdf
On this model, every frame is attached to its part and represented in red, green, and blue for the x, y, and z axes.
2.3 Communication with the arm
One reason for this is that it is more intuitive to have a dedicated interface instead of controlling the arm through the terminal; see Section 5.4 for more details. To achieve this, one had to look into the source code of the drivers, understand how they communicate with the arm, and reuse their functions in our own code.
One particularity of the arm is that the motors are controlled by two 8031 chips, called IPs. Each IP ensures the proper operation of a subset of the motors; Table 2.2 lists the motors with their corresponding IP. For example, to move the arm up and down (ZED), the communication must first be switched to IP1; once switched, the new command for the motor can be entered [3]. IPC stands for Intelligent Peripheral Communication, and it uses three possible ways of communicating, relying on a request from the computer and a response from the arm (see Figure 2.5).
In order to use the arm, a precise procedure has to be followed. First, one has to start communication with the arm by launching the daemon and specifying which USB port the arm is connected to. Then, in our code, the communications are initialized by sending a specific command to the arm, and every time the arm is used, an initialization procedure has to be run so that the arm knows what the encoders' limits are. Indeed, there is no persistent memory of the encoders' limits and parameters, so they have to be initialised before every use of the arm. Fortunately, the drivers contain everything necessary to do so.
After the initialization process, everything is ready to command the arm. All that is left is to use the predefined commands to set the motors' positions in memory and then make them go to those positions. Commanding the motors is indeed a two-step process.
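To make this sequence concrete, here is a minimal C++ sketch of the logic described above. The rtx_* functions are hypothetical placeholders (declared as printing stubs so that the sketch compiles); they are not the actual driver API, which should be taken from the driver sources instead.

#include <cstdio>

// Hypothetical stand-ins for the real driver calls; NOT the actual RTX driver API.
static bool rtx_select_ip(int ip)                  { std::printf("switch to IP%d\n", ip); return true; }
static bool rtx_store_position(int motor, int cnt) { std::printf("motor %d <- %d\n", motor, cnt); return true; }
static bool rtx_go(int motor)                      { std::printf("go motor %d\n", motor); return true; }

// Two-step command described above: select the right IP, store the target, then execute it.
bool command_motor(int ip, int motor, int encoder_counts) {
  if (!rtx_select_ip(ip)) return false;
  if (!rtx_store_position(motor, encoder_counts)) return false;
  return rtx_go(motor);
}

int main() {
  // e.g. move the ZED column (handled by IP1) to an arbitrary encoder position.
  return command_motor(1, 0, 500) ? 0 : 1;
}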
2.4 Inverse kinematics
The inverse kinematics problem is much more complicated than forward kinematics because there may be no solution, exactly one, or multiple solutions, and the difficulty increases with the number of joints and degrees of freedom. Fortunately, each joint here has only one degree of freedom, so the computation is somewhat simplified.
To compute the inverse kinematics, the C++ library Pinocchio [2] was used, which provides the algorithms needed to process them. This library was selected for its versatility and efficiency, but mainly because of its integration into ROS 2 packages.
To process the inverse kinematics, a method called CLIK, for Closed-Loop Inverse Kinematics [4], was chosen, using methods and objects from the Pinocchio library. This iterative algorithm finds the joint state that brings the end-effector as close as possible to an objective defined by a position (x, y, z) and an orientation (yaw, pitch, roll).
R = R_ϕ · R_θ · R_ψ
where R_ϕ, R_θ and R_ψ are the elementary rotation matrices built from the desired yaw, pitch and roll angles. Finally, the desired pose lies in SE(3), defined by the desired position and R, the desired rotation.
Then one can define q, a vector defining the initial state of the arm. Each value of q
corresponds to a joint "value". For example, the joint value for ZED will be in metres,
whereas SHOULDER, ELBOW... are in radians.
q = (q_0, q_1, q_2, q_3, q_4, q_5)ᵀ = (ZED, SHOULDER, ELBOW, YAW, PITCH, ROLL)ᵀ
This vector is then initialised either at the neutral position of the arm or at the last known state. Initialising it at the last known state provides a sort of continuity in the solutions, because of the iterative method that is used afterwards.
Once all those parameters are set, the iterative process can begin.
• First, the forward kinematics with the current configuration defined by the vector
q is computed.
• Then, the transformation T ∈ SE(3) between the current pose and the desired one is calculated and the logarithmic error is computed, defined by:
err = log(T)
• Next, the joint velocity is obtained from the Jacobian J of the current configuration through a damped least-squares step:
v = −Jᵀ(JJᵀ + λI)⁻¹ · err
v can be considered as the velocity vector that brings the configuration closer to the desired one.
• Then one can integrate q ← q + v·dt and reiterate this process.
Once this iterative process is over, q contains the configuration required to reach the desired pose. Finally, the angles need to be converted from radians in [0, 2π] to degrees in [−180, 180] for practical purposes, and the required state is sent on the corresponding ROS topic /motor_commands.
As one can see in the figures above, this method is quite efficient, as the error decreases exponentially; it also confirms that our algorithm converges towards a solution. On the other hand, the smaller the tolerance ϵ, the greater the number of iterations required, up to the point where it is no longer possible to converge within a reasonable time, or even to converge at all.
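As an illustration, the sketch below implements this loop in C++ with Pinocchio, following the library's documented inverse-kinematics example; the joint index, tolerance, damping and step values are illustrative assumptions, not the ones used in the project.

#include <pinocchio/algorithm/kinematics.hpp>
#include <pinocchio/algorithm/jacobian.hpp>
#include <pinocchio/algorithm/joint-configuration.hpp>
#include <pinocchio/spatial/explog.hpp>
#include <Eigen/Dense>

// Damped CLIK loop: iterate q until the pose of joint `joint_id` is close enough to oMdes.
Eigen::VectorXd clik(const pinocchio::Model &model, pinocchio::Data &data,
                     const pinocchio::SE3 &oMdes, Eigen::VectorXd q, int joint_id,
                     double eps = 1e-4, double dt = 0.1, double damp = 1e-6, int it_max = 1000)
{
  pinocchio::Data::Matrix6x J(6, model.nv);
  J.setZero();
  Eigen::VectorXd v(model.nv);
  for (int i = 0; i < it_max; ++i) {
    // Forward kinematics with the current configuration q.
    pinocchio::forwardKinematics(model, data, q);
    // Transformation between the desired pose and the current one, and its logarithmic error.
    const pinocchio::SE3 dMi = oMdes.actInv(data.oMi[joint_id]);
    const Eigen::Matrix<double, 6, 1> err = pinocchio::log6(dMi).toVector();
    if (err.norm() < eps) break;                        // converged within the tolerance
    // Damped least-squares step: v = -J^T (J J^T + damp*I)^{-1} err
    pinocchio::computeJointJacobian(model, data, q, joint_id, J);
    Eigen::Matrix<double, 6, 6> JJt = J * J.transpose();
    JJt.diagonal().array() += damp;
    v.noalias() = -J.transpose() * JJt.ldlt().solve(err);
    // Integrate on the configuration manifold: q <- q (+) v*dt
    q = pinocchio::integrate(model, q, (v * dt).eval());
  }
  return q;
}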
See Chapter 5 to see how the inverse kinematics calculation is integrated into our ROS 2 architecture.
Chapter 3
To make a robotic arm grab a target, a strong starting point is to rely on one key element: computer vision [11]. Computer vision is a set of techniques used to observe the scene of interest with an optical device and extract valuable information from it. In this project, these techniques are used to detect a target and get its 3D position in the camera's frame. However, since I did not personally work in depth on this aspect, I will not go into details; I will only go through the main aspects and results of Guillaume's work, in order to document this essential aspect of the project while acknowledging that this part is not mine.
Figure 3.1: The banana plush on the dark horizontal plane that supports the arm
The first task of the computer vision part is to detect this banana. The banana was chosen because it is a convenient target: it is a standard object that is easy to find; its colour is easy to detect; its softness makes it easy for a gripper to grab; and, since most objects are not rectangular but curved, it keeps the approach fairly general.
The image processing is done with OpenCV, which includes built-in methods for computer vision. To extract the banana from the scene, one works in a specific colour space: the HSV colour space [12]. The HSV space is an appropriate colour space for colour detection: each colour is represented by a triplet of hue, saturation, and value components (in OpenCV's 8-bit representation, hue ranges from 0 to 179 while saturation and value range from 0 to 255) [12].
The extraction of an object from an image is based on contour detection, once the image has been binarized according to a specific strategy. In this case, two HSV thresholds have been selected so as to keep only the objects whose colour lies between them; contour detection can then be performed on these objects.
Figure 3.2: Example of detected banana and its contour with OpenCV
With the coordinates of the centroid, one can locate the banana in the horizontal plane. The next step is to obtain its depth with respect to the camera, in order to grab it with the UMI-RTX's gripper.
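A minimal OpenCV sketch of this detection step is shown below; the HSV bounds and the image path are illustrative assumptions that would have to be tuned to the actual lighting, not the thresholds used in the project.

#include <opencv2/opencv.hpp>
#include <algorithm>
#include <vector>

int main() {
  cv::Mat bgr = cv::imread("scene_left.png");                     // illustrative input image
  if (bgr.empty()) return 1;

  // Binarize the image between two HSV thresholds (a yellow-ish band here).
  cv::Mat hsv, mask;
  cv::cvtColor(bgr, hsv, cv::COLOR_BGR2HSV);
  cv::inRange(hsv, cv::Scalar(20, 100, 100), cv::Scalar(35, 255, 255), mask);

  // Contour detection on the binarized image.
  std::vector<std::vector<cv::Point>> contours;
  cv::findContours(mask, contours, cv::RETR_EXTERNAL, cv::CHAIN_APPROX_SIMPLE);
  if (contours.empty()) return 1;

  // Keep the largest contour, assumed to be the banana, and compute its centroid.
  auto largest = std::max_element(contours.begin(), contours.end(),
      [](const std::vector<cv::Point> &a, const std::vector<cv::Point> &b) {
        return cv::contourArea(a) < cv::contourArea(b);
      });
  const cv::Moments m = cv::moments(*largest);
  if (m.m00 <= 0) return 1;
  const cv::Point centroid(static_cast<int>(m.m10 / m.m00), static_cast<int>(m.m01 / m.m00));

  // Draw the contour and the centroid for visual checking.
  cv::drawContours(bgr, contours, static_cast<int>(largest - contours.begin()), cv::Scalar(0, 0, 255), 2);
  cv::circle(bgr, centroid, 4, cv::Scalar(255, 0, 0), -1);
  cv::imwrite("detected.png", bgr);
  return 0;
}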
The main idea behind stereo vision is to reproduce human vision [13]. One will use
the difference in perception of the scene to extract depth information.
¹ https://www.stereolabs.com/zed-mini/
3.2.1 Parameters
One needs to know some parameters associated with the scene and the camera in order to access depth information. The first type are the intrinsic parameters [13], which are internal to the camera, while the second type are the extrinsic parameters [13], which link the scene reference frame to that of the camera. A third type consists of matrices that will not be described here. Accessing depth information can only be done once these parameters have been determined.
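For reference, in the standard pinhole model these parameters combine into the projection s·(u, v, 1)ᵀ = K [R | t] (X, Y, Z, 1)ᵀ, where K is the intrinsic matrix (focal lengths and principal point), [R | t] gathers the extrinsic rotation and translation from the scene frame to the camera frame, and s is a scale factor.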
Figure 3.4: One of the views of the chessboard used to calibrate the ZED Mini device
One has to declare the inner pattern that the algorithm will search for. It is very important to count the inner corners correctly for the algorithm to work, and the size of a square must also be provided.
Once this is done, the 2D positions of the corners in the left and right images are saved and, for each pair of images, the 3D coordinates of the corners with respect to the top-left corner (with no knowledge of the depth yet) are saved too. Finally, thanks to dedicated OpenCV functions, the desired parameters can be computed.
Figure 3.5: Detected corners on the previous left view, associated with the declared pattern
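The sketch below illustrates this calibration step with OpenCV; the pattern size, square size and file names are illustrative assumptions, not the actual values used during the internship.

#include <opencv2/opencv.hpp>
#include <cstdio>
#include <string>
#include <vector>

int main() {
  const cv::Size pattern(8, 6);        // declared inner-corner pattern (illustrative)
  const float square = 0.025f;         // square size in metres (illustrative)
  std::vector<std::vector<cv::Point3f>> object_pts;
  std::vector<std::vector<cv::Point2f>> left_pts, right_pts;
  cv::Size image_size;

  for (int i = 0; i < 20; ++i) {       // 20 calibration pairs (illustrative)
    cv::Mat left  = cv::imread("left_"  + std::to_string(i) + ".png", cv::IMREAD_GRAYSCALE);
    cv::Mat right = cv::imread("right_" + std::to_string(i) + ".png", cv::IMREAD_GRAYSCALE);
    if (left.empty() || right.empty()) continue;
    image_size = left.size();

    // 2D corner positions in both views.
    std::vector<cv::Point2f> cl, cr;
    if (!cv::findChessboardCorners(left, pattern, cl) ||
        !cv::findChessboardCorners(right, pattern, cr)) continue;
    const cv::TermCriteria crit(cv::TermCriteria::EPS + cv::TermCriteria::COUNT, 30, 0.01);
    cv::cornerSubPix(left,  cl, cv::Size(11, 11), cv::Size(-1, -1), crit);
    cv::cornerSubPix(right, cr, cv::Size(11, 11), cv::Size(-1, -1), crit);

    // 3D corner coordinates on the board plane (z = 0), relative to the top-left corner.
    std::vector<cv::Point3f> grid;
    for (int r = 0; r < pattern.height; ++r)
      for (int c = 0; c < pattern.width; ++c)
        grid.emplace_back(c * square, r * square, 0.0f);

    object_pts.push_back(grid);
    left_pts.push_back(cl);
    right_pts.push_back(cr);
  }

  // Calibrate each camera, then estimate the rotation R and translation T between them.
  cv::Mat K1, D1, K2, D2, R, T, E, F;
  std::vector<cv::Mat> rvecs, tvecs;
  cv::calibrateCamera(object_pts, left_pts,  image_size, K1, D1, rvecs, tvecs);
  cv::calibrateCamera(object_pts, right_pts, image_size, K2, D2, rvecs, tvecs);
  const double rms = cv::stereoCalibrate(object_pts, left_pts, right_pts, K1, D1, K2, D2,
                                         image_size, R, T, E, F, cv::CALIB_FIX_INTRINSIC);
  std::printf("Stereo calibration RMS reprojection error: %.3f px\n", rms);
  return 0;
}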
Figure 3.7: Left view of the scene and the associated depth map
On the disparity map, one notices many levels of grey: the brighter the pixel, the closer the object. This map is still noisy, but it is one of the cleanest disparity maps obtained with this method during the internship, even though it remains far from perfect.
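This behaviour follows directly from the geometry of a rectified stereo pair: for a matched point, the depth is Z = f·B / d, where f is the focal length in pixels, B the baseline between the two optical centres, and d = x_left − x_right the disparity. The closer the object, the larger the disparity, which is why nearby regions appear brighter on the map.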
3.3.1 Stereolabs
Stereolabs is a French company based in Silicon Valley and a world leader in the use of stereo vision and artificial intelligence to provide 3D depth and motion sensing solutions. It sells stereo cameras, software, and embedded PCs for 3D perception and AI. This project works with one of their cameras, the ZED Mini, initially designed for mixed reality, together with their software development kit (SDK).
the SDK’s tools. The advantage of doing so is that the code is much lighter and clearer
now. There is, indeed, no use for a calibration or rectification part. Thanks to the SDK,
one can use the stereo camera nearly directly at full capacity.
One can see below the depth map obtained for our system. After some convenient
conversions, those images can now be included in the ROS 2 architecture described later.
Figure 3.8: Left view with target detection on and depth map
Those figures allow a fair comparison with the images from Section 3.2.4. They show the project's work space with the robotic arm and the banana. In the first one, it is clear that the banana has been detected and its contour drawn. In the second one, one can see the UMI-RTX, brighter than the rest of the image, above its planar support (much darker); the banana target lies there on a box.
Afterwards, the process is similar. The target detection is identical to Section 3.1, but the acquisition of the target's orientation (yaw, pitch, and roll) has been added to it. Then, once the contour of the banana and its best-fitting line have been found, an orientation can be derived for the target. At this point, one also has the coordinates of its centroid; this is where the 3D point cloud comes in.
Figure 3.9: 3D point cloud of the scene. Front and side view. On the bottom of the front
view, one can see the 3D model of the stereo camera
Since the coordinates of the target's centroid are known in pixels, the corresponding metric coordinates can be looked up in the point cloud, which gives the 3D coordinates of the banana in millimetres. However, these are expressed in the reference frame of the left camera, so offsets with respect to the origin of the UMI-RTX's world frame have to be added. At this point, one just has to publish the coordinates of the target.
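As an illustration of this lookup, here is a minimal sketch in the spirit of the Stereolabs SDK tutorials; the pixel coordinates stand in for the detected centroid and the parameters are illustrative, not the project's actual settings.

#include <sl/Camera.hpp>
#include <cstdio>

int main() {
  // Open the ZED Mini with depth enabled and millimetre units.
  sl::Camera zed;
  sl::InitParameters init;
  init.depth_mode = sl::DEPTH_MODE::ULTRA;
  init.coordinate_units = sl::UNIT::MILLIMETER;
  if (zed.open(init) != sl::ERROR_CODE::SUCCESS) return 1;

  sl::Mat point_cloud;
  if (zed.grab() == sl::ERROR_CODE::SUCCESS) {
    // Retrieve the 3D point cloud and read the point under a given pixel.
    zed.retrieveMeasure(point_cloud, sl::MEASURE::XYZRGBA);
    const size_t u = 640, v = 360;          // illustrative pixel (the banana centroid in practice)
    sl::float4 p;
    point_cloud.getValue(u, v, &p);         // X, Y, Z in millimetres, left-camera frame
    std::printf("Target at (%.1f, %.1f, %.1f) mm\n", p.x, p.y, p.z);
  }
  zed.close();
  return 0;
}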
Chapter 4
Docker image
4.1 Docker
To solve this issue, it was decided to create a Docker image of an Ubuntu 20.04 installation that takes advantage of the computer's hardware. This containerization allows anyone who has Docker to use our project: a ROS installation is no longer needed, only Docker, which makes the project really portable. The only constraint is that it requires an NVIDIA graphics card, for the reasons explained in Section 3.3.
¹ https://www.docker.com/
Chapter 5
ROS 2 Interface
ROS 2 brings several key benefits to developers and roboticists. It offers improved
performance and reliability thanks to its optimised middleware, which enables faster and
more efficient communication between nodes. This enhanced performance is particularly
beneficial for applications requiring real-time or low-latency operations. ROS 2 also emphasises security and safety with features such as authentication and fine-grained access control, making it more suitable for sensitive applications and environments [9].
¹ https://docs.ros.org/en/foxy/index.html
ROS 2 communications are based on two principal concepts: nodes and topics. Nodes are individual software modules that each perform a specific task within a robotic system; they are the fundamental building blocks of a ROS application and communicate with each other by passing messages. Topics, in turn, are the communication channels used by nodes to exchange those messages: a topic is a named bus on which nodes can publish messages or subscribe to receive them. This follows the publish-subscribe communication pattern, where nodes that generate data publish messages to a topic, and nodes interested in that data subscribe to the topic to receive the messages [9].
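To illustrate this pattern, here is a minimal rclcpp node that subscribes to one topic and republishes on another; the node and topic names are illustrative, not those of the project.

#include <rclcpp/rclcpp.hpp>
#include <geometry_msgs/msg/point.hpp>
#include <memory>

// Minimal publish-subscribe example: forward a 3D point from one topic to another.
class TargetRelay : public rclcpp::Node {
public:
  TargetRelay() : Node("target_relay") {
    pub_ = create_publisher<geometry_msgs::msg::Point>("target_out", 10);
    sub_ = create_subscription<geometry_msgs::msg::Point>(
        "target_in", 10,
        [this](const geometry_msgs::msg::Point::SharedPtr msg) { pub_->publish(*msg); });
  }
private:
  rclcpp::Publisher<geometry_msgs::msg::Point>::SharedPtr pub_;
  rclcpp::Subscription<geometry_msgs::msg::Point>::SharedPtr sub_;
};

int main(int argc, char **argv) {
  rclcpp::init(argc, argv);
  rclcpp::spin(std::make_shared<TargetRelay>());
  rclcpp::shutdown();
  return 0;
}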
Here, the central node is /GUI. This node runs the GUI and centralizes the information. It sends the targeted pose to the /inverse_kinematics node, which processes the inverse kinematics in order to compute and send, through the /motor_commands topic, the joint states required for the arm to reach its target. The /simulation node subscribes to this topic and, through /robot_state_publisher, sends the robot description to the simulation panel integrated in our GUI, which will be described in Section 5.4. This closed loop allows us to control the arm as wanted.
In this system, one has to be careful with how the yaw is handled, because for the real arm the yaw reads 0 when it is in reality equal to arctan2(y, x), where x and y are the coordinates of the end-effector. To be clearer: in simulation, the yaw origin is the y-axis, whereas for the encoders the yaw origin and neutral point follow the axis between the ZED column and the wrist. Because of this, one has to be careful to avoid confusion with the yaw value.
Concerning the node handling the stereo vision, /camera_api, the ZED SDK described earlier is used. Our working version of the interface fully relies on the SDK, and this node sends a lot of information: the desired pose and two images, namely the depth map and an image in which the object is outlined. This data is sent to /GUI, which is the intermediary between the data and the simulation and/or the arm.
For the real arm, the node graph is a bit more complicated due to the use of the arm node. This node also subscribes to the targeted pose so as to be aware of when the target changes, which avoids useless calculations: sending commands to the arm only when the target changes costs fewer resources than sending commands at every loop, and it also makes the system more reactive to changes.
Every topic carries its own message type, chosen among the standard messages that exist in ROS 2. Below are the message types associated with every topic of our ROS architecture.
Table 5.1: Messages description
5.3 Simulation
Simulation takes place in RViz2, a ROS 2 visualization tool, in which every frame is well defined thanks to the inverse kinematics and the TFs included in ROS 2. A TF (transform frame) provides a convenient way to manage the relative positions of frames by maintaining a tree-like structure of frames and the transformations between them. It helps to seamlessly convert coordinates from one frame to another and to handle various operations involving transformations and coordinate systems, which greatly simplifies the placement of the frames in the simulation. These TFs are initialised from the URDF file and then updated by the inverse kinematics algorithm, which gives the position of each joint. The rendering of this simulation and how it is processed are explained in the following part, dedicated to our GUI.
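As a sketch of how joint values can reach this simulation, the hypothetical node below publishes a sensor_msgs/JointState message that robot_state_publisher turns into TFs; the joint names and values are illustrative and would have to match the URDF in a real setup.

#include <rclcpp/rclcpp.hpp>
#include <sensor_msgs/msg/joint_state.hpp>
#include <chrono>
#include <memory>

// Publish a (fixed) joint configuration at 10 Hz on /joint_states.
class JointStateSketch : public rclcpp::Node {
public:
  JointStateSketch() : Node("joint_state_sketch") {
    pub_ = create_publisher<sensor_msgs::msg::JointState>("joint_states", 10);
    timer_ = create_wall_timer(std::chrono::milliseconds(100), [this]() {
      sensor_msgs::msg::JointState msg;
      msg.header.stamp = now();
      msg.name = {"zed", "shoulder", "elbow", "wrist1", "wrist2"};   // illustrative joint names
      msg.position = {0.30, 0.50, -0.80, 0.0, 0.0};                  // metres for zed, radians elsewhere
      pub_->publish(msg);
    });
  }
private:
  rclcpp::Publisher<sensor_msgs::msg::JointState>::SharedPtr pub_;
  rclcpp::TimerBase::SharedPtr timer_;
};

int main(int argc, char **argv) {
  rclcpp::init(argc, argv);
  rclcpp::spin(std::make_shared<JointStateSketch>());
  rclcpp::shutdown();
  return 0;
}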
5.4 Custom GUI
Thanks to Qt5, a custom GUI (Graphical User Interface) was designed. It allows us to define the desired pose when we want to control the arm manually, to see the simulation thanks to the integration of RViz2 into the GUI, and to check the processed image or the depth map as well. It is an all-in-one interface, allowing us to control the arm as we want.
Some aspects of this interface required more work because they were harder to implement. Integrating RViz2 into this custom interface required a lot of digging into RViz's API; unfortunately, the documentation was sparse, so it was challenging to understand which functions, classes, and concepts to use.
To quickly explain how RViz works: it is based on Qt (which facilitates its integration into our own GUI) and relies on what is called a render_panel. This is, roughly, the "frame" in which everything happens, and it is this object that is integrated into our interface. However, the render_panel by itself is not sufficient to display everything: a VisualizationManager, a tool that allows us to add specific displays to the panel, has to be instantiated. Thanks to this tool, one can add the RViz default displays that show the arm and the TFs, namely rviz_default_plugins/RobotModel and rviz_default_plugins/TF. With only this, however, one would see only the frames defined by the TFs and not the model, because the RobotModel display needs to subscribe to a ROS 2 topic providing the description of the robot. A display property is therefore used to subscribe to /robot_description, which is such a topic.
Finally, one just has to add the tool necessary to move the camera inside this panel, and the result is a fully custom and customizable RViz2 integration in our interface.
One particularity of this custom GUI is that it is linked to a ROS node. This is necessary because, as explained above, the interface allows you to interact with the targeted pose; for this, a connection with the node that publishes this targeted pose is needed, all the more so since we have to choose whether the arm is controlled manually or automatically. To do so, a switch button was included in the interface that modifies a public Boolean of the node; this Boolean defines which type of commands is wanted for our system.
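The mechanism can be sketched as follows; the class names are illustrative and the real interface is more elaborate, but the principle is the one described above: a Qt signal toggles a public Boolean of the node linked to the GUI.

#include <QApplication>
#include <QPushButton>
#include <rclcpp/rclcpp.hpp>
#include <memory>
#include <thread>

// Node linked to the GUI; the public flag is read by the code that publishes commands.
class GuiNode : public rclcpp::Node {
public:
  GuiNode() : Node("gui_node") {}
  bool manual_mode = true;
};

int main(int argc, char **argv) {
  rclcpp::init(argc, argv);
  auto node = std::make_shared<GuiNode>();

  QApplication app(argc, argv);
  QPushButton button("Manual mode");
  QObject::connect(&button, &QPushButton::clicked, [&]() {
    node->manual_mode = !node->manual_mode;               // switch between manual and grab mode
    button.setText(node->manual_mode ? "Manual mode" : "Grab mode");
  });
  button.show();

  // Spin ROS in a background thread while Qt runs its own event loop.
  std::thread spinner([node]() { rclcpp::spin(node); });
  const int ret = app.exec();
  rclcpp::shutdown();                                     // makes rclcpp::spin return
  spinner.join();
  return ret;
}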
Thanks to "Manual mode", one can control the arm with the sliders displayed on our
interface. If the button is clicked, it switches to "Grab mode", in which the arm follows
a predefined procedure: it goes to where the object is, grabs it, displays the banana, and
then puts it at another place. The algorithm works, but there are still minor issues. Be-
cause of the lack of precision, sometimes the grip misses the banana, but it is always very
close. Despite that, the planned trajectory is well followed; one just has to be careful not
to plan the arm to move too fast because the system is really limited by the motors’ speed.
In order to assess this, a series of tests was conducted. Each test consisted of launching the automatic mode from a random position of the arm and of the target, the only condition being that the arm had to start above the target. It turned out that 63% of the attempts were successful, and in most of the missed attempts the arm was close to the target and missed by very little, which may be a consequence of the limited precision of our algorithm. Moreover, when the arm grabbed the target correctly, it placed it correctly at its end pose (a predefined position) in 85% of the tries. These success rates can be considered sufficient for the time spent on the task.
Chapter 6
Conclusion
Our progress during this internship has been interesting. Our purpose was to create a
ROS 2 interface that could make the UMI-RTX grab a simple object using computer
vision and inverse kinematics. We are proud of the work accomplished during these past
few months. It was quite a challenge to bring together state-of-the-art and much older technology.
That does not mean there is no work left to be done. Our interface can be improved.
Nonetheless, it works efficiently. We managed to get a strong computer vision process,
and the arm responds well to the commands we send to it. The simulation is also efficient and well integrated into the GUI, which allows us to gather all the useful commands
and data. More concretely, we succeeded in using stereo vision to access the target’s 3D
position with respect to the UMI-RTX’s world frame; in converting these coordinates into
orders for the arm to follow a calculated trajectory; and in grabbing the target and lifting
it to another location. In that aspect, we can say that we have found a way to accomplish
our mission.
It is also important to mention the work we did with Docker. Creating a custom
Docker image was quite challenging, but we succeeded in doing it, making the project
way more accessible now. This custom image has several advantages. First, it allows everyone who has an Nvidia GPU and Docker installed to use the arm; Ubuntu and ROS are no longer mandatory to do so. Secondly, it also allows us to gather in a single file everything that needs to be done to get the project running. The only drawback is that we did not manage to build a fully self-made custom image. But in the end, it works
really well, and we are quite satisfied with the work we did on it.
As more general improvements, we could work on the threshold settings that enable the code to detect the targeted banana: depending on the light in the laboratory, there can be noise that leaves the banana undetected. Concerning the arm, efficiency can still be improved; when using the automated mode, the movement is quite jerky due to the time the arm needs to reach a position. However, taking a step back and looking at our work as a whole, it seems we have globally accomplished what we wanted to do, despite minor aspects that still need some work.
We are also very satisfied with all that we have learned. We have consolidated our
ROS 2 skills, improved our Docker knowledge, and learned a lot about computer vision
and creating a custom GUI. We are grateful for all of this.
Bibliography
[1] Sebastian Van Der Borght. Camera gebaseerde robotsturing. Master’s thesis, KU
Leuven, 2015-2016.
[3] Xavier Dooms. Camera gebaseerde robotsturing d.m.v. ros implementatie met
opencv. Master’s thesis, KU Leuven, 2014-2015.
[4] Dániel András Drexler. Solution of the closed-loop inverse kinematics algorithm using
the crank-nicolson method. In 2016 IEEE 14th International Symposium on Applied
Machine Intelligence and Informatics (SAMI), pages 351–356, 2016.
[5] Aidan Fuller, Zhong Fan, Charles Day, and Chris Barlow. Digital twin: Enabling
technologies, challenges and open research. IEEE Access, 8:108952–108971, 2020.
[6] Richard Hartley and Andrew Zisserman. Multiple View Geometry in computer vision.
Cambridge, 2006. Chapters 9 to 12.
[7] Universal Machine Intelligence Ltd. Inside RTX: Guide to the Design, Mechanics
and Electronics. April 1987.
[8] Universal Machine Intelligence Ltd. Maintenance Manual for RTX, August 1987.
[9] Steven Macenski, Tully Foote, Brian Gerkey, Chris Lalancette, and William Woodall.
Robot operating system 2: Design, architecture, and uses in the wild. Science
Robotics, 7(66):eabm6074, 2022.
[10] Dirk Merkel. Docker: lightweight linux containers for consistent development and
deployment. Linux journal, 2014(239):2, 2014.
[11] P.K. Shukla, K.P. Singh, A.K. Tripathi, and A. Engelbrecht. Computer Vision and
Robotics: Proceedings of CVR 2022. Algorithms for Intelligent Systems. Springer
Nature Singapore, 2023.
[12] Hélène Thomas. Traitement numérique des images. Technical report, ENSTA Bre-
tagne, 2 rue François Verny, 29200 France, 2023. Private communication.
[13] Hélène Thomas. Vision par ordinateur. Technical report, ENSTA Bretagne, 2 rue
François Verny, 29200 France, 2023. Private communication.
How to cite this report
Do not forget to cite this report if it was of any help in your project. Here is the corresponding BibTeX entry:
@mastersthesis{Massa2023,
title={A ROS 2 Interface
for the UMI-RTX robotic arm},
author={MASSA, Theo},
school={ENSTA Bretagne and the University of Amsterdam},
year={2023}
}
Appendices
<link name="base_link">
<visual>
<geometry>
<box size="1.252 0.132 0.091"/>
</geometry>
<origin rpy="0 -1.57 1.57" xyz="0 -0.0455 0"/>
<material name="blue">
<color rgba="0 0 .8 1"/>
</material>
</visual>
</link>
<link name="shoulder_link">
<visual>
<geometry>
<box size="0.278 0.132 0.091"/>
</geometry>
<origin rpy="0 -1.57 0" xyz="0 0 0"/>
<material name="white">
<color rgba="1 1 1 1"/>
</material>
</visual>
</link>
<link name="shoulder_to_elbow">
<visual>
<geometry>
<box size="0.252 0.10 0.10"/>
</geometry>
<origin rpy="0 0 0" xyz="0.126 0 0"/>
<material name="white">
<color rgba="1 0 0 1"/>
</material>
</visual>
</link>
<link name="elbow_to_wrist">
<visual>
<geometry>
<box size="0.252 0.10 0.07"/>
</geometry>
<origin rpy="0 0 0" xyz="0.126 0 0"/>
<material name="white">
<color rgba="1 0 0 1"/>
</material>
</visual>
</link>
<link name="wrist_gripper_connection_to_gripper">
<visual>
<geometry>
<box size="0.10 0.10 0.05"/>
</geometry>
<origin rpy="0 0 0" xyz="0.05 0 0"/>
<material name="white">
<color rgba="1 0 0 1"/>
</material>
</visual>
</link>
<link name="wrist_to_wrist_gripper_connection">
<visual>
<geometry>
<link name="virtual_link"/>
<link name="gripper_base">
<visual>
<geometry>
<box size="0.077 0.15 0.078"/>
</geometry>
<origin rpy="0 0 0" xyz="0.10 -0.02 0.012"/>
<material name="white">
<color rgba="0.5 0.5 0.5 1"/>
</material>
</visual>
</link>
<link name="left_finger">
<visual>
<geometry>
<box size="0.07 0.015 0.025"/>
</geometry>
<origin rpy="0 0 0" xyz="0.06 -0.04 0"/>
<material name="white">
<color rgba="0.5 0.5 0.5 1"/>
</material>
</visual>
</link>
<link name="right_finger">
<visual>
<geometry>
<box size="0.07 0.015 0.025"/>
</geometry>
<origin rpy="0 0 0" xyz="0.06 0.04 0"/>
<material name="white">
<color rgba="0.5 0.5 0.5 1"/>
</material>
</visual>
</link>
D. Dockerfile
ARG DEBIAN_FRONTEND=noninteractive
SHELL ["/bin/bash", "-c"]
ENV USER=root
# Setlocale
RUN apt update && apt install locales
RUN locale-gen en_US en_US.UTF-8
RUN update-locale LC_ALL=en_US.UTF-8 LANG=en_US.UTF-8
RUN export LANG=en_US.UTF-8
RUN apt install software-properties-common -y
RUN add-apt-repository universe
WORKDIR /home/Stage
RUN git clone https://github.com/gardegu/LAB42_RTX_control.git
WORKDIR /home/Stage/LAB42_RTX_control
RUN ./install_dependencies.sh
RUN mkdir logs
WORKDIR /home/Stage/LAB42_RTX_control
RUN apt install nano -y