Classification and Realization of the Different Vision Based Tasks
François CHAUMETTE
IRISA / INRIA Rennes, Campus de Beaulieu,
35042 Rennes-cedex, France
Patrick RIVES
INRIA Sophia Antipolis, 2004 Route des Lucioles,
06565 Valbonne, France
and
Bernard ESPIAU
INRIA Sophia Antipolis, 2004 Route des Lucioles,
06565 Valbonne, France
ABSTRACT
This paper describes the different robotics tasks which can be realized using the
visual servoing approach. A classification of these tasks, based on the virtual linkage
formalism, is proposed and the different properties that the visual features and their
related interaction matrix have to satisfy are presented. For the most classical virtual
linkages between a camera and its environment, several combinations of visual features
are selected. Finally, we recall the application of the task function approach to visual
servoing by focusing on automatic control aspects.
1. Introduction
are really required. To complete the task correctly, the path provided by the high
level should therefore be transformed into a specification at the control level. To do
that, we express a task at a low level as a set of constraints on the motion of a frame
linked to the robot with respect to a reference frame associated with the environment.
In this framework, this paper focuses on the use of the so-called visual servoing
approach as an adequate way to perform non-contact sensor-based tasks in robotics.
By visual servoing, we mean the use of vision sensors embedded in a closed-loop control
scheme, providing measurements of the interaction between the robot and its local
environment at a sampling rate high enough to ensure the stability of the robot's
control loop. Non-contact sensing is useful in the achievement of many kinds of robotics
tasks in various application domains. For example, relative positioning errors due to
inaccuracies in the modeling of robot kinematics, or to elasticities in the case of large
lightweight structures for space applications, can thus be compensated. Another important
problem, which arises in the case of mobile robots, is on-line obstacle avoidance when
the environment is not well known, imperfectly modeled or liable to unexpected changes.
Sensor-based control is again an elegant way of addressing this problem.
In the following, we therefore propose a framework for designing and efficiently
performing vision-based tasks. In order to be easily tractable by the user, the modeling
aspect is based on the well-known theory of mechanisms. Efficiency and robustness
will be handled at the control level by using the task function approach introduced by
Samson [20].
The paper is organized as follows. In section 2, we recall the concept of interaction,
which links a mobile camera to its environment. Explicit results are given for the most
usual geometrical primitives of a scene (points, segments, lines, circles, spheres and
cylinders). In section 3, we present the formalism of virtual linkage, which allows us to
classify the different vision-based tasks. For the most classical virtual linkages (rigid,
revolute, prismatic, etc.), several combinations of visual features are obtained by using
the properties of their interaction matrices. Robustness issues of vision-based tasks
are also discussed. Section 4 is devoted to control aspects and to the application of the
task function approach to visual servoing. Finally, section 5 gives simulation results of
the realization of several vision-based tasks by a six-jointed robot arm and a mobile
robot with non-holonomic constraints.
From a general point of view, let us consider a sensor (S), rigidly fixed to a robot,
with associated frame (FS). This sensor perceives a part of the environment which
contains a target (T) with associated frame (FT). Let the one-dimensional signal sj be
the useful part of the signal provided by (S). We shall further assume in the following
that, given (S) and (T), the signal sj is only a function of the relative location (position
and attitude) of (T) with respect to (S). In practice, this assumption is reasonable
for a large class of sensors providing geometric information, like range finders or
force sensors. Let us remark that, in the particular case of vision sensors, the use of
data based on geometric features like points, lines, etc., is allowed, while data based
on photometric properties, which may depend on the lighting conditions, cannot be
considered. The relative location of (FT) with respect to (FS) can be parametrized by
an element r̄ of the displacement space SE3. Moreover, let us consider that the location
of (FS) with respect to a fixed frame (F0) is a function of the robot joint coordinates q,
and let us use the variable t to parametrize the location of the target (when its motion
is independent). We may thus write:

sj = sj(r̄(q, t))        (1)
where sj is a C 2 function with value in R and domain in SE3 . The signal sj admits a
derivative represented by an element of se3 and we can write the differential:
ṡj = (∂sj/∂r̄) • (dr̄/dt) = Hsj • TST        (2)
where:
- TST = (V, ω) is the velocity screw of the frame FT with respect to the frame FS ,
where V and ω represent its translational and rotational components respectively;
With an obvious abuse of notation, Eq. 2 can also be written in matrix form:
where TS and TT are the sensor and target velocity screws respectively, expressed in the
sensor frame FS.
As an important remark, we can note that, for a given situation (S, T), Hsj and L^T_sj
only depend on the characteristics of the sensor itself and of the target; it should be
emphasized that they are independent of the robot itself. Thereby, Hsj and L^T_sj are only
representative of the interaction between the sensor and the robot's environment. For
this reason, they are called the interaction screw and the interaction matrix respectively,
and contain most of the information required to design sensor-based control schemes.
This general framework is valid for a large class of sensors as long as they satisfy
the assumption mentioned previously. This is in particular the case of visual servoing,
as shown now.
2.2 Case of an Image Sensor
Let us model a camera by a perspective projection (see figure 1). In the following,
all variables used (point coordinates, screws, etc.) are expressed in the camera frame
FS (O, ~x, ~y, ~z).
Without loss of generality, the focal length is assumed to be equal to 1, so that any
point with coordinates x = (x y z)T is projected on the image plane as a point with
coordinates X = (X Y 1)T with:
X = (1/z) x        (4)
In [22] and [10], an experimental learning approach is proposed to compute the
interaction matrix related to points, but it is also possible to derive it in an explicit
way: see for example [7]. More generally, a systematic method for computing the
interaction matrix of any set of visual features defined upon geometrical primitives
is proposed in [6] for this camera model. As will be shown in the next section, the choice
of the visual features able to realize any vision-based task is essentially based on these
interaction matrices. We now present the results obtained for the parameters describing
the projection in the image of the most usual primitives (points, segments, lines, circles,
spheres and cylinders). Let us note that all the computations leading to the following
expressions are detailed in [3].
2.2.1. Points
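As a minimal numerical sketch, assuming the classical two-row interaction matrix of an image point under the unit-focal-length model of Eq. 4 (a form consistent with the Xc and Yc rows appearing in Eqs. 27 and 38 below; function and variable names are illustrative):

    import numpy as np

    def point_interaction_matrix(X, Y, z):
        """Sketch: classical interaction matrix of an image point (X, Y) with
        depth z under the unit-focal-length model of Eq. 4.  Rows give
        (Xdot, Ydot) as functions of the camera screw (Vx, Vy, Vz, wx, wy, wz)
        expressed in the camera frame FS."""
        return np.array([
            [-1.0 / z, 0.0,      X / z, X * Y,      -(1.0 + X**2),  Y],
            [ 0.0,     -1.0 / z, Y / z, 1.0 + Y**2, -X * Y,        -X],
        ])

    # Example: a point on the optical axis at a depth of 1 m
    print(point_interaction_matrix(0.0, 0.0, 1.0))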
2.2.2. Segments
L^T_Xc = [ −λ2   0   λ2 Xc − λ1 l cos α/4   Xc Yc + l² cos α sin α/4   −(1 + Xc² + l² cos²α/4)   Yc ]
L^T_Yc = [ 0   −λ2   λ2 Yc − λ1 l sin α/4   1 + Yc² + l² sin²α/4   −Xc Yc − l² cos α sin α/4   −Xc ]
        (6)

with λ1 = (z1 − z2)/(z1 z2) and λ2 = (z1 + z2)/(2 z1 z2).
2.2.3. Lines

A straight line is here represented as the intersection of two planes, under the following
form:

h(x, p) :   a1 x + b1 y + c1 z = 0
            a2 x + b2 y + c2 z + d2 = 0        (7)

with d2 ≠ 0 in order to exclude degenerate cases. The equation of the projected line
in the image plane is written:

g(X, P) = X cos θ + Y sin θ − ρ = 0   with   cos θ = a1/√(a1² + b1²),  sin θ = b1/√(a1² + b1²),  ρ = −c1/√(a1² + b1²)        (8)
The related interaction matrix is given by:

L^T_θ = [ λθ cos θ   λθ sin θ   −λθ ρ   −ρ cos θ   −ρ sin θ   −1 ]
L^T_ρ = [ λρ cos θ   λρ sin θ   −λρ ρ   (1 + ρ²) sin θ   −(1 + ρ²) cos θ   0 ]        (9)

with λθ = (a2 sin θ − b2 cos θ)/d2 and λρ = (a2 ρ cos θ + b2 ρ sin θ + c2)/d2.
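As an illustrative sketch (assuming the (θ, ρ) representation of Eq. 8 and the λθ, λρ definitions above; names are illustrative), these quantities can be evaluated as follows:

    import numpy as np

    def line_interaction(plane1, plane2):
        """Sketch: image-line parameters (theta, rho) of Eq. 8 and the interaction
        rows L_theta, L_rho, with lambda_theta and lambda_rho as defined above.
        plane1 = (a1, b1, c1) passes through the origin, plane2 = (a2, b2, c2, d2)."""
        a1, b1, c1 = plane1
        a2, b2, c2, d2 = plane2
        n = np.hypot(a1, b1)
        cth, sth, rho = a1 / n, b1 / n, -c1 / n                      # Eq. 8
        lam_th = (a2 * sth - b2 * cth) / d2
        lam_rho = (a2 * rho * cth + b2 * rho * sth + c2) / d2
        L_theta = [lam_th * cth, lam_th * sth, -lam_th * rho, -rho * cth, -rho * sth, -1.0]
        L_rho = [lam_rho * cth, lam_rho * sth, -lam_rho * rho,
                 (1.0 + rho**2) * sth, -(1.0 + rho**2) * cth, 0.0]
        return (np.arctan2(sth, cth), rho), np.array([L_theta, L_rho])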
2.2.4. Circles
Its projection on the image plane takes the form of an ellipse or a circle (except in
degenerate cases where the projection is a segment), which may be represented as:
Xc = (K1 K3 − K2 K4)/(K2² − K0 K1)
Yc = (K0 K4 − K2 K3)/(K2² − K0 K1)
E  = (K1 − K0 ± √((K1 − K0)² + 4K2²))/(2K2)
A² = 2(K0 Xc² + 2K2 Xc Yc + K1 Yc² − K5)/(K0 + K1 ± √((K1 − K0)² + 4K2²))
B² = 2(K0 Xc² + 2K2 Xc Yc + K1 Yc² − K5)/(K0 + K1 ∓ √((K1 − K0)² + 4K2²))
where the coefficients Ki ’s are obtained from the polynomial equation of an ellipse:
with a = α/(αx0 + βy0 + γz0 ), b = β/(αx0 + βy0 + γz0 ) and c = γ/(αx0 + βy0 + γz0 ).
The natural parametrization (Xc , Yc , E, A, B) is always ambiguous. Furthermore,
the orientation E is not defined when the ellipse is a circle (A = B). So we use a
representation based on the normalized inertial moments (Xc , Yc , µ20 , µ11 , µ02 ) with:
µ20 = (A² + B² E²)/(1 + E²)
µ11 = E (A² − B²)/(1 + E²)        (14)
µ02 = (A² E² + B²)/(1 + E²)
and we can obtain:
L^T_Xc  = [ −1/zc   0   Xc/zc + aµ20 + bµ11   Xc Yc + µ11   −1 − Xc² − µ20   Yc ]
L^T_µ11 = [ −aµ11 − bµ02   −aµ20 − bµ11   aYc µ20 + (3/zc − c)µ11 + bXc µ02   3Yc µ11 + Xc µ02   −Yc µ20 − 3Xc µ11   µ02 − µ20 ]
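As a small illustrative sketch of the change of representation given by Eq. 14 (function name is illustrative):

    def ellipse_moments(A, B, E):
        """Normalized inertial moments (mu20, mu11, mu02) of Eq. 14 computed from
        the ambiguous parameters (A, B, E); they remain well defined when the
        ellipse degenerates into a circle (A = B)."""
        d = 1.0 + E**2
        return ((A**2 + B**2 * E**2) / d,
                E * (A**2 - B**2) / d,
                (A**2 * E**2 + B**2) / d)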
2.2.5. Spheres
h(x, p) = (x − x0 )2 + (y − y0 )2 + (z − z0 )2 − r2 = 0 (16)
1/z = aX + bY + c (19)
2.2.6. Cylinders
with:
cos θ1 = (r x0/A − α)/√((r x0/A − α)² + (r y0/A − β)²) ,   cos θ2 = (r x0/A + α)/√((r x0/A + α)² + (r y0/A + β)²)
sin θ1 = (r y0/A − β)/√((r x0/A − α)² + (r y0/A − β)²) ,   sin θ2 = (r y0/A + β)/√((r x0/A + α)² + (r y0/A + β)²)
ρ1 = (r z0/A − γ)/√((r x0/A − α)² + (r y0/A − β)²) ,   ρ2 = (r z0/A + γ)/√((r x0/A + α)² + (r y0/A + β)²)

where   A = √(x0² + y0² + z0² − r²) ,   α = c y0 − b z0 ,   β = a z0 − c x0 ,   γ = b x0 − a y0
and we finally obtain:
L^T_θ1 = [ λθ1 cos θ1   λθ1 sin θ1   −λθ1 ρ1   −ρ1 cos θ1   −ρ1 sin θ1   −1 ]
L^T_ρ1 = [ λρ1 cos θ1   λρ1 sin θ1   −λρ1 ρ1   (1 + ρ1²) sin θ1   −(1 + ρ1²) cos θ1   0 ]
L^T_θ2 = [ λθ2 cos θ2   λθ2 sin θ2   −λθ2 ρ2   −ρ2 cos θ2   −ρ2 sin θ2   −1 ]
L^T_ρ2 = [ λρ2 cos θ2   λρ2 sin θ2   −λρ2 ρ2   (1 + ρ2²) sin θ2   −(1 + ρ2²) cos θ2   0 ]
        (22)

with   λθ1 = (y0 cos θ1 − x0 sin θ1)/A² ,   λρ1 = −(x0 ρ1 cos θ1 + y0 ρ1 sin θ1 + z0)/A²
       λθ2 = (y0 cos θ2 − x0 sin θ2)/A² ,   λρ2 = −(x0 ρ2 cos θ2 + y0 ρ2 sin θ2 + z0)/A²
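As an illustrative sketch (assuming, as in the expressions above, that (x0, y0, z0) is a point of the cylinder axis, (a, b, c) its unit direction and r its radius; names are illustrative), the limb parameters (θi, ρi) and the four rows of Eq. 22 can be evaluated numerically:

    import numpy as np

    def cylinder_limb_interaction(x0, y0, z0, r, a, b, c):
        """Sketch: (theta_i, rho_i) of the two limb lines of the cylinder and the
        corresponding four interaction rows of Eq. 22."""
        A = np.sqrt(x0**2 + y0**2 + z0**2 - r**2)
        alpha, beta, gamma = c * y0 - b * z0, a * z0 - c * x0, b * x0 - a * y0
        rows = []
        for sgn in (-1.0, +1.0):                       # '-' for limb 1, '+' for limb 2
            nx = r * x0 / A + sgn * alpha
            ny = r * y0 / A + sgn * beta
            nz = r * z0 / A + sgn * gamma
            d = np.hypot(nx, ny)
            cth, sth, rho = nx / d, ny / d, nz / d
            lam_th = (y0 * cth - x0 * sth) / A**2
            lam_rho = -(x0 * rho * cth + y0 * rho * sth + z0) / A**2
            rows.append([lam_th * cth, lam_th * sth, -lam_th * rho,
                         -rho * cth, -rho * sth, -1.0])
            rows.append([lam_rho * cth, lam_rho * sth, -lam_rho * rho,
                         (1.0 + rho**2) * sth, -(1.0 + rho**2) * cth, 0.0])
        return np.array(rows)            # rows ordered: theta1, rho1, theta2, rho2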
3. Vision-based Tasks
In the previous section, we have shown how sensor signals can be modeled. We now
study how they can be used in the design of vision-based tasks at a task planning
level. Indeed, it seems useful to provide the user with systematic methods allowing
vision-based tasks to be specified with a natural formalism. The one we chose relies on the
well-known theory of mechanisms. More precisely, we shall show that the properties
of the interaction screw can be analyzed in terms of constraints in the configuration
space, and represented by a formalism very similar to the one which is currently used
for the kinematics of contacts between solids.
Eq. 2 expresses the differential of the sensor signal sj as a screw product. It may
occur that, for a particular value sjd of the signal associated with a given location r̄, certain
displacements T*ST of the sensor with respect to the target do not result in any change
of the value sjd. For example, when the camera motion is parallel to the projection
line joining a point in the scene to the optical center of the camera, the signals
s1d = X and s2d = Y, where (X, Y) are the coordinates of the point in the image,
remain unchanged.
Given the interaction screw Hsj, the velocity screws T*ST that leave sj invariant are
solutions of:

Hsj • T*ST = 0        (23)

The definition of the screw product implies that T*ST is a screw reciprocal to Hsj in
the six-dimensional vector space of screws. Let us now consider a more general case,
where we can extract from the image more than one visual signal, and let us denote as
s the vector regrouping all the visual signals: s = (s1, · · · , sp)^T. Thereby, the motions
around the location r̄ which leave s unchanged are characterized by the reciprocal
subspace (T*ST) of the subspace spanned by the set of screws (Hs1, · · · , Hsp). Using the
interaction matrix L^T_s = (Ls1, · · · , Lsp)^T, the subspace (T*ST) is characterized through:

(T*ST) = Ker L^T_s        (24)
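In practice, Eq. 24 can be evaluated numerically; a minimal sketch (illustrative names), where the class N of the virtual linkage is obtained as 6 minus the rank of the stacked interaction matrix:

    import numpy as np

    def virtual_linkage_class(L, tol=1e-10):
        """Sketch: kernel of L_s^T (Eq. 24) and class N of the virtual linkage,
        i.e. the dimension of the reciprocal subspace (T*ST)."""
        L = np.atleast_2d(L)
        _, sv, Vt = np.linalg.svd(L)
        rank = int(np.sum(sv > tol * sv.max()))
        kernel_basis = Vt[rank:].T       # columns span Ker L_s^T
        return 6 - rank, kernel_basis

    # Example: a single image point constrains 2 d.o.f., so N = 4
    # (using the point interaction matrix sketched in section 2.2.1).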
As is well known, the formalism of screws is also a straightforward way to describe
contacts between objects. This suggests making an analogy between true contacts and
sensor-based tasks. The underlying idea is the following: imposing ṡ = 0 is equivalent
to introducing constraints in the configuration space SE3 in order to achieve a virtual
contact between the target and the sensor. Going on with this analogy, we can now
introduce the notion of virtual linkage:

A set of k compatible constraints s(r̄) − sd = 0 determines a virtual linkage between
the sensor (S) and the target (T). At a location r̄ where the constraints are satisfied,
the dimension N of the reciprocal subspace (T*ST) in se3 is called the class of the virtual
linkage at r̄.
In this definition, the 'class' of the virtual linkage is the number of degrees of freedom
of the sensor corresponding to the motions which leave s unchanged. The concept of
virtual linkage, which may include the physical linkage when contact sensors are used,
will allow us to design the desired sensor-referenced robot tasks in a simple way.
This also establishes a connection with the approach known as 'hybrid control', which
is traditionally used in control schemes involving contact force sensors.
In a more practical way, the design of a sensor-based task (here a vision-based task)
should be done by the user through the following steps:

1. specifying, from the task requirements, the virtual linkage to be realized between the
camera and its environment;

2. from the a priori knowledge on the application, selecting the visual signals which
realize the virtual linkage;

3. building from these signals the task function which ensures a good realization of
the task (see section 4);

4. building the control scheme which allows the task function to be regulated (see section 4).
Note that, in order to ease the first and second steps, it
may be useful to constitute a 'library of canonical vision-based tasks'. This library will
contain the most classical linkages which are used in practice and some examples of
signals s able to realize them.
In order to realize such a linkage, the subspace (T*ST) corresponding to the selected
visual features should have the form:

(T*ST) = ( a   b   c   0   0   0 )^T        (25)
where the direction given by the vector (a, b, c)T is the one which is not constrained
by the linkage. To realize this class-1 virtual contact, different solutions exist. For
example, the use of (ρi , θi ) parameters associated with the projection of at least three
lines of direction (a, b, c)T easily solves the problem. Other features are proposed in
[3].
We only consider here the particular case of a linkage allowing the camera to rotate
around its optical axis. We therefore have to find visual features such that the kernel
of the interaction matrix is spanned by:

(T*ST) = ( 0   0   0   0   0   1 )^T        (26)
A first idea is to consider a centered circle parallel to the image plane. Unfortunately,
the equilibrium position is then singular. This may be seen by representing the ellipse
by its moments, which leads to the following interaction matrix (see Eq. 15):

L^T_sd = [ −1/zd    0       0        0        −1 − R²   0
            0      −1/zd    0        1 + R²    0        0
            0       0       2R²/zd   0         0        0        (27)
            0       0       0        0         0        0
            0       0       2R²/zd   0         0        0 ]
where z − zd = 0 is the equation of the circle plane and R is the radius of the projected
circle. In that case, we have:
(T*ST) = [ 0    1 + R²    0
           0    0        −1 − R²
           0    0         0
           0    0        −1/zd        (28)
           0   −1/zd      0
           1    0         0 ]
We indeed find in (T*ST) the expected motion but, to represent the desired linkage
exactly, supplementary visual features have to be taken into account. We may for
example choose (see figure 2):
• visual signals associated with a second centered circle, also parallel to the image
plane (either in the same plane with different radius or in a different plane with
the same radius),
Figure 2: Centered circles parallel to the image plane, realizing the revolute linkage
where the vector (a, b, c)^T is both the rotation axis and the translation direction of the
unconstrained camera motions. To realize it, let us first consider a cylinder with
radius r, whose generating line is parallel to the image plane and passes through the
point (0, 0, zd)^T. Its equation is (cf. section 2.2.6):
where the vector (a, b, 0)T with a2 + b2 = 1 is the direction of the generating line.
The cylinder is projected on the image plane as two parallel straight lines which are
symmetric with respect to the center of the image (see figure 3). Their equations are:
D1 : bX − aY + r/√(zd² − r²) = 0
D2 : −bX + aY + r/√(zd² − r²) = 0        (31)
Figure 3: Prismatic/revolute linkage using a cylinder

from which we may derive, by using Eq. 22, the expressions of the associated interaction
matrices:
L^T_ρ1 = [ λρ b   −λρ a   −λρ ρ   −a(1 + ρ²)   −b(1 + ρ²)   0 ]
L^T_θ1 = [ 0   0   0   −bρ   aρ   −1 ]
L^T_ρ2 = [ −λρ b   λρ a   −λρ ρ   a(1 + ρ²)   b(1 + ρ²)   0 ]
L^T_θ2 = [ 0   0   0   bρ   −aρ   −1 ]
        (33)
Since, for any screw T = (V, ω) and any point m, we have ω(m) = ω(O) and
V(m) = V(O) + ω × Om, the set of screws (T*ST) may be expressed at a point of the
linkage axis such as (0, 0, zd)^T. We then have:
(T*ST) = [ a   0
           b   0
           0   0        (35)
           0   a
           0   b
           0   0 ]
Therefore, these visual features allow us to realize the desired linkage whose axis
is parallel to the image plane (c = 0). If the desired axis is ~z, the linkage may
be derived from the case of the revolute one by choosing as parameters the coordinates
(Xci , Yci )T of the considered circle centers.
In order to realize this class-3 linkage, we have to find visual features such that the
space of invariant screws be spanned by:
(T*ST) = [ 1   0   0
           0   1   0
           0   0   0        (36)
           0   0   0
           0   0   0
           0   0   1 ]
Since the plane/plane contact only allows motions parallel to the image plane, the
scene may for example be made of any regular (periodic) polygonal planar mesh such
that the required set of adequate image features always exists in the camera field of view.
In order to preserve isotropy, we also impose that all mesh edges have the same length (see
figure 4).
Figure 4: Examples of regular planar meshes with equal edge lengths
We may then choose as visual signals the length of at least three projected mesh
edges which do not belong to the same line. In practice we choose a connected set
of edges as close as possible to the image center. We may verify that, for any edge
represented by (Xc , Yc , l, α), we have (see Eq. 6):
L^T_l = [ 0   0   l/zd   l[Xc cos α sin α + Yc(1 + sin²α)]   −l[Xc(1 + cos²α) + Yc cos α sin α]   0 ]        (37)
By choosing at least three edges, (T*ST) has the required form. Let us emphasize that,
in order to improve the resulting motion, the use of a larger set of edges, together with
a process handling edge appearance and removal in a smooth way, should be preferred.
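A minimal numerical check of this construction, with illustrative edge parameters: stacking the rows of Eq. 37 for three edges which do not belong to the same line gives a matrix of rank 3, whose kernel has exactly the form of Eq. 36.

    import numpy as np

    def edge_length_row(Xc, Yc, l, alpha, zd):
        """Interaction row L_l^T of Eq. 37 for a projected mesh edge
        (Xc, Yc, l, alpha) lying in the plane z = zd."""
        ca, sa = np.cos(alpha), np.sin(alpha)
        return np.array([0.0, 0.0, l / zd,
                         l * (Xc * ca * sa + Yc * (1.0 + sa**2)),
                         -l * (Xc * (1.0 + ca**2) + Yc * ca * sa),
                         0.0])

    # Three edges not belonging to the same line (illustrative values):
    L = np.vstack([edge_length_row( 0.1, 0.0, 0.2, 0.0,           1.0),
                   edge_length_row( 0.0, 0.1, 0.2, np.pi / 3,     1.0),
                   edge_length_row(-0.1, 0.1, 0.2, 2 * np.pi / 3, 1.0)])
    print(np.linalg.matrix_rank(L))   # 3: the kernel is spanned by (Vx, Vy, wz)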
3.2.6. Ball-and-socket linkage
Let us end this survey of the most classical linkages with this class-3 contact. The
related feature is the set of the parameters representing the projection in the image of
a sphere with center coordinates (0, 0, zd )T . This projection is a centered circle, and
the interaction matrix associated with the representation Xc , Yc , and µ = (µ20 + µ02 )/2
of the circle is:
L^T_Xc = [ −1/zd   0   0   0   −1 − µ   0 ]
L^T_Yc = [ 0   −1/zd   0   1 + µ   0   0 ]        (38)
L^T_µ  = [ 0   0   2µ/zd   0   0   0 ]
and we see that all rotations are free, which is the definition of the ball-and-socket
linkage.
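A small numerical sketch of Eq. 38 (illustrative values of zd and µ), confirming that the linkage is of class 3:

    import numpy as np

    zd, mu = 1.0, 0.04                         # illustrative values
    L = np.array([[-1 / zd,  0.0,     0.0,          0.0,      -(1 + mu), 0.0],
                  [ 0.0,    -1 / zd,  0.0,          1 + mu,    0.0,      0.0],
                  [ 0.0,     0.0,     2 * mu / zd,  0.0,       0.0,      0.0]])
    print(np.linalg.matrix_rank(L))            # 3, hence class N = 6 - 3 = 3
    # The kernel contains the rotation about the optical axis and, for wx and wy,
    # screws combining the rotation with the corresponding in-plane translation,
    # i.e. rotations about the sphere center: the three d.o.f. of a ball-and-socket joint.
    print(np.linalg.svd(L)[2][3:])             # a basis of the reciprocal subspace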
To conclude, let us emphasize that a similar study may be done when the configuration
space of the camera is of dimension less than 6. In the general case, this restriction
comes from the kinematics of the system which carries the camera. When the added
constraint is holonomic, the motion is locally restricted to a submanifold of SE3. If
the associated constraint equations can be explicitly expressed in SE3 or in the tangent
space, they may be added to the set of vision-based constraints or directly to the
interaction matrix. In practice, it is even often possible to use the previously defined
linkages and to exploit the motion restrictions to simplify them. The case of non-holonomic
constraints is discussed in section 5.3.
Let us suppose that the geometric features to be controlled in the image have
been chosen (for example, s is a set of coordinates of particular points in the image,
like corners). Let us also assume that, in order to complete the task, the control of a
minimum of m features is needed, and that there exist in the image p features, with
p > m, which can potentially be used. The question which arises is: what criterion
may be used to select the 'best' set of signals, in terms of performance and robustness?
To be significant, this criterion has to take into account several different aspects, like
the quality of the feature extraction (computation cost, accuracy, robustness, etc.), or
more specific control aspects (stability, convergence rate, decoupling, etc.). Finding such
a criterion is a hard task and little work has been done in this area [7].
Presently, only the canonical case of a positioning task (rigid linkage) using the
coordinates of three points in the image as visual signals has really been investigated.
Obviously, we assume that the three points in the scene which project onto the image
constitute a rigid triangle. Although this case is the simplest (for a positioning task
based on points as features), it allows us to point out some general properties required
of the mapping defined by L^T_s between the location space (SE3) and the measurement
space (R^6 in this particular case). In a certain sense, this mapping can be viewed as
similar to the inverse kinematic model (from SE3 to the joint space) well known in
robotics. As in the case of the inverse kinematic model, it is interesting to address the
problem of the uniqueness of the positioning between the camera and the object. In
other words, is the mapping one-to-one? This question, often referred to as the
"perspective-3-point" problem (P3P) [9], has been largely discussed and it can be shown
that there are, in general, four physically admissible configurations r̄ of SE3 which lead
to the same image of the three points. A consequence of these multiple solutions is
that reaching a value sd will not ensure that the right desired configuration r̄d has really
been reached, since the convergence towards a given solution will strongly depend on
the initial configuration. Moreover, as in the case of robots, this multiplicity leads us
to infer the existence of singularities in the interaction matrix.
We will therefore concentrate on another issue, by adopting a control point of view.
Let us assume that the user's objective can be expressed as the regulation to zero of
an n-dimensional C² function e(q, t), called the task function, during a time interval [0, tm]
[20]. When sensors are used, the vector s(q, t) obviously contributes to the design of
e(q, t) (see section 4). We can write:

ė = (∂e/∂q) q̇ + ∂e/∂t = (∂e/∂s)(∂s/∂r̄)(∂r̄/∂q) q̇ + ∂e/∂t        (41)
The characteristics of the Jacobian matrix ∂e/∂q, and of its model which will later
be used in a feedback control scheme in the space of visual signals, have a strong influence
on the robustness and convergence properties of the control law. It is important to
notice that this matrix is the product of three terms which are very different in nature:

- ∂r̄/∂q is the classical robot Jacobian which appears in virtually all robotics problems.

- ∂s/∂r̄ is what we called the interaction matrix or task Jacobian. It is canonically
related to the vision problem, without any reference to a particular robot kinematics,
the latter appearing only in the robot Jacobian.

- ∂e/∂s depends on the way the set of visual signals is taken into account in order to
build the task function. Here arises the tricky problem of visual signal selection,
discussed at length in [8]. Without addressing this question in detail, and by
considering the robustness of image feature extraction, it is easy to understand
that the image features which are the most reliable in the sense of image processing
should be preferred.
Let us assume that the robot Jacobian matrix ∂r̄/∂q is nonsingular in our working area,
and let us now focus on the task Jacobian. The important question which arises is:
does there exist any configuration where this Jacobian loses its full rank? If so,
can we characterize such configurations?
Unfortunately, no result may apparently be obtained in the most general case, and
only the P3P case could be completed [15]. After intensive use of symbolic calculus,
it was shown that, for the P3P problem, Jacobian singularities are encountered if and
only if:
1. The three points are aligned (degenerate triangle): for any position of the optical
center and attitude of the camera frame, a singularity always exists.
2. The optical center lies on the cylinder which includes the three points and the axis
of which is perpendicular to the plane containing these points: for any attitude
of the camera frame, the configuration is singular (see figure 5).
Figure 5: Cylinder of singularities
We may say in conclusion that the choice of well-suited visual signals is not easy
and that work remains to be done to define a relevant criterion for signal selection. Even
in the simple case of the three points, the computations are tedious, and generalization
to more sophisticated primitives is not straightforward. However, the P3P example is
interesting since it points out an analogy between singularities of the robot Jacobian
and singularities coming from the task itself. A practical way to avoid this kind of
singularity is clearly to introduce redundancy in the task by selecting more visual
signals than necessary. For the rigid linkage case, for example, four coplanar points are
enough to avoid task singularities.
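Condition 2 above can be tested numerically; a minimal sketch is given below (points expressed in the camera frame, the optical center being the origin; names are illustrative):

    import numpy as np

    def on_singularity_cylinder(p1, p2, p3, tol=1e-6):
        """Sketch: does the optical center (origin of the camera frame) lie on the
        cylinder of singularities of figure 5, i.e. the cylinder containing the
        three points and whose axis is perpendicular to their plane?"""
        p1, p2, p3 = map(np.asarray, (p1, p2, p3))
        n = np.cross(p2 - p1, p3 - p1)
        if np.linalg.norm(n) < tol:
            return True                      # aligned points: always singular (case 1)
        n = n / np.linalg.norm(n)
        # circumcenter c of the triangle (equidistant from the points, in their plane)
        A = np.array([2.0 * (p2 - p1), 2.0 * (p3 - p1), n])
        b = np.array([p2 @ p2 - p1 @ p1, p3 @ p3 - p1 @ p1, n @ p1])
        c = np.linalg.solve(A, b)
        radius = np.linalg.norm(p1 - c)
        d_axis = np.linalg.norm(np.cross(-c, n))     # distance of the origin to the axis
        return abs(d_axis - radius) < tol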
4. Control Aspects

Embedding visual servoing in the task function framework allows us to take advantage
of general results helpful for the analysis and synthesis of efficient closed-loop control
schemes taking explicitly into account redundancy in the measurements, for example in
so-called hybrid control schemes [20]. We here only recall the obtained results, all the
developments being fully described in [20] and, in the particular case of vision-based
control, in [6].
We define a vision-based task, e1, considered as the main one, with dimension
m ≤ n, m = 6 − N, N being the class of the virtual linkage associated with e1. A
secondary task, e2, can thus be combined with e1 and is expressed as the minimization
of a cost function hs, with gradient g_s = (∂hs/∂r̄)^T. The task function, also called hybrid
task when e2 consists in a trajectory tracking, achieves the minimization of hs under
the constraint e1 = 0 when it takes the form:

e = W+ e1 + (In − W+W) g_s^T        (42)

where the vision-based task e1 is written:

e1 = C (s(q, t) − sd)        (43)
The terms of the previous equations have the following meaning:
• sd is the desired value of the selected visual features;
• s(q, t) is their current value, taken from the image at each time;
• W is an m × n full-rank matrix such that Ker W = Ker ∂e1/∂r̄;

• W+ is the pseudo-inverse of W;

• (In − W+W) is an orthogonal projection operator on Ker W;

• C = ∂e1/∂s is called the combination matrix and can be defined as [6]:

  – C = W (L^T_s)+ when the interaction matrix is known, or when its 3D terms
    (such as the range z of a point in Eq. 5) can be estimated on-line, using for
    example structure from known motion techniques [4]. In that case, W can
    be chosen such that Ker W = Ker L^T_s.

  – C = W (L^T_s|s=sd)+ when isolated singularities may occur during the convergence
    of e, or when the value of the interaction matrix cannot be updated at each
    time. Assumptions on the shape and on the geometry of the considered
    primitives in the scene then generally have to be made, and we set W such
    that Ker W = Ker L^T_s|s=sd.
A general control scheme aimed at regulating the task function e is described in [20].
We here only present the simplified control scheme that we have used to perform the
experiments described in the next section. Similar control approaches, and other ones
using optimal control theory, can also be found in [11], [13] and [17]. Dynamic effects
such as delays and mechanical vibrations are also studied in [5].
To make e decrease exponentially and thus behave like a first order decoupled
system, we set:
ė = −λ e with λ > 0 (44)
We also have:

ė = (∂e/∂r̄) TS + ∂e/∂t        (45)
where the camera velocity screw TS is related to q̇ through the robot Jacobian matrix.
We assume that the robot Jacobian is perfectly known and non-singular, and that the
setting variable tunable by the user is simply the desired joint velocity q̇, as for most
industrial robots. The desired camera velocity screw TS can then be considered as a
pseudo-control vector. Since C and W have been chosen in order to satisfy the stability
conditions (see [6] for more details), we thus obtain, by setting ∂e/∂r̄ as the identity
matrix In:

TS = −λ e − ∂ê/∂t        (46)
where:
• λ is the proportional coefficient involved in the exponential convergence of e and
has to be tuned in order to ensure the global stability of the system and to
optimize the time to convergence [19];
• ∂ê/∂t can be written in the form:

  ∂ê/∂t = W+ C ∂ŝ/∂t + (In − W+W) ∂g_s^T/∂t        (47)

  The secondary cost function generally allows ∂g_s^T/∂t to be known. On the other hand,
  the vector ∂ŝ/∂t represents an estimation of the contribution of a possible autonomous
  target motion. If the target moves, this estimation has to be introduced into the
  control law in order to suppress tracking errors. It may be obtained using classical
  filtering techniques such as Kalman filters [12] [16] [19] or α−β−γ filters [1].
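Putting Eqs. 42 to 47 together, a minimal sketch of the resulting computation is given below, under the particular choice W = L^T_|s=sd made in the next section; the function name and the numpy-based pseudo-inverse are illustrative choices, not the exact implementation used in the experiments.

    import numpy as np

    def camera_velocity(s, sd, L_sd, g_s, dgs_dt, lam=0.1, ds_dt=None):
        """Sketch of the simplified control scheme:
        e = W+ C (s - sd) + (I - W+ W) g_s  (Eqs. 42-43) and
        T_S = -lambda e - de/dt  (Eqs. 46-47), with W = L_sd (full row rank)."""
        W = L_sd
        Wp = np.linalg.pinv(W)
        C = W @ np.linalg.pinv(L_sd)            # here C = I since W = L^T at s = sd
        P = np.eye(6) - Wp @ W                  # orthogonal projection on Ker W
        e = Wp @ C @ (s - sd) + P @ g_s
        de_dt = P @ dgs_dt
        if ds_dt is not None:                   # estimated autonomous target motion
            de_dt += Wp @ C @ ds_dt
        return -lam * e - de_dt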
5. Results
We first present simulation results on the realization of two classical virtual linkages
(the prismatic/revolute and ball-and-socket ones) using a camera mounted on a six
degrees-of-freedom robot arm. Then, other vision-based tasks using a camera mounted
on a mobile robot with non-holonomic constraints are considered.
with λρ = −zd/(zd² − r²). Since this matrix is of full rank 4, we may choose W = L^T_|s=sd
and C = W (L^T_|s=sd)+ = I4. We therefore have:

e = W+(s − sd) + (I6 − W+W) g_s^T        (51)

where:

W+ = [  λρ/(2l)          0           −λρ/(2l)         0
        0                0            0               0
       −1/(2λρ ρd)       0           −1/(2λρ ρd)      0
        0               −1/(2ρd)      0               1/(2ρd)
       −(1 + ρd²)/(2l)   0            (1 + ρd²)/(2l)  0
        0               −1/2          0              −1/2     ]

(I6 − W+W) = [ (1 + ρd²)²/l     0   0   0   λρ(1 + ρd²)/l   0
               0                1   0   0   0               0
               0                0   0   0   0               0
               0                0   0   0   0               0
               λρ(1 + ρd²)/l    0   0   0   λρ²/l           0
               0                0   0   0   0               0 ]

with l = λρ² + (1 + ρd²)².
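The closed-form W+ and (I6 − W+W) given above can be cross-checked numerically; a minimal sketch valid for any full-row-rank W (illustrative names):

    import numpy as np

    def pinv_and_kernel_projector(W):
        """Sketch: pseudo-inverse W+ and orthogonal projection operator (I - W+ W)
        onto Ker W, as used in Eq. 51, for a full-row-rank W."""
        Wp = np.linalg.pinv(W)
        P = np.eye(W.shape[1]) - Wp @ W
        assert np.allclose(P @ P, P) and np.allclose(P, P.T)    # orthogonal projector
        return Wp, P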
Simulation results for this task are given in figure 6. The top windows represent the
relative position of the camera (drawn as a pyramid) with respect to the cylinder.
The middle windows show the camera image. Left and right windows correspond to
the initial and final positions respectively. Finally, the bottom windows plot the time
evolution of each component (si − sdi) (left), of ‖s(r̄) − sd‖ (middle) and of the six
components of TS (right).
In that simulation, we set λ = 0.1. The secondary task consists in moving with a
constant velocity Vx = 0.5 cm/s along the ~x camera axis and Vy = −0.5 cm/s along its
~y axis. The secondary cost function is then:
hs = (1/2) βx (x − x0 − Vx t)² + (1/2) βy (y − y0 − Vy t)²        (52)
where βx and βy are two positive scalar weights. Therefore:

g_s^T = ( βx(x − x0 − Vx t),  βy(y − y0 − Vy t),  0,  0,  0,  0 )^T   and   ∂g_s^T/∂t = ( −βx Vx,  −βy Vy,  0,  0,  0,  0 )^T        (53)
and the velocity screw TS to be used as an input in order to control e is:

TS = −λ e + ( βx Vx (1 + ρd²)²/l,  βy Vy,  0,  0,  βx Vx (1 + ρd²) λρ/l,  0 )^T        (54)
For this task realizing the desired linkage, we therefore have e1 = s − sd , and, since no
secondary task was specified in this experiment, the desired camera velocity takes the
form:
TS = −λ W+ e1 = −λ [ −zd/2           0              0
                      0             −zd/2           0
                      0              0              zd/(2µd)
                      0              1/(2(1 + µd))  0
                     −1/(2(1 + µd))  0              0
                      0              0              0 ]  (s − sd)        (57)
Figure 6: Positioning with respect to a cylinder
Figure 7: Positioning with respect to a sphere
Due to this fact, it is now known [21] that, if we consider a two-wheeled non-holonomic
cart, despite the controllability of the system, the stabilization of the cart around a
given terminal configuration is not possible using a pure stationary state feedback. A
consequence of this particularity is that, for a camera rigidly attached to the mobile
base, we cannot ensure the convergence of the task error by using simple gradient-like
methods in the image frame. One way to overcome this problem consists in adding
some degrees of freedom to the camera by mounting it on the wrist of a manipulator.
Then, if we consider the whole mechanical system constituted by the cart and its
manipulator as a single kinematic chain, we can show that the mapping between the
configuration space (d.o.f. of the cart + d.o.f. of the manipulator) and the location
of the camera is nonsingular except in a few isolated configurations. It therefore becomes
possible to fully control the motion of the camera without being limited by the cart's
non-holonomic constraints. In this way, we can execute visual servoing tasks directly
defined in the camera frame, in the same manner as previously done in the case of
manipulators. Obviously, with such an approach, we are unable to explicitly control
the trajectory of the cart but, by using some simple geometric considerations, we can
bound the volume which will be occupied by the cart during its displacement. This
approach has been applied to a system constituted by a two-wheeled cart with a camera
mounted on a three-d.o.f. manipulator (see figure 8).
Figure 8: Two-wheeled cart with a camera mounted on a three-d.o.f. manipulator
We present here only a few results which have been obtained in simulation (for
more details concerning this application, see [18]). In order to validate the visual servoing
approach applied to a mobile robot, we give a simple example where the robot task is
defined by two successive visual servoing sub-tasks. We set the robot at an initial
position with respect to an environment constituted by several segments (see figure 9).
The initial arm configuration is q1 = π, q2 = −π/4, q3 = 0.
Figure 9: Initial position of the robot in its environment
The task consists of wall following. Knowing the size of the skirting board of the
room, we can compute, using Eq. 9, the interaction matrix taking as features the two
corresponding parallel lines. This gives:

L^T_s|s=sd = [ −0.156323   0.236455   −0.0509507    0.0991288   −0.149942   −1
               −0.240955   0.364468   −0.0785346   −0.861134    −0.569307    0
               −0.158875   0.265158   −0.0766261    0.127409    −0.212643   −1 ]
By applying this screw for 4 seconds in simulation (see figure 10), we illustrate
that this "kernel trajectory" is a translation along the wall. We can see that the robot
"manœuvres" in order to realize the desired camera motion without using any heuristic
method.
The second vision-based sub-task consists in moving around the corner of the room.
To do that, we use the vertical line formed by the intersection of the walls. After
a half-turn of the arm and moving q2 to zero, we now have a vertical segment in the
image. Taking this segment as a new feature for visual servoing and knowing the height
of this segment, the simulator can compute the interaction matrix corresponding to this
new task. Here again the reduced kernel is only of dimension one and we can deduce:

T*ST = (−1, 0.0000148184, 0.0145816, 0, 0.126032, 0)^T        (60)
In this second sub-task, when this screw is applied to the camera motion, the camera
rotates around the vertical segment (see figure 11).
Figure 10: First sub-task: wall following

Figure 11: Second sub-task: moving around the corner of the room
Let us finally note that experimental results obtained with a six-jointed robot using
the approach detailed in this paper are described in [6] and [3].
References
[1] P.K. Allen, B. Yoshimi, A. Timcenko, “Real-Time Visual Servoing”, IEEE Int.
Conf. on Robotics and Automation, Vol. 1, pp. 851-856, Sacramento, California,
April 1991.
[2] J. Barraquand, J.-C. Latombe, "On Nonholonomic Mobile Robots and Optimal Ma-
neuvering", Revue d'Intelligence Artificielle, Vol. 3, n. 2, pp. 77-103, Ed. Hermes,
Paris, 1989.
[4] F. Chaumette, S. Boukir, “Structure from Motion using an Active Vision Para-
digm”, 11th IAPR International Conference on Pattern Recognition, Vol. 2, pp. 41-
45, Den Haag, The Netherlands, August 1992.
[5] P.I. Corke, M.C. Good, “Dynamics Effects in High Performance Visual Servoing”,
IEEE Int. Conf. on Robotics and Automation, Vol. 2, pp. 1838-1843, Nice, France,
May 1992.
[11] K. Hashimoto, T. Kimoto, T. Ebine, H. Kimura, “Manipulator Control with
Image-based Visual Servo”, IEEE Int. Conf. on Robotics and Automation, Vol. 3,
pp. 2267-2271, Sacramento, California, April 1991.
[12] A.E. Hunt, A.C. Sanderson, “Vision-Based Predictive Robotic Tracking of a Mov-
ing Object”, Technical Report CMU-RI-TR-82-15, Carnegie-Mellon University,
January 1982.
[14] J. P. Laumond, “Feasible Trajectories for Mobile Robots with Kinematic and Envi-
ronment Constraints”, Int. Conf. on Intelligent Autonomous Systems, pp. 346-354,
Amsterdam, The Netherlands, 1986.
[16] N. Papanikolopoulos, P.K. Khosla, T. Kanade, “Vision and Control Techniques for
Robotic Visual Tracking”, IEEE Int. Conf. on Robotics and Automation, Vol. 1,
pp. 857-864, Sacramento, California, April 1991.
[17] N. Papanikolopoulos, B. Nelson, P.K. Khosla, “Full 3D Tracking using the Con-
trolled Active Vision Paradigm”, 7th IEEE Symposium on Intelligent Control, Th.
Henderson editor, pp. 267-274, Glasgow, Scotland, August 1992.
[20] C. Samson, B. Espiau, M. Le Borgne, Robot Control: the Task Function Approach.
Oxford University Press, 1991.