
IGNOU
THE PEOPLE'S UNIVERSITY
Indira Gandhi National Open University

MCS-230
Digital Image Processing and Computer Vision

Block 3
Digital Image Processing - II

PROGRAMME DESIGN COMMITTEE
Prof. (Retd.) S.K. Gupta, IIT, Delhi
Prof. Ela Kumar, IGDTUW, Delhi
Prof. T.V. Vijay Kumar, JNU, New Delhi
Prof. Gayatri Dhingra, GVMITM, Sonipat
Mr. Milind Mahajan, Impressico Business Solutions, New Delhi
Sh. Shashi Bhushan Sharma, Associate Professor, SOCIS, IGNOU
Sh. Akshay Kumar, Associate Professor, SOCIS, IGNOU
Dr. P. Venkata Suresh, Associate Professor, SOCIS, IGNOU
Dr. V.V. Subrahmanyam, Associate Professor, SOCIS, IGNOU
Sh. M.P. Mishra, Assistant Professor, SOCIS, IGNOU
Dr. Sudhansh Sharma, Assistant Professor, SOCIS, IGNOU

COURSE DESIGN COMMITTEE

Prof. T.V. Vijay Kumar, JNU, New Delhi
Prof. S. Balasundaram, JNU, New Delhi
Prof. D.P. Vidyarthi, JNU, New Delhi
Prof. Anjana Gosain, USICT, GGSIPU, New Delhi
Dr. Ayesha Choudhary, JNU, New Delhi
Sh. Shashi Bhushan Sharma, Associate Professor, SOCIS, IGNOU
Sh. Akshay Kumar, Associate Professor, SOCIS, IGNOU
Dr. P. Venkata Suresh, Associate Professor, SOCIS, IGNOU
Dr. V.V. Subrahmanyam, Associate Professor, SOCIS, IGNOU
Sh. M.P. Mishra, Assistant Professor, SOCIS, IGNOU
Dr. Sudhansh Sharma, Assistant Professor, SOCIS, IGNOU

SOCIS FACULTY

Sh. Sanjay Aggarwal
Assistant Registrar, MPDD, IGNOU, New Delhi

June, 2023
© Indira Gandhi National Open University, 2023
All rights reserved. No part of this work may be reproduced in any form, by mimeograph or any other means, without
permission in writing from the Indira Gandhi National Open University.
Further information on the Indira Gandhi National Open University courses may be obtained from the University's
office at Maidan Garhi, New Delhi-110068.
Printed and published on behalf of the Indira Gandhi National Open University, New Delhi by MPDD, IGNOU.
Laser Typesetter: Tessa Media & Computers, C-206, Shaheen Bagh, Jamia Nagar, New Delhi-110025

BLOCK 3 INTRODUCTION

This block covers various topics relevant to computer vision, including the actual meaning of computer vision, various camera models, and the transformations involved in computer vision. The block also covers the concepts related to single-camera and multiple-camera environments. The unit-wise distribution of content is given below:

Unit 8 includes the introduction to the actual meaning of computer vision, along with various camera models and the transformations involved in computer vision.

Unit 9 includes the concepts related to the single camera model environment, viz. perspective projection, homography, camera calibration, and affine motion models.

Finally, Unit 10 covers the various concepts relevant to multiple camera environments, such as stereo vision, epipolar geometry, and optical flow.

UNIT 8 INTRODUCTION TO COMPUTER VISION
Structure                                                        Page No.
8.1 Introduction 211
8.2 Objectives 211
8.3 Introduction to Computer Vision 212
8.4 Camera Models 212
8.5 Projections 214
8.6 Transformations 215
8.7 Summary 222
8.8 Solutions/Answers 223
8.9 References 223

8.1 INTRODUCTION

problems and obtain their solutions.

8.2 OBJECTIVES

The objective of this unit is to introduce the subject of computer vision. "A picture is worth a thousand words" holds true, as one can see that an image of any scene carries a lot of detailed information. Further, it is easy to note that a colour image has more information than a grey-scale image. In today's world, camera technology has become very cheap and ubiquitous. Cameras are the instruments through which we capture an image. The mathematics behind the camera model helps us to understand computer vision. Therefore, we need to understand the various camera models. As discussed earlier in digital image processing, a digital image is a matrix of non-negative integers; therefore, any transformation that can be applied to a matrix can be applied to a digital image.
In this unit, we shall study various geometric transformations. We shall finally summarise the unit.

8.3 INTRODUCTION TO COMPUTER VISION

Computer vision is the field of study that endeavours to develop techniques to help machines understand the content of images and videos captured by single and/or multiple cameras. It seems to be a trivial problem for human beings to understand and recognize the contents of images and videos once seen; however, images and videos have a lot of content, and it is not easy for a machine to focus on the relevant portions of an image, understand the content, recognize familiar objects and faces and "tell the story". However, for machines to carry out intelligent tasks by "seeing" their surroundings, current computer vision and machine learning techniques need to be learnt, understood and further developed. Applications of computer vision exist in a wide range of domains.

8.4 CAMERA MODELS

There are two major classes of camera models: camera models with a finite centre and camera models with centre at infinity. We shall focus our attention on the camera model with a finite centre and discuss the simplest camera model: the pin-hole camera model.

Figure 1: The pin-hole camera geometry. Figure taken from [1]

8.4.1 The Pin-Hole Camera Model

In a pin-hole camera, a barrier with a small hole (pin-hole) is placed between the 3D object and the 2D image plane (light-sensitive plane). Each point on the 3D object emits/reflects multiple light rays, out of which only one or very few of these light rays fall on the image plane by passing through the pin-hole. Thereby, one can see that there exists a mapping between the points of the 3D object and the 2D plane forming an image, and such a mapping will be a projection where the pin-hole is called the centre of projection.
Formulation of the pin-hole camera model

Assume that the centre of projection is the origin of the 3D Euclidean coordinate system. Let $Z = f$ be the image plane or the focal plane. Then, the centre of projection ('C') is called the camera centre or optical centre, and $f$ is the focal length. The principal axis or the principal ray is the line from the camera centre that is perpendicular to the image plane. The point of intersection of the principal axis with the image plane is called the principal point. Under this model, a 3D point $(X, Y, Z)^T$ is mapped to the point $(fX/Z, fY/Z)^T$ on the image plane.

Note that it is not possible to write the above perspective projection transformation in matrix form directly. To overcome this problem, we introduce the homogeneous coordinate system.

8.4.2 Homogeneous Coordinates

A coordinate system in which every point has three coordinates $(x, y, w)^T$ with $w \neq 0$ is called a homogeneous coordinate system, in which for points $x_1 = (x_1, y_1, w_1)^T$ and $x_2 = (x_2, y_2, w_2)^T$ we have $x_1 = x_2$ if and only if

$$\frac{x_1}{w_1} = \frac{x_2}{w_2} \quad \text{and} \quad \frac{y_1}{w_1} = \frac{y_2}{w_2}.$$

Clearly, the points of a homogeneous coordinate system represent points of the 2-dimensional space.


A two-dimensional point in the Euclidean coordinate system can be represented as a point in the homogeneous coordinate system and vice versa. In fact, for a 2D point $(x, y)$, its corresponding point in the homogeneous coordinate system is taken as $(x, y, 1)^T$. Similarly, for a given point $(x, y, w)^T$ with $w \neq 0$ in homogeneous coordinates, since

$$(x, y, w)^T = \left(\frac{x}{w}, \frac{y}{w}, 1\right)^T,$$

its corresponding 2D point becomes $(x/w, y/w)$.
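These conversions are mechanical enough to capture in a pair of helper functions. Below is a minimal sketch in Python/NumPy (the function names are our own):

```python
import numpy as np

def to_homogeneous(p):
    """Append w = 1 to a Euclidean point, e.g. (x, y) -> (x, y, 1)."""
    return np.append(np.asarray(p, dtype=float), 1.0)

def to_euclidean(p):
    """Divide by the last coordinate w (w != 0), e.g. (x, y, w) -> (x/w, y/w)."""
    p = np.asarray(p, dtype=float)
    return p[:-1] / p[-1]

# (2, 3) and (4, 6, 2) represent the same 2D point:
print(to_homogeneous([2, 3]))    # [2. 3. 1.]
print(to_euclidean([4, 6, 2]))   # [2. 3.]
```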


Likewise, a point in the homogeneous coordinate system having four coordinates $(X, Y, Z, W)^T$ with $W \neq 0$ represents a point of 3-dimensional space, and the analogous condition holds: for $X_1 = (X_1, Y_1, Z_1, W_1)^T$ and $X_2 = (X_2, Y_2, Z_2, W_2)^T$ from the homogeneous coordinate system, we have $X_1 = X_2$ if and only if

$$\frac{X_1}{W_1} = \frac{X_2}{W_2}, \quad \frac{Y_1}{W_1} = \frac{Y_2}{W_2} \quad \text{and} \quad \frac{Z_1}{W_1} = \frac{Z_2}{W_2}.$$
In homogeneous coordinates,

$$\begin{pmatrix} f & 0 & 0 & 0 \\ 0 & f & 0 & 0 \\ 0 & 0 & f & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix} \begin{pmatrix} X \\ Y \\ Z \\ 1 \end{pmatrix} = \begin{pmatrix} fX \\ fY \\ fZ \\ Z \end{pmatrix}$$

from which the projection point in Euclidean space can be obtained as

$$x = \frac{fX}{Z}, \qquad y = \frac{fY}{Z}.$$

This way, the perspective projection of a 3D point via the centre of projection (the pin-hole) onto a point on the image plane $Z = f$ can be obtained in matrix form.
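As a quick numerical check of the matrix form above, the following sketch (NumPy; the focal length and test point are arbitrary values of ours) applies the 4×4 perspective matrix and recovers $x = fX/Z$, $y = fY/Z$:

```python
import numpy as np

f = 2.0  # focal length (arbitrary value for illustration)
P = np.array([[f, 0, 0, 0],
              [0, f, 0, 0],
              [0, 0, f, 0],
              [0, 0, 1, 0]], dtype=float)

X = np.array([4.0, 2.0, 8.0, 1.0])  # 3D point (4, 2, 8) in homogeneous form
x = P @ X                            # (fX, fY, fZ, Z)
x_euclid = x[:-1] / x[-1]            # divide by w = Z
print(x_euclid)                      # [1. 0.5 2.], i.e. (fX/Z, fY/Z, f)
```

Note that the projected point lies on the plane $Z = f$, as expected.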

Orthographic projection

Orthographic projection is the projection of a 3D object onto a plane by a set of parallel rays that are orthogonal to the image plane, i.e., it is a parallel projection. In this projection, the centre of projection is taken at infinity. For any point $(X, Y, Z)^T$ of the object, its orthographic projection on the image plane $Z = f$ is given by

$$x = X, \qquad y = Y.$$

The important properties of the orthographic projection are that parallel


lines remain parallel and the size of the object does not change.

Weak perspective projection

Weak perspective projection is an approximation of the perspective projection that is valid when the depth variation within the object is small compared to its average distance $Z_0$ from the camera.

Equations of the weak perspective projection are:

$$x = \frac{f}{Z_0}X, \qquad y = \frac{f}{Z_0}Y,$$

in which each point is scaled by the constant factor $f/Z_0$.
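The three projections differ only in how depth is handled, which a short sketch makes concrete (the helper functions and test values are our own; $Z_0$ is the assumed average depth for the weak-perspective case):

```python
import numpy as np

def perspective(P, f):
    X, Y, Z = P
    return np.array([f * X / Z, f * Y / Z])   # depth divides the coordinates

def orthographic(P):
    X, Y, Z = P
    return np.array([X, Y])                   # depth is simply discarded

def weak_perspective(P, f, Z0):
    X, Y, Z = P
    return np.array([f * X / Z0, f * Y / Z0]) # uniform scale f/Z0

P = np.array([4.0, 2.0, 8.0])
print(perspective(P, f=2.0))                  # [1.  0.5]
print(orthographic(P))                        # [4. 2.]
print(weak_perspective(P, f=2.0, Z0=10.0))    # [0.8 0.4]
```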

8.6 TRANSFORMATIONS
Geometric transformations play an important role in computer vision. In this section, we discuss the important transformations that we shall require later in this course.

8.6.1 Euclidean Transformations
The most important property of Euclidean transformations is that they preserve lengths and angle measures. They are the most commonly used transformations, consisting of translation and rotation.

(i) Translation

When a point is moved from one location to another along a straight-line path, it is known as a translation. In 2D, let $(x, y)$ be any point and $t_x$ and $t_y$ denote the translations along the x- and y-directions respectively. Then, the new coordinates $(x', y')$ of the point are given by

$$x' = x + t_x, \qquad y' = y + t_y$$

or, in homogeneous coordinates,

$$\begin{pmatrix} x' \\ y' \\ 1 \end{pmatrix} = \begin{pmatrix} 1 & 0 & t_x \\ 0 & 1 & t_y \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} x \\ y \\ 1 \end{pmatrix}.$$

Fig (a) An object (b) Object translated to the origin

For example: if there is a rectangle with coordinates (2, 2), (2, 6), (5, 2) and (5, 6) and it is to be translated to the origin, then the translation vector will be $(t_x, t_y) = (-2, -2)$.

Therefore, $(2, 2) \mapsto (0, 0)$ and $(2, 6) \mapsto (0, 4)$.

Similarly, $(5, 2) \mapsto (3, 0)$ and $(5, 6) \mapsto (3, 4)$.
Fig: (a) Rectangle before translation, with vertices (2, 2), (2, 6), (5, 2), (5, 6); (b) Rectangle after translation to the origin, with vertices (0, 0), (0, 4), (3, 0), (3, 4)
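The translation example can be verified with the homogeneous matrix form (a minimal sketch; the vertices are those of the example above):

```python
import numpy as np

tx, ty = -2.0, -2.0                  # translation vector from the example
T = np.array([[1, 0, tx],
              [0, 1, ty],
              [0, 0, 1]], dtype=float)

# Rectangle vertices in homogeneous coordinates (one column per vertex).
rect = np.array([[2, 2, 5, 5],
                 [2, 6, 2, 6],
                 [1, 1, 1, 1]], dtype=float)

print((T @ rect)[:2].T)  # [[0. 0.] [0. 4.] [3. 0.] [3. 4.]]
```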

(ii) Rotation

A rotation is specified by an angle $\theta$ and the pivot point about which the object is to be rotated. A two-dimensional rotation applied to an object re-positions it along a circular path in the 2D plane. A positive value of $\theta$ corresponds to an anticlockwise rotation. For rotation about the origin, the new coordinates are given by

$$x' = x\cos\theta - y\sin\theta, \qquad y' = x\sin\theta + y\cos\theta.$$

Fig. (c) Object at the origin (d) Object rotated at the origin

Example 2: Perform a 45° rotation of a triangle ABC with coordinates A: (0, 0), B: (1, 1), C: (5, 2) about the origin.

Solution: We can represent the given triangle, in matrix form, using homogeneous coordinates of the vertices:

$$\begin{pmatrix} 0 & 0 & 1 \\ 1 & 1 & 1 \\ 5 & 2 & 1 \end{pmatrix}$$


The matrix of rotation is:

$$R_{45°} = \begin{pmatrix} \cos 45° & \sin 45° & 0 \\ -\sin 45° & \cos 45° & 0 \\ 0 & 0 & 1 \end{pmatrix} = \begin{pmatrix} \sqrt{2}/2 & \sqrt{2}/2 & 0 \\ -\sqrt{2}/2 & \sqrt{2}/2 & 0 \\ 0 & 0 & 1 \end{pmatrix}$$

So the new coordinates A'B'C' of the rotated triangle ABC can be found as:

$$\begin{pmatrix} 0 & 0 & 1 \\ 1 & 1 & 1 \\ 5 & 2 & 1 \end{pmatrix} \begin{pmatrix} \sqrt{2}/2 & \sqrt{2}/2 & 0 \\ -\sqrt{2}/2 & \sqrt{2}/2 & 0 \\ 0 & 0 & 1 \end{pmatrix} = \begin{pmatrix} 0 & 0 & 1 \\ 0 & \sqrt{2} & 1 \\ 3\sqrt{2}/2 & 7\sqrt{2}/2 & 1 \end{pmatrix}$$

Thus $A' = (0, 0)$, $B' = (0, \sqrt{2})$ and $C' = (3\sqrt{2}/2, 7\sqrt{2}/2)$.

The following Figure (a) shows the original triangle ABC and Figure (b) shows the triangle A'B'C' after the rotation.

Figure (a) The original triangle ABC; Figure (b) The triangle after rotation
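Example 2 is easy to verify numerically. The sketch below uses the same row-vector convention as the worked solution:

```python
import numpy as np

theta = np.radians(45)
c, s = np.cos(theta), np.sin(theta)
# Rotation matrix for row vectors, as in Example 2.
R = np.array([[ c, s, 0],
              [-s, c, 0],
              [ 0, 0, 1]])

ABC = np.array([[0, 0, 1],    # A
                [1, 1, 1],    # B
                [5, 2, 1]])   # C
print(ABC @ R)
# [[0.     0.     1.]
#  [0.     1.414  1.]   B' = (0, sqrt(2))
#  [2.121  4.950  1.]]  C' = (3*sqrt(2)/2, 7*sqrt(2)/2)
```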

A rigid body transformation is a combination of rotations and translations. A general rigid body transformation can be represented using the homogeneous coordinates as

$$\begin{pmatrix} x' \\ y' \\ 1 \end{pmatrix} = \begin{pmatrix} \cos\theta & -\sin\theta & t_x \\ \sin\theta & \cos\theta & t_y \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} x \\ y \\ 1 \end{pmatrix}.$$
Fig. (e) Object at origin after rotation from initial position (f) Object rotated and translated

The rigid body transformations preserve angles between vectors and lengths of vectors; therefore, parallel lines remain parallel after a rigid body transformation.

8.6.2 Affine Transformations

Euclidean transformations do not change the shape of an object: lines transform to lines, planes to planes and circles to circles, and lengths and angles are preserved. Affine transformations are an extension of the Euclidean transformations which do not necessarily preserve lengths and angles. That is, under an affine transformation a circle may transform to an ellipse; however, a line will still transform to a line. The important affine transformations are scaling and shear. Moreover, translation and rotation are also affine transformations, since affine transformations are an extension of the Euclidean transformations.

(i) Scaling
By scaling, the dimensions of an object are either compressed or expanded. A scaling transformation is carried out by multiplying the coordinate values of each vertex by scaling factors $S_x$ and $S_y$ in the x- and y-directions respectively, to produce the transformed coordinates

$$x' = x \cdot S_x, \qquad y' = y \cdot S_y.$$

Example: Consider the triangle ABC with coordinates $A = (0, 0)$, $B = (1, 0)$ and $C = (1, 1)$. If the scaling factor along the X-axis is $S_x$ and along the Y-axis is $S_y$, then the scaled triangle will have the coordinates $A' = (0, 0)$, $B' = (S_x, 0)$ and $C' = (S_x, S_y)$.

Fig (a) The original triangle ABC Fig (b) The scaled triangle

(ii) Shear

A transformation that slants the shape of an object is called the shear transformation. A shearing transformation can be carried out along both the X and Y directions, or along only one of the directions. The new coordinates after shearing in the X-direction are given by:

$$x' = x + sh_x \cdot y, \qquad y' = y$$

and shear in the Y-direction is given by:

$$x' = x, \qquad y' = y + sh_y \cdot x.$$

Here, $sh_x$ and $sh_y$ are the shearing factors along the X- and Y-directions respectively and are given as inputs.
For example, applying an X-direction shear to the triangle ABC above, the sheared triangle will have the coordinates $A' = (0, 0)$, $B' = (1, 0)$ and $C' = (1 + sh_x, 1)$.
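A short sketch applying scaling and an X-direction shear to the triangle A = (0, 0), B = (1, 0), C = (1, 1); the factor values below are hypothetical choices of ours:

```python
import numpy as np

Sx, Sy = 2.0, 3.0     # hypothetical scaling factors
shx = 0.5             # hypothetical X-direction shear factor

S  = np.array([[Sx, 0, 0], [0, Sy, 0], [0, 0, 1]])
Hx = np.array([[1, shx, 0], [0, 1, 0], [0, 0, 1]])

# Triangle vertices as homogeneous columns (A, B, C).
tri = np.array([[0, 1, 1],
                [0, 0, 1],
                [1, 1, 1]], dtype=float)

print((S @ tri)[:2].T)   # scaled:  [[0. 0.] [2. 0.] [2. 3.]]
print((Hx @ tri)[:2].T)  # sheared: [[0. 0.] [1. 0.] [1.5 1.]]
```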

General Affine Transformation

Using the homogeneous coordinates, a general affine transformation is defined in the form:

$$\begin{pmatrix} x' \\ y' \\ 1 \end{pmatrix} = \begin{pmatrix} a_{11} & a_{12} & t_x \\ a_{21} & a_{22} & t_y \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} x \\ y \\ 1 \end{pmatrix}.$$

8.7 SUMMARY

In this unit, we have studied an introduction to computer vision and the basic pin-hole camera model. Homogeneous coordinates were introduced, which help in defining the projection transformations in terms of matrices. We have discussed perspective projection, orthographic projection and weak perspective projection. Geometric transformations are important for computer vision, and we have discussed Euclidean, affine and projective transformations and their representation in homogeneous coordinates.

8.8 QUESTIONS AND SOLUTIONS

Q1. Which projection does the pin-hole camera represent?

Ans. 1 The pin-hole camera represents the perspective projection, in that it represents a mapping between the points on the 3D object and its image formed on the 2D plane.

Q2. What are the rigid body transformations?

Ans. 2 The rigid body transformations are translation and rotation. Translation is movement along a line, while rotation moves a point by an angle around a pivot point.

Q3. What are the three classes of transformations?

Ans. 3 The three classes of transformations are: projective, affine and Euclidean transformations. The projective transformation preserves collinearity, that is, it maps lines to lines.
UNIT 9 SINGLE CAMERA
Structure Page No.
9.1 Introduction 224
9.2 Objectives 224
9.3 Camera Models 224
9.4 Perspective Projection 226
9.5 Homography 229
9.6 Camera Calibration 231
9.7 Affine Motion Models 234
9.8 Summary 235
9.9 Solutions/ Answers 235

9.1 INTRODUCTION

9.2 OBJECTIVES

The objectives of this unit are:

• To learn about the camera model in detail,


• To understand the concept of camera matrix,
• To understand the process of camera calibration,
• To understand the affine motion model, and
• To give an overview of the aspects of computer vision related to a single camera and the estimation of 3D parameters from a single camera.

9.3 CAMERA MODELS

As discussed in Unit 8, a pin-hole camera does a perspective projection of a 3D scene onto a 2D plane.

Figure 1: The pin-hole camera model [1]

Under the pin-hole camera projection, a 3D point $X = (X, Y, Z)^T$ is mapped to a 2D point $x = (x, y)^T$ on the image plane, where the line joining the camera centre $C$ to the 3D point $X$ intersects the image plane at $x$:

$$x = \frac{fX}{Z}, \qquad y = \frac{fY}{Z} \qquad (1)$$

Therefore, we see that in the pin-hole camera model a perspective projection occurs. The image coordinates are related to the world coordinates as given by Equation (1) under this perspective projection, where the camera centre is at the origin of the world coordinate system. However, in real life it is not always possible to keep the camera (pin-hole/centre of lens) at the origin of the world coordinate system. To bring the object into the camera's view, we may need to move the camera away from the origin. In Unit 8, we studied transformations. In the real world, we shall have to carry out a sequence of translations and rotations of the camera to bring the object of interest into its view. In the next section, we shall discuss two important concepts: (a) intrinsic parameters of a camera and (b) extrinsic parameters of a camera.

Before we discuss these parameters, we have to understand that the pixel coordinate system and the world coordinate system are related by the following physical parameters: (a) size of the pixels, (b) position of the principal point, (c) focal length of the lens and (d) position and orientation of the camera.

The internal or intrinsic parameters of the camera define the relation between the pixel coordinates of a point on the image and the corresponding camera coordinates in the camera reference frame.

The external/extrinsic parameters of the camera are the parameters that define the location and orientation of the camera coordinate frame with respect to a known world coordinate frame.

Questions to check your progress:

1. What is a pinhole camera?
2. What does f, the focal length, represent in a pin-hole camera model?

9.4 PERSPECTIVE PROJECTION

9.4.1 External/Extrinsic Parameters of a Camera

(b) Finding the elements of the rotation matrix to align the corresponding axes of the two coordinate frames.

Therefore, the extrinsic camera parameters help us in finding the relation between the coordinates of a 3D point in the world coordinate system and its coordinates in the camera coordinate system.

Let $R$ be the rotation matrix,

$$R = \begin{pmatrix} r_{11} & r_{12} & r_{13} \\ r_{21} & r_{22} & r_{23} \\ r_{31} & r_{32} & r_{33} \end{pmatrix}$$

and $T$ be the translation vector,

$$T = (T_x, T_y, T_z)^T.$$

Then, for a 3D point $A$, whose world coordinates are $X_w$ and camera coordinates are $X_c$,

$$X_c = R\,X_w + T \qquad (2)$$
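A minimal sketch of Equation (2); the rotation (about the Z-axis) and translation below are hypothetical stand-ins for real extrinsics:

```python
import numpy as np

theta = np.radians(30)                       # hypothetical camera rotation
R = np.array([[np.cos(theta), -np.sin(theta), 0],
              [np.sin(theta),  np.cos(theta), 0],
              [0,              0,             1]])
T = np.array([0.5, -1.0, 4.0])               # hypothetical translation

X_world = np.array([1.0, 2.0, 10.0])
X_cam = R @ X_world + T                      # Equation (2)
print(X_cam)
```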

2. What does the rotation matrix in the camera external parameters represent?
3. What does the translation vector in the camera external parameters represent?

9.4.2 Internal/Intrinsic Parameters of a Camera
Equation (1) assumes that the coordinates in the image plane are measured with the principal point as the origin. As we discussed in Unit 8, the principal axis or the principal ray is the line from the camera centre that is perpendicular to the image plane. Therefore, the principal point is the point of intersection of the principal axis and the image plane. In general, the pin-hole camera model is a geometrical model, implying that all points and the focal length are measured in millimetres, centimetres or metres. However, the camera coordinates are measured in pixels, i.e., in terms of the height and width of pixels (the sampling distance).
Therefore, the internal camera parameters characterize the following:
a) The perspective projection (focal length)
b) The relation between pixel size and image plane coordinates
c) The geometric distortions introduced by the optics
The relation between the camera coordinates and the image plane coordinates is given by the perspective projection:

$$x = f\frac{X_c}{Z_c}, \qquad y = f\frac{Y_c}{Z_c} \qquad (3)$$

9.4.2.1 Relation between image plane coordinates and pixels

Let $u_x$ and $u_y$ be the coordinates of the principal point in pixels, and $s_x$, $s_y$ the sizes of pixels in the horizontal and vertical directions in millimetres. Therefore,

$$x = -(x_{im} - u_x)s_x \implies x_{im} = -x/s_x + u_x$$
$$y = -(y_{im} - u_y)s_y \implies y_{im} = -y/s_y + u_y$$

Therefore, the matrix of intrinsic parameters, also known as the camera calibration matrix, is

$$K = \begin{pmatrix} -f/s_x & 0 & u_x \\ 0 & -f/s_y & u_y \\ 0 & 0 & 1 \end{pmatrix}$$

and the matrix representing the external or extrinsic parameters of the camera is

$$[R\,|\,T] = \begin{pmatrix} r_{11} & r_{12} & r_{13} & T_x \\ r_{21} & r_{22} & r_{23} & T_y \\ r_{31} & r_{32} & r_{33} & T_z \end{pmatrix}$$

In homogeneous coordinates,

$$x = K[R\,|\,T]\,X \qquad (6)$$

$P = K[R\,|\,T]$ is called the camera projection matrix, which is a 3×4 matrix, where $K$ represents the intrinsic parameters and $[R\,|\,T]$ represents the extrinsic parameters of the camera.
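The full pipeline $x = K[R\,|\,T]X$ can be sketched as follows (the intrinsic values are hypothetical; in practice K comes from calibration, as discussed in Section 9.6):

```python
import numpy as np

K = np.array([[800.0,   0.0, 320.0],   # f/s_x, skew, principal point u_x
              [  0.0, 800.0, 240.0],   # f/s_y, u_y (hypothetical values)
              [  0.0,   0.0,   1.0]])
R = np.eye(3)                           # camera aligned with world axes
T = np.array([[0.0], [0.0], [5.0]])     # camera 5 units from the origin

P = K @ np.hstack([R, T])               # 3x4 camera projection matrix

X = np.array([1.0, 0.5, 5.0, 1.0])      # world point in homogeneous form
x = P @ X
print(x[:2] / x[2])                     # pixel coordinates (x_im, y_im)
```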

Questions for review of progress:

9.5 HOMOGRAPHY

An important condition for a transformation to be a homography is that it maps collinear points to collinear points; that is, if $x_1$, $x_2$ and $x_3$ lie on a line, then $h(x_1)$, $h(x_2)$ and $h(x_3)$ also lie on a line.

Fig. 2. The projection maps points on one plane to points on another plane. These points are related by a homography $H$, such that $x' = Hx$. (Fig. taken from [1])

9.5.1 Homography Estimation: Direct Linear Transformation Algorithm

We can estimate the homography between two images of the same scene under the assumption that the point correspondences are given and that there is planar motion of the camera. We assume that $x_i \leftrightarrow x'_i$, $i = 1, \ldots, n$, are the given set of point correspondences. We need to compute the 3×3 homography matrix $H$ such that $x'_i = Hx_i$.

For each point correspondence, we have

$$x'_i \times (Hx_i) = 0.$$

Therefore, if we simplify, we get the equation $A_i h = 0$, where $A_i$ is built from the coordinates of the correspondence and $h$ is the 9-vector of the entries of $H$.
By stacking all the equations of this form, we will get the system of equations $Ah = 0$. To obtain the solution of the system of equations $Ah = 0$, we obtain the Singular Value Decomposition (SVD) [2] of $A$. If $\mathrm{SVD}(A) = UDV^T$, where $D$ is the diagonal matrix with singular values arranged in descending order, then the solution $h$ is the last column of $V$, since for the system of equations $Ah = 0$ the solution $h$ is the singular vector corresponding to the smallest singular value, which minimizes $\|Ah\|$ subject to $\|h\| = 1$.

Fig. 3. A homography can be used to remove the perspective distortion from a planar building façade. Fig. taken from [1]
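A compact sketch of the DLT estimation just described, using NumPy's SVD (the two rows per correspondence are the standard expansion of $x' \times (Hx) = 0$; point normalisation is omitted for brevity):

```python
import numpy as np

def estimate_homography(pts, pts_prime):
    """DLT: solve Ah = 0 via SVD for the 3x3 homography H (x' ~ Hx)."""
    A = []
    for (x, y), (xp, yp) in zip(pts, pts_prime):
        # Two rows per correspondence, from x' x (Hx) = 0.
        A.append([-x, -y, -1,  0,  0,  0, xp * x, xp * y, xp])
        A.append([ 0,  0,  0, -x, -y, -1, yp * x, yp * y, yp])
    _, _, Vt = np.linalg.svd(np.asarray(A, dtype=float))
    h = Vt[-1]              # right singular vector of smallest singular value
    return h.reshape(3, 3) / h[-1]

# Four corners of a unit square mapped to a skewed quadrilateral.
src = [(0, 0), (1, 0), (1, 1), (0, 1)]
dst = [(0, 0), (2, 0.1), (2.2, 1.3), (0.1, 1.1)]
print(estimate_homography(src, dst))
```

At least four correspondences are needed, since H has 8 degrees of freedom and each correspondence contributes two equations.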

Questions for review of progress:

1. Planar homography helps in relating the points on one plane with the corresponding points on another plane. True or False?
2. Write out the complete equations to find the system Ah = 0.

9.6 CAMERA CALIBRATION

The aim of camera calibration is to estimate the intrinsic and extrinsic parameters of a camera. To compute the intrinsic and extrinsic camera parameters, we need to know a set of correspondences between the world points (X, Y, Z) and their projections (x, y) on the image. Therefore, the first step is to establish the set of correspondences between the world points and their projections on the image plane. To do so, in general, images of a known calibration object are used. The calibration object has a known 3D geometry and location in space. Moreover, it generates image points which can be accurately located.

Fig 4. Image of a calibration object. (Image taken from [1])

Step 1: Normalization. In this step, we compute a similarity transformation $T$ to normalise the 2D image points. To carry out normalization, the points should be translated such that their centroid is at the origin, and scaled in such a manner that the root mean squared distance from the origin is $\sqrt{2}$. Similarly, compute a similarity transformation $U$ to normalise the 3D world points. The 3D points should also be normalised such that the centroid of the points is translated to the origin and the root mean squared distance from the origin is $\sqrt{3}$. (This works well for the case when the variation in depth of the points is small, such as in a calibration object.)

Step 2: Direct Linear Transform (DLT):

Assuming that the camera matrix is represented by

$$M = \begin{pmatrix} m_{11} & m_{12} & m_{13} & m_{14} \\ m_{21} & m_{22} & m_{23} & m_{24} \\ m_{31} & m_{32} & m_{33} & m_{34} \end{pmatrix},$$
for each corresponding point we have $x_i = MX_i$. Therefore, taking the cross product,

$$x_i \times MX_i = 0,$$

which yields two independent linear equations in the entries of $M$ per correspondence. By stacking the equations for each corresponding point, we generate the $2n \times 12$ matrix $A$ such that $Am = 0$, where $m$ is the vector that contains the entries of the matrix $M$. The solution of $Am = 0$, subject to $\|m\| = 1$, is then obtained from the unit singular vector of $A$ corresponding to the smallest singular value of $A$. We use Singular Value Decomposition (SVD) to find the solution, as in the case of homography computation. This gives us the linear solution for $M$, which is used as an initial estimate of $M$.

Step 3: The measurement errors need to be reduced. Therefore, we minimize the geometric error $\sum_i d(x_i, MX_i)^2$ over $M$, using the linear estimate as the starting point and an iterative algorithm such as Levenberg-Marquardt.
Step 4: De-normalization: Finally, the camera matrix for the original (unnormalized) coordinates is obtained as

$$M = T^{-1}\hat{M}U,$$

where $\hat{M}$ is the matrix estimated from the normalised points. Therefore, $M$ is the camera matrix. Using QR-decomposition [2], the camera matrix can be decomposed as $M = K[R\,|\,T]$, where $K$, which consists of the internal parameters of the camera, is an upper-triangular matrix and $[R\,|\,T]$ is the matrix of external parameters of the camera.
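The decomposition in Step 4 can be sketched with SciPy's RQ factorisation (scipy.linalg.rq); the sign correction and normalisation below follow the common convention that the diagonal of K is positive and K[2,2] = 1, and the input matrix is a hypothetical example:

```python
import numpy as np
from scipy.linalg import rq

def decompose_camera_matrix(M):
    """Split a 3x4 camera matrix M ~ K[R|T] into intrinsics and extrinsics."""
    K, R = rq(M[:, :3])                # K upper-triangular, R orthogonal
    D = np.diag(np.sign(np.diag(K)))   # fix signs: make diag(K) positive
    K, R = K @ D, D @ R                # D @ D = I, so the product is unchanged
    K = K / K[2, 2]                    # M is defined up to scale: set K[2,2]=1
    T = np.linalg.solve(K, M[:, 3])    # from M[:, 3] = K @ T
    return K, R, T

# Hypothetical camera matrix for illustration.
M = np.array([[700.0,   0.0, 320.0, 100.0],
              [  0.0, 700.0, 240.0, -50.0],
              [  0.0,   0.0,   1.0,   2.0]])
K, R, T = decompose_camera_matrix(M)
print(K); print(R); print(T)
```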

Questions for review:
1. What is the minimum number of point correspondences between 3D points and 2D points required for camera calibration?
2. Write down the equations needed to set up the system Am = 0.

9.7 AFFINE MOTION MODELS

9.7.1 Affine Camera

An affine camera is a camera whose projection matrix M has the last row of the form (0, 0, 0, 1). An important property of the affine camera is that it maps points at infinity to points at infinity. Therefore, an affine camera is a camera at infinity, implying that the camera centre lies on the plane at infinity. The affine camera preserves parallelism.

As we calibrate a projective camera, similarly we can also estimate an affine camera. An affine camera matrix is a 3×4 projection matrix with the last row given as (0, 0, 0, 1).

In general, the affine motion model is used for approximating the flow pattern of the camera motion in a video.
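A common parameterisation of the affine motion model (not spelled out above, so treat it as an assumption) writes the flow at pixel (x, y) as $u = a_1 + a_2x + a_3y$ and $v = a_4 + a_5x + a_6y$. The sketch below fits these six parameters to sampled flow vectors by least squares, using synthetic data:

```python
import numpy as np

def fit_affine_motion(xy, uv):
    """Least-squares fit of u = a1 + a2*x + a3*y, v = a4 + a5*x + a6*y."""
    B = np.column_stack([np.ones(len(xy)), xy[:, 0], xy[:, 1]])
    a_u, *_ = np.linalg.lstsq(B, uv[:, 0], rcond=None)
    a_v, *_ = np.linalg.lstsq(B, uv[:, 1], rcond=None)
    return np.concatenate([a_u, a_v])

# Synthetic flow generated from a known affine model, for illustration.
rng = np.random.default_rng(0)
xy = rng.uniform(0, 100, size=(50, 2))
a = np.array([1.0, 0.02, -0.01, -0.5, 0.0, 0.03])
uv = np.column_stack([a[0] + a[1] * xy[:, 0] + a[2] * xy[:, 1],
                      a[3] + a[4] * xy[:, 0] + a[5] * xy[:, 1]])
print(fit_affine_motion(xy, uv))   # recovers a up to numerical precision
```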

9.8 SUMMARY
In this unit, we have discussed the single camera geometry, including the camera model, perspective projection, the camera parameters, homography, camera calibration and the affine motion model. The camera parameters are of two types, the internal camera parameters and the external camera parameters, which together form the camera matrix. We also discussed how image-to-image homography can be computed with the knowledge of a sufficient number of corresponding points. We also discussed that if a sufficient number of world-to-image corresponding points are known, then the camera can be calibrated and the projection matrix can be estimated. This can be done easily with a calibration object. We also discussed the affine transformations, the affine camera, and the affine motion model.

9.9 REFERENCES

UNIT 10 MULTIPLE CAMERA
Structure Page No.
10.1 Introduction 236
10.2 Objectives 236
10.3 Stereo Vision 237
10.4 Point correspondences 237
10.5 Epipolar Geometry 238
10.6 Motion: Optical Flow 240
10.7 Summary 242
10.8 Solutions/ Answers 242

10.1 INTRODUCTION

despite occlusion from one view. Therefore, multiple camera systems have many advantages over single camera systems. However, this also leads to questions such as how many cameras are enough, where the cameras should be placed, and how much overlap the cameras should have between their views. The answers to these questions depend on factors such as the cost of equipment, the type of cameras used and the application of the camera system.

10.2 OBJECTIVES
The objectives of this unit are to:
• Learn about the stereo vision system.
• Discuss the concepts of point correspondences and epipolar geometry.
• Discuss the concepts of motion and optical flow that allow a computer vision system to find the movement pattern in a video.
10.3 STEREO VISION
A stereo vision system consists of two cameras such that both cameras can capture the same scene, however with some disparity between the views. One stereo vision system that we can easily relate to is the pair of eyes that we have.

The process of using two images of a scene captured by a stereo camera system to extract 3D information is known as stereo vision. Since the stereo pair, i.e., the images taken by the stereo system, enables us to get the 3D information of the scene, it finds wide application in autonomous navigation of robots, autonomous cars, virtual and augmented reality, etc.

In stereo vision, the 3D information is obtained by estimating the relative depth of the points in the scene. The corresponding points in the stereo pair are matched and a disparity map is created, which helps in estimating the depth of the points. We shall first understand the concept of corresponding points.

10.4 POINT CORRESPONDENCES

Fig 10.1 The figure shows the concept of a stereo pair. C and C' are the two camera centres, and x and x' are the images of the 3D point X in the corresponding image planes. x and x' are said to be corresponding points. Fig taken from [1]

As shown in Figure 10.1, the stereo pair consists of two cameras that are at a certain distance apart. The line joining the camera centres is known as the baseline. The view volume of the two cameras is such that they view a common area in the scene. A 3D point that lies in the common view volume of the two cameras will be imaged by both cameras. Therefore, as shown in Figure 10.1, the 3D point X is visible to both cameras and, therefore, it has an image x in Camera 1 (with camera centre C) and an image x' in Camera 2 (with camera centre C'). Therefore, x and x' are called corresponding points. Given a point in one image, we can find its corresponding point in the other image using epipolar geometry, which we shall study next.

10.5 EPIPOLAR GEOMETRY

Figure 10.2 (a), (b) Epipolar geometry (Image taken from [2])

We see that in the case of stereo vision, it is possible to find the 3D point if it is imaged by both the cameras and the point correspondences are known.

The line joining the two camera centres $C_0$ and $C_1$ is known as the baseline. The baseline intersects the two image planes at $e_0$ and $e_1$ respectively. The point of intersection of the baseline with an image plane is known as an epipole. Therefore, $e_0$ is the left epipole and $e_1$ is the right epipole. The left epipole $e_0$ is the image of the camera centre $C_1$ in the left image plane, while the right epipole $e_1$ is the image of the camera centre $C_0$ in the right image plane. The epipolar line is the intersection of the epipolar plane with the image plane. An important point to be noted is that every epipolar line passes through the epipole; therefore, the point of intersection of all the epipolar lines is the epipole. Another point to be noted is that the epipolar line in the right image is the image of the back-projected ray joining the camera centre $C_0$ with the 3D point $X$, while the epipolar line in the left image is the image of the back-projected ray from the camera centre $C_1$ to the 3D point $X$.
Therefore, given a point $x$ in the first image, the corresponding point $x'$ in the second image is constrained to lie on the corresponding epipolar line. The Fundamental matrix $F$ represents the epipolar geometry algebraically. An important point to be noted is that the fundamental matrix defines the relation between corresponding points as given by Equation (10.1):

$$x'^T F x = 0 \qquad (10.1)$$

More precisely, the fundamental matrix maps a given point $x$ from the first image to its epipolar line $l' = Fx$ in the second image.

Solving for the Fundamental Matrix

Given point correspondences $x_i = (x_i, y_i, 1)^T$ and $x'_i = (x'_i, y'_i, 1)^T$, we can set up a system of equations using Equation (10.1) to solve for $F$. Equation (10.1) implies

$$(x'_i \;\; y'_i \;\; 1) \begin{pmatrix} f_{11} & f_{12} & f_{13} \\ f_{21} & f_{22} & f_{23} \\ f_{31} & f_{32} & f_{33} \end{pmatrix} \begin{pmatrix} x_i \\ y_i \\ 1 \end{pmatrix} = 0 \qquad (10.2)$$

which gives Equation (10.3) on solving:

$$x'_ix_if_{11} + x'_iy_if_{12} + x'_if_{13} + y'_ix_if_{21} + y'_iy_if_{22} + y'_if_{23} + x_if_{31} + y_if_{32} + f_{33} = 0 \qquad (10.3)$$
If we consider $m$ such correspondences, then we can solve the system of equations for the 9 unknowns in $F$:

$$Af = 0 \qquad (10.4)$$

where each row of $A$ is built from one correspondence as in Equation (10.3) and $f$ is the 9-vector of the entries of $F$. This system requires at least 8 points to solve, since $F$ has 8 degrees of freedom (it is defined only up to scale). Therefore, this is also known as the 8-point algorithm.
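A sketch of the 8-point algorithm using SVD (NumPy only; each row of A follows Equation (10.3), and the final step enforces the rank-2 constraint on F, a standard refinement):

```python
import numpy as np

def eight_point(x1, x2):
    """Estimate F from n >= 8 correspondences; rows of x1, x2 are (x, y)."""
    A = np.column_stack([
        x2[:, 0] * x1[:, 0], x2[:, 0] * x1[:, 1], x2[:, 0],
        x2[:, 1] * x1[:, 0], x2[:, 1] * x1[:, 1], x2[:, 1],
        x1[:, 0],            x1[:, 1],            np.ones(len(x1)),
    ])                                  # one row of Equation (10.3) per pair
    _, _, Vt = np.linalg.svd(A)
    F = Vt[-1].reshape(3, 3)            # smallest right singular vector of A
    U, S, Vt = np.linalg.svd(F)         # enforce rank 2 by zeroing the
    S[2] = 0.0                          # smallest singular value of F
    return U @ np.diag(S) @ Vt
```

Given the estimated F, the epipolar line in the second image for a point x in the first image is obtained as $l' = Fx$ in homogeneous coordinates.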

We use Singular Value Decomposition (SVD) to solve the system of equations, as in the case of homography estimation.

10.6 MOTION: OPTICAL FLOW

The motion observed in an image sequence can arise in several ways:

1. The camera is static but the objects are moving.
2. The camera is moving but the object is static.
3. Both the camera and objects are moving.
4. The camera is static but the light source and objects are moving.

Motion perception plays an important role in applications of computer vision


such as activity recognition, surveillance, 3D reconstruction, and many
others. Therefore, it is important to estimate the motion from images.

10.6.1 Optical Flow

Estimating the motion of the pixels in a sequence of images or a video has a very large number of applications. Optical flow is used to compute the motion information between a sequence of images. It helps to determine a dense point-to-point correspondence between the pixels of an image at time t and the pixels of the image at time t+1. Therefore, optical flow is said to compute the motion in a video or sequence of images.

Optical flow is based on the assumption that across consecutive frames the pixel intensities do not change rapidly. It also assumes that neighbouring pixels have similar patterns of motion. Most often, it also assumes that the luminance remains constant. We assume that a pixel $I(x, y, t)$ in the image taken at time $t$ moves to the point $(x+\delta x, y+\delta y)$ at time $t+\delta t$, that is, to $I(x+\delta x, y+\delta y, t+\delta t)$. Since they are the same point, the above assumption can be written mathematically as

$$I(x, y, t) = I(x+\delta x, y+\delta y, t+\delta t) \qquad (10.6)$$

Equation (10.6) forms the basis of the 2D motion constraint equation and holds true given that $\delta x$, $\delta y$, $\delta t$ are small. Taking into consideration the first-order Taylor series expansion about $(x, y, t)$ in Equation (10.6), we get

$$I_x u + I_y v + I_t = 0 \qquad (10.8)$$

where $u = \delta x/\delta t$ and $v = \delta y/\delta t$ are the image velocities or optical flow at $(x, y)$ at time $t$, and $I_x$, $I_y$, $I_t$ are the image intensity derivatives at $(x, y)$. Then, from Equation (10.8), we get

$$\nabla I \cdot (u, v)^T = -I_t \qquad (10.9)$$

Equation (10.9) is called the 2D Motion Constraint Equation.

There are various methods for solving for optical flow. The Lucas-Kanade optical flow algorithm [3] and the Horn-Schunck [4] optical flow method are two of the most popular methods for estimating the optical flow.
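In practice, dense optical flow is usually computed with a library routine. A minimal sketch using OpenCV's Farnebäck method is shown below (the file names are placeholders); the sparse Lucas-Kanade method is available in the same library as cv2.calcOpticalFlowPyrLK:

```python
import cv2

# Two consecutive grayscale frames (placeholder file names).
prev_frame = cv2.imread("frame_t.png", cv2.IMREAD_GRAYSCALE)
next_frame = cv2.imread("frame_t1.png", cv2.IMREAD_GRAYSCALE)

# Dense optical flow: one (u, v) vector per pixel.
flow = cv2.calcOpticalFlowFarneback(
    prev_frame, next_frame, None,
    pyr_scale=0.5, levels=3, winsize=15,
    iterations=3, poly_n=5, poly_sigma=1.2, flags=0)

u, v = flow[..., 0], flow[..., 1]   # image velocities at each (x, y)
print(float(u.mean()), float(v.mean()))
```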
10.7 SUMMARY
In this unit, we learned about various concepts related to multiple-camera models required for computer vision. The concept of stereo vision was discussed and extended to point correspondences and epipolar geometry; finally, the unit concluded with motion-oriented concepts, including optical flow.

10.8 QUESTIONS AND SOLUTIONS

Q1. What is Stereo Vision?
Sol. Ref. 10.3
Q2. Describe the concept of Point Correspondence.
Sol. Ref. 10.4
Q3. Discuss the term Epipolar Geometry.
Sol. Ref. 10.5
