Indira Gandhi National Open University
The People's University
Digital Image Processing and Computer Vision
Block 3
PROGRAMME DESIGN COMMITTEE
Prof. (Retd.) S.K. Gupta, IIT, Delhi
Prof. Ela Kumar, IGDTUW, Delhi
Prof. T.V. Vijay Kumar, JNU, New Delhi
Prof. Gayatri Dhingra, GVMITM, Sonipat
Mr. Milind Mahajan, Impressico Business Solutions, New Delhi
Sh. Shashi Bhushan Sharma, Associate Professor, SOCIS, IGNOU
Sh. Akshay Kumar, Associate Professor, SOCIS, IGNOU
Dr. P. Venkata Suresh, Associate Professor, SOCIS, IGNOU
Dr. V.V. Subrahmanyam, Associate Professor, SOCIS, IGNOU
Sh. M.P. Mishra, Assistant Professor, SOCIS, IGNOU
Dr. Sudhansh Sharma, Assistant Professor, SOCIS, IGNOU
SOCIS FACULTY
Sh. Sanjay Aggarwal
Assistant Registrar, MPDD, IGNOU, New Delhi
June, 2023
© Indira Gandhi National Open University, 2023
All rights reserved. No part of this work may be reproduced in any form, by mimeograph or any other means, without permission in writing from the Indira Gandhi National Open University.
Further information on the Indira Gandhi National Open University courses may be obtained from the University's office at Maidan Garhi, New Delhi-110068.
Printed and published on behalf of the Indira Gandhi National Open University, New Delhi by MPDD, IGNOU.
Laser Typesetter: Tessa Media & Computers, C-206, Shaheen Bagh, Jamia Nagar, New Delhi-110025
BLOCK 3 INTRODUCTION
This block covers various topics relevant from the point of view of computer vision, including an introduction to what computer vision actually means, along with the various camera models and the transformations involved in computer vision. The block also covers the concepts related to single camera and multiple camera environments. The unit-wise distribution of content is given below:
Unit 8 includes an introduction to what computer vision actually means, along with the various camera models and the transformations involved in computer vision.

Unit 9 includes the concepts related to a single camera environment, viz. perspective projection, homography, camera calibration, and affine motion models.

Finally, Unit 10 covers the various concepts relevant to multiple camera environments, such as stereo vision, point correspondences, epipolar geometry, and optical flow.
UNIT 8 INTRODUCTION TO COMPUTER VISION
Structure
8.1 Introduction
8.2 Objectives
8.3 Introduction to Computer Vision
8.4 Camera Models
8.5 Projections
8.6 Transformations
8.7 Summary
8.8 Solutions/Answers
8.9 References
8.2 OBJECTIVES

The objective of this unit is to introduce the subject of computer vision. "A picture is worth a thousand words" holds true, as one can see that an image of any scene carries a lot of detailed information. Further, it is easy to note that a colour image has more information than a grey-scale image. In today's world, camera technology has become very cheap and ubiquitous. Cameras are the instruments through which we capture an image. The mathematics behind the camera model helps us to understand computer vision. Therefore, we need to understand the various camera models. As discussed earlier in digital image processing, a digital image is a matrix of non-negative integers; therefore, any transformation applicable to a matrix can be applied to a digital image.
In this unit, we shall study various geometric transformations. We shall finally summarise the unit.

8.4 CAMERA MODELS

There are two major classes of camera models: camera models with a finite centre and camera models with centre at infinity. We shall focus our attention on the camera model with a finite centre and discuss the simplest camera model: the pin-hole camera model.
8.4.1 The Pin-Hole Camera Model
In a pin-hole camera, a barrier with a small hole (the pin-hole) is placed between the 3D object and the 2D image plane (the light-sensitive plane). Each point on the 3D object emits/reflects multiple light rays, out of which only one or very few fall on the image plane by passing through the pin-hole. Thereby, one can see that there exists a mapping between the points of the 3D object and the 2D plane forming an image, and such a mapping will be a projection, where the pin-hole is called the centre of projection.
Formulation of the pin-hole camera model

Assume that the centre of projection is the origin of the 3D Euclidean coordinate system. Let Z = f be the image plane or the focal plane. Then, the centre of projection C is called the camera centre or optical centre, and f is the focal length. The principal axis or the principal ray is the line from the camera centre that is perpendicular to the image plane. The point where the principal axis meets the image plane is called the principal point. In homogeneous coordinates, the projection can be written as

$$\begin{pmatrix} fX \\ fY \\ fZ \\ Z \end{pmatrix} = \begin{pmatrix} f & 0 & 0 & 0 \\ 0 & f & 0 & 0 \\ 0 & 0 & f & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix} \begin{pmatrix} X \\ Y \\ Z \\ 1 \end{pmatrix}$$

from which the projection point in the Euclidean space can be obtained as

$$x = \frac{fX}{Z}, \qquad y = \frac{fY}{Z}$$

This way, the perspective projection of a 3D point via the centre of projection (the pin-hole) onto a point on the image plane Z = f can be obtained in matrix form.
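As a quick illustration, the following NumPy sketch applies this 4×4 pin-hole matrix to a homogeneous 3D point and recovers the Euclidean image coordinates (the focal length and point values are illustrative):

```python
import numpy as np

f = 2.0  # illustrative focal length

# The 4x4 homogeneous pin-hole projection matrix from the formulation above.
P = np.array([[f, 0, 0, 0],
              [0, f, 0, 0],
              [0, 0, f, 0],
              [0, 0, 1, 0]], dtype=float)

X = np.array([3.0, 1.5, 6.0, 1.0])       # homogeneous 3D point (X, Y, Z, 1)

x_h = P @ X                              # -> (fX, fY, fZ, Z)
x, y = x_h[0] / x_h[3], x_h[1] / x_h[3]  # Euclidean image point (fX/Z, fY/Z)
print(x, y)                              # (1.0, 0.5) for the values above
```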
Orthographic projection
Orthographic projection is the projection of a 3D object onto a plane by a set of parallel rays that are orthogonal to the image plane, i.e., it is a parallel projection. In this projection, the centre of projection is taken at infinity.

For any point (X, Y, Z)^T of the object, its orthographic projection on the image plane Z = f is given by

$$x = X, \qquad y = Y$$

which is independent of the distance of the point from the camera.
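For comparison, an orthographic projection simply keeps X and Y and discards the depth; a minimal sketch (the point is illustrative):

```python
import numpy as np

# Orthographic projection: parallel rays orthogonal to the image plane,
# so the depth coordinate Z is simply dropped.
X = np.array([3.0, 1.5, 6.0])  # 3D point (X, Y, Z)
x, y = X[0], X[1]              # orthographic image of the point: (X, Y)
print(x, y)                    # (3.0, 1.5), regardless of Z
```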
8.6 TRANSFORMATIONS

Geometric transformations play an important role in computer vision. In this section, we discuss the important transformations that we shall require later in this course.
8.6.1 Euclidean Transformations
The most important property of Euclidean transformations is that they preserve lengths and angle measures. They are the most commonly used transformations, consisting of translation and rotation.
(i) Translation

When a point is moved from one location to another along a straight-line path, it is known as a translation. In 2D, let (x, y) be any point and t_x and t_y denote the translations along the x- and y-directions respectively. Then, the new coordinates of the point are given by

$$x' = x + t_x, \qquad y' = y + t_y$$

or, in homogeneous coordinates,

$$\begin{pmatrix} x' \\ y' \\ 1 \end{pmatrix} = \begin{pmatrix} 1 & 0 & t_x \\ 0 & 1 & t_y \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} x \\ y \\ 1 \end{pmatrix}$$

For example, if there is a rectangle with coordinates (2, 2), (2, 6), (5, 2) and (5, 6), and it is to be translated to the origin, then the translation vector will be (t_x, t_y) = (-2, -2). Therefore, (2, 2) maps to (0, 0) and (2, 6) maps to (0, 4). Similarly, (5, 2) maps to (3, 0) and (5, 6) maps to (3, 4).
The graphical representation is given below:

[Figure: the rectangle with corners (2, 2), (2, 6), (5, 2), (5, 6) and its translated copy with corners (0, 0), (0, 4), (3, 0), (3, 4)]
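A short NumPy sketch of this example, applying the homogeneous translation matrix to all four corners of the rectangle:

```python
import numpy as np

# Translation moving the rectangle corner (2, 2) to the origin.
tx, ty = -2, -2
T = np.array([[1, 0, tx],
              [0, 1, ty],
              [0, 0, 1]], dtype=float)

# The rectangle's corners as homogeneous column vectors.
rect = np.array([[2, 2, 1], [2, 6, 1], [5, 2, 1], [5, 6, 1]]).T
print((T @ rect)[:2].T)  # [[0 0], [0 4], [3 0], [3 4]]
```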
(ii) Rotation

When a point (x, y) is rotated about the origin through an angle θ, its new coordinates are given by

$$x' = x\cos\theta - y\sin\theta, \qquad y' = x\sin\theta + y\cos\theta$$

Fig. (c) Object at the origin; (d) Object rotated at the origin.

Example 2: Perform a 45° rotation of a triangle ABC with coordinates A(0, 0), B(1, 1), C(5, 2) about the origin.

Solution: We can represent the given triangle, in matrix form, using homogeneous coordinates of the vertices:

$$\begin{pmatrix} 0 & 0 & 1 \\ 1 & 1 & 1 \\ 5 & 2 & 1 \end{pmatrix}$$
So the new coordinates A'B'C' of the rotated triangle ABC can be found as:

$$\begin{pmatrix} 0 & 0 & 1 \\ 1 & 1 & 1 \\ 5 & 2 & 1 \end{pmatrix} \begin{pmatrix} \sqrt{2}/2 & \sqrt{2}/2 & 0 \\ -\sqrt{2}/2 & \sqrt{2}/2 & 0 \\ 0 & 0 & 1 \end{pmatrix} = \begin{pmatrix} 0 & 0 & 1 \\ 0 & \sqrt{2} & 1 \\ 3\sqrt{2}/2 & 7\sqrt{2}/2 & 1 \end{pmatrix}$$

so that A' = (0, 0), B' = (0, √2) and C' = (3√2/2, 7√2/2). The following Figure (a) shows the original triangle ABC and Figure (b) shows the triangle after the rotation.
[Figures (a) and (b): the triangle ABC before and after the 45° rotation]
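The same computation can be checked numerically; the sketch below builds the 45° homogeneous rotation matrix and applies it to the three vertices:

```python
import numpy as np

# 45-degree rotation about the origin, reproducing Example 2.
theta = np.deg2rad(45)
R = np.array([[np.cos(theta), -np.sin(theta), 0],
              [np.sin(theta),  np.cos(theta), 0],
              [0,              0,             1]])

tri = np.array([[0, 0, 1], [1, 1, 1], [5, 2, 1]]).T  # A, B, C as columns
print((R @ tri)[:2].T)
# A' = (0, 0), B' = (0, 1.414...), C' = (2.121..., 4.949...)
```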
Rigid body transformations preserve angles between vectors and lengths of vectors; therefore, parallel lines remain parallel after a rigid body transformation.
(i) Scaling

By scaling, the dimensions of an object are either compressed or expanded. A scaling transformation is carried out by multiplying the coordinate values of each vertex by scaling factors S_x and S_y in the x- and y-directions respectively, to produce the transformed coordinates

$$x' = x \cdot S_x, \qquad y' = y \cdot S_y$$
Fig. (a) The original triangle ABC; Fig. (b) The scaled triangle
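A minimal sketch of scaling in homogeneous form (the scale factors and point are illustrative):

```python
import numpy as np

# Scaling about the origin with factors Sx and Sy.
Sx, Sy = 2.0, 0.5
S = np.array([[Sx, 0,  0],
              [0,  Sy, 0],
              [0,  0,  1]])

p = np.array([4.0, 6.0, 1.0])   # homogeneous point (x, y, 1)
print((S @ p)[:2])              # (8.0, 3.0): x' = Sx*x, y' = Sy*y
```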
(ii) Shear

A transformation that slants the shape of an object is called a shear transformation. A shearing transformation can be carried out in both the x- and y-directions or in only one of the directions. The new coordinates after shearing in the x-direction with shear factor sh_x are given by:

$$x' = x + sh_x \cdot y, \qquad y' = y$$
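A corresponding sketch for an x-direction shear (the shear factor and point are illustrative):

```python
import numpy as np

# Shear in the x-direction: x' = x + shx * y, y' = y.
shx = 0.5
H = np.array([[1, shx, 0],
              [0, 1,   0],
              [0, 0,   1]])

p = np.array([2.0, 4.0, 1.0])  # homogeneous point (x, y, 1)
print((H @ p)[:2])             # (4.0, 4.0)
```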
8.7 SUMMARY

In this unit, we introduced the subject of computer vision and discussed the pin-hole camera model, perspective and orthographic projections, and the basic geometric transformations: translation, rotation, scaling, and shear.
8.8 QUESTIONS AND SOLUTIONS
UNIT 9 SINGLE CAMERA
Structure
9.1 Introduction
9.2 Objectives
9.3 Camera Models
9.4 Perspective Projection
9.5 Homography
9.6 Camera Calibration
9.7 Affine Motion Models
9.8 Summary
9.9 Solutions/Answers
9.1 INTRODUCTION
The internal or intrinsic parameters of the camera define the relation between the pixel coordinates of a point on the image and the corresponding camera coordinates in the camera reference frame.

The external or extrinsic parameters of the camera are the parameters that define the location and orientation of the camera coordinate frame with respect to a known world coordinate frame.
9.4 PERSPECTIVE PROJECTION
(b) Finding the elements of the rotation matrix to align the corresponding
axes of the two coordinate frames.
In homogeneous coordinates,

$$x = K[R \mid T]\,X \qquad (6)$$

P = K[R|T] is called the camera projection matrix, which is a 3×4 matrix, where K represents the intrinsic parameters and [R|T] represents the extrinsic parameters of the camera.
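The following sketch composes P = K[R|T] and projects a world point; the intrinsic and extrinsic values are illustrative, not from any real calibration:

```python
import numpy as np

# Illustrative intrinsics: focal length and principal point in pixels.
f, cx, cy = 800.0, 320.0, 240.0
K = np.array([[f, 0, cx],
              [0, f, cy],
              [0, 0, 1]])

R = np.eye(3)                         # camera aligned with the world frame
T = np.array([[0.0], [0.0], [0.0]])   # no translation, for simplicity

P = K @ np.hstack([R, T])             # the 3x4 camera projection matrix

Xw = np.array([0.5, 0.25, 2.0, 1.0])  # homogeneous world point
x_h = P @ Xw
u, v = x_h[:2] / x_h[2]               # pixel coordinates
print(u, v)                           # (520.0, 340.0)
```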
Fig. 2: The projection maps points on one plane to points on another plane. These points are related by a homography H, such that x' = Hx. (Fig. taken from [1])
By stacking all the equations of this form, we will get the system of equations Ah = 0. To obtain its solution, we compute the Singular Value Decomposition (SVD) [2] of A. If SVD(A) = UDV^T, where D is the diagonal matrix with the singular values arranged in descending order, then the solution h is the last column of V, since for the system of equations Ah = 0, the solution h is the singular vector corresponding to the smallest singular value.
[Figure: homography used to remove perspective distortion from a planar building façade. Fig. taken from [1]]
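The SVD-based solution above can be sketched directly in NumPy. The helper below (a hypothetical name) builds the two rows of A contributed by each correspondence and takes the singular vector for the smallest singular value; a practical implementation would also normalize the point coordinates first:

```python
import numpy as np

def homography_dlt(src, dst):
    """Estimate H from >= 4 point correspondences by solving Ah = 0
    with the SVD, as described above (sketch; no normalization step)."""
    A = []
    for (x, y), (u, v) in zip(src, dst):
        # Each correspondence (x, y) -> (u, v) contributes two rows of A.
        A.append([-x, -y, -1,  0,  0,  0, u * x, u * y, u])
        A.append([ 0,  0,  0, -x, -y, -1, v * x, v * y, v])
    _, _, Vt = np.linalg.svd(np.array(A))
    h = Vt[-1]              # singular vector for the smallest singular value
    return h.reshape(3, 3) / h[-1]

# Points related by a known H = [[2,0,1],[0,3,1],[0,0,1]], to check recovery:
src = [(0, 0), (1, 0), (1, 1), (0, 1), (2, 3)]
dst = [(1, 1), (3, 1), (3, 4), (1, 4), (5, 10)]
print(homography_dlt(src, dst))
```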
For each corresponding pair of a world point X_i (in homogeneous coordinates) and its image point (u_i, v_i), we have u_i = (m_1 · X_i)/(m_3 · X_i) and v_i = (m_2 · X_i)/(m_3 · X_i), where m_1, m_2, m_3 are the rows of the projection matrix M. Therefore,

$$m_1 \cdot X_i - u_i (m_3 \cdot X_i) = 0, \qquad m_2 \cdot X_i - v_i (m_3 \cdot X_i) = 0$$
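A sketch of how one such pair of equations becomes two rows of A in the system Am = 0 (the helper name is illustrative; m here collects the twelve entries of the projection matrix):

```python
import numpy as np

def calibration_rows(Xw, uv):
    """Two rows of A (for Am = 0) from one 3D-2D correspondence.
    Xw is a homogeneous world point (X, Y, Z, 1) and uv = (u, v) its
    image. The projection matrix has 11 degrees of freedom (12 entries
    up to scale), so at least 6 correspondences are needed."""
    u, v = uv
    zeros = np.zeros(4)
    return np.array([
        np.concatenate([Xw, zeros, -u * Xw]),   # m1.X - u (m3.X) = 0
        np.concatenate([zeros, Xw, -v * Xw]),   # m2.X - v (m3.X) = 0
    ])

# Stack the rows from all correspondences and take the right singular
# vector of A for the smallest singular value, as in the homography case.
```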
Questions for review:

1. What is the minimum number of point correspondences between 3D points and 2D points required for camera calibration?
2. Write down the equations needed to set up the system Am = 0.
In general, the affine motion model is used for approximating the flow pattern of the camera motion in a video.
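In its standard six-parameter form, the affine motion model expresses the flow at a pixel as u = a1 + a2·x + a3·y and v = a4 + a5·x + a6·y; a minimal sketch with illustrative parameter values (the unit's own derivation is not reproduced here):

```python
def affine_flow(params, x, y):
    """Evaluate the six-parameter affine motion model at pixel (x, y):
    u = a1 + a2*x + a3*y, v = a4 + a5*x + a6*y."""
    a1, a2, a3, a4, a5, a6 = params
    return a1 + a2 * x + a3 * y, a4 + a5 * x + a6 * y

# A small pan-plus-zoom flow field: translation (1, 0.5) plus 1% expansion.
print(affine_flow((1.0, 0.01, 0.0, 0.5, 0.0, 0.01), 100, 50))  # (2.0, 1.0)
```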
9.8 SUMMARY

In this unit, we have discussed single camera geometry, including the camera model, perspective projection, the camera parameters, homography, camera calibration and the affine motion model. The camera parameters are of two types, the internal camera parameters and the external camera parameters, which together form the camera matrix. We also discussed how image-to-image homography can be computed with the knowledge of a sufficient number of corresponding points, and that if a sufficient number of world-to-image point correspondences is known, then the camera can be calibrated and the projection matrix can be estimated. This can be done easily with a calibration object. We also discussed affine transformations, the affine camera, and the affine motion model.
9.9 REFERENCES
UNIT 10 MULTIPLE CAMERA
Structure
10.1 Introduction
10.2 Objectives
10.3 Stereo Vision
10.4 Point Correspondences
10.5 Epipolar Geometry
10.6 Motion: Optical Flow
10.7 Summary
10.8 Solutions/Answers
10.1 INTRODUCTION
A scene point occluded in one view may still be visible in another, so the scene can be observed despite occlusion from one view. Therefore, multiple camera systems have many advantages over single camera systems. However, this also leads to questions such as how many cameras are enough, where the cameras should be placed, and how much overlap the cameras should have between their views. The answers to these questions depend on factors such as the cost of equipment, the type of cameras used and the application of the camera system.
10.2 OBJECTIVES

The objectives of this unit are to:

• Learn about the stereo vision system.
• Discuss the concepts of point correspondences and epipolar geometry.
• Discuss the concepts of motion and optical flow that allow a computer vision system to find the movement pattern in a video.
10.3 STEREO VISION
A stereo vision system consists of two cameras such that both cameras can capture the same scene, however with some disparity between the views. One stereo vision system that we can easily relate to is the pair of eyes that we have.

The process of using two images of a scene captured by a stereo camera system to extract 3D information is known as stereo vision. Since the stereo pair, i.e., the images taken by the stereo system, enables us to get the 3D information of the scene, it finds wide application in autonomous navigation of robots, autonomous cars, virtual and augmented reality, etc.

In stereo vision, the 3D information is obtained by estimating the relative depth of the points in the scene. The corresponding points in the stereo pair are matched and a disparity map is created, which helps in estimating the depth of the points.
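For a rectified stereo pair, the standard relation between disparity and depth is Z = f·B/d; a minimal sketch with illustrative values (the unit itself does not derive this formula):

```python
# Depth from disparity for a rectified stereo pair: Z = f * B / d,
# where f is the focal length in pixels, B the baseline and d the disparity.
f_px = 700.0      # focal length in pixels
baseline = 0.12   # camera separation in metres
disparity = 21.0  # horizontal shift of a matched point, in pixels

depth = f_px * baseline / disparity
print(depth)      # 4.0 metres
```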
We shall first understand the concept of corresponding points.

Fig. 10.1: The figure shows the concept of a stereo pair. C and C' are the two camera centres, and x and x' are the images of the 3D point X in the corresponding image planes; x and x' are said to be corresponding points. (Fig. taken from [1])
As shown in Figure 10.1, the stereo pair consists of two cameras that are at a certain distance apart. The line joining the camera centres is known as the baseline. The view volumes of the two cameras are such that they view a common area in the scene. A 3D point that lies in the common view volume of the two cameras will be imaged by both cameras. Therefore, as shown in Figure 10.1, the 3D point X is visible to both cameras, and it has an image x in Camera 1 (with camera centre C) and an image x' in Camera 2 (with camera centre C'). Therefore, x and x' are called corresponding points.
Given a point in one image, we can find its corresponding point in the other image using epipolar geometry, which we shall study next.
Figure 10.2 (a), (b): Epipolar geometry. (Image taken from [2])
We can see that in the case of stereo vision, it is possible to find the 3D point if it is imaged by both the cameras and the point correspondences are known.

The line joining the two camera centres C_0 and C_1 is known as the baseline. The baseline intersects the two image planes at e_0 and e_1 respectively. The point of intersection of the baseline with an image plane is known as an epipole. Therefore, e_0 is the left epipole and e_1 is the right epipole. The left epipole e_0 is the image of the camera centre C_1 in the left image plane, while the right epipole e_1 is the image of the camera centre C_0 in the right image plane. The epipolar line is the intersection of the epipolar plane with the image plane. An important point to be noted is that every epipolar line passes through the epipole; therefore, the point of intersection of all the epipolar lines is the epipole. Another point to be noted is that the epipolar line in the right image is the image of the back-projected ray joining the camera centre C_0 with the 3D point X, while the epipolar line in the left image is the image of the back-projected ray from the camera centre C_1 to the 3D point X.
The epipolar constraint relating a pair of corresponding points x and x' is given by the fundamental matrix F:

$$x'^{T} F x = 0 \qquad (10.1)$$

More precisely, the fundamental matrix maps a given point x from the first image to its epipolar line l' in the second image:

$$l' = F x \qquad (10.2)$$

and, similarly, the epipolar line in the first image corresponding to x' is

$$l = F^{T} x' \qquad (10.3)$$
If we consider m such correspondences, then we can solve the resulting system of equations for the 9 unknowns in F:

$$A f = 0 \qquad (10.4)$$

where each correspondence contributes one row of A and f is the vector of the nine entries of F. This system requires at least 8 points to solve, since F has nine entries but is defined only up to scale, leaving 8 degrees of freedom. Therefore, this is also known as the 8-point algorithm.
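A sketch of the linear 8-point algorithm in NumPy, building one row of A per correspondence from x'ᵀFx = 0 and enforcing the rank-2 constraint afterwards; a practical implementation would also normalize the coordinates first:

```python
import numpy as np

def fundamental_8point(x1, x2):
    """Linear 8-point estimate of F from n >= 8 correspondences.
    x1, x2 are sequences of matched (x, y) points in the two images
    (sketch: no coordinate normalization)."""
    A = []
    for (x, y), (xp, yp) in zip(x1, x2):
        # One row per correspondence, expanding x'^T F x = 0 in the
        # nine entries f11..f33 of F.
        A.append([xp*x, xp*y, xp, yp*x, yp*y, yp, x, y, 1])
    _, _, Vt = np.linalg.svd(np.array(A))
    F = Vt[-1].reshape(3, 3)

    # Enforce rank 2 by zeroing the smallest singular value of F.
    U, S, Vt = np.linalg.svd(F)
    S[2] = 0
    return U @ np.diag(S) @ Vt
```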
Motion between the frames of a video can arise in several cases:

1. The camera is static but the object is moving.
2. The camera is moving but the object is static.
3. Both the camera and the objects are moving.
4. The camera is static but the light source and objects are moving.
Optical flow is based on the assumption that, across consecutive frames, the pixel intensities do not change rapidly. It also assumes that neighbouring pixels have similar patterns of motion. Most often, it also assumes that the luminance remains constant. We assume that a pixel I(x, y, t) in the image taken at time t moves to the point (x + δx, y + δy) at time t + δt, that is, to I(x + δx, y + δy, t + δt). Since they are the same point, the above assumption can be written mathematically as

$$I(x, y, t) = I(x + \delta x, y + \delta y, t + \delta t) \qquad (10.6)$$

Equation (10.6) forms the basis of the 2D motion constraint equation and holds true given that δx, δy, δt are small. Taking the first-order Taylor series expansion about (x, y, t) in Equation (10.6), we get

$$I_x u + I_y v + I_t = 0$$

where u = δx/δt and v = δy/δt are the components of the optical flow at time t, and I_x, I_y, I_t are the image intensity derivatives at (x, y).
There are various methods for solving for optical flow. The Lucas-Kanade optical flow algorithm [3] and the Horn-Schunck [4] optical flow method are two of the most popular methods for estimating the optical flow.
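As an illustration, OpenCV provides a pyramidal Lucas-Kanade tracker; the sketch below tracks corner features between two frames (the file names are hypothetical):

```python
import cv2

# Load two consecutive grayscale frames (hypothetical file names).
prev_img = cv2.imread("frame0.png", cv2.IMREAD_GRAYSCALE)
next_img = cv2.imread("frame1.png", cv2.IMREAD_GRAYSCALE)

# Pick good corner features in the first frame, then track them with
# the pyramidal Lucas-Kanade method.
prev_pts = cv2.goodFeaturesToTrack(prev_img, maxCorners=200,
                                   qualityLevel=0.01, minDistance=7)
next_pts, status, err = cv2.calcOpticalFlowPyrLK(prev_img, next_img,
                                                 prev_pts, None)

# Flow vectors for the successfully tracked points.
flow = (next_pts - prev_pts)[status.flatten() == 1]
print(flow.reshape(-1, 2)[:5])
```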
10.7 SUMMARY

In this unit, we learned about various concepts related to multiple camera models required for computer vision. The concept of stereo vision was discussed and extended to point correspondences and epipolar geometry; finally, the unit concluded with motion-oriented concepts, including optical flow.