Computer Vision 8th Sem Lab Manual
Experiment No. 1
Aim: Perform basic Image Handling and processing operations on the image.
Objectives: The objective of this lab is to introduce the student to OpenCV, especially for
image processing.
Reading an image in Python
Convert Images to another format
Convert an Image to Grayscale
Play a video file
Lab Exercise: The combination of Python (the language), Numpy (the numerical array lib),
SciPy (scientific libs), and Matplotlib (the graphical plot lib) will serve as our computational
basis to learn image processing and computer vision. Where possible and needed we will use
other libraries as well.
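A minimal OpenCV sketch covering the four operations listed in the objectives; the file names 'input.jpg' and 'clip.mp4' are placeholders for your own test files:
import cv2

# 1. Read an image from disk (placeholder file name)
img = cv2.imread('input.jpg')
print('Loaded image with shape:', img.shape)

# 2. Convert the image to another format simply by writing it with a new extension
cv2.imwrite('input_converted.png', img)

# 3. Convert the image to grayscale
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
cv2.imshow('Grayscale', gray)
cv2.waitKey(0)

# 4. Play a video file frame by frame (press 'q' to stop)
cap = cv2.VideoCapture('clip.mp4')
while cap.isOpened():
    ret, frame = cap.read()
    if not ret:
        break
    cv2.imshow('Video', frame)
    if cv2.waitKey(25) & 0xFF == ord('q'):
        break
cap.release()
cv2.destroyAllWindows()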
Experiment No. 2
Aim: Geometric Transformation
Objectives: The objective of this lab is to introduce Geometric Transformation and apply it
to images.
Affine Transformation
Rotation, Translation and Scaling Transformation
Lab Exercise:
The code below shows the overall affine matrix that would give the same results as above. A good exercise
would be to derive the formulation yourself!
1. Compose the affine transformation matrix yourself from the translation t, rotation angle r (in radians) and scale s:
def get_affine_cv(t, r, s):
    sin_theta = np.sin(r)
    cos_theta = np.cos(r)
    a_11 = s * cos_theta
    a_21 = -s * sin_theta
    a_12 = s * sin_theta
    a_22 = s * cos_theta
    # translation terms, chosen so that rotation and scaling are performed about the point t
    # (this matches the matrix documented for cv2.getRotationMatrix2D)
    a_13 = t[0] * (1 - s * cos_theta) - s * sin_theta * t[1]
    a_23 = t[1] * (1 - s * cos_theta) + s * sin_theta * t[0]
    return np.array([[a_11, a_12, a_13],
                     [a_21, a_22, a_23]])
2. Rely on OpenCV to return the affine transformation matrix using cv2.getRotationMatrix2D(center, angle, scale).
This function rotates the image about the point center by angle degrees and scales it by scale.
A3 = cv2.getRotationMatrix2D((tx, ty), np.rad2deg(angle), scale)
warped = cv2.warpAffine(image, A3, (width, height),
                        flags=cv2.INTER_LINEAR, borderMode=cv2.BORDER_CONSTANT, borderValue=0)
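As a quick sanity check, the hand-built matrix should agree (up to floating-point error) with the matrix returned by OpenCV; tx, ty, angle and scale below are arbitrary example values:
import numpy as np
import cv2

tx, ty = 120.0, 80.0            # rotation/scaling centre (example values)
angle, scale = np.pi / 6, 0.75  # 30 degrees, shrink by 25%

A2 = get_affine_cv((tx, ty), angle, scale)
A3 = cv2.getRotationMatrix2D((tx, ty), np.rad2deg(angle), scale)
print(np.allclose(A2, A3))      # expected: True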
Experiment No. 3
Aim: Compute Homography Matrix
Objectives:
Estimate the Homography matrix without Singular Value Decomposition
Implement the Direct Linear Transformation (DLT) algorithm to estimate the Homography
Remove the Projective Distortion in the image using DLT
Lab Exercise:
This experiment will demonstrate the basic concepts of homography with some code. For
detailed explanations of the theory, please refer to the video recordings of a computer vision course
or a computer vision textbook.
Lab Exercise
1. Compute the Homography matrix for the given 4 data points without SVD and
transform the points using the computed homography matrix.
(51,791) -- (1,900)
(63,143) -- (1,1)
(444,211) -- (501,1)
(426,719) -- (501,900)
2. Compute the Homography matrix for the given 4 data points using DLT and transform
the points using the computed homography matrix (a sketch covering both exercises follows below).
(51,791) -- (1,900)
(63,143) -- (1,1)
(444,211) -- (501,1)
(426,719) -- (501,900)
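A minimal NumPy sketch covering both exercises, using the point pairs above as source-to-destination correspondences (function and variable names are my own):
import numpy as np

src = np.array([(51, 791), (63, 143), (444, 211), (426, 719)], dtype=float)
dst = np.array([(1, 900), (1, 1), (501, 1), (501, 900)], dtype=float)

def homography_no_svd(src, dst):
    # Exercise 1: fix h33 = 1 and solve the resulting 8x8 linear system directly
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y]); b.append(u)
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y]); b.append(v)
    h = np.linalg.solve(np.array(A), np.array(b))
    return np.append(h, 1).reshape(3, 3)

def homography_dlt(src, dst):
    # Exercise 2 (DLT): build the 2n x 9 matrix A and take the right singular
    # vector associated with the smallest singular value (the null space of A)
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, Vt = np.linalg.svd(np.array(A))
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]

def apply_homography(H, pts):
    # transform points with the homography and divide by the homogeneous coordinate
    pts_h = np.hstack([pts, np.ones((len(pts), 1))])
    mapped = (H @ pts_h.T).T
    return mapped[:, :2] / mapped[:, 2:3]

for H in (homography_no_svd(src, dst), homography_dlt(src, dst)):
    print(np.round(apply_homography(H, src)))  # should reproduce the destination points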
Experiment No. 4
Aim: Perspective Transformation
Objectives:
Remove the Projective Distortion in the image using Homography
Lab Exercise
1. For perspective transformation, you need a 3x3 transformation matrix. Straight lines
will remain straight even after the transformation. To find this transformation matrix,
you need 4 points on the input image and corresponding points on the output image.
Among these 4 points, 3 of them should not be collinear. Then the transformation
matrix can be found by the function cv2.getPerspectiveTransform(). Then
apply cv2.warpPerspective() with this 3x3 transformation matrix.
Code:
import cv2
import numpy as np
from matplotlib import pyplot as plt
img = cv2.imread('nature.jpg')
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)   # convert to RGB so matplotlib shows the correct colors
rows,cols,ch = img.shape
pt1 = np.float32([[50,65],[370,52],[30,387],[390,390]])
pt2 = np.float32([[0,0],[310,0],[0,310],[310,310]])
matrix_persp = cv2.getPerspectiveTransform(pt1,pt2)
dst = cv2.warpPerspective(img,matrix_persp,(cols,rows))
plt.subplot(121),plt.imshow(img),plt.title('Input')
plt.subplot(122),plt.imshow(dst),plt.title('Output')
plt.show()
2. Write a function perspective_transform(f, x1, y1, x2, y2, x3, y3, x4, y4, width, height) that
warps a quadrilateral with vertices at (x1,y1), (x2,y2), (x3,y3) and (x4,y4) to a new
image of the given width and height. The mapping is given in the following table:
Calculation of the parameters of the perspective transform should be done by your own code (do
not use the code from OpenCV or other sources); a sketch of one possible approach is given after this exercise.
Be sure that your code works with:
images of unequal width and height
grayscale (scalar) and color images
In your report, present the code and the formulas on which it is based, together with some
examples showing that it works as intended. You should at least be able to redo the following
example (make a reasonable choice for the width and height of the new image, assuming the flyer
is of A4 shape).
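A minimal sketch of one possible perspective_transform implementation (assumptions: f is a NumPy image array, grayscale or color; the four vertices are given in the order that should map to the top-left, top-right, bottom-right and bottom-left corners of the output; nearest-neighbour sampling is used):
import numpy as np

def perspective_transform(f, x1, y1, x2, y2, x3, y3, x4, y4, width, height):
    # Map the corners of the *output* image back to the given quadrilateral,
    # so each destination pixel can be sampled from the source (inverse warping).
    dst = [(0, 0), (width - 1, 0), (width - 1, height - 1), (0, height - 1)]
    src = [(x1, y1), (x2, y2), (x3, y3), (x4, y4)]
    A, b = [], []
    for (u, v), (x, y) in zip(dst, src):
        A.append([u, v, 1, 0, 0, 0, -x * u, -x * v]); b.append(x)
        A.append([0, 0, 0, u, v, 1, -y * u, -y * v]); b.append(y)
    h = np.append(np.linalg.solve(np.array(A, float), np.array(b, float)), 1)
    H = h.reshape(3, 3)  # maps output coordinates to input coordinates

    # Sample every output pixel from the source image (nearest neighbour).
    us, vs = np.meshgrid(np.arange(width), np.arange(height))
    pts = np.stack([us, vs, np.ones_like(us)]).reshape(3, -1).astype(float)
    mapped = H @ pts
    xs = np.round(mapped[0] / mapped[2]).astype(int).clip(0, f.shape[1] - 1)
    ys = np.round(mapped[1] / mapped[2]).astype(int).clip(0, f.shape[0] - 1)
    return f[ys, xs].reshape((height, width) + f.shape[2:])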
Experiment No. 5
Aim: Camera Calibration
Objectives:
Calibrate the camera and extract the intrinsic and extrinsic parameters of the camera
Camera Calibration:
Camera calibration is the process of estimating the parameters of a camera. These parameters
are required to determine an accurate relationship between a 3D point in the real world and its
corresponding 2D projection (pixel) in the image captured by that calibrated camera.
Some pinhole cameras introduce significant distortion to images. Two major kinds of distortion are
radial distortion and tangential distortion.
Radial distortion causes straight lines to appear curved, and it becomes larger the farther
points are from the center of the image. For example, an image is shown below in Figure 1 in which two
edges of a chess board are marked with red lines. You can see that the border of the chess board is
not a straight line and does not match the red line: all the expected straight lines are bulged out.
Similarly, tangential distortion occurs because the image-taking lens is not aligned perfectly
parallel to the imaging plane. So, some areas in the image may look nearer than expected.
In addition to this, we need some other information, like the intrinsic and extrinsic parameters
of the camera. Intrinsic parameters are specific to a camera. They include information like the focal
length (fx, fy) and the optical center (cx, cy). The focal length and optical center can be used to
create a camera matrix, which can be used to remove distortion due to the lenses of a specific
camera. The camera matrix is unique to a specific camera, so once calculated, it can be reused
on other images taken by the same camera. It is expressed as a 3x3 matrix:
[ fx  0  cx ]
[  0  fy cy ]
[  0  0   1 ]
Extrinsic parameters correspond to the rotation and translation vectors which translate the
coordinates of a 3D point from the world coordinate system to the camera coordinate system.
To find these parameters, we must provide some sample images of a well-defined pattern (e.g.
a chess board). We find some specific points of which we already know the relative positions
(e.g. square corners on the chess board). We know the coordinates of these points in real-world
space and we know the coordinates in the image, so we can solve for the distortion coefficients.
For better results, we need at least 10 test patterns.
Code
As mentioned above, we need at least 10 test patterns for camera calibration. OpenCV comes
with some images of a chess board (see samples/data/left01.jpg to left14.jpg), so we will utilize
these. Consider an image of a chess board. The important input data needed for the calibration
of the camera is the set of 3D real-world points and the corresponding 2D coordinates of these
points in the image. The 2D image points are easy to find from the image. (These
image points are the locations where two black squares touch each other on the chess board.)
What about the 3D points from real-world space? Those images are taken from a static camera
and chess boards are placed at different locations and orientations. So we need to know (X, Y,
Z) values. But for simplicity, we can say the chess board was kept stationary at XY plane, (so
Z=0 always) and the camera was moved accordingly. This consideration helps us to find only
X and Y values. Now for the X and Y values, we can simply pass the points as (0,0), (1,0),
(2,0), ..., which denote the locations of the points. In this case, the results we get will be on the
scale of the size of the chess board square. But if we know the square size (say 30 mm), we
can pass the values as (0,0), (30,0), (60,0), ..., and thus we get the results in mm. (In this case, we
don't know the square size since we didn't take those images, so we pass the points in terms of square size.)
3D points are called object points and 2D image points are called image points.
Setup
Once we find the corners, we can increase their accuracy using cv.cornerSubPix(). We can also
draw the pattern using cv.drawChessboardCorners(). All these steps are included in the code below:
import numpy as np
import cv2 as cv
import glob
# termination criteria for the cornerSubPix refinement
criteria = (cv.TERM_CRITERIA_EPS + cv.TERM_CRITERIA_MAX_ITER, 30, 0.001)
# object points for a 7x6 inner-corner chessboard: (0,0,0), (1,0,0), ..., (6,5,0)
objp = np.zeros((7*6, 3), np.float32)
objp[:, :2] = np.mgrid[0:7, 0:6].T.reshape(-1, 2)
# arrays to store object points (3d, real world) and image points (2d) from all the images
objpoints, imgpoints = [], []
for fname in glob.glob('left*.jpg'):
    img = cv.imread(fname)
    gray = cv.cvtColor(img, cv.COLOR_BGR2GRAY)
    ret, corners = cv.findChessboardCorners(gray, (7, 6), None)
    if ret == True:
        objpoints.append(objp)
        corners2 = cv.cornerSubPix(gray, corners, (11, 11), (-1, -1), criteria)
        imgpoints.append(corners2)
        cv.drawChessboardCorners(img, (7, 6), corners2, ret)
        cv.imshow('img', img)
        cv.waitKey(500)
cv.destroyAllWindows()
Calibration
Now that we have our object points and image points, we are ready to go for calibration. We
can use the function cv.calibrateCamera(), which returns the camera matrix, distortion
coefficients, rotation and translation vectors, etc.
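Continuing from the objpoints and imgpoints collected in the setup code (gray is the last chessboard image processed there), the calibration call is a sketch along these lines:
ret, mtx, dist, rvecs, tvecs = cv.calibrateCamera(objpoints, imgpoints,
                                                  gray.shape[::-1], None, None)
print("Camera matrix (intrinsic parameters):\n", mtx)
print("Distortion coefficients:", dist.ravel())
print("Rotation/translation vectors (extrinsic parameters) per view:", len(rvecs), len(tvecs))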
Undistortion
Now, we can take an image and undistort it. OpenCV comes with two methods for doing this.
However, first, we can refine the camera matrix based on a free scaling parameter
using cv.getOptimalNewCameraMatrix(). If the scaling parameter alpha=0, it returns an
undistorted image with minimum unwanted pixels. So it may even remove some pixels at
image corners. If alpha=1, all pixels are retained with some extra black images. This function
also returns an image ROI which can be used to crop the result.
img = cv.imread('left12.jpg')
h, w = img.shape[:2]
# refine the camera matrix with the free scaling parameter (alpha = 1 keeps all pixels)
newcameramtx, roi = cv.getOptimalNewCameraMatrix(mtx, dist, (w, h), 1, (w, h))
Using cv.undistort()
Just call the function and use the ROI obtained above to crop the result.
# undistort
dst = cv.undistort(img, mtx, dist, None, newcameramtx)
# crop the image using the ROI returned by getOptimalNewCameraMatrix
x, y, w, h = roi
dst = dst[y:y+h, x:x+w]
cv.imwrite('calibresult.png', dst)
cv.destroyAllWindows()
Lab Exercise
1. Perform the camera calibration and compute the intrinsic and extrinsic parameters of
the camera.
2. Use the camera calibration parameters to undistort the image.
Experiment No. 6
Aim: Compute Fundamental Matrix
Objectives:
Find fundamental matrix, epipoles, epipolar lines.
Plot epipolar lines on the images.
Find the projection matrix of the second camera position using the fundamental matrix.
Lab Exercise
The normalized eight-point algorithm is used to compute the fundamental matrix given point
correspondences x = (u, v) and x' = (u', v') in the left and right images, respectively. Each point
correspondence generates one constraint on the fundamental matrix F and must satisfy the epipolar
constraint equation x'^T F x = 0.
Expanding the matrices out by multiplication, we obtain the following equation for n point
correspondences:
Af = 0
where A is the n x 9 equation matrix, and f is a 9-element column vector containing the entries of the
fundamental matrix F.
From here, the least-squares solution f is easily computed by performing singular value decomposition
(SVD) on the matrix A = U D V^T. It is well known that the vector f that minimizes ||Af|| subject to ||f|| = 1
is the column of V corresponding to the smallest singular value. Next, we rearrange the 9 entries
of f to create the 3x3 fundamental matrix F. Then, we perform SVD on F to obtain F = U_f D_f V_f^T.
We set the smallest singular value of D_f to 0 to create the matrix D'_f, thus reducing the rank of the matrix
from 3 to 2, and from there we can recompute the rank-2 fundamental matrix as F' = U_f D'_f V_f^T.
Using the version of the eight-point algorithm without prior coordinate normalization to compute the
fundamental matrix on the laboratory image pair, we obtain the following fundamental matrix F
(truncated to 4 decimal places) and the epipolar lines for both images:
If you look closely, you'll notice that the epipolar lines don't pass exactly through the center of all the
point correspondences. How can we improve this?
However, in order to improve the accuracy of the epipolar lines generated above, we want to modify the
'vanilla' eight-point algorithm to recenter the point correspondences xi = (ui, vi) and x'i = (u'i, v'i) in both
images to their respective centroids before proceeding to compute the least-squares solution for f. After
recentering the image points, we must scale the points to a fixed squared distance from the origin. The
coordinate normalization can be summarized in the following steps (a NumPy sketch follows the list):
1. Recenter the points by subtracting the mean u and v coordinates from the original point
correspondences.
2. Define the scale terms s and s' to be the average distances of the centered points from the origin in
the left and right images, respectively.
3. Construct the normalization transforms T and T' from the centroids and scale terms, and apply them
to the original points to obtain the normalized correspondences.
4. Solve for the fundamental matrix F by applying the eight-point algorithm on the normalized set of
point correspondences computed in the previous step.
5. After obtaining the normalized fundamental matrix Fnorm, retrieve the fundamental matrix in the
original coordinate frame as F = T'^T Fnorm T.
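A minimal NumPy sketch of the normalized eight-point algorithm described above (pts1 and pts2 are assumed to be n x 2 arrays of corresponding points; the common convention of scaling the mean distance from the origin to sqrt(2) is used for the normalization):
import numpy as np

def normalize(pts):
    # steps 1-3: recenter on the centroid and scale so the mean distance from
    # the origin is sqrt(2); return the normalized points and the 3x3 transform T
    centroid = pts.mean(axis=0)
    d = np.sqrt(((pts - centroid) ** 2).sum(axis=1)).mean()
    s = np.sqrt(2) / d
    T = np.array([[s, 0, -s * centroid[0]],
                  [0, s, -s * centroid[1]],
                  [0, 0, 1]])
    pts_h = np.hstack([pts, np.ones((len(pts), 1))])
    return (T @ pts_h.T).T, T

def fundamental_normalized_8pt(pts1, pts2):
    x1, T1 = normalize(np.asarray(pts1, float))
    x2, T2 = normalize(np.asarray(pts2, float))
    # build the n x 9 equation matrix A from the constraint x'^T F x = 0
    u, v = x1[:, 0], x1[:, 1]
    up, vp = x2[:, 0], x2[:, 1]
    A = np.column_stack([up * u, up * v, up, vp * u, vp * v, vp, u, v, np.ones(len(u))])
    # least-squares solution: right singular vector of the smallest singular value
    _, _, Vt = np.linalg.svd(A)
    F = Vt[-1].reshape(3, 3)
    # enforce rank 2 by zeroing the smallest singular value of F
    U, D, Vt = np.linalg.svd(F)
    D[2] = 0
    F = U @ np.diag(D) @ Vt
    # step 5: denormalize back to the original coordinate frame
    F = T2.T @ F @ T1
    return F / F[2, 2]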
Using the normalized eight-point algorithm on the laboratory image pair, we obtain a fundamental
matrix whose epipolar lines pass much closer to the marked point correspondences.
Experiment No. 7
Aim: Edge Detection, Line Detection and Corner Detection
Objectives:
Find the edges in the image
Corner detection with Harris Corner Detector
Line detection
Lab Exercise
1. Compute edge detection using the Sobel, Prewitt and Canny operators.
2. Implement the Harris Corner Detector algorithm to determine the corners in the image.
3. Implement the Harris Corner Detector algorithm without the inbuilt OpenCV function.
4. Detect lines using the Hough Transform (a sketch is given at the end of this experiment).
1. Compute edge detection using the Sobel, Prewitt and Canny operators.
Prewitt Operator
The Prewitt operator was developed by Judith M. S. Prewitt and is used for edge detection in an image.
The Prewitt operator detects both types of edges:
Horizontal edges, along the x-axis
Vertical edges, along the y-axis
The Prewitt operator provides two masks, one for detecting edges in the horizontal direction and another for detecting edges
in the vertical direction.
Prewitt Operator [X-axis] = [ -1 0 1; -1 0 1; -1 0 1]
Prewitt Operator [Y-axis] = [-1 -1 -1; 0 0 0; 1 1 1]
Steps:
Read the image.
Convert into grayscale if it is colored.
Convert into the double format.
Define the mask or filter.
Detect the edges along X-axis.
Detect the edges along Y-axis.
Combine the edges detected along the X and Y axes.
Display all the images.
imtool() is an inbuilt MATLAB function used to display an image. It takes two parameters: the first is the image
variable and the second is the range of intensity values. We provide an empty list as the second argument, which means the
complete range of intensity values is used while displaying the image.
Example:
k=imread("logo.png");
k=rgb2gray(k);
k1=double(k);
p_msk=[-1 0 1; -1 0 1; -1 0 1];
% convolve with the Prewitt mask and its transpose to get the x and y gradient images
kx=conv2(k1, p_msk, 'same');
ky=conv2(k1, p_msk', 'same');
% gradient magnitude
ked=sqrt(kx.^2 + ky.^2);
imtool(k,[]);
imtool(abs(kx), []);
imtool(abs(ky),[]);
imtool(abs(ked),[]);
Output:
Scharr Operator
This is a filtering method used to identify and highlight gradient edges/features using the first derivative. Performance is
quite similar to the Sobel filter.
Scharr Operator [X-axis] = [-3 0 3; -10 0 10; -3 0 3];
Scharr Operator [Y-axis] = [ 3 10 3; 0 0 0; -3 -10 -3];
Example:
k=imread("logo.png");
k=rgb2gray(k);
k1=double(k);
scx_msk=[-3 0 3; -10 0 10; -3 0 3];
scy_msk=[3 10 3; 0 0 0; -3 -10 -3];
% convolve with the Scharr masks to get the x and y gradient images
kx=conv2(k1, scx_msk, 'same');
ky=conv2(k1, scy_msk, 'same');
ked=sqrt(kx.^2 + ky.^2);
imtool(k,[]);
imtool(abs(kx), []);
imtool(abs(ky),[]);
imtool(abs(ked),[]);
Output:
Sobel Operator
It is named after Irwin Sobel and Gary Feldman. Like the Prewitt operator, the Sobel operator is also used to detect two kinds of
edges in an image:
Vertical direction
Horizontal direction
The difference between the Sobel and Prewitt operators is that in the Sobel operator the coefficients of the masks are adjustable
according to our requirement, provided they follow all the properties of derivative masks.
% edge detection
k=imread("logo.png");
k=rgb2gray(k);
k1=double(k);
s_msk=[-1 0 1; -2 0 2; -1 0 1];
% convolve with the Sobel mask and its transpose to get the x and y gradient images
kx=conv2(k1, s_msk, 'same');
ky=conv2(k1, s_msk', 'same');
ked=sqrt(kx.^2 + ky.^2);
imtool(k,[]);
imtool(abs(kx), []);
imtool(abs(ky),[]);
imtool(abs(ked),[]);
Output:
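The first exercise also asks for the Canny operator; here is a short OpenCV (Python) sketch, assuming the same test image 'logo.png' used in the MATLAB examples:
import cv2

# read the test image and convert it to grayscale
img = cv2.imread('logo.png')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
# Canny with lower/upper hysteresis thresholds of 100 and 200 (typical starting values)
edges = cv2.Canny(gray, 100, 200)
cv2.imshow('Canny edges', edges)
cv2.waitKey(0)
cv2.destroyAllWindows()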
2. Corner detection with the Harris Corner Detector
%matplotlib inline
import cv2
import numpy as np
import matplotlib.pyplot as plt
# read a test image (the file name is a placeholder) and convert BGR -> RGB for plotting
image = cv2.imread('corner_test.jpg')
image_copy = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
plt.imshow(image_copy)
Detect corners
# Convert to grayscale
gray = cv2.cvtColor(image_copy, cv2.COLOR_RGB2GRAY)
gray = np.float32(gray)
# Detect corners (block size 2, Sobel aperture 3, Harris free parameter k = 0.04)
dst = cv2.cornerHarris(gray, 2, 3, 0.04)
plt.imshow(dst, cmap='gray')
# keep only responses above a threshold (here 10% of the maximum response)
thresh = 0.1 * dst.max()
corner_image = np.copy(image_copy)
# Iterate through all the corners and draw them on the image (if they pass the threshold)
for j in range(0, dst.shape[0]):
    for i in range(0, dst.shape[1]):
        if dst[j, i] > thresh:
            # image, center pt, radius, color, thickness
            cv2.circle(corner_image, (i, j), 1, (0, 255, 0), 1)
plt.imshow(corner_image)
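For exercise 4, a minimal sketch of line detection with the probabilistic Hough transform (the input file name is a placeholder):
import cv2
import numpy as np

img = cv2.imread('lines.jpg')        # placeholder file name
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
edges = cv2.Canny(gray, 50, 150)     # the Hough transform works on an edge map
# rho resolution 1 pixel, theta resolution 1 degree, accumulator threshold 100
lines = cv2.HoughLinesP(edges, 1, np.pi / 180, threshold=100,
                        minLineLength=50, maxLineGap=10)
if lines is not None:
    for x1, y1, x2, y2 in lines[:, 0]:
        cv2.line(img, (x1, y1), (x2, y2), (0, 0, 255), 2)
cv2.imshow('Hough lines', img)
cv2.waitKey(0)
cv2.destroyAllWindows()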
Experiment No. 8
Aim: SIFT feature descriptors
Objectives:
To understand the concepts of SIFT algorithm
To find the key points and descriptors
Lab Exercise
import cv2

img = cv2.imread('geeks.jpg')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
# create the SIFT detector (in OpenCV >= 4.4 this is also available as cv2.SIFT_create())
sift = cv2.xfeatures2d.SIFT_create()
kp = sift.detect(gray, None)
# draw the keypoints with their size and orientation on the image
img = cv2.drawKeypoints(gray, kp, img,
                        flags=cv2.DRAW_MATCHES_FLAGS_DRAW_RICH_KEYPOINTS)
cv2.imwrite('image-with-keypoints.jpg', img)
Output:
The image on the left is the original; the image on the right shows the highlighted interest points on the image.
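Note that sift.detect() above returns only keypoints; the objectives also ask for descriptors, so a short extension using the same sift and gray objects (a sketch):
# compute keypoints and 128-dimensional SIFT descriptors in one call
kp, des = sift.detectAndCompute(gray, None)
print(len(kp), 'keypoints, descriptor array shape:', des.shape)  # (number of keypoints, 128)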
import cv2
image_paths=['1.jpg','2.jpg','3.jpg']
# initialized a list of images
imgs = []
for i in range(len(image_paths)):
    imgs.append(cv2.imread(image_paths[i]))
    # scaling down the images; this is optional if your input images aren't too large
    # (in my case the input images are of dimensions 3000x1200 and
    # the resultant image won't fit the screen otherwise)
    imgs[i] = cv2.resize(imgs[i], (0, 0), fx=0.4, fy=0.4)
# showing the original pictures
cv2.imshow('1',imgs[0])
cv2.imshow('2',imgs[1])
cv2.imshow('3',imgs[2])
# create the stitcher object and stitch the input images
stitchy = cv2.Stitcher.create()
(status, output) = stitchy.stitch(imgs)

# .stitch() returns a status code; cv2.STITCHER_OK means the stitching succeeded
if status != cv2.STITCHER_OK:
    print("stitching ain't successful")
else:
    print('Your Panorama is ready!!!')
# final output
cv2.imshow('final result',output)
cv2.waitKey(0)
Output:
Experiment No. 9
Aim: SURF and HOG feature descriptors
Objectives:
To understand the concepts of SURF and HOG algorithm
To compute the SURF and HOG features
Lab Exercise
Example 1 :
import mahotas
import mahotas.demos
import numpy as np
from mahotas.features import surf
from pylab import imshow, show

# loading the demo nuclear image and filtering it
nuclear = mahotas.demos.nuclear_image()
nuclear = nuclear[:, :, 0]
nuclear = mahotas.gaussian_filter(nuclear, 4)

# showing image
print("Image")
imshow(nuclear)
show()

# computing the SURF interest points
spoints = surf.surf(nuclear)
print("No of points: " + str(len(spoints)))
Output :
No of points: 217
Example 2 :
import numpy as np
import mahotas
from mahotas.features import surf
from pylab import imshow, show

# loading image
img = mahotas.imread('dog_image.png')
img = img[:, :, 0]
# smoothing the image with a Gaussian filter
gaussian = mahotas.gaussian_filter(img, 5)

# showing image
print("Image")
imshow(gaussian)
show()

# computing the SURF interest points
spoints = surf.surf(gaussian)
print("No of points: " + str(len(spoints)))
Output :
No of points: 364
We will perform HOG on a cat image (you can use any image you want, of course; put it in the current
working directory). Let's load the image and show it; its shape is:
(1349, 1012, 3)
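A minimal sketch of computing and visualizing the HOG features with scikit-image (the file name 'cat.jpg' is a placeholder for your own image):
import matplotlib.pyplot as plt
from skimage import io, color
from skimage.feature import hog

img = io.imread('cat.jpg')                 # placeholder file name
print(img.shape)                           # e.g. (1349, 1012, 3)
gray = color.rgb2gray(img)
# 9 orientation bins, 8x8-pixel cells, 2x2-cell blocks; also return a visualization image
fd, hog_image = hog(gray, orientations=9, pixels_per_cell=(8, 8),
                    cells_per_block=(2, 2), visualize=True)
print("HOG feature vector length:", fd.shape[0])
plt.imshow(hog_image, cmap='gray')
plt.show()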
3. Write a program to detect the pedestrians in an image using HOG.
Requirements
opencv-python 3.4.2
imutils 0.5.3
To install the above modules type the below command in the terminal.
pip install module_name
Example 1:
Let's write a program to detect pedestrians in an image:
Image Used:
import cv2
import imutils
# detector
hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())
image = cv2.imread('img.png')
image = imutils.resize(image,
width=min(400, image.shape[1]))
# detect all the regions in the image that may contain a pedestrian
(regions, _) = hog.detectMultiScale(image,
                                    winStride=(4, 4),
                                    padding=(4, 4),
                                    scale=1.05)

# draw the detected regions on the image
for (x, y, w, h) in regions:
    cv2.rectangle(image, (x, y),
                  (x + w, y + h),
                  (0, 0, 255), 2)

cv2.imshow("Image", image)
cv2.waitKey(0)
cv2.destroyAllWindows()
Output:
Example 2: Let's use the HOG detector on a video stream:
import cv2
import imutils

# initializing the HOG person detector
hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

cap = cv2.VideoCapture('vid.mp4')

while cap.isOpened():
    # reading the next frame from the video stream
    ret, image = cap.read()
    if ret:
        image = imutils.resize(image,
                               width=min(400, image.shape[1]))

        # detect all the regions in the frame that may contain a pedestrian
        (regions, _) = hog.detectMultiScale(image,
                                            winStride=(4, 4),
                                            padding=(4, 4),
                                            scale=1.05)

        # draw the detected regions on the frame
        for (x, y, w, h) in regions:
            cv2.rectangle(image, (x, y),
                          (x + w, y + h),
                          (0, 0, 255), 2)

        cv2.imshow("Image", image)
        if cv2.waitKey(25) & 0xFF == ord('q'):
            break
    else:
        break

cap.release()
cv2.destroyAllWindows()
Output: