ETE-DIP Solution
Faculty of Engineering
School of Computer Science and Engineering
Department of CSE
PhD End Term Examination: 2022-23
CS8012 – Image and Video Processing (MOOC)
Solution
Time: 2 Hours    Max. Marks: 40
Instructions to Candidates
Answer any five questions.
Missing data, if any, may be assumed suitably.
Scientific Calculator is allowed.
Q. 1 (a)
Solution:
Q. 1 (b) Solution:
Q. 2 (a) Solution: The following are the fundamental steps of Digital Image Processing (DIP):
1. Image Acquisition
Image acquisition is the first of the fundamental steps of DIP. In this stage, an image is acquired in digital form. Generally, pre-processing such as scaling is also done in this stage.
2. Image Enhancement
Image enhancement is the simplest and most attractive area of DIP. In this stage, details that are obscured, or simply the interesting features of an image, are highlighted, such as brightness and contrast.
3. Image Restoration
Image restoration deals with improving the appearance of an image. Unlike enhancement, which is subjective, restoration is objective, as it is based on mathematical or probabilistic models of image degradation.
4. Color Image Processing
Color image processing has become a prominent area because of the increased use of digital images on the internet. It includes color modeling, processing in a digital domain, etc.
5. Wavelets and Multiresolution Processing
In this stage, an image is represented in various degrees of resolution. The image is subdivided into smaller regions for data compression and for pyramidal representation.
6. Compression
Compression is a technique used to reduce the storage required for an image. It is a very important stage because it is necessary to compress data for internet use.
7. Morphological Processing
This stage deals with tools for extracting image components that are useful in the representation and description of shape.
8. Segmentation
In this stage, an image is partitioned into its constituent objects. Segmentation is one of the most difficult tasks in DIP. It is a lengthy process, and accurate segmentation is essential for the successful solution of imaging problems that require objects to be identified individually.
9. Representation and Description
Representation and description follow the output of the segmentation stage. That output is raw pixel data consisting of all the points of each region. Representation transforms this raw data into a form suitable for further processing, whereas description extracts information that differentiates one class of objects from another.
10. Object Recognition
In this stage, a label is assigned to an object based on its descriptors.
Knowledge Base
The knowledge base is the last element of DIP. It stores important information about the image, which limits the search processes. The knowledge base can be very complex, for example when the image database contains high-resolution satellite images.
Q. 3 (a) Solution: Equalization of the image histogram
Pixel Value | Number of Pixels | Cumulative × (L-1) | Round off to the nearest grey level
0 | 8  | 0.875 | 1
1 | 10 | 1.968 | 2
2 | 10 | 3.062 | 3
3 | 2  | 3.281 | 3
4 | 12 | 4.593 | 5
5 | 16 | 6.343 | 6
6 | 4  | 6.781 | 7
7 | 2  | 7     | 7
Now map the values from the source to the target. The final mapping between the source and the target histograms is shown below:
Table: Final mapping process

Pixel Value (grey levels) | H (mapping of the equalization) | S (mapping of the equalization of the target) | Map
0 | 1 | 0 | 4
1 | 2 | 0 | 4
2 | 3 | 0 | 5
3 | 3 | 0 | 5
4 | 5 | 2 | 6
5 | 6 | 4 | 6
6 | 7 | 6 | 7
7 | 7 | 7 | 7
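For reference, the computation behind the two tables can be sketched in a few lines of NumPy. This is only an illustrative check: it assumes 8 grey levels, takes the target-equalized levels S directly from the mapping table above (the target histogram itself is not reproduced in this solution), and breaks ties toward the higher grey level.

import numpy as np

# Source histogram from the question: number of pixels at grey levels 0..7
counts = np.array([8, 10, 10, 2, 12, 16, 4, 2])
L = 8                                            # number of grey levels

# Step 1: equalize the source histogram
cdf = np.cumsum(counts) / counts.sum()           # cumulative distribution
H = np.rint(cdf * (L - 1)).astype(int)           # -> [1, 2, 3, 3, 5, 6, 7, 7]

# Step 2: target-equalized levels S, taken from the mapping table above
S = np.array([0, 0, 0, 0, 2, 4, 6, 7])

# Step 3: map each source level to the target level whose S value is closest
# to H; ties are broken toward the higher grey level, as in the table
mapping = np.array([L - 1 - np.argmin(np.abs(S - h)[::-1]) for h in H])
print(H)        # [1 2 3 3 5 6 7 7]
print(mapping)  # [4 4 5 5 6 6 7 7]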
Q. 3 (b) Solution:
Q. 4 (a) Solution:
Step 1: Center the Data
Calculate the mean of the original image along each column and subtract it from the original image to center the data.
Mean of each column: mean_col = [10.5, 8.0, 7.0, 8.5]
Centered data: A_centered = [[1.5, -3.0, 1.0, -6.5], [-1.5, -5.0, -3.0, -2.5], [-3.5, 3.0, -6.0, 1.5], [3.5, 5.0, 8.0, 7.5]]
Step 2: Compute the Covariance Matrix
Compute the covariance matrix of the centered data using the formula cov_matrix = (1/N) * A_centered.T * A_centered, where N is the number of samples (4 in this case) and A_centered.T is the transpose of the centered data matrix.
Covariance matrix:
cov_matrix = [[12.5, 3.5, -4.5, -5.5],
[3.5, 13.0, -6.0, -4.0],
[-4.5, -6.0, 20.0, 12.0],
[-5.5, -4.0, 12.0, 12.5]]
Step 3: Compute the Eigenvectors and Eigenvalues
Compute the eigenvectors and eigenvalues of the covariance matrix. Eigenvectors represent the principal components of the data, and eigenvalues represent the amount of variance explained by each principal component.
Eigenvalues and eigenvectors of the covariance matrix:
Eigenvalues = [39.641836, 17.885057, 2.657107, 0.816999]
Eigenvectors = [[-0.382107, -0.460635, -0.784290, 0.170785],
[-0.337353, -0.203019, 0.251235, -0.879227],
[0.675234, -0.717861, 0.112516, -0.143346],
[0.524200, 0.462985, 0.553678, 0.445569]]
Step 4: Select Principal Components
Select the top k eigenvectors with the highest eigenvalues to form the principal components. In this case, we want to reduce the dimensionality of the image to 2, so we select the top 2 eigenvectors.
Selected eigenvectors (principal components):
PC1 = [-0.382107, -0.460635, -0.784290, 0.170785]
PC2 = [-0.337353, -0.203019, 0.251235, -0.879227]
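A minimal NumPy sketch of Steps 1 to 4 is given below. The original 4×4 matrix used here is reconstructed by adding the column means back to the centered data shown above; the exact eigenvalues and eigenvectors it produces may differ from the figures quoted, depending on the normalization convention (1/N versus 1/(N-1)) and on the sign convention chosen for the eigenvectors.

import numpy as np

# Original 4x4 image, reconstructed from A_centered + mean_col above
A = np.array([[12.0,  5.0,  8.0,  2.0],
              [ 9.0,  3.0,  4.0,  6.0],
              [ 7.0, 11.0,  1.0, 10.0],
              [14.0, 13.0, 15.0, 16.0]])

# Step 1: center the data (subtract the mean of each column)
mean_col = A.mean(axis=0)                      # [10.5, 8.0, 7.0, 8.5]
A_centered = A - mean_col

# Step 2: covariance matrix, cov = (1/N) * A_centered.T @ A_centered
N = A.shape[0]
cov_matrix = (A_centered.T @ A_centered) / N

# Step 3: eigenvalues and eigenvectors of the symmetric covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov_matrix)  # returned in ascending order

# Step 4: keep the top k = 2 principal components (largest eigenvalues)
k = 2
order = np.argsort(eigvals)[::-1]
components = eigvecs[:, order[:k]]             # columns are PC1 and PC2

# Project the centered data onto the 2 principal components
A_reduced = A_centered @ components            # 4x2 reduced representation
print(components)
print(A_reduced)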
Q. 4 (b) Solution:
Q. 5 (a) Solution: Given an image, write down the 8-chain code.
[Figure: the given image with its marked start point is not reproduced here.]
Q. 5 (b) Solution: Scale Invariant Feature Transform (SIFT) is a feature extraction technique used in computer
vision to identify and describe local features in images. It is widely used for image matching and object recognition.
Here are the steps for using SIFT for image matching:
1. Extract keypoints from the images: The first step is to extract keypoints from the images.
Keypoints are the distinctive features in the image that are invariant to scale, orientation,
and illumination changes. SIFT algorithm detects keypoints by looking for local extrema in
the difference of Gaussian (DoG) scale-space representation of the image.
2. Assign orientations to keypoints: The next step is to assign an orientation to each
keypoint. This is done by calculating the gradient magnitude and orientation at each pixel
within a region around the keypoint. The orientation is then assigned to the keypoint
based on the dominant direction of the gradients.
3. Generate feature descriptors: The next step is to generate feature descriptors for each
keypoint. This is done by computing the gradient magnitude and orientation at a set of
points within a region around the keypoint. The resulting gradient orientations are then
used to generate a histogram of orientations, which is used as the feature descriptor.
4. Match keypoints: Once the keypoints and feature descriptors have been extracted from
the images, the next step is to match the keypoints between the images. This is typically
done using a nearest neighbor search to find the best matching keypoints between the
two sets of keypoints. The distance between the feature descriptors is used as the
similarity metric.
5. Filter out incorrect matches: In order to filter out incorrect matches, a ratio test is applied
to the nearest neighbor matches. This test compares the distance to the nearest and
second nearest neighbors and only accepts matches where the distance to the nearest
neighbor is significantly smaller than the distance to the second nearest neighbor.
6. Estimate transformation: Once the correct matches have been identified, a transformation
can be estimated between the two images. This transformation can be used to align the
images or to find correspondences between different views of the same object.
Overall, SIFT is a powerful technique for image matching, and can be used in a wide variety of
applications, including object recognition, image retrieval, and panorama stitching.
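As an illustration, a short OpenCV sketch of this pipeline is given below. The file names and the ratio-test threshold of 0.75 are assumptions made for the example, and it relies on OpenCV's built-in SIFT implementation.

import cv2
import numpy as np

# Load the two images to be matched (file names are placeholders)
img1 = cv2.imread("query.jpg", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("scene.jpg", cv2.IMREAD_GRAYSCALE)

# Steps 1-3: detect keypoints and compute SIFT descriptors
sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(img1, None)
kp2, des2 = sift.detectAndCompute(img2, None)

# Step 4: nearest-neighbour search, keeping the 2 best matches per descriptor
matcher = cv2.BFMatcher(cv2.NORM_L2)
matches = matcher.knnMatch(des1, des2, k=2)

# Step 5: ratio test to filter out ambiguous matches
good = [m for m, n in matches if m.distance < 0.75 * n.distance]

# Step 6: estimate a homography from the surviving correspondences
if len(good) >= 4:
    src = np.float32([kp1[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    dst = np.float32([kp2[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
    H, mask = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)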
Q. 6 (a) Solution: Block normalization and descriptor calculation in the Histogram of Oriented Gradients (HOG).
The gradient histograms are normalized to reduce the effect of lighting changes; common choices are the L2 norm and L2-Hys (Lowe-style clipped L2 norm). Now, we could simply normalize each 9×1 histogram vector, but it is better to normalize a bigger block of 16×16. A 16×16 block contains 4 histograms (each 8×8 cell results in one histogram), which can be concatenated to form a 36×1 vector and normalized. The 16×16 window then moves by 8 pixels, a normalized 36×1 vector is calculated over this new window, and the process is repeated over the image.
Calculate the HOG descriptor vector
To calculate the final feature vector for the entire image patch, the 36×1 vectors are concatenated into one giant vector.
Say the input patch is of size 64×64; then the 16×16 block has 7 positions horizontally and 7 positions vertically.
In one 16×16 block we have 4 histograms which, after normalization, are concatenated to form a 36×1 vector.
This block moves over 7 positions horizontally and vertically, totalling 7×7 = 49 positions.
So, when we concatenate them all into one giant vector, we obtain a 36×49 = 1764-dimensional vector.
This vector is then used to train classifiers such as an SVM and to perform object detection.
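As a quick check of the arithmetic above, a short sketch using OpenCV's HOGDescriptor is shown below. The parameters (64×64 window, 16×16 blocks, 8-pixel stride, 8×8 cells, 9 bins) mirror the example, and the random input patch is just a stand-in for a real image patch.

import cv2
import numpy as np

# A 64x64 grayscale patch (random data as a stand-in for a real patch)
patch = np.random.randint(0, 256, (64, 64), dtype=np.uint8)

# HOG with a 64x64 window, 16x16 blocks, 8-pixel stride, 8x8 cells, 9 bins
hog = cv2.HOGDescriptor((64, 64), (16, 16), (8, 8), (8, 8), 9)

descriptor = hog.compute(patch)
print(descriptor.size)   # 7 * 7 * 4 * 9 = 1764, as derived above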
Q. 6 (b) Solution: Discontinuity-based segmentation methods divide an image into regions based on the presence of discontinuities, such as edges or abrupt color changes. Similarity-based segmentation methods, on the other hand, divide an image into regions based on the similarity between neighboring pixels. Here is a brief explanation of the different types of discontinuity-based and similarity-based segmentation methods:
1. Discontinuity-based segmentation methods:
a. Edge-based segmentation: This method segments an image based on the presence of edges. Edges are points in an image where there is a sudden change in intensity or color. Edge-based segmentation methods use edge detection techniques to find these points and then group neighboring edge points into regions.
b. Line-based segmentation: This method segments an image based on the presence of straight lines. Line-based segmentation methods use line detection techniques to find straight lines in an image and then group pixels that are close to these lines into regions.
2. Similarity-based segmentation methods:
a. Region-based segmentation: This method segments an image based on the homogeneity of regions. Homogeneous regions are areas of an image that have similar intensity or color. Region-based segmentation methods use a variety of techniques to group pixels into regions, such as clustering algorithms or graph-based methods.
b. Region growing segmentation: This method starts with a set of seed pixels and then iteratively grows each region by adding neighboring pixels that are similar to the seed pixels. The similarity between pixels can be based on various criteria such as intensity, color, or texture.
c. K-means clustering: This method clusters pixels into regions based on their similarity in color or intensity. K-means clustering is an unsupervised learning algorithm that groups pixels into k clusters based on their similarity (a short sketch follows this list).
d. Watershed segmentation: This method segments an image into regions based on the topography of the image. The image is treated as a topographic map, in which pixel intensities form mountains and valleys, and the watershed algorithm then floods the valleys to create regions.
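As an illustration of similarity-based segmentation, here is a small OpenCV sketch of K-means color clustering, as referenced in the K-means item above. The input file name and the choice of K = 4 are assumptions made for the example.

import cv2
import numpy as np

# Read a color image (file name is a placeholder)
img = cv2.imread("scene.jpg")

# Treat every pixel as a 3-D color sample
pixels = img.reshape(-1, 3).astype(np.float32)

# Cluster the pixels into K groups by color similarity
K = 4
criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 10, 1.0)
_, labels, centers = cv2.kmeans(pixels, K, None, criteria, 10,
                                cv2.KMEANS_RANDOM_CENTERS)

# Replace each pixel by its cluster centre to obtain the segmented image
segmented = centers[labels.flatten()].astype(np.uint8).reshape(img.shape)
cv2.imwrite("segmented.jpg", segmented)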
In summary, discontinuity-based segmentation methods use discontinuities in intensity, color, or
texture to segment images, while similarity-based segmentation methods group pixels based on
their similarity in intensity, color, or texture. Both types of methods have their advantages and
disadvantages, and the choice of method depends on the specific application and the type of image
being segmented.