UNIT- II

The document discusses the pinhole camera model, explaining its basic structure, functionality, and limitations in computer vision. It covers various problems related to focal length calculation, homography matrix, and perspective projection, alongside applications of homography in image processing. Additionally, it introduces techniques using the Mahotas library for image stretching, highlighting maxima, and cropping images.
Pinhole Camera Model


The pinhole camera model is a basic model of a camera used in computer
vision. It describes the relationship between a 3D point and its projection
onto a 2D image plane.
How it works
- A pinhole camera is a box with a small hole (the pinhole) on one side and a photosensitive surface on the opposite side.
- Light rays travel in straight lines and enter the box through the pinhole.
- The light strikes the photosensitive surface and forms an image.
- The image is inverted (flipped upside down).
Why it's used
- The pinhole camera model is the simplest camera model.
- It's a good first approximation of the mapping from 3D to 2D.
Limitations
- The pinhole camera model doesn't account for lens distortion or blurring.
- It also doesn't account for the fact that most cameras have discrete image coordinates.
- The model's validity decreases from the center of the image to the edges.
Components
- Camera center: also called the optical center
- Principal axis: the line from the camera center perpendicular to the image plane
- Principal point: the point where the principal axis meets the image plane
Problem 1: Focal Length Calculation for a Pinhole Camera
Q1. A pinhole camera has a sensor of size 36mm × 24mm (full-frame). The
camera captures an image where an object of height 2m appears as 40mm on
the image sensor. If the object is placed 5m in front of the camera, calculate the
focal length of the camera.

h_i / h_o = f / d_o

where h_i is the image height on the sensor, h_o is the object height, f is the focal length, and d_o is the object distance.
Q2. A pinhole camera has an image sensor of 24mm × 36mm. The
camera captures an image of a 1.5m tall object that appears 30mm
tall on the image sensor. The object is placed 3m away from the
camera.
Find the focal length of the camera.
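The relation h_i / h_o = f / d_o rearranges to f = d_o · h_i / h_o. A minimal sketch in plain Python (the function name and unit handling are illustrative) checking both problems:

```python
def focal_length_mm(image_height_mm, object_height_m, object_distance_m):
    """Solve h_i / h_o = f / d_o for f, converting metres to millimetres."""
    h_i = image_height_mm
    h_o = object_height_m * 1000.0    # object height in mm
    d_o = object_distance_m * 1000.0  # object distance in mm
    return d_o * h_i / h_o

# Q1: a 2 m object appears as 40 mm at a distance of 5 m
print(focal_length_mm(40, 2, 5))    # → 100.0 (mm)

# Q2: a 1.5 m object appears as 30 mm at a distance of 3 m
print(focal_length_mm(30, 1.5, 3))  # → 60.0 (mm)
```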
Problem 2: Homography Matrix Calculation
A planar object undergoes a homography transformation, where a point P(2, 3) in the original image gets mapped to P′(4, 6) in the transformed image. Given the homography matrix:
H = [ 2  0  0 ]
    [ 0  2  0 ]
    [ 0  0  1 ]
Find the mapped coordinates of the point Q(4, 5).
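A point is mapped by multiplying its homogeneous form [x, y, 1]ᵀ by H and dividing by the last component. A small sketch in plain Python (no external libraries):

```python
def apply_homography(H, x, y):
    """Map (x, y) through a 3x3 homography given as nested lists."""
    xh = H[0][0] * x + H[0][1] * y + H[0][2]
    yh = H[1][0] * x + H[1][1] * y + H[1][2]
    w  = H[2][0] * x + H[2][1] * y + H[2][2]
    return xh / w, yh / w

# the diagonal homography from the problem
H = [[2, 0, 0],
     [0, 2, 0],
     [0, 0, 1]]

print(apply_homography(H, 4, 5))  # → (8.0, 10.0), the mapped Q
```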

Problem 3: Perspective Projection


A 3D point P(4, 6, 10) is projected onto a 2D image plane using a perspective projection matrix:

P = [ f  0  0  0 ]
    [ 0  f  0  0 ]
    [ 0  0  1  0 ]

where f = 50 mm. Find the projected coordinates (x′, y′).
Practice Question 2: Camera Calibration – Perspective Projection
A 3D point P(8, 12, 20) is projected onto a 2D image plane using a perspective projection matrix:

P = [ 50   0  0  0 ]
    [  0  50  0  0 ]
    [  0   0  1  0 ]

Find the projected 2D image coordinates (x′, y′).
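Under this projection matrix, x′ = fX/Z and y′ = fY/Z. A sketch checking both problems (plain Python, f in mm):

```python
def project(f, X, Y, Z):
    """Perspective projection of a 3D point onto the image plane."""
    return f * X / Z, f * Y / Z

# Problem 3: P(4, 6, 10) with f = 50 mm
print(project(50, 4, 6, 10))   # → (20.0, 30.0)

# Practice Question 2: P(8, 12, 20) with f = 50 mm
print(project(50, 8, 12, 20))  # → (20.0, 30.0)
```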
What is Homography?

Homography is a transformation matrix that defines a projective transformation between two images. It is a 3×3 matrix that relates the pixel coordinates of one image to the pixel coordinates of another. If the two images depict the same scene from different viewpoints, and the scene lies on a plane, a homography can be used to warp one image onto the other.
Applications of Homography

•Image Stitching: Aligns images to create a panoramic view by warping them using homography.
•Augmented Reality: Projects virtual objects onto real-world surfaces by estimating the homography between the camera view and the real-world plane.
•Planar Object Detection: Identifies and tracks objects by estimating their homography in different images or video frames.
•3D Reconstruction: Helps recover 3D geometry from multiple images.
Example 1: Convert (4, 5) to homogeneous coordinates, apply a transformation, and convert back to Cartesian.

Example 2: Convert (10, 20) to homogeneous coordinates, apply a transformation matrix, and convert back to Cartesian.
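The transformation matrix for these examples is not reproduced here, so the sketch below uses a hypothetical translation by (2, 3) purely to illustrate the homogeneous round trip:

```python
def to_homogeneous(x, y):
    return [x, y, 1]

def to_cartesian(p):
    # divide by the last (scale) component
    return p[0] / p[2], p[1] / p[2]

def transform(T, p):
    # 3x3 matrix times a homogeneous point
    return [sum(T[i][j] * p[j] for j in range(3)) for i in range(3)]

# hypothetical transformation: translate by (2, 3)
T = [[1, 0, 2],
     [0, 1, 3],
     [0, 0, 1]]

print(to_cartesian(transform(T, to_homogeneous(4, 5))))    # → (6.0, 8.0)
print(to_cartesian(transform(T, to_homogeneous(10, 20))))  # → (12.0, 23.0)
```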
Problem 4: Image Stitching using Homography
Two images are being stitched together using a homography matrix:
H = [ 1.2   0.1   5 ]
    [ 0.2   1.1   3 ]
    [ 0.01  0.02  1 ]

If a point in the first image has coordinates (50, 75), find its corresponding coordinates in the second image.
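A sketch of applying this H to (50, 75), with the division by the third homogeneous component carried out explicitly (plain Python):

```python
H = [[1.2,  0.1,  5],
     [0.2,  1.1,  3],
     [0.01, 0.02, 1]]

x, y = 50, 75
xh = H[0][0] * x + H[0][1] * y + H[0][2]  # 1.2*50 + 0.1*75 + 5   = 72.5
yh = H[1][0] * x + H[1][1] * y + H[1][2]  # 0.2*50 + 1.1*75 + 3   = 95.5
w  = H[2][0] * x + H[2][1] * y + H[2][2]  # 0.01*50 + 0.02*75 + 1 = 3.0

print(xh / w, yh / w)  # ≈ 24.17, 31.83
```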
Mahotas – Image Stretching

In this topic we will see how we can do image stretching in mahotas. Contrast stretching (often called normalization) is a simple image enhancement technique that attempts to improve the contrast in an image by 'stretching' the range of intensity values it contains to span a desired range of values, e.g. the full range of pixel values that the image type concerned allows.
# importing required libraries
import mahotas
import numpy as np
from pylab import gray, imshow, show
import matplotlib.pyplot as plt

# loading image
img = mahotas.imread('/content/Dog.jpg')

# keeping only the first (red) channel
img = img[:, :, 0]

print("Image")
imshow(img)
show()

# stretching the image to the full 0-255 range
img_stretched = mahotas.stretch(img)

print("Stretched Image")
imshow(img_stretched)
show()
Mahotas – Highlighting Image Maxima

In this article we will see how we can highlight the maxima of an image in mahotas. Maxima are best found in the distance-map image: in a labeled image every label is a maximum, whereas in a distance map the maxima can be identified easily. For this we are going to use a fluorescent microscopy image from a nuclear segmentation benchmark, which we can load with the mahotas.demos.nuclear_image() method.
Mahotas – Getting Mean Value of Image
In this article we will see how we can get the mean value of an image in mahotas. The mean value is the sum of the pixel values divided by the total number of pixels.
Pixel values: each of the pixels that represents an image stored inside a computer has a pixel value which describes how bright that pixel is, and/or what color it should be. In the simplest case of binary images, the pixel value is a 1-bit number indicating either foreground or background.
The mean is the most basic of all statistical measures. Means are often used in geometry and analysis; a wide range of means have been developed for these purposes. In the context of image processing, filtering using the mean is classified as spatial filtering and is used for noise reduction.
In order to do this we will use the mean method.

Syntax : img.mean()
Argument : It takes no argument
Return : It returns a floating-point value
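Since a mahotas image is a numpy array, img.mean() is the ordinary numpy mean. A minimal sketch on a tiny array standing in for a loaded image (assuming numpy):

```python
import numpy as np

# a tiny 2x2 "image" standing in for a loaded photo
img = np.array([[100, 150],
                [200, 250]], dtype=np.uint8)

# mean value = sum of pixel values / number of pixels
# (100 + 150 + 200 + 250) / 4
print(img.mean())  # → 175.0
```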
Mahotas – Element Structure for Dilating Image

In this article we will see how we can set the structuring element for dilating an image in mahotas. Dilation adds pixels to the boundaries of objects in an image, while erosion removes pixels from object boundaries. The number of pixels added or removed from the objects in an image depends on the size and shape of the structuring element used to process the image.

In order to dilate the image we use the mahotas.morph.dilate method. By setting the structuring element we can increase or decrease the dilating effect on the image.
Mahotas – Local Maxima in Image

In this article we will see how we can find the local maxima of an image in mahotas. Local maxima are essentially the local peaks in the image.

In order to do this we will use the mahotas.locmax method.

Syntax : mahotas.locmax(img)
Argument : It takes an image object as argument
Return : It returns an image object
Implementation steps :

1. Load the image
2. Filter the image
3. Use the Otsu method for thresholding the image
4. Create a structuring element with the help of a numpy ndarray of binary values
5. Use the element for dilating the image
# importing required libraries
import mahotas
import numpy as np
import matplotlib.pyplot as plt
import os

# loading image
img = mahotas.imread('image')

# keeping only the first channel of the image
img = img[:, :, 0]

# otsu threshold
T_otsu = mahotas.otsu(img)
Mahotas – Cropping Image

In this article we will see how we can crop an image in mahotas. Cropping is easily done simply by slicing the correct part out of the array; here the array is the image object, which is a numpy.ndarray.
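Cropping by slicing can be sketched on a small numpy array (row range first, then column range; the array contents are illustrative):

```python
import numpy as np

# an 8x10 single-channel "image"
img = np.arange(80).reshape(8, 10)

# crop rows 2..5 and columns 3..8 (upper bounds exclusive)
cropped = img[2:6, 3:9]

print(cropped.shape)  # → (4, 6)
```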

Mahotas – Image Cropped to bounding box

In this article we will see how we can get an image cropped to its bounding box in mahotas. We can get the image bounding box with the help of the bbox method.
# importing required libraries
import mahotas
import numpy as np
from pylab import gray, imshow, show
import os

# loading image
img = mahotas.imread('image')

# filtering image
img = img[:, :, 0]

# otsu method
T_otsu = mahotas.otsu(img)
# image values should be greater than the otsu value
img = img > T_otsu

print("Image threshold using Otsu Method")

# showing image
imshow(img)
show()

# crop to bbox
new_img = mahotas.croptobbox(img)

print("Cropped to bbox Image")

# showing image
imshow(new_img)
show()
# Create figure and subplots
fig, axes = plt.subplots(1, 3, figsize=(12, 4))
ax = axes.ravel()

# Display original image
ax[0].imshow(original)
ax[0].set_title("Original")
ax[0].axis("off")

# Display grayscale image
ax[1].imshow(grayscale, cmap=plt.cm.gray)
ax[1].set_title("Grayscale")
ax[1].axis("off")

# Display log-transformed image
ax[2].imshow(transformed_image, cmap=plt.cm.gray)
ax[2].set_title("Log Transformed (s = c log(1+r))")
ax[2].axis("off")

# Show the figure
plt.show()
# image values should be greater than the otsu value
img = img > T_otsu

print("Image threshold using Otsu Method")

# showing image
imshow(img)
show()

# structuring element for dilation
es = np.array([
    [1, 1, 1, 1],
    [1, 1, 1, 1],
    [1, 1, 1, 1],
    [1, 1, 1, 1]], bool)

# dilating image
dilate_img = mahotas.morph.dilate(img, es)

# showing dilated image
print("Dilated Image")
imshow(dilate_img)
show()
import numpy as np
import matplotlib.pyplot as plt

# Sample grayscale image (2D NumPy array)
grayscale = np.array([[100, 150], [200, 250]])

def transform_function(r):
    s = 2 * r  # Transformation: multiply each pixel by 2
    return s

# Apply the transformation
transformed_image = transform_function(grayscale)

# Display the transformed image
plt.imshow(transformed_image, cmap='gray')
plt.title('Transformed Image')
plt.show()
# Display original image
ax[0].imshow(original)
ax[0].set_title("Original")
ax[0].axis("off")

# Display grayscale image
ax[1].imshow(grayscale, cmap=plt.cm.gray)
ax[1].set_title("Grayscale")
ax[1].axis("off")

# Display transformed image
ax[2].imshow(transformed_image, cmap=plt.cm.gray)
ax[2].set_title("Transformed (s = L - 1 - r)")
ax[2].axis("off")

# Show the figure
plt.show()
import matplotlib.pyplot as plt
from skimage import data
from skimage.color import rgb2gray

# Load the astronaut image
original = data.astronaut()

# Convert to grayscale
grayscale = rgb2gray(original[..., :3])  # avoids deprecation warning

# Create figure and subplots
fig, axes = plt.subplots(1, 2, figsize=(8, 4))
ax = axes.ravel()

# Display original image
ax[0].imshow(original)
ax[0].set_title("Original")
ax[0].axis("off")

# Display grayscale image
ax[1].imshow(grayscale, cmap=plt.cm.gray)
ax[1].set_title("Grayscale")
ax[1].axis("off")

# Show the figure
plt.show()
Section 2: 2D Projective Geometry
Practice Question 3: Homography Transformation
Given a homography matrix:
H = [ 1.5   0.2   4 ]
    [ 0.1   1.3   5 ]
    [ 0.01  0.02  1 ]

Find the transformed coordinates of the point P(20, 30).
Cartesian coordinates
In computer vision, Cartesian coordinates refer to a system used to precisely define the position of a point in an image or in 3D space using a set of perpendicular axes, typically labeled X, Y, and (in 3D) Z. Each point is represented by its distance along each axis from a designated origin, allowing for accurate location tracking and manipulation of visual data within a scene. Essentially, it is a way to map out a point in space using numerical values on these axes.
Key points about Cartesian coordinates in computer vision:

•2D vs. 3D: In most computer vision applications, 2D Cartesian coordinates (X, Y) are used to represent points on an image plane, while 3D Cartesian coordinates (X, Y, Z) are used to represent points in a 3D scene.
•Origin: The origin (0, 0) is the reference point where the X and Y axes intersect, usually located at the top-left corner of an image.
•Axis directions:
•X-axis: Represents the horizontal position, usually increasing from left to right.
•Y-axis: Represents the vertical position, usually increasing from top to bottom.
•Z-axis (in 3D): Represents the depth dimension, perpendicular to the XY plane.
How Cartesian coordinates are used in computer vision:
•Object detection: When detecting objects in an image, their bounding boxes are often defined using the Cartesian coordinates of the top-left and bottom-right corners.
•Feature extraction: Many feature descriptors in computer vision, like corner detection algorithms, rely on Cartesian coordinates to identify and represent key points in an image.
•Image alignment: By transforming coordinates between different camera views using camera calibration, Cartesian coordinates enable accurate alignment of images from multiple viewpoints.
•Motion tracking: Tracking the movement of objects in a video sequence often involves calculating the change in the Cartesian coordinates of the tracked object over time.
Formulas in Cartesian Coordinate System

We know that the Cartesian Coordinate System is used to locate points and draw graphs of various algebraic functions. Hence, the distance between points and the equations of graphs can be written using the Cartesian System.
Distance Formula
The distance formula is used to calculate the distance between two points, two lines, a point and a line, and more. Most commonly it is used to calculate the distance between two points, in 2D as well as in 3D. These formulas are given below:
•Distance Formula for Two Points in 2D: √{(x2 – x1)² + (y2 – y1)²}
•Distance Formula for Two Points in 3D: √{(x2 – x1)² + (y2 – y1)² + (z2 – z1)²}
Section Formula
The section formula is used to find the coordinates of a point which divides a given line segment in a given ratio.
Consider a line segment formed by joining two points (x1, y1) and (x2, y2), divided by a point P(x, y) in the ratio m:n. Then the coordinates are given by
x = (mx2 + nx1)/(m + n) and y = (my2 + ny1)/(m + n)
Mid-Point Formula
In the case of the section formula, if the ratio becomes equal, i.e. 1:1, it is called the Midpoint Formula. Hence, if a point is the mid-point of a line segment, its coordinates are given as
x = (x1 + x2)/2 and y = (y1 + y2)/2
Slope of a Line
The slope of a line is its inclination with respect to the coordinate axes. The slope of a line is calculated as m = tan θ, where θ is the angle between the line and the x-axis.

The formula for the slope of a line in Cartesian form is given as

m = (y2 – y1)/(x2 – x1)
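The distance, midpoint, and slope formulas above can be sketched directly in plain Python and checked against the worked examples that follow:

```python
import math

def distance(p, q):
    # sqrt((x2 - x1)^2 + (y2 - y1)^2)
    return math.sqrt((q[0] - p[0]) ** 2 + (q[1] - p[1]) ** 2)

def midpoint(p, q):
    # ((x1 + x2)/2, (y1 + y2)/2)
    return ((p[0] + q[0]) / 2, (p[1] + q[1]) / 2)

def slope(p, q):
    # (y2 - y1) / (x2 - x1)
    return (q[1] - p[1]) / (q[0] - p[0])

print(distance((-2, 3), (3, 1)))  # → sqrt(29) ≈ 5.385
print(midpoint((3, 4), (1, 2)))   # → (2.0, 3.0)
print(slope((3, 2), (-3, -2)))    # → 0.666... (= 2/3)
```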

We know that the Cartesian Coordinate System can also be used to draw graphs of various algebraic expressions. In this article we will learn the Cartesian equation of a line.

Equation of Line in Cartesian Form

The equation of a line with slope m passing through a point (x1, y1) is given in slope (point-slope) form as y – y1 = m(x – x1).
Cartesian Coordinate System Examples
Example 1: Find the distance between the two points A(-2, 3) and B(3, 1)

Solution:

Here we see that each point is indicated by two numbers; hence this is a case of the two-dimensional coordinate system.

The distance between two points is given as √{(x2 – x1)² + (y2 – y1)²}

⇒ AB = √{(3 – (-2))² + (1 – 3)²} = √{(5)² + (-2)²} = √29 units

Example 2: Find the distance of the point A(2, -1, 4) from the origin

Solution:

Here the point is indicated by three values; hence this is a case of the 3D Cartesian Coordinate System. In the 3D Cartesian Coordinate System, the distance of a point from the origin is given as

√(x² + y² + z²)

Hence OA = √{(2)² + (-1)² + (4)²} = √21 units

Example 3: Find the coordinates of the point O(x, y) which divides the line joining the points P(3, 4) and Q(1, 2) in equal ratio.

Solution:

It is given that O divides PQ in equal ratio; hence O is the midpoint of PQ.

Therefore, by using the midpoint formula we have

x = (3 + 1)/2 and y = (4 + 2)/2

⇒ x = 4/2 = 2 and y = 6/2 = 3

Hence the coordinates of the point are O(2, 3)

Example 4: Find the slope of the line formed by joining the points (3, 2) and (-3, -2)

Solution:

The slope of a line is given by the formula

m = (y2 – y1)/(x2 – x1)

⇒ m = (-2 – 2)/(-3 – 3) = -4/-6 = 2/3


Cartesian Coordinate System Questions
Q1: Find the distance between Origin and Point P(-3, -2)

Q2: Find the slope of the line joining the points (-1, 4) and (2, -3)

Q3: Find the equation of the line which passes through the point (3, 4) with slope 2/3, using the slope form of a line.

Q4: Find the coordinates of a point which is the midpoint of a line joining the
points (1, 3) and (-3, 4).

Q5: Locate Points (-5, 6), (2, -3), (1, 2) and (-1, 0) in Cartesian System.
"Perspective correction" and "perspective
rectification“
Essentially mean the same thing: the process of digitally adjusting an image to eliminate
distortion caused by the angle at which a photo was taken, making lines that appear to
converge in the image appear straight and parallel in the corrected version, often used
to straighten up buildings or objects photographed from a low angle; it's achieved
through a mathematical transformation known as a "projective transformation" in image
editing software.
Key points about perspective correction/rectification:
•What it corrects: When you take a picture from a non-parallel angle to a flat surface, lines in the image can appear to converge towards a vanishing point, creating a distorted perspective. Perspective correction aims to "un-distort" this effect, making the lines appear straight and parallel as they would in reality.
•How it works: Software like Photoshop utilizes a "perspective warp" tool that allows you to define reference points on the image (like the corners of a building) and then mathematically adjust the pixels to create a corrected image where the lines are straight.
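The projective transformation behind such tools can be estimated from four point correspondences (the reference points you mark). A minimal sketch with numpy, solving for the eight unknowns of H under the usual normalization h33 = 1; the function names are illustrative, not a particular library's API:

```python
import numpy as np

def homography_from_points(src, dst):
    """Estimate a 3x3 homography from four (x, y) -> (u, v) correspondences."""
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        # u = (h11 x + h12 y + h13) / (h31 x + h32 y + 1), similarly for v,
        # rearranged into two linear equations per correspondence
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.extend([u, v])
    h = np.linalg.solve(np.array(A, float), np.array(b, float))
    return np.append(h, 1.0).reshape(3, 3)

def apply_h(H, x, y):
    p = H @ np.array([x, y, 1.0])
    return p[0] / p[2], p[1] / p[2]

# Demonstration: recover a known homography from the corners of the unit square
H_true = np.array([[1.2, 0.1, 5], [0.2, 1.1, 3], [0.01, 0.02, 1]])
src = [(0, 0), (1, 0), (0, 1), (1, 1)]
dst = [apply_h(H_true, x, y) for x, y in src]
H_est = homography_from_points(src, dst)
print(np.allclose(H_est, H_true))  # → True
```

With more than four correspondences, the same equations are solved in a least-squares sense, which is what rectification tools do in practice.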
Applications:

•Architecture photography: Correcting perspective distortion in photos of buildings to accurately represent their geometry.
•Document scanning: Straightening out scanned documents with skewed edges.
•Aerial photography: Adjusting for the perspective distortion caused by the camera angle when capturing images from above.