
COMPUTER VISION

UNIT - 1

1.1 Introduction
Computer vision is a multidisciplinary field that enables machines to
interpret and make decisions based on visual data. It involves the
development of algorithms and systems that allow computers to gain
high-level understanding from digital images or videos. The goal of
computer vision is to replicate and improve upon human vision
capabilities, enabling machines to recognize and understand visual
information.
In computer vision, we are trying to describe the world that we see in
one or more images and to reconstruct its properties, such as shape,
illumination, and color distributions. Why is computer vision so
difficult? Because it is an inverse problem, in which we seek to recover
some unknowns given insufficient information to fully specify the
solution. We must therefore resort to physics-based and probabilistic
models, or machine learning from large sets of examples, to disambiguate
between potential solutions. However, modeling the visual world in all of
its rich complexity is far more difficult than, say, modeling the vocal
tract that produces spoken sounds.

1.2 Computer vision is being used today in a wide variety of
real-world applications, which include:

• Optical character recognition (OCR): Reading handwritten postal
codes on letters and automatic number plate recognition (ANPR);
• Machine inspection: Rapid parts inspection for quality assurance
using stereo vision with specialized illumination to measure tolerances
on aircraft wings or auto body parts or looking for defects in steel
castings using X-ray vision;
• Retail: Object recognition for automated checkout lanes and fully
automated stores.
• Warehouse logistics: Autonomous package delivery and pallet-
carrying “drives” and parts picking by robotic manipulators.
• Medical imaging: Registering pre-operative and intra-operative
imagery or performing long-term studies of people’s brain morphology
as they age;
• Self-driving vehicles: Vehicles capable of driving point-to-point
between cities, as well as autonomous flight;
• 3D model building (photogrammetry): Fully automated
construction of 3D models from aerial and drone photographs;
• Match move: Merging computer-generated imagery (CGI) with live
action footage by tracking feature points in the source video to estimate
the 3D camera motion and shape of the environment. Such techniques
are widely used in Hollywood, e.g., in movies
such as Jurassic Park. They also require the use of precise matting to
insert new elements between foreground and background elements.
• Motion capture (mocap): Using retro-reflective markers viewed
from multiple cameras or other vision-based techniques to capture
actors for computer animation;
• Surveillance: Monitoring for intruders, analyzing highway traffic and
monitoring pools for drowning victims.
• Fingerprint recognition and biometrics: For automatic access
authentication as well as forensic applications.
• Face detection: For improved camera focusing as well as more
relevant image searching.
• Visual authentication: Automatically logging family members onto
your home computer as they sit down in front of the webcam.
2.1 Photometric image formation

Photometric image formation refers to the process by which light
interacts with surfaces and is captured by a camera, resulting in the
creation of a digital image. This process involves various factors
related to the properties of light, the surfaces of objects, and the
characteristics of the imaging system. Understanding photometric
image formation is crucial in computer vision, computer graphics, and
image processing.

Here are some key concepts involved:

Illumination:
• Ambient Light: The overall illumination of a scene that comes from all directions.
• Directional Light: Light coming from a specific direction, which can create highlights and shadows.

Reflection:
• Diffuse Reflection: Light that is scattered in various directions by rough surfaces.
• Specular Reflection: Light that reflects off smooth surfaces in a concentrated direction, creating highlights.
Shading:
• Lambertian Shading: A model for purely diffuse reflection, in which the apparent brightness of a surface depends on the angle between the surface normal and the light direction, not on the viewing direction.
• Phong Shading: A more sophisticated model that also considers specular reflection, creating more realistic highlights.

Surface Properties:
• Reflectance Properties: Material characteristics that determine how light is reflected (e.g., diffuse and specular reflectance).
• Albedo: The inherent reflectivity of a surface, representing the fraction of incident light that is reflected.

Lighting Models:
• Phong Lighting Model: Combines ambient, diffuse, and specular reflection components to model lighting.
• Blinn-Phong Model: Similar to the Phong model but computationally more efficient, as it uses a half-vector in place of the reflection vector.
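
To make the lighting models above concrete, here is a minimal sketch of the Phong model for a single point light, written in Python with NumPy. The material constants (ka, kd, ks), the shininess exponent, and the example vectors are illustrative assumptions, not values from these notes.

    import numpy as np

    def normalize(v):
        return v / np.linalg.norm(v)

    def phong_intensity(n, l, v, ka=0.1, kd=0.7, ks=0.2, shininess=32):
        # Scalar Phong lighting: ambient + diffuse + specular terms.
        # n: surface normal, l: direction to the light, v: direction to
        # the viewer (all normalized below).
        n, l, v = normalize(n), normalize(l), normalize(v)
        diffuse = max(np.dot(n, l), 0.0)                # Lambertian term
        r = 2.0 * np.dot(n, l) * n - l                  # reflect l about n
        specular = max(np.dot(r, v), 0.0) ** shininess  # Phong highlight
        return ka + kd * diffuse + ks * specular

    # Light overhead and viewer along the normal: full intensity (1.0).
    up = np.array([0.0, 0.0, 1.0])
    print(phong_intensity(n=up, l=up, v=up))

Replacing the reflection vector r with the half-vector between l and v gives the Blinn-Phong variant mentioned above.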

Shadows:
• Cast Shadows: Darkened areas on surfaces where light is blocked by other objects.
• Self-Shadows: Shadows cast by parts of an object onto itself.

Colour and Intensity:
• Colour Reflection Models: Incorporate the colour properties of surfaces in addition to their intensity.
Cameras:
• Camera Exposure: The amount of light allowed to reach the camera sensor or film.
• Camera Response Function: Describes how a camera responds to light of different intensities.
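
As a simple illustration of a camera response function, the sketch below applies a gamma-style curve to normalized sensor irradiance. Real cameras have measured, often non-parametric response curves, so the gamma value here is only an assumption.

    import numpy as np

    def camera_response(irradiance, gamma=1.0 / 2.2):
        # Map normalized irradiance in [0, 1] to 8-bit pixel values using
        # a gamma curve as a stand-in for a real, measured response.
        irradiance = np.clip(irradiance, 0.0, 1.0)
        return np.round(255.0 * irradiance ** gamma).astype(np.uint8)

    print(camera_response(np.array([0.0, 0.25, 0.5, 1.0])))  # [  0 136 186 255]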

3.1 The digital camera

A digital camera is an electronic device that captures and stores digital
images. It differs from traditional film cameras in that it uses electronic
sensors to record images rather than photographic film. Digital
cameras have become widespread due to their convenience, the ability to
instantly review images, and the ease of sharing and storing photos
digitally.

Here are key components and concepts related to digital cameras:

Sensor:
• Digital cameras use image sensors (such as CCD or CMOS) to convert light into electrical signals.
• The sensor captures the image by measuring the intensity of light at each pixel location.

Lens:
• The lens focuses light onto the image sensor.
• Zoom lenses allow users to adjust the focal length, providing optical zoom.

Aperture:
• The aperture is an adjustable opening in the lens that controls the amount of light entering the camera.
• It affects the depth of field and exposure.

Shutter:
• The shutter mechanism controls the duration of light exposure to the image sensor.
• Fast shutter speeds freeze motion, while slower speeds create motion blur.

Viewfinder and LCD Screen:
• Digital cameras typically have an optical or electronic viewfinder for composing shots.
• LCD screens on the camera back allow users to review and frame images.

Image Processor:
• Digital cameras include a built-in image processor to convert raw sensor data into a viewable image.
• Image processing algorithms may enhance color and sharpness and reduce noise.

Memory Card:
• Digital images are stored on removable memory cards, such as SD or CF cards.
• Memory cards provide a convenient and portable way to store and transfer images.

Autofocus and Exposure Systems:
• Autofocus systems automatically adjust the lens to ensure a sharp image.
• Exposure systems determine the optimal combination of aperture, shutter speed, and ISO sensitivity for proper exposure (see the sketch below).
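
As a small sketch of how aperture and shutter speed combine into a single exposure value (EV at ISO 100), consider the standard formula EV = log2(N²/t), where N is the f-number and t is the shutter time in seconds; the specific values below are illustrative.

    import math

    def exposure_value(f_number, shutter_seconds):
        # EV = log2(N^2 / t), the standard exposure value at ISO 100.
        return math.log2(f_number ** 2 / shutter_seconds)

    # f/8 at 1/125 s gives EV ~ 12.97; halving the light raises EV by 1.
    print(exposure_value(8.0, 1.0 / 125.0))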

White Balance:
• White balance settings adjust the color temperature of the captured image to match different lighting conditions.

Modes and Settings:
• Digital cameras offer various shooting modes (e.g., automatic, manual, portrait, landscape) and settings to control image parameters.

Connectivity:
• USB, HDMI, or wireless connectivity allows users to transfer images to computers, share online, or connect to other devices.

Battery:
• Digital cameras are powered by rechargeable batteries, providing the necessary energy for capturing and processing images.

Intensity:
• The brightness of the light or colour in an image.
4.1 Transformations
A transformation changes a graphics object into something else by
applying rules. There are various types of transformations, such as
translation, scaling up or down, rotation, and shearing.
When a transformation takes place on a 2D plane, it is called a 2D
transformation.
Transformations play an important role in computer graphics: they
reposition graphics on the screen and change their size or orientation.

Homogeneous Coordinates
To perform a sequence of transformations, such as a translation followed
by a rotation and a scaling, we need to follow a sequential process −

• Translate the coordinates,
• Rotate the translated coordinates, and then
• Scale the rotated coordinates to complete the composite transformation.

To shorten this process, we use a 3×3 transformation matrix instead of a
2×2 transformation matrix. To convert a 2×2 matrix into a 3×3 matrix, we
add an extra dummy coordinate W. In this way, we can represent a point by
3 numbers instead of 2, which is called the homogeneous coordinate
system. In this system, we can represent all the transformation equations
as matrix multiplications. Any Cartesian point P(X, Y) can be converted
to homogeneous coordinates as P’(Xh, Yh, h), where X = Xh/h and
Y = Yh/h (h = 1 is the usual choice).
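
As a minimal sketch of the idea (assuming h = 1 and NumPy; the point and shift values are illustrative), the code below shows how a translation, which cannot be written as a 2×2 matrix product, becomes a single 3×3 matrix multiplication in homogeneous coordinates.

    import numpy as np

    # The point P(2, 3) in homogeneous form with h = 1.
    p = np.array([2.0, 3.0, 1.0])

    def translation_matrix(tx, ty):
        # Row-vector convention (P' = P . T), matching the notes' equations.
        return np.array([[1.0, 0.0, 0.0],
                         [0.0, 1.0, 0.0],
                         [tx,  ty,  1.0]])

    print(p @ translation_matrix(4.0, 1.0))  # [6. 4. 1.] -> Cartesian (6, 4)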

4.1.1 Translation:
A translation moves an object to a different position on the screen. You
can translate a point in 2D by adding the translation coordinates (tx, ty)
to the original coordinates (X, Y) to get the new coordinates (X’, Y’).
From the above figure, you can write that −
X’ = X + tx
Y’ = Y + ty
The pair (tx, ty) is called the translation vector or shift vector. We can
also write this as −
P’ = P + T
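
For example, translating the point P(5, 2) by the shift vector (tx, ty) = (3, 4) gives X’ = 5 + 3 = 8 and Y’ = 2 + 4 = 6, i.e. the new point is P’(8, 6).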

4.1.2 Rotation:
In rotation, we rotate the object by a particular angle θ (theta) about
the origin. From the following figure, we can see that the point P(X, Y)
is located at angle φ from the horizontal X axis at distance r from the
origin, so that X = r cosφ and Y = r sinφ.

Let us suppose you want to rotate it by the angle θ. After rotating it to
a new location, you will get a new point P’(X’, Y’), where
X’ = r cos(φ + θ) = X cosθ − Y sinθ
Y’ = r sin(φ + θ) = X sinθ + Y cosθ
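
A minimal sketch of this rotation in code, using the row-vector convention of these notes (P’ = P . R); the 90° angle and the test point are illustrative.

    import numpy as np

    def rotation_matrix(theta):
        # Counter-clockwise rotation about the origin, row-vector form:
        # X' = X cos(theta) - Y sin(theta), Y' = X sin(theta) + Y cos(theta)
        c, s = np.cos(theta), np.sin(theta)
        return np.array([[ c,   s,   0.0],
                         [-s,   c,   0.0],
                         [ 0.0, 0.0, 1.0]])

    p = np.array([1.0, 0.0, 1.0])          # P(1, 0) in homogeneous form
    print(p @ rotation_matrix(np.pi / 2))  # ~[0. 1. 1.] -> P'(0, 1)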
4.1.3 Scaling:
To change the size of an object, a scaling transformation is used. In the
scaling process, you either expand or compress the dimensions of the
object. Scaling is achieved by multiplying the original coordinates of
the object by the scaling factors to get the desired result.

Let us assume that the original coordinates are (X, Y), the scaling
factors are (SX, SY), and the produced coordinates are (X’, Y’). This
can be represented mathematically as shown below −

X’ = X . SX and Y’ = Y . SY

The scaling factors SX, SY scale the object in the X and Y directions
respectively. The above equations can also be represented in matrix
form as below −

[X’  Y’] = [X  Y] . [SX   0]
                    [ 0  SY]

or P’ = P . S

where S is the scaling matrix. The scaling process is shown in the
following figure.

If we provide values less than 1 for the scaling factor S, we reduce the
size of the object; if we provide values greater than 1, we increase its
size.
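
A minimal sketch of scaling as a homogeneous matrix in the same row-vector convention; the factors 0.5 and 2.0 are illustrative.

    import numpy as np

    def scaling_matrix(sx, sy):
        # Row-vector convention: P' = P . S
        return np.array([[sx,  0.0, 0.0],
                         [0.0, sy,  0.0],
                         [0.0, 0.0, 1.0]])

    p = np.array([4.0, 2.0, 1.0])        # P(4, 2)
    print(p @ scaling_matrix(0.5, 2.0))  # [2. 4. 1.]: halved in X, doubled in Y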
4.1.4 Reflection:
Reflection is the mirror image of original object. In other words, we
can say that it is a rotation operation with 180°. In reflection
transformation, the size of the object does not change.

The following figures show reflections with respect to X and Y axes,


and about the origin respectively.
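
In lieu of the figures, a minimal sketch of the three reflections as homogeneous matrices (row-vector convention); the test point is illustrative.

    import numpy as np

    # Reflection about the X axis, the Y axis, and the origin.
    REFLECT_X      = np.diag([ 1.0, -1.0, 1.0])  # (X, Y) -> (X, -Y)
    REFLECT_Y      = np.diag([-1.0,  1.0, 1.0])  # (X, Y) -> (-X, Y)
    REFLECT_ORIGIN = np.diag([-1.0, -1.0, 1.0])  # (X, Y) -> (-X, -Y)

    p = np.array([3.0, 2.0, 1.0])
    print(p @ REFLECT_X)       # [ 3. -2.  1.]
    print(p @ REFLECT_Y)       # [-3.  2.  1.]
    print(p @ REFLECT_ORIGIN)  # [-3. -2.  1.]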
4.1.5 Shear:
A transformation that slants the shape of an object is called a shear
transformation. There are two shear transformations: X-shear and
Y-shear. One shifts X coordinate values and the other shifts Y coordinate
values; in each case, only one coordinate changes while the other keeps
its value. Shearing is also termed skewing.

• X-Shear
The X-shear preserves the Y coordinates and changes are made to the
X coordinates, which causes vertical lines to tilt right or left, as
shown in the figure below.

• Y-Shear
The Y-shear preserves the X coordinates and changes the Y coordinates,
which causes horizontal lines to transform into lines that slope up or
down, as shown in the following figure.
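
A minimal sketch of both shears as homogeneous matrices (row-vector convention); the shear factor 2.0 is illustrative.

    import numpy as np

    def x_shear(shx):
        # X' = X + shx * Y, Y unchanged.
        return np.array([[1.0, 0.0, 0.0],
                         [shx, 1.0, 0.0],
                         [0.0, 0.0, 1.0]])

    def y_shear(shy):
        # Y' = Y + shy * X, X unchanged.
        return np.array([[1.0, shy, 0.0],
                         [0.0, 1.0, 0.0],
                         [0.0, 0.0, 1.0]])

    p = np.array([1.0, 2.0, 1.0])
    print(p @ x_shear(2.0))  # [5. 2. 1.]: X' = 1 + 2*2
    print(p @ y_shear(2.0))  # [1. 4. 1.]: Y' = 2 + 2*1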
5.1 Composite Transformation
If a transformation of the plane T1 is followed by a second plane
transformation T2, then the result itself may be represented by a single
transformation T, which is the composition of T1 and T2 taken in that
order. This is written as T = T1∙T2.

Composite transformation can be achieved by concatenating transformation
matrices to obtain a combined transformation matrix −

[T][X] = [X] [T1] [T2] [T3] [T4] …. [Tn]

where [Ti] is any combination of

• Translation
• Scaling
• Shearing
• Rotation
• Reflection

A change in the order of the transformations would lead to different
results, as matrix multiplication is not commutative in general, that is,
[A] . [B] ≠ [B] . [A], so the order of multiplication matters. The basic
purpose of composing transformations is to gain efficiency by applying a
single composed transformation to a point, rather than applying a series
of transformations one after another.

For example, to rotate an object about an arbitrary point (Xp, Yp), we
have to carry out three steps, composed into a single matrix in the
sketch below −

• Translate the point (Xp, Yp) to the origin,
• Rotate it about the origin, and then
• Finally, translate the center of rotation back to where it belonged.
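
A minimal sketch of the three steps above concatenated into one matrix (row-vector convention, so the first transformation applied appears leftmost); the pivot point and angle are illustrative.

    import numpy as np

    def translation(tx, ty):
        return np.array([[1.0, 0.0, 0.0],
                         [0.0, 1.0, 0.0],
                         [tx,  ty,  1.0]])

    def rotation(theta):
        c, s = np.cos(theta), np.sin(theta)
        return np.array([[c, s, 0.0], [-s, c, 0.0], [0.0, 0.0, 1.0]])

    def rotate_about(xp, yp, theta):
        # Translate the pivot to the origin, rotate, translate back.
        return translation(-xp, -yp) @ rotation(theta) @ translation(xp, yp)

    p = np.array([2.0, 1.0, 1.0])          # P(2, 1)
    M = rotate_about(1.0, 1.0, np.pi / 2)  # 90 degrees about pivot (1, 1)
    print(p @ M)                           # ~[1. 2. 1.] -> P'(1, 2)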
In the 2D system, we use only two coordinates, X and Y, but in 3D an
extra coordinate Z is added. 3D graphics techniques and their
applications are fundamental to the entertainment, games, and
computer-aided design industries, and they remain a continuing area of
research in scientific visualization.
Furthermore, 3D graphics components are now a part of almost every
personal computer and, although traditionally intended for graphics-
intensive software such as games, they are increasingly being used by
other applications.

5.1.1 Parallel Projection

In parallel projection, the z-coordinate is discarded, and parallel lines
from each vertex on the object are extended until they intersect the view
plane. We specify a direction of projection instead of a center of
projection.

In parallel projection, the distance from the center of projection to the
projection plane is infinite. In this type of projection, we connect the
projected vertices by line segments which correspond to connections on
the original object.

Parallel projections are less realistic, but they are good for exact
measurements. In this type of projection, parallel lines remain parallel
but angles are not preserved. Various types of parallel projections are
shown in the following hierarchy.

5.1.2 Orthographic Projection

In orthographic projection, the direction of projection is normal to the
projection plane. There are three types of orthographic projections −

• Front Projection
• Top Projection
• Side Projection
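
As a minimal sketch of the front orthographic projection (assuming the view plane is z = 0 and the projection direction is along the z axis), projecting a point simply drops its z-coordinate.

    import numpy as np

    def orthographic_front(point):
        # Front orthographic projection: discard the z-coordinate.
        x, y, z = point
        return np.array([x, y])

    print(orthographic_front(np.array([3.0, 4.0, 7.0])))  # [3. 4.]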
5.1.3 Oblique Projection
In oblique projection, the direction of projection is not normal to the
projection plane. Oblique projection gives a better view of the object
than orthographic projection.

There are two types of oblique projections − Cavalier and Cabinet.


The Cavalier projection makes a 45° angle with the projection plane. In
the Cavalier projection, the projection of a line perpendicular to the
view plane has the same length as the line itself; the foreshortening
factors for all three principal directions are equal.

The Cabinet projection makes a 63.4° angle with the projection plane. In
the Cabinet projection, lines perpendicular to the viewing surface are
projected at ½ their actual length. Both projections are shown in the
following figure −
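
The angle 63.4° is not arbitrary: if the projectors make an angle α with the projection plane, the foreshortening factor for lines perpendicular to the view plane is 1/tan α. With tan 63.4° ≈ 2 this gives exactly the ½ described above, while the Cavalier case has tan 45° = 1 and hence no foreshortening.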
5.1.4 Isometric Projections
Orthographic projections that show more than one side of an object are
called axonometric orthographic projections. The most common
axonometric projection is the isometric projection, in which the
projection plane intersects each coordinate axis of the model coordinate
system at an equal distance. In this projection, parallelism of lines is
preserved but angles are not. The following figure shows an isometric
projection −
5.1.5 Perspective Projection
In perspective projection, the distance from the center of projection to
the projection plane is finite, and the size of an object varies inversely
with distance, which looks more realistic.

Distances and angles are not preserved, and parallel lines do not remain
parallel; instead, they all converge at a single point called the center
of projection or projection reference point. There are three types of
perspective projections, which are shown in the following chart.

• One-point perspective projection is simple to draw.
• Two-point perspective projection gives a better impression of depth.
• Three-point perspective projection is the most difficult to draw.

The following figure shows all three types of perspective projection −
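
To make the inverse-with-distance behaviour concrete, here is a minimal sketch of a perspective divide, assuming the center of projection is at the origin and the view plane is z = d; the focal distance d and the test points are illustrative.

    import numpy as np

    def perspective_project(point, d=1.0):
        # Project a 3D point onto the plane z = d through the origin:
        # x' = d * x / z, y' = d * y / z.
        x, y, z = point
        return np.array([d * x / z, d * y / z])

    # The same offset at depth 2 and depth 4: twice as far -> half as large.
    print(perspective_project(np.array([1.0, 1.0, 2.0])))  # [0.5 0.5]
    print(perspective_project(np.array([1.0, 1.0, 4.0])))  # [0.25 0.25]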
Translation

In 3D translation, we transfer the Z coordinate along with the X and Y
coordinates. The process for translation in 3D is similar to 2D
translation. A translation moves an object into a different position on
the screen. The following figure shows the effect of translation −

A point can be translated in 3D by adding the translation coordinates
(tx, ty, tz) to the original coordinates (X, Y, Z) to get the new
coordinates (X’, Y’, Z’).
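
A minimal sketch of 3D translation as a 4×4 homogeneous matrix, extending the 2D row-vector convention used earlier; the point and shift values are illustrative.

    import numpy as np

    def translation_3d(tx, ty, tz):
        # 4x4 homogeneous translation, row-vector convention: P' = P . T
        T = np.eye(4)
        T[3, :3] = [tx, ty, tz]
        return T

    p = np.array([1.0, 2.0, 3.0, 1.0])         # P(1, 2, 3)
    print(p @ translation_3d(4.0, 0.0, -1.0))  # [5. 2. 2. 1.] -> P'(5, 2, 2)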
