CV Notes unit-1
UNIT - 1
1.1 Introduction
Computer vision is a multidisciplinary field that enables machines to
interpret and make decisions based on visual data. It involves the
development of algorithms and systems that allow computers to gain
high-level understanding from digital images or videos. The goal of
computer vision is to replicate and improve upon human vision
capabilities, enabling machines to recognize and understand visual
information.
In computer vision, we are trying to describe the world that we see in
one or more images and to reconstruct its properties, such as shape,
illumination, and color distributions. Why is computer vision so difficult?
Because it is an inverse problem, in which we seek to recover some
unknowns given insufficient information to fully specify the solution.
We must therefore resort to physics-based and probabilistic models, or
to machine learning from large sets of examples, to disambiguate between
potential solutions. However, modeling the visual world in all of its
rich complexity is far more difficult than it might at first appear.
The appearance of a scene in an image depends on photometric image
formation: how light sources, surface reflectance, and shading combine to
determine the brightness recorded at each pixel. The main concepts are −
Illumination:
Ambient Light: The overall illumination of a scene
that comes from all directions.
Directional Light: Light coming from a specific
direction, which can create highlights and shadows.
Reflection:
Diffuse Reflection: Light that is scattered in various
directions by rough surfaces.
Specular Reflection: Light that reflects off
smooth surfaces in a concentrated direction,
creating highlights.
Shading:
Lambertian Shading: A model that assumes
diffuse reflection and constant shading across a
surface.
Phong Shading: A more sophisticated model that
considers specular reflection, creating more
realistic highlights.
Surface Properties:
Reflectance Properties: Material characteristics that
determine how light is reflected (e.g., diffuse and
specular reflectance).
Albedo: The inherent reflectivity of a surface,
representing the fraction of incident light that is
reflected.
Lighting Models:
Phong Lighting Model: Combines diffuse and
specular reflection components to model lighting.
Blinn-Phong Model: Similar to the Phong model but
computationally more efficient.
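As a rough illustration, the Phong lighting model described above can be sketched in a few lines of Python. The coefficients ka, kd, ks and the shininess exponent are illustrative values chosen here, not taken from the notes; all direction vectors are assumed to be unit length.

```python
def phong_intensity(normal, light_dir, view_dir,
                    ka=0.1, kd=0.6, ks=0.3, shininess=32):
    """Phong lighting: ambient + diffuse (Lambertian) + specular terms.
    All direction vectors are assumed to be unit-length 3-tuples."""
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    # Diffuse (Lambertian) term: proportional to cos of the angle
    # between the surface normal N and the light direction L.
    diffuse = max(dot(normal, light_dir), 0.0)

    # Specular term: reflect L about N and compare with the view direction.
    d = dot(normal, light_dir)
    reflect = tuple(2 * d * n - l for n, l in zip(normal, light_dir))
    specular = max(dot(reflect, view_dir), 0.0) ** shininess

    return ka + kd * diffuse + ks * specular

# Light shining straight along the normal, viewed head-on:
# diffuse = 1, reflection = N, specular = 1, so 0.1 + 0.6 + 0.3 = 1.0
print(phong_intensity((0, 0, 1), (0, 0, 1), (0, 0, 1)))  # → 1.0
```

With grazing light (light direction perpendicular to the normal) both the diffuse and specular terms vanish, and only the ambient term ka remains.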
Shadows:
Cast Shadows: Darkened areas on surfaces where light
is blocked by other objects.
Self Shadows: Shadows cast by parts of an object onto
itself.
A digital camera then captures this light and records it as an image. Its
main components are −
Sensor:
Digital cameras use image sensors (such as CCD or CMOS) to
convert light into electrical signals.
The sensor captures the image by measuring the
intensity of light at each pixel location.
Lens:
The lens focuses light onto the image sensor.
Zoom lenses allow users to adjust the focal length,
providing optical zoom.
Aperture:
The aperture is an adjustable opening in the lens that
controls the amount of light entering the camera.
It affects the depth of field and exposure.
Shutter:
The shutter mechanism controls the duration of
light exposure to the image sensor.
Fast shutter speeds freeze motion, while slower
speeds create motion blur.
Image Processor:
Digital cameras include a built-in image processor to
convert raw sensor data into a viewable image.
Image processing algorithms may enhance color,
sharpness, and reduce noise.
Memory Card:
Digital images are stored on removable memory
cards, such as SD or CF cards.
Memory cards provide a convenient and portable way to store and
transfer images.
Autofocus and Exposure Systems:
Autofocus systems automatically adjust the lens to ensure a sharp
image.
Exposure systems determine the optimal combination
of aperture, shutter speed, and ISO sensitivity for
proper exposure.
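The interplay of aperture, shutter speed, and ISO can be made concrete with the standard photographic exposure value formula, EV = log2(N²/t) at ISO 100, where N is the f-number and t the shutter time in seconds. This formula is standard photographic practice rather than something stated in the notes; the sketch below assumes it.

```python
import math

def exposure_value(aperture_f, shutter_s, iso=100):
    """Exposure value referenced to ISO 100: EV = log2(N^2 / t).
    Each +1 EV corresponds to half as much light reaching the sensor."""
    ev = math.log2(aperture_f ** 2 / shutter_s)
    # Higher ISO needs less light, so subtract the sensitivity gain.
    return ev - math.log2(iso / 100)

# A typical daylight setting, f/8 at 1/125 s, ISO 100:
print(round(exposure_value(8, 1 / 125), 2))  # → 12.97
```

Two settings with the same EV (for example f/8 at 1/125 s and f/5.6 at 1/250 s) admit the same total amount of light, which is why exposure systems can trade aperture against shutter speed.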
White Balance:
White balance settings adjust the color temperature of
the captured image to match different lighting
conditions.
Connectivity:
USB, HDMI, or wireless connectivity allows users
to transfer images to computers, share online, or
connect to other devices.
Battery:
Digital cameras are powered by rechargeable
batteries, providing the necessary energy for
capturing and processing images.
Intensity:
The brightness of light or color at a point in the image.
4.1 Transformations
Transformation means changing some graphics into something else by
applying rules. We can have various types of transformations such as
translation, scaling up or down, rotation, shearing, etc.
When a transformation takes place on a 2D plane, it is called 2D
transformation.
Transformations play an important role in computer graphics to
reposition the graphics on the screen and change their size or
orientation.
Homogeneous Coordinates
To perform a sequence of transformations such as translation followed
by rotation and scaling, we need to follow a sequential process: first
translate the coordinates, then rotate them, then scale them. To carry
out such a sequence as a single matrix operation, we use a 3×3
transformation matrix instead of a 2×2 matrix, representing each 2D
point (X, Y) by the homogeneous coordinate triple (X, Y, 1).
4.1.1 Translation:
A translation moves an object to a different position on the screen. You
can translate a point in 2D by adding the translation coordinates (tx, ty) to
the original coordinates (X, Y) to get the new coordinates (X’, Y’).
From the above figure, you can write that −
X’ = X + tx
Y’ = Y + ty
The pair (tx, ty) is called the translation vector or shift vector. We can
also write it as −
P’ = P + T
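The translation equations above can be sketched directly in Python; the function name and the sample point are illustrative:

```python
def translate(point, tx, ty):
    """Translate a 2D point by the shift vector (tx, ty): P' = P + T."""
    x, y = point
    return (x + tx, y + ty)

# Shift the point (2, 3) by tx = 5, ty = -1:
print(translate((2, 3), 5, -1))  # → (7, 2)
```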
4.1.2 Rotation:
In rotation, we rotate the object through a particular angle θ (theta)
about the origin. From the following figure, we can see that the point
P(X, Y) is located at angle φ from the horizontal X coordinate with
distance r from the origin. The rotated coordinates (X’, Y’) are −
X’ = X cos θ − Y sin θ
Y’ = X sin θ + Y cos θ
OR P’ = P . R, where R is the rotation matrix.
4.1.3 Scaling:
To change the size of an object, a scaling transformation is used. Let
us assume that the original coordinates are (X, Y), the scaling
factors are (SX, SY), and the produced coordinates are (X’, Y’). This
can be mathematically represented as shown below −
X’ = X . SX
Y’ = Y . SY
OR P’ = P . S
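The rotation and scaling formulas can be sketched as plain Python functions; the names and sample points are illustrative:

```python
import math

def rotate(point, theta):
    """Rotate (X, Y) about the origin by angle theta (in radians)."""
    x, y = point
    return (x * math.cos(theta) - y * math.sin(theta),
            x * math.sin(theta) + y * math.cos(theta))

def scale(point, sx, sy):
    """Scale (X, Y) by the factors (SX, SY): P' = P . S."""
    x, y = point
    return (x * sx, y * sy)

# Rotating (1, 0) by 90° gives (0, 1), up to floating-point round-off:
print(rotate((1, 0), math.pi / 2))   # ≈ (0.0, 1.0)
print(scale((2, 3), 2, 0.5))         # → (4, 1.5)
```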
X-Shear
The X-Shear preserves the Y coordinate and changes are made to
X coordinates, which causes the vertical lines to tilt right or left
as shown in below figure.
Y-Shear
The Y-Shear preserves the X coordinate and changes are made to the
Y coordinates, which causes the horizontal lines to tilt up or down.
Composite Transformation
A sequence of basic transformations can be combined into a single
matrix by multiplying the individual transformation matrices together.
A combined matrix can represent any sequence of −
Translation
Scaling
Shearing
Rotation
Reflection
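With 3×3 homogeneous matrices, a sequence of transformations collapses into one combined matrix. A minimal sketch, with illustrative helper names, assuming row-major nested lists and points applied as columns (X, Y, 1):

```python
def mat_mul(a, b):
    """Multiply two 3x3 matrices stored as row-major nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def translation(tx, ty):
    return [[1, 0, tx], [0, 1, ty], [0, 0, 1]]

def scaling(sx, sy):
    return [[sx, 0, 0], [0, sy, 0], [0, 0, 1]]

def apply(m, point):
    """Apply a 3x3 homogeneous transform to the 2D point (X, Y, 1)."""
    x, y = point
    h = [m[i][0] * x + m[i][1] * y + m[i][2] for i in range(3)]
    return (h[0] / h[2], h[1] / h[2])

# Scale first, then translate: the composite is T . S
# (matrices compose right-to-left when points are column vectors).
composite = mat_mul(translation(10, 5), scaling(2, 2))
print(apply(composite, (3, 4)))  # → (16.0, 13.0)
```

Applying the single combined matrix gives the same result as applying scaling and then translation one after the other, which is the whole point of homogeneous coordinates.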
5.1.2 Orthographic Projection
In orthographic projection, the direction of projection is normal to the
projection plane. It is divided into three types −
Front Projection
Top Projection
Side Projection
5.1.3 Oblique Projection
In oblique projection, the direction of projection is not normal to the
projection plane. In oblique projection, we can view the object better
than in orthographic projection.
There are two types of oblique projections − Cavalier and Cabinet. The
Cavalier projection makes a 45° angle with the projection plane, and
lines perpendicular to the viewing surface are projected at their actual
length. The Cabinet projection makes a 63.4° angle with the projection
plane, and lines perpendicular to the viewing surface are projected at
½ their actual length. Both the projections are shown in the
following figure −
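The oblique projection of a 3D point onto the z = 0 plane can be sketched as below. L is the foreshortening factor for receding (z) edges, and phi is the on-screen angle at which the receding axis is drawn; phi = 45° is an illustrative choice (30° is also common), not something fixed by the notes.

```python
import math

def oblique_project(point3d, L, phi_deg=45):
    """Oblique projection onto z = 0: the receding z-axis is drawn at
    angle phi in the image plane and foreshortened by factor L.
    L = 1 gives Cavalier (projectors at 45°);
    L = 1/2 gives Cabinet (projectors at 63.4°, since tan 63.4° ≈ 2)."""
    x, y, z = point3d
    phi = math.radians(phi_deg)
    return (x + z * L * math.cos(phi), y + z * L * math.sin(phi))

# A unit-depth edge projects at half its length under Cabinet projection:
print(oblique_project((0, 0, 1), 0.5))  # ≈ (0.354, 0.354)
```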
5.1.4 Isometric Projections
Orthographic projections that show more than one side of an object are
called axonometric orthographic projections. The most common
axonometric projection is an isometric projection where the
projection plane intersects each coordinate axis in the model coordinate
system at an equal distance. In this projection, parallelism of lines is
preserved but angles are not. The following figure shows an isometric
projection −
5.1.5 Perspective Projection
In perspective projection, the distance from the center of projection to
the projection plane is finite, and the size of the object varies
inversely with distance, which looks more realistic.
Distances and angles are not preserved, and parallel lines do not
remain parallel. Instead, they converge at a single point called the
vanishing point. There are three types of perspective projections
(one-point, two-point, and three-point, classified by the number of
vanishing points), which are shown in the following chart.
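A minimal sketch of the inverse-with-distance behavior, assuming a pinhole camera at the origin looking down the z-axis with the image plane at distance d (all names are illustrative):

```python
def perspective_project(point3d, d):
    """Pinhole perspective: project (x, y, z) onto the plane z = d
    through the center of projection at the origin. The projected
    size varies inversely with the depth z (z must be nonzero)."""
    x, y, z = point3d
    return (d * x / z, d * y / z)

# Doubling the distance to the camera halves the projected size:
print(perspective_project((2, 4, 10), 1))  # → (0.2, 0.4)
print(perspective_project((2, 4, 20), 1))  # → (0.1, 0.2)
```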