Please read this disclaimer before
proceeding:
This document is confidential and intended solely for the
educational purpose of RMK Group of Educational Institutions. If
you have received this document through email in error,
please notify the system manager. This document contains
proprietary information and is intended only to the respective
group / learning community as intended. If you are not the
addressee you should not disseminate, distribute or copy
through e-mail. Please notify the sender immediately by e-mail
if you have received this document by mistake and delete this
document from your system. If you are not the intended recipient
you are notified that disclosing, copying, distributing or taking any
action in reliance on the contents of this information is strictly
prohibited.

21AM702
COMPUTER
VISION

Department:
ARTIFICIAL INTELLIGENCE AND
MACHINE LEARNING
Batch/Year: BATCH 2021-25/IV
Created by:
Dr.V.Seethalakshmi, Associate Professor, AI&DS, RMKECT

Date: 20-08-2024

Table of Contents

Sl. No. | Contents
1 | Contents
2 | Course Objectives
3 | Pre Requisites (Course Name with Code)
4 | Syllabus (With Subject Code, Name, LTPC details)
5 | Course Outcomes
6 | CO-PO/PSO Mapping
7 | Lecture Plan (S.No., Topic, No. of Periods, Proposed Date, Actual Lecture Date, pertaining CO, Taxonomy Level, Mode of Delivery)
8 | Activity Based Learning
9 | Lecture Notes (with links to videos, e-book references, PPTs, quizzes and other learning materials)
10 | Assignments (for higher-level learning and evaluation; examples: case study, comprehensive design, etc.)
11 | Part A Q & A (with K level and CO)
12 | Part B Qs (with K level and CO)
13 | Supportive Online Certification Courses (NPTEL, Swayam, Coursera, Udemy, etc.)
14 | Real-time Applications in Day-to-Day Life and to Industry
15 | Contents Beyond the Syllabus (COE-related value-added courses)
16 | Assessment Schedule (Proposed Date & Actual Date)
17 | Prescribed Text Books & Reference Books
18 | Mini Project

Course
Objectives

COURSE OBJECTIVES

 To understand the fundamental concepts related to image formation and processing.
 To learn feature detection, matching and segmentation.
 To become familiar with feature-based alignment and motion estimation.
 To develop skills on 3D reconstruction.
 To understand image-based rendering and recognition.

PRE
REQUISITES

PRE REQUISITES

 SUBJECT CODE: 22MA101


 SUBJECT NAME: Matrices and Calculus

 SUBJECT CODE: 22MA201


 SUBJECT NAME: Transforms and Numerical
Methods

Syllabus

Syllabus

21AM702 COMPUTER VISION  L T P C: 3 0 0 3
UNIT I INTRODUCTION TO IMAGE FORMATION AND PROCESSING 15
Computer Vision - Geometric primitives and transformations - Photometric image
formation - The digital camera - Point operators - Linear filtering - More
neighborhood operators - Fourier transforms - Pyramids and wavelets - Geometric
transformations - Global optimization.

UNIT II FEATURE DETECTION, MATCHING AND SEGMENTATION 15


Points and patches - Edges - Lines - Segmentation - Active contours - Split and
merge - Mean shift and mode finding - Normalized cuts - Graph cuts and energy-
based methods.

UNIT III FEATURE-BASED ALIGNMENT & MOTION ESTIMATION 15


2D and 3D feature-based alignment - Pose estimation - Geometric intrinsic
calibration - Triangulation - Two-frame structure from motion - Factorization -
Bundle adjustment - Constrained structure and motion - Translational alignment -
Parametric motion - Spline-based motion - Optical flow - Layered motion.

UNIT IV 3D RECONSTRUCTION 15
Shape from X - Active rangefinding - Surface representations - Point-based representations - Volumetric representations - Model-based reconstruction - Recovering texture maps and albedos.

UNIT V IMAGE-BASED RENDERING AND RECOGNITION 15


View interpolation - Layered depth images - Light fields and Lumigraphs - Environment mattes - Video-based rendering - Object detection - Face recognition - Instance recognition - Category recognition - Context and scene understanding - Recognition databases and test sets.

Course
Outcomes

Course Outcomes

CO | Description | Knowledge Level
CO1 | To understand basic knowledge, theories and methods in image processing and computer vision. | K2
CO2 | To implement basic and some advanced image processing techniques in OpenCV. | K3
CO3 | To apply 2D feature-based image alignment, segmentation and motion estimation. | K3
CO4 | To apply 3D image reconstruction techniques. | K4
CO5 | To design and develop innovative image processing and computer vision applications. | K5

Knowledge Level | Description
K6 | Evaluation
K5 | Synthesis
K4 | Analysis
K3 | Application
K2 | Comprehension
K1 | Knowledge

CO – PO/PSO
Mapping

CO – PO/PSO Mapping Matrix

CO | Mapping values (across PO1–PO12 and PSO1–PSO3)
CO1 | 3 2 1 1 3
CO2 | 3 3 2 2 3
CO3 | 3 3 1 1 3
CO4 | 3 3 1 1 3
CO5 | 3 3 1 1 3
CO6 | 2 2 1 1 3

UNIT IV – 3D RECONSTRUCTION

Lecture
Plan

Lecture Plan – Unit 4 – 3D RECONSTRUCTION

Sl. No. | Topic | No. of Periods | Proposed Date | Actual Lecture Date | CO | Taxonomy Level | Mode of Delivery
1 | Shape from X | 1 | 11-09-24 | | CO4 | K3 | Blackboard / ICT Tools
2 | Active rangefinding | 1 | 12-09-24 | | CO4 | K3 | Blackboard / ICT Tools
3 | Surface representations | 1 | 13-09-24 | | CO4 | K4 | Blackboard / ICT Tools
4 | Point-based representations | 1 | 14-09-24 | | CO4 | K4 | Blackboard / ICT Tools
5 | Volumetric representations | 1 | 18-09-24 | | CO4 | K4 | Blackboard / ICT Tools
6 | Model-based reconstruction | 1 | 19-09-24 | | CO4 | K3 | Blackboard / ICT Tools
7 | Model-based reconstruction | 1 | 19-09-24 | | CO4 | K4 | Blackboard / ICT Tools
8 | Recovering texture maps and albedos | 1 | 20-09-24 | | CO4 | K4 | Blackboard / ICT Tools
9 | Recovering texture maps and albedos | 1 | 21-09-24 | | CO4 | K4 | Blackboard / ICT Tools

Activity Based
Learning

Activity Based Learning
Topic: Optical Flow

Example Activities for Optical Flow:

1. Basic Algorithm Implementation:
 Students implement the Lucas-Kanade optical flow algorithm from scratch and visualize motion vectors on sample videos (a starter visualization sketch follows this list).
2. Real-Time Motion Detection:
 Develop a real-time system that detects and highlights moving objects in a live video feed using optical flow.
3. Optical Flow-Based Object Tracking:
 Create an application that tracks a specific object moving in a video sequence using dense or sparse optical flow.
4. Comparison of Optical Flow Techniques:
 Conduct experiments comparing different optical flow methods (e.g., classical vs. deep learning-based) in terms of accuracy, speed, and robustness to noise.
5. Layered Motion Analysis:
 Use optical flow to segment a video scene into multiple motion layers, allowing for activities like separating foreground and background motions or analyzing individual movements.
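As a starting point for Activities 1 and 2, here is a minimal sketch of dense optical flow visualization using OpenCV's built-in Farneback method (the file name "sample.mp4" and all parameter values are illustrative assumptions, not part of the syllabus):

import cv2
import numpy as np

cap = cv2.VideoCapture("sample.mp4")          # placeholder input video
ok, prev = cap.read()
prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)
hsv = np.zeros_like(prev)
hsv[..., 1] = 255                             # hue = direction, value = speed

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # Dense flow: one (dx, dy) motion vector per pixel.
    flow = cv2.calcOpticalFlowFarneback(prev_gray, gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    mag, ang = cv2.cartToPolar(flow[..., 0], flow[..., 1])
    hsv[..., 0] = ang * 180 / np.pi / 2       # direction mapped to hue
    hsv[..., 2] = cv2.normalize(mag, None, 0, 255, cv2.NORM_MINMAX)
    cv2.imshow("optical flow", cv2.cvtColor(hsv, cv2.COLOR_HSV2BGR))
    if cv2.waitKey(1) == 27:                  # press Esc to quit
        break
    prev_gray = gray

Students implementing Lucas-Kanade from scratch can compare their motion vectors against this built-in baseline.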

Lecture Notes –
Unit 4

UNIT 4 – 3D RECONSTRUCTION

Sl. No. | Contents
1 | Shape from X
2 | Active rangefinding
3 | Surface representations
4 | Point-based representations
5 | Volumetric representations
6 | Model-based reconstruction
7 | Recovering texture maps and albedos

UNIT IV – 3D RECONSTRUCTION

Shape from X - Active range finding - Surface representations - Point-based representations - Volumetric representations - Model-based reconstruction - Recovering texture maps and albedos

4.1 SHAPE FROM X


In addition to binocular disparity, shading, texture, and focus all play a
role in how we perceive shape. The study of how shape can be inferred
from such cues is sometimes called shape from X, because the individual
instances are called shape from shading, shape from texture, and shape
from focus.
4.1.1 Shape from shading and photometric stereo
When you look at images of smooth shaded objects, such as the
ones shown in Figure 13.2, you can clearly see the shape of the
object from just the shading variation.

The problem of recovering the shape of a surface from this


intensity variation is known as shape from shading and is one of
the classic problems in computer vision.

Most shape from shading algorithms assume that the surface


under consideration is of a uniform albedo and reflectance, and
that the light source directions are either known or can be
calibrated by the use of a reference object. Under the
assumptions of distant light sources and observer, the variation
in intensity (irradiance equation) becomes purely a function of
the local surface orientation:

I(x, y) = R(p(x, y), q(x, y)),    (13.1)

where R(p, q) is the reflectance map and (p, q) are the local surface slopes (the partial derivatives of depth with respect to x and y).

 Instead of first recovering the orientation fields (p,q) and integrating


them to obtain a surface, it is also possible to directly minimize the
discrepancy in the image formation equation (13.1) while finding the
optimal depth map z(x,y).

 Unfortunately, shape from shading is susceptible to local minima in the


search space and, like other variational problems that involve the
simultaneous estimation of many variables, can also suffer from slow
convergence. Using multi-resolution techniques can help accelerate the
convergence, while using more sophisticated optimization techniques
can help avoid local minima.

4.1.2 Photometric stereo


 Another way to make shape from shading more reliable is to use multiple light sources that can be selectively turned on and off. This technique is called photometric stereo, as the light sources play a role analogous to the cameras located at different locations in traditional stereo.


For each light source, we have a different reflectance map, R1(p,q), R2(p,q), etc. Given the corresponding intensities I1, I2, etc. at a pixel, we can in principle recover both an unknown albedo ρ and a surface orientation estimate (p,q).

For diffuse surfaces (13.2), if we parameterize the local orientation by the unit surface normal n, we get (for non-shadowed pixels) a set of linear equations of the form

I_k = ρ (v_k · n),

from which we can recover ρ·n using linear least squares. These equations are well conditioned as long as the (three or more) vectors v_k are linearly independent, i.e., they are not along the same azimuth (direction away from the viewer).
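A minimal NumPy sketch of this least-squares recovery (array names and shapes are assumptions for illustration; the notes do not prescribe an implementation):

import numpy as np

def photometric_stereo(I, V):
    """I: (K, H, W) stack of K images; V: (K, 3) known unit light directions."""
    K, H, W = I.shape
    # Solve V @ g = I at every pixel simultaneously, where g = albedo * normal.
    g, *_ = np.linalg.lstsq(V, I.reshape(K, -1), rcond=None)   # g is (3, H*W)
    rho = np.linalg.norm(g, axis=0)                            # albedo
    n = g / np.maximum(rho, 1e-8)                              # unit normals
    return rho.reshape(H, W), n.reshape(3, H, W)

As the text notes, this system is well conditioned only when the three or more light directions in V are linearly independent.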

Once the surface normals or gradients have been recovered at each pixel, they can be integrated into a depth map using a variant of regularized surface fitting. The combination of multi-view stereo for coarse shape and photometric stereo for fine detail continues to be an active area of research; one such system can produce very high-quality scans (Figure 13.3), although it requires a sophisticated laboratory setup.


It is also possible to apply photometric stereo to outdoor web camera sequences (Figure 13.4),
using the trajectory of the Sun as a variable direction illuminator. When surfaces are specular,
more than three light directions may be required. In fact, the irradiance equation given in (13.1)
not only requires that the light sources and camera be distant from the surface, it also neglects
inter-reflections, which can be a significant source of the shading observed on object surfaces,
e.g., the darkening seen inside concave structures such as grooves and crevasses.

4.1.3 Shape From Texture


The variation in foreshortening observed in regular textures can also provide useful
information about local surface orientation. Figure 13.5 shows an example of such a
pattern, along with the estimated local surface orientations. Shape from texture
algorithms require a number of processing steps, including the extraction of repeated
patterns or the measurement of local frequencies to compute local affine deformations,
and a subsequent stage to infer local surface orientation.

When the original pattern is regular, it is possible to fit a regular but slightly deformed
grid to the image and use this grid for a variety of image replacement or analysis tasks.

The deformations induced in a regular pattern when it is viewed in the reflection of a


curved mirror, as shown in Figure 13.5c–d, can be used to recover the shape of the
surface. It is also possible to infer local shape information from specular flow, i.e., the
motion of specularities when viewed from a moving camera.

4.1.4 Shape from focus

A number of techniques have been developed to estimate depth from the amount of
defocus (depth from defocus). To make such a technique practical, a number of issues
need to be addressed:

The amount of blur increases in both directions as you move away from the focus plane. Therefore, it is necessary to use two or more images captured with different focus distance settings or to translate the object in depth and look for the point of maximum sharpness.

The magnification of the object can vary as the focus distance is changed or the object is
moved. This can be modeled either explicitly (making correspondence more difficult) or
using telecentric optics, which approximate an orthographic camera and require an
aperture in front of the lens.


The amount of defocus must be reliably estimated. A simple approach is to average the
squared gradient in a region, but this suffers from several problems, including the image
magnification problem mentioned above. A better solution is to use carefully designed
rational filters. Figure 13.6 shows an example of a real-time depth from defocus sensor,
which employs two imaging chips at slightly different depths sharing a common optical
path, as well as an active illumination system that projects a checkerboard pattern from
the same direction. As you can see in Figure 13.6b–g, the system produces high-
accuracy real-time depth maps for both static and dynamic scenes.
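A sketch of the simple squared-gradient focus measure mentioned above, applied over a focal stack to pick the sharpest focus setting per pixel (the stack, depth list, and window size are assumed inputs):

import cv2
import numpy as np

def depth_from_focus(stack, depths):
    """stack: list of grayscale float32 images focused at the given depths."""
    scores = []
    for img in stack:
        gx = cv2.Sobel(img, cv2.CV_64F, 1, 0)
        gy = cv2.Sobel(img, cv2.CV_64F, 0, 1)
        # Locally averaged squared gradient as a (crude) sharpness measure;
        # the text notes that carefully designed rational filters do better.
        scores.append(cv2.GaussianBlur(gx * gx + gy * gy, (9, 9), 0))
    best = np.argmax(np.stack(scores), axis=0)    # sharpest image per pixel
    return np.asarray(depths)[best]               # per-pixel depth estimate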


4.2 ACTIVE RANGE FINDING

Active rangefinding systems measure the distance to objects by projecting energy (e.g., light or sound) onto the scene and measuring the time, phase, or displacement of the reflected signal; familiar examples include time-of-flight sensors such as LIDAR, triangulation-based laser stripe scanners, and structured-light depth cameras. Because such sensors supply their own illumination, they are independent of ambient lighting and can operate in darkness, which makes them well suited to obstacle detection and real-time 3D scanning.
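A toy sketch of the two classic range equations used by such sensors (time-of-flight and triangulation); the function names and arguments are illustrative assumptions:

import numpy as np

C = 299_792_458.0  # speed of light in m/s

def tof_range(round_trip_time_s):
    # Time-of-flight: the emitted pulse travels to the surface and back.
    return C * round_trip_time_s / 2.0

def triangulation_depth(focal_px, baseline_m, disparity_px):
    # Triangulation (e.g., a laser stripe viewed by an offset camera):
    # depth is inversely proportional to the observed image displacement.
    return focal_px * baseline_m / np.maximum(disparity_px, 1e-9)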

4.3 SURFACE REPRESENTATIONS

Surface representations in computer vision refer to the ways in which the geometry or
shape of surfaces in a three-dimensional (3D) scene is represented. These representations
are crucial for tasks such as 3D reconstruction, computer graphics, and virtual reality.
Different methods exist for representing surfaces, and the choice often depends on the
application's requirements and the characteristics of the data.

4.3.1 Surface interpolation:

One of the most common operations on surfaces is their reconstruction from a set of sparse data constraints, i.e., scattered data interpolation. When formulating such problems, surfaces may be parameterized as height fields f(x), as 3D parametric surfaces f(x), or as non-parametric models such as collections of triangles.

Two-dimensional function interpolation and approximation problems can be cast as energy minimization problems using regularization. Such problems can also specify the locations of discontinuities in the surface as well as local orientation constraints.

One approach to solving such problems is to discretize both the surface and the energy
on a discrete grid or mesh using finite element analysis. Such problems can then be
solved using sparse system solving techniques, such as multigrid or hierarchically
preconditioned conjugate gradient.

The surface can also be represented using a hierarchical combination of multilevel B-


splines. An alternative approach is to use radial basis (or kernel) functions


Unfortunately, because the dense system solving is cubic in the number of data points, basis function approaches can only be used for small problems such as feature-based image morphing.
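A small SciPy sketch of radial basis function scattered-data interpolation for a height field (synthetic data; SciPy's neighbors argument can be used to sidestep the cubic cost mentioned above):

import numpy as np
from scipy.interpolate import RBFInterpolator

rng = np.random.default_rng(0)
xy = rng.random((200, 2))                        # sparse sample locations
z = np.sin(6 * xy[:, 0]) * np.cos(6 * xy[:, 1])  # sparse height samples

# Thin-plate-spline RBF fit; smoothing > 0 turns exact interpolation into
# regularized approximation, matching the energy-minimization view above.
rbf = RBFInterpolator(xy, z, kernel="thin_plate_spline", smoothing=0.0)

gx, gy = np.meshgrid(np.linspace(0, 1, 64), np.linspace(0, 1, 64))
grid = np.stack([gx.ravel(), gy.ravel()], axis=-1)
height = rbf(grid).reshape(64, 64)               # dense reconstructed surface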

When a three-dimensional parametric surface is being modeled, the vector-valued function f encodes 3D coordinates (x, y, z) on the surface and the domain x = (s, t) encodes the surface parameterization.

One example of such surfaces is symmetry-seeking parametric models, which are
elastically deformable versions of generalized cylinders. In these models, s is the
parameter along the spine of the deformable tube and t is the parameter around the
tube. A variety of smoothness and radial symmetry forces are used to constrain the
model while it is fitted to image-based silhouette curves. It is also possible to define non-
parametric surface models, such as general triangulated meshes, and to equip such
meshes (using finite element analysis) with both internal smoothness metrics and
external data fitting metrics.

Both parametric and non-parametric surface models assume that the topology of the
surface is known and fixed ahead of time. For more flexible surface modeling, we can
either represent the surface as a collection of oriented points or use 3D implicit
functions, which can also be combined with elastic 3D surface models. The field of
surface reconstruction from unorganized point samples continues to advance rapidly,
with more recent work addressing issues with data imperfections.

SIMPLIFIED:

Surface interpolation involves reconstructing surfaces from sparse data constraints.

Surfaces can be parameterized as height fields, 3D parametric surfaces, or non-


parametric models like triangle collections.


Interpolation problems are cast as energy minimization tasks, allowing for discontinuity and
orientation constraint specification.

Finite element analysis discretizes surfaces and energies onto grids or meshes for solution using
techniques like multigrid or conjugate gradient.

Hierarchical B-splines or radial basis functions offer alternative approaches but are limited by
computational complexity.

Three-dimensional parametric surfaces encode surface coordinates and parameterization domains.

Non-parametric models like triangulated meshes use internal smoothness and external data
fitting metrics.

Flexible surface modeling involves oriented points or 3D implicit functions, sometimes combined
with elastic 3D surface models.

Ongoing advances in surface reconstruction from unorganized point samples address challenges
like data imperfections.

4.3.2 Surface simplification

Once a triangle mesh has been created from 3D data, it is often desirable to create a
hierarchy of mesh models, for example, to control the displayed level of detail (LOD) in a
computer graphics application. One approach to doing this is to approximate a given
mesh with one that has subdivision connectivity, over which a set of triangular wavelet
coefficients can then be computed. A more continuous approach is to use sequential
edge collapse operations to go from the original fine-resolution mesh to a coarse base-
level mesh. The resulting progressive mesh (PM) representation can be used to render
the 3D model at arbitrary levels of detail, as shown in Figure 13.15.

SIMPLIFIED:

Hierarchy of Mesh Models:

After creating a triangle mesh from 3D data, it's often useful to establish a hierarchy of
mesh models to control the level of detail (LOD) displayed in computer graphics
applications.

Subdivision Connectivity:

One method involves approximating the given mesh with one having subdivision
connectivity. This allows computation of triangular wavelet coefficients over the mesh,
facilitating LOD control.

Sequential Edge Collapse:

Another approach employs sequential edge collapse operations to transition from the
original fine-resolution mesh to a coarse base-level mesh. This results in a progressive
mesh (PM) representation.

Progressive Mesh (PM)

The PM representation enables rendering the 3D model at arbitrary levels of detail. It


provides flexibility in displaying different levels of detail based on requirements, enhancing
rendering efficiency and optimizing performance in graphics applications.
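A hedged sketch of building such a level-of-detail hierarchy with Open3D's quadric edge-collapse decimation, one concrete instance of the sequential edge-collapse idea ("bunny.ply" is a placeholder file name):

import open3d as o3d

mesh = o3d.io.read_triangle_mesh("bunny.ply")    # placeholder input mesh
# Successive edge-collapse simplification, coarser at each level.
lods = {n: mesh.simplify_quadric_decimation(target_number_of_triangles=n)
        for n in (10000, 2000, 500)}
# A renderer can now pick the LOD appropriate to the viewing distance.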


4.3.3 Geometry images

While multi-resolution surface representations support level of detail operations, they still
consist of an irregular collection of triangles, which makes them more difficult to compress
and store in a cache-efficient manner.

To make the triangulation completely regular (uniform and gridded), geometry images can be created by cutting surface meshes along well-chosen lines and "flattening" the resulting representation into a square. Figure 13.16a shows the resulting (x, y, z) values of the surface mesh mapped over the unit square, while Figure 13.16b shows the associated (nx, ny, nz) normal map, i.e., the surface normals associated with each mesh vertex, which can be used to compensate for loss in visual fidelity if the original geometry image is heavily compressed.
SIMPLIFIED:
Creating Geometry Images:
To transform multi-resolution surface representations into a completely regular and
gridded form, geometry images are constructed. This process involves cutting surface
meshes along well-chosen lines and then "flattening" the resulting representation into a
square.
Mapping Surface Mesh onto Unit Square:
In Figure 13.16a, the resulting (x, y, z) values of the surface mesh are mapped over the
unit square. This mapping ensures that the irregular collection of triangles is
transformed into a regular and gridded format.

Normal Map Generation:

Figure 13.16b depicts the associated (nx, ny, nz) normal map. This map represents the
surface normals associated with each mesh vertex. These normals can compensate for any
loss in visual fidelity that may occur due to heavy compression of the original geometry
image.

4.4 POINT-BASED REPRESENTATIONS

As we mentioned previously, triangle-based surface models assume that the topology (and often
the rough shape) of the 3D model is known ahead of time. While it is possible to re-mesh a
model as it is being deformed or fitted, a simpler solution is to dispense with an explicit triangle
mesh altogether and to have triangle vertices behave as oriented points, or particles, or surface
elements (surfels).

To endow the resulting particle system with internal smoothness constraints, pairwise interaction potentials can be defined that approximate the equivalent elastic bending energies that would be obtained using local finite-element analysis. Instead of defining the finite element neighborhood for each particle (vertex) ahead of time, a soft influence function is used to couple nearby particles.

Another alternative is to first convert the point cloud into an implicit signed distance or inside–
outside function, using either minimum signed distances to the oriented points or by
interpolating a characteristic (inside–outside) function using radial basis functions.

Even greater precision over the implicit function fitting, including the ability to handle irregular
point densities, can be obtained by computing a moving least squares (MLS) estimate of the
signed distance function as shown in Figure 13.17.

Point-based representations in computer vision and computer graphics refer to methods that
represent surfaces or objects using a set of individual points in three- dimensional (3D) space.
Instead of explicitly defining the connectivity between points as in polygonal meshes, point-
based representations focus on the spatial distribution of points to describe the surface
geometry.


Here are some common point-based representations:

Point Clouds:
Description: A collection of 3D points in space, each representing a sample on the surface
of an object or a scene.

Application: Point clouds are generated by 3D scanners, LiDAR, depth sensors, or


photogrammetry. They find applications in robotics, autonomous vehicles,
environmental mapping, and 3D modeling.

Dense Point Clouds:


Description: Similar to point clouds but with a high density of points, providing more
detailed surface information.

Application: Used in applications requiring detailed 3D reconstructions, such as cultural


heritage preservation, archaeological studies, and industrial inspections.

Sparse Point Sets:


Description: Representations where only a subset of points is used to describe the surface,
resulting in a sparser dataset compared to a dense point cloud.


Application: Sparse point sets are useful in scenarios where computational efficiency is
crucial, such as real-time applications and large-scale environments.

Point Splats:
Description: Represent each point as a disc or a splat in 3D space. The size and orientation
of the splats can convey additional information.

Application: Commonly used in point-based rendering and visualization to represent dense


point clouds efficiently.

Point Features:

Description: Represent surfaces using distinctive points or key points, each associated with
local features such as normals, color, or texture information.

Application: Widely used in feature-based registration, object recognition, and 3D reconstruction.

Point Set Surfaces:


Description: Represent surfaces as a set of unorganized points without connectivity
information. Surface properties can be interpolated from neighboring points.

Application: Used in surface reconstruction from point clouds and point- based rendering.

Radial Basis Function (RBF) Representations:


Description: Use radial basis functions to interpolate surface properties between points.
These functions define a smooth surface that passes through the given points.

Application: Commonly used in shape modeling, surface reconstruction, and computer-


aided design.
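A minimal Open3D sketch tying several of the representations above together: load a point cloud, estimate normals (turning it into oriented points), and fit an implicit surface with Poisson reconstruction ("scan.ply" is a placeholder):

import open3d as o3d

pcd = o3d.io.read_point_cloud("scan.ply")        # placeholder point cloud
# Oriented points: attach a normal to each point from its local neighborhood.
pcd.estimate_normals(
    search_param=o3d.geometry.KDTreeSearchParamHybrid(radius=0.05, max_nn=30))
# Implicit inside-outside fit plus isosurface extraction in one call.
mesh, densities = o3d.geometry.TriangleMesh.create_from_point_cloud_poisson(
    pcd, depth=8)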

4.5 VOLUMETRIC REPRESENTATIONS

Volumetric representations in computer vision and computer graphics are methods used to describe and model three-dimensional (3D) space in a volumetric manner. Unlike surface representations, which focus on defining the surface geometry explicitly, volumetric representations capture information about the entire volume, including the interior of objects.

A third alternative for modeling 3D surfaces is to construct 3D volumetric inside–outside functions, as used in voxel coloring, space carving, and level-set techniques for stereo matching, and in the reconstruction of volumes from binary silhouette images.
4.5.1 Implicit surfaces and level sets

While polyhedral and voxel-based representations can represent three-dimensional shapes to an arbitrary precision, they lack some of the intrinsic smoothness properties available with continuous implicit surfaces, which use an indicator function (or characteristic function) F(x, y, z) to indicate which 3D points are inside (F(x, y, z) < 0) or outside (F(x, y, z) > 0) the object.

An early example of using implicit functions to model 3D objects in computer vision was superquadrics. To model a wider variety of shapes, superquadrics are usually combined with either rigid or non-rigid deformations. Superquadric models can either be fitted to range data or used directly for stereo matching.

A different kind of implicit shape model can be constructed by defining a signed distance function
over a regular three-dimensional grid, optionally using an octree spline to represent this function
more coarsely away from its surface (zero-set) .

We have already seen examples of signed distance functions being used to represent distance transforms, level sets for 2D contour fitting and tracking, volumetric stereo, range data merging, and point-based modeling. The advantage of representing such functions directly on a grid is that it is quick and easy to look up distance function values for any (x, y, z) location and also easy to extract the isosurface using the marching cubes algorithm.
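A sketch combining both ideas: define a signed distance function on a regular grid (negative inside, following the indicator-function convention above) and extract its zero isosurface with scikit-image's marching cubes:

import numpy as np
from skimage import measure

n = 64
xs = np.linspace(-1.0, 1.0, n)
X, Y, Z = np.meshgrid(xs, xs, xs, indexing="ij")
# Signed distance to a sphere of radius 0.6: negative inside, positive outside.
sdf = np.sqrt(X**2 + Y**2 + Z**2) - 0.6

# Extract the zero-level isosurface as a triangle mesh (vertices + faces).
verts, faces, normals, _ = measure.marching_cubes(
    sdf, level=0.0, spacing=(xs[1] - xs[0],) * 3)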


The function itself is represented using a quadratic tensor-product B-spline over an octree, which provides a compact representation with larger cells away from the surface or in regions of lower point density, and also admits the efficient solution of the related Poisson equations (4.24–4.27).

It is also possible to replace the quadratic penalties used in the Poisson equations with L1 (total
variation) constraints and still obtain a convex optimization problem, which can be solved using
either continuous or discrete graph cut techniques.

Signed distance functions also play an integral role in level-set evolution equations, where the
values of distance transforms on the mesh are updated as the surface evolves to fit multi-view
stereo photo consistency measures.

As with many other areas of computer vision, deep neural networks have started being applied to
the construction and modeling of volumetric object representations. Some neural networks
construct 3D surface or volumetric occupancy grid models from single images , although more
recent experiments suggest that these networks may just be recognizing the general object
category and doing a small amount of fitting .

All of these networks use latent codes to represent individual instances from a generic class (e.g., car or
chair) from the ShapeNet dataset (Chang, Funkhouser et al. 2015), although they use the codes in a
different part of the network (either in the input or through conditional batch normalization). This allows
them to reconstruct 3D models from just a single image. Pixel-aligned Implicit function (PIFu) networks
combine fully convolutional image features with neural implicit functions to better preserve local shape
and color details.

They are trained specifically on clothed humans and can hallucinate full 3D models from just a single color
image (Figure 13.18). Neural Radiance Fields (NeRF) extend this to also use pixel ray directions as inputs
and also output continuous valued opacities and radiance values, enabling ray-traced rendering of shiny
3D models constructed from multiple input images. This representation is related to Lumigraphs and
surface LightFields.


Voxel Grids:

Description: A regular grid of small volume elements, called voxels, where each voxel represents
a small unit of 3D space.

Application: Used in medical imaging, computer-aided design (CAD), computational fluid dynamics, and robotics. Voxel grids are effective for representing both the exterior and interior of objects.

Octrees:

Description: A hierarchical data structure that recursively divides 3D space into octants. Each leaf
node in the octree contains information about the occupied or unoccupied status of the
corresponding volume.

Application: Octrees are employed for efficient storage and representation of volumetric data,
particularly in real-time rendering, collision detection, and adaptive resolution.

Signed Distance Fields (SDF):

Description: Represent the distance from each point in space to the nearest surface of an object,
with positive values inside the object and negative values outside.


Application: Used in shape modeling, surface reconstruction, and physics-based simulations. SDFs
provide a compact representation of geometry and are often used in conjunction with implicit
surfaces.

3D Texture Maps:

Description: Extend the concept of 2D texture mapping to 3D space, associating color or other
properties with voxels in a volumetric grid.

Application: Employed in computer graphics, simulations, and visualization to represent complex


volumetric details such as smoke, clouds, or other phenomena.

Point Clouds with Occupancy Information:

Description: Combine the idea of point clouds with additional information about the occupancy
of each point in space.
Application: Useful in scenarios where capturing both the surface and interior details of objects is
necessary, such as in robotics and 3D reconstruction.
Tensor Fields:
Description: Represent the local structure of a volumetric region using tensors. Tensor fields
capture directional information, making them suitable for anisotropic materials and shapes.
Application: Commonly used in materials science, biomechanics, and simulations where
capturing anisotropic properties is important.
Shell Maps:
Description: Represent the surfaces of objects as a collection of shells or layers, each encapsulating the object's geometry.
Application: Used in computer graphics and simulation to efficiently represent complex objects
and enable dynamic level-of-detail rendering.
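A short Open3D sketch of the voxel-grid idea above, quantizing a point cloud into occupied voxels ("scan.ply" again a placeholder):

import open3d as o3d

pcd = o3d.io.read_point_cloud("scan.ply")        # placeholder input
# Quantize space into 5 cm cells; a voxel exists wherever points fall.
vox = o3d.geometry.VoxelGrid.create_from_point_cloud(pcd, voxel_size=0.05)
print(len(vox.get_voxels()), "occupied voxels")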

4.6 MODEL-BASED RECONSTRUCTION


When we know something ahead of time about the objects we are trying to model, we can
construct more detailed and reliable 3D models using specialized techniques and
representations.


For example, architecture is usually made up of large planar regions and other
parametric forms (such as surfaces of revolution), usually oriented perpendicular to
gravity and to each other. Heads and faces can be represented using low-dimensional,
nonrigid shape models, because the variability in shape and appearance of human faces,
while extremely large, is still bounded. Human bodies or parts, such as hands, form
highly articulated structures, which can be represented using kinematic chains of
piecewise rigid skeletal elements linked by joints.

4.6.1 Architecture

Architectural modeling, especially from aerial photography, has been one of the
longest studied problems in both photogrammetry and computer vision. In the last
two decades, the development of reliable image-based modeling techniques, as well
as the prevalence of digital cameras and 3D computer games, has led to widespread
deployment of such systems.


●The work by Debevec, Taylor, and Malik (1996) was one of the earliest hybrid geometry- and image-based modeling and rendering systems. Their Façade system combines an interactive image-guided geometric modeling tool with model-based (local plane plus parallax) stereo matching and view-dependent texture mapping. During the interactive photogrammetric modeling phase, the user selects block elements and aligns their edges with visible edges in the input images (Figure 13.19a).

●The system then automatically computes the dimensions and locations of the blocks along
with the camera positions using constrained optimization (Figure 13.19b–c). This approach
is intrinsically more reliable than general feature-based structure from motion, because it
exploits the strong geometry available in the block primitives.

●Once the rough geometry has been estimated, more detailed offset maps can be computed for each planar face using a local plane sweep, called model-based stereo. Finally, during rendering, images from different viewpoints are warped and blended together as the camera moves around the scene, using a process called view-dependent texture mapping (Figure 13.19d).

●For interior modeling, instead of working with single pictures, it is more useful to work
with panoramas, as you can see larger extents of walls and other structures. The lines are
initially used to establish an absolute rotation for each panorama and are later used (along
with the inferred vertices and planes) to optimize the 3D structure, which can be recovered
up to scale from one or more images (Figure 13.20).


Recent advances in deep networks now make it possible to automatically infer the lines and their junctions. High dynamic range panoramas can also be used for outdoor modeling, because they provide highly reliable estimates of relative camera orientations as well as vanishing point directions.

Intersections of planes are used to determine the extent of each plane, i.e., an initial coarse geometry, which is then refined with the addition of rectangular or wedge-shaped indentations and extrusions. Note that when top-down maps of the buildings being modeled are available, these can be used to further constrain the 3D modeling process.

Figure 13.21 shows some of the processing stages in the system developed by Sinha,
Steedly, and Szeliski (2009). Another common characteristic of architecture is the repeated
use of primitives such as windows, doors, and colonnades. Architectural modeling systems
can be designed to search for such repeated elements and to use them as part of the
structure inference process.


● The combination of structured elements such as parallel lines, junctions, and rectangles
with full axis-aligned 3D models for the modeling of architectural environments has
recently been called holistic 3D reconstruction. The combination of all these techniques
now makes it possible to reconstruct the structure of large 3D scenes.

● For example, the Urbanscan system reconstructs texture-mapped 3D models of city streets from videos acquired with a GPS-equipped vehicle. To obtain real-time performance, it uses both optimized online structure-from-motion algorithms and GPU implementations of plane-sweep stereo aligned to dominant planes and depth map fusion. A related system also uses plane-sweep stereo combined with object recognition and segmentation for vehicles.

● Numerous photogrammetric reconstruction systems that produce detailed texture-mapped 3D models have been developed based on these computer vision techniques. Examples of commercial software that can be used to reconstruct large-scale 3D models from aerial drone and ground-level photography include Pix4D, Metashape, and RealityCapture. Another example is Occipital's Canvas mobile phone app (Stein 2020), which appears to use a combination of photogrammetry and depth map fusion.

4.6.2 Facial Modeling And Tracking


Figure 13.22 shows an example of an image-based modeling system, where user-specified keypoints in several images are used to fit a generic head model to a person's face. As you can see in Figure 13.22c, after specifying just over 100 keypoints, the shape of the face has become quite adapted and recognizable. Extracting a texture map from the original images and then applying it to the head model results in an animatable model with striking visual fidelity (Figure 13.23a).

As you can see in Figure 13.25, it is then possible to fit morphable 3D models to single images and to use such models for a variety of animation and visual effects. It is also possible to design stereo matching algorithms that optimize directly for the head model parameters, or to use the output of real-time stereo with active illumination (Figures 13.10 and 13.23b).

Once a 3D head model has been constructed, it can be used in a variety of applications, such as head tracking, as shown in the figures, and face transfer, i.e., replacing one person's face with another in a video. Additional applications include face beautification by warping face images toward a more attractive "standard", face de-identification for privacy protection, and face swapping.

●More recent applications of 3D head models include photorealistic avatars for video conferencing, 3D unwarping for better selfies, and single-image portrait relighting, an example of which is shown in Figure 13.24.

4.6.3 Application: Facial Animation

●Blanz and Vetter (1999) describe a system where they first capture a set of 200 colored range scans of faces (Figure 13.25a), which can be represented as a large collection of (X, Y, Z, R, G, B) samples (vertices). For 3D morphing to be meaningful, corresponding vertices in different people's scans must first be put into correspondence.

●After computing a subspace representation, different directions in this space can be associated with different characteristics such as gender, facial expressions, or facial features (Figure 13.25a). 3D morphable models can be fitted to a single image using gradient descent on the error between the input image and the re-synthesized model image, after an initial manual placement of the model in an approximately correct pose, scale, and location (Figures 13.25b–c). The efficiency of this fitting process can be increased using inverse compositional image alignment.
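A toy NumPy sketch of the subspace (PCA) construction behind such morphable models, assuming the scans have already been put into vertex correspondence as described above (shapes and names are illustrative):

import numpy as np

def build_morphable_model(scans, k=20):
    """scans: (N, 3V) matrix; each row is a registered face scan with V
    vertices flattened as (x1, y1, z1, x2, y2, z2, ...)."""
    mean = scans.mean(axis=0)
    # PCA via SVD of the mean-centered scan matrix.
    _, _, Vt = np.linalg.svd(scans - mean, full_matrices=False)
    return mean, Vt[:k]                  # mean shape + top-k shape basis

def synthesize(mean, basis, alpha):
    # Moving along basis directions changes attributes such as expression.
    return mean + alpha @ basis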


The resulting texture-mapped 3D model can then be modified to produce a variety of visual effects, including changing a person's weight or expression, or three-dimensional effects such as re-lighting or 3D video-based animation. Such models can also be used for video compression, e.g., by only transmitting a small number of facial expression and pose parameters to drive a synthetic avatar or to bring a still portrait image to life. The survey literature on 3D morphable face models (Figure 13.26) discusses additional research and applications in this area.

3D facial animation is often matched to the performance of an actor, in what is known as performance-driven animation. Traditional performance-driven animation systems use marker-based motion capture, while some newer systems use depth cameras or regular video to control the animation.


●These 3D models were then translated into Facial Action Coding System (FACS) shape and expression parameters to drive a different (older) synthetically animated computer-generated imagery (CGI) character.

4.6.4 Human Body Modeling And Tracking

The topics of tracking humans, modeling their shape and appearance, and recognizing
their activities, are some of the most actively studied areas of computer vision. The
HumanEva database of articulated human motions contains multi-view video sequences
of human actions along with corresponding motion capture data, evaluation code, and a
reference 3D tracker based on particle filtering.

We refer the reader to the previously mentioned surveys for other topics and more
details.

Background subtraction: One of the first steps in many human tracking systems is to model the background to extract the moving foreground objects (silhouettes) corresponding to people. Tracking such silhouettes over time supports the analysis of multiple people moving around a scene, including building shape and appearance models and detecting if they are carrying objects.
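A minimal OpenCV background-subtraction sketch for extracting moving-person silhouettes ("people.mp4" is a placeholder):

import cv2
import numpy as np

cap = cv2.VideoCapture("people.mp4")             # placeholder video
bg = cv2.createBackgroundSubtractorMOG2(history=500, varThreshold=16,
                                        detectShadows=True)
while True:
    ok, frame = cap.read()
    if not ok:
        break
    mask = bg.apply(frame)                       # foreground silhouette mask
    # Morphological opening removes small noise blobs from the silhouettes.
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, np.ones((5, 5), np.uint8))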

Initialization and detection: To track people in a fully automated manner, it is necessary to first detect (or re-acquire) their presence in individual video frames. Single-frame human detection and pose estimation algorithms can be used by themselves to perform tracking. They are often combined, however, with frame-to-frame tracking techniques to provide better reliability.

Tracking with flow: The tracking of people and their pose from frame to frame can be enhanced by computing optical flow or matching the appearance of their limbs from one frame to another. For example, the cardboard people model represents the appearance of each leg portion (upper and lower) as a moving rectangle and uses optical flow to estimate its location in each subsequent frame. It is also possible to match the estimated motion field itself to some prototypes to identify the particular phase of a running motion, or to match two low-resolution video portions to perform video replacement.
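A sketch of frame-to-frame limb or point tracking with pyramidal Lucas-Kanade flow in OpenCV (the inputs prev_gray and gray are assumed to be consecutive grayscale frames):

import cv2
import numpy as np

def track_points(prev_gray, gray, pts=None):
    if pts is None:
        # Seed with corner-like features, e.g., on a person's limbs.
        pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=200,
                                      qualityLevel=0.01, minDistance=7)
    nxt, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, gray, pts, None,
                                              winSize=(21, 21), maxLevel=3)
    good = status.ravel() == 1
    return pts[good], nxt[good]      # matched point pairs between frames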

3D kinematic models: The effectiveness of human modeling and tracking can be greatly enhanced using a more accurate 3D model of a person's shape and motion. Underlying such representations, which are ubiquitous in 3D computer animation in games and special effects, is a kinematic model or kinematic chain, which specifies the length of each limb in a skeleton as well as the 2D or 3D rotation angles between the limbs or segments (Figure 13.27a–b). Inferring the values of the joint angles from the locations of the visible surface points is called inverse kinematics (IK) and is widely studied in computer graphics. Figure 13.27a shows the kinematic model for a human hand used to track hand motion in a video. As you can see, the attachment points between the fingers and the thumb have two degrees of freedom, while the finger joints themselves have only one. Using this kind of model can greatly enhance the ability of an edge-based tracker to cope with rapid motion, ambiguities in 3D pose, and partial occlusions. One popular approach is to associate an ellipsoid or superquadric with each rigid limb in the kinematic model, as shown in Figure 13.27b.
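A tiny sketch of a planar kinematic chain (forward kinematics); inverse kinematics, as described above, would invert this mapping, e.g., by nonlinear least squares over the joint angles:

import numpy as np

def forward_kinematics(lengths, angles):
    """Joint positions of a planar chain given limb lengths and relative
    joint angles in radians, starting at the origin."""
    pts, p, theta = [np.zeros(2)], np.zeros(2), 0.0
    for L, a in zip(lengths, angles):
        theta += a                                    # accumulate rotation
        p = p + L * np.array([np.cos(theta), np.sin(theta)])
        pts.append(p.copy())
    return np.stack(pts)

# Example: a two-segment "finger" with a bend at each joint.
print(forward_kinematics([1.0, 0.8], [np.pi / 4, np.pi / 6]))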

Probabilistic models: Because tracking can be such a difficult task, sophisticated probabilistic inference techniques are often used to estimate the likely states of the person being tracked. Figure 13.27c–d shows an example of a sophisticated spatio-temporal probabilistic graphical model called loose-limbed people, which models not only the geometric relationship between various limbs but also their likely temporal dynamics. The conditional probabilities relating various limbs and time instances are learned from training data, and particle filtering is used to perform the final pose inference.


Adaptive shape modeling: Adaptive shape modeling refers to the use of mathematical and computational techniques to represent and analyze the shapes of objects in images or videos. These techniques allow for the automatic detection, recognition, and tracking of objects by adapting to variations in shape, size, and appearance. The registered datasets are used to model the variation in shape as a function of personal characteristics and skeletal pose, e.g., the bulging of muscles as certain joints are flexed (Figure 13.29, top row). The resulting system can then be used for shape completion, i.e., the recovery of a full 3D mesh model from a small number of captured markers, by finding the best model parameters in both shape and pose space that fit the measured data. Because it is constructed completely from scans of people in close-fitting clothing and uses a parametric shape model, the SCAPE system cannot cope with people wearing loose-fitting clothing.


While the preceding body fitting and pose estimation systems use multiple views to estimate body shape, it is also possible to fit a human shape and pose model to a single image of a person against a natural background. Manual initialization is used to estimate a rough pose (skeleton) and height model, and this is then used to segment the person's outline using the GrabCut segmentation algorithm. The shape and pose estimates are then refined using a combination of silhouette edge cues and shading information (Figure 13.29). The resulting 3D model can be used to create novel animations.

Activity recognition: The final widely studied topic in human modeling is motion, activity, and action recognition. Examples of actions that are commonly recognized include walking and running, jumping, dancing, picking up objects, sitting down and standing up, and waving.

4.7 RECOVERING TEXTURE MAPS AND ALBEDOS

After a 3D model of an object or person has been acquired, the final step in modeling is usually to recover a texture map to describe the object's surface appearance. This first requires establishing a parameterization for the (u, v) texture coordinates as a function of 3D surface position. One simple way to do this is to associate a separate texture map with each triangle (or pair of triangles). More space-efficient techniques involve unwrapping the surface onto one or more maps, e.g., using a subdivision mesh or a geometry image.

Once the (u, v) coordinates for each triangle have been fixed, the perspective projection equations mapping from texture (u, v) to an image j's pixel (uj, vj) coordinates can be obtained by concatenating the affine (u, v) -> (X, Y, Z) mapping with the perspective homography (X, Y, Z) -> (uj, vj). The color values for the (u, v) texture map can then be re-sampled and stored, or the original image can itself be used as the texture source using projective texture mapping. The situation becomes more involved when more than one source image is available for appearance recovery, which is the usual case.
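A sketch of this concatenated mapping for a single triangle: an assumed 3x3 matrix A maps homogeneous texture coordinates onto the triangle's plane in 3D, and an assumed 3x4 camera matrix P projects that point into image j:

import numpy as np

def texel_to_pixel(uv, A, P):
    """Map texture coordinates (u, v) to pixel coordinates (u_j, v_j)."""
    X = A @ np.array([uv[0], uv[1], 1.0])  # (u, v) -> point on triangle plane
    x = P @ np.append(X, 1.0)              # perspective projection
    return x[:2] / x[2]                    # homogeneous divide

# Stacking [A; 0 0 1] and left-multiplying by P yields the single 3x3
# texture-to-image homography mentioned in the text.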

One possibility is to use a view-dependent texture map (Section 14.1.1), in which a


different source image (or combination of source images) is used for each polygonal face
based on the angles between the virtual camera, the surface normals, and the source
images.

In some situations, e.g., when using models in traditional 3D games, it is preferable to merge all of the source images into a single coherent texture map during pre-processing. Ideally, each surface triangle should select the source image where it is seen most directly (perpendicular to its normal) and at the resolution best matching the texture map resolution. This can be posed as a graph cut optimization problem, where the smoothness term encourages adjacent triangles to use similar source images, followed by blending to compensate for exposure differences.


It is also possible to estimate the light source directions and the surface reflectance properties while recovering the texture map.

Figure 13.31 shows the results of one such approach, where the specularities are first removed while estimating the matte reflectance component (albedo) and then later re-introduced by estimating the specular component ks in a Torrance–Sparrow reflection model.

4.7.1 Estimating BRDFs

A more ambitious approach to the problem of view-dependent appearance


modeling is to estimate a general bidirectional reflectance distribution function (BRDF)
for each point on an object's surface. The (spatially varying) BRDF can be written as

f_r(x; v_i, v_e; λ),

where the e subscript now represents the emitted rather than the reflected light directions.

● To build up such models, one first associates a lumitexel, which contains a 3D position, a surface normal, and a set of sparse radiance samples, with each surface point. Next, the lumitexels are clustered into materials that share common properties, using a Lafortune reflectance model and a divisive clustering approach (Figure 13.32a).

Finally, to model detailed spatially varying appearance, each lumitexel (surface point) is projected onto the basis of clustered appearance models (Figure 13.32b). More recent approaches to recovering spatially varying BRDFs (SVBRDFs) either start with RGB-D scanners or flash/no-flash image pairs, or use deep learning approaches to simultaneously estimate surface normals and appearance models. Even more sophisticated systems can also estimate shape and environmental lighting from range scanner sequences or single monocular images.

While most of the techniques discussed in this section require large numbers of views to estimate surface properties, an interesting challenge is to take these techniques out of the lab and into the real world, and to combine them with regular and internet photo image-based modeling approaches.

4.7.2 Application: 3D model capture

The techniques described in this chapter for building complete 3D models from multiple images
and then recovering their surface appearance have opened up a whole new range of applications
that often go under the name 3D photography.

An example of a more recent commercial photogrammetric modeling system that can be used for
both object and scene capture is Pix4D, whose website shows a wonderful example of a 3D
texture-mapped castle reconstructed from both regular and aerial drone photographs. An
alternative to such fully automated systems is to put the user in the loop in what is sometimes
called interactive computer vision.

The VideoTrace system performs automated point tracking and 3D structure recovery from video and then lets the user draw triangles and surfaces on top of the resulting point cloud, as well as interactively adjust the locations of model vertices. A related system uses matched vanishing points in multiple images (Figure 7.50) to infer 3D line orientations and plane normals. These are then used to guide the user in drawing axis-aligned planes, which are automatically fitted to the recovered 3D point cloud.

ACTIVITIES

1. Activity: Provide a set of images with different lighting conditions, and ask
students to implement different "Shape from X" techniques (e.g., Shape from
Shading, Shape from Silhouettes, Shape from Motion) to recover the 3D
structure. Evaluate the performance of each technique under varying
conditions.
2. Activity: Create a project where students must design a simulation or use
real sensor data (e.g., LIDAR or structured light) to map out a 3D
environment. Have them write an algorithm that fuses multiple range data
sources for higher accuracy.
3. Activity: Give students a set of point cloud data or mesh models, and ask
them to create different surface representations (e.g., triangulated surfaces,
NURBS, spline-based surfaces). Have them compare the benefits and trade-
offs of each representation for rendering or further processing.

4. Activity: Have students build a point-based rendering engine where they must
visualize complex objects using point clouds instead of traditional polygon
meshes. Introduce error metrics to evaluate the fidelity of the rendered object
compared to the original mesh.

Video Links
Unit – 4

Video Links

Sl. No. | Title | Video Link
1 | Shape from X | https://www.youtube.com/watch?v=YQ5QOiyoF9U
2 | Active Rangefinding | https://www.youtube.com/watch?v=3S3xLUXAgHw
3 | Surface Representations | https://www.youtube.com/watch?v=wQRgcCGkIhc
4 | Point-based Representations | https://www.youtube.com/watch?v=9vM6E6zoA84
5 | Volumetric Representations | https://www.youtube.com/watch?v=B2BTSKcYqtQ
6 | Model-based Reconstruction | https://www.youtube.com/watch?v=Rfb1J3fJMYA
7 | Recovering texture maps and albedos | https://www.youtube.com/watch?v=uF9I7UT9csI
Assignments
Unit - IV

Assignment Questions
Assignment Questions – Very Easy

Q. No. | Assignment Question | Marks | Knowledge Level | CO
1 | Explain how surface representations are used in reconstructing 3D objects. | 5 | K1 | CO4
2 | What role does albedo play in the appearance of a surface under different lighting conditions? | 5 | K1 | CO4

Assignment Questions – Easy

Q. No. | Assignment Question | Marks | Knowledge Level | CO
1 | Provide examples where each representation would be useful in 3D modeling. | 5 | K2 | CO4
2 | How can shading provide information about the shape of an object? | 5 | K2 | CO4

Assignment Questions
Assignment Questions – Medium

Q. No. | Assignment Question | Marks | Knowledge Level | CO
1 | Discuss the advantages and disadvantages of active rangefinding compared to methods like stereo vision. | 5 | K3 | CO4
2 | Describe the process and how this representation is used for more accurate 3D modeling. | 5 | K3 | CO4

Assignment Questions – Hard

Q. No. | Assignment Question | Marks | Knowledge Level | CO
1 | What are the challenges of recovering accurate texture maps in model-based reconstruction? | 5 | K4 | CO4
2 | How does Shape from X (e.g., Shape from Texture, Shape from Shading) contribute to recovering 3D geometry in complex scenes? | 5 | K4 | CO4

Assignment Questions
Assignment Questions – Very Hard

Q. No. | Assignment Question | Marks | Knowledge Level | CO
1 | Develop a methodology for combining point-based and volumetric representations to improve the accuracy of 3D surface reconstruction. | 5 | K5 | CO4
2 | Discuss the limitations of albedo recovery in scenes with varying lighting conditions. | 5 | K5 | CO4

Course Outcomes:
CO4: To apply 3D image reconstruction techniques
*Allotment of Marks

Correctness of the Content | Presentation | Timely Submission | Total (Marks)
15 | - | 5 | 20

69
Part A – Questions & Answers
Unit – IV
70
Part A - Questions & Answers
1. What is Shape from X? [K2, CO4]
Shape from X refers to a set of techniques that infer 3D shape from various cues like shading, texture, and motion.

2. What is the principle behind active rangefinding? [K3, CO4]
Active rangefinding measures the distance to an object by projecting energy (e.g., light or sound) and measuring the time it takes for the reflection to return.
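To make the time-of-flight principle in Q2 concrete, here is a short sketch (the timing value is an invented example, not data from any particular sensor):

C = 299_792_458.0                  # speed of light in m/s

def tof_range(round_trip_seconds):
    # The pulse travels out and back, so halve the round-trip path.
    return C * round_trip_seconds / 2.0

print(tof_range(66.7e-9))          # a ~66.7 ns round trip is roughly 10 m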
3. What is a surface representation in 3D modeling? [K2, CO4]
Surface representation defines the geometry of an object's surface, typically in forms like meshes, parametric surfaces, or point clouds.

4. How do point-based representations work? [K2, CO4]
Point-based representations describe a surface using discrete points in 3D space without explicitly defining connectivity between them.

5. What are volumetric representations? [K3, CO4]
Volumetric representations describe objects as a set of volumetric elements (voxels), representing the space occupied by the object.
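A minimal sketch of the voxel idea in Q5, assuming an axis-aligned grid anchored at the origin and a point cloud supplied as an (N, 3) NumPy array in metres (the function name and parameters are illustrative):

import numpy as np

def voxelize(points, voxel_size, grid_shape):
    # Map each 3D point to an integer voxel index and mark it occupied.
    idx = np.floor(points / voxel_size).astype(int)
    inside = np.all((idx >= 0) & (idx < np.array(grid_shape)), axis=1)
    grid = np.zeros(grid_shape, dtype=bool)
    grid[tuple(idx[inside].T)] = True
    return grid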
71
6. What is the role of texture mapping in 3D models? [K3, CO4]
Texture mapping applies 2D images (textures) to the surface of a 3D model to provide detail without adding geometric complexity.

7. Define albedo in computer vision. [K3, CO4]
Albedo refers to the intrinsic reflectivity of a surface, describing how much light it reflects independently of lighting conditions.

8. What is model-based reconstruction? [K2, CO4]
Model-based reconstruction uses a pre-defined model to help reconstruct the 3D shape of objects from sensor data.

9. How does Shape from Shading work? [K3, CO4]
Shape from Shading infers the 3D shape of a surface by analyzing how light interacts with it, based on its shading pattern.
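The shading cue in Q9 is usually formalised with the Lambertian image model (a standard modelling assumption rather than any specific algorithm):

I(x, y) = \rho(x, y) \, \max(0, \mathbf{n}(x, y) \cdot \mathbf{l})

where \rho is the surface albedo, \mathbf{n} the unit surface normal, and \mathbf{l} the unit light direction; Shape from Shading inverts this relation to estimate \mathbf{n}, and hence the surface, from the observed intensities I.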
10. What is the key advantage of using active over passive depth sensing? [K3, CO4]
Active depth sensing is independent of ambient lighting and can work in darkness, unlike passive methods like stereo vision.
72
11. What are the challenges with Shape from Texture? [K3, CO4]
Shape from Texture can be challenged by irregular or non-uniform textures, leading to ambiguous interpretations of surface geometry.

12. What is the difference between a mesh and a point cloud? [K3, CO4]
A mesh connects points (vertices) with edges and faces, defining surface structure, while a point cloud is just an unconnected collection of points.

13. Why are volumetric representations useful for 3D printing? [K2, CO4]
Volumetric representations can represent the entire volume of an object, making it easier to generate data for 3D printing.

14. How does active rangefinding help in obstacle detection? [K3, CO4]
Active rangefinding measures distances to objects in real time, helping detect obstacles based on their distance from the sensor.

15. What is a voxel in volumetric representation? [K3, CO4]
A voxel (volumetric pixel) represents a value on a regular grid in 3D space, similar to how a pixel represents 2D image data.
73
16. How can Shape from Motion be used in 3D reconstruction? [K3, CO4]
Shape from Motion uses changes in viewpoint over time (e.g., video sequences) to infer the 3D structure of a scene.

17. Why is recovering texture maps important in 3D modeling? [K3, CO4]
Recovering texture maps provides detailed surface appearance, allowing realistic rendering of 3D models in computer graphics.

18. What is a major limitation of point-based representations? [K2, CO4]
Point-based representations do not explicitly capture surface connectivity, making tasks like surface smoothing more difficult.

19. What role does Shape from Silhouette play in reconstruction? [K3, CO4]
Shape from Silhouette reconstructs the 3D shape of an object by analyzing its outline (silhouette) from different viewpoints.

20. What is the challenge with lighting variation in texture recovery? [K3, CO4]
Lighting variation can distort the appearance of textures, making it difficult to accurately recover the original texture map.
74
21. How is Shape from Stereo different from Shape from Shading? [K2, CO4]
Shape from Stereo estimates depth using two or more images from different angles, while Shape from Shading uses only one image to infer shape from light intensity.
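For Q21, the depth of a point seen in a rectified stereo pair follows the standard triangulation relation

Z = \frac{f \, B}{d}

where f is the focal length in pixels, B the baseline between the two cameras, and d the disparity in pixels; nearer points produce larger disparities.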
22. What is the purpose of parametric surface representation? [K3, CO4]
Parametric surfaces, like NURBS, use mathematical equations to define smooth, continuous surfaces with a high level of control over the shape.

23. Why is Shape from Texture often more challenging than Shape from Shading? [K3, CO4]
Shape from Texture requires distinct patterns that vary with surface orientation, while Shape from Shading relies on changes in light intensity.

24. What does it mean to 'recover albedo' in image processing? [K3, CO4]
Recovering albedo refers to extracting the intrinsic reflectivity of a surface, separating it from shading and illumination effects.

25. Why is Shape from Shading not suitable for textureless surfaces? [K3, CO4]
Textureless surfaces offer little variation in shading patterns, making it difficult to infer shape from shading alone.
75
Part B – Questions
Unit – IV
76
Part B Questions
Each question carries its Knowledge Level and CO mapping in parentheses.

1. Explain the process of Shape from Shading in detail. How does it help in reconstructing 3D surfaces, and what are its limitations? (K4, CO4)
2. Compare and contrast active and passive rangefinding techniques. Which method is more suitable for autonomous navigation systems, and why? (K4, CO4)
3. Describe the various surface representation techniques used in 3D modeling. What are the benefits and drawbacks of using mesh, point-based, and parametric surface representations? (K4, CO4)
4. Discuss the challenges involved in recovering texture maps and albedo in 3D reconstruction. What techniques are used to overcome lighting variations, occlusions, and surface inconsistencies? (K3, CO4)
5. Explain the principles of Shape from Texture and Shape from Motion in 3D reconstruction. How do these techniques complement each other in capturing the geometry of complex scenes? (K4, CO4)
6. Describe the process of volumetric representation in 3D reconstruction. How does it differ from point-based representation, and in what scenarios is it more advantageous? (K3, CO4)
77
7. Discuss how using a model-driven approach aids in reducing ambiguity in object recognition and reconstruction. Highlight challenges such as model generalization and inaccuracies. (K4, CO4)
8. Explain how Shape from Silhouette is used for 3D object reconstruction. What are the key limitations of this technique, especially in scenes with complex geometries or occlusions? (K3, CO4)
9. Compare and contrast time-of-flight (used in LIDAR systems) and triangulation-based methods (e.g., structured light, laser scanning), focusing on accuracy, cost, and use cases. (K4, CO4)
10. Discuss how volumetric representations, point-based methods, and surface meshes can be combined in a multi-representation approach to improve the accuracy and efficiency of 3D surface reconstruction. (K4, CO4)

78
Supportive Online Certification Courses
(NPTEL, Swayam, Coursera, Udemy, etc.)
79
Supportive Online Certification Courses

• Coursera – Introduction to Computer Vision
  Description: This course provides an overview of computer vision, including image processing, feature extraction, and object recognition.
  Offered by: Georgia Tech
  https://www.coursera.org/learn/introduction-computer-vision

• NPTEL – Computer Vision
  Computer Vision - Course (nptel.ac.in)

• Udemy – Computer Vision
  https://www.bing.com/ck/a?!&&p=37e82e3153ef651eJmltdHM9MTcxODc1NTIwMCZpZ3VpZD0wZmYzMGRhMi1iMThhLTZmZDgtMmFkNi0xZThmYjAyNzZlYjQmaW5zaWQ9NTUxMA&ptn=3&ver=2&hsh=3&fclid=0ff30da2-b18a-6fd8-2ad6-1e8fb0276eb4&u=a1L3ZpZGVvcy9yaXZlcnZpZXcvcmVsYXRlZHZpZGVvP3E9UHJhY3RpY2FsK09wZW5DViszK3VkbWV5K2NvdXJzZSZtaWQ9NThDMDExNjI2NTMzQzBFNDRCNjE1OEMwMTE2MjY1MzNDMEU0NEI2MSZGT1JNPVZJUkU&ntb=1
80
Real time Applications in day to day life and to Industry
81
Real time Applications
1. Autonomous Vehicle Camera Calibration
In an autonomous vehicle, camera calibration is crucial for accurate
perception. The vehicle's camera system is calibrated to ensure that the
distance measurements, lane detection, and object recognition are
precise. A miscalibration could lead to incorrect distance estimation,
which could be dangerous in a real-world scenario. Calibration is typically
performed during the vehicle's manufacturing process and periodically
re-calibrated as part of maintenance.
2. Stereo Vision for Depth Estimation in Autonomous Vehicles
Autonomous vehicles rely heavily on accurate depth estimation for
navigation and obstacle avoidance. Stereo vision systems in these
vehicles use triangulation to estimate the distance to various objects in
the environment. By comparing the images captured by two cameras
placed at a known distance apart, the system can determine the depth
of each pixel in the image. This information is used to create a 3D map
of the surroundings, allowing the vehicle to make informed decisions.
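The triangulation step described above can be sketched with OpenCV's block-matching stereo. This is an illustrative example only: the image file names, focal length, and baseline are placeholders, not values from any real vehicle rig.

import cv2
import numpy as np

# Rectified left/right grayscale images (hypothetical file names).
left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)
right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)

stereo = cv2.StereoBM_create(numDisparities=64, blockSize=15)
disparity = stereo.compute(left, right).astype(np.float32) / 16.0  # fixed-point to pixels

# Depth from disparity: Z = f * B / d (f in pixels, B in metres).
f, B = 700.0, 0.12                            # placeholder intrinsics/baseline
depth = (f * B) / np.maximum(disparity, 0.1)  # clamp to avoid divide-by-zero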
3. Archaeological Site Reconstruction
In archaeology, Two-Frame Structure from Motion can be used to
create 3D models of excavation sites from aerial photographs taken by
drones. By capturing two images of the site from slightly different
positions, researchers can reconstruct the 3D layout of the site, allowing
them to analyze the spatial relationships between different artifacts and
features.
4. Constrained SfM in Robotics
In a study involving robot navigation, the robot’s movement was
constrained to a flat plane (e.g., a ground robot navigating through an
indoor environment). By applying the planar constraint, the robot could
efficiently reconstruct the 3D layout of the environment and navigate
obstacles in real time. The use of known calibration parameters and
limited camera motion further reduced the complexity of the SfM
calculations, enabling faster and more accurate map-building and
localization.
82
Real time Applications
5. Video Compression with Motion Estimation
In video compression (e.g., MPEG, H.264), parametric motion
estimation is used to reduce redundancy between consecutive frames. By
estimating the motion parameters that describe how parts of the image
move from one frame to the next, only the motion parameters and
residual errors (differences between the predicted and actual frames)
need to be encoded, leading to significant data compression.
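A toy sketch of the block-matching search that underlies this kind of motion estimation (pure NumPy, exhaustive search; the block size and search range are arbitrary choices, and real codecs use much faster search strategies):

import numpy as np

def best_motion_vector(prev, curr, y, x, block=16, search=8):
    # Find the (dy, dx) displacement whose block in the previous frame
    # best matches the block at (y, x) in the current frame.
    ref = curr[y:y + block, x:x + block].astype(np.int32)
    best_sad, best_mv = None, (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            yy, xx = y + dy, x + dx
            if yy < 0 or xx < 0 or yy + block > prev.shape[0] or xx + block > prev.shape[1]:
                continue
            cand = prev[yy:yy + block, xx:xx + block].astype(np.int32)
            sad = np.abs(ref - cand).sum()        # sum of absolute differences
            if best_sad is None or sad < best_sad:
                best_sad, best_mv = sad, (dy, dx)
    return best_mv

The encoder then stores only the motion vector and the residual block, which is where the compression gain comes from.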
6. Spline-Based Motion in Robotics
In robotic path planning, splines are often used to define the trajectory
of a robotic arm. For instance, if a robotic arm needs to move from one
point to another while avoiding obstacles, a spline-based motion model
can be used to generate a smooth path that takes into account the
positions of obstacles and the arm’s physical constraints. The resulting
motion is smooth, continuous, and can be executed with precision by the
robot.
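A minimal sketch of spline-based trajectory generation with SciPy (the waypoints are invented for illustration; a real planner would also check the interpolated path against obstacles and joint limits):

import numpy as np
from scipy.interpolate import CubicSpline

t = np.array([0.0, 1.0, 2.0, 3.0])        # waypoint times in seconds
waypoints = np.array([[0.0, 0.0],         # (x, y) positions to pass through
                      [0.5, 0.8],
                      [1.5, 1.0],
                      [2.0, 0.2]])

path = CubicSpline(t, waypoints)          # one cubic spline per coordinate
ts = np.linspace(0.0, 3.0, 50)
positions = path(ts)                      # smooth (50, 2) trajectory
velocities = path(ts, 1)                  # first derivative gives velocity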
7. Tracking a Moving Car
Suppose we have a video sequence of a car moving across a scene. By
calculating the optical flow, we can determine the direction and speed at
which each pixel in the image is moving. The flow vectors for the pixels
corresponding to the car will point in the direction of the car's movement,
and their magnitudes will reflect its speed. This information can be used
to track the car's position over time, detect changes in its velocity, or even
predict its future position in the sequence.
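In the same spirit, dense optical flow for such a sequence can be sketched with OpenCV's Farnebäck method (the video file name is a placeholder, and the parameters are commonly quoted defaults rather than tuned values):

import cv2

cap = cv2.VideoCapture("car.mp4")               # hypothetical input video
ok, prev = cap.read()
prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # Dense flow: one (dx, dy) vector per pixel between consecutive frames.
    flow = cv2.calcOpticalFlowFarneback(prev_gray, gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    mag, ang = cv2.cartToPolar(flow[..., 0], flow[..., 1])
    print("mean speed (px/frame):", float(mag.mean()))
    prev_gray = gray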
8. Video Surveillance
In video surveillance, layered motion analysis can be used to detect
and track multiple individuals or vehicles in a scene. By decomposing the
scene into layers, each representing different moving objects or groups of
objects, the system can track each entity separately, even in crowded
environments. This layered approach allows for better handling of
occlusions (when one object blocks another), improving the accuracy of
the tracking system.
83
Content Beyond Syllabus
84
Deep Learning for Optical Flow Estimation
and Layered Motion Analysis
Description: Optical flow estimation traditionally relies on differential
methods or variational approaches, which have limitations in handling
large displacements, occlusions, and complex motions. Recent
advances in deep learning have revolutionized optical flow estimation
by leveraging convolutional neural networks (CNNs) and recurrent
neural networks (RNNs) to learn motion patterns from data. Deep
learning models like FlowNet, PWC-Net, and RAFT provide state-of-
the-art results in optical flow estimation. Additionally, deep learning aids
in layered motion analysis, allowing for the segmentation of complex
scenes into multiple motion layers, improving object detection, tracking,
and understanding dynamic interactions in videos. (A minimal RAFT usage
sketch follows the subtopic list below.)
Key Subtopics:
• Overview of traditional vs. deep learning-based optical flow
estimation techniques.
• Architectures of deep learning models (e.g., FlowNet, PWC-Net,
RAFT) for optical flow.
• Techniques for handling occlusions and large displacements in flow
estimation.
• Layered motion analysis using deep learning for scene
segmentation.
• Applications in video understanding, autonomous driving, human
action recognition, and sports analytics.
• Challenges in real-time optical flow estimation and future research
directions.
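As a pointer for experimentation, the sketch below loads the pretrained RAFT model that ships with torchvision. This is a hedged example: it assumes torchvision 0.13 or later, frames scaled to [-1, 1] with height and width divisible by 8, and uses random tensors purely to show the expected shapes.

import torch
from torchvision.models.optical_flow import raft_large, Raft_Large_Weights

model = raft_large(weights=Raft_Large_Weights.DEFAULT).eval()

# Two RGB frames as (1, 3, H, W) tensors in [-1, 1]; random here for shape only.
img1 = torch.rand(1, 3, 360, 640) * 2 - 1
img2 = torch.rand(1, 3, 360, 640) * 2 - 1

with torch.no_grad():
    flows = model(img1, img2)      # list of iteratively refined flow fields
flow = flows[-1]                   # final (1, 2, H, W) flow estimate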
85
Assessment Schedule
(Proposed Date & Actual Date)
86
Assessment Schedule
(Proposed Date & Actual Date)
Sl. No.   ASSESSMENT                    Proposed Date   Actual Date
1         FIRST INTERNAL ASSESSMENT
2         SECOND INTERNAL ASSESSMENT
3         MODEL EXAMINATION
4         END SEMESTER EXAMINATION
87
Prescribed Text Books & References
88
Prescribed Text Books & References

TEXT BOOKS:
1. D. A. Forsyth and J. Ponce, "Computer Vision: A Modern Approach", Pearson Education, 2003.
2. Richard Szeliski, "Computer Vision: Algorithms and Applications", Springer-Verlag London Limited, 2011.

REFERENCES:
1. B. K. P. Horn, "Robot Vision", McGraw-Hill.
2. Simon J. D. Prince, "Computer Vision: Models, Learning, and Inference", Cambridge University Press, 2012.
3. Mark Nixon and Alberto S. Aguado, "Feature Extraction & Image Processing for Computer Vision", Third Edition, Academic Press, 2012.
4. E. R. Davies, "Computer & Machine Vision", Fourth Edition, Academic Press, 2012.
5. Reinhard Klette, "Concise Computer Vision: An Introduction into Theory and Algorithms", 2014.
89
Mini Project Suggestions
90
Mini Project Suggestions
1. Very Hard
Capture images of a textured 3D object from multiple viewpoints. Implement a
system that not only reconstructs the 3D shape of the object but also recovers its
texture map and albedo properties for realistic rendering. Address challenges like
lighting variation and occlusion.
2. Hard
Build a simple LIDAR-like system using a time-of-flight sensor (e.g., an affordable
sensor like the VL53L0X). Map a small room or environment, and generate a 3D point
cloud of the environment based on range data.
3. Medium
Implement a system that takes a set of 2D cross-sectional images of an object and
reconstructs a volumetric (voxel-based) representation of the object. The output should
be a 3D model ready for 3D printing.
4. Easy
Capture a series of depth images (or use a public dataset) and generate a 3D point
cloud using the depth information. Visualize the point cloud and apply basic filtering to
remove noise. (A back-projection sketch appears after this list.)
5. Very Easy
Use a simple 3D object (e.g., a cube or sphere) and implement Shape from
Silhouette using images taken from multiple angles. The project should output the
reconstructed 3D shape.
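For project 4 above, a minimal back-projection sketch, assuming a pinhole depth camera with known intrinsics fx, fy, cx, cy (the function name and the zero-means-invalid convention are illustrative):

import numpy as np

def depth_to_point_cloud(depth, fx, fy, cx, cy):
    # depth: (H, W) array of metric depth values; zeros mark invalid pixels.
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    Z = depth
    X = (u - cx) * Z / fx              # pinhole back-projection
    Y = (v - cy) * Z / fy
    pts = np.stack([X, Y, Z], axis=-1).reshape(-1, 3)
    return pts[pts[:, 2] > 0]          # drop invalid (zero-depth) points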
91
Thank you
92