
Introduction to Computer Vision

Shree K. Nayar

Monograph: FPCV-0-1
Module: Introduction
Series: First Principles of Computer Vision
Computer Science, Columbia University

February 05, 2022


The poet Joseph Addison once said that “our sight is the most perfect and the most delightful of all our senses.” The goal of computer vision is to build machines that can see. We have already witnessed some successful applications of vision such as face recognition and driverless cars. There is much more to come. In the next decade, we can expect computer vision to have a profound impact on the way we live our lives.

The goal of this lecture series is to cover the mathematical and physical underpinnings of computer vision. Vision deals with images. We will look at how images are formed and then develop a variety of methods for recovering information about the physical world from images. Along the way, we will show several real-world applications of vision.

Since deep learning is popular today, you may be wondering if it is worth knowing the first principles of
vision, or, for that matter, the first principles of any field. Given a task, why not just train a neural network
with tons of data to solve the task? Indeed, there are applications where such an approach may suffice,
but there are several reasons to embrace the basics.

First, it would be laborious and unnecessary to train a network to learn a phenomenon that can be
concisely and precisely described using first principles. Second, when a network does not perform well
enough, first principles are your only hope for understanding why. Third, a network that is intended to
learn a complex mapping would typically require an enormous amount of training data to be collected.
This can be tedious and sometimes even impractical. In such cases, models based on first principles can
be used to synthesize the training data instead of collecting it. Finally, the most compelling reason to
learn the first principles of any field is curiosity. What makes humans unique is our innate desire to know
why things work the way they do.

I have partitioned this lecture series into 5 modules, each spanning an important aspect of computer
vision. Module 1 is about imaging. Module 2 is about detecting features and boundaries. Module 3 is on
3D reconstruction from a single viewpoint. Module 4 is on 3D reconstruction using multiple viewpoints.
Module 5 covers perception.

To follow any of these modules, you do not need any prior knowledge of computer vision. All you need
to know are the fundamentals of linear algebra and calculus. If you happen to know a programming
language, it would enable you to picture how the methods I describe can be implemented in software.
In short, any science or engineering sophomore should be able to handle the material with ease.


While we approach vision as an engineering discipline in this series, we make connections, when appropriate, with other fields such as neuroscience, psychology, art history, and biology. I hope you enjoy the lectures and that, by the end of the series, you will be convinced that computer vision is not only powerful but also fascinating.

What is Computer Vision?

Vision is our most powerful sense. It allows us to interact with the physical world without making any
direct physical contact. It is believed that about 60% of the brain is, in one way or another, involved in
visual processing. Ponder that for a moment. Thanks to our vision system, we are able to effortlessly
navigate through the complex world we live in and perform a variety of daily chores. In fact, our visual
system is so powerful that most of the time we are unaware of how much it is doing for us.

Computer vision is the enterprise of building machines that can see. You may be wondering, given that
the human visual system is so powerful, why even bother to build machines that can emulate it? Well,
there are several reasons. First, there are many chores we perform on a daily basis that we would rather
have done by a machine so we can free up time to devote to more rewarding activities. Examples of such
chores might be tidying your home and driving to work. Second, while our vision system is truly powerful,
it tends to be more qualitative than quantitative. It is not particularly good at making precise
measurements of things in the physical world. Lastly, and perhaps most importantly, a computer vision
system can be designed to surpass the capability of human vision and extract information about the
world that we simply cannot.


Here we see the basic elements of a computer vision system. On the right is a three-dimensional scene we wish to understand. This scene is lit by some form of lighting. Without light, there can be no vision. The source of lighting could be simple, as in the case of a point source such as the sun, or complex, as in the case of a collection of different types of lamps in an indoor setting.

Basic elements of a vision system: lighting, scene, camera, vision software, scene description. (I.36)

The scene reflects some of the light it receives towards a camera, which plays the role of the human eye. The camera receives light from the 3D scene to form a 2D image. This image is passed on to a piece of vision software that seeks to analyze the image and come up with a symbolic description of the scene. The description could say that there are wine bottles, wine glasses, cheese, bread, and fruits in the scene. A more sophisticated vision system may be able to tell how fresh the bread is and what types of cheeses and fruits you have on the cutting board.

So, what would be a concise definition of computer vision? Well, it depends on the background of the person you ask. In the early years of vision, David Marr, who wrote one of the first texts on vision, defined vision as automating human visual processes. Others have viewed it more generally as an information processing task. Berthold Horn, who wrote the book titled “Robot Vision”, viewed it as inverting image formation. An image is a mapping of the 3D world onto a 2D plane. Can we now go from the 2D image back into the 3D world and say things about the objects that reside within it? Some like to view vision as the inverse of graphics. In graphics, you first create detailed models of both the 3D objects in the scene and the lighting of the scene and then render a photorealistic 2D image. In vision, we are given a 2D image and wish to use it to recover the 3D models of the objects that make up the scene.

My PhD advisor Takeo Kanade used to say that, irrespective of how you define it, vision is fun! Perhaps most importantly, vision is really useful.


Vision deals with images. An image is an array of pixels. The word pixel is short for “picture element.” Typically, in an image, each pixel records the brightness and color of the corresponding point in the scene. However, pixels could be richer in terms of the information they record. For instance, a pixel could also measure the distance (or depth) of the corresponding scene point. In the future, it could also reveal the material the scene point is made of, whether that is plastic, metal, wood, or something else.

An image is an array of pixels; a pixel can have values for brightness, color, distance, material, and more.
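To make the array view concrete, here is a minimal sketch in Python with NumPy; the pixel values are made up purely for illustration:

```python
import numpy as np

# A tiny 4x4 grayscale image: each entry is the brightness of one pixel (0-255).
gray = np.array([[ 10,  40,  80, 120],
                 [ 20,  60, 100, 140],
                 [ 30,  80, 160, 200],
                 [ 40, 100, 180, 255]], dtype=np.uint8)

# A color image adds a third axis: one (R, G, B) triple per pixel.
color = np.zeros((4, 4, 3), dtype=np.uint8)
color[0, 0] = (255, 0, 0)        # the top-left pixel is pure red

print(gray[2, 1])                # brightness of the pixel at row 2, column 1
print(gray.shape, color.shape)   # (4, 4) and (4, 4, 3)
```

A richer sensor would simply add more channels per pixel, for example a depth value at each location.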

In short, with time, images will get richer in terms of the information they measure which, in turn, will
lead to more powerful vision systems. One day, not too long from now, we can expect to have computer
vision systems that can see things in a scene that even our powerful human visual system cannot.

We know that images are interesting. By simply opening our eyes and looking at this image, we can perceive an enormous amount of information. We are immediately able to figure out that there are two boys, with one giving the other a shower, and we can perceive the three-dimensional structure of the environment, the vegetation, etc. In fact, we even get a sense of what the boy on the right must be feeling as the water falls on him and the overall playful mood of the setting, all this in a fraction of a second.

Images Are Interesting (I.19)


Now, in order to appreciate how difficult computer vision is, take a look at the digital equivalent of the same image shown here. This is the array of numbers a vision system receives from the camera. It is from these numbers that we seek to extract all the information you and I perceived from the image in the previous slide. Think about that for a moment. It gives us a true appreciation for how challenging computer vision is, and that is also why it is interesting.

But When You Look Close…

Computer vision has been a vibrant field of research for about 50 years now. We have learned several things. First, vision is a hard problem. Second, it is a multidisciplinary field, drawing on several disciplines including optics, electrical engineering, material science, computer science, neuroscience, and even psychology. As hard as vision is, a lot of progress has been made. Today, there are many successful applications of computer vision. However, there is much more to come. In the coming decades, vision is sure to play a critical role in the way we live our lives.


What is Vision Used For?

Let us take a look at some of the things vision is being used for today. Each one of these is a thriving industry unto itself. I should mention that these are merely examples and do not represent a complete list of vision applications.

As you know, manufacturing is highly automated these days. Automobiles, for instance, are largely assembled by robots. Robots need computer vision to be intelligent. Without vision, robots would not be able to cope with the uncertainties that come with any real environment. For instance, if a robot is to insert a peg into a hole, it needs vision to detect any variations in the size and position of the hole. Vision-guided robotics is a major application of computer vision.

Factory Automation: Vision-Guided Robotics

In factory automation, one of the major challenges is inspecting the quality of manufactured objects. Given the speed of manufacturing and the fact that components that go into products today can be too small for the human eye to even see, computer vision has become indispensable to modern-day manufacturing.

Factory Automation: Visual Inspection (I.1)


Another widely used vision technology is optical character recognition, or OCR. OCR is used today in traffic systems to identify vehicles that violate traffic rules. The license plates of these vehicles are automatically read, and tickets are mailed to the violators’ homes.

Optical Character Recognition (OCR): Reading License Plates

Character recognition, as you can imagine, has many other important applications, such as the digitization of physical books, the authentication of signatures on checks, and the reading of mailing addresses on envelopes and packages received by postal services. OCR is now even available in phone apps that can translate, in real time, a sign in one language into a language you understand.

Optical Character Recognition (OCR): Book Digitization (I.2)

Vision plays a critical role in the field of biometrics, where one’s physical characteristics are used to determine their identity. Iris recognition is a widely used biometric today. Take a look at the high-resolution images of the eyes of these two people. It turns out that the intricate patterns seen in a person’s iris are unique to them, almost as unique as their DNA, and can be used to determine their identity with very high confidence.

Biometrics: Iris Recognition (I.3)


A vision technology that is ubiquitous today is face detection. It is considered to be one of the most successful applications of vision. Faces can be robustly detected in images under different poses and illuminations. Face analysis can also be used to recognize the identity of a person in an image. This technology has far-reaching implications. It can be used to organize photos in your personal collection and to find suspicious persons in security applications.

Biometrics: Face Detection and Recognition (I.25)

A recent and interesting application of vision is intelligent marketing. Here, you see a vending machine I have used in Shinagawa Station in Tokyo. As a person approaches the machine, it identifies the gender and rough age of the person and displays products that are most likely to be of interest to them.

Intelligent Marketing: Vending Machine with Face Detection (I.5)

Modern vision algorithms can also robustly track people moving through space. In the context of surveillance, this technology is used to follow a person as they move through a crowd, even as they get obstructed by other people or objects in the scene. In fact, when a person leaves the field of view of one camera, they can be handed off to another camera that continues to track them.

Security: Object Detection and Tracking (I.4)


One application of vision that is ubiquitous, but one that you may be unaware of, is the optical mouse. Inside the mouse is a complete vision system that tracks the movement of the pattern, or texture, of the surface on which the mouse sits. This is done using a low-resolution camera with a very high frame rate, enabling the mouse to precisely estimate its motion with respect to the surface it sits on. This information is used by your computer to control the position of the cursor.

Human Computer Interaction: Optical Mouse (I.6)

Another popular application of vision is in gaming consoles. Here you see a player using Microsoft’s Kinect. Kinect is packed with vision technology, enabling it to capture the motion of the player’s full body. This has given rise to a new breed of engaging interactive games.

Entertainment and Gaming: Kinect (I.7)

Vision plays a vital role in creating special effects in movies and animations. Here, you see Doug Roble on the left with a camera attached to his head that is watching him. The expression on Doug’s face is computed in real time and transferred to the face of Elbor, who is a virtual character. Using this technology, Doug Roble recently gave an entire TED talk via a virtual character.

Visual Effects: Motion and Performance Capture (I.8)


Vision is also being used to create augmented reality (AR) technologies. Here, you see Jian Wang’s face captured using the Snapchat camera. Jian’s 3D face model (middle) is computed in real time. This model is used to modify the appearance of Jian’s face. For instance, you can see what Jian might have looked like when he was a young boy (left) or what he might look like when he becomes an old man (right).

Augmented Reality: Face Manipulation (I.38)

A useful application of vision is landmark recognition. This is within the realm of what is called visual search. By simply taking a picture of any well-known landmark, such as this monument, one can immediately get detailed historical information regarding the monument.

Visual Search: Landmark Recognition (I.9)

Vision is not just useful but essential in certain fields such as space exploration. Here, the Mars rover uses an array of cameras to extract and send detailed information about the terrain of Mars back to Earth. This is a scenario where vision is the only way humans can explore a region that is inaccessible to them.

Autonomous Navigation: Space Exploration (I.10)


One application of vision that is widely talked about today is the driverless car. These cars use a wide range of cameras (visible light, infrared, and depth) to measure their surroundings with high precision and detail. This information is used by algorithms to enable the car to make decisions in a variety of driving scenarios. There is little doubt that driverless cars will soon become a part of our everyday lives. This would not be possible without the advances made by computer vision.

Autonomous Navigation: Driverless Car (I.11)

Another area that exploits vision is remote sensing. Here, you see a satellite orbiting the Earth. High-resolution cameras on the satellite are used to create 3D maps of the Earth’s surface, monitor natural disasters, surveil enemy territory during war, and track the effects of climate change on the planet.

Remote Sensing (I.12)

An important application domain for vision is medical imaging. While vision is most often applied to visible light images captured by consumer cameras, the algorithms developed for such images can be modified and used to analyze medical images such as X-ray, ultrasound, and magnetic resonance images. Here you see a magnetic resonance image (MRI) from which anatomical structures can be automatically detected and analyzed to help diagnose the patient.

Medical Image Analysis (I.13)


I hope I have convinced you that vision has enormous utility. That is why it is a topic of wide interest today.

How do Humans do it?

Before we begin to develop tools to help us solve vision problems, it is worth taking a quick look at how our human visual system works.

As shown here, images of the world are captured by each of our two eyes. Some early visual processing takes place in the eye itself. That is, the information recorded by the retina is processed by cells in the retina to reduce the information that needs to be transmitted to the brain. This information travels via the optic nerve to the lateral geniculate nucleus, where it is relayed to the visual cortex, the part of the brain in the back that performs most of the visual processing.

Human Eye and Visual Cortex: vision is easy for us, but we don’t fully understand how we do it! (I.15)

As you can see in this map of the cortex, there are regions of the cortex that perform different functions such as the perception of shape, color, motion, etc. While we know roughly where each type of analysis takes place and roughly how many neurons there are in each of these regions of the cortex, we are very far from having a detailed architecture, or “circuit diagram”, of the human visual system. In short, we do not know enough about the visual cortex to replicate it using electronics.

So, what do we do? We reinvent. This might sound unfortunate, but it is not necessarily so. As you can imagine, there are many applications of vision that require functionality and precision that go well beyond what the human visual system is capable of. While human vision is remarkable in its versatility and is able to cope with many complex real-world situations, it is more of a qualitative system than a quantitative one.


For instance, if you wanted to know how many millimeters long a pen is, the human visual system can only give very rough estimates. Such estimates are not useful in domains such as factory automation, medical imaging, or autonomous navigation of robots and cars. While no computer vision system has yet been developed that is as versatile as the human one, there are many computer vision systems in use today that demonstrate much higher precision and reliability than ours. In short, for many tasks that require vision, human vision may indeed be the wrong system to emulate.

Furthermore, human vision is more fallible than we may like to believe. When you and I perceive something incorrectly, we do not have a voice in our head telling us we are wrong. We see what we see and believe it to be accurate.

To demonstrate this, let us take a look at some well-known optical illusions.

On the left is Fraser’s spiral. You should be seeing a spiral emerging from the center of the image. Well,
it turns out there is no spiral in the photo. You can verify this by following any one of the curves – you
will see that you end up where you started. That is, Fraser’s spiral is actually a set of perfectly concentric
circles as shown on the right.

Illusions: Fraser’s Spiral (I.16)


Here, on the left, is the checker shadow illusion from Ted Adelson. We would all agree that patch A is
made of darker material than patch B. It turns out that we perceive this to be the case because our visual
system is able to determine that the illumination is varying over the scene. We first estimate this spatially
varying illumination and then use it to compensate for the brightness at each point in the scene. The end
result is that we perceive patch A to be of lower reflectance (darker material) than patch B.

Now, if you look at patches A and B in isolation, that is, if you cut them out of the rest of the image as
done on the right, you see that they have exactly the same brightness.

Illusions: Checker Shadow. B seems brighter than A, but they have the same brightness. (I.17)

Here is the Donguri wave illusion. It is a single image and not a video. Yet, when you move your eyes around the image, you perceive the leaves to be moving around. Clearly, an illusion.

Illusions: Donguri Wave. Perceived motion without motion. (I.18)


Here is an example of forced perspective. The person standing on the right seems to be much shorter than the one on the left. In fact, they are almost exactly the same height. The room itself is not a cuboid but rather tapered such that the distance between the floor and the ceiling increases with distance from the camera. This changes your perception of the relative sizes of objects in the room.

Illusions: Forced Perspective. These two people are of the same height! (I.19)

Then, there are visual ambiguities. These are not really illusions. It is just that the image itself, since it is 2D while the world is 3D, can lead to multiple interpretations of objects in the scene. In this case, if you stare at one of the vertices, you can lead yourself to believe that it is a corner that is convex (cube popping out) or concave (cube pushed in). In fact, if you keep staring at the vertex, you will find yourself flipping between these two interpretations.

Visual Ambiguities: Six Cubes or Seven Cubes?

There are even higher levels of ambiguity. The image on the left can be perceived to be either a young girl turned away from the viewer or the profile of an old woman with a large nose. On the right, it could be a vase or two heads facing each other.

Visual Ambiguities: Young-Girl/Old-Woman (I.21) and Face/Vase (I.22)


Here is my favorite ambiguity. On the left you see a large mound with a small crater in the center. Now,
what would you see if you turned this image upside down? You would expect to see the large mound
hanging upside down, right? As seen on the right, in fact, you see a large crater with a small mound in
the center. You perceive this because we live in a world where the light is expected to come from above
– from the sun for instance. We therefore invoke this assumption – that the illumination is from above
– while interpreting the shape of an object from its shading.

Visual Ambiguities: Crater on a Mound vs. Mound in a Crater (I.23)

Finally, there is the famous Kanizsa triangle. Here we perceive a white inverted triangle in the center. In fact, the triangle appears to be slightly brighter than the white background. It also appears to be closer in depth, seeming to sit above the rest of the scene. Of course, there is no triangle here. It is just three “pac-man” like fragmented discs that are precisely aligned to give the illusion of a triangle.

Seeing vs. Thinking: Kanizsa Triangle (I.24)

This example tells us that there is seeing and then there is perceiving. Our eyes see the three “pac-man” discs, but from their arrangement our brain infers the existence of a triangle.


Topics Covered

Now, let us take a quick look at the topics covered in this lecture series.

We start with image formation, where we look at how the 3D world is projected by a lens to form a 2D image. We would like to understand the geometric and photometric relation between a point in 3D and its 2D projection in the image.

Image Formation and Optics: Where do Images Come From? Projection of the 3D world onto a 2D plane.
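As a small preview of the geometric side, here is a minimal sketch of the ideal perspective (pinhole) projection in Python; the focal length and points are made-up values:

```python
import numpy as np

def project(points_3d: np.ndarray, f: float) -> np.ndarray:
    """
    Ideal perspective projection: a scene point (X, Y, Z) in camera
    coordinates maps to image coordinates (x, y) = (f*X/Z, f*Y/Z),
    where f is the effective focal length. points_3d: (N, 3), Z > 0.
    """
    X, Y, Z = points_3d.T
    return np.stack([f * X / Z, f * Y / Z], axis=1)

# A point twice as far away projects to half the image coordinates.
print(project(np.array([[1.0, 2.0, 10.0],
                        [1.0, 2.0, 20.0]]), f=0.05))
```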

Next, we look at image sensors that are used to convert the optical image formed by a lens into a digital image. Image sensor technology has evolved rapidly, enabling us to capture digital images that today exceed what film could do in the past. If there is one reason for the imaging revolution we have witnessed in the past decade or so, it is the remarkable improvements made in image sensor technology.

Image Sensors: Convert Optical Images to Electrical Signals (I.26)


The simplest type of image is the binary image, which is a two-valued image (right) obtained by simply thresholding a captured image (left). You end up with white (or 1) for the object and black (or 0) for the background. Such images are often used in factory automation. They are easy to store and process. We look at how they can be used to solve simple vision problems.

Binary Images: Two-Valued Images, Easy to Store and Process. Grayscale image (left); binary image (right).
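Thresholding itself is a one-line operation. A minimal sketch with NumPy (the threshold of 128 is an arbitrary choice; in practice it would be tuned to the scene or computed automatically):

```python
import numpy as np

def binarize(gray: np.ndarray, threshold: int = 128) -> np.ndarray:
    """Map a grayscale image to a binary one: 1 for object, 0 for background."""
    return (gray >= threshold).astype(np.uint8)
```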

Next, we will devote two lectures to image processing, which seeks to transform a captured image into a new one that is cleaner in terms of the information we want to extract. In this example, you can see a noisy captured image on the left that is processed to create the one on the right. In this processed image, the noise is more or less removed while the edges are preserved.

Image Processing: Transform an Image into a New One that is More Useful. Input image (left); edge-preserved smoothing (right). (I.37)
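The lectures develop such filters from first principles; as a quick taste, here is one standard edge-preserving smoother, the bilateral filter, sketched with OpenCV (the file names and parameter values are illustrative):

```python
import cv2

# Read a noisy grayscale image (file name is hypothetical).
noisy = cv2.imread("noisy_input.png", cv2.IMREAD_GRAYSCALE)

# Bilateral filtering smooths noise while preserving edges: each pixel is
# averaged only with neighbors that are both spatially close and similar
# in brightness, so averaging does not blur across strong edges.
smoothed = cv2.bilateralFilter(noisy, d=9, sigmaColor=75, sigmaSpace=75)
cv2.imwrite("smoothed.png", smoothed)
```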

With image processing tools under our belt, we are in a position to extract useful features from images. From the perspective of information theory, edges in an image are of great importance. We start by developing a theory of edge detection. Based on this theory, we develop a few different edge and corner detectors.

Edge and Corner Detection: Detecting Intensity Changes in the Image. Input image (left); edges (right). (I.27)
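Since edges are intensity changes, the core computation is an image gradient. A minimal sketch using Sobel derivatives in OpenCV (the file name and threshold are made-up values):

```python
import cv2
import numpy as np

img = cv2.imread("input.png", cv2.IMREAD_GRAYSCALE).astype(np.float64)

# Horizontal and vertical intensity derivatives via Sobel operators.
gx = cv2.Sobel(img, cv2.CV_64F, 1, 0, ksize=3)
gy = cv2.Sobel(img, cv2.CV_64F, 0, 1, ksize=3)

# Edge strength and orientation at each pixel.
magnitude = np.sqrt(gx**2 + gy**2)
orientation = np.arctan2(gy, gx)

# Mark pixels whose edge strength exceeds a (hypothetical) threshold.
edges = (magnitude > 100).astype(np.uint8) * 255
```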


When you apply an edge detector to a typical image, you get a lot of edges, but they are not related to each other. In order to extract objects from an image, we need to go from edges to boundaries. When we look at the edge image in the center, we can quickly group edges that belong to the same boundary or contour. It turns out that this grouping process is not as easy as it seems. We develop a variety of algorithms that can group edges to extract boundaries.

Boundaries from Edges: Finding Continuous Lines from Edge Segments. Input image; edges; boundaries. (I.28)

One feature detector we describe in detail is the Scale Invariant Feature Transform (SIFT), which can detect interesting blobs in an image. SIFT features can be used to robustly detect and recognize planar objects in an image, even when they are scaled, rotated, and occluded by other objects.

2D Recognition using Features: Matching using “Interesting Points”. Object in database; input image and detected object.
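As an illustration of the idea, here is a sketch of SIFT matching using OpenCV’s implementation, with Lowe’s ratio test to discard ambiguous matches (the file names are placeholders):

```python
import cv2

obj = cv2.imread("object.png", cv2.IMREAD_GRAYSCALE)
scene = cv2.imread("scene.png", cv2.IMREAD_GRAYSCALE)

# Detect keypoints and compute their SIFT descriptors in both images.
sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(obj, None)
kp2, des2 = sift.detectAndCompute(scene, None)

# For each object descriptor, find its two nearest scene descriptors and
# keep the match only if the best is clearly better than the second best.
matcher = cv2.BFMatcher()
matches = matcher.knnMatch(des1, des2, k=2)
good = [m for m, n in matches if m.distance < 0.75 * n.distance]
print(f"{len(good)} reliable feature matches")
```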

As an example application of feature detection, we develop an algorithm that can take a set of overlapping images of a scene taken from roughly the same viewpoint (top row) as input and produce a single seamless panorama (bottom). This method, called image stitching, is available on most smartphones.

Image Alignment and Stitching: Combine Multiple Photos to Create a Larger Photo. Source images (top); stitched image (bottom).
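Feature matching, alignment, and blending are wrapped up in OpenCV’s high-level stitching interface, so the whole pipeline can be sketched in a few lines (the file names are hypothetical):

```python
import cv2

# Overlapping photos taken from roughly the same viewpoint.
images = [cv2.imread(f) for f in ("left.jpg", "middle.jpg", "right.jpg")]

stitcher = cv2.Stitcher_create()
status, panorama = stitcher.stitch(images)
if status == cv2.Stitcher_OK:
    cv2.imwrite("panorama.jpg", panorama)
```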


Next, we discuss the widely used technique of face detection. We develop an algorithm that can efficiently and robustly find faces in an image. One of the key challenges here is to be able to reliably discriminate between faces and non-faces and handle faces with different skin tones, expressions, illuminations, and poses.

Face Detection (I.34)
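To get a feel for the task before the lecture derives its own detector, here is a sketch that runs OpenCV’s pretrained Haar-cascade face detector (a Viola-Jones style classifier) on a photo; the file name and parameters are illustrative:

```python
import cv2

img = cv2.imread("photo.jpg")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Load OpenCV's pretrained frontal-face Haar cascade.
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

# Scan the image at multiple scales for face-like patterns.
faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
for (x, y, w, h) in faces:
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)
cv2.imwrite("faces.jpg", img)
```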

Everything we have discussed thus far focuses on extracting information in image coordinates, that is, in two dimensions (2D). Next, we lay the groundwork for developing algorithms that recover the three-dimensional (3D) structure of a scene from one or more images.

The first topic in this context is radiometry and reflectance. We begin by defining the fundamental concepts of radiometry, which has to do with measuring light. We will establish a relation between the brightness of a point in the scene and its brightness in the image. Then, we explore why different materials appear the way they do. We discuss a small number of popular reflectance models that can each describe a wide range of materials found in the real world.

Radiometry and Reflectance: Why do these Spheres Look Different? (I.30)
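As one concrete example of such a model, here is a sketch of Lambertian (perfectly diffuse) reflectance, under which brightness is proportional to the albedo times the cosine of the angle between the surface normal and the light direction; the numbers are illustrative:

```python
import numpy as np

def lambertian_brightness(albedo: float,
                          normal: np.ndarray,
                          source_dir: np.ndarray) -> float:
    """
    Lambertian model: brightness = albedo * max(0, n . s), where n is the
    unit surface normal and s is the unit direction to the light source.
    The max(0, .) clamp handles points that face away from the light.
    """
    n = normal / np.linalg.norm(normal)
    s = source_dir / np.linalg.norm(source_dir)
    return albedo * max(0.0, float(n @ s))

# A surface facing up, lit from 45 degrees: brightness = 0.8 * cos(45 deg).
print(lambertian_brightness(0.8, np.array([0.0, 0.0, 1.0]),
                            np.array([0.0, 1.0, 1.0])))   # about 0.566
```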


It turns out that if we take a few images of an object under different, known lighting conditions, we can compute the surface normal at each point on the object. This method is called photometric stereo. In the case of a continuous surface, the measured surface normals can be used to reconstruct the shape of the object.

Photometric Stereo: 3D Shape from Images under Different Lighting. Computed shape.
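Under the Lambertian model sketched above, each pixel gives one linear equation per light source, so three or more known sources determine the scaled normal by least squares. A minimal sketch (assuming a Lambertian surface, distant point sources, and no shadows):

```python
import numpy as np

def photometric_stereo(intensities: np.ndarray, sources: np.ndarray):
    """
    intensities: (K, H, W) stack of images under K known light directions.
    sources:     (K, 3) unit light-direction vectors.
    Returns per-pixel albedo (H, W) and unit normals (H, W, 3).
    """
    K, H, W = intensities.shape
    I = intensities.reshape(K, -1)                      # (K, H*W)
    # Solve sources @ g = I for g = albedo * normal at all pixels at once.
    g, *_ = np.linalg.lstsq(sources, I, rcond=None)     # (3, H*W)
    albedo = np.linalg.norm(g, axis=0)
    normals = (g / np.maximum(albedo, 1e-12)).T.reshape(H, W, 3)
    return albedo.reshape(H, W), normals
```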

Next, we look at shape from shading, a more challenging problem, where we seek to extract the 3D shape of a surface from a single shaded image. We use experiments conducted with humans to show that shape from shading is an under-constrained problem. By using a few assumptions, we show that shape can indeed be recovered from a single image using shading.

Shape from Shading: 3D Shape from a Single Image. Input image; computed shape.

As you know, when you vary the focus setting of your camera, you see objects in the scene come into and go out of focus. In this sequence of images, exactly when a point comes into focus is a function of its depth from the camera. We develop algorithms for recovering the 3D structure of a scene using both focus and defocus.

Depth from Focus/Defocus: Near-focus image; far-focus image; estimated depth map.
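The depth-from-focus idea can be sketched directly: measure per-pixel sharpness in each frame of a focal stack and take, at every pixel, the frame where sharpness peaks. A minimal version using the Laplacian as the focus measure (a robust implementation would differ in the details):

```python
import cv2
import numpy as np

def depth_from_focus(stack: np.ndarray) -> np.ndarray:
    """
    stack: (N, H, W) focal stack, frames ordered by focus setting.
    Returns, for every pixel, the index of the frame in which that pixel
    is sharpest; this index is a coarse proxy for depth.
    """
    energy = np.stack([
        np.abs(cv2.Laplacian(frame.astype(np.float64), cv2.CV_64F))
        for frame in stack
    ])
    return np.argmax(energy, axis=0)
```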


Our next topic is active illumination. In many real-world applications of vision, such as factory automation, we have the ability to control the illumination of the objects being imaged. When this is possible, we can develop very efficient and accurate methods for recovering the shapes of objects. In this example, we see the statue of Michelangelo’s David, which has been scanned using active illumination to produce a remarkably accurate 3D model.

Active Illumination Methods: Using Patterned Lighting to Recover Shape (I.33)

If we wish to precisely measure the height of a bottle in millimeters by taking images of it, we need to first know how the position of a point in an image, which is measured in pixels, relates to the position of the corresponding point in the 3D world. Relating 2D image coordinates to 3D world coordinates requires us to know the various parameters related to image formation. The process of estimating these parameters is called camera calibration. We show how a single image of a 3D object of known dimensions, such as the cube here, is enough to compute all the parameters of the camera.

Camera Calibration: Estimating Camera Parameters. Image of an object with known geometry.
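A standard first step for this kind of calibration is to estimate the 3x4 projection matrix from known 3D-to-2D correspondences via the direct linear transform (DLT); the intrinsic and extrinsic parameters can then be factored out of P. A sketch of that step (offered as an illustration, not necessarily the exact formulation the lecture uses):

```python
import numpy as np

def estimate_projection_matrix(X3d: np.ndarray, x2d: np.ndarray) -> np.ndarray:
    """
    Direct Linear Transform: estimate the 3x4 projection matrix P from
    n >= 6 correspondences between 3D points X3d (n, 3) and pixel
    coordinates x2d (n, 2), so that x ~ P X in homogeneous coordinates.
    """
    rows = []
    for (X, Y, Z), (u, v) in zip(X3d, x2d):
        rows.append([X, Y, Z, 1, 0, 0, 0, 0, -u*X, -u*Y, -u*Z, -u])
        rows.append([0, 0, 0, 0, X, Y, Z, 1, -v*X, -v*Y, -v*Z, -v])
    A = np.asarray(rows, dtype=np.float64)
    # The solution is the right singular vector of A with the smallest
    # singular value (the null-space direction, up to scale).
    _, _, Vt = np.linalg.svd(A)
    return Vt[-1].reshape(3, 4)
```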


Next, we present binocular stereo. Try this simple experiment. Make the index fingers of both your hands point out while folding your other fingers in. Now, hold your left hand out in front of you and your right hand behind you. Then, shut one of your two eyes and bring your right hand from behind quickly to make both of your index fingers touch. Not easy, right? This is because with just one eye, you lose quite a bit of your ability to perceive depth. Now, repeat the same with both of your eyes open. You should find it much easier to make the two fingers meet.

We use two eyes to help us perceive the 3D structure of the world in front of us. The two eyes capture two slightly different views of the world. The small differences in these two views are used to estimate the depth of each point in the scene.

We will describe algorithms for achieving the same using two cameras. Here you see a right camera image and a left camera image and the depth of the scene computed from them. In the depth image, the closer a point is, the brighter it is.

Binocular Stereo: Computing Depth using Two Views. Right view; left view; estimated depth map. (I.40)
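For a rectified image pair, a classical block-matching stereo algorithm is available off the shelf in OpenCV; a minimal sketch (the file names and matcher parameters are illustrative):

```python
import cv2

left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)
right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)

# Block matching on a rectified pair: for each pixel in the left image,
# search along the same row of the right image for the best-matching block.
matcher = cv2.StereoBM_create(numDisparities=64, blockSize=15)
disparity = matcher.compute(left, right)

# Disparity is inversely proportional to depth: Z = f * B / d, where f is
# the focal length in pixels and B is the baseline between the cameras.
```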

Thus far, we have assumed the world to be static while we capture our images. We know that, in reality, everything is in motion all the time. We look at how the motion of a point in 3D relates to its motion in the image. The visible motion of a point in an image is called optical flow. Based on first principles, we develop an algorithm for estimating optical flow (bottom) from a sequence of images captured in quick succession (top).

Motion and Optical Flow: Determining the Movement of Scene Points. Frames 1-3 (top); estimated motion (bottom). (I.31)
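Dense optical flow estimators are also available off the shelf. Here is a sketch using OpenCV’s Farneback method, one of several classical algorithms (the file names and parameter values are illustrative):

```python
import cv2

frame1 = cv2.imread("frame1.png", cv2.IMREAD_GRAYSCALE)
frame2 = cv2.imread("frame2.png", cv2.IMREAD_GRAYSCALE)

# Dense optical flow: an (H, W, 2) field of per-pixel (dx, dy) motion
# vectors mapping frame1 to frame2.
flow = cv2.calcOpticalFlowFarneback(
    frame1, frame2, None,
    pyr_scale=0.5, levels=3, winsize=15,
    iterations=3, poly_n=5, poly_sigma=1.2, flags=0)
```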


Imagine capturing a video of a structure such as the sculpture shown here by simply moving a camera around it. Note that this is a casually captured video, and we do not know the motion of the camera in 3D. It turns out that, even without knowing the motion of the camera, we can compute the 3D structure of the scene (right). Interestingly, in addition to the structure of the scene, we can also determine the motion of the camera. This method is called structure from motion.

Structure From Motion: Casual video; reconstructed 3D structure. (I.39)

In the remaining lectures, we will cover topics related to higher levels of visual perception.

We start with the problem of tracking objects as they move through 3D space. In this example, we see that the system is able to produce a separate track for each person walking through a public space. Note that such an algorithm must be able to handle objects that briefly go out of view when they are obstructed by other objects.

Object Tracking: Determining the Movement of Objects in Videos

Next, we look at the important problem of image segmentation. We develop algorithms that can take an image (left) as input and segment it into clearly defined regions (right), where each region more or less corresponds to a single physical object. Segmentation is an ill-defined task since what exactly constitutes an object often depends on the context. We develop a few different approaches to image segmentation.

Image Segmentation: Group Pixels with Similar Visual Characteristics. Input image; segmented image. (I.29)
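One simple approach among the several developed in the lectures is to cluster pixels by color, for example with k-means; a sketch with OpenCV (the file name and the number of regions k are made-up choices):

```python
import cv2
import numpy as np

img = cv2.imread("input.jpg")
pixels = img.reshape(-1, 3).astype(np.float32)

# Cluster pixel colors with k-means, then paint each pixel with the
# mean color of its cluster to obtain a crude segmentation.
k = 4  # hypothetical number of regions
criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 20, 1.0)
_, labels, centers = cv2.kmeans(pixels, k, None, criteria, 5,
                                cv2.KMEANS_RANDOM_CENTERS)
segmented = centers[labels.flatten()].reshape(img.shape).astype(np.uint8)
```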


The last problem we look at is recognition. The first approach to recognition we discuss is called appearance matching. We capture many images of an object under different poses and illumination conditions. Dimension reduction is used to compute a compact appearance model of the object from its images. When the object appears in a new image, it is segmented and recognized using its appearance model. The model also reveals the pose and lighting of the object.

Appearance Matching: Object Recognition using Principal Component Analysis. Learning object appearance; appearance manifold (model); recognition by matching appearance.
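The dimension reduction step is classically done with principal component analysis (PCA). A minimal sketch of learning a linear appearance subspace from training images and projecting a new image into it (the number of components is a made-up choice):

```python
import numpy as np

def learn_appearance_model(images: np.ndarray, n_components: int = 8):
    """
    images: (N, H, W) training images of the object under varying pose
    and illumination. Returns the mean image and the top principal
    components (eigenimages) spanning the appearance subspace.
    """
    N = images.shape[0]
    X = images.reshape(N, -1).astype(np.float64)
    mean = X.mean(axis=0)
    # SVD of the centered data gives the principal directions in Vt.
    U, S, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components]           # each row is an eigenimage

def project(image: np.ndarray, mean: np.ndarray, components: np.ndarray):
    """Low-dimensional appearance coordinates of a single (H, W) image."""
    return components @ (image.ravel().astype(np.float64) - mean)
```

Recognition then amounts to comparing the projected coordinates of a new image against the stored appearance manifold of each object.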

Finally, we discuss the use of artificial neural networks for solving complex recognition tasks. We first describe the concept of a neuron and how a network is constructed using a large number of neurons. We then show how a network can be efficiently trained using the back-propagation algorithm. We conclude with a few examples of the use of networks for solving challenging visual recognition problems.

Artificial Neural Networks: Using a Network of Neurons to Solve Complex Problems. Example network for digit recognition: an input layer of 784 neurons (one per pixel of a 28 x 28 image), a hidden layer of 30 neurons, and an output layer of 10 neurons, one per digit.
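To make the structure concrete, here is a minimal sketch of the forward pass of the 784-30-10 network shown on the slide, with sigmoid activations; the weights here are random, since training them by back-propagation is the subject of the lecture:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)

# The 784-30-10 network from the slide: 28x28 input image, 30 hidden
# neurons, 10 output classes. Random weights stand in for trained ones.
W1, b1 = rng.normal(size=(30, 784)), np.zeros(30)
W2, b2 = rng.normal(size=(10, 30)), np.zeros(10)

def forward(image_28x28: np.ndarray) -> np.ndarray:
    x = image_28x28.ravel()            # 784 input activations
    h = sigmoid(W1 @ x + b1)           # hidden layer
    return sigmoid(W2 @ h + b2)        # one activation per class

scores = forward(rng.random((28, 28)))
print(scores.argmax())                 # index of the most active output
```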


About the Lecture Series

The lecture series is organized as 5 modules in addition to this introduction. Module 1 focuses on imaging and will cover image formation, image sensing, and image processing. Module 2 is about features and boundaries and will include edge detection, boundary detection, SIFT feature detection, and applications like stitching panoramas and detecting faces. Next, in Module 3, we explore ways of reconstructing a 3D scene using one or more images taken from a single viewpoint. This module includes photometric stereo, shape from shading, depth from defocus, and active illumination methods. In Module 4, we explore reconstruction methods that use images taken from multiple viewpoints. These methods include binocular stereo, optical flow, and structure from motion. Finally, in Module 5, we discuss perception, which includes higher levels of visual processing such as tracking, segmentation, and object recognition.

To follow these lectures, you do not need any prior knowledge of computer vision. All you need to know are the fundamentals of linear algebra and calculus. If you happen to know a programming language, you will be able to imagine how the methods we discuss can be implemented in software.

Modules and Prerequisites

Modules:
0. Introduction
1. Imaging: Image Formation, Sensing, Processing
2. Features: Edges, Boundaries, SIFT, Applications
3. Reconstruction 1: Shading, Focus, Active Illumination
4. Reconstruction 2: Stereo, Optical Flow, SFM
5. Perception: Segmentation, Tracking, Recognition

Prerequisites:
• Fundamentals of Linear Algebra
• Fundamentals of Calculus
• One Programming Language


Now, a few words about the slides I use in the lectures. I have been teaching a course on computer vision at Columbia for many years. In the initial years, before PowerPoint and Keynote became popular, I used hand-made overhead projector slides like the one shown here. In recent years, several of my students and postdocs have helped me create the slides I am now using. Also, new content has been added to the lectures as the field has evolved.

Many of my students and postdocs have


contributed in small and big ways to these slides. In particular, I would like to thank Jinwei Gu, Neeraj
Kumar, Changyin Zhou, Oliver Cossairt, Guru Krishnan, Mohit Gupta, Daniel Miau, Avinash Nair, Parita
Pooj, Henry Xu, Robert Colgan and Anne Fleming for their efforts. Most of all, I would like to thank Guru
Krishnan who did the bulk of the work. Without Guru’s efforts, I am not sure I would have been able to
create this lecture series.

The slides come in a few different flavors. The one on the left is labeled at the bottom as a math primer.
I have several of these in the lectures. Each math primer highlights a mathematical concept that is not
only needed for vision, but one that I believe is useful to any engineering or computer science student.

Every now and then, you will see a review slide like the one on the right that is used to recap a concept
that was covered in an earlier lecture.

Example: Math Primer Slide. Euler's formula states that $e^{i\theta} = \cos\theta + i\sin\theta$, where $i = \sqrt{-1}$. It follows from expanding $e^{i\theta}$ using the Taylor series:

$$e^{i\theta} = 1 + i\theta + \frac{(i\theta)^2}{2!} + \frac{(i\theta)^3}{3!} + \frac{(i\theta)^4}{4!} + \cdots = \left(1 - \frac{\theta^2}{2!} + \frac{\theta^4}{4!} - \frac{\theta^6}{6!} + \cdots\right) + i\left(\theta - \frac{\theta^3}{3!} + \frac{\theta^5}{5!} - \frac{\theta^7}{7!} + \cdots\right) = \cos\theta + i\sin\theta.$$

Example: Review Slide. The perspective projection equations of a two-camera stereo system, relating a scene point $(x, y, z)$ to its left and right image coordinates, all measured in pixel units.


There are a few other categories of slides you will see along the way. These include appendices, eye and brain, history, and art. While computer vision has historically been considered to be a subfield of computer science, it has strong connections to many other fields. I have always found these connections interesting and so have highlighted them when appropriate.

A Few More Special Slides: APPENDIX, EYE AND BRAIN, HISTORY, ART

References and Credits

The material covered in these lectures comes from varied sources. I have used published papers and textbooks.

Here are a few of the texts I have used. The one that overlaps most substantially with these lectures is “Computer Vision: Algorithms and Applications” by Rick Szeliski. “Computer Vision: A Modern Approach” by Forsyth and Ponce is also a book I would encourage you to refer to. When it comes to first principles of many vision topics, it is hard to beat “Robot Vision” by Berthold Horn. Few authors are quite as precise as Horn. For a concise and well-written overview of vision, I recommend Vic Nalwa’s “A Guided Tour of Computer Vision”.

Recommended Texts:
Computer Vision: Algorithms and Applications (Vision). Szeliski, R., Springer.
Computer Vision: A Modern Approach (Vision). Forsyth, D. and Ponce, J., Prentice Hall.
Robot Vision (Vision). Horn, B. K. P., MIT Press.
A Guided Tour of Computer Vision (Vision). Nalwa, V., Addison-Wesley.
Digital Image Processing (Image Processing). González, R. and Woods, R., Prentice Hall.
Optics (Optics). Hecht, E., Addison-Wesley.
Eye and Brain (Human Vision). Gregory, R., Princeton University Press.
Animal Eyes (Biological Vision). Land, M. and Nilsson, D., Oxford University Press.


We have two lectures on image processing. There are many excellent books on the topic and if you
needed to pick one, I would suggest “Digital Image Processing” by Gonzalez and Woods. For all things
related to optics, I strongly recommend the classic by Hecht. There is a small but wonderful book called
“Eye and Brain” by Richard Gregory, which has a lot of great nuggets related to human vision. Finally, if
you are interested in biological eyes of all kinds, I strongly recommend “Animal Eyes” by Land and
Nilsson. It is a thin book but densely packed with information.

Finally, I believe any lecture related to vision must be visual. Else, it is a lost opportunity. I have used many visuals from varied sources in my lectures, and at the end of each lecture you can find the credits for the photos and videos I have used.

Image Credits

I.1 https://www.automation.com/images/article/omron/MVWP3.jpg
I.2 Adapted from ION Sound Experience. http://www.ionaudio.com/downloads/booksaver_2011_overview.pdf
I.3 Steve McCurry. Used with permission.
I.4 Anton Milan. Used with permission.
I.5 http://www.designboom.com/design/acure-digital-vending-machine/
I.6 http://howthingswork.org/electronics-how-an-optical-mouse-works/
I.7 Oliver Berg dpa.
I.8 Doug Roble. Used with permission.
I.9 Ales Leonardis. Used with permission.
I.10 http://en.wikipedia.org/wiki/File:NASA_Mars_Rover.jpg. NASA. Public Domain.
I.11 Waymo Self-driving car. Licensed under CC BY-SA 4.0.
I.12 http://en.wikipedia.org/wiki/File:NASA_Mars_Rover.jpg. NASA. Public Domain.
I.13 http://www.sciencephoto.com. Used with permission.
I.15 Terese Winslow. Used with permission.
I.16 https://en.wikipedia.org/wiki/File:Fraser_spiral.svg. Public Domain.
I.17 Edward H. Adelson. Used with permission.

I.18 A. Kitaoka. Used with permission.
I.19 http://commons.wikimedia.org/wiki/File:Ames_room_forced_perspective.jpg. Licensed under CC 2.0. Public Domain.
I.21 My Wife and My Mother-In-Law. William Ely Hill, 1888. Public Domain.
I.22 https://upload.wikimedia.org/wikipedia/commons/b/b5/Rubin2.jpg. Public Domain.
I.23 Associated Press World Wide Photos, 1972.
I.24 https://en.wikipedia.org/wiki/File:Kanizsa_triangle.svg. Licensed under CC BY-SA 3.0.
I.25 Greece National Football Team, 2017. Steindy. Licensed under CC BY-SA 3.0.
I.26 http://hamamatsu.magnet.fsu.edu/articles/microlensarray.html
I.27 Lena. Dwight Hooker, 1973.
I.28 John Wright. Used with permission.
I.29 https://www.mathworks.com/help/matlab/ref/rgb2gray.html
I.30 Fredo Durand. Used with permission.
I.31 Edward Adelson. Used with permission.
I.32 Anton Milan. Used with permission.
I.33 Marc Levoy. Used with permission.
I.34 Purchased from iStock by Getty Images.
I.35 Purchased from iStock by Getty Images.
I.36 Purchased from iStock by Getty Images.
I.37 Purchased from Shutterstock.com.
I.38 Jian Wang. Used with permission.
I.39 Marc Pollefeys. Used with permission.
I.40 https://vision.middlebury.edu/stereo/data/scenes2001/. Tsukuba Stereo Dataset.
I.41 https://builtin.com/robotics/automotive-cars-manufacturing-assembly

Acknowledgement: I thank Nisha Aggarwal and Jenna Everard for proofreading this monograph.


References

[Szeliski 2022] Computer Vision: Algorithms and Applications, Szeliski, R., Springer, 2022.

[Forsyth and Ponce 2003] Computer Vision: A Modern Approach, Forsyth, D. and Ponce, J., Prentice Hall, 2003.

[Horn 1986] Robot Vision, Horn, B. K. P., MIT Press, 1986.

[Nalwa 1993] A Guided Tour of Computer Vision, Nalwa, V., Addison-Wesley, 1993.

[González and Woods 2009] Digital Image Processing, González, R. and Woods, R., Prentice Hall, 2009.

[Hecht 2012] Optics, Hecht, E., Pearson Education India, 2012.

[Gregory 1966] Eye and Brain, Gregory, R., Princeton University Press, 1966.

[Land and Nilsson 2012] Animal Eyes, Land, M. and Nilsson, D., Oxford University Press, 2012.
