Introduction FPCV-0-1
Shree K. Nayar
Monograph: FPCV-0-1
Module: Introduction
Series: First Principles of Computer Vision
Computer Science, Columbia University
First Principles of Computer Vision Introduction
Since deep learning is popular today, you may be wondering if it is worth knowing the first principles of
vision, or, for that matter, the first principles of any field. Given a task, why not just train a neural network
with tons of data to solve the task? Indeed, there are applications where such an approach may suffice,
but there are several reasons to embrace the basics.
First, it would be laborious and unnecessary to train a network to learn a phenomenon that can be
concisely and precisely described using first principles. Second, when a network does not perform well
enough, first principles are your only hope for understanding why. Third, a network that is intended to
learn a complex mapping would typically require an enormous amount of training data to be collected.
This can be tedious and sometimes even impractical. In such cases, models based on first principles can
be used to synthesize the training data instead of collecting it. Finally, the most compelling reason to
learn the first principles of any field is curiosity. What makes humans unique is our innate desire to know
why things work the way they do.
I have partitioned this lecture series into 5 modules, each spanning an important aspect of computer
vision. Module 1 is about imaging. Module 2 is about detecting features and boundaries. Module 3 is on
3D reconstruction from a single viewpoint. Module 4 is on 3D reconstruction using multiple viewpoints.
Module 5 covers perception.
To follow any of these modules, you do not need any prior knowledge of computer vision. All you need
to know are the fundamentals of linear algebra and calculus. If you happen to know a programming
language, it would enable you to picture how the methods I describe can be implemented in software.
In short, any science or engineering sophomore should be able to handle the material with ease.
FPCV-0-1 1
While we approach vision as an engineering discipline in this series, when appropriate, we make
connections with other fields such as neuroscience, psychology, art history, and biology. I hope you enjoy
the lectures and that, by the end of the series, you will be convinced that computer vision is not only
powerful but also fascinating.
Shree K. Nayar
Columbia University
[Figure I.1]
Vision is our most powerful sense. It allows us to interact with the physical world without making any
direct physical contact. It is believed that about 60% of the brain is, in one way or another, involved in
visual processing. Ponder that for a moment. Thanks to our vision system, we are able to effortlessly
navigate through the complex world we live in and perform a variety of daily chores. In fact, our visual
system is so powerful that most of the time we are unaware of how much it is doing for us.
Computer vision is the enterprise of building machines that can see. You may be wondering, given that
the human visual system is so powerful, why even bother to build machines that can emulate it? Well,
there are several reasons. First, there are many chores we perform on a daily basis that we would rather
have done by a machine so we can free up time to devote to more rewarding activities. Examples of such
chores might be tidying your home and driving to work. Second, while our vision system is truly powerful,
it tends to be more qualitative than quantitative. It is not particularly good at making precise
measurements of things in the physical world. Lastly, and perhaps most importantly, a computer vision
system can be designed to surpass the capability of human vision and extract information about the
world that we simply cannot.
My PhD advisor, Takeo Kanade, used to say that, irrespective of how you define it, vision is fun! Perhaps
most importantly, vision is really useful.
… terms of the information they record. For instance, a pixel could also measure the distance (or depth)
of the corresponding scene point. In the future, it could also reveal the material the scene point is made
of – whether it is made of plastic, metal, wood, etc.
[Slide: Brightness, Color, Distance, Material, …]
In short, with time, images will get richer in terms of the information they measure which, in turn, will
lead to more powerful vision systems. One day, not too long from now, we can expect to have computer
vision systems that can see things in a scene that even our powerful human visual system cannot.
interesting.
[Figure I.1]
[Figure: Optical Character Recognition (OCR): Reading License Plates]
[Figure I.2]
[Figure I.5]
[Figure I.4]
[Figure I.7]
[Figure I.9]
[Figure I.10]
[Figure I.12]
[Figure: Remote Sensing]
… back that performs most of the visual processing.
[Slide: Vision is easy for us]
So, what do we do? We reinvent. This might sound unfortunate, but it is not. As you can imagine,
there are many applications of vision that require functionality and precision that go well beyond what
the human visual system is capable of. While human vision is remarkable in its versatility and is able to
cope with many complex real-world situations, it is more of a qualitative system than a quantitative one.
Furthermore, human vision is more fallible than we may like to believe. When you and I perceive
something incorrectly, we do not have a voice in our head telling us we are wrong. We see what we see
and believe it to be accurate.
On the left is Fraser’s spiral. You should be seeing a spiral emerging from the center of the image. Well,
it turns out there is no spiral in the image. You can verify this by following any one of the curves – you
will see that you end up where you started. That is, Fraser’s spiral is actually a set of perfectly concentric
circles as shown on the right.
[Figure I.16]
Here, on the left, is the checker shadow illusion from Ted Adelson. We would all agree that patch A is
made of darker material than patch B. It turns out that we perceive this to be the case because our visual
system is able to determine that the illumination is varying over the scene. We first estimate this spatially
varying illumination and then use it to compensate for the brightness at each point in the scene. The end
result is that we perceive patch A to be of lower reflectance (darker material) than patch B.
Now, if you look at patches A and B in isolation, that is, if you cut them out of the rest of the image as
done on the right, you see that they have exactly the same brightness.
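The compensation step described above can be illustrated with a deliberately simplified model in which the measured brightness of a patch is the product of its reflectance and the illumination falling on it (an assumption; real shading is more complex). Dividing out the estimated illumination yields different reflectances for two patches of identical brightness. The numbers below are hypothetical; the only detail carried over from the figure is that patch B lies in a shadow and so receives less light than patch A.

```python
def estimate_reflectance(brightness, illumination):
    """Discount the illumination, assuming brightness = reflectance * illumination."""
    return brightness / illumination

# Hypothetical measurements: the two patches have exactly the same brightness.
brightness_a = 0.4
brightness_b = 0.4

# Hypothetical illumination estimates: patch B lies in shadow, so it receives less light.
illumination_a = 1.0
illumination_b = 0.5

reflectance_a = estimate_reflectance(brightness_a, illumination_a)
reflectance_b = estimate_reflectance(brightness_b, illumination_b)

# The same brightness is attributed to darker material at A than at B.
print(f"A: {reflectance_a:.1f}, B: {reflectance_b:.1f}")  # prints "A: 0.4, B: 0.8"
```

Once the illumination has been discounted, identical brightness values no longer imply identical materials, which is the inference our visual system appears to make.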
[Figure I.17]
[Figure I.18]
[Figure I.21: Young-Girl/Old-Woman]
[Figure I.22: Face/Vase]
Here is my favorite ambiguity. On the left you see a large mound with a small crater in the center. Now,
what would you see if you turned this image upside down? You would expect to see the large mound
hanging upside down, right? Instead, as seen on the right, you see a large crater with a small mound in
the center. This is because we live in a world where light is expected to come from above – from the
sun, for instance. We therefore invoke this assumption – that the illumination is from above – while
interpreting the shape of an object from its shading.
[Figure I.23]
[Figure: Kanizsa Triangle]
[Figure: image formation, showing an object, a lens, and the image plane]
[Figure I.37]
[Figure I.27]
[Figure: Stitched Image]
[Figure I.34]
[Figure: Computed Shape]
[Figure: Far-Focus Image]
We use two eyes to help us perceive the 3D structure of the world in front of us. The two eyes capture
two slightly different views, and the small differences between these two views are used to estimate the
depth of each point in the scene.
We will describe algorithms for achieving the same using two cameras. Here you see a left camera image,
a right camera image, and the depth of the scene computed from them. In the depth image, the closer a
point is, the brighter it is.
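The link between those small differences and depth can be made concrete with a minimal sketch. It assumes a rectified pair of pinhole cameras with focal length f (in pixels) separated by a baseline b; under that assumption, a scene point whose image shifts by a disparity of d pixels between the left and right views lies at depth Z = f·b/d, so nearby points shift more than distant ones. The disparities are taken as given here (finding them is the hard part), and the function names, the numbers, and the linear depth-to-brightness mapping are illustrative choices, not the method used to produce the depth image shown in the lecture.

```python
import math

def depth_from_disparity(disparity, focal_px, baseline):
    """Depth Z = f * b / d per pixel; zero disparity (no shift) maps to infinity."""
    return [[focal_px * baseline / d if d > 0 else math.inf for d in row]
            for row in disparity]

def depth_to_brightness(depth):
    """Map depth to 8-bit values so that the closer a point is, the brighter it is."""
    finite = [z for row in depth for z in row if math.isfinite(z)]
    near, far = min(finite), max(finite)
    span = far - near

    def shade(z):
        if not math.isfinite(z):
            return 0  # infinitely far (or unmatched): darkest
        if span == 0:
            return 255  # every finite point is at the same depth
        return round(255 * (far - z) / span)  # linear: nearest -> 255, farthest -> 0

    return [[shade(z) for z in row] for row in depth]

# Hypothetical 2x2 disparity map (pixels), focal length 800 px, baseline 0.1 m.
disparity = [[40, 20], [10, 0]]
depth = depth_from_disparity(disparity, focal_px=800, baseline=0.1)
brightness = depth_to_brightness(depth)
print(depth)       # [[2.0, 4.0], [8.0, inf]]
print(brightness)  # [[255, 170], [0, 0]]
```

Because depth is inversely proportional to disparity, halving the disparity doubles the depth, which is why distant points are hard to range precisely with a short baseline.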
[Figure: Estimated Motion]
[Figure: Reconstructed 3D Structure]
… camera. This method is called structure from motion.
… take an image (left) as input and segment it into …
[Slide: Group pixels with similar visual characteristics.]
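… recognition problems.
[Figure: a neural network for digit recognition. Input layer: 784 neurons, one per pixel of the 784-pixel
input image. Hidden layer: 30 neurons. Output layer: 10 neurons, whose activations correspond to the
digits 0 to 9.]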
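The 784-30-10 network in the figure can be sketched as a forward pass in a few lines of Python. This is an illustrative sketch, not the lecture's implementation: the sigmoid activation, the random (untrained) weights, and the names `make_layer`, `forward`, and `classify` are all assumptions, so the digit it reports for the hypothetical input is arbitrary until the weights are learned from labeled examples.

```python
import math
import random

def sigmoid(x):
    """Squash a neuron's weighted input into an activation between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))

def make_layer(n_in, n_out, rng):
    """Random placeholder weights (one row per neuron) and zero biases; training would set these."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases

def forward(inputs, layer):
    """Each neuron computes sigmoid(weighted sum of its inputs + bias)."""
    weights, biases = layer
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

rng = random.Random(0)
hidden_layer = make_layer(784, 30, rng)  # input layer (784) -> hidden layer (30)
output_layer = make_layer(30, 10, rng)   # hidden layer (30) -> output layer (10)

def classify(pixels):
    """Return the digit whose output neuron is most active, plus all 10 activations."""
    activations = forward(forward(pixels, hidden_layer), output_layer)
    digit = max(range(10), key=lambda k: activations[k])
    return digit, activations

# Hypothetical input: 784 pixel intensities in [0, 1] standing in for a digit image.
image = [rng.random() for _ in range(784)]
digit, activations = classify(image)
```

The structure is the point here: every output neuron sees every hidden neuron, and every hidden neuron sees every pixel, so the network's behavior is determined entirely by the learned weights.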
Prerequisites:
• Fundamentals of Linear Algebra
• Fundamentals of Calculus
• One Programming Language
… years, several of my students and postdocs have …
[Slide: … Xu, Robert Colgan, Anne Fleming]
The slides come in a few different flavors. The one on the left is labeled at the bottom as a math primer.
I have several of these in the lectures. Each math primer highlights a mathematical concept that is not
only needed for vision but also, I believe, useful to any engineering or computer science student.
Every now and then, you will see a review slide like the one on the right that is used to recap a concept
that was covered in an earlier lecture.
[Slide: HISTORY, ART]
… authors are quite as precise as Horn. For a concise …
[Slide: Optics, Hecht, E., Addison-Wesley]
We have two lectures on image processing. There are many excellent books on the topic and, if you
need to pick one, I would suggest “Digital Image Processing” by Gonzalez and Woods. For all things
related to optics, I strongly recommend the classic by Hecht. There is a small but wonderful book called
“Eye and Brain” by Richard Gregory, which has a lot of great nuggets related to human vision. Finally, if
you are interested in biological eyes of all kinds, I strongly recommend “Animal Eyes” by Land and
Nilsson. It is a thin book but densely packed with information.
… and at the end of each lecture you can find the credits related to the photos and videos I have used.
http://www.ionaudio.com/downloads/booksaver_2011_overview.pdf
I.3 Steve McCurry. Used with permission.
I.4 Anton Milan. Used with permission.
I.5 http://www.designboom.com/design/acure-digital-vending-machine/
I.6 http://howthingswork.org/electronics-how-an-optical-mouse-works/
I.7 Oliver Berg dpa.
I.8 Doug Roble. Used with permission.
I.9 Ales Leonardis. Used with permission.
I.10 http://en.wikipedia.org/wiki/File:NASA_Mars_Rover.jpg. NASA. Public Domain.
I.11 Waymo Self-driving car. Licensed under CC BY-SA 4.0.
I.12 http://en.wikipedia.org/wiki/File:NASA_Mars_Rover.jpg. NASA. Public Domain.
I.13 http://www.sciencephoto.com. Used with permission.
I.15 Terese Winslow. Used with permission.
I.16 https://en.wikipedia.org/wiki/File:Fraser_spiral.svg. Public Domain.
I.17 Edward H. Adelson. Used with permission.
Acknowledgement: I thank Nisha Aggarwal and Jenna Everard for proofreading this monograph.
References
[Szeliski 2022] Computer Vision: Algorithms and Applications, Szeliski, R., Springer, 2022.
[Forsyth and Ponce 2003] Computer Vision: A Modern Approach, Forsyth, D. and Ponce, J., Prentice Hall, 2003.
[Nalwa 1993] A Guided Tour of Computer Vision, Nalwa, V., Addison-Wesley, 1993.
[González and Woods 2009] Digital Image Processing, González, R. and Woods, R., Prentice Hall, 2009.
[Gregory 1966] Eye and Brain, Gregory, R., Princeton University Press, 1966.
[Land and Nilsson 2012] Animal Eyes, Land, M. and Nilsson, D., Oxford University Press, 2012.