5.1 Image Processing I FPCV-1-4
Shree K. Nayar
Lecture: FPCV-1-4
Module: Imaging
Series: First Principles of Computer Vision
Computer Science, Columbia University
First Principles of Computer Vision Image Processing I
Topics:
(1) Pixel Processing
This is the first of two lectures devoted to the topic of image processing. In image processing, we are
given an image which we want to transform into one that is easier to analyze. Perhaps we have an image
of a scene at night time, and it happens to be grainy or noisy due to the lack of light. We want to be able
to remove the noise from the image. Or, in an image of a fast-moving object, the object gets smeared,
an effect called motion blur. We want to be able to remove this smearing and create a crisp image of the
object. In a different scenario, an object of interest may lie outside the depth of field while it is imaged,
causing it to be defocus blurred. We want to be able to remove the blur so that the object is in focus. All
of these image enhancements can be achieved using image processing.
We may also be interested in recovering information from the image that is most salient to the vision
problem we are trying to solve. This may involve the detection of features such as edges and corners. A
wide variety of features can be detected using image processing. Image processing tools lie under the
hood in any computer vision system.
We will start with pixel processing, the simplest type of image processing. This just involves looking at
the brightness or the color of each pixel in the image and transforming it using some predetermined
mapping. We are not really concerned about where the pixel lies in the image. Next, we will talk about
linear shift invariant systems. This is a very important class of systems in image processing. Many
operations that are applied to images are linear and shift invariant, and any system that is linear and
shift invariant can be implemented as a convolution. We will look at what convolution is and discuss its
properties. Then, we will develop a suite of simple linear image filters that can be applied using
convolutions. We will take a look at what kinds of modification we can make to an image using linear
filters.
We will argue that there are certain image modifications that cannot be done using convolution. That
takes us to the class of nonlinear image filters, which can be viewed as more algorithmic in nature.
Looking at the values of both a pixel and its neighborhood, we apply simple algorithms to come up with
the output value of that pixel. Finally, we will talk about the important problem of template matching.
Given a certain pattern, we want to find everywhere it appears in an image. This problem can be solved
using correlation, which is related to the concept of convolution.
Pixel Processing
Image as a Function
Let us start by defining an image as a function f(x,y), where f is the intensity at the spatial coordinates
(x,y). If we have a color image, there will be multiple channels – red, green, and blue – each of which will
be a function.
Taking a pixel, we can simply transform its brightness value based on the value itself. This is a
transformation T of the intensity f at each pixel to an intensity g:
$$g(x,y) = T(f(x,y))$$
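Inverting the image, 255 − f, gives the “negative” of the image shown at the bottom.
[Slide: Original (f); Invert (255 − f).]
A color image can be converted to a gray image by taking a linear combination of the three
color values at each pixel. Pixel processing is a very [...]
[Slide: Original (f); Gray (0.3 f_R + 0.6 f_G + 0.1 f_B).]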
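The two pixel operations on the slides, inversion and conversion to gray, can be sketched in a few lines. This is a minimal illustration rather than the lecture's own code: it assumes 8-bit images stored as NumPy arrays, uses the 0.3/0.6/0.1 weights from the slide, and the function names and the tiny test image are made up for the example.

```python
import numpy as np

def invert(f):
    """Negative of an 8-bit image: each pixel value f becomes 255 - f."""
    return 255 - f.astype(np.uint8)

def to_gray(rgb):
    """Linear combination 0.3*R + 0.6*G + 0.1*B at each pixel (weights from the slide)."""
    weights = np.array([0.3, 0.6, 0.1])
    gray = rgb.astype(np.float64) @ weights      # (H, W, 3) @ (3,) -> (H, W)
    return np.clip(np.rint(gray), 0, 255).astype(np.uint8)

# Tiny synthetic 1x2 color image (an assumption, not data from the lecture).
rgb = np.array([[[100, 100, 100], [10, 20, 30]]], dtype=np.uint8)
negative = invert(rgb)
gray = to_gray(rgb)
```

Neither function looks at where a pixel lies in the image; the output at a pixel depends only on the input value at that pixel.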
brightness of the defocused image. If we shift an object in the scene, its image is going to shift in the
focused image, and its defocused image is also going to shift by the same amount. The relationship
between f and g is therefore linear and shift invariant. This is an example of how a linear shift invariant
system might manifest in the case of an imaging system.
Whether you are working in computer vision or not, this concept is going to pop up sooner or later,
so it is worth paying close attention to. Shown here is the mathematical definition of convolution,
which is denoted by an asterisk. We have f(x) convolved with h(x) to get the result g(x):
$$g(x) = f(x) * h(x) = \int_{-\infty}^{\infty} f(\tau)\, h(x - \tau)\, d\tau$$
To gain some geometrical insight into the mathematical definition of convolution, we will first
express f and h as functions of 𝜏, as shown below.
We take h(𝜏) and flip it about the vertical axis to get h(-𝜏), as shown in the left slide. We shift h(-𝜏) by x
to get h(x − 𝜏), which is then overlaid on f(𝜏), as shown in the right slide.
We then find the product of f(𝜏) and h(x − 𝜏) and integrate it from minus infinity to infinity.
This gives us a single number, which is the result of the convolution at the point x.
We repeat this for all values of x from minus infinity to infinity by sliding the function h(-𝜏)
over f(𝜏) from left to right. For each shift value x we find the product of the two functions and
then the integral of the product. This gives us the entire function g(x), which is the result of
the convolution.
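The flip, shift, multiply and integrate procedure can be approximated numerically on a grid. The sketch below is an illustration under stated assumptions: the two example functions are unit boxes on [-0.5, 0.5] (not taken from the lecture), the integral is replaced by a Riemann sum, and `convolve_at` is a made-up helper name.

```python
import numpy as np

def box(t):
    """Unit-height box on [-0.5, 0.5] (assumed example function)."""
    return ((t >= -0.5) & (t <= 0.5)).astype(float)

def convolve_at(f, h, tau, x):
    """Approximate g(x) = integral of f(tau) * h(x - tau) d tau on a uniform grid."""
    d_tau = tau[1] - tau[0]
    h_flipped_shifted = h(x - tau)       # h flipped about the vertical axis, shifted by x
    return np.sum(f(tau) * h_flipped_shifted) * d_tau

tau = np.linspace(-3.0, 3.0, 6001)       # integration grid
xs = np.linspace(-2.0, 2.0, 81)          # shift values x
g = np.array([convolve_at(box, box, tau, x) for x in xs])
```

For these two boxes the result is a triangle that peaks at x = 0 and is zero wherever the boxes do not overlap, which matches the geometric picture of one function sliding over the other.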
It turns out that any linear shift invariant system (LSIS) is performing a convolution, and whenever
we are performing a convolution, we have a linear shift invariant system. That is, LSIS implies
convolution and convolution implies LSIS.
At each location of the sliding function we find the integral of the product of the two functions.
As we move from minus infinity, most of the time the product of the two functions is going to be zero.
http://www.jhu.edu/signals/convolve/
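Suppose we have that f1 convolved with h gives us g1, and f2 convolved with h gives us g2. Then:
$$(\alpha f_1 + \beta f_2) * h = \int_{-\infty}^{\infty} \big(\alpha f_1(\tau) + \beta f_2(\tau)\big)\, h(x - \tau)\, d\tau = \alpha \int_{-\infty}^{\infty} f_1(\tau)\, h(x - \tau)\, d\tau + \beta \int_{-\infty}^{\infty} f_2(\tau)\, h(x - \tau)\, d\tau = \alpha g_1(x) + \beta g_2(x)$$
The integral splits into two integrals with the constants 𝛼 and 𝛽 outside the integrals, so
convolution is linear.
Next, suppose f convolved with h gives us g, and we shift the input by a:
$$f(x - a) * h(x) = \int_{-\infty}^{\infty} f(\tau - a)\, h(x - \tau)\, d\tau = \int_{-\infty}^{\infty} f(\mu)\, h(x - a - \mu)\, d\mu = g(x - a) \quad (\text{substituting } \mu = \tau - a)$$
The limits of integration do not change because a is a finite number. This integral is simply g
shifted by a, so we see that convolution is shift invariant. Convolution is therefore a linear
shift invariant system.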
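Both properties can be checked numerically for discrete signals. The following is a small sanity check, not a proof: it assumes random finite-length signals, uses NumPy's `np.convolve` (full mode) as the discrete counterpart of the integral, and the names and constants are chosen only for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
f1 = rng.random(50)
f2 = rng.random(50)
h = rng.random(7)
alpha, beta = 2.0, -3.0

# Linearity: (alpha*f1 + beta*f2) * h == alpha*(f1 * h) + beta*(f2 * h)
lhs = np.convolve(alpha * f1 + beta * f2, h)
rhs = alpha * np.convolve(f1, h) + beta * np.convolve(f2, h)
is_linear = np.allclose(lhs, rhs)

# Shift invariance: delaying the input by a samples delays the output by a samples.
a = 5
f_shifted = np.concatenate([np.zeros(a), f1])    # f1 shifted right by a, zero-padded
g = np.convolve(f1, h)
g_shifted = np.convolve(f_shifted, h)
is_shift_invariant = np.allclose(g_shifted[a:], g) and np.allclose(g_shifted[:a], 0.0)
```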
Now that we understand what a linear shift invariant system is, and that it is just performing a
convolution, we can develop some very simple linear image filters that use convolution to enhance
images or extract information from them.
First, let us take a look at how convolution works in the case of discrete images. The definition of
convolution in the discrete domain is given by
$$g[i,j] = \sum_{m} \sum_{n} f[m,n]\, h[i-m,\, j-n]$$
The input discrete image is f[i,j] where i is
the row and j is the column, and the size of the image is M by N. f[i,j] is being convolved with an impulse
response h[i,j] and the output is g[i,j], which is also an image of the same size as f[i,j]. In image processing,
the impulse response h[i,j] is referred to as a mask, a kernel, or a filter. We will use these terms
interchangeably. Since this is a two-dimensional convolution, the flip of the filter happens twice, once
with respect to i and then with respect to j.
There is a simple way to visualize this two-dimensional convolution. Let us assume the filter is small
compared to the input image. Then, the value of the output image g at pixel location [i,j] is obtained by
flipping the filter h twice, overlaying it on the image f with the center of the filter at [i,j], and finding the
sum of the product of the pixel values of the image and the filter in the overlap region. This process is
repeated for all pixels in the input image to get the output image g. You can imagine that writing a
program to perform convolution is quite straightforward.
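As a rough illustration of how short such a program can be, here is a direct (unoptimized) sketch of the procedure just described: flip the filter twice, center it at each pixel, and sum the products. The function name, the odd filter size, and the choice to treat pixels outside the image as zero are assumptions made for the example.

```python
import numpy as np

def convolve2d_same(f, h):
    """g[i,j] = sum_m sum_n f[m,n] * h[i-m, j-n], with h indexed about its center.

    The output has the same size as f. Pixels outside f are treated as zero
    (a border-handling assumption).
    """
    K, L = h.shape                       # odd filter sizes assumed
    rk, rl = K // 2, L // 2
    h_flipped = h[::-1, ::-1]            # flip with respect to i, then with respect to j
    padded = np.pad(f.astype(np.float64), ((rk, rk), (rl, rl)))
    g = np.zeros(f.shape, dtype=np.float64)
    for i in range(f.shape[0]):
        for j in range(f.shape[1]):
            region = padded[i:i + K, j:j + L]    # image region under the filter at [i, j]
            g[i, j] = np.sum(region * h_flipped)
    return g

# Convolving a single bright pixel (an impulse) with h reproduces h around that pixel.
impulse = np.zeros((5, 5))
impulse[2, 2] = 1.0
h = np.arange(9, dtype=np.float64).reshape(3, 3)
response = convolve2d_same(impulse, h)
```

The impulse test is a convenient way to confirm that the flips are applied correctly: without them the output would contain h rotated by 180 degrees.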
Consider a “box filter”: a square with 5x5 pixels where each pixel has a constant value of 1. The
output image is going to be a smooth (blurred) version of the input image because each pixel in the
output image will be the aggregate of 5x5 or 25 pixels in the input image. In addition, the output
image is going to be really bright, around 25 times brighter than the input image. Let us say the
images are represented using eight bits of brightness information at each pixel.
[Slide: input f(x,y) convolved with a 5x5 “Box Filter” gives output g(x,y). The result image is
saturated. Why?]
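The saturation can be reproduced with a constant test image. The sketch below is only an illustration: the image, the edge-replicating border handling and the helper name are assumptions, and dividing the box by K² is shown as the usual remedy rather than as something stated in this excerpt.

```python
import numpy as np

def box_filter_same(f, K, normalize):
    """Convolve f with a KxK box of ones (or ones / K**2 if normalize is True)."""
    r = K // 2
    padded = np.pad(f.astype(np.float64), r, mode="edge")   # border handling is an assumption
    out = np.zeros(f.shape, dtype=np.float64)
    for di in range(K):
        for dj in range(K):
            out += padded[di:di + f.shape[0], dj:dj + f.shape[1]]
    if normalize:
        out /= K * K
    return out

f = np.full((9, 9), 100, dtype=np.uint8)             # constant mid-gray test image
raw = box_filter_same(f, 5, normalize=False)         # sums 25 pixels: 2500 everywhere
saturated = np.clip(raw, 0, 255).astype(np.uint8)    # 8-bit storage clips at 255
averaged = np.rint(box_filter_same(f, 5, normalize=True)).astype(np.uint8)
```

With the filter values summing to 25, a pixel value of 100 becomes 2500, far above what eight bits can hold; with the values summing to 1, the brightness of the image is preserved.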
With a larger 21x21 box filter, we get an output that is smoother than that of the 5x5 box filter.
But if we look closely at this image, we see that it has some “blocky” artifacts. We can see that
these artifacts line up with the vertical and horizontal axes. This is because the box filter has
hard vertical and horizontal edges on its boundary.
So, what size filter should we use? This is an interesting question because the Gaussian function
goes to zero only at infinity. Clearly, we do not want to use a filter that is infinite in extent.
As a rule of thumb, set the kernel size K ≈ 2𝜋𝜎.
[Slide: Gaussian kernels for 𝜎 = 2, 3, 4 and 5.]
[Slides: the input image convolved with Gaussian filters of 𝜎 = 4 and 𝜎 = 16. The larger the kernel
(or 𝜎), the more the blurring.]
Let us look at the effect of changing the width of the Gaussian filter. When we convolve f(x,y) with the
Gaussian with 𝜎=4, we get a little bit of smoothing. When we increase 𝜎 to 16, we get more smoothing
or blurring without any undesirable artifacts being introduced in the output image.
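A Gaussian kernel that follows the K ≈ 2𝜋𝜎 rule of thumb could be built as in the sketch below. The details are assumptions made for the example: K is rounded up to an odd integer so that the kernel has a center pixel, the Gaussian is sampled at integer offsets, and the kernel is normalized to sum to 1.

```python
import numpy as np

def gaussian_kernel(sigma):
    """KxK Gaussian kernel with K close to 2*pi*sigma, rounded up to an odd integer."""
    K = int(np.ceil(2 * np.pi * sigma))
    if K % 2 == 0:
        K += 1                                   # odd size so the kernel has a center pixel
    r = K // 2
    x = np.arange(-r, r + 1)
    g1 = np.exp(-x**2 / (2.0 * sigma**2))        # samples of a 1D Gaussian
    kernel = np.outer(g1, g1)                    # 2D Gaussian as an outer product
    return kernel / kernel.sum()                 # normalize so brightness is preserved

k4 = gaussian_kernel(4)
k16 = gaussian_kernel(16)
```

With this rule the kernel extends a little over three standard deviations on each side of the center, where the Gaussian has already fallen to a small fraction of its peak.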
Using one 2D Gaussian filter is equivalent to using two 1D Gaussian filters. The double summation in
the convolution with a Gaussian can be rearranged to end up with two terms: one which sums over m
and a second which sums over n. This implies that the input image is being convolved with a
(horizontal) one-dimensional Gaussian of width K, and the resulting image is again convolved with a
second (vertical) one-dimensional Gaussian of height K. The end result is exactly equal
to convolving the image f with the original KxK two-dimensional Gaussian filter. This is made possible by
the fact that the two-dimensional Gaussian function is separable, in that, it can be written as the product
of two one-dimensional Gaussian functions. We can exploit this to dramatically reduce the
computational cost of filtering the image.
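Which one is faster? Why? Consider the cost of computing the output at a particular pixel. At that
pixel, with the two-dimensional filter we would need to do K² multiplications and then K²−1
additions to get the output value. With the two one-dimensional filters, we would need only 2K
multiplications and 2(K−1) additions.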
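The equivalence of one 2D Gaussian and two 1D Gaussians can be verified numerically. This sketch makes several assumptions for the sake of a small example: a random array stands in for the image, 𝜎 = 1 and K = 7 are arbitrary choices, and `conv2d_full` is a simple helper written for the check, not an optimized routine.

```python
import numpy as np

def conv2d_full(f, h):
    """Full 2D convolution: g[i,j] = sum_m sum_n h[m,n] * f[i-m, j-n]."""
    M, N = f.shape
    K, L = h.shape
    g = np.zeros((M + K - 1, N + L - 1))
    for m in range(K):
        for n in range(L):
            g[m:m + M, n:n + N] += h[m, n] * f
    return g

sigma, K = 1.0, 7
x = np.arange(K) - K // 2
g1 = np.exp(-x**2 / (2 * sigma**2))
g1 /= g1.sum()                                   # 1D Gaussian of length K
g2 = np.outer(g1, g1)                            # KxK 2D Gaussian (separable by construction)

rng = np.random.default_rng(0)
f = rng.random((20, 30))                         # random stand-in image

one_pass = conv2d_full(f, g2)                            # one 2D filter
two_pass = conv2d_full(conv2d_full(f, g1[None, :]),      # horizontal 1D filter
                       g1[:, None])                      # then vertical 1D filter

mults_2d = K * K        # multiplications per output pixel with the 2D filter
mults_1d = 2 * K        # multiplications per output pixel with two 1D filters
```

For K = 7 this is 49 versus 14 multiplications per pixel, and the gap widens as K grows, since one cost grows as K² and the other as K.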
[Slide: image with salt and pepper noise; Gaussian blurred image.]
Gaussian smoothing reduces the noise but, at the same time, we are also blurring out the edges of
the objects in the image.
In median filtering, for each pixel we take all the intensity values (including its own value) within
a KxK window centered at the pixel and sort them. We then find the middle value of the sorted list,
which is the median of all the intensity values. We simply use the median as the filtered output for
the pixel. In short:
1. Sort the K² values in the window centered at the pixel.
2. Assign the middle value (median) to the pixel.
[Slide: image with salt and pepper noise; median filtered image (K = 3).]
When we apply median filtering with even a small filter, say K=3, we see that the result is a
significant improvement over the original image. Almost all of
the noise is gone. We do lose a little bit of detail on the coins, but the result is quite impressive given
that the mask is very small. Note that we cannot implement a median filter using convolution — the
sorting step makes it a nonlinear method.
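The two steps of median filtering translate directly into code. The sketch below is a plain, unoptimized version for illustration; replicating the border pixels at the image boundary is an assumption, as is the small synthetic test image with one “salt” and one “pepper” pixel.

```python
import numpy as np

def median_filter(f, K=3):
    """Replace each pixel by the median of the KxK window centered on it."""
    r = K // 2
    padded = np.pad(f, r, mode="edge")           # border handling is an assumption
    out = np.empty_like(f)
    for i in range(f.shape[0]):
        for j in range(f.shape[1]):
            window = padded[i:i + K, j:j + K].ravel()
            out[i, j] = np.sort(window)[window.size // 2]    # middle of the sorted list
    return out

f = np.full((7, 7), 100, dtype=np.uint8)
f[3, 3] = 255        # "salt" pixel
f[1, 5] = 0          # "pepper" pixel
cleaned = median_filter(f, K=3)
```

Because isolated outliers end up at the extremes of the sorted list, they are never selected as the median, which is why both corrupted pixels vanish without being averaged into their neighbors.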
Median filtering is not effective when the image noise is not simple salt and pepper noise. Consider
an image that is corrupted by a different kind of noise. This is the type of noise that typically
appears when an image is taken under low-light conditions. In this case, we need to use a bigger
median filter. When we do that, the noise is reduced, but the price that we pay is that the details
on the coins are also lost. If we go to an even bigger filter, we do better in terms of the noise
reduction, but even more of the details are gone: a larger K causes blurring of image detail. Can we
come up with a filter for removing noise without smoothing the image details out?
[Slide: image with noise; median filtered image (K = 11).]
The reason this happens is because we are using [...]
[Slide: Gaussian blurs across edges.]
[...] preserving the edge. When the Gaussian filter is [...]
[...] of edge that the center pixel lies on, but has low [...]
[Slide: Original; Gaussian (spatial sigma 2); Bilateral (spatial sigma 2, brightness sigma 10).]
With the bilateral filter, using a sigma for the spatial Gaussian of 2 and a sigma for the brightness
Gaussian of 10, we get a very nice result. Virtually
all of the noise has been removed and, at the same
time, the spatial features have been well preserved.
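A bilateral filter along these lines can be sketched as a weighted average in which each neighbor's weight is the product of a spatial Gaussian and a brightness Gaussian. This follows the standard formulation of [Tomasi 1998] as commonly presented and is not code from the lecture; the window radius of 3 spatial sigmas, the border handling, and the synthetic step-edge image are assumptions.

```python
import numpy as np

def bilateral_filter(f, sigma_s=2.0, sigma_b=10.0):
    """Weighted average with weight = spatial Gaussian * brightness Gaussian."""
    f = f.astype(np.float64)
    r = int(np.ceil(3 * sigma_s))                # window radius (assumed cutoff)
    ax = np.arange(-r, r + 1)
    dy, dx = np.meshgrid(ax, ax, indexing="ij")
    spatial = np.exp(-(dx**2 + dy**2) / (2 * sigma_s**2))
    padded = np.pad(f, r, mode="edge")           # border handling is an assumption
    out = np.empty_like(f)
    for i in range(f.shape[0]):
        for j in range(f.shape[1]):
            window = padded[i:i + 2 * r + 1, j:j + 2 * r + 1]
            # Low weight for neighbors whose brightness differs from the center pixel.
            brightness = np.exp(-(window - f[i, j])**2 / (2 * sigma_b**2))
            weights = spatial * brightness       # multiply the two Gaussians
            out[i, j] = np.sum(weights * window) / np.sum(weights)
    return out

# Assumed test image: a vertical step edge (50 | 200) with mild noise.
rng = np.random.default_rng(0)
step = np.where(np.arange(32) < 16, 50.0, 200.0) * np.ones((32, 1))
noisy = step + rng.normal(0.0, 5.0, size=(32, 32))
smoothed = bilateral_filter(noisy, sigma_s=2.0, sigma_b=10.0)
```

Pixels on the far side of the step differ in brightness by about 150, so their brightness weight is essentially zero and the averaging stays on one side of the edge.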
Template Matching
Next, let us discuss the problem of template matching, where we are given a template — an image patch
with a pattern that is relevant to the application — and the goal is to find all the locations in an image
where the template appears.
Consider the example shown in the slide on the right. Our goal is to find the template (the face of the
king) on the right in the image of the card on the left. A natural way to solve this problem is to slide the
template over the image and for each position of the template find the difference between the template
and the image region it overlaps. We can mathematically define the difference E(i,j) between the
template and the underlying image region for the template location (i,j) as the sum of the squared
differences (SSD) between the pixels in the template and the image region. When this difference is small,
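we have found the template in the image. If we expand E(i,j), we get
$$E[i,j] = \sum_{m} \sum_{n} \left( f^2[m,n] + t^2[m-i,\, n-j] - 2 f[m,n]\, t[m-i,\, n-j] \right)$$
Note that minimizing E(i,j) is equivalent to maximizing the sum of products in the last of the three
terms of E(i,j), provided the first term does not vary much with the template location.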
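The sliding-template search with the sum of squared differences can be sketched as follows. It is a brute-force illustration, not an efficient implementation: the random stand-in image, the patch location, and the convention that (i, j) is the top-left corner of the template placement are assumptions made for the example.

```python
import numpy as np

def ssd_map(f, t):
    """E[i,j] = sum over the template of (f[m,n] - t[m-i, n-j])**2, for every
    placement (i, j) of the template's top-left corner that fits inside f."""
    f = f.astype(np.float64)
    t = t.astype(np.float64)
    H, W = t.shape
    rows, cols = f.shape[0] - H + 1, f.shape[1] - W + 1
    E = np.empty((rows, cols))
    for i in range(rows):
        for j in range(cols):
            diff = f[i:i + H, j:j + W] - t
            E[i, j] = np.sum(diff**2)
    return E

rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(40, 50))      # random stand-in image
template = image[12:20, 30:41].copy()            # patch cut from a known location
E = ssd_map(image, template)
best = np.unravel_index(np.argmin(E), E.shape)   # placement with the smallest difference
```

Since the template was cut from the image itself, the difference is exactly zero at the true location and the minimum of E recovers it.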
Convolution:
$$g[i,j] = \sum_{m} \sum_{n} f[m,n]\, t[i-m,\, j-n] = t * f$$
Correlation:
$$R_{tf}[i,j] = \sum_{m} \sum_{n} f[m,n]\, t[m-i,\, n-j] = t \otimes f$$
How do we locate the template in the image? Maximize the correlation R_tf[i,j].
Here we show the last term mentioned above, which is called cross-correlation. Notice that it looks
similar to the expression for convolution. The difference is that while in convolution one of the two
functions needs to be flipped before computing the integral (or summation), in this case neither of the two
functions is flipped. This appears to be a trivial difference but, in fact, it has mathematical implications
that result in convolution and correlation having different properties.
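The relationship between the two operations, and one property in which they differ, can be seen in a short experiment. The sketch is illustrative only: the helper names, the small asymmetric template and the random image are assumptions, and commutativity is used here as one example of a property that convolution has and correlation lacks.

```python
import numpy as np

def correlate_valid(f, t):
    """R[i,j] = sum_m sum_n f[m,n] * t[m-i, n-j]; the template is not flipped."""
    H, W = t.shape
    R = np.empty((f.shape[0] - H + 1, f.shape[1] - W + 1))
    for i in range(R.shape[0]):
        for j in range(R.shape[1]):
            R[i, j] = np.sum(f[i:i + H, j:j + W] * t)
    return R

def convolve_valid(f, t):
    """Convolution over the same placements: correlation with the template flipped twice."""
    return correlate_valid(f, t[::-1, ::-1])

rng = np.random.default_rng(1)
f = rng.random((6, 7))
t = np.array([[1.0, 2.0], [3.0, 4.0]])           # asymmetric template

R = correlate_valid(f, t)
G = convolve_valid(f, t)

# Convolution commutes; checked here on 1D slices with np.convolve.
commutes = np.allclose(np.convolve(f[0], t[0]), np.convolve(t[0], f[0]))
# Correlation does not commute in general: swapping the arguments reverses the output.
corr_ft = np.correlate(f[0], t[0], mode="full")
corr_tf = np.correlate(t[0], f[0], mode="full")
```

For a template that is symmetric under both flips, R and G would coincide; the asymmetric template above makes the difference visible.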
References:
[Tomasi 1998] C. Tomasi and R. Manduchi, “Bilateral filtering for gray and color images,” in
Proceedings of the IEEE International Conference on Computer Vision, 1998.
Image Credits:
I.1 https://commons.wikimedia.org/wiki/File:Ansel_Adams_and_camera.jpg. Public Domain.
I.2 Wilson J. Pugh. Used with permission.
I.5 Purchased from iStock by Getty Images.
I.6 Purchased from iStock by Getty Images.
I.7 Purchased from iStock by Getty Images.
I.8 Purchased from iStock by Getty Images.
Acknowledgements: Thanks to Nisha Aggarwal and Jenna Everard for their help with transcription,
editing and proofreading.