Minor Notes Ch1-5


chest X-ray. In this case, we would deal with a transmissivity instead of a reflectivity
function, but the limits would be the same as in Eq. (2-6), and the image function
formed would be modeled as the product in Eq. (2-4).

EXAMPLE 2.1: Some typical values of illumination and reflectance.


The following numerical quantities illustrate some typical values of illumination and reflectance for
visible light. On a clear day, the sun may produce in excess of 90,000 lm/m² of illumination on the surface of the earth. This value decreases to less than 10,000 lm/m² on a cloudy day. On a clear evening, a full moon yields about 0.1 lm/m² of illumination. The typical illumination level in a commercial office is about 1,000 lm/m². Similarly, the following are typical values of r(x, y): 0.01 for black velvet, 0.65 for stainless steel, 0.80 for flat-white wall paint, 0.90 for silver-plated metal, and 0.93 for snow.

Let the intensity (gray level) of a monochrome image at any coordinates (x, y) be denoted by

ℓ = f(x, y)    (2-7)

From Eqs. (2-4) through (2-6) it is evident that ℓ lies in the range

Lmin ≤ ℓ ≤ Lmax    (2-8)

In theory, the requirement on Lmin is that it be nonnegative, and on Lmax that it be finite. In practice, Lmin = imin rmin and Lmax = imax rmax. From Example 2.1, using average office illumination and reflectance values as guidelines, we may expect Lmin ≈ 10 and Lmax ≈ 1000 to be typical indoor values in the absence of additional illumination. The units of these quantities are lm/m². However, actual units seldom are of interest, except in cases where photometric measurements are being performed.
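As a quick check of these numbers (a worked instance of the Example 2.1 values, with the office illumination standing in for both imin and imax):

Lmin = imin rmin = (1000)(0.01) = 10    and    Lmax = imax rmax = (1000)(0.93) ≈ 1000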
The interval [Lmin, Lmax] is called the intensity (or gray) scale. Common practice is to shift this interval numerically to the interval [0, 1], or [0, C], where ℓ = 0 is considered black and ℓ = 1 (or C) is considered white on the scale. All intermediate values are shades of gray varying from black to white.

2.4 IMAGE SAMPLING AND QUANTIZATION

As discussed in the previous section, there are numerous ways to acquire images, but our objective in all is the same: to generate digital images from sensed data. The output of most sensors is a continuous voltage waveform whose amplitude and spatial behavior are related to the physical phenomenon being sensed. To create a digital image, we need to convert the continuous sensed data into a digital format. This requires two processes: sampling and quantization. (The discussion of sampling in this section is of an intuitive nature. We will discuss this topic in depth in Chapter 4.)

BASIC CONCEPTS IN SAMPLING AND QUANTIZATION


Figure 2.16(a) shows a continuous image f that we want to convert to digital form.
An image may be continuous with respect to the x- and y-coordinates, and also in


FIGURE 2.16 (a) Continuous image. (b) A scan line showing intensity variations along line AB in the continuous image. (c) Sampling and quantization. (d) Digital scan line. (The black border in (a) is included for clarity. It is not part of the image.)

amplitude. To digitize it, we have to sample the function in both coordinates and
also in amplitude. Digitizing the coordinate values is called sampling. Digitizing the
amplitude values is called quantization.
The one-dimensional function in Fig. 2.16(b) is a plot of amplitude (intensity
level) values of the continuous image along the line segment AB in Fig. 2.16(a). The
random variations are due to image noise. To sample this function, we take equally
spaced samples along line AB, as shown in Fig. 2.16(c). The samples are shown as
small dark squares superimposed on the function, and their (discrete) spatial locations are indicated by corresponding tick marks at the bottom of the figure. The set of dark squares constitutes the sampled function. However, the values of the sam-
ples still span (vertically) a continuous range of intensity values. In order to form a
digital function, the intensity values also must be converted (quantized) into discrete
quantities. The vertical gray bar in Fig. 2.16(c) depicts the intensity scale divided
into eight discrete intervals, ranging from black to white. The vertical tick marks
indicate the specific value assigned to each of the eight intensity intervals. The con-
tinuous intensity levels are quantized by assigning one of the eight values to each
sample, depending on the vertical proximity of a sample to a vertical tick mark. The
digital samples resulting from both sampling and quantization are shown as white
squares in Fig. 2.16(d). Starting at the top of the continuous image and carrying out
this procedure downward, line by line, produces a two-dimensional digital image.
It is implied in Fig. 2.16 that, in addition to the number of discrete levels used, the
accuracy achieved in quantization is highly dependent on the noise content of the
sampled signal.
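As an editorial aside (not part of the original text), the sampling-and-quantization procedure of Fig. 2.16 is easy to sketch in code. The following minimal Python/NumPy fragment samples a hypothetical scan line at N equally spaced locations and quantizes each sample to the nearest of L = 8 uniformly spaced levels; the signal itself is an arbitrary stand-in for the intensity profile along line AB:

```python
import numpy as np

# Hypothetical continuous scan line on [0, 1]; a stand-in for the intensity
# profile along line AB in Fig. 2.16(b), including some noise.
def scan_line(t):
    return 0.5 + 0.3 * np.sin(6 * np.pi * t) + 0.05 * np.random.randn(*t.shape)

N = 64                                       # number of spatial samples (sampling)
L = 8                                        # number of discrete levels (quantization)

t = np.linspace(0.0, 1.0, N)                 # equally spaced sample locations
samples = np.clip(scan_line(t), 0.0, 1.0)    # sampled, still continuous in amplitude

# Quantize: map each sample to the nearest of L equally spaced levels in [0, 1].
digital = np.round(samples * (L - 1)) / (L - 1)
```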


FIGURE 2.17 (a) Continuous image projected onto a sensor array. (b) Result of image sampling and quantization.

In practice, the method of sampling is determined by the sensor arrangement


used to generate the image. When an image is generated by a single sensing element
combined with mechanical motion, as in Fig. 2.13, the output of the sensor is quan-
tized in the manner described above. However, spatial sampling is accomplished by
selecting the number of individual mechanical increments at which we activate the
sensor to collect data. Mechanical motion can be very exact so, in principle, there is
almost no limit on how fine we can sample an image using this approach. In practice,
limits on sampling accuracy are determined by other factors, such as the quality of
the optical components used in the system.
When a sensing strip is used for image acquisition, the number of sensors in the
strip establishes the samples in the resulting image in one direction, and mechanical
motion establishes the number of samples in the other. Quantization of the sensor
outputs completes the process of generating a digital image.
When a sensing array is used for image acquisition, no motion is required. The
number of sensors in the array establishes the limits of sampling in both directions.
Quantization of the sensor outputs is as explained above. Figure 2.17 illustrates this
concept. Figure 2.17(a) shows a continuous image projected onto the plane of a 2-D
sensor. Figure 2.17(b) shows the image after sampling and quantization. The quality
of a digital image is determined to a large degree by the number of samples and dis-
crete intensity levels used in sampling and quantization. However, as we will show
later in this section, image content also plays a role in the choice of these parameters.

REPRESENTING DIGITAL IMAGES


Let f ( s, t ) represent a continuous image function of two continuous variables, s and
t. We convert this function into a digital image by sampling and quantization, as
explained in the previous section. Suppose that we sample the continuous image
into a digital image, f ( x, y), containing M rows and N columns, where ( x, y) are
discrete coordinates. For notational clarity and convenience, we use integer values
for these discrete coordinates: x = 0, 1, 2, … , M − 1 and y = 0, 1, 2, … , N − 1. Thus,
for example, the value of the digital image at the origin is f (0, 0), and its value at
the next coordinates along the first row is f (0, 1). Here, the notation (0, 1) is used


to denote the second sample along the first row. It does not mean that these are
the values of the physical coordinates when the image was sampled. In general, the
value of a digital image at any coordinates ( x, y) is denoted f ( x, y), where x and y
are integers. When we need to refer to specific coordinates (i, j ), we use the notation
f (i, j ), where the arguments are integers. The section of the real plane spanned by
the coordinates of an image is called the spatial domain, with x and y being referred
to as spatial variables or spatial coordinates.
Figure 2.18 shows three ways of representing f ( x, y). Figure 2.18(a) is a plot of
the function, with two axes determining spatial location and the third axis being the
values of f as a function of x and y. This representation is useful when working with
grayscale sets whose elements are expressed as triplets of the form ( x, y, z) , where
x and y are spatial coordinates and z is the value of f at coordinates ( x, y). We will
work with this representation briefly in Section 2.6.
The representation in Fig. 2.18(b) is more common, and it shows f ( x, y) as it would
appear on a computer display or photograph. Here, the intensity of each point in the
display is proportional to the value of f at that point. In this figure, there are only
three equally spaced intensity values. If the intensity is normalized to the interval
[0, 1], then each point in the image has the value 0, 0.5, or 1. A monitor or printer con-
verts these three values to black, gray, or white, respectively, as in Fig. 2.18(b). This
type of representation includes color images, and allows us to view results at a glance.
As Fig. 2.18(c) shows, the third representation is an array (matrix) composed of
the numerical values of f ( x, y). This is the representation used for computer process-
ing. In equation form, we write the representation of an M × N numerical array as

$$f(x, y) = \begin{bmatrix} f(0, 0) & f(0, 1) & \cdots & f(0, N-1) \\ f(1, 0) & f(1, 1) & \cdots & f(1, N-1) \\ \vdots & \vdots & & \vdots \\ f(M-1, 0) & f(M-1, 1) & \cdots & f(M-1, N-1) \end{bmatrix} \tag{2-9}$$

The right side of this equation is a digital image represented as an array of real
numbers. Each element of this array is called an image element, picture element, pixel,
or pel. We use the terms image and pixel throughout the book to denote a digital
image and its elements. Figure 2.19 shows a graphical representation of an image
array, where the x- and y-axis are used to denote the rows and columns of the array.
Specific pixels are values of the array at a fixed pair of coordinates. As mentioned
earlier, we generally use f (i, j ) when referring to a pixel with coordinates (i, j ).
We can also represent a digital image in a traditional matrix form:

$$\mathbf{A} = \begin{bmatrix} a_{0,0} & a_{0,1} & \cdots & a_{0,N-1} \\ a_{1,0} & a_{1,1} & \cdots & a_{1,N-1} \\ \vdots & \vdots & & \vdots \\ a_{M-1,0} & a_{M-1,1} & \cdots & a_{M-1,N-1} \end{bmatrix} \tag{2-10}$$

Clearly, $a_{ij} = f(i, j)$, so Eqs. (2-9) and (2-10) denote identical arrays.


FIGURE 2.18 (a) Image plotted as a surface. (b) Image displayed as a visual intensity array. (c) Image shown as a 2-D numerical array. (The numbers 0, .5, and 1 represent black, gray, and white, respectively.)

As Fig. 2.19 shows, we define the origin of an image at the top left corner. This is
a convention based on the fact that many image displays (e.g., TV monitors) sweep
an image starting at the top left and moving to the right, one row at a time. More
important is the fact that the first element of a matrix is by convention at the top
left of the array. Choosing the origin of f ( x, y) at that point makes sense mathemati-
cally because digital images in reality are matrices. In fact, as you will see, sometimes
we use x and y interchangeably in equations with the rows (r) and columns (c) of a
matrix.
It is important to note that the representation in Fig. 2.19, in which the positive
x-axis extends downward and the positive y-axis extends to the right, is precisely the
right-handed Cartesian coordinate system with which you are familiar,† but shown
rotated by 90° so that the origin appears on the top, left.


† Recall that a right-handed coordinate system is such that, when the index finger of the right hand points in the direc-
tion of the positive x-axis and the middle finger points in the (perpendicular) direction of the positive y-axis, the
thumb points up. As Figs. 2.18 and 2.19 show, this indeed is the case in our image coordinate system. In practice,
you will also find implementations based on a left-handed system, in which the x- and y-axis are interchanged
from the way we show them in Figs. 2.18 and 2.19. For example, MATLAB uses a left-handed system for image
processing. Both systems are perfectly valid, provided they are used consistently.

FIGURE 2.19 Coordinate convention used to represent digital images. Because coordinate values are integers, there is a one-to-one correspondence between x and y and the rows (r) and columns (c) of a matrix. The coordinates of the image center are (xc, yc) = (⌊M/2⌋, ⌊N/2⌋).

The center of an M × N digital image with origin at (0, 0) and range to (M − 1, N − 1) is obtained by dividing M and N by 2 and rounding down to the nearest integer. This operation sometimes is denoted using the floor operator, ⌊·⌋, as shown in Fig. 2.19. (The floor of z, denoted ⌊z⌋, is the largest integer that is less than or equal to z. The ceiling of z, denoted ⌈z⌉, is the smallest integer that is greater than or equal to z.) This holds true for M and N even or odd. For example, the center of an image of size 1023 × 1024 is at (511, 512). Some programming languages (e.g., MATLAB) start indexing at 1 instead of at 0. The center of an image in that case is found at (xc, yc) = (⌊M/2⌋ + 1, ⌊N/2⌋ + 1).
To express sampling and quantization in more formal mathematical terms, let
Z and R denote the set of integers and the set of real numbers, respectively. The
sampling process may be viewed as partitioning the xy-plane into a grid, with the coordinates of the center of each cell in the grid being a pair of elements from the Cartesian product Z² (also denoted Z × Z) which, as you may recall, is the set of all ordered pairs of elements (zᵢ, zⱼ) with zᵢ and zⱼ being integers from set Z. (See Eq. (2-41) in Section 2.6 for a formal definition of the Cartesian product.) Hence, f(x, y) is a digital image if (x, y) are integers from Z² and f is a function that assigns
an intensity value (that is, a real number from the set of real numbers, R) to each
distinct pair of coordinates ( x, y). This functional assignment is the quantization pro-
cess described earlier. If the intensity levels also are integers, then R = Z, and a
digital image becomes a 2-D function whose coordinates and amplitude values are
integers. This is the representation we use in the book.
Image digitization requires that decisions be made regarding the values for M, N,
and for the number, L, of discrete intensity levels. There are no restrictions placed
on M and N, other than they have to be positive integers. However, digital storage
and quantizing hardware considerations usually lead to the number of intensity lev-
els, L, being an integer power of two; that is

L = 2^k    (2-11)

where k is an integer. We assume that the discrete levels are equally spaced and that
they are integers in the range [0, L − 1] .


FIGURE 2.20 An image exhibiting saturation and noise. Saturation is the highest value beyond which all intensity values are clipped (note how the entire saturated area has a high, constant intensity level). Visible noise in this case appears as a grainy texture pattern. The dark background is noisier, but the noise is difficult to see.

Sometimes, the range of values spanned by the gray scale is referred to as the
dynamic range, a term used in different ways in different fields. Here, we define the
dynamic range of an imaging system to be the ratio of the maximum measurable
intensity to the minimum detectable intensity level in the system. As a rule, the
upper limit is determined by saturation and the lower limit by noise, although noise
can be present also in lighter intensities. Figure 2.20 shows examples of saturation
and slight visible noise. Because the darker regions are composed primarily of pixels
with the minimum detectable intensity, the background in Fig. 2.20 is the noisiest
part of the image; however, dark background noise typically is much harder to see.
The dynamic range establishes the lowest and highest intensity levels that a system
can represent and, consequently, that an image can have. Closely associated with this
concept is image contrast, which we define as the difference in intensity between
the highest and lowest intensity levels in an image. The contrast ratio is the ratio of
these two quantities. When an appreciable number of pixels in an image have a high
dynamic range, we can expect the image to have high contrast. Conversely, an image
with low dynamic range typically has a dull, washed-out gray look. We will discuss
these concepts in more detail in Chapter 3.
The number, b, of bits required to store a digital image is

b = M × N × k    (2-12)

When M = N , this equation becomes

b = N²k    (2-13)

FIGURE 2.21 Number of megabytes required to store images for various values of N and k (curves shown for k = 1 through 8, with N up to 10 × 10³).

Figure 2.21 shows the number of megabytes required to store square images for
various values of N and k (as usual, one byte equals 8 bits and a megabyte equals
10⁶ bytes).
When an image can have 2^k possible intensity levels, it is common practice to refer to it as a "k-bit image" (e.g., a 256-level image is called an 8-bit image). Note that storage requirements for large 8-bit images (e.g., 10,000 × 10,000 pixels) are not insignificant.
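Equations (2-12) and (2-13) are simple enough to check directly. A small sketch (Python is used for these illustrative fragments throughout these notes; none of them come from the book):

```python
def storage_megabytes(M, N, k):
    """Storage for an M x N, k-bit image via Eq. (2-12): b = M * N * k bits."""
    bits = M * N * k
    return bits / 8 / 1e6        # one byte = 8 bits, one megabyte = 10**6 bytes

# The large 8-bit image mentioned above:
print(storage_megabytes(10_000, 10_000, 8))   # 100.0 MB, consistent with Fig. 2.21
```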

LINEAR VS. COORDINATE INDEXING


The convention discussed in the previous section, in which the location of a pixel is
given by its 2-D coordinates, is referred to as coordinate indexing, or subscript index-
ing. Another type of indexing used extensively in programming image processing
algorithms is linear indexing, which consists of a 1-D string of nonnegative integers
based on computing offsets from coordinates (0, 0). There are two principal types of
linear indexing, one is based on a row scan of an image, and the other on a column scan.
Figure 2.22 illustrates the principle of linear indexing based on a column scan.
The idea is to scan an image column by column, starting at the origin and proceeding
down and then to the right. The linear index is based on counting pixels as we scan
the image in the manner shown in Fig. 2.22. Thus, a scan of the first (leftmost) column
yields linear indices 0 through M − 1. A scan of the second column yields indices M
through 2 M − 1, and so on, until the last pixel in the last column is assigned the linear
index value MN − 1. Thus, a linear index, denoted by α, has one of MN possible
values: 0, 1, 2, … , MN − 1, as Fig. 2.22 shows. The important thing to notice here is
that each pixel is assigned a linear index value that identifies it uniquely.
The formula for generating linear indices based on a column scan is straightfor-
ward and can be determined by inspection. For any pair of coordinates ( x, y), the
corresponding linear index value is

α = My + x    (2-14)


FIGURE 2.22 Illustration of column scanning for generating linear indices. Shown are several 2-D coordinates (in parentheses) and their corresponding linear indices: (0, 0) ↔ α = 0; (0, 1) ↔ α = M; (0, 2) ↔ α = 2M; (M − 1, 0) ↔ α = M − 1; (M − 1, 1) ↔ α = 2M − 1; (M − 1, N − 1) ↔ α = MN − 1.

Conversely, the coordinate indices for a given linear index value α are given by the equations†

x = α mod M    (2-15)

and

y = (α − x)/M    (2-16)

Recall that α mod M means "the remainder of the division of α by M." This is a formal way of stating that row numbers repeat themselves at the start of every column. Thus, when α = 0, the remainder of the division of 0 by M is 0, so x = 0. When α = 1, the remainder is 1, and so x = 1. You can see that x will continue to be equal to α until α = M − 1. When α = M (which is at the beginning of the second column), the remainder is 0, and thus x = 0 again, and it increases by 1 until the next column is reached, when the pattern repeats itself. Similar comments apply to Eq. (2-16). See
Problem 2.11 for a derivation of the preceding two equations.
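Equations (2-14) through (2-16) translate directly into code. A minimal sketch of both directions of the column-scan mapping, with a round-trip check:

```python
def linear_index(x, y, M):
    """Column-scan linear index of Eq. (2-14): alpha = M*y + x."""
    return M * y + x

def coordinates(alpha, M):
    """Invert the mapping via Eqs. (2-15) and (2-16)."""
    x = alpha % M              # x = alpha mod M
    y = (alpha - x) // M       # y = (alpha - x) / M (exact integer division)
    return x, y

# Round trip over every pixel of a small hypothetical M x N image:
M, N = 4, 3
assert all(coordinates(linear_index(x, y, M), M) == (x, y)
           for y in range(N) for x in range(M))
```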

SPATIAL AND INTENSITY RESOLUTION


Intuitively, spatial resolution is a measure of the smallest discernible detail in an
image. Quantitatively, spatial resolution can be stated in several ways, with line
pairs per unit distance, and dots (pixels) per unit distance being common measures.
Suppose that we construct a chart with alternating black and white vertical lines,
each of width W units (W can be less than 1). The width of a line pair is thus 2W, and
there are 1/(2W) line pairs per unit distance. For example, if the width of a line is 0.1 mm,
there are 5 line pairs per unit distance (i.e., per mm). A widely used definition of
image resolution is the largest number of discernible line pairs per unit distance (e.g.,
100 line pairs per mm). Dots per unit distance is a measure of image resolution used
in the printing and publishing industry. In the U.S., this measure usually is expressed
as dots per inch (dpi). To give you an idea of quality, newspapers are printed with a


† When working with modular number systems, it is more accurate to write x ≡ α mod M, where the symbol ≡
means congruence. However, our interest here is just on converting from linear to coordinate indexing, so we
use the more familiar equal sign.


resolution of 75 dpi, magazines at 133 dpi, glossy brochures at 175 dpi, and the book
page at which you are presently looking was printed at 2400 dpi.
To be meaningful, measures of spatial resolution must be stated with respect to
spatial units. Image size by itself does not tell the complete story. For example, to say
that an image has a resolution of 1024 × 1024 pixels is not a meaningful statement
without stating the spatial dimensions encompassed by the image. Size by itself is
helpful only in making comparisons between imaging capabilities. For instance, a
digital camera with a 20-megapixel CCD imaging chip can be expected to have a
higher capability to resolve detail than an 8-megapixel camera, assuming that both
cameras are equipped with comparable lenses and the comparison images are taken
at the same distance.
Intensity resolution similarly refers to the smallest discernible change in inten-
sity level. We have considerable discretion regarding the number of spatial samples
(pixels) used to generate a digital image, but this is not true regarding the number
of intensity levels. Based on hardware considerations, the number of intensity levels
usually is an integer power of two, as we mentioned when discussing Eq. (2-11). The
most common number is 8 bits, with 16 bits being used in some applications in which
enhancement of specific intensity ranges is necessary. Intensity quantization using
32 bits is rare. Sometimes one finds systems that can digitize the intensity levels of
an image using 10 or 12 bits, but these are not as common.
Unlike spatial resolution, which must be based on a per-unit-of-distance basis to
be meaningful, it is common practice to refer to the number of bits used to quan-
tize intensity as the “intensity resolution.” For example, it is common to say that an
image whose intensity is quantized into 256 levels has 8 bits of intensity resolution.
However, keep in mind that discernible changes in intensity are influenced also by
noise and saturation values, and by the capabilities of human perception to analyze
and interpret details in the context of an entire scene (see Section 2.1). The following
two examples illustrate the effects of spatial and intensity resolution on discernible
detail. Later in this section, we will discuss how these two parameters interact in
determining perceived image quality.

EXAMPLE 2.2: Effects of reducing the spatial resolution of a digital image.


Figure 2.23 shows the effects of reducing the spatial resolution of an image. The images in Figs. 2.23(a)
through (d) have resolutions of 930, 300, 150, and 72 dpi, respectively. Naturally, the lower resolution
images are smaller than the original image in (a). For example, the original image is of size 2136 × 2140 pixels, but the 72 dpi image is an array of only 165 × 166 pixels. In order to facilitate comparisons, all the
smaller images were zoomed back to the original size (the method used for zooming will be discussed
later in this section). This is somewhat equivalent to “getting closer” to the smaller images so that we can
make comparable statements about visible details.
There are some small visual differences between Figs. 2.23(a) and (b), the most notable being a slight
distortion in the seconds marker pointing to 60 on the right side of the chronometer. For the most part,
however, Fig. 2.23(b) is quite acceptable. In fact, 300 dpi is the typical minimum image spatial resolution
used for book publishing, so one would not expect to see much difference between these two images.
Figure 2.23(c) begins to show visible degradation (see, for example, the outer edges of the chronometer


FIGURE 2.23 Effects of reducing spatial resolution. The images shown are at: (a) 930 dpi, (b) 300 dpi, (c) 150 dpi, and (d) 72 dpi.

case and compare the seconds marker with the previous two images). The numbers also show visible
degradation. Figure 2.23(d) shows degradation that is visible in most features of the image. When print-
ing at such low resolutions, the printing and publishing industry uses a number of techniques (such as
locally varying the pixel size) to produce much better results than those in Fig. 2.23(d). Also, as we will
show later in this section, it is possible to improve on the results of Fig. 2.23 by the choice of interpola-
tion method used.

EXAMPLE 2.3: Effects of varying the number of intensity levels in a digital image.
Figure 2.24(a) is a 774 × 640 CT projection image, displayed using 256 intensity levels (see Chapter 1
regarding CT images). The objective of this example is to reduce the number of intensities of the image
from 256 to 2 in integer powers of 2, while keeping the spatial resolution constant. Figures 2.24(b)
through (d) were obtained by reducing the number of intensity levels to 128, 64, and 32, respectively (we
will discuss in Chapter 3 how to reduce the number of levels).


FIGURE 2.24 (a) 774 × 640, 256-level image. (b)-(d) Image displayed in 128, 64, and 32 intensity levels, while keeping the spatial resolution constant. (Original image courtesy of Dr. David R. Pickens, Department of Radiology & Radiological Sciences, Vanderbilt University Medical Center.)

The 128- and 64-level images are visually identical for all practical purposes. However, the 32-level image
in Fig. 2.24(d) has a set of almost imperceptible, very fine ridge-like structures in areas of constant inten-
sity. These structures are clearly visible in the 16-level image in Fig. 2.24(e). This effect, caused by using
an insufficient number of intensity levels in smooth areas of a digital image, is called false contouring, so
named because the ridges resemble topographic contours in a map. False contouring generally is quite
objectionable in images displayed using 16 or fewer uniformly spaced intensity levels, as the images in
Figs. 2.24(e)-(h) show.
As a very rough guideline, and assuming integer powers of 2 for convenience, images of size 256 × 256 pixels with 64 intensity levels, and printed on a size format on the order of 5 × 5 cm, are about the lowest
spatial and intensity resolution images that can be expected to be reasonably free of objectionable sam-
pling distortions and false contouring.


FIGURE 2.24 (Continued) (e)-(h) Image displayed in 16, 8, 4, and 2 intensity levels.

The results in Examples 2.2 and 2.3 illustrate the effects produced on image qual-
ity by varying spatial and intensity resolution independently. However, these results
did not consider any relationships that might exist between these two parameters.
An early study by Huang [1965] attempted to quantify experimentally the effects on
image quality produced by the interaction of these two variables. The experiment
consisted of a set of subjective tests. Images similar to those shown in Fig. 2.25 were
used. The woman’s face represents an image with relatively little detail; the picture
of the cameraman contains an intermediate amount of detail; and the crowd picture
contains, by comparison, a large amount of detail.
Sets of these three types of images of various sizes and intensity resolution were
generated by varying N and k [see Eq. (2-13)]. Observers were then asked to rank


FIGURE 2.25 (a) Image with a low level of detail. (b) Image with a medium level of detail. (c) Image with a relatively large amount of detail. (Image (b) courtesy of the Massachusetts Institute of Technology.)

them according to their subjective quality. Results were summarized in the form of
so-called isopreference curves in the Nk-plane. (Figure 2.26 shows average isopref-
erence curves representative of the types of images in Fig. 2.25.) Each point in the
Nk-plane represents an image having values of N and k equal to the coordinates
of that point. Points lying on an isopreference curve correspond to images of equal
subjective quality. It was found in the course of the experiments that the isoprefer-
ence curves tended to shift right and upward, but their shapes in each of the three
image categories were similar to those in Fig. 2.26. These results were not unexpect-
ed, because a shift up and right in the curves simply means larger values for N and k,
which implies better picture quality.

FIGURE 2.26 Representative isopreference curves for the three types of images in Fig. 2.25. (The curves for the face, cameraman, and crowd images are plotted in the Nk-plane, with N ranging from 32 to 256.)


Observe that isopreference curves tend to become more vertical as the detail in
the image increases. This result suggests that for images with a large amount of detail
only a few intensity levels may be needed. For example, the isopreference curve in
Fig. 2.26 corresponding to the crowd is nearly vertical. This indicates that, for a fixed
value of N, the perceived quality for this type of image is nearly independent of the
number of intensity levels used (for the range of intensity levels shown in Fig. 2.26).
The perceived quality in the other two image categories remained the same in some
intervals in which the number of samples was increased, but the number of intensity
levels actually decreased. The most likely reason for this result is that a decrease in k
tends to increase the apparent contrast, a visual effect often perceived as improved
image quality.

IMAGE INTERPOLATION
Interpolation is used in tasks such as zooming, shrinking, rotating, and geometrically
correcting digital images. Our principal objective in this section is to introduce inter-
polation and apply it to image resizing (shrinking and zooming), which are basically
image resampling methods. Uses of interpolation in applications such as rotation
and geometric corrections will be discussed in Section 2.6.
Interpolation is the process of using known data to estimate values at unknown
locations. We begin the discussion of this topic with a short example. Suppose that
an image of size 500 × 500 pixels has to be enlarged 1.5 times to 750 × 750 pixels. A simple way to visualize zooming is to create an imaginary 750 × 750 grid with the same pixel spacing as the original image, then shrink it so that it exactly overlays the original image. Obviously, the pixel spacing in the shrunken 750 × 750 grid will be less than the pixel spacing in the original image. To assign an intensity value to any point in the overlay, we look for its closest pixel in the underlying original image and assign the intensity of that pixel to the new pixel in the 750 × 750 grid. When intensi-
ties have been assigned to all the points in the overlay grid, we expand it back to the
specified size to obtain the resized image.
The method just discussed is called nearest neighbor interpolation because it
assigns to each new location the intensity of its nearest neighbor in the original
image (see Section 2.5 regarding neighborhoods). This approach is simple but, it has
the tendency to produce undesirable artifacts, such as severe distortion of straight edges. A more suitable approach is bilinear interpolation, in which we use the four nearest neighbors to estimate the intensity at a given location. (Contrary to what the name suggests, bilinear interpolation is not a linear operation, because it involves multiplication of coordinates, which is not a linear operation; see Eq. (2-17).) Let (x, y) denote the coordinates of the location to which we want to assign an intensity value (think of it as a point of the grid described previously), and let v(x, y) denote that intensity value. For bilinear interpolation, the assigned value is obtained using the equation

v(x, y) = ax + by + cxy + d    (2-17)

where the four coefficients are determined from the four equations in four
unknowns that can be written using the four nearest neighbors of point ( x, y).
Bilinear interpolation gives much better results than nearest neighbor interpolation,
with a modest increase in computational burden.
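In practice, implementations rarely solve the four-equation system of Eq. (2-17) explicitly; the algebraically equivalent two-stage linear blend below gives the same result. A sketch, assuming a grayscale NumPy array indexed as f[x, y] and a query point inside the image:

```python
import numpy as np

def bilinear(f, x, y):
    """Estimate f at fractional coordinates (x, y) from the 4 nearest neighbors.

    Equivalent to fitting v(x, y) = a*x + b*y + c*x*y + d of Eq. (2-17)
    to the four surrounding pixels.
    """
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    x1 = min(x0 + 1, f.shape[0] - 1)
    y1 = min(y0 + 1, f.shape[1] - 1)
    dx, dy = x - x0, y - y0
    top    = (1 - dx) * f[x0, y0] + dx * f[x1, y0]   # blend along x at y0
    bottom = (1 - dx) * f[x0, y1] + dx * f[x1, y1]   # blend along x at y1
    return (1 - dy) * top + dy * bottom              # blend along y
```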


The next level of complexity is bicubic interpolation, which involves the sixteen
nearest neighbors of a point. The intensity value assigned to point ( x, y) is obtained
using the equation
$$v(x, y) = \sum_{i=0}^{3} \sum_{j=0}^{3} a_{ij}\, x^i y^j \tag{2-18}$$

The sixteen coefficients are determined from the sixteen equations with six-
teen unknowns that can be written using the sixteen nearest neighbors of point
( x, y) . Observe that Eq. (2-18) reduces in form to Eq. (2-17) if the limits of both
summations in the former equation are 0 to 1. Generally, bicubic interpolation does
a better job of preserving fine detail than its bilinear counterpart. Bicubic interpola-
tion is the standard used in commercial image editing applications, such as Adobe
Photoshop and Corel Photopaint.
Although images are displayed with integer coordinates, it is possible during pro-
cessing to work with subpixel accuracy by increasing the size of the image using
interpolation to “fill the gaps” between pixels in the original image.

EXAMPLE 2.4: Comparison of interpolation approaches for image shrinking and zooming.
Figure 2.27(a) is the same as Fig. 2.23(d), which was obtained by reducing the resolution of the 930 dpi image in Fig. 2.23(a) to 72 dpi (the size shrank from 2136 × 2140 to 165 × 166 pixels) and then zooming
the reduced image back to its original size. To generate Fig. 2.23(d) we used nearest neighbor interpola-
tion both to shrink and zoom the image. As noted earlier, the result in Fig. 2.27(a) is rather poor. Figures
2.27(b) and (c) are the results of repeating the same procedure but using, respectively, bilinear and bicu-
bic interpolation for both shrinking and zooming. The result obtained by using bilinear interpolation is a
significant improvement over nearest neighbor interpolation, but the resulting image is blurred slightly.
Much sharper results can be obtained using bicubic interpolation, as Fig. 2.27(c) shows.

FIGURE 2.27 (a) Image reduced to 72 dpi and zoomed back to its original 930 dpi using nearest neighbor interpolation. This figure is the same as Fig. 2.23(d). (b) Image reduced to 72 dpi and zoomed using bilinear interpolation. (c) Same as (b) but using bicubic interpolation.


It is possible to use more neighbors in interpolation, and there are more complex
techniques, such as using splines or wavelets, that in some instances can yield better
results than the methods just discussed. While preserving fine detail is an exception-
ally important consideration in image generation for 3-D graphics (for example, see
Hughes and Andries [2013]), the extra computational burden seldom is justifiable
for general-purpose digital image processing, where bilinear or bicubic interpola-
tion typically are the methods of choice.

2.5 SOME BASIC RELATIONSHIPS BETWEEN PIXELS

In this section, we discuss several important relationships between pixels in a digital image. When referring in the following discussion to particular pixels, we use lowercase letters, such as p and q.

NEIGHBORS OF A PIXEL
A pixel p at coordinates ( x, y) has two horizontal and two vertical neighbors with
coordinates

( x + 1, y), ( x − 1, y), ( x, y + 1), ( x, y − 1)

This set of pixels, called the 4-neighbors of p, is denoted N4(p).


The four diagonal neighbors of p have coordinates

( x + 1, y + 1), ( x + 1, y − 1), ( x − 1, y + 1), ( x − 1, y − 1)

and are denoted ND(p). These neighbors, together with the 4-neighbors, are called the 8-neighbors of p, denoted by N8(p). The set of image locations of the neighbors
of a point p is called the neighborhood of p. The neighborhood is said to be closed if
it contains p. Otherwise, the neighborhood is said to be open.
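These neighborhoods are straightforward to enumerate in code. A minimal sketch; the optional shape argument (an assumption, not from the text) discards locations that fall outside an M × N image:

```python
def n4(p):
    """4-neighbors of p = (x, y)."""
    x, y = p
    return [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]

def nd(p):
    """Diagonal neighbors of p."""
    x, y = p
    return [(x + 1, y + 1), (x + 1, y - 1), (x - 1, y + 1), (x - 1, y - 1)]

def n8(p, shape=None):
    """8-neighbors of p; optionally discard locations outside an M x N image."""
    nbrs = n4(p) + nd(p)
    if shape is not None:
        M, N = shape
        nbrs = [(x, y) for x, y in nbrs if 0 <= x < M and 0 <= y < N]
    return nbrs
```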

ADJACENCY, CONNECTIVITY, REGIONS, AND BOUNDARIES


Let V be the set of intensity values used to define adjacency. In a binary image,
V = {1} if we are referring to adjacency of pixels with value 1. In a grayscale image,
the idea is the same, but set V typically contains more elements. For example, if we
are dealing with the adjacency of pixels whose values are in the range 0 to 255, set V
could be any subset of these 256 values. We consider three types of adjacency:

1. 4-adjacency. Two pixels p and q with values from V are 4-adjacent if q is in the set N4(p).
2. 8-adjacency. Two pixels p and q with values from V are 8-adjacent if q is in the set N8(p).
3. m-adjacency (also called mixed adjacency). Two pixels p and q with values from
V are m-adjacent if


(a) q is in N4(p), or
(b) q is in ND(p) and the set N4(p) ∩ N4(q) has no pixels whose values are from V.

(We use the symbols ∩ and ∪ to denote set intersection and union, respectively. Given sets A and B, recall that their intersection is the set of elements that are members of both A and B. The union of these two sets is the set of elements that are members of A, of B, or of both. We will discuss sets in more detail in Section 2.6.)

Mixed adjacency is a modification of 8-adjacency, and is introduced to eliminate the ambiguities that may result from using 8-adjacency. For example, consider the pixel arrangement in Fig. 2.28(a) and let V = {1}. The three pixels at the top of Fig. 2.28(b) show multiple (ambiguous) 8-adjacency, as indicated by the dashed lines. This ambiguity is removed by using m-adjacency, as in Fig. 2.28(c). In other words, the center and upper-right diagonal pixels are not m-adjacent because they do not satisfy condition (b).

A digital path (or curve) from pixel p with coordinates (x0, y0) to pixel q with
Section 2.6. A digital path (or curve) from pixel p with coordinates (x0 , y0 ) to pixel q with
coordinates (xn , yn ) is a sequence of distinct pixels with coordinates

( x0 , y0 ), ( x1 , y1 ),…, ( xn , yn )

where points ( xi , yi ) and ( xi −1 , yi −1 ) are adjacent for 1 ≤ i ≤ n. In this case, n is the


length of the path. If ( x0 , y0 ) = ( xn , yn ) the path is a closed path. We can define 4-, 8-,
or m-paths, depending on the type of adjacency specified. For example, the paths in
Fig. 2.28(b) between the top right and bottom right points are 8-paths, and the path
in Fig. 2.28(c) is an m-path.
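Condition (b) of m-adjacency is the only subtle part to implement: it amounts to checking whether the 4-neighborhoods of p and q share any pixel whose value is in V. A sketch, reusing n4 and nd from the previous fragment and assuming img is a 2-D NumPy array indexed as img[x, y]:

```python
def m_adjacent(p, q, img, V):
    """True if pixels p and q (with values in V) are m-adjacent."""
    if img[p] not in V or img[q] not in V:
        return False
    if q in n4(p):                      # condition (a)
        return True
    if q in nd(p):                      # condition (b)
        M, N = img.shape
        common = set(n4(p)) & set(n4(q))
        return not any(img[c] in V
                       for c in common
                       if 0 <= c[0] < M and 0 <= c[1] < N)
    return False
```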
Let S represent a subset of pixels in an image. Two pixels p and q are said to be
connected in S if there exists a path between them consisting entirely of pixels in S.
For any pixel p in S, the set of pixels that are connected to it in S is called a connected
component of S. If it only has one component, and that component is connected,
then S is called a connected set.
Let R represent a subset of pixels in an image. We call R a region of the image if R
is a connected set. Two regions, Ri and Rj are said to be adjacent if their union forms
a connected set. Regions that are not adjacent are said to be disjoint. We consider 4-
and 8-adjacency when referring to regions. For our definition to make sense, the type
of adjacency used must be specified. For example, the two regions of 1’s in Fig. 2.28(d)
are adjacent only if 8-adjacency is used (according to the definition in the previous

FIGURE 2.28 (a) An arrangement of pixels. (b) Pixels that are 8-adjacent (adjacency is shown by dashed lines). (c) m-adjacency. (d) Two regions (of 1's) that are 8-adjacent. (e) The circled point is on the boundary of the 1-valued pixels only if 8-adjacency between the region and background is used. (f) The inner boundary of the 1-valued region does not form a closed path, but its outer boundary does.


paragraph, a 4-path between the two regions does not exist, so their union is not a
connected set).
Suppose an image contains K disjoint regions, Rk, k = 1, 2, …, K, none of which touches the image border.† Let Ru denote the union of all the K regions, and let (Ru)^c denote its complement (recall that the complement of a set A is the set of points that are not in A). We call all the points in Ru the foreground, and all the points in (Ru)^c the background of the image.

The boundary (also called the border or contour) of a region R is the set of pixels in
R that are adjacent to pixels in the complement of R. Stated another way, the border
of a region is the set of pixels in the region that have at least one background neigh-
bor. Here again, we must specify the connectivity being used to define adjacency. For
example, the point circled in Fig. 2.28(e) is not a member of the border of the 1-val-
ued region if 4-connectivity is used between the region and its background, because
the only possible connection between that point and the background is diagonal.
As a rule, adjacency between points in a region and its background is defined using
8-connectivity to handle situations such as this.
The preceding definition sometimes is referred to as the inner border of the
region to distinguish it from its outer border, which is the corresponding border in
the background. This distinction is important in the development of border-follow-
ing algorithms. Such algorithms usually are formulated to follow the outer boundary
in order to guarantee that the result will form a closed path. For instance, the inner
border of the 1-valued region in Fig. 2.28(f) is the region itself. This border does not
satisfy the definition of a closed path. On the other hand, the outer border of the
region does form a closed path around the region.
If R happens to be an entire image, then its boundary (or border) is defined as the
set of pixels in the first and last rows and columns of the image. This extra definition
is required because an image has no neighbors beyond its border. Normally, when
we refer to a region, we are referring to a subset of an image, and any pixels in the
boundary of the region that happen to coincide with the border of the image are
included implicitly as part of the region boundary.
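The inner-border definition translates directly into code: keep the region pixels that have at least one background neighbor, using 8-connectivity between region and background as recommended above. A sketch for a binary NumPy image (1 = region, 0 = background); treating out-of-bounds locations as background is an assumption for when a region touches the image border:

```python
import numpy as np

def inner_border(img):
    """Pixels of the 1-valued region that touch the background (8-connectivity)."""
    M, N = img.shape
    border = np.zeros_like(img)
    for x in range(M):
        for y in range(N):
            if img[x, y] != 1:
                continue
            nbrs = [(x + i, y + j) for i in (-1, 0, 1) for j in (-1, 0, 1)
                    if not (i == 0 and j == 0)]
            # Out-of-bounds neighbors are treated as background here.
            if any(not (0 <= u < M and 0 <= v < N) or img[u, v] == 0
                   for u, v in nbrs):
                border[x, y] = 1
    return border
```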
The concept of an edge is found frequently in discussions dealing with regions
and boundaries. However, there is a key difference between these two concepts. The
boundary of a finite region forms a closed path and is thus a “global” concept. As we
will discuss in detail in Chapter 10, edges are formed from pixels with derivative val-
ues that exceed a preset threshold. Thus, an edge is a “local” concept that is based on
a measure of intensity-level discontinuity at a point. It is possible to link edge points
into edge segments, and sometimes these segments are linked in such a way that
they correspond to boundaries, but this is not always the case. The one exception in
which edges and boundaries correspond is in binary images. Depending on the type
of connectivity and edge operators used (we will discuss these in Chapter 10), the
edge extracted from a binary region will be the same as the region boundary. This is


† We make this assumption to avoid having to deal with special cases. This can be done without loss of generality
because if one or more regions touch the border of an image, we can simply pad the image with a 1-pixel-wide
border of background values.

Chapter 3 Intensity Transformations and Spatial Filtering

3.1 BACKGROUND

All the image processing techniques discussed in this chapter are implemented in
the spatial domain, which we know from the discussion in Section 2.4 is the plane
containing the pixels of an image. Spatial domain techniques operate directly on the
pixels of an image, as opposed, for example, to the frequency domain (the topic of
Chapter 4) in which operations are performed on the Fourier transform of an image,
rather than on the image itself. As you will learn in progressing through the book,
some image processing tasks are easier or more meaningful to implement in the
spatial domain, while others are best suited for other approaches.

THE BASICS OF INTENSITY TRANSFORMATIONS AND SPATIAL FILTERING
The spatial domain processes we discuss in this chapter are based on the expression

g( x, y) = T [ f ( x, y)] (3-1)

where f ( x, y) is an input image, g( x, y) is the output image, and T is an operator on f


defined over a neighborhood of point ( x, y). The operator can be applied to the pix-
els of a single image (our principal focus in this chapter) or to the pixels of a set of
images, such as performing the elementwise sum of a sequence of images for noise
reduction, as discussed in Section 2.6. Figure 3.1 shows the basic implementation of
Eq. (3-1) on a single image. The point ( x0 , y0 ) shown is an arbitrary location in the
image, and the small region shown is a neighborhood of ( x0 , y0 ), as explained in Sec-
tion 2.6. Typically, the neighborhood is rectangular, centered on ( x0 , y0 ), and much
smaller in size than the image.
The process that Fig. 3.1 illustrates consists of moving the center of the neighbor-
hood from pixel to pixel, and applying the operator T to the pixels in the neighbor-
hood to yield an output value at that location. Thus, for any specific location ( x0 , y0 ),

FIGURE 3.1 A 3 × 3 neighborhood about a point (x0, y0) in an image. The neighborhood is moved from pixel to pixel in the image to generate an output image. Recall from Chapter 2 that the value of a pixel at location (x0, y0) is f(x0, y0), the value of the image at that location.


the value of the output image g at those coordinates is equal to the result of apply-
ing T to the neighborhood with origin at ( x0 , y0 ) in f. For example, suppose that
the neighborhood is a square of size 3 × 3 and that operator T is defined as “com-
pute the average intensity of the pixels in the neighborhood.” Consider an arbitrary
location in an image, say (100, 150). The result at that location in the output image,
g(100, 150), is the sum of f (100, 150) and its 8-neighbors, divided by 9. The center of
the neighborhood is then moved to the next adjacent location and the procedure
is repeated to generate the next value of the output image g. Typically, the process starts at the top left of the input image and proceeds pixel by pixel in a horizontal (vertical) scan, one row (column) at a time. (Depending on the size of a neighborhood and its location, part of the neighborhood may lie outside the image. There are two solutions to this: (1) ignore the values outside the image, or (2) pad the image, as discussed in Section 3.4. The second approach is preferred.) We will discuss this type of neighborhood processing beginning in Section 3.4.
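A minimal sketch of the 3 × 3 averaging operator just described, written directly from Eq. (3-1); zero padding (the preferred approach (2) above) keeps the neighborhood defined at the borders:

```python
import numpy as np

def average_3x3(f):
    """g(x, y) = mean of f over a 3x3 neighborhood centered at (x, y)."""
    fp = np.pad(f, 1, mode='constant')        # pad so border neighborhoods exist
    g = np.zeros_like(f, dtype=float)
    M, N = f.shape
    for x in range(M):
        for y in range(N):
            g[x, y] = fp[x:x + 3, y:y + 3].mean()   # T = neighborhood average
    return g
```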
The smallest possible neighborhood is of size 1 × 1. In this case, g depends only on the value of f at a single point (x, y), and T in Eq. (3-1) becomes an intensity (also called a gray-level, or mapping) transformation function of the form

s = T(r)    (3-2)

where, for simplicity in notation, we use s and r to denote, respectively, the intensity
of g and f at any point ( x, y). For example, if T (r ) has the form in Fig. 3.2(a), the
result of applying the transformation to every pixel in f to generate the correspond-
ing pixels in g would be to produce an image of higher contrast than the original, by
darkening the intensity levels below k and brightening the levels above k. In this
technique, sometimes called contrast stretching (see Section 3.2), values of r lower
than k reduce (darken) the values of s, toward black. The opposite is true for values
of r higher than k. Observe how an intensity value r0 is mapped to obtain the cor-
responding value s0 . In the limiting case shown in Fig. 3.2(b), T (r ) produces a two-
level (binary) image. A mapping of this form is called a thresholding function. Some
fairly simple yet powerful processing approaches can be formulated with intensity
transformation functions. In this chapter, we use intensity transformations princi-
pally for image enhancement. In Chapter 10, we will use them for image segmenta-
tion. Approaches whose results depend only on the intensity at a point sometimes
are called point processing techniques, as opposed to the neighborhood processing
techniques discussed in the previous paragraph.
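Both functions in Fig. 3.2 are one-line point operations on an array. A sketch on intensities normalized to [0, 1]; the logistic curve used for contrast stretching is an illustrative choice, not the book's specific function:

```python
import numpy as np

def threshold(f, k):
    """Two-level mapping of Fig. 3.2(b): black below k, white at or above k."""
    return np.where(f >= k, 1.0, 0.0)

def contrast_stretch(f, k, gain=5.0):
    """S-shaped curve in the spirit of Fig. 3.2(a); darkens r < k, brightens r > k.

    The logistic form and the gain parameter are illustrative choices only.
    """
    return 1.0 / (1.0 + np.exp(-gain * (f - k)))
```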

FIGURE 3.2 Intensity transformation functions. (a) Contrast stretching function. (b) Thresholding function.


ABOUT THE EXAMPLES IN THIS CHAPTER


Although intensity transformation and spatial filtering methods span a broad range
of applications, most of the examples in this chapter are applications to image
enhancement. Enhancement is the process of manipulating an image so that the
result is more suitable than the original for a specific application. The word specific
is important, because it establishes at the outset that enhancement techniques are
problem-oriented. Thus, for example, a method that is quite useful for enhancing
X-ray images may not be the best approach for enhancing infrared images. There is
no general “theory” of image enhancement. When an image is processed for visual
interpretation, the viewer is the ultimate judge of how well a particular method
works. When dealing with machine perception, enhancement is easier to quantify.
For example, in an automated character-recognition system, the most appropriate
enhancement method is the one that results in the best recognition rate, leaving
aside other considerations such as computational requirements of one method
versus another. Regardless of the application or method used, image enhancement
is one of the most visually appealing areas of image processing. Beginners in image
processing generally find enhancement applications interesting and relatively sim-
ple to understand. Therefore, using examples from image enhancement to illustrate
the spatial processing methods developed in this chapter not only saves having an
extra chapter in the book dealing with image enhancement but, more importantly, is
an effective approach for introducing newcomers to image processing techniques in
the spatial domain. As you progress through the remainder of the book, you will find
that the material developed in this chapter has a scope that is much broader than
just image enhancement.

3.2 SOME BASIC INTENSITY TRANSFORMATION FUNCTIONS

Intensity transformations are among the simplest of all image processing techniques.
As noted in the previous section, we denote the values of pixels, before and after
processing, by r and s, respectively. These values are related by a transformation T,
as given in Eq. (3-2), that maps a pixel value r into a pixel value s. Because we deal
with digital quantities, values of an intensity transformation function typically are
stored in a table, and the mappings from r to s are implemented via table lookups.
For an 8-bit image, a lookup table containing the values of T will have 256 entries.
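The table-lookup implementation described here reduces to a single indexing operation in array languages. A sketch for an 8-bit image, with the identity transformation standing in for an arbitrary T:

```python
import numpy as np

# Precompute T(r) for every possible 8-bit intensity r = 0, 1, ..., 255.
r = np.arange(256)
lut = r.copy()                 # identity transformation; replace with any T(r)

def apply_lut(f, lut):
    """Map an 8-bit (integer) image through the 256-entry lookup table."""
    return lut[f]              # NumPy fancy indexing performs the table lookup
```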
As an introduction to intensity transformations, consider Fig. 3.3, which shows
three basic types of functions used frequently in image processing: linear (negative
and identity transformations), logarithmic (log and inverse-log transformations),
and power-law (nth power and nth root transformations). The identity function is
the trivial case in which the input and output intensities are identical.

IMAGE NEGATIVES
The negative of an image with intensity levels in the range [0, L − 1] is obtained by
using the negative transformation function shown in Fig. 3.3, which has the form:
s = L − 1 − r    (3-3)


FIGURE 3.3 Some basic intensity transformation functions (negative, nth root, log, nth power, inverse log (exponential), and identity). Each curve was scaled independently so that all curves would fit in the same graph. Our interest here is on the shapes of the curves, not on their relative values. (Axes: input intensity levels r and output intensity levels s, both from 0 to L − 1.)

Reversing the intensity levels of a digital image in this manner produces the
equivalent of a photographic negative. This type of processing is used, for example,
in enhancing white or gray detail embedded in dark regions of an image, especially
when the black areas are dominant in size. Figure 3.4 shows an example. The origi-
nal image is a digital mammogram showing a small lesion. Despite the fact that the
visual content is the same in both images, some viewers find it easier to analyze the
fine details of the breast tissue using the negative image.
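In code, Eq. (3-3) for an 8-bit image (L = 256) is a single vectorized expression; a sketch:

```python
def negative(f, L=256):
    """Image negative of Eq. (3-3): s = L - 1 - r, for intensities in [0, L-1]."""
    return (L - 1) - f     # for a uint8 image with L = 256, no overflow can occur
```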

FIGURE 3.4 (a) A digital mammogram. (b) Negative image obtained using Eq. (3-3). (Image (a) courtesy of General Electric Medical Systems.)


LOG TRANSFORMATIONS
The general form of the log transformation in Fig. 3.3 is
s = c log(1 + r ) (3-4)
where c is a constant and it is assumed that r ≥ 0. The shape of the log curve in Fig. 3.3
shows that this transformation maps a narrow range of low intensity values in the
input into a wider range of output levels. For example, note how input levels in the
range [0, L/4] map to output levels in the range [0, 3L/4]. Conversely, higher values
of input levels are mapped to a narrower range in the output. We use a transformation
of this type to expand the values of dark pixels in an image, while compressing the
higher-level values. The opposite is true of the inverse log (exponential) transformation.
Any curve having the general shape of the log function shown in Fig. 3.3 would
accomplish this spreading/compressing of intensity levels in an image, but the pow-
er-law transformations discussed in the next section are much more versatile for
this purpose. The log function has the important characteristic that it compresses
the dynamic range of pixel values. An example in which pixel values have a large
dynamic range is the Fourier spectrum, which we will discuss in Chapter 4. It is not
unusual to encounter spectrum values that range from 0 to 10⁶ or higher. Processing
numbers such as these presents no problems for a computer, but image displays can-
not reproduce faithfully such a wide range of values. The net effect is that intensity
detail can be lost in the display of a typical Fourier spectrum.
Figure 3.5(a) shows a Fourier spectrum with values in the range 0 to 1.5 × 10⁶.
When these values are scaled linearly for display in an 8-bit system, the brightest
pixels dominate the display, at the expense of lower (and just as important) values
of the spectrum. The effect of this dominance is illustrated vividly by the relatively
small area of the image in Fig. 3.5(a) that is not perceived as black. If, instead of
displaying the values in this manner, we first apply Eq. (3-4) (with c = 1 in this case)
to the spectrum values, then the range of values of the result becomes 0 to 6.2. Trans-
forming values in this way enables a greater range of intensities to be shown on the
display. Figure 3.5(b) shows the result of scaling the intensity range linearly to the

a b
FIGURE 3.5 (a) Fourier spectrum displayed as a grayscale image. (b) Result of applying the log transformation in Eq. (3-4) with c = 1. Both images are scaled to the range [0, 255].


interval [0, 255] and showing the spectrum in the same 8-bit display. The level of
detail visible in this image as compared to an unmodified display of the spectrum
is evident from these two images. Most of the Fourier spectra in image processing
publications, including this book, have been scaled in this manner.
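The scaling just described is easy to reproduce. Below is a minimal Python/NumPy sketch (the helper names and the random test array are ours), applying Eq. (3-4) and then scaling the result linearly to [0, 255] for display:

    import numpy as np

    def log_transform(image, c=1.0):
        # Eq. (3-4): s = c * log(1 + r), assuming r >= 0.
        return c * np.log1p(image.astype(np.float64))

    def scale_to_8bit(image):
        # Scale an array linearly to the display range [0, 255].
        image = image - image.min()
        if image.max() > 0:
            image = image / image.max()
        return np.round(255 * image).astype(np.uint8)

    # A spectrum-like array with a large dynamic range, used as a stand-in.
    spectrum = np.abs(np.fft.fft2(np.random.rand(256, 256)))
    display = scale_to_8bit(log_transform(spectrum))

Comparing this display with scale_to_8bit(spectrum) itself reproduces the effect of Fig. 3.5: without the log, a few bright pixels dominate and most of the spectrum is rendered black.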

POWER-LAW (GAMMA) TRANSFORMATIONS


Power-law transformations have the form
s = c r^g    (3-5)

where c and g are positive constants. Sometimes Eq. (3-5) is written as s = c(r + ε)^g to account for offsets (that is, a measurable output when the input is zero). However, offsets typically are an issue of display calibration, and as a result they are normally ignored in Eq. (3-5). Figure 3.6 shows plots of s as a function of r for various values
of g. As with log transformations, power-law curves with fractional values of g map
a narrow range of dark input values into a wider range of output values, with the
opposite being true for higher values of input levels. Note also in Fig. 3.6 that a fam-
ily of transformations can be obtained simply by varying g. Curves generated with
values of g > 1 have exactly the opposite effect as those generated with values of
g < 1. When c = g = 1 Eq. (3-5) reduces to the identity transformation.
The response of many devices used for image capture, printing, and display obeys
a power law. By convention, the exponent in a power-law equation is referred to as
gamma [hence our use of this symbol in Eq. (3-5)]. The process used to correct these
power-law response phenomena is called gamma correction or gamma encoding.
For example, cathode ray tube (CRT) devices have an intensity-to-voltage response
that is a power function, with exponents varying from approximately 1.8 to 2.5. As
the curve for g = 2.5 in Fig. 3.6 shows, such display systems would tend to produce

FIGURE 3.6 Plots of the gamma equation s = c r^g for various values of g (c = 1 in all cases; curves are shown for g = 0.04, 0.10, 0.20, 0.40, 0.67, 1, 1.5, 2.5, 5.0, 10.0, and 25.0). Each curve was scaled independently so that all curves would fit in the same graph. Our interest here is on the shapes of the curves, not on their relative values. [The plot shows output intensity levels, s, versus input intensity levels, r, both over the range 0 to L − 1.]


a b
c d
FIGURE 3.7 (a) Intensity ramp image. (b) Image as viewed on a simulated monitor with a gamma of 2.5. (c) Gamma-corrected image. (d) Corrected image as viewed on the same monitor. Compare (d) and (a).

images that are darker than intended. Figure 3.7 illustrates this effect. Figure 3.7(a) is an image of an intensity ramp input into a monitor with a gamma of 2.5. As expected, the output of the monitor appears darker than the input, as Fig. 3.7(b) shows. In this case, gamma correction consists of using the transformation s = r^(1/2.5) = r^0.4 to preprocess the image before inputting it into the monitor. Figure 3.7(c) is the result. When input into the same monitor, the gamma-corrected image produces an output that is close in appearance to the original image, as Fig. 3.7(d) shows. A similar analysis would apply to other imaging devices, such as scanners and printers, the difference being the device-dependent value of gamma (Poynton [1996]).
(Sometimes, a higher gamma makes the displayed image look better to viewers than the original because of an increase in contrast. However, the objective of gamma correction is to produce a faithful display of an input image.)
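For concreteness, here is a minimal sketch of gamma correction in Python/NumPy (the normalization convention and the names are ours): intensities are scaled to [0, 1], Eq. (3-5) is applied, and the result is scaled back to 8 bits.

    import numpy as np

    def gamma_correct(image, gamma, c=1.0, L=256):
        # Eq. (3-5): s = c * r**gamma, computed on intensities
        # normalized to [0, 1] and rescaled to [0, L - 1].
        r = image.astype(np.float64) / (L - 1)
        s = c * np.power(r, gamma)
        return np.round(np.clip(s, 0, 1) * (L - 1)).astype(np.uint8)

    # Pre-correct a ramp for a display whose response is roughly gamma = 2.5.
    ramp = np.tile(np.arange(256, dtype=np.uint8), (32, 1))
    corrected = gamma_correct(ramp, gamma=1 / 2.5)   # s = r**0.4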

EXAMPLE 3.1 : Contrast enhancement using power-law intensity transformations.


In addition to gamma correction, power-law transformations are useful for general-purpose contrast
manipulation. Figure 3.8(a) shows a magnetic resonance image (MRI) of a human upper thoracic spine
with a fracture dislocation. The fracture is visible in the region highlighted by the circle. Because the
image is predominantly dark, an expansion of intensity levels is desirable. This can be accomplished
using a power-law transformation with a fractional exponent. The other images shown in the figure were
obtained by processing Fig. 3.8(a) with the power-law transformation function of Eq. (3-5). The values


a b
c d
FIGURE 3.8
(a) Magnetic
resonance
image (MRI) of a
fractured human
spine (the region
of the fracture is
enclosed by the
circle).
(b)–(d) Results of
applying the
transformation
in Eq. (3-5)
with c = 1 and
g = 0.6, 0.4, and
0.3, respectively.
(Original image
courtesy of Dr.
David R. Pickens,
Department of
Radiology and
Radiological
Sciences,
Vanderbilt
University
Medical Center.)

of gamma corresponding to images (b) through (d) are 0.6, 0.4, and 0.3, respectively (c = 1 in all cases).
Observe that as gamma decreased from 0.6 to 0.4, more detail became visible. A further decrease of
gamma to 0.3 enhanced a little more detail in the background, but began to reduce contrast to the point
where the image started to have a very slight “washed-out” appearance, especially in the background.
The best enhancement in terms of contrast and discernible detail was obtained with g = 0.4. A value of
g = 0.3 is an approximate limit below which contrast in this particular image would be reduced to an
unacceptable level.

EXAMPLE 3.2 : Another illustration of power-law transformations.


Figure 3.9(a) shows the opposite problem of that presented in Fig. 3.8(a). The image to be processed


a b
c d
FIGURE 3.9
(a) Aerial image.
(b)–(d) Results
of applying the
transformation
in Eq. (3-5) with
g = 3.0, 4.0, and
5.0, respectively.
(c = 1 in all cases.)
(Original image
courtesy of
NASA.)

now has a washed-out appearance, indicating that a compression of intensity levels is desirable. This can
be accomplished with Eq. (3-5) using values of g greater than 1. The results of processing Fig. 3.9(a) with
g = 3.0, 4.0, and 5.0 are shown in Figs. 3.9(b) through (d), respectively. Suitable results were obtained
using gamma values of 3.0 and 4.0. The latter result has a slightly more appealing appearance because it
has higher contrast. This is true also of the result obtained with g = 5.0. For example, the airport runways
near the middle of the image appear clearer in Fig. 3.9(d) than in any of the other three images.

PIECEWISE LINEAR TRANSFORMATION FUNCTIONS


An approach complementary to the methods discussed in the previous three sec-
tions is to use piecewise linear functions. The advantage of these functions over those
discussed thus far is that the form of piecewise functions can be arbitrarily complex.
In fact, as you will see shortly, a practical implementation of some important trans-
formations can be formulated only as piecewise linear functions. The main disadvan-
tage of these functions is that their specification requires considerable user input.


Contrast Stretching
Low-contrast images can result from poor illumination, lack of dynamic range in the
imaging sensor, or even the wrong setting of a lens aperture during image acquisi-
tion. Contrast stretching expands the range of intensity levels in an image so that it
spans the ideal full intensity range of the recording medium or display device.
Figure 3.10(a) shows a typical transformation used for contrast stretching. The
locations of points (r1 , s1 ) and (r2 , s2 ) control the shape of the transformation function.
If r1 = s1 and r2 = s2 the transformation is a linear function that produces no changes
in intensity. If r1 = r2 , s1 = 0, and s2 = L − 1 the transformation becomes a threshold-
ing function that creates a binary image [see Fig. 3.2(b)]. Intermediate values of (r1 , s1 )
and (r2, s2) produce various degrees of spread in the intensity levels of the output
image, thus affecting its contrast. In general, r1 ≤ r2 and s1 ≤ s2 is assumed so that
the function is single valued and monotonically increasing. This preserves the order
of intensity levels, thus preventing the creation of intensity artifacts. Figure 3.10(b)
shows an 8-bit image with low contrast. Figure 3.10(c) shows the result of contrast
stretching, obtained by setting (r1 , s1 ) = (rmin , 0) and (r2 , s2 ) = (rmax , L − 1), where
rmin and rmax denote the minimum and maximum intensity levels in the input image,

a b
c d
FIGURE 3.10 Contrast stretching. (a) Piecewise linear transformation function T(r), with control points (r1, s1) and (r2, s2). (b) A low-contrast electron microscope image of pollen, magnified 700 times. (c) Result of contrast stretching. (d) Result of thresholding. (Original image courtesy of Dr. Roger Heady, Research School of Biological Sciences, Australian National University, Canberra, Australia.)


a b L1 L1
FIGURE 3.11
(a) This transfor-
mation function
highlights range
[ A, B] and reduces
all other intensities s s T (r)
T (r)
to a lower level.
(b) This function
highlights range
[ A, B] and leaves
other intensities
unchanged. r r
0 A B L1 0 A B L1

respectively. The transformation stretched the intensity levels linearly to the full
intensity range, [0, L − 1]. Finally, Fig. 3.10(d) shows the result of using the thresh-
olding function, with (r1 , s1 ) = (m, 0) and (r2 , s2 ) = (m, L − 1), where m is the mean
intensity level in the image.
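A sketch of both operations in Python/NumPy follows (the names are ours; np.interp builds the three linear segments of Fig. 3.10(a), and 0 < r1 < r2 < L − 1 is assumed so that the breakpoints increase):

    import numpy as np

    def contrast_stretch(image, r1, s1, r2, s2, L=256):
        # Piecewise linear mapping through (r1, s1) and (r2, s2);
        # requires 0 < r1 < r2 < L - 1.
        r = image.astype(np.float64)
        s = np.interp(r, [0, r1, r2, L - 1], [0, s1, s2, L - 1])
        return np.round(s).astype(np.uint8)

    img = np.random.randint(90, 160, size=(128, 128)).astype(np.uint8)  # low contrast
    stretched = contrast_stretch(img, img.min(), 0, img.max(), 255)
    # The degenerate thresholding case (r1 = r2 = m) is simpler to state directly:
    binary = np.where(img > img.mean(), 255, 0).astype(np.uint8)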

Intensity-Level Slicing
There are applications in which it is of interest to highlight a specific range of inten-
sities in an image. Some of these applications include enhancing features in satellite
imagery, such as masses of water, and enhancing flaws in X-ray images. The method,
called intensity-level slicing, can be implemented in several ways, but most are varia-
tions of two basic themes. One approach is to display in one value (say, white) all the
values in the range of interest and in another (say, black) all other intensities. This
transformation, shown in Fig. 3.11(a), produces a binary image. The second approach,
based on the transformation in Fig. 3.11(b), brightens (or darkens) the desired range
of intensities, but leaves all other intensity levels in the image unchanged.
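Both slicing themes reduce to simple mask operations. A minimal Python/NumPy sketch (the function names are ours; the second function changes the band to a chosen value, as in Example 3.3 below, rather than brightening it):

    import numpy as np

    def slice_binary(image, A, B, L=256):
        # Fig. 3.11(a)-style: white inside [A, B], black elsewhere.
        return np.where((image >= A) & (image <= B), L - 1, 0).astype(np.uint8)

    def slice_preserve(image, A, B, value):
        # Fig. 3.11(b)-style: change only the band [A, B], keep the rest.
        out = image.copy()
        out[(image >= A) & (image <= B)] = value
        return out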

EXAMPLE 3.3 : Intensity-level slicing.


Figure 3.12(a) is an aortic angiogram near the kidney area (see Section 1.3 for details on this image). The
objective of this example is to use intensity-level slicing to enhance the major blood vessels that appear
lighter than the background, as a result of an injected contrast medium. Figure 3.12(b) shows the result
of using a transformation of the form in Fig. 3.11(a). The selected band was near the top of the intensity
scale because the range of interest is brighter than the background. The net result of this transformation
is that the blood vessel and parts of the kidneys appear white, while all other intensities are black. This
type of enhancement produces a binary image, and is useful for studying the shape characteristics of the
flow of the contrast medium (to detect blockages, for example).
If interest lies in the actual intensity values of the region of interest, we can use the transformation of
the form shown in Fig. 3.11(b). Figure 3.12(c) shows the result of using such a transformation in which
a band of intensities in the mid-gray region around the mean intensity was set to black, while all other
intensities were left unchanged. Here, we see that the gray-level tonality of the major blood vessels and
part of the kidney area were left intact. Such a result might be useful when interest lies in measuring the
actual flow of the contrast medium as a function of time in a sequence of images.


a b c
FIGURE 3.12 (a) Aortic angiogram. (b) Result of using a slicing transformation of the type illustrated in Fig. 3.11(a),
with the range of intensities of interest selected in the upper end of the gray scale. (c) Result of using the transfor-
mation in Fig. 3.11(b), with the selected range set near black, so that the grays in the area of the blood vessels and
kidneys were preserved. (Original image courtesy of Dr. Thomas R. Gest, University of Michigan Medical School.)

Bit-Plane Slicing
Pixel values are integers composed of bits. For example, values in a 256-level gray-
scale image are composed of 8 bits (one byte). Instead of highlighting intensity-level
ranges, we could highlight the contribution made to total image appearance
by specific bits. As Fig. 3.13 illustrates, an 8-bit image may be considered as being
composed of eight one-bit planes, with plane 1 containing the lowest-order bit of all
pixels in the image, and plane 8 all the highest-order bits.
Figure 3.14(a) shows an 8-bit grayscale image and Figs. 3.14(b) through (i) are
its eight one-bit planes, with Fig. 3.14(b) corresponding to the highest-order bit.
Observe that the four higher-order bit planes, especially the first two, contain a sig-
nificant amount of the visually-significant data. The lower-order planes contribute
to more subtle intensity details in the image. The original image has a gray border
whose intensity is 194. Notice that the corresponding borders of some of the bit

FIGURE 3.13 Bit-planes of an 8-bit image. One 8-bit byte decomposes into bit plane 8 (most significant) through bit plane 1 (least significant).


a b c
d e f
g h i
FIGURE 3.14 (a) An 8-bit gray-scale image of size 550 × 1192 pixels. (b) through (i) Bit planes 8 through 1, with bit
plane 1 corresponding to the least significant bit. Each bit plane is a binary image.

planes are black (0), while others are white (1). To see why, consider a pixel in, say,
the middle of the lower border of Fig. 3.14(a). The corresponding pixels in the bit
planes, starting with the highest-order plane, have values 1 1 0 0 0 0 1 0, which is the
binary representation of decimal 194. The value of any pixel in the original image
can be similarly reconstructed from its corresponding binary-valued pixels in the bit
planes by converting an 8-bit binary sequence to decimal.
The binary image for the 8th bit plane of an 8-bit image can be obtained by thresh-
olding the input image with a transformation function that maps to 0 intensity values
between 0 and 127, and maps to 1 values between 128 and 255. The binary image in
Fig. 3.14(b) was obtained in this manner. It is left as an exercise (see Problem 3.3) to
obtain the transformation functions for generating the other bit planes.
Decomposing an image into its bit planes is useful for analyzing the relative
importance of each bit in the image, a process that aids in determining the adequacy
of the number of bits used to quantize the image. Also, this type of decomposition
is useful for image compression (the topic of Chapter 8), in which fewer than all
planes are used in reconstructing an image. For example, Fig. 3.15(a) shows an image
reconstructed using bit planes 8 and 7 of the preceding decomposition. The recon-
struction is done by multiplying the pixels of the nth plane by the constant 2 n−1. This
converts the nth significant binary bit to decimal. Each bit plane is multiplied by the
corresponding constant, and all resulting planes are added to obtain the grayscale
image. Thus, to obtain Fig. 3.15(a), we multiplied bit plane 8 by 128, bit plane 7 by 64,
and added the two planes. Although the main features of the original image were
restored, the reconstructed image appears flat, especially in the background. This


a b c FIGURE 3.15 Image reconstructed from bit planes: (a) 8 and 7; (b) 8, 7, and 6; (c) 8, 7, 6, and 5.

is not surprising, because two planes can produce only four distinct intensity lev-
els. Adding plane 6 to the reconstruction helped the situation, as Fig. 3.15(b) shows.
Note that the background of this image has perceptible false contouring. This effect
is reduced significantly by adding the 5th plane to the reconstruction, as Fig. 3.15(c)
illustrates. Using more planes in the reconstruction would not contribute significant-
ly to the appearance of this image. Thus, we conclude that, in this example, storing
the four highest-order bit planes would allow us to reconstruct the original image
in acceptable detail. Storing these four planes instead of the original image requires
50% less storage.
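Bit-plane extraction and the reconstruction just described are both single-line bit operations. A minimal Python/NumPy sketch (the names and the random test image are ours):

    import numpy as np

    def bit_plane(image, n):
        # Extract bit plane n (n = 1 is the least significant) as a 0/1 array.
        return (image >> (n - 1)) & 1

    def reconstruct(image, planes):
        # Multiply each plane n by 2**(n - 1) and sum, as described in the text.
        out = np.zeros(image.shape, dtype=np.uint16)
        for n in planes:
            out += bit_plane(image, n).astype(np.uint16) << (n - 1)
        return out.astype(np.uint8)

    img = np.random.randint(0, 256, size=(64, 64), dtype=np.uint8)
    approx = reconstruct(img, planes=[8, 7, 6, 5])   # four highest-order planes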

3.3 HISTOGRAM PROCESSING



Let rk, for k = 0, 1, 2, …, L − 1, denote the intensities of an L-level digital image, f(x, y). The unnormalized histogram of f is defined as

h(rk) = nk    for k = 0, 1, 2, …, L − 1    (3-6)

where nk is the number of pixels in f with intensity rk, and the subdivisions of the intensity scale are called histogram bins. Similarly, the normalized histogram of f is defined as

p(rk) = h(rk)/MN = nk/MN    (3-7)

where, as usual, M and N are the number of image rows and columns, respectively.
Mostly, we work with normalized histograms, which we refer to simply as histograms
or image histograms. The sum of p(rk ) for all values of k is always 1. The components
of p(rk ) are estimates of the probabilities of intensity levels occurring in an image.
As you will learn in this section, histogram manipulation is a fundamental tool in
image processing. Histograms are simple to compute and are also suitable for fast
hardware implementations, thus making histogram-based techniques a popular tool
for real-time image processing.
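Computing Eqs. (3-6) and (3-7) is a single pass over the image. A minimal Python/NumPy sketch (the names and the random test image are ours):

    import numpy as np

    def histogram(image, L=256):
        # Unnormalized histogram h(r_k) of Eq. (3-6).
        return np.bincount(image.ravel(), minlength=L)

    def normalized_histogram(image, L=256):
        # Normalized histogram p(r_k) = n_k / MN of Eq. (3-7).
        return histogram(image, L) / image.size

    img = np.random.randint(0, 256, size=(64, 64), dtype=np.uint8)
    p = normalized_histogram(img)
    assert np.isclose(p.sum(), 1.0)   # the p(r_k) always sum to 1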
Histogram shape is related to image appearance. For example, Fig. 3.16 shows
images with four basic intensity characteristics: dark, light, low contrast, and high
contrast; the image histograms are also shown. We note in the dark image that the
most populated histogram bins are concentrated on the lower (dark) end of the
intensity scale. Similarly, the most populated bins of the light image are biased
toward the higher end of the scale.

a b c d
FIGURE 3.16 Four image types and their corresponding histograms. (a) dark; (b) light; (c) low contrast; (d) high contrast. The horizontal axes of the histograms are values of rk and the vertical axes are values of p(rk).

An image with low contrast has a narrow histogram located typically toward the middle of the intensity scale, as Fig. 3.16(c) shows.
For a monochrome image, this implies a dull, washed-out gray look. Finally, we see
that the components of the histogram of the high-contrast image cover a wide range
of the intensity scale, and the distribution of pixels is not too far from uniform, with
few bins being much higher than the others. Intuitively, it is reasonable to conclude
that an image whose pixels tend to occupy the entire range of possible intensity lev-
els and, in addition, tend to be distributed uniformly, will have an appearance of high
contrast and will exhibit a large variety of gray tones. The net effect will be an image
that shows a great deal of gray-level detail and has a high dynamic range. As you will
see shortly, it is possible to develop a transformation function that can achieve this
effect automatically, using only the histogram of an input image.

HISTOGRAM EQUALIZATION
Assuming initially continuous intensity values, let the variable r denote the intensi-
ties of an image to be processed. As usual, we assume that r is in the range [0, L − 1],
with r = 0 representing black and r = L − 1 representing white. For r satisfying these
conditions, we focus attention on transformations (intensity mappings) of the form

s = T (r ) 0 ≤ r ≤ L−1 (3-8)


a b
FIGURE 3.17 (a) Monotonic increasing function, showing how multiple values can map to a single value. (b) Strictly monotonic increasing function. This is a one-to-one mapping, both ways. [Both panels plot T(r) versus r over the range 0 to L − 1.]

that produce an output intensity value, s, for a given intensity value r in the input
image. We assume that

(a) T (r ) is a monotonic† increasing function in the interval 0 ≤ r ≤ L − 1; and


(b) 0 ≤ T (r ) ≤ L − 1 for 0 ≤ r ≤ L − 1.

In some formulations to be discussed shortly, we use the inverse transformation

r = T −1 ( s) 0 ≤ s ≤ L−1 (3-9)

in which case we change condition (a) to:

(a′) T(r) is a strictly monotonic increasing function in the interval 0 ≤ r ≤ L − 1.

The condition in (a) that T(r) be monotonically increasing guarantees that the order of intensity values is preserved from input to output, thus preventing
artifacts created by reversals of intensity. Condition (b) guarantees that the range of
output intensities is the same as the input. Finally, condition (a) guarantees that the
mappings from s back to r will be one-to-one, thus preventing ambiguities.
Figure 3.17(a) shows a function that satisfies conditions (a) and (b). Here, we see
that it is possible for multiple input values to map to a single output value and still
satisfy these two conditions. That is, a monotonic transformation function performs
a one-to-one or many-to-one mapping. This is perfectly fine when mapping from r
to s. However, Fig. 3.17(a) presents a problem if we wanted to recover the values of
r uniquely from the mapped values (inverse mapping can be visualized by revers-
ing the direction of the arrows). This would be possible for the inverse mapping
of sk in Fig. 3.17(a), but the inverse mapping of sq is a range of values, which, of
course, prevents us in general from recovering the original value of r that resulted

† A function T(r) is a monotonic increasing function if T(r2) ≥ T(r1) for r2 > r1. T(r) is a strictly monotonic increas-
ing function if T (r2 ) > T (r1 ) for r2 > r1 . Similar definitions apply to a monotonic decreasing function.


in sq . As Fig. 3.17(b) shows, requiring that T (r ) be strictly monotonic guarantees


that the inverse mappings will be single valued (i.e., the mapping is one-to-one in
both directions). This is a theoretical requirement that will allow us to derive some
important histogram processing techniques later in this chapter. Because images are
stored using integer intensity values, we are forced to round all results to their near-
est integer values. This often results in strict monotonicity not being satisfied, which
implies inverse transformations that may not be unique. Fortunately, this problem is
not difficult to handle in the discrete case, as Example 3.7 in this section illustrates.
The intensity of an image may be viewed as a random variable in the interval
[0, L − 1]. Let pr (r ) and ps ( s) denote the PDFs of intensity values r and s in two dif-
ferent images. The subscripts on p indicate that pr and ps are different functions. A
fundamental result from probability theory is that if pr (r ) and T (r ) are known, and
T (r ) is continuous and differentiable over the range of values of interest, then the
PDF of the transformed (mapped) variable s can be obtained as

ps(s) = pr(r) |dr/ds|    (3-10)

Thus, we see that the PDF of the output intensity variable, s, is determined by the
PDF of the input intensities and the transformation function used [recall that r and
s are related by T (r )].
A transformation function of particular importance in image processing is

s = T(r) = (L − 1) ∫_0^r pr(w) dw    (3-11)

where w is a dummy variable of integration. The integral on the right side is the
cumulative distribution function (CDF) of random variable r. Because PDFs always
are positive, and the integral of a function is the area under the function, it follows
that the transformation function of Eq. (3-11) satisfies condition (a). This is because
the area under the function cannot decrease as r increases. When the upper limit in
this equation is r = (L − 1) the integral evaluates to 1, as it must for a PDF. Thus, the
maximum value of s is L − 1, and condition (b) is satisfied also.
We use Eq. (3-10) to find the ps ( s) corresponding to the transformation just dis-
cussed. We know from Leibniz’s rule in calculus that the derivative of a definite
integral with respect to its upper limit is the integrand evaluated at the limit. That is,

ds/dr = dT(r)/dr = (L − 1) d/dr [ ∫_0^r pr(w) dw ] = (L − 1) pr(r)    (3-12)


a b
FIGURE 3.18 (a) An arbitrary PDF, pr(r). (b) Result of applying Eq. (3-11) to the input PDF: the uniform PDF ps(s) = 1/(L − 1). The resulting PDF is always uniform, independently of the shape of the input.

Substituting this result for dr/ds in Eq. (3-10), and noting that all probability values are positive, gives the result

ps(s) = pr(r) |dr/ds| = pr(r) [1/((L − 1) pr(r))] = 1/(L − 1)    0 ≤ s ≤ L − 1    (3-13)
We recognize the form of ps ( s) in the last line of this equation as a uniform prob-
ability density function. Thus, performing the intensity transformation in Eq. (3-11)
yields a random variable, s, characterized by a uniform PDF. What is important is
that ps ( s) in Eq. (3-13) will always be uniform, independently of the form of pr (r ).
Figure 3.18 and the following example illustrate these concepts.

EXAMPLE 3.4 : Illustration of Eqs. (3-11) and (3-13).


Suppose that the (continuous) intensity values in an image have the PDF

pr(r) = 2r/(L − 1)²  for 0 ≤ r ≤ L − 1, and pr(r) = 0 otherwise

From Eq. (3-11),

s = T(r) = (L − 1) ∫_0^r pr(w) dw = [2/(L − 1)] ∫_0^r w dw = r²/(L − 1)


Suppose that we form a new image with intensities, s, obtained using this transformation; that is, the s values are formed by squaring the corresponding intensity values of the input image, then dividing them by L − 1. We can verify that the PDF of the intensities in the new image, ps(s), is uniform by substituting pr(r) into Eq. (3-13), and using the fact that s = r²/(L − 1); that is,

ps(s) = pr(r) |dr/ds| = [2r/(L − 1)²] [ds/dr]⁻¹ = [2r/(L − 1)²] [d(r²/(L − 1))/dr]⁻¹
      = [2r/(L − 1)²] [(L − 1)/(2r)] = 1/(L − 1)

The last step follows because r is nonnegative and L > 1. As expected, the result is a uniform PDF.

For discrete values, we work with probabilities and summations instead of prob-
ability density functions and integrals (but the requirement of monotonicity stated
earlier still applies). Recall that the probability of occurrence of intensity level rk in
a digital image is approximated by
pr(rk) = nk/MN    (3-14)
where MN is the total number of pixels in the image, and nk denotes the number of
pixels that have intensity rk . As noted in the beginning of this section, pr (rk ), with
rk ∈ [0, L − 1], is commonly referred to as a normalized image histogram.
The discrete form of the transformation in Eq. (3-11) is
sk = T(rk) = (L − 1) ∑_{j=0}^{k} pr(rj)    k = 0, 1, 2, …, L − 1    (3-15)

where, as before, L is the number of possible intensity levels in the image (e.g., 256
for an 8-bit image). Thus, a processed (output) image is obtained by using Eq. (3-15)
to map each pixel in the input image with intensity rk into a corresponding pixel with
level sk in the output image. This is called a histogram equalization or histogram
linearization transformation. It is not difficult to show (see Problem 3.9) that this
transformation satisfies conditions (a) and (b) stated previously in this section.
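In code, Eq. (3-15) is a cumulative sum used as a lookup table. The sketch below in Python/NumPy (the names are ours) also reproduces the numbers of Example 3.5, which follows:

    import numpy as np

    def equalize(image, L=256):
        # Eq. (3-15): s_k = (L - 1) * sum_{j<=k} p_r(r_j), rounded to integers.
        p = np.bincount(image.ravel(), minlength=L) / image.size
        lut = np.round((L - 1) * np.cumsum(p)).astype(np.uint8)
        return lut[image]   # map every pixel through the table

    # The 3-bit distribution of Table 3.1 (see Example 3.5 below):
    p = np.array([790, 1023, 850, 656, 329, 245, 122, 81]) / 4096
    print(np.round(7 * np.cumsum(p)))   # -> [1. 3. 5. 6. 6. 7. 7. 7.]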

EXAMPLE 3.5 : Illustration of the mechanics of histogram equalization.


It will be helpful to work through a simple example. Suppose that a 3-bit image (L = 8) of size 64 × 64
pixels (MN = 4096) has the intensity distribution in Table 3.1, where the intensity levels are integers in
the range [0, L − 1] = [0, 7]. The histogram of this image is sketched in Fig. 3.19(a). Values of the histo-
gram equalization transformation function are obtained using Eq. (3-15). For instance,
s0 = T(r0) = 7 ∑_{j=0}^{0} pr(rj) = 7 pr(r0) = 1.33


TABLE 3.1 Intensity distribution and histogram values for a 3-bit, 64 × 64 digital image.

    rk          nk        pr(rk) = nk/MN
    r0 = 0      790       0.19
    r1 = 1      1023      0.25
    r2 = 2      850       0.21
    r3 = 3      656       0.16
    r4 = 4      329       0.08
    r5 = 5      245       0.06
    r6 = 6      122       0.03
    r7 = 7      81        0.02

Similarly, s1 = T (r1 ) = 3.08, s2 = 4.55, s3 = 5.67, s4 = 6.23, s5 = 6.65, s6 = 6.86, and s7 = 7.00. This trans-
formation function has the staircase shape shown in Fig. 3.19(b).
At this point, the s values are fractional because they were generated by summing probability values,
so we round them to their nearest integer values in the range [0, 7] :

s0 = 1.33 → 1 s2 = 4.55 → 5 s4 = 6.23 → 6 s6 = 6.86 → 7


s1 = 3.08 → 3 s3 = 5.67 → 6 s5 = 6.65 → 7 s7 = 7.00 → 7

These are the values of the equalized histogram. Observe that the transformation yielded only five
distinct intensity levels. Because r0 = 0 was mapped to s0 = 1, there are 790 pixels in the histogram
equalized image with this value (see Table 3.1). Also, there are 1023 pixels with a value of s1 = 3 and 850
pixels with a value of s2 = 5. However, both r3 and r4 were mapped to the same value, 6, so there are
(656 + 329) = 985 pixels in the equalized image with this value. Similarly, there are (245 + 122 + 81) = 448
pixels with a value of 7 in the histogram equalized image. Dividing these numbers by MN = 4096 yield-
ed the equalized histogram in Fig. 3.19(c).
Because a histogram is an approximation to a PDF, and no new allowed intensity levels are created
in the process, perfectly flat histograms are rare in practical applications of histogram equalization using
the method just discussed. Thus, unlike its continuous counterpart, it cannot be proved in general that
discrete histogram equalization using Eq. (3-15) results in a uniform histogram (we will introduce later in

a b c
FIGURE 3.19 Histogram equalization. (a) Original histogram, pr(rk). (b) Transformation function, sk = T(rk). (c) Equalized histogram, ps(sk).


this section an approach for removing this limitation). However, as you will see shortly, using Eq. (3-15)
has the general tendency to spread the histogram of the input image so that the intensity levels of the
equalized image span a wider range of the intensity scale. The net result is contrast enhancement.

We discussed earlier the advantages of having intensity values that span the entire
gray scale. The method just derived produces intensities that have this tendency, and
also has the advantage that it is fully automatic. In other words, the process of his-
togram equalization consists entirely of implementing Eq. (3-15), which is based on
information that can be extracted directly from a given image, without the need for
any parameter specifications. This automatic, “hands-off” characteristic is important.
The inverse transformation from s back to r is denoted by

rk = T −1 ( sk ) (3-16)

It can be shown (see Problem 3.9) that this inverse transformation satisfies conditions
(a) and (b) defined earlier only if all intensity levels are present in the input image.
This implies that none of the bins of the image histogram are empty. Although the
inverse transformation is not used in histogram equalization, it plays a central role
in the histogram-matching scheme developed after the following example.

EXAMPLE 3.6 : Histogram equalization.


The left column in Fig. 3.20 shows the four images from Fig. 3.16, and the center column shows the result
of performing histogram equalization on each of these images. The first three results from top to bottom
show significant improvement. As expected, histogram equalization did not have much effect on the
fourth image because its intensities span almost the full scale already. Figure 3.21 shows the transforma-
tion functions used to generate the equalized images in Fig. 3.20. These functions were generated using
Eq. (3-15). Observe that transformation (4) is nearly linear, indicating that the inputs were mapped to
nearly equal outputs. Shown is the mapping of an input value rk to a corresponding output value sk . In
this case, the mapping was for image 1 (on the top left of Fig. 3.21), and indicates that a dark value was
mapped to a much lighter one, thus contributing to the brightness of the output image.
The third column in Fig. 3.20 shows the histograms of the equalized images. While all the histograms
are different, the histogram-equalized images themselves are visually very similar. This is not totally
unexpected because the basic difference between the images on the left column is one of contrast, not
content. Because the images have the same content, the increase in contrast resulting from histogram
equalization was enough to render any intensity differences between the equalized images visually
indistinguishable. Given the significant range of contrast differences in the original images, this example
illustrates the power of histogram equalization as an adaptive, autonomous contrast-enhancement tool.

HISTOGRAM MATCHING (SPECIFICATION)


As explained in the last section, histogram equalization produces a transformation
function that seeks to generate an output image with a uniform histogram. When
automatic enhancement is desired, this is a good approach to consider because the


for x = 0, 1, 2, … , M − 1 and y = 0, 1, 2, … , N − 1, where, as indicated above, C, k0 , k1 , k2 , and k3 are


specified constants, mG is the global mean of the input image, and sG is its standard deviation. Param-
eters mSxy and sSxy are the local mean and standard deviation, respectively, which change for every loca-
tion ( x, y). As usual, M and N are the number of rows and columns in the input image.
Factors such as the values of the global mean and variance relative to values in the areas to be
enhanced play a key role in selecting the parameters in Eq. (3-29), as does the range of differences
between the intensities of the areas to be enhanced and their background. In the case of Fig. 3.27(a),
mG = 161, sG = 103, the maximum intensity values of the image and areas to be enhanced are 228 and
10, respectively, and the minimum values are 0 in both cases.
We would like for the maximum value of the enhanced features to be the same as the maximum value
of the image, so we select C = 22.8. The areas to be enhanced are quite dark relative to the rest of the
image, and they occupy less than a third of the image area; thus, we expect the mean intensity in the
dark areas to be much less than the global mean. Based on this, we let k0 = 0 and k1 = 0.1. Because the
areas to be enhanced are of very low contrast, we let k2 = 0. For the upper limit of acceptable values
of standard deviation we set k3 = 0.1, which gives us one-tenth of the global standard deviation. Figure
3.27(b) is the result of using Eq. (3-29) with these parameters. By comparing this figure with Fig. 3.26(c),
we see that the method based on local statistics detected the same hidden features as local histogram
equalization. But the present approach extracted significantly more detail. For example, we see that all
the objects are solid, but only the boundaries were detected by local histogram equalization. In addition,
note that the intensities of the objects are not the same, with the objects in the top-left and bottom-right
being brighter than the others. Also, the horizontal rectangles in the lower left square evidently are of
different intensities. Finally, note that the background in both the image and dark squares in Fig. 3.27(b)
is nearly the same as in the original image; by comparison, the same regions in Fig. 3.26(c) exhibit more
visible noise and have lost their gray-level content. Thus, the additional complexity required to use local
statistics yielded results in this case that are superior to local histogram equalization.
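Eq. (3-29) itself appears just before this excerpt; the sketch below assumes its standard form, in which a pixel is multiplied by C when its local mean lies in [k0·mG, k1·mG] and its local standard deviation lies in [k2·sG, k3·sG], and is left unchanged otherwise. The function name is ours, and the loop-based local statistics are deliberately simple rather than fast:

    import numpy as np

    def local_stats_enhance(f, C, k0, k1, k2, k3, size=3):
        # Assumed form of Eq. (3-29): g = C*f where the local mean and
        # standard deviation fall in the specified ranges; g = f elsewhere.
        f = f.astype(np.float64)
        mG, sG = f.mean(), f.std()          # global statistics
        pad = size // 2
        fp = np.pad(f, pad, mode='reflect')
        m = np.zeros_like(f)                # local means
        s = np.zeros_like(f)                # local standard deviations
        M, N = f.shape
        for x in range(M):
            for y in range(N):
                win = fp[x:x + size, y:y + size]
                m[x, y], s[x, y] = win.mean(), win.std()
        mask = (m >= k0 * mG) & (m <= k1 * mG) & (s >= k2 * sG) & (s <= k3 * sG)
        return np.where(mask, C * f, f)

With the parameters of the example (C = 22.8, k0 = 0, k1 = 0.1, k2 = 0, k3 = 0.1), the mask selects only dark, low-contrast neighborhoods, which is what confines the enhancement to the hidden objects.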

3.4 FUNDAMENTALS OF SPATIAL FILTERING



In this section, we discuss the use of spatial filters for image processing. Spatial filter-
ing is used in a broad spectrum of image processing applications, so a solid under-
standing of filtering principles is important. As mentioned at the beginning of this
chapter, the filtering examples in this section deal mostly with image enhancement.
Other applications of spatial filtering are discussed in later chapters.
The name filter is borrowed from frequency domain processing (the topic of
Chapter 4) where “filtering” refers to passing, modifying, or rejecting specified fre-
quency components of an image. For example, a filter that passes low frequencies
is called a lowpass filter. The net effect produced by a lowpass filter is to smooth an
image by blurring it. We can accomplish similar smoothing directly on the image
itself by using spatial filters.
Spatial filtering modifies an image by replacing the value of each pixel by a func-
tion of the values of the pixel and its neighbors. If the operation performed on the
image pixels is linear, then the filter is called a linear spatial filter. Otherwise, the
filter is a nonlinear spatial filter. (See Section 2.6 regarding linearity.) We will focus attention first on linear filters and then
introduce some basic nonlinear filters. Section 5.3 contains a more comprehensive
list of nonlinear filters and their application.


THE MECHANICS OF LINEAR SPATIAL FILTERING


A linear spatial filter performs a sum-of-products operation between an image f and a
filter kernel, w. The kernel is an array whose size defines the neighborhood of opera-
tion, and whose coefficients determine the nature of the filter. Other terms used to
refer to a spatial filter kernel are mask, template, and window. We use the term filter
kernel or simply kernel.
Figure 3.28 illustrates the mechanics of linear spatial filtering using a 3 × 3 ker-
nel. At any point ( x, y) in the image, the response, g( x, y), of the filter is the sum of
products of the kernel coefficients and the image pixels encompassed by the kernel:

g(x, y) = w(−1, −1) f(x − 1, y − 1) + w(−1, 0) f(x − 1, y) + …
        + w(0, 0) f(x, y) + … + w(1, 1) f(x + 1, y + 1)    (3-30)

As coordinates x and y are varied, the center of the kernel moves from pixel to pixel,
generating the filtered image, g, in the process.†
Observe that the center coefficient of the kernel, w(0, 0) , aligns with the pixel at
location ( x, y). For a kernel of size m × n, we assume that m = 2a + 1 and n = 2b + 1,
where a and b are nonnegative integers. This means that our focus is on kernels of odd size in both coordinate directions. (It certainly is possible to work with kernels of even size, or mixed even and odd sizes. However, working with odd sizes simplifies indexing and is also more intuitive because the kernels have centers falling on integer values, and they are spatially symmetric.) In general, linear spatial filtering of an image of size M × N with a kernel of size m × n is given by the expression

g(x, y) = ∑_{s=−a}^{a} ∑_{t=−b}^{b} w(s, t) f(x + s, y + t)    (3-31)

where x and y are varied so that the center (origin) of the kernel visits every pixel in
f once. For a fixed value of ( x, y), Eq. (3-31) implements the sum of products of the
form shown in Eq. (3-30), but for a kernel of arbitrary odd size. As you will learn in
the following section, this equation is a central tool in linear filtering.
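A direct (if slow) implementation of Eq. (3-31) makes the mechanics explicit. Below is a minimal Python/NumPy sketch with zero padding, so that the output has the same size as the input (the function names are ours; the convolution variant anticipates the next section, where convolution is shown to be correlation with a rotated kernel):

    import numpy as np

    def correlate2d(f, w):
        # Eq. (3-31): sum of products as the kernel center visits every pixel.
        m, n = w.shape                      # odd kernel sides assumed
        a, b = (m - 1) // 2, (n - 1) // 2
        fp = np.pad(f.astype(np.float64), ((a, a), (b, b)))   # zero padding
        g = np.zeros(f.shape, dtype=np.float64)
        for x in range(f.shape[0]):
            for y in range(f.shape[1]):
                g[x, y] = np.sum(w * fp[x:x + m, y:y + n])
        return g

    def convolve2d(f, w):
        # Spatial convolution: correlation with the kernel rotated by 180°.
        return correlate2d(f, np.rot90(w, 2))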

SPATIAL CORRELATION AND CONVOLUTION


Spatial correlation is illustrated graphically in Fig. 3.28, and it is described mathemati-
cally by Eq. (3-31). Correlation consists of moving the center of a kernel over an
image, and computing the sum of products at each location. The mechanics of spatial
convolution are the same, except that the correlation kernel is rotated by 180°. Thus,
when the values of a kernel are symmetric about its center, correlation and convolu-
tion yield the same result. The reason for rotating the kernel will become clear in
the following discussion. The best way to explain the differences between the two
concepts is by example.
We begin with a 1-D illustration, in which case Eq. (3-31) becomes
g(x) = ∑_{s=−a}^{a} w(s) f(x + s)    (3-32)


† A filtered pixel value typically is assigned to a corresponding location in a new image created to hold the results
of filtering. It is seldom the case that filtered pixels replace the values of the corresponding location in the origi-
nal image, as this would change the content of the image while filtering is being performed.


FIGURE 3.28 The mechanics of linear spatial filtering using a 3 × 3 kernel. The pixels are shown as squares to simplify the graphics. Note that the origin of the image is at the top left, but the origin of the kernel is at its center. Placing the origin at the center of spatially symmetric kernels simplifies writing expressions for linear filtering. [The figure shows the image f, a magnified view of the filter kernel w(s, t) with coefficients w(−1, −1) through w(1, 1), and the pixel values f(x − 1, y − 1) through f(x + 1, y + 1) under the kernel when it is centered on (x, y).]

Figure 3.29(a) shows a 1-D function, f, and a kernel, w. The kernel is of size 1 × 5, so
a = 2 and b = 0 in this case. Figure 3.29(b) shows the starting position used to per-
form correlation, in which w is positioned so that its center coefficient is coincident
with the origin of f.
The first thing we notice is that part of w lies outside f, so the summation is
undefined in that area. A solution to this problem is to pad function f with enough 0's on either side. (Zero padding is not the only padding option, as we will discuss in detail later in this chapter.) In general, if the kernel is of size 1 × m, we need (m − 1)/2 zeros on either side of f in order to handle the beginning and ending configurations of w
with respect to f. Figure 3.29(c) shows a properly padded function. In this starting
configuration, all coefficients of the kernel overlap valid values.


FIGURE 3.29 Illustration of 1-D correlation and convolution of a kernel, w, with a function f consisting of a discrete unit impulse. Note that correlation and convolution are functions of the variable x, which acts to displace one function with respect to the other. For the extended correlation and convolution results, the starting configuration places the rightmost element of the kernel to be coincident with the origin of f. Additional padding must be used. [Panels (a)–(h) show correlation and panels (i)–(p) show convolution for f = (0 0 0 1 0 0 0 0) and w = (1 2 4 2 8). The correlation result is (0 8 2 4 2 1 0 0) and the convolution result is (0 1 2 4 2 8 0 0); the extended (full) results are (0 0 0 8 2 4 2 1 0 0 0 0) and (0 0 0 1 2 4 2 8 0 0 0 0), respectively.]

The first correlation value is the sum of products in this initial position, computed
using Eq. (3-32) with x = 0 :
g(0) = ∑_{s=−2}^{2} w(s) f(s + 0) = 0

This value is in the leftmost location of the correlation result in Fig. 3.29(g).
To obtain the second value of correlation, we shift the relative positions of w and
f one pixel location to the right [i.e., we let x = 1 in Eq. (3-32)] and compute the sum
of products again. The result is g(1) = 8, as shown in the leftmost, nonzero location
in Fig. 3.29(g). When x = 2, we obtain g(2) = 2. When x = 3, we get g(3) = 4 [see Fig.
3.29(e)]. Proceeding in this manner by varying x one shift at a time, we “build” the
correlation result in Fig. 3.29(g). Note that it took 8 values of x (i.e., x = 0, 1, 2, … , 7 )
to fully shift w past f so the center coefficient in w visited every pixel in f. Sometimes,
it is useful to have every element of w visit every pixel in f. For this, we have to start


with the rightmost element of w coincident with the origin of f, and end with the
leftmost element of w being coincident the last element of f (additional padding
would be required). Figure Fig. 3.29(h) shows the result of this extended, or full, cor-
relation. As Fig. 3.29(g) shows, we can obtain the “standard” correlation by cropping
the full correlation in Fig. 3.29(h).
There are two important points to note from the preceding discussion. First, cor-
relation is a function of displacement of the filter kernel relative to the image. In
other words, the first value of correlation corresponds to zero displacement of the
kernel, the second corresponds to one unit displacement, and so on.† The second
thing to notice is that correlating a kernel w with a function that contains all 0’s and
a single 1 yields a copy of w, but rotated by 180°. A function that contains a single 1
with the rest being 0's is called a discrete unit impulse. Correlating a kernel with a
discrete unit impulse yields a rotated version of the kernel at the location of the impulse.
(Rotating a 1-D kernel by 180° is equivalent to flipping the kernel about its axis.)
The right side of Fig. 3.29 shows the sequence of steps for performing convolution
(we will give the equation for convolution shortly). The only difference here is that
the kernel is pre-rotated by 180° prior to performing the shifting/sum of products
operations. As the convolution in Fig. 3.29(o) shows, the result of pre-rotating the
kernel is that now we have an exact copy of the kernel at the location of the unit
impulse. In fact, a foundation of linear system theory is that convolving a function
with an impulse yields a copy of the function at the location of the impulse. We will
use this property extensively in Chapter 4.
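NumPy's built-in 1-D routines reproduce Fig. 3.29 directly; np.correlate slides the kernel without rotation, while np.convolve pre-rotates it, and mode='same' crops the full result as described above:

    import numpy as np

    f = np.array([0, 0, 0, 1, 0, 0, 0, 0])   # discrete unit impulse
    w = np.array([1, 2, 4, 2, 8])

    print(np.correlate(f, w, mode='same'))   # [0 8 2 4 2 1 0 0]: w rotated by 180°
    print(np.convolve(f, w, mode='same'))    # [0 1 2 4 2 8 0 0]: exact copy of w
    print(np.correlate(f, w, mode='full'))   # [0 0 0 8 2 4 2 1 0 0 0 0]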
The 1-D concepts just discussed extend easily to images, as Fig. 3.30 shows. For a
kernel of size m × n, we pad the image with a minimum of (m − 1)/2 rows of 0's at
the top and bottom and (n − 1)/2 columns of 0's on the left and right. In this case,
m and n are equal to 3, so we pad f with one row of 0’s above and below and one
column of 0’s to the left and right, as Fig. 3.30(b) shows. Figure 3.30(c) shows the
initial position of the kernel for performing correlation, and Fig. 3.30(d) shows the
final result after the center of w visits every pixel in f, computing a sum of products
at each location. As before, the result is a copy of the kernel, rotated by 180°. We will
discuss the extended correlation result shortly.
For convolution, we pre-rotate the kernel as before and repeat the sliding sum of
products just explained. Figures 3.30(f) through (h) show the result. (In 2-D, rotation
by 180° is equivalent to flipping the kernel about one axis and then the other.) You
see again that convolution of a function with an impulse copies the function to the
location of the impulse. As noted earlier, correlation and convolution yield the same
result if the kernel values are symmetric about the center.
The concept of an impulse is fundamental in linear system theory, and is used in
numerous places throughout the book. A discrete impulse of strength (amplitude) A
located at coordinates ( x0 , y0 ) is defined as

δ(x − x0, y − y0) = A if x = x0 and y = y0; 0 otherwise    (3-33)

† In reality, we are shifting f to the left of w every time we increment x in Eq. (3-32). However, it is more intuitive
to think of the smaller kernel moving right over the larger array f. The motion of the two is relative, so either
way of looking at the motion is acceptable. The reason we increment f and not w is that indexing the equations
for correlation and convolution is much easier (and clearer) this way, especially when working with 2-D arrays.


FIGURE 3.30 Correlation (middle row) and convolution (last row) of a 2-D kernel with an image consisting of a discrete unit impulse. The 0's are shown in gray to simplify visual analysis. Note that correlation and convolution are functions of x and y. As these variables change, they displace one function with respect to the other. See the discussion of Eqs. (3-36) and (3-37) regarding full correlation and convolution. [The kernel w has rows (1 2 3), (4 5 6), (7 8 9). Correlation of w with the impulse yields a copy of w rotated by 180°, with rows (9 8 7), (6 5 4), (3 2 1); convolution yields an exact copy of w at the location of the impulse.]

For example, the unit impulse in Fig. 3.29(a) is given by δ(x − 3) in the 1-D version of
the preceding equation. Similarly, the impulse in Fig. 3.30(a) is given by δ(x − 2, y − 2)
[remember, the origin is at (0, 0)]. (Recall that A = 1 for a unit impulse.)
Summarizing the preceding discussion in equation form, the correlation of a
kernel w of size m × n with an image f(x, y), denoted as (w ☆ f)(x, y), is given by
Eq. (3-31), which we repeat here for convenience:

(w ☆ f)(x, y) = ∑_{s=−a}^{a} ∑_{t=−b}^{b} w(s, t) f(x + s, y + t)    (3-34)

Because our kernels do not depend on ( x, y), we will sometimes make this fact explic-
it by writing the left side of the preceding equation as w ☆ f(x, y). Equation (3-34) is
evaluated for all values of the displacement variables x and y so that the center point
of w visits every pixel in f,† where we assume that f has been padded appropriately.


† As we mentioned earlier, the minimum number of required padding elements for a 2-D correlation is (m − 1)/2
rows above and below f, and (n − 1)/2 columns on the left and right. With this padding, and assuming that f
is of size M × N , the values of x and y required to obtain a complete correlation are x = 0, 1, 2, … , M − 1 and
y = 0, 1, 2, … , N − 1. This assumes that the starting configuration is such that the center of the kernel coincides
with the origin of the image, which we have defined to be at the top, left (see Fig. 2.19).


As explained earlier, a = (m − 1)/2, b = (n − 1)/2, and we assume that m and n are
odd integers.
In a similar manner, the convolution of a kernel w of size m × n with an image
f(x, y), denoted by (w ★ f)(x, y), is defined as

(w ★ f)(x, y) = ∑_{s=−a}^{a} ∑_{t=−b}^{b} w(s, t) f(x − s, y − t)    (3-35)

where the minus signs align the coordinates of f and w when one of the functions is
rotated by 180° (see Problem 3.17). This equation implements the sum of products
process to which we refer throughout the book as linear spatial filtering. That is, lin-
ear spatial filtering and spatial convolution are synonymous.
Because convolution is commutative (see Table 3.5), it is immaterial whether w
or f is rotated, but rotation of the kernel is used by convention. Our kernels do not
depend on ( x, y), a fact that we sometimes make explicit by writing the left side
of Eq. (3-35) as w ★ f(x, y). When the meaning is clear, we let the dependence of
the previous two equations on x and y be implied, and use the simplified notation
w ☆ f and w ★ f. As with correlation, Eq. (3-35) is evaluated for all values of the
displacement variables x and y so that the center of w visits every pixel in f, which
we assume has been padded. The values of x and y needed to obtain a full convolu-
tion are x = 0, 1, 2, … , M − 1 and y = 0, 1, 2, … , N − 1. The size of the result is M × N .
We can define correlation and convolution so that every element of w (instead of
just its center) visits every pixel in f. This requires that the starting configuration be
such that the right, lower corner of the kernel coincides with the origin of the image.
Similarly, the ending configuration will be with the top left corner of the kernel coin-
ciding with the lower right corner of the image. If the kernel and image are of sizes
m × n and M × N, respectively, the padding would have to increase to (m − 1) pad-
ding elements above and below the image, and (n − 1) elements to the left and right.
Under these conditions, the size of the resulting full correlation or convolution array
will be of size Sv × Sh , where (see Figs. 3.30(e) and (h), and Problem 3.19),

Sv = m + M − 1 (3-36)

and

Sh = n + N − 1 (3-37)

Often, spatial filtering algorithms are based on correlation and thus implement
Eq. (3-34) instead. To use the algorithm for correlation, we input w into it; for con-
volution, we input w rotated by 180°. The opposite is true for an algorithm that
implements Eq. (3-35). Thus, either Eq. (3-34) or Eq. (3-35) can be made to perform
the function of the other by rotating the filter kernel. Keep in mind, however, that
the order of the functions input into a correlation algorithm does make a difference,
because correlation is neither commutative nor associative (see Table 3.5).

www.EBooksWorld.ir
160 Chapter 3 Intensity Transformations and Spatial Filtering

TABLE 3.5 Some fundamental properties of convolution and correlation. A dash means that the property does not hold.

Property         Convolution                            Correlation
Commutative      f ★ g = g ★ f                          —
Associative      f ★ (g ★ h) = (f ★ g) ★ h              —
Distributive     f ★ (g + h) = (f ★ g) + (f ★ h)        f ☆ (g + h) = (f ☆ g) + (f ☆ h)

Figure 3.31 shows two kernels used for smoothing the intensities of an image.
(Because the values of these kernels are symmetric about the center, no rotation is
required before convolution.) To filter an image using one of these kernels, we
perform a convolution of the kernel with the image in the manner just described.
When talking about filtering and kernels, you are likely to encounter the terms
convolution filter, convolution mask, or
convolution kernel to denote filter kernels of the type we have been discussing. Typi-
cally, these terms are used in the literature to denote a spatial filter kernel, and not
to imply necessarily that the kernel is used for convolution. Similarly, “convolving a
kernel with an image” often is used to denote the sliding, sum-of-products process
we just explained, and does not necessarily differentiate between correlation and
convolution. Rather, it is used generically to denote either of the two operations.
This imprecise terminology is a frequent source of confusion. In this book, when we
use the term linear spatial filtering, we mean convolving a kernel with an image.
Sometimes an image is filtered (i.e., convolved) sequentially, in stages, using a dif-
ferent kernel in each stage. For example, suppose that an image f is filtered with a
kernel w1, the result filtered with kernel w2, that result filtered with a third kernel,
and so on, for Q stages. Because of the commutative property of convolution, this
multistage filtering can be done in a single filtering operation, w ★ f, where

w = w1 ★ w2 ★ w3 ★ ⋯ ★ wQ        (3-38)

(We could not write a similar equation for correlation because it is not commutative.)
The size of w is obtained from the sizes of the individual kernels by successive
applications of Eqs. (3-36) and (3-37). If all the individual kernels are of size m × n,
it follows from these equations that w will be of size Wv × Wh, where

Wv = Q(m − 1) + 1        (3-39)

and

Wh = Q(n − 1) + 1        (3-40)

These equations assume that every value of a kernel visits every value of the array
resulting from the convolution in the previous step; that is, the initial and ending
configurations are as described in connection with Eqs. (3-36) and (3-37). (Starting
from the m × n size of w1, each of the Q − 1 kernel-to-kernel convolutions adds
m − 1 rows and n − 1 columns, which gives the totals above.)
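
As a quick numerical check of these size formulas, the following minimal sketch (assuming NumPy and SciPy are available; Q and m are arbitrary illustration choices) builds a composite kernel from Q = 3 box kernels of size 3 × 3 and confirms that each side of the result has Q(m − 1) + 1 = 7 elements:

    import numpy as np
    from scipy.signal import convolve2d

    m, Q = 3, 3
    kernels = [np.ones((m, m)) / m**2 for _ in range(Q)]  # normalized box kernels

    # Convolve the kernels with each other using full convolution, so that
    # every element of one kernel visits every element of the other.
    w = kernels[0]
    for wk in kernels[1:]:
        w = convolve2d(w, wk, mode='full')

    print(w.shape)   # (7, 7), i.e., Q*(m - 1) + 1 elements per side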

FIGURE 3.31 Examples of smoothing kernels: (a) a box kernel, (1/9) × [1 1 1; 1 1 1; 1 1 1]; (b) a Gaussian kernel, (1/4.8976) × [0.3679 0.6065 0.3679; 0.6065 1.0000 0.6065; 0.3679 0.6065 0.3679].


SEPARABLE FILTER KERNELS


As noted in Section 2.6, a 2-D function G(x, y) is said to be separable if it can be written
as the product of two 1-D functions, G1(x) and G2(y); that is, G(x, y) = G1(x)G2(y).
A spatial filter kernel is a matrix, and a separable kernel is a matrix that can be
expressed as the outer product of two vectors. (To be strictly consistent in notation,
we should use uppercase, bold symbols for kernels when we refer to them as matrices.
However, kernels are mostly treated in the book as 2-D functions, which we denote
in italics. To avoid confusion, we continue to use italics for kernels in this short
section, with the understanding that the two notations are intended to be equivalent
in this case.) For example, the 2 × 3 kernel, written here with rows separated by
semicolons,

w = [1 1 1; 1 1 1]

is separable because it can be expressed as the outer product of the vectors

c = [1; 1]  and  r = [1; 1; 1]

That is,

c rT = [1; 1][1 1 1] = [1 1 1; 1 1 1] = w
A separable kernel of size m × n can be expressed as the outer product of two vec-
tors, v and w:

w = vwT (3-41)

where v and w are vectors of size m × 1 and n × 1, respectively. For a square kernel
of size m × m, we write

w = vvT (3-42)

It turns out that the product of a column vector and a row vector is the same as the
2-D convolution of the vectors (see Problem 3.24).
The importance of separable kernels lies in the computational advantages that
result from the associative property of convolution. If we have a kernel w that can
be decomposed into two simpler kernels, such that w = w1 ★ w2, then it follows
from the commutative and associative properties in Table 3.5 that

w ★ f = (w1 ★ w2) ★ f = (w2 ★ w1) ★ f = w2 ★ (w1 ★ f) = (w1 ★ f) ★ w2        (3-43)

This equation says that convolving a separable kernel with an image is the same as
convolving w1 with f first, and then convolving the result with w2.
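
A minimal numerical check of Eq. (3-43), assuming NumPy and SciPy are available (the image and kernel sizes are illustration choices): two 1-D passes with the separable components give the same result as one 2-D pass with the full kernel.

    import numpy as np
    from scipy.signal import convolve2d

    rng = np.random.default_rng(1)
    f = rng.random((64, 64))    # test image
    v = rng.random((5, 1))      # column vector, w1
    wT = rng.random((1, 5))     # row vector, w2
    w = v @ wT                  # separable 5 x 5 kernel (outer product)

    one_pass = convolve2d(f, w, mode='full')
    two_pass = convolve2d(convolve2d(f, v, mode='full'), wT, mode='full')
    print(np.allclose(one_pass, two_pass))   # True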
For an image of size M × N and a kernel of size m × n, implementation of Eq.
(3-35) requires on the order of MNmn multiplications and additions. (We assume
that the values of M and N include any padding of f prior to performing convolution.)
This is because it follows directly from that equation that each pixel in the output
(filtered) image depends on all the coefficients in the filter kernel. But, if the kernel
is separable and we use Eq. (3-43), then the first convolution, w1 ★ f, requires on
the order of MNm


multiplications and additions because w1 is of size m × 1. The result is of size M × N ,


so the convolution of w2 with the result requires MNn such operations, for a total of
MN(m + n) multiplication and addition operations. Thus, the computational advan-
tage of performing convolution with a separable, as opposed to a nonseparable, ker-
nel is defined as

C = MNmn / [MN(m + n)] = mn / (m + n)        (3-44)

For a kernel of modest size, say 11 × 11, the computational advantage (and thus exe-
cution-time advantage) is a respectable 5.5. For kernels with hundreds of elements,
execution times can be reduced by a factor of a hundred or more, which is significant.
We will illustrate the use of such large kernels in Example 3.16.
We know from matrix theory that a matrix resulting from the product of a column
vector and a row vector always has a rank of 1. By definition, a separable kernel is
formed by such a product. Therefore, to determine if a kernel is separable, all we
have to do is determine if its rank is 1. Typically, we find the rank of a matrix using a
pre-programmed function in the computer language being used. For example, if you
use MATLAB, function rank will do the job.
Once you have determined that the rank of a kernel matrix is 1, it is not difficult
to find two vectors v and w such that their outer product, vwT, is equal to the kernel.
The approach consists of only three steps:

1. Find any nonzero element in the kernel and let E denote its value.
2. Form vectors c and r equal, respectively, to the column and row in the kernel
containing the element found in Step 1.
3. With reference to Eq. (3-41), let v = c and wT = r/E.

The reason why this simple three-step method works is that the rows and columns
of a matrix whose rank is 1 are linearly dependent. That is, the rows differ only by a
constant multiplier, and similarly for the columns. It is instructive to work through
the mechanics of this procedure using a small kernel (see Problems 3.20 and 3.22).
As we explained above, the objective is to find two 1-D kernels, w1 and w2, in
order to implement 1-D convolution. In terms of the preceding notation, w1 = c = v
and w2 = r/E = wT. For circularly symmetric kernels, the column through the center
of the kernel describes the entire kernel; that is, w = vvT/c, where c is the value of
the center coefficient. Then, the 1-D components are w1 = v and w2 = vT/c. (As we
will discuss later in this chapter, the only kernels that are separable and whose
values are circularly symmetric about the center are Gaussian kernels, which have a
nonzero center coefficient; i.e., c > 0 for these kernels.)
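
The three-step method translates directly into code. The sketch below (a minimal illustration, assuming NumPy is available; the sample kernel is an arbitrary separable example) tests separability with the rank-1 criterion and then recovers the two vectors:

    import numpy as np

    w = np.outer([1.0, 2.0, 1.0], [1.0, 1.0, 1.0])  # a separable 3 x 3 kernel

    if np.linalg.matrix_rank(w) == 1:               # rank 1 means separable
        i, j = np.argwhere(w != 0)[0]               # Step 1: any nonzero element
        E = w[i, j]
        v = w[:, j].reshape(-1, 1)                  # Step 2: its column (c = v)
        wT = (w[i, :] / E).reshape(1, -1)           # Step 3: its row divided by E
        print(np.allclose(v @ wT, w))               # True: outer product recovers w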
SOME IMPORTANT COMPARISONS BETWEEN FILTERING IN THE
SPATIAL AND FREQUENCY DOMAINS
Although filtering in the frequency domain is the topic of Chapter 4, we introduce
at this juncture some important concepts from the frequency domain that will help
you master the material that follows.
The tie between spatial- and frequency-domain processing is the Fourier trans-
form. We use the Fourier transform to go from the spatial to the frequency domain;


FIGURE 3.32 (a) Ideal 1-D lowpass filter transfer function in the frequency domain, with passband below the cutoff frequency u0 and stopband above it. (b) Corresponding filter kernel in the spatial domain.

to return to the spatial domain we use the inverse Fourier transform. This will be
covered in detail in Chapter 4. The focus here is on two fundamental properties
relating the spatial and frequency domains:

1. Convolution, which is the basis for filtering in the spatial domain, is equivalent
to multiplication in the frequency domain, and vice versa.
2. An impulse of strength A in the spatial domain is a constant of value A in the
frequency domain, and vice versa. (See the explanation of Eq. (3-33) regarding
impulses.)
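
Property 1 is easy to verify numerically for the discrete case. A minimal sketch, assuming NumPy is available (np.fft implements the discrete Fourier transform, and circular convolution is the discrete form to which the property applies):

    import numpy as np

    rng = np.random.default_rng(2)
    f = rng.random(8)
    w = rng.random(8)

    # Circular convolution computed directly in the spatial domain ...
    spatial = np.array([sum(w[k] * f[(n - k) % 8] for k in range(8))
                        for n in range(8)])

    # ... equals the inverse transform of the product of the transforms.
    frequency = np.fft.ifft(np.fft.fft(f) * np.fft.fft(w)).real
    print(np.allclose(spatial, frequency))   # True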
As explained in Chapter 4, a function (e.g., an image) satisfying some mild condi-
tions can be expressed as the sum of sinusoids of different frequencies and ampli-
tudes. Thus, the appearance of an image depends on the frequencies of its sinusoidal
components—change the frequencies of those components, and you will change the
appearance of the image. What makes this a powerful concept is that it is possible to
associate certain frequency bands with image characteristics. For example, regions
of an image with intensities that vary slowly (e.g., the walls in an image of a room)
are characterized by sinusoids of low frequencies. Similarly, edges and other sharp
intensity transitions are characterized by high frequencies. Thus, reducing the high-
frequency components of an image will tend to blur it.
Linear filtering is concerned with finding suitable ways to modify the frequency
content of an image. In the spatial domain we do this via convolution filtering. In
the frequency domain we do it with multiplicative filters. The latter is a much more
intuitive approach, which is one of the reasons why it is virtually impossible to truly
understand spatial filtering without having at least some rudimentary knowledge of
the frequency domain.
An example will help clarify these ideas. For simplicity, consider a 1-D func-
tion (such as an intensity scan line through an image) and suppose that we want to
eliminate all its frequencies above a cutoff value, u0, while “passing” all frequen-
cies below that value. Figure 3.32(a) shows a frequency-domain filter function for
doing this. (The term filter transfer function is used to denote filter functions in the
frequency domain; this is analogous to our use of the term “filter kernel” in the
spatial domain. As we did earlier with spatial filters, when the meaning is clear we
use the term filter interchangeably with filter transfer function when working in
the frequency domain.) Appropriately, the function in Fig. 3.32(a) is called a lowpass
filter transfer function. In fact, this is an ideal lowpass filter function because it
eliminates all frequencies above u0, while passing all frequencies below this value.†

† All the frequency domain filters in which we are interested are symmetrical about the origin and encompass
both positive and negative frequencies, as we will explain in Section 4.3 (see Fig. 4.8). For the moment, we show
only the right side (positive frequencies) of 1-D filters for simplicity in this short explanation.

That is, the


transition of the filter between low and high frequencies is instantaneous. Such filter
functions are not realizable with physical components, and have issues with “ringing”
when implemented digitally. However, ideal filters are very useful for illustrating
numerous filtering phenomena, as you will learn in Chapter 4.
To lowpass-filter a spatial signal in the frequency domain, we first convert it to the
frequency domain by computing its Fourier transform, and then multiply the result
by the filter transfer function in Fig. 3.32(a) to eliminate frequency components with
values higher than u0 . To return to the spatial domain, we take the inverse Fourier
transform of the filtered signal. The result will be a blurred spatial domain function.
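
The procedure just described maps directly to a few lines of code. A minimal 1-D sketch, assuming NumPy is available (the signal, size, and cutoff value are illustration choices); the output also exhibits the ringing discussed below:

    import numpy as np

    N = 256
    t = np.arange(N)
    f = ((t > 96) & (t < 160)).astype(float)  # a 1-D "scan line": a bright pulse
    F = np.fft.fft(f)                         # forward transform

    u = np.fft.fftfreq(N)                     # frequency of each FFT bin
    u0 = 0.05                                 # cutoff frequency
    H = (np.abs(u) <= u0).astype(float)       # ideal lowpass transfer function

    g = np.fft.ifft(F * H).real               # multiply, then inverse transform
    # g is a blurred version of f; the small oscillations near the pulse edges
    # are the ringing caused by the ideal filter's instantaneous cutoff.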
Because of the duality between the spatial and frequency domains, we can obtain
the same result in the spatial domain by convolving the equivalent spatial domain
filter kernel with the input spatial function. The equivalent spatial filter kernel
is the inverse Fourier transform of the frequency-domain filter transfer function.
Figure 3.32(b) shows the spatial filter kernel corresponding to the frequency domain
filter transfer function in Fig. 3.32(a). The ringing characteristics of the kernel are
evident in the figure. A central theme of digital filter design theory is obtaining faith-
ful (and practical) approximations to the sharp cutoff of ideal frequency domain
filters while reducing their ringing characteristics.

A WORD ABOUT HOW SPATIAL FILTER KERNELS ARE CONSTRUCTED


We consider three basic approaches for constructing spatial filters in the following
sections of this chapter. One approach is to formulate filters from
mathematical properties. For example, a filter that computes the average of pixels
in a neighborhood blurs an image. Computing an average is analogous to integra-
tion. Conversely, a filter that computes the local derivative of an image sharpens the
image. We give numerous examples of this approach in the following sections.
A second approach is based on sampling a 2-D spatial function whose shape has
a desired property. For example, we will show in the next section that samples from
a Gaussian function can be used to construct a weighted-average (lowpass) filter.
These 2-D spatial functions sometimes are generated as the inverse Fourier trans-
form of 2-D filters specified in the frequency domain. We will give several examples
of this approach in this and the next chapter.
A third approach is to design a spatial filter with a specified frequency response.
This approach is based on the concepts discussed in the previous section, and falls
in the area of digital filter design. A 1-D spatial filter with the desired response is
obtained (typically using filter design software). The 1-D filter values can be expressed
as a vector v, and a 2-D separable kernel can then be obtained using Eq. (3-42). Or the
1-D filter can be rotated about its center to generate a 2-D kernel that approximates a
circularly symmetric function. We will illustrate these techniques in Section 3.7.

3.5 SMOOTHING (LOWPASS) SPATIAL FILTERS

Smoothing (also called averaging) spatial filters are used to reduce sharp transi-
tions in intensity. Because random noise typically consists of sharp transitions in


intensity, an obvious application of smoothing is noise reduction. Smoothing prior


to image resampling to reduce aliasing, as will be discussed in Section 4.5, is also
a common application. Smoothing is used to reduce irrelevant detail in an image,
where “irrelevant” refers to pixel regions that are small with respect to the size of
the filter kernel. Another application is for smoothing the false contours that result
from using an insufficient number of intensity levels in an image, as discussed in Sec-
tion 2.4. Smoothing filters are used in combination with other techniques for image
enhancement, such as the histogram processing techniques discussed in Section 3.3,
and unsharp masking, as discussed later in this chapter. We begin the discussion
of smoothing filters by considering linear smoothing filters in some detail. We will
introduce nonlinear smoothing filters later in this section.
As we discussed in Section 3.4, linear spatial filtering consists of convolving an
image with a filter kernel. Convolving a smoothing kernel with an image blurs the
image, with the degree of blurring being determined by the size of the kernel and
the values of its coefficients. In addition to being useful in countless applications of
image processing, lowpass filters are fundamental, in the sense that other impor-
tant filters, including sharpening (highpass), bandpass, and bandreject filters, can be
derived from lowpass filters, as we will show in Section 3.7.
We discuss in this section lowpass filters based on box and Gaussian kernels,
both of which are separable. Most of the discussion will center on Gaussian kernels
because of their numerous useful properties and breadth of applicability. We will
introduce other smoothing filters in Chapters 4 and 5.

BOX FILTER KERNELS


The simplest, separable lowpass filter kernel is the box kernel, whose coefficients
have the same value (typically 1). The name “box kernel” comes from a constant
kernel resembling a box when viewed in 3-D. We showed a 3 × 3 box filter in Fig.
3.31(a). An m × n box filter is an m × n array of 1’s, with a normalizing constant in
front, whose value is 1 divided by the sum of the values of the coefficients (i.e., 1/mn
when all the coefficients are 1’s). This normalization, which we apply to all lowpass
kernels, has two purposes. First, the average value of an area of constant intensity
would equal that intensity in the filtered image, as it should. Second, normalizing
the kernel in this way prevents introducing a bias during filtering; that is, the sum
of the pixels in the original and filtered images will be the same (see Problem 3.31).
Because in a box kernel all rows and columns are identical, the rank of these kernels
is 1, which, as we discussed earlier, means that they are separable.
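
A normalized box kernel and its separable (rank-1) form take only a few lines. A minimal sketch, assuming NumPy is available (the 3 × 3 size is an example):

    import numpy as np

    m = 3
    box = np.ones((m, m)) / (m * m)      # normalized m x m box kernel

    # Rank 1 confirms separability: the box kernel is the outer product
    # of two 1-D box kernels, each carrying a 1/m normalization.
    v = np.ones((m, 1)) / m
    print(np.linalg.matrix_rank(box))    # 1
    print(np.allclose(v @ v.T, box))     # True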

EXAMPLE 3.11 : Lowpass filtering with a box kernel.


Figure 3.33(a) shows a test pattern image of size 1024 × 1024 pixels. Figures 3.33(b)-(d) are the results
obtained using box filters of size m × m with m = 3, 11, and 21, respectively. For m = 3, we note a slight
overall blurring of the image, with the image features whose sizes are comparable to the size of the
kernel being affected significantly more. Such features include the thinner lines in the image and the
noise pixels contained in the boxes on the right side of the image. The filtered image also has a thin gray
border, the result of zero-padding the image prior to filtering. As indicated earlier, padding extends the
boundaries of an image to avoid undefined operations when parts of a kernel lie outside the border of


FIGURE 3.33 (a) Test pattern of size 1024 × 1024 pixels. (b)-(d) Results of lowpass filtering with box kernels of sizes 3 × 3, 11 × 11, and 21 × 21, respectively.

the image during filtering. When zero (black) padding is used, the net result of smoothing at or near the
border is a dark gray border that arises from including black pixels in the averaging process. Using the
11 × 11 kernel resulted in more pronounced blurring throughout the image, including a more prominent
dark border. The result with the 21 × 21 kernel shows significant blurring of all components of the image,
including the loss of the characteristic shape of some components, including, for example, the small
square on the top left and the small character on the bottom left. The dark border resulting from zero
padding is proportionally thicker than before. We used zero padding here, and will use it a few more
times, so that you can become familiar with its effects. In Example 3.14 we discuss two other approaches
to padding that eliminate the dark-border artifact that usually results from zero padding.

LOWPASS GAUSSIAN FILTER KERNELS


Because of their simplicity, box filters are suitable for quick experimentation and
they often yield smoothing results that are visually acceptable. They are useful also
when it is desired to reduce the effect of smoothing on edges (see Example 3.13).
However, box filters have limitations that make them poor choices in many appli-
cations. For example, a defocused lens is often modeled as a lowpass filter, but
box filters are poor approximations to the blurring characteristics of lenses (see
Problem 3.33). Another limitation is the fact that box filters favor blurring along
perpendicular directions. In applications involving images with a high level of detail,


or with strong geometrical components, the directionality of box filters often pro-
duces undesirable results. (Example 3.13 illustrates this issue.) These are but two
applications in which box filters are not suitable.
The kernels of choice in applications such as those just mentioned are circularly
symmetric (also called isotropic, meaning their response is independent of orienta-
tion). As it turns out, Gaussian kernels of the form

w(s, t) = G(s, t) = K e^{−(s² + t²)/2σ²}        (3-45)

are the only circularly symmetric kernels that are also separable (Sahoo [1990]).
(Our interest here is strictly on the bell shape of the Gaussian function; thus, we
dispense with the traditional multiplier of the Gaussian PDF and use a general
constant, K, instead. Recall that σ controls the “spread” of a Gaussian function
about its mean.) Thus, because Gaussian kernels of this form are separable, Gaussian
filters enjoy the same computational advantages as box filters, but have a host of
additional properties that make them ideal for image processing, as you will learn
in the following discussion. Variables s and t in Eq. (3-45) are real (typically discrete)
numbers.
By letting r = [s² + t²]^{1/2} we can write Eq. (3-45) as

G(r) = K e^{−r²/2σ²}        (3-46)

This equivalent form simplifies derivation of expressions later in this section. This
form also reminds us that the function is circularly symmetric. Variable r is the dis-
tance from the center to any point on function G. Figure 3.34 shows values of r for
several kernel sizes using integer values for s and t. Because we work generally with
odd kernel sizes, the centers of such kernels fall on integer values, and it follows that
all values of r² are integers also. You can see this by squaring the values in Fig. 3.34

FIGURE 3.34 Distances from the center for various sizes of square kernels (3 × 3, 5 × 5, 7 × 7, 9 × 9, . . . , m × m).


(for a formal proof, see Padfield [2011]). Note in particular that the distance squared
to the corner points for a kernel of size m × m is

r²max = [ ((m − 1)/2) √2 ]² = (m − 1)²/2        (3-47)

The kernel in Fig. 3.31(b) was obtained by sampling Eq. (3-45) (with K = 1 and
σ = 1). Figure 3.35(a) shows a perspective plot of a Gaussian function, and illustrates
that the samples used to generate that kernel were obtained by specifying values of
s and t, then “reading” the values of the function at those coordinates. These values
are the coefficients of the kernel. Normalizing the kernel by dividing its coefficients
by the sum of the coefficients completes the specification of the kernel. The reasons
for normalizing the kernel are as discussed in connection with box kernels. Because
Gaussian kernels are separable, we could simply take samples along a cross section
through the center and use the samples to form vector v in Eq. (3-42), from which
we obtain the 2-D kernel. (Small Gaussian kernels cannot capture the characteristic
Gaussian bell shape, and thus behave more like box kernels. As we discuss below, a
practical size for Gaussian kernels is on the order of 6σ × 6σ.)
Separability is one of many fundamental properties of circularly symmetric
Gaussian kernels. For example, we know that the values of a Gaussian function at a
distance larger than 3σ from the mean are small enough that they can be ignored.
This means that if we select the size of a Gaussian kernel to be ⌈6σ⌉ × ⌈6σ⌉ (as we
explained in Section 2.6, the symbols ⌈·⌉ and ⌊·⌋ denote the ceiling and floor
functions, which map a real number to the smallest following, or the largest previous,
integer, respectively), we are assured of getting essentially the same result as if we
had used an arbitrarily large Gaussian kernel. Viewed another way, this property
tells us that there is nothing to be gained by using a Gaussian kernel larger than
⌈6σ⌉ × ⌈6σ⌉ for image processing. Because typically we work with kernels of odd
dimensions, we would use the smallest odd integer that satisfies this condition (e.g.,
a 43 × 43 kernel if σ = 7).
Two other fundamental properties of Gaussian functions are that the product
and convolution of two Gaussians are Gaussian functions also. Table 3.6 shows the
mean and standard deviation of the product and convolution of two 1-D Gaussian
functions, f and g (remember, because of separability, we only need a 1-D Gauss-
ian to form a circularly symmetric 2-D function). (Proofs of the results in Table 3.6
are simplified by working with the Fourier transform and the frequency domain,
both of which are topics in Chapter 4.) The mean and standard deviation

FIGURE 3.35 (a) Sampling a Gaussian function to obtain a discrete Gaussian kernel. The values shown are for K = 1 and σ = 1. (b) Resulting 3 × 3 kernel [this is the same as Fig. 3.31(b)].
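
The sampling-and-normalizing procedure is straightforward to code. A minimal sketch, assuming NumPy is available (K = 1, and the kernel size follows the ⌈6σ⌉ × ⌈6σ⌉ guideline rounded up to the nearest odd integer):

    import numpy as np

    def gaussian_kernel(sigma):
        m = int(np.ceil(6 * sigma))               # cover +/- 3 sigma
        m += 1 - m % 2                            # round up to an odd size
        half = (m - 1) // 2
        s, t = np.meshgrid(np.arange(-half, half + 1),
                           np.arange(-half, half + 1))
        G = np.exp(-(s**2 + t**2) / (2 * sigma**2))  # Eq. (3-45) with K = 1
        return G / G.sum()                           # normalize the coefficients

    print(gaussian_kernel(7).shape)   # (43, 43), as in the discussion above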


TABLE 3.6 Mean and standard deviation of the product (×) and convolution (★) of two 1-D Gaussian functions, f and g, with means mf and mg and standard deviations σf and σg. These results generalize directly to the product and convolution of more than two 1-D Gaussian functions (see Problem 3.25).

Mean:                 mf×g = (mf σg² + mg σf²)/(σf² + σg²)        mf★g = mf + mg
Standard deviation:   σf×g = [σf² σg²/(σf² + σg²)]^{1/2}          σf★g = [σf² + σg²]^{1/2}

completely define a Gaussian, so the parameters in Table 3.6 tell us all there is to
know about the functions resulting from multiplication and convolution of Gauss-
ians. As indicated by Eqs. (3-45) and (3-46), Gaussian kernels have zero mean, so our
interest here is in the standard deviations.
The convolution result is of particular importance in filtering. For example, we
mentioned in connection with Eq. (3-43) that filtering sometimes is done in succes-
sive stages, and that the same result can be obtained by one stage of filtering with a
composite kernel formed as the convolution of the individual kernels. If the kernels
are Gaussian, we can use the result in Table 3.6 (which, as noted, generalizes directly
to more than two functions) to compute the standard deviation of the composite
kernel (and thus completely define it) without actually having to perform the con-
volution of all the individual kernels.
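
The convolution column of Table 3.6 is easy to confirm numerically. A minimal sketch, assuming NumPy and SciPy are available: convolving 1-D Gaussian kernels with σ = 3 and σ = 4 should yield a Gaussian with σ = (3² + 4²)^{1/2} = 5.

    import numpy as np
    from scipy.signal import convolve

    def g1d(sigma, half=40):
        x = np.arange(-half, half + 1)
        g = np.exp(-x**2 / (2 * sigma**2))
        return g / g.sum()

    w = convolve(g1d(3), g1d(4), mode='full')   # composite kernel

    # Estimate the composite standard deviation from the second moment.
    x = np.arange(len(w)) - (len(w) - 1) / 2
    sigma_est = np.sqrt(np.sum(w * x**2))
    print(round(sigma_est, 3))                  # approximately 5.0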

EXAMPLE 3.12 : Lowpass filtering with a Gaussian kernel.


To compare Gaussian and box kernel filtering, we repeat Example 3.11 using a Gaussian kernel. Gauss-
ian kernels have to be larger than box filters to achieve the same degree of blurring. This is because,
whereas a box kernel assigns the same weight to all pixels, the values of Gaussian kernel coefficients
(and hence their effect) decrease as a function of distance from the kernel center. As explained earlier,
we use a size equal to the closest odd integer to ⌈6σ⌉ × ⌈6σ⌉. Thus, for a Gaussian kernel of size 21 × 21,
which is the size of the kernel we used to generate Fig. 3.33(d), we need σ = 3.5. Figure 3.36(b) shows the
result of lowpass filtering the test pattern with this kernel. Comparing this result with Fig. 3.33(d), we see
that the Gaussian kernel resulted in significantly less blurring. A little experimentation would show that
we need σ = 7 to obtain comparable results. This implies a Gaussian kernel of size 43 × 43. Figure 3.36(c)
shows the results of filtering the test pattern with this kernel. Comparing it with Fig. 3.33(d), we see that
the results indeed are very close.
We mentioned earlier that there is little to be gained by using a Gaussian kernel larger than ⌈6σ⌉ × ⌈6σ⌉.
To demonstrate this, we filtered the test pattern in Fig. 3.36(a) using a Gaussian kernel with σ = 7 again,
but of size 85 × 85. Figure 3.37(a) is the same as Fig. 3.36(c), which we generated using the smallest
odd kernel satisfying the ⌈6σ⌉ × ⌈6σ⌉ condition (43 × 43, for σ = 7). Figure 3.37(b) is the result of using the
85 × 85 kernel, which is double the size of the other kernel. As you can see, no discernible additional


FIGURE 3.36 (a) A test pattern of size 1024 × 1024. (b) Result of lowpass filtering the pattern with a Gaussian kernel of size 21 × 21, with standard deviation σ = 3.5. (c) Result of using a kernel of size 43 × 43, with σ = 7. This result is comparable to Fig. 3.33(d). We used K = 1 in all cases.

blurring occurred. In fact, the difference image in Fig. 3.37(c) indicates that the two images are nearly
identical, their maximum difference being 0.75, which is less than one level out of 256 (these are 8-bit
images).

EXAMPLE 3.13 : Comparison of Gaussian and box filter smoothing characteristics.


The results in Examples 3.11 and 3.12 showed little visual difference in blurring. Despite this, there are
some subtle differences that are not apparent at first glance. For example, compare the large letter “a”
in Figs. 3.33(d) and 3.36(c); the latter is much smoother around the edges. Figure 3.38 shows this type
of different behavior between box and Gaussian kernels more clearly. The image of the rectangle was

FIGURE 3.37 (a) Result of filtering Fig. 3.36(a) using a Gaussian kernel of size 43 × 43, with σ = 7. (b) Result of using a kernel of size 85 × 85, with the same value of σ. (c) Difference image.


FIGURE 3.38 (a) Image of a white rectangle on a black background, and a horizontal intensity profile along the scan line shown dotted. (b) Result of smoothing this image with a box kernel of size 71 × 71, and corresponding intensity profile. (c) Result of smoothing the image using a Gaussian kernel of size 151 × 151, with K = 1 and σ = 25. Note the smoothness of the profile in (c) compared to (b). The image and rectangle are of sizes 1024 × 1024 and 768 × 128 pixels, respectively.

smoothed using a box and a Gaussian kernel with the sizes and parameters listed in the figure. These
parameters were selected to give blurred rectangles of approximately the same width and height, in
order to show the effects of the filters on a comparable basis. As the intensity profiles show, the box filter
produced linear smoothing, with the transition from black to white (i.e., at an edge) having the shape
of a ramp. The important features here are hard transitions at the onset and end of the ramp. We would
use this type of filter when less smoothing of edges is desired. Conversely, the Gaussian filter yielded
significantly smoother results around the edge transitions. We would use this type of filter when gener-
ally uniform smoothing is desired.

As the results in Examples 3.11, 3.12, and 3.13 show, zero padding an image intro-
duces dark borders in the filtered result, with the thickness of the borders depending
on the size and type of the filter kernel used. Earlier, when discussing correlation
and convolution, we mentioned two other methods of image padding: mirror (also
called symmetric) padding, in which values outside the boundary of the image are
obtained by mirror-reflecting the image across its border; and replicate padding, in
which values outside the boundary are set equal to the nearest image border value.
The latter padding is useful when the areas near the border of the image are con-
stant. Conversely, mirror padding is more applicable when the areas near the border
contain image details. In other words, these two types of padding attempt to “extend”
the characteristics of an image past its borders.
Figure 3.39 illustrates these padding methods, and also shows the effects of more
aggressive smoothing. Figures 3.39(a) through 3.39(c) show the results of filtering
Fig. 3.36(a) with a Gaussian kernel of size 187 × 187 elements with K = 1 and σ = 31,
using zero, mirror, and replicate padding, respectively. The differences between the
borders of the results with the zero-padded image and the other two are obvious,


FIGURE 3.39 Result of filtering the test pattern in Fig. 3.36(a) using (a) zero padding, (b) mirror padding, and (c) replicate padding. A Gaussian kernel of size 187 × 187, with K = 1 and σ = 31 was used in all three cases.

and indicate that mirror and replicate padding yield more visually appealing results
by eliminating the dark borders resulting from zero padding.

EXAMPLE 3.14 : Smoothing performance as a function of kernel and image size.


The amount of relative blurring produced by a smoothing kernel of a given size depends directly on
image size. To illustrate, Fig. 3.40(a) shows the same test pattern used earlier, but of size 4096 × 4096
pixels, four times larger in each dimension than before. Figure 3.40(b) shows the result of filtering this
image with the same Gaussian kernel and padding used in Fig. 3.39(b). By comparison, the former
image shows considerably less blurring for the same size filter. In fact, Fig. 3.40(b) looks more like the

FIGURE 3.40 (a) Test pattern of size 4096 × 4096 pixels. (b) Result of filtering the test pattern with the same Gaussian kernel used in Fig. 3.39. (c) Result of filtering the pattern using a Gaussian kernel of size 745 × 745 elements, with K = 1 and σ = 124. Mirror padding was used throughout.


image in Fig. 3.36(c), which was filtered using a 43 × 43 Gaussian kernel. In order to obtain results that
are comparable to Fig. 3.39(b) we have to increase the size and standard deviation of the Gaussian
kernel by four, the same factor as the increase in image dimensions. This gives a kernel of (odd) size
745 × 745 (with K = 1 and σ = 124). Figure 3.40(c) shows the result of using this kernel with mirror pad-
ding. This result is quite similar to Fig. 3.39(b). After the fact, this may seem like a trivial observation, but
you would be surprised at how frequently not understanding the relationship between kernel size and
the size of objects in an image can lead to ineffective performance of spatial filtering algorithms.

EXAMPLE 3.15 : Using lowpass filtering and thresholding for region extraction.
Figure 3.41(a) is a 2566 × 2758 Hubble Telescope image of the Hickson Compact Group (see figure
caption), whose intensities were scaled to the range [0, 1]. Our objective is to illustrate lowpass filtering
combined with intensity thresholding for eliminating irrelevant detail in this image. In the present con-
text, “irrelevant” refers to pixel regions that are small compared to kernel size.
Figure 3.41(b) is the result of filtering the original image with a Gaussian kernel of size 151 × 151
(approximately 6% of the image width) and standard deviation σ = 25. We chose these parameter val-
ues in order to generate a sharper, more selective Gaussian kernel shape than we used in earlier examples.
The filtered image shows four predominantly bright regions. We wish to extract only those regions from
the image. Figure 3.41(c) is the result of thresholding the filtered image with a threshold T = 0.4 (we will
discuss threshold selection in Chapter 10). As the figure shows, this approach effectively extracted the
four regions of interest, and eliminated details deemed irrelevant in this application.

EXAMPLE 3.16 : Shading correction using lowpass filtering.


One of the principal causes of image shading is nonuniform illumination. Shading correction (also
called flat-field correction) is important because shading is a common cause of erroneous measurements,
degraded performance of automated image analysis algorithms, and difficulty of image interpretation

a b c
FIGURE 3.41 (a) A 2566 × 2758 Hubble Telescope image of the Hickson Compact Group. (b) Result of lowpass filter-
ing with a Gaussian kernel. (c) Result of thresholding the filtered image (intensities were scaled to the range [0, 1]).
The Hickson Compact Group contains dwarf galaxies that have come together, setting off thousands of new star
clusters. (Original image courtesy of NASA.)


FIGURE 3.42 (a) Image shaded by a shading pattern oriented in the −45° direction. (b) Estimate of the shading pattern obtained using lowpass filtering. (c) Result of dividing (a) by (b). (See Section 9.8 for a morphological approach to shading correction.)

by humans. We introduced shading correction in Example 2.7, where we corrected a shaded image by
dividing it by the shading pattern. In that example, the shading pattern was given. Often, that is not the
case in practice, and we are faced with having to estimate the pattern directly from available samples of
shaded images. Lowpass filtering is a rugged, simple method for estimating shading patterns.
Consider the 2048 × 2048 checkerboard image in Fig. 3.42(a), whose inner squares are of size 128 × 128
pixels. Figure 3.42(b) is the result of lowpass filtering the image with a 512 × 512 Gaussian kernel (four
times the size of the squares), K = 1, and σ = 128 (equal to the size of the squares). This kernel is just
large enough to blur-out the squares (a kernel three times the size of the squares is too small to blur
them out sufficiently). This result is a good approximation to the shading pattern visible in Fig. 3.42(a).
Finally, Fig. 3.42(c) is the result of dividing (a) by (b). Although the result is not perfectly flat, it definitely
is an improvement over the shaded image.
In the discussion of separable kernels in Section 3.4, we pointed out that the computational advan-
tage of separable kernels can be significant for large kernels. It follows from Eq. (3-44) that the compu-
tational advantage of the kernel used in this example (which of course is separable) is 262 to 1. Thinking
of computation time, if it took 30 sec to process a set of images similar to Fig. 3.42(b) using the two 1-D
separable components of the Gaussian kernel, it would have taken 2.2 hrs to achieve the same result
using a nonseparable lowpass kernel, or if we had used the 2-D Gaussian kernel directly, without decom-
posing it into its separable parts.
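
The correction procedure of this example reduces to a few lines. A minimal sketch, assuming NumPy and SciPy are available; here scipy.ndimage.gaussian_filter stands in for explicit construction and convolution of the Gaussian kernel, and the small constant guarding the division is an illustration choice:

    import numpy as np
    from scipy.ndimage import gaussian_filter

    def correct_shading(image, sigma):
        shading = gaussian_filter(image, sigma, mode='mirror')  # estimate pattern
        corrected = image / (shading + 1e-8)                    # divide it out
        return corrected / corrected.max()                      # rescale to [0, 1]

    # Usage: corrected = correct_shading(shaded_image, sigma=128)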

ORDER-STATISTIC (NONLINEAR) FILTERS


Order-statistic filters are nonlinear spatial filters whose response is based on ordering
(ranking) the pixels contained in the region encompassed by the filter. Smoothing is
achieved by replacing the value of the center pixel with the value determined by the
ranking result. The best-known filter in this category is the median filter, which, as
its name implies, replaces the value of the center pixel by the median of the intensity
values in the neighborhood of that pixel (the value of the center pixel is included in computing the median).
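
A minimal sketch of median filtering, assuming NumPy and SciPy are available (scipy.ndimage.median_filter implements the sliding-neighborhood ranking just described; the 3 × 3 size and test image are illustration choices):

    import numpy as np
    from scipy.ndimage import median_filter

    img = np.zeros((7, 7))
    img[3, 3] = 255.0                   # an isolated noise spike

    # Each output pixel becomes the median of its 3 x 3 neighborhood
    # (center included), so the isolated spike is removed entirely.
    print(median_filter(img, size=3).max())   # 0.0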


4.1 BACKGROUND

We begin the discussion with a brief outline of the origins of the Fourier transform
and its impact on countless branches of mathematics, science, and engineering.

A BRIEF HISTORY OF THE FOURIER SERIES AND TRANSFORM


The French mathematician Jean Baptiste Joseph Fourier was born in 1768 in the
town of Auxerre, about midway between Paris and Dijon. The contribution for
which he is most remembered was outlined in a memoir in 1807, and later pub-
lished in 1822 in his book, La Théorie Analytique de la Chaleur (The Analytic Theory
of Heat). This book was translated into English 55 years later by Freeman (see
Freeman [1878]). Basically, Fourier’s contribution in this field states that any peri-
odic function can be expressed as the sum of sines and/or cosines of different fre-
quencies, each multiplied by a different coefficient (we now call this sum a Fourier
series). It does not matter how complicated the function is; if it is periodic and satis-
fies some mild mathematical conditions, it can be represented by such a sum. This
is taken for granted now but, at the time it first appeared, the concept that compli-
cated functions could be represented as a sum of simple sines and cosines was not
at all intuitive (see Fig. 4.1). Thus, it is not surprising that Fourier’s ideas were met
initially with skepticism.
Functions that are not periodic (but whose area under the curve is finite) can be
expressed as the integral of sines and/or cosines multiplied by a weighting function.
The formulation in this case is the Fourier transform, and its utility is even greater
than the Fourier series in many theoretical and applied disciplines. Both representa-
tions share the important characteristic that a function, expressed in either a Fourier
series or transform, can be reconstructed (recovered) completely via an inverse pro-
cess, with no loss of information. This is one of the most important characteristics of
these representations because it allows us to work in the Fourier domain (generally
called the frequency domain) and then return to the original domain of the function
without losing any information. Ultimately, it is the utility of the Fourier series and
transform in solving practical problems that makes them widely studied and used as
fundamental tools.
The initial application of Fourier’s ideas was in the field of heat diffusion, where
they allowed formulation of differential equations representing heat flow in such
a way that solutions could be obtained for the first time. During the past century,
and especially in the past 60 years, entire industries and academic disciplines have
flourished as a result of Fourier’s initial ideas. The advent of digital computers and
the “discovery” of a fast Fourier transform (FFT) algorithm in the early 1960s revo-
lutionized the field of signal processing. These two core technologies allowed for the
first time practical processing of a host of signals of exceptional importance, ranging
from medical monitors and scanners to modern electronic communications.
As you learned in Section 3.4, it takes on the order of MNmn operations (multi-
plications and additions) to filter an M × N image with a kernel of size m × n ele-
ments. If the kernel is separable, the number of operations is reduced to MN (m + n).
In Section 4.11, you will learn that it takes on the order of 2 MN log 2 MN operations
to perform the equivalent filtering process in the frequency domain, where the 2 in
front arises from the fact that we have to compute a forward and an inverse FFT.


FIGURE 4.1 The function at the bottom is the sum of the four functions above it. Fourier’s idea in 1807 that periodic functions could be represented as a weighted sum of sines and cosines was met with skepticism.

To get an idea of the relative computational advantages of filtering in the frequency
versus the spatial domain, consider square images and kernels, of sizes M × M and
m × m, respectively. The computational advantage (as a function of kernel size) of
filtering one such image with the FFT as opposed to using a nonseparable kernel is
defined as

Cn(m) = M²m² / (2M² log₂ M²) = m² / (4 log₂ M)        (4-1)

If the kernel is separable, the advantage becomes

Cs(m) = 2M²m / (2M² log₂ M²) = m / (2 log₂ M)        (4-2)

In either case, when C(m) > 1 the advantage (in terms of fewer computations)
belongs to the FFT approach; otherwise the advantage favors spatial filtering.
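
A minimal sketch, assuming NumPy is available, that evaluates Eqs. (4-1) and (4-2) for M = 2048 and reproduces the crossover points discussed next:

    import numpy as np

    M = 2048
    Cn = lambda m: m**2 / (4 * np.log2(M))   # Eq. (4-1), nonseparable kernels
    Cs = lambda m: m / (2 * np.log2(M))      # Eq. (4-2), separable kernels

    for m in (7, 27, 101, 201):
        print(m, round(Cn(m), 1), round(Cs(m), 1))
    # The FFT overtakes nonseparable spatial filtering near m = 7 (Cn > 1)
    # and separable spatial filtering near m = 27 (Cs > 1).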


FIGURE 4.2 (a) Computational advantage of the FFT over nonseparable spatial kernels. (b) Advantage over separable kernels. The numbers for C(m) in the inset tables are not to be multiplied by the factors of 10 shown for the curves. (M = 2048 in both cases.)

m      Cn(m)          m      Cs(m)
3      0.2            3      0.1
7      1.1            7      0.3
11     2.8            11     0.5
15     5.1            15     0.7
21     10.0           21     0.9
27     16.6           27     1.2
101    232            101    4.6
201    918            201    9.1

Figure 4.2(a) shows a plot of Cn(m) as a function of m for an image of intermedi-
ate size (M = 2048). The inset table shows a more detailed look for smaller kernel
sizes. As you can see, the FFT has the advantage for kernels of sizes 7 × 7 and larger.
The advantage grows rapidly as a function of m, being over 200 for m = 101, and
close to 1000 for m = 201. To give you a feel for the meaning of this advantage, if
filtering a bank of images of size 2048 × 2048 takes 1 minute with the FFT, it would
take on the order of 17 hours to filter the same set of images with a nonseparable
kernel of size 201 × 201 elements. This is a significant difference, and is a clear indica-
tor of the importance of frequency-domain processing using the FFT. (The
computational advantages given by Eqs. (4-1) and (4-2) do not take into account the
fact that the FFT performs operations between complex numbers, and other
secondary, but small in comparison, computations discussed later in the chapter.
Thus, comparisons should be interpreted only as guidelines.)
In the case of separable kernels, the computational advantage is not as dramatic,
but it is still meaningful. The “cross over” point now is around m = 27, and when
m = 101 the difference between frequency- and spatial-domain filtering is still man-
ageable. However, you can see that with m = 201 the advantage of using the FFT
approaches a factor of 10, which begins to be significant. Note in both graphs that
the FFT is an overwhelming favorite for large spatial kernels.
Our focus in the sections that follow is on the Fourier transform and its properties.
As we progress through this chapter, it will become evident that Fourier techniques
are useful in a broad range of image processing applications. We conclude the chap-
ter with a discussion of the FFT.

ABOUT THE EXAMPLES IN THIS CHAPTER


As in Chapter 3, most of the image filtering examples in this chapter deal with image
enhancement. For example, smoothing and sharpening are traditionally associated
with image enhancement, as are techniques for contrast manipulation. By its very
nature, beginners in digital image processing find enhancement to be interesting
and relatively simple to understand. Therefore, using examples from image enhance-
ment in this chapter not only saves having an extra chapter in the book but, more
importantly, is an effective tool for introducing newcomers to filtering techniques in
the frequency domain. We will use frequency domain processing methods for other
applications in Chapters 5, 7, 8, 10, and 11.


4.2 PRELIMINARY CONCEPTS

We pause briefly to introduce several of the basic concepts that underlie the mate-
rial in later sections.

COMPLEX NUMBERS
A complex number, C, is defined as

C = R + jI        (4-3)

where R and I are real numbers and j = √−1. Here, R denotes the real part of the
complex number and I its imaginary part. Real numbers are a subset of complex
numbers in which I = 0. The conjugate of a complex number C, denoted C*, is
defined as

C* = R − jI        (4-4)
Complex numbers can be viewed geometrically as points on a plane (called the com-
plex plane) whose abscissa is the real axis (values of R) and whose ordinate is the
imaginary axis (values of I). That is, the complex number R + jI is point (R, I ) in the
coordinate system of the complex plane.
Sometimes it is useful to represent complex numbers in polar coordinates,

C = |C| (cos θ + j sin θ)        (4-5)

where |C| = (R² + I²)^{1/2} is the length of the vector extending from the origin of the
complex plane to point (R, I), and θ is the angle between the vector and the real axis.
Drawing a diagram of the real and complex axes with the vector in the first quadrant
will show that tan θ = I/R, or θ = arctan(I/R). The arctan function returns angles
in the range [−π/2, π/2]. But, because I and R can be positive and negative inde-
pendently, we need to be able to obtain angles in the full range [−π, π]. We do this
by keeping track of the sign of I and R when computing θ. Many programming
languages do this automatically via so-called four-quadrant arctangent functions. For
example, MATLAB provides the function atan2(Imag, Real) for this purpose.
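
The same four-quadrant behavior is available in most languages. A minimal sketch in Python (math.atan2 plays the role of MATLAB’s atan2; the sample point is an illustration choice):

    import math

    R, I = -1.0, 2.0                     # a point in the second quadrant
    theta = math.atan2(I, R)             # four-quadrant angle, in (-pi, pi]
    magnitude = math.hypot(R, I)         # |C| = sqrt(R**2 + I**2)
    print(round(theta, 4))               # 2.0344 (a plain arctan of I/R would
                                         # return -1.1071, the wrong quadrant)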
Using Euler’s formula,

e^{jθ} = cos θ + j sin θ        (4-6)

where e = 2.71828 . . . , gives the following familiar representation of complex num-
bers in polar coordinates,

C = |C| e^{jθ}        (4-7)

where |C| and θ are as defined above. For example, the polar representation of the
complex number 1 + j2 is √5 e^{jθ}, where θ = 63.4° or 1.1 radians. The preceding equa-
tions are applicable also to complex functions. A complex function, F(u), of a real
variable u, can be expressed as the sum F(u) = R(u) + jI(u), where R(u) and I(u) are
the real and imaginary component functions of F(u). As previously noted, the com-
plex conjugate is F*(u) = R(u) − jI(u), the magnitude is |F(u)| = [R(u)² + I(u)²]^{1/2},

and the angle is θ(u) = arctan[I(u)/R(u)]. We will return to complex functions sev-
eral times in the course of this and the next chapter.

FOURIER SERIES

As indicated in the previous section, a function f(t) of a continuous variable, t,
that is periodic with a period, T, can be expressed as the sum of sines and cosines
multiplied by appropriate coefficients. This sum, known as a Fourier series, has the
form

f(t) = ∑_{n=−∞}^{∞} cₙ e^{j2πnt/T}        (4-8)

where

cₙ = (1/T) ∫_{−T/2}^{T/2} f(t) e^{−j2πnt/T} dt        for n = 0, ±1, ±2, . . .        (4-9)

are the coefficients. The fact that Eq. (4-8) is an expansion of sines and cosines fol-
lows from Euler’s formula, Eq. (4-6).

IMPULSES AND THEIR SIFTING PROPERTIES

Central to the study of linear systems and the Fourier transform is the concept of an
impulse and its sifting property. A unit impulse of a continuous variable t, located at
t = 0, and denoted δ(t), is defined as

δ(t) = ∞ if t = 0;  δ(t) = 0 if t ≠ 0        (4-10)

and is constrained to satisfy the identity

∫_{−∞}^{∞} δ(t) dt = 1        (4-11)

(An impulse is not a function in the usual sense. A more accurate name is a
distribution or generalized function. However, one often finds in the literature the
names impulse function, delta function, and Dirac delta function, despite the
misnomer.)
Physically, if we interpret t as time, an impulse may be viewed as a spike of infinite
amplitude and zero duration, having unit area. An impulse has the so-called sifting
property with respect to integration,

∫_{−∞}^{∞} f(t) δ(t) dt = f(0)        (4-12)

provided that f(t) is continuous at t = 0, a condition typically satisfied in practice.
(To sift means literally to separate, or to separate out, by putting something through
a sieve.) Sifting simply yields the value of the function f(t) at the location of the
impulse (i.e., at t = 0 in the previous equation). A more general statement of the
sifting property involves an impulse located at an arbitrary point, t₀, denoted as
δ(t − t₀). In this case,

∫_{−∞}^{∞} f(t) δ(t − t₀) dt = f(t₀)        (4-13)


which simply gives the value of the function at the location of the impulse. For
example, if f(t) = cos(t), using the impulse δ(t − π) in Eq. (4-13) yields the result
f(π) = cos(π) = −1. The power of the sifting concept will become evident shortly.
Of particular interest later in this section is an impulse train, s_{ΔT}(t), defined as the
sum of infinitely many impulses ΔT units apart:

s_{ΔT}(t) = ∑_{k=−∞}^{∞} δ(t − kΔT)        (4-14)

Figure 4.3(a) shows a single impulse located at t = t0 , and Fig. 4.3(b) shows an
impulse train. Impulses for continuous variables are denoted by up-pointing arrows
to simulate infinite height and zero width. For discrete variables the height is finite,
as we will show next.
Let x represent a discrete variable. As you learned in Chapter 3, the unit discrete
impulse, δ(x), serves the same purposes in the context of discrete systems as the
impulse δ(t) does when working with continuous variables. It is defined as

δ(x) = 1 if x = 0;  δ(x) = 0 if x ≠ 0        (4-15)

Clearly, this definition satisfies the discrete equivalent of Eq. (4-11):

∑_{x=−∞}^{∞} δ(x) = 1        (4-16)

The sifting property for discrete variables has the form

∑_{x=−∞}^{∞} f(x) δ(x) = f(0)        (4-17)

FIGURE 4.3 (a) Continuous impulse located at t = t₀. (b) An impulse train consisting of continuous impulses. (c) Unit discrete impulse located at x = x₀. (d) An impulse train consisting of discrete unit impulses.


or, more generally using a discrete impulse located at x = x₀ (see Eq. 3-33),

∑_{x=−∞}^{∞} f(x) δ(x − x₀) = f(x₀)        (4-18)

As before, we see that the sifting property yields the value of the function at the
location of the impulse. Figure 4.3(c) shows the unit discrete impulse diagrammati-
cally, and Fig. 4.3(d) shows a train of discrete unit impulses. Unlike its continuous
counterpart, the discrete impulse is an ordinary function.
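
A minimal numerical illustration of Eq. (4-18), assuming NumPy is available (the function and impulse location are arbitrary examples; a finite support stands in for the infinite sum):

    import numpy as np

    x = np.arange(-10, 11)
    f = np.cos(0.3 * x)                 # any discrete function
    x0 = 4
    delta = (x == x0).astype(float)     # unit discrete impulse at x = x0

    # The sum "sifts out" the value of f at the location of the impulse.
    print(np.sum(f * delta) == f[x == x0][0])   # True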

THE FOURIER TRANSFORM OF FUNCTIONS OF ONE CONTINUOUS
VARIABLE

The Fourier transform of a continuous function f(t) of a continuous variable, t,
denoted ℱ{f(t)}, is defined by the equation

ℱ{f(t)} = ∫_{−∞}^{∞} f(t) e^{−j2πμt} dt        (4-19)

where μ is a continuous variable also.† Because t is integrated out, ℱ{f(t)} is a func-
tion only of μ. That is, ℱ{f(t)} = F(μ); therefore, we write the Fourier transform of
f(t) as

F(μ) = ∫_{−∞}^{∞} f(t) e^{−j2πμt} dt        (4-20)

Conversely, given F(μ), we can obtain f(t) back using the inverse Fourier transform,
written as

f(t) = ∫_{−∞}^{∞} F(μ) e^{j2πμt} dμ        (4-21)

where we made use of the fact that variable μ is integrated out in the inverse
transform and wrote simply f(t), rather than the more cumbersome notation
f(t) = ℱ⁻¹{F(μ)}. (Equation (4-21) indicates the important fact mentioned in
Section 4.1 that a function can be recovered from its transform.) Equations (4-20)
and (4-21) comprise the so-called Fourier transform pair, often denoted as
f(t) ⇔ F(μ). The double arrow indicates that the expression on the right is obtained
by taking the forward Fourier transform of the expression on the left, while the
expression on the left is obtained by taking the inverse Fourier transform of the
expression on the right.
Using Euler’s formula, we can write Eq. (4-20) as

F(μ) = ∫_{−∞}^{∞} f(t)[cos(2πμt) − j sin(2πμt)] dt        (4-22)

(Because t is integrated out in this equation, the only variable left is μ, which is the
frequency of the sine and cosine terms.)

† Conditions for the existence of the Fourier transform are complicated to state in general (Champeney [1987]),
but a sufficient condition for its existence is that the integral of the absolute value of f(t), or the integral of the
square of f(t), be finite. Existence is seldom an issue in practice, except for idealized signals, such as sinusoids
that extend forever. These are handled using generalized impulses. Our primary interest is in the discrete Fourier
transform pair which, as you will see shortly, is guaranteed to exist for all finite functions.


If f(t) is real, we see that its transform in general is complex. Note that the Fourier
transform is an expansion of f(t) multiplied by sinusoidal terms whose frequencies
are determined by the values of μ. Thus, because the only variable left after integra-
tion is frequency, we say that the domain of the Fourier transform is the frequency
domain. We will discuss the frequency domain and its properties in more detail later
in this chapter. In our discussion, t can represent any continuous variable, and the
units of the frequency variable μ depend on the units of t. For example, if t repre-
sents time in seconds, the units of μ are cycles/sec or Hertz (Hz). If t represents
distance in meters, then the units of μ are cycles/meter, and so on. In other words,
the units of the frequency domain are cycles per unit of the independent variable of
the input function.

EXAMPLE 4.1 : Obtaining the Fourier transform of a simple continuous function.


The Fourier transform of the function in Fig. 4.4(a) follows from Eq. (4-20):

$$
\begin{aligned}
F(\mu) &= \int_{-\infty}^{\infty} f(t)\, e^{-j2\pi\mu t}\, dt = \int_{-W/2}^{W/2} A\, e^{-j2\pi\mu t}\, dt \\
&= \frac{-A}{j2\pi\mu}\left[e^{-j2\pi\mu t}\right]_{-W/2}^{W/2} = \frac{-A}{j2\pi\mu}\left[e^{-j\pi\mu W} - e^{j\pi\mu W}\right] \\
&= \frac{A}{j2\pi\mu}\left[e^{j\pi\mu W} - e^{-j\pi\mu W}\right] = AW\,\frac{\sin(\pi\mu W)}{\pi\mu W}
\end{aligned}
$$

FIGURE 4.4 (a) A box function, (b) its Fourier transform, and (c) its spectrum. All functions extend to infinity in both
directions. Note the inverse relationship between the width, W, of the function and the zeros of the transform.


where we used the trigonometric identity sin u = (e^{ju} − e^{−ju})/2j. In this case, the complex terms of the Fourier transform combined nicely into a real sine function. The result in the last step of the preceding expression is known as the sinc function, which has the general form

$$\mathrm{sinc}(m) = \frac{\sin(\pi m)}{\pi m} \tag{4-23}$$

where sinc(0) = 1, and sinc(m) = 0 for all other integer values of m. Figure 4.4(b) shows a plot of F(μ).
In general, the Fourier transform contains complex terms, and it is customary for display purposes to work with the magnitude of the transform (a real quantity), which is called the Fourier spectrum or the frequency spectrum:

$$|F(\mu)| = AW\left|\frac{\sin(\pi\mu W)}{\pi\mu W}\right|$$

Figure 4.4(c) shows a plot of |F(μ)| as a function of frequency. The key properties to note are (1) that the locations of the zeros of both F(μ) and |F(μ)| are inversely proportional to the width, W, of the "box" function; (2) that the height of the lobes decreases as a function of distance from the origin; and (3) that the function extends to infinity for both positive and negative values of μ. As you will see later, these properties are quite helpful in interpreting the spectra of two-dimensional Fourier transforms of images.
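These properties are easy to verify numerically. The following sketch (with assumed values A = 1 and W = 2; note that np.sinc(x) = sin(πx)/(πx)) approximates the transform integral of Eq. (4-20) by a Riemann sum over a long, finite interval and compares it with the closed form AW sinc(μW):

import numpy as np

# A minimal numerical sketch (assumed values A = 1, W = 2): approximate the
# Fourier transform integral of the box function by a Riemann sum and
# compare it with the closed form A*W*sinc(mu*W) derived above.
A, W = 1.0, 2.0
dt = 1e-3
t = np.arange(-8.0, 8.0, dt)                 # finite stand-in for (-inf, inf)
f = np.where(np.abs(t) <= W / 2, A, 0.0)     # the box function

mu = np.linspace(-3, 3, 121)                 # frequency samples
F = np.array([np.sum(f * np.exp(-2j * np.pi * u * t)) * dt for u in mu])

F_closed = A * W * np.sinc(mu * W)           # closed-form transform
print(np.max(np.abs(F - F_closed)))          # small discretization error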

EXAMPLE 4.2 : Fourier transform of an impulse and an impulse train.


The Fourier transform of a unit impulse located at the origin follows from Eq. (4-20):

$$\mathcal{F}\{\delta(t)\} = F(\mu) = \int_{-\infty}^{\infty} \delta(t)\, e^{-j2\pi\mu t}\, dt = \int_{-\infty}^{\infty} e^{-j2\pi\mu t}\, \delta(t)\, dt = e^{-j2\pi\mu(0)} = 1$$

where we used the sifting property from Eq. (4-12). Thus, we see that the Fourier transform of an
impulse located at the origin of the spatial domain is a constant in the frequency domain (we discussed
this briefly in Section 3.4 in connection with Fig. 3.30).
Similarly, the Fourier transform of an impulse located at t = t0 is

$$\mathcal{F}\{\delta(t - t_0)\} = F(\mu) = \int_{-\infty}^{\infty} \delta(t - t_0)\, e^{-j2\pi\mu t}\, dt = \int_{-\infty}^{\infty} e^{-j2\pi\mu t}\, \delta(t - t_0)\, dt = e^{-j2\pi\mu t_0}$$

where we used the sifting property from Eq. (4-13). The term e^{−j2πμt₀} represents a unit circle centered on the origin of the complex plane, as you can easily see by using Euler's formula to expand the exponential into its sine and cosine components.
In Section 4.3, we will use the Fourier transform of a periodic impulse train. Obtaining this transform
is not as straightforward as we just showed for individual impulses. However, understanding how to
derive the transform of an impulse train is important, so we take the time to derive it here. We start by
noting that the only basic difference in the form of Eqs. (4-20) and (4-21) is the sign of the exponential.
Thus, if a function f(t) has the Fourier transform F(μ), then the latter function evaluated at t, that is, F(t), must have the transform f(−μ). Using this symmetry property and given, as we showed above, that the Fourier transform of an impulse δ(t − t₀) is e^{−j2πμt₀}, it follows that the function e^{−j2πt₀t} has the transform δ(−μ − t₀). By letting −t₀ = a, it follows that the transform of e^{j2πat} is δ(−μ + a) = δ(μ − a), where the last step is true because δ is zero unless μ = a, which is the same condition for either δ(−μ + a) or δ(μ − a).
The impulse train s_{ΔT}(t) in Eq. (4-14) is periodic with period ΔT, so it can be expressed as a Fourier series:

$$s_{\Delta T}(t) = \sum_{n=-\infty}^{\infty} c_n\, e^{\,j 2\pi n t/\Delta T}$$

where

$$c_n = \frac{1}{\Delta T} \int_{-\Delta T/2}^{\Delta T/2} s_{\Delta T}(t)\, e^{-j 2\pi n t/\Delta T}\, dt$$

With reference to Fig. 4.3(b), we see that the integral in the interval [−ΔT/2, ΔT/2] encompasses only the impulse located at the origin. Therefore, the preceding equation becomes

$$c_n = \frac{1}{\Delta T} \int_{-\Delta T/2}^{\Delta T/2} \delta(t)\, e^{-j 2\pi n t/\Delta T}\, dt = \frac{1}{\Delta T}\, e^{0} = \frac{1}{\Delta T}$$

where we used the sifting property of δ(t). The Fourier series then becomes

$$s_{\Delta T}(t) = \frac{1}{\Delta T} \sum_{n=-\infty}^{\infty} e^{\,j 2\pi n t/\Delta T}$$

Our objective is to obtain the Fourier transform of this expression. Because summation is a linear pro-
cess, obtaining the Fourier transform of a sum is the same as obtaining the sum of the transforms of the
individual components of the sum. These components are exponentials, and we established earlier in
this example that
$$\mathcal{F}\left\{ e^{\,j 2\pi n t/\Delta T} \right\} = \delta\!\left(\mu - \frac{n}{\Delta T}\right)$$

So, S(μ), the Fourier transform of the periodic impulse train, is

$$S(\mu) = \mathcal{F}\{s_{\Delta T}(t)\} = \mathcal{F}\left\{\frac{1}{\Delta T}\sum_{n=-\infty}^{\infty} e^{\,j 2\pi n t/\Delta T}\right\} = \frac{1}{\Delta T}\sum_{n=-\infty}^{\infty} \mathcal{F}\left\{e^{\,j 2\pi n t/\Delta T}\right\} = \frac{1}{\Delta T}\sum_{n=-\infty}^{\infty} \delta\!\left(\mu - \frac{n}{\Delta T}\right)$$

This fundamental result tells us that the Fourier transform of an impulse train with period ΔT is also an impulse train, whose period is 1/ΔT. This inverse proportionality between the periods of s_{ΔT}(t) and S(μ) is analogous to what we found in Fig. 4.4 in connection with a box function and its transform. This inverse relationship plays a fundamental role in the remainder of this chapter.
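A discrete analog of this result can be checked numerically. In the sketch below (record length N and impulse spacing P are assumed values), the DFT of a comb with impulses every P samples is nonzero only every N/P frequency bins, illustrating the inverse proportionality between the two spacings:

import numpy as np

# Discrete stand-in for the impulse-train result (assumed N and P): a comb
# with impulses every P samples has a DFT whose nonzero entries occur every
# N/P bins, i.e., inversely proportional spacing.
N, P = 240, 8
s = np.zeros(N)
s[::P] = 1.0                          # impulses P samples apart
S = np.abs(np.fft.fft(s))
print(np.nonzero(S > 1e-6)[0][:6])    # [0 30 60 90 120 150] -> spacing N/P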

CONVOLUTION

As in Section 3.4, the fact that convolution of a function with an impulse shifts the origin of the function to the location of the impulse is also true for continuous convolution (see Figs. 3.29 and 3.30).
We showed in Section 3.4 that convolution of two functions involves flipping (rotating by 180°) one function about its origin and sliding it past the other. At each displacement in the sliding process, we perform a computation, which, for discrete variables, is a sum of products [see Eq. (3-35)]. In the present discussion, we are interested in the convolution of two continuous functions, f(t) and h(t), of one continuous variable, t, so we have to use integration instead of a summation. The convolution of these two functions, denoted as before by the operator ★, is defined as

$$(f \star h)(t) = \int_{-\infty}^{\infty} f(\tau)\, h(t - \tau)\, d\tau \tag{4-24}$$

where the minus sign accounts for the flipping just mentioned, t is the displacement needed to slide one function past the other, and τ is a dummy variable that is integrated out. We assume for now that the functions extend from −∞ to ∞.
We illustrated the basic mechanics of convolution in Section 3.4, and we will do
so again later in this chapter and in Chapter 5. At the moment, we are interested in
finding the Fourier transform of Eq. (4-24). We start with Eq. (4-19):

$$\mathcal{F}\{(f \star h)(t)\} = \int_{-\infty}^{\infty}\left[\int_{-\infty}^{\infty} f(\tau)\, h(t - \tau)\, d\tau\right] e^{-j2\pi\mu t}\, dt = \int_{-\infty}^{\infty} f(\tau)\left[\int_{-\infty}^{\infty} h(t - \tau)\, e^{-j2\pi\mu t}\, dt\right] d\tau$$

The term inside the brackets is the Fourier transform of h(t − τ). We will show later in this chapter that ℱ{h(t − τ)} = H(μ) e^{−j2πμτ}, where H(μ) is the Fourier transform of h(t). Using this in the preceding equation gives us

$$
\begin{aligned}
\mathcal{F}\{(f \star h)(t)\} &= \int_{-\infty}^{\infty} f(\tau)\left[H(\mu)\, e^{-j2\pi\mu\tau}\right] d\tau \\
&= H(\mu) \int_{-\infty}^{\infty} f(\tau)\, e^{-j2\pi\mu\tau}\, d\tau \\
&= H(\mu)F(\mu) = (H \cdot F)(\mu)
\end{aligned}
$$

(Remember, convolution is commutative, so the order of the functions in convolution expressions does not matter.)

where "·" indicates multiplication. As noted earlier, if we refer to the domain of t as the spatial domain, and the domain of μ as the frequency domain, the preceding equation tells us that the Fourier transform of the convolution of two functions in the spatial domain is equal to the product in the frequency domain of the Fourier transforms of the two functions. Conversely, if we have the product of the two transforms, we can obtain the convolution in the spatial domain by computing the inverse Fourier transform. In other words, f ★ h and H · F are a Fourier transform pair. This result is one-half of the convolution theorem and is written as

$$(f \star h)(t) \Leftrightarrow (H \cdot F)(\mu) \tag{4-25}$$

As noted earlier, the double arrow indicates that the expression on the right is
obtained by taking the forward Fourier transform of the expression on the left, while


the expression on the left is obtained by taking the inverse Fourier transform of the
expression on the right.
Following a similar development would result in the other half of the convolution theorem:

$$(f \cdot h)(t) \Leftrightarrow (H \star F)(\mu) \tag{4-26}$$

which states that convolution in the frequency domain is analogous to multiplication in the spatial domain, the two being related by the forward and inverse Fourier transforms, respectively. (These two expressions also hold for discrete variables, except that the right side of Eq. (4-26) is multiplied by 1/M, where M is the number of discrete samples; see Problem 4.18.) As you will see later in this chapter, the convolution theorem is the foundation for filtering in the frequency domain.

4.3 SAMPLING AND THE FOURIER TRANSFORM OF SAMPLED FUNCTIONS

In this section, we use the concepts from Section 4.2 to formulate a basis for expressing sampling mathematically. Starting from basic principles, this will lead us to the Fourier transform of sampled functions; that is, the discrete Fourier transform.

SAMPLING
Continuous functions have to be converted into a sequence of discrete values before
they can be processed in a computer. This requires sampling and quantization, as
introduced in Section 2.4. In the following discussion, we examine sampling in more
detail.
Consider a continuous function, f(t), that we wish to sample at uniform intervals, ΔT, of the independent variable t (see Fig. 4.5). We assume initially that the function extends from −∞ to ∞ with respect to t. One way to model sampling is to multiply f(t) by a sampling function equal to a train of impulses ΔT units apart. That is,

$$\tilde{f}(t) = f(t)\, s_{\Delta T}(t) = \sum_{n=-\infty}^{\infty} f(t)\, \delta(t - n\Delta T) \tag{4-27}$$

where f̃(t) denotes the sampled function. (Taking samples ΔT units apart implies a sampling rate equal to 1/ΔT. If the units of ΔT are seconds, the sampling rate is in samples/s; if the units of ΔT are meters, the sampling rate is in samples/m, and so on.) Each component of this summation is an
impulse weighted by the value of f (t ) at the location of the impulse, as Fig. 4.5(c)
shows. The value of each sample is given by the “strength” of the weighted impulse,
which we obtain by integration. That is, the value, fk , of an arbitrary sample in the
sampled sequence is given by

$$f_k = \int_{-\infty}^{\infty} f(t)\, \delta(t - k\Delta T)\, dt = f(k\Delta T) \tag{4-28}$$

where we used the sifting property of δ in Eq. (4-13). Equation (4-28) holds for any integer value k = …, −2, −1, 0, 1, 2, …. Figure 4.5(d) shows the result, which consists of equally spaced samples of the original function.


a b c d
FIGURE 4.5 (a) A continuous function. (b) Train of impulses used to model sampling. (c) Sampled function formed as the product of (a) and (b). (d) Sample values obtained by integration and using the sifting property of impulses. (The dashed line in (c) is shown for reference. It is not part of the data.)

THE FOURIER TRANSFORM OF SAMPLED FUNCTIONS


Let F(μ) denote the Fourier transform of a continuous function f(t). As discussed in the previous section, the corresponding sampled function, f̃(t), is the product of f(t) and an impulse train. We know from the convolution theorem that the Fourier transform of the product of two functions in the spatial domain is the convolution of the transforms of the two functions in the frequency domain. Thus, the Fourier transform of the sampled function is:

$$\tilde{F}(\mu) = \mathcal{F}\{\tilde{f}(t)\} = \mathcal{F}\{f(t)\, s_{\Delta T}(t)\} = (F \star S)(\mu) \tag{4-29}$$

where, from Example 4.2,

$$S(\mu) = \frac{1}{\Delta T} \sum_{n=-\infty}^{\infty} \delta\!\left(\mu - \frac{n}{\Delta T}\right) \tag{4-30}$$


is the Fourier transform of the impulse train s_{ΔT}(t). We obtain the convolution of F(μ) and S(μ) directly from the 1-D definition of convolution in Eq. (4-24):

$$
\begin{aligned}
\tilde{F}(\mu) = (F \star S)(\mu) &= \int_{-\infty}^{\infty} F(\tau)\, S(\mu - \tau)\, d\tau \\
&= \frac{1}{\Delta T} \int_{-\infty}^{\infty} F(\tau) \sum_{n=-\infty}^{\infty} \delta\!\left(\mu - \tau - \frac{n}{\Delta T}\right) d\tau \\
&= \frac{1}{\Delta T} \sum_{n=-\infty}^{\infty} \int_{-\infty}^{\infty} F(\tau)\, \delta\!\left(\mu - \tau - \frac{n}{\Delta T}\right) d\tau \\
&= \frac{1}{\Delta T} \sum_{n=-\infty}^{\infty} F\!\left(\mu - \frac{n}{\Delta T}\right)
\end{aligned}
\tag{4-31}
$$

where the final step follows from the sifting property of the impulse, Eq. (4-13).
The summation in the last line of Eq. (4-31) shows that the Fourier transform F̃(μ) of the sampled function f̃(t) is an infinite, periodic sequence of copies of the transform of the original, continuous function. The separation between copies is determined by the value of 1/ΔT. Observe that although f̃(t) is a sampled function, its transform, F̃(μ), is continuous because it consists of copies of F(μ), which is a continuous function.
Figure 4.6 is a graphical summary of the preceding results.† Figure 4.6(a) is a sketch of the Fourier transform, F(μ), of a function f(t), and Fig. 4.6(b) shows the transform, F̃(μ), of the sampled function, f̃(t). As mentioned in the previous section, the quantity 1/ΔT is the sampling rate used to generate the sampled function. So, in Fig. 4.6(b) the sampling rate was high enough to provide sufficient separation between the periods, and thus preserve the integrity (i.e., perfect copies) of F(μ). In Fig. 4.6(c), the sampling rate was just enough to preserve F(μ), but in Fig. 4.6(d), the sampling rate was below the minimum required to maintain distinct copies of F(μ), and thus failed to preserve the original transform. Figure 4.6(b) is the result of an over-sampled signal, while Figs. 4.6(c) and (d) are the results of critically sampling and under-sampling the signal, respectively. These concepts are the basis that will help you grasp the fundamentals of the sampling theorem, which we discuss next.

THE SAMPLING THEOREM


We introduced the idea of sampling intuitively in Section 2.4. Now we consider sam-
pling formally, and establish the conditions under which a continuous function can
be recovered uniquely from a set of its samples.
A function f(t) whose Fourier transform is zero for values of frequencies outside a finite interval (band) [−μmax, μmax] about the origin is called a band-limited function. Figure 4.7(a), which is a magnified section of Fig. 4.6(a), is such a function. Similarly, Fig. 4.7(b) is a more detailed view of the transform of the critically sampled


† For the sake of clarity in sketches of Fourier transforms in Fig. 4.6, and other similar figures in this chapter, we ignore the fact that Fourier transforms typically are complex functions. Our interest here is on concepts.


a b c d
FIGURE 4.6 (a) Illustrative sketch of the Fourier transform of a band-limited function. (b)–(d) Transforms of the corresponding sampled functions under the conditions of over-sampling, critically sampling, and under-sampling, respectively.

function [see Fig. 4.6(c)]. A higher value of ΔT would cause the periods in F̃(μ) to merge; a lower value would provide a clean separation between the periods.
We can recover f(t) from its samples if we can isolate a single copy of F(μ) from the periodic sequence of copies of this function contained in F̃(μ), the transform of the sampled function f̃(t). Recall from the discussion in the previous section that F̃(μ) is a continuous, periodic function with period 1/ΔT. Therefore, all we need is one complete period to characterize the entire transform. In other words, we can recover f(t) from that single period by taking its inverse Fourier transform.
Extracting from F̃(μ) a single period that is equal to F(μ) is possible if the separation between copies is sufficient (see Fig. 4.6). In terms of Fig. 4.7(b), sufficient separation is guaranteed if 1/(2ΔT) > μmax, or

$$\frac{1}{\Delta T} > 2\mu_{\max} \tag{4-32}$$

(Remember, the sampling rate is the number of samples taken per unit of the independent variable.)

This equation indicates that a continuous, band-limited function can be recovered completely from a set of its samples if the samples are acquired at a rate exceeding


a b
FIGURE 4.7 (a) Illustrative sketch of the Fourier transform of a band-limited function. (b) Transform resulting from critically sampling that band-limited function.

twice the highest frequency content of the function. This exceptionally important
result is known as the sampling theorem.† We can say based on this result that no
information is lost if a continuous, band-limited function is represented by samples
acquired at a rate greater than twice the highest frequency content of the function.
Conversely, we can say that the maximum frequency that can be "captured" by sampling a signal at a rate 1/ΔT is μmax = 1/(2ΔT). A sampling rate exactly equal to twice the highest frequency is called the Nyquist rate. Sampling at exactly the Nyquist rate
sometimes is sufficient for perfect function recovery, but there are cases in which
this leads to difficulties, as we will illustrate later in Example 4.3. This is the reason
why the sampling theorem specifies that sampling must exceed the Nyquist rate.
Figure 4.8 illustrates the procedure for recovering F(μ) from F̃(μ) when a function is sampled at a rate higher than the Nyquist rate. The function in Fig. 4.8(b) is defined by the equation

$$H(\mu) = \begin{cases} \Delta T & -\mu_{\max} \le \mu \le \mu_{\max} \\ 0 & \text{otherwise} \end{cases} \tag{4-33}$$

(The ΔT in Eq. (4-33) cancels out the 1/ΔT in Eq. (4-31).)

When multiplied by the periodic sequence in Fig. 4.8(a), this function isolates the period centered on the origin. Then, as Fig. 4.8(c) shows, we obtain F(μ) by multiplying F̃(μ) by H(μ):

† The sampling theorem is a cornerstone of digital signal processing theory. It was first formulated in 1928 by Harry Nyquist, a Bell Laboratories scientist and engineer. Claude E. Shannon, also from Bell Labs, proved the theorem formally in 1949. The renewed interest in the sampling theorem in the late 1940s was motivated by the emergence of early digital computing systems and modern communications, which created a need for methods dealing with digital (sampled) data.

a b c
FIGURE 4.8 (a) Fourier transform of a sampled, band-limited function. (b) Ideal lowpass filter transfer function. (c) The product of (b) and (a), used to extract one period of the infinitely periodic sequence in (a).

$$F(\mu) = H(\mu)\tilde{F}(\mu) \tag{4-34}$$

Once we have F(μ), we can recover f(t) using the inverse Fourier transform:

$$f(t) = \int_{-\infty}^{\infty} F(\mu)\, e^{j2\pi\mu t}\, d\mu \tag{4-35}$$

Equations (4-33) through (4-35) prove that, theoretically, it is possible to recover a band-limited function from samples obtained at a rate exceeding twice the highest frequency content of the function. As we will discuss in the following section, the requirement that f(t) must be band-limited implies in general that f(t) must extend from −∞ to ∞, a condition that cannot be met in practice. As you will see shortly, having to limit the duration of a function prevents perfect recovery of the function from its samples, except in some special cases.
Function H(μ) is called a lowpass filter because it passes frequencies in the low end of the frequency range, but it eliminates (filters out) higher frequencies. It is called also an ideal lowpass filter because of its instantaneous transitions in amplitude (between 0 and ΔT at location −μmax, and the reverse at μmax), a characteristic that cannot be implemented physically in hardware. We can simulate ideal filters in software, but even then there are limitations (see Section 4.8). Because they are instrumental in recovering (reconstructing) the original function from its samples, filters used for the purpose just discussed are also called reconstruction filters. (In Fig. 3.32 we sketched the radial cross sections of filter transfer functions using only positive frequencies, for simplicity. Now you can see that frequency domain filter functions encompass both positive and negative frequencies.)


ALIASING
Literally, the word alias means “a false identity.” In the field of signal processing,
aliasing refers to sampling phenomena that cause different signals to become indis-
tinguishable from one another after sampling; or, viewed another way, for one signal
to “masquerade” as another.
Conceptually, the relationship between sampling and aliasing is not difficult to grasp. The foundation of aliasing phenomena as it relates to sampling is that we can describe a digitized function only by the values of its samples. This means that it is possible for two (or more) totally different continuous functions to coincide at the values of their respective samples, but we would have no way of knowing the characteristics of the functions between those samples. To illustrate, Fig. 4.9 shows two completely different sine functions sampled at the same rate. As you can see in Figs. 4.9(a) and (c), there are numerous places where the sampled values are the same in the two functions, resulting in identical sampled functions, as Figs. 4.9(b) and (d) show. (Although we show sinusoidal functions for simplicity, aliasing occurs between any arbitrary signals whose values are the same at the sample points.)
Two continuous functions having the characteristics just described are called an
aliased pair, and such pairs are indistinguishable after sampling. Note that the reason
these functions are aliased is because we used a sampling rate that is too coarse. That
is, the functions were under-sampled. It is intuitively obvious that if sampling were
refined, more and more of the differences between the two continuous functions
would be revealed in the sampled signals. The principal objective of the following
discussion is to answer the question: What is the minimum sampling rate required
to avoid (or reduce) aliasing? This question has both a theoretical and a practical
answer and, in the process of arriving at the answers, we will establish the conditions
under which aliasing occurs.
We can use the tools developed earlier in this section to formally answer the
question we just posed. All we have to do is ask it in a different form: What happens

a b
c d
FIGURE 4.9 The functions in (a) and (c) are totally different, but their digitized versions in (b) and (d) are identical. Aliasing occurs when the samples of two or more functions coincide, but the functions are different elsewhere.


if a band-limited function is sampled at less than the Nyquist rate (i.e., at less than
twice its highest frequency)? This is precisely the under-sampled situation discussed
earlier in this section and mentioned in the previous paragraph.
Figure 4.10(a) is the same as Fig. 4.6(d); it shows schematically the Fourier transform of an under-sampled, band-limited function. This figure illustrates that the net effect of lowering the sampling rate below the Nyquist rate is that the periods of the Fourier transform now overlap, and it becomes impossible to isolate a single period of the transform, regardless of the filter used. (If we cannot isolate one period of the transform, we cannot recover the signal without aliasing.) For instance, using the ideal lowpass filter in Fig. 4.10(b) would result in a transform that is corrupted by frequencies from adjacent periods, as Fig. 4.10(c) shows. The inverse transform would then yield a function, f_a(t), different from the original. That is, f_a(t) would be an aliased function because it would contain frequency components not present in the original. Using our earlier terminology, f_a(t) would masquerade as a different function. It is possible for aliased functions to bear no resemblance whatsoever to the functions from which they originated.
Unfortunately, except in some special cases mentioned below, aliasing is always
present in sampled signals. This is because, even if the original sampled function is
band-limited, infinite frequency components are introduced the moment we limit
the duration of the function, which we always have to do in practice. As an illustra-
tion, suppose that we want to limit the duration of a band-limited function, f (t ), to a
finite interval, say [0, T ]. We can do this by multiplying f (t ) by the function

$$h(t) = \begin{cases} 1 & 0 \le t \le T \\ 0 & \text{otherwise} \end{cases} \tag{4-36}$$

This function has the same basic shape as Fig. 4.4(a), whose Fourier transform, H(μ), has frequency components extending to infinity in both directions, as Fig. 4.4(b) shows. From the convolution theorem, we know that the transform of the product h(t)f(t) is the convolution in the frequency domain of the transforms F(μ) and H(μ). Even if F(μ) is band-limited, convolving it with H(μ), which involves sliding one function across the other, will yield a result with frequency components extending to infinity in both directions (see Problem 4.12). From this we conclude that no function of finite duration can be band-limited. Conversely, a function that is band-limited must extend from −∞ to ∞.†
Although aliasing is an inevitable fact of working with sampled records of finite
length, the effects of aliasing can be reduced by smoothing (lowpass filtering) the
input function to attenuate its higher frequencies. This process, called anti-aliasing,
has to be done before the function is sampled because aliasing is a sampling issue
that cannot be “undone after the fact” using computational techniques.

† An important special case is when a function that extends from − to  is band-limited and periodic. In this
case, the function can be truncated and still be band-limited, provided that the truncation encompasses exactly
an integral number of periods. A single truncated period (and thus the function) can be represented by a set of
discrete samples satisfying the sampling theorem, taken over the truncated interval.


a b c
FIGURE 4.10 (a) Fourier transform of an under-sampled, band-limited function. (Interference between adjacent periods is shown dashed.) (b) The same ideal lowpass filter used in Fig. 4.8. (c) The product of (a) and (b). The interference from adjacent periods results in aliasing that prevents perfect recovery of F(μ) and, consequently, of f(t).

EXAMPLE 4.3 : Aliasing.


Figure 4.11 shows a classic illustration of aliasing. A pure sine wave extending infinitely in both directions has a single frequency so, obviously, it is band-limited. Suppose that the sine wave in the figure (ignore the large dots for now) has the equation f(t) = sin(πt), and that the horizontal axis corresponds to time, t, in seconds. The function crosses the axis at t = 0, ±1, ±2, ….
Recall that a function f (t ) is periodic with period P if f (t + P ) = f (t ) for all values of t. The period
is the number (including fractions) of units of the independent variable that it takes for the function
to complete one cycle. The frequency of a periodic function is the number of periods (cycles) that the
function completes in one unit of the independent variable. Thus, the frequency of a periodic function
is the reciprocal of the period. As before, the sampling rate is the number of samples taken per unit of
the independent variable.
In the present example, the independent variable is time, and its units are seconds. The period, P, of sin(πt) is 2 s, and its frequency is 1/P, or 1/2 cycle/s. According to the sampling theorem, we can recover this signal from a set of its samples if the sampling rate exceeds twice the highest frequency of the signal. This means that a sampling rate greater than 1 sample/s (2 × 1/2 = 1) is required to

FIGURE 4.11 Illustration of aliasing. The under-sampled function (dots) looks like a sine wave having a frequency much lower than the frequency of the continuous signal. The period of the sine wave is 2 s, so the zero crossings of the horizontal axis occur every second. ΔT is the separation between samples.

recover the signal. Viewed another way, the separation, ΔT, between samples has to be less than 1 s. Observe that sampling this signal at exactly twice the frequency (1 sample/s), with samples taken at t = 0, ±1, ±2, …, results in …, sin(−π), sin(0), sin(π), …, all of which are 0. This illustrates the reason why the sampling theorem requires a sampling rate that exceeds twice the highest frequency of the function, as mentioned earlier.
The large dots in Fig. 4.11 are samples taken uniformly at a rate below the required 1 sample/s (i.e.,
the samples are taken more than 1 s apart; in fact, the separation between samples exceeds 2 s). The
sampled signal looks like a sine wave, but its frequency is about one-tenth the frequency of the original
function. This sampled signal, having a frequency well below anything present in the original continu-
ous function, is an example of aliasing. If the signal had been sampled at a rate slightly exceeding the
Nyquist rate, the samples would not look like a sine wave at all (see Problem 4.6).
Figure 4.11 also illustrates how aliasing can be extremely problematic in musical recordings by intro-
ducing frequencies not present in the original sound. In order to mitigate this, signals with frequencies
above half the sampling rate must be filtered out to reduce the effect of aliased signals introduced into
digital recordings. This is the reason why digital recording equipment contains lowpass filters specifically
designed to remove frequency components above half the sampling rate used by the equipment.
If we were given just the samples in Fig. 4.11, another issue illustrating the seriousness of aliasing is
that we would have no way of knowing that these samples are not a true representation of the original
function. As you will see later in this chapter, aliasing in images can produce similarly misleading results.
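The effect in Fig. 4.11 can be reproduced numerically. In the sketch below, the 2.1 s sample separation is an assumed value chosen to under-sample the signal; the resulting samples coincide exactly with those of a sine wave of much lower frequency:

import numpy as np

# Sketch of the aliasing effect in Example 4.3 (assumed sample spacing):
# sin(pi*t) has frequency 1/2 cycle/s, so recovery requires sampling above
# 1 sample/s. Taking samples dT = 2.1 s apart produces values identical to
# those of a much lower-frequency sine wave.
dT = 2.1                               # separation between samples, in seconds
n = np.arange(12)                      # sample indices
t = n * dT
samples = np.sin(np.pi * t)            # samples of the original signal

# Alias frequency obtained by folding 1/2 cycle/s about the rate 1/dT:
f_alias = 0.5 - 1 / dT                 # about 0.024 cycles/s
print(np.allclose(samples, np.sin(2 * np.pi * f_alias * t)))   # True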

FUNCTION RECONSTRUCTION (RECOVERY) FROM SAMPLED DATA


In this section, we show that reconstructing a function from a set of its samples
reduces in practice to interpolating between the samples. Even the simple act of
displaying an image requires reconstruction of the image from its samples by the dis-
play medium. Therefore, it is important to understand the fundamentals of sampled
data reconstruction. Convolution is central to developing this understanding, dem-
onstrating again the importance of this concept.
The discussion of Fig. 4.8 and Eq. (4-34) outlines the procedure for perfect recov-
ery of a band-limited function from its samples using frequency domain methods.


Using the convolution theorem, we can obtain the equivalent result in the spatial domain. From Eq. (4-34), F(μ) = H(μ)F̃(μ), so it follows that

$$f(t) = \mathcal{F}^{-1}\{F(\mu)\} = \mathcal{F}^{-1}\{H(\mu)\tilde{F}(\mu)\} = h(t) \star \tilde{f}(t) \tag{4-37}$$

where, as before, f̃(t) denotes the sampled function, and the last step follows from the convolution theorem, Eq. (4-25). It can be shown (see Problem 4.13) that substituting Eq. (4-27) for f̃(t) into Eq. (4-37), and then using Eq. (4-24), leads to the following spatial domain expression for f(t):

$$f(t) = \sum_{n=-\infty}^{\infty} f(n\Delta T)\, \mathrm{sinc}\!\left[\frac{t - n\Delta T}{\Delta T}\right] \tag{4-38}$$

where the sinc function is defined in Eq. (4-23). This result is not unexpected, because the inverse Fourier transform of the ideal (box) filter, H(μ), is a sinc function (see Example 4.1). Equation (4-38) shows that the perfectly reconstructed function, f(t), is an infinite sum of sinc functions weighted by the sample values. It has the important property that the reconstructed function is identically equal to the sample values at multiple integer increments of ΔT. That is, for any t = kΔT, where k is an integer, f(t) is equal to the kth sample, f(kΔT). This follows from Eq. (4-38) because sinc(0) = 1 and sinc(m) = 0 for any other integer value of m. Between sample points, values of f(t) are interpolations formed by the sum of the sinc functions. (See Section 2.4 regarding interpolation.)
ing interpolation. Equation (4-38) requires an infinite number of terms for the interpolations
between samples. In practice, this implies that we have to look for approximations
that are finite interpolations between the samples. As we discussed in Section 2.6, the
principal interpolation approaches used in image processing are nearest-neighbor,
bilinear, and bicubic interpolation. We will discuss the effects of interpolation on
images in Section 4.5.
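To make Eq. (4-38) concrete, the following sketch performs the sinc interpolation with a necessarily finite number of terms (the test signal and sampling interval are assumed values) and evaluates the reconstruction at a point between samples:

import numpy as np

# Sketch of Eq. (4-38) with a finite number of terms: reconstruct a
# band-limited signal from its samples by summing sample-weighted sinc
# functions. (np.sinc(x) = sin(pi*x)/(pi*x).)
dT = 0.2                                   # sampling interval (rate 5 > 2*1.5)
n = np.arange(-200, 201)                   # finite stand-in for n = -inf..inf
f_n = np.cos(2 * np.pi * 1.5 * n * dT)     # samples of a 1.5 Hz cosine

def reconstruct(t):
    # f(t) = sum_n f(n dT) sinc((t - n dT)/dT)
    return np.sum(f_n * np.sinc((t - n * dT) / dT))

t0 = 0.37                                  # a point between samples
print(reconstruct(t0), np.cos(2 * np.pi * 1.5 * t0))  # nearly equal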

4.4 THE DISCRETE FOURIER TRANSFORM OF ONE VARIABLE

One of the principal goals of this chapter is the derivation of the discrete Fourier transform (DFT) starting from basic principles. The material up to this point may be viewed as the foundation of those basic principles, so now we have in place the necessary tools to derive the DFT.

OBTAINING THE DFT FROM THE CONTINUOUS TRANSFORM OF A SAMPLED FUNCTION

As we discussed in Section 4.3, the Fourier transform of a sampled, band-limited function extending from −∞ to ∞ is a continuous, periodic function that also extends from −∞ to ∞. In practice, we work with a finite number of samples, and the objective of this section is to derive the DFT of such finite sample sets.
Equation (4-31) gives the transform, F̃(μ), of sampled data in terms of the transform of the original function, but it does not give us an expression for F̃(μ) in terms


of the sampled function f̃(t) itself. We find that expression directly from the definition of the Fourier transform in Eq. (4-19):

$$\tilde{F}(\mu) = \int_{-\infty}^{\infty} \tilde{f}(t)\, e^{-j2\pi\mu t}\, dt \tag{4-39}$$

By substituting Eq. (4-27) for f̃(t), we obtain

$$
\begin{aligned}
\tilde{F}(\mu) &= \int_{-\infty}^{\infty} \tilde{f}(t)\, e^{-j2\pi\mu t}\, dt = \int_{-\infty}^{\infty} \sum_{n=-\infty}^{\infty} f(t)\, \delta(t - n\Delta T)\, e^{-j2\pi\mu t}\, dt \\
&= \sum_{n=-\infty}^{\infty} \int_{-\infty}^{\infty} f(t)\, \delta(t - n\Delta T)\, e^{-j2\pi\mu t}\, dt \\
&= \sum_{n=-\infty}^{\infty} f_n\, e^{-j2\pi\mu n \Delta T}
\end{aligned}
\tag{4-40}
$$

The last step follows from Eq. (4-28) and the sifting property of the impulse. Although f_n is a discrete function, its Fourier transform, F̃(μ), is continuous and infinitely periodic with period 1/ΔT, as we know from Eq. (4-31). Therefore, all we need to characterize F̃(μ) is one period, and sampling one period of this function is the basis for the DFT.
Suppose that we want to obtain M equally spaced samples of F̃(μ) taken over the one period interval from μ = 0 to μ = 1/ΔT (see Fig. 4.8). This is accomplished by taking the samples at the following frequencies:

$$\mu = \frac{m}{M\Delta T}, \qquad m = 0, 1, 2, \ldots, M-1 \tag{4-41}$$

Substituting this result for μ into Eq. (4-40) and letting F_m denote the result yields

$$F_m = \sum_{n=0}^{M-1} f_n\, e^{-j2\pi m n / M}, \qquad m = 0, 1, 2, \ldots, M-1 \tag{4-42}$$

This expression is the discrete Fourier transform we are seeking.† Given a set {f_n} consisting of M samples of f(t), Eq. (4-42) yields a set {F_m} of M complex values corresponding to the discrete Fourier transform of the input sample set. Conversely,

† Referring back to Fig. 4.6(b), note that the interval [0, 1/ΔT] over which we sampled one period of F̃(μ) covers two adjacent half periods of the transform (but with the lowest half of the period appearing at higher frequencies). This means that the data in F_m requires re-ordering to obtain samples that are ordered from the lowest to the highest frequency of the period. This is the price paid for the notational convenience of taking the samples at m = 0, 1, 2, …, M − 1, instead of using samples on either side of the origin, which would require the use of negative notation. The procedure used to order the transform data will be discussed in Section 4.6.


given {F_m}, we can recover the sample set {f_n} by using the inverse discrete Fourier transform (IDFT)

$$f_n = \frac{1}{M} \sum_{m=0}^{M-1} F_m\, e^{j2\pi m n / M}, \qquad n = 0, 1, 2, \ldots, M-1 \tag{4-43}$$

It is not difficult to show (see Problem 4.15) that substituting Eq. (4-43) for fn into
Eq. (4-42) gives the identity Fm ≡ Fm . Similarly, substituting Eq. (4-42) into Eq. (4-43)
for Fm yields fn ≡ fn . This implies that Eqs. (4-42) and (4-43) constitute a discrete
Fourier transform pair. Furthermore, these identities indicate that the forward and
inverse Fourier transforms exist for any set of samples whose values are finite. Note
that neither expression depends explicitly on the sampling interval ΔT, nor on the
frequency intervals of Eq. (4-41). Therefore, the DFT pair is applicable to any finite
set of discrete samples taken uniformly.
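The two sums translate directly into code. The sketch below is written straight from Eqs. (4-42) and (4-43); it is an O(M²) implementation, whereas in practice the FFT computes the same result in O(M log M):

import numpy as np

# A minimal sketch of the DFT pair, Eqs. (4-42) and (4-43), written directly
# from the definitions.
def dft(f):
    M = len(f)
    n = np.arange(M)
    return np.array([np.sum(f * np.exp(-2j * np.pi * m * n / M))
                     for m in range(M)])

def idft(F):
    M = len(F)
    m = np.arange(M)
    return np.array([np.sum(F * np.exp(2j * np.pi * m * n / M)) / M
                     for n in range(M)])

f = np.random.default_rng(1).standard_normal(8)
print(np.allclose(dft(f), np.fft.fft(f)))    # True
print(np.allclose(idft(dft(f)), f))          # True: forward/inverse pair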
We used m and n in the preceding development to denote discrete variables
because it is typical to do so for derivations. However, it is more intuitive, especially
in two dimensions, to use the notation x and y for image coordinate variables and
u and v for frequency variables, where these are understood to be integers.† Then,
Eqs. (4-42) and (4-43) become
$$F(u) = \sum_{x=0}^{M-1} f(x)\, e^{-j2\pi u x / M}, \qquad u = 0, 1, 2, \ldots, M-1 \tag{4-44}$$

and

$$f(x) = \frac{1}{M} \sum_{u=0}^{M-1} F(u)\, e^{j2\pi u x / M}, \qquad x = 0, 1, 2, \ldots, M-1 \tag{4-45}$$

where we used functional notation instead of subscripts for simplicity. Comparing


Eqs. (4-42) through (4-45), you can see that F (u) ≡ Fm and f ( x) ≡ fn . From this point
on, we use Eqs. (4-44) and (4-45) to denote the 1-D DFT pair. As in the continuous
case, we often refer to Eq. (4-44) as the forward DFT of f ( x), and to Eq. (4-45) as
the inverse DFT of F (u). As before, we use the notation f ( x) ⇔ F (u) to denote a
Fourier transform pair. Sometimes you will encounter in the literature the 1/M term
in front of Eq. (4-44) instead. That does not affect the proof that the two equations
form a Fourier transform pair (see Problem 4.15).
Knowledge that f(x) and F(u) are a transform pair is useful in proving relationships between functions and their transforms. For example, you are asked in Problem 4.17 to show that f(x − x₀) ⇔ F(u) e^{−j2πux₀/M} is a Fourier transform pair. That is, you have to show that the DFT of f(x − x₀) is F(u) e^{−j2πux₀/M} and, conversely, that the inverse DFT of F(u) e^{−j2πux₀/M} is f(x − x₀). Because this is done by substituting


† We have been careful in using t for continuous spatial variables and μ for the corresponding continuous frequency variables. From this point on, we will use x and u to denote 1-D discrete spatial and frequency variables, respectively. When working in 2-D, we will use (t, z) and (μ, ν) to denote continuous spatial and frequency domain variables, respectively. Similarly, we will use (x, y) and (u, v) to denote their discrete counterparts.


directly into Eqs. (4-44) and (4-45), and because you will have proved already that these two equations constitute a Fourier transform pair (Problem 4.15), if you prove that one side of "⇔" is the DFT (IDFT) of the other, then it must be true that the other side is the IDFT (DFT) of the side you just proved. It turns out that having the option to prove
one side or the other often simplifies proofs significantly. This is true also of the 1-D
continuous and 2-D continuous and discrete Fourier transform pairs.
It can be shown (see Problem 4.16) that both the forward and inverse discrete
transforms are infinitely periodic, with period M. That is,

F (u) = F (u + kM ) (4-46)

and

f ( x) = f ( x + kM ) (4-47)

where k is an integer.
The discrete equivalent of the 1-D convolution in Eq. (4-24) is

$$f(x) \star h(x) = \sum_{m=0}^{M-1} f(m)\, h(x - m), \qquad x = 0, 1, 2, \ldots, M-1 \tag{4-48}$$

Because in the preceding formulations the functions are periodic, their convolu-
tion also is periodic. Equation (4-48) gives one period of the periodic convolution.
For this reason, this equation often is referred to as circular convolution. This is a
direct result of the periodicity of the DFT and its inverse. This is in contrast with the
convolution you studied in Section 3.4, in which values of the displacement, x, were
determined by the requirement of sliding one function completely past the other,
and were not fixed to the range [0, M − 1] as in circular convolution. We will discuss
this difference and its significance in Section 4.6 and in Fig. 4.27.
Finally, we point out that the convolution theorem given in Eqs. (4-25) and (4-26) is applicable also to discrete variables, with the exception that the right side of Eq. (4-26) is multiplied by 1/M (Problem 4.18).

RELATIONSHIP BETWEEN THE SAMPLING AND FREQUENCY INTERVALS
If f(x) consists of M samples of a function f(t) taken ΔT units apart, the duration of the record comprising the set {f(x)}, x = 0, 1, 2, …, M − 1, is

$$T = M\Delta T \tag{4-49}$$

The corresponding spacing, Δu, in the frequency domain follows from Eq. (4-41):

$$\Delta u = \frac{1}{M\Delta T} = \frac{1}{T} \tag{4-50}$$


The entire frequency range spanned by the M components of the DFT is then

$$R = M\Delta u = \frac{1}{\Delta T} \tag{4-51}$$

Thus, we see from Eqs. (4-50) and (4-51) that the resolution in frequency, Δu, of the DFT depends inversely on the length (duration, if t is time) of the record, T, over which the continuous function, f(t), is sampled; and the range of frequencies spanned by the DFT depends inversely on the sampling interval, ΔT. Keep in mind these inverse relationships between Δu and ΔT.
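A small numerical example (with assumed values of M and ΔT) makes these relationships concrete:

import numpy as np

# Sketch of Eqs. (4-49)-(4-51) with assumed numbers: M samples, dT apart.
M, dT = 1000, 0.002              # 1000 samples, 2 ms apart
T = M * dT                       # record duration: 2 s          (Eq. 4-49)
du = 1 / (M * dT)                # frequency resolution: 0.5 Hz  (Eq. 4-50)
R = M * du                       # frequency span: 500 Hz        (Eq. 4-51)
print(T, du, R)

# The same bin spacing np.fft.fftfreq assigns to the DFT:
print(np.fft.fftfreq(M, d=dT)[1])   # 0.5 Hz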

EXAMPLE 4.4 : The mechanics of computing the DFT.


Figure 4.12(a) shows four samples of a continuous function, f(t), taken ΔT units apart. Figure 4.12(b) shows the samples in the x-domain. The values of x are 0, 1, 2, and 3, which refer to the number of the samples in sequence, counting up from 0. For example, f(2) = f(t₀ + 2ΔT), the third sample of f(t).
From Eq. (4-44), the first value of F(u) [i.e., F(0)] is

$$F(0) = \sum_{x=0}^{3} f(x) = [f(0) + f(1) + f(2) + f(3)] = 1 + 2 + 4 + 4 = 11$$

The next value of F(u) is

$$F(1) = \sum_{x=0}^{3} f(x)\, e^{-j2\pi(1)x/4} = 1e^{0} + 2e^{-j\pi/2} + 4e^{-j\pi} + 4e^{-j3\pi/2} = -3 + 2j$$

Similarly, F (2) = −(1 + 0 j ) and F (3) = −(3 + 2 j ). Observe that all values of f ( x) are used in computing
each value of F (u).
If we were given F(u) instead, and were asked to compute its inverse, we would proceed in the same manner, but using the inverse Fourier transform. For instance,

$$f(0) = \frac{1}{4}\sum_{u=0}^{3} F(u)\, e^{j2\pi u(0)} = \frac{1}{4}\sum_{u=0}^{3} F(u) = \frac{1}{4}[11 - 3 + 2j - 1 - 3 - 2j] = \frac{1}{4}[4] = 1$$

which agrees with Fig. 4.12(b). The other values of f ( x) are obtained in a similar manner.
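The same numbers fall out of any DFT routine that follows the conventions of Eqs. (4-44) and (4-45); for instance, using NumPy's FFT:

import numpy as np

# Quick check of Example 4.4; np.fft.fft follows the convention of Eq. (4-44)
# and np.fft.ifft includes the 1/M factor of Eq. (4-45).
f = np.array([1.0, 2.0, 4.0, 4.0])
F = np.fft.fft(f)
print(F)                          # [11.+0.j  -3.+2.j  -1.+0.j  -3.-2.j]
print(np.real(np.fft.ifft(F)))    # [1. 2. 4. 4.] -- the original samples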

a b
FIGURE 4.12 (a) A continuous function sampled ΔT units apart. (b) Samples in the x-domain. Variable t is continuous, while x is discrete.


4.5 EXTENSIONS TO FUNCTIONS OF TWO VARIABLES

In the following discussion we extend to two variables the concepts introduced in the previous sections of this chapter.

THE 2-D IMPULSE AND ITS SIFTING PROPERTY

The impulse, δ(t, z), of two continuous variables, t and z, is defined as before:

$$\delta(t, z) = \begin{cases} \infty & \text{if } t = z = 0 \\ 0 & \text{otherwise} \end{cases} \tag{4-52}$$

and

$$\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \delta(t, z)\, dt\, dz = 1 \tag{4-53}$$

As in the 1-D case, the 2-D impulse exhibits the sifting property under integration,

$$\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(t, z)\, \delta(t, z)\, dt\, dz = f(0, 0) \tag{4-54}$$

or, more generally, for an impulse located at (t₀, z₀),

$$\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(t, z)\, \delta(t - t_0, z - z_0)\, dt\, dz = f(t_0, z_0) \tag{4-55}$$

As before, we see that the sifting property yields the value of the function at the
location of the impulse.
For discrete variables x and y, the 2-D discrete unit impulse is defined as

$$\delta(x, y) = \begin{cases} 1 & \text{if } x = y = 0 \\ 0 & \text{otherwise} \end{cases} \tag{4-56}$$

and its sifting property is

$$\sum_{x=-\infty}^{\infty} \sum_{y=-\infty}^{\infty} f(x, y)\, \delta(x, y) = f(0, 0) \tag{4-57}$$

where f(x, y) is a function of discrete variables x and y. For an impulse located at coordinates (x₀, y₀) (see Fig. 4.13), the sifting property is

$$\sum_{x=-\infty}^{\infty} \sum_{y=-\infty}^{\infty} f(x, y)\, \delta(x - x_0, y - y_0) = f(x_0, y_0) \tag{4-58}$$

When working with an image of finite dimensions, the limits in the two preceding
equations are replaced by the dimensions of the image.


FIGURE 4.13 2-D unit discrete impulse. Variables x and y are discrete, and δ is zero everywhere except at coordinates (x₀, y₀), where its value is 1.

THE 2-D CONTINUOUS FOURIER TRANSFORM PAIR

Let f(t, z) be a continuous function of two continuous variables, t and z. The two-dimensional, continuous Fourier transform pair is given by the expressions

$$F(\mu, \nu) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(t, z)\, e^{-j2\pi(\mu t + \nu z)}\, dt\, dz \tag{4-59}$$

and

$$f(t, z) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} F(\mu, \nu)\, e^{j2\pi(\mu t + \nu z)}\, d\mu\, d\nu \tag{4-60}$$

where μ and ν are the frequency variables. When referring to images, t and z are interpreted to be continuous spatial variables. As in the 1-D case, the domain of the variables μ and ν defines the continuous frequency domain.

EXAMPLE 4.5 : Obtaining the Fourier transform of a 2-D box function.


Figure 4.14(a) shows the 2-D equivalent of the 1-D box function in Example 4.1. Following a procedure similar to the one used in that example gives the result

$$F(\mu, \nu) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(t, z)\, e^{-j2\pi(\mu t + \nu z)}\, dt\, dz = \int_{-T/2}^{T/2} \int_{-Z/2}^{Z/2} A\, e^{-j2\pi(\mu t + \nu z)}\, dt\, dz = ATZ \left[\frac{\sin(\pi\mu T)}{\pi\mu T}\right]\left[\frac{\sin(\pi\nu Z)}{\pi\nu Z}\right]$$

Figure 4.14(b) shows a portion of the spectrum about the origin. As in the 1-D case, the locations of the zeros in the spectrum are inversely proportional to the values of T and Z. In this example, T is larger than Z, so the spectrum is more "contracted" along the μ-axis.
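Because the result is separable, its magnitude is easy to evaluate on a grid. A small sketch (box dimensions are assumed values, with T > Z) reproduces the behavior just described:

import numpy as np

# Numerical sketch of Example 4.5 (assumed A, T, Z): the spectrum of the
# 2-D box is a separable product of sinc terms, so the first zeros fall at
# mu = 1/T and nu = 1/Z. With T > Z the zeros along mu are closer to the
# origin, i.e., the spectrum is more "contracted" along the mu-axis.
# (np.sinc(x) = sin(pi*x)/(pi*x).)
A, T, Z = 1.0, 4.0, 2.0
mu = nu = np.linspace(-2.0, 2.0, 401)
MU, NU = np.meshgrid(mu, nu, indexing="ij")
spectrum = np.abs(A * T * Z * np.sinc(MU * T) * np.sinc(NU * Z))
print(spectrum.max())   # ATZ = 8, attained at the origin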

2-D SAMPLING AND THE 2-D SAMPLING THEOREM


In a manner similar to the 1-D case, sampling in two dimensions can be modeled
using a sampling function (i.e., a 2-D impulse train):


a b
FIGURE 4.14 (a) A 2-D function, and (b) a section of its spectrum. The box is longer along the t-axis, so the spectrum is more contracted along the μ-axis.

 
$$s_{\Delta T \Delta Z}(t, z) = \sum_{m=-\infty}^{\infty} \sum_{n=-\infty}^{\infty} \delta(t - m\Delta T,\, z - n\Delta Z) \tag{4-61}$$

where ΔT and ΔZ are the separations between samples along the t- and z-axis of the continuous function f(t, z). Equation (4-61) describes a set of periodic impulses extending infinitely along the two axes (see Fig. 4.15). As in the 1-D case illustrated in Fig. 4.5, multiplying f(t, z) by s_{ΔTΔZ}(t, z) yields the sampled function.
Function f(t, z) is said to be band-limited if its Fourier transform is 0 outside a rectangle established in the frequency domain by the intervals [−μmax, μmax] and [−νmax, νmax]; that is,

$$F(\mu, \nu) = 0 \quad \text{for } |\mu| \ge \mu_{\max} \text{ and } |\nu| \ge \nu_{\max} \tag{4-62}$$

The two-dimensional sampling theorem states that a continuous, band-limited function f(t, z) can be recovered with no error from a set of its samples if the sampling intervals are

$$\Delta T < \frac{1}{2\mu_{\max}} \tag{4-63}$$

and

$$\Delta Z < \frac{1}{2\nu_{\max}} \tag{4-64}$$

or, expressed in terms of the sampling rate, if

FIGURE 4.15 2-D impulse train.


a b
FIGURE 4.16 Two-dimensional Fourier transforms of (a) an over-sampled, and (b) an under-sampled, band-limited function. The dashed rectangle in (a) shows the footprint of a 2-D ideal lowpass (box) filter.

$$\frac{1}{\Delta T} > 2\mu_{\max} \tag{4-65}$$

and

$$\frac{1}{\Delta Z} > 2\nu_{\max} \tag{4-66}$$
Stated another way, we say that no information is lost if a 2-D, band-limited, continuous function is represented by samples acquired at rates greater than twice the highest frequency content of the function in both the μ- and ν-directions.
Figure 4.16 shows the 2-D equivalents of Figs. 4.6(b) and (d). A 2-D ideal fil-
ter transfer function has the form illustrated in Fig. 4.14(a) (but in the frequency
domain). The dashed portion of Fig. 4.16(a) shows the location of the filter function
to achieve the necessary isolation of a single period of the transform for recon-
struction of a band-limited function from its samples, as in Fig. 4.8. From Fig. 4.10, we know that if the function is under-sampled, the periods overlap, and it becomes
impossible to isolate a single period, as Fig. 4.16(b) shows. Aliasing would result
under such conditions.

ALIASING IN IMAGES
In this section, we extend the concept of aliasing to images, and discuss in detail sev-
eral aspects of aliasing related to image sampling and resampling.

Extensions from 1-D Aliasing


As in the 1-D case, a continuous function f (t , z) of two continuous variables, t and z,
can be band-limited in general only if it extends infinitely in both coordinate direc-
tions. The very act of limiting the spatial duration of the function (e.g., by multiply-
ing it by a box function) introduces corrupting frequency components extending to
infinity in the frequency domain, as explained in Section 4.3 (see also Problem 4.12).
Because we cannot sample a function infinitely, aliasing is always present in digital
images, just as it is present in sampled 1-D functions. There are two principal mani-
festations of aliasing in images: spatial aliasing and temporal aliasing. Spatial aliasing
is caused by under-sampling, as discussed in Section 4.3, and tends to be more visible


(and objectionable) in images with repetitive patterns. Temporal aliasing is related


to time intervals between images of a sequence of dynamic images. One of the most
common examples of temporal aliasing is the “wagon wheel” effect, in which wheels
with spokes in a sequence of images (for example, in a movie) appear to be rotating
backwards. This is caused by the frame rate being too low with respect to the speed
of wheel rotation in the sequence, and is similar to the phenomenon described in Fig. 4.11, in which under-sampling produced a signal that appeared to be of much lower frequency than the original.
Our focus in this chapter is on spatial aliasing. The key concerns with spatial alias-
ing in images are the introduction of artifacts such as jaggedness in line features, spu-
rious highlights, and the appearance of frequency patterns not present in the original
image. Just as we used Fig. 4.9 to explain aliasing in 1-D functions, we can develop
an intuitive grasp of the nature of aliasing in images using some simple graphics. The
sampling grid in the center section of Fig. 4.17 is a 2-D representation of the impulse
train in Fig. 4.15. In the grid, the little white squares correspond to the location of the
impulses (where the image is sampled) and black represents the separation between
samples. Superimposing the sampling grid on an image is analogous to multiplying
the image by an impulse train, so the same sampling concepts we discussed in con-
nection with the impulse train in Fig. 4.15 are applicable here. The focus now is to
analyze graphically the interaction between sampling rate (the separation of the
sampling points in the grid) and the frequency of the 2-D signals being sampled.
Figure 4.17 shows a sampling grid partially overlapping three 2-D signals (regions
of an image) of low, mid, and high spatial frequencies (relative to the separation
between sampling cells in the grid). Note that the level of spatial “detail” in the
regions is proportional to frequency (i.e., higher-frequency signals contain more
bars). The sections of the regions inside the sampling grid are rough manifestations of how they would appear after sampling. As expected, all three digitized regions

FIGURE 4.17 Various aliasing effects resulting from the interaction between the frequency of 2-D signals and the sampling rate used to digitize them. The regions outside the sampling grid are continuous and free of aliasing. (The three regions shown are of low, mid, and high frequency, respectively.)


exhibit aliasing to some degree, but the effects are dramatically different, worsening
as the discrepancy between detail (frequency) and sampling rate increases. The low-
frequency region is rendered reasonably well, with some mild jaggedness around
the edges. The jaggedness increases as the frequency of the region increases to the
mid-range because the sampling rate is the same. This edge distortion (appropriately
called jaggies) is common in images with strong line and/or edge content.
The digitized high-frequency region in the top right of Fig. 4.17 exhibits totally
different and somewhat surprising behavior. Additional stripes (of lower frequen-
cy) appear in the digitized section, and these stripes are rotated significantly with
respect to the direction of the stripes in the continuous region. These stripes are an
alias of a totally different signal. As the following example shows, this type of behav-
ior can result in images that appear “normal” and yet bear no relation to the original.

EXAMPLE 4.6 : Aliasing in images.


Consider an imaging system that is perfect, in the sense that it is noiseless and produces an exact digi-
tal image of what it sees, but the number of samples it can take is fixed at 96 × 96 pixels. For simplicity,
assume that pixels are little squares of unit width and length. We want to use this system to digitize
checkerboard images of alternating black and white squares. Checkerboard images can be interpreted
as periodic, extending infinitely in both dimensions, where one period is equal to adjacent black/white
pairs. If we specify “valid” digitized images as being those extracted from an infinite sequence in such
a way that the image contains an integer multiple of periods, then, based on our earlier comments, we
know that properly sampled periodic images will be free of aliasing. In the present example, this means
that the sizes of the squares must be such that dividing 96 by the size yields an even number. This will
give an integer number of periods (pairs of black/white squares). The smallest size of squares under the
stated conditions is 1 pixel.
The principal objective of this example is to examine what happens when checkerboard images with
squares of sizes less than 1 pixel on the side are presented to the system. This will correspond to the
undersampled case discussed earlier, which will result in aliasing. A horizontal or vertical scan line of the
checkerboard images results in a 1-D square wave, so we can focus the analysis on 1-D signals.
To understand the capabilities of our imaging system in terms of sampling, recall from the discussion
of the 1-D sampling theorem that, given the sampling rate, the maximum frequency allowed before
aliasing occurs in the sampled signal has to be less than one-half the sampling rate. Our sampling rate is fixed at one sample per unit of the independent variable (the units are pixels). Therefore, the maximum frequency our signal can have in order to avoid aliasing is 1/2 cycle/pixel.
We can arrive at the same conclusion by noting that the most demanding image our system can
handle is when the squares are 1 unit (pixel) wide, in which case the period (cycle) is two pixels. The
frequency is the reciprocal of the period, or 1/2 cycle/pixel, as in the previous paragraph.
Figures 4.18(a) and (b) show the result of sampling checkerboard images whose squares are of sizes
16 × 16 and 6 × 6 pixels, respectively. The frequencies of scan lines in either direction of these two images are 1/32 and 1/12 cycles/pixel. These are well below the 1/2 cycle/pixel maximum allowed for our system. Because, as mentioned earlier, the images are perfectly registered in the field of view of the system, the results are free of aliasing, as expected.
When the size of the squares is reduced to slightly less than one pixel, a severely aliased image results,
as Fig. 4.18(c) shows (the squares used were approximately of size 0.95 × 0.95 pixels). Finally, reducing


a b
c d
FIGURE 4.18 Aliasing. In (a) and (b) the squares are of sizes 16 and 6 pixels on the side. In (c) and (d) the squares are of sizes 0.95 and 0.48 pixels, respectively. Each small square in (c) is one pixel. Both (c) and (d) are aliased. Note how (d) masquerades as a "normal" image.

the size of the squares to slightly less than 0.5 pixels on the side yielded the image in Fig. 4.18(d). In
this case, the aliased result looks like a normal checkerboard pattern. In fact, this image would result
from sampling a checkerboard image whose squares are 12 pixels on the side. This last image is a good
reminder that aliasing can create results that may be visually quite misleading.
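The aliasing mechanism in this example is easy to reproduce numerically. The following minimal NumPy sketch (our own illustration; the function name is not from the book) samples an ideal checkerboard at one sample per pixel. With squares 0.95 pixels on the side the result is severely aliased, and with squares 0.48 pixels on the side the samples masquerade as a checkerboard with 12-pixel squares, as discussed above:

    import numpy as np

    def sample_checkerboard(n=96, side=16.0):
        # Sample an ideal (continuous) checkerboard whose squares are `side`
        # pixels wide, taking one sample per pixel (unit spacing).
        cells = np.floor(np.arange(n) / side)             # square index per sample
        return ((cells[:, None] + cells[None, :]) % 2).astype(np.uint8)

    ok      = sample_checkerboard(96, 16.0)   # 1/32 cycle/pixel: properly sampled
    aliased = sample_checkerboard(96, 0.95)   # above 1/2 cycle/pixel: aliased
    fake    = sample_checkerboard(96, 0.48)   # aliases to roughly 12-pixel squares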

The effects of aliasing can be reduced by slightly defocusing the image to be digi-
tized so that high frequencies are attenuated. As explained in Section 4.3, anti-alias-
ing filtering has to be done at the “front-end,” before the image is sampled. There
are no such things as after-the-fact software anti-aliasing filters that can be used to
reduce the effects of aliasing caused by violations of the sampling theorem. Most
commercial digital image manipulation packages do have a feature called “anti-
aliasing.” However, as illustrated in Example 4.8 below, this term is related to blur-
ring a digital image to reduce additional aliasing artifacts caused by resampling. The
term does not apply to reducing aliasing in the original sampled image. A significant
number of commercial digital cameras have true anti-aliasing filtering built in, either
in the lens or on the surface of the sensor itself. Even nature uses this approach to
reduce the effects of aliasing in the human eye, as the following example shows.

EXAMPLE 4.7 : Nature obeys the limits of the sampling theorem.


When discussing Figs. 2.1 and 2.2, we mentioned that cones are the sensors responsible for sharp vision.
Cones are concentrated in the fovea, in line with the visual axis of the lens, and their concentration is
measured in degrees off that axis. A standard test of visual acuity (the ability to resolve fine detail) in
humans is to place a pattern of alternating black and white stripes in one degree of the visual field. If the
total number of stripes exceeds 120 (i.e., a frequency of 60 cycles/degree), experimental evidence shows
that the observer will perceive the image as a single gray mass. That is, the lens in the eye automatically
lowpass filters spatial frequencies higher than 60 cycles/degree. Sampling in the eye is done by the cones,
so, based on the sampling theorem, we would expect the eye to have on the order of 120 cones/degree
in order to avoid the effects of aliasing. As it turns out, that is exactly what we have!


Image Resampling and Interpolation


As in the 1-D case, perfect reconstruction of a band-limited image function from a set
of its samples requires 2-D convolution in the spatial domain with a sinc function. As
explained in Section 4.3, this theoretically perfect reconstruction requires interpola-
tion using infinite summations which, in practice, forces us to look for approximate
interpolation methods. One of the most common applications of 2-D interpolation
in image processing is in image resizing (zooming and shrinking). Zooming may
be viewed as over-sampling, while shrinking may be viewed as under-sampling. The
key difference between these two operations and the sampling concepts discussed
in previous sections is that we are applying zooming and shrinking to digital images.
We introduced interpolation in Section 2.4. Our interest there was to illustrate the
performance of nearest neighbor, bilinear, and bicubic interpolation. In this section,
the focus is on sampling and anti-aliasing issues. Aliasing generally is introduced
when an image is scaled, either by zooming or by shrinking. For example, a special
case of nearest neighbor interpolation is zooming by pixel replication, which we use
to increase the size of an image an integer number of times. To double the size of
an image, we duplicate each column. This doubles the image size in the horizontal
direction. Then, we duplicate each row of the enlarged image to double the size in
the vertical direction. The same procedure is used to enlarge the image any integer
number of times. The intensity level assignment of each pixel is predetermined by
the fact that new locations are exact duplicates of old locations. In this crude method
of enlargement, one of the principal aliasing effects is the introduction of jaggies
on straight lines that are not horizontal or vertical. The effects of aliasing in image
enlargement often are reduced significantly by using more sophisticated interpola-
tion, as we discussed in Section 2.4. We show in the following example that aliasing
can also be a serious problem in image shrinking.
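As a concrete sketch of the pixel-replication enlargement just described (assuming NumPy; the function name is illustrative), integer zooming amounts to repeating columns and rows:

    import numpy as np

    def zoom_by_replication(img, factor=2):
        # Enlarge an image an integer number of times by duplicating each
        # column, then each row, exactly as described above.
        out = np.repeat(img, factor, axis=1)    # duplicate columns
        return np.repeat(out, factor, axis=0)   # duplicate rows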

EXAMPLE 4.8 : Illustration of aliasing in resampled natural images.


The effects of aliasing generally are worsened when the size of a digital image is reduced. Figure 4.19(a)
is an image containing regions purposely selected to illustrate the effects of aliasing (note the thinly
spaced parallel lines in all garments worn by the subject). There are no objectionable aliasing artifacts
in Fig. 4.19(a), indicating that the sampling rate used initially was sufficient to mitigate visible aliasing.
In Fig. 4.19(b), the image was reduced to 33% of its original size using row/column deletion. The
effects of aliasing are quite visible in this image (see, for example, the areas around the scarf and the
subject's knees). Images (a) and (b) are shown at the same size because the reduced image was brought
back to its original size by pixel replication (the replication did not alter appreciably the effects of
aliasing just discussed).
The digital “equivalent” of the defocusing of continuous images mentioned earlier for reducing alias-
ing, is to attenuate the high frequencies of a digital image by smoothing it with a lowpass filter before
resampling. Figure 4.19(c) was processed in the same manner as Fig. 4.19(b), but the original image was
smoothed using a 5 × 5 spatial averaging filter (see Section 3.5) before reducing its size. The improve-
ment over Fig. 4.19(b) is evident. The image is slightly more blurred than (a) and (b), but aliasing is no
longer objectionable.
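A minimal sketch of this smooth-then-resample idea, assuming NumPy (the helper names are ours, and a simple separable box filter stands in for the 5 × 5 averaging filter of Section 3.5):

    import numpy as np

    def box_blur(img, k=5):
        # Separable k x k averaging (lowpass) filter: one 1-D pass per axis.
        kernel = np.ones(k) / k
        smooth = lambda v: np.convolve(v, kernel, mode='same')
        tmp = np.apply_along_axis(smooth, 1, img.astype(float))
        return np.apply_along_axis(smooth, 0, tmp)

    def shrink(img, step=3, antialias=True):
        # Reduce the image to roughly 1/step of its size by row/column
        # deletion, optionally attenuating high frequencies first.
        src = box_blur(img) if antialias else img
        return src[::step, ::step]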


FIGURE 4.19 Illustration of aliasing on resampled natural images. (a) A digital image of size 772 × 548 pixels with visually negligible aliasing. (b) Result of resizing the image to 33% of its original size by pixel deletion and then restoring it to its original size by pixel replication. Aliasing is clearly visible. (c) Result of blurring the image in (a) with an averaging filter prior to resizing. The image is slightly more blurred than (b), but aliasing is no longer objectionable. (Original image courtesy of the Signal Compression Laboratory, University of California, Santa Barbara.)

Aliasing and Moiré Patterns

[Margin note: The term moiré is a French word (not the name of a person) that appears to have originated with weavers, who first noticed what appeared to be interference patterns visible on some fabrics. The root of the word is mohair, a cloth made from Angora goat hairs.]

In optics, a moiré pattern is a secondary, visual phenomenon produced, for example, by superimposing two gratings of approximately equal spacing. These patterns are common, everyday occurrences. For instance, we see them in overlapping insect window screens and in the interference between TV raster lines and striped or highly textured materials in the background, or worn by individuals. In digital image processing, moiré-like patterns arise routinely when sampling media print, such as newspapers and magazines, or in images with periodic components whose spacing is comparable to the spacing between samples. It is important to note that moiré patterns are more general than sampling artifacts. For instance, Fig. 4.20 shows the moiré effect using vector drawings that have not been digitized. Separately, the patterns are clean and void of interference. However, the simple act of superimposing one pattern on the other creates a pattern with frequencies not present in either of the original patterns. Note in particular the moiré effect produced by two patterns of dots, as this is the effect of interest in the following discussion.
of dots, as this is the effect of interest in the following discussion.

EXAMPLE 4.9 : Sampling printed media.


Newspapers and other printed materials use so-called halftone dots, which are black dots or ellipses
whose sizes and various grouping schemes are used to simulate gray tones. As a rule, the following num-
bers are typical: newspapers are printed using 75 halftone dots per inch (dpi), magazines use 133 dpi, and


FIGURE 4.20 Examples of the moiré effect. These are vector drawings, not digitized patterns. Superimposing one pattern on the other is analogous to multiplying the patterns.

high-quality brochures use 175 dpi. Figure 4.21 shows what happens when a newspaper image is (under)
sampled at 75 dpi. The sampling lattice (which is oriented vertically and horizontally) and dot patterns
on the newspaper image (oriented at ± 45° ) interact to create a uniform moiré-like pattern that makes
the image look blotchy. (We will discuss a technique in Section 4.10 for reducing the effects of moiré
patterns in under-sampled print media.)

FIGURE 4.21 A newspaper image digitized at 75 dpi. Note the moiré-like pattern resulting from the interaction between the ±45° orientation of the half-tone dots and the north-south orientation of the sampling elements used to digitize the image.


THE 2-D DISCRETE FOURIER TRANSFORM AND ITS INVERSE


A development similar to the material in Sections 4.3 and 4.4 would yield the following 2-D discrete Fourier transform (DFT):

F(u, v) = \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} f(x, y)\, e^{-j 2\pi (ux/M + vy/N)}    (4-67)

where f(x, y) is a digital image of size M × N. As in the 1-D case, Eq. (4-67) must be evaluated for values of the discrete variables u and v in the ranges u = 0, 1, 2, …, M − 1 and v = 0, 1, 2, …, N − 1.†
Given the transform F(u, v), we can obtain f(x, y) by using the inverse discrete Fourier transform (IDFT):

f(x, y) = \frac{1}{MN} \sum_{u=0}^{M-1} \sum_{v=0}^{N-1} F(u, v)\, e^{j 2\pi (ux/M + vy/N)}    (4-68)

for x = 0, 1, 2, …, M − 1 and y = 0, 1, 2, …, N − 1.

[Margin note: Sometimes you will find in the literature the 1/MN constant in front of the DFT instead of the IDFT. At times, the square root of this constant is included in front of both the forward and inverse transforms, thus creating a more symmetrical pair. Any of these formulations is correct, provided they are used consistently.]

As in the 1-D case [Eqs. (4-44) and (4-45)], Eqs. (4-67) and (4-68) constitute a 2-D discrete Fourier transform pair, f(x, y) ⇔ F(u, v). (The proof is a straightforward extension of the 1-D case in Problem 4.15.) The rest of this chapter is based on properties of these two equations and their use for image filtering in the frequency domain. The comments made in connection with Eqs. (4-44) and (4-45) are applicable to Eqs. (4-67) and (4-68); that is, knowing that f(x, y) and F(u, v) are a Fourier transform pair can be quite useful in proving relationships between functions and their transforms.
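In NumPy (a sketch of ours, not part of the text), Eqs. (4-67) and (4-68) correspond to fft2 and ifft2, with the 1/MN constant placed on the inverse:

    import numpy as np

    rng = np.random.default_rng(7)
    f = rng.random((4, 6))              # a small M x N test "image"
    F = np.fft.fft2(f)                  # forward DFT, as in Eq. (4-67)
    f_back = np.fft.ifft2(F)            # inverse DFT; the 1/MN factor is on
                                        # the inverse, as in Eq. (4-68)
    assert np.allclose(f, f_back.real)  # the pair recovers f(x, y) exactly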

4.6 SOME PROPERTIES OF THE 2-D DFT AND IDFT

In this section, we introduce several properties of the 2-D discrete Fourier transform and its inverse.

RELATIONSHIPS BETWEEN SPATIAL AND FREQUENCY INTERVALS


The relationships between spatial sampling and the corresponding frequency
domain intervals are as explained in Section 4.4. Suppose that a continuous func-
tion f (t , z) is sampled to form a digital image, f ( x, y), consisting of M × N samples
taken in the t- and z-directions, respectively. Let T and Z denote the separations
between samples (see Fig. 4.15). Then, the separations between the corresponding
discrete, frequency domain variables are given by

\Delta u = \frac{1}{MT}    (4-69)


† As mentioned in Section 4.4, keep in mind that in this chapter we use (t, z) and (μ, ν) to denote 2-D continuous spatial and frequency-domain variables. In the 2-D discrete case, we use (x, y) for spatial variables and (u, v) for frequency-domain variables, all of which are discrete.


and
\Delta v = \frac{1}{NZ}    (4-70)
respectively. Note the important property that the separations between samples in
the frequency domain are inversely proportional both to the spacing between spa-
tial samples and to the number of samples.
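For instance (a NumPy sketch under the definitions above), the frequency samples returned by fftfreq are separated by exactly 1/(MT):

    import numpy as np

    M, T = 256, 0.5                     # M samples taken T units apart
    u = np.fft.fftfreq(M, d=T)          # discrete frequencies, in cycles per unit
    assert np.isclose(u[1] - u[0], 1 / (M * T))   # Eq. (4-69)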

TRANSLATION AND ROTATION


The validity of the following Fourier transform pairs can be demonstrated by direct
substitution into Eqs. (4-67) and (4-68) (see Problem 4.27):

f(x, y)\, e^{j 2\pi (u_0 x/M + v_0 y/N)} ⇔ F(u − u_0, v − v_0)    (4-71)

and

f(x − x_0, y − y_0) ⇔ F(u, v)\, e^{-j 2\pi (u x_0/M + v y_0/N)}    (4-72)

That is, multiplying f(x, y) by the exponential shown shifts the origin of the DFT to (u_0, v_0) and, conversely, multiplying F(u, v) by the negative of that exponential shifts the origin of f(x, y) to (x_0, y_0). As we illustrate in Example 4.13, translation has no effect on the magnitude (spectrum) of F(u, v).

[Margin note: Recall that we use the symbol "⇔" to denote Fourier transform pairs. That is, the term on the right is the transform of the term on the left, and the term on the left is the inverse Fourier transform of the term on the right.]
Using the polar coordinates

x = r cos θ,  y = r sin θ,  u = ω cos φ,  v = ω sin φ

results in the following transform pair:

f(r, θ + θ_0) ⇔ F(ω, φ + θ_0)    (4-73)

which indicates that rotating f(x, y) by an angle θ_0 rotates F(u, v) by the same angle. Conversely, rotating F(u, v) rotates f(x, y) by the same angle.
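Both properties are easy to verify numerically. The short NumPy sketch below (our own check, not from the book) confirms that a circular translation, which is what Eq. (4-72) implies under the DFT's periodicity, leaves the spectrum unchanged:

    import numpy as np

    f = np.random.default_rng(0).random((64, 64))
    g = np.roll(f, shift=(10, 20), axis=(0, 1))    # circular translation
    assert np.allclose(np.abs(np.fft.fft2(f)),     # spectra are identical;
                       np.abs(np.fft.fft2(g)))     # only the phase changes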

PERIODICITY
As in the 1-D case, the 2-D Fourier transform and its inverse are infinitely periodic
in the u and v directions; that is,

F (u, v) = F (u + k1 M , v) = F (u, v + k2 N ) = F (u + k1 M , v + k2 N ) (4-74)


and
f ( x, y) = f ( x + k1 M , y) = f ( x, y + k2 N ) = f ( x + k1 M , y + k2 N ) (4-75)

where k1 and k2 are integers.


The periodicities of the transform and its inverse are important issues in the
implementation of DFT-based algorithms. Consider the 1-D spectrum in Fig. 4.22(a).
As explained in Section 4.4 [see the footnote to Eq. (4-42)], the transform data in the
interval from 0 to M − 1 consists of two half periods meeting at point M/2, but with


FIGURE 4.22 Centering the Fourier transform. (a) A 1-D DFT showing an infinite number of periods; two adjacent half periods meet at u = M/2, and one period consists of M samples. (b) Shifted DFT obtained by multiplying f(x) by (−1)^x before computing F(u); the two adjacent half periods now meet at the center of the interval [0, M − 1]. (c) A 2-D DFT showing an infinite number of periods. The M × N data array F(u, v) obtained with Eq. (4-67) with an image f(x, y) as the input consists of four quarter periods, which meet at (M/2, N/2). (d) Shifted M × N array obtained by multiplying f(x, y) by (−1)^{x+y} before computing F(u, v). The data now contain one complete, centered period, as in (b).

the lower part of the period appearing at higher frequencies. For display and filter-
ing purposes, it is more convenient to have in this interval a complete period of the
transform in which the data are contiguous and ordered properly, as in Fig. 4.22(b).
It follows from Eq. (4-71) that

f(x)\, e^{j 2\pi (u_0 x/M)} ⇔ F(u − u_0)

In other words, multiplying f(x) by the exponential term shown shifts the transform
data so that the origin, F(0), is moved to u_0. If we let u_0 = M/2, the exponential
term becomes e^{j\pi x}, which is equal to (−1)^x because x is an integer. In this case,


f(x)(−1)^x ⇔ F(u − M/2)

That is, multiplying f(x) by (−1)^x shifts the data so that F(u) is centered on the
interval [0, M − 1], which corresponds to Fig. 4.22(b), as desired.
In 2-D the situation is more difficult to graph, but the principle is the same, as
Fig. 4.22(c) shows. Instead of two half periods, there are now four quarter periods
meeting at the point (M/2, N/2). As in the 1-D case, we want to shift the data so
that F(0, 0) is at (M/2, N/2). Letting (u_0, v_0) = (M/2, N/2) in Eq. (4-71) results in
the expression

f(x, y)(−1)^{x+y} ⇔ F(u − M/2, v − N/2)    (4-76)

Using this equation shifts the data so that F(0, 0) is moved to the center of
the frequency rectangle (i.e., the rectangle defined by the intervals [0, M − 1] and
[0, N − 1] in the frequency domain). Figure 4.22(d) shows the result.
Keep in mind that in all our discussions, coordinate values in both the spatial and
frequency domains are integers. As we explained in Section 2.4 (see Fig. 2.19), if, as
in our case, the origin of an M × N image or transform is at (0, 0), then the center of
that image or transform is at (floor(M/2), floor(N/2)). This expression is applicable
to both even and odd values of M and N. For example, the center of an array of size
20 × 15 is at point (10, 7). Because we start counting from 0, these are the 11th and
8th points in the first and second coordinate axes of the array, respectively.
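The following NumPy sketch (an assumed illustration) shows that, for even M and N, the multiplication by (−1)^{x+y} of Eq. (4-76) is exactly equivalent to NumPy's fftshift:

    import numpy as np

    M, N = 64, 64                                    # even sizes, so M/2 and N/2
    f = np.random.default_rng(1).random((M, N))      # are integers
    x = np.arange(M)[:, None]
    y = np.arange(N)[None, :]
    F_centered = np.fft.fft2(f * (-1.0) ** (x + y))  # Eq. (4-76)
    assert np.allclose(F_centered, np.fft.fftshift(np.fft.fft2(f)))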

SYMMETRY PROPERTIES
An important result from functional analysis is that any real or complex function,
w( x, y), can be expressed as the sum of an even and an odd part, each of which can
be real or complex:

w(x, y) = w_e(x, y) + w_o(x, y)    (4-77)

where the even and odd parts are defined as

w_e(x, y) ≜ \frac{w(x, y) + w(−x, −y)}{2}    (4-78)

and

w_o(x, y) ≜ \frac{w(x, y) − w(−x, −y)}{2}    (4-79)
for all valid values of x and y. Substituting Eqs. (4-78) and (4-79) into Eq. (4-77) gives
the identity w( x, y) ≡ w( x, y), thus proving the validity of the latter equation. It fol-
lows from the preceding definitions that

w_e(x, y) = w_e(−x, −y)    (4-80)

and


w_o(x, y) = −w_o(−x, −y)    (4-81)

Even functions are said to be symmetric and odd functions antisymmetric. Because all indices in the DFT and IDFT are nonnegative integers, when we talk about symmetry (antisymmetry) we are referring to symmetry (antisymmetry) about the center point of a sequence, in which case the definitions of even and odd become:

w_e(x, y) = w_e(M − x, N − y)    (4-82)

and

w_o(x, y) = −w_o(M − x, N − y)    (4-83)

for x = 0, 1, 2, …, M − 1 and y = 0, 1, 2, …, N − 1. As usual, M and N are the number of rows and columns of a 2-D array.

[Margin note: In the context of this discussion, the locations of elements in a sequence are denoted by integers. Therefore, the same observations made a few paragraphs back about the centers of arrays of even and odd sizes are applicable to sequences. But do not confuse the concepts of even/odd numbers and even/odd functions.]
We know from elementary mathematical analysis that the product of two even or two odd functions is even, and that the product of an even and an odd function is odd. In addition, the only way that a discrete function can be odd is if all its samples sum to zero. These properties lead to the important result that

\sum_{x=0}^{M-1} \sum_{y=0}^{N-1} w_e(x, y)\, w_o(x, y) = 0    (4-84)

for any two discrete even and odd functions w_e and w_o. In other words, because the argument of Eq. (4-84) is odd, the result of the summations is 0. The functions can be real or complex.

[Margin note: To convince yourself that the samples of an odd function sum to zero, sketch one period of a 1-D sine wave about the origin or any other interval spanning one period.]
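These definitions are straightforward to implement. In the NumPy sketch below (ours, using circular indexing so that w(−x, −y) means w(M − x, N − y)), the decomposition of Eqs. (4-77) through (4-79) and the result of Eq. (4-84) check out numerically:

    import numpy as np

    def reflect(w):
        # w(-x, -y) with circular (mod M, mod N) indexing, i.e., w(M-x, N-y);
        # element (0, 0) maps to itself.
        return np.roll(np.flip(w), shift=(1, 1), axis=(0, 1))

    w = np.random.default_rng(2).random((8, 8))
    we = (w + reflect(w)) / 2                 # even part, Eq. (4-78)
    wo = (w - reflect(w)) / 2                 # odd part,  Eq. (4-79)
    assert np.allclose(w, we + wo)            # Eq. (4-77)
    assert np.isclose(np.sum(we * wo), 0.0)   # Eq. (4-84): the product is odd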

EXAMPLE 4.10 : Even and odd functions.


Although evenness and oddness are visualized easily for continuous functions, these concepts are not as
intuitive when dealing with discrete sequences. The following illustrations will help clarify the preceding
ideas. Consider the 1-D sequence

f = { f (0), f (1), f (2), f (3)} = {2, 1, 1, 1}

in which M = 4. To test for evenness, the condition f ( x) = f (4 − x) must be satisfied for x = 0, 1, 2, 3.


That is, we require that

f (0) = f (4), f (1) = f (3), f (2) = f (2), f (3) = f (1)

Because f (4) is outside the range being examined and can be any value, the value of f (0) is immaterial
in the test for evenness. We see that the next three conditions are satisfied by the values in the array, so
the sequence is even. In fact, we conclude that any 4-point even sequence has to have the form
{a, b, c, b}
That is, only the second and last points must be equal in a 4-point even sequence. In general, when M
is an even number, a 1-D even sequence has the property that the points at locations 0 and M 2 have


arbitrary values. When M is odd, the first point of an even sequence is still arbitrary, but the others form
pairs with equal values.
Odd sequences have the interesting property that their first term, w_o(0), is always 0, a fact that fol-
lows directly from Eq. (4-79). Consider the 1-D sequence

g = { g(0), g(1), g(2), g(3)} = {0, − 1, 0, 1}

We can confirm that this is an odd sequence by noting that the terms in the sequence satisfy the condi-
tion g( x) = − g(4 − x) for x = 1, 2, 3. All we have to do for x = 0 is to check that g(0) = 0. We check the
other terms using the definition. For example, g(1) = − g(3). Any 4-point odd sequence has the form

{0, − b, 0, b}
In general, when M is an even number, a 1-D odd sequence has the property that the points at locations
0 and M 2 are always zero. When M is odd, the first term still has to be 0, but the remaining terms form
pairs with equal value but opposite signs.
The preceding discussion indicates that evenness and oddness of sequences depend also on the length
of the sequences. For example, we showed already that the sequence {0, − 1, 0, 1} is odd. However, the
sequence {0, − 1, 0, 1, 0} is neither odd nor even, although the “basic” structure appears to be odd. This
is an important issue in interpreting DFT results. We will show later in this section that the DFTs of even
and odd functions have some very important characteristics. Thus, it often is the case that understanding
when a function is odd or even plays a key role in our ability to interpret image results based on DFTs.
The same basic considerations hold in 2-D. For example, the following 6 × 6 2-D array, whose center
is at location (3, 3) [remember, we start counting at (0, 0)],

0 0 0 0 0 0
0 0 0 0 0 0
0 0 −1 0 1 0
0 0 −2 0 2 0
0 0 −1 0 1 0
0 0 0 0 0 0

is odd, as you can prove using Eq. (4-83). However, adding another row or column of 0’s would give
a result that is neither odd nor even. In general, inserting a 2-D array of even dimensions into a larger
array of zeros, also of even dimensions, preserves the symmetry of the smaller array, provided that the
centers coincide. Similarly, a 2-D array of odd dimensions can be inserted into a larger array of zeros of
odd dimensions without affecting the symmetry. Note that the inner structure of the preceding array is
a Sobel kernel (see Fig. 3.50). We return to this kernel in Example 4.15, where we embed it in a larger
array of zeros for filtering purposes.

[Margin note: Conjugate symmetry is also called hermitian symmetry. The term antihermitian is used sometimes to refer to conjugate antisymmetry.]

Armed with the preceding concepts, we can establish a number of important symmetry properties of the DFT and its inverse. A property used frequently is that the Fourier transform of a real function, f(x, y), is conjugate symmetric:


F * (u, v) = F (− u, −v) (4-85)

We show the validity of this equation as follows:

F^*(u, v) = \left[ \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} f(x, y)\, e^{-j 2\pi (ux/M + vy/N)} \right]^*

          = \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} f^*(x, y)\, e^{j 2\pi (ux/M + vy/N)}

          = \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} f(x, y)\, e^{-j 2\pi ([-u]x/M + [-v]y/N)}

          = F(−u, −v)
where the third step follows from the fact that f ( x, y) is real. A similar approach
can be used to prove that, if f ( x, y) is imaginary, its Fourier transform is conjugate
antisymmetric; that is, F * (− u, −v) = − F (u, v).
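Numerically (a NumPy sketch of ours, using the same circular indexing as before for the negative arguments), conjugate symmetry is easy to confirm for any real input:

    import numpy as np

    f = np.random.default_rng(3).random((8, 10))        # a real function
    F = np.fft.fft2(f)
    F_reflected = np.roll(np.flip(F), (1, 1), (0, 1))   # F(-u, -v), mod M and N
    assert np.allclose(np.conj(F), F_reflected)         # Eq. (4-85)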
Table 4.1 lists symmetries and related properties of the DFT that are useful in
digital image processing. Recall that the double arrows indicate Fourier transform
pairs; that is, for any row in the table, the properties on the right are satisfied by the
Fourier transform of the function having the properties listed on the left, and vice
versa. For example, entry 5 reads: the DFT of a real function f(x, y) in which (x, y) is replaced by (−x, −y) is F*(u, v), which is complex.

TABLE 4.1 Some symmetry properties of the 2-D DFT and its inverse. R(u, v) and I(u, v) are the real and imaginary parts of F(u, v), respectively. Use of the word complex indicates that a function has nonzero real and imaginary parts.

Spatial Domain† ⇔ Frequency Domain†
1)  f(x, y) real ⇔ F*(u, v) = F(−u, −v)
2)  f(x, y) imaginary ⇔ F*(−u, −v) = −F(u, v)
3)  f(x, y) real ⇔ R(u, v) even; I(u, v) odd
4)  f(x, y) imaginary ⇔ R(u, v) odd; I(u, v) even
5)  f(−x, −y) real ⇔ F*(u, v) complex
6)  f(−x, −y) complex ⇔ F(−u, −v) complex
7)  f*(x, y) complex ⇔ F*(−u, −v) complex
8)  f(x, y) real and even ⇔ F(u, v) real and even
9)  f(x, y) real and odd ⇔ F(u, v) imaginary and odd
10) f(x, y) imaginary and even ⇔ F(u, v) imaginary and even
11) f(x, y) imaginary and odd ⇔ F(u, v) real and odd
12) f(x, y) complex and even ⇔ F(u, v) complex and even
13) f(x, y) complex and odd ⇔ F(u, v) complex and odd

† Recall that x, y, u, and v are discrete (integer) variables, with x and u in the range [0, M − 1], and y and v in the range [0, N − 1]. To say that a complex function is even means that its real and imaginary parts are even, and similarly for an odd complex function. As before, "⇔" indicates a Fourier transform pair.


FOURIER SPECTRUM AND PHASE ANGLE


Because the 2-D DFT is complex in general, it can be expressed in polar form:

F(u, v) = R(u, v) + jI(u, v) = |F(u, v)|\, e^{j\phi(u, v)}    (4-86)

where the magnitude

|F(u, v)| = \left[ R^2(u, v) + I^2(u, v) \right]^{1/2}    (4-87)

is called the Fourier (or frequency) spectrum, and

\phi(u, v) = \arctan\!\left[ \frac{I(u, v)}{R(u, v)} \right]    (4-88)

is the phase angle or phase spectrum. Recall from the discussion in Section 4.2 that the arctan must be computed using a four-quadrant arctangent function, such as MATLAB's atan2(Imag, Real) function.

Finally, the power spectrum is defined as

P(u, v) = |F(u, v)|^2 = R^2(u, v) + I^2(u, v)    (4-89)

As before, R and I are the real and imaginary parts of F(u, v), and all computations are carried out for the discrete variables u = 0, 1, 2, …, M − 1 and v = 0, 1, 2, …, N − 1. Therefore, |F(u, v)|, \phi(u, v), and P(u, v) are arrays of size M × N.
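In NumPy (a sketch; np.angle plays the role of the four-quadrant arctangent mentioned above), these arrays are computed as:

    import numpy as np

    F = np.fft.fft2(np.random.default_rng(4).random((8, 8)))
    R, I = F.real, F.imag
    spectrum = np.abs(F)                  # Eq. (4-87)
    phase = np.angle(F)                   # Eq. (4-88); a four-quadrant arctangent
    power = spectrum ** 2                 # Eq. (4-89)
    assert np.allclose(power, R**2 + I**2)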
The Fourier transform of a real function is conjugate symmetric [see Eq. (4-85)], which implies that the spectrum has even symmetry about the origin:

|F(u, v)| = |F(−u, −v)|    (4-90)

The phase angle exhibits odd symmetry about the origin:

\phi(u, v) = −\phi(−u, −v)    (4-91)

It follows from Eq. (4-67) that

F(0, 0) = \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} f(x, y)

which indicates that the zero-frequency term of the DFT is proportional to the average value of f(x, y). That is,

F(0, 0) = MN \left[ \frac{1}{MN} \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} f(x, y) \right] = MN \bar{f}    (4-92)


where \bar{f} (a scalar) denotes the average value of f(x, y). Then,

|F(0, 0)| = MN\, |\bar{f}|    (4-93)

Because the proportionality constant MN usually is large, F(0, 0) typically is the


largest component of the spectrum by a factor that can be several orders of magni-
tude larger than other terms. Because frequency components u and v are zero at the
origin, F(0, 0) sometimes is called the dc component of the transform. This terminol-
ogy is from electrical engineering, where “dc” signifies direct current (i.e., current of
zero frequency).
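This relationship is easy to check (a NumPy sketch of ours):

    import numpy as np

    f = np.random.default_rng(5).random((16, 16))
    F = np.fft.fft2(f)
    # Eq. (4-93): the dc term equals MN times the average value of f(x, y).
    assert np.isclose(abs(F[0, 0]), f.size * f.mean())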

EXAMPLE 4.13 : The spectrum of a rectangle.


Figure 4.23(a) shows an image of a rectangle and Fig. 4.23(b) shows its spectrum, whose values were
scaled to the range [0, 255] and displayed in image form. The origins of both the spatial and frequency
domains are at the top left. This is the right-handed coordinate system convention we defined in Fig. 2.19.
Two things are apparent in Fig. 4.23(b). As expected, the area around the origin of the transform con-
tains the highest values (and thus appears brighter in the image). However, note that the four corners

FIGURE 4.23 (a) Image. (b) Spectrum, showing small, bright areas in the four corners (you have to look carefully to see them). (c) Centered spectrum. (d) Result after a log transformation. The zero crossings of the spectrum are closer in the vertical direction because the rectangle in (a) is longer in that direction. The right-handed coordinate convention used in the book places the origin of the spatial and frequency domains at the top left (see Fig. 2.19).


FIGURE 4.24 (a) The rectangle in Fig. 4.23(a) translated. (b) Corresponding spectrum. (c) Rotated rectangle. (d) Corresponding spectrum. The spectrum of the translated rectangle is identical to the spectrum of the original image in Fig. 4.23(a).

of the spectrum contain similarly high values. The reason is the periodicity property discussed in the
previous section. To center the spectrum, we simply multiply the image in (a) by (−1)^{x+y} before comput-
ing the DFT, as indicated in Eq. (4-76). Figure 4.23(c) shows the result, which clearly is much easier to
visualize (note the symmetry about the center point). Because the dc term dominates the values of the
spectrum, the dynamic range of other intensities in the displayed image is compressed. To bring out
those details, we used the log transformation defined in Eq. (3-4) with c = 1. Figure 4.23(d) shows the
display of log(1 + |F(u, v)|). The increased rendition of detail is evident. Most spectra shown in this and
subsequent chapters are scaled in this manner.
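A minimal NumPy sketch of this display scaling (the function name is ours):

    import numpy as np

    def spectrum_for_display(F):
        # Center the spectrum, compress its dynamic range with log(1 + |F|),
        # and scale the result to [0, 255] for display, as in Fig. 4.23(d).
        s = np.log1p(np.abs(np.fft.fftshift(F)))
        return np.uint8(255 * s / s.max())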
It follows from Eqs. (4-72) and (4-73) that the spectrum is insensitive to image translation (the absolute value of the exponential term is 1), but that it rotates by the same angle as a rotated image. Figure 4.24 illustrates these properties. The spectrum in Fig. 4.24(b) is identical to the spectrum in Fig. 4.23(d).

FIGURE 4.25 Phase angle images of (a) centered, (b) translated, and (c) rotated rectangles.


Clearly, the images in Figs. 4.23(a) and 4.24(a) are different so, if their Fourier spectra are the same,
then, based on Eq. (4-86), their phase angles must be different. Figure 4.25 confirms this. Figures 4.25(a)
and (b) are the phase angle arrays (shown as images) of the DFTs of Figs. 4.23(a) and 4.24(a). Note the
lack of similarity between the phase images, in spite of the fact that the only difference between their
corresponding images is simple translation. In general, visual analysis of phase angle images yields little
intuitive information. For instance, because of its 45° orientation, one would expect intuitively that the
phase angle in Fig. 4.25(a) should correspond to the rotated image in Fig. 4.24(c), rather than to the
image in Fig. 4.23(a). In fact, as Fig. 4.25(c) shows, the phase angle of the rotated image has a strong
orientation that is much less than 45°.

The components of the spectrum of the DFT determine the amplitudes of the
sinusoids that combine to form an image. At any given frequency in the DFT of
an image, a large amplitude implies a greater prominence of a sinusoid of that fre-
quency in the image. Conversely, a small amplitude implies that less of that sinu-
soid is present in the image. Although, as Fig. 4.25 shows, the contribution of the
phase components is less intuitive, it is just as important. The phase is a measure of
displacement of the various sinusoids with respect to their origin. Thus, while the
magnitude of the 2-D DFT is an array whose components determine the intensities
in the image, the corresponding phase is an array of angles that carry much of the
information about where discernible objects are located in the image. The following
example illustrates these ideas in more detail.

EXAMPLE 4.14 : Contributions of the spectrum and phase angle to image formation.
Figure 4.26(b) shows as an image the phase-angle array, \phi(u, v), of the DFT of Fig. 4.26(a), computed using Eq. (4-88). Although there is no detail in this array that would lead us by visual analysis to associate it with the structure of its corresponding image, the information in this array is crucial in determining shape features of the image. To illustrate this, we reconstructed the boy's image using only its phase angle. The reconstruction consisted of computing the inverse DFT of Eq. (4-86) using \phi(u, v), but setting |F(u, v)| = 1. Figure 4.26(c) shows the result (the original result had much less contrast than is shown; to bring out details important in this discussion, we scaled the result using Eqs. (2-31) and (2-32), and then enhanced it using histogram equalization). Even after enhancement, it is evident that much of the intensity information has been lost (remember, that information is carried by the spectrum, which we did not use in the reconstruction). However, the shape features in Fig. 4.26(c) are unmistakably from Fig. 4.26(a). This illustrates vividly the importance of the phase angle in determining shape characteristics in an image.
Figure 4.26(d) was obtained by computing the inverse DFT of Eq. (4-86), but using only the spectrum.
This means setting the exponential term to 1, which in turn implies setting the phase angle to 0. The
result is not unexpected. It contains only intensity information, with the dc term being the most domi-
nant. There is no shape information in the image because the phase was set to zero.
Finally, Figs. 4.26(e) and (f) show yet again the dominance of the phase in determining the spatial
feature content of an image. Figure 4.26(e) was obtained by computing the inverse DFT of Eq. (4-86)
using the spectrum of the rectangle from Fig. 4.23(a) and the phase angle from the boy’s image. The
boy’s features clearly dominate this result. Conversely, the rectangle dominates Fig. 4.26(f), which was
computed using the spectrum of the boy’s image and the phase angle of the rectangle.
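The reconstructions in this example follow directly from Eq. (4-86). A NumPy sketch (our own; the helper names are illustrative, and the two images passed to swap_phase must be the same size):

    import numpy as np

    def phase_only(f):
        # Reconstruct from phase alone: set |F(u, v)| = 1 in Eq. (4-86).
        F = np.fft.fft2(f)
        return np.fft.ifft2(np.exp(1j * np.angle(F))).real

    def magnitude_only(f):
        # Reconstruct from the spectrum alone: set the phase angle to zero.
        return np.fft.ifft2(np.abs(np.fft.fft2(f))).real

    def swap_phase(f1, f2):
        # Combine the spectrum of f1 with the phase of f2 [Figs. 4.26(e)-(f)].
        F1, F2 = np.fft.fft2(f1), np.fft.fft2(f2)
        return np.fft.ifft2(np.abs(F1) * np.exp(1j * np.angle(F2))).real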


FIGURE 4.26 (a) Boy image. (b) Phase angle. (c) Boy image reconstructed using only its phase angle (all shape features are there, but the intensity information is missing because the spectrum was not used in the reconstruction). (d) Boy image reconstructed using only its spectrum. (e) Boy image reconstructed using its phase angle and the spectrum of the rectangle in Fig. 4.23(a). (f) Rectangle image reconstructed using its phase and the spectrum of the boy's image.

THE 2-D DISCRETE CONVOLUTION THEOREM

[Margin note: You will find it helpful to review Eq. (4-48), and the comments made there regarding circular convolution, as opposed to the convolution we studied in Section 3.4.]

Extending Eq. (4-48) to two variables results in the following expression for 2-D circular convolution:

(f ★ h)(x, y) = \sum_{m=0}^{M-1} \sum_{n=0}^{N-1} f(m, n)\, h(x − m, y − n)    (4-94)

for x = 0, 1, 2, …, M − 1 and y = 0, 1, 2, …, N − 1. As in Eq. (4-48), Eq. (4-94) gives one period of a 2-D periodic sequence. The 2-D convolution theorem is given by

(f ★ h)(x, y) ⇔ (F · H)(u, v)    (4-95)

www.EBooksWorld.ir
254 Chapter 4 Filtering in the Frequency Domain

and, conversely,

(f · h)(x, y) ⇔ \frac{1}{MN} (F ★ H)(u, v)    (4-96)

where F and H are the Fourier transforms of f and h, respectively, obtained using Eq. (4-67). As before, the double arrow is used to indicate that the left and right sides of the expressions constitute a Fourier transform pair, and the function products are elementwise products, as defined in Section 2.6. Our interest in the remainder of this chapter is in Eq. (4-95), which states that the Fourier transform of the spatial convolution of f and h is the product of their transforms. Similarly, the inverse DFT of the product (F · H)(u, v) yields (f ★ h)(x, y).
Equation (4-95) is the foundation of linear filtering in the frequency domain and,
as we will explain in Section 4.7, is the basis for all the filtering techniques discussed
in this chapter. As you will recall from Chapter 3, spatial convolution is the foun-
dation for spatial filtering, so Eq. (4-95) is the tie that establishes the equivalence
between spatial and frequency-domain filtering, as we have mentioned several times
before.
Ultimately, we are interested in the results of convolution in the spatial domain, where we analyze images. However, the convolution theorem tells us that we have two ways of computing the spatial convolution of two functions. We can do it directly in the spatial domain with Eq. (3-35), using the approach described in Section 3.4, or, according to Eq. (4-95), we can compute the Fourier transform of each function, multiply the transforms, and compute the inverse Fourier transform. Because we are dealing with discrete quantities, computation of the Fourier transforms is carried out using a DFT algorithm. This automatically implies periodicity, which means that when we take the inverse Fourier transform of the product of the two transforms we would get a circular (i.e., periodic) convolution, one period of which is given by Eq. (4-94). The question is: under what conditions will the direct spatial approach and the inverse Fourier transform method yield the same result? We arrive at the answer by looking at a 1-D example first, and then extending the results to two variables.

[Margin note: We will discuss efficient ways for computing the DFT in Section 4.11.]

The left column of Fig. 4.27 implements convolution of two functions, f and h, using the 1-D equivalent of Eq. (3-35), which, because the two functions are of the same size, is written as

(f ★ h)(x) = \sum_{m=0}^{399} f(m)\, h(x − m)

Recall from our explanation of Figs. 3.29 and 3.30 that the procedure consists of (1)
rotating (flipping) h by 180°, [see Fig. 4.27(c)], (2) translating the resulting function
by an amount x [Fig. 4.27(d)], and (3) for each value x of translation, computing the
entire sum of products in the right side of the equation. In terms of Fig. 4.27, this
means multiplying the function in Fig. 4.27(a) by the function in Fig. 4.27(d) for each
value of x. The displacement x ranges over all values required to completely slide h
across f. Figure 4.27(e) shows the convolution of these two functions. As you know,
convolution is a function of the displacement variable, x, and the range of x required
in this example to completely slide h past f is from 0 to 799.

FIGURE 4.27 Left column: Spatial convolution computed with Eq. (3-35), using the approach discussed in Section 3.4. Right column: Circular convolution. The solid line in (j) is the result we would obtain using the DFT or, equivalently, Eq. (4-48). This erroneous result can be remedied by using zero padding.

If we use the DFT and the convolution theorem to try to obtain the same result
as in the left column of Fig. 4.27, we must take into account the periodicity inher-
ent in the expression for the DFT. This is equivalent to convolving the two periodic
functions in Figs. 4.27(f) and (g) [i.e., as Eqs. (4-46) and (4-47) indicate, the functions
and their transforms have implied periodicity]. The convolution procedure is the
same as we just discussed, but the two functions now are periodic. Proceeding with
these two functions as in the previous paragraph would yield the result in Fig. 4.27(j),
which obviously is incorrect. Because we are convolving two periodic functions, the
convolution itself is periodic. The closeness of the periods in Fig. 4.27 is such that


they interfere with each other to cause what is commonly referred to as wraparound
error. According to the convolution theorem, if we had computed the DFT of the
two 400-point functions, f and h, multiplied the two transforms, and then computed
the inverse DFT, we would have obtained the erroneous 400-point segment of the
periodic convolution shown as a solid line in Fig. 4.27(j) (remember the limits of the
1-D DFT are u = 0, 1, 2, … , M − 1). This is also the result we would obtain if we used
Eq. (4-48) [the 1-D equivalent of Eq. (4-94)] to compute one period of the circular
convolution.
Fortunately, the solution to the wraparound error problem is simple. Consider
two functions, f ( x) and h( x) composed of A and B samples, respectively. It can be
shown (Brigham [1988]) that if we append zeros to both functions so that they have
the same length, denoted by P, then wraparound is avoided by choosing

P≥ A+B−1 (4-97)

In our example, each function has 400 points, so the minimum value we could use is P = 799, which implies that we would append 399 zeros to the trailing edge of each function. This procedure is called zero padding, as we discussed in Section 3.4. As an exercise, you should convince yourself that if the periods of the functions in Figs. 4.27(f) and (g) were lengthened by appending to each period at least 399 zeros, the result would be a periodic convolution in which each period is identical to the correct result in Fig. 4.27(e). Using the DFT via the convolution theorem would result in a 799-point spatial function identical to Fig. 4.27(e). The conclusion, then, is that to obtain the same convolution result between the "straight" representation of the convolution equation approach in Chapter 3, and the DFT approach, functions in the latter must be padded prior to computing their transforms.

[Margin note: The padding zeros could be appended also at the beginning of the functions, or they could be divided between the beginning and end of the functions. It is simpler to append them at the end.]
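The 1-D experiment of Fig. 4.27 can be reproduced in a few lines of NumPy (a sketch of ours; fft(x, n=P) appends the trailing zeros automatically):

    import numpy as np

    A = B = 400
    f = np.ones(A)                      # two 400-point functions, as in Fig. 4.27
    h = np.ones(B)

    # DFT product without padding: one period of the circular convolution,
    # ruined by wraparound error [the solid line in Fig. 4.27(j)].
    wrapped = np.fft.ifft(np.fft.fft(f) * np.fft.fft(h)).real

    # Padding both functions to P >= A + B - 1 removes the wraparound.
    P = A + B - 1                       # Eq. (4-97)
    padded = np.fft.ifft(np.fft.fft(f, n=P) * np.fft.fft(h, n=P)).real
    assert np.allclose(padded, np.convolve(f, h))   # matches direct convolution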
Visualizing a similar example in 2-D is more difficult, but we would arrive at the same conclusion regarding wraparound error and the need for appending zeros to the functions. Let f(x, y) and h(x, y) be two image arrays of sizes A × B and C × D pixels, respectively. Wraparound error in their circular convolution can be avoided by padding these functions with zeros, as follows:

f_p(x, y) = { f(x, y)   for 0 ≤ x ≤ A − 1 and 0 ≤ y ≤ B − 1
            { 0         for A ≤ x ≤ P or B ≤ y ≤ Q          (4-98)

and

h_p(x, y) = { h(x, y)   for 0 ≤ x ≤ C − 1 and 0 ≤ y ≤ D − 1
            { 0         for C ≤ x ≤ P or D ≤ y ≤ Q          (4-99)

[Margin note: We use zero padding here for simplicity. Recall from the discussion of Fig. 3.39 that replicate and mirror padding generally yield better results.]

with

P ≥ A+C −1 (4-100)

and


Q≥B+D−1 (4-101)

The resulting padded images are of size P × Q. If both arrays are of the same size,
M × N , then we require that P ≥ 2 M − 1 and Q ≥ 2 N − 1. As a rule, DFT algorithms
tend to execute faster with arrays of even size, so it is good practice to select P and
Q as the smallest even integers that satisfy the preceding equations. If the two arrays
are of the same size, this means that P and Q are selected as:

P = 2M (4-102)

and

Q = 2N (4-103)

Figure 4.31 in the next section illustrates the effects of wraparound error on images.
The two functions in Figs. 4.27(a) and (b) conveniently become zero before the
end of the sampling interval. If one or both of the functions were not zero at the end
of the interval, then a discontinuity would be created when zeros were appended
to the function to eliminate wraparound error. This is analogous to multiplying a
function by a box, which in the frequency domain would imply convolution of the
original transform with a sinc function (see Example 4.1). This, in turn, would create
so-called frequency leakage, caused by the high-frequency components of the sinc
function. Leakage produces a blocky effect on images. Although leakage can never
be totally eliminated, it can be reduced significantly by multiplying the sampled
function by another function that tapers smoothly to near zero at both ends of the
sampled record. The idea is to dampen the sharp transitions (and thus the high-frequency components) of the box. This approach, called windowing or apodizing, is an important consideration when fidelity in image reconstruction (as in high-definition graphics) is desired.

[Margin note: A simple apodizing function is a triangle, centered on the data record, which tapers to 0 at both ends of the record. This is called a Bartlett window. Other common windows are the Gaussian, the Hamming, and the Hann windows.]
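For reference, a NumPy sketch of the windows named in the margin note (NumPy also ships np.bartlett, np.hamming, and np.hanning):

    import numpy as np

    M = 400
    m = np.arange(M)
    bartlett = 1 - np.abs(2 * m / (M - 1) - 1)            # triangular taper
    hann = 0.5 * (1 - np.cos(2 * np.pi * m / (M - 1)))    # Hann window
    # A 2-D window for an image record is the outer product of 1-D windows:
    W = np.outer(hann, hann)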
SUMMARY OF 2-D DISCRETE FOURIER TRANSFORM PROPERTIES

Table 4.3 summarizes the principal DFT definitions introduced in this chapter. We will discuss the separability property in Section 4.11, where we also show how to obtain the inverse DFT using a forward transform algorithm. Correlation will be discussed in detail in Chapter 12.
Table 4.4 summarizes some important DFT pairs. Although our focus is on dis-
crete functions, the last two entries in the table are Fourier transform pairs that can
be derived only for continuous variables (note the use of continuous variable nota-
tion). We include them here because, with proper interpretation, they are quite use-
ful in digital image processing. The differentiation pair can be used to derive the fre-
quency-domain equivalent of the Laplacian defined in Eq. (3-50) (see Problem 4.52).
The Gaussian pair is discussed in Section 4.7. Tables 4.1, 4.3 and 4.4 provide a sum-
mary of properties useful when working with the DFT. Many of these properties
are key elements in the development of the material in the rest of this chapter, and
some are used in subsequent chapters.


TABLE 4.3 Summary of DFT definitions and corresponding expressions.

1)  Discrete Fourier transform (DFT) of f(x, y):
    F(u, v) = \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} f(x, y)\, e^{-j 2\pi (ux/M + vy/N)}

2)  Inverse discrete Fourier transform (IDFT) of F(u, v):
    f(x, y) = \frac{1}{MN} \sum_{u=0}^{M-1} \sum_{v=0}^{N-1} F(u, v)\, e^{j 2\pi (ux/M + vy/N)}

3)  Spectrum:
    |F(u, v)| = [R^2(u, v) + I^2(u, v)]^{1/2},  where R = Real(F) and I = Imag(F)

4)  Phase angle:
    \phi(u, v) = \tan^{-1}[ I(u, v) / R(u, v) ]

5)  Polar representation:
    F(u, v) = |F(u, v)|\, e^{j\phi(u, v)}

6)  Power spectrum:
    P(u, v) = |F(u, v)|^2

7)  Average value:
    \bar{f} = \frac{1}{MN} \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} f(x, y) = \frac{1}{MN} F(0, 0)

8)  Periodicity (k_1 and k_2 are integers):
    F(u, v) = F(u + k_1 M, v) = F(u, v + k_2 N) = F(u + k_1 M, v + k_2 N)
    f(x, y) = f(x + k_1 M, y) = f(x, y + k_2 N) = f(x + k_1 M, y + k_2 N)

9)  Convolution:
    (f ★ h)(x, y) = \sum_{m=0}^{M-1} \sum_{n=0}^{N-1} f(m, n)\, h(x − m, y − n)

10) Correlation:
    (f ☆ h)(x, y) = \sum_{m=0}^{M-1} \sum_{n=0}^{N-1} f^*(m, n)\, h(x + m, y + n)

11) Separability: The 2-D DFT can be computed by computing 1-D DFT transforms along the rows (columns) of the image, followed by 1-D transforms along the columns (rows) of the result. See Section 4.11.

12) Obtaining the IDFT using a DFT algorithm:
    MN f^*(x, y) = \sum_{u=0}^{M-1} \sum_{v=0}^{N-1} F^*(u, v)\, e^{-j 2\pi (ux/M + vy/N)}
    This equation indicates that inputting F*(u, v) into an algorithm that computes the forward transform (right side of the above equation) yields MN f*(x, y). Taking the complex conjugate and dividing by MN gives the desired inverse. See Section 4.11.


TABLE 4.4 Summary of DFT pairs. The closed-form expressions in 12 and 13 are valid only for continuous variables. They can be used with discrete variables by sampling the continuous expressions.

1)  Symmetry properties: See Table 4.1

2)  Linearity:
    a f_1(x, y) + b f_2(x, y) ⇔ a F_1(u, v) + b F_2(u, v)

3)  Translation (general):
    f(x, y)\, e^{j 2\pi (u_0 x/M + v_0 y/N)} ⇔ F(u − u_0, v − v_0)
    f(x − x_0, y − y_0) ⇔ F(u, v)\, e^{-j 2\pi (u x_0/M + v y_0/N)}

4)  Translation to center of the frequency rectangle, (M/2, N/2):
    f(x, y)(−1)^{x+y} ⇔ F(u − M/2, v − N/2)
    f(x − M/2, y − N/2) ⇔ F(u, v)(−1)^{u+v}

5)  Rotation:
    f(r, θ + θ_0) ⇔ F(ω, φ + θ_0)
    r = \sqrt{x^2 + y^2},  θ = \tan^{-1}(y/x),  ω = \sqrt{u^2 + v^2},  φ = \tan^{-1}(v/u)

6)  Convolution theorem†:
    (f ★ h)(x, y) ⇔ (F · H)(u, v)
    (f · h)(x, y) ⇔ (1/MN)[(F ★ H)(u, v)]

7)  Correlation theorem†:
    (f ☆ h)(x, y) ⇔ (F^* · H)(u, v)
    (f^* · h)(x, y) ⇔ (1/MN)[(F ☆ H)(u, v)]

8)  Discrete unit impulse:
    δ(x, y) ⇔ 1;  1 ⇔ MN δ(u, v)

9)  Rectangle:
    rect[a, b] ⇔ ab \frac{\sin(\pi u a)}{\pi u a} \frac{\sin(\pi v b)}{\pi v b} e^{-j\pi(ua + vb)}

10) Sine:
    \sin(2\pi u_0 x/M + 2\pi v_0 y/N) ⇔ \frac{jMN}{2} [δ(u + u_0, v + v_0) − δ(u − u_0, v − v_0)]

11) Cosine:
    \cos(2\pi u_0 x/M + 2\pi v_0 y/N) ⇔ \frac{MN}{2} [δ(u + u_0, v + v_0) + δ(u − u_0, v − v_0)]

The following Fourier transform pairs are derivable only for continuous variables, denoted as before by t and z for spatial variables and by μ and ν for frequency variables. These results can be used for DFT work by sampling the continuous forms.

12) Differentiation (the expressions on the right assume that f(±∞, ±∞) = 0):
    \left(\frac{\partial}{\partial t}\right)^m \left(\frac{\partial}{\partial z}\right)^n f(t, z) ⇔ (j 2\pi μ)^m (j 2\pi ν)^n F(μ, ν)
    \frac{\partial^m f(t, z)}{\partial t^m} ⇔ (j 2\pi μ)^m F(μ, ν);  \frac{\partial^n f(t, z)}{\partial z^n} ⇔ (j 2\pi ν)^n F(μ, ν)

13) Gaussian:
    A 2\pi σ^2 e^{-2\pi^2 σ^2 (t^2 + z^2)} ⇔ A e^{-(μ^2 + ν^2)/(2σ^2)}  (A is a constant)

† Assumes that f(x, y) and h(x, y) have been properly padded. Convolution is associative, commutative, and distributive. Correlation is distributive (see Table 3.5). The products are elementwise products (see Section 2.6).


4.7 THE BASICS OF FILTERING IN THE FREQUENCY DOMAIN

In this section, we lay the groundwork for all the filtering techniques discussed in the remainder of the chapter.

ADDITIONAL CHARACTERISTICS OF THE FREQUENCY DOMAIN


We begin by observing in Eq. (4-67) that each term of F(u, v) contains all values of
f ( x, y), modified by the values of the exponential terms. Thus, with the exception
of trivial cases, it is usually impossible to make direct associations between specific
components of an image and its transform. However, some general statements can
be made about the relationship between the frequency components of the Fourier
transform and spatial features of an image. For instance, because frequency is direct-
ly related to spatial rates of change, it is not difficult intuitively to associate frequen-
cies in the Fourier transform with patterns of intensity variations in an image. We
showed in Section 4.6 that the slowest varying frequency component (u = v = 0)
is proportional to the average intensity of an image. As we move away from the
origin of the transform, the low frequencies correspond to the slowly varying inten-
sity components of an image. In an image of a room, for example, these might cor-
respond to smooth intensity variations on the walls and floor. As we move further
away from the origin, the higher frequencies begin to correspond to faster and faster
intensity changes in the image. These are the edges of objects and other components
of an image characterized by abrupt changes in intensity.
Filtering techniques in the frequency domain are based on modifying the Fourier
transform to achieve a specific objective, and then computing the inverse DFT to get
us back to the spatial domain, as introduced in Section 2.6. It follows from Eq. (4-87)
that the two components of the transform to which we have access are the transform
magnitude (spectrum) and the phase angle. We learned in Section 4.6 that visual
analysis of the phase component generally is not very useful. The spectrum, however,
provides some useful guidelines as to the gross intensity characteristics of the image
from which the spectrum was generated. For example, consider Fig. 4.28(a), which
is a scanning electron microscope image of an integrated circuit, magnified approxi-
mately 2500 times.
Aside from the interesting construction of the device itself, we note two principal
features in this image: strong edges that run approximately at ± 45°, and two white,
oxide protrusions resulting from thermally induced failure. The Fourier spectrum
in Fig. 4.28(b) shows prominent components along the ± 45° directions that corre-
spond to the edges just mentioned. Looking carefully along the vertical axis in Fig.
4.28(b), we see a vertical component of the transform that is off-axis, slightly to the
left. This component was caused by the edges of the oxide protrusions. Note how the
angle of the frequency component with respect to the vertical axis corresponds to
the inclination (with respect to the horizontal axis of the image) of the long white
element. Note also the zeros in the vertical frequency component, corresponding to
the narrow vertical span of the oxide protrusions.
These are typical of the types of associations we can make in general between
the frequency and spatial domains. As we will show later in this chapter, even these
types of gross associations, coupled with the relationships mentioned previously


FIGURE 4.28 (a) SEM image of a damaged integrated circuit. (b) Fourier spectrum of (a). (Original image courtesy of Dr. J. M. Hudak, Brockhouse Institute for Materials Research, McMaster University, Hamilton, Ontario, Canada.)

between frequency content and rate of change of intensity levels in an image, can
lead to some very useful results. We will show in Section 4.8 the effects of modifying
various frequency ranges in the transform of Fig. 4.28(a).

FREQUENCY DOMAIN FILTERING FUNDAMENTALS


Filtering in the frequency domain consists of modifying the Fourier transform of an image, then computing the inverse transform to obtain the spatial domain representation of the processed result. Thus, given (a padded) digital image, f(x, y), of size P × Q pixels, the basic filtering equation in which we are interested has the form:

g(x, y) = Real\{ \, \mathfrak{I}^{-1}[ H(u, v) F(u, v) ] \, \}    (4-104)

where \mathfrak{I}^{-1} is the IDFT, F(u, v) is the DFT of the input image, f(x, y), H(u, v) is a filter transfer function (which we often call just a filter or filter function), and g(x, y) is the filtered (output) image. Functions F, H, and g are arrays of size P × Q, the same as the padded input image. The product H(u, v)F(u, v) is formed using elementwise multiplication, as defined in Section 2.6. The filter transfer function modifies the transform of the input image to yield the processed output, g(x, y). The task of specifying H(u, v) is simplified considerably by using functions that are symmetric about their center, which requires that F(u, v) be centered also. As explained in Section 4.6, this is accomplished by multiplying the input image by (−1)^{x+y} prior to computing its transform.†

[Margin note: If H is real and symmetric and f is real (as is typically the case), then the IDFT in Eq. (4-104) should yield real quantities in theory. In practice, the inverse often contains parasitic complex terms from roundoff error and other computational inaccuracies. Thus, it is customary to take the real part of the IDFT to form g.]


† Some software implementations of the 2-D DFT (e.g., MATLAB) do not center the transform. This implies that filter functions must be arranged to correspond to the same data format as the uncentered transform (i.e., with the origin at the top left). The net result is that filter transfer functions are more difficult to generate and display. We use centering in our discussions to aid in visualization, which is crucial in developing a clear understanding of filtering concepts. Either method can be used in practice, provided that consistency is maintained.
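Putting the pieces together, here is a minimal NumPy sketch of Eq. (4-104) with the zero padding of Eqs. (4-102) and (4-103) and transform centering (the function names are ours, and the Gaussian transfer function is just one possible choice of H):

    import numpy as np

    def filter_frequency_domain(f, make_H):
        # Minimal sketch of Eq. (4-104). make_H(P, Q) must return a
        # centered P x Q transfer function.
        M, N = f.shape
        P, Q = 2 * M, 2 * N                     # padded sizes, Eqs. (4-102)-(4-103)
        fp = np.zeros((P, Q))
        fp[:M, :N] = f                          # zero padding, Eq. (4-98)
        x = np.arange(P)[:, None]
        y = np.arange(Q)[None, :]
        F = np.fft.fft2(fp * (-1.0) ** (x + y)) # center the transform
        G = make_H(P, Q) * F                    # elementwise product H(u,v)F(u,v)
        g = np.fft.ifft2(G).real * (-1.0) ** (x + y)   # IDFT, real part, uncenter
        return g[:M, :N]                        # crop back to the original size

    def gaussian_lowpass(P, Q, D0=60.0):
        # A centered Gaussian lowpass transfer function (one possible H).
        u = np.arange(P)[:, None] - P // 2
        v = np.arange(Q)[None, :] - Q // 2
        return np.exp(-(u**2 + v**2) / (2 * D0**2))

    # Example: g = filter_frequency_domain(img, lambda P, Q: gaussian_lowpass(P, Q))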


We are now in a position to consider filtering in detail. One of the simplest filter
transfer functions we can construct is a function H(u, v) that is 0 at the center of
the (centered) transform, and 1’s elsewhere. This filter would reject the dc term and
“pass” (i.e., leave unchanged) all other terms of F(u, v) when we form the product
H (u, v)F (u, v). We know from property 7 in Table 4.3 that the dc term is responsible
for the average intensity of an image, so setting it to zero will reduce the average
intensity of the output image to zero. Figure 4.29 shows the result of this operation
using Eq. (4-104). As expected, the image became much darker. An average of zero
implies the existence of negative intensities. Therefore, although it illustrates the
principle, Fig. 4.29 is not a true representation of the original, as all negative intensi-
ties were clipped (set to 0) by the display.
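A sketch of this dc-rejecting transfer function (the helper name dc_reject is ours):

import numpy as np

def dc_reject(P, Q):
    # H(u, v) that is 0 at the center of the centered transform, 1 elsewhere.
    H = np.ones((P, Q))
    H[P // 2, Q // 2] = 0.0   # zero the dc term, F(P/2, Q/2)
    return H

Passing it to the apply_transfer_function sketch above should reproduce the darkening seen in Fig. 4.29, subject to how the display handles the resulting negative intensities.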
As noted earlier, low frequencies in the transform are related to slowly varying
intensity components in an image, such as the walls of a room or a cloudless sky in
an outdoor scene. On the other hand, high frequencies are caused by sharp transi-
tions in intensity, such as edges and noise. Therefore, we would expect that a func-
tion H(u, v) that attenuates high frequencies while passing low frequencies (called a
lowpass filter, as noted before) would blur an image, while a filter with the opposite
property (called a highpass filter) would enhance sharp detail, but cause a reduction
in contrast in the image. Figure 4.30 illustrates these effects. For example, the first
column of this figure shows a lowpass filter transfer function and the corresponding
filtered image. The second column shows similar results for a highpass filter. Note
the similarity between Fig. 4.30(e) and Fig. 4.29. The reason is that the highpass
filter function shown eliminates the dc term, resulting in the same basic effect that
led to Fig. 4.29. As illustrated in the third column, adding a small constant to the
filter does not affect sharpening appreciably, but it does prevent elimination of the
dc term and thus preserves tonality.
Equation (4-104) involves the product of two functions in the frequency domain
which, by the convolution theorem, implies convolution in the spatial domain. We
know from the discussion in Section 4.6 that we can expect wraparound error if
the functions in question are not padded. Figure 4.31 shows what happens when

FIGURE 4.29 Result of filtering the image in Fig. 4.28(a) with a filter transfer function that sets to 0 the dc term, F(P/2, Q/2), in the centered Fourier transform, while leaving all other transform terms unchanged.


FIGURE 4.30 Top row: Frequency domain filter transfer functions of (a) a lowpass filter, (b) a highpass filter, and (c) an offset highpass filter. Bottom row: Corresponding filtered images obtained using Eq. (4-104). The offset in (c) is a = 0.85, and the height of H(u, v) is 1. Compare (f) with Fig. 4.28(a).

FIGURE 4.31 (a) A simple image. (b) Result of blurring with a Gaussian lowpass filter without padding. (c) Result of lowpass filtering with zero padding. Compare the vertical edges in (b) and (c).


we apply Eq. (4-104) without padding. Figure 4.31(a) shows a simple image, and
Fig. 4.31(b) is the result of lowpass filtering the image with a Gaussian lowpass filter
of the form shown in Fig. 4.30(a). As expected, the image is blurred. However, the
blurring is not uniform; the top white edge is blurred, but the sides are not. Pad-
ding the input image with zeros according to Eqs. (4-98) and (4-99) before applying
Eq. (4-104) resulted in the filtered image in Fig. 4.31(c). This result is as expected,
with a uniform dark border resulting from zero padding (see Fig. 3.33 for an expla-
nation of this effect).
Figure 4.32 illustrates the reason for the discrepancy between Figs. 4.31(b) and (c).
The dashed area in Fig. 4.32(a) corresponds to the image in Fig. 4.31(a). The other
copies of the image are due to the implied periodicity of the image (and its trans-
form) implicit when we use the DFT, as explained in Section 4.6. Imagine convolving
the spatial representation of the blurring filter (i.e., the corresponding spatial ker-
nel) with this image. When the kernel is centered on the top of the dashed image, it
will encompass part of the image and also part of the bottom of the periodic image
immediately above it. When a dark and a light region reside under the filter, the
result is a mid-gray, blurred output. However, when the kernel is centered on the top
right side of the image, it will encompass only light areas in the image and its right
region. Because the average of a constant value is that same value, filtering will have
no effect in this area, giving the result in Fig. 4.31(b). Padding the image with 0’s cre-
ates a uniform border around each image of the periodic sequence, as Fig. 4.32(b)
shows. Convolving the blurring kernel with the padded “mosaic” of Fig. 4.32(b) gives
the correct result in Fig. 4.31(c). You can see from this example that failure to pad an
image prior to filtering can lead to unexpected results.
Thus far, the discussion has centered on padding the input image. However,
Eq. (4-104) also involves a filter transfer function that can be specified either in the

FIGURE 4.32 (a) Image periodicity without image padding. (b) Periodicity after padding with 0’s (black). The dashed areas in the center correspond to the image in Fig. 4.31(a). Periodicity is inherent when using the DFT. (The thin white lines in both images are superimposed for clarity; they are not part of the data.)


spatial or in the frequency domain. But padding is done in the spatial domain, which
raises an important question about the relationship between spatial padding and
filter functions specified directly in the frequency domain.
It would be reasonable to conclude that the way to handle padding of a frequency
domain transfer function is to construct the function the same size as the unpad-
ded image, compute the IDFT of the function to obtain the corresponding spatial
representation, pad that representation in the spatial domain, and then compute its
DFT to return to the frequency domain. The 1-D example in Fig. 4.33 illustrates the
pitfalls in this approach.
Figure 4.33(a) shows a 1-D ideal lowpass filter transfer function in the frequency domain. The function is real and has even symmetry, so we know from property 8 in Table 4.1 that its IDFT will be real and symmetric also. Figure 4.33(b) shows the result of multiplying the elements of the transfer function by (−1)^u and computing its IDFT to obtain the corresponding spatial filter kernel. It is evident in this figure that the extremes of this spatial function are not zero. Zero-padding the function would create two discontinuities, as Fig. 4.33(c) shows. (Padding the two ends of a function is the same as padding one end, provided that the total number of zeros is the same.) To return to the frequency domain, we compute the forward DFT of the spatial, padded function. As Fig. 4.33(d) shows, the discontinuities in the padded function caused ringing in its frequency domain counterpart.

FIGURE 4.33 (a) Filter transfer function specified in the (centered) frequency domain. (b) Spatial representation (filter kernel) obtained by computing the IDFT of (a). (c) Result of padding (b) to twice its length (note the discontinuities). (d) Corresponding filter in the frequency domain obtained by computing the DFT of (c). Note the ringing caused by the discontinuities in (c). Part (b) of the figure is below (a), and (d) is below (c).


The preceding results tell us that we cannot pad the spatial representation of a
frequency domain transfer function in order to avoid wraparound error. Our objec-
tive is to work with specified filter shapes in the frequency domain without having to
be concerned with truncation issues. An alternative is to pad images and then create
the desired filter transfer function directly in the frequency domain, this function
being of the same size as the padded images (remember, images and filter transfer
functions must be of the same size when using the DFT). Of course, this will result
in wraparound error because no padding is used for the filter transfer function, but
this error is mitigated significantly by the separation provided by padding the image,
and it is preferable to ringing. Smooth transfer functions (such as those in Fig. 4.30)
present even less of a problem. Specifically, then, the approach we will follow in this
chapter is to pad images to size P × Q and construct filter transfer functions of the
same dimensions directly in the frequency domain. As explained earlier, P and Q
are given by Eqs. (4-100) and (4-101).
We conclude this section by analyzing the phase angle of filtered images. We can
express the DFT in terms of its real and imaginary parts: F (u, v) = R(u, v) + jI (u, v).
Equation (4-104) then becomes

g(x, y) = ℑ⁻¹[H(u, v)R(u, v) + jH(u, v)I(u, v)]    (4-105)

The phase angle is computed as the arctangent of the ratio of the imaginary and the
real parts of a complex number [see Eq. (4-88)]. Because H(u, v) multiplies both
R and I, it will cancel out when this ratio is formed. Filters that affect the real and
imaginary parts equally, and thus have no effect on the phase angle, are appropri-
ately called zero-phase-shift filters. These are the only types of filters considered in
this chapter.
The importance of the phase angle in determining the spatial structure of an
image was vividly illustrated in Fig. 4.26. Thus, it should be no surprise that even
small changes in the phase angle can have dramatic (and usually undesirable) effects
on the filtered output. Figures 4.34(b) and (c) illustrate the effect of changing the phase angle array of the DFT of Fig. 4.34(a) (the magnitude, |F(u, v)|, was not changed in either case). Figure 4.34(b) was obtained by multiplying the phase angle, φ(u, v), in Eq. (4-86) by −1 and computing the IDFT. The net result is a reflection of every pixel in the image about both coordinate axes. Figure 4.34(c) was obtained by multiplying the phase term by 0.25 and computing the IDFT. Even this simple scale change rendered the image almost unrecognizable. These two results illustrate the advantage of using
frequency-domain filters that do not alter the phase angle.
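The experiment of Fig. 4.34 is straightforward to reproduce; a sketch (function name ours) that keeps |F(u, v)| fixed while scaling the phase by a constant c:

import numpy as np

def scale_phase(f, c):
    # c = -1 reflects the image about both axes; c = 0.25 renders it
    # nearly unrecognizable, as in Fig. 4.34(c).
    F = np.fft.fft2(f)
    F_mod = np.abs(F) * np.exp(1j * c * np.angle(F))   # same magnitude, scaled phase
    return np.real(np.fft.ifft2(F_mod))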

SUMMARY OF STEPS FOR FILTERING IN THE FREQUENCY DOMAIN


The process of filtering in the frequency domain can be summarized as follows:

1. Given an input image f(x, y) of size M × N, obtain the padding sizes P and Q using Eqs. (4-102) and (4-103); that is, P = 2M and Q = 2N.


FIGURE 4.34 (a) Original image. (b) Image obtained by multiplying the phase angle array by −1 in Eq. (4-86) and computing the IDFT. (c) Result of multiplying the phase angle by 0.25 and computing the IDFT. The magnitude of the transform, |F(u, v)|, used in (b) and (c) was the same.

2. Form a padded† image f_p(x, y) of size P × Q using zero-, mirror-, or replicate padding (see Fig. 3.39 for a comparison of padding methods).
3. Multiply f_p(x, y) by (−1)^(x+y) to center the Fourier transform on the P × Q frequency rectangle.
4. Compute the DFT, F(u, v), of the image from Step 3.
5. Construct a real, symmetric filter transfer function, H(u, v), of size P × Q with center at (P/2, Q/2).
6. Form the product G(u, v) = H(u, v)F(u, v) using elementwise multiplication (see Section 2.6 for a definition of elementwise operations); that is, G(i, k) = H(i, k)F(i, k) for i = 0, 1, 2, …, P − 1 and k = 0, 1, 2, …, Q − 1.
7. Obtain the filtered image (of size P × Q) by computing the IDFT of G(u, v):

   g_p(x, y) = [real(ℑ⁻¹{G(u, v)})](−1)^(x+y)

8. Obtain the final filtered result, g(x, y), of the same size as the input image, by extracting the M × N region from the top, left quadrant of g_p(x, y).

We will discuss the construction of filter transfer functions (Step 5) in the following
sections of this chapter. In theory, the IDFT in Step 7 should be real because f ( x, y)
is real and H(u, v) is real and symmetric. However, parasitic complex terms in the
IDFT resulting from computational inaccuracies are not uncommon. Taking the real
part of the result takes care of that. Multiplication by (−1)x + y cancels out the multi-
plication by this factor in Step 3.
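The preceding eight steps translate directly into code. A minimal NumPy sketch, assuming zero padding in Step 2 and a user-supplied function make_H (a hypothetical helper) that returns the centered transfer function of Step 5:

import numpy as np

def filter_dft(f, make_H):
    M, N = f.shape
    P, Q = 2 * M, 2 * N                      # Step 1: padding sizes
    fp = np.zeros((P, Q))
    fp[:M, :N] = f                           # Step 2: zero padding
    x = np.arange(P).reshape(-1, 1)
    y = np.arange(Q)
    sign = (-1.0) ** (x + y)
    F = np.fft.fft2(fp * sign)               # Steps 3-4: center, then DFT
    G = make_H(P, Q) * F                     # Steps 5-6: elementwise product
    gp = np.real(np.fft.ifft2(G)) * sign     # Step 7: real part, undo centering
    return gp[:M, :N]                        # Step 8: extract the M x N region

Mirror or replicate padding in Step 2 could be obtained instead with np.pad(f, ((0, M), (0, N)), mode='symmetric') or mode='edge'.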


† Sometimes we omit padding when doing “quick” experiments to get an idea of filter performance, or when
trying to determine quantitative relationships between spatial features and their effect on frequency domain
components, particularly in band and notch filtering, as explained later in Section 4.10 and in Chapter 5.


FIGURE 4.35 (a) An M × N image, f. (b) Padded image, f_p, of size P × Q. (c) Result of multiplying f_p by (−1)^(x+y). (d) Spectrum of F. (e) Centered Gaussian lowpass filter transfer function, H, of size P × Q. (f) Spectrum of the product HF. (g) Image g_p, the real part of the IDFT of HF, multiplied by (−1)^(x+y). (h) Final result, g, obtained by extracting the first M rows and N columns of g_p.

Figure 4.35 illustrates the preceding steps using zero padding. The figure legend
explains the source of each image. If enlarged, Fig. 4.35(c) would show black dots
interleaved in the image because negative intensities, resulting from the multiplica-
tion of f_p by (−1)^(x+y), are clipped at 0 by the display. Note in Fig. 4.35(h) the characteristic dark border of lowpass-filtered images obtained using zero padding.

CORRESPONDENCE BETWEEN FILTERING IN THE SPATIAL AND FREQUENCY DOMAINS
As mentioned several times before, the link between filtering in the spatial and frequency domains is the convolution theorem. Earlier in this section, we defined filtering in the frequency domain as the elementwise product (see Section 2.6) of a filter transfer function, H(u, v), and F(u, v), the Fourier transform of the input image. Given H(u, v), suppose that we want to find its equivalent kernel in the spatial domain. If we let f(x, y) = δ(x, y), it follows from Table 4.4 that F(u, v) = 1. Then, from Eq. (4-104), the filtered output is ℑ⁻¹{H(u, v)}. This expression is the inverse transform of the frequency domain filter transfer function, which is the corresponding kernel in the


spatial domain. Conversely, it follows from a similar analysis and the convolution
theorem that, given a spatial filter kernel, we obtain its frequency domain repre-
sentation by taking the forward Fourier transform of the kernel. Therefore, the two
filters form a Fourier transform pair:

h( x, y) ⇔ H (u, v) (4-106)

where h( x, y) is the spatial kernel. Because this kernel can be obtained from the
response of a frequency domain filter to an impulse, h( x, y) sometimes is referred to
as the impulse response of H(u, v). Also, because all quantities in a discrete imple-
mentation of Eq. (4-106) are finite, such filters are called finite impulse response
(FIR) filters. These are the only types of linear spatial filters considered in this book.
We discussed spatial convolution in Section 3.4, and its implementation in
Eq. (3-35), which involved convolving functions of different sizes. When we use the
DFT to compute the transforms used in the convolution theorem, it is implied that
we are convolving periodic functions of the same size, as explained in Fig. 4.27. For
this reason, as explained earlier, Eq. (4-94) is referred to as circular convolution.
When computational speed, cost, and size are important parameters, spatial con-
volution filtering using Eq. (3-35) is well suited for small kernels using hardware
and/or firmware, as explained in Section 4.1. However, when working with general-
purpose machines, frequency-domain methods in which the DFT is computed using
a fast Fourier transform (FFT) algorithm can be hundreds of times faster than using
spatial convolution, depending on the size of the kernels used, as you saw in Fig. 4.2.
We will discuss the FFT and its computational advantages in Section 4.11.
Filtering concepts are more intuitive in the frequency domain, and filter design
often is easier there. One way to take advantage of the properties of both domains
is to specify a filter in the frequency domain, compute its IDFT, and then use the
properties of the resulting, full-size spatial kernel as a guide for constructing smaller
kernels. This is illustrated next (keep in mind that the Fourier transform and its
inverse are linear processes (see Problem 4.24), so the discussion is limited to linear
filtering). In Example 4.15, we illustrate the converse, in which a spatial kernel is
given, and we obtain its full-size frequency domain representation. This approach is
useful for analyzing the behavior of small spatial kernels in the frequency domain.
Frequency domain filters can be used as guides for specifying the coefficients of
some of the small kernels we discussed in Chapter 3. Filters based on Gaussian func-
tions are of particular interest because, as noted in Table 4.4, both the forward and
inverse Fourier transforms of a Gaussian function are real Gaussian functions. We
limit the discussion to 1-D to illustrate the underlying principles. Two-dimensional
Gaussian transfer functions are discussed later in this chapter.
Let H(u) denote the 1-D frequency domain Gaussian transfer function

H(u) = A e^(−u²/2σ²)    (4-107)

where σ is the standard deviation of the Gaussian curve. (As mentioned in Table 4.4, the forward and inverse Fourier transforms of Gaussians are valid only for continuous variables; to use discrete formulations, we sample the continuous forms.) The kernel in the spatial domain is obtained by taking the inverse DFT of H(u) (see Problem 4.48):

h(x) = √(2π) σ A e^(−2π²σ²x²)    (4-108)


These two equations are important for two reasons: (1) They are a Fourier trans-
form pair, both components of which are Gaussian and real. This facilitates analysis
because we do not have to be concerned with complex numbers. In addition, Gauss-
ian curves are intuitive and easy to manipulate. (2) The functions behave recipro-
cally. When H(u) has a broad profile (large value of σ), h(x) has a narrow profile, and vice versa. In fact, as σ approaches infinity, H(u) tends toward a constant func-
tion and h( x) tends toward an impulse, which implies no filtering in either domain.
Figures 4.36(a) and (b) show plots of a Gaussian lowpass filter transfer function
in the frequency domain and the corresponding function in the spatial domain. Sup-
pose that we want to use the shape of h( x) in Fig. 4.36(b) as a guide for specifying
the coefficients of a small kernel in the spatial domain. The key characteristic of the
function in Fig. 4.36(b) is that all its values are positive. Thus, we conclude that we
can implement lowpass filtering in the spatial domain by using a kernel with all posi-
tive coefficients (as we did in Section 3.5). For reference, Fig. 4.36(b) also shows two
of the kernels discussed in that section. Note the reciprocal relationship between
the width of the Gaussian functions, as discussed in the previous paragraph. The nar-
rower the frequency domain function, the more it will attenuate the low frequencies,
resulting in increased blurring. In the spatial domain, this means that a larger kernel
must be used to increase blurring, as we illustrated in Example 3.11.
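As a sketch of this idea (the sampling and normalization choices here are ours, not the book's), we can sample the shape of Eq. (4-108) at integer values of x and normalize the coefficients so they sum to 1:

import numpy as np

def gaussian_kernel_1d(sigma, size=3):
    x = np.arange(size) - size // 2
    h = np.exp(-2.0 * np.pi**2 * sigma**2 * x**2)   # shape of Eq. (4-108)
    return h / h.sum()   # normalization absorbs the sqrt(2*pi)*sigma*A factor

For example, gaussian_kernel_1d(0.15) gives approximately [0.28, 0.44, 0.28], an all-positive smoothing kernel in the same spirit as the small lowpass kernels of Chapter 3.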
As you know from Section 3.7, we can construct a highpass filter from a lowpass filter by subtracting a lowpass function from a constant. When working with Gaussian functions, we can gain a little more control over filter function shape by using a so-called difference of Gaussians, which involves two lowpass functions. In the frequency domain, this becomes

H(u) = A e^(−u²/2σ1²) − B e^(−u²/2σ2²)    (4-109)

with A ≥ B and σ1 > σ2. The corresponding function in the spatial domain is

FIGURE 4.36 (a) A 1-D Gaussian lowpass transfer function in the frequency domain. (b) Corresponding kernel in the spatial domain. (c) Gaussian highpass transfer function in the frequency domain. (d) Corresponding kernel. The small 2-D kernels shown are kernels we used in Chapter 3.


h(x) = √(2π) σ1 A e^(−2π²σ1²x²) − √(2π) σ2 B e^(−2π²σ2²x²)    (4-110)

Figures 4.36(c) and (d) show plots of these two equations. We note again the reci-
procity in width, but the most important feature here is that h( x) has a positive cen-
ter term with negative terms on either side. The small kernels shown in Fig. 4.36(d),
which we used in Chapter 3 for sharpening, “capture” this property, and thus illus-
trate how knowledge of frequency domain filtering can be used as the basis for
choosing coefficients of spatial kernels.
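A sketch of the 1-D difference-of-Gaussians transfer function of Eq. (4-109); the default parameter values are illustrative assumptions, not values prescribed by the text:

import numpy as np

def dog_transfer(u, A=1.0, B=0.9, s1=4.0, s2=1.0):
    # Eq. (4-109); requires A >= B and s1 > s2.
    return A * np.exp(-u**2 / (2 * s1**2)) - B * np.exp(-u**2 / (2 * s2**2))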
Although we have gone through significant effort to get here, be assured that it is
impossible to truly understand filtering in the frequency domain without the foun-
dation we have just established. In practice, the frequency domain can be viewed as
a “laboratory” in which we take advantage of the correspondence between frequen-
cy content and image appearance. As will be demonstrated numerous times later in
this chapter, some tasks that would be exceptionally difficult to formulate direct-
ly in the spatial domain become almost trivial in the frequency domain. Once we
have selected a specific filter transfer function via experimentation in the frequency
domain, we have the option of implementing the filter directly in that domain using
the FFT, or we can take the IDFT of the transfer function to obtain the equivalent
spatial domain function. As we showed in Fig. 4.36, one approach is to specify a
small spatial kernel that attempts to capture the “essence” of the full filter function
in the spatial domain. A more formal approach is to design a 2-D digital filter by
using approximations based on mathematical or statistical criteria, as we discussed
in Section 3.7.

EXAMPLE 4.15 : Obtaining a frequency domain transfer function from a spatial kernel.
In this example, we start with a spatial kernel and show how to generate its corresponding filter trans-
fer function in the frequency domain. Then, we compare the filtering results obtained using frequency
domain and spatial techniques. This type of analysis is useful when one wishes to compare the perfor-
mance of a given kernel against one or more “full” filter candidates in the frequency domain, or to gain a
deeper understanding about the performance of a kernel in the spatial domain. To keep matters simple,
we use the 3 × 3 vertical Sobel kernel from Fig. 3.50(e). Figure 4.37(a) shows a 600 × 600-pixel image,
f ( x, y), that we wish to filter, and Fig. 4.37(b) shows its spectrum.
Figure 4.38(a) shows the Sobel kernel, h( x, y) (the perspective plot is explained below). Because
the input image is of size 600 × 600 pixels and the kernel is of size 3 × 3, we avoid wraparound error in
the frequency domain by padding f and h with zeros to size 602 × 602 pixels, according to Eqs. (4-100)
and (4-101). At first glance, the Sobel kernel appears to exhibit odd symmetry. However, its first element
is not 0, as required by Eq. (4-81). To convert the kernel to the smallest size that will satisfy Eq. (4-83),
we have to add to it a leading row and column of 0’s, which turns it into an array of size 4 × 4. We can
embed this array into a larger array of zeros and still maintain its odd symmetry if the larger array is of
even dimensions (as is the 4 × 4 kernel) and their centers coincide, as explained in Example 4.10. The
preceding comments are an important aspect of filter generation. If we preserve the odd symmetry with
respect to the padded array in forming hp ( x, y), we know from property 9 in Table 4.1 that H(u, v) will
be purely imaginary. As we show at the end of this example, this will yield results that are identical to
filtering the image spatially using the original kernel h( x, y). If the symmetry were not preserved, the
results would no longer be the same.


FIGURE 4.37 (a) Image of a building, and (b) its Fourier spectrum.

The procedure used to generate H(u, v) is: (1) multiply hp ( x, y) by (−1)x + y to center the frequency
domain filter; (2) compute the forward DFT of the result in (1) to generate H(u, v); (3) set the real
part of H(u, v) to 0 to account for parasitic real parts (we know that H has to be purely imaginary
because hp is real and odd); and (4) multiply the result by (−1)u + v . This last step reverses the multiplica-
tion of H(u, v) by (−1)u + v , which is implicit when h( x, y) was manually placed in the center of hp ( x, y).
Figure 4.38(a) shows a perspective plot of H(u, v), and Fig. 4.38(b) shows H(u, v) as an image. Note
the antisymmetry in this image about its center, a result of H(u, v) being odd. Function H(u, v) is used
as any other frequency domain filter transfer function. Figure 4.38(c) is the result of using the filter
transfer function just obtained to filter the image in Fig. 4.37(a) in the frequency domain, using the step-
by-step filtering procedure outlined earlier. As expected from a derivative filter, edges were enhanced
and all the constant intensity areas were reduced to zero (the grayish tone is due to scaling for display).
Figure 4.38(d) shows the result of filtering the same image in the spatial domain with the Sobel kernel
h( x, y), using the procedure discussed in Section 3.6. The results are identical.
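The four-step procedure might be sketched as follows, assuming hp already contains the kernel embedded in the padded array with its odd symmetry preserved, as discussed above (the function name is ours):

import numpy as np

def transfer_from_padded_kernel(hp):
    P, Q = hp.shape
    x = np.arange(P).reshape(-1, 1)
    y = np.arange(Q)
    sign = (-1.0) ** (x + y)
    H = np.fft.fft2(hp * sign)   # steps (1)-(2): center, then forward DFT
    H = 1j * H.imag              # step (3): remove the parasitic real parts
    return H * sign              # step (4): reverse the implicit (-1)^(u+v)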

4.8 IMAGE SMOOTHING USING LOWPASS FREQUENCY DOMAIN FILTERS
The remainder of this chapter deals with various filtering techniques in the frequency
domain, beginning with lowpass filters. Edges and other sharp intensity transitions
(such as noise) in an image contribute significantly to the high frequency content
of its Fourier transform. Hence, smoothing (blurring) is achieved in the frequency
domain by high-frequency attenuation; that is, by lowpass filtering. In this section,
we consider three types of lowpass filters: ideal, Butterworth, and Gaussian. These
three categories cover the range from very sharp (ideal) to very smooth (Gaussian)
filtering. The shape of a Butterworth filter is controlled by a parameter called the
filter order. For large values of this parameter, the Butterworth filter approaches
the ideal filter. For lower values, the Butterworth filter is more like a Gaussian filter.
Thus, the Butterworth filter provides a transition between two “extremes.” All filter-
ing in this section follows the procedure outlined in the previous section, so all filter
transfer functions, H(u, v), are understood to be of size P × Q; that is, the discrete


FIGURE 4.38 (a) A spatial kernel and perspective plot of its corresponding frequency domain filter transfer function. (b) Transfer function shown as an image. (c) Result of filtering Fig. 4.37(a) in the frequency domain with the transfer function in (b). (d) Result of filtering the same image in the spatial domain with the kernel in (a). The results are identical.

frequency variables are in the range u = 0, 1, 2, …, P − 1 and v = 0, 1, 2, …, Q − 1, where P and Q are the padded sizes given by Eqs. (4-100) and (4-101).

IDEAL LOWPASS FILTERS


A 2-D lowpass filter that passes without attenuation all frequencies within a circle of radius D0 from the origin, and “cuts off” all frequencies outside this circle, is called an ideal lowpass filter (ILPF); it is specified by the transfer function

⎧1 if D(u, v) ≤ D0
H (u, v) = ⎨ (4-111)
⎩0 if D(u, v) > D0

where D0 is a positive constant, and D(u, v) is the distance between a point (u, v) in
the frequency domain and the center of the P × Q frequency rectangle; that is,
D(u, v) = [(u − P/2)² + (v − Q/2)²]^(1/2)    (4-112)

FIGURE 4.39 (a) Perspective plot of an ideal lowpass-filter transfer function. (b) Function displayed as an image. (c) Radial cross section.

where, as before, P and Q are the padded sizes from Eqs. (4-102) and (4-103).
Figure 4.39(a) shows a perspective plot of transfer function H(u, v) and Fig. 4.39(b)
shows it displayed as an image. As mentioned in Section 4.3, the name ideal indicates
that all frequencies on or inside a circle of radius D0 are passed without attenuation,
whereas all frequencies outside the circle are completely attenuated (filtered out).
The ideal lowpass filter transfer function is radially symmetric about the origin. This
means that it is defined completely by a radial cross section, as Fig. 4.39(c) shows. A
2-D representation of the filter is obtained by rotating the cross section 360°.
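Generating this transfer function is direct; a sketch with hypothetical helper names:

import numpy as np

def distance_grid(P, Q):
    # D(u, v) of Eq. (4-112): distance from the center of the frequency rectangle.
    u = np.arange(P).reshape(-1, 1)
    v = np.arange(Q)
    return np.sqrt((u - P / 2) ** 2 + (v - Q / 2) ** 2)

def ilpf(P, Q, D0):
    # Ideal lowpass transfer function, Eq. (4-111).
    return (distance_grid(P, Q) <= D0).astype(float)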
For an ILPF cross section, the point of transition between the values H(u, v) = 1
and H(u, v) = 0 is called the cutoff frequency. In Fig. 4.39, the cutoff frequency is D0 .
The sharp cutoff frequency of an ILPF cannot be realized with electronic compo-
nents, although they certainly can be simulated in a computer (subject to the constraint that the fastest possible transition is limited by the distance between pixels).
The lowpass filters in this chapter are compared by studying their behavior as a function of the same cutoff frequencies. One way to establish standard cutoff frequency loci is to use circles that enclose specified amounts of total image power P_T, which we obtain by summing the components of the power spectrum of the padded images at each point (u, v), for u = 0, 1, 2, …, P − 1 and v = 0, 1, 2, …, Q − 1; that is,

P_T = Σ_{u=0}^{P−1} Σ_{v=0}^{Q−1} P(u, v)    (4-113)

where P(u, v) is given by Eq. (4-89). If the DFT has been centered, a circle of radius D0 with origin at the center of the frequency rectangle encloses α percent of the power, where

α = 100 [ Σ_u Σ_v P(u, v) / P_T ]    (4-114)

and the summation is over values of (u, v) that lie inside the circle or on its boundary.
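A sketch of Eqs. (4-113) and (4-114), reusing the distance_grid helper from the previous sketch (F_centered is assumed to be the centered DFT of the padded image):

import numpy as np

def enclosed_power_percent(F_centered, D0):
    power = np.abs(F_centered) ** 2                     # power spectrum P(u, v)
    inside = distance_grid(*F_centered.shape) <= D0     # circle of radius D0
    return 100.0 * power[inside].sum() / power.sum()    # alpha of Eq. (4-114)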
Figures 4.40(a) and (b) show a test pattern image and its spectrum. The cir-
cles superimposed on the spectrum have radii of 10, 30, 60, 160, and 460 pixels,


FIGURE 4.40 (a) Test pattern of size 688 × 688 pixels, and (b) its spectrum. The spectrum is double the image size as a result of padding, but is shown half size to fit. The circles have radii of 10, 30, 60, 160, and 460 pixels with respect to the full-size spectrum. The radii enclose 86.9, 92.8, 95.1, 97.6, and 99.4% of the padded image power, respectively.

respectively, and enclose the percentages of total power listed in the figure caption.
The spectrum falls off rapidly, with close to 87% of the total power being enclosed
by a relatively small circle of radius 10. The significance of this will become evident
in the following example.

EXAMPLE 4.16 : Image smoothing in the frequency domain using lowpass filters.
Figure 4.41 shows the results of applying ILPFs with cutoff frequencies at the radii shown in Fig. 4.40(b).
Figure 4.41(b) is useless for all practical purposes, unless the objective of blurring is to eliminate all
detail in the image, except the “blobs” representing the largest objects. The severe blurring in this image
is a clear indication that most of the sharp detail information in the image is contained in the 13% power
removed by the filter. As the filter radius increases, less and less power is removed, resulting in less blur-
ring. Note that the images in Figs. 4.41(c) through (e) contain significant “ringing,” which becomes finer
in texture as the amount of high frequency content removed decreases. Ringing is visible even in the
image in which only 2% of the total power was removed [Fig. 4.41(e)]. This ringing behavior is a char-
acteristic of ideal filters, as we have mentioned several times before. Finally, the result for a = 99.4% in
Fig. 4.41(f) shows very slight blurring and almost imperceptible ringing but, for the most part, this image
is close to the original. This indicates that little edge information is contained in the upper 0.6% of the
spectrum power removed by the ILPF.
It is clear from this example that ideal lowpass filtering is not practical. However, it is useful to study
the behavior of ILPFs as part of our development of filtering concepts. Also, as shown in the discussion
that follows, some interesting insight is gained by attempting to explain the ringing property of ILPFs
in the spatial domain.


FIGURE 4.41 (a) Original image of size 688 × 688 pixels. (b)–(f) Results of filtering using ILPFs with cutoff frequencies set at radii values 10, 30, 60, 160, and 460, as shown in Fig. 4.40(b). The power removed by these filters was 13.1, 7.2, 4.9, 2.4, and 0.6% of the total, respectively. We used mirror padding to avoid the black borders characteristic of zero padding, as illustrated in Fig. 4.31(c).

The blurring and ringing properties of ILPFs can be explained using the convolu-
tion theorem. Figure 4.42(a) shows an image of a frequency-domain ILPF transfer
function of radius 15 and size 1000 × 1000 pixels. Figure 4.42(b) is the spatial repre-
sentation, h( x, y), of the ILPF, obtained by taking the IDFT of (a) (note the ringing).
Figure 4.42(c) shows the intensity profile of a line passing through the center of (b).
This profile resembles a sinc function.† Filtering in the spatial domain is done by
convolving the function in Fig. 4.42(b) with an image. Imagine each pixel in an image
as being a discrete impulse whose strength is proportional to the intensity of the
image at that location. Convolving this sinc-like function with an impulse copies (i.e.,
shifts the origin of) the function to the location of the impulse. That is, convolution

† Although this profile resembles a sinc function, the transform of an ILPF is actually a Bessel function whose
derivation is beyond the scope of this discussion. The important point to keep in mind is that the inverse propor-
tionality between the “width” of the filter function in the frequency domain, and the “spread” of the width of the
lobes in the spatial function, still holds.


FIGURE 4.42 (a) Frequency domain ILPF transfer function. (b) Corresponding spatial domain kernel function. (c) Intensity profile of a horizontal line through the center of (b).

makes a copy of the function in Fig. 4.42(b) centered on each pixel location in the
image. The center lobe of this spatial function is the principal cause of blurring, while
the outer, smaller lobes are mainly responsible for ringing. Because the “spread” of
the spatial function is inversely proportional to the radius of H(u, v), the larger D0
becomes (i.e., the more frequencies that are passed), the more the spatial function
approaches an impulse which, in the limit, causes no blurring at all when convolved
with the image. The converse happens as D0 becomes smaller. This type of recipro-
cal behavior should be routine to you by now. In the next two sections, we show that
it is possible to achieve blurring with little or no ringing, an important objective in
lowpass filtering.

GAUSSIAN LOWPASS FILTERS


Gaussian lowpass filter (GLPF) transfer functions have the form

H(u, v) = e^(−D²(u, v)/2σ²)    (4-115)

where, as in Eq. (4-112), D(u, v) is the distance from the center of the P × Q frequency rectangle to any point, (u, v), contained by the rectangle. Unlike our earlier expressions for Gaussian functions, we do not use a multiplying constant here in order to be consistent with the filters discussed in this and later sections, whose highest value is 1. As before, σ is a measure of spread about the center. By letting σ = D0, we can express the Gaussian transfer function in the same notation as other functions in this section:

H(u, v) = e^(−D²(u, v)/2D0²)    (4-116)

where D0 is the cutoff frequency. When D(u, v) = D0 , the GLPF transfer function is
down to 0.607 of its maximum value of 1.0.
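A sketch of Eq. (4-116), again reusing the distance_grid helper from the ILPF sketch:

import numpy as np

def glpf(P, Q, D0):
    # Gaussian lowpass transfer function, Eq. (4-116).
    D = distance_grid(P, Q)
    return np.exp(-(D ** 2) / (2.0 * D0 ** 2))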
From Table 4.4, we know that the inverse Fourier transform of a frequency-
domain Gaussian function is Gaussian also. This means that a spatial Gaussian filter
kernel, obtained by computing the IDFT of Eq. (4-115) or (4-116), will have no
ringing. As property 13 of Table 4.4 shows, the same inverse relationship explained
earlier for ILPFs is true also of GLPFs. Narrow Gaussian transfer functions in the
frequency domain imply broader kernel functions in the spatial domain, and vice


FIGURE 4.43 (a) Perspective plot of a GLPF transfer function. (b) Function displayed as an image. (c) Radial cross sections for various values of D0 (10, 20, 40, and 60 in the plot).

versa. Figure 4.43 shows a perspective plot, image display, and radial cross sections
of a GLPF transfer function.

EXAMPLE 4.17 : Image smoothing in the frequency domain using Gaussian lowpass filters.
Figure 4.44 shows the results of applying the GLPF of Eq. (4-116) to Fig. 4.44(a), with D0 equal to the five
radii in Fig. 4.40(b). Compared to the results obtained with an ILPF (Fig. 4.41), we note a smooth transi-
tion in blurring as a function of increasing cutoff frequency. The GLPF achieved slightly less smoothing
than the ILPF. The key difference is that we are assured of no ringing when using a GLPF. This is an
important consideration in practice, especially in situations in which any type of artifact is unacceptable,
as in medical imaging. In cases where more control of the transition between low and high frequencies
about the cutoff frequency are needed, the Butterworth lowpass filter discussed next presents a more
suitable choice. The price of this additional control over the filter profile is the possibility of ringing, as
you will see shortly.

BUTTERWORTH LOWPASS FILTERS


The transfer function of a Butterworth lowpass filter (BLPF) of order n, with cutoff
frequency at a distance D0 from the center of the frequency rectangle, is defined as

H(u, v) = 1 / (1 + [D(u, v)/D0]^(2n))    (4-117)

where D(u, v) is given by Eq. (4-112). Figure 4.45 shows a perspective plot, image display, and radial cross sections of the BLPF function. Comparing the cross section plots in Figs. 4.39, 4.43, and 4.45, we see that the BLPF function can be controlled to approach the characteristics of the ILPF using higher values of n, and the GLPF for lower values of n, while providing a smooth transition from low to high frequencies. Thus, we can use a BLPF to approach the sharpness of an ILPF function with considerably less ringing.
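A sketch of Eq. (4-117), in the same style as the earlier lowpass sketches:

import numpy as np

def blpf(P, Q, D0, n):
    # Butterworth lowpass transfer function of order n, Eq. (4-117).
    D = distance_grid(P, Q)
    return 1.0 / (1.0 + (D / D0) ** (2 * n))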


FIGURE 4.44 (a) Original image of size 688 × 688 pixels. (b)–(f) Results of filtering using GLPFs with cutoff frequencies at the radii shown in Fig. 4.40. Compare with Fig. 4.41. We used mirror padding to avoid the black borders characteristic of zero padding.

FIGURE 4.45 (a) Perspective plot of a Butterworth lowpass-filter transfer function. (b) Function displayed as an image. (c) Radial cross sections of BLPFs of orders 1 through 4.


EXAMPLE 4.18 : Image smoothing using a Butterworth lowpass filter.

Figures 4.46(b)–(f) show the results of applying the BLPF of Eq. (4-117) to Fig. 4.46(a), with cutoff frequencies equal to the five radii in Fig. 4.40(b), and with n = 2.25. The results in terms of blurring are between the results obtained with ILPFs and GLPFs. For example, compare Fig. 4.46(b) with Figs. 4.41(b) and 4.44(b). The degree of blurring with the BLPF was less than with the ILPF, but more than with the GLPF.

The spatial domain kernel obtainable from a BLPF of order 1 has no ringing. Generally, ringing is imperceptible in filters of order 2 or 3, but can become significant in filters of higher orders. Figure 4.47 shows a comparison between the spatial representations (i.e., spatial kernels) corresponding to BLPFs of various orders, using a cutoff frequency of 5 in all cases. (The kernels in Figs. 4.47(a) through (d) were obtained using the procedure outlined in the explanation of Fig. 4.42.) Shown also is the intensity profile along

FIGURE 4.46 (a) Original image of size 688 × 688 pixels. (b)–(f) Results of filtering using BLPFs with cutoff frequencies at the radii shown in Fig. 4.40 and n = 2.25. Compare with Figs. 4.41 and 4.44. We used mirror padding to avoid the black borders characteristic of zero padding.


FIGURE 4.47 (a)–(d) Spatial representations (i.e., spatial kernels) corresponding to BLPF transfer functions of size 1000 × 1000 pixels, cutoff frequency of 5, and order 1, 2, 5, and 20, respectively. (e)–(h) Corresponding intensity profiles through the center of the filter functions.

a horizontal scan line through the center of each spatial kernel. The kernel corre-
sponding to the BLPF of order 1 [see Fig. 4.47(a)] has neither ringing nor negative
values. The kernel corresponding to a BLPF of order 2 does show mild ringing and
small negative values, but they certainly are less pronounced than would be the case
for an ILPF. As the remaining images show, ringing becomes significant for higher-
order filters. A BLPF of order 20 has a spatial kernel that exhibits ringing charac-
teristics similar to those of the ILPF (in the limit, both filters are identical). BLPFs
of orders 2 to 3 are a good compromise between effective lowpass filtering and
acceptable spatial-domain ringing. Table 4.5 summarizes the lowpass filter transfer
functions discussed in this section.

ADDITIONAL EXAMPLES OF LOWPASS FILTERING


In the following discussion, we show several practical applications of lowpass filter-
ing in the frequency domain. The first example is from the field of machine per-
ception with application to character recognition; the second is from the printing
and publishing industry; and the third is related to processing satellite and aerial
images. Similar results can be obtained using the lowpass spatial filtering techniques
discussed in Section 3.5. We use GLPFs in all examples for consistency, but simi-
lar results can be obtained using BLPFs. Keep in mind that images are padded to
double size for filtering, as indicated by Eqs. (4-102) and (4-103), and filter transfer
functions have to match padded-image size. The values of D0 used in the following
examples reflect this doubled filter size.


TABLE 4.5 Lowpass filter transfer functions. D0 is the cutoff frequency, and n is the order of the Butterworth filter.

Ideal:         H(u, v) = 1 if D(u, v) ≤ D0;  0 if D(u, v) > D0
Gaussian:      H(u, v) = e^(−D²(u, v)/2D0²)
Butterworth:   H(u, v) = 1/(1 + [D(u, v)/D0]^(2n))

Figure 4.48 shows a sample of text of low resolution. One encounters text like
this, for example, in fax transmissions, duplicated material, and historical records.
This particular sample is free of additional difficulties like smudges, creases, and
torn sections. The magnified section in Fig. 4.48(a) shows that the characters in this
document have distorted shapes due to lack of resolution, and many of the charac-
ters are broken. Although humans fill these gaps visually without difficulty, machine
recognition systems have real difficulties reading broken characters. One approach
for handling this problem is to bridge small gaps in the input image by blurring
it. Figure 4.48(b) shows how well characters can be “repaired” by this simple pro-
cess using a Gaussian lowpass filter with D0 = 120. It is typical to follow the type of
“repair” just described with additional processing, such as thresholding and thinning,
to yield cleaner characters. We will discuss thinning in Chapter 9 and thresholding
in Chapter 10.
Lowpass filtering is a staple in the printing and publishing industry, where it is used for numerous preprocessing functions, including unsharp masking, as discussed in Section 3.6. (We will cover unsharp masking in the frequency domain in Section 4.9.) “Cosmetic” processing is another use of lowpass filtering prior to print-
domain in Section 4.9. ing. Figure 4.49 shows an application of lowpass filtering for producing a smoother,
softer-looking result from a sharp original. For human faces, the typical objective is
to reduce the sharpness of fine skin lines and small blemishes. The magnified sec-
tions in Figs. 4.49(b) and (c) clearly show a significant reduction in fine skin lines
around the subject’s eyes. In fact, the smoothed images look quite soft and pleasing.
Figure 4.50 shows two applications of lowpass filtering on the same image, but
with totally different objectives. Figure 4.50(a) is an 808 × 754 segment of a very high

FIGURE 4.48 (a) Sample text of low resolution (note the broken characters in the magnified view). (b) Result of filtering with a GLPF, showing that gaps in the broken characters were joined.


FIGURE 4.49 (a) Original 785 × 732 image. (b) Result of filtering using a GLPF with D0 = 150. (c) Result of filtering using a GLPF with D0 = 130. Note the reduction in fine skin lines in the magnified sections in (b) and (c).

resolution radiometer (VHRR) image showing part of the Gulf of Mexico (dark)
and Florida (light) (note the horizontal sensor scan lines). The boundaries between
bodies of water were caused by loop currents. This image is illustrative of remotely
sensed images in which sensors have the tendency to produce pronounced scan lines
along the direction in which the scene is being scanned. (See Example 4.24 for an

FIGURE 4.50 (a) 808 × 754 satellite image showing prominent horizontal scan lines. (b) Result of filtering using a GLPF with D0 = 50. (c) Result of using a GLPF with D0 = 20. (Original image courtesy of NOAA.)


illustration of imaging conditions that can lead to such degradations.) Lowpass filtering is a crude (but simple) way to reduce the effect of these lines, as Fig. 4.50(b) shows (we consider more effective approaches in Sections 4.10 and 5.4). This image was obtained using a GLPF with D0 = 50. The reduction in the effect of the scan
lines in the smoothed image can simplify the detection of macro features, such as the
interface boundaries between ocean currents.
Figure 4.50(c) shows the result of significantly more aggressive Gaussian lowpass
filtering with D0 = 20. Here, the objective is to blur out as much detail as possible
while leaving large features recognizable. For instance, this type of filtering could be
part of a preprocessing stage for an image analysis system that searches for features
in an image bank. An example of such features could be lakes of a given size, such
as Lake Okeechobee in the lower eastern region of Florida, shown in Fig. 4.50(c) as
a nearly round dark region surrounded by a lighter region. Lowpass filtering helps
to simplify the analysis by averaging out features smaller than the ones of interest.

4.9 IMAGE SHARPENING USING HIGHPASS FILTERS
We showed in the previous section that an image can be smoothed by attenuating the high-frequency components of its Fourier transform. Because edges and other abrupt changes in intensities are associated with high-frequency components, image sharpening can be achieved in the frequency domain by highpass filtering, which attenuates low-frequency components without disturbing high-frequency components in the Fourier transform. (In some applications of highpass filtering, it is advantageous to enhance the high frequencies of the Fourier transform.) As in Section 4.8, we consider only zero-phase-shift filters that are radially symmetric. All filtering in this section is based on the procedure outlined in Section 4.7, so all images are assumed to be padded to size P × Q [see Eqs. (4-102) and (4-103)], and filter transfer functions, H(u, v), are understood to be centered, discrete functions of size P × Q.

IDEAL, GAUSSIAN, AND BUTTERWORTH HIGHPASS FILTERS FROM LOWPASS FILTERS
As was the case with kernels in the spatial domain (see Section 3.7), subtracting a
lowpass filter transfer function from 1 yields the corresponding highpass filter trans-
fer function in the frequency domain:

H_HP(u, v) = 1 − H_LP(u, v)    (4-118)

where H_LP(u, v) is the transfer function of a lowpass filter. Thus, it follows from Eq. (4-111) that an ideal highpass filter (IHPF) transfer function is given by

⎧0 if D(u, v) ≤ D0
H (u, v) = ⎨ (4-119)
⎩1 if D(u, v) > D0

where, as before, D(u, v) is the distance from the center of the P × Q frequency rectangle, as given in Eq. (4-112). Similarly, it follows from Eq. (4-116) that the transfer function of a Gaussian highpass filter (GHPF) is given by


H(u, v) = 1 − e^(−D²(u, v)/2D0²)    (4-120)

and, from Eq. (4-117), that the transfer function of a Butterworth highpass filter (BHPF) is

H(u, v) = 1 / (1 + [D0/D(u, v)]^(2n))    (4-121)

FIGURE 4.51 Top row: Perspective plot, image, and radial cross section of an IHPF transfer function. Middle and bottom rows: The same sequence for GHPF and BHPF transfer functions. (The thin image borders were added for clarity. They are not part of the data.)

Figure 4.51 shows 3-D plots, image representations, and radial cross sections for the preceding transfer functions. As before, we see that the BHPF transfer function in the third row of the figure represents a transition between the sharpness of the IHPF and the broad smoothness of the GHPF transfer function.
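Given the lowpass sketches shown earlier, Eq. (4-118) makes the highpass counterparts one-liners (the helper names are ours):

# Eq. (4-118): highpass transfer functions from their lowpass counterparts.
ihpf = lambda P, Q, D0: 1.0 - ilpf(P, Q, D0)          # Eq. (4-119)
ghpf = lambda P, Q, D0: 1.0 - glpf(P, Q, D0)          # Eq. (4-120)
bhpf = lambda P, Q, D0, n: 1.0 - blpf(P, Q, D0, n)    # algebraically equal to Eq. (4-121)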
It follows from Eq. (4-118) that the spatial kernel corresponding to a highpass
filter transfer function in the frequency domain is given by


h_HP(x, y) = ℑ⁻¹[H_HP(u, v)]
           = ℑ⁻¹[1 − H_LP(u, v)]    (4-122)
           = δ(x, y) − h_LP(x, y)

where we used the fact that the IDFT of 1 in the frequency domain is a unit impulse in the spatial domain (see Table 4.4). (Recall that a unit impulse in the spatial domain is an array of 0’s with a 1 in the center.) This equation is precisely the foundation for the discussion in Section 3.7, in which we showed how to construct a highpass kernel by subtracting a lowpass kernel from a unit impulse.
Figure 4.52 shows highpass spatial kernels constructed in just this manner, using
Eq. (4-122) with ILPF, GLPF, and BLPF transfer functions (the values of M, N, and
D0 used in this figure are the same as those we used for Fig. 4.42, and the BLPF is of
order 2). Figure 4.52(a) shows the resulting ideal highpass kernel obtained using Eq.
(4-122), and Fig. 4.52(b) is a horizontal intensity profile through the center of the ker-
nel. The center element of the profile is a unit impulse, visible as a bright dot in the
center of Fig. 4.52(a). Note that this highpass kernel has the same ringing properties
illustrated in Fig. 4.42(b) for its corresponding lowpass counterpart. As you will see
shortly, ringing is just as objectionable as before, but this time in images sharpened
with ideal highpass filters. The other images and profiles in Fig. 4.52 are for Gaussian
and Butterworth kernels. We know from Fig. 4.51 that GHPF transfer functions in
the frequency domain tend to have a broader “skirt” than Butterworth functions of
comparable size and cutoff frequency. Thus, we would expect Butterworth spatial

FIGURE 4.52 (a)–(c) Ideal, Gaussian, and Butterworth highpass spatial kernels obtained from IHPF, GHPF, and BHPF frequency-domain transfer functions. (The thin image borders are not part of the data.) (d)–(f) Horizontal intensity profiles through the centers of the kernels.


TABLE 4.6 Highpass filter transfer functions. D0 is the cutoff frequency and n is the order of the Butterworth transfer function.

Ideal:         H(u, v) = 0 if D(u, v) ≤ D0;  1 if D(u, v) > D0
Gaussian:      H(u, v) = 1 − e^(−D²(u, v)/2D0²)
Butterworth:   H(u, v) = 1/(1 + [D0/D(u, v)]^(2n))

kernels to be “broader” than comparable Gaussian kernels, a fact that is confirmed by the images and their profiles in Fig. 4.52. Table 4.6 summarizes the three highpass filter transfer functions discussed in the preceding paragraphs.

EXAMPLE 4.19 : Highpass filtering of the character test pattern.


The first row of Fig. 4.53 shows the result of filtering the test pattern in Fig. 4.40(a) using IHPF, GHPF, and BHPF transfer functions with D0 = 60 [see Fig. 4.40(b)] and n = 2 for the Butterworth filter. We know
from Chapter 3 that highpass filtering produces images with negative values. The images in Fig. 4.53 are
not scaled, so the negative values are clipped by the display at 0 (black). The key objective of highpass
filtering is to sharpen. Also, because the highpass filters used here set the DC term to zero, the images
have essentially no tonality, as explained earlier in connection with Fig. 4.30.
Our main objective in this example is to compare the behavior of the three highpass filters. As
Fig. 4.53(a) shows, the ideal highpass filter produced results with severe distortions caused by ringing.
For example, the blotches inside the strokes of the large letter “a” are ringing artifacts. By comparison,
neither Fig. 4.53(b) nor (c) has such distortions. With reference to Fig. 4.40(b), the filters removed or attenuated approximately 95% of the image energy. As you know, removing the lower frequencies of an
image reduces its gray-level content significantly, leaving mostly edges and other sharp transitions, as is
evident in Fig. 4.53. The details you see in the first row of the figure are contained in only the upper 5%
of the image energy.
The second row, obtained with D0 = 160, is more interesting. The remaining energy of those images
is about 2.5%, or half, the energy of the images in the first row. However, the difference in fine detail
is striking. See, for example, how much cleaner the boundary of the large “a” is now, especially in the
Gaussian and Butterworth results. The same is true for all other details, down to the smallest objects.
This is the type of result that is considered acceptable when detection of edges and boundaries is impor-
tant.
Figure 4.54 shows the images in the second row of Fig. 4.53, scaled using Eqs. (2-31) and (2-32) to
display the full intensity range of both positive and negative intensities. The ringing in Fig. 4.54(a) shows
the inadequacy of ideal highpass filters. In contrast, notice the smoothness of the background on the
other two images, and the crispness of their edges.

EXAMPLE 4.20 : Using highpass filtering and thresholding for image enhancement.
Figure 4.55(a) is a 962 × 1026 image of a thumbprint in which smudges (a typical problem) are evident.
A key step in automated fingerprint recognition is enhancement of print ridges and the reduction of
smudges. In this example, we use highpass filtering to enhance the ridges and reduce the effects of


FIGURE 4.53 Top row: The image from Fig. 4.40(a) filtered with IHPF, GHPF, and BHPF transfer functions using D0 = 60 in all cases (n = 2 for the BHPF). Second row: Same sequence, but using D0 = 160.

FIGURE 4.54 The images from the second row of Fig. 4.53 scaled using Eqs. (2-31) and (2-32) to show both positive and negative values.


FIGURE 4.55 (a) Smudged thumbprint. (b) Result of highpass filtering (a). (c) Result of thresholding (b). (Original image courtesy of the U.S. National Institute of Standards and Technology.)

Enhancement of the ridges is accomplished by the fact that their boundaries are characterized by high frequencies, which are unchanged by a highpass filter. On the other hand, the filter reduces low frequency components, which correspond to slowly varying intensities in the image, such as the background and smudges. Thus, enhancement is achieved by reducing the effect of all features except those with high frequencies, which are the features of interest in this case.
Figure 4.55(b) is the result of using a Butterworth highpass filter of order 4 with a cutoff frequency
of 50. A fourth-order filter provides a sharp (but smooth) transition from low to high frequencies, with
filtering characteristics between an ideal and a Gaussian filter. The cutoff frequency chosen is about 5%
of the long dimension of the image. The idea is for D0 to be close to the origin so that low frequencies are
attenuated but not completely eliminated, except for the DC term, which is set to 0, so that tonality differences between the ridges and background are not lost completely. Choosing a value for D0 between
5% and 10% of the long dimension of the image is a good starting point. Choosing a large value of
D0 would highlight fine detail to such an extent that the definition of the ridges would be affected. As
expected, the highpass filtered image has negative values, which are shown as black by the display.
A simple approach for highlighting sharp features in a highpass-filtered image is to threshold it by setting to black (0) all negative values and to white (1) the remaining values. Figure 4.55(c) shows the result
of this operation. Note how the ridges are clear, and how the effect of the smudges has been reduced
considerably. In fact, ridges that are barely visible in the top right section of the image in Fig. 4.55(a) are
nicely enhanced in Fig. 4.55(c). An automated algorithm would find it much easier to follow the ridges
on this image than it would on the original.
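
The processing chain in this example is short enough to summarize in code. The following sketch assumes the distance_grid and bhpf helpers from the earlier sketch; it illustrates the steps, not the exact implementation behind the figures.

import numpy as np

def enhance_print(f, D0=50, n=4):
    # Butterworth highpass filtering in the frequency domain.
    M, N = f.shape
    F = np.fft.fftshift(np.fft.fft2(f))
    H = bhpf(distance_grid(M, N), D0, n)   # helpers from the earlier sketch
    g = np.real(np.fft.ifft2(np.fft.ifftshift(F * H)))
    # Threshold: negative values to black (0), the rest to white (1).
    return (g > 0).astype(float)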

THE LAPLACIAN IN THE FREQUENCY DOMAIN


In Section 3.6, we used the Laplacian for image sharpening in the spatial domain. In
this section, we revisit the Laplacian and show that it yields equivalent results using
frequency domain techniques. It can be shown (see Problem 4.52) that the Laplacian
can be implemented in the frequency domain using the filter transfer function

H(u, v) = -4\pi^2 (u^2 + v^2)        (4-123)
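
As a quick illustration of Eq. (4-123), the Laplacian can be computed entirely with FFTs. In the sketch below, the frequency variables come from np.fft.fftfreq and so are expressed in cycles per sample; this differs from the integer-index convention of the frequency rectangle by constant scale factors, so treat the snippet as a sketch of the idea rather than a drop-in routine.

import numpy as np

def laplacian_freq(f):
    # Laplacian via Eq. (4-123): multiply the spectrum by -4*pi^2*(u^2 + v^2).
    M, N = f.shape
    u = np.fft.fftfreq(M).reshape(-1, 1)   # frequencies in cycles per sample
    v = np.fft.fftfreq(N).reshape(1, -1)
    H = -4.0 * np.pi**2 * (u**2 + v**2)
    return np.real(np.fft.ifft2(H * np.fft.fft2(f)))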

5.3 RESTORATION IN THE PRESENCE OF NOISE ONLY—SPATIAL FILTERING

Median filters are quite popular because, for certain types of random noise, they provide excellent
noise-reduction capabilities, with considerably less blurring than linear smoothing
filters of similar size. Median filters are particularly effective in the presence of both
bipolar and unipolar impulse noise, as Example 5.3 below shows. Computation of
the median and implementation of this filter are discussed in Section 3.6.

Max and Min Filters


Although the median filter is by far the order-statistic filter most used in image processing, it is by no means the only one. The median represents the 50th percentile of
a ranked set of numbers, but you will recall from basic statistics that ranking lends
itself to many other possibilities. For example, using the 100th percentile results in
the so-called max filter, given by

\hat{f}(x, y) = \max_{(r, c) \in S_{xy}} \{ g(r, c) \}        (5-28)

This filter is useful for finding the brightest points in an image or for eroding dark
regions adjacent to bright areas. Also, because pepper noise has very low values, it
is reduced by this filter as a result of the max selection process in the subimage area Sxy.
The 0th percentile filter is the min filter:

\hat{f}(x, y) = \min_{(r, c) \in S_{xy}} \{ g(r, c) \}        (5-29)

This filter is useful for finding the darkest points in an image or for eroding light regions adjacent to dark areas. Also, it reduces salt noise as a result of the min operation.
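
Both filters are one-line operations with a rank-filter library. A minimal sketch using SciPy (the 3 × 3 neighborhood size and the stand-in image are ours):

import numpy as np
from scipy.ndimage import maximum_filter, minimum_filter

g = np.random.rand(256, 256)           # stand-in for a noisy image

# Eq. (5-28): max filter over a 3 x 3 neighborhood; reduces pepper noise.
f_max = maximum_filter(g, size=3)

# Eq. (5-29): min filter over the same neighborhood; reduces salt noise.
f_min = minimum_filter(g, size=3)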

Midpoint Filter
The midpoint filter computes the midpoint between the maximum and minimum
values in the area encompassed by the filter:

\hat{f}(x, y) = \frac{1}{2} \left[ \max_{(r, c) \in S_{xy}} \{ g(r, c) \} + \min_{(r, c) \in S_{xy}} \{ g(r, c) \} \right]        (5-30)

Note that this filter combines order statistics and averaging. It works best for randomly distributed noise, like Gaussian or uniform noise.
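
The midpoint filter follows directly by averaging the two rank filters; a sketch along the same lines as the previous one:

import numpy as np
from scipy.ndimage import maximum_filter, minimum_filter

g = np.random.rand(256, 256)           # stand-in for a noisy image

# Eq. (5-30): midpoint filter, the average of the local max and local min.
f_mid = 0.5 * (maximum_filter(g, size=3) + minimum_filter(g, size=3))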

Alpha-Trimmed Mean Filter


Suppose that we delete the d/2 lowest and the d/2 highest intensity values of g(r, c)
in the neighborhood Sxy . Let gR (r, c) represent the remaining mn − d pixels in Sxy .
A filter formed by averaging these remaining pixels is called an alpha-trimmed mean
filter. The form of this filter is

www.EBooksWorld.ir
5.3 Restoration in the Presence of Noise Only—Spatial Filtering 333

\hat{f}(x, y) = \frac{1}{mn - d} \sum_{(r, c) \in S_{xy}} g_R(r, c)        (5-31)

where the value of d can range from 0 to mn − 1. When d = 0 the alpha-trimmed filter reduces to the arithmetic mean filter discussed earlier. If we choose d = mn − 1,
the filter becomes a median filter. For other values of d, the alpha-trimmed filter is
useful in situations involving multiple types of noise, such as a combination of salt-
and-pepper and Gaussian noise.
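
There is no dedicated library routine for the alpha-trimmed mean, but it is easy to express with a generic rank filter. A sketch assuming an even value of d and SciPy's generic_filter, which applies a user function to every m × n neighborhood (the function name is ours):

import numpy as np
from scipy.ndimage import generic_filter

def alpha_trimmed_mean(g, m=5, n=5, d=6):
    # Eq. (5-31): in each m x n neighborhood, sort the mn values, drop the
    # d/2 lowest and d/2 highest, and average the mn - d that remain.
    def trimmed(window):
        w = np.sort(window)
        return w[d // 2 : w.size - d // 2].mean()
    return generic_filter(g, trimmed, size=(m, n))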

EXAMPLE 5.3 : Image denoising using order-statistic filters.


Figure 5.10(a) shows the circuit board image corrupted by salt-and-pepper noise with probabilities
Ps = Pp = 0.1. Figure 5.10(b) shows the result of median filtering with a filter of size 3 × 3. The improvement over Fig. 5.10(a) is significant, but several noise points still are visible. A second pass [on the image in Fig. 5.10(b)] with the median filter removed most of these points, leaving only a few, barely visible noise points. These were removed with a third pass of the filter. These results are good examples of the
power of median filtering in handling impulse-like additive noise. Keep in mind that repeated passes
of a median filter will blur the image, so it is desirable to keep the number of passes as low as possible.
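
Repeated passes are simply nested calls to a median filter. A sketch of the three-pass sequence with SciPy (g_noisy is a stand-in for the corrupted image):

import numpy as np
from scipy.ndimage import median_filter

g_noisy = np.random.rand(256, 256)     # stand-in for the corrupted image

g1 = median_filter(g_noisy, size=3)    # first pass removes most impulses
g2 = median_filter(g1, size=3)         # second pass removes most survivors
g3 = median_filter(g2, size=3)         # a third pass cleans up the rest
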
Figure 5.11(a) shows the result of applying the max filter to the pepper noise image of Fig. 5.8(a). The
filter did a reasonable job of removing the pepper noise, but we note that it also removed (set to a light
intensity level) some dark pixels from the borders of the dark objects. Figure 5.11(b) shows the result
of applying the min filter to the image in Fig. 5.8(b). In this case, the min filter did a better job than the
max filter on noise removal, but it removed some white points around the border of light objects. These
made the light objects smaller and some of the dark objects larger (like the connector fingers at the top
of the image) because white points around these objects were set to a dark level.
The alpha-trimmed filter is illustrated next. Figure 5.12(a) shows the circuit board image corrupted
this time by additive, uniform noise of variance 800 and zero mean. This is a high level of noise corruption that is made worse by the further addition of salt-and-pepper noise with Ps = Pp = 0.1, as Fig. 5.12(b)
shows. The high level of noise in this image warrants use of larger filters. Figures 5.12(c) through (f) show
the results, respectively, obtained using arithmetic mean, geometric mean, median, and alpha-trimmed
mean (with d = 6) filters of size 5 × 5. As expected, the arithmetic and geometric mean filters (especially
the latter) did not do well because of the presence of impulse noise. The median and alpha-trimmed
filters performed much better, with the alpha-trimmed filter giving slightly better noise reduction. For
example, note in Fig. 5.12(f) that the fourth connector finger from the top left is slightly smoother in
the alpha-trimmed result. This is not unexpected because, for a high value of d, the alpha-trimmed filter
approaches the performance of the median filter, but still retains some smoothing capabilities.

ADAPTIVE FILTERS
Once selected, the filters discussed thus far are applied to an image without regard
for how image characteristics vary from one point to another. In this section, we
take a look at two adaptive filters whose behavior changes based on statistical characteristics of the image inside the filter region defined by the m × n rectangular
neighborhood Sxy . As the following discussion shows, adaptive filters are capable
of performance superior to that of the filters discussed thus far. The price paid for improved filtering power is an increase in filter complexity.


a b
c d
FIGURE 5.10 (a) Image corrupted by salt-and-pepper noise with probabilities Ps = Pp = 0.1. (b) Result of one pass with a median filter of size 3 × 3. (c) Result of processing (b) with this filter. (d) Result of processing (c) with the same filter.

a b
FIGURE 5.11 (a) Result of filtering Fig. 5.8(a) with a max filter of size 3 × 3. (b) Result of filtering Fig. 5.8(b) with a min filter of the same size.


a b
c d
e f
FIGURE 5.12 (a) Image corrupted by additive uniform noise. (b) Image additionally corrupted by additive salt-and-pepper noise. (c)-(f) Image (b) filtered with a 5 × 5: (c) arithmetic mean filter; (d) geometric mean filter; (e) median filter; (f) alpha-trimmed mean filter, with d = 6.


Keep in mind that we still are dealing with the case in which the degraded image is equal to the original image plus noise. No other types of degradations are being considered yet.

Adaptive, Local Noise Reduction Filter


The simplest statistical measures of a random variable are its mean and variance.
These are reasonable parameters on which to base an adaptive filter because they
are quantities closely related to the appearance of an image. The mean gives a measure of average intensity in the region over which the mean is computed, and the variance gives a measure of image contrast in that region.
Our filter is to operate on a neighborhood, Sxy, centered on coordinates (x, y). The response of the filter at (x, y) is to be based on the following quantities: g(x, y), the value of the noisy image at (x, y); σ_η², the variance of the noise; z̄_Sxy, the local average intensity of the pixels in Sxy; and σ²_Sxy, the local variance of the intensities of pixels in Sxy. We want the behavior of the filter to be as follows:

1. If σ_η² is zero, the filter should return simply the value of g at (x, y). This is the trivial, zero-noise case in which g is equal to f at (x, y).
2. If the local variance σ²_Sxy is high relative to σ_η², the filter should return a value close to g at (x, y). A high local variance typically is associated with edges, and these should be preserved.
3. If the two variances are equal, we want the filter to return the arithmetic mean
value of the pixels in Sxy . This condition occurs when the local area has the same
properties as the overall image, and local noise is to be reduced by averaging.

An adaptive expression for obtaining f̂(x, y) based on these assumptions may be written as

\hat{f}(x, y) = g(x, y) - \frac{\sigma_\eta^2}{\sigma_{S_{xy}}^2} \left[ g(x, y) - \bar{z}_{S_{xy}} \right]        (5-32)

The only quantity that needs to be known a priori is σ_η², the variance of the noise corrupting image f(x, y). This is a constant that can be estimated from sample noisy
images using Eq. (3-26). The other parameters are computed from the pixels in
neighborhood Sxy using Eqs. (3-27) and (3-28).
An assumption in Eq. (5-32) is that the ratio of the two variances does not exceed 1, which implies that σ_η² ≤ σ²_Sxy. The noise in our model is additive and position independent, so this is a reasonable assumption to make because Sxy is a subset of g(x, y). However, we seldom have exact knowledge of σ_η². Therefore, it is possible for this condition to be violated in practice. For that reason, a test should be built into an implementation of Eq. (5-32) so that the ratio is set to 1 if the condition σ_η² > σ²_Sxy occurs. This makes this filter nonlinear. However, it prevents nonsensical results (i.e., negative intensity levels, depending on the value of z̄_Sxy) due to a potential lack of knowledge about the variance of the image noise. Another approach is to allow the negative values to occur, and then rescale the intensity values at the end. The result then would be a loss of dynamic range in the image.
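
Equation (5-32) vectorizes neatly: the local mean and variance of every neighborhood can be computed at once with box (uniform) filters. A sketch with the variance ratio clamped to 1, as discussed above (the function and variable names are ours):

import numpy as np
from scipy.ndimage import uniform_filter

def adaptive_local_noise_filter(g, noise_var, m=7, n=7):
    # Local mean and variance of g over every m x n neighborhood S_xy.
    local_mean = uniform_filter(g, size=(m, n))
    local_var = uniform_filter(g**2, size=(m, n)) - local_mean**2
    # Eq. (5-32), with the variance ratio clamped to 1 so the output can
    # never be driven to nonsensical (e.g., negative) intensities.
    ratio = np.minimum(noise_var / np.maximum(local_var, 1e-12), 1.0)
    return g - ratio * (g - local_mean)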


EXAMPLE 5.4 : Image denoising using adaptive, local noise-reduction filtering.


Figure 5.13(a) shows the circuit-board image, corrupted this time by additive Gaussian noise of zero
mean and a variance of 1000. This is a significant level of noise corruption, but it makes an ideal test bed
on which to compare relative filter performance. Figure 5.13(b) is the result of processing the noisy image with an arithmetic mean filter of size 7 × 7. The noise was smoothed out, but at the cost of significant
blurring. Similar comments apply to Fig. 5.13(c), which shows the result of processing the noisy image
with a geometric mean filter, also of size 7 × 7. The differences between these two filtered images are
analogous to those we discussed in Example 5.2; only the degree of blurring is different.
Figure 5.13(d) shows the result of using the adaptive filter of Eq. (5-32) with σ_η² = 1000. The improvements in this result compared with the two previous filters are significant. In terms of overall noise
reduction, the adaptive filter achieved results similar to the arithmetic and geometric mean filters. However, the image filtered with the adaptive filter is much sharper. For example, the connector fingers at the top of the image are significantly sharper in Fig. 5.13(d). Other features, such as holes and the eight legs of the dark component on the lower left-hand side of the image, are much clearer in Fig. 5.13(d). These
results are typical of what can be achieved with an adaptive filter. As mentioned earlier, the price paid
for the improved performance is additional filter complexity.

a b
c d
FIGURE 5.13 (a) Image corrupted by additive Gaussian noise of zero mean and a variance of 1000. (b) Result of arithmetic mean filtering. (c) Result of geometric mean filtering. (d) Result of adaptive noise-reduction filtering. All filters used were of size 7 × 7.

