
MODULE 4

6 Color Image Processing

It is only after years of preparation that the young artist should touch color—not color used descriptively, that is, but as a means of personal expression. Henri Matisse

For a long time I limited myself to one color—as a form of discipline. Pablo Picasso

Preview

The use of color in image processing is motivated by two principal factors. First, color is a powerful descriptor that often simplifies object identification and extraction from a scene. Second, humans can discern thousands of color shades and intensities, compared to about only two dozen shades of gray. This second factor is particularly important in manual (i.e., when performed by humans) image analysis.
Color image processing is divided into two major areas: full-color and pseudocolor processing. In the first category, the images in question typically are acquired with a full-color sensor, such as a color TV camera or color scanner. In the second category, the problem is one of assigning a color to a particular monochrome intensity or range of intensities. Until relatively recently, most digital color image processing was done at the pseudocolor level. However, in the past decade, color sensors and hardware for processing color images have become available at reasonable prices. The result is that full-color image processing techniques are now used in a broad range of applications, including publishing, visualization, and the Internet.

It will become evident in the discussions that follow that some of the gray-scale methods covered in previous chapters are directly applicable to color images. Others require reformulation to be consistent with the properties of the color spaces developed in this chapter. The techniques described here are far from exhaustive; they illustrate the range of methods available for color image processing.

6.1 Color Fundamentals


Although the process followed by the human brain in perceiving and inter-
preting color is a physiopsychological phenomenon that is not fully under-
stood, the physical nature of color can be expressed on a formal basis
supported by experimental and theoretical results.
In 1666, Sir Isaac Newton discovered that when a beam of sunlight passes through a glass prism, the emerging beam of light is not white but consists instead of a continuous spectrum of colors ranging from violet at one end to red at the other. As Fig. 6.1 shows, the color spectrum may be divided into six broad regions: violet, blue, green, yellow, orange, and red. When viewed in full color (Fig. 6.2), no color in the spectrum ends abruptly, but rather each color blends smoothly into the next.
Basically, the colors that humans and some other animals perceive in an object are determined by the nature of the light reflected from the object. As illustrated in Fig. 6.2, visible light is composed of a relatively narrow band of frequencies in the electromagnetic spectrum. A body that reflects light that is balanced in all visible wavelengths appears white to the observer. However, a body that favors reflectance in a limited range of the visible spectrum exhibits some shades of color. For example, green objects reflect light with wavelengths primarily in the 500 to 570 nm range while absorbing most of the energy at other wavelengths.
FIGURE 6.1 Color spectrum seen by passing white light through a prism. (Courtesy of the General Electric Co., Lamp Business Division.)

FIGURE 6.2 Wavelengths comprising the visible range of the electromagnetic spectrum. (Courtesy of the General Electric Co., Lamp Business Division.)

Characterization of light is central to the science of color. If the light is achromatic (void of color), its only attribute is its intensity, or amount. Achromatic light is what viewers see on a black and white television set, and it has been an implicit component of our discussion of image processing thus far. As defined in Chapter 2, and used numerous times since, the term gray level refers to a scalar measure of intensity that ranges from black, to grays, and finally to white.

Chromatic light spans the electromagnetic spectrum from approximately 400 to 700 nm. Three basic quantities are used to describe the quality of a chromatic light source: radiance, luminance, and brightness. Radiance is the total amount of energy that flows from the light source, and it is usually measured in watts (W). Luminance, measured in lumens (lm), gives a measure of the amount of energy an observer perceives from a light source. For example, light emitted from a source operating in the far infrared region of the spectrum could have significant energy (radiance), but an observer would hardly perceive it; its luminance would be almost zero. Finally, brightness is a subjective descriptor that is practically impossible to measure. It embodies the achromatic notion of intensity and is one of the key factors in describing color sensation.
As noted in Section 2.1.1, cones are the sensors in the eye responsible for color vision. Detailed experimental evidence has established that the 6 to 7 million cones in the human eye can be divided into three principal sensing categories, corresponding roughly to red, green, and blue. Approximately 65% of all cones are sensitive to red light, 33% are sensitive to green light, and only about 2% are sensitive to blue (but the blue cones are the most sensitive). Figure 6.3 shows average experimental curves detailing the absorption of light by the red, green, and blue cones in the eye.

FIGURE 6.3 Absorption of light by the red, green, and blue cones in the human eye as a function of wavelength. The absorption curves peak near 445 nm (blue), 535 nm (green), and 575 nm (red).

Due to these absorption characteristics of the human eye, colors are seen as variable combinations of the so-called primary colors red (R), green (G), and blue (B). For the purpose of standardization, the
CIE (Commission Internationale de l'Eclairage—the International Commission on Illumination) designated in 1931 the following specific wavelength values to the three primary colors: blue = 435.8 nm, green = 546.1 nm, and red = 700 nm. This standard was set before the detailed experimental curves shown in Fig. 6.3 became available in 1965. Thus, the CIE standards correspond only approximately with experimental data. We note from Figs. 6.2 and 6.3 that no single color may be called red, green, or blue. Also, it is important to keep in mind that having three specific primary color wavelengths for the purpose of standardization does not mean that these three fixed RGB components acting alone can generate all spectrum colors. Use of the word primary has been widely misinterpreted to mean that the three standard primaries, when mixed in various intensity proportions, can produce all visible colors. As you will see shortly, this interpretation is not correct unless the wavelength also is allowed to vary, in which case we would no longer have three fixed, standard primary colors.

The primary colors can be added to produce the secondary colors of light—magenta (red plus blue), cyan (green plus blue), and yellow (red plus green). Mixing the three primaries, or a secondary with its opposite primary color, in the right intensities produces white light. This result is shown in Fig. 6.4(a), which also illustrates the three primary colors and their combinations to produce the secondary colors.

FIGURE 6.4 Primary and secondary colors of light and pigments: (a) mixtures of light (additive primaries); (b) mixtures of pigments (subtractive primaries). (Courtesy of the General Electric Co., Lamp Business Division.)

Differentiating between the primary colors of light and the primary colors
of pigments or colorants is important. In the latter, a primary color is defined
as one that subtracts or absorbs a primary color of light and reflects or trans-
mits the other two. Therefore, the primary colors of pigments are magenta,
cyan, and yellow, and the secondary colors are red, green, and blue. These col-
ors are shown in Fig. 6.4(b). A proper combination of the three pigment pri-
maries, or a secondary with its opposite primary, produces black.
Color television reception is an example of the additive nature of light colors. The interior of CRT (cathode ray tube) color TV screens is composed of a large array of triangular dot patterns of electron-sensitive phosphor. When excited, each dot in a triad produces light in one of the primary colors. The intensity of the red-emitting phosphor dots is modulated by an electron gun inside the tube, which generates pulses corresponding to the "red energy" seen by the TV camera. The green and blue phosphor dots in each triad are modulated in the same manner. The effect, viewed on the television receiver, is that the three primary colors from each phosphor triad are "added" together and received by the color-sensitive cones in the eye as a full-color image. Thirty successive image changes per second in all three colors complete the illusion of a continuous image display on the screen.
CRT displays are being replaced by "flat panel" digital technologies, such as liquid crystal displays (LCDs) and plasma devices. Although they are fundamentally different from CRTs, these and similar technologies use the same principle in the sense that they all require three subpixels (red, green, and blue) to generate a single color pixel. LCDs use properties of polarized light to block or pass light through the LCD screen and, in the case of active matrix display technology, thin film transistors (TFTs) are used to provide the proper signals to address each pixel on the screen. Light filters are used to produce the three primary colors of light at each pixel triad location. In plasma units, pixels are tiny gas cells coated with phosphor to produce one of the three primary colors. The individual cells are addressed in a manner analogous to LCDs. This individual pixel triad coordinate addressing capability is the foundation of digital displays.
The characteristics generally used to distinguish one color from another are brightness, hue, and saturation. As indicated earlier in this section, brightness embodies the achromatic notion of intensity. Hue is an attribute associated with the dominant wavelength in a mixture of light waves. Hue represents dominant color as perceived by an observer. Thus, when we call an object red, orange, or yellow, we are referring to its hue. Saturation refers to the relative purity or the amount of white light mixed with a hue. The pure spectrum colors are fully saturated. Colors such as pink (red and white) and lavender (violet and white) are less saturated, with the degree of saturation being inversely proportional to the amount of white light added.

Hue and saturation taken together are called chromaticity, and, therefore, a color may be characterized by its brightness and chromaticity. The amounts of red, green, and blue needed to form any particular color are called the
tristimulus values and are denoted X, Y, and Z, respectively. A color is then specified by its trichromatic coefficients, defined as

    x = X / (X + Y + Z)                                        (6.1-1)

    y = Y / (X + Y + Z)                                        (6.1-2)

and

    z = Z / (X + Y + Z)                                        (6.1-3)

It is noted from these equations that†

    x + y + z = 1                                              (6.1-4)
For any wavelength of light in the visible spectrum, the tristimulus values needed to produce the color corresponding to that wavelength can be obtained directly from curves or tables that have been compiled from extensive experimental results (Poynton [1996]. See also the early references by Walsh [1958] and by Kiver [1965]).
Another approach for specifying colors is to use the CIE chromaticity dia-
gram (Fig. 6.5), which shows color composition as a function of x (red) and y
(green). For any value of x and y, the corresponding value of z (blue) is ob-
tained from Eq. (6.1-4) by noting that z = 1 - (x + y). The point marked
green in Fig. 6.5, for example, has approximately 62% green and 25% red con-
tent. From Eq. (6.1-4), the composition of blue is approximately 13%.
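As a quick check of Eqs. (6.1-1) through (6.1-4), the short Python sketch below is an illustrative addition (not part of the original text); the tristimulus values used are arbitrary. It computes the trichromatic coefficients and recovers z from x and y, as just done for the point marked green.

    def trichromatic_coefficients(X, Y, Z):
        """Eqs. (6.1-1)-(6.1-3): normalize tristimulus values to coefficients."""
        total = X + Y + Z
        return X / total, Y / total, Z / total

    # Arbitrary (hypothetical) tristimulus values:
    x, y, z = trichromatic_coefficients(30.0, 40.0, 30.0)
    print(x + y + z)      # 1.0, as Eq. (6.1-4) requires
    print(1.0 - (x + y))  # z recovered from x and y, as used with Fig. 6.5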
The positions of the various spectrum colors—from violet at 380 nm to red at 780 nm—are indicated around the boundary of the tongue-shaped chromaticity diagram. These are the pure colors shown in the spectrum of Fig. 6.2. Any point not actually on the boundary but within the diagram represents some mixture of spectrum colors. The point of equal energy shown in Fig. 6.5 corresponds to equal fractions of the three primary colors; it represents the CIE standard for white light. Any point located on the boundary of the chromaticity chart is fully saturated. As a point leaves the boundary and approaches the point of equal energy, more white light is added to the color and it becomes less saturated. The saturation at the point of equal energy is zero.
The chromaticity diagram is useful for color mixing because a straight-line segment joining any two points in the diagram defines all the different color variations that can be obtained by combining these two colors additively. Consider, for example, a straight line drawn from the red to the green points shown in Fig. 6.5. If there is more red light than green light, the exact point representing the new color will be on the line segment, but it will be closer to the red point than to the green point. Similarly, a line drawn from the point of equal energy to any point on the boundary of the chart will define all the shades of that particular spectrum color.

† The use of x, y, z in this context follows notational convention. These should not be confused with the use of (x, y) to denote spatial coordinates in other sections of the book.

FIGURE 6.5 Chromaticity diagram. (Courtesy of the General Electric Co., Lamp Business Division.)
Extension of this procedure to three colors is straightforward. To determine the range of colors that can be obtained from any three given colors in the chromaticity diagram, we simply draw connecting lines to each of the three color points. The result is a triangle, and any color on the boundary or inside the triangle can be produced by various combinations of the three initial colors. A triangle with vertices at any three fixed colors cannot enclose the entire color region in Fig. 6.5. This observation supports graphically the remark made earlier that not all colors can be obtained with three single, fixed primaries.

The triangle in Figure 6.6 shows a typical range of colors (called the color gamut) produced by RGB monitors. The irregular region inside the triangle is representative of the color gamut of today's high-quality color printing devices. The boundary of the color printing gamut is irregular because color printing is a combination of additive and subtractive color mixing, a process that is much more difficult to control than that of displaying colors on a monitor, which is based on the addition of three highly controllable light primaries.

FIGURE 6.6 Typical color gamut of color monitors (triangle) and color printing devices (irregular region). The axes are the trichromatic coefficients x and y; the spectral wavelengths, from 380 nm near the blue corner to 780 nm near the red corner, run along the boundary.

6.2 Color Models


The purpose of a color model (also called color space or color system) is to fa-
cilitate the specification of colors in some standard, generally accepted way. In
essence, a color model is a specification of a coordinate system and a subspace
within that system where each color is represented by a single point.
Most color models in use today are oriented either toward hardware (such
as for color monitors and printers) or toward applications where color manip-
ulation is a goal (such as in the creation of color graphics for animation). In
terms of digital image processing, the hardware-oriented models most commonly used in practice are the RGB (red, green, blue) model for color monitors and a broad class of color video cameras; the CMY (cyan, magenta, yellow) and CMYK (cyan, magenta, yellow, black) models for color printing; and the HSI (hue, saturation, intensity) model, which corresponds closely with the way humans describe and interpret color. The HSI model also has the advantage that it decouples the color and gray-scale information in an image, making it suitable for many of the gray-scale techniques developed in this book. There are numerous color models in use today due to the fact that color science is a broad field that encompasses many areas of application. It is tempting to dwell on some of these models here simply because they are interesting and informative. However, keeping to the task at hand, the models discussed in this chapter are leading models for image processing. Having mastered the material in this chapter, you will have no difficulty in understanding additional color models in use today.

6.2.1 The RGB Color Model


In the RGB model, each color appears in its primary spectral components of red, green, and blue. This model is based on a Cartesian coordinate system. The color subspace of interest is the cube shown in Fig. 6.7, in which RGB primary values are at three corners; the secondary colors cyan, magenta, and yellow are at three other corners; black is at the origin; and white is at the corner farthest from the origin. In this model, the gray scale (points of equal RGB values) extends from black to white along the line joining these two points. The different colors in this model are points on or inside the cube, and are defined by vectors extending from the origin. For convenience, the assumption is that all color values have been normalized so that the cube shown in Fig. 6.7 is the unit cube. That is, all values of R, G, and B are assumed to be in the range [0, 1].
FIGURE 6.7 Schematic of the RGB color cube. Points along the main diagonal have gray values, from black at the origin to white at point (1, 1, 1). Blue is at (0, 0, 1), green at (0, 1, 0), and red at (1, 0, 0); cyan, magenta, and yellow are at the remaining three corners.

FIGURE 6.8 RGB 24-bit color cube.
Images represented in the RGB color model consist of three component images, one for each primary color. When fed into an RGB monitor, these three images combine on the screen to produce a composite color image, as explained in Section 6.1. The number of bits used to represent each pixel in RGB space is called the pixel depth. Consider an RGB image in which each of the red, green, and blue images is an 8-bit image. Under these conditions each RGB color pixel [that is, a triplet of values (R, G, B)] is said to have a depth of 24 bits (3 image planes times the number of bits per plane). The term full-color image is used often to denote a 24-bit RGB color image. The total number of colors in a 24-bit RGB image is (2^8)^3 = 16,777,216. Figure 6.8 shows the 24-bit RGB color cube corresponding to the diagram in Fig. 6.7.

EXAMPLE 6.1: Generating the hidden face planes and a cross section of the RGB color cube.

■ The cube shown in Fig. 6.8 is a solid, composed of the (2^8)^3 = 16,777,216 colors mentioned in the preceding paragraph. A convenient way to view these colors is to generate color planes (faces or cross sections of the cube). This is accomplished simply by fixing one of the three colors and allowing the other two to vary. For instance, a cross-sectional plane through the center of the cube and parallel to the GB-plane in Fig. 6.8 is the plane (127, G, B) for G, B = 0, 1, 2, ..., 255. Here we used the actual pixel values rather than the mathematically convenient normalized values in the range [0, 1] because the former values are the ones actually used in a computer to generate colors. Figure 6.9(a) shows that an image of the cross-sectional plane is viewed simply by feeding the three individual component images into a color monitor. In the component images, 0 represents black and 255 represents white (note that these are gray-scale images). Finally, Fig. 6.9(b) shows the three hidden surface planes of the cube in Fig. 6.8, generated in the same manner.

It is of interest to note that acquiring a color image is basically the process shown in Fig. 6.9 in reverse. A color image can be acquired by using three filters, sensitive to red, green, and blue, respectively. When we view a color scene with a monochrome camera equipped with one of these filters, the result is a monochrome image whose intensity is proportional to the response of that filter. Repeating this process with each filter produces three monochrome images that are the RGB component images of the color scene. (In practice, RGB color image sensors usually integrate this process into a single device.) Clearly, displaying these three RGB component images in the form shown in Fig. 6.9(a) would yield an RGB color rendition of the original color scene. ■
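The plane-generation procedure of Example 6.1 reduces to a few lines of NumPy. The sketch below is an illustrative addition, not code from the text: the fixed value 127 and the 256 x 256 plane size follow the example, while the variable names are our own.

    import numpy as np

    # Component images of the cross-sectional plane (127, G, B):
    # G and B each range over 0, 1, ..., 255 across the plane.
    G, B = np.meshgrid(np.arange(256, dtype=np.uint8),
                       np.arange(256, dtype=np.uint8))
    R = np.full_like(G, 127)              # red component fixed at 127

    cross_section = np.dstack((R, G, B))  # 256 x 256 x 3 RGB image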
FIGURE 6.9 (a) Generating the RGB image of the cross-sectional color plane (127, G, B) by feeding the red, green, and blue component images into a color monitor. (b) The three hidden surface planes (R = 0, G = 0, and B = 0) in the color cube of Fig. 6.8.

While high-end display cards and monitors provide a reasonable rendition of the colors in a 24-bit RGB image, many systems in use today are limited to 256 colors. Also, there are numerous applications in which it simply makes no sense to use more than a few hundred, and sometimes fewer, colors. A good example of this is provided by the pseudocolor image processing techniques discussed in Section 6.3. Given the variety of systems in current use, it is of considerable interest to have a subset of colors that are likely to be reproduced faithfully, reasonably independently of viewer hardware capabilities. This subset of colors is called the set of safe RGB colors, or the set of all-systems-safe colors. In Internet applications, they are called safe Web colors or safe browser colors.

On the assumption that 256 colors is the minimum number of colors that can be reproduced faithfully by any system in which a desired result is likely to be displayed, it is useful to have an accepted standard notation to refer to these colors. Forty of these 256 colors are known to be processed differently by various operating systems, leaving only 216 colors that are common to most systems. These 216 colors have become the de facto standard for safe colors, especially in Internet applications. They are used whenever it is desired that the colors viewed by most people appear the same.
TABLE 6.1 Valid values of each RGB component in a safe color.

    Number System    Color Equivalents
    Hex              00    33    66    99    CC    FF
    Decimal          0     51    102   153   204   255

Each of the 216 safe colors is formed from three RGB values as before, but each value can only be 0, 51, 102, 153, 204, or 255. Thus, RGB triplets of these values give us 6^3 = 216 possible values (note that all values are divisible by 3). It is customary to express these values in the hexadecimal number system, as shown in Table 6.1. Recall that hex numbers 0, 1, 2, ..., 9, A, B, C, D, E, F correspond to decimal numbers 0, 1, 2, ..., 9, 10, 11, 12, 13, 14, 15. Recall also that (0)16 = (0000)2 and (F)16 = (1111)2. Thus, for example, (FF)16 = (255)10 = (11111111)2 and we see that a grouping of two hex numbers forms an 8-bit byte.

Since it takes three numbers to form an RGB color, each safe color is formed from three of the two-digit hex numbers in Table 6.1. For example, the purest red is FF0000. The values 000000 and FFFFFF represent black and white, respectively. Keep in mind that the same result is obtained by using the more familiar decimal notation. For instance, the brightest red in decimal notation has R = 255 (FF) and G = B = 0.
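The 216 safe colors can be enumerated directly from Table 6.1. The following Python sketch is an illustrative addition (not from the original text); it generates every safe RGB triplet and its six-digit hex code.

    from itertools import product

    SAFE_VALUES = (0x00, 0x33, 0x66, 0x99, 0xCC, 0xFF)  # the values in Table 6.1

    safe_colors = [f"{r:02X}{g:02X}{b:02X}"
                   for r, g, b in product(SAFE_VALUES, repeat=3)]

    print(len(safe_colors))   # 216
    print(safe_colors[0])     # 000000 (black)
    print(safe_colors[-1])    # FFFFFF (white)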
FIGURE 6.10 (a) The 216 safe RGB colors. (b) All the grays in the 256-color RGB system, from 000000 through FFFFFF in steps of 111111 (grays that are part of the safe color group are shown underlined).

Figure 6.10(a) shows the 216 safe colors, organized in descending RGB values. The square in the top left array has value FFFFFF (white), the second square to its right has value FFFFCC, the third square has value FFFF99, and so on for the first row. The second row of that same array has values FFCCFF, FFCCCC, FFCC99, and so on. The final square of that array has value FF0000 (the brightest possible red). The second array to the right of the one just examined starts with value CCFFFF and proceeds in the same manner, as do the other remaining four arrays. The final (bottom right) square of the last array has value 000000 (black). It is important to note that not all possible 8-bit gray colors are included in the 216 safe colors. Figure 6.10(b) shows the hex codes for all the possible gray colors in a 256-color RGB system. Some of these values are outside of the safe color set but are represented properly (in terms of their relative intensities) by most display systems. The grays from the safe color group, (KKKKKK)16 for K = 0, 3, 6, 9, C, F, are shown underlined in Fig. 6.10(b).
Figure 6.11 shows the RGB safe-color cube. Unlike the full-color cube in Fig. 6.8, which is solid, the cube in Fig. 6.11 has valid colors only on the surface planes. As shown in Fig. 6.10(a), each plane has a total of 36 colors, so the entire surface of the safe-color cube is covered by 216 different colors, as expected.

FIGURE 6.11 The RGB safe-color cube.
6.2.2 The CMY and CMYK Color Models

As indicated in Section 6.1, cyan, magenta, and yellow are the secondary colors of light or, alternatively, the primary colors of pigments. For example, when a surface coated with cyan pigment is illuminated with white light, no red light is reflected from the surface. That is, cyan subtracts red light from reflected white light, which itself is composed of equal amounts of red, green, and blue light.

Most devices that deposit colored pigments on paper, such as color printers and copiers, require CMY data input or perform an RGB to CMY conversion internally. This conversion is performed using the simple operation

    [C]   [1]   [R]
    [M] = [1] - [G]                                            (6.2-1)
    [Y]   [1]   [B]

where, again, the assumption is that all color values have been normalized to the range [0, 1]. Equation (6.2-1) demonstrates that light reflected from a
surface coated with pure cyan does not contain red (that is, C = 1 - R in the
equation). Similarly, pure magenta does not reflect green, and pure yellow
does not reflect blue. Equation (6.2-1) also reveals that RGB values can be
obtained easily from a set of CMY values by subtracting the individual CMY
values from 1. As indicated earlier, in image processing this color model is
used in connection with generating hardcopy output, so the inverse opera-
tion from CMY to RGB generally is of little practical interest.
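Equation (6.2-1) and its inverse reduce to one line each in NumPy. This sketch is an illustrative addition and assumes the image arrays are floating point, already normalized to [0, 1].

    import numpy as np

    def rgb_to_cmy(rgb):
        """Eq. (6.2-1): C = 1 - R, M = 1 - G, Y = 1 - B, values in [0, 1]."""
        return 1.0 - np.asarray(rgb, dtype=np.float64)

    def cmy_to_rgb(cmy):
        """Inverse of Eq. (6.2-1): subtract the CMY values from 1."""
        return 1.0 - np.asarray(cmy, dtype=np.float64)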
According to Fig. 6.4, equal amounts of the pigment primaries, cyan, magenta, and yellow should produce black. In practice, combining these colors for printing produces a muddy-looking black. So, in order to produce true black (which is the predominant color in printing), a fourth color, black, is added, giving rise to the CMYK color model. Thus, when publishers talk about "four-color printing," they are referring to the three colors of the CMY color model plus black.

6.2.3 The HSI Color Model


As we have seen, creating colors in the RGB and CMY models and changing from one model to the other is a straightforward process. As noted earlier, these color systems are ideally suited for hardware implementations. In addition, the RGB system matches nicely with the fact that the human eye is strongly perceptive to red, green, and blue primaries. Unfortunately, the RGB, CMY, and other similar color models are not well suited for describing colors in terms that are practical for human interpretation. For example, one does not refer to the color of an automobile by giving the percentage of each of the primaries composing its color. Furthermore, we do not think of color images as being composed of three primary images that combine to form that single image.
When humans view a color object, we describe it by its hue, saturation, and brightness. Recall from the discussion in Section 6.1 that hue is a color attribute that describes a pure color (pure yellow, orange, or red), whereas saturation gives a measure of the degree to which a pure color is diluted by white light. Brightness is a subjective descriptor that is practically impossible to measure. It embodies the achromatic notion of intensity and is one of the key factors in describing color sensation. We do know that intensity (gray level) is a most useful descriptor of monochromatic images. This quantity definitely is measurable and easily interpretable. The model we are about to present, called the HSI (hue, saturation, intensity) color model, decouples the intensity component from the color-carrying information (hue and saturation) in a color image. As a result, the HSI model is an ideal tool for developing image processing algorithms based on color descriptions that are natural and intuitive to humans, who, after all, are the developers and users of these algorithms. We can summarize by saying that RGB is ideal for image color generation (as in image capture by a color camera or image display on a monitor screen), but its use for color description is much more limited. The material that follows provides an effective way to describe colors in these more intuitive terms.
As discussed in Example 6.1, an RGB color image can be viewed as three monochrome intensity images (representing red, green, and blue), so it should come as no surprise that we should be able to extract intensity from an RGB image. This becomes rather clear if we take the color cube from Fig. 6.7 and stand it on the black (0, 0, 0) vertex, with the white vertex (1, 1, 1) directly above it, as shown in Fig. 6.12(a). As noted in connection with Fig. 6.7, the intensity (gray scale) is along the line joining these two vertices. In the arrangement shown in Fig. 6.12, the line (intensity axis) joining the black and white vertices is vertical. Thus, if we wanted to determine the intensity component of any color point in Fig. 6.12, we would simply pass a plane perpendicular to the intensity axis and containing the color point. The intersection of the plane with the intensity axis would give us a point with intensity value in the range [0, 1]. We also note with a little thought that the saturation (purity) of a color increases as a function of distance from the intensity axis. In fact, the saturation of points on the intensity axis is zero, as evidenced by the fact that all points along this axis are gray.

In order to see how hue can be determined also from a given RGB point, consider Fig. 6.12(b), which shows a plane defined by three points (black, white, and cyan). The fact that the black and white points are contained in the plane tells us that the intensity axis also is contained in the plane. Furthermore, we see that all points contained in the plane segment defined by the intensity axis and the boundaries of the cube have the same hue (cyan in this case). We would arrive at the same conclusion by recalling from Section 6.1 that all colors generated by three colors lie in the triangle defined by those colors. If two of those points are black and white and the third is a color point, all points on the triangle would have the same hue because the black and white components cannot change the hue (of course, the intensity and saturation of points in this triangle would be different). By rotating the shaded plane about the vertical intensity axis, we would obtain different hues. From these concepts we arrive at the conclusion that the hue, saturation, and intensity values required to form the HSI space can be obtained from the RGB color cube. That is, we can convert any RGB point to a corresponding point in the HSI color model by working out the geometrical formulas describing the reasoning outlined in the preceding discussion.

FIGURE 6.12 Conceptual relationships between the RGB and HSI color models: (a) the RGB cube stood on its black vertex with the white vertex directly above; (b) the plane defined by the black, white, and cyan points.
The key point to keep in mind regarding the cube arrangement in Fig. 6.12 and its corresponding HSI color space is that the HSI space is represented by a vertical intensity axis and the locus of color points that lie on planes perpendicular to this axis. As the planes move up and down the intensity axis, the boundaries defined by the intersection of each plane with the faces of the cube have either a triangular or hexagonal shape. This can be visualized much more readily by looking at the cube down its gray-scale axis, as shown in Fig. 6.13(a). In this plane we see that the primary colors are separated by 120°. The secondary colors are 60° from the primaries, which means that the angle between secondaries also is 120°. Figure 6.13(b) shows the same hexagonal shape and an arbitrary color point (shown as a dot). The hue of the point is determined by an angle from some reference point. Usually (but not always) an angle of 0° from the red axis designates 0 hue, and the hue increases counterclockwise from there. The saturation (distance from the vertical axis) is the length of the vector from the origin to the point. Note that the origin is defined by the intersection of the color plane with the vertical intensity axis. The important components of the HSI color space are the vertical intensity axis, the length of the vector to a color point, and the angle this vector makes with the red axis. Therefore, it is not unusual to see the HSI planes defined in terms of the hexagon just discussed, a triangle, or even a circle, as Figs. 6.13(c) and (d) show. The shape chosen does not matter because any one of these shapes can be warped into one of the other two by a geometric transformation. Figure 6.14 shows the HSI model based on color triangles and also on circles.
FIGURE 6.13 Hue and saturation in the HSI color model: (a) the hexagonal cross section seen looking down the gray-scale axis; (b)–(d) the same plane drawn as a hexagon, a triangle, and a circle. The dot is an arbitrary color point. The angle H from the red axis gives the hue, and the length S of the vector is the saturation. The intensity of all colors in any of these planes is given by the position of the plane on the vertical intensity axis.
FIGURE 6.14 The HSI color model based on (a) triangular and (b) circular color planes. The triangles and circles are perpendicular to the vertical intensity axis.
Converting colors from RGB to HSI

(Computations from RGB to HSI and back are carried out on a per-pixel basis. We omit the dependence of the conversion equations on (x, y) for notational clarity.)

Given an image in RGB color format, the H component of each RGB pixel is obtained using the equation

    H = θ           if B ≤ G
    H = 360° - θ    if B > G                                   (6.2-2)

with†

    θ = cos^-1 { (1/2)[(R - G) + (R - B)] / [(R - G)^2 + (R - B)(G - B)]^(1/2) }

† It is good practice to add a small number in the denominator of this expression to avoid dividing by 0 when R = G = B, in which case θ will be 90°. Note that when all RGB components are equal, Eq. (6.2-3) gives S = 0. In addition, the conversion from HSI back to RGB in Eqs. (6.2-5) through (6.2-7) will give R = G = B = I, as expected, because when R = G = B, we are dealing with a gray-scale image.

The saturation component is given by

    S = 1 - [3 / (R + G + B)] min(R, G, B)                     (6.2-3)

Finally, the intensity component is given by

    I = (1/3)(R + G + B)                                       (6.2-4)
It is assumed that the RGB values have been normalized to the range [0, 1]
and that angle u is measured with respect to the red axis of the HSI space, as
indicated in Fig. 6.13. Hue can be normalized to the range [0, 1] by dividing by
360° all values resulting from Eq. (6.2-2). The other two HSI components al-
ready are in this range if the given RGB values are in the interval [0, 1].
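A per-pixel NumPy sketch of Eqs. (6.2-2) through (6.2-4) follows. It is an illustrative addition, not code from the text: the small eps guards the division as suggested in the footnote, the arccos argument is clipped to [-1, 1] for numerical safety, and H is returned already normalized to [0, 1].

    import numpy as np

    def rgb_to_hsi(rgb, eps=1e-8):
        """Eqs. (6.2-2)-(6.2-4). rgb: float array in [0, 1], shape (..., 3).
        Returns H, S, I stacked along the last axis, each in [0, 1]."""
        R, G, B = rgb[..., 0], rgb[..., 1], rgb[..., 2]

        num = 0.5 * ((R - G) + (R - B))
        den = np.sqrt((R - G) ** 2 + (R - B) * (G - B)) + eps  # eps: see footnote
        theta = np.degrees(np.arccos(np.clip(num / den, -1.0, 1.0)))

        H = np.where(B <= G, theta, 360.0 - theta) / 360.0     # Eq. (6.2-2)
        S = 1.0 - 3.0 * np.minimum(np.minimum(R, G), B) / (R + G + B + eps)
        I = (R + G + B) / 3.0
        return np.stack((H, S, I), axis=-1)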
The results in Eqs. (6.2-2) through (6.2-4) can be derived from the geometry shown in Figs. 6.12 and 6.13. The derivation is tedious and would not add significantly to the present discussion. The interested reader can consult the book's references or Web site for a proof of these equations, as well as for the following HSI to RGB conversion results. (Consult the Tutorials section of the book Web site for a detailed derivation of the conversion equations between RGB and HSI, and vice versa.)

Converting colors from HSI to RGB

Given values of HSI in the interval [0, 1], we now want to find the corresponding RGB values in the same range. The applicable equations depend on the values of H. There are three sectors of interest, corresponding to the 120° intervals in the separation of primaries (see Fig. 6.13). We begin by multiplying H by 360°, which returns the hue to its original range of [0°, 360°].
RG sector (0° ≤ H < 120°): When H is in this sector, the RGB components are given by the equations

    B = I(1 - S)                                               (6.2-5)

    R = I[1 + S cos H / cos(60° - H)]                          (6.2-6)

and

    G = 3I - (R + B)                                           (6.2-7)

GB sector (120° ≤ H < 240°): If the given value of H is in this sector, we first subtract 120° from it:

    H = H - 120°                                               (6.2-8)
Then the RGB components are

    R = I(1 - S)                                               (6.2-9)

    G = I[1 + S cos H / cos(60° - H)]                          (6.2-10)

and

    B = 3I - (R + G)                                           (6.2-11)

BR sector (240° ≤ H ≤ 360°): Finally, if H is in this range, we subtract 240° from it:

    H = H - 240°                                               (6.2-12)

Then the RGB components are

    G = I(1 - S)                                               (6.2-13)

    B = I[1 + S cos H / cos(60° - H)]                          (6.2-14)

and

    R = 3I - (G + B)                                           (6.2-15)

Uses of these equations for image processing are discussed in several of the following sections.
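The three-sector conversion can be sketched in NumPy as follows. This is an illustrative addition; the sector masks and helper variable names are our own.

    import numpy as np

    def hsi_to_rgb(hsi):
        """Eqs. (6.2-5)-(6.2-15). hsi: float array in [0, 1], shape (..., 3)."""
        H = hsi[..., 0] * 360.0
        S, I = hsi[..., 1], hsi[..., 2]

        rg = H < 120.0                        # RG sector
        gb = (H >= 120.0) & (H < 240.0)       # GB sector
        # remaining pixels fall in the BR sector

        # Hue measured within the current sector, always in [0, 120]:
        h = np.where(rg, H, np.where(gb, H - 120.0, H - 240.0))

        first = I * (1.0 - S)                               # Eqs. (6.2-5/9/13)
        second = I * (1.0 + S * np.cos(np.radians(h)) /
                            np.cos(np.radians(60.0 - h)))   # Eqs. (6.2-6/10/14)
        third = 3.0 * I - (first + second)                  # Eqs. (6.2-7/11/15)

        R = np.where(rg, second, np.where(gb, first, third))
        G = np.where(rg, third, np.where(gb, second, first))
        B = np.where(rg, first, np.where(gb, third, second))
        return np.stack((R, G, B), axis=-1)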

EXAMPLE 6.2: The HSI values corresponding to the image of the RGB color cube.

■ Figure 6.15 shows the hue, saturation, and intensity images for the RGB values shown in Fig. 6.8. Figure 6.15(a) is the hue image. Its most distinguishing feature is the discontinuity in value along a 45° line in the front (red) plane of the cube. To understand the reason for this discontinuity, refer to Fig. 6.8, draw a line from the red to the white vertices of the cube, and select a point in the middle of this line. Starting at that point, draw a path to the right, following the cube around until you return to the starting point. The major colors encountered in this path are yellow, green, cyan, blue, magenta, and back to red. According to Fig. 6.13, the values of hue along this path should increase from 0° to 360° (i.e., from the lowest to highest possible values of hue). This is precisely what Fig. 6.15(a) shows, because the lowest value is represented as black and the highest value as white in the gray scale. In fact, the hue image was originally normalized to the range [0, 1] and then scaled to 8 bits, that is, converted to the range [0, 255], for display.

The saturation image in Fig. 6.15(b) shows progressively darker values toward the white vertex of the RGB cube, indicating that colors become less and less saturated as they approach white. Finally, every pixel in the intensity image shown in Fig. 6.15(c) is the average of the RGB values at the corresponding pixel in Fig. 6.8. ■

FIGURE 6.15 HSI components of the image in Fig. 6.8. (a) Hue, (b) saturation, and (c) intensity images.

Manipulating HSI component images

In the following discussion, we take a look at some simple techniques for manipulating HSI component images. This will help you develop familiarity with these components and also help you deepen your understanding of the HSI color model. Figure 6.16(a) shows an image composed of the primary and secondary RGB colors. Figures 6.16(b) through (d) show the H, S, and I components of this image, generated using Eqs. (6.2-2) through (6.2-4). Recall from the discussion earlier in this section that the gray-level values in Fig. 6.16(b) correspond to angles; thus, for example, because red corresponds to 0°, the red region in Fig. 6.16(a) is mapped to a black region in the hue image. Similarly, the gray levels in Fig. 6.16(c) correspond to saturation (they were scaled to [0, 255] for display), and the gray levels in Fig. 6.16(d) are average intensities.
C
a b
c d
FIGURE 6.16
(a) RGB image
tu

and the com-


ponents of its
corresponding
HSI image:
(b) hue,
(c) saturation, and
(d) intensity.
V
FIGURE 6.17 (a)–(c) Modified HSI component images. (d) Resulting RGB image. (See Fig. 6.16 for the original HSI images.)
To change the individual color of any region in the RGB image, we change the values of the corresponding region in the hue image of Fig. 6.16(b). Then we convert the new H image, along with the unchanged S and I images, back to RGB using the procedure explained in connection with Eqs. (6.2-5) through (6.2-15). To change the saturation (purity) of the color in any region, we follow the same procedure, except that we make the changes in the saturation image in HSI space. Similar comments apply to changing the average intensity of any region. Of course, these changes can be made simultaneously. For example, the image in Fig. 6.17(a) was obtained by changing to 0 the pixels corresponding to the blue and green regions in Fig. 6.16(b). In Fig. 6.17(b) we reduced by half the saturation of the cyan region in component image S from Fig. 6.16(c). In Fig. 6.17(c) we reduced by half the intensity of the central white region in the intensity image of Fig. 6.16(d). The result of converting this modified HSI image back to RGB is shown in Fig. 6.17(d). As expected, we see in this figure that the outer portions of all circles are now red; the purity of the cyan region was diminished, and the central region became gray rather than white. Although these results are simple, they illustrate clearly the power of the HSI color model in allowing independent control over hue, saturation, and intensity, quantities with which we are quite familiar when describing colors.
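A sketch of the kind of manipulation just described, assuming the rgb_to_hsi and hsi_to_rgb sketches given earlier in this section are in scope (the factors 0.5 and 1.2, and the input image rgb, are arbitrary, illustrative choices):

    import numpy as np

    # rgb: float image in [0, 1]; rgb_to_hsi / hsi_to_rgb from the sketches above.
    hsi = rgb_to_hsi(rgb)
    hsi[..., 1] = hsi[..., 1] * 0.5                     # halve saturation
    hsi[..., 2] = np.clip(hsi[..., 2] * 1.2, 0.0, 1.0)  # brighten intensity
    rgb_modified = hsi_to_rgb(hsi)                      # back to RGB for display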

6.3 Pseudocolor Image Processing


Pseudocolor (also called false color) image processing consists of assigning col-
ors to gray values based on a specified criterion. The term pseudo or false color
is used to differentiate the process of assigning colors to monochrome images

from the processes associated with true color images, a topic discussed starting
in Section 6.4. The principal use of pseudocolor is for human visualization and
interpretation of gray-scale events in an image or sequence of images. As noted
at the beginning of this chapter, one of the principal motivations for using color
is the fact that humans can discern thousands of color shades and intensities,
compared to only two dozen or so shades of gray.

6.3.1 Intensity Slicing

The technique of intensity (sometimes called density) slicing and color coding is one of the simplest examples of pseudocolor image processing. If an image is interpreted as a 3-D function [see Fig. 2.18(a)], the method can be viewed as one of placing planes parallel to the coordinate plane of the image; each plane then "slices" the function in the area of intersection. Figure 6.18 shows an example of using a plane at f(x, y) = li to slice the image function into two levels.

If a different color is assigned to each side of the plane shown in Fig. 6.18, any pixel whose intensity level is above the plane will be coded with one color, and any pixel below the plane will be coded with the other. Levels that lie on the plane itself may be arbitrarily assigned one of the two colors. The result is a two-color image whose relative appearance can be controlled by moving the slicing plane up and down the intensity axis.

In general, the technique may be summarized as follows. Let [0, L - 1] represent the gray scale, let level l0 represent black [f(x, y) = 0], and level lL-1 represent white [f(x, y) = L - 1]. Suppose that P planes perpendicular to the intensity axis are defined at levels l1, l2, ..., lP. Then, assuming that 0 < P < L - 1, the P planes partition the gray scale into P + 1 intervals, V1, V2, ..., VP+1. Intensity to color assignments are made according to the relation

    f(x, y) = ck    if f(x, y) ∈ Vk                            (6.3-1)
FIGURE 6.18 Geometric interpretation of the intensity-slicing technique. The image function f(x, y) is shown over the xy-plane, with an intensity axis running from 0 (black) to L - 1 (white) and a slicing plane at f(x, y) = li.
FIGURE 6.19 An alternative representation of the intensity-slicing technique: a mapping from input intensity levels (0 to L - 1) to color, assigning color c1 to levels below li and c2 to levels above it.

where ck is the color associated with the kth intensity interval Vk defined by the partitioning planes at l = k - 1 and l = k.

The idea of planes is useful primarily for a geometric interpretation of the intensity-slicing technique. Figure 6.19 shows an alternative representation that defines the same mapping as in Fig. 6.18. According to the mapping function shown in Fig. 6.19, any input intensity level is assigned one of two colors, depending on whether it is above or below the value of li. When more levels are used, the mapping function takes on a staircase form.
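Equation (6.3-1) amounts to a lookup table from gray level to color. A minimal NumPy sketch follows; it is an illustrative addition, and the slicing level 128 and the two colors in the usage comment are arbitrary choices.

    import numpy as np

    def intensity_slice(gray, levels, colors):
        """Eq. (6.3-1): map each gray level to the color of its interval Vk.
        levels: sorted slicing levels l1 < ... < lP; colors: P + 1 RGB triplets."""
        colors = np.asarray(colors, dtype=np.uint8)
        idx = np.digitize(gray, levels)   # interval index k for every pixel
        return colors[idx]                # pseudocolor image, shape (..., 3)

    # Two-color example matching Figs. 6.18 and 6.19, with li = 128:
    # pseudo = intensity_slice(gray, [128], [(0, 0, 255), (255, 255, 0)])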

EXAMPLE 6.3: Intensity slicing.

■ A simple, but practical, use of intensity slicing is shown in Fig. 6.20. Figure 6.20(a) is a monochrome image of the Picker Thyroid Phantom (a radiation test pattern), and Fig. 6.20(b) is the result of intensity slicing this image into eight color regions. Regions that appear of constant intensity in the monochrome image are really quite variable, as shown by the various colors in the sliced image. The left lobe, for instance, is a dull gray in the monochrome image, and picking out variations in intensity is difficult. By contrast, the color image clearly shows eight different regions of constant intensity, one for each of the colors used. ■

FIGURE 6.20 (a) Monochrome image of the Picker Thyroid Phantom. (b) Result of density slicing into eight colors. (Courtesy of Dr. J. L. Blankenship, Instrumentation and Controls Division, Oak Ridge National Laboratory.)

In the preceding simple example, the gray scale was divided into intervals and a different color was assigned to each region, without regard for the meaning of the gray levels in the image. Interest in that case was simply to view the different gray levels constituting the image. Intensity slicing assumes a much more meaningful and useful role when subdivision of the gray scale is based on physical characteristics of the image. For instance, Fig. 6.21(a) shows an X-ray image of a weld (the horizontal dark region) containing several cracks and porosities (the bright white streaks running horizontally through the middle of the image). It is known that when there is a porosity or crack in a weld, the full strength of the X-rays going through the object saturates the imaging sensor on the other side of the object. Thus, intensity values of 255 in an 8-bit image coming from such a system automatically imply a problem with the weld. If a human were to be the ultimate judge of the analysis, and manual processes were employed to inspect welds (still a common procedure today), a simple color coding that assigns one color to level 255 and another to all other intensity levels would simplify the inspector's job considerably. Figure 6.21(b) shows the result. No explanation is required to arrive at the conclusion that human error rates would be lower if images were displayed in the form of Fig. 6.21(b), instead of the form shown in Fig. 6.21(a). In other words, if the exact intensity value or range of values one is looking for is known, intensity slicing is a simple but powerful aid in visualization, especially if numerous images are involved. The following is a more complex example.

FIGURE 6.21 (a) Monochrome X-ray image of a weld. (b) Result of color coding. (Original image courtesy of X-TEK Systems, Ltd.)

EXAMPLE 6.4: Use of color to highlight rainfall levels.

■ Measurement of rainfall levels, especially in the tropical regions of the Earth, is of interest in diverse applications dealing with the environment. Accurate measurements using ground-based sensors are difficult and expensive to acquire, and total rainfall figures are even more difficult to obtain because a significant portion of precipitation occurs over the ocean. One approach for obtaining rainfall figures is to use a satellite. The TRMM (Tropical Rainfall Measuring Mission) satellite utilizes, among others, three sensors specially designed to detect rain: a precipitation radar, a microwave imager, and a visible and infrared scanner (see Sections 1.3 and 2.3 regarding image sensing modalities).

The results from the various rain sensors are processed, resulting in estimates of average rainfall over a given time period in the area monitored by the sensors. From these estimates, it is not difficult to generate gray-scale images whose intensity values correspond directly to rainfall, with each pixel representing a physical land area whose size depends on the resolution of the sensors. Such an intensity image is shown in Fig. 6.22(a), where the area monitored by the satellite is the slightly lighter horizontal band in the middle one-third of the picture (these are the tropical regions). In this particular example, the rainfall values are average monthly values (in inches) over a three-year period.

Visual examination of this picture for rainfall patterns is quite difficult, if not impossible. However, suppose that we code intensity levels from 0 to 255 using the colors shown in Fig. 6.22(b). Values toward the blues signify low values of rainfall, with the opposite being true for red. Note that the scale tops out at pure red for values of rainfall greater than 20 inches. Figure 6.22(c) shows the result of color coding the gray image with the color map just discussed. The results are much easier to interpret, as shown in this figure and in the zoomed area of Fig. 6.22(d). In addition to providing global coverage, this type of data allows meteorologists to calibrate ground-based rain monitoring systems with greater precision than ever before. ■

6.3.2 Intensity to Color Transformations


Other types of transformations are more general and thus are capable of
achieving a wider range of pseudocolor enhancement results than the simple
slicing technique discussed in the preceding section. An approach that is partic-
ularly attractive is shown in Fig. 6.23. Basically, the idea underlying this ap-
proach is to perform three independent transformations on the intensity of any
input pixel. The three results are then fed separately into the red, green, and
blue channels of a color television monitor. This method produces a composite
image whose color content is modulated by the nature of the transformation
FIGURE 6.22 (a) Gray-scale image in which intensity (in the lighter horizontal band shown) corresponds to average monthly rainfall. (b) Colors assigned to intensity values. (c) Color-coded image. (d) Zoom of the South American region. (Courtesy of NASA.)

FIGURE 6.23 Functional block diagram for pseudocolor image processing. The input f(x, y) is passed through red, green, and blue transformations to produce fR(x, y), fG(x, y), and fB(x, y), which are fed into the corresponding red, green, and blue inputs of an RGB color monitor.

functions. Note that these are transformations on the intensity values of an image and are not functions of position.

The method discussed in the previous section is a special case of the technique just described. There, piecewise linear functions of the intensity levels (Fig. 6.19) are used to generate colors. The method discussed in this section, on the other hand, can be based on smooth, nonlinear functions, which, as might be expected, gives the technique considerable flexibility.
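One way to realize the scheme of Fig. 6.23 with sinusoidal transformations like those described in Example 6.5 below is sketched here. This is an illustrative addition: the frequency and phase values in the usage comment are arbitrary, not those used to produce Fig. 6.24.

    import numpy as np

    def sinusoidal_pseudocolor(gray, freq, phases):
        """Fig. 6.23 scheme with sinusoidal transformations (cf. Fig. 6.25).
        gray: uint8 image; freq: cycles over the gray scale; phases: (pR, pG, pB)."""
        t = gray.astype(np.float64) / 255.0
        channels = [0.5 * (1.0 + np.sin(2.0 * np.pi * freq * t + p))
                    for p in phases]
        return (np.dstack(channels) * 255).astype(np.uint8)

    # Equal phases give a monochrome result; small offsets introduce color:
    # out = sinusoidal_pseudocolor(gray, freq=2.0, phases=(0.0, 0.6, 1.2))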

EXAMPLE 6.5: Use of pseudocolor for highlighting explosives contained in luggage.

■ Figure 6.24(a) shows two monochrome images of luggage obtained from an airport X-ray scanning system. The image on the left contains ordinary articles. The image on the right contains the same articles, as well as a block of simulated plastic explosives. The purpose of this example is to illustrate the use of intensity level to color transformations to obtain various degrees of enhancement.

Figure 6.25 shows the transformation functions used. These sinusoidal functions contain regions of relatively constant value around the peaks as well as regions that change rapidly near the valleys. Changing the phase and frequency of each sinusoid can emphasize (in color) ranges in the gray scale. For instance, if all three transformations have the same phase and frequency, the output image will be monochrome. A small change in the phase between the three transformations produces little change in pixels whose intensities correspond to peaks in the sinusoids, especially if the sinusoids have broad profiles (low frequencies). Pixels with intensity values in the steep section of the sinusoids are assigned a much stronger color content as a result of significant differences between the amplitudes of the three sinusoids caused by the phase displacement between them.

FIGURE 6.24 Pseudocolor enhancement by using the gray level to color transformations in Fig. 6.25. (Original image courtesy of Dr. Mike Hurwitz, Westinghouse.)

a
L1 b
FIGURE 6.25
Red Transformation
functions used to
obtain the images
L1 in Fig. 6.24.

Green

L1

ud
Blue

Intensity
0 L1
Explosive Garment Background
bag

L1 lo
Red

L1
C
Green

L1
tu

Blue

Intensity
0 L1
Explosive Garment Background
bag
V

The image shown in Fig. 6.24(b) was obtained with the transformation
functions in Fig. 6.25(a), which shows the gray-level bands corresponding to
the explosive, garment bag, and background, respectively. Note that the ex-
plosive and background have quite different intensity levels, but they were
both coded with approximately the same color as a result of the periodicity of
the sine waves. The image shown in Fig. 6.24(c) was obtained with the trans-
formation functions in Fig. 6.25(b). In this case the explosives and garment
bag intensity bands were mapped by similar transformations and thus re-
ceived essentially the same color assignments. Note that this mapping allows
an observer to “see” through the explosives. The background mappings were
about the same as those used for Fig. 6.24(b), producing almost identical color
assignments. ■
FIGURE 6.26 A pseudocolor coding approach used when several monochrome images are available. Each input image fk(x, y) is passed through its own transformation Tk to produce gk(x, y); additional processing of the gk(x, y) then yields the three images hR(x, y), hG(x, y), and hB(x, y) that drive the RGB display channels.
The approach shown in Fig. 6.23 is based on a single monochrome image.
Often, it is of interest to combine several monochrome images into a single
color composite, as shown in Fig. 6.26. A frequent use of this approach (illus-
trated in Example 6.6) is in multispectral image processing, where different
sensors produce individual monochrome images, each in a different spectral
band. The types of additional processes shown in Fig. 6.26 can be techniques
such as color balancing (see Section 6.5.4), combining images, and selecting
lo
the three images for display based on knowledge about response characteris-
tics of the sensors used to generate the images.

EXAMPLE 6.6: Color coding of multispectral images.

■ Figures 6.27(a) through (d) show four spectral satellite images of Washington, D.C., including part of the Potomac River. The first three images are in the visible red, green, and blue, and the fourth is in the near infrared (see Table 1.1 and Fig. 1.10). Figure 6.27(e) is the full-color image obtained by combining the first three images into an RGB image. Full-color images of dense areas are difficult to interpret, but one notable feature of this image is the difference in color in various parts of the Potomac River. Figure 6.27(f) is a little more interesting. This image was formed by replacing the red component of Fig. 6.27(e) with the near-infrared image. From Table 1.1, we know that this band is strongly responsive to the biomass components of a scene. Figure 6.27(f) shows quite clearly the difference between biomass (in red) and the human-made features in the scene, composed primarily of concrete and asphalt, which appear bluish in the image.
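This band-replacement operation amounts to channel stacking. A minimal sketch, assuming four registered bands held as equal-sized NumPy arrays (random data stands in here for the actual imagery):

```python
import numpy as np

def rgb_composite(band_r, band_g, band_b):
    """Stack three registered monochrome bands into an RGB image,
    as in Fig. 6.27(e)."""
    return np.dstack([band_r, band_g, band_b])

# Assume visible red, green, blue, and near-infrared bands are available;
# random data is used below purely as a placeholder.
shape = (512, 512)
red, green, blue, nir = (np.random.randint(0, 256, shape, dtype=np.uint8)
                         for _ in range(4))

true_color = rgb_composite(red, green, blue)    # analogous to Fig. 6.27(e)
false_color = rgb_composite(nir, green, blue)   # NIR in the red channel, as in Fig. 6.27(f)
```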
The type of processing just illustrated is quite powerful in helping visualize events of interest in complex images, especially when those events are beyond our normal sensing capabilities. Figure 6.28 is an excellent illustration of this.
These are images of the Jupiter moon Io, shown in pseudocolor by combining
several of the sensor images from the Galileo spacecraft, some of which are in
spectral regions not visible to the eye. However, by understanding the physical
and chemical processes likely to affect sensor response, it is possible to combine
the sensed images into a meaningful pseudocolor map. One way to combine the
sensed image data is by how they show either differences in surface chemical
composition or changes in the way the surface reflects sunlight. For example, in
the pseudocolor image in Fig. 6.28(b), bright red depicts material newly ejected

FIGURE 6.27 (a)–(d) Images in bands 1–4 in Fig. 1.10 (see Table 1.1). (e) Color composite image obtained by treating (a), (b), and (c) as the red, green, and blue components of an RGB image. (f) Image obtained in the same manner, but using in the red channel the near-infrared image in (d). (Original multispectral images courtesy of NASA.)

FIGURE 6.28 (a) Pseudocolor rendition of the Jupiter moon Io. (b) A close-up. (Courtesy of NASA.)

from an active volcano on Io, and the surrounding yellow materials are older sulfur deposits. This image conveys these characteristics much more readily than would be possible by analyzing the component images individually. ■

6.4 Basics of Full-Color Image Processing


In this section, we begin the study of processing techniques applicable to full-
color images. Although they are far from being exhaustive, the techniques de-
veloped in the sections that follow are illustrative of how full-color images are
handled for a variety of image processing tasks. Full-color image processing
approaches fall into two major categories. In the first category, we process
each component image individually and then form a composite processed
color image from the individually processed components. In the second category,
we work with color pixels directly. Because full-color images have at least
7 Wavelets and Multiresolution Processing

In this chapter, we examine wavelet-based transformations from a multiresolution point of view. Although such transformations can be presented in other
ways, this approach simplifies both their mathematical and physical interpreta-
tions. We begin with an overview of imaging techniques that influenced the for-
mulation of multiresolution theory. Our objective is to introduce the theory’s
fundamental concepts within the context of image processing and simultane-
ously provide a brief historical perspective of the method and its application.
The bulk of the chapter is focused on the development and use of the discrete
wavelet transform. To demonstrate the usefulness of the transform, examples
ranging from image coding to noise removal and edge detection are provided.
In the next chapter, wavelets will be used for image compression, an application in which they have received considerable attention.

7.1 Background
When we look at images, generally we see connected regions of similar texture
and intensity levels that combine to form objects. If the objects are small in
size or low in contrast, we normally examine them at high resolutions; if they
are large in size or high in contrast, a coarse view is all that is required. If both
small and large objects—or low- and high-contrast objects—are present simul-
taneously, it can be advantageous to study them at several resolutions. This, of
course, is the fundamental motivation for multiresolution processing.
Local histograms are histograms of the pixels in a neighborhood (see Section 3.3.3).

From a mathematical viewpoint, images are two-dimensional arrays of intensity values with locally varying statistics that result from different combinations of abrupt features like edges and contrasting homogeneous regions. As illustrated in Fig. 7.1—an image that will be examined repeatedly in the remainder of the section—local histograms can vary significantly from one part of an image to another, making statistical modeling over the span of an entire image a difficult, or impossible, task.

FIGURE 7.1 An image and its local histogram variations.

7.1.1 Image Pyramids


A powerful, yet conceptually simple structure for representing images at more than one resolution is the image pyramid (Burt and Adelson [1983]). Originally devised for machine vision and image compression applications, an image pyramid is a collection of decreasing resolution images arranged in the shape of a pyramid. As can be seen in Fig. 7.2(a), the base of the pyramid contains a high-resolution representation of the image being processed; the apex contains a low-resolution approximation. As you move up the pyramid, both size and resolution decrease. Base level J is of size 2^J × 2^J or N × N, where J = log₂ N; apex level 0 is of size 1 × 1; and general level j is of size 2^j × 2^j, where 0 ≤ j ≤ J. Although the pyramid shown in Fig. 7.2(a) is composed of J + 1 resolution levels from 2^J × 2^J to 2⁰ × 2⁰, most image pyramids are truncated to P + 1 levels, where 1 ≤ P ≤ J and j = J − P, …, J − 2, J − 1, J. That is, we normally limit ourselves to P reduced resolution approximations of the original image; a 1 × 1 (i.e., single pixel) approximation of a 512 × 512 image, for example, is of little value. The total number of pixels in a P + 1 level pyramid for P > 0 is

$$N^2 \left(1 + \frac{1}{4^1} + \frac{1}{4^2} + \cdots + \frac{1}{4^P}\right) \leq \frac{4}{3}N^2$$
C
Figure 7.2(b) shows a simple system for constructing two intimately related
image pyramids. The Level j - 1 approximation output provides the images

FIGURE 7.2 (a) An image pyramid: level 0 (the apex) is of size 1 × 1, level 1 is 2 × 2, level 2 is 4 × 4, and so on, down to level J − 1 (N/2 × N/2) and level J (the base) of size N × N. (b) A simple system for creating approximation and prediction residual pyramids: an approximation filter followed by a downsampler (2↓, rows and columns) produces the level j − 1 approximation; an upsampler (2↑, rows and columns) followed by an interpolation filter produces a prediction, which is subtracted from the level j input image to give the level j prediction residual.

needed to build an approximation pyramid (as described in the preceding paragraph), while the Level j prediction residual output is used to build a complementary prediction residual pyramid. Unlike approximation pyramids, prediction residual pyramids contain only one reduced-resolution approximation of the input image (at the top of the pyramid, level J − P). All other levels contain prediction residuals, where the level j prediction residual (for J − P + 1 ≤ j ≤ J) is defined as the difference between the level j approximation (the input to the block diagram) and an estimate of the level j approximation based on the level j − 1 approximation (the approximation output in the block diagram). In general, a prediction residual can be defined as the difference between an image and a predicted version of the image. As will be seen in Section 8.2.9, prediction residuals can often be coded more efficiently than 2-D intensity arrays.

As Fig. 7.2(b) suggests, both approximation and prediction residual pyramids are computed in an iterative fashion. Before the first iteration, the image to be represented in pyramidal form is placed in level J of the approximation pyramid. The following three-step procedure is then executed P times—for j = J, J − 1, …, and J − P + 1 (in that order):

Step 1. Compute a reduced-resolution approximation of the Level j input image [the input on the left side of the block diagram in Fig. 7.2(b)]. This is done by filtering and downsampling the filtered result by a factor of 2. Both of these operations are described in the next paragraph. Place the resulting approximation at level j − 1 of the approximation pyramid.

Step 2. Create an estimate of the Level j input image from the reduced-resolution approximation generated in step 1. This is done by upsampling and filtering (see the next paragraph) the generated approximation. The resulting prediction image will have the same dimensions as the Level j input image.

Step 3. Compute the difference between the prediction image of step 2 and the input to step 1. Place this result in level j of the prediction residual pyramid.
At the conclusion of P iterations (i.e., following the iteration in which j = J − P + 1), the level J − P approximation output is placed in the prediction residual pyramid at level J − P. If a prediction residual pyramid is not needed, this operation—along with steps 2 and 3 and the upsampler, interpolation filter, and summer of Fig. 7.2(b)—can be omitted.
A variety of approximation and interpolation filters can be incorporated into the system of Fig. 7.2(b). Typically, the filtering is performed in the spatial domain (see Section 3.4). Useful approximation filtering techniques include neighborhood averaging (see Section 3.5.1), which produces mean pyramids; lowpass Gaussian filtering (see Sections 4.7.4 and 4.8.3), which produces Gaussian pyramids; and no filtering, which results in subsampling pyramids. Any of the interpolation methods described in Section 2.4.4, including nearest neighbor, bilinear, and bicubic, can be incorporated into the interpolation filter. Finally, we note that the upsampling and downsampling blocks of Fig. 7.2(b) are used to double and halve the spatial dimensions of the approximation and prediction images that are computed. Given an integer variable n and a 1-D sequence of samples f(n), the upsampled sequence f2↑(n) is defined as

(In this chapter, we will be working with both continuous and discrete functions and variables. With the notable exception of 2-D image f(x, y), and unless otherwise noted, x, y, z, … are continuous variables; i, j, k, l, m, n, … are discrete variables.)

$$f_{2\uparrow}(n) = \begin{cases} f(n/2) & \text{if } n \text{ is even} \\ 0 & \text{otherwise} \end{cases} \tag{7.1-1}$$

where, as is indicated by the subscript, the upsampling is by a factor of 2. The complementary operation of downsampling by 2 is defined as

$$f_{2\downarrow}(n) = f(2n) \tag{7.1-2}$$

Upsampling can be thought of as inserting a 0 after every sample in a sequence; downsampling can be viewed as discarding every other sample. The upsampling and downsampling blocks in Fig. 7.2(b), which are labeled 2↑ and 2↓, respectively, are annotated to indicate that both the rows and columns of the 2-D inputs on which they operate are to be up- and downsampled. Like the separable 2-D DFT in Section 4.11.1, 2-D upsampling and downsampling can be performed by successive passes of the 1-D operations defined in Eqs. (7.1-1) and (7.1-2).
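Eqs. (7.1-1) and (7.1-2) translate directly into array indexing. A minimal 1-D sketch using NumPy slicing; the 2-D versions follow by applying these along rows and then columns:

```python
import numpy as np

def upsample2(f):
    """Eq. (7.1-1): insert a zero after every sample of the 1-D sequence f."""
    out = np.zeros(2 * len(f), dtype=f.dtype)
    out[::2] = f                  # even indices n carry f(n/2); odd indices stay 0
    return out

def downsample2(f):
    """Eq. (7.1-2): keep every other sample of the 1-D sequence f."""
    return f[::2]

f = np.array([1, 2, 3, 4])
print(upsample2(f))               # [1 0 2 0 3 0 4 0]
print(downsample2(upsample2(f)))  # [1 2 3 4] -- downsampling undoes upsampling
```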

EXAMPLE 7.1: Approximation and prediction residual pyramids.

■ Figure 7.3 shows both an approximation pyramid and a prediction residual pyramid for the vase of Fig. 7.1. A lowpass Gaussian smoothing filter (see Section 4.7.4) was used to produce the four-level approximation pyramid in Fig. 7.3(a). As you can see, the resulting pyramid contains the original 512 × 512 resolution image (at its base) and three low-resolution approximations (of resolution 256 × 256, 128 × 128, and 64 × 64). Thus, P is 3 and levels 9, 8, 7, and 6 out of a possible log₂(512) + 1 = 10 levels are present. Note the reduction in detail that accompanies the lower resolutions of the pyramid. The level 6 (i.e., 64 × 64) approximation image is suitable for locating the window stiles (i.e., the window pane framing), for example, but not for finding the stems of the plant. In general, the lower-resolution levels of a pyramid can be used for the analysis of large structures or overall image context; the high-resolution images are appropriate for analyzing individual object characteristics. Such a coarse-to-fine analysis strategy is particularly useful in pattern recognition.

A bilinear interpolation filter was used to produce the prediction residual pyramid in Fig. 7.3(b). In the absence of quantization error, the resulting prediction residual pyramid can be used to generate the complementary approximation pyramid in Fig. 7.3(a), including the original image, without error. To do so, we begin with the level 6 (64 × 64) approximation image (the only approximation image in the prediction residual pyramid), predict the level 7 (128 × 128) resolution approximation (by upsampling and filtering), and add the level 7 prediction residual. This process is repeated using successively computed approximation images until the original 512 × 512 image is generated. Note that the prediction residual histogram in Fig. 7.3(b) is highly peaked around zero; the approximation histogram in Fig. 7.3(a) is not. Unlike approximation images, prediction residual images can be highly compressed by assigning fewer bits to the more probable values (see the variable-length codes of Section 8.2.1). Finally, we note that the prediction residuals in Fig. 7.3(b) are scaled to make small prediction errors more visible; the prediction residual histogram, however, is based on the original residual values, with level 128 representing zero error. ■
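The three-step procedure and its inversion can be sketched compactly. The version below assumes a square image whose sides are divisible by 2^P, and uses 2 × 2 neighborhood averaging as the approximation filter (a mean pyramid) with nearest-neighbor interpolation, rather than the Gaussian and bilinear filters used for Fig. 7.3:

```python
import numpy as np

def reduce2(img):
    """Step 1: approximation filter (2x2 neighborhood averaging) + downsampling."""
    return img.reshape(img.shape[0] // 2, 2, img.shape[1] // 2, 2).mean(axis=(1, 3))

def expand2(img):
    """Step 2 helper: upsampling + nearest-neighbor interpolation filter."""
    return np.repeat(np.repeat(img, 2, axis=0), 2, axis=1)

def build_pyramids(image, P):
    """Return the approximation and prediction residual pyramids
    (finest level first) for a 2^J x 2^J input image."""
    approx = [image.astype(np.float64)]
    for _ in range(P):                        # executed P times, j = J, ..., J-P+1
        approx.append(reduce2(approx[-1]))
    residual = [a - expand2(coarser)          # Step 3: level-j prediction residual
                for a, coarser in zip(approx[:-1], approx[1:])]
    residual.append(approx[-1])               # level J-P approximation tops the pyramid
    return approx, residual

def reconstruct(residual):
    """Invert the residual pyramid: predict, then add the stored residual."""
    img = residual[-1]
    for r in reversed(residual[:-1]):
        img = expand2(img) + r
    return img

image = np.random.rand(512, 512)
approx, residual = build_pyramids(image, P=3)
assert np.allclose(reconstruct(residual), image)   # error-free reconstruction
```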

FIGURE 7.3 Two image pyramids and their histograms: (a) an approximation pyramid; (b) a prediction residual pyramid. The approximation pyramid in (a) is called a Gaussian pyramid because a Gaussian filter was used to construct it. The prediction residual pyramid in (b) is often called a Laplacian pyramid; note the similarity in appearance with the Laplacian filtered images in Chapter 3.
7.1.2 Subband Coding

Another important imaging technique with ties to multiresolution analysis is subband coding. In subband coding, an image is decomposed into a set of bandlimited components, called subbands. The decomposition is performed so that the subbands can be reassembled to reconstruct the original image without error. Because the decomposition and reconstruction are performed by means of digital filters, we begin our discussion with a brief introduction to digital signal processing (DSP) and digital signal filtering.
The term “delay” implies a time-based input sequence and reflects the fact that in digital signal filtering, the input is usually a sampled analog signal.

Consider the simple digital filter in Fig. 7.4(a) and note that it is constructed from three basic components—unit delays, multipliers, and adders. Along the top of the filter, unit delays are connected in series to create K − 1 delayed (i.e., right shifted) versions of the input sequence f(n). Delayed sequence f(n − 2), for example, is

$$f(n-2) = \begin{cases} \;\vdots \\ f(0) & \text{for } n = 2 \\ f(1) & \text{for } n = 2 + 1 = 3 \\ \;\vdots \end{cases}$$
As the grayed annotations in Fig. 7.4(a) indicate, input sequence f(n) = f(n − 0) and the K − 1 delayed sequences at the outputs of the unit delays, denoted f(n − 1), f(n − 2), …, f(n − K + 1), are multiplied by constants h(0), h(1), …, h(K − 1), respectively, and summed to produce the filtered output sequence

$$\hat{f}(n) = \sum_{k=-\infty}^{\infty} h(k)\, f(n-k) = f(n) \star h(n) \tag{7.1-3}$$

where ★ denotes convolution. If the coefficients of the filter in Fig. 7.4(a) are indexed using values of n between 0 and K − 1 (as we have done), the limits on the sum in Eq. (7.1-3) can be reduced to 0 to K − 1 [like Eq. (4.4-10)]. Note that—except for a change in variables—Eq. (7.1-3) is equivalent to the discrete convolution defined in Eq. (4.4-10) of Chapter 4. The K multiplication constants in Fig. 7.4(a) and Eq. (7.1-3) are

FIGURE 7.4 (a) A digital filter built from unit delays, multipliers h(0), h(1), …, h(K − 1), and adders; (b) a unit discrete impulse sequence f(n) = δ(n); and (c) the impulse response h(n) of the filter.

called filter coefficients. Each coefficient defines a filter tap, which can be
thought of as the components needed to compute one term of the sum in Eq.
(7.1-3), and the filter is said to be of order K.
If the input to the filter of Fig. 7.4(a) is the unit discrete impulse of
Fig. 7.4(b) and Section 4.2.3, Eq. (7.1-3) becomes
$$\hat{f}(n) = \sum_{k=-\infty}^{\infty} h(k)\, \delta(n-k) = h(n) \tag{7.1-4}$$

That is, by substituting δ(n) for input f(n) in Eq. (7.1-3) and making use of the sifting property of the unit discrete impulse as defined in Eq. (4.2-13), we find that the impulse response of the filter in Fig. 7.4(a) is the K-element sequence of filter coefficients that define the filter. Physically, the unit impulse is shifted from left to right across the top of the filter (from one unit delay to the next), producing an output that assumes the value of the coefficient at the location of the delayed impulse. Because there are K coefficients, the impulse response is of length K and the filter is called a finite impulse response (FIR) filter.
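In code, Eq. (7.1-3) is an ordinary discrete convolution, and the impulse-response property of Eq. (7.1-4) can be checked directly; the 4-tap coefficients below are arbitrary:

```python
import numpy as np

h = np.array([0.5, 0.25, 0.15, 0.10])      # K = 4 filter coefficients (arbitrary)
f = np.array([1.0, 2.0, 3.0, 4.0, 5.0])    # input sequence f(n)

f_hat = np.convolve(f, h)                  # Eq. (7.1-3): f(n) convolved with h(n)

delta = np.zeros(8)
delta[0] = 1.0                             # unit discrete impulse
response = np.convolve(delta, h)[:len(h)]  # Eq. (7.1-4): output equals h(n)
assert np.allclose(response, h)            # the impulse response is the K taps
```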
In the remainder of the chapter, “filter h(n)” will be used to refer to the filter whose impulse response is h(n).

Figure 7.5 shows the impulse responses of six functionally related filters. Filter h2(n) in Fig. 7.5(b) is a sign-reversed (i.e., reflected about the horizontal axis) version of h1(n) in Fig. 7.5(a). That is,

h2(n) = -h1(n) (7.1-5)


FIGURE 7.5 Six functionally related filter impulse responses: (a) reference response h1(n); (b) sign reversal; (c) and (d) order reversal (differing by the delay introduced); (e) modulation; and (f) order reversal and modulation.

Filters h3(n) and h4(n) in Figs. 7.5(c) and (d) are order-reversed versions of h1(n) (order reversal is often called time reversal when the input sequence is a sampled analog signal):

$$h_3(n) = h_1(-n) \tag{7.1-6}$$
$$h_4(n) = h_1(K - 1 - n) \tag{7.1-7}$$

Filter h3(n) is a reflection of h1(n) about the vertical axis; filter h4(n) is a reflected and translated (i.e., shifted) version of h1(n). Neglecting translation, the responses of the two filters are identical. Filter h5(n) in Fig. 7.5(e), which is defined as

$$h_5(n) = (-1)^n h_1(n) \tag{7.1-8}$$

is called a modulated version of h1(n). Because modulation changes the signs of all odd-indexed coefficients [i.e., the coefficients for which n is odd in Fig. 7.5(e)], h5(1) = −h1(1) and h5(3) = −h1(3), while h5(0) = h1(0) and h5(2) = h1(2). Finally, the sequence shown in Fig. 7.5(f) is an order-reversed version of h1(n) that is also modulated:

$$h_6(n) = (-1)^n h_1(K - 1 - n) \tag{7.1-9}$$

This sequence is included to illustrate the fact that sign reversal, order reversal, and modulation are sometimes combined in the specification of the relationship between two filters.
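These relationships reduce to sign flips, slice reversals, and multiplication by (−1)^n; a small NumPy sketch [Eq. (7.1-6), whose support runs over negative indices, is omitted]:

```python
import numpy as np

def relatives(h1):
    """Return the sign-reversed, order-reversed, modulated, and combined
    versions of h1(n) from Eqs. (7.1-5), (7.1-7), (7.1-8), and (7.1-9)."""
    n = np.arange(len(h1))
    h2 = -h1                        # Eq. (7.1-5): sign reversal
    h4 = h1[::-1]                   # Eq. (7.1-7): order reversal, K-1-n form
    h5 = (-1.0) ** n * h1           # Eq. (7.1-8): modulation
    h6 = (-1.0) ** n * h1[::-1]     # Eq. (7.1-9): order reversal and modulation
    return h2, h4, h5, h6

h1 = np.array([1.0, 2.0, 3.0, 4.0])
h2, h4, h5, h6 = relatives(h1)
print(h5)                           # [ 1. -2.  3. -4.]
```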

With this brief introduction to digital signal filtering, consider the two-band subband coding and decoding system in Fig. 7.6(a). As indicated in the figure, the system is composed of two filter banks (a filter bank is a collection of two or more filters), each containing two FIR filters of the type shown in Fig. 7.4(a). Note that each of the four FIR filters is depicted
FIGURE 7.6 (a) A two-band subband coding and decoding system, and (b) its spectrum splitting properties. In (a), analysis filters h0(n) and h1(n) followed by downsamplers (2↓) produce subbands flp(n) and fhp(n); upsamplers (2↑) followed by synthesis filters g0(n) and g1(n) recombine them into f̂(n). In (b), the idealized transfer characteristics H0(ω) and H1(ω) split the interval [0, π] into low and high bands at ω = π/2.

as a single block in Fig. 7.6(a), with the impulse response of each filter (and the convolution symbol) written inside it. The analysis filter bank, which includes filters h0(n) and h1(n), is used to break input sequence f(n) into two half-length sequences flp(n) and fhp(n), the subbands that represent the input. Note that filters h0(n) and h1(n) are half-band filters whose idealized transfer characteristics, H0 and H1, are shown in Fig. 7.6(b). Filter h0(n) is a lowpass filter whose output, subband flp(n), is called an approximation of f(n); filter h1(n) is a highpass filter whose output, subband fhp(n), is called the high frequency or detail part of f(n). Synthesis bank filters g0(n) and g1(n) combine flp(n) and fhp(n) to produce f̂(n). The goal in subband coding is to select h0(n), h1(n), g0(n), and g1(n) so that f̂(n) = f(n). That is, so that the input and output of the subband coding and decoding system are identical. When this is accomplished, the resulting system is said to employ perfect reconstruction filters.

There are many two-band, real-coefficient, FIR, perfect reconstruction filter banks described in the filter bank literature. (By real-coefficient, we mean that the filter coefficients are real, not complex, numbers.) In all of them, the synthesis filters are modulated versions of the analysis filters—with one (and only one) synthesis filter being sign reversed as well. For perfect reconstruction, the impulse responses of the synthesis and analysis filters must be related in one of the following two ways:
$$g_0(n) = (-1)^n h_1(n), \qquad g_1(n) = (-1)^{n+1} h_0(n) \tag{7.1-10}$$

or

$$g_0(n) = (-1)^{n+1} h_1(n), \qquad g_1(n) = (-1)^n h_0(n) \tag{7.1-11}$$

Equations (7.1-10) through (7.1-14) are described in detail in the filter bank literature (see, for example, Vetterli and Kovacevic [1995]).
Filters h0(n), h1(n), g0(n), and g1(n) in Eqs. (7.1-10) and (7.1-11) are said to be cross-modulated because diagonally opposed filters in the block diagram of Fig. 7.6(a) are related by modulation [and sign reversal when the modulation factor is −(−1)^n or (−1)^(n+1)]. Moreover, they can be shown to satisfy the following biorthogonality condition:

$$\langle h_i(2n-k),\, g_j(k)\rangle = \delta(i-j)\,\delta(n), \qquad i, j = \{0, 1\} \tag{7.1-12}$$


Here, ⟨hi(2n − k), gj(k)⟩ denotes the inner product of hi(2n − k) and gj(k).† When i is not equal to j, the inner product is 0; when i and j are equal, the product is the unit discrete impulse function, δ(n). Biorthogonality will be considered again in Section 7.2.1.
Of special interest in subband coding—and in the development of the fast wavelet transform of Section 7.4—are filters that move beyond biorthogonality and require

† The vector inner product of sequences f1(n) and f2(n) is ⟨f1, f2⟩ = Σₙ f1*(n) f2(n), where * denotes the complex conjugate operation. If f1(n) and f2(n) are real, ⟨f1, f2⟩ = ⟨f2, f1⟩.
$$\langle g_i(n),\, g_j(n+2m)\rangle = \delta(i-j)\,\delta(m), \qquad i, j = \{0, 1\} \tag{7.1-13}$$

which defines orthonormality for perfect reconstruction filter banks. In addition to Eq. (7.1-13), orthonormal filters can be shown to satisfy the following two conditions:

$$g_1(n) = (-1)^n g_0(K_{\text{even}} - 1 - n)$$
$$h_i(n) = g_i(K_{\text{even}} - 1 - n), \qquad i = \{0, 1\} \tag{7.1-14}$$

where the subscript on Keven is used to indicate that the number of filter coefficients must be divisible by 2 (i.e., an even number). As Eq. (7.1-14) indicates,
synthesis filter g1 is related to g0 by order reversal and modulation. In addi-
tion, both h0 and h1 are order-reversed versions of synthesis filters, g0 and g1,
respectively. Thus, an orthonormal filter bank can be developed around the
impulse response of a single filter, called the prototype; the remaining filters
can be computed from the specified prototype’s impulse response. For
biorthogonal filter banks, two prototypes are required; the remaining filters
can be computed via Eq. (7.1-10) or (7.1-11). The generation of useful proto-
lo
type filters, whether orthonormal or biorthogonal, is beyond the scope of this
chapter. We simply use filters that have been presented in the literature and
provide references for further study.
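A sketch of the prototype-based construction for the orthonormal case, implementing Eq. (7.1-14) for an even-length prototype g0(n):

```python
import numpy as np

def orthonormal_bank(g0):
    """Given an even-length prototype g0(n), return (h0, h1, g0, g1) per
    Eq. (7.1-14): g1 by order reversal and modulation of g0; h0 and h1
    by order reversal of g0 and g1, respectively."""
    K = len(g0)
    assert K % 2 == 0, "Keven: the number of coefficients must be even"
    n = np.arange(K)
    g1 = (-1.0) ** n * g0[::-1]   # g1(n) = (-1)^n g0(Keven - 1 - n)
    h0 = g0[::-1]                 # h0(n) = g0(Keven - 1 - n)
    h1 = g1[::-1]                 # h1(n) = g1(Keven - 1 - n)
    return h0, h1, g0, g1

g0 = np.array([1.0, 1.0]) / np.sqrt(2)   # Haar prototype, as a simple test
h0, h1, g0, g1 = orthonormal_bank(g0)
print(np.dot(g0, g1))                    # 0.0 -- Eq. (7.1-13) with i != j, m = 0
```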
Before concluding the section with a 2-D subband coding example, we note that 1-D orthonormal and biorthogonal filters can be used as 2-D separable filters for the processing of images. As can be seen in Fig. 7.7, the separable filters are first applied in one dimension (e.g., vertically) and then in the other (e.g., horizontally) in the manner introduced in Section 2.6.7. Moreover, downsampling is performed in two stages—once before the second filtering operation to reduce the overall number of computations. The resulting filtered
FIGURE 7.7 A two-dimensional, four-band filter bank for subband image coding. The input f(m, n) is filtered along its rows by h0(m) or h1(m) and downsampled; each result is then filtered along its columns by h0(n) or h1(n) and downsampled again, producing the a(m, n), dV(m, n), dH(m, n), and dD(m, n) subbands.

outputs, denoted a(m, n), dV(m, n), dH(m, n), and dD(m, n) in Fig. 7.7, are called the approximation, vertical detail, horizontal detail, and diagonal detail subbands of the input image, respectively. These subbands can be split into four smaller subbands, which can be split again, and so on—a property that will be described in greater detail in Section 7.4.
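A minimal sketch of the four-band split of Fig. 7.7, assuming periodic (circular) boundary handling and using the 2-tap Haar half-band pair as a stand-in for the filters discussed next:

```python
import numpy as np

def circ_filter_down(x, h, axis):
    """Circularly filter x with 1-D filter h along `axis`, then downsample
    that axis by 2 (Eq. 7.1-2)."""
    x = np.moveaxis(x, axis, -1)
    ext = np.concatenate([x, x[..., :len(h) - 1]], axis=-1)   # wrap for circular conv.
    y = np.apply_along_axis(lambda r: np.convolve(r, h, 'valid'), -1, ext)
    return np.moveaxis(y[..., ::2], -1, axis)

def four_band_split(f, h0, h1):
    """Fig. 7.7: filter/downsample rows first, then columns."""
    lo = circ_filter_down(f, h0, axis=0)    # rows through h0(m)
    hi = circ_filter_down(f, h1, axis=0)    # rows through h1(m)
    a  = circ_filter_down(lo, h0, axis=1)   # approximation
    dV = circ_filter_down(lo, h1, axis=1)   # vertical detail
    dH = circ_filter_down(hi, h0, axis=1)   # horizontal detail
    dD = circ_filter_down(hi, h1, axis=1)   # diagonal detail
    return a, dV, dH, dD

h0 = np.array([1.0, 1.0]) / np.sqrt(2)      # Haar lowpass half-band filter
h1 = np.array([-1.0, 1.0]) / np.sqrt(2)     # Haar highpass counterpart
f = np.random.rand(8, 8)
a, dV, dH, dD = four_band_split(f, h0, h1)
print(a.shape)                              # (4, 4) -- each subband is half-size
```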

EXAMPLE 7.2: A four-band subband coding of the vase in Fig. 7.1.

■ Figure 7.8 shows the impulse responses of four 8-tap orthonormal filters. The coefficients of prototype synthesis filter g0(n) for 0 ≤ n ≤ 7 [in Fig. 7.8(c)] are defined in Table 7.1 (Daubechies [1992]). The coefficients of the remaining orthonormal filters can be computed using Eq. (7.1-14). With the help of Fig. 7.5, note (by visual inspection) the cross modulation of the analysis and synthesis filters in Fig. 7.8. It is relatively easy to show numerically that the filters are

TABLE 7.1 Daubechies 8-tap orthonormal filter coefficients for g0(n) (Daubechies [1992]).

n    g0(n)
0     0.23037781
1     0.71484657
2     0.63088076
3    -0.02798376
4    -0.18703481
5     0.03084138
6     0.03288301
7    -0.01059740
C
FIGURE 7.8 The impulse responses of four 8-tap Daubechies orthonormal filters: (a) h0(n); (b) h1(n); (c) g0(n); (d) g1(n). See Table 7.1 for the values of g0(n) for 0 ≤ n ≤ 7.

FIGURE 7.9 A four-band split of the vase in Fig. 7.1 using the subband coding system of Fig. 7.7. The four subbands that result are the (a) approximation, (b) horizontal detail, (c) vertical detail, and (d) diagonal detail subbands.
both biorthogonal (they satisfy Eq. 7.1-12) and orthonormal (they satisfy Eq. 7.1-13). As a result, the Daubechies 8-tap filters in Fig. 7.8 support error-free reconstruction of the decomposed input.

A four-band split of the 512 × 512 image of a vase in Fig. 7.1, based on the filters in Fig. 7.8, is shown in Fig. 7.9. Each quadrant of this image is a subband of size 256 × 256. Beginning with the upper-left corner and proceeding in a clockwise manner, the four quadrants contain approximation subband a, horizontal detail subband dH, diagonal detail subband dD, and vertical detail subband dV, respectively. All subbands, except the approximation subband in Fig. 7.9(a), have been scaled to make their underlying structure more visible. Note the visual effects of aliasing that are present in Figs. 7.9(b) and (c)—the dH and dV subbands (see Section 4.5.4 for more on aliasing). The wavy lines in the window area are due to the downsampling of a barely discernible window screen in Fig. 7.1. Despite the aliasing, the original image can be reconstructed from the subbands in Fig. 7.9 without error. The required synthesis filters, g0(n) and g1(n), are determined from Table 7.1 and Eq. (7.1-14), and incorporated into a filter bank that roughly mirrors the system in Fig. 7.7. In the new filter bank, filters hi(n) for i = {0, 1} are replaced by their gi(n) counterparts, and upsamplers and summers are added. ■
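The numerical check mentioned in this example can be sketched as follows; the helper computes the inner product on the left side of Eq. (7.1-13), with zero padding outside the filters' support:

```python
import numpy as np

g0 = np.array([0.23037781, 0.71484657, 0.63088076, -0.02798376,
               -0.18703481, 0.03084138, 0.03288301, -0.01059740])  # Table 7.1
n = np.arange(len(g0))
g1 = (-1.0) ** n * g0[::-1]          # Eq. (7.1-14): order reversal + modulation
h0, h1 = g0[::-1], g1[::-1]          # analysis filters, also via Eq. (7.1-14)

def inner(a, b, shift):
    """<a(n), b(n + shift)>, summing only where both sequences are defined."""
    total = 0.0
    for k in range(len(a)):
        if 0 <= k + shift < len(b):
            total += a[k] * b[k + shift]
    return total

# Orthonormality, Eq. (7.1-13): <g_i(n), g_j(n + 2m)> = delta(i-j) delta(m)
for i, gi in enumerate((g0, g1)):
    for j, gj in enumerate((g0, g1)):
        for m in range(-3, 4):
            expected = 1.0 if (i == j and m == 0) else 0.0
            assert abs(inner(gi, gj, 2 * m) - expected) < 1e-6
print("Daubechies 8-tap bank is orthonormal")
```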


7.1.3 The Haar Transform


The third and final imaging-related operation with ties to multiresolution
analysis that we will look at is the Haar transform (Haar [1910]). Within
the context of this chapter, its importance stems from the fact that its basis
functions (defined below) are the oldest and simplest known orthonormal
wavelets. They will be used in a number of examples in the sections that
follow.
With reference to the discussion in Section 2.6.7, the Haar transform can be
expressed in the following matrix form

$$\mathbf{T} = \mathbf{H}\mathbf{F}\mathbf{H}^T \tag{7.1-15}$$

where F is an N × N image matrix, H is an N × N Haar transformation matrix, and T is the resulting N × N transform. The transpose is required because H is not symmetric; in Eq. (2.6-38) of Section 2.6.7, the transformation matrix is assumed to be symmetric. For the Haar transform, H contains the Haar basis functions, hk(z). They are defined over the continuous, closed interval z ∈ [0, 1] for k = 0, 1, 2, …, N − 1, where N = 2^n. To generate H, we define the integer k such that k = 2^p + q − 1, where 0 ≤ p ≤ n − 1, q = 0 or 1 for p = 0, and 1 ≤ q ≤ 2^p for p ≠ 0. Then the Haar basis functions are

$$h_0(z) = h_{00}(z) = \frac{1}{\sqrt{N}}, \qquad z \in [0, 1] \tag{7.1-16}$$
C
and

$$h_k(z) = h_{pq}(z) = \frac{1}{\sqrt{N}} \begin{cases} 2^{p/2} & (q-1)/2^p \leq z < (q-0.5)/2^p \\ -2^{p/2} & (q-0.5)/2^p \leq z < q/2^p \\ 0 & \text{otherwise, } z \in [0, 1] \end{cases} \tag{7.1-17}$$

The ith row of an N × N Haar transformation matrix contains the elements of hi(z) for z = 0/N, 1/N, 2/N, …, (N − 1)/N. For instance, if N = 2, the first row of the 2 × 2 Haar matrix is computed using h0(z) with z = 0/2, 1/2. From Eq. (7.1-16), h0(z) is equal to 1/√2, independent of z, so the first row of H2 has two identical 1/√2 elements. The second row is obtained by computing h1(z) for z = 0/2, 1/2. Because k = 2^p + q − 1, when k = 1, p = 0 and q = 1. Thus, from Eq. (7.1-17), h1(0) = 2⁰/√2 = 1/√2, h1(1/2) = −2⁰/√2 = −1/√2, and the 2 × 2 Haar matrix is

$$\mathbf{H}_2 = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} \tag{7.1-18}$$

If N = 4, then k, q, and p assume the values

k   p   q
0   0   0
1   0   1
2   1   1
3   1   2

and the 4 × 4 transformation matrix, H4, is

$$\mathbf{H}_4 = \frac{1}{\sqrt{4}} \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & -1 & -1 \\ \sqrt{2} & -\sqrt{2} & 0 & 0 \\ 0 & 0 & \sqrt{2} & -\sqrt{2} \end{bmatrix} \tag{7.1-19}$$

Our principal interest in the Haar transform is that the rows of H2 can be used to define the analysis filters, h0(n) and h1(n), of a 2-tap perfect reconstruction filter bank (see the previous section), as well as the scaling and wavelet vectors (defined in Sections 7.2.2 and 7.2.3, respectively) of the simplest and oldest wavelet transform (see Example 7.10 in Section 7.4). Rather than concluding the section with the computation of a Haar transform, we close with an example that illustrates the influence of the decomposition methods that have been considered to this point on the methods that will be developed in the remainder of the chapter.
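A sketch that builds the Haar matrix directly from Eqs. (7.1-16) and (7.1-17) and applies Eq. (7.1-15); N is assumed to be a power of 2:

```python
import numpy as np

def haar_matrix(N):
    """Build the N x N Haar transformation matrix of Eqs. (7.1-16)-(7.1-17).
    Row i holds h_i(z) sampled at z = 0/N, 1/N, ..., (N-1)/N."""
    H = np.zeros((N, N))
    z = np.arange(N) / N
    H[0, :] = 1.0 / np.sqrt(N)                  # h_0(z), Eq. (7.1-16)
    for k in range(1, N):
        p = int(np.floor(np.log2(k)))           # from k = 2^p + q - 1
        q = k - 2 ** p + 1
        lo, mid, hi = (q - 1) / 2 ** p, (q - 0.5) / 2 ** p, q / 2 ** p
        H[k, (z >= lo) & (z < mid)] = 2 ** (p / 2) / np.sqrt(N)
        H[k, (z >= mid) & (z < hi)] = -(2 ** (p / 2)) / np.sqrt(N)
    return H

H4 = haar_matrix(4)                  # matches Eq. (7.1-19)
F = np.random.rand(4, 4)
T = H4 @ F @ H4.T                    # Eq. (7.1-15): the Haar transform of F
assert np.allclose(H4 @ H4.T, np.eye(4))   # rows of H are orthonormal
```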

EXAMPLE 7.3: Haar functions in a discrete wavelet transform.

■ Figure 7.10(a) shows a decomposition of the 512 × 512 image in Fig. 7.1 that combines the key features of pyramid coding, subband coding, and the Haar transform (the three techniques we have discussed so far). Called the discrete wavelet transform (and developed later in the chapter), the representation is characterized by the following important features:

1. With the exception of the subimage in the upper-left corner of Fig. 7.10(a), the local histograms are very similar. Many of the pixels are close to zero. Because the subimages (except for the subimage in the upper-left corner) have been scaled to make their underlying structure more visible, the displayed histograms are peaked at intensity 128 (the zeroes have been scaled to mid-gray). The large number of zeroes in the decomposition makes the image an excellent candidate for compression (see Chapter 8).
makes the image an excellent candidate for compression (see Chapter 8).
2. In a manner that is similar to the way in which the levels of the prediction residual pyramid of Fig. 7.3(b) were used to create approximation images of differing resolutions, the subimages in Fig. 7.10(a) can be used to construct both coarse and fine resolution approximations of the original vase image in Fig. 7.1. Figures 7.10(b) through (d), which are of size 64 × 64, 128 × 128, and 256 × 256, respectively, were generated from the subimages in Fig. 7.10(a). A perfect 512 × 512 reconstruction of the original image is also possible.

FIGURE 7.10 (a) A discrete wavelet transform using Haar H2 basis functions. Its local histogram variations are also shown. (b)–(d) Several different approximations (64 × 64, 128 × 128, and 256 × 256) that can be obtained from (a).
3. Like the subband coding decomposition in Fig. 7.9, a simple real-coefficient, FIR filter bank of the form given in Fig. 7.7 was used to produce Fig. 7.10(a). After the generation of a four-subband image like that of Fig. 7.9, the 256 × 256 approximation subband was decomposed and replaced by four 128 × 128 subbands (using the same filter bank), and the resulting approximation subband was again decomposed and replaced by four 64 × 64 subbands. This process produced the unique arrangement of subimages that characterizes discrete wavelet transforms. The subimages in Fig. 7.10(a) become smaller in size as you move from the lower-right-hand to upper-left-hand corner of the image.
4. Figure 7.10(a) is not the Haar transform of the image in Fig. 7.1. Although the filter bank coefficients that were used to produce this decomposition were taken from Haar transformation matrix H2, a variety of orthonormal and biorthogonal filter bank coefficients can be used in discrete wavelet transforms.

5. As will be shown in Section 7.4, each subimage in Fig. 7.10(a) represents a specific band of spatial frequencies in the original image. In addition, many of the subimages demonstrate directional sensitivity [e.g., the subimage in the upper-right corner of Fig. 7.10(a) captures horizontal edge information in the original image].

Considering this impressive list of features, it is remarkable that the discrete wavelet transform of Fig. 7.10(a) was generated using two 2-tap digital filters with a total of four filter coefficients. ■

7.2 Multiresolution Expansions


The previous section introduced three well-known imaging techniques that
play an important role in a mathematical framework called multiresolution
analysis (MRA). In MRA, a scaling function is used to create a series of ap-
proximations of a function or image, each differing by a factor of 2 in resolu-
tion from its nearest neighboring approximations. Additional functions, called
C
wavelets, are then used to encode the difference in information between adja-
cent approximations.

7.2.1 Series Expansions

A signal or function f(x) can often be better analyzed as a linear combination of expansion functions

$$f(x) = \sum_k a_k\, w_k(x) \tag{7.2-1}$$

where k is an integer index of a finite or infinite sum, the ak are real-valued expansion coefficients, and the wk(x) are real-valued expansion functions. If the expansion is unique—that is, there is only one set of ak for any given f(x)—the wk(x) are called basis functions, and the expansion set, {wk(x)}, is called a basis for the class of functions that can be so expressed. The expressible functions form a function space that is referred to as the closed span of the expansion set, denoted

$$V = \operatorname{Span}_{k} \{w_k(x)\} \tag{7.2-2}$$

To say that f(x) ∈ V means that f(x) is in the closed span of {wk(x)} and can be written in the form of Eq. (7.2-1).


For any function space V and corresponding expansion set {wk(x)}, there is a set of dual functions denoted {w̃k(x)} that can be used to compute the ak coefficients of Eq. (7.2-1) for any f(x) ∈ V. These coefficients are computed by taking the integral inner products† of the dual w̃k(x) and function f(x). That is,

$$a_k = \langle \tilde{w}_k(x),\, f(x)\rangle = \int \tilde{w}_k^{*}(x)\, f(x)\, dx \tag{7.2-3}$$

where the * denotes the complex conjugate operation. Depending on the orthogonality of the expansion set, this computation assumes one of three possible forms. Problem 7.10 at the end of the chapter illustrates the three cases using vectors in two-dimensional Euclidean space.

Case 1: If the expansion functions form an orthonormal basis for V, meaning that

$$\langle w_j(x),\, w_k(x)\rangle = \delta_{jk} = \begin{cases} 0 & j \neq k \\ 1 & j = k \end{cases} \tag{7.2-4}$$

the basis and its dual are equivalent. That is, w̃k(x) = wk(x), and Eq. (7.2-3) becomes

$$a_k = \langle w_k(x),\, f(x)\rangle \tag{7.2-5}$$

The ak are computed as the inner products of the basis functions and f(x).
Case 2: If the expansion functions are not orthonormal, but are an orthogonal basis for V, then

$$\langle w_j(x),\, w_k(x)\rangle = 0 \qquad j \neq k \tag{7.2-6}$$

and the basis functions and their duals are called biorthogonal. The ak are computed using Eq. (7.2-3), and the biorthogonal basis and its dual are such that

$$\langle w_j(x),\, \tilde{w}_k(x)\rangle = \delta_{jk} = \begin{cases} 0 & j \neq k \\ 1 & j = k \end{cases} \tag{7.2-7}$$
Case 3: If the expansion set is not a basis for V, but supports the expansion defined in Eq. (7.2-1), it is a spanning set in which there is more than one set of ak for any f(x) ∈ V. The expansion functions and their duals are said to be overcomplete or redundant. They form a frame in which‡

$$A\|f(x)\|^2 \leq \sum_k \left|\langle w_k(x),\, f(x)\rangle\right|^2 \leq B\|f(x)\|^2 \tag{7.2-8}$$

† The integral inner product of two real or complex-valued functions f(x) and g(x) is ⟨f(x), g(x)⟩ = ∫ f*(x) g(x) dx. If f(x) is real, f*(x) = f(x) and ⟨f(x), g(x)⟩ = ∫ f(x) g(x) dx.
‡ The norm of f(x), denoted ‖f(x)‖, is defined as the square root of the absolute value of the inner product of f(x) with itself.


for some A > 0, B < ∞, and all f(x) ∈ V. Dividing this equation by the norm squared of f(x), we see that A and B “frame” the normalized inner products of the expansion coefficients and the function. Equations similar to (7.2-3) and (7.2-5) can be used to find the expansion coefficients for frames. If A = B, the expansion set is called a tight frame and it can be shown that (Daubechies [1992])

$$f(x) = \frac{1}{A} \sum_k \langle w_k(x),\, f(x)\rangle\, w_k(x) \tag{7.2-9}$$

Except for the A⁻¹ term, which is a measure of the frame's redundancy, this is identical to the expression obtained by substituting Eq. (7.2-5) (for orthonormal bases) into Eq. (7.2-1).
7.2.2 Scaling Functions

Consider the set of expansion functions composed of integer translations and binary scalings of the real, square-integrable function w(x); this is the set {wj,k(x)}, where

$$w_{j,k}(x) = 2^{j/2}\, w(2^j x - k) \tag{7.2-10}$$

for all j, k ∈ Z and w(x) ∈ L²(R).† Here, k determines the position of wj,k(x) along the x-axis, and j determines the width of wj,k(x)—that is, how broad or narrow it is along the x-axis. The term 2^(j/2) controls the amplitude of the function. Because the shape of wj,k(x) changes with j, w(x) is called a scaling function.
By choosing w(x) properly, {wj,k(x)} can be made to span L²(R), which is the set of all measurable, square-integrable functions. If we restrict j in Eq. (7.2-10) to a specific value, say j = j₀, the resulting expansion set, {wj₀,k(x)}, is a subset of {wj,k(x)} that spans a subspace of L²(R). Using the notation of the previous section, we can define that subspace as
$$V_{j_0} = \operatorname{Span}_{k} \{w_{j_0,k}(x)\} \tag{7.2-11}$$

That is, Vj₀ is the span of wj₀,k(x) over k. If f(x) ∈ Vj₀, we can write

$$f(x) = \sum_k a_k\, w_{j_0,k}(x) \tag{7.2-12}$$
More generally, we will denote the subspace spanned over k for any j as

$$V_j = \operatorname{Span}_{k} \{w_{j,k}(x)\} \tag{7.2-13}$$

As will be seen in the following example, increasing j increases the size of Vj,
allowing functions with smaller variations or finer detail to be included in the
subspace. This is a consequence of the fact that, as j increases, the wj, k(x) that
are used to represent the subspace functions become narrower and separated
by smaller changes in x.

† The notation L²(R), where R is the set of real numbers, denotes the set of measurable, square-integrable, one-dimensional functions; Z is the set of integers.


EXAMPLE 7.4: The Haar scaling function.

■ Consider the unit-height, unit-width scaling function (Haar [1910])

$$w(x) = \begin{cases} 1 & 0 \leq x < 1 \\ 0 & \text{otherwise} \end{cases} \tag{7.2-14}$$

Figures 7.11(a) through (d) show four of the many expansion functions that can be generated by substituting this pulse-shaped scaling function into Eq. (7.2-10). Note that the expansion functions for j = 1 in Figs. 7.11(c) and (d) are half as wide as those for j = 0 in Figs. 7.11(a) and (b). For a given interval on x, we can define twice as many V1 scaling functions as V0 scaling functions (e.g., w1,0 and w1,1 of V1 versus w0,0 of V0 for the interval 0 ≤ x < 1).

Figure 7.11(e) shows a member of subspace V1. This function does not belong to V0, because the V0 expansion functions in 7.11(a) and (b) are too coarse to represent it. Higher-resolution functions like those in 7.11(c) and (d)

FIGURE 7.11 Some Haar scaling functions: (a) w0,0(x) = w(x); (b) w0,1(x) = w(x − 1); (c) w1,0(x) = √2 w(2x); (d) w1,1(x) = √2 w(2x − 1); (e) a function f(x) ∈ V1; (f) w0,0(x) ∈ V1 expressed as the sum w1,0(x)/√2 + w1,1(x)/√2.


are required. They can be used, as shown in (e), to represent the function by the three-term expansion

$$f(x) = 0.5\,w_{1,0}(x) + w_{1,1}(x) - 0.25\,w_{1,4}(x)$$

To conclude the example, Fig. 7.11(f) illustrates the decomposition of w0,0(x) as a sum of V1 expansion functions. In a similar manner, any V0 expansion function can be decomposed using

$$w_{0,k}(x) = \frac{1}{\sqrt{2}}\, w_{1,2k}(x) + \frac{1}{\sqrt{2}}\, w_{1,2k+1}(x)$$

Thus, if f(x) is an element of V0, it is also an element of V1. This is because all V0 expansion functions are contained in V1. Mathematically, we write that V0 is a subspace of V1, denoted V0 ⊂ V1. ■
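The expansions in this example are easy to verify numerically; a minimal sketch over a dense sample grid (w here is the Haar scaling function of Eq. 7.2-14):

```python
import numpy as np

def w(x):
    """Haar scaling function, Eq. (7.2-14)."""
    return np.where((x >= 0) & (x < 1), 1.0, 0.0)

def w_jk(x, j, k):
    """Scaled and translated expansion function, Eq. (7.2-10)."""
    return 2 ** (j / 2) * w(2 ** j * x - k)

x = np.linspace(0, 3, 3000, endpoint=False)

# The three-term V1 expansion of the function in Fig. 7.11(e):
f = 0.5 * w_jk(x, 1, 0) + w_jk(x, 1, 1) - 0.25 * w_jk(x, 1, 4)

# Fig. 7.11(f): w_{0,0} decomposes into half-width V1 functions.
lhs = w_jk(x, 0, 0)
rhs = (w_jk(x, 1, 0) + w_jk(x, 1, 1)) / np.sqrt(2)
assert np.allclose(lhs, rhs)
```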

The simple scaling function in the preceding example obeys the four funda-
mental requirements of multiresolution analysis (Mallat [1989a]):
MRA Requirement 1: The scaling function is orthogonal to its integer translates.
This is easy to see in the case of the Haar function, because whenever it has a
value of 1, its integer translates are 0, so that the product of the two is 0. The
Haar scaling function is said to have compact support, which means that it is
0 everywhere outside a finite interval called the support. In fact, the width of
the support is 1; it is 0 outside the half open interval [0, 1). It should be noted
that the requirement for orthogonal integer translates becomes harder to
C
satisfy as the width of support of the scaling function becomes larger than 1.
MRA Requirement 2: The subspaces spanned by the scaling function at low
scales are nested within those spanned at higher scales.
As can be seen in Fig. 7.12, subspaces containing high-resolution functions must also contain all lower resolution functions. That is,

$$V_{-\infty} \subset \cdots \subset V_{-1} \subset V_0 \subset V_1 \subset V_2 \subset \cdots \subset V_{\infty} \tag{7.2-15}$$

Moreover, the subspaces satisfy the intuitive condition that if f(x) ∈ Vj, then f(2x) ∈ Vj+1. The fact that the Haar scaling function meets this requirement
FIGURE 7.12 The nested function spaces spanned by a scaling function: V0 ⊂ V1 ⊂ V2 ⊂ ⋯.


should not be taken to indicate that any function with a support width of 1
automatically satisfies the condition. It is left as an exercise for the reader
to show that the equally simple function

$$w(x) = \begin{cases} 1 & 0.25 \leq x < 0.75 \\ 0 & \text{elsewhere} \end{cases}$$

is not a valid scaling function for a multiresolution analysis (see Problem 7.11).
MRA Requirement 3: The only function that is common to all Vj is f(x) = 0.

If we consider the coarsest possible expansion functions (i.e., j = −∞), the only representable function is the function of no information. That is,

$$V_{-\infty} = \{0\} \tag{7.2-16}$$

MRA Requirement 4: Any function can be represented with arbitrary precision.

Though it may not be possible to expand a particular f(x) at an arbitrarily coarse resolution, as was the case for the function in Fig. 7.11(e), all measurable, square-integrable functions can be represented by the scaling functions in the limit as j → ∞. That is,

$$V_{\infty} = \{L^2(\mathbf{R})\} \tag{7.2-17}$$

Under these conditions, the expansion functions of subspace Vj can be expressed as a weighted sum of the expansion functions of subspace Vj+1. Using Eq. (7.2-12), we let

$$w_{j,k}(x) = \sum_n a_n\, w_{j+1,n}(x)$$

where the index of summation has been changed to n for clarity. Substituting for wj+1,n(x) from Eq. (7.2-10) and changing variable an to hw(n), this becomes (the an are changed to hw(n) because they are used later, in Section 7.4, as filter bank coefficients)

$$w_{j,k}(x) = \sum_n h_w(n)\, 2^{(j+1)/2}\, w(2^{j+1}x - n)$$

Because w(x) = w0,0(x), both j and k can be set to 0 to obtain the simpler nonsubscripted expression

$$w(x) = \sum_n h_w(n)\, \sqrt{2}\, w(2x - n) \tag{7.2-18}$$

The hw(n) coefficients in this recursive equation are called scaling function co-
efficients; hw is referred to as a scaling vector. Equation (7.2-18) is fundamental
to multiresolution analysis and is called the refinement equation, the MRA
equation, or the dilation equation. It states that the expansion functions of any
subspace can be built from double-resolution copies of themselves—that is,
from expansion functions of the next higher resolution space. The choice of a
reference subspace, V0, is arbitrary.


EXAMPLE 7.5: Haar scaling function coefficients.

■ The scaling function coefficients for the Haar function of Eq. (7.2-14) are hw(0) = hw(1) = 1/√2, the first row of matrix H2 in Eq. (7.1-18). Thus, Eq. (7.2-18) yields

$$w(x) = \frac{1}{\sqrt{2}} \left[\sqrt{2}\, w(2x)\right] + \frac{1}{\sqrt{2}} \left[\sqrt{2}\, w(2x-1)\right]$$

This decomposition was illustrated graphically for w0,0(x) in Fig. 7.11(f), where the bracketed terms of the preceding expression are seen to be w1,0(x) and w1,1(x). Additional simplification yields w(x) = w(2x) + w(2x − 1). ■

7.2.3 Wavelet Functions

Given a scaling function that meets the MRA requirements of the previous section, we can define a wavelet function c(x) that, together with its integer translates and binary scalings, spans the difference between any two adjacent scaling subspaces, Vj and Vj+1. The situation is illustrated graphically in Fig. 7.13. We define the set {cj,k(x)} of wavelets

$$c_{j,k}(x) = 2^{j/2}\, c(2^j x - k) \tag{7.2-19}$$

for all k ∈ Z that span the Wj spaces in the figure. As with scaling functions, we write

$$W_j = \operatorname{Span}_{k} \{c_{j,k}(x)\} \tag{7.2-20}$$

and note that if f(x) ∈ Wj,

$$f(x) = \sum_k a_k\, c_{j,k}(x) \tag{7.2-21}$$
The scaling and wavelet function subspaces in Fig. 7.13 are related by

$$V_{j+1} = V_j \oplus W_j \tag{7.2-22}$$

where ⊕ denotes the union of spaces (like the union of sets). The orthogonal complement of Vj in Vj+1 is Wj, and all members of Vj are orthogonal to the members of Wj. Thus,

$$\langle w_{j,k}(x),\, c_{j,l}(x)\rangle = 0 \tag{7.2-23}$$

for all appropriate j, k, l ∈ Z.
FIGURE 7.13 The relationship between scaling and wavelet function spaces: V1 = V0 ⊕ W0 and V2 = V1 ⊕ W1 = V0 ⊕ W0 ⊕ W1.


We can now express the space of all measurable, square-integrable functions as

$$L^2(\mathbf{R}) = V_0 \oplus W_0 \oplus W_1 \oplus \cdots \tag{7.2-24}$$

or

$$L^2(\mathbf{R}) = V_1 \oplus W_1 \oplus W_2 \oplus \cdots \tag{7.2-25}$$

or even

$$L^2(\mathbf{R}) = \cdots \oplus W_{-2} \oplus W_{-1} \oplus W_0 \oplus W_1 \oplus W_2 \oplus \cdots \tag{7.2-26}$$

which eliminates the scaling function, and represents a function in terms of wavelets alone [i.e., there are only wavelet function spaces in Eq. (7.2-26)]. Note that if f(x) is an element of V1, but not V0, an expansion using Eq. (7.2-24) contains an approximation of f(x) using V0 scaling functions. Wavelets from W0 would encode the difference between this approximation and the actual function. Equations (7.2-24) through (7.2-26) can be generalized to yield

$$L^2(\mathbf{R}) = V_{j_0} \oplus W_{j_0} \oplus W_{j_0+1} \oplus \cdots \tag{7.2-27}$$

where j₀ is an arbitrary starting scale.

Since wavelet spaces reside within the spaces spanned by the next higher resolution scaling functions (see Fig. 7.13), any wavelet function—like its scaling function counterpart of Eq. (7.2-18)—can be expressed as a weighted sum of shifted, double-resolution scaling functions. That is, we can write

$$c(x) = \sum_n h_c(n)\, \sqrt{2}\, w(2x - n) \tag{7.2-28}$$

where the hc(n) are called the wavelet function coefficients and hc is the wavelet vector. Using the condition that wavelets span the orthogonal complement spaces in Fig. 7.13 and that integer wavelet translates are orthogonal, it can be shown that hc(n) is related to hw(n) by (see, for example, Burrus, Gopinath, and Guo [1998])

$$h_c(n) = (-1)^n h_w(1 - n) \tag{7.2-29}$$

Note the similarity of this result and Eq. (7.1-14), the relationship governing
the impulse responses of orthonormal subband coding and decoding filters.
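Eq. (7.2-29) can be evaluated directly for a finitely supported scaling vector; a minimal sketch that reproduces the Haar case worked in the example that follows:

```python
import numpy as np

def hc(n, hw):
    """Eq. (7.2-29) for a scaling vector hw supported on 0, ..., len(hw)-1;
    values of hw outside that support are taken as zero."""
    m = 1 - n
    return ((-1) ** n) * (hw[m] if 0 <= m < len(hw) else 0.0)

hw = [1 / np.sqrt(2), 1 / np.sqrt(2)]   # Haar scaling vector
print([hc(n, hw) for n in (0, 1)])      # [0.7071..., -0.7071...]
```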

EXAMPLE 7.6: The Haar wavelet function coefficients.

■ In the previous example, the Haar scaling vector was defined as hw(0) = hw(1) = 1/√2. Using Eq. (7.2-29), the corresponding wavelet vector is hc(0) = (−1)⁰hw(1 − 0) = 1/√2 and hc(1) = (−1)¹hw(1 − 1) = −1/√2. Note that these coefficients correspond to the second row of matrix H2 in Eq. (7.1-18). Substituting these values into Eq. (7.2-28), we get


c(x) = w(2x) − w(2x − 1), which is plotted in Fig. 7.14(a). Thus, the Haar wavelet function is

$$c(x) = \begin{cases} 1 & 0 \leq x < 0.5 \\ -1 & 0.5 \leq x < 1 \\ 0 & \text{elsewhere} \end{cases} \tag{7.2-30}$$

Using Eq. (7.2-19), we can now generate the universe of scaled and translated Haar wavelets. Two such wavelets, c0,2(x) and c1,0(x), are plotted in Figs. 7.14(b) and (c), respectively. Note that wavelet c1,0(x) for space W1 is narrower than c0,2(x) for W0; it can be used to represent finer detail.

Figure 7.14(d) shows a function of subspace V1 that is not in subspace V0. This function was considered in an earlier example [see Fig. 7.11(e)]. Although the function cannot be represented accurately in V0, Eq. (7.2-22) indicates that it can be expanded using V0 and W0 expansion functions. The resulting expansion is

$$f(x) = f_a(x) + f_d(x)$$

FIGURE 7.14 Haar wavelet functions in W0 and W1: (a) c(x) = c0,0(x); (b) c0,2(x) = c(x − 2); (c) c1,0(x) = √2 c(2x); (d) f(x) ∈ V1 = V0 ⊕ W0; (e) fa(x) ∈ V0; (f) fd(x) ∈ W0.


where

$$f_a(x) = \frac{3\sqrt{2}}{4}\, w_{0,0}(x) - \frac{\sqrt{2}}{8}\, w_{0,2}(x)$$

and

$$f_d(x) = -\frac{\sqrt{2}}{4}\, c_{0,0}(x) - \frac{\sqrt{2}}{8}\, c_{0,2}(x)$$

Here, fa(x) is an approximation of f(x) using V0 scaling functions, while fd(x) is the difference f(x) − fa(x) as a sum of W0 wavelets. The two expansions, which are shown in Figs. 7.14(e) and (f), divide f(x) in a manner similar to a lowpass and highpass filter as discussed in connection with Fig. 7.6. The low frequencies of f(x) are captured in fa(x)—it assumes the average value of f(x) in each integer interval—while the high-frequency details are encoded in fd(x). ■

7.3 Wavelet Transforms in One Dimension


We can now formally define several closely related wavelet transformations:
the generalized wavelet series expansion, the discrete wavelet transform, and
the continuous wavelet transform. Their counterparts in the Fourier domain
are the Fourier series expansion, the discrete Fourier transform, and the inte-
gral Fourier transform, respectively. In Section 7.4, we develop a computation-
ally efficient implementation of the discrete wavelet transform called the fast
C
wavelet transform.

7.3.1 The Wavelet Series Expansions

We begin by defining the wavelet series expansion of function f(x) ∈ L²(R) relative to wavelet c(x) and scaling function w(x). In accordance with Eq. (7.2-27), f(x) can be represented by a scaling function expansion in subspace Vj₀ [Eq. (7.2-12) defines such an expansion] and some number of wavelet function expansions in subspaces Wj₀, Wj₀+1, … [as defined in Eq. (7.2-21)]. Thus,

$$f(x) = \sum_k c_{j_0}(k)\, w_{j_0,k}(x) + \sum_{j=j_0}^{\infty} \sum_k d_j(k)\, c_{j,k}(x) \tag{7.3-1}$$

where j₀ is an arbitrary starting scale and the cj₀(k) and dj(k) are relabeled ak from Eqs. (7.2-12) and (7.2-21), respectively. The cj₀(k) are normally called approximation and/or scaling coefficients; the dj(k) are referred to as detail and/or wavelet coefficients. This is because the first sum in Eq. (7.3-1) uses scaling functions to provide an approximation of f(x) at scale j₀ [unless f(x) ∈ Vj₀, so that the sum of the scaling functions is equal to f(x)]. For each higher scale j ≥ j₀ in the second sum, a finer resolution function—a sum of wavelets—is added to the approximation to provide increasing detail. If the expansion
9 Morphological Image Processing

9.1 Preliminaries
You will find it helpful to review Sections 2.4.2 and 2.6.4 before proceeding.

The language of mathematical morphology is set theory. As such, morphology offers a unified and powerful approach to numerous image processing
problems. Sets in mathematical morphology represent objects in an image.
For example, the set of all white pixels in a binary image is a complete mor-
phological description of the image. In binary images, the sets in question are
members of the 2-D integer space Z2 (see Section 2.4.2), where each element
of a set is a tuple (2-D vector) whose coordinates are the (x, y) coordinates
of a white (or black, depending on convention) pixel in the image. Gray-
scale digital images of the form discussed in the previous chapters can be represented as sets whose components are in Z³. In this case, two compo-
nents of each element of the set refer to the coordinates of a pixel, and the
third corresponds to its discrete intensity value. Sets in higher dimensional
spaces can contain other image attributes, such as color and time varying
components.
In addition to the basic set definitions in Section 2.6.4, the concepts of set reflection and translation are used extensively in morphology. (The set reflection operation is analogous to the flipping/rotating operation performed in spatial convolution, Section 3.4.2.) The reflection of a set B, denoted B̂, is defined as

$$\hat{B} = \{w \mid w = -b, \text{ for } b \in B\} \tag{9.1-1}$$

If B is the set of pixels (2-D points) representing an object in an image, then B̂ is simply the set of points in B whose (x, y) coordinates have been replaced by (−x, −y). Figures 9.1(a) and (b) show a simple set and its reflection.†
C
FIGURE 9.1 (a) A set B, (b) its reflection B̂, and (c) its translation (B)z by z = (z1, z2).


† When working with graphics, such as the sets in Fig. 9.1, we use shading to indicate points (pixels) that are members of the set under consideration. When working with binary images, the sets of interest are pixels corresponding to objects. We show these in white, and all other pixels in black. The terms foreground and background are used often to denote the sets of pixels in an image defined to be objects and non-objects, respectively.

The translation of a set B by point z = (z₁, z₂), denoted (B)_z, is defined as

    (B)_z = \{ c \mid c = b + z, \text{ for } b \in B \}        (9.1-2)

If B is the set of pixels representing an object in an image, then (B)_z is the set of points in B whose (x, y) coordinates have been replaced by (x + z₁, y + z₂). Figure 9.1(c) illustrates this concept using the set B from Fig. 9.1(a).
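As a concrete illustration (our own sketch, not part of the original text), both definitions translate directly into array arithmetic when a set is stored as a list of its point coordinates; NumPy is an assumed dependency here.

import numpy as np

# A hypothetical set B of 2-D points, one (x, y) pair per row.
B = np.array([[0, 0], [0, 1], [1, 1]])

B_hat = -B                  # Eq. (9.1-1): w = -b for every b in B
B_z = B + np.array([2, 3])  # Eq. (9.1-2): c = b + z with z = (2, 3)

print(B_hat.tolist())       # [[0, 0], [0, -1], [-1, -1]]
print(B_z.tolist())         # [[2, 3], [2, 4], [3, 4]]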
Set reflection and translation are employed extensively in morphology to formulate operations based on so-called structuring elements (SEs): small sets or subimages used to probe an image under study for properties of interest. The first row of Fig. 9.2 shows several examples of structuring elements, where each shaded square denotes a member of the SE. When it does not matter whether a location in a given structuring element is or is not a member of the SE set, that location is marked with an "*" to denote a "don't care" condition, as defined later in Section 9.5.4. In addition to a definition of which elements are members of the SE, the origin of a structuring element also must be specified. The origins of the various SEs in Fig. 9.2 are indicated by a black dot (although placing the center of an SE at its center of gravity is common, the choice of origin is problem dependent in general). When the SE is symmetric and no dot is shown, the assumption is that the origin is at the center of symmetry.
When working with images, we require that structuring elements be rectangular arrays. This is accomplished by appending the smallest possible number of background elements (shown nonshaded in Fig. 9.2) necessary to form a rectangular array. The first and last SEs in the second row of Fig. 9.2 illustrate the procedure. The other SEs in that row already are in rectangular form.

FIGURE 9.2 First row: Examples of structuring elements. Second row: Structuring elements converted to rectangular arrays. The dots denote the centers of the SEs.

As an introduction to how structuring elements are used in morphology, consider Fig. 9.3. Figures 9.3(a) and (b) show a simple set and a structuring element. As mentioned in the previous paragraph, a computer implementation requires that set A also be converted to a rectangular array by adding background elements. The background border is made large enough to accommodate the entire structuring element when its origin is on the border of the original set (this is analogous to padding for spatial correlation and convolution, as discussed in Section 3.4.2). (In future illustrations, we add enough background points to form rectangular arrays, but let the padding be implicit when the meaning is clear, in order to simplify the figures.) In this case, the structuring element is of size 3 × 3 with the origin in the center, so a one-element border that encompasses the entire set is sufficient, as Fig. 9.3(c) shows. As in Fig. 9.2, the structuring element is filled with the smallest possible number of background elements necessary to make it into a rectangular array [Fig. 9.3(d)].

FIGURE 9.3 (a) A set (each shaded square is a member of the set). (b) A structuring element. (c) The set padded with background elements to form a rectangular array and provide a background border. (d) Structuring element as a rectangular array. (e) Set processed by the structuring element.

Suppose that we define an operation on set A using structuring element B, as follows: Create a new set by running B over A so that the origin of B visits every element of A. At each location of the origin of B, if B is completely contained in A, mark that location as a member of the new set (shown shaded); otherwise, mark it as not being a member of the new set (shown not shaded). Figure 9.3(e) shows the result of this operation. We see that, when the origin of B is on a border element of A, part of B ceases to be contained in A, thus eliminating the location on which B is centered as a possible member of the new set. The net result is that the boundary of the set is eroded, as Fig. 9.3(e) shows. When we use terminology such as "the structuring element is contained in the set," we mean specifically that the elements of A and B fully overlap. In other words, although we showed A and B as arrays containing both shaded and nonshaded elements, only the shaded elements of both sets are considered in determining whether or not B is contained in A. These concepts form the basis of the material in the next section, so it is important that you understand the ideas in Fig. 9.3 fully before proceeding.

9.2 Erosion and Dilation


We begin the discussion of morphology by studying two operations: erosion
and dilation. These operations are fundamental to morphological processing.
In fact, many of the morphological algorithms discussed in this chapter are
based on these two primitive operations.

9.2.1 Erosion
With A and B as sets in Z², the erosion of A by B, denoted A ⊖ B, is defined as

    A \ominus B = \{ z \mid (B)_z \subseteq A \}        (9.2-1)

In words, this equation indicates that the erosion of A by B is the set of all points z such that B, translated by z, is contained in A. In the following discussion, set B is assumed to be a structuring element. Equation (9.2-1) is the mathematical formulation of the example in Fig. 9.3(e), discussed at the end of the last section. Because the statement that B has to be contained in A is equivalent to B not sharing any common elements with the background, we can express erosion in the following equivalent form:

    A \ominus B = \{ z \mid (B)_z \cap A^c = \varnothing \}        (9.2-2)

where, as defined in Section 2.6.4, A^c is the complement of A and ∅ is the empty set.
Figure 9.4 shows an example of erosion. The elements of A and B are shown shaded and the background is white. The solid boundary in Fig. 9.4(c) is the limit beyond which further displacements of the origin of B would cause the structuring element to cease being completely contained in A. Thus, the locus of points (locations of the origin of B) within (and including) this boundary constitutes the erosion of A by B. We show the erosion shaded in Fig. 9.4(c). Keep in mind that erosion is simply the set of values of z that satisfy Eq. (9.2-1) or (9.2-2). The boundary of set A is shown dashed in Figs. 9.4(c) and (e) only as a reference; it is not part of the erosion operation. Figure 9.4(d) shows an elongated structuring element, and Fig. 9.4(e) shows the erosion of A by this element. Note that the original set was eroded to a line.

FIGURE 9.4 (a) Set A. (b) Square structuring element, B. (c) Erosion of A by B, shown shaded. (d) Elongated structuring element. (e) Erosion of A by B using this element. The dotted border in (c) and (e) is the boundary of set A, shown only for reference.

Equations (9.2-1) and (9.2-2) are not the only definitions of erosion (see Problems 9.9 and 9.10 for two additional, equivalent definitions). However, these equations have the distinct advantage over other formulations in that they are more intuitive when the structuring element B is viewed as a spatial mask (see Section 3.4.1).
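The following minimal sketch (ours, not the book's; NumPy and SciPy are assumed dependencies) applies Eq. (9.2-1) to a small binary image using scipy.ndimage.binary_erosion, which treats the structuring element exactly as the spatial mask described above.

import numpy as np
from scipy import ndimage

A = np.zeros((12, 12), dtype=bool)
A[3:9, 3:9] = True                  # a 6 x 6 square object

B = np.ones((3, 3), dtype=bool)     # 3 x 3 SE of 1s, origin at the center
eroded = ndimage.binary_erosion(A, structure=B)

# A one-pixel-deep layer is removed all around: the 6 x 6 square becomes 4 x 4.
print(eroded.sum())                 # 16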

EXAMPLE 9.1: Using erosion to remove image components.
■ Suppose that we wish to remove the lines connecting the center region to the border pads in Fig. 9.5(a). Eroding the image with a square structuring element of size 11 × 11 whose components are all 1s removed most of the lines, as Fig. 9.5(b) shows. The reason the two vertical lines in the center were thinned but not removed completely is that their width is greater than 11 pixels. Changing the SE size to 15 × 15 and eroding the original image again did remove all the connecting lines, as Fig. 9.5(c) shows (an alternate approach would have been to erode the image in Fig. 9.5(b) again, using the same 11 × 11 SE). Increasing the size of the structuring element even more would eliminate larger components. For example, the border pads can be removed with a structuring element of size 45 × 45, as Fig. 9.5(d) shows.
FIGURE 9.5 Using erosion to remove image components. (a) A 486 × 486 binary image of a wire-bond mask. (b)–(d) Image eroded using square structuring elements of sizes 11 × 11, 15 × 15, and 45 × 45, respectively. The elements of the SEs were all 1s.

We see from this example that erosion shrinks or thins objects in a bina-
ry image. In fact, we can view erosion as a morphological filtering operation
in which image details smaller than the structuring element are filtered (re-
moved) from the image. In Fig. 9.5, erosion performed the function of a
“line filter.” We return to the concept of a morphological filter in Sections
9.3 and 9.6.3. ■

9.2.2 Dilation
With A and B as sets in Z², the dilation of A by B, denoted A ⊕ B, is defined as

    A \oplus B = \{ z \mid (\hat{B})_z \cap A \neq \varnothing \}        (9.2-3)

This equation is based on reflecting B about its origin, and shifting this reflection by z (see Fig. 9.1). The dilation of A by B then is the set of all displacements, z, such that B̂ and A overlap by at least one element. Based on this interpretation, Eq. (9.2-3) can be written equivalently as

    A \oplus B = \{ z \mid [(\hat{B})_z \cap A] \subseteq A \}        (9.2-4)

As before, we assume that B is a structuring element and A is the set (image objects) to be dilated.
Equations (9.2-3) and (9.2-4) are not the only definitions of dilation currently in use (see Problems 9.11 and 9.12 for two different, yet equivalent, definitions). However, the preceding definitions have a distinct advantage over other formulations in that they are more intuitive when the structuring element B is viewed as a convolution mask. The basic process of flipping (rotating) B about its origin and then successively displacing it so that it slides over set (image) A is analogous to spatial convolution, as introduced in Section 3.4.2. Keep in mind, however, that dilation is based on set operations and therefore is a nonlinear operation, whereas convolution is a linear operation.
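A companion sketch for Eq. (9.2-3), under the same assumptions (NumPy/SciPy are ours, not the text's); because the SE used is symmetric, B̂ = B:

import numpy as np
from scipy import ndimage

A = np.zeros((12, 12), dtype=bool)
A[3:9, 3:9] = True                  # the same 6 x 6 square object

B = np.ones((3, 3), dtype=bool)     # symmetric SE, so its reflection equals B
dilated = ndimage.binary_dilation(A, structure=B)

# The object thickens by one pixel in every direction: 6 x 6 grows to 8 x 8.
print(dilated.sum())                # 64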
Unlike erosion, which is a shrinking or thinning operation, dilation "grows" or "thickens" objects in a binary image. The specific manner and extent of this thickening is controlled by the shape of the structuring element used. Figure 9.6(a) shows the same set used in Fig. 9.4, and Fig. 9.6(b) shows a structuring element (in this case B̂ = B because the SE is symmetric about its origin). The dashed line in Fig. 9.6(c) shows the original set for reference, and the solid line shows the limit beyond which any further displacements of the origin of B̂ by z would cause the intersection of B̂ and A to be empty. Therefore, all points on and inside this boundary constitute the dilation of A by B. Figure 9.6(d) shows a structuring element designed to achieve more dilation vertically than horizontally, and Fig. 9.6(e) shows the dilation achieved with this element.

FIGURE 9.6 (a) Set A. (b) Square structuring element (the dot denotes the origin). (c) Dilation of A by B, shown shaded. (d) Elongated structuring element. (e) Dilation of A using this element. The dotted border in (c) and (e) is the boundary of set A, shown only for reference.

EXAMPLE 9.2: An illustration of dilation.
■ One of the simplest applications of dilation is for bridging gaps. Figure 9.7(a) shows the same image with broken characters that we studied in Fig. 4.49 in connection with lowpass filtering. The maximum length of the breaks is known to be two pixels. Figure 9.7(b) shows a structuring element that can be used for repairing the gaps (note that instead of shading, we used 1s to denote the elements of the SE and 0s for the background; this is because the SE is now being treated as a subimage and not as a graphic). Figure 9.7(c) shows the result of dilating the original image with this structuring element. The gaps were bridged. One immediate advantage of the morphological approach over the lowpass filtering method we used to bridge the gaps in Fig. 4.49 is that the morphological method resulted directly in a binary image. Lowpass filtering, on the other hand, started with a binary image and produced a gray-scale image, which would require a pass with a thresholding function to convert it back to binary form. ■

FIGURE 9.7 (a) Sample text of poor resolution with broken characters (see magnified view). (b) Structuring element:
0 1 0
1 1 1
0 1 0
(c) Dilation of (a) by (b). Broken segments were joined.

9.2.3 Duality
Erosion and dilation are duals of each other with respect to set complementation and reflection. That is,

    (A \ominus B)^c = A^c \oplus \hat{B}        (9.2-5)

and

    (A \oplus B)^c = A^c \ominus \hat{B}        (9.2-6)

Equation (9.2-5) indicates that the erosion of A by B is the complement of the dilation of A^c by B̂, and vice versa. The duality property is particularly useful when the structuring element is symmetric with respect to its origin (as often is the case), so that B̂ = B. Then, we can obtain the erosion of an image by B simply by dilating its background (i.e., dilating A^c) with the same structuring element and complementing the result. Similar comments apply to Eq. (9.2-6).
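Equation (9.2-5) can also be verified numerically, as in the sketch below (our own; SciPy assumed). The border_value argument makes the treatment of points outside the image consistent on both sides of the identity: the outside is background for A, and therefore foreground for A^c.

import numpy as np
from scipy import ndimage

rng = np.random.default_rng(0)
A = rng.random((32, 32)) > 0.5          # a random binary image
B = np.ones((3, 3), dtype=bool)         # symmetric SE, so B-hat = B

lhs = ~ndimage.binary_erosion(A, structure=B)                   # (A erosion B)^c
rhs = ndimage.binary_dilation(~A, structure=B, border_value=1)  # A^c dilated by B-hat
print(np.array_equal(lhs, rhs))                                 # True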
We proceed to prove formally the validity of Eq. (9.2-5) in order to illustrate a typical approach for establishing the validity of morphological expressions. Starting with the definition of erosion, it follows that

    (A \ominus B)^c = \{ z \mid (B)_z \subseteq A \}^c

If set (B)_z is contained in A, then (B)_z ∩ A^c = ∅, in which case the preceding expression becomes

    (A \ominus B)^c = \{ z \mid (B)_z \cap A^c = \varnothing \}^c

But the complement of the set of z's that satisfy (B)_z ∩ A^c = ∅ is the set of z's such that (B)_z ∩ A^c ≠ ∅. Therefore,

    (A \ominus B)^c = \{ z \mid (B)_z \cap A^c \neq \varnothing \} = A^c \oplus \hat{B}

where the last step follows from Eq. (9.2-3). This concludes the proof. A similar line of reasoning can be used to prove Eq. (9.2-6) (see Problem 9.13).

9.3 Opening and Closing


As you have seen, dilation expands the components of an image and erosion
shrinks them. In this section we discuss two other important morphological
operations: opening and closing. Opening generally smoothes the contour of
an object, breaks narrow isthmuses, and eliminates thin protrusions. Closing
also tends to smooth sections of contours but, as opposed to opening, it gener-
ally fuses narrow breaks and long thin gulfs, eliminates small holes, and fills
gaps in the contour.

The opening of set A by structuring element B, denoted A ∘ B, is defined as

    A \circ B = (A \ominus B) \oplus B        (9.3-1)

Thus, the opening of A by B is the erosion of A by B, followed by a dilation of the result by B.
Similarly, the closing of set A by structuring element B, denoted A • B, is defined as

    A \bullet B = (A \oplus B) \ominus B        (9.3-2)

which says that the closing of A by B is simply the dilation of A by B, followed by the erosion of the result by B.
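Both composite operations follow directly from the two primitives, as the sketch below shows (ours; NumPy/SciPy assumed — SciPy also offers binary_opening and binary_closing as single calls):

import numpy as np
from scipy import ndimage

A = np.zeros((20, 20), dtype=bool)
A[5:15, 5:15] = True                # a 10 x 10 square ...
A[9:11, 15:19] = True               # ... with a thin, 2-pixel-wide protrusion

B = np.ones((3, 3), dtype=bool)
opening = ndimage.binary_dilation(ndimage.binary_erosion(A, B), B)  # Eq. (9.3-1)
closing = ndimage.binary_erosion(ndimage.binary_dilation(A, B), B)  # Eq. (9.3-2)

# The opening removes the protrusion (8 pixels) but restores the square exactly.
print((A & ~opening).sum())         # 8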
The opening operation has a simple geometric interpretation (Fig. 9.8). Suppose that we view the structuring element B as a (flat) "rolling ball." The boundary of A ∘ B is then established by the points in B that reach the farthest into the boundary of A as B is rolled around the inside of this boundary. This geometric fitting property of the opening operation leads to a set-theoretic formulation, which states that the opening of A by B is obtained by taking the union of all translates of B that fit into A. That is, opening can be expressed as a fitting process such that

    A \circ B = \bigcup \{ (B)_z \mid (B)_z \subseteq A \}        (9.3-3)

where ∪{·} denotes the union of all the sets inside the braces.
Closing has a similar geometric interpretation, except that now we roll B on the outside of the boundary (Fig. 9.9). As discussed below, opening and closing are duals of each other, so having to roll the ball on the outside is not unexpected. Geometrically, a point w is an element of A • B if and only if (B)_z ∩ A ≠ ∅ for any translate of (B)_z that contains w. Figure 9.9 illustrates the basic geometrical properties of closing.

FIGURE 9.8 (a) Structuring element B "rolling" along the inner boundary of A (the dot indicates the origin of B). (b) Structuring element. (c) The heavy line is the outer boundary of the opening. (d) Complete opening (shaded). We did not shade A in (a) for clarity.

FIGURE 9.9 (a) Structuring element B "rolling" on the outer boundary of set A. (b) The heavy line is the outer boundary of the closing. (c) Complete closing (shaded). We did not shade A in (a) for clarity.

EXAMPLE 9.3: A simple illustration of morphological opening and closing.
■ Figure 9.10 further illustrates the opening and closing operations. Figure 9.10(a) shows a set A, and Fig. 9.10(b) shows various positions of a disk structuring element during the erosion process. When completed, this process resulted in the disjoint figure in Fig. 9.10(c). Note the elimination of the bridge between the two main sections. Its width was thin in relation to the diameter of the structuring element; that is, the structuring element could not be completely contained in this part of the set, thus violating the conditions of Eq. (9.2-1). The same was true of the two rightmost members of the object. Protruding elements where the disk did not fit were eliminated. Figure 9.10(d) shows the process of dilating the eroded set, and Fig. 9.10(e) shows the final result of opening. Note that outward-pointing corners were rounded, whereas inward-pointing corners were not affected.
Similarly, Figs. 9.10(f) through (i) show the results of closing A with the same structuring element. We note that the inward-pointing corners were rounded, whereas the outward-pointing corners remained unchanged. The leftmost intrusion on the boundary of A was reduced in size significantly, because the disk did not fit there. Note also the smoothing that resulted in parts of the object from both opening and closing the set A with a circular structuring element. ■

FIGURE 9.10 Morphological opening and closing. The structuring element is the small circle shown in various positions in (b). The SE was not shaded here for clarity. The dark dot is the center of the structuring element.

As in the case with dilation and erosion, opening and closing are duals of each other with respect to set complementation and reflection. That is,

    (A \bullet B)^c = (A^c \circ \hat{B})        (9.3-4)

and

    (A \circ B)^c = (A^c \bullet \hat{B})        (9.3-5)

We leave the proof of this result as an exercise (Problem 9.14).
The opening operation satisfies the following properties:
(a) A ∘ B is a subset (subimage) of A.
(b) If C is a subset of D, then C ∘ B is a subset of D ∘ B.
(c) (A ∘ B) ∘ B = A ∘ B.
Similarly, the closing operation satisfies the following properties:
(a) A is a subset (subimage) of A • B.
(b) If C is a subset of D, then C • B is a subset of D • B.
(c) (A • B) • B = A • B.
Note from condition (c) in both cases that multiple openings or closings of a set have no effect after the operator has been applied once.

EXAMPLE 9.4: Use of opening and closing for morphological filtering.
■ Morphological operations can be used to construct filters similar in concept to the spatial filters discussed in Chapter 3. The binary image in Fig. 9.11(a) shows a section of a fingerprint corrupted by noise. Here the noise manifests itself as random light elements on a dark background and as dark elements on the light components of the fingerprint. The objective is to eliminate the noise and its effects on the print while distorting it as little as possible. A morphological filter consisting of opening followed by closing can be used to accomplish this objective.
Figure 9.11(b) shows the structuring element used. The rest of Fig. 9.11 shows a step-by-step sequence of the filtering operation. Figure 9.11(c) is the result of eroding A with the structuring element. The background noise was completely eliminated in the erosion stage of opening because in this case all noise components are smaller than the structuring element. The size of the noise elements (dark spots) contained within the fingerprint actually increased. The reason is that these elements are inner boundaries that increase in size as the object is eroded. This enlargement is countered by performing dilation on Fig. 9.11(c). Figure 9.11(d) shows the result. The noise components contained in the fingerprint were reduced in size or deleted completely.
The two operations just described constitute the opening of A by B. We note in Fig. 9.11(d) that the net effect of opening was to eliminate virtually all noise components in both the background and the fingerprint itself. However, new gaps between the fingerprint ridges were created. To counter this undesirable effect, we perform a dilation on the opening, as shown in Fig. 9.11(e). Most of the breaks were restored, but the ridges were thickened, a condition that can be remedied by erosion. The result, shown in Fig. 9.11(f), constitutes the closing of the opening of Fig. 9.11(d). This final result is remarkably clean of noise specks, but it has the disadvantage that some of the print ridges were not fully repaired and thus contain breaks. This is not totally unexpected, because no conditions were built into the procedure for maintaining connectivity (we discuss this issue again in Example 9.8 and demonstrate ways to address it in Section 11.1.7). ■

FIGURE 9.11 (a) Noisy image. (b) Structuring element, a 3 × 3 array of 1s. (c) Eroded image. (d) Opening of A. (e) Dilation of the opening. (f) Closing of the opening. (Original image courtesy of the National Institute of Standards and Technology.)
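For reference, the filter used in this example reduces to two library calls in the sketch below (the function name is ours and SciPy is an assumed dependency; B is the 3 × 3 SE of 1s in Fig. 9.11(b)):

import numpy as np
from scipy import ndimage

B = np.ones((3, 3), dtype=bool)     # the SE of Fig. 9.11(b)

def open_close_filter(A):
    """Opening followed by closing with the same SE, as in Fig. 9.11."""
    return ndimage.binary_closing(ndimage.binary_opening(A, B), B)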

9.4 The Hit-or-Miss Transformation


The morphological hit-or-miss transform is a basic tool for shape detection.
We introduce this concept with the aid of Fig. 9.12, which shows a set A con-
sisting of three shapes (subsets), denoted C, D, and E. The shading in Figs. 9.12(a)
through (c) indicates the original sets, whereas the shading in Figs. 9.12(d) and
(e) indicates the result of morphological operations. The objective is to find
the location of one of the shapes, say, D.

FIGURE 9.12 (a) Set A. (b) A window, W, and the local background of D with respect to W, (W − D). (c) Complement of A. (d) Erosion of A by D. (e) Erosion of A^c by (W − D). (f) Intersection of (d) and (e), showing the location of the origin of D, as desired. The dots indicate the origins of C, D, and E.

Let the origin of each shape be located at its center of gravity. Let D be enclosed by a small window, W. The local background of D with respect to W is defined as the set difference (W − D), as shown in Fig. 9.12(b). Figure 9.12(c) shows the complement of A, which is needed later. Figure 9.12(d) shows the erosion of A by D (the dashed lines are included for reference). Recall that the erosion of A by D is the set of locations of the origin of D such that D is completely contained in A. Interpreted another way, A ⊖ D may be viewed geometrically as the set of all locations of the origin of D at which D found a match (hit) in A. Keep in mind that in Fig. 9.12 A consists only of the three disjoint sets C, D, and E.
Figure 9.12(e) shows the erosion of the complement of A by the local background set (W − D). The outer shaded region in Fig. 9.12(e) is part of the erosion. We note from Figs. 9.12(d) and (e) that the set of locations for which D exactly fits inside A is the intersection of the erosion of A by D and the erosion of A^c by (W − D), as shown in Fig. 9.12(f). This intersection is precisely the location sought. In other words, if B denotes the set composed of D and its background, the match (or set of matches) of B in A, denoted A ⊛ B, is

    A \circledast B = (A \ominus D) \cap \left[ A^c \ominus (W - D) \right]        (9.4-1)
We can generalize the notation somewhat by letting B = (B₁, B₂), where B₁ is the set formed from elements of B associated with an object and B₂ is the set of elements of B associated with the corresponding background. From the preceding discussion, B₁ = D and B₂ = (W − D). With this notation, Eq. (9.4-1) becomes

    A \circledast B = (A \ominus B_1) \cap (A^c \ominus B_2)        (9.4-2)

Thus, set A ⊛ B contains all the (origin) points at which, simultaneously, B₁ found a match ("hit") in A and B₂ found a match in A^c. By using the definition of set differences given in Eq. (2.6-19) and the dual relationship between erosion and dilation given in Eq. (9.2-5), we can write Eq. (9.4-2) as

    A \circledast B = (A \ominus B_1) - (A \oplus \hat{B}_2)        (9.4-3)

However, Eq. (9.4-2) is considerably more intuitive. We refer to any of the preceding three equations as the morphological hit-or-miss transform.
The reason for using a structuring element B₁ associated with objects and an element B₂ associated with the background is based on an assumed definition that two or more objects are distinct only if they form disjoint (disconnected) sets. This is guaranteed by requiring that each object have at least a one-pixel-thick background around it. In some applications, we may be interested in detecting certain patterns (combinations) of 1s and 0s within a set, in which case a background is not required. In such instances, the hit-or-miss transform reduces to simple erosion. As indicated previously, erosion is still a set of matches, but without the additional requirement of a background match for detecting individual objects. This simplified pattern detection scheme is used in some of the algorithms developed in the following section.
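The sketch below (ours; SciPy assumed) uses Eq. (9.4-2) in the form provided by scipy.ndimage.binary_hit_or_miss to find isolated single pixels, i.e., a 1 whose eight neighbors are all 0:

import numpy as np
from scipy import ndimage

A = np.zeros((7, 7), dtype=bool)
A[2, 2] = True                      # an isolated point
A[4:6, 4:6] = True                  # a 2 x 2 blob that should not match

B1 = np.zeros((3, 3), dtype=bool)
B1[1, 1] = True                     # object part: the pixel itself
B2 = ~B1                            # background part: its eight neighbors

hits = ndimage.binary_hit_or_miss(A, structure1=B1, structure2=B2)
print(np.argwhere(hits).tolist())   # [[2, 2]]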

9.5 Some Basic Morphological Algorithms

With the preceding discussion as foundation, we are now ready to consider some practical uses of morphology. When dealing with binary images, one of the principal applications of morphology is in extracting image components that are useful in the representation and description of shape. In particular, we consider morphological algorithms for extracting boundaries, connected components, the convex hull, and the skeleton of a region. We also develop several methods (for region filling, thinning, thickening, and pruning) that are used frequently in conjunction with these algorithms as pre- or post-processing steps. We make extensive use in this section of "mini-images," designed to clarify the mechanics of each morphological process as we introduce it. These images are shown graphically with 1s shaded and 0s in white.

9.5.1 Boundary Extraction

The boundary of a set A, denoted by β(A), can be obtained by first eroding A by B and then performing the set difference between A and its erosion. That is,

    \beta(A) = A - (A \ominus B)        (9.5-1)

where B is a suitable structuring element.
Figure 9.13 illustrates the mechanics of boundary extraction. It shows a simple binary object, a structuring element B, and the result of using Eq. (9.5-1). Although the structuring element in Fig. 9.13(b) is among the most frequently used, it is by no means unique. For example, using a 5 × 5 structuring element of 1s would result in a boundary between 2 and 3 pixels thick.
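Equation (9.5-1) is one line of array logic, as in this sketch (ours; NumPy/SciPy assumed):

import numpy as np
from scipy import ndimage

A = np.zeros((10, 10), dtype=bool)
A[2:8, 2:8] = True                  # a 6 x 6 square

B = np.ones((3, 3), dtype=bool)
boundary = A & ~ndimage.binary_erosion(A, structure=B)  # beta(A) = A - (A erosion B)
print(boundary.sum())               # 20: a one-pixel-thick ring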
tu

From this point on, we do


not show border padding
explicitly.

A B
V

AB b(A)

a b
c d
FIGURE 9.13 (a) Set A. (b) Structuring element B. (c) A eroded by B. (d) Boundary,
given by the set difference between A and its erosion.

FIGURE 9.14 (a) A simple binary image, with 1s represented in white. (b) Result of using Eq. (9.5-1) with the structuring element in Fig. 9.13(b).

EXAMPLE 9.5: Boundary extraction by morphological processing.
■ Figure 9.14 further illustrates the use of Eq. (9.5-1) with a 3 × 3 structuring element of 1s. As for all binary images in this chapter, binary 1s are shown in white and 0s in black, so the elements of the structuring element, which are 1s, also are treated as white. Because of the size of the structuring element used, the boundary in Fig. 9.14(b) is one pixel thick. ■

9.5.2 Hole Filling

A hole may be defined as a background region surrounded by a connected border of foreground pixels. In this section, we develop an algorithm based on set dilation, complementation, and intersection for filling holes in an image. Let A denote a set whose elements are 8-connected boundaries, each boundary enclosing a background region (i.e., a hole). Given a point in each hole, the objective is to fill all the holes with 1s.
We begin by forming an array, X₀, of 0s (the same size as the array containing A), except at the locations in X₀ corresponding to the given point in each hole, which we set to 1. Then, the following procedure fills all the holes with 1s:

    X_k = (X_{k-1} \oplus B) \cap A^c \qquad k = 1, 2, 3, \ldots        (9.5-2)

where B is the symmetric structuring element in Fig. 9.15(c). The algorithm terminates at iteration step k if X_k = X_{k-1}. The set X_k then contains all the filled holes. The set union of X_k and A contains all the filled holes and their boundaries.
The dilation in Eq. (9.5-2) would fill the entire area if left unchecked. However, the intersection at each step with A^c limits the result to the inside of the region of interest. This is our first example of how a morphological process can be conditioned to meet a desired property. In the current application, it is appropriately called conditional dilation. The rest of Fig. 9.15 illustrates further the mechanics of Eq. (9.5-2). Although this example has only one hole, the concept clearly applies to any finite number of holes, assuming that a point inside each hole region is given.
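A direct, unoptimized sketch of the iteration follows (ours; NumPy/SciPy assumed). In practice, scipy.ndimage.binary_fill_holes produces the same result without requiring a seed point.

import numpy as np
from scipy import ndimage

A = np.zeros((9, 9), dtype=bool)
A[2:7, 2:7] = True
A[3:6, 3:6] = False                 # a hole enclosed by foreground

B = np.array([[0, 1, 0],
              [1, 1, 1],
              [0, 1, 0]], dtype=bool)   # the cross-shaped SE of Fig. 9.15(c)

X = np.zeros_like(A)
X[4, 4] = True                      # the given point inside the hole
while True:
    X_next = ndimage.binary_dilation(X, B) & ~A   # conditional dilation, Eq. (9.5-2)
    if np.array_equal(X_next, X):
        break
    X = X_next

filled = X | A                      # filled hole plus its boundary
print(filled.sum())                 # 25: the full 5 x 5 block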

FIGURE 9.15 Hole filling. (a) Set A (shown shaded). (b) Complement of A. (c) Structuring element B. (d) Initial point inside the boundary. (e)–(h) Various steps of Eq. (9.5-2). (i) Final result [union of (a) and (h)].
EXAMPLE 9.6: Morphological hole filling.
■ Figure 9.16(a) shows an image composed of white circles with black inner spots. An image such as this might result from thresholding into two levels a scene containing polished spheres (e.g., ball bearings). The dark spots inside the spheres could be the result of reflections. The objective is to eliminate the reflections by hole filling. Figure 9.16(a) shows one point selected inside one of the spheres, and Fig. 9.16(b) shows the result of filling that component. Finally, Fig. 9.16(c) shows the result of filling all the spheres. Because it must be known whether black points are background points or sphere inner points, fully automating this procedure requires that additional "intelligence" be built into the algorithm. We give a fully automatic approach in Section 9.5.9 based on morphological reconstruction. (See also Problem 9.23.) ■

FIGURE 9.16 (a) Binary image (the white dot inside one of the regions is the starting point for the hole-filling algorithm). (b) Result of filling that region. (c) Result of filling all holes.

9.5.3 Extraction of Connected Components

The concepts of connectivity and connected components were introduced in Section 2.5.2. Extraction of connected components from a binary image is central to many automated image analysis applications. Let A be a set containing one or more connected components, and form an array X₀ (of the same size as the array containing A) whose elements are 0s (background values), except at each location known to correspond to a point in each connected component in A, which we set to 1 (foreground value). The objective is to start with X₀ and find all the connected components. The following iterative procedure accomplishes this objective:

    X_k = (X_{k-1} \oplus B) \cap A \qquad k = 1, 2, 3, \ldots        (9.5-3)

where B is a suitable structuring element (as in Fig. 9.17). The procedure terminates when X_k = X_{k-1}, with X_k containing all the connected components
FIGURE 9.17 Extracting connected components. (a) Structuring element. (b) Array containing a set with one connected component. (c) Initial array containing a 1 in the region of the connected component. (d)–(g) Various steps in the iteration of Eq. (9.5-3).

of the input image. Note the similarity between Eqs. (9.5-3) and (9.5-2), the only difference being the use of A as opposed to A^c. This is not surprising, because here we are looking for foreground points, while the objective in Section 9.5.2 was to find background points.
Figure 9.17 illustrates the mechanics of Eq. (9.5-3), with convergence being achieved for k = 6. Note that the shape of the structuring element used is based on 8-connectivity between pixels. If we had used the SE in Fig. 9.15, which is based on 4-connectivity, the leftmost element of the connected component toward the bottom of the image would not have been detected, because it is 8-connected to the rest of the figure. (See Problem 9.24 for an algorithm that does not require that a point in each connected component be known a priori.) As in the hole-filling algorithm, Eq. (9.5-3) is applicable to any finite number of connected components contained in A, assuming that a point is known in each.
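The iteration of Eq. (9.5-3) is sketched below with an 8-connected SE (ours; NumPy/SciPy assumed). Note that scipy.ndimage.label extracts all components directly, without seed points.

import numpy as np
from scipy import ndimage

A = np.zeros((10, 10), dtype=bool)
A[1:4, 1:4] = True                  # component 1
A[6:9, 5:9] = True                  # component 2

B = np.ones((3, 3), dtype=bool)     # SE based on 8-connectivity
X = np.zeros_like(A)
X[2, 2] = True                      # a known point in component 1
while True:
    X_next = ndimage.binary_dilation(X, B) & A    # Eq. (9.5-3)
    if np.array_equal(X_next, X):
        break
    X = X_next

print(X.sum())                      # 9: all of component 1, none of component 2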

EXAMPLE 9.7: Using connected components to detect foreign objects in packaged food.
■ Connected components are used frequently for automated inspection. Figure 9.18(a) shows an X-ray image of a chicken breast that contains bone fragments. It is of considerable interest to be able to detect such objects in processed food before packaging and/or shipping. In this particular case, the density of the bones is such that their nominal intensity values are different from the background. This makes extraction of the bones from the background a simple matter by using a single threshold (thresholding was introduced in Section 3.1 and is discussed in more detail in Section 10.3). The result is the binary image in Fig. 9.18(b).
The most significant feature in this figure is the fact that the points that remain are clustered into objects (bones), rather than being isolated, irrelevant points. We can make sure that only objects of "significant" size remain by eroding the thresholded image. In this example, we define as significant any object that remains after erosion with a 5 × 5 structuring element of 1s. The result of erosion is shown in Fig. 9.18(c). The next step is to analyze the size of the objects that remain. We label (identify) these objects by extracting the connected components in the image. The table in Fig. 9.18(d) lists the results of the extraction. There are a total of 15 connected components, with four of them being dominant in size. This is enough to determine that significant undesirable objects are contained in the original image. If needed, further characterization (such as shape) is possible using the techniques discussed in Chapter 11. ■

FIGURE 9.18 (a) X-ray image of chicken filet with bone fragments. (b) Thresholded image. (c) Image eroded with a 5 × 5 structuring element of 1s. (d) Number of pixels in the connected components of (c). (Image courtesy of NTB Elektronische Geraete GmbH, Diepholz, Germany, www.ntbxray.com.)

Connected component    No. of pixels in connected comp
01                      11
02                       9
03                       9
04                      39
05                     133
06                       1
07                       1
08                     743
09                       7
10                      11
11                      11
12                       9
13                       9
14                     674
15                      85

9.5.4 Convex Hull

A set A is said to be convex if the straight line segment joining any two points in A lies entirely within A. The convex hull H of an arbitrary set S is the smallest convex set containing S. The set difference H − S is called the convex deficiency of S. As discussed in more detail in Sections 11.1.6 and 11.3.2, the convex hull and convex deficiency are useful for object description. Here, we present a simple morphological algorithm for obtaining the convex hull, C(A), of a set A.
Let Bⁱ, i = 1, 2, 3, 4, represent the four structuring elements in Fig. 9.19(a). The procedure consists of implementing the equation

    X_k^i = (X_{k-1}^i \circledast B^i) \cup A \qquad i = 1, 2, 3, 4; \quad k = 1, 2, 3, \ldots        (9.5-4)

with X₀ⁱ = A. When the procedure converges (i.e., when X_kⁱ = X_{k-1}ⁱ), we let Dⁱ = X_kⁱ. Then the convex hull of A is

    C(A) = \bigcup_{i=1}^{4} D^i        (9.5-5)

In other words, the method consists of iteratively applying the hit-or-miss transform to A with B¹; when no further changes occur, we perform the union with A and call the result D¹. The procedure is repeated with B² (applied to A) until no further changes occur, and so on. The union of the four resulting Ds constitutes the convex hull of A. Note that we are using the simplified implementation of the hit-or-miss transform in which no background match is required, as discussed at the end of Section 9.4.
Figure 9.19 illustrates the procedure given in Eqs. (9.5-4) and (9.5-5). Figure 9.19(a) shows the structuring elements used to extract the convex hull. The origin of each element is at its center. The * entries indicate "don't care" conditions. This means that a structuring element is said to have found a match

FIGURE 9.19 (a) Structuring elements. (b) Set A. (c)–(f) Results of convergence with the structuring elements shown in (a). (g) Convex hull. (h) Convex hull showing the contribution of each structuring element.
in A if the 3 × 3 region of A under the structuring element mask at that location matches the pattern of the mask. For a particular mask, a pattern match occurs when the center of the 3 × 3 region in A is 0, and the three pixels under the shaded mask elements are 1. The values of the other pixels in the 3 × 3 region do not matter. Also, with respect to the notation in Fig. 9.19(a), Bⁱ is a clockwise rotation of Bⁱ⁻¹ by 90°.
Figure 9.19(b) shows a set A for which the convex hull is sought. Starting with X₀¹ = A resulted in the set in Fig. 9.19(c) after four iterations of Eq. (9.5-4). Then, letting X₀² = A and again using Eq. (9.5-4) resulted in the set in Fig. 9.19(d) (convergence was achieved in only two steps in this case). The next two results were obtained in the same way. Finally, forming the union of the sets in Figs. 9.19(c), (d), (e), and (f) resulted in the convex hull shown in Fig. 9.19(g). The contribution of each structuring element is highlighted in the composite set shown in Fig. 9.19(h).
One obvious shortcoming of the procedure just outlined is that the convex hull can grow beyond the minimum dimensions required to guarantee

convexity. One simple approach to reduce this effect is to limit growth so


that it does not extend past the vertical and horizontal dimensions of the
original set of points. Imposing this limitation on the example in Fig. 9.19 re-
sulted in the image shown in Fig. 9.20. Boundaries of greater complexity can
be used to limit growth even further in images with more detail. For exam-
ple, we could use the maximum dimensions of the original set of points along
the vertical, horizontal, and diagonal directions. The price paid for refine-
ments such as this is additional complexity and increased computational re-
quirements of the algorithm.

9.5.5 Thinning
The thinning of a set A by a structuring element B, denoted A ⊗ B, can be defined in terms of the hit-or-miss transform:

    A \otimes B = A - (A \circledast B) = A \cap (A \circledast B)^c        (9.5-6)

As in the previous section, we are interested only in pattern matching with the structuring elements, so no background operation is required in the hit-or-miss transform. A more useful expression for thinning A symmetrically is based on a sequence of structuring elements:

    \{B\} = \{B^1, B^2, B^3, \ldots, B^n\}        (9.5-7)

where Bⁱ is a rotated version of Bⁱ⁻¹. Using this concept, we now define thinning by a sequence of structuring elements as

    A \otimes \{B\} = ((\cdots((A \otimes B^1) \otimes B^2)\cdots) \otimes B^n)        (9.5-8)

The process is to thin A by one pass with B¹, then thin the result with one pass of B², and so on, until A is thinned with one pass of Bⁿ. The entire process is repeated until no further changes occur. Each individual thinning pass is performed using Eq. (9.5-6).
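A single thinning pass of Eq. (9.5-6) can be sketched as follows (ours; SciPy assumed). The masks b1 and b2 shown are our own hypothetical example of one directional element of a thinning sequence; the actual sequence in Fig. 9.21(a) supplies eight such pairs.

import numpy as np
from scipy import ndimage

def thin_once(A, b1, b2):
    """One pass of Eq. (9.5-6): remove the hit-or-miss matches of (b1, b2)."""
    hits = ndimage.binary_hit_or_miss(A, structure1=b1, structure2=b2)
    return A & ~hits

# Hypothetical example pair: 1s required below, 0s required above,
# "don't care" at the middle-row ends (present in neither mask).
b1 = np.array([[0, 0, 0], [0, 1, 0], [1, 1, 1]], dtype=bool)
b2 = np.array([[1, 1, 1], [0, 0, 0], [0, 0, 0]], dtype=bool)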
FIGURE 9.20 Result of limiting growth of the convex hull algorithm to the maximum dimensions of the original set of points along the vertical and horizontal directions.

Figure 9.21(a) shows a set of structuring elements commonly used for


thinning, and Fig. 9.21(b) shows a set A to be thinned by using the proce-
dure just discussed. Figure 9.21(c) shows the result of thinning after one
pass of A with B1, and Figs. 9.21(d) through (k) show the results of passes
with the other structuring elements. Convergence was achieved after the
second pass of B6. Figure 9.21(l) shows the thinned result. Finally, Fig.
9.21(m) shows the thinned set converted to m-connectivity (see Section
2.5.2) to eliminate multiple paths.

9.5.6 Thickening
Thickening is the morphological dual of thinning and is defined by the expression

    A \odot B = A \cup (A \circledast B)        (9.5-9)

FIGURE 9.21 (a) Sequence of rotated structuring elements used for thinning. (b) Set A. (c) Result of thinning with the first element. (d)–(i) Results of thinning with the next seven elements (there was no change between the seventh and eighth elements). (j) Result of using the first four elements again. (l) Result after convergence. (m) Conversion to m-connectivity.

where B is a structuring element suitable for thickening. As in thinning, thickening can be defined as a sequential operation:

    A \odot \{B\} = ((\cdots((A \odot B^1) \odot B^2)\cdots) \odot B^n)        (9.5-10)

The structuring elements used for thickening have the same form as those shown in Fig. 9.21(a), but with all 1s and 0s interchanged. However, a separate algorithm for thickening is seldom used in practice. Instead, the usual procedure is to thin the background of the set in question and then complement the result. In other words, to thicken a set A, we form C = A^c, thin C, and then form C^c. Figure 9.22 illustrates this procedure.
Depending on the nature of A, this procedure can result in disconnected points, as Fig. 9.22(d) shows. Hence thickening by this method usually is followed by postprocessing to remove disconnected points. Note from Fig. 9.22(c) that the thinned background forms a boundary for the thickening process. This useful feature is not present in the direct implementation of thickening using Eq. (9.5-10), and it is one of the principal reasons for using background thinning to accomplish thickening.

9.5.7 Skeletons
As Fig. 9.23 shows, the notion of a skeleton, S(A), of a set A is intuitively simple. We deduce from this figure that
(a) If z is a point of S(A) and (D)_z is the largest disk centered at z and contained in A, one cannot find a larger disk (not necessarily centered at z) containing (D)_z and included in A. The disk (D)_z is called a maximum disk.
(b) The disk (D)_z touches the boundary of A at two or more different places.
FIGURE 9.22 (a) Set A. (b) Complement of A. (c) Result of thinning the complement of A. (d) Thickened set obtained by complementing (c). (e) Final result, with no disconnected points.

FIGURE 9.23 (a) Set A. (b) Various positions of maximum disks with centers on the skeleton of A. (c) Another maximum disk on a different segment of the skeleton of A. (d) Complete skeleton.

The skeleton of A can be expressed in terms of erosions and openings. That is, it can be shown (Serra [1982]) that

    S(A) = \bigcup_{k=0}^{K} S_k(A)        (9.5-11)

with

    S_k(A) = (A \ominus kB) - (A \ominus kB) \circ B        (9.5-12)

where B is a structuring element, and (A ⊖ kB) indicates k successive erosions of A:

    (A \ominus kB) = ((\cdots((A \ominus B) \ominus B) \ominus \cdots) \ominus B)        (9.5-13)

k times, and K is the last iterative step before A erodes to an empty set. In other words,

    K = \max\{ k \mid (A \ominus kB) \neq \varnothing \}        (9.5-14)

The formulation given in Eqs. (9.5-11) and (9.5-12) states that S(A) can be obtained as the union of the skeleton subsets S_k(A). Also, it can be shown that A can be reconstructed from these subsets by using the equation

    A = \bigcup_{k=0}^{K} (S_k(A) \oplus kB)        (9.5-15)

where (S_k(A) ⊕ kB) denotes k successive dilations of S_k(A); that is,

    (S_k(A) \oplus kB) = ((\cdots((S_k(A) \oplus B) \oplus B) \oplus \cdots) \oplus B)        (9.5-16)
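The sketch below (ours; NumPy/SciPy assumed) accumulates the skeleton subsets of Eq. (9.5-12) until the eroded set becomes empty, per Eq. (9.5-14):

import numpy as np
from scipy import ndimage

A = np.zeros((16, 16), dtype=bool)
A[3:13, 3:13] = True                # a 10 x 10 square
B = np.ones((3, 3), dtype=bool)

S = np.zeros_like(A)                # will hold S(A), Eq. (9.5-11)
eroded = A
while eroded.any():
    opened = ndimage.binary_opening(eroded, structure=B)
    S |= eroded & ~opened           # S_k(A) = (A erosion kB) - its opening by B
    eroded = ndimage.binary_erosion(eroded, structure=B)

print(np.argwhere(S).tolist())      # the 2 x 2 center of the square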

EXAMPLE 9.8: Computing the skeleton of a simple figure.
■ Figure 9.24 illustrates the concepts just discussed. The first column shows the original set (at the top) and two erosions by the structuring element B. Note that one more erosion of A would yield the empty set, so K = 2 in this case. The second column shows the opening of the sets in the first column by B. These results are easily explained by the fitting characterization of the opening operation discussed in connection with Fig. 9.8. The third column simply contains the set differences between the first and second columns.
The fourth column contains two partial skeletons and the final result (at the bottom of the column). The final skeleton not only is thicker than it needs to be but, more important, it is not connected. This result is not unexpected, as nothing in the preceding formulation of the morphological skeleton guarantees connectivity. Morphology produces an elegant formulation in terms of erosions and openings of the given set. However, heuristic formulations such as the algorithm developed in Section 11.1.7 are needed if, as is usually the case, the skeleton must be maximally thin, connected, and minimally eroded.

FIGURE 9.24 Implementation of Eqs. (9.5-11) through (9.5-15). The original set is at the top left, and its morphological skeleton is at the bottom of the fourth column. The reconstructed set is at the bottom of the sixth column.

The fifth column shows S₀(A), S₁(A) ⊕ B, and (S₂(A) ⊕ 2B) = (S₂(A) ⊕ B) ⊕ B. Finally, the last column shows reconstruction of set A, which, according to Eq. (9.5-15), is the union of the dilated skeleton subsets shown in the fifth column. ■

9.5.8 Pruning
Pruning methods are an essential complement to thinning and skeletonizing algorithms because these procedures tend to leave parasitic components that need to be "cleaned up" by postprocessing. We begin the discussion with a pruning problem and then develop a morphological solution based on the material introduced in the preceding sections. Thus, we take this opportunity to illustrate how to go about solving a problem by combining several of the techniques discussed up to this point.
A common approach in the automated recognition of hand-printed characters is to analyze the shape of the skeleton of each character. These skeletons often are characterized by "spurs" (parasitic components). Spurs are caused during erosion by nonuniformities in the strokes composing the characters. We develop a morphological technique for handling this problem, starting with the assumption that the length of a parasitic component does not exceed a specified number of pixels.
Figure 9.25(a) shows the skeleton of a hand-printed "a." The parasitic component on the leftmost part of the character is illustrative of what we are interested in removing. The solution is based on suppressing a parasitic branch by successively eliminating its end point. (We may define an end point as the center point of a 3 × 3 region that satisfies any of the arrangements in Fig. 9.25(b) or (c).) Of course, this also shortens (or eliminates) other branches in the character but, in the absence of other structural information, the assumption in this example is that any branch with three or fewer pixels is to be eliminated. Thinning of an input set A with a sequence of structuring elements designed to detect only end points achieves the desired result. That is, let

    X_1 = A \otimes \{B\}        (9.5-17)

where {B} denotes the structuring element sequence shown in Figs. 9.25(b) and (c) [see Eq. (9.5-7) regarding structuring-element sequences]. The sequence of structuring elements consists of two different structures, each of which is rotated 90° for a total of eight elements. The * in Fig. 9.25(b) signifies a "don't care" condition, in the sense that it does not matter whether the pixel in that location has a value of 0 or 1. Numerous results reported in the literature on morphology are based on the use of a single structuring element, similar to the one in Fig. 9.25(b), but having "don't care" conditions along the entire first column. This is incorrect. For example, this element would identify the point located in the eighth row, fourth column of Fig. 9.25(a) as an end point, thus eliminating it and breaking connectivity in the stroke.
Applying Eq. (9.5-17) to A three times yields the set X1 in Fig. 9.25(d). The
next step is to “restore” the character to its original form, but with the parasitic

FIGURE 9.25 (a) Original image. (b) and (c) Structuring elements used for deleting end points (B¹–B⁴ and B⁵–B⁸, each rotated 90°). (d) Result of three cycles of thinning. (e) End points of (d). (f) Dilation of end points conditioned on (a). (g) Pruned image.
branches removed. To do so first requires forming a set X₂ containing all end points in X₁ [Fig. 9.25(e)]:

    X_2 = \bigcup_{k=1}^{8} (X_1 \circledast B^k)        (9.5-18)

where the Bᵏ are the same end-point detectors shown in Figs. 9.25(b) and (c). The next step is dilation of the end points three times, using set A as a delimiter:

    X_3 = (X_2 \oplus H) \cap A        (9.5-19)

where H is a 3 × 3 structuring element of 1s and the intersection with A is applied after each step. [Equation (9.5-19) is the basis for morphological reconstruction by dilation, as explained in the next section.] As in the case of region filling and extraction of connected components, this type of conditional dilation prevents the creation of 1-valued elements outside the region of interest, as evidenced by the result shown in Fig. 9.25(f). Finally, the union of X₃ and X₁ yields the desired result,

    X_4 = X_1 \cup X_3        (9.5-20)

in Fig. 9.25(g).
In more complex scenarios, use of Eq. (9.5-19) sometimes picks up the
“tips” of some parasitic branches. This condition can occur when the end

points of these branches are near the skeleton. Although Eq. (9.5-17) may eliminate them, they can be picked up again during dilation because they are valid points in A. Unless entire parasitic elements are picked up again (a rare case if these elements are short with respect to valid strokes), detecting and eliminating them is easy because they are disconnected regions.
A natural thought at this juncture is that there must be easier ways to solve this problem. For example, we could just keep track of all deleted points and simply reconnect the appropriate points to all end points left after application of Eq. (9.5-17). This option is valid, but the advantage of the formulation just presented is that the use of simple morphological constructs solved the entire problem. In practical situations when a set of such tools is available, the advantage is that no new algorithms have to be written. We simply combine the necessary morphological functions into a sequence of operations.

9.5.9 Morphological Reconstruction

The morphological concepts discussed thus far involve an image and a structuring element. In this section, we discuss a powerful morphological transformation called morphological reconstruction that involves two images and a structuring element. One image, the marker, contains the starting points for the transformation. The other image, the mask, constrains the transformation. The structuring element is used to define connectivity.†

Geodesic dilation and erosion
Central to morphological reconstruction are the concepts of geodesic dilation and geodesic erosion. Let F denote the marker image and G the mask image. It is assumed in this discussion that both are binary images and that F ⊆ G. The geodesic dilation of size 1 of the marker image with respect to the mask, denoted by D_G^(1)(F), is defined as

    D_G^{(1)}(F) = (F \oplus B) \cap G        (9.5-21)

where ∩ denotes the set intersection (here ∩ may be interpreted as a logical AND, because the set intersection and logical AND operations are the same for binary sets). The geodesic dilation of size n of F with respect to G is defined as

    D_G^{(n)}(F) = D_G^{(1)}\!\left[ D_G^{(n-1)}(F) \right]        (9.5-22)

with D_G^(0)(F) = F. In this recursive expression, the set intersection in Eq. (9.5-21) is performed at each step.‡ Note that the intersection operator guarantees that mask G will limit the growth (dilation) of marker F.


† In much of the literature on morphological reconstruction, the structuring element is tacitly assumed to be isotropic and typically is called an elementary isotropic structuring element. In the context of this chapter, an example of such an SE is simply a 3 × 3 array of 1s with the origin at the center.
‡ Although it is more intuitive to develop morphological-reconstruction methods using recursive formulations (as we do here), their practical implementation typically is based on more computationally efficient algorithms (see, for example, Vincent [1993] and Soille [2003]). All image-based examples in this section were generated using such algorithms.

FIGURE 9.26 Illustration of geodesic dilation. The panels show the marker F, the marker dilated by B, the geodesic dilation D_G^(1)(F), and the mask G.

Figure 9.26 shows a simple example of a geodesic dilation of size 1. The steps in the figure are a direct implementation of Eq. (9.5-21).
Similarly, the geodesic erosion of size 1 of marker F with respect to mask G is defined as

    E_G^{(1)}(F) = (F \ominus B) \cup G        (9.5-23)

where ∪ denotes set union (or the OR operation). The geodesic erosion of size n of F with respect to G is defined as

    E_G^{(n)}(F) = E_G^{(1)}\!\left[ E_G^{(n-1)}(F) \right]        (9.5-24)

with E_G^(0)(F) = F. The set union operation in Eq. (9.5-23) is performed at each iterative step, and guarantees that the geodesic erosion of an image remains greater than or equal to its mask image. As expected from the forms in Eqs. (9.5-21) and (9.5-23), geodesic dilation and erosion are duals with respect to set complementation (see Problem 9.29). Figure 9.27 shows a simple example of geodesic erosion of size 1. The steps in the figure are a direct implementation of Eq. (9.5-23).

FIGURE 9.27 Illustration of geodesic erosion. The panels show the marker F, the marker eroded by B, the geodesic erosion E_G^(1)(F), and the mask G.

Geodesic dilation and erosion of finite images always converge after a finite number of iterative steps, because propagation or shrinking of the marker image is constrained by the mask.
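The sketch below (ours; NumPy/SciPy assumed) iterates Eq. (9.5-21) three times, i.e., computes a geodesic dilation of size 3 as in Eq. (9.5-22):

import numpy as np
from scipy import ndimage

G = np.zeros((8, 8), dtype=bool)    # mask
G[1:7, 1:7] = True
F = np.zeros_like(G)                # marker, a subset of G
F[3, 3] = True

B = np.ones((3, 3), dtype=bool)
D = F
for _ in range(3):                  # sizes n = 1, 2, 3
    D = ndimage.binary_dilation(D, B) & G    # one step of Eq. (9.5-21)

print(D.sum())                      # 36: the marker has grown to fill the mask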

Morphological reconstruction by dilation and by erosion

Based on the preceding concepts, morphological reconstruction by dilation of a mask image G from a marker image F, denoted R_G^D(F), is defined as the geodesic dilation of F with respect to G, iterated until stability is achieved; that is,

    R_G^D(F) = D_G^{(k)}(F)        (9.5-25)

with k such that D_G^(k)(F) = D_G^(k+1)(F).
Figure 9.28 illustrates reconstruction by dilation. Figure 9.28(a) continues the process begun in Fig. 9.26; that is, the next step in reconstruction after obtaining D_G^(1)(F) is to dilate this result and then AND it with the mask G to yield D_G^(2)(F), as Fig. 9.28(b) shows. Dilation of D_G^(2)(F) and masking with G then yields D_G^(3)(F), and so on. This procedure is repeated until stability is reached. If we carried this example one more step, we would find that D_G^(5)(F) = D_G^(6)(F), so the morphologically reconstructed image by dilation is given by R_G^D(F) = D_G^(5)(F), as indicated in Eq. (9.5-25). Note that the reconstructed image in this case is identical to the mask, because F contained a single 1-valued pixel (this is analogous to convolution of an image with an impulse, which simply copies the image at the location of the impulse, as explained in Section 3.4.2).
In a similar manner, the morphological reconstruction by erosion of a mask image G from a marker image F, denoted R_G^E(F), is defined as the geodesic erosion of F with respect to G, iterated until stability; that is,

    R_G^E(F) = E_G^(k)(F)                                       (9.5-26)

with k such that E_G^(k)(F) = E_G^(k+1)(F). As an exercise, you should generate a figure similar to Fig. 9.28 for morphological reconstruction by erosion.

[Figure 9.28: Illustration of morphological reconstruction by dilation. F, G, B, and D_G^(1)(F) are from Fig. 9.26. Successive panels show each D_G^(k)(F) dilated by B and the resulting D_G^(k+1)(F), ending with D_G^(5)(F) = R_G^D(F).]

Reconstruction by dilation and erosion are duals with respect to set complementation (see Problem 9.30).
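A direct sketch of Eqs. (9.5-25) and (9.5-26) follows, reusing B and the imports from the earlier sketch; iteration simply continues until two successive results are identical. As the footnote notes, practical implementations use algorithms considerably faster than this literal recursion.

```python
def reconstruct_by_dilation(F, G):
    # Eq. (9.5-25): iterate size-1 geodesic dilations until stability.
    prev = F
    while True:
        cur = binary_dilation(prev, structure=B) & G
        if np.array_equal(cur, prev):  # D_G^(k) == D_G^(k+1): stability
            return cur
        prev = cur

def reconstruct_by_erosion(F, G):
    # Eq. (9.5-26): iterate size-1 geodesic erosions until stability.
    prev = F
    while True:
        cur = binary_erosion(prev, structure=B) | G
        if np.array_equal(cur, prev):  # E_G^(k) == E_G^(k+1): stability
            return cur
        prev = cur
```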
Sample applications
Morphological reconstruction has a broad spectrum of practical applications, each determined by the selection of the marker and mask images, by the structuring elements used, and by combinations of the primitive operations defined in the preceding discussion. The following examples illustrate the usefulness of these concepts.

Opening by reconstruction: In a morphological opening, erosion removes small objects and the subsequent dilation attempts to restore the shape of the objects that remain. However, the accuracy of this restoration is highly dependent on the similarity between the shapes of the objects and the structuring element used. Opening by reconstruction restores exactly the shapes of the objects that remain after erosion. The opening by reconstruction of size n of an image F is defined as the reconstruction by dilation of F from the erosion of size n of F; that is,

    O_R^(n)(F) = R_F^D[(F ⊖ nB)]                                (9.5-27)

where (F ⊖ nB) indicates n erosions of F by B, as explained in Section 9.5.7. Note that F is used as the mask in this application. A similar expression can be written for closing by reconstruction (see Table 9.1).
Figure 9.29 shows an example of opening by reconstruction. In this illustration, we are interested in extracting from Fig. 9.29(a) the characters that contain long, vertical strokes. Opening by reconstruction requires at least one erosion, so we perform that step first.

[Figure 9.29: (a) Text image of size 918 × 2018 pixels; the approximate average height of the tall characters is 50 pixels. (b) Erosion of (a) with a structuring element of size 51 × 1 pixels. (c) Opening of (a) with the same structuring element, shown for reference. (d) Result of opening by reconstruction.]
Figure 9.29(b) shows the erosion of Fig. 9.29(a) with a structuring element of length proportional to the average height of the tall characters (51 pixels) and width of one pixel. For the purpose of comparison, we computed the opening of the image using the same structuring element; Fig. 9.29(c) shows the result. Finally, Fig. 9.29(d) is the opening by reconstruction of size 1 of F [i.e., O_R^(1)(F)] given in Eq. (9.5-27). This result shows that the characters containing long vertical strokes were restored accurately; all other characters were removed.
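A sketch of Eq. (9.5-27), together with its dual for closing by reconstruction from Table 9.1, might look as follows; it reuses reconstruct_by_dilation and reconstruct_by_erosion from the preceding sketch, and the commented usage lines reproduce the setup of Fig. 9.29 under the assumption that text_image is a binary NumPy array.

```python
def opening_by_reconstruction(F, se, n=1):
    # Eq. (9.5-27): reconstruct F (used as the mask) by dilation,
    # starting from the n-fold erosion of F by se (the marker).
    marker = binary_erosion(F, structure=se, iterations=n)
    return reconstruct_by_dilation(marker, F)

def closing_by_reconstruction(F, se, n=1):
    # Dual form (Table 9.1): reconstruct F by erosion from n dilations of F.
    marker = binary_dilation(F, structure=se, iterations=n)
    return reconstruct_by_erosion(marker, F)

# Example corresponding to Fig. 9.29: keep characters with long vertical strokes.
# vertical_se = np.ones((51, 1), dtype=bool)
# result = opening_by_reconstruction(text_image, vertical_se)
```

Note that the coarse structuring element se is used only to form the marker; the reconstruction itself propagates with the elementary 3 × 3 SE, which is why the surviving shapes are restored exactly.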

Filling holes: In Section 9.5.2, we developed an algorithm for filling holes based on knowing a starting point in each hole in the image. Here, we develop a fully automated procedure based on morphological reconstruction. Let I(x, y) denote a binary image and suppose that we form a marker image F that is 0 everywhere, except at the image border, where it is set to 1 - I; that is,

    F(x, y) = 1 - I(x, y)   if (x, y) is on the border of I
            = 0             otherwise                           (9.5-28)

Then

    H = [R_{I^c}^D(F)]^c                                        (9.5-29)

is a binary image equal to I with all holes filled.
Let us consider the individual components of Eq. (9.5-29) to see how this expression in fact leads to all holes in an image being filled. Figure 9.30(a) shows a simple image I containing one hole, and Fig. 9.30(b) shows its complement. Note that because the complement of I sets all foreground (1-valued) pixels to background (0-valued) pixels, and vice versa, this operation in effect builds a "wall" of 0s around the hole. Because I^c is used as an AND mask, all we are doing here is protecting all foreground pixels (including the wall around the hole) from changing during iteration of the procedure. Figure 9.30(c) is array F formed according to Eq. (9.5-28), and Fig. 9.30(d) is F dilated with a 3 × 3 SE whose elements are all 1s. Note that marker F has a border of 1s (except at locations where I is 1), so the dilation of the marker points starts at the border and proceeds inward. Figure 9.30(e) shows the geodesic dilation of F using I^c as the mask. As was just indicated, all locations in this result corresponding to foreground pixels of I are 0, and this is now true for the hole pixels as well. Another iteration will yield the same result which, when complemented as required by Eq. (9.5-29), gives the result in Fig. 9.30(f). As desired, the hole is now filled and the rest of image I is unchanged. The operation H ∩ I^c yields an image containing 1-valued pixels in the locations corresponding to the holes in I, as Fig. 9.30(g) shows.

[Figure 9.30: Illustration of hole filling on a simple image. Panels: I; I^c; F; F ⊕ B; (F ⊕ B) ∩ I^c; H; H ∩ I^c.]

[Figure 9.31: (a) Text image of size 918 × 2018 pixels. (b) Complement of (a) for use as a mask image. (c) Marker image. (d) Result of hole filling using Eq. (9.5-29).]
Figure 9.31 shows a more practical example. Figure 9.31(b) shows the complement of the text image in Fig. 9.31(a), and Fig. 9.31(c) is the marker image, F, generated using Eq. (9.5-28). This image has a border of 1s, except at locations corresponding to 1s in the border of the original image. Finally, Fig. 9.31(d) shows the image with all the holes filled.
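The hole-filling procedure of Eqs. (9.5-28) and (9.5-29) translates almost line for line into the following sketch, again reusing reconstruct_by_dilation from the earlier sketch; with boolean arrays, the ~ operator computes the set complement.

```python
def fill_holes(I):
    # Eq. (9.5-28): marker is 0 everywhere except on the image border,
    # where it is set to 1 - I.
    F = np.zeros_like(I)
    F[0, :], F[-1, :] = ~I[0, :], ~I[-1, :]
    F[:, 0], F[:, -1] = ~I[:, 0], ~I[:, -1]
    # Eq. (9.5-29): H = [R^D_{I^c}(F)]^c, with the complement of I as the mask.
    return ~reconstruct_by_dilation(F, ~I)
```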

Border clearing: The extraction of objects from an image for subsequent shape analysis is a fundamental task in automated image processing. An algorithm for removing objects that touch (i.e., are connected to) the border is a useful tool because (1) it can be used to screen images so that only complete objects remain for further processing, or (2) it can be used as a signal that partial objects are present in the field of view. As a final illustration of the concepts introduced in this section, we develop a border-clearing procedure based on morphological reconstruction. In this application, we use the original image as the mask and the following marker image:

    F(x, y) = I(x, y)   if (x, y) is on the border of I
            = 0         otherwise                               (9.5-30)

The border-clearing algorithm first computes the morphological reconstruction R_I^D(F) (which simply extracts the objects touching the border) and then computes the difference

    X = I - R_I^D(F)                                            (9.5-31)

to obtain an image, X, with no objects touching the border.
[Figure 9.32: Border clearing. (a) Marker image. (b) Image with no objects touching the border. The original image is Fig. 9.29(a).]
[Figure 9.33: Five basic types of structuring elements used for binary morphology, labeled I through V. The origin of each element is at its center and the *'s indicate "don't care" values; types III through V are sequences B^i of elements generated by rotations of 90° or 45°.]
As an example, consider the text image again. Figure 9.32(a) shows the reconstruction R_I^D(F) obtained using a 3 × 3 structuring element of all 1s (note the objects touching the boundary on the right side), and Fig. 9.32(b) shows image X, computed using Eq. (9.5-31). If the task at hand were automated character recognition, having an image in which no characters touch the border is most useful, because the problem of having to recognize partial characters (a difficult task at best) is avoided.
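A corresponding sketch of Eqs. (9.5-30) and (9.5-31): the marker copies the input on the border and is 0 elsewhere, and the set difference in Eq. (9.5-31) becomes an AND with the complement of the reconstruction. As before, this reuses reconstruct_by_dilation and assumes a binary NumPy array.

```python
def clear_border(I):
    # Eq. (9.5-30): marker equals I on the border of the image, 0 elsewhere.
    F = np.zeros_like(I)
    F[0, :], F[-1, :] = I[0, :], I[-1, :]
    F[:, 0], F[:, -1] = I[:, 0], I[:, -1]
    # Eq. (9.5-31): X = I - R^D_I(F); the reconstruction grows the border
    # markers into every object they touch, which is then subtracted from I.
    return I & ~reconstruct_by_dilation(F, I)
```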

9.5.10 Summary of Morphological Operations on Binary Images

Table 9.1 summarizes the morphological results developed in the preceding sections, and Fig. 9.33 summarizes the basic types of structuring elements used in the various morphological processes discussed thus far. The Roman numerals in the third column of Table 9.1 refer to the structuring elements in Fig. 9.33.

TABLE 9.1  Summary of morphological operations and their properties. Each entry lists the operation, its defining equation(s), and comments; the Roman numerals refer to the structuring elements in Fig. 9.33.

Translation:
    (B)_z = {w | w = b + z, for b ∈ B}
    Translates the origin of B to point z.

Reflection:
    B̂ = {w | w = -b, for b ∈ B}
    Reflects all elements of B about the origin of this set.

Complement:
    A^c = {w | w ∉ A}
    Set of points not in A.

Difference:
    A - B = {w | w ∈ A, w ∉ B} = A ∩ B^c
    Set of points that belong to A but not to B.

Dilation:
    A ⊕ B = {z | (B̂)_z ∩ A ≠ ∅}
    "Expands" the boundary of A. (I)

Erosion:
    A ⊖ B = {z | (B)_z ⊆ A}
    "Contracts" the boundary of A. (I)

Opening:
    A ∘ B = (A ⊖ B) ⊕ B
    Smoothes contours, breaks narrow isthmuses, and eliminates small islands and sharp peaks. (I)

Closing:
    A • B = (A ⊕ B) ⊖ B
    Smoothes contours, fuses narrow breaks and long thin gulfs, and eliminates small holes. (I)

Hit-or-miss transform:
    A ⊛ B = (A ⊖ B_1) ∩ (A^c ⊖ B_2) = (A ⊖ B_1) - (A ⊕ B̂_2)
    The set of points (coordinates) at which, simultaneously, B_1 found a match ("hit") in A and B_2 found a match in A^c.

Boundary extraction:
    β(A) = A - (A ⊖ B)
    Set of points on the boundary of set A. (I)

Hole filling:
    X_k = (X_{k-1} ⊕ B) ∩ A^c;  k = 1, 2, 3, ...
    Fills holes in A; X_0 = array of 0s with a 1 in each hole. (II)

Connected components:
    X_k = (X_{k-1} ⊕ B) ∩ A;  k = 1, 2, 3, ...
    Finds connected components in A; X_0 = array of 0s with a 1 in each connected component. (I)

Convex hull:
    X_k^i = (X_{k-1}^i ⊛ B^i) ∪ A;  i = 1, 2, 3, 4;  k = 1, 2, 3, ...;  X_0^i = A;  D^i = X_conv^i
    Finds the convex hull C(A) of set A as the union of the D^i, where "conv" indicates convergence in the sense that X_k^i = X_{k-1}^i. (III)

Thinning:
    A ⊗ B = A - (A ⊛ B) = A ∩ (A ⊛ B)^c
    A ⊗ {B} = ((...((A ⊗ B^1) ⊗ B^2)...) ⊗ B^n);  {B} = {B^1, B^2, B^3, ..., B^n}
    Thins set A. The first two equations give the basic definition of thinning. The last equations denote thinning by a sequence of structuring elements; this method is normally used in practice. (IV)

Thickening:
    A ⊙ B = A ∪ (A ⊛ B)
    A ⊙ {B} = ((...(A ⊙ B^1) ⊙ B^2 ...) ⊙ B^n)
    Thickens set A. (See the preceding comments on sequences of structuring elements.) Uses IV with 0s and 1s reversed.

Skeletons:
    S(A) = ∪_{k=0}^{K} S_k(A);  S_k(A) = (A ⊖ kB) - [(A ⊖ kB) ∘ B]
    Reconstruction of A:  A = ∪_{k=0}^{K} (S_k(A) ⊕ kB)
    Finds the skeleton S(A) of set A. The last equation indicates that A can be reconstructed from its skeleton subsets S_k(A). In all three equations, K is the value of the iterative step after which the set A erodes to the empty set. The notation (A ⊖ kB) denotes the kth iteration of successive erosions of A by B. (I)

Pruning:
    X_1 = A ⊗ {B};  X_2 = ∪_{k=1}^{8} (X_1 ⊛ B^k);  X_3 = (X_2 ⊕ H) ∩ A;  X_4 = X_1 ∪ X_3
    X_4 is the result of pruning set A. The number of times that the first equation is applied to obtain X_1 must be specified. Structuring elements V are used for the first two equations. In the third equation, H denotes structuring element I.

Geodesic dilation of size 1:
    D_G^(1)(F) = (F ⊕ B) ∩ G
    F and G are called the marker and mask images, respectively.

Geodesic dilation of size n:
    D_G^(n)(F) = D_G^(1)[D_G^(n-1)(F)];  D_G^(0)(F) = F

Geodesic erosion of size 1:
    E_G^(1)(F) = (F ⊖ B) ∪ G

Geodesic erosion of size n:
    E_G^(n)(F) = E_G^(1)[E_G^(n-1)(F)];  E_G^(0)(F) = F

Morphological reconstruction by dilation:
    R_G^D(F) = D_G^(k)(F)
    k is such that D_G^(k)(F) = D_G^(k+1)(F).

Morphological reconstruction by erosion:
    R_G^E(F) = E_G^(k)(F)
    k is such that E_G^(k)(F) = E_G^(k+1)(F).

Opening by reconstruction:
    O_R^(n)(F) = R_F^D[(F ⊖ nB)]
    (F ⊖ nB) indicates n erosions of F by B.

Closing by reconstruction:
    C_R^(n)(F) = R_F^E[(F ⊕ nB)]
    (F ⊕ nB) indicates n dilations of F by B.

Hole filling:
    H = [R_{I^c}^D(F)]^c
    H is equal to the input image I, but with all holes filled. See Eq. (9.5-28) for the definition of the marker image F.

Border clearing:
    X = I - R_I^D(F)
    X is equal to the input image I, but with all objects that touch (are connected to) the boundary removed. See Eq. (9.5-30) for the definition of the marker image F.