I. Brief Description
II. Early History of BASIC
III. Growth of BASIC
IV. Standardization and Its Failure
V. The Microcomputer Revolution
VI. Present and Future
GLOSSARY
BASIC Name of any of a large number of simple programming languages that are similar to, and ultimately derived from, the original Dartmouth BASIC of 1964.
Keyword Word in a computer language that has a special
meaning. (Keywords in BASIC include, for instance,
LET, PRINT, FOR, NEXT, TO, and STEP.)
Language In computing, a programming language. Programming languages, like human languages, consist of
words and symbols together with grammatical rules
that govern how they can be put together.
Line Same as a line of text, beginning with a line number
in original BASIC.
Line number Integer (whole number) that begins each
line of a BASIC program and serves as a kind of
serial number for that line. Line numbers also serve
as targets for GOTO statements.
List A list of a program is a printout, on the screen of a computer or on a hard-copy printer, of the text of the program (i.e., its lines).
Program Collection of statements, formed according to
the rules of the language and with the purpose of carrying out a particular computing task.
Run Actual carrying out of the instructions of the program by the computer.
Statement Instruction to the computer. In BASIC, statements are virtually synonymous with lines and usually
begin with a keyword.
Subroutine Portion of a program, usually set off from the
rest of the program, that carries out some specific task.
Subroutines are usually invoked by special instructions,
such as GOSUB and RETURN in original BASIC, or
CALL in modern versions.
Variable Word in a program, usually different from a
keyword, that stands for a quantity, just as in ordinary
algebra.
I. BRIEF DESCRIPTION

100 LET X = 3
110 LET Y = 4
120 LET Z = X + Y
130 PRINT Z
140 END

100 INPUT X, Y
110 LET Z = X + Y
120 PRINT Z
130 END

After the user typed RUN, the program stopped (temporarily), printed a question mark (?), and waited for the user to respond. The user then typed two numbers, separated by a comma, and then pressed the RETURN or ENTER key. The program then resumed, calculated Z (as the sum of the two numbers), and printed the answer. The result might look like this on the yellow paper, or on the screen of an early microcomputer:

RUN
? 3, 4
7

A user who wished to make several additions could arrange for the program to continue indefinitely, as in:

100 INPUT X, Y
110 LET Z = X + Y
120 PRINT Z
125 GOTO 100
130 END

RUN
? 3, 4
7
? 1.23, 4.56
5.79
? 17.5, 5.3
12.2
?

100 x = 3
110 y = 4
120 z = x + y
130 print z

100 INPUT X, Y
110 LET Z = X + Y
120 PRINT Z
125 GOTO 100
130 END
100 DO
110    INPUT x, y
115    LET z = x + y
120    PRINT z
125 LOOP
130 END
The collection of lines starting with 100 DO and ending with 125 LOOP is known as a loop. The interior lines
of the loop are carried out repeatedly, just as in the earlier example that used a GOTO statement. Notice, in addition, that the
program is written in mixed case (using both upper- and
lowercase letters), and the interior lines of the loop are indented. All modern versions of BASIC allow these stylistic
improvements.
Eliminating the GOTO statement (line 125) removes the
need to reference line numbers in the program. The line
numbers themselves, no longer serving a useful purpose,
can now be eliminated to get:
DO
INPUT x, y
LET z = x + y
PRINT z
LOOP
END
We could not have removed the line numbers from the version that used a GOTO statement (GOTO 100), because there would then no longer be a line numbered 100 in the program. Some earlier versions of BASIC allowed removing line numbers except for those used as GOTO targets, in which case the remaining line numbers became statement labels, a concept not present in the original BASIC.
The original BASIC provided 14 statement types:

LET     PRINT
READ    DATA
GOTO    IF-THEN
FOR     NEXT
GOSUB   RETURN
DIM     DEF
END     REM
LET, PRINT, and END were illustrated in the first example program. READ and DATA were used to supply
data to the program other than through LET statements.
(It is a strange fact that the first version of BASIC did
not have the INPUT statement.) GOTO and IF-THEN
provided the ability to transfer to other locations in the
program, either unconditionally or conditionally on the
result of some comparison. FOR and NEXT were used
together and formed a loop. GOSUB and RETURN provided a crude subroutine capability. DIM allowed the user
to specify the size of a vector or matrix. DEF allowed the
user to define a new function (in addition to the functions
such as SQR, SIN, and COS that BASIC included automatically). REM allowed the user to add comments or
other explanatory information to programs.
We shall illustrate all 14 statement types in two short
programs. The first program uses eight of the statement
types and prints a table of the values of the common logarithms (logarithms to the base 10) for a range of arguments
and a spacing given in a DATA statement:
100 REM TABLE OF COMMON LOGARITHMS
110 READ A, B, S
120 DATA 1, 2, 0.1
130 DEF FNC(X) = LOG(X)/LOG(10)
140 FOR X = A TO B STEP S
150 PRINT X, FNC(X)
160 NEXT X
170 END
(Common logarithms can be computed from natural logarithms with the formula shown in line 130. The program,
when run, prints a table of values of the common logarithm
for arguments 1, 1.1, 1.2, 1.3, etc., up to 2.)
The second program computes, stores in a vector, and
prints the Fibonacci numbers up to the first that exceeds
100. (The Fibonacci numbers are 1, 1, 2, 3, 5, 8, 13, . . . .
The first two, 1 and 1, are given; each succeeding one is
obtained by adding the previous two.)
100 REM FIBONACCI NUMBERS
110 DIM F(20)
120 LET F(1) = 1
130 LET F(2) = 1
140 LET N = 2
N = N + 1

N := N + 1;

and, in BASIC,

LET N = N + 1
100 FOR I = 1 TO 8
110 LET X(I) = I*I
120 NEXT I
130 FOR I = 1 TO 8
140 PRINT I, X(I)
150 NEXT I
160 END
It was not necessary to include a DIM statement to establish that X stood for a vector, as in:
99 DIM X(8)
Such a DIM statement could be included, to be sure, but
if one were satisfied to work with elements X(1) through
X(10), the DIM statement would not be required. While
supplying sensible default values is still a cornerstone of
BASIC, default dimensioning of arrays has given way to
multi-character function names.
7. BASIC Should be Blank Insensitive; That is,
a User Should be Able to Type in a Program
Without Regard to Blank Spaces
This feature was intended to ease life for beginning typists.
The idea was that:
100 LET N = N + 1
could be typed as
100LETN=N+1
This decision meant that only simple variable names could
be used. The allowable variable names consisted of either
single letters or single letters followed by single digits.
With this rule, the following program fragment was unambiguous:
100FORI=1TON
110LETX1=X1+Y9*I*SQR(N)
120NEXTI
It later became abundantly clear that permitting multicharacter variable names was far more important than
blank insensitivity. The reason that multi-character variable names and blank insensitivity cannot coexist is illustrated in the following example:
FOR I = A TO B STEP C
FOR I = A TO BSTEPC
If multi-character variable names were allowed, BASIC would be in a quandary: it could not distinguish the first form (the
variable I starts at the value of the variable A, finishes at
the value of the variable B, and is incremented with a
step size given by the value of the variable C) from the
second (the variable I starts at the value of the variable A
and finishes at the value of the variable BSTEPC, with the
step size understood to be 1). Giving up blank insensitivity
in favor of multi-character variable names resolves the
ambiguity (in favor of the second form).
8. Line Numbers Should Double as Editing
Aids and Targets of GOTO and IF-THEN
Statements
For the first 10 years of BASIC, this design decision
remained valid, but eventually video terminals replaced
Teletypes and soon supported screen editors. (A screen
editor permits making changes in the program simply
by moving the cursor around; with a screen editor, What
You See Is What You Get.) Line numbers were no longer
needed as editing aids. This period also saw the birth of
structured programming. One of the tenets of structured
programming is that the GOTO statements are simply not
needed, provided that one can use an IF-THEN-ELSE
structure and a general loop structure. If all old-fashioned
GOTO and IF-THEN statements are eliminated, line
numbers are not needed as targets for those statements.
Line numbers, no longer serving a useful purpose, can
quietly disappear.
D. BASIC Starts To Grow
BASIC quickly grew in response to the needs of its users. By 1969, which saw the appearance of the fifth version of BASIC at Dartmouth, the language had gained, among other things, facilities for manipulating strings. The CHANGE statement, for example, converted a string of characters into a list of their numeric character codes, and back again, as in:
200 CHANGE N$ TO N
210 FOR I = 1 TO N(0)
220 IF N(I) = 32 THEN GOTO 250
230 LET F(I) = N(I)
240 NEXT I
250 LET F(0) = I - 1
260 CHANGE F TO F$
The first CHANGE statement put the following numbers
into the list N:
Matrix operations were provided by MAT statements, as in

MAT T = INV(A)
where A and T both stood for square matrices having the
same size.
100 FILES GRADES
110 FOR S = 1 TO 3
120 INPUT #1: N$
130 LET T = 0
140 FOR J = 1 TO 4
150 INPUT #1: G
160 LET T = T + G
170 NEXT J
180 LET A = T/4
190 PRINT N$, A
200 NEXT S
210 END

The file GRADES contained a name followed by four grades for each student:

JONES
78
86
61
90
SMITH
66
87
88
91
WHITE
56
77
81
85
The purpose of the program was to average the grades of
several students. This example illustrates the type of file
called the terminal-format file, now called a text file. These
files consist entirely of printable characters. Many versions
of BASIC also included random-access files. That term
did not mean that the files contained random numbers;
it meant that any record in the file could be accessed in
roughly the same amount of time. Although the details
varied, those versions of BASIC that included files also
included the capabilities for erasing them, determining the
current position in the file, determining the file's length,
and so on.
first microcomputer versions of BASIC were very simple. They were similar in most ways to 1964 Dartmouth
BASIC, but since they were not based directly on
Dartmouth's BASIC, they were invariably different from
it, and from each other. The explosive technological development which spawned personal computers carried with
it two other major developments. The first was structured
programming, which, in simplest terms, made it possible
to write programs without using GOTO statements. The
second was the sudden availability of graphic displays.
Features taking advantage of these two developments
were added to almost every version of BASIC, and rarely
were the features alike, or even similar, among different
versions.
2. Optional LET
Space can be saved, and typing time reduced, by omitting
the keyword LET in a LET statement, as with:
100 X = 3
Allowing the omission of the keyword LET violated one
of the original premises of Dartmouth BASIC (that all
statements begin with a keyword so that the assignment
statement looks different from an assertion of equality),
but this feature is nonetheless quite popular among personal computer users, and most, but not all, versions of
BASIC allowed this feature.
3. Multiple Statements on a Line
Another feature motivated partially by the limited memory available was putting several statements on a line, as with:

100 X = 3: Y = 4: Z = 5

A complication arose with the single-line IF-THEN. A statement such as

100 IF X < Y THEN X = 0: Y = 0
meant that, if X were in fact less than Y, the two statements following the THEN were executed. But a programmer who followed the usual rule of putting several statements on a single line might believe that

100 X = 3: Y = 4: Z = 5

and

100 X = 3: Y = 4
101 Z = 5

are equivalent.
4. Commenting Conventions
All versions of BASIC have always allowed the REM
statement, which allows including remarks or comments
in the program. Such comments might include the programmer's name, brief descriptions of how the program
works, or a detailed explanation of some particularly tricky
section of code. Many versions of BASIC also allowed
comments on the same line as other BASIC statements.
Another common extension was a way of declaring certain variables to be integer variables, as in

200 DEFINT I-N

or

200 DECLARE INTEGER I-N

where the I-N means that all variables with names that begin with I, J, K, L, M, or N are to be treated as integer variables.
7. Strings Proliferate
Most versions of BASIC added string concatenation, which is simply the joining of two strings, end to end. Versions differed, however, in how they extracted substrings. If

a$ = "abcdefghijklmn"

then

MID$(a$, 5, 7) = "efghijk"

while

SEG$(a$, 5, 7) = "efg"
Inasmuch as MID$(a$, 1, 5) does give the same
result as SEG$(a$, 1, 5), this caused confusion
when users switched from one version of BASIC to
another.
Most BASICs also provided ways to locate or find various patterns in a string. For example, if:
100 FOR I = 1 TO 10
110 . . .
120 NEXT I
we know that the insides of the loop (line 110) will be executed exactly 10 times. This construct is not adequate for
situations where we want to carry out the loop until, for
example, some accuracy requirement has been met. Some
programmers used FOR-NEXT loops with the TO value
set sufficiently high and then used a GOTO statement to
jump out of the loop when the completion test was met.
(This trick is less desirable than using a general loop construct.) Other programmers fiddled with the loop variable
(the variable I in the above fragment) to force completion
of the loop when the completion test was met; this practice was even less desirable. Furthermore, the IF-THEN
statement in the original BASIC, based as it was on jumps
to other statements, did not provide the capability found
in the IF-THEN-ELSE.
To be sure, both the general loop and the general
IF-THEN-ELSE can be constructed using GOTO
and IF-THEN statements, but that is not the point. Programmers tend to make fewer errors when the language
they are using provides the constructs they need to carry
out their work.
As an illustration of the weaknesses of the original IF-THEN and GOTO statements, suppose we want to check a student's answer to a question in a drill program (the correct answer is 17). With the original statements one might write:

300 IF A = 17 THEN 330
310 PRINT "Wrong"
320 GOTO 340
330 PRINT "Right"
340 ...

With the structured IF-THEN-ELSE of later versions, the same check becomes:

300 IF a = 17 THEN
310    PRINT "Right"
320 ELSE
330    PRINT "Wrong"
340 END IF

(Indentation is used for clarity in both examples and is not essential.)
As the acceptance of structured programming grew, the
reputation of BASIC declined. This was a fair assessment of most microcomputer versions of BASIC. Many of
them added abbreviated constructs, such as the single-line
IF-THEN and IF-THEN-ELSE discussed earlier, but
these additions were often made with little thought as to
how they would fit with other language features, such as
multiple statements on a line.
2. Subroutines
Another limitation of early versions of BASIC was the almost complete dependence of the GOSUB and RETURN subroutine mechanism on global variables for passing values to and from a subroutine, as in:

200 LET X = A
210 LET Y = B
220 GOSUB 1000
230 LET C = Z
3. Graphics
The last big innovation that occurred during BASIC's
teenage years was the explosion in graphics. It is undoubtedly true that, while BASIC fell behind other languages in
adapting to structured programming, it led all other languages in embracing graphics.
Microcomputers are particularly adept at drawing pictures. We might even assert that they are better at that than
they are at printing numbers. Contrast that with big machines, which did better at number crunching. Drawing
a picture with a big machine entailed much more work
and much longer periods of waiting than it did with microcomputers. Since BASIC was the only language on
most early microcomputers, it was natural to include linedrawing and other graphics commands in the language. If
one wanted to draw pictures on a microcomputer, it is safe to assert that one would have preferred to use BASIC.
Surprisingly, the first interactive graphics versions of
BASIC were not on personal microcomputers; they were
on big time-sharing systems. Some of the early work was
done on the Dartmouth time-sharing system in the late
1960s, before the first personal computers became commercially available. There was one big problem in using
graphics with BASIC on most personal computers. Each
computer was different and allowed a different number
of pixels on the screen. (A pixel is a point on the screen.
A commercial television set has about 250,000 pixels, arranged in 500 rows of 500 pixels each.) Drawing a picture
on the screen of a different brand of personal computer,
or a different model of the same brand, required learning
how many pixels there were.
An example should make this clear. Suppose the screen allowed 40 pixels in the horizontal direction and 48 pixels in the vertical direction. Suppose that a version of BASIC had line-drawing commands as follows:

HLIN 10, 20 AT 30
VLIN 25, 35 AT 20

(These conventions correspond to medium-resolution color graphics in one popular early microcomputer BASIC.) The first would draw a horizontal line 30 pixels below the upper edge of the screen and extending from the 10th pixel from the left edge of the screen to the 20th pixel. The second would draw a vertical line 20 pixels from the left edge of the screen and from the 25th pixel below the top to the 35th pixel.
It is easier to understand these commands in terms of the coordinates of the Cartesian plane (most personal computers, however, turned the vertical axis upside down). The first draws the line given by the two end points (10, 30), (20, 30), while the second draws the line whose end points are (20, 25), (20, 35). To draw a small rectangle in the center of the screen, one might use:

HLIN 20, 28 AT 16
VLIN 16, 24 AT 20
HLIN 20, 28 AT 24
VLIN 16, 24 AT 28

Later versions of BASIC allow the programmer to work in problem coordinates rather than in pixels, with those coordinates established by a statement such as

SET WINDOW 0, 1, 0, 1

The rectangle-drawing code might then be reduced to a single statement.
to wonder how they could teach good program construction and modularization using a language that essentially
does not allow either. True, some better versions of BASIC were available, but only on certain larger machines such as larger computer-based time-sharing systems.
These were used in secondary schools and colleges, but
the total number of students who could be trained using
these better versions of BASIC was smaller than the
number who learned BASIC on a microcomputer. For
example, BASIC at Dartmouth College continued to grow
until, by the end of the 1970s, it contained many features
found only in more sophisticated languages. It allowed
external subroutines, for example, to be precompiled (preprocessed to save time) and placed in libraries for general
use. But, the work done at such isolated locations did not
find its way into general practice, and there was no way to
curb the individuality that dictated that different manufacturers have different versions of BASIC or that the same
manufacturer might have as many as 5 or 10 different
versions of BASIC on its different lines of computers.
That is, there was no way until an official standardization
activity commenced. We describe that activity in the next
section.
A. Standard BASIC
The ANSI standard for BASIC differs from the 1964 original BASIC in a number of ways. The first is that variable
names can contain any number of characters up to 31, instead of just one or two. The standard also allows general
names for new functions introduced by the DEF statement.
In the original BASIC, all such functions had names that
started with FN. Anytime one saw something like FNF(x,
y), one could be quite certain that it stood for a function
with two arguments. Standard BASIC allows more general names, such as cuberoot(x), instead of requiring the
name to begin with fn, such as fncuberoot(x).
Standard BASIC includes several loop constructs. The
general loop construct can have the exit condition attached
to the DO statement, attached to the LOOP statement, or
located somewhere in between. The following program
fragment illustrates the middle exit from a DO loop:
DO
   INPUT prompt "Enter x, n: ": x, n
   IF n = int(n) THEN EXIT DO
   PRINT "n must be an integer; please reenter."
LOOP
PRINT "x to the n-th power is"; x^n
Alternative ways of coding that fragment without using the
EXIT DO statement are either longer or more obscure.
Standard BASIC also allows exiting from the middle of a
FOR-NEXT loop. Standard BASIC includes a variety of constructs for making choices. It also provides substring notation, so that one can write, for example,
line$ [2:5]
which gives the second through fifth characters of the
string of characters contained in the string variable, line$.
Such special functions as SEG$, LEFT$, RIGHT$, and
MID$ are no longer needed.
Standard BASIC includes provision for both new function definitions and named subroutines with arguments. The function definitions can contain multiple lines, and subroutines may be external to the main program, as in:

! Main program
DO
   INPUT prompt "Enter two numbers: ": x, y
   CALL sum (x, y, s)
   PRINT "The sum is"; s
   PRINT
LOOP
END

! External subroutine
EXTERNAL SUB sum (a, b, c)
   LET c = a + b
END SUB
Standard BASIC also provides structured exception handling, as in the following fragment:

DO
   INPUT prompt "Filename: ": fname$
   WHEN EXCEPTION IN
      OPEN #1: name fname$
      EXIT DO
   USE
      PRINT "File "; fname$; " not there: retry."
   END WHEN
LOOP
This represents a typical need in a program: to allow the
user to give the name of a file to be used but to allow the
program to continue gracefully if that file happens to not
exist. (Modern versions of BASIC use a file open dialog
box, which presents in visual form a list of file names
from which to choose.)
Perhaps the single most important contribution of standard BASIC is its provision for graphics. The manner in which graphics is done is based on the GKS
(Graphics Kernel System) International Standard, with
a few exceptions and additions. The user always works
in user coordinates (sometimes called problem or world
coordinates) regardless of how many pixels the screen
contains. An example of how the user specifies these coordinates is the SET WINDOW statement shown earlier.
[Table comparing early microcomputers with later machines in terms of monitors, interaction modes, application size (simple computations versus several thousand lines of code), speed, memory (RAM), and disks.]
mouse-like pointing devices dictated far more sophisticated applications than those of a few years earlier. Rather
than dropping BASIC, the vendors rapidly added new
features to handle the new capabilities. For example, the
new applications required sophisticated treatment of objects such as windows and controls such as push buttons. For a time, these features were made available in the
usual way, by adding statements or subroutine libraries to
the language, but more programmer-friendly ways were
devised.
B. Visual Interface Building Tools
One major development was the introduction of graphical
tools to build the user interfaces. These interfaces, called
graphical user interfaces (GUI), replaced the old typed
commands of the original BASIC and the early microcomputer versions. With these tools, application developers can build their application's user interface by clicking
and dragging the mouse. The mouse is used to select the
control (e.g., a push button) and to move it to a new location
or resize it. Once the interface has been built, the programmer proceeds to supply substance to the application
by filling in the callback subroutines. Initially, the callback subroutines are empty, so the programmer must fill
them using a traditional programming language, BASIC in this case.
C. Object-Oriented Programming
Object-oriented programming provides a higher level way
for programmers to envision and develop their applications. Without attempting to define the concept, we merely
note that one deals with objects and methods. For example, an object might be a window, and a method might
be to display the window (make it visible). As applied
to BASIC, the concepts of object-oriented programming
are partly substantial and partly nomenclature. Dealing
with windows, movies, sound strips, internet access, and
so on is made simpler, at least conceptually, by thinking of
them as objects. At the other end of the spectrum, a BASIC
variable, such as x, can be thought of as an object, while
PRINT can be thought of as a method that can be applied
to that object. This is not to diminish the importance of
object-oriented programming. Its most important aspect
is that the detailed coding, which must exist at the lower
BIBLIOGRAPHY

American National Standards Institute. (1978). American National Standard for the Programming Language Minimal BASIC, X3.60-1978, ANSI, New York.
American National Standards Institute. (1987). American National Standard for the Programming Language BASIC, X3.113-1987, ANSI, New York.
American National Standards Institute. (1991). American National Standard for the Programming Language BASIC, Addendum, X3.113A-1991, ANSI, New York.
Frederick, J., ed. (1979). CONDUIT Basic Guide, Project CONDUIT, University of Iowa Press, Iowa City.
Kemeny, J. G., and Kurtz, T. E. (1968). Dartmouth time sharing, Science 162, 223-228.
Kemeny, J. G., and Kurtz, T. E. (1985). Back to BASIC, Addison-Wesley, Reading, MA.
Kurtz, T. E. (1980). In History of Programming Languages (R. L. Wexelblat, ed.), pp. 515-549, Academic Press, New York.
Sammet, J. (1969). Programming Languages: History and Fundamentals, Prentice-Hall, Englewood Cliffs, NJ.
GLOSSARY
ANSI C Version of C standardized by the ANSI (American National Standards Institute) X3J11 committee.
Array Data type that has multiple elements, all of the
same type. Individual elements are accessed using a
numeric index expression. C array elements are numbered from zero to one less than the number of array
elements.
Base class Class from which another can be created by
adding and/or redefining elements.
Cast C operation that converts an expression from one
type to another.
Class C++ data type whose members (elements) may
consist of both procedures and data (information). C++
classes are syntactically based on C structures.
Declaration Description of the name and characteristics
of a variable or procedure.
Dereference To access the value that a pointer expression
points toward.
Derived class Class that has been created from a base
class.
Embedded assignment Assignment statement that appears in a context where many people would expect
simple expressions, such as in the control expression
of an if or while statement.
Enumeration C data type that allows you to create symbolic names for constants.
Inheritance Capability of a programming language that
allows new data types to be created from existing types
by adding and/or redefining capabilities.
Iterate To perform a task repeatedly.
K&R C The first version of C, which was specified in
the book The C Programming Language by Kernighan
and Ritchie.
Multiple inheritance Similar to inheritance, but a capability in which new data types can be created simultaneously from several existing types.
Object An instance of a class. An object occupies a region of storage, is interpreted according to the conventions of the class, and is operated on by class member
functions.
Object-oriented programming Style of programming
in which classes are used to create software objects
whose use and function correspond to real world
objects.
Operand Value or expression that is acted on by an operator.
Operator Something that can combine and modify expressions, according to conventional rules, such as the
+ (add) operator that adds its operands, or the * (dereference) operator that can access a value pointed at by
a pointer expression.
Pointer Constant or variable whose value is used to access a value or a function. The type of a pointer indicates
what item the pointer accesses. The accessed item can
itself be a pointer.
Polymorphism Ability of a routine or operator to have
various behaviors based on the dynamically determined
(runtime) type of the operand.
Standard I/O library (stdio) Set of routines that provides input/output (I/O) facilities that are usable on
most systems that support C.
Strongly typed Said of a programming language that
only allows operations between variables of the same
(or similar) type.
Structure C data type that has multiple elements, each
with its own type and name.
Type Characteristic of a value (a variable, a constant, or
the return value of a procedure) that specifies what values it can attain and what operations can be performed
on it.
Type checking Checking performed by most languages
to make sure that operations are only performed between compatible data types.
Union C data type that has multiple elements, each with
its own type and name, but all of which are stored at
the same location.
Usual arithmetic conversions Conversions that C performs to allow expressions involving different types of
operands.
Weakly typed Said of a programming language that allows operations between various types.
WG21 ISO committee that developed a C++ standard.
X3J11 ANSI committee that developed a C standard.
X3J16 ANSI committee that developed a C++ standard.
of C, even if they contain additional features, will at minimum contain all of the features of K&R C.
In 1979, AT&T disseminated a paper written by B. R.
Rowland that specified a few minor changes to the C language. Some of these features were anticipated in the original K&R definition; others were a result of experience with
the language. A more formal description of these changes
appeared in the 1984 AT&T Bell Laboratories Technical
Journal. Most of these features have been in widespread
use since the early 1980s.
Although C was a widely used language by 1980, there
was no attempt to forge an official standard until 1984,
when the ANSI (American National Standards Institute)
committee X3J11 was convened. Thus, by the time the committee started work, there was already over a decade of experience with the language, and there was a huge existing
body of C language software that the committee wanted to
preserve. The role of the X3J11 committee was primarily
to codify existing practice, and only secondarily to add
new features or to fix existing problems.
During the early 1980s, while C's position as a leading development language was being consolidated, Bjarne
Stroustrup of AT&T Bell Laboratories developed a C language extension called C with Classes. The goal of C
with Classes was to create a more productive language
by adding higher level abstractions, such as those found
in Simula, to C. The major enhancement was the class, a
user definable data type that combines traditional data elements with procedures to create and manipulate the data
elements. Classes enable one to adopt an object-oriented
programming style, in which programs are composed of
software objects whose design and use mirrors that of real
world objects.
By 1985, C with Classes had evolved into C++ (pronounced "C plus plus"), and it began to be used outside
of AT&T Bell Laboratories. The most important reference
for C++ at that time was The Annotated C++ Reference
Manual by Ellis and Stroustrup. In 1987 the International
Standards Organization (ISO) formed Working Group 21
(WG21) to investigate standardization of C++, while at
about the same time ANSI convened committee X3J16 to
create a standard for the C++ programming language. In
late 1998 the standards efforts concluded with the publication of ISO/IEC 14882-1998, a standard for C++.
Building C++ on top of C has worked well for a variety
of reasons. First, C's relatively low-level approach provided a reasonable foundation upon which to add higher level features. A language that already had higher level features would have been a far less hospitable host. Second, programmers have found that C++'s compatibility with C has smoothed the transition, making it easier to move to a new programming paradigm. Third, one of Stroustrup's most important goals was to provide ad-
TABLE I  C's Basic Data Types

Type             Size (bytes)
char             1
unsigned char    1
short            2
unsigned short   2
int              4
unsigned int     4
long             4
unsigned long    4
float            4
double           8
are not widely used. Instead of enumerations, most programmers use the preprocessor #define statement to create
named constants. This approach lacks some of the merit
of enumerations, such as strong type checking, but preprocessor definitions, unlike enumerations, are available
on all implementations of C.
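As a brief sketch of the two approaches (the constant names are invented for illustration), a named constant can be created either with the preprocessor or with an enumeration:

/* Named constant created with the preprocessor: pure text replacement. */
#define MAX_OPEN_FILES 20

/* The same idea expressed as an enumeration, which the compiler can type-check. */
enum limits { max_open_files = 20 };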
In addition to its basic data types, C has four more
sophisticated data types: structures, unions, bitfields, and
arrays.
A structure is a way of grouping data. For example, an
employee's record might contain his or her name, address, title, salary, department, telephone number, and hiring date. A programmer could create a structure to store all of this information, thereby making it easier to store, retrieve, and manipulate each employee's data. Each element in a
structure has a name. C language structures are analogous
to records in a database.
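A sketch of such an employee structure (the member names are illustrative):

struct employee {
    char   name[40];        /* each element has its own name and type */
    char   title[40];
    double salary;
    int    department;
};

struct employee e;          /* one record; e.salary names its salary element */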
A union is a way of storing different types of items
in a single place. Of course, only one of those types can
be there at any one time. Unions are sometimes used to
provide an alternate access to a data type, such as accessing
the bytes in a long integer, but the most common use is to
save space in a large data set by storing only one of a set
of mutually exclusive pieces of information.
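A sketch of a union used for the byte-access case mentioned above (the names are illustrative):

/* Either a long integer or the bytes that make it up;
   only one interpretation is meaningful at a time. */
union word {
    long          as_long;
    unsigned char as_bytes[sizeof(long)];
};

union word w;               /* w.as_long and w.as_bytes share the same storage */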
A bitfield is somewhat like a structure, but each member
of a bitfield is one or more bits of a word. Bitfields are
a compact way to store small values, and they are also a
convenient way to refer to specific bits in computer control
registers.
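A sketch of a bitfield (field widths chosen purely for illustration):

/* Several small values packed into one word, as in a device control register. */
struct control {
    unsigned int enable : 1;    /* a single bit        */
    unsigned int mode   : 3;    /* three bits (0 to 7) */
    unsigned int count  : 12;   /* twelve bits         */
};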
An array is a sequence of data items. Each item in an
array is the same type as all the other items, though they
all may have different values. Each array has a name, but
the individual elements of an array do not. Instead, the
elements in an array are accessed using an index, which is
a numeric value. The first element in a C array is accessed
using the index 0, the next using the index 1, and so on
up to the highest index (which is always one less than the
number of elements in the array).
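A short sketch of an array and the usual way of visiting every element (the names are illustrative):

void clear_samples(void)
{
    double samples[8];      /* eight doubles: samples[0] through samples[7] */
    int i;

    for (i = 0; i < 8; i++)
        samples[i] = 0.0;   /* the index runs from 0 to one less than the size */
}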
Whereas there are many operations that can be performed on C's numeric data types, there are only a few
operations that are applicable to the four more complex
data types. These operations are summarized in Table II.
TABLE II Operations on C's Advanced Data Types (columns: Array; Structure, union, or bitfield)

TABLE III The Basic Arithmetic and Logical Operators
+     Addition (binary)
+     Force order of evaluation (unary)
-     Subtraction (binary)
-     Negation (unary)
*     Multiplication
/     Division
%     Remainder
++    Increment
--    Decrement
==    Equality
!=    Not equal
<     Less than
<=    Less than or equal
>     Greater than
>=    Greater than or equal
&&    Logical AND
||    Logical OR
!     Logical NOT
B. Operators
C is known as an operator-rich language for good reason. It
has many operators, and they can be used in ways that are
not allowed in tamer languages, such as Pascal. The basic
arithmetical and logical operators, detailed in Table III,
are present in most programming languages. The only unusual operators in Table III are increment and decrement.
For ordinary numeric variables, increment or decrement is
simply addition or subtraction of one. Thus the expression
i++;
(i is a variable; the expression is read aloud as "i plus plus") is the same as the expression

i = i + 1;

(again i is a variable; the expression is read aloud as "i is assigned the value i plus one"). However, the increment
and decrement operators can also be used inside a larger
expression, which provides a capability not provided by
an assignment statement. More on this special capability
in Section III.D.
Cs bit manipulation operators, shown in Table IV, provide a powerful facility for programming computer hardware. They provide the same capabilities as the traditional
logical operators but on individual bits or sets of bits. There
are also two shift operators, which allow the bits in an integer or character to be shifted left or right.
TABLE IV Bit Manipulation Operators
&     Bitwise AND
|     Bitwise OR
^     Bitwise exclusive OR
~     Bitwise complement
<<    Left shift
>>    Right shift
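A short sketch of these operators in action (the variable names and values are illustrative):

void bit_examples(void)
{
    unsigned int flags = 0x0F;           /* binary 0000 1111            */
    unsigned int anded = flags & 0x03;   /* 0000 0011                   */
    unsigned int ored  = flags | 0x30;   /* 0011 1111                   */
    unsigned int xored = flags ^ 0xFF;   /* 1111 0000                   */
    unsigned int inv   = ~flags;         /* every bit inverted          */
    unsigned int left  = flags << 2;     /* 0011 1100                   */
    unsigned int right = flags >> 2;     /* 0000 0011                   */
}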
TABLE V Assignment Operators
=     Assign
+=    Assign sum
-=    Assign difference
*=    Assign product
/=    Assign quotient
%=    Assign remainder
&=    Assign bitwise AND
|=    Assign bitwise OR
^=    Assign bitwise exclusive OR
<<=   Assign left shift
>>=   Assign right shift
C also has a set of assignment operators, listed in Table V, that combine assignment with another operation. For example, the statement

i = i * 2;

(i is a variable; this statement means that i is multiplied by two, and then the result is stored in i.) Using the multiplication assignment operator, this can be written

i *= 2;

In simple situations assignment operators do not provide much advantage. However, in more complex situations they can be very important. One advantage is that the address calculation (to access the variable) only needs to be performed once.
The indirection operator (*) accesses the item that a pointer expression points toward. If the expression

p

in a program symbolizes an expression that contains the address of a character, then the expression

*p

accesses that character. The expression

*(p + 1)

accesses the following character in memory, and so on. The address-of operator (&) does the opposite. It takes a reference to a variable and converts it to an address expression. For example, if f is a floating-point variable, the expression

f

accesses the floating-point variable, while the expression

&f

produces the address of f. Thus the expression
*&f
is equivalent to f by itself.
The sequential evaluation operator is used to sneak two
or more expressions into a place where, syntactically, only
one expression is expected. For example, C while loops use
a control expression to control the repetition of the loop.
While loops execute so long as the control expression is
true. A typical while statement is
i = 0;
while (i++ < 10)
processData();
TABLE VI Miscellaneous Operators
*        Indirection
&        Address of
,        Sequential evaluation
? :      Conditional (tertiary)
sizeof   Size of type or variable
(type)   Type cast
.        Member of
->       Member pointed toward
[ ]      Element of array
( )      Parentheses (grouping)
In this loop, the repetition will continue as long as the
value of the i variable is less than 10. The ++ increments
the value of i after each test is made, and the body of
the loop is a call of the function named processData. If
we want to have the control expression also increment a
variable named k each time, the sequential evaluation (comma) operator makes that possible.
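A sketch of such a control expression, continuing the loop above (k is assumed to be declared and initialized elsewhere):

i = 0;
while (k++, i++ < 10)       /* the comma operator evaluates k++, then i++ < 10 */
    processData();

The value of the whole control expression is the value of its last (rightmost) expression, so the loop still runs ten times.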
Another unusual operator is the conditional operator, which selects one of two expressions based on a test. Consider the following if statement:

if (index > 0)
    sx = stheta[index];
else
    sx = stheta[-index];

Instead of this four-line if statement, we can use a one-line conditional expression:

sx = (index > 0) ? stheta[index] : stheta[-index];
The operand of the sizeof operator can either be a variable or the name of a data type. For example, sizeof is often used in conjunction with the standard
C memory allocation routine, malloc(), which must be
passed the number of bytes to allocate. The statement
iptr = malloc(1000*sizeof(int));
(iptr is a pointer to an integer) allocates enough space
to store an array of 1000 integers. This statement will
work correctly on all machines that support C, even though
different machines have different sized integers, and thus
need to allocate different amounts of memory for the array.
C, as originally designed, was a very weakly typechecked language. In the earliest versions, pointers and
integers were (in many situations) treated equivalently,
and pointers to different types of structures could be used
interchangeably. Since that time, stronger and stronger
type checking has become the norm. By the time of the
ANSI C committee (mid-1980s), most implementations of
C encouraged programmers to pay much more attention
to issues of type compatibility.
One of the most important tools for managing types
is the cast operator. It allows a programmer to specify
that an expression should be converted to a given type.
For example, if variables named tnow and tzero are long
integers, the natural type of the expression tnow-tzero is
also a long integer. If you need another type, you can
specify the conversion using a cast operator.
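For instance (a sketch; it assumes the difference is wanted as a double):

long tnow, tzero;
double elapsed;

elapsed = (double)(tnow - tzero);   /* the cast converts the long result to double */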
The member-of operator (.) accesses an element of a structure, while the -> operator accesses an element of a structure through a pointer. If pBox is a pointer to a structure that has a member named TopRight, then the expression

pBox->TopRight

is equivalent to
(*pBox).TopRight
which uses the indirection operator (*) to dereference the
pointer-to-structure and then uses the member-of operator
to access the given member.
The last two operators in Table VI are square brackets (for array subscripting) and parentheses (for grouping). These two operators, together with the assignment
operator, are familiar features from other programming
languages, but they are not considered operators in most
languages. However, in C these elements are considered
to be operators to make the expression syntax as regular,
powerful, and complete as possible.
C. Control Structures
In contrast to its eclectic set of operators, C has an unremarkable collection of control structures. One possible
exception is the C for loop, which is more compact and
flexible than its analogs in other languages. The purpose
of the C control structures, like those in other languages,
is to provide alternatives to strictly sequential program
execution. The control structures make it easy to provide
alternate branches, multi-way branches, and repeated execution of parts of a program. (Of course, the most profound
control structure is the subroutine, which is discussed in
Section II.D.)
A compound statement is simply a group of statements
surrounded by curly braces. It can be used anywhere that
a simple statement can be used to make a group of statements into a single entity. The if statement, which provides two-way branching, has the following form:
if (expr)
statement1;
else
statement2;
The else part is optional, and either statement may be a
compound statement. (The expression and the statement
above are shown in italics, to indicate that they may be
any C expression or C statement. The words if and else
are keywords, which must appear exactly as shown, which
is why they are not shown in italics.) It is very common
for the else part of the if statement to contain another if
statement, which creates a chain of if-statement tests. This
is sometimes called a cascaded if statement:
if (code == 10)
statement1;
else if (code < 0)
statement2;
else if (code > 100)
statement3;
else
statement4;
In the series of tests shown here, only one of the four
statements will be executed.
An alternative multi-way branch can be created by a
C switch statement. In a switch statement, one of several
alternatives is executed, depending on the value of a test
expression. Each of the alternatives is tagged by a constant value. When the test expression matches one of the
constant values, then that alternative is executed.
The syntax of the switch statement is somewhat complicated.
switch (expr) {
case const1:
statement1;
break;
case const2:
statement2;
break;
default:
statement3;
break;
}
In this skeleton switch statement, expr is the test expression and const1 and const2 symbolize the constants
that identify the alternatives. In this example, each alternative is shown terminated by a break statement, which is
common but not required. Without these break statements,
flow of control would meander from the end of each alternative into the beginning of the following. This behavior
is not usually what the programmer wants, but it is one
of the possibilities that is present in C's switch statement.
The break statement will be discussed further later.
The switch statement is less general than the cascaded
if statement, because in a cascaded if statement each alternative can be associated with a complex expression, while
in a switch statement each alternative is associated with a
constant value (or with several constant values; multiple
case labels are allowed).
The switch statement has two advantages over the more
flexible cascaded if statement. The first is clarity; when a
solution can be expressed by a switch statement, then that
solution is probably the clearest solution. The second is
efficiency. In a cascaded if statement, each test expression
must be evaluated in turn until one of the expressions is
true. In a switch statement, it is often possible for the C
compiler to generate code that branches directly to the
target case.
A while loop lets you repeatedly execute a statement
(or group of statements) while some condition is true. It
is the simplest iterative statement in C, but it is also very
general.
i = 0;
while (i < 10) {
x[i++] = 0;
}
A close relative of the while loop is C's do loop. It repeats a statement (or a group of statements) while a condition remains true, but it tests the condition after the body has been executed rather than before:
i = 0;
do
x[i] = 0;
while (++i < 10);
As with a while loop, something in the body of the
loop or the control expression presumably changes the
value of the test expression, so that the loop will eventually
terminate.
C's most elaborate loop is the for loop.
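In outline (a sketch, with the three controlling expressions written as placeholders), a for loop gathers the initialization, the test, and the per-iteration update into its header; the while loop shown earlier can be written this way:

int i;
int x[10];

/* general shape: for (initialization; test; update) statement */
for (i = 0; i < 10; i++)
    x[i] = 0;               /* does the same work as the while loop shown earlier */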
D. Procedures
Procedures are tools for packaging a group of instructions together with a group of local variables to perform a given task. You define a procedure by specifying what information is passed to the procedure each time it is activated, listing its local variables, and writing its statements. As discussed in Section II.B, the statements inside a procedure can access global data that are declared outside the procedure, but it is not possible for other procedures to access the data that are declared within a procedure (unless the procedure exports the address of a local variable). This insularity is the most important feature of procedures. It helps the programmer to create small, easily understandable routines.
Figure 1 contains a procedure to solve the quadratic equation

Ax² + Bx + C = 0

using the well-known quadratic formula:

x = (-B ± √(B² - 4AC)) / (2A)

For example,

2x² + 3x + 1 = 0

is a quadratic equation (A is 2; B is 3; C is 1) whose solution is

x = (-3 ± √(3² - 4·2·1)) / (2·2)

which simplifies to

x = (-3 ± √(9 - 8)) / 4

which has two solutions, -1 and -0.5.

/*
 * solve the quadratic equation ax**2 + b*x + c = 0
 * using the formula x = (-b +/- sqrt(b**2 - 4*a*c)) / (2*a)
 */
void solve (double a, double b, double c)
{
    ...
}

FIGURE 1 A program to solve the quadratic equation.

The first part of solve specifies the procedure name, parameter names, and parameter types. The header of the solve procedure indicates that it expects three parameters, which it also calls a, b, and c. The body of solve calculates the solution's discriminant, which is the expression inside the square root symbol, and then calculates the answers, based on whether the discriminant is positive, zero, or negative. Most of the body of solve is a large if statement that handles each of the three types of discriminants.
In a program, you can invoke the solve procedure by calling it with the three coefficients as arguments.
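The body of solve and a call of it might look like the following sketch; this is an illustrative reconstruction based on the description above, not the original figure.

#include <stdio.h>
#include <math.h>

/* Solve a*x*x + b*x + c = 0 and print the result. */
void solve(double a, double b, double c)
{
    double d = b * b - 4.0 * a * c;          /* the discriminant */

    if (d > 0) {                             /* two real roots */
        printf("x = %g or x = %g\n",
               (-b + sqrt(d)) / (2.0 * a),
               (-b - sqrt(d)) / (2.0 * a));
    } else if (d == 0) {                     /* one repeated root */
        printf("x = %g\n", -b / (2.0 * a));
    } else {                                 /* no real roots */
        printf("no real solutions\n");
    }
}

int main(void)
{
    solve(2.0, 3.0, 1.0);                    /* prints: x = -0.5 or x = -1 */
    return 0;
}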
#include <stdio.h>
During the preprocessing phase, this statement will be
replaced by the contents of the stdio.h file, so that the later
phases of the compiler will only see the contents of stdio.h.
The macro feature of the C preprocessor allows you to
replace one item by another throughout a program. This
has many uses, such as creating named constants, creating in-line subroutines, hiding complicated constructs,
and making minor adjustments to the syntax of C. Macros
can have parameters, or they can simply replace one item
of text with another. Macros are created using the #define
mechanism. The first word following #define is the name
of the macro, and following names are the replacement
text.
There are several ubiquitous C macros including NULL,
an impossible value for pointers; EOF, the standard end
marker for stdio input streams; and the single character
I/O routines, getc() and putc(). NULL and EOF are simply
named constants. In most versions of C, they are defined
in the standard header files.
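Typical definitions (a sketch; the exact forms differ from one implementation to another) look like this:

#define NULL 0          /* many implementations use ((void *)0) instead */
#define EOF  (-1)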
int a;
states that a is an integer. The next simplest declaration is
to declare an array of something, for example, an array of
integers.
int b[10];
This declaration states that b is an array of ten integers.
b[0] is the first element in the array, b[1] is the next,
and so on. Notice that the declaration does not contain a
keyword stating that b is an array. Instead, Cs standard
array notation, b[subscript], is used in the declaration.
The next simplest declaration creates a pointer to a simple type, such as a pointer to an integer.
int *c;
This declaration states that *c is an integer. Remember
that * is the C indirection operator, which is used to dereference a pointer. Thus, if *c is an integer, then c itself must
be a pointer to an integer.
Another thing that can be declared is the return type
of a function. The following declaration states that d is a
function returning an integer.
int d();
The ( ) in the declaration indicate that d is a function.
When d is invoked in the program, it can be used in any
situation where an integer variable is used. For example,
you could write
i = 2 * d() + 10;
The following declaration makes e a function returning a pointer to an integer:

int *e();
Note that the above declaration does not declare a pointer
e to a function returning an integer, because the parentheses to the right of e take precedence over the indirection
operator to the left.
When you are verbalizing a declaration, start from the
inside and work out, and remember that it is helpful to
read ( ) as "function returning," [ ] as "array of," and * as "pointer to." Thus, the declaration above could be read "e is a function returning a pointer to an int."
There are a few restrictions on what you can declare in
C. For example, you can declare a function, a pointer to a
function, or an array of pointers to functions, but you are
not allowed to declare an array of functions.
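For instance (a sketch; the name and size are invented), the following declares an array of pointers to functions, which is allowed:

int (*handlers[8])();    /* handlers: an array of 8 pointers to functions returning int */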
D. Operator-Rich Syntax
C has the usual assortment of numeric operators, plus some
additional operators, such as the operators for pointers, the
assignment operators, the increment/decrement operators,
the comma operator, and the conditional operator. With
just this rich set of operators, C could be considered to
have an operator-rich syntax.
But C goes one step further: it considers the expression
to be a type of statement, which makes it possible to put an
expression any place a statement is expected. For example,
c++ is a complete statement that applies the increment
operator (the ++ operator) to the variable named c.
C programs take on a very dense appearance when assignment statements are used in the control expressions
of loops and if statements. For example, the following
snippet of code is extremely common.
int ch;
while ((ch = getchar()) != EOF)
;
The control expression of this while loop calls getchar
to read in a character, assigns that character to the ch variable, and then runs the body of the loop (which in the
above example is empty, causing the above code to read
in and ignore all of the input). The loop terminates when
getchar returns the value EOF (end of file; a symbolic
constant that is defined in the stdio.h include file).
Another common technique is to use the pointer increment and decrement operators in a loop control expression.
For example, the following loop copies the string pointed
to by p to the location pointed at by q ( p and q are both
pointers to characters).
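A typical form of such a loop (a sketch; it assumes that q points to a buffer large enough to hold the string) is the classic one-line copy:

while ((*q++ = *p++) != '\0')
    ;                       /* empty body: the copy happens in the control expression */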
C also applies short-circuit evaluation to the logical operators. The expression

p && q
means p AND q. According to the rules of Boolean logic,
the result will be TRUE only if both p and q are TRUE.
When the program is running, if the p part turns out to
be FALSE, then the result of the whole expression is immediately known to be FALSE, and in this case the q part
will not be evaluated.
Similarly, the expression
p || q
means p OR q. In this case, according to the rules of
Boolean logic, the result will be TRUE if either the p
or q part is TRUE. When the program is running, if the
p part turns out to be TRUE, then the result is immediately known to be TRUE, and in this case the q part will
not be evaluated, because C uses short circuit expression
evaluation.
The following code fragment is an example of how
short-circuit evaluation is often used. In it, a pointer is
compared with the address of the end of an array to make
sure that the pointer has not advanced past the end of the array before the pointer is dereferenced.
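A sketch of the kind of fragment being described (the array, its size, and the search condition are illustrative):

int data[100];
int *p = data;
int *limit = data + 100;        /* first address past the end of the array */

while (p < limit && *p != 0)    /* *p is evaluated only when p is still in range */
    p++;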
The following procedure, iswap, exchanges the values of two integer variables through pointers to them:

void iswap(int *a, int *b)
{
    int temp;
    temp = *a;
    *a = *b;
    *b = temp;
}
Inside the iswap procedure the * operator is used to
access the values that a and b point toward. The iswap
procedure is called with two pointers to int (integer), as in
the following:
int i, j;
i = 50;
j = 20;
iswap(&i, &j);
When you call iswap you need to put the & (address-of)
operator in front of the variables i and j so that you pass
the addresses of the two variables to iswap. After iswap
completes its work, the variable i will have the value 20
and the variable j will have the value 50.
F. Function Pointers
A function pointer is a pointer variable, but it holds the
address of a function, not the address of a data item. The
only things you can do with a function pointer are read its
value, assign its value, or call the function that it points
toward. You cannot increment or decrement the address
stored in a function pointer or perform any other arithmetic
operations.
Function pointers make it possible to write very general
programs. For example, if you have a data structure that
contains several different types of items, each item might
contain a function pointer that could be used to print, order,
or otherwise manipulate the information. Each type of data
item would contain a function pointer to the appropriate
function. Function pointers provide a very tedious way
to build an object, a data structure that combines a set of
values with a collection of appropriate behaviors.
Function pointers are declared using the syntax described in Section III.C. In that section, it was mentioned
that the declaration
int *fn();

declares a function returning a pointer to an integer, not a pointer to a function. To declare a pointer to a function returning an integer, one writes instead

int (*fnptr)();
The parentheses around fnptr are necessary; they bind the * (the indirection operator) to fnptr, overriding the normal precedence of the parentheses over the *. This declaration should be read aloud as "fnptr is a pointer to a function returning an integer."
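A small sketch of a function pointer in use (the functions add and mul are invented for the example):

#include <stdio.h>

int add(int a, int b) { return a + b; }
int mul(int a, int b) { return a * b; }

int main(void)
{
    int (*op)(int, int);        /* op is a pointer to a function returning int */

    op = add;
    printf("%d\n", op(2, 3));   /* calls add: prints 5 */

    op = mul;
    printf("%d\n", op(2, 3));   /* calls mul: prints 6 */
    return 0;
}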
G. Void
One of the innovations of ANSI C is the creation of the
void data type, which is a data type that does not have any
values or operations, and that cannot be used in an expression. One important use of void is to state that a function
does not have a return value. Before ANSI, the best you
could do was avoid using procedures in expressions when
they did not return a value.
For example, the procedure iswap() in Section III.E does not return a value. It works correctly if used as shown
in that section, but it also can be used incorrectly.
x = 2 * iswap(&i, &j);
In this expression, the value stored in x is unpredictable,
because iswap does not return a value. Most pre-ANSI C
compilers will not object to this statement, because by
default all procedures were presumed to return an integer.
With ANSI C, you can specify that iswap has the type
void, thereby assuring that erroneously using iswap in an
arithmetic expression will be flagged as an error.
Another use of void is to create a generic pointer. On
some machines, different types of pointers have different formats, and on most machines different data types
have different alignment requirements, which impose restrictions on legitimate pointer values. Until the ANSI
standardization, C lacked a generic pointer type that was
guaranteed to meet all of the requirements on any machine, that is, a pointer that would be compatible with all
of the other pointer types. This need is met by specifying
that something is a pointer to void.
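For example (a brief sketch), the standard allocation routine malloc returns such a generic pointer, which may be assigned to any object pointer type without a cast:

#include <stdlib.h>

void example(void)
{
    double *samples = malloc(100 * sizeof(double));  /* void * converts implicitly in C */
    free(samples);
}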
If x is an int variable, a reference to it can be declared as follows:

int &rx = x;
The ampersand in the above declaration is the syntactical
indicator that rx is a reference to an int variable, rather than a true int variable. After rx has been declared a reference to x, it can be used in any situation where x itself
could be used. The simple-minded form shown above is
rarely used, but references are extensively used as procedure parameters and as return values from functions.
One of the simplest practical uses of references is to
write a slightly cleaner version of the iswap routine. (A
pointer version of iswap was shown in Section III.E.)
// iswap: exchange the values of two integers
void iswap(int &a, int &b)
{
    int temp = a;
    a = b;
    b = temp;
}
Because this version of iswap uses reference-to-int parameters, it is used somewhat differently than the previous
version.
int i, j;
i = 50;
j = 30;
iswap(i, j);
In addition to demonstrating references, this example
also shows several aspects of C++, including the positioning of parameter declarations in the procedure header, the
more thorough declaration of function return types (void
in this example), and the new // syntax (on the first line of
the example) to indicate a comment to the end of line.
B. Function Overloading
Programmers often need to create a family of procedures
to perform the same task on various data types. For example, the iswap procedure shown in Sections III.E and IV.A
works only for integers. It would also be useful to have
swap procedures for doubles, characters, and so forth. Although each could be given a unique name (e.g., iswap for
integers, dswap for doubles, etc.), unique names quickly
become tedious and error prone. Instead, in C++ one can
create a family of procedures that have the same name but
that accept different parameter types. This lets the programmer use a single name, while it gives the compiler
the job of choosing the correct procedure, based on the
parameter types.
The following example shows how one could overload
the swap procedure, creating versions for characters, integers, and doubles.
void swap(char &a, char &b)
{
    char temp = a; a = b; b = temp;
}

void swap(int &a, int &b)
{
    int temp = a; a = b; b = temp;
}

void swap(double &a, double &b)
{
    double temp = a; a = b; b = temp;
}
When the compiler sees the statement swap(x,y) it will
choose a version of swap based on the types of the variables x and y. For example, if x and y are doubles, the
compiler will choose the third function shown above.
C. Classes
Classes are the major new feature that C++ adds to C.
They are the C++ language feature that facilitates object-oriented programming. The key idea of object-oriented programming is that the fundamental components of programs should be objects: data structures that combine
data (information storage) and procedures. Software objects are analogous to the raw parts that are used in other
creative disciplines. They make it possible to build more
complex entities than would otherwise be possible, they
may be modified and specialized as necessary, and they
allow the programmer to build software whose structure
parallels the structure of the problem domain.
It's important to understand the difference between a
class and an object. A class is what programmers work
with; it's a concept that is expressed in a program. An
object (also called an instance) is the realization of a class
when a program is executing. A programmer might write
a program that defines a class to represent, say, complex
numbers. When a program that uses that class is running,
then each complex number in the program is an object. If,
for example, the program is using 100 complex numbers,
then there are 100 objects, each following the blueprint
established in the complex number class that was written
by a programmer.
Although their syntax is based on C structures, classes
go far beyond the capabilities of ordinary C structures.
r Classes may have both data elements and procedure
members, and those members may be grouped into public
and protected parts. The
public part of a class is often called its interface
because it defines the set of operations that are used to
work with the class.
r Classes may have routines, called constructors and
destructors, that are called automatically when a class
instance is created or destroyed. Constructors are
responsible for initializing a class, while destructors
perform any necessary cleanup. Constructors are also
used to convert items of another type to the class type.
For example, a class that represents complex numbers
might have a constructor that would convert a double
into a complex.
r Classes may have operator functions, so that objects
can be manipulated using algebraic notation.
The following class declaration describes a data type
that represents complex numbers, which are numbers defined as having both real and imaginary parts. In algebraic
notation the letter i indicates the imaginary part of a number, thus 50 + 100i represents a complex with real part
of 50 and imaginary part of 100. A more realistic complex number class would have many more facilities; the
simplifications imposed in this simple example are for
clarity.
class Complex {
protected:
    double realpart, imagpart;
public:
    // constructors
    Complex(void);
    Complex(double r);
    Complex(double r, double i);
    Complex(Complex &c);
    // ADD OP - add a complex or a double to a complex
    void operator+=(Complex& rhs);
    void operator+=(double d);
    // extract the real and imaginary parts of a complex
    double getReal() { return realpart; }
    double getImag() { return imagpart; }
};
The class shown above contains two data elements, doubles named realpart and imagpart, and eight operations. The
class is divided into two parts, the protected part and the
public part. The public elements of Complex are universally accessible, while the elements in the protected part,
which in this case are the data elements, can only be accessed by the class itself and by classes derived from it.
The first four operations, the constructors, are used to
create new complex numbers. The first creates a Complex
object that is initialized to zero, the second creates a Complex from a single number (the real part), the third creates
a Complex from a pair of numbers (the real and imaginary
parts), and the fourth creates a new Complex object from
an existing Complex object. In the class declaration shown
above, the four construction operations are described but
not defined. Here is the definition of the second constructor
(the others are similar, hence not shown).

Complex::Complex(double r)
{
    realpart = r;
    imagpart = 0;
}

The following declarations create three Complex objects:

Complex a;
Complex b(50, 100);
Complex c(b);
The Complex named a is initialized to zero, the Complex named b is initialized to 50 + 100i, and the Complex
named c is initialized to the value of b. To understand how
the above works, you must remember that the compiler
will call the appropriate version of the constructor, based
on the arguments. In the example, Complex variables a, b,
and c will be constructed using the first, third, and fourth
forms, respectively, of the constructors shown previously
in the class declaration.
The two procedures named operator+= in the Complex
class declaration allow you to use the += operator (the assign-sum operator) to manipulate Complex numbers. (The
bodies of these procedures are not shown here.) This capability, which is known as operator overloading, is primarily a notational convenience that allows manipulations
of class objects to be expressed algebraically. C++ allows
nearly all of its rich set of operators to be overloaded.
The limitations of C++ operator overloading are that userdefined operator overloading must involve at least one
user-defined type (class) and that the standard precedence
and associativity may not be altered.
The first operator+= procedure in the Complex class
lets you add one complex to another; the second allows you
to add a double (a real number) to a complex. For example,
the following two expressions automatically invoke the
first and second operator+= functions. (Objects a and b
are complex.)
a += b;
b += 5;
The last two procedures shown in the Complex class
declaration are used to extract the real and imaginary parts
of a complex number. For example, the following statement assigns the imaginary part of a, which is a Complex,
to x, which is a double number.
x = a.getImag();
Note that ordinary C member-of notation (the dot) is
used to access the getImag member function.
D. Inheritance
Inheritance is a facility that allows a programmer to create
a new class by specializing or extending an existing class.
In C++ the original class is called the base class and the
newly created class is called the derived class. Inheritance
is also called class derivation. Derivation can do two types
of things to a base class: new features can be added, and
existing features can be redefined. Derivation does not
allow for the removal of existing class features.
class Motor {
    double power;
    double speed;
    // other Motor characteristics
};
Note that only a few of the Motor class's members are
sketched above. Next let's look at the class declaration of
HighVoltageMotor. The notation in the class header states
that a HighVoltageMotor is derived from the Motor base
class. The body of HighVoltageMotor only lists things that
are added to the base class; the existing elements, such as
power, need not be restated:
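The declaration might look roughly as follows; the added member maximumVoltage is our own invented example, not part of the original text:

class HighVoltageMotor : public Motor {
    double maximumVoltage;   // hypothetical added member: highest safe operating voltage
    // other characteristics specific to high-voltage motors
};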
to whatever calling routine has arranged to handle that type
of problem. For example, consider a function named A that
manages the task of writing a document to disk. Naturally
A will call numerous other routines to do all the chores
necessary to accomplish the overall task. Before actually
calling its helper routines, A will enter a try/catch block
so that it can catch any input/output exceptions that occur
at any point in the operation. Any of the called subroutines
that detects a problem can simply throw an input/output
exception, knowing that it will be handled elsewhere (in
A in this example).
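A minimal sketch of this arrangement is shown below; the names IOError and saveDocument are our own placeholders:

class IOError { };              // generic input/output exception type (placeholder)

void saveDocument()             // plays the role of function A
{
    try {
        // ... call the helper routines that actually write the document ...
    }
    catch (IOError &err) {
        // any helper that throws an IOError ends up here
    }
}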
Another advantage of handling errors using exceptions
is that you can use a hierarchy of exceptions in order to provide a more fine-grained approach to handling errors. For
example, in addition to a generic input/output exception
that indicates something failed during an I/O operation,
you can also derive more specialized exceptions to indicate the precise failure, such as a file open error or a file
write error. Catching the generic I/O exception will catch
all of the I/O errors, which is probably what function A
(from our example) would want to do, but the more focused subroutines that A calls might want to handle one
of the more specialized exceptions locally.
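Using the placeholder IOError type from the sketch above, such a hierarchy could be expressed by derivation:

class FileOpenError : public IOError { };    // more specialized exceptions
class FileWriteError : public IOError { };
// catch (IOError&) handles both; catch (FileOpenError&) handles only failed opens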
G. Iostream Library
One of the more demanding tasks that must be handled
by any software development environment is input and
output. There are several difficulties in handling I/O, such
as the need to input and output any conceivable type of
data, and the fact that different computer systems provide
very different low-level primitives for performing I/O. Because of these difficulties, many programming languages
provide I/O facilities that are built into the language, tacitly
admitting that the language itself isn't powerful enough or
flexible enough to meet the needs of I/O operations.
One of the C language's innovations was its Standard
I/O Library (stdio), which is a flexible group of subroutines, written in C, that let a programmer perform input and
output operations. One problem with the C standard I/O
library is that it isn't type-safe. For example, you can easily
(but erroneously) output or input a floating point number
using the format intended for integers. For example, the
following snippet of C code does just that, producing a
nonsense output:
double d = 5.0;
printf("The variable d has the value %d\n", d);
(The problem with the above is the %d format code, which
calls for an integer to be output, but which is handed the
variable d, a double.)
The equivalent statement written with the C++ iostream library is type-safe:

double d = 5.0;
cout << "The variable d has the value " << d << "\n";
As you can see in the above statement, iostreams hijack
the << operator to create output expressions. Similarly, the
library uses the >> operator to form input expressions.
Besides the advantage of being type-safe, iostream is
extensible. If you create a function to insert a Complex
into an output stream, then you can use Complex objects
with the iostream library as conveniently as you can use
the built-in types:
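One possible form of such an inserter, written in terms of the Complex class from Section IV.C and in the pre-namespace style used elsewhere in this article, is sketched here (the output format is our choice):

ostream& operator<<(ostream &os, Complex &c)
{
    // write a Complex as, for example, 50+100i
    os << c.getReal() << "+" << c.getImag() << "i";
    return os;
}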
I. Templates
In general usage, a template is a pattern that you use to create things. For example, in a woodworking class you might
use a template to guide you when you are sawing a particularly tricky curve in a project. Similarly, in C++ a template
is a set of generic instructions for performing some task.
For example, the task might be storing a collection of objects. The template would contain generic instructions for
storing and retrieving, plus it would include a parameter to
indicate what type of object should be managed. Then the
C++ compiler would actually create the C++ code to implement the storing operation on the given type of object.
This is often referred to as generic programming because
you are writing software that applies to any type of object.
For example, you might want to create a vector of objects, in which the objects would be accessed by a numeric
index. The first step would be to create a template that
would incorporate all the details of creating a vector of
some type of object. The template itself wouldn't be specific to a given type of object; rather, it would be generic,
simply a set of instructions that could apply to storing any
object type in a vector. Then if the program contained object types Pt, Rect, and Sphere, then you could use the
vector template to create a vector of Pt, a vector of Rect,
and a vector of Sphere.
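With such a template in hand (called vector here, as in the discussion above), the three containers would be declared simply by naming the element type:

vector<Pt> points;          // a vector of Pt objects
vector<Rect> rectangles;    // a vector of Rect objects
vector<Sphere> spheres;     // a vector of Sphere objects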
In addition to creating containers, templates are often
used to implement generic algorithms. An earlier example
in this article showed how to create an overloaded family
of functions to swap the values held in a pair of variables.
Since it would be tedious to create such functions for every
type of object in a large project, you could instead use
templates to create the function.
template <class T>
void swap(T &a, T &b)
{
    T temp = a; a = b; b = temp;
}
The template declaration shown above doesn't actually
create a swap function, but the compiler will follow the
recipe given in the template if you actually try to use a
swap function:
Complex a(50);
Complex b(10, 20);
swap(a, b);
The code shown above first creates a pair of initialized
complex variables. When the compiler encounters the call
to swap with the two Complex arguments, it uses the template recipe for swap to create a version of swap appropriate for Complex arguments, and then it uses that newly
minted swap function to actually perform the operation.
For example, the Standard Template Library (STL) supplies a generic list container built with templates; a list of Complex objects is declared as:

list<Complex> complexNumbers;
Given the above declaration, individual Complex objects can be added to the list in many ways, such as adding
them to the front of the list:
Complex a(50);
Complex b(10, 20);
complexNumbers.push_front(a);
complexNumbers.push_front(b);
Other elements of the STL include vectors, stacks, and
queues. The emergence of the STL as a standard, key
component of C++ has greatly expanded the breadth of
tasks that are addressed by the libraries supplied with C++.
BIBLIOGRAPHY
Ellis, M. A., and Stroustrup, B. (1991). The Annotated C++ Reference
Manual, Addison-Wesley, Reading, Massachusetts.
Computer Algorithms
Conor Ryan
University of Limerick
I.
II.
III.
IV.
V.
VI.
VII.
VIII.
IX.
GLOSSARY
Algorithm Sequence of well-defined instructions the execution of which results in the solution of a specific
problem. The instructions are unambiguous and each
can be performed in a finite amount of time. Furthermore, the execution of all the instructions together takes
only a finite amount of time.
Approximation algorithm Algorithm that is guaranteed
to produce solutions whose value is within some prespecified amount of the value of an optimal solution.
Asymptotic analysis Analysis of the performance of an
algorithm for large problem instances. Typically the
time and space requirements are analyzed and provided
as a function of parameters that reflect properties of the
problem instance to be solved. Asymptotic notation
(e.g., big oh, theta, omega) is used.
Deterministic algorithm Algorithm in which the outcome of each step is well defined and determined by
the values of the variables (if any) involved in the step.
NP-Complete problem Decision problem (one for
which the solution is yes or no) that has the following property: The decision problem can be solved in
polynomial deterministic time if and only if all decision problems
that can be solved in nondeterministic polynomial time
are also solvable in deterministic polynomial time.
Performance Amount of resources (i.e., amount of computer time and memory) required by an algorithm. If
the algorithm does not guarantee optimal solutions, the
term performance is also used to include some measure of the quality of the solutions produced.
Probabilistically good algorithm Algorithm that does
not guarantee optimal solutions but generally does provide them.
Simulated annealing Combinatorial optimization technique adapted from statistical mechanics. The technique attempts to find solutions that have value close to
optimal. It does so by simulating the physical process
of annealing a metal.
Stepwise refinement Program development method in
which the final computer program is arrived at in a
sequence of steps. The first step begins close to the
problem specification. Each step is a refinement of the
preceding one and gets one closer to the final program.
This technique simplifies both the programming task
and the task of proving the final program correct.
Usually good algorithm Algorithm that generally provides optimal solutions using a small amount of computing resources. At other times, the resources required
may be prohibitively large.
IN ORDER to get a computer to solve a problem, it is necessary to provide it with a sequence of instructions that if
followed faithfully will result in the desired solution. This
sequence of instructions is called a computer algorithm.
When a computer algorithm is specified in a language the
computer understands (i.e., a programming language), it is
called a program. The topic of computer algorithms deals
with methods of developing algorithms as well as methods
of analyzing algorithms to determine the amount of computer resources (time and memory) required by them to
solve a problem and methods of deriving lower bounds on
the resources required by any algorithm to solve a specific
problem. Finally, for certain problems that are difficult
to solve (e.g., when the computer resources required are
impractically large), heuristic methods are used.
programming styles. The Pascal program has been written in such a way as to permit one to make changes with
ease. The number of months, interest rate, initial balance,
and monthly additions are more easily changed into Pascal
program.
Each of the three approaches is valid, and the one
that should eventually be used will depend on the user.
If the task only needs to be carried out occasionally,
then a calculator would probably suffice, but if it is to
be executed hundreds or thousands of times a day, then
clearly one of the computer programs would be more
suitable.
(10,18,8,12) (9)
(10) (18,8,12,9)
(10,18,8,9) (12)
(10,18,12,9) (8)
PROGRAM 8: Refinement of Program 7
line procedure sort (n)
1  {sort n numbers into nondecreasing order}
2  for j := 2 to n do
3  begin {insert x[j] into x[1:j - 1]}
4    assign t and x[0] the value x[j];
5    assign i the value j - 1;
6    while t < x[i] do {find correct place for t}
7    begin
8      move x[i] to x[i + 1];
9      reduce i by 1;
10   end; {of while}
11   put t into x[i + 1]
12 end; {of sort}
Let us consider the route our development process
would have taken if we had decided to decompose sort
instances into two smaller instances of roughly equal size.
Let us further suppose that the left half of the sequence is
one of the instances created and the right half is the other.
For our example we get the instances (10,18) and (8,12,9).
These are sorted independently to get the sequences (10,18)
and (8,9,12). Next, the two sorted sequences are combined
to get the sequence (8,9,10,12,18). This combination process is called merging. The resulting sort algorithm is
called merge sort.
PROGRAM 9: Final version of Program 8
line procedure sort(n)
1  {sort n numbers into nondecreasing order}
2  var t, i, j : integer;
2  begin
2  for j := 2 to n do
3  begin {insert x[j] into x[1 : j - 1]}
4    t := x[j]; x[0] := t; i := j - 1;
6    while t < x[i] do {find correct place for t}
7    begin
8      x[i + 1] := x[i];
9      i := i - 1;
10   end; {of while}
11   x[i + 1] := t
12 end; {of for}
13 end; {of sort}
PROGRAM 10: Merge Sort
line procedure MergeSort(X, n)
1 {sort n numbers in X}
2 if n > 1 then
3 begin
exact solution. Since approximate solutions are often easier to obtain than exact ones, we develop a notation for
approximate solutions.
Definition [Big oh]. f(n) = O(g(n)) (read as f of
n is big oh of g of n) iff there exist positive constants c
and n₀ such that f(n) ≤ cg(n) for all n, n ≥ n₀. Intuitively,
O(g(n)) represents all functions f(n) whose rate of growth
is no more than that of g(n).
Thus, the statement f(n) = O(g(n)) states only that
cg(n) is an upper bound on the value of f(n) for all n, n ≥ n₀.
It does not say anything about how good this bound is.
Notice that n = O(n²), n = O(n^2.5), n = O(n³), n = O(2ⁿ),
and so on. In order for the statement f(n) = O(g(n)) to be
informative, g(n) should be as small a function of n as one
can come up with for which f(n) = O(g(n)). So while we
often say 3n + 3 = O(n), we almost never say 3n + 3 = O(n²), even though the latter statement
is correct. From the definition of O, it should be clear that
f(n) = O(g(n)) is not the same as O(g(n)) = f(n). In fact,
it is meaningless to say that O(g(n)) = f(n). The use of
the symbol = is unfortunate because it commonly denotes
the equals relation. Some of the confusion that results
from the use of this symbol (which is standard terminology) can be avoided by reading the symbol = as "is" and
not as "equals". The recurrence for insertion sort can be
solved to obtain
t(n) = O(n²).
To solve the recurrence for merge sort, we must use the
fact m(n) = O(n). Using this, we obtain
t(n) = O(n log n).
It can be shown that the average numbers of steps executed by insertion sort and merge sort are, respectively,
O(n²) and O(n log n). Analyses such as those performed
above for the worst-case and the average times are called
asymptotic analyses. O(n²) and O(n log n) are, respectively, the worst-case asymptotic time complexities of insertion and merge sort. Both represent the behavior of the
algorithms when n is suitably large. From this analysis we
learn that the growth rate of the computing time for merge
sort is less than that for insertion sort. So even if insertion
sort is faster for small n, when n becomes suitably large,
merge sort will be faster. While most asymptotic analysis
is carried out using the big oh notation, analysts have
available to them three other notations. These are defined
below.
Definition [Omega, Theta, and Little oh]. f(n) =
Ω(g(n)) (read as f of n is omega of g of n) iff there
exist positive constants c and n₀ such that f(n) ≥ cg(n)
for all n, n ≥ n₀. f(n) = Θ(g(n)) (read as f of n is
theta of g of n) iff there exist positive constants c₁, c₂,
and n₀ such that c₁g(n) ≤ f(n) ≤ c₂g(n) for all n, n ≥ n₀.
f(n) = O(nᵐ)
f(n) = Ω(nᵐ)
f(n) = Θ(nᵐ)
f(n) = o(aₘnᵐ)
Asymptotic analysis can also be used for space complexity. While asymptotic analysis does not tell us how
many seconds an algorithm will run for or how many
words of memory it will require, it does characterize the
growth rate of the complexity. If a Θ(n²) procedure takes
2 sec when n = 10, then we expect it to take 8 sec when
n = 20 (i.e., each doubling of n will increase the time
by a factor of 4). We have seen that the time complexity of an algorithm is generally some function of the instance characteristics. As noted above, this function is very
useful in determining how the time requirements vary as
the instance characteristics change. The complexity function can also be used to compare two algorithms A and
B that perform the same task. Assume that algorithm A
has complexity Θ(n) and algorithm B has complexity
Θ(n²). We can assert that algorithm A is faster than algorithm B for sufficiently large n. To see the validity
of this assertion, observe that the actual computing time
of A is bounded from above by cn for some constant c
and for all n, n ≥ n₁, while that of B is bounded from below by dn² for some constant d and all n, n ≥ n₂. Since
cn ≤ dn² for all n ≥ c/d, algorithm A is faster than algorithm B whenever n exceeds max{n₁, n₂, c/d}.
log n    n     n log n    n²       n³        2ⁿ
0        1     0          1        1         2
1        2     2          4        8         4
2        4     8          16       64        16
3        8     24         64       512       256
4        16    64         256      4,096     65,536
5        32    160        1,024    32,768    4,294,967,296
TABLE II Times on a 1-Billion-Instruction-per-Second Computer
(Time for f(n) instructions on a 10⁹ instruction/sec computer)

n           f(n) = n      f(n) = n log₂ n   f(n) = n²    f(n) = n³    f(n) = n⁴       f(n) = n¹⁰        f(n) = 2ⁿ
10          0.01 μsec     0.03 μsec         0.1 μsec     1 μsec       10 μsec         10 sec            1 μsec
20          0.02 μsec     0.09 μsec         0.4 μsec     8 μsec       160 μsec        2.84 hr           1 msec
30          0.03 μsec     0.15 μsec         0.9 μsec     27 μsec      810 μsec        6.83 day          1 sec
40          0.04 μsec     0.21 μsec         1.6 μsec     64 μsec      2.56 msec       121.36 day        18.3 min
50          0.05 μsec     0.28 μsec         2.5 μsec     125 μsec     6.25 msec       3.1 yr            13 day
100         0.10 μsec     0.66 μsec         10 μsec      1 msec       100 msec        3171 yr           4×10¹³ yr
1,000       1.00 μsec     9.96 μsec         1 msec       1 sec        16.67 min       3.17×10¹³ yr      32×10²⁸³ yr
10,000      10.00 μsec    130.3 μsec        100 msec     16.67 min    115.7 day       3.17×10²³ yr
100,000     100.00 μsec   1.66 msec         10 sec       11.57 day    3171 yr         3.17×10³³ yr
1,000,000   1.00 msec     19.92 msec        16.67 min    31.71 yr     3.17×10⁷ yr     3.17×10⁴³ yr
simulations. Suppose we wish to measure the average performance of our two sort algorithms using the programming language Pascal and the TURBO Pascal (TURBO
is a trademark of Borland International) compiler on an
IBM-PC. We must first design the experiment. This design process involves determining the different values of
n for which the times are to be measured. In addition, we
must generate representative data for each n. Since there
are n! different permutations of n distinct numbers, it is
impractical to determine the average run time for any n
(other than small ns, say n < 9) by measuring the time
for all n! permutations and then computing the average.
Hence, we must use a reasonable number of permutations
and average over these. The measured average sort times
obtained from such experiments are shown in Table III.
As predicted by our earlier analysis, merge sort is faster
than insertion sort. In fact, on the average, merge sort will
sort 1000 numbers in less time than insertion sort will take
for 300! Once we have these measured times, we can fit
a curve (a quadratic in the case of insertion sort and an
n log n in the case of merge sort) through them and then
use the equation of the curve to predict the average times
for values of n for which the times have not been measured.
The quadratic growth rate of the insertion sort time and
the n log n growth rate of the merge sort times can be seen
clearly by plotting these times as in Fig. 2. By performing additional experiments, we can determine the effects
of the compiler and computer used on the relative performance of the two sort algorithms. We shall provide some
comparative times using the VAX 11780 as the second
computer. This popular computer is considerably faster
than the IBM-PC and costs 100 times as much. Our
first experiment obtains the average run time of Program
8 (the Pascal program for insertion sort). The times for
the VAX 11780 were obtained using the combined translator and interpretive executer, pix. These times are shown below.
TABLE III Average Times for Merge and Insertion Sort

n       Merge      Insert
0       0.027      0.032
10      1.524      0.775
20      3.700      2.253
30      5.587      4.430
40      7.800      7.275
50      9.892      10.892
60      11.947     15.013
70      15.893     20.000
80      18.217     25.450
90      20.417     31.767
100     22.950     38.325
200     48.475     148.300
300     81.600     319.657
400     109.829    567.629
500     138.033    874.600
600     171.167
700     199.240
800     230.480
900     260.100
1000    289.450

n       IBM-PC turbo    VAX pix
50      10.9            22.1
100     38.3            90.47
200     148.3           353.9
300     319.7           805.6
400     567.6           1404.5

n       IBM-PC merge sort turbo    VAX insertion sort pc
400     109.8                      64.1
500     138.0                      106.1
600     171.2                      161.8
700     199.2                      217.9
800     230.5                      263.5
900     260.1                      341.9
1000    289.5                      418.8
Information-theoretic arguments
State space arguments
Adversary constructions
Reducibility constructions
A. Information-Theoretic Arguments
In an information-theoretic argument, one determines the
number of different behaviors the algorithm must exhibit
in order to work correctly for the given problem. For example, if an algorithm is to sort n numbers, it must be
capable of generating n! different permutations of the n
input numbers. This is because depending on the particular values of the n numbers to be sorted, any of these n!
permutations could represent the right sorted order. The
next step is to determine how much time every algorithm
that has this many behaviors must spend in the solution
of the problem. To determine this quantity, one normally
places restrictions on the kinds of computations the algorithm is allowed to perform. For instance, for the sorting
problem, we may restrict our attention to algorithms that
are permitted to compare the numbers to be sorted but not
permitted to perform arithmetic on these numbers. Under these restrictions, it can be shown that n log n is a
lower bound on the average and worst-case complexity of
sorting. Since the average and worst-case complexities of
merge sort are Θ(n log n), we conclude that merge sort is an
asymptotically optimal sorting algorithm under both the
average and worst-case measures. Note that it is possible
for a problem to have several different algorithms that are
asymptotically optimal. Some of these may actually run
faster than others. For example, under the above restrictions, there may be two optimal sorting algorithms. Both
will have asymptotic complexity Θ(n log n). However, one
may run in 10n log n time and the other in 20n log n time.
A lower bound f (n) is a tight lower bound for a certain
problem if this problem is, in fact, solvable by an algorithm of complexity O( f (n)). The lower bound obtained
above for the sorting problem is a tight lower bound for
algorithms that are restricted to perform only comparisons
among the numbers to be sorted.
the complexity of one problem to that of another using the
notion of reducibility that we briefly mentioned in the last
section. Two very important classes of reducible problems
are NP-hard and NP-complete. Informally, all problems in
the class NP-complete have the property that, if one can be
solved by an algorithm of polynomial complexity, then all
of them can. If an NP-hard problem can be solved by an algorithm of polynomial complexity, then all NP-complete
problems can be so solved. The importance of these two
classes comes from the following facts:
1. No NP-hard or NP-complete problem is known to be
polynomially solvable.
2. The two classes contain more than a thousand
problems that have significant application.
3. Algorithms that are not of low-order polynomial
complexity are of limited value.
4. It is unlikely that any NP-complete or NP-hard
problem is polynomially solvable because of the
relationship between these classes and the class of
decision problems that can be solved in polynomial
nondeterministic time.
We shall elaborate the last item in the following
subsections.
VI. NONDETERMINISM
According to the common notion of an algorithm, the result of every step is uniquely defined. Algorithms with this
property are called deterministic algorithms. From a theoretical framework, we can remove this restriction on the
outcome of every operation. We can allow algorithms to
contain an operation whose outcome is not uniquely defined but is limited to a specific set of possibilities. A computer that executes these operations is allowed to choose
any one of these outcomes. This leads to the concept of
a nondeterministic algorithm. To specify such algorithms
we introduce three new functions:
r choice(S): Arbitrarily choose one of the elements of
set S.
r failure: Signals an unsuccessful completion.
r success: Signals a successful completion.
3    x(i) := choice({0,1});
4  endfor;
5  if Σᵢ₌₁ⁿ w(i)x(i) = M then success
6    else failure;
7  endif;
8  end;
A. NP-Hard and NP-Complete Problems
The size of a problem instance is the number of digits
needed to represent that instance. An instance of the sum of
subsets problem is given by (w(1), w(2), . . . , w(n), M).
If each of these numbers is a positive integer, the instance
size is

Σᵢ₌₁ⁿ log₂(w(i) + 1) + log₂(M + 1)
NP7: Hamiltonian Cycle
Input: An undirected graph G = (V, E).
Output: Yes if G contains a Hamiltonian cycle (i.e., a
path that goes through each vertex of G exactly once and
then returns to the start vertex of the path). No otherwise.
NP8: Bin Packing
Input: A set of n objects, each of size s(i), 1 ≤ i ≤ n
[s(i) is a positive number], and two natural numbers k and
C.
Output: Yes if the n objects can be packed into at most
k bins of size C. No otherwise. When packing objects
into bins, it is not permissible to split an object over two
or more bins.
NP9: Set Packing
maximize Σᵢ₌₁ⁿ pᵢxᵢ subject to
(a) xᵢ ∈ {0, 1}, 1 ≤ i ≤ n
(b) Σᵢ₌₁ⁿ wᵢxᵢ ≤ M
A feasible solution is any solution that satisfies the constraints C(x). For the 0/1-knapsack problem, any assignment of values to the xᵢ's that satisfies constraints (a) and
(b) above is a feasible solution. An optimal solution is a
feasible solution that results in an optimal (maximum in
the case of the 0/1-knapsack problem) value for the optimization function. There are many interesting and important optimization problems for which the fastest algorithms known are impractical. Many of these problems
are, in fact, known to be NP-hard. The following are some
of the common strategies adopted when one is unable to
develop a practically useful algorithm for a given optimization problem:
Best Fit Decreasing (BFD). This is the same as BF except that the objects are reordered as for FFD. It should be
possible to show that none of these methods guarantees
optimal packings. All four are intuitively appealing and
can be expected to perform well in practice. Let I be any
instance of the bin packing problem. Let b(I ) be the number of bins used by an optimal packing. It can be shown
that the number of bins used by FF and BF never exceeds
(17/10)b(I ) + 2, while that used by FFD and BFD does
not exceed (11/9)b(I ) + 4.
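As an illustration, here is a brief C++ sketch of the First Fit (FF) strategy, under the usual interpretation that each object is placed into the first opened bin that has room for it; the function name and its interface are ours:

#include <vector>

// Pack objects with sizes s into bins of capacity C using First Fit;
// the return value is the number of bins used.
int firstFit(const std::vector<double> &s, double C)
{
    std::vector<double> space;                    // remaining space in each open bin
    for (double size : s) {
        bool placed = false;
        for (double &room : space) {              // examine bins in the order they were opened
            if (size <= room) {
                room -= size;                     // object fits; place it in the first such bin
                placed = true;
                break;
            }
        }
        if (!placed)
            space.push_back(C - size);            // no bin fits; open a new one
    }
    return (int)space.size();
}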
Example. Four objects with s(1 : 4) = (3, 5, 2, 4) are
to be packed in bins of size 7. When FF is used, object
1 goes into bin 1 and object 2 into bin 2. Object 3 fits
into the first bin and is placed there. Object 4 does not
fit into either of the two bins used so far and a new bin
is used. The solution produced utilizes 3 bins and has
objects 1 and 3 in bin 1, object 2 in bin 2, and object 4 in
bin 3.
When BF is used, objects 1 and 2 get into bins 1 and 2,
respectively. Object 3 gets into bin 2, since this provides
a better fit than bin 1. Object 4 now fits into bin 1. The
packing obtained uses only two bins and has objects 1 and
4 in bin 1 and objects 2 and 3 in bin 2. For FFD and BFD,
the objects are packed in the order 2, 4, 1, 3. In both cases,
a two-bin packing is obtained. Objects 2 and 3 are in bin 1
and objects 1 and 4 in bin 2. Approximation schemes (in
particular fully polynomial time approximation schemes)
are also known for several NP-hard problems. We will not
provide any examples here.
B. Other Heuristics
PROGRAM 12: General Form of an Exchange Heuristic
1. Let i be a random feasible solution [i.e., C(i) is
satisfied] to the given problem.
2. Perform perturbations (i.e., exchanges) on i until it is
not possible to improve i by such a perturbation.
3. Output i.
Often, the heuristics one is able to devise for a problem
are not guaranteed to produce solutions with value close to
optimal. The virtue of these heuristics lies in their capacity
to produce good solutions most of the time. A general
category of heuristics that enjoys this property is the class
of exchange heuristics. In an exchange heuristic for an
optimization problem, we generally begin with a feasible
solution and change parts of it in an attempt to improve
its value. This change in the feasible solution is called a
perturbation. The initial feasible solution can be obtained
using some other heuristic method or may be a randomly
generated solution. Suppose that we wish to minimize the
objective function f (i) subject to the constraints C. Here,
i denotes a feasible solution (i.e., one that satisfies C).
Classical exchange heuristics follow the steps given in
Program 12. This assumes that we start with a random
feasible solution. We may, at times, start with a solution
constructed by some other heuristic. The quality of the
solution obtained using Program 12 can be improved by
running this program several times. Each time, a different
starting solution is used. The best of the solutions produced
by the program is used as the final solution.
1. A Monte Carlo Improvement Method
In practice, the quality of the solution produced by an exchange heuristic is enhanced if the heuristic occasionally
accepts exchanges that produce a feasible solution with
an increased value of f. (Recall that f is the function we wish
to minimize.) This is justified on the grounds that a bad
exchange now may lead to a better solution later. In order to implement this strategy of occasionally accepting
bad exchanges, we need a probability function prob(i, j)
that provides the probability with which an exchange that
transforms solution i into the inferior solution j is to be accepted. Once we have this probability function, the Monte
Carlo improvement method results in exchange heuristics
taking the form given in Program 13. This form was proposed by N. Metropolis in 1953. The variables counter and
n are used to stop the procedure. If n successive attempts
to perform an exchange on i are rejected, then an optimum with respect to the exchange heuristic is assumed to
have been reached and the algorithm terminates. Several
modifications of the basic Metropolis scheme have been
proposed. One of these is to use a sequence of different
probability functions. The first in this sequence is used
initially, then we move to the next function, and so on.
The transition from one function to the next can be made
whenever sufficient computer time has been spent at one
function or when a sufficient number of perturbations have
failed to improve the current solution.
PROGRAM 13: Metropolis Monte Carlo Method
1. Let i be a random feasible solution to the given
problem. Set counter = 0.
2. Let j be a feasible solution that is obtained from i as
a result of a random perturbation.
3. If f ( j) < f (i), then [i = j, update best solution found
so far in case i is best, counter = 0, go to Step 2].
4. If f(j) ≥ f(i), then: if counter = n, output the best
solution found and stop. Otherwise, let r = a random
number in the range (0, 1).
If r < prob(i, j), then [i = j, counter = 0] else
[counter = counter + 1].
Go to Step 2.
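A compact C++ sketch of Program 13 follows; the names Solution, f, perturb, prob, and rand01 are placeholders that a user of the method would supply:

// One possible rendering of the Metropolis Monte Carlo loop of Program 13.
template <class Solution>
Solution metropolis(Solution i, int n,
                    double (*f)(const Solution&),
                    Solution (*perturb)(const Solution&),
                    double (*prob)(const Solution&, const Solution&),
                    double (*rand01)())
{
    Solution best = i;
    int counter = 0;
    while (counter < n) {
        Solution j = perturb(i);               // Step 2: random perturbation of i
        if (f(j) < f(i)) {                     // Step 3: accept an improvement
            i = j;
            if (f(i) < f(best)) best = i;      // remember the best solution found so far
            counter = 0;
        } else if (rand01() < prob(i, j)) {    // Step 4: occasionally accept a worse solution
            i = j;
            counter = 0;
        } else {
            counter = counter + 1;             // n consecutive rejections end the search
        }
    }
    return best;
}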
The Metropolis Monte Carlo Method could also be referred to as a metaheuristic, that is, a heuristic that is
general enough to apply to a broad range of problems.
Like other heuristics, metaheuristics are not guaranteed to produce
an optimal solution, so they are often used in situations
where either optimality is not crucial or a suboptimal solution can
later be improved.
IX. SUMMARY
In order to solve difficult problems in a reasonable amount
of time, it is necessary to use a good algorithm, a good
compiler, and a fast computer. A typical user, generally,
does not have much choice regarding the last two of these.
The choice is limited to the compilers and computers the
user has access to. However, one has considerable flexibility in the design of the algorithm. Several techniques
are available for designing good algorithms and determining how good these are. For the latter, one can carry out
an asymptotic analysis. One can also obtain actual run
times on the target computer. When one is unable to obtain a low-order polynomial time algorithm for a given
problem, one can attempt to show that the problem is NP-hard or is related to some other problem that is known
to be computationally difficult. Regardless of whether
one succeeds in this endeavor, it is necessary to develop
a practical algorithm to solve the problem. One of the
suggested strategies for coping with complexity can be
adopted.
BIBLIOGRAPHY
Aho, A., Hopcroft, J., and Ullman, J. (1974). The Design and Analysis of Computer Algorithms, Addison-Wesley, Reading, MA.
Canny, J. (1990). J. Symbolic Comput. 9(3), 241-250.
Garey, M., and Johnson, D. (1979). Computers and Intractability, Freeman, San Francisco, CA.
Horowitz, E., and Sahni, S. (1978). Fundamentals of Computer Algorithms, Computer Science Press, Rockville, MD.
Kirkpatrick, S., Gelatt, C., and Vecchi, M. (1983). Science 220, 671-680.
Knuth, D. (1972). Commun. ACM 15(7), 671-677.
Nahar, S., Sahni, S., and Shragowitz, E. (1985). ACM/IEEE Des. Autom. Conf., 1985, pp. 748-752.
Sahni, S. (1985). Concepts in Discrete Mathematics, 2nd ed., Camelot, Fridley, MN.
Sahni, S. (1985). Software Development in Pascal, Camelot, Fridley, MN.
Sedgewick, R. (1983). Algorithms, Addison-Wesley, Reading, MA.
Syslo, M., Deo, N., and Kowalik, J. (1983). Discrete Optimization Algorithms, Prentice-Hall, Englewood Cliffs, NJ.
Computer Viruses
Ernst L. Leiss
University of Houston
I.
II.
III.
IV.
GLOSSARY
Data integrity Measure of the ability of a (computer)
system to prevent unwanted (unauthorized) changes or
destruction of data and software.
Data security Measure of the ability of a (computer) system to prevent unwanted (unauthorized) access to data
and software.
Logical bomb Code embedded in software whose execution will cause undesired, possibly damaging, actions.
Subversion Any action that results in the circumvention
or violation of security principles.
Worm Self-contained program that is usually not permanently stored as a file and has the capacity of self-replication and of causing damage to data and software.
Virus Logical bomb with the ability of self-replication. It
usually is a permanent part of an existing, permanently
stored file and has the capability of causing damage to
data and software.
tems, accompanied by the possibility of massive destruction of data and software. While until then these concerns were considered rather remote, the Internet attack
of 1988 shattered this complacency. In the intervening
decade, computer viruses have attained significant visibility in the computer-literate population, rivalling the
notoriety of Y2K-related problems but with substantially
greater staying power.
The reason for the attention attracted by these intruders
lies in their potential for destruction of data and software.
With the exception of some highly secured systems related to defense and national security, virtually all larger
computer systems are connected via computer networks,
commonly referred to as the Internet. Personal computers,
if they are not permanently linked into these networks,
have at least the capability of linking up to them intermittently through a variety of Internet service providers.
Networks are systems that allow the transmission of digitally encoded information (data, software, messages, as
well as still images, video, and audio) at relatively high
speeds and in relatively convenient ways from one system
to another. Subverting the function of a network may therefore result in the subversion of the computers linked by it.
Consequently, a scenario is very plausible in which a program may be transmitted that is capable of destroying large
amounts of data in all the computers in a given network.
The case that such a scenario is plausible has been made
for many years, starting with F. Cohen's demonstration of
a computer virus in 1983.
In 1988, such a scenario was played out for the first time
on a worldwide scale. Since then, numerous incidents have
reinforced the publics sense of vulnerability to attacks by
insidious code fragments on software and data stored in all
kinds of computers. While earlier virus attacks spread via
diskettes and later via electronic bulletin boards (in ways
that required some user participation through loading infected programs), in recent years, the World Wide Web and
more sophisticated e-mail systems have provided transmission channels that facilitated the worldwide spread of
the attackers at an unprecedented speed. Moreover, infection, which earlier required some explicit action by the
victim, has become much more stealthy, with the advent
of viruses that become activated through the opening (or
even previewing) of an apparently innocent attachment to
an e-mail document.
The destruction of data and software has obvious economic implications. The resulting malfunctioning of computer systems may also affect safety-critical systems, such
as air-traffic control systems or control systems for hydroelectric dams or nuclear power plants. Furthermore, the
potential for disruption can be damaging: a bomb threat
can conceivably be more paralyzing than the explosion of
a small bomb itself. Protection against such threats may be
either impossible or unacceptable to users because of the necessarily resulting reduction in functionality and ease of use
of computer systems. It must be borne in mind that by
necessity, the notion of user friendliness of a computer
system or communications network is antithetical to the
notions of data security and data integrity.
condition is met. It does not have the capability of self-replication. Activation of the logical bomb may abort a
program run or erase data or program files. If the condition
for execution is not satisfied at all times, it may be regarded
as a logical time bomb. Logical bombs that are activated in
every invocation are usually not as harmful as time bombs
since their actions can be observed in every execution of
the affected software. A typical time bomb is one where
a disgruntled employee inserts into complex software that
is frequently used (a compiler or a payroll system, for
example) code that will abort the execution of the software,
for instance, after a certain date, naturally chosen to fall
after the date of the employee's resignation or dismissal.
While some programming errors may appear to be time
bombs (the infamous Y2K problem certainly being the best
known and most costly of these), virtually all intentional
logical bombs are malicious.
A computer virus is a logical bomb that is able to self-replicate, to subvert a computer system in some way, and
to transmit copies of itself to other hardware and software
systems. Each of these copies in turn may self-replicate
and affect yet other systems. A computer virus usually
attaches itself to an existing program and thereby is permanently stored.
A worm is very similar to a computer virus in that it
is self-replicating and subverts a system; however, it usually is a self-contained program that enters a system via
regular communication channels in a network and then
generates its own commands. Therefore, it is frequently
not permanently stored as a file but rather exists only in the
main memory of the computer. Note that a logical bomb
resident in a piece of software that is explicitly copied to
another system may appear as a virus to the users, even
though it is not.
Each of the three types of subversion mechanisms,
bombs, viruses, and worms, can, but need not, cause damage. Instances are known in which bombs and viruses
merely printed out some brief message on the screen and
then erased themselves, without destroying data or causing other disruptions. These can be considered as relatively harmless pranks. However, it must be clearly understood that these subversion mechanisms, especially the
self-replicating ones, most definitely have enormous potential for damage. This may be due to deliberate and
explicit erasure of data and software, or it may be due to
far less obvious secondary effects. To give one example,
consider a worm that arrives at some system via electronic
mail, thereby activating a process that handles the receiving of mail. Typically, this process has a high priority; that
is, if there are any other processes executing, they will
be suspended until the mail handler is finished. Thus, if
the system receives many mail messages, a user may get
the impression that the system is greatly slowed down. If
eventually exceeds any available storage capacity, resulting in immediate detection of the virus. Returning now to
our question of how virus detection software works, we
can say that it does exactly the same thing that each virus does.
This of course implies that the test can be carried out only if the virus is known. In other words, virus
detection software will never be able to detect every virus;
it will only be able to detect viruses that were known to
the authors of the detection software at the time it was
written. The upshot is that old virus detection software
is virtually worthless, since it will not be able to detect
any viruses that have appeared since the software was written.
Consequently, it is crucial to update one's virus detection
software frequently and consistently.
While many virus detection programs will attempt to
remove a virus once it is detected, removal is significantly
trickier and can result in the corruption of programs. Since
viruses and worms typically have all the access privileges
that the user has, but no more, it is possible to set the
permissions for all files so that writing is not permitted,
even for the owner of the files. In this way, the virus will not
be able to write the files, something that would be required
to insert the virus. It is true that the virus could subvert
the software that controls the setting of protections, but
to date (February 2000), no virus has ever achieved this.
(Whenever a user legitimately wants to write a file, the user
would have to change the protection first, then write the
file, and then change the protection back.) The primary
advantage of this method is that it is quite simple and
very effective. Its primary disadvantage is that users might
find it inconvenient. Other, more complicated approaches
include the following:
IV. CONCLUSION
internally disseminating, and strictly adhering to a carefully thought-out disaster recovery plan (which must function even if the usual computer networks are not operational!), it is likely that major damage can be minimized.
BIBLIOGRAPHY
Anti-Virus Emergency Response Team, hosted by Network Associates (http://www.avertlabs.com).
Bontchev, V. (199). Future Trends in Virus Writing, Virus Test Center, University of Hamburg, Germany (http://www.virusbtn.com/other papers/Trends).
Computer Emergency Response Team (CERT). Registered service mark of Carnegie-Mellon University, Pittsburgh (http://www.cert.org).
Department of Energy Computer Incident Advisory Capability (http://www.ciac.org).
European Institute for Computer Anti-Virus Research (http://www.eicar.com).
Kephart, J. O., Sorkin, G. B., Chess, D. M., and White, S. R. (199). Fighting Computer Viruses, Sci. Am., New York (http://www.sciam.com/1197issue/1197kephart.html).
Leiss, E. L. (1990). Software Under Siege: Viruses and Worms, Elsevier, Oxford, UK.
Polk, W. T., and Bassham, L. E. (1994). A Guide to the Selection of Anti-Virus Tools and Techniques, Natl. Inst. Standards and Technology Computer Security Division (http://csrc.ncsl.nist.gov/nistpubs/select).
Cryptography
Rebecca N. Wright
AT&T Labs-Research
I. Introduction
II. Attacks on Cryptographic Systems
III. Design and Use of Cryptographic Systems
IV. Symmetric Key Cryptography
V. Public Key Cryptography
VI. Key Distribution and Management
VII. Applications of Cryptography
GLOSSARY
Ciphertext All or part of an encrypted message or file.
Computationally infeasible A computation is computationally infeasible if it is not practical to compute, for
example if it would take millions of years for current
computers.
Cryptosystem A cryptosystem or cryptographic system
consists of an encryption algorithm and the corresponding decryption algorithm.
Digital signature Cryptographic means of authenticating
the content and sender of a message, much as a handwritten
signature does for physical documents.
Encryption The process of transforming information to
hide its meaning. An encryption algorithm is also called
a cipher or code.
Decryption The process of recovering information that
has been encrypted.
Key A parameter to encryption and decryption that controls how the information is transformed.
Plaintext The plaintext or cleartext is the data to be
encrypted.
Historically, cryptography has been used to safeguard
military and diplomatic communications and has therefore
been of interest mainly to the government. Now, as the use
of computers and computer networks grows, there is an
increasing amount of information that is stored electronically, on computers that can be accessed from around the
world via computer networks. As this happens, businesses
and individuals are finding more of a need for protection
of information that is proprietary, sensitive or expensive
to obtain.
Traditionally, encryption was used for a sender to send
a message to a receiver in such a way that others could
not read or undetectably tamper with the message. Today,
encryption protects the privacy and authenticity of data
in transit and stored data, prevents unauthorized access
to computer resources, and more. Cryptography is commonly used by almost everyone, often unknowingly, as it
is increasingly embedded into ubiquitous systems, such as
automated bank teller machines, cellular telephones, and
World Wide Web browsers.
I. INTRODUCTION
Cryptography probably dates back close to the beginnings
of writing. One of the earliest known examples is the Caesar cipher, named for its purported use by Julius Caesar in
ancient Rome. The Caesar cipher, which can not be considered secure today, replaced each letter of the alphabet
with the letter occurring three positions later or 23 positions earlier in the alphabet: A becomes D, B becomes
E, X becomes A, and so forth. A generalized version of
the Caesar cipher is an alphabetic substitution cipher. As
an example, a simple substitution cipher might use the
following secret key to provide a substitution between
characters of the original message and characters of the
encrypted message.
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
F R O A H I C W T Z X L U Y N K E B P M V G D S Q J
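As an illustration, the following C++ sketch applies the substitution table shown above to an uppercase message (the function name is ours):

#include <string>
#include <cctype>

// Encrypt a message using the sample substitution key shown above.
std::string substitute(const std::string &plain)
{
    const std::string table = "FROAHICWTZXLUYNKEBPMVGDSQJ";   // images of A through Z
    std::string cipher;
    for (char ch : plain) {
        if (std::isupper(static_cast<unsigned char>(ch)))
            cipher += table[ch - 'A'];    // replace the letter by its substitute
        else
            cipher += ch;                 // leave spaces and punctuation unchanged
    }
    return cipher;
}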
As is evident from even such a short example, this simple method does not disguise patterns in the text such as
repeated letters and common combinations of letters. In
fact, if the encrypted message is known to be English text,
it is usually quite easy to determine the original message,
even without the knowledge of the secret key, by using letter frequency analysis, guessing and checking, and maybe
a little intuition and luck. Such substitution ciphers are
commonly used today as puzzles in newspapers and puzzle books, but are not secure when used alone as cryptosystems. Polyalphabetic substitution ciphers, developed
by Len Battista in 1568, improved on regular substitution ciphers by changing the substitution scheme partway
through a message. Although substitution is not secure
when used alone, it can be useful when used in conjunction
with other techniques, and in fact, many cryptosystems
used today benefit from substitution when it is carefully
used as part of their encryption algorithms.
Another simple technique that is not secure alone, but
can be secure when used as part of a cryptosystem, is
transposition or permutation. A simple transposition cipher might rearrange the message
WE MEET AT DAWN IN THE MUSEUM,
[The message is shown written into a rectangular array of letters; the ciphertext is then formed by reading the letters out of the array in a different order.]
Enc: M × K_E → C
Dec: C × K_D → M
In order for a cryptosystem to be useful, all three functions KeyGen, Enc, and Dec must be efficiently computable. Since key generation is generally done only infrequently and can often be done in advance, it is acceptable
for KeyGen to be somewhat less efficient, perhaps even
taking many minutes to compute. In contrast, encryption
and decryption are usually done more frequently and in
real time, so Enc and Dec should be more efficient, measuring on the order of milliseconds or less.
We would also like additional requirements to capture
the security of the cryptosystem, for example, that it is
difficult to determine any information about K_D or M from
Enc(M, K_E) alone. However, the specific meaning of this
requirement depends on the computational power available
to an attacker, the abilities of the attacker to learn the encryptions of various messages, and other such factors, so
there is not one single definition that can capture security in all settings. A rather strong, and desirable, notion
of security is that of semantic security: from the ciphertext only, it should be computationally infeasible to learn
anything about the plaintext except its length. The ciphertext should not reveal, for example, any of the bits of the
plaintext, nor should it suggest that some plaintext messages are more probable than others. Semantic security is
much stronger than simply stating that an attacker does
not learn the plaintext.
To see the importance of the key generation function in a
cryptosystem, consider again the Caesar cipher presented
previously. This can be thought of as encryption function
that rotates characters in the alphabet according to the key,
with a key generation function that always chooses the key
3. It is intuitively easy to see that an attacker who knows
that the key is always 3, or infers it by seeing a number of plaintext/ciphertext pairs, clearly has an advantage
over an attacker in a system where the key is chosen randomly from 1 to 26. In a system with a large key space,
the key generation function can help to formally express
the security of the cryptosystem by quantifying the a priori uncertainty the attacker has about the decryption key.
In some implemented systems, key generation is left to
the user, but this can be problematic because users are a
bad source of randomness, and therefore this effectively
reduces the size of the key space.
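The point about key generation can be illustrated with a minimal Python sketch of the rotation cipher as a (KeyGen, Enc, Dec) triple, with KeyGen choosing the shift uniformly at random from 1 to 26. The function names are ours; the system remains a toy that is trivially broken by frequency analysis, but the random KeyGen at least removes the attacker's a priori knowledge of the key.

import random
import string

ALPHABET = string.ascii_uppercase  # 26 letters

def keygen():
    # Choose the shift uniformly at random from 1..26.
    return random.randint(1, 26)

def enc(message, key):
    return "".join(
        ALPHABET[(ALPHABET.index(ch) + key) % 26] if ch in ALPHABET else ch
        for ch in message.upper()
    )

def dec(ciphertext, key):
    # Shifting by -key undoes the encryption shift.
    return enc(ciphertext, -key)

if __name__ == "__main__":
    k = keygen()
    c = enc("WE MEET AT DAWN", k)
    print(k, c, dec(c, k))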
B. Goals of Cryptosystems: What Cryptography
Can and Cannot Provide
Cryptography can be used to provide a variety of security-related properties. We will use the term message to refer either to a message or to any other kind of data, whether in transit or stored. Cryptography is often used to provide
the following important properties.
Confidentiality: protects the contents of data from being
read by unauthorized parties.
Authentication: allows the recipient of a message to positively determine the identity of the sender.
Integrity: ensures the recipient that a message has not
been altered from its original contents.
Nonrepudiation: allows the recipient of a message to
prove to a third party that the sender sent the
message.
There are a number of additional security-related properties that cryptography does not directly provide, but for
which cryptography can be part of a solution. These include anonymous communication, in which the receiver
of a message is prevented from learning the identity of
the sender; fair exchange, in which Alice should receive
a valid message from Bob if and only if Bob receives a
valid message from Alice; privacy from spam (unwanted
bulk electronic mail); preventing the recipient of a message from further distributing the message; and protection
against message traffic analysis.
Although cryptography is an important tool in securing
computer systems, it alone is not sufficient. Even if a strong
cryptosystem is used to authenticate users of a computer
system before allowing them access, this authentication
procedure can be easily subverted if there are other ways
for attackers to access the system, whether through mistakenly installed, poorly configured, or just plain buggy
software.
In the types of attacks just described, it is usually assumed that all the ciphertexts were generated with the
same encryption key. In addition, some cryptosystems
are susceptible to related-message and related-key attacks,
in which the attacker has access to ciphertexts or plaintext/ciphertext pairs for keys or plaintext messages with
certain known relationships.
One measure of the practicality of an attack is the number and type of ciphertexts or plaintext/ciphertext pairs it
requires. Other measures include the computational complexity, also called the work factor, and the storage requirements of the attack.
In the case that encryption is being used to provide properties other than just secrecy, there are additional types of
attacks. For example, if encryption is being used to provide
authentication and integrity through digital signatures, an
attacker may attempt to forge signatures. As above, successful attacks can range from an existential forgery, in
which one signed message is forged, to a total break, and
attacks can use any number and type of signed messages.
Cryptanalysis describes attacks that directly attack the
cryptosystem itself. The main two classes of cryptanalytic
attacks, described below, are brute force attacks and structural attacks. We also describe some non-cryptanalytic
attacks.
D. Non-Cryptanalytic Attacks
1. Social Attacks
Social attacks describe a broad range of attacks that use
social factors to learn a user's secret key. These range from
attempting to guess a secret key chosen by a user by using
information known about the user to calling a user on
the telephone pretending to be a system administrator and
asking to be given information that will allow the attacker
to gain access to the user's private keys. Alternatively, the
target of a social attack can be the contents of a particular
sensitive message, again by fooling the sender or recipient
into divulging information about it. Bribery and coercion
are also considered social attacks. The best defense against
social attacks is a combination of user education and legal
remedies.
2. System Attacks
System attacks are attacks in which an attacker attempts
to gain access to stored secret keys or stored unencrypted
documents by attacking through non-cryptographic means
the computer systems on which they are stored. Common
ways that this is done are:
- Exploiting known, publicized holes in common software
Edward Felten and others have described a number of
strong attacks that are partially system attacks and partially
social attacks, in which they take advantage of certain
features in the way systems such as Web browsers are
designed, combined with expected user behavior.
The best defenses against system attacks are prevention,
detection, and punishment, achieved by a combination of
good system administration, good firewalls, user education, and legal remedies.
3. Timing Attacks
Timing attacks were publicized by Paul Kocher in 1996.
They attack the implementation of cryptosystems by measuring observable differences in the timing of the algorithm based on the particular value of the key. They then
use statistical methods to determine the bits of the key by
observing many operations using the same key. Timing
attacks typically require a significant number of chosen
ciphertexts.
Related attacks can use any measure of differences in
the performance of the encryption and decryption functions such as power consumption and heat dissipation.
Timing attacks and related attacks can be protected
against to some degree by blinding the devices performing encryption and decryption computations so that all
computations have the same performance, regardless of
the particular key and message being used. However, this
can have a substantial performance cost, as it requires all
computations to have worst-case performance. Such attacks can also be protected against by designing systems
so that they will not act as an oracle by decrypting and returning any and all messages that come their way, thereby
preventing an attacker from obtaining the necessary data
to carry out the attack. However, this is not always possible
without interfering with the purpose of the system.
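The principle behind such countermeasures can be illustrated, in a deliberately simplified way, with a comparison routine written in Python. The early-exit version leaks, through its running time, how many leading bytes of a guess are correct; the constant-time version examines every byte regardless of the inputs. This is only a schematic illustration of the idea, not a description of any particular attacked implementation.

def compare_early_exit(secret, guess):
    # Leaks timing: returns as soon as a mismatch is found, so the running
    # time depends on how many leading bytes of the guess are correct.
    if len(secret) != len(guess):
        return False
    for a, b in zip(secret, guess):
        if a != b:
            return False
    return True

def compare_constant_time(secret, guess):
    # Examines every byte regardless of where mismatches occur, so the
    # running time does not depend on the contents of the inputs.
    if len(secret) != len(guess):
        return False
    diff = 0
    for a, b in zip(secret, guess):
        diff |= a ^ b
    return diff == 0

if __name__ == "__main__":
    s = b"SECRETKEY"
    print(compare_early_exit(s, b"SECRETKEX"), compare_constant_time(s, s))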
new ciphertext message and change its meaning. For example, a lucky attacker might be able to change an electronic funds transfer to a different amount or a different
payee.
The other modes discussed below avoid this problem by incorporating some feedback that causes different
occurrences of the same plaintext block to encrypt to different ciphertexts.
2. Cipher Block Chaining Mode (CBC)
In cipher block chaining mode, the plaintext of a block is
combined with the ciphertext of the previous block via an
exclusive or (xor) operation, and the result is encrypted.
The result is the ciphertext of that block, and will also be
used in the encryption of the following block. An initialization vector (IV) acts as the previous ciphertext block
for the first plaintext block. The initialization vector can
be made public (i.e., can be sent in the clear along with the
ciphertext), but ideally should not be reused for encryption
of different messages to avoid having the same ciphertext
prefix for two messages with the same plaintext prefix.
Decryption reverses the process. The first block of ciphertext is decrypted and then xored with the initialization
vector; the result is the first plaintext block. Subsequent
ciphertext blocks are decrypted and then xored with the
ciphertext of the previous block.
One concern in feedback modes is synchronization after transmission errors. Cipher block chaining is self-synchronizing: a transmission error in one block will result
in an error in that block and the following block, but will
not affect subsequent blocks.
Plaintext block chaining is also possible.
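The chaining rule described above, C_i = Enc(P_i xor C_{i-1}) with C_0 equal to the initialization vector, can be sketched as follows in Python. The block cipher used here is a stand-in (a keyed byte-wise xor) chosen only so the example is self-contained and runnable; in practice a real block cipher would take its place.

import os

BLOCK = 8  # toy block size in bytes

def toy_block_encrypt(block, key):
    # Stand-in for a real block cipher: byte-wise XOR with the key.
    # (XOR is its own inverse, so the same function also decrypts a block.)
    return bytes(b ^ k for b, k in zip(block, key))

toy_block_decrypt = toy_block_encrypt

def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def cbc_encrypt(plaintext, key, iv):
    assert len(plaintext) % BLOCK == 0, "example assumes pre-padded input"
    prev, out = iv, b""
    for i in range(0, len(plaintext), BLOCK):
        block = xor(plaintext[i:i + BLOCK], prev)  # feed back previous ciphertext
        prev = toy_block_encrypt(block, key)
        out += prev
    return out

def cbc_decrypt(ciphertext, key, iv):
    prev, out = iv, b""
    for i in range(0, len(ciphertext), BLOCK):
        block = ciphertext[i:i + BLOCK]
        out += xor(toy_block_decrypt(block, key), prev)
        prev = block
    return out

if __name__ == "__main__":
    key, iv = os.urandom(BLOCK), os.urandom(BLOCK)
    msg = b"ATTACK AT DAWN!!"  # 16 bytes = 2 blocks
    ct = cbc_encrypt(msg, key, iv)
    print(cbc_decrypt(ct, key, iv))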
3. Cipher Feedback Mode (CFB)
Cipher feedback mode allows a block cipher with block
size n bits to be used as a stream cipher with a data encryption unit of m bits, for any m ≤ n.
In CFB mode, the block cipher operates on a register of
n bits. The register is initially filled with an initialization
vector. To encrypt m bits of data, the block cipher is used
to encrypt the contents of the register, the leftmost m bits
of the result are xored with the m bits of data, and the result
is m bits of ciphertext. In addition, the register is shifted
left by m bits, and those m ciphertext bits are inserted in
the right-most m register bits to be used in processing the
next m bits of plaintext.
Decryption reverses the process. The register initially
contains the initialization vector. To decrypt m bits of ciphertext, the block cipher is used to encrypt the contents
of the register, and the resulting leftmost m bits are xored
with the m ciphertext bits to recover m plaintext bits. The
m ciphertext bits are then shifted left into the register.
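A corresponding sketch of the CFB shift-register mechanism follows, again with a stand-in block cipher and, for simplicity, a data unit of m = 8 bits (one byte). Note that, as in real CFB, only the forward (encrypting) direction of the block cipher is ever used, even when decrypting data.

import os

BLOCK = 8  # register size in bytes (n = 64 bits); data unit m = 8 bits

def toy_block_encrypt(block, key):
    # Stand-in for a real block cipher.
    return bytes(b ^ k for b, k in zip(block, key))

def cfb_process(data, key, iv, decrypt=False):
    register = bytearray(iv)
    out = bytearray()
    for unit in data:
        keystream = toy_block_encrypt(bytes(register), key)[0]  # leftmost byte
        result = unit ^ keystream
        out.append(result)
        # Shift the register left by one byte and insert the ciphertext byte.
        cipher_byte = unit if decrypt else result
        register = register[1:] + bytes([cipher_byte])
    return bytes(out)

if __name__ == "__main__":
    key, iv = os.urandom(BLOCK), os.urandom(BLOCK)
    ct = cfb_process(b"WE MEET AT DAWN", key, iv)
    pt = cfb_process(ct, key, iv, decrypt=True)
    print(pt)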
specific substitution and permutation parameters of DES
to provide as much resistance to differential cryptanalysis
as possible.
b. Linear cryptanalysis. Linear cryptanalysis was
invented by Mitsuru Matsui and Atsuhiro Yamagishi in
1992, and applied by Matsui to DES in 1993. Like differential cryptanalysis, linear cryptanalysis also requires
a large number of plaintext/ciphertext pairs. Linear cryptanalysis uses plaintext/ciphertext pairs to generate a linear
approximation to each round, that is, a function that approximates the key for each round as an xor of some of
the round's input bits and output bits. An approximation
to DES can be obtained by combining the 16 1-round approximations. The more plaintext/ciphertext pairs that are
used, the more accurate the approximation will be. With
2^43 plaintext/ciphertext pairs, linear cryptanalysis requires time 2^13 and has success probability .85 of recovering the
key.
D. Advanced Encryption Standard
In 1997, the United States National Institute of Standards and Technology
(NIST) began the process of finding a replacement for
DES. The new advanced encryption standard (AES) would
need to be an unpatented, publicly disclosed, symmetric
key block cipher, operating on 128 bit blocks, and supporting key sizes of 128, 192, and 256 bits, large enough
to resist brute force attacks well beyond the foreseeable
future. Several candidates were submitted, and were considered for security, efficiency, and ease of implementation. Fifteen submitted algorithms from twelve countries
were considered in the first round of the selection process,
narrowed to five in the second round. On October 2, 2000,
NIST announced that it had selected Rijndael, a block
cipher developed by Belgian cryptographers Joan Daemen and Vincent Rijmen, as the proposed AES algorithm.
Rijndael was chosen for its security, performance, efficiency, implementability, and flexibility. Before Rijndael
can actually become the standard, it must first undergo a
period of public review as a Draft Federal Information Processing Standard (FIPS) and then be officially approved
by the United States Secretary of Commerce. This process
is expected to be completed by the middle of 2001.
The Rijndael algorithm supports a variable key size and
variable block size of 128, 192, or 256 bits, but the standard
is expected to allow only block size 128, and key size 128,
192, or 256. Rijndael proceeds in rounds. For a 128-bit
block, the total number of rounds performed is 10 if the
key length is 128 bits, 12 if the key length is 192 bits, and
14 if the key length is 256 bits.
Unlike the Feistel structure of DES, Rijndael's rounds
are divided into three layers, in each of which each bit
left, by the same offsets as ShiftRow. InvMixColumn replaces the polynomial by its inverse in GF(2^8). Since xor
is its own inverse, AddRoundKey remains the same, except that in decryption the Round keys must be used in
the reverse order. Hence, Rijndael decryption starts with
an AddRoundKey step, and then operates in rounds consisting of InvByteSub, InvShiftRow, InvMixColumn, and
AddRoundKey, followed by a final round of InvByteSub, InvShiftRow, and AddRoundKey.
At the time of this writing, encryption modes for AES
are being determined, and will probably consist of ECB,
CBC, CFB, OFB, and counter modes, as well as possibly
others.
also secretly developed similar ideas as early as the
1960s.
In public key systems, the encryption key K_E, also called the public key, and the decryption key K_D, also called the secret key, are different from each other. Furthermore, it is computationally infeasible to determine K_D from K_E. Therefore, the encryption key can be made public. This has two advantages for key management. First,
instead of each pair of users requiring a different secret
key, as in the case of symmetric key cryptography (for
a total of n^2 - n encryption keys for n users), each user
can have a single encryption key that all the other users
can use to send her encrypted messages (for a total of n
encryption keys for n users). Second, keys no longer need
to be exchanged privately before encrypted messages can
be sent.
Although the public keys in a public key cryptosystem need not be communicated secretly, they still must
be communicated in an authenticated manner. Otherwise,
an attacker Marvin could try to trick Bob into accepting Marvin's public key in place of Alice's public key. If
Marvin succeeds, then encrypted messages from Bob to
Alice will actually be readable by Marvin instead of by
Alice. If Marvin has sufficient control over the communication network, he can even prevent detection by Alice by
intercepting the messages from Bob and then reencrypting the messages with Alice's real public key and sending
them to her. In order to avoid such man-in-the-middle
attacks, public keys are usually certified by being digitally
signed by other entities.
Assuming that Alice and Bob already know each other's public keys, they can communicate privately as follows. To send her message M_A to Bob, Alice encrypts it with Bob's public key and sends the resulting ciphertext to Bob. Bob uses his private key to decrypt the ciphertext and obtain M_A. To send his response M_B to Alice, Bob encrypts it with Alice's public key and sends the resulting ciphertext to Alice. Alice uses her private key to decrypt the ciphertext and obtain M_B. An eavesdropper who overhears the ciphertexts does not learn anything about M_A and M_B
because she does not have the necessary decryption keys.
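The exchange just described can be made concrete with textbook RSA using deliberately tiny parameters (a toy only: there is no padding, and the numbers are far too small to be secure). The helper names below are ours.

# Toy illustration of the public key flow described above, using textbook
# RSA with tiny primes (insecure; no padding, illustrative only).

def make_keys():
    p, q = 61, 53                 # small primes, fixed here for readability
    n = p * q                     # 3233
    phi = (p - 1) * (q - 1)       # 3120
    e = 17                        # public exponent, coprime to phi
    d = pow(e, -1, phi)           # private exponent (modular inverse; Python 3.8+)
    return (n, e), (n, d)         # (public key, private key)

def encrypt(m, public_key):
    n, e = public_key
    return pow(m, e, n)

def decrypt(c, private_key):
    n, d = private_key
    return pow(c, d, n)

if __name__ == "__main__":
    bob_public, bob_private = make_keys()
    # Alice encrypts a (numeric) message with Bob's public key...
    c = encrypt(1234, bob_public)
    # ...and only Bob's private key recovers it.
    print(c, decrypt(c, bob_private))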
The fundamental mathematical idea behind public key
cryptosystems is the trapdoor one-way function. A function
is one-way if it is hard to invert it: that is, given a value y it
is computationally infeasible to find x such that f (x) = y.
A one-way function is said to have the trapdoor property
if given the trapdoor information, it becomes easy to invert the function. To use a trapdoor one-way function as
a public key cryptosystem, the one-way function is used
as the encryption algorithm, parametrized by its public
key. The trapdoor information is the secret key. Trapdoor
one-way functions are conjectured, but not proven, to exist. As such, all known public key cryptosystems are in
c. Small public exponent attacks. In order to improve the efficiency of encryption, it has been suggested to instead fix e = 3. However, certain attacks have been demonstrated when e is too small. The most powerful of
these attacks is based on lattice basis reduction and is due
to Don Coppersmith. Coppersmith's attack is not a total
break. However, if the public exponent is small enough
and certain relationships between messages are known, it
allows the attacker to succeed in learning the actual messages. If the encryption key is small enough and some bits
of the decryption key are known, it allows the attacker to
learn the complete decryption key. To avoid these attacks,
it is important that the public exponent is chosen to be sufficiently large. It is still believed secure, and is desirable
for efficiency reasons, to choose e to be of the form 2^k + 1 for some k ≤ 16.
d. Small private exponent attacks. An attack of
Michael Wiener shows that if d < (1/3)N^(1/4), then an attacker
can efficiently recover the private exponent d from the
public key (N, e). The attack is based on continued
fraction approximations.
In addition to the attacks just described that relate to how
the RSA parameters are chosen, there are also a number
of attacks on RSA that relate to how RSA is used. As
mentioned earlier, if the message space is small and no
randomization is used, an attacker can learn the plaintext
of a ciphertext C by encrypting each message in the message space and seeing which one gives the target ciphertext
C. Some additional usage attacks on RSA are described
below. RSA is also susceptible to timing attacks, described
earlier.
e. Bleichenbacher's padding attack. Daniel Bleichenbacher showed an adaptive chosen-ciphertext
attack on RSA as implemented in the PKCS1 standard,
which uses the approach of appending random bits
to a short message M before encrypting it to make it
n bits long. In PKCS1, a padded message looks like
this:
02 || random pad || 00 || M
Assuming that the discrete logarithm problem is computationally infeasible, an attacker overhearing the conversation between Alice and Bob cannot learn g^(xy) mod p. However, the exchange is subject to the kind of man-in-the-middle
attack discussed earlier.
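For concreteness, the exchange referred to here (each party publishes g raised to a secret exponent modulo p, and both derive g^(xy) mod p) can be sketched in a few lines of Python. The small prime and generator below are illustrative values only; real use requires a large, carefully chosen prime.

import random

# Toy Diffie-Hellman key exchange (illustrative parameters only).
p = 2087          # small prime for readability
g = 2             # public generator

x = random.randrange(2, p - 1)      # Alice's secret exponent
y = random.randrange(2, p - 1)      # Bob's secret exponent

A = pow(g, x, p)                    # Alice sends g^x mod p
B = pow(g, y, p)                    # Bob sends g^y mod p

alice_shared = pow(B, x, p)         # (g^y)^x mod p
bob_shared = pow(A, y, p)           # (g^x)^y mod p
assert alice_shared == bob_shared   # both hold g^(xy) mod p
print(alice_shared)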
C. Key Distribution Centers
In a key distribution center (KDC) solution, a key distribution center shares a secret key with all participants and
is trusted to communicate keys from one user to another.
If Alice wants to exchange a key with Bob, she asks the
KDC to choose a key for Alice and Bob to use and send
it securely to each of them. While it may be possible to
have such solutions within a particular business, they do
not scale well to large systems or systems that cross administrative boundaries.
D. Public Key Infrastructures
In a public key infrastructure (PKI), any user Alice should
be able to determine the public key of any other user Bob,
and to be certain that it is really Bob's public key. This is done by having different entities digitally sign the pair (Bob, K_E), consisting of Bob's identity and public key. In
practice, a certificate will also contain other information,
such as an expiration date, the algorithm used, and the
identity of the signer. Now, Bob can present his certificate
to Alice, and if she can verify the signature and trusts
the signer to tell the truth, she knows K_E is Bob's public
key. As with other key exchange solutions, this is simply
moving the need for secrecy or authentication from one
place to another, but can sometimes be useful.
The two main approaches to building a large-scale
PKI are the hierarchical approach and the web of trust
approach. In either model, a participant authenticates
user/key bindings by determining one or more paths of
certificates such that the user trusts the first entity in the
path, certificates after the first are signed by the previous
entity, and the final certificate contains the user/key binding in question. The difference between the two models is
in the way trust is conveyed on the path.
In the hierarchical model, a certificate is signed by a
certificate authority (CA). Besides a key binding, a CA
certificate authorizes a role or privilege for the certified
entity, by virtue of its status as an authority within its
domain. For example, a company can certify its employees' keys because it hired those employees; a commercial certificate authority (CA) can certify its customers' keys
because it generated them; a government or commercial
CA can certify keys of hierarchically subordinate CAs by
its powers of delegation; government agencies can certify
keys of government agencies and licensed businesses, as
empowered by law; and an international trade bureau can
GLOSSARY
Association rules link the values of a group of attributes,
or variables, with the value of a particular attribute of
interest which is not included in the group.
Data mining process takes place in four main stages:
Data Pre-processing, Exploratory Data Analysis, Data
Selection, and Knowledge Discovery.
Data mining tools are software products; a growing
number of such products are becoming commercially
available. They may use just one approach (single
paradigm), or they may employ a variety of different
methods (multi-paradigm).
Deviation detection is carried out in order to discover
Interestingness in the data. Deviations may be detected
either for categorical or numerical data.
Interestingness is central to Data Mining where we are
looking for new knowledge which is nontrivial. It allows the separation of novel and useful patterns from
the mass of dull and trivial ones.
DATA MINING is the process by which computer programs are used to repeatedly search huge amounts of data,
usually stored in a Database, looking for useful new patterns. The main developments that have led to the emergence of Data Mining have been in the increased volume
of data now being collected and stored electronically, and
an accompanying maturing of Database Technology. Such
developments have meant that traditional Statistical Methods and Machine Learning Technologies have had to be
extended to incorporate increased demands for fast and
scaleable algorithms.
In recent years, Database Technology has developed increasingly more efficient methods for data processing and
data access. Simultaneously there has been a convergence
between Machine Learning Methods and Database Technology to create value-added databases with an increased
capability for intelligence. There has also been a convergence between Statistics and Database Technology.
The main stages of the Data Mining process are Data Pre-processing, Exploratory Data Analysis, Data Selection, and Knowledge Discovery.
into the business process which in turn feeds back into the
Data Mining process.
2. Classification
A commonly occurring task in Data Mining is that of
classifying cases from a dataset into one of a number
of well-defined categories. The categories are defined by
sets of attribute values, and cases are allocated to categories according to the attribute values that they possess.
The selected combinations of attribute values that define
the classes represent features within the particular context of the classification problem. In the simplest cases,
classification could be on a single binary-valued attribute,
and the dataset is partitioned into two groups, namely,
those cases with a particular property, and those without
it. In general it may only be possible to say which class
the case is closest to, or to say how likely it is that the
case is in a particular category.
Classification is often carried out by supervised Machine Learning, in which a number of training examples
(tuples whose classification is known) are presented to the
system. The system learns from these how to classify
other cases in the database which are not in the training set.
Such classification may be probabilistic in the sense that it
is possible to provide the probability that a case is any one
of the predefined categories. Neural Networks are one
of the main Machine Learning technologies used to carry
out classification. A probabilistic approach to classification may be adopted by the use of discriminant functions.
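A minimal illustration of supervised classification is a nearest-neighbour classifier, which assigns a new case the class of the most similar training example. The data and distance measure in the following Python sketch are invented purely for illustration.

import math

# Minimal supervised classification: 1-nearest-neighbour.
# Training examples are (attribute-vector, class-label) pairs; a new case is
# assigned the class of the closest training example.

training = [
    ((1.0, 1.2), "low risk"),
    ((0.9, 0.8), "low risk"),
    ((3.1, 2.9), "high risk"),
    ((2.8, 3.3), "high risk"),
]

def distance(a, b):
    return math.dist(a, b)  # Euclidean distance

def classify(case):
    _, label = min(((distance(case, x), label) for x, label in training),
                   key=lambda pair: pair[0])
    return label

print(classify((1.1, 1.0)))   # -> "low risk"
print(classify((3.0, 3.0)))   # -> "high risk"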
3. Clustering
In the previous section, the classification problem was considered to be essentially that of learning how to make decisions about assigning cases to known classes. There are,
however, different forms of classification problem, which
may be tackled by unsupervised learning, or clustering.
Unsupervised classification is appropriate when the definitions of the classes, and perhaps even the number of
classes, are not known in advance, e.g., market segmentation of customers into similar groups who can then be
targeted separately.
One approach to the task of defining the classes is to
identify clusters of cases. In general terms, clusters are
groups of cases which are in some way similar to each
other according to some measure of similarity. Clustering
algorithms are usually iterative in nature, with an initial
classification being modified progressively in terms of the
class definitions. In this way, some class definitions are
discarded, whilst new ones are formed, and others are
modified, all with the objective of achieving an overall
goal of separating the database tuples into a set of cohesive
categories. As these categories are not predetermined, it
is clear that clustering has much to offer in the process of
Data Mining in terms of discovering concepts, possibly
within a concept hierarchy.
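The iterative character of clustering can be illustrated with a small k-means sketch in Python: an initial set of cluster centres is repeatedly refined by reassigning cases and recomputing the centres. The data and the choice of k = 2 are invented for the example.

import random

def kmeans(points, k=2, iterations=20):
    # Start from k randomly chosen points as initial cluster centres.
    centres = random.sample(points, k)
    for _ in range(iterations):
        # Assign each point to its nearest centre...
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k), key=lambda j: (p[0] - centres[j][0]) ** 2
                                            + (p[1] - centres[j][1]) ** 2)
            clusters[i].append(p)
        # ...then move each centre to the mean of its cluster.
        for i, cluster in enumerate(clusters):
            if cluster:
                centres[i] = (sum(p[0] for p in cluster) / len(cluster),
                              sum(p[1] for p in cluster) / len(cluster))
    return centres, clusters

data = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
centres, clusters = kmeans(data, k=2)
print(centres)
print(clusters)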
4. Summarization
Summarization aims to present concise measures of the
data both to assist in user comprehension of the underlying structures in the data and to provide the necessary inputs to further analysis. Summarization may take the form
of the production of graphical representations such as bar
charts, histograms, and plots, all of which facilitate a visual
overview of the data, from which sufficient insight might
be derived to both inspire and focus appropriate Data Mining activity. As well as assisting the analyst to focus on
those areas in a large database that are worthy of detailed
analysis, such visualization can be used to help with the
analysis itself. Visualization can provide a drill-down
and drill-up capability for repeated transition between
summary data levels and detailed data exploration.
5. Pattern Recognition
Pattern recognition aims to classify objects of interest
into one of a number of categories or classes. The objects of interest are referred to as patterns, and may range
from printed characters and shapes in images to electronic
waveforms and digital signals, in accordance with the data
under consideration. Pattern recognition algorithms are
designed to provide automatic identification of patterns,
without the need for human intervention. Pattern recognition may be supervised, or unsupervised.
The relationships between the observations that describe a pattern and the classification of the pattern are
used to design decision rules to assist the recognition
process. The observations are often combined to form features, with the aim that the features, which are smaller in
number than the observations, will be more reliable than
the observations in forming the decision rules. Such feature extraction processes may be application dependent,
or they may be general and mathematically based.
6. Discovery of Interestingness
The idea of interestingness is central to Data Mining
where we are looking for new knowledge that is nontrivial. Since, typically, we may be dealing with very large
amounts of data, the potential is enormous but so too is the
capacity to be swamped with so many patterns and rules
that it is impossible to make any sense out of them. It is
the concept of interestingness that provides a framework
for separating out the novel and useful patterns from the
myriad of dull and trivial ones.
Interestingness may be defined as deviations from the
norm for either categorical or numerical data. However, the initial thinking in this area was concerned with
categorical data where we are essentially comparing the
deviation between the proportion of our target group with
a particular property and the proportion of the whole population with the property. Association rules then determine
where particular characteristics are related.
An alternative way of computing interestingness for
such data comes from statistical considerations, where we
say that a pattern is interesting if there is a statistically
significant association between variables. In this case the
measure of interestingness in the relationship between two
variables A and B is computed as:
Probability(A and B) - Probability(A) × Probability(B).
Interestingness for continuous attributes is determined in
much the same way, by looking at the deviation between
summaries.
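The statistical measure just given is straightforward to compute from a set of cases, as the following Python sketch (with invented binary observations) shows.

# Interestingness of the association between two binary attributes A and B,
# measured as P(A and B) - P(A) * P(B). Records are (A, B) observations.
records = [(1, 1), (1, 1), (1, 0), (0, 0), (0, 1), (1, 1), (0, 0), (1, 1)]

n = len(records)
p_a = sum(a for a, _ in records) / n
p_b = sum(b for _, b in records) / n
p_ab = sum(1 for a, b in records if a and b) / n

interestingness = p_ab - p_a * p_b
print(round(interestingness, 3))   # positive values suggest A and B co-occur
                                   # more often than independence would predict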
7. Predictive Modeling
In Predictive Modeling, we are concerned with using
some attributes or patterns in the database to predict other
attributes or extract rules. Often our concern is with trying
to predict behavior at a future time point. Thus, for business applications, for example, we may seek to predict
future sales from past experience.
Predictive Modeling is carried out using a variety of
technologies, principally Neural Networks, Case-Based
Reasoning, Rule Induction, and Statistical Modeling, usually via Regression Analysis. The two main types of
predictive modeling are transparent (explanatory) and
opaque (black box). A transparent model can give information to the user about why a particular prediction is
being made, while an opaque model cannot explain itself
in terms of the relevant attributes. Thus, for example, if
we are making predictions using Case-Based Reasoning,
we can explain a particular prediction in terms of similar
behavior commonly occurring in the past. Similarly, if we
are using a statistical model to predict, the forecast is obtained as a combination of known values which have been
previously found to be highly relevant to the attribute being predicted. A Neural Network, on the other hand, often
produces an opaque prediction which gives an answer to
the user but no explanation as to why this value should be
an accurate forecast. However, a Neural Network can give
extremely accurate predictions and, where it may lack in
explanatory power, it more than makes up for this deficit
in terms of predictive power.
8. Visualization
Visualization Methods aim to present large and complex data sets using pictorial and other graphical representations. State-of-the-art Visualization techniques can thus assist in achieving Data Mining objectives by simplifying the presentation of information. Such approaches are often concerned with summarizing data in such a way as to facilitate comprehension and interpretation. It is important to have the facility to handle the commonly occurring situation in which too much information is available for presentation for any sense to be made of it: the haystack view. The information extracted from Visualization may be an end in itself or, as is often the case, may be a precursor to using some of the other technologies commonly forming part of the Data Mining process.
Visual Data Mining allows users to interactively explore data using graphs, charts, or a variety of other interfaces. Proximity charts are now often used for browsing and selecting material; in such a chart, similar topics or related items are displayed as objects close together, so that a user can traverse a topic landscape when browsing or searching for information. These interfaces use colors, filters, and animation, and they allow a user to view data at different levels of detail. The data representations, the levels of detail, and the magnification are controlled by using mouse-clicks and slider-bars.
Recent developments involve the use of virtual reality, where, for example, statistical objects or cases within a database may be represented by graphical objects on the screen. These objects may be designed to represent people, or products in a store, etc., and by clicking on them the user can find further information relating to that object.
9. Dependency Detection
The idea of dependency is closely related to interestingness, and a relationship between two attributes may be thought to be interesting if they can be regarded as dependent, in some sense. Such patterns may take the form of statistical dependency or may manifest themselves as functional dependency in the database. With functional dependency, all values of one variable may be determined from another variable. However, statistical dependency is all we can expect from data which is essentially random in nature.
Another type of dependency is that which results from some sort of causal mechanism. Such causality is often represented in Data Mining by using Bayesian Belief Networks, which discover and describe causal models. Such causal models allow us to predict consequences, even when circumstances change. If a rule just describes an association, then we cannot be sure how robust or generalizable it will be in the face of changing circumstances.
handling uncertainty is to use classical, or Bayesian,
probability. This allows us to establish the probabilities,
or support, for different rules and to rank them accordingly. One well-known example of the use of Bayesian
probability is provided by the Bayesian Classifier which
uses Bayes' Theorem as the basis of a classification
method. The various approaches to handling uncertainty
have different strengths and weaknesses that may make
them particularly appropriate for particular Mining tasks
and particular data sets.
11. Sequence Processing
Sequences of data, which measure values of the same attribute at a sequence of different points, occur commonly.
The best-known form of such data arises when we collect
information on an attribute at a sequence of time points,
e.g., daily, quarterly, annually. However, we may instead
have data that are collected at a sequence of different points
in space, or at different depths or heights. Statistical data
that are collected at a sequence of different points in time
are known as time series.
In general, we are concerned with finding ways of describing the important features of a time series, thus allowing Predictive Modeling to be carried out over future time
periods. There has also been a substantial amount of work
done on describing the relationship between one time series and another with a view to determining if two time
series co-vary or if one has a causal effect on the other.
Such patterns are common in economic time series, where
such variables are referred to as leading indicators. The
determination of such leading indicators can provide new
knowledge and, as such, is a fertile area for Data Mining.
The methods used for Predictive Modeling for the purpose of sequence processing are similar to those used for
any other kind of Predictive Modeling, typically Rule Induction and Statistical Regression. However, there may
be particular features of sequences, such as seasonality,
which must be incorporated into the model if prediction
is to be accurate.
F. Data Mining Approaches
As has already been stated, Data Mining is a multidisciplinary subject with major input from the disciplines of
Machine Learning, Database Technology and Statistics
but also involving substantial contributions from many
other areas, including Information Theory, Pattern Recognition, and Signal Processing. This has led to many different approaches and a myriad of terminology where different communities have developed substantially different
terms for essentially the same concepts. Nonetheless, there
is much to gain from such an interdisciplinary approach
and the synergy that is emerging from recent developments
in the subject is one of its major strengths.
output is to the required output; this error is reduced by a
technique called back-error propagation. This approach
is a supervised method in which the network learns the
connection weights as it is taught more examples. A different approach is that of unsupervised learning, where
the network attempts to learn by finding statistical features
of the input training data.
4. Case-Based Reasoning
Case-Based Reasoning (CBR) is used to solve problems
by finding similar, past cases and adapting their solutions.
By not requiring specialists to encapsulate their expertise
in logical rules, CBR is well suited to domain experts who
attempt to solve problems by recalling approaches which
have been taken to similar situations in the past. This is
most appropriate in domains which are not well understood, or where any rules that may have been devised have
frequent exceptions. CBR thus offers a useful approach to
building applications that support decisions based on past
experience.
The quality of performance achieved by a case-based
reasoner depends on a number of issues, including its experiences, and its capabilities to adapt, evaluate, and repair
situations. First, partially matched cases must be retrieved
to facilitate reasoning. The retrieval process consists of
two steps: recalling previous cases, and selecting a best
subset of them. The problem of retrieving applicable cases
is referred to as the indexing problem. This comprises the
matching or similarity-assessment problem, of recognizing that two cases are similar.
5. Genetic Algorithms
Genetic Algorithms (GAs) are loosely based on the biological principles of genetic variation and natural selection. They mimic the basic ideas of the evolution of life
forms as they adapt to their local environments over many
generations. Genetic Algorithms are a type of evolutionary algorithm, of which other types include Evolutionary
Programming and Evolutionary Strategies.
After a new generation is produced, it may be combined
with the population that spawned it to yield the new current population. The size of the new population may be
curtailed by selection from this combination, or alternatively, the new generation may form the new population.
The genetic operators used in the process of generating
offspring may be examined by considering the contents
of the population as a gene pool. Typically an individual
may then be thought of in terms of a binary string of fixed
length, often referred to as a chromosome. The genetic
operators that define the offspring production process are
usually a combination of crossover and mutation operators.
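As a schematic illustration of this process (fixed-length binary chromosomes, crossover and mutation, and selection into the next generation), the following Python sketch evolves a population towards a simple all-ones target; the fitness function and parameter values are arbitrary choices made only for the example.

import random

LENGTH, POP_SIZE, GENERATIONS, MUTATION_RATE = 20, 30, 50, 0.02

def fitness(chromosome):
    # Toy objective: number of 1-bits in the chromosome.
    return sum(chromosome)

def crossover(parent_a, parent_b):
    # Single-point crossover of two fixed-length binary strings.
    point = random.randrange(1, LENGTH)
    return parent_a[:point] + parent_b[point:]

def mutate(chromosome):
    # Flip each bit with a small probability.
    return [bit ^ 1 if random.random() < MUTATION_RATE else bit
            for bit in chromosome]

population = [[random.randint(0, 1) for _ in range(LENGTH)]
              for _ in range(POP_SIZE)]

for _ in range(GENERATIONS):
    # Select the fitter half as parents, then breed a new population.
    parents = sorted(population, key=fitness, reverse=True)[:POP_SIZE // 2]
    population = [mutate(crossover(random.choice(parents), random.choice(parents)))
                  for _ in range(POP_SIZE)]

print(max(fitness(c) for c in population))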
that a proposition is either true or it is not. Fuzzy Logic
is defined via a membership function that measures the
degree to which a particular element is a member of a set.
The membership function can take any value between 0
and 1 inclusive.
In common with a number of other Artificial Intelligence methods, fuzzy methods aim to simulate human
decision making in uncertain and imprecise environments.
We may thus use Fuzzy Logic to express expert opinions
that are best described in such an imprecise manner. Fuzzy
systems may therefore be specified using natural language
which allows the expert to use vague and imprecise terminology. Fuzzy Logic has also seen a wide application to
control theory in the last two decades.
An important use of fuzzy methods for Data Mining is
for classification. Associations between inputs and outputs
are known in fuzzy systems as fuzzy associative memories or FAMs. A FAM system encodes a collection of
compound rules that associate multiple input statements
with multiple output statements. We combine such multiple statements using logical operators such as conjunction,
disjunction and negation.
5. Rough Sets
Rough Sets were introduced by Pawlak in 1982 as a means
of investigating structural relationships in data. The technique, which, unlike classical statistical methods, does not
make probability assumptions, can provide new insights
into data and is particularly suited to situations where we
want to reason from qualitative or imprecise information.
Rough Sets allow the development of similarity measures that take account of semantic as well as syntactic
distance. Rough Set theory allows us to eliminate redundant or irrelevant attributes. The theory of Rough Sets has
been successfully applied to knowledge acquisition, process control, medical diagnosis, expert systems and Data
Mining. The first step in applying the method is to generalize the attributes using domain knowledge to identify
the concept hierarchy. After generalization, the next step
is to use reduction to generate a minimal subset of all the
generalized attributes, called a reduct. A set of general
rules may then be generated from the reduct that includes
all the important patterns in the data. When more than
one reduct is obtained, we may select the best according
to some criteria. For example, we may choose the reduct
that contains the smallest number of attributes.
6. Information Theory
The most important concept in Information Theory is
Shannon's Entropy, which measures the amount of information held in data. Entropy quantifies to what extent
the data are spread out over its possible values. Thus high
entropy means that the data are spread out as much as possible while low entropy means that the data are nearly all
concentrated on one value. If the entropy is low, therefore, there is little uncertainty about the attribute's value and we are most likely to come up with a strong rule.
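Shannon's entropy of a discrete attribute with value probabilities p_i is H = -sum of p_i log2 p_i, measured in bits. A small Python illustration, with invented value counts, follows.

import math
from collections import Counter

def entropy(values):
    # Shannon entropy of a list of observed attribute values, in bits.
    counts = Counter(values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

print(entropy(["a"] * 9 + ["b"] * 1))   # low entropy: values concentrated
print(entropy(["a", "b", "c", "d"]))    # high entropy: values spread out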
Information Theory has also been used as a measure
of interestingness which allows us to take into account
how often a rule occurs and how successful it is. This is
carried out by using the J-measure, which measures the
amount of information in a rule using Shannon Entropy
and multiplies this by the probability of the rule coming
into play. We may therefore rank the rules and only present
the most interesting to the user.
C. Database Methods
1. Association Rules
An Association Rule associates the values of a given set
of attributes with the value of another attribute from outside that set. In addition, the rule may contain information about the frequency with which the attribute values
are associated with each other. For example, such a rule
might say that 75% of men, between 50 and 55 years
old, in management positions, take out additional pension
plans.
Along with an Association Rule, A implies B, we have a confidence threshold and a support threshold. Confidence measures
the ratio of the number of entities in the database with the
designated values of the attributes in both A and B to
the number with the designated values of the attributes
in A. The support for the Association Rule is simply the
proportion of entities within the whole database that take
the designated values of the attributes in A and B.
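Support and confidence as defined here can be computed directly from a table of cases, as in the following Python sketch; the attributes, values, and rule are invented for illustration.

# Confidence and support of the rule A implies B, computed over a small
# invented dataset. A is "age 50-55 and manager"; B is "has extra pension".
cases = [
    {"age_50_55": True,  "manager": True,  "extra_pension": True},
    {"age_50_55": True,  "manager": True,  "extra_pension": True},
    {"age_50_55": True,  "manager": True,  "extra_pension": False},
    {"age_50_55": True,  "manager": False, "extra_pension": False},
    {"age_50_55": False, "manager": True,  "extra_pension": True},
    {"age_50_55": False, "manager": False, "extra_pension": False},
]

def holds_a(c):
    return c["age_50_55"] and c["manager"]

def holds_b(c):
    return c["extra_pension"]

n_a = sum(1 for c in cases if holds_a(c))
n_ab = sum(1 for c in cases if holds_a(c) and holds_b(c))

confidence = n_ab / n_a          # fraction of A-cases that also satisfy B
support = n_ab / len(cases)      # fraction of all cases satisfying A and B
print(confidence, support)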
Finding Association Rules can be computationally intensive, and essentially involves finding all of the covering
attribute sets, A, and then testing whether the rule A implies B, for some attribute set B separate from A, holds
with sufficient confidence. Efficiency gains can be made
by a combinatorial analysis of information gained from
previous passes to eliminate unnecessary rules from the
list of candidate rules. Another highly successful approach
is to use sampling of the database to estimate whether or
not an attribute set is covering. In a large data set it may
be necessary to consider which rules are interesting to the
user. An approach to this is to use templates, to describe
the form of interesting rules.
2. Data Manipulation Techniques
For Data Mining purposes it is often necessary to use
a large number of data manipulations of various types.
When searching for Association Rules, for example, tuples
then cleansed to remove inaccuracies and inconsistencies
and transformed to give a consistent view. Metadata that
maintain information concerning the source data are also
stored in the warehouse. Data within the warehouse is
generally stored in a distributed manner so as to increase
efficiency and, in fact, parts of the warehouse may be replicated at local sites, in data marts, to provide a facility for
departmental decision-making.
4. Visualization Methods
Visualization Methods aim to present complex and voluminous data sets in pictorial and other graphical representations that facilitate understanding and provide insight
into the underlying structures in the data. The subject
is essentially interdisciplinary, encompassing statistical
graphics, computer graphics, image processing, computer
vision, interface design and cognitive psychology.
For exploratory Data Mining purposes, we require flexible and interactive visualization tools which allow us to
look at the data in different ways and investigate different subsets of the data. We can highlight key features of
the display by using color coding to represent particular
data values. Charts that show relationships between individuals or objects within the dataset may be color-coded,
and thus reveal interesting information about the structure
and volume of the relationships. Animation may provide
a useful way of exploring sequential data or time series
by drawing attention to the changes between time points.
Linked windows, which present the data in various ways
and allow us to trace particular parts of the data from one
window to another, may be particularly useful in tracking
down interesting or unusual data.
5. Intelligent Agents
The potential of Intelligent Agents is increasingly having
an impact on the marketplace. Such agents have the capability to form their own goals, to initiate action without
instructions from the user and to offer assistance to the
user without being asked. Such software has been likened
to an intelligent personal assistant who works out what
is needed by the boss and then does it. Intelligent Agents
are essentially software tools that interoperate with other
software to exchange information and services. They act
as an intelligent layer between the user and the data, and
facilitate tasks that serve to promote the user's overall
goals. Communication with other software is achieved by
exchanging messages in an agent communication language. Agents may be organized into a federation or
agency where a number of agents interact to carry out
different specialized tasks.
6. OLAP
The term OLAP (On-line Analytical Processing) originated in 1993 when Dr. E. F. Codd and colleagues developed the idea as a way of extending the relational database
paradigm to support business modeling. This development took the form of a number of rules that were designed to facilitate fast and easy access to the relevant
data for purposes of management information and decision support. An OLAP Database generally takes the
form of a multidimensional server database that makes
management information available interactively to the
user. Such multidimensional views of the data are ideally suited to an analysis engine since they give maximum flexibility for such database operations as Slice
and Dice or drill down which are essential for analytical
processing.
7. Parallel Processing
High-performance parallel database systems are displacing traditional systems in very large databases that have
complex and time-consuming querying and processing requirements. Relational queries are ideally suited to parallel
execution since they often require processing of a number
of different relations. In addition to parallelizing the data
retrieval required for Data Mining, we may also parallelize
the data processing that must be carried out to implement
the various algorithms used to achieve the Mining tasks.
Such processors may be designed to (1) share memory,
(2) share disks, or (3) share nothing. Parallel Processing
may be carried out using shared address space, which provides hardware support for efficient communication. The
most scaleable paradigm, however, is to share nothing,
since this reduces the overheads. In Data Mining, the implicitly parallel nature of most of the Mining tasks allows
us to utilize processors which need only interact occasionally, with resulting efficiency in both speed-up and
scalability.
8. Distributed Processing
Distributed databases allow local users to manage and access the data in the local databases while providing some
sort of global data management which provides global
users with a global view of the data. Such global views
allow us to combine data from the different sources which
may not previously have been integrated, thus providing
the potential for new knowledge to be discovered. The constituent local databases may either be homogeneous and
form part of a design which seeks to distribute data storage
and processing to achieve greater efficiency, or they may
be heterogeneous and form part of a legacy system where
C. Text Mining
Text may be considered as sequential data, similar in
this respect to data collected by observation systems. It
is therefore appropriate for Data Mining techniques that
have been developed for use specifically with sequential
data to be also applied to the task of Text Mining. Traditionally, text has been analyzed using a variety of information retrieval methods, including natural language
processing. Large collections of electronically stored text
documents are becoming increasingly available to a variety of end-users, particularly via the World Wide Web.
There is great diversity in the requirements of users: some
need an overall view of a document collection to see
what types of documents are present, what topics the
documents are concerned with, and how the documents
are related to one another. Other users require specific
information or may be interested in the linguistic structures contained in the documents. In very many applications users are initially unsure of exactly what they
are seeking, and may engage in browsing and searching
activities.
General Data Mining methods are applicable to the
tasks required for text analysis. Starting with textual data,
the Knowledge Discovery Process provides information
on commonly occurring phenomena in the text. For example, we may discover combinations of words or phrases
that commonly appear together. Information of this type
is presented using episodes, which contain such things
as the base form of a word, grammatical features, and the
position of a word in a sequence. We may measure, for example, the support for an episode by counting the number
of occurrences of the episode within a given text sequence.
For Text Mining, a significant amount of pre-processing
of the textual data may be required, dependent on the
domain and the user's requirements. Some natural language analysis may be used to augment or replace some
words by their parts of speech or by their base forms.
Post-processing of the results of Text Mining is usually
also necessary.
evaluated so that decisions for automatic correction or intervention can be taken if necessary. Machine Learning
technologies also provide the facility for failure diagnosis
in the maintenance of industrial machinery.
Industrial safety applications are another area benefiting
from the adoption of Data Mining technology. Materials
and processes may need to be classified in terms of their
industrial and environmental safety. This approach, as opposed to experimentation, is designed to reduce the cost
and time scale of safe product development.
B. Administration
C. Business
As we might expect, the major application area of Data
Mining, so far, has been Business, particularly the areas
of Marketing, Risk Assessment, and Fraud Detection.
In Marketing, perhaps the best known use of Data Mining is for customer profiling, both in terms of discovering
what types of goods customers tend to purchase in the
same transaction and groups of customers who all behave
in a similar way and may be targeted as a group. Where
customers tend to buy (unexpected) items together then
goods may be placed on nearby shelves in the supermarket or beside each other in a catalogue. Where customers
may be classified into groups, then they may be singled out
for customized advertising, mail shots, etc. This is known
as micro marketing. There are also cases where customers
of one type of supplier unexpectedly turn out to be also
customers of another type of supplier and advantage may
be gained by pooling resources in some sense. This is
known as cross marketing.
Another use of Data Mining for Business has been for
Risk Assessment. Such assessment of credit worthiness
of potential customers is an important aspect of this use
which has found particular application to banking institutions where lending money to potentially risky customers
is an important part of the day-to-day business. A related
application has been to litigation assessment where a firm
may wish to assess how likely and to what extent a bad
debtor will pay up and whether it is worth incurring the legal fees involved in pursuing the debt.
Case Study I. A supermarket chain with a large
number of stores holds data on the shopping transactions and demographic profile of each customer's
transactions in each store. Corporate management
wants to use the customer databases to look for global
and local shopping patterns.
D. Database Marketing
Database Marketing refers to the use of Data Mining techniques for the purposes of gaining business advantage.
These include improving a company's knowledge of its
customers in terms of their characteristics and purchasing
habits and using this information to classify customers;
predicting which products may be most usefully offered to
a particular group of customers at a particular time; identifying which customers are most likely to respond to a mail
shot about a particular product; identifying customer loyalty and disloyalty and thus improving the effectiveness of
intervention to avoid customers moving to a competitor;
identifying the product specifications that customers really want in order to improve the match between this and
the products actually offered; identifying which products
from different domains tend to be bought together in order to improve cross-marketing strategies; and detecting
fraudulent activity by customers.
One of the major tasks of Data Mining in a commercial arena is that of market segmentation. Clustering techniques are used in order to partition a customer database
into homogeneous segments characterized by customer
needs, preferences, and expenditure. Once market segments have been established, classification techniques are
used to assign customers and potential customers to particular classes. Based on these, prediction methods may be
employed to forecast buying patterns for new customers.
E. Medicine
Potential applications of Data Mining to Medicine provide
one of the most exciting developments and hold much
promise for the future. The principal medical areas which
have been subjected to a Data Mining approach, so far,
may be categorized as: diagnosis, treatment, monitoring,
and research.
The first step in treating a medical complaint is diagnosis, which usually involves carrying out various tests
and observing signs and symptoms that relate to the possible diseases that the patient may be suffering from. This
may involve clinical data, data concerning biochemical
indicators, radiological data, sociodemographic data including family medical history, and so on. In addition,
some of these data may be measured at a sequence of
time-points, e.g., temperature, lipid levels. The basic problem of diagnosis may be regarded as one of classification of the patient into one, or more, possible disease
classes.
Data Mining has tremendous potential as a tool for
assessing various treatment regimes in an environment
where there are a large number of attributes which measure
the state of health of the patient, allied to many attributes
and time sequences of attributes, representing particular
treatment regimes. These are so complex and interrelated,
e.g., the interactions between various drugs, that it is difficult for an individual to assess the various components
particularly when the patient may be presenting with a
variety of complaints (multi-pathology) and the treatment
for one complaint might militate against another.
Perhaps the most exciting possibility for the application of Data Mining to medicine is in the area of medical research. Epidemiological studies often involve large
numbers of subjects who have been followed up over
a considerable period of time. The relationship between
variables is of considerable interest as a means of investigating possible causes of diseases and general health inequalities in the population.
Case Study II. A drug manufacturing company is
studying the risk factors for heart disease. It has data
on the results of blood analyses, socioeconomic data,
and dietary patterns. The company wants to find out
the relationship between the heart disease markers in
the blood and the other relevant attributes.
F. Science
In many areas of science, automatic sensing and recording
devices are responsible for gathering vast quantities of
data. In the case of data collected by remote sensing from
satellites in disciplines such as astronomy and geology, the
amount of data is so great that Data Mining techniques
offer the only viable way forward for scientific analysis.
One of the principal application areas of Data Mining
is that of space exploration and research. Satellites provide immense quantities of data on a continuous basis
via remote sensing devices, for which intelligent, trainable image-analysis tools are being developed. In previous
large-scale studies of the sky, only a relatively small amount of the data collected has actually been used in manual attempts to classify objects and produce galaxy catalogs.
Not only has the sheer amount of data been overwhelming for human consideration, but also the amount of data
required to be assimilated for the observation of a single
significant anomaly is a major barrier to purely manual
analysis. Thus Machine Learning techniques are essential
for the classification of features from satellite pictures, and
they have already been used in studies for the discovery
of quasars. Other applications include the classification
of landscape features, such as the identification of volcanoes on the surface of Venus from radar images. Pattern
recognition and rule discovery also have important applications in the chemical and biomedical sciences. Finding
patterns in molecular structures can facilitate the development of new compounds, and help to predict their chemical
properties. There are currently major projects engaged in
collecting data on the human gene pool, and rule-learning
has many applications in the biomedical sciences. These
include finding rules relating drug structure to activity for
diseases such as Alzheimer's disease, learning rules for
predicting protein structures, and discovering rules for the
use of enzymes in cancer research.
Case Study III. An astronomy cataloguer wants to
process telescope images, identify stellar objects of interest and place their descriptions into a database for
future use.
G. Engineering
Machine Learning has an increasing role in a number of
areas of engineering, ranging from engineering design to
project planning. The modern engineering design process
is heavily dependent on computer-aided methodologies.
Engineering structures are extensively tested during the
development stage using computational models to provide
information on stress fields, displacement, load-bearing
capacity, etc. One of the principal analysis techniques
employed by a variety of engineers is the finite element
method, and Machine Learning can play an important role
in learning rules for finite element mesh design for enhancing both the efficiency and quality of the computed
solutions.
Other engineering design applications of Machine
Learning occur in the development of systems, such as
traffic density forecasting in traffic and highway engineering. Data Mining technologies also have a range of
other engineering applications, including fault diagnosis
(for example, in aircraft engines or in on-board electronics
in intelligent military vehicles), object classification (in oil
exploration), and machine or sensor calibration. Classification may, indeed, form part of the mechanism for fault
diagnosis.
As well as in the design field, Machine Learning
methodologies such as Neural Networks and Case-Based
Reasoning are increasingly being used for engineering
project management in an arena in which large-scale international projects require vast amounts of planning to stay
within time scale and budget.
H. Fraud Detection and Compliance
Techniques which are designed to register abnormal transactions or data usage patterns in databases can provide an early alert, and thus protect database owners from fraudulent activity by both a company's own employees and by outside agencies. An approach that promises much for the future is the development of adaptive techniques that can not only identify particular fraud types but also adapt to
variations of the fraud. With the ever-increasing complexity of networks and the proliferation of services available
over them, software agent technology may be employed in
the future to support interagent communication and message passing for carrying out surveillance on distributed
networks.
Both the telecommunications industry and the retail
businesses have been quick to realize the advantages of
Data Mining for both fraud detection and discovering failures in compliance with company procedures. The illegal
use of telephone networks through the abuse of special services and tariffs is a highly organized area of international
crime. Data Mining tools, particularly those featuring Classification, Clustering, and Visualization techniques, have been successfully used to identify patterns of fraudulent behavior among particular groups of telephone service users.
V. FUTURE DEVELOPMENTS
Data Mining, as currently practiced, has emerged as a
subarea of Computer Science. This means that initial developments were strongly influenced by ideas from the
Machine Learning community with a sound underpinning from Database Technology. However, the statistical
community, particularly Bayesians, was quick to realize
that they had a lot to contribute to such developments.
Data Mining has therefore rapidly grown into the interdisciplinary subject that it is today.
Research in Data Mining has been led by the KDD
(Knowledge Discovery in Databases) annual conferences,
several of which have led to books on the subject (e.g.,
Fayyad et al., 1996). These conferences have grown in 10 years from a small workshop to a large independent conference that attracted nearly 1000 participants in Boston in 2000. The proceedings of these conferences are still the
major outlet for new developments in Data Mining.
Major research trends in recent years have been:
• The development of scalable algorithms that can
• The development of algorithms that look for local
BIBLIOGRAPHY
Adriaans, P., and Zantinge, D. (1996). Data Mining, Addison-Wesley,
MA.
Berry, M., and Linoff, G. (1997). Data Mining Techniques for Marketing, Sales and Customer Support, Wiley, New York.
Berson, A., and Smith, S. J. (1997). Data Warehousing, Data Mining,
Data Structures
Allen Klinger
University of California, Los Angeles
I. Introduction
II. Memory Allocation and Algorithms
III. Hierarchical Data Structures
IV. Order: Simple, Multiple, and Priority
V. Searching and Sorting Techniques
VI. Tree Applications
VII. Randomness, Order, and Selectivity
VIII. Conclusion
GLOSSARY
Algorithm Regular procedure (like a recipe) that terminates and yields a result when it is presented with input
data.
Binary search tree Data structure used in a search.
Binary tree Tree in which each entry has no, one, or two
successors; a data structure that is used to store many
kinds of other data structures.
Bit Binary digit.
Circular linkage Pointers permitting more than one complete traversal of a data structure.
Data structure An abstract idea concerning the organization of records in computer memory; a way to arrange
data in a computer to facilitate computations.
Deque Double-ended queue (inputs and outputs may be
at both ends in the most general deques).
Double linkage Pointers in both directions within a data
structure.
Field Set of adjoining bits in a memory word; grouped
bits treated as an entity.
Graph Set of nodes and links.
Hash (hashing) Process of storing data records in a disorderly manner; the hash function calculates, from a key, an approximately random table location.
Heap Size-ordered tree; all successor nodes are either
(consistently) smaller or larger than the start node.
Key Index kept with a record; the variable used in a sort.
Linear list One-dimensional data set in which relative
order is important. In mathematical terms, linear list
elements are totally ordered.
Link Group of bits that store an address in (primary, fast)
memory.
Linked allocation Method for noncontiguous assignment of memory to a data structure; the location or address of the next element is stored in a part of the current
word.
List Treelike structure useful in representing recursion.
Minimal spanning tree Tree of least total link weight that connects all nodes in a graph with no cycles (closed circuits).
Node Element of a data structure consisting of one or
more memory words. Also, an entity in a graph connected to others of its kind by arcs or links.
Pointer See link.
Port Location where data enter or exit a data structure.
Priority queue Queue where each item has a key that governs output; key values, rather than the physical position of records in the data structure, determine the order of removal.
Quad-tree (also quadtree) A tree where every node has
four or fewer successor elements; abbreviated form of
quaternary tree.
Queue Linear list with two ports, one for inputs; the other
for outputs; data structure supporting algorithms that
operate in a first-in, first-out manner.
Recursion Repeated use of a procedure by itself.
Ring Circularly linked data structure.
Search Operation of locating a record or determining its
absence from a data structure.
Sequential allocation Assignment of memory by contiguous words.
Sort Permutation of a set of records to put them into a
desired order.
Stack One-port linear list that operates in a last-in, first-out manner. This structure is heavily used to implement
recursion.
Tag Bit used to signal whether a pointer is as originally
intended; if the alternative, the pointer is a thread.
Thread Alternate use for a pointer in a data structure.
Tree Hierarchical data structure.
Trie Tree data structure employing (1) repeated subscripting, and (2) different numbers of node successors. A
data structure that is particularly useful for multiway
search; structure from information retrieval; useful
means for search of linguistic data.
I. INTRODUCTION
Data structures make it easier for a programmer to decide
and state how a computing machine will perform a given
task. To actually execute even such a simple algorithm as
multiplication of two numbers with whole and fractional
parts (as 2 1/4 times 3 1/3) using a digital device, a suitable
data structure must be chosen since the numeric information must be placed in the machine somehow. How
this is done depends on the representation used in the
data structure: for example, the number of fields (distinct information entities in a memory word) and their size
in binary digits or bits. Finally, the actual program must
be written as a step-by-step procedure to be followed involving the actual operations on the machine-represented
data. (These operations are similar to those a human
would do as a data processor performing the same kind of
algorithm.)
The topic of data structures fits into the task of programming a computer as mathematical reasoning leads from
word descriptions of problems to algebraic statements.
(Think of "rate times time equals distance" problems.)
In other words, data structure comparisons and choices
resemble variable definitions in algebra. Both must be undertaken in the early stages of solving a problem. Data
structures are a computer-oriented analogy of "let x be
the unknown" in algebra; their choice leads to the final
design of a program system, as the assignment of x leads
to the equation representing a problem.
The first three sections of this article introduce basic
data structure tools and their manipulation, including the
concepts of sequential and linked allocation, nodes and
fields, hierarchies or trees, and ordered and nonordered
storage in memory. The section on elementary and advanced structures presents stacks, queues, and arrays,
as well as inverted organization and multilists. The final sections deal with using data structures in algorithm
design. Additional data structure concepts, including binary search trees, parent trees, and heaps, are found in
these sections, as is a survey of sorting and searching
algorithms.
so forth. Physical stacks include piles of playing cards
in solitaire and rummy games and unread material on
desks.
Data structure methods help programs get more done
using less space. Storage space is valuable: there is only
a limited amount of primary computer storage (random
access, fast, or core memory; RAM stands for random access memory). Programs have been rewritten many times,
often to more efficiently use fast memory. Many techniques evolved from memory management programming
lore, some from tricks helpful in speeding the execution
of particular algorithms. Peripheral memory devices are
commonly referred to as external or secondary storage.
They include compact-disc read-only memory (CD-ROM),
disks, drums, and magnetic tape. Such devices enable retaining and accessing vast volumes of data. The use of
secondary stores is an aspect of advanced data structuring
methodology often dealt with under file management or
database management. This article is mainly concerned
with fast memory (primary storage).
Improvements in storage use or in execution speed occur when program code is closely matched to machine
capabilities. Data structure ideas do this by using special
computer hardware and software features. This can involve fuller use of bits in computer words, fields of
words in primary storage, and segments of files in secondary storage.
Data structure ideas can improve algorithm performance in both the static (space) and dynamic (time) aspects of memory utilization. The static aspects involve the
actual memory allocation. The dynamic aspects involve
the evolution of this space over time (measured in central processor unit, or CPU, cycles) as programs execute.
Note that algorithms act on the data structure. Program
steps may simply access or read out a data value or rearrange, modify, or delete stored information. But, they also
may combine or eliminate partially allotted sets of memory words. Before-and-after diagrams help in planning dynamic aspects of data structures. Ladder diagrams help in
planning static allocations of fast memory. Memory allocation to implement a data structure in available computer
storage begins with the fundamental decision whether to
use linked or sequential memory. Briefly, linked allocation
stores wherever space is available but requires advanced
reservation of space in each memory word for pointers
to show where to go to get the next datum in the structure. Memory that is sequentially allocated, by contrast,
requires reservation of a fixed-sized block of words, even
though many of these words may remain unused through
much of the time an algorithm is being followed as its
program executes.
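The contrast just drawn can be made concrete with a small sketch; it is an illustration only, using Python objects to stand in for memory words, and is not taken from the article.

# Sequential allocation: a fixed-size block is reserved up front, even if
# much of it goes unused while the program runs.
sequential = [None] * 100          # a reserved block of 100 "words"
sequential[0] = 3.75               # data occupy contiguous positions
sequential[1] = 1.25

# Linked allocation: each element is stored wherever space is available,
# at the cost of reserving a pointer field in every node.
class Node:
    def __init__(self, value, next_node=None):
        self.value = value         # the datum
        self.next = next_node      # pointer to the next element

head = Node(3.75, Node(1.25))      # two nodes linked together
node = head
while node is not None:            # follow the pointers to traverse the list
    print(node.value)
    node = node.next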
We now present an example that illustrates several other
data structure concepts; the notions of arrays (tables) and
bag or set. A node in either kind of tree has degree equal to
the number of its children, that is, the length of its sublist.
Another basic tree term is level. This signifies the number
of links between the first element in the hierarchy, called
the root, and the datum. As a result, we say that the root
is at level zero. A tree node with no successors is called a
tip or leaf.
Strings, linear lists with space markers added, are often
used to give text depictions of trees. There are several ways
to visually depict the hierarchical relationship expressed
by tree structures: nested parentheses, bar indentation, set
inclusion, and decimal points, essentially the same device used when library contents are organized, for example, by the Dewey decimal system. All of these methods
can be useful. The kind of information that can be stored
by a tree data structure is both varied and often needed in
practical situations. For example, a tree can describe the
chapter outline of a book (Fig. 2a). On personal computers
or web browsers, the starting location is called the root or
home. It lists highest level information.
A binary tree is particularly useful because it is easy
to represent in computer memory. To begin with, the data
words (the contents stored at each node in the structure)
can be distinguished from successor links by a one-bit flag.
Another one-bit flag indicates a left/right characteristic
of the two pointers. Thus, at a cost of very little dedicated
storage, basic information that is to be kept can be stored
with the same memory word field assignments. The binary
tree version of Fig. 2a is Fig. 2b.
Binary trees always have tip nodes with unused pointers (link fields). Hence, at the small cost of a tag bit to
indicate whether the link field is in use, this space can be
used for traversal. The idea is that any pointer field not
used to locate other data in the binary tree structure can
be used to describe where to go in a traversal (via preorder, post-order, etc.). Hence, these tip fields can be used
to speed algorithm execution. These pointer memory location values may be useful in indicating where the algorithm should go next. If a tag bit indicates that the purpose
is data structure traversal, the locator is called a thread.
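A small sketch of this idea follows; it is illustrative only and not taken from the article. Each node carries a tag recording whether its right pointer is a child link or a thread to the in-order successor, and the traversal follows threads instead of using a stack.

class ThreadedNode:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None             # child link, or thread to successor
        self.right_is_thread = False  # the tag bit

def leftmost(node):
    while node.left is not None:
        node = node.left
    return node

def inorder(root):
    node = leftmost(root) if root else None
    while node is not None:
        print(node.key)
        if node.right_is_thread:
            node = node.right                               # follow the thread
        else:
            node = leftmost(node.right) if node.right else None

# Build the tree with 2 at the root and children 1 and 3; the unused right
# pointer of node 1 is reused as a thread back to its successor, node 2.
root, a, b = ThreadedNode(2), ThreadedNode(1), ThreadedNode(3)
root.left, root.right = a, b
a.right, a.right_is_thread = root, True
inorder(root)   # prints 1, 2, 3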
FIGURE 2 Trees and a book chapter outline. (a) General tree representation; (b) binary tree representation. Chap.,
chapter; Sect., section; SS, subsection.
items may be present only in certain intervals of coordinate directions. Both dense and block arrays are good
candidates for sequential storage, in contrast with sparse
arrays. A sparse array characteristically has a number of
data items present much smaller than the product of the
maximum coordinate dimensions, and data that are arbitrarily distributed within it. Linked allocation is preferred
for sparse arrays. Sparse arrays occur in routing applications. A table using a binary variable to show plants
where goods are manufactured and destinations where
they are sold would have many zeros signifying "plant not connected to market." Use of a linked representation
for storing this sparse array requires much less storage
than by sequential means.
Compressed sequential allocation is another array storage method. It uses the concept of a base location and
stores only nonzero data items. It has the advantage that the
nonzero data items are stored in sequential order, so access
is faster than with linked representation. For r nonzero elements, where r is not very small compared to the product of the array dimensions mn, the storage required for compressed sequential allocation is 2r. The base locations are essentially list heads (see below), and searching a list of r of them takes, on the average, of the order of log₂ r steps.
Arrays are also called orthogonal lists, the idea being that
the several coordinate directions are perpendicular; rows
and columns in a two-dimensional matrix are examples.
In an m × n matrix A with elements A[i, j], a typical element A[k, l] belongs to two orthogonal linear lists (row list A[k, *] and column list A[*, l], where the asterisk represents all the values, i.e., 1, . . . , m or 1, . . . , n).
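A minimal sketch of a sparse array held in a linked-style representation is given below; the dictionary-of-keys scheme and the plant/market figures are assumptions of this example, not the article's.

class SparseMatrix:
    def __init__(self, rows, cols):
        self.rows, self.cols = rows, cols
        self.data = {}                      # (row, column) -> nonzero value

    def set(self, i, j, value):
        if value:
            self.data[(i, j)] = value
        else:
            self.data.pop((i, j), None)     # zeros are simply not stored

    def get(self, i, j):
        return self.data.get((i, j), 0)

# A 1000 x 1000 plant/market table with only three connections stores
# three entries rather than a million.
table = SparseMatrix(1000, 1000)
table.set(3, 17, 1)
table.set(42, 5, 1)
table.set(999, 0, 1)
print(table.get(3, 17), table.get(0, 0))    # prints: 1 0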
B. Linkage
The method of double linkage of all nodes has advantages
in speeding algorithm execution. This technique facilitates
insertion and deletion in linear lists. Another advanced
method has pointers that link the beginning and end of a
data structure. The resulting data structures enable simplified traversal algorithms. This technique is usually called
circular linkage or sometimes simply rings; it is a valuable and useful method when a structure will be traversed
several times and is entered each time at a starting point
other than the starting datum (top in a stack, front in
a queue).
Whenever frequent reversal of traversal direction or insertion or deletion of data elements occurs, a doubly linked
structure should be considered. Two situations where this
often occurs are in discrete simulation and in algebra.
Knuth described a simulation where individuals waiting for an elevator may tire and disappear. Likewise, the
case of polynomial addition requires double linkage because sums of like terms with different signs necessitate
deletions.
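The following sketch (illustrative only) shows why the double links make deletion cheap: once a node is located, its neighbors can be reconnected without any search for the predecessor, which is exactly what repeated deletions in polynomial addition require.

class DNode:
    def __init__(self, value):
        self.value = value
        self.prev = None
        self.next = None

def insert_after(node, new_node):
    new_node.prev, new_node.next = node, node.next
    if node.next is not None:
        node.next.prev = new_node
    node.next = new_node

def delete(node):
    # Both links are available, so no traversal is needed to find neighbors.
    if node.prev is not None:
        node.prev.next = node.next
    if node.next is not None:
        node.next.prev = node.prev

# Hypothetical polynomial terms; deleting a term only touches its neighbors.
head = DNode("3x^2")
insert_after(head, DNode("5x"))
delete(head.next)                 # removes the "5x" node
print(head.value, head.next)      # prints: 3x^2 None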
list, their nature, and the rate of re-sort on given lists must
all be taken into account. Several measures of merit are
used to make quantitative comparisons between sorting
techniques; memory requirement is one such measure.
In essence, a sort yields a permutation of the entries in
an input list. However, this operation takes place not on
the information contained in the individual entries, but
rather on a simpler and often numerical entity known
as a key.
A. Sorting Algorithms
There are very many different sorting algorithms. Some
of them have descriptive names, including insertion sort,
distribution sorting, and exchange sorting. Another kind,
bubble sort, is based on a simple idea. It involves a small
key rising through a list of all others. When the list is
sorted, that key will be above all larger values. Some sorting methods rely on special data structures. One such case
is heap sort.
A heap is a size-ordered complete binary tree. The root
of the tree is thus either the largest of the key values or the
least, depending on the convention adopted. When a heap
is built, a new key is inserted at the first free node of the
bottom level (just to the right of the last filled node), then
exchanges take place (bubbling) until the new value is in
the place where it belongs.
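The insertion step just described can be sketched as follows for a max-heap stored in an array; this is an illustration, not the article's code.

def heap_insert(heap, key):
    heap.append(key)                   # first free node of the bottom level
    i = len(heap) - 1
    while i > 0:
        parent = (i - 1) // 2
        if heap[parent] >= heap[i]:    # heap property restored; stop bubbling
            break
        heap[i], heap[parent] = heap[parent], heap[i]   # exchange (bubble up)
        i = parent

heap = []
for key in [12, 5, 20, 7]:
    heap_insert(heap, key)
print(heap[0])    # prints 20: the largest key sits at the root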
Insertion sort places each record in the proper position
relative to records already sorted.
Distribution sort (also called radix sort) is based on the
idea of partitioning the key space into successively finer
sets. When the entire set of keys has been examined, all
relative positions in the list have been completely determined. (Alphabetizing a set is an example of a radix sort.)
When a large sorted list is out of order in a relatively
small area, exchange sorts can be useful. This is a kind of
strategy for restoring order. The process simply exchanges
positions of record pairs found out of order. The list is
sorted when no exchanges can take place.
Another sorting strategy takes the most extreme record
from an unsorted list, appends it to a sorted list, then continues
the process until the unsorted list is empty. This approach
is called sorting by selection.
Counting sort algorithms determine the position of a
particular key in a sorted list by finding how many keys
are greater (or less) than the one chosen. Once that number is determined, the key's final position is known and no further relative movement is needed.
Merging two sorted lists requires only one traversal of each list: this is the key idea in merge sort. To sort a list by merging, one begins with many short sorted lists. Often the runs of elements in a random list that are already in order serve as these initial lists. The process merges them two at a
entered each subfile at its midpoint, we would have a binary search tree with key value 8 at the root, left successor
4, and right successor 12. Fibonacci search replaces the
division-by-two operations necessary in a binary search
by simpler addition and subtraction steps. The Fibonacci
series consists of numbers that are each the sum of the
immediately preceding two in that sequence. A Fibonacci
series begins with two ones, and has values:
1, 1, 2, 3, 5, 8, 13, 21, 34, 55, . . .
In a Fibonaccian search, the elements of the binary search
tree are either Fibonacci numbers or derived from them;
the root is a Fibonacci number, as are all nodes reached
by only left links. Right links lead to nodes whose values are the ancestor plus the difference between it and
its left successor. That is, the difference between the ancestor and left successor is added to the ancestor to get
the right successor value. Fibonaccian binary search trees
have a total number of elements one less than a Fibonacci
number.
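For comparison, the divide-at-the-midpoint search that such a binary search tree encodes can be sketched as below; this shows the ordinary binary search rather than the Fibonaccian variant, and the key values are illustrative.

def binary_search(sorted_keys, target):
    low, high = 0, len(sorted_keys) - 1
    while low <= high:
        mid = (low + high) // 2          # enter the subfile at its midpoint
        if sorted_keys[mid] == target:
            return mid
        if sorted_keys[mid] < target:
            low = mid + 1
        else:
            high = mid - 1
    return None                          # the key is absent

keys = list(range(1, 16))                # 15 keys: probes visit 8, then 4 or 12
print(binary_search(keys, 12))           # prints 11, the index of key 12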
(b, d) = 12, (a, c) = 3, (c, d) = 16.
The tree with the form (a (b (d), c)) has total weight 21, which is less than that for the tree with d linked to c instead of b, i.e., (a (b, c (d))). Both trees just described appear in
Fig. 4 (second row at right).
The minimal spanning tree algorithm uses the following
idea. Initially, all the nodes are members of different sets.
There are as many sets as nodes. Each node has itself as the
only member of its set. As the algorithm proceeds, at each
stage it groups more nodes together, just as in equivalence
methods. The algorithm stops when all nodes are in one
set. The parent tree data structure is the best one to use to
implement the minimal spanning tree algorithm.
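A compact sketch of this grouping process, in the style of Kruskal's algorithm with parent trees, is given below. The edge weights (a, c) = 3, (b, d) = 12, and (c, d) = 16 are those quoted above; the weight 6 for edge (a, b) is inferred from the stated total of 21 and is otherwise an assumption of the example.

def find(parent, node):
    while parent[node] != node:          # climb the parent tree to its root
        node = parent[node]
    return node

def minimal_spanning_tree(nodes, edges):
    parent = {n: n for n in nodes}       # every node starts in its own set
    tree = []
    for weight, u, v in sorted(edges):   # cheapest edges first
        ru, rv = find(parent, u), find(parent, v)
        if ru != rv:                     # different sets, so no cycle is formed
            parent[ru] = rv              # merge the two parent trees
            tree.append((u, v, weight))
    return tree                          # the accepted edges form the spanning tree

edges = [(3, "a", "c"), (6, "a", "b"), (12, "b", "d"), (16, "c", "d")]
print(minimal_spanning_tree(["a", "b", "c", "d"], edges))
# prints [('a', 'c', 3), ('a', 'b', 6), ('b', 'd', 12)], total weight 21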
Yet another algorithm concept and its implications for
data structure selection arise from the precedence situation introduced earlier. To create an efficient scheduling
procedure, observe that any task that does not require completion of others before it is begun can be started at any
time. In data structure terms, this is equivalent to an algorithm using the following fact: Removal of a graph element without a predecessor does not change the order of
priorities stored in the graph.
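A sketch of this scheduling idea appears below; the task names and dependencies are hypothetical, and the code simply keeps removing elements that have no remaining predecessor.

def schedule(depends_on):
    remaining = {task: set(preds) for task, preds in depends_on.items()}
    order = []
    while remaining:
        ready = [t for t, preds in remaining.items() if not preds]
        if not ready:
            raise ValueError("circular precedence")
        for t in ready:
            order.append(t)
            del remaining[t]
        for preds in remaining.values():
            preds.difference_update(ready)   # removal preserves the priorities
    return order

deps = {"pour foundation": set(),
        "frame walls": {"pour foundation"},
        "install roof": {"frame walls"},
        "paint": {"frame walls"}}
print(schedule(deps))   # one legal starting order for the tasks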
FIGURE 4 Spanning trees. (a) Four nodes with four spanning trees, all with node one at the root; (b) four nodes with
edge weights and two different spanning trees.
FIGURE 5 Cuboctahedron.
VIII. CONCLUSION
The subject of data structures clearly contains a myriad
of technical terms. Each of the topics discussed has been
briefly mentioned in this article. A basic text on data structures would devote many pages to the fine points of use.
Yet the pattern of the subject is now clear. Before programming can begin, planning the algorithms and the data they
will operate on must take place. To have efficient computing, a wide range of decisions regarding the organization
of the data must be made. Many of those decisions will
be based on ideas of how the algorithms should proceed
(e.g., put equivalent nodes into the same tree). Others
will be based on a detailed analysis of alternatives. One
example is taking into account the likely range of values
and their number. This determines the possible size of a data
table. Both table size and retrieval of elements within it
impact key choices. Two computer data structure considerations are always memory needed and processing time
or algorithm execution speed.
Many aspects of data structures and algorithms involve
data stored where access is on a secondary device. When
that is the situation, procedures deal with search and sorting. Sorting occupies a substantial portion of all the computing time used, and numerous alternative algorithms exist because the characteristics of the data sometimes make particular ones advantageous.
As in the case of sorting, it is always true that actual
choice of how an algorithmic task should be implemented
can and should be based on planning, analysis, and tailoring of a problem that is to be solved. The data structure
also needs to take into account the computer hardware
characteristics and operating system.
Explosive development of computer networks, the
World Wide Web, and Internet browsers means that
many technical terms discussed in this article will join
links, pointers, and hierarchy in becoming common terms,
not solely the province of computer experts using data
structures.
BIBLIOGRAPHY
Aho, A., Hopcroft, J., and Ullman, J. (1983). Data Structures and Algorithms, Addison-Wesley, Reading, MA.
Dehne, F., Tamassia, R., and Sack, J., eds. (1999). Algorithms and
Data Structures, (Lecture Notes in Computer Science), Proc. 6th
Int. Workshop, Vancouver, Canada, August 11–14, Springer-Verlag,
New York.
Graham, R., Knuth, D., and Patashnik, O. (1988). Concrete Mathematics, Addison-Wesley, Reading, MA.
Knuth, D. (1968). The Art of Computer Programming, Vol. I, Fundamental Algorithms, Addison-Wesley, Reading, MA.
Knuth, D. (1973). The Art of Computer Programming, Vol. III, Sorting and Searching, Addison-Wesley, Reading, MA.
Knuth, D. (1981). The Art of Computer Programming, Vol. II,
Seminumerical Algorithms, Addison-Wesley, Reading, MA.
Meinel, C. (1998), Algorithmen und Datenstrukturen im VLSI-Design
[Algorithms and Data Structures in VLSI Design], Springer-Verlag,
New York.
Sedgewick, R. (1990). Algorithms in C, Addison-Wesley, Reading,
MA.
Sedgewick, R. (1999). Algorithms in C++: Fundamentals, Data Structures, Sorting, Searching, 3rd ed., Addison-Wesley, Reading, MA.
Waite, M., and Lafore, R. (1998). Data Structures and Algorithms in
Java, Waite Group.
Databases
Alvaro A. A. Fernandes
Norman W. Paton
University of Manchester
GLOSSARY
Application A topic or subject for which information systems are needed, e.g., genomics.
Conceptual model A data model that describes application concepts at an abstract level. A conceptual
model for an application may be amenable to implementation using different database management
systems.
Concurrency control Mechanisms that ensure that each
individual accessing the database can interact with the
database as if they were the only user of the database,
with guarantees as to the behavior of the system when
many users seek to read from or write to the same data
item at the same time.
Database A collection of data managed by a database
management system.
Database management system A collection of services
that together give comprehensive support to applications requiring storage of large amounts of data that
are to be shared by many users.
DBMS See database management system.
Data model A collection of data types which are
A DATABASE is a collection of data managed by a
database management system (DBMS). A DBMS provides facilities for describing the data that are to be stored,
and is engineered to support the long-term, reliable storage of large amounts of data (Atzeni et al., 1999). DBMSs
also provide query language and/or programming language
interfaces for retrieving and manipulating database data.
Many organizations are dependent in a significant way
upon the reliability and efficiency of the DBMSs they
deploy.
application requirements for persistent data. The permissible operations associated with each data type become
the building blocks with which developers model application requirements for interactions with persistent data.
Data models, therefore, differ primarily in the collection
of data types that they make available. The differences can
relate to structural or behavioral characteristics. Structural
characteristics determine what states the database may be
in (or, equivalently, are legal under the data model). Behavioral characteristics determine how database states can be
scrutinized and what transitions between database states
are possible under the data model.
A DBMS is said to implement a data model in the sense
that:
1. It ensures that every state of every database managed
by it only contains instances of a data type that is
well defined under the implemented data model.
2. It ensures that every retrieval of data and every state
transition in every database managed by it are the
result of applying an operation that is (and only
involves types that are) well defined under the
implemented data model.
From the point of view of application developers then,
the different structural or behavioral characteristics associated with different data models give rise to the pragmatic requirement of ensuring that the data model chosen
to model application requirements for persistent data does
not unreasonably distort the conceptual view of the application that occurs most naturally to users. The main data
models in widespread use at the time of this writing meet
this pragmatic requirement in different degrees for different kinds of application.
A. The Relational Data Model
The relational data model makes available one single data
type, referred to as a relation. Informally, a relation can
be thought of as a table, with each row corresponding to
an instance of the application concept modeled by that
relation and each column corresponding to the values of a
property that describes the instances. Rows are referred to
as tuples and column headers as attributes. More formal
definitions now follow.
Let a domain be a set of atomic values, where a value
is said to be atomic if no further structure need be discerned in it. For example, integer values are atomic, and
so, quite often, are strings. In practice, domains are specified by choosing one data type, from the range of primitive
data types available in the software and hardware environment, whose values become elements of the domain being
specified.
A relation schema R(A1, . . . , An) describes a relation R of degree n by enumerating the n attributes A1, . . . , An
that characterize R. An attribute names a role played by
some domain D in describing the entity modeled by R.
The domain D associated with an attribute A is called the
domain of A and is denoted by dom(A) = D. A relational
database schema is a set of relation schemas plus the
following relational integrity constraints:
1. Domain Constraint: Every value of every attribute is
atomic.
2. Key Constraint: Every relation has a designated
attribute (or a concatenation thereof) which acts as the
primary key for the relation in the sense that the
value of its primary key identifies the tuple uniquely.
3. Entity Integrity Constraint: No primary key value can
be the distinguished NULL value.
4. Referential Integrity Constraint: If a relation has a
designated attribute (or a concatenation thereof)
which acts as a foreign key with which the relation
refers to another, then the value of the foreign key in
the referring relation must be a primary key value in
the referred relation.
For example, a very simple relational database storing nucleotide sequences might draw on the primitive
data type string to specify all its domains. Two relation names might be DNA-sequence and organism. Their schemas can be represented as in Fig. 1. In
this simple example, a DNA sequence has an identifying attribute sequence id, one or more accession
nos by which the DNA sequence is identified in other
databases, the identifying attribute protein id which
the DNA sequence codes for and the organism ids of
the organisms where the DNA sequence has been identified. An organism has an identifying attribute organism id, perhaps a common name and the up node of
the organism in the adopted taxonomy.
A relation instance r(R) of a relation schema R(A1, . . . , An) is a subset of the Cartesian product of the domains of the attributes that characterize R. Thus, a relation instance of degree n is a set, each element of which is an n-tuple of the form ⟨v1, . . . , vn⟩ such that either vi ∈ dom(Ai), 1 ≤ i ≤ n, or else vi is the distinguished NULL value. A relational database state is a set of relation instances. Legal relation instances for DNA sequence and organism from Fig. 1 might be as depicted in Fig. 2.
Notice, in Fig. 2, how each tuple in DNA sequence
is an element of the Cartesian product:
dom(sequence id) × dom(accession no) × dom(protein id) × dom(organism id)
and correspondingly in the case of organism. Notice
also, in Figs. 1 and 2, that all the relational integrity constraints are satisfied, as follows. The domain constraint is
satisfied due to the fact that the domain of every attribute
is the primitive data type string, whose instances have
been assumed to be atomic. The designated primary keys
for DNA sequence and organism are, respectively,
sequence id and organism id, and since they are
intended as unique identifiers of the entities modeled,
the key constraint is satisfied. The entity integrity constraint is satisfied assuming that tuples that are not shown
also do not contain NULL as values for the primary keys.
There is a designated foreign key, organism id, from
DNA sequence to organism, and since the values of
organism id appear as primary key values in organism, the referential integrity constraint is satisfied.
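The check just described can be sketched in a few lines when relation instances are held as sets of tuples; the sequence identifiers below are hypothetical, since Fig. 2 is not reproduced here, while the organism values follow the example used elsewhere in this article.

dna_sequence = {   # (sequence_id, accession_no, protein_id, organism_id)
    ("s1", "X01", "p1", "Bras. napus"),
    ("s2", "X02", "p2", "Trif. repens"),
}
organism = {       # (organism_id, common_name, up_node)
    ("Bras. napus", "rape", "Brassicaceae"),
    ("Trif. repens", "white clover", "Papilionoideae"),
}

primary_keys = {row[0] for row in organism}
foreign_keys = {row[3] for row in dna_sequence}

# Referential integrity: every foreign key value in the referring relation
# must appear as a primary key value in the referred relation.
assert foreign_keys <= primary_keys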
The relational data model has several distinctive features as a consequence of its simple mathematical characterization:
• Since a relation instance is a set of tuples, no order
Because an object-oriented model brings together structure and behavior, much of the functionality required by
applications to interact with application objects can be
modeled inside the database itself. Nevertheless, most
object-oriented database systems provide application program interfaces that allow a programming language to
retrieve and manipulate objects.
Unlike the relational model, object-oriented data models cannot all be traced back to a single major proposal.
The closest thing to this notion is a manifesto (found, e.g.,
in Atkinson et al. (1990)) signed by prominent members
of the research community whose recommendations are
covered by and large in the treatment given in this section. The ODMG industry consortium of object database
vendors has proposed an object-oriented data model that
is widely seen as the de facto standard (Cattell et al.,
2000).
C. Object-Relational Data Models
While object-oriented data models typically relegate relations to the status of one among many types that
can be built using type constructors (in this case, e.g.,
tuple of), object-relational data models retain the central role relations have in the relational data model.
However, they relax the domain constraint of the relational data model (thereby allowing attribute values to be
drawn from complex domains), they incorporate some of
the distinctive features of the object-oriented model such
as inheritance and assigned behavior with encapsulated
states, and they allow for database-wide functionality to
be specified by means of rules (commonly referred to as
triggers, as described in Section IV.B) that react to specific interactions with application entities by carrying out
some appropriate action.
As a consequence of the central role that relations retain
in object-relational data models, one crucial difference
with respect to the object-oriented case is that the role
played by object identity is relaxed to an optional, rather
than mandatory, feature. Thus, an object-relational DBMS
stands on an evolutionary path from relational ones,
whereas object-oriented ones represent a complete break
with the relational approach. In this context, notice that
while a tuple of type constructor may allow a relation
type to be supported, each tuple will have an identity, and
attribute names will be explicitly needed to retrieve and
interact with values.
Such differences at the data model level lead to pragmatic consequences of some significance at the level of
the languages used to interact with application entities.
In particular, while object-oriented data models naturally
induce a navigational approach to accessing values, this
leads to chains of reference of indefinite length that need
to be traversed, or navigated.
In contrast, object-relational data models retain the
associative approach introduced by the relational data
model. Roughly speaking, this approach is based on viewing the sharing of values between attributes in different relations as inherently establishing associations between relation instances. These associations can then be exploited
to access values across different relations without specifically and explicitly choosing one particular chain of references. These issues are exemplified in Sections III.B
and III.C.
Since object-relational models are basically hybrids,
their notions of schema and instance combine features
of both relational and object-oriented schemas and instances. The object-relational approach to modeling the
gene entity type is the same as the object-oriented one
depicted in Figs. 4 and 5 but for a few differences, e.g.,
relationships are usually not explicitly supported as such
by object-relational data models.
As in the object-oriented case, object-relational data
models were first advocated in a concerted manner by
a manifesto (Stonebraker et al., 1990) signed by prominent members of the research community. A book-length
treatment is available in Stonebraker and Brown (1999).
Any undergraduate textbook on Database Systems (e.g.,
Atzeni et al., 1999; Elmasri and Navathe, 2000) can be
used to complement the treatment of this and the section
that follows.
FIGURE 7 Using SQL to effect state transitions in the relation instances from Fig. 2.
formal treatment of some of the issues arising in object-oriented languages can be found in Abiteboul et al. (1995).
C. Object-Relational Database Languages
The proposed standard for object-relational database languages is SQL-99. Figure 14 shows how Fig. 4 could be
specified in SQL-99. Note the use of ROW TYPE to specify a complex domain, the use of REF to denote tuple identifiers and the use of type constructors such as SET and
LIST. Note also that, unlike ODL (cf. Fig. 12), in SQL-99
inverse relationships are not declared. Note, finally, how
gene is modeled as including operations, as indicated by
the keyword FUNCTION introducing the behavioral part
of the specification of gene.
Two SQL-99 queries over the gene type in Fig. 14
are given in Fig. 15. Query ORQ1 returns a binary table relating the standard name of each gene with the
common name of organisms where the gene is found.
Query ORQ2 returns the common name of organisms
associated with genes that have alleles. Note that in
SQL-99 path expressions use the symbol -> to dereference identifiers and (not shown in Fig. 15) the symbol ..
to denote attributes in row types.
SQL-99 is formally defined by ISO/ANSI standards, which are available from those organizations.
A. Deductive Databases
Deductive database systems can be seen as bringing together mainstream data models with logic programming
languages for querying and analyzing database data. Although there has been research on the use of deductive
databases with object-oriented data models, this section
illustrates deductive databases in the relational setting, in
particular making use of Datalog as a straightforward deductive database language (Ceri et al., 1990).
A deductive (relational) database is a Datalog program. A Datalog program consists of an extensional
database and an intensional database. An extensional
database contains a set of facts of the form:
p(c1 , . . . , cm )
where p is a predicate symbol and c1 , . . . , cm are constants. Each predicate symbol with a given arity in the
extensional database can be seen as analogous to a relational table. For example, the table organism in Fig. 2
can be represented in Datalog as:
organism('Bras. napus', 'rape', 'Brassicaceae').
organism('Trif. repens', 'white clover', 'Papilionoideae').
Traditionally, constant symbols start with a lower
case letter, although quotes can be used to delimit other
constants.
An intensional database contains a set of rules of the
form:
p(x1, . . . , xm) :- q1(x11, . . . , x1k), . . . , qj(xj1, . . . , xjp)
where p and qi are predicate symbols, and all argument
positions are occupied by variables or constants.
Some additional terminology is required before examples can be given of Datalog queries and rules. A term is
either a constant or a variable; variables are traditionally written with an initial capital letter. An atom p(t1, . . . , tm) consists of a predicate symbol and a list of arguments, each of which is a term. A literal is an atom or a negated atom ¬p(t1, . . . , tm). A Datalog query is a conjunction
of literals.
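To make the semantics concrete, the following sketch evaluates a single, hypothetical rule over an extensional database held as Python sets of tuples; the derived predicate, the attribute order, and the sequence identifiers are assumptions of this example.

# Rule being evaluated:
#   found_in(Seq, Common) :- dna_sequence(Seq, A, P, Org), organism(Org, Common, Up).
dna_sequence = {("s1", "X01", "p1", "Bras. napus"),
                ("s2", "X02", "p2", "Trif. repens")}
organism = {("Bras. napus", "rape", "Brassicaceae"),
            ("Trif. repens", "white clover", "Papilionoideae")}

found_in = {(seq, common)
            for (seq, _acc, _prot, org) in dna_sequence
            for (org2, common, _up) in organism
            if org == org2}          # the shared variable Org forces a join
print(found_in)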
V. DISTRIBUTION
A conventional DBMS generally involves a single server
communicating with a number of user clients, as illustrated in Fig. 18. In such a model, the services provided
by the database are principally supported by the central
server, which includes a secondary storage manager, concurrency control facilities, etc., as described in Section I.
In relational database systems, clients generally communicate with the database by sending SQL query or update statements to the server as strings. The server then
compiles and optimizes the SQL, and returns any result or
error reports to the client.
Clearly there is a sense in which a client-server database
is distributed, in that application programs run on many
clients that are located in different parts of the network.
However, in a client-server context there is a single server
managing a single database described using a single data
model. The remainder of this section introduces some
of the issues raised by the relaxation of some of these
restrictions.
A. Distributed Databases
In a distributed database, data is stored in more than one
place, as well as being accessed from multiple places, as
analytical processing (OLAP) has been used to characterize this different kind of access to databases, in which
typical transactions read rather than update the database,
and may perform complex aggregation operations on the
database. As the data required for OLAP often turn out to
be stored in several OLTP databases, it often proves to be
necessary to replicate data in a system designed specifically to support OLAP; such a replicated store is known
as a data warehouse.
A data warehouse architecture is illustrated in Fig. 20
(Anahory and Murray, 1997). A data warehouse contains
three principal categories of information: detailed information (for example, in the genomic context, this may be
the raw sequence data or other data on the function of individual protein); summary information (for example, in
the genome context, this may record the results of aggregations such as the numbers of genes associated with a
given function in each species); and meta data (which describes the information in the warehouse and where it came
from).
An important prerequisite for the conducting of effective analyses is that the warehouse contains appropriate
data, of adequate quality, which is sufficiently up-to-date.
This means that substantial effort has to be put into loading the data warehouse in the first place, and then keeping the warehouse up-to-date as the sources change. In
Fig. 20, data essentially flow from the bottom of the figure
to the top. Sources of data for the warehouse, which are
often databases themselves, are commonly wrapped so
that they are syntactically consistent, and monitored for
changes that may be relevant to the warehouse (e.g., using
active rules). The load manager must then merge together
information from different sources, and discard information that doesn't satisfy relevant integrity constraints. The
query manager is then responsible for providing comprehensive interactive analysis and presentation facilities
for the data in the warehouse.
VI. CONCLUSIONS
This chapter has provided an introduction to databases,
and in particular to the models, languages, and systems
that allow large and complex data-intensive applications
to be developed in a systematic manner. DBMSs are now
quite a mature technology, and almost every organization
of any size uses at least one such system. The DBMS
will probably be among the most sophisticated software
systems deployed by an organization.
This is not to say, however, that research in database
management is slowing down. Many data-intensive applications do not use DBMSs, and developments continue,
BIBLIOGRAPHY
Abiteboul, S., Hull, R., and Vianu, V. (1995). Foundations of Databases,
Addison-Wesley. ISBN 0-201-53771-0.
Anahory, S., and Murray, D. (1997). Data Warehousing in the Real
World, Addison-Wesley. ISBN 0-201-17519-3.
Atkinson, M., Bancilhon, F., DeWitt, D., Dittrich, K., Maier, D., and
Zdonik, S. B. (1990). The Object-Oriented Database System Manifesto, In Kim et al. (1990), pp. 223–240.
Atzeni, P., Ceri, S., Paraboschi, S., and Torlone, R. (1999). Database
Systems: Concepts, Languages and Architectures, McGraw-Hill.
Cattell, R. G. G., Barry, D. K., Berler, M., Eastman, J., Jordan, D.,
Russell, C., Schadow, O., Stanienda, T., and Velez, F. (2000). The
Object Data Standard: ODMG 3.0, Morgan Kaufman, ISBN 1-55860-647-5.
Ceri, S., Gottlob, G., and Tanca, L. (1990). Logic Programming and
Databases, Springer-Verlag, Berlin. ISBN 0-387-51728-6.
Codd, E. F. A Relational Model of Data for Large Shared Data Banks,
Communications of the ACM 13(6): 377–387, June 1970; Also in CACM 26(1), January 1983, pp. 64–69.
Elmasri, R., and Navathe, S. B. (2000). Fundamentals of Database Systems, Addison-Wesley, Reading, MA, USA, 3rd edition, ISBN 0-201-54263-3.
Garcia-Molina, H., Ullman, J., and Widom, J. (2000). Database System
Implementation, Prentice Hall. ISBN 0-13-040264-8.
Kim, W., Nicolas, J.-M., and Nishio, S. (eds.). (1990). Deductive
and Object-Oriented Databases (First International Conference
DOOD89, Kyoto), Amsterdam, The Netherlands, Elsevier Science
Press (North-Holland), ISBN 0-444-88433-5.
Melton, J., and Simon, A. R. (1993). Understanding the New SQL: A
Complete Guide, Morgan Kaufman, ISBN 1-55860-245-3.
Paton, N., and Diaz, O. (1999). Active Database Systems, ACM Computing Surveys 31(1), 63–103.
Stonebraker, M., and Brown, P. (1999). Object-Relational DBMS: Tracking the Next Great Wave, Morgan Kaufman, ISBN 1-55860-452-9.
Stonebraker, M., Rowe, L. A., Lindsay, B. G., Gray, J., Carey, M. J.,
Brodie, M. L., Bernstein, P. A., and Beech, D. Third-Generation
Database System ManifestoThe Committee for Advanced DBMS
Function, SIGMOD Record 19(3): 3144, September 1990.
Evolutionary Algorithms
and Metaheuristics
Conor Ryan
University of Limerick
GLOSSARY
Chromosome A strand of DNA in the cell nucleus that
carries the genes. The number of genes per chromosome and number of chromosomes per individual depend on the species concerned. Often, evolutionary algorithms use just 1 chromosome; humans, for example,
have 46.
Crossover Exchange of genetic material from two parents
to produce one or more offspring.
Genotype The collective genetic makeup of an organism.
Often in evolutionary algorithms, the genotype and the
chromosome are identical, but this is not always the
case.
Hill climber A metaheuristic that starts with a potential
solution at a random place in the solution landscape.
Random changes are made to the potential solution
which, in general, are accepted if they improve its performance. Hill climbers tend to find the best solution
in their immediate neighborhood.
the next generation. The probability that an individual
will reproduce (and thus have a longer lifetime) is directly proportional to its fitness.
Selection In evolutionary algorithms, individuals in a new
generation are created from parents in the previous generation. The probability that individuals from the previous generation will be used is directly proportional to
their fitness, in a similar manner to Darwinian survival
of the fittest.
Solution landscape A map of the fitness of every possible
set of inputs for a problem. Like a natural landscape, a
solution landscape is characterized by peaks, valleys,
and plateaus, each reflecting how fit its corresponding
inputs are.
Weak artificial intelligence An artificial intelligence
(AI) method that does not employ any information
about the problem it is trying to solve. Weak AI methods do not always find optimal solutions, but can be
applied to a broad range of problems.
I. METAHEURISTICS VERSUS
ALGORITHMS
A. Introduction
An algorithm is a fixed series of instructions for solving
a problem. Heuristics, on the other hand, are more of a
rule of thumb used in mathematical programming and
usually mean a procedure for seeking a solution, but without any guarantee of success. Often, heuristics generate a
reasonably satisfactory solution, which tends to be in the
neighborhood of an optimal one, if it exists.
Heuristics tend to use domain-specific knowledge to explore landscapes, knowledge that is usually given by an expert in the
area. This renders them less than useful when applied to
other problem areas, as one cannot expect them to be general enough to be applied across a wide range of problems.
Metaheuristics, on the other hand, operate at a higher level
than heuristics, and tend to employ little, if any, domain-specific knowledge.
The absence of this knowledge qualifies metaheuristics
as weak methods in classic artificial intelligence parlance.
Although they have little knowledge which they can apply
to the problem area, they are general enough to be applicable across a broad range of problems. Metaheuristics
can be broadly divided into two groups: the first continuously modifies a single potential solution until it reaches
a certain performance threshold; the second employs a
population of candidate solutions which is effectively
evolved over time until one of the candidates performs
satisfactorily.
B. Solution Landscapes
Often, when discussing metaheuristics, or indeed, any
search method, one refers to the solution landscape. Simply put, the solution landscape for a problem is a mapping of every set of inputs to their corresponding output.
Consider the graph in Fig. 1, which shows the solution
landscape for the function X → −X². Clearly, the maximum
value for the function is at X = 0, and any search method
should quickly locate the area in which this value lies and
then slowly home in on it.
Not all solution landscapes have such a simple shape,
however. Figure 2 gives the landscape for the function

F2(x) = exp(−2 log(2)((x − 0.1)/0.8)²) sin⁶(5πx).
This function contains five peaks, each at different
heights. Landscapes such as these can cause difficulties for
searches because if one of the lower peaks is mistakenly
identified as the optimal value, it is unlikely that the search
which consists of a number of genes that encode the individual's behavior. What the genes encode is entirely dependent on the problem at hand, and we will see later that
different evolutionary algorithms use different encoding
schemes and are often suited to very different problem
domains. Thus, when trying to maximize or minimize a
function, it would be appropriate to represent a real number with an individual's chromosome, while if one were
trying to generate control routines for a robot, it would be
more appropriate to encode a program.
The genes of an individual are collectively known as its
genotype, while their physical expression, or result of decoding, is referred to as the phenotype. Some EAs exploit
the mapping process from genotype to phenotype, while
others rely on the fact that the phenotype is directly encoded, i.e., the genotype and the phenotype are the same.
Based on the Darwinian process of survival of the
fittest, those individuals that perform best at the problem survive longer and thus have more of an opportunity
to contribute to new offspring. While the exact manner in
which individuals are combined to produce new individuals depends on both the particular evolutionary algorithm being employed and the type of structures being
manipulated, most evolutionary algorithms follow a series of steps similar to that in Fig. 4.
The origin of life in genetic algorithms happens in a
somewhat less romantic fashion than the sudden spark
which gave rise to life from the primordial ooze on earth.
In a manner not unlike that suggested by the theory of
directed panspermia,1 the implementor of a genetic algorithm seeds the initial population with an appropriate
1 Panspermia is the belief that life on earth derives from seeds
of extraterrestrial origin. According to the notion of directed panspermia,
these seeds were deliberately sent out by intelligent beings.
A common view is that mutation is an explorative operator, in that a random change in an individual's genes can
cause it to jump to a previously unexplored region in the
solution landscape. It then follows that crossover is an
exploitative operator, as it effectively concentrates the
search in an area already occupied by individuals. However, it is a matter of some debate as to which of the operators is the more important, and there is considerable
disagreement in particular on the role played by mutation.
Some of the evolutionary algorithms discussed below have
very different views on the importance of each. For example, genetic programming generally does not employ
any mutation, while evolutionary programming does not
employ any crossover. It is perhaps indicative of the complexity of the evolutionary processes that no consensus
has been reached.
What all proponents of evolutionary algorithms would
agree on, however, is that evolutionary search is not random search. Although the initial population is random,
further operations are only performed on the more fit
individuals, which results in a process that is distinctly
nonrandom and produces structures that far outperform
any generated by a strictly random process.
A. Genetic Algorithms
Genetic algorithms are possibly the simplest evolutionary
algorithms and deviate the least from the basic description
given above. Genetic algorithms tend to use fixed-length
binary strings to represent their chromosomes, which are
analogous to the base-4 chromosomes employed in DNA.
However, there is no particular reason why chromosome
length should be fixed, nor why a representation scheme
other than binary could not be used. Figure 6 shows an example of chromosomes of different lengths being crossed
over.
Genetic algorithms are most often used in function maximization (or minimization) or object selection problems.
In a function maximization problem, the algorithm must
find a value for x that maximizes the output of some
function f (x). The chromosomes of individuals are thus
simple binary (or possibly Gray-coded) numbers, and the
fitness for each individual is the value the function generates with it as the input. Consider the sample individuals in Table I, which are trying to maximize the function f(x) = x⁴ − x² + x. In this case, we are using five-bit numbers, the first bit of which is used as a sign.
In this extremely small population, the individual 11011
is clearly the best performer, but the rule for selection is
simply that the greater the fitness of an individual, the more
likely it is to be selected. This suggests that even the relatively poor individual 10011 should have some chance,
too. The simplest selection scheme is roulette wheel selection. In this scheme, each individual is assigned a probability of selection proportional to its fitness:
p_i = f_i / Σ_j f_j        (1)
That is, the probability that an individual will be selected is equal to the individual's fitness divided by the total fitness.
The total fitness is 30,977, so the probability that the fittest individual will be selected is 14,509/30,977, or .468, while the probability that the least fit individual will be selected is just 69/30,977, or .002. This ensures that
although the less fit individuals have some chance of contributing to the next generation, it is far more likely to be
the more fit individuals who contribute their genes. This
is analogous to individuals in a population competing for
resources, where the more fit individuals attain more.
In an experiment such as this, the flowchart in Fig. 4 is
repeated until there is no longer change in the population,
which means that no more evolution is taking place.
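The mechanics of roulette wheel selection can be sketched in a few lines of Python. The code below is an illustration only; the population is the one from Table I, while the fitness function f(x) = x⁴ − x² + x and the helper names are our own choices rather than part of any standard package. Like roulette wheel selection itself, it assumes the fitness values are nonnegative.

import random

def fitness(x):
    # Fitness function used for Table I: f(x) = x**4 - x**2 + x
    return x**4 - x**2 + x

def decode(bits):
    # Five-bit chromosome: the first bit is a sign, the remaining four bits a magnitude.
    sign = -1 if bits[0] == '1' else 1
    return sign * int(bits[1:], 2)

def roulette_select(population):
    # Choose one individual with probability proportional to its fitness, as in Eq. (1).
    fitnesses = [fitness(decode(ind)) for ind in population]
    total = sum(fitnesses)
    r = random.uniform(0, total)
    running = 0.0
    for individual, f in zip(population, fitnesses):
        running += f
        if running >= r:
            return individual
    return population[-1]

population = ['01010', '11011', '01001', '10011']
print(roulette_select(population))   # '11011' is returned most often (probability .468)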
TABLE I Example Individuals from a Simple Genetic Algorithm Population

Individual     Binary     Decimal     Fitness
    1          01010         10         9,910
    2          11011        -11        14,509
    3          01001          9         6,489
    4          10011         -3            69
Types of Problems
Genetic algorithms are typically used on function optimization problems, where they are used to calculate which
input yields the maximum (or minimum) value. However,
they have also been put to use on a wide variety of problems. These range from game-playing, where the genotype encodes a series of moves, to compiler optimization,
in which case each gene is an optimization to be applied
to a piece of code.
B. Genetic Programming
Genetic programming (GP) is another well-known evolutionary algorithm, and differs from the genetic algorithm
described above in that the individuals it evolves are parse
trees instead of binary strings. The benefit of using parse
trees is that individuals can represent programs, rather than
simply represent numbers.
A simple parse tree is illustrated in Fig. 7. The nodes of
a tree contain functions, such as + or −, while the leaves
contain the arguments to that function. Notice that, because all functions in a parse tree return a value, it is
possible to pass the result of a function as the argument
to another function. For example, the expression a + b * c
would be represented as in Fig. 7.
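Such a tree is straightforward to represent and evaluate directly. The short Python sketch below is purely illustrative (the tuple encoding and the function table are our own choices, not a standard GP library); it builds and evaluates the tree for a + b * c.

import operator

FUNCTIONS = {'+': operator.add, '-': operator.sub, '*': operator.mul}

def evaluate(tree, env):
    # A tree is either a terminal (a number or a variable name) or a tuple
    # (function symbol, left subtree, right subtree).
    if isinstance(tree, tuple):
        symbol, left, right = tree
        return FUNCTIONS[symbol](evaluate(left, env), evaluate(right, env))
    if isinstance(tree, str):
        return env[tree]            # a variable terminal
    return tree                     # a numeric terminal

tree = ('+', 'a', ('*', 'b', 'c'))  # the parse tree for a + b * c
print(evaluate(tree, {'a': 1, 'b': 2, 'c': 3}))   # prints 7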
Parse trees also have a convenient linear representation, identical to that used in the language Lisp. For example, the left tree in Fig. 7 would be (+ 1 2), i.e., the function followed by its arguments, while the right tree in Fig. 7 would be described by (+ a (* b c)). Because of the close relationship between Lisp and parse trees and the
2. Initial Population
In common with most evolutionary algorithms, the initial
population in GP is randomly generated. However, due to
the variable sizes of the individuals, there are a number of
methods for producing this initial population. One such
method, the grow method, randomly chooses which functions and terminals to put in. This can result in a variety
of shapes and sizes of trees, as the choice of a function
will cause an individual to keep growing. Usually, there
is an upper limit set on the depth of individuals, which,
once encountered, forces the inclusion of a terminal. Each
3. Types of Problems
The flexible representation employed by GP has given rise
to a wide variety of applications. One area where GP has
enjoyed much success is in that of symbolic regression,
also referred to as sequence problems. Symbolic regression
problems involve determining a function that maps a set
C. Evolutionary Programming
Evolutionary programming (EP) is distinct from most
other types of evolutionary algorithms in that it operates
at a higher level, that is, instead of groups of individuals,
it maintains groups of species which are evolving toward
a particular goal.
The main steps in EP are similar to those of other evolutionary algorithms,
with the exception of crossover. Crossover does not take
place between individuals in EP because the finest grain
in the population is a species, and crossover tends not to
take place between different species in nature. Instead of
crossover, the driving force behind EP is mutation, the idea
of which is to reinforce the notion that, in general, new
generations of a species tend to be somewhat similar to
E. Grammatical Evolution
<expr> ::= <expr><op><expr>        (0)
         | (<expr><op><expr>)      (1)
         | <pre-op>(<expr>)        (2)
         | <var>                   (3)
In this case there are four possible rules that can be applied to <expr>. If the current codon integer value were, for example, 8, this would give 8 MOD 4, which equals 0. In this case, then, the zeroth production rule is applied to the nonterminal, resulting in the replacement of <expr> with <expr><op><expr>; the mapping continues by reading codon after codon, using the generated number to select appropriate rules for each nonterminal, until a completed program is achieved.
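A minimal Python sketch of this mapping step is given below. It assumes the four production rules for <expr> listed above and uses made-up codon values, so it illustrates the MOD rule only and is not a complete grammatical evolution mapper.

# Production rules for the nonterminal <expr>, numbered as above.
EXPR_RULES = [
    '<expr><op><expr>',        # (0)
    '(<expr><op><expr>)',      # (1)
    '<pre-op>(<expr>)',        # (2)
    '<var>',                   # (3)
]

def choose_rule(codon, rules):
    # Grammatical evolution selects a rule with: codon value MOD number of rules.
    return rules[codon % len(rules)]

print(choose_rule(8, EXPR_RULES))    # 8 MOD 4 = 0, so rule (0): <expr><op><expr>
print(choose_rule(12, EXPR_RULES))   # 12 MOD 4 = 0 as well: a neutral change of codon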
It is the use of the modulus function to select production rules that gives the system a degenerate genetic code. That is, many different codon integer values can represent the same production rule. A consequence of this is that a mutation at a codon, while changing the integer value, will not necessarily change the production rule applied in that instance, the result being a functionally equivalent phenotype. A mutation of this type has been termed a neutral mutation, and it
is proposed that these neutral mutations are responsible for
maintaining genetic diversity in biological populations; a
similar effect has been observed in GE.
V. MEMETIC ALGORITHMS
While the population-based evolutionary algorithms may
appear to be quite different from the single-entity-based
algorithms of simulated annealing and tabu search, there
is no reason why they cannot work together. After all,
our own evolution is characterized by both the evolution
of genes and the evolution of ideas. An individual often benefits from the experience of its parents and, during its own lifetime, strives to become as successful as possible, that is, in EA terms, to become as fit as it can be.
Memetic algorithms, sometimes referred to as hybrid
genetic algorithms, attempt to capture this behavior. At a
simple level, they are similar to evolutionary algorithms,
but each individual also undergoes some kind of hill climbing, possibly simulated annealing or tabu search. The fitness of that individual becomes whatever the fitness is
after the hill-climbing. Figure 13 shows a flowchart for
a memetic algorithm. As can be seen, the only difference
between this and the flowchart for evolutionary algorithms
in Fig. 4 is the extra Perform Local Search step.
In true metaheuristic fashion, there is no detail about
either the method used to produce each generation or the
local search method employed. This means that any representation scheme may be used. Thus, it is possible to
combine GP and SA, or GA and tabu search; indeed, any
combination is possible.
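A compact sketch of this structure is given below. It is an illustration only: the toy problem, the variation operator, and the hill climber are placeholders of our own, and any of them could be replaced by, say, a GA-style crossover or a simulated annealing step.

import random

def memetic_algorithm(init, vary, local_search, fitness,
                      pop_size=20, generations=50):
    # Generic memetic loop: evolve a population, but let every individual also
    # perform some local search before its fitness is used (the extra
    # "Perform Local Search" step of Fig. 13).
    population = [local_search(init()) for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[:pop_size // 2]               # simple truncation selection
        offspring = [vary(random.choice(parents), random.choice(parents))
                     for _ in range(pop_size)]
        population = [local_search(child) for child in offspring]
    return max(population, key=fitness)

# Toy usage: maximize f(x) = -(x - 3)^2 over the reals.
f = lambda x: -(x - 3.0) ** 2
init = lambda: random.uniform(-10.0, 10.0)
vary = lambda a, b: (a + b) / 2.0 + random.gauss(0.0, 0.5)   # blend plus mutation

def hill_climb(x, steps=20, step=0.1):
    # A very small hill climber standing in for simulated annealing or tabu search.
    for _ in range(steps):
        candidate = x + random.uniform(-step, step)
        if f(candidate) > f(x):
            x = candidate
    return x

print(memetic_algorithm(init, vary, hill_climb, f))   # a value close to 3.0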
VI. SUMMARY
In order to solve difficult or poorly understood problems, it
is often useful to employ a metaheuristic. Metaheuristics
are adept at discovering good solutions without necessarily requiring much information about the problem to
which they are being applied.
There are two broad categories of metaheuristics, often
termed evolutionary algorithms and hill climbers. How-
BIBLIOGRAPHY
Axelrod, R. (1984). The Evolution of Cooperation, Basic Books, New
York.
Banzhaf, W., Nordin, P., Keller, R. E., and Francone, F. D. (1998).
Genetic Programming: An Introduction, Morgan Kaufmann, San
Francisco.
Darwin, C. (1859). The Origin of Species, John Murray, London.
Fogel, L. J., Owens, A. J., and Walsh, M. J. (1966). Artificial Intelligence
through Simulated Evolution, Wiley, New York.
Goldberg, D. (1989). Genetic Algorithms in Search, Optimization, and
Machine Learning, Addison-Wesley, Reading, MA.
Holland, J. H. (1975). Adaptation in Natural and Artificial Systems,
University of Michigan Press, Ann Arbor, MI.
Koza, J. R. (1992). Genetic Programming: On the Programming of
Computers by Means of Natural Selection, MIT Press, Cambridge,
MA.
Ryan, C. (1999). Automatic Re-engineering of Software Using Genetic
Programming, Kluwer, Amsterdam.
Image Processing
Rama Chellappa
Azriel Rosenfeld
University of Maryland
I. Introduction
II. Digitization
III. Representation
IV. Compression
V. Enhancement
VI. Restoration
VII. Reconstruction
VIII. Matching
GLOSSARY
Compression Reduction of the amount of data used
to represent an image, by compact encoding or by
approximation.
Description Information about image parts and their
properties and relations.
Digitization Conversion of an image into a discrete array
of numbers for computer processing.
Enhancement Processing of an image to improve its
appearance.
Matching Comparison of images for purposes of pattern recognition, registration, stereopsis, or motion
analysis.
Recognition Recognition of objects in an image by comparing image descriptions with object models.
Reconstruction Computation of cross sections of an image or volume, given a set of its projections.
Recovery Estimation of the orientation of a surface from
COMPUTERS are used to process images for many purposes. Image processing is the computer manipulation of
images to produce more useful images; subfields of image
processing include image compression or coding, image
enhancement and restoration, and image reconstruction
from projections; examples of applications include compression of DVD movies, restoration of Hubble telescope
images, and medical imaging. The goal of image analysis
is to produce a description of the scene that gave rise to the
image; examples of applications are reading documents
(character recognition), counting blood cells on a microscope slide, detecting tumors in chest X rays, producing
land use maps from satellite images, detecting defects in
printed circuits, and guiding robots in navigating or in
manipulating objects. In addition to processing images
in the visible spectrum, the acquisition, processing, and analysis of infrared, synthetic aperture radar, medical, hyperspectral, and range images have become important over
the last 20 years due to the emergence of new sensors and
applications.
I. INTRODUCTION
This chapter deals with the manipulation and analysis of
images by computer. In image processing, both the input
and the output are images, the output being, for example, an approximated or improved version of the input. In
image analysis (also known by such names as pictorial
pattern recognition, image understanding, and computer
vision), the input is an image and the output is (typically) a
description of the scene that gave rise to the image. Computer graphics, which is not covered in this chapter, is the
inverse of image analysis: The input is a scene description,
and the output is an image of the scene as it would appear
from a given viewpoint.
An image is defined by specifying how its value (brightness, color, etc.) varies from point to point, in other
words, by a function of two variables defined over an
image plane. Before an image can be processed and
analyzed by (digital) computer, it must be converted into
a discrete array of numbers each of which represents the
value at a given point. This process of conversion is called
digitization (Section II).
A digitized image can be viewed as a matrix of gray-level values. To understand and analyze the structure of this matrix, image models and image transforms have been used. Image models attempt to describe the image data quantitatively, while image transforms enable the analysis of the image data in the transform domain for various applications such as compression, restoration, and filtering. Image models and representations are discussed
in Section III.
To represent the input image with sufficient accuracy,
the array of numbers must usually be quite large, for example, about 500 × 500 in the case of a television image. Image compression (or coding) deals with methods
of reducing this large quantity of data without sacrificing
important information about the image (Section IV).
One of the central goals of image processing is to improve the appearance of the image, for example, by increasing contrast, reducing blur, or removing noise. Image
enhancement (Section V) deals with methods of improving the appearance of an image. More specifically, image
restoration (Section VI) is concerned with estimating image degradations and attempting to correct them.
Another important branch of image processing is image
reconstruction from projections (Section VII). Here we are
given a set of images (e.g., X rays) representing projections
of a given volume, and the task is to compute and display
images representing cross sections of that volume.
Comparison or matching of images is an important tool
in both image processing and analysis. Section VIII discusses image matching and registration and depth measurement by comparison of images taken from different
positions (stereo mapping).
Section IX summarizes methods for the analysis of image sequences. Techniques for motion compensation, detection and tracking of moving objects, and recovery of
scene structure from motion using optic flow and discrete
features are discussed.
The brightness of an image at a point depends on many
factors, including the illumination, reflectivity, and surface orientation of the corresponding surface point in the
scene. Section X discusses methods of recovering these
intrinsic scene characteristics from an image by analyzing shading, texture, or shapes in the image.
Image analysis usually begins with feature detection or
segmentation, the extraction of parts of an image, such as
edges, curves, or regions, that are relevant to its description. Techniques for singling out significant parts of an
image are reviewed in Section XI. Methods of compactly
representing image parts for computer manipulation, as
well as methods of decomposing image parts based on
geometric criteria and of computing geometric properties
of image parts, are treated in Section XII.
Section XIII deals with image description, with an emphasis on the problem of recognizing objects in an image.
It reviews properties and relations, relational structures,
models, and knowledge-based image analysis.
A chapter such as this would not be complete without some discussion of architectures designed for efficient
processing of images. The eighties witnessed an explosion
of parallel algorithms and architectures for image processing and analysis; especially noteworthy were hypercube-connected machines. In the early nineties attention was
focused on special processors such as pyramid machines.
Recently, emphasis is being given to embedded processors
and field-programmable gate arrays. Section XIV presents
a summary of these developments.
The treatment in this chapter is concept-oriented; applications are not discussed, and the use of mathematics has
been minimized, although some understanding of Fourier
transforms, stochastic processes, estimation theory, and
linear algebra is occasionally assumed. The Bibliography
II. DIGITIZATION
Digitization is the process of converting an image into a
discrete array of numbers. The array is called a digital
image, its elements are called pixels (short for picture
elements), and their values are called gray levels. (In a
digital color image, each pixel has a set of values representing color components.)
Digitization involves two processes: sampling the image value at a discrete grid of points and quantizing the
value at each of these points to make it one of a discrete set
of gray levels. In this section we briefly discuss sampling
and quantization.
A. Sampling
In general, any process of converting a picture into a discrete set of numbers can be regarded as sampling, but
we assume here that the numbers represent the image values at a grid of points (or, more precisely, average values
over small neighborhoods of these points). Traditionally,
rectangular uniform sampling lattices have been used, but
depending on the shape of the image spectrum, hexagonal,
circular, or nonuniform sampling lattices are sometimes
more efficient.
The grid spacing must be fine enough to capture all the
detail of interest in the image; if it is too coarse, information may be lost or misrepresented. According to the
sampling theorem, if the grid spacing is d, we can exactly reconstruct all periodic (sinusoidal) components of
the image that have period 2d or greater (or, equivalently,
spatial frequency 1/2d or fewer cycles per unit length).
However, if patterns having periods smaller than 2d are
present, the sampled image may appear to contain spurious patterns having longer periods; this phenomenon is
called aliasing. Moire patterns are an everyday example
of aliasing, usually arising when a scene containing short-period patterns is viewed through a grating or mesh (which
acts as a sampling grid).
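The effect is easy to reproduce numerically. In the illustrative Python fragment below (the numbers are our own), a sinusoid of period 0.8 is sampled with grid spacing d = 1; because the period is less than 2d, the samples are exactly those of a much longer-period (period 4) sinusoid, which is the pattern the sampled image appears to contain.

import numpy as np

d = 1.0                         # grid spacing
x = np.arange(0, 40, d)         # sample points

true_period = 0.8               # shorter than 2d, so aliasing must occur
samples = np.sin(2 * np.pi * x / true_period)

# The frequency 1/0.8 = 1.25 cycles per unit folds to |1.25 - 1/d| = 0.25,
# i.e., an apparent period of 4 units.
alias = np.sin(2 * np.pi * x / 4.0)

print(np.allclose(samples, alias))   # True: the two patterns agree at every sample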
Recent developments in the design of digital cameras
enable us to acquire digital images instantly. In addition,
many image processing operations such as contrast enhancement and dynamic range improvement are being
transferred to the acquisition stage.
B. Quantization
Let z be the value of a given image sample, representing
the brightness of the scene at a given point. Since the
values lie in a bounded range, and values that differ by
sufficiently little are indistinguishable to the eye, it suffices
to use a discrete set of values. It is standard practice to use
256 values and represent each value by an 8-bit integer
(0, 1, . . . , 255). Using too few discrete values gives rise
to false contours, which are especially conspicuous in
regions where the gray level varies slowly. The range of
values used is called the grayscale.
Let the discrete values (quantization levels) be z1, . . . , zk. To quantize z, we replace it by the zj that lies closest to it (resolving ties arbitrarily). The absolute difference |z − zj| is called the quantization error at z.
Ordinarily, the z's are taken to be equally spaced over the range of possible brightnesses. However, if the brightnesses in the scene do not occur equally often, we can reduce the average quantization error by spacing the z's unequally. In fact, in heavily populated parts of the brightness range, we should space the z's closely together, since this will result in small quantization errors. As a result, in sparsely populated parts of the range, the z's will have
to be farther apart, resulting in larger quantization errors;
but only a few pixels will have these large errors, whereas
most pixels will have small errors, so that the average error
will be small. This technique is sometimes called tapered
quantization.
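The following illustrative Python fragment (synthetic data and our own helper names) compares equally spaced quantization levels with levels placed at quantiles of the brightness distribution; for a distribution concentrated at the dark end, the tapered levels give a noticeably smaller average quantization error.

import numpy as np

rng = np.random.default_rng(0)
z = np.clip(rng.exponential(scale=30.0, size=100_000), 0, 255)   # mostly dark pixels

def quantize(values, levels):
    # Replace each value by the closest quantization level.
    nearest = np.argmin(np.abs(values[:, None] - levels[None, :]), axis=1)
    return levels[nearest]

k = 8
uniform_levels = np.linspace(0, 255, k)
tapered_levels = np.quantile(z, (np.arange(k) + 0.5) / k)   # crowded where data are dense

for name, levels in (("uniform", uniform_levels), ("tapered", tapered_levels)):
    error = np.mean(np.abs(z - quantize(z, levels)))
    print(name, "average quantization error:", round(float(error), 2))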
Quantization can be optimized for certain distributions of brightness levels, such as Gaussian and double-exponential. Another class of techniques, known as
moment-preserving quantizers, does not make specific assumptions about distributions. Instead, the quantizers are
designed so that low-order moments of the brightness before and after quantization are required to be equal. These
scalar quantizers do not exploit the correlation between adjacent pixels. Vector quantization (VQ), in which a small
array of pixels is represented by one of several standard
patterns, has become a popular method of image compression on its own merits, as well as in combination with
techniques such as subband or wavelet decomposition. A
major difficulty with VQ techniques is their computational
complexity.
III. REPRESENTATION
Two dominant representations for images are two-dimensional (2-D) discrete transforms and various types of
image models. In the former category, linear, orthogonal,
separable transforms have been popular due to their
energy-preserving nature and computational simplicity.
When images are represented using discrete transforms,
the coefficients of the expansion can be used for synthesis,
compression, and classification. The two most often used
discrete transforms are the discrete Fourier transform (due
to its FFT implementation) and discrete cosine transform
(due to its FFT-based implementation and its adoption in
the JPEG image compression standard). If minimizing the
mean squared error between the original image and its expansion using a linear orthogonal transform is the goal, it
can be shown that the Karhunen-Loève transform (KLT)
is the optimal transform, but the KLT is not practical, since
it requires eigencomputations of large matrices. Under a
circulant covariance structure, the DFT is identical to the
KLT.
All the 2-D discrete transforms mentioned above analyze the data at one scale or resolution only. Over the last
20 years, new transforms that split the image into multiple
frequency bands have given rise to schemes that analyze
images at multiple scales. These transforms, known as
wavelet transforms, have long been known to mathematicians. The implementation of wavelet transforms using
filter banks has enabled the development of many new
transforms.
Since the early eighties, significant progress has been
made in developing stochastic models for images. The
most important reason is the abstraction that such models
provide of the large amounts of data contained in the images. Using analytical representations for images, one can
develop systematic algorithms for accomplishing a particular image-related task. As an example, model-based
optimal estimation-theoretic principles can be applied to
find edges in textured images or remove blur and noise
from degraded images. Another advantage of using image models is that one can develop techniques to validate a given model for a given image. On the basis of
such a validation, the performance of algorithms can be
compared.
Most statistical models for image processing and analysis treat images as 2-D data, i.e., no attempt is made to
relate the 3-D world to its 2-D projection on the image.
There exists a class of models, known as image-formation
models, which explicitly relate the 3-D information to the
2-D brightness array through a nonlinear reflectance map
by making appropriate assumptions about the surface being imaged. Such models have been the basis for computer
graphics applications, can be customized for particular
sensors, and have been used for inferring shape from shading and other related applications. More recently, accurate
prediction of object and clutter models (trees, foliage, urban scenes) has been recognized as a key component in
model-based object recognition as well as change detection. Another class of models known as fractals, originally
proposed by Mandelbrot, is useful for representing images
of natural scenes such as mountains and terrain. Fractal
models have been successfully applied in the areas of image synthesis, compression, and analysis.
One of the earliest model-based approaches was the
multivariate Gaussian model used in restoration. Regression models in the form of facets have been used for deriving hypothesis tests for edge detection as well as deriving
the probability of detection. Deterministic 2-D sinusoidal
models and polynomial models for object recognition have
also been effective.
Given that the pixels in a local neighborhood are correlated, researchers have proposed 2-D extensions of time-series models for images in particular, and for spatial data
in general, since the early 1950s. Generalizations have included 2-D causal, nonsymmetric half-plane (NSHP), and
noncausal models.
One of the desirable features in modeling is the ability
to model nonstationary images. A two-stage approach,
regarding the given image as composed of stationary
patches, has attempted to deal with nonstationarities in
the image. A significant contribution to modeling nonstationary images used the concept of a dual lattice process,
in which the intensity array is modeled as a multilevel
Markov random field (MRF) and the discontinuities are
modeled as line processes at the dual lattice sites interposed between the regular lattice sites. This work has led
to several novel image processing and analysis algorithms.
Over the last 5 years, image modeling has taken on
a more physics-based flavor. This has become necessary because of sensors such as infrared, laser radar,
SAR, and foliage-penetrating SAR becoming more common in applications such as target recognition and image
exploitation. Signatures predicted using electromagnetic
scattering theory are used for model-based recognition and
elimination of changes in the image due to factors such as
weather.
IV. COMPRESSION
The aim of image compression (or image coding) is to
reduce the amount of information needed to specify an
image, or an acceptable approximation to an image. Compression makes it possible to store images using less memory or transmit them in less time (or at a lower bandwidth).
A. Exact Encoding
Because images are not totally random, it is often possible to encode them so that the coded image is more
compact than the original, while still permitting exact
reconstruction of the original. Such encoding methods are
generally known as lossless coding.
As a simple illustration of this idea, suppose that the
image has 256 gray levels but they do not occur equally
often. Rather than representing each gray level by an 8-bit
number, we can use short codes for the frequently occurring levels and long codes for the rare levels. Evidently
this implies that the average code length is relatively short,
since the frequent levels outnumber the rare ones. If the
frequent levels are sufficiently frequent, this can result
in an average code length of less than eight bits. A general method of constructing codes of this type is called
Shannon-Fano-Huffman coding.
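The idea can be made concrete with a small Huffman coder; the sketch below is illustrative only, and the four gray levels and their frequencies are invented for the example.

import heapq

def huffman_code(freqs):
    # Build a prefix code for a {gray level: frequency} table by repeatedly
    # merging the two least frequent entries.
    heap = [[f, i, {level: ''}] for i, (level, f) in enumerate(freqs.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        f1, _, codes1 = heapq.heappop(heap)
        f2, _, codes2 = heapq.heappop(heap)
        merged = {level: '0' + code for level, code in codes1.items()}
        merged.update({level: '1' + code for level, code in codes2.items()})
        heapq.heappush(heap, [f1 + f2, counter, merged])
        counter += 1
    return heap[0][2]

freqs = {0: 60, 1: 25, 2: 10, 3: 5}        # hypothetical frequencies per 100 pixels
code = huffman_code(freqs)
average = sum(freqs[s] * len(code[s]) for s in freqs) / sum(freqs.values())
print(code)        # code lengths of 1, 2, 3, and 3 bits for levels 0, 1, 2, 3
print(average)     # 1.55 bits per pixel, versus 2 bits for a fixed-length code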
As another example, suppose that the image has only a
few gray levels, say two (i.e., it is a black-and-white, or
binary, image: two levels might be sufficient for digitizing
documents, e.g., if we can distinguish ink from paper reliably enough). Suppose, further, that the patterns of black
and white in the image are relatively simple, for example, that it consists of black blobs on a white background.
Let us divide the image into small blocks, say 3 × 3. Theoretically, there are 2⁹ = 512 such blocks (each of the nine
pixels in a block can have gray level either 0 or 1), but not
all of these combinations occur equally often. For example, combinations like
1 1 1
1 1 1
1 1 1
or
0 1 1
0 1 1
0 0 1
or
1 1 0
0 0 0
0 0 0
or
0 0 0
0 0 0
0 0 0
techniques, as described in Section IV.A, can be used
to greater advantage if we apply them to the
differences rather than to the original gray levels.
2. When large differences do occur, the gray level is
fluctuating rapidly. Thus large differences can be
quantized coarsely, so that fewer quantization levels
are required to cover the range of differences.
The main disadvantage of this difference coding approach
is the relatively low degree of compression that it typically
achieves.
More generally, suppose that we apply an invertible
transformation to an image, for example, we take its
discrete Fourier transform. We can then encode or approximate the transformed image, and when we want to
reconstruct the original image, we apply the inverse transformation to the approximated transform. Evidently, the
usefulness of this transform coding approach depends on
the transformed image being highly compressible.
If we use the Fourier transform, it turns out that for most
classes of images, the magnitudes of the low-frequency
Fourier coefficients are very high, whereas those of the
high-frequency coefficients are low. Thus we can use a
different quantization scheme for each coefficient: fine
quantization at low frequencies, coarse quantization at
high frequencies. (For sufficiently high frequencies, the
magnitudes are so small that they can be discarded.) In
fact, the quantization at high frequencies can be very
coarse because they represent parts of the image where
the gray level is fluctuating rapidly. When the image is
reconstructed using the inverse transform, errors in the
Fourier coefficients are distributed over the entire image
and so tend to be less conspicuous (unless they are very
large, in which case they show up as periodic patterns).
An example of transform coding using the discrete cosine
transform is shown in Fig. 1.
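The essence of transform coding can be sketched as follows (illustration only; a practical coder such as JPEG also applies a perceptually designed quantization table and entropy-codes the result). A smooth block is transformed with the discrete cosine transform, only the largest coefficients are kept, and the block is then reconstructed from them with small error.

import numpy as np
from scipy.fft import dctn, idctn

rng = np.random.default_rng(1)
x, y = np.meshgrid(np.arange(8), np.arange(8))
block = 100 + 5 * x + 3 * y + rng.normal(0, 1, (8, 8))   # a smooth 8 x 8 block

coefficients = dctn(block, norm='ortho')

# Keep only the 10 largest-magnitude coefficients (mostly low frequencies).
threshold = np.sort(np.abs(coefficients).ravel())[-10]
kept = np.where(np.abs(coefficients) >= threshold, coefficients, 0.0)

reconstruction = idctn(kept, norm='ortho')
print("coefficients kept:", int(np.count_nonzero(kept)), "of 64")
print("rms error:", float(np.sqrt(np.mean((block - reconstruction) ** 2))))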
Difference and transform coding techniques can be
combined, leading to hybrid methods. For example, one
can use a 1-D transform within each row of the image, as
well as differencing to predict the transform coefficients
for each successive row by extrapolating from the coefficients for the preceding row(s). Difference coding is also
very useful in the compression of time sequences of images (e.g., sequences of television frames), since it can
be used to predict each successive image by extrapolating
from the preceding image(s).
D. Recent Trends
The difference and transform compression schemes described above attempt to decorrelate the image data at a
single resolution only. With the advent of multiresolution
representations (pyramids, wavelet transforms, subband
FIGURE 1 Image compression. (a) Original image; (b) compressed image, using the discrete cosine transform;
(c) compressed image, using a subband coding scheme.
V. ENHANCEMENT
A. Grayscale Modification
visible, since spreading apart indistinguishable (e.g., adjacent) gray levels makes them distinguishable.
Even if an image occupies the entire grayscale, one can
spread apart the gray levels in one part of the grayscale at
the cost of packing them closer together (i.e., combining
adjacent levels) in other parts. This is advantageous if the
information of interest is represented primarily by gray
levels in the stretched range.
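A minimal Python sketch of such grayscale stretching (the image values and the chosen range are invented for the example): gray levels in the range of interest are spread over the whole grayscale, while levels outside it are clipped together at the ends.

import numpy as np

def stretch(image, low, high):
    # Map gray levels in [low, high] linearly onto [0, 255]; levels below low
    # or above high are packed into the two ends of the grayscale.
    out = (image.astype(float) - low) / (high - low) * 255.0
    return np.clip(out, 0, 255).astype(np.uint8)

rng = np.random.default_rng(2)
dim_image = rng.integers(90, 140, size=(4, 4), dtype=np.uint8)   # a low-contrast image
print(stretch(dim_image, 90, 140))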
If some gray levels occur more frequently than others
(e.g., the gray levels at the ends of a grayscale are usually
relatively uncommon), one can improve the overall contrast of the image by spreading apart the frequently occurring gray levels while packing the rarer ones more closely
together; note that this stretches the contrast for most of
the image. (Compare the concept of tapered quantization
in Section II.B.) The same effect is achieved by requantizing the image so that each gray level occurs approximately
equally often; this breaks up each frequently occurring gray level into several levels, while compressing sets of consecutive rarely occurring levels into a single level. Given an
B. Blur Reduction
When an image is blurred, the ideal gray level of each pixel
is replaced by a weighted average of the gray levels in a
neighborhood of that pixel. The effects of blurring can be
reversed, to a first approximation, by subtracting from the
gray level of each pixel in the blurred image a weighted
average of the gray levels of the neighboring pixels. This
method of sharpening or deblurring an image is sometimes referred to as Laplacian processing, because the
FIGURE 2 Moving object detection in a video frame. (a) Original frame; (b) detected object.
C. Shading Reduction
The illumination across a scene usually varies slowly,
while the reflectivity may vary rapidly from point to point.
Thus illumination variations, or shading, give rise primarily to low-frequency Fourier components in an image, while reflectivity variations also give rise to high-frequency components. Thus one might attempt to reduce
shading effects in an image by high-emphasis frequency
filtering, weakening the low-frequency components in the
image's Fourier transform relative to the high-frequency
components. Unfortunately, this simple approach does not
work, because the illumination and reflectivity information is combined multiplicatively rather than additively;
the brightness of the scene (and hence of the image) at
a point is the product of illumination and reflectivity, not
their sum. Better results can be obtained using a technique
called homomorphic filtering. If we take the logarithm of
the gray level at each point of the image, the result is
the sum of the logarithms of the illumination and reflectivity; in other words, the multiplicative combination has
been transformed into an additive one. We can thus take
the Fourier transform of the log-scaled image and apply
high-emphasis filtering to it. Taking the inverse Fourier
transform and the antilog at each point then gives us an
enhanced image in which the effects of shading have been
reduced.
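A rough numerical sketch of homomorphic filtering is shown below (the toy image and the filter shape are our own; real implementations choose the transfer function more carefully). The log transform turns the multiplicative shading into an additive term, which the high-emphasis filter then attenuates.

import numpy as np

def homomorphic_filter(image, cutoff=0.1, low_gain=0.5, high_gain=1.5):
    # Take logarithms (multiplicative shading becomes additive), apply a
    # high-emphasis filter in the Fourier domain, then exponentiate.
    log_image = np.log1p(image.astype(float))
    spectrum = np.fft.fft2(log_image)

    u = np.fft.fftfreq(image.shape[0])[:, None]
    v = np.fft.fftfreq(image.shape[1])[None, :]
    radius = np.sqrt(u ** 2 + v ** 2)
    gain = low_gain + (high_gain - low_gain) * (1 - np.exp(-(radius / cutoff) ** 2))

    filtered = np.real(np.fft.ifft2(gain * spectrum))
    return np.expm1(filtered)

# A flat-reflectance scene under illumination that varies slowly across the image.
x = np.linspace(0, 1, 64)
illumination = 0.2 + 0.8 * x[None, :] * np.ones((64, 1))
reflectance = np.full((64, 64), 0.5)
image = illumination * reflectance

result = homomorphic_filter(image)
print(result.std() < image.std())    # True: the shading variation has been reduced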
D. Noise Cleaning
Noise that is distinguishable from the image detail is relatively easy to remove from an image. For example, if the
image is composed of large objects and the noise consists
of high-contrast specks (salt and pepper), we can detect
the specks as pixels that are very different in gray level
from (nearly) all of their neighbors and remove them by
replacing each such pixel by the average of its neighbors.
As another example, if the noise is a periodic pattern,
in the Fourier transform of the image it gives rise to a
small set of isolated high values (i.e., specks); these can
be detected and removed as just described, and the inverse
Fourier transform can be applied to reconstruct an image
from which the periodic pattern has been deleted. (This
process is sometimes called notch filtering.)
Image noise can also be reduced by averaging operations. If we have several copies of an image that are
identical except for the noise (e.g., several photographs or
television frames of the same scene), averaging the copies
reduces the amplitude of the noise while preserving the
image detail. In a single image, local averaging (of each
pixel with its neighbors) will reduce noise in flat regions
of the image, but it will also blur the edges or boundaries
of regions. To avoid blurring, one can first attempt to detect edges and then average each pixel that lies near an
edge only with those of its neighbors that lie on the same
side of the edge. One need not actually decide whether
an edge is present but can simply average each pixel only
with those of its neighbors whose gray levels are closest
to its own, since these neighbors are likely to lie on the
same side of the edge (if any) as the pixel. [Better results
are obtained by selecting, from each symmetric pair of
neighbors, the one whose gray level is closer to that of
the pixel. If the image's histogram has distinctive peaks
(see Section XI.A), averaging each pixel with those of its
neighbors whose gray levels belong to the same peak also
has a strong smoothing effect.] Another approach is to
examine a set of wedge-shaped neighborhoods extending
out from the pixel in different directions and to use the
average of that neighborhood whose gray levels are least
variable, since such a neighborhood is likely to lie on one
side of an edge. A more general class of approaches uses
local surface fitting (to the gray levels of the neighbors)
rather than local averaging; this too requires modifications
to avoid blurring edges.
Another class of noise-cleaning methods is based on
rank ordering, rather than averaging, the gray levels of the
neighbors of each pixel. The following are two examples
of this approach.
1. Min-max filtering. Suppose that the noise consists of small specks that are lighter than their surroundings. If we replace each pixel's gray level by the minimum of its neighbors' gray levels, these specks disappear, but light objects also shrink in size. We now replace each pixel by the maximum of its neighbors; this re-expands the light objects, but the specks do not reappear. Specks that are darker than their surroundings can be removed by an analogous process of taking a local maximum followed by a local minimum.
2. Median filtering. The median gray level in a
neighborhood is the level such that half the pixels in
the neighborhood are lighter than it and half are
darker. In a flat region of an image, the median is
usually close to the mean; thus replacing each pixel
by the median of its neighbors has much the same
effect as local averaging. For a pixel near a (relatively
straight) edge, on the other hand, the median of the
neighbors will usually be one of the neighbors on the
same side of the edge as the pixel, since these
neighbors are in the majority; hence replacing the
pixel by the median of the neighbors will not blur the
edge. Note that both min-max and median filtering destroy thin features such as lines, curves, or sharp corners; if they are used on an image that contains such features, one should first attempt to detect them so they can be preserved. (A short code sketch of both filters follows this list.)
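Both filters are easy to try with standard rank-order tools; the fragment below is an illustration using scipy.ndimage (the test image is artificial, and border handling and feature preservation are ignored).

import numpy as np
from scipy.ndimage import median_filter, minimum_filter, maximum_filter

image = np.full((9, 9), 100, dtype=float)
image[2:7, 2:7] = 180                          # a light object
for r, c in [(0, 4), (4, 8), (5, 1), (8, 6)]:  # a few isolated light specks ("salt" noise)
    image[r, c] = 255

# Min-max filtering: a local minimum shrinks the specks away,
# then a local maximum re-expands the light object.
minmax = maximum_filter(minimum_filter(image, size=3), size=3)

# Median filtering: replace each pixel by the median of its 3 x 3 neighborhood.
median = median_filter(image, size=3)

print(np.count_nonzero(image == 255),   # 4 specks before
      np.count_nonzero(minmax == 255),  # 0 after min-max filtering
      np.count_nonzero(median == 255))  # 0 after median filtering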
Median filters belong to a class of filters known as order-statistic or rank-order-based filters. These filters can effectively handle contamination due to heavy-tailed, non-Gaussian noise, while at the same time not blurring edges
or other sharp features, as linear filters almost always do.
An example of iterated median filtering is shown in Fig. 3.
VI. RESTORATION
The goal of image restoration is to undo the effects of given
or estimated image degradations. In this section we describe some basic methods of restoration, including photometric correction, geometric correction, deconvolution,
and estimation (of the image gray levels in the presence
of noise).
A. Photometric Correction
Ideally, the gray levels in a digital image should represent
the brightnesses at the corresponding points of the scene in
a consistent way. In practice, however, the mapping from
brightness to gray level may vary from point to point, for
example, because the response of the sensor is not uniform
or because the sensor collects more light from scene points
B. Geometric Correction
The image obtained by a sensor may be geometrically distorted, by optical aberrations, for example. We can estimate the distortion by using images of known test objects
such as regular grids. We can then compute a geometric transformation that will correct the distortion in any
image.
A geometric transformation is defined by a pair of functions x′ = φ(x, y), y′ = ψ(x, y) that map the old coordinates (x, y) into new ones (x′, y′). When we apply such a transformation to a digital image, the input points (x, y) are regularly spaced sample points, say with integer coordinates, but the output points (x′, y′) can be arbitrary points of the plane, depending on the
nature of the transformation. To obtain a digital image
as output, we can map the output points into the nearest
integer-coordinate points. Unfortunately, this mapping
is not one-to-one; some integer-coordinate points in the
output image may have no input points mapped into
them, whereas others may have more than one.
To circumvent this problem, we use the inverse transformation x = Φ(x′, y′), y = Ψ(x′, y′) to map each integer-coordinate point of the output image back into the input
image plane. This point is then assigned a gray level derived from the levels of the nearby input image points, for
example, the gray level of the nearest point or a weighted
average of the gray levels of the surrounding points.
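The inverse-mapping procedure can be sketched as follows (illustrative Python only; the transformation is an arbitrary small rotation about the image center, and the gray level of the nearest input pixel is used rather than a weighted average).

import numpy as np

def geometric_correct(image, inverse_map):
    # For every integer output coordinate, apply the inverse transformation and
    # take the gray level of the nearest input pixel (if it falls inside the image).
    rows, cols = image.shape
    out = np.zeros_like(image)
    for yp in range(rows):
        for xp in range(cols):
            x, y = inverse_map(xp, yp)
            xi, yi = int(round(x)), int(round(y))
            if 0 <= yi < rows and 0 <= xi < cols:
                out[yp, xp] = image[yi, xi]
    return out

theta = np.deg2rad(10.0)      # pretend the sensor introduced a 10-degree rotation
center = 32.0

def inverse_map(xp, yp):
    # Rotate each output coordinate back into the distorted input image.
    dx, dy = xp - center, yp - center
    return (center + np.cos(theta) * dx + np.sin(theta) * dy,
            center - np.sin(theta) * dx + np.cos(theta) * dy)

image = np.zeros((64, 64), dtype=np.uint8)
image[24:40, 24:40] = 200                      # a bright square in the distorted image
corrected = geometric_correct(image, inverse_map)
print(int(corrected.max()))                    # 200: the square appears in the output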
C. Deconvolution
Suppose that an image has been blurred by a known
process of local weighted averaging. Mathematically,
such a blurring process is described by the convolution
(g = h ∗ f) of the image f with the pattern of weights h.
(The value of h ∗ f for a given shift of h relative to f is
obtained by pointwise multiplying them and summing the
results.)
By the convolution theorem for Fourier transforms, we have G = HF, where F, G, and H are the Fourier transforms of f, g, and h, respectively.
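Thus, where H is nonzero, the transform of the original image can be recovered (approximately) by division. The fragment below is an illustration with synthetic data; the small constant added to the denominator is our own guard against near-zero values of H and stands in for more careful treatments such as Wiener filtering.

import numpy as np

rng = np.random.default_rng(4)
f = rng.random((32, 32))                 # the "ideal" image
h = np.zeros((32, 32))
h[:3, :3] = 1.0 / 9.0                    # a 3 x 3 local-averaging blur pattern

# Blur by (circular) convolution, carried out as a product of Fourier transforms.
F, H = np.fft.fft2(f), np.fft.fft2(h)
g = np.real(np.fft.ifft2(F * H))         # the blurred image, g = h * f

# Deconvolution: divide in the frequency domain, with a tiny constant guarding
# against division by near-zero values of H.
G = np.fft.fft2(g)
f_hat = np.real(np.fft.ifft2(G * np.conj(H) / (np.abs(H) ** 2 + 1e-9)))

print(float(np.max(np.abs(f - f_hat))))  # very small: the blur has been undone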
estimate at each pixel as a linear combination of the estimates at preceding, nearby pixels. The coefficients of the
estimate that minimizes the expected squared error can be
computed from the autocorrelation of the (ideal) image.
Recursive Kalman filters for image restoration are computationally intensive. Approximations such as reduced-update Kalman filters give very good results at a much
lower computational complexity. The estimation-theoretic
formulation of the image restoration problem using the
Kalman filter has enabled investigation of more general
cases, including blind restoration (where the unknown blur
function is modeled and estimated along with the original image) and nonstationary image restoration (piecewise
stationary regions are restored, with the filters appropriate
for the regions being selected using a Markov chain).
In addition to recursive filters, other model-based
estimation-theoretic approaches have been developed. For
example, in the Wiener filter described above, one can use
random field models (see Section III) to estimate the power
spectra needed. Alternatively, one can use MRF models to
characterize the degraded images and develop deterministic or stochastic estimation techniques that maximize the
posterior probability density function.
A seminal approach that models the original image
using a composite model, where stationary regions are
represented using MRF models, and the discontinuities
that separate the stationary regions are represented using
line processes, has yielded a new unified framework for
handling a wide variety of problems in image estimation,
restoration, surface reconstruction, and texture segmentation. The composite model, when used in conjunction
with the MAP criterion, leads to nonconvex optimization
problems. A class of stochastic search methods known
as simulated annealing and its variants has enabled the
solution of such optimization problems. Although these
methods are computationally intensive, parallel hardware
implementations of the annealing algorithms have taken
the bite out of their computational complexity. An image
restoration example is shown in Fig. 4.
VII. RECONSTRUCTION
A projection of an image is obtained by summing its gray
levels along a family of parallel lines. In this section we
show how an image can be (approximately) reconstructed
from a sufficiently large set of its projections. This process of reconstruction from projections has important applications in reconstructing images of cross sections of a
volume, given a set of X-ray images of the volume taken
from different directions. In an X-ray image of an object,
the ray striking the film at a given point has been attenuated by passing through the object; thus its strength is
FIGURE 4 Image restoration. (a) Original image; (b) blurred image; (c) restored image. [These images were provided
by Prof. A. Katsaggelos of Northwestern University.]
FIGURE 5 Reconstruction of a cross section of the human body from a set of X rays.
high-frequency components of f will not be reconstructed
accurately using this method, since the cross-section data
far from the origin are sparse.
Another approach to reconstruction from projections is
based on back-projection. The value v of a projection at
a point is the sum of the values of the original image along
a given line; to back-project, we divide v into many equal
parts and distribute them along that line. If we do this for
every projection, we obtain a highly blurred version of
the original image. (To see this, suppose that the image
contains a bright point P on a dark background; then P
will give rise to a high value in each projection. When we
back-project, these values will be spread along a set of
lines that all intersect at P, giving rise to a high value at
P and lower values along the lines radiating from it.) To
obtain a more accurate reconstruction of the image, we
combine back-projection with deblurring; we first filter
the projections to precompensate for the blurring and then
back-project them. This method of reconstruction is the
one most commonly used in practice.
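The following Python fragment sketches plain (unfiltered) back-projection on a synthetic image containing a single bright point; it is an illustration only, using scipy.ndimage.rotate to form the projections, and it produces the expected blurred peak at the location of the point.

import numpy as np
from scipy.ndimage import rotate

image = np.zeros((64, 64))
image[30, 34] = 1.0                        # a single bright point

angles = np.arange(0, 180, 10)             # 18 projection directions

# Each projection sums the gray levels along a family of parallel lines.
projections = [rotate(image, a, reshape=False, order=1).sum(axis=0) for a in angles]

# Back-projection: divide each projection value equally along its line and accumulate.
reconstruction = np.zeros_like(image)
for a, p in zip(angles, projections):
    smeared = np.tile(p / image.shape[0], (image.shape[0], 1))
    reconstruction += rotate(smeared, -a, reshape=False, order=1)

peak = np.unravel_index(np.argmax(reconstruction), reconstruction.shape)
print(peak)    # close to (30, 34): a blurred peak at the original point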
Each projection of an image f gives us a set of linear
equations in the gray levels of f , since the value of a projection at a point is the sum of the gray levels along a line.
Thus if we have a large enough set of projections of f , we
can, in principle, solve a large set of linear equations to
determine the original gray levels of f . Many variations of
this algebraic approach to reconstruction have been formulated. (In principle, algebraic techniques can also be
used in image deblurring; the gray level of a pixel in the
blurred image is a weighted average of neighboring gray
levels in the ideal image, so that the ideal gray levels can
be found by solving a set of linear equations.)
Over the last 15 years, the emphasis in image reconstruction has been on introducing probabilistic or statistical principles in modeling noise and imaging mechanisms
and deriving mathematically sound algorithms. As multimodal images (X ray, CT, MRI) are becoming increasingly available, registration of these images to each other
and positioning of these images to an atlas have also become critical technologies. Visualization of reconstructed
images and objects is also gaining attention.
VIII. MATCHING
There are many situations in which we want to match
or compare two images with one another. The following
are some common examples.
1. We can detect occurrences of a given pattern in an
image by comparing the image with a template of
the pattern. This concept has applications in pattern
recognition, where we can use template matching to
gion may have different values depending on the sensor,
pixel similarity and pixel dissimilarity are usually preserved. In other words, a region that appears homogeneous
to one sensor is likely to appear homogeneous to another,
local textural variations apart. Regions that can be clearly
distinguished from one another in one image are likely
to be distinguishable from one another in other images,
irrespective of the sensor used. Although this is not true
in all cases, it is generally valid for most types of sensors
and scenes. Man-made objects such as buildings and roads
in aerial imagery, and implants, prostheses, and metallic
probes in medical imagery, also give rise to features that
are likely to be preserved in multisensor images. Feature-based methods that exploit the information contained in
region boundaries and in man-made structures are therefore useful for multisensor registration.
Feature-based methods traditionally rely on establishing feature correspondences between the two images.
Such correspondence-based methods first employ feature
matching techniques to determine corresponding feature
pairs in the two images and then compute the geometric transformation relating them, typically using a leastsquares approach. Their primary advantage is that the
transformation parameters can be computed in a single
step and are accurate if the feature matching is reliable.
Their drawback is that they require feature matching,
which is difficult to accomplish in a multisensor context
and is computationally expensive, unless the two images
are already approximately registered or the number of features is small.
Some correspondence-less registration methods based
on moments of image features have been proposed, but
these techniques, although mathematically elegant, work
only if the two images contain exactly the same set of
features. This requirement is rarely met in real images.
Another proposed class of methods is based on the generalized Hough transform (GHT). These methods map the
feature space into a parameter space, by allowing each
feature pair to vote for a subspace of the parameter space.
Clusters of votes in the parameter space are then used to
estimate parameter values. These methods, although far
more robust and practical than moment-based methods,
have some limitations. Methods based on the GHT tend to
produce large numbers of false positives. They also tend
to be computationally expensive, since the dimensionality
of the problem is equal to the number of transformation
parameters.
Recently, methods similar in spirit to GHT-style methods, but employing a different search strategy to eliminate the problems associated with them, have been
proposed. These methods first decompose the original
transformation into a sequence of elementary stages. At
each stage, the value of one transformation parameter
FIGURE 6 Image registration. (a, b) The two images to be registered; (c) the registration result (checkerboard
squares show alternating blocks of the two registered images.)
A strong theoretical basis for this approach evolved
around the midseventies. Since then, advances have been
made in interpolating the depth estimates obtained at positions of matched image features using surface interpolation techniques and hierarchical feature-based matching
schemes, and dense estimates have been obtained using
gray-level matching guided by simulated annealing. Although these approaches have contributed to a greater understanding of the problem of depth recovery using two
cameras, much more tangible benefits have been reaped
using a larger number of cameras. By arranging them in
an array pattern, simple sum of squared difference-based
schemes are able to produce dense depth estimates in
real time. Using large numbers of cameras (in excess of
50), new applications in virtual reality, 3-D modeling, and
computer-assisted surgery have become feasible.
Consistent with developments in multiscale analysis,
stereo mapping has benefited from multiscale feature-based matching techniques. Also, simulated annealing and
neural networks have been used for depth estimation using
two or more images.
Another approach to determining the spatial positions
of the points in a scene is to use patterned illumination.
For example, suppose that we illuminate the scene with a plane of light Π, so that only those scene points that lie in Π are illuminated, and the rest are dark. In an image of the scene, any visible scene point P (giving rise to image point P1) must lie on the line P1L1; since P must also lie in Π, it must be at the intersection of P1L1 and Π, so that its position in space is completely determined. We can obtain complete 3-D information about the scene by moving Π through a set of positions so as to illuminate every visible scene point, or we can use coded illumination
in which the rays in each plane are distinctive (e.g., by their
colors). A variety of range sensing techniques based on
patterned illumination has been developed. Still another
approach to range sensing is to illuminate the scene, one
point at a time, with a pulse of light and to measure the
time interval (e.g., the phase shift) between the transmitted
and the reflected pulses, thus obtaining the range to that
scene point directly, as in radar.
components in the gradient direction agree with the observed components perpendicular to edges.
When a sequence of frames is taken by a moving
sensor, there will be changes nearly everywhere, with the
magnitude of the change at a given image point depending
on the velocity of the sensor and on its distance from the
corresponding scene point. The array of motion vectors at
all points of the image is known as the optical flow field.
An example of such a field is shown in Fig. 8. Different
types of sensor motion (ignoring, for the moment, the
motions of objects in the scene) give rise to different types
of flow fields. Translational sensor motion perpendicular
to the optical axis of the sensor simply causes each point
of the image to shift, in the opposite direction, by an
amount proportional to its distance from the sensor, so
that the resulting image motion vectors are all parallel.
Translation in other directions, on the other hand, causes
the image to expand or shrink, depending on whether the
sensor is approaching or receding from the scene; here
the motion vectors all pass through a common point,
the focus of expansion (or contraction). Thus if we
know the translational sensor motion, we can compute
the relative distance from the sensor to each point of the
scene. (The distances are only relative because changes
in absolute distance are indistinguishable from changes
in the speed of the sensor.) The effects of rotational
sensor motion are more complex, but they can be treated
independently of the translation effects.
Motion of a rigid body relative to the sensor gives rise to
a flow field that depends on the motion and on the shape of
the body. In principle, given enough (accurate) measurements of the flow in a neighborhood, we can determine
both the local shape of the body and its motion.
Parallel to, and often independent of, the correspondence-based approach, optical flow-based structure and
motion estimation algorithms have flourished for more
than two decades. Although robust dense optical flow estimates are still elusive, the field has matured to the extent
that systematic characterization and evaluation of optical
flow estimates are now possible. Methods that use robust
statistics, generalized motion models, and filters have all
shown great promise. Significant work that uses directly
observable flow (normal flow) provides additional insight into the limitations of traditional approaches. An
example of depth estimation using optical flow is shown
in Fig. 8.
Segmentation of independently moving objects and
dense scene structure estimation from computed flow
have become mature research areas. New developments
such as fast computation of depth from optical flow using fast Fourier transforms have opened up the possibility of real-time 3-D modeling. Another interesting
accomplishment has been the development of algorithms
for sensor motion stabilization and panoramic view
generation.
Detection of moving objects, structure and motion
estimation, and tracking of 2-D and 3-D object motion
using contours and other features have important applications in surveillance, traffic monitoring, and automatic
target recognition. Using methods ranging from recursive Kalman filters to the recently popular sequential Monte Carlo methods (such as the CONDENSATION algorithm), extensive research has been done in this area.
Earlier attempts were concerned only with generic tracking, but more recently, attributed tracking, where one incorporates the color, identity, or shape of the object as part
of the tracking algorithm, is gaining importance. Also, due
to the impressive computing power that is now available,
real-time tracking algorithms have been demonstrated.
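A minimal sketch of the recursive Kalman filtering mentioned above, assuming a constant-velocity motion model for a single 2-D track; the noise covariances and measurements are illustrative values only.

import numpy as np

dt = 1.0
F = np.array([[1, 0, dt, 0], [0, 1, 0, dt], [0, 0, 1, 0], [0, 0, 0, 1]], float)
H = np.array([[1, 0, 0, 0], [0, 1, 0, 0]], float)   # we observe position only
Q, R = 0.01 * np.eye(4), 1.0 * np.eye(2)            # assumed noise covariances

x, P = np.zeros(4), np.eye(4)                       # state (x, y, vx, vy)
for z in [np.array([1.0, 1.0]), np.array([2.1, 1.9]), np.array([3.0, 3.1])]:
    x, P = F @ x, F @ P @ F.T + Q                   # predict
    S = H @ P @ H.T + R                             # update with measurement z
    K = P @ H.T @ np.linalg.inv(S)
    x = x + K @ (z - H @ x)
    P = (np.eye(4) - K @ H) @ P
print("estimated state:", x)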
X. RECOVERY
The gray level of an image at a given point P1 is proportional to the brightness of the corresponding scene point
P as seen by the sensor; P is the (usually unique) point on
the surface of an object in the scene that lies along the line
P1 L 1 (see Section VIII.C). The brightness of P depends
in turn on several factors: the intensity of the illumination
at P, the reflective properties of the surface S on which
P lies, and the spatial orientation of S at P. Typically,
if a light ray is incident on S at P from direction i, then the fraction r of the ray that emerges from S in a given direction e is a function of the angles θi and θe that i and e, respectively, make with the normal n to S at P. For example, in perfect specular reflection we have r = 1 if i and e are both coplanar with n and θi = θe, and r = 0 otherwise. In perfectly diffuse or Lambertian reflection, on the other hand, r depends only on θi, and not on e; in fact, we have r = ρ cos θi, where ρ is a constant between 0 and 1.
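A minimal sketch of this Lambertian model, assuming a per-pixel map of unit surface normals and a unit light direction (both hypothetical inputs):

import numpy as np

def lambertian_image(normals, light_dir, rho=0.8):
    """normals: H x W x 3 unit normals; light_dir: unit 3-vector; r = rho*cos(theta_i)."""
    cos_theta_i = normals @ np.asarray(light_dir, float)
    return rho * np.clip(cos_theta_i, 0.0, 1.0)   # no light from behind the surface

normals = np.zeros((4, 4, 3)); normals[..., 2] = 1.0   # a flat surface facing the viewer
print(lambertian_image(normals, [0.0, 0.6, 0.8]))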
If we could separate the effects of illumination, reflectivity, and surface orientation, we could derive 3-D information about the visible surfaces in the scene; in fact, the
surface orientation tells us the rate at which the distance
to the surface is changing, so that we can obtain distance
information (up to a constant of integration) by integrating
the orientation. The process of inferring scene illumination, reflectivity, and surface orientation from an image
is called recovery (more fully, recovery of intrinsic scene
characteristics from an image). Ideally, recovery gives us
a set of digital image arrays in which the value of a pixel
represents the value of one of these factors at the corresponding scene point; these arrays are sometimes called
intrinsic images.
In this section we briefly describe several methods of inferring intrinsic scene characteristics from a single image.
If an observed shape could be the projected image of a more
symmetric or more compact slanted shape (e.g., an ellipse
could be the projection of a slanted circle), one might assume that this is actually the case; in fact, human observers
frequently make such assumptions. They also tend to assume that a continuous curve in an image arises from a
continuous curve in space, parallel curves arise from parallel curves, straight lines arise from straight lines, and so
on. Similarly, if two shapes in the image could arise from
two congruent objects at different ranges or in different
orientations, one tends to conclude that this is true.
A useful assumption about a curve in an image is that it
arises from a space curve that is as planar as possible and
as uniformly curved as possible. One might also assume
that the surface bounded by this space curve has the least
possible surface curvature (it is a soap bubble surface) or
that the curve is a line of curvature of the surface. Families
of curves in an image can be used to suggest the shape of
a surface very compellingly; we take advantage of this
when we plot perspective views of 3-D surfaces.
XI. SEGMENTATION
Images are usually described as being composed of parts
(regions, objects, etc.) that have certain properties and
that are related in certain ways. Thus an important step
in the process of image description is segmentation, that
is, the extraction of parts that are expected to be relevant
to the desired description. This section reviews a variety
of image segmentation techniques.
A. Pixel Classification
If a scene is composed of surfaces each of which has a constant orientation and uniform reflectivity, its image will be
composed of regions that each have an approximately constant gray level. The histogram of such an image (see Section V.A) will have peaks at the gray levels of the regions,
indicating that pixels having these levels occur frequently
in the image, whereas other gray levels occur rarely. Thus
the image can be segmented into regions by dividing the
gray scale into intervals each containing a single peak.
This method of segmentation is called thresholding.
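A minimal sketch of such thresholding; here the dividing gray level is chosen by Otsu's criterion, which is one common way of placing a threshold between two histogram peaks (our choice, not necessarily the one intended in the text).

import numpy as np

def otsu_threshold(image):
    """Choose the gray level that maximizes the between-class variance."""
    hist, _ = np.histogram(image, bins=256, range=(0, 256))
    p = hist / hist.sum()
    best_t, best_var = 0, -1.0
    for t in range(1, 256):
        w0, w1 = p[:t].sum(), p[t:].sum()
        if w0 == 0 or w1 == 0:
            continue
        m0 = (np.arange(t) * p[:t]).sum() / w0
        m1 = (np.arange(t, 256) * p[t:]).sum() / w1
        if w0 * w1 * (m0 - m1) ** 2 > best_var:
            best_t, best_var = t, w0 * w1 * (m0 - m1) ** 2
    return best_t

image = np.random.randint(0, 256, (64, 64))   # stand-in for a real gray-level image
regions = image >= otsu_threshold(image)      # two-region segmentation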
Thresholding belongs to a general class of segmentation techniques in which pixels are characterized by a set
of properties. For example, in a color image, a pixel can
be characterized by its coordinates in color space, e.g.,
its red, green, and blue color components. If we plot each
pixel as a point in color space, we obtain clusters of points
corresponding to the colors of the surfaces. We can thus
segment the image by partitioning the color space into regions each containing a single cluster. This general method
of segmentation is called pixel classification.
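A minimal sketch of pixel classification in RGB color space by clustering; it assumes scikit-learn is available and uses k-means, which is only one of many possible clustering methods.

import numpy as np
from sklearn.cluster import KMeans   # assumed to be installed

def classify_pixels(rgb_image, n_clusters=3):
    """Cluster the pixels in color space and return a label image."""
    h, w, _ = rgb_image.shape
    points = rgb_image.reshape(-1, 3).astype(float)       # one color-space point per pixel
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(points)
    return labels.reshape(h, w)                           # segment = cluster index

rgb = np.random.randint(0, 256, (32, 32, 3))              # stand-in for a color image
segments = classify_pixels(rgb)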
FIGURE 10 Texture segmentation. (a) Original texture mosaic; (b) segmented result.
3. Match a set of templates, representing the second
derivatives of gray-level steps in various
orientations, to the neighborhood of P (see
Section VIII.A) and pick the orientation for which the
match is best. (It turns out that convolving such a
template with the image amounts to applying a
first-difference operator at each pixel; thus this
method amounts to computing first differences in
many directions and picking the direction in which
the difference is greatest.) Alternatively, fit a step
function to the gray levels in the neighborhood of P
and use the orientation and height of this step to
define the orientation and contrast of the edge at P.
4. Estimate the Laplacian of the gray level at each pixel
P. Since the Laplacian is a second-difference
operator, it is positive on one side of an edge and
negative on the other side; thus its zero-crossings
define the locations of edges. (First differences should also be computed to estimate the steepness of these edges.)
Although the use of the Laplacian operator for edge
detection has long been known, the idea of applying the
Laplacian operator to a Gaussian-smoothed image, using filters of varying sizes, stimulated theoretical developments in edge detection. In the mid-1980s, edge detectors that jointly optimize detection probability and localization accuracy were formulated and were approximated
by directional derivatives of Gaussian-smoothed images.
Fig. 11 shows an example of edge detection using this
approach.
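A minimal sketch of the Laplacian-of-Gaussian idea (method 4 applied to a Gaussian-smoothed image) using SciPy; the smoothing scale sigma is an arbitrary choice.

import numpy as np
from scipy import ndimage

def log_edges(image, sigma=2.0):
    """Mark zero-crossings of the Laplacian of the Gaussian-smoothed image."""
    log = ndimage.gaussian_laplace(image.astype(float), sigma=sigma)
    signs = log > 0
    edges = np.zeros_like(signs)
    edges[:-1, :] |= signs[:-1, :] != signs[1:, :]   # sign change with the neighbor below
    edges[:, :-1] |= signs[:, :-1] != signs[:, 1:]   # sign change with the neighbor to the right
    return edges

image = np.random.rand(64, 64)    # stand-in for a real gray-level image
edges = log_edges(image)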
All these methods of edge detection respond to noise as
well as to region borders. It may be necessary to smooth
the image before attempting to detect edges; alternatively,
one can use difference operators based on averages of
blocks of gray levels rather than on single-pixel gray levels, so that the operator itself incorporates some smoothing. Various types of statistical tests can also be used to
detect edges, based, for example, on whether the set of
gray levels in two adjacent blocks comes from a single
population or from two populations. Note that edges in an
image can arise from many types of abrupt changes in the
scene, including changes in illumination, reflectivity, or
surface orientation; refinements of the methods described
in this section would be needed to distinguish among these
types.
To detect local features such as lines (or curves or corners) in an image, the standard approach is to match the image with a set of templates representing the second derivatives of lines (etc.) in various orientations. It turns out that
convolving such a template with the image amounts to applying a second-difference operator at each pixel; thus this
method responds not only to lines, but also to edges or to
FIGURE 11 Edge detection. (a) Original image; (b) detected edges.
parts and still have good fits to the merged parts. By using both splitting and merging, we can arrive at a partition
such that the fit on each part is acceptable, but no two parts
can be merged and still yield an acceptable fit. Note that
by fitting surfaces to the gray levels we can, in principle,
handle regions that arise from curved or shaded surfaces
in the scene.
In many situations we have prior knowledge about the
sizes and shapes of the regions that are expected to be
present in the image, in addition to knowledge about their
gray levels, colors, or textures. In segmentation by pixel
classification we can make use of information about gray
level (etc.) but not about shape. Using region-based methods of segmentation makes it easier to take such geometric knowledge into account; we can use region-growing,
merging, or splitting criteria that are biased in favor of
the desired geometries, or, more generally, we can include geometry-based cost factors in searching for an optimal partition. Cost criteria involving the spatial relations
among regions of different types can also be used.
XII. GEOMETRY
In this section we discuss ways of measuring geometric
properties of image subsets (regions or features) and of
decomposing them into parts based on geometric criteria. We also discuss ways of representing image subsets
exactly or approximately.
A. Geometric Properties
Segmentation techniques based on pixel classification can
yield arbitrary sets of pixels as segments. One often
wants to consider the connected pieces of a subset individually, in order to count them, for example.
Let P and Q be pixels belonging to a given subset S. If
there exists a sequence of pixels P = P0 , P1 , . . . , Pn = Q,
all belonging to S, such that, for each i, Pi is a neighbor of Pi−1, we say that P and Q are connected in S.
The maximal connected subsets of S are called its (connected) components. We call S connected if it has only one
component.
Let S̄ be the complement of S. We assume that the image is surrounded by a border of pixels all belonging to S̄. The component of S̄ that contains this border is called the background of S; all other components of S̄, if any, are called holes in S.
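A minimal sketch of these definitions using SciPy's connected-component labeling (with the default 4-neighbor connectivity, an arbitrary choice):

import numpy as np
from scipy import ndimage

S = np.array([[0, 1, 1, 0, 0],
              [0, 1, 1, 0, 1],
              [0, 0, 0, 0, 1],
              [0, 0, 0, 0, 0]], dtype=bool)

labels, n_components = ndimage.label(S)     # maximal connected subsets of S
print("components of S:", n_components)

# Components of the complement: the one touching the image border is the
# background of S; any others are holes in S.
comp, n_comp = ndimage.label(~S)
border = set(comp[0, :]) | set(comp[-1, :]) | set(comp[:, 0]) | set(comp[:, -1])
print("holes in S:", sum(1 for k in range(1, n_comp + 1) if k not in border))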
Two image subsets S and T are called adjacent if some
pixel of S is a neighbor of some pixel of T . Let S1 , . . . , Sm
be a partition of the image into subsets. We define the
adjacency graph of the partition as the graph whose nodes
P1: GTV/GRI
EN007I-841
17:53
624
components or concavities (i.e., those whose areas are
below some threshold).
An important class of geometric decomposition techniques is based on the concept of expanding and shrinking a set. [These operations are the same as min/max filtering (Section V.D) for the special case of a binary image.] Let S^(1) be obtained from S by adding to it the border of S (i.e., adding all points of S̄ that are adjacent to S), and let S^(2), S^(3), . . . be defined by repeating this process. Let S^(−1) be obtained from S by deleting its border (i.e., deleting all points of S that are adjacent to S̄), and let S^(−2), S^(−3), . . . be defined by repeating this process. If we expand S and then reshrink it by the same amount, that is, we construct (S^(k))^(−k) for some k, it can be shown that the result always contains the original S; but it may contain other things as well. For example, if S is a cluster of dots that are less than 2k apart, expanding S will fuse the cluster into a solid mass, and reshrinking it will leave a smaller, but still solid, mass that just contains the dots of the original cluster. Conversely, we can detect elongated parts of a set S by a process of shrinking and reexpanding. Specifically, we first construct (S^(−k))^(k); it can be shown that this is always contained in the original S. Let S_k be the difference set S − (S^(−k))^(k). Any component of S_k has a thickness of at most 2k; thus if its area is large relative to k² (e.g., at least 10k²), it must be elongated.
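A minimal sketch of the expand/shrink operations using SciPy's binary dilation and erosion; k and the test set are illustrative.

import numpy as np
from scipy import ndimage

S = np.zeros((40, 40), dtype=bool)
S[10:12, 5:30] = True                                  # an elongated part
S[25, 5] = S[25, 8] = S[27, 6] = True                  # a small cluster of dots

k = 2
expand = lambda X: ndimage.binary_dilation(X, iterations=k)
shrink = lambda X: ndimage.binary_erosion(X, iterations=k)

closed = shrink(expand(S))   # expand then reshrink: fuses dots less than about 2k apart
opened = expand(shrink(S))   # shrink then reexpand: always contained in S
elongated = S & ~opened      # the difference set S - (S^(-k))^(k)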
Another method of geometric decomposition makes use
of shrinking operations that preserve the connectedness
properties of S, by never deleting a pixel P if this would
disconnect the remaining pixels of S in the neighborhood
of P. Such operations can be used to shrink the components or holes of S down to single pixels or to shrink S
down to a skeleton consisting of connected arcs and
curves; the latter process is called thinning.
Still another approach to geometric decomposition involves detecting features of a set's border(s). As we move
around a border, each step defines a local slope vector, and
we can estimate the slope of the border by taking running
averages of these vectors. By differentiating (i.e., differencing) the border slope, we can estimate the curvature of
the border. This curvature will be positive on convex parts
of the border and negative on concave parts (or vice versa);
zero-crossings of the curvature correspond to points of inflection that separate the border into convex and concave
parts. Similarly, sharp positive or negative maxima of the
curvature correspond to sharp convex or concave corners. It is often useful to decompose a border into arcs by
cutting it at such corners or inflections. Similar remarks
apply to the decomposition of curves. Many of the methods of image segmentation described in Section XI can be
applied to border segmentation, with local slope vectors
playing the role of pixel gray levels. Analogues of various
image-processing techniques can also be applied to borders.
XIII. DESCRIPTION
The goal of image analysis is usually to derive a description of the scene that gave rise to the image, in particular,
to recognize objects that are present in the scene. The description typically refers to properties of and relationships
among objects, surfaces, or features that are present in the
scene, and recognition generally involves comparing this
descriptive information with stored models for known
classes of objects or scenes. This section discusses properties and relations, their representation, and how they are
used in recognition.
A. Properties and Relations
In Section XII.A we defined various geometric properties
of and relationships among image parts. In this section we
discuss some image properties that depend on gray level
(or color), and we also discuss the concept of invariant
properties.
The moments of an image provide information about the spatial arrangement of the image's gray levels. If f(x, y) is the gray level at (x, y), the (i, j) moment m_ij is defined as Σ x^i y^j f(x, y), summed over the image. Thus m00 is simply the sum of all the gray levels. If we think of gray level as mass, (m10/m00, m01/m00) are the coordinates of the centroid of the image. If we choose the origin at the centroid and let m_ij be the (i, j) central moment relative to this origin, then m10 = m01 = 0, and the central moments of higher order provide information about how the gray levels are distributed around the centroid. For example, m20 and m02 are sensitive to how widely the high gray levels are spread along the x and y axes, whereas m30 and m03 are sensitive to the asymmetry of these spreads. The principal axis of the image is the line that gives the best fit to the image in the least-squares sense; it is the line through the centroid whose slope tan θ satisfies

tan²θ + ((m20 − m02)/m11) tan θ − 1 = 0.
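A minimal sketch computing the centroid and the principal-axis angle from these moments with NumPy (the angle below is the standard closed-form solution of the quadratic in tan θ):

import numpy as np

def centroid_and_principal_axis(f):
    h, w = f.shape
    y, x = np.mgrid[0:h, 0:w]
    m = lambda i, j: np.sum((x ** i) * (y ** j) * f)       # moment m_ij
    xc, yc = m(1, 0) / m(0, 0), m(0, 1) / m(0, 0)          # centroid
    mu = lambda i, j: np.sum(((x - xc) ** i) * ((y - yc) ** j) * f)  # central moments
    theta = 0.5 * np.arctan2(2 * mu(1, 1), mu(2, 0) - mu(0, 2))      # principal-axis angle
    return (xc, yc), theta

f = np.random.rand(32, 32)        # stand-in for a gray-level image
print(centroid_and_principal_axis(f))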
autocorrelation drops off from its peak at zero displacement (or, equivalently, by how rapidly the image's Fourier power spectrum drops off from its peak at zero frequency),
and the directionality of the texture can be detected by
variation in the rate of dropoff with direction. Strong periodicities in the texture give rise to peaks in the Fourier
power spectrum. More information is provided by the
second-order probability density of gray levels, which tells
us how often each pair of gray levels occurs at each possible relative displacement. (The first-order gray level probability density tells us about the population of gray levels
but not about their spatial arrangement.) Some other possible texture descriptors are statistics of gray-level run
lengths, gray-level statistics after various amounts and
types of filtering, or the coefficients in a least-squares
prediction of the gray level of a pixel from those of its
neighbors. Information about the occurrence of local patterns in a texture is provided by first-order probability
densities of various local property values (e.g., degrees of
match to templates of various types). Alternatively, one
can detect features such as edges in a texture, or segment
the texture into microregions, and compute statistics of
properties of these features or microregions (e.g., length,
curvature, area, elongatedness, average gray level) and of
their spatial arrangement. Overall descriptions of region
texture are useful primarily when the regions are the images of flat surfaces oriented perpendicularly to the line of
sight; in images of curved or slanted surfaces, the texture
will not usually be spatially stationary (see Section X.B).
The desired description of an image is often insensitive
to certain global transformations of the image; for example, the description may remain the same under stretching
or shrinking of the gray scale (over a wide range) or under
rotation in the image plane. This makes it desirable to describe the image in terms of properties that are invariant
under these transformations. Many of the geometric properties discussed in Section XII are invariant under various geometric transformations. For example, connectivity properties are invariant under arbitrary rubber-sheet
distortion; convexity, elongatedness, and compactness are
invariant under translation, rotation, and magnification;
and area, perimeter, thickness, and curvature are invariant under translation and rotation. (This invariance is only
approximate, because the image must be redigitized after the transformation.) The autocorrelation and Fourier
power spectrum of an image are invariant under (cyclic)
translation, and similar transforms can be defined that are
invariant under rotation or magnification. The central moments of an image are invariant under translation, and
various combinations of moments can be defined that are
invariant under rotation or magnification. It is often possible to normalize an image, that is, to transform it into a
standard form, such that all images differing by a given
type of transformation have the same standard form; properties measured on the normalized image are thus invariant
to transformations of that type. For example, if we translate an image so its centroid is at the origin and rotate it so
its principal axis is horizontal, its moments become invariant under translation and rotation. If we flatten an image's histogram, its gray-level statistics become independent of monotonic transformations of the gray scale; this is often
done in texture analysis. It is much more difficult to define properties that are invariant under 3-D rotation, since
as an object rotates in space, the shape of its image can
change radically, and the shading of its image can change
nonmonotonically.
In an image of a 2-D scene, many types of relationships
among image parts can provide useful descriptive information. These include relationships defined by the relative
values of properties (larger than, darker than, etc.) as
well as various types of spatial relations (adjacent to,
surrounded by, between, near, above, etc.). It is
much more difficult to infer relationships among 3-D objects or surfaces from an image, but plausible inferences
can sometimes be made. For example, if in the region on
one side of an edge there are many edges or curves that
abruptly terminate where they meet the edge, it is reasonable to infer that the surface on that side lies behind the
surface on the other side and that the terminations are due
to occlusion. In any case, properties of and relationships
among image parts imply constraints on the corresponding object parts and how they could be related and, thus,
provide evidence about which objects could be present in
the scene, as discussed in the next subsection.
B. Relational Structures and Recognition
The result of the processes of feature extraction, segmentation, and property measurement is a collection of image
parts, values of properties associated with each part, and
values of relationships among the parts. Ideally, the parts
should correspond to surfaces in the scene and the values should provide information about the properties of
and spatial relationships among these surfaces. This information can be stored in the form of a data structure
in which, for example, nodes might represent parts; there
might be pointers from each node to a list of the properties
of that part (and, if desired, to a data structure such as a
chain code or quadtree that specifies the part as an image
subset) and pointers linking pairs of nodes to relationship
values.
To recognize an object, we must verify that, if it were
present in the scene, it could give rise to an image having
the observed description. In other words, on the basis of
our knowledge about the object (shape, surface properties,
etc.) and about the imaging process (viewpoint, resolution,
C. Models
In many situations we need to recognize objects that belong to a given class, rather than specific objects. In principle, we can do this if we have a model for the class,
that is, a generic description that is satisfied by an object
if and only if it belongs to the class. For example, such
a model might characterize the objects as consisting of
sets of surfaces or features satisfying certain constraints
on their property and relationship values. To recognize an
object as belonging to that class, we must verify that the
observed configuration of image parts could have arisen
from an object satisfying the given constraints.
Unfortunately, many classes of objects that humans can
readily recognize are very difficult to characterize in this
way. Object classes such as trees, chairs, or even handprinted characters do not have simple generic descriptions.
One can characterize such classes by simplified, partial
descriptions, but since these descriptions are usually incomplete, using them for object recognition will result in
many errors. Even the individual parts of objects are often
difficult to model; many natural classes of shapes (e.g.,
clouds) or of surface textures (e.g., tree bark) are themselves difficult to characterize.
and computer vision literature. A more recent idea is to
design a supervisor for the knowledge-based system, so
that by evaluating the systems current state (quality of
image, required speed, etc.), the best possible algorithm,
with optimal choice of parameters, can be chosen from
among the available options. Designing such a system,
however, requires active participation by the user.
XIV. ARCHITECTURES
Image processing and analysis are computationally costly
because they usually involve large amounts of data and
complex sets of operations. Typical digital images consist
of hundreds of thousands of pixels, and typical processing requirements may involve hundreds or even thousands
of computer operations per pixel. If the processing must
be performed rapidly, conventional computers may not be
fast enough. For this reason, many special-purpose computer architectures have been proposed or built for processing or analyzing images. These achieve higher processing speeds by using multiple processors that operate
on the data in parallel.
Parallelism could be used to speed up processing at
various stages; for example, in image analysis, one could
process geometric representations in parallel or match relational structures in parallel. However, most of the proposed approaches have been concerned only with parallel
processing of the images themselves, so we consider here
only operations performed directly on digital images.
The most common class of image operations comprises
local operations, in which the output value at a pixel depends only on the input values of the pixel and a set of its
neighbors. These include the subclass of point operations,
in which the output value at a pixel depends only on the input value of that pixel itself. Other important classes of operations are transforms, such as the discrete Fourier transform, and statistical computations, such as histogramming
or computing moments; in these cases each output value
depends on the entire input image. Still another important
class consists of geometric operations, in which the output value of a pixel depends on the input values of some
other pixel and its neighbors. We consider in this section
primarily local operations.
A. Pipelines
Image processing and analysis often require fixed sequences of local operations to be performed at each pixel
of an image. Such sequences of operations can be performed in parallel using a pipeline of processors, each
operating on the output of the preceding one. The first processor performs the first operation on the image, pixel by
pixel. As soon as the first pixel and its neighbors have been
processed, the second processor begins to perform the second operation, and so on. Since each processor has available to it the output value of its operation at every pixel,
it can also compute statistics of these values, if desired.
Let t be the longest time required to perform an operation at one pixel, and let kt be the average delay required,
after an operation begins, before the next operation can
start. If there are m operations and the image size is n × n, the total processing time required is then n²t + (m − 1)kt.
Ordinarily n is much greater than m or k, so that the total
processing time is not much greater than that needed to do
a single operation (the slowest one).
Pipelines can be structured in various ways to take advantage of different methods of breaking down operations
into suboperations. For example, a local operation can
sometimes be broken into stages, each involving a different neighbor, or can be broken up into individual arithmetical or logical operations. Many pipeline image-processing
systems have been designed and built.
B. Meshes
Another way to use parallel processing to speed up image
operations is to divide the image into equal-sized blocks
and let each processor operate on a different block. Usually the processors will also need some information from
adjacent blocks; for example, to apply a local operation
to the pixels on the border of a block, information about
the pixels on the borders of the adjacent blocks is needed.
Thus processors handling adjacent blocks must communicate. To minimize the amount of communication needed,
the blocks should be square, since a square block has the
least amount of border for a given area. If the image is
n × n, where n = rs, we can divide it into an r × r array of square blocks, each containing s × s pixels. The processing is then done by an r × r array of processors, each
able to communicate with its neighbors. Such an array of
processors is called a mesh-connected computer (or mesh,
for short) or, sometimes, a cellular array. Meshes are very
efficient at performing local operations on images. Let t
be the time required to perform the operation at one pixel,
and let c be the communication time required to pass a
pixel value from one processor to another. Then the total
processing time required is 4cs + ts². Thus by using r² processors we have speeded up the time required for the operation from tn² to about ts², which is a speedup by nearly a factor of r². As r approaches n (one processor per pixel), the processing time approaches t and no longer depends on n at all.
For other types of operations, meshes are not quite so
advantageous. To compute a histogram, for example, each processor requires time on the order of s² to count the
XV. SUMMARY
Given the advances that have been made over the last
15 years in image processing, analysis, and understanding, this article has been very difficult for us to revise.
After completing the revision, we paused to ponder what
has been achieved in these fields over the last 50 years.
Based on our combined perspectives, we have formulated
the following observations that encapsulate the excitement
that we still feel about the subject.
1. Sensors: Instead of the traditional single camera in
the visible spectrum, we now have sensors that can
cover a much wider spectrum and can perform simple
operations on the data.
GLOSSARY
Condition (condition number) Product of the norms of
a matrix and of its inverse; condition of the coefficient
matrix characterizes the sensitivity of the solution of
the linear system to input errors.
Error matrix (error vector) Difference between the exact and approximate values of a matrix (of a vector).
Gaussian elimination Algorithm that solves a linear system of equations via successive elimination of its unknowns, or, equivalently, via decomposition of the input coefficient matrix into a product of two triangular
matrices.
m × n matrix Two-dimensional array of mn entries represented in m rows and n columns. A sparse matrix is a matrix filled mostly with zeros.
x1 + x2 + x3 = 3
(1)
W = [ a11  a12  · · ·  a1n   b1 ]
    [ a21  a22  · · ·  a2n   b2 ]                         (2)
    [  ·    ·   · · ·    ·    · ]
    [ am1  am2  · · ·  amn   bm ]

which occupies an m × (n + 1) working array in a computer. The first n columns of W form the m × n coefficient matrix A of the system. The last column is the right-hand-side vector b.
EXAMPLE 4. The extended matrices of systems of
Examples 1 and 2 are
[ 10   14   0   7 ]        [ 2   1    3.2 ]
[ −3   −4   6   4 ] ,      [ 2   2   10.6 ]
[  5    2   5   6 ]        [ 0   6   18   ]
Since the primary storage space of a computer is limited, the array [Eq. (2)] should not be too large; 100 × 101 or 200 × 201 can be excessively large for some computers. Practically, however, systems with up to, say, 100,000 equations are handled routinely, because large linear systems arising in computational practice are usually sparse (only a small part of their coefficients are nonzeros) and well structured (the nonzeros in the array follow some regular patterns). Then special data structures enable users to store only the nonzero coefficients (sometimes only a part of them). The algorithms for solving such special systems are also much more efficient than in the case of general dense systems.
Consider, for instance, tridiagonal systems, where a_ij = 0 unless −1 ≤ i − j ≤ 1. Instead of storing all the n² + n input entries of A and b, which would be required in the case of a dense system, special data structures can be used to store only the 4n − 2 nonzero entries. The running time of
     [ −4    1    1    0 ]
A =  [  1   −4    0    1 ]  =  [ B2  I2 ]                 (3)
     [  1    0   −4    1 ]     [ I2  B2 ]
     [  0    1    1   −4 ]

     [ −4    1 ]          [ 1  0 ]
B2 = [  1   −4 ] ,   I2 = [ 0  1 ]
Block tridiagonal structures can also be effectively exploited, particularly in cases where the blocks are well structured.
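A minimal sketch, using SciPy's sparse-matrix tools (our choice of tool), of assembling the N² × N² block tridiagonal matrix just described and solving a system with it; the right-hand side is arbitrary.

import numpy as np
from scipy import sparse
from scipy.sparse.linalg import spsolve

N = 10
B = sparse.diags([1, -4, 1], offsets=[-1, 0, 1], shape=(N, N))   # B_N
E = sparse.diags([1, 1], offsets=[-1, 1], shape=(N, N))          # off-diagonal block pattern
A = sparse.kron(sparse.identity(N), B) + sparse.kron(E, sparse.identity(N))

b = np.ones(N * N)                  # arbitrary right-hand side
u = spsolve(A.tocsr(), b)           # sparse direct solution
print(np.allclose(A @ u, b))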
D. Specifics of Overdetermined and
Underdetermined Linear Systems
Overdetermined linear systems [Eq. (1)] with m greatly
exceeding n (say, m = 1000; n = 2) arise when we try
to fit given data by simple curves, in statistics, and in
many other applications; such systems are usually inconsistent. A quasi-solution x1, . . . , xn is sought, which minimizes the magnitudes of the residuals r_i = b_i − (a_i1 x1 + · · · + a_in xn), i = 1, . . . , m. Methods of computing such a quasi-solution vary with the choice of the minimization criterion, but usually the solution is ultimately
reduced to solving some regular linear systems [Eq. (1)]
(where m = n) (see Sections II.E; IV.A; and IV.C).
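For the usual least-squares choice of minimization criterion, a minimal sketch with NumPy (the data are arbitrary, fitting a straight line to noisy points):

import numpy as np

t = np.linspace(0.0, 1.0, 1000)
b = 2.0 + 3.0 * t + 0.01 * np.random.randn(t.size)   # noisy observations
A = np.column_stack([np.ones_like(t), t])            # 1000 x 2 coefficient matrix

c, res, rank, sv = np.linalg.lstsq(A, b, rcond=None) # least-squares quasi-solution
print(c)                                             # approximately (2, 3)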
A consistent underdetermined system [Eq. (1)] always
has infinitely many solutions (compare Example 3) and
is frequently encountered as part of the problem of mathematical programming, where such systems are complemented with linear inequalities and with some optimization criteria.
E. General and Special Linear Systems. Direct
and Iterative Methods. Sensitivity to Errors
Generally, the efforts to identify fully the structure of a
system [Eq. (1)] are generously rewarded at the solution
stage. Special cases and special algorithms are so numerous, however, that we shall first study the algorithms that
work for general linear systems [Eq. (1)]. We shall follow
the customary pattern of subdividing the methods for
solving systems [Eq. (1)] into direct and iterative. The direct methods are more universal; they apply to general and
special linear systems, but for many special and/or sparse
linear systems, the special iterative methods are superior
(see Section VIII). If the computations are performed
with infinite precision, the direct methods solve Eq. (1) in
finite time, whereas iterative methods only compute better
and better approximations to a solution with each new
iteration (but may never compute the solution exactly).
That difference disappears in practical computations,
where all arithmetic operations are performed with finite
precision, that is, with round-off errors. In principle,
the round-off errors may propagate and greatly, or even
completely, contaminate the outputs. This depends on
the properties of the coefficient matrix, on the choice of
the algorithms, and on the precision of computation. A
certain amount of study of the sensitivity of the outputs
to round-off error is normally included in texts on linear
systems and in current packages of computer subroutines
for such systems; stable algorithms are always chosen,
which keep output errors lower. In general, direct methods
are no more or no less stable than the iterative methods.
[Figure: a square plate with a grid of interior points u1, u2, u3, u4 and boundary points u5, . . . , u16.]

−4u1 +  u2 +  u3        = −u6  − u16
  u1 − 4u2        +  u4 = −u7  − u9
  u1        − 4u3 +  u4 = −u13 − u15
         u2 +  u3 − 4u4 = −u10 − u12
The coefficient matrix A of the system is the block tridiagonal of Eq. (3). With smaller spacing we may obtain a
finer grid and compute the temperatures at more points on
the plate. Then the size of the linear system will increase,
say to N² equations in N² unknowns for larger N; but its N² × N² coefficient matrix will still be block tridiagonal
of the following special form (where blank spaces mean
zero entries),
     [ BN  IN                ]          [ −4    1              ]
     [ IN  BN  IN            ]          [  1   −4    1         ]
A =  [     IN  BN  . . .     ] ,  BN =  [       1   −4  . . .  ]
     [         . . .  . . . IN]         [          . . . . . . 1]
     [               IN   BN ]          [               1   −4 ]
[ h² − 2     1        0        0     |  h²g(t1) − x0 ]
[    1     h² − 2     1        0     |  h²g(t2)      ]
[    0        1     h² − 2     1     |  h²g(t3)      ]
[    0        0        1     h² − 2  |  h²g(t4) − x5 ]
minimize    Σ over i = 1, . . . , 3 and j = 1, 2 of c_ij x_ij
subject to  x11 + x12 = 2
            x21 + x22 = 2
            x31 + x32 = 2
            x11 + x21 + x31 = 3
            x12 + x22 + x32 = 3
            x_ij ≥ 0    for i = 1, 2, 3, j = 1, 2

In general, such a transportation problem has the form

minimize    Σ over i = 1, . . . , p and j = 1, . . . , q of c_ij x_ij
subject to  Σ over j = 1, . . . , q of x_ij = s_i,    i = 1, . . . , p
            Σ over i = 1, . . . , p of x_ij = d_j,    j = 1, . . . , q
            x_ij ≥ 0    for all i and j

Here s_i, d_j, and c_ij are given for all i, j. (In our specific example above, p = 3, q = 2, s1 = s2 = s3 = 2, d1 = d2 = 3, c11 = c12 = c21 = c32 = 1, c22 = c31 = 2.) The linear equations form an underdetermined but sparse and well-structured system.
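A minimal sketch solving the specific instance above with SciPy's linear-programming routine (using scipy.optimize.linprog is our choice of tool, not one made in the text); the variables are ordered (x11, x12, x21, x22, x31, x32).

import numpy as np
from scipy.optimize import linprog

c = np.array([1, 1, 1, 2, 2, 1], float)        # costs c_ij from the example
A_eq = np.array([[1, 1, 0, 0, 0, 0],           # x11 + x12 = s1
                 [0, 0, 1, 1, 0, 0],           # x21 + x22 = s2
                 [0, 0, 0, 0, 1, 1],           # x31 + x32 = s3
                 [1, 0, 1, 0, 1, 0],           # x11 + x21 + x31 = d1
                 [0, 1, 0, 1, 0, 1]])          # x12 + x22 + x32 = d2
b_eq = np.array([2, 2, 2, 3, 3], float)

res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=(0, None))
print(res.x.reshape(3, 2), res.fun)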
The Hitchcock transportation problem is in turn an important particular case of the linear programming problem (l.p.p.). The l.p.p. is known in several equivalent forms, one of which follows:

minimize    Σ over j = 1, . . . , m of c_j x_j
subject to  Σ over j = 1, . . . , m of a_ij x_j = b_i,    i = 1, . . . , n
            x_j ≥ 0,    j = 1, . . . , m

In this representation, the general l.p.p. includes an underdetermined system of linear equations complemented with the minimization and nonnegativity requirements. Solving the l.p.p. can be reduced to a finite number of iterations, each reduced to solving one or two auxiliary linear systems of equations; such systems either have n equations with n unknowns (in the simplex algorithms for the l.p.p.) or are overdetermined, in which case their least-squares solutions are sought (in ellipsoid algorithms, in Karmarkar's
v̄_hg being the complex conjugate of v_hg; V^H = V^T for a real V. A matrix V is called symmetric if V = V^T and Hermitian if V = V^H. For two column vectors u and v of the same dimension p, their inner product (also called their scalar product or their dot product) is defined as follows:

u^T v = u1 v1 + u2 v2 + · · · + up vp.

This is extended to define the m × p product of an m × n matrix A by an n × p matrix B, AB = [a_i1 b_1k + a_i2 b_2k + · · · + a_in b_nk, i = 1, . . . , m; k = 1, . . . , p]; that is, every row [a_i1, . . . , a_in] of A is multiplied by every column [b_1k, . . . , b_nk]^T of B to form the m × p matrix AB. For instance, if A = [1, 2] and B = [1, 2]^T, then AB = [5] is a 1 × 1 matrix, BA = [1 2; 2 4] is a 2 × 2 matrix, and AB ≠ BA.
The m equations of Eq. (1) can be equivalently represented by a single matrix-vector equation, Ax = b. For instance, the system of Example 1 takes the following form:

[ 10   14   0 ] [ x1 ]   [ 7 ]
[ −3   −4   6 ] [ x2 ] = [ 4 ]
[  5    2   5 ] [ x3 ]   [ 6 ]

(For control, substitute here the solution vector [x1, x2, x3]^T = [0, 0.5, 1]^T and verify the resulting equalities.)
Hereafter In denotes the unique n × n matrix (called the identity matrix) such that AIn = A, In B = B for all matrices A and B of sizes m × n and n × p, respectively. All the entries of In are zeros except for the diagonal entries, which equal 1. (Check that I2 A = A for I2 = [1 0; 0 1] and for arbitrary A.)
Some important special classes of matrices A = [a_ij] are defined by conditions such as the following (standard names are given in parentheses):

a_ij = 0 unless −1 ≤ i − j ≤ 1 (tridiagonal)
2|a_ii| > Σ over j = 1, . . . , n of |a_ij| for all i (row diagonally dominant)
2|a_jj| > Σ over i = 1, . . . , n of |a_ij| for all j (column diagonally dominant)
A = A^H (in the real case, A = A^T) (Hermitian; real symmetric)
x^H A x > 0 for all vectors x ≠ 0 (positive definite)
A^H A = I (in the real case, A^T A = I) (unitary; real orthogonal)
a_ij = a_{i+1, j+1} for all i, j < n (Toeplitz)
a_ij = a_{i+1, j−1} for all i < n, j > 1 (Hankel)
a_ij = a_j^{i−1} for all i, j, the a_j distinct (Vandermonde)
a_ij = 0 unless i = j (diagonal)
a_ij = 0 unless i ≥ j (lower triangular; its transpose is upper triangular)
triangular with a_ii = 1 for all i (unit triangular)
a_ij = 0 unless −g ≤ i − j ≤ h (banded)
i:     1      2      3      4      5      6      7      8
x_i:   0.1    0.2    0.3    0.5    0.7    1.0    1.4    2.0
f_i:   0.197  0.381  0.540  0.785  0.951  1.11   1.23   1.33
1
970
2
990
3
1000
4
1040
1
1000
1
1040
i:
xi 1000:
fi :
30
10
40
qi
j=1
subject to
q f Ac,
q f + Ac
i = 1, 2, . . . , N
For i = n, n − 1, . . . , 1
    If a_ii = 0, equation i does not determine x_i (compare Example 7);
    Else x_i := (b_i − Σ over j = i + 1, . . . , n of a_ij x_j) / a_ii

EXAMPLE 8.

10x1 + 14x2          =   7
        0.2x2 + 6x3  =   6.1
               155x3 = 155
For the general system [Eq. (1)], where m = n, the algorithm can be written as follows.

ALGORITHM 1. FORWARD ELIMINATION. Input the n² + n entries of the extended matrix W = [w_ij, i = 1, . . . , n; j = 1, . . . , n + 1] of the system of Eq. (1), which occupies an n × (n + 1) working array:

For k = 1, . . . , n − 1,
    For i = k + 1, . . . , n,
        w_ik := w_ik / w_kk          (the multiplier, stored in place)
        For j = k + 1, . . . , n + 1,
            w_ij := w_ij − w_ik w_kj
EXAMPLE 7.

x1 + 2x2 −  x3 =  3
     0·x2 − 2x3 = −6
            6x3 = 18

Back substitution yields x3 = 3; x2 is a free variable; x1 = 6 − 2x2.
B. Forward Elimination Stage
of Gaussian Elimination
Every system [Eq. (1)] can be reduced to triangular form
using the following transformations, which never change
its solutions:
1. Multiply equation i (row i of the extended matrix W )
by a nonzero constant.
2. Interchange equations i and k (rows i and k of W ).
3. Add a multiple of equation i to equation k (of row i to
row k of W ).
The entire computational process of forward elimination can be represented by the sequence of n × (n + 1) matrices W^(0), . . . , W^(n−1), where W^(k−1) and W^(k) denote the contents of the working array before and after elimination step k, respectively, k = 1, . . . , n − 1; W^(0) = W, whereas W^(n−1) consists of the multipliers (placed under the diagonal) and of the entries of U.
EXAMPLE 9. The sequence W^(0), W^(1), W^(2) represents forward elimination for the system of three equations of Example 1:

W^(0) = [ 10   14   0   7 ]
        [ −3   −4   6   4 ]
        [  5    2   5   6 ]

W^(1) = [ 10    14    0   7   ]
        [ −0.3   0.2  6   6.1 ]
        [  0.5  −5    5   2.5 ]

W^(2) = [ 10    14     0     7   ]
        [ −0.3   0.2   6     6.1 ]
        [  0.5  −25    155   155 ]
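A minimal sketch of forward elimination (with the multipliers stored under the diagonal, as in Example 9) followed by back substitution, applied to the extended matrix of Example 1 with the signs as reconstructed above:

import numpy as np

def gaussian_elimination(W):
    """W is the n x (n+1) extended matrix; returns the solution vector."""
    W = W.astype(float).copy()
    n = W.shape[0]
    for k in range(n - 1):                        # forward elimination, no pivoting
        for i in range(k + 1, n):
            W[i, k] /= W[k, k]                    # multiplier, stored in place
            W[i, k + 1:] -= W[i, k] * W[k, k + 1:]
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):                # back substitution
        x[i] = (W[i, n] - W[i, i + 1:n] @ x[i + 1:]) / W[i, i]
    return x

W = np.array([[10, 14, 0, 7], [-3, -4, 6, 4], [5, 2, 5, 6]], float)
print(gaussian_elimination(W))                    # expected: [0, 0.5, 1]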
The presented algorithm works if and only if the pivot entry (k, k) is nonzero at every step k, k = 1, . . . , n − 1, or, equivalently, if and only if for no k is the (n − k) × (n − k) principal submatrix of the coefficient matrix A singular. (A p × p submatrix of A is said to be principal if it is formed by the first p rows and by the first p columns of A.)
That assumption always holds (so the validity of the above
algorithm is assured) in the two important cases where A
is row or column diagonally dominant and where A is
Hermitian (or real symmetric) positive definite (compare
the list of special matrices in Section II.D). (For example, the system derived in Section II.A is simultaneously
row and column diagonally dominant and real symmetric;
multiplication of all the inputs by −1 makes that system also positive definite. The product V^H V for a nonsingular matrix V is a Hermitian positive definite matrix, which is real symmetric if V is real.)
Next, we assume that in some step k the pivot entry
(k, k) is 0. Then we have two cases.
Case 1. The entries (k, k), (k + 1, k), . . . , (s − 1, k) are zeros; the entry (s, k) is nonzero, where k < s ≤ n. In that
case interchange rows s and k, bringing a nonzero entry
into the pivot position; then continue the elimination.
EXAMPLE 10.

[ 10    14   0   7   ]         [ 10    14   0   7   ]
[ −0.3   0   6   6.1 ]   →     [  0.5  −5   5   2.5 ]
[  0.5  −5   5   2.5 ]         [ −0.3   0   6   6.1 ]
Case 2. The pivot entry (k, k) and all the subdiagonal entries (s, k) for s > k equal 0. In that case continue
elimination, skipping the kth elimination step and leaving
the (k, k) entry equal to 0. For underdetermined systems,
apply complete pivoting (see Section III.C). Some subroutines end the computation in Case 2, indicating that
the system is singular.
EXAMPLE 11.
10
14
3 4.2
7
5
10
14
10
0.3
0.5
1
10
0.3
0.5
1
0 7 6
5 4 5
5 5 7
5 9 4
14
0
0
0
0
5
5
5
14
0
0
0
0
5
5
1
6
6.8
4
2
7
6
6.1 6.8
1.5 4
0.5 6
7
6.1
1.5
2
3 2 0 1
0.3 0.1 0.3 1
Summarizing, forward elimination with pivoting reduces an arbitrary linear system Eq. (1) to triangular form and, respectively, to either Case 1 or Case 2 of the previous section (see the end of Section III.C for the complete classification).

Forward elimination and back substitution together are called the Gaussian elimination algorithm. There exist several modifications of that algorithm. In one of them, Jordan's, the back substitution is interwoven throughout the elimination; that is, every pivot equation, times appropriate multipliers, is subtracted from all the subsequent and preceding equations; this turns the system into diagonal form in n elimination steps. [Each step, but the first, involves more flops, so the resulting solution of the general system of Eq. (1) becomes more costly.] In the case of systems of two equations, Jordan's version is identical to the canonical Gaussian elimination.
W^(0) = [ 10   14      0   7     ]
        [ −3   −4.198  6   3.901 ]
        [  5    2      6   7     ]

W^(1) = [ 10    14      0   7     ]
        [ −0.3   0.002  6   6.001 ]
        [  0.5  −5      6   3.5   ]

W^(2) = [ 10     14        0       7     ]
        [ −0.3    0.002    6       6.001 ]
        [  0.5  −2500      15006   15005 ]
Solving the resulting upper triangular system, we obtain
the following approximation to the solution:
x3 = 0.99993,
x2 = 0.75,
x1 = 0.35
This greatly differs from the correct solution, x1 = 0,
x2 = 0.5, x3 = 1, because the division (at the second elimination step) by the small diagonal entry 0.002 has magnified the round-off errors.
The algorithm can be made more stable (less sensitive
to the errors) if the rows of working array are appropriately
interchanged during the elimination. First, the following
policy of row interchange is called (unscaled) partial pivoting. Before performing the kth elimination step, choose (the least) i such that |w_ik| is maximum over all i ≥ k and interchange rows i and k. Row i is called pivotal, and the entry w_ik is called the pivot entry of step k. Keep track of all the row interchanges. In some subroutines row interchanges are not explicit but are implicitly indicated by pointers. (Each step k may be preceded by scaling the equations by factors of the form 2^s on binary computers, to make max_j |w_ij| in all rows i ≥ k lie between 1 and 2. Such scaling is expensive, however, so it is rarely repeated for k > 1.)
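A small illustration (ours, not from the text) of why partial pivoting matters: eliminating with a tiny pivot magnifies round-off, while taking the larger pivot does not.

import numpy as np

A = np.array([[1e-17, 1.0], [1.0, 1.0]])     # hypothetical system, exact solution near (1, 1)
b = np.array([1.0, 2.0])

def solve_2x2(A, b, pivot_row):
    A, b = A.copy(), b.copy()
    if pivot_row == 1:                        # partial pivoting: interchange the rows
        A, b = A[::-1].copy(), b[::-1].copy()
    m = A[1, 0] / A[0, 0]
    A[1] -= m * A[0]; b[1] -= m * b[0]
    x1 = b[1] / A[1, 1]
    return np.array([(b[0] - A[0, 1] * x1) / A[0, 0], x1])

print("no pivoting:     ", solve_2x2(A, b, 0))    # badly contaminated
print("partial pivoting:", solve_2x2(A, b, 1))    # close to (1, 1)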
10
14
0
7
1
10
14
0
7
1
5 6 3.5
3
0.5
x1 x2 = 0
W = [ 10  7  1  0 ]   →   [ 10     7     1     0 ]
    [  3  2  0  1 ]       [  0.3  −0.1  −0.3   1 ]

Both systems have been simultaneously reduced to upper triangular form, so back substitution immediately gives the solutions x2 = 3, x1 = −2 to the first system and x2 = −10, x1 = 7 to the second. This defines the inverse matrix

A^{−1} = [ −2    7 ]
         [  3  −10 ]

(Verify that A A^{−1} = A^{−1} A = I for A = [ 10 7; 3 2 ].)
2
In the elimination stage for k systems with a common
m n matrix A (as well as for an underdetermined system
with n = m + k 1), m 3 /3 + km 2 /2 flops and (k + m)m
units of storage space are used; for matrix inversion k = m,
it is easier to solve the system Ax = b than to invert A.
The back substitution stage involves km 2 /2 flops for the k
systems and (k + m/2)m for the underdetermined system.
E. Block Matrix Algorithms
Arithmetic operations with matrices can be performed the
same as those with numbers, except that singular matrices
cannot be inverted, and the communitive law no longer
holds for multiplications (see Section II.D). If the coefficient matrix is represented in block matrix form, as in
Eq. (3), then we may perform block Gaussian elimination
operating with matrix blocks the same as with numbers
and taking special care when divisions and/or pivoting are
needed. The block version can be highly effective. For instance, we represent the linear system of Section II.A as
follows [compare Eq. (3)]:
[ B2  I2 ] [ y ]   [ c ]          [ −u6 − u16 ]         [ −u13 − u15 ]          [ −4   1 ]
[ I2  B2 ] [ z ] = [ d ] ,   c =  [ −u7 − u9  ] ,   d =  [ −u10 − u12 ] ,   B2 = [  1  −4 ]

where I2 is the 2 × 2 identity matrix, and y, z are two-dimensional vectors of unknowns. Then block forward elimination transforms the extended matrix as follows:

[ B2  I2  c ]      [ B2       I2   c ]
[ I2  B2  d ]  →   [ B2^{−1}  C2   f ]

Here, C2 = B2 − B2^{−1} and f = d − B2^{−1} c. Block back substitution defines the solution vectors

z = C2^{−1} f,     y = B2^{−1}(c − z) = B2^{−1} c − B2^{−1} C2^{−1} f
The recent development of computer technology greatly
increased the already high popularity and importance
of block matrix algorithms (and consequently, of matrix multiplication and inversion) for solving linear systems, because block matrix computations turned out to be
L = [  1     0    0 ]        [ 10   14    0    7   ]        [ 10   14    0   ]
    [ −0.3   1    0 ] ,  Ū = [  0    0.2  6    6.1 ] ,  U = [  0    0.2  6   ]
    [  0.5  −25   1 ]        [  0    0    155  155 ]        [  0    0    155 ]
For that special instance of Eq. (1), W = LŪ, A = LU; similarly for the general system of Eq. (1), unless pivoting is used. Moreover, Gaussian elimination with partial pivoting can be reduced to a certain interchange of the rows of W (and of A), defined by the output vector p (see Algorithm 2 in Section III.C), and to Gaussian elimination with no pivoting. Any row interchange of W is equivalent to premultiplication of W by an appropriate permutation matrix P^{−1} (say, if P^{−1} = [0 1; 1 0], then rows 1 and 2 are interchanged), so Gaussian elimination with pivoting computes matrices P, L, and U such that P^{−1}W = LŪ, P^{−1}A = LU, W = PLŪ, A = PLU; that is, the LU factors of P^{−1}A and the LŪ factors of P^{−1}W are computed. The PLU factorization is not unique; it depends on the pivoting policy. Elimination with no pivoting gives P = I, W = LŪ, A = LU.
When P LU factors of A are known, solving the system
Ax = b is reduced to the interchange of the entries of b
and to solving two triangular systems
Ly = P^{−1} b,      Ux = y                    (4)
Sections III.A and III.B lead to exactly the same computations (within the order of performing the operations). In
subroutine packages, the solution based on PLU factorization is usually preferred. Among its many applications, PLU factorization of A leads to very effective computation of the determinant of an n × n matrix A: det A = (det P)(det U) = (det P) u_11 u_22 · · · u_nn, where u_11, . . . , u_nn denote the diagonal entries of U and where det P = (−1)^s, s being the total number of all the row interchanges made during the elimination.
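A minimal sketch of PLU factorization and of the determinant computed from it, using SciPy, for the matrix of Example 1 as reconstructed above:

import numpy as np
from scipy.linalg import lu

A = np.array([[10, 14, 0], [-3, -4, 6], [5, 2, 5]], float)
P, L, U = lu(A)                              # A = P @ L @ U, with partial pivoting
print(np.allclose(A, P @ L @ U))             # True

det_A = np.linalg.det(P) * np.prod(np.diag(U))   # (det P) u_11 u_22 ... u_nn
print(det_A, np.linalg.det(A))                   # the two values agree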
Gaussian elimination applied to overdetermined or underdetermined systems also computes PLU factorizations of their matrices (see Sections IV.A-C about some other important factorizations in the case m ≠ n).
G. Some Modifications of LU Factorization. Choleski's Factorization. Block Factorizations of a Matrix

If all the principal submatrices of an n × n matrix A are nonsingular, LU factors of A can be computed; furthermore, LDM^T factors of A can be computed, where DM^T = U, D is a diagonal matrix, D = diag(u_11, . . . , u_nn), so both L and M^T are unit triangular matrices (having only 1s on their diagonals). If the LDM^T factors of A are known, solving the system Ax = b can be reduced to solving the systems Ly = b, Dz = y, M^T x = z, which costs only n² + n flops, practically as many as in the case where the LU factors are known. The following modification of Gaussian elimination computes the LDM^T factors of A.
ALGORITHM 3. LDM^T FACTORIZATION

For k = 1, . . . , n
    For g = 1, . . . , k − 1
        u_g := a_gg a_gk
        v_g := a_kg a_gg
    a_kk := a_kk − Σ over j = 1, . . . , k − 1 of a_kj u_j
    For i = k + 1, . . . , n
        a_ik := (a_ik − Σ over j = 1, . . . , k − 1 of a_ij u_j) / a_kk
        a_ki := (a_ki − Σ over j = 1, . . . , k − 1 of v_j a_ji) / a_kk
Each entry is thus updated as a_ik := (a_ik − Σ_j a_ij u_j)/a_kk or a_ki := (a_ki − Σ_j v_j a_ji)/a_kk, where computing inner products is the main operation (easy on many serial computers).
Algorithm 3, for LDM^T factorization, is a simple extension of Crout's algorithm, which computes LD and M^T, and of Doolittle's algorithm, which computes L and U = DM^T. If A is symmetric and has only nonsingular principal submatrices, then L = M, so Algorithm 3 computes the LDL^T factorization of A. If A is symmetric and positive definite, then all the diagonal entries of D are positive, so the matrix √D = diag[√d_11, . . . , √d_nn] can be computed; then Choleski's factorization, A = GG^T, G = L√D, can be computed. Algorithm 4 computes Choleski's factorization using n³/6 flops and only n(n + 1)/2 units of storage space.
ALGORITHM 4. CHOLESKI'S FACTORIZATION

For k = 1, . . . , n
    a_kk := (a_kk − Σ over j = 1, . . . , k − 1 of a_kj²)^{1/2}
    For i = k + 1, . . . , n
        a_ik := (a_ik − Σ over j = 1, . . . , k − 1 of a_ij a_kj) / a_kk
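A minimal sketch using NumPy's built-in Cholesky routine (which returns the lower triangular factor G with A = GG^T); the matrix below is an arbitrary symmetric positive definite example of the form V^T V.

import numpy as np

V = np.array([[2.0, 1.0], [0.0, 3.0]])       # nonsingular, so V.T @ V is positive definite
A = V.T @ V
G = np.linalg.cholesky(A)                    # lower triangular, A = G @ G.T
print(np.allclose(A, G @ G.T))

b = np.array([1.0, 2.0])                     # solving Ax = b via two triangular systems
y = np.linalg.solve(G, b)
x = np.linalg.solve(G.T, y)
print(np.allclose(A @ x, b))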
A1 =
A11 0 I A1
11 A12
0
B 0
I
I
0
A1
A1
11 A12
11
0
B 1 A21 A1
I
11
I
A21 A1
11
I
0
0
I
0
I
(9)
A = max
|ai j |,
i
(8)
(6)
(7)
A21 A1
11 A12
A = maxAv/v
For i = k + 1, . . . , n
k1
aik : = aik
akk
ai j ak j
A =
1/2
j=1
Then,
A1 = max
j
|ai j |,
A
n A2 mA
A1
m A2 nA1
2 7
A1 =
3 10
1
A 1 = 13,
(10)
(11)
x1
217
=
254
x2
563
659
The correct solution is x1 = 1, x2 = 1. For the approximation x1 = 0.341, x2 = 0.087, we have the error and
the residual vectors
e = [0.659, 0.913]T
+
x 1 cond(A)E/A A
b
r = [0.001, 0]T
ij
Many years of computational practice have convinced us
that the latter bound is almost always pessimistic, even
where unscaled partial pivoting is applied. The latter observation and the perturbation theorem of Section III.I
imply that Gaussian elimination with complete pivoting
never (and with partial pivoting rarely) greatly magnifies
the input or round-off errors, unless the system of Eq. (1) is
ill-conditioned, so that Gaussian elimination with pivoting
is quite stable numerically. This analysis can be extended
to some block matrix algorithms, in particular, to the block factorization (5)-(7) for Hermitian (real symmetric) and positive definite matrices A.
r_hj := Σ over i = 1, . . . , m of a_ih a_ij
For i = 1, . . . , m
    a_ij := a_ij − a_ih r_hj
However, a faster algorithm, also completely stable (called the Householder transformation or Householder reflection), computes R and Q^T f using n²(m − n/3) flops and mn units of storage. The algorithm can be used for the more general purpose of computing an m × m orthogonal matrix Q = Q_{m,m} and an m × n upper triangular matrix R = R_{m,n} such that A = QR, Q^T Q = QQ^T = I. Previously we considered QR factorization where Q had size m × n and R had size n × n. Such QR factors of A can be obtained by deleting the last m − n columns of Q_{m,m} and the last m − n rows of R_{m,n} (those last rows of R form a null matrix, for R is upper triangular). Householder transformation of A into R is performed by successive premultiplications of A by the Householder orthogonal matrices H_k = I − 2 v_k v_k^T / (v_k^T v_k), k = 1, 2, . . . , r, where r ≤ n, and usually r = n. The vector v_k is chosen such that the premultiplication by H_k makes zeros of all the subdiagonal entries of column k of the matrix A_k = H_{k−1} H_{k−2} · · · H_1 A and does not affect its columns 1, 2, . . . , k − 1. Such a choice of v_k for k = 1, 2 is shown below for the matrix A
15:20
633
of our previous example. Here is the general rule. Zero the first k − 1 entries of column k of A_k and let a(k) denote the resulting vector. Then v_k = a(k) ± ‖a(k)‖_2 i(k), where i(k) is the unit coordinate vector whose kth entry is 1 and whose other entries are zeros; the sign + or − is chosen the same as the sign of entry k of a(k) (if that entry is 0, choose, say, +).
Remark 3. A modification with column pivoting is sometimes performed prior to the premultiplication by H_k for each k in order to avoid possible complications where the vector a(k) has small norm. In that case, column k of A is interchanged with column s such that s ≥ k and ‖a(s)‖_2 is maximum. Finally, QR factors are computed for the matrix AP, where P is the permutation matrix that monitors the column interchanges. Column pivoting can be performed using O(mn) comparisons.
When only the vector Q^T f and the matrix R_{m,n} must be computed, the vector f is overwritten by the successively computed vectors H_1 f, H_2 H_1 f, . . . , and the matrices H_1, H_2, . . . are not stored. If the matrix Q is to be saved, it can be either explicitly computed by multiplying the matrices H_i together [Q = (H_r H_{r−1} · · · H_1)^T] or implicitly defined by saving the vectors v_1, v_2, . . . , v_r.
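The process just described can be sketched in a few lines of NumPy (a minimal illustration, not the article's code; it returns R and Q^T f without forming Q, and the sign of v follows the rule given above):

    import numpy as np

    def householder_qr(A, f):
        """Return (R, Q^T f) with A = QR, by successive premultiplications
        with H_k = I - 2 v v^T / (v^T v)."""
        R = A.astype(float).copy()
        QTf = f.astype(float).copy()
        m, n = R.shape
        for k in range(min(m - 1, n)):
            a = R[k:, k].copy()
            v = a.copy()
            v[0] += np.copysign(np.linalg.norm(a), a[0])   # sign chosen to avoid cancellation
            if np.linalg.norm(v) == 0:
                continue                                    # column already zero below the diagonal
            v /= np.linalg.norm(v)
            R[k:, k:] -= 2.0 * np.outer(v, v @ R[k:, k:])   # apply H_k to the trailing block
            QTf[k:] -= 2.0 * v * (v @ QTf[k:])              # accumulate H_k ... H_1 f
        return np.triu(R), QTf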
EXAMPLE 18. HOUSEHOLDER TRANSFORMATION WITH NO COLUMN PIVOTING. For the matrix A of Example 17, v_1^T = [3, 1, 1, 1], so premultiplication by H_1 = I − 2 v_1 v_1^T / (v_1^T v_1) zeros the subdiagonal entries of the first column of A_1 = H_1 A.
the transposed matrix A^T and compute its factorization A^T P = QR, where

    R = [ R11  R12 ]
        [ 0    0   ]

and R11 is an r × r nonsingular triangular matrix, r = rank(A), Q = [Q1, Q2] is a square orthogonal matrix, and Q1 consists of its first r columns. Then the minimum 2-norm solution x to the system Ax = b can be computed as

    x = Q1 y,   where   [R11, R12]^T y = P^{-1} b,

unless the latter system (and then also the system Ax = b) is inconsistent.
C. Applications to Overdetermined
Systems of Deficient Rank
Householder transformation with column pivoting can be applied to a matrix A in order to compute the least-squares solution to Ac = f even where A does not have full rank, that is, where r = rank(A) < n ≤ m. That algorithm first computes the factorization AP = QR, where

    R = [ R11  R12 ]
        [ 0    0   ]

and R11 is an r × r nonsingular upper triangular matrix.
Then the general solution to A^T A c = A^T f (that is, the general least-squares solution to Ac = f) is computed as follows:

    c = P [ R11^{-1} (g − R12 b) ]
          [ b                    ]

where the vector g consists of the first r entries of Q^T f, and the vector b consists of the last n − r entries of P^{-1} c. The latter n − r entries can be used as free parameters in order to define a specific solution (the simplest choice is b = 0).
Formally, infinite precision of computation is required in
that algorithm; actually, the algorithm works very well in
practice, although it fails on some specially concocted instances, somewhat similarly to Gaussian elimination with
unscaled partial pivoting.
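A minimal SciPy sketch of this procedure is given below (an illustration only: the function name is invented, the numerical rank r is detected by a simple tolerance test, and the free part b is set to 0, the simplest choice mentioned above):

    import numpy as np
    from scipy.linalg import qr

    def lsq_deficient(A, f, tol=1e-12):
        """Basic least-squares solution of A c = f via A P = Q R with column pivoting."""
        m, n = A.shape
        Q, R, piv = qr(A, mode='economic', pivoting=True)   # A[:, piv] = Q R
        r = int(np.sum(np.abs(np.diag(R)) > tol * abs(R[0, 0])))
        g = (Q.T @ f)[:r]                                    # first r entries of Q^T f
        c_perm = np.zeros(n)
        c_perm[:r] = np.linalg.solve(R[:r, :r], g)           # R11^{-1} g, with b = 0
        c = np.empty(n)
        c[piv] = c_perm                                      # undo the column permutation
        return c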
A little more expensive [n²(m + 17n/3) flops and 2mn² units of storage versus n²(m − n/3) flops and mn units for the Householder transformation] but completely stable algorithm relies on computing the singular value decomposition (SVD) of A. Unlike the Householder transformation, that algorithm always computes the least-squares solution of minimum 2-norm. The SVD of an m × n matrix A is the factorization A = U Σ V^T, where U and V are two square orthogonal matrices (of sizes m × m and n × n, respectively), U^T U = I_m, V^T V = I_n, and where the m × n matrix Σ is zero except for its diagonal entries σ₁ ≥ σ₂ ≥ · · · ≥ 0, the singular values of A.
A banded matrix can be factored into the product of banded triangular matrices. In particular, if the bandwidth of A is (p, q) and if A has LU factors, then L and U have bandwidths (0, q) and (p, 0), respectively; pqn + r(n − (p² + q²)/2) + r³/3 flops, where r = min{p, q}, suffice to compute the factors L and U. Then it remains to use (p + q + 1)n − (p² + q²)/2 flops to solve the system Ax = b. Partial pivoting partially destroys the band structure of A; however, the resulting PLU factorization of A defines a matrix U still having bandwidth (p + q, 0) and a matrix L having at most p + 1 nonzero entries per column. Consequently, a substantial flop saving is still possible.
Many banded systems are symmetric positive definite and/or diagonally dominant. In those cases, pivoting is unnecessary, and the band structure can be fully exploited; one-half of the flops used can be further saved if A has Cholesky's factorization.
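For example, SciPy exposes a direct banded solver that exploits exactly this structure; here is a small sketch for a tridiagonal system, that is, bandwidth (1, 1) (the matrix values are illustrative only):

    import numpy as np
    from scipy.linalg import solve_banded

    # Tridiagonal A stored in the compact "banded" layout expected by solve_banded
    n = 5
    main = 4.0 * np.ones(n)
    off = -1.0 * np.ones(n - 1)
    ab = np.zeros((3, n))
    ab[0, 1:] = off      # superdiagonal
    ab[1, :] = main      # main diagonal
    ab[2, :-1] = off     # subdiagonal
    b = np.ones(n)
    x = solve_banded((1, 1), ab, b)   # O(n) work instead of O(n^3) for a dense solve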
A PLDL^T P^T factorization of an n × n symmetric matrix A (where P is a permutation matrix) can be computed, say, by Aasen's algorithm, using about n³/6 (rather than n³/3) flops and O(n²) comparisons for pivoting, even if A is not positive definite and has no Cholesky factorization.
In many applications, linear systems have block-band
structure. In particular, the numerical solution of partial differential equations is frequently reduced to solving block tridiagonal systems (see Section II.A). For such
systems, block triangular factorizations of A and block
Gaussian elimination are effective. Such systems can also be solved in the same way as usual banded systems with
scalar coefficients. This would save flops against dense
systems of the same size, but the algorithms exploiting
the block structure are usually far more effective.
Apart from triangular factorization and Gaussian elimination, there exist other effective direct methods for block-banded systems. For instance, the odd–even reduction (the cyclic reduction) is effective for solving symmetric block tridiagonal systems Ax = b with (2^s − 1) × (2^s − 1) block matrices of the form
        [ D  F             ]
        [ E  D  F          ]
    A = [    E  D  .       ]  =  P B P^T,
        [       .  .  F    ]
        [          E  D    ]
Here, the blanks show the block entries filled with zeros; P and P^T are permutation matrices such that the matrix B is obtained from the original matrix A by moving all the 2^{s−1} odd-numbered block rows and columns of A into the first 2^{s−1} positions. The first 2^{s−1} steps of Gaussian elimination eliminate all the subdiagonal nonzero blocks in the first 2^{s−1} columns of the resulting matrix. The (2^{s−1} − 1) × (2^{s−1} − 1) block matrix in the lower right corner is again a block tridiagonal, block Toeplitz matrix, so the reduction is recursively repeated until the system is reduced to a single block equation. (For an exercise, apply this algorithm to the system of Section II.A for the 4 × 6 grid, n = 15, s = 4.)
EXAMPLE 21. Polynomial division and solving a triangular Toeplitz system. Given two polynomials,

    u(t) = u4 t⁴ + u3 t³ + u2 t² + u1 t + u0,
    v(t) = v2 t² + v1 t + v0,

we compute the quotient q(t) = q2 t² + q1 t + q0 and the remainder r(t) = r1 t + r0 of the division of u(t) by v(t) such that u(t) = v(t) q(t) + r(t) or, equivalently, such that

    [ v2  0   0  ]            [ 0  ]   [ u4 ]
    [ v1  v2  0  ]  [ q2 ]    [ 0  ]   [ u3 ]
    [ v0  v1  v2 ]  [ q1 ]  + [ 0  ] = [ u2 ]
    [ 0   v0  v1 ]  [ q0 ]    [ r1 ]   [ u1 ]
    [ 0   0   v0 ]            [ r0 ]   [ u0 ]

The first three equations form the lower triangular Toeplitz system

    [ v2  0   0  ] [ q2 ]   [ u4 ]
    [ v1  v2  0  ] [ q1 ] = [ u3 ]
    [ v0  v1  v2 ] [ q0 ]   [ u2 ]

whose solution gives the quotient; the remainder then follows by substitution into the last two equations.
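A short NumPy sketch of Example 21 follows (the numerical coefficients are placeholders chosen for illustration; the quotient is obtained from the 3 × 3 triangular Toeplitz system and the remainder by back-substitution):

    import numpy as np

    # u(t) = t^4 + 2t^3 + 3t^2 + 4t + 5,  v(t) = t^2 - t + 2   (placeholder values)
    u = np.array([1.0, 2.0, 3.0, 4.0, 5.0])      # [u4, u3, u2, u1, u0]
    v2, v1, v0 = 1.0, -1.0, 2.0

    # Lower triangular Toeplitz system for the quotient q = [q2, q1, q0]
    T = np.array([[v2, 0.0, 0.0],
                  [v1, v2, 0.0],
                  [v0, v1, v2]])
    q = np.linalg.solve(T, u[:3])                # q = [1, 3, 4]

    # Remainder r(t) = r1 t + r0 from the last two equations
    r1 = u[3] - (v0 * q[1] + v1 * q[2])          # r1 = 2
    r0 = u[4] - v0 * q[2]                        # r0 = -3
    # Check: np.polydiv(u, [v2, v1, v0]) returns the same quotient and remainder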
The coefficient matrices of the banded and block-banded linear systems of Section VI.A are sparse, that is, filled mostly with zeros, and have regular patterns for the nonzero entries. Linear systems with those two features arise most frequently in applications (compare the regular patterns of banded and block-banded systems, where the nonzeros are grouped about the diagonal). Such sparse and structured systems can be solved efficiently using special fast direct or iterative algorithms and special data structures allowing the storage of only the nonzero coefficients.
To see the structure of the coefficient matrix A and to choose appropriate data structures for its representation, replace the nonzero entries of A by ones. The resulting n × n matrix B = [b_ij], filled with zeros and ones, can be interpreted as the adjacency matrix of a (directed) graph G = (V, E) consisting of the vertex set V = {1, 2, . . . , n} and of the edge set E such that there exists an arc (i, j) from vertex i to vertex j if and only if b_ij = 1 or, equivalently, if and only if a_ij ≠ 0. Note that the graph G is undirected if the matrix A is symmetric. The special data structures used in graph algorithms (linked lists, stacks, queues) are extended to the computations for linear systems.
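For instance, with SciPy the nonzero pattern of A can be stored in compressed sparse form and handed directly to graph routines (a small sketch; the matrix values are illustrative):

    import numpy as np
    from scipy.sparse import csr_matrix
    from scipy.sparse.csgraph import connected_components

    A = np.array([[4.0, 0.0, 1.0],
                  [0.0, 3.0, 0.0],
                  [1.0, 0.0, 2.0]])
    B = csr_matrix(A != 0)                 # adjacency matrix of the associated graph G
    n_comp, labels = connected_components(B, directed=False)
    # n_comp == 2: the unknowns {0, 2} and {1} can be solved for independently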
Consider, for instance, the n × n arrowhead matrix

        [ x  x  x  · · ·  x ]
        [ x  x              ]
    A = [ x     x           ]
        [ ·         ·       ]
        [ x              x  ]

where all the nonzero entries of A are located on the diagonal, in the first row, and in the first column and are denoted by x. Then the fill-in of Gaussian elimination with no pivoting would make the matrix dense, which would require an increase in the storage space from 3n − 2 to n² units.
With the Markowitz rule for this matrix, no fill-in will take
place.
There also exist general ordering policies that:

1. Decrease the bandwidth (Cuthill–McKee) or the profile (reverse Cuthill–McKee, King) of a symmetric matrix A [the profile of a symmetric n × n matrix A equals Σ_{i=1}^{n} (i − min{ j : a_ij ≠ 0 })].
2. Reduce the matrix A to block diagonal form or to block triangular form with the maximum number of blocks (policies 1 and 2 amount to computing all the connected components or all the strongly connected components, respectively, of the associated graph G).
3. Represent a symmetric A as a block matrix such that elimination at the block level causes no fill-in (tree partitioning algorithms for the associated graph G). Effective dissection algorithms customarily solve linear systems whose associated graphs G have small separators, that is, can be partitioned into two or more disconnected subgraphs of about equal size by removing relatively few vertices. For instance, in many applications G takes the form of a √n × √n grid on the plane. Removing the 2√n − 1 vertices of the horizontal and vertical medians separates G into four disconnected grids, each with about (n + 1 − 2√n)/4 vertices. This process can be recursively repeated until the set of all the separators includes all the vertices. The nested dissections of this kind define elimination orders (where the separator vertices are eliminated in the order reversing the process of dissection), which leads to a great saving of time and space. For instance, for the √n × √n plane grid, the nested dissection method requires O(n^{1.5}) flops with small overhead, rather than n³/3. Furthermore, the triangular factors L and U of PA (or Cholesky's factors L and L^T of PAP^T in the symmetric case) are filled with only O(n) nonzeros, so O(n) flops suffice in the substitution stage, which makes the method even more attractive where the right side b varies and the coefficient matrix A is fixed.
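As an illustration of policy 1, SciPy exposes the reverse Cuthill–McKee permutation; a minimal sketch, using the arrowhead matrix of the earlier example with n = 6 (the setup and claims in the comments are illustrative, not taken from the article):

    import numpy as np
    from scipy.sparse import csr_matrix
    from scipy.sparse.csgraph import reverse_cuthill_mckee

    n = 6
    A = np.eye(n)
    A[0, :] = 1.0
    A[:, 0] = 1.0                    # arrowhead: dense first row and column
    perm = reverse_cuthill_mckee(csr_matrix(A), symmetric_mode=True)
    A_perm = A[np.ix_(perm, perm)]
    # The dense row and column are moved toward the end of the ordering, so
    # Gaussian elimination on A_perm avoids the fill-in described above.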
D. Solving Path Algebra Problems via Their Reduction to Linear Systems; Exploiting Sparsity

Although effective algorithms for solving sparse linear systems exploit some properties of the associated graphs, many combinatorial problems can be effectively solved by reducing them to linear systems of equations whose coefficient matrix is defined by the graph, say, is filled with the weights (the lengths) of the edges of the graph. In particular, that reduction is applied to path algebra problems.
(Jacobi iteration)

    x_i^{(s+1)} = ( b_i − Σ_{j=1}^{i−1} a_ij x_j^{(s)} − Σ_{j=i+1}^{n} a_ij x_j^{(s)} ) / a_ii,    i = 1, . . . , n.

ALGORITHM 7 (GAUSS–SEIDEL)

    x_i^{(s+1)} = ( b_i − Σ_{j=1}^{i−1} a_ij x_j^{(s+1)} − Σ_{j=i+1}^{n} a_ij x_j^{(s)} ) / a_ii,    i = 1, . . . , n.
EXAMPLE 22. Consider the system 4x₁ − x₂ = 7, −x₁ + 4x₂ = 2, whose solution is x₁ = 2, x₂ = 1. Let x₁^{(0)} = x₂^{(0)} = 0. Then the Jacobi iterations give x₁^{(1)} = 1.75, x₂^{(1)} = 0.5, x₁^{(2)} = 1.875, x₂^{(2)} = 0.9375.
For this system,

    A = [  4  −1 ],   D = diag[4, 4],   L = [  0  0 ],   U = [ 0  −1 ],
        [ −1   4 ]                          [ −1  0 ]        [ 0   0 ]

so A = D + L + U, the splitting used by the Jacobi and Gauss–Seidel iterations.
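A compact Python sketch of the Jacobi and Gauss–Seidel sweeps applied to the system of Example 22 is given below (a minimal illustration, not a production solver; the function names are invented):

    import numpy as np

    A = np.array([[4.0, -1.0], [-1.0, 4.0]])
    b = np.array([7.0, 2.0])

    def jacobi_step(x):
        # every x_i^(s+1) uses only values from the previous sweep
        return (b - (A - np.diag(np.diag(A))) @ x) / np.diag(A)

    def gauss_seidel_step(x):
        # each x_i^(s+1) immediately reuses the already-updated components
        x = x.copy()
        for i in range(len(b)):
            x[i] = (b[i] - A[i, :i] @ x[:i] - A[i, i + 1:] @ x[i + 1:]) / A[i, i]
        return x

    x = np.zeros(2)
    for _ in range(20):
        x = gauss_seidel_step(x)    # converges to [2, 1]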
Such a reclassification of the available algorithms for linear systems, due to the recent and current development of computer technology, is an area of active research. Making the final choice for practical implementation of parallel algorithms requires some caution. A good measure of the speedup due to using a certain parallel algorithm on p processors is the quotient q of the running time of the best sequential algorithm to the running time of the parallel algorithm on the p processors.
Software Project Management
GLOSSARY
Activity network (AN) A diagram showing the activities
required to complete a project and the dependences
among activities that will govern the sequence in which
they can be undertaken.
Function points A method of assessing the theoretical
size of a computer-based information system by counting a number of different types of externally apparent
features, applying weights to the counts, and aggregating them.
Product breakdown structure (PBS) A representation
of the products that are to be created by a project.
Product flow diagram (PFD) A diagram indicating the
order in which the products of a project must be created
because of the technical dependences among them.
Project life cycle The sequence of phases needed to accomplish a particular project.
Risk exposure An indicator of the seriousness of a risk, usually calculated by multiplying the probability of the risk occurring by the magnitude of its potential impact.
SOME PRINCIPLES OF MANAGEMENT are applicable to projects of all kinds. A convergence of opinion
on these principles is reflected in, among other things,
the documentation of a body of knowledge (BOK) relating to project management produced by the Project
Management Institute (PMI) in the United States. Comparable BOKs have also been published in many other
countries, including Australia and the United Kingdom.
The applicability of these principles to software development projects is illustrated by the integration by the
software engineering community in the United States of
the PMI BOK into the proposed Software Engineering
Body of Knowledge. The PMI, for its part, recognizes
that the core practices and techniques of project management may need to be extended to deal with specialist projects in particular environments. As software development projects are notoriously prone to delay and
cost overruns, the question therefore arises of whether,
because of their inherent risks, they require specialist
software project management techniques. In an influential paper, Brooks (1987) argued that the products of
software engineering differ from the products of most
other engineering disciplines in their inherent complexity, conformity, changeability, and invisibility. Software
project management can therefore be seen as extending
the range of techniques offered by generic project management to deal with the particular difficulties in software
development.
Software products are more complex than other engineered artifacts in relation to the effort expended on
their creation and hence their cost. In part, this is because
software tends to be composed of unique components.
Clearly software products can be reproduced, as when
millions of copies of an operating system are produced
and sold, but the focus of software development (as with
the authorship of books) is on the development of the
initial unique product. The inherent complexity of software crucially affects such aspects of development as the
definition of requirements and the testing of completed
products.
The problem of conformity centers on the need for software to reflect the requirements of human beings and their
institutions. Brooks pointed out that while the physicist
has to deal with complexity, the physical world does seem
to conform to consistent rules, while human institutions
can often promulgate inconsistent, ambiguous, and arbitrary rules for which the software engineer has to make
provision.
In most fields of engineering, it is commonly accepted
that once a product is built, changes to it are likely to be at
least expensive to implement and might in extreme cases
be technically impossible. Having built a road, for example, one would not expect that a change in the routing of
the road would be undertaken lightly. Yet with software
products, changes in functionality are expected to be incorporated.
I. PRODUCT-BASED PROJECT
MANAGEMENT
A. Product Breakdown Structures
The design of software is primarily a mental activity. It is
also largely iterative in nature. One consequence of this is
that it is difficult to judge, as a development activity takes
place, the time that will be required to complete the task.
One symptom of this is the 90% completion syndrome,
where an activity is reported as being on schedule at the
end of each week, say, until the planned completion date is
reached, at which point it is reported as being 90% complete for however many remaining weeks are actually required
for the accomplishment of the task. The first step in avoiding this is to commence project planning with the creation
of a product breakdown structure (PBS) identifying all the
products that the project is to create.
The products identified may be technical ones, such as
software components and their supporting documentation,
management products, such as plans and reports of various types, or items that are the by-products of processes to
ensure software quality, such as testing scripts. The products may be ones that will be delivered to the customer
(deliverables) or intermediate products created at various
stages of the project for internal purposes, for example, to
clarify and communicate design decisions between members of the development team. The concept of a product is
a broad one, so that a person could be a product (e.g., a
trained user) or the product could be a revised version of
a previously existing product (e.g., a tested program).
A simplistic example of how a fragment of a PBS might
appear is shown in Fig. 1. In the hierarchy a higher level
box assigns a name to the grouping of lower level products that belong to it, but does not itself add any new
products.
The PBS may be compared with the work breakdown
structure (WBS), which is the more traditional way of decomposing a project. Some advantages of the PBS over
dictate the amount of functionality to be delivered: if time
runs out, rather than extending the deadline, the delivery
of some of the less urgent functionality is deferred until a
later increment.
The incremental approach is not without possible disadvantages. Each increment can be perceived as a small
project in its own right. Projects have startup costs and
there is a risk that the incremental approach can lead to a
reduction in productivity because of the loss of economies
of scale. There is also the risk of software breakage:
because of the lack of an overall detailed design, later
increments may require the software written for earlier
increments to be rewritten so that it is compatible with
this later functionality. The occurrence of software breakage may be a symptom of a deeper problem: the piecemeal
development of the overall application may lead to insufficient effort being given to the creation of a robust unifying
architecture. The focus on a succession of individual increments, especially where the minds of developers are
being concentrated by the tight deadlines implied by time
boxing, might mean that opportunities for different subsystems to share code may be overlooked. There is also a
risk that the short-term nature of the concerns of the developers of increments leads to the neglect of longer term
concerns relating to maintainability.
D. Evolutionary Models
Although it has been suggested that all the details of every
stage of a project planned using the waterfall model do
not have to be determined before the start of the project,
managers usually need to provide the client with a clear
idea of when the project will be completed. The history of
computer disasters indicates this is often problematic. A
large part of the difficulty stems from the uncertainties that
are inherent in the environments of many projects. These
uncertainties might reside in the nature of the customer
requirements or in the nature of the technical platform
that is to be used to deliver functionality. The rational approach to the reduction of such uncertainty is to commit
resources to buying knowledge through trial and experiment, usually by building prototypes. Prototypes, working
models of the software, can be used to explore the nature
of user requirements, or to trial alternative ways of using technical platforms. Prototypes can also be used to
assess the impact of the adoption of a particular information technology (IT) solution on the operations of the host
organization.
A prototype can be a throwaway, in that, once the
lessons learnt have been documented, it is discarded and
development starts afresh. Alternatively, a prototype can
be evolutionary, and gradually modified until it is transformed into the final operational product.
managers: a project leader who gives day-to-day direction about the work in hand, and a specialist manager
concerned with such matters as technical training needs.
A major problem with software development continues to be the dearth of good software developers. Many
observers have noted the wide difference in productivity
between the most and the least capable developers. It has
been suggested that the best way of achieving success with
a software project is to ensure that the best staff are hired
and that then they are used to best effect. This train of
thought leads to the chief programmer team. The chief
programmer is a highly talented and rewarded developer,
who designs and codes software. They are supported by
their own team designed to maximize the chief programmer's personal effectiveness. There is a co-pilot with
whom the chief programmer can discuss problems and
who writes some of the code. There is an editor to write
up formally the documentation sketched out by the chief
programmer, a program clerk to maintain the actual code, and a separate tester. The general idea is that the
team is under the control of a single unifying intellect.
The major problem with this strategy is the difficulty of
obtaining and retaining the really outstanding software
engineers to fulfil the role of chief programmer.
A more practical and widespread approach is to have
small groups of programmers under the leadership of senior programmers. Within these groups there is free communication and a practice of reviewing each other's work.
The structures above assume that developers work in
isolation from users and other, more business-orientated
analysis specialists. The adoption of a rapid application
development strategy would require this approach to be
radically rethought.
Once the cost of the change has been assessed, a decision is needed on whether to go ahead with the change. A positive decision would lead to an authorization for work to proceed and for copies of the baselined products affected to be released to the developers assigned to making the changes.
[Table: software risk categories — project attributes, management, engineering, work environment, and other. From Conrow, E. H., and Shishido, P. S. (1997). Implementing risk management on software-intensive projects. IEEE Software 14(3), 83–89.]
Some risks may turn out to be groundless, while new risks can emerge unexpectedly. Some risks related to specific events, the delivery of
equipment, for example, will simply disappear because the
activity has been successfully accomplished. Hence risks
need to be carefully monitored throughout the execution
of the project: one method of doing this is to maintain a
project risk register, or inventory, which is reviewed and
updated as part of the general project control process.
FIGURE 4 An accumulative probability curve showing the probability that a software component will be completed within different
numbers of staff-days.
If a directly comparable past project cannot be found, then analogies might be sought by informally breaking the application down into component parts
and then seeking analogies for these components. Boehm
drew the conclusion that the various methods were complementary, so that while algorithmic models could produce objective predictions of effort that were not subject
to bias, expert judgment might be able to identify exceptional circumstances. The best practice was therefore to
use the techniques in combination, compare their results,
and analyze the differences among them.
B. COCOMO: An Example
of an Algorithmic Model
Nearly all the techniques listed above, especially the algorithmic models, depend on some measure of software
size. If the size of implemented software applications can
be measured in some manner, for example, in lines of
code, and the actual effort needed for these applications
is also recorded, then a productivity rate can be derived
as size/effort (e.g., as lines of code per day). If, for a new
project, the probable size is known, then by applying the
historical productivity rate an estimate of effort for the
new project can be arrived at.
Early approaches to software effort estimation were
based on this principle, a sophistication tending to be an
adjustment by means of the application of exponentiation
to take account of diseconomies of scale. These diseconomies can be caused by larger projects needing disproportionately more effort to deal with communication and
management overheads.
Boehms COCOMO (constructive cost model) illustrates the principle. In its basic, organic mode it is expressed as
    pm = 3.2(kdsi)^1.05,    (1)
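As an illustration, Eq. (1) can be evaluated directly (a sketch; the constant 3.2 and exponent 1.05 are the values quoted above, and the usual reading of the symbols is assumed: kdsi = thousands of delivered source instructions, pm = person-months):

    def cocomo_basic_organic(kdsi):
        """Basic COCOMO, organic mode: effort in person-months, Eq. (1)."""
        return 3.2 * kdsi ** 1.05

    effort = cocomo_basic_organic(32)   # roughly 120 person-months for a 32-KDSI project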
TABLE II COCOMO Cost Driver Attributes

Product attributes     RELY, DATA, CPLX
Computer attributes    TIME, STOR, VIRT, TURN
Personnel attributes   ACAP, AEXP, PCAP, VEXP, LEXP
Project attributes     MODP, TOOL, SCED
A problem with this basic approach is that productivity rates vary considerably between development environments and indeed between projects within the same
environment. One solution to this is to use statistical techniques to build local models. An alternative, favored by the
COCOMO community, is to locate a development environment and application type within the totality of projects
executed by the software development community along
a number of dimensions. The dimensions as identified by
the initial COCOMO are shown in Table II. Each point in
this n-dimensional matrix would have associated with it
an expected productivity rate relating to the software development industry as a whole. For example, an application might have a high reliability requirement compared
to the nominal or industry-average type of project: this
would justify 15% additional effort. However, the programmers involved might have higher than average capabilities, which would justify a reduction in the effort
projected.
In addition to the need for calibration to deal with local
circumstances, a problem with the COCOMO-type approach is that the number of lines of code that an application will require will be difficult to assess at the beginning
of the software development life cycle and will only be
known with certainty when the software has actually been
coded. Lines of code are also difficult for the user community to grasp and thus validate.
C. Function Points
An alternative size measure, function points, was first suggested by Alan Albrecht. This measurement is based on
counts of the features of a computer-based information
system that are externally apparent. These include counts
of the files that are maintained and accessed by the application (logical internal files), and files that are maintained
by other applications but are accessed by the current application (external interface files). Three different types
of function are also counted. These are transactions that
take inputs and use them to update files (external inputs),
transactions that report the contents of files (external outputs), and transactions that execute inquiries on the data
that are held on files. The counts of the different types of
feature are each weighted in accordance with the perceptions of the designers of function points of their relative
importance. The particular weighting that applies is also
governed by the perception of whether the instance of the
feature is simple, average, or complex. These basic ideas
and rules have been subsequently taken up and expanded
by an International Function Point User Group (IFPUG).
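A small sketch of an unadjusted function point count is given below (the weights shown are the commonly published IFPUG values for "average" complexity and are quoted here as an assumption, not taken from this article; the counts are invented):

    # Average-complexity IFPUG weights (assumed standard values)
    FP_WEIGHTS = {
        "external_inputs": 4,
        "external_outputs": 5,
        "external_inquiries": 4,
        "logical_internal_files": 10,
        "external_interface_files": 7,
    }

    def unadjusted_function_points(counts):
        """counts: dict mapping feature type -> number of occurrences."""
        return sum(FP_WEIGHTS[k] * n for k, n in counts.items())

    ufp = unadjusted_function_points({
        "external_inputs": 12,
        "external_outputs": 8,
        "external_inquiries": 5,
        "logical_internal_files": 6,
        "external_interface_files": 2,
    })   # 12*4 + 8*5 + 5*4 + 6*10 + 2*7 = 182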
An advantage of function points is that they can be
counted at an earlier stage of a project than lines of code,
that is, once the system requirements are determined. During the course of software development there is a tendency,
known as scope creep, for the requirements of the proposed system to increase in size. This is partly because
the users are likely to identify new needs during the gathering of the details of the required features of the new
system. These will clearly require additional effort for
their implementation and are a frequent cause of cost and
time overruns. Counting and recounting function points
during the course of a project can help to keep this phenomenon under control. When the software application is
being constructed for a client under a contract and extra
features are required, then the original contract will need
to be modified to take account of the extra work and cost
to be incurred by the contractor. One option for dealing
with this is to agree at the outset on a price per function point.
a management review focuses on the effectiveness of the
processes.
A technical review is directed by a review leader who
is familiar with the methods and technologies used on the
product to be examined. The review leader must select reviewers and arrange a review meeting when the product
is in a fit state for examination. In addition to the product,
the objectives of the review, the specification which the
product is fulfilling, and any relevant standards should be
available. Reviewers should carefully examine these documents before the meeting. At the meeting itself defects
found by the reviewers are recorded on a technical review
issues list. The temptation to suggest ways of resolving the
issues at this stage should be resisted. If a large number
of defects is found, a second meeting may be needed to
review the reworked product, but otherwise there should
be a management arrangement to ensure that the technical
issues noted by the review are dealt with.
Inspections are similar to reviews in principle, but the
focus is on the scrutiny of a specific document. There must
be one or more other documents against which it can be
checked. A design document would, for example, need a
specification with which it should be compatible. Inspections are associated with M. E. Fagan, who developed the
approach at IBM. He drew particular attention to the conditions needed to make inspections effective. For instance,
the defect detection rate was found to fall off if more than
about 120 lines of a document were reviewed in 1 hr or the
review took longer than 2 hr. Inspections can thus be very
time-consuming, but Fagan was able to produce evidence
that, despite this, the use of inspections could be massively
cost-effective. Both reviews and inspections can be made
more effective by the use of checklists of the most probable errors that are likely to occur in each type of software
product.
Reviews, inspections, and other quality-enhancing
techniques are well established but are potentially expensive. Project managers need to weigh the cost of conformance, that is, the cost of the measures taken to remove
defects during development, against the cost of nonconformance, that is, the cost of remedying defects found
during the final testing phase. There would also be costs
associated with the potential damage an application could
occasion if it were defective when in operation.
D. Quality Plans
Standard entry, exit, and implementation requirements
could be documented in an organization's quality manual. Project planners would then select those standards
that are appropriate for the current project and document
their decisions in a software quality assurance plan for
their project.
BIBLIOGRAPHY
Brooks, F. (1987). No silver bullet: Essence and accidents of software engineering, IEEE Computer 20(4), 10–19.
Brooks, F. P. (1995). The Mythical Man-Month: Essays on Software Engineering (Anniversary Edition), Addison-Wesley, Reading, MA.
Boehm, B. (1981). Software Engineering Economics, Prentice-Hall, Englewood Cliffs, NJ.
Hughes, B., and Cotterell, M. (1999). Software Project Management, 2nd ed., McGraw-Hill, Maidenhead, U.K.
Humphrey, W. S. (1990). Managing the Software Process, Addison-Wesley, Reading, MA.
IEEE. (1997). Managing risk, IEEE Software 14(3), 17–89.
Jones, C. (1998). Estimating Software Costs, McGraw-Hill, New York.
Kemerer, C. F. (ed.). (1997). Software Project Management: Readings and Cases, Irwin, Chicago.
Project Management Institute (1996). A Guide to the Project Management Body of Knowledge, Project Management Institute, Upper Darby, PA.
Prolog
GLOSSARY
Backtracking If the basic control structure of Prolog
(i.e., calling procedures and using clauses in a topdown, left-to-right fashion) leads to a goal that cannot be satisfied, Prolog goes back to a previous choice
point with an unexplored alternative and moves forward again.
Clause Alternative formulation of statements expressed in first-order predicate calculus, having the form q1, q2, . . . , qm ← p1, p2, . . . , pn.
Constraint Logical relation among several variables,
each taking a value in a given domain.
Definite clause grammar Formalism for describing languages, both natural and artificial. Definite clause
grammars are translated into Prolog yielding a
recursive-descent, top-down parser.
Horn clause Clause having at most one left-hand side literal.
PROLOG is a computer programming language based on
the ideas of logic programming. Logic programming, like
functional programming, is radically different from conventional, imperative (or procedural) programming languages. Rather than mapping the von Neumann machine
model into a programming language and prescribing how
the computer has to solve the problem, logic programming is derived from an abstract model for describing
the logical structure of a problem with no relationship to
a machine model. Prolog describes objects and the relationships between them in a form close to mathematical
logic. In this context, computation means the deduction of
consequences from a program. Prolog manipulates pure
symbols with no intrinsic meaning. Constraint logic programming extends the purely abstract logical framework
with objects that have meaning in an application domain:
numbers, along with their associated algebraic operations
and relations.
A. Facts
In a statement like "Mozart composed Don Giovanni" a relation (composed) links two objects (Mozart and Don Giovanni). This is expressed in Prolog in the form of an assertion:
    composed(mozart, don_giovanni).
A relationship such as composed is also called a predicate. Several facts together form a database.

Example. A database for operas
    composed(beethoven, fidelio).
    composed(mozart, don_giovanni).
    composed(verdi, rigoletto).
    composed(verdi, macbeth).
    composed(verdi, falstaff).
    composed(rossini, guillaume_tell).
    composed(rossini, il_barbiere_di_siviglia).
    composed(paisiello, il_barbiere_di_siviglia).

B. Questions
It is possible to ask questions in Prolog. The symbol used to indicate a question is ?-. There are two different types of questions: is-questions and which-questions. A typical is-question is "Did Mozart compose Falstaff?" In Prolog one would write
    ?- composed(mozart, falstaff).
    no.
1. Prolog Environment
Prolog waits after each solution for a user input.
A semicolon means: Present one more solution. Hitting the return-key terminates the query. The no at the
end indicates that there are no more solutions.
Actually, the top-level behavior of a Prolog system is
not defined in the ISO standard. For the rest of this article
we will use the following convention for better readability:
if there are one or more solutions to a query, all solutions
are listed, separated by semicolons; the last solution is
terminated by a full stop. The values of the variables for
one solution are separated by a comma. If there is no
solution, this is indicated by the word no.
The text of a Prolog program (e.g., the opera database)
is normally created in a file or a number of files using one
of the standard text editors. The Prolog interpreter can
then be instructed to read in programs from these files;
this is called consulting the file. Alternatively, the Prolog
compiler can be used for compiling the file.
2. Closed-World Assumption
The answer to a query with respect to a program is a
logical consequence of the program. Such consequences
P1: GQT/GLT
P2: GPA/GAE/GRD
EN013E-853
11:46
158
1. Terms
The data objects of Prolog are called terms. A term is a constant, a variable, or a compound term. A constant denotes an individual entity such as an integer or an atom, while a variable stands for a definite but unidentified object. A compound term describes a structured data object.

2. Constants
Atoms. An atom is a named symbolic entity. Any symbolic name can be used to represent an atom. If there is a possibility of confusion with other symbols, the symbol has to be enclosed in single quotes. The following are examples:
    Quoted: 'Socrates' 'end_of_file' '('
    Unquoted: composed mozart --> ?- :- **

Numbers. Integers and floating-point numbers are written in the usual way, for example:
    -2.718    5.5E8    -0.34e+8    45.0e-8

Strings. A string is formed by a sequence of characters enclosed between quotation characters, as in "this is a string".

3. Variables
A variable name may contain letters, numbers, and the underscore character. It must start with an uppercase letter or the underscore character. A variable should be thought of as standing for some definite but unidentified object, which is analogous to the use of a pronoun in natural language.

4. Compound Terms

D. Rules

E. Conjunctions
Given the following database (comments in Prolog come in two forms: % up to end of line or /* */)
It can easily be verified that only the object wine satisfies the conditions. Therefore, Prolog's answer is
    X = wine.
E. Clauses
Facts, rules, and goals are all special cases of clauses:
    p :- q1, . . . , qn.    (n > 0)    a rule
    p.                                 a fact
    ?- q1, . . . , qn.      (n > 0)    a goal
A clause can be interpreted in more than one way.
Procedural interpretation. There are many possible interpretations; for example, given Grand_child, seeking Grand_dad: to find the Grand_dad of a given Grand_child, find (compute) a parent of Grand_child and then find (compute) his/her father.
G. Invertibility
Prolog is different from most other languages with respect to the input–output behavior of arguments. A parameter in a procedural programming language is of type in, out, or in-out. This means that either a value is passed to a subroutine, a result is returned from the subroutine, or a value is passed first and a result is returned afterward. In
Prolog, the same argument of a predicate can be used for
both input and output, depending on the intention of the
user.
In a question like ?-grandfather(paul, peter) both arguments are used as input parameters. In a question like
?-grandfather(paul, X) the second argument may function as an output parameter, producing all grandchildren
of Paul, if any. This aspect of Prolog, that an argument
can be used sometimes as an input parameter but at other
times as an output parameter, is called invertibility.
H. Resolution
Resolution is a computationally advantageous inference
rule for proving theorems using the clausal form of logic.
If two clauses have the same positive and negative literal
(after applying appropriate substitutions as needed), as in
(p ∨ ¬q1 ∨ ¬q2 ∨ · · · ∨ ¬qm) and (¬p ∨ r1 ∨ r2 ∨ · · · ∨ rn), then
    (¬q1 ∨ ¬q2 ∨ · · · ∨ ¬qm ∨ r1 ∨ r2 ∨ · · · ∨ rn)
logically follows. This new clause is called the resolvent
of the two parent clauses.
Resolution refutation is a proof by contradiction. To
prove a theorem from a given set of consistent axioms,
the negation of the theorem and the axioms are put in
clausal form. Then resolution is used to find a contradiction. If a contradiction can be deduced (i.e., the negated
theorem contradicts the initial set of axioms), the theorem
logically follows from the axioms. An advantage of resolution refutation is that only one inference rule is used. A
problem is combinatorial explosion: Many different candidates can be selected for resolution at each stage of the
proof, and worse, each match may involve different substitutions. Using only the most general unifier eliminates
one disadvantage mentioned above. Restricting clauses
to Horn clauses that have only one positive literal drastically reduces the number of possible reduction candidates.
The specific form of resolution used in Prolog systems
is called SLD-resolution (Linear resolution with Selection function for Definite clauses).
V. PROGRAMMING IN PROLOG
A. Lists
A list is an ordered sequence of elements that can have any
length. Lists are written in Prolog using square brackets
to delimit elements separated by commas, as in
    colors([red, blue, green]).
Other examples of lists are
    [2, 3, 5, 7]
    []
    [[the, boy], [kicked, [the, ball]]]
The first list consists of the first four prime numbers, the next list is the empty list. The last one represents the grammatical structure of the sentence "The boy kicked the ball": the first sublist is the noun phrase (determiner and noun) and the second is the verb phrase (verb followed by another noun phrase).
List                   Head    Tail
[a,b,c,d]              a       [b,c,d]
[a]                    a       []
[a,[b,c]]              a       [[b,c]]
[[],[a],[b],[a,b]]     []      [[a],[b],[a,b]]
?- member(X, [a,b,c,d]).
X = a;
X = b;
X = c;
X = d.
The following table shows how lists are unified:

List 1            List 2           Instantiated variables
[a, b, c, d]      [X, Y, Z, U]     X = a, Y = b, Z = c, U = d
[a, b, c, d]      [X|Y]            X = a, Y = [b, c, d]
[a, b, c, d]      [X, Y|Z]         X = a, Y = b, Z = [c, d]
[a, b]            [X, Y|Z]         X = a, Y = b, Z = []
[a, b, c, d]      [X, Y|Z, W]      Incorrect syntax
[prolog, lisp]    [lisp, X]        Fails

C. Example: Permutations
The following Prolog program can be used to compute all permutations of a given list:
    permutation([], []).
    permutation(X, [H|T]) :-
        append(Y, [H|Z], X),
        append(Y, Z, P),
        permutation(P, T).
and
    append([], X, X).
    append([A|B], C, [A|D]) :-
        append(B, C, D).
B. Recursion
Data structures and procedures that are defined in terms
of themselves are called recursive. In many cases the use
of recursion permits the specification of a solution to a
problem in a natural form. A simple example is the membership relation:
    member(Element, [Element|Tail]).
    member(Element, [Head|Tail]) :-
        member(Element, Tail).
This can be read as The element given as the first argument is a member of the list given as the second argument
if either the list starts with the element (the fact in the first
line) or the element is a member of the tail (the rule in the
second line).
The following are possible questions (with answers):
?- member(d, [a,b,c,d]).
yes
?- member(e, [a,b,c,d]).
no
?- permutation([a,b,c], [a,b,c,d]).
no
Explanation. The predicate append defines a relation between three lists: the list given as the third argument
is the concatenation of the lists given as the first and second arguments. For example, the following holds:
?- append([a,b,c], [i,j], [a,b,c,i,j]).
yes
    ?- append(X, Y, [1,2,3]).
    X = [], Y = [1,2,3];
    X = [1], Y = [2,3];
    X = [1,2], Y = [3];
    X = [1,2,3], Y = [].

E. Operators
Operators are a syntactical convenience to make programs more readable. Instead of saying
    composed(mozart, don_giovanni).
one could declare composed as an infix operator and write the fact in operator notation. Operators are declared with op(Precedence, Type, Name); for example, some of the standard declarations are
    :- op(1200, fx, [:-, ?-]).
    :- op( 500, yfx, +).
    :- op( 400, yfx, *).
    :- op( 200, xfy, ^).
In an operator type such as yfx,
    x: only operators of strictly lower precedence are allowed at this side of the term;
    y: operators of lower or the same precedence are allowed at this side of the term.
Thus
    yfx: left-associative      a + b + c is read as (a + b) + c, that is, +(+(a,b),c)
    xfy: right-associative     a ^ b ^ c is read as a ^ (b ^ c), that is, ^(a, ^(b,c))
    xfx: nonassociative        a :- b :- c is an invalid term.
A typical use of the cut (discussed below) is a word-for-word translation dictionary in which only the first matching clause should be used:
    german(you, du)   :- !.
    german(are, bist) :- !.
    german(a, ein)    :- !.
    german(X, X).              /* catch-all */
F. Control Constructs
1. Cut
The cut allows one to control the procedural behavior of
Prolog programs. The cut succeeds only once. In case of
backtracking, it not only fails, but causes the parent goal
to fail as well, indicating that choices between the parent
goal and the cut need not be considered again. By pruning
computation paths in this form, programs operate faster
or require less memory space.
    append([], X, X) :- !.
    append([X|Xl], Y, [X|Zl]) :-
        append(Xl, Y, Zl).
Second interpretation of the cut. The cut–fail combination: "If you get this far, you should stop trying to satisfy this goal; you are wrong!" This helps where a sentence has no Horn-clause representation, such as "Siblings are not married." The expression siblings(X, Y) → ¬married(X, Y) would lead to a rule with a negated left-hand side, which is not allowed in Horn-clause logic. The solution is to write a Prolog program such as
    married(X, Y) :- siblings(X, Y), !, fail.
    married(X, Y).
The goal repeat always succeeds and can always be resatisfied. This goal can be used to build looplike control
structures if used in conjunction with a goal that fails. Such
a loop is called a failure-driven loop:
repeat,
goal1,
goal2,
    . . .,
    end_test.
3. Once
To find only one (the first) solution, once(X) can be used.
If no solutions can be found, it fails. However, on backtracking, it explores no further solutions.
B. All Solutions
Normally, a goal like
    ?- composed(verdi, X).
produces its solutions one at a time, on backtracking. The question
    ?- bagof(X, composed(verdi, X), Operas).
collects them in a list:
    Operas = [rigoletto, macbeth, falstaff].
Writing the goal as Work^composed(Work, X) treats the variable Work not as free, but as existentially quantified. This results in only one solution:
    Operas = [fidelio, don_giovanni, rigoletto, macbeth, falstaff, guillaume_tell,
              il_barbiere_di_siviglia, il_barbiere_di_siviglia].
C. Input / Output
A question like
Write. The predicate for output is write. The argument is a Prolog term. The term is written to the current
output stream.
2. Layout
New line. The predicate nl produces a new line on
the current output stream.
Example. We can write a list of terms
E. Arithmetic
Expressions are evaluated using is, which is defined as
an infix operator (xfx). The right-hand side of the is
goal is evaluated. If variables are used, they have to be
instantiated to numbers, otherwise the Prolog system produces an error. The result of the evaluation is then unified
with the term on the left-hand side. Prolog supports the
usual mathematical functions like abs(), sign(), round(),
truncate(), sin(), cos(), atan(), exp(), log(), sqrt(), and
power().
Example.
?- X is (2 + 3) * 5.
X = 25.
Arithmetic comparison operators cause evaluation of expressions as well. Depending on the results, the goal succeeds or fails. Arithmetic operators and comparison operators are listed in Tables IV and V, respectively.
Examples.
    X is 1 + 2 + 3.          X = 6
    X is 4 - 5.              X = -1
    X is sqrt(2.3).          X = 1.51657508881031
    X is 2**3.               X = 8.0
    6 is 1 + 2 + 3.          succeeds
    7 is 1 + 2 + 3.          fails
    X is 4 + Y.              gives an error
    Y is 5, X is 4 + Y.      succeeds
    X is a + 2 + 3.          gives an error
    7 is 4 + Y.              gives an error
    1 + 3 =:= 2 + 2.         succeeds
    1 + 3 =\= 2 + 5.         succeeds
    1 + 3 < 2 + 4.           succeeds
    ?- put_code(65).     prints 'A'
    ?- put_char(a).      prints 'a'
D. File Handling
Prolog supports the transfer of data to and from one or
more external files. The predicates to open and close
streams are, respectively,
open(Filename, Mode, Stream)
close(Stream)
Mode is either read, write, or append. An example of copying a file and mapping all lowercase letters into uppercase
letters is given in Program I.
TABLE IV Arithmetic Operators

Operator    Description
+           Addition
-           Subtraction
*           Multiplication
/           Division (float)
//          Division (integer)
rem         Remainder
mod         Modulo
/\          Bitwise and
\/          Bitwise or
\           Bitwise complement
<<          Shift left
>>          Shift right
TABLE V Arithmetic Comparison Operators

Operator    Description
<           Less
>           Greater
=<          Less-equal (the syntax avoids arrows!)
>=          Greater-equal
=:=         Equal
=\=         Not equal
    asserta( composed(bizet, carmen) ).
    assertz( (sign(X, 1)  :- X > 0) ).
    assertz( (sign(X, 0)  :- X =:= 0) ).
    assertz( (sign(X, -1) :- X < 0) ).
3. Retracting Clauses
The predicate retract removes a clause from the
database. The goal can be resatisfied. The predicate abolish removes all clauses for a given predicate. The argument for abolish is a predicate indicator, i.e., a term of the
form Name/Arity.
Example.
?- retract( (likes(X, Y) :- Z) ).
The first clause for predicate likes with arity 2 and arbitrary body will be removed.
Example.
?- abolish(likes/2).
All facts and rules for predicate likes with arity 2 are
removed.
G. Manipulating, Creating, and Testing Terms
1. Testing Terms
The predicates for testing terms are shown in Table VI.
These predicates are meta-logical since they treat variables, rather than the terms they denote, as objects of the
language.
2. Manipulating Structured Terms
a. Functor. The predicate functor is called with
three arguments, functor(S, F, A), where S is a structure or atom, F is an atom indicating the name of principal
functor of S, and A is the arity of S.
The predicate functor is used in most cases in either
one of the following ways:
1. Get the principal functor and arity for a given structure, that is, S input, F and A output:
TABLE VI Predicates for Testing Terms

var(X)        Succeeds if X is uninstantiated when the goal is executed
atom(X)       Succeeds for atoms ([] is an atom)
integer(X)    Succeeds for integers
float(X)      Succeeds for reals
atomic(X)     Succeeds if X is an atom or a number
compound(X)   Succeeds if X is instantiated to a compound term
nonvar(X)     Succeeds if X is instantiated when the goal is executed
number(X)     Succeeds for numbers
3. Term Unification
The predicates for testing and forcing equality are defined
as infix operators.
= and \=. Two terms are defined as being equal or not
equal depending on the success of the unification process.
The following goals will succeed:
    ?- a = a.
    ?- a = X.
    ?- X = Y.
    ?- a \= b.
4. Term Comparison
a. == and \==. A different concept is identity. The
main difference concerns variables. Two terms are identical if they are:
1. Variables which are linked together by a previous
unification process, or
2. The same integer, or
3. The same float, or
4. The same atom, or
5. Structures with the same functor and identical
components.
This predicate is necessary in Prolog since goals and structures cannot be manipulated directly. This is possible,
however, for the corresponding list. A typical action in
a grammar rule translator (see Sections VII and VIII)
is to expand a term with two additional arguments. In
the following example, expr(Value) is expanded into
expr(Value,S0,S):
    ?- expr(Value) =.. L,
       append(L, [S0,S], L1),
       G =.. L1.
    G = expr(Value, S0, S),
    L = [expr, Value],
    L1 = [expr, Value, S0, S].

The following goals will succeed:
    ?- X == X.
    ?- X \== Y.
    ?- a == a.
    ?- a(X,Y) == a(X,Y).
    ?- a(3) \== a(X).
c. =:= and =\=. A third form of equality was covered in the section on arithmetics. Two expressions are
considered numerically equal if they evaluate to the same
value, otherwise they are numerically not equal. The following goals will succeed:
?- 1+3 =:= 2+2.
?- 1+3 =\= 1+4.
A goal like
?- phrase(expr(X),[3,*,4,+,5]).
yields the value of the arithmetic expression:
    X = 17.

The goal
    ?- noun_phrase([the, boy, kicked, the, ball], S).
succeeds; in the translated clause noun_phrase(S0, S) :- determiner(S0, S1), noun(S1, S), the variables are bound to
    S0 = [the, boy, kicked, the, ball],
    S1 = [boy, kicked, the, ball],
    S = [kicked, the, ball].
This demonstrates the correctness of the following interpretation: the difference between S0 and S is the list [the, boy], which is the result of parsing the input sentence S0 as a noun phrase. To parse a sentence completely by a rule, the second parameter has to be set to the empty list:
    ?- sentence([the, boy, kicked, the, ball], []).
    yes
E. A Grammar Example
The arguments of the nonterminals can be used to produce
data structures during the parsing process. This means that
such a grammar can be used not only to check the validity
of a sentence, but also to produce the corresponding parse
tree. Another use of arguments is to deal with context
dependence. To handle, e.g., number agreement, certain
nonterminals will have an extra argument, which can take
the values singular or plural.
The following grammar demonstrates some aspects of
writing a DCG in Prolog. It produces the complete parse
tree of the sentence. It handles some form of number agreement. (A sentence like The boy kick the ball would be
rejected.). Finally, it separates grammar rules from the dictionary. In this form it is easier to maintain the grammar:
    /* A simple grammar */
    sentence(s(NP, VP))
        --> noun_phrase(NP, Number),
            verb_phrase(VP, Number).
    noun_phrase(np(Det, Noun), Number)
        --> determiner(Det, Number),
            noun(Noun, Number).
    verb_phrase(vp(V, NP), Number)
        --> verb(V, Number, transitive),
            noun_phrase(NP, _).
    determiner(det(Word), Number)
        --> [Word],
            {is_determiner(Word, Number)}.
    noun(n(Root), Number)
        --> [Word],
            {is_noun(Word, Number, Root)}.
    verb(v(Root, Tense), Number, Transitivity)
        --> [Word],
            {is_verb(Word, Root, Number, Tense, Transitivity)}.
    /* the dictionary */

    /* determiners */
    is_determiner(a, singular).
    is_determiner(every, singular).
    is_determiner(the, singular).
    is_determiner(all, plural).

    /* nouns */
    is_noun(man, singular, man).
    is_noun(men, plural, man).
    is_noun(boy, singular, boy).
    is_noun(boys, plural, boy).

    /* verbs */
    is_verb(Word, Root, Number, Tense, Transitivity) :-
        verb_form(Word, Root, Number, Tense),
        infinitive(Root, Transitivity).
    infinitive(kick, transitive).
    infinitive(live, intransitive).
    infinitive(like, transitive).

together with verb_form/4 facts for the individual verb forms (kicks, kicked, likes, and so on). The grammar then accepts sentences such as "the boy kicks a man" or "the man likes every woman."
Program II
    expand_term((Head --> Body), (Head1 :- Body1)) :-
        expand_goal(Head, S0, S, Head1),
        expand_body(Body, S0, S, Body1).

    expand_goal(Goal, S0, S, NewGoal) :-
        . . .,
        !,
        NewGoal =.. NewGoalList.

    expand_body((P1, P2), S0, S, G) :-
        !,
        expand_body(P1, S0, S1, G1),
        expand_body(P2, S1, S, G2),
        expand_and(G1, G2, G).

    expand_and(true, A, A) :- !.
    expand_and(A, true, A) :- !.
    expand_and(A, B, (A, B)).
A. Introduction
Consider the following queries:
    ?- Z is X + Y, X=3, Y=4.                 (1)
    ?- X=3, Y=4, Z is X + Y.                 (2a)
    X = 3, Y = 4, Z = 7.
    ?- Z=7, Y=4, Z is X + Y.                 (2b)
    ?- X >= 3, X < 4.                        (3)
together with the further queries (4) and (5) discussed below.
Query (1) produces an instantiation error since at the
time the first goal is executed, X and Y are not bound.
Rearranging the goals (2a) produces the correct solution:
Z = 7. Rearranging the goals would not help for (2b),
even though simple arithmetic would lead to the desired
solution: X is Z Y. Query (3) produces an instantiation
error, but it is trivial that X should be 3 (assuming
X to be an integer!). Prolog easily finds solutions for
query (4); however, query (5) fails because of the first
subgoal (X and Y are unbound, therefore they can be
unified!) Prolog would be a more powerful language if
all these cases would be handled in the intuitive way: a
goal like Z is X + Y should be delayed until X and
Y are bound; 7 is X + 4 should produce a solution for
X; if the constraints narrow a variable to a single value,
the variable should be bound to this value, and there
should be a predicate different terms delayed until the
arguments are sufficiently bound to decide on final failure
or success of the goal. All this is achieved by extending
Prolog with the concept of constraints.
B. Constraint Logic Programming (CLP)
Prolog manipulates pure symbols with no intrinsic meaning. Numbers, on the other hand, have a rich mathematical
structure: algebraic operations (e.g., addition and multiplication) and order (e.g., =, <, and >). Taking advantage
of this for Prolog means extending the purely abstract logical framework by introducing domains for variables and
constraints for those variables which have to be obeyed.
This is called constraint logic programming (CLP). In
such a CLP system the simple unification algorithm that
lies at the heart of Prolog is augmented by a dedicated
solver, which can decide at any moment whether the
remaining constraints are solvable. For efficiency's sake, solvers for CLP systems need to be monotonic, so that adding a new constraint to an already solved set does not force them all to be re-solved. From the user's perspective, CLP allows one to do mathematics with unbound
variables.
D. Constraint Satisfaction

Constraint satisfaction deals with problems defined over finite domains. A constraint satisfaction problem is defined as follows:

r A set of variables X = x1, . . . , xn.
r For each variable xi, a finite set Di of possible values (its domain).
r A set of constraints restricting the values that the variables can take simultaneously.

Domains can be imposed upon the variables in two ways. The first is to constrain a single variable directly, with a membership goal such as X in 1..9.
The second way to impose domains upon variables is designed for larger problems. It produces domains for a list
of variables:
?- domain(VarList,Lower,Upper).
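For example (to make the call concrete), the goal domain([X, Y, Z], 1, 9) gives each of the variables X, Y, and Z the domain 1..9, i.e., the integers from 1 to 9.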
C. Constraints

The finite domain solver provides, among others, the following arithmetic constraints over domain variables:

Constraint    Description
#<            Less
#>            Greater
#=<           Less-equal
#>=           Greater-equal
#=            Equal
#\=           Not equal

In contrast to goals built with is, these constraints can be posted before or after their variables become bound:
?- Z #= X + Y, X=3, Y=4.
X = 3, Y = 4, Z = 7.
(1)
?- X=3, Y=4, Z #= X + Y.
X = 3, Y = 4, Z = 7.
(2a)
?- Z=7, Y=4, Z #= X + Y.
X = 3, Y = 4, Z = 7.
(2b)
?- X #>= 3, X #< 4.
X = 3.
(3)
If the constraints posted so far do not narrow a variable down to a single value, the answer reports the variable's remaining domain instead, for example:

X in inf..sup, Y in inf..sup.                     (5a)
Constraints can also be combined into propositional formulas:

Expression     Description
C1 #/\ C2      Conjunction
C1 #\ C2       Exclusive or
#\ C           Negation
C1 #\/ C2      Disjunction
C1 #=> C2      Implication
C1 #<= C2      Implication (right to left)
C1 #<=> C2     Equivalence
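As a small illustrative goal (not taken from the original listing), a disjunction of constraints can be posted and labeled like any other constraint; the following enumerates exactly those pairs in which X and Y differ:

?- X in 1..3, Y in 1..3,
   (X #< Y) #\/ (Y #< X),
   labeling([], [X, Y]).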
There are atoms which control the way the next domain variable is selected during labeling:

min: The leftmost variable with the smallest lower bound is selected.
max: The leftmost variable with the greatest upper bound
is selected.
ff: The first-fail principle is used. The leftmost variable
with the smallest domain is selected.
ffc: The variable with the smallest domain is selected,
breaking ties by selecting the variable with the most
constraints suspended on it.
variable(Sel): Provides the programmer with direct
control on how the next domain variable is selected.
Furthermore, there are atoms which control the way the
integer for each domain variable is selected.
step: Chooses the upper or the lower bound of a domain
variable first. This is the default.
enum: Chooses multiple integers for the domain variable.
bisect: Uses domain splitting to make the choices for
each domain variable.
The classic cryptarithmetic puzzle asks for an assignment of distinct digits to the letters S, E, N, D, M, O, R, Y (with no leading zeros) such that

     SEND
  +  MORE
  = MONEY

The query

?- riddle(Solution, Variables).

then searches for such an assignment.
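Such a puzzle can be stated almost directly as a finite domain program. The following is a minimal sketch (assuming a SICStus-style library providing domain/3, all_different/1, and labeling/2 in addition to the constraints above; the exact argument convention of riddle/2 is an assumption):

riddle([SEND, MORE, MONEY], Variables) :-
    Variables = [S, E, N, D, M, O, R, Y],
    domain(Variables, 0, 9),                 % each letter stands for one digit
    S #\= 0,                                 % no leading zeros
    M #\= 0,
    all_different(Variables),                % all letters denote different digits
    SEND #= 1000*S + 100*E + 10*N + D,
    MORE #= 1000*M + 100*O + 10*R + E,
    MONEY #= 10000*M + 1000*O + 100*N + 10*E + Y,
    SEND + MORE #= MONEY,
    labeling([ff], Variables).               % first-fail variable ordering

With these constraints the unique solution, 9567 + 1085 = 10652, is found almost immediately.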
BIBLIOGRAPHY
Bartak, R. (1999). Constraint Programming: In Pursuit of the Holy
Grail. In Proceedings of WDS99, Prague.
Campbell, J. A. (ed.). (1984). Implementations of Prolog, Wiley, New
York.
Carlsson, M., Ottoson, G., and Carlson, B. (1997). An open-ended finite
domain constraint solver. In Proceedings Programming Languages:
Implementations, Logics, and Programs, Springer-Verlag, New York.
Clark, K. L., and Tarnlund, S.-A. (eds.). (1982). Logic Programming,
Academic Press, New York.
Clocksin, W. F., and Mellish, C. S. (1984). Programming in Prolog,
2nd ed. Springer-Verlag, New York.
Hogger, C. J. (1984). Introduction to Logic Programming, Academic
Press, Orlando, FL.
ISO/IEC 13211 (1995). Prolog International Standard, ISO, Geneva.
Kowalski, R. A. (1979). Logic for Problem Solving, Elsevier North-Holland, Amsterdam.
Jaffar, J., and Lassez, J. L. (1987). Constraint Logic Programming. In
Proceedings Conference on Principles of Programming Languages,
ACM, Munich.
Lloyd, J. W. (1984). Foundations of Logic Programming, Springer-Verlag, New York.
Real-Time Systems
A. Burns
University of York
I. Introduction
II. System Model
III. Computational Model
IV. Scheduling Models
V. Approaches to Fault Tolerance
VI. Conclusion
GLOSSARY
Aperiodic process A process that is released at nonregular time intervals.
Deadline The time by which a process must complete its
execution.
Event-triggered The triggering of an activity by the arrival of an event.
Jitter Variability in the completion (or release) time of
repeating processes.
Periodic process A process that is released at regular time
intervals.
Real-time systems A computing system with explicit
timing requirements that must be satisfied for correct
execution.
Response time analysis A form of schedulability analysis that computes the worst-case completion time for a process.
Scheduling The scheme by which resources are allocated to processes.
Sporadic process An aperiodic process where consecutive requests are separated by a minimum interarrival
time.
Temporal firewall A partitioning of a distributed system such that an error in the time domain in one
part of the system does not induce timing error
elsewhere.
Time-triggered The triggering of an activity by the passage of time.
I. INTRODUCTION
At the 1998 Real-Time Systems Symposium, Kopetz
(1998) presented a cogent argument for the use of a time-triggered model of computation for hard real-time systems. In this model, all computation and communication
activities are controlled by the passage of time, i.e., they
are triggered by a clock. His argument in support of a
time-triggered approach is based upon the following:
r A rejection of the client-server model of computation
r The need to emphasize the real-time properties of data
r The need to support the partitioning of systems with
to an analog-to-digital converter (ADC) device at the interface to the computer system. The device converts the voltage level to a digital representation of its value.
By polling the register holding this value, the RTSV can
obtain the effective temperature reading. It is reliable if
the sensor is reliable and is up to date apart from the time
it takes for the reading to become stable in the ADC. Of
course the value in the RTSV will immediately start to
age and will eventually become stale if a new value is not
obtained.
The rate of change of the controlled object will dictate
the rate of polling at the interface. Polling is usually a time-triggered activity and hence time-triggered architectures
are really only focused on systems with a fixed set of continuously changing external objects. A typical controlled
object of this type will give rise to a single sampling rate.
Others will execute in different phases and will ideally
be supported by time-varying polling rates. Interestingly,
even for this essentially time-triggered arrangement the
interface to the ADC is often event-triggered. The software sends a signal to the device to initiate an input action
from a particular input channel. It then waits a predefined
time interval before taking the reading.
For external objects that exhibit discrete changes of state
there are two means of informing the computer system of
the state change. Either the state change event is made
available at the interface, or the interface just has the state
value. In the former case, the computer system can directly
deal with the event; in the latter case polling must again
be used to check the current state of the external object.
A time-triggered system must use the polling approach.
Note that with applications that have discrete-change
external objects, events are a reality of the system. The
computer system may choose to be only time-triggered
by restricting its interface, but at the system level it is necessary to deal with the temporal consequences of events.
The real world cannot be forced to be time-triggered. So
the real issue is when to transform an event into state information. A time-triggered approach always does it at the
interface; an event-triggered model allows an event to penetrate the system and to be transformed into a state only
when it is most appropriate. Hence, the event can have
an impact on the internal scheduling decisions affecting
computation. Indeed the event could have a priority that
will directly influence the time at which the state change
occurs and the consequences of this change.
For the computer system designer, the decision about
the interface (whether to accept events or to be just state
based) is often one that is strongly influenced by the temporal characteristics of the external object. It depends on
the rarity of the event and the deadline on dealing with
the consequences of the event. To poll for a state change
that only occurs infrequently is very inefficient. The air-
bag controller in an automobile provides an extreme example of this. The deadline for deploying an air-bag is
10 msec. It would therefore be necessary to poll for the
state change every 5 msec and complete the necessary
computations within 5 msec. If 1 msec of computation is
needed to check deployment, then 20% of the processing
resource must be dedicated to this activity. Alternatively,
an event-triggered response will have a deadline of 10 msec,
the same computational cost of 1 msec, but negligible processor utilization as the application only requires one such
event ever to be dealt with. Clearly the event-triggered approach is the more efficient.
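As a worked restatement of these figures: polling with period T = 5 msec and computation time C = 1 msec gives a processor utilization of U = C/T = 1/5 = 0.20, i.e., 20%, whereas the event-triggered response costs the same 1 msec only on the single occasion the event actually occurs.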
A. Interface Definition
It is imperative that the interface between the computer-based control system and the environment it is controlling
is well defined and appropriately constrained. This is one
of the attractive features of the time-triggered architecture
and is equally important for the event-triggered (ET) approach. An ET interface consists of state data and event
ports. State data are written by one side of the interface
and read by the other. Concurrent reads and writes are
noninterfering. The temporal validity of the data is known
(or can be ascertained) by the reading activity. This may
require an associated time-stamp but may alternatively be
a static property that can be validated prior to execution.
An event port allows an event to pass through the interface.
It is unidirectional. The occurrence of an event at the system interface will result in a computational process being
released for execution. The effect of an output event into
the computer system's environment cannot be defined; it
is application-specific.
Although event ports are allowed in the interface, it remains state-based as events must carry an implicit state
with them. So, for example, a switch device will not give
rise to a single event Change but will interface with its
state variable via two events: SwitchUp and SwitchDown.
In order to manage the impact of incoming events the
computer system must be able to control its responses,
either by bounding the arrival patterns or by only meeting
temporal constraints that are load sensitive. A bound on
the input traffic may again be a static property requiring
no run-time monitoring. Alternatively, an interface may
have the capability to disable the event port for periods
of time and thereby impose rate control over the event
source (see Section V.A). Flexible temporal constraints
can take the form of load-sensitive deadlines (e.g., the
greater the concentration of events, the longer the deadline requirement on dealing with each event) or event rationing. In the latter case an overloaded system will drop
certain events; it is therefore crucial that such events be
EN014E-854
17:21
48
state-based. A time-triggered system has no such robustness when it comes to unforeseen dynamics.
The above discussion has concentrated on the computer
system and its environment. Equally important are the interfaces between distinct subsystems of the computer system itself. On a distributed platform it is useful to construct
temporal firewalls (Kopetz and Nossal, 1997) around
each node or location with local autonomy. An ET interface with its state and event port definition is therefore
required at the boundary of any two subsystems.
The existence of event ports as well as state data would
appear to make the ET interface more complicated than its
TT equivalent. However, Kopetz (1998) recognized that to
meet some timing requirements it is necessary to coordinate the execution of linked subsystems. He terms these
interfaces phase-sensitive. To acquire this sensitivity, the
time-triggered schedules in each subsystem must be coordinated, which therefore necessitates a common time
base. That is, some notion of global time must exist in
the system. With an ET interface, this coordination can
be achieved through the events themselves and hence no
global time base is needed for the architecture itself.
Having said that, many applications will require a global
time service (although not necessarily of the granularity
required by the TT mechanisms), for example, to achieve
an ordering of external events in a distributed environment, or to furnish some forms of fault tolerance (see Section V). Nevertheless, the ET
architecture itself is not fundamentally linked to the existence of global time. This is important in facilitating the
development of open real-time systems (Stankovic et al.,
1996).
B. Summary
Our system model recognizes the need to support interfaces to external controlled objects (COs) that either
change continuously or in discrete steps. Within the computer system real-time entities (RTSV) will track the
behavior of these external objects. A system designer
has the choice of making the computer system entirely
time-triggered, in which case all COs are polled, or to
be event-triggered, in which case continuous COs are
polled but discrete COs give rise to events that are directly
represented and supported. The decision of which methods
to use must take into account the temporal characteristics
of the external COs and the structure and reliability of
the electronics needed to link each external object to the
computer system's interface.
We have defined an ET interface that allows state data
and event ports to be used. All communication through an
interface is unidirectional. Such interfaces link the computer system to its environment and are used internally
and Wellings, 1994). It is also possible to define its semantics formally. Events are assumed to be instantaneous
(i.e., have no duration), whereas processing activities must
always have duration. Time could be modeled as a continuous entity or as a discrete notion (this depends on the
form of verification that will be applied). The notation
RTL (Jahanian and Mok, 1986) exactly fits this requirement, as do the modeling techniques represented by timed
automata (Alur and Dill, 1994).
Although proof-theoretic forms of verification are powerful, they require theorem provers and a high level of
skill on behalf of the user. They have also yet to be used
on large temporal systems. Model-theoretic schemes are,
however, proving to be effective. For example, the combination of model checking and timed automata in tools
such as Uppaal (Larsen et al., 1995, 1997) does allow verification to be applied to nontrivial applications. Although
state explosion remains a problem with large systems, the
ability to break a system down into effectively isolated
zones (using temporal firewalls) does allow a composable
approach.
The verification activity links the attributes of the application code, for example, periods and deadlines, with
the requirements of the interface. It does not prove that
an implementation is correct; it does, however, allow the
correctness of the model to be explored. If the subsequent
implementation is able to support the process periods and
deadlines, then the system will work correctly in the temporal domain.
This form of verification is required for hard systems.
For soft or firm components the model can be exercised to determine its average performance or other behavior metrics. The ability to combine worst-case and
average-case behavior is a key aspect of the computational
model.
C. Summary
The event-triggered model of computation introduced in
this section consists of the following:
r Processes, which are released for execution by
r
r
r
r
some of the schemes that can be built upon the event-triggered model of computation. Three issues are considered: how a system (or subsystem) can protect itself
against an overload of event occurrences, how to build a
fault-tolerant system when there exists a global time base,
and how to provide fault-tolerant services when there is
no such time base.
A. Protection against Event Overloads
The notion of a temporal firewall is built on the assumption
that the external environment of a system or subsystem
cannot induce a failure in the temporal domain. With a
state-only interface this is straightforward, although coordinated execution between subsystems requires a global
time base. Event ports, however, introduce the possibility that assumptions about the frequency of events may
be violated at run-time. If static analysis is not adequate,
then the source and/or the destination of events must be
monitored and controlled.
To control exports, a subsystem must monitor its event
production and be able to recognize overproduction, a
phenomenon known as babbling. The simplest action to
take on recognizing this error is to close the subsystem down. Action at a system level via, for example,
replication will then be needed if availability is to be
preserved.
To control imports requires the ability to close an event
port (e.g., disable interrupts) or to internally drop events
so that an upper bound on the event traffic is guaranteed.
Combining export and import controls facilitates the
production of fault containment regions, at least for temporal faults.
B. Fault Tolerance with Global Time
If a global time base is available (either directly or via
a protocol that bounds relative drift between the set of
system clocks), then it is relatively straightforward to
build fault-tolerant services on either the time-triggered
or event-triggered models.
The first requirement is that the broadcast facility provided by the communication system is actually atomic.
Either all receivers get a message (embodying state data or
an event) or none do; moreover, they get the message only
once. The ability to support atomic broadcast with global
time (to the required level of integrity) is well illustrated
by the time-triggered approach. However, it is also possible to provide an atomic broadcast service on top of an
event-triggered communication media that does not have
a global time base. For example, it has been shown that
the CAN protocol can support atomic broadcast (Ruffino
et al., 1998; Proenza and Miro-Julia, 1999).
The second requirement, if active replication is required, is for replicated subsystems to exhibit replica determinism. If active replication is not required, i.e., faults
can be assumed to give rise to only fail-stop behavior and
there is time to execute recovery, then replica determinism
is not necessary and cold or warm standbys can be used
to increase availability.
Replica determinism requires all nonfaulty replicas to
exhibit identical external behavior (so that voting can
be used to identify faulty replicas). With the producer/
consumer model, divergent behavior can only occur if
replicated consumers do not read the same data from single or replicated producers.2
The use of lock-stepped processors will furnish replica
determinism but it is inflexible. If replicated processes
are allowed to be more free-running, then different levels of code replication can be achieved and indeed replicated and nonreplicated software can coexist. If replicated
consumers can execute at different times (though within
bounds), it is possible for an early execution to miss data
that a later execution will obtain. To prevent this, a simple protocol can be used based on the following (Poledna
et al., 2000):
r Data are state-based, i.e., a read operation does not
Note the protocol recognized that some messages delivered in less than may be designated slow. This is
a performance issue; it may affect the quality of the communication service but not its fundamental behavior.
3 Another approach is to define quasi-synchronous systems (Verissimo
and Almeida, 1995; Almeida and Verissimo, 1998).
VI. CONCLUSION
This article has presented the event-triggered model of
computation. It is a generalization of the time-triggered
approach in that it allows actions to be invoked by the passage of time and by events originating in the system's environment or in other actions. By bounding the occurrences
of events, predictable hard real-time systems can be produced. By not requiring an event to be statically mapped
on to a predetermined time line, flexible, adaptive, value-added, responsive systems can be defined. For example,
a polling process that can dynamically change its rate of
execution when the controlled object is close to a critical
value is easily supported by the event-triggered model of
computation. A statically scheduled time-triggered architecture cannot furnish this flexibility.
As the computational model supports real-time applications, it is obvious that a local clock (time source) is
required in each subsystem, but the event-triggered model
does not require global time or a strong synchronous assumption about communication. Figure 1 depicts the architecture implied by the event-triggered approach. Within
temporal firewalls processes and state variables are protected from external misuse. Events and state data are communicated between subsystems; the messages embodying
these entities are time stamped (with local clock values).
The computational model can be supported by a number
of flexible scheduling schemes. We have shown how the
fixed-priority approach is well suited to event-triggered
computation.
ACKNOWLEDGMENTS
The author would like to express his thanks for useful comments made
on an earlier version of this article by Neil Audsley, Paulo Verissimo,
and Andy Wellings.
BIBLIOGRAPHY
Almeida, C., and Verissimo, P. (1998). Using light-weight groups to
handle timing failures in quasi-synchronous systems. In Proceedings
of the 19th IEEE Real-Time Systems Symposium, Madrid, Spain, pp.
430439, IEEE Computer Society Press.
Alur, R., and Dill, D. L. (1994). A theory of timed automata, Theor.
Computer Sci. 126(2), 183236.
Audsley, N. C., Burns, A., Richardson, M., Tindell, K., and Wellings,
A. J. (1993a). Applying new scheduling theory to static priority preemptive scheduling, Software Eng. J. 8(5), 284292.
Audsley, N. C., Burns, A., and Wellings, A. J. (1993b). Deadline monotonic scheduling theory and application, Control Eng. Pract. 1(1),
7178.
Audsley, N. C., Tindell, K., and Burns, A. (1993c). The end of the line
for static cyclic scheduling? In Proceedings of the Fifth Euromicro Workshop on Real-Time Systems, pp. 3641, IEEE Computer
Society Press.
Audsley, N. C., Burns, A., Richardson, M. F., and Wellings, A. J. (1995).
Data consistency in hard real-time systems, Informatica 9(2), 223
234.
Bate, I. J., Burns, A., and Audsley, N. C. (1996). Putting fixed
priority scheduling theory into engineering practice for safety critical
applications. In Proceedings of the 2nd Real-Time Applications
Symposium.
Burns, A. (1994). Preemptive priority based scheduling: An appropriate
engineering approach. In Advances in Real-Time Systems (Son,
S. H., ed.), pp. 225248, Prentice-Hall, Englewood Cliffs, NJ.
Burns, A., and Davis, R. I. (1996). Choosing task periods to minimise
system utilisation in time triggered systems, Information Processing
Lett. 58, 223229.
Burns, A., and McDermid, J. A. (1994). Real-time safety critical
systems: Analysis and synthesis, Software Eng. J. 9(6), 267281.
Burns, A., and Wellings, A. J. (1994). HRT-HOOD: A design method
for hard real-time ada, Real-Time Syst. 6(1), 73114.
Burns, A., Tindell, K., and Wellings, A. J. (1994). Fixed priority
scheduling with deadlines prior to completion, In 6th Euromicro
Workshop on Real-Time Systems, Vaesteraas, Sweden, pp. 138142.
Cristian, F., and Fetzer, C. (1998). The timed asynchronous system
model. In Digest of Papers, The 28th Annual International Symposium on Fault-Tolerant Computing, pp. 140149, IEEE Computer
Society Press.
Fetzer, C., and Cristian, F. (1996a). Fail-aware detectors, Technical
Report CS96-475, University of California, San Diego, CA.
Fetzer, C., and Cristian, F. (1996b). Fail-awareness in timed asynchronous systems. In Proceedings of the 15th ACM Symposium on Principles of Distributed Computing, pp. 314–321.
Jahanian, F., and Mok, A. K. (1986). Safety analysis of timing properties
in real-time systems, IEEE Trans. Software Eng. 12(9), 890904.
Joseph, M., and Pandya, P. (1986). Finding response times in a
real-time system, BCS Computer J. 29(5), 390395.
Kopetz, H. (1995). Why time-triggered architectures will succeed in
large hard real-time systems, Proc. IEEE Future Trends 1995, 29.
Kopetz, H. (1998). The time-triggered model of computation. In
Proceedings 19th Real-Time Systems Symposium, pp. 168177.
Kopetz, H., and Nossal, R. (1997). Temporal firewalls in large
distributed real-time systems. In Proceedings IEEE Workshop on
Future Trends in Distributed Systems, Tunis.
Kopetz, H., and Verissimo, P. (1993). Real-time and dependability
concepts. In Distributed Systems, 2nd ed. (Mullender, S. J., ed.),
Chapter 16, pp. 411446. Addison-Wesley, Reading, MA.
Larsen, K. G., Pettersson, P., and Yi, W. (1995). Compositional and
symbolic model-checking of real-time systems. In Proceedings of
the 16th IEEE Real-Time Systems Symposium, pp. 7687, IEEE
Computer Society Press.
Larsen, K. G., Pettersson, P., and Yi, W. (1997). Uppaal in a nutshell,
Int. J. Software Tools Technol. Transfer 1(1/2), 134152.
Lehoczky, J. P. (1990). Fixed priority scheduling of periodic task sets
with arbitrary deadlines. In Proceedings 11th Real-Time Systems
Symposium, pp. 201209.
Leung, J. Y.-T., and Whitehead, J. (1982) On the complexity of
fixed-priority scheduling of periodic real-time tasks, Performance
Evaluation (Netherlands) 2(4), 237250.
Liu, C. L., and Layland, J. W. (1973). Scheduling algorithms for multiprogramming in a hard real-time environment, JACM 20(1), 4661.
Liu, J. W. S., Lin, K. J., Shih, W. K., Yu, A. C. S., Chung, J. Y., and
Zhao, W. (1991). Algorithms for scheduling imprecise computations, IEEE Computer 1991, 5868.
Poledna, S., Burns, A., Wellings, A. J., and Barrett, P. A. (2000).
Replica determinism and flexible scheduling in hard real-time
dependable systems, IEEE Trans. Computing 49(2), 100111.
Proenza, J., and Miro-Julia, J. (1999). MajorCAN: A modification to
the controller area network protocol to achieve atomic broadcast. In
Offered to RTSS99, IEEE Computer Society Press.
Ruffino, J., Verissimo, P., Arroz, G., Almeida, C., and Rodrigues, L.
(1998). Fault-tolerant broadcast in CAN. In Proceedings of the 28th
FTCS, Munich, IEEE Computer Society Press.
Simpson, H. R. (1986). The mascot method, Software Eng. J. 1(3),
103120.
Stankovic, J. A., et al. (1996). Real-time and embedded systems,
ACM Surv. Spec. Iss. Strategic Directions Computer Sci. Res. 28(4),
751763.
Tindell, K., Burns, A., and Wellings, A. J. (1994a). An extendible
approach for analysing fixed priority hard real-time tasks, Real-Time
Syst. 6(2), 133151.
Tindell, K. W., Hansson, H., and Wellings, A. J. (1994b). Analysing
real-time communications: Controller area network (CAN). In
Proceedings 15th IEEE Real-Time Systems Symposium, pp.
259265, San Juan, Puerto Rico.
Tindell, K., Burns, A., and Wellings, A. J. (1995). Analysis of hard
real-time communications, Real-Time Syst. 7(9), 147171.
Verissimo, P., and Almeida, C. (1995). Quasi-synchronisation: A step
away from the traditional fault-tolerant real-time system models,
Bull. Tech. Committee Operating Syst. Appl. Environ. (TCOS) 7(4),
3539.
Requirements Engineering
Bashar Nuseibeh
Open University
Steve Easterbrook
University of Toronto
I. Introduction
II. Overview of Requirements Engineering
III. Eliciting Requirements
IV. Modeling and Analyzing Requirements
V. Context and Groundwork
VI. Integrated Requirements Engineering
VII. Summary and Conclusions
GLOSSARY
Analysis The process of examining models or specifications for the purposes of identifying their state of
correctness, completeness, or consistency.
Elicitation Sometimes also referred to as acquisition or
capture, it is the process of gathering requirements
about an envisioned system.
Modeling In the context of requirements engineering, it is the process of constructing abstract descriptions of the desired structure or behavior of a system and/or its problem domain.
Problem domain The environment in which a desired
system will be installed.
Prototyping The process of developing partial models
or implementations of an envisioned system, for the
purposes of eliciting and validating requirements.
Requirements Describe the world of the problem domain
as the stakeholders would like it to be.
Specification A description of the requirements for a
system.
Stakeholders Individuals, groups, or organizations who stand to gain or lose from the success or failure of the envisioned system.
I. INTRODUCTION
Pamela Zave provides a concise definition of Requirements Engineering (RE):
Requirements engineering is the branch of software engineering concerned with the real-world goals for, functions of, and
constraints on software systems. It is also concerned with the
relationship of these factors to precise specifications of software
behavior, and to their evolution over time and across software
families. (Zave, 1997)
Simply put, software requirements engineering is the process of discovering the purpose of a software system,
by identifying stakeholders and their needs, and documenting these in a form that is amenable to analysis,
communication, and subsequent implementation. There
are a number of inherent difficulties in this process. Stakeholders (who include paying customers, users, and developers) may be numerous and geographically distributed.
Their goals may vary and conflict, depending on their perspectives of the environment in which they work and the
tasks they wish to accomplish. Their goals may not be
explicit or may be difficult to articulate, and, inevitably,
satisfaction of these goals may be constrained by a variety
of factors outside their control.
The primary measure of success of a software system,
and by implication its quality, is the degree to which it
meets its requirements. RE therefore plays a pivotal role
in determining the quality of a software system and its
fitness for purpose. It lies at the heart of software development projects, since the outcomes of an RE process provide crucial inputs into project planning, technical design, and evaluation. Of course, RE is not an
activity that takes place only at the start of a software development project. The discovery of requirements
often continues well into the development process, and
can be influenced by the presence of existing technical solutions and constraints. Broadly speaking, however,
RE is a front-end development activity, determining
subsequent development decisions and used as a measure of success for the final software system that is
delivered.
It is worth noting in passing that software requirements
engineering is often regarded as part of systems engineering, in that the software to be developed has to fit into a
wider technical, social, organizational, and business context. Thus, RE is often also called software systems requirements engineering, or simply systems requirements
engineering, to emphasize that its scope is system-wide,
and not merely a software engineering one. The consequences of this are that RE is very much a multidisciplinary activity, whose techniques draw upon many areas
of research and practice. These include traditional areas
such as computer science and mathematical logic, systems
theory, and information systems analysis, and the cognitive and social sciences such as psychology, anthropology,
sociology, and linguistics.
stakeholders have divergent goals. Validation is the process of establishing that the requirements elicited provide
an accurate account of stakeholders' actual needs. Explicitly
describing the requirements is a necessary precondition
not only for validating requirements, but also for resolving conflicts between stakeholders.
Techniques such as inspection and formal analysis tend
to concentrate on the coherence of the requirements descriptions, asking questions such as: are they consistent,
and are they structurally complete? In contrast, techniques
such as prototyping, specification animation, and the use
of scenarios are geared toward testing a correspondence
with the real world problem. For example, they may ask:
have all the aspects of the problem that the stakeholders
regard as important been covered? Requirements validation is difficult for two reasons. The first reason is philosophical in nature, and concerns the question of truth and
what is knowable. The second reason is social, and concerns the difficulty of reaching agreement among different
stakeholders with conflicting goals.
C. Evolving Requirements
Successful software systems always evolve as the environment in which these systems operate changes and stakeholder requirements change. Therefore, managing change
is a fundamental activity in RE.
Changes to requirements documentation need to be
managed. Minimally, this involves providing techniques
and tools for configuration management and version control, and exploiting traceability links to monitor and control the impact of changes in different parts of the documentation. Typical changes to requirements specifications
include adding or deleting requirements and fixing errors.
Requirements are added in response to changing stakeholder needs, or because they were missed in the initial
analysis. Requirements are usually deleted only during
development, to forestall cost and schedule overruns, a
practice sometimes called requirements scrubbing.
Managing changing requirements is not only a process
of managing documentation, it is also a process of recognizing change through continued requirements elicitation, re-evaluation of risk, and evaluation of systems in
their operational environment. In software engineering, it
has been demonstrated that focusing change on program
code leads to a loss of structure and maintainability. Thus,
each proposed change needs to be evaluated in terms of
existing requirements and architecture so that the trade-off
between the cost and benefit of making a change can be
assessed.
Finally, the development of software system product
families has become an increasingly important form of
development activity. For this purpose, there is a need to
is an activity that continues as development proceeds, as
high-level goals (such as business goals) are refined into
lower-level goals (such as technical goals that are eventually operationalized in a system). Eliciting goals focuses
the requirements engineer on the problem domain and the
needs of the stakeholders, rather than on possible solutions
to those problems.
It is often the case that users find it difficult to articulate
their requirements. To this end, a requirements engineer
can resort to eliciting information about the tasks users
currently perform and those that they might want to perform. These tasks can often be represented in use cases that
can be used to describe the outwardly visible interactions
of users and systems. More specifically, the requirements
engineer may choose a particular path through a use case,
a scenario, in order to better understand some aspect of
using a system.
B. Elicitation Techniques
The choice of elicitation technique depends on the time
and resources available to the requirements engineer,
and of course, the kind of information that needs to be
elicited. We distinguish a number of classes of elicitation
technique:
Group elicitation techniques aim to foster stakeholder agreement and buy-in, while exploiting team dynamics to elicit a richer understanding of needs. They
include brainstorming and focus groups, as well as
RAD/JAD workshops (using consensus-building workshops with an unbiased facilitator).
Cognitive techniques include a series of techniques originally developed for knowledge acquisition
for knowledge-based systems. Such techniques include
protocol analysis (in which an expert thinks aloud while
occur early in the lifetime of a project, motivated by the
evidence that requirements errors, such as misunderstood
or omitted requirements, are more expensive to fix later in
project life cycles.
Before a project can be started, some preparation is
needed. In the past, it was often the case that RE methods
assumed that RE was performed for a specific customer,
who could sign off a requirements specification. However,
RE is actually performed in a variety of contexts, including market-driven product development and development
for a specific customer with the eventual intention of developing a broader market. The type of product will also
affect the choice of method: RE for information systems
is very different from RE for embedded control systems,
which is different again from RE for generic services such
as networking and operating systems.
Furthermore, some assessment of a project's feasibility
and associated risks needs to be undertaken, and RE plays
a crucial role in making such an assessment. It is often
possible to estimate project costs, schedules, and technical feasibility from precise specifications of requirements.
It is also important that conflicts between high-level goals
of an envisioned system surface early, in order to establish a system's concept of operation and boundaries. Of
course, risk should be re-evaluated regularly throughout
the development lifetime of a system, since changes in
the environment can change the associated development
risks.
Groundwork also includes the identification of a suitable process for RE, and the selection of methods and techniques for the various RE activities. We use the term process here to denote an instance of a process model, which
is an abstract description of how to conduct a collection of
activities, describing the behavior of one or more agents
and their management of resources. A technique prescribes how to perform one particular activity and, if
necessary, how to describe the product of that activity in a
particular notation. A method provides a prescription for
how to perform a collection of activities, focusing on how
a related set of techniques can be integrated, and providing
guidance on their use.
ACKNOWLEDGMENTS
The content of this article is drawn from a wide variety of sources, and is structured along the lines of the
roadmap by Nuseibeh and Easterbrook (2000). Nuseibeh
would like to acknowledge the financial support of the UK
EPSRC (projects GR/L 55964 and GR/M 38582).
BIBLIOGRAPHY
Chung, L., Nixon, B., Yu, E., and Mylopoulos, J. (2000). Non-Functional
Requirements in Software Engineering. Kluwer Academic, Boston.
Davis, A. (1993). Software Requirements: Objects, Functions and States.
Prentice Hall, New York.
Finkelstein, A. (1993). Requirements Engineering: An Overview, 2nd
Asia-Pacific Software Engineering Conference (APSEC93), Tokyo,
Japan.
Gause, D. C., and Weinberg, G. M. (1989). Exploring Requirements:
Quality before Design, Dorset House.
Goguen, J., and Jirotka, M., eds. (1994). Requirements Engineering:
Social and Technical Issues, Academic Press, London.
Graham, I. S. (1998). Requirements Engineering and Rapid Development: A Rigorous, Object-Oriented Approach, Addison-Wesley,
Reading, MA.
Jackson, M. (1995). Software Requirements and Specifications: A Lexicon of Practice, Principles and Prejudices, Addison-Wesley, Reading,
MA.
Jackson, M. (2001). Problem Frames: Analyzing and Structuring Software Development Problems, Addison-Wesley, Reading, MA.
Kotonya, G., and Sommerville, I. (1998). Requirements Engineering:
Processes and Techniques, Wiley, New York.
Kovitz, B. L. (1999). Practical Software Requirements: A Manual of
Contents & Style. Manning.
Loucopoulos, P., and Mylopoulos, J., eds. (1996). Requirements Engineering Journal, Springer Verlag, Berlin/New York.
Loucopoulos, P., and Karakostas, V. (1995). System Requirements Engineering, McGraw-Hill, New York.
Macaulay, L. M. (1996). Requirements Engineering, Springer Verlag,
Berlin/New York.
Nuseibeh, B., and Easterbrook, S. (2000). Requirements Engineering:
A Roadmap, In ICSE-2000 The Future of Software Engineering
(A. Finkelstein, ed.), ACM Press, New York.
Pohl, K. (1996). Process-Centered Requirements Engineering, Research
Studies Press.
Robertson, S., and Robertson, J. (1999). Mastering the Requirements
Process. Addison-Wesley, Reading, MA.
Sommerville, I., and Sawyer, P. (1997). Requirements Engineering: A
Good Practice Guide, Wiley, New York.
Stevens, R., Brook, P., Jackson, K., and Arnold, S. (1998). Systems
Engineering: Coping with Complexity, Prentice Hall Europe.
Thayer, R., and Dorfman, M., eds. (1997). Software Requirements Engineering (2nd Ed.), IEEE Computer Society Press, Los Alamitos, CA.
van Lamsweerde, A. (2000). Requirements Engineering in the Year 00:
A Research Perspective, Keynote Address, In Proceedings of 22nd
International Conference on Software Engineering (ICSE-2000),
Limerick, Ireland, June 2000, ACM Press, New York.
Wieringa, R. J. (1996). Requirements Engineering: Frameworks for
Understanding, Wiley, New York.
Zave, P. (1997). Classification of Research Efforts in Requirements
Engineering, ACM Computing Surveys 29(4): 315–321.
Software Engineering
Mehdi Jazayeri
Technische Universität Wien
GLOSSARY
Component-based software engineering An approach
to software engineering based on the acquisition and
assembly of components.
Component interface The description of what services
are provided by the component and how to request those
services.
Inspections and reviews Organized meetings to review
software components and products with the aim of uncovering errors early in the software life cycle.
Process maturity The capability of an organization to
follow a software process in a disciplined and controlled way.
Programming language A primary tool of the software
engineer. It is a notation used to write software.
Software architecture The overall structure of a software system in terms of its components and their relationships and interactions.
Software component A well-defined module that may
be combined with other components to form a software
product.
Software process An ideal description of the steps involved in software production.
Software requirements A document that describes the
expected capabilities of a software product.
Validation and verification Procedures that help gain
confidence in the correct functioning of software.
SOFTWARE ENGINEERING is the application of engineering principles to the construction of software. More
precisely, the IEEE Std 610.12-1990 Standard Glossary
of Software Engineering Terminology defines software
engineering as the application of a systematic, disciplined,
quantifiable approach to the development, operation, and
maintenance of software.
Software engineering deals with the building of software systems that are so large or so complex that they are
built by a team or teams of engineers. Usually, these software systems exist in multiple versions and are in service
for many years. During their lifetime, they undergo many
changes: to fix defects, to enhance existing features, to
add new features, to remove old features, or to be adapted
to run in a new environment.
We may appreciate the issues involved in software
engineering by contrasting software engineering with
computer programming. A programmer writes a complete
program, while a software engineer writes a software component that will be combined with components written by
other software engineers to build a system. The component written by one software engineer may be modified
by other software engineers; it may be used by others to
build different versions of the system long after the original engineer has left the project. Programming is primarily
a personal activity, while software engineering is essentially a team activity.
The term software engineering was invented in the
late 1960s after many large software projects failed to
meet their goals. Since then, the field has grown to include
many techniques and methods for systematic construction of software systems. These techniques span the entire
range of activities starting from the initial attempts to understand the customer requirements for a software system
to the design and implementation of that system, validation
of the system against the requirements, and the delivery
of the system and its deployment at the customer site.
The importance of software has grown over the years.
Software is now used to control virtually every sophisticated or even not so sophisticated device. Software is
used to control transportation systems, including subways, trains, and airplanes; power plants; everyday devices such as ovens, refrigerators, and television sets; and medical devices such as pacemakers and diagnostic machines.
The Internet is certainly powered by sophisticated software. The whole society is dependent upon correct functioning of software. Software is also of growing economic
impact: in 1985, around $140 billion was spent annually
on software worldwide. In 2000, the amount is estimated
at $800 billion worldwide. As a result, the software engineering discipline has gained importance. The emphasis of
software engineering is on building dependable software
economically.
This article reviews the two major elements of software engineering: products and processes. Software en-
done by a marketing group perhaps with the help of
focus groups. There is no specific customer that may
be consulted about the requirements. In contrast, the
requirements of custom software are driven by a customer.
Shrink-wrap software must be developed in different versions for different hardware platforms and perhaps in different levels of functionality (e.g., in light and professional
versions), whereas custom software is developed and optimized for a particular platform (with a view toward future
evolutions of the platform). These considerations affect
the initial statements of the requirements and the techniques used to achieve those requirements.
B. System Software
System software is intended to control the hardware resources of a computer system to provide useful functionality for the user. For example, an operating system tries to
optimize the use of resources such as processor, memory,
and input/output devices to enable users to run various
programs efficiently. A database system tries to maximize
the use of memory and disk resources to allow different
users to access data concurrently.
As opposed to application software, the requirements
of system software are tied directly to the capabilities
of the hardware platform. The requirements must deal
with computer concepts such as hardware resources and
interfaces. The software engineering of system software
typically requires specialized knowledge such as transaction processing for databases or process scheduling for
operating systems.
Despite their dependence on the hardware platform,
system software such as operating systems and databases
these days are developed independently of the hardware
platform. In earlier days, they had to be developed hand
in hand with the hardware. But advances in techniques for
developing abstract interfaces for hardware devices have
enabled system software to be developed in a portable
way. By setting appropriate configuration parameters, the
software is specialized for a particular platform.
System software often provides interfaces to be used by
other programs (typically application programs) and also
interfaces to be used by users (interactively). For example,
an operating system may be invoked interactively by a user
or by database systems.
An increasingly important class of system software is
communication software. Communication software provides mechanisms for processes on different computers
to communicate. Initially, communication software was
included in operating systems. Today, it is being increasingly packages as middleware. Middleware provides facilities for processes on different computers to communicate as well as other facilities for finding and sharing
resources among computers. For writing distributed applications, software engineers rely on standard middleware
software.
C. Embedded Software
Embedded software is software that is not directly visible or invokable by a human user but is part of a system.
For example, the software is embedded in television sets,
airplanes, and videogames. Embedded software is used
to control the functions of hardware devices. For example, a train control system reads various signals produced
by sensors along tracks to control the speed of the train.
The characteristic of embedded software is that it is developed hand in hand with the hardware. The designers of
the system face tradeoffs in placing a given functionality
in hardware or software. Generally, software offers more
flexibility. For example, a coin-operated machine could be
designed with different-sized slots for different coins or a
single slot with control software that determines the value
of the coin based on its weight. The software solution is
more flexible in that it can be adapted to new coins or new
currencies.
A particular kind of embedded software is real-time
software. This kind of software has requirements in terms
of meeting time constraints. For example, the telephone
software must play the dial tone within a certain time after
the customer has taken the phone off hook. Often real-time
systems are responsible for critical functions such as patient monitoring. In such cases special design techniques
are needed to ensure correct operation within required time
constraints. Real-time software is among the most challenging software to construct.
The figure shows the waterfall model in its simplest form. For example, it is clear that if any tests uncover
defects in the system, we have to go back at least to the
coding phase and perhaps to the design phase to correct some mistakes. In general, any phase may uncover
problems in previous phases; this will necessitate going
back to the previous phases and redoing some earlier
work. For example, if the system design phase uncovers inconsistencies or ambiguities in the system requirements, the requirements analysis phase must be revisited to determine what requirements were really intended.
Such problems require us to add feedback loops to the
model, represented as arrows that go back from one phase
to an earlier phase and the need to repeat some earlier
work.
A common shortcoming of the waterfall model is that it
requires a phase to be completed before the next one starts.
While feedback loops allow the results of one phase to affect an earlier phase, they still do not allow the overlapping
of phases. In practice, a common technique to shorten development times is to carry out activities in parallel. The
strict sequential nature of the waterfall model is one of
its most severe drawbacks. Other process models try to
alleviate this problem.
B. Incremental Models
The waterfall model defines the requirements at the beginning and delivers the product at the end. During the whole
development time, the customer is not involved and does
not gain any visibility into the state of the product. Some
models try to remedy this problem by introducing different stages in which partial deliveries of the product are
made to the customer.
One such model is the prototyping approach. The first
delivery to the customer is a prototype of the envisaged
system. The purpose of the prototype is to assess the feasibility of the product and to verify that the requirements of
the customer have been understood by the developer and
will be met by the system. The prototype is then thrown
away (in fact, it is sometimes called a throwaway prototype), and development starts on the real product based
on the now firmly established requirements. The prototyping approach addresses the difficulty of understanding
the real requirements but it does not eliminate the time
gap between the definition of requirements and delivery
of the application.
The incremental process model addresses this delivery gap. It produces the product in increments, each of which implements part of the needed functionality. Increments may be
delivered to the customer as they are developed; this is
called evolutionary, or incremental, delivery.
V. PROCESS QUALITY
The process followed in building software affects the qualities of the software product. For example, a disciplined
process with reviews and inspections along the way is
more likely to produce reliable software than a process
that does not rely on any reviews of the intermediate steps.
Several standards have been developed to measure the
quality of software processes and thus gain confidence in
the quality of the produced product. Such process quality
standards help enterprises such as government agencies
evaluate software contractors.
A typical such standard is the Capability Maturity
Model. This model defines software process capability
as the range of results that may be expected by following
a software process. The process capability of an organization may be used to predict the most likely results of
a project undertaken by the organization. The model also
defines software process maturity as the extent to which
the process is defined explicitly, managed, measured, controlled, and effective. The notion of maturity implies that
an organization's software process capability can grow
as it matures. The model defines five levels of software
capability maturity. The least capable organizations are
graded at Level 1, called initial. An organization at Level 1
has no defined process model. It produces software as
best it can without any apparent planning. As a result,
projects sometimes succeed and sometimes fail. Any
given project's outcome is unpredictable. At Level 2, the
organization has stable and established project management controls. As a result, even though the organization
does not have a documented process, it consistently produces the intended product. This capability is referred to
as repeatable. Level 3 organizations have a defined process that consistently leads to successful projects. Projects
in the organization tailor the defined process to the needs
of the project. A specific group is responsible for the organization's software engineering activities. Management
has good visibility into the progress and status of projects.
Level 4 organizations not only have a defined process but
also collect metrics on the performance of their processes
and can therefore predict with good confidence how their
projects will perform in the future. This level is called
managed. At the highest level of process maturity,
organizations not only measure their process performance
but also aim to improve their performance continuously.
This level is called optimizing. The focus in optimizing
organizations is no longer merely on the production of the software product, which is considered a certainty, but on continuous improvement of the organization.
An initial study of software organizations around the
world showed that very few organizations perform at
Level 5. Most were at Level 3 or below. But once such
criteria for process performance have been defined, they
can be used by organizations to strive for improvement.
Each level of maturity is defined with the help of a
detailed list of characteristics. There are organizations
that perform process assessment, that is, measure the
maturity of a software development organization and provide it with guidelines on how to try to improve to the
next level of maturity. An organizations process maturity
level gives its customers a basis for realistic expectations
of its performance. This is especially important if a company wants to outsource the development of key parts of
its software.
Measuring the maturity level of an organization is one
way of assessing the performance of a software development organization. Other standards also exist that measure
the process differently. In all cases, such measurements are
difficult to make and should be evaluated carefully.
for the library in the larger system are also relevant stakeholders. As an example, in the case of the library of a
university department, the department head is a stakeholder, whose goals and requirements should be properly taken into account. The department head's goal might
be to encourage students to borrow both books and journals, to stimulate their interest in reading the current scientific literature. Or, alternatively, he or she might require
that students may only borrow books, limiting the ability
to borrow journals to only staff members. An important
stakeholder is the person or organization that will have the
budget to pay for the development. In this case, it may be
the dean or the president at the university. As this example
shows, the various stakeholders have different viewpoints
on the system. Each viewpoint provides a partial view of
what the system is expected to provide. Sometimes the
different viewpoints may even be contradictory. The goal
of requirements engineering is to identify all conflicts and
integrate and reconcile all the different viewpoints in one
coherent view of the system.
The result of the requirements activities is a requirements specification document, which describes what the
analysis has produced. The purpose of this document
is twofold: on the one hand, it must be analyzed and
confirmed by the various stakeholders in order to verify
whether it captures all of the customer's expectations; on
the other hand, it is used by the software engineers to
develop a solution that meets the requirements.
The way requirements are actually specified is usually
subject to standardized procedures in software organizations. Standards may prescribe the form and structure of
the requirements specification document, the use of specific analysis methods and notations, and the kind of reviews and approvals that the document should undergo.
A possible checklist of the contents of the requirements
specification document that might guide in its production
is the following:
1. Functional requirements. These describe what the
product does by using informal, semiformal, formal
notations, or a suitable mixture. Various kinds of
notations and approaches exist with different
advantages and applicability. The Unified Modeling
Language (UML) is increasingly used as a practical
standard because it contains different notations for
expressing different views of the system. The
engineer can select and combine the notations best
suited to the application or the ones he or she finds
the most convenient or familiar.
2. Quality requirements. These may be classified into
the following categories: reliability (availability,
integrity, security, safety, etc.), accuracy of results,
performance, human-computer interface issues,
independently. Indeed, different Web browsers, produced
by different people and organizations, work with different
Web servers, developed by different people and different organizations. The requirement for this independence
between client and server, and indeed between any two
components is that each component has a well-defined
interface with which it can be used. The interface specifies what services are available from a component and the
ways in which clients may request those services. Thus, the
designer of any client knows how to make use of another
component simply based on that component's interface.
It is the responsibility of the component's implementer
to implement the interface. The interface thus specifies
the responsibilities of the designers and implementers of
components and users of components.
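As a small, purely illustrative sketch of this separation (the CatalogService name and its operations are invented here, not taken from any particular system), the interface below states what a component offers, while clients are written against that interface and any implementer may supply the concrete component:

from abc import ABC, abstractmethod

class CatalogService(ABC):
    """Interface of a hypothetical catalog component: what it offers, not how."""

    @abstractmethod
    def find_by_title(self, title: str) -> list:
        """Return identifiers of items whose title matches the query."""

    @abstractmethod
    def reserve(self, item_id: str, user_id: str) -> bool:
        """Reserve an item for a user; return True on success."""

class InMemoryCatalog(CatalogService):
    """One possible implementation; clients depend only on CatalogService."""

    def __init__(self) -> None:
        self._items = {"b1": "Software Engineering", "b2": "Programming Language Concepts"}
        self._reservations = {}

    def find_by_title(self, title: str) -> list:
        return [i for i, t in self._items.items() if title.lower() in t.lower()]

    def reserve(self, item_id: str, user_id: str) -> bool:
        if item_id in self._items and item_id not in self._reservations:
            self._reservations[item_id] = user_id
            return True
        return False

def client(catalog: CatalogService) -> None:
    # The client is written against the interface only; it works with any implementation.
    for item in catalog.find_by_title("software"):
        catalog.reserve(item, "user-42")

client(InMemoryCatalog())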
Standard architectures capture and codify the accepted
knowledge about software design. They help designers
communicate about different architectures and their tradeoffs. They may be generic, such as the client-server architecture, or application- or domain-specific. Application-specific architectures are designed for particular domains
of application and take advantage of assumptions that may
be made about the application. For example, a domain-specific architecture for banking systems may assume the
existence of databases and even particular data types such
as bank accounts and operations such as depositing and
withdrawing from accounts.
At a lower level of detail, design patterns are a technique
for codifying solutions to recurring design problems. For
example, one may study different techniques for building
high-performance server components that are capable of
servicing large numbers of clients. Such solutions may
be documented using patterns that describe the problem
the pattern solves, the applicability of the patterns, the
tradeoffs involved, and when to use or not to use the pattern. Both standard architectures and design patterns are
aimed at helping designers and architects cope with the
challenges of the design activity and avoid reinventing the
wheel on every new project.
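One pattern of this kind can be sketched briefly. The example below is only an illustration (the class name and sizes are invented): a worker, or thread, pool lets a server component service many client requests with a small, bounded number of threads, which is one commonly documented answer to the high-performance server problem mentioned above.

import queue
import threading

class WorkerPool:
    """Thread-pool pattern: a bounded set of worker threads services many queued requests."""

    def __init__(self, num_workers: int = 4) -> None:
        self._tasks = queue.Queue()
        for _ in range(num_workers):
            threading.Thread(target=self._worker, daemon=True).start()

    def _worker(self) -> None:
        while True:
            handler, request = self._tasks.get()
            try:
                handler(request)        # process one client request
            finally:
                self._tasks.task_done()

    def submit(self, handler, request) -> None:
        self._tasks.put((handler, request))

    def wait(self) -> None:
        self._tasks.join()              # block until all queued requests are handled

# Usage: many client requests are serviced by a small, fixed number of threads.
pool = WorkerPool(num_workers=4)
for i in range(20):
    pool.submit(lambda req: print("handled request", req), i)
pool.wait()

A pattern description would accompany such a sketch with the problem it solves, when it applies, and its tradeoffs (for example, the pool size bounds resource use but can become a bottleneck).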
The notion of a component is central to both architectures and design patterns. Just as we can identify standard
architectures and patterns, we can also identify components that are useful in more than one application. The
use of standard components is standard practice in other engineering disciplines, but it has been slow to take hold in software engineering. Their importance was recognized early in the development of software engineering, but their realization was only made possible in the 1990s, after advances in software architecture, design, and programming languages made it possible to build such components and specify their interfaces.
Component-based software engineering considers software engineering to consist of assembling and integrating
quality factors are performance, reliability, security, usability, and so on. These factors may be further subdivided. For example, reliability is subdivided into correctness, fault-tolerance, and robustness. Correctness means
that the product correctly implements the required functionality; fault-tolerance means that the product is tolerant of the occurrence of faults such as power failure; robustness means that the product behaves reasonably even in unexpected situations such as the user supplying incorrect
input. For each of these factors, we can define metrics for
measuring them.
A complete requirements document includes not only
the functional requirements on the product but also the
quality requirements in terms of factors and the levels
required for each factor. Software engineers then try to
apply techniques and procedures for achieving the stated
quality requirements. There are two general approaches to
achieving quality requirements: product-oriented methods
and process-oriented methods. The process-oriented approach concentrates on improving the process followed to
ensure the production of quality products. We have already
discussed process quality, its assessment and improvement
in Section V. Here we review product-oriented techniques
for achieving software quality.
A. Validation and Verification
We refer to activities devoted to quality assessment as validation and verification or V&V. Validation is generally
concerned with assessing whether the product meets external criteria such as the requirements. Verification is generally concerned with internal criteria such as consistency
of components with each other or that each component
meets its specification.
The approaches to validation and verification may be
roughly classified into two categories: formal and informal. Formal approaches include mathematical techniques
of proving the correctness of programs and the more common approach of testing. Informal techniques include reviews and inspections of intermediate work products developed during the software life cycle.
B. Testing
Testing is a common engineering approach in which the
product is exercised in representative situations to assess
its behavior. It is also the most common quality assessment technique in software engineering. There are, however, several limitations to testing in software engineering.
The most important limitation is that software products do
not exhibit the "continuity" property we are accustomed
to in the physical world. For example, if we test drive a
car at 70 miles an hour, we reasonably expect that the car
will also operate at speeds below 70 miles an hour. If a
software product behaves correctly for a value of 70, however, there is no guarantee at all about how it will behave
with any other value! Indeed, due to subtle conditions, it
may not even behave properly when run with the value of
70 a second time! This discontinuity property of software
creates challenges for the testing activity. An accepted
principle in software engineering is that testing can only
be used to show the presence of errors in the software,
never their absence.
The primary challenge of testing is in deciding what
values and environments to test the software against.
Since it is impossible to test the software exhaustively
on all possible input values, the engineer must apply
some criteria for selecting test cases that have the highest chances of uncovering errors in the product. Two approaches to selecting test cases are called white box testing and black box testing. White box testing considers
the internals of the software, such as its code, to derive
test cases. Black box testing ignores the internal structure of the software and selects test cases based on the
specification and requirements of the software. Each approach has its own advantages and uses. Test coverage
refers to the degree to which the tests have covered the
software product. For example, the tests may have exercised 80% of the code of the software. The amount of
the test coverage gives the engineers and managers a figure on which to base their confidence in the software's
quality.
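The distinction between the two approaches, and the idea of coverage, can be made concrete with a small, hypothetical sketch (the function and its specification are invented for illustration): the black-box cases are derived from the stated specification alone, while the white-box case is added after inspecting the code so that every statement is executed.

def shipping_cost(weight_kg: float, express: bool) -> float:
    """Spec: flat rate 5.0 up to 2 kg, plus 1.5 per extra kg; express doubles the cost."""
    cost = 5.0
    if weight_kg > 2.0:
        cost += 1.5 * (weight_kg - 2.0)
    if express:
        cost *= 2.0
    return cost

# Black-box cases: chosen from the specification only (a typical and a boundary value).
assert shipping_cost(1.0, express=False) == 5.0
assert shipping_cost(2.0, express=False) == 5.0

# White-box case: reading the code shows that the "> 2.0" branch and the express branch
# have not yet been executed, so a case is added until every statement has been covered.
assert shipping_cost(4.0, express=True) == (5.0 + 1.5 * 2.0) * 2.0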
There are different focuses for testing. Functional testing concentrates on detecting errors in achieving the required functionality. Performance testing measures the
performance of the product. Overload testing tries to find
the limits of the product under heavy load.
Testing is carried out throughout the entire life cycle,
applied to different work products. Two important tests applied before final delivery of the product are alpha test and
beta test. In alpha test, the product is used internally by the
developing organization as a trial run of the product. Any
errors found are corrected and then the product is placed
in a beta test. In a beta test, selected potential users of the
product try a pre-release of the product. The primary aim
of the beta test is to find any remaining errors in the product
that are more likely to be found in the customer's environment. A benefit of the beta test for the software's developer
is to get early reaction of potential customers to a product
or its new features. A benefit of beta test for the customer
is to get an early look at a future product or its new
features.
Testing consumes a large part of the software life cycle and its budget. Engineers use software tools such as
test generators and test execution environments to try to
control these costs.
X. MANAGEMENT OF SOFTWARE
ENGINEERING
In addition to common project management issues such
as planning, staffing, monitoring, and so on, there are several important challenges that face a software engineering
manager due to the special nature of software. The first is
that there are no reliable methods for estimating the cost or
schedule for a proposed software product. This problem
leads to frequent schedule and cost over-runs for software
projects. The factors that contribute to this problem are
the difficulty of measuring individual engineers' software
productivity and the large variability in the capability and
productivity of software engineers. It is not uncommon
for the productivity of peer engineers to differ by an order
of magnitude. Dealing with the cost estimation challenge
requires the manager to monitor the project vigilantly and
update estimates and schedules as necessary throughout
the project.
Another factor that complicates the scheduling and estimation tasks is that requirements for a software system
can rarely be specified precisely and completely at the
start of a project. The product is often built for an incompletely known environment, possibly for a new market, or
to control a novel device. Thus, it is typical for requirements to change throughout the software life cycle. One
of the ways to deal with this challenge is to use a flexible
process model that requires frequent reviews and incremental delivery cycles for the product. Feedback from
the delivered increments can help focus and complete the
requirements.
A third challenge of managing a software project
is to coordinate the interactions and communications
among the engineers and the large numbers of work
products produced during the life cycle. Typically, many
work products exist in different versions. They must
be shared, consulted, and modified by different engineers. The solution to this challenge is to establish
strict guidelines and procedures for configuration management. Configuration management tools are some of
the most widely used and effective software engineering
tools.
Configuration management tools and processes provide
a repository to hold documents. Procedures exist for controlled access to documents in the repository, for creating
new versions of documents, and for creating parallel and
alternative development paths for a product consisting of
some of the documents. Typically, a software engineer
checks out a document such as a source code file from the
repository. This action locks out any other accesses to the
document by others. The engineer applies some changes
to the document and checks the document back in. At this
point this new version of the document becomes available
for others to access. Such procedures ensure that engineers
can share documents while avoiding unintended conflicting updates.
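A minimal sketch of this check-out/check-in discipline is given below, assuming a hypothetical in-memory repository (the class and method names are invented for illustration): checking a document out locks it for one engineer, and checking it in creates a new version that becomes visible to others.

class Repository:
    """Toy configuration-management repository with exclusive check-out locking."""

    def __init__(self) -> None:
        self._versions = {}     # document name -> list of versions
        self._locked_by = {}    # document name -> engineer holding the lock

    def add(self, name: str, content: str) -> None:
        self._versions[name] = [content]

    def check_out(self, name: str, engineer: str) -> str:
        if name in self._locked_by:
            raise RuntimeError(name + " is locked by " + self._locked_by[name])
        self._locked_by[name] = engineer
        return self._versions[name][-1]              # latest version

    def check_in(self, name: str, engineer: str, new_content: str) -> int:
        if self._locked_by.get(name) != engineer:
            raise RuntimeError(name + " was not checked out by " + engineer)
        self._versions[name].append(new_content)     # new version, now visible to all
        del self._locked_by[name]
        return len(self._versions[name]) - 1         # version number

# Usage
repo = Repository()
repo.add("main.c", "int main() { return 0; }")
text = repo.check_out("main.c", "alice")
repo.check_in("main.c", "alice", text + "\n/* fix applied */")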
With the support of configuration management, several releases of a product may exist in the repository, and indeed be under development, at the same time. A particular release of a product may
be built by knowing which versions of individual components are necessary. Configuration management tools
typically provide mechanisms for documenting the contents of a release and build tools for automatically building
a given release. Such build tools support parameterization
of components. For example, a given product may be built
for a particular country by specifying a country-specific
component as a parameter.
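Such a parameterized build might be sketched as follows (the component and country names are hypothetical): the country code supplied as a build parameter selects a country-specific component to include in the release.

# Hypothetical release description: component name -> required version.
BASE_RELEASE = {"core": "2.3", "ui": "1.7"}

# Country-specific components, selected by a build parameter.
COUNTRY_COMPONENTS = {
    "AT": {"tax-rules-at": "1.0"},
    "US": {"tax-rules-us": "2.1"},
}

def build_release(country: str) -> dict:
    """Assemble the list of component versions for a given country's release."""
    release = dict(BASE_RELEASE)
    release.update(COUNTRY_COMPONENTS[country])
    return release

print(build_release("AT"))   # {'core': '2.3', 'ui': '1.7', 'tax-rules-at': '1.0'}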
Configuration management tools are essential for orderly development of software projects that produce myriads of components and work products. They are even
more important in distributed software engineering organizations. It is now common for teams of software engineers to be dispersed geographically, located at different
sites of an organization, possibly in different countries
and continents. Configuration management tools provide
support for such distributed teams by providing the illusion of a central repository that may be implemented in a
distributed fashion.
One of the product management and development
trends of the 1990s was to look for organizational efficiency by outsourcing those parts of a business that are
not in the core business of the company. Outsourcing
of software is indeed compatible with component-based
software engineering. Development of whole classes of
components can be outsourced with the responsibility
lines clearly specified. Successful outsourcing of software engineering tasks can be based on well-specified
architectures and interface specifications. Clear and complete specifications are difficult to produce but they are
a necessary prerequisite to a contract between the contractor and a contracting organization. The specification
must state quality requirements in addition to functional
requirements. In particular, the contract must specify who
performs the validation of the software, and how much,
including test coverage criteria and levels.
XI. SUMMARY
Software engineering is an evolving engineering discipline. It deals with systematic approaches to building
large software systems by teams of programmers. We
have given a brief review of the essential elements of
software engineering including product-related issues
such as requirements, design, and validation, and
process-related issues including process models and their
assessment.
With the pervasiveness of software in society, the importance of software engineering is sure to grow. As technologies in diverse areas are increasingly controlled by
software, the challenges, requirements, and responsibilities
of software engineers also grow. For example, the growth
of the Internet has spurred the need for new techniques to
address the development and large-scale deployment of
software products and systems. The development of e-commerce applications has necessitated the development
of techniques for achieving security in software systems.
As new applications and technologies are constantly
emerging, the software engineering field promises to stay
a vibrant and active field in a constant state of flux.
ACKNOWLEDGMENT
Sections of this material were adapted from the textbook Ghezzi, C., Jazayeri, M., and Mandrioli, D. (2002). Fundamentals of Software Engineering, 2nd edition, Prentice Hall, Englewood Cliffs, NJ.
BIBLIOGRAPHY
Boehm, B. W. (1981). Software Engineering Economics, Prentice-Hall, Englewood Cliffs, NJ.
Brooks, F. P., Jr. (1995). The Mythical Man-Month: Essays on Software Engineering, second edition, Addison-Wesley, Reading, MA.
Ghezzi, C., and Jazayeri, M. (1997). Programming Language Concepts, third edition, Wiley, New York.
Ghezzi, C., Jazayeri, M., and Mandrioli, D. (2002). Fundamentals of Software Engineering, second edition, Prentice Hall, Englewood Cliffs, NJ.
Jazayeri, M., Ran, A., and van der Linden, A. (2000). Software Architecture for Product Families: Principles and Practice, Addison-Wesley, Reading, MA.
Leveson, N. (1995). Safeware: System Safety and Computers, Addison-Wesley, Reading, MA.
Neumann, P. G. (1995). Computer-Related Risks, Addison-Wesley, Reading, MA.
Software Maintenance
and Evolution
Elizabeth Burd
Malcolm Munro
Research Institute in Software Evolution
I. Software Maintenance
II. Evolution
III. Conclusions
GLOSSARY
Change request A request for a change to a piece of
software.
Program comprehension The process of understanding
how a software system works.
Repository A data store used for storing information
about a software system.
Software engineering The discipline of producing and
maintaining software.
Software system A collection of computer code and data
that runs on a computer.
The first definition can be considered as a technical definition and states that there are certain technical activities
that we have to perform on software after it is delivered.
The second definition is much more business oriented and
states that software has to change as business changes.
Many different terms have been used to describe the
activity of software maintenance. These include bug fixing, enhancement, support, further development, current engineering, post-deployment software support, post-release development, and many more. Coupled with these terms and the previously cited definitions, it is important
to recognize the different types of maintenance. In most
organizations software maintenance is only seen as bug
fixing, whereas continued change and evolution are seen
either as development or redevelopment. This is a mistake, as changing an existing system requires a number
of specialized techniques that do not apply to green field
development. The main technique required is that of program comprehension, that is, understanding how the current system is constructed and works so that changes can
be made. Underestimating this task can lead to system
degradation and further problems.
Software maintenance processes are initiated for a number of reasons. These differing reasons result in four categories of maintenance activities. These are
• Perfective maintenance: This involves improving
demands for changes due to user enhancements or environmental changes exacerbate the problems. In addition,
constant perfective and corrective maintenance, which is
not supported by preventative maintenance, has a tendency
to make the software more difficult to maintain in the
future.
Software evolution is a term that is sometimes used interchangeably with software maintenance. In fact they are
different, but related, terms. For the purposes of this chapter,
software evolution is considered to be the changes that occur to software throughout its lifetime to ensure it continues to support business processes. Software maintenance
is the process of making these changes to the software
thereby seeking to extend its evolutionary path.
Software evolution is defined as the cumulative effect
of the set of all changes made to software products after
delivery.
This chapter will investigate the existing approaches to
maintenance and then describe how these processes are to
be improved by studying the effect of software evolution.
Section I reviews past and present maintenance practices. Section II evaluates recent findings from studies of software evolution and considers these in terms of their implications for increasing complexity and the formation of detrimental legacy properties
within applications. It also makes some recommendations
regarding best practice for software maintenance based on
results of the evolution studies. Finally some conclusions
are drawn.
I. SOFTWARE MAINTENANCE
Software maintenance is influenced by software engineering. As the software engineering fraternity strives for the
perfect maintenance-free development method, the software maintainers have to pick up the pieces. For example, Object-Oriented designed systems were heralded as
maintenance free. It is easy to state this when there are
no Object-Oriented systems to maintain. It is only now
that such systems have moved into maintenance that all
the problems manifest themselves. What should of course
happen is that software maintenance should influence software engineering.
From the definitions given previously, it should be
clear that all levels of an organization should be aware of (and involved in) software maintenance. At the technical level the developers should not ignore it, as they must address the issues of maintainability of the systems they develop; the maintainers cannot ignore it, as they are carrying out the work; the users are involved, as they want the systems to evolve and change to meet their ever-changing needs; senior management cannot ignore it as its
Both these laws state that software maintenance is inevitable and that organizations should be aware of this
fact. There are no magic silver bullets that can be used
in development to eliminate maintenance. The first law of
Continuing Change says that the original business and
organizational environment modeled in the software is
modified by the introduction of the software system and
thus the system must continually change to reflect this.
Thus software must evolve. The second law says that extra resources must be devoted to preserving and simplifying the structure of the software system and that there is a
cost associated with maintaining the quality of the system.
Thus we should pay attention to preventive maintenance.
B. Models and Process
in Software Maintenance
There have been a large number of models describing the
software maintenance process. Probably the best models
are those that have remained in-house and are a hybrid
devised from several models and current in-house practice.
The published maintenance models can be classified into four types:
1. Modification Cycle Models. These models give a
sequence of steps to be carried out; they are usually oriented toward corrective maintenance. The steps are of
the form Problem Verification, Problem Diagnosis,
Reprogramming and Rebuild, Baseline Verification,
and Validation.
2. Entity Models. These models enumerate the entities
of a system and detail how for each step of the model
the entities are generated or modified. The entities are
items such as Requirement, Specification, Source
Code, Change Requests, People, Tasks, Forms, and
Knowledge. The classic model of this type is that
developed for the IEEE in the IEEE Standard
1219-1993.
3. Process Improvement Models. These models
concentrate on how to improve the maintenance
process and are based on the SEI (Software
Engineering Institute) five-layer model of process
improvement.
4. Cost/Benefit Models. These models use the Entity models and apply cost/benefit analysis to them.
The major gap in software maintenance is a theory of
software maintenance. There is no formal theory, nor is there a clear way forward to developing one. The nearest thing to one is the work of Manny Lehman and his laws of
program evolution. These laws were formulated from empirical evidence but are difficult to validate for all systems.
A way forward is to carry out some long-term research on
how systems evolve and how their structure changes over
time. This must be linked with how business affects the
evolving software. From this research a universal definition of maintainability may emerge.
C. Measurements and Metrics for Software
Maintenance
Measurement and metrics for software maintenance is
a difficult topic. There are a large number of product
and process metrics for software. Product metrics tend
to concentrate on complexity type metrics and there
have been some attempts at correlating some of these
metrics with maintenance effort. One of the problems
here is that there is no definitive and universal definition of maintainability. Another problem is that organizations are reluctant to publish data on their systems, thus making
any comparisons difficult.
Determination of the maintenance effort is again a difficult task. It will depend on factors such as the size of
the system, the age since delivery, structure and type
of system, quality of the documentation standards and
the document update procedures in place, the number
of reported bugs, the type and number of change requests that are mandatory and desirable, the use of change
control procedures, the establishment of test procedures,
the level of staff competence and training, and staff
turnover.
All these factors (and more) do not fit easily into a simple function that gives definitive answers to questions such as "Is my system maintainable?", "How well is my maintenance operation doing?", or "Should I continue maintaining the system or throw it away and start again?" Some of these factors can be combined into a simple function forming what can be called a System Profile. From this profile, judgments could be made as to how to improve the process, and possibly decisions could be made on whether to continue maintenance or to redevelop. The system profile addresses the following factors:
II. EVOLUTION
The evolution of software is studied by investigating the
changes that are made to software over successive versions. In the majority of cases this involves a historical study of past changes over the versions that are available. Unfortunately, since evolutionary studies of software have yet to establish their significance and benefit within industry, the need to retain past versions and data regarding the environmental circumstances of the changes is not foreseen. While the authors have found very few applications with a record of versions from the entire lifetime of the software product, they have been successful in collecting a great number of systems and their versions upon which to base their analysis.
The approach adopted by the authors is to take successive versions of a software application and to investigate
the changes that are occurring. Depending on the general
data available regarding the environment of the changes,
additional information may or may not be used. Where
possible, analysis is performed using as much information
as possible to supplement the analysis process. A number
of different approaches are adopted, but the fundamental approach relies on the appearance or removal of calls and
data items.
A. Levels
FIGURE 5 The addition of a new call and resulting increase in calling structure.
later version of the software. Within the COBOL application all data items are global, thus usages of the same data
item within a number of SECTIONs mean that each one must
be consulted when a change is applied. The graph in Fig. 7
shows an overall change in the number of SECTIONs for
a specific data item.
Within Fig. 7 half of the graph shows data items which
are in fewer SECTIONs (those to the left and labelled
"Removal of data items"), whereas the other half of the
graph represents the addition of data items. For instance,
it can be seen that from the left-hand side, 5 data items
have been removed from 4 SECTIONs. Thus, in this case
the complexity of the relationships between SECTIONs
can be said to be decreasing for these specific data items.
However, the majority of the changes appear in the half of
the graph that relates to the addition of data items. To the
right-hand side it can be seen that over 20 data items have
been added to a further SECTION, but in addition 6 data
items have been added to more than 10 SECTIONs. Thus,
the graph shows a definite increase in data complexity of
the COBOL software due to the addition of data items.
Other increases in complexity, at least partly resulting from this phenomenon, have also been identified. One
of these is an increased complexity in the data interface
between subsystems within a software module. An example of this finding is shown within Fig. 8.
This figure represents the clear interface of data and
subsystems within the initial version of the software (to
the left) but shows how this structure is corrupted due to
the evolution process. This result has major implications
for the comprehensibility and future adaptability of the
software module.
A very steep rise would indicate that a software application is quickly gaining legacy properties; whereas a steep
fall may indicate that a preventative maintenance approach
had been adopted. Slight falls within the later trends of the
software's evolution can be observed within Application 1 and Application 3. From Fig. 9, it can be observed that the GCC application exhibits a steep rise between Sample Version S1 and Sample Version S5. Likewise, the same observation can be made for Application 4 between Sample
Versions S1 and S2.
4. Comparing Levels
In order to gain an even greater understanding of the different maintenance trends of applications, the results of call and data analysis can be compared. The approach adopted is to compare the proportion of data items modified and function call changes within each of the applications for each available version; that is, to compare the results of the analysis of Level 2 and Level 3.
The results of this analysis process are shown within Fig. 9. This graph represents the proportion of data items modified per call change for each of the applications. The graph would seem to indicate that Sample Version S2 (of the GCC application) revealed a considerably higher proportion of data per call modifications than was the case with the other versions. In addition, it is interesting to investigate the rise and fall of these proportions.
B. General Recommendations
From conducting this analysis process, a number of factors for successful maintenance have been identified as early recommendations that the authors aim to extend and verify within later studies. However, in order that industry can see the benefits of such research, it is necessary to make some early recommendations. In summary:
• Study entire applications: By studying the entire
applications, it seems that frequently the data are
modified in a less than optimal way. More effort
should be applied when making a change to ensure
that wherever possible data cohesion is not adversely
affected. Representing data cluster changes is one way
of highlighting such a problem.
• Fewer software releases tend to lead to slower increases in data complexity: A strategy that batches change requests and issues releases at set periodic time-scales has the opportunity to develop a more considered overall maintenance strategy and to optimize and integrate the design of requests.
• The best people should be assigned to maintenance: This research seems to highlight that when some of the best programmers were assigned to the maintenance tasks, the overall quality of the code tended to improve. This is a complete reversal of the standard evolutionary path of software under maintenance, where usually a steady increase in software data complexity is identifiable.
• Preventative maintenance needs to be a continuous theme: Preventative maintenance is not something that can be performed once and then forgotten; rather, it must either be a task that is carried out in detail at specific time periods or, more appropriately, a continuing theme.
Additional work in this area will try to gain further insights
into properties of specific maintenance changes and how
these changes affect the evolution of software applications. From this it is hoped that other insights into appropriate strategies for maintenance providers will emerge.
III. CONCLUSIONS
This chapter has investigated the often-confused relationship between maintenance and evolution. It has defined
software evolution as the change that occurs to software
throughout its lifetime to ensure it continues to support
business processes. Software maintenance is the process
of making these changes to the software thereby seeking
to extend its evolutionary path. This chapter has reviewed
recent findings on studies of both the maintenance and the
evolution process.
Software Maintenance is concerned with the way in
which software is changed and how those changes are
managed. Software will have to change to meet the ever-changing needs of a business. If these changes are not
well managed then the software will become scrambled
and prevent the business from achieving its full potential.
The chapter has reviewed studies at each of the three levels of evolution study: the system, function, and data levels.
It is shown how legacy properties are introduced into software applications through the change process and what the long-term implications of these changes are. Predictions of the consequences of software gaining legacy properties are made, and an indication of the low-level effects of these legacy properties has been shown diagrammatically. On
the basis of these observations, a number of recommendations are made to assist software maintainers in preventing
the further introduction of these legacy properties within
future maintenance interventions. It is hoped from these
recommendations that the maintainability of software will
be improved in the future, thereby making the process of evolution easier and cheaper.
The authors hope that those involved within the maintenance and development of software applications will see
the benefit of retaining information regarding the development and maintenance process. In the future more data
must be retained, which will allow studies of the evolution process to make even further observations and therefore to continue striving for the improvement of the maintainability of software.
BIBLIOGRAPHY
Boehm, B. W. (1995). The high cost of software. In Practical Strategies
for Developing Large Software Systems (E. Horowitz, ed.), Addison
Wesley, Reading, MA.
Burd, E. L., and Munro, M. (1999). Characterizing the Process of Software Change, Proceedings of the Workshop on Principles of Software
Change and Evolution: SCE1999, ICSE.
Burd, E. L., and Munro, M. (2000). Supporting program comprehension
using dominance trees (Invited Paper), Special Issue on Software
Maintenance for the Annals of Software Engineering 9, 193–213.
Burd, E. L., and Munro, M. (2000). Using evolution to evaluate reverse
engineering technologies: mapping the process of software change,
J. Software Systems 53(1), 43–51.
Glass, R. L., and Noiseux, R. A. (1981). Software Maintenance Guidebook, Prentice Hall, Englewood Cliffs, NJ.
IEEE Standard for Software Maintenance, IEEE Std 1219-1998.
Information Technology–Software Maintenance, BS ISO/IEC 14764:1999.
Lehman, M. M. (1980). On understanding laws, evolution, and conservation in the large-program life cycle, J. Systems Software 1, 213–221.
Lehman, M. M. (1989). Uncertainty in Computer Applications and its
Control through the Engineering of Software, J. Software Main. 1(1).
Lientz, B., and Swanson, E. B. (1980). Software Maintenance,
Addison-Wesley, Reading, MA.
Parikh, G., and Zvegintzov, N. (1993). Tutorial on Software Maintenance, IEEE Computer Society Press, Silver Spring, MD.
Pigoski, T. (1996). Practical Software Maintenance, Wiley, New York.
Software Reliability
Claes Wohlin
Blekinge Institute of Technology
Martin Höst
Per Runeson
Anders Wesslén
Lund University
GLOSSARY
Software error A mistake made by a human being, resulting in a fault in the software.
Software failure A dynamic problem with a piece of
software.
Software fault A defect in the software, which may cause
a failure if executed.
Software reliability A software quality aspect that is
measured in terms of mean time to failure or failure
intensity of the software.
Software reliability certification To formally demonstrate system acceptability to obtain authorization to
use the system operationally. In terms of software reliability, it means to evaluate whether the reliability
requirement is met or not.
Software reliability estimation An assessment of the
current value of the reliability attribute.
Software reliability prediction A forecast of the value of
the reliability attribute at a future stage or point of time.
SOFTWARE RELIABILITY is defined as the probability of failure-free operation of a program for a specified
time under a specified set of operating conditions. It is
one of the key attributes when discussing software quality. Software quality may be divided into quality aspects
in many ways, but mostly software reliability is viewed as
one of the key attributes of software quality.
The area of software reliability covers methods, models, and metrics of how to estimate and predict software
reliability. This includes models for both the operational
profile, to capture the intended usage of the software, and
the operational failure behavior. The latter type of models
is then also used to predict the future behavior in terms of
failures.
Before going deeper into the area of software reliability, it is necessary to define a set of terms. Already in the
definition, the word failure occurs, which has to be defined
and in particular differentiated from error and fault.
Failure is a dynamic description of a deviation from
the expectation. In other words, a failure is a departure
The algorithmic model is a refinement of the domainbased model. The refinement is that the algorithmic model
takes the input history into account when selecting the next
input. The model may be viewed as drawing balls from
an urn, where the distribution of balls is changed by the
input history.
To define the usage profile for the algorithmic model,
the input history must be partitioned into a set of classes.
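The urn analogy can be made concrete with a small sketch (the input classes and probabilities are invented for illustration): the distribution used to draw the next input class depends on the class drawn previously, so the "urn" changes with the input history.

import random

# Hypothetical input classes; each row gives the distribution of the next class
# conditional on the previously drawn class.
NEXT_CLASS_PROB = {
    "login":  {"login": 0.05, "query": 0.80, "update": 0.15},
    "query":  {"login": 0.10, "query": 0.60, "update": 0.30},
    "update": {"login": 0.20, "query": 0.70, "update": 0.10},
}

def generate_run(length: int, start: str = "login") -> list:
    """Draw a sequence of input classes; each selection depends on the history."""
    history = [start]
    for _ in range(length - 1):
        dist = NEXT_CLASS_PROB[history[-1]]
        classes, weights = zip(*dist.items())
        history.append(random.choices(classes, weights=weights)[0])
    return history

print(generate_run(10))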
3. Operational Profile
A way of characterizing the environment is to divide the
execution into a set of runs, where a run is the execution
of a function in the system. If runs are identical repetitions
of each other, these runs form a run type. Variations of a
system function are captured in different run types. The
specification of the environment using run types is called
the operational profile. The operational profile is a set of
relative frequencies of occurrence of operations, where an
operation is the set of run types associated with a system
function. To simplify the identification of the operational
profile, a hierarchy of profiles is established, each making
a refinement of the operational environment.
The development of operational profiles is made in five
steps.
There are three ways to assign the probabilities in the
usage profile.
1. Measuring the usage of an old system. The usage
is measured during operation of an old system that the
new system shall replace or modify. The statistics are collected, the new functions are analyzed, and their usage is
estimated based on the collected statistics.
2. Estimate the intended usage. When there are no old
or similar systems to measure on, the usage profile must
be estimated. Based on data from previous projects and on
interviews with the end users, an estimate of the intended usage is made. The end users can usually make a good profile in terms of relating the different functions to each other. The functions can be placed in different classes, depending on how often a function is used. Each class is then related to the others by, for example, saying that one class is used twice as much as another class. When all functions have been assigned a relation, the profile is set according to these relations (a small sketch of this follows the list).
3. Uniform distribution. If there is no information available for estimating the usage profile, one can use a uniform
distribution. This approach is sometimes called the uninformed approach.
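A minimal sketch of the second approach is given below (the function names, classes, and relations are invented): functions are grouped into classes, the classes are related to one another by relative usage, and the relations are normalized into a probability profile.

# Hypothetical usage classes with relative weights: "frequent" is used twice as
# much as "occasional", which in turn is used five times as much as "rare".
CLASS_WEIGHTS = {"frequent": 10.0, "occasional": 5.0, "rare": 1.0}

FUNCTION_CLASS = {
    "withdraw": "frequent",
    "deposit": "frequent",
    "print_statement": "occasional",
    "close_account": "rare",
}

def usage_profile() -> dict:
    """Turn the relative class weights into per-function probabilities."""
    weights = {f: CLASS_WEIGHTS[c] for f, c in FUNCTION_CLASS.items()}
    total = sum(weights.values())
    return {f: w / total for f, w in weights.items()}

print(usage_profile())   # the probabilities sum to 1.0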
reported in the same way all the time, for example, the
time for failure occurrence has to be reported with
enough accuracy.
• Completeness. All data has to be collected, for
example, even failures for which the tester corrects the
causing fault.
• Measurement system consistency. The measurement
system itself must as a whole be consistent, for
example, faults shall not be counted as failures, since
they are different attributes.
B. Measurement Program
Measurement programs can be set up for a project, an
organizational unit, or a whole company. The cost is, of
course, higher for a more ambitious program, but the gains
FIGURE 12 Relationship between operation, usage specification, usage-based testing, and software reliability models.
B. Definitions
As a starting point, we introduce some basic reliability
theory definitions. Let X be a stochastic variable representing time to failure. Then the failure probability F(t)
is defined as the probability that X is less than or equal to
t. We also define the survival function as R(t) = 1 − F(t).
Some important mean value terms are displayed in
Fig. 13. Here, the state of the system is simply modeled
as alternating between two states: when the system is executing, a failure can occur and the system is repaired; and
when the system is being repaired, it will after a while,
when the fault is corrected, be executed again. This is
iterated for the entire life cycle of the system.
The expected value of the time from a failure until the
system can be executed again is denoted MTTR (mean
time to repair). This term is not dependent on the number
of remaining faults in the system.
The expected time that the system is being executed after a repair activity until a new failure occurs is denoted
MTTF (mean time to failure), and the expected time between two consecutive failures is denoted MTBF (mean
time between failures). The two last terms (MTTF and
MTBF) are dependent on the remaining number of software faults in the system.
The above three terms are standard terms used in reliability theory in general. In hardware theory, however, the
last two terms are often modeled as being independent of
the age of the system. This cannot, in most cases, be done
for software systems.
Two simple but important relationships are
MTBF = MTTF + MTTR,
Availability = MTTF/MTBF.
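As a worked illustration of these relationships (the numbers are invented), suppose the mean time to failure is 98 hr and the mean time to repair is 2 hr; then MTBF = MTTF + MTTR = 98 + 2 = 100 hr, and Availability = MTTF/MTBF = 98/100 = 0.98.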
When modeling software reliability, the repair times do
not have any meaning. Instead, only the times between
consecutive failures are considered and therefore measured. In this case, the only term of the above three that
FIGURE 11 The reliability can be derived from directly measurable attributes via a software reliability model.
The first two are the most used, while the last is the least
used because of its high level of complexity.
The application of reliability models is summarized in
an example in Fig. 14. In this example, the failure intensity
is modeled with a reliability model. It could, however, be
some other reliability attribute, such as MTBF. First, the
real values of the times between failures in one realization
are measured (1). Then the parameters of the model are
estimated (2) with an inference method such as the maximum likelihood method. When this is done, the model can
be used, for example, for prediction (3) of future behavior
(see Fig. 14).
D. Model Overview
Reliability models can be classified into four different
classes:
FIGURE 14 The application of reliability models. In this example the model is used for prediction.
Let X_i denote the time between the (i − 1)th and the ith failure; then the probability density function of X_i is defined as in Eq. (1),

f_{X_i}(t) = \lambda_i e^{-\lambda_i t},     (1)

where the failure intensity decreases as faults are corrected,

\lambda_i = \varphi (N - (i - 1)),     (2)

N being the initial number of faults and φ the failure intensity contributed by each fault. For observed times between failures t_1, ..., t_n, the log-likelihood function is

\ln L(N, \varphi) = n \ln \varphi - \varphi \sum_{i=1}^{n} (N - (i - 1)) t_i + \sum_{i=1}^{n} \ln (N - i + 1).     (3)

This function should be maximized with respect to N and φ. To do this, the first derivatives with respect to N and φ are calculated and set to zero. The derivative with respect to φ yields

\varphi = n \Big/ \sum_{i=1}^{n} (N - i + 1) t_i,     (4)

and the equation obtained from the derivative with respect to N must then be solved numerically for N.
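The estimation step just described can be sketched in a few lines of Python (a minimal illustration under the model above, not the authors' tooling; the failure data are invented): for each candidate value of N the maximizing φ has the closed form of Eq. (4), and the resulting log-likelihood is searched over integer N.

import math

def jelinski_moranda_fit(times, max_extra_faults=1000):
    """Estimate N (initial number of faults) and phi (per-fault failure intensity)
    by maximizing the log-likelihood over integer N, using the closed form for phi."""
    n = len(times)
    best = None
    for N in range(n, n + max_extra_faults + 1):
        # sum of (N - i + 1) * t_i for i = 1..n (enumerate starts at 0, so the factor is N - i)
        weighted = sum((N - i) * t for i, t in enumerate(times))
        phi = n / weighted
        log_lik = (n * math.log(phi)
                   - phi * weighted
                   + sum(math.log(N - i) for i in range(n)))
        if best is None or log_lik > best[0]:
            best = (log_lik, N, phi)
    return best[1], best[2]

# Hypothetical times between successive failures (hours), showing reliability growth.
times = [7, 11, 8, 10, 15, 22, 20, 25, 28, 40]
N_hat, phi_hat = jelinski_moranda_fit(times)
print("estimated initial faults:", N_hat)
print("estimated per-fault intensity:", round(phi_hat, 4))
if N_hat > len(times):
    # predicted mean time to the next failure: 1 / (phi * number of remaining faults)
    print("predicted time to next failure:", round(1 / (phi_hat * (N_hat - len(times))), 1))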
E. Reliability Demonstration
When the parameters of the reliability model have been estimated, the reliability model can be used for prediction of
the time to the next failure and the extra development time
required until a certain objective is reached. The reliability of the software can be certified via interval estimations
of the parameters of the model, i.e., confidence intervals
are created for the model parameters. But often, another
approach, which is described in this section, is chosen.
A method for reliability certification is to demonstrate
the reliability in a reliability demonstration chart. This
method is based on faults that are not corrected when failures are found, but if faults were corrected, this would only
mean that the actual reliability is even better than what the
certification says. This type of chart is shown in Fig. 15.
To use this method, start in the origin of the diagram.
For each observed failure, draw a line to the right and one
step up. The distance to the right is equal to the normalized
time (the measured time multiplied by the failure intensity objective). For example, if the objective is that the mean time to failure should be 100 (a failure intensity objective of 1/100) and the measured time is 80, then the normalized time is 0.8. This
means that when the normalized time is less than 1, then
the plot comes closer to the reject line; on the other hand,
if it is larger than 1 then it comes closer to the accept line.
If the reached point has passed the accept line, the objective is met with the desired certainty, but if the reject
line is passed, it is clear with the desired certainty that the objective is not met.
The two boundary lines (accept line and reject line) are straight lines of the form given in Eq. (8),

x(n) = \frac{A - n \ln \gamma}{1 - \gamma},     (8)

where n is the number of failures observed so far, γ is the discrimination ratio, and the intercept A takes one value for the accept line and another for the reject line, determined by the acceptable consumer risk β and supplier risk α,

A_{accept} = \ln \frac{\beta}{1 - \alpha}, \qquad A_{reject} = \ln \frac{1 - \beta}{\alpha}.     (9)
V. EXPERIENCE PACKAGING
A. Purpose
In all measurement programs, collected experience is necessary to make full use of the potential in software product
B. Usage Model and Profile
The usage model and profile are descriptions of the intended operational usage of the software. A reliability prediction is conducted based on test cases generated from a
usage profile and is always related to that profile. Hence,
a reliability prediction must always be stored in the experience base together with the usage profile used.
Comparing the prediction to the outcome in operational
usage can validate a reliability prediction. If there is a considerable difference between predicted and experienced
reliability, one of the causes may be discrepancies between
the usage profile and the real operational profile. This has
to be fed back, analyzed, and stored as experience.
Continuous measurements on the operational usage of
a product are the most essential experience for improving
the usage profile. A reliability prediction derived in usagebased testing is never more accurate than the usage profile
on which it is based.
The usage models and profiles as such contain a lot of
information and represent values invested in the derivation
of the models and profiles. The models can be reused,
thus utilizing the investments better. Different cases can
be identified.
• Reliability prediction for a product in a new
C. Reliability Models
Experience related to the use of reliability models is just
one type of experience that should be stored by an organization. Like other experiences, projects can be helped if
experience concerning the reliability models is available.
In the first stages of testing, the estimations of the model
parameters are very uncertain due to too few data points.
Therefore, it is very hard to estimate the values of the parameters, and experience would be valuable. If, for example, another project prior to the current project has developed a product similar to the currently developed product,
then a good first value for the parameters would be to take
the values of the prior project.
Another problem is to decide what model to choose for
the project. As seen in the previous sections, a number of
VI. SUMMARY
When you use a software product, you want it to have
the highest quality possible. But how do you define the
quality of a software product? In the ISO standard 9126,
the product quality is defined as "the totality of features and characteristics of a software product that bear on its ability to satisfy stated or implied needs." The focus here
has been on one important quality aspect: the reliability
of the software product.
Software reliability is a measure of how well the software
is capable of maintaining its level of performance under
stated conditions for a stated period of time and is often
expressed as a probability. To measure the reliability, the
BIBLIOGRAPHY
Fenton, N., and Pfleeger, S. L. (1996). Software Metrics: A Rigorous &
Practical Approach, 2nd ed., International Thomson Computer Press,
London, UK.
Goel, A., and Okumoto, K. (1979). Time-dependent error-detection
rate model for software reliability and other performance measures,
IEEE Trans. Reliab. 28(3), 206–211.
Jelinski, Z., and Moranda, P. (1972). Software Reliability Research.
Proceedings of Statistical Methods for the Evaluation of Computer
System Performance, 465–484, Academic Press, New York.
Lyu, M. R., ed. (1996). Handbook of Software Reliability Engineering,
McGraw-Hill, New York.
Musa, J. D., Iannino, A., and Okumoto, K. (1987). Software Reliability: Measurement, Prediction, Application, McGraw-Hill, New
York.
Musa, J. D. (1993). Operational profiles in software reliability engineering, IEEE Software, March, 14–32.
Musa, J. D. (1998). Software Reliability Engineering: More Reliable Software, Faster Development and Testing, McGraw-Hill, New
York.
van Solingen, R., and Berghout, E. (1999). The Goal/Question/Metric
Method: A Practical Guide for Quality Improvement and Software
Development, McGraw-Hill International, London, UK.
Xie, M. (1991). Software Reliability Modelling, World Scientific,
Singapore.
Software Testing
Marc Roper
Strathclyde University
GLOSSARY
Error A (typically human) mistake which results in the
introduction of a fault into the software.
Failure The manifestation of a fault as it is executed.
A failure is a deviation from the expected behavior,
that is, some aspect of behavior that is different from
that specified. This covers a large range of potential
scenarios including, but by no means limited to, interface behavior, computational correctness, and timing
performance, and may range from a simple erroneous
calculation or output to a catastrophic outcome.
Fault Also known as a bug or defect, a fault is some
mistake in the software which will result in a failure.
Test case A set of test data and expected results related
to a particular piece of software.
Testing technique A mechanism for the generation of
test data based on some properties of the software under
test.
I. FUNDAMENTAL LIMITATIONS
OF TESTING
A. The Input Domain
In simple terms testing is about sampling data from the
input space of a system. The naïve reaction to this is to
consider testing as involving running the program on every possible input. Unfortunately, this is impossible as the
input space is prohibitively large. Consider, for example,
a well-known program (a simple program is used for illustrative purposes here, but the results are equally applicable
to any piece of software) which finds the greatest common
divisor of two integers. Assume further that in addition to
to be executed with the data that are going to reveal the
fault. To pick a simple example, the statement x = x + x
is indistinguishable from the statement x = x * x if only
tested on the values 0 and 2 for x. It is only when other
data such as 3 or 1 are used that they calculate different results and it becomes clear that maybe one should
have been used in place of the other. It is also easy to
see other limitations of statement testing. For example, if
there is a compound condition such as if (x < 3 and
y == 0.1) statement testing will not require that the subconditions within this are thoroughly tested. It is sufficient
just to choose values that will cause any subsequent dependent statements to be executed (such as x = 2 and y = 0.1,
and x = 4 and y = 0.1). For reasons such as this stronger
structural coverage techniques were introduced based on
elements such as compound conditions or the flow of data
within the program (see Section IV.B, which also explains
the important development of perceiving coverage techniques as adequacy criteria). However, the stronger techniques were devised to address loopholes in the weaker
techniques, but still have loopholes themselves.
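The loophole can be shown concretely. In the hypothetical sketch below, a test set that executes every statement (the values 0 and 2) cannot distinguish a faulty implementation from a correct one; only an additional value such as 3 exposes the fault.

def double_correct(x):
    return x + x      # intended computation

def double_faulty(x):
    return x * x      # faulty, yet indistinguishable from x + x for x in {0, 2}

# A statement-adequate test set: every statement of both functions is executed.
for x in (0, 2):
    assert double_correct(x) == double_faulty(x)   # the fault goes unnoticed

# Only other data, such as 3, make the two calculations differ and reveal the fault.
assert double_correct(3) == 6
assert double_faulty(3) != 6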
There is another important factor to note at this point. It is naturally tempting to think of the more demanding testing techniques as being more likely to reveal faults. This is only the case if the data used for the more demanding technique are built upon, and subsume, those of the less demanding technique (for example, a set of data might be created to achieve statement coverage which is then augmented to achieve a higher level of coverage). If, however, the data sets are created independently, then the situation can arise where, by chance rather than design, the data created to achieve statement coverage reveal a fault which the data created to achieve the higher level of coverage do not. This raises the issue of the relationship between techniques and external attributes such as reliability or, more nebulously, quality. For the reasons just discussed, this relationship is by no means simple or well defined. Work in this area has produced some interesting results. For example, comparisons between partition testing (any criterion that divides up the input domain in some way, such as statement testing or branch testing) and random testing found partition testing to be generally preferable, but also discovered some situations where random testing was superior! This is an active area of research.
C. The Impact of Technologies
This general overview has indicated that there is quite
some way to go before a general, effective, and predictable
testing strategy is in place for even relatively straightforward imperative single-user systems. Given that software
development technology is moving at an astonishing pace,
what is the lesson for developers of, for example, dis-
invoking the lower-level modules, but again without
the full range of functionality. Similarly to stubs,
these are replaced by the real modules as the
integration process works its way up the system.
• Sandwich: An outside-in approach that combines the top-down and bottom-up strategies.
• Big Bang: The simultaneous integration of all modules in one go.
• Builds: Modules are integrated according to threads of functionality. That is, all the modules in a system which implement one distinct requirement (or use-case) are pulled together. This proceeds through the set of requirements until they have all been tested. In this approach a module might find itself being integrated several times if it participates in many requirements.
A strategic approach is valuable in integration as a
means of identifying any faults that appear. At this
stage, faults are going to be the result of subtle
interactions between components and are often
difficult to isolate. A strategy assists this by focusing
the tests and limiting the number of modules that are
integrated at any one phase. The big-bang strategy is
the obvious exception to this and is only a feasible
strategy when a system is composed of a small
number of modules. The actual strategy chosen will
often be a function of the system. For example, in an object-oriented system the notion of top and
bottom often does not exist as such systems tend to
have network rather than hierarchical structures and
so a threaded strategy based upon builds is a more
natural choice. In contrast, systems designed using
more traditional structured analysis techniques
according to their flow of data will often display a
distinct hierarchy. Such systems are more amenable
to the application of top-down (if the priority is on
establishing the outward appearance of the system) or
bottom-up (if the behavior of the lower-level data
gathering modules is considered important)
strategies.
3. System Testing. A level of testing that is geared at
establishing that the functionality of the system is in
place. This would often be based around a high-level
design of the system. System testing will often
involve analysis of other properties of the system by
using techniques such as:
• Performance Testing: Examines the system's ability to deal efficiently with the demands placed upon it, for example, by focusing on the response time of the system under various loads or operating conditions
• Stress Testing: The activity of determining how the system deals with periods of excessive demand
(in terms of transactions, for example) by
overloading it for short periods of time
• Volume Testing: The operation of the system at maximum capacity for a sustained period of time
• Configuration Testing: The execution of the system on different target platforms and environments
4. Acceptance Testing. The highest level of tests, carried out to determine if the product is acceptable to the customer. These would typically be based around the requirements of the system and usually involve the user. If there were many possible users, for example when the product is being built for a mass market, then the product would also be subjected to alpha and beta testing:
• Alpha Testing: The idea of inviting a typical customer to try the product at the developer's site. The ways in which the customer uses the product are observed, and errors and problems found are noted by the developers.
• Beta Testing: Usually performed subsequent to alpha testing; a number of typical customers receive the product, use it in their own environment, and report problems and errors to the developer.
5. Regression Testing. The process of retesting
software after changes have been made to ensure that
the change is correct and has not introduced any
undesirable side effects. Very often, the difficulty in
regression testing is in identifying the scope of the
change.
data that is based upon the specification, or some other external description of the software. As the name suggests,
their focus is to help establish that the software correctly
supports the intended functions. They are frequently referred to as black box techniques.
1. Equivalence Partitioning. The mechanism whereby classes of input data are identified from the specification on the basis that everything within a partition is treated identically according to the specification. Having identified such partitions, it is only necessary to choose one test for each partition, since the underlying assumption is that all values within it are treated the same. It is also recommended that data falling outside the partitions be chosen. In addition, the partitioning of the output domain should also be considered and treated in a similar way.
2. Boundary Value Analysis. Having identified the equivalence partitions, boundary value analysis is a technique that encourages the selection of data from the boundaries of the partitions, on the basis that it is more likely that errors will have been made at these points. Data values should be chosen as close as possible to each side of a boundary to ensure that it has been correctly identified (both techniques are illustrated in the sketch following this list).
3. Cause-Effect Graphing. A technique that attempts
to develop tests that exercise the combinations of
input data. All inputs and outputs to a program are
identified and the way in which they are related is
defined using a Boolean graph so that the result
resembles an electrical circuit (the technique has its
roots in hardware testing). This graph is translated
into a decision table in which each entry represents a
possible combination of inputs and their
corresponding output.
4. Category-Partition Testing. This attempts to
combine elements of equivalence partitioning,
boundary value analysis, and cause-effect graphing
by exercising all combinations of distinct groups of
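As a sketch of how equivalence partitioning and boundary value analysis might be applied in practice (in Python; the function under test, its valid range of 0 to 100, and its pass mark of 40 are hypothetical values chosen purely for illustration):

# Hypothetical function under test: does an integer mark represent a pass?
def is_pass(mark):
    if mark < 0 or mark > 100:
        raise ValueError("mark out of range")
    return mark >= 40

# Equivalence partitions: invalid low (< 0), fail (0-39),
# pass (40-100), invalid high (> 100); one representative value from each.
partition_values = [-5, 20, 70, 130]

# Boundary value analysis: values as close as possible to each
# side of every partition boundary.
boundary_values = [-1, 0, 39, 40, 100, 101]

for mark in partition_values + boundary_values:
    try:
        print(mark, is_pass(mark))
    except ValueError as error:
        print(mark, "rejected:", error)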
GLOSSARY
Client A device that requests a service from a remote
computer.
Internet An internet is a collection of networks connected together. The Internet is a global collection of
interlinked networks. Computers connected to the Internet communicate via the Internet Protocol (IP).
Internet Protocol A protocol that was designed to provide a mechanism for transmitting blocks of data called
datagrams from sources to destinations, where sources
and destinations are hosts identified by fixed length
addresses.
Protocol A description of the messages and rules
for interchanging messages in intercomputer communication.
Server A device (normally a computer) that provides a
service when it receives a remote request.
TCP Transmission control protocol. A protocol that provides reliable connections across the Internet. Protocols
other than TCP may be used to send messages across
the Internet.
TCP/IP Transmission control protocol implemented on
top of the Internet protocol. The most commonly used
protocol combination on the Internet.
FOR MANY PEOPLE the World Wide Web is the public face of the Internet. Although the majority of network
traffic is still devoted to electronic mail, most people visualize a Web page when they try to portray the Internet. The
Web has done much to popularize the use of the Internet,
and it continues to be one of the enabling technologies in
the world of e-commerce.
The World Wide Web (WWW) is a mechanism for
making information stored in different formats available,
across a computer network, on hardware from a variety of
hardware manufacturers. At the present time (Fall 2000)
the hardware is normally a computer; however, in the future devices such as mobile phones will increasingly be used
for displaying information obtained via the Web. It is in
the spirit of the Web that the information can be accessed
and displayed by software that can be supplied by a variety of software suppliers, and that such software should be
available for a variety of operating systems (ideally every
operating system one can name). A further aim is that the
information can be presented in a form that makes it suitable for the recipient. For example, whereas information
might appear to a sighted person in the form of a text document, a visually impaired person might be able to have the
same information read to them by the receiving device. A
similar mechanism could be used by someone working in
an environment that required them to observe something
other than a display screen.
top of facilities provided by Internet software. The additional software that is necessary for the creation of a web
falls into two categories: servers and browsers.
In the Web any number of computers may run server
software. These machines will have access to the information that is to be made available to users of the Web.
Web server software normally executes permanently on
server machines (it is only possible to request information
while server software is executing). The purpose of such
server software is to listen for requests for information
and, when those requests arrive, to assemble that information from the resources available to the server. The information is then delivered to the requestor. A machine that
is connected to a network via the Internet protocol TCP
(Transmission Control Protocol) sees the network as a collection of numbered ports. A message may arrive on any of
these ports. Programs may attach themselves to a port (by
specifying its number) and then listen for requests. The
port normally used by a Web server is port 80 (although
it is possible to use other ports). Requests for Web pages
are therefore normally directed to port 80 of a machine
known to be operating Web server software. The owners
of Web sites (Webmasters) typically register an alias for
their machines that makes it clear that they are open to
requests for Web pages. Such aliases often begin with the
character string www, as in www.scit.wlv.ac.uk. Numerous different types of Web server software are available.
The most popular Web server software, such as Apache
and Microsoft's Internet Information Server (IIS), is either
freely downloadable from the Internet or else bundled with
an operating system. Web servers that perform specialized
tasks in addition to delivering Web-based information can
be purchased from a variety of software suppliers. Most
hardware/software platforms have at least one Web server
product available for them.
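A minimal sketch in Python of a program attaching itself to a numbered port and listening for a single request; port 8080 and the one-line reply are illustrative (binding to port 80 itself normally requires administrator privileges):

import socket

# Attach to a numbered port and listen, as a Web server does on port 80.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("", 8080))        # attach to the port by number
server.listen(1)               # listen for incoming requests
conn, addr = server.accept()   # wait until a request arrives
request = conn.recv(1024)      # read the request that was sent
conn.sendall(b"<HTML><BODY>Hello</BODY></HTML>")  # deliver the information
conn.close()
server.close()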
Information available via the Web is delivered to the
user via a browser. Browser software runs on client machines. A browser makes a request to a server and, when
the relevant information arrives, presents it to the user
(Fig. 1). The common conception of a browser is of a piece
of software with a graphical user interface (GUI) capable
of displaying pictures and playing sound. In fact the original concept of a browser was never intended to be so
limited. A browser is simply a software product that can
request information from a Web server and then present
that information in a way that is suited to the recipient of
that information. The original browser developed by Tim
Berners-Lee made use of features of the NeXT platform,
which were quite advanced (in 1990), and was indeed GUI
based. At the same time, however, it was also recognized
that not all Web client machines would be so advanced,
and therefore a text-only browser was developed. From
the beginning, the Web was intended to be available to the
This allows a reader to approach a topic in a nonsequential
manner that is said to be similar to normal human thought
patterns. Web documents may contain links to other documents, and by selecting these links a user can access
(or browse) other related material. In this second type of access, each link contains information identifying a particular piece of information on a particular server.
As with explicit requests, it is the task of the browser to
contact the server (usually on port 80) and request the
information that has been cited.
Types of access supported by URIs include http, ftp, gopher, mailto, news, telnet, rlogin, tn3270, wais, and file.
The URL
http://www.scit.wlv.ac.uk
refers to the home page of the Web server with the Internet address www.scit.wlv.ac.uk. As a port number is not explicitly specified, the server will be assumed to be listening on the standard Web server port, port 80.
The URL
http://www.scit.wlv.ac.uk:8000
refers to the home page of a Web server located with
the Internet address www.scit.wlv.ac.uk, which is
listening on port 8000.
The URL
http://www.scit.wlv.ac.uk/myfile.html
refers to something called myfile.html that is accessible
by the Web server listening on port 80 with the address
www.scit.wlv.ac.uk. Most likely this will be a file
containing HTML, but it may in fact turn out to be a directory or something else. The exact nature of what is
returned by the server if this link is followed will depend on what myfile.html turns out to be and the way the server
is configured.
The URL
http://www.scit.wlv.ac.uk/~cm1914
refers to the home page of user cm1914 on the machine with address www.scit.wlv.ac.uk. The full
file name of such a page will depend on the server
configuration.
The URL
http://www.scit.wlv.ac.uk/mypage.html#para4
refers to a fragment within the page referenced by
http://www.scit.wlv.ac.uk/mypage.html
called para4.
The URL
http://www.scit.wlv.ac.uk/phonebook?Smith
references a resource (in this case a database) named
phonebook and passes the query string Smith to that
resource. The actual document referenced by the URL is
the document produced when the query Smith is applied
to the phonebook database.
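The way such URLs break down into their components can be illustrated with Python's standard urllib.parse module; the URLs are simply those of the examples above:

from urllib.parse import urlsplit

parts = urlsplit("http://www.scit.wlv.ac.uk:8000/mypage.html#para4")
print(parts.scheme)    # http                 (type of access)
print(parts.hostname)  # www.scit.wlv.ac.uk
print(parts.port)      # 8000                 (None when no port is given)
print(parts.path)      # /mypage.html
print(parts.fragment)  # para4

print(urlsplit("http://www.scit.wlv.ac.uk/phonebook?Smith").query)  # Smith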
One important aspect of the definition of URIs (which
featured in the original RFC) is a mechanism for specifying a partial URI. This allows a document to reference
another document without stating its full URI. For example, when two documents are on the same server, it is not
necessary to specify the server name in a link. This makes
it possible to move an entire group of interconnected documents without breaking all the links in those documents.
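Partial URI resolution of this kind can be sketched with Python's standard urllib.parse module; the base document and relative references shown are illustrative:

from urllib.parse import urljoin

# Resolve partial (relative) URIs against the URI of the referring document.
base = "http://www.scit.wlv.ac.uk/docs/index.html"
print(urljoin(base, "chapter2.html"))
# http://www.scit.wlv.ac.uk/docs/chapter2.html
print(urljoin(base, "/myfile.html"))
# http://www.scit.wlv.ac.uk/myfile.html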
The examples just given have demonstrated how URIs
can be used to provide links to any resource on the Internet. As new resources become available, new schemes
are proposed and the debate over the implementation of
URNs continues. W3C operates a mailing list on the topic of URIs in order to support these developments.
B. HyperText Transfer Protocol (HTTP)
The basic architecture of the Web consists of browsers that act as clients requesting information from Web servers. Computer-to-computer communications are described in terms of protocols, and Web interactions are no exception to this rule. In order to implement the prototype Web, Berners-Lee had to define the interactions that were permissible between server and client (normally a browser, but it could in fact be any program). The HyperText Transfer Protocol (HTTP) describes these interactions. The basis of the design of HTTP was that it should make the time taken to retrieve a document from a server as short as possible. The essential interchange is "Give me this document" and "Here it is."
The first version of HTTP was very much a proof of
concept and did little more than the basic interchange.
This has become known as HTTP/0.9. The function of HTTP is, given a URI, to retrieve a document corresponding to that URI. As all URIs in the prototype Web were URLs,
they contained the Internet address of the server that held
the document being requested. It is therefore possible to
open a standard Internet connection to the server using
TCP (or some other similar protocol). The connection is
always made to port 80 of the server unless the URI address
specifies a different port. HTTP then defines the messages
that can be sent across that connection.
In HTTP/0.9 the range of messages is very limited. The
only request a client may make is GET. The GET request
will contain the address of the required document. The
response of the server is to deliver the requested document
and then close the connection. In HTTP/0.9 all documents
are assumed to be HTML documents.
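The whole of an HTTP/0.9 exchange can be sketched with a raw socket in Python; the host name and path are illustrative, and a present-day server may well reject so old a request form:

import socket

# HTTP/0.9: send "GET <path>" and read until the server closes the connection.
sock = socket.create_connection(("www.example.com", 80))
sock.sendall(b"GET /index.html\r\n")   # the only request type is GET
document = b""
while True:
    chunk = sock.recv(4096)
    if not chunk:                      # connection closed: document complete
        break
    document += chunk
sock.close()
print(document.decode("latin-1"))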
HTTP/0.9 is clearly not sufficient to meet the aspirations of the Web. It was quickly enhanced to produce HTTP/1.0. This protocol is described in RFC 1945. For the first few years of the Web's existence HTTP/1.0 was the protocol in use. In July 1999 the definition of HTTP/1.1 became an IETF draft standard, and the majority of commonly used Web servers now implement it. HTTP/1.1 is described in RFC 2616.
Method name: Action

GET: Requests a resource specified by a URI from a server. The action of GET can be made conditional if certain headers are supplied. If successful, it will result in a response which contains the requested resource in the message body.

HEAD: Similar to GET, except that if the request is successful only the header information is returned and not the resource. This could be used when a user requires to know whether a resource exists but is not interested in its content.

POST: This is used to send data to the server. A typical use of POST would be in a Web-enabled database system. In this type of system the browser might display a form in order to collect input data. The URI in this case would specify a program that could take data from the message body and insert it into the database. The introduction of the POST method has been extremely important in terms of making the Web interactive.

Status code category: Meaning

1xx: Informational
2xx: The request was successfully carried out
3xx: The request must be redirected in order to be satisfied
4xx: The request cannot be carried out because of client error
5xx: Server error; although the request appeared to be valid, the server was unable to carry it out
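A sketch in Python, using the standard http.client module, of issuing a HEAD request and examining the status code and headers returned; the host name is illustrative:

import http.client

conn = http.client.HTTPConnection("www.example.com", 80)
conn.request("HEAD", "/")                 # headers only, no message body
response = conn.getresponse()
print(response.status, response.reason)   # e.g. 200 OK (a 2xx success code)
for name, value in response.getheaders():
    print(name + ": " + value)
conn.close()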
it indicates that the document received is in a format
that can be handled by the application program Microsoft
Word.
HTTP/1.0 was introduced to make the Web a reality.
This was the major design goal, and aspects such as efficient use of network resources, use of caches, and proxy
servers were largely ignored. A public draft version of
HTTP that addresses these issues, HTTP/1.1, first appeared in January of 1996. At the time of writing it is
nearing adoption as a formal Internet standard and is
described in RFC 2616. Many recently released servers
and browsers implement HTTP/1.1.
HTTP/1.1 is essentially a superset of HTTP/1.0.
HTTP/1.1 servers must continue to support clients who
communicate using HTTP/1.0. The most important additional features introduced in HTTP/1.1 are as follows:
• Clients may make a connection with a server and
are required to meet the SGML requirement that files containing mark-up should either contain the definition of the
mark-up scheme in use or indicate where the definition
can be found. Every SGML-based mark-up scheme has
a Document Type Definition (DTD). The DTD specifies
which tags must be used, which tags can be used, and
how those tags can be arranged (ordered and nested). The
file in Fig. 2 has been encoded using the tags defined in
the HTML 4.0 Transitional Recommendation from W3C.
A URL (http://www.w3.org/TR/REC-html40/strict.dtd) that references the HTML 4.0 DTD is provided. In this particular case a version of the DTD that prohibits the use of deprecated tags has been used.
The <HTML> tag that begins the actual HTML document is matched at the end of the file by a </HTML>
tag. SGML requires that documents consist of a single
element enclosed within a pair of tags, and the <HTML>
and </HTML> tags serve this purpose. In SGML mark-up
schemes, wherever a tag pair is used to enclose some other
text, the closing tag will have the same name as the opening tag prefixed by a /.
keyword in the title (even if that title is never displayed).
Other information that may be included in the head section
is the name of the page author, keywords describing the
page, the name of the editor used to produce the page, etc.
The body section of the page contains descriptions of
the part of the document that should be displayed by a
browser. This section is delineated by the <BODY> and
</BODY> tags.
The line <H1>Example HTML Page</H1> defines a heading with the text "Example HTML Page".
HTML 4.0 defines six levels of headings H1 through H6.
H1 is considered to be a heading of the greatest importance
and H6 a heading with the least importance. Browsers often display headings in bold and at a font size that distinguishes them from the body text. The lines
<P>
This page contains a link to the
home page of the
<A HREF="http://www.wlv.ac.uk">
University of Wolverhampton</A>.
</P>
define a paragraph. All the text between the <P> and </P>
tags forms part of the paragraph. Browsers cannot guarantee the layout they will use for text, as they cannot know
the screen space they will have to use in advance. The
layout of the text in an HTML file has no influence on the
way it will appear to someone viewing the text on the Web.
Visual browsers normally indicate paragraphs by inserting
white space between them, but this is not mandatory. The
paragraph shown contains a hyperlink to another document. The link is declared in the anchor (A) tag. The
anchor tag has an attribute HREF that is set to a URL (in
this case http://www.wlv.ac.uk). Since the <A> and </A> tags surround the text "University of Wolverhampton", it is this text that will contain the link to the document located at http://www.wlv.ac.uk.
The second paragraph,

<P>
<A HREF="http://validator.w3.org/check/referer">
<IMG BORDER="0"
    SRC="http://validator.w3.org/images/vh40"
    ALT="Valid HTML 4.0!" HEIGHT="31"
    WIDTH="88">
</A>
</P>

contains a hyperlink that is attached to an image rather than being associated with text. The link specifies the URL of an HTML validation service operated by W3C. The image
is specified using the <IMG> tag. This is a tag that does not
enclose text, and therefore a corresponding closing tag is
not necessary. A number of attributes define how the image
is to be displayed. The BORDER attribute controls how
wide the border around the image should be: in this case
no border is to be displayed. The SRC attribute specifies the URL that gives the location of the image file. The HEIGHT
and WIDTH attributes give the browser an indication of
the scaling that should be applied to the image when it
is displayed. The ALT attribute specifies what text should
be displayed if the browser is unable to display images.
It could also be used by Web page processing software to
determine what the image represents.
Many other HTML tags beyond those shown in this
example are defined in the HTML 4.0 recommendation.
When the HTML shown in Fig. 2 is displayed in one of
the most commonly used Web browsers, it is rendered as
shown in Fig. 3.
There are a number of things to note about Fig. 3. The browser has chosen to put the text that appeared between the <TITLE> and </TITLE> tags into the title of the window. The text for the heading has been rendered in bold with a large font size. In addition, the browser has separated the heading from the first paragraph with white space.
The text of the first paragraph has been reformatted to fit into the size of the browser window. The text between the <A> and the </A> tags has been underlined (seen in color it appears blue). This is this particular browser's
default way of indicating that this text is a hyperlink. If
the browser user clicks on the hyperlink, the browser will
issue an HTTP request to the server indicated in the URL
referenced in the HREF attribute of the anchor tag. This
will request the document indicated by the URL and the
browser will eventually display it.
As the image and the text are in separate paragraphs, the browser uses white space to separate them.
IV. SUMMARY
This article has described the key Web technologies: URIs,
HTTP, and HTML. Since the creation of the Web many
other technical innovations (e.g., Active Server Pages,
browser scripting languages, Java Applets, etc.) have contributed to ensuring its popularity and ubiquity. The three
technologies discussed here, however, are the ones that
provide the basis for the deployment of all Web-based
software. They continue to develop as a result of commercial pressure or via the standardization efforts of W3C.
New schemes for URIs are devised as new media types
are introduced on the Web. The HTTP/1.1 standardization
effort is nearing completion. It is already clear, however,
that HTTP/1.1 is not a lasting solution for the Web. A
working group on HTTP Next Generation (HTTP-NG) has
been set up by W3C. This group has already established
a number of directions for the development of HTTP.
HTML has undergone the most changes since its creation. The original set of tags was quite small. These have
been considerably expanded in later versions of HTML.
Many changes have come about from the acceptance of
what were at one time browser-specific tags. Finally, it
has been recognized that HTML cannot continue to grow
forever in this way and that at some point the set of tags
supported must be frozen. At the same time it is admitted that the need for new tags will continue. A subset of
SGML called Extensible Markup Language (XML) has
BIBLIOGRAPHY
Berners-Lee, T. (1989). Information Management: A Proposal. [Online] 2 May 2000. <http://www.w3.org/History/1989/proposal.html>.
Berners-Lee, T. (1994). RFC 1630: Universal Resource Identifiers in WWW. [Online] 2 May 2000. <http://www.ietf.org/rfc/rfc1630.txt>.
Berners-Lee, T. (1999). Weaving the Web, Orion Business, London.
Berners-Lee, T., and Cailliau, R. (1990). WorldWideWeb: Proposal for a HyperText Project. [Online] 2 May 2000. <http://www.w3.org/Proposal.html>.
Berners-Lee, T., Fielding, R., Irvine, U. C., and Frystyk, H. (1996). RFC 1945: Hypertext Transfer Protocol -- HTTP/1.0. [Online] 3 May 2000. <http://www.ietf.org/rfc/rfc1945.txt>.
Berners-Lee, T., Fielding, R., Irvine, U. C., and Masinter, L. (1998). RFC 2396: Uniform Resource Identifiers (URI): Generic Syntax. [Online] 2 May 2000. <http://www.ietf.org/rfc/rfc2396.txt>.
Brewer, J., and Dardallier, D. (2000). Web Accessibility Initiative (WAI). [Online] 2 May 2000. <http://www.w3.org/WAI/>.
Fielding, R., Irvine, U. C., Gettys, J., Mogul, J., Frystyk, H., Masinter, L., Leach, P., and Berners-Lee, T. (1999). RFC 2616: Hypertext Transfer Protocol -- HTTP/1.1. [Online] 3 May 2000. <http://www.ietf.org/rfc/rfc2616.txt>.
Kunze, J. (1995). RFC 1736: Functional Recommendations for Internet Resource Locators. [Online] 2 May 2000. <http://www.ietf.org/rfc/rfc1736.txt>.
Raggett, D., Le Hors, A., and Jacobs, I., eds. (1998). HTML 4.0 Specification. [Online] 3 May 2000. <http://www.w3.org/TR/1998/REC-html40-19980424/>.
World Wide Web Consortium (W3C). [Online] <http://w3c.org>.