
Discrete Mathematics in Computer Science 0132160528 9780132160520 Compress

This document provides an introduction and table of contents to the book "Discrete Mathematics in Computer Science" by Donald F. Stanat and David F. McAllister. It was published in 1977 to serve as a textbook for a course on discrete mathematics for computer science students. The book covers topics such as mathematical reasoning, sets, relations, functions, counting, and algebra with a focus on their applications to computer science. It assumes some prior programming experience but no specific mathematics background.


DISCRETE MATHEMATICS IN

COMPUTER SCIENCE

DONALD F. STANAT
Department of Computer Science
University of North Carolina at Chapel Hill

DAVID F. McALLISTER
Department of Computer Science
North Carolina State University

Prentice/Hall International, Inc.


Library of Congress Cataloging in Publication Data

STANAT, DONALD F (date)


Discrete mathematics in computer science.
Bibliography: p.
Includes index.
1. Mathematics--1961- 2. Electronic
data processing. I. McAllister, David F.
(date) joint author. II. Title.
QA39.2.8688 S121 76-48915
ISBN 0-13-216052-8

This edition may be sold only in those countries
to which it is consigned by Prentice-Hall International.
It is not to be re-exported and it is not for sale in
the U.S.A., Mexico or Canada.

© 1977 by Prentice-Hall, Inc., Englewood Cliffs, N.J.

All rights reserved. No part of this book may be
reproduced in any form, by mimeograph or any other
means, without permission in writing from the publisher.

10 9 8 7 6 5 4 3 2 1

ISBN Q-13-21605ce-4

Printed in the United States of America

Prentice-Hall International (UK) Limited, London


Prentice-Hall of Australia Pty. Limited, Sydney
Prentice-Hall Canada Inc., Toronto
Prentice-Hall Hispanoamericana, S.A., Mexico
Prentice-Hall of India Private Limited, New Delhi
Prentice-Hall of Japan, Inc., Tokyo
Prentice-Hall of Southeast Asia Pte. Ltd., Singapore
Editora Prentice-Hall do Brasil, Ltda., Rio de Janeiro
Whitehall Books Limited, Wellington, New Zealand
Prentice-Hall, Englewood Cliffs, New Jersey
To Sylvia and Beth
CONTENTS

PREFACE

Notation

0 MATHEMATICAL MODELS

0.0 Introduction
0.1 Principles and Models
0.2 Mathematical Models
0.3 Purposes of Models

1 MATHEMATICAL REASONING

1.0 Introduction
1.1 Propositions
1.2 Predicates and Quantifiers
1.3 Quantifiers and Logical Operators
1.4 Logical Inference
1.5 Methods of Proof
1.6 Program Correctness
Axioms of Assignment

2 SETS

2.0 Introduction
2.1 The Primitives of Set Theory
‡2.2 The Paradoxes of Set Theory
2.3 Relations Between Sets
2.4 Operations on Sets
2.5 Induction
Inductive Definition of Sets
Recursive Procedures
Inductive Proofs
‡2.6 The Natural Numbers
2.7 Set Operations on Σ*

3 BINARY RELATIONS

Introduction
Binary Relations and Digraphs
Trees
Search Trees
Tree Traversal Algorithms
Special Properties of Relations
Composition of Relations
Closure Operations on Relations
Order Relations
‡Some Additional Concepts for Posets
Equivalence Relations and Partitions
‡Sums and Products of Partitions

4 FUNCTIONS

4.0 Introduction
4.1 Basic Properties of Functions
Inductively Defined Functions
Partial Functions
4.2 Special Classes of Functions
Inverse Functions
‡One-Sided Inverse Functions

5 COUNTING AND ALGORITHM ANALYSIS

5.0 Introduction
5.1 Basic Counting Techniques
Permutations and Combinations
Decision Trees
5.2 Asymptotic Behavior of Functions
Some Important Classes of Asymptotic Behavior
5.3 Recurrence Systems
Divide and Conquer Algorithms
5.4 Analysis of Algorithms
Searching Algorithms
Sorting Algorithms

6 INFINITE SETS

Introduction
Finite and Infinite Sets
Countable and Uncountable Sets
Comparison of Cardinal Numbers
Cardinal Arithmetic

7 ALGEBRAS

7.0 Introduction
7.1 The Structure of Algebras
7.2 Some Varieties of Algebras
Semigroups
Monoids
Groups
Boolean Algebras
7.3 Homomorphisms
7.4 Congruence Relations
7.5 New Algebraic Systems from Old
Quotient Algebras
Product Algebras

APPENDIX: THE PROGRAMMING LANGUAGE

ANSWERS TO SELECTED EXERCISES

BIBLIOGRAPHY

INDEX

‡Denotes optional section
PREFACE

This text is intended for use in a first course in discrete mathematics in an
undergraduate computer science curriculum. The level is appropriate for a sopho-
more or junior course. The student is assumed to have experience with a high-level
programming language. No specific mathematics is prerequisite, but some previous
exposure to college-level mathematics is desirable.
The mathematics taught to students of computer science has changed radically
since the early days of this academic discipline. Initially, nearly all topics were
drawn from electrical engineering and numerical analysis. Over the years, however,
the mathematics of computer science has developed a distinct character, incorpo-
rating and melding aspects from such areas as logic, universal algebra and combi-
natorics as well as analysis. Moreover, as the field has evolved, its use of
mathematics has become more sophisticated. It is our view that a computer scien-
tist must have substantial training in mathematics if he is to understand his tools
and use them well. The purpose of this text is to provide a foundation for the
discrete mathematics used in the theory and application of computer science.
The major part of this book treats classical mathematical topics, including
sets, relations, functions, cardinality, and algebra. The approach, however, is not
classical; we have emphasized the topics of importance to computer science and
provided examples to illustrate why the material is of interest.
The first two topics of the text are usually not treated explicitly in a course of
this type. Chapter 0 is a brief description of the nature and purpose of mathematical
models. Chapter 1 treats mathematical reasoning, including the representation of
assertions, how inferences are made, and how assertions are proved. The final
section of the chapter is a description of how programs can be proved correct. The
material of Chapter 1 is difficult for some students, especially those who have not
had some previous experience in proving theorems in a college-level mathematics
course. For this reason, many of the proofs in succeeding chapters are presented in
considerable detail with explicit references to the concepts and techniques of
Chapter 1. The symbol ∎ is used throughout the text to indicate the end of a
proof.
Chapter 2 begins with the usual topics of an introductory treatment of set
theory, and then proceeds to inductive definitions of sets, proofs by induction, and
recursive programs. The final section of the chapter treats languages, or sets of
symbol strings over a finite alphabet. These sets play an important role in computer
science, but they are usually not considered in an introduction to set theory.
Chapter 3 treats relations, using digraphs as a visual representation of binary
relations on sets. Trees, equivalence relations and order relations are covered, as
well as operations on relations, including composition and transitive closure.
Chapter 4 treats functions as a special class of relations. Several important
classes of functions are defined and their properties investigated.
Chapter 5 is a treatment of counting techniques and their application to al-
gorithm analysis. The first section introduces basic concepts, including permuta-
tions and combinations. The second section develops the concept of the asymptotic
behavior of a function and how it can be used to measure algorithm complexity.
Recurrence equations and their use in the analysis of algorithms are treated in the
next section. The final section of the chapter uses the tools developed in the first
three sections to investigate the optimality of several algorithms.
Chapter 6 treats infinite sets and cardinalities, emphasizing enumeration and
diagonalization. A cardinality argument is used to show the existence of a real
number which is not computable.
Chapter 7 is an introduction to the concepts of universal algebra, including
homomorphisms, congruence relations, and quotient and direct product algebras.
Semigroups, monoids, groups and Boolean algebras are described.
Through Chapter 4, the material of the text should be covered in the order in
which it is presented, although sections and subsections which are marked with a
double dagger (‡) can be omitted. The material of Chapter 1 is often ignored or
treated in a cursory fashion, but we feel that these fundamental concepts of mathe-
matics are better understood if studied explicitly. Many of the topics of Chapters 1
through 4 may have been studied previously by some students; these topics can be
covered as rapidly as is appropriate.
Chapters 5, 6 and 7 assume a knowledge of Chapters 1 through 4 but not each
other; any subset of these three chapters can be presented. It is our opinion that
Chapter 5 is the most important.
The examples which occur throughout the text range from very simple ones,
included only as illustrations of the definitions, through ones which are both difficult
and substantive. (The halting problem is treated in an example of Chapter 1, hash-
ing functions are described in an example in Chapter 4, and the existence of a non-
computable real number is established in an example of Chapter 6.) In a few of the
examples which relate the subject matter to applications, the reader may not be
familiar with terminology used (e.g., PERT charts); these examples are included for
the benefit of those who can easily understand them and should not cause concern
to those who cannot. A number sign (#) is used to denote the end of a collection
of examples.
Exercises are given at the end of each section in the approximate order in
which the topics are presented in the text; within topics, they are ordered according
to increasing difficulty. A problem marked ‡ treats material from an optional
subsection. The programming problems given at the end of some sections will usually
require additional specification before they can be worked by novice programmers.
For example, in a set theory problem, one might want to consider only sets with no
more than 100 elements.
This text has evolved over a period of several years. Preliminary versions
have been used extensively at the University of North Carolina at Chapel Hill
and North Carolina State University. It would be impossible to list all those
who have contributed to the final product. Jon Bentley, Don Johnson and
Neil Jones deserve particular mention; they provided comments and suggestions
on the entire manuscript. Others who made substantial contributions include
Peter Calingaert, James W. Hanson, Yale N. Patt, Stephen M. Pizer, James
Thatcher, Victor L. Wallace and Stephen F. Weiss. Anne Presnell and Dave Tolle
assisted in the preparation of problem solutions. Finally, we wish to thank the
many students who studied from the manuscript and contributed to its final form.
Our secretarial help has come from many quarters, but three individuals
deserve special mention. Nina Eaker worked on endless drafts and revisions in the
early stages of the manuscript, Gloria Edwards carried the work forward, and
Anne Edwards brought the manuscript to its final form. We thank them for their
help and support.

DONALD F. STANAT
DAVID F. McALLISTER
NOTATION

Logic
¬P  not P
P ∨ Q  P or Q
P ∧ Q  P and Q
P ⇒ Q  P implies Q
P ⇔ Q  P if and only if Q
∀  Universal quantifier: for all...
∃  Existential quantifier: there exists...
∃!  There exists a unique...

Numbers
⌈x⌉  the integer n such that x ≤ n < x + 1
⌊x⌋  the integer n such that x ≥ n > x − 1
N  the set of natural numbers, or nonnegative integers: 0, 1, 2, ...
I  the set of all integers: ..., −2, −1, 0, 1, 2, ...
I+  the set of positive integers: 1, 2, 3, ...
Q  the set of rational numbers.
Q+  the set of positive rational numbers.
R  the set of real numbers.
R+  the set of positive real numbers.
(a, b)  the open interval in R from a to b:
(a, b) = {x | x ∈ R ∧ a < x < b}.
[a, b]  the closed interval in R from a to b:
[a, b] = {x | x ∈ R ∧ a ≤ x ≤ b}.
(a, b]  the half-open interval in R from a to b:
(a, b] = {x | a < x ≤ b}.
[a, b)  the half-open interval in R from a to b:
[a, b) = {x | a ≤ x < b}.
(a, ∞)  {x | x ∈ R ∧ x > a}.
[a, ∞)  {x | x ∈ R ∧ x ≥ a}.
N_k  the set of integers {0, 1, 2, ..., k − 1}.

Sets
a ∈ A  a is an element of the set A.
a ∉ A  a is not an element of the set A.
A ⊂ B  the set A is contained in the set B.
A ⊄ B  the set A is not contained in the set B.
φ  the empty, or void set.
A ∪ B  the union of the sets A and B.
A ∩ B  the intersection of the sets A and B.
A − B  the relative complement of B with respect to A.
the absolute complement of A.
the power set of A.
⋃_{i∈C} A_i  {x | ∃i [i ∈ C ∧ x ∈ A_i]}.
⋂_{i∈C} A_i  {x | ∀i [i ∈ C ⇒ x ∈ A_i]}.
the cartesian product of A with B.
the cartesian product of the sets A_i, 1 ≤ i ≤ n.

Sets of character strings

Σ  a finite alphabet.
Λ  the empty string.
the length of a string x.
the set of all strings of finite nonzero length over the alphabet Σ.
the set of all strings of finite length over the alphabet Σ,
including Λ.
{xy | x ∈ A ∧ y ∈ B}.
{x_1 x_2 x_3 ... x_n | x_i ∈ A}.
⋃_{i∈I+} A^i.
⋃_{i∈N} A^i.

Relations and partitions


<a_1, ..., a_n>  the n-tuple whose ith component is a_i.
a is related to b under the relation R.
a is not related to b under the relation R.
the digraph with node set A and relation R.
the composite relation of R_1 with R_2.
the nth power of the relation R; the composition of R with
itself n times.
the reflexive closure of R.
the symmetric closure of R.
the transitive closure of R.
the converse of R: {<x, y> | <y, x> ∈ R}.
t(R)
rt(R)
a partial order.
a ≡ b (mod k)  a is equivalent to b modulo k.
[a]_R  the equivalence class of a with respect to R.
π  a partition.
A/R  the partition of A induced by the equivalence relation R.
π_1 + π_2  the sum of the partitions π_1 and π_2.
π_1 · π_2  the product of the partitions π_1 and π_2.

Functions
f(a)  the value of the function f for the argument a.
f: A → B  f is a function with domain A and codomain B.
f(A)  the image of the set A under the function f.
f ∘ g, or fg  the composite function of f with g.
the set of functions from B to A.
1_A  the identity function on the set A.
f^-1  the inverse of f.
f^-1(A)  the inverse image of A under f.
f|_A  the function f restricted to A.
χ_A  the characteristic function of A.

Cardinality and order notation


|A|  the cardinality of A.
P(n, r)  the number of permutations of n objects taken r at a time.
the number of combinations of n objects taken r at a time.
O(f)  the set of functions asymptotically dominated by f.
ℵ_0  Aleph null, the cardinality of N.
c  the cardinality of [0, 1].

Algebras
<S, ∘, k>  an algebra with carrier S, operation ∘, and constant k.
addition modulo k.
the product algebra of A with A’.
the quotient algebra of A’ with respect to the congruence
relation ~.

0

MATHEMATICAL MODELS

0.0 INTRODUCTION

The goal of this text is the development of mathematical concepts and techniques
which are fundamental to the field of computer science. We define computer science
broadly, as the discipline concerned with the representation and processing of
information. We consider computer science to lie somewhere between mathematics
and technology, close enough to each to be profoundly affected by developments
in either of these fields but dominated by neither. The mathematical topics we will
develop are classical ones which predate computer science, but which are generally
recognized as necessary and fundamental tools for the investigation of many
problems in the field. Our aim is to present these mathematical tools and illustrate
their use in characterizing the phenomena of computer science. In this chapter we
describe the ways in which mathematics can be used to represent objects of study.

0.1 PRINCIPLES AND MODELS


Observation is the ultimate basis of our understanding of the world around us.
But observation only provides information about the specific events which we
observe; alone, it provides little help for dealing with new situations. Useful knowl-
edge results from our ability to recognize similarities in different events, isolate the
important factors, and generalize from our experience. Generalization enables us
to operate effectively in new environments by using inferences drawn from past
experience.
Knowledge varies in sophistication from simple classification to understand-
ing based on a system of principles. A principle is a generalization, or an abstract
assertion. Principles are expressed in a variety of ways ranging from “old saws”
to equations which express relationships between physical properties. The follow-
ing assertions are examples of principles.

“Virtue is its own reward.”


“All matter is composed of earth, air, fire, and water.”
“Ontogeny recapitulates phylogeny.”
“F = ma.”

Principles vary in their validity as well as their precision. They also vary in their
importance and the degree to which they affect the way we think and act.
The concept of “model” is even more vague than that of “principle.” Roughly
speaking, a model is an analogy for some object or phenomenon of interest. As
we will use the term, models are used to “explain” a process or to predict an event.
For example, a wind tunnel, used with a miniature replica of an aircraft, makes it
possible to predict some characteristics of the aircraft’s performance, since the
behavior of the full-sized craft is strongly related to that of the model. Similarly,
a world globe allows us to estimate distance between locations on the earth, and
an orrery provides a visual model of the movement of the planets about the sun.
Genetic models for the transfer of traits provide a basis for predicting the
frequencies with which inherited characteristics will appear in successive generations.
Models can also be misleading. A medieval model of human reproduction
proposed that babies develop from homunculi contained ab initio in a woman’s
body. Of course, female homunculi also contained other homunculi nested within.
Since it was felt that this nesting could not go on without limit, this model had the
uncomfortable implication that the race would become extinct, since reproduction
would cease after the innermost homunculi were born. A model of our universe
which was commonly accepted in the fifteenth century predicted that Columbus
would not return from his voyage to the west. This model of a flat earth of finite
extent was clearly an important one, partly because of its influence on exploration,
but it seems wrong to call it a valuable model. The value of a model might best be
defined as the degree to which it enables us to answer questions and make
predictions correctly.
Mathematics, because of its rigor and lack of ambiguity, has always provided
a good language for the expression of principles. Models based on mathematically
stated principles are called mathematical models. The purpose of this text is to
develop mathematics for expressing principles and constructing models in
computer science. While the mathematical topics we treat cannot be nicely categorized,
our emphasis will be on what is often referred to as discrete mathematics.

0.2 MATHEMATICAL MODELS

A mathematical model is a mathematical characterization of a phenomenon or
process. Such a definition is necessarily imprecise, but some illustrations should
establish the notion. A mathematical model has three essential parts: a process or
phenomenon which is to be modeled, a mathematical structure capable of expressing
the important properties of the object to be modeled, and an explicit correspondence
between the two. Such a model is represented by Fig. 0.2.1. Some comments
will help to clarify the concept.

[Fig. 0.2.1 Components of mathematical model: a Real-world Process and a Mathematical Structure, linked by a correspondence between properties of the process and those of the structure]

(a) The first component of a model is a phenomenon or process which we
wish to characterize mathematically. Examples include physical processes,
such as planetary motion, fluid flow, or the pattern of weather change,
as well as such things as economic processes, learning patterns, and so
on. Examples in computer science include the execution of a program,
the allocation of resources of a computation center, and the flow of
information in a computer network. Although the phenomena of interest
need not be taken from the “real world,” they usually are, and in our
discussion, the phrase “real world” will denote this component of a
mathematical model. The real world component is described quantita-
tively by such things as parameter values and times at which events occur.
(b) The second component of a model is an abstract mathematical structure.
The set of integers with the operations of addition and multiplication
provides one example of such a structure. In itself, this structure is abstract
and has no intrinsic relation to the real world. However, because of its
abstractness, the structure can be used to model many different phenom-
ena. Every mathematical structure has an associated language for
making assertions. In our familiar system of algebra, the assertions
5 + 6 < 10, and 7x + y = 18
can both be made, although one is incorrect. If a mathematical model
is successful, the language of its mathematical structure can be used to
make assertions about the object being modeled.
(c) The third component of a model is a specification of the way in which
the real world is represented by the mathematical structure, that is, a
correspondence between the elements of the first component and those of
the second. Parameters, relationships, and occurrences in the real world
will be associated with such things as variables, equations, and operations
in the mathematical structure. This correspondence makes possible the
use of the mathematical structure to describe those facets of the real
world which are of interest.
Mathematical models, as described here, pervade our culture, particularly in
quantitative areas such as economics and physics. The following example provides
an illustration of the three components of a model and demonstrates that models
are common and familiar objects.

Example
Every business must keep track of the cash received from sales each day. A
mathematical model is used for this purpose. The first component of the model is
the process of accumulating money from sales. The set of integers (denoting cents),
together with the operation of addition, provides a simple but appropriate mathe-
matical structure. Receiving cash from a sale corresponds to adding the amount of
the sale to the current receipts. The principal parameter of the model represents
cash received. This parameter takes on integer values; at the beginning of the day
its value is 0, and at any time during the day the value of the parameter is the current
amount received from sales. The occurrence of a cash sale is represented in the
structure by the operation of addition; selling an item worth k cents is represented by
adding k to the current value of the parameter. At the end of the business day,
the store owner can determine the total cash receipts by noting the value of the
parameter. #
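
The cash-receipts model above can be sketched in a few lines of Python. This is only an illustration, not something taken from the text: the class name `CashRegisterModel`, the method `record_sale`, and the three sale amounts are hypothetical, and amounts are integers denoting cents, as in the example.

```python
class CashRegisterModel:
    """Daily cash receipts modeled by an integer (cents) under addition."""

    def __init__(self):
        # At the beginning of the day the parameter has the value 0.
        self.receipts_cents = 0

    def record_sale(self, k):
        """A cash sale of k cents corresponds to adding k to the parameter."""
        if not isinstance(k, int) or k < 0:
            raise ValueError("sale amount must be a nonnegative integer number of cents")
        self.receipts_cents += k
        return self.receipts_cents


# Hypothetical sales for one business day, in cents.
register = CashRegisterModel()
for sale in [250, 1999, 75]:
    register.record_sale(sale)

# At the end of the day the total is read off from the parameter.
total_cents = register.receipts_cents
print(total_cents)
```

Like the model it illustrates, the sketch records only the running total; the denominations received are deliberately not represented.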

The above example illustrates all the crucial points of our description of a
mathematical model. It also illustrates that mathematical models ignore certain
aspects of the real world process. For example, the model described above does
not keep track of how many one dollar bills or how many pennies have been
received. This failure is not considered to be a defect of the model, since the store
owner is willing to assume that the actual form of currency received will not cause
him any particular inconvenience. If, however, all of his income for one day hap-
pened to be in pennies, he might find himself with a serious transportation problem
when it came time to take the day’s receipts to the bank. Other factors which are
ignored by the model may be more important. The model does not try to answer
such questions as how the storeowner can maximize his profits. It is legitimate to
use a mathematical model to deal with this kind of question, but the question is
beyond the scope of a model designed simply to keep track of the store’s daily
receipts. Thus, the suitability of a mathematical model depends strongly on the
problem at hand. Ideally, we want a model to represent everything that is impor-
tant about the process and ignore everything else. It is difficult to realize this ideal,
because we are often not sure what aspects of the real world are important. In
fact, the process of deciding which aspects are important can be one of the most
difficult and rewarding steps in specifying a mathematical model.

Without going into detail, we can give examples of more elaborate mathe-
matical models and describe how they are used.

Examples
(a) A set of simultaneous partial differential equations is useful as a mathematical
structure to describe planetary motion. Newton first proposed such a model
based on observations of the planets and his work on gravitational attraction.
(b) Differential equations are used to determine the flow of current in electrical
circuits by establishing a correspondence between the parts of an electrical
circuit and the terms of mathematical equations. The same equations can be
used to describe mechanical systems involving objects with mass, springs, and
damping devices called dashpots. Thus, the same mathematical structure can be
used in models of entirely different phenomena. These examples also show that
not all models need be mathematical: a mechanical system consisting of springs,
masses, and dashpots can be used as a mechanical model of an electrical circuit,
and vice versa. Analog computers exploit this fact and use electrical models to
solve problems which are expressed mathematically.

(c) Mathematical models are the basis for all computer simulations. Consider the
problem of simulating the operation of a computer center. We can view a com-
puter center as a system which accepts programs and program data as inputs
and produces outputs in a variety of forms, including program listings and
program output. At any time, the state of the system is described by parameter
values which specify what programs are being executed, which disk and tape
drives are busy, the length of the input queue, etc. Other parameters, such as
average turnaround time and the total number of programs processed, can be
used to measure the performance of the system. A mathematical model for
simulation of the system in discrete time steps will incorporate these parameters
into a set of mathematical equations which describe how the values of the
parameters and the system input at any time t can be used to determine the
values of the parameters at time t + 1. Different machine configurations and
different operation policies will be represented by different sets of equations. The
system is simulated by hypothesizing initial parameter values for time t = 0 and
then successively solving the equations for times t = 1, 2, 3, ..., n. If the
simulation is successful, then the system parameters at time t = n will accurately
forecast the behavior of the system. Such simulation models can be used as a
basis for choosing among various alternatives, e.g., the performance of a model
can be used to predict the result of a proposed change in either a hardware
configuration or in operations policy. #
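
The discrete-time scheme of example (c) can be sketched as follows. This is a toy illustration under stated assumptions, not a model from the text: the state consists only of the input-queue length and the number of programs processed, the "equations" are the update rule in `step`, and the names `step`, `simulate`, and `capacity`, as well as the input sequence, are hypothetical.

```python
def step(state, arrivals, capacity):
    """Compute the parameter values at time t + 1 from those at time t.

    state    -- (queue_length, processed) at time t
    arrivals -- programs submitted during this time step (the system input)
    capacity -- programs the center can run per step (an assumed policy parameter)
    """
    queue_length, processed = state
    waiting = queue_length + arrivals
    completed = min(waiting, capacity)
    return (waiting - completed, processed + completed)


def simulate(initial_state, inputs, capacity):
    """Apply the update rule for t = 1, 2, ..., n and return every state."""
    states = [initial_state]
    for arrivals in inputs:
        states.append(step(states[-1], arrivals, capacity))
    return states


# Hypothesized initial values at t = 0 and a made-up input sequence.
inputs = [3, 5, 0, 2]
history = simulate((0, 0), inputs, capacity=2)
for t, (queue_length, processed) in enumerate(history):
    print(t, queue_length, processed)
```

Rerunning `simulate` with a different `capacity` corresponds to representing a different machine configuration by a different set of equations, which is how such a model can be used to compare alternatives.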

The rapid progress of computer science is largely due to the development of
appropriate mathematical tools. Mathematical models have been applied
successfully to a broad range of problem areas, including
    design of computers and computer systems,
    allocation of resources of computer systems, such as paging algorithms for
    storage management,
    analysis of the cost of algorithm execution,
    measurement of the intrinsic “difficulty” of certain classes of problems,
    development of programming languages and language processors, and
    methods of proving program correctness.
We will mention some of these problems explicitly when the appropriate mathe-
matical tools are developed. But more importantly, we will develop a basis for the
treatment of all these topics. Thus, although it is not feasible to present even super-
ficial treatments of all these problems, a contemporary approach to any of them
would be based on the mathematical topics which we treat here.

0.3 PURPOSES OF MODELS

The purposes of models fall into three categories. In the most straightforward
applications, models are used to present information in an easily assimilated form.
For example, graphs may be used to present genealogies and family trees. It is
much easier to decide if cousin Joseph is a descendant of great-grandfather John’s
sister Martha when we have a drawing of the family tree before us instead of a
written record of marriages and offspring. In the same way, a roadmap provides
a descriptive model of a highway network. Planning a trip would not be so easy if,
instead of a roadmap, we had a list of distances between adjacent cities.
A second use of models is to provide a convenient method for performing
certain computations. Familiar examples include optimization methods and
Fourier analysis. The choice of a model for the purpose of computation is often
directly affected by the set of available mathematical techniques. For example, a
system known to have nonlinear components may be modeled approximately with
a set of linear equations so that linear programming can be used to estimate a
solution.
Thirdly, models are used for investigation and prediction. Simulation, both
with physical models and with computers, is an excellent example. The Wright
brothers invented the wind tunnel so they could use physical models to compare
the lifts of different airfoils. Analogous experiments in water tanks use models of
ship hulls to determine which shapes produce the least turbulence and drag.
Models are frequently used to predict parameter values of events which have not
yet occurred, such as the time of tomorrow’s sunrise or the implications for the
national economy of a change in the tax laws. The equations used for calculating
the time of tomorrow’s sunrise are well established and thoroughly tested; con-
sequently, we have a great deal of faith in these predictions. The same is not true
for current models of the national economy, and our prediction in this case is not
likely to be so accurate. In many cases, the predictive ability of a model determines
its worth. The value of Newton’s model of planetary motion was established
beyond any doubt when deviations from the model’s predictions led to the discovery
of the planet Neptune. The location of Neptune was estimated by determining
what could be the source of observed deviations from the predicted orbit of
Uranus.

A mathematical model is an abstraction which associates parameters and
processes of the real world with expressions and operations in a mathematical
structure. This abstraction allows us to ignore those aspects of the real world
which are not of interest and provides a framework for studying those which are.
If the model is successful, the properties of the mathematical structure are strongly
related to the phenomena being studied.
Much of this text will be devoted to the development of mathematical struc-
tures for models which are important to computer science. Our goal is to provide
a basis for a reasonable and fruitful correspondence between the computational
process and the mathematical structures which we use to represent it.

Suggestions for Further Reading

The first chapter of Maki and Thompson [1973] gives an excellent description
of how models are built and refined. Their discussion treats the roles of axioms
and theorems in models and provides a basis for some of the topics of our next
chapter. Chapter 2 of their book is a collection of case studies from a variety of
areas. The first chapter of Roberts [1976] is also a good description of model
types and the modeling process.
MATHEMATICAL REASONING

1.0 INTRODUCTION

Mathematics is the study of the properties of mathematical structures. In this
chapter we will study mathematical reasoning, which is the process used to verify
these properties.
A mathematical structure is defined by a set of axioms. By definition, an axiom
is a true statement about the properties of the structure. Other true assertions which
can be inferred from the truth of the axioms are called theorems. A proof of a
theorem is an argument which establishes that the theorem is true for a specified
mathematical structure. A proof is often presented as a sequence of assertions such
that each assertion is either an axiom of the mathematical structure, a previous
theorem, or a logical inference from previous steps of the proof. Therefore, in
order to prove theorems, we must be able to make assertions about mathematical
structures and to determine when one assertion follows from others. To establish
that one assertion follows from another, we must use only principles of reasoning
which we accept as valid; these principles are called rules of inference.
In this chapter we will study how to make careful assertions about mathe-
matical structures as well as how to combine these assertions and draw conclusions
from them. Because of the importance of these topics to any development of the
theory of computer science, we will treat them carefully. The concepts and tools
we develop in this chapter are directly relevant to certain important areas, such as
proving programs correct. Our primary concern, however, is with the more general
topic of mathematical reasoning, and our goal is to develop the student’s ability
to discern and construct sound mathematical arguments.
The material in this chapter is a mathematical model of the reasoning process,
or careful argument. It also serves as a brief introduction to some of the concepts
and notations of mathematical logic.

1.1 PROPOSITIONS

An assertion is a statement. A proposition is an assertion which is either true or
false, but not both.† If a proposition is true, we say it has a “truth value” of true;
if a proposition is false, its truth value is false.

Examples
The following are all propositions:
(a) The moon is made of green cheese.
(b) 4 is a prime number.
(c) 3+3=6.
(d) 2 is an even integer and 3 is not.
(e) It snowed on the island that is now called Manhattan on the day the King of
England signed the Magna Carta.
(f) My most recently written computer program always halts if allowed to run for
a sufficiently long time.
Of the above propositions, (a) and (b) are false, (c) and (d) are true, and (e)
may or may not be true; we have no way of ascertaining its truth value. Neverthe-
less, we assume the assertion is either true or false and therefore classify it as a
proposition. The truth of proposition (f) may be difficult to determine; establishing
the truth of such assertions is the subject of some profound mathematical results in
the theory of computation.

The following are not propositions:
(g) x + y > 4.
(h) x = 3.
(i) Are you leaving?
(j) Buy four of them.
The first example is an assertion but not a proposition because its truth value de-
pends on the values of x and y. Similarly, the truth value of the second assertion
depends on the value of x. Examples (i) and (j) are not assertions and are therefore
not propositions. #

†A system in which propositions must be either true or false is said to use a two-valued logic.
The characteristic that “a proposition which is not true is false, and vice-versa” is known as the
law of the excluded middle. Some mathematicians do not consider the law of the excluded middle
to be an accurate reflection of our reasoning. To understand some of the reasons for rejecting the
law of the excluded middle and for a description of logical systems with more than two truth
values, the reader is referred to Rescher [1969].

A propositional variable denotes an arbitrary proposition with an unspecified
truth value. We will use the letters P, Q, R, ... for propositional variables. Propositions
as well as propositional variables can be combined to form new assertions
using words such as “and,” “or,” and “not.” For example, from the propositions
“John is six feet tall” and “There are four cows in the barn,” we can form

“John is six feet tall and there are four cows in the barn.”
“John is six feet tall or there are four cows in the barn.”
“John is not six feet tall.”

In the same way,

“P and Q”
“P or Q”
“not P”

are assertions which can be formed from the propositional variables P and Q.
In expressions such as the above, the variables P and Q are called operands, and
the words “and,” “or,” and “not” are called logical operators, or logical connec-
tives. Logical connectives denote operations on propositions in the same way that
“plus” and “times” denote operations on numbers. This terminology is common
throughout mathematics; for example, in algebra the expression “4 + x” has 4
and x as operands and + as an operator.
An assertion which contains at least one propositional variable is called a
propositional form. When propositions are substituted for the variables of a propo-
sitional form, a proposition results. Thus, if P represents “John is six feet tall”
and Q represents “Two is a prime number,” the propositional form “P and Q”
represents the proposition “John is six feet tall and two is a prime number,” and
“not P” represents “It is false that John is six feet tall.” When no confusion will
result, we will often refer to propositional forms as propositions. The principal
distinction between propositions and propositional forms is that every proposition
has a truth value whereas a propositional form is an expression whose truth value
may not be determined until propositions are substituted for its propositional
variables.
When a logical operator is used to construct a new proposition from old ones,
the truth value of the new proposition depends on both the logical operator and
the truth values of the original propositions. We will now discuss how the logical
operators “and,” “or,” and “not” affect the truth value of propositions. We will
see that the meaning of the logical operators does not always coincide precisely
with English usage.
The logical operator “not,” or negation, is denoted by the symbol ¬. Let P
denote a proposition; then “P is not true” is a proposition which we represent by
“¬P” and refer to as “not P,” or the negation of P. It follows from the law of the
excluded middle that ¬P is true if P is false, and vice versa. The relationship
between the truth value of ¬P and that of P is defined by a truth table for the logical
operator ¬. The truth table of a logical operator specifies how the truth value
of a proposition using that operator is determined by the truth values of the operands.
A truth table lists all possible combinations of truth values of the operands
in the leftmost columns and the truth value of the resulting proposition in the
rightmost column. The truth table for ¬ is the following:

P        ¬P
--------------
false    true
true     false

In order to make truth tables easier to read, we will generally use the symbol
1 to denote true and 0 to denote false. Using this convention, the truth table for ¬
is given as

P    ¬P
-------
0    1
1    0

While negation changes one proposition into another, other logical operators
combine two propositions to form a third. An example is the logical operator
“and,” which we will denote by the symbol ∧. If P and Q are propositions, then
“P and Q” is a proposition which we represent by “P ∧ Q” and refer to as the
conjunction of P and Q. The following truth table defines the logical operator ∧.

P  Q    P ∧ Q
-------------
0  0      0
0  1      0
1  0      0
1  1      1

The truth table defines P ∧ Q to be true if and only if both P and Q are true.
Like “and,” the logical operator “or,” denoted by the symbol ∨, combines
two propositions to form a third. If P and Q are propositions, then the proposition
“P or Q” is called the disjunction of P and Q and is denoted by “P ∨ Q.”
The following truth table defines the logical operator ∨.

P  Q    P ∨ Q
-------------
0  0      0
0  1      1
1  0      1
1  1      1

It follows from the truth table that P ∨ Q is true if at least one of P or Q is true.
This operator is known as “logical or” or “inclusive or.” One can also define an
“exclusive or,” denoted by ⊕, by the following truth table:

P  Q    P ⊕ Q
-------------
0  0      0
0  1      1
1  0      1
1  1      0

The English language uses the word “or” to denote both the “inclusive or”
and the “exclusive or.” For example, an “inclusive or” is intended in the sentence
“It will rain or snow today”
since the speaker would presumably not be branded a liar if it both rained and
snowed. On the other hand,
“You have to wash the dishes or you must clean the garage”
is not likely to be considered a true statement if, in fact, you are required to wash
the dishes and to clean the garage as well. In mathematics, we use different symbols
for the “inclusive or” and the “exclusive or” to preclude any ambiguity.
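
The operators defined so far can be tried out directly in a programming language. The following is a minimal sketch in Python, not part of the original text; the function names `inclusive_or` and `exclusive_or` are hypothetical, chosen only for illustration. It prints the truth tables of ∧, ∨, and ⊕ side by side, using 1 for true and 0 for false.

```python
from itertools import product

# Hypothetical helper names; Python's built-in `and`, `or`, and `not`
# already implement conjunction, inclusive or, and negation on bools.
def inclusive_or(p: bool, q: bool) -> bool:
    """True if at least one operand is true."""
    return p or q

def exclusive_or(p: bool, q: bool) -> bool:
    """True if exactly one operand is true."""
    return p != q

print("P Q  and  or  xor")
for p, q in product([False, True], repeat=2):
    print(int(p), int(q), "", int(p and q), "  ", int(inclusive_or(p, q)),
          "  ", int(exclusive_or(p, q)))
```

The two forms of “or” differ only on the line where both operands are true.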
The logical operator “implies” is denoted by the symbol ⇒; the proposition
“P implies Q” is represented by “P ⇒ Q” and is called an implication. The operand
P is called the premise, hypothesis, or antecedent, and Q is called the conclusion
or consequence. The truth table for the operator ⇒ is the following:

P  Q    P ⇒ Q
-------------
0  0      1
0  1      1
1  0      0
1  1      1

The proposition P ⇒ Q is false only when P is true and Q is false. Implications may
be stated in a number of ways; the assertion P ⇒ Q may be expressed as
“If P, then Q”
“P only if Q”
“P is a sufficient condition for Q”
“Q is a necessary condition for P”
“Q if P”
“Q follows from P”
“Q provided P”
“Q is a logical consequence of P”
“Q whenever P.”
The converse of P ⇒ Q is the proposition Q ⇒ P, and the contrapositive of P ⇒ Q
is the proposition ¬Q ⇒ ¬P. If P ⇒ Q is true, then P is said to be a stronger
assertion than Q; thus “x is a positive integer” is a stronger assertion than “x
is an integer.”
The English language uses implication to assert a causal or inherent relation-
ship between a premise and a conclusion. Thus, “If I fall in the lake, then I will
get wet” relates a cause to its effect, and “If I am a man, I am mortal” characterizes
a property of men. However, in the language of propositions, the premise of an
implication need not be related to the conclusion in any substantive way. This
can be disturbing, as illustrated by the following example.

Example
If P represents “oranges are purple” and Q represents “the earth is not flat,”
then P ⇒ Q represents “If oranges are purple, then the earth is not flat.” Although
no causal or inherent relationship holds between the color of oranges and the shape
of the earth, the implication P ⇒ Q is true since the premise is false and the conclusion
is true. #
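
The behavior of implication can be checked mechanically. The sketch below is an illustration only, with a hypothetical function name `implies`; it encodes P ⇒ Q as “false only when P is true and Q is false,” evaluates the oranges example, and confirms that an implication always agrees with its contrapositive but need not agree with its converse.

```python
def implies(p: bool, q: bool) -> bool:
    """P => Q is false only when the premise is true and the conclusion is false."""
    return not (p and not q)

oranges_are_purple = False   # the premise of the example
earth_is_not_flat = True     # the conclusion of the example
print(implies(oranges_are_purple, earth_is_not_flat))  # True

# The contrapositive (not Q) => (not P) has the same truth value on every line.
for p in (False, True):
    for q in (False, True):
        assert implies(p, q) == implies(not q, not p)

# The converse Q => P can differ: take P false and Q true.
print(implies(False, True), implies(True, False))  # True False
```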

If P and Q have the same truth values, then they are said to be logically equivalent
propositions. A logical operator called “equivalence” and denoted by ⇔ produces
a true proposition if the operand propositions are logically equivalent. The
truth table which defines the operator “equivalence” is the following:

P  Q    P ⇔ Q
-------------
0  0      1
0  1      0
1  0      0
1  1      1

Comparison of the truth tables for implication and equivalence shows that
if P ⇔ Q is true, then P ⇒ Q and Q ⇒ P are both true. Conversely, if both P ⇒ Q
and Q ⇒ P are true, then P ⇔ Q is true. For these reasons, the terminologies for
equivalence and implication are closely related. The proposition P ⇔ Q is read
“P is equivalent to Q,” “P is a necessary and sufficient condition for Q,” or “P
if and only if Q.” The abbreviation “iff” is often used to represent the phrase “if
and only if.”
Other logical operators can be defined and are of interest for a variety of
reasons; some of them will be described in the exercises of this section.
Truth tables for individual operators can be used to construct truth tables for
arbitrarily complex propositional forms. The truth table for a propositional form
specifies its truth value for every possible combination of truth values of its propo-
sitional variables. Each propositional variable can assume either of two values,
true or false. Therefore, if k variables occur in a proposition, the associated truth
table must describe 2^k cases. Each case occurs as a separate line in the truth table.

Examples
(a) Construct a truth table for the proposition (Q ∧ ¬P) ⇒ P.

P  Q    ¬P    Q ∧ ¬P    (Q ∧ ¬P) ⇒ P
------------------------------------
0  0    1       0            1
0  1    1       1            0
1  0    0       0            1
1  1    0       0            1

(b) Construct a truth table for the proposition [(P ∧ Q) ∨ ¬R] ⇔ P.

P  Q  R    P ∧ Q    ¬R    (P ∧ Q) ∨ ¬R    [(P ∧ Q) ∨ ¬R] ⇔ P
-------------------------------------------------------------
0  0  0      0      1          1                  0
0  0  1      0      0          0                  1
0  1  0      0      1          1                  0
0  1  1      0      0          0                  1
1  0  0      0      1          1                  1
1  0  1      0      0          0                  0
1  1  0      1      1          1                  1
1  1  1      1      0          1                  1

In the above truth tables, we have used two conventions which aid readability:
(i) All propositional variables occur in the leftmost columns.
(ii) Truth values are assigned to the propositional variables by “counting in
binary” from 0 to 2^k − 1, where k is the number of propositional variables.
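
The construction just described is easy to mechanize. The following Python sketch is one possible approach, not the book's own program; the names `truth_table` and `example_a` are hypothetical. It enumerates the 2^k lines by counting in binary, with the leftmost variable changing most slowly, and reproduces the table for example (a).

```python
from itertools import product

def truth_table(form, k):
    """Return the truth table of `form`, a function of k boolean arguments.

    Each row holds the k variable values followed by the value of the form.
    Rows appear in binary counting order, from 0 to 2**k - 1.
    """
    rows = []
    for values in product([0, 1], repeat=k):
        result = form(*(bool(v) for v in values))
        rows.append(values + (int(result),))
    return rows

def example_a(p, q):
    """(Q and not P) => P, with A => B written as (not A) or B."""
    return (not (q and not p)) or p

for row in truth_table(example_a, 2):
    print(*row)
```

Each printed row lists P, Q, and the value of the whole form; intermediate columns such as ¬P are omitted in this sketch.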
A tautology is a propositional form whose truth value is true for all possible
values of its propositional variables, e.g., P ∨ ¬P. A contradiction or absurdity
is a propositional form which is always false, such as P ∧ ¬P. A propositional
form which is neither a tautology nor a contradiction is called a contingency.
Properties of a propositional form can sometimes be determined by construct-
ing an “abbreviated” truth table. For example, if we wish to show that a proposi-
tional form is a contingency, it suffices to exhibit two lines of the truth table, one
of which makes the proposition true and another that makes it false. To determine
if a propositional form is a tautology, it is only necessary to check those lines of the
truth table for which the proposition could be false.

Example
Consider the problem of determining whether (P ∧ Q) ⇒ P is a tautology. We
will use an abbreviated truth table. If an implication A ⇒ B is false, then A must be
true and B must be false. The truth table for (P ∧ Q) ⇒ P has only one line where
the value of the premise P ∧ Q is true. Since this is the only instance where
(P ∧ Q) ⇒ P could be false, it suffices to consider this line.

P  Q    P ∧ Q    (P ∧ Q) ⇒ P
----------------------------
1  1      1           1

Since the value of the propositional form for this line is true, it follows that the
proposition is a tautology. #
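
When a machine does the work there is little reason to abbreviate the table. As a rough sketch (the function name `classify` is an assumption made for illustration), the following Python code examines every line of the truth table and reports whether a form is a tautology, a contradiction, or a contingency.

```python
from itertools import product

def classify(form, k):
    """Classify a propositional form of k variables by its full truth table."""
    values = {bool(form(*vs)) for vs in product([False, True], repeat=k)}
    if values == {True}:
        return "tautology"
    if values == {False}:
        return "contradiction"
    return "contingency"

print(classify(lambda p: p or not p, 1))                # tautology
print(classify(lambda p: p and not p, 1))               # contradiction
print(classify(lambda p, q: (not (p and q)) or p, 2))   # tautology: (P and Q) => P
print(classify(lambda p, q: p and q, 2))                # contingency
```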

It is often convenient to replace one propositional form by another which is
logically equivalent. If two propositional forms are logically equivalent, one can
be substituted for the other in any proposition in which they occur; thus, since P
is logically equivalent to P ∨ P, it follows that P ∨ Q is logically equivalent to
(P ∨ P) ∨ Q. Table 1.1.1 is a list of important equivalences, often called identities.
The symbols P, Q, and R represent arbitrary propositional forms. The symbol
“1” is used to denote either a tautology or a true proposition. Likewise, the symbol
“0” represents a false proposition or a contradiction. The names which appear
to the right of the identities refer to properties and “rules of inference” which will
be discussed later.
Certain of the identities are particularly important. Identity 18 permits the
replacement of implications by disjunctions. Identities 7 and 8 permit the replacement
of disjunctions by conjunctions and vice versa. Most of the identities in
Table 1.1.1 have straightforward intuitive interpretations; all of them can be
established by constructing truth tables.

Table 1.1.1 LOGICAL IDENTITIES

 1. P ⇔ (P ∨ P)                              idempotence of ∨
 2. P ⇔ (P ∧ P)                              idempotence of ∧
 3. (P ∨ Q) ⇔ (Q ∨ P)                        commutativity of ∨
 4. (P ∧ Q) ⇔ (Q ∧ P)                        commutativity of ∧
 5. [(P ∨ Q) ∨ R] ⇔ [P ∨ (Q ∨ R)]            associativity of ∨
 6. [(P ∧ Q) ∧ R] ⇔ [P ∧ (Q ∧ R)]            associativity of ∧
 7. ¬(P ∨ Q) ⇔ (¬P ∧ ¬Q)                     DeMorgan’s Laws
 8. ¬(P ∧ Q) ⇔ (¬P ∨ ¬Q)                     DeMorgan’s Laws
 9. [P ∧ (Q ∨ R)] ⇔ [(P ∧ Q) ∨ (P ∧ R)]      distributivity of ∧ over ∨
10. [P ∨ (Q ∧ R)] ⇔ [(P ∨ Q) ∧ (P ∨ R)]      distributivity of ∨ over ∧
11. (P ∨ 1) ⇔ 1
12. (P ∧ 1) ⇔ P
13. (P ∨ 0) ⇔ P
14. (P ∧ 0) ⇔ 0
15. (P ∨ ¬P) ⇔ 1
16. (P ∧ ¬P) ⇔ 0
17. P ⇔ ¬(¬P)                                double negation
18. (P ⇒ Q) ⇔ (¬P ∨ Q)                       implication
19. (P ⇔ Q) ⇔ [(P ⇒ Q) ∧ (Q ⇒ P)]            equivalence
20. [(P ∧ Q) ⇒ R] ⇔ [P ⇒ (Q ⇒ R)]            exportation
21. [(P ⇒ Q) ∧ (P ⇒ ¬Q)] ⇔ ¬P                absurdity
22. (P ⇒ Q) ⇔ (¬Q ⇒ ¬P)                      contrapositive
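
Establishing an identity by truth table amounts to checking that both sides agree on every line. The Python sketch below illustrates this for three of the identities; the helper names `implies` and `equivalent` are hypothetical and not drawn from the text.

```python
from itertools import product

def implies(p, q):
    """P => Q, defined from its truth table: false only for P true, Q false."""
    return not (p and not q)

def equivalent(f, g, k):
    """True if the k-variable forms f and g agree on all 2**k lines."""
    return all(bool(f(*vs)) == bool(g(*vs))
               for vs in product([False, True], repeat=k))

# Identity 7 (DeMorgan): not (P or Q)  <=>  (not P) and (not Q)
print(equivalent(lambda p, q: not (p or q),
                 lambda p, q: (not p) and (not q), 2))
# Identity 18 (implication): (P => Q)  <=>  (not P) or Q
print(equivalent(implies, lambda p, q: (not p) or q, 2))
# Identity 20 (exportation): ((P and Q) => R)  <=>  (P => (Q => R))
print(equivalent(lambda p, q, r: implies(p and q, r),
                 lambda p, q, r: implies(p, implies(q, r)), 3))
```

All three checks print True. Replacing either side by an inequivalent form makes the corresponding check print False.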
If propositional forms are not carefully written, ambiguities in their interpretation
can arise. For example, the expression P ⇒ Q ⇒ R could be interpreted as
(P ⇒ Q) ⇒ R or P ⇒ (Q ⇒ R). Since these two expressions are not logically
equivalent, the ambiguity is not acceptable and parentheses must be used to specify
which expression is intended. However, (P ∧ Q) ∧ R and P ∧ (Q ∧ R) are
logically equivalent (by identity 6), and consequently the use of P ∧ Q ∧ R does
not result in an ambiguous truth value. Parentheses are often deleted if all interpretations
are equivalent propositional forms. For example, we commonly write
P ∧ Q ∧ R, P ∨ Q ∨ R, and P ⇔ Q ⇔ R. We will adopt one further convention
for reducing the number of parentheses in an expression: the negation sign will
apply to the smallest possible subexpression consistent with the parentheses. Thus,
¬P ∨ Q will denote (¬P) ∨ Q rather than ¬(P ∨ Q).
Identities such as those in Table 1.1.1 can be used to show relationships
between propositional forms and to find logically equivalent expressions.

Example
Simplify the following propositional form:
[(A ⇒ B) ∨ (A ⇒ D)] ⇒ (B ∨ D).
The numbers at the right indicate which identities are applied at each step.

[(¬A ∨ B) ∨ (¬A ∨ D)] ⇒ (B ∨ D)          (18)
[¬A ∨ (B ∨ D)] ⇒ (B ∨ D)                 (3, 5, 1)
¬[¬A ∨ (B ∨ D)] ∨ (B ∨ D)                (18)
[A ∧ ¬(B ∨ D)] ∨ (B ∨ D)                 (7, 17)
(A ∨ B ∨ D) ∧ [¬(B ∨ D) ∨ (B ∨ D)]       (3, 10)
(A ∨ B ∨ D) ∧ 1                          (15)
A ∨ B ∨ D                                (12)
#
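
A derivation of this kind can be double-checked by confirming that every line has the same truth table as the original form. The following Python sketch does so for the example above; it is an independent check written for illustration, and the names `steps` and `same_truth_table` are hypothetical.

```python
from itertools import product

def implies(p, q):
    """P => Q, false only when P is true and Q is false."""
    return not (p and not q)

# The original form followed by each line of the derivation, as functions of A, B, D.
steps = [
    lambda a, b, d: implies(implies(a, b) or implies(a, d), b or d),
    lambda a, b, d: implies((not a or b) or (not a or d), b or d),
    lambda a, b, d: implies(not a or (b or d), b or d),
    lambda a, b, d: not (not a or (b or d)) or (b or d),
    lambda a, b, d: (a and not (b or d)) or (b or d),
    lambda a, b, d: (a or b or d) and (not (b or d) or (b or d)),
    lambda a, b, d: (a or b or d) and True,
    lambda a, b, d: a or b or d,
]

def same_truth_table(f, g):
    """True if f and g agree on all eight assignments to A, B, D."""
    return all(bool(f(*vs)) == bool(g(*vs))
               for vs in product([False, True], repeat=3))

print(all(same_truth_table(steps[0], s) for s in steps[1:]))  # True
```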
Table 1.1.2 is a list of useful tautologies which are implications. The names
associated with some of the implications correspond to “rules of inference”; these
will be discussed in Section 1.4.

Table 1.1.2 LOGICAL IMPLICATIONS

1. P ⇒ (P ∨ Q)                                   addition
2. (P ∧ Q) ⇒ P                                   simplification
3. [P ∧ (P ⇒ Q)] ⇒ Q                             modus ponens
4. [(P ⇒ Q) ∧ ¬Q] ⇒ ¬P                           modus tollens
5. [¬P ∧ (P ∨ Q)] ⇒ Q                            disjunctive syllogism
6. [(P ⇒ Q) ∧ (Q ⇒ R)] ⇒ (P ⇒ R)                 hypothetical syllogism
7. (P ⇒ Q) ⇒ [(Q ⇒ R) ⇒ (P ⇒ R)]
8. [(P ⇒ Q) ∧ (R ⇒ S)] ⇒ [(P ∧ R) ⇒ (Q ∧ S)]
9. [(P ⇔ Q) ∧ (Q ⇔ R)] ⇒ (P ⇔ R)

The following example illustrates how a facility with propositions can be
useful in dealing with some vexing problems of everyday life.

Example
A man who was captured by savages was promised his freedom if he could deter-
mine with a single “yes or no” question the color of the tribe’s idol. He knew the
idol was either white or black. Unfortunately, the tribe contained two kinds of
individuals: liars, who invariably gave the wrong answer to any question they were
asked, and truth-tellers, who invariably gave the right answer. Fortunately, the
victim was well-educated. He knew he must ask a question which would be answered
according to the following table:

                 Color of Idol
                 White    Black
-------------------------------
Liars            Yes      No
Truth-tellers    Yes      No

However, since a liar always gave the wrong answer, he realized he must ask a
question whose correct answers could be tabulated as follows:

                 Color of Idol
                 White    Black
-------------------------------
Liars            No       Yes
Truth-tellers    Yes      No

Whereupon he asked his nearest captor “Is it true that either you tell the truth and
the idol is white or that you lie and the idol is black?”† This question enabled him to
determine the color correctly, since an answer of yes meant the idol was white and no
meant it was black. Unfortunately, the savages thought it was just a lucky guess and
reneged on their promise. That’s why you never heard this story before. #
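
The captive's reasoning can be verified by simulating all four cases. In the Python sketch below, which is an illustration with a hypothetical function name `answer`, a truth-teller reports the correct answer to the question and a liar reports its negation; in every case the spoken answer is “yes” exactly when the idol is white.

```python
def answer(is_truth_teller: bool, idol_is_white: bool) -> bool:
    """Return True for a spoken "yes" and False for a spoken "no".

    The question asked is: "Is it true that either you tell the truth and
    the idol is white, or that you lie and the idol is black?"
    """
    correct = ((is_truth_teller and idol_is_white)
               or (not is_truth_teller and not idol_is_white))
    # A truth-teller gives the correct answer; a liar gives the opposite.
    return correct if is_truth_teller else not correct

for is_truth_teller in (False, True):
    for idol_is_white in (False, True):
        speaker = "truth-teller" if is_truth_teller else "liar"
        color = "white" if idol_is_white else "black"
        spoken = "yes" if answer(is_truth_teller, idol_is_white) else "no"
        print(speaker, color, spoken)
```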

In this section we have introduced the notions of proposition and logical
operations. We then illustrated the use of truth tables to establish whether a propositional
form is a tautology, contingency or absurdity. These concepts and tools
will form the basis for the remainder of our discussions of mathematical reasoning.

Problems: Section 1.1

1. Using truth tables, show that if P <> Q is true, then P > Q and Q => P are both
true. Conversely, show that if P = Q and Q => P are both true, then P <> Q is true.
2. Show that P ⇒ Q has the same truth value as ¬P ∨ Q for all truth values of P
and Q, i.e., show that (P ⇒ Q) ⇔ (¬P ∨ Q) is a tautology.

†Simpler questions of equivalent power can be formulated, e.g., “Would the other kind of
person say yes if I asked him if the idol is black?”

3. Establish whether the following propositions are tautologies, contingencies, or
contradictions:
(a) P ∨ ¬P
(b) P ∧ ¬P
(c) P ⇒ ¬(¬P)
(d) ¬(P ∧ Q) ⇔ (¬P ∨ ¬Q)
(e) ¬(P ∨ Q) ⇔ (¬P ∧ ¬Q)
(f) (P ⇒ Q) ⇔ (¬Q ⇒ ¬P)
(g) (P ⇒ Q) ∧ (Q ⇒ P)
(h) [P ∧ (Q ∨ R)] ⇒ [(P ∧ Q) ∨ (P ∧ R)]
(i) (P ∧ ¬P) ⇒ Q
(j) (P ∨ ¬Q) ⇒ Q
(k) P ⇒ (P ∨ Q)
(l) (P ∧ Q) ⇒ P
(m) [(P ∧ Q) ⇔ P] ⇔ [P ⇒ Q]
(n) [(P ⇒ Q) ∨ (R ⇒ S)] ⇒ [(P ∨ R) ⇒ (Q ∨ S)]
4. Let P be the proposition “It is snowing.”
Let Q be the proposition “I will go to town.”
Let R be the proposition “I have time.”
(a) Using logical connectives, write a proposition which symbolizes each of the
following:
(i) If it is not snowing and I have time, then I will go to town.
(ii) I will go to town only if I have time.
(iii) It isn’t snowing.
(iv) It is snowing, and I will not go to town.
(b) Write a sentence in English corresponding to each of the following propositions:
(i) Q ⇔ (R ∧ ¬P)
(ii) R ∧ Q
(iii) (Q ⇒ R) ∧ (R ⇒ Q)
(iv) ¬(R ∨ Q)
5. State the converse and contrapositive of each of the following:
(a) If it rains, I’m not going.
(b) I will stay only if you go.
(c) If you get 4 pounds, you can bake the cake.
(d) I can’t complete the task if I don’t get more help.

6. For each of the following expressions, use identities to find equivalent expressions
which use only ∧ and ¬ and are as simple as possible.
(a) P ∨ Q ∨ ¬R
(b) P ∨ [(¬Q ∧ R) ⇒ P]
(c) P ⇒ (Q ⇒ P)
For each of the following expressions, use identities to find equivalent expressions
which use only ∨ and ¬ and are as simple as possible.
(d) (P ∧ Q) ∧ ¬P
(e) [P ⇒ (Q ∨ ¬R)] ∧ ¬P ∧ Q
(f) ¬P ∧ ¬Q ∧ (¬R ⇒ P)
7. Establish the following tautologies by simplifying the left side to the form of the
right side:
(a) [(P ∧ Q) ⇒ P] ⇔ 1
(b) ¬(¬(P ∨ Q) ⇒ ¬P) ⇔ 0
(c) [(Q ⇒ P) ∧ (¬P ⇒ Q) ∧ (Q ⇒ Q)] ⇔ P
(d) [(P ⇒ ¬P) ∧ (¬P ⇒ P)] ⇔ 0
8. Relate the following assertion to the logical operator ⇒: “If you start with a false
assumption, you can prove anything you like.” HINT: Consider the truth table
of ⇒.
9. An operation with two operands is said to be commutative if the order of the operands
does not affect the result. Thus, addition is commutative since x + y = y + x for
all values of x and y, but subtraction is not commutative since 4 − 2 ≠ 2 − 4. A
logical operator with two operands is commutative if reversing the order of the
operands produces a logically equivalent proposition.
(a) Determine which of the following logical operators are commutative: ∧,
∨, ⇒, ⊕.
(b) Prove your assertions by using truth tables.
10. Let “□” denote a logical operator with two operands; the expression x □ y denotes
the result of applying □ to the operands x and y. The operator □ is said to be
associative if x □ (y □ z) and (x □ y) □ z are logically equivalent for all operands
x, y, and z.
(a) Determine which of the logical operations ∧, ∨, ⇒, ⇔, and ⊕ are associative.
(b) Prove your assertions using truth tables.

11. The operation of multiplication is said to distribute over addition because
x·(y + z) = x·y + x·z. On the other hand, addition does not distribute over multiplication,
since x + (y·z) = (x + y)·(x + z) does not hold for all values of x, y and z. If □
and ○ are logical operators with two operands then □ is said to distribute over ○ if
P □ (Q ○ R) and (P □ Q) ○ (P □ R) are logically equivalent.
(a) Using truth tables and the identities of this section, show that ∧ and ∨
distribute over each other, and each of ∧, ∨, and ⇒ distributes over itself.
(b) Does either addition or multiplication distribute over itself?
12. (a) We have seen that ⇒ can be expressed in terms of ∨ and ¬ since P ⇒ Q and
¬P ∨ Q are logically equivalent. Find a way of expressing ⊕ using only
∧, ∨, and ¬.
(b) Show that all the logical operators described in this section can be expressed
using only ∧ and ¬.
13. (a) The Sheffer stroke, or nand operator, is defined by the following truth table:

P  Q    P | Q
-------------
0  0      1
0  1      1
1  0      1
1  1      0

Nand is an acronym for not-and; P | Q is logically equivalent to ¬(P ∧ Q).
Show that
(i) (P | P) ⇔ ¬P
(ii) [(P | P) | (Q | Q)] ⇔ (P ∨ Q)
(iii) [(P | Q) | (P | Q)] ⇔ (P ∧ Q)
(b) The Peirce arrow, or nor operator, is defined by the following truth table:

P  Q    P ↓ Q
-------------
0  0      1
0  1      0
1  0      0
1  1      0

For each of the following, find equivalent expressions which use only the nor
operator.
(i) ¬P
(ii) P ∨ Q
(iii) P ∧ Q
Programming Problem

Write a program to construct truth tables of propositions. Assume the proposition
will have no more than three variables (P, Q, and R) and calculate the
truth value of the proposition for each possible set of truth values for P, Q, and R.

1.2 PREDICATES AND QUANTIFIERS

The language of propositions is not sufficiently powerful to make all the assertions
needed in mathematics. We also need to make assertions such as “x = 3,” “x > y,”
and “x + y = z.” Such assertions are not propositions, since they are not necessarily
either true or false. However, if values are assigned to the variables, each of
these assertions becomes a proposition. Similar assertions occur in English where
pronouns and improper nouns are often used as variables; e.g.,
“He is tall and blonde,” (“x is tall and blonde”).
“She lives in the city,” (“x lives in y”).
These assertions are formed using variables in a “template” which expresses a
property of an object or a relationship between objects. These templates are called
predicates. Assertions made with predicates and variables become true or false
when the variables are replaced by specific values. In the assertion “x is tall and
blonde,” x is a variable and “is tall and blonde” is a predicate; in the assertion
“x lives in y,” x and y are variables and “lives in” is a predicate. For ease of dis-
cussion, we will often refer to an assertion containing a predicate simply as a
“predicate.”

Example
Predicates are commonly used in control statements in high-level programming
languages. For example, a statement of the form

if x > 3 then y ← z

includes the predicate “x > 3.” When the statement is executed, the truth value of
the assertion “x > 3” is determined using the current value of the variable x; the
assertion is assigned either the value 1 (representing true) or 0 (representing false).
The coding of truth values as integers is sometimes exploited in strange ways in
programming languages. For example, in PL/I,

A = X > 3;

is a legitimate assignment statement; execution of this statement causes the numeric
variable A to be assigned the value of 1 if X > 3 is true and 0 if X ≤ 3. #
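
The same coding of truth values as integers survives in more recent languages. As a small illustration in Python (not part of the original text, with arbitrary variable names), the value of a predicate is an object of type `bool`, which is a subtype of `int`, and it converts to 1 or 0.

```python
x = 5
a = int(x > 3)   # the predicate "x > 3" is true for this x, so a is 1
b = int(2 > 3)   # the proposition "2 > 3" is false, so b is 0
print(a, b)      # 1 0

# True and False compare equal to 1 and 0, and can even be summed.
print(True == 1, False == 0)                   # True True
print(sum(value > 3 for value in [1, 4, 9]))   # 2 values satisfy the predicate
```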

Some predicates are sufficiently important to warrant special notation. Examples
include the use of “=” in assertions of the form “x is equal to y” and
“>” to assert “x is greater than y.” We will use these and other special notations
wherever convenient and denote other predicates by capital letters, e.g.,
“x is a female” can be denoted by F(x),
“x is married to y” can be denoted by M(x, y), and
“x + y = z” can be represented by S(x, y, z).
When a symbol, such as F, M, or S above or the symbol “=” in the assertion
“x = y,” denotes a specific predicate, it is called a predicate constant. (We will
use the term “predicate” to refer to either a predicate constant or a predicate
variable when the context makes the intended meaning clear.) A variable which
appears in the parenthesized list after a predicate or is used with a predicate constant
such as “=” is called an individual variable. In the expression P(x₁, x₂, ..., xₙ),
P is a predicate constant or variable, each xᵢ is an individual variable, and P is
said to have n arguments or be an n-place predicate.
Values of the individual variables must be drawn from a set called the uni-
verse of discourse, or simply the universe. For example, in discussing the predicate
“x < 3,” we would presumably choose a set of numbers as our universe of dis-
course, thus avoiding the possibility of assertions such as “green < 3.” To be
precise it is necessary to establish explicitly the universe of discourse; in practice,
the universe is frequently left implicit. We require that the universe of discourse
contain at least one element.
Any predicate has n arguments, where n is some natural number, i.e., n is a
nonnegative integer. If P is an n-place predicate constant and values c₁, c₂, ..., cₙ
are assigned to each of the individual variables, the result is a proposition. Suppose
the universe of discourse is U. If the value of P(c₁, c₂, ..., cₙ) is true for every
choice of arguments c₁, c₂, ..., cₙ selected from U, then P is said to be valid in
the universe U. If the value of P(c₁, c₂, ..., cₙ) is true for some (but not necessarily
all) choices of arguments selected from U, then P is said to be satisfiable in the
universe U, and the values c₁, c₂, ..., cₙ which make P(c₁, c₂, ..., cₙ) true are said
to satisfy P. If P is not satisfiable in the universe U, then we say P is unsatisfiable
in U. Note that a predicate is permitted to have zero arguments. Since a predicate
constant must have a value of either true or false when values are assigned to all
its arguments, it follows that a predicate constant with no arguments is a proposi-
tion. Similarly, a predicate variable with zero arguments is a propositional variable.
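
For a finite universe, these definitions can be tested by exhaustive search. The Python sketch below is illustrative only: the function names and the small universe U are assumptions, and the method does not apply to infinite universes such as the set of all integers.

```python
from itertools import product

def is_valid(pred, universe, n):
    """True if the n-place predicate holds for every choice of arguments."""
    return all(pred(*args) for args in product(universe, repeat=n))

def is_satisfiable(pred, universe, n):
    """True if the n-place predicate holds for at least one choice of arguments."""
    return any(pred(*args) for args in product(universe, repeat=n))

U = range(-5, 6)  # an assumed finite universe: the integers -5, ..., 5

def less_than_successor(x):
    return x < x + 1

def sums_to_three(x, y):
    return x + y == 3

def differs_from_itself(x):
    return x != x

print(is_valid(less_than_successor, U, 1))        # True: valid in U
print(is_valid(sums_to_three, U, 2))              # False: not valid in U
print(is_satisfiable(sums_to_three, U, 2))        # True: satisfied by (1, 2), among others
print(is_satisfiable(differs_from_itself, U, 1))  # False: unsatisfiable in U
```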
In order to change a predicate into a proposition, each individual variable of
the predicate must be bound; this may be done in two ways. The first way to bind
an individual variable is by assigning a value to it.

Example
Consider the predicate “x + y = 3” which we will denote by P(x, y). If the
value 1 is assigned to x, and 2 to y, the predicate is changed into a proposition
P(1, 2) whose truth value is true. On the other hand, if we assign the values 2 and 6
to x and y, respectively, the resulting proposition, P(2, 6), is false. #

The second method of binding individual variables is by quantification of the
variable. The most common forms of quantification are universal and existential.
If P(x) is a predicate with the individual variable x as an argument, then the
assertion
“For all x, P(x),”
which is interpreted as
“For all values of x, the assertion P(x) is true,”
is a statement in which the variable x is said to be universally quantified. The
symbol ∀, called the universal quantifier, is used to denote the phrase “for all.” Thus,
“For all x, P(x)” is written “∀xP(x).” The symbol ∀ may be read “for all,” “for
every,” “for any,” “for arbitrary,” or “for each.” If the assertion P(x) is true for
every possible value of x, then ∀xP(x) is true; otherwise, ∀xP(x) is false. Thus,
if the universe of discourse is U, the assertion ∀xP(x) is true if and only if the
predicate P is valid in U. It follows that for any predicate P and any element c
of the universe of discourse, the implication
∀xP(x) ⇒ P(c)
is true.

Examples
The following propositions are formed by universal quantification:
(a) ∀x[x < x + 1] (for all x, x is less than x + 1)
(b) ∀x[x = 3] (for any x, x = 3)
If the universe is the set of integers I, the predicate x < x + 1 is true for all
values of x, but “x = 3” is false when x is assigned the value of 1. Consequently,
for this universe (a) is true and (b) is false.
(c) If A is an integer array with 50 entries, A[1], A[2], ..., A[50], then we can
assert that all entries are nonzero as follows:
∀i[(1 ≤ i ∧ i ≤ 50) ⇒ A[i] ≠ 0].
The entries of the array are sorted in nondecreasing order if the following
assertion holds.
∀i[(1 ≤ i ∧ i < 50) ⇒ A[i] ≤ A[i + 1]].
We may also use more than one quantifier with predicates which have more
than one variable, e.g., the assertion
(d) ∀x ∀y[x + y > x] is read “for all x and all y, x + y is greater than x.” This
proposition is true if the universe of discourse consists of positive integers I+
and false if the universe is the set of all integers I. #
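The two array assertions of example (c) translate directly into executable checks. The sketch below is illustrative only: it assumes a Python list, which is indexed from 0 rather than from 1 as in the text, and the function names are our own.

```python
def all_nonzero(a):
    """For all i in the index range, a[i] != 0."""
    return all(a[i] != 0 for i in range(len(a)))

def is_nondecreasing(a):
    """For all i with a successor in the index range, a[i] <= a[i + 1]."""
    return all(a[i] <= a[i + 1] for i in range(len(a) - 1))

A = list(range(1, 51))        # a hypothetical 50-entry array: 1, 2, ..., 50
print(all_nonzero(A), is_nondecreasing(A))
```

Note that the index bound in the second check stops one short of the last entry, just as the quantified assertion uses i < 50 rather than i ≤ 50.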

Another common form of quantification is existential. The individual
variable x in the assertion
“For some x, P(x)”,
or equivalently,
“There exists a value of x for which the assertion P(x) is true,”
is said to be existentially quantified. The symbol ∃ is used to represent the phrase
“there exists” and the above statement can be written “∃xP(x).” The symbol ∃
may also be read “for some” or “for at least one.” If the assertion P(x) is true for
at least one element in the universe of discourse, then the proposition ∃xP(x) is
true; otherwise, it is false. More succinctly, ∃xP(x) is true if and only if P(x) is
satisfiable in the universe of discourse. It follows that for any element c of the
universe, the implication
P(c) ⇒ ∃xP(x)
is true.

Examples
The variable x is existentially quantified in the following propositions:
(a) ∃x[x < x + 1] (There exists an x such that x is less than x + 1.)
(b) ∃x[x = 3] (There exists an x such that x = 3.)
Both of these are true propositions if the universe of discourse is the set of
integers. The proposition
(c) ∃x[x = x + 1] (There exists an x such that x = x + 1.)
is false, since no matter what value we assign to x, the assertion “x = x + 1”
is false. #

A third form of quantification can be used to assert that there is one and only
one element of the universe of discourse which makes a predicate true. This
quantifier is denoted ∃!, and the sequence of symbols ∃!x is read “There exists a
unique x such that...” or “There is one and only one x such that...”
Examples
Let the universe of discourse be the set of natural numbers N. Then the fol-
lowing propositions are true.
(a) ∃!x[x < 1]
(b) ∃!x[x = 3]
In (a), assigning the value of 0 to x makes the assertion x < 1 true; no other
value will do. In (b), the unique value of x is 3. For the same universe, the assertion
(c) ∃!x[x > 1] is false, since the assertion “x > 1” is true if x is assigned any
value other than 0 or 1. #

An assertion with quantified variables can be expressed using propositions
obtained by assigning values to the individual variables of the predicates which
occur in the assertion. This relationship can be made explicit by considering a
finite universe of discourse. Let the universe consist of the integers 1, 2, and 3.
Then the proposition
∀xP(x)
is equivalent to the conjunction
P(1) ∧ P(2) ∧ P(3),
and the proposition
∃xP(x)
is equivalent to the disjunction
P(1) ∨ P(2) ∨ P(3).
The proposition
∃!xP(x)
is equivalent to the proposition
[P(1) ∧ ¬P(2) ∧ ¬P(3)] ∨ [P(2) ∧ ¬P(1) ∧ ¬P(3)]
∨ [P(3) ∧ ¬P(1) ∧ ¬P(2)].
If the universe of discourse is infinite, a quantified assertion cannot always be
represented by a finite conjunction or disjunction of propositions without quanti-
fiers. However, the concept can be extended, and it is often convenient to consider
a universally quantified assertion over an infinite universe as an infinite conjunction
and an existentially quantified assertion as an infinite disjunction.
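Over a finite universe the three quantifiers can be computed directly, and the expansions given above for the universe {1, 2, 3} can be checked against them. The sketch below is an illustration under that finiteness assumption; the helper names forall, exists, and exists_unique are ours.

```python
from itertools import product

def forall(universe, pred):
    return all(pred(x) for x in universe)

def exists(universe, pred):
    return any(pred(x) for x in universe)

def exists_unique(universe, pred):
    return sum(1 for x in universe if pred(x)) == 1

U = [1, 2, 3]

def expansions_agree(P):
    """Compare each quantifier with its quantifier-free expansion over U."""
    conj = P(1) and P(2) and P(3)
    disj = P(1) or P(2) or P(3)
    uniq = ((P(1) and not P(2) and not P(3))
            or (P(2) and not P(1) and not P(3))
            or (P(3) and not P(1) and not P(2)))
    return (forall(U, P) == conj
            and exists(U, P) == disj
            and exists_unique(U, P) == uniq)

# Try every one of the 8 possible predicates on U, given as truth tables.
all_agree = all(expansions_agree(lambda x, t=t: t[x - 1])
                for t in product([False, True], repeat=3))
print(all_agree)
```

The same functions cannot be applied to an infinite universe, which is why the text treats infinite conjunctions and disjunctions only as an intuitive aid.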

Example
Consider the universe of nonnegative integers, and let P(x) denote the assertion
“x > 3.” Then the proposition
∀xP(x)
can be interpreted as the infinite conjunction
P(0) ∧ P(1) ∧ P(2) ∧ P(3) ∧ ···
which is false, since some of the operands, e.g., P(0), are false. The proposition
∃xP(x)
can be interpreted as the infinite disjunction
P(0) ∨ P(1) ∨ P(2) ∨ P(3) ∨ ···
which is true, since at least one of the operands, e.g., P(4), is true. #

All of the individual variables of a predicate must be bound in order to
transform the predicate into a proposition. Recall the two ways of binding individual
variables: values can be assigned to them or they can be quantified. Individual
variables which are not bound are called free. If P is a predicate with n free vari-
ables, then binding an individual variable reduces the number of free variables by
one; the resulting assertion is equivalent to a predicate with n — 1 variables. As
we stated earlier, a predicate with no free variables is a proposition.

Examples
The predicate P(x, y, z) representing “x + y = z,” has three variables, all of
which are free in the assertion
P(x, y, z).
If we assign x the value of 2, the result is the predicate P with a bound variable,
P(2, y, z).
This assertion is equivalent to a predicate with two free variables, which we can
denote by Q(y, z), where Q(y, z) is true if 2 + y = z. Similarly,
∃yP(x, y, z)
is an assertion with two free variables. The truth value of this assertion is equivalent
to that of a predicate with two variables which we will call R(x, z); if the universe is
the natural numbers, then
R(x, z) ⇔ ∃yP(x, y, z) ⇔ ∃y[x + y = z] ⇔ (x ≤ z). #
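The two ways of binding a variable in this example can be mimicked in code: assignment fixes an argument, and existential quantification searches over it. The sketch below assumes a finite window of the natural numbers as a stand-in for N, so it illustrates the idea rather than proving the equivalence.

```python
N_WINDOW = range(0, 8)   # assumed finite stand-in for the natural numbers

def P(x, y, z):
    """The 3-place predicate "x + y = z"."""
    return x + y == z

def Q(y, z):
    """P with x bound by assigning it the value 2: two free variables remain."""
    return P(2, y, z)

def R(x, z):
    """P with y bound by an existential quantifier: two free variables remain."""
    return any(P(x, y, z) for y in N_WINDOW)

# On the window, R(x, z) agrees with "x <= z" for every x and z.
matches = all(R(x, z) == (x <= z) for x in N_WINDOW for z in N_WINDOW)
print(Q(3, 5), matches)
```

The agreement holds on the window because whenever x ≤ z the witness y = z − x also lies inside the window.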

If y does not occur as an individual variable in P(x₁, x₂, ..., xₙ), then the
assertions ∀yP(x₁, x₂, ..., xₙ) and ∃yP(x₁, x₂, ..., xₙ) are both equivalent to
P(x₁, x₂, ..., xₙ), since none of the individual variables of P are bound by the
quantification. As a special case, if P is a proposition, then the truth value of
∃xP or ∀xP is equal to the truth value of P.
If more than one quantifier is applied to a predicate, the order in which the
variables are bound is the same as their order in the quantifier list; for example,
∀x ∀yP(x, y) denotes ∀x[∀yP(x, y)].
The binding order can profoundly affect the meaning of an assertion. For example,
the sequence “∀x ∃y” can be paraphrased informally as “No matter what value
of x is chosen, a value of y can be found such that ...” In this quantifier sequence,
since y is chosen after x, the value of y may depend on the value of x. In contrast,
the sequence “∃y ∀x” asserts “A value of y can be chosen so that no matter what
value is chosen for x...” In this case, since y is bound first, the value of y must
be specified independently of the value of x.

Examples
Let the universe of discourse be the set of married persons. Then
(a) ∀x ∃y[x is married to y] is true. However,
(b) ∃y ∀x[x is married to y] asserts that there is some person in the universe who
is married to everyone; this is false.
Now let the universe of discourse be the integers I. The assertion
(c) ∀x ∃y[x + y = 0] (For all x, there exists a y such that x + y = 0.) is true,
since for any value of x there is a value of y (i.e., y is equal to −x) which makes
the assertion “x + y = 0” true. The proposition
(d) ∃y ∀x[x + y = 0] (There exists a y such that for all x, x + y = 0.) asserts
that the value of y can be chosen independently of the value of x. Since no y
exists which yields zero when added to an arbitrary integer, this proposition is
false. The proposition
(e) ∀x ∀y ∃!z[x + y = z] asserts that for every pair of integers x and y, there
is a unique integer z equal to their sum; the assertion is true. If we interchange
the last two quantifiers of part (e), we obtain the proposition
(f) ∀x ∃!z ∀y[x + y = z] which asserts that for every x, a unique z can be
chosen such that no matter what y is added to x, x + y = z. This proposition
is false. The proposition
(g) ∃!x[x·6 = 0] is true since the equation x·6 = 0 is true if and only if x = 0. The
proposition
(h) ∃!x ∀y[x·y = 0] is true, but
(i) ∀y ∃!x[x·y = 0] is false, since, if y = 0, any value of x will yield zero.
Similarly,
(j) ∀y ∃!x[x + y < 0] is false, since for any value of y there are many values of
x for which the sum of x and y is negative. #
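The effect of quantifier order in examples (c), (d), (h), and (i) can be observed on a small universe. The sketch below assumes the symmetric window −3, ..., 3 in place of the integers; a window closed under negation is needed so that example (c) keeps its truth value.

```python
U = range(-3, 4)   # assumed finite, symmetric stand-in for the integers

# (c) for all x there exists y with x + y = 0: y may depend on x
forall_exists = all(any(x + y == 0 for y in U) for x in U)

# (d) there exists y such that for all x, x + y = 0: y is chosen first
exists_forall = any(all(x + y == 0 for x in U) for y in U)

def count(pred):
    """Number of elements of U that satisfy pred."""
    return sum(1 for v in U if pred(v))

# (h) there is a unique x such that x*y = 0 for all y
unique_then_all = count(lambda x: all(x * y == 0 for y in U)) == 1

# (i) for all y there is a unique x such that x*y = 0
all_then_unique = all(count(lambda x: x * y == 0) == 1 for y in U)

print(forall_exists, exists_forall, unique_then_all, all_then_unique)
```

In each pair the nested loops are the same; only the nesting order changes, which is exactly the difference between the two quantifier sequences.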

Although the order in which individual variables are bound cannot always
be changed without affecting the meaning of an assertion, there are two notable
exceptions: the sequence ∀x ∀y can always be replaced by ∀y ∀x, and the
sequence ∃x ∃y can always be replaced by ∃y ∃x.
Example
Let the universe be the nonnegative integers. For any predicate P, the
proposition
∀x ∀yP(x, y)
can be expanded† to

†Throughout this chapter, we will frequently expand quantified statements into infinite
conjunctions or disjunctions, rearrange the terms using the identities of Table 1.1.1 and derive a new
quantified assertion. This technique does not always constitute a careful mathematical argument
and in fact cannot be applied to some universes. We use it as an intuitive aid for understanding
quantified assertions.
[∀yP(0, y)] ∧ [∀yP(1, y)] ∧ [∀yP(2, y)] ∧ ···

which can be interpreted as

[P(0, 0) ∧ P(0, 1) ∧ P(0, 2) ∧ ···]
∧ [P(1, 0) ∧ P(1, 1) ∧ P(1, 2) ∧ ···]
∧ [P(2, 0) ∧ P(2, 1) ∧ P(2, 2) ∧ ···]
∧ ···

Applying the commutativity and associativity of ∧ (identities 4 and 6 of Table
1.1.1) to rearrange the terms by collecting the propositions in each column, we
obtain the infinite conjunction

[P(0, 0) ∧ P(1, 0) ∧ P(2, 0) ∧ ···]
∧ [P(0, 1) ∧ P(1, 1) ∧ P(2, 1) ∧ ···]
∧ [P(0, 2) ∧ P(1, 2) ∧ P(2, 2) ∧ ···]
∧ ···

which represents

[∀xP(x, 0)] ∧ [∀xP(x, 1)] ∧ [∀xP(x, 2)] ∧ ···

This is an expansion of
∀y ∀xP(x, y). #

It is common practice in mathematics to omit leading universal quantifiers
from assertions. For example, it is acceptable to assert “x + y = y + x.”
According to our definition, this is a predicate rather than a proposition because the
variables x and y are apparently free. However, the intended assertion is
∀x ∀y[x + y = y + x],

which has no free variables. We will follow this convention of deleting universal
quantifiers in later chapters, but will refrain from doing so for the present.
The notions of predicates and quantified variables described in this section
provide a strong extension to the language of propositions. Most substantive
mathematical arguments involve quantification, and the tools introduced in this
section will be used throughout the remainder of this text.

Problems: Section 1.2

1. Let S(x, y, z) denote the predicate “x + y = z,” P(x, y, z) denote “x·y = z,” and
L(x, y) denote “x < y.” Let the universe of discourse be the natural numbers N.
Using the above predicates, express the following assertions. The phrase “there is an
x” does not imply that x has a unique value.
(a) For every x and y, there is a z such that x + y = z.
(b) No x is less than 0.
(c) For all x, x + 0 = x.
(d) For all x, x·y = y for all y.
(e) There is an x such that x·y = y for all y.
2. Show ∃x ∃yP(x, y) and ∃y ∃xP(x, y) are equivalent by expanding the expressions
into infinite disjunctions.
3. Determine which of the following propositions are true if the universe is the set of
integers I and · denotes the operation of multiplication.
(a) ∀x ∃y[x·y = 0]
(b) ∀x ∃!y[x·y = 1]
(c) ∃y ∀x[x·y = 1]
(d) ∃y ∀x[x·y = x]
4. Let the universe be the integers. For each of the following assertions, find a predicate
P which makes the implication false.
(a) ∀x ∃!yP(x, y) ⇒ ∃!y ∀xP(x, y)
(b) ∃!y ∀xP(x, y) ⇒ ∀x ∃!yP(x, y)
5. Specify a universe of discourse for which the following propositions are true. Try to
choose the universe to be as large a subset of the integers as possible. Explain any
difficulties.
(a) ∀x[x > 10]
(b) ∀x[x = 3]
(c) ∀x ∃y[x + y = 436]
(d) ∃y ∀x[x + y < 0].
6. Let the universe of discourse consist of the integers 0 and 1. Find finite disjunctions
and conjunctions of propositions which do not use quantifiers and which are
equivalent to the following:
(a) ∀xP(0, x)
(b) ∀x ∀yP(x, y)
(c) ∀x ∃yP(x, y)
(d) ∃x ∀yP(x, y)
(e) ∃y ∃xP(x, y)
7. Consider the universe of integers I.
(a) Find a predicate P(x) which is false regardless of whether the variable x is bound
by ∀ or ∃.
(b) Find a predicate P(x) which is true regardless of whether the variable x is bound
by ∀ or ∃.
(c) Is it possible for a predicate P(x) to be true regardless of whether the variable is
bound by ∀, ∃ or ∃!? Justify your answer.
8. Let P be an arbitrary predicate and let the universe of discourse be the integers 1, 2,
and 3. Is the truth of the proposition ∃!xP(x) equivalent to the truth of the
proposition P(1) ⊕ P(2) ⊕ P(3)?
9. Consider the universe of integers and let P(x, y, z) denote x − y = z. Transcribe the
following assertions into logical notation.
(a) For every x and y, there is some z such that x − y = z.
(b) For every x and y, there is some z such that x − z = y.
(c) There is an x such that for all y, y − x = y.
(d) When 0 is subtracted from any integer, the result is the original integer.
(e) 3 subtracted from 5 gives 2.

1.3 QUANTIFIERS AND LOGICAL OPERATORS

A careful transcription of mathematical statements often involves quantifiers,
predicates, and logical operators. Such assertions can take a variety of forms.

Examples
Let the universe be the integers and let N(x) denote “x is a nonnegative integer,”
E(x) denote “x is even,” O(x) denote “x is odd,” and P(x) denote “x is prime.” The
following examples illustrate the transcription of assertions into logical notation.
(a) There exists an even integer.
∃xE(x)
(b) Every integer is even or odd.
∀x[E(x) ∨ O(x)]
(c) All prime integers are nonnegative.
∀x[P(x) ⇒ N(x)]
(d) The only even prime is two.
∀x[(E(x) ∧ P(x)) ⇒ x = 2]
(e) There is one and only one even prime.
∃!x[E(x) ∧ P(x)]
(f) Not all integers are odd.
¬∀xO(x), or ∃x ¬O(x)
(g) Not all primes are odd.
¬∀x[P(x) ⇒ O(x)], or ∃x[P(x) ∧ ¬O(x)]
(h) If an integer is not odd, then it’s even.
∀x[¬O(x) ⇒ E(x)]. #
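Each transcription above can be evaluated on a finite window of the integers as a sanity check. The sketch below assumes the window −50, ..., 50 and a simple trial-division primality test; it confirms the assertions only on that window, not for all integers, and the names are illustrative.

```python
def is_prime(n):
    """P(x): trial division; integers below 2 are not prime."""
    if n < 2:
        return False
    return all(n % d != 0 for d in range(2, int(n ** 0.5) + 1))

U = range(-50, 51)            # assumed finite stand-in for the integers
E = lambda x: x % 2 == 0      # "x is even"
O = lambda x: x % 2 != 0      # "x is odd"
N = lambda x: x >= 0          # "x is a nonnegative integer"

a = any(E(x) for x in U)                                      # (a)
b = all(E(x) or O(x) for x in U)                              # (b)
c = all((not is_prime(x)) or N(x) for x in U)                 # (c)
d = all((not (E(x) and is_prime(x))) or x == 2 for x in U)    # (d)
e = sum(1 for x in U if E(x) and is_prime(x)) == 1            # (e)
f = (not all(O(x) for x in U)) == any(not O(x) for x in U)    # (f), both forms agree
g = any(is_prime(x) and not O(x) for x in U)                  # (g)
h = all(O(x) or E(x) for x in U)                              # (h), as ¬¬O(x) ∨ E(x)

print(a, b, c, d, e, f, g, h)
```

An implication A ⇒ B is written here as `(not A) or B`, following the identity used throughout the chapter.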

In previous examples, the quantifiers occur at the beginning of the assertion.
However, in transcribing many mathematical statements, quantifiers may
naturally go elsewhere and their placement is important.

Examples
Consider the universe of integers and let P(x, y, z) denote “xy = z”. The fol-
lowing are examples of mathematical statements and equivalent formulations in
logical notation. Note that informal statements of propositions frequently omit the
universal quantification of individual variables.
(a) “If x = 0, then xy = x for all values of y.”
∀x[x = 0 ⇒ ∀yP(x, y, x)]
(b) “If xy = x for every y, then x = 0.”
∀x[∀yP(x, y, x) ⇒ x = 0]
Observe that ∀x ∀y[P(x, y, x) ⇒ x = 0] is not a correct transcription of
assertion (b). The latter transcription represents the false assertion “for all x and y,
if xy = x, then x = 0.” The values of x = 1 and y = 1 provide a counterexample to
this assertion, whereas the assertion (b) is true.
(c) “If xy ≠ x for some y, then x ≠ 0.”
∀x[∃y ¬P(x, y, x) ⇒ ¬(x = 0)]. #

The preceding examples illustrate a variety of ways in which assertions can involve
predicates, quantifiers and logical operators. In constructing proofs, we frequently
need to establish relationships between assertions. For example, consider the
statements
∃x[P(x) ⇒ Q(x)] and ∃xP(x) ⇒ ∃yQ(y).
Are they equivalent, or does one imply the other, or is no statement of this kind
possible? In order to resolve such questions, it is necessary to understand the ways
in which logical operators, quantifiers, and predicates interact.
An assertion involving predicate variables is valid if it is true for every universe
of discourse no matter how the predicate variables are interpreted. An assertion
is satisfiable if there exists a universe and some interpretation of the predicate
variables which makes it true. If an assertion is not true for any universe or
interpretation, it is unsatisfiable. Valid, satisfiable and unsatisfiable assertions are the
analogs of tautologies, contingencies, and contradictions in the language of
propositions. In this section, we will develop some fundamental identities which can
be used to determine the validity of assertions. In our discussion we will often
refer to “equivalent” assertions. Two assertions A₁ and A₂ are said to be
(logically) equivalent if and only if for every universe of discourse and every
interpretation of the predicate variables, A₁ is true if and only if A₂ is true. In other words,
A₁ and A₂ are equivalent if and only if the assertion A₁ ⇔ A₂ is valid.
We first consider how the negation operation affects quantified assertions.
Let P(x) be a predicate and consider the meaning of the proposition ¬∀xP(x).
We can interpret this proposition as “the assertion ‘∀xP(x)’ is false,” which is
equivalent to the statement “for some x, P(x) is not true,” or “∃x ¬P(x).” This
leads to the valid assertion
¬∀xP(x) ⇔ ∃x ¬P(x).
Similarly, the proposition ¬∃xP(x) asserts that “it is false that there exists an
x such that P(x) is true.” This is equivalent to the assertion that “there does not
exist an x such that P(x) is true” or “for all x, P(x) is false.” This establishes the
valid assertion
¬∃xP(x) ⇔ ∀x ¬P(x).
These two equivalences can be used to propagate negation signs through a sequence
of quantifiers, as illustrated by the following example.

Example
¬∃x ∀y ∀zP(x, y, z) ⇔ ∀x ¬∀y ∀zP(x, y, z)
⇔ ∀x ∃y ¬∀zP(x, y, z)
⇔ ∀x ∃y ∃z ¬P(x, y, z) #
Propagation of negations through quantifier sequences is often useful in construct-
ing proofs and counterexamples. Consider the following assertion:
For every pair of integers x and y, there exists a z such that x + z = y.
This statement can be formulated as follows:
∀x ∀y ∃z[x + z = y].
This proposition is true for the universe of integers I, but false for the natural
numbers N. We establish its falsity for the universe N by showing that its negation
is true. The negation has the form
¬∀x ∀y ∃z[x + z = y]
which is somewhat difficult to interpret. The equivalent form
∃x ∃y ∀z ¬[x + z = y], or ∃x ∃y ∀z[x + z ≠ y]
is more tractable and can easily be shown to be true for the nonnegative integers
by choosing x > y.
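The propagation of a negation through a quantifier sequence can be confirmed by brute force on a small universe. The sketch below assumes the two-element universe {0, 1} and enumerates all 256 three-place predicates on it; it then checks one witness for the natural-number example on an assumed finite window.

```python
from itertools import product

U = [0, 1]
triples = list(product(U, repeat=3))

def law_holds_for(table):
    """Check  not Ex Ay Az P  ==  Ax Ey Ez not P  for one truth table."""
    P = lambda x, y, z: table[(x, y, z)]
    lhs = not any(all(all(P(x, y, z) for z in U) for y in U) for x in U)
    rhs = all(any(any(not P(x, y, z) for z in U) for y in U) for x in U)
    return lhs == rhs

# Every 3-place predicate on U corresponds to one assignment of 8 truth values.
law_holds = all(law_holds_for(dict(zip(triples, values)))
                for values in product([False, True], repeat=len(triples)))

# Witness for  Ex Ey Az [x + z != y]  over the naturals: choose x > y.
N_WINDOW = range(0, 50)        # assumed finite stand-in for N
x, y = 3, 1
witness_ok = all(x + z != y for z in N_WINDOW)

print(law_holds, witness_ok)
```

The witness check passes for every natural z, not only those in the window, since x + z ≥ x > y.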
The scope of a quantifier is the part of an assertion in which variables are
bound by the quantifier.

Examples
(a) In the assertion
∀x[P(x) ∨ Q(x)]
the scope of the universal quantifier is [P(x) ∨ Q(x)].
(b) In the assertion
[∀xP(x)] ⇒ [∃xQ(x)]
the scope of the universal quantifier is P(x), and the scope of the existential
quantifier is Q(x).
(c) In the assertion
∀x[P(x) ⇒ (A ∨ B)]
the scope of the universal quantifier is [P(x) ⇒ (A ∨ B)]. #

Parentheses and brackets can be used to make the scope of a quantifier explicit.
We adopt the convention that the scope of a quantifier is the smallest subexpression
possible, consistent with the parentheses of the expression. Consequently, the
assertion
∀xP(x) ∨ Q(x)
does not denote the proposition
∀x[P(x) ∨ Q(x)]
but instead the predicate
[∀xP(x)] ∨ Q(x).
This assertion has two variables, one bound and one free, both of which are
denoted by x. It is equivalent to
∀yP(y) ∨ Q(x)
which uses different symbols for the bound and free variables. Using the same
symbol for different variables, although formally correct, is often confusing and
should be avoided.

Example
In the expression
∀x[P(x) ∨ Q(y) ∨ R(x, z)]
both occurrences of x refer to the same variable. But in the expression
∀x[P(x) ∨ Q(y)] ∨ R(x, z),
the occurrence of the variable x in the predicate P(x) is bound while the occurrence
of x in the predicate R(x, z) is free. The last expression is therefore equivalent to
∀x[P(x) ∨ Q(y)] ∨ R(w, z).

We now consider the way quantifiers affect conjunctions and disjunctions.
We first note that if a proposition occurs in a disjunction or conjunction within
the scope of a quantifier, it can be removed from the scope of the quantifier. Thus,

∀x[A(x) ∨ P] ⇔ [∀xA(x) ∨ P],
∀x[A(x) ∧ P] ⇔ [∀xA(x) ∧ P],
∃x[A(x) ∨ P] ⇔ [∃xA(x) ∨ P],
and
∃x[A(x) ∧ P] ⇔ [∃xA(x) ∧ P]
are all valid. Predicates whose variables are not bound by a quantifier can be
treated in the same way, e.g.,

∀x[P(x) ∨ Q(y)] ⇔ [∀xP(x) ∨ Q(y)],
and
∀x[∀yP(x, y) ∧ Q(z)] ⇔ [∀x ∀yP(x, y) ∧ Q(z)]
are also valid.
The reader may obtain a better understanding of these equivalences by
expanding them into infinite conjunctions and disjunctions. Thus, for the universe of
natural numbers N, the first identity above can be treated as follows:
The assertion ∀x[A(x) ∨ P] is equivalent to the infinite conjunction
[A(0) ∨ P] ∧ [A(1) ∨ P] ∧ [A(2) ∨ P] ∧ ···
which can be rearranged using the distributive laws to form
[A(0) ∧ A(1) ∧ A(2) ∧ ···] ∨ P
which is equivalent to ∀xA(x) ∨ P.
Now suppose the variable bound by a quantifier occurs in both predicates
of a disjunction or conjunction. We will first show that the proposition
∀x[P(x) ∧ Q(x)] ⇔ [∀xP(x) ∧ ∀xQ(x)] is valid. The proposition
∀x[P(x) ∧ Q(x)]
can be read “For all x, P(x) is true and Q(x) is true” or “For all x, P(x) and Q(x)
are both true.” The proposition
∀xP(x) ∧ ∀xQ(x)
states that “For all x, P(x) is true and for all x, Q(x) is true.” We show the
equivalence of these two assertions for the universe of natural numbers N by first
expanding ∀x[P(x) ∧ Q(x)] into an infinite conjunction:
[P(0) ∧ Q(0)] ∧ [P(1) ∧ Q(1)] ∧ [P(2) ∧ Q(2)] ∧ ···
Appealing to the associative and commutative properties of ∧ (identities 4 and
6 of Table 1.1.1), we rearrange these terms to obtain
[P(0) ∧ P(1) ∧ P(2) ∧ ···] ∧ [Q(0) ∧ Q(1) ∧ Q(2) ∧ ···]
which is equivalent to
∀xP(x) ∧ ∀xQ(x),
or equivalently
∀xP(x) ∧ ∀yQ(y).
Our argument has used the universe N, but the two assertions are equivalent
for any universe of discourse. Thus the following assertion is valid:
∀x[P(x) ∧ Q(x)] ⇔ [∀xP(x) ∧ ∀xQ(x)].
This relationship between ∀ and ∧ is informally characterized by the assertion
that the universal quantifier ∀ distributes over the logical connective ∧. However,
the existential quantifier ∃ does not distribute over the logical connective ∧.
That is, ∃x[P(x) ∧ Q(x)] is not equivalent to ∃xP(x) ∧ ∃xQ(x), as the
following argument shows. The proposition ∃x[P(x) ∧ Q(x)] asserts that “There exists
an x such that P(x) and Q(x) are both true.” This assertion requires that the same
value of x satisfies both P and Q. On the other hand, the assertion “There exists
an x such that P(x) is true and there exists an x such that Q(x) is true,” which can
be represented by
∃xP(x) ∧ ∃xQ(x)
permits different values of x to be chosen to satisfy P and Q.
To show the two assertions ∃x[P(x) ∧ Q(x)] and ∃xP(x) ∧ ∃xQ(x) are not
equivalent, we can use the preceding analysis to construct a universe and predicates
P and Q such that one assertion is true and the other false. Let the universe be
the integers and let P(x) denote “x is an even integer” and Q(x) denote “x
is an odd integer.” Then ∃xP(x) ∧ ∃xQ(x) is a true proposition, whereas
∃x[P(x) ∧ Q(x)] is false.
Although ∃x[P(x) ∧ Q(x)] and ∃xP(x) ∧ ∃xQ(x) are not equivalent, the
first implies the second, that is, the assertion
∃x[P(x) ∧ Q(x)] ⇒ [∃xP(x) ∧ ∃xQ(x)]
is valid. For if ∃x[P(x) ∧ Q(x)] is true, then there is some element c of the
universe such that the proposition P(c) ∧ Q(c) is true. Therefore, P(c) is true and
Q(c) is true. From the truth of P(c), we can conclude that ∃xP(x) is true.
Similarly, we can conclude from Q(c) that ∃xQ(x) is true and therefore, the
conjunction ∃xP(x) ∧ ∃xQ(x) is true.
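The behavior of ∀ and ∃ with respect to conjunction can be explored by enumerating every pair of predicates on a small universe. The sketch below assumes the universe {0, 1}, with each predicate represented as a truth table; it is a finite check that illustrates the argument above and does not replace it.

```python
from itertools import product

U = [0, 1]
# Every 1-place predicate on U, represented as a dict from element to truth value.
preds = [dict(zip(U, values)) for values in product([False, True], repeat=len(U))]

# The universal quantifier distributes over "and": an equivalence for every P, Q.
forall_and_equiv = all(
    all(P[x] and Q[x] for x in U)
    == (all(P[x] for x in U) and all(Q[x] for x in U))
    for P in preds for Q in preds)

# The existential quantifier gives only an implication, left to right.
exists_and_implies = all(
    (not any(P[x] and Q[x] for x in U))
    or (any(P[x] for x in U) and any(Q[x] for x in U))
    for P in preds for Q in preds)

# Pairs for which the converse fails: both satisfiable, but never by the same x.
counterexamples = [
    (P, Q) for P in preds for Q in preds
    if any(P[x] for x in U) and any(Q[x] for x in U)
    and not any(P[x] and Q[x] for x in U)]

print(forall_and_equiv, exists_and_implies, len(counterexamples))
```

On this universe the counterexamples are the pairs in which P holds only at one element and Q only at the other, the analog of “even” and “odd” with 0 and 1.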
By changing predicate variable names, we can use the previous results to
establish that ∃ distributes over ∨ but ∀ does not. Since our results were
established for arbitrary predicates, we can replace P by ¬R and Q by ¬T in the
valid assertion
∀x[P(x) ∧ Q(x)] ⇔ [∀xP(x) ∧ ∀xQ(x)].
Since an equivalence remains valid when both sides are negated, the following is
also valid:
¬∀x[¬R(x) ∧ ¬T(x)] ⇔ ¬[∀x ¬R(x) ∧ ∀x ¬T(x)].
Applying identities, we obtain the following sequence of equivalences.

∃x ¬[¬R(x) ∧ ¬T(x)] ⇔ [(¬∀x ¬R(x)) ∨ (¬∀x ¬T(x))]
∃x[¬¬R(x) ∨ ¬¬T(x)] ⇔ [(∃x ¬¬R(x)) ∨ (∃x ¬¬T(x))]
∃x[R(x) ∨ T(x)] ⇔ [∃xR(x) ∨ ∃xT(x)]
This establishes that ∃ distributes over ∨.
Using the same technique of replacing the predicate variables P and Q by
¬P and ¬Q in the valid assertion
∃x[P(x) ∧ Q(x)] ⇒ [∃xP(x) ∧ ∃xQ(x)]
we can establish
[∀xP(x) ∨ ∀xQ(x)] ⇒ ∀x[P(x) ∨ Q(x)].
The converse of this implication is not valid.
Once we have established how to deal with the quantifiers for the operators
∧, ∨ and ¬, we can treat the remaining connectives ⇒ and ⇔ by applying
identities relating them to ∧, ∨, and ¬.

Example
We show that ∃ does not distribute over ⇒; that is, the assertion
∃x[P(x) ⇒ Q(x)] ⇔ [∃xP(x) ⇒ ∃xQ(x)]
is not valid.
Since A ⇒ B is equivalent to ¬A ∨ B, it follows that

∃x[P(x) ⇒ Q(x)] ⇔ ∃x[¬P(x) ∨ Q(x)]
⇔ [∃x ¬P(x) ∨ ∃xQ(x)]
⇔ [¬∀xP(x) ∨ ∃xQ(x)]
⇔ [∀xP(x) ⇒ ∃xQ(x)].

Hence, the original assertion is equivalent to the assertion

[∀xP(x) ⇒ ∃xQ(x)] ⇔ [∃xP(x) ⇒ ∃xQ(x)].

We can construct a truth table for the propositional form of this assertion, taking
the components ∀xP(x), ∃xQ(x), and ∃xP(x) as propositional variables.
However, since ∃xP(x) is true whenever ∀xP(x) is true, two lines of the truth table do
not apply.

∀xP(x)   ∃xP(x)   ∃xQ(x)   ∀xP(x) ⇒ ∃xQ(x)   ∃xP(x) ⇒ ∃xQ(x)

  0        0        0             1                  1
  0        0        1             1                  1
  0        1        0             1                  0
  0        1        1             1                  1
  1        0        0            n.a.               n.a.
  1        0        1            n.a.               n.a.
  1        1        0             0                  0
  1        1        1             1                  1

Considering the last two columns of the table, we conclude that the implication
holds in one direction,

[∃xP(x) ⇒ ∃xQ(x)] ⇒ [∀xP(x) ⇒ ∃xQ(x)].

However, we can show the converse is not valid by exhibiting a counterexample.
From the third line of the truth table, we know that any counterexample must be an
interpretation of the predicate P in which ∀xP(x) is false and ∃xP(x) is true, and
an interpretation for Q in which ∃xQ(x) is false. For the universe of the integers,
let P(x) denote “x = 0” and Q(x) denote “x ≠ x.” This provides a counterexample
and establishes that ∃ does not distribute over ⇒. #
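The counterexample of this example can be evaluated directly. The sketch below assumes the finite window −5, ..., 5 in place of the integers; the truth values it computes do not depend on the window, provided it contains 0 and at least one other integer.

```python
U = range(-5, 6)          # assumed finite stand-in for the integers
P = lambda x: x == 0      # "x = 0"
Q = lambda x: x != x      # "x is not equal to x", false for every x

# Ex[P(x) => Q(x)], with the implication written as (not P) or Q
exists_implies = any((not P(x)) or Q(x) for x in U)

# AxP(x) => ExQ(x), the equivalent form derived in the text
forall_form = (not all(P(x) for x in U)) or any(Q(x) for x in U)

# ExP(x) => ExQ(x), the form that distribution would require
exists_form = (not any(P(x) for x in U)) or any(Q(x) for x in U)

print(exists_implies, forall_form, exists_form)
```

The first two values agree, as the derivation predicts, while the third differs; this is the third line of the truth table.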

Table 1.3.1 is a list of useful logical relationships between assertions involving
quantifiers. Each relationship of the table also holds when additional free
variables are inserted consistently in each occurrence of a predicate. Thus, from
identity 4 we can infer
∀xP(x, y) ⇒ ∃xP(x, y)

and from identity 6 we can infer

[∀xP(x, y) ∧ Q(z)] ⇔ ∀x[P(x, y) ∧ Q(z)].


Table 1.3.1 A SUMMARY OF LOGICAL RELATIONSHIPS INVOLVING QUANTIFIERS

1. ∀xP(x) ⇒ P(c), where c is an arbitrary element of the universe
2. P(c) ⇒ ∃xP(x), where c is an arbitrary element of the universe
3. ∀x ¬P(x) ⇔ ¬∃xP(x)
4. ∀xP(x) ⇒ ∃xP(x)
5. ∃x ¬P(x) ⇔ ¬∀xP(x)
6. [∀xP(x) ∧ Q] ⇔ ∀x[P(x) ∧ Q]
7. [∀xP(x) ∨ Q] ⇔ ∀x[P(x) ∨ Q]
8. [∀xP(x) ∧ ∀xQ(x)] ⇔ ∀x[P(x) ∧ Q(x)]
9. [∀xP(x) ∨ ∀xQ(x)] ⇒ ∀x[P(x) ∨ Q(x)]
10. [∃xP(x) ∧ Q] ⇔ ∃x[P(x) ∧ Q]
11. [∃xP(x) ∨ Q] ⇔ ∃x[P(x) ∨ Q]
12. ∃x[P(x) ∧ Q(x)] ⇒ [∃xP(x) ∧ ∃xQ(x)]
13. [∃xP(x) ∨ ∃xQ(x)] ⇔ ∃x[P(x) ∨ Q(x)]

A compact form of logical notation is often used to express mathematical
assertions. For example, the assertion
“For every x such that x > 0, P(x) is true,”
which would be written in our current notation as
∀x[(x > 0) ⇒ P(x)]
can be written more compactly as
∀x_{x>0} P(x).
Similarly,
“There exists an x such that x ≠ 3 and Q(x) is true,”
which would be written
∃x[(x ≠ 3) ∧ Q(x)]
can be written
∃x_{x≠3} Q(x).

Using these conventions, the formal statement of assertions becomes both more
compact and more readable. Furthermore, the compact notation allows a negation
sign to be propagated through a sequence of quantifiers in the same manner as
was illustrated earlier.
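The compact notation corresponds to quantifying over a filtered universe. The sketch below assumes a finite window of the integers and uses illustrative helper names; it checks that the restricted universal quantifier agrees with the implication form, that the restricted existential quantifier agrees with the conjunction form, and that negation propagates in the same way as before.

```python
def forall_where(universe, cond, pred):
    """Compact universal quantifier: every x satisfying cond also satisfies pred."""
    return all(pred(x) for x in universe if cond(x))

def exists_where(universe, cond, pred):
    """Compact existential quantifier: some x satisfying cond also satisfies pred."""
    return any(pred(x) for x in universe if cond(x))

U = range(-5, 6)                  # assumed finite stand-in for the integers
positive = lambda x: x > 0
not_three = lambda x: x != 3
P = lambda x: x * x >= x          # hypothetical predicate P
Q = lambda x: x * x == 9          # hypothetical predicate Q

# Compact universal form versus the longhand implication form
universal_agrees = (forall_where(U, positive, P)
                    == all((not positive(x)) or P(x) for x in U))

# Compact existential form versus the longhand conjunction form
existential_agrees = (exists_where(U, not_three, Q)
                      == any(not_three(x) and Q(x) for x in U))

# Negation: "not for all positive x, x < 3" equals "some positive x has not (x < 3)"
small = lambda x: x < 3
negation_agrees = ((not forall_where(U, positive, small))
                   == exists_where(U, positive, lambda x: not small(x)))

print(universal_agrees, existential_agrees, negation_agrees)
```

Note that the restriction stays attached to the quantifier when the negation passes through it; only the quantifier and the inner predicate change.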

Example
Consider the limit of a function defined over the real line. The definition is
usually expressed as follows.

Definition: The limit of f(x) as x approaches c is k (denoted lim_{x→c} f(x) = k)
if for every ε > 0, there exists a δ > 0 such that for all x, if |x − c| < δ then
|f(x) − k| < ε.
This can be transcribed in the abbreviated logical notation as follows.
Definition:
lim_{x→c} f(x) = k ⇔ ∀ε_{ε>0} ∃δ_{δ>0} ∀x[|x − c| < δ ⇒ |f(x) − k| < ε].

To show that lim_{x→c} f(x) ≠ k, we form the negation of both sides of the above
definition giving

lim_{x→c} f(x) ≠ k ⇔ ∃ε_{ε>0} ∀δ_{δ>0} ∃x[|x − c| < δ ∧ |f(x) − k| ≥ ε].

This establishes that lim_{x→c} f(x) ≠ k if and only if there exists an ε > 0 such that
for every δ > 0, there is some x such that |x − c| < δ and yet |f(x) − k| ≥ ε. #

The virtues of the compact notation will be obvious to anyone who writes out
the definition of a limit using the conventional logical notation.
In this section we have described ways in which quantifiers and logical oper-
ators interact with each other. These interactions are often subtle, and dealing
with them requires some care, but a facility with them is invaluable in the construc-
tion of sound mathematical arguments.

Problems: Section 1.3

1. Let P(x, y, z) denote xy = z;
E(x, y) denote x = y; and
G(x, y) denote x > y.
Let the universe of discourse be the integers. Transcribe the following into logical
notation.
(a) If y = 1, then xy = x for any x.
(b) If xy ≠ 0, then x ≠ 0 and y ≠ 0.
(c) If xy = 0, then x = 0 or y = 0.
(d) 3x = 6 if and only if x = 2.
(e) There is no solution to x² = y unless y ≥ 0.
(f) x < z is a necessary condition for x < y and y < z.
(g) x ≤ y and y ≤ x is a sufficient condition for y = x.
(h) If x < y and z < 0, then xz > yz.
(i) It cannot happen that x = y and x < y.
(j) If x < y then for some z such that z < 0, xz > yz.
(k) There is an x such that for every y and z, xy = xz.
2. Let the universe of discourse be the set of arithmetic assertions with predicates
defined as follows:
P(x) denotes “x is provable”
T(x) denotes “x is true”
S(x) denotes “x is satisfiable”
D(x, y, z) denotes “z is the disjunction x ∨ y”
Translate the following assertions into English statements. Make your transcriptions
as natural as possible, e.g.,

∀w ∀x ∀y ∀z[[D(w, x, y) ∧ D(x, w, z) ∧ P(y)] ⇒ P(z)]


become s “If y is the asse rtio n w V x, zis the asse rtio n x V wan d y is pro vab le, then
zis provable.”
(a) Vx[P(x) > T(x)]
(b) Vax{T(x) V mS]
(c) Ax{Tx) A 7PO))
(d) Wx Vy Vz{[D(x, y, z) A P(2)) > (PO) V PON}
(e) Wx{T(x) > Vy VaLDG@, y, 2) > T)]}
Put the following into logical notation. Choose predicates so that each assertion
requires at least one quantifier.
(a) There is one and only one even prime.
(b) No odd numbers are even.
(c) Every train is faster than some cars.
(d) Some cars are slower than all trains but at least one train is faster than every car.
(e) If it rains tomorrow, then somebody will get wet.

4. Find an assertion which is logically equivalent to ∀xP(x) but uses only the quantifier
∃ and the logical operator ¬. Similarly, express ∃xP(x) in terms of ∀ and ¬.
5. Find an assertion which is logically equivalent to ∃!xP(x) but which uses only the
quantifiers ∀ and ∃ together with the predicate for equality and logical operators.
6. Show that the following propositions are valid.
(a) [∀xP(x) ⇒ Q] ⇔ [∃x[P(x) ⇒ Q]]
(b) ∀x[P ⇒ Q(x)] ⇔ [P ⇒ ∀xQ(x)]
7. For the following assertions, establish those which are true and find interpretations
for P and Q which provide counterexamples for those which are false.
(a) ∀x[P(x) ⇒ Q(x)] ⇒ [∀xP(x) ⇒ ∀xQ(x)]
(b) [∀xP(x) ⇒ ∀xQ(x)] ⇒ ∀x[P(x) ⇒ Q(x)]
(c) [∃xP(x) ⇒ ∀xQ(x)] ⇒ ∀x[P(x) ⇒ Q(x)]
(d) ∀x[P(x) ⇒ Q(x)] ⇒ [∃xP(x) ⇒ ∀xQ(x)]
8. (a) For a universe containing only the elements 0 and 1, expand
∃x[P(x) ∧ Q(x)] and [∃xP(x) ∧ ∃xQ(x)]
into propositions involving P(0), P(1), ... etc., and without quantifiers. Re-
arrange the terms of this expansion to show
∃x[P(x) ∧ Q(x)] ⇒ [∃xP(x) ∧ ∃xQ(x)].
(b) Show that the converse of the implication of part (a) is not valid.
(c) For the same universe, show
∀x[P(x) ⇔ Q(x)] ⇒ [∀xP(x) ⇔ ∀xQ(x)].
(d) Show that the converse of the implication of part (c) is not valid.
9. Show that the following are valid for the universe of natural numbers N either by
expanding the statement or by applying identities.
(a) ∀x ∀y[P(x) ∨ Q(y)] ⇔ [∀xP(x) ∨ ∀yQ(y)]
(b) ∃x ∃y[P(x) ∧ Q(y)] ⇒ ∃xP(x)
(c) ∀x ∀y[P(x) ∧ Q(y)] ⇔ [∀xP(x) ∧ ∀yQ(y)]
(d) ∃x ∃y[P(x) ⇒ P(y)] ⇒ [∀xP(x) ⇒ ∃yP(y)]
(e) ∀x ∀y[P(x) ⇒ Q(y)] ⇔ [∃xP(x) ⇒ ∀yQ(y)]
Sec. 1.4 LOGICAL INFERENCE 39

10. (a) Write out the definition of lim_{x→c} f(x) = k in the usual logical notation rather
than the compact notation used in the last example of this section.
(b) Find the condition for lim_{x→c} f(x) ≠ k by forming the negation of both sides
of the definition.
11. Let A be a two-dimensional integer array with 20 rows (indexed from 1 to 20) and 30
columns (indexed from 1 to 30). Using compact logical notation, make the following
assertions. Assume the universe of discourse is the set of integers I.
(a) All entries of A are nonnegative.
(b) All entries of the 4th and 15th rows are positive.
(c) Some entries of A are zero.
(d) The entries of A are sorted in row-major order (the entries are in order within
rows, and every entry of the ith row is less than or equal to every entry of the
(i + 1)st row).

1.4 LOGICAL INFERENCE

A theorem is a mathematical assertion which can be shown to be true. A proof is
an argument which establishes the truth of a theorem. A mathematician will not
usually accept an assertion as true unless he is convinced that a proof of the asser-
tion can be constructed.
Mathematicians have long been concerned with the question of what consti-
tutes a proof. Their work has resulted in the concept of a formal mathematical
system in which the notions of axiom, theorem, and proof are precisely defined.
Ideally, these systems would provide a formal basis for describing all rigorous
mathematics. In fact, the systems are not powerful enough to describe all mathe-
matical systems of practical importance. Nevertheless, work in formal systems
has increased our understanding of mathematical reasoning, and we will use the
terminology of formal systems to describe the concepts of theorem and proof.
In the remaining sections of this chapter, we address the problem of formulat-
ing and constructing proofs. The novice in mathematics is often puzzled by the
question of what constitutes a proof. When is an argument convincing? The ques-
tion is not easily answered. In fact, mathematicians sometimes disagree among
themselves as to whether an argument is sound. Their disagreement may be over
whether to allow a particular proof technique, such as the use of the law of the
excluded middle; such differences are essentially philosophical and, in some sense,
unresolvable. Even when mathematicians can agree on the acceptability of a proof
technique, disagreement sometimes occurs when a purported proof is thought to
contain some error of commission or omission. The existence of such disagree-
ments indicates the difficulty of constructing and evaluating proofs.
There are no general algorithms for deciding whether an assertion is true or
false; if there were, mathematicians would use them. The construction of proofs
is a craft, and while we can offer a modicum of advice, the skill can only be learned
by means of examples and practice. The proofs which occur throughout this text
are intended to serve both as convincing arguments and as models of proof tech-
niques. The exercises are intended to provide practice in the construction of proofs.
A proof of an assertion is a sequence of statements which represents an argu-
ment that the theorem is true. Some of the assertions which occur in a proof may
be known to be true a priori; these include axioms or previously proved theorems.
Other assertions may be hypotheses of the theorem, assumed to be true in the
argument. Finally, some assertions may be inferred from other assertions which
occurred earlier in the proof. Thus, to construct proofs, we need a means of drawing
conclusions or deriving new assertions from old ones. This is done using rules of
inference. Rules of inference specify conclusions which can be drawn from asser-
tions known or assumed to be true.
Perhaps the most fundamental rules of inference are those which permit sub-
stitutions. Thus, we are generally allowed to replace any expression in an assertion
by another expression which is equivalent to it; we consider the new assertion to
be true if and only if the original assertion was true. We learn this rule of inference
at an early age; it is sometimes expressed as “equals can be substituted for equals.”
Other rules governing substitution are commonly used in mathematics, and we
will apply them freely without explicitly stating them. For example, if S is a tau-
tology in which propositional variables occur, substitution of propositions for
the propositional variables in the usual way results in a new tautology.
Another rule of inference can be stated as follows: If it is known that a state-
ment P is true, and also that the statement P ⇒ Q is true, then we can conclude
that the statement Q is true.

Example
Suppose we know “Samson is strong” and “If Samson is strong, then it will take
a woman to do him in.” We can conclude “It will take a woman to do Samson in.” #

This rule of inference is called modus ponens; it is often presented in the form of
an argument as follows:
    P
    P ⇒ Q
    ---------
    ∴ Q
In such a tabular presentation of an argument, the assertions above the
horizontal line are called hypotheses or premises; the assertion below the line is
the conclusion. The symbol ∴ is read “therefore,” “it follows that,” or “hence.”
An argument is said to be valid if, whenever all the premises are true, the con-
clusion is true. A rule of inference is an argument form which is taken to be
valid in the same sense that an axiom is taken to be true.
The rule of inference known as modus ponens is related to the tautology
[P ∧ (P ⇒ Q)] ⇒ Q in the language of propositions. Other rules of inference
have similar interpretations; we have listed some of the most important rules of
inference in Table 1.4.1.
Table 1.4.1 RULES OF INFERENCE RELATED TO THE LANGUAGE OF PROPOSITIONS

Rule of Inference      Tautological Form                                 Name

P                      P ⇒ (P ∨ Q)                                       addition
∴ P ∨ Q

P ∧ Q                  (P ∧ Q) ⇒ P                                       simplification
∴ P

P                      [P ∧ (P ⇒ Q)] ⇒ Q                                 modus ponens
P ⇒ Q
∴ Q

¬Q                     [¬Q ∧ (P ⇒ Q)] ⇒ ¬P                               modus tollens
P ⇒ Q
∴ ¬P

P ∨ Q                  [(P ∨ Q) ∧ ¬P] ⇒ Q                                disjunctive
¬P                                                                       syllogism
∴ Q

P ⇒ Q                  [(P ⇒ Q) ∧ (Q ⇒ R)] ⇒ [P ⇒ R]                     hypothetical
Q ⇒ R                                                                    syllogism
∴ P ⇒ R

P                                                                        conjunction
Q
∴ P ∧ Q

(P ⇒ Q) ∧ (R ⇒ S)      [(P ⇒ Q) ∧ (R ⇒ S) ∧ (P ∨ R)] ⇒ [Q ∨ S]           constructive
P ∨ R                                                                    dilemma
∴ Q ∨ S

(P ⇒ Q) ∧ (R ⇒ S)      [(P ⇒ Q) ∧ (R ⇒ S) ∧ (¬Q ∨ ¬S)] ⇒ [¬P ∨ ¬R]       destructive
¬Q ∨ ¬S                                                                  dilemma
∴ ¬P ∨ ¬R
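
The claim that each rule of inference corresponds to a tautology can be checked mechanically by enumerating truth tables. The following Python sketch is ours, not part of the text; the helper names implies and is_tautology and the dictionary RULES are illustrative assumptions. It tests the eight tautological forms listed in Table 1.4.1.

```python
from itertools import product

def implies(p, q):
    """Material implication: false only when p is true and q is false."""
    return (not p) or q

def is_tautology(form, n_vars):
    """Return True if `form` is true on every line of its truth table."""
    return all(form(*row) for row in product([False, True], repeat=n_vars))

# Tautological forms from Table 1.4.1: name -> (number of variables, form).
RULES = {
    "addition": (2, lambda p, q: implies(p, p or q)),
    "simplification": (2, lambda p, q: implies(p and q, p)),
    "modus ponens": (2, lambda p, q: implies(p and implies(p, q), q)),
    "modus tollens": (2, lambda p, q: implies((not q) and implies(p, q), not p)),
    "disjunctive syllogism": (2, lambda p, q: implies((p or q) and (not p), q)),
    "hypothetical syllogism": (
        3, lambda p, q, r: implies(implies(p, q) and implies(q, r), implies(p, r))),
    "constructive dilemma": (
        4, lambda p, q, r, s: implies(
            implies(p, q) and implies(r, s) and (p or r), q or s)),
    "destructive dilemma": (
        4, lambda p, q, r, s: implies(
            implies(p, q) and implies(r, s) and ((not q) or (not s)),
            (not p) or (not r))),
}

results = {name: is_tautology(form, n) for name, (n, form) in RULES.items()}
for name, ok in results.items():
    print(f"{name}: {'tautology' if ok else 'NOT a tautology'}")
```

Exhaustive enumeration is practical here because no form has more than four propositional variables, so no truth table has more than 16 lines.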

Examples
Fallacious arguments are often the result of incorrect inferences. Here we pre-
sent some examples of common fallacies.

(a) The Fallacy of Affirming the Consequent


Consider the following argument:

If the butler did it, he will be nervous when he is interrogated.
The butler was very nervous when he was interrogated.
Therefore, the butler did it.

Presented in the form of our rules of inference, this argument can be written
as follows:
    P ⇒ Q
    Q
    ---------
    ∴ P
The argument is not correct because the conclusion P can be false even though
the hypotheses P ⇒ Q and Q are true; i.e., the assertion [(P ⇒ Q) ∧ Q] ⇒ P
is not a tautology: the source of the butler’s discomfort may not have been
guilt but rather the behavior of the stock market on the day that he was ques-
tioned.
(b) The Fallacy of Denying the Antecedent
This form of fallacious argument can be represented as
    P ⇒ Q
    ¬P
    ---------
    ∴ ¬Q

The following example illustrates the fallacy:

If the butler’s hands are covered with blood, then he did it.
The butler is impeccably groomed.
Therefore, the butler is innocent.

The argument ignores the compulsive cleanliness of the butler, who always
washes his hands immediately after committing a crime. From P ⇒ Q and ¬P,
one can conclude neither Q nor ¬Q. #
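
Both fallacies can be exposed by searching the truth table for a line on which every premise is true and the conclusion is false. The Python sketch below is an illustration of ours; the function name counterexamples is an assumption, and modus ponens is included only for comparison.

```python
from itertools import product

def implies(p, q):
    """Material implication: false only when p is true and q is false."""
    return (not p) or q

def counterexamples(premises, conclusion):
    """List the (P, Q) assignments with all premises true and the conclusion false."""
    return [
        (p, q)
        for p, q in product([False, True], repeat=2)
        if all(prem(p, q) for prem in premises) and not conclusion(p, q)
    ]

# Affirming the consequent: P => Q, Q, therefore P.
affirming = counterexamples([implies, lambda p, q: q], lambda p, q: p)
# Denying the antecedent: P => Q, not P, therefore not Q.
denying = counterexamples([implies, lambda p, q: not p], lambda p, q: not q)
# Modus ponens, for comparison: P => Q, P, therefore Q.
ponens = counterexamples([implies, lambda p, q: p], lambda p, q: q)

print("affirming the consequent:", affirming)
print("denying the antecedent:  ", denying)
print("modus ponens:            ", ponens)
```

Each fallacy fails on the same line, P false and Q true, which is exactly the situation of the innocent but nervous (or guilty but well-scrubbed) butler; the valid rule has no such line.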

The following example illustrates the correct application of some of the rules of
inference given in Table 1.4.1.

Example
Consider the following argument:
If horses fly or cows eat artichokes, then the mosquito is the national bird. If
the mosquito is the national bird, then peanut butter tastes good on hot dogs.
But peanut butter tastes terrible on hot dogs. Therefore, cows don’t eat arti-
chokes.
The first three assertions are the hypotheses of the argument; the last assertion is the
conclusion. We are asked to determine whether the truth of the conclusion is implied
by the truth of the hypotheses. We begin by representing the component proposi-
tions as follows:
F denotes the proposition “horses fly”;
A denotes the proposition “cows eat artichokes”;
M denotes the proposition “the mosquito is the national bird”;
P denotes the proposition “peanut butter tastes good on hot dogs.”

The argument can be represented as follows:

    1. (F ∨ A) ⇒ M
    2. M ⇒ P
    3. ¬P
    ----------------
    4. ∴ ¬A

Assertions 1, 2, and 3 are the hypotheses, and 4 is the conclusion. One way to test
whether the conclusion is implied by the hypotheses is to construct a truth table for
the implication which has the conjunction of the hypotheses as its antecedent and
the conclusion as its consequent; in the present case this is the implication

{[(F ∨ A) ⇒ M] ∧ (M ⇒ P) ∧ ¬P} ⇒ (¬A).


If the implication is a tautology, then we say the conclusion follows logically from
the hypotheses and hence the argument is valid. If the implication is a contingency
or contradiction, then the conclusion ¬A may be false even though all the hy-
potheses are true. (An assignment of truth values for the propositional variables
which makes the implication false is called a “disproof by counterexample”; we will
discuss this in more detail in the following section.)
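
The truth-table test just described can be carried out by machine. The following Python sketch is ours; the names argument_holds, rows, valid, and critical are illustrative assumptions. It evaluates the implication on all 16 lines and also lists the lines on which all three hypotheses are true.

```python
from itertools import product

def implies(p, q):
    """Material implication: false only when p is true and q is false."""
    return (not p) or q

def argument_holds(f, a, m, p):
    """Value of {[(F or A) => M] and (M => P) and not P} => (not A) on one line."""
    hypotheses = implies(f or a, m) and implies(m, p) and (not p)
    return implies(hypotheses, not a)

rows = list(product([False, True], repeat=4))   # the 16 lines of the truth table
valid = all(argument_holds(*row) for row in rows)

# Lines on which every hypothesis is true; only these can refute the argument.
critical = [
    (f, a, m, p) for f, a, m, p in rows
    if implies(f or a, m) and implies(m, p) and not p
]

print("implication is a tautology:", valid)
print("lines with all hypotheses true (F, A, M, P):", critical)
```

Only one line makes all the hypotheses true, the one on which F, A, M, and P are all false, and on that line the conclusion ¬A holds.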
When several hypotheses and propositional variables are involved, construc-
tion of a truth table can become unwieldy. An alternative way to show an argument
is valid is to construct a proof using the hypotheses, logical identities, and rules of
inference. A proof is an expansion of the argument in which the hypotheses are
augmented by additional assertions such as axioms, previously proved theorems, or
assertions obtained by applying rules of inference. The conclusion of the argument
must be shown to follow from these assertions by a rule of inference. The following
is a proof of the argument given above.

Proof:

   Assertion          Reasons

1. (F ∨ A) ⇒ M        Hypothesis 1
2. M ⇒ P              Hypothesis 2
3. (F ∨ A) ⇒ P        Steps 1 and 2 and hypothetical syllogism
4. ¬P                 Hypothesis 3
5. ¬(F ∨ A)           Steps 3 and 4 and modus tollens
6. ¬F ∧ ¬A            Step 5 and DeMorgan’s law (identity 7, Table 1.1.1)
7. ¬A ∧ ¬F            Step 6 and commutativity of ∧ (identity 4, Table 1.1.1)
8. ¬A                 Step 7 and simplification ∎

Each assertion of the proof is considered true, either because it is a hypothesis, or
because it is known to be logically equivalent to a preceding assertion of the proof,
or because it is obtained by applying a rule of inference to preceding assertions of the
proof. Since the last assertion of the proof is the conclusion, it follows that if the
hypotheses are true, then the conclusion is true. #
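
A proof of this kind can also be checked by a proof assistant. The following is a minimal sketch in Lean 4, which is our choice of tool and not part of the text; the hypothesis names h1, h2, h3 and the intermediate names s3 and s5 are ours and follow the step numbers of the proof above.

```lean
-- F, A, M, P stand for the four component propositions of the example.
example (F A M P : Prop)
    (h1 : F ∨ A → M)
    (h2 : M → P)
    (h3 : ¬P) : ¬A := by
  -- Step 3: hypothetical syllogism applied to h1 and h2.
  have s3 : F ∨ A → P := fun h => h2 (h1 h)
  -- Step 5: modus tollens applied to s3 and h3.
  have s5 : ¬(F ∨ A) := fun h => h3 (s3 h)
  -- Steps 6 to 8: if A held, then F ∨ A would hold, contradicting s5.
  exact fun a => s5 (Or.inr a)
```

The last line compresses steps 6 through 8 into a single inference, rather than passing through DeMorgan’s law explicitly.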

Additional rules of inference are necessary to prove assertions involving predicates
and quantifiers. A careful treatment of these rules is beyond our scope,
but we will illustrate some of the techniques. The following four rules describe
when the universal and existential quantifiers can be added to or deleted from an
assertion.
The first rule is known as universal instantiation; it may be represented as
follows:
    ∀xP(x)
    ---------
    ∴ P(c)
where P is a predicate and c is some arbitrary element of the universe of dis-
course. As an example of the use of this rule, let the universe be the set of all
humans, and let P(x) denote “x is mortal.” If we can establish ∀xP(x), that is,
“all men are mortal,” then the rule of universal instantiation permits us to
conclude, “Socrates is mortal.”
A second rule of inference, known as universal generalization, permits the
universal quantification of assertions. If we can show that the assertion P(c) holds
for every element c of the universe of discourse, then universal generalization allows
us to conclude that the universally quantified assertion ∀xP(x) holds. Thus,
the rule of inference
    P(x)
    ---------
    ∴ ∀xP(x)
can be applied if we can show that the hypothesis P(x) is true for every possible
value of x.
The third rule of inference is known as existential instantiation. It takes the
form
    ∃xP(x)
    ---------
    ∴ P(c)
where c is some element of the universe of discourse. However, the element c is
not arbitrary (as it was in the case of universal instantiation), but must be one for
which P(c) is true. It follows from the truth of ∃xP(x) that at least one such ele-
ment must exist, but nothing more is guaranteed. This places constraints on the
proper use of this rule of inference. For example, if we know that ∃xP(x) and
∃xQ(x) are true, we can conclude that the statement P(c) ∧ Q(d) is true for some
choice of c and d, but we cannot conclude that P(c) ∧ Q(c) is true. For suppose
P(x) represents “x is even” and Q(x) represents “x is odd” in the universe of inte-
gers. Then ∃xP(x) and ∃xQ(x) are true, but P(c) ∧ Q(c) is false for every c.
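
The restriction on existential instantiation can be seen concretely on a finite universe. In the Python sketch below, which is ours, the range −10 to 10 is an assumed stand-in for the integers, and the names universe, c, d, and shared are illustrative.

```python
universe = range(-10, 11)          # a finite stand-in for the integers

def P(x):
    return x % 2 == 0              # "x is even"

def Q(x):
    return x % 2 != 0              # "x is odd"

exists_P = any(P(x) for x in universe)
exists_Q = any(Q(x) for x in universe)

# Separate witnesses c and d can be chosen, one for each existential assertion.
c = next(x for x in universe if P(x))
d = next(x for x in universe if Q(x))

# A single witness shared by both assertions need not exist.
shared = [x for x in universe if P(x) and Q(x)]

print(exists_P, exists_Q, c, d, shared)
```

The two instantiations must therefore introduce different names unless there is an independent reason to believe that one element satisfies both predicates.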
The last rule of inference we will describe is known as existential generaliza-
tion. It is represented as
    P(c)
    ---------
    ∴ ∃xP(x)
where c is an element of the universe. This rule asserts that if P(c) is true for some
element c, then the assertion ∃xP(x) is true.
When quantifiers are involved, construction of proofs is more involved because
of the care required in the application of the rules of inference. An exploration of
the subtleties of proofs involving quantifiers is beyond the scope of this chapter,
but the following simple example will illustrate the application of some of the rules
of inference.
Example
Consider the following argument:
Every man has two legs. John Smith is a man.
Hence, John Smith has two legs.
Let M(x) denote the assertion “x is a man,”
L(x) denote the assertion “x has two legs,” and
J denote John Smith.
Expressed in logical notation, the argument is
    1. ∀x[M(x) ⇒ L(x)]
    2. M(J)
    --------------------
    3. ∴ L(J)
A formal proof is as follows:

   Assertion            Reasons

1. ∀x[M(x) ⇒ L(x)]      Hypothesis 1
2. M(J) ⇒ L(J)          Step 1 and universal instantiation
3. M(J)                 Hypothesis 2
4. L(J)                 Steps 2 and 3 and modus ponens ∎ #
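
The same four steps can be written as a short Lean 4 sketch. Lean is our choice of tool, not the text’s, and the names U (the universe of discourse), h1, h2, and s2 are ours. In this setting universal instantiation is simply the application of h1 to the element J.

```lean
example {U : Type} (M L : U → Prop) (J : U)
    (h1 : ∀ x, M x → L x)
    (h2 : M J) : L J := by
  -- Step 2: universal instantiation of h1 at J.
  have s2 : M J → L J := h1 J
  -- Step 4: modus ponens applied to s2 and h2.
  exact s2 h2
```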

In this section we have dealt with the problem of logical inference, i.e., infer-
ring the truth of one statement from the known or assumed truth of others. A rule
of inference is an explicit statement of when such an inference can be made. We
commonly apply rules of inference in mathematical arguments without explicit
reference to them; this is one reason why mathematical arguments are sometimes
difficult to follow. By treating these rules explicitly, we aim to provide a basis for
the understanding, construction, and description of mathematical arguments.

Problems: Section 1.4

1. For each of the following sets of premises, list the relevant conclusions which can be
drawn and the rules of inference used in each case.
(a) I’m either fat or thin. I’m certainly not thin.
(b) If I run I get out of breath. I’m not out of breath.
(c) If the butler did it, then his hands are dirty. The butler’s hands are dirty.
(d) Blue skies make me happy and gray skies make me sad. The sky is either blue or
gray.
(e) If my program runs, then I am happy. If I am happy, the sun shines. It’s 11:00
p.m. and very dark.
(f) All trigonometric functions are periodic functions and all periodic functions are
continuous functions.
(g) All cows are mammals. Some mammals chew their cud.
(h) All even integers are divisible by 2. The integer 4 is even but 3 is not.
(i) What’s good for the auto industry is good for the country. What’s good for the
country is good for you. What’s good for the auto industry is for you to buy an
expensive car.
2. Show that the tautological forms of the following rules of inference given in Table
1.4.1 are tautologies:
(a) modus tollens
(b) disjunctive syllogism
(c) constructive dilemma
(d) destructive dilemma
3. Construct a proof for each of the following arguments, giving all necessary additional
assertions. Specify the rules of inference used at each step. (The word “or” denotes
the “logical or” rather than the “exclusive or.”)
(a) It is not the case that IBM or Xerox will take over the copier market. If RCA
returns to the computer market, then IBM will take over the copier market.
Hence, RCA will not return to the computer market.
(b) (My program runs successfully) or (the system bombs and I blow my stack).
Furthermore, (the system does not bomb) or (I don’t blow my stack and my
program runs successfully). Therefore, my program runs successfully.
4. Supply the missing assertions to prove the following argument. Justify the inclusion of
each assertion in the proof.
    (P ∧ Q) ⇒ (R ∧ S)
    (T ⇒ Q) ∧ (S ⇒ U)
    (W ⇒ P) ∧ (T ⇒ U)
    ¬R
    --------------------
    ∴ W ⇒ ¬T
5. Determine which of the following arguments are valid. Construct proofs for the valid
arguments. For those which are not valid, show why the conclusion does not follow
from the hypotheses.
(a) A ∧ B            (b) A ∨ B
    A ⇒ C                A ⇒ C
    ∴ C ∧ B              ∴ C ∨ B

(c) A ⇒ B            (d) A ⇒ (B ∨ C)
    A ⇒ C                D ⇒ ¬C
    ∴ C ⇒ B              B ⇒ ¬A
                         A
                         D
                         ∴ B ∧ ¬B

6. Determine which of the following are valid arguments. Construct proofs for those that
are valid and describe the fallacies of those that are not.
(a) If today is Tuesday, then I have a test in Computer Science or a test in Econ. If
my Econ professor is sick, then I will not have a test in Econ. Today is Tuesday
and my Econ professor is sick. Therefore, I have a test in Computer Science.
(b) I am happy if my program runs. My happiness is a necessary condition for me to
enjoy life. Hence, if my program runs, then, if I enjoy life, then I am happy.
(c) It is not the case that some trigonometric functions are not periodic. Some perio-
dic functions are continuous. Therefore, it is not true that all trigonometric
functions are not continuous.
(d) Some trigonometric functions are periodic. Some periodic functions are con-
tinuous. Therefore, some trigonometric functions are continuous.
Sec. 1.5 METHODS OF PROOF 47

7. Consider the implication
∀x[P(x) ∨ Q(x)] ⇒ [∀xP(x) ∨ ∀xQ(x)].
(a) Show that this implication is not valid.
(b) The following is an argument which purports to prove the above implication.
Find and explain the flaw.
∀x[P(x) ∨ Q(x)] ⇔ ¬∃x¬[P(x) ∨ Q(x)]
                ⇔ ¬∃x[¬P(x) ∧ ¬Q(x)]
                ⇒ ¬[∃x¬P(x) ∧ ∃x¬Q(x)]
                ⇔ [¬∃x¬P(x) ∨ ¬∃x¬Q(x)]
                ⇔ ∀xP(x) ∨ ∀xQ(x)
8. One must exercise care in the application of rules of inference to avoid fallacious
conclusions. In the following argument, locate and explain all misapplications of rules
of inference.
Let the universe of discourse be the set of integers I. The assertion that there is no
smallest integer can be put into logical notation as follows:
∀x ∃y[x > y].
It follows by universal instantiation that for arbitrary d,
∃y[d > y].
Now applying existential instantiation we conclude that for some element c
d > c.
Since d was arbitrary it follows by universal generalization that
∀x[x > c].
By universal instantiation, we can conclude
c > c,
and by universal generalization,
∀x[x > x].

1.5 METHODS OF PROOF

In the preceding section, we described the use of rules of inference to infer the
truth of one assertion from others. Rules of inference are characterizations of
the syntactic constraints which a proof must obey; in a formal mathematical sys-
tem, where the structure of proofs is precisely specified, the rules of inference
enable us to determine if an argument is a proof. In this section, we are concerned
with the structure of proofs as well as strategies for their construction. Although
it is not possible to consider all proof techniques, we will describe some of the most
common ones, give examples of their use, and relate them to the rules of inference
described in the previous section.
The most elementary form of theorem is the tautology. A tautology is a
theorem because of its sentential structure rather than its content; its truth is
actually independent of the interpretation or meaning of any of the propositions
involved. For this reason, tautologies are easily proved: one need only construct
a truth table.

Example
Consider the universe of integers. Denote by E(x) the assertion “x is even” and
by O(x) the assertion “x is not even”; i.e., O(x) ⇔ ¬E(x). If we read O(x) as “x is
odd,” then we can prove the theorem
The integer 3 is either even or odd.
The theorem is stated as
E(3) ∨ O(3),
or alternatively
E(3) ∨ ¬E(3),
which, if we use the letter P to denote E(3), can be written
P ∨ ¬P.

From the truth table of the proposition P ∨ ¬P, we know it is a tautology, and the
theorem is established. #

A theorem is often expressed as a propositional form which is not a tautology.
The truth of such an assertion is dependent on both the logical structure of the
assertion and the meaning of the component propositions. Because the component
propositions cannot assume all possible truth values, certain lines of the truth
table cannot occur; the theorem is proved by showing that all the lines which can
occur result in a value of true. We will treat such theorems by considering the most
important of the logical operators.
Let T be an assertion of the form ¬P, where P is a proposition. In order to
prove T, we must establish that P is false. Similarly, if T is of the form P ∧ Q, then
we must show that both P and Q are true. An assertion of the form P ∨ Q is
often established by proving the logically equivalent proposition ¬P ⇒ Q (or,
by symmetry, ¬Q ⇒ P). A truth table can be used to show the logical equivalence
of P ∨ Q and ¬P ⇒ Q.
A variety of proof techniques are used for proving implications, and because
these techniques are so common, they are frequently referred to by name. Recall
that the truth table for P ⇒ Q has the following form:

    P    Q  |  P ⇒ Q
    F    F  |    T
    F    T  |    T
    T    F  |    F
    T    T  |    T

The four most common techniques for proving implications are the following:
1. Vacuous Proof of P ⇒ Q
The truth value of P ⇒ Q is true if that of P is false. Consequently, if we
can establish that P is false, only the first two lines of the above truth
table can possibly apply, and it follows that the assertion P ⇒ Q is true.
A vacuous proof of P ⇒ Q is constructed by establishing that the truth
value of P is false.
While vacuous proofs appear to be of little value, they are often important in
establishing limiting or special cases. We will point out many examples of vacuous
proofs in the next chapter.
2. Trivial Proof of P ⇒ Q
If it is possible to establish that Q is true, only the second and fourth lines
of the truth table for implication can apply, and it follows that the theorem
P ⇒ Q is true. Construction of a trivial proof of P ⇒ Q requires showing
that the truth value of Q is true.
Like the vacuous proof, the trivial proof has limited applicability and yet is
extremely important. It is frequently used to establish special cases of assertions.
3. Direct Proof of P ⇒ Q
A direct proof of P ⇒ Q shows that the truth of Q follows logically from
the truth of P, i.e., the third line of the truth table for implication cannot
hold. Such a proof begins by assuming P is true. Then, using whatever
information is available, such as previously proved theorems, it is shown
that Q must be true. Since all the lines of the truth table except the third
have the value true assigned to P ⇒ Q, the assertion is established.
The following examples illustrate the use of direct proofs.

Examples

(a) Theorem: If 6x + 9y = 101, then either x or y is not an integer.
Proof: Assume 6x + 9y = 101. This can be rewritten as 3(2x + 3y) =
101. But 101/3 is not an integer; therefore, 2x + 3y is not an integer and hence
either x or y is not an integer. ∎

(b) Theorem: Let S be a set of one- and two-digit integers such that each of
the digits 0 through 9 occurs exactly once in the set S. Then the sum of the
elements of S is divisible by 9.
Proof: Assume that the hypothesis of the theorem is true. The digits 0
through 9 sum to 45. In any set S, some of the digits will occur in the 10’s
position and the remainder will occur in the 1’s position. Let T denote the sum
of digits which occur in the 10’s position. Then the sum of the elements of S
can be expressed as 10T + 45 − T, which can be put in the form 9T + 45.
Since both terms of this sum are divisible by 9, the sum is also divisible by 9,
regardless of the value of T. ∎ #
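
The identity used in the proof of theorem (b), sum = 10T + (45 − T) = 9T + 45, can be spot-checked on randomly generated sets. The Python sketch below is ours; the function name random_digit_set, the fixed seed, and the choice of 1000 trials are assumptions made for illustration, and a spot check is of course not a substitute for the proof.

```python
import random

def random_digit_set(rng):
    """Split the digits 0-9 into one- and two-digit integers, each digit used once.

    Returns the list of integers and T, the sum of the digits in the 10's position.
    """
    digits = list(range(10))
    rng.shuffle(digits)
    n_pairs = rng.randint(0, 5)            # how many two-digit integers to form
    numbers = []
    tens_sum = 0                           # T in the proof
    for k in range(n_pairs):
        tens, ones = digits[2 * k], digits[2 * k + 1]
        if tens == 0:                      # keep the 10's digit nonzero
            tens, ones = ones, tens
        numbers.append(10 * tens + ones)
        tens_sum += tens
    numbers.extend(digits[2 * n_pairs:])   # remaining digits stay one-digit integers
    return numbers, tens_sum

rng = random.Random(0)
trials = [random_digit_set(rng) for _ in range(1000)]
all_divisible = all(sum(nums) % 9 == 0 for nums, _ in trials)
formula_holds = all(sum(nums) == 9 * t + 45 for nums, t in trials)
print(all_divisible, formula_holds)
```

For instance, the set {10, 23, 45, 6, 7, 8, 9} has T = 1 + 2 + 4 = 7 and sum 108 = 9·7 + 45.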

4. Indirect Proof of P ⇒ Q (Proof of the contrapositive)
The implication P ⇒ Q is logically equivalent to the implication ¬Q ⇒ ¬P.
Consequently, we can establish the truth of P ⇒ Q by establishing that
¬Q ⇒ ¬P. The latter implication is usually shown by means of a direct
proof, i.e., by showing that if Q is false, then P is necessarily false. Hence,
the third line of the truth table for P ⇒ Q cannot occur.
Example
A perfect number is an integer which is equal to the sum of all its divisors except
the number itself. Thus, 6 is a perfect number, since 6 = 1 + 2 + 3, and so is 28.
We will prove the following theorem by establishing the contrapositive.

Theorem: A perfect number is not a prime.

Proof: The contrapositive is the following: A prime number is not a perfect
number. Suppose p is a prime number. Then p ≥ 2 and p has exactly two divisors:
1 and p. The sum of all its divisors less than p is therefore 1, and it follows that p is
not perfect. ∎ #
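
The key fact in this proof, that the divisors of a prime p other than p itself sum to 1, is easy to observe numerically. The following Python sketch is ours; the function names and the bound LIMIT = 2000 are assumptions chosen only to keep the search small.

```python
def proper_divisor_sum(n):
    """Sum of the positive divisors of n other than n itself."""
    return sum(d for d in range(1, n) if n % d == 0)

def is_perfect(n):
    return n >= 2 and proper_divisor_sum(n) == n

def is_prime(n):
    return n >= 2 and all(n % d != 0 for d in range(2, int(n ** 0.5) + 1))

LIMIT = 2000
perfect = [n for n in range(2, LIMIT) if is_perfect(n)]
primes = [n for n in range(2, LIMIT) if is_prime(n)]

print("perfect numbers below", LIMIT, ":", perfect)
print("every prime has proper-divisor sum 1:",
      all(proper_divisor_sum(p) == 1 for p in primes))
```

Since every prime is at least 2, a proper-divisor sum of 1 can never equal the prime itself, which is the contrapositive established above.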

In summary, to establish P ⇒ Q by a proof of the contrapositive,

1. Assume that Q is false;
2. Show on the basis of that assumption and other available information
that P is false.
If the premise is a conjunction and we wish to show
(P₁ ∧ P₂ ∧ ··· ∧ Pₙ) ⇒ Q,
the contrapositive of the assertion is
¬Q ⇒ (¬P₁ ∨ ¬P₂ ∨ ··· ∨ ¬Pₙ).
To establish this assertion, it suffices to show that ¬Q implies ¬Pᵢ for at least
one value of i.
We frequently wish to establish implications of the form
(P₁ ∨ P₂ ∨ ··· ∨ Pₙ) ⇒ Q.
These implications are usually handled using a technique called proof by cases, a
method justified by the following tautology:
[(P₁ ∨ P₂ ∨ ··· ∨ Pₙ) ⇒ Q] ⇔ [(P₁ ⇒ Q) ∧ (P₂ ⇒ Q) ∧ ··· ∧ (Pₙ ⇒ Q)]
A proof by cases requires proving each “case,” Pᵢ ⇒ Q, for each i from 1 to n.
Often proofs by cases are not presented in full; if several of the implications,
Pᵢ ⇒ Q, have similar proofs, then usually only one case is treated explicitly.

Example
Let “⊔” denote the operation “max” on the set of integers I; if a ≥ b then
a ⊔ b = b ⊔ a = a. For example, 4 ⊔ 2 = 4 and 1 ⊔ 3 = 3.

Theorem: The binary operation “max” is associative; that is, for any integers
a, b, and c, (a ⊔ b) ⊔ c = a ⊔ (b ⊔ c).
Proof: For any three integers a, b, and c, one of the following six cases must hold:
a ≥ b ≥ c, a ≥ c ≥ b, b ≥ a ≥ c, b ≥ c ≥ a, c ≥ a ≥ b, or c ≥ b ≥ a.
Case 1: Assume a ≥ b ≥ c. Then (a ⊔ b) ⊔ c = a ⊔ c = a and
a ⊔ (b ⊔ c) = a ⊔ b = a.
Case 2: Assume a ≥ c ≥ b. Then (a ⊔ b) ⊔ c = a ⊔ c = a and
a ⊔ (b ⊔ c) = a ⊔ c = a.
There are four other cases; the proofs are all similar. ∎ #
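
The six cases of this proof can be walked through mechanically. In the Python sketch below, which is ours, join is an assumed name for the “max” operation of the example; one representative triple is tried for each ordering, and an exhaustive check over a small assumed range also covers triples with equal entries.

```python
from itertools import permutations, product

def join(a, b):
    """The "max" operation of the example: the larger of a and b."""
    return a if a >= b else b

def associative_at(a, b, c):
    return join(join(a, b), c) == join(a, join(b, c))

# One representative triple for each of the six orderings of a, b, c.
cases = list(permutations([1, 2, 3]))
by_cases = all(associative_at(a, b, c) for a, b, c in cases)

# An exhaustive check over a small range, which also covers ties.
values = range(-3, 4)
exhaustive = all(associative_at(a, b, c) for a, b, c in product(values, repeat=3))

print(len(cases), by_cases, exhaustive)
```

A finite check of this kind illustrates the case analysis but does not replace it; the proof covers all integers because the six orderings exhaust the possibilities.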
The last logical operator we will consider is logical equivalence. Theorems of
the form P ⇔ Q are usually handled in one of two ways. Most commonly, the
separate implications P ⇒ Q and Q ⇒ P are proven and the assertion P ⇔ Q is
inferred. Sometimes a more economical proof is possible, beginning with a true
assertion of the form R ⇔ S and proceeding through a sequence of “if and only
if” statements such that each statement is logically equivalent to the one preceding.
If the last statement in the sequence is P ⇔ Q, the theorem is established. This
technique will be used frequently in the next chapter.
Other proof techniques can be used to establish the truth of a proposition P.
A proof by contradiction, or reductio ad absurdum, assumes that P is false and derives
a contradiction, such as the proposition Q ∧ ¬Q; this establishes ¬P ⇒ (Q ∧ ¬Q).
Taking the contrapositive of this implication and applying one of DeMorgan’s
laws, we obtain (¬Q ∨ Q) ⇒ P. Since the premise of this implication is true and
we have shown the implication to be true, we conclude that P is true.

Examples
(a) Theorem: There is no largest prime number.
The proof is by contradiction; we begin by assuming that a largest prime number
exists, and then show how to construct another which is larger.
Proof: Assume a largest prime exists; call it p. Because all primes are greater
than 1 and none are greater than p, there must be a finite number of them. Form the
product of all these primes and call it r; r = 2·3·5·7· ... ·p. We now assert that
r + 1 is a prime. For if we divide r + 1 by any prime between 2 and p, the remainder
is 1. By assumption there are no other primes, so r + 1 has no prime divisor smaller
than itself, which means that r + 1 cannot be expressed as a product of any two integers
other than r + 1 and 1. Since r > p, r + 1 is a prime number greater than p. This
contradicts the assumption that p is the largest prime number, and the theorem is
proved. ∎
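
It is worth noting that the primality of r + 1 is derived from the (false) assumption that p is the largest prime; without that assumption r + 1 may be composite. What the division argument does guarantee is that every prime factor of r + 1 exceeds p. The Python sketch below, ours and with illustrative helper names, shows both facts for the first few primes.

```python
def primes_up_to(n):
    """All primes k with 2 <= k <= n, by trial division."""
    return [k for k in range(2, n + 1)
            if all(k % d for d in range(2, int(k ** 0.5) + 1))]

def smallest_prime_factor(n):
    """Smallest prime dividing n (n itself when n is prime); assumes n >= 2."""
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d
        d += 1
    return n

results = {}
for p in [2, 3, 5, 7, 11, 13]:
    r = 1
    for q in primes_up_to(p):
        r *= q                      # r = 2 * 3 * 5 * ... * p
    results[p] = (r + 1, smallest_prime_factor(r + 1))

for p, (n, factor) in results.items():
    kind = "prime" if factor == n else f"composite, smallest prime factor {factor}"
    print(f"p = {p}: r + 1 = {n} ({kind})")
```

For p = 13 the number r + 1 = 30031 = 59 · 509 is composite, yet its smallest prime factor, 59, is still larger than 13, so a prime greater than p exists in every case.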

The logical structure of the preceding proof can be described as follows. Let P
denote “there is no largest prime number,” and Q denote “p is the largest prime
number.” The proof proceeds by assuming the theorem is false:

(i) ¬P
It follows that (for some particular integer p),
(ii) ¬P ⇒ Q
We then show how to construct a prime greater than p, i.e., we show

(iii) Q ⇒ ¬Q
From (ii) and (iii), applying the rule of hypothetical syllogism, we conclude

(iv) ¬P ⇒ ¬Q
From (i) and (ii) and modus ponens, it follows that

(v) Q
and from (i) and (iv) and modus ponens,

(vi) ¬Q
Then from the rule of conjunction applied to (v) and (vi), we conclude
(vii) Q ∧ ¬Q
This is a contradiction. We conclude that the hypothesis (i) is false and the theorem
is proved.
(b) Consider the problem of determining whether a program P will terminate
normally, i.e., not as the result of such things as exceeding its allotted execution
time or register overflow. It is conceivable that a computer program could be written
which would decide, for any program P, whether P will halt; such a program would
be a “decision procedure” to solve what is known as the halting problem. We can
show by means of a proof by contradiction that no procedure exists which will solve
the halting problem.
For ease of exposition, we restrict our discussion to procedures which do not
read any input, although they may call other procedures. This corresponds to a
subproblem of the original problem; if we cannot devise a decision procedure for
the input-free procedures, then we clearly cannot devise one for arbitrary proce-
dures. Let P be an input-free procedure. We assume (as a hypothesis to be proved
false) that there exists a decision procedure HALT such that the value of HALT(P)
is “yes” if the procedure P halts and otherwise the value of HALT(P) is “no.” Then
the following procedure could be executed:

procedure ABSURD:
    if HALT(ABSURD) = “yes” then
        while true do print “ha”

Now consider the behavior of the procedure ABSURD.


Suppose ABSURD halts. Then HALT(ABSURD) will return “yes” causing
execution of the while loop. The while loop prints “ha” as long as true has the truth
value true; thus, execution of the while loop results in (unending) gales of laughter.
We conclude that if ABSURD halts, then ABSURD does not halt.
Now suppose ABSURD does not halt. Then HALT will return “no,” causing
the test of the if-then statement to fail, and ABSURD will halt. Thus, if ABSURD
does not halt, then ABSURD will halt.
The assumption that HALT can decide whether an arbitrary program P
terminates has led to an absurdity, and we conclude that no procedure has the be-
havior assumed for HALT. Note that we do not infer that it would be very difficult
to write HALT, or that we don’t know how to write it; we conclude the much
stronger statement that no procedure exists which has the behavior ascribed to
HALT. #
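The construction of ABSURD can be mimicked in Python for any particular candidate decision procedure. The sketch below is only an illustration, not a proof: `make_absurd`, `is_refuted`, and the step budget `max_steps` are hypothetical names introduced here, and the budget merely stands in for "runs forever" so that the demonstration itself terminates.

```python
def make_absurd(halt, max_steps=1000):
    """Build ABSURD relative to a claimed decision procedure `halt`.

    `halt(proc)` is assumed to return "yes" or "no". `max_steps` is an
    artificial budget so that this demonstration terminates.
    """
    def absurd():
        if halt(absurd) == "yes":
            steps = 0
            while True:  # stands in for: while true do print "ha"
                steps += 1
                if steps >= max_steps:
                    return "still running"  # budget exhausted, not a real halt
        return "halted"
    return absurd


def is_refuted(halt):
    """True if `halt` gives the wrong answer about its own ABSURD."""
    absurd = make_absurd(halt)
    claim = halt(absurd)
    observed = absurd()
    if claim == "yes":
        return observed == "still running"
    return observed == "halted"


def always_yes(proc):
    return "yes"


def always_no(proc):
    return "no"


print(is_refuted(always_yes), is_refuted(always_no))
```

Whichever answer a candidate gives about its own ABSURD, the observed behavior is the opposite; the argument in the text shows that this happens for every candidate, not just the two trivial ones tried here.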

The proof methods described so far are often inadequate for proving quantified
assertions. We now describe some additional proof techniques based on the rules
of inference for quantified statements. We will discuss techniques for proving asser-
tions in each of the following forms:
¬∃xP(x), ∃xP(x), ¬∀xP(x), and ∀xP(x).
An assertion of the form ¬∃xP(x) is most often proved by contradiction:
to show something does not exist, we assume it does and arrive at a contradiction.
This technique was used in our earlier proof that there is no largest prime number;
we assumed there was a largest prime and derived a contradiction of the form
Q ∧ ¬Q. We also note that ¬∃xP(x) is equivalent to ∀x ¬P(x). Hence, our
later remarks on proving universally quantified statements will sometimes apply.

†This program and those in the remainder of the book will be written in an informal
ALGOL-like language described in the Appendix.

Proofs of assertions of the form ∃xP(x) are referred to as existence proofs.
Existence proofs are classified as either constructive or nonconstructive. A
constructive existence proof establishes the assertion by exhibiting a value c such that
P(c) is true. By applying the rule of existential generalization, we conclude that
∃xP(x) is true. Sometimes, rather than exhibiting a specific value of c, a
constructive existence proof specifies an algorithm for obtaining such a value.
A nonconstructive existence proof establishes the assertion ∃xP(x) without
indicating how to find a value c such that P(c) is true. Such a proof most commonly
involves a proof by contradiction; it shows that ¬∃xP(x) implies an absurdity
or the negation of some previous result.
A constructive existence proof specifies an element precisely, while a noncon-
structive proof may not provide any information other than an assertion of exist-
ence. Some results in mathematics fall between these two extremes. For example,
the mean value theorem of differential calculus asserts the existence of a parameter
value with a special property. Although the proof places bounds on the parameter
value (and thus provides useful information), the exact value of the parameter is
not specified. Theorems of this character are common in numerical analysis.
Assertions of the form ¬∀xP(x) are often most naturally proved by proving
the equivalent assertion ∃x ¬P(x). Both constructive and nonconstructive
existence proofs can then be used. A constructive existence proof involves finding an
element c of the universe of discourse such that P(c) is false; such an element is
called a counterexample to the assertion ∀xP(x). The element c forms the basis of
a proof by counterexample of the assertion ¬∀xP(x).
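A constructive disproof of ∀xP(x) can sometimes be found by mechanical search. The Python sketch below (the function names are illustrative assumptions) looks for a counterexample to the assertion "n² + n + 41 is prime for every nonnegative integer n" over a finite range.

```python
def is_prime(n):
    """Trial division; adequate for small n."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def find_counterexample(predicate, universe):
    """Return the first c in `universe` with predicate(c) false, else None."""
    for c in universe:
        if not predicate(c):
            return c
    return None


def euler_polynomial_is_prime(n):
    return is_prime(n * n + n + 41)


c = find_counterexample(euler_polynomial_is_prime, range(100))
print(c, c * c + c + 41)  # 40 and 1681 = 41 * 41
```

A returned element is a genuine counterexample; a result of `None` only means that none was found in the finite range searched and proves nothing about the whole universe.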
Counterexamples can also be used to show that assertions involving predicate
variables are not valid. Construction of such a counterexample requires that we
exhibit a universe of discourse and an interpretation of the predicate variables
which makes the assertion false.

Example
Construct a counterexample to show the following assertion is not valid:
∃x[P(x) ⇒ Q(x)] ⇒ [∃xP(x) ⇒ ∃xQ(x)].
A disproof requires that we exhibit a universe and predicates P and Q such that the
assertion is false; to disprove the above assertion we must find a universe and
interpretations for predicates P and Q such that
(a) ∃x[P(x) ⇒ Q(x)] is true and
(b) ∃xP(x) ⇒ ∃xQ(x) is false.
From (b) it must happen that
(c) ∃xP(x) is true and
(d) ∃xQ(x) is false.
Let the universe consist of the integers 1 and 2, and let P(x) denote “x = 1” and
Q(x) denote “x ≠ 1 ∧ x ≠ 2.” With these predicates, the conditions of (a) through
(d) are satisfied; consequently, these choices of universe and interpretations for P
and Q constitute a counterexample to the assertion. #
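Because the universe in this example is finite, conditions (a) through (d) can be checked by direct evaluation. The following Python sketch assumes the interpretation stated above (universe {1, 2}, P(x) meaning x = 1, Q(x) meaning x ≠ 1 ∧ x ≠ 2); the variable names are ours.

```python
universe = [1, 2]


def P(x):
    return x == 1


def Q(x):
    return x != 1 and x != 2


def implies(a, b):
    """Truth value of a => b."""
    return (not a) or b


cond_a = any(implies(P(x), Q(x)) for x in universe)  # (a) exists x [P(x) => Q(x)]
cond_c = any(P(x) for x in universe)                 # (c) exists x P(x)
cond_d = any(Q(x) for x in universe)                 # (d) exists x Q(x)
cond_b = implies(cond_c, cond_d)                     # (b) should come out false
assertion = implies(cond_a, cond_b)                  # the assertion being disproved

print(cond_a, cond_b, cond_c, cond_d, assertion)
```

Condition (a) holds because of x = 2, where P(x) is false and the implication is vacuously true; the whole assertion evaluates to false under this interpretation.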

A universally quantified assertion, ∀xP(x), is generally proved by applying
the rule of universal generalization described in the previous section. We first
show that P(x) is true for an arbitrary element x of the universe. Once this has been
established, the rule of universal generalization can be applied to conclude ∀xP(x).

Example
Theorem: For all integers x, x is even if and only if x² is even.

Proof: Using logical notation, the theorem can be expressed as

∀x[x is even ⇔ x² is even].
We prove the theorem by first establishing
x is even ⇔ x² is even

for an arbitrary element x of the universe of discourse.

(a) First, we show the implication from left to right (the “only if” part, or
“necessity”) by a direct proof. If x is even, then x = 2k for some integer k. Then
x² = (2k)² = 4k² = 2(2k²), which is an even number.
(b) We next show the implication from right to left (the “if” part, or “sufficiency”)
by showing the contrapositive: x is not even ⇒ x² is not even. If x is not even,
then x = 2k + 1 for some integer k, and x² = (2k + 1)² = 4k² + 4k + 1.
Since the first two summands are even, the sum is odd and the contrapositive
is established.
This completes the proof of
x is even ⇔ x² is even.

Since the proof was for arbitrary x, we can apply universal generalization to
conclude that
∀x(x is even ⇔ x² is even). ∎ #
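A finite check cannot replace the proof, but it can confirm the algebra used in parts (a) and (b). The Python sketch below (an illustration with names of our choosing) tests the equivalence on a range of integers and verifies the witnesses 2(2k²) and 2(2k² + 2k) + 1 from the two cases.

```python
def is_even(n):
    return n % 2 == 0


# Integers in the sample for which "x is even <=> x*x is even" fails.
violations = [x for x in range(-1000, 1001) if is_even(x) != is_even(x * x)]

for x in range(-1000, 1001):
    if is_even(x):
        k = x // 2
        assert x == 2 * k and x * x == 2 * (2 * k * k)              # part (a)
    else:
        k = (x - 1) // 2
        assert x == 2 * k + 1
        assert x * x == 2 * (2 * k * k + 2 * k) + 1                 # part (b)

print(violations)
```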

The forms of mathematical argument we have considered are common and
widely accepted, but by no means exhaustive; indeed, new proof techniques are
still being devised. In future chapters we will develop additional proof techniques
and apply them.
Our discussion of proof techniques has been “informal” in the sense that we
have not worked within a formal system in which all axioms and rules of inference
have been explicitly stated. The advantage of a formal system is that a character-
ization of the axioms and rules of inference implicitly defines the set of theorems:
it is the set of all statements which can be obtained from the axioms by applying
the rules of inference in all possible ways. In such a system it becomes possible to
distinguish between assertions which are true and those which are provable. An
assertion is provable if it is a theorem, i.e., if a proof of the assertion exists. (Note
that the definition does not require that we be able to construct the proof.) The
truth of an assertion may depend on the choice of universe of discourse and the
interpretation of the predicates; we have seen examples of assertions which are
true in some universes and not in others. Thus we can ask two things of a formal
system:

(a) That it be powerful enough to prove all valid assertions, that is, all
those assertions which are true regardless of the universe of discourse
and the interpretation of the predicate symbols.
(b) That it be powerful enough to prove all assertions which are true of some
particular universe with a specified interpretation of certain predicate
symbols. An example would be the universe of natural numbers with
predicates corresponding to equality and identities of arithmetic.
Without going into detail, we can say that mathematics has been rather successful
with (a), but not with (b). It has been established that, to a considerable extent, our
lack of success in (b) is inherent in our mathematical methods. For example, a
result due to Gödel asserts that if a formal system is powerful enough to express
assertions about integer arithmetic but permits only true assertions about arith-
metic to be proved, then there are other assertions which are true of arithmetic
but cannot be proved in the system.
The development of an understanding of the distinction between assertions
which are true and those which are formally provable was a magnificent accom-
plishment of mathematics; the work has profound implications for both philosophy
and mathematics. To explore further in this area, the student should consult the
excellent book of DeLong [1970].
When an argument is presented within a formal system, whether it is a proof
can be decided algorithmically, but formal systems do not encompass all of mathe-
matics. When an argument is presented outside a formal system, as most proofs
are, its validity must be determined by mathematicians; they must decide whether
the argument is convincing. Thus, the question is usually decided by consensus;
an argument is accepted as a proof if no one can perceive any flaws in its structure.
Agreement in such matters is very good, but the mechanism is not foolproof.
Although mathematical proofs are intended to be the quintessence of careful
argument, perceiving the flaws of an alleged proof can be a profoundly difficult
task. Examples exist of arguments which were widely accepted as proofs for many
years but were then shown to be fallacious by someone who discovered a possi-
bility which had been overlooked in the original argument. Sometimes such a
discovery results in a new argument being devised, which is then accepted as a
proof of the original assertion. But it is not uncommon for the overlooked possi-
bility to provide a basis for a counterexample to the original assertion, thus
disproving it. In summary, while a purported proof which is generally accepted is
rarely shown to be fallacious, examples of such occurrences do exist, and we must
conclude that “proof” is not a label which can never be removed.

Problems: Section 1.5


1. Prove or disprove each of the following assertions. Indicate the proof technique
employed. Consider the universe to be the set of integers I. Put each assertion into
logical notation. You may assume the following five definitions and properties of
integers.
(i) An integer n is even if and only if n = 2k for some integer k.
(ii) An integer n is odd if and only if n = 2k + 1 for some integer k.
(iii) The product of two nonzero integers is positive if and only if the integers have
the same sign.
(iv) For every pair of integers x and y, exactly one of the following holds: x > y,
x = y, or x < y.
(v) If x > y, then x − y is positive; if x = y, then x − y = 0; if x < y, then x − y
is negative.
(a) An integer is odd if its square is odd.
(b) The sum of two even integers is an even integer.
(c) The sum of an even integer and an odd integer is an odd integer.
(d) There are two odd integers whose sum is odd.
(e) The square of any integer is negative.
(f) There is some prime number whose square is even.
(g) There does not exist an integer x such that x² + 1 is negative.
(h) For any two integers x and y, either x − y or y − x is nonnegative.
(i) If 1 = 3, then the square of any integer is negative.
(j) If 1 = 3, then the square of any integer is positive.
(k) The sum of any two primes is a prime number.
(l) There exist two primes whose sum is prime.
(m) If the square of any integer is negative, then 1 = 1.
2. Prove that the square root of 2 is irrational, that is, √2 cannot be expressed as a
ratio of two integers. (Hint: Use the fact that x is even if x² is even to construct a
proof by contradiction.)
3. Suppose we wish to show (H₁ ∧ H₂ ∧ ··· ∧ Hₙ) ⇒ Q.
A common proof method is to assume ¬Q as an additional hypothesis and deduce a
contradiction, i.e.,
(H₁ ∧ H₂ ∧ ··· ∧ Hₙ ∧ ¬Q) ⇒ C
where C is a contradiction.

Example

Theorem: Show that if P and P ⇒ Q are true, then Q is true.

Proof:

Assertion        Reasons
1. P             Premise 1
2. P ⇒ Q         Premise 2
3. ¬Q            Assumption (negation of conclusion)
4. ¬P ∨ Q        2, implication
5. ¬P            3, 4, disjunctive syllogism
6. P ∧ ¬P        1, 5, conjunction

But P ∧ ¬P is a contradiction. Therefore, Q follows logically from the hypotheses. ∎ #

(a) Justify the above technique using truth tables (assume only two hypotheses H₁
and H₂).
(b) Explain how this proof technique relates to proof by contradiction.

1.6 PROGRAM CORRECTNESS

Writing good computer programs is not a well-defined process, and criteria for
the evaluation of programs are often vague and ill-formed. There are, however,
three questions that are commonly used to assess the quality of a program:
(a) Is the program “well written”?
(b) Is the program efficient?
(c) Does the program do what it is supposed to do?
The first question addresses the matters of style, clarity, and ease of modification;
evaluation of these properties will probably always be difficult and, to some
degree, subjective. The second question concerns the cost of program execution,
usually measured in terms of storage requirements and program execution time;
the study of program efficiency, often called algorithm analysis, will be treated in
Chapter 5. To answer the third question, we must first specify precisely what task
is to be performed. Then we must prove that the program is correct in the sense
that it performs the specified task. Establishing that a program is correct, also
known as program verification, is generally more difficult than writing the program,
but the costs which result from an incorrect program can easily exceed the cost of
verification. As a consequence, techniques for establishing program correctness are
of singular importance to the computer scientist.
Most program errors can be classified as either syntactic or logical. A syn-
tactic error is one which violates the definition of a well-formed program in the
given programming language. Syntactic errors are generally detected by the lan-
guage translator program (i.e., the compiler or interpreter) and can usually be cor-
rected easily. After all syntactic errors have been eliminated, a program is usually
tested for errors in logic by executing the program on a selected set of input data. But
correct performance of a program on test data does not guarantee that the program
is correct unless the program is tested with every possible input. Because it is
usually impractical to test all possible inputs, logical errors may remain even if
the program produces the correct results for the test data. As a consequence, pro-
gram verification usually requires the use of proof methods similar to those de-
scribed earlier in this chapter.
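The gap between passing tests and being correct is easy to exhibit. The Python function below is a contrived example of ours, with a deliberate logical error; it returns the right answer on a plausible-looking set of test data and the wrong answer on other inputs.

```python
def max_of_three(a, b, c):
    """Intended to return the largest of a, b, c (contains a deliberate bug)."""
    if a > b:
        if a > c:
            return a
        return c
    if b > c:
        return b
    return a  # bug: this branch should return c


# A selected set of test data on which the program happens to succeed.
selected_tests = [(3, 2, 1), (1, 3, 2), (2, 1, 3), (2, 2, 2)]
passes_selected = all(max_of_three(*t) == max(t) for t in selected_tests)

# Exhaustive testing over a small domain exposes the error.
domain = range(4)
failures = [(a, b, c) for a in domain for b in domain for c in domain
            if max_of_three(a, b, c) != max(a, b, c)]

print(passes_selected, failures[:3])
```

Exhaustive testing is possible here only because the domain was artificially restricted to four values per variable; over the integers it is not available, which is the motivation for the proof methods that follow.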
In this section we will describe a method for program verification based on
assertions about the program variables before, during, and after program execu-
tion; we will call such assertions program assertions. For simplicity we will restrict
our examples to integer arithmetic, that is, the universe of discourse for numerical
variables is taken to be the integers. Furthermore, as is customary in treatments of
this topic, we will ignore such potential problems as storage limitations and register
overflow.
Program assertions characterize properties of program variables and relationships
between them at various stages of program execution. These assertions can
utilize whatever predicates are appropriate, such as
“x is nonnegative”

“x = y”

“x < y”

“x − y < 2”

“The entries of the vector V are sorted in nondecreasing order.”

We will use program assertions to describe the state of the computation at each step
of program execution, that is, before the program begins execution and after each
program statement has been executed. The individual variables which appear
in program assertions need not be program variables. For example, if V is a vector
and i is not a program variable, then the assertion
∃i(V[i] = x)
establishes that the value of the variable x is an entry in the vector V. Because a
program assertion must be a proposition at the appropriate point in program
execution, all variables which occur in a program assertion must be bound when
the assertion applies. Any program variable used in such an assertion will have an
assigned value and is therefore bound by assignment. A variable other than a
program variable may be bound either by quantification or by assignment.
In order to establish that a program is correct, we must first have a precise
specification of what the program is intended to do. This is given by means of
two program assertions called the initial assertion and the final assertion. The
initial assertion characterizes what is known or to be assumed about the program
variables before program execution begins. If no assumption is made, the initial
assertion is the tautology true. The final assertion of the program specifies what is
to be true of the program variables if the program terminates normally (i.e., not
as the result of something like arithmetic overflow or exceeding its allotted time).
Together, the initial and final program assertions specify the task to be performed
by the program. The question of whether a program is correct can only be addressed
if a pair of initial and final assertions has been accepted as a correct
characterization of the task to be performed; that is, program correctness must be judged
relative to a specified task. The following definition applies both to programs and to
finite sequences of program statements known as program segments.

Definition 1.6.1: A program or program segment 𝒫 is correct with respect to
an initial assertion I and a final assertion F if,† whenever I is true of the program
variables prior to execution of 𝒫, and 𝒫 terminates, then F will be true of the
program variables after execution of 𝒫 is complete.

†Here and throughout the book we follow mathematical convention for definitions and use
“if” where in fact “if and only if” is intended. For example, when we assert “An integer is prime
if it is greater than 1 and has no positive divisors other than 1 and itself,” the intention is “An
integer is prime if and only if it is greater than 1 and has no positive divisors other than 1 and
itself.” This convention is used only in stating definitions.

We now describe some notation which will be useful in treating program cor-
rectness. Let Ai and Aj be program assertions, and let S be a program segment.
We will use the notation
Ai {S} Aj
to denote “if Ai is true prior to the execution of S, and S is executed and terminates,
then Aj will be true immediately following the termination of S.” Using this notation
we can restate Definition 1.6.1 by saying a program 𝒫 is correct with respect
to an initial assertion I and a final assertion F if and only if I {𝒫} F. When S
consists of a number of program statements it will sometimes be more convenient to
state that “The program segment
Ai
S
Aj
is correct” rather than using the notation Ai {S} Aj.

Example†
The program segment
A1: true
x ← 1;
y ← 2
A2: x = 1 ∧ y = 2

is correct. Equivalently, we can state

true {x ← 1; y ← 2} x = 1 ∧ y = 2. #
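For programs over a small, finite set of states, the notation Ai {S} Aj can be read operationally: run S from every state satisfying Ai and check Aj afterward. The Python sketch below models a state as a dictionary and a segment as a function on states; `triple_holds` and `all_states` are hypothetical helpers introduced for illustration, and the check assumes that the segment terminates.

```python
from itertools import product


def triple_holds(pre, segment, post, states):
    """Check pre {segment} post on every listed state that satisfies pre."""
    return all(post(segment(dict(s))) for s in states if pre(s))


def all_states(names, values):
    """Every assignment of the given values to the named variables."""
    return [dict(zip(names, combo))
            for combo in product(values, repeat=len(names))]


def segment(s):
    s["x"] = 1  # x <- 1
    s["y"] = 2  # y <- 2
    return s


states = all_states(["x", "y"], range(-3, 4))
a1 = lambda s: True                            # A1: true
a2 = lambda s: s["x"] == 1 and s["y"] == 2     # A2: x = 1 and y = 2

print(triple_holds(a1, segment, a2, states))                        # True
print(triple_holds(a1, segment, lambda s: s["x"] == s["y"], states))  # False
```

Such a check examines only the sampled states, so it can refute a claimed triple but establishes one only when the sample covers every possible state.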

In order to prove that a program is correct, we need a way to characterize
the effect on the program variables of executing the program. This implies that we
need such a characterization for each kind of executable statement of the program-
ming language, as well as a way of combining these characterizations into a descrip-
tion of the effect of executing the entire program. We begin by describing some
fundamental rules of inference. As with the rules of inference described in Section
1.4, these rules are not meant to be surprising or profound; in fact they should be
as simple and transparent as possible. But they must be powerful enough to enable
us to prove programs correct, and they must characterize precisely and correctly
the effect of executing programs and program segments.
The first rule of inference establishes that we can break a proof of correctness
of a program into a series of proofs of correctness for successive parts of the
program. Let S₁ and S₂ denote program segments, and denote by S₁; S₂ the program
segment obtained by placing a semicolon after S₁ and then concatenating S₂ with
the result. Thus S₁; S₂ denotes a program segment whose execution has the same
effect as first executing S₁ and then executing S₂. The first rule of inference, called
the rule of composition, states that if both Q₁ {S₁} Q₂ and Q₂ {S₂} Q₃, then it
follows that Q₁ {S₁; S₂} Q₃; that is, if the program assertion Q₁ is initially true of
the program variables and S₁; S₂ is executed, then after termination of the
segment S₁; S₂, the assertion Q₃ will be true. Presented in the tabular form of our
previous rules of inference, this is stated as follows:
Q₁ {S₁} Q₂
Q₂ {S₂} Q₃
∴ Q₁ {S₁; S₂} Q₃          Rule of Composition

†The early examples of this section will rely on the reader’s understanding of the effect of
executing an assignment statement. A careful treatment of this topic will be given later in this section.

We can interpret the rule of composition in terms of both flowcharts and
programs by adding the program assertions to the flowchart or the program text.
Note that program assertions are associated with states of the computation rather
than actions. For this reason, program assertions are associated with the edges of
flowcharts, and they either precede or follow the statements of a program.
Whenever an edge of a flowchart is traversed, the associated program assertion is true.
Immediately before a program statement is executed, the program assertion which
precedes it is true.
The rule of composition can be interpreted with flowchart diagrams as follows:

[Flowchart figure not reproduced: if the flowcharts for Q₁ {S₁} Q₂ and Q₂ {S₂} Q₃
are both correct, then the flowchart for Q₁ {S₁; S₂} Q₃ is correct.]

Using program segments, the rule of composition can be stated as:

“If
A1: Q1              A1: Q2
S1          and     S2
A2: Q2              A2: Q3
are both correct, then
A1: Q1
S1;
S2
A2: Q3
is correct.”

Example
If we can establish that

A1: true                A2: x = 1
x ← 1          and      y ← x + z
A2: x = 1               A3: y = z + 1

are both correct, then we can conclude from an application of the rule of
composition that

A1: true
x ← 1;
y ← x + z
A3: y = z + 1

is correct. #
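The rule of composition can be observed on this example by checking both hypotheses and the conclusion over a small set of states. The Python sketch below is an illustration only; `triple_holds` and `compose` are helper names assumed here, states are dictionaries, and the sample of states is finite.

```python
from itertools import product


def triple_holds(pre, segment, post, states):
    """Check pre {segment} post on every listed state that satisfies pre."""
    return all(post(segment(dict(s))) for s in states if pre(s))


def compose(s1, s2):
    """The segment S1; S2: execute s1, then s2."""
    return lambda s: s2(s1(s))


def set_x(s):
    s["x"] = 1                # x <- 1
    return s


def set_y(s):
    s["y"] = s["x"] + s["z"]  # y <- x + z
    return s


states = [dict(x=x, y=y, z=z) for x, y, z in product(range(-2, 3), repeat=3)]
q1 = lambda s: True                      # A1: true
q2 = lambda s: s["x"] == 1               # A2: x = 1
q3 = lambda s: s["y"] == s["z"] + 1      # A3: y = z + 1

first = triple_holds(q1, set_x, q2, states)
second = triple_holds(q2, set_y, q3, states)
combined = triple_holds(q1, compose(set_x, set_y), q3, states)
print(first, second, combined)
```

Note that the intermediate assertion A2 matters: without the precondition x = 1, the second segment alone does not establish y = z + 1.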

The rule of composition makes it possible to infer the correctness of a
program from the correctness of its program segments. Thus, if we wish to show
I {𝒫} F for some program 𝒫, we can break 𝒫 into program segments S₁, S₂, ..., Sₙ
such that 𝒫 = S₁; S₂; ...; Sₙ and then devise “intermediate assertions”
Q₁, Q₂, ..., Qₙ₋₁. If we are able to prove the n lemmas

I {S₁} Q₁, Q₁ {S₂} Q₂, ..., Qₙ₋₂ {Sₙ₋₁} Qₙ₋₁, and Qₙ₋₁ {Sₙ} F,
it will follow from repeated applications of the rule of composition that I {𝒫} F.
The next rules of inference, called rules of consequence, state that a program
assertion which precedes a program segment can be replaced by a stronger one, and
an assertion which follows a program segment can be replaced by a weaker one
without affecting the correctness of the segment. (Recall that P is stronger than
Q if P ⇒ Q.) The rules are given as follows:
Q₁ ⇒ Q₂              Q₁ {S} Q₂
Q₂ {S} Q₃            Q₂ ⇒ Q₃
∴ Q₁ {S} Q₃          ∴ Q₁ {S} Q₃          Rules of Consequence
The two rules of consequence allow us to ignore information about the program
variables if it is not important for the proof of correctness. For example, the value
of an index variable might play an important role in the program assertions which
hold during execution of a loop, but when execution proceeds past the loop, the
value of this variable may not be significant.
In the rules of consequence the variables Q₁, Q₂, and Q₃ denote program
assertions. The implications Q₁ ⇒ Q₂ and Q₂ ⇒ Q₃ which appear as hypotheses in
the rules are propositions which relate two program assertions. These implications
are proved using the techniques of the previous sections of this chapter; this is
done independently of any consideration of the program.

Example
If the program segment

A1: true
x ← 1;
z ← y + x
A2: z = y + 1

is shown to be correct, then since z = y + 1 ⇒ z > y, we can conclude that

A1: true
x ← 1;
z ← y + x
A2′: z > y

is correct. #
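The weakening step in this example can be checked mechanically over a small set of states. In the Python sketch below (helper and variable names are assumptions of ours), the implication A2 ⇒ A2′ is tested separately from the program, as the rule requires, and a postcondition that is not implied by A2 is shown to fail.

```python
from itertools import product


def triple_holds(pre, segment, post, states):
    """Check pre {segment} post on every listed state that satisfies pre."""
    return all(post(segment(dict(s))) for s in states if pre(s))


def segment(s):
    s["x"] = 1                # x <- 1
    s["z"] = s["y"] + s["x"]  # z <- y + x
    return s


states = [dict(x=x, y=y, z=z) for x, y, z in product(range(-2, 3), repeat=3)]
true = lambda s: True
strong = lambda s: s["z"] == s["y"] + 1   # A2:  z = y + 1
weak = lambda s: s["z"] > s["y"]          # A2': z > y

implication_ok = all(weak(s) for s in states if strong(s))  # A2 => A2'
strong_ok = triple_holds(true, segment, strong, states)
weak_ok = triple_holds(true, segment, weak, states)
unjustified = triple_holds(true, segment, lambda s: s["z"] > s["y"] + 1, states)

print(implication_ok, strong_ok, weak_ok, unjustified)
```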

We next treat the rules of inference which are concerned with some of the
control statements of our programming language. The control statements include
conditional branches and loops; they can cause program statements to be executed
in an order different from that in which they appear in the program text. We will
treat three fundamental types: “if condition then S,” “if condition then S₁ else S₂,”
and “while condition do S.” In each statement type, condition is an assertion (but
not a program assertion) about the values of the program variables; whenever
condition is evaluated, it is either true or false. For each statement type, the portion
of the program to be executed next is determined by the truth value of condition.
The precise effect of executing each statement type is characterized by a rule of
inference.
When the statement “if condition then S” is executed, the program statement
S is executed if and only if condition is true. (Note that S can be a single statement
or a sequence of statements enclosed in a begin...end pair.) A rule of inference
for this statement type must involve preceding and following program assertions
which will be true whether or not the statement S is executed. The rule, called the
if-then rule, is the following:
(Q₁ ∧ condition) {S} Q₂
(Q₁ ∧ ¬condition) ⇒ Q₂
∴ Q₁ {if condition then S} Q₂          The if-then Rule

Note that the implication (Q₁ ∧ ¬condition) ⇒ Q₂ is a proposition which must
be proved without reference to the program. The if-then rule can be interpreted
using flowcharts in the following way. (Note that when edges of a flowchart
converge, the point of convergence is treated as a node and different assertions can
appear on the edges which enter and leave it.)

[Flowchart figure not reproduced: if we can show that (Q₁ ∧ ¬condition) ⇒ Q₂ and
the flowchart with entry assertion Q₁ ∧ condition, statement S, and exit assertion Q₂
is known to be correct, then we can infer that the flowchart for
Q₁ {if condition then S} Q₂ is correct.]

In terms of programs, the if-then rule can be stated as follows:

“If the implication
(Q₁ ∧ ¬condition) ⇒ Q₂
is true and

A1: Q1 ∧ condition
S
A2: Q2

is correct, then

A1: Q1
if condition then S
A2: Q2

is correct.”

Example
To show that

A1: true
if x < 0 then y ← 0
A2: x ≥ 0 ∨ y = 0

is correct, it suffices to show that the implication

[true ∧ ¬(x < 0)] ⇒ [x ≥ 0 ∨ y = 0]

is true and that

A1′: true ∧ x < 0
y ← 0
A2: x ≥ 0 ∨ y = 0

is correct. It then follows from the if-then rule that

true {if x < 0 then y ← 0} x ≥ 0 ∨ y = 0.
The proof that the implication holds uses the identities in Table 1.1.1:
[true ∧ ¬(x < 0)] ⇒ ¬(x < 0)          simplification
¬(x < 0) ⇒ x ≥ 0                      definition of ≥
x ≥ 0 ⇒ [x ≥ 0 ∨ y = 0]               addition
To prove that

A1′: true ∧ x < 0
y ← 0
A2: x ≥ 0 ∨ y = 0

is correct, we first observe that, since y is assigned the value 0 and the value of x is
not changed,

A1′: true ∧ x < 0
y ← 0
A2′: true ∧ x < 0 ∧ y = 0

is correct. Since A2′ ⇒ A2, it follows from a rule of consequence that

A1′: true ∧ x < 0
y ← 0
A2: x ≥ 0 ∨ y = 0

is correct. #
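The two hypotheses of the if-then rule and its conclusion can be checked for this example over a small set of states. The Python sketch below is illustrative only; `triple_holds` and `if_then` are hypothetical helpers, and states are dictionaries holding x and y.

```python
def triple_holds(pre, segment, post, states):
    """Check pre {segment} post on every listed state that satisfies pre."""
    return all(post(segment(dict(s))) for s in states if pre(s))


def if_then(cond, body):
    """The statement 'if cond then body' as a function on states."""
    return lambda s: body(s) if cond(s) else s


def set_y_zero(s):
    s["y"] = 0  # y <- 0
    return s


states = [dict(x=x, y=y) for x in range(-3, 4) for y in range(-3, 4)]
q1 = lambda s: True                            # A1: true
cond = lambda s: s["x"] < 0                    # condition: x < 0
q2 = lambda s: s["x"] >= 0 or s["y"] == 0      # A2: x >= 0 or y = 0

# Hypothesis 1: (Q1 and condition) {S} Q2
hypothesis_1 = triple_holds(lambda s: q1(s) and cond(s), set_y_zero, q2, states)
# Hypothesis 2: (Q1 and not condition) => Q2, checked without running any code
hypothesis_2 = all(q2(s) for s in states if q1(s) and not cond(s))
# Conclusion: Q1 {if condition then S} Q2
conclusion = triple_holds(q1, if_then(cond, set_y_zero), q2, states)

print(hypothesis_1, hypothesis_2, conclusion)
```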

When the statement “if condition then S₁ else S₂” is executed, if condition is
true, then S₁ is executed; otherwise S₂ is executed. The if-then-else rule of inference
is the following:
(Q₁ ∧ condition) {S₁} Q₂
(Q₁ ∧ ¬condition) {S₂} Q₂
∴ Q₁ {if condition then S₁ else S₂} Q₂          The if-then-else Rule
We leave the flowchart and program formulations of the if-then-else rule as
exercises.

Example
In order to establish that

A1: true
if x < 0 then y ← −1 else
y ← 1
A2: (x < 0 ∧ y = −1) ∨ (x ≥ 0 ∧ y = 1)

is correct, it suffices to show that both

A1′: true ∧ x < 0
y ← −1
A2: (x < 0 ∧ y = −1) ∨ (x ≥ 0 ∧ y = 1)

and

A1″: true ∧ ¬(x < 0)
y ← 1
A2: (x < 0 ∧ y = −1) ∨ (x ≥ 0 ∧ y = 1)

are correct. #
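Both branch hypotheses of the if-then-else rule, and its conclusion, can be checked for this example over a small set of states. The Python sketch below uses hypothetical helpers (`triple_holds`, `if_then_else`, `set_y`) and also shows why the final assertion needs x ≥ 0 rather than x > 0 in its second disjunct.

```python
def triple_holds(pre, segment, post, states):
    """Check pre {segment} post on every listed state that satisfies pre."""
    return all(post(segment(dict(s))) for s in states if pre(s))


def if_then_else(cond, s1, s2):
    """The statement 'if cond then s1 else s2' as a function on states."""
    return lambda s: s1(s) if cond(s) else s2(s)


def set_y(value):
    """The assignment y <- value as a function on states."""
    def run(s):
        s["y"] = value
        return s
    return run


states = [dict(x=x, y=y) for x in range(-3, 4) for y in range(-3, 4)]
cond = lambda s: s["x"] < 0
q2 = lambda s: ((s["x"] < 0 and s["y"] == -1)
                or (s["x"] >= 0 and s["y"] == 1))        # A2

then_branch = triple_holds(cond, set_y(-1), q2, states)                   # A1'
else_branch = triple_holds(lambda s: not cond(s), set_y(1), q2, states)   # A1''
whole = triple_holds(lambda s: True,
                     if_then_else(cond, set_y(-1), set_y(1)), q2, states)

# With a strict inequality the state x = 0 breaks the else branch.
strict = lambda s: ((s["x"] < 0 and s["y"] == -1)
                    or (s["x"] > 0 and s["y"] == 1))
strict_else = triple_holds(lambda s: not cond(s), set_y(1), strict, states)

print(then_branch, else_branch, whole, strict_else)
```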

When a “while condition do S” statement is executed, if condition is false, then
execution proceeds to the next statement of the program. Otherwise, the statement
S is executed repeatedly until condition becomes false; condition is evaluated after
each execution of S. Note that unless condition becomes false, execution of the
while statement (and therefore of the program) will not terminate.
The rule of inference for the while statement, called the rule of iteration,
requires a program assertion which is true before the statement is executed and
remains true after each execution of the statement S. This assertion is known by
such names as the loop invariant relation or loop invariant condition; it describes a
relationship which holds among the program variables each time condition is
evaluated and consequently after every execution of S. Formulation of the proper
loop invariant relation is often a difficult step in proving a program correct. The
rule of iteration is the following (where the program assertion Q is the loop
invariant relation):
(Q ∧ condition) {S} Q
∴ Q {while condition do S} (¬condition ∧ Q)          Rule of Iteration
The rule of iteration can be characterized using flowcharts as follows:

[Flowchart figure not reproduced: if the flowchart with entry assertion
Q ∧ condition, statement S, and exit assertion Q is correct, then the flowchart for
Q {while condition do S} (¬condition ∧ Q) is correct.]

In terms of programs, the rule of iteration states

“If

A1: Q ∧ condition
S
A2: Q

is correct, then

A1: Q
while condition do S
A2: Q ∧ ¬condition

is correct.”

Example
The procedure PRODUCT given in Figure 1.6.1 sets y equal to the product of a
and b, where a is a nonnegative integer. The procedure multiplies a and b by re-
peated addition, that is, y is initialized to 0 and then b is added to y a times.

procedure PRODUCT:
comment: set y = ab, where a ≥ 0.
A1: a ≥ 0
begin
    i ← 0;
    A2: a ≥ 0 ∧ i = 0
    y ← 0;
    A3: a ≥ 0 ∧ i = 0 ∧ y = 0
    A4: y = ib ∧ i ≤ a
    while i < a do
        A5: y = ib ∧ i < a
        begin
            y ← y + b;
            A6: y = (i + 1)b ∧ i < a
            i ← i + 1
            A4: y = ib ∧ i ≤ a
        end
    A7: y = ab
end

Fig. 1.6.1. A procedure for multiplication by repeated addition
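As an informal companion to Figure 1.6.1, the following Python sketch transcribes PRODUCT and turns each program assertion into a run-time assert. The function name product and the use of Python integers are assumptions for illustration; the checks exercise the assertions on particular inputs and do not replace the proof.

```python
def product(a: int, b: int) -> int:
    """Multiply a and b by repeated addition, checking A1..A7 at run time."""
    assert a >= 0                                # A1
    i = 0
    assert a >= 0 and i == 0                     # A2
    y = 0
    assert a >= 0 and i == 0 and y == 0          # A3
    assert y == i * b and i <= a                 # A4 (loop invariant)
    while i < a:
        assert y == i * b and i < a              # A5
        y = y + b
        assert y == (i + 1) * b and i < a        # A6
        i = i + 1
        assert y == i * b and i <= a             # A4 again
    assert y == a * b                            # A7
    return y


print(product(4, 7))
```

If any assertion were formulated incorrectly (for example, writing i < a instead of i ≤ a in A4), the check would fail on the last pass through the loop.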

The procedure has been annotated with program assertions, one of which holds
after each step of the computation; A1 is the initial assertion and A7 is the final
assertion. We will now describe how to prove PRODUCT is correct with respect to
A1 and A7.
The proof of correctness can be divided into two parts by proving the following
two lemmas.†

Lemma 1: A1 {i ← 0; y ← 0} A4.

Lemma 2: A4 {while i < a do begin y ← y + b; i ← i + 1 end} A7.


Proof of Lemma 1: We first use the intermediate assertion A2 and observe that
A1 {i ← 0} A2, that is,
    a ≥ 0 {i ← 0} a ≥ 0 ∧ i = 0,
since the assignment statement does not affect the value of a and it sets the value of
i to 0. Similarly, A2 {y ← 0} A3, that is,
    a ≥ 0 ∧ i = 0 {y ← 0} a ≥ 0 ∧ i = 0 ∧ y = 0.
By the rule of composition, we conclude
    A1 {i ← 0; y ← 0} A3.
From the rules of arithmetic and the properties of ≤ it is clear that A3 ⇒ A4.
Therefore we can apply a rule of consequence to conclude that
    A1 {i ← 0; y ← 0} A4.
This completes the proof of Lemma 1.

†We continue to rely on the reader’s understanding of the effect of an assignment statement.

Proof of Lemma 2: To prove the while loop is correct with respect to the initial
assertion A4 and the final assertion A7, we must first establish that the hypothesis of
the rule of iteration holds, that is,
    A4 ∧ i < a {y ← y + b; i ← i + 1} A4.
Observe that (A4 ∧ i < a) ⇔ A5, so it suffices to show
    A5 {y ← y + b; i ← i + 1} A4.
We use the intermediate assertion A6 and first show
    A5 {y ← y + b} A6.
Since the value of y is changed by the assignment statement, let y′ denote the value
of y before the assignment statement is executed. Then A5 is the assertion
    y′ = ib ∧ i < a.
The assignment statement sets y equal to y′ + b. The conjunction of the propositions
y′ = ib ∧ i < a and y = y′ + b implies A6, so we conclude that
A5 {y ← y + b} A6. Similarly, letting i′ denote the value of i before the statement
i ← i + 1 is executed,
    (y = (i′ + 1)b ∧ i′ < a ∧ i = i′ + 1) ⇒ (y = ib ∧ i ≤ a),
so we conclude that A6 {i ← i + 1} A4. By the rule of composition, we infer that
A5 {y ← y + b; i ← i + 1} A4 and therefore the hypothesis of the rule of iteration
holds.
Applying the rule of iteration, we conclude
    A4 {while i < a do begin y ← y + b; i ← i + 1 end} A4 ∧ i ≥ a.
We then show that (A4 ∧ i ≥ a) ⇒ A7 and apply a rule of consequence to complete
the proof of Lemma 2.
It follows from Lemma 1 and Lemma 2 and the rule of composition that PRODUCT
is correct with respect to A1 and A7. #

The program verification method we have described requires the following
steps:
1. Formulate initial and final assertions which characterize the task to be
   accomplished by the program.
2. Segment the program into sections which accomplish subtasks, and formulate
   initial and final assertions for each subtask. In every case in which
   a program segment S_1 may be executed immediately prior to a segment
   S_2, the final assertion of S_1 should imply the initial assertion of S_2.
3. Prove that each program segment is correct with respect to its initial
   and final assertions.
4. Conclude that the program is correct with respect to its initial and final
   assertions.
Note that if intermediate assertions have been chosen correctly for a program
and the initial assertion was true prior to program execution, then each program
assertion is true at the appropriate point of the computation. It follows as a special
case that if execution reaches the end of the program (that is, if the program ter-
minates) then the final assertion will be true when execution is complete.
Formally, a program is correct so long as it has performed the correct task
whenever it halts. In fact, according to Definition 1.6.1, a program that never
halts is correct for every pair of initial and final assertions; it follows that proving
program termination is just as important as proving correctness. It is common to
refer to what we have called “correctness” as “partial correctness,” and to call a
program “correct” if it is both “partially correct” and always halts if the initial
assertion is true prior to execution. We will treat one technique for proving pro-
gram termination in Section 3.6.

Axioms of Assignment

The preceding discussion of the formal rules of program verification has only
treated rules of inference. Rules of inference are always of the form “if we know
one thing is true, then we can conclude something else is true.” Unless we have a
characterization of some true statements, we cannot apply the rules of inference;
thus we need some axioms for our system in order to complete the specification of
our proof mechanism. The axioms for program verification describe the effect of
executing an assignment statement.
Consider a program with variables x_1, x_2, ..., x_n. An assignment statement
has the form
    x_i ← E(x_1, x_2, ..., x_n),
where E(x_1, x_2, ..., x_n) is an expression involving (some of) the variables x_1,
x_2, ..., x_n. If the program assertion A(x_1, x_2, ..., x_n) holds prior to the execution
of the assignment statement, then the following assertion will hold after the
assignment statement:
    ∃y[A(x_1, x_2, ..., x_{i−1}, y, x_{i+1}, ..., x_n) ∧
        x_i = E(x_1, x_2, ..., x_{i−1}, y, x_{i+1}, ..., x_n)]
This program assertion states that there exists some value for y (namely the former
value of x_i) which makes the assertion
    A(x_1, x_2, ..., x_{i−1}, y, x_{i+1}, ..., x_n)
true, and if this value of y is substituted for x_i in the expression E together with the
current values of the other variables x_j, where j ≠ i, the result will be the current
value of x_i. Not only is the above assertion correct, but it is the strongest correct
assertion which can be made based only on the knowledge that A(x_1, x_2, ..., x_n)
holds prior to the statement execution. Because this way of constructing program
assertions generates them in the same order as program execution, it can be used
for the “forward” construction of program assertions for assignment statements.
The forward construction of the strongest possible program assertion is characterized
by the following axiom concerning the effect of the assignment statement.

    A(x_1, x_2, ..., x_n) {x_i ← E(x_1, x_2, ..., x_n)}
        ∃y[A(x_1, x_2, ..., x_{i−1}, y, x_{i+1}, ..., x_n) ∧
            x_i = E(x_1, x_2, ..., x_{i−1}, y, x_{i+1}, ..., x_n)]
                                                        Axiom of Assignment.
The effect of an assignment statement can also be described in a “backward”
direction. Suppose A(x_1, x_2, ..., x_n) is an assertion which follows the assignment
statement
    x_i ← E(x_1, x_2, ..., x_n).
Then the assertion which precedes the assignment statement must imply the assertion
    A(x_1, x_2, ..., x_{i−1}, E(x_1, x_2, ..., x_n), x_{i+1}, ..., x_n).
This assertion is obtained by replacing every occurrence of x_i in A(x_1, x_2, ..., x_n)
by the expression on the right side of the assignment statement; it is the weakest
statement which will assure that A(x_1, x_2, ..., x_n) will hold after execution of the
assignment statement. The backward construction of assertions is formalized by
the following axiom of assignment, which can be used in place of the one given
previously.

    A(x_1, x_2, ..., x_{i−1}, E(x_1, x_2, ..., x_n), x_{i+1}, ..., x_n)
        {x_i ← E(x_1, x_2, ..., x_n)} A(x_1, x_2, ..., x_n)
                                                Alternate Axiom of Assignment.

This axiom is commonly used to construct a program assertion to precede an
assignment statement based on the assertion which follows the statement. The
backward construction of assertions for assignment statements is usually easier than
constructing them in the forward direction. The two axioms of assignment are
redundant in that only one is required for program verification.
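The backward construction is mechanical enough to sketch in a few lines of Python. In the sketch below, which is an illustration rather than the authors' formulation, an assertion is modelled as a Python predicate on a dictionary of variable values, and the helper name precondition is an assumption. Substituting E for x_i in A is modelled by evaluating A in a state where x_i has been replaced by the value of E.

```python
def precondition(var, expr, post):
    """Backward construction for `var <- expr`: the assertion obtained from
    `post` by replacing `var` with the value of `expr`."""
    return lambda state: post({**state, var: expr(state)})


# Assignment x <- x + y + z, with the assertion x = 9 required afterwards.
after = lambda s: s["x"] == 9
before = precondition("x", lambda s: s["x"] + s["y"] + s["z"], after)

# `before` behaves like the assertion x + y + z = 9.
print(before({"x": 2, "y": 3, "z": 4}))
print(before({"x": 1, "y": 1, "z": 1}))
```

Because the substitution is performed by evaluation, the sketch can only test the constructed assertion on given states; it does not produce the symbolic formula x + y + z = 9.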

Examples
(a) Consider the program segment

    A1
    x ← x + y + z
    A2

If A1 is the assertion
    A1: x + y + z = 9
then the strongest assertion that can be made for A2 is
    A2: ∃x′[x′ + y + z = 9 ∧ x = x′ + y + z].
It is easy to show that this statement implies
    A2′: x = 9
Using backward construction, if we suppose A2 is the assertion
    A2: x = 9
then A1 is obtained by substituting x + y + z for x in A2, giving
    A1: x + y + z = 9.
(b) Consider the following program segment to interchange the values of x and y.

    A1: x = x′ ∧ y = y′
    temp ← x;
    x ← y;
    y ← temp
    A4: x = y′ ∧ y = x′

The assertions A1 and A4 involve variables x′ and y′ which are not program
variables. They are auxiliary variables bound by assigning them the original
values of x and y respectively.
To prove the program segment is correct with respect to A1 and A4, we
use the backward construction of assertions. From A4 we construct A3 by
substitution of temp for y in A4; thus A3 is
    A3: x = y′ ∧ temp = x′
Using backward construction from A3, we obtain
    A2: y = y′ ∧ temp = x′
Applying backward construction to A2 yields A1. By the rule of composition,
it follows that the program segment is correct with respect to A1 and A4. #
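The interchange segment of example (b) can be checked informally by running it with its assertions in place. This Python sketch is an illustration only: the function name swap_with_assertions and the names x0 and y0 (standing for the auxiliary variables x′ and y′) are assumptions. The exhaustive loop covers a small range of integers and is a test, not a proof.

```python
from itertools import product as pairs


def swap_with_assertions(x, y):
    """Interchange x and y, checking assertions A1..A4 at run time."""
    x0, y0 = x, y                        # auxiliary variables x', y'
    assert x == x0 and y == y0           # A1
    temp = x
    assert y == y0 and temp == x0        # A2
    x = y
    assert x == y0 and temp == x0        # A3
    y = temp
    assert x == y0 and y == x0           # A4
    return x, y


# Try every pair of values in a small range.
for x, y in pairs(range(-3, 4), repeat=2):
    assert swap_with_assertions(x, y) == (y, x)
```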

Program verification using the techniques we have described in this section is
a difficult task; only relatively simple programs can be verified in this way. Programs
will not be verified to the level of precision and detail we have described in this
section unless more powerful tools are developed or the major part of the verifica-
tion can be done using a computer. Because program verification is a young
subject, there is no doubt that more powerful mathematical tools will be developed.
Some success has already been achieved with computer-aided verification of pro-
grams, and there is no doubt that these tools will also be improved. Nevertheless,
it seems likely that many programs will never be subjected to the rigors of careful
verification. It does not follow that the concepts of program verification will have
no impact on programming practice. An understanding of techniques for proving
programs correct affects the way a programmer approaches his work. Through
an appreciation of the characteristics that make a program difficult to verify, he
will learn to write programs which can be verified if the need arises. The result is
likely to be a good program: easy to read, modify, and understand.

Problems: Section 1.6

1. (a) Give a flowchart interpretation of the if-then-else rule of inference.
   (b) Give an informal statement of the if-then-else rule of inference using program
       segments.
2. Write a program segment which is correct with respect to the initial assertion true and
   the final assertion false. (Hint: Study Definition 1.6.1.)
3. Prove the following program segments are correct. Use both forward and backward
   construction of assertions.

   (a) A1: true
       x ← 1;
       y ← 2
       AF: x = 1 ∧ y = 2

   (b) A1: x > 0
       y ← z + x
       AF: y > z

   (c) comment: Set y = ax² + bx + c
       A1: true
       y ← a * x;
       y ← (y + b) * x;
       y ← y + c
       AF: y = ax² + bx + c

4. Prove the following program segments are correct. State which rules of inference are
   used.
   (a) In the following, x′ is an auxiliary variable.

       comment: Set x to the absolute value of x.
       A1: x = x′
       if x < 0 then x ← −x
       AF: (x′ < 0 ⇒ x = −x′) ∧ (x′ ≥ 0 ⇒ x = x′)

   (b) A1: true
       if x > y then max ← x else max ← y
       AF: (x > y ∧ max = x) ∨ (x ≤ y ∧ max = y)

5. Consider the following program segment which sets d equal to max(a, b, c).

       A1: true
       d ← a;
       if b > d then d ← b;
       if c > d then d ← c
       AF: (d = a ∨ d = b ∨ d = c) ∧ d ≥ a ∧ d ≥ b ∧ d ≥ c

   (a) Construct the intermediate assertions.
   (b) Prove the program segment is correct.
6. Provide intermediate assertions and show that the procedure ZERO given in Fig.
   1.6.2 is correct with respect to the initial assertion
       A1: n > 0
   and the final assertion
       AF: ∀j[1 ≤ j ≤ n ⇒ V[j] = 0].
   Use the following loop invariant relation
       i ≤ n + 1 ∧ ∀j[1 ≤ j < i ⇒ V[j] = 0].

   procedure ZERO:
   comment: Set all entries of V[1:n] to zero.
   begin
       i ← 1;
       while i ≤ n do
           begin
               V[i] ← 0;
               i ← i + 1
           end
   end

   Fig. 1.6.2 A procedure to zero the vector V[1:n]

7. The procedure PRODUCT which was proved correct in this section is not the only
   procedure which is correct with respect to the given initial and final assertions.
   Consider the following procedure.

   procedure SNEAKY:
   A1: a ≥ 0
   begin
       b ← 0;
       y ← 0
       AF: y = ab
   end

   How could the initial and final assertion be changed so that SNEAKY would not be
   correct with respect to A1 and AF? Address the general question of how to rule out
   unintended solutions.

   procedure SUM:
   comment: Set sum equal to sum of entries of V[1:n].
   A1: n > 0
   begin
       sum ← 0;
       i ← 1;
       while i ≤ n do
           begin
               sum ← sum + V[i];
               i ← i + 1
           end
       AF: sum = Σ_{j=1}^{n} V[j]
   end

   Fig. 1.6.3 Procedure to sum the elements of a vector

8. (a) Construct intermediate assertions for the procedure SUM given in Fig. 1.6.3.
       Identify the loop invariant relation.
   (b) Prove SUM is correct with respect to A1 and AF.
9. (a) Construct intermediate assertions for the procedure SEARCH given in Fig.
       1.6.4. Identify the loop invariant relation.
   (b) Prove SEARCH is correct with respect to A1 and AF.

   procedure SEARCH (arg):
   comment: Set index to smallest value such that V[index] = arg in V[1:n].
            Assume arg is an entry in V and the while loop terminates.
   A1: n ≥ 1 ∧ ∃i[V[i] = arg]
   begin
       index ← 1;
       while V[index] ≠ arg do index ← index + 1
       AF: V[index] = arg ∧ ∀i[1 ≤ i < index ⇒ V[i] ≠ arg]
   end

   Fig. 1.6.4 Linear search procedure

Suggestions for Further Reading

The concepts and terminology of this chapter come principally from the field
of mathematical logic. Wilder [1965] gives a very readable treatment of many of
the basic issues in this area; his book is relevant to later chapters as well as this
one. Shoenfield [1967] gives an excellent introduction to mathematical logic, including
treatments of formal systems and the mathematical theory of models. DeLong
[1970] describes the historical development of mathematical logic, the nature of its
results, and its philosophical implications. The halting problem and related questions
are treated nicely in Minsky [1967].
The original papers by Floyd [1967] and Hoare [1969] provide an excellent
introduction to the topic of program verification. The survey by Elspas, et al.,
[1972] treats several topics associated with proving program correctness; their
article is broader in scope and more difficult reading than those by Floyd and
Hoare. The text by Manna [1974] treats program verification for both flowchart
programs and programs in an ALGOL-like language.
SETS

2.0 INTRODUCTION

The concept of a set is of fundamental importance in modern mathematics. Most
mathematicians believe it is possible to express all of mathematics in the language
of set theory. Our interest in sets is due both to their role in modern mathematics
and their usefulness in modelling and investigating problems in computer science.
Sets were first studied formally by G. Cantor (1845-1918). After set theory
had become a well-established area of mathematics, contradictions, or paradoxes,
were found in the theory. Eventually, more sophisticated approaches than Cantor’s
were developed in order to eliminate these paradoxes. Introductory treatments of
set theory usually describe a “naive” set theory, which is quite similar to Cantor’s
original work, rather than developing the axiomatic framework necessary to avoid
the paradoxes. We will take this simpler approach and develop a set theory in which
it is possible to derive contradictions. It may seem strange to pursue such a course
deliberately, but the naive theory does not lead to contradictions if the universe
of discourse is suitably defined, as it always will be in our investigations. Further-
more, the existence of the paradoxes in the naive theory will not affect the validity
of our results because the theorems we will present can also be developed in
alternative systems in which the paradoxes cannot occur.
In Section 2.2 we will describe some of the paradoxes of naive set theory and
discuss how a more sophisticated theory can circumvent them.

2.1 THE PRIMITIVES OF SET THEORY

A set is any collection of objects which can be treated as an entity, and an object
in the collection is said to be an element, or member, of the set. Given any object
x and set S, if x is an element of the set S, we will write x ∈ S; if x is not an
element of S, we will write ¬(x ∈ S) or x ∉ S. The terms set, collection, and class
will be used as synonyms, as will the terms element and member.
Note that we have not given either a formal definition of a set, or a basis for
deciding when an object is a member of a set. Any mathematical theory must
ultimately rest on some primitive, or undefined notions (e.g., the notions of “point”
and “line” in geometry); the notion of “set” and the relation “is an element of”
are the primitive concepts of set theory. As a consequence of not having definitions
for these concepts, we have no formal test to determine whether something is
a set or whether a given object is an element of a specified set. Because there is no
test, we must rely on a common understanding of the meaning of the terms.

Examples
Almost anything which would be called a set in ordinary conversation is an
acceptable set in the mathematical sense. The following examples will illustrate this
point.
(a) The set of nonnegative integers less than 4. This is a finite set with four members:
    0, 1, 2, and 3.
(b) The set of books in the New York Public Library at the present time. This is also
    a finite set. It would be difficult to list the members of this set because of the
    constant flux in the Library’s collection, but the difficulties are practical ones
    rather than theoretical.
(c) The set consisting of the names of the people who spoke to Charlemagne on May
    10, 810 A.D. This set is finite and probably contains at least one element. It
    has the disturbing characteristic that there may not be a way to determine the
    members of the set. Most mathematicians, however, would not regard this as
    detracting from its acceptability as a mathematical set.
(d) The set of live dinosaurs in the basement of the British Museum. Assuming there
    have been no sinister experiments in the basement of the British Museum, this
    set has the property of not having any members, and is called a null, or empty,
    set.
(e) The set of integers greater than 3. Even though this set is infinite, there is no
    difficulty in determining whether a specified integer is a member.
(f) The set of all programs in the ALGOL language which can be punched on no more
    than 500 cards. This set is very large, but finite, and a correctly operating
    compiler can determine whether or not a program is an element of this set.
(g) The set of all programs in the ALGOL language which would halt if run for a
    sufficiently long time on a computer with unbounded storage. This set is not finite
    because no matter how large a program we write, it is possible to write a larger
    one by inserting another statement. (The statement need not perform any
    useful task.) Although there is a maximum size of ALGOL programs which
    can be run on any given computer, there is nothing about the ALGOL language
    itself which limits the size of a program. Computability theory has
    established that no algorithm exists to determine whether an arbitrary program
    is an element of this set; such a set is called undecidable.

(h) The set of true assertions about the integers. This is an infinite set, as we can
    easily demonstrate by considering assertions of the form
        3 + 1 = 4.
    The assertion
        For every natural number n, Σ_{i=1}^{n} i = n(n + 1)/2
    is considerably less obvious, but can be proven. There are still other statements
    which are conjectured to be true, but have never been proved. The following
    assertion, known as “Fermat’s Last Theorem,” is an example.
        Fermat’s Last Theorem: If x, y, z, and n are positive integers and
        x^n + y^n = z^n, then n ≤ 2.
    This assertion has been a source of frustration to mathematicians for centuries.
    In spite of much effort, neither a proof nor a counterexample is known.
(i) The set with two members, one of which is the set of even integers and the other
    the set of odd integers. This example illustrates that sets can have other sets as
    members. Denote the set of even integers by A and the set of odd integers by
    B, and let C be the set with elements A and B. Then C has only two elements,
    each of which is a set: A ∈ C and B ∈ C. Note that 2 ∈ A, 2 ∉ B and 2 ∉ C.
    #
Since a set is characterized by its members, a set can be specified by stating
when an object is in the set. A finite set can be specified explicitly by listing its
elements. The elements of the list are separated by commas, and the list enclosed in
braces.

Examples
The following are explicit specifications of finite sets.
(a) The set which contains the elements A, B, and C is denoted by {A, B, C}.
(b) The set which contains all the even, nonnegative integers less than 10 is specified
by {0, 2, 4, 6, 8}. #

The elements of an infinite set cannot be explicitly listed; consequently, we
need a way to describe these sets implicitly. Implicit specification is most often
done by means of a predicate with a free variable. The set is defined to be those
elements of the universe of discourse which make the predicate true. Hence, if
P(x) is a predicate with one free variable, the set {x | P(x)} denotes the set S such
that c ∈ S if and only if P(c) is true.
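Implicit specification by a predicate corresponds closely to a set comprehension in a programming language. The Python sketch below is an illustration under a stated assumption: since a program cannot enumerate all of the integers, a finite range (named UNIVERSE here, a hypothetical choice) stands in for the universe of discourse, and the helper name specify is likewise invented.

```python
# Assumption: a finite slice of the integers stands in for the universe of discourse.
UNIVERSE = range(-20, 21)


def specify(predicate, universe=UNIVERSE):
    """Return {x | P(x)}, restricted to the given finite universe."""
    return {x for x in universe if predicate(x)}


# {x | x > 10}, cut off at the edge of the finite universe
greater_than_10 = specify(lambda x: x > 10)

# {x | there is a y in the universe with x = 2y}: the even integers in range
evens = specify(lambda x: any(x == 2 * y for y in UNIVERSE))

# {x | 1 <= x <= 5}
one_to_five = specify(lambda x: 1 <= x <= 5)

print(sorted(one_to_five))
```

Membership then reduces to evaluating the predicate: c is in specify(P) exactly when c lies in the universe and P(c) is true.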

Examples
The following are implicit specifications of sets. The first two examples are
infinite sets; the third is finite.
(a) The set of integers greater than 10 is specified by
    {x | x ∈ I ∧ x > 10}.
(b) The set of even integers can be specified as
    {x | ∃y[y ∈ I ∧ x = 2y]}.
(c) The set {1, 2, 3, 4, 5} can be specified as
    {x | x ∈ I ∧ 1 ≤ x ≤ 5}.

Less formal means are often used to describe sets. One technique is to partly
specify the predicate by the entry to the left of the vertical bar.

Examples
(a) The set of integer multiples of 3 can be specified by {3x | x ∈ I} rather than
    {x | ∃y[y ∈ I ∧ x = 3y]}.
(b) The set of rational numbers can be specified by {x/y | x, y ∈ I ∧ y ≠ 0}. #

If a set is finite but too large to list easily, or if a set is infinite, ellipses can be
used to specify the set implicitly.

Examples
The following specifications use ellipses to characterize a list of the elements of
a set.
(a) The set of integers from 1 to 50 is specified by {1, 2, 3, ..., 50}.
(b) The set of nonnegative even integers is specified by {0, 2, 4, 6, ...}. #

All of these informal techniques of set specification are convenient, and we
will use them freely.
In more formal developments of set theory, the following axiom is used to
establish that sets are completely specified by their elements. The axiom serves as
a definition of equality of sets.

Axiom of Extension: Two sets A and B are equal, A = B, if and only if they
have the same members (i.e., every element of A is an element of B and every
element of B is an element of A).
The axiom of extension can be expressed in logical notation in two ways:
(a) A = B ⇔ ∀x[x ∈ A ⇔ x ∈ B]
(b) A = B ⇔ (∀x[x ∈ A ⇒ x ∈ B] ∧ ∀x[x ∈ B ⇒ x ∈ A])

The axiom of extension asserts that if two sets have the same members, then
regardless of how the sets are specified, they are equal. It follows that if a set is
specified explicitly with a list, the order of the listing is immaterial; the set denoted
by {A, B, C} is the same as (equal to) the sets denoted by {C, B, A} and {B, A, C}.
Furthermore, it is of no consequence if an element appears in such a list more than
once; {A, B, A}, {A, B}, and {A, A, A, B, B} are different specifications of the same
set. A finite set can be characterized either explicitly or implicitly, as with the
specifications {1, 2, 3, 4, 5} and {x | x ∈ I ∧ 1 ≤ x ≤ 5}. Moreover, the same set
can be specified implicitly with different predicates, e.g., the sets {x | x = 0} and
{x | x ∈ I ∧ −1 < x < 1} are equal.
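The consequences of the axiom of extension can be observed directly with the built-in set type of Python, whose equality is equality of membership. The sketch below is illustrative only; the function name equal_by_extension is an assumption, and it spells out form (b) of the axiom for finite sets.

```python
def equal_by_extension(a, b):
    """Form (b) of the axiom: every element of a is in b, and conversely."""
    return all(x in b for x in a) and all(x in a for x in b)


# The order of listing is immaterial.
assert equal_by_extension({"A", "B", "C"}, {"C", "B", "A"})

# Repeated entries in a listing do not change the set.
assert equal_by_extension(set(["A", "B", "A"]), set(["A", "A", "A", "B", "B"]))

# Explicit and implicit specifications of the same set are equal.
explicit = {1, 2, 3, 4, 5}
implicit = {x for x in range(-100, 101) if 1 <= x <= 5}
assert equal_by_extension(explicit, implicit) and explicit == implicit

# Different predicates can specify the same set.
assert {x for x in range(-100, 101) if x == 0} == {x for x in range(-100, 101) if -1 < x < 1}
```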

Problems: Section 2.1

1. Specify the following sets explicitly:


(a) The set of nonnegative integers less than 5.
(b) The set of letters in your first name.
(c) The set whose only element is the first president of the United States.
(d) The set of prime numbers between 10 and 20.
(e) The set of positive multiples of 12 which are less than 65.
2. For each of the following, choose an appropriate universe of discourse and a predicate
to define the set. Do not use ellipses.
(a) The set of integers between 0 and 100.
(b) The set of odd integers.
(c) The set of integer multiples of 10.
(d) The set of human fathers.
(e) The set of tautologies.
3. List the members of the following sets:
(a) {x | x ∈ I ∧ 3 < x < 12}
(b) {x | x is a decimal digit}
(c) {x | x = 2 ∨ x = 5}
4. Determine which of the following sets are equal. The universe of discourse is I.
A = {x | x is even and x² is odd}
B = {x | ∃y[y ∈ I ∧ x = 2y]}
C = {1, 2, 3}
D = {0, —3,4, —4,...}
Eo pelech’
F = {3, 3, 2, 1, 2}
G = {x | x³ − 6x² − 7x − 6 = 0}

†2.2 THE PARADOXES OF SET THEORY

As we indicated in the introduction to this chapter, the naive set theory which we
have described was ultimately found to lead to logical inconsistencies known as
paradoxes. Although set theory had its bitter opponents, by the time the paradoxes
were discovered around the turn of the century, the theory was widely accepted
and work was under way to establish it as the foundation of logic and mathe-
matics. Discovery of the paradoxes seemed to threaten this fundamental role of
set theory. But the paradoxes were not generally viewed as a basis for abandoning
set theory and starting over again; instead, mathematicians felt that the theory had
to be patched in some way which would eliminate the paradoxes but not affect
the usefulness of the theory. In this section, we will describe the best known para-
dox and briefly indicate some of the means of modifying the theory to avoid such
paradoxes. These modifications can be imposed by axiomatizing set theory in such
a way that the paradoxes cannot occur.

†Denotes optional section.

A paradox similar to the one which will concern us is the “liar paradox.”
Consider a man who asserts
“I am lying.”
Is he lying or is he speaking the truth?
If he is lying, then what he asserts is false; since he claims he is lying, he must
actually be telling the truth. We conclude that if he is lying, then he is telling the
truth.
On the other hand, if he speaks the truth, then what he says is true, namely
that he is lying. We conclude that if he is telling the truth, then he is lying.
From the above analysis, we conclude he must be neither lying nor telling
the truth. Thus, the assertion “I am lying,” which appears to be a proposition,
cannot in fact be assigned a truth value.
The liar paradox has been known since antiquity and has no obvious relation
to set theory. Yet it resembles the first widely known paradox, commonly known as
Russell’s paradox, which was discovered by Bertrand Russell in 1901 and inde-
pendently by E. Zermelo. This paradox exploits the absence of restrictions in
naive set theory on the ways in which sets can be characterized. In order to present
the paradox, we consider the possibility of a set being a member of itself. Most
sets which occur to us are not elements of themselves; e.g., {1} ∉ {1}. However,
the set of concepts is itself a concept, and hence this set is apparently a member of
itself. The assertions x ∈ x and x ∉ x are therefore predicates which can be used
to define sets.
Russell proposed the following paradox. Let the universe of discourse be the
set of all sets, and define S to be the following set:
S = {x | x ∉ x}
Thus, S is the set of all sets which are not members of themselves. We now ask
“Is S a member of itself?”
Suppose S is not a member of itself. Then S satisfies the predicate x ∉ x
which defines the set S and therefore S ∈ S. On the other hand, if S ∈ S, then S
must satisfy the predicate which defines S and therefore S ∉ S.
Thus, we are led to a contradiction analogous to that of the liar paradox:
neither S ∈ S nor S ∉ S can be true. A “set,” such as S, which leads to a
contradiction is said to be not well-defined.
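A loose computational analogy may help make the paradox concrete. In the Python sketch below, which is an illustration and not part of the set-theoretic argument, a "set" is modelled by its membership predicate, so that "x ∈ y" becomes the call y(x). The names russell and is_concept are assumptions. Asking whether S is a member of itself then becomes a computation that can never settle on a truth value; Python reports this as unbounded recursion.

```python
def russell(x):
    """Membership predicate of S = {x | x is not a member of x}."""
    return not x(x)


def is_concept(x):
    """Hypothetical 'set of concepts': everything, itself included, belongs to it."""
    return True


# is_concept is a member of itself, so it is not a member of S.
print(russell(is_concept))

# Asking whether S is a member of S never produces True or False.
try:
    russell(russell)
    settled = True
except RecursionError:
    settled = False
print(settled)
```

The failure to terminate is only an analogue of the contradiction: the program does not prove anything about set theory, but it shows that the defining predicate cannot be evaluated on S itself.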
The Russell paradox established that set theory, as originally conceived, led to
inconsistencies. Mathematicians were faced with the necessity of abandoning the
theory or modifying it in some way which would eliminate the paradoxes. The
difficulty was felt to originate in the unrestricted way in which sets could be defined;
in particular, the concept of a set being a member of itself was considered suspect.
A number of approaches were developed, each of which used axioms to restrict
the way in which sets can be specified.
Russell and Whitehead, in the Principia Mathematica, developed what they
called the “theory of types.” This is a set theory in which sets exist in a hierarchy.
The lowest level of the hierarchy contains “individuals.” All other levels of the
hierarchy contain sets whose members must be elements of the next lower level of
the hierarchy. Each level of the hierarchy is called a type. Since x can be a member
of y only if y is a level higher in the hierarchy than x, a set cannot be a member of
another set of the same type. Thus, in the theory of types, expressions such as
x ∈ x are not meaningful and we are spared the problem of dealing with them.
Other formulations of set theory have been created which also avoid the
Russell paradox. In each of these formulations, there are restrictions on the ways
in which sets can be related, and these restrictions imply that no set is permitted
to be a member of itself. The axiomatic formulations of these theories are too
complex to present here, and we will forego a description of them even though they
are currently more popular than the theory of types created by Russell and White-
head.
Having axiomatized set theory in a way which avoids the Russell paradox, it
is natural to ask if we can be sure that no other paradoxes are lurking in the formal
structure we have created. Using the mathematical techniques which are currently
available, there is no way to show that new paradoxes will not arise. A logical
theory which does not lead to paradoxes is called a consistent theory; more
formally, a logical theory is consistent if it is impossible to prove both an assertion
P and its negation ¬P. Since we only want to prove assertions which are true and
no assertion is admitted to be both true and false, we naturally want to use logical
systems which are consistent. However, consistency by itself is not enough, since
a theory which does not permit any theorems to be proved is consistent but worth-
less. A system in which it is possible to prove all the theorems that are true is called
complete. A trivial example of a complete system is one in which every assertion
can be proved, but such a system is obviously not consistent. What we really want
is a logical system which is both complete and consistent; in this case, we can prove
everything that is true and nothing that isn’t. It has been proved that no axiomatic
formulation of set theory can be both complete and consistent. Furthermore, in
order to prove the consistency of one of these formulations, we must construct
the proof in a more powerful system. But to be sure that such a proof is acceptable,
the more powerful system must itself be proved consistent, which requires a still
more powerful system, and so on. It follows that there does not exist any way to
establish that new paradoxes will not arise in set theory.

Problems: Section 2.2

1. (The Barber Paradox) The only barber of a small town vowed that he would only
shave those citizens who did not shave themselves. If only a barber is permitted to
shave someone other than himself, how did the barber get shaved?
2. Show that the assertion
This statement is false.
is not a proposition.
3. Define an adjective to be homological if it applies to itself and heterological if it does
not. The words “ugly,” “English,” “erudite” and “eroneous” are homological, because
“ugly” is an unattractive word, “English” is an English word, “erudite” is a learned
word and “eroneous” is erroneous. The words “German,” “big” and “Lilliputian”
are heterological because “German” is not a German word, “big” is a small word,
and “Lilliputian” is large.
(a) Show that the assertion
“Heterological” is heterological.
is not a proposition.
(b) Is “heterological” homological?

2.3 RELATIONS BETWEEN SETS

There are two fundamental relations that can hold between two sets: equality and
containment. The relation of set equality has already been defined by the Axiom of
Extension. The set containment relation is defined as follows:

Definition 2.3.1: Let A and B be sets. Then A is a subset of B, denoted
$A \subset B$, if each element of A is an element of B
(i.e., $A \subset B \Leftrightarrow \forall x[x \in A \Rightarrow x \in B]$).

If $A \subset B$, we also write $B \supset A$ and say A is contained in B, or B contains A,
or B is a superset of A. We write $A \not\subset B$ if A is not a subset of B. If $A \subset B$ and
$A \neq B$, we say A is a proper subset of B.

Examples
(a) The set of even integers is a proper subset of the integers.
(b) The set of men is a subset (and also a proper subset) of the set of humans.
(c) The set {1, 2, 3, 4, 5} is a subset (but not a proper subset) of the set
$\{x \mid x \in I \land 0 < x < 6\}$. #
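For finite sets, Definition 2.3.1 can be checked mechanically by testing the implication for every element of A. The following Python sketch is illustrative only: the names is_subset and is_proper_subset are ours, not the text's, and the integers are cut down to a finite range so that they can be enumerated.

```python
def is_subset(a, b):
    """A is a subset of B iff every element of A is an element of B."""
    return all(x in b for x in a)

def is_proper_subset(a, b):
    """A is a proper subset of B iff A is a subset of B and A != B."""
    return is_subset(a, b) and a != b

# Assumption: a finite window of the integers stands in for I.
integers = set(range(-10, 11))
evens = {x for x in integers if x % 2 == 0}

print(is_proper_subset(evens, integers))          # True
small = {x for x in integers if 0 < x < 6}
print(is_subset({1, 2, 3, 4, 5}, small))          # True
print(is_proper_subset({1, 2, 3, 4, 5}, small))   # False
```

Python's built-in set type offers the same tests as `a <= b` and `a < b`.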

In all our discussions, we assume a universe of discourse U which may or may
not be explicitly specified. Every variable which denotes an element of a set can
only take on values from this universe. The following theorem is a consequence.

Theorem 2.3.1: Let U be the universe of discourse and A a set. Then $A \subset U$.
Proof: The proof is an example of a trivial proof based on the fact that
$x \in U$ for every element x. The set A is a subset of U if and only if the implication
$x \in A \Rightarrow x \in U$
is true. But $x \in U$ is always true; hence the implication is true. Since x was
arbitrary, it follows by universal generalization that
$\forall x[x \in A \Rightarrow x \in U]$
and therefore $A \subset U$. ∎

The next theorem establishes the relationship between set equality and set con-
tainment.

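Theorem 2.3.2: Let A and B be sets. Then $A = B$ if and only if $A \subset B$ and
$B \subset A$.
Proof: The theorem is established in two parts using direct proofs.
(a) (the “only if” part): $A = B \Rightarrow [A \subset B \land B \subset A]$.
Suppose $A = B$. Then by the Axiom of Extension, every member of A
is a member of B. Therefore, by Definition 2.3.1, $A \subset B$. This establishes
that if $A = B$ then $A \subset B$. By the same argument, but interchanging
the roles of A and B, if $A = B$ then $B \subset A$. Hence,

$[A = B \Rightarrow A \subset B] \land [A = B \Rightarrow B \subset A]$
which is equivalent to
$(A = B) \Rightarrow [(A \subset B) \land (B \subset A)]$.
(b) (the “if” part): $[A \subset B \land B \subset A] \Rightarrow A = B$.
Suppose $A \subset B$ and $B \subset A$. By Definition 2.3.1,
$A \subset B \Rightarrow \forall x[x \in A \Rightarrow x \in B]$ and $B \subset A \Rightarrow \forall x[x \in B \Rightarrow x \in A]$.
Hence,

$[(A \subset B) \land (B \subset A)] \Rightarrow [\forall x[x \in A \Rightarrow x \in B] \land \forall x[x \in B \Rightarrow x \in A]]$.
The right-hand side asserts that A and B have the same members, so by the
Axiom of Extension $A = B$. Thus,
$[A \subset B \land B \subset A] \Rightarrow (A = B)$. ∎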

The preceding theorem will be used in many of our proofs of set equality;
rather than showing directly that $A = B$, we will show $A \subset B$ and $B \subset A$, and
then conclude that they are equal.
The following corollary is a consequence of the preceding theorem. The proof
is left as an exercise.

Corollary 2.3.2: For any set A, $A \subset A$.

Theorem 2.3.3: Let A, B and C be sets. If $A \subset B$ and $B \subset C$, then $A \subset C$.

Proof: Let x be an arbitrary element of the universe of discourse.
Since $A \subset B$ it follows that
$x \in A \Rightarrow x \in B$.
Since $B \subset C$ it follows that
$x \in B \Rightarrow x \in C$.
Therefore
$x \in A \Rightarrow x \in C$.
Since x was arbitrary, it follows that $\forall x[x \in A \Rightarrow x \in C]$ and therefore
$A \subset C$. ∎

Definition 2.3.2: A set with no members is called an empty, null, or void set.
A set with one member is called a singleton set.

Theorem 2.3.4: Let $\emptyset$ be an empty set, and A an arbitrary set. Then $\emptyset \subset A$.

Proof: Let x be an arbitrary element of the universe of discourse. Because
$\emptyset$ has no members, the implication
$x \in \emptyset \Rightarrow x \in A$
is vacuously true. Since x was chosen arbitrarily, the assertion can be universally
quantified, giving
$\forall x[x \in \emptyset \Rightarrow x \in A]$,
which establishes that $\emptyset \subset A$. ∎

The next theorem establishes that there exists one and only one empty set;
this is often stated as “the empty set is unique.”

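Theorem 2.3.5: Let $\emptyset$ and $\emptyset'$ be sets which are both empty. Then $\emptyset = \emptyset'$.
Proof (Direct): Since $\emptyset$ is empty, it follows from Theorem 2.3.4 that $\emptyset \subset \emptyset'$.
Similarly, $\emptyset' \subset \emptyset$. Therefore, by Theorem 2.3.2, $\emptyset = \emptyset'$. ∎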

Traditionally, the symbol $\emptyset$ is reserved to denote the empty set. Note that the
set $\emptyset$ is distinct from the set $\{\emptyset\}$; the latter has one element, namely the empty set.
The empty set can be used to construct an infinite sequence of distinct sets. In the
sequence

$\emptyset, \{\emptyset\}, \{\{\emptyset\}\}, \{\{\{\emptyset\}\}\}, \ldots$

each set except the first has exactly one element, namely the preceding set in the
sequence. In contrast, the ith element of the sequence
$\emptyset, \{\emptyset\}, \{\emptyset, \{\emptyset\}\}, \{\emptyset, \{\emptyset\}, \{\emptyset, \{\emptyset\}\}\}, \ldots$
has i elements, if we start counting at 0. Each set of this sequence has as its elements
all the sets which precede it in the sequence.
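Both sequences are easy to experiment with in a language that allows sets of sets. The Python sketch below uses frozenset, which is hashable and can therefore be a member of another set; the function names nested and cumulative are ours and only illustrate the two constructions.

```python
def nested(i):
    """i-th set (counting from 0) of the first sequence: each set holds only its predecessor."""
    s = frozenset()
    for _ in range(i):
        s = frozenset([s])
    return s

def cumulative(i):
    """i-th set (counting from 0) of the second sequence: each set holds all earlier sets."""
    earlier = []
    for _ in range(i + 1):
        earlier.append(frozenset(earlier))
    return earlier[i]

for i in range(5):
    # Sizes: nested(i) has 0 or 1 elements, cumulative(i) has exactly i elements.
    print(i, len(nested(i)), len(cumulative(i)))
```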

Examples
(a) The set $\{a, b\}$ has four distinct subsets: $\{a, b\}$, $\{a\}$, $\{b\}$ and $\emptyset$. Note that
$\{a\} \subset \{a, b\}$ and $a \in \{a, b\}$, but $\{a\} \notin \{a, b\}$ and $a \not\subset \{a, b\}$. Furthermore,
$\emptyset \subset \{a, b\}$ but $\emptyset \notin \{a, b\}$.
(b) The set $\{\{a\}\}$ is a singleton set; its sole member is (the set) $\{a\}$. Every singleton
set has exactly two subsets; the subsets of $\{\{a\}\}$ are $\{\{a\}\}$ and $\emptyset$. #

In general, a set with n elements has $2^n$ distinct subsets. We will prove this
in a later section.

Problems: Section 2.3

1. List all subsets of the following sets:
(a) $\{1, 2, 3\}$
(b) $\{1, \{2, 3\}\}$
(c) $\{\{1, \{2, 3\}\}\}$
(d) $\{\emptyset\}$
(e) $\{\emptyset, \{\emptyset\}\}$
(f) $\{\{1, 2\}, \{2, 1, 1\}, \{2, 1, 1, 2\}\}$
(g) $\{\{\emptyset, 2\}, \{2\}\}$
2. Prove Corollary 2.3.2.
3. Let A, B, and C be sets. If $A \in B$ and $B \in C$, is it possible that $A \in C$? Is it always
true that $A \in C$? Give examples to support your assertions.
4. Let A, B, and C be sets. Prove or disprove the following assertions:
(2) [AEBABEC]>AEC
(b) [AE BABEC]AE
>C
(C) [ACBABEC]IZ>AEC
5. Briefly describe the difference between the sets $\{2\}$ and $\{\{2\}\}$. List the elements and all
the subsets of each set.
6. Briefly describe the difference between the sets $\emptyset$, $\{\emptyset\}$, and $\{\emptyset, \{\emptyset\}\}$. List the elements
and all the subsets of each of these sets.
7. Is it possible that $A \subset B$ and $A \in B$? Prove your assertion.

Programming Problem

Write a program which decides if two input sets are equal or if one is contained
in the other. Assume all sets are finite subsets of the set of natural numbers N.

2.4 OPERATIONS ON SETS

An operation on sets uses given sets (called the operands) to specify a new set
(called the resultant). We will first treat binary operations; a binary operation com-
bines two operands to produce a resultant.
As in the previous sections, we assume that all sets are constructed from some
implicitly specified universe of discourse U.

Definition 2.4.1: Let A and B be sets.
(a) The union of A and B, denoted $A \cup B$, is the set
$A \cup B = \{x \mid x \in A \lor x \in B\}$.
(b) The intersection of A and B, denoted $A \cap B$, is the set
$A \cap B = \{x \mid x \in A \land x \in B\}$.
(c) The difference of A and B, or relative complement of B with respect to A,
denoted $A - B$, is the set
$A - B = \{x \mid x \in A \land x \notin B\}$.

Examples
Let $A = \{0, 1, 2\}$ and $B = \{1, 2, 3\}$. Then
(a) $A \cup B = \{0, 1, 2, 3\}$
(b) $A \cap B = \{1, 2\}$
(c) $A - B = \{0\}$
(d) $B - A = \{3\}$ #
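The three definitions can be written almost verbatim as set comprehensions. In the Python sketch below, the finite universe U = {0, ..., 9} is our assumption, chosen only so that the predicates can be evaluated over every element; the results are then compared with Python's built-in set operators.

```python
U = set(range(10))   # assumed finite universe of discourse
A = {0, 1, 2}
B = {1, 2, 3}

# Definition 2.4.1, transcribed as predicates over U.
union = {x for x in U if x in A or x in B}
intersection = {x for x in U if x in A and x in B}
difference = {x for x in U if x in A and x not in B}

# Python's operators |, & and - agree with the definitions.
assert union == A | B
assert intersection == A & B
assert difference == A - B

print(union, intersection, difference, B - A)
```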

Definition 2.4.2: If A and B are sets and $A \cap B = \emptyset$, then A and B are
disjoint. If C is a collection of sets such that any two distinct elements of C are
disjoint, then C is a collection of (pairwise) disjoint sets.

Example
If $C = \{\{0\}, \{1\}, \{2\}, \ldots\} = \{\{i\} \mid i \in N\}$, then C is a collection of disjoint sets. #

We next define some important classes of binary operations. Note that the
following definition is not restricted to operations on sets.

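Definition 2.4.3: Let $\square$ denote a binary operation, and let $x \mathbin{\square} y$ denote the
resultant obtained by applying the operation $\square$ to the operands x and y. Then
(a) The operation $\square$ is commutative if $x \mathbin{\square} y = y \mathbin{\square} x$.
(b) The operation $\square$ is associative if $(x \mathbin{\square} y) \mathbin{\square} z = x \mathbin{\square} (y \mathbin{\square} z)$.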

Examples
For the integers, the binary operation of addition is commutative and associative
since for all integers x, y and z,

$x + y = y + x$
$(x + y) + z = x + (y + z)$
However, the operation of subtraction is neither commutative nor associative, e.g.,
$6 - 4 \neq 4 - 6$
$(6 - 4) - 2 \neq 6 - (4 - 2)$ #
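A single counterexample refutes commutativity or associativity, so a brute-force search over a small sample is a handy way to test a conjecture before trying to prove it. The Python sketch below uses helper names of our own choosing. Passing the check on a finite sample does not prove the property for all integers; failing it does disprove the property.

```python
from itertools import product
import operator

def is_commutative(op, sample):
    """True if op(x, y) == op(y, x) for every pair drawn from sample."""
    return all(op(x, y) == op(y, x) for x, y in product(sample, repeat=2))

def is_associative(op, sample):
    """True if op(op(x, y), z) == op(x, op(y, z)) for every triple drawn from sample."""
    return all(op(op(x, y), z) == op(x, op(y, z))
               for x, y, z in product(sample, repeat=3))

sample = range(-3, 4)
print(is_commutative(operator.add, sample), is_associative(operator.add, sample))  # True True
print(is_commutative(operator.sub, sample), is_associative(operator.sub, sample))  # False False
```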

Theorem 2.4.1: The set operations of union and intersection are commuta-
tive and associative, i.e., for arbitrary sets A, B, and C,
(a) $A \cup B = B \cup A$
(b) $A \cap B = B \cap A$
(c) $(A \cup B) \cup C = A \cup (B \cup C)$
(d) $(A \cap B) \cap C = A \cap (B \cap C)$
The proofs of assertions (a)-(d) use the commutativity and associativity of the
logical operators $\lor$ and $\land$. We will illustrate by proving assertions (a) and (c).

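Proof:
(a) Let x be an arbitrary element of the universe U. Then
$x \in A \cup B \Leftrightarrow x \in A \lor x \in B$    Definition of $\cup$
$\Leftrightarrow x \in B \lor x \in A$    Commutativity of $\lor$
$\Leftrightarrow x \in B \cup A$    Definition of $\cup$
Since x was arbitrary, it follows that
$\forall x[x \in A \cup B \Leftrightarrow x \in B \cup A]$.
Hence, $A \cup B = B \cup A$.
(c) Let x be an arbitrary element. Then
$x \in A \cup (B \cup C) \Leftrightarrow x \in A \lor x \in (B \cup C)$    Definition of $\cup$
$\Leftrightarrow x \in A \lor (x \in B \lor x \in C)$    Definition of $\cup$
$\Leftrightarrow (x \in A \lor x \in B) \lor x \in C$    Associativity of $\lor$
$\Leftrightarrow x \in (A \cup B) \lor x \in C$    Definition of $\cup$
$\Leftrightarrow x \in (A \cup B) \cup C$    Definition of $\cup$
Since x was arbitrary, it follows that
$\forall x[x \in A \cup (B \cup C) \Leftrightarrow x \in (A \cup B) \cup C]$.
Hence, $A \cup (B \cup C) = (A \cup B) \cup C$. ∎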

The following definition is not restricted to operations on sets.

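Definition 2.4.4: Let $\triangle$ and $\square$ be binary operations. Then $\triangle$ distributes
over $\square$ if the following hold:
$x \mathbin{\triangle} (y \mathbin{\square} z) = (x \mathbin{\triangle} y) \mathbin{\square} (x \mathbin{\triangle} z)$
$(y \mathbin{\square} z) \mathbin{\triangle} x = (y \mathbin{\triangle} x) \mathbin{\square} (z \mathbin{\triangle} x)$
(Note that if $\triangle$ is a commutative operation, then each of these “distributive laws”
implies the other.)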

Examples
For the set of integers, multiplication distributes over addition:

$x \cdot (y + z) = x \cdot y + x \cdot z$
Addition does not distribute over multiplication, e.g.,
$4 + (6 \cdot 2) \neq (4 + 6) \cdot (4 + 2)$ #

Theorem 2.4.2: The set operations of union and intersection distribute over
each other, i.e., for arbitrary sets A, B and C,
(a) $A \cup (B \cap C) = (A \cup B) \cap (A \cup C)$
(b) $A \cap (B \cup C) = (A \cap B) \cup (A \cap C)$

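Proof: (a) Let x be an arbitrary element. Then
$x \in A \cup (B \cap C) \Leftrightarrow x \in A \lor x \in (B \cap C)$    Definition of $\cup$
$\Leftrightarrow x \in A \lor (x \in B \land x \in C)$    Definition of $\cap$
$\Leftrightarrow (x \in A \lor x \in B) \land (x \in A \lor x \in C)$    Distributivity of $\lor$ over $\land$
$\Leftrightarrow (x \in A \cup B) \land (x \in A \cup C)$    Definition of $\cup$
$\Leftrightarrow x \in (A \cup B) \cap (A \cup C)$    Definition of $\cap$
Hence, $A \cup (B \cap C) = (A \cup B) \cap (A \cup C)$.
The proof of part (b) is left as an exercise. ∎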
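For a small universe, both distributive laws can be confirmed exhaustively by trying every triple of subsets. The Python sketch below is a sanity check rather than a proof; the three-element universe and the helper names are our assumptions.

```python
from itertools import combinations, product

def subsets(universe):
    """All subsets of a finite set, as frozensets."""
    items = list(universe)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def distributive_laws_hold(universe):
    """Check Theorem 2.4.2 (a) and (b) for every triple of subsets of universe."""
    S = subsets(universe)
    return all(
        a | (b & c) == (a | b) & (a | c) and a & (b | c) == (a & b) | (a & c)
        for a, b, c in product(S, repeat=3)
    )

print(distributive_laws_hold({0, 1, 2}))  # True
```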

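Theorem 2.4.3: Let A, B, C and D be arbitrary subsets of a universe U. Then
the following assertions are true.
(a) $A \cup A = A$
(b) $A \cap A = A$
(c) $A \cup \emptyset = A$
(d) $A \cap \emptyset = \emptyset$
(e) $A - B \subset A$
(f) If $A \subset B$ and $C \subset D$, then $(A \cup C) \subset (B \cup D)$
(g) If $A \subset B$ and $C \subset D$, then $(A \cap C) \subset (B \cap D)$
(h) $A \subset A \cup B$
(i) $A \cap B \subset A$
(j) If $A \subset B$, then $A \cup B = B$
(k) If $A \subset B$, then $A \cap B = A$
(l) $A - \emptyset = A$
(m) $A \cap (B - A) = \emptyset$
(n) $A \cup (B - A) = A \cup B$
(o) $A - (B \cup C) = (A - B) \cap (A - C)$
(p) $A - (B \cap C) = (A - B) \cup (A - C)$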
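Proof:
(a) ($A \cup A = A$.) By Definition 2.4.1(a), for any $x \in U$,
$x \in A \cup A \Leftrightarrow x \in A \lor x \in A$
$\Leftrightarrow x \in A$
Hence, $A \cup A = A$.
(c) ($A \cup \emptyset = A$.) By Definition 2.4.1(a), $x \in A \cup \emptyset \Leftrightarrow x \in A \lor x \in \emptyset$.
But since $x \in \emptyset$ is always false, it follows that $x \in A \lor x \in \emptyset \Leftrightarrow x \in A$.
Hence, $x \in A \cup \emptyset \Leftrightarrow x \in A$, and therefore $A \cup \emptyset = A$.
(e) ($A - B \subset A$.) By Definition 2.4.1(c), $x \in A - B \Leftrightarrow x \in A \land x \notin B$.
Hence, $x \in A - B \Rightarrow x \in A$, and it follows that $A - B \subset A$.
(f) (If $A \subset B$ and $C \subset D$, then $(A \cup C) \subset (B \cup D)$.) Assume $A \subset B$ and
$C \subset D$. Suppose x is an arbitrary element of $A \cup C$; then
$x \in A \lor x \in C$. We now construct a proof by cases.
Case 1: Suppose $x \in A$. Since $A \subset B$ it follows that $x \in B$. Therefore
$x \in B \lor x \in D$ and hence $x \in B \cup D$.
Case 2: Suppose $x \in C$. By an argument analogous to Case 1 it
follows that $x \in B \cup D$.
Hence, if $x \in A \cup C$, then $x \in B \cup D$, and therefore
$A \cup C \subset B \cup D$.
(j) (If $A \subset B$, then $A \cup B = B$.) We use a direct proof and assume $A \subset B$.
Since $B \subset B$, it follows from part (f) that $A \cup B \subset B \cup B$ and from
part (a), $B \cup B = B$. Hence, $A \cup B \subset B$, which establishes containment
in one direction. From part (h), $B \subset A \cup B$, establishing containment
in the other direction. Therefore, $A \cup B = B$.
(l) ($A - \emptyset = A$.) $A - \emptyset = \{x \mid x \in A \land x \notin \emptyset\}$. But $x \notin \emptyset$ is always
true. Hence, $x \in A \land x \notin \emptyset \Leftrightarrow x \in A$. Therefore,
$A - \emptyset = \{x \mid x \in A\} = A$.
(o) ($A - (B \cup C) = (A - B) \cap (A - C)$.)
$x \in A - (B \cup C) \Leftrightarrow x \in A \land x \notin (B \cup C)$
$\Leftrightarrow x \in A \land \neg(x \in (B \cup C))$
$\Leftrightarrow x \in A \land \neg(x \in B \lor x \in C)$
$\Leftrightarrow x \in A \land (x \notin B \land x \notin C)$
$\Leftrightarrow (x \in A \land x \notin B) \land (x \in A \land x \notin C)$
$\Leftrightarrow x \in A - B \land x \in A - C$
$\Leftrightarrow x \in (A - B) \cap (A - C)$
We leave the proofs of the remaining parts as exercises. ∎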

From parts (j) and (k) of Theorem 2.4.3, it follows that for any subset A of
a universe U, $A \cup U = U$ and $A \cap U = A$. When the universe of discourse is
understood, a unary operation of complementation is defined.

Definition 2.4.5: Let U be a universe and A be a subset of U. The (absolute)
complement of A, denoted $\overline{A}$, is the set $\overline{A} = U - A = \{x \mid x \notin A\}$.

Examples
(a) If $U = \{1, 2, 3, 4\}$ and $A = \{1, 2\}$, then $\overline{A} = \{3, 4\}$.
(b) If $U = N$ and $A = \{x \mid x > 0\}$, then $\overline{A} = \{0\}$.
(c) If $U = I$ and $A = \{x \mid x > 0\}$, then $\overline{A} = \{x \mid x \le 0\}$. #

Theorem 2.4.4: Let A be an arbitrary subset of some universe U. Then
(a) $A \cup \overline{A} = U$
(b) $A \cap \overline{A} = \emptyset$

The proofs follow directly from the previous theorem and are left as exer-
cises.
The following theorem states another useful relationship between a set and its
complement.

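Theorem 2.4.5 (Uniqueness of complement): Let A and B be subsets of a
universe U. Then $B = \overline{A}$ if and only if $A \cup B = U$ and $A \cap B = \emptyset$.
Proof: The “only if” part follows directly from Theorem 2.4.4. To show the
“if” part we assume $A \cap B = \emptyset$ and $A \cup B = U$. Then
$B = U \cap B$
$= (A \cup \overline{A}) \cap B$
$= (A \cap B) \cup (\overline{A} \cap B)$
$= \emptyset \cup (\overline{A} \cap B)$
$= (\overline{A} \cap A) \cup (\overline{A} \cap B)$
$= \overline{A} \cap (A \cup B)$
$= \overline{A} \cap U$
$= \overline{A}$ ∎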
Using the preceding result, we have the following.

Theorem 2.4.6: Let A be an arbitrary subset of U. Then $\overline{\overline{A}} = A$, i.e., the
complement of the complement of A is A.
Proof: By Theorem 2.4.4, $\overline{A} \cup A = U$ and $\overline{A} \cap A = \emptyset$. By Theorem 2.4.5,
this establishes that A is the complement of $\overline{A}$, that is, $\overline{\overline{A}} = A$. ∎

Theorem 2.4.7 (DeMorgan’s laws): Let A and B be arbitrary subsets of U.
Then
(a) $\overline{A \cup B} = \overline{A} \cap \overline{B}$
(b) $\overline{A \cap B} = \overline{A} \cup \overline{B}$
Proof: The proofs are direct consequences of the definition of absolute
complement and identities (o) and (p) of Theorem 2.4.3. ∎
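The complement laws can also be spot-checked over every pair of subsets of a small universe. In the Python sketch below, the universe {1, 2, 3, 4} is borrowed from example (a) of Definition 2.4.5, and the function name complement is ours. Note that complementation only makes sense once U is fixed, which is why U appears as a global assumption.

```python
from itertools import combinations

U = frozenset({1, 2, 3, 4})   # assumed universe of discourse

def complement(a):
    """Absolute complement of a with respect to U (Definition 2.4.5)."""
    return U - a

all_subsets = [frozenset(c) for r in range(len(U) + 1)
               for c in combinations(sorted(U), r)]

# DeMorgan's laws (Theorem 2.4.7), checked for every pair of subsets of U.
de_morgan_ok = all(
    complement(a | b) == complement(a) & complement(b)
    and complement(a & b) == complement(a) | complement(b)
    for a in all_subsets for b in all_subsets
)

# Theorem 2.4.6: the complement of the complement of A is A.
double_complement_ok = all(complement(complement(a)) == a for a in all_subsets)

print(de_morgan_ok, double_complement_ok)  # True True
```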

When the number of sets is small, the result of many set operations can be
represented pictorially using Venn diagrams. Examples of these diagrams are given
in Fig. 2.4.1. In each case, the rectangle represents the universe and the circles
represent arbitrary sets A, B and C. The shaded portion of each diagram represents
the expression which appears below.

Fig. 2.4.1 Venn diagrams: $A - B$, $(A \cup B) \cap C$, $(A \cup C) \cap B$

The binary operations of union and intersection can be considered as special
cases of operations which form unions and intersections of any number of sets.
These more general operations are defined over collections of sets.

Definition 2.4.6: Let C be a collection of subsets of some universe U.
(a) The union of the members of C, denoted $\bigcup_{S \in C} S$, is the set
$\bigcup_{S \in C} S = \{x \mid \exists S[S \in C \land x \in S]\}$.
(b) If $C \neq \emptyset$, the intersection of the members of C, denoted $\bigcap_{S \in C} S$, is the
set $\bigcap_{S \in C} S = \{x \mid \forall S[S \in C \Rightarrow x \in S]\}$.

These operations are natural generalizations of the union and intersection
operations defined previously; if $x \in \bigcup_{S \in C} S$, then x is an element of at least one
subset $S \in C$, and if $x \in \bigcap_{S \in C} S$, then x is a member of every subset $S \in C$.
Note that C is required to be nonempty for $\bigcap_{S \in C} S$ to be defined. This requirement
is necessary because if $C = \emptyset$, then the implication $S \in C \Rightarrow x \in S$ would be
vacuously true for every S, and therefore, the predicate $\forall S[S \in C \Rightarrow x \in S]$
would be true for every x. Hence, the set defined would be the universal set U.
By requiring that $C \neq \emptyset$, this possibility is eliminated.
If D is a set and a set $A_d$ has been defined for each $d \in D$, then d is called
the index of $A_d$, the collection $C = \{A_d \mid d \in D\}$ is called an indexed collection
of sets, and D is called the index set of the collection. When D is the index set of a
collection C, the notation $\bigcup_{d \in D} A_d$ denotes $\bigcup_{S \in C} S$, and $\bigcap_{d \in D} A_d$ denotes $\bigcap_{S \in C} S$.
If C is a finite indexed collection of sets and the index set is a set of natural
numbers $\{0, 1, 2, \ldots, n\}$, then the union and intersection of the members of
C can be denoted by using notation similar to the summation notation. Let
$C = \{A_0, A_1, \ldots, A_n\}$; then

$\bigcup_{S \in C} S = \bigcup_{i=0}^{n} A_i = \bigcup_{0 \le i \le n} A_i = \bigcup_{i \in \{0, 1, \ldots, n\}} A_i = A_0 \cup A_1 \cup \cdots \cup A_n$

Similarly, if C is an infinite collection which is indexed by N,
$C = \{A_0, A_1, A_2, \ldots\}$, then

$\bigcup_{S \in C} S = \bigcup_{i=0}^{\infty} A_i = \bigcup_{0 \le i} A_i = \bigcup_{i \in N} A_i = A_0 \cup A_1 \cup A_2 \cup \cdots$

In general the set of indices need not be a subset of N, but can be an arbitrary set.

Examples
Let the universe be the set of real numbers R.
(a) If $C = \{\{0\}, \{0, 1\}, \{0, 1, 2\}\}$, then $\bigcup_{S \in C} S = \{0, 1, 2\}$, and $\bigcap_{S \in C} S = \{0\}$.
(b) Let (a, b) denote the open interval from a to b, i.e., $(a, b) = \{x \mid a < x < b\}$. If
$C = \{(-n, n) \mid n \in I \land n > 0\}$, then $\bigcup_{S \in C} S = (-\infty, \infty) = R$, and
$\bigcap_{S \in C} S = (-1, 1)$.
(c) Let $C = \{A_i \mid i \in \{a, b, c\}\}$, where $A_a = \{0, 1, 2\}$, $A_b = \{4, 5, 6\}$ and $A_c = \{2\}$.
Then $\bigcup_{i \in \{a,b,c\}} A_i = \{0, 1, 2, 4, 5, 6\}$ and $\bigcap_{i \in \{a,b,c\}} A_i = \emptyset$. #
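For finite collections of finite sets, the generalized operations amount to folding the binary operations over the collection. The Python sketch below uses function names of our own; it follows Definition 2.4.6 in refusing to intersect an empty collection, which is a design choice on our part (one could instead pass the universe U explicitly and return it in that case).

```python
from functools import reduce

def big_union(collection):
    """Union of the members of a finite collection; the empty collection yields the empty set."""
    return reduce(lambda acc, s: acc | s, collection, frozenset())

def big_intersection(collection):
    """Intersection of the members of a nonempty finite collection."""
    collection = list(collection)
    if not collection:
        raise ValueError("intersection of an empty collection is undefined")
    return reduce(lambda acc, s: acc & s, collection)

C = [{0}, {0, 1}, {0, 1, 2}]               # example (a)
print(big_union(C), big_intersection(C))

indexed = {"a": {0, 1, 2}, "b": {4, 5, 6}, "c": {2}}   # example (c), index set {a, b, c}
print(big_union(indexed.values()), big_intersection(indexed.values()))
```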

We will often refer to the set of subsets of a set. Since the set of subsets of
a given set A is unique, we can define a unary operation on sets whose value is the
set of subsets of the operand.

Definition 2.4.7: Let A be a set. The power set of A, denoted $\mathcal{P}(A)$, is the set
of all subsets of A.

Examples
(a) If $A = \emptyset$, then $\mathcal{P}(A) = \{\emptyset\}$.
(b) If $A = \{1\}$, then $\mathcal{P}(A) = \{\emptyset, \{1\}\}$.
(c) If $A = \{1, 2\}$, then $\mathcal{P}(A) = \{\emptyset, \{1\}, \{2\}, \{1, 2\}\}$.
(d) If A is any (finite or infinite) set of natural numbers then $A \in \mathcal{P}(N)$. #

If A is finite, then $\mathcal{P}(A)$ is finite; otherwise, $\mathcal{P}(A)$ is infinite.
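One way to see why a finite set with n elements has $2^n$ subsets is to pair each subset with an n-bit number whose ith bit records whether the ith element is included. The Python sketch below enumerates the power set that way; the function name power_set and the bitmask encoding are our illustration, not a construction taken from the text.

```python
def power_set(a):
    """Power set of a finite set, one subset per bitmask in range(2 ** n)."""
    items = list(a)
    n = len(items)
    return {
        frozenset(items[i] for i in range(n) if (mask >> i) & 1)
        for mask in range(2 ** n)
    }

print(power_set(set()) == {frozenset()})   # True
print(len(power_set({1, 2, 3})))           # 8
```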

Problems: Section 2.4

1. (a) Construct Venn diagrams for the following:
(i) $A \cup B$
(ii) $A \cap B$
(iii) $A - (B \cup C)$
(iv) $A \cap (B \cup C)$
(b) Give a formula which denotes the shaded portion of each of the following Venn
diagrams. [Venn diagrams (i), (ii), (iii)]
2. Let A, B and C be arbitrary sets. Express $A \cup B \cup C$ as a union of disjoint sets.
3. Prove parts (b) and (d) of Theorem 2.4.1.
4. Let A, B, and C be sets.
(a) Show that if $C \subset A$ and $C \subset B$, then $C \subset A \cap B$ (i.e., $A \cap B$ is the largest set
contained in both A and B).
(b) Show that if $C \supset A$ and $C \supset B$, then $C \supset A \cup B$ (i.e., $A \cup B$ is the smallest
set which contains both A and B).
5. Prove part (b) of Theorem 2.4.2.
6. Suppose $A \neq \emptyset$ and $A \cup B = A \cup C$. Show that it does not follow that $B = C$.
Suppose in addition that $A \cap B = A \cap C$. Can you conclude that $B = C$?
7. (a) Show that “relative complement” is not a commutative operation; that is,
there exist universes which contain sets A and B such that
$A - B \neq B - A$.
(b) Is it possible that $A - B = B - A$? Characterize all conditions under which this
occurs.
(c) Is “relative complement” an associative operation? Prove your assertion.
8. Prove the remaining parts of Theorem 2.4.3.
9. Prove Theorem 2.4.4.
10. Prove the following identities.
(a) $A \cup (A \cap B) = A$
(b) $A \cap (A \cup B) = A$
(c) $A - B = A \cap \overline{B}$
(d) $A \cup (\overline{A} \cap B) = A \cup B$
(e) $A \cap (\overline{A} \cup B) = A \cap B$
11. In each of the following, find $\bigcup_{S \in C} S$ and $\bigcap_{S \in C} S$.
(a) $C = \{\emptyset\}$
(b) $C = \{\emptyset, \{\emptyset\}\}$
(c) $C = \{\{a\}, \{b\}, \{a, b\}\}$
(d) $C = \{\{i\} \mid i \in I\}$
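12. Let A, B, and C be subsets of some universe U, and let D be the following collection.
$D = \{A \cap B \cap C,\ A \cap B \cap \overline{C},\ A \cap \overline{B} \cap C,\ A \cap \overline{B} \cap \overline{C},$
$\overline{A} \cap B \cap C,\ \overline{A} \cap B \cap \overline{C},\ \overline{A} \cap \overline{B} \cap C,\ \overline{A} \cap \overline{B} \cap \overline{C}\}$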
(a) Construct a Venn diagram for the elements of the collection D.
(b) Prove that AM BO Cand AN BN Care disjoint. Is D a disjoint collection
of sets?
(c) Prove that $\bigcup_{S \in D} S = U$.
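13. Let C be a nonempty collection of subsets of some universe U. Prove the following
generalization of DeMorgan’s laws.
(a) $\overline{\bigcup_{S \in C} S} = \bigcap_{S \in C} \overline{S}$
(b) $\overline{\bigcap_{S \in C} S} = \bigcup_{S \in C} \overline{S}$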
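14. Specify the power set for each of the following sets.
(a) $\{a, b, c\}$
(b) $\{\{a, b\}, \{c\}\}$
(c) $\{\{a, b\}, \{b, a\}, \{a, b, b\}\}$
15. Let $S_n = \{a_0, a_1, \ldots, a_n\}$ and $S_{n+1} = \{a_0, a_1, \ldots, a_n, a_{n+1}\}$. Describe how $\mathcal{P}(S_{n+1})$
is related to $\mathcal{P}(S_n)$. (Hint: $\mathcal{P}(S_{n+1})$ contains $\mathcal{P}(S_n)$.)
16. Let x and y be real numbers and define the operation $x \mathbin{\triangle} y$ to be $x^y$ (x raised to the
power y).
(a) Show that the operation $\triangle$ is neither commutative nor associative.
(b) Let $\circ$ represent multiplication. Determine which of the following distributive
laws hold.
(i) $x \circ (y \mathbin{\triangle} z) = (x \circ y) \mathbin{\triangle} (x \circ z)$
(ii) $(y \mathbin{\triangle} z) \circ x = (y \circ x) \mathbin{\triangle} (z \circ x)$
(iii) $x \mathbin{\triangle} (y \circ z) = (x \mathbin{\triangle} y) \circ (x \mathbin{\triangle} z)$
(iv) $(y \circ z) \mathbin{\triangle} x = (y \mathbin{\triangle} x) \circ (z \mathbin{\triangle} x)$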
Programming Problems

1. Write a program to generate the power set of $\{0, 1, 2, \ldots, n\}$ for any natural number
n given as input.
2. (a) Write a program which accepts specifications of two finite sets A and B, where
$A, B \subset N$, and prints a nonredundant list of the elements of $A \cup B$ and $A \cap B$.
(b) Write a program to determine for a given set A and an arbitrary $n \in N$ whether
$n \in A$.

2.5 INDUCTION

Inductive Definition of Sets

Earlier in this chapter, we described how finite sets can be defined either
explicitly by listing the elements of the set, or implicitly by using a predicate with
free variables; we also observed that infinite sets can only be specified implicitly.
But predicates do not always provide a convenient means of characterizing an
infinite set. For example, there is no convenient or obvious predicate to specify
the set of ALGOL, PL/I, or FORTRAN programs, or even such a basic structure
as the set of natural numbers N. Such sets are often most naturally defined using an
inductive definition.†
An inductive definition of a set always consists of three distinct components.
1. The basis, or basis clause, of the definition establishes that certain objects
are in the set. This part of the definition has the dual function of establish-
ing that the set being defined is not empty and of characterizing the
“building blocks” which will be used to construct the remainder of the
set.
2. The induction, or inductive clause, of an inductive definition establishes
the ways in which elements of the set can be combined to obtain new
elements. The inductive clause always asserts that if objects x, y, ..., z
are elements of the set, then they can be combined in certain specified ways
to create other objects which are also in the set. Thus, while the basis
clause describes the building blocks of the set, the inductive clause de-
scribes the operations which can be performed on objects in order to
construct new elements of the set.
3. The extremal clause asserts that unless an object can be shown to be
a member of the set by applying the basis and inductive clauses a finite
number of times, then the object is not a member of the set. The extremal
clause of an inductive definition of a set S has a variety of forms, such as
(i) “No object is a member of S unless its being so follows from a finite
number of applications of the basis and inductive clauses.”
(ii) “The set S is the smallest set which satisfies the basis and inductive
clauses.”
(iii) “The set S is the set such that S satisfies the basis and inductive
clauses and no proper subset of S satisfies them (i.e., if T is a sub-
set of S such that T satisfies the basis and inductive clauses, then
$T = S$).”
(iv) “The set S is the intersection of all sets which satisfy the properties
specified by the basis and inductive clauses.”
In fact, all these forms of the extremal clause are equivalent in consequence though
not in form, and all serve the purpose of establishing that nothing is a member
of the set being defined unless it is required to be so by the first two steps of the
definition. Often the extremal clause is not stated explicitly in an inductive
definition; this rarely leads to misunderstandings.

†The term “recursive definition” is often used to denote what we call an “inductive definition.”

Example

If the universe of discourse is the set of integers I, then a predicate definition of
the set E of even nonnegative integers can be given as follows:

$E = \{x \mid x \ge 0 \land \exists y[x = 2y]\}$

The same set can be defined inductively as follows:
1. (Basis) $0 \in E$.
2. (Induction) If $n \in E$, then $(n + 2) \in E$.
3. (Extremal) No integer is an element of E unless it can be shown to be so from a
finite number of applications of clauses 1 and 2. #
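The two definitions can be compared on a bounded range. In the Python sketch below, the basis clause seeds the set and the inductive clause is applied until nothing new appears; the bound limit and the function names are our assumptions, introduced only so that the construction terminates.

```python
def even_by_induction(limit):
    """Members of E not exceeding limit, built from the basis and inductive clauses."""
    members = {0}          # basis: 0 is in E
    frontier = [0]
    while frontier:
        n = frontier.pop()
        m = n + 2          # induction: if n is in E, then n + 2 is in E
        if m <= limit and m not in members:
            members.add(m)
            frontier.append(m)
    return members         # extremal: nothing else is added

def even_by_predicate(limit):
    """Members of E not exceeding limit, from the predicate x >= 0 and x = 2y for some y."""
    return {x for x in range(-limit, limit + 1) if x >= 0 and x % 2 == 0}

print(even_by_induction(10) == even_by_predicate(10))  # True
```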

We will now introduce some notation and terminology that will enable us
to give some further examples of inductively defined sets. We use $\Sigma$ to denote
a finite and nonempty set of symbols or characters; $\Sigma$ is called an alphabet. A string
of a finite number of symbols, each of which is an element of $\Sigma$, is called a word
or string (or sometimes a sentence) over the alphabet $\Sigma$. Let x be a word over
$\Sigma$; if $x = a_1 a_2 a_3 \ldots a_n$, where $n \in N$ and $a_i \in \Sigma$ for each $1 \le i \le n$, then the
length of x is n, the number of symbols in the word x. The string of length 0, denoted
$\Lambda$, is called the empty (or null) string. If x and y are strings of symbols over $\Sigma$,
$x = a_1 a_2 \ldots a_n$ and $y = b_1 b_2 \ldots b_m$, where $a_i \in \Sigma$ and $b_j \in \Sigma$ for all i and j,
then x concatenated with y, denoted xy, is the string
$xy = a_1 a_2 \ldots a_n b_1 b_2 \ldots b_m$;
if $x = \Lambda$ then $xy = y$ and if $y = \Lambda$ then $xy = x$. If $z = xy$, then x is a prefix of
z and y is a suffix. If $x \neq z$, then x is a proper prefix; if $y \neq z$ then y is a proper
suffix. If $w = xyz$ then y is a substring of w and if $y \neq w$, then y is a proper substring.
The following two definitions describe sets which are widely used in computer
science. In later parts of this text we will develop some of the properties of these
sets, and we will often refer to them in examples.

Definition 2.5.1: Let $\Sigma$ be an alphabet. The set $\Sigma^+$ of all nonempty strings
over $\Sigma$ is defined as follows:†
1. (Basis) If $a \in \Sigma$, then $a \in \Sigma^+$.
2. (Induction) If $x \in \Sigma^+$ and $a \in \Sigma$, then $ax \in \Sigma^+$ (ax denotes the string
which consists of the symbol a juxtaposed, or concatenated, with the
string x).
3. (Extremal) The set $\Sigma^+$ contains only those elements which can be
constructed by a finite number of applications of clauses 1 and 2.
The set $\Sigma^+$ includes strings of length 1, 2, 3, ... and is therefore an infinite set.
Note, however, that no string in $\Sigma^+$ contains an infinite number of symbols; this
is ruled out by the extremal clause of the definition.

†We will not distinguish between the symbol $a \in \Sigma$ and the word over $\Sigma$ which consists of
the single symbol a. These two objects are not the same, but the distinction is generally not an
important one for our purposes.

Example
If $\Sigma = \{a, b\}$, then $\Sigma^+ = \{a, b, aa, ab, ba, bb, aaa, aab, \ldots\}$. #
The set of all finite strings of symbols from the alphabet $\Sigma$ is denoted by $\Sigma^*$. The
set $\Sigma^*$ includes the empty string and can be defined as $\Sigma^* = \Sigma^+ \cup \{\Lambda\}$, or it can
be defined inductively.

Definition 2.5.2: Let $\Sigma$ be an alphabet. Then $\Sigma^*$ is defined as follows:
1. (Basis) $\Lambda \in \Sigma^*$.
2. (Induction) If $x \in \Sigma^*$ and $a \in \Sigma$, then $ax \in \Sigma^*$.
3. (Extremal) Nothing is an element of the set $\Sigma^*$ unless it can be
constructed with a finite number of applications of clauses 1 and 2.

Examples
(a) If $\Sigma = \{a, b\}$, then $\Sigma^* = \{\Lambda, a, b, aa, ab, ba, bb, aaa, aab, \ldots\}$.
(b) If $\Sigma = \{0, 1\}$, then $\Sigma^*$ is the set of all finite binary sequences, including the
empty sequence. #
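Since every string is obtained by finitely many applications of the inductive clause, the strings can be generated level by level, where level k holds the strings of length k. The Python sketch below does this up to a chosen length; the function name strings_up_to and the cutoff max_len are our assumptions, and Python's empty string plays the role of the empty string of the definition.

```python
def strings_up_to(alphabet, max_len):
    """All strings over alphabet of length at most max_len, shortest first."""
    level = [""]                  # basis: the empty string
    result = [""]
    for _ in range(max_len):
        # induction: if x is a string and a is a symbol, then ax is a string
        level = [a + x for x in level for a in alphabet]
        result.extend(level)
    return result

sigma_star = strings_up_to("ab", 2)
sigma_plus = [s for s in sigma_star if s != ""]   # nonempty strings only
print(sigma_star)
print(sigma_plus)
```

With k symbols in the alphabet there are k to the power m strings of length m, so the list grows quickly as max_len increases.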

An expression or formula which makes sense in some mathematical discourse
is often referred to as a well-formed formula, or wff. Inductive definitions are used
to characterize the set of well-formed formulas whenever a careful definition is
required. Many examples occur in programming languages; for example, inductive
definitions can be used to describe the class of algebraic expressions which may
appear in an assignment statement or the class of logical expressions which may
appear in a conditional statement. In some programming languages such as
ALGOL, the syntax is largely described by means of inductive definitions given
in BNF (Backus-Naur Form, or Backus Normal Form). A description of BNF
is beyond our scope; the reader is referred to the description of ALGOL 60 given
in Rosen [1967].

Examples
(a) The set of arithmetic expressions includes sequences of symbols such as
“((5 + 6)/2)” and “((4/2) - 13)” but does not include sequences such as
“+ 6 +”, and “+) (”, even though all these expressions are sequences of
symbols from the same alphabet. We will illustrate how to define the set of
well-formed arithmetic expressions by means of an inductive definition. For
simplicity we will restrict our definition to the set of arithmetic expressions
involving only integers, the unary operations of + and -, and the binary
operations of +, -, / and *.
1. (Basis) If $D = \{0, 1, 2, 3, 4, 5, 6, 7, 8, 9\}$ and $x \in D^+$, then x is an
arithmetic expression.
2. (Induction) If x and y are arithmetic expressions, then
(i) (+ x) is an arithmetic expression,
(ii) (- x) is an arithmetic expression,
(iii) (x + y) is an arithmetic expression,
(iv) (x - y) is an arithmetic expression,
(v) (x/y) is an arithmetic expression, and
(vi) (x * y) is an arithmetic expression.
3. (Extremal) A sequence of symbols is an arithmetic expression if and only
if it can be obtained by a finite number of applications of clauses 1 and 2.
The set of arithmetic expressions characterized by this definition includes
346, 0000, (-64), (3 + 7), (3 * (-61)), and (+(-(+(6/7)))).
(b) The set of propositional forms is another set which is most naturally defined
inductively. Let $V = \{P, Q, R, \ldots\}$ be a set of propositional variables, where
V does not contain any of the following symbols: (, ), $\land$, $\lor$, $\Rightarrow$, $\Leftrightarrow$, $\neg$, 0, 1.
Then
1. (Basis) 0 is a propositional form.
1 is a propositional form.
If $x \in V$, then x is a propositional form.
2. (Induction) If E and F are propositional forms, then
$(\neg E)$,
$(E \lor F)$,
$(E \land F)$,
$(E \Rightarrow F)$, and
$(E \Leftrightarrow F)$ are all propositional forms.
3. (Extremal) The set of propositional forms is the set of all expressions
which can be formed by a finite number of applications of clauses 1 and 2.
Using this definition, if $V = \{P, Q, R\}$, then $((P \land Q) \Rightarrow R)$ is a propositional
form over V. This can be established as follows: From the basis clause, it
follows that P, Q, and R are all propositional forms. Applying the induction clause
to P and Q, it follows that $(P \land Q)$ is a propositional form, and by another
application of the inductive clause, this time to $(P \land Q)$ and R, it follows that
$((P \land Q) \Rightarrow R)$ is a propositional form. Thus one can show that an element is a
member of an inductively defined set by exhibiting a sequence of applications of
the basis and inductive steps which produces the element in question. #
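Searching for such a sequence of applications can be automated. As an illustration, the Python sketch below tests membership in the set of arithmetic expressions of example (a). It is one possible recognizer and makes assumptions beyond the text: it scans the string left to right by index, ignores spaces, writes the minus sign as the ASCII character "-", and uses the helper names parse and is_arith, which are ours.

```python
DIGITS = set("0123456789")

def parse(s, i=0):
    """Return the index just past one arithmetic expression starting at s[i], or -1."""
    if i < len(s) and s[i] in DIGITS:              # basis: a nonempty string of digits
        while i < len(s) and s[i] in DIGITS:
            i += 1
        return i
    if i < len(s) and s[i] == "(":                 # inductive clause: parenthesized forms
        i += 1
        if i < len(s) and s[i] in "+-":            # unary forms (+ x) and (- x)
            j = parse(s, i + 1)
        else:                                      # binary forms (x op y)
            j = parse(s, i)
            if j == -1 or j >= len(s) or s[j] not in "+-*/":
                return -1
            j = parse(s, j + 1)
        if j != -1 and j < len(s) and s[j] == ")":
            return j + 1
    return -1                                      # extremal: nothing else qualifies

def is_arith(s):
    """True if s, with spaces removed, is an arithmetic expression."""
    s = s.replace(" ", "")
    return parse(s) == len(s)

for candidate in ["346", "(3 * (-61))", "(+(-(+(6/7))))", "+ 6 +", "+) ("]:
    print(repr(candidate), is_arith(candidate))
```

Each branch of parse corresponds to one clause of the definition, which is what makes the recursive formulation natural here.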

Recursive Procedures

Inductive definitions form a subclass of a more general class known as recursive
definitions. As the term is commonly used in computer science, the salient
characteristic of a recursive definition is “self-reference” as in the induction clause
of an inductive definition. As we use the terms,† not all recursive definitions are
inductive; we will give examples to illustrate the difference in a later chapter.
In programming, a recursive procedure, or recursive subroutine, is one which
can call itself, either directly or indirectly. Recursive procedures are based on
recursive definitions, although the definition need not be of a set. If a recursive
procedure is based on an inductive definition, the segments of the procedure often
correspond in a natural way to the basis and induction clauses of the definition.
It is often necessary to write procedures to determine whether an input has
a specified property. If the set of elements which have the property is defined
inductively, a recursive procedure is a natural and powerful mechanism for deter-
mining set membership.

Examples
(a) Consider the universe I, and let E be the set of nonnegative even integers
defined inductively in the first example of this section. The recursive procedure
EVEN(n) given in Fig. 2.5.1 returns “yes” if an input $n \in I$ is an element of
the set E; otherwise it returns “no.” The procedure has three parts. The first
part causes “no” to be returned if the input is negative; this part of the
procedure does not correspond to any part of the inductive definition of E. The
second part of the procedure tests if $n = 0$; this corresponds to the basis clause
of the definition of E. The third part corresponds to the inductive clause of the
definition and causes EVEN to call itself with the parameter $n - 2$.

procedure EVEN(n):
   comment: If n is even and n ≥ 0, then return “yes.”
            Otherwise, return “no.”
   if n < 0 then return “no”
   else
      if n = 0 then return “yes”
      else
         return EVEN(n − 2)

Fig. 2.5.1 Recursive procedure EVEN to determine if n is a
nonnegative even integer
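
The procedure of Fig. 2.5.1 can be transcribed almost line for line into a modern language. The following is a minimal Python sketch, not part of the original text; the function name `even` and the use of the strings "yes" and "no" as return values are choices made here to mirror the figure.

```python
def even(n: int) -> str:
    """Return "yes" if n is a nonnegative even integer, otherwise "no"."""
    if n < 0:            # guard with no counterpart in the inductive definition of E
        return "no"
    if n == 0:           # basis clause: 0 is an element of E
        return "yes"
    return even(n - 2)   # induction clause: n is in E exactly when n - 2 is


print([even(n) for n in (-4, 0, 1, 6, 7)])
```

Each call reduces n by 2, so the recursion depth grows linearly with the input; the sketch is meant for small values only.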

(b) Consider the problem of recognizing whether a string of symbols is an arithmetic
expression, where the set of arithmetic expressions is defined inductively
in part (a) of the preceding example of this section. A recursive procedure
ARITH(exp) based on this definition is given in Fig. 2.5.2. This procedure
returns “yes” if the input expression exp is generated by the inductive definition
of arithmetic expressions; otherwise, the procedure returns “no.” The
procedure first checks to see if exp is generated by the basis clause, that is, if exp is a

†A distinct but related meaning of the term “recursive” is used in mathematical logic and the
theory of computable functions, but a discussion of the relationship between the two uses is beyond
our scope. We will only use the term in the informal sense described above.

procedure ARITH(exp):
comment: If exp is an arithmetic expression, then return “yes.”
         Otherwise return “no.”
begin
   comment: Determine if exp is generated by the basis clause.
   if exp is a string of digits then return “yes”
   else begin
      comment: Determine if exp is generated by the inductive clause.
      if exp contains a substring exp_1 such that either exp = (+exp_1)
            or exp = (−exp_1)
         then return ARITH(exp_1)
      else if exp contains substrings exp_1 and exp_2 such that
            exp = (exp_1 □ exp_2)
            where □ is an operation symbol (+, −, / or *)
            and ARITH(exp_1) = “yes”
            and ARITH(exp_2) = “yes”
         then return “yes”
   end;
   comment: exp is not produced by either basis or inductive clauses.
   return “no”
end

Fig. 2.5.2 Recursive procedure ARITH to determine whether a
string of symbols is an arithmetic expression

string of digits. If so, the procedure returns “yes.” If exp is not a string of digits,
then ARITH breaks exp into nonoverlapping substrings to determine if exp is
generated from other arithmetic expressions by the induction clause. If this is
not the case, ARITH concludes that exp is not an arithmetic expression and
returns “no.” #

When a recursive procedure to decide if an element is in a set is based on an
inductive definition of the set, it is necessary to provide a mechanism for returning
a negative answer. The procedure given in Fig. 2.5.1 contains a test for a negative
input. Without this test, the procedure would return “yes” if the input was
nonnegative and even, but would not terminate for other inputs. The procedure
ARITH of Fig. 2.5.2 must determine if the input can be broken into substrings of
operands and operators. All the possibilities can be considered by exhaustive test,
and if none are successful, the procedure returns “no.” (In fact, there are much
faster ways of determining this information than by exhaustive testing; our examples
are suitably illustrative, but they are not efficient algorithms.)
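
The exhaustive test performed by ARITH can be sketched in Python as follows. This is an illustration only, under stated assumptions: the definition of arithmetic expressions is taken to be the one implied by Fig. 2.5.2 (digit strings, (+e), (-e), and (e1 op e2) with op one of +, -, /, *), the ASCII hyphen stands for the minus sign, blanks are not permitted inside expressions, and the function returns a Boolean rather than “yes” or “no.” The name `arith` is hypothetical.

```python
OPERATORS = "+-/*"


def arith(exp: str) -> bool:
    """Return True if exp is generated by the inductive definition assumed above."""
    # Basis clause: a nonempty string of digits.
    if exp != "" and all(ch in "0123456789" for ch in exp):
        return True
    # Every other arithmetic expression is enclosed in parentheses.
    if len(exp) < 3 or exp[0] != "(" or exp[-1] != ")":
        return False
    inner = exp[1:-1]
    # Induction clause, unary case: (+exp_1) or (-exp_1).
    if inner[0] in "+-" and arith(inner[1:]):
        return True
    # Induction clause, binary case: try every split inner = exp_1 op exp_2.
    for i in range(1, len(inner) - 1):
        if inner[i] in OPERATORS and arith(inner[:i]) and arith(inner[i + 1:]):
            return True
    # exp is produced by neither the basis nor the induction clause.
    return False


print(arith("((1+2)*3)"), arith("(1+)"))
```

As the text notes, trying every split point is illustrative rather than efficient; the negative answer is produced only after all candidate decompositions have failed.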

Inductive Proofs

Inductive definitions not only provide a method of defining infinite sets, but
they also form the basis of some powerful techniques for proving theorems. If
a set is finite, a statement of the form ∀xP(x) can in principle be established by an
exhaustive proof by cases. But for infinite sets, some other device must be used.
Proofs by induction are proofs of universally quantified assertions where the
universe of discourse is an inductively defined set.
Suppose we wish to establish that all the elements of an inductively defined set S
have a property P; i.e., we wish to establish ∀xP(x) for the universe S. A proof by
induction usually consists of two parts corresponding to the basis and induction
clauses of the definition of S:
1. The basis step establishes that P(x) is true for every element x ∈ S
specified in the basis clause of the definition of S.
2. The induction step establishes that each element constructed using the
induction clause of the definition of S has the property P if all the elements
used in its construction have the property P.
Note that there is no step in an inductive proof which corresponds to the
extremal clause of the definition of S, but its role is crucial to proofs by induction.
The extremal clause guarantees that all elements of S can be constructed using
only the basis and induction clauses of the definition. An inductive proof establishes
that every element x constructed in this way has some property P. It follows from
the extremal clause that the assertion P(x) holds for all elements of S, and we can
therefore conclude ∀xP(x).
To illustrate the technique of inductive proof, consider the set of well-formed,
or balanced, strings of parentheses. (For clarity, we will represent parentheses by
square brackets.)

Definition 2.5.3: Let Σ be the alphabet {[, ]}. The set B of well-formed
parenthesis strings is the subset of Σ* such that
1. (Basis) [ ] is an element of B.
2. (Induction) If x and y are elements of B, then
(i) [x] is an element of B, and
(ii) xy is an element of B.
3. (Extremal) The set B consists of all symbol strings which can be constructed
using a finite number of applications of clauses 1 and 2.
The set B is the set of all parenthesis sequences which can occur in algebraic
formulas, such as [ ], [[ ]], [ ][ ], [[ ]][ ], and [[ ][ ]]. We now show that in any
well-formed parenthesis string, the number of left parentheses is equal to the
number of right parentheses.

Theorem 2.5.1: Let x be an element of B. If L(x) denotes the number of left
parentheses in x and R(x) denotes the number of right parentheses in x, then
L(x) = R(x).
Proof: The theorem asserts ∀x[x ∈ B ⇒ L(x) = R(x)]. The proof follows
the definition of B.
Let x be an arbitrary element of B.
1. (Basis) If x = [ ], then L(x) = R(x) = 1.
2. (Induction) Let x and y be elements of B, and suppose they have the
property that L(x) = R(x) and L(y) = R(y). We show that any element z
which can be constructed from x and y has the property L(z) = R(z).
(i) If z = [x], then L(z) = L(x) + 1 = R(x) + 1 = R(z).
(ii) If z = xy, then L(z) = L(x) + L(y) = R(x) + R(y) = R(z).
This completes the inductive proof and establishes the theorem. ∎
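
As an informal check of Theorem 2.5.1 (not a substitute for the proof), one can generate members of B mechanically from Definition 2.5.3 and count brackets. The sketch below is an assumption-laden illustration: the function name `generate_b` and the idea of bounding the construction by a number of “rounds” are introduced here and do not appear in the text.

```python
def generate_b(rounds: int) -> set[str]:
    """Elements of B obtainable with at most `rounds` rounds of the induction clause."""
    b = {"[]"}                                   # basis clause
    for _ in range(rounds):
        wrapped = {"[" + x + "]" for x in b}     # clause 2(i): [x]
        joined = {x + y for x in b for y in b}   # clause 2(ii): xy
        b = b | wrapped | joined
    return b


strings = generate_b(2)
# L(x) = R(x) for every generated string, as the theorem asserts.
assert all(s.count("[") == s.count("]") for s in strings)
print(sorted(strings, key=len)[:5])
```

Only finitely many elements are examined, so the check illustrates the theorem on small cases; the extremal clause and the inductive proof are what extend the conclusion to all of B.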

Most commonly, proofs by induction deal with the natural numbers. In order
to discuss these proofs, it will be useful to have the following inductive charac-
terization of N.
1. (Basis) 0 ∈ N.
2. (Induction) If n ∈ N, then (n + 1) ∈ N.
3. (Extremal) If S ⊂ N, and S has the properties
(i) 0 ∈ S,
(ii) For every n ∈ N, if n ∈ S then (n + 1) ∈ S,
then S = N.
In fact, this does not suffice to define the natural numbers because we have
not carefully specified what is meant by the basis and inductive steps; we will
present a proper definition of N in the next section. However, the above charac-
terization will enable us to discuss inductive proofs for the universe N. The extre-
mal clause in the above characterization of N is the form customarily used in
definitions of the natural numbers; it is called the First Principle of Mathematical
Induction. This form of the extremal clause implies the procedure to be used for
inductive proofs of assertions of the form ∀xP(x) for the universe of natural
numbers. Such a proof proceeds as follows:
1. (Basis) We first show that P(0) is true, using whatever proof technique is
appropriate.
2. (Induction) We next show ∀n[P(n) ⇒ P(n + 1)].
The inductive step of the proof is usually a direct proof of the implication
P(n) ⇒ P(n + 1), where the implication is established for arbitrary n ∈ N. The
assertion P(n) is known as the induction hypothesis. The induction hypothesis is often stated
as “Assume P(n) is true for arbitrary n ∈ N.” Note that this is not equivalent to
assuming the truth of the theorem; P(n) is assumed only for the purpose of proving
the universally quantified assertion ∀n[P(n) ⇒ P(n + 1)]. Once P(n) ⇒ P(n + 1)
has been proven for arbitrary n, it follows (by the rule of inference known as
Universal Generalization) that ∀n[P(n) ⇒ P(n + 1)]. Then from the First Principle
of Mathematical Induction we can conclude ∀xP(x). For suppose S is the subset
of N consisting of those n for which P(n) is true. The basis step of the proof establishes
that 0 ∈ S. The inductive step establishes that for every n ∈ N, if n ∈ S, then
(n + 1) ∈ S. By the extremal clause of the definition of N, it follows that S = N,
i.e., ∀xP(x).
To illustrate proofs by induction over N, we will prove the following.

Theorem 2.5.2: For all n ∈ N,

$$\sum_{i=0}^{n} i = \frac{n(n+1)}{2}.$$

The theorem is of the form ∀nP(n), where P(n) is the assertion

$$\sum_{i=0}^{n} i = \frac{n(n+1)}{2}.$$

Proof:
1. We first establish the basis step P(0):

$$\sum_{i=0}^{0} i = \frac{0(0+1)}{2}.$$

The proof consists simply of evaluating each side, giving 0 = 0.
2. The induction step establishes ∀n[P(n) ⇒ P(n + 1)]. To prove this
assertion, we give a direct proof of the assertion P(n) ⇒ P(n + 1) for
arbitrary n ∈ N. In a direct proof of P(n) ⇒ P(n + 1), the induction
hypothesis P(n) is assumed to be true. P(n) asserts

$$\sum_{i=0}^{n} i = \frac{n(n+1)}{2}.$$

We wish to show P(n + 1), i.e.,

$$\sum_{i=0}^{n+1} i = \frac{(n+1)(n+2)}{2}.$$

But,

$$\begin{aligned}
\sum_{i=0}^{n+1} i &= (n+1) + \sum_{i=0}^{n} i \\
&= (n+1) + \frac{n(n+1)}{2} \qquad \text{(by the induction hypothesis)} \\
&= \frac{2(n+1) + n(n+1)}{2} \\
&= \frac{(n+1)(n+2)}{2}.
\end{aligned}$$

Since n was arbitrary, it follows that ∀n[P(n) ⇒ P(n + 1)]. By the First
Principle of Mathematical Induction we conclude that ∀xP(x). ∎

The following theorem gives algebraic expressions for two more finite sums
which will occur in Chapter 5 when we treat the analysis of algorithms. The proofs
are by induction and are left as exercises.

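Theorem 2.5.3: Let r be a real number. Then for all n ∈ N,

(a) $$\sum_{i=0}^{n} r^i = \begin{cases} n+1 & \text{if } r = 1, \\[4pt] \dfrac{r^{n+1} - 1}{r - 1} & \text{if } r \neq 1. \end{cases}$$

(b) $$\sum_{i=0}^{n} i\,r^i = \begin{cases} \dfrac{n(n+1)}{2} & \text{if } r = 1, \\[4pt] \dfrac{n r^{n+2} - (n+1) r^{n+1} + r}{(r-1)^2} & \text{if } r \neq 1. \end{cases}$$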
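
Before attempting the inductive proofs, the closed forms of Theorem 2.5.3 can be spot-checked numerically. The following Python sketch is an illustration only; the function names `geometric_sum` and `weighted_sum` are hypothetical, and exact rational arithmetic (`fractions.Fraction`) is used so that the comparison is not disturbed by rounding.

```python
from fractions import Fraction


def geometric_sum(r: Fraction, n: int) -> Fraction:
    """Closed form (a) for the sum of r**i, i = 0..n."""
    if r == 1:
        return Fraction(n + 1)
    return (r ** (n + 1) - 1) / (r - 1)


def weighted_sum(r: Fraction, n: int) -> Fraction:
    """Closed form (b) for the sum of i * r**i, i = 0..n."""
    if r == 1:
        return Fraction(n * (n + 1), 2)
    return (n * r ** (n + 2) - (n + 1) * r ** (n + 1) + r) / (r - 1) ** 2


for r in (Fraction(1), Fraction(2), Fraction(-3), Fraction(1, 2)):
    for n in range(8):
        assert geometric_sum(r, n) == sum(r ** i for i in range(n + 1))
        assert weighted_sum(r, n) == sum(i * r ** i for i in range(n + 1))
print("closed forms agree with direct summation")
```

Agreement on finitely many values of r and n is evidence, not proof; the case r = 1 of part (b) is exactly the statement of Theorem 2.5.2.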
In many proofs by induction, the assertions involve properties of the natural
numbers only indirectly. The following theorem is an example; we wish to prove
an assertion relating finite sets to their power sets, but the natural technique is an
inductive proof.

Theorem 2.5.4: If S is a finite set with n elements, then S has 2ⁿ distinct
subsets.
Proof: The assertion in logical notation is the following:

∀n ∀S[S has n elements ⇒ 𝒫(S) has 2ⁿ elements].

The inductive proof has two parts.
1. (Basis) We must show ∀S[S has 0 elements ⇒ 𝒫(S) has 1 element]. Let
S be an arbitrary set with 0 elements. Then S = ∅ and it follows that
𝒫(S) = {∅}. Since 2⁰ = 1, the assertion is established for n = 0.
2. (Induction) We must show that if the assertion is true for sets
with n elements, then it is true for sets with (n + 1) elements. Let
S = {a₁, a₂, a₃, ..., aₙ}, and S′ = S ∪ {a} where a ∉ S. If A is the set
{B ∪ {a} | B ∈ 𝒫(S)}, then 𝒫(S′) = 𝒫(S) ∪ A, since every subset of S′
is either a subset of S or is formed by adding the element a to a subset
of S. By the induction hypothesis, 𝒫(S) has 2ⁿ elements. Since each subset
of S corresponds to exactly one element of A, it follows that A also has 2ⁿ
elements. Since 𝒫(S) and A are disjoint and 𝒫(S′) = 𝒫(S) ∪ A, it follows
that 𝒫(S′) has 2ⁿ + 2ⁿ = 2·2ⁿ = 2ⁿ⁺¹ elements. ∎
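
The induction step of this proof is constructive, and it translates directly into a recursive procedure for listing a power set. The Python sketch below is an illustration under stated assumptions: the name `power_set` is hypothetical, the input is a list of distinct hashable items, and subsets are represented as `frozenset` values.

```python
def power_set(elements: list) -> list[frozenset]:
    """All subsets of the distinct items in `elements`, built as in the proof."""
    if not elements:                              # basis: the power set of ∅ is {∅}
        return [frozenset()]
    *rest, a = elements                           # S' = S ∪ {a}, with a not in S
    without_a = power_set(rest)                   # the power set of S
    with_a = [b | {a} for b in without_a]         # A = {B ∪ {a} | B a subset of S}
    return without_a + with_a                     # disjoint union of the two parts


for n in range(6):
    assert len(power_set(list(range(n)))) == 2 ** n
print(power_set(["x", "y"]))
```

The two halves of the returned list correspond to 𝒫(S) and A in the proof, which is why the length doubles with each added element.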

Often sets which have been inductively defined are used as a base for other
inductive definitions. Such “secondary” inductive definitions require no extremal
clause because the extremal clause of the underlying set fulfills the appropriate
function.

Example
The following is an inductive definition of the exponential aⁿ for nonnegative
integer values of n. The underlying inductively defined set is N.

Definition 2.5.4: Let a ∈ R+ and n ∈ N. The value of aⁿ is defined inductively
as follows:
1. (Basis) a⁰ = 1.
2. (Induction) aⁿ⁺¹ = aⁿ·a.
The inductive definition can be used to establish the following:

Theorem 2.5.5: ∀m ∀n[aᵐaⁿ = aᵐ⁺ⁿ]

Although the above assertion involves two universal quantifiers, it can be proved
by induction by letting m be arbitrary and proving the assertion ∀n[aᵐaⁿ = aᵐ⁺ⁿ]
by induction on n. Since m was arbitrary, the theorem will follow by universal
generalization.
Proof: Let m be arbitrary.
1. (Basis) If n = 0, then

$$a^m a^n = a^m a^0 = a^m (1) = a^m = a^{m+0} = a^{m+n}.$$

2. (Induction) Assume aᵐaⁿ = aᵐ⁺ⁿ for arbitrary n. Then

$$\begin{aligned}
a^m a^{n+1} &= a^m (a^n a) && \text{Definition of } a^{n+1} \\
&= (a^m a^n) a && \text{Associativity of multiplication} \\
&= (a^{m+n}) a && \text{Induction hypothesis} \\
&= a^{(m+n)+1} && \text{Definition of } a^{(m+n)+1} \\
&= a^{m+(n+1)} && \text{Associativity of addition.}
\end{aligned}$$

∎ #
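
Definition 2.5.4 can be read directly as a recursive procedure. The following Python sketch is offered only as an illustration; the name `power` is hypothetical, and the spot-check of Theorem 2.5.5 uses an integer base so that the arithmetic is exact.

```python
def power(a, n: int):
    """Compute a**n for n in N by following Definition 2.5.4."""
    if n == 0:
        return 1                  # basis: a^0 = 1
    return power(a, n - 1) * a    # induction: a^(n+1) = a^n * a


# Spot-check Theorem 2.5.5 on a few values of m and n.
for m in range(6):
    for n in range(6):
        assert power(3, m) * power(3, n) == power(3, m + n)
print(power(2, 10))
```

The procedure needs no separate mechanism for rejecting inputs because, as noted above, the definition rests on N and n is assumed to be a natural number.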
The principle of mathematical induction is, in fact, a rule of inference for the
universe of the natural numbers. Using the notation of Section 1.4, the formal
presentation of the rule is the following.

P(0)
∀n[P(n) ⇒ P(n + 1)]
∴ ∀xP(x)
We often wish to prove that a predicate P holds for all x ≥ k for some integer
k. A proof by induction is still appropriate but the basis step must be changed to
prove P(k). The rule of inference is then

P(k)
∀n[P(n) ⇒ P(n + 1)]
∴ ∀x[(x ≥ k) ⇒ P(x)]

Thus to prove that P(x) holds for all integers equal to or greater than k, it
suffices to show P(k) is true as the basis step, and then show the inductive step
∀n[P(n) ⇒ P(n + 1)].
Another form of proof by induction over the natural numbers uses the
Second Principle of Mathematical Induction to prove assertions of the form ∀xP(x).
The induction step of a proof using the Second Principle assumes P(k) is true for
all k < n and shows that this implies P(n). The formal statement of the Second
Principle as a rule of inference is the following.

∀n[∀k[k < n ⇒ P(k)] ⇒ P(n)]
∴ ∀xP(x)

The induction hypothesis for a proof using this rule of inference is
∀k[k < n ⇒ P(k)];
from this hypothesis, we must establish P(n). If P(n) can be shown on the
assumption that the induction hypothesis holds, then we can conclude ∀xP(x).
Note that if n = 0, the assertion k < 0 is false for every k ∈ N, and therefore
the implication k < 0 ⇒ P(k) is true. It follows that ∀k[k < 0 ⇒ P(k)] is true and
hence ∀k[k < 0 ⇒ P(k)] ⇒ P(0) is equivalent to P(0). Thus the basis step of the
First Principle is implied by the hypothesis of the Second Principle.
An application of the Second Principle only requires that we establish a single
hypothesis, but this often requires a proof by cases. This most commonly is in the
form of proving the special case P(0), and then proving that for any n > 0, if
P(k) holds for all k < n, then P(n) holds. Such a proof using the Second Principle
differs from one using the First Principle only in that instead of assuming an
induction hypothesis of P(n − 1) to prove P(n), we assume that P(k) is true for
all k < n.
Proofs by induction using the Second Principle assume a stronger induction
hypothesis than proofs using the First Principle. The Second Principle is a natural
choice for inductive proofs in which the properties of elements generated in the
(n + 1)th step may depend on the properties of elements generated in several
previous steps.
Although the two principles of mathematical induction are different, if the
universe of discourse is the natural numbers N, their hypotheses are logically
equivalent and they are therefore equally powerful. Other universes exist where
the Second Principle is in fact more powerful; we will see an example when we
treat order relations.

Example
We use the Second Principle of Mathematical Induction to prove that all
integers n ≥ 2 can be written as a product of prime numbers. The induction
hypothesis asserts that for arbitrary n,
For every k such that 2 ≤ k < n, k can be written as a product of prime
numbers.
On the basis of this assumption we must show that n can be written as a product of
primes.
The proof is by cases.
Case 1: If n is a prime, then n is such a product of one prime.
Case 2: If n is not a prime, then n = ab, where 2 ≤ a, b < n. By the induction
hypothesis, both a and b can be written as products of primes and therefore their
product can be written in this form. #
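
The two cases of this proof correspond to the two branches of a recursive factoring procedure, in which the recursive calls on a and b play the role of the induction hypothesis. The Python sketch below is an illustration only; the name `prime_factors` is hypothetical, and trial division is used simply because it is the most direct way to find a factorization n = ab.

```python
import math


def prime_factors(n: int) -> list[int]:
    """Return a list of primes whose product is n, for an integer n >= 2."""
    if n < 2:
        raise ValueError("n must be at least 2")
    for a in range(2, math.isqrt(n) + 1):
        if n % a == 0:
            # Case 2: n = a * b with 2 <= a, b < n. Both factors are smaller
            # than n, so the recursive calls mirror the induction hypothesis.
            return prime_factors(a) + prime_factors(n // a)
    return [n]  # Case 1: n is prime, a product of one prime.


print(prime_factors(360))
```

Because each recursive call may be made on any smaller integer, not just n − 1, the termination argument for the procedure is itself an application of the Second Principle.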

Problems: Section 2.5

1. Give inductive definitions for the following sets.
(a) The set of unsigned integers in decimal representation. The defined set should
include 4, 167, 0012, etc.
(b) The set of real numbers with terminating fractional parts in decimal
representation. The defined set should include 6.1, 712., 01.2100, 0.190, etc.
(c) The set of even integers in binary representation without leading zeroes. The
defined set should include 0, 110, 1010, etc.
2. Integer arithmetic operations with one nonnegative operand can often be defined
inductively in terms of more “primitive” operations. Thus, the product of two
integers can be defined as follows:
a · 0 = 0,
a · (b + 1) = a · b + a for b ≥ 0.
(a) Write a recursive procedure based on this definition which calculates the
product of two integers where the second is known to be nonnegative.
(b) Give an inductive definition of aᵇ (exponentiation) using only multiplication
and addition. Assume a and b are integers and b ≥ 0. Write a recursive
procedure to calculate aᵇ based on your definition.
3. Give an inductive definition of n! and use it to prove the identity

$$n! = \prod_{i=1}^{n} i$$

when n ≥ 1.
4. Prove by induction that (1 + 2 + 3 + ··· + n)² = 1³ + 2³ + 3³ + ··· + n³ for
all n ∈ I+.
5. Let a be a positive number. Prove

∀m ∀n[(aᵐ)ⁿ = aᵐⁿ] where m, n ∈ N.

6. Prove each of the following relationships for all n ∈ N.
(a) $\sum_{i=0}^{n} i^2 = n(n+1)(2n+1)/6$
(b) $\sum_{i=0}^{n} (2i+1) = (n+1)^2$
(c) $\sum_{i=0}^{n} i\,(i!) = (n+1)! - 1$
(d) 1 + 2n ≤ 3ⁿ
7. Prove Theorem 2.5.3.
8. A polygon is convex if every line joining two points of the polygon lies within the
polygon. Prove that the sum of the interior angles of a convex polygon with n
sides is equal to (n − 2)180° for all n ≥ 3. (Hint: If n > 3, the polygon can be divided
into two parts by connecting nonadjacent vertices.)
9. Find predicates P and Q over the natural numbers which will establish that the basis
step and the induction step of an inductive proof are independent, i.e., neither
logically implies the other. Specifically, find a predicate P such that P(0) is true
and ∀n[P(n) ⇒ P(n + 1)] is false and a predicate Q such that Q(0) is false and
∀n[Q(n) ⇒ Q(n + 1)] is true.
10. What is wrong with the following proof that all people are the same size? We purport
to prove that for all n and for all S, if S is a set with n people, then all people in S
are the same size.
1. (Basis) Let S be an empty set of people. Then for all x and y, if x ∈ S and
y ∈ S, then x is the same size as y.
2. (Induction) Assume the assertion is true for all sets containing n people. We
show it is true for sets containing n + 1 people. Any set consisting of n + 1
people contains two nonequal subsets of n people which must overlap. Denote
these sets by S′ and S″. Then by induction hypothesis, all people in S′ are the
same size and all people in S″ are the same size. Since S′ and S″ overlap, all
people in S = S′ ∪ S″ are the same size.
11. Let {A₁, A₂, ..., Aₙ} be a nonempty collection of sets. Prove the following
generalizations of DeMorgan’s Laws by induction on n.

(a) $\overline{\bigcup_{i=1}^{n} A_i} = \bigcap_{i=1}^{n} \overline{A_i}$
(b) $\overline{\bigcap_{i=1}^{n} A_i} = \bigcup_{i=1}^{n} \overline{A_i}$
12. A binary operation □ is said to be associative if a □ (b □ c) = (a □ b) □ c. From
this “associative law” we infer a much stronger result, namely that in any expression
involving only the operation □, the placement of parentheses does not affect the
result, that is, only the operands and the order in which they occur in the expression
are important. In order to prove this “generalized associative law,” we define the “set
of □ expressions” as follows:
1. (Basis) A single operand aᵢ is a □ expression.
2. (Induction) Let e₁ and e₂ be □ expressions. Then (e₁ □ e₂) is a □ expression.
3. (Extremal) There are no □ expressions other than those which can be constructed
from 1 and 2 in a finite number of steps.
The generalized associative law can now be stated as follows:
Let e be a □ expression with n operands a₁, a₂, ..., aₙ which appear in that
order in the expression e. Then

e = (a₁ □ (a₂ □ (a₃ □ (··· (aₙ₋₁ □ aₙ) ···)))).
Prove this generalized associative law. (Hint: Use the Second Principle of
Mathematical Induction.)

#2.6 THE NATURAL NUMBERS

In this section, we will exhibit a careful set theoretic definition of the natural
numbers. In the previous section, we used the operation of addition to give an
inductive characterization of N. Since the definition of addition of natural numbers
must be based on the set N, the characterization we gave is circular and hence
unacceptable as a formal definition of N. To avoid this circularity, N must be
defined without using addition. The following is a better (but not yet successful)
characterization of N which uses n’ to denote the “successor” of a natural number
n; informally, we interpret n’ as n + 1.
1. (Basis) 0 ∈ N.
2. (Induction) If n ∈ N, then n′ ∈ N.
3. (Extremal) If S ⊂ N and S satisfies clauses 1 and 2, then S = N.

The inadequacy of the above characterization stems from our not having
specified exactly what is meant either by 0 in the basis step or by n′ (which must
be defined in terms of n) in the inductive step. As a result, models can be
constructed which satisfy the inductive characterization given above, but do not have
the structure of N. The structure we want to characterize can be diagrammed as
follows:

0 → 0′ → 0″ → 0‴ → ···

where a → b means b is a successor of a; in the diagram, 0′ represents 1,
0″ represents 2, etc. If we can find a model of the above inductive characterization
of N which has a different structure, then we will have established the inadequacy
of the characterization as a definition of N. The simplest “unintended”
model is formed by making 0 its own successor, i.e., 0 = 0′. In this model, the
set N is the singleton set {0} and the structure is diagrammed as follows:

[Diagram: a single node 0 with an arrow leading from 0 back to itself.]
In order to rule out such a model, the set N must be defined so as to guarantee that
0 is not the successor of any natural number. This change alone is not sufficient,
however. Let N be the set of nodes of an “infinite rooted binary tree.” The root
denotes 0, and each natural number has a successor; in fact, it has two successors.
This unintended model can be represented as follows:

[Diagram: an infinite rooted binary tree whose root is 0; every node has two successors.]

Consequently, an adequate characterization of N must guarantee that the successor
of a natural number is unique. Even with this condition satisfied, however, it is still
possible to construct models which do not have the intended structure. In the
following diagram, 0 is not a successor of any natural number, and every natural
number has a unique successor. However, two distinct natural numbers 1 and 3
have the same successor.

[Diagram: 0 → 0′ → 0″ → 0‴, with a further arrow leading from 0‴ back to 0″.]

To rule out such models, the definition of N must guarantee that if x′ = y′ then
x = y, that is, a natural number can have at most one predecessor.
A definition of N which satisfies all of these constraints can be constructed
using set theory. Each natural number will be a set. The first natural number is
defined to be ∅, changing the basis step to
1. (Basis) ∅ is a natural number.
For each natural number n, its successor, n′, is constructed as follows.
2. (Induction) If n is a natural number, then n ∪ {n} is a natural number.
The extremal step remains unchanged. The result is the following definition.

Definition 2.6.1: The set of natural numbers N is the set such that

1. (Basis) ∅ ∈ N,
2. (Induction) If n ∈ N, then n ∪ {n} ∈ N,
3. (Extremal) If S ⊂ N and S satisfies clauses 1 and 2, then S = N.

The set of natural numbers, according to this definition, has as its elements the
sets ∅, {∅}, {∅, {∅}}, {∅, {∅}, {∅, {∅}}}, ... which we denote by the numerals 0, 1,
2, 3, ... Many of the familiar properties of the natural numbers can now be
established, including the following theorems. (The proofs can be found in Chapter
1 of Cohn [1965].)
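
The set-theoretic construction can be imitated with Python's immutable `frozenset` type, which allows sets to be members of other sets. The sketch below is an illustration only; the names `successor` and `natural` are hypothetical, and only finitely many naturals can of course be built this way.

```python
def successor(n: frozenset) -> frozenset:
    """n' = n ∪ {n}, the induction clause of Definition 2.6.1."""
    return n | frozenset({n})


def natural(k: int) -> frozenset:
    """The set that Definition 2.6.1 assigns to the numeral k."""
    n = frozenset()            # basis: the empty set plays the role of 0
    for _ in range(k):
        n = successor(n)
    return n


three = natural(3)
assert len(three) == 3                       # the set denoting 3 has 3 elements
assert natural(2) in three                   # each smaller natural is a member
assert successor(natural(4)) == natural(5)
print(three)
```

In this model the successor of a set is determined by the set itself, and distinct sets built this way have distinct successors, which is the content of Theorems 2.6.2 and 2.6.3 below.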

Theorem 2.6.1: 0 is not the successor of any natural number.

Theorem 2.6.2: The successor to any natural number is unique.

Theorem 2.6.3: If n′ = m′, then n = m.

If these theorems are added as axioms to the inadequate inductive characterization
of N given at the beginning of this section, we obtain the well-known
Peano Postulates for the natural numbers. These postulates, which characterize the
natural numbers without using sets, can be stated as follows:
(a) 0 is a natural number.
(b) For each natural number n, there exists exactly one natural number n′,
which we call the successor of n.
(c) 0 is not the successor of any natural number.
(d) If n′ = m′, then n = m.
(e) If S is a subset of N, such that
(i) 0 ∈ S,
(ii) if n ∈ S, then n′ ∈ S,
then S = N.

Problems: Section 2.6

1. Construct a series of models for the axiom systems obtained from the Peano
postulates by deleting each of the axioms a through e in turn. None of the models should
have the structure of the natural numbers.
2. The definition we have given of the natural numbers only involves the notion of
“successor.” Relations such as “less than” and operations such as addition and
multiplication must be defined in terms of the concept of “successor.” For example, the
operation of addition can be defined inductively as follows:
1. For every integer m, m + 0 = m.
2. For every pair of integers m and n, m + n′ = (m + n)′.
(a) Show (using the above definition) that addition is associative.
(b) Define multiplication inductively in an analogous manner. You can use the
(previously defined) operation of addition.
(c) Define exponentiation inductively, using the operation of multiplication.
(d) Give an inductive definition of the relation “less than.”
3. Construct an alternate model of N using sets. The alternate model need not have the
property that the set which denotes the number k has k elements.

2.7 SET OPERATIONS ON Σ*

Strings of symbols play an important role in computer science. Computer
programs, texts of written documents, mathematical formulas, and theorems in a
formal system are all objects which we conventionally represent as finite sequences of
symbols. Thus, in order to write programs that operate on other programs, text
editing programs, programs which manipulate algebraic formulas and programs
which prove theorems, we must have tools for handling individual strings and
sets of strings.
Throughout this text, the symbol Σ will denote a finite alphabet and Σ* the
set of all strings of finite length with symbols from Σ. The principal operation on
elements of Σ* is concatenation.

Definition 2.7.1: Let Σ be an alphabet and x and y be elements of Σ*.
If x = a₁a₂...aₘ and y = b₁b₂...bₙ where aᵢ, bᵢ ∈ Σ and m, n ∈ N,
then the concatenation of x with y, denoted x·y, or simply xy, is the string
xy = a₁a₂...aₘb₁b₂...bₙ. If x = Λ, then xy = y for every y; similarly if y = Λ,
then xy = x.

The following is a convenient notation for representing the concatenation of
a string to itself n times. This inductive definition is based on a definition of N and
therefore requires no extremal clause.

Definition 2.7.2: Let x be an element of Σ*. For each n ∈ N, the string
xⁿ is defined as follows:
1. x⁰ = Λ,
2. xⁿ⁺¹ = xⁿx.

Examples
(a) If Σ = {a, b} and x = ab, then x⁰ = Λ, x¹ = ab, x² = abab, and
x³ = ababab.
(b) The set {aⁿbⁿ | n ≥ 0} denotes the set {Λ, ab, aabb, aaabbb, ...}. #
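
Definition 2.7.2 and the two examples can be reproduced with ordinary string concatenation. The following Python sketch is illustrative; the name `string_power` is hypothetical, and the empty Python string `""` is assumed to stand for the empty string Λ.

```python
def string_power(x: str, n: int) -> str:
    """x^n by Definition 2.7.2; the empty Python string stands for Λ."""
    if n == 0:
        return ""                          # clause 1: x^0 = Λ
    return string_power(x, n - 1) + x      # clause 2: x^(n+1) = x^n x


assert [string_power("ab", n) for n in range(4)] == ["", "ab", "abab", "ababab"]
# The first few members of the set {a^n b^n | n >= 0}.
print([string_power("a", n) + string_power("b", n) for n in range(4)])
```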

We often wish to treat collections of strings rather than individual strings.


For example, in programming language specification, we must characterize the
entire set of programs which can be written in a language. Similarly, a compiler
must be written so that it can handle all programs written in the language. Because
of the importance of such sets, a considerable body of terminology and notation
has been developed to deal with them.

Definition 2.7.3: Let Σ be a finite alphabet. A language over Σ is a subset
of Σ*.

Examples
(a) The set {a, ab, abb} is a language over Σ = {a, b}.
(b) The set of strings consisting of sequences of a’s followed by sequences of b’s,
{aⁿbᵐ | n, m ∈ N}, is a language over {a, b}.
(c) The set of ALGOL programs is a language over the alphabet consisting of the
ALGOL character set. #

Since every language is a set, the usual collection of set operations introduced
earlier in this chapter can be applied to languages. However, because they are
collections of strings, other important operations on languages can be defined as
well, many of which are based on the operation of concatenation. The principal
goal of this section is to introduce these operations on languages and describe some
of their properties. These operations are important in a variety of application areas
as well as for the study of models of computation.

Definition 2.7.4: Let A and B be languages over Σ. The set product of A with
B, denoted A·B, or simply AB, is the language AB = {xy | x ∈ A ∧ y ∈ B}.

The language AB consists of all strings which are formed by concatenating an
element of A with an element of B.

Example
Let Σ = {a, b}, A = {Λ, a, ab} and B = {a, bb}. Then
AB = {a, bb, aa, abb, aba, abbb},
BA = {a, aa, aab, bb, bba, bbab}.
Note that, in general, AB ≠ BA; i.e., the operation of set product is not
commutative. #
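
The set product is a one-line set comprehension in Python, and the example above can be checked mechanically. This sketch is an illustration only; the name `set_product` is hypothetical and `""` is assumed to represent Λ.

```python
def set_product(a: set[str], b: set[str]) -> set[str]:
    """AB = {xy | x in A and y in B}; "" stands for the empty string Λ."""
    return {x + y for x in a for y in b}


A = {"", "a", "ab"}
B = {"a", "bb"}
assert set_product(A, B) == {"a", "bb", "aa", "abb", "aba", "abbb"}
assert set_product(B, A) == {"a", "aa", "aab", "bb", "bba", "bbab"}
assert set_product(A, B) != set_product(B, A)    # set product is not commutative
print(sorted(set_product(A, B)))
```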

Theorem 2.7.1: Let A, B, C, and D be arbitrary languages over Σ. The
following relations hold.
(a) A∅ = ∅A = ∅
(b) A{Λ} = {Λ}A = A
(c) (AB)C = A(BC)
(d) If A ⊂ B and C ⊂ D, then AC ⊂ BD
(e) A(B ∪ C) = AB ∪ AC
(f) (B ∪ C)A = BA ∪ CA
(g) A(B ∩ C) ⊂ AB ∩ AC
(h) (B ∩ C)A ⊂ BA ∩ CA
Proof:
(a) (A∅ = ∅A = ∅.) By definition, A∅ = {xy | x ∈ A ∧ y ∈ ∅}. But for
every y ∈ Σ*, y ∈ ∅ is false and therefore the conjunction x ∈ A ∧ y ∈ ∅
is false for all x and y. Since no values of x and y satisfy the predicate, the
set A∅ has no members, that is, A∅ = ∅. A similar proof establishes the
identity ∅A = ∅.
(d) (If A ⊂ B and C ⊂ D, then AC ⊂ BD.) The proof is direct. Assume
A ⊂ B ∧ C ⊂ D, and let z be an arbitrary element of AC. Then z = xy,
where x ∈ A and y ∈ C. Since A ⊂ B and C ⊂ D, it follows that x ∈ B
and y ∈ D. Hence, z = xy ∈ BD. Since z was an arbitrary element of
AC, it follows that AC ⊂ BD.
(e) (A(B ∪ C) = AB ∪ AC.)
(i) AB ∪ AC ⊂ A(B ∪ C): We first apply part (d) by noting that
A ⊂ A, B ⊂ B ∪ C and C ⊂ B ∪ C. Therefore, AB ⊂ A(B ∪ C)
and AC ⊂ A(B ∪ C). Hence, AB ∪ AC ⊂ A(B ∪ C).
(ii) A(B ∪ C) ⊂ AB ∪ AC: If z is an element of A(B ∪ C), then
z = xy where x ∈ A and y ∈ B ∪ C. Hence, either (x ∈ A ∧ y ∈ B)
or (x ∈ A ∧ y ∈ C). It follows that z ∈ AB or z ∈ AC, and therefore
z ∈ AB ∪ AC.
We leave the remaining parts of the proof as exercises. ∎
Note that the operation of set product does not distribute over intersection.
For example, if A = {a, aa}, B = {a} and C = {aa}, then AB ∩ AC = {aaa} but
A(B ∩ C) = ∅.
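
The contrast between parts (e) and (g) can be confirmed on this counterexample with a few lines of Python. The sketch is illustrative only; the helper name `set_product` is hypothetical, and Python's `|`, `&`, and `<=` operators are used for union, intersection, and containment.

```python
def set_product(a: set[str], b: set[str]) -> set[str]:
    """AB = {xy | x in A and y in B}."""
    return {x + y for x in a for y in b}


A, B, C = {"a", "aa"}, {"a"}, {"aa"}

# Part (e): set product distributes over union.
assert set_product(A, B | C) == set_product(A, B) | set_product(A, C)

# Part (g): for intersection only containment holds, and here it is proper.
left = set_product(A, B & C)
right = set_product(A, B) & set_product(A, C)
assert left == set() and right == {"aaa"} and left <= right
print(left, right)
```

The string aaa arises in two different ways, as a·aa and as aa·a, which is why it lies in both AB and AC without any single second factor lying in B ∩ C.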
Definition 2.7.5: Let A be a language over X. The language A” is defined
inductively as follows:
1. A® = {A},
2. At! = A*.A, forn EN. .
The language A” is the set product of A with itself n times. Therefore, if z © A”
form > 1, then z = w,w,...w,, where w, € A for each i from 1 ton.

Example
Let Σ = {a, b} and A = {Λ, a, ab}. Then A⁰ = {Λ}, A¹ = A = {Λ, a, ab}, and
A² = A·A = {Λ, a, aa, aab, ab, aba, abab}. #
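
The inductive definition of Aⁿ translates directly into a loop. The sketch
below is illustrative only: it assumes finite languages stored as Python sets
of str, with "" standing for Λ, and the function names are hypothetical.

```python
def set_product(A, B):
    """Return AB = {xy | x in A and y in B}."""
    return {x + y for x in A for y in B}

def power(A, n):
    """Return A^n using A^0 = {Λ} and A^(n+1) = A^n · A."""
    result = {""}              # A^0 = {Λ}
    for _ in range(n):
        result = set_product(result, A)
    return result

A = {"", "a", "ab"}
for n in range(3):
    print(n, sorted(power(A, n)))
```

The same function can be used to spot-check an identity such as
A²A³ = A⁵ on small examples, although a finite check is of course not a proof.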

Theorem 2.7.2: Let A and B be subsets of Σ* and let m and n be arbitrary
elements of N. Then
(a) AᵐAⁿ = Aᵐ⁺ⁿ
(b) (Aᵐ)ⁿ = Aᵐⁿ
(c) A ⊆ B ⇒ Aⁿ ⊆ Bⁿ
Proof: The proofs of parts (a) and (b) are left as exercises. The proof of
part (c) is by induction on n:
1. (Basis) Since A⁰ = {Λ} and B⁰ = {Λ}, it follows that Aⁿ ⊆ Bⁿ if n = 0.
2. (Induction) We wish to prove that for all n, if Aⁿ ⊆ Bⁿ, then Aⁿ⁺¹ ⊆ Bⁿ⁺¹.
By Theorem 2.7.1(d), if Aⁿ ⊆ Bⁿ and A ⊆ B, then Aⁿ·A ⊆ Bⁿ·B, i.e.,
Aⁿ⁺¹ ⊆ Bⁿ⁺¹. ∎

We have used the notation Σ* to denote the set of all finite strings formed by
concatenating elements of Σ. This notation can be extended in a natural way to
any subset of Σ*. We use the symbols “*” and “⁺” to denote unary operations
(called closure operations) on languages.

Definition 2.7.6: Let A be a subset of Σ*. Then the set A* (read “A star”)
is defined to be
A* = ⋃_{n ∈ N} Aⁿ,
i.e., A* = A⁰ ∪ A¹ ∪ A² ∪ A³ ∪ ···
= {Λ} ∪ A ∪ A² ∪ A³ ∪ ···
The set A* is often called the star closure, Kleene closure, or simply the closure of A.
The set A⁺ (read “A plus”) is defined to be
A⁺ = ⋃_{n ≥ 1} Aⁿ,
i.e., A⁺ = A¹ ∪ A² ∪ A³ ∪ ···
The set A⁺ is often called the positive closure of A.
Note that x ∈ A⁺ if and only if x ∈ Aⁿ for some positive n ∈ N, and x ∈ A* if
and only if x ∈ Aⁿ for some arbitrary n ∈ N.

Examples
(a) If A = {a}, then
A⁺ = {a} ∪ {aa} ∪ {aaa} ∪ ···
= {aⁿ | n ≥ 1};
A* = {Λ} ∪ A⁺
= {aⁿ | n ≥ 0}.

(b) ∅* = {Λ} ∪ ∅ ∪ ∅² ∪ ∅³ ∪ ···
= {Λ};
∅⁺ = ∅. #
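
Since A* and A⁺ are usually infinite, a program can only enumerate them up to
some length bound. The following sketch does that for finite A; the bound
max_len and the function names are assumptions made for illustration. The
positive closure is computed as A*A, an identity stated in Theorem 2.7.3(h).

```python
def set_product(A, B):
    """Return AB = {xy | x in A and y in B}."""
    return {x + y for x in A for y in B}

def star_up_to(A, max_len):
    """Strings of A* whose length is at most max_len."""
    result = {""}        # A^0 = {Λ}
    frontier = {""}      # strings found in the most recent round
    while frontier:
        longer = {w for w in set_product(frontier, A) if len(w) <= max_len}
        frontier = longer - result
        result |= frontier
    return result

def plus_up_to(A, max_len):
    """Strings of A+ whose length is at most max_len, using A+ = A*A."""
    return {w for w in set_product(star_up_to(A, max_len), A)
            if len(w) <= max_len}

print(sorted(star_up_to({"a"}, 3), key=len))
print(sorted(plus_up_to({"a"}, 3), key=len))
print(star_up_to(set(), 3), plus_up_to(set(), 3))
```

The loop stops because each round must contribute at least one new string of
length at most max_len, and only finitely many such strings exist.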
The following theorem characterizes some important properties of the
language closure operations.

Theorem 2.7.3: Let A and B be languages over Σ and let n ∈ N. Then the
following relationships hold.
(a) A* = {Λ} ∪ A⁺
(b) Aⁿ ⊆ A* for n ≥ 0
(c) Aⁿ ⊆ A⁺ for n ≥ 1
(d) A ⊆ AB*
(e) A ⊆ B*A
(f) (A ⊆ B) ⇒ (A* ⊆ B*)
(g) (A ⊆ B) ⇒ (A⁺ ⊆ B⁺)
(h) AA* = A*A = A⁺
(i) Λ ∈ A ⇔ A⁺ = A*
(j) (A*)* = A*A* = A*
(k) (A*)⁺ = (A⁺)* = A*
(l) A*A⁺ = A⁺A* = A⁺
(m) (A*B*)* = (A ∪ B)* = (A* ∪ B*)*
Proof: Parts (a), (b), and (c) are immediate from the definition of Aⁿ, A⁺,
and A*.
(d) (A ⊆ AB*.) By part (a), B* = {Λ} ∪ B⁺. Therefore, AB* = A({Λ} ∪ B⁺)
= A ∪ AB⁺ which contains A. A similar proof establishes (e).
(f) (A ⊆ B ⇒ A* ⊆ B*.) If x ∈ A*, then x ∈ Aⁿ for some n ≥ 0. But
A ⊆ B so by Theorem 2.7.2, Aⁿ ⊆ Bⁿ. Therefore, x ∈ Bⁿ and from part
(b) it follows that x ∈ B*. A similar argument holds for part (g).
(h) We show only A*A = A⁺. An intuitively appealing argument can be
constructed by noting A* = A⁰ ∪ A¹ ∪ A² ∪ A³ ∪ ··· and therefore
A*A = (A⁰ ∪ A¹ ∪ A² ∪ A³ ∪ ···)A
= A⁰A ∪ A¹A ∪ A²A ∪ ···
= A¹ ∪ A² ∪ A³ ∪ ···
= A⁺.
The preceding argument, while valid, uses the fact that set product
distributes over infinite unions, which we have not proved. The following
alternative argument does not use this fact.
x ∈ A*A ⇔ x = yz for some y ∈ A* and z ∈ A
⇔ x = yz for some y ∈ Aⁿ and z ∈ A and n ∈ N
⇔ x ∈ AⁿA for some n ∈ N
⇔ x ∈ Aⁿ⁺¹ for some n ∈ N
⇔ x ∈ Aᵐ for m ∈ I+, where m = n + 1
⇔ x ∈ A⁺
(m) We show only (A*B*)* = (A ∪ B)*.
(i) (A*B*)* ⊆ (A ∪ B)*:
A ⊆ A ∪ B and therefore A* ⊆ (A ∪ B)*; similarly,
B* ⊆ (A ∪ B)*. It follows that A*B* ⊆ (A ∪ B)*(A ∪ B)*. From
part (j), it follows that A*B* ⊆ (A ∪ B)* and so (again applying
part (j)), (A*B*)* ⊆ ((A ∪ B)*)* = (A ∪ B)*.
(ii) (A ∪ B)* ⊆ (A*B*)*: From part (b), A ⊆ A*, and from (d),
A* ⊆ A*B*; hence A ⊆ A*B*. Similarly B ⊆ A*B*. Therefore,
A ∪ B ⊆ A*B*, and by part (f), (A ∪ B)* ⊆ (A*B*)*.
The remaining parts of the proof are left as exercises. ∎

The following theorem, due to Dean Arden, has many important applications in
the study of finite automata and formal languages.

Theorem 2.7.4: Let A and B be arbitrary subsets of Σ* such that Λ ∉ A.
Then the equation X = AX ∪ B has the unique solution X = A*B.

Although the theorem may initially appear difficult to interpret, careful
consideration of the assertion can make the result quite intuitive. We are given
a language X such that X ⊇ B and X ⊇ AX. What can X consist of? Since X ⊇ B,
we can substitute B for X in the right side of X ⊇ AX and conclude that X ⊇ AB.
Repeating the substitution, we can conclude X ⊇ AAB, X ⊇ AAAB, etc., and in
general X ⊇ AⁿB. Thus X ⊇ A*B. Now consider a string x ∈ X. Since X =
AX ∪ B, and all strings in A are nonempty, it follows that either x ∈ B, or else x
has a nonempty prefix such that the prefix is in A and removal of the prefix yields
another (shorter) string in X. By the same reasoning, this shorter string has the
same property; either it is in B or we can remove another nonempty prefix and
obtain another string in X. Since the original string was of finite length, after
stripping off a sufficient number of nonempty prefixes we will eventually obtain
a string in B. It follows that the original string must have consisted of a (possibly
empty) sequence of prefixes, each of which is in A, followed by a suffix which is
in B. Thus the original string must have been a member of A*B. The following
proof of the theorem is a formalization of these arguments.
Proof: Let X denote an arbitrary solution to the equation. We will show
X = A*B.
(a) We show X ⊇ A*B by establishing that if X is a solution, then X ⊇ AⁿB
for all n ∈ N.
1. (Basis) For n = 0, Aⁿ = {Λ}, and A⁰B = B. Since X ⊇ B, it follows
that X ⊇ A⁰B.

2. (Induction) Assume X ⊇ AⁿB. Since X ⊇ AX, it follows that
X ⊇ A(AⁿB) = Aⁿ⁺¹B.
This completes the inductive proof that X ⊇ AⁿB for all n ∈ N. It is
left as an exercise to show that A*B = ⋃_{i ∈ N} AⁱB. Hence, X ⊇ A*B.
(b) We show X ⊆ A*B using the Second Principle of Mathematical
Induction on the length of strings in Σ*. We wish to show that if x ∈ X, then
x ∈ A*B. The induction hypothesis asserts that every string shorter
than x has this property. Let ‖x‖ denote the length of x ∈ Σ*. Then the
induction hypothesis is the following quantified implication.
∀w[‖w‖ < ‖x‖ ⇒ [w ∈ X ⇒ w ∈ A*B]]
We use this hypothesis in a direct proof that if x ∈ X then x ∈ A*B.
Since X = AX ∪ B, if x ∈ X then either x ∈ AX or x ∈ B.
Case 1: If x ∈ B, then x ∈ A*B.
Case 2: Suppose x ∈ AX. Then x = yz where y ∈ A and z ∈ X. But
Λ ∉ A so y ≠ Λ and hence ‖z‖ < ‖x‖. By the induction
hypothesis, it follows that z ∈ A*B. Thus x = yz ∈ AA*B ⊆ A*B.
This completes the inductive proof that if x ∈ X then x ∈ A*B, and
establishes that X ⊆ A*B.
Parts (a) and (b) of the proof establish that if X is any solution to X = AX ∪ B,
then X = A*B. However, the proof of the theorem is not yet complete, since
we have not shown that a solution always exists. We leave it to the reader to show
that X = A*B is a solution to the equation X = AX ∪ B. ∎

Examples
(a) If A = {a} and B = ∅, then the equation X = AX ∪ B has the unique
solution X = A*B = ∅.
(b) If A = {a, ab} and B = {cc}, then the equation X = AX ∪ B has the
solution X = {a, ab}*{cc}. #
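
The claim that X = A*B satisfies X = AX ∪ B can be spot-checked by machine if
every language is truncated to strings of at most some fixed length. The
sketch below does this for example (b); it is only a bounded sanity check
under that truncation, not a proof, and all function names and the bound 6 are
assumptions introduced for illustration. The truncation is sound here because
Λ ∉ A, so every string of AX of length at most the bound comes from a
string of X that is strictly shorter.

```python
def set_product(A, B):
    """Return AB = {xy | x in A and y in B}."""
    return {x + y for x in A for y in B}

def truncate(S, max_len):
    """Keep only the strings of S with length at most max_len."""
    return {w for w in S if len(w) <= max_len}

def star_up_to(A, max_len):
    """Strings of A* with length at most max_len."""
    result, frontier = {""}, {""}
    while frontier:
        frontier = truncate(set_product(frontier, A), max_len) - result
        result |= frontier
    return result

def arden_solution_up_to(A, B, max_len):
    """Strings of A*B with length at most max_len."""
    return truncate(set_product(star_up_to(A, max_len), B), max_len)

def satisfies_equation_up_to(X, A, B, max_len):
    """Check X = AX ∪ B on strings of length at most max_len."""
    return X == truncate(set_product(A, X) | B, max_len)

A, B = {"a", "ab"}, {"cc"}
X = arden_solution_up_to(A, B, 6)
print(sorted(X, key=lambda w: (len(w), w)))
print(satisfies_equation_up_to(X, A, B, 6))
```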

Problems: Section 2.7

1. Let A = {Λ, a}, B = {ab}. List the elements of the following sets.
(a) A?
(b) BS
(c) AB
(d) A*
(e) B*
2. Let A, B, and C be languages over Σ. Prove the following relationships.
(a) A(BC) = (AB)C
(b) AᵐAⁿ = Aᵐ⁺ⁿ for all m, n ≥ 0. (This implies that {Λ}A = A{Λ} = A.)
(c) (Aᵐ)ⁿ = Aᵐⁿ for all m, n ≥ 0
3. Let A and B be languages such that A? = B. Does it follow that A = B? Prove
your assertion.

4. While A* = A⁺ ∪ {Λ}, it is not generally true that A⁺ = A* − {Λ}. For Σ = {a},
find the smallest set A such that A⁺ ≠ A* − {Λ}.
5. (a) Prove that the operation of set product distributes over infinite union, i.e., show
that
A(⋃_{i ∈ N} Bᵢ) = ⋃_{i ∈ N} (ABᵢ).
A similar proof can be used to show the other distributive law,
(⋃_{i ∈ N} Bᵢ)A = ⋃_{i ∈ N} (BᵢA).
(b) Prove that
A*B = ⋃_{i ∈ N} AⁱB.
6. Let A and B be arbitrary languages over Σ. Prove the following.
(a) (A*)* = A*
(b) Λ ∈ A ⇔ A⁺ = A*
(©) (4%) =a"
(d) A*A⁺ = A⁺
(e) (A*B*)* = (A* ∪ B*)*
7. Show that if A ≠ ∅ and A² = A, then A* = A.
8. Let A, B, and C be languages over Σ. Determine which of the following assertions
are true and give counterexamples for those that are false.
(a) (A*)" = (4")* for anya e N
(b) (AB)* = (BA)*
(c) (A —B)C = AC — BC
(d) A* coc B*>AcB
(ec) (A*B*)* = (B*A*)*
(f) AUBUCc A*BtCt
(g) (At)* = At
(h) (A)* = (A*), where B = X* — B
@) (AB)*A = A(BA)*
G) (A*B)*A* = (A* U B*)*
(k) At = AtAt
9. Let E₁, E₂, ..., Eₙ be subsets of Σ*. Is it always true that
(E₁ ∪ E₂ ∪ ··· ∪ Eₙ)* = (E₁*E₂* ··· Eₙ*)*?
Prove your assertion.
10. Complete the proof of Theorem 2.7.4 by showing that X = A*B is a solution to the
equation X = AX ∪ B.
11. Assume the same hypotheses on A and B as in Theorem 2.7.4. Find the solutions to
the equation X = XA ∪ B. Prove your assertion.
12. Suppose X = AX ∪ B and Λ ∈ A. Show that if C ⊇ B then X = A*C is a solution.
13. Let A = {a}, B = {b}. Using Theorem 2.7.4, find subsets X₁, X₂ of {a, b}* which
solve the following set of simultaneous set equations. (Hint: Solve for one variable
in terms of the remaining variables and then substitute.)
(a) X; = AX; U BX,

(b) X, = AX,

14. Use finite sets and set operations to characterize the following languages over
Σ = {a, b}. For example, the set of strings of even length is {aa, ab, ba, bb}*.
(a) The set of strings of odd length.
(b) The set of strings which contain exactly one occurrence of a.
(c) The set of strings which either begin with an a or end with 2 b’s or both.
(d) The set of strings which contain at least 3 consecutive a’s.
(e) The set of strings which contain the substring “bbab.”
Suggestions for Further Reading

The book by Halmos [1960] is an excellent introduction to set theory as well
as many of the mathematical topics we treat in Chapters 3, 4, and 6. Axiomatic
treatments of set theory can be found in Suppes [1960] and Monk [1969]. Wilder
[1965] discusses the set theory paradoxes and their role in the development of
axiomatic set theory.
The classical development of the natural numbers from the Peano axioms,
followed by a development of the rational, real, and complex numbers, is given by
Landau [1951]; it is an excellent introduction to formal mathematics. The work by
Knuth [1974] follows two young lovers on an uninhabited shore of the Indian Ocean
as they consider some of the same foundational questions as Landau. Knuth’s
book is readable and it conveys the spirit of how one goes about doing
mathematics; the reader also learns something about the natural numbers.
The first use of Backus-Naur Form for describing the syntax of a programming
language occurs in the Revised Report on the Algorithmic Language—ALGOL 60,
which is reprinted in Rosen [1967]. This notation is often used in presenting
context-free grammars; the reader is referred to Aho and Ullman [1972].
3

BINARY RELATIONS

3.0 INTRODUCTION

Relations characterize structure. In the last chapter we studied sets and their
elements. In this section we will study some basic forms of structure which can be
represented by relationships between elements of sets. Relations are of fundamental
importance to both the theory and applications areas of computer science. A com-
posite data structure, such as an array, list, or tree, is generally used to represent
a set of data objects together with a relation which holds between members of the
set. Relations which are a part of a mathematical model are often implicitly rep-
resented by relations within a data structure. Numerical applications, information
retrieval, and network problems are examples of application areas where rela-
tions occur as a part of the problem description, and manipulation of the relations
is important in solution procedures. Relations also play an important role in the
theory of computation, including program structure and analysis of algorithms.
In this chapter we will develop some of the fundamental tools and concepts asso-
ciated with relations.

3.1 BINARY RELATIONS AND DIGRAPHS

The mathematical concept of relation is based on the common notion of
relationships among objects. Some relations describe comparisons between elements
of a set: one box is heavier than another, one man is richer than another, one
event occurred prior to another, etc. Other relations involve elements of different
sets, such as “x lives in y” where x is a human and y is a city, “x is owned by y”
where x is a building and y is a corporation, or “x was born in the country y in
the year z.”


The examples we have given are all relationships between either two or three
objects, but in principle we can describe relationships which hold for n objects,
where n is any positive integer. When making an assertion that a relationship holds
among n objects, it is often necessary to specify not only the objects themselves
but also an ordering of the objects; for example, only the relative positions of 6
and 4 differ in the two assertions “6 < 4” and “4 < 6”, yet one assertion is false
and the other is true. We will use “ordered n-tuples of elements” to specify a finite
sequence of not necessarily distinct objects; the relative positions of the objects in
the sequence will provide the necessary ordering of the objects.

Definition 3.1.1: For n > 0, an ordered n-tuple (or simply n-tuple) with ith
component aᵢ is a sequence of n objects denoted by <a₁, a₂, a₃, ..., aₙ>. Two
ordered n-tuples are equal if and only if their ith components are equal for all
i, 1 ≤ i ≤ n. If n = 2 or n = 3, an ordered n-tuple is called an ordered pair or an
ordered triple respectively.

We often wish to treat collections of n-tuples where the ith component of
each n-tuple is an element of some set Aᵢ. The set of all such n-tuples is defined as
follows:

Definition 3.1.2: Let {A₁, A₂, A₃, ..., Aₙ} be an indexed collection of sets
with indices from 1 to n, where n > 0. The cartesian product, or cross product of
the sets A₁ through Aₙ, denoted by A₁ × A₂ × ··· × Aₙ or ⨉ᵢ₌₁ⁿ Aᵢ, is the set
of n-tuples {<a₁, a₂, ..., aₙ> | aᵢ ∈ Aᵢ}.† When Aᵢ = A for all i, then ⨉ᵢ₌₁ⁿ Aᵢ will
be denoted by Aⁿ.

Examples
Let A = {1, 2}, B = {m, n}, C = {0} and D = ∅. Then
(a) A × B = {<1, m>, <1, n>, <2, m>, <2, n>},
(b) A × C = {<1, 0>, <2, 0>},
(c) A × D = ∅.
When A and B are sets of real numbers, then A × B can be represented as
a set of points in the cartesian plane. For example, let A = {x | 1 ≤ x ≤ 2} and
B = {y | 0 ≤ y ≤ 1}. Then
(d) A × B = {<x, y> | 1 ≤ x ≤ 2 ∧ 0 ≤ y ≤ 1}, and
(e) B × A = {<y, x> | 1 ≤ x ≤ 2 ∧ 0 ≤ y ≤ 1}.
†Associativity is sometimes an annoying problem when treating cartesian products.
Definition 3.1.2 distinguishes between the sets A₁ × A₂ × A₃, (A₁ × A₂) × A₃, and
A₁ × (A₂ × A₃) because the elements of these sets are of the forms <a₁, a₂, a₃>,
<<a₁, a₂>, a₃>, and <a₁, <a₂, a₃>> respectively. These distinctions are sometimes
important, but we will usually wish to use the set A₁ × A₂ × A₃. We will therefore
treat the binary operation of cartesian product as though it were associative,
unless specific mention is made to the contrary.

These relations are represented by the shaded areas in the following diagrams.

[Figure: two plots of the cartesian plane with shaded regions, labeled A × B and B × A]

Let A₁ = {1, 2}, A₂ = {a, b} and A₃ = {x, y}.

(f) ⨉ᵢ₌₁³ Aᵢ = {<1, a, x>, <1, a, y>, <1, b, x>,
<1, b, y>, <2, a, x>, <2, a, y>, <2, b, x>, <2, b, y>}.

(g) A₃² = A₃ × A₃ = {<x, x>, <x, y>, <y, x>, <y, y>}. #

The preceding examples show that the operation of binary cartesian product
is not commutative, i.e., it is generally not true that A x B= B x A. The fol-
lowing theorem establishes that the operation of binary cartesian product dis-
tributes over union and intersection.

Theorem 3.1.1: If A, B and C are sets, then
(a) A × (B ∪ C) = (A × B) ∪ (A × C),
(b) A × (B ∩ C) = (A × B) ∩ (A × C),
(c) (A ∪ B) × C = (A × C) ∪ (B × C),
(d) (A ∩ B) × C = (A × C) ∩ (B × C).
Proof:
(a) The proof uses the distributivity of ∧ over ∨. Let <x, y> be an arbitrary
element of A × (B ∪ C). Then
<x, y> ∈ A × (B ∪ C) ⇔ x ∈ A ∧ y ∈ (B ∪ C)
⇔ x ∈ A ∧ (y ∈ B ∨ y ∈ C)
⇔ (x ∈ A ∧ y ∈ B) ∨ (x ∈ A ∧ y ∈ C)
⇔ <x, y> ∈ A × B ∨ <x, y> ∈ A × C
⇔ <x, y> ∈ (A × B) ∪ (A × C).
The proofs of parts (b)-(d) are left as exercises. ∎
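
The four distributive laws can be spot-checked on small finite sets. The
sketch below uses itertools.product from the Python standard library and
represents ordered pairs as tuples; the sets chosen and the helper name cross
are assumptions made for illustration, and a finite check is not a
substitute for the proof.

```python
from itertools import product

def cross(X, Y):
    """Binary cartesian product X × Y as a set of ordered pairs (tuples)."""
    return set(product(X, Y))

A, B, C, D = {1, 2}, {"m", "n"}, {"n", "p"}, {2, 3}

print(cross(A, B | C) == (cross(A, B) | cross(A, C)))   # part (a)
print(cross(A, B & C) == (cross(A, B) & cross(A, C)))   # part (b)
print(cross(A | D, C) == (cross(A, C) | cross(D, C)))   # part (c)
print(cross(A & D, C) == (cross(A, C) & cross(D, C)))   # part (d)
print(cross(A, B) == cross(B, A))                       # not commutative
```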

Definition 3.1.3: Let A₁, A₂, ..., Aₙ be sets. An n-ary relation R on ⨉ᵢ₌₁ⁿ Aᵢ
is a subset of ⨉ᵢ₌₁ⁿ Aᵢ. If R = ∅, then R is called the empty or void relation. If
R = ⨉ᵢ₌₁ⁿ Aᵢ, then R is called the universal relation. If Aᵢ = A for all i, then R is

called an n-ary relation on A. If n = 1, 2, or 3, then R is called a unary, binary,
or ternary relation, respectively.

In defining the concept of equality for relations, we require not only that
the sets of n-tuples be the same but also that the cross product supersets be the
same.

Definition 3.1.4: Let R₁ be an n-ary relation on ⨉ᵢ₌₁ⁿ Aᵢ and R₂ be an m-ary
relation on ⨉ᵢ₌₁ᵐ Bᵢ. Then R₁ = R₂ if and only if n = m, and Aᵢ = Bᵢ for all i,
1 ≤ i ≤ n, and R₁ and R₂ are equal sets of ordered n-tuples.

In practice, the indexed collection of sets {A₁, A₂, ..., Aₙ} is often left implicit
and an n-ary relation is informally referred to as a set of n-tuples.
If each set Aᵢ is finite, then there are a finite number of n-ary relations on
⨉ᵢ₌₁ⁿ Aᵢ. Recall that if S is a finite set with k elements, then 𝒫(S) has 2ᵏ elements.
Since every subset of ⨉ᵢ₌₁ⁿ Aᵢ is an n-ary relation on ⨉ᵢ₌₁ⁿ Aᵢ, if the cartesian
product of A₁ through Aₙ has k elements, then there are 2ᵏ n-ary relations on the set
⨉ᵢ₌₁ⁿ Aᵢ. But if Aᵢ has rᵢ elements, then ⨉ᵢ₌₁ⁿ Aᵢ has ∏ᵢ₌₁ⁿ rᵢ elements and hence
there are 2^(r₁·r₂·…·rₙ) n-ary relations on ⨉ᵢ₌₁ⁿ Aᵢ.
Every n-ary relation R on a set A corresponds to an n-ary predicate with A
as the universe of discourse. If the relation R is given, a corresponding predicate
P can be defined as follows:
P(a₁, a₂, ..., aₙ) is true ⇔ <a₁, a₂, ..., aₙ> ∈ R.
Conversely, a predicate P can be used to define a relation R as follows:
R = {<a₁, a₂, ..., aₙ> | P(a₁, a₂, ..., aₙ) is true}.
A unary relation consists of a set of 1-tuples and can be associated with a predicate
with a single variable or a property of some elements of A; a unary relation on
a set A is simply a subset of A.

Examples
(a) Let the universe of discourse be the set A = {1, 2, 3}. The three variable
predicate “x +y =z” on the universe A corresponds to the relation
R = {<x, y, z> | x + y = z} on A.
(b) Consider the universe N. The property “x is an even integer” can be charac-
terized by a unary predicate
P(x) <> x is even,
or a unary relation
{<x>|x is even},
or a subset
{x | x is even}. #
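
The correspondence between predicates and relations, and the count of
relations on a finite cartesian product, can both be made concrete for small
universes. In the sketch below n-tuples are Python tuples; the function names
relation_from_predicate and count_relations are hypothetical and were chosen
only for this illustration.

```python
from itertools import product
from math import prod

def relation_from_predicate(universe, arity, predicate):
    """Return {<a1, ..., an> | P(a1, ..., an) is true} over a finite universe."""
    return {t for t in product(universe, repeat=arity) if predicate(*t)}

def count_relations(sizes):
    """Number of relations on A1 × ... × An, given the sizes r1, ..., rn."""
    return 2 ** prod(sizes)

A = {1, 2, 3}
R = relation_from_predicate(A, 3, lambda x, y, z: x + y == z)
print(sorted(R))

evens = relation_from_predicate(range(6), 1, lambda x: x % 2 == 0)
print(sorted(evens))

print(count_relations([3, 3]))  # binary relations on a 3-element set
```

Note that the unary relation is a set of 1-tuples such as (0,), which is
distinct from, but in obvious correspondence with, the subset {0, 2, 4}.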
Binary Relations

The most important class of relations is the set of binary relations. Because
binary relations are referred to more frequently than others, the unqualified term

“relation” usually denotes a binary relation; where no confusion will result, we
will adopt this convention. Relations which are not binary will be specified by such
terms as “ternary” or “n-ary.”
The following definition presents some additional terminology and notation
associated with binary relations.

Definition 3.1.5: Let R be a binary relation over A × B. The set A is the
domain of R; B is the codomain. We denote <a, b> ∈ R by the infix notation aRb
and <a, b> ∉ R is denoted by aR̸b.

Examples
(a) Let L be the relation on the integers I of “less than.” Then we write 4 < 6 to
denote <4, 6> ∈ L and 6 ≮ 4 to denote <6, 4> ∉ L.
(b) Let M denote the relation “is a multiple of” for the universe N. Then 4M2 but
2M̸4. More generally, xMy if and only if x = ky for some k ∈ N. Thus for
all x, 0Mx and xM1. If p > 1, then p is prime if pMx implies that either
x = 1 or x = p. A number x is odd if xM̸2.
(c) When a compiler translates a computer program it constructs a symbol table
which contains the symbolic names which occur in the program, the attributes
associated with each name, and the program statements in which each name
occurs. Thus if S is the set of symbols, A is the set of possible attributes and P
is the set of program statements, then the symbol table includes information
which represents binary relations from S to A and S to P.
(d) Let A be a set of documents in a library, and B be a set of descriptors used to
describe the documents. Let R be the relation from A to B such that aRb
if and only if the descriptor b applies to document a. For example, if X is an
article on automatic word recognition, then <X, “pattern recognition”> and
<X, “speech processing”> might be elements of R. Such relations form a basis
for automatic document retrieval systems. The user of such a system describes
his interests by choosing a set of appropriate descriptors; the document
retrieval system uses the relation R to determine what documents in the library
are likely to be relevant to the user’s needs.
(e) Binary relations on the set of real numbers can be represented graphically in the
cartesian plane. The following is a graph of the relation {<x, y> | |x| + |y| = 1}.

[Figure: graph of the relation in the cartesian plane]

Since relations are sets, some relations can be defined inductively.



Example
The relation “less than” over the natural numbers N can be defined inductively
as follows (the corresponding “ordered pair” formulation is given on the right for
the basis and induction clauses):

1. (Basis) 0 < 1                      1. <0, 1> ∈ <
2. (Induction) If x < y, then         2. If <x, y> ∈ <, then
   (i) x < y + 1                         (i) <x, y + 1> ∈ <
   (ii) x + 1 < y + 1                    (ii) <x + 1, y + 1> ∈ <
3. (Extremal) For all x, y ∈ N, x < y only if it is required by clauses 1 and 2. #
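
An inductive definition of this kind can be read as a recipe for generating
the relation: start from the basis pair and apply the induction clauses until
nothing new appears. Because the full relation is infinite, the sketch below
restricts attention to pairs whose components do not exceed a bound; the bound
(assumed to be at least 1) and the function name are illustrative assumptions.

```python
def less_than_up_to(bound):
    """Pairs <x, y> with y <= bound generated by the basis and induction
    clauses for "less than"; bound is assumed to be at least 1."""
    relation = {(0, 1)}            # clause 1 (basis)
    pending = [(0, 1)]
    while pending:
        x, y = pending.pop()
        # clause 2: from <x, y> derive <x, y + 1> and <x + 1, y + 1>
        for pair in ((x, y + 1), (x + 1, y + 1)):
            if pair[1] <= bound and pair not in relation:
                relation.add(pair)
                pending.append(pair)
    return relation                # clause 3: nothing else is included

generated = less_than_up_to(5)
print(sorted(generated))
```

Comparing the generated set with the pairs satisfying Python's built-in <
on the same range gives a quick sanity check of the definition.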

The study of binary relations is closely related to the mathematical field of


graph theory. Graphs often provide a convenient way of viewing questions con-
cerning binary relations, and for that reason we will develop the concepts of
directed graphs in parallel with our treatment of binary relations.†

Definition 3.1.6: A directed graph or digraph is an ordered pair D = <A, R>
where A is a set and R is a binary relation on A. The set A is the set of nodes (points,
vertices) of D and the elements of R are the arcs (edges, lines) of D. The relation
R is called the incidence relation of D.

If D = <A, R> is a digraph and A is a finite set, then D is called a finite digraph.
A finite digraph <A, R> can be represented graphically by denoting the elements
of A by labeled points. An arc xRy is represented by an arrow from x to y.

[Figure: an arrow drawn from node x to node y]

We will frequently represent digraphs with such diagrams, and in fact we will
call such a diagram a digraph, even though it is only a convenient representation.

Examples
(a) Let D = <A, R>, where A = {a, b, c, d} and R = {<a, c>, <b, c>, <a, a>}. The
digraph D is represented by the following diagram.

[Figure: diagram of the digraph D; not reproduced]

†The definitions and terminology used in graph theory vary considerably among different
authors. We have chosen the nomenclature most appropriate for our purposes but the reader is
advised to be alert for differences in definitions when consulting other works.

(b) Let D = <N, R>, where the relation R consists of all integer pairs of the form
<x, x + 2>. Although N is infinite, we can represent this digraph by the
following (incomplete) diagram:

[Figure: nodes 0, 1, 2, 3, 4, ... with an arrow from each x to x + 2] #
Digraphs constitute an important class of data structures. They may be
represented in a computer memory in a variety of ways, each of which has its
particular advantages. If the vertices of the digraph are indexed from 1 to n, the
digraph can be represented by an n × n binary matrix M, called the incidence
matrix, where the entry in the ith row and jth column of M, denoted M[i, j], is 1
if there is an arc from the ith node to the jth node; otherwise M[i, j] = 0.
An alternate representation of a digraph consists of a list of ordered pairs
where <i, j> is included in the list if and only if there is an arc from node i to node j.
Still another representation is a linked list, where each node of the graph is
represented by its label and a list of pointers to the other nodes of the graph; each
pointer represents an arc. These representations, illustrated in Figure 3.1.1, are
only some of many possible ways to represent a digraph.

         1  2  3  4
    1    0  1  0  0          <1, 2>
    2    1  0  1  0          <2, 1>
    3    0  0  0  0          <2, 3>
    4    1  0  0  0          <4, 1>

    Incidence Matrix    List of Ordered Pairs    Linked List (diagram not reproduced)

Fig. 3.1.1 Some alternative representations of a directed graph
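
The three representations can be built from one another mechanically. The
sketch below starts from the list of ordered pairs shown in Figure 3.1.1 and
derives the incidence matrix and a successor-list structure; Python lists and
dicts stand in for the matrix and the linked list, and the function names are
assumptions made for this illustration.

```python
nodes = [1, 2, 3, 4]
arcs = [(1, 2), (2, 1), (2, 3), (4, 1)]   # list of ordered pairs

def incidence_matrix(nodes, arcs):
    """M[i][j] = 1 if there is an arc from the ith node to the jth node."""
    index = {node: k for k, node in enumerate(nodes)}
    matrix = [[0] * len(nodes) for _ in nodes]
    for i, j in arcs:
        matrix[index[i]][index[j]] = 1
    return matrix

def successor_lists(nodes, arcs):
    """For each node, the list of nodes its outgoing arcs point to."""
    successors = {node: [] for node in nodes}
    for i, j in arcs:
        successors[i].append(j)
    return successors

for row in incidence_matrix(nodes, arcs):
    print(row)
print(successor_lists(nodes, arcs))
```

The matrix answers "is there an arc from i to j?" in constant time but uses
n² entries, while the successor lists use space proportional to the number of
nodes plus arcs; which trade-off is preferable depends on the application.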

There is a natural association between digraphs and binary relations. If R
is a relation from A to B, the digraph associated with R is the digraph <A ∪ B, R>.
Conversely, if D = <A, R> is a digraph, the relation associated with D is the binary
relation R from A to A.

Definition 3.1.7: Let D = <A, R> be a digraph. If aRb, then the arc <a, b>
originates at a and terminates at b. An arc of the form <a, a> is called a loop. The
number of arcs which originate at a node a is called the outdegree of node a; the
number of arcs which terminate at a is called the indegree of node a.
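
Indegree and outdegree are simple counts over the incidence relation. The
sketch below uses the digraph with A = {a, b, c, d} and
R = {<a, c>, <b, c>, <a, a>} from the earlier example, with arcs stored as
Python tuples; the function names are illustrative assumptions.

```python
def outdegree(R, node):
    """Number of arcs of R which originate at node."""
    return sum(1 for (x, _) in R if x == node)

def indegree(R, node):
    """Number of arcs of R which terminate at node."""
    return sum(1 for (_, y) in R if y == node)

def loops(R):
    """Arcs of the form <a, a>."""
    return {(x, y) for (x, y) in R if x == y}

A = {"a", "b", "c", "d"}
R = {("a", "c"), ("b", "c"), ("a", "a")}
for node in sorted(A):
    print(node, indegree(R, node), outdegree(R, node))
print(loops(R))
```

A loop contributes one to both the indegree and the outdegree of its node,
which is why node a has indegree 1 here.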

Definition 3.1.8: Let D = <A, R> be a digraph with nodes a and b. An
undirected path P from a to b is a finite sequence of nodes P = <c₀, c₁, ..., cₙ>
such that
(i) c₀ = a,
(ii) cₙ = b,
(iii) For all cᵢ such that 0 ≤ i < n, either cᵢRcᵢ₊₁ or cᵢ₊₁Rcᵢ.

If cᵢRcᵢ₊₁ for all i, 0 ≤ i < n, then P is a directed path from a to b. The node a
is the initial node of P and b is the terminal node of P. The length of the path P
is n. If all the nodes of P are distinct except possibly the first and last (i.e., if
c₀, c₁, ..., cₙ₋₁ are distinct and c₁, c₂, ..., cₙ are distinct), then P is a simple
path. If c₀ = cₙ, then P is a cycle; if P is both a simple path and a cycle, then P is
a simple cycle.

If there is a path P = <c₀, c₁, ..., cₙ> of nonzero length from node a to node
b, then we can construct a simple path P' from a to b by eliminating the cycles
from P. This is done by successively replacing subsequences in P of the form
<cᵢ, cᵢ₊₁, ..., cⱼ>, where j > i and cᵢ = cⱼ, by the subsequence <cᵢ> until a simple
path results; this is the path P'. If P is directed, then P' will be a simple directed
path of nonzero length, and if P is a cycle, then P' will be a simple cycle of nonzero
length.
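
The cycle-elimination procedure just described can be sketched as a loop that
repeatedly splices out the portion of the path between two occurrences of the
same node. One detail is an assumption of this sketch rather than something
taken from the text: the pair consisting of the first and last positions is
never collapsed, so that a cycle of nonzero length is reduced to a simple
cycle of nonzero length instead of to a single node. The function name is
hypothetical.

```python
def simplify_path(path):
    """Return a simple path with the same initial and terminal nodes,
    obtained by splicing out repeated stretches of the given path."""
    path = list(path)
    changed = True
    while changed:
        changed = False
        n = len(path) - 1
        for i in range(n):
            for j in range(i + 1, n + 1):
                # Assumption: the (first, last) pair is allowed to coincide,
                # since a simple path may begin and end at the same node.
                if path[i] == path[j] and not (i == 0 and j == n):
                    path = path[:i + 1] + path[j + 1:]
                    changed = True
                    break
            if changed:
                break
    return path

print(simplify_path(["a", "c", "a", "c"]))
print(simplify_path(["a", "b", "b", "c"]))
print(simplify_path(["a", "c", "a", "c", "a"]))
```

Each splice keeps the path valid, because the node following position j in the
old path is a neighbor of the node at position i (the two positions hold the
same node), and each splice shortens the path, so the loop terminates.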

Examples
(a) Let D be the following digraph:

[Figure: a digraph D on nodes a, b, c, d; not reproduced]

Then <a, c>, <a, b, c>, <a, c, a, c> and <a, b, b, c> are directed paths from a to
c; of these, the first two are simple and the last two are not. The sequences
<c, b, d> and <c, a, b, d> are undirected paths from c to d. The sequences
<a, c, a> and <a, b, c, a> are simple cycles; <a, c, a, c, a> and <a, b, b, c, a> are
cycles but not simple. The path <a> is a simple cycle of length 0. Node a has
indegree 1 and outdegree 2; node d has indegree 1 and outdegree 0.
(b) Algorithms are often represented by flowcharts; a flowchart is a directed graph
with labeled nodes and arcs. The node labels are represented by boxes of
various shapes, together with notations written inside the boxes; the labeled
nodes represent starting points, exits, operations and tests. If a node has only
one outgoing arc, the arc is commonly left unlabeled; in the case of a test node,
outgoing arcs are labeled to indicate the results of the test, e.g., true and false,
< and >, etc. A careful characterization of the class of flowcharts would
include other constraints on the form of a flowchart graph; for example, it
would be reasonable to require that each flowchart have exactly one start node
and at least one stop node.
A computation consisting of the execution of an algorithm represented by
a flowchart corresponds to a path which begins at the start node of the
flowchart. The computation halts if the path terminates at a stop node. A proof of

correctness of the algorithm must treat every directed path from the start node
to a stop node; for this reason, proofs of correctness often take the form of
proofs by cases. #

We often wish to refer to parts of a digraph. For this purpose, we define
subdigraphs and partial subdigraphs. A subdigraph is obtained from a digraph by
taking a subset of nodes and all arcs between nodes of the subset. A partial
subdigraph also contains a subset of nodes but need only contain some of the arcs
between nodes of the subset.

Definition 3.1.9: Let D = <A, R> be a digraph.
(a) A digraph D' = <A', R'> is a subdigraph of D if
(i) A' ⊆ A,
(ii) R' = R ∩ (A' × A').
If D' ≠ D, then D' is a proper subdigraph of D.
(b) A digraph D' = <A', R'> is a partial subdigraph of D if
(i) A' ⊆ A,
(ii) R' ⊆ R ∩ (A' × A').

Examples
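If D = <A, R> is represented by

[Figure: a digraph D; not reproduced]

then the following represents a subdigraph of D with nodes {a, b, c}.

[Figure: the subdigraph of D with nodes {a, b, c}; not reproduced]

The following is a partial subdigraph but not a subdigraph of D, since the loop
<a, a> is not included.

[Figure: a partial subdigraph of D; not reproduced] #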

Definition 3.1.10: A digraph D = <A, R> is strongly connected if for every
two elements a, b ∈ A, there is a directed path from a to b and from b to a. If for

every two nodes a, b ∈ A, there is an undirected path from a to b, then D is
connected; otherwise, D is disconnected.

Example
Consider the following digraphs:

[Figure: three digraphs on nodes a and b, labeled (i), (ii), and (iii); not reproduced]

The digraph represented by (i) is disconnected, (ii) is connected but not strongly
connected, and (iii) is strongly connected. #

The components of a digraph D are the largest connected “pieces” of D; there are
no arcs between nodes of distinct components of a digraph.

Definition 3.1.11: A component of a digraph D is a connected subdigraph of
D which is not a proper subdigraph of any connected subdigraph of D.
Example
The following digraph has four components.

[Figure: a digraph with four components; not reproduced]

#
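
Connectivity and components can be computed for a finite digraph by a simple
reachability search. The sketch below is one possible approach, not one given
in the text: components are found by following arcs in either direction, and
strong connectivity is tested by checking that every node is reachable from a
chosen start node along arcs and also along reversed arcs. Components are
returned as node sets, since a subdigraph is determined by its nodes. All
function names are assumptions.

```python
def reachable(start, neighbors):
    """Set of nodes reachable from start by following neighbors[node]."""
    seen, stack = {start}, [start]
    while stack:
        node = stack.pop()
        for nxt in neighbors[node]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

def components(A, R):
    """Node sets of the components of the digraph <A, R>."""
    either = {a: set() for a in A}
    for x, y in R:
        either[x].add(y)      # arcs may be traversed in either
        either[y].add(x)      # direction on an undirected path
    remaining, result = set(A), []
    while remaining:
        piece = reachable(next(iter(remaining)), either)
        result.append(piece)
        remaining -= piece
    return result

def is_connected(A, R):
    return len(components(A, R)) <= 1

def is_strongly_connected(A, R):
    if not A:
        return True
    forward = {a: set() for a in A}
    backward = {a: set() for a in A}
    for x, y in R:
        forward[x].add(y)
        backward[y].add(x)
    start = next(iter(A))
    return reachable(start, forward) == set(A) == reachable(start, backward)

A = {"a", "b", "c", "d"}
R = {("a", "c"), ("b", "c"), ("a", "a")}
print(sorted(sorted(piece) for piece in components(A, R)))
print(is_connected(A, R), is_strongly_connected(A, R))
```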
Definition 3.1.12: Let A be a set with n elements. The complete digraph over
A is the digraph <A, A × A>, that is, A together with the universal binary relation
on A.

Example
The following diagrams are complete digraphs over sets with 1, 2, and 3 elements.

[Figure: complete digraphs on 1, 2, and 3 nodes; not reproduced]

Problems: Section 3.1

1. Let A = {0, 1, 2, 3,4}. For each of the predicates given below, specify the set of
n-tuples in the n-ary relation over A which corresponds to the predicate. For parts
(d)-(f), draw the digraph which represents the relation.
(a) Px)ox<l
(b) PQ)<3>2
(c) Px)<2> 3
(d) PX, yx <y
(e) P(x, y) <> dk[x = ky A k <2]
(f) P(x, y)<> [x =0 V 2x < 3]
(g) P(x, y,z)<>x? + y =z
2. For the following digraphs A and B,

(A)

(a) Find all simple paths from node a to node c. Give the path lengths.
(b) Find the indegree and outdegree of each node.
(c) Find all simple cycles with initial and terminal node a.
(d) Find the subdigraph containing the nodes a and c.
(e) Determine how many partial subdigraphs exist which contain only nodes a and c.
3. For each of the following, sketch a digraph of the given binary relation on A. State
whether the digraph is disconnected, connected or strongly connected, and state
how many components the digraph has.
(a) {<1, 2>, <1, 3>, <2, 4>}, where A = {1, 2, 3, 4}
(b) {<1, 2>, <3, 1>, <3, 3>}, where A = {1, 2, 3, 4}
(c) {Xx,y>|0< x <y < 3}, where A = {0, 1, 2, 3, 4}
(d) {Xx,y>|2< x, y<7 A x divides y} where A = {nln EN Ar < 10)
(e) {<x,y>|0< x — y < 3}, where A = {0, 1, 2, 3, 4}
(f) {<x, y> | x and y are relatively prime}, where A = {2, 3, 4, 5, 6}.

4. Construct the incidence matrix for the following binary relation on {0, 1, 2, 3, 4, 5, 6}:
{<x, y> | x < y ∨ x is prime}.
5. For each of the following, give an inductive definition for the relation R on N. In
each case, use your definition to show x € R.
(a) R= {a,b|a>b};x =G,1>
(b) R = {<a, b>|a = 2b}; x = <6, 3>
(c) R = {<a, b, c> | a + b = c}; x = <1, 1, 2>

6. Let A = {1, 2, 3}.


(a) List the unary relations on A.
(b) How many binary relations are there on A?

7. Let A be a set with n elements.
(a) Prove that there are 2ⁿ unary relations on A.
(b) Prove that there are 2^(n²) binary relations on A.
(c) How many ternary relations are there on A?
8. We have taken the notion of an ordered n-tuple to be primitive in the sense that we
did not define it in terms of either primitive or previously defined terms. An ordered
pair can be defined using set theoretic concepts as follows:
An ordered pair <a, b> with first element a and second element b is
the set {{a}, {a, b}}.
Note that according to this definition, <a, a> = {{a}}.
(a) Prove, using this definition, the following property of ordered pairs:

<a, b> = <c, d> if and only if a = c and b = d.

(b) Defining ordered triples is not completely straightforward. Show that the
following definition of ordered triples does not have the property for equality
specified in Definition 3.1.1.
An ordered triple <a, b, c> with first element a, second element b,
and third element c is the set {{a}, {a, b}, {a, b, c}}.

3.2 TREES

The set of digraphs known as trees represents an important class of binary relations.
Trees provide a way to represent hierarchical structures, such as a family gen-
ealogy, the administrative structure of a corporation, or a categorization of
a collection of objects into classes. We will consider a few of the many applications
of trees in computer science, including data structures and the design and analysis
of algorithms.
Trees denote a particular kind of binary relation. Because the graphical rep-
resentation is such a natural one, definitions and theorems are usually couched in
the terminology of the digraphs rather than that of the binary relations.

Definition 3.2.1: A tree is a digraph with a nonempty set of nodes such that
(i) there is exactly one node, called the root of the tree, which has indegree
0;
(ii) every node other than the root has indegree 1;
(iii) for every node a of the tree, there is a directed path from the root
to a.

We will represent trees with the root node at the top and all arcs directed down-
ward, leaving the arrowheads of the arcs implicit.

Examples
(a) The following digraphs are trees. The root of each tree is node a.

[Figure: example trees, each with root a.]

(b) The following digraphs are not trees.

[Figure: three digraphs, labeled (i), (ii), and (iii), that are not trees.]

The digraph (i) has two nodes with indegree 0.


The digraph (ii) has a node with indegree 2.
The digraph (iii) has no node with indegree 0. #
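
The three clauses of Definition 3.2.1 can be checked mechanically. The following Python sketch is an illustration only: the function name is_tree and the representation of a digraph as a set of nodes together with a set of arcs (ordered pairs) are assumptions made for this example, not notation from the text.

```python
from collections import deque

def is_tree(nodes, arcs):
    """Return True if the digraph (nodes, arcs) satisfies Definition 3.2.1.

    nodes is a set of node names; arcs is a set of (a, b) pairs, each
    meaning "there is an arc from a to b".  (Hypothetical representation.)
    """
    if not nodes:
        return False  # a tree has a nonempty set of nodes
    indegree = {v: 0 for v in nodes}
    sons = {v: [] for v in nodes}
    for a, b in arcs:
        indegree[b] += 1
        sons[a].append(b)
    # (i) exactly one node, the root, has indegree 0
    roots = [v for v in nodes if indegree[v] == 0]
    if len(roots) != 1:
        return False
    root = roots[0]
    # (ii) every node other than the root has indegree 1
    if any(indegree[v] != 1 for v in nodes if v != root):
        return False
    # (iii) every node can be reached from the root by a directed path
    seen = {root}
    queue = deque([root])
    while queue:
        v = queue.popleft()
        for w in sons[v]:
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return seen == set(nodes)
```

Clause (iii) is not redundant: a digraph consisting of an isolated node r together with the two-node cycle <b, c, b> satisfies clauses (i) and (ii) but is rejected by the reachability check.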

Because trees are such an important class of digraphs, there is a rich ter-
minology associated with them. Different authors, however, do not use the terms
consistently. We will use just a few of the most widely accepted terms.

Definition 3.2.2: Let a and b be nodes of a tree T. If there is an arc from a


to b, then a is said to be the father of b and b is a son of a. (From the restrictions
on the indegree of nodes of a tree, it is clear that the root node has no father and
every node other than the root has exactly one father.) If there is a directed path
from node a to node b, then node a is said to be an ancestor of b and b is a descen-
dant of a; if a ≠ b, then a is a proper ancestor of b and b is a proper descendant of
a. (It follows that the root is an ancestor of every node of a tree and every node
is a descendant of the root.) The subdigraph consisting of the node a and all its
descendants is a subtree of T, and a is called the root of the subtree. If a is not the
root of T, then the subtree is a proper subtree of T. A node with outdegree 0 (i.e.,
one with no sons) is called a leaf of a tree. A node which is not a leaf is called an
interior node. The height of the tree is the length of the longest directed path of T.

Example
Consider the following tree.

[Figure: a tree with root a; a has sons b and c, and b has sons d, e, and f.]

The root of the tree is node a. The root a has two sons, b and c; node b has three
sons and d has no sons. The father of d is b. The leaves of the tree are the nodes
c, d, e, and f; a and b are the only interior nodes. The height of the tree is 2. The
subdigraph with nodes {b, d, e, f} is a subtree with root b. The subdigraph consisting
only of node d is a subtree of height 0 with root d. #
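
The terms of Definition 3.2.2 translate directly into short computations. The sketch below, in Python, encodes the tree of the example above as a table of sons; the table SONS and the function names are hypothetical choices made for this illustration.

```python
# The example tree: a has sons b and c; b has sons d, e, and f.
SONS = {"a": ["b", "c"], "b": ["d", "e", "f"],
        "c": [], "d": [], "e": [], "f": []}

def father(sons, node):
    """Return the father of node, or None if node is the root."""
    for parent, children in sons.items():
        if node in children:
            return parent
    return None

def leaves(sons):
    """Nodes with outdegree 0."""
    return sorted(v for v, children in sons.items() if not children)

def descendants(sons, node):
    """All descendants of node; a node is a descendant of itself."""
    result = [node]
    for child in sons[node]:
        result.extend(descendants(sons, child))
    return result

def height(sons, node):
    """Length of the longest directed path starting at node."""
    return max((1 + height(sons, child) for child in sons[node]), default=0)
```

With this encoding, leaves(SONS) gives the leaves c, d, e, f; height(SONS, "a") gives 2; and descendants(SONS, "b") gives the node set {b, d, e, f} of the subtree with root b.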

The usefulness of trees is due in part to the restrictions on paths which are
implied by their definition. These restrictions make it possible to traverse a tree
algorithmically (visit all its nodes) and perform searches for data more efficiently
than is possible with the general class of digraphs. The following theorems estab-
lish some of the most important properties of paths in trees.

Theorem 3.2.1: Let T be a tree with root r and let a be any node of T. Then
there is a unique directed path from r to a.

Proof: By the definition of a tree (Definition 3.2.1) there is a directed path
from r to a, so we need only show that the path is unique. For each n ∈ N, we
define S_n to be the set of nodes of T such that for each a ∈ S_n, there is a directed
path from r to a of length n or less. We will show by induction that the path from
r to any node in S_n is unique.
1. Basis: Let n = 0. Then S_n = S_0 = {r}; that is, the only directed path of
length 0 which originates at r terminates at r. Since the indegree of r
is 0, there can be no other directed path from r to r; it follows that there
is a unique directed path in T from r to each node in S_0.
2. Induction: Let n > 0. We assume the induction hypothesis that for each
node b ∈ S_{n-1}, there is a unique directed path in T from r to b. Suppose
a ∈ S_n. We treat two cases, where a ∈ S_{n-1} and where a ∉ S_{n-1}. If
a ∈ S_{n-1}, then by the induction hypothesis, there is a unique directed path
from r to a. If a ∉ S_{n-1}, then there is a directed path from r to a of length
n, but no such path of length n − 1. Any directed path of length n must
consist of a sequence <r, b_1, b_2, ..., b_{n-1}, a>, where b_{n-1} ∈ S_{n-1}. By the
induction hypothesis, the path <r, b_1, b_2, ..., b_{n-1}> is the only directed
path in T from r to b_{n-1}. Since the indegree of a is 1, there is only one
directed path of the form <b_{n-1}, a>; i.e., there is a unique element b_{n-1}
such that <r, b_1, b_2, ..., b_{n-1}, a> is a directed path. Thus the only directed
path from r to a consists of the unique path from r to b_{n-1} followed by
the unique path from b_{n-1} to a. ∎

The proofs of the following two corollaries are left as exercises.

Corollary 3.2.1a: Every directed path in a tree is a simple path.

Corollary 3.2.1b: There are no loops on nodes of trees.

Theorem 3.2.2: A tree has no directed simple cycles of nonzero length.
The only undirected simple cycles of nonzero length are of length 2.
Proof: From Corollary 3.2.1b it follows that no cycles of length 1 can exist.
The only simple cycles of length 2 are of the form <a, b, a>, where either a is the
son of b or b is the son of a; such a cycle is always undirected.
Suppose C = <a_0, a_1, a_2, ..., a_k, a_0> is a simple cycle of length greater than
2; then k ≥ 2. If <a_0, a_1> is an arc of the tree and C is not directed, then there must
be some a_i such that <a_{i-1}, a_i> and <a_{i+1}, a_i> are both arcs. Since the path is simple
and of length greater than two, a_{i-1} ≠ a_{i+1} and hence a_i has indegree 2, violating
the definition of a tree. Hence if <a_0, a_1> is an arc, then C is directed. Similarly,
if <a_0, a_k> is an arc, then the reversal of C, <a_0, a_k, a_{k-1}, ..., a_2, a_1, a_0>, must be
directed. Moreover, either <a_0, a_1> or <a_0, a_k> must be an arc since otherwise a_0
would have indegree 2. Hence, if C is a cycle of length greater than two, then either
C or its reversal must be directed. Without loss of generality, assume C is directed,
and let <r, b_1, b_2, ..., b_m, a_0> be the directed path from the root r to a_0. Then
<r, b_1, b_2, ..., b_m, a_0, a_1, ..., a_k, a_0> is a different directed path from r to a_0,
contradicting Theorem 3.2.1. Thus T contains no cycles of length greater than
2. ∎

Applications of trees often involve restricted classes of trees. A common restric-
tion is to limit the number of sons a node can have; if every node has n or fewer
sons, then a tree is called an n-ary tree. If every node has either n sons or 0 sons,
then the tree is called a complete n-ary tree. In many applications it is necessary
to impose an order on the arcs emanating from each node, or equivalently, order
the sons of each node. A tree in which the outgoing arcs of each node are ordered
is called an ordered tree, and we refer to the 1st, 2nd, ..., and nth son of a node.
The use of ordered trees is so common that it is often not explicitly specified,
although it is usually clear from context. We will use the term binary tree to denote
a 2-ary tree in which every node other than the root is specified to be either the
left son or the right son of its father.

Examples

(a) Consider the following trees.

The tree T_1 is a ternary tree but not a complete ternary tree; T_2 is a complete
2-ary tree. As unordered trees, T_2 and T_3 are equal; as ordered trees they are
not equal because in T_2, d is the first son of c and in T_3, d is the second son of c.
The tree T_4 is a 2-ary tree but not a complete 2-ary tree; if T_4 is a binary tree, the
node c has a left son but no right son.
(b) The algebraic expression

(((6 + 4) * 8) − (4 * 5))
can be represented by the following labeled ordered tree.

This is an example of the use of labeled ordered trees to represent assertions or


expressions in a language. The leaves of the tree are labeled with values or variable
names and the interior nodes are labeled with operators or connectives. Such trees
must be ordered if the operations and connectives are not commutative, e.g., the
trees

[Figure: two ordered trees, each with a root labeled − and two leaves; the leaves are 4, 5 in the first tree and 5, 4 in the second.]

represent the expressions 4 − 5 and 5 − 4 respectively. Note that the information
provided by parentheses in the expression is implicit in the tree representation; each
operand of the expression is nested in parentheses to a depth that equals its distance
from the root of the tree. Because expressions in innermost parentheses are evaluated
first, the tree is evaluated by starting at the bottom and assigning values to each
interior node. A node labeled with an operator is assigned the value which results
from performing the operation on the values of its sons. The process can be viewed
as a collapsing of the tree upward; this is illustrated by the following sequence of
trees. Each tree of the sequence is obtained from its predecessor by collapsing a
subtree consisting of a node and the two leaves which are its sons.

The procedure described above is a “bottom-up” evaluation of the tree representing
the expression. Such trees can also be evaluated in a “top-down” fashion by using
a recursive procedure to express the value of each node in terms of the values of
its sons. We leave it as an exercise to write a recursive procedure for top-down
evaluation of trees which represent algebraic expressions. #
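
The bottom-up collapsing process can be sketched in Python as follows. This is only an illustration of the idea: the encoding of a tree as a nested tuple (operator, left son, right son) with integers at the leaves, and the names collapse_once and evaluate_bottom_up, are assumptions introduced here. Each step collapses the leftmost node whose two sons are both leaves.

```python
import operator

# Operators that may label interior nodes (hypothetical encoding).
OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.truediv}

def collapse_once(tree):
    """Collapse the leftmost node whose two sons are both leaves.

    Returns (new_tree, changed); a leaf is returned unchanged.
    """
    if not isinstance(tree, tuple):
        return tree, False
    op, left, right = tree
    if not isinstance(left, tuple) and not isinstance(right, tuple):
        return OPS[op](left, right), True
    new_left, changed = collapse_once(left)
    if changed:
        return (op, new_left, right), True
    new_right, changed = collapse_once(right)
    return (op, left, new_right), changed

def evaluate_bottom_up(tree):
    """Return the sequence of trees produced by repeated collapsing."""
    steps = [tree]
    while isinstance(tree, tuple):
        tree, _ = collapse_once(tree)
        steps.append(tree)
    return steps

# The tree for (((6 + 4) * 8) - (4 * 5)).
EXPR = ("-", ("*", ("+", 6, 4), 8), ("*", 4, 5))
```

Applied to EXPR, evaluate_bottom_up yields five trees; the interior nodes are replaced in turn by 10, 80, and 20, and the final tree is the single leaf 60.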

Search Trees

One of the most important uses of trees is for storing collections of records,
where each record may consist of several associated data items. Such a collection
of records is called a file. The choice of how a file is stored is based on a number
of factors, including the frequency with which certain operations are performed
on the file. Common operations on a file include insertion of a new record, dele-
tion of a record, and searching for a record in the file. The most straightforward
search techniques are based on the value of some specific field or item in each record
called the search key. For example, a file consisting of employee records might use
the social security number as a search key; each record would then have the
employee’s social security number as its search key value. In many search tech-
niques, the value of the search key of the record sought is used to direct the search;
if the value of the search key in each record is unique, then the search key can be
used for record identification as well.
A file can be organized for fast access using a search key by means of a
type of binary tree known as a binary search tree. To illustrate search trees and
their use, we will assume a file whose key values are all distinct, and a search tree
in which a single record of the file is stored at each node of the tree. A binary
search tree is constructed so that if node b is the left son of node a, then the key of
every descendant of b (including b itself) is less than the key of a. On the other
hand, if node b is the right son of node a, then the key of every descendant of b is
greater than or equal to the key of a.

Example
The following is a binary search tree. Each node label is the key of the record
stored at the node.

[Figure: a binary search tree in which each node is labeled with the key of its record.] #
We will illustrate the use of binary search trees by describing two search pro-
cedures. To construct these programs, we need a way to refer to the key of the
record stored at a node of a tree as well as the left son and right son of the node.
If node is a program variable whose value is a tree node, then KEY (node) will
denote the value of the key stored in the record at node. LEFTSON (node) and
RIGHTSON (node) will have as values the left son and right son of node respec-
tively, if these sons exist; if no left son exists, the value of LEFTSON (node) will
be the distinguished value null and similarly for RIGHTSON (node).

procedure TREESEARCH1(root, arg):
comment: arg is the key of the record sought;
    root is the root of the binary search tree if the file
    is nonempty; otherwise root = null.
begin
    node ← root;
    while (node ≠ null and arg ≠ KEY(node)) do
        if arg < KEY(node) then node ← LEFTSON(node)
        else node ← RIGHTSON(node);
    if node = null then return “record not found”
    else return node
end

Fig. 3.2.1 Iterative binary tree search

A search algorithm for records stored in a binary search tree is given in Figure
3.2.1. To find a record in the tree, we call TREESEARCH1(root, arg), where the
value of arg is the key of the record sought and the value of root is the root node
of the search tree unless the file contains no records, in which case the value of
root is null. After node is set equal to root, if node ≠ null, then arg is compared
with KEY(node). If arg = KEY(node), then the record has been found and is
stored at the root node of the search tree. If arg < KEY(node), then either the
record is not in the file, or it is in the subtree whose root is LEFTSON(node).
If arg > KEY(node), then either the record is not in the file or it is stored in the
subtree whose root is RIGHTSON(node).
The search proceeds by progressing down into the tree, at each step examin-
ing a node which is a son of the node previously examined. If the record is in the
file, the procedure will eventually find it by following the (unique) simple directed
path from the root of the tree to the correct node. If the record is not in the file,
the search will eventually either reach a node whose key value is greater than arg
and which has no left son, or it will reach a node whose key value is less than arg
and which has no right son. In these cases, the search procedure will terminate after
assigning the value of null to node.
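
For readers who wish to run the iterative search, the following is one possible Python rendering. It is a sketch under stated assumptions: the class Node and the function name tree_search_1 are hypothetical, the fields key, left, and right stand in for KEY, LEFTSON, and RIGHTSON, and Python's None plays the role of both null and the “record not found” result.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    key: int                          # stands in for KEY(node)
    left: Optional["Node"] = None     # stands in for LEFTSON(node)
    right: Optional["Node"] = None    # stands in for RIGHTSON(node)

def tree_search_1(root, arg):
    """Iterative search; returns the node with key arg, or None."""
    node = root
    while node is not None and arg != node.key:
        # Smaller keys lie in the left subtree; all others lie to the right.
        node = node.left if arg < node.key else node.right
    return node

# A small search tree for the keys 0, 2, 4, 7, 8, 9 (one of many possible).
ROOT = Node(7, Node(2, Node(0), Node(4)), Node(9, Node(8)))
```

A call such as tree_search_1(ROOT, 4) follows the path 7, 2, 4 and returns the node holding 4, while tree_search_1(ROOT, 5) runs off the tree below node 4 and returns None.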
The procedure TREESEARCH1 given in Fig. 3.2.1 is called an iterative
procedure because the principal computation is done in a loop; in this case, the
loop uses a while statement. A search of a binary tree can also be done recursively.
The recursive search procedure rests on the following inductive definition of
binary trees. (This recursive definition of binary trees is equivalent to the non-
recursive characterization given earlier in this section, but we will not prove the
equivalence.)

Definition 3.2.3: The following digraphs are binary trees.
1. (Basis) A single node, together with the empty relation, is a binary
tree.
2. (Induction) Let T_1 and T_2 be binary trees with disjoint sets of nodes
and roots r_1 and r_2 respectively, and let r be a node not in either T_1 or
T_2. Then the following digraphs are binary trees with root r:
(a) The node r together with the tree T_1 and a left arc from r to r_1.
(b) The node r together with the tree T_2 and a right arc from r to r_2.
(c) The node r together with the trees T_1 and T_2 and a left arc from r to
r_1 and a right arc from r to r_2.
3. (Extremal) No digraph is a binary tree unless it can be constructed in
a finite number of steps using clauses 1 and 2.

The iterative strategy of TREESEARCH1 might be described as “plunge down
into the tree until the record is found.” The recursive strategy implemented by
TREESEARCH2, given in Figure 3.2.2, can be described as “search the tree by
examining the root and then, if necessary, searching either the left or right subtree
of the root.” TREESEARCH2 is called in the same way and returns the same
value as TREESEARCH1.

procedure TREESEARCH2(root, arg):
comment: arg is the key of the record sought;
    root is the root of the binary search tree if the file is
    nonempty; otherwise root = null.
if root = null then return “record not found”
else
    if arg = KEY(root) then return root
    else
        if arg < KEY(root) then return TREESEARCH2(LEFTSON(root), arg)
        else return TREESEARCH2(RIGHTSON(root), arg)

Fig. 3.2.2 Recursive binary tree search
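
The recursive search can be rendered in Python in the same spirit. As before, this is a sketch rather than a definitive implementation: Node and tree_search_2 are hypothetical names, and None stands for both null and “record not found.”

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    key: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None

def tree_search_2(root, arg):
    """Recursive search; returns the node with key arg, or None."""
    if root is None:
        return None                      # empty (sub)tree: not found
    if arg == root.key:
        return root                      # found at the root of this subtree
    if arg < root.key:
        return tree_search_2(root.left, arg)
    return tree_search_2(root.right, arg)

# A small search tree for the keys 0, 2, 4, 7, 8, 9 (one of many possible).
ROOT = Node(7, Node(2, Node(0), Node(4)), Node(9, Node(8)))
```

Each recursive call receives the root of a subtree, so the procedure mirrors the inductive structure of Definition 3.2.3: a search of a tree is either finished at its root or reduced to a search of one of its two subtrees.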
The height of a binary search tree is a measure of the maximum number of
steps it will take to locate a record in the file. The following theorems relate the
size of the file to the height of a binary tree. The proofs are left as exercises.

Theorem 3.2.3: If T is a binary tree of height h and with n nodes, then
h + 1 ≤ n ≤ 2^(h+1) − 1. Moreover, there exist binary trees in which these bounds
are attained.

Corollary 3.2.3: A binary tree with n nodes, n > 0, is of height at least
⌊log n⌋.†
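
The two bounds relating height and number of nodes can be observed on the two extreme shapes of binary tree. The Python sketch below is a numerical illustration, not a proof; the encoding of a binary tree as a pair (left subtree, right subtree), with None for a missing subtree, and the names chain and full are assumptions made here.

```python
def chain(h):
    """Binary tree of height h in which every interior node has one son."""
    tree = (None, None)
    for _ in range(h):
        tree = (tree, None)
    return tree

def full(h):
    """Binary tree of height h in which every interior node has two sons
    and every leaf is at distance h from the root."""
    if h == 0:
        return (None, None)
    return (full(h - 1), full(h - 1))

def count(tree):
    """Number of nodes."""
    return 0 if tree is None else 1 + count(tree[0]) + count(tree[1])

def height(tree):
    """Length of the longest path from the root; a single node has height 0."""
    return max((1 + height(s) for s in tree if s is not None), default=0)

def floor_log2(n):
    """Exact value of the floor of log n (base 2) for a positive integer n."""
    return n.bit_length() - 1
```

For every h, count(chain(h)) equals h + 1 and count(full(h)) equals 2^(h+1) − 1, so both bounds are attained; in each case the height is at least floor_log2 of the number of nodes.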

We have described binary search trees with records stored at all nodes of the
tree. In some circumstances it is advantageous to store records only at the interior
nodes or only at the leaves. If records are stored at interior nodes, each leaf can
have an associated action which is to be taken if a search fails at that leaf. If records
are stored only at the leaves, each interior node contains a value for comparison
rather than an entire record. In this case, each leaf may be a single record, or it
may be a “bucket” which contains a subfile. A search for a record in such a tree
need not read all records of the file into main storage, since the interior of the
tree can be searched and the result used to bring only the appropriate bucket into
main storage.
Using only interior nodes or only leaves for record storage significantly in-
creases the number of nodes of the tree, but it has only a small effect on the height
of the tree; as a consequence, the number of steps of a search procedure in such
a tree is not much larger than that for other search trees. We leave it as an exercise
to show that if all leaves of a tree are approximately the same distance from the
root, then the height of a search tree with records stored only at the leaves is only
slightly greater than the height of one with records stored at all nodes.
If records are stored only at leaves of a search tree, then there must be some
value, which we will call a discriminator, associated with each internal node of the
tree. The discriminator of a node is used to direct the search process in the same
way as the key of a record stored at the node.

Example
The following graphs are binary search trees for a file with the key set
{0, 2, 4, 7, 8, 9}. In the tree on the left, records have been stored in all nodes and
each node is labelled with the key of its record. In the tree on the right, all records
are stored at the leaves, which we have drawn as squares. Labels of the internal
nodes of this tree are discriminators and need not be members of the key set.

†Unless specified otherwise, all logarithms in this book are to the base 2.
By storing more than one discriminator at each node, it is possible to imple-
ment ternary or higher-order searches. For example, for a ternary search, each
node has two discriminators d_1 and d_2 and an outdegree of 3 or less. When search-
ing for a record with key k, if k < d_1, then the left subtree is searched, if d_1 ≤ k < d_2,
the middle subtree is searched, and if k ≥ d_2, then the right subtree is searched.

Example
The following graph represents a ternary search tree. The two discriminators
of each internal node are given as a node label x : y. Records are stored only at the
leaves of the tree.

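[Figure: a ternary search tree; each internal node is labeled with its pair of discriminators x : y, and records are stored only at the leaves.] #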
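
A ternary search with records stored only at the leaves might be sketched in Python as follows. The classes Leaf and Internal, the function ternary_search, and the sample keys are all hypothetical; the sketch also assumes that ties are resolved as in the binary case, so that a key equal to a discriminator is sent to the subtree on its right.

```python
from dataclasses import dataclass

@dataclass
class Leaf:
    key: int                 # key of the record stored at this leaf

@dataclass
class Internal:
    d1: int                  # first discriminator
    d2: int                  # second discriminator, with d1 < d2
    left: object = None      # subtree for keys k < d1
    middle: object = None    # subtree for keys d1 <= k < d2
    right: object = None     # subtree for keys k >= d2

def ternary_search(node, k):
    """Return the leaf holding key k, or None if no such leaf exists."""
    while isinstance(node, Internal):
        if k < node.d1:
            node = node.left
        elif k < node.d2:
            node = node.middle
        else:
            node = node.right
    if isinstance(node, Leaf) and node.key == k:
        return node
    return None

# A hypothetical file with keys 1, 3, 5, 8, 12, 20.
ROOT = Internal(5, 12,
                Internal(3, 4, Leaf(1), Leaf(3)),
                Internal(8, 10, Leaf(5), Leaf(8)),
                Internal(20, 30, Leaf(12), Leaf(20)))
```

Note that the discriminators 4, 10, and 30 are not keys of any record; they serve only to direct the search.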
Tree Traversal Algorithms

When using trees as data structures, it is often necessary to traverse the tree,
that is, to inspect each data item stored in the tree. We will describe three traversal
algorithms for binary trees; each traversal scheme will be defined by specifying
an order for processing the three components of root, left subtree and right
subtree. We consider the following three orders.

Visit the root, then the left subtree, then the right subtree.
Visit the left subtree, then the root, then the right subtree.
Visit the left subtree, then the right subtree, then the root.

Whatever choice is made, it is natural to apply the same strategy to the subtrees
as was chosen for the tree, making the traversal algorithm recursive. To describe
the three algorithms, we assume a binary tree T with root r, a left subtree T_1 and
a right subtree T_2; note that T_1 and T_2 may not exist. The order in which the nodes
of T are visited is called preorder, inorder, or postorder depending on whether the
root is visited first, second, or third. The following are recursive definitions of the
three traversal algorithms.
Preorder: 1. Process the root node r of T.
2. If T_1 exists, then process T_1 in preorder.
3. If T_2 exists, then process T_2 in preorder.
Inorder: 1. If T_1 exists, then process T_1 in inorder.
2. Process the root node r of T.
3. If T_2 exists, then process T_2 in inorder.
Postorder: 1. If T_1 exists, then process T_1 in postorder.
2. If T_2 exists, then process T_2 in postorder.
3. Process the root node r of T.

Example
The node labels of the following binary trees give the order in which the nodes
are visited by each of the traversal algorithms.

[Figure: three copies of a binary tree, captioned Preorder, Inorder, and Postorder; the node labels give the order of visits under each traversal.] #
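
The three traversal orders can be written as three short recursive functions. The Python sketch below assumes, for illustration only, that a binary tree is encoded as a triple (label, left subtree, right subtree) with None for a subtree that does not exist; the function names and the sample tree are hypothetical.

```python
def preorder(tree):
    """Root, then left subtree, then right subtree."""
    if tree is None:
        return []
    label, left, right = tree
    return [label] + preorder(left) + preorder(right)

def inorder(tree):
    """Left subtree, then root, then right subtree."""
    if tree is None:
        return []
    label, left, right = tree
    return inorder(left) + [label] + inorder(right)

def postorder(tree):
    """Left subtree, then right subtree, then root."""
    if tree is None:
        return []
    label, left, right = tree
    return postorder(left) + postorder(right) + [label]

# Root a with left son b and right son c; b has a left son d only.
SAMPLE = ("a", ("b", ("d", None, None), None), ("c", None, None))
```

For SAMPLE the nodes are visited in the order a, b, d, c in preorder; d, b, a, c in inorder; and d, b, c, a in postorder.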

Algorithms based on one of these traversal schemes are naturally given as
recursive procedures. Figure 3.2.3 gives a recursive procedure which uses an
inorder traversal to list all keys stored in a tree.

procedure LIST(root):
comment: using inorder traversal, list keys stored in binary tree.
begin
    if LEFTSON(root) ≠ null then LIST(LEFTSON(root));
    print KEY(root);
    if RIGHTSON(root) ≠ null then LIST(RIGHTSON(root))
end

Fig. 3.2.3 Procedure to list the keys of records stored in a binary tree

If L is the set of possible node labels of a tree, then each traversal order cor-
responds to a unique word w over the alphabet L for any given tree. In general, it
is not possible to reconstruct the tree given only the word w and the traversal order,
but this reconstruction can be done in certain important cases. In particular, if
a labelled tree represents an algebraic expression, then each internal node is labelled
with an operation, such as +, —, *, and /, and each leaf is labelled with a variable
or a value. For such trees, if the node labels are listed in either preorder or post-
order, the result is a word from which the original algebraic expression can be
reconstructed. This way of representing algebraic expressions is known as paren-
thesis free or Polish notation and is extremely convenient for computer evaluation.
Evaluation is usually done using a pushdown store; a discussion of this topic is
beyond our scope.

Example
Consider the algebraic expression (a — (b + c)) * d and its associated labelled
binary tree:

Preorder traversal results in the word * − a + b c d, and postorder traversal
produces a b c + − d *. Both of these words can be used to reconstruct the original
tree, but the inorder expression a − b + c * d is ambiguous. #
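
The claims of this example can be checked with a short Python sketch. The encoding of an expression tree as a nested tuple (operator, left, right) with variable names at the leaves is an assumption made here, as are all function names. The reconstruction from the postorder word uses a stack, which is one standard technique and not necessarily the one the authors have in mind.

```python
OPERATORS = {"+", "-", "*", "/"}

def preorder_word(tree):
    if isinstance(tree, str):
        return [tree]
    op, left, right = tree
    return [op] + preorder_word(left) + preorder_word(right)

def inorder_word(tree):
    if isinstance(tree, str):
        return [tree]
    op, left, right = tree
    return inorder_word(left) + [op] + inorder_word(right)

def postorder_word(tree):
    if isinstance(tree, str):
        return [tree]
    op, left, right = tree
    return postorder_word(left) + postorder_word(right) + [op]

def tree_from_postorder(word):
    """Rebuild the tree from its postorder word using a stack."""
    stack = []
    for symbol in word:
        if symbol in OPERATORS:
            right = stack.pop()      # the right operand was pushed last
            left = stack.pop()
            stack.append((symbol, left, right))
        else:
            stack.append(symbol)
    (tree,) = stack                  # a well-formed word leaves exactly one tree
    return tree

# (a - (b + c)) * d, and a second tree for (a - b) + (c * d).
TREE = ("*", ("-", "a", ("+", "b", "c")), "d")
OTHER = ("+", ("-", "a", "b"), ("*", "c", "d"))
```

The postorder word of TREE rebuilds TREE exactly, whereas TREE and OTHER, which represent different expressions, have the same inorder word a − b + c * d.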

Problems: Section 3.2

1. State which of the following digraphs are trees. For those that are not, state why.

[Figure: digraphs (a) through (f).]

2. For each of the following trees identify the root, the leaves, the height, and all proper
subtrees.

[Figure: trees (a) through (d).]

3. Prove Corollary 3.2.1a.
4. Prove Corollary 3.2.1b.
5. Let a and b be distinct nodes of a tree. Prove that there is exactly one simple un-
directed path from a to b.
6. Prove that if a tree has n nodes, then it has n − 1 arcs.
7. (a) Prove that if any arc of a tree is deleted, the resulting digraph is not connected.
(b) Characterize the digraph which results when a single arc is deleted from a tree.
8. Give a recursive definition of the height of a binary tree.
9. Let S be a finite set of k integers. Describe an algorithm to construct a binary search
tree with k nodes, where each node is labelled with a distinct element of the set S.
Your algorithm should produce a tree of height ⌊log_2(k)⌋.
10. Prove Theorem 3.2.3.
11. Prove Corollary 3.2.3.

12. (a) Prove that the number of interior nodes of a binary tree of height h > 0 is less
than 2’-!,
(b) Find an upper bound for the number of interior nodes of an n-ary tree of
height h.
13. Consider the following labelled binary tree.

Give the sequence of labels encountered when the tree is traversed in each of the
following orders.
(a) Preorder
(b) Inorder
(c) Postorder
14. Represent the following propositional forms as ordered trees.
(a) (AV B)>C]+(DV A)
(b) (A>B)A [OC V B= 4] (Note that this expression contains a unary
operator.)
15. Construct the labelled binary tree corresponding to the following parenthesis-free
expressions. These expressions were obtained by traversing the trees in the order
given.
(a) − − − a b c d (preorder)
(b) − a − b − c d (preorder)
(c) abcwdex] + (postorder)
16. Write a recursive procedure to evaluate an algebraic expression represented by
a labelled binary tree. Assume that the leaves of the tree are labelled with integers and
the only operations used are the binary operations +, −, *, and /.
17. Show that inorder traversal of labelled trees representing algebraic expressions may
produce an ambiguous expression; in particular, two trees representing different
expressions can produce the same word when inorder traversal is used.
18. (a) Show that the number of leaves on a complete binary tree is always one greater
than the number of interior nodes of the tree.

(b) Find an expression for the number of leaves on a complete n-ary tree in terms
of the number of interior nodes of the tree.
19. Let T_1 be a complete binary search tree of height h_1 with records stored in both inte-
rior nodes and leaves such that the length of any path from the root of T_1 to a leaf
is either h_1 or h_1 − 1. Let T_2 be a complete binary search tree with records stored
only at the leaves; T_2 is of height h_2 and the length of any path from the root of T_2
to a leaf is either h_2 or h_2 − 1. Suppose both search trees contain n records.
(a) What is the difference in the heights of the trees?
(b) What conclusions can be drawn about the difference in the maximum number
of nodes visited in searching for a record in the two trees?
20. An array A can be used to represent a binary tree as follows:
(i) The root value is stored at A[1].
(ii) For each i such that a value of a tree node is stored at A[i], the value of the left
son of A[i] is stored at A[2i] and the value of the right son of A[i] is stored at
A[2i + 1].
A distinguished value can be used to indicate that the corresponding tree node does
not exist.
(a) How many entries must the array have if the tree is of height h?
(b) Generalize this technique for n-ary trees.
21. Let T be a complete binary tree with n leaves, b_1, b_2, ..., b_n, and let d_i be the
length of the path from the root to leaf b_i, 1 ≤ i ≤ n.
(a) Show that Σ_{i=1}^{n} 2^(−d_i) = 1.
(b) (For students with an understanding of elementary probability.) Interpret the
equality of part (a) in terms of probabilities, and generalize the equality for
complete n-ary trees.
(c) Show max{d_i} ≥ ⌈log n⌉.

Programming Problems

1. Write a recursive program to determine the height of a binary tree.


2. Let T be a binary tree whose nodes are labeled with positive integers, and let A be an
array used to represent T as described in problem 20, where the array entry 0 is used
to indicate that a node does not exist.
(a) Write a procedure to print out the node labels of T in postorder.
(b) Write a procedure to search for a node label in T.

3.3 SPECIAL PROPERTIES OF RELATIONS

Certain properties of binary relations play particularly important roles in a wide


variety of contexts. We will now define these properties and interpret them in
terms of digraphs.

Definition 3.3.1: Let R be a binary relation on A. Then
(a) R is reflexive if xRx for every x in A.
(b) R is irreflexive if xRx fails for every x in A.
(c) R is symmetric if xRy implies yRx for every x, y ∈ A.
(d) R is antisymmetric if xRy and yRx together imply x = y for every
x, y ∈ A.
(e) R is transitive if xRy and yRz together imply xRz for every x, y, z ∈ A.

A relation R is reflexive on a set A if xRx for every x ∈ A, i.e., every element
is in the relation R to itself. The digraph of a reflexive relation has a loop on every
node of the digraph. A relation R is irreflexive on A if no element x ∈ A is in the
relation R to itself. The digraph of an irreflexive relation does not have loops on
any nodes. Note that it is possible for a relation R to be neither reflexive nor
irreflexive; the graph of such a relation would have loops on some but not all
nodes.

Examples
(a) The relation of equality (a = b) is reflexive on any set.
(b) Consider the set of integers I. The relation ≤ is reflexive and not irreflexive,
and the relation < is irreflexive and not reflexive.
(c) Consider the following relations on the set Σ*, where Σ = {a, b}. The relation
“is the same length as” is reflexive and not irreflexive. The relation “is longer
than” is irreflexive and not reflexive. Let R be a relation such that xRy if and only
if some proper prefix of x is a proper suffix of y. Then R is neither reflexive nor
irreflexive, since aaRaa but abRab does not hold. #

A relation on a set A is symmetric if xRy implies yRx. If D is the digraph of


a symmetric relation, then there are either two arcs or no arcs between any two
distinct nodes of D. In contrast, if D is the digraph of an antisymmetric relation,
then there is either one arc or no arcs between any two distinct nodes of D. Loops
may, but need not occur on nodes of digraphs of both symmetric and antisymmetric
relations.

Examples
(a) The relation of equality on any set is both symmetric and antisymmetric.
(b) For the set of integers I, the relations < and ≤ are both antisymmetric;
neither is symmetric. The relation “xRy if and only if the absolute values of
x and y are equal” is symmetric and not antisymmetric.
(c) For the set Σ*, the relation “is a substring of” is antisymmetric and not sym-
metric. The relation “xRy if and only if x and y have a common nonempty
prefix” is symmetric and not antisymmetric. #

If R is a transitive relation, then whenever xRy and yRz it follows that xRz.
If D is the digraph of a transitive relation, and there are arcs from x to y and from
y to z, then there is an arc from x to z. It follows that if D is the digraph of a transi-
tive relation R and there is a path of length greater than 0 from x to y, then there is
an arc (a path of length 1) from x to y.

Examples
(a) The equality relation is transitive for all sets.
(b) For the set of integers I, the relations < and ≤ are transitive. The relation
“xRy if and only if x divides y” is also transitive.
(c) For the set Σ*, the relations “is a prefix of,” “is a proper prefix of,” “is a sub-
word of,” and “is the same length as” are all transitive relations. #

We conclude this section with examples which list the properties of some
specific relations.

Examples
Consider the set A = {1, 2, 3} and the relations represented by the following
digraphs.

[Figure: digraphs of the relations R_1 through R_5 on the set {1, 2, 3}.]

(a) R_1 is the equality relation on the set A. It is reflexive, symmetric,
antisymmetric, and transitive. It is not irreflexive.
(b) The relation R_2 is symmetric but not reflexive, irreflexive, antisymmetric, or
transitive.
(c) The relation R_3 is irreflexive and antisymmetric. It is not reflexive, symmetric,
or transitive.
(d) The relation R_4 is the empty relation on A. It is irreflexive, symmetric,
antisymmetric, and transitive, but not reflexive.
(e) R_5 is the universal relation on A. This relation is reflexive, symmetric, and
transitive, but not irreflexive or antisymmetric.
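The five properties can be tested mechanically once a relation is stored as a set of ordered pairs. The following Python sketch is one possible way to do so; the function names and the dictionary layout are illustrative choices, and only the equality, empty, and universal relations on {1, 2, 3} are checked, since those are described fully in the text.

```python
from itertools import product

def is_reflexive(A, R):
    return all((x, x) in R for x in A)

def is_irreflexive(A, R):
    return all((x, x) not in R for x in A)

def is_symmetric(R):
    return all((y, x) in R for (x, y) in R)

def is_antisymmetric(R):
    # Whenever both <x, y> and <y, x> are present, x must equal y.
    return all(x == y for (x, y) in R if (y, x) in R)

def is_transitive(R):
    return all((x, z) in R for (x, y) in R for (y2, z) in R if y == y2)

def properties(A, R):
    """Summarize which of the five properties hold for R on A."""
    return {
        "reflexive": is_reflexive(A, R),
        "irreflexive": is_irreflexive(A, R),
        "symmetric": is_symmetric(R),
        "antisymmetric": is_antisymmetric(R),
        "transitive": is_transitive(R),
    }

A = {1, 2, 3}
equality = {(x, x) for x in A}
empty = set()
universal = set(product(A, A))

for name, R in [("equality", equality), ("empty", empty), ("universal", universal)]:
    print(name, properties(A, R))
```

Note that the reflexive and irreflexive tests need the underlying set A as well as the pairs of R, whereas the other three tests depend on R alone.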

Problems: Section 3.3

1. List the properties defined in Definition 3.3.1 which hold for the relations represented
by the following digraphs.

[Figure: digraphs (a), (b), and (c) for Problem 1]

2. Describe the following relations in terms of the properties of Definition 3.3.1.


(a) For the set of the integers I, xRy if and only if x and y are both positive or are
both negative.
(b) For the set of integers I, xRy if and only if |x − y| = 4 or |x − y| = 8 or x = y.
3. Consider the set of integers I. Fill in the following table with Y (yes) or N (no)
according to whether the relation possesses the property. The notation ∅ denotes the
empty relation, I × I is the universal relation, and D denotes “divides with an integer
quotient” (e.g., 4D8 but not 4D7).

Reflexive
Irreflexive
Symmetric
Antisymmetric
Transitive

4. Transcribe each part of Definition 3.3.1 into logical notation. For example, part (a)
becomes
R is reflexive ⇔ ∀x[x ∈ A ⇒ xRx]

5. (a) Find a nonempty set and a relation on it which is neither reflexive nor
irreflexive. Choose the set to be as small as possible. What if the set is permitted
to be empty?
(b) Construct a binary relation on a nonempty set which is neither symmetric nor
antisymmetric. Choose the set to be as small as possible. What if the set is
permitted to be empty?

6. Consider the set of binary relations over an arbitrary set A. We say a property of
relations is preserved under a particular set operation if applying the operation to the
relation(s) results in a relation with the same property. For example, the reflexive
property is preserved under the binary operation of set union since the union of two
reflexive relations is reflexive. However, the reflexive property is not preserved under
the unary operation of set complement, since the absolute complement of a reflexive
relation on a nonempty set is not a reflexive relation. Complete the following table
with Y (yes) and N (no) according to whether the given property is preserved under
the indicated set operation. For each “no” answer, give a counterexample.

                 Union        Intersection   Relative Complement   Absolute Complement
                 R_1 ∪ R_2    R_1 ∩ R_2      R_1 − R_2             (A × A) − R_1

Reflexive
Irreflexive
Symmetric
Antisymmetric
Transitive

7. Sketch graphs of the following relations on the set of real numbers and determine
for each relation which of the properties in Definition 3.3.1 apply.
(a) {<x, y> | x = y}.
(b) {<x, y> | x^2 − 1 = 0 ∧ y > 0}.
(c) {<x, y> | |x| < 1 ∧ |y| > 0}.
8. (a) State which of the following terms apply to the binary relations represented by
trees: reflexive, irreflexive, symmetric, antisymmetric, transitive.
(b) Does the list of applicable terms completely describe the relations represented
by trees, or are there binary relations which possess these characteristics whose
digraphs are not trees?

Programming Problem

Write a program which takes as input the ordered pairs of a binary relation and
determines which of the properties of Definition 3.3.1 apply.

3.4 COMPOSITION OF RELATIONS

It is often easier to describe how to construct a relation than to give a direct
characterization. We already have a variety of set operations which can be used to
construct new binary relations from old ones. If R_1 and R_2 are binary relations
from A to B, then R_1 ∪ R_2, R_1 ∩ R_2, R_1 − R_2, and \overline{R_1} are all binary
relations from A to B, where the complement is taken relative to the universal set A × B.
The operation of composition of relations permits the use of a sequence of
relations to define a new relation. Suppose R_1 is a relation from A to B and R_2 is
a relation from B to C. The composite relation R_1R_2 is a relation from A to C as
shown in Fig. 3.4.1. In terms of this diagram, <a, c> ∈ R_1R_2 if there is a path of
length 2 from a ∈ A to c ∈ C, where the first edge of the path represents an element
of R_1 and the second edge of the path represents an element of R_2.
The composite relation R_1R_2 is the result of applying a binary operation of
composition to the operands R_1 and R_2. The operation of composition of relations
is implicitly defined by the following definition of a composite relation.

[Figure: sets A, B, and C with arcs representing elements of R_1, of R_2, and of R_1R_2]

Fig. 3.4.1. The composite relation R_1R_2

Definition 3.4.1: Let R_1 be a relation from A to B and R_2 a relation from
B to C. The composite relation from A to C, denoted R_1·R_2 or R_1R_2, is defined
as follows:
R_1R_2 = {<a, c> | a ∈ A ∧ c ∈ C ∧ ∃b[b ∈ B ∧ <a, b> ∈ R_1 ∧ <b, c> ∈ R_2]}.
(Note that if R_1 and R_2 are relations from A to B and from C to D respectively,
then R_1R_2 is not defined unless B = C.)

Examples
(a) If R_1 is the relation “is the brother of” and R_2 is the relation “is the father of,”
then R_1R_2 is the relation “is the paternal uncle of.”
(b) If R_1 is the relation “is the father of,” then R_1R_1 is the relation “is the paternal
grandfather of.”
(c) In the execution of programs written in a high level language, a sequence of
data conversions sometimes occurs. For example, a string of decimal digits
in an arithmetic expression may be converted first to a binary integer
representation and then to a floating point representation. If <x, y> ∈ R_1 implies
that digit string x is converted to binary integer y, and <y, z> ∈ R_2 implies
that binary integer y is converted to floating point number z, then <x, z> ∈
R_1R_2 implies that digit string x is converted to the floating point number z.
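Definition 3.4.1 can be sketched directly in Python by storing each relation as a set of ordered pairs. The function name compose and the family data below are hypothetical, chosen only to mirror examples (a) and (b).

```python
def compose(r1, r2):
    """Return r1 r2 = {(a, c) : (a, b) in r1 and (b, c) in r2 for some b}."""
    return {(a, c) for (a, b) in r1 for (b2, c) in r2 if b == b2}

# Hypothetical family data, invented for illustration.
brother_of = {("Al", "Bob")}                  # Al is the brother of Bob
father_of = {("Bob", "Cy"), ("Cy", "Dee")}    # Bob is the father of Cy, Cy of Dee

paternal_uncle_of = compose(brother_of, father_of)
paternal_grandfather_of = compose(father_of, father_of)

print(paternal_uncle_of)
print(paternal_grandfather_of)
```

The first operand is applied first, matching the left-to-right reading of R_1R_2 used in this section.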

Theorem 3.4.1: Let R_1 be a relation from A to B, R_2 and R_3 be relations from
B to C, and R_4 be a relation from C to D. Then
(a) R_1(R_2 ∪ R_3) = R_1R_2 ∪ R_1R_3
(b) R_1(R_2 ∩ R_3) ⊆ R_1R_2 ∩ R_1R_3
(c) (R_2 ∪ R_3)R_4 = R_2R_4 ∪ R_3R_4
(d) (R_2 ∩ R_3)R_4 ⊆ R_2R_4 ∩ R_3R_4

Proof:
(a) <a, c> ∈ R_1(R_2 ∪ R_3) if and only if there exists some b ∈ B such that
<a, b> ∈ R_1 and <b, c> ∈ R_2 ∪ R_3. Furthermore,
∃b[<a, b> ∈ R_1 ∧ <b, c> ∈ R_2 ∪ R_3]
⇔ ∃b[<a, b> ∈ R_1 ∧ (<b, c> ∈ R_2 ∨ <b, c> ∈ R_3)]
⇔ ∃b[(<a, b> ∈ R_1 ∧ <b, c> ∈ R_2) ∨ (<a, b> ∈ R_1 ∧ <b, c> ∈ R_3)]
⇔ ∃b[<a, b> ∈ R_1 ∧ <b, c> ∈ R_2] ∨ ∃b[<a, b> ∈ R_1 ∧ <b, c> ∈ R_3]
⇔ <a, c> ∈ R_1R_2 ∨ <a, c> ∈ R_1R_3
⇔ <a, c> ∈ R_1R_2 ∪ R_1R_3.
We leave the proofs of parts (b)-(d) as exercises.
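A numerical spot check is no substitute for the proofs, but it can build confidence in the four statements. The sketch below tests Theorem 3.4.1 on randomly generated relations over small sets; the helper names, the set sizes, and the density 0.4 are arbitrary assumptions.

```python
import random

def compose(r1, r2):
    """Composite relation r1 r2 on sets of ordered pairs."""
    return {(a, c) for (a, b) in r1 for (b2, c) in r2 if b == b2}

def random_relation(xs, ys, rng):
    """A random subset of xs x ys; each pair is kept with probability 0.4."""
    return {(x, y) for x in xs for y in ys if rng.random() < 0.4}

def check_theorem_3_4_1(trials=200, seed=0):
    rng = random.Random(seed)
    A, B, C, D = range(3), range(3), range(3), range(3)
    for _ in range(trials):
        R1 = random_relation(A, B, rng)
        R2 = random_relation(B, C, rng)
        R3 = random_relation(B, C, rng)
        R4 = random_relation(C, D, rng)
        ok_a = compose(R1, R2 | R3) == compose(R1, R2) | compose(R1, R3)
        ok_b = compose(R1, R2 & R3) <= compose(R1, R2) & compose(R1, R3)
        ok_c = compose(R2 | R3, R4) == compose(R2, R4) | compose(R3, R4)
        ok_d = compose(R2 & R3, R4) <= compose(R2, R4) & compose(R3, R4)
        if not (ok_a and ok_b and ok_c and ok_d):
            return False
    return True

print(check_theorem_3_4_1())
```

Here `|`, `&`, and `<=` are Python's set union, intersection, and subset operators.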

The operation of composition is clearly not commutative; in fact, R_2R_1 may
not be defined even though R_1R_2 is. The next theorem establishes that the
operation is associative.

Theorem 3.4.2: Let R_1, R_2 and R_3 be relations from A to B, B to C and C to
D respectively. Then (R_1R_2)R_3 = R_1(R_2R_3).
Proof: We first show (R_1R_2)R_3 ⊆ R_1(R_2R_3). Let <a, d> ∈ (R_1R_2)R_3. Then
for some c ∈ C, <a, c> ∈ R_1R_2 and <c, d> ∈ R_3. Furthermore, since <a, c> ∈
R_1R_2 there exists b ∈ B such that <a, b> ∈ R_1 and <b, c> ∈ R_2. Since <b, c> ∈ R_2
and <c, d> ∈ R_3, it follows that <b, d> ∈ R_2R_3 and therefore <a, d> ∈ R_1(R_2R_3).
The proof that R_1(R_2R_3) ⊆ (R_1R_2)R_3 is similar and is left to the reader.

The above proof is in two parts in order to show containment in both direc-
tions. The theorem can also be proved using a sequence of equivalences as fol-
lows:
Proof:
<a, d> ∈ (R_1R_2)R_3
⇔ ∃c[<a, c> ∈ R_1R_2 ∧ <c, d> ∈ R_3]
⇔ ∃c[∃b[<a, b> ∈ R_1 ∧ <b, c> ∈ R_2] ∧ <c, d> ∈ R_3]
⇔ ∃c∃b[[<a, b> ∈ R_1 ∧ <b, c> ∈ R_2] ∧ <c, d> ∈ R_3]
⇔ ∃c∃b[<a, b> ∈ R_1 ∧ [<b, c> ∈ R_2 ∧ <c, d> ∈ R_3]]
⇔ ∃b∃c[<a, b> ∈ R_1 ∧ [<b, c> ∈ R_2 ∧ <c, d> ∈ R_3]]
⇔ ∃b[<a, b> ∈ R_1 ∧ ∃c[<b, c> ∈ R_2 ∧ <c, d> ∈ R_3]]
⇔ ∃b[<a, b> ∈ R_1 ∧ <b, d> ∈ R_2R_3]
⇔ <a, d> ∈ R_1(R_2R_3).

Since composition is associative, we usually omit parentheses and write
R_1R_2R_3. As with other associative binary operations, placement of parentheses
is unimportant in specifying composite relations.
When R is a relation on a set A, then R can be composed with itself any number
of times to form a new relation on the set A. In this case, RR is often denoted
by R^2, RRR by R^3, etc. We can define this notation inductively as follows.

Definition 3.4.2: Let R be a binary relation on a set A and let n ∈ N. Then,
the nth power of R, denoted R^n, is defined as follows:
1. R^0 is the relation of equality on the set A; R^0 = {<x, x> | x ∈ A}.
2. R^{n+1} = R^n R.

Theorem 3.4.3: Let R be a binary relation on A, and let m and n be elements
of N. Then,
(a) R^m R^n = R^{m+n}
(b) (R^m)^n = R^{mn}
The proofs are by induction and are left as exercises.
If D is the digraph of a binary relation R on a set A, then <x, y> ∈ R^n if and
only if there is a path of length n from node x to node y.
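The inductive definition of R^n translates into a short loop. In the sketch below the names compose and power are illustrative, and the relation R is a hypothetical three-arc chain a → b → c → d chosen so that paths of each length are easy to read off; it is not the digraph of the example that follows.

```python
def compose(r1, r2):
    """Composite relation r1 r2 on sets of ordered pairs."""
    return {(a, c) for (a, b) in r1 for (b2, c) in r2 if b == b2}

def power(A, R, n):
    """R^n by Definition 3.4.2: R^0 is equality on A, R^(k+1) = R^k R."""
    result = {(x, x) for x in A}
    for _ in range(n):
        result = compose(result, R)
    return result

A = {"a", "b", "c", "d"}
R = {("a", "b"), ("b", "c"), ("c", "d")}   # hypothetical chain a -> b -> c -> d

for n in range(5):
    # <x, y> is in R^n exactly when a path of length n leads from x to y.
    print(n, sorted(power(A, R, n)))
```

For this chain the only path of length 3 runs from a to d, and no path of length 4 exists, so R^4 is empty.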

Examples
Let A = {a, b, c, d} and let R be the relation on A represented by the following
digraph:

[Figure: digraph of R and digraphs of its powers]

We note that R^4 = R^2 and hence R^4 R = R^2 R, that is, R^5 = R^3. Similarly
R^6 = R^4 = R^2. It follows (by induction) that R^{2n+1} = R^3 and R^{2n} = R^2 for n ≥ 1.
This relationship can be represented by the following digraph where each node
represents R^k for some k and an arc exists from X to Y if X·R = Y.

In the preceding example, not all powers of the relation R were distinct rela-
tions. In fact, if R is a relation on a finite set, this will always be the case, as the
following theorem asserts.

Theorem 3.4.4: If A is a finite set with n elements and R is a relation on A,
then there exist s and t such that R^s = R^t and 0 ≤ s < t ≤ 2^{n^2}.

Proof: Each binary relation on A is a subset of A × A. Since A × A has
n^2 elements, P(A × A) has 2^{n^2} elements. Hence, there are 2^{n^2} distinct relations on
A and therefore no more than 2^{n^2} distinct powers of R. But the list R^0, R^1, ..., R^{2^{n^2}}
has 2^{n^2} + 1 entries and hence at least two of these powers of R must be equal.

If R is a relation on an infinite set A, then there may not exist two integers
s and t such that R^s = R^t. For example, if A = N and <x, y> ∈ R ⇔ y = x + 1,
then <x, z> ∈ R^s ⇔ z = x + s; in this case, all powers of R are distinct relations
on A.

Theorem 3.4.5: Let R be a binary relation on a set A and suppose R^s = R^t
for some s and t with s < t. Let p = t − s. Then
(a) R^{s+k} = R^{t+k} for all k ≥ 0.
(b) R^{s+kp+i} = R^{s+i} for all k, i ≥ 0.
(c) Let S = {R^0, R^1, R^2, ..., R^{t−1}}. Then every power of R is an element
of S, i.e., R^q ∈ S for all q ∈ N.

Proof: Parts (a) and (b) require proofs by induction and are left as exercises.
(c) Let q ∈ N. If q < t, then R^q ∈ S by definition of S. Suppose q ≥ t.
Then we can express q in the form s + kp + i, where i < p. By part (b),
it follows that R^q = R^{s+i}. Since s + i < t, this establishes that R^q ∈ S.
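Theorems 3.4.4 and 3.4.5 suggest a simple procedure for a finite set: generate successive powers until one repeats, then use the repetition to reduce any larger exponent. The sketch below assumes relations stored as sets of pairs; the names first_repeat and reduced_exponent and the three-element relation are hypothetical.

```python
def compose(r1, r2):
    return {(a, c) for (a, b) in r1 for (b2, c) in r2 if b == b2}

def power(A, R, n):
    result = {(x, x) for x in A}          # R^0 is equality on A
    for _ in range(n):
        result = compose(result, R)
    return result

def first_repeat(A, R):
    """Return (s, t) with s < t, R^s = R^t, and t as small as possible."""
    seen = {}                             # maps each power seen so far to its exponent
    current = frozenset((x, x) for x in A)
    k = 0
    while current not in seen:
        seen[current] = k
        current = frozenset(compose(current, R))
        k += 1
    return seen[current], k

def reduced_exponent(q, s, t):
    """An exponent below t that gives the same power as q (Theorem 3.4.5)."""
    if q < t:
        return q
    p = t - s
    return s + (q - s) % p

A = {1, 2, 3}
R = {(1, 2), (2, 3), (3, 2)}              # hypothetical relation
s, t = first_repeat(A, R)
print(s, t, reduced_exponent(10, s, t))
```

The loop must terminate because, by Theorem 3.4.4, only finitely many distinct relations on A exist.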

Problems: Section 3.4

1. Let R, and R, be relations on a set A = {a, b, c, d} where

R_1 = {<a, a>, <a, b>, <b, d>}

R_2 = {<a, d>, <b, >, <b, d>, <c, b>}.


Find R;R2, R2R,, R?, Ri.

2. Let R be represented by the following tree.

Sketch the digraphs of the relations R^n for n ∈ N.

3. Let A = {a, b, c, d, e, f, g, h} and let R be the binary relation on A as represented by
the following two-component digraph.

[Figure: two-component digraph on the nodes a through h]

Find the smallest integers m and n such that m < n and R^m = R^n.

4. Prove that if R is either the empty relation or the universal relation on a set A, then
R^2 = R.
5. Prove or disprove:
(a) If D is the digraph of a relation R and D is connected, then the digraph of R^n is
connected for every n > 0.
(b) If D is the digraph of a relation R and D is strongly connected, then the digraph
of R^n is strongly connected for every n > 0.
6. (a) Prove part (b) of Theorem 3.4.1.
(b) Give examples to show that the containment of parts (b) and (d) may be proper.
7. Prove Theorem 3.4.3.
8. Prove parts (a) and (b) of Theorem 3.4.5.
9. Let R_1 and R_2 be arbitrary relations on a set A.
Prove or disprove the following assertions.
(a) If R_1 and R_2 are reflexive, then R_1R_2 is reflexive.
(b) If R_1 and R_2 are irreflexive, then R_1R_2 is irreflexive.
(c) If R_1 and R_2 are symmetric, then R_1R_2 is symmetric.
(d) If R_1 and R_2 are antisymmetric, then R_1R_2 is antisymmetric.
(e) If R_1 and R_2 are transitive, then R_1R_2 is transitive.

3.5 CLOSURE OPERATIONS ON RELATIONS

In the previous section we described the use of the operation of composition to


generate new relations from old ones. In this section we will show how to construct
a new relation R’ from a given relation R by requiring that R’ contain R and that
R’ have certain properties. The relation R’ will be formed by adding to R only those
ordered pairs necessary for the properties to hold. For example, consider a data
communications network with data paths between pairs of cities. If we wish to
send messages from A to B and there is no direct transmission facility from A to
B, then it may be possible to route the message through intermediate cities. The
question we address is a global one: For what pairs of cities A, B are there paths
from A to B? Most likely, the specification of the communications network is given
as a set of local connections, that is, the pairs of cities C and D such that there is
a direct transmission route from C to D. Thus, we are given one binary relation
R which specifies a local property (the arcs of a graph), but we are interested in
another binary relation R’ which concerns a global property (the paths of the
graph). The relation R’ is transitive and can be constructed from R. We will define
a class of “closure” operations for binary relations which will enable us to con-
struct R’ from R.

Definition 3.5.1: Let R be a binary relation on a set A. The reflexive
(symmetric, transitive) closure of R is the relation R’ such that
(i) R’ is reflexive (symmetric, transitive);
(ii) R’ ⊇ R;
(iii) For any reflexive (symmetric, transitive) relation R’’, if R’’ ⊇ R, then
R’’ ⊇ R’.
We will denote the reflexive closure of R by r(R), the symmetric closure by s(R)
and the transitive closure by t(R).

If R is a binary relation on set A, we can form its reflexive (symmetric, transitive)


closure by adding to the relation R all the ordered pairs which are needed to make
the new relation reflexive (symmetric, transitive). But part (iii) of Definition 3.5.1
stipulates that no pairs shall be added unless necessary. Thus R’ is the smallest
relation such that R’ is reflexive (symmetric, transitive) and R’ ⊇ R. If R is already
reflexive (or symmetric, or transitive), then the smallest relation which has this
property and contains R is R itself. This is implied by the following theorem.

Theorem 3.5.1: Let R be a binary relation on a set A. Then


(a) R is reflexive if and only if r(R) = R.
(b) R is symmetric if and only if s(R) = R.
(c) R is transitive if and only if t(R) = R.

Proof:
(a) If R is reflexive, then R has all the properties given in Definition 3.5.1 for
the relation R’. Hence r(R) = R. Conversely, if r(R) = R, then by
property (i) of Definition 3.5.1, R is reflexive.
The proofs of (b) and (c) are similar.

Forming the closure of a binary relation is conveniently viewed in terms of


digraphs. For example, a digraph represents a reflexive relation if and only if it
has loops on every node. Thus, if D is the digraph of a binary relation R on a set
A, we can form the digraph of the reflexive closure of R, r(R), by adding a loop to
every node of the digraph D which does not already have one.
The following theorem is another form of this assertion. (In the remainder of
this section, E will denote the equality relation on an arbitrary set A; that is,
E = {<x, x>|x € A}.)

Theorem 3.5.2: Let R be a binary relation on a set A. Then r(R) = R ∪ E.

Proof: Let R’ = R ∪ E. We show that R’ satisfies Definition 3.5.1. By
construction, R’ is reflexive and R’ ⊇ R. Suppose R’’ is a reflexive relation on A
and R’’ ⊇ R. We must show R’’ ⊇ R’. Consider an arbitrary <a, b> ∈ R’. Then,
since R’ = R ∪ E, either a = b or <a, b> ∈ R. If a = b, then <a, b> ∈ R’’ since
R’’ is reflexive. If <a, b> ∈ R, then <a, b> ∈ R’’ since R’’ ⊇ R. Thus, if <a, b> ∈ R’,
then <a, b> ∈ R’’. Consequently, the conditions of Definition 3.5.1 are satisfied
and R’ = r(R).

Examples
(a) The reflexive closure of the relation < on the integers I is ≤.
(b) The reflexive closure of E is E.
(c) The reflexive closure of ≠ is the universal relation.
(d) The reflexive closure of the empty relation is the relation of equality, E.

The concept of the converse of a relation will be useful in discussing sym-


metric closure.

Definition 3.5.2: Let R be a binary relation from A to B. The converse of
the relation R, denoted R^c, is the binary relation from B to A defined as follows:
R^c = {<y, x> | <x, y> ∈ R}.
If D is the digraph of the relation R, the digraph of R^c can be constructed from
D by reversing the direction of all the arcs of D.

Examples
(a) The converse of the relation < on I is the relation >.
(b) The converse of the relation ⊆ on a collection of sets is the relation ⊇.

The following theorem states some of the properties of converses.


Theorem 3.5.3: Let R, R_1, and R_2 be binary relations from A to B. Then each
of the following holds.
(a) (R^c)^c = R
(b) (R_1 ∪ R_2)^c = R_1^c ∪ R_2^c
(c) (R_1 ∩ R_2)^c = R_1^c ∩ R_2^c
(d) (A × B)^c = B × A
(e) ∅^c = ∅
(f) (\overline{R})^c = \overline{R^c}, where \overline{R} denotes (A × B) − R.
(g) (R_1 − R_2)^c = R_1^c − R_2^c
(h) If A = B, then (R_1R_2)^c = R_2^c R_1^c
(i) R_1 ⊆ R_2 ⇒ R_1^c ⊆ R_2^c
Proof:
(a) ((R^c)^c = R.) Let <x, y> be an arbitrary element of R. Then, <x, y> ∈ R
⇔ <y, x> ∈ R^c ⇔ <x, y> ∈ (R^c)^c; therefore (R^c)^c = R.
(b) ((R_1 ∪ R_2)^c = R_1^c ∪ R_2^c.)
<x, y> ∈ (R_1 ∪ R_2)^c ⇔ <y, x> ∈ R_1 ∪ R_2
⇔ <y, x> ∈ R_1 ∨ <y, x> ∈ R_2
⇔ <x, y> ∈ R_1^c ∨ <x, y> ∈ R_2^c
⇔ <x, y> ∈ R_1^c ∪ R_2^c.
(f) ((\overline{R})^c = \overline{R^c}.)
<x, y> ∈ (\overline{R})^c ⇔ <y, x> ∈ \overline{R}
⇔ <y, x> ∉ R
⇔ <x, y> ∉ R^c
⇔ <x, y> ∈ \overline{R^c}.
(g) ((R_1 − R_2)^c = R_1^c − R_2^c.) Using the identity R_1 − R_2 = R_1 ∩ \overline{R_2}, we
have (R_1 − R_2)^c = (R_1 ∩ \overline{R_2})^c = R_1^c ∩ (\overline{R_2})^c
= R_1^c ∩ \overline{R_2^c}
= R_1^c − R_2^c.
The proofs of the remaining parts are left as exercises.
Theorem 3.5.4: Let R be a binary relation on A. Then R is symmetric if and
only if R = R^c.
We leave the proof as an exercise.

The converse of a relation R is closely related to s(R), the symmetric closure
of R. Let R be a binary relation on a set A and let D be the digraph associated with
R. The digraph of the symmetric closure of R can be obtained from D by making
all the arcs of D into “two-way” edges so that if there is an arc from a to b, then
there is also one from b to a. Expressed in terms of R^c, this becomes the following
theorem.

Theorem 3.5.5: Let R be a relation on a set A. Then s(R) = R ∪ R^c.

Proof: We must show that R ∪ R^c is the smallest symmetric relation which
contains R. We first observe that R ∪ R^c contains R. Furthermore, by Theorem
3.5.3, R ∪ R^c is symmetric since (R ∪ R^c)^c = R^c ∪ (R^c)^c = R^c ∪ R. Now
suppose R’ is symmetric and R’ ⊇ R. We must show R’ ⊇ R ∪ R^c. Let
<a, b> ∈ R ∪ R^c. If <a, b> ∈ R, then <a, b> ∈ R’ by hypothesis. If <a, b> ∈ R^c,
then <b, a> ∈ R and therefore <b, a> ∈ R’. But R’ is symmetric and therefore
<a, b> ∈ R’. It follows that R ∪ R^c ⊆ R’.

Examples
(a) The symmetric closure of the relation < on the integers I is the relation ≠,
or \overline{E}.
(b) The symmetric closure of ≤ on the integers I is the universal relation.
(c) The symmetric closure of E is E, and of ≠ is ≠.
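Theorems 3.5.2 and 3.5.5 give both closures as a single set union, which the following sketch mirrors. Since the integers cannot be enumerated, the relation < is restricted here to the finite set {−2, ..., 2}, an assumption made only so that the results can be computed; the function names are illustrative.

```python
def converse(R):
    """R^c: reverse every ordered pair of R."""
    return {(y, x) for (x, y) in R}

def reflexive_closure(A, R):
    """r(R) = R ∪ E, where E is the equality relation on A."""
    return set(R) | {(x, x) for x in A}

def symmetric_closure(R):
    """s(R) = R ∪ R^c."""
    return set(R) | converse(R)

A = set(range(-2, 3))                       # finite stand-in for the integers
less = {(x, y) for x in A for y in A if x < y}

less_or_equal = reflexive_closure(A, less)  # the relation ≤ on A
not_equal = symmetric_closure(less)         # the relation ≠ on A
print(len(less), len(less_or_equal), len(not_equal))
```

Only the reflexive closure needs the set A, because the pairs <x, x> to be added are not determined by R alone.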

If D is the digraph associated with a binary relation R on a set A, the
transitive closure of R, t(R), corresponds to the digraph D’ where D’ has an arc from
a to b if D has a path of nonzero length from a to b. The next theorem restates this
assertion in terms of powers of R.

Theorem 3.5.6: Let R be a binary relation on the set A. Then
t(R) = ⋃_{i=1}^{∞} R^i = R ∪ R^2 ∪ R^3 ∪ ...
Proof: The proof is in two parts.
(i) ⋃_{i=1}^{∞} R^i ⊆ t(R). We first show by induction that R^n ⊆ t(R) for every
n > 0.
1. (Basis) From Definition 3.5.1, part (ii), it is immediate that R ⊆ t(R).
2. (Induction) Suppose R^n ⊆ t(R), n ≥ 1, and let <a, b> ∈ R^{n+1}. Since
R^{n+1} = R^n R, there exists some c ∈ A such that <a, c> ∈ R^n and
<c, b> ∈ R. By the induction hypothesis and the basis step,
<a, c> ∈ t(R) and <c, b> ∈ t(R). Because t(R) is transitive it follows
that <a, b> ∈ t(R), thus establishing that R^{n+1} ⊆ t(R).
Since R^n ⊆ t(R) for all n ≥ 1, we conclude that ⋃_{i=1}^{∞} R^i ⊆ t(R).
(ii) t(R) ⊆ ⋃_{i=1}^{∞} R^i. We first show that ⋃_{i=1}^{∞} R^i is transitive. Let <a, b>
and <b, c> be arbitrary elements of ⋃_{i=1}^{∞} R^i. Then for some integers
s ≥ 1 and t ≥ 1, <a, b> ∈ R^s and <b, c> ∈ R^t. Then <a, c> ∈ R^s R^t, and
by Theorem 3.4.3, R^s R^t = R^{s+t}. Thus <a, c> ∈ ⋃_{i=1}^{∞} R^i and therefore
⋃_{i=1}^{∞} R^i is transitive. Since t(R) is contained in every transitive relation
which contains R, it follows that t(R) ⊆ ⋃_{i=1}^{∞} R^i.

If R is a binary relation on A, then <a, b> ∈ t(R) if and only if there is a
sequence of elements of A, c_0, c_1, ..., c_n, where n ≥ 1, c_0 = a, c_n = b and for
0 ≤ i < n, <c_i, c_{i+1}> ∈ R. If D is the digraph of R, then <a, b> ∈ t(R) if and only
if there is a path of nonzero length from node a to node b.

Examples
(a) Let R be a relation on I such that aRb if and only if b = a + 1. Then t(R) is
the relation <.
(b) Let R be the relation “is the child of.” Then t(R) is the relation “is the
descendant of.”
(c) Let R be the relation < on a set of integers A. Sorting the elements of A
according to R requires finding the smallest relation R’ on A such that R = t(R’).

When A is finite with n elements, it follows from Theorems 3.4.4 and 3.4.5 that

t(R) = ⋃_{i=1}^{2^{n^2}} R^i.

The following theorem establishes a smaller bound on the number of powers of R
required to form t(R).

Theorem 3.5.7: Let R be a binary relation on a set A where A has n elements.
Then
t(R) = ⋃_{i=1}^{n} R^i.
Proof: It suffices to show that R^k ⊆ ⋃_{i=1}^{n} R^i for all k > 0. Suppose
<x, y> ∈ R^k. Then there is a directed path of length k from x to y in the digraph
<A, R>, and by deleting cycles from this path we can construct a simple directed
path from x to y. Since the longest possible simple path in a graph with n nodes is
of length n, it follows that <x, y> ∈ R^i for some 0 < i ≤ n. Hence R^k ⊆ ⋃_{i=1}^{n} R^i
for k > 0.
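Theorem 3.5.7 bounds the number of powers needed, so t(R) on a finite set can be computed by accumulating R^1 through R^n. The sketch below follows that formula literally; the function names and the four-node chain are hypothetical, and faster methods exist but are not shown here.

```python
def compose(r1, r2):
    return {(a, c) for (a, b) in r1 for (b2, c) in r2 if b == b2}

def transitive_closure(A, R):
    """t(R) as the union of R^1, ..., R^n with n = |A| (Theorem 3.5.7)."""
    closure = set()
    current_power = set(R)                 # starts at R^1
    for _ in range(len(A)):
        closure |= current_power
        current_power = compose(current_power, R)
    return closure

A = {"a", "b", "c", "d"}
R = {("a", "b"), ("b", "c"), ("c", "d")}   # hypothetical chain a -> b -> c -> d
print(sorted(transitive_closure(A, R)))
```

Each pair in the result corresponds to a path of nonzero length in the digraph of R.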

Example
Consider the following digraph representing a relation R.

[Figure: digraph of R on the nodes a, b, c, d]

Then, t(R) = R ∪ R^2 ∪ R^3 ∪ R^4 is represented by the following digraph.

The following theorems develop some additional characteristics of the closure


operations.

Theorem 3.5.8:
(a) If R is reflexive, then s(R) and t(R) are reflexive.
(b) If R is symmetric, then r(R) and t(R) are symmetric.
(c) If R is transitive, then r(R) is transitive.

The proofs of all parts are straightforward and are left as exercises.

Theorem 3.5.9: Let R be a binary relation on a set A. Then
(a) rs(R) = sr(R),
(b) rt(R) = tr(R),
(c) ts(R) ⊇ st(R).
Proof: Let E denote the equality relation on A.
(a) sr(R) = s(R ∪ E) = (R ∪ E) ∪ (R ∪ E)^c
= R ∪ E ∪ R^c ∪ E^c = R ∪ R^c ∪ E = r(R ∪ R^c) = rs(R).
(b) We first note that tr(R) = t(R ∪ E) and rt(R) = t(R) ∪ E.
Using the fact that ER = RE = R and that E^n = E for all n ∈ N, it
follows that (R ∪ E)^n = E ∪ ⋃_{i=1}^{n} R^i.
Therefore,
tr(R) = t(R ∪ E) = ⋃_{i=1}^{∞} (R ∪ E)^i
= (R ∪ E) ∪ (R ∪ E)^2 ∪ (R ∪ E)^3 ∪ ...
= E ∪ R ∪ R^2 ∪ R^3 ∪ ...
= E ∪ t(R)
= rt(R).
(c) We use the property that if R_1 ⊇ R_2, then s(R_1) ⊇ s(R_2) and t(R_1) ⊇ t(R_2).
By definition of the symmetric closure, s(R) ⊇ R. By successively forming
the transitive and then symmetric closure of both sides, we find
ts(R) ⊇ t(R) and sts(R) ⊇ st(R). But ts(R) is symmetric by Theorem
3.5.8, so sts(R) = ts(R). Hence ts(R) ⊇ st(R).

Example
The relation < on the set of integers I can be used to show that in general
st(R) ≠ ts(R). For st(<) = s(<) = ≠ (i.e., st(<) is the inequality relation), while
ts(<) = t(≠) = I × I (i.e., ts(<) is the universal relation).
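The three parts of Theorem 3.5.9 can be checked on a small case. In the sketch below the closure functions r, s, and t follow Theorems 3.5.2, 3.5.5, and 3.5.7, a composite such as rs(R) is computed as r(s(R)), and the relation R on {1, 2, 3} is a hypothetical example.

```python
def compose(r1, r2):
    return {(a, c) for (a, b) in r1 for (b2, c) in r2 if b == b2}

def r(A, R):
    """Reflexive closure: R ∪ E."""
    return set(R) | {(x, x) for x in A}

def s(R):
    """Symmetric closure: R ∪ R^c."""
    return set(R) | {(y, x) for (x, y) in R}

def t(A, R):
    """Transitive closure: union of R^1, ..., R^n with n = |A|."""
    closure, current_power = set(), set(R)
    for _ in range(len(A)):
        closure |= current_power
        current_power = compose(current_power, R)
    return closure

A = {1, 2, 3}
R = {(1, 2), (2, 3)}                      # hypothetical relation

rs, sr = r(A, s(R)), s(r(A, R))
rt, tr = r(A, t(A, R)), t(A, r(A, R))
ts, st = t(A, s(R)), s(t(A, R))

print(rs == sr, rt == tr, ts >= st)
```

The operator `>=` on Python sets tests the containment ⊇ asserted in part (c).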

The transitive closure and reflexive transitive closure operations are used in
several application areas. The “plus” and “star” notations are used to denote
these closure operations in a way analogous to the use of A^+ and A^* to denote
closure operations on a language A.

Definition 3.5.3: If R is a binary relation on a set A, then R^+ (read “R plus”)
denotes t(R), the transitive closure of R, and R^* (read “R star”) denotes tr(R), the
reflexive transitive closure of R.

The plus and star closure operations are often used in studying formal lan-
guages and models of computation as well as application areas such as compiler
design.

Example
Let P = {P_1, P_2, ..., P_n} be the set of programs and subroutines in a
program library. Define the binary relation ⇒ over P as follows:
P_i ⇒ P_j if and only if P_i calls P_j for some input.
The relation ⇒^+ characterizes all programs which might be called during the
execution of a program:
P_i ⇒^+ P_j if and only if execution of P_i may cause P_j to be called.
The relation ⇒^* characterizes all programs which might be active at some point
during the execution of a program:
P_i ⇒^* P_j if and only if P_j might be active at some time during the execution of
P_i.
Note that P_i ⇒^* P_i for all i, but P_i ⇒^+ P_i only if P_i can cause itself to be called,
i.e., only if P_i is recursive.
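The plus and star relations of this example can be computed from a table of direct calls. The sketch below uses a hypothetical four-routine library (main, parse, lex, evaluate); the names and the call structure are invented for illustration.

```python
def compose(r1, r2):
    return {(a, c) for (a, b) in r1 for (b2, c) in r2 if b == b2}

def transitive_closure(A, R):
    closure, current_power = set(), set(R)
    for _ in range(len(A)):
        closure |= current_power
        current_power = compose(current_power, R)
    return closure

# Hypothetical library: each pair (p, q) means "p calls q for some input".
calls = {
    ("main", "parse"),
    ("main", "evaluate"),
    ("parse", "lex"),
    ("evaluate", "evaluate"),
}
programs = {p for pair in calls for p in pair}

plus = transitive_closure(programs, calls)        # may be called during execution
star = plus | {(p, p) for p in programs}          # may be active during execution
recursive = {p for p in programs if (p, p) in plus}

print(sorted(plus))
print(sorted(recursive))
```

A routine is reported as recursive exactly when it is related to itself under the plus relation, as noted at the end of the example.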

Problems: Section 3.5

1. Find the reflexive, symmetric, and transitive closures of each of the following.

[Figure: digraphs (a), (b), and (c)]

2. Prove the remaining parts of Theorem 3.5.3.
3. Prove Theorem 3.5.4.
4. Let R_1 and R_2 be relations on a set A and suppose R_1 ⊇ R_2. Prove each of the
following.
(a) r(R_1) ⊇ r(R_2)
(b) s(R_1) ⊇ s(R_2)
(c) t(R_1) ⊇ t(R_2)
5. Let R_1 and R_2 be relations on A. Prove each of the following.
(a) r(R_1 ∪ R_2) = r(R_1) ∪ r(R_2)
(b) s(R_1 ∪ R_2) = s(R_1) ∪ s(R_2)
(c) t(R_1 ∪ R_2) ⊇ t(R_1) ∪ t(R_2).
Show by counterexample that
(d) t(R_1 ∪ R_2) ≠ t(R_1) ∪ t(R_2).
6. Find a set A with n elements and a relation R on A such that R^1, R^2, ..., R^n are all
distinct. This establishes that the bound given in Theorem 3.5.7 is attainable.
7. Prove Theorem 3.5.8.
8. Let A = {a, b, c, d, e, f, g, h} and let R be the binary relation on A as represented by
the following digraph.

[Figure: digraph of R on the nodes a through h]

(a) Construct the digraph <A, t(R)>.
(b) Find tsr(R).
9. (a) Show by counterexample that the statement “If R is transitive, then s(R) is
transitive” is false.
(b) Find an example to show that st(R) and ts(R) may not be equal, even if R is
a relation on a finite set.
10. Let R be an arbitrary relation on a set A. Prove each of the following.
(a) (R^+)^+ = R^+
(b) RR^* = R^+ = R^*R
(c) (R^*)^* = R^*
11. Let S = {S_1, S_2, ..., S_n} be a set of procedures. Define the relation ⇒ on S as
follows:
S_i ⇒ S_j ⇔ S_i calls S_j.
Some procedure of the set S is recursive if the digraph <S, ⇒^+> contains a directed
cycle of nonzero length. Let S = {A, B, C, D, E} and suppose
A calls B and E,
B calls C,
C calls E,
D calls C, and
E calls B.
Does the set S contain any recursive procedures?
12. Let A = {a}* = {a^n | n ≥ 0}, and B be the singleton set B = {z} where z is an infinite
string of a’s: B = {aaaa...}. Let R be the relation on A ∪ B defined as follows:
<x, y> ∈ R ⇔ y = xa.
Prove or disprove that <Λ, z> ∈ R^+.
13. Let A = {a_1, a_2, ..., a_n} be a set with n elements and let R’ and R’’ be binary
relations on A. The incidence matrix M’ of the relation R’ is the n × n matrix defined as
follows:
M’[i, j] = 1 ⇔ a_i R’ a_j,
= 0 otherwise.
The matrix M’’ is defined in the analogous way.
Let the operations of matrix addition and multiplication be defined in the usual
way but using the following operations on matrix entries:
0 = 0·x = x·0 = 0 + 0 and 1 = 1 + x = x + 1 = 1·1, where x = 0 or x = 1.

(a) Find the incidence matrix for R’ ∪ R’’ in terms of M’, M’’, and the operations
of matrix addition and multiplication.
(b) Find the incidence matrix for R’R”.
Let M be the incidence matrix for R.
(c) Find the incidence matrix for R*.
(d) Find the incidence matrices for R^+ and R^*.
(e) Find the smallest relation R on the set {a, b, c}, for which the incidence matrix
for R^+ is
1 1 1
1 1 1
1 1 1
14. (For students with an understanding of elementary probability.) Consider the
following four dice, which we will call A, B, C, and D.

[Figure: the four dice A, B, C, and D]

If two dice x and y are chosen and rolled, we say “x beats y” if a higher number
shows on x than y.
(a) For each pair of dice x and y, calculate the probability that x beats y. Present
your results as a two-dimensional array whose entries are probabilities.
Let R denote the binary relation “is more likely to win than” on the set {A, B, C, D}
where R is defined as follows:
xRy ⇔ the probability that x beats y is greater than 1/2.
(b) Give the digraph associated with the relation R.
(c) Find the transitive closure of R.
(d) Is the relation R transitive?
(e) Suppose someone proposes the following game. You may choose whichever
die you like from the set {A, B, C, D}. After your selection, your opponent will
select a die from the remaining three dice. You then roll the two dice; the
winner is the person whose die beats the other. The loser pays the winner $1.
Assuming your moral character is such that this proposal does not make your
skin crawl, would you accept, and why?

Programming Problem

Write a program which, when given a set of integers S and a relation R on S
(specified as a set of ordered pairs), produces r(R), s(R) and t(R).

3.6 ORDER RELATIONS

An order relation is a transitive relation on a set which provides a means to com-


pare elements of the set, although such a relation may not permit a comparison
of any two elements of the set. We will consider several types of order relations
in this section.

Definition 3.6.1: A binary relation R on a set A is a partial order if R is


reflexive, antisymmetric, and transitive. The ordered pair <A,R> is a partially
ordered set, or a poset. The relation R is said to be a partial order on A.

It follows from the preceding definition that a partially ordered set is also
a digraph whose relation is a partial order on the set of nodes. We will use the
symbol ≤ to denote an arbitrary partial order; thus, if R is an unspecified partial
order, we will usually write either a ≤ b or b ≥ a rather than aRb.

Examples
(a) The relation of set containment is a partial order on any collection of subsets
of a set A; that is, ⊆ is a partial order on P(A) and <P(A), ⊆> is a partially
ordered set.
(b) The relation ≤ is a partial order relation on the set of integers.
(c) Let B = {b_1, b_2, ..., b_n} be the set of blocks in a program in a block-structured
language such as ALGOL or PL/I. For all i and j, define b_i ≤ b_j if b_i
is contained in b_j. Then <B, ≤> is a poset.
(d) The relation < is not a partial order on I because it is not reflexive.

The diagrams we have described for digraphs can be used for partially ordered
sets as well. However, posets are traditionally represented in a more economical
way by poset (or Hasse) diagrams. These diagrams do not explicitly represent all
ordered pairs of the partial order. The edges of a poset diagram for the relation R
represent the smallest relation R’ such that (R’)^* = R. Thus, on a poset diagram
all loops are omitted, eliminating explicit representation of the reflexive property.
Furthermore, an arc is not present in a poset diagram if it is implied by the
transitivity of the relation. That is, there is an arc from a to b only if there is no other
element c such that a ≤ c and c ≤ b. Finally, the antisymmetry of a partial order
implies that the only directed cycles in a digraph representation of a poset are the
node loops. By convention, poset diagrams are drawn so that all arcs point upward
and arrowheads are not used. Poset diagrams are more easily grasped than digraph
representations of posets, and we will use them freely.

Examples
(a) The following are alternate diagrammatic representations of a partial order
Rona set S = {a, b, c, d}.
Sec, 3.6
ORDER RELATIONS 165

A digraph of (S. R) A poset diagram of (S. R)

Note that if the edges of the diagram on the right are directed upward and
the reflexive transitive closure is formed, the result is the digraph on the left.
(b) Consider the binary relation “divides” defined on a set of nonzero integers,
where a divides b if and only if b is an integer multiple of a. If we choose the
set of positive integers from 1 to 12, the resulting poset is represented by the
following diagram.
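
The edges of a poset diagram can be computed directly from the rule stated earlier: draw an arc from a to b only when no third element lies between them. The following is a minimal sketch for "divides" on 1 to 12; the names `covering_pairs` and `divides` are illustrative and not from the text.

```python
def covering_pairs(elements, leq):
    """Return the arcs of a poset diagram for the partial order leq.

    A pair (a, b) is kept when a != b, a <= b, and no third element c
    satisfies a <= c <= b.
    """
    elements = list(elements)
    edges = []
    for a in elements:
        for b in elements:
            if a == b or not leq(a, b):
                continue
            has_between = any(
                c != a and c != b and leq(a, c) and leq(c, b)
                for c in elements
            )
            if not has_between:
                edges.append((a, b))
    return edges


def divides(a, b):
    """True when b is an integer multiple of a (a must be nonzero)."""
    return b % a == 0


hasse_edges = covering_pairs(range(1, 13), divides)
print(sorted(hasse_edges))
```

Each printed pair corresponds to one upward arc of the diagram; loops and arcs implied by transitivity are absent.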

The concept of a partial order is closely related to the notion of a quasi order.

Definition 3.6.2: Let R be a binary relation on A. R is a quasi order if R is
transitive and irreflexive.

If R is a quasi order, then R is always antisymmetric because the premise of
the antisymmetry condition, xRy ∧ yRx ⇒ x = y, is always false. For suppose
xRy and yRx. Since R is transitive, it follows that xRx, which violates the irreflexive
property of R.

Examples
(a) The relation < is a quasi order on any set of real numbers.
(b) The relation “is a proper subset of” is a quasi order on any collection of sets.
(c) The relation “is a prerequisite for” is a quasi order on any set of college courses.
(d) The transitive closure of the relation “calls” is a quasi order on any collection
of nonrecursive programs and subroutines.
(e) PERT is a method of scheduling tasks to minimize the total time required for
completion of the tasks. Application of the method usually involves the con-
struction of a PERT chart which represents a quasi order on the collection
of tasks to be performed; xRy means that task y cannot be started until task x
is finished. #

The only distinction between quasi orders and partial orders is the equality
relation E. The proof of the following theorem is left as an exercise.

Theorem 3.6.1: Let R be a binary relation on A.
(a) If R is a quasi order, then r(R) = R ∪ E is a partial order.
(b) If R is a partial order, then R − E is a quasi order.
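
Theorem 3.6.1 can be checked mechanically on a small finite relation. The sketch below represents a relation as a Python set of pairs; the helper names and the sample set A are assumptions made for illustration, and a check on one example is of course not a proof.

```python
def is_reflexive(R, A):
    return all((a, a) in R for a in A)


def is_irreflexive(R, A):
    return all((a, a) not in R for a in A)


def is_antisymmetric(R):
    return all(a == b for (a, b) in R if (b, a) in R)


def is_transitive(R):
    return all((a, d) in R for (a, b) in R for (c, d) in R if b == c)


def is_quasi_order(R, A):
    return is_irreflexive(R, A) and is_transitive(R)


def is_partial_order(R, A):
    return is_reflexive(R, A) and is_antisymmetric(R) and is_transitive(R)


A = {1, 2, 3, 4, 6}
E = {(a, a) for a in A}  # the equality relation on A
# "properly divides" is transitive and irreflexive, hence a quasi order
properly_divides = {(a, b) for a in A for b in A if a != b and b % a == 0}

print(is_quasi_order(properly_divides, A))             # part (a) hypothesis
print(is_partial_order(properly_divides | E, A))       # part (a) conclusion
print(is_quasi_order((properly_divides | E) - E, A))   # part (b)
```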

Because of the similarity between quasi orders and partial orders, it is convenient
to use the same diagrams to represent both kinds of order relations. Thus,
a poset diagram for the partial order ≤ over a set of integers can also be used to
represent the quasi order < over the same set.
If ≤ is a partial order and either a ≤ b or b ≤ a, we say a and b are comparable.
If ≤ is a partial order on A such that every two elements of A are comparable, then
≤ is called a linear order.

Definition 3.6.3: A partial order ≤ on a set A is a linear (or simple, or total)
order if either a ≤ b or b ≤ a for every a, b ∈ A. If ≤ is a linear order on A, then
the ordered pair <A, ≤> is a linearly ordered set, or chain.

If A is a finite set, we can construct a linear order over the elements of A by
listing the elements of A and specifying that a ≤ b if and only if a = b or a precedes b on
the list; thus every finite set can be linearly ordered. The poset diagram of a linearly
ordered set is simply a vertical sequence of nodes with an arc connecting each
pair of adjacent nodes.

Examples
(a) The linearly ordered set

<{1, 2, 3}, {<1, 1>, <2, 2>, <3, 3>, <1, 2>, <1, 3>, <2, 3>}>
is represented by
3
2
1
(b) The linearly ordered set <I, ≤> can be represented by the following (incomplete)
diagram.

[Figure: incomplete poset diagram of <I, ≤>, a vertical chain of nodes that continues without end in both directions]
(c) Consider the universe of real numbers R. For every real number a, let
S_a = {x | 0 ≤ x < a}, and let S be the collection {S_a | a > 0}. If a ≤ b, then
S_a ⊆ S_b, and consequently <S, ⊆> is a linearly ordered set.
(d) If A is a set with more than one element, then <P(A), ⊆> is not a linearly
ordered set. #

Sometimes a subset of a partially ordered set contains distinguished elements
which are “greater” or “less” than all other elements of the subset. The following
definition characterizes these elements.

Definition 3.6.4: Let <A, ≤> be a poset and B a subset of A.
(a) An element b ∈ B is a greatest element of B if for every element b’ ∈ B,
b’ ≤ b.
(b) An element b ∈ B is a least element of B if for every element b’ ∈ B,
b ≤ b’.

Example
Consider the poset <P({a, b}), ⊆> represented by the following diagram:

[Poset diagram: {a, b} at the top, {a} and {b} side by side below it, ∅ at the bottom]

(a) If B = {{a}}, then {a} is a least and greatest element of B.
(b) If B = {{a}, {b}}, then B has no least or greatest element since {a} and {b} are
not comparable.
(c) If B = {{a}, ∅}, then {a} is a greatest element of B and ∅ is a least element.
#
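
Definition 3.6.4 translates directly into a search over B. The sketch below reproduces the three cases of the example, using Python frozensets for the elements of P({a, b}); the function names are illustrative, and `None` is used here to signal that no such element exists.

```python
def greatest(B, leq):
    """Return the greatest element of B under leq, or None if there is none."""
    for b in B:
        if all(leq(x, b) for x in B):
            return b
    return None


def least(B, leq):
    """Return the least element of B under leq, or None if there is none."""
    for b in B:
        if all(leq(b, x) for x in B):
            return b
    return None


def subset(x, y):
    """Set inclusion, the partial order on P({a, b})."""
    return x <= y


set_a, set_b, empty = frozenset("a"), frozenset("b"), frozenset()

print(greatest([set_a], subset), least([set_a], subset))                # case (a)
print(greatest([set_a, set_b], subset), least([set_a, set_b], subset))  # case (b)
print(greatest([set_a, empty], subset), least([set_a, empty], subset))  # case (c)
```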
Theorem 3.6.2: Let <A, ≤> be a poset and B ⊆ A. If a and b are greatest
(least) elements of B, then a = b.
Proof: Suppose a and b are both greatest elements of B. Then a ≤ b and
b ≤ a. It follows from the antisymmetry of ≤ that a = b. The proof for the case
when a and b are least elements of B is similar. ∎

Definition 3.6.5: A binary relation R on A is a well order if R is a linear
order and every nonempty subset of A has a least element. The ordered pair <A, R>
is called a well ordered set, and R is a well ordering of A.

Theorem 3.6.3: <N, ≤> is well ordered.

Proof: We must show that every nonempty subset of N has a least element
under the relation ≤. The proof is by contradiction. We will assume there exists
a subset of N, say S, such that S does not have a least element. We will then conclude
that S = ∅. To show S = ∅, we will use induction to prove that every element
of S is at least as great as any natural number, i.e.,
∀n∀x[x ∈ S ⇒ n ≤ x].
Since no natural number is greater than or equal to every integer in N, it will follow
that x ∈ S is false; i.e., S = ∅.
1. (Basis) ∀x[x ∈ S ⇒ 0 ≤ x]. This follows immediately from the fact that
S ⊆ N.
2. (Induction) Assume ∀x[x ∈ S ⇒ n ≤ x] is true for an arbitrary n. It
cannot happen that n ∈ S since that would violate the assumption that S
has no least element. Therefore, it follows that ∀x[x ∈ S ⇒ n < x] is
true. We conclude that ∀x[x ∈ S ⇒ n + 1 ≤ x] is true. This establishes
the inductive step and we conclude that if S has no least element, then
S = ∅. ∎
Examples
(a) Every finite linearly ordered set is well ordered.
(b) The pair <I, ≤> is not a well ordered set because some subsets of I (such as I
itself) do not contain a least element.
(c) The relation ≤ is a linear ordering of the real numbers R, but not a well
ordering. For example, we can show by the following argument that the subset
consisting of the positive real numbers does not have a least element.
Assume x is a least element of the set of positive real numbers. Since x is
positive, x/2 is also positive. Yet x/2 ≤ x and they are not equal. This contradicts
the assumption that x is a least element of the set of positive real numbers
under the order relation ≤. #

The well ordering of N by ≤ can be used to construct a well ordering R of
a set S if we can associate each element of S with a unique element of N. The
induced well ordering on S is defined as follows: if a, b ∈ S and a is paired with
n_a and b with n_b, then aRb ⇔ n_a ≤ n_b.

Example
A well order for the set of integers, I, can be constructed by listing the elements
of N in ascending order and then pairing the elements of I with those in N as fol-
lows:
N:  0   1   2   3   4   5   6
    ↕   ↕   ↕   ↕   ↕   ↕   ↕
I:  0  −1   1  −2   2  −3   3
The relation R implied by the above pairing is
aRb ⇔ |a| < |b| ∨ (|a| = |b| ∧ a ≤ b) #
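
The pairing in this example can be written as an explicit function from I to N, after which the induced relation R is just a comparison of the paired natural numbers. The sketch below assumes the listing 0, −1, 1, −2, 2, ... shown above; the names `position` and `induced_R` are illustrative.

```python
def position(a):
    """Natural number paired with the integer a in the listing 0, -1, 1, -2, 2, ..."""
    return 2 * a if a >= 0 else -2 * a - 1


def induced_R(a, b):
    """The relation stated in the example: |a| < |b|, or |a| = |b| and a <= b."""
    return abs(a) < abs(b) or (abs(a) == abs(b) and a <= b)


integers = list(range(-3, 4))
print(sorted(integers, key=position))

# The closed form agrees with comparing paired natural numbers on this sample.
print(all(induced_R(a, b) == (position(a) <= position(b))
          for a in integers for b in integers))
```

Sorting by `position` lists the integers in the order of the well ordering, so the least element of any finite sample is simply the first item of the sorted list.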
We are often interested in the set of integer n-tuples I^n and the set of n-tuples
of natural numbers N^n. The linear ordering ≤ on I or N can be used to induce a
linear ordering on these sets. For example, if n = 2, we can define the ordering on
either I^2 or N^2 as follows:
<a, b> ≤ <c, d> ⇔ a < c ∨ (a = c ∧ b ≤ d).
The relation of “strictly less than” can be defined as

<a, b> < <c, d> ⇔ (<a, b> ≤ <c, d> ∧ <a, b> ≠ <c, d>).
Note that the set <N^2, ≤> is well ordered, but <I^2, ≤> is not.
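
The ordering on pairs defined above is the one Python uses when comparing tuples, which gives a quick way to experiment with it. The sketch below checks that agreement on a small sample and builds the start of a strictly descending chain in I^2, the kind of chain that cannot continue forever in a well ordered set. The names are illustrative.

```python
def pair_leq(p, q):
    """<a, b> <= <c, d>  iff  a < c or (a = c and b <= d)."""
    (a, b), (c, d) = p, q
    return a < c or (a == c and b <= d)


def pair_less(p, q):
    """Strictly less than: p <= q and p != q."""
    return pair_leq(p, q) and p != q


sample = [(a, b) for a in range(-2, 3) for b in range(-2, 3)]
agrees_with_tuples = all(pair_leq(p, q) == (p <= q) for p in sample for q in sample)
print(agrees_with_tuples)

# In I^2 the pairs <0, 0>, <0, -1>, <0, -2>, ... descend without end,
# so the set of all such pairs has no least element.
descending = [(0, -k) for k in range(5)]
print(all(pair_less(descending[i + 1], descending[i]) for i in range(4)))
```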
If a linear order is imposed on the symbols of a finite alphabet Σ, then this
alphabetic ordering can be used to induce two distinct linear orderings of the
elements of Σ*.

Definition 3.6.6: Let Σ be a finite alphabet with an associated alphabetic
(linear) order. If x, y ∈ Σ*, then x ≤ y in the lexicographic ordering of Σ* if
(i) x is a prefix of y, or
(ii) x = zu and y = zv, where z ∈ Σ* is the longest prefix common to x
and y, and the first symbol of u precedes the first symbol of v in the alphabetic
order.

The lexicographic ordering of Σ* is the usual “alphabetic” ordering used in dictionaries.
Under this ordering, every element of Σ* has an immediate successor,
but if Σ has more than one element, then many elements do not have an immediate
predecessor. The lexicographic order of Σ* is a linear order, but it is not a well
order unless Σ consists of a single symbol.

Example
Let Σ = {a, b}, and let a precede b in the alphabetic order. Then if x is
any string in Σ*, the immediate successor of x is xa. The immediate predecessor
of xa is x, but there is no immediate predecessor of xb. Moreover, the set
{b, ab, aab, aaab, ...} has no least element, since each string a^m b precedes any
string a^n b if m > n. It follows that the lexicographic order is not a well order. #
The following definition provides a well ordering of Σ*.

Definition 3.6.7: Let Σ be a finite alphabet with an associated alphabetic
(linear) order, and let ||x|| denote the length of x ∈ Σ*. Then x ≤ y in the standard
ordering of Σ* if
(i) ||x|| < ||y||, or
(ii) ||x|| = ||y|| and x precedes y in the lexicographic ordering of Σ*.

In the standard order, every element has an immediate successor, and every element
other than Λ has an immediate predecessor. The least element of any set is
the shortest element of the set which occurs earliest in the lexicographic ordering
of Σ*. Since such an element exists for any nonempty subset of Σ*, it follows that the standard
ordering of Σ* is a well ordering.

Example
If Σ = {a, b, c}, and x ∈ Σ*, then the immediate successors of xa, xb, and xc
under standard order are xb, xc, and ya respectively, where y is the successor of x.
The immediate predecessors of xb and xc are xa and xb respectively. If x ≠ Λ,
then the immediate predecessor of xa is yc, where y is the immediate predecessor
of x. The least element of {a^n b | n ∈ N} is b. #
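
The two orderings of Σ* are easy to compare on actual strings. In the sketch below, lexicographic order is Python's built-in string comparison (valid here because the alphabet is a run of lowercase letters), standard order is obtained by sorting on the pair (length, string), and `standard_successor` follows the successor rule described in the example. All function names are illustrative.

```python
def standard_key(x):
    """Sort key for the standard ordering: shorter strings first, ties broken lexicographically."""
    return (len(x), x)


def standard_successor(x, alphabet):
    """Immediate successor of x in the standard ordering of strings over alphabet."""
    symbols = sorted(alphabet)
    chars = list(x)
    i = len(chars) - 1
    # Trailing copies of the last symbol roll over to the first symbol.
    while i >= 0 and chars[i] == symbols[-1]:
        chars[i] = symbols[0]
        i -= 1
    if i < 0:
        # Every symbol rolled over: the successor is one symbol longer.
        return symbols[0] * (len(x) + 1)
    chars[i] = symbols[symbols.index(chars[i]) + 1]
    return "".join(chars)


strings = ["b", "ab", "aab", "aaab"]
print(sorted(strings))                    # lexicographic: longer a-prefixes come first
print(sorted(strings, key=standard_key))  # standard: "b" is least

print(standard_successor("ab", "abc"), standard_successor("ac", "abc"))
```

With the empty string playing the role of Λ, repeated calls to `standard_successor` starting from `""` enumerate Σ* in standard order.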

Universally quantified statements about a well ordered universe are often
proved inductively. For example, the standard ordering of Σ* can be used for
inductive proofs about Σ*. Thus, if ≤ represents the standard ordering and we
use S(x) to denote the successor of x ∈ Σ* under this ordering, then the following
rule of inference applies:
P(Λ)
∀x[P(x) ⇒ P(S(x))]
∴ ∀xP(x).
Thus, if we can prove Λ has property P and whenever x has property P, then S(x)
has the same property, then we can conclude that every element of Σ* has property
P.
The rule of inference described above is basically the same as the First Princi-
ple of Mathematical Induction; such a rule is applicable only to well ordered sets
which “look like” the natural numbers in the sense that every element of the set can
be obtained by beginning with the least element of the set and repeatedly taking the
successor. (For example, in N, 2 is the successor of the successor of 0.) Some well
ordered sets do not have this property; an example is the set N × N under the
ordering ≤ given above. Under this ordering, every element <a, b> has an immediate
successor <a, b + 1>. But an infinite number of elements do not have
immediate predecessors. The element <0, 0> has no predecessors, while if a ≠ 0,
then <a, 0> has an infinite number of predecessors, none of which are immediate
predecessors. The natural generalization of the First Principle of Mathematical
Induction is not applicable to this universe of discourse, but the Second Principle
(which relies on the well order ≤ rather than the successor operation) can be
applied, as it can to any well ordered set. Let S be a universe of discourse, let ≤ be
a well ordering of S, and let < denote ≤ − E (i.e., x < y denotes x ≤ y and
x ≠ y). Then the following rule of inference holds:

∀x[∀y[y < x ⇒ P(y)] ⇒ P(x)]
∴ ∀xP(x).
Thus, if we can show that an arbitrary x has property P if every element less than x
has property P, then we can conclude that every element of S has property P. To
show that the conclusion of the rule of inference is valid, suppose we can prove the
premise

∀x[∀y[y < x ⇒ P(y)] ⇒ P(x)]

and suppose T is the subset of S consisting of all the elements of S which do not
have the property P. Since S is well ordered, if T ≠ ∅, then T must have a least element
m; it follows that P(x) is true for all elements x < m. The premise, however,
asserts that if P(x) is true for all x < m, then P(m) is true, which contradicts
m ∈ T; we conclude that T must
be empty. Hence the conclusion of the Second Principle, ∀xP(x), is true. It follows
that the Second Principle of Mathematical Induction is a valid rule of inference for
any well ordered set <S, ≤>.
Finally, we note that the Second Principle of Mathematical Induction is not
applicable to sets which are not well ordered.

Example
Let Σ = {a, b}, let a precede b in the alphabetic ordering, and let ≤ denote the
lexicographic order on Σ*. Then ≤ is a linear ordering but not a well ordering of
Σ*, and the Second Principle using ≤ is not a valid rule of inference. For consider
the following predicate P on the universe Σ*:

P(x) ⇔ x ∈ {a}*.

Then the assertion
∀y[y < x ⇒ P(y)]
is true if and only if x ∈ {a}*. (Every predecessor of a^m is of the form a^n, where
n < m; hence the assertion is true if x ∈ {a}*. If x ∉ {a}*, then if n is sufficiently
large, a^n b precedes x; hence the assertion is false if x ∉ {a}*.) Therefore the premise
of the Second Principle,

∀x[∀y[y < x ⇒ P(y)] ⇒ P(x)]

is true, since ∀y[y < x ⇒ P(y)] is true if and only if P(x) is true. However, the
conclusion of the Second Principle,
∀xP(x)

is false. It follows that the Second Principle cannot be applied to Σ* using the lexicographic
ordering ≤. #
Associating assertions with program statements in the way described in Section
1.6 makes it possible to prove that a program has the correct output if the
program halts, but it does not provide a means of establishing that the program
halts. Well ordered sets provide a basis for proving that programs terminate. A
program will halt if and only if each of its statements is executed only a finite
number of times. It follows that a loop-free program always halts because each
statement is executed at most once. Programs with loops may not terminate, but
in order for this to occur, some program loop must be traversed an infinite num-
ber of times. The principal technique for establishing that a loop terminates is to
show that some variable quantity v (which is not necessarily a program variable)
must assume a value in a subset S of a well ordered set T in order for the loop to
be traversed, and that each traversal of the loop causes the value of v to decrease.
Then eventually the value of v will no longer be an element of T. For suppose the
initial value of the variable v is t_0, and successive executions of the loop cause v to
assume the sequence of distinct values t_0, t_1, t_2, ..., where t_0 > t_1 > t_2 > ... Then
the sequence t_0, t_1, t_2, ... is of finite length, for otherwise the set of values of the
sequence form a subset of T without a least element, violating the definition of a
well ordered set. Thus the sequence is finite. Since each traversal of the loop causes
the value of v to decrease, the value of v will eventually not be a member of T and
the loop will not be executed again.

Example
Rather than treat a specific example we will describe the application of termi-
nation techniques to the nested loop structure which appears below. Assume that
the value of m is a positive integer and is not changed inside either loop, that all
statements which affect the values of i and j are shown, and that the loops do not
contain other loops or branch statements.

for i ← 1 step 1 to m do
begin

    j ← m
    while j > i do
    begin

        j ← j − 1
    end

end
For the outer (for) loop, consider the quantity (m − i). Since we have assumed
m ≥ 1, the initial value of this quantity is in the well ordered set N. Incrementing
i with each traversal of the loop causes the quantity (m − i) to be decremented.
When i > m, the quantity (m − i) is no longer an element of N and the execution
of the loop ceases. Thus, the outer loop will terminate if each execution of the
inner loop terminates. The variable j of the inner (while) loop is initialized to m
and decremented by 1 during each traversal of the loop. Since i ≤ m and execution
of the loop leaves the value of i unchanged, execution of the loop will cease when
j is no longer an element of the well ordered set {i + 1, i + 2, ..., m}. Thus, each
execution of the inner loop will terminate and therefore the outer loop will terminate. #
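
The termination argument can be made concrete by instrumenting the loop skeleton with run-time checks on the two decreasing quantities. The following is a sketch in Python rather than the ALGOL-like notation of the example; the function name and the traversal counter are additions for illustration, and the elided loop bodies are assumed not to touch i, j, or m.

```python
def run_nested_loops(m):
    """Run the loop skeleton, asserting that each loop's quantity decreases within N."""
    inner_traversals = 0
    previous_outer = None
    for i in range(1, m + 1):
        outer_quantity = m - i
        # (m - i) stays in N and strictly decreases from one traversal to the next.
        assert outer_quantity >= 0
        assert previous_outer is None or outer_quantity < previous_outer
        previous_outer = outer_quantity

        j = m
        while j > i:
            # The loop is traversed only while j is in {i + 1, ..., m}.
            assert i + 1 <= j <= m
            before = j
            j = j - 1
            assert j < before  # each traversal decreases j
            inner_traversals += 1
    return inner_traversals


print(run_nested_loops(5))
```

Because j starts at m and stops at i, the inner loop is traversed m − i times for each i, so the returned count is m(m − 1)/2.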

†Some Additional Concepts for Posets

In Definition 3.6.4, we defined the greatest and least elements of a subset of
a partially ordered set. In this section, we introduce other distinguished elements
of subsets of posets and explore their properties.

Definition 3.6.8: Let <A, ≤> be a poset and B a subset of A.
(a) An element b ∈ B is a maximal element of B if b ∈ B and no element
b’ ∈ B exists such that b ≠ b’ and b ≤ b’.
(b) An element b ∈ A is an upper bound for B if, for every b’ ∈ B, b’ ≤ b.
(c) An element b ∈ A is a least upper bound (lub) for B if b is an upper bound
and for every upper bound b’ of B, b ≤ b’.

The definitions of a minimal element of B, a lower bound for B and a greatest
lower bound (glb) for B are similar to the definitions above. Note that a greatest
element of B and a maximal element of B must be elements of the subset B, while
an upper bound for B and a least upper bound may or may not be elements of B.
Nothing in the definition assures us that any of these elements exist, and in many
cases they do not.
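
Each part of Definition 3.6.8 can be evaluated by exhaustive search when A is finite. The sketch below uses the integers 1 to 6 under "divides" as an assumed test poset; the function names are illustrative, and `None` signals that no least upper bound exists.

```python
def maximal_elements(B, leq):
    """Elements of B with no strictly greater element in B."""
    return [b for b in B if not any(b != x and leq(b, x) for x in B)]


def upper_bounds(B, A, leq):
    """Elements of A that are greater than or equal to every element of B."""
    return [u for u in A if all(leq(b, u) for b in B)]


def least_upper_bound(B, A, leq):
    """The upper bound of B that precedes every other upper bound, or None."""
    bounds = upper_bounds(B, A, leq)
    for u in bounds:
        if all(leq(u, v) for v in bounds):
            return u
    return None


def divides(a, b):
    return b % a == 0


A = [1, 2, 3, 4, 5, 6]

print(maximal_elements(A, divides))   # several maximal elements, no greatest element
print(upper_bounds(A, A, divides))    # no element of A is a multiple of all of 1..6
print(least_upper_bound([2, 3], A, divides))
```

Note that `upper_bounds` searches all of A, not just B, which reflects the remark that an upper bound need not belong to B.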

Examples
(a) Consider the poset <P({a, b}), ⊆> represented by the following poset diagram.

[Poset diagram: {a, b} at the top, {a} and {b} side by side below it, ∅ at the bottom]

If B = {{a}}, then {a} is a least and greatest element of B, as well as a maximal
and minimal element of B. The upper bounds of B are {a} and {a, b}, and {a}
is a least upper bound. The lower bounds of B are ∅ and {a}, and {a} is a greatest
lower bound.

(b) Consider the poset <R, ≤>, and let B = [0, 1) = {x | 0 ≤ x < 1}. Then
B has no greatest or maximal elements, but 0 is a least and minimal element.
The set of upper bounds of B is the set {x | x ≥ 1}, and 1 is a least upper
bound. The set of lower bounds of B is {x | x ≤ 0} and 0 is a greatest lower
bound.

(c) Consider the set of integers from 1 to 6 under the partial order “divides.” The
poset diagram is the following.

[Poset diagram: 1 at the bottom; 2, 3, and 5 above 1; 4 above 2; 6 above 2 and 3]

Let B be the entire set {1, 2, 3, 4, 5, 6}. Then, 4, 5, and 6 are all maximal ele-
ments of B, but B has no greatest element. The set B has no upper bounds,
and therefore no least upper bounds. The element 1 is a least element, a minimal
element, a lower bound, and a greatest lower bound of B.

(d) A topological sort is a process of embedding a partial order ≤ in a linear
order ≤’. That is, given a partial order ≤ we wish to find a linear order ≤’ such
that a ≤ b ⇒ a ≤’ b. An algorithm for performing a topological sort can
be described as follows:
Let <S, ≤> be a finite poset. Choose a minimal element x of S. (Problem
17(a) guarantees that it is always possible to find a minimal element of a
nonempty finite poset.) Make this element the first element in a list representation
of <S, ≤>. Now repeat the procedure for the subset S − {x}. (Problem
8(b) guarantees that <S − {x}, ≤> is a poset.) Each time a new minimal element
is found, it becomes the next element in the list representation of <S, ≤>.
The procedure is repeated until S is exhausted. #
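
The procedure described in example (d) can be sketched as follows. This version takes the partial order as a comparison function and repeatedly removes a minimal element; it is a direct, quadratic-time transcription of the description, not an efficient implementation, and the function names are illustrative.

```python
def topological_sort(elements, leq):
    """List the elements of a finite poset so that a <= b implies a is listed no later than b."""
    remaining = list(elements)
    ordered = []
    while remaining:
        # A minimal element has no other remaining element below it.
        x = next(
            e for e in remaining
            if not any(y != e and leq(y, e) for y in remaining)
        )
        ordered.append(x)
        remaining.remove(x)
    return ordered


def divides(a, b):
    return b % a == 0


order = topological_sort([6, 4, 3, 2, 5, 1], divides)
print(order)

# Check the embedding property: whenever a divides b, a appears no later than b.
print(all(order.index(a) <= order.index(b)
          for a in order for b in order if divides(a, b)))
```

The result depends on which minimal element is chosen at each step; any choice yields a valid linear order.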

The following theorem establishes some relationships between the distinguished
elements defined in Definitions 3.6.4 and 3.6.8.

Theorem 3.6.4: Let <A, ≤> be a poset and B a subset of A.
(a) If b is a greatest element of B, then b is a maximal element of B.
(b) If b is a greatest element of B, then b is a lub of B.
(c) If b is an upper bound of B and b ∈ B, then b is a greatest element
of B.

Proof: (a) We will prove the contrapositive, that is, if b is not a maximal
element of B, then b is not a greatest element of B. If b is not maximal,
then there exists an element b’ ∈ B such that b ≠ b’ and b ≤ b’. Then,
by antisymmetry, b’ ≤ b is false, and hence b is not a greatest element of B.
(b) Since B ⊆ A, it is immediate from the definitions that if b is a greatest
element of B, then b is an upper bound for B. If a is an upper bound
for B, then b ≤ a, since b ∈ B. Therefore, b is a least upper bound
of B.
(c) If b ∈ B is an upper bound for B, then b’ ≤ b for all b’ ∈ B. Therefore,
b is a greatest element of B. ∎

A theorem similar to Theorem 3.6.4 can be stated using “least” instead of
“greatest,” “minimal” instead of “maximal” and “glb” instead of “lub.”
The examples given previously illustrate that maximal elements and upper
bounds may or may not exist, and when they do exist, they may or may not be
unique. Similar statements hold for minimal elements and lower bounds. Greatest
lower bounds and least upper bounds also may or may not exist, but if they do
exist, their more restrictive definitions ensure that they are unique. This is estab-
lished by the following theorem.

Theorem 3.6.5: Let <A, ≤> be a poset and B ⊆ A. If a least upper (greatest
lower) bound for B exists, then it is unique.
The proof is left as an exercise.

Problems: Section 3.6

1. Fill in the following table describing the characteristics of the given ordered sets.
Use Y for yes and N for no.

                                 Quasi      Partially   Linearly   Well
                                 Ordered    Ordered     Ordered    Ordered

<N, <>
<N, ≤>
<I, ≤>
<R, ≤>
<P(N), proper containment>
<P(N), ⊆>
<P({a}), ⊆>
<P(∅), ⊆>

2. Prove Theorem 3.6.1.


3. State which of the following digraphs represent a quasi-ordered set; a poset; a linearly
ordered set; a well ordered set.

[Figure: digraphs for Problem 3 not reproduced]

4. Prove or disprove each of the following.
(a) Let G be a digraph which represents a poset. Then any subdigraph of G rep-
resents a poset.
(b) Any digraph which represents a quasi order has at most one component.
(c) The digraph of a poset is strongly connected.
(d) A digraph which represents a linearly ordered set is connected.
5. Let the universe of discourse be I. Prove or disprove that {<a, b>|a is an integral
multiple of b} is a partial order.
6. (a) Describe a well ordering for the set I x I.
(b) Describe a quasi ordering for the set I x I.

7. Prove the following assertions:
(a) If R is a quasi order, then so is R^c.
(b) If R is a partial order, then so is R^c.
(c) If R is a linear order, then so is R^c.
(d) There exists a set S and a relation R on S such that <S, R> is well ordered but
<S, R^c> is not.
8. Let R be a relation on a set S, and let S’ be a subset of S. Define the relation R’
on S’ as follows:
R’ = R ∩ (S’ × S’).
Determine the truth or falsity of each of the following assertions:
(a) If R is transitive on S, then R’ is transitive on S’.
(b) If R is a partial order on S, then R’ is a partial order on S’.
(c) If R is a quasi order on S, then R’ is a quasi order on S’.
(d) If R is a linear order on S, then R’ is a linear order on S’.
(e) If R is a well order on S, then R’ is a well order on S’.
9. (a) Show R is a quasi order if and only if R ∩ R^c = ∅ and R = R^+.
(b) Show R is a partial order if and only if R ∩ R^c = E and R = R*.
10. Let P be a program and S be the set of subroutines which can be called during the
execution of P. Define the relation R on {P} ∪ S by x_i R x_j if x_i calls x_j for some input
to program P. Under what conditions is R* a partial order?
11. Prove that the procedure PRODUCT given in Fig. 1.6.1 halts if the initial assertion
holds prior to execution.
12. Construct examples of the following sets.
(a) A non-empty linearly ordered set in which some subsets do not have a least
element.
(b) A non-empty partially ordered set which is not linearly ordered and in which
some subsets do not have a greatest element. Construct both finite and infinite
examples.
†(c) A partially ordered set with a subset for which there exists a glb but which
does not have a least element. Construct both finite and infinite examples.
†(d) A partially ordered set with a subset for which there exists an upper bound but
not a least upper bound. Construct both finite and infinite examples.
$13. Prove Theorem 3.6.5.
14. Let T be a relation on the Cartesian plane R × R defined as follows:

<x_1, y_1> T <x_2, y_2> if and only if x_1 ≤ x_2 and y_1 ≤ y_2.
Determine whether each of the following assertions is true or false. Justify your
answer if the statement is false.
(a) T is a partial order.
(b) T is a linear order.
(c) T is a well order.
†(d) Every subset of R × R which has a lower bound has a glb.
(e) If the second condition is eliminated (that is, we only require x_1 ≤ x_2), then the
resulting relation is a partial order.
15. Redefine the relation T of Problem 14 as follows:

<x_1, y_1> T <x_2, y_2> ⇔ [(x_1 < x_2) ∨ (x_1 = x_2 ∧ y_1 ≤ y_2)]

Now answer the questions in Problem 14 for the new relation T. (For part (e), the
definition of T will only require x_1 < x_2.)
16. Let Σ = {a, b}, and let a precede b in the alphabetic order. Use a sketch to characterize
each of the following digraphs. All strings of length less than 3 should appear
explicitly in your sketch, and the general structure of the complete (infinite) digraph
should be apparent. You may use closure operations in your characterization.
(a) <Σ*, standard order>
(b) <Σ*, lexicographic order>
17. Prove each of the following:
(a) Any finite nonempty subset of a poset has at least one minimal and one maximal
element.
(b) For any linearly ordered set, every minimal element of a subset is a least ele-
ment and every maximal element is a greatest element.
(c) Every nonempty finite subset of a linearly ordered set has a least and greatest
element.
18. Let S be the set of nonnegative rational numbers, and let ≤ denote the usual relation
of less than or equal. Note that <S, ≤> is a linearly ordered set but not a well
ordered set. Find a predicate P for the universe S which shows that the Second
Principle of Mathematical Induction using ≤ is not a valid rule of inference for
this universe.

Programming Problems

1. Write a program which accepts as input a set of ordered pairs and determines if the
relation is a quasi order, partial order, or linear order.
2. Write a program which accepts as input a set of ordered pairs denoting adjacent
nodes of a poset diagram and produces a minimal element of the poset.
3. Write a program to perform a topological sort of a finite poset. Assume the input is
presented as a set of ordered pairs denoting adjacent nodes of a poset diagram. One
technique is to select and list a minimal element of a poset, delete the element listed,
and repeat the process, continuing until all elements are listed. (Ref. Knuth, [1969],
Vol. 1, p. 262.)

3.7 EQUIVALENCE RELATIONS AND PARTITIONS

Often the elements of a set are treated according to their properties rather than
as individuals. In such a situation, we can ignore all properties which are not of
interest, and treat different elements as “equivalent,” or indistinguishable, unless
they can be differentiated using only the properties which are of interest. The notion
of “equivalence” has three important characteristics:
(i) Every element is equivalent to itself (reflexivity).
(ii) If a is equivalent to b, then b is equivalent to a (symmetry).
(iii) If a is equivalent to b and b is equivalent to c, then a is equivalent to c
(transitivity).
These properties form the basis of an important class of binary relations on a set.

Definition 3.7.1: A binary relation R on a set A is an equivalence relation if
R is reflexive, symmetric, and transitive.

The digraph associated with an equivalence relation R has certain distinguishing
characteristics. Since R is reflexive, every node has a loop. The symmetry condition
implies that if there is an arc from a to b, there is an arc from b to a. The
transitivity condition implies that if there is a path from a to b, there is an arc from
a to b. From these considerations, it follows that each component of the digraph
of an equivalence relation is a complete digraph.

Examples
(a) The universal relation on any set A is an equivalence relation. If A = {1, 2, 3},
then the digraph of universal relation on A is a complete digraph with 3 nodes.
(b) The empty relation ∅ is an equivalence relation over the empty set ∅. However,
the empty relation is not an equivalence relation over any nonempty set
because it is not reflexive.
(c) Consider the class of propositional forms over some set of propositional
variables. The relation R defined by R = {<P, Q> | P ⇔ Q}, where P and Q
are propositional forms, is an equivalence relation over this set.
(d) A predicate P with one argument induces a natural equivalence relation ~
over a universe of discourse U. Under this relation, two elements a, b ∈ U
are equivalent if and only if P(a) and P(b) are logically equivalent:
a ~ b ⇔ [P(a) ⇔ P(b)].
(e) The equality relation, E, on any set is an equivalence relation. #

An important class of equivalence relations over the integers (or any subset
of them) consists of the modular equivalences.

Definition 3.7.2: Let k be a positive integer and a, b ∈ I. Then a and b are
equivalent mod k, written
a ≡ b (mod k)
if for some integer n, (a − b) = n·k. The integer k is called the modulus of the
equivalence.

Theorem 3.7.1: Equivalence mod k is an equivalence relation over any
set A ⊆ I.
Proof: If A = ∅, the assertions that R is reflexive, symmetric, and transitive
are vacuously true. If A ≠ ∅, then the conditions are established as follows:
(i) Reflexivity: For every a ∈ A, since (a − a) = 0·k, it follows that
a ≡ a (mod k).
(ii) Symmetry: If a ≡ b (mod k), then there exists some n ∈ I such that
(a − b) = n·k. Then (b − a) = −n·k, and hence b ≡ a (mod k).
(iii) Transitivity: Suppose a ≡ b (mod k) and b ≡ c (mod k). Then there
exist n_1, n_2 ∈ I such that (a − b) = n_1·k and (b − c) = n_2·k. Adding
these equations, we find (a − c) = (n_1 + n_2)·k and therefore
a ≡ c (mod k). ∎

Examples
(a) Let the relation R be equivalence mod 3 on the set A = {0, 1, 2, 3, 5, 6, 8}.
The elements 0, 3 and 6 are equivalent, as are 2, 5 and 8. The digraph of the
relation R on the set A is the following:

(b) Odometers of automobiles are devices which count in a modular fashion. If
the odometer uses five decimal digits to indicate mileage, then the modulus
is 100,000 and driving 123,456 miles registers the same as driving 23,456 miles.
(c) Many computers use a number representation (such as 1’s or 2’s complement)
where each integer is represented by k binary digits using a form of the usual
base 2 positional notation. The result of addition or subtraction of two operands
consists of two parts:
(i) a binary number, obtained using modular arithmetic, and
(ii) a specification of whether overflow has occurred. If overflow has occurred,
the magnitude of the result is too large for the result to be represented
using only k digits. #
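
Both of the last two examples reduce to arithmetic modulo a fixed modulus. The sketch below models the five-digit odometer and a k-bit 2's complement adder that returns the wrapped result together with an overflow indication; the function names and the choice of 2's complement (rather than 1's complement) are assumptions made for illustration.

```python
def odometer_reading(miles, digits=5):
    """Reading shown by an odometer with the given number of decimal digits."""
    return miles % 10 ** digits


def add_twos_complement(x, y, k):
    """Add two k-bit 2's complement integers.

    Returns (result, overflow): result is the sum reduced mod 2**k and
    reinterpreted as a signed value; overflow is True when the true sum
    cannot be represented in k bits.
    """
    modulus = 1 << k
    raw = (x + y) % modulus
    result = raw - modulus if raw >= modulus // 2 else raw
    overflow = result != x + y
    return result, overflow


print(odometer_reading(123456), odometer_reading(23456))
print(add_twos_complement(100, 27, 8))
print(add_twos_complement(100, 28, 8))
```

In both functions the computed value is equivalent to the true value modulo the modulus (100,000 or 2^k), which is exactly the relation of Definition 3.7.2.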

Definition 3.7.3: Let R be an equivalence relation on a set A. For every
a ∈ A, the equivalence class of a with respect to R, denoted [a]_R, is the set {x | xRa}.
The rank of R is the number of distinct equivalence classes of R if the number of
classes is finite; otherwise, the rank is said to be infinite.

The equivalence class [a]_R is nonempty for each a ∈ A since a ∈ [a]_R. If the
equivalence relation R is understood, we will usually write [a] in place of [a]_R.

Examples
(a) Let A = {a, b,c, d} and R be the set
{<a, a>, <a, b>, <b, a>, <b, b>, <c, c>, <c, d>, <d, c>, <d, d>}.
The digraph of <A, R> is

[Digraph: two components, one on the nodes a and b and one on the nodes c and d, each a complete digraph with a loop at every node]

The equivalence classes of the elements of A are the following:
[a] = [b] = {a, b}
[c] = [d] = {c, d}.
The relation R has rank 2.
(b) The rank of the equality relation on a set A is equal to the number of elements
of A. #
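
Equivalence classes of a finite set can be collected by comparing each element with one representative of every class found so far; this is sufficient because the relation is symmetric and transitive. The sketch below applies the idea to equivalence mod 3 on the set {0, 1, 2, 3, 5, 6, 8} used in an earlier example; the function names are illustrative.

```python
def equivalence_classes(A, related):
    """Group the elements of A into classes of the equivalence relation `related`."""
    classes = []
    for a in A:
        for cls in classes:
            # Comparing with one representative is enough for an equivalence relation.
            if related(a, cls[0]):
                cls.append(a)
                break
        else:
            classes.append([a])
    return classes


def equivalent_mod_3(a, b):
    return (a - b) % 3 == 0


A = [0, 1, 2, 3, 5, 6, 8]
classes = equivalence_classes(A, equivalent_mod_3)
rank = len(classes)
print(classes, rank)
```

The length of the returned list is the rank of the relation on A.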

Each equivalence class of an equivalence relation on a nonempty set A is the
set of nodes of a component of the digraph of the relation. We will now show that
the equivalence classes of an equivalence relation R are pairwise disjoint and that
they exhaust the set.

Theorem 3.7.2: Let R be an equivalence relation on a set A.
(a) For all a, b ∈ A, either [a] = [b] or [a] ∩ [b] = ∅.
(b) ⋃_{x∈A} [x] = A.
Proof:
(a) If A = ∅, the assertion is vacuously true. Hence, suppose A ≠ ∅ and
[a] ∩ [b] ≠ ∅, represented by the following sketch.

[Sketch: two overlapping regions labelled [a] and [b]]

Let c be an element of [a] ∩ [b]. Then c ∈ [a], so cRa. Similarly, c ∈ [b],
so cRb. Since R is symmetric, it follows that aRc and since R is transitive,
aRb. Now consider an arbitrary element x of [a]. Then xRa and by
transitivity of R, xRb. Hence x ∈ [b] which establishes that [a] ⊂ [b]. A
similar proof establishes that [b] ⊂ [a], and we conclude [b] = [a].
Therefore, if [a] ∩ [b] ≠ ∅ then [a] = [b]. Since [a] and [b] are nonempty,
it follows that either [a] ∩ [b] = ∅ or [a] = [b].
(b) We must show ⋃_{x∈A} [x] = A. We first establish that ⋃_{x∈A} [x] ⊂ A.
Suppose c ∈ ⋃_{x∈A} [x]. Then c ∈ [a] for some a ∈ A, and since [a] ⊂ A,
c ∈ A. Therefore ⋃_{x∈A} [x] ⊂ A. We next establish that A ⊂ ⋃_{x∈A} [x].
Let c ∈ A. Then c ∈ [c] ⊂ ⋃_{x∈A} [x] and therefore A ⊂ ⋃_{x∈A} [x]. ∎

The proofs of the next two theorems are left as exercises.

Theorem 3.7.3: Let R₁ and R₂ be equivalence relations on a set A. Then
R₁ = R₂ if and only if R₁ and R₂ have the same set of equivalence classes.

Theorem 3.7.4: Let R₁ and R₂ be equivalence relations on a set A. Then
R₁ ∩ R₂ is an equivalence relation.

Theorem 3.7.4 is most easily proved by showing that the intersection of
equivalence relations preserves each of the properties of reflexivity, symmetry, and
transitivity. Note, however, that transitivity is not necessarily preserved under
union, and consequently the union of two equivalence relations may not be an
equivalence relation.

Theorem 3.7.5: Let R be a binary relation on A and let R′ = tsr(R), the
transitive symmetric reflexive closure of R. Then
(a) R′ is an equivalence relation on A, called the equivalence relation induced
by R, and
(b) if R″ is an equivalence relation and R″ ⊃ R, then R″ ⊃ R′. (Thus R′
is the smallest equivalence relation which contains R.)
Proof:
(a) By definition of the closure operations and successive application of
Theorem 3.5.8,

r(R) is reflexive,
sr(R) is symmetric and reflexive, and
tsr(R) is transitive, symmetric, and reflexive.
Hence, R′ = tsr(R) is an equivalence relation on A.
(b) Let R″ be any equivalence relation containing R. Then R″ is reflexive
and symmetric so R″ ⊃ R ∪ R^c ∪ E = sr(R). Since R″ is transitive
and contains sr(R), R″ contains tsr(R). ∎

Example
Let A = {a, b, c, d} and let R be represented by the following digraph.

Then tsr(R) is represented by

[Digraph on the nodes a, b, c, d]

The equivalence classes of tsr(R) are {a, b} and {c, d}. Each equivalence class of the
induced equivalence relation is the set of nodes of a component of the digraph
<A, R>. #
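
The closure tsr(R) of Theorem 3.7.5 can be computed by applying the three closures in order. The sketch below is one straightforward way to do so, not an efficient one; the function name tsr is taken from the text, but the relation R used here is a hypothetical stand-in, since the digraph of the example did not survive. It was chosen so that the classes come out as {a, b} and {c, d}.

```python
def tsr(A, R):
    """Transitive symmetric reflexive closure of R on A (a set of pairs)."""
    closure = set(R) | {(a, a) for a in A}            # r(R)
    closure |= {(b, a) for (a, b) in closure}         # sr(R)
    # Transitive closure: keep composing until no new pair appears.
    changed = True
    while changed:
        new_pairs = {(a, d)
                     for (a, b) in closure
                     for (c, d) in closure
                     if b == c}
        changed = not new_pairs <= closure
        closure |= new_pairs
    return closure


A = {"a", "b", "c", "d"}
R = {("a", "b"), ("c", "d")}          # hypothetical relation, for illustration
induced = tsr(A, R)

chain = tsr({1, 2, 3, 4}, {(1, 2), (2, 3)})   # classes {1, 2, 3} and {4}
```

Adding pairs during the transitive step never destroys reflexivity or symmetry, which is why the three closures can be applied in this order.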

Partitions

The concept of partition is closely related to that of equivalence relation.

Definition 3.7.4: A partition π of a nonempty set A is a collection of
nonempty subsets of A such that
(i) For all S ∈ π and T ∈ π, either S = T or S ∩ T = ∅.
(ii) A = ⋃_{S∈π} S.
An element of a partition is called a block. If π is a finite set, then the rank of π is
the number of blocks of π. If π is infinite, then the rank is said to be infinite.

A partition of a set A is a collection of nonempty and pairwise disjoint subsets
of A which exhaust the set A. One can think of partitions in a variety of ways.
Suppose A is a set of objects. We can partition A by placing each element of A
in one of a collection of boxes. After all the elements of A have been distributed,
each box which is not empty contains a block of the partition (a nonempty subset
of A). The elements of the partition are pairwise disjoint subsets of A, since no
element of A can be in two boxes at once. The rank of the partition is the number
of boxes which are not empty. We can also give a diagrammatic representation of
partitions. If the set A is represented by an enclosed area on paper, we can draw
lines to divide the area into nonoverlapping regions. Each region of the resulting
diagram will correspond to a block of the partition.

Example
(a) The following diagram represents a partition of a set.

The rank of this partition is four. By viewing the diagram alone, there is no way
of determining how many elements are in the set or how many elements are
in each block of the partition, but by definition, no block is permitted to be
empty.
(b) Consider the set of positive integers I+. Then the sets
S₁ = {x | x ∈ I+ ∧ x is prime} and S₂ = S̄₁ (the complement of S₁ in I+)
form a partition of I+ of rank 2.
(c) Cutting a sheet of paper into pieces results in a partition of the original sheet.
(Each piece is a block of the partition.) This notion can be generalized to the
physical tearing asunder of any object.
(d) The set of tautologies, the set of contingencies, and the set of contradictions
form a partition of rank three of the set of all propositional forms.
(e) A multiprogrammed computer system can interleave the execution of several
independent programs; that is, the execution of a program may be interrupted
in order to process other programs and then later resumed. The main memory
of such a system is partitioned, and each block of the partition contains a
separate program. In some multiprogrammed systems, the memory is
partitioned in a fixed way; in others, the number and sizes of the blocks can vary
according to demand.

(f) The partition of I+ defined as π = {{x} | x ∈ I+} has infinite rank.
(g) Let A be a nonempty set. Then 𝒫(A) − {∅} is a collection of nonempty sets
whose members exhaust the set A, but this collection is not a partition of A
unless A is a singleton set. #

Except for the fact that equivalence relations are defined for empty sets and
partitions are not, equivalence relations and partitions are different descriptions
of the same concept. The following theorems establish a natural correspondence
between the partitions and equivalence relations over a nonempty set.

Theorem 3.7.6: Let A be a nonempty set and R an equivalence relation on
A. The set {[a]_R | a ∈ A} of equivalence classes under R is a partition of A.

The proof is immediate from Theorem 3.7.2 and Definition 3.7.4.

Definition 3.7.5: Let R be an equivalence relation over a nonempty set A.
The quotient set, A/R, is the partition {[a]_R | a ∈ A}. The quotient set is also called
A modulo R or the partition of A induced by R.

Example

Let <A, R> be the following digraph:

[Digraph on the nodes a, b, c]

Then A/R = {{a}, {b, c}}. The rank of the relation R is 2; the blocks of A/R are {a}
and {b, c}. #

The following theorem establishes that distinct equivalence relations on a
nonempty set A induce distinct partitions of A.

Theorem 3.7.7: Let R₁ and R₂ be equivalence relations on a nonempty set A.
Then R₁ = R₂ if and only if A/R₁ = A/R₂.
Proof: The theorem follows immediately from Definition 3.7.5 and Theorem
3.7.3 which asserts that two equivalence relations are equal if and only if their
sets of equivalence classes are equal. ∎

Not only do equivalence relations induce partitions in a natural way, but
partitions also induce equivalence relations.

Theorem 3.7.8: Let π be a partition of the (nonempty) set A, and define the
binary relation ~ on A as follows:
a ~ b ⇔ ∃S[S ∈ π ∧ a ∈ S ∧ b ∈ S].
Then ~ is an equivalence relation on A, called the equivalence relation induced
on A by the partition π.
Proof: We must show ~ is reflexive, symmetric, and transitive.
(a) Reflexivity: Since π exhausts A, every element of A is in some element
S of π and therefore a ~ a for every a ∈ A.
(b) Symmetry: Suppose a ~ b. Then there is some S ∈ π such that a ∈ S
and b ∈ S, and therefore b ~ a.
(c) Transitivity: Suppose a ~ b and b ~ c. Then there are elements S₁ ∈ π,
S₂ ∈ π such that a, b ∈ S₁ and b, c ∈ S₂. But since π is a partition,
either S₁ ∩ S₂ = ∅ or S₁ = S₂. Since b ∈ S₁ and b ∈ S₂, S₁ ∩ S₂ ≠ ∅.
Therefore S₁ = S₂, and hence c ∈ S₁. We conclude that a ~ c. ∎

Note that each equivalence class of the equivalence relation induced by π is
one of the blocks of π.

Example
Let A = {a, b, c, d}, π = {{a, b}, {c}, {d}}. Then the equivalence relation ~
induced by π is represented by the following digraph.
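
Since the relation of this example is fully determined by Theorem 3.7.8, its ordered pairs can also be listed by a short program. The following is a minimal Python sketch; the function name relation_from_partition is an assumption made for illustration.

```python
def relation_from_partition(partition):
    """Pairs <a, b> such that a and b lie in a common block of the partition."""
    return {(a, b) for block in partition for a in block for b in block}


pi = [{"a", "b"}, {"c"}, {"d"}]
induced = relation_from_partition(pi)
```

Each block S contributes the pairs of S × S, so a block with m elements contributes m² ordered pairs.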

The following theorem summarizes the correspondence between partitions
and equivalence relations over nonempty sets. It asserts that each partition
corresponds to an equivalence relation and vice versa. The proof is left as an exercise.

Theorem 3.7.9: Let π be a partition of set A and R an equivalence relation
over A. Then, π induces R if and only if R induces π.

Since set containment is a partial order over any collection of sets, it is a partial
order over any collection of equivalence relations on a set A. A corresponding
partial order of “partition refinement” exists over any set of partitions of A.

Definition 3.7.6: Let π and π′ be partitions of a nonempty set A. Then π′
refines π if every block of π′ is contained in a block of π. We say π′ is a refinement
of π, or π is refined by π′. If π′ refines π and π′ ≠ π, then π′ is said to be a proper
refinement of π.

If π and π′ are partitions of a set A and π′ refines π, then we can think of the
elements of π′ as having been obtained by "breaking up" the elements of π into
smaller subsets of A.

Examples
(a) Using our diagram representation of partitions, the following illustrates two
partitions such that π′ refines π:

[Diagrams of the partitions π and π′]

The rank of π is 4, the rank of π′ is 9.
(b) The partition of the natural numbers N induced by "equivalence mod 4" has
four elements, which we can denote by [0]₄, [1]₄, [2]₄, and [3]₄. Each of these
elements of the partition contains an infinite number of integers, e.g., [0]₄ =
{0, 4, 8, 12, ...}. The partition of N induced by "equivalence mod 2" has two
elements, which we denote by [0]₂ and [1]₂. The partition induced by
equivalence mod 4 refines the partition induced by equivalence mod 2, since both
[0]₄ and [2]₄ are contained in [0]₂, and both [1]₄ and [3]₄ are contained in [1]₂.
(c) We noted before that one can form a partition π of a sheet of paper by cutting
it into pieces. If one then cuts the resulting pieces again, the result is another
partition π′ which refines π.
(d) Some search procedures use a strategy of successively reducing the size of
the set to be searched (called the search space). The search first partitions the
search space into two subsets, one which may contain the object of the search
and one which does not. The subset which may contain the object of the search
becomes the new search space. This procedure corresponds to finding a
sequence of partitions π₁, π₂, ..., πₙ of the original search space, where each
partition πᵢ₊₁ refines πᵢ by dividing one block of πᵢ into two blocks of πᵢ₊₁.
#
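
The refinement of example (b) can be checked mechanically on a finite part of N. The Python sketch below assumes the initial segment {0, 1, ..., 23} in place of N, since a program cannot enumerate an infinite block; the names refines and partition_mod are hypothetical.

```python
def refines(fine, coarse):
    """True if every block of `fine` is contained in some block of `coarse`."""
    return all(any(block <= big for big in coarse) for block in fine)


def partition_mod(elements, m):
    """Partition of `elements` induced by equivalence mod m."""
    blocks = {}
    for x in elements:
        blocks.setdefault(x % m, set()).add(x)
    return {frozenset(b) for b in blocks.values()}


segment = range(24)                 # finite stand-in for N (an assumption)
mod4 = partition_mod(segment, 4)
mod2 = partition_mod(segment, 2)

print(refines(mod4, mod2))          # mod 4 refines mod 2
print(refines(mod2, mod4))          # but not the other way around
```

The function refines follows Definition 3.7.6 directly: it asks, for each block of the finer partition, whether some block of the coarser one contains it.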

We will often compare the sizes of different equivalence relations and different
partitions of a set. A partition π is larger than π′ if π has more blocks than
π′, and an equivalence relation R is larger than R′ if R has more ordered pairs than
R′. It is a confusing fact of life that for any set A, the large partitions of A
correspond to the small equivalence relations and vice versa. To illustrate the point,
consider a set A with n elements. The largest equivalence relation on A is the
universal relation A × A; this relation has n² elements. This equivalence relation
induces the partition {A} which has a single block; this is the smallest partition
of A. The size of a partition cannot generally be determined on the basis of the
size of the associated equivalence relation, but the following theorem shows that
if π′ refines (and is therefore at least as large as) π then the equivalence relation
R′ induced by π′ is contained in (and is therefore no greater than) the relation R
induced by π.

Theorem 3.7.10: Let π and π′ be partitions of a nonempty set A, and let R
and R′ be the equivalence relations induced by π and π′ respectively. Then π′
refines π if and only if R′ ⊂ R.

Proof: We first show that if π′ refines π then R′ ⊂ R. Suppose aR′b. Then
there is some block S′ of π′ such that a, b ∈ S′. Since π′ refines π, there is a block
S of π such that S′ ⊂ S, and therefore, a, b ∈ S. It follows that aRb and hence
R′ ⊂ R. We next show that if R′ ⊂ R then π′ refines π. Let S′ be a block of π′,
and a ∈ S′. Then S′ = [a]_{R′} = {x | xR′a}. But for each x, if xR′a then xRa since
R′ ⊂ R. Therefore, {x | xR′a} ⊂ {x | xRa} and [a]_{R′} ⊂ [a]_R. Denote [a]_R by S; then
S is a block of π and S′ ⊂ S, which establishes that π′ refines π. ∎

Theorem 3.7.11: Let C be a collection of partitions of a nonempty set A.
The relation "refines" is a partial order over the elements of C.

The proof is left as an exercise.

Example
Let A = {1, 2, 3} and consider the following equivalence relations on A.
A × A
E = {<1, 1>, <2, 2>, <3, 3>}
R₁ = {<1, 1>, <2, 2>, <3, 3>, <2, 3>, <3, 2>}
R₂ = {<1, 1>, <2, 2>, <3, 3>, <1, 2>, <2, 1>}
The following is a poset diagram of <{A/(A × A), A/E, A/R₁, A/R₂}, refines>.

          A/(A × A)
          /       \
      A/R₁         A/R₂
          \       /
            A/E

#

†Sums and Products of Partitions

Let S be the set of partitions of a nonempty set A. We now define two useful
binary operations on S, called the "sum" and "product." The sum of two
partitions π₁ and π₂ is the largest partition (the one with the most blocks) that is refined
by both π₁ and π₂. The product of π₁ and π₂ is the smallest partition (the one
with the fewest blocks) that refines both π₁ and π₂.

Definition 3.7.7: Let π₁ and π₂ be partitions of a nonempty set A. The
product of π₁ and π₂, denoted π₁·π₂, is a partition π of A such that
(i) π refines both π₁ and π₂.
(ii) If π′ refines both π₁ and π₂, then π′ refines π.

The following two theorems show that the product of two partitions always
exists and is, in fact, unique.

Theorem 3.7.12: Let R₁ and R₂ be the equivalence relations induced by
partitions π₁ and π₂ of a nonempty set A. Then the relation R = R₁ ∩ R₂ induces
a product partition π of π₁ and π₂.
Proof: (i) Since R = R₁ ∩ R₂, it follows that R₁ ⊃ R and R₂ ⊃ R.
Therefore, by Theorem 3.7.10, π refines both π₁ and π₂, establishing
the first condition of Definition 3.7.7.
(ii) Suppose π′ refines both π₁ and π₂. If π′ induces R′, then by
Theorem 3.7.10, R₁ ⊃ R′ and R₂ ⊃ R′. Then R₁ ∩ R₂ ⊃ R′ and
therefore R ⊃ R′ and π′ refines π. ∎

Theorem 3.7.13: Let π₁ and π₂ be partitions of a nonempty set A. The
product of π₁ and π₂ is unique.
Proof: Suppose π and π′ are product partitions of π₁ and π₂. Then from
Definition 3.7.7, π and π′ refine each other. By Theorem 3.7.11, the relation
"refines" is antisymmetric and hence π = π′. ∎

The relationship of π₁ and π₂ to the product partition π₁·π₂ is illustrated in
Figure 3.7.1. The "borders" of π₁·π₂ consist of all the borders of both π₁ and π₂.

[Figure panels: π₁, π₂, π₁·π₂]

Fig. 3.7.1. The product partition

Example
Suppose a sheet of paper is marked with red lines and green lines so that
cutting the paper on the red lines would result in the partition π₁ and cutting it on
the green lines would result in partition π₂. Then cutting it on both the red and
green lines would produce the product partition π₁·π₂. #
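
By Theorem 3.7.12 the product is induced by R₁ ∩ R₂, so two elements share a block of π₁·π₂ exactly when they share a block of π₁ and a block of π₂. The blocks of the product are therefore the nonempty intersections of a block of π₁ with a block of π₂. The Python sketch below uses that observation; the function name product and the two sample partitions are assumptions made for illustration.

```python
def product(p1, p2):
    """Product partition: all nonempty intersections S ∩ T, S in p1, T in p2."""
    return {s & t for s in p1 for t in p2 if s & t}


# Two hypothetical partitions of {1, ..., 6}.
pi1 = {frozenset({1, 2, 3}), frozenset({4, 5, 6})}
pi2 = {frozenset({1, 2}), frozenset({3, 4}), frozenset({5, 6})}

pi = product(pi1, pi2)
```

Here the block {3, 4} of the second partition is split by the border between {1, 2, 3} and {4, 5, 6}, just as cutting on both sets of lines splits a piece of paper.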

We now discuss the sum of two partitions.

Definition 3.7.8: Let π₁ and π₂ be partitions of a nonempty set A. The sum
of π₁ and π₂, denoted π₁ + π₂, is a partition π such that
(i) both π₁ and π₂ refine π,
(ii) if π′ is a partition of A such that both π₁ and π₂ refine π′, then π refines π′.

Condition (ii) of the preceding definition ensures that the sum of π₁ and π₂ will be
the largest partition refined by both π₁ and π₂. The sum of two partitions always
exists and is unique, as we show in the two following theorems.

Theorem 3.7.14: Let R₁ and R₂ be equivalence relations on a nonempty
set A induced by the partitions π₁ and π₂. Define the relation R to be the transitive
closure of R₁ ∪ R₂:
R = (R₁ ∪ R₂)⁺ = t(R₁ ∪ R₂).

Then R is an equivalence relation on A, and the partition A/R is a sum of π₁ and π₂.
Proof: R₁ ∪ R₂ is reflexive and symmetric because the operation of set
union preserves these properties. Therefore, by Theorem 3.7.5, R = t(R₁ ∪ R₂) =
tsr(R₁ ∪ R₂) is the smallest equivalence relation which contains R₁ and R₂. Since
R ⊃ R₁ and R ⊃ R₂, both π₁ and π₂ refine A/R. Furthermore, any partition which
is refined by π₁ and π₂ induces an equivalence relation which contains both R₁
and R₂. Since t(R₁ ∪ R₂) is the smallest such equivalence relation, it follows that
A/R refines all such partitions. Therefore, A/R is a sum of π₁ and π₂. ∎

Theorem 3.7.15: Let π₁ and π₂ be partitions of a nonempty set A. The sum
of π₁ and π₂ is unique.

The proof of this theorem is left as an exercise.

Let π₁ and π₂ be partitions of a nonempty set A, and let R₁ and R₂ be the
equivalence relations induced by π₁ and π₂. Two elements a, b ∈ A are in the
same block of the sum partition π₁ + π₂ if and only if there is a path from a to b
in the digraph <A, R₁ ∪ R₂>.
A diagrammatic representation of the sum of two partitions is given in
Figure 3.7.2. If π₁ and π₂ are represented by some set of "borders" on a diagram,
the borders of π₁ + π₂ are exactly those borders common to both π₁ and π₂.

[Figure panels: π₁, π₂, π₁ + π₂]

Fig. 3.7.2 The sum partition
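
The path characterization of the sum suggests a simple computation: merge any two elements that share a block of either partition, and read off the resulting groups. The Python sketch below does this with a union-find structure, which is a technique chosen here for convenience and not one described in the text; the name partition_sum and the sample partitions are likewise assumptions.

```python
def partition_sum(p1, p2):
    """Sum of two partitions of the same set, computed by merging blocks."""
    parent = {x: x for block in p1 for x in block}

    def find(x):
        # Follow parent links to the representative, shortening the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Elements in a common block of p1 or p2 are joined by an arc of
    # R1 ∪ R2, so they must end up in the same block of the sum.
    for block in list(p1) + list(p2):
        members = list(block)
        for y in members[1:]:
            parent[find(y)] = find(members[0])

    groups = {}
    for x in parent:
        groups.setdefault(find(x), set()).add(x)
    return {frozenset(g) for g in groups.values()}


# Two hypothetical partitions of {1, ..., 6}.
pi1 = {frozenset({1, 2}), frozenset({3}), frozenset({4}), frozenset({5, 6})}
pi2 = {frozenset({1}), frozenset({2, 3}), frozenset({4}),
       frozenset({5}), frozenset({6})}

total = partition_sum(pi1, pi2)
```

In this example 1 and 3 never share a block of either partition, yet they land in the same block of the sum because the path 1, 2, 3 exists in the digraph of R₁ ∪ R₂.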

Examples
(a) Suppose a sheet of paper is marked with red lines representing the partition
π₁ and green lines representing the partition π₂. Then cutting the paper on
those lines which are colored both red and green would produce the sum
partition π₁ + π₂.
(b) In an information retrieval system, each "descriptor" induces a partition with
two blocks over the set of documents. If one descriptor is "artificial
intelligence," then the documents will be categorized according to whether or not
this descriptor applies to the document. Suppose ten descriptors are used. If
retrieval is done by specifying a single descriptor, any of ten sets of documents
can be specified. If retrieval can also be done using the negation of a descriptor
(meaning the descriptor is not appropriate), any of twenty sets of documents
can be obtained. If a single use of the connective AND is also permitted, then
one can obtain a set of documents corresponding to any block of a product
partition π₁·π₂, where π₁ and π₂ are two of the partitions induced by a single
descriptor. A single use of the connective OR will not result in a block of
the sum partition; instead it will produce the union of some blocks of the
product partition. #

Problems: Section 3.7

1. Prove that the universal relation on any set A is an equivalence relation. What is the
rank of this relation?
2. Prove that the empty relation is an equivalence relation on ∅. What is the rank of
the relation?
3. Suppose A is a finite set with n elements.
(a) How many elements are in the largest equivalence relation on A?
(b) What is the rank of the largest equivalence relation on A?
(c) How many elements are in the smallest equivalence relation on A?
(d) What is the rank of the smallest equivalence relation on A?
4. Suppose A = {a, b, c, d} and π₁ is the following partition of A:
π₁ = {{a, b, c}, {d}}.
(a) List the ordered pairs of the equivalence relation induced by π₁.
(b) Do the same for the partitions
π₂ = {{a}, {b}, {c}, {d}},
π₃ = {{a, b, c, d}}.
(c) Draw a poset diagram of the poset <{π₁, π₂, π₃}, refines>.
5. Let R and R′ be equivalence relations on a set A. Show by example that R ∪ R′ is
not necessarily an equivalence relation. What properties of an equivalence relation
are violated by your example? Choose the set A to be as small as possible.
6. State whether or not the following binary relations are equivalence relations. If they
are not, state which of the properties of an equivalence relation they violate. All
relations are on the set I. In each case, find the equivalence relation induced by R.
(a) <
(b) <
(c) R={a,b|@>0A5
@<>0A
0)5V
<0)
(d) R = {<a, b> | (a > 0 ∧ b > 0) ∨ (a < 0 ∧ b < 0)}
() R={ab(@<0Ab>0VaG<0 A)}
b<0
(ff) R={ab A | a
b>0V @
Ea A<>
0A0
5<0}
(g) R = {<a, b> | (a > 0 ∧ b > 0) ∨ (a < 0 ∧ b < 0) ∨ (a = b = 0)}
(h) R = {<a, b>|a divides b with 0 remainder}
(i) R= {<a, b>| |a — b| < 10}
(j) R = {<a, b> | ∃x[x ∈ I ∧ 10x < a < b < 10(x + 1)]}
(k) R = {<a, b> | ∃x[x ∈ I ∧ (10x < a < 10(x + 1)) ∧ (10x < b < 10(x + 1))]}

(l) R = {<a, b> | ∃x∃y[x ∈ I ∧ y ∈ I ∧ (10x < a < 10(x + 1)) ∧
(10y < b < 10(y + 1))]}
7. The following argument purports to prove that every symmetric and transitive
relation is an equivalence relation. Let R be a symmetric and transitive relation.
(i) Because R is symmetric, if <x, y> ∈ R, then <y, x> ∈ R.
(ii) Because R is transitive, if <x, y> ∈ R ∧ <y, x> ∈ R, then <x, x> ∈ R.
Therefore, R is reflexive and it follows that R is an equivalence relation.
What is wrong with the argument?
8. Prove Theorem 3.7.3.
9. Prove Theorem 3.7.4.
10. Let A = I. Define aRb if and only if a ≡ b mod (6). Describe A/R.
11. Let π₁ and π₂ be partitions of a nonempty set A. State which of the following are
always partitions of A, which may be partitions of A, and which are never
partitions of A. Justify your answers.
(a) π₁ ∪ π₂
(b) π₁ ∩ π₂
(c) π₁ − π₂
(d) [π₁ ∩ (π₂ − π₁)] ∪ π₁
12. Let R; and R, be equivalence relations on a nonempty set A. Determine which of
the following are equivalence relations on A. Provide counterexamples for those
which are not.
(a) (A × A) − R₁
(b) R₁ − R₂
(c) Ri
(d) r(R₁ − R₂) (the reflexive closure of R₁ − R₂)
(e) R,-Rz
13. Let A be a finite set with n elements and suppose π₁, π₂, π₃, ..., πₖ is a sequence of
partitions of A such that πᵢ₊₁ properly refines πᵢ. Find the maximum possible length
of the sequence.
14. Let A = I; define R₁, R₂, R₃ on A as follows:
aR₁b ⇔ a ≡ b mod (3),
aR₂b ⇔ a ≡ b mod (5),
aR₃b ⇔ a ≡ b mod (6).
(a) Draw a partial order diagram for the poset
<{A/R₁, A/R₂, A/R₃}, refines>.
†(b) Describe the equivalence relations induced by
(A/R₁)·(A/R₃), (A/R₁) + (A/R₃), (A/R₁)·(A/R₂), (A/R₁) + (A/R₂).
What are the ranks of these relations?
15. Let Rⱼ denote equivalence mod j and Rₖ denote equivalence mod k over I.
(a) Prove that I/Rₖ refines I/Rⱼ if and only if k is an integral multiple of j.
†(b) Describe the partition I/Rⱼ + I/Rₖ.
†(c) Describe the partition I/Rⱼ·I/Rₖ.

16. Prove Theorem 3.7.9.
17. Prove Theorem 3.7.11.
†18. Prove that if π₁ refines π₂, then π₁·π₂ = π₁ and π₁ + π₂ = π₂.
†19. Prove Theorem 3.7.15.
†20. Let P denote the set of all partitions of a nonempty set A, and consider the partially
ordered set <P, refines>. Let π₁ and π₂ be members of P.
(a) Show that π₁·π₂ is the greatest lower bound of the set {π₁, π₂}.
(b) Show that π₁ + π₂ is the least upper bound of the set {π₁, π₂}.

Programming Problem

Write a program which accepts as input the incidence matrix of a relation and
determines if the relation is an equivalence relation.

Suggestions for Further Reading

The text by Deo [1974] is an excellent treatment of graphs with special atten-
tion to problems of interest to computer science; the book by Busacker and Saaty
[1965] is an earlier work of the same nature which considers applications in a wide
variety of areas. Aho, Hopcroft, and Ullman [1974] present and analyze many
algorithms associated with sets, graphs, and trees. Knuth [1969] treats the general
topic of trees and Knuth [1975] analyzes search trees.
4

FUNCTIONS

4.0 INTRODUCTION

Functions are a special class of binary relations. We commonly think of a function
as an input-output relationship; that is, for every input, or argument, a function
produces an output, or a value. Functions are the basis of many of our most
powerful mathematical tools, and much of our knowledge in computer science is
conveniently codified by describing the properties of certain classes of functions.
The ability to use and analyze functions is an important skill throughout the field.
In this chapter we will define the general class of functions and several special
subclasses. The terminology we introduce is widely used in mathematics and com-
puter science.

4.1 BASIC PROPERTIES OF FUNCTIONS


A function from a set A to a set B is a rule which specifies an element of B for
each element of A. We will usually denote arbitrary functions by the letters f, g, and h.

Definition 4.1.1: Let A and B be sets. A function (or map, or transformation)
f from A to B, denoted f: A → B, is a relation from A to B such that for every
a ∈ A, there exists a unique b ∈ B such that <a, b> ∈ f. If <a, b> ∈ f, then we
write f(a) = b.

A function f from A to B is a binary relation from A to B with the special
properties that
(a) Every element of A occurs as the first component of an ordered pair of f.
(b) If f(a) = b and f(a) = c, then b = c.
The terminology associated with functions is consistent with that of relations;
if f is a function from A to B, then A is called the domain of the function f and B
is called the codomain of f. In the expression f(a) = b, a is called the argument of
the function and b is called the value of the function for the argument a.

To define a function we must specify the domain, the codomain, and the
value f(x) for each possible argument x. The notation f: A → B denotes that f
is a function with domain A and codomain B. The values of f(x) are specified by
a set of rules which cover all possible values of x, e.g.,
f: N → N,
f(x) = 1 if x is odd,
f(x) = x/2 if x is even.

If the domain of the function is finite, the function can be specified explicitly by
giving the values for all possible arguments, e.g.,
g: {1, 2, 3} → {A, B, C},
g(1) = A,
g(2) = C,
g(3) = C,
or by a digraph, e.g.,

[Digraph of a function from {1, 2, 3} to {A, B, C}]

Examples
(a) Let A = {a, b} and B = {1, 2, 3}. The following digraphs represent functions
from A to B.

[Digraphs omitted]

(b) Let A and B be as above. The following digraphs represent relations from A
to B which are not functions.

[Digraphs omitted]

(c) If A = ∅ and B is any set, then the empty relation is vacuously a function from
A to B. If A ≠ ∅ and B = ∅, then the only relation from A to B is the void
relation; but this relation is not a function from A to B. There are no functions
which have a nonempty domain and an empty codomain. #

Suppose f: A → B. We generally think of functions as mapping elements of
A to elements of B, but sometimes it is useful to think of f mapping subsets of A
to subsets of B. The following definition provides a convenient notation.

Definition 4.1.2: Let f be a function from A to B and let A′ be a subset of
the domain A. Then f(A′) denotes a subset of B, called the image of A′ under
f; f(A′) = {f(x) | x ∈ A′}. The image of the entire domain, f(A), is called the
image of the function f.

For any function f: A → B, Definition 4.1.2 implicitly specifies another function
F, where F: 𝒫(A) → 𝒫(B); that is, F maps subsets of the domain to subsets of
the codomain. For A′ ⊂ A, the set F(A′) is denoted by f(A′). Note that f and F
are not the same function; the domain and codomain of f are the sets A and B
while the domain and codomain of F are the sets 𝒫(A) and 𝒫(B). Thus the function
f maps the elements of A to elements of B, while the function F maps subsets of A
to subsets of B. This is illustrated by the following diagram:

In spite of the distinction between F and f, we will adopt the convention of using
f to represent both the original function f and the induced function F. This notation
is usually not ambiguous because the argument usually specifies which function
is intended.

Examples
(a) Suppose f: {0, 1, 2, 3} → {a, b, c} is defined by the following digraph:

[Digraph: 0 → b, 1 → a, 2 → c, 3 → b]

Then f({0, 1, 2, 3}) = {a, b, c},
f({2, 3}) = {b, c},
f({0}) = {b},
f({0, 3}) = {b}, and
f(∅) = ∅.
(b) Let f be a function from N to N such that f(x) = 1 for every odd integer x and
f(x) = x/2 for every even integer x. Then

f(0) = 0    f({0}) = {0}
f(1) = 1    f({1}) = {1}
f(2) = 1    f({0, 2, 4, 6, ...}) = N
f(3) = 1    f({4, 6, 8}) = {2, 3, 4}
f(4) = 2    f({1, 3, 5, 7, ...}) = {1}.

#
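
The induced function on subsets can be written once and applied to any function. The following Python sketch mirrors Definition 4.1.2 and recomputes some values of example (b); the helper name image is an assumption, and only finite subsets can be passed to it.

```python
def image(f, subset):
    """The image f(A') = {f(x) | x in A'} of a finite subset A'."""
    return {f(x) for x in subset}


def f(x):
    """The function of example (b): 1 on odd arguments, x/2 on even ones."""
    return 1 if x % 2 == 1 else x // 2


print(image(f, {4, 6, 8}))
print(image(f, {1, 3, 5, 7}))
print(image(f, set()))
```

Note that image(f, {0}) is the set {0} while f(0) is the number 0, which is the distinction between F and f drawn in the text.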
Binary relations are defined to be equal if they have the same domain and
codomain and are equal sets of ordered pairs. Because functions are relations, the
same definition holds for functions; two functions f and g are equal if and only if
their domains and codomains are equal and for every element a of their domain,
f(a) = g(a).
Since functions are relations, if g is a function from A to B and f is a function
from B to C, a composite relation exists from A to C. Furthermore, the next
theorem shows that this composite relation is itself a function. We will use the
standard notation and represent the composite function by f ∘ g or simply fg.†
Theorem 4.1.1: Let g: A → B and f: B → C be functions. Then the composite
function f ∘ g is a function from A to C, and (f ∘ g)(x) = f(g(x)) for all x ∈ A.
Proof: Since f and g are relations, f ∘ g is a relation from A to C. We must
establish that f ∘ g is also a function, that is, for every a ∈ A, there is a unique
c ∈ C such that <a, c> ∈ f ∘ g.
Since g is a function, for each a ∈ A there is a b ∈ B such that g(a) = b;
since f is a function, for each b ∈ B there is a c ∈ C such that f(b) = c. Because
<a, b> ∈ g and <b, c> ∈ f, it follows that <a, c> ∈ f ∘ g. Furthermore, b was
uniquely determined by the argument a for the function g, and c was uniquely
determined by the argument b for the function f. It follows that <a, c> is the only ordered
pair of the composite f ∘ g with a as the first element. Thus f ∘ g is a function and
(f ∘ g)(a) = c = f(b) = f(g(a)). ∎
Examples
(a) Let g: {0, 1, 2} → {a, b} and f: {a, b} → {A, B, C} be defined by the following
digraphs.

[Digraphs of g and f omitted]
†We adopt the usual convention of representing function composition with a symbol order
different from that used for composition of relations. If R₁ and R₂ are relations from A to B and
B to C respectively, then the composite relation is denoted by R₁R₂, while if f and g are functions
from A to B and B to C respectively, and we are using functional notation, then the composite
function is denoted by gf. The inconsistency is due to the convention of putting arguments
of functions to the right of the function symbol; if we wrote f(a) as (a)f, then we would write
(a)fg = ((a)f)g rather than gf(a).

Then f ∘ g: {0, 1, 2} → {A, B, C}, and can be represented as follows.

[Digraph of f ∘ g omitted]

(b) Let g: {0, 1, 2} → N be defined by g(x) = x and let f: N → N be defined by
f(x) = x. The composite g ∘ f is not defined because the domain of g is not
equal to the codomain of f. However, the composite f ∘ g is defined:

f ∘ g: {0, 1, 2} → N, and
f ∘ g(x) = x.
In this case, f ∘ g = g.

(c) Let g: N → N, where g(x) = 2x for x ∈ N, and let f: N → N, where f(x) =
x/2 if x is even, f(x) = 0 otherwise.
Both f ∘ g and g ∘ f are defined.

fg: N → N,
fg(x) = f(2x) = x.

gf: N → N,
gf(x) = g(x/2) = x if x is even,
gf(x) = g(0) = 0 if x is odd. #
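
Example (c) shows that fg and gf can both be defined and still differ. The Python sketch below checks this on a few arguments; the helper name compose is an assumption, and domains and codomains are not checked, so the caller must ensure that the composite is defined.

```python
def compose(f, g):
    """The composite f ∘ g, which applies g first and then f."""
    return lambda x: f(g(x))


def g(x):
    return 2 * x


def f(x):
    return x // 2 if x % 2 == 0 else 0


fg = compose(f, g)
gf = compose(g, f)

print([fg(x) for x in range(6)])   # fg is the identity on N
print([gf(x) for x in range(6)])   # gf sends odd arguments to 0
```

The argument order of compose follows the functional notation of the text: the function written on the right is applied first.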

Relationships between various functions and composite functions are often
represented using a commutative diagram. A commutative diagram is a diagram in
which arcs represent functions and nodes represent domains and codomains; an
arc labelled f from a node labelled A to a node labelled B indicates that f is a
function from A to B. A directed path from node A to node B represents the sequential
application of the functions which appear as labels on the path. For example,
the following diagram represents the assertion that f ∘ g(x) = f(g(x)).

[Commutative diagram: g from A to B, f from B to C, and f ∘ g from A to C]

By saying the above diagram commutes, we assert that going from $A$ to $B$ by $g$ and then from $B$ to $C$ by $f$ gives the same result as going from $A$ to $C$ by $f \circ g$; this result was established in Theorem 4.1.1.
In general, each path of a commutative diagram can be associated with the composite of functions which appear as labels on the path. If the diagram commutes, then different paths with the same initial and terminal nodes represent different descriptions of the same function. Thus the following commutative diagram asserts that $fg = hk$ for the maps $g: A \to B$, $f: B \to D$, $k: A \to C$, and $h: C \to D$.
198 FUNCTIONS Ch. 4

[Diagram: a square with arcs $g: A \to B$, $f: B \to D$, $k: A \to C$, and $h: C \to D$.]

When a composite function appears in a discussion, it is understood that the comments apply only in the case that the composite is defined. Using this convention, the following theorem is a special case of Theorem 3.4.2.

Theorem 4.1.2: Composition of functions is associative: if $f$, $g$ and $h$ are functions, then $(fg)h = f(gh)$.

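The assertion that composition is associative is equivalent to the assertion that the following diagram commutes.

[Diagram: arcs $h: A \to B$, $g: B \to C$, and $f: C \to D$, together with the composites $gh: A \to C$ and $fg: B \to D$.]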

If $f: A \to A$ for some set $A$, then the function $f$ can be composed with itself any number of times. The notation used to denote the repeated composition of $f$ with itself is defined inductively as follows:
1. $f^0(a) = a$,
2. $f^{n+1}(a) = f(f^n(a))$, for $n \in \mathbf{N}$.
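The inductive definition of repeated composition translates directly into a recursive procedure. The sketch below is illustrative only; the names `power`, `succ` and `double` are hypothetical and do not appear in the text.

```python
def power(f, n):
    """Return the n-fold composite of f with itself, following the inductive definition."""
    if n == 0:
        # Basis: the 0-fold composite maps each a to a.
        return lambda a: a
    previous = power(f, n - 1)
    # Induction: the n-fold composite applies f to the value of the (n-1)-fold composite.
    return lambda a: f(previous(a))

def succ(x):
    return x + 1

def double(x):
    return 2 * x

five_steps = power(succ, 5)(0)      # apply succ five times to 0
three_doubles = power(double, 3)(1)  # apply double three times to 1
```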
The set of all functions from a set $A$ to a set $B$ is often denoted by $B^A$; this notation has some useful properties which will become apparent later. If either of the sets $A$ or $B$ is the first $n$ natural numbers, $\{0, 1, 2, \ldots, n - 1\}$, then the set is often represented by the symbol $n$. For example, the set of all functions from a set $A$ to $\{0, 1\}$ is denoted by $2^A$ and the set of all functions from $\{0, 1, 2\}$ to a set $B$ is denoted by $B^3$. Thus, the notation $A^n$ may denote either the set of $n$-tuples of elements of $A$ or the set of all maps from $\{0, 1, 2, \ldots, n - 1\}$ to $A$†. No difficulties result from this ambiguous use of $A^n$ because there is a natural correspondence between the two possible meanings; defining this correspondence is an exercise in the next section.
The domain of a function is often a cartesian product of sets. A function $f$ with domain
$$\mathop{\times}_{i=1}^{n} A_i$$
is said to be a function of $n$ variables. The value of $f$ at $\langle x_1, x_2, \ldots, x_n \rangle$, where $x_i \in A_i$, will be denoted by $f(x_1, x_2, \ldots, x_n)$.

†The notation $A^n$ has still a third use in our text. If $A$ is a subset of $\Sigma^*$ for some alphabet $\Sigma$, then $A^n$ is used to denote the set product of $A$ with itself $n$ times (see Definition 2.7.5). The context will determine the intended meaning of $A^n$.

Example
Arithmetic operations such as addition, subtraction and multiplication are
examples of functions of two variables. These functions are commonly represented
by an infix notation; thus the function $+(x, y)$ is denoted by $x + y$. #

Inductively Defined Functions

When the domain of a function is an inductively defined set, induction often provides a convenient and powerful way of specifying the function. The definition of the function follows the definition of the domain in a natural way.

Examples
(a) The length of a word $x \in \Sigma^*$ can be inductively defined as a function from $\Sigma^*$ to $\mathbf{N}$. (The length of $x$ is denoted by $\|x\|$.)
1. (Basis) $\|\Lambda\| = 0$.
2. (Induction) If $x \in \Sigma^*$ and $a \in \Sigma$, and $\|x\| = n$, then $\|ax\| = n + 1$.
Note that no extremal clause is necessary here; the function has been defined for the entire domain $\Sigma^*$ because it follows the inductive definition of $\Sigma^*$ (Definition 2.5.2).
(b) The successor function $S: \mathbf{N} \to \mathbf{N}$ maps each integer $n \in \mathbf{N}$ into its successor, $n + 1$; i.e., $S(n) = n + 1$. Arithmetic operations on $\mathbf{N}$ can be defined inductively using the successor function; we illustrate with a definition of the operation of addition $+: \mathbf{N}^2 \to \mathbf{N}$.
1. (Basis) $+(m, 0) = m$ for $m \in \mathbf{N}$.
2. (Induction) $+(m, S(n)) = S(+(m, n))$ for $m, n \in \mathbf{N}$.
(c) The Fibonacci sequence
0, 1, 1, 2, 3, 5, 8, 13, 21, ...
has the property that each term after the second is the sum of the two preceding terms. This sequence arises in a number of contexts. It can be inductively defined as a function $F$ on $\mathbf{N}$ as follows:
1. (Basis) $F(0) = 0$, and $F(1) = 1$.
2. (Induction) $F(n + 2) = F(n + 1) + F(n)$ for all $n \in \mathbf{N}$. #
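Each of the three inductive definitions above can be transcribed almost clause for clause into a recursive procedure. The following Python sketch assumes words are represented as Python strings with the empty string standing for $\Lambda$; the function names `length`, `add` and `F` are choices made for this illustration.

```python
def length(word):
    """Length of a word: the empty word has length 0, and ax has length one more than x."""
    if word == "":
        return 0
    return length(word[1:]) + 1

def S(n):
    """Successor function on N."""
    return n + 1

def add(m, n):
    """Addition defined from S: +(m, 0) = m and +(m, S(n)) = S(+(m, n))."""
    if n == 0:
        return m
    return S(add(m, n - 1))

def F(n):
    """Fibonacci function: F(0) = 0, F(1) = 1, F(n + 2) = F(n + 1) + F(n)."""
    if n == 0:
        return 0
    if n == 1:
        return 1
    return F(n - 1) + F(n - 2)

first_terms = [F(n) for n in range(9)]
```

In each procedure the basis clause becomes the terminating case and the induction clause becomes the recursive call.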

In each of the above examples, the value of the function in the induction step is specified using values of the function for “earlier” arguments. A specification of $f(n)$ in terms of $f(k)$ for $k \ne n$ is called a recursion formula, and $f$ is said to be recursively defined. Not all recursively defined functions are defined inductively.

Example
The “91 function” is defined recursively (but not inductively) as follows:
$f: \mathbf{N} \to \mathbf{N}$,
$f(x) = x - 10$ if $x > 100$,
$f(x) = f(f(x + 11))$ if $x \le 100$.
This function has the property that $f(x) = 91$ for all $x$ such that $0 \le x \le 100$; otherwise, $f(x) = x - 10$. #
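The stated property of the 91 function can be checked by direct evaluation. The sketch below is a straightforward transcription of the recursion formula; the name `f91` is an assumption of this illustration, and an exhaustive check over 0 to 100 is of course not a proof.

```python
def f91(x):
    """The 91 function: x - 10 above 100, otherwise the doubly nested recursive call."""
    if x > 100:
        return x - 10
    return f91(f91(x + 11))

# Evaluate the function on every argument from 0 to 100.
values_up_to_100 = {f91(x) for x in range(101)}
```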

The mechanism we have described for defining a function on an inductively defined set does not guarantee that the result will be a function. Specifically, the result may not be a function when the inductive definition of the domain allows some elements to be constructed in more than one way. If the object defined satisfies the definition of a function, then we say the function is well-defined. When a function is defined recursively, it is often necessary to prove that the function is well-defined.

Example
Consider the set of arithmetic expressions $E$ defined inductively as follows:
1. Every digit (0 through 9) is an element of $E$.
2. If $X \in E$ and $Y \in E$, then $X - Y \in E$.
3. The set $E$ is the smallest set which satisfies clauses 1 and 2.
The above definition of $E$ allows the construction of some elements, such as $3 - 4 - 5$, in more than one way; in the inductive step one can either let $X$ be 3 and $Y$ be $4 - 5$ and then form $X - Y$, or $X$ can be $3 - 4$ and $Y$ can be 5. A function defined on $E$ following the inductive definition may or may not be well-defined. The following function $f$ is well-defined because the definition does, in fact, characterize a function on the elements of $E$. The function $f$ sums the digits which appear in an element of $E$.
$f: E \to \mathbf{N}$,
1. If $X \in E$ and $X$ is a digit, then $f(X) = X$.
2. If $X \in E$ and $Y \in E$, then $f(X - Y) = f(X) + f(Y)$.
Thus $f(3 - 4 - 5) = 12$.
The following definition of $g$ does not characterize a function.
$g: E \to \mathbf{N}$,
1. If $X \in E$ and $X$ is a digit, then $g(X) = X$.
2. If $X \in E$ and $Y \in E$, then $g(X - Y) = g(X) - g(Y)$.
The difficulty stems from the fact that subtraction is not associative, and consequently there are two possible values of the “function” $g$ for such expressions as $3 - 4 - 5$, namely:
$g(3 - 4 - 5) = g(3 - 4) - g(5) = (g(3) - g(4)) - g(5) = (3 - 4) - 5$,
$g(3 - 4 - 5) = g(3) - g(4 - 5) = g(3) - (g(4) - g(5)) = 3 - (4 - 5)$.
Thus, $g$ is a relation but not a function and we conclude $g$ is not well-defined. Note that by using parentheses in the inductive step of the definition of $E$, the difficulty can be eliminated. The inductive step would then read
2. If $X \in E$ and $Y \in E$, then $(X - Y) \in E$. #
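The difference between $f$ and $g$ can be made concrete by evaluating both on the two constructions of $3 - 4 - 5$. In the Python sketch below, each construction is represented as a nested pair `(X, Y)`; this tuple representation and the names `parse_a` and `parse_b` are assumptions made for the illustration.

```python
def f(expr):
    """Sum of the digits: f(X - Y) = f(X) + f(Y)."""
    if isinstance(expr, int):
        return expr
    left, right = expr
    return f(left) + f(right)

def g(expr):
    """Attempted definition: g(X - Y) = g(X) - g(Y)."""
    if isinstance(expr, int):
        return expr
    left, right = expr
    return g(left) - g(right)

parse_a = ((3, 4), 5)   # X is 3 - 4 and Y is 5
parse_b = (3, (4, 5))   # X is 3 and Y is 4 - 5

f_values = (f(parse_a), f(parse_b))   # the same value for both constructions
g_values = (g(parse_a), g(parse_b))   # different values, so g is not well-defined
```

Because the unparenthesized string $3 - 4 - 5$ does not record which construction produced it, only a rule that agrees on every construction, such as $f$, defines a function on $E$.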
Inductively defined functions can often be computed either iteratively or recursively. A program is said to compute a function iteratively if the computation for most arguments is done by the statements in a program loop. A program is said to compute the function recursively if the computation is done by a recursive procedure.

Example
The factorial function can be computed either iteratively or recursively. The following procedure computes $n!$ iteratively for any $n \in \mathbf{N}$; the value returned by ITERFACT for the argument $n$ is $n!$.

procedure ITERFACT(n):
begin
    p ← 1;
    for i ← 1 step 1 until n do p ← p * i;
    return p
end

The following subroutine computes $n!$ recursively.

procedure RECURFACT(n):
    if n = 0 then return 1 else return n * RECURFACT(n − 1)

Two things must be considered in choosing between an iterative and a recursive scheme for computing function values; they are the cost of the computation and the clarity of the algorithm. For something like $n!$, the iterative scheme is quite clear and likely to be somewhat cheaper to compute, but for more complex functions, the clarity of a recursive algorithm often outweighs any incremental cost of computation. #
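For readers who wish to run the two procedures, the following is one possible Python rendering. It is a sketch of the same two schemes, not the notation of the text; the lowercase names `iterfact` and `recurfact` are chosen for this illustration.

```python
def iterfact(n):
    """Compute n! iteratively: the work is done by the statements in a loop."""
    p = 1
    for i in range(1, n + 1):
        p = p * i
    return p

def recurfact(n):
    """Compute n! recursively: the work is done by a recursive call."""
    if n == 0:
        return 1
    return n * recurfact(n - 1)

# The two schemes agree on every argument tried here.
agree = all(iterfact(n) == recurfact(n) for n in range(10))
```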

Partial Functions

It is often convenient to consider a function from a subset $A'$ of $A$ to a set $B$ without exactly specifying the domain $A'$ of the function. Alternatively, we can view such a situation as one where a function has domain $A$ and codomain $B$, but the value of the function does not exist (is not defined) for some arguments. This is called a partial function.

Definition 4.1.3: Let $A$ and $B$ be sets. A partial function $f$ with domain $A$ and codomain $B$ is any function from $A'$ to $B$, where $A' \subset A$. For any $x \in A - A'$, the value of $f(x)$ is said to be undefined.

Note that if $f$ is a function from $A$ to $B$, then $f$ is a partial function from $A$ to $B$. To distinguish partial functions from functions, a function is sometimes called a total function for emphasis. We will always use the qualifier “partial” when referring to partial functions; the unqualified term “function” will be reserved to designate total functions.
The notation and theorems we have developed apply to partial functions in straightforward ways. For example, if $g$ and $f$ are partial functions from $A$ to $B$ and $B$ to $C$ respectively, then $fg$ is the partial function from $A$ to $C$ such that $fg(x)$ is defined if and only if $g(x)$ and $f(g(x))$ are both defined, and in that case, $fg(x) = f(g(x))$. We will not develop all the analogous terms and definitions for partial functions, although we will occasionally use them when their meaning is clear.

Examples
(a) The operation of taking a square root of a real number is a partial function from $\mathbf{R}$ to $\mathbf{R}$; $\sqrt{x}$ is undefined for $x < 0$.
(b) The partial function $f(x) = 1/x$ from $\mathbf{R}$ to $\mathbf{R}$ is undefined for the argument $x = 0$.
(c) The partial function f(x) = x from R to R is a total function.
(d) Computer programs represent partial functions. The input to a program is the
argument of the partial function, and the output of the program is the value
of the partial function. If the program does not terminate or if it terminates
abnormally (e.g., by attempting to execute an illegal operation such as divi-
sion by 0), then the partial function is undefined for the argument. Using the
output of one program as the input of another corresponds to composition
of the partial functions implemented by the programs. This view of programs
provides a basis (different from the one we described in Section 1.6) for inves-
tigating program correctness. The “meaning” of a program can be defined to
be the partial function it computes, and a program is correct if it computes
the intended partial function. The program will halt for all inputs if the partial
function is total. #
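The rule for composing partial functions can be sketched in code by letting a special value stand for “undefined”. In the Python sketch below, `None` plays that role; this encoding and the names `sqrt_partial`, `reciprocal` and `compose_partial` are assumptions of the illustration, and nontermination (the other source of undefinedness mentioned above) is not modelled.

```python
import math

def sqrt_partial(x):
    """Square root as a partial function on R: undefined (None) for x < 0."""
    return math.sqrt(x) if x >= 0 else None

def reciprocal(x):
    """1/x as a partial function on R: undefined (None) for x = 0."""
    return 1 / x if x != 0 else None

def compose_partial(f, g):
    """Return fg, defined at x exactly when g(x) and f(g(x)) are both defined."""
    def fg(x):
        y = g(x)
        return None if y is None else f(y)
    return fg

recip_of_sqrt = compose_partial(reciprocal, sqrt_partial)
```

Here `recip_of_sqrt` is undefined for negative arguments because the inner partial function is undefined there, and undefined at 0 because the outer one is undefined at the intermediate value.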

Problems: Section 4.1

1. Determine which of the relations represented by the following digraphs are functions from $A = \{a, b, c\}$ to $B = \{0, 1, 2\}$. For those that are functions, find the image of the subset $\{a, b\}$. For those that are not, state what properties of a function are not satisfied.
[Figure: digraphs (a) through (f), each relating the nodes $a$, $b$, $c$ to the nodes 0, 1, 2.]

2. Consider the following functions from $\mathbf{R}$ to $\mathbf{R}$:
$f(x) = x + 3$,
$g(x) = 2x + 4$,
$h(x) = x/2$,
$k(x) = x - 2$.
Construct a commuting diagram relating the functions $f$, $g$, $h$, and $k$.
3. Let $A = \{0, 1, 2\}$. Find all functions $f$ in $A^A$ for which
(a) $f^2(x) = f(x)$
(b) $f^2(x) = x$
(c) $f^3(x) = x$
4. Let $f$ be a function from $A$ to $A$. Prove that for all $m, n \in \mathbf{N}$, $f^m f^n = f^{m+n}$.
5. Let $\Sigma$ be a finite alphabet.
(a) Let $x \in \Sigma^*$ and $\|x\|$ denote the length of $x$. Prove that
$\forall x \forall y[(x \in \Sigma^* \land y \in \Sigma^*) \Rightarrow \|xy\| = \|x\| + \|y\|]$.
(b) The reversal of a string $x \in \Sigma^*$, denoted $\tilde{x}$, is defined inductively as follows:
1. $\tilde{\Lambda} = \Lambda$.
2. $\widetilde{ax} = \tilde{x}a$ where $a \in \Sigma$ and $x \in \Sigma^*$.
Prove
$\forall x \forall y[(x \in \Sigma^* \land y \in \Sigma^*) \Rightarrow \widetilde{xy} = \tilde{y}\tilde{x}]$.
6. Consider the set of functions $f_n: \mathbf{R}^n \to \mathbf{R}$ defined as follows:
$$f_n(x_1, x_2, \ldots, x_n) = \sum_{i=1}^{n} x_i^2.$$
Prove by induction on $n$ that $f_n(x_1, x_2, \ldots, x_n) \ge 0$. You may assume $x^2 \ge 0$ for any $x \in \mathbf{R}$ and if $x, y \ge 0$ then $x + y \ge 0$.
7. Define $f: \mathbf{N}^2 \to \mathbf{N}$ as follows.
1. $f(0, n) = 1$ for all $n \in \mathbf{N}$,
2. $f(m + 1, n) = f(m, n) \cdot n$.
Find an algebraic expression for $f$ and prove by induction that it represents $f$.
8. Using addition, inductively define a function $f: \mathbf{N}^2 \to \mathbf{N}$ such that $f(x, y) = x \cdot y$.
9. Write an iterative algorithm and a recursive algorithm to compute the value of $m + n$ for $m, n \in \mathbf{N}$. Your algorithm should use only the successor function, $S(n) = n + 1$, and the predecessor partial function $P$, where $P(S(n)) = n$ and $P(0)$ is undefined.
10. Write an iterative algorithm and a recursive algorithm to compute m” for m,n & N.
Assume only the operations of addition and multiplication together with the pre-
decessor partial function.
11. Let f be the “91 function” defined in this section.
(a) Show that f(99) = 91.
(b) Prove that f(x) = 91 for all x from 0 to 100.
12. Consider the following partial functions from R to R:

$g(x) = 1/x$,
$h(x) = x^2$,
$k(x) = \sqrt{x}$.
For each of the following composite partial functions, characterize the subset of $\mathbf{R}$ for which the partial function is defined, give an algebraic expression for the composite partial function, and characterize the image of the partial function.
(a) $gg$
(b) $hk$
(c) $kh$

Programming Problems

1. Write both iterative and recursive procedures which, when passed $n \in \mathbf{N}$, will return the $n$th element of the Fibonacci sequence.
2. Recursion can be used to define functions which grow very fast. Consider the function defined recursively as follows:
$A: \mathbf{N}^2 \to \mathbf{N}$,
$A(n, 0) = n + 1$,
$A(0, m + 1) = A(1, m)$,
$A(n + 1, m + 1) = A(A(n, m + 1), m)$.
Write a program to evaluate this function for any argument. Investigate the computing time required to calculate $A(0, 0)$, $A(1, 1)$, $A(2, 2), \ldots$. Warning: The time and storage required to compute $A(i, i)$ grow very fast as $i$ increases.

4.2 SPECIAL CLASSES OF FUNCTIONS

Certain properties of functions are sufficiently important that additional terminology has been developed to describe them.

Definition 4.2.1: Let $f$ be a function $f: A \to B$.
(a) $f$ is surjective (onto) if $f(A) = B$,
(b) $f$ is injective (one-to-one) if $a \ne a'$ implies $f(a) \ne f(a')$ (i.e., if $f(a) = f(a')$, then $a = a'$),
(c) $f$ is bijective (one-to-one and onto) if $f$ is both surjective and injective.
Functions with these properties are called surjections, injections, and bijections respectively.
If $f: A \to B$ is surjective, then every element $b \in B$ is in the image of $f$. If $f$ is injective, then different elements of the domain are mapped to different elements of the codomain. If $f$ is bijective, then $f$ effectively “pairs off” elements of $A$ and $B$ in what is often called a one-to-one correspondence; each element of $B$ is equal to $f(a)$ for exactly one $a \in A$.
Consider the digraph associated with a function $f: A \to B$. Since $f$ is a function, every element of $A$ is the origin of exactly one arc of the digraph. If $f$ is surjective, then at least one arc terminates at each element of $B$. If $f$ is injective, no more than one arc terminates at each element of $B$, and if $f$ is bijective, then exactly one arc terminates at each element of $B$.
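For functions on finite sets, the arc-counting description above yields simple tests. The Python sketch below represents a function as a dictionary from arguments to values, with the codomain supplied separately; this representation, the helper names, and the sample functions `f1`, `f2` and `f3` are assumptions of the illustration.

```python
def is_surjective(f, codomain):
    """At least one arc terminates at each element of the codomain."""
    return set(f.values()) == set(codomain)

def is_injective(f):
    """No more than one arc terminates at any element of the codomain."""
    return len(set(f.values())) == len(f)

def is_bijective(f, codomain):
    """Exactly one arc terminates at each element of the codomain."""
    return is_injective(f) and is_surjective(f, codomain)

f1 = {1: 0, 2: 0}          # from {1, 2} to {0}: surjective, not injective
f2 = {"a": 2, "b": 6}      # from {a, b} to {2, 4, 6}: injective, not surjective
f3 = {0: 1, 1: 0, 2: 2}    # from {0, 1, 2} to itself: bijective
```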

Examples
(a) The following digraphs illustrate the concepts of Definition 4.2.1. For each function, the domain and codomain are represented as columns of dots on the left and right sides respectively.

[Figure: three digraphs, labelled “Injective, not surjective”, “Surjective, not injective”, and “Bijective”.]

(b) Let $f: \{1, 2\} \to \{0\}$.
Since the codomain of $f$ is a singleton set, we need not specify $f$ in detail, because implicitly $f(1) = f(2) = 0$. The function $f$ is surjective but not injective.
(c) Let $f: \{a, b\} \to \{2, 4, 6\}$, where $f(a) = 2$ and $f(b) = 6$. The function $f$ is injective but not surjective since 4 is not the value of $f$ for any argument.
(d) Let $f: \mathbf{N} \to \mathbf{N}$, where $f(x) = 2x$.
This function is injective but not surjective; the image of $f$ is the set of even non-negative integers.
(e) Let $f: \mathbf{I} \to \mathbf{I}$, where $f(x) = x + 1$.
This function is bijective.
(f) Let $[a, b]$ denote the closed interval of real numbers, $[a, b] = \{x \mid a \le x \le b\}$, where $a < b$, and let $f: [0, 1] \to [a, b]$, where $f(x) = (b - a)x + a$. This function is a bijection.
(g) The empty relation is an injective function from an empty domain to an arbitrary codomain. If the codomain is also empty, then the function is a bijection.
(h) The properties of being injective, surjective and bijective all can be interpreted in terms of the graphs of functions from $\mathbf{R}$ to $\mathbf{R}$. Consider the following graphs of functions.

[Figure: graphs of $f(x) = x$, $f(x) = x^2$, $f(x) = 2^x$, and $f(x) = x^3 + 2x^2$.]

Since these are graphs of functions from $\mathbf{R}$ to $\mathbf{R}$, any vertical line will intersect the graph at exactly one point. If every horizontal line intersects the graph at least once, then the graph represents a surjective function. Thus, of the above functions, $f(x) = x$ and $f(x) = x^3 + 2x^2$ are surjective but the others are not. If no horizontal line intersects the graph more than once, then the function is injective. Thus, $f(x) = x$ and $f(x) = 2^x$ are injective but the others are not. If every horizontal line intersects the graph exactly once, then the function is bijective; $f(x) = x$ is bijective and the others are not.
(i) Sets of data records, called files, are often stored in tables or vectors; if $T$ denotes the table, then each record is located at some table address $T_i$. There are many ways of assigning a table address $T_i$ to each record. In one method, a hash function uses a part of each record called the key to compute a storage address for the record; hash functions are also called key transformations. For example, a company might use each employee’s social security number as the key to access the employee’s record. If the company has 400 employees, their records could be stored in a table with 500 entries by using the first 3 digits of the social security number mod 500 as the table address. Thus, if $f$ is the hash function, then
$f(136\ 29\ 4516) = 136$,
$f(729\ 00\ 0345) = 229$.
This hash function would map each social security number into an address $T_i$ where $0 \le T_i \le 499$. This particular hash function will probably be unsuitable because it is likely that too many records will be assigned to some table addresses. More suitable hash functions usually involve the entire key. An example of such a function for social security numbers would be

$f(x_1 x_2 x_3 \ldots x_8 x_9) = y_1 y_2 y_3$ where $y_1 = (x_1 + x_4 + x_7) \bmod 5$,
$y_2 = (x_2 + x_5 + x_8) \bmod 10$, and
$y_3 = (x_3 + x_6 + x_9) \bmod 10$,
e.g., $f(188\ 26\ 9416) = 253$.
Ideally, a hash function is an injection from the set of key values which occur in the file to the set of possible addresses. When two or more key values have the same hashed address, a collision is said to occur; a collision indicates that the function is not injective. It is generally not practical to design hash functions which are injective, and therefore provision must always be made for handling collisions. The number of collisions can usually be reduced by increasing the table size, but this also increases the amount of unused storage. In designing a hashing scheme, one must weigh the cost of handling collisions against the cost of empty storage. #
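Both hash functions described above are easy to compute directly. The following Python sketch assumes a social security number is given as a string of nine digits with optional spaces; the function names and the helper `collisions` are hypothetical and chosen for this illustration.

```python
def hash_first_three(ssn):
    """Table address: the first 3 digits of the key, mod 500."""
    digits = ssn.replace(" ", "")
    return int(digits[:3]) % 500

def hash_all_digits(ssn):
    """Table address y1 y2 y3 computed from all nine digits of the key."""
    x = [int(d) for d in ssn.replace(" ", "")]
    y1 = (x[0] + x[3] + x[6]) % 5
    y2 = (x[1] + x[4] + x[7]) % 10
    y3 = (x[2] + x[5] + x[8]) % 10
    return 100 * y1 + 10 * y2 + y3

def collisions(hash_fn, keys):
    """Group keys by hashed address and keep the addresses shared by two or more keys."""
    table = {}
    for key in keys:
        table.setdefault(hash_fn(key), []).append(key)
    return {addr: ks for addr, ks in table.items() if len(ks) > 1}

# Two made-up keys with the same leading digits collide under the first scheme.
sample_keys = ["136 29 4516", "136 00 0000", "729 00 0345"]
clashes = collisions(hash_first_three, sample_keys)
```

A nonempty result from `collisions` shows that the hash function is not injective on the given set of keys.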

Theorem 4.2.1: Let $fg$ be a composite function.
(a) If $f$ and $g$ are surjective, then $fg$ is surjective.
(b) If $f$ and $g$ are injective, then $fg$ is injective.
(c) If $f$ and $g$ are bijective, then $fg$ is bijective.
Proof: Let $g: A \to B$ and $f: B \to C$.
(a) (If $f$ and $g$ are surjective, then $fg$ is surjective.) Let $c$ be an element of $C$. Since $f$ is surjective, there is some element $b \in B$ such that $f(b) = c$. Since $g$ is surjective there is some element $a \in A$ such that $g(a) = b$. Then $fg(a) = f(g(a)) = f(b) = c$, and therefore $c \in fg(A)$. Since $c$ was arbitrary, this establishes part (a).
(b) (If $f$ and $g$ are injective, then $fg$ is injective.) Let $a$, $b$ be elements of $A$, and assume $a \ne b$. Since $g$ is injective, $g(a) \ne g(b)$. Since $f$ is injective and $g(a) \ne g(b)$, it follows that $f(g(a)) \ne f(g(b))$. Therefore, $a \ne b$ implies $fg(a) \ne fg(b)$, which establishes part (b).
(c) (If $f$ and $g$ are bijective, then $fg$ is bijective.) Since $f$ and $g$ are bijective, they are both surjective and injective. From parts (a) and (b), it follows that $fg$ is both surjective and injective and therefore bijective. ∎

Examples
(a) Let $A$ be the set of negative integers, and define the bijections $f$ and $g$ as follows.
$g: A \to \mathbf{I}^+$, where $g(x) = -x$;
$f: \mathbf{I}^+ \to \mathbf{N}$, where $f(x) = x - 1$.
Since $f$ and $g$ are bijections, the composite function $fg$ is a bijection and $fg(x) = -x - 1$.
(b) We will construct an injection from $[0, 1]$ to $(0, 1)$. Define $g: [0, 1] \to [0, \frac{1}{2}]$ by $g(x) = x/2$; and $f: [0, \frac{1}{2}] \to (0, 1)$ by $f(x) = x + \frac{1}{4}$. Then $fg: [0, 1] \to (0, 1)$ is the injection $fg(x) = x/2 + \frac{1}{4}$. The image of $[0, 1]$ is the interval $[\frac{1}{4}, \frac{3}{4}]$ which is contained in the open interval $(0, 1)$. #

The converse of each part of Theorem 4.2.1 is false, but the following theorem provides a “partial converse” to each of its assertions; its proof is left as an exercise.

Theorem 4.2.2: Let $fg$ be a composite function.
(a) If $fg$ is surjective, then $f$ is surjective.
(b) If $fg$ is injective, then $g$ is injective.
(c) If $fg$ is bijective, then $f$ is surjective and $g$ is injective.
The following classes of functions are also useful.

Definition 4.2.2: A function $f: A \to B$ is a constant function if there exists some $b \in B$ such that $f(a) = b$ for every $a \in A$, i.e., $f(A) = \{b\}$.

Definition 4.2.3: The identity function on $A$, denoted $1_A$, is the function on $A$ such that $1_A(a) = a$ for all $a \in A$.

Note that every identity function $1_A$ is a bijection. The next theorem asserts that if $f: A \to B$, then the identity function on $A$ is a “right identity” for $f$ and the identity function on $B$ is a “left identity” for $f$.

Theorem 4.2.3: Let $f: A \to B$. Then $f = f \circ 1_A = 1_B \circ f$.

The proof is left as an exercise.

The following commutative diagram represents Theorem 4.2.3.

[Diagram: nodes $A$ and $B$ with arcs labelled $f$ from $A$ to $B$, $1_A$ from $A$ to $A$, and $1_B$ from $B$ to $B$.]

Definition 4.2.4: A permutation on $A$ is a bijective function on $A$.

Examples
(a) The identity function on a set $A$ is a permutation on $A$.
(b) The function $f: \{0, 1, 2\} \to \{0, 1, 2\}$, where $f(0) = 1$, $f(1) = 0$ and $f(2) = 2$, is a permutation.
(c) The function $f: \mathbf{I} \to \mathbf{I}$, where $f(x) = x + 3$, is a permutation on the integers. #

The result of applying a permutation $f$ on $A$ to the entire domain $A$ is a “rearrangement” of $A$ where $a \in A$ is replaced by $f(a)$. A rearrangement of $A$ is often called a permutation of the set $A$. Since every permutation is a bijection and the composite of two bijections is a bijection, it follows that the composite of two permutations is a permutation. This can be expressed by saying that permutations are closed under (the operation of) composition.
When the domain and codomain of a function are linearly ordered, the following special terminology is used to describe functions which preserve or reverse the order of elements of the domain. We will state the definitions for functions from $\mathbf{R}$ to $\mathbf{R}$, but the concept generalizes in a straightforward way to other linearly ordered sets.

Definition 4.2.5: A function $f: \mathbf{R} \to \mathbf{R}$ is monotone increasing if $x \le y$ implies $f(x) \le f(y)$ and strictly monotone increasing if $x < y$ implies $f(x) < f(y)$. The function is monotone decreasing if $x \le y$ implies $f(x) \ge f(y)$ and strictly monotone decreasing if $x < y$ implies $f(x) > f(y)$.

If $f$ is strictly monotone increasing, then $f$ is monotone increasing; if $f$ is strictly monotone decreasing, then $f$ is monotone decreasing.

Examples
(a) Let $f: \mathbf{N} \to \mathbf{N}$ and $f(x) = x + 1$. Then $f$ is strictly monotone increasing.
(b) Any constant function on $\mathbf{R}$ is both monotone increasing and monotone decreasing.
(c) The function $f: \mathbf{R} \to \mathbf{R}$ such that $f(x) = x^2$ is neither monotone increasing nor monotone decreasing. #

Inverse Functions

If $f$ is a bijection from $A$ to $B$, then $f$ consists of a set of ordered pairs with the property that every element $a \in A$ appears exactly once as the first element of a pair and every element $b \in B$ appears exactly once as the second element of a pair. The converse relation, formed by reversing the ordered pairs of $f$, is a relation with the same properties, i.e., the converse of $f$ is a bijection from $B$ to $A$.

Definition 4.2.6: Let $f: A \to B$ be a bijection from $A$ to $B$. The inverse function of $f$, denoted $f^{-1}$, is the converse relation of $f$.

Note that the inverse function $f^{-1}$ is defined only if $f$ is a bijection.

Theorem 4.2.4: Let $f$ be a bijective function, $f: A \to B$. Then $f^{-1}$ is a bijective function and $f^{-1}: B \to A$.
Proof: Consider the sets of ordered pairs corresponding to $f$ and $f^{-1}$.
$f = \{\langle a, b \rangle \mid a \in A \land b \in B \land f(a) = b\}$,
$f^{-1} = \{\langle b, a \rangle \mid \langle a, b \rangle \in f\}$.
Since $f$ is surjective, every $b \in B$ occurs in an ordered pair $\langle a, b \rangle \in f$ and hence appears in an ordered pair $\langle b, a \rangle \in f^{-1}$. Furthermore, since $f$ is injective, for each $b \in B$ there is at most one $a \in A$ such that $\langle a, b \rangle \in f$; hence there is only one $a \in A$ such that $\langle b, a \rangle \in f^{-1}$. These two statements establish that $f^{-1}$ is a function and $f^{-1}: B \to A$.
We leave it as an exercise to show that $f^{-1}$ is bijective. ∎

The inverse function has the property that it can be composed with the function $f$ to form an identity function. For if $f(a) = b$, then $f^{-1}(b) = a$ and it follows that
$f^{-1}f(a) = f^{-1}(f(a)) = f^{-1}(b) = a$;
therefore, $f^{-1}f = 1_A$. Similarly,
$ff^{-1}(b) = f(f^{-1}(b)) = f(a) = b$,
which establishes that $ff^{-1} = 1_B$. Note that composing $f$ and $f^{-1}$ always results in an identity function but the domain may be either $A$ or $B$, depending on the order of the composition.
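For a finite bijection, the inverse function is obtained literally by reversing the ordered pairs. The Python sketch below represents a function as a dictionary; the name `inverse` and the particular bijection `f` are assumptions made for the illustration.

```python
def inverse(f, codomain):
    """Return the converse of f as a dictionary, provided f is a bijection onto codomain."""
    converse = {b: a for a, b in f.items()}
    # If f is not injective, two pairs collapse; if f is not surjective, some b is missing.
    if len(converse) != len(f) or set(converse) != set(codomain):
        raise ValueError("f is not a bijection, so the inverse function is not defined")
    return converse

f = {0: "b", 1: "c", 2: "a"}              # a hypothetical bijection from {0, 1, 2} to {a, b, c}
f_inv = inverse(f, {"a", "b", "c"})

left_is_identity = all(f_inv[f[a]] == a for a in f)        # composing f then f_inv fixes each a
right_is_identity = all(f[f_inv[b]] == b for b in f_inv)   # composing f_inv then f fixes each b
```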

Example
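Let $f: \{0, 1, 2\} \to \{a, b, c\}$ be defined by the following digraph:

[Figure: digraph of $f$ from $\{0, 1, 2\}$ to $\{a, b, c\}$.]

Then $f^{-1}$ is represented as follows:

[Figure: digraph of $f^{-1}$ from $\{a, b, c\}$ to $\{0, 1, 2\}$, obtained by reversing each arc of $f$.]

These functions can be composed to form $1_A$ and $1_B$.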

Theorem 4.2.5: If $f$ is bijective, then $(f^{-1})^{-1} = f$.

The proof is left as an exercise.

Definition 4.1.2 established a notation for the image of a subset $A' \subset A$ under a map $f: A \to B$. This notation defined
$f(A') = \{f(x) \mid x \in A'\}$.
A similar notation is used to denote the set of elements in $A$ which are mapped to a subset $B' \subset B$.

Definition 4.2.7: Let $f: A \to B$, and let $B' \subset B$. Then $f^{-1}(B')$ denotes a subset of $A$ called the inverse image or pre-image of $B'$ under $f$:
$f^{-1}(B') = \{x \mid f(x) \in B'\}$.
Just as the symbol $f$ denotes a function from $\mathcal{P}(A)$ to $\mathcal{P}(B)$ when it is written with an argument $A' \subset A$, the symbol $f^{-1}$ denotes a function from $\mathcal{P}(B)$ to $\mathcal{P}(A)$ when it is applied to an argument $B' \subset B$. Thus the notation $f^{-1}$ is used to denote both the inverse function of a bijective function $f$ and the inverse image of a set under an arbitrary function $f$. The notation $f^{-1}$ is ambiguous only when the argument is both an element of the codomain and a subset of it. In most cases, the argument of $f^{-1}$ specifies whether an inverse function or an inverse image of a set is intended.

Examples
(a) Consider the function represented by the following digraph:

[Figure: digraph of a function $f$ with domain $\{0, 1, 2, 3\}$.]

Then $f^{-1}(\{a\}) = \{0\}$, $f^{-1}(\{a, b\}) = \{0, 1, 2\}$, $f^{-1}(\{c, d\}) = \{3\}$, and $f^{-1}(\{d, e\}) = \emptyset$. Note that $f$ does not have an inverse function.

(b) It is possible for the notation $f^{-1}$ to be ambiguous. Suppose $f: A \to B$, where $A = \{X, Y\}$, $B = \{1, \{1\}\}$ and $f(X) = 1$, $f(Y) = \{1\}$. Then, using the inverse function of the bijection $f$,

$f^{-1}(\{1\}) = Y$.
But, using the induced function from $\mathcal{P}(B)$ to $\mathcal{P}(A)$,

$f^{-1}(\{1\}) = \{X\}$. #
If $A \ne \emptyset$ and $f: A \to B$, then the collection of sets $\{f^{-1}(\{b\}) \mid b \in B\}$ forms a partition of $A$, and the associated equivalence relation is known as the equivalence relation induced by $f$. Two elements are equivalent under this relation if the function $f$ maps them to the same element of $B$.

Theorem 4.2.6: Let $f: A \to B$ and define the binary relation $\sim$ on $A$ as follows:
$a \sim b \Leftrightarrow f(a) = f(b)$.
Then $\sim$ is an equivalence relation on $A$.

The proof is left as an exercise.

Example
Let $A = \{1, 2, 3, 4\}$, $B = \{a, b, c\}$, and $f: A \to B$.
If $f(1) = a$, $f(2) = b$, $f(3) = c$ and $f(4) = c$, then the equivalence relation on $A$ induced by $f$ has equivalence classes $\{1\}$, $\{2\}$, and $\{3, 4\}$. #
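Inverse images and the equivalence classes induced by a function can both be computed by a single pass over the ordered pairs of a finite function. The Python sketch below uses the function of the preceding example; the dictionary representation and the names `preimage` and `induced_classes` are assumptions of the illustration.

```python
def preimage(f, subset):
    """Inverse image of a subset of the codomain: all x with f(x) in the subset."""
    return {x for x, y in f.items() if y in subset}

def induced_classes(f):
    """Equivalence classes of the relation induced by f: x ~ y when f(x) = f(y)."""
    classes = {}
    for x, y in f.items():
        classes.setdefault(y, set()).add(x)
    return list(classes.values())

f = {1: "a", 2: "b", 3: "c", 4: "c"}

pre_c = preimage(f, {"c"})
classes = sorted(sorted(block) for block in induced_classes(f))
```

Each equivalence class is the inverse image of a singleton subset of the image of $f$.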

Definition 4.2.8: Let $R$ be an equivalence relation on a set $A$. The function
$g: A \to A/R$,
$g(a) = [a]_R$,
is the canonical map from $A$ to the quotient set $A/R$.

Example
Let $A = \{1, 2, 3\}$ and let $\sim$ be an equivalence relation on $A$ with equivalence classes $\{1, 2\}$ and $\{3\}$. Then the canonical map from $A$ to $A/\sim$ is the function $g$ defined as follows:
$g: \{1, 2, 3\} \to \{\{1, 2\}, \{3\}\}$,
$g(1) = \{1, 2\}$, $g(2) = \{1, 2\}$, $g(3) = \{3\}$. #

The following definitions give us additional facilities for creating and modifying functions. The first definition allows us to form a new function by deleting part of the domain of a given function.

Definition 4.2.9: Let $f: A \to B$, and let $A'$ be a subset of the domain of $f$. The restriction of $f$ to $A'$ is the function denoted $f|_{A'}$ and defined as
$f|_{A'}: A' \to B$,
$f|_{A'}(x) = f(x)$.
The next definition enables us to enlarge the domain of a function.

Definition 4.2.10: Let $f: A' \to B$, $g: A \to B$, and $A \supset A'$. Then $g$ is an extension of $f$ to the domain $A$ if $g|_{A'} = f$.

Examples
Let $f$ and $g$ be defined by the following digraphs.

[Figure: digraphs of $f$, with domain $\{1, 2, 3, 4\}$, and $g$, with domain $\{2, 3, 4\}$.]

Then $g = f|_{\{2, 3, 4\}}$ and $f$ is an extension of $g$ to the domain $\{1, 2, 3, 4\}$. #

The following class of functions provides a way to specify sets using functions.

Definition 4.2.11: Let $A$ be a set. For every set $A' \subset A$, the characteristic function (with domain $A$) of the set $A'$, denoted $\chi_{A'}$, is defined as follows:
$\chi_{A'}: A \to \{0, 1\}$,
$\chi_{A'}(a) = 1$ for $a \in A'$,
$\chi_{A'}(a) = 0$ for $a \notin A'$.
The domain of a characteristic function is not specified by the notation $\chi_{A'}$ and is usually implicit in the discussion.

Examples
(a) Let $A = \{a, b, c\}$ and let $A' = \{a\}$. Then
$\chi_{A'}(a) = 1$,
$\chi_{A'}(b) = 0$,
$\chi_{A'}(c) = 0$.
(b) Let $A = [0, 1]$ and $A' = [\frac{1}{2}, 1]$. The following is a graph of the function $\chi_{A'}$.

[Figure: graph of $\chi_{A'}$ on the interval $[0, 1]$.] #
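A characteristic function specifies its set completely, since the set can be recovered as the inverse image of $\{1\}$. The Python sketch below illustrates this for the finite set of example (a); the name `characteristic` is an assumption of the illustration.

```python
def characteristic(subset):
    """Return the characteristic function of the given subset as a 0/1-valued function."""
    return lambda a: 1 if a in subset else 0

A = {"a", "b", "c"}
chi = characteristic({"a"})

values = {a: chi(a) for a in A}
# The subset is exactly the set of arguments that the characteristic function maps to 1.
recovered = {a for a in A if chi(a) == 1}
```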
†One-Sided Inverse Functions

Earlier in this section we established that if $f: A \to B$ is a bijective function, then an inverse function $f^{-1}$ is defined and $f^{-1}f = 1_A$ and $ff^{-1} = 1_B$. In the first case above, we say $f^{-1}$ is acting as a left inverse and in the second case as a right inverse. Because $f^{-1}$ acts as both a left inverse and a right inverse, it is sometimes called, for emphasis, a two-sided inverse. Only bijections have a two-sided inverse, but some other functions possess one-sided inverses. The existence of a left or a right inverse is determined by whether the function is injective or surjective.

Definition 4.2.12: Let $h: A \to B$ and $g: B \to A$. If $gh = 1_A$, then $g$ is a left inverse of $h$ and $h$ is a right inverse of $g$.

A function $g$ is a left inverse of $h$ if applying the function $g$ will “undo” the effect of the function $h$; thus, the composite function $gh$ maps each element of the domain of $h$ to itself. Similarly, a function $h$ is a right inverse for $g$ if applying $h$ before $g$ will nullify the effect of $g$.

Theorem 4.2.7: Let $f: A \to B$, with $A \ne \emptyset$. Then
(a) $f$ has a left inverse if and only if $f$ is injective.
(b) $f$ has a right inverse if and only if $f$ is surjective.
(c) $f$ has a left and right inverse if and only if $f$ is bijective.
(d) If $f$ is bijective, then the left and right inverses of $f$ are equal.

The following illustration is appropriate to part (a) of the theorem.

[Figure: digraphs of $f: A \to B$ and of two left inverses $g$ and $h$ from $B$ to $A$.]

Let $A = \{a, b\}$ and $B = \{0, 1, 2\}$, and let $f: A \to B$. The function $f$ has two distinct left inverses which we have named $g$ and $h$. If $f$ is injective, a left inverse may always be formed by mapping each element $f(a)$ in the image of $f$ back to $a$ and mapping each element of $B$ which is not in $f(A)$ to some arbitrary element of $A$. If $f$ were not injective, there would be two elements $a, a' \in A$ such that $a \ne a'$ and $f(a) = f(a')$. Thus $f$ would “merge” two elements of $A$ and no left inverse would exist.

Proof of (a): We first establish that if a left inverse exists for $f$, then $f$ is injective. Suppose $g$ is a left inverse for $f$. Then $gf = 1_A$, which is injective. It follows from Theorem 4.2.2b that $f$ is injective.
We next use a constructive proof to show that if $f$ is injective, then there exists a left inverse $g$. Choose an arbitrary element $c \in A$ and define $g$ as follows:

$g: B \to A$,
$g(b) = a$ if $b \in f(A)$ and $f(a) = b$,
$g(b) = c$ if $b \notin f(A)$.
The function $g$ is well-defined, since exactly one value is specified for each argument $b \in B$. Furthermore, $g$ is a left inverse of $f$ since if $f(a) = b$, then $gf(a) = g(f(a)) = g(b) = a$.
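The constructive step of the proof can be carried out mechanically for finite sets. The Python sketch below builds a left inverse from an injective function given as a dictionary; the name `left_inverse` and the particular values chosen for `f` are assumptions of the illustration, and the arbitrary element $c$ of the proof is passed as a parameter.

```python
def left_inverse(f, codomain, c):
    """Build g with g(f(a)) = a for every a, sending elements outside the image of f to c."""
    if len(set(f.values())) != len(f):
        raise ValueError("f is not injective, so no left inverse exists")
    g = {b: c for b in codomain}   # g(b) = c for every b not in the image of f
    for a, b in f.items():
        g[b] = a                   # g(b) = a whenever f(a) = b
    return g

f = {"a": 0, "b": 2}               # an injective map from {a, b} to {0, 1, 2}
g = left_inverse(f, {0, 1, 2}, "a")
h = left_inverse(f, {0, 1, 2}, "b")

g_undoes_f = all(g[f[a]] == a for a in f)
h_undoes_f = all(h[f[a]] == a for a in f)
```

The two choices of $c$ give two distinct left inverses, which differ only on the element of $B$ outside the image of $f$.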
The following illustration is appropriate to part (b) of the theorem.

[Figure: arrow diagrams of f from A to B and of two right inverses g and h from B to A]

Let A = {a, b, c} and B = {0, 1}. The function f is a surjection from A to B,
and f has two distinct right inverses, g and h. Since f is surjective, a right inverse
can be formed by mapping each $b \in B$ to some $a \in A$ such that f(a) = b. If f
is not surjective, then such a construction is not possible and the right inverse does
not exist. The proof of (b) is left as an exercise. Part (c) follows immediately from
parts (a) and (b). We now prove part (d).
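
The construction of a right inverse for a surjection on finite sets can be sketched similarly. The function name `right_inverse` and the specific surjection `f` below are hypothetical choices for illustration; any preimage of b could be selected, and this sketch simply keeps the first one it meets.

```python
def right_inverse(f, domain, codomain):
    """Return h: codomain -> domain with f[h[b]] == b for every b in codomain."""
    h = {}
    for a in domain:
        h.setdefault(f[a], a)          # keep the first preimage found for f(a)
    missing = [b for b in codomain if b not in h]
    if missing:
        raise ValueError(f"f is not surjective; no preimage for {missing}")
    return h


A = ["a", "b", "c"]
B = [0, 1]
f = {"a": 0, "b": 0, "c": 1}           # hypothetical surjection, for illustration only
h = right_inverse(f, A, B)
print(h)
```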

Proof of (d): Suppose f is bijective with a right inverse h and a left inverse
g; then $g \circ f = 1_A$ and $f \circ h = 1_B$. From Theorem 4.2.3,
$g = g \circ 1_B = g \circ f \circ h = 1_A \circ h = h$. ∎

If f is surjective, we denote a right inverse of f by $f^{-R}$; if f is injective, we
denote a left inverse by $f^{-L}$. We will continue to refer to a two-sided inverse of f
as the inverse of f and denote it by $f^{-1}$.

Problems: Section 4.2

1. For each of the following functions determine
(i) whether the function is injective, surjective, or bijective,
(ii) the image of the function,
(iii) the inverse image of the given set S,
(iv) the equivalence relation induced by the function, and
(v) an expression for $f^{-1}$ if f is bijective.
(a) $f: R \to R$, f(x) = x, S = {8}.
(b) $f: R \to R^{+}$, $f(x) = 2^x$, S = {1}.
(c) $f: N \to N \times N$, f(n) = (n, n + 1).
(d) $f: N \to N$, f(n) = 2n + 1.
(e) $f: I \to N$, f(x) = |x|, S = {1, 0}.
(f) $f: (0, 1] \to [0, 1]$, f(x) = x/2 + 1/4, S = [0, 1/2].
(g) $f: R \to R$, f(x) = 3, S = N.
(h) $f: [0, \infty) \to R$, f(x) = 1/(1 + x), S = {0, 1/2}.
(i) f: fa, b}* — fa, b}*, G) ff: (0, 1) > (, 0),
I(x) = xa, I(x) = I/x,
S = {A, 5, bat. S = (0, 1).
2. Under what conditions is the length function which maps $\Sigma^*$ to N a bijection?
3. Let A be an arbitrary set and $n \in N$. Define S to be the set of all maps from
{0, 1, 2, ..., n − 1} to A, and define T to be the set of all n-tuples of elements of
A, $T = \{\langle a_0, a_1, a_2, \ldots, a_{n-1} \rangle \mid a_i \in A\}$. Show there exists a “natural” bijection
from S to T. Because of this bijection, the notation $A^n$ is used to denote both of the
sets S and T.
4. (a) Find a set A and functions $f, g \in A^A$ such that f is injective and g is surjective
but neither is bijective. Choose A as small as possible.
(b) Prove that if $f \in A^A$ and f is injective (surjective; bijective) then $f^n$ is also
injective (surjective; bijective) for all $n \in N$.
5. Let A and B be finite sets. Suppose A has m elements and B has n elements. State the
relationship which must hold between m and n for each of the following to be true.
(a) There exists an injection from A to B.
(b) There exists a surjection from A to B.
(c) There exists a bijection from A to B.
6. Prove there exists an injection from A to $\mathcal{P}(A)$ where A is an arbitrary set.

7. For each of the following sets A and B, construct a bijection from A to B.
(a) A = {0, 1, 2}, B = {a, b, c}.
(b) A = (0, 1), B = (0, 2).
(c) A = I, B = N.
(d) A = N, B = $N \times N$.
(e) A = $I \times I$, B = N.
(f) A = R, B = $(0, \infty)$.
(g) A = (−1, 1), B = R.
(h) A = O({a, b, c), B= 2%
(i) A = N, B = $\Sigma^*$, where $\Sigma = \{a, b\}$.
(j) A= (0,1), B= G, 4).
8. Prove Theorem 4.2.2.
9. Prove Theorem 4.2.3.

10. Let f and g be monotone increasing functions on R.
(a) Show f + g is monotone increasing.
(b) Show the composite fg is monotone increasing.
(c) Show that the product of f and g may not be monotone increasing.
11. Let $f: A \to B$ where $C \subset A$ and $D \subset B$.
(a) Prove $f(A) - f(C) \subset f(A - C)$.
Under what conditions do the following equalities hold?
(b) $f^{-1}(B - D) = A - f^{-1}(D)$.
(c) $f(C \cap f^{-1}(D)) = f(C) \cap D$.
12. Let $f: A \to B$, $B' \subset B$, $A' \subset A$. Show that
(a) $f(f^{-1}(B')) \subset B'$.
(b) If f is surjective, then $f(f^{-1}(B')) = B'$.
(c) $f^{-1}(f(A')) \supset A'$.
(d) If f is injective, then $f^{-1}(f(A')) = A'$.
13. Complete the proof of Theorem 4.2.4.
14. Prove Theorem 4.2.5.
15. Let $f_1, f_2, f_3, f_4$ be the following functions from R to R:
$f_1(x) = 1$ if $x \geq 0$,
$f_1(x) = -1$ if $x < 0$.
$f_2(x) = x$.
$f_3(x) = -1$ if $x \in I$,
$f_3(x) = 1$ if $x \notin I$.
$f_4(x) = 1$.
Let $E_i$ be the equivalence relation induced by the function $f_i$.
(a) Draw a digraph which represents the following poset:
$\langle \{R/E_1, R/E_2, R/E_3, R/E_4\}, \text{refines} \rangle$
(b) For each i, find the image of 0 under the canonical map from R to $R/E_i$.
(c) Is the digraph of part (a) connected? Strongly connected?
16. Let f be a function from A to B where A has n > 2 elements. State necessary
conditions on B and f for which the rank of the equivalence relation induced by f on A
is
(a) 1
(b) 2
(c) A
17. Let R be an equivalence relation on a set A. Under what conditions is the canonical
map g: A — A/R a bijection?
18. Prove Theorem 4.2.6.
19. (a) Prove that if $f: A \to B$ is injective and A′ is any subset of A, then
$f|_{A'}: A' \to B$ is an injection.
(b) Suppose $f: A' \to B$ is a surjection. Prove that if g is an extension of f to
$A \supset A'$, then $g: A \to B$ is a surjection.
(c) Prove if $f: A \to B$ is a surjection, then there exists $A' \subset A$ such that
$f|_{A'}: A' \to B$ is a bijection.
20. Verify the following for the characteristic functions of subsets A and B of C.
(a) Xa) = Xam — Lal).
(b) $\chi_{A \cup B}(x) = \chi_A(x) + \chi_B(x) - \chi_A(x)\chi_B(x)$.
(c) $\chi_{A \cap B}(x) = \chi_A(x)\chi_B(x)$.
$21. Determine left and/or right inverses for the following functions when they exist.
Specify the equivalence relation induced on the domain by the function. In each
case, construct the canonical map.

[Figure: arrow diagrams of the functions (a) through (e)]

$22. Complete the proof of Theorem 4.2.7.

Suggestions for Further Reading

The material in this chapter is classical and treated, at least briefly, in a num-
ber of books. The first two chapters of the text by MacLane and Birkhoff [1967]
will provide a distinct but related development of much of the material of our
Chapters 2, 3, and 4, along with some of the material of our Chapter 7.
5

COUNTING AND ALGORITHM ANALYSIS

5.0 INTRODUCTION

In order to compare, evaluate, and predict, we must often count the objects in a
finite set. For example, one way to compare the cost of applying two algorithms
is to determine, or at least estimate, how many operations each of them executes
when solving a problem. This is often done by counting only certain kinds of
operations which are executed by the algorithms. Thus, the cost of a direct method
for solving sets of simultaneous linear equations can be estimated by counting the
number of multiplications and divisions executed by the algorithm. The cost of
some sorting algorithms can be estimated by counting the number of comparisons
made between data items. The cost of using a particular data structure for a file
can be estimated by determining the average and maximum lengths of searches
for items stored in the data structure. Problems such as these ultimately involve
either counting (exactly or approximately) the elements of a set or enumerating
the elements of a set which have a common property. This chapter first introduces
some basic techniques for counting and enumerating the elements of finite sets;
we then illustrate how these techniques can be applied to the analysis of algorithms.

5.1 BASIC COUNTING TECHNIQUES

In this section, we will introduce some basic techniques of counting. We begin by
introducing the concept of the cardinality of a finite set. The cardinality of a finite
set is simply the number of elements in the set. The definition we give below is
chosen so that it can be extended to infinite sets as well.

Definition 5.1.1: A set A is finite if there is some natural number $n \in N$
such that there is a bijection from the set {0, 1, 2, ..., n − 1} to the set A. The
integer n is called the cardinality of A, and we say “A has n elements,” or “n is the
cardinal number of A.” The cardinality of A is denoted by |A|.

Example
Let A = {a, b, c}. Then the cardinal number of A is 3, i.e., |A| = 3, since the
function
$f: \{0, 1, 2\} \to A$,
f(0) = a, f(1) = b, f(2) = c,
is a bijection from the first three natural numbers to A. #

The special case of the cardinality of the empty set deserves mention. As we
noted in Section 4.2, an “empty” function (consisting of the empty set of ordered
pairs) is an injection from the empty set to any set A, and if A is empty, then this
function is a bijection. Consequently, our definition states that a set A has cardi-
nality 0 if there is a bijection from the first zero natural numbers to A. But the set
consisting of the first zero natural numbers is empty, and a bijection will exist if
and only if A is empty. We conclude that |A| = 0 if and only if $A = \emptyset$.
We now introduce a fundamental rule of counting known as the “pigeonhole
principle.” Informally, the pigeonhole principle asserts that if m objects are placed
in n boxes (or pigeonholes) and m > n, then some box will contain more than one
object. This principle, which we will not prove, can be stated more formally as
follows.

Pigeonhole Principle: If A and B are finite sets with |A| =m and |B|=n
and m > n, then no injection exists from A to B.
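
For very small sets the principle can be checked by exhaustive enumeration. The following Python sketch is an illustration rather than a proof; it represents a function from {0, ..., m − 1} to {0, ..., n − 1} as a tuple of images, and the helper name `injections` is an assumption made for this example.

```python
from itertools import product


def injections(m, n):
    """Yield every injection from {0..m-1} to {0..n-1} as a tuple of images."""
    for images in product(range(n), repeat=m):
        if len(set(images)) == m:      # all images distinct means injective
            yield images


count_3_to_2 = sum(1 for _ in injections(3, 2))   # m > n
count_2_to_3 = sum(1 for _ in injections(2, 3))   # m <= n
print(count_3_to_2, count_2_to_3)
```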

When an intuitive notion, such as the size of a set, is characterized by means
of a mathematical definition, it is important to verify that the properties of the
mathematical characterization agree with our intuitive concept. The next theorem
has this purpose; it uses the pigeonhole principle to prove that a finite set has only
one cardinal number.

Theorem 5.1.1: Let A be a finite set. Then the cardinality of A is unique.

Proof: Suppose |A| = m and |A| = n; we will show that m = n. Assume
that m > n. Then by the pigeonhole principle, there is no injection from A to A.
But $1_A$ is a bijection from A to A. Thus, the assumption that m > n leads to a
contradiction. Similarly, the assumption that n > m will lead to a contradiction.
Hence, m = n. ∎

The proof of the following theorem is left as an exercise.

Theorem 5.1.2: Let A and B be finite sets, and suppose there is a bijection
from A to B. Then |A| = |B|.

Two additional principles are fundamental for counting sets which have been
formed by using the operations of union and cartesian product. We have implicitly
used these principles in earlier chapters, but for the sake of completeness, we will
state them as theorems about the cardinalities of sets; their proofs are left as exer-
cises. The first principle is called the Rule of Sum.

Theorem 5.1.3: If A and B are finite disjoint sets with cardinalities m and n
respectively, then $|A \cup B| = m + n$.

The second fundamental principle of counting is known as the Rule of Product.

Theorem 5.1.4: If A and B are finite sets with cardinalities m and n
respectively, then $|A \times B| = mn$.

Examples
(a) Suppose statement labels in a programming language must be either a single
alphabetic symbol or a single decimal digit. The first set, {A, B, C, ..., Z},
has 26 elements, and the second set, {0, 1, 2, ..., 9}, has ten elements. Because
the two sets are disjoint, the rule of sum can be applied, and we conclude that
there are 26 + 10 = 36 possible statement labels.
(b) A variable name in the programming language BASIC must be either an
alphabetic symbol or an alphabetic symbol followed by a single decimal digit.
If S denotes the set of alphabetic symbols and D denotes the set of digits, there
is a one-to-one correspondence between the variable names and the set
$S \cup (S \times D)$. By the rule of product, there are 26 · 10 = 260 elements in
$S \times D$, and hence by the rule of sum there are 26 + 260 = 286 possible
variable names in BASIC.
(c) Consider the puzzle sometimes called the “four cubes problem.” It involves
four cubes such that each face of every cube is painted one of four colors. The
problem is to stack the cubes in such a way that each vertical side of the stack
contains squares of all four colors.
The order of the cubes in the stack is clearly unimportant, and we do not
wish to distinguish between arrangements which are identical except for rota-
tion. We can count the number of significantly different arrangements as
follows:
1. The first cube can be positioned in any of three different ways because
there are three pairs of faces which can be made the top and bottom
surfaces.
2. For each remaining cube, one of the six faces must be chosen as the
bottom and then one of four possible rotational positions must be chosen.
This gives 24 different ways to position each of the last three cubes in the
stack.
Thus there are 3 · 24 · 24 · 24 = 41,472 different arrangements, making an
exhaustive search costly. For a discussion of how to solve the problem (easily!)
by constructing a graph with 4 nodes and 12 edges, the reader is referred to
Deo [1974], p. 18, or Busacker and Saaty [1965], p. 153. #
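
The three counts in the examples above can be reproduced mechanically. The Python sketch below is an illustration only: it builds the label and variable-name sets explicitly as strings (an encoding chosen for this example) and applies the rules of sum and product as ordinary arithmetic.

```python
from itertools import product
import string

letters = set(string.ascii_uppercase)            # {A, B, ..., Z}
digits = set(string.digits)                      # {0, 1, ..., 9}

# (a) Rule of sum: the two sets are disjoint, so sizes add.
labels = letters | digits

# (b) Rule of product for S x D, then rule of sum for S union (S x D).
letter_digit_names = {s + d for s, d in product(letters, digits)}
basic_names = letters | letter_digit_names

# (c) Four cubes: 3 positions for the first cube, 24 for each of the others.
cube_arrangements = 3 * 24 ** 3

print(len(labels), len(basic_names), cube_arrangements)
```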

We will now develop several basic counting results, all of which are based
on the rules of sum and product.

Theorem 5.1.5: Let A and B be finite sets with cardinalities m and n
respectively. There are $n^m$ functions from A to B, i.e.,

$|B^A| = |B|^{|A|}$.

Proof: If $A = \emptyset$, then the assertion holds since we define $n^0 = 1$ for all
$n \in N$. No functions exist from A to B if B is empty and A is not. If both A and B
are nonempty, then index the elements of A in some arbitrary fashion with the
first m natural numbers: $a_0, a_1, a_2, \ldots, a_{m-1}$. Each element of A can be mapped
to any of n elements of B. Thus, there are n possible values of $f(a_0)$, n possible
values of $f(a_1)$, etc. It follows that there are $n \cdot n \cdots n$ (m factors), or $n^m$,
functions. Hence, $|B^A| = |B|^{|A|}$. ∎
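
The counting argument of the proof corresponds directly to enumerating one image for each element of A. The following Python sketch, with the hypothetical helper `all_functions`, lists every function from A to B as a dictionary; it is meant only to make the count $n^m$ concrete for small sets.

```python
from itertools import product


def all_functions(A, B):
    """Return every function from A to B, each represented as a dict."""
    A = list(A)
    # Choosing one image in B for each of the m elements of A gives n**m tuples.
    return [dict(zip(A, images)) for images in product(list(B), repeat=len(A))]


functions = all_functions("xyz", [0, 1])
empty_domain = all_functions([], [0, 1])     # only the empty function
empty_codomain = all_functions("x", [])      # no functions at all
print(len(functions), len(empty_domain), len(empty_codomain))
```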

Example
Assume we wish to represent integers using sequences of n digits, where each
digit is one of b distinct symbols, $b \geq 2$. Choosing the symbol set to be

{0, 1, 2, ..., b − 1},

each n-digit sequence of symbols can be associated in a natural way with exactly
one function $f: \{0, 1, 2, \ldots, n-1\} \to \{0, 1, 2, \ldots, b-1\}$. Thus, there is a
bijection from the set of all such sequences to $\{0, 1, 2, \ldots, b-1\}^{\{0, 1, 2, \ldots, n-1\}}$.
By Theorem 5.1.5, there are $b^n$ functions from {0, 1, 2, ..., n − 1} to
{0, 1, 2, ..., b − 1}, and therefore we can represent $b^n$ distinct integers. In the case
of the standard positional number notation in base b, where the sequence

$a_{n-1} a_{n-2} a_{n-3} \cdots a_1 a_0$

represents the number

$a_{n-1} b^{n-1} + a_{n-2} b^{n-2} + \cdots + a_1 b^1 + a_0 b^0$,

each sequence of length n represents an integer greater than or equal to 0 and less
than $b^n$. #
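
As a small check of this example, the sketch below evaluates every n-digit sequence in base b and confirms that the values are exactly the integers from 0 to $b^n - 1$. The function name `value` and the choice b = 3, n = 4 are assumptions made for illustration.

```python
from itertools import product


def value(digits, b):
    """Evaluate digits (a_{n-1}, ..., a_1, a_0) as a base-b positional numeral."""
    total = 0
    for d in digits:
        total = total * b + d          # Horner's rule for the positional sum
    return total


b, n = 3, 4
values = sorted(value(seq, b) for seq in product(range(b), repeat=n))
print(len(values), values[0], values[-1])
```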

We proved the following assertion inductively in Section 2.5. Here the result
follows as a special case of the preceding theorem.

Corollary 5.1.5: If A is a finite set, there are $2^{|A|}$ distinct subsets of A.
Proof: For each subset $A' \subset A$, let $\chi_{A'}$ be the characteristic function of A′:
$\chi_{A'}: A \to \{0, 1\}$,
$\chi_{A'}(x) = 1$ if $x \in A'$,
$\chi_{A'}(x) = 0$ otherwise.
For every pair of subsets B, C contained in A, $\chi_B = \chi_C$ if and only if B = C. Hence,
there are as many subsets of A as there are characteristic functions defined on A,
and by Theorem 5.1.5, this number is $2^{|A|}$. ∎

Permutations and Combinations

Recall that a permutation of a set is a bijection from the set to itself and that
the set of permutations of a set is closed under function composition, i.e., if f and
g are permutations of a set A, then $f \circ g$ and $g \circ f$ are permutations of A.

Theorem 5.1.6: Let A be a finite set with n elements. The number of distinct
permutations of A is n!.
Proof: If $A = \emptyset$, then there is one bijection of A to A, namely the empty
function. Thus, if |A| = 0, there is 0! = 1 permutation of A. If A is not empty,
then let $a_0, a_1, a_2, \ldots, a_{n-1}$ be an arbitrary arrangement of the elements of A.
A function $f: A \to A$ can be defined by first choosing $f(a_0)$, then $f(a_1)$, and so on.
If f is a bijection on A, then there are n choices for $f(a_0)$, n − 1 choices for $f(a_1)$,
n − 2 choices for $f(a_2)$, and in general n − i choices for $f(a_i)$. Applying the rule
of product, it follows that the number of possible bijections is
$n(n-1)(n-2) \cdots 3 \cdot 2 \cdot 1 = n!$. ∎
The permutations of a set can be put in a one-to-one correspondence with
ordered arrangements of the elements of the set. Let $a_0, a_1, \ldots, a_{n-1}$ be some
arbitrary but fixed arrangement of the elements of a finite set A. Then any
arrangement of the elements of A can be associated with a bijection from A to A; the
arrangement $a'_0, a'_1, \ldots, a'_{n-1}$ corresponds to the permutation $f: A \to A$ where
$f(a_i) = a'_i$.
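
The correspondence between arrangements and bijections can be made concrete with Python's `itertools.permutations`. In this sketch each arrangement of a fixed list A is turned into a dictionary sending the ith element of A to the ith element of the arrangement; the set A, the helper `compose`, and the two sample indices are arbitrary choices for illustration.

```python
from itertools import permutations
from math import factorial

A = ["a", "b", "c", "d"]                      # the fixed arrangement a_0, ..., a_{n-1}
bijections = [dict(zip(A, arrangement)) for arrangement in permutations(A)]


def compose(f, g):
    """Return the composite f o g, i.e. the map x -> f[g[x]]."""
    return {x: f[g[x]] for x in g}


f, g = bijections[5], bijections[17]
fg = compose(f, g)                            # closure: fg is again a permutation
print(len(bijections), factorial(len(A)))
```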
Examples
(a) Suppose a list is to be formed from n distinct items. If we distinguish between
different orderings of the items, then there are n! different lists which can be
formed.
(b) Let A and B be finite sets. How many bijections are there from A to B? If
$|A| \neq |B|$, then no bijections exist from A to B. If |A| = |B| = n, then there
are n! bijections. #

Consider a process which selects r objects sequentially from a set of n objects.
If each element of the set is eligible to be chosen repeatedly, then the process is
said to be a selection with replacement. Thus, if one were drawing items with
replacement from a jar, each time an item is drawn from the jar, its identity would
be noted and then it would be replaced in the jar, making it a candidate for future
draws. If r drawings are made from a jar with n objects and the output of the
process is taken to be the resulting sequence of r objects (i.e., an r-tuple of objects
from the set), then a selection with replacement has $n^r$ possible values, each of
which is an r-tuple, $\langle a_1, a_2, \ldots, a_r \rangle$. (Note that if r = 0, then $n^r = 1$; there is
only one sequence of 0 length.)

Now suppose the selection process is one in which each item can be selected
at most once; in this case, the process is said to be a selection without replacement.
The sequence which results from a selection without replacement of r objects from
n objects, where $r \leq n$, is called a permutation of n objects taken r at a time. A
permutation of n objects taken r at a time is an r-tuple, $\langle a_1, a_2, \ldots, a_r \rangle$, such that
each $a_i$ is one of n objects and if $i \neq j$, then $a_i \neq a_j$.

Theorem 5.1.7: The number of permutations of n objects taken r at a time,
denoted P(n, r), is equal to $n(n-1)(n-2) \cdots (n-r+1)$:

$P(n, r) = \frac{n!}{(n-r)!}$.

Proof: If r = 0, then P(n, r) = 1 because there is only one empty sequence.
Suppose r > 0. Then there are n possible values for the selection of the first of
r objects from n objects. Since selection is without replacement and one object
has been chosen, there are only n − 1 possible values for the selection of the
second object. Similarly, there are n − i + 1 possible values for the selection of
the ith object for all i, $1 \leq i \leq r$. By the rule of product, we have
$P(n, r) = n(n-1)(n-2) \cdots (n-r+1) = \frac{n!}{(n-r)!}$. ∎

Examples
(a) Let $\Sigma = \{a, b, c, d, e\}$. Find the number of strings in $\Sigma^*$ of length 3 such that
no symbol is used more than once. This is the number of permutations
of 5 things taken 3 at a time because selection is without replacement, and
P(5, 3) = 5 · 4 · 3 = 60.
(b) Find the number of injections from a finite set A to a finite set B. If |A| > |B|,
there are no injections from A to B (this follows from the pigeonhole principle).
If $|A| \leq |B|$, then the number of injections is P(|B|, |A|). #
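
Both examples can be checked numerically. The sketch below uses `math.perm` from the Python standard library (available in Python 3.8 and later) for P(n, r); the brute-force helper `count_injections` is a hypothetical name introduced only for this comparison.

```python
from itertools import permutations, product
from math import perm

# (a) Strings of length 3 over {a, b, c, d, e} with no repeated symbol.
sigma = "abcde"
strings = ["".join(p) for p in permutations(sigma, 3)]


# (b) Injections from an m-element set to an n-element set, by brute force.
def count_injections(m, n):
    """Count tuples of m distinct images drawn from {0..n-1}."""
    return sum(1 for images in product(range(n), repeat=m)
               if len(set(images)) == m)


print(len(strings), perm(5, 3))
print(count_injections(2, 4), perm(4, 2))
print(count_injections(3, 2))          # |A| > |B|: no injections
```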

Consider a process which selects a subset of r objects from a set of n objects,
ignoring the order in which the objects are selected. If the selection is without
replacement, the result is called a combination of n objects taken r at a time. The
number of ways in which such a selection can be made is called a binomial coefficient
and is denoted by either $\binom{n}{r}$ or C(n, r). The value of $\binom{n}{r}$ is the number of
distinct subsets of cardinality r which are contained in a set of size n. Clearly
$\binom{n}{0} = 1$, since there is only one empty subset of any collection of n objects, and
$\binom{n}{n} = 1$ since there is only one way to choose the entire set of n objects. If
r < 0 or r > n, we define $\binom{n}{r}$ to be 0. The next theorem provides a general
expression for $\binom{n}{r}$ when $0 \leq r \leq n$.



Theorem 5.1.8: Let $r, n \in N$ and $r \leq n$. The number of combinations of n
things taken r at a time is

$\binom{n}{r} = \frac{n!}{r!\,(n-r)!}$.

Proof: An ordered list of r elements can be formed by first choosing r elements
and then ordering them. Consequently, the number of lists of r elements,
P(n, r), is equal to the number of ways of choosing a subset of r elements, $\binom{n}{r}$,
times the number of ways of arranging the r elements in a list, r!. Thus

$P(n, r) = \binom{n}{r} \cdot r!$

and therefore

$\binom{n}{r} = \frac{P(n, r)}{r!} = \frac{n!}{r!\,(n-r)!}$. ∎

Note that $\binom{n}{r} = \binom{n}{n-r}$. This equality can be understood by considering
the ways of choosing a subset B of r elements from a set A of n elements. Each
possible choice of r elements to be included in B corresponds to exactly one choice
of (n − r) elements to be excluded from B.

Theorem 5.1.9: For every integer $n \geq 0$, $\sum_{r=0}^{n} \binom{n}{r} = 2^n$.

Proof: Let A be a finite set with cardinality n. Then the number of distinct
subsets of A with r elements is $\binom{n}{r}$, and the total number of subsets is
$\sum_{r=0}^{n} \binom{n}{r}$. By Corollary 5.1.5, the number of subsets of A is $2^n$. ∎
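
The identities of Theorems 5.1.8 and 5.1.9, together with the symmetry noted above, can be spot-checked for a particular n. The sketch below counts r-element subsets by enumeration with `itertools.combinations` and compares against `math.comb` and `math.perm` (Python 3.8 and later); the value n = 6 is an arbitrary choice.

```python
from itertools import combinations
from math import comb, factorial, perm

n = 6
# counts[r] = number of r-element subsets of an n-element set, by enumeration.
counts = [len(list(combinations(range(n), r))) for r in range(n + 1)]

formula = [factorial(n) // (factorial(r) * factorial(n - r)) for r in range(n + 1)]
ordered = [counts[r] * factorial(r) for r in range(n + 1)]    # should equal P(n, r)
total = sum(counts)                                           # should equal 2**n

print(counts, total)
```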

Counting techniques often enable us to identify algorithmic solutions which
are theoretically correct but infeasible because of the magnitude of the
computational task. The following problem illustrates this kind of difficulty.

Example: The Traveling Salesman Problem


A salesman wishes to visit each of n cities, beginning and ending in City #1.
There is a road between every two cities, and we denote by $c_{ij}$ the distance between
the ith and jth cities. The problem is to devise an algorithm which will find the
shortest route the salesman can take.
The traveling salesman problem is mathematically equivalent to many prob-
lems of considerable practical importance. For example, consider the scheduling
problem of a large computer system: In what order should a set of computer pro-
grams be run? Each job requires certain resources, such as a compiler in main
memory, a segment of main memory, and some set of disk and tape drives. Each
combination of resources required by a program corresponds to a city to be visited
by the salesman, and City #1 corresponds to the initial configuration of the system.
The conversion of the system from one configuration $C_i$ to another, $C_j$, does not
produce useful output; the cost of this conversion, denoted $c_{ij}$, is part of system
overhead. The total system overhead depends on the order in which the jobs are
run. For example, if two programs both require an ALGOL compiler, running one
program after the other will often eliminate the cost of bringing the compiler into
core the second time. This is the reason for “batch processing” programs written
in a single language. An algorithm to solve the traveling salesman problem would
enable us to specify the sequence of jobs which will minimize the total system
overhead for running the programs.
The set of n cities can be thought of as a complete digraph of n nodes; the values
$c_{ij}$ represent the distances between the nodes. If the triangle inequality holds, then
the shortest route will visit each city other than $C_1$ only once, and so the only
routes of interest are the simple cycles beginning and ending with $C_1$. It follows
that there are (n − 1)! possible routes for the salesman. The most straightforward
way of finding the shortest route would be to list all (n − 1)! cycles and then
calculate the total distance associated with each cycle. Such a process of “complete
enumeration” has the virtue of being easily programmed, but the problems of
using such an algorithm become apparent if we consider an example for which the
number of nodes is not small.
Finding the total distance for a single route will involve n additions. Since there
are (n − 1)! possible routes, the total number of additions is n!. Suppose there are
50 nodes. The value of 50! is approximately $3 \times 10^{64}$. Even assuming a computer
which performs $10^9$ additions per second, it will take more than $10^{47}$ years just to
perform the additions required by the algorithm. #
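
Complete enumeration is indeed easy to program. The following Python sketch is one possible rendering, not an algorithm given in the text: city 0 plays the role of City #1, the 4-city distance matrix `c` is made-up data for illustration, and the final lines redo the 50-node estimate under the stated assumption of $10^9$ additions per second.

```python
from itertools import permutations
from math import factorial


def shortest_tour(c):
    """Try all (n-1)! cycles through city 0; c[i][j] is the distance from i to j."""
    n = len(c)
    best_length, best_route = None, None
    for order in permutations(range(1, n)):
        route = (0,) + order + (0,)
        # One route costs n additions: n legs, including the return to city 0.
        length = sum(c[route[k]][route[k + 1]] for k in range(n))
        if best_length is None or length < best_length:
            best_length, best_route = length, route
    return best_length, best_route


c = [[0, 2, 9, 10],
     [1, 0, 6, 4],
     [15, 7, 0, 8],
     [6, 3, 12, 0]]                     # hypothetical distances, not from the text
print(shortest_tour(c))

additions = factorial(50)               # n! additions for n = 50
years = additions / 1e9 / (365 * 24 * 3600)
print(f"{additions:.2e} additions, about {years:.1e} years")
```

Even for this toy instance the loop examines 3! = 6 cycles; each additional city multiplies the work by roughly the number of cities.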

The straightforward algorithms for solving the traveling salesman problem are
easily written but impractical for large values of n. This is because the number of
operations required to solve the problem by complete enumeration grows very
fast as the number of nodes increases. In practice, the number of arithmetic
operations can be reduced by eliminating duplications, and the size of the problem can
sometimes be reduced by constraining the set of acceptable solutions or by using
heuristic methods which consider only some of the cycles. For any but small values
of n, one or more of these techniques must be incorporated if an algorithm is to be
economically feasible. Depending on the exact techniques chosen, however, the
resulting algorithm may not be guaranteed to produce the shortest route, but
rather the shortest of all those routes considered by the algorithm.

Decision Trees

Digraphs are often useful for counting and enumeration problems. In model-
ing system behavior, a state graph, or state diagram, is a digraph in which each
node represents one state of a system, and each edge represents a possible transi-
tion from one state to another. Each node of a state graph is labeled with a state
name, and the edges are labeled with the input or action which causes the transi-
tion.
We often wish to consider systems in which every sequence of transitions
causes the system to enter a unique state. If we consider only states which are
accessible from some given initial state, then the state graph is a tree whose root
represents the initial state; such trees are often called decision trees. For some
problems, decision trees provide a convenient way of enumerating the set of
possible histories of a solution procedure. Each internal node of a decision tree
corresponds to a partial solution; each leaf corresponds to a solution. Every
internal node is associated with a test to obtain additional knowledge, and each branch
outward from a node is labeled with a distinct test outcome. Viewed in terms of
its decision tree, execution of a solution procedure corresponds to traversing a
path from the root to a leaf. The length of the path traversed is equal to the number
of tests made by the solution procedure, and the height of the tree is equal to the
maximum number of tests required by any execution of the procedure.
As an illustration of the use of decision trees, suppose we are given eight
coins, exactly one of which is known to be counterfeit and heavier than the others.
We are asked to find the counterfeit coin using only a pan balance to compare the
weights of two sets of coins. Each weighing has three possible outcomes: the left
pan can go down (indicating that the coins in the left pan weigh more than those
in the right), the pans can remain level, or the right pan can go down. For
convenience in describing solution procedures for this problem, we assume the coins
are indexed from 1 to 8.
A binary solution procedure is one in which each test has two possible
outcomes. Figure 5.1.1 is a decision tree for a binary solution procedure to find the
counterfeit coin. In this algorithm, coins 1 through 4 are first weighed against coins
5 through 8 to determine which set of four contains the heavy coin. The set with
the heavy coin is then divided in half and the process repeated. This algorithm,
which requires three weighings to locate the counterfeit coin, is not very efficient,
since the coins are never weighed in a way which permits the pans of the balance to
remain level; thus, one of the possible test outcomes can never occur.

[Figure: binary decision tree whose root compares {1, 2, 3, 4} vs. {5, 6, 7, 8}]

Fig. 5.1.1 Decision tree of a binary solution procedure using a
pan balance to find a counterfeit coin known to be heavy. Each
node is labeled with the set of coins which is known to contain the
counterfeit coin and the two sets of coins which are to be compared
in the next step of the algorithm. The left branch corresponds to the
left pan going down, and the right branch corresponds to the right
pan going down.

A ternary solution procedure involves tests with as many as three possible
outcomes. Figure 5.1.2 is a decision tree for a ternary solution procedure to find
the counterfeit coin. By exploiting the fact that each weighing can result in any
of three outcomes, this procedure reduces the number of weighings to two. A third
solution procedure, which requires from one to four weighings, is represented by
the decision tree of Fig. 5.1.3.
Efficiency is a prime consideration in algorithm selection but comparisons of
algorithms must be made with respect to the particular problem to be solved. For
example, if a heavy coin is known to exist and is most likely either coin 1 or coin 2,
then the algorithm of Fig. 5.1.3 may be preferred. But in the absence of informa-
tion to the contrary, we commonly assume that all possible outcomes are equally
likely. In this case, we often prefer a procedure in which the maximum number of
steps executed by the algorithm is as small as possible. A minimax procedure is one
which minimizes the maximum number of steps required to solve the problem.
When a solution procedure is represented by a decision tree, the height of the tree
is the maximum number of steps that can be executed. It follows that the height of
the decision tree of a minimax procedure is no greater than the height of a decision
tree for any algorithm which solves the problem. The algorithm represented by
Fig. 5.1.2 is minimax for the counterfeit coin problem in which one of eight coins
is known to be heavy.
Basic counting techniques can often be used to find bounds on the number
of steps of the minimax solution of a task.

[Figure: ternary decision tree. The root {1, 2, ..., 8} compares {1, 2, 3} vs. {6, 7, 8};
the second-level nodes are {1, 2, 3} comparing {1} vs. {3}, {4, 5} comparing {4} vs. {5},
and {6, 7, 8} comparing {6} vs. {8}; the leaves are {1}, {2}, {3}, {4}, ∅, {5}, {6}, {7}, {8}.]

Fig. 5.1.2. Decision tree for a ternary solution procedure using a
pan balance to find a counterfeit coin known to be heavy. A node
with the label ∅ denotes an outcome which cannot occur.

Fig. 5.1.3. Decision tree for using a pan balance to find a counter-
feit coin known to be heavy.
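
The ternary procedure for eight coins can be simulated directly. The sketch below assumes the weighings of Fig. 5.1.2 are {1, 2, 3} vs. {6, 7, 8} first, followed by {1} vs. {3}, {4} vs. {5}, or {6} vs. {8}; the pan balance is modelled by a hypothetical function `weigh` that is told which coin is heavy.

```python
def weigh(left, right, heavy):
    """Model a pan balance: -1 if the left pan goes down, 1 if the right, 0 if level."""
    if heavy in left:
        return -1
    if heavy in right:
        return 1
    return 0


def find_heavy(heavy):
    """Locate the heavy coin among coins 1..8 using exactly two weighings."""
    first = weigh({1, 2, 3}, {6, 7, 8}, heavy)
    if first == -1:
        x, y, z = 1, 3, 2              # compare {1} vs {3}; level means coin 2
    elif first == 1:
        x, y, z = 6, 8, 7              # compare {6} vs {8}; level means coin 7
    else:
        x, y, z = 4, 5, None           # compare {4} vs {5}; level cannot occur
    second = weigh({x}, {y}, heavy)
    return {-1: x, 1: y, 0: z}[second]


results = [find_heavy(k) for k in range(1, 9)]
print(results)
```

Every one of the eight possibilities is resolved by two calls to `weigh`, matching the height of the ternary tree.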

Example
Suppose you are given 13 coins where at most one is counterfeit; a counterfeit
coin is either heavier or lighter than a genuine one. Consider the problem of
devising a minimax algorithm to detect the counterfeit coin if one exists, and state
whether it is heavier or lighter. The algorithm should only use a pan balance for
comparisons.
Analysis: To determine a lower bound on the height of a decision tree for the
problem, we begin by arranging the coins in some fixed but arbitrary order. Any
one of 27 conditions may exist. For some i between 1 and 13, the ith coin may be
counterfeit and either heavy or light; hence, if there is a counterfeit coin, then one
of 26 conditions is possible. A 27th condition occurs if no coin is counterfeit. Con-
sequently, the decision tree of a solution procedure must have at least 27 leaves.
Since each weighing will have one of three possible outcomes, k weighings can
yield any of 3* different results, from which we must infer which of 27 conditions
holds, It follows that a minimum value of & can be obtained from the inequality

3* > 27.
Thus, we have found a lower bound for the number of weighings necessary; k >-3.
In fact, three weighings will not suffice. This can be shown by considering the
number of cases which must still be distinguished after the initial weighing. If the
initial weighing compares coins 1 through 4 with coins 5 through 8 and the weights
are equal, then there are still 11 conditions which may hold: any of coins 9 through
13 may be heavy, or light, or all may be equal. But there are only nine possible
outcomes for two weighings; thus, two weighings are not sufficient to distinguish
among the remaining eleven possible conditions. It follows that any algorithm
which uses an initial weighing which compares two sets of four coins will require
more than three weighings to distinguish some of the conditions.
Now suppose the initial weighing compares coins 1 through 5 with 6 through
10. If the weights are not equal, then any one of ten conditions may hold since any
of the coins on the light side may be light or any of those on the heavy side may be
heavy. Again, two additional weighings are not sufficient to determine which of
these conditions holds. In a similar way, it can be shown that any other initial
weighing will leave too many conditions to be resolved by the last two weighings.
This establishes that three weighings are not sufficient and therefore the height of
a decision tree for this problem must be at least four. #
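The counting argument in this example can be checked mechanically. The following Python sketch is an illustration only; the function names are hypothetical, and it assumes (as the analysis does) that the first weighing places the same number j of coins on each pan. It computes the lower bound from $3^k \ge 27$ and then shows that every choice of j leaves more than nine conditions in the worst case.

```python
def weighings_lower_bound(num_conditions):
    """Smallest k with 3**k >= num_conditions (each weighing has 3 outcomes)."""
    k = 0
    while 3 ** k < num_conditions:
        k += 1
    return k


def worst_case_conditions(num_coins, j):
    """Conditions left, in the worst case, after weighing j coins against j others.

    Unbalanced: one of the j coins on the heavy side is heavy, or one of the
    j coins on the light side is light (2*j conditions).
    Balanced: one of the unweighed coins is heavy or light, or no coin is
    counterfeit (2*(num_coins - 2*j) + 1 conditions).
    """
    unbalanced = 2 * j
    balanced = 2 * (num_coins - 2 * j) + 1
    return max(unbalanced, balanced)


num_coins = 13
print(weighings_lower_bound(2 * num_coins + 1))
for j in range(1, num_coins // 2 + 1):
    print(j, worst_case_conditions(num_coins, j))
```

Two further weighings distinguish at most nine outcomes, so any first weighing whose worst case exceeds nine rules out a three-weighing procedure.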

Problems: Section 5.1

1. Let the alphabet $\Sigma$ be the set $\Sigma$ = {a, b, c}.
(a) How many strings of length 4 can be written using the symbols of $\Sigma$?
(b) How many strings of length 4 beginning with the letter “c” can be written using
the symbols of $\Sigma$?
(c) How many strings of length 4 beginning with either “a” or “b” and which contain
exactly one occurrence of “c” can be written using symbols of $\Sigma$?
(d) How many strings of length 4 beginning with either “a” or “b” and which contain
at least one occurrence of “c” can be written using symbols of $\Sigma$?
2. Let A and B be finite sets, |A| = m and |B| = n. How many binary relations are
there from A to B?
3. Let S be the set of kth degree polynomials over a single variable. How many elements
of S have integer coefficients such that the magnitude of each coefficient is not greater
than some given $n \in N$ and the coefficient of $x^k$ is not zero?
4. How many distinct ways are there to encode the decimal digits 0-9 as binary
sequences of length 4? Consider only codes which represent different digits by
different sequences.
5. (a) Find an expression for the number of integers from 0 to $10^{10}$ whose decimal
representations contain no 1’s or 2’s.
(b) How many odd 3-digit numbers less than 300 can be formed from the 5 digits
0, 1, 3, 5, 7?
6. Count the number of ways 5 people can be arranged
(a) in a row of 5 chairs.
(b) in a circle of 5 chairs. (Rotations of a given arrangement are not considered to
be distinct.)
7. A string in {0, 1}* has even parity if the symbol 1 occurs in the word an even number
of times; otherwise, it has odd parity.
(a) How many words of length n have even parity?
(b) How many have odd parity?
8. Let $\Sigma$ be the alphabet {a, b, c}. Show that the number of words of length n in which
the letter a appears an even number of times is $(3^n + 1)/2$.
9. If you flip a coin 5 times, how many different ways can you get exactly 1 head?
2 heads? Find a formula for the number of ways of obtaining r heads with n flips
of a coin.
10. Count the number of digraphs with node set S = {0, 1, 2, ..., n - 1}.

11. Prove Theorem 5.1.2.
12. Let A and B be finite sets. Prove each of the following.
(a) If $B \subset A$, then $|A \cup B| = |A|$.
(b) $|A| = |A \cap B| + |A - B|$.
(c) $|A \cup B| = |A| + |B| - |A \cap B|$.
13. Prove Theorem 5.1.5 by induction. Apply the rule of product explicitly.

14. Prove the following combinatoric identities for n > 0.
(a) P(n, n) = n!
(b) P(n, n) = P(n, r)P(n - r, n - r) where 0 < r < n.
(c) $\binom{n+1}{r} = \binom{n}{r} + \binom{n}{r-1}$

15. Prove Theorem 5.1.9 by induction. (Hint: Use problem 14(c).)

16. (a) Show by induction on n that for $b \in N$, $b \ge 2$,

$(b - 1) \sum_{i=0}^{n} b^i = b^{n+1} - 1.$

(b) Interpret this identity in the context of number representation in the base b
using the standard positional notation. (It may help to expand the identity
for b = 10 and n = 4.)
17. Let S be the set of functions $\{0, 1\}^A$ where A is the set of binary n-tuples, $\{0, 1\}^n$. The
set S is called the set of switching functions of n variables.
(a) Specify |S| as a function of n.
(b) A switching function is self-dual if it remains unchanged when all occurrences
of 0’s and 1’s are interchanged in its definition. For example, if n = 2 the
function
f(0, 0) = 0,
f(0, 1) = 1,
f(1, 0) = 0,
f(1, 1) = 1,
is self-dual. Count the number of self-dual switching functions of n variables.
18. Consider a computer in which numbers are represented with p binary digits as
follows. For integer arithmetic, a number is represented using one bit to indicate the
sign and the remaining p — 1 bits represent the magnitude. (This is called a sign-
magnitude representation.) The floating point representation uses m bits to represent
the mantissa of a floating point number and k = p — m bits to represent the expo-
nent, where m, k > 2. Both the mantissa and the exponent are represented using
a single sign bit and the remaining bits as magnitudes. The exponent specifies a power
of 2, and the floating point representation is normalized, i.e., the exponent is
adjusted so that the radix point is to the left of the digits of the mantissa and the
leading digit of the mantissa is 1 unless the value of the mantissa is 0.
(a) How many distinct integers can be represented in the integer notation? (Note
that there are two distinct representations of 0.)
(b) How many distinct real numbers can be represented in the floating point
notation?
(c) Estimate the number of distinct integers that can be represented in the floating
point notation if m = 24 and k = 8.
(d) Estimate the ratio of integers representable in integer representation to integers
representable in floating point representation if m = 24 and k = 8.

19. (a) You are given 12 apparently identical coins of which at most one may be
counterfeit. A counterfeit coin is always either heavier or lighter than a genuine
coin. Find a minimax algorithm using a pan balance to locate the counterfeit
coin if it exists and determine whether it is heavy or light. Present your algor-
ithm as a decision tree.
(b) You are given 13 apparently identical coins, exactly one of which is counter-
feit and is either heavier or lighter than the others. Find a minimax algorithm
to locate the counterfeit coin.

20. Suppose all of n > 2 coins are of equal weight except for one which is known to be
heavier than the others. Find a lower bound for the number of weighings (using a
pan balance) needed by a minimax algorithm to locate the heavy coin. (You need
not specify an algorithm.)

21. Trees often provide a way of enumerating the set of solutions to problems. For
each of the following classical problems, construct a tree of minimal height which
contains a path which describes a solution. Each node of the tree should be labeled
with a system state and each branch of the tree should correspond to a single action
which changes the state of the system. As you construct the tree, do not include
any new node which has a label which already appears in the tree; thus, no two
nodes of the tree should be labeled with the same system state. The solution with
the minimum number of steps may not be unique; for each problem, count the
number of minimum step solutions.
(a) The Towers of Hanoi. Let A, B, and C denote 3 vertical pegs. Initially, 3 discs
of unequal size are arranged on peg A with the largest disc on the bottom and
the smallest disc on top. The problem is to move all 3 discs from peg A to peg
C. Each move consists of moving a single disc from one peg to another. No disc
may ever be placed on a disc smaller than itself.
(b) Missionaries and Cannibals. Three missionaries and three cannibals are initially
on the south side of a river and wish to cross to the other side. They have a
single boat which holds at most 2 people but can be handled by a single person.
However, if at any point the cannibals outnumber the missionaries on either
shore, a missionary will be devoured. Find a way to transport all the cannibals
and missionaries across the river without losing anyone in the process. You
may assume that missionaries do not eat cannibals.
(c) You are given an eight gallon container filled with water and two empty con-
tainers of capacity 5 and 3 gallons respectively. The containers are not
graduated. Find a way to divide the water into two four gallon quantities.

5.2 ASYMPTOTIC BEHAVIOR OF FUNCTIONS

Programs are representations of algorithms. They can be evaluated and compared
either empirically or theoretically. A common way of evaluating a program
is to choose a “typical” set of input data and find out how fast the program
solves that problem. A more general approach estimates the rate at which a pro-
gram solves problems; for example, compilers often are evaluated on the basis
of the number of source program cards or statements they process per second.
But such empirical measures are strongly dependent upon both the program and
the machine used to implement the algorithm. Thus, a change in a program may
not represent a significant change in the underlying algorithm but may, never-
theless, affect the speed of execution. Furthermore, if two programs are compared
first on one machine and then another, the comparisons may lead to different
conclusions. Thus, while comparison of actual programs running on real com-
puters is an important source of information, the results are inevitably affected by
programming skill and machine characteristics. In this section we will develop a
mathematical basis for comparing algorithms which provides a useful alternative
to empirical measurements. Judiciously used, this mathematics provides an
important means of evaluating the cost of algorithm execution.
We are interested in computer programs which can be applied to a collection
of problems of a certain type. Assuming a computer program which eventually
halts, solving a problem requires only sufficient time and sufficient storage. In
general, the time and storage space required by a program will vary with the
particular problem being solved. Consider, for example, the following classes of
problems, and note the role of the value of n.
1. Find the largest entry in a sequence of n integers.
2. Let V be a vector of integers with n distinct entries. Sort the entries of V
into ascending order.
3. Let g be a digraph with n nodes and a distance associated with each edge.
For some specified pair of nodes j and k where $1 \le j, k \le n$, find the
shortest path from node j to node k.
4. Let S be a set with n elements and R a binary relation on S. Find the
transitive closure of R.
In each of these problems, the parameter n provides a measure of the “size” or
“difficulty” of the problem in the sense that the time required to solve the problem,
or the storage space required, or both, will increase as n grows. In order to
measure the cost of executing a program, we customarily define a cost function,
or complexity function f, where f(n) is either a measure of the time required to
execute an algorithm on a problem of size n or a measure of the memory space
required for execution of the algorithm. If f(n) is a measure of the time required
to execute an algorithm on a problem of size n, then f is called the time complexity
function of the algorithm. Similarly, if f(n) measures the storage required for the
execution of an algorithm on a problem of size n, then f is called the space
complexity function of the algorithm. We will refer to either kind of function as simply
a complexity or cost function of the algorithm, but our principal concern will be
with time complexity functions.
In general the cost of obtaining a solution increases with the problem size
n. If the value of n is sufficiently small, then even an inefficient algorithm will not
cost much to run; consequently, the choice of an algorithm for small problems is
not usually critical. For this reason, our concern is with values of n which are
large enough to make some algorithms impractical. In order to compare the
performance of algorithms for relatively large values of n, we will consider the
behavior of their cost functions as n grows large; this is called the asymptotic
behavior of the cost functions. The next definition introduces the fundamental
concept.†

Definition 5.2.1: Let f and g be functions from N to R. Then g asymptotically
dominates f, or f is asymptotically dominated by g, if there exist $k \ge 0$ and $m \ge 0$
such that $|f(n)| \le m|g(n)|$ for all $n \ge k$.

If g asymptotically dominates f, and $g(n) \ne 0$, then $|f(n)/g(n)| \le m$ for all but
a finite number of values of n, none of which are greater than k. Thus if f and g are
cost functions for algorithms F and G respectively, then for problems of size k or
greater, execution of F will never be more than m times as costly as execution of G.
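A proposed pair of witnesses k and m can be tested numerically over a finite range. The Python sketch below is illustrative only: the helper name and the sample functions f(n) = n and $g(n) = -n^3$ are assumptions chosen for the demonstration, and a finite check can refute a choice of witnesses but can never prove domination for all n.

```python
def dominated_on_range(f, g, m, k, upto):
    """Check |f(n)| <= m * |g(n)| for every integer n with k <= n <= upto."""
    return all(abs(f(n)) <= m * abs(g(n)) for n in range(k, upto + 1))


def f(n):
    return n


def g(n):
    return -n ** 3


# g dominates f with witnesses k = 0, m = 1 (on the tested range).
print(dominated_on_range(f, g, m=1, k=0, upto=1000))
# The witnesses k = 0, m = 100 fail to show that f dominates g.
print(dominated_on_range(g, f, m=100, k=0, upto=1000))
```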

Examples
(a) Let f(n) = n and $g(n) = -n^3$. Since $|n| \le |-n^3|$ for all $n \in N$, Definition
5.2.1 is satisfied by setting k = 0 and m = 1. Hence, g asymptotically dominates
f. Note that f does not asymptotically dominate g, since regardless of
the choice of m, $|-n^3| > m|n|$ for all n greater than both 1 and m.
(b) Let g be an arbitrary function from N to R, and let f(n) = cg(n), where $c \in R$
and c > 0. Then the functions f and g asymptotically dominate each other
since $|f(n)| \le c|g(n)|$ for all $n \in N$ and $|g(n)| \le (1/c)|f(n)|$ for all $n \in N$.
(c) The functions f(n) = n, g(n) = n + 1/(n + 1), and h(n) = bn + c, where
$b, c \in R$ and b > 0, all asymptotically dominate each other. #

Definition 5.2.1 implicitly specifies a binary relation over the functions
from N to R. The properties of this relation are summarized in the following
theorem.

Theorem 5.2.1: Let F denote the set of functions from N to R.
(a) The binary relation on F defined as
$\{\langle f, g \rangle \mid f, g \in F \text{ and } g \text{ asymptotically dominates } f\}$
is reflexive and transitive.
(b) Let $f, g \in F$. The binary relation
$f \equiv g \Leftrightarrow$ f and g asymptotically dominate each other
is an equivalence relation on F.

†Our interest is in applying these notions to functions of discrete variables, and we will treat
the present topic using functions from N to R. However, the definitions and theorems of this
section extend in a natural and straightforward way to functions from R to R.

Proof:
(a) To show that f asymptotically dominates f, it suffices to choose k = 0
and m = 1 and apply Definition 5.2.1; thus the relation “asymptotically
dominates” is reflexive. We next show that this relation is transitive.
Suppose h asymptotically dominates g, and g asymptotically dominates
f; then for some $k_1, k_2, m_1, m_2 \ge 0$, $|f(n)| \le m_1|g(n)|$ for $n \ge k_1$ and
$|g(n)| \le m_2|h(n)|$ for $n \ge k_2$. By choosing $k = \max\{k_1, k_2\}$ and
$m = m_1 m_2$, we have
$|f(n)| \le m_1|g(n)| \le m_1 m_2|h(n)| = m|h(n)|$
for $n \ge k$. It follows that h asymptotically dominates f.
The proof of part (b) is left as an exercise. ∎

The binary relation of asymptotic domination will provide a basis for com-
paring complexity functions. If two functions f and g asymptotically dominate
each other, then the associated algorithms will be considered equivalent, and any
differences in cost of execution will be largely ignored. Suppose, on the other hand,
that g asymptotically dominates f but not vice versa, where f and g are the com-
plexity functions of algorithms F and G respectively. Then even if G is speeded up
by some arbitrary factor (through clever programming or a faster machine) so that
the complexity function of the fast version is cg, where c <1, cg will asymp-
totically dominate f but not vice versa. Consequently, for any m > 0, there will exist
an infinite number of arguments n such that cg(n) > mf(n).

Definition 5.2.2: The set of all functions which are asymptotically dominated
by a given function g is denoted by O(g) and read “order g,” or “big-Oh of g.”
If $f \in O(g)$, then f is said to be O(g).

Example
(a) Let f(n) = n and $g(n) = n^3$. Then using an argument similar to that in the
previous example, we see that f is O(g) but g is not O(f).
(b) Let f(n) = n and h(n) = 3n. Then f is O(h) and h is O(f).
(c) Let f(n) = n. The following functions from N to R are all members of O(f).
$f_1(n) = k$ for $k \in R$,
$f_2(n) = kn$ for $k \in R$,
$f_3(n) = n + k$ for $k \in R$,
$f_4(n) = n + 1/(n + 1)$. #

The next theorem establishes some important relationships between a function
f and the set O(f).

Theorem 5.2.2: Consider the class of functions from N to R. Then
(a) f is O(f).
(b) if f is O(g), then cf is O(g) for any $c \in R$. (Thus, the set O(g) is closed
under multiplication by a constant.)
(c) if f and h are both O(g), then their sum, (f + h), where (f + h)(n) =
f(n) + h(n), is O(g). (Thus, the set O(g) is closed under addition of
functions.)
Proof:
(a) This follows directly from part (a) of Theorem 5.2.1.
(b) If f is O(g), then for some $m, k \in N$, $|f(n)| \le m|g(n)|$ for all $n \ge k$.
Then for $c \in R$, $|cf(n)| = |c| \cdot |f(n)| \le |cm| \cdot |g(n)|$ for $n \ge k$, and
therefore cf is O(g).
(c) If f and h are both O(g), then there exist some $m_1, m_2, k_1$, and $k_2 \in N$
such that
$|f(n)| \le m_1|g(n)|$ if $n \ge k_1$, and
$|h(n)| \le m_2|g(n)|$ if $n \ge k_2$.
Let $m = m_1 + m_2$ and $k = \max\{k_1, k_2\}$. Then
$|f(n) + h(n)| \le |f(n)| + |h(n)| \le m_1|g(n)| + m_2|g(n)|$
$= (m_1 + m_2)|g(n)| = m|g(n)|$ for $n \ge k$.
Thus (f + h) is O(g). ∎

The following theorem asserts that f is O(g) if and only if every function
asymptotically dominated by f is also asymptotically dominated by g.

Theorem 5.2.3: Let f and g be functions from N to R. Then f is O(g) if and
only if $O(f) \subset O(g)$.
Proof:
(a) ($O(f) \subset O(g) \Rightarrow f \in O(g)$.) From Theorem 5.2.2 we know that
$f \in O(f)$. Since $O(f) \subset O(g)$ it follows that $f \in O(g)$.
(b) ($f \in O(g) \Rightarrow O(f) \subset O(g)$.) Let h be any element of O(f); then h is
asymptotically dominated by f. Since $f \in O(g)$, f is asymptotically
dominated by g. Since the relation of asymptotic domination is transitive
(by Theorem 5.2.1(a)), it follows that h is asymptotically dominated
by g and therefore h is O(g). Since h was chosen to be an arbitrary member
of O(f), it follows that $O(f) \subset O(g)$. ∎

The following is an immediate result of the previous theorem; its proof is left as
an exercise.

Corollary 5.2.3: Let f and g be functions from N to R. Then
(a) f is O(g) and g is O(f) if and only if O(f) = O(g).
(b) if f is O(g) and g is O(h), then f is O(h).

If f is a complexity function for an algorithm F, then O(f) is commonly
referred to as the asymptotic behavior or asymptotic complexity of the algorithm
F. Because algorithms are often compared on the basis of their asymptotic behavior,
it is important to understand that considerable differences may exist between two
functions which have the same asymptotic behavior. For example, suppose F and
G are two programs which are applicable to the same class of problems, and that
execution of F always takes 3 times as long as execution of G. If f and g are defined
to be the time complexity functions of F and G respectively, then f = 3g. We can
show in this case that f and g asymptotically dominate each other, and therefore
O(f) = O(g). Thus the asymptotic behavior of these functions does not provide
a basis for distinguishing between the cost of the two algorithms. This does not
imply that the difference in cost is negligible; a factor of 3 in speed of execution
would obviously serve as an important consideration in choosing between the
programs F and G. However, since each cost function asymptotically dominates
the other, the order of the functions does not provide a basis for choosing between
them.
Now suppose f(n) = cn and $g(n) = dn^2$, where c and d are positive constants.
We can show, using Definition 5.2.1, that g asymptotically dominates f but
not vice versa, and regardless of the values of c and d, if n is sufficiently large, then
the execution of program F will require less time than the execution of G. In this
case, the order of the functions provides an important consideration for choosing
between the algorithms. We will see many important examples of functions f and
g where f is asymptotically dominated by g but not vice versa.

Some Important Classes of Asymptotic Behavior

It is often convenient to use order notation with explicit specification of
a function rather than the name of a function. Thus, O(6) denotes the set of
functions asymptotically dominated by the constant function f(n) = 6, O(n)
denotes the set of functions asymptotically dominated by the function f(n) = n,
and $O(n^2)$ denotes the set of functions asymptotically dominated by $f(n) = n^2$.
The asymptotic complexity of an algorithm can often be expressed in a very
simple form. For example, the asymptotic behavior of a sum of functions is often
equal to the asymptotic behavior of one of the summands. Furthermore, we have
already seen that multiplicative constants do not affect asymptotic complexity. As
a result, the asymptotic complexity of an algorithm can often be characterized in
one of the following ways.

1. f is O(1). For any algorithm of complexity O(1), there exists some $k \in N$
such that execution of the algorithm will cost $r \le k$ regardless of the
value of n. Thus the cost of applying the algorithm can be bounded
independently of the problem size n. Any function which is O(c), where
$c \in R$, is O(1). An algorithm of O(1) complexity is said to have constant
complexity.
2. f is O(log n).† An algorithm of O(log n) complexity is said to have logarithmic
complexity. For an algorithm of logarithmic complexity, the cost
of applying the algorithm to problems of sufficiently large size n can be
bounded by a function of the form k log n, where $k \in R$.
3. f is O(n). An algorithm of O(n) complexity is said to have linear complexity.
For any such algorithm there will exist some $k \in N$ such that the cost of
executing the algorithm on a problem of sufficiently large size n will be
no more than kn.
4. f is O(n log n). For any algorithm of O(n log n) complexity, there will
exist some $k \in N$ such that applying the algorithm to a problem of
sufficiently large size n will cost no more than kn log n. Such an algorithm
is said to have n log n complexity.
5. f is $O(n^2)$. An algorithm of complexity $O(n^2)$ is said to have quadratic
complexity.
6. f is $O(c^n)$, where c > 1. An algorithm of complexity $O(c^n)$, c > 1, is said
to be of exponential complexity.
7. f is O(n!).

The following theorem establishes that the classes we have listed are given in order
of increasing complexity.

Theorem 5.2.4: Consider the class F of all functions from N to R. Then for
$c \in R$ such that c > 1,
$O(1) \subset O(\log n) \subset O(n) \subset O(n \log n) \subset O(n^2) \subset O(c^n) \subset O(n!)$,
and all containments are proper.
Proof: The proofs that containments are proper are left as exercises. We
will prove the first, second, and fifth containments and leave the others as exercises.
By Theorem 5.2.3, in order to show $O(f) \subset O(g)$ it suffices to show that f is O(g).
(a) ($O(1) \subset O(\log n)$.) Let f(n) = 1 and g(n) = log n. For all $n \ge 2$, $1 \le \log n$
and therefore f is O(g). By Theorem 5.2.3, it follows that $O(1) \subset O(\log n)$.
(b) ($O(\log n) \subset O(n)$.) For all n > 0, log n < n, and therefore log n is O(n).
It follows that $O(\log n) \subset O(n)$.
(c) ($O(n^2) \subset O(c^n)$ for c > 1.) We will show that for sufficiently large n,
$n^2 \le c^n$.
Since c > 1 it follows that log c > 0, and hence the above inequality will
hold if
$2 \log n \le n \log c$
or
$\frac{2}{\log c} \le \frac{n}{\log n}$.
The ratio n/log n becomes arbitrarily large as n increases, and therefore
for any c > 1, this inequality can be satisfied by choosing n sufficiently
large. Thus $n^2$ is $O(c^n)$ and hence $O(n^2) \subset O(c^n)$. ∎

†Unless explicit statement is made to the contrary, all logarithms in this book are to the base 2.
For ease of exposition, we have defined the concepts of this section only for functions from
N to R. However, because we are concerned with the behavior of functions for large arguments,
the definitions can be extended without difficulty to include partial functions from N to R which
are defined on all but a finite subset of N. Thus, the fact that f(n) = log n and g(n) = n log n are
not defined for the argument n = 0 causes no substantive difficulty, and we will use O(log n), for
example, to denote the set O(g), where
g(n) = log n, for n > 0.
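Part (c) of the proof says that $n^2 \le c^n$ once n is sufficiently large; how large depends strongly on c. The Python sketch below is a numerical illustration only (the function name and the search limits are assumptions). It reports the last n, up to a chosen limit, at which the inequality still fails.

```python
def last_failure(c, limit):
    """Largest n <= limit with n**2 > c**n, or None if there is none."""
    failures = [n for n in range(limit + 1) if n * n > c ** n]
    return failures[-1] if failures else None


# For c = 2 the inequality fails only at n = 3 (9 > 8).
print(last_failure(2, 200))
# For c = 1.1 it keeps failing until n is close to 100.
print(last_failure(1.1, 500))
```

A base only slightly greater than 1 pushes the threshold far out, which is why the theorem speaks only of sufficiently large n.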

Complexity functions which involve various powers of n often occur in the analysis
of algorithms. The following theorem and its corollary are important for relating
these sets of functions.

Theorem 5.2.5: Let $c, d \in R$, where 0 < c < d. Then $O(n^c) \subset O(n^d)$, and
the containment is proper.
Proof: For each $n \ge 1$, if c < d, then $n^c \le n^d$. It follows that $O(n^c) \subset O(n^d)$.
To show the containment is proper, we will show that $n^d$ is not $O(n^c)$. If $n^d$
is $O(n^c)$, then for some k such that $k \ge 0$, the inequality $|n^d| \le k|n^c|$ holds for
sufficiently large n. If k is chosen to be 0, the inequality does not hold for $n \ge 1$.
If k is positive, then n can be chosen large enough that log n > log k/(d - c). But

$\log n > \frac{\log k}{d - c} \Rightarrow (d - c) \log n > \log k$
$\Rightarrow d \log n > \log k + c \log n$
$\Rightarrow \log(n^d) > \log(k n^c)$
$\Rightarrow n^d > k n^c$.
Since k was an arbitrary positive number, this shows that $n^c$ does not asymptotically
dominate $n^d$ and therefore $n^d$ is not $O(n^c)$. ∎

Corollary 5.2.5: If P(n) is a polynomial in n of degree k, then P(n) is $O(n^k)$.

Examples
(a) The function f(n) = 1/n + 63 is O(1).
(b) The function f(n) = rn + kn log n is O(n log n).
(c) The function $f(n) = 0.6n^3 + 28n^2 + 31n + 468$ is $O(n^3)$. #
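For a polynomial, explicit witnesses for Definition 5.2.1 are easy to write down: for $n \ge 1$ every power $n^i$ with $i \le k$ is at most $n^k$, so the sum of the absolute values of the coefficients serves as m. The Python sketch below checks this for the polynomial of example (c); the helper name and the tested range are assumptions made for illustration.

```python
from fractions import Fraction


def polynomial(coeffs, n):
    """Evaluate sum(coeffs[i] * n**i); coeffs[i] is the coefficient of n**i."""
    return sum(a * n ** i for i, a in enumerate(coeffs))


# 0.6 n^3 + 28 n^2 + 31 n + 468, lowest-order coefficient first.
coeffs = [Fraction(468), Fraction(31), Fraction(28), Fraction(3, 5)]
degree = len(coeffs) - 1
m = sum(abs(a) for a in coeffs)  # candidate multiplier: 527.6

# |P(n)| <= m * n^degree for all tested n >= 1, so k = 1 works as the threshold.
bound_holds = all(
    abs(polynomial(coeffs, n)) <= m * n ** degree for n in range(1, 10001)
)
print(float(m), bound_holds)
```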

The following theorem establishes that the logarithmic base does not affect
the asymptotic behavior of functions which are O(log n).

Theorem 5.2.6: Let $b, c \in R$ be constants greater than 1. Then $O(\log_b n) = O(\log_c n)$.
Proof: Using the fact that $\log_b(a^x) = x \log_b a$, we observe that
$\log_b n = \log_b(c^{\log_c n}) = \log_c n \cdot \log_b c = k \log_c n$,
where $k = \log_b c$ is a constant. By application of Theorem 5.2.2 and Corollary 5.2.3
it follows that $O(\log_b n) = O(\log_c n)$. ∎
Theorem 5.2.7: Let $b, c \in R$ be constants greater than 1. Then $O(n \log_b n) =
O(n \log_c n)$.
The proof is left as an exercise.
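The identity used in the proof of Theorem 5.2.6 can be observed numerically: the ratio of $\log_b n$ to $\log_c n$ does not depend on n and equals $\log_b c$. The following Python sketch is a sanity check only, with b = 2 and c = 10 chosen arbitrarily for illustration.

```python
import math


def log_ratio(b, c, n):
    """Return log_b(n) / log_c(n) for n > 1."""
    return math.log(n, b) / math.log(n, c)


b, c = 2, 10
expected = math.log(c, b)  # the constant k = log_b(c) from the proof
for n in (2, 10, 1000, 10 ** 6, 10 ** 12):
    print(n, log_ratio(b, c, n), expected)
```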

The execution time of an algorithm is often equal to the sum of a number of
terms, where each term corresponds to the execution time of some part of the
algorithm. The following theorem provides some results which characterize the
asymptotic behavior of some of these sums.

Theorem 5.2.8: Let $c \in R$. Then
(a) $\sum_{i=1}^{n} c$ is O(n)
(b) $\sum_{i=1}^{n} i$ is $O(n^2)$
(c) $\sum_{i=1}^{n} i^2$ is $O(n^3)$.
Proof: Part (a) is straightforward; parts (b) and (c) follow from Corollary
5.2.5 and the identities $\sum_{i=1}^{n} i = n(n + 1)/2$ and $\sum_{i=1}^{n} i^2 = n(n + 1)(2n + 1)/6$
respectively. ∎
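The two closed forms cited in the proof, and the bounds they imply, can be spot-checked by brute force. This Python sketch is a finite check over an arbitrarily chosen range, not a substitute for the induction proof; the function name is hypothetical.

```python
def sum_of_powers(n, p):
    """Return 1**p + 2**p + ... + n**p by direct summation."""
    return sum(i ** p for i in range(1, n + 1))


for n in range(1, 201):
    s1 = sum_of_powers(n, 1)
    s2 = sum_of_powers(n, 2)
    assert s1 == n * (n + 1) // 2
    assert s2 == n * (n + 1) * (2 * n + 1) // 6
    # Witnesses m = 1, k = 1 for O(n^2) and O(n^3) respectively.
    assert s1 <= n ** 2 and s2 <= n ** 3

print(sum_of_powers(100, 1), sum_of_powers(100, 2))
```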

In practice, any algorithm can be executed on small problems, that is, when
n is small enough; but the asymptotic behavior of a complexity function provides
important information about whether it will be feasible to execute an algorithm
for moderate or large values of n. This point is illustrated in Tables 5.2.1 and
5.2.2.
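Entries of the kind shown in Table 5.2.2 can be recomputed directly. The Python sketch below assumes one operation per microsecond and a cost of exactly f(n) operations, and searches for the largest n whose cost fits in the time budget; the function and dictionary names are hypothetical. It truncates rather than rounds, so a few results can differ by one from a table whose entries are rounded. The log n row is omitted because its values are too large to search for directly.

```python
import math


def max_problem_size(cost, budget):
    """Largest n >= 1 with cost(n) <= budget.

    Assumes cost is nondecreasing for n >= 1 and cost(1) <= budget.
    """
    lo = 1
    while cost(2 * lo) <= budget:  # double until the budget is exceeded
        lo *= 2
    hi = 2 * lo                    # now cost(lo) <= budget < cost(hi)
    while hi - lo > 1:             # binary search between lo and hi
        mid = (lo + hi) // 2
        if cost(mid) <= budget:
            lo = mid
        else:
            hi = mid
    return lo


COSTS = {
    "n": lambda n: n,
    "n log n": lambda n: n * math.log2(n),
    "n^2": lambda n: n * n,
    "2^n": lambda n: 2 ** n,
    "n!": math.factorial,
}

# Operation budgets at one operation per microsecond.
BUDGETS = {"1 sec": 10 ** 6, "1 min": 6 * 10 ** 7, "1 hour": 36 * 10 ** 8}

for name, cost in COSTS.items():
    print(name, [max_problem_size(cost, b) for b in BUDGETS.values()])
```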
                          Complexity Function
Problem Size n   log n   n      n log n      n^2    2^n           n!
5                3       5      12           25     32            120
10               4       10     33           10^2   1024          3 x 10^6
10^2             7       10^2   664          10^4   1.3 x 10^30   *
10^3             10      10^3   9965         10^6   *             *
10^4             14      10^4   1.4 x 10^5   10^8   *             *

Table 5.2.1 A COMPARISON OF THE GROWTH OF SOME COMMON COMPLEXITY
FUNCTIONS. THE TABLE ENTRIES ARE PROPORTIONAL TO THE
TIME REQUIRED TO SOLVE A PROBLEM OF SIZE n. AN ASTERISK
INDICATES THAT THE NUMBER IS GREATER THAN 10^100.

Complexity function   1 sec       1 min          1 hour
log n                 2^(10^6)    2^(6 x 10^7)   2^(3.6 x 10^9)
n                     10^6        6 x 10^7       3.6 x 10^9
n log n               62746       2.8 x 10^6     1.3 x 10^8
n^2                   10^3        7746           60,000
2^n                   20          26             32
n!                    9           11             12

Table 5.2.2 A COMPARISON OF THE MAXIMUM SIZES OF PROBLEMS WHICH
CAN BE SOLVED USING ALGORITHMS WITH SOME COMMON
COMPLEXITY FUNCTIONS. AN AVERAGE EXECUTION TIME OF
ONE OPERATION PER MICROSECOND (10^-6 SEC) IS ASSUMED.
PROPORTIONAL VALUES HOLD FOR OTHER SPEEDS.

Comparing algorithms on the basis of their asymptotic behavior is a powerful
and convenient technique, but it must be used with caution. Thus, while we would
expect an O(n) algorithm to be “better” than one which is $O(n^2)$, in fact we cannot
choose between them without more information. For example, suppose that
algorithms F and G have complexity functions f(n) = cn and $g(n) = dn^2$. If the
values of the constants are c = 50 and d = 1, then F is a more attractive algorithm
only if n, the problem size, exceeds 50. Since this value of n may be larger than most
of the problems of interest, it may be that the $O(n^2)$ algorithm is the best choice.
Thus in order to choose between algorithms, it is generally necessary to know the
specific complexity functions and the problem size as well as the asymptotic
behaviors.
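The crossover point in this example is easy to locate by direct comparison. The Python sketch below uses the constants c = 50 and d = 1 from the text; the function names and the search limit are assumptions made for illustration.

```python
def cost_f(n, c=50):
    """Linear cost f(n) = c * n."""
    return c * n


def cost_g(n, d=1):
    """Quadratic cost g(n) = d * n**2."""
    return d * n * n


def first_n_where_cheaper(f, g, limit):
    """Smallest n in 1..limit with f(n) < g(n), or None if there is none."""
    for n in range(1, limit + 1):
        if f(n) < g(n):
            return n
    return None


crossover = first_n_where_cheaper(cost_f, cost_g, 10000)
print(crossover)
```

Below the crossover the quadratic algorithm costs no more than the linear one, which is the situation the text warns about when most problems of interest are small.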
By extending the way in which order notation is used, we can characterize
algorithm performance more precisely than is possible with the notation we have
developed thus far. In the extended usage, the notation O(f) is used on the right
side of an equation to denote a member of the set O(f). For example, the assertion
that the algorithm F has asymptotic complexity f, where
$f(n) = 1.6n^2 + O(n \log n)$
is interpreted as meaning $f(n) = 1.6n^2 + g(n)$, where g(n) is a member of
O(n log n). This is a stronger assertion than

$f(n) = O(n^2)$;
the second is implied by the first but not vice versa. Using this extended notation,
the complexity functions of different algorithms can be compared with one another
on the basis of the coefficients of dominating summand functions as well as less
important summands. Thus, for sufficiently large n, an algorithm with a complexity
function $f(n) = 1.6n^2 + O(n)$ will probably be less costly than one whose
complexity function is $g(n) = 2n^2 + O(n)$, which in turn will probably be less
costly than one whose complexity function is $h(n) = 2n^2 + O(n \log n)$.

Problems: Section 5.2

1. Let F be the class of functions from N to R, and let $f, g \in F$. Define the
binary relation $\equiv$ as follows:
$f \equiv g$ if and only if f and g asymptotically dominate each other.
(a) Show that $\equiv$ is an equivalence relation. (This is part (b) of Theorem 5.2.1.)
(b) Let [f] denote the equivalence class of f under the relation $\equiv$. Show that the
binary relation
$[f] \le [g]$ if and only if f is asymptotically dominated by g
is a partial order on the quotient set $F/\equiv$.
2. Give an example of a function in O(1) which is not a constant function.
3. Find a pair of functions f and g from N to R such that $f \in O(g)$ and $g \notin O(f)$.
4. Define a function $f: N \to R$ to be bounded if there exists some $r \in R$ such that for
all $n \in N$, $|f(n)| < r$. Prove that every bounded function is O(1).
5. For each of the following pairs of functions, $f: N \to R$ and $g: N \to R$, determine if
and how f and g are related in terms of asymptotic domination.
(a) f(n) = 1 for n even,
for n odd.
g(n) = for n even,
= for n odd.
(b) f@ = for n even,
=] for n odd.
g(n) = for n even,
=) for n odd.
(c) fi) =n.
g(n) = n/100 if n 4 10* for some k,
== 107 1%,2 if n = 1, 10, 100, etc.
6. (a) Using logical notation, write out the definition of “f does not asymptotically
dominate g.”
(b) Using the assertion of part (a), argue that if f does not asymptotically dominate
g, then for any m there exists an infinite number of arguments n such that
$|g(n)| > m|f(n)|$.
(c) Determine whether the following assertion is true. “If f does not asymptotically
dominate g, then for all m > 0, if n is sufficiently large, then $|g(n)| > m|f(n)|$.”
7. Let $f_1$ and $f_2$ be functions such that $f_1$ is $O(g_1)$ and $f_2$ is $O(g_2)$.
(a) Prove that if $g_1(n)$ and $g_2(n)$ are nonnegative for all arguments $n \in N$, then
$f_1 + f_2$ is $O(g_1 + g_2)$.
(b) Prove that $f_1 + f_2$ may not be $O(g_1 + g_2)$.
8. Let f and g be functions from N to R, and denote by $f \cdot g$ the product function:

$f \cdot g(n) = f(n) \cdot g(n)$.

(a) Prove that if f is $O(h_1)$ and g is $O(h_2)$, then $f \cdot g$ is $O(h_1 \cdot h_2)$.
(b) Find a function $f: N \to R$ such that O(f) is not closed under multiplication
of functions.
9. Prove Corollary 5.2.3.
10. Show that each of the following containments is proper:
(a) $O(1) \subset O(\log n)$.
(b) $O(\log n) \subset O(n)$.
(c) $O(n^2) \subset O(d^n)$, for all d > 1.

11. Prove the following assertions and show that each of the containments is proper.
(a) O(n) < Off log n).
(b) O(n logn) < O(n4), for alld > 1.
(c) O(c") < O(n), for alle > 1.
12, Show that for all integer values of k, n > 0, O(log n) = O(log (n + &)).
13. Prove Corollary 5.2.5.
14. Consider the class of functions F, where
$$F = \{ f \mid f: N \to R \text{ and } f(N) \subset N \},$$
i.e., the image of every member of F is a subset of N. Let f and g be members of F.
Prove or disprove the following.
Conjecture: If f and g are O(h), then f∘g is O(h) (i.e., the set O(h) is closed under
composition of functions).
15. Prove Theorem 5.2.7.
16. Suppose two algorithms F and G have time complexity functions
$$f(n) = n^2 - n + 550$$
and
$$g(n) = 59n + 50$$
respectively. Determine those values of n ∈ N for which F takes less time to
execute than G.
17, Determine which of the following functions asymptotically dominate others. Present
your answer as a labelled digraph.

fi(n) = 528
Si(n) = 3n* logn + logn

AM=F+5 1

Si(n) = log login

fs(n) = (log n)?


= 208
fol)
fr(n) = log (n + +)
Ss(n) = log (n?)
fo(n) = 3nt4
18. From Theorem 5.2.8 we might make the following conjecture for k ∈ N:
$$\sum_{i=0}^{n} i^k \text{ is } O(n^{k+1}).$$
Prove or disprove the conjecture.


Sec. 5.3 RECURRENCE SYSTEMS 243

5.3 RECURRENCE SYSTEMS

The expressions for permutations and combinations developed in Section 5.1 are
the most fundamental tools for counting the elements of finite sets. They often
prove to be inadequate, however, and many problems of computer science require
a different approach. An important alternate approach uses recurrence equations
(often called difference equations or recurrence relations) to define the terms of
a sequence. A formal definition of recurrence equations is difficult because of the
wide variety of forms in which such equations can be written, but the concept is
straightforward. We have already seen an example of a recurrence equation in the
definition of the Fibonacci sequence, where for $n \geq 2$, the term $a_n$ is defined by the
recurrence equation
$$a_n = a_{n-1} + a_{n-2}.$$
The salient characteristic of a recurrence equation is the specification of the term
$a_n$ as a function of the terms $a_0, a_1, \ldots, a_{n-1}$. By itself, however, a recurrence
equation is not sufficient to define the terms of a sequence; we must also specify
the values of some initial terms of the sequence. Thus, in our definition of the
Fibonacci sequence, we set $a_0 = 0$ and $a_1 = 1$. These are called the boundary
conditions or initial conditions of the sequence. A recurrence equation together with
boundary conditions is a form of recursive definition, although the terminology
used is different from that introduced earlier. The topics of recursive definitions
and recurrence equations are not coextensive; many classes of recursive definitions
do not use recurrence equations and the solution of recurrence equations uses
techniques which are not applicable to the broader class of recursive definitions.
A recurrence system is a set of boundary conditions and recurrence equations
which specify a unique sequence or a function (or sometimes a partial function)
from $N^k$ to R, where k ∈ I+. Recurrence systems provide a powerful tool for
investigating many classes of problems, including counting and enumeration
problems. A solution to a recurrence system is a function $f: N^k \to R$ such that f
satisfies both the boundary conditions and the recurrence equations.

Examples
(a) The number of permutations of n objects can be expressed using the following
recurrence system:
    P(0) = 1,
    P(n) = n·P(n − 1), for n > 0.
The correctness of this system can be established as follows:
1. The objects of an empty set can be arranged in a sequence in exactly one
way. Thus, the boundary condition is P(0) = 1.
2. Given n objects, n > 0, we can choose the first object of a sequence in
any of n ways and then arrange the remaining elements in P(n − 1) ways.
Thus, the recurrence equation is P(n) = n·P(n − 1) for n > 0.
It can be shown by induction that n! is a solution to this system, where 0! = 1
and for n > 0, $n! = \prod_{i=1}^{n} i$.
(b) Let f(h, k) be the maximum number of leaves of a tree of height h, where each
node has outdegree k or less. This function can be expressed as the following
recurrence system:
    f(0, k) = 1,
    f(h, k) = k·f(h − 1, k) for h > 0.
The system is based on the following arguments.
1. A tree of height 0 has a single node which is a leaf, so f(0, k) = 1. This
gives the boundary condition.
2. A tree of height h > 0 will have the maximum number of leaves if its
root has k sons, each of which is the root of a subtree of height h − 1
with f(h − 1, k) leaves. A tree of height h can therefore have up to
k·f(h − 1, k) leaves.
It can be shown by induction on h that $k^h$ is a solution to this system.

(c) Pascal derived the following recurrence system to evaluate $\binom{n}{k}$, the number of
subsets of k objects in a set of n objects.
$$\binom{n}{0} = 1, \qquad \binom{n}{n} = 1,$$
$$\binom{n}{k} = \binom{n-1}{k-1} + \binom{n-1}{k} \quad \text{for } n > k > 0.$$
The argument is as follows:
1. The number of ways of choosing 0 things from n things is 1, and the
number of ways of choosing n things from n things is 1. These two asser-
tions provide the boundary conditions.
2. Suppose n > 0. We choose some element and delete it from the set, leaving
n − 1 elements. A subset of k > 0 elements can now be chosen from the
original n elements in two distinct ways: one can choose k − 1 elements
from the remaining n − 1 elements and then add the deleted element, or
one can choose all k elements from the remaining n − 1 elements. These
possibilities are mutually exclusive and exhaustive. It follows that
$$\binom{n}{k} = \binom{n-1}{k-1} + \binom{n-1}{k}.$$
It can be shown by direct substitution that n!/[k!(n − k)!] is a solution to this
system. #
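
As a quick check of example (c), the following Python sketch evaluates Pascal's recurrence directly and compares it with n!/[k!(n − k)!]. The function name binom and the use of memoization are illustrative choices, not part of the text.

```python
from functools import lru_cache
from math import factorial


@lru_cache(maxsize=None)
def binom(n: int, k: int) -> int:
    """Number of k-element subsets of an n-element set, via Pascal's recurrence."""
    if k == 0 or k == n:
        # Boundary conditions: one way to choose nothing, one way to choose everything.
        return 1
    # Recurrence equation, valid for n > k > 0.
    return binom(n - 1, k - 1) + binom(n - 1, k)


# Compare the recurrence with the closed form for small arguments.
for n in range(12):
    for k in range(n + 1):
        assert binom(n, k) == factorial(n) // (factorial(k) * factorial(n - k))
```

Memoization keeps the number of distinct calls proportional to n·k; without it the same subproblems would be recomputed many times.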

The number of injections and bijections from a set S to a set T can easily be
expressed in terms of permutations involving |S| and |T|; these expressions were
given in examples in Section 5.1. The number of surjections from one set to another
is difficult to characterize using only permutations and combinations, but can be
easily expressed using a recurrence system.

Theorem 5.3.1: Let A and B be finite nonempty sets with |A| = m, |B| = n,
where m ≥ n > 0. The number of surjections, S(m, n), from A to B is given by the
following recurrence system.
$$S(m, 1) = 1,$$
$$S(m, n) = n^m - \sum_{j=1}^{n-1} \binom{n}{j} S(m, j) \quad \text{for } m \geq n > 1.$$
Proof: If n = 1, then there is exactly one surjection from A to B; this estab-
lishes the boundary condition S(m, 1) = 1.
Suppose n > 1. The number of surjections from A to B is equal to the number
of functions from A to B minus the number of functions whose images are proper
subsets of B. If B′ ⊂ B and |B′| = j, then there are S(m, j) functions from A to B
whose image is B′. Furthermore, there are $\binom{n}{j}$ different subsets of B of cardinality
j. Thus, there are $\binom{n}{j} S(m, j)$ different functions from A to B which have an image
of cardinality j, where j < n. Then the total number of functions from A to B which
are not surjections is $\sum_{j=1}^{n-1} \binom{n}{j} S(m, j)$. Since there are a total of $n^m$ functions from
A to B,
$$S(m, n) = n^m - \sum_{j=1}^{n-1} \binom{n}{j} S(m, j).$$
This establishes the recurrence system. ∎
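
The recurrence of Theorem 5.3.1 can be checked against a brute-force count for small sets. The sketch below is one way to do so in Python; the names surjections and brute_force are illustrative, and functions from A to B are modelled as tuples of length m over range(n).

```python
from functools import lru_cache
from itertools import product
from math import comb


@lru_cache(maxsize=None)
def surjections(m: int, n: int) -> int:
    """S(m, n) for m >= n >= 1, computed from the recurrence system."""
    if n == 1:
        return 1  # boundary condition S(m, 1) = 1
    # All n**m functions, minus those whose image is a proper subset of size j.
    return n**m - sum(comb(n, j) * surjections(m, j) for j in range(1, n))


def brute_force(m: int, n: int) -> int:
    """Count the tuples (functions) whose image is all of range(n)."""
    return sum(1 for f in product(range(n), repeat=m) if len(set(f)) == n)
```

For example, surjections(4, 3) evaluates to 81 − (3·1 + 3·14) = 36, which agrees with the exhaustive count.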

It is obvious that a recurrence system can be used to obtain any term of the
associated sequence by iteratively solving the recurrence. Alternatively, it is
sometimes possible to find an expression for the solution which can be evaluated
directly for any argument n to find the value of the nth term.

Examples
The following are examples of solutions to recurrence systems. In each case
the expression can be shown to be a solution by direct substitution. All of these
solutions are unique, but we will not prove this.
(a) The following system describes a function which grows exponentially:
$$a_0 = k,$$
$$a_n = c\,a_{n-1} \quad \text{for } n > 0.$$
The solution is $a_n = k c^n$.
(b) The following system describes the Fibonacci sequence:
$$a_0 = 0,$$
$$a_1 = 1,$$
$$a_n = a_{n-1} + a_{n-2} \quad \text{for } n > 1.$$
The solution is $a_n = (1/\sqrt{5})\,[(1 + \sqrt{5})/2]^n - (1/\sqrt{5})\,[(1 - \sqrt{5})/2]^n$.
(c) Consider the following recurrence system
$$f(0) = 0,$$
$$f(1) = f(2) = 1,$$
$$f(n) = 2f(n-1) + f(n-2) - 2f(n-3) \quad \text{for } n > 2.$$
The solution is $f(n) = [(-1)^{n+1} + 2^n]/3$. #
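
The closed forms in examples (b) and (c) can be compared with values obtained by iterating the recurrences. The Python sketch below does this; the helper names are illustrative, and the Fibonacci closed form is evaluated in floating point and rounded, which is only reliable for moderate n.

```python
from math import sqrt


def fib_iter(n: int) -> int:
    """a_n from a_0 = 0, a_1 = 1, a_n = a_{n-1} + a_{n-2}."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


def fib_closed(n: int) -> int:
    """Closed form of example (b), rounded to the nearest integer."""
    r5 = sqrt(5)
    return round((((1 + r5) / 2) ** n - ((1 - r5) / 2) ** n) / r5)


def f_iter(n: int) -> int:
    """f(n) from f(0) = 0, f(1) = f(2) = 1, f(n) = 2f(n-1) + f(n-2) - 2f(n-3)."""
    vals = [0, 1, 1]
    for i in range(3, n + 1):
        vals.append(2 * vals[i - 1] + vals[i - 2] - 2 * vals[i - 3])
    return vals[n]


def f_closed(n: int) -> int:
    """Closed form of example (c); the numerator is always divisible by 3."""
    return ((-1) ** (n + 1) + 2**n) // 3
```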

A treatment of the many techniques for solving recurrence systems is beyond


the scope of this text, but we will illustrate one which is both easy and useful. Later
in this section we will use this procedure to find solutions for some important classes
of recurrence systems.
The technique begins with the specification of $a_n$ and repeatedly applies the
recurrence relation to evaluate the terms which appear on the right side. To illus-
trate, consider a recurrence system of the form
$$a_0 = b_0,$$
$$a_n = c_n a_{n-1} + b_n,$$
where the value of the coefficients $b_n$ and $c_n$ may be functions of n. The value of the
general term $a_n$ can be expressed as a sum by adding both sides of the following
sequence of equations, where each equation is obtained by using the recurrence
relation to express a term in the preceding equation.
$$a_n = c_n a_{n-1} + b_n$$
$$c_n a_{n-1} = c_n c_{n-1} a_{n-2} + c_n b_{n-1}$$
$$c_n c_{n-1} a_{n-2} = c_n c_{n-1} c_{n-2} a_{n-3} + c_n c_{n-1} b_{n-2}$$
$$\vdots$$
$$\Big(\prod_{i=0}^{n-2} c_{n-i}\Big) a_1 = \Big(\prod_{i=0}^{n-1} c_{n-i}\Big) a_0 + \Big(\prod_{i=0}^{n-2} c_{n-i}\Big) b_1$$
$$\Big(\prod_{i=0}^{n-1} c_{n-i}\Big) a_0 = \Big(\prod_{i=0}^{n-1} c_{n-i}\Big) b_0$$
Note that the right side of the last equation only involves the coefficients and
boundary conditions of the sequence. Forming separate sums of the left and right
sides of this set of equations and then cancelling common summands yields
$$a_n = b_n + c_n b_{n-1} + c_n c_{n-1} b_{n-2} + \cdots + \Big(\prod_{i=0}^{n-2} c_{n-i}\Big) b_1 + \Big(\prod_{i=0}^{n-1} c_{n-i}\Big) b_0.$$
In many cases, standard summation identities can be applied to derive an expres-
sion for the value of $a_n$.
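
The unrolled sum can be compared with plain iteration of the recurrence. In the Python sketch below the coefficient sequences are passed as lists b and c indexed from 0 (c[0] is unused); these names and the list representation are assumptions made for illustration.

```python
from math import prod


def iterate(b, c, n):
    """a_n by direct iteration of a_0 = b[0], a_m = c[m] * a_{m-1} + b[m]."""
    a = b[0]
    for m in range(1, n + 1):
        a = c[m] * a + b[m]
    return a


def unrolled(b, c, n):
    """a_n as the sum over j of b[j] times the product c[n] * c[n-1] * ... * c[j+1]."""
    # For j = n the slice is empty and prod() returns 1, giving the bare term b[n].
    return sum(b[j] * prod(c[j + 1 : n + 1]) for j in range(n + 1))
```

With constant coefficients c[m] = 2 and b[m] = 1, both functions give 1, 3, 7, 15, ..., that is, 2^(n+1) − 1.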

Example
Consider the recurrence system
$$a_0 = b,$$
$$a_n = c\,a_{n-1} + b.$$
We form the set of equations
$$a_n = c\,a_{n-1} + b$$
$$c\,a_{n-1} = c^2 a_{n-2} + cb$$
$$c^2 a_{n-2} = c^3 a_{n-3} + c^2 b$$
$$\vdots$$
$$c^{n-1} a_1 = c^n a_0 + c^{n-1} b$$
$$c^n a_0 = c^n b$$
Summing the left and right sides and cancelling gives
$$a_n = b + cb + c^2 b + \cdots + c^{n-1} b + c^n b = b \sum_{i=0}^{n} c^i.$$
Applying Theorem 2.5.3 gives the solution
$$a_n = (n + 1)\,b \quad \text{if } c = 1;$$
$$a_n = b \left( \frac{c^{n+1} - 1}{c - 1} \right) \quad \text{if } c \neq 1. \quad \#$$

The importance of obtaining an expression for the nth term of a sequence
defined by a recurrence system is mitigated by the possibility of obtaining the terms
of the sequence iteratively. But a general expression for the nth term often provides
additional insights. For example, if the nth term of the sequence describes the cost
of applying an algorithm to a problem of size n, then an expression for the nth
term will enable us to determine the asymptotic behavior of the complexity function
of the algorithm.

Example
The following procedure returns the sum of the first n entries of an array A.

procedure SUM(n):
begin
    total ← 0;
    for i ← 1 to n step 1 do
        total ← total + A[i];
    return total
end

Suppose we define the complexity function f of SUM to be the number of additions
performed by SUM. This function is characterized by the following recurrence
system.
    f(1) = 1,
    f(n) = f(n − 1) + 1 for n > 1.
By adjusting indices, the expression developed in the preceding example can be
applied by setting b = c = 1. It follows that the solution is f(n) = n for n ≥ 1.
Hence SUM is an O(n) algorithm and has linear complexity. #

In the remainder of this section we will consider some special classes of


recurrence systems which are especially important for characterizing the perfor-
mance of recursive programs. While we will obtain solutions of some of these
recurrence systems, the primary goal will be to determine the asymptotic behavior
of a broad class of systems without actually finding solutions for the systems.

Divide and Conquer Algorithms

It is sometimes possible to divide a problem into smaller subproblems, solve


the subproblems, and then combine their solutions to obtain the solution to the
original problem. This general approach, often referred to as “divide and conquer,”
is a powerful technique in algorithm design. Since the subproblems are usually of
the same type as the original problem, a divide and conquer strategy can often be
implemented as a recursive algorithm. In the remainder of this section we will con-
sider some classes of recurrence systems which are useful for describing the com-
plexity functions of recursive algorithms, including many which use a divide and
conquer strategy. Our treatment will proceed from the specific to the general.
We begin by solving some special classes of recurrence relations explicitly and
characterizing their asymptotic behaviors. Then, using the solutions to these sys-
tems as bounds, we will show how to determine the asymptotic behavior of the
solutions to a larger class of recurrence systems without actually solving the sys-
tems.
In general, a divide and conquer algorithm will solve a small problem directly
and will solve a larger problem by dividing it into a set of subproblems of approx-
imately the same size. These algorithms are easiest to describe if one assumes the
subproblems are equal in size. For example, if an algorithm divides a problem of
size n > 1 into two subproblems of approximate size n/2, then the algorithm can
most easily be described if we assume n is a power of 2. This will enable us to
divide a problem of size n into two problems of size n/2, then divide each of the
problems of size n/2 into problems of size n/4, etc. The recurrence system for such
an algorithm will specify the values of the complexity function only for arguments
which are powers of the appropriate integer b > 1. We will consider the class of
divide and conquer algorithms which obey the following constraints:
1. The cost of solving a problem of size n = 1 is c, where c is a nonnegative
constant.
2. For k > 0, problems of size $n = b^k$ are divided into $a$ different sub-
problems of size n/b.
3. For all problems of size n > 1, the cost of breaking the problem into
subproblems plus the cost of combining the solutions of the subproblems
to obtain a solution to the original problem is h(n), a function of n.

These conditions yield recurrence systems of the following form:
$$f(1) = c,$$
$$f(n) = a f(n/b) + h(n) \quad \text{for } n = b^k,\ k > 0.$$
Since $f(1) = f(b^0)$, a recurrence system of this form specifies a value of f for all
arguments which are (nonnegative) integer powers of b. Because the values of f
are not specified for other arguments, the system will not have a unique solution
f: N → R, but we will see that this does not detract from its usefulness if the cost
of solving a problem of size n is monotone increasing with n.
We first treat recurrence systems in which f(n) = a f(n/b) + h(n) for the
special case where h(n) = c. The solutions to these systems for $n = b^k$ will then be
used to characterize the asymptotic behavior for all arguments of a large class
of recursive algorithms.

Lemma 5.3.2a: Let a, b, and c be integers such that a ≥ 1, b > 1, and c > 0,
and let f: N → R be any function whose values obey the recurrence system
$$f(1) = c,$$
$$f(n) = a f(n/b) + c \quad \text{for } n = b^k \text{ where } k > 0.$$
For all arguments which are powers of b,
(a) if a = 1, then $f(n) = c(\log_b n + 1)$;
(b) if a ≠ 1, then $f(n) = \dfrac{c(a\,n^{\log_b a} - 1)}{a - 1}$.
Proof: Let $n = b^k$, $k \geq 1$.
Then
$$f(n) = a f(n/b) + c$$
$$a f(n/b) = a^2 f(n/b^2) + ac$$
$$a^2 f(n/b^2) = a^3 f(n/b^3) + a^2 c$$
$$\vdots$$
$$a^{k-1} f(n/b^{k-1}) = a^k f(n/b^k) + a^{k-1} c.$$
Summing both sides of the above sequence of equations and cancelling common
summands, and noting that $f(n/b^k) = f(1) = c$, we have
$$f(n) = c a^k + c \sum_{i=0}^{k-1} a^i = c \sum_{i=0}^{k} a^i.$$
(a) If a = 1, then f(n) = c(k + 1). But $k = \log_b n$, so $f(n) = c(\log_b n + 1)$.
(b) If a ≠ 1, then from Theorem 2.5.3 we have
$$f(n) = c \left( \frac{a^{k+1} - 1}{a - 1} \right).$$
But $k = \log_b n$, and $a^{\log_b n} = n^{\log_b a}$. Therefore,
$$f(n) = \frac{c(a \cdot a^{\log_b n} - 1)}{a - 1} = \frac{c(a\,n^{\log_b a} - 1)}{a - 1}. \quad ∎$$
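
The closed forms of Lemma 5.3.2a can be checked numerically at powers of b. The Python sketch below works with the exponent k = log_b n so that only integer arithmetic is needed, using the fact that a·n^(log_b a) = a^(k+1) when n = b^k; the function names are illustrative.

```python
def f_rec(n: int, a: int, b: int, c: int) -> int:
    """f(1) = c, f(n) = a * f(n/b) + c; only meaningful when n is a power of b."""
    return c if n == 1 else a * f_rec(n // b, a, b, c) + c


def f_closed(k: int, a: int, c: int) -> int:
    """Closed form of the lemma at n = b**k, expressed through k = log_b n."""
    if a == 1:
        return c * (k + 1)                      # part (a): c(log_b n + 1)
    return c * (a ** (k + 1) - 1) // (a - 1)    # part (b): c(a n^{log_b a} - 1)/(a - 1)
```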

From Lemma 5.3.2a we can determine the asymptotic behavior of the function
f for those arguments which are powers of b. The following definition is a
generalization of the concepts introduced in Section 5.2. This generalization permits
us to discuss the asymptotic behavior of functions on a subset of the domain N.

Definition 5.3.1: Let f and g be functions from N to R, and let S be an
infinite subset of N. Then f is O(g) on S if there exists k ≥ 0 and m ≥ 0 such that
|f(n)| ≤ m|g(n)| for all n ∈ S such that n ≥ k.

Example
Let f: N → R be defined as follows:
    f(x) = 1 if x is even,
    f(x) = x if x is odd.
Then f is O(1) on the set of even integers, but f is not an O(1) function. #

It is easy to see that if g is O(h) and S ⊂ N, then g is O(h) on S. Moreover, the prop-
erties of asymptotic behavior we have considered extend in a natural way to
asymptotic behavior on S. For example, if c is a constant and f and g are O(h) on
S, then cf and f + g are O(h) on S.
The next lemma is an immediate consequence of Lemma 5.3.2a; its proof is
left as an exercise.

Lemma 5.3.2b: Let a, b, and c be integers such that a ≥ 1, b > 1, and c > 0,
and let f: N → R be a function such that
$$f(1) = c,$$
$$f(n) = a f(n/b) + c \quad \text{for } n = b^k \text{ where } k > 0.$$
Let $S = \{b^k \mid k \in N\}$.
(a) If a = 1, then f is O(log n) on S.
(b) If a ≠ 1, then f is $O(n^{\log_b a})$ on S.

We now use the preceding lemma to characterize the asymptotic behavior for
arguments which are powers of b for a large class of recurrence systems.

Theorem 5.3.2: Let a, b, and c be integers such that a ≥ 1, b > 1, and
c > 0, and let f: N → R be any function such that
$$f(1) \leq c,$$
$$f(n) \leq a f(n/b) + c \quad \text{for } n = b^k \text{ where } k > 0.$$
Let $S = \{b^k \mid k \in N\}$.
(a) If a = 1, then f is O(log n) on S.
(b) If a ≠ 1, then f is $O(n^{\log_b a})$ on S.
Proof: Let g be the solution to the recurrence system where equality holds
for both conditions of the recurrence system; that is,
$$g(1) = c,$$
$$g(n) = a g(n/b) + c \quad \text{for } n = b^k \text{ where } k > 0.$$
By Lemma 5.3.2b, the function g is O(log n) on S if a = 1 and $O(n^{\log_b a})$ on S if
a ≠ 1. It is easy to show by induction that any function f which satisfies the follow-
ing inequalities
$$f(1) \leq c,$$
$$f(n) \leq a f(n/b) + c \quad \text{for } n = b^k \text{ where } k > 0,$$
is bounded by the function g for all arguments which are powers of b, that is,
if n ∈ S, then f(n) ≤ g(n).
We conclude that the function f is O(log n) on S if a = 1 and f is $O(n^{\log_b a})$ on S if
a ≠ 1. ∎

Example
The procedure MAXMIN given in Fig. 5.3.1 applies a divide and conquer
strategy to return the maximum and minimum values of the entries A[i], ..., A[j]
of a vector A. MAXMIN first determines if there is a single entry, i.e., if i = j; in
this case, MAXMIN returns the ordered pair <A[i], A[i]>. If i < j, then MAXMIN
divides the entries into two disjoint subproblems of approximately the same size
and solves each of the subproblems recursively. The solutions to the subproblems
are then used to construct the solution to the original problem. To find the largest
and smallest entries of the array A[1:n], we call MAXMIN(1, n). We define the
complexity function f of MAXMIN as follows: f(n) is the number of comparisons
between elements of the array when A has n entries. The following recurrence system
describes the value of f for each argument which is a power of 2.
$$f(1) = 0,$$
$$f(n) = 2f(n/2) + 2 \quad \text{for } n = 2^k \text{ where } k > 0.$$
The function f obeys the following inequalities:
$$f(1) \leq 2,$$
$$f(n) \leq 2f(n/2) + 2 \quad \text{for } n = 2^k \text{ where } k > 0.$$
By Theorem 5.3.2 we can conclude that MAXMIN is an O(n) algorithm if n is a pow-
er of 2. #

procedure MAXMIN(i, j):
    if i = j then return <A[i], A[i]>
    else
    begin
        comment: Divide array into two subarrays of approximately equal size.
        <max1, min1> ← MAXMIN(i, ⌊(i + j)/2⌋);
        <max2, min2> ← MAXMIN(⌊(i + j)/2⌋ + 1, j);
        comment: Put largest value in max1 and smallest in min1.
        if max1 < max2 then max1 ← max2;
        if min1 > min2 then min1 ← min2;
        return <max1, min1>
    end

Fig. 5.3.1. Procedure to find maximum and minimum entries in an
array A[i:j] where i ≤ j
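
The comparison count of MAXMIN can be observed directly. The following Python sketch mirrors the procedure on a 0-based list and tallies comparisons between elements in a one-element list named counter; the midpoint ⌊(i + j)/2⌋ and the counter argument are assumptions made for this illustration.

```python
def maxmin(a, i, j, counter):
    """Return (max, min) of a[i..j] inclusive; counter[0] tallies element comparisons."""
    if i == j:
        return a[i], a[i]
    mid = (i + j) // 2
    # Solve the two subproblems of approximately equal size.
    max1, min1 = maxmin(a, i, mid, counter)
    max2, min2 = maxmin(a, mid + 1, j, counter)
    # Combine: two comparisons between array elements.
    counter[0] += 2
    if max1 < max2:
        max1 = max2
    if min1 > min2:
        min1 = min2
    return max1, min1


data = [7, 3, 9, 1, 4, 8, 2, 6]
count = [0]
result = maxmin(data, 0, len(data) - 1, count)
```

For the eight entries above the recurrence gives f(8) = 2f(4) + 2 = 14, which is the value the counter reaches.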

The preceding results concerning recurrence relations of the form f(n) =
a f(n/b) + h(n) can only be applied to arguments which are powers of b. The next
theorem states conditions which enable us to characterize the asymptotic behavior
of f for all arguments. (The conditions given by the theorem are sufficient but
not necessary.) In terms of a complexity function f, the theorem asserts that if
a problem does not become easier as the size of the problem increases, and if
f is O(g) for all arguments of the form $b^k$, then we can conclude f is O(g) if g is one
of the functions specified in the theorem. An important implication of the theorem
is that for most cases of interest, “padding” a problem or an input by adding
dummy entries so that the problem appears to be of size $b^k$ will not affect the
asymptotic behavior of the complexity function.

Theorem 5.3.3: Let f: N → R+ be a monotone increasing function such
that f is O(g) for all arguments of the form $b^k$, where b, k ∈ N and b > 1.
(a) If g is O(log n), then f is O(log n).
(b) If g is O(n log n), then f is O(n log n).
(c) If g is $O(n^d)$, then f is $O(n^d)$ for d ∈ R, d > 0.
Proof of (a): Let $S = \{n \mid n = b^k\}$; then f is O(g) on S. Since g is O(log n),
it follows that f is O(log n) on S. Hence there exist numbers r ∈ N and K ∈ R+
such that if n ≥ r and $n = b^k$, then f(n) ≤ K log n. Consider any m ∈ N such that
$r \leq b^k \leq m \leq b^{k+1}$. Because f is monotone increasing and positive,
$$f(m) \leq f(b^{k+1})$$
and therefore
$$f(m) \leq K \log(b^{k+1}) = K(\log(b^k) + \log b) \leq K(1 + \log b)\log(b^k) \leq K(1 + \log b)\log m.$$
Therefore f(m) ≤ K(1 + log b) log m if m is greater than a power of b which is
greater than r. It follows that f is O(log n).
The proofs of parts (b) and (c) are left as exercises. ∎

Examples
(a) The procedure MAXMIN, given in Fig. 5.3.1 and discussed in the previous
example, is O(n) for all $n = 2^k$, and the number of comparisons made by
MAXMIN increases with n. Therefore we can conclude from Theorem 5.3.3
that MAXMIN is an O(n) algorithm for all arguments n ∈ N.
(b) A binary search of a sorted list stored in A[i:j] is given in Fig. 5.3.2. The
procedure determines whether an argument arg is contained in any of the
locations A[i], A[i + 1], ..., A[j]. If so, the procedure returns the index of
the argument in A; otherwise the procedure reports that the argument was
not found. To search array A[1:n] for arg, we call BINSEARCH(arg, 1, n).
The procedure first compares arg with an element near the middle of the list.
If they match, the search is successful and the index of the element is returned.
Otherwise, if arg is less than the element, the search is resumed recursively on
the initial portion of A, and if arg is greater than the element, the search is
continued on the second portion of A.

procedure BINSEARCH(arg, i, j):
begin
    m ← ⌊(i + j)/2⌋;
    if arg = A[m] then return m
    else
        if arg < A[m] and i < m then return BINSEARCH(arg, i, m − 1)
        else
            if m < j then return BINSEARCH(arg, m + 1, j)
            else return “not found”
end

Fig. 5.3.2. Binary search for arg in the array A[i:j] where i ≤ j
and entries are sorted in increasing order

Let f be the complexity function of BINSEARCH, where f(n) is defined
to be the maximum number of comparisons made between arg and the entries
of a list with n entries. (Counting the maximum number of comparisons is
called a “worst case” analysis.)
If j = i, a call to BINSEARCH(arg, i, j) will result in no more than two
comparisons.
Thus
$$f(1) = 2.$$
If $j - i + 1 = 2^k$ for some k > 0, then BINSEARCH makes one com-
parison to determine whether arg = A[m]. If not, then BINSEARCH may call
itself to search either the initial portion of the array (which has $2^{k-1} - 1$
entries) or the final portion (which has $2^{k-1}$ entries). Since f(n) is defined to be
the maximum number of comparisons made,
$$f(2^k) = f(2^{k-1}) + 2.$$
Applying Theorem 5.3.2 with a = 1, b = 2, and c = 2, it follows that
BINSEARCH is an O(log n) algorithm for $n = 2^k$. Moreover, f is monotone
increasing, so by Theorem 5.3.3, binary search has O(log n) complexity. #
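
The worst-case count for BINSEARCH can be measured empirically. The Python sketch below follows the structure of Fig. 5.3.2 on a 0-based list, counts each comparison between arg and an entry, and searches for every present and absent key; the names binsearch and worst_case, the counter argument, and the use of None for “not found” are illustrative assumptions.

```python
def binsearch(a, arg, i, j, counter):
    """Search sorted a[i..j] (inclusive) for arg; counter[0] tallies comparisons with entries."""
    m = (i + j) // 2
    counter[0] += 1                     # comparison: arg = A[m]
    if arg == a[m]:
        return m
    counter[0] += 1                     # comparison: arg < A[m]
    if arg < a[m] and i < m:
        return binsearch(a, arg, i, m - 1, counter)
    if m < j:
        return binsearch(a, arg, m + 1, j, counter)
    return None                         # "not found"


def worst_case(n):
    """Maximum number of comparisons over all keys, present or absent, for a list of n entries."""
    a = list(range(0, 2 * n, 2))        # sorted even numbers; odd keys are absent
    worst = 0
    for arg in range(-1, 2 * n + 1):
        counter = [0]
        binsearch(a, arg, 0, n - 1, counter)
        worst = max(worst, counter[0])
    return worst
```

For n = 2^k the measured maximum is 2(k + 1), the solution of f(1) = 2, f(2^k) = f(2^(k−1)) + 2 given by Lemma 5.3.2a with a = 1.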

When the recurrence relation of a divide and conquer algorithm is of the form
f(n) = a f(n/b) + c, the constant c represents the cost of splitting the problem
into subproblems plus the cost of combining their solutions to solve the original
problem. Sometimes the cost of splitting the problem, and more often the cost of
combining the solutions of the subproblems, increases with n. We next consider
recurrence relations of the form f(n) = a f(n/b) + cn; these recurrence relations
can be applied when the splitting and combining costs grow linearly with n. Using
the techniques and results developed previously, we can prove the following result.

Theorem 5.3.4: Let a, b, and c be integers such that a ≥ 1, b > 1, and
c > 0, and let f: N → R+ be a monotone increasing function such that
$$f(1) \leq c,$$
$$f(n) \leq a f(n/b) + cn \quad \text{for } n = b^k \text{ where } k > 0.$$
(a) If a < b, then f is O(n).
(b) If a = b, then f is O(n log n).
(c) If a > b, then f is $O(n^{\log_b a})$.
Proof: Suppose $n = b^k$, where k ∈ N and k > 0. Then we can bound
f(n) as follows:
$$f(n) \leq a f(n/b) + cn$$
$$a f(n/b) \leq a^2 f(n/b^2) + (a/b)\,cn$$
$$a^2 f(n/b^2) \leq a^3 f(n/b^3) + (a/b)^2\,cn$$
$$\vdots$$
$$a^{k-1} f(n/b^{k-1}) \leq a^k f(n/b^k) + (a/b)^{k-1}\,cn$$
$$a^k f(n/b^k) \leq a^k c = (a/b)^k\,cn.$$
Summing both sides of these inequalities and cancelling summands which appear
on both sides, we obtain
$$f(n) \leq cn \sum_{i=0}^{k} (a/b)^i.$$
(i) If a = b, then (a/b) = 1 and
$$f(n) \leq cn(k + 1) = cn \log_b n + cn \quad \text{for } n = b^k.$$
But $O(cn \log_b n) = O(n \log n)$ and O(cn) = O(n); furthermore,
O(n) ⊂ O(n log n).
It follows from Theorem 5.3.3 that f is O(n log n) in the case that a = b.
(ii) If a ≠ b, then we can apply the identity
$$\frac{1 - x^{n+1}}{1 - x} = \sum_{j=0}^{n} x^j$$
to the inequality
$$f(n) \leq cn \sum_{i=0}^{k} (a/b)^i$$
to obtain
$$f(n) \leq cn \left( \frac{1 - (a/b)^{k+1}}{1 - a/b} \right) = \frac{cn}{b^k} \left( \frac{b^{k+1} - a^{k+1}}{b - a} \right) = c \left( \frac{b^{k+1} - a^{k+1}}{b - a} \right).$$
Since b − a is a constant, we can set d = c/(b − a) to obtain
$$f(n) \leq d(b^{k+1} - a^{k+1}) = dbn - ad\,a^{\log_b n}.$$
But $a^{\log_b n} = n^{\log_b a}$, and therefore
$$f(n) \leq dbn - ad\,n^{\log_b a} \quad \text{for } n = b^k.$$
If a < b, then $\log_b a < 1$ and by Theorem 5.2.5, $O(n^{\log_b a}) \subset O(n)$.
It follows from Theorem 5.3.3 that if a < b, then f is O(n).
If a > b, then $\log_b a > 1$ and therefore $O(n) \subset O(n^{\log_b a})$. It follows
that if a > b, then f is $O(n^{\log_b a})$. ∎
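
The bound derived in the proof can be checked in the extreme case where both inequalities of the theorem hold with equality. The Python sketch below compares the recursively computed value with cn·Σ(a/b)^i at n = b^k, keeping everything in integers; the function names are illustrative.

```python
def f_rec(n: int, a: int, b: int, c: int) -> int:
    """Equality case: f(1) = c, f(n) = a * f(n/b) + c * n, for n a power of b."""
    return c if n == 1 else a * f_rec(n // b, a, b, c) + c * n


def bound(k: int, a: int, b: int, c: int) -> int:
    """c * n * sum((a/b)**i for i in 0..k) evaluated at n = b**k."""
    if a == b:
        return c * b**k * (k + 1)                         # case (i): cn(k + 1)
    # Case (ii): c (b^{k+1} - a^{k+1}) / (b - a); the division is exact.
    return c * (b ** (k + 1) - a ** (k + 1)) // (b - a)
```

Because the quotient (b^(k+1) − a^(k+1))/(b − a) is the integer Σ a^i b^(k−i), the floor division loses nothing, even when b − a is negative.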

Example
Suppose S is an arbitrary sequence of n distinct elements and we wish to build
a binary search tree of minimum height which contains the elements of S as node
values. The following algorithm can be used.
1. Find the median element m of S. (The median is the element of S that would
appear in the ⌈n/2⌉th position if the sequence S were sorted.) The root of the
tree is assigned the value m.
2. Form two sequences $S_1$ and $S_2$ such that $S_1$ consists of those elements of S
which are less than m and $S_2$ consists of those elements of S which are greater
than m.
3. Apply this procedure recursively to $S_1$ to construct the left subtree of the root,
and to $S_2$ to construct the right subtree.
An O(n) algorithm† exists for finding the kth largest (and therefore the ⌊n/2⌋th
largest, or median) element of any sequence; it follows that there exists some
integer c such that the median of any set with n elements can be found with
no more than cn comparisons. Thus, step 1 can be performed with at most cn
comparisons. After the median m has been found, the sequences $S_1$ and $S_2$
can be formed by comparing m with every element $a_i \in S - \{m\}$; we add $a_i$
to $S_1$ if $a_i < m$ and add it to $S_2$ if $a_i > m$. Thus step 2 can be accomplished
with n − 1 comparisons. Consequently, we can characterize the number of
comparisons necessary to build the binary search tree from S as follows:
$$f(n) \leq 2f(n/2) + cn + (n - 1).$$
Therefore,
$$f(n) \leq 2f(n/2) + (c + 1)n.$$
Since f is monotone increasing, we can apply Theorem 5.3.4 and conclude that
the number of comparisons made in constructing the search tree is O(n log n).
#

†A careful description of a linear algorithm to find the median of a sequence of elements is
beyond our scope. The reader is referred to Aho, Hopcroft and Ullman [1974], page 97.
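
The three steps of the tree-building algorithm can be sketched in Python as follows. This is only an illustration of the structure: the median is found here by sorting, which is not the linear-time selection the text assumes, and the tuple representation (left, value, right) and the helper names are assumptions.

```python
def build(seq):
    """Return a tree as a nested tuple (left, value, right), or None for an empty sequence."""
    if not seq:
        return None
    # Step 1: median, i.e. the element in position ceil(n/2) of the sorted order.
    # Sorting is a stand-in for a linear-time selection algorithm.
    m = sorted(seq)[(len(seq) - 1) // 2]
    # Step 2: split the remaining elements around the median.
    s1 = [x for x in seq if x < m]
    s2 = [x for x in seq if x > m]
    # Step 3: build the subtrees recursively.
    return (build(s1), m, build(s2))


def height(t):
    """Height of a tree; a single node has height 0, an empty tree height -1."""
    return -1 if t is None else 1 + max(height(t[0]), height(t[2]))


def inorder(t):
    """Node values in left-to-right order."""
    return [] if t is None else inorder(t[0]) + [t[1]] + inorder(t[2])
```

Since the larger subproblem has ⌊n/2⌋ elements, the height of the resulting tree is ⌊log₂ n⌋, the minimum possible for n nodes.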
Problems: Section 5.3

1. In each of the following prove that the given expression is a solution for the recur-
rence system.
(a) $y_0 = 2$, $y_n = 3y_{n-1}$ for n > 0. Solution: $y_n = 2 \cdot 3^n$.
(b) $y_0 = 1$, $y_n = y_{n-1}/n$ for n > 0. Solution: $y_n = 1/n!$.
(c) $y_0 = 2$, $y_n = y_{n-1}^2$ for n > 0. Solution: $y_n = 2^{2^n}$.
Find a solution for each of the following recurrence systems and determine the
asymptotic complexity of the solution. (The symbols a and b denote arbitrary posi-
tive constants.)
(a) xo = 1, (b) xo =a,
Xn == Xq-1 ta fora > 0. Xy = Xy-1 + BP forn> 0.
(c) x, =1, (d) xo = 1,
X_ = 2x,-1 —1 fornm> 1, Xn = (H+ Dx,-1 for n> 0.
(ce) x, =1, (f) xo =9,
Xn == AXq-1 forn> 1. Xn = X_p-1 tn—1 = forn>0.
(g) xo = 3,
Xn = 3x1 +07 for n > 0.
(a) Find a recurrence system to describe the number of moves that must be made
in a Tower of Hanoi problem with n discs, where n > 0. (See problem 5 1.21(a).)
(b) Solve the recurrence system of part (a).
(a) Consider n coplanar straight lines, no two of which are parallel and no three of
which pass through a common point. Find a recurrence system to describe the
number of disjoint areas into which the lines divide the plane. Show that
$(n^2 + n + 2)/2$ is a solution.
(b) Suppose that n ≥ 3 and exactly three of the lines pass through a common point.
Find a recurrence system for the number of regions into which the lines divide
the plane.
A derangement of n objects is a permutation which leaves none of the objects fixed.
Thus, if f is a derangement function defined on the first n natural numbers, then
f(k) ≠ k for all k < n. Let g(n) be the number of derangements of n objects. Argue the
correctness of the following recursive characterization of g.
$$g(1) = 0,$$
$$g(2) = 1,$$
$$g(n) = (n - 1)\,g(n - 1) + (n - 1)\,g(n - 2) \quad \text{for } n > 2.$$
(Hint: A derangement either interchanges the first element with another, or it does
not.)
(a) The total path length of a tree is the sum of the lengths of all simple directed
paths from the root of the tree to a node. Find a recurrence system for the
minimum total path length of a complete n-ary tree of height h.
(b) Find the solution to the recurrence system of part (a).
(c) The external path length of a tree is the sum of the lengths of all simple directed
paths from the root of the tree to a leaf. Find a recurrence system for the
minimum external path length of a complete n-ary tree of height h.
(d) Find the solution to the recurrence system of part (c).
Prove Lemma 5.3.2b.
Let f: N → R be a function which satisfies the following relations where b, c > 0:
$$f(0) \leq c,$$
$$f(n) \leq a f(n - 1) + b \quad \text{for } n > 0.$$
If a is a nonnegative real number, describe how the asymptotic behavior of f is
affected by the value of a.
Prove parts (b) and (c) of Theorem 5.3.3.
10. It has been shown (Pohl [1972]) that if a vector A has n entries, then ⌈3n/2 − 2⌉ com-
parisons suffice to find the largest and smallest entries of A. Modify the procedure
MAXMIN so that it never requires more than ⌈3n/2 − 2⌉ comparisons of elements
of A for all n ≥ 1. (Hint: Handle n = 1 and n = 2 as special cases, and make sure
your algorithm does not divide an array with an even number of entries into two
arrays both of which have an odd number of entries.)
11. (a) Construct a recursive procedure MAX2 to implement a divide and conquer
strategy for finding the largest element in the entries A(i), ..., A(j) of an array
A. Your procedure should divide the array into two approximately equal
subarrays.
(b) State the recurrence system which characterizes the complexity function f for
MAX2 if f(n) is defined to be the number of comparisons made between entries
of an n element array A, where n is a power of 2.
(c) Find the solution of the recurrence system of part (b).

(d) Determine the asymptotic behavior of the complexity function.


(e) Design a procedure MAX3 to find the largest element in an array by dividing the
array into 3 approximately equal subproblems.
(f) Describe how you could generalize this procedure to one which creates k sub-
problems. Discuss the asymptotic complexity of this class of algorithms.
12. (a) Design a recursive procedure TWOMAX which finds the largest two elements
of an array A.
(b) State the recurrence system for the complexity function f of TWOMAX where
f(n) is defined as in problem 11(b).
(c) Solve the recurrence system of part (b) and determine the asymptotic complexity
of f.
13. (a) A binary search such as that given in Fig. 5.3.2 can be viewed as an implementa-
tion of a tree search algorithm such as that given in Fig. 3.2.2. Describe how the
entries of the array correspond to node values of the tree and how to find the
values of the left son and right son of the root.
(b) The tree search algorithm given in a recursive form in Fig. 3.2.2 can also be
given in an iterative form, as in Fig. 3.2.1. Write an iterative form of
BINSEARCH (Fig. 5.3.2).

5.4 ANALYSIS OF ALGORITHMS

The evaluation and comparison of algorithms is a central concern of computer
science. Two kinds of questions predominate:
(a) What is the cost of using a given algorithm to solve a problem of a
specified class?
(b) What is the least costly algorithm which will solve the problems of
a specified class?
By choosing an appropriate measure of cost, we can often answer such questions.
If the same measure of cost is applied to different algorithms for the same task,
we can compare algorithms and choose from among them. In some cases, we can
establish a lower bound on the cost of solving the problems of a specified class;
such a bound provides a measure of the inherent difficulty of solving those prob-
lems. Furthermore, if the cost of applying an algorithm is equal to the lower bound,
then we can conclude that the algorithm is optimal for this measure, that is, no
algorithm exists which will solve the problems of the class with a lower cost. The
topics of algorithm analysis and computational complexity are concerned with the
construction, evaluation, and comparison of algorithms.
The cost of applying an algorithm can be measured in a variety of ways. It
is often inappropriate to measure the cost of operations using real programs run
on real machines because of the difficulty of generalizing such results. We usually
prefer to measure the cost using a mathematical model based on an idealized
programming language or computing machine. However, in any such analysis,
the set of operations which can be performed must be specified and the cost associated
with performing each operation must be given. For example, we may
assume that all arithmetic operations cost the same or we may assume (more
accurately, for most computers) that multiplication is more costly than addition.
Alternatively, we may choose to ignore the cost of some operations. For example,
the cost of applying some sorting algorithms is essentially proportional to the
number of comparisons made between elements of the set being sorted. In the
analysis of such sorting algorithms, it is common to ignore operations such as
assignments, arithmetic operations, and comparisons of loop indices.
In this section we will consider some algorithms and discuss their cost of
execution. In some cases we will also comment on the optimality of these algo-
rithms. Optimality can be discussed in a variety of ways, of which two will be
important to us here. First, we can investigate the absolute optimality of an algo-
rithm with respect to a specified set of operations. If an algorithm is optimal in the
absolute sense, then if the primitive operations are restricted appropriately, no
algorithm can perform the task using fewer operations than the optimal algorithm.
Second, there is the weaker concept of asymptotic optimality. Suppose f is the
complexity function of an algorithm A which solves a specified problem. Then A
is asymptotically optimal if for every other algorithm B that solves the problem,
if the complexity function of B is g, then f is O(g). Thus for sufficiently large
arguments, the value of f is bounded by a multiple of the value of g. Informally, we
say O(f) is a lower bound on the asymptotic complexity of the class of algorithms.
Note that two algorithms with distinct complexity functions can both be asymptotically
optimal. In contrast, if f and g are complexity functions of algorithms for
some problem class, and if f is optimal in the absolute sense, then f(n) ≤ g(n) for
every argument n ∈ N.
Table 5.2.1 describes how the growth of the cost of an algorithm is determined
by its asymptotic behavior. As a rule of thumb, we can say that it is usually feasible
to execute algorithms of O(n) and O(n log n) complexity for fairly large values of n.
Time or space limitations often make it difficult or impossible to execute O(n²)
and O(n³) algorithms for even moderate values of n. Exponential algorithms (those
of O(a^n) where a > 1) cannot generally be executed except for small values of n.
We will now analyze several algorithms, characterize their complexity func-
tions, and consider their optimality. We will describe algorithms for finding the
maximum element of a set, algorithms for searching for a specified element in a set,
and algorithms for sorting the elements of a set. All of the algorithms we describe
are based on comparisons; that is, the result of applying the algorithm is determined
by a sequence of comparisons between elements of a set. We will treat the ques-
tion of optimality only for the class of algorithms based on comparisons where the
number of outcomes of any comparison is bounded. (Most algorithms of interest
have either two or three possible outcomes for each comparison, e.g., ≤ and >,
or <, =, and >.) Thus, our claims that certain algorithms are asymptotically
optimal depend on our considering only a restricted class of algorithms; the
claims may not hold if we consider algorithms which are not based on comparisons
or algorithms in which the number of outcomes of a comparison is not bounded.

Finding the largest element of an array: an O(n) algorithm


Let A[1:n] be a vector with n ≥ 1 entries. We are to find the largest entry in
A and set the variable max equal to its value. Let f be the complexity function such
that f(n) is the number of comparisons made between entries of A if A has n entries.

procedure MAX:
begin
    max ← A[1];
    for i ← 2 until n do
        if max < A[i] then max ← A[i]
end

Fig. 5.4.1 An algorithm to find the maximum entry in an array A[1:n] where n ≥ 1

We consider the algorithm MAX of Fig. 5.4.1. Each comparison in MAX
occurs within a loop which is traversed for loop index values of i = 2, 3, ..., n.
Hence the procedure MAX makes n − 1 comparisons of entries of A, and its
complexity function is the following:

$$f : \mathbf{N} \to \mathbf{R}, \qquad f(0) = 0, \qquad f(n) = n - 1 \quad \text{for } n \ge 1.$$

Clearly f is O(n), and therefore the algorithm is linear. We now show that MAX
is, in fact, optimal in the absolute sense and therefore in the asymptotic sense as
well.

Theorem 5.4.1: Any algorithm to find the maximum element of a set with
n members, n > 0, must make at least n — 1 comparisons.
Proof: Each comparison establishes that one element is not larger than
another. In order to find the maximum element, each of n — 1 elements must be
shown (by means of a comparison) to be no larger than some other element. Hence
n − 1 comparisons are necessary to find the maximum of n elements. ∎

It follows immediately from Theorem 5.4.1 that if the number of comparisons
between elements of an array is used to measure the cost of applying an algorithm,
then MAX is optimal in the absolute sense.
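
As an illustration, the following Python sketch mirrors procedure MAX and counts only the comparisons made between entries of the array. The function name max_with_count and the use of a 0-based Python list are assumptions of this sketch and do not appear in the text.

```python
def max_with_count(a):
    """Return (largest entry, number of comparisons between entries of a)."""
    if not a:
        raise ValueError("the array must have at least one entry")
    largest = a[0]
    comparisons = 0
    for x in a[1:]:
        comparisons += 1      # exactly one comparison between entries per pass
        if largest < x:
            largest = x
    return largest, comparisons
```

For a list of n entries the counter always ends at n − 1, whatever the order of the entries, which matches the complexity function given above.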
The procedure MAX uses more comparisons than just those between ele-
ments of A, since each execution of the loop will be preceded by a comparison of
the value of the loop index i with n. If the algorithm were implemented as a deci-
sion tree for a particular n, or if the data items were read sequentially, then the
comparisons associated with the loop index would be eliminated. Thus these
additional comparisons are a consequence of the algorithm implementation. Since
we are interested in the operations performed by the algorithms rather than their
implementations, these comparisons are usually ignored.

Alternative optimal methods exist for finding the maximum of a sequence of
n elements. In a sports tournament, players are often paired off for each round
of contests, with the winners of round i competing against each other in round
i + 1. The following graph represents this method for finding the best of eight
players; it uses seven comparisons. This approach generalizes easily to values of
n which are not powers of two.

[Figure: tournament tree with the contestants 1, 2, ..., 8 as leaves; one internal node is labeled "best of {1, 2, 3, 4}" and the root is labeled "the winner".]

After the winner has been found, the resulting labeled tree provides some help in
finding the second best player, since he must have been one of the three players who
lost to the winner. Thus, only two more matches need be played to find the second
place winner.
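
The tournament method and the search for the second best player can be sketched in Python as follows. This is a minimal sketch, assuming distinct values and at least two players; the name tournament_top_two and the rule that an unpaired player advances without playing are assumptions made here for illustration.

```python
def tournament_top_two(players):
    """Return (best, second best, comparisons) for a list of distinct values."""
    if len(players) < 2:
        raise ValueError("need at least two players")
    comparisons = 0
    # Each entry pairs a player with the list of players it has beaten so far.
    current = [(p, []) for p in players]
    while len(current) > 1:
        next_round = []
        for k in range(0, len(current) - 1, 2):
            (p, p_beaten), (q, q_beaten) = current[k], current[k + 1]
            comparisons += 1
            if p > q:
                next_round.append((p, p_beaten + [q]))
            else:
                next_round.append((q, q_beaten + [p]))
        if len(current) % 2 == 1:
            next_round.append(current[-1])   # unpaired player advances
        current = next_round
    best, lost_to_best = current[0]
    # The second best player can only have lost to the winner.
    second = lost_to_best[0]
    for x in lost_to_best[1:]:
        comparisons += 1
        if x > second:
            second = x
    return best, second, comparisons
```

With eight players the winner is found with seven comparisons and has beaten three opponents, so two further comparisons identify the second best, for a total of nine.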
The algorithms we have described for finding the largest element of a sequence
have the property that the cost is uniform over all problems of size n. In general,
however, the cost of applying an algorithm to a problem of size n may depend on
the particular problem solved. Consider, for example, sorting a list of n entries.
If all the entries are distinct, then there are n! different permutations of the n
entries and consequently n! different lists with the same set of entries. The cost of
applying a particular sorting algorithm to a list with these n entries will usually
depend on the order in which the entries appear; for example, if the list is nearly
sorted, then the algorithm may have to do very little work. The cost of applying
an algorithm to a problem of size n is usually based on either a worst case or an
average case analysis. A worst case analysis defines the cost of applying an algorithm
to a problem of size n as the maximum cost over all problems of size n. Thus,
if f is a complexity function based on a worst case analysis, then for every problem
of size n, the cost of applying the algorithm is no greater than f(n). In an average
case analysis, a probability distribution is assumed over the set of problems of size
n and the average cost is calculated based on this probability distribution. Such
an analysis often assumes all problems of size n are equally likely; in this case,
the value of f(n) is equal to the sum of the costs of applying the algorithm to all
problems of size n divided by the number of problems of that size. Of the two
kinds of analysis, worst case is usually simpler because it only requires that we
determine how bad things can be and then analyze that single case, whereas an
average case analysis must account for all possible cases and then weight them
appropriately.

Searching Algorithms

Sequential searching: an O(n) algorithm.


Consider the problem of accessing the records of a file. We assume that each
record includes a search key which is used to retrieve records from the file. For
example, if the file consists of information about individuals, the search key of
a record might be the individual’s name or social security number. In order to
locate a record in a file, a search argument is specified. The result of a search is the
set of records whose keys are equal to the search argument; if no such records
are in the file, this set is empty. We will treat only the special case where each
record has a unique key; thus, each search will return at most one record.
The simplest file organization for this problem stores the records in a linear
list or vector. If the file has n records, then the list will have n entries. The simplest
search procedure is a “sequential search”; this procedure examines the records
in the order in which they appear on the list until either a record whose key is equal
to the search argument is found or it is established that no such record is in the
file. We define the cost of a search to be the number of records examined, i.e., the
number of times the search argument is compared with the search key of a record.
In the worst case, either the record sought will be the last record of the file or the
record will not be in the file; all n records for the file must be examined to establish
either of these possibilities. Thus, for a worst case analysis, the complexity function
of a linear search is f(n) = n, and hence the search is of O(n) complexity.
We now analyze the average case performance of a sequential search with the
assumptions that each record of the list is equally likely to be the object of a search,
and every search results in a record being found; thus, there are no unsuccessful
searches. A search for the ith record will occur approximately one out of every n
times and will require i records be examined. Since the complexity function f is
defined to be the average number of records examined,

$$f(n) = \frac{1}{n}\cdot 1 + \frac{1}{n}\cdot 2 + \cdots + \frac{1}{n}\cdot n = \frac{1}{n}\sum_{i=1}^{n} i = \frac{n+1}{2}.$$

Thus, an average case analysis (with the assumptions stated above) leads us to
conclude that “on the average,” each search of the file will examine about half the
records. Note that the value of f(n) for a worst case analysis is about twice that
for an average case analysis, but both analyses yield complexity functions which
are O(n).
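
The two analyses can be checked numerically with a short Python sketch. It assumes, as in the text, that all keys are distinct; the names sequential_search and average_successful_cost are hypothetical and introduced only for this illustration.

```python
def sequential_search(keys, arg):
    """Return (index of arg or None, number of records examined)."""
    examined = 0
    for index, key in enumerate(keys):
        examined += 1
        if key == arg:
            return index, examined
    return None, examined


def average_successful_cost(keys):
    """Average number of records examined when each key is sought exactly once."""
    total = sum(sequential_search(keys, k)[1] for k in keys)
    return total / len(keys)
```

For a file of n = 10 records the average comes to (10 + 1)/2 = 5.5 records examined, while an unsuccessful search examines all 10.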

Searching with a binary search tree: an O(log n) algorithm


In Section 3.2 we described the use of binary search trees for storing and
accessing the records of a file. If the height of a binary search tree is h, it was shown
that searching for a record using the tree would require examining no more than
h + 1 records in the file. If a file contains n records, n ≥ 1, then the height h of the
binary search tree satisfies the inequality

$$\lfloor \log n \rfloor \le h \le n - 1.$$
If h = n — 1, the tree is “degenerate” and each node of the tree has at most one
son. In this case the tree has only one leaf and the records are accessed just as they
would be if they were stored as successive entries in a linear list. If the height of the
tree is ⌊log n⌋, then the tree is said to be “balanced.” Generally, a balanced tree
is one which is not too lopsided or skewed, i.e., one whose height is not very much
greater than necessary in order to contain a specified number of nodes. We will use
the following characterization of a balanced binary tree.

Definition 5.4.1: A binary tree T of height h is balanced if T is complete and
every path from the root of T to a leaf is of length h or h − 1.

The following theorem relates the number of nodes of a balanced binary tree
to its height.

Theorem 5.4.2: The height of a balanced binary tree with n nodes is ⌊log n⌋.

To measure the cost of searching with a binary search tree, we take the number
of records in the file to be the problem size and define the cost of a search as the
number of records examined during the search. By Theorem 5.4.2, a balanced
binary search tree with n nodes is of height h = ⌊log n⌋. Since as many as h + 1
records may be examined in the course of a search, a worst case analysis of a search
in a balanced binary tree yields the complexity function f(n) = ⌊log n⌋ + 1. The
search is therefore an O(log n) algorithm if the search tree is balanced. In fact,
a balanced tree may not be possible if too many records in the file have the same
key, but if all keys are distinct, then a balanced tree can always be constructed.
(A recursive algorithm for constructing a balanced binary search tree was given
in the last example of Section 5.3.)
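
The relation between the height of the tree and the worst case search cost can be observed with the following Python sketch. It builds a search tree by repeatedly choosing the middle key as the root, which is an assumption of this sketch and not necessarily the construction of Section 5.3; the tuple representation and the function names are likewise hypothetical.

```python
def build_balanced(sorted_keys):
    """Build a search tree as nested tuples (left, key, right); None is the empty tree."""
    if not sorted_keys:
        return None
    mid = len(sorted_keys) // 2            # the middle key becomes the root
    return (build_balanced(sorted_keys[:mid]),
            sorted_keys[mid],
            build_balanced(sorted_keys[mid + 1:]))


def height(tree):
    """Length of the longest path from the root to a leaf (a single node has height 0)."""
    if tree is None:
        return -1
    left, _, right = tree
    return 1 + max(height(left), height(right))


def records_examined(tree, arg):
    """Number of records examined while searching the tree for arg."""
    examined = 0
    while tree is not None:
        left, key, right = tree
        examined += 1
        if arg == key:
            break
        tree = left if arg < key else right
    return examined


keys = list(range(1, 101))                 # n = 100 distinct keys
tree = build_balanced(keys)
h = height(tree)
worst = max(records_examined(tree, k) for k in keys)
```

For n = 100 the height is ⌊log 100⌋ = 6 (logarithm to base 2) and the most expensive successful search examines h + 1 = 7 records.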
Many ways of organizing files and searching for records have been developed,
and whether a particular search algorithm is optimal depends on what operations
are permitted and are consistent with the file organization. For search algorithms
which locate records by comparing a search argument with record keys, a search
which uses a balanced tree is asymptotically optimal. This result is established by
the following theorem.

Theorem 5.4.3: Let A be an algorithm to search for a value arg in a sequence
S such that the output of A is either the index of arg in the sequence S or a report
that the search was not successful. If f is a worst case complexity function (that
is, f(n) is the maximum number of comparisons made when S has n elements),
then log n is O(f).
Proof: We will treat the case where each comparison has no more than three
possible outcomes, such as <, =, and >. The generalization to the case of k
outcomes, k > 1, is straightforward.
Consider the decision tree representation of a search based on comparisons.
Each internal node of the decision tree represents a comparison and has no more
than three outgoing branches. Since the sequence S has n elements, the decision
tree must have at least n + 1 leaves, where n of the leaves denote outcomes of the
form “arg is the ith element of S” and one leaf denotes “arg is not an element of
S.” Since no node can have more than three sons, if h is the height of the decision
tree, then 3^h ≥ n + 1, that is, h ≥ log₃(n + 1). Therefore some paths from the
root to a leaf contain at least log₃(n + 1) internal nodes, each of which represents
a comparison made in the course of a search for some value of arg. It follows
that if f is the worst case complexity function of a search based on comparisons,
then for all n, f(n) ≥ log₃(n + 1), and hence O(log₃(n + 1)) ⊆ O(f). Since
O(log₃(n + 1)) = O(log₃ n) = O(log₂ n), it follows that O(log n) ⊆ O(f) and
hence log n is O(f). ∎

Corollary 5.4.3: The worst case performance of a search in a balanced binary
search tree is asymptotically optimal.

Now consider the average case performance of searches using a binary search
tree. For the purpose of this analysis, we assume that all records are equally likely
to be the object of a search, and that every search is successful. Furthermore, we
assume the binary search tree is balanced. Note that approximately half the nodes
of a balanced binary search tree are leaves, approximately 3/4 of the nodes are either
leaves or one step removed, approximately 7/8 of them are within two steps of a leaf,
etc. Thus, unless n is small, most of the nodes of a binary search tree are nearly
as far from the root as the leaves.
Let C_i be the number of comparisons required to find the ith record stored
in the binary search tree T. The average cost C of a search in T is then

$$C = \frac{1}{n}\sum_{i=1}^{n} C_i.$$

We can calculate C_i easily if we can determine the length of the path from the
root of the search tree to the ith node; C_i is one greater than the length of this
path, and therefore the sum of the C_i is equal to n plus the sum of the lengths of all such paths.

Definition 5.4.2: Let T be a tree with n nodes, a_1, a_2, ..., a_n, and let d_i be
the length of the unique directed path from the root of T to node a_i. Then the
total path length of T, L_T, is defined as

$$L_T = \sum_{i=1}^{n} d_i.$$

Theorem 5.4.4: If T is a binary tree of height h ≥ 0, then

$$L_T \le (h - 1)2^{h+1} + 2.$$

Proof: The total path length L_T of a binary tree T of height h is greatest when
all leaves are distance h from the root and T is complete, that is, each node of T
has either no sons or two sons. Such a tree has one node a distance 0 from the root,
2 nodes a distance 1 from the root, and in general 2^k nodes a distance k from the
root for all k ≤ h. The total path length of a complete binary tree is therefore no
greater than Σ_{i=0}^{h} i·2^i. From Theorem 2.5.3, it follows that

$$L_T \le (h - 1)2^{h+1} + 2.$$

∎

We now use the bound on L_T found in Theorem 5.4.4 to investigate the average
case performance of a search in a balanced binary search tree for the special case
where all leaves are distance h from the root. A complete binary tree of height h
with all leaves a distance h from the root contains 2^{h+1} − 1 nodes. Recall that the
number of comparisons made in locating any record is 1 plus the length of the
path from the root to the node where the record is stored. Hence, the number of
comparisons necessary to locate each of the n = 2^{h+1} − 1 records exactly once is

$$L_T + n = (h - 1)2^{h+1} + 2 + 2^{h+1} - 1 = h\,2^{h+1} + 1.$$

If we assume that all searches are successful, and all records are sought with equal
probability, then the average cost of a search is

$$C = \frac{1}{n}(L_T + n) = \frac{h\,2^{h+1} + 1}{2^{h+1} - 1}.$$

But 2^{h+1} ≥ h + 2 for all h ≥ 0. Hence,

$$C \le \frac{h\,2^{h+1} + 2^{h+1} - (h + 1)}{2^{h+1} - 1} = \frac{(h + 1)(2^{h+1} - 1)}{2^{h+1} - 1} = h + 1.$$

Moreover,

$$C > \frac{h\,2^{h+1} + 1 - (h + 1)}{2^{h+1} - 1} = \frac{h\,2^{h+1} - h}{2^{h+1} - 1} = h.$$

Thus, for this class of binary search trees, the average cost of a search lies between
h and h + 1. Since both h and h + 1 are O(log n), it follows that the average search
cost is O(log n). Note that worst case and average case performances of searches in
a balanced binary search tree have the same asymptotic complexity.
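
The bounds on the average cost can be confirmed by direct computation. The following Python sketch sums the path lengths level by level for a binary tree of height h in which every leaf is at distance h from the root; the function name full_tree_average_cost is an assumption of this sketch.

```python
def full_tree_average_cost(h):
    """Return (L_T, n, C) for a binary tree of height h with all leaves at depth h."""
    # There are 2**depth nodes at each depth, each contributing a path of that length.
    total_path_length = sum(depth * 2 ** depth for depth in range(h + 1))
    n = 2 ** (h + 1) - 1
    average = (total_path_length + n) / n
    return total_path_length, n, average
```

For example, h = 1 gives L_T = 2, n = 3 and C = 5/3, which lies between h = 1 and h + 1 = 2.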

Sorting Algorithms

Consider the problem of sorting a sequence of elements drawn from a linearly
ordered set. We define the complexity function of a sorting algorithm which sorts
by comparisons to be the function f such that f(n) is either the maximum number
of comparisons or the average number of comparisons required to sort a sequence
of n elements. For this measure of complexity, we can use decision trees to show
that O(n log n) is a lower bound for the worst case asymptotic complexity.

Theorem 5.4.5: Let A be an algorithm for sorting a finite sequence. If f is
the worst case complexity function such that f(n) is the maximum number of comparisons
necessary to sort a sequence of n elements, then O(n log n) ⊆ O(f).
Proof: A decision tree can be used to represent any sorting algorithm based
on comparisons. Each internal node of the decision tree will be associated with
the comparison of some element x_i with another element x_j. Each possible outcome
of a comparison is represented by an arc from the corresponding internal node.
If the result of comparing x_i with x_j is either x_i ≤ x_j or x_i > x_j, then the decision
tree is binary.† Each leaf of the decision tree must specify a rearrangement of the
sequence which places the elements in sorted order. Since it may be necessary
to apply any one of n! permutations to arrange correctly the n elements of a sequence
S, the decision tree must have at least n! leaves.
The number of comparisons made by an algorithm to specify a particular
permutation is the length of the path from the root to the leaf representing that
permutation. A minimax algorithm to sort n elements is therefore represented by
a tree with at least n! leaves and of height as small as possible. Since a binary
tree of height h has no more than 2^h leaves, the height of the decision tree must be
large enough to satisfy the inequality n! ≤ 2^h. Thus, log(n!) ≤ h. But for n > 0,

$$n! = n(n-1)(n-2)\cdots 2\cdot 1 \ge n(n-1)(n-2)\cdots\lceil n/2\rceil \ge \left(\frac{n}{2}\right)^{n/2}$$

and therefore

$$\log(n!) \ge \frac{n}{2}\log\frac{n}{2} = \frac{n}{2}(\log n - \log 2) = \frac{n}{2}\log n - \frac{n}{2}.$$

Since h ≥ log(n!), it follows that h ≥ (1/2) n log n − (1/2) n. But h is the largest number
of comparisons required to sort n elements with a decision tree of height h; hence
f(n) = h. Therefore f(n) ≥ (1/2) n log n − n/2 and hence O(f) ⊇ O(n log n). ∎
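
To see how the two quantities in the proof compare, the following Python sketch computes the smallest height h of a binary decision tree with at least n! leaves and the simpler bound (n/2) log n − n/2, taking logarithms to base 2. The function names are assumptions of this sketch.

```python
import math


def decision_tree_height(n):
    """Smallest h with 2**h >= n!, computed with exact integer arithmetic."""
    # For an integer x >= 1, (x - 1).bit_length() equals the ceiling of log2(x).
    return (math.factorial(n) - 1).bit_length()


def simple_lower_bound(n):
    """The bound (n/2) log2(n) - n/2 derived in the proof above."""
    return 0.5 * n * math.log2(n) - 0.5 * n
```

For n = 5 there are 5! = 120 orderings, so at least 7 comparisons are needed in the worst case, while the simpler bound gives only about 3.3; the simpler bound is weaker but has the same asymptotic order.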

The preceding theorem establishes that any O(n log n) sorting algorithm is asymptotically
optimal. Several O(n log n) sorting algorithms are known and we will
present one later in this section, but the most straightforward sorting algorithms
are O(n²), and we begin by analyzing one of these.

Bubble sort: an O(n²) algorithm


The n entries of the vector A are to be sorted into nondecreasing order; thus,
the smallest entry is to be placed in A[1] and the largest is to be placed in A[n]. The
procedure BUBBLE given in Fig. 5.4.2 makes n − 1 passes over the vector A,
where a pass always starts at A[n] and proceeds upward through the unsorted
portion of the vector. Each pass consists of a sequence of steps, each of which
compares some A[i] with A[i + 1] and interchanges their values if they are in the
wrong relative order, i.e., if A[i] > A[i + 1], then the entries are interchanged. The
initial pass starts with i = n − 1 and continues until i = 1. At the end of the
first pass, the smallest entry of A has been “bubbled up” into the position A[1]
and need not be considered further. In the second pass, the value of i ranges from
n − 1 to 2; this pass bubbles the smallest entry of A[2] ... A[n] into A[2]. In general,
in the jth pass the index i ranges from n − 1 to j and the jth smallest element of
A is bubbled into A[j]. After the (n − 1)th pass, the values of A[1], A[2], ...,
A[n − 1] are all in place, and consequently the largest entry of A has been moved
to A[n].

procedure BUBBLE(A):
    for j ← 1 step 1 until n − 1 do
        for i ← n − 1 step −1 until j do
            if A[i] > A[i + 1] then interchange A[i] and A[i + 1]

Fig. 5.4.2 Bubble sort of A[1:n]

†We leave it as an exercise to show that if the decision tree is ternary with branches labelled
<, >, or =, then O(n log n) is still a lower bound on the worst case asymptotic complexity.

To analyze the bubble sort, we first observe that there are n − 1 passes, and
the jth pass makes n − j comparisons. The total number of comparisons is therefore
Σ_{j=1}^{n−1} (n − j) = n(n − 1)/2 = n²/2 − n/2. It follows from Theorem 5.2.5
that the bubble sort is an O(n²) algorithm.
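
The count of comparisons can be checked with a Python rendering of the bubble sort. This is a sketch using 0-based indices, so pass j of the text corresponds to the loop value j − 1 here; the function name bubble_sort and the comparison counter are assumptions added for illustration.

```python
def bubble_sort(entries):
    """Return (entries in nondecreasing order, number of comparisons between entries)."""
    a = list(entries)                        # work on a copy of the input
    n = len(a)
    comparisons = 0
    for j in range(n - 1):                   # n - 1 passes
        for i in range(n - 2, j - 1, -1):    # from the bottom up to position j
            comparisons += 1
            if a[i] > a[i + 1]:
                a[i], a[i + 1] = a[i + 1], a[i]
    return a, comparisons
```

For n = 5 entries the procedure makes 4 + 3 + 2 + 1 = 10 = n(n − 1)/2 comparisons regardless of the initial order.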
Alternatively, the complexity function of a bubble sort can be characterized
with a recurrence system. The boundary condition is obtained by noting that no
comparisons are necessary for a list with one entry. For the recurrence relation, we
observe that if a list has n entries, where n > 1, then n − 1 comparisons are used
to move the smallest entry into place and this process leaves a list of n − 1 entries
to be sorted. Thus, the recurrence system is

$$T(1) = 0, \qquad T(n) = T(n - 1) + n - 1 \quad \text{for } n > 1,$$

which has the solution n(n − 1)/2.
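
As a check on the stated solution, the recurrence can be unrolled by repeated substitution. The derivation below is a sketch of that standard argument and is not taken from the text:

```latex
\begin{align*}
T(n) &= T(n-1) + (n-1) \\
     &= T(n-2) + (n-2) + (n-1) \\
     &\;\;\vdots \\
     &= T(1) + 1 + 2 + \cdots + (n-1) \\
     &= 0 + \frac{n(n-1)}{2}.
\end{align*}
```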
We have remarked that O(n log n) sorting algorithms exist. Since the bubble
sort is an O(n²) algorithm and O(n log n) is properly contained in O(n²), it follows
that the bubble sort is not asymptotically optimal. Nevertheless, this sorting
algorithm is commonly used where the value of n is not too large and programming
effort is to be kept to a minimum. A modified version is also useful if only the first
k entries of the sorted list are to be found; in this case only k passes need be made.
The bubble sort has the additional virtue that it requires almost no space in addition
to that used to contain the input vector.
The bubble sort operates by successively reducing the problem; each pass
reduces the size of the unsorted portion of the vector by 1. Sequential search of
a list of length n is similar; each comparison either finds the record sought or
reduces the problem size by 1. If the record sought is not found at step i of a sequential
search, then a problem of size n − i must be solved. Compare this with
a binary search: if the ith comparison of a binary search does not locate the record,
the problem is reduced to one of approximate size n/2^i. Two subproblems are
defined at each step of a binary search, but each subproblem is only about half as
big as the original problem, and only one of them needs to be solved. Because the
subproblems are approximately equal in size, the algorithm is said to be “balanced.”
An algorithm is balanced if for some k, 0 < k < 1, the algorithm breaks a
problem of size n (where n is sufficiently large) into a collection of subproblems,
none of which is greater than size kn. In contrast, an algorithm may reduce a prob-
lem of size n to one of size n — p where p is a fixed integer; such an algorithm is
not balanced. Thus, bubble sort and sequential search are not balanced algorithms
because they reduce a problem of size n to one of size n — 1. Binary search, on the
other hand, is balanced because it changes a problem of size n into one of size n/2.
Moreover, the binary search of Fig. 5.3.2 would remain balanced even if m were
assigned the value of ⌊(i + j)/r⌋ for some r > 2 rather than ⌊(i + j)/2⌋. Such a
“skewed” binary search would still have O(log n) complexity, but it would not be
as efficient as the usual binary search. In general, the most efficient algorithms are
those which are balanced, and among the balanced algorithms the most efficient
are those which break a problem into subproblems of approximately equal size.
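
The effect of skewing the split can be illustrated with a small Python computation. This sketch does not reproduce BINSEARCH of Fig. 5.3.2; it assumes a generic search of a sorted array in which each probe is placed about 1/r of the way into the remaining interval, and the name worst_case_probes is hypothetical.

```python
def worst_case_probes(n, r=2):
    """Worst case number of probes needed to search a sorted array of n entries
    when each probe leaves about 1/r of the interval on its left."""
    worst = [0] * (n + 1)                  # worst[k] is the answer for k entries
    for size in range(1, n + 1):
        left = (size - 1) // r             # entries to the left of the probe
        right = size - 1 - left            # entries to the right of the probe
        worst[size] = 1 + max(worst[left], worst[right])
    return worst[n]
```

With n = 1023 entries the even split (r = 2) needs 10 probes in the worst case, while r = 4 needs more; both counts grow logarithmically with n, but the even split has the smaller constant.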
We will now describe a sorting algorithm which implements a balanced divide
and conquer strategy; then we will show the algorithm is asymptotically optimal by
proving that the complexity function of the algorithm is O(n log n).

Mergesort: an O(n log n) sorting algorithm


Mergesort exploits the ease with which two sorted lists can be merged into
a single sorted list. The input to mergesort is an unsorted list. If the list has more
than one entry, the algorithm splits the list into two sublists, sorts them recursively,
and then merges the resulting lists into a single sorted list which is the output.
The divide and conquer characteristic is manifested by the strategy of breaking the
original unsorted list into smaller lists, each of which is then processed and the
results combined by merging the smaller lists. If the original list is broken into
sublists of approximately equal size, the algorithm is balanced.
As before, we will define the cost of sorting a list of length n to be the number
of comparisons made between elements of the list. In mergesort, all such com-
parisons are made in the process of merging sorted sublists. In order to determine
the worst case asymptotic complexity of mergesort, we need the following result.

Theorem 5.4.6: Two sorted lists of lengths m and n respectively can be
merged into a single sorted list using no more than m + n − 1 comparisons.
Proof: Let LIST1 and LIST2 be two lists of length m and n respectively,
both sorted in ascending order. We will describe an algorithm to merge LIST1 and
LIST2. At each step, the entries at the heads of the lists LIST1 and LIST2 are com-
pared and the smaller of the two is removed and added to the tail of LIST, which
is initially empty. This process is repeated until one of the two input lists is empty,
at which time the remainder of the nonempty list is concatenated to the tail of
LIST. Since each comparison of an element of LIST1 with an element of LIST2
results in an element being removed from one of these lists and added to LIST,
there can be no more than m + n comparisons. Moreover, since no comparison
can be made when either of the lists is empty, there can be at most m + n − 1 comparisons.
Thus, the algorithm will merge two sorted lists into a single sorted list
while making no more than m + n − 1 comparisons between elements of the two
lists. ∎

The procedure MERGE given in Fig. 5.4.3 is an implementation of the algorithm
described in the preceding proof.

procedure MERGE(LIST1, LIST2):
begin
    m ← LENGTH(LIST1); n ← LENGTH(LIST2);
    i ← 1; j ← 1;
    make LIST empty;
    comment: move entries from LIST1 and LIST2 to LIST until one list is
        exhausted.
    while i ≤ m and j ≤ n do
    begin
        if LIST1[i] < LIST2[j] then
        begin
            concatenate LIST1[i] to end of LIST;
            i ← i + 1
        end
        else
        begin
            concatenate LIST2[j] to end of LIST;
            j ← j + 1
        end
    end;
    comment: add remainder of nonempty input list to LIST.
    if i ≤ m then concatenate LIST1[i] ... LIST1[m] to LIST
    else concatenate LIST2[j] ... LIST2[n] to LIST;
    return LIST
end

Fig. 5.4.3 Procedure to merge two sorted lists
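
A Python sketch of the merging procedure, with a counter for the comparisons between list entries, is given here for experimentation. It uses 0-based indices and list slicing in place of the explicit concatenation loop; the function name merge and the returned counter are assumptions of this sketch.

```python
def merge(list1, list2):
    """Merge two ascending lists; return (merged list, comparisons between entries)."""
    merged = []
    comparisons = 0
    i = j = 0
    # Move entries to the output until one input list is exhausted.
    while i < len(list1) and j < len(list2):
        comparisons += 1
        if list1[i] < list2[j]:
            merged.append(list1[i])
            i += 1
        else:
            merged.append(list2[j])
            j += 1
    # At most one of these slices is nonempty.
    merged.extend(list1[i:])
    merged.extend(list2[j:])
    return merged, comparisons
```

Merging [1, 3, 5] with [2, 4, 6] takes the entries alternately and uses 5 = m + n − 1 comparisons, whereas merging [1, 2, 3] with [4, 5, 6] uses only 3.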

The next theorem shows that any algorithm to merge two lists of lengths m
and n requires m + n — 1 comparisons for some pairs of lists.

Theore m 5.4. 7: Let A be an alg ori thm whi ch mer ges two sor ted lists on the
basis of com par iso ns bet wee n list entr ies. The re exis t an infi nite num ber of val ues
of m,n & Nan d lists of len gth s m and n res pec tiv ely suc h that the alg ori thm A
req uir es at leas t m + n — 1 com par iso ns to mer ge the lists .
Proof: It suf fic es to tre at the cas e m = n. Let LIS T1 and LIS T2 be list s of
length m suc h tha t for all i, LIS T1[ 4] < LIS T2[ /] < LIS T1[ i + 1]. The n the me rg ed
270 COUNTING AND ALGORITHM ANALYSIS Ch. 5

output LIST must be constructed by selecting elements from the lists alternately.
If we represent the original lists by the following pair of digraphs,

LIST 1 ay ay a3 Qin ay
o—___+ e+e» .- o—__+e

LIST? by b, b Bn 1 Dey

then the output LIST can be represented by the following digraph:


ad 1 a4 ay a, am

by by b3 bm -1 bm

Each edge of the digraph of LIST represents the result of a single comparison.
If any comparison is not made, the resulting partial subdigraph is consistent with
more than one ordering. Since a merging algorithm must be able to produce any
of the orderings consistent with such a partial subdigraph, all the comparisons
must be made. Because the digraph has 2m − 1 edges, it follows that m + n − 1
comparisons are necessary. ∎

There exist values of m and n such that fewer than m + n − 1 comparisons will
suffice to merge two sorted lists. For example, if n = 1, then merging can be done
by inserting the single element of LIST2 in the sorted list LIST1; this requires only
⌈log(m + 1)⌉ comparisons using binary search. But the preceding theorem shows
that for some values of m and n, m + n − 1 comparisons are necessary.
The procedure MERGESORT, which uses the procedure MERGE as a sub-
routine, is given in Fig. 5.4.4. The next theorem establishes that the worst case
behavior of MERGESORT is asymptotically optimal.

procedure MERGESORT(LIST):
    if LENGTH(LIST) ≤ 1 then return LIST
    else
        begin
            k ← LENGTH(LIST);
            set LIST1 to LIST[1] ... LIST[⌊k/2⌋];
            set LIST2 to LIST[⌊k/2⌋ + 1] ... LIST[k];
            return MERGE(MERGESORT(LIST1), MERGESORT(LIST2))
        end

Fig. 5.4.4 Mergesort



Theorem 5.4.8: If LIST is a list of n items and MERGESORT is used to sort
LIST, then the number of comparisons made between elements of LIST is
O(n log n).

Proof: We will apply Theorem 5.3.4, which requires that we characterize the
number of comparisons by a recurrence system in the form

    f(1) ≤ c,
    f(n) ≤ a f(n/b) + cn    for n = b^k where k > 0.

Procedure MERGESORT divides a problem of size n = 2^k into two problems of
size n/2, and therefore a = b = 2. The term cn must bound the number of
comparisons made in merging the two resulting sorted sublists. Since the sublists are
both of length n/2, by Theorem 5.4.6, this will require no more than n − 1
comparisons. We therefore choose c = 1. This value obviously suffices for the boundary
condition as well, since no comparisons are made by the procedure MERGESORT
for the case n = 1. Since a = b, and f(n) is monotone increasing, it follows from
Theorem 5.3.4 that f is O(n log n). ∎
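
The bound of Theorem 5.4.8 can be checked experimentally. The following Python sketch is our own illustration, not the text's procedure: the names merge and mergesort, the comparison counters, and the test data are all assumptions. It counts comparisons for lists of length n = 2^k and checks that the count never exceeds n log n (logarithm to the base 2).

```python
import math


def merge(list1, list2):
    """Merge two sorted lists; return (merged list, number of comparisons)."""
    merged, comparisons, i, j = [], 0, 0, 0
    while i < len(list1) and j < len(list2):
        comparisons += 1
        if list1[i] < list2[j]:
            merged.append(list1[i])
            i += 1
        else:
            merged.append(list2[j])
            j += 1
    merged.extend(list1[i:])
    merged.extend(list2[j:])
    return merged, comparisons


def mergesort(items):
    """Return (sorted copy of items, total number of comparisons)."""
    if len(items) <= 1:
        return list(items), 0
    half = len(items) // 2          # plays the role of floor(k/2) in Fig. 5.4.4
    left, c_left = mergesort(items[:half])
    right, c_right = mergesort(items[half:])
    merged, c_merge = merge(left, right)
    return merged, c_left + c_right + c_merge


for k in range(1, 11):
    n = 2 ** k
    # Hypothetical test data: a permutation of 0..n-1 (7 is coprime to n).
    data = [(7 * x) % n for x in range(n)]
    result, comparisons = mergesort(data)
    assert result == sorted(data)
    assert comparisons <= n * math.log2(n)
```

The check covers only the sampled inputs; the theorem itself is what guarantees the bound for every list.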

The preceding theorem together with Theorem 5.4.5 shows that the worst
case behavior of MERGESORT is asymptotically optimal. A number of other
O(n log n) sorting algorithms are known, including heapsort, which has a worst
case behavior of O(n log n) and quicksort, which has an average case behavior of
O(n log n) but a worst case behavior of O(n²). A careful treatment of these
algorithms is beyond our scope; the reader is referred to Aho, Hopcroft and Ullman
[1974].

Problems: Section 5.4

1. Construct a binary decision tree of minimum height for finding the maximum of four
elements. Prove that your tree is of minimum height.

2. It can be shown that using comparisons to find the largest and second largest elements
of a sequence of length n requires n + ⌈log n⌉ − 2 comparisons, where the outcome
of each comparison is <, =, or >. Describe an algorithm which accomplishes the
task with this number of comparisons.

3. Prove Theorem 5.4.2.

4. (a) Find an expression for the minimum total path length of a balanced binary
tree of height h.
(b) Estimate the cost of an average search in a balanced binary search tree of
height h which has minimum total path length. Assume all searches are
successful.
(c) Use Theorem 5.4.3 to find the average case asymptotic complexity of a search
in a balanced binary search tree.

5. In this section we used decision trees to investigate the performance of searches in
binary search trees. The distinction between these classes of trees is an important
one; a decision tree is a representation of an algorithm, while a binary search tree
is a data structure. Construct the decision tree corresponding to a search in the
following binary search tree. The search is to return the node index if the record sought
is found; otherwise it is to return “not found.” Each node of the search tree is labelled
i: j, where i is the node index and j is the key value of the record stored at the node.

6. In this section it was shown that the worst case asymptotic complexity of a search
in a balanced binary search tree is O(log n). Characterize the asymptotic behavior
of the worst case performance for the set of all binary search trees with n nodes; i.e.,
what happens if we drop the restriction that the tree is balanced?
7. Find the worst case asymptotic complexity of a ternary tree search as described in
Section 3.2. Assume the ternary search tree is balanced.
8. Theorem 5.4.5 was proved using the assumption that a comparison of two elements
resulted in one of two possible outcomes: either x_i < x_j or x_i > x_j. Show that if
three outcomes are permitted (i.e., x_i < x_j, x_i > x_j, or x_i = x_j) the result still holds.
9. Consider the algorithm for sorting by interchange given in Fig. 5.4.5. The input is
the number of entries in the vector A; when the algorithm terminates, the entries of
A are sorted in nondecreasing order. The algorithm makes a sequence of n − 1
passes with the ith pass finding the smallest entry in A[i: n] and interchanging it
with A[i]. Prior to the ith pass, the first i − 1 entries are in place. Let f(n) be the
number of comparisons made in sorting a vector with n entries. Find the asymptotic
behavior of f.

    procedure SORT(n):
        for i ← 1 until n − 1 do
            begin
                comment: find minimum entry in A[i: n].
                min ← A[i];
                position ← i;
                for j ← i + 1 until n do
                    if A[j] < min then
                        begin
                            min ← A[j];
                            position ← j
                        end;
                comment: interchange minimum entry with A[i].
                A[position] ← A[i];
                A[i] ← min
            end

    Fig. 5.4.5 Sorting the array A[1: n] by interchange

10. The final example of Section 5.3 described how a binary search tree can be con-
structed from an unsorted sequence of length n using O(n log n) comparisons.
Prove that it is not possible to accomplish this task with O(n) comparisons. (Hint:
Show that if this task could be accomplished with O(n) comparisons, then we could
devise an O(n) algorithm to sort by comparisons.)
11. Let A[0: n] be a vector of coefficients and consider the problem of evaluating the
polynomial
    P_n(z) = Σ_{i=0}^{n} A[i] z^i
for an arbitrary real argument z. Define the time complexity function f of an
algorithm to evaluate P_n(z) as the function such that f(n) is the maximum number of
multiplications required to evaluate P_n(z) for any vector A[0: n].
(a) Find the asymptotic complexity of the following algorithms for evaluating
P_n(z).
(i) (This algorithm is known as Horner’s method and is known to use a minimal
number of multiplications.)
    procedure HORNER:
        begin
            value ← A[n];
            for i ← n − 1 step −1 until 0 do
                value ← (value * z) + A[i]
        end
(ii) procedure TWO:
        begin
            power ← 1;
            value ← A[0];
            for i ← 1 until n do
                begin
                    power ← power * z;
                    value ← value + (A[i] * power)
                end
        end
(iii) procedure THREE:
        begin
            value ← 0;
            for i ← 0 until n do
                begin
                    summand ← A[i];
                    for j ← 1 until i do summand ← summand * z;
                    value ← value + summand
                end
        end
(b) Suppose it is known that A[i] = 0 for all odd i. Construct an algorithm to take
advantage of this restriction and analyze its asymptotic complexity.
12. (a) Using the programming language of this text, write a recursive procedure to
perform a sequential search on an array A[i: j], where i ≤ j.
(b) Write a recursive procedure to implement the interchange sort of Figure 5.4.5.
(Note: It would be poor practice to implement either of these algorithms
recursively. The exercise will illustrate, however, that they can be viewed as
examples of unbalanced divide and conquer algorithms.)

Suggestions for Further Reading

The counting techniques described in this chapter are a part of combinatorial
mathematics. Liu [1968] is an excellent introduction to this area. A somewhat
more concise treatment of some of the same topics is given by Even [1973]; this
work emphasizes algorithmic techniques and is particularly appropriate for com-
puter science. The book by Bellman, Cooke and Lockett [1970] is a readable and
informal introduction to some combinatorial techniques.
The analysis of algorithms is the subject of the book by Aho, Hopcroft and
Ullman [1974]. Knuth, in his series The Art of Computer Programming, presents
and analyzes many algorithms.
6

INFINITE SETS

6.0 INTRODUCTION

Many interesting and important sets are not finite; two obvious examples are the
set of natural numbers and the set of all ALGOL programs. But even with these
sets, we will never have to treat more than a finite number of the individual ele-
ments. For example, it should suffice to be able to answer questions about all
ALGOL programs with fewer than, say, 10^100 symbols; there is no need to find a
way to answer the same questions for all ALGOL programs. It can therefore be
argued that we are only interested in a finite number of ALGOL programs. Then
why should the computer scientist be interested in infinite sets? In fact, treating
infinite sets is often easier and more useful than dealing with the finite subset in
which we are interested. Many infinite sets of interest are inductively defined;
investigations of such sets tend to produce results about the entire infinite set and
often provide insight into the structure of the set and its elements.
As with finite sets, we are often interested in the size, or cardinality, of an
infinite set. Cardinality arguments, based on principles similar to the pigeonhole
principle, can be used to establish important results. For example, we will use
cardinality arguments to show that there exist tasks which cannot be performed by
any computer. This is demonstrated by showing that there are more tasks than
there are programs; it follows immediately that some tasks cannot be performed
by any of the programs. This technique will be used to show that there exist real
numbers which cannot be computed by any computer program, even if a computer
of unlimited storage and speed is assumed to exist.

6.1 FINITE AND INFINITE SETS

Finite sets can be distinguished from infinite sets using either of two definitions.
We will present both definitions and illustrate their use.


Definition 6.1.1a: A set A is finite with cardinality n ∈ N if there is a bijection
from the set {0, 1, ..., n − 1} to A. A set is infinite if it is not finite.

Theorem 6.1.1: The set N of natural numbers is infinite.


Proof: To prove N is not finite, we must show that there is no n ∈ N such
that a bijection exists from {0, 1, ..., n − 1} to N. Let n be any element of N and
f an arbitrary function from {0, 1, 2, ..., n − 1} to N. Let
    k = 1 + max {f(0), f(1), ..., f(n − 1)}.
Then k ∈ N, but for every x ∈ {0, 1, ..., n − 1}, f(x) ≠ k. Hence, f cannot be a
surjection, and therefore f is not a bijection. Since n and f were chosen arbitrarily,
we conclude that N is infinite. ∎
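
The construction of k in the proof can be tried on a concrete function. The Python sketch below is only an illustration: the names value_missed_by and square are hypothetical, and the default of −1 for an empty domain (the case n = 0, which the proof does not treat separately) is our own assumption.

```python
def value_missed_by(f, n):
    """Return a natural number that f: {0, ..., n-1} -> N never takes."""
    # Assumption: for n = 0 the maximum of the empty set is taken to be -1,
    # so that k = 0.
    return 1 + max((f(x) for x in range(n)), default=-1)


def square(x):  # hypothetical sample function, chosen only for illustration
    return x * x


k = value_missed_by(square, 10)
print(k)  # 82
assert all(square(x) != k for x in range(10))
```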

To prove a set A is infinite by using definition 6.1.1a, one must establish that
no bijection exists from {0, 1, ..., n − 1} to A for any n. Because it is necessary
to rule out an infinite number of possibilities, such a proof can be quite difficult.
For this reason, it is often useful to use the following alternate definitions of
finite and infinite sets.

Definition 6.1.1b: A set A is infinite if there exists an injection f: A → A
such that f(A) is a proper subset of A. A set is finite if it is not infinite.

Definition 6.1.1a states explicitly how to recognize a finite set and then says
that everything else is infinite; Definition 6.1.1b does just the reverse. It is usually
most convenient to use the first definition to show that a set is finite, and the second
to show that a set is infinite. Definitions 6.1.1a and 6.1.1b can be shown to be
equivalent by using the Axiom of Choice.† In our discussions we will use whichever
definition is most convenient.

Using Definition 6.1.1b, we can give a shorter proof for Theorem 6.1.1 than
the one given previously.

Theorem 6.1.1: The set N of natural numbers is an infinite set.

Proof: The map f: N → N defined by f(x) = 2x is an injection whose image
is the proper subset of even natural numbers. ∎

†The Axiom of Choice is a principle of mathematical reasoning of considerable power when
treating infinite sets. It can be stated in a bewildering variety of forms; one of the more easily
understood statements of the axiom is the following:

Axiom of Choice: If C is a collection of nonempty sets, then there exists a set T such that T
has as elements exactly one x from each set S ∈ C.

Conceptually this principle allows us to choose an arbitrary element from any nonempty set,
and in fact make an infinity of such choices. This seemingly reasonable assertion has some dis-
comforting implications. The interested reader is referred to Wilder [1965] for a discussion of the
Axiom of Choice and a proof of the equivalence of Definitions 6.1.1a and 6.1.1b.

Examples
(a) The set of real numbers, R, is infinite. We use Definition 6.1.1b and the
    following map:
        f: R → R,
        f(x) = x + 1   if x ≥ 0,
        f(x) = x       if x < 0.
    Then f is an injection and f(R) = {x | x ∈ R ∧ x ∉ [0, 1)}.
(b) Let Σ = {a, b}. Then Σ* is infinite. Let f: Σ* → Σ* be defined by f(x) = ax.
    Then f is an injection and the image of f is the proper subset of Σ* which
    contains all strings beginning with the letter a.
(c) The closed interval, [0, 1], is infinite. The function f: [0, 1] → [0, 1] defined by
    f(x) = x/2 is an injection whose image is the proper subset [0, 1/2]. #
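
Example (b) can be examined on a finite sample of Σ*. The Python sketch below is an illustration only; the helper strings_up_to and the choice of maximum length are our own assumptions, and a finite check of this kind suggests, but does not prove, that f is an injection onto a proper subset.

```python
from itertools import product


def strings_up_to(alphabet, max_len):
    """All strings over `alphabet` of length 0..max_len, shortest first."""
    for length in range(max_len + 1):
        for letters in product(alphabet, repeat=length):
            yield "".join(letters)


def f(x):
    return "a" + x  # the map f(x) = ax of example (b)


domain = list(strings_up_to("ab", 3))
image = {f(x) for x in domain}

# Injective on the sample: distinct arguments give distinct values.
assert len(image) == len(domain)

# Proper subset: the empty string and every string beginning with b are missed.
missed = [w for w in strings_up_to("ab", 4) if w not in image]
assert "" in missed and all(w == "" or w[0] == "b" for w in missed)
```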

The following theorems establish some of the important properties of finite
and infinite sets.

Theorem 6.1.2: Let A′ be a subset of A. If A′ is infinite, then A is infinite.

Proof: If A′ is infinite, then there must be a proper subset A″ of A′ and an
injection f: A′ → A′ such that f(A′) = A″. To show that A is infinite, we will
extend the domain of f to all of A by mapping each element of A − A′ to itself.
In particular, we define g: A → A as follows:
    g(x) = f(x)   if x ∈ A′,
    g(x) = x      if x ∈ A − A′.
Then g is injective, and the image of g does not include the nonempty set A′ − A″;
this establishes that A is infinite. (Figure 6.1.1 illustrates the construction used
in the proof: the shaded portion is not in the image of g.) ∎

Fig. 6.1.1 Construction for Theorem 6.1.2

Example
Let A denote the set of ALGOL programs which never halt. We will show the
set A is infinite by constructing an infinite subset A′ ⊆ A of programs which never
halt.

begin
label: go to label
end

This program, which we denote by P_0, is an element of A. By inserting the statement

go to label;

immediately after begin, we have a different program P_1 which is also in A. Consider
the program P_n obtained from P_0 by inserting n copies of the statement “go to label;”
after begin. Then A′ = {P_0, P_1, P_2, ...} is an infinite subset of A. Hence, by
Theorem 6.1.2, A is infinite. We can use a similar construction to show that the set of
ALGOL programs which always halt is infinite. #
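
The family P_0, P_1, P_2, ... of the example can be generated mechanically. In the Python sketch below the function name program and the line-by-line layout of the program text are our own assumptions; the sketch merely confirms that the first hundred program texts are pairwise distinct.

```python
def program(n):
    """Text of P_n: P_0 with n extra copies of 'go to label;' after begin."""
    lines = ["begin"] + ["go to label;"] * n + ["label: go to label", "end"]
    return "\n".join(lines)


programs = [program(n) for n in range(100)]
assert len(set(programs)) == 100  # distinct n give distinct program texts
print(program(2))
```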

Corollary 6.1.2: Every subset of a finite set is finite.

Proof: The result follows from the contrapositive of Theorem 6.1.2. ∎

Theorem 6.1.3: Let f: A → B be an injection and suppose that A is infinite.
Then B is infinite.

The proof is left as an exercise.

The next theorem shows that the property of a set being infinite is preserved
under certain set operations.

Theorem 6.1.4: Let A and B be sets where A is infinite. Then
(a) 𝒫(A) is infinite,
(b) A ∪ B is infinite,
(c) if B ≠ ∅, then A × B is infinite,
(d) if B ≠ ∅, then A^B is infinite.
Proof: We will prove parts (a) and (c) and leave the others as exercises.
(a) Define the map f as follows:
        f: A → 𝒫(A),
        f(x) = {x}.
    Then f is an injection and it follows from Theorem 6.1.3 that 𝒫(A) is
    infinite.
(c) Since B ≠ ∅, we can choose some element b ∈ B, and define the map
        f: A → A × B,
        f(x) = <x, b>.
    Since A is infinite and f is injective, it follows from Theorem 6.1.3 that
    A × B is infinite. ∎

This section has introduced the notion of infinite set and the use of injections
to show that sets are not finite. We are accustomed to dealing with finite sets, where
any injection from a set to itself is also a surjection. In contrast, infinite sets do
not have this property and we have used this fact to distinguish between the classes
of finite and infinite sets. In a later section, injections will play a crucial role; we
will use them to determine when two infinite sets are the same size, as well as to
establish when one infinite set is “larger” than another.

Problems: Section 6.1

1. Prove that the set [0, 1] is infinite using Definition 6.1.1a.
2. For some general purpose programming language, prove that the set of all
programs which halt and have no input statements is infinite.
3. (a) Prove that the intersection of two infinite sets is not necessarily infinite, i.e.,
the class of infinite sets is not closed under intersection.
(b) Let A and B be infinite sets such that B ⊂ A. Is the set A − B necessarily finite?
Is it necessarily infinite? Give examples to support your assertions.
4. Prove Theorem 6.1.3.
5. Prove parts (b) and (d) of Theorem 6.1.4.
6. Determine which of the following sets are finite and which are infinite. If the set is
finite, find an expression for its cardinal number.
(a) The set of all strings in {a, b}* of prime length.
(b) The set of all strings in {a, b, c}* of length no greater than k.
(c) The positive rational numbers, Q+ = {x | x ∈ Q ∧ x > 0}.
(d) The set of all m × n matrices with entries from {0, 1, ..., k}, where k, m, and
n are given positive integers.
(e) The set of all ALGOL programs with four statements.
(f) The set of all propositional forms over the propositional variables P, Q, R, and
S.
(g) The set of all functions from {0, 1} to I.
(h) The set of all points in R x R with positive integer coordinates where the points
lie properly between the axes and the hyperbola y = 3/x.
(i) Né¢

6.2 COUNTABLE AND UNCOUNTABLE SETS


When dealing with sets, fundamental questions often occur concerning how “big”
a set is, and whether one set is larger than another. For the case of finite sets, the
natural numbers provide the basis for answering such questions; we characterize
the size of a finite set A by saying that A has cardinality n if A has n elements. In
his work on set theory, Cantor developed a technique for measuring the size or
cardinality of infinite as well as finite sets. The numbers which are used to measure
the size of a set are called cardinal numbers. In the following sections, we will
introduce some of the infinite cardinal numbers and explore their properties.
A formal definition of the cardinal numbers requires considerable care; for
example, the concept of “the set of all cardinal numbers” leads to a set-theoretic
paradox. We can avoid paradoxes by introducing new cardinal numbers one at
a time, and since we are interested in only a few of them, this presents no difficulties.
The technique used for establishing the size of an infinite set is essentially the same
as that used for finite sets. For the finite sets, each set of the form
    {0, 1, 2, ..., n − 1}
is used as a “standard set” with which other sets are compared by means of
bijections. Thus, a finite set A has cardinality n if and only if there is a bijection from
{0, 1, ..., n − 1} to A. Each time we introduce a new infinite cardinal number α we
will choose an appropriate standard set S and assert “the set A has cardinality α
(or, the cardinal number of A is α) if there is a bijection from the set S to A.”
In the last section, we proved that the set of natural numbers N is infinite.
Since no natural number can be the cardinality of N, we must introduce a standard
set for |N|. We choose N itself to be the standard set and denote |N| by ℵ₀, called
aleph null.† This results in the following definition.

Definition 6.2.1: A set A is of cardinality ℵ₀, denoted |A| = ℵ₀, if there is a
bijection from N to A.

Examples
(a) |I+| = ℵ₀.
    The function f: N → I+ defined by f(x) = x + 1 is a bijection.
(b) |I| = ℵ₀.
    The function f: N → I defined by f(x) = x/2 if x is even, f(x) = −(x + 1)/2 if x
    is odd, is a bijection. #
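
The bijection of example (b) can be tabulated directly. In the Python sketch below the inverse map f_inverse is our own derivation, not part of the text, and the check covers only a finite range of arguments.

```python
def f(x):
    """The map of example (b), from N to I."""
    return x // 2 if x % 2 == 0 else -(x + 1) // 2


def f_inverse(y):
    """Assumed inverse of f, derived here rather than taken from the text."""
    return 2 * y if y >= 0 else -2 * y - 1


print([f(x) for x in range(7)])  # [0, -1, 1, -2, 2, -3, 3]
assert all(f_inverse(f(x)) == x for x in range(1000))
assert all(f(f_inverse(y)) == y for y in range(-500, 500))
```

The even arguments enumerate 0, 1, 2, ... and the odd arguments enumerate −1, −2, −3, ..., so every integer is reached exactly once.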

The existence of a bijection from either N or some set {0, 1, 2, ..., n − 1} to
a set A suggests that one can “count” the elements of A, even though the counting
process might not terminate. This leads to the following terminology.

Definition 6.2.2: A set A is countably infinite if |A| = ℵ₀. The set A is
countable, or denumerable, if it is either finite or countably infinite. The set A is
uncountable, or uncountably infinite, if it is not countable.

We say a set can be enumerated if its elements can be listed. The list may be
finite or infinite, and repetitions may occur, that is, not all entries of the list need
be distinct. If a list enumerates the set A, then every entry in the list is an element of
A and every element of A appears as an entry of the list. These concepts can be
formalized as follows.

Definition 6.2.3: An initial segment of N is either the set N or else a set of
the first n natural numbers, {0, 1, 2, ..., n − 1}.

†ℵ is the first letter of the Hebrew alphabet. This notation was introduced by Cantor.

Definition 6.2.4: Let A be a set. An enumeration of A is a surjective function
f from an initial segment of N to A. If f is injective as well (and therefore bijective),
then f is an enumeration without repetitions; if f is not injective, then f is an
enumeration with repetitions.

When an enumeration f is presented, the function is usually specified implicitly
by giving the sequence <f(0), f(1), f(2), ...>. We will refer to f as an enumeration
function.

Examples
(a) If A = ∅, there is only one enumeration of A; it is the empty function.
(b) If A = {a, b, c}, then <a, b, a, c> and <b, c, a> are both finite enumerations
    of A, the first with repetitions and the second without.
(c) Let A be the set of even natural numbers. Then
        <0, 2, 4, ...> and
        <2, 0, 6, 4, 10, 8, ...>
    are both enumerations of A. (The second enumeration function is
    f(n) = 2(n + 1) if n is even and f(n) = 2(n − 1) if n is odd.) #

Theorem 6.2.1: A set A is countable if and only if there exists an enumeration
of A.
Proof:
(a) (only if) If A is countable, then A is finite or A is countably infinite.
    Then, by definition, there exists a bijection from an initial segment of
    N to the set A. This establishes that if A is countable, then there exists an
    enumeration of A.
(b) (if) We assume that f is an enumeration of a set A. We consider two
    cases.
    Case 1: If A is finite, then by the definition of a countable set, A is
            countable.
    Case 2: Suppose A is not finite and f is an enumeration of A. The
            enumeration f must necessarily have the entire set N as its domain.
            If f is a bijection, then by the definition of a countably infinite
            set, the cardinality of A is ℵ₀ and A is countable. Suppose f
            is an enumeration but not a bijection. In order to show that
            A is countable, we will describe how to construct a bijection
            g by eliminating the repetitions from the enumeration f. We
            first set g(0) = f(0). Now step through the elements of A in
            the order of f(1), f(2), f(3), ..., and each time a new value
            occurs, assign the new value to the next available argument for
            the function g. Since we are eliminating repetitions, g is
            injective by construction. Furthermore, because every element of
            A is the value f(m) for some integer m, it follows that each
            element of A is the value of the function g for some argument
            n where n ≤ m; hence g is surjective. Since A is infinite, the
            domain of g will be the entire set N. Therefore, g is a bijection
            from N to A which establishes that |A| = ℵ₀ and A is
            countable. ∎
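
The elimination of repetitions in Case 2 can be sketched as a Python generator. The names without_repetitions and f are hypothetical, and the sample enumeration (each even natural number listed three times) is our own; as in the proof, the sketch presumes the enumerated set is infinite, so the generator never runs out of new values.

```python
from itertools import count, islice


def without_repetitions(f):
    """Yield g(0), g(1), ... obtained by dropping repeated values of f."""
    seen = set()
    for m in count():          # step through f(0), f(1), f(2), ...
        value = f(m)
        if value not in seen:  # a new value: it becomes the next value of g
            seen.add(value)
            yield value


def f(m):  # hypothetical enumeration with repetitions of the even naturals
    return 2 * (m // 3)


print(list(islice(without_repetitions(f), 5)))  # [0, 2, 4, 6, 8]
```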

Examples
(a) The set Σ* is countably infinite for any finite alphabet Σ. This can be shown by
    exhibiting the elements of Σ* in standard order (Definition 3.6.7). If Σ = {a, b}
    and a precedes b in the alphabetic order of Σ, then the enumeration of Σ* in
    standard order is
        <Λ, a, b, aa, ab, ba, bb, aaa, aab, ...>
    Note that if |Σ| > 1, then Σ* cannot be enumerated in lexicographic order.
(b) The set of positive rational numbers Q+ is countably infinite. Clearly Q+ is
    not finite, since the natural numbers N can be mapped injectively to a proper
    subset of Q+. We will show Q+ is countable by exhibiting an enumeration
    with repetitions. The order of the enumeration is specified by the directed path
    of the following array.
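                         NUMERATOR
                       1     2     3     4     5
                  1   1/1   2/1   3/1   4/1   5/1
                  2   1/2   2/2   3/2   4/2
    DENOMINATOR   3   1/3   2/3   3/3   4/3
                  4   1/4   2/4   3/4
                  5   1/5   2/5

    (The directed path starts at 1/1 and sweeps back and forth along successive
    diagonals: 1/1, 1/2, 2/1, 3/1, 2/2, 1/3, 1/4, 2/3, 3/2, 4/1, 5/1, ...)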

Since this enumeration will include every integer ratio m/n, it is an enumeration
of Q+, and therefore Q+ is countably infinite. The enumeration is with
repetitions, e.g., 1/2 and 2/4 denote the same element of Q+. From Theorem 6.2.1,
it follows that there is a bijection from N to Q+. #
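
Both enumerations of these examples can be sketched with Python generators. The names standard_order and positive_rationals are our own, the empty string Λ appears as '' in the output, and the diagonals of the array are swept in one fixed direction rather than along the zigzag path of the figure; this is an assumption made for brevity and does not affect which elements are enumerated.

```python
from fractions import Fraction
from itertools import count, islice, product


def standard_order(alphabet):
    """Enumerate all strings over `alphabet`: by length, then alphabetically."""
    for length in count():
        for letters in product(alphabet, repeat=length):
            yield "".join(letters)


def positive_rationals():
    """Enumerate Q+ without repetitions by sweeping the diagonals m + n = s."""
    seen = set()
    for s in count(2):
        for m in range(1, s):      # numerator m, denominator n = s - m
            q = Fraction(m, s - m)
            if q not in seen:      # skip ratios such as 2/2 that repeat 1/1
                seen.add(q)
                yield q


print(list(islice(standard_order("ab"), 9)))
# ['', 'a', 'b', 'aa', 'ab', 'ba', 'bb', 'aaa', 'aab']
print([str(q) for q in islice(positive_rationals(), 6)])
# ['1', '1/2', '2', '1/3', '3', '1/4']
```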

The following theorem establishes an important property of the cardinal number
ℵ₀. This result will be used later to show that ℵ₀ is the “smallest” infinite cardinal
number.

Theorem 6.2.2: Every infinite set contains a countably infinite subset.

Proof: Let A be an infinite set. Applying the Axiom of Choice to a sequence
of subsets of A, we construct an infinite sequence <a_0, a_1, a_2, ...> as follows:

    Choose a_0 from A
    Choose a_1 from A − {a_0}
    Choose a_2 from A − {a_0, a_1}
    Choose a_3 from A − {a_0, a_1, a_2}

Each of the sets A − {a_0, a_1, a_2, ..., a_n} is infinite. If this were not so, then
A would be equal to the union of the two finite sets A − {a_0, a_1, ..., a_n} and
{a_0, a_1, ..., a_n}. But the union of two finite sets is a finite set and A is infinite.
Therefore each set A − {a_0, a_1, a_2, ..., a_n} is infinite and we can select a new
element a_{n+1}. Thus we can construct an infinite sequence <a_0, a_1, a_2, ...> without
repetitions; the elements of this sequence comprise a countably infinite subset
of A. ∎

Like the finite sets, countable sets are closed under certain set operations. The
following theorems list the principal results.

Theorem 6.2.3: The union of a countable collection of countable sets is
countable.

Proof: Let S be an initial segment of N, and set A = ⋃_{i∈S} A_i where each A_i
is a countable set. If S = ∅ or if A_i = ∅ for each i ∈ S, then A = ∅ and the result
holds. Suppose that S ≠ ∅ and that there is at least one nonempty set A_i; without
loss of generality, we assume A_0 ≠ ∅. We construct an infinite array using
enumerations of the nonempty sets. If A_i ≠ ∅, then the ith row of the array is an
enumeration of A_i; we use an infinite enumeration with repetitions if A_i is finite.
If A_i = ∅, we set the ith row equal to the (i − 1)th. Thus, the array contains all the
elements of A and no others. An enumeration of the elements of A is specified by the
directed path in the diagram below.

    A_0: <a_00, a_01, a_02, a_03, ...>
    A_1: <a_10, a_11, a_12, a_13, ...>
    A_2: <a_20, a_21, a_22, a_23, ...>
    ...

    (The directed path sweeps the array along successive diagonals, as in the
    enumeration of Q+.)

Since this is an enumeration of A, it follows from Theorem 6.2.1 that A is
countable. ∎
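
The diagonal traversal used in this proof can be sketched in Python. The function union_enumeration and the sample family of sets are hypothetical, and the sketch assumes an infinite family in which every row is an infinite enumeration, which is the situation the proof arranges by repeating entries and rows.

```python
from itertools import count, islice


def union_enumeration(row):
    """Enumerate the union of sets A_0, A_1, ... where row(i)(j) is the j-th
    entry of an infinite enumeration (repetitions allowed) of A_i."""
    for s in count():              # s = i + j indexes one diagonal of the array
        for i in range(s + 1):
            yield row(i)(s - i)


def row(i):
    # Hypothetical family: A_i is the set of multiples of i + 1.
    return lambda j: (i + 1) * j


print(list(islice(union_enumeration(row), 10)))
# [0, 1, 0, 2, 2, 0, 3, 4, 3, 0]
```

The result is an enumeration with repetitions; Theorem 6.2.1 is what allows the repetitions to be ignored.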

Examples
The preceding theorem can be used to show that each of the following sets is
countably infinite.
(a) I^n = {<x_1, x_2, ..., x_n> | x_i ∈ I} (the set of n-tuples with integer components).
(b) Q^n = {<x_1, x_2, ..., x_n> | x_i ∈ Q}.
(c) The set of all nth degree polynomials with rational coefficients.
(d) The set of all polynomials with rational coefficients.
(e) The set of all n × m matrices with rational components.
(f) The set of all matrices of arbitrary finite dimension with rational
    components. #

Theorem 6.2.4: Let A and B be countable sets. Then
(a) A × B is countable,
(b) if A is finite, then B^A is countable.
Proof: The proof of part (a) is left as an exercise.
(b) If A or B is empty, then |B^A| = 0 or |B^A| = 1. Now assume both A and
    B are nonempty, where B is countable and |A| = n. Each element of B^A
    is a function f: A → B. Let g: N → B be an enumeration of B, and for
    each positive k ∈ N define the set F_k as follows:
        F_k = {f | f ∈ B^A and f(A) ⊆ g({0, 1, 2, ..., k − 1})}.
    Then F_k includes every function whose image is contained in the set
    consisting of the first k elements of the enumeration of B; |F_k| ≤ k^n.
    Since A is finite, for each function f: A → B there exists some m ∈ N
    such that if k > m, then f ∈ F_k; therefore B^A = ⋃_{k∈N} F_k. But each set
    F_k is finite and therefore countable. Hence, by Theorem 6.2.3, we conclude
    ⋃_{k∈N} F_k is countable. ∎

As our definitions have suggested, not all infinite sets are countably infinite.
The next theorem establishes that we need another infinite cardinal number.

Theorem 6.2.5: The subset of real numbers, [0, 1], is not countably infinite.
Proof: Recall that [0, 1] denotes the set {x | x ∈ R ∧ 0 ≤ x ≤ 1}. Each
x ∈ [0, 1] can be represented by an infinite decimal expansion:
    x = .x_0 x_1 x_2 x_3 ...
where each x_i is a decimal digit. Using this representation requires some care,
since the representation is not unique; for example:
    .5000 ... = .4999 ...†
We will show that no function from N to [0, 1] is surjective. This will establish that
no enumeration exists for [0, 1].

†To show that .4999 ... is an alternative representation of .5, let x denote .4999 ... Then
    10x = 4.999 ...,
    100x = 49.999 ...,
and 100x − 10x = 45. It follows that x = .5.

Let f: N → [0, 1] be an arbitrary function from the natural numbers to the
set [0, 1]. Arrange the elements f(0), f(1), ..., in a vertical array, using a decimal
representation for each value f(x). The resulting array appears as follows:
    f(0): .x_00 x_01 x_02 ...
    f(1): .x_10 x_11 x_12 ...
    ...
    f(n): .x_n0 x_n1 x_n2 ...
    ...
where x_ni is the ith digit in the decimal expansion of f(n). We now specify a real
number y ∈ [0, 1] as follows: y = .y_0 y_1 y_2 ..., where
    y_i = 1 if x_ii ≠ 1,
    y_i = 2 if x_ii = 1.
The number y is determined by the digits on the diagonal of the array. Clearly,
y ∈ [0, 1]. However, y differs from each f(n) in at least one digit of the expansion
(namely, the nth digit). Hence, y ≠ f(n) for any n, and we conclude that the map
f: N → [0, 1] is not a surjection. Therefore, f is not an enumeration of [0, 1]. Since
the map f was arbitrary, this establishes that |[0, 1]| ≠ ℵ₀. ∎
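
The construction of y can be carried out on a finite window of the array. In the Python sketch below the function diagonal and the four sample rows are hypothetical; only the rule for y_i is taken from the proof.

```python
def diagonal(rows):
    """Given rows[n][i] = i-th decimal digit of f(n), return the first
    len(rows) digits of y, where y_i = 1 if x_ii != 1 and y_i = 2 otherwise."""
    return [1 if rows[i][i] != 1 else 2 for i in range(len(rows))]


# Hypothetical finite window onto the infinite array of digits.
rows = [
    [5, 0, 0, 0],   # f(0) = .5000...
    [1, 1, 1, 1],   # f(1) = .1111...
    [3, 3, 3, 3],   # f(2) = .3333...
    [1, 4, 1, 5],   # f(3) = .1415...
]
y = diagonal(rows)
print(y)  # [1, 2, 1, 1]
assert all(y[n] != rows[n][n] for n in range(len(rows)))
```

Because y uses only the digits 1 and 2, it has a single decimal expansion, so the non-uniqueness noted at the start of the proof cannot make y equal to some f(n) written differently.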

The preceding theorem and proof are due to Cantor. The proof technique is
sometimes called the “Cantor diagonal technique” or simply “diagonalization.”
Essentially, this technique begins with an infinite list such that each element on
the list has an infinite description. It then produces an object distinct from each
element of the list. This technique has many variations and is applied extensively
in the theory of computability.

Theorem 6.2.6: If Σ is a finite nonempty alphabet, then 𝒫(Σ*) is uncountably
infinite.
Proof: Let <w_0, w_1, w_2, ...> be an enumeration of Σ* and let
<A_0, A_1, A_2, ...> be an enumeration of any nonempty collection of subsets of Σ*.
We will show that there is a subset of Σ* which is not in the enumeration. Construct
a (possibly infinite) binary matrix

           w_0    w_1    w_2   ...
    A_0    a_00   a_01   a_02  ...
    A_1    a_10   a_11   a_12  ...
    A_2    a_20   a_21   a_22  ...
    ...

by letting the ith row represent the characteristic function of A_i. Then a_ij = χ_{A_i}(w_j);
that is, a_ij = 1 if w_j ∈ A_i, otherwise a_ij = 0. Now define a language L by traversing
the diagonal elements of the array and including in L exactly those elements which
are not in their respective subsets:
    L = {w_i | a_ii = 0, i ∈ N} = {w_i | w_i ∉ A_i, i ∈ N}.
By construction, L ≠ A_i for any i ∈ N; that is, L does not appear in the enumeration.
But L ∈ 𝒫(Σ*). Therefore, <A_0, A_1, A_2, ...> is not an enumeration of 𝒫(Σ*).
Since the enumeration was an arbitrary enumeration of any nonempty subset of
𝒫(Σ*), it follows that no enumeration of the entire set 𝒫(Σ*) exists. ∎
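
The language L of this proof can be computed for finite prefixes of the two enumerations. In the Python sketch below the function diagonal_language and the sample words and subsets are hypothetical; the empty string is written "".

```python
def diagonal_language(words, subsets):
    """Return {w_i : w_i not in A_i} for the listed words and subsets."""
    return {w for w, a in zip(words, subsets) if w not in a}


# Hypothetical finite prefixes of the enumerations <w_i> and <A_i>.
words = ["", "a", "b", "aa"]
subsets = [{"", "a"}, {"b"}, {"b", "aa"}, set()]

L = diagonal_language(words, subsets)
print(sorted(L))  # ['a', 'aa']

# L disagrees with each A_i about the membership of w_i.
assert all((w in L) != (w in a) for w, a in zip(words, subsets))
```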

The sets [0, 1] and 𝒫(Σ*) are examples of sets which are infinite but not
countably infinite. In the next section we will develop tools for showing that [0, 1] and
𝒫(Σ*) have the same cardinality. We choose [0, 1] to be the “standard set” for this
cardinality and make the following definition.

Definition 6.2.5: A set A is of cardinality c if there is a bijection from [0, 1]


to A.

The choice of c is based on the fact that the set [0, 1] is often called a con-
tinuum.

Examples
(a) |[a, b]| = c where [a, b] is any closed interval in R with a < b. This is
established by noting that f(x) = (b − a)x + a is a bijection from [0, 1] to [a, b].
(b) |(0, 1)| = |[0, 1]|. These two sets differ only in their containment of the end
points of the interval; in order to construct a bijection from [0, 1] to (0, 1) we
must find an image for 0 and 1 in the interval (0, 1) while keeping the
map surjective. Define the set A to be {0, 1, 1/2, 1/3, ..., 1/n, ...}. Define the
map f as follows:
f: [0, 1] → (0, 1),
f(0) = 1/2,
f(1/n) = 1/(n + 2) for n ≥ 1,
f(x) = x for x ∈ [0, 1] − A.

Then f is bijective and therefore |(0, 1)| = c. The following diagram is a
representation of the function f.

[Diagram: the points 0, ..., 1/5, 1/4, 1/3, 1/2, 1 of the unit interval, with 0 mapped
to 1/2 and each 1/n mapped to 1/(n + 2).]

(c) |R| = c. We define a bijection g from (0, 1) to R as follows:

g: (0, 1) → R,
g(x) = (1/2 − x) / (x(1 − x)).

The function g has the following graph.

[Figure: graph of g on the interval (0, 1).]

Since f of the preceding example is a bijection from [0, 1] to (0, 1), and g is
a bijection from (0, 1) to R, the composite function gf is a bijection from
[0, 1] to R. Hence, |R| = c. #
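
The maps f and g of examples (b) and (c) can be evaluated exactly on rational arguments. The sketch below uses Python's `fractions.Fraction` for that purpose; the function names follow the examples, but restricting the inputs to rationals is an assumption made here so that the arithmetic stays exact.

```python
from fractions import Fraction

def f(x):
    """Map [0, 1] into (0, 1): shift 0 and the points 1/n along 1/2, 1/3, 1/4, ..."""
    x = Fraction(x)
    if x == 0:
        return Fraction(1, 2)
    if x.numerator == 1:              # x = 1/n with n >= 1
        return Fraction(1, x.denominator + 2)
    return x                          # every other point is left fixed

def g(x):
    """Map (0, 1) onto R by g(x) = (1/2 - x) / (x(1 - x))."""
    x = Fraction(x)
    return (Fraction(1, 2) - x) / (x * (1 - x))

samples = [0, 1, Fraction(1, 2), Fraction(1, 3), Fraction(2, 3), Fraction(3, 7)]
images = [f(x) for x in samples]
assert all(0 < y < 1 for y in images)        # the end points are gone
assert len(set(images)) == len(samples)      # no collisions on the samples
print(g(f(0)), g(f(1)))                      # values of the composite gf
```

A finite sample cannot prove bijectivity, but it does show how the end points 0 and 1 are absorbed into the sequence 1/2, 1/3, 1/4, ... without disturbing any other point.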

Problems: Section 6.2

1. Show that each of the following sets is countably infinite.
(a) Σ*, where Σ = {a}.
(b) {<x₁, x₂, x₃> | xᵢ ∈ I}.
(c) The set of all finite subsets of {a, b}*.
(d) The set of all first-degree polynomials with integer coefficients.
(e) The set of all finite digraphs with nodes in N.

2. Show that each of the following sets has cardinality c by constructing a bijection
from [0, 1] to the set.
(a) (a, b), where a < b and a, b ∈ R.
(b) {x | x ∈ R ∧ x > 0}.
(c) {<x, y> | x, y ∈ R ∧ x^2 + y^2 = 1}.

3. Let |A| = c, |B| = c, |D| = ℵ₀, |E| = n > 0, where A, B, D, and E are disjoint.
Prove each of the following.
(a) |A ∪ B| = c.
(b) |A ∪ D| = c.
(c) |D × E| = ℵ₀.

4. Try to find a set S such that |𝒫(S)| = ℵ₀. If you do not succeed, describe the
difficulties encountered.

5. Prove part (a) of Theorem 6.2.4.

6. (a) In Theorem 6.2.5, suppose we use a binary expansion for f(i) and define the
digits of y in the obvious way:
yₖ = 0 if xₖₖ = 1, yₖ = 1 if xₖₖ = 0.
Show that y may be equal to f(j) for some j ∈ N.
(b) Explain what difficulties might arise because of the nonuniqueness of the decimal
representation of some numbers in [0, 1]. How does this influence the selection of
the values for yₖ in the real number y in the proof of Theorem 6.2.5?

7. Joe Cool, a student at Silo Tech, has suggested the following proof that no bijection
exists from N to N. Assume f is a bijection from N to N, with f(k) = iₖ.
For each iₖ, construct a number in [0, 1] by reversing the digits of iₖ and putting
a decimal point to the left. For example, if iₖ = 123, the number constructed becomes
.321000...
This defines a map g from N to [0, 1] which is injective, e.g., g(123) = .321000...
Apply the Cantor diagonal technique to the array
g∘f(0) = .x₀₀x₀₁x₀₂...
g∘f(1) = .x₁₀x₁₁x₁₂...
...
to construct the number y ∈ [0, 1]. Now reverse the digits of y and put the
decimal point to the right. The result is a number which does not appear in the
list f(0), f(1), ..., which contradicts the assertion that f is surjective. Hence, no
bijection can exist from N to N.
Should we promote Joe to full professor or suggest he find a job as a COBOL
programmer (assuming the two are mutually exclusive)?

6.3 COMPARISON OF CARDINAL NUMBERS

The preceding sections introduced the finite cardinal numbers, the cardinal number
ℵ₀ for a countable infinity, and the cardinal number c for some sets of an uncountable
infinity. In each case, the cardinality of a set A was established by constructing
a bijection from a standard set to A. This allows us to show that two sets have
the same cardinality, but so far, we have not defined an order relation which will
enable us to assert that one set is larger than another. In this section, we develop
the order relations ≤ and < on cardinal numbers and show that they have properties
similar to the usual order relations over the real numbers. The following
definition formalizes the concept of two sets having the same cardinality even when
a standard set has not been specified.

Definition 6.3.1: Let A and B be sets. Then, A and B are equipotent or have
the same cardinality, denoted by |A| = |B|, if there is a bijection from A to B.

Example
Let E be the set of positive even integers. Then, |I+| = |E| because the function
f: I+ → E,
f(x) = 2x
is a bijection from I+ to E. #

Because bijections are closed under composition, and inverses of bijections
are bijections, the relation of equipotence has the following property.

Theorem 6.3.1: Equipotence is an equivalence relation over any collection
of sets.

The proof is straightforward and left as an exercise.

It follows from the preceding theorem that to show a set S has cardinality α,
it suffices to choose any set S′ which we know has cardinality α and establish the
existence of a bijection from S to S′ or from S′ to S. In general, we choose the set
S′ to make the proof as easy as possible.
We now consider order relations on sets of cardinal numbers. Our goal is to
be able to compare the sizes of sets. For example, our intuition tells us that sets with
cardinality c are “larger” than countable sets. Before we formally define the order
relation for arbitrary collections of sets, we make the following observations
concerning finite sets and their cardinal numbers.
Let A and B be finite sets with |A| = n, |B| = m.
(a) If there exists an injection from A to B, then n ≤ m.
(b) If there exists a bijection from A to B, then n = m.
(c) If there exists an injection from A to B, but no bijection exists, then
n < m.
These relationships between functions and cardinalities can be extended in a
natural way to apply to arbitrary sets.

Definition 6.3.2: The cardinality of A is no greater than (or is less than or
equal to) the cardinality of B, denoted |A| ≤ |B|, if there is an injection from A
to B. The cardinality of A is less than the cardinality of B, written |A| < |B|, if
there exists an injection but no bijection from A to B.

We have chosen to use the notation ≤ and < because the order relations we
have just defined have the properties which we usually associate with these symbols.
However, the proofs that the properties hold are, in some cases, lengthy and
intricate. The following two theorems establish some of these properties, but their
proofs are too involved to be presented here. The first theorem, called the Law of
Trichotomy, asserts that any two sets can be compared using either the relation
< or =.

Theorem 6.3.2 (Zermelo): Let A and B be sets. Then exactly one of the
three following conditions holds:
(a) |A| < |B|,
(b) |B| < |A|, or
(c) |A| = |B|.
The second theorem asserts that the relation ≤ is antisymmetric.

Theorem 6.3.3 (Cantor-Schröder-Bernstein): Let A and B be sets. If |A| ≤ |B|
and |B| ≤ |A|, then |A| = |B|.

The preceding theorem often provides a powerful mechanism for showing that
two sets have the same cardinality. If we can construct an injection f: A → B,
thus establishing that |A| ≤ |B|, and another injection g: B → A to establish that
|B| ≤ |A|, then we can conclude that |A| = |B|. Note that f and g need not be
surjective. Thus Theorem 6.3.3 allows us to conclude that a bijection exists from
A to B on the basis of injections from A to B and B to A. It is often easier to
construct two such injections than a single bijection.
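
The text does not prove Theorem 6.3.3, but one standard proof builds the bijection by tracing each element backward through f and g. The Python sketch below follows that idea under a simplifying assumption stated in the code: every backward chain is finite. The names `csb_bijection`, `f_inv`, and `g_inv` are introduced here for illustration only.

```python
def csb_bijection(f, f_inv, g_inv):
    """Build h: A -> B from injections f: A -> B and g: B -> A.

    f_inv and g_inv are partial inverses that return None outside the image.
    Assumption: every backward chain is finite (true in the demo below); the
    general proof also handles infinite chains, which this sketch does not.
    """
    def h(a):
        x, in_A = a, True
        while True:
            prev = g_inv(x) if in_A else f_inv(x)
            if prev is None:
                break                     # the chain starts at x
            x, in_A = prev, not in_A
        # Chains that start in A use f; chains that start in B use the inverse of g.
        return f(a) if in_A else g_inv(a)
    return h

# Demo with A = B = N and the non-surjective injections f(n) = g(n) = n + 1.
succ = lambda n: n + 1
pred = lambda n: n - 1 if n >= 1 else None
h = csb_bijection(succ, pred, pred)
print([h(n) for n in range(8)])           # [1, 0, 3, 2, 5, 4, 7, 6]
```

Neither injection in the demo is surjective, yet the resulting h pairs 0 with 1, 2 with 3, and so on, which is a bijection from N to N.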

Theorem 6.3.4: Let S be a set of cardinal numbers. The order relation ≤ on
S is a linear order. The order relation < on S is a quasi order.

The proof is left as an exercise.

Examples
(a) We show |(0, 1)| = |[0, 1]| by exhibiting an injection from each set to the
other as follows:
(i) f: (0, 1) → [0, 1],
f(x) = x.
(ii) g: [0, 1] → (0, 1),
g(x) = x/2 + 1/4.
(b) |𝒫(N)| = c.
(i) We show that |𝒫(N)| ≤ c by constructing an injection as follows:
g: 𝒫(N) → [0, 1].
For every subset S ⊆ N, g maps S to a real fraction,
g(S) = .x₀x₁x₂...,
where the fraction is expressed in binary representation and
x₂ⱼ = 0 for j = 0, 1, 2, ...,
x₂ⱼ₊₁ = 1 for j ∈ S, and
x₂ⱼ₊₁ = 0 for j ∉ S;
e.g., g(∅) = 0,
g(N) = .01010101...,
g({1, 3, 5}) = .00 01 00 01 00 01 ...
(Note that we cannot use (in place of g) the function g′ such that g′(S) is
the binary fraction .x₀x₁x₂..., where xⱼ = 1 if j ∈ S and xⱼ = 0 if
j ∉ S. Since the value of g′(S) is expressed as a binary fraction, the
function g′ is not an injection from 𝒫(N) to [0, 1]: for example, the
sets {0} and {n | n ∈ N ∧ n > 0} would be mapped to .1000... and
.0111... respectively; in the binary number system, these are different
representations of the same fraction. However, if g′(S) is specified to be
a ternary (base 3) fraction, then the same characterization of g′ will
specify an injective function.)
(ii) We show c ≤ |𝒫(N)| by constructing an injection from [0, 1] to 𝒫(N).
Let x = .x₀x₁x₂... be a binary representation of x ∈ [0, 1]. (If x does
not have a unique representation, choose one arbitrarily.) Define f(x)
to be the set such that j ∈ f(x) if and only if xⱼ = 1;
e.g., f(0) = ∅,
f(1) = f(.1111...) = N,
f(.101010000...) = {0, 2, 4}.
Then f is an injection. (Note that f is not a surjection. For example, if
.1000... is chosen as the representation of 1/2 rather than .0111...,
then the set {0} will be in the image of f but the set {n | n ∈ N ∧ n > 0}
will not.)
It follows from the Cantor-Schröder-Bernstein theorem that |𝒫(N)| = c.

(c) |N^N| = c.
(i) |N^N| ≤ c. We first construct an injection from N^N to (0, 1). Let f be an
element of N^N and let xᵢ be the binary representation of f(i) for each
argument i ∈ N. Using the digit “2” in a ternary base as a separator for
the values of the function, we define g(f) = .x₀2x₁2x₂2... and interpret
g(f) as a ternary fraction constructed from the values of f. For example,
consider h: N → N, h(x) = 2x. Then h ∈ N^N and g(h) = .021021002...
It is easy to show that g is an injection but not a bijection from N^N to
(0, 1).
(ii) c ≤ |N^N|. We construct an injection h from (0, 1) to N^N. Let x be an
element of (0, 1) and let x = .x₀x₁x₂x₃... be a decimal expansion of x.
Define h(x) to be that function f ∈ N^N for which f(0) = x₀, f(1) = x₁,
... etc. Then, h is an injection from (0, 1) to N^N.
It follows that |N^N| = c. #
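
The digit encodings used in examples (b) and (c) can be generated mechanically. The following Python sketch prints finite prefixes of the expansions; the function names are chosen here for illustration, and only a prefix can ever be shown, so nothing in it substitutes for the injectivity arguments above.

```python
def g_prefix(S, pairs):
    """First 2*pairs binary digits of g(S): x_{2j} = 0, and x_{2j+1} = 1 exactly when j is in S."""
    return "".join("0" + ("1" if j in S else "0") for j in range(pairs))

def g_prime_prefix(S, digits):
    """First digits of the rejected map g'(S): x_j = 1 exactly when j is in S."""
    return "".join("1" if j in S else "0" for j in range(digits))

def ternary_code_prefix(func, terms):
    """Digits of g(func) for the first `terms` values: binary numerals, each followed by a 2."""
    return "".join(format(func(i), "b") + "2" for i in range(terms))

print(g_prefix({1, 3, 5}, 6))                    # 000100010001
print(g_prime_prefix({0}, 4))                    # 1000
print(g_prime_prefix(set(range(1, 100)), 4))     # 0111
print(ternary_code_prefix(lambda x: 2 * x, 3))   # 021021002
```

The forced 0 in every even position of g(S) is what rules out an expansion ending in all 1s, which is exactly the ambiguity that makes g′ fail as a binary fraction.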

The relationship between the finite cardinal numbers, ℵ₀, and c is established
by the following theorem.

Theorem 6.3.5: Let A be a finite set. Then |A| < ℵ₀ < c.

Proof: Suppose |A| = n. We use the standard set {0, 1, 2, ..., n − 1} and
prove that |{0, 1, 2, ..., n − 1}| < |N| < |[0, 1]| for every n ∈ N. We first define
the function f as follows:
f: {0, 1, 2, ..., n − 1} → N,
f(x) = x.
Since f is an injection, it follows that |A| ≤ |N|. In Theorem 6.1.1, we showed that
there was no bijection from N to A, so |A| ≠ |N|. It follows that |A| < |N|, i.e.,
|A| < ℵ₀.
We next observe that the map
f: N → [0, 1],
f(x) = 1/(x + 1)
is an injection from N to [0, 1]; hence |N| ≤ |[0, 1]|. In Theorem 6.2.5, we showed
that |N| ≠ |[0, 1]|. It follows that |N| < |[0, 1]|, i.e., ℵ₀ < c. ∎

Example
Define a number x ∈ (0, 1) to be computable if and only if there is an ALGOL
(or PL/I, or FORTRAN, etc.) program P which, when given any nonnegative
integer i as an input, will halt after producing, as its only output, the ith digit of
the decimal expansion of x. The time required for the computation can be arbitrarily
large but must be finite. Thus, the number x = .x₀x₁x₂... is computable in the
sense that the program P can be used to determine x to an arbitrary precision, or to
produce any digit of the expansion of x. A number x ∈ (0, 1) is noncomputable
if it is not computable. The following procedure computes the digits of the repeating
decimal .514141414...

procedure COMP(i):
if i = 1 then return 5
else
if i mod 2 = 0 then return 1
else return 4
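
For readers who want to run the procedure, here is a direct transcription into Python. It is a sketch that keeps COMP's convention of numbering the digits from 1; the helper `expansion` is an addition made here to print a prefix of the decimal.

```python
def comp(i):
    """Return the i-th digit (counting from 1, as COMP does) of .514141414..."""
    if i == 1:
        return 5
    return 1 if i % 2 == 0 else 4

def expansion(digit, n):
    """Assemble the first n digits produced by a digit procedure into a decimal string."""
    return "." + "".join(str(digit(i)) for i in range(1, n + 1))

print(expansion(comp, 9))   # .514141414
```

Any digit of the number can be obtained in finite time by calling `comp`, which is all that the definition of a computable number requires.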

We now show that there exist noncomputable numbers in the open interval
(0, 1). The proof uses a cardinality argument and is nonconstructive. The following
sets will be used:
Σ, the ALGOL character set,
A, the set of all ALGOL programs,
C, the set of ALGOL programs which compute some number in (0, 1),
S, the numbers in (0, 1) which are computed by some ALGOL program.
Since Σ is a finite set, the set of nonempty strings over the alphabet Σ has cardinality
ℵ₀, i.e., |Σ+| = ℵ₀. Since any ALGOL program is a finite string over Σ,
|A| ≤ |Σ+|.
Since C is a proper subset of A, |C| ≤ |A|. Any program P can compute the digits
of at most one element of S, but different programs might compute the digits of the
same number. It follows that |S| ≤ |C|. Thus, we have
|S| ≤ |C| ≤ |A| ≤ ℵ₀.
But in Section 6.2, we showed that |(0, 1)| = c, and in Theorem 6.3.5 we showed
ℵ₀ < c. Hence |S| < |(0, 1)|, i.e., some of the numbers in (0, 1) are not computable. #

We have established that the cardinality of the continuum is greater than
countably infinite and that countably infinite is greater than finite. Might there be
other cardinal numbers that lie between those that we have considered? For example,
is it possible that there is an infinite set which has cardinality less than ℵ₀?
The next theorem gives a negative answer; it establishes that ℵ₀ is the smallest
infinite cardinal number.

Theorem 6.3.6: If A is an infinite set, then ℵ₀ ≤ |A|.

Proof: By Theorem 6.2.2, if A is infinite, then A contains a countably infinite
subset A′. Since the map
f: A′ → A,
f(x) = x for x ∈ A′,
is an injection of A′ into A, it follows that |A′| ≤ |A|, and since |A′| = ℵ₀, we
conclude ℵ₀ ≤ |A|. ∎

Is it possible that there is an infinite set whose cardinality is strictly greater
than ℵ₀ and strictly less than c? The assertion that no such cardinal number exists
is known as the continuum hypothesis. It has been known for some time that the
continuum hypothesis is consistent with the axioms of set theory. In 1963, Paul
Cohen showed that the negation of the continuum hypothesis is also consistent
with the axioms of set theory. As a consequence, one can (at least abstractly) deal
with a mathematical universe in which the hypothesis does or does not hold. For
our purposes, we are only interested in the fact that acceptance or rejection of the
hypothesis has implications for proof techniques. For example, suppose we wish
to prove that a given set A has cardinality c. If we accept the continuum hypothesis,
then it suffices to show that
(i) |A| ≤ c, and
(ii) |A| > ℵ₀.
However, if we reject the hypothesis, then the above approach does not yield the
conclusion we seek since it might happen that ℵ₀ < |A| < c. We will avoid using
the hypothesis.
From the next theorem, it follows that there is at least a countably infinite
set of infinite cardinal numbers, and hence, there is no largest cardinal number
and no largest set.

Theorem 6.3.7 (Cantor): Let A be a set. Then |A| < |𝒫(A)|.

Proof: We first show that |A| ≤ |𝒫(A)| by noting that the following function
is injective.
f: A → 𝒫(A);
f(a) = {a}.
Next we show that |A| ≠ |𝒫(A)|. Let g be an arbitrary function,
g: A → 𝒫(A).
We will show that g is not surjective and hence not bijective. The function g maps
each element of A to a subset of A; an element x may or may not be in the subset
g(x). The set S ⊆ A is defined as follows:
S = {x | x ∉ g(x)}.
Now S is a subset of A, but g(a) ≠ S for any a ∈ A. For if g(a) = S, then
a ∈ S ⇔ a ∈ {x | x ∉ g(x)}    by definition of S,
      ⇔ a ∉ g(a)              by application of the predicate which defines S,
      ⇔ a ∉ S                 by the assumption that g(a) = S.
Since this is a contradiction, the assumption that g(a) = S is false. Since a was
arbitrary, it follows that g is not surjective; and hence, not bijective. Since g
was an arbitrary function, this establishes that no bijection exists and therefore
|A| ≠ |𝒫(A)|. ∎
Using the previous theorem, we can construct a countably infinite set of infinite
cardinal numbers, each of which is smaller than the one which follows:
|N| < |𝒫(N)| < |𝒫(𝒫(N))| < ···
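
For a small finite set the argument of Theorem 6.3.7 can be checked exhaustively. The Python sketch below, with helper names chosen here for illustration, enumerates every function g from A = {0, 1, 2} to 𝒫(A) and confirms that the set S = {x | x ∉ g(x)} is never among the values of g.

```python
from itertools import chain, combinations, product

def power_set(A):
    """All subsets of the finite set A, as frozensets."""
    items = sorted(A)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))]

def diagonal_set(A, g):
    """The set S = {x in A | x not in g(x)} from the proof."""
    return frozenset(x for x in A if x not in g[x])

A = {0, 1, 2}
subsets = power_set(A)
elements = sorted(A)
missed_every_time = True
for images in product(subsets, repeat=len(elements)):   # all 8**3 maps g: A -> P(A)
    g = dict(zip(elements, images))
    if diagonal_set(A, g) in g.values():
        missed_every_time = False
print(missed_every_time)   # True
```

The exhaustive check only covers one finite set, of course; the proof itself is what shows the same failure of surjectivity for every set A.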
Problems: Section 6.3

1. Prove that if A′ ⊆ A, then |A′| ≤ |A|.
2. Prove that if |A| ≤ |B| and |C| = |A|, then |C| ≤ |B|.
3. Prove that if there exists a surjection from A to B, then |B| ≤ |A|.
4. If A ⊂ B, does it follow that |A| < |B|? Prove your assertion.
5. Prove that if A is finite and B is infinite, then |A| < |B|.
6. Prove that if A is infinite and |A| ≤ |B|, then B is infinite.
7. Show that every infinite subset of a countable set is countable.
8. Prove Theorem 6.3.1.
9. Prove Theorem 6.3.4.
10. Find the cardinality of each of the following sets. Prove your assertion.
(a) Q, the set of rational numbers
(b) [0, 1] × [0, 1] (Hint: Interleave the representations of x and y in the pair
<x, y>.)
(c) Q^N
(d) 𝒫(Q)
(e) R − Q
(f) R × R
11. Let π₁ and π₂ be partitions of A such that π₁ refines π₂. Prove that |π₂| ≤ |π₁|.
12. Denote |𝒫([0, 1])| by 2^c. Find examples of other sets which have cardinality 2^c.
13. Prove or disprove each of the following:
(a) |A| = |B| ⇒ |𝒫(A)| = |𝒫(B)|
(b) (|A| ≤ |B| ∧ |C| ≤ |D|) ⇒ |A^C| ≤ |B^D|
(c) (|A| ≤ |B| ∧ |C| = |D|) ⇒ |A × C| ≤ |B × D|
(d) (|A| ≤ |B| ∧ |C| ≤ |D|) ⇒ |A ∪ C| ≤ |B ∪ D|
14. (a) Prove that there exists a noncomputable number between any two rational
numbers in [0, 1].
(b) Show that all rational numbers in [0, 1] are computable.

6.4 CARDINAL ARITHMETIC

Previous sections have described the cardinal numbers as well as the order relations
≤ and <. We can now define an arithmetic for cardinal numbers. The arithmetic
is a generalization of the familiar finite arithmetic and includes the operations
of addition, multiplication, and exponentiation.
We will present some of the fundamental properties of cardinal arithmetic
but will prove only a few of our assertions. In some cases, proofs are most naturally
given using ordinal numbers, which we have not developed but which include the
cardinal numbers as a proper subset. Consequently, although we quote a set of
theorems intended to illustrate the characteristics of the arithmetic, in many cases
the proofs are beyond the scope of this text and will be omitted.

Definition 6.4.1: Let a and b be cardinal numbers and let A and B be disjoint
sets such that |A| = a and |B| = b. The sum of a and b is defined to be
a + b = |A ∪ B|.

The following is easily proven using the preceding definition and the properties
of set union.

Theorem 6.4.1: Addition of cardinal numbers is commutative and associative.

The following theorem asserts that the order relations ≤ and < are preserved
by the operation of addition.

Theorem 6.4.2: Let a, b, d, and e be cardinal numbers. Then
(a) if a ≤ b and d ≤ e, then a + d ≤ b + e.
(b) if a < b and d < e, then a + d < b + e.

Proof:
(a) Let A, B, D, and E be sets such that |A| = a, |B| = b, |D| = d, |E| = e,
and (A ∪ B) ∩ (D ∪ E) = ∅. Since a ≤ b, there is an injection f: A → B, and
since d ≤ e, there is an injection g: D → E. Define the map h as follows:
h: A ∪ D → B ∪ E,
h|A = f,
h|D = g.
Since A ∩ D = ∅, the map is well-defined. Since B ∩ E = ∅ and both f and g
are injective, it follows that h is injective. Hence, |A ∪ D| ≤ |B ∪ E| and therefore
a + d ≤ b + e.
The proof of (b) is beyond our scope and will not be given. ∎
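
The map h in the proof of part (a) simply glues f and g together. A minimal Python sketch, assuming finite maps stored as dictionaries and using names introduced here for illustration:

```python
def glue(f, g):
    """Combine maps with disjoint domains (given as dicts) into a single map h."""
    assert not set(f) & set(g), "domains must be disjoint for h to be well-defined"
    return {**f, **g}

def is_injective(mapping):
    """A finite map is injective when no value is hit twice."""
    return len(set(mapping.values())) == len(mapping)

f = {"a1": "b1", "a2": "b3"}   # an injection from A = {a1, a2} into B = {b1, b2, b3}
g = {"d1": "e2"}               # an injection from D = {d1} into E = {e1, e2}
h = glue(f, g)
print(is_injective(h))         # True, because B and E are disjoint
```

If B and E were allowed to overlap, f and g could send different elements to the same value and h would no longer be injective; this is why the proof requires both pairs of sets to be disjoint.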

The following theorem illustrates one way in which arithmetic involving infi-
nite cardinal numbers differs from the familiar arithmetic.

Theorem 6.4.3: Let a and b be cardinal numbers such that a is an infinite
cardinal number and b ≤ a. Then a + b = a.

We will not prove the theorem; however, the special cases of a = ℵ₀ and a = c
follow from our previous work.

Example
We show that c + ℵ₀ = c. Let A = {x | x ∈ R and x > 1}, and let
B = {1/(n + 2) | n ∈ N}. Then |A| = c, |B| = ℵ₀ and A ∩ B = ∅. Furthermore,
A ∪ B ⊂ R; hence, |A ∪ B| ≤ c. But |A| = c, so |A ∪ B| ≥ c. Hence |A ∪ B| =
c + ℵ₀ = c. #

We now consider multiplication of cardinal numbers, which is defined using
the cartesian product.

Definition 6.4.2: Let a and b be cardinal numbers, and let A and B be sets
such that |A| = a and |B| = b. Then the product of a and b, denoted a·b or simply
ab, is defined as follows:
a·b = |A × B|.
The proof of the following theorem is left as an exercise.

Theorem 6.4.4: Multiplication of cardinal numbers is commutative and
associative, and it distributes over addition, i.e., a(b + d) = ab + ad.

Theorem 6.4.5: The operation of multiplication preserves the order relations
≤ and <; i.e., for all cardinal numbers a, b, d, and e,
(a) if a ≤ b and d ≤ e, then ad ≤ be;
(b) if a < b and d < e, then ad < be.
The proof of (a) is an exercise; the proof of (b) is beyond our scope.

Theorem 6.4.6: Let a and b be cardinal numbers such that a is an infinite
cardinal number, b ≠ 0, and a ≥ b. Then ab = a.

We will not prove the general statement of the theorem, but the special cases
where a = c and b = ℵ₀ can be shown on the basis of our earlier work.

Example
We show that ℵ₀·c = c. Let A = N and B = (0, 1); then |A| = ℵ₀ and
|B| = c. We must show |A × B| = c. Define a function f from A × B to the
positive real numbers:
f: A × B → {x | x ∈ R+},
f(n, x) = n + x.
Then f is injective, and since |R+| = c, it follows that |A × B| ≤ c. Furthermore,
the map
g: (0, 1) → A × B,
g(x) = <0, x>,
is injective and establishes that c ≤ |A × B|. Hence |A × B| = c. #

The last operation we will discuss is exponentiation.

Definition 6.4.3: Let a and b be cardinal numbers, and let A and B be sets
such that |A| = a and |B| = b. Then a to the power b, denoted a^b, is defined as
a^b = |A^B|.

It is an immediate consequence of this definition that |A^B| = |A|^|B|.
The most important properties of exponentiation are known as laws of
exponents; these properties are characterized by the next theorem.

Theorem 6.4.7: Let a, b, and d be cardinal numbers. Then
(a) a^(b+d) = a^b · a^d
(b) (ab)^d = a^d · b^d
(c) (a^b)^d = a^(bd)

Proof of (a): The proof consists of showing that a bijection exists between
sets of functions.
Let A, B, and D be sets such that |A| = a, |B| = b, and |D| = d, where
B ∩ D = ∅. Let g: B → A and h: D → A. Because B and D are disjoint, there
exists a map f: B ∪ D → A such that f is an extension of both g and h. Thus we
can define a function α as follows:
α: A^B × A^D → A^(B∪D),
α(<g, h>) = f where f|B = g and f|D = h.
The function α is an injection and hence |A^B × A^D| ≤ |A^(B∪D)|. Furthermore, we
can define a function β
β: A^(B∪D) → A^B × A^D,
β(f) = <f|B, f|D>,
which is also an injection. (It is easy to show that β = α⁻¹.) Thus
|A^(B∪D)| ≤ |A^B × A^D|,
and we conclude that |A^(B∪D)| = |A^B × A^D|. ∎
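
For finite sets the maps α and β of the proof can be written out and tested directly. The Python sketch below represents each function as a dictionary; the particular sets A, B, and D and the helper `all_functions` are assumptions made for the illustration.

```python
from itertools import product

def all_functions(domain, codomain):
    """Every function domain -> codomain, each represented as a dict."""
    domain = sorted(domain)
    return [dict(zip(domain, values))
            for values in product(sorted(codomain), repeat=len(domain))]

def alpha(g, h):
    """alpha(<g, h>) = the common extension f of g and h (domains assumed disjoint)."""
    return {**g, **h}

def beta(f, B, D):
    """beta(f) = <f restricted to B, f restricted to D>."""
    return ({x: f[x] for x in B}, {x: f[x] for x in D})

A, B, D = {0, 1, 2}, {"p", "q"}, {"r"}
pairs = [(g, h) for g in all_functions(B, A) for h in all_functions(D, A)]

# beta undoes alpha on every pair, so alpha is injective ...
assert all(beta(alpha(g, h), B, D) == (g, h) for g, h in pairs)
# ... and the two sets of functions have the same size: 3**2 * 3**1 == 3**(2 + 1).
assert len(pairs) == len(all_functions(B | D, A)) == 27
```

In the finite case the identity reduces to the familiar law of exponents for integers, which is the sense in which cardinal exponentiation generalizes ordinary arithmetic.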



Exponentiation preserves the order relations ≤ and < in the expected fashion.

Theorem 6.4.8: Let a, b, d, and e be cardinal numbers. Then
(a) if a ≤ b and d ≤ e, then a^d ≤ b^e.
(b) if a < b and d < e, then a^d < b^e.

Once again, the proof of part (b) is beyond our scope. We leave the proof of part (a)
as an exercise.

Problems: Section 6.4

1. Determine the values of the following expressions. The letter n denotes an arbitrary
member of N.
(a) n + ℵ₀ (b) n + c (c) ℵ₀ + ℵ₀
(d) c + c (e) n·ℵ₀ (f) n·c
(g) No-No (h) cc (i) O¥
(j) Is (k) 2% @
(m) Nb (n) NF (o) ©?
(p) ¢3 (q) ¢ + (o-e + 3%)
2. Find the cardinality of each of the following sets.
(a) R ∪ R^2
(b) S × Σ* where |S| = n for n ∈ N.
(c) The set of all m × n matrices with components in R.
(d) The set of all n component vectors with integer components.
(e) The set of all functions from Σ* to N.
(f) The set of all functions from I × I to I.
(g) The set of m × n matrices with rational components.
3. Prove Theorem 6.4.1.
4. We have not defined an operation of subtraction for cardinal numbers. Show that
the following definition is unsatisfactory because the operation is not well defined.
“Definition”: Let A and B be sets such that |A| = a, |B| = b, and B ⊆ A. Then
a − b = |A − B|.
5. Let a, b, and d be cardinal numbers.
(a) Prove that if a ≤ b, then a + d ≤ b + d.
(b) Show by counterexample that a < b does not imply that a + d < b + d.
(c) Prove that if a ≤ b, then ad ≤ bd.
(d) Show that a < b does not imply ad < bd.
6. Prove Theorem 6.4.4.
7. Prove part (a) of Theorem 6.4.5.
8. Show that for any integer n ≥ 2, n^ℵ₀ = c.
9. Prove part (a) of Theorem 6.4.8.


Suggestions for Further Reading

Halmos [1960] develops the ordinal and cardinal numbers, along with their
arithmetics. More extensive treatments of these topics are given in the books by
Stoll [1963] and Suppes [1960]. Cohen [1966] discusses the role of the continuum
hypothesis in set theory. Vilenkin [1968] presents many of the concepts of this
chapter in an informal and entertaining way.
7

ALGEBRAS

7.0 INTRODUCTION
In Chapter 0, mathematical models were described as consisting of three compo-
nents: a phenomenon or process of the real world which we wish to investigate,
a mathematical structure, and a description of the way in which the mathematical
structure represents the real world process. To be useful, a mathematical model
must have a structure whose operations and relations reflect the real world in a
satisfactory way. Choosing a mathematical structure therefore requires understand-
ing how properties can be characterized mathematically and how some properties
imply others. A familiarity with the concepts of mathematical structures will
facilitate the understanding of abstract characterizations of new models and pro-
vide a basis for the construction of new models.
The mathematical structure of a model is often presented implicitly; in this
case there is no precise specification of the mathematical structure being used.
This usually causes no difficulty, because in most cases the structure is a familiar
one and an obvious choice. In this chapter, however, it will be useful to specify in
detail each mathematical structure we consider. In addition, we will develop a
few basic properties of some of these structures, emphasizing those properties which
are useful for the models which interest us.
The mathematical structures we will investigate are algebras, sometimes called
algebraic systems or algebraic structures, and their study is often referred to as
“modern algebra.” These structures have been used in computer science for such
purposes as to describe the functions computable by classes of machines, to inves-
tigate the complexity of arithmetic computations, to characterize abstract data
structures, and as a basis for programming language semantics. Unfortunately, the
formalisms used in various applications are often quite different from one another,
although the fundamental concepts and techniques are the same. We will develop
only some of the most basic topics of this area, but at the end of the chapter we
will describe ways in which they can be augmented to treat various applications.


7.1 THE STRUCTURE OF ALGEBRAS

It is possible to give a general definition of an algebra, but such a definition would
take us too deep into mathematical formalism. Instead, we will describe the concept
informally and then illustrate it with a number of examples.
An algebra is characterized by specifying the following three components:
1. a set, called the carrier of the algebra,
2. operations defined on the carrier, and
3. distinguished elements of the carrier, called the constants of the algebra.
The carrier is the set of mathematical objects we wish to manipulate, such as
integers, real numbers or a set of character strings; we will represent the carrier
of an algebra by S. An operation defined on the carrier is a map from S^m to S.
The value of m is called the “arity” of the operation. If an operation is from S = S^1
to S, such as the operation which takes an integer x to −x or a real number y to
its absolute value |y|, the operation is called a unary operation. Operations from
S^2 to S, such as addition or multiplication of numbers, are called binary operations.
Ternary operations are functions from S^3 to S; for example, if the carrier is a set
of numbers, the construct if x ≠ 0 then y else z can be defined as a ternary operation
with operands x, y, and z. The constants of an algebra are distinguished elements
of the carrier; these elements usually have properties of special importance.
Algebras are often formally presented as n-tuples, where the entries of the
n-tuple specify the carrier, the operations, and the constants, in that order.†

Examples
(a) The integers with the binary operation of addition and the constant 0 can be
described as an algebra in the following way.
1. The carrier is the set I = {..., −3, −2, −1, 0, 1, 2, ...}.
2. There is a single operation, addition (denoted “+”), from I^2 to I.
3. The element 0 is a constant.
Alternatively, this algebra can be presented as the triple <I, +, 0>.
(b) The real numbers R with addition, multiplication and unary minus can be
described as an algebra as follows:
1. The carrier is R, the set of real numbers.
2. There are two operations (“+” and “·”) from R^2 to R and one (“−”)
from R to R.
3. The elements 0 and 1 are constants.
This algebra can be denoted by <R, +, ·, −, 0, 1>. #

The two examples above are of specific and familiar structures. To specify them
precisely, we would present them as n-tuples by stating, for example, “Let
A = <I, +, 0> be the integers under addition.” It is also common to denote an
algebra by its carrier; thus the statement “Let I be the integers under addition”
would refer to the structure <I, +>, or perhaps <I, +, 0>.

†Note that the carrier of an algebra may be empty and the operations and constants may not
all be distinct. Our examples, however, will have nonempty carriers, and the operations and
constants will generally be distinct.

Frequently we do not wish to specify a single algebra but instead a class of
algebras such that each member of the class has certain characteristics. To provide
a mechanism for this, we first introduce the concept of the signature, or species of
an algebra. Two algebras have the same signature (or are of the same species) if
they have corresponding operations of each arity and corresponding constants.
In other words, two algebras have the same signature if their n-tuples (consisting
of carrier, operations, and constants) include the same number of operations and
constants and the arities of corresponding operations are the same.

Examples
(a) The algebras <N, ·, 0> and <I, −, 0> have the same signature, since each has
a single binary operation and a single constant.
(b) The structures <R, +, ·, 1, 0> and <𝒫(S), ∪, ∩, S, ∅> have the same signature.
(c) The algebras <I, +, 0> and <I, +> do not have the same signature because
the number of constants is not the same. #

Two algebras can have the same signature but not be related in any substantive
way. In order to prove useful theorems about classes of algebras, we generally
need to consider properties in addition to those implied by signature. We will
treat only properties specified by axioms, where each axiom is an equation written
in terms of the elements of the carrier and the operations of the algebra. A set of
axioms, together with a signature, specifies a class of algebras called a variety;
algebras which have the same signature and which obey the same set of axioms are
said to be of the same variety. Investigations of algebras are generally concerned
with particular varieties; the theorems that are proved are based on the axioms of
the variety, and the results hold for all algebras in the given variety.

Examples
(a) Consider the variety of algebras with the same signature as <I, +, 0) and the
following axioms:
Gi) x+y=ytx,
Gi) @+y4+z2=x4+04+9,
(iii) x +0=-x.
Then <R, +, 0>, <2*, concatenation, A>, <P(S), U, 6>, <P(S), A, S>, and
<I, -, 1 are all members of this variety, and theorems proved about this variety
will hold for these specific algebras.
(b) Consider the variety of algebras with the same signature as <R, +, - »—,9, 1D
(where “—” is a unary operation) and the following axioms:
(i) x + y = y + x,
(ii) x·y = y·x,
(iii) (x + y) + z = x + (y + z),
(iv) (x·y)·z = x·(y·z),
(v) x·(y + z) = (x·y) + (x·z),
(vi) x + (−x) = 0,
(vii) x + 0 = x,
(viii) x·1 = x.
Then <I, +, ·, −, 0, 1> and <Q, +, ·, −, 0, 1> are algebras of the same
variety, but <P(S), ∪, ∩, ~, ∅, S>, where ~ denotes set complementation, is
not because axiom (vi) does not hold for this algebra.
(c) Consider the variety of algebras with the signature <S, o, c> (where o is a
binary operation and c is a constant) and the following axioms:
a o c = a,
c o a = a.
Any theorems we prove for this variety will hold for the algebras <I, +, 0>,
<R, ·, 1> and <Σ*, concatenation, Λ>. Not all these theorems will hold for the
algebra <I, −, 0> (where "−" denotes subtraction), because 0 − 1 ≠ 1, thus
violating the second axiom. #

For the remainder of this chapter, rather than deal with algebras with arbitrary
signatures, we will usually treat an arbitrary algebra such as A = <S, o, Δ, k>,
where o is a binary operation, Δ is a unary operation, and k denotes a constant.
This will simplify the presentation by eliminating the need to treat arbitrary num-
bers of operations and constants and arbitrary arities of operations, but the defini-
tions and concepts can be extended to include algebras with other signatures as well.
Before we introduce the concept of a subalgebra, we must first define the notion
of a set of elements being closed under an operation.

Definition 7.1.1: Let o and Δ be binary and unary operations on a set T,
and let T' be a subset of T. Then T' is closed with respect to o if a, b ∈ T' implies
a o b ∈ T'. The subset T' is closed with respect to Δ if a ∈ T' implies Δa ∈ T'.

Examples
(a) Consider the set of natural numbers N, and let T' = {x | 0 < x < 10}. The
set T' is not closed with respect to the operation +, since 7 + 7 = 14 and
14 ∉ T'. However, T' is closed with respect to the operation max, where the
operation is defined as max(x, y) = x if x > y, otherwise max(x, y) = y.
(b) Since each operation of an algebra with carrier S is defined as a function
from S^n to S, it follows that the carrier of an algebra is closed under all its
operations. #
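For a finite subset, Definition 7.1.1 can be checked by exhaustive search. The following Python sketch is only an illustration: the function names are hypothetical, and T' is read as the integers strictly between 0 and 10.

```python
from typing import Callable, Iterable


def is_closed_binary(subset: Iterable[int], op: Callable[[int, int], int]) -> bool:
    """Return True if op(a, b) is in subset for all a, b in subset."""
    elems = set(subset)
    return all(op(a, b) in elems for a in elems for b in elems)


def is_closed_unary(subset: Iterable[int], op: Callable[[int], int]) -> bool:
    """Return True if op(a) is in subset for all a in subset."""
    elems = set(subset)
    return all(op(a) in elems for a in elems)


# T' = {x | 0 < x < 10}, taken here to be the integers 1..9 (an assumption).
T_PRIME = range(1, 10)

closed_under_sum = is_closed_binary(T_PRIME, lambda x, y: x + y)  # 7 + 7 = 14 leaves T'
closed_under_max = is_closed_binary(T_PRIME, max)
```

Under these assumptions closed_under_sum is False and closed_under_max is True, matching example (a).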

If A is an algebra, a subalgebra of A is an algebra with the same signature
which is "contained" in A.

Definition 7.1.2: Let A = <S, o, Δ, k> and A' = <S', o', Δ', k'> be algebras.
Then A' is a subalgebra of A if
(i) S' ⊆ S;
(ii) a o' b = a o b for all a, b ∈ S';
(iii) Δ'a = Δa for all a ∈ S';
(iv) k' = k.

If A' is a subalgebra of A, then A' has the same signature as A and obeys the
same axioms. Furthermore, the carrier of A' is a subset of the carrier of A which
is closed under all the operations of A and contains all the constants of A. The
largest possible subalgebra of A is A itself; this subalgebra always exists. If the set
of constants of A is closed under the operations of A, then this is the carrier of the
smallest subalgebra of A.

Examples
(a) Let E denote the set of even integers. Then <E, +, 0> is a subalgebra of
<I, +, 0>.
(b) Let · denote multiplication. Then <[0, 1], ·> is a subalgebra of <R, ·>.
(c) If M denotes the set of odd integers, then <M, ·, 1> is a subalgebra of <I, ·, 1>.
But <M, +> is not a subalgebra of <I, +> because the odd integers are not
closed under addition; e.g., 1 + 1 = 2. #

The constants of an algebra are usually distinguished because of their special
properties relative to one or more of the operations of the algebra. The following
two definitions describe the most important of these properties for binary
operations.

Definition 7.1.3: Let o be a binary operation on S. An element 1 ∈ S is an
identity (or unit) for the operation o if for every x ∈ S,
1 o x = x o 1 = x.
An element 0 ∈ S is a zero for the operation o if for every x ∈ S,
0 o x = x o 0 = 0.

When no confusion can result, the operation may not be specified, and we will
speak of an identity, or an identity element, and a zero, or a zero element.

Examples
(a) The algebra <I, ·, 1, 0>, where · denotes multiplication, has an identity 1 and
a zero 0.
(b) The algebra <I, +> has an identity 0 but no zero element.
(c) The algebra <N, max> has an identity 0 but no zero element.
(d) The algebra <N, min> has a zero element 0 but no identity element.
(e) Let T be the set of integers between m and n, where m < n and both m and n
are included in T. Then <T, max> is an algebra with an identity m and a zero n.
(f) Consider the algebra <R, +, ·>. The element 0 is an identity for +, but there
are no zeroes for this operation. The element 1 is an identity and 0 is a zero for
the operation ·. #
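When the carrier is finite, identities and zeroes in the sense of Definition 7.1.3 can be found by brute force. The sketch below is illustrative only; the helper names and the choice m = 3, n = 7 are assumptions made for the example.

```python
def find_identity(carrier, op):
    """Return the two-sided identity for op on carrier, or None if there is none."""
    elems = list(carrier)
    for e in elems:
        if all(op(e, x) == x and op(x, e) == x for x in elems):
            return e
    return None


def find_zero(carrier, op):
    """Return the two-sided zero for op on carrier, or None if there is none."""
    elems = list(carrier)
    for z in elems:
        if all(op(z, x) == z and op(x, z) == z for x in elems):
            return z
    return None


# Example (e) with m = 3 and n = 7 (values chosen only for illustration).
T = range(3, 8)
identity_for_max = find_identity(T, max)  # the lower end m
zero_for_max = find_zero(T, max)          # the upper end n
```

Because 0 can itself be an identity or a zero, callers should compare the result with None rather than test its truth value.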

Identities and zeros are sometimes called two-sided identities and two-sided
zeroes since they have the same effect when used on either the right or left. In
contrast, the following definitions characterize one-sided identities and one-sided
zeroes.

Definition 7.1.4: Let o be a binary operation on S. An element 1_l is a left
identity for the operation o if for every x ∈ S,
1_l o x = x.
An element 0_l is a left zero for the operation o if for every x ∈ S,
0_l o x = 0_l.

A right identity 1_r and a right zero 0_r can be defined in an analogous manner.

Example
Let A = <S, o> where S = {a, b, c} and o is a binary operation defined by the
following operation table. (The entry in the row labeled x and the column labeled
y is the value of x o y.)

o | a  b  c
--+--------
a | a  b  b
b | a  b  c
c | a  b  a

Then both a and b are right zeroes but neither is a left zero. The operation o is
neither associative nor commutative. #
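The claims about this table can be confirmed by exhaustive search. The following Python sketch encodes the table as a nested dictionary (the representation and variable names are choices made here, not part of the text).

```python
# TABLE[x][y] is the value of x o y, copied from the operation table above.
TABLE = {
    "a": {"a": "a", "b": "b", "c": "b"},
    "b": {"a": "a", "b": "b", "c": "c"},
    "c": {"a": "a", "b": "b", "c": "a"},
}
S = list(TABLE)


def op(x, y):
    return TABLE[x][y]


# z is a right zero if x o z = z for every x, and a left zero if z o x = z for every x.
right_zeroes = [z for z in S if all(op(x, z) == z for x in S)]  # ['a', 'b']
left_zeroes = [z for z in S if all(op(z, x) == z for x in S)]   # []

is_associative = all(
    op(op(x, y), z) == op(x, op(y, z)) for x in S for y in S for z in S
)  # False: (a o c) o c = c but a o (c o c) = a
is_commutative = all(op(x, y) == op(y, x) for x in S for y in S)  # False: a o b != b o a
```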

The following theorems establish the most useful properties of identities and
zeroes.

Theorem 7.1.1: Let o be a binary operation on S with left identity 1_l and
right identity 1_r. Then 1_l = 1_r, and this element is a two-sided identity.
Proof: Since 1_l and 1_r are left and right identities,
1_l = 1_l o 1_r = 1_r. ∎

Theorem 7.1.2: Let o be a binary operation on S with left zero 0_l and right
zero 0_r. Then 0_l = 0_r, and this element is a two-sided zero.
The proof is similar to that of Theorem 7.1.1. The above theorems have the
following immediate consequence:

Corollary 7.1.2: A two-sided identity (or zero) for a binary operation is
unique.

If an identity exists in an algebra, then inverses may also exist.

Definition 7.1.5: Let o be a binary operation on S and 1 an identity for the
operation o. If x o y = 1, then x is a left inverse of y and y is a right inverse of x
with respect to the operation o. If both x o y = 1 and y o x = 1, then x is an
inverse of y (or a two-sided inverse of y) with respect to the operation o.

Note that if x is an inverse of y, then y is an inverse of x.

Examples
(a) The algebra <I, +> has an identity 0 and every element x ∈ I has an inverse
with respect to the operation +; the inverse of x is denoted −x:
x + (−x) = 0.
(b) The algebra <N, +> has an identity 0 which is the only element that has an
inverse.
(c) In the algebra <I, ·>, only the identity 1 has an inverse, but in <R, ·> all
elements except the zero element 0 have an inverse.
(d) Let T be the set of integers between m and n, where m < n and m and n are
included in T. Then <T, max> has an identity m, but only m has an inverse.
(e) Consider the set F of all functions on a set A under the operation of function
composition. Then 1_A is an identity. By Theorem 4.2.8, every surjection has a
right inverse, every injection has a left inverse, and every bijection has a
two-sided inverse. Note that one-sided inverses may not be unique.
(f) Let N_k be the first k natural numbers, where k > 0:
N_k = {0, 1, 2, ..., k − 1}.
Define +_k to be addition mod k; for every x, y ∈ N_k,
x +_k y = x + y       if x + y < k,
        = x + y − k   if x + y ≥ k.
Then +_k is an associative binary operation with an identity 0. Every element
of N_k has an inverse; the inverse of 0 is 0 and the inverse of every nonzero
element x is k − x.
(g) Let N_k be the first k natural numbers, where k > 2, and define multiplication
mod k as follows:
x ·_k y = z, where z ∈ N_k and xy − z = nk for some n ∈ N.
Then 1 is an identity for the operation. An element x ∈ N_k has an inverse in
N_k only if x and k have no nontrivial divisors in common, i.e., only if x and
k are relatively prime. #
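Example (g) can be explored numerically. The sketch below (function names are hypothetical) finds, by brute force, every element of N_k that has an inverse under multiplication mod k and compares the result with the elements relatively prime to k. It tests particular values of k only and is not a proof.

```python
from math import gcd


def inverses_mod(k):
    """Map each x in N_k with an inverse under multiplication mod k to that inverse."""
    return {x: y for x in range(k) for y in range(k) if (x * y) % k == 1}


def coprime_to(k):
    """List the elements of N_k that share no nontrivial divisor with k."""
    return [x for x in range(k) if gcd(x, k) == 1]


table = inverses_mod(10)  # {1: 1, 3: 7, 7: 3, 9: 9}
agrees = all(sorted(inverses_mod(k)) == coprime_to(k) for k in range(3, 40))
```

For every k tried here the two sets coincide, which is consistent with the "only if" stated in the example.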

Theorem 7.1.3: If an element has both a left and a right inverse with respect
to an associative operation, then the left and right inverse elements are equal.
Proof: Let 1 be an identity for the operation o, and let x be an element with
a left inverse w and a right inverse y. Then
w o x = x o y = 1.
By associativity of the operation o, it follows that
w = w o 1 = w o (x o y) = (w o x) o y = 1 o y = y. ∎

Problems: Section 7.1

1. Show that if o is a commutative operation defined on a set S, then every one-sided
identity is a two-sided identity.
2. Let the universe be the integers I. Fill in the following table with Y (yes) or N (no)
according to whether the set listed in the left column is closed under the operation
listed in the top row. Interpret the operations max and min as binary operations. Note
that the last two columns specify unary operations.

                        sum  product  difference  abs               unary  unary
                                                                    minus  absval
                        +    ·        −           |x−y|  max  min   −      |x|

(a) I
(b) N
(c) {x | 0 < x < 10}
(dq) {x|-Sxx<3}
(e) {x | −10 < x < 0}
(f) {2x|xe

3. Let the universe be the real numbers R. Fill in the following table with Y (yes) or N
(no) according to whether the binary operations listed in the top row have the prop-
erties listed in the leftmost column.

                      sum  difference  product              abs
                      +    −           ·        max  min    |x−y|

(a) associative
(b) commutative
(c) identity exists
(d) zero exists

4. Prove Theorem 7.1.2.


5. Prove Corollary 7.1.2.

6. Consider the algebras <{a, b, c, d}, o> and <{a, b, c}, o>, where in each case o is
defined by one of the following operation tables:

(a) (b)

For each algebra,
(i) Is the operation commutative?
(ii) Is the operation associative?
(iii) Determine if there exists an identity with respect to the operation. If one exists,
which element is it?
(iv) If an identity element exists, determine which elements have inverses.
(v) Determine if there exists a zero with respect to the operation. If one exists,
which element is it?

7. Find examples of algebras with a single binary operation which have the properties
listed below. In each case, choose your algebra to have a nonempty carrier as small
as possible if such an algebra exists. Your answer can be given as an operation table.
(a) An identity element exists.
(b) A zero element exists.
(c) An identity element and a zero element exist.
(d) The carrier has more than one element. Both an identity element and a zero
element exist.
(e) An identity exists but not a zero.
(f) A zero exists but not an identity.
(g) The operation is not commutative.
(h) The operation is not associative.
(i) A left zero exists which is not a right zero.
(j) A right identity exists which is not a left identity.
(k) An identity exists and every element has an inverse.
(l) The carrier has more than one element. An identity exists, and every element
has a left inverse, but no element other than the identity has a right inverse.

8. Describe a variety with the signature <S, o>, where o is a binary operation, such
that for every algebra <T, ·> of this variety, if V ⊆ T, then <V, ·> is a subalgebra of
<T, ·>.

Programming Problem

Write a program to determine if a binary operation on a finite set of elements is
associative. The program should accept the operation table as input.

7.2 SOME VARIETIES OF ALGEBRAS

Many algebraic varieties are useful in various areas of computer science. We will
consider only four of the most important varieties: semigroups, monoids, groups,
and Boolean algebras. Semigroups and monoids find application in formal
languages and automata theory, groups are used in automata and coding theory,
and Boolean algebras are used in many aspects of information processing as well as
in switching theory. The utility of the structures is not limited to these areas, however;
all of them are used in many other areas of investigation. In this section we will
develop some of the properties of these varieties.

Semigroups

The following deceptively simple structure has been extensively studied, and
a rich theory has emerged.

Definition 7.2.1: A semigroup is an algebra with signature <S, o>, where o
is a binary associative operation.

The preceding definition establishes that the variety of semigroups consists of all
algebras with a single binary operation which satisfies the axiom of associativity:
a o (b o c) = (a o b) o c.
From Definition 7.1.2, it follows that if <S, o> is a semigroup and T is a subset of
S such that T is closed with respect to o, then <T, o> is a subalgebra of <S, o>;
we call <T, o> a subsemigroup of <S, o>. The use of the term "subsemigroup" to
denote a subalgebra of a semigroup is justified by the following theorem.

Theorem 7.2.1: If <S, o> is a semigroup and <T, o> is a subalgebra of
<S, o>, then <T, o> is a semigroup.
Proof: Since <T, o> is a subalgebra of <S, o>, the set T is closed under the
operation o. Since o is an associative operation on S, it is also associative when
restricted to T. Therefore <T, o> is an algebra with a binary operation which is
associative and hence <T, o> is a semigroup. ∎

Examples
(a) Let k ≥ 0 and S_k be the set of integers greater than or equal to k; S_k =
{x | x ∈ I ∧ x ≥ k}. Then <S_k, +> is a semigroup, where + denotes ordinary
addition, since the operation is associative and S_k is closed with respect to +.
Note that if k < 0, the set S_k is not closed under the operation of addition
and <S_k, +> is not an algebra.
(b) The algebras <I, −> and <R+, /> are not semigroups because the operations
of subtraction and division are not associative.
(c) If · denotes the operation of multiplication, the algebras <[0, 1], ·>, <(0, 1), ·>,
and <N, ·> are all semigroups. Moreover, they are all subsemigroups of
<R, ·>.
(d) Let Σ denote a finite nonempty alphabet. Then <Σ*, concatenation> and
<Σ+, concatenation> are semigroups.
(e) Let S = {a, b} and define the operation o so that both a and b are right zeroes:
a o a = b o a = a,
a o b = b o b = b.
The operation o on S is associative, since for any x, y, z ∈ S,
x o (y o z) = x o z = z = y o z = (x o y) o z.
The algebra <S, o> is a semigroup, called the right zero semigroup of two
elements.
(f) The algebras <S, max> and <S, min> are semigroups for any set S of real
numbers.
(g) Let R be a binary relation on a set S. Then <{R^n | n ∈ N}, composition> is
a semigroup. #
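Example (g) can be made concrete for a relation on a finite set. The Python sketch below is illustrative only: relations are represented as frozensets of pairs, R^0 is taken to be the identity relation on the carrier, and the function names are hypothetical.

```python
def compose(r, s):
    """Relational composition: (x, z) whenever (x, y) is in r and (y, z) is in s."""
    return frozenset((x, z) for (x, y) in r for (y2, z) in s if y == y2)


def powers(r, carrier):
    """Return the distinct powers R^0, R^1, R^2, ... of a relation on a finite set."""
    current = frozenset((x, x) for x in carrier)  # R^0, the identity relation
    seen = []
    while current not in seen:  # the sequence of powers is eventually periodic
        seen.append(current)
        current = compose(current, r)
    return seen


def closed_under_composition(relations):
    return all(compose(p, q) in relations for p in relations for q in relations)


cycle = frozenset({(0, 1), (1, 2), (2, 0)})  # R^3 is the identity relation
chain = frozenset({(0, 1), (1, 2)})          # R^3 is the empty relation

cycle_powers = powers(cycle, range(3))
chain_powers = powers(chain, range(3))
```

The set of powers is closed under composition because R^i composed with R^j is R^(i+j); associativity is inherited from composition of relations.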

Monoids

We next consider the variety of monoids. A monoid is essentially a semigroup
which has a two-sided identity element.

Definition 7.2.2: A monoid is an algebra with signature <S, o, 1>, where o is
a binary associative operation on S and 1 is a two-sided identity for the operation
o, i.e., the following axioms hold for all elements a, b, c ∈ S:
a o (b o c) = (a o b) o c,
a o 1 = a,
1 o a = a.
If <S, o, 1> is a monoid and T ⊆ S, 1 ∈ T, and T o T ⊆ T, then by Definition
7.1.2, <T, o, 1> is a subalgebra of <S, o, 1>; a subalgebra of a monoid is called a
submonoid. We leave it as an exercise to show that a submonoid is a monoid.

Examples
(a) The algebra <R, +, 0> is a monoid because + is associative and 0 is an identity
element for +. Both <I, +, 0> and <N, +, 0> are submonoids of <R, +, 0>.
(b) The algebras <I, ·, 1>, <N, ·, 1>, <I+, ·, 1> and <R, ·, 1> are all monoids.
(c) The algebras <I, ·, 0> and <I, +, 1> are not monoids because in each case the
constant is not an identity for the specified operation.
(d) If Σ is a finite nonempty alphabet, then <Σ*, concatenation, Λ> is a monoid.
If X ⊆ Σ* then <X*, concatenation, Λ> is a submonoid of
<Σ*, concatenation, Λ>.
(e) Let S be any subset of the real numbers which contains a lower bound, i.e.,
there is some m ∈ S such that m ≤ x for all x ∈ S. Then <S, max, m> is a
monoid. Similarly, if S contains an upper bound n, then x ∈ S ⇒ x ≤ n and
<S, min, n> is a monoid.
(f) The systems <N_k, +_k, 0> and <N_k, ·_k, 1> are monoids, where
N_k = {0, 1, 2, ..., k − 1}
and the operations +_k and ·_k are addition and multiplication mod k. #

If <S, o, a> is a monoid, then <S, o> is a semigroup; this is sometimes expressed
by the assertion that "every monoid is a semigroup." On the other hand, some
semigroups, such as <N, +>, have an identity, and some, such as <I+, +>, do
not. A semigroup <S, o> can always be converted into a monoid by "adjoining"
(i.e., adding) a new element whose behavior is defined to be that of an identity for
the operation o. Suppose 1 is an element not in S. (If necessary we can relabel the
elements of S so that 1 ∉ S.) We can extend the operation o to S ∪ {1} so that
for all x ∈ S ∪ {1}, x o 1 = 1 o x = x. Then <S ∪ {1}, o, 1> is a monoid. This
process is called "adjoining an identity" to the semigroup <S, o>. Note that even
if c was an identity of <S, o>, it will not be one for the monoid <S ∪ {1}, o, 1>,
since
c o 1 = 1 o c = c ≠ 1.
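The construction of adjoining an identity can be sketched in a few lines of Python. This is an illustration under stated assumptions: the new element is given the hypothetical label "e", which must not already belong to S, and the semigroup used is <{0, 1}, max>, which already has the identity 0.

```python
def adjoin_identity(op, e):
    """Extend op to S ∪ {e}, where the new element e acts as a two-sided identity."""
    def extended(x, y):
        if x == e:
            return y
        if y == e:
            return x
        return op(x, y)
    return extended


E = "e"            # hypothetical label for the adjoined element; must not be in S
S = [0, 1]         # carrier of the semigroup <{0, 1}, max>
S_EXT = S + [E]
ext = adjoin_identity(max, E)

e_is_identity = all(ext(E, x) == x and ext(x, E) == x for x in S_EXT)
still_associative = all(
    ext(ext(x, y), z) == ext(x, ext(y, z)) for x in S_EXT for y in S_EXT for z in S_EXT
)
old_identity_survives = all(ext(0, x) == x and ext(x, 0) == x for x in S_EXT)
```

Here old_identity_survives is False because 0 o e = 0 rather than e, which is the point made at the end of the paragraph above.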

Groups

We next consider the variety of groups. Informally, a group is a monoid
in which every element has an inverse with respect to the binary operation of the
monoid. More specifically, a group is an algebra consisting of a set, a binary
associative operation, a unary operation and a distinguished element which is a two-sided
identity for the binary operation. The unary operation maps each element of the
group to its inverse with respect to the binary operation. If we denote the identity
of the operation o by 1 and the inverse of x by ~x, then for every element x, there
exists an element ~x such that x o ~x = ~x o x = 1.

Definition 7.2.3: A group is an algebra with signature <S, o, ~, 1> such that
o is an associative binary operation on S, the constant 1 is a two-sided identity for
the operation o, and ~ is a unary operation defined over the carrier such that for
all x ∈ S, ~x is an inverse for x with respect to o.

If A = <S, o, ~, 1> is a group and A' = <T, o, ~, 1> is a subalgebra of A, then A'
is called a subgroup of A. A subalgebra of a group is a group.
The requirement that an inverse exist for every element of a group places
strong restrictions on the binary operation. In particular, both right and left
cancellation laws hold; that is, if a o c = b o c, then a = b, since
a o c = b o c ⇒ (a o c) o ~c = (b o c) o ~c
              ⇒ a o (c o ~c) = b o (c o ~c)
              ⇒ a o 1 = b o 1
              ⇒ a = b.
Similarly, if c o a = c o b, then a = b. Cancellation laws do not generally hold for
the operations of either semigroups or monoids; for example, if c is a zero element,
then a o c = b o c for all elements a and b of the carrier.
Another property of the binary operation of a group is that all equations of
the form
a o x = b
have a unique solution for the value of x:
x = ~a o b;
an analogous assertion holds for equations of the form x o a = b. The operation
of a group is injective in the sense that if x ≠ y, then a o x ≠ a o y and
x o a ≠ y o a; thus "multiplication" by an element on either the right or left
induces an injection from the carrier to itself. Moreover, the operation of a group is
surjective in the sense that a o S = S = S o a, where we use a o S to denote the set
{a o x | x ∈ S}.
We leave the proofs of these assertions as exercises.

Examples
(a) The algebra <I, +, −, 0> is a group, where + denotes addition and − denotes
unary minus. If K denotes the set of all multiples of a given k ∈ N, then
<K, +, −, 0> is a subgroup of <I, +, −, 0>.
(b) The algebra <Q+, ·, ⁻¹, 1> is a group, where · denotes multiplication, and ⁻¹
denotes the unary operation of taking the reciprocal of a rational number.
(c) Let A be any set and let P denote the set of permutations on A. Then P is the
set of bijective functions from A to A. The structure <P, o, ⁻¹, 1_A> is a group,
where o denotes composition of functions, and f⁻¹ is the inverse function of f.
(d) The operations max and min cannot generally be used as the binary operation
of a group because an inverse operation cannot be defined if the carrier has
more than one element.
(e) The algebras <N_k, +_k, ~, 0> are groups, if we define ~0 = 0 and ~x = k − x
for x ≠ 0.
(f) The algebras <N_k, ·_k, ~, 1> are not groups because the element 0 ∈ N_k has
no inverse. #
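Examples (e) and (f), together with the unique solvability of a o x = b, can be checked by exhaustive search for a particular k. The following Python sketch is illustrative only; the function names are hypothetical and k = 5 is an arbitrary choice.

```python
def is_group(carrier, op, inv, e):
    """Check closure, associativity, identity and inverses on a finite carrier."""
    elems = list(carrier)
    members = set(elems)
    closed = all(op(x, y) in members for x in elems for y in elems) and all(
        inv(x) in members for x in elems
    )
    associative = all(
        op(op(x, y), z) == op(x, op(y, z)) for x in elems for y in elems for z in elems
    )
    identity = all(op(e, x) == x == op(x, e) for x in elems)
    inverses = all(op(x, inv(x)) == e == op(inv(x), x) for x in elems)
    return closed and associative and identity and inverses


K = 5
N_K = range(K)


def add_k(x, y):
    return (x + y) % K


def neg_k(x):
    return (K - x) % K  # gives ~0 = 0 and ~x = K - x for x != 0


def mul_k(x, y):
    return (x * y) % K


additive_group = is_group(N_K, add_k, neg_k, 0)
zero_has_inverse = any(mul_k(0, y) == 1 for y in N_K)
# In a group, x = ~a o b solves a o x = b.
solutions_ok = all(add_k(a, add_k(neg_k(a), b)) == b for a in N_K for b in N_K)
```

With these definitions additive_group and solutions_ok are True, while zero_has_inverse is False, so no choice of unary operation can make multiplication mod 5 on N_5 a group operation.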

Boolean Algebras

The last variety we will consider is that of Boolean algebras.


Definition 7.2.4: A Boolean algebra is an algebra with signature
<S, +, ·, ~, 0, 1>
(where + and · are binary operations and ~ is a unary operation called
complementation) and the following axioms hold. (We write ab for a·b.)
(i) a + b = b + a
(ii) ab = ba                         commutative laws
(iii) (a + b) + c = a + (b + c)
(iv) (ab)c = a(bc)                   associative laws
(v) a(b + c) = ab + ac
(vi) a + (bc) = (a + b)(a + c)       distributive laws
(vii) a + 0 = a                      0 is an identity for +
(viii) a1 = a                        1 is an identity for ·
(ix) a + ~a = 1
(x) a(~a) = 0                        properties of the complement

Less formally, we can say that a Boolean algebra has two commutative,
associative binary operations + and · which distribute over each other, together with a
single unary operation ~. The constants 0 and 1 are identities for + and ·
respectively, and for every element a, a + ~a = 1 and a·~a = 0.
If <S, +, ·, ~, 0, 1> is a Boolean algebra and T is a subset of S which is closed
under the operations +, ·, and ~, and 0, 1 ∈ T, then <T, +, ·, ~, 0, 1> is a
subalgebra of <S, +, ·, ~, 0, 1> called a Boolean subalgebra. A Boolean subalgebra is
a Boolean algebra.

Examples
(a) It can be shown that if the carrier of a Boolean algebra is finite and has more
than one element, then the cardinality of the carrier is an even integer. The
following operation tables describe the operations of a Boolean algebra with
carrier {0, 1}.

+ | 0  1      · | 0  1      x | ~x
--+-----      --+-----      --+---
0 | 0  1      0 | 0  0      0 | 1
1 | 1  1      1 | 0  1      1 | 0

Note that these operations are similar to the operations ∨, ∧ and ¬ defined
for truth values in Chapter 1.
(b) Let A be any set and let ~ denote the operation of set complementation
relative to A. Then <P(A), ∪, ∩, ~, ∅, A> is a Boolean algebra. This is an example
of a Boolean set algebra. The carrier of a Boolean set algebra need not be
a power set; it can be any collection of sets which is closed under union,
intersection and complement relative to some universal set.
(c) Let S be the set of positive divisors of 30; S = {1, 2, 3, 5, 6, 10, 15, 30}. Let
x1 + x2 denote the least common multiple of x1 and x2; let · denote the
greatest common divisor and ~x denote the number 30/x. Then <S, +, ·, ~, 1, 30>
is a Boolean algebra. #
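Example (c) can be verified against the ten axioms of Definition 7.2.4 by exhaustive search. The Python sketch below is an illustration; the function names are hypothetical, and the comparison with the divisors of 12 is an added contrast that is not taken from the text.

```python
from math import gcd


def is_boolean_algebra(carrier, join, meet, comp, zero, one):
    """Check the ten axioms of Definition 7.2.4 on a finite carrier."""
    s = list(carrier)
    pairs = [(a, b) for a in s for b in s]
    triples = [(a, b, c) for a in s for b in s for c in s]
    return (
        all(join(a, b) == join(b, a) and meet(a, b) == meet(b, a) for a, b in pairs)
        and all(join(join(a, b), c) == join(a, join(b, c)) for a, b, c in triples)
        and all(meet(meet(a, b), c) == meet(a, meet(b, c)) for a, b, c in triples)
        and all(meet(a, join(b, c)) == join(meet(a, b), meet(a, c)) for a, b, c in triples)
        and all(join(a, meet(b, c)) == meet(join(a, b), join(a, c)) for a, b, c in triples)
        and all(join(a, zero) == a and meet(a, one) == a for a in s)
        and all(join(a, comp(a)) == one and meet(a, comp(a)) == zero for a in s)
    )


def lcm(x, y):
    return x * y // gcd(x, y)


def divisors(n):
    return [d for d in range(1, n + 1) if n % d == 0]


# + is lcm, · is gcd, ~x is n/x, and the constants are 1 and n.
divisors_of_30_ok = is_boolean_algebra(divisors(30), lcm, gcd, lambda x: 30 // x, 1, 30)
divisors_of_12_ok = is_boolean_algebra(divisors(12), lcm, gcd, lambda x: 12 // x, 1, 12)
```

The check succeeds for 30 and fails for 12 with this complement, since lcm(2, 12/2) = 6 rather than 12.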

Problems: Section 7.2

1. Construct a semigroup using the operation max which has a zero but no identity.
2. Let S_k = {x | x ∈ I ∧ x ≥ k} where k ≥ 0. Show that <S_k, +> is a subsemigroup
of <I, +>.
3. Construct a monoid using the operation max which has no zero and an infinite
carrier.
4. Let E denote the even natural numbers; E = {0, 2, 4, ...}. Show that <E, +, 0> is
a submonoid of <N, +, 0>.
5. Show that every subalgebra of a monoid is a monoid.
6. Construct a group using max as the binary operation.
7. Let E denote the even integers; E = {0, −2, 2, −4, 4, ...}. Show that <E, +, −, 0>
is a subgroup of <I, +, −, 0>, where the symbol − denotes unary minus.
8. Construct tables for the operations of addition and inverse for the group
<N_k, +_k, ~, 0>
where k = 5.
9. Show that if o is a binary operation on T and 0 is a zero element with respect to the
binary operation o, then T cannot be made the carrier of a group unless T = {0}.
10. Prove that if <S, o, ~, 1> is a group, then for every a ∈ S,
(a) if x ≠ y, then a o x ≠ a o y. Similarly, if x ≠ y, then x o a ≠ y o a.
(b) a o S = S = S o a.
(c) ~(~a) = a (the inverse of the inverse of a is a).
11. Show that if <S, o, ~, 1> is a group and T is a nonempty subset of S such that
∀x ∀y [x, y ∈ T ⇒ x o ~y ∈ T],
then <T, o, ~, 1> is a subgroup of <S, o, ~, 1>.
12. For each of the following digraphs, let R be the binary relation represented by the
digraph, and let S = {R^n | n ∈ I+} be the carrier of an algebra in which composition
of relations is the binary operation. In each case, determine whether the algebra can
be presented as a semigroup, monoid, or group, and state the cardinality of the
carrier.

(Digraphs (a) through (e) are not reproduced here.)

13. (a) State necessary and sufficient conditions on a binary relation R so that the set
{R^n | n ∈ N} can be made the carrier of a monoid with the operation of
composition.
(b) State necessary and sufficient conditions on a binary relation R so that the set
{R^n | n ∈ I+} can be made the carrier of a monoid using the operation of
composition.
(c) State necessary and sufficient conditions on a binary relation R on a finite set
so that the set {R^n | n ∈ I+} can be made the carrier of a group with the binary
operation of composition.
14. Let S be a set. Show that <{S, ∅}, ∪, ∩, ~, ∅, S> is a Boolean subalgebra of
<P(S), ∪, ∩, ~, ∅, S>.
15. Consider the following questions to determine when a Boolean algebra can be
constructed from the set of integers between and including m and n where m < n,
using the operations of max (for +) and min (for ·).
(a) Do the operations max and min satisfy Axioms 1-4 of Definition 7.2.4?
(b) Do the operations max and min satisfy Axioms 5 and 6?
(c) What would be the constants if Axioms 7 and 8 are to be satisfied?
(d) Can an inverse operation be defined which satisfies Axioms 9 and 10? (Hint:
Your answer should be expressed as a function of the size of the carrier, i.e.,
of n − m + 1.)
16. Consider a computer which uses words of k bits to represent nonnegative integers
in binary notation. The only operation is addition. When overflow occurs, the high
order bits are lost.
(a) What algebraic variety would be most appropriate to model addition in the
machine? How big is the carrier?
(b) Suppose overflow causes the result to be set to the largest representable number.
What algebraic variety would best model addition in this case?

7.3 HOMOMORPHISMS

We wish to find ways of characterizing the structural similarities of two algebras
A and A'. Clearly one possibility is for A' to "look just like" A, that is, for A' to
be simply a relabeled version of A. Then A and A' must have the same signature,
the carriers of A and A' must have the same cardinality, and the operations and
constants of the two algebras must have the same properties. If two algebras are
similar in the sense we have described, then the similarity can be established by
exhibiting a bijection from the carrier of A to that of A' such that the function
describes how A' can be viewed simply as a relabelling of A. The concept is made
precise in the following definition. For ease of exposition, we restrict ourselves to
the algebras A = <S, o, Δ, k> and A' = <S', o', Δ', k'> where o and o' are binary
operations, Δ and Δ' are unary operations, and k and k' are constants.

Definition 7.3.1: The algebras A = <S, o, Δ, k> and A' = <S', o', Δ', k'> are
isomorphic if there exists a bijection h such that
(i) h: S → S';
(ii) h(a o b) = h(a) o' h(b);
(iii) h(Δ(a)) = Δ'(h(a));
(iv) h(k) = k'.
The map h is called an isomorphism from A to A', and A' is said to be an isomorphic
image of A under the map h.

The preceding definition is phrased in terms of a specific signature, but an analogous
definition can be formulated for any signature. In each case, if h is an isomorphism
from an algebra A to an algebra A', then
1. A and A' must have the same signature,
2. the function h maps each constant of A to the corresponding constant of
A', and
3. each operation of A is preserved by the function h.
If A and A’ are isomorphic algebras, they are essentially the same structure with
different names; the algebra A’ can be obtained from A by a simple change of
notation.

Examples
(a) Let E denote the set of even integers; E = {..., −4, −2, 0, 2, 4, ...}. Then
the algebras <I, +, 0> and <E, +, 0> are isomorphic. This is established by
showing the map
f: I → E,
f(x) = 2x,
is an isomorphism, that is, by showing the conditions of Definition 7.3.1
are satisfied:
1. The function f is clearly bijective.
2. For any integers x and y, f(x + y) = 2(x + y)
                                      = 2x + 2y
                                      = f(x) + f(y).
3. f(0) = 2·0 = 0.
(b) Let R+ denote the set of positive real numbers. Then <R+, ·, 1> is isomorphic
to <R, +, 0> and the map
h: R+ → R,
h(x) = log x,
is an isomorphism. To show this, we first establish that h is a bijection from
R+ to R. The function h is surjective because for x > 0, the equation log x = y
always has a solution of x = 2^y. Because the log function is monotone
increasing, h is injective. Hence, h is bijective and condition (i) of Definition 7.3.1 is
satisfied. Furthermore,

h(a·b) = log(a·b) = log(a) + log(b) = h(a) + h(b),

thus satisfying condition (ii). Since h(1) = log(1) = 0, condition (iv) is also
satisfied.
The isomorphism h is the mathematical basis for the slide rule.

(c) The semigroups <N, +> and <I+, ·> are not isomorphic. We establish this
using a proof by contradiction. Suppose h is an isomorphism from <N, +> to
<I+, ·>. There are infinitely many prime numbers in I+. Since h is a
surjection from N to I+, there must be some x ∈ N where x ≥ 2 and some prime
number p, where p ≥ 3, such that h(x) = p. If h is an isomorphism from
<N, +> to <I+, ·>, then
(i) p = h(x) = h(x + 0) = h(x)·h(0), and
(ii) p = h(x) = h((x − 1) + 1) = h(x − 1)·h(1).
But since p is a prime number, the only factors of p are p and 1. Therefore,
by (i), either h(x) = 1 or h(0) = 1, and by (ii), either h(1) = 1 or h(x − 1) = 1.
Since 0 < 1 ≤ x − 1 < x, it follows that 1 is the image of at least two
elements under the function h. We conclude that h is not a bijection and therefore
not an isomorphism. #

Theorem 7.3.1: Let C be a collection of algebras, and let ~ be the relation
defined by A ~ A' if and only if A is isomorphic to A'. Then ~ is an equivalence
relation on C.
The proof is left as an exercise.

In order for A to be isomorphic to A', the map h: S → S' must be bijective.
If h is not necessarily bijective, but the other conditions are still satisfied, then h
is called a homomorphism from A to A'.

Definition 7.3.2: Let A = <S, o, Δ, k> and A' = <S', o', Δ', k'> be algebras
with the same signature, and let h be a function such that
(i) h: S → S';
(ii) h(a o b) = h(a) o' h(b);
(iii) h(Δ(a)) = Δ'(h(a));
(iv) h(k) = k';
then h is a homomorphism from A to A'.†

Figure 7.3.1 depicts how two algebras can be related by a homomorphism.

†There is a rich terminology associated with homomorphisms. Let h be a homomorphism
from A to A'. If h is injective, then h is a monomorphism and if h is surjective, then h is an
epimorphism. If A = A', then h is an endomorphism; if A = A' and h is an isomorphism, then h is an
automorphism. We will not use this terminology.

Fig. 7.3.1 Representation of a homomorphism h from
A = <S, o, k> to A' = <S', o', k'>.
The shaded portion of S' represents h(S), the homomorphic image
of S under h.

Alternatively, we can characterize a homomorphism from A = <S, o, k> to
A' = <S', o', k'> as a map h such that h(k) = k' and the following diagram
commutes.

Figure 7.3.2

From Definitions 7.3.1 and 7.3.2 it is immediate that an isomorphism is a
homomorphism which is also a bijection.

Examples
(a) For k ∈ I, the map f_k: I → I defined by f_k(x) = kx is a homomorphism from
<I, +, 0> to <I, +, 0>. If k ≠ 0, then f_k is injective. If k = 1 or k = −1, then
f_k is bijective and therefore an isomorphism.
(b) Let f: R → R where f(x) = 2^x. Then f is an injective homomorphism (but
not an isomorphism) from <R, +, 0> to <R, ·, 1>.
(c) Let f: N → N_k, where f(x) = x mod k. Then f is a surjective homomorphism
from <N, +, 0> to <N_k, +_k, 0>.
(d) Let Σ be a finite nonempty alphabet, and let ||x|| denote the length of a
string x ∈ Σ*. Then the function h defined by
h: Σ* → N,
h(x) = ||x||,
is a homomorphism from <Σ*, concatenation, Λ> to <N, +, 0>. If Σ is a
singleton set, then h is an isomorphism. #
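The homomorphism conditions in examples (c) and (d) can be spot-checked on finite samples. The Python sketch below is illustrative only: the function name, the sample values and the choice k = 7 are assumptions, and passing such a check on a sample does not prove that a map is a homomorphism on an infinite carrier.

```python
from operator import add


def looks_like_homomorphism(h, sample, op, op_prime, k, k_prime):
    """Test h(k) == k' and h(a o b) == h(a) o' h(b) on a finite sample only."""
    sample = list(sample)
    return h(k) == k_prime and all(
        h(op(a, b)) == op_prime(h(a), h(b)) for a in sample for b in sample
    )


K = 7  # modulus chosen for illustration

# Example (c): x -> x mod K from <N, +, 0> to <N_K, +_K, 0>.
mod_ok = looks_like_homomorphism(
    lambda x: x % K, range(50), add, lambda x, y: (x + y) % K, 0, 0
)

# Example (d): string length from <Σ*, concatenation, Λ> to <N, +, 0>.
length_ok = looks_like_homomorphism(len, ["", "a", "ab", "bba"], add, add, "", 0)

# A non-example: x -> x + 1 does not map the constant 0 to 0.
shift_ok = looks_like_homomorphism(lambda x: x + 1, range(-5, 6), add, add, 0, 0)
```

The first two checks succeed and the third fails, because x -> x + 1 does not preserve the constant.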

Theorem 7.3.2: Let h be a homomorphism from A = <S, o, A, k> to
A’ = <S’, o’, A’, k’>. Then <h(S), o’, A’, k’> is a subalgebra of A’, called the
homomorphic image of A under h.
Proof: To show that <h(S), o’, A’, k’> is a subalgebra of A’, the following
conditions must be established.
1. h(S) ⊆ S’. This follows from the fact that h: S → S’.
2. The constant k’ is an element of h(S). By definition of a homomorphism,
h(k) = k’, and since k ∈ S, it follows that k’ = h(k) ∈ h(S).
3. The set h(S) is closed under the operation o’, i.e., if a, b ∈ h(S), then a o’ b
∈ h(S). But if a, b ∈ h(S), then there exist elements x, y ∈ S such that
h(x) = a and h(y) = b. Furthermore, x o y = z for some z ∈ S, and therefore
a o’ b = h(x) o’ h(y) = h(x o y) = h(z) ∈ h(S).
4. The set h(S) is closed under the operation A’, i.e., if a ∈ h(S), then
A’a ∈ h(S). If a ∈ h(S), then a = h(x) for some x ∈ S. Therefore
Ax ∈ S, and h(Ax) = A’h(x) = A’a; hence A’a ∈ h(S). ∎

The homomorphic image of an algebra A is of the same variety as A. The
following theorem establishes this result for the varieties we have defined.

Theorem 7.3.3: Let h be a homomorphism from the algebra A to the algebra A’.
(a) If A is a semigroup, then the homomorphic image of A under h is a
semigroup.
(b) If A is a monoid, then the homomorphic image of A under h is a monoid.
(c) If A is a group, then the homomorphic image of A under h is a group.
(d) If A is a Boolean algebra, then the homomorphic image of A under h
is a Boolean algebra.
Proof: Let A = <S, o> be a semigroup, and let A’ = <S’, o’>. By Theorem
7.3.2, <h(S), o’> is a subalgebra of A’. Furthermore, the operation o’ must be
associative on h(S), since by the associativity of o and the properties of h,

h(a) o’ (h(b) o’ h(c)) = h(a) o’ h(b o c) = h(a o (b o c))
= h((a o b) o c) = h(a o b) o’ h(c) = (h(a) o’ h(b)) o’ h(c).

It follows that <h(S), o’> is a semigroup.
The proofs of parts (b), (c), and (d) are left as exercises. ∎

Examples
(a) Let h be a homomorphism from the monoid <I, +, 0> to <I, +, 0> defined by
h: I → I,
h(x) = 3x.
The homomorphic image of <I, +, 0> under h is the monoid <{3n | n ∈ I}, +, 0>,
which is a submonoid of <I, +, 0>.

(b) Define the map h: R → R as h(x) = 2ˣ. Then h is a homomorphism from the
monoid A = <R, +, 0> to the monoid A’ = <R, ·, 1>. The image of A under
h is the submonoid <R+, ·, 1>.

(c) Let Sₖ = {x | x ∈ I ∧ x > k}, where k ∈ N. Let k, m, and n be elements of
N such that kn > m, and define h as follows:
h: Sₖ → Sₘ,
h(x) = nx.
Then h is a homomorphism from the semigroup <Sₖ, +> to a subsemigroup
of <Sₘ, +>.

(d) Let S be a nonempty set, and consider the Boolean algebras
A = <P(S), ∪, ∩, ¯, ∅, S>
B = <{0, 1}, +, ·, ¯, 0, 1>
Then for any a ∈ S the following function h is a homomorphism from A to B.
h: P(S) → {0, 1},
h(T) = 0 if a ∉ T,
h(T) = 1 if a ∈ T.
Note that h(∅) = 0 and h(S) = 1, thus satisfying the condition that a
homomorphism maps the constants of one structure to the corresponding constants
of the other.

(e) Let Nₖ = {0, 1, 2, ..., k − 1}, where k > 1, and let p ∈ N. The map
h: N → Nₖ,
h(x) = y where y = px mod k,
is a homomorphism to a submonoid of <Nₖ, +ₖ, 0>.

(f) For the universe of integers I and some k ∈ N, define x ~ y if and only if
x = y mod k. Let I/~ be the quotient set; then [x] = {y | y = x mod k}. Define
the operation + on I/~ as [x] + [y] = [x + y], and unary minus as −[x] =
[−x]. Then <I/~, +, −, [0]> is a group and
h: I → I/~,
h(x) = [x],
is a homomorphism from <I, +, −, 0> to <I/~, +, −, [0]>. #

Problems: Section 7.3

1. (a) Show that two algebras cannot be isomorphic if their carriers have different
cardinalities.
(b) Give an example to show that two algebras with the same signature may not be
isomorphic even though their carriers have the same cardinality.
2. Prove Theorem 7.3.1 for algebras with signature <S, o, k>, where o is a binary
operation and k ∈ S.
3. Suppose h is a homomorphism from <S, o> to <S’, o’>, where o and o’ are binary
operations.
(a) Show that if 1 ∈ S is an identity with respect to the operation o, then some
element 1’ ∈ S’ is an identity with respect to o’ for the subalgebra <h(S), o’>.
(b) Show that an identity for <h(S), o’> may not be an identity for <S’, o’>.
(c) Show that if 0 ∈ S is a zero with respect to o, then some element 0’ ∈ S’
is a zero for the subalgebra <h(S), o’> and h(0) = 0’.
(d) Show that a zero for <h(S), o’> may not be a zero for <S’, o’>.
4. (a) Show that there are exactly i homomorphisms from <Nᵢ, +ᵢ, 0> to itself.
(b) Describe the set of all homomorphisms from <N, +, 0> to <Nᵢ, +ᵢ, 0>.
(c) Describe the set of all homomorphisms from <N₂, +₂, 0> to <N₃, +₃, 0>.
5. Prove parts (b), (c), and (d) of Theorem 7.3.3.
6. Most computers represent numbers with binary sequences of a fixed length. Only
a finite set of numbers can be represented exactly, and “arithmetic overflow” occurs
when the result of a computation is larger than any of the numbers which can be
represented. Consider the following strategies for treating arithmetic overflow. For
simplicity, we will treat only the natural numbers and the operation of addition. For
each of the following functions f, determine whether f is a homomorphism from
<N, +, 0> to the specified algebra <S, ⊕, 0>, where S is the set of binary sequences of
length k. In each case, the operation ⊕ is based on binary addition and is described
by means of examples. In the illustrative examples given below, we use k = 3.
(a) The k bits represent the least significant digits of the k digit binary
representation of each natural number. The operation ⊕ is the usual binary addition
except that if overflow occurs, the leading digits are lost. Thus, f(3) = 011,
f(6) = 110, and f(9) = f(3 + 6) = 011 ⊕ 110 = 001 = f(8n + 1) for all n ∈ N.
(b) If n < 2ᵏ, then f(n) is the k digit binary representation of n. If n ≥ 2ᵏ, then f(n)
is represented by the k digit binary representation of 2ᵏ − 1. Thus f(3) = 011,
f(6) = 110, and f(9) = f(3 + 6) = 011 ⊕ 110 = 111 = f(x) for all x ≥ 7.
(c) One bit is reserved for an indication that overflow has occurred. (We will use 0
for no overflow, 1 for overflow, and use the leftmost bit as the overflow indicator.)
For all numbers less than 2ᵏ⁻¹, the numbers are represented in their
k − 1 digit binary representation and the overflow bit is set to 0. If n ≥ 2ᵏ⁻¹,
then f(n) consists of the digit 1 followed by the k − 1 least significant digits of
the binary representation of n; e.g., if k = 3, then f(12) = 100. Thus f(3) = 011,
f(2) = 010, and f(3 + 2) = 011 ⊕ 010 = 101 = f(4n + 1) for all n ∈ N.

7. Let A = <S, o, k> and A’ = <S’, o’, k’> and let h be a homomorphism from A to A’.
Show that if <T, o’, k’> is a subalgebra of A’, then <h⁻¹(T), o, k> is a subalgebra of A.

8. Let Σ be a finite alphabet, and consider the monoid <Σ*, concatenation, Λ>. This is
sometimes called the free monoid generated by Σ. The free monoid has the following
important property:
Let <S, o, 1> be an arbitrary monoid. For any map h: Σ → S, there is a unique
extension of h to a homomorphism h*: Σ* → S.
Prove this property.
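The extension described in the last problem can be made concrete by folding the monoid operation over the symbols of a string. The sketch below is an illustration under assumptions, not a proof: the function name extend, the two-letter alphabet, and the target monoid <N, +, 0> with h(a) = 1 and h(b) = 10 are all chosen here for the example.

```python
from functools import reduce

def extend(h, op, identity):
    """Return h*, where h*(x1 x2 ... xn) = h(x1) o h(x2) o ... o h(xn)
    and h* of the empty string is the identity of the target monoid."""
    def h_star(word):
        return reduce(op, (h(symbol) for symbol in word), identity)
    return h_star

# Illustrative choice: alphabet {a, b}, target monoid <N, +, 0>.
weights = {"a": 1, "b": 10}
h_star = extend(weights.__getitem__, lambda x, y: x + y, 0)

empty_value = h_star("")       # image of the empty string
word_value = h_star("abb")     # h(a) + h(b) + h(b)
```

Because every string is a concatenation of single symbols, the values of h on the alphabet leave no freedom in how a homomorphism can act on longer strings, which is the intuition behind uniqueness.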

7.4 CONGRUENCE RELATIONS

A congruence relation is an equivalence relation defined on the carrier of an algebra
such that the equivalence classes of the relation are “preserved” by the operations
of the algebra. The notion of congruence is a generalization of the notion of
equality.
The most familiar example of a congruence relation is one used to associate
ordered pairs of integers with rational numbers. We define a fraction as an ordered
pair of integers <p, q> (written p/q) where q ≠ 0, and let F be the set of all fractions.
The binary operations of +, −, and · and unary − can be defined on F using the
corresponding operations on integers as follows:

(p/q) + (r/s) = (ps + rq)/(qs),
(p/q) − (r/s) = (ps − rq)/(qs),
(p/q)·(r/s) = (pr)/(qs),
−(p/q) = (−p)/q.

Note that the fractions 1/2 and 2/4 are not equal; they are distinct by virtue of
their being different ordered pairs. However, we usually want to treat 1/2 as
indistinguishable from 2/4 and 2/2 the same as 1/1. This is done by establishing an
equivalence relation ~ over F as follows:
p/q ~ r/s ⇔ ps = rq.
The set of rational numbers Q is defined to be the quotient set F/~. Thus, the
rational number commonly denoted by “1/2” actually represents the set

{..., (−3/−6), (−2/−4), (−1/−2), (1/2), (2/4), (3/6), ...}


and we write, for example, 1/2 = 2/4 because these fractions represent the same
equivalence class. The equivalence relation ~ over F is particularly useful because
of a “substitution property” which makes this relation analogous to equality with
respect to arithmetic operations; substituting one operand for another of the same
equivalence class will not change the equivalence class of the result. For example,
just as the relation of equality of two integers a and b is preserved when both
integers are multiplied by an integer c, i.e.,
if a = b, then ac = bc,

the relation ~ is preserved when two equivalent fractions are multiplied by another:

p/q ~ r/s ⇒ ps = rq
⇒ (ps)(tu) = (rq)(tu)
⇒ (pt)(su) = (rt)(qu)
⇒ (pt)/(qu) ~ (rt)/(su).

This establishes that for any fractions a, b, and c,

if a ~ b, then ac ~ bc.

In fact, equivalence of fractions is preserved under the other operations as well;
if a, b, and c are fractions and a ~ b, then the following assertions hold:
c + a ~ c + b        a + c ~ b + c
c − a ~ c − b        a − c ~ b − c
c·a ~ c·b            a·c ~ b·c
−a ~ −b
Because ~ is an equivalence relation which is preserved under these operations,
we say ~ is a congruence relation with respect to the binary operations of +, −,
·, and the unary operation −.
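The substitution property can be observed directly by representing fractions as ordered pairs, exactly as in the definition of F. The following Python sketch is illustrative; the function names and the sample range are assumptions made here, and the check covers only the pair 1/2 ~ 2/4 against a small set of operands.

```python
from itertools import product

# A fraction p/q is the ordered pair (p, q) with q != 0.
def add(a, b):
    (p, q), (r, s) = a, b
    return (p * s + r * q, q * s)

def sub(a, b):
    (p, q), (r, s) = a, b
    return (p * s - r * q, q * s)

def mul(a, b):
    (p, q), (r, s) = a, b
    return (p * r, q * s)

def neg(a):
    p, q = a
    return (-p, q)

def equiv(a, b):
    """p/q ~ r/s if and only if ps = rq."""
    (p, q), (r, s) = a, b
    return p * s == r * q

half, two_quarters = (1, 2), (2, 4)
samples = [(p, q) for p, q in product(range(-3, 4), repeat=2) if q != 0]

# Replacing 1/2 by 2/4 as either operand never changes the class of the result.
substitution_holds = all(
    equiv(op(c, half), op(c, two_quarters)) and
    equiv(op(half, c), op(two_quarters, c))
    for op in (add, sub, mul) for c in samples
) and equiv(neg(half), neg(two_quarters))
```

Note that half and two_quarters are different pairs, so they are unequal as fractions even though equiv relates them.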
Rather than define congruence relations for operations of arbitrary arities,
we will restrict our formal definition to the algebra A = <S, o, A>, where o is a
binary operation and A is a unary operation. We will usually write ab for a o b.

Definition 7.4.1: Let A = <S, o, A> be an algebra with a binary operation
o and a unary operation A and let ~ be an equivalence relation on S. Then ~ is
a congruence relation on A if and only if for all elements a, b, c ∈ S,
(i) if a ~ b, then ac ~ bc and ca ~ cb;
(ii) if a ~ b, then Aa ~ Ab.
The equivalence classes of ~ are called the congruence classes of the relation ~.

Informally, we will speak of a relation ~ on a set S as a congruence relation with
respect to the operation o if ~ is a congruence relation on the algebra <S, o>. A
relation ~ is a congruence relation on an algebra A with carrier S if and only if ~
is a congruence on S with respect to each of the operations of A.

Examples
(a) Equality is a congruence relation on any algebra.
(b) Consider the integers I together with the operation of addition. The equivalence
relation ~ of “equivalence mod k” for some given k ∈ N is a congruence
relation on the algebra <I, +>, where
x ~ y if and only if x = y mod k.
To show ~ is a congruence relation with respect to the operation +, we
must first show that it is an equivalence relation; this was established by
Theorem 3.7.1. Then we must show that if a ~ b, then a + c ~ b + c and
c + a ~ c + b. Suppose a ~ b. Then a − b = kn for some n ∈ I. Then
(a + c) − (b + c) = a − b = kn; hence a + c ~ b + c. Moreover, by the
commutativity of addition, c + a ~ c + b. Thus ~ is a congruence relation
over <I, +>.
The relation ~ can also be shown to be a congruence relation on I with
respect to the operations of multiplication, subtraction, and unary minus.
Note that if k = 0, then equivalence mod k on I is the equality relation and
there are ℵ₀ congruence classes in I/~. If k ≠ 0, then there are k congruence
classes in I/~:
I/~ = {[0], [1], [2], ..., [k − 1]}.
(c) Consider the algebra A = <N, ·, 0> and the equivalence relation
x ~ y ⇔ [(x is even and y is even) ∨ (x = y)].
We will show that ~ is a congruence relation on A.
Since multiplication is commutative, it will suffice to show that if x ~ y
then kx ~ ky. Suppose x ~ y. Then either x = 2m and y = 2n for some
m, n ∈ N, or x = y.
Case 1: If x = 2m and y = 2n, then for any k ∈ N, kx = 2km and ky =
2kn. Since both kx and ky are even, kx ~ ky.
Case 2: If x = y, then kx = ky and therefore kx ~ ky.
It follows that ~ is a congruence relation on A.
(d) Consider the unary operation A defined on the set of fractions F as
A(p/q) = p/q²,
and define p/q ~ r/s ⇔ ps = rq as before. Clearly if a = b then A(a) = A(b);
but a ~ b does not imply A(a) ~ A(b), e.g., A(1/2) ≁ A(2/4). Thus, ~ is
not preserved by the operation A, and consequently ~ is not a congruence
relation on <F, A>. #
The following theorem gives another characterization of a congruence relation
with respect to a binary operation.

Theorem 7.4.1: The equivalence relation ~ is a congruence relation with
respect to the binary operation o if and only if whenever a ~ b and c ~ d, then
ac ~ bd.
Proof:
(a) (only if) Let ~ be a congruence relation with respect to the binary
operation o, and suppose a ~ b and c ~ d. But a ~ b implies ac ~ bc, and
c ~ d implies bc ~ bd. By transitivity of ~, we conclude ac ~ bd.
(b) (if) Suppose ~ is an equivalence relation such that if a ~ b and c ~ d
then ac ~ bd. Since c ~ c, it follows that if a ~ b, then ac ~ bc. Similarly,
if a ~ b, then ca ~ cb. It follows that ~ is a congruence relation
with respect to the operation o. ∎
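On a finite carrier, both characterizations of a congruence with respect to a binary operation can be tested exhaustively and compared. The sketch below is illustrative: the function names, the carrier {0, ..., 5} with addition mod 6, and the two sample relations are assumptions chosen here, and rel is assumed to already be an equivalence relation.

```python
from itertools import product

def one_sided(carrier, op, rel):
    """Definition 7.4.1(i): a ~ b implies ac ~ bc and ca ~ cb."""
    return all(rel(op(a, c), op(b, c)) and rel(op(c, a), op(c, b))
               for a, b, c in product(carrier, repeat=3) if rel(a, b))

def two_sided(carrier, op, rel):
    """Theorem 7.4.1: a ~ b and c ~ d imply ac ~ bd."""
    return all(rel(op(a, c), op(b, d))
               for a, b, c, d in product(carrier, repeat=4)
               if rel(a, b) and rel(c, d))

carrier = range(6)
add_mod6 = lambda x, y: (x + y) % 6
same_mod3 = lambda x, y: x % 3 == y % 3    # classes {0,3}, {1,4}, {2,5}
same_half = lambda x, y: x // 2 == y // 2  # classes {0,1}, {2,3}, {4,5}

mod3_results = (one_sided(carrier, add_mod6, same_mod3),
                two_sided(carrier, add_mod6, same_mod3))
half_results = (one_sided(carrier, add_mod6, same_half),
                two_sided(carrier, add_mod6, same_half))
```

In each pair the two tests agree, as the theorem requires; the second relation fails both because 0 ~ 1 while 0 + 1 and 1 + 1 fall in different classes.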

A homomorphism h from an algebra A with carrier S to an algebra A’ with carrier
S’ is a map from S to S’ which preserves the operations of A. As with any map, a
homomorphism induces a natural equivalence relation over its domain; under this
relation, a ~ b if and only if h(a) = h(b). The next theorem shows that if h is a
homomorphism, then the induced equivalence relation is, in fact, a congruence
relation on A.

Theorem 7.4.2: Let A = <S, o, A> be an algebra with a binary operation o
and a unary operation A, and let h be a homomorphism from A to A’ = <S’, o’, A’>.
Then the equivalence relation over S induced by h is a congruence relation on the
algebra A.
Proof: Two elements a, b ∈ S are equivalent under the relation induced by
h if and only if h(a) = h(b). To show this is a congruence relation on A we must
show
(i) if a ~ b, then Aa ~ Ab, and
(ii) if a ~ b and c ~ d, then a o c ~ b o d.
(i) If a ~ b, then h(a) = h(b), and therefore A’h(a) = A’h(b). But since h is a
homomorphism, h(Aa) = A’h(a) and h(Ab) = A’h(b). Therefore h(Aa) =
h(Ab), and hence Aa ~ Ab. This establishes that ~ is a congruence relation
with respect to the unary operation A.
(ii) If a ~ b and c ~ d, then h(a) = h(b) and h(c) = h(d). Therefore
h(a) o’ h(c) = h(b) o’ h(d).
Since h is a homomorphism, h(a o c) = h(a) o’ h(c) and h(b o d) = h(b) o’ h(d);
hence h(a o c) = h(b o d). It follows that a o c ~ b o d, thus establishing that
~ is a congruence relation with respect to the binary operation o.
Thus ~ is a congruence relation on the algebra A. ∎

Example
Consider the homomorphism h from the algebra <Σ*, concatenation, Λ>
to <N, +, 0> defined by h(x) = ||x||. The equivalence relation ~ induced by h is
the following:
w ~ v ⇔ h(w) = h(v) ⇔ ||w|| = ||v||.
Since h is a homomorphism, the equivalence relation w ~ v ⇔ ||w|| = ||v|| is a
congruence relation on Σ* with respect to concatenation. It follows that if ||w|| =
||x|| and ||y|| = ||z||, then ||wy|| = ||xz||.
Problems: Section 7.4

1. Let F denote the set of fractions as defined in this section. Show that the relation
p/q ~ r/s ⇔ ps = rq is a congruence relation on <F, +, −, −> where the first
occurrence of “−” represents the binary operation of subtraction and the second
occurrence of “−” represents the unary minus. (Note that you must show that ~ is
an equivalence relation.)
2. For an arbitrary monoid A = <S, o, 1>, show that equality and the universal relation
S × S are both congruence relations on A.
3. Consider the algebra A = <I, +>. For each of the following binary relations on I,
prove or disprove that the relation is a congruence relation on A.
(a) x ~ y ⇔ (x < 0 ∧ y < 0) ∨ (x ≥ 0 ∧ y ≥ 0)
(b) x ~ y ⇔ |x − y| < 10
(c) x ~ y ⇔ (x = y = 0) ∨ (x ≠ 0 ∧ y ≠ 0)
(qd) x~yoxby.
4. Let k be a natural number. Describe the class of all congruence relations on an
algebra of the form <{0, 1, 2, ..., k}, max>.
5. An ideal of a semigroup A = <S, o> is a subset K of the carrier S such that if x ∈ K
and y ∈ S, then x o y ∈ K and y o x ∈ K. For an arbitrary ideal K, define the
equivalence relation ~ over the carrier S as follows:

x ~ y ⇔ [(x, y ∈ K) ∨ (x = y)].

(a) Show that if A has a zero element 0, then 0 ∈ K.
(b) Show that if A has an identity element 1 and S ≠ K, then 1 ∉ K.
(c) Show that ~ is a congruence relation on A.
6. Find an infinite ideal of the semigroup <I, ·>. (The definition of an ideal is given in
the preceding problem.)
7. Let A = <S, o, A> be an algebra, where o is a binary operation and A is a unary
operation. Show that if ~ and ≈ are both congruence relations on A, then the
intersection of ~ and ≈ is also a congruence relation on A.
8. State the conditions for ~ to be a congruence relation on <S, []>, where [] is a
ternary operation on S. Denote the result of the operation [] on the operands a, b, c
by [](a, b, c).
9. Let Σ be the alphabet of a programming language, where Σ contains the two symbols
end and continue (note that the keywords of a language are often treated simply
as special symbols). Using · to denote concatenation, we define the relation ~ on
<Σ*, concatenation> to be the smallest congruence relation such that for any x ∈ Σ*,

x·continue ~ continue·x ~ x
end·x ~ end

Thus, ~ is reflexive, symmetric, and transitive, and if w ~ x and y ~ z, then wy ~ xz.
Let [x] denote the equivalence class of x ∈ Σ* under the relation ~.
(a) Describe the members of the congruence class [Λ].
(b) For a given x ∈ Σ*, describe the shortest string in [x].
Consider the operation · on the quotient algebra Σ*/~:

[x]·[y] = [xy].

Show that with respect to this operation,
(c) [continue] is an identity on Σ*/~, and
(d) [end] is a left zero, but not a right zero.
For U, V ⊆ Σ*/~, let U·V denote the set {[xy] | [x] ∈ U and [y] ∈ V}. We define a
matrix product on these sets in the usual way, but using union for + and set product
for multiplication; thus

| U11 U12 |   | V11 V12 |   | (U11·V11) ∪ (U12·V21)   (U11·V12) ∪ (U12·V22) |
| U21 U22 | · | V21 V22 | = | (U21·V11) ∪ (U22·V21)   (U21·V12) ∪ (U22·V22) |

(e) Find an n × n matrix which is a left identity for this matrix operation (each entry
of the matrix will be either [end] or [continue]).
(f) Is this left identity also a right identity?

7.5 NEW ALGEBRAS FROM OLD

There are several ways of combining algebras to build new ones. We will discuss
two methods in this section.

Quotient Algebras

We first treat the topic of quotient algebras. Recall that if ~ is an equivalence
relation over a set S, then [x] denotes the equivalence class of x ∈ S.

Definition 7.5.1: Let A = <S, o, A, k> be an algebra with a binary operation
o, a unary operation A, and a constant k, and let ~ be a congruence relation on
A. The quotient algebra of A with respect to the relation ~, denoted by A/~, is
the algebra <S/~, o’, A’, [k]>, where
(i) S/~ is the quotient set of S under the relation ~ (the elements of S/~
are the equivalence classes of the relation ~),
(ii) for all [a], [b] ∈ S/~, [a] o’ [b] = [a o b]; and A’[a] = [Aa],
(iii) [k] is the equivalence class of k under ~.

To show the system defined above is indeed an algebra, we must prove that the
operations o’ and A’ are well-defined. This requires showing that the result of
applying o’ or A’ does not depend on which elements of the equivalence classes are used
to compute the result. This can be shown as follows:
(a) To show that A’ is well-defined, we must show that if [a] = [b], then
A’[a] = A’[b]. If [a] = [b], then a ~ b. Since ~ is a congruence relation,
Aa ~ Ab; therefore [Aa] = [Ab]. Since A’[a] = [Aa] and A’[b] = [Ab], it
follows that A’[a] = A’[b]. Thus, the operation A’ is well-defined.
(b) To show that the operation o’ is well-defined, we must show that if
[a] = [b] and [c] = [d], then [a] o’ [c] = [b] o’ [d]. If [a] = [b] and [c] =
[d], then a ~ b and c ~ d. Since ~ is a congruence relation, a o c ~ b o d,
and therefore [a o c] = [b o d]. Since [a] o’ [c] = [a o c] and [b] o’ [d] =
[b o d], it follows that [a] o’ [c] = [b] o’ [d]. Therefore, o’ is well-defined.
Since A’ and o’ are well-defined operations on S/~, it follows that A/~ is an
algebra with the same signature as A.
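For a finite carrier, the quotient construction and the well-definedness requirement can be carried out mechanically. The following Python sketch is an illustration under assumptions: the function name quotient, the representation of classes as frozensets, and the sample algebra (addition mod 6 with equivalence mod 3) are choices made here; rel is assumed to be an equivalence relation and op is assumed to be closed on the carrier.

```python
def quotient(carrier, op, rel):
    """Build the classes of rel and the induced binary operation on them.

    Raises ValueError if the induced operation depends on the chosen
    representatives, i.e. if rel is not a congruence with respect to op.
    """
    groups = []
    for x in carrier:
        for group in groups:
            if rel(x, next(iter(group))):
                group.add(x)
                break
        else:
            groups.append({x})
    classes = [frozenset(g) for g in groups]
    class_of = {x: c for c in classes for x in c}

    table = {}
    for c1 in classes:
        for c2 in classes:
            # Every choice of representatives must land in the same class.
            results = {class_of[op(a, b)] for a in c1 for b in c2}
            if len(results) != 1:
                raise ValueError("operation is not well-defined on classes")
            table[c1, c2] = results.pop()
    return classes, class_of, table

carrier = range(6)
add_mod6 = lambda x, y: (x + y) % 6
classes, class_of, table = quotient(carrier, add_mod6,
                                    lambda x, y: x % 3 == y % 3)
```

Here table[class_of[a], class_of[b]] plays the role of [a] o’ [b]; the ValueError branch corresponds to an equivalence relation for which the definition [a] o’ [b] = [a o b] would be ambiguous.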

The operations and constants of a quotient algebra retain many of the properties
of the original algebra. For example, if the operation o is commutative,
then o’ is as well, since

[a] o’ [b] = [a o b] = [b o a] = [b] o’ [a].

Similarly, if o is associative, so is o’. If k is an identity for o, then [k] o’ [a] =
[k o a] = [a] and therefore [k] is an identity for o’. Similarly, if k is a zero for o,
then [k] is a zero for o’. It follows that if ~ is a congruence relation on a semigroup
A, then A/~ is a semigroup. Corresponding statements hold for monoids, groups
and Boolean algebras, since all the axiomatic properties of the structures are
preserved in the quotient algebras.

Example
Let F be the set of fractions as defined in Section 7.4 and consider the algebra
A = <F, +, −, −>. If ~ is the relation p/q ~ r/s ⇔ ps = rq, then ~ is a
congruence relation on the algebra A. The carrier F/~ of the quotient algebra A/~
is the set of rational numbers Q. #

Recall that if ~ is an equivalence relation on a set S, then the canonical map
from S to S/~ is defined as follows:
f: S → S/~,
f(a) = [a], where [a] is the equivalence class of a ∈ S.
If S is the carrier of an algebra and ~ is a congruence relation, then the canonical
map is a homomorphism. This is established by the following theorem.

Theorem 7.5.1: Let A be an algebra, and let ~ be a congruence relation on
A. Then the canonical map h: S → S/~ defined by h(a) = [a] is a homomorphism
from the algebra A to the quotient algebra A/~.
Proof: We will prove the theorem for the special case A = <S, o, A, k>,
where o and A are binary and unary operations respectively. Let ~ be a congruence
relation on A, and A/~ = <S/~, o’, A’, [k]> be the quotient algebra. Let h be the
canonical map from S to S/~. Then, by definition of the quotient algebra,
(i) [a] o’ [b] = [a o b], and
(ii) A’[a] = [Aa].
To show that h is a homomorphism, we first show that h preserves the operations
of A:
(i) h(a o b) = [a o b] = [a] o’ [b] = h(a) o’ h(b), and
(ii) h(Aa) = [Aa] = A’[a] = A’h(a).
Finally, we observe that h(k) = [k], so the constant of A is mapped to that of
A/~. Hence h is a homomorphism. ∎

The preceding theorem establishes that if ~ is a congruence relation on the
algebra A = <S, o, A>, where o and A are binary and unary operations respectively,
then the following diagrams commute (where h × h is the map on S × S such
that h × h(<a, b>) = <h(a), h(b)>):

[Diagrams: a square with o: S × S → S along the top, h × h and h down the sides,
and o’: (S/~) × (S/~) → S/~ along the bottom; and a square with A: S → S along
the top, h down both sides, and A’: S/~ → S/~ along the bottom.]

Product Algebras

Another way of constructing new algebras from old ones is by taking a
“direct product” of two algebras with the same signature. Recall that two sets S’
and S” can be combined by the operation of cartesian product to form S’ × S”.
The operation of cartesian product can be extended so that two algebras A’ and A”
can be combined to form the product algebra A’ × A”. The carrier of the product
algebra is the cartesian product of the carriers, and each operation of the product
algebra corresponds to the operations of the original algebras acting in a
“pairwise” fashion on the elements of the carrier. Note that definitions similar to the
following one apply only if the algebras A’ and A” have the same signature.

Definition 7.5.2: Let A’ = <S’, o’, A’, k’> and A” = <S”, o”, A”, k”> be
algebras, where o’ and o” are binary operations and A’ and A” are unary operations.
The direct product of A’ with A” is the algebra
A’ × A” = <S’ × S”, o, A, <k’, k”>>,

where <a, c> o <b, d> = <a o’ b, c o” d> and A<a, c> = <A’a, A”c> for all <a, c>,
<b, d> in S’ × S”. The algebra A’ × A” is also called the product algebra.

If the direct product of two algebras is defined, then the product algebra has
the same signature as the operands. If both operand algebras are of the same
variety, then the variety of the product algebra will be the same as that of the
operands; for example, the direct product of two semigroups is a semigroup.

Examples
(a) Let A = <N, +, 0> and A’ = <N, +, 0>. Then A × A’ = <N², +, <0, 0>>;
the operation + of the product algebra is defined by the equation
<a, c> + <b, d> = <a + b, c + d>.

(b) Let A = <N₂, +₂, 0> and A’ = <N₃, +₃, 0>, where N₂ = {0, 1}, N₃ =
{0, 1, 2}, and +₂ and +₃ denote the operations of addition mod 2 and mod 3
respectively. The product algebra A × A’ is <N₂ × N₃, +, <0, 0>>. The
carrier of A × A’ is the set {<0, 0>, <0, 1>, <0, 2>, <1, 0>, <1, 1>, <1, 2>}. The
operation + of the product algebra is pairwise modular addition; thus
<1, 1> + <1, 1> = <0, 2>. The constant of the product algebra is the ordered
pair <0, 0>. We leave it as an exercise to show that A × A’ is isomorphic to
<N₆, +₆, 0>. #
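Example (b) is small enough to compute in full. The Python sketch below builds the product of addition mod 2 and addition mod 3 and tests one candidate map from {0, ..., 5} under addition mod 6 to the product; the names pair_add and phi, and the choice of the map x → (x mod 2, x mod 3), are assumptions made here for illustration and are not taken from the text.

```python
from itertools import product

N2, N3, N6 = range(2), range(3), range(6)

def pair_add(x, y):
    """Pairwise modular addition on N_2 x N_3."""
    (a, c), (b, d) = x, y
    return ((a + b) % 2, (c + d) % 3)

carrier = list(product(N2, N3))  # the six ordered pairs

def phi(x):
    """Candidate isomorphism from <N_6, +_6, 0> to the product algebra."""
    return (x % 2, x % 3)

is_bijection = sorted(phi(x) for x in N6) == sorted(carrier)
preserves_op = all(phi((x + y) % 6) == pair_add(phi(x), phi(y))
                   for x, y in product(N6, repeat=2))
preserves_const = phi(0) == (0, 0)
```

If all three flags are true, phi is a bijective homomorphism on this carrier, which is what the definition of an isomorphism requires.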

In this chapter we have only covered some of the most basic and well-understood
topics of the field usually referred to as universal algebra. It is possible to extend the
concepts we have described in many interesting and important ways. For example,
relational algebras permit relations on the carrier to occur in the signature of the
algebra; in our treatment, relations could only be included indirectly, e.g., by choosing
a partially ordered set as the carrier. Another extension would be to relax the
requirement that the operations of an algebra be defined for all possible operands. For
example, in the formulation we have presented, <R, /> is not an algebra because
the operation of division is not defined if the divisor is 0. Permitting operations to
be defined only for some of the possible operands gives another kind of mathe-
matical structure called a partial algebra. We can also extend the concept of algebra
to that of a many-sorted algebra; this is a mathematical system in which elements
from various sets (rather than a single carrier) can occur as operands, and not all
operations need be defined for all operands. Thus, we could use one set to represent
the integers, another set to represent the floating point numbers, and a third set to
represent truth values. In such an algebra, arithmetic operations are defined on
the sets of numbers, and Boolean operations are defined on truth values. The
ceiling and floor functions are unary operations which map the real numbers to
the integers. A relation such as “<” can be represented as an operation which has
numbers as operands and whose result is a truth value. Extensions such as these
are currently being applied to a number of problem areas of computer science,
the most notable of which is the semantics of programming languages.

Problems: Section 7.5

1. Let Sₖ = {x | x ∈ I ∧ x > k}, where k ∈ N. Let m and n be elements of N such
that nk > m, and let h be the following homomorphism from A = <Sₖ, +> to
A’ = <Sₘ, +>:
h: Sₖ → Sₘ,
h(x) = nx.
Let ~ be the congruence relation on A induced by h. Describe the quotient algebra
A/~.
2. Let h be a homomorphism from A = <S, o, A, k> to A’ = <S’, o’, A’, k’>, and let ~
be the equivalence relation induced on S by h:

x ~ y ⇔ h(x) = h(y)

Show that A/~ is isomorphic to the subalgebra <h(S), o’, A’, k’> of A’.
3. Let A = <S, o, 1> and A’ = <S’, o’, 1’> be monoids. Show that the product algebra
A × A’ is a monoid.

4. Let A = <{1, 2, 3}, max, 1> and A’ = <{5, 6}, min, 6>. Specify the product algebra
A × A’ by constructing an operation table and identifying the constants.
5. Let A’ = <S’, o’, A’, 1’> and A” = <S”, o”, A”, 1”> where o’ and o” are binary
operations and A’ and A” unary operations, and consider the product algebra
A’ × A” = <S’ × S”, o, A, <1’, 1”>>.
(a) Show that if the binary operations of A’ and A” are commutative, then the binary
operation of the product algebra is commutative.
(b) Show that if the binary operations of A’ and A” are associative, then the binary
operation of the product algebra is associative.
(c) Show that if the constants of A’ and A” are identity elements with respect to
their binary operations, then the constant of the product algebra is an identity
with respect to its binary operation.
(d) Show that if the constants of the algebras A’ and A” are zeroes with respect to
their binary operations, then the constant of the product algebra is a zero with
respect to its binary operation.
(e) Show that if A’ and A” are groups, then the product algebra is a group.
6. Let A and A’ be algebras with nonempty carriers and define the relation ~ over a
product algebra A × A’ as follows:
<w, x> ~ <y, z> ⇔ w = y.
(a) Determine when ~ is a congruence relation on A × A’.
(b) Show that if the relation ~ defined above is a congruence relation, then
(A × A’)/~ is isomorphic to A.
7. Let Aⱼ = <Nⱼ, +ⱼ, 0> where Nⱼ = {0, 1, 2, ..., j − 1} and +ⱼ denotes addition
mod j.
(a) Show that A₂ × A₃ is isomorphic to A₆.
(b) Describe the set of congruence relations on A₂ × A₂.
(c) Describe the set of congruence relations on Aₘ, where m ∈ I+.

Suggestions for Further Reading

A number of excellent books treat a range of topics in algebra; these include
Fraleigh [1969] and Herstein [1964]. More advanced treatments which are closer
to the spirit of what was presented in this chapter include MacLane and Birkhoff
[1967], Cohn [1965], and Gratzer [1968]. Gill [1976] and Stone [1973] treat a variety
of structures with an emphasis on applications relevant to computer science.
APPENDIX

THE PROGRAMMING LANGUAGE

The programs of this text have been written in an informal programming lan-
guage based on ALGOL 60. Because our principal concern is the clear and unam-
biguous description of algorithms, we have used the ALGOL 60 framework
whenever it has been convenient, but abandoned it when doing so resulted in a
more easily understood algorithm description. This has resulted in a language with
the following properties:

1. Simple data types include integers, real numbers, and character strings.
Complex data types include whatever is convenient for treating the prob-
lem, including arrays, lists, graphs, edges and nodes. The data type of a
program variable will be evident from the context; we will not include
formal declarations in the programs. Similarly, the scopes of variables
will be clear from the context.
2. The operations used in the language include the arithmetic operations,
the floor and ceiling functions, and concatenation of character strings.
When convenient we will also use other operations, requiring only that
they be clear and unambiguous.
3. The conditions of the language (used in conditional and iteration state-
ments) include all propositions whose truth values can be established at
the appropriate time during program execution.

A program is a (single) statement. Each statement of the language is of one of


the types specified in the following table. The clauses in brackets [ ] are optional
and may be omitted under some specified conditions.
Because the language is informally specified and the data types and statement
types are not completely characterized, a careful specification of the syntax and
semantics of the language is not possible. Nevertheless, the informal description
following the table of each of the statement types and how they are used will
enable the reader to understand the programs of the text with a minimum of
effort.

 1. variable ← expression                                the assignment statement
 2. if condition then statement1 [else statement2]       the if statement
 3. begin
        statement1;
        statement2;
        ...
        statementk
    end                                                  the begin statement
 4. while condition do statement                         the while statement
 5. for variable ← initial-value [step step-size]
        until final-value do statement                   the for statement
 6. procedure procedure-name [(list of parameters)]:
        statement                                        the procedure definition statement
 7. return [expression]                                  the return statement
 8. procedure-name (list of arguments)                   the procedure call statement
 9. comment: character string                            the comment statement
10. other statements

1. An assignment statement is of the form

        variable ← expression

   Execution of an assignment statement causes the expression on the right
   of the assignment arrow “←” to be evaluated, and the resulting value to
   be assigned to the variable on the left side of the assignment arrow. For
   example, execution of

        x ← y + z

   causes the variable x to be assigned the value of the sum of y and z.
   Execution of

        i ← i + 1

   causes the value of i to be incremented by 1.


2. An if statement (or conditional statement) is of the form

        if condition then statement1 [else statement2]

   where condition is a proposition and statement1 and statement2 are
   statements. Execution of an if statement causes condition to be evaluated.
   If its truth value is true, then statement1 is executed. If the optional
   else clause is present and condition has the truth value false, then
   statement2 is executed. For example, after execution of

        if x < 0 then x ← −x

   the value of x will be nonnegative. After execution of

        if x = 0 then y ← 1 else y ← 2

   the value of y will be 1 if x is equal to zero; otherwise the value of y
   will be 2.
3. A begin statement is of the form:

        begin
          statement1;
          statement2;
          ...
          statementk
        end

   and consists of a sequence of statements separated by semicolons and
   enclosed between begin and end. A begin statement is called a block; it
   can be used any way a statement can be used in the language. Execution
   of a begin statement consists of execution of the sequence of statements
   enclosed in the begin-end pair. A block causes the enclosed sequence of
   statements to be treated as an entity. For example, to interchange the
   values of x and y if x is less than y, we could execute the following (single)
   statement:

        if x < y then
          begin
            temp ← x;
            x ← y;
            y ← temp
          end

4. The while statement is used to control the repeated execution of a
   statement. It has the following form:

        while condition do statement

   Execution of a while statement causes condition to be evaluated. If the
   truth value of condition is true, then the statement after do is executed.
   This process is repeated until condition becomes false. Note that if
   condition is false at the time of execution of a while statement, then the
   statement following the do will not be executed. On the other hand, if
   the truth value of condition is true and this value is not changed by
   repeated executions of the statement following do, then execution of the
   statement will not terminate.
   The following statement causes the variable nfact to be assigned the
   value n!, when n is a nonnegative integer. The value of n! is defined to
   be 1 if n = 0; otherwise, n! = n(n − 1)(n − 2) ... 2·1.

        begin
          nfact ← 1;
          while n > 1 do
            begin
              nfact ← nfact * n;
              n ← n − 1
            end
        end
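
The factorial program can be transcribed almost line for line into a conventional language. The following Python rendering is only a sketch: the function name nfact_of and the guard against negative input are assumptions of this illustration, not part of the language described here.

```python
def nfact_of(n):
    """Compute n! with the same while loop as the factorial program above."""
    if n < 0:
        # Guard added for this sketch; the text simply assumes n is nonnegative.
        raise ValueError("n must be a nonnegative integer")
    nfact = 1
    while n > 1:          # the body is skipped entirely when n is 0 or 1
        nfact = nfact * n
        n = n - 1
    return nfact
```

Because the loop condition is tested before the body runs, the cases n = 0 and n = 1 both return 1 without executing the body at all.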

5. A for statement is an alternative way to control repeated execution of
   a statement. It has the following form:

        for variable ← initial-value [step step-size] until final-value
            do statement

   where variable is a variable name (called the index of the loop) and
   initial-value, step-size, and final-value are expressions. If the optional step
   clause is omitted, then the value of step-size is assigned the default value
   of 1. If the value of step-size is positive, then the effect of executing the
   for statement is defined to be the same as executing the following
   statement:

        begin
          variable ← initial-value;
          while variable ≤ final-value do
            begin
              statement;
              variable ← variable + step-size
            end
        end

   If the value of step-size is nonpositive, then the effect of executing the for
   statement is defined to be equivalent to the following:

        begin
          variable ← initial-value;
          while variable ≥ final-value do
            begin
              statement;
              variable ← variable + step-size
            end
        end

   The following statement assigns the value 0 to all array entries A[i]
   for 0 ≤ i ≤ n and i an even number:

        for i ← 0 step 2 until n do A[i] ← 0
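
The definition of the for statement in terms of while can be made concrete with a small Python sketch. It assumes the loop comparisons are inclusive (≤ for a positive step, ≥ otherwise), as in ALGOL 60. The names for_statement and zero_even_entries are hypothetical helpers introduced for this illustration, and the rejection of a zero step is a guard added here (the while-loop definition itself would simply never terminate in that case).

```python
def for_statement(initial, final, body, step=1):
    """Execute body(v) for each index value, following the while-loop definition."""
    if step == 0:
        # Guard added for this sketch: a zero step can never reach final-value.
        raise ValueError("step-size of 0 would not terminate")
    v = initial
    if step > 0:
        while v <= final:     # positive step: run while index <= final-value
            body(v)
            v = v + step
    else:
        while v >= final:     # negative step: run while index >= final-value
            body(v)
            v = v + step


def zero_even_entries(A, n):
    """Counterpart of: for i <- 0 step 2 until n do A[i] <- 0."""
    def assign_zero(i):
        A[i] = 0
    for_statement(0, n, assign_zero, step=2)
    return A
```

With A = [1, 1, 1, 1, 1, 1, 1, 1] and n = 7, calling zero_even_entries(A, n) sets the entries at indices 0, 2, 4 and 6 to zero and leaves the odd-indexed entries unchanged.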
6, 7, 8. A procedure in our language is an algorithm which can be invoked, or
called, by another algorithm. Three kinds of variables occur in a proce-
dure. A global variable is one which can be accessed and changed by either
the procedure or the program which invokes it; the same name is used
in both the calling program and a procedure when referring to a global
variable. A local variable is one whose value is accessible only to the
procedure; these variables are used in the execution of the algorithm of
the procedure but are not used to communicate information between
the program and the procedure called. A parameter is a data item which
is specified explicitly at the time a procedure is invoked. Global variables
and parameters provide two ways of passing information between a
program and the procedures it invokes.
A procedure is defined by specifying the algorithm to be executed
when the procedure is invoked and the information to be passed by pa-
rameters. A procedure definition statement has the form
procedure procedure-name [(parameter list)]: statement
where procedure-name is the name used to invoke the procedure and
parameter list is a finite sequence of dummy variables called formal
parameters; the elements of the parameter list are separated by commas.
When the procedure is invoked, the statement following the colon is
executed. Note that a procedure definition statement merely defines an
algorithm; the algorithm is not executed until the procedure is invoked.
Procedures are either function procedures or subroutine procedures.
A function procedure is invoked by using the procedure name in an
expression, just as a variable name would be used; the procedure name
is followed by a (possibly empty) list of arguments called the actual
parameters. A value for the procedure name will be computed according
to the algorithm specified in the procedure definition statement. The
value to be substituted for the procedure name is specified in the pro-
cedure definition by a statement of the form
        return expression

For example, the following is a definition of a procedure to compute the
absolute value of a real number.

        procedure ABS(x):
          if x > 0 then return x else return −x

Execution of the following program segment will set the variable y equal
to the absolute value of z:

        y ← ABS(z)

and execution of the program segment

        y ← 2 + ABS(z + 1)

will assign the value 6 to y if z = 3 and the value 4 to y if z = −3.
A subroutine procedure is called by executing a statement of the
form

        procedure-name (argument list)

A subroutine procedure definition may or may not have a return state-
ment; execution of the subroutine algorithm terminates either by execut-
ing the statement

        return

or by completion of the execution of the statement which defines the
algorithm. The following subroutine procedure interchanges the values
of two entries of an array A:

        procedure SWITCH(i, j):
          begin
            temp ← A[i];
            A[i] ← A[j];
            A[j] ← temp
          end

In this procedure, temp is a local variable, A is a global variable, and i
and j are formal parameters. (There is no unambiguous specification in
the above that temp is local; it could, in fact, be global. For the programs
of this text, the context will suffice to determine which variables are
intended to be local.) If n = 4, then execution of the statement

        SWITCH(2, n + 3)

will cause the values of A[2] and A[7] to be interchanged.
Most of the algorithms described in this text are presented as pro-
cedures rather than programs. These procedures would be executed as
the result of being invoked by another procedure or program.
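
The distinction between function procedures, subroutine procedures, and global, local and parameter variables can be illustrated in Python. This is a sketch under stated assumptions: the sample contents of the array A are invented for the illustration, and Python lists are indexed from 0, so an eight-element list is used to make A[2] and A[7] both valid.

```python
# Global array shared by the caller and SWITCH (sample data, assumed for illustration).
A = [10, 11, 12, 13, 14, 15, 16, 17]


def ABS(x):
    """Function procedure: the returned value replaces the call in an expression."""
    if x > 0:
        return x
    else:
        return -x


def SWITCH(i, j):
    """Subroutine procedure: interchanges A[i] and A[j] and returns no value."""
    temp = A[i]   # temp is local to SWITCH
    A[i] = A[j]   # A is global; it is modified in place
    A[j] = temp
```

With n = 4, the call SWITCH(2, n + 3) interchanges A[2] and A[7], and the expression 2 + ABS(z + 1) evaluates to 6 when z = 3 and to 4 when z = −3, matching the examples in the text.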
9. A comment statement is of the form

        comment: character string

   where character string is any string of characters. A comment statement
   does not affect the algorithm execution; its function is to help the reader
   understand the program. We have not avoided the use of semicolons in
   comments, but the extent of the comment statement will be clear in all
   cases.
10. We also permit the use of other unambiguous instructions so long as
    they can be implemented in a high-level programming language. Exam-
    ples include the following:

        interchange A[i] and A[j]
        set max to the largest element in the array A
        make LIST empty
        concatenate LIST1 to the end of LIST

ANSWERS TO SELECTED PROBLEMS

Section 1.1

3. Tautologies: a,c, d,e,f, h, i, k,l.


Contingencies: g,j, m,n.
Contradictions: 0b.

(a) (i) (“PA R)>Q


(iii) “=P
(b) (i) I will go to town if and only if I have time and it is not snowing.
(iii) If I will go to town then I have time and if I have time, I will go to town.
(a) Converse: If I don’t go, then it rains.
Contrapositive: If I go, then it doesn’t rain.
(c) Converse: If you can bake the cake, you get 4 pounds.
Contrapositive: If you cannot bake the cake, you don’t get 4 pounds.

(a) P ∨ Q ∨ ¬R ⇔ ¬(¬P ∧ ¬Q) ∨ ¬R
               ⇔ ¬((¬P ∧ ¬Q) ∧ R)
               ⇔ ¬(¬P ∧ ¬Q ∧ R)
(c) P ⇒ (Q ⇒ P) ⇔ ¬P ∨ (¬Q ∨ P)
               ⇔ ¬P ∨ P ∨ ¬Q
               ⇔ 1 ∨ ¬Q
               ⇔ 1
(e) [P ⇒ (Q ∨ ¬R)] ∧ [¬P ∧ Q] ⇔ [¬P ∨ Q ∨ ¬R] ∧ [¬P ∧ Q]
               ⇔ [¬P ∧ ¬P ∧ Q] ∨ [Q ∧ ¬P ∧ Q] ∨ [¬R ∧ ¬P ∧ Q]
               ⇔ [¬P ∧ Q] ∨ [¬P ∧ Q] ∨ [¬P ∧ Q ∧ ¬R]
               ⇔ [¬P ∧ Q]
               ⇔ ¬[P ∨ ¬Q]


Suppose P is false. Then P => Q is true for any proposition Q. If we know P> Q
is true and accept the false hypothesis P as true, then we can infer the truth of Q
from the truth table of =. Since Q is arbitrary, Q may or may not be true.
The only noncommutative operator is ⇒.
The only nonassociative operator is ⇒.

(b) No.

(a) (P ⊕ Q) ⇔ (¬P ∧ Q) ∨ (P ∧ ¬Q)

(a) (i) P|P ⇔ ¬(P ∧ P) ⇔ ¬P

    (ii) (P|P)|(Q|Q) ⇔ ¬((P|P) ∧ (Q|Q))
                     ⇔ ¬(¬P) ∨ ¬(¬Q)
                     ⇔ P ∨ Q

    (iii) (P|Q)|(P|Q) ⇔ ¬((P|Q) ∧ (P|Q))
                      ⇔ ¬(¬(P ∧ Q) ∧ ¬(P ∧ Q))
                      ⇔ P ∧ Q
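
The three identities for the operator | (NAND) express negation, disjunction and conjunction in terms of | alone. Since each involves at most two propositional variables, they can be checked exhaustively by truth table. The Python sketch below does this; the function names nand and check_nand_identities are introduced only for this illustration.

```python
from itertools import product


def nand(p, q):
    """P|Q, defined as not (P and Q)."""
    return not (p and q)


def check_nand_identities():
    """Verify the three NAND identities for every truth assignment to P and Q."""
    for p, q in product([False, True], repeat=2):
        assert nand(p, p) == (not p)                       # P|P is equivalent to not P
        assert nand(nand(p, p), nand(q, q)) == (p or q)    # (P|P)|(Q|Q) is P or Q
        assert nand(nand(p, q), nand(p, q)) == (p and q)   # (P|Q)|(P|Q) is P and Q
    return True
```

A call to check_nand_identities() raises an AssertionError if any row of the truth table disagrees and returns True otherwise.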

Section 1.2

1. ∀x ∀y ∃z S(x, y, z)
∀x[¬L(x, 0)] or ¬∃x[L(x, 0)]
True
False
P(x, y) denotes x + y = 0
All integers greater than 10.
The universe contains only 3.

PO, 0) A PQ, 1)
[P, 0) V PO, 1] A (Pd,0) V PQ, D)
P(x) denotes x = x + 1.

∀x ∀y ∃z P(x, y, z)
∀x P(x, 0, x)

Section 1.3

L
(a) » VyLE(y, 1) > VxP(x, y, x)]
(d) Vx[P(, x, 6) <> E(x, 2)]
(g) Vx Vy[ G(x, ») A Gy, x)] > EC, y)]
(h) Vx Vy Vz[[GQ, x) A GQ, z)] > Vu Well P(x, z, u) A PCy, z, v)] => Gu, v)]]
(a) Every arithmetic assertion which is provable is true.
(d) If z = x V y and z is provable, then x is provable or y is provable.
(a) If P(x) denotes “x is prime” and E(x) denotes “x is even”, then ∃!x[P(x) ∧ E(x)].
(c) T(x): x is a train
    C(x): x is a car
    F(x, y): x is faster than y
    ∀x[T(x) ⇒ ∃y[C(y) ∧ F(x, y)]]
(e) Let R denote “it rains tomorrow” and W(x) denote “x will get wet.”
    R ⇒ ∃x[W(x)]
∀x P(x) ⇔ ¬∃x ¬P(x)
∃x P(x) ⇔ ¬∀x ¬P(x)
∃!x P(x) ⇔ ∃x[P(x) ∧ ∀y[P(y) ⇒ y = x]]
(a) True.
(b) False. Consider the universe consisting of 0 and 1, and let P(x) denote “x = 0”
and Q(x) denote “x = 1.”
(Refer to Tables 1.1.1 and 1.1.2.)
(a) ∃x[P(x) ∧ Q(x)] ⇔ [P(0) ∧ Q(0)] ∨ [P(1) ∧ Q(1)]                  (expansion)
        ⇔ [[P(0) ∧ Q(0)] ∨ P(1)] ∧ [[P(0) ∧ Q(0)] ∨ Q(1)]           (distributivity)
        ⇔ [P(0) ∨ P(1)] ∧ [Q(0) ∨ P(1)] ∧ [P(0) ∨ Q(1)]
              ∧ [Q(0) ∨ Q(1)]                                       (distributivity)
        ⇒ [P(0) ∨ P(1)] ∧ [Q(0) ∨ Q(1)]                             (simplification)
    Moreover, for this universe,
        [P(0) ∨ P(1)] ∧ [Q(0) ∨ Q(1)] ⇔ ∃xP(x) ∧ ∃xQ(x).
(b) Let P(x) denote “x = 0” and Q(x) denote “x = 1”.

(b) ∃x ∃y[P(x) ∧ Q(y)] ⇔ ∃x[P(x) ∧ ∃yQ(y)]
                       ⇔ ∃xP(x) ∧ ∃yQ(y)
                       ⇒ ∃xP(x)
(d) ∃x ∃y[P(x) ⇒ P(y)] ⇔ ∃x ∃y[¬P(x) ∨ P(y)]
                       ⇔ ∃x[¬P(x) ∨ ∃yP(y)]
                       ⇔ ∃x ¬P(x) ∨ ∃yP(y)
                       ⇔ ¬∀xP(x) ∨ ∃yP(y)
                       ⇔ ∀xP(x) ⇒ ∃yP(y)
11. (a) Vitcic20 Wii<j<30f
Ali, /] > 0)
(c) Ali, j] = 0]
At<<20 Ajt<j<30f

Section 1.4

1. (a) F: I’m fat.
       T: I’m thin.
          F ∨ T
          ¬T
          ∴ F          Disjunctive syllogism
       Conclusion: I’m fat.
   (b) R: I run.
       B: I get out of breath.
          R ⇒ B
          ¬B
          ∴ ¬R         Modus tollens
       Conclusion: I didn’t run.
   (c) B: The butler did it.
       H: His hands are dirty.
          B ⇒ H
          H
       The only conclusions are the hypotheses.
   (e) I am not happy and my program does not run. (By modus tollens and conjunc-
       tion.)
   (f) All trigonometric functions are continuous functions. (Universal instantiation,
       hypothetical syllogism and universal generalization.)
   (i) Let A(x) denote “x is good for the auto industry.”
       Let C(x) denote “x is good for the country.”
       Let Y(x) denote “x is good for you.”
       Let b denote the constant of “you buying an expensive car.”
       The given hypotheses are:
          ∀x[A(x) ⇒ C(x)]
          ∀x[C(x) ⇒ Y(x)]
          A(b)
       Then by universal instantiation
          A(b) ⇒ C(b)
          C(b) ⇒ Y(b)
       By modus ponens
          C(b).
       And again by modus ponens
          Y(b)
       and by conjunction
          C(b) ∧ Y(b)
       Conclusion: It is good for you and the country for you to buy an expensive car.
3. (a) I: IBM will take over the copier market.
       X: Xerox will take over the copier market.
       R: RCA returns to the computer market.
       We wish to show
          ¬(I ∨ X)
          R ⇒ I
          ∴ ¬R
       Proof: 1. ¬(I ∨ X)      Hypothesis
              2. R ⇒ I         Hypothesis
              3. ¬I ∧ ¬X       1, DeMorgan
              4. ¬I            3, Simplification
              5. ¬R            2, 4, modus tollens
5. (b) Valid. The proof is as follows:
              1. A ∨ B         hypothesis
              2. A ⇒ C         hypothesis
              3. ¬B ⇒ A        1, implication
              4. ¬B ⇒ C        2, 3, hypothetical syllogism
              5. C ∨ B         4, implication.

6. (a) T: Today is Tuesday.
       C: I have a test in Computer Science.
       E: I have a test in Economics.
       P: The Economics professor is sick.
       Proof: 1. T ⇒ (C ∨ E)   hypothesis
              2. P ⇒ ¬E        hypothesis
              3. T ∧ P         hypothesis
              4. T             3, simplification
              5. C ∨ E         1, 4, modus ponens
              6. P             3, simplification
              7. ¬E            2, 6, modus ponens
              8. C             5, 7, disjunctive syllogism.

   (c) T(x): x is a trigonometric function.
       P(x): x is a periodic function.
       C(x): x is a continuous function.

          ¬∃x[T(x) ∧ ¬P(x)]
          ∃x[P(x) ∧ C(x)]
          ∴ ¬∀x[T(x) ⇒ ¬C(x)]

       The argument is invalid. For a different interpretation consider a universe con-
       sisting of a (round, glass) marble and a (round, rubber) ball. Define the predi-
       cates as follows:
       T(x) denotes “x is a marble.”
       P(x) denotes “x is a round object.”
       C(x) denotes “x is made of rubber.”
7. (b) The third step, which asserts
3x{P(x) \ 7O(Q)) > Ix 7 PQ) A ax 7909]
is fallacious, although
      ∃x[A(x) ∧ B(x)] ⇒ [∃xA(x) ∧ ∃xB(x)]
is true. Thus, if R ⇒ S, we cannot conclude that ¬R ⇒ ¬S. The faulty step
corresponds to the fallacy of denying the antecedent.
8. The error is in applying universal generalization to d. Although d was arbitrary when
it was chosen, the value of c was constrained by that of d, and choosing a new value
for d may violate these constraints.

Section 1.5

1. (a) ∀x[x² is odd ⇒ x is odd].
       Proof: (Indirect) Let x be an arbitrary integer and assume x is not odd.
       Then x is even. In an example in the text, we showed
          x is even ⇔ x² is even.
       Negating both sides, it follows that
          x is odd ⇔ x² is odd
       and therefore
          x² is odd ⇒ x is odd.
       By universal generalization
          ∀x[x² is odd ⇒ x is odd]. ∎
   (b) ∀x ∀y[(x is even ∧ y is even) ⇒ x + y is even].
       Proof: (Direct) Assume x and y are arbitrary even integers. Then x = 2m
       and y = 2n for some integers m and n. Therefore, x + y = 2m + 2n = 2(m + n).
       It follows that x + y = 2k where k = m + n. Hence, x + y is even. ∎
   (d) ∃x ∃y[x is odd ∧ y is odd ∧ x + y is odd].
       The assertion is false. To show this it suffices to prove that the negation
          ∀x ∀y[(x is odd ∧ y is odd) ⇒ x + y is even]
       is true.
       Proof: (Direct) Let x and y be arbitrary odd integers. Then x = 2m + 1
       and y = 2n + 1 for some integers m and n. Therefore,
          x + y = (2m + 1) + (2n + 1) = 2(m + n + 1).
       Hence, x + y is even. ∎
   (f) ∃x[x is prime ∧ x² is even].
       Proof: (Constructive existence proof) Observe that 2 is prime and 2² is
       even. Then
          2 is prime ∧ 2² is even.
       By existential generalization,
          ∃x[x is prime ∧ x² is even]. ∎
   (g) ¬∃x[x² + 1 < 0].
       Proof: (Contradiction) Assume ∃x[x² + 1 < 0] is true. Then for some c,
       c² + 1 < 0 or c² < −1. But −1 < 0, hence c² < 0. By property (iii), c² ≥ 0.
       This contradicts property (iv) (with x = c² and y = 0). Hence ¬∃x[x² + 1 < 0]
       is true. ∎
   (h) ∀x ∀y[x − y ≥ 0 ∨ y − x ≥ 0].
       Proof: (By cases) By property (iv), one of the following holds: x > y,
       x = y, or x < y. If x > y, then (by property (v)), x − y is positive and therefore
       nonnegative. If x = y, then x − y = 0 and is therefore nonnegative. If x < y,
       then y − x is positive and therefore nonnegative. Thus, in each of the three
       cases, either x − y or y − x is nonnegative. ∎
       1 = 3 ⇒ ∀x[x² < 0].
       Proof: (Vacuous) The assertion 1 = 3 is false. Hence, the implication is
       true. ∎
   (m) ∃x[x² < 0] ⇒ 1 = 1.
       Proof: (Trivial) The assertion 1 = 1 is true. Hence, the assertion is true. ∎

(a) The proposition
       [(H₁ ∧ H₂ ∧ ¬Q) ⇒ 0] ⇔ [(H₁ ∧ H₂) ⇒ Q]
    is a tautology.
(b) To prove by contradiction that (H₁ ∧ H₂ ∧ ··· ∧ Hₙ) ⇒ Q, we would assume
    the negation of the assertion, i.e.,
       ¬[¬(H₁ ∧ H₂ ∧ ··· ∧ Hₙ) ∨ Q]
    or
       H₁ ∧ H₂ ∧ ··· ∧ Hₙ ∧ ¬Q.
    Then (by applying rules of inference), we would derive a contradiction. The
    proof technique described in the problem is a straightforward variation. A
    proof by contradiction assumes an assertion A and proves that the contradic-
    tion B follows by rules of inference; the proof technique described simply estab-
    lishes A ⇒ B.

Section 1.6

2. Any program which does not halt is a solution (see Definition 1.6.1). The following
   program is correct, because if it halts (it won’t), the final assertion false will be true.
      AI: true
      while true do x ← 1
      AF: false
3. (a) (i) Using forward construction:
      AI: true
      x ← 1
      A1: ∃y[true ∧ x = 1]
   But A1 is equivalent to the assertion “x = 1”, hence
      AI: true
      x ← 1
      A1: x = 1
   is correct.
   Similarly
      A1: x = 1
      y ← 2
      AF: ∃z[x = 1 ∧ y = 2]
   is correct and equivalent to
      A1: x = 1
      y ← 2
      AF: x = 1 ∧ y = 2
   By the rule of composition, the program is correct with respect to AI and
   AF.
   (ii) Using the Alternate Axiom of Assignment to construct assertions in the
   backward direction from AF, we have
      (x = 1 ∧ 2 = 2){y ← 2}(x = 1 ∧ y = 2).
   But x = 1 ∧ 2 = 2 is equivalent to the intermediate assertion
      A1: x = 1.
   Again using the Alternate Axiom of Assignment we have
      (1 = 1){x ← 1}(x = 1).
   But 1 = 1 ⇔ true, which is precisely the initial assertion AI. Hence,
   the program is correct with respect to AI and AF.
4. (a) In order to apply the if-then rule, we must establish the following two assertions:
   (i) [x = x′ ∧ ¬(x < 0)] ⇒ [(x′ < 0 ⇒ x = −x′) ∧ (x′ ≥ 0 ⇒ x = x′)]
   and
   (ii) (x = x′ ∧ x < 0){x ← −x}((x′ < 0 ⇒ x = −x′) ∧ (x′ ≥ 0 ⇒ x = x′)).
   To establish the implication (i), we note that
      [x = x′ ∧ ¬(x < 0)] ⇒ ¬(x′ < 0), and
      ¬(x′ < 0) ⇒ [x′ < 0 ⇒ x = −x′].
   By hypothetical syllogism (Table 1.1.2), it follows that
      [x = x′ ∧ ¬(x < 0)] ⇒ [x′ < 0 ⇒ x = −x′].                    (1)
   Moreover,
      [x = x′ ∧ ¬(x < 0)] ⇒ x = x′, and
      (x = x′) ⇒ [x′ ≥ 0 ⇒ x = x′]. Hence,
      [x = x′ ∧ ¬(x < 0)] ⇒ [x′ ≥ 0 ⇒ x = x′].                     (2)
   It follows from (1) and (2) that (i) holds:
      [x = x′ ∧ ¬(x < 0)]
         ⇒ [(x′ < 0 ⇒ x = −x′) ∧ (x′ ≥ 0 ⇒ x = x′)].
   To establish (ii), we first use the Alternate Axiom of Assignment to conclude
      [(x′ < 0 ⇒ x = x′) ∧ (x′ ≥ 0 ⇒ x = −x′)]{x ← −x}[(x′ < 0 ⇒ x = −x′)
         ∧ (x′ ≥ 0 ⇒ x = x′)].
   We then show
      [x = x′ ∧ x < 0]
         ⇒ [(x′ < 0 ⇒ x = x′) ∧ (x′ ≥ 0 ⇒ x = −x′)].
   (This can be done either with truth tables or by using the identities of Tables 1.1.1
   and 1.1.2.) Applying a rule of consequence, it follows that (ii) is true. Thus, the
   if-then rule can be applied and we conclude that the program segment is correct.
6. procedure ZERO:
      AI: n ≥ 0
      begin
        i ← 1;
        A1: n ≥ 0 ∧ i = 1
        A2: i ≤ n + 1 ∧ ∀j[1 ≤ j < i ⇒ V[j] = 0]
        while i ≤ n do
          A3: i ≤ n ∧ ∀j[1 ≤ j < i ⇒ V[j] = 0]
          begin
            V[i] ← 0;
            A4: i ≤ n ∧ ∀j[1 ≤ j ≤ i ⇒ V[j] = 0]
            i ← i + 1
            A2: i ≤ n + 1 ∧ ∀j[1 ≤ j < i ⇒ V[j] = 0]
          end
        AF: ∀j[1 ≤ j ≤ n ⇒ V[j] = 0]
      end

   Either axiom of assignment can be used to show AI{i ← 1}A1. We
   note that A1 ⇒ A2 since n ≥ 0 ∧ i = 1 ⇒ i ≤ n + 1,
   and the assertion 1 ≤ j < i is false for all j.
   By a rule of consequence, this establishes that AI{i ← 1}A2.
   We next establish that the hypothesis of the rule of iteration holds, that is,
      (A2 ∧ i ≤ n){V[i] ← 0; i ← i + 1}A2.
   We note that A2 ∧ i ≤ n ⇔ A3. Hence, it suffices to show
      A3{V[i] ← 0; i ← i + 1}A2.
   We will first prove A3{V[i] ← 0}A4 and then A4{i ← i + 1}A2. The assertion
   A3{V[i] ← 0}A4 follows immediately from an application of the Alternate Axiom
   of Assignment. Applying the same Axiom, we find
      (i + 1 ≤ n + 1 ∧ ∀j[1 ≤ j < i + 1 ⇒ V[j] = 0]){i ← i + 1}A2.
   Since the assertion on the left is equivalent to A4, it follows that A4{i ← i + 1}A2.
   This establishes that the rule of iteration holds, and we conclude that
      A2{while i ≤ n do begin V[i] ← 0; i ← i + 1 end}[A2 ∧ ¬(i ≤ n)].
   But
      [A2 ∧ ¬(i ≤ n)] ⇒ [i = n + 1 ∧ ∀j[1 ≤ j < i ⇒ V[j] = 0]] ⇒ AF.

   It follows by the rule of composition that the procedure ZERO is correct with respect
   to AI and AF.
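
The assertions attached to ZERO can also be checked at run time, which is a useful sanity test on the invariant even though it proves nothing. The Python sketch below is an illustration under stated assumptions: the list V carries an unused entry at index 0 so that the indices 1..n match the text, the assertions are taken to use inclusive bounds (i ≤ n, 1 ≤ j < i), and the helper zeroed_below is a name introduced here.

```python
def zero(V, n):
    """Set V[1..n] to 0, checking the assertions AI, A2, A3, A4 and AF as it runs.

    V[0] is an unused placeholder so that indexing matches the 1-based text.
    """
    def zeroed_below(k):
        # forall j [1 <= j < k  =>  V[j] = 0]
        return all(V[j] == 0 for j in range(1, k))

    assert n >= 0                                   # AI
    i = 1
    assert i <= n + 1 and zeroed_below(i)           # A2 holds on loop entry
    while i <= n:
        assert i <= n and zeroed_below(i)           # A3 = A2 and loop condition
        V[i] = 0
        assert i <= n and zeroed_below(i + 1)       # A4
        i = i + 1
        assert i <= n + 1 and zeroed_below(i)       # A2 restored (loop invariant)
    assert zeroed_below(n + 1)                      # AF
    return V
```

When n = 0 the loop body never runs and AF holds vacuously, which mirrors the observation in the proof that 1 ≤ j < i is false for all j when i = 1.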
The procedure SNEAKY illustrates one of the problems associated with constructing
initial and final assertions for a procedure. For example, suppose a procedure is
intended to sort the entries of a list, but the final assertion of the procedure specifies
merely that the entries are in nondecreasing order. Then a procedure which assigns
the same value to each entry of the list will be correct with respect to the final asser-
tion. Thus it is necessary to specify not only that the entries are in order, but that the
final list can be obtained by rearranging the entries of the original list.
In practice, however, there is often some sacrifice of precision in order to make
the proof of correctness more manageable; thus, the initial and final assertions we
gave for PRODUCT would be considered acceptable by some, with the understanding
that SNEAKY would not be constructed by virtue of our understanding of the prob-
lem.
If desired, the initial and final assertions for PRODUCT can be changed so that
SNEAKY is no longer formally correct. This is done with auxiliary variables as
follows:
      AI: a ≥ 0 ∧ a = a′ ∧ b = b′.
      AF: y = a′ * b′.
Since the values of a′ and b′ are not affected by program execution, the appropriate
value of y will be guaranteed.
9. (a) AI: n ≥ 1 ∧ ∃i[V[i] = arg]
       begin
         index ← 1;
         A1: n ≥ 1 ∧ index = 1 ∧ ∃i[V[i] = arg]
         A2: ∀j[1 ≤ j < index ⇒ V[j] ≠ arg] ∧ ∃i[V[i] = arg]
         while V[index] ≠ arg do
           A3: ∀j[1 ≤ j ≤ index ⇒ V[j] ≠ arg] ∧ ∃i[V[i] = arg]
           index ← index + 1
           A2: ∀j[1 ≤ j < index ⇒ V[j] ≠ arg] ∧ ∃i[V[i] = arg]
         AF: (V[index] = arg) ∧ ∀j[1 ≤ j < index ⇒ V[j] ≠ arg]
       end

       The loop invariant relation is A2.

   (b) The Alternate Axiom of Assignment can be used to show
          AI{index ← 1}A1.
       Since A1 ⇒ A2, it follows by a rule of consequence that
          AI{index ← 1}A2.
       To show the rule of iteration applies, we must show
          A2 ∧ V[index] ≠ arg{index ← index + 1}A2.
       We note that the assertion A2 ∧ V[index] ≠ arg is equivalent to A3. By the
       Alternate Axiom of Assignment,
          ∀j[1 ≤ j < index + 1 ⇒ V[j] ≠ arg] ∧ ∃i[V[i] = arg]{index ← index + 1}A2.
       Since the assertion on the left is equivalent to A3, the rule of iteration applies
       and we conclude
          A2{while V[index] ≠ arg do index ← index + 1}A2 ∧ V[index] = arg.
       But A2 ∧ V[index] = arg ⇒ AF. Hence we can apply the rule of composition
       to conclude that SEARCH is correct with respect to AI and AF. ∎
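
As with ZERO, the assertions for SEARCH can be exercised at run time. The Python sketch below is illustrative only and makes two assumptions not stated in the answer: V carries an unused entry at index 0 so that indices start at 1, and the existential clause ∃i[V[i] = arg] is read as ranging over 1..n.

```python
def search(V, n, arg):
    """Return the least index in 1..n with V[index] == arg; V[0] is unused.

    The initial assertion AI requires that arg occurs somewhere in V[1..n],
    which is what guarantees that the loop terminates.
    """
    assert n >= 1 and any(V[i] == arg for i in range(1, n + 1))   # AI
    index = 1
    while V[index] != arg:
        # A3: no entry of V[1..index] equals arg
        assert all(V[j] != arg for j in range(1, index + 1))
        index = index + 1
        # A2 (loop invariant): no entry of V[1..index-1] equals arg
        assert all(V[j] != arg for j in range(1, index))
    # AF: V[index] = arg and no earlier entry equals arg
    assert V[index] == arg and all(V[j] != arg for j in range(1, index))
    return index
```

The final assertion states more than "arg was found": it also records that index is the first position holding arg, which is exactly what the invariant A2 contributes.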

Section 2.1

1. (a) {0,1, 2, 3, 4}
(c) {George Washington}
2. (a) If the universe of discourse is I, then the set is
       {x | 0 < x ∧ x < 100}.
   (b) If the universe is I, then the set is
       {x | ∃y[x = 2y + 1]}.
4. A=G= $
= {x| x is¢ ,
even}, B
and = F =E
C = {1, 2, 3}.

Section 2.2

1. If he shaves himself, he will break his vow not to shave anyone who shaves himself.
Therefore he must find someone else to shave him. Since only a barber can shave
someone else and he is the town’s only barber, he must leave town to be shaved.
(a) If the assertion “heterological is heterological” is true, then heterological applies
to itself, and is therefore homological; thus the assertion is false and we have a
contradiction. On the other hand, if the assertion is false, then heterological is
not heterological, i.e., heterological does not apply to itself. It follows that
heterological is heterological, and therefore the assertion is true. This is another
contradiction. Therefore the assertion is neither true nor false.

Section 2.3

1. (b) ∅, {1}, {{2, 3}}, {1, {2, 3}}
   (d) ∅, {∅}
   (f) ∅, {{1, 2}}
∅ ⊂ A: Let A be an arbitrary set and x an arbitrary element of A. Then
   ¬(x ∈ ∅) ∨ (x ∈ A) is a tautology for any x.
Hence, by universal generalization
   ∀x[x ∈ ∅ ⇒ x ∈ A]
is true; therefore, by definition, ∅ ⊂ A. ∎
It is possible but not always true.
(a) False. Let A = ¢, B = {a}, and C = {g}.
They are both singleton sets, but one is an element of the other.
The single element of {2} is 2. The subsets of {2} are ∅ and {2}.
The single element of {{2}} is {2}. The subsets of {{2}} are ∅ and {{2}}.

Section 2.4

2. A ∪ B ∪ C = (A − (B ∪ C)) ∪ (B − C) ∪ C
3. A proof of part (b) of Theorem 2.4.1 can be obtained by replacing all occurrences
   of ∪ with ∩, and ∨ with ∧ in the proof of part (a). A proof of part (d) can be
   obtained from that of part (c) in the same way.
   (a) Assume C ⊂ A and C ⊂ B. Then
          ∀x[x ∈ C ⇒ x ∈ A] ∧ ∀x[x ∈ C ⇒ x ∈ B]
       is true. Since ∀ distributes over ∧, this is equivalent to
          ∀x[(x ∈ C ⇒ x ∈ A) ∧ (x ∈ C ⇒ x ∈ B)]
       which is equivalent to
          ∀x[x ∈ C ⇒ [x ∈ A ∧ x ∈ B]].
       Hence
          ∀x[x ∈ C ⇒ x ∈ A ∩ B],
       and therefore
          C ⊂ A ∩ B. ∎

8. (b) Let x be an arbitrary element. Then
          x ∈ A ∩ A ⇔ x ∈ A ∧ x ∈ A
                    ⇔ x ∈ A.
       Hence, ∀x[x ∈ A ∩ A ⇔ x ∈ A], so A ∩ A = A. ∎
   (h) We know that A ⊂ A and ∅ ⊂ B for any sets A and B. Hence, by part (f),
          A ∪ ∅ ⊂ A ∪ B.
       By part (c), A ∪ ∅ = A. Therefore A ⊂ A ∪ B. ∎
   (k) Assume A ⊂ B. Then by part (g), and since A ⊂ A,
          A ∩ A ⊂ A ∩ B.
       But by part (b), A ∩ A = A. Hence, A ⊂ A ∩ B. From part (i), A ∩ B ⊂ A.
       It follows that if A ⊂ B, then A ∩ B = A. ∎
   (n) Let x be arbitrary. Then
          x ∈ A ∪ (B − A) ⇔ x ∈ A ∨ x ∈ (B − A)
                          ⇔ x ∈ A ∨ (x ∈ B ∧ ¬(x ∈ A))
                          ⇔ (x ∈ A ∨ x ∈ B) ∧ (x ∈ A ∨ ¬(x ∈ A))
                          ⇔ (x ∈ A ∨ x ∈ B) ∧ 1
                          ⇔ x ∈ A ∨ x ∈ B
                          ⇔ x ∈ A ∪ B.
       Hence, A ∪ (B − A) = A ∪ B. ∎

   (A ∪ Ā = U.) We first note that Ā = U − A. Then applying Theorem 2.4.3n,
          A ∪ Ā = A ∪ (U − A) = A ∪ U = U. ∎
10. (a) From Theorem 2.4.3i, we know that A ∩ B ⊂ A. Hence, by Theorem 2.4.3j,
          A ∪ (A ∩ B) = A. ∎

    (c) A − B = {x | x ∈ A ∧ x ∉ B}
              = {x | x ∈ A ∧ 1 ∧ x ∉ B}
              = {x | x ∈ A ∧ x ∈ U ∧ x ∉ B}
              = {x | x ∈ A ∧ x ∈ U − B}
              = {x | x ∈ A ∧ x ∈ B̄}
              = A ∩ B̄. ∎
11. (a)
Us=6 QS= 4.
(c) ws = fa, 5}; £2 S= 9.
12. (b) (AN BHO O(AN BNC) =ANBN(ENC)
=(AN Bnd
=,

D is a disjoint collection of sets.



13. (a) Us =U —{x|aS[Se CA x € SJ.


OS= {x|VS[S ¢ C>x € SI.
Therefore,
as = {x[|WS[70S © C) V mi € S)}
={x|VS [Se CAxeé S}}
= {x|-7dS[S e CA x € S}
={x/dsS[Se CA xe S]}
=US I
14. (b) {∅, {{a, b}}, {{c}}, {{a, b}, {c}}}.
15. 𝒫(S_{n+1}) = 𝒫(S_n) ∪ {A ∪ {a_{n+1}} | A ⊂ S_n}.
    Note that each subset A of S_n corresponds to two subsets of S_{n+1}: A and A ∪ {a_{n+1}}.
    It follows that S_{n+1} has twice as many subsets as S_n. (In the next section, we will
    use this analysis to show that if S has n elements, then 𝒫(S) has 2^n elements.)
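
The doubling argument in problem 15 translates directly into an iterative construction of the power set: each new element keeps every existing subset and adds a copy of each with the new element included. The Python sketch below illustrates this; the function name power_set is introduced here, and the input is assumed to contain distinct, hashable elements.

```python
def power_set(elements):
    """Build the power set one element at a time, doubling the count at each step."""
    subsets = [frozenset()]                 # the power set of the empty set
    for a in elements:
        # Every subset A of S_n yields two subsets of S_{n+1}: A and A ∪ {a}.
        subsets = subsets + [s | {a} for s in subsets]
    return subsets
```

For a three-element input the result has 2 · 2 · 2 = 8 subsets, in line with the count of 2^n anticipated in the text.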

Section 2.5

1. (a) Basis: The digits 0, 1, 2, 3, 4, 5, 6, 7, 8 and 9 (i.e., all decimal digits) are in
       the set.
       Induction: If x is in the set and d is a decimal digit then xd is in the set.
       Extremal: An object is in the set if and only if it can be constructed from a
       finite number of applications of clauses 1 and 2.
   (c) Basis: 0 is in S.
       Induction:
       (i) If x ∈ S, then 1x ∈ S.
       (ii) If (x ∈ S ∧ x ≠ 0), then x0 ∈ S.
       (iii) If (x ∈ S ∧ y ∈ S ∧ x ≠ 0), then xy ∈ S.
       Extremal: as in part (a).
2. (a) procedure MULT(a, b):
         if b = 0 then return 0 else return MULT(a, b − 1) + a
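
The recursive procedure MULT mirrors an inductive definition of multiplication: a·0 = 0 is the basis, and a·b = a·(b − 1) + a is the induction step. A direct Python transcription is sketched below; the check that b is a nonnegative integer is an assumption added for the illustration, since the recursion would not terminate otherwise.

```python
def MULT(a, b):
    """Multiply a by a nonnegative integer b using only addition and recursion."""
    if not isinstance(b, int) or b < 0:
        # Guard added for this sketch: the recursion only reaches 0 from b >= 0.
        raise ValueError("b must be a nonnegative integer")
    if b == 0:
        return 0                      # basis: a * 0 = 0
    return MULT(a, b - 1) + a         # induction: a * b = a * (b - 1) + a
```

Each call reduces b by one, so MULT(a, b) makes b + 1 calls in total; for large b an iterative loop would be preferable in Python because of its recursion limit.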

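4. Basis: We must show
      (Σ_{i=1}^{1} i)² = Σ_{i=1}^{1} i³.
   But
      (Σ_{i=1}^{1} i)² = 1² = 1
   and
      Σ_{i=1}^{1} i³ = 1³ = 1.
   Hence, the assertion holds for n = 1.
   Induction: Assume the assertion holds for arbitrary n ≥ 1, i.e.,
      (Σ_{i=1}^{n} i)² = Σ_{i=1}^{n} i³.
   Then
      (Σ_{i=1}^{n+1} i)² = (Σ_{i=1}^{n} i + (n + 1))²
                         = (Σ_{i=1}^{n} i)² + 2(n + 1) Σ_{i=1}^{n} i + (n + 1)²
                         = Σ_{i=1}^{n} i³ + 2(n + 1)·n(n + 1)/2 + (n + 1)²
                         = Σ_{i=1}^{n} i³ + (n + 1)²(n + 1)
                         = Σ_{i=1}^{n+1} i³. ∎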
5. Let m be an arbitrary integer in N. We prove ∀n[(a^m)^n = a^{mn}] by induction.
   Basis: Suppose n = 0. Then by Definition 2.5.4, (a^m)^0 = 1 and a^{m·0} = a^0 = 1.
   This establishes the basis step.
   Induction: The induction hypothesis is “Assume the assertion holds for arbitrary
   n”, i.e., (a^m)^n = a^{mn}. Then
      (a^m)^{n+1} = (a^m)^n · a^m        by Definition 2.5.4
                  = a^{mn} · a^m         Induction hypothesis
                  = a^{mn+m}             Theorem 2.5.5
                  = a^{m(n+1)}           distributivity of multiplication. ∎

6. (b) Σ_{i=0}^{n} (2i + 1) = (n + 1)².
       Using the properties of summation and Theorem 2.5.3 we have
          Σ_{i=0}^{n} (2i + 1) = 2 Σ_{i=0}^{n} i + Σ_{i=0}^{n} 1 = 2·n(n + 1)/2 + (n + 1)
                               = n² + 2n + 1 = (n + 1)². ∎
       Note that a proof by induction is not required.
   (d) Basis: For n = 0, we have 1 + 2n = 1 and 3^n = 1. Therefore, 1 + 2n ≤ 3^n
       for n = 0.
       Induction: Assume 1 + 2n ≤ 3^n for arbitrary n. The inequality 1 ≤ 3^n holds
       for all n; hence
          2 ≤ 2·3^n,
       and therefore,
          3^n + 2 ≤ 3^n + 2·3^n = 3·3^n = 3^{n+1}.
       By the induction hypothesis, 1 + 2n ≤ 3^n,
       so
          1 + 2n + 2 ≤ 3^{n+1}
       and
          1 + 2(n + 1) ≤ 3^{n+1}. ∎
7. (a) Case 1: If r = 1, then r^i = 1 for all i ∈ N, and hence Σ_{i=0}^{n} r^i = (n + 1).
       Case 2: Suppose r ≠ 1. We prove the assertion by induction.
       Basis: For n = 0 we have
          (r^{0+1} − 1)/(r − 1) = (r − 1)/(r − 1) = 1 = Σ_{i=0}^{0} r^i.
       Therefore, the assertion is true for n = 0.
       Induction: Assume the assertion is true for arbitrary n. Then
          Σ_{i=0}^{n+1} r^i = Σ_{i=0}^{n} r^i + r^{n+1}
                            = (r^{n+1} − 1)/(r − 1) + r^{n+1}
                            = (r^{n+1} − 1)/(r − 1) + (r^{n+2} − r^{n+1})/(r − 1)
                            = (r^{n+2} − 1)/(r − 1). ∎
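
A proof by induction settles the geometric-sum formula for every n, but a quick numerical check is a useful guard against transcription errors. The Python sketch below compares the sum with the closed form (r^{n+1} − 1)/(r − 1), using exact rational arithmetic so that no rounding is involved. The function names geometric_sum and closed_form are introduced for this illustration, and the case r = 1 is handled separately, as in Case 1 of the answer.

```python
from fractions import Fraction


def geometric_sum(r, n):
    """Compute r^0 + r^1 + ... + r^n exactly."""
    r = Fraction(r)
    return sum(r ** i for i in range(n + 1))


def closed_form(r, n):
    """Closed form of the same sum: n + 1 if r = 1, else (r^(n+1) - 1)/(r - 1)."""
    r = Fraction(r)
    if r == 1:
        return Fraction(n + 1)
    return (r ** (n + 1) - 1) / (r - 1)
```

For instance, both functions give 31 for r = 2 and n = 4, since 1 + 2 + 4 + 8 + 16 = (2^5 − 1)/(2 − 1).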

Basis: The sum of the interior angles of a triangle is 180° = (3 − 2)180°. Hence,
the assertion is true for n = 3.
Induction: Assume the assertion holds for an arbitrary convex polygon with
n ≥ 3 sides, and consider a convex polygon C with n + 1 sides. The polygon C can
be divided into a triangle T and a polygon P of n sides by connecting two non-adjacent
vertices. The sum of the interior angles of C is equal to the sum of the interior
angles of P and T. Since P has n sides, we can apply the induction hypothesis to
conclude that the sum of the interior angles of P is (n − 2)180°. By the basis step,
the sum of the interior angles of T is 180°. Therefore, the sum of the interior angles
of C is
   (n − 2)180° + 180° = ((n + 1) − 2)180°.
This establishes the assertion for all n ≥ 3. ∎
10. The induction step of the proof is fallacious. In particular, if n = 1 or n = 2, it is
not true that the set S contains two nonequal subsets of n people which must overlap.

Section 2.6

1. (a) The empty set is a model of axioms (b) through (e). (This postulate plays the
       same role as the basis step in an inductive definition.)
   (b) The “infinite rooted binary tree” example of this section suffices as an example
       which satisfies all postulates but (b).
   (c) The set {0} where 0′ = 0 satisfies all the postulates but (c).
   (d) Let S = {0, 1, 2} where 0′ = 1, 1′ = 2, and 2′ = 1. Then S satisfies all postulates
       but (d).
(e) Let S = {0, x1, %2,---5 V1, Var-+ eh
where 0’ = x, and x} = Xi44 fori eé I4,
w=. forie I+.
Then S satisfies all postulates but (e).

2. (a) We show
          ∀p ∀q ∀r[(p + q) + r = p + (q + r)].
       Let p and q be arbitrary natural numbers. We establish
          ∀r[(p + q) + r = p + (q + r)]
       by induction.
       Basis: Let r = 0. Then by the basis step of the definition with m = p + q
       we have
          (p + q) + 0 = p + q.
       Also by the basis step with m = q we have
          p + (q + 0) = p + q.
       Hence
          (p + q) + r = p + (q + r)
       if r = 0.
       Induction: By the inductive step of the definition of addition,
          p + (q + r′) = p + (q + r)′
                       = (p + (q + r))′
                       = ((p + q) + r)′        (Induction Hypothesis)
                       = (p + q) + r′.
       Thus the assertion holds for all r ∈ N. ∎

Section 2.7
1. (a) A* = {Λ, a, aa}.
(e) B* = {(ab)^n | n ≥ 0} = {Λ, ab, abab, ...}.
2. (b) A^m A^n = A^(m+n) for all m, n ≥ 0.
Proof: Let m be an arbitrary integer. We show ∀n[A^m A^n = A^(m+n)] by induction
on n.
Basis: n = 0.
A^m A^0 = A^m {Λ}
= A^m
= A^(m+0)

Induction: Assume the assertion is true for arbitrary n ∈ N. Then,

A^m A^(n+1) = A^m (A^n A) (definition of A^n)
= (A^m A^n) A (Theorem 2.7.1c)
= A^(m+n) A (induction hypothesis)
= A^((m+n)+1) (definition of A^n)
= A^(m+(n+1)) (associativity of +)
Hence the assertion is true for n + 1, and we conclude
∀n[A^m A^n = A^(m+n)].
Since m was arbitrary, it follows by universal generalization that
∀m ∀n[A^m A^n = A^(m+n)]. ∎
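
The law of exponents for sets of strings can be tried out on small finite sets. The sketch below is only an illustration under stated assumptions: sets of Python strings stand in for languages, the empty string `""` stands in for Λ, and the helper names `concat`, `power` and `law_holds` are not from the text.

```python
def concat(x_set, y_set):
    """Set product XY = {xy | x in X and y in Y}."""
    return {x + y for x in x_set for y in y_set}


def power(a_set, n):
    """A^0 = {Λ} and A^(n+1) = A^n A, with Λ represented by the empty string."""
    result = {""}
    for _ in range(n):
        result = concat(result, a_set)
    return result


def law_holds(a_set, max_exp):
    """Check A^m A^n = A^(m+n) for all 0 <= m, n <= max_exp."""
    exponents = range(max_exp + 1)
    return all(
        concat(power(a_set, m), power(a_set, n)) == power(a_set, m + n)
        for m in exponents
        for n in exponents
    )
```

For example, `law_holds({"a", "bb"}, 3)` compares sixteen pairs of exponents.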

Let A = {a^i | i ∈ N} = A* and B = {a^i | i ∈ N ∧ i ≠ 2} = A* - {aa}. Then
A^2 = B^2 but A ≠ B.
(a) x ∈ A(∪_{i∈N} B_i) ⇔ x = yz for some y ∈ A and z ∈ ∪_{i∈N} B_i
⇔ x = yz for some y ∈ A and z ∈ B_k, for some k ∈ N
⇔ x ∈ AB_k, for some k ∈ N
⇔ x ∈ ∪_{i∈N} (AB_i).

(b) By Theorem 2.7.3a, A* = {Λ} ∪ A+. Hence A* = A+ if and only if {Λ} ⊂ A+;
i.e., Λ ∈ A^n for some n ∈ I+. If Λ ∈ A, then Λ ∈ A^1 and therefore A* = A+.
Conversely, if A* = A+, then Λ ∈ A^n for some n ∈ I+. But if Λ ∈ A^n it follows that
Λ ∈ A.
(c) We apply parts (a) and (b) of this problem. Since Λ ∈ A*,
(A*)+ = (A*)* = A*. ∎

(a) Counterexample: Let A = {a} and n = 2.


(e) True. By Theorem 2.7.3m, (A*B*)* = (A ∪ B)* = (B ∪ A)* = (B*A*)*.
The assertion is always true. We prove containment in both directions.
(i) (E_1 ∪ E_2 ∪ ... ∪ E_n)* ⊂ (E_1*E_2* ... E_n*)*
1. E_i ⊂ E_1*E_2* ... E_n* for all i (by induction on i, using Theorems 2.7.3d
and e)
2. (E_1 ∪ E_2 ∪ ... ∪ E_n) ⊂ E_1*E_2* ... E_n*
3. (E_1 ∪ E_2 ∪ ... ∪ E_n)* ⊂ (E_1*E_2* ... E_n*)* (2, Theorem 2.7.3f)
(ii) (E_1*E_2* ... E_n*)* ⊂ (E_1 ∪ E_2 ∪ ... ∪ E_n)*
1. E_i ⊂ (E_1 ∪ E_2 ∪ ... ∪ E_n) for all i
2. E_i* ⊂ (E_1 ∪ E_2 ∪ ... ∪ E_n)* (1, Theorem 2.7.3f)
3. E_1*E_2* ... E_n* ⊂ (E_1 ∪ E_2 ∪ ... ∪ E_n)* (2, Theorems 2.7.1d
and 2.7.3j)
4. (E_1*E_2* ... E_n*)* ⊂ (E_1 ∪ E_2 ∪ ... ∪ E_n)* (3, Theorem 2.7.3j). ∎

10. We establish that A*B is a solution by substituting A*B for X on the right and
showing that the remaining occurrence of X is equal to A*B.
X = A(A*B) ∪ B        Substitution
= (AA*)B ∪ B          Associativity
= A+B ∪ {Λ}B          2.7.3h and 2.7.1b
= (A+ ∪ {Λ})B         2.7.1f
= A*B
11. If X = XA ∪ B and Λ ∉ A, then X = BA* is the unique solution. The proof
is essentially the same as that of Theorem 2.7.4.
13. (a) (1) X_1 = AX_1 ∪ BX_2
(2) X_2 = (A ∪ B)X_1 ∪ BX_2 ∪ {Λ}
Since Λ ∉ A, solving equation 1 for X_1 using Theorem 2.7.4 yields
X_1 = A*BX_2.
Substituting this into equation 2,
X_2 = (A ∪ B)A*BX_2 ∪ BX_2 ∪ {Λ}
= ((A ∪ B)A*B ∪ B)X_2 ∪ {Λ}
Now Λ ∉ (A ∪ B)A*B ∪ B, so again applying Theorem 2.7.4,
X_2 = ((A ∪ B)A*B ∪ B)*{Λ}
= ((A ∪ B)A*B ∪ B)*
and therefore,
X_1 = A*B((A ∪ B)A*B ∪ B)*.

14. (a) {aa, ab, ba, bb}*{a, b}.


(b) {b}*{a}{b}*
(c) {a}{a, b}* ∪ {a, b}*{b}.
(d) {a, b}*{aaa}{a, b}*
(e) {a, b}*{bbab}{a, b}*

Section 3.1

1. (a) {<0>, <1>}
(e) {<0, 0>, <0, 1>, <0, 2>, <0, 3>, <0, 4>, <1, 1>, <2, 2>, <3, 3>, <4, 4>}

2. (a) A: <a, b, d, c>, length = 3
B: <a, c>, length = 1
<a, b, c>, length = 2.
(b) All nodes of A have indegree and outdegree of 1. All nodes of B have indegree
and outdegree of 3.
(c) A: <a>, <a, b, d, c, a>
B: <a>, <a, a>, <a, c, a>,
<a, b, a>, <a, b, c, a>, <a, c, b, a>.
3. (a) The graph is connected and consequently has one component.
(c) The graph is disconnected and has two components.
(f) The graph is strongly connected and consequently has one component.
5. (a) Basis: 0 ≥ 0
Induction: if y ≥ x then
y + 1 ≥ x and
y + 1 ≥ x + 1
Extremal: x ≥ y only if it can be shown by a finite number of applications of
clauses 1 and 2.
By the basis clause, 0 ≥ 0. By successive applications of the induction clause,
0 ≥ 0 ⇒ 1 ≥ 0 ⇒ 2 ≥ 1 ⇒ 3 ≥ 1.

6. (a) ∅, {<1>}, {<2>}, {<3>}, {<1>, <2>}, {<1>, <3>}, {<2>, <3>}, {<1>, <2>, <3>}
(b) 2^(3^2) = 2^9

8. (a) We wish to show
<a, b> = <c, d> ⇔ a = c ∧ b = d,
i.e., {{a}, {a, b}} = {{c}, {c, d}} ⇔ a = c ∧ b = d.
Proof by cases:
Case 1. If a = b, then <a, b> = <a, a> = {{a}}. Then <a, b> = <c, d> if and only
if <c, d> = {{a}}, in which case it follows that {c} = {a} and {c, d} = {a}; i.e.,
a = c = d.
Case 2. If a ≠ b, then <a, b> = {{a}, {a, b}}. Then <a, b> = <c, d> only if
{c} = {a} and {c, d} = {a, b}. But if {c} = {a}, then c = a, and therefore, since
a ≠ b, {c, d} = {a, d} = {a, b}; hence d = b. ∎
(b) Under the given definition the ordered triples <1, 2, 1> and <1, 1, 2> are equal,
but they are not equal according to Definition 3.1.1.
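
The set encoding of ordered pairs used in part (a) can be modelled with Python `frozenset` values. This is a sketch under assumptions: `pair` follows the encoding {{a}, {a, b}} from the answer, while `flat_triple` encodes a triple as {{a}, {a, b}, {a, b, c}}, which is only a guess at the "given definition" of part (b) since the problem statement is not reproduced here.

```python
def pair(a, b):
    """Ordered pair <a, b> encoded as the set {{a}, {a, b}}."""
    return frozenset({frozenset({a}), frozenset({a, b})})


def flat_triple(a, b, c):
    """Hypothetical triple encoding {{a}, {a, b}, {a, b, c}} (an assumption)."""
    return frozenset({frozenset({a}), frozenset({a, b}), frozenset({a, b, c})})


def pairs_behave(values):
    """Check <a, b> = <c, d> exactly when a = c and b = d, over the given values."""
    return all(
        (pair(a, b) == pair(c, d)) == (a == c and b == d)
        for a in values
        for b in values
        for c in values
        for d in values
    )
```

Under the assumed triple encoding, `flat_triple(1, 2, 1)` and `flat_triple(1, 1, 2)` both collapse to {{1}, {1, 2}}, which is the kind of collision part (b) describes.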

Section 3.2

(a) No; no node has indegree 0, and there exists a cycle.


(b) Yes; node a is the root.
(a) The root is a; the leaves are b, d and e; the height is 2. There are four proper
subtrees.

Suppose there is a directed path which is not simple from a node a to a node b in
the tree. Because the path is not simple, it contains a cycle of length ≥ 1. Since there
is a directed path from the root r to a there must be at least two distinct directed
paths from r to b, one of which contains a cycle and one of which does not. But
this contradicts Theorem 3.2.1. Hence, every directed path is a simple path. ∎

From Theorem 3.2.1, there is a directed path from the root r to a and from r to b.
It follows that there is at least one undirected path from a to b. Now suppose
<c_0, c_1, ..., c_m> and <d_0, d_1, ..., d_n> are distinct simple undirected paths from
a to b. Then a = c_0 = d_0 and b = c_m = d_n. Let i be the least integer such that
c_k = d_k for all k ≤ i, but c_{i+1} ≠ d_{i+1}. Note that since the paths are distinct, i exists
and 0 ≤ i ≤ m - 2. Let j be the least index such that j > i and c_j = d_r for some
r > i. Since c_m = d_n, j exists, j ≤ m, and either j ≠ i + 1 or r ≠ i + 1. By the choice
of j, there is no c_s, i < s < j, which is equal to any d_t, i < t < r. Hence the path
<c_i, c_{i+1}, ..., c_j, d_{r-1}, d_{r-2}, ..., d_i> is an undirected simple cycle of length greater
than 2, contradicting Theorem 3.2.2. Hence, if a ≠ b, then there is at most one simple
path from a to b. ∎
Basis: If n = 1, then the only node is the root. Since there are no loops on nodes
there are 0 = n - 1 arcs. Hence the assertion holds for trees with 1 node.
Induction: Suppose the assertion is true for all trees with n nodes, n ≥ 1. Let T
be a tree with n + 1 nodes. Then T has at least one node a with outdegree 0 and
indegree 1; a is a leaf. Consider the tree T' formed by deleting the node a and its
incident arc from the tree T. Then T' has n nodes and by the induction hypothesis,
T' has n - 1 arcs. But T has one more node and one more arc than T'; hence T
has n + 1 nodes and n arcs. This establishes the induction step and completes the
proof. ∎
The recursive procedure given below uses a procedure MEDIAN which returns
the median value of a finite set of integers. The median of a finite set of integers S
is the element x ∈ S such that either the number of elements of S less than x is
equal to the number of elements of S greater than x (if S has an odd number of
elements), or the number of elements less than x is one more than the number of
elements greater than x (if S has an even number of elements).
procedure CONSTRUCT_TREE(S):
comment: Construct a binary search tree whose node values are the elements of the set S.
if S = ∅ then return ∅
else
    begin
    m ← MEDIAN(S);
    S1 ← {x | x ∈ S ∧ x < m};
    S2 ← {x | x ∈ S ∧ x > m};
    construct the tree T such that
        (a) the root r of T is labelled m
        (b) the left subtree of r is CONSTRUCT_TREE(S1)
        (c) the right subtree of r is CONSTRUCT_TREE(S2);
    return T
    end
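
A runnable rendering of CONSTRUCT_TREE might look like the Python sketch below. The tuple representation `(value, left, right)`, the use of `None` for the empty tree, and the helper `inorder` are assumptions made for illustration; `median` follows the definition given in the answer.

```python
def median(values):
    """Median as defined above: for an even count, the upper middle element."""
    ordered = sorted(values)
    return ordered[len(ordered) // 2]


def construct_tree(s):
    """Build a binary search tree on the set s as nested (value, left, right) tuples."""
    if not s:
        return None  # the empty tree
    m = median(s)
    smaller = {x for x in s if x < m}
    larger = {x for x in s if x > m}
    return (m, construct_tree(smaller), construct_tree(larger))


def inorder(tree):
    """List the node values of a tree in inorder."""
    if tree is None:
        return []
    value, left, right = tree
    return inorder(left) + [value] + inorder(right)
```

An inorder listing of the result returns the elements in increasing order, which is the defining property of a binary search tree.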

10. We first show that the bounds are attainable. Consider a tree in which each interior
node has a single descendant. Then the tree has a single leaf, and for each integer
d such that 0 ≤ d ≤ h, there is a single node a distance d from the root. In such a
tree, n = h + 1, so the lower bound is attainable.
Now consider a tree in which each internal node has two descendants, and all
leaves are a distance h from the root. Then the number of nodes in the tree is

Σ_{i=0}^{h} 2^i = 2^(h+1) - 1,

so the upper bound is also attainable.

We now show that h + 1 ≤ n ≤ 2^(h+1) - 1. Let T be a binary tree of height h,
and let k_d be the number of nodes of T which are a distance d from the root of T.
Then 1 ≤ k_d ≤ 2^d, and n = Σ_{d=0}^{h} k_d. Therefore

h + 1 = Σ_{d=0}^{h} 1 ≤ n ≤ Σ_{d=0}^{h} 2^d = 2^(h+1) - 1.

13. (a) Preorder: ABDEHKCFIJLMG


(b) Inorder: DBKHEAIFLJMCG
(c) Postorder: DKHEBILMJFGCA
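
The three listings can be cross-checked mechanically. The tree in the sketch below is not given in this answer section; it is inferred from the preorder and postorder listings and should be read as an assumption, as should the names `TREE`, `preorder`, `inorder` and `postorder`.

```python
# Inferred tree (an assumption): each node maps to (left child, right child).
TREE = {
    "A": ("B", "C"),
    "B": ("D", "E"),
    "C": ("F", "G"),
    "E": ("H", None),
    "F": ("I", "J"),
    "H": ("K", None),
    "J": ("L", "M"),
}


def children(node):
    """Return the (left, right) children of a node; leaves have none."""
    return TREE.get(node, (None, None))


def preorder(node):
    if node is None:
        return ""
    left, right = children(node)
    return node + preorder(left) + preorder(right)


def inorder(node):
    if node is None:
        return ""
    left, right = children(node)
    return inorder(left) + node + inorder(right)


def postorder(node):
    if node is None:
        return ""
    left, right = children(node)
    return postorder(left) + postorder(right) + node
```

With this tree, `preorder("A")` and `postorder("A")` reproduce the listings in parts (a) and (c), and the inorder traversal comes out as DBKHEAIFLJMCG.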

19. (a) The height of T_2 is one greater than the height of T_1.
(b) No more than h_1 records are examined in a search in T_1, and no more than
h_1 + 1 records are examined in a search in T_2.

21. (a) Since T is complete, every node has either no sons or two sons. If the root has
no sons then n = 1 and d_1 = 0. Hence

Σ_{i=1}^{1} 2^(-d_i) = 2^0 = 1.

Now assume the assertion is true for all complete binary trees with n leaves,
n ∈ I+, and let T' be a complete binary tree with n + 1 leaves. Then T' can be
constructed from some complete binary tree T with leaves b_1, b_2, ..., b_n by
adding two sons to a leaf b_k of T, 1 ≤ k ≤ n. Let these leaves be b'_k and b'_{k+1},
and associate the remaining leaves of T with those of T' in the natural way:
if 1 ≤ m < k, then b_m corresponds to b'_m, and if k + 1 ≤ m ≤ n, then b_m
corresponds to b'_{m+1}. Then for 1 ≤ m < k, d'_m = d_m, and for k + 1 ≤ m ≤ n,
d'_{m+1} = d_m. Moreover, by construction, d'_k = d'_{k+1} = d_k + 1. Hence

Σ_{i=1}^{n+1} 2^(-d'_i) = Σ_{i=1}^{k-1} 2^(-d_i) + 2·2^(-(d_k + 1)) + Σ_{i=k+1}^{n} 2^(-d_i)

= Σ_{i=1}^{n} 2^(-d_i)

= 1 by the induction hypothesis. ∎

(b) Suppose we begin at the root of a tree T and follow a path from the root to a
leaf. If at each node in the path it is equally likely that we turn left or right,
then the probability of travelling any particular path of length m is 2^(-m), and
consequently the probability of reaching node b_i is 2^(-d_i). The sum of these
probabilities must be 1.
If T is a complete k-ary tree with n leaves b_1, b_2, ..., b_n and d_i is the
length of the path from the root to leaf b_i, 1 ≤ i ≤ n, then

Σ_{i=1}^{n} k^(-d_i) = 1.

(c) The height h of the tree is the maximum path length of the d_i; that is,
h = max {d_i}. The maximum number of leaves n of a binary tree of height h
is 2^h. Hence,
n ≤ 2^(max {d_i})

and therefore
log n ≤ max {d_i}.
Since max {d_i} is an integer,
⌈log n⌉ ≤ max {d_i}. ∎
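
The identity of part (a) and the bound of part (c) can be checked on a concrete complete binary tree. In this sketch a tree is represented, purely as an assumption for illustration, by `None` for a leaf and a pair `(left, right)` for a node with two sons; exact fractions avoid rounding.

```python
import math
from fractions import Fraction


def leaf_depths(tree, depth=0):
    """Return the path lengths d_i from the root to every leaf."""
    if tree is None:  # a leaf
        return [depth]
    left, right = tree
    return leaf_depths(left, depth + 1) + leaf_depths(right, depth + 1)


def leaf_sum(tree):
    """Compute the sum over all leaves of 2^(-d_i) exactly."""
    return sum(Fraction(1, 2 ** d) for d in leaf_depths(tree))


def height_bound_holds(tree):
    """Check ceil(log2 n) <= max d_i, where n is the number of leaves."""
    depths = leaf_depths(tree)
    return math.ceil(math.log2(len(depths))) <= max(depths)
```

For the tree `((None, None), (None, (None, None)))` the leaf depths are 2, 2, 2, 3, 3, and the sum 3/4 + 2/8 equals 1 as the answer asserts.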

Section 3.3

1. Reflexive Irreflexive Symmetric Antisymmetric Transitive


(a) NO NO NO YES YES
(b) YES NO YES NO YES
(c) NO YES YES NO NO
(d) NO NO NO YES NO

3.                ∅     I × I    =     <     ≤     D

Reflexive         N     Y        Y     N     Y     N
Irreflexive       Y     N        N     Y     N     N
Symmetric         Y     Y        Y     N     N     N
Antisymmetric     Y     N        Y     Y     Y     N
Transitive        Y     Y        Y     Y     Y     Y

(Note: D is not reflexive because 0/0 is not defined.
D is not antisymmetric because 1D(-1) and
(-1)D1.)

6.                Union    Intersection    Relative      Absolute
                                           Complement    Complement

Reflexive         Y        Y               N             N
Irreflexive       Y        Y               Y             N
Symmetric         Y        Y               Y             Y
Antisymmetric     N        Y               Y             N
Transitive        N        Y               N             N

7. (a) The relation is not irreflexive; all other properties hold.


8. (a) irreflexive, antisymmetric
(b) The relation {<a, b>, <b, c>, <c, a>} is irreflexive and antisymmetric but is not
a tree.

Section 3.4

1. RR, = {<a, c>, <a, dd} R,R, = {<e, d>}


Ri = {<a, a>, <a, b>, <a, a>} Ri = {<b, c>, <e, b>, <b, a>}
3. m=0,n = 15.
5. Both (a) and (b) are false assertions.
6. (a) R_1(R_2 ∩ R_3) ⊂ R_1R_2 ∩ R_1R_3
Proof: <x, y> ∈ R_1(R_2 ∩ R_3)
⇔ ∃z[<x, z> ∈ R_1 ∧ <z, y> ∈ R_2 ∩ R_3]
⇔ ∃z[<x, z> ∈ R_1 ∧ <z, y> ∈ R_2 ∧ <z, y> ∈ R_3]
⇔ ∃z[<x, z> ∈ R_1 ∧ <z, y> ∈ R_2 ∧ <x, z> ∈ R_1 ∧ <z, y> ∈ R_3]
⇒ <x, y> ∈ R_1R_2 ∧ <x, y> ∈ R_1R_3
⇒ <x, y> ∈ R_1R_2 ∩ R_1R_3. ∎
(b) Let A = {a}, B = {1, 2} and C = {c}, and define
R_1 = {<a, 1>, <a, 2>},
R_2 = {<1, c>},
R_3 = {<2, c>}.
Then R_1(R_2 ∩ R_3) = ∅, but R_1R_2 ∩ R_1R_3 = {<a, c>}.
A similar example can be constructed for part (d) of the theorem.
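
The counterexample of part (b) is small enough to evaluate directly. The sketch below assumes that relations are represented as Python sets of tuples and that R_1R_2 means "first R_1, then R_2", as in the proof of part (a); the function name `compose` is illustrative.

```python
def compose(r1, r2):
    """Composite relation R1R2 = {<x, y> | <x, z> in R1 and <z, y> in R2 for some z}."""
    return {(x, y) for (x, z1) in r1 for (z2, y) in r2 if z1 == z2}


R1 = {("a", 1), ("a", 2)}
R2 = {(1, "c")}
R3 = {(2, "c")}

left_side = compose(R1, R2 & R3)                 # R1(R2 ∩ R3)
right_side = compose(R1, R2) & compose(R1, R3)   # R1R2 ∩ R1R3
```

Here `left_side` is empty while `right_side` contains the single pair `("a", "c")`, so the containment of part (a) is proper for these relations.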
9. (a) True. Since R_1 and R_2 are reflexive, <x, x> ∈ R_1 and <x, x> ∈ R_2 for all
x ∈ A. Therefore <x, x> ∈ R_1R_2 for all x ∈ A. Hence R_1R_2 is reflexive.
(b) False. Let A = {a, b}, R_1 = {<a, b>} and R_2 = {<b, a>}. Then R_1R_2 = {<a, a>}
which is not an irreflexive relation on A.

Section 3.5

1. (b) r(R) is {<a, a>, <a, b>, <b, b>}.


s(R) is {<a, a>, <a, b>, <b, a>}.
t(R) = R.
2. Theorem 3.5.3(c): (R_1 ∩ R_2)^c = R_1^c ∩ R_2^c.
Proof: Let x and y be arbitrary elements of A and B respectively.
Then <x, y> ∈ (R_1 ∩ R_2)^c ⇔ <y, x> ∈ R_1 ∩ R_2
⇔ <y, x> ∈ R_1 ∧ <y, x> ∈ R_2
⇔ <x, y> ∈ R_1^c ∧ <x, y> ∈ R_2^c
⇔ <x, y> ∈ R_1^c ∩ R_2^c. ∎
Theorem 3.5.3(e): ∅^c = ∅.
Proof: ∅^c = {<x, y> | <y, x> ∈ ∅}.
But the predicate <y, x> ∈ ∅ is false for all <y, x>. Hence, ∅^c = ∅. ∎

Theorem 3.5.3(f): R_1 ⊂ R_2 ⇒ R_1^c ⊂ R_2^c.
Proof: Let x and y be arbitrary elements of A and B respectively and assume
R_1 ⊂ R_2. Then

<y, x> ∈ R_1^c ⇔ <x, y> ∈ R_1 ⇒ <x, y> ∈ R_2 ⇔ <y, x> ∈ R_2^c.

Hence R_1^c ⊂ R_2^c.


(a) If R_1 ⊃ R_2, then R_1 ∪ E ⊃ R_2 ∪ E. By Theorem 3.5.2, R ∪ E = r(R);
hence r(R_1) ⊃ r(R_2). ∎
(c) By Definition 3.5.1, t(R_1) ⊃ R_1 and t(R_1) is transitive. Since R_1 ⊃ R_2, it follows
that t(R_1) ⊃ R_2. By property (iii) of Definition 3.5.1, t(R_1) ⊃ t(R_2). ∎
(With minor modification, this proof can be used to establish 4a and 4b.)

(c) Since R_1 ∪ R_2 ⊃ R_1, it follows from part (ii) of Definition 3.5.1 that
t(R_1 ∪ R_2) ⊃ R_1. Since t(R_1 ∪ R_2) is transitive, by part (iii) of Definition
3.5.1, t(R_1 ∪ R_2) ⊃ t(R_1). Similarly, t(R_1 ∪ R_2) ⊃ t(R_2), and hence
t(R_1 ∪ R_2) ⊃ t(R_1) ∪ t(R_2). ∎
(a) By hypothesis, R is reflexive and therefore R ⊃ E. By definition, s(R) ⊃ R,
and hence s(R) ⊃ E. Therefore s(R) = s(R) ∪ E, which establishes that s(R)
is reflexive. A similar proof establishes that r(R) is reflexive. ∎
(c) Since R is transitive, R = t(R). To show r(R) is transitive, it suffices to show
that tr(R) = r(R) as follows:
tr(R) = t(R ∪ E)
= ∪_{i=1}^{∞} (R ∪ E)^i.
It is easy to show by induction that
(R ∪ E)^i = ∪_{j=0}^{i} R^j, where R^0 = E;
we leave this to the reader. Thus
tr(R) = ∪_{i=1}^{∞} ∪_{j=0}^{i} R^j
= E ∪ ∪_{i=1}^{∞} R^i
= E ∪ t(R)
= E ∪ R
= r(R). ∎
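
The closure operations discussed in these answers can be computed for small finite relations. The sketch below is an illustration under assumptions: relations are sets of tuples over an explicit universe, and the function names are not from the text. The transitive closure is obtained by repeatedly adding composite pairs until nothing changes, which terminates on a finite relation.

```python
def compose(r1, r2):
    """Composite relation: <x, y> whenever <x, z> in r1 and <z, y> in r2."""
    return {(x, y) for (x, z1) in r1 for (z2, y) in r2 if z1 == z2}


def reflexive_closure(r, universe):
    """r(R) = R ∪ E, where E is the equality relation on the universe."""
    return set(r) | {(x, x) for x in universe}


def symmetric_closure(r):
    """s(R) = R ∪ R^c, where R^c is the converse of R."""
    return set(r) | {(y, x) for (x, y) in r}


def transitive_closure(r):
    """t(R): keep adding composites until the relation stops growing."""
    closure = set(r)
    while True:
        extra = compose(closure, closure) - closure
        if not extra:
            return closure
        closure |= extra
```

For the transitive relation R = {<1, 2>, <2, 3>, <1, 3>} on {1, 2, 3}, applying `transitive_closure` to `reflexive_closure(R, {1, 2, 3})` returns the reflexive closure unchanged, in line with tr(R) = r(R).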

8. (a) The digraph of t(R) has two components, one the complete digraph on {a, b, c}
and the other the complete digraph on {d, e, f, g, h}.
10. (a) (R+)+ = t(t(R)). But t(R) is transitive and hence by Theorem 3.5.1, t(t(R)) =
t(R).

11. Yes. The procedures B, C and E are all recursive.

13. (a) M’+M”


(c) If k = 0, then the incidence matrix for R^k is the identity matrix I. If k > 0,
then the incidence matrix for R^k is M^k.
14. (a) In the following array, the entry in row i and column j is the probability that
the ith die beats the jth die.

(b) R = {<A, B>, <B, C>, <C, A>, <C, D>, <D, A>}.
(c) The transitive closure is the universal relation on the set {A, B, C, D}.
(d) In most games of this sort, the relation “is more likely to win than” is transitive.
But in this game, R ≠ t(R); it follows that the relation is not transitive. If the
relation were transitive, there would be a best die.
(e) If you wanted to make money, the proposed game would be a poor vehicle
because no matter which die you pick, your opponent can choose one which
will beat yours 2/3 of the time. Note that this would not be possible if the rela-
tion of part (b) were transitive.

Section 3.6

1.                               Quasi      Partially    Linearly    Well
                                 ordered    ordered      ordered     ordered

<N, <>
<N, ≤>
<I, ≤>
<R, ≤>
<P(N), proper containment>
<P(N), ⊂>
<P({a}), ⊂>
<P(∅), ⊂>

[The Y/N entries of this table are illegible in the source.]

2. (a) Since R is a quasi order, R is transitive and irreflexive (and hence,
antisymmetric). By Theorem 3.5.8(c), r(R) is transitive and by definition of reflexive
closure, r(R) is reflexive. It remains to show that r(R) is antisymmetric. We
first note that the antisymmetry condition on a relation T

(<x, y> ∈ T ∧ <y, x> ∈ T) ⇒ x = y

is logically equivalent to the condition
(x ≠ y ∧ <x, y> ∈ T) ⇒ <y, x> ∉ T.
Now suppose x ≠ y and <x, y> ∈ r(R). Then <x, y> ∈ R. But since R is
antisymmetric, <y, x> ∉ R and since x ≠ y, <y, x> ∉ E. Hence <y, x> ∉ r(R),
which establishes that r(R) is antisymmetric. Thus, if R is a quasi order, then
r(R) is a partial order. ∎

False. The antisymmetry condition fails for any pair of integers <x, -x>, where
x ≠ 0.
(a) Suppose R is a quasi order. Then R is irreflexive and transitive. Since
<x, x> ∉ R for any x, it follows that <x, x> ∉ R^c for any x. Hence R^c is
irreflexive. To show R^c is transitive, consider any <x, y> ∈ R^c and <y, z> ∈ R^c.
Then <y, x> ∈ R and <z, y> ∈ R, and by the transitivity of R, <z, x> ∈ R.
Hence <x, z> ∈ R^c, which establishes that R^c is transitive and therefore a quasi
order. ∎
All of the assertions are true.

(a) (only if) If R is a quasi order, then R is irreflexive and transitive. Since R is
transitive, by Theorem 3.5.1, R = t(R) = R+. Now suppose <x, y> ∈ R.
If <x, y> ∈ R^c, then <y, x> ∈ R, and since R is transitive, it follows that
<x, x> ∈ R, violating the irreflexivity of R. Thus, if <x, y> ∈ R, then
<x, y> ∉ R^c and hence R ∩ R^c = ∅.
(if) We must show that if R ∩ R^c = ∅ and R = R+, then R is irreflexive and
transitive. Clearly R must be irreflexive, since if <x, x> ∈ R, then <x, x> ∈ R^c
and R ∩ R^c ≠ ∅. Moreover, if R = R+, then R = t(R) and it follows that
R is transitive. ∎
11. The procedure PRODUCT has a single loop and will terminate if this loop is
traversed only a finite number of times. The loop will be traversed so long as i < a,
that is, so long as a - i > 0. Since both a and i are integer variables, this means that
the loop will be traversed if a - i is a member of the well-ordered set I+. By the initial
assumption, a ≥ 0, and the first assignment statement initializes i to 0. If a = 0,
then a - i ∉ I+ and the loop is not traversed at all. If a > 0, then the loop is
traversed causing i to be incremented and a - i to be decreased in value. Since I+
is well-ordered, the value of a - i will not be a member of I+ after a finite number
of executions of the loop, causing the while loop to terminate. ∎
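
The termination argument can be made concrete by tracking the quantity a - i as a loop variant. The procedure PRODUCT itself is not reproduced in this answer section, so the Python function below is a hypothetical reconstruction (multiplication by repeated addition) that only matches the features the answer mentions: i starts at 0, the loop runs while i < a, and i is incremented on each pass.

```python
def product(a, b):
    """Hypothetical PRODUCT: compute a * b by repeated addition, assuming a >= 0."""
    if a < 0:
        raise ValueError("the termination argument assumes a >= 0")
    i = 0
    total = 0
    while i < a:
        variant = a - i  # a member of I+ whenever the loop body is entered
        assert variant > 0
        total += b
        i += 1
        # The variant strictly decreases but never drops below 0.
        assert 0 <= a - i < variant
    return total
```

Because a - i is a positive integer on entry to the body and strictly decreases on every pass, the loop body can be executed at most a times.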

13. Proof: Suppose a and b are least upper bounds of B. Then by definition of lub,
a ≤ b and b ≤ a. Since ≤ is antisymmetric it follows that a = b. A similar proof
holds if a and b are glbs. ∎
14. (a) True.
(b) False. (Consider <1, -1> and <-1, 1>.)
(c) False. (Because T is not a linear order.)
(d) True.
(e) False. (T is not antisymmetric.)

17. (a) Let <S, ≤> be a poset, and let B be a finite subset of S. If B does not have a
minimal element, then for each x_i ∈ B we can find some x_{i+1} ∈ B such that
x_i > x_{i+1}, that is, x_i ≥ x_{i+1} and x_i ≠ x_{i+1}. It follows that we can construct
an infinite sequence of strictly decreasing values of B:
x_i > x_{i+1} > x_{i+2} > ...
But B is finite, so in any infinite sequence of values of B, some value must be
repeated. By antisymmetry, it follows that all intervening values in the sequence
must be equal, contradicting the condition that x_i ≠ x_{i+1}. Thus any strictly
decreasing sequence of elements of B must terminate, and it follows that B
must have a minimal element. The proof that B has a maximal element is
similar. ∎

Section 3.7

1. The conditions for R to be an equivalence relation on A can be expressed as follows:

(i) ∀x[x ∈ A ⇒ <x, x> ∈ R]
(ii) ∀x ∀y[(x, y ∈ A ∧ <x, y> ∈ R) ⇒ <y, x> ∈ R]
(iii) ∀x ∀y ∀z[(x, y, z ∈ A ∧ <x, y> ∈ R ∧ <y, z> ∈ R) ⇒ <x, z> ∈ R]
For each of these implications, the conclusion is true if R = A × A; hence, A × A
is an equivalence relation. The rank of A × A is 1.
(a) n^2. (The relation is A × A.)
(b) 1.
The union of two reflexive (symmetric) relations will be reflexive (symmetric), but
the union of two transitive relations is not necessarily transitive.
(a) Not reflexive or symmetric;
tsr(R) = I × I.
(b) Not symmetric;
tsr(R) = Ix
(c) Not reflexive;
tsr(R) = R ∪ {<0, 0>}.
The fallacy in the argument is the assumption that every element is related to some
other element by R. If this is not true, then the hypothesis of the symmetry condition
is always false, and therefore the conclusion is false. Thus, the void relation on a
nonempty set is vacuously symmetric and transitive but not reflexive, and the relation
{<a, a>, <a, b>, <b, a>, <b, b>} is symmetric and transitive but not reflexive on the set
{a, b, c}.
(only if) Suppose R_1 = R_2, and let a be an arbitrary element of A.
Then
[a]_{R_1} = {x | xR_1a} = {x | xR_2a} = [a]_{R_2}.
It follows that
{[a]_{R_1} | a ∈ A} = {[a]_{R_2} | a ∈ A}.

(if) Suppose {[a]_{R_1} | a ∈ A} = {[a]_{R_2} | a ∈ A}.
Then for each a, b ∈ A,
<a, b> ∈ R_1 ⇔ a ∈ [b]_{R_1} ⇔ a ∈ [b]_{R_2} ⇔ <a, b> ∈ R_2.
Hence R_1 = R_2. ∎

10. A/R = {[0], [1], [2], [3], [4], [5]}, where [k] = {y | y = 6i + k for some i ∈ I}.
11. (a) Maybe. (Yes, if π_1 = π_2; otherwise, no.)
(c) Maybe. (Yes, if π_1 ∩ π_2 = ∅; otherwise, no.)
12. (a) No. (Let A = {a} and R, = {<a, a>}.)
(c) Yes.
13. n.
15. By definition
<x, y> ∈ R_j ⇔ x - y = cj for some c ∈ I,
<x, y> ∈ R_k ⇔ x - y = dk for some d ∈ I.

(a) (only if) Suppose I/R_k refines I/R_j; then R_k ⊂ R_j. The pair <k, 0> ∈ R_k and
hence <k, 0> ∈ R_j; therefore
k - 0 = 1·k = cj for some c ∈ I.
It follows that k is an integral multiple of j.
(if) If k = rj for some r ∈ I, then
<x, y> ∈ R_k ⇒ (x - y) = ck for some c ∈ I
⇒ (x - y) = crj for some c, r ∈ I
⇒ <x, y> ∈ R_j;
hence R_k ⊂ R_j and therefore I/R_k refines I/R_j.
(b) Let d be the greatest common divisor of j and k, and let R_d denote equivalence
mod d. Then
I/R_d = I/R_j + I/R_k.

(c) Let m be the least common multiple of j and k, and let R_m denote equivalence
mod m. Then
I/R_m = I/R_j · I/R_k.
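
The criterion of part (a) can be spot-checked numerically. The sketch below tests, on a finite window of integers chosen as an assumption, whether congruence mod k always implies congruence mod j, and compares the outcome with the divisibility condition; the helper names are illustrative.

```python
from math import gcd


def same_block(x, y, k):
    """True when x and y are equivalent mod k, i.e. x - y is a multiple of k."""
    return (x - y) % k == 0


def refines(k, j, window=range(-30, 31)):
    """On the sample window: does x ≡ y (mod k) always force x ≡ y (mod j)?"""
    return all(
        same_block(x, y, j)
        for x in window
        for y in window
        if same_block(x, y, k)
    )


def lcm(j, k):
    """Least common multiple of two positive integers."""
    return j * k // gcd(j, k)
```

For moduli between 1 and 8, `refines(k, j)` agrees with `k % j == 0`. In the same spirit, the partition mod lcm(j, k) refines both I/R_j and I/R_k, and both of them refine the partition mod gcd(j, k), matching parts (b) and (c).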
16. (only if) Suppose π induces R and R induces a (possibly different) partition π'.
Let a be an arbitrary element of A, and let B and B' be blocks of π and π' respectively
such that a ∈ B and a ∈ B'.
Then for any b,
b ∈ B ⇔ aRb
⇔ [a]_R = [b]_R
⇔ b ∈ B'.

Hence, B = B'. Since the blocks of π and π' exhaust A, it follows that π = π'.
(if) Suppose R induces π and π induces a (possibly different) equivalence relation
R'. Then for any a, b ∈ A,
aRb ⇔ [a]_R = [b]_R
⇔ a, b ∈ [a]_R
⇔ ∃B[B ∈ π ∧ a ∈ B ∧ b ∈ B]
⇔ aR'b.
Hence, R = R'. ∎

19. Suppose π and π' are sum partitions of π_1 and π_2. Then by Definition 3.7.8, π and
π' refine each other. By Theorem 3.7.11, the relation “refines” is antisymmetric and
hence π = π'. ∎
20. (a) Part (i) of Definition 3.7.7 establishes that π_1·π_2 is a lower bound of the set
{π_1, π_2} under the relation “refines.” Part (ii) of the same definition asserts that
π_1·π_2 is the greatest lower bound. ∎

Section 4.1

1. (a) Function; f({a, b}) = {0}.
(b) Not a function; b has two images.
2. f(x) = g(x) = x + 1.
3. (a) There are ten such functions, consisting of the following disjoint classes.
(i) The constant functions f(x) = 0, f(x) = 1 and f(x) = 2.
(ii) The identity function f(x) = x.
(iii) The functions which map two elements to themselves and the remaining
element to one of those two. (For example, f(0) = 0, f(1) = 1 and
f(2) = 1.)
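
The count of ten can be confirmed by brute force. The problem statement is not reproduced in this answer section; the three classes listed are exactly the functions on {0, 1, 2} satisfying f(f(x)) = f(x), so the sketch below assumes that this is the property being asked about. The name `idempotent_functions` is illustrative.

```python
from itertools import product


def idempotent_functions(domain):
    """Enumerate all f: domain -> domain with f(f(x)) = f(x) for every x."""
    domain = list(domain)
    found = []
    for images in product(domain, repeat=len(domain)):
        f = dict(zip(domain, images))
        if all(f[f[x]] == f[x] for x in domain):
            found.append(f)
    return found
```

On the domain {0, 1, 2} this yields three constant functions, the identity, and six functions with exactly two fixed points.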
(a) Let y be an arbitrary string in Σ*. We prove
∀x[||xy|| = ||x|| + ||y||]
by induction based on the inductive definition of Σ*.
Basis: If x = Λ, then
||xy|| = ||Λy|| = ||y|| = 0 + ||y|| = ||Λ|| + ||y||.
Induction: Assume the assertion holds for an arbitrary x ∈ Σ*. Then, for
a ∈ Σ, consider the string ax:
||axy|| = 1 + ||xy||          Definition of length
= 1 + (||x|| + ||y||)         Induction Hypothesis
= (1 + ||x||) + ||y||         Associativity of +
= ||ax|| + ||y||              Definition of length.
Hence, it follows that
∀x[||xy|| = ||x|| + ||y||].
Since y was arbitrary, the result follows by Universal Generalization. ∎
f(m, n) = n^m. (We define 0^0 = 1.)
Proof: Let n be an arbitrary element of N.
Basis: If m = 0, then
f(0, n) = 1 = n^0.
Hence, the assertion holds for m = 0.
Induction: Assume that f(m, n) = n^m for an arbitrary m ∈ N.
Then
f(m + 1, n) = f(m, n)·n       Definition of f
= n^m · n                     Induction hypothesis
= n^(m+1)                     property of exponents.
Hence, the assertion is true for all m ∈ N. Since n was arbitrary the result follows
by Universal Generalization. ∎
9. The procedure SUM1 is a recursive algorithm:
procedure SUM1(m, n):
if n = 0 then return m
else return S(SUM1(m, P(n)))

The procedure SUM2 is an iterative algorithm:

procedure SUM2(m, n):
begin
    sum ← m;
    count ← n;
    while count > 0 do
        begin
        sum ← S(sum);
        count ← P(count)
        end;
    return sum
end
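
Both procedures translate directly into Python. In this sketch `S` and `P` are taken to be the successor and predecessor functions on the natural numbers, which is how the procedures use them; raising an error for P(0) is an added assumption, since neither procedure ever makes that call.

```python
def S(n):
    """Successor function."""
    return n + 1


def P(n):
    """Predecessor function, defined only for n > 0."""
    if n == 0:
        raise ValueError("0 has no predecessor")
    return n - 1


def sum1(m, n):
    """Recursive addition, following SUM1."""
    if n == 0:
        return m
    return S(sum1(m, P(n)))


def sum2(m, n):
    """Iterative addition, following SUM2."""
    total = m
    count = n
    while count > 0:
        total = S(total)
        count = P(count)
    return total
```

Both functions return m + n; the recursive version makes n nested calls, while the iterative version uses constant extra space.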

11. (a) f(99) = f(f(99 + 11))
= f(f(110))
= f(110 - 10)
= f(100)
= f(f(100 + 11))
= f(f(111))
= f(111 - 10)
= f(101)
= 101 - 10 = 91

(b) The proof is in two parts.
(i) We first show f(x) = 91 for all 90 ≤ x ≤ 100.
f(90) = f(f(101))
= f(91)
= f(f(102))
= f(92)
= f(f(103))
= f(93)
...
= f(99)
= 91 by part (a).
(ii) Now let x < 90 and let k be the smallest integer such that
90 ≤ x + 11k ≤ 100.
Then f(x) = f(f(x + 11))
= f(f(f(x + 2·11)))
...
= f^(k+1)(x + 11k)
where 90 ≤ x + 11k ≤ 100 and k ≥ 1. By part (i), it follows that
f(x + 11k) = 91; hence

f(x) = f^(k+1)(x + 11k)
= f^k(91) by part (i).
But by part (i), f(91) = 91; hence f^k(91) = 91 for all k ≥ 0. It follows
that
f(x) = f^k(91) = 91 for all x < 90. ∎
The argument 91 is called a fixed point of f because application of f to
this argument leaves it unchanged.
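
The result is easy to confirm by running the function. The definition of f is not restated in this answer section; the sketch below assumes the form implied by the computation in part (a), namely f(x) = x - 10 for x > 100 and f(x) = f(f(x + 11)) otherwise.

```python
def f(x):
    """Assumed definition, inferred from the worked computation in part (a)."""
    if x > 100:
        return x - 10
    return f(f(x + 11))
```

Under this assumption, f(x) evaluates to 91 for every integer x ≤ 100 that we try, and 91 is indeed a fixed point.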
12. (a) The function g∘g is defined on R - {0} and g∘g(x) = x. The image of
g∘g is R - {0}.

Section 4.2

1. (a) (i) f is bijective.


(ii) f(R) = R.
(iii) f-*({8}) = {8}.
(iv) The equality relation on R.
(v) f(x) = x.
(c) (i) f is injective.
(ii) SIN) = {x vw ly = x + I}.
(iii) f-MK2, DY) = ¢.
(iv) The equality relation.
(e) (i) f is surjective.
(ii) JM =N.
(iii) f, 0}) = {-1, 1, 0}.
(iv) xRy ⇔ |x| = |y|.
3. S = A^{0, 1, ..., n-1} and T = A^n = {<a_0, a_1, ..., a_{n-1}> | a_i ∈ A}. Define the map
g: S → T as follows:

g(f) = <f(0), f(1), ..., f(n - 1)>.

We show that g is injective. Suppose f_1 ≠ f_2, where f_1, f_2 ∈ S. Then for some k,
0 ≤ k < n, f_1(k) ≠ f_2(k). Then g(f_1) ≠ g(f_2); hence g is injective. Now consider
an n-tuple <a_0, a_1, ..., a_{n-1}> ∈ A^n. Let f be the function

f: {0, 1, 2, ..., n - 1} → A,
f(i) = a_i for 0 ≤ i < n;
then <a_0, a_1, ..., a_{n-1}> is the image of the function f under g. Hence g is surjective.
It follows that g is bijective. ∎
(a) ‘mon
Define f as follows:
f: A → P(A),
f(a) = {a} for all a ∈ A.
Then f is injective, since if
f(a) = f(b), then {a} = {b}, which implies a = b. ∎
(a) f(0) = a, f(1) = b, f(2) = c.
(b) f(x) = 2x, x ∈ (0, 1).
(c) f(n) = 2n for n ≥ 0
= -(2n + 1) for n < 0.
(a) Suppose g: A → B and f: B → C, and let c be an arbitrary element of C.
Since fg is surjective, there is some element a ∈ A such that fg(a) = c. But by
Theorem 4.1.1, fg(a) = f(g(a)), where g(a) ∈ B. Thus c is the image of an
element of B under f. Since c was arbitrary, it follows that f is surjective. ∎
10. (a) Since f and g are monotone increasing, if x ≤ y, then f(x) ≤ f(y), and
g(x) ≤ g(y). Hence, if x ≤ y then
(f + g)(x) = f(x) + g(x) ≤ f(y) + g(y) = (f + g)(y),
and it follows that f + g is monotone increasing. ∎
(c) Let f(x) = g(x) = x. Then f and g are monotone increasing, but the product
function f·g(x) = f(x)·g(x) = x^2 is not a monotone increasing function on
R. ∎
11. (a) Let y be an arbitrary element such that y ∈ f(A) - f(C). Then f(x) = y for
some x ∈ A, but for every z ∈ C, y ≠ f(z). Hence x ∈ A - C, and since
y = f(x), this implies that y ∈ f(A - C). Since y was arbitrary, this establishes
that f(A) - f(C) ⊂ f(A - C). ∎
12. (a) Suppose y ∈ f(f^(-1)(B')). Then there is an x in f^(-1)(B') such that f(x) = y. Since
x ∈ f^(-1)(B'), it follows that f(x) ∈ B'. Hence y ∈ B'; therefore f(f^(-1)(B')) ⊂ B'.
(b) By part (a), f(f^(-1)(B')) ⊂ B'. Suppose y ∈ B'. Since f is surjective, there is an
x ∈ f^(-1)(B') such that f(x) = y. Since x ∈ f^(-1)(B'), it follows that f(x) is in
f(f^(-1)(B')). Hence, y ∈ f(f^(-1)(B')); therefore B' ⊂ f(f^(-1)(B')).
14. By Theorem 4.2.4, since f is bijective, f^(-1) is bijective. Hence (f^(-1))^(-1) is defined and
equal to the converse relation of f^(-1). But f^(-1) is the converse of f, so by Theorem
3.5.3a, (f^(-1))^(-1) = f. ∎
17. The relation R is the equality relation E on A.
19. (a) Let x_1, x_2 ∈ A' and suppose x_1 ≠ x_2. Since f is injective, f(x_1) ≠ f(x_2).
But f|A'(x_1) = f(x_1) and f|A'(x_2) = f(x_2). Hence, f|A'(x_1) ≠ f|A'(x_2). It follows
that f|A' is injective. ∎
20. (a) Suppose x ∈ A - B. Then x ∈ A and x ∉ B. Hence χ_A(x) = 1 and χ_B(x) = 0.
In this case χ_A(x)[1 - χ_B(x)] = 1 = χ_{A-B}(x). Now suppose x ∉ A - B; then
x is in the complement of A - B, which is the union of B with the complement of A.
It follows that χ_A(x) = 0 or χ_B(x) = 1, and therefore
either χ_A(x) = 0 or 1 - χ_B(x) = 0. Hence χ_A(x)[1 - χ_B(x)] = 0 = χ_{A-B}(x).

21. (a) The function has one left and one right inverse and they are both equal to the
inverse function. The equivalence relation induced by the function is the equality
relation. The canonical map g is defined by g(x) = {x}.
(b) Since the function is neither injective nor surjective, it has no left or right
inverses. The equivalence relation induced by the function is the universal
relation. The canonical map g is defined by
g(x) = {a, b, c} for x ∈ {a, b, c}.

Section 5.1

1. (a) 3^4 = 81.
(c) The c can occur in any of the last three positions in the string. Once the
position of c is specified, either of two letters can occur in each of the other three
positions. Thus there are 3·2^3 = 24 such strings.
A binary relation from A to B is a subset of A × B. There are 2^|A × B| = 2^(|A|·|B|) = 2^(mn)
such subsets.
There are 16 binary sequences of length 4. Representation for the sequence of digits
0, 1, 2, ..., 9 can be chosen in P(16, 10) ways.
(a) (1/2)·2^n = 2^(n-1). This can be proved by induction.
(b) 2^(n-1).

Exactly 1 head: (5 choose 1) = 5.

Exactly 2 heads: (5 choose 2) = 10.

Exactly r heads in n flips: (n choose r).

11. Let |A| = m, |B| = n, and let f: A → B be a bijection. Note that f is an injection from
A to B. Then by the pigeonhole principle, m ≤ n, for if m > n, no injection from
A to B exists. Since f is a bijection, the inverse function f^(-1) exists and is an injection
from B to A; this implies, by the pigeonhole principle, that n ≤ m. Hence m = n,
i.e., |A| = |B|. ∎
12. (a) If B ⊂ A, then A ∪ B = A; hence |A ∪ B| = |A|.
(b) It is easy to show that A = (A - B) ∪ (A ∩ B) and that A - B and A ∩ B
are disjoint. It follows from Theorem 5.1.3 that |A| = |A - B| + |A ∩ B|. ∎

15. Basis: If n = 0, then Σ_{r=0}^{0} (0 choose r) = (0 choose 0) = 1 = 2^0.

Induction: Suppose the assertion holds for some arbitrary n ≥ 0; we now show this
implies that the assertion holds for n + 1.

Σ_{r=0}^{n+1} (n+1 choose r) = Σ_{r=0}^{n+1} [(n choose r-1) + (n choose r)]     by Problem 14

= Σ_{r=0}^{n+1} (n choose r-1) + Σ_{r=0}^{n+1} (n choose r).

Next we change variables by letting k = r - 1 in the first sum. Since (n choose -1) = 0,
we can change the lower limit of the resulting sum from k = -1 to k = 0. Moreover,
since (n choose n+1) = 0, we can change the upper limit of the second sum from
n + 1 to n without affecting the value of the sum. Hence,

Σ_{r=0}^{n+1} (n+1 choose r) = Σ_{k=0}^{n} (n choose k) + Σ_{r=0}^{n} (n choose r)
= 2^n + 2^n by the induction hypothesis
= 2^(n+1). ∎
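
Both the identity and the recurrence used in the induction step can be spot-checked with the standard library function `math.comb` (available from Python 3.8). The helper names below are illustrative, and the check covers only small n.

```python
from math import comb


def row_sum(n):
    """Sum of (n choose r) for r = 0, ..., n."""
    return sum(comb(n, r) for r in range(n + 1))


def recurrence_holds(n, r):
    """Check (n+1 choose r) = (n choose r-1) + (n choose r), with (n choose -1) = 0."""
    lower = comb(n, r - 1) if r >= 1 else 0
    # math.comb(n, r) already returns 0 when r > n, matching (n choose n+1) = 0.
    return comb(n + 1, r) == lower + comb(n, r)
```

For instance, `row_sum(5)` is 1 + 5 + 10 + 10 + 5 + 1 = 32 = 2^5.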

16. (a) The equality follows directly from Theorem 2.5.3(a).
(b) For any base b, positional notation represents b by 10 and b^n by 10^n = 100...0
(a 1 followed by n zeros).
Therefore, in base b notation, (b - 1)[1 + 10 + 10^2 + ... + 10^n] = 10^(n+1) - 1.
If b = 6 and n = 3, then (in base 6 positional notation),
5[1 + 10 + 10^2 + 10^3] = 5555 = 10000 - 1.
18. There are 2” distinct bit patterns. Since +0 and —0 are distinct representations
of the integer 0, only 2? — 1 distinct integers can be represented.
(b) We first count the number of nonzero real numbers that can be represented.
The leading digit of the mantissa on a nonzero number must be a 1. The other
m — 1 bits of the mantissa (including its sign bit) can each be chosen in one of
two ways; hence there are 2”~! choices for the mantissa of a nonzero number.
By part (a), the exponent can represent any of 2* — 1 distinct values. Since
distinct pairs of mantissas and exponents denote distinct real numbers, the
rule of product applies. Thus there are (2* — 1)(2"7~!) distinct nonzero repre-
sentable real numbers, and therefore one can represent (2* — 1)(2"7!) +1
distinct real numbers.
(c) We first restrict ourselves to the nonnegative integers. Every integer n, where
0 ≤ n < 2^23, can be represented by choosing an appropriate mantissa and
exponent. Above this range, not every integer can be represented (e.g., 2^23 + 1
is not representable). But all such integers greater than 2^23 require an exponent
with a value greater than or equal to 24, and every configuration with an
exponent this large represents an integer. Hence there are about 2^23 (2^7 − 24)
integers greater than 2^23 which are representable, making a total of
2^23 + 2^30 − 24·2^23
or
2^30 − 23·2^23
positive integers. Taking negative integers into account gives a total of about
2(2^30 − 23·2^23) ≈ 1.76·10^9
distinct representable integers.
(d) About 2^24 integers can be represented in integer notation. Using the results
of part (c), the ratio is about 1 to 100.
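The arithmetic in parts (c) and (d) can be reproduced directly. This sketch assumes the word layout implied by the numbers in the answer (23 magnitude bits in the mantissa, usable exponents from 24 up to 2^7 − 1); the function name is illustrative.

```python
def representable_integer_counts():
    """Reproduce the counts used in parts (c) and (d) of the answer."""
    small = 2**23                      # every integer n with 0 <= n < 2^23
    large = 2**23 * (2**7 - 24)        # configurations with exponent >= 24
    one_sign = small + large           # equals 2^30 - 23 * 2^23
    both_signs = 2 * one_sign          # take negative integers into account
    fixed_point = 2**24                # approximate count in integer notation
    return one_sign, both_signs, both_signs / fixed_point


one_sign, both_signs, ratio = representable_integer_counts()
print(one_sign, both_signs, ratio)
```

The ratio comes out near 100, which matches the "about 1 to 100" estimate in part (d).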

Section 5.2

1. The proof of part (a) of Theorem 5.2.1, given in the text, establishes that the
relation “g asymptotically dominates f” is reflexive and transitive. It follows
that the relation = is reflexive and transitive. Moreover, by the symmetry
of the roles of f and g in the definition of =, it follows that = is symmetric and
therefore an equivalence relation. ∎
372 ANSWERS TO SELECTED PROBLEMS

3. Let f@) = 2 if n is even,


= 0 otherwise;
e(n) = 9 if n is even,
= A otherwise.
5. (a) f(n) ≤ 2g(n) for all n, and g(n) ≤ 2f(n) for all n. Hence f and g asymptotically
dominate each other, i.e., O(f) = O(g).
(b) Neither function asymptotically dominates the other.
(c) The function f is asymptotically dominated by g but g is not asymptotically
dominated by f; i.e.,
O(f) ⊆ O(g) but O(f) ≠ O(g).
(a) ∀k_{k≥0} ∀m_{m≥0} ∃n_{n∈N}[n ≥ k ∧ |g(n)| > m|f(n)|] or
∀k[k ≥ 0 ⇒ ∀m[m ≥ 0 ⇒ ∃n[n ∈ N ∧ n ≥ k ∧ |g(n)| > m|f(n)|]]]
(b) The universal quantifiers in the first expression of (a) can be interchanged.
Therefore, for any fixed nonnegative value of m, the following assertion holds:
∀k_{k≥0} ∃n_{n∈N}[n ≥ k ∧ |g(n)| > m|f(n)|].
Choose an arbitrary k_1 ∈ N. It follows by the above assertion that there
exists n_1 ≥ k_1 such that |g(n_1)| > m|f(n_1)|. Let k_2 = n_1 + 1. Again applying
the above assertion, there is an n_2 ≥ k_2 such that |g(n_2)| > m|f(n_2)|. Let
k_3 = n_2 + 1. Then there is an n_3 ≥ k_3 such that |g(n_3)| > m|f(n_3)|. Continuing
in this manner, we can construct an infinite set S = {n_1, n_2, n_3, ...} such
that |g(n_i)| > m|f(n_i)| for all n_i ∈ S. ∎
(c) The assertion is not true in general. Suppose f(n) = n for all n and
g(n) = n^2 for n even,
     = 0 for n odd.
Then f does not asymptotically dominate g but g(n) ≤ f(n) for all n which are
odd.
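The construction in part (b) is effectively a search procedure: repeatedly look for the next n at or beyond the current bound k, then move k past it. A minimal sketch follows. The function `witnesses`, its `search_limit` cutoff, and the sample functions are illustrative assumptions, not part of the text.

```python
from typing import Callable, List


def witnesses(f: Callable[[int], int], g: Callable[[int], int],
              m: int, count: int, search_limit: int = 10_000) -> List[int]:
    """Collect the first `count` values n_1 < n_2 < ... with |g(n_i)| > m|f(n_i)|."""
    found: List[int] = []
    k = 0                                   # current lower bound k_i
    while len(found) < count:
        n = k
        while n < k + search_limit and not (abs(g(n)) > m * abs(f(n))):
            n += 1
        if n >= k + search_limit:
            raise ValueError("no witness found within the search limit")
        found.append(n)                     # this is n_i
        k = n + 1                           # k_(i+1) = n_i + 1
    return found


# Illustrative functions: g exceeds m*f only on sufficiently large even arguments.
f = lambda n: n
g = lambda n: n * n if n % 2 == 0 else 0
print(witnesses(f, g, m=3, count=4))        # [4, 6, 8, 10]
```

A real proof needs the existence assertion for every k; the code only samples finitely many steps of the same construction.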

Corollary 5.2.3, part (a)

(i) (only if) Suppose f is O(g) and g is O(f). Then by Theorem 5.2.3,
O(f) ⊆ O(g) and O(g) ⊆ O(f); hence O(f) = O(g).
(ii) (if) Suppose O(f) = O(g). Then O(f) ⊆ O(g) and O(g) ⊆ O(f). Thus, by
Theorem 5.2.3, f is O(g) and g is O(f). ∎
Corollary 5.2.3, part (b)
Suppose f is O(g) and g is O(h). Then by Theorem 5.2.3, O(f) ⊆ O(g) and
O(g) ⊆ O(h). Thus O(f) ⊆ O(h). Hence by Theorem 5.2.3, f is O(h). ∎

10. (a) We show that log n ∉ O(1). Suppose to the contrary that log n ∈ O(1); then
there must be some k, m ≥ 0 such that if n ≥ k, then log n ≤ m·1 = m.
But if n > 2^m, then log n > m; thus log n is not asymptotically dominated by
g(n) = 1 and hence log n is not O(1). By Theorem 5.2.4, O(1) ⊆ O(log n) and
it follows from the above argument that the containment is proper. ∎
(c) We show that if d > 1, then d^n ∉ O(n^2). Suppose d^n is O(n^2). Then there exist
k, m ≥ 0 such that if n ≥ k, then d^n ≤ mn^2. Then for these values of n,
n log d ≤ log m + 2 log n, and for n > 1,

n / log n ≤ 2 / log d + log m / ((log n)(log d)).

But the ratio on the left grows arbitrarily large as n increases, whereas the
first summand on the right is a constant, and the second term decreases as n
increases. Thus the inequality can be violated by choosing n sufficiently large.
We conclude that d^n ∉ O(n^2). From this result and Theorem 5.2.4, we conclude
that the containment is proper. ∎
11. Let K and n be arbitrary positive integers such that K ≥ ⌈c⌉ and
n > max(c^K, K). Then
n! = n(n − 1)(n − 2)...(K + 1)K(K − 1)...2·1.
Since K ≥ ⌈c⌉,
(n − 1)(n − 2)...(K + 1)K ≥ c^(n−K).
Since n > c^K,
n! > c^K c^(n−K) = c^n.
Hence n! > c^n if n is sufficiently large, and therefore O(c^n) ⊆ O(n!).
To show the containment is proper, it suffices to show that for any m > 0,
the value of n can be chosen large enough that n! > mc^n. Without loss of
generality, we can assume m > 1. We showed above that if n is chosen large enough,
n! > (mc)^n. But for n ≥ 2, (mc)^n ≥ mc^n; hence n! > mc^n for n sufficiently
large. It follows that n! is not O(c^n). ∎
13. If P is a polynomial of degree k, then P(n) = a_0 + a_1 n + a_2 n^2 + ... + a_k n^k,
where a_k ≠ 0. By Theorems 5.2.5, 5.2.2(b), and 5.2.3, a_i n^i ∈ O(n^k) for each i,
0 ≤ i ≤ k. It follows from Theorem 5.2.2(c) that P is O(n^k). ∎
16. Algorithm F takes less time than G to execute if and only if 10 < n < 50.
17. hoho-hofhohohehohoh
18. The conjecture is true and can be proved by induction on k.
Basis: If k = 0, then

sum_{i=0}^{n} i^k = sum_{i=0}^{n} i^0 = n + 1 ∈ O(n).

Induction: We assume sum_{i=0}^{n} i^k ∈ O(n^(k+1)) for some arbitrary k. Then there exist
values M, K > 0 such that if n ≥ K, then

sum_{i=0}^{n} i^k ≤ M(n^(k+1)).

Since each index i in the sum is at most n, it follows that

sum_{i=0}^{n} i^(k+1) ≤ n · sum_{i=0}^{n} i^k ≤ M(n^(k+2)).

Hence sum_{i=0}^{n} i^(k+1) is O(n^(k+2)). ∎

Section 5.3

1. (a) The proof is by induction on n.

By substitution, y_0 = 2·3^0 = 2·1 = 2. We now assume (as our induction hypothesis)
that
y_n = 2·3^n.

Then y_{n+1} = 3·y_n = 3·2·3^n = 2·3^(n+1).
(a) The solution is x, = 1 + na, which is O(n).
(b) The value of x, is a + x bi, i 0<b <1, then x, is O(1). If 6 = 1, then

x, = a+ nand therefore x, is O(n). If b> 1, then x, = a+ ot = : ) and


therefore x, is O(b").
(a) Clearly x_0 = 1. Each line after the first one intersects each preceding line
exactly once, and so the nth line intersects (n − 1) preceding lines and hence
passes through n old regions. It divides each of these old regions into two new
regions. Thus,
x_n = x_{n−1} + n.

It is easy to show by induction on n that (n^2 + n + 2)/2 is a solution to this
recurrence system.
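The closed form can be spot-checked against the recurrence before attempting the induction. A minimal sketch, with illustrative function names:

```python
def regions_by_recurrence(n: int) -> int:
    """Iterate x_0 = 1, x_i = x_(i-1) + i up to i = n."""
    x = 1
    for i in range(1, n + 1):
        x += i          # the i-th line passes through i old regions
    return x


def regions_closed_form(n: int) -> int:
    """The claimed solution (n^2 + n + 2) / 2; the numerator is always even."""
    return (n * n + n + 2) // 2


print([regions_by_recurrence(n) for n in range(6)])   # [1, 2, 4, 7, 11, 16]
```

Agreement on a finite range is evidence, not proof; the induction on n is still what establishes the result.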

(a) Let x_h denote the minimum total path length of a complete n-ary tree of height
h. Each internal node of such a tree has n sons. The total path length for a tree
of height 0 is 0; thus,
x_0 = 0.

Suppose T is a complete n-ary tree of height h with minimum total path length.
Then a complete n-ary tree of height h + 1 of minimal total path length is
constructed by adding n sons to some node a of T where a is distance h from
the root of T. Then the path length from the root to each son of a is h + 1;
thus
x_{h+1} = x_h + n(h + 1), where h ≥ 0.

(b) x_h = n·h(h + 1)/2.

By Lemma 5.3.2a, if a = 1, then f(n) = c(log_b n + 1) for n ∈ S. By results analogous
to those of Section 5.2,
O(c log_b n) on S = O(log n) on S and O(c) on S = O(1) on S.
It follows that f is O(log n) on S. Similarly, if a ≠ 1, then

f(n) = c(a^(log_b n + 1) − 1)/(a − 1)

for n ∈ S; hence

f(n) = (ca/(a − 1)) n^(log_b a) − c/(a − 1).

Since c and a are constants, f is O(n^(log_b a)) on S. ∎


(a) (Proof of part (c) of Theorem 5.3.3.)
Let S = {n | n = b^k}. Since f is O(g) on S and g is O(n^d), it follows that f is
O(n^d) on S. Hence there exist numbers r ∈ N and K ∈ R+ such that if n ≥ r
and n = b^k, then f(n) ≤ Kn^d. Consider any m ∈ N sufficiently large that for
some k ∈ N,
r ≤ b^k ≤ m ≤ b^(k+1).
Because f is monotone increasing,

f(m) ≤ f(b^(k+1))

and therefore
f(m) ≤ K(b^(k+1))^d
     = Kb^(dk+d)
     = Kb^d (b^k)^d
     ≤ Kb^d (m)^d.
Therefore, f(m) ≤ Kb^d (m)^d if m is greater than a power of b which is at least
as great as r. It follows that f is O(n^d). ∎
11. (a) procedure MAX2(i, j):
    if i = j then return A[i]
    else
      begin
        comment: Divide A into two subarrays of approximately equal size.
        m ← ⌊(i + j)/2⌋;
        maxa ← MAX2(i, m);
        maxb ← MAX2(m + 1, j);
        if maxa > maxb then return maxa
        else return maxb
      end

(b) f(1) = 0
    f(n) = 2f(n/2) + 1 for n = 2^k where k ≥ 1.

(c) Suppose n = 2^k. Then by a proof similar to that for Lemma 5.3.2a it follows
that

f(n) = 2^k f(n/2^k) + sum_{i=0}^{k−1} 2^i.

But n/2^k = 1 and f(1) = 0. Hence

f(n) = sum_{i=0}^{k−1} 2^i = 2^k − 1 = n − 1.

(d) By part (c) the complexity function is O(n).
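The procedure MAX2 and the count f(n) = n − 1 can be tried out together. The sketch below is a Python transcription under the assumption of 0-based indexing; returning the comparison count alongside the maximum is an addition made for illustration.

```python
from typing import List, Tuple


def max2(a: List[int], i: int, j: int) -> Tuple[int, int]:
    """Return (maximum of a[i..j], number of element comparisons performed)."""
    if i == j:
        return a[i], 0
    m = (i + j) // 2                      # split into two halves of nearly equal size
    maxa, count_a = max2(a, i, m)
    maxb, count_b = max2(a, m + 1, j)
    largest = maxa if maxa > maxb else maxb
    return largest, count_a + count_b + 1


data = [3, 9, 2, 7, 5, 8, 1, 6]
print(max2(data, 0, len(data) - 1))       # (9, 7)
```

For n = 8 = 2^3 the count is 7 = n − 1, in agreement with part (c).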


13. (a) The entries of the array are the node values of the search tree. If the array
is A[i : j], where i ≤ j, then the root of the tree corresponds to A[m], where
m = ⌊(i + j)/2⌋. The node values of the left subtree of the root are contained in
A[i : m − 1], and those of the right subtree of the root are in A[m + 1 : j]. If
i < m, then the node value of the left son of the root is stored in
A[⌊(i + m − 1)/2⌋]. If i = m, then the root has no left son. If m < j, then the
node value of the right son of the root is stored in A[⌊(m + j + 1)/2⌋]. If m = j,
no right son exists.

(b) procedure ITBINSEARCH(arg, i, j):
    begin
      lo ← i;
      hi ← j;
      while lo ≤ hi do
        begin
          m ← ⌊(lo + hi)/2⌋;
          if A[m] = arg then return m
          else
            if A[m] < arg then lo ← m + 1
            else hi ← m − 1
        end;
      return “not found”
    end
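For readers who want to run the iterative search, here is a direct Python rendering. It assumes a 0-based sorted list and returns `None` in place of “not found”; both choices are illustrative.

```python
from typing import List, Optional


def it_bin_search(a: List[int], arg: int, i: int, j: int) -> Optional[int]:
    """Search the sorted slice a[i..j] for arg; return its index or None."""
    lo, hi = i, j
    while lo <= hi:
        m = (lo + hi) // 2        # root of the implicit search tree on a[lo..hi]
        if a[m] == arg:
            return m
        if a[m] < arg:
            lo = m + 1            # continue in the right subtree
        else:
            hi = m - 1            # continue in the left subtree
    return None


table = [2, 3, 5, 7, 11, 13, 17]
print(it_bin_search(table, 11, 0, len(table) - 1))   # 4
print(it_bin_search(table, 4, 0, len(table) - 1))    # None
```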

Section 5.4

1. By Theorem 5.4.1, at least 3 comparisons must be made to find the maximum of
4 elements. Thus the height of a binary decision tree for finding the maximum of 4
elements must be at least 3.
The algorithm is the same as the method described in this section for finding the top
two players in a sports tournament. We first index the objects of the set from 1 to
n. The first round of comparisons compares x_i with x_{i+1} for all odd i. The next
round compares the winners of the first round comparisons in a similar way. The
competition can be represented as a binary tree even if n is not a power of 2.
Let max1 denote the largest and max2 denote the second largest values in a
collection of n objects; note that max1 may be equal to max2. The competition tree
to find max1 will involve a total of n − 1 comparisons and will be of height ⌈log n⌉;
therefore, in the course of the competition, max1 will be compared with no more than
⌈log n⌉ elements. If one of these comparisons is a tie, then max2 = max1. Otherwise,
max2 must be one of the elements compared with max1, since max2 could not lose
to any other element of the set. Hence, to find max2 we need only find the largest
element of the ⌈log n⌉ elements which competed with max1; this will require
⌈log n⌉ − 1 comparisons. The total number of comparisons is therefore
n − 1 + (⌈log n⌉ − 1) = n + ⌈log n⌉ − 2.
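The tournament method can be sketched as follows. This is an illustrative implementation, not the book's procedure: it assumes base 2 logarithms, gives an unpaired contender a bye, and records for each contender the values it has beaten directly.

```python
from typing import List, Tuple


def top_two(values: List[int]) -> Tuple[int, int, int]:
    """Return (max1, max2, comparisons) using a knockout tournament."""
    if len(values) < 2:
        raise ValueError("need at least two values")
    comparisons = 0
    # Each contender carries the list of values it has beaten so far.
    contenders = [(v, []) for v in values]
    while len(contenders) > 1:
        winners = []
        for idx in range(0, len(contenders) - 1, 2):
            (x, beaten_x), (y, beaten_y) = contenders[idx], contenders[idx + 1]
            comparisons += 1
            if x >= y:
                beaten_x.append(y)
                winners.append((x, beaten_x))
            else:
                beaten_y.append(x)
                winners.append((y, beaten_y))
        if len(contenders) % 2 == 1:
            winners.append(contenders[-1])      # bye into the next round
        contenders = winners
    max1, beaten = contenders[0]
    # max2 is the largest among the elements that lost directly to max1.
    max2 = beaten[0]
    for v in beaten[1:]:
        comparisons += 1
        if v > max2:
            max2 = v
    return max1, max2, comparisons


print(top_two([5, 3, 9, 1, 7, 8, 2, 6]))        # (9, 8, 9)
```

For n = 8 the run above uses 9 = 8 + 3 − 2 comparisons, the bound derived in the answer.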
Let T be a balanced binary tree with n nodes. If n = 1, then h = 0 and the assertion
holds. Now suppose n > 1. Since T is balanced, T is complete by definition. Hence
there must be at least two and no more than 2^h leaves a distance h from the root.
Furthermore, there are exactly 2^h − 1 nodes which are no farther than distance
h − 1 from the root. It follows that
2^h − 1 + 2 ≤ n ≤ 2^(h+1) − 1
2^h + 1 ≤ n ≤ 2^(h+1) − 1
2^h < n < 2^(h+1)
h < log n < h + 1

Since h is an integer and log n lies properly between h and h + 1, it follows that
h = ⌊log n⌋. ∎
7. Let T be a balanced ternary search tree with n nodes and height h. Then T is complete
and

sum_{i=0}^{h−1} 3^i + 3 ≤ n ≤ sum_{i=0}^{h} 3^i

(3^h − 1)/2 + 3 ≤ n ≤ (3^(h+1) − 1)/2
3^h + 5 ≤ 2n ≤ 3^(h+1) − 1
3^h < 2n < 3^(h+1)
h < log_3 (2n) < h + 1
Hence, h = ⌊log_3 (2n)⌋, and it follows that the worst case complexity of a search
in a ternary search tree is O(log n).
10. Suppose an O(n) algorithm exists for constructing a binary search tree T from an
unsorted list of n elements. Then traversing the tree T in inorder (using the LIST
procedure of Fig. 3.2.3) produces the list in sorted order. Since the traversal
algorithm requires no comparisons between elements of the list, the entire sorting
procedure would require O(n) comparisons. But by Theorem 5.4.5, if f is the worst
case complexity function of an algorithm for sorting by comparisons, then
O(n log n) ⊆ O(f). Since O(n log n) ⊄ O(n), the supposition that a binary search
tree can be constructed in O(n) time leads to a contradiction of Theorem 5.4.5.
12. (a) procedure SEQSEARCH(arg, i, j):
    if arg = A[i] then return i
    else
      if i = j then return “not found”
      else return SEQSEARCH(arg, i + 1, j)
(b) procedure RECSORT(i, j):
    if i = j then return
    else
      begin
        comment: find minimum entry in list.
        min ← A[i];
        position ← i;
        for k ← i + 1 until j do
          if A[k] < min then
            begin
              min ← A[k];
              position ← k
            end;
        comment: interchange minimum with A[i].
        A[position] ← A[i];
        A[i] ← min;
        comment: sort remainder of the list.
        call RECSORT(i + 1, j)
      end
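A runnable counterpart of RECSORT may help in tracing the recursion. The sketch assumes 0-based indexing and sorts the list in place; the name `rec_sort` is illustrative.

```python
from typing import List


def rec_sort(a: List[int], i: int, j: int) -> None:
    """Recursive selection sort of a[i..j], in place."""
    if i >= j:
        return
    # Find the position of the minimum entry in a[i..j].
    position = i
    for k in range(i + 1, j + 1):
        if a[k] < a[position]:
            position = k
    # Interchange the minimum with a[i], then sort the remainder.
    a[i], a[position] = a[position], a[i]
    rec_sort(a, i + 1, j)


items = [4, 1, 3, 9, 2]
rec_sort(items, 0, len(items) - 1)
print(items)    # [1, 2, 3, 4, 9]
```

Each call performs j − i comparisons, so a list of n entries costs n(n − 1)/2 comparisons in total.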

Section 6.1
1. Assume [0, 1] is finite with cardinality n. Then for some n ∈ N, there is a bijection
f: {0, 1, ..., n − 1} to [0, 1]. We show there is a real number z ∈ [0, 1] such that
f(m) ≠ z for any m ∈ {0, 1, ..., n − 1}.
Suppose f(0) = x_0,
        f(1) = x_1,
        ...
        f(n − 1) = x_{n−1}.

Since f is an injection, all x_i's must be distinct. Order them by increasing value,

x_{i_0} < x_{i_1} < ... < x_{i_{n−1}},

and choose z = (x_{i_0} + x_{i_1})/2. Then z is not the image of any element in
{0, 1, ..., n − 1}. Hence f is not a surjection and therefore not a bijection. Thus,
[0, 1] is not finite and therefore it must be infinite. ∎

4. Suppose f: A → B is an injection and A is infinite. To show B is infinite we construct
an injection g: B → B such that g(B) is a proper subset of B.
Since f is injective from A to B, f is bijective from A to f(A); thus an inverse
function f^(-1) exists which is a bijection from f(A) to A. (Note that we are using f^(-1)
to denote a function from f(A) to A rather than from B to A.) Moreover, since A
is infinite, there is an injection h: A → A such that h(A) is a proper subset of A.
We define the function g: B → B as follows:
g(x) = x if x ∈ B − f(A),
g(x) = f∘h∘f^(-1)(x) if x ∈ f(A).
Then g is an injection from B to itself, and fhf^(-1)(f(A)) = fh(A). Since h(A) is
properly contained in A and f is an injection, fh(A) is properly contained in f(A). It
follows that g(B) ≠ B and hence B is infinite. ∎
5. Proof of Theorem 6.1.4(d): It suffices to construct an injection from A to A^B. Define

f: A → A^B as follows:
f(x) = g where g(b) = x for all b in B;
that is, f(x) is the constant function g: B → A such that g(b) = x. Clearly, f is an
injection. Since A is infinite, it follows that A^B is infinite by Theorem 6.1.3. ∎

6. (a) Infinite (there is no largest prime).


(b) Finite, G**! — 1)/2.
(c) Infinite.
(d) Finite, (k + 1)”.
(e) Infinite (there is no bound on the length of a statement).

Section 6.2

1. (a) Define the function f as follows:

f: N → Σ*,
f(n) = a^n.
Then f is a bijection from N to Σ*; hence |Σ*| = ℵ_0.

(c) For each n ∈ N, let n̄ denote the sequence of digits of the binary representation
of n in reverse order. Let ⟨w_0, w_1, w_2, ...⟩ be an enumeration without
repetitions of Σ*. Then define f: N → P({a, b}*) as follows:
f(n) = {w_i | the (i + 1)th digit of n̄ is 1, where i ≥ 0}.
For example, if the enumeration of Σ* is in standard order,
⟨Λ, a, b, aa, ab, ba, bb, aaa, aab, ...⟩
then
f(0) = ∅,
f(1) = {Λ},
f(2) = {a},
f(3) = {Λ, a},
f(4) = {b},
etc.
The function f is a bijection from N to the set of finite subsets of Σ*.
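The map from natural numbers to finite subsets can be computed by reading the binary digits of n from the low-order end. The sketch below is illustrative: it assumes the alphabet {a, b}, uses the empty Python string to stand for Λ, and the names `standard_order` and `subset_for` are not from the text.

```python
from itertools import count, product
from typing import Iterator, Set


def standard_order(alphabet: str = "ab") -> Iterator[str]:
    """Yield all words over the alphabet by length, then alphabetically."""
    for length in count(0):
        for letters in product(alphabet, repeat=length):
            yield "".join(letters)


def subset_for(n: int, alphabet: str = "ab") -> Set[str]:
    """f(n): the set of words w_i such that bit i of n (low-order first) is 1."""
    words = standard_order(alphabet)
    chosen: Set[str] = set()
    while n > 0:
        w = next(words)          # w_0, w_1, w_2, ... in standard order
        if n & 1:
            chosen.add(w)
        n >>= 1
    return chosen


for n in range(5):
    print(n, sorted(subset_for(n)))
```

Because every finite set of indices corresponds to exactly one binary numeral, distinct n give distinct subsets and every finite subset is reached.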
2. (b) Define f: [0, 1] → [0, 1) by
f(1) = 1/2,
f(1/2) = 1/3,
and in general
f(1/n) = 1/(n + 1) for every positive integer n,
f(x) = x for x ≠ 1/n.

Then f is a bijection from [0, 1] to [0, 1). Now let g: [0, 1) → [0, ∞) be defined
by g(x) = x/(1 − x). Then gf: [0, 1] → [0, ∞) is a bijection.
3. Let f_A be a bijection from [0, 1] to A, f_B be a bijection from [0, 1] to B, f_D be a bijection
from N to D, and f_E be a bijection from {0, 1, 2, ..., n − 1} to E.
(a) Let g_1: [0, 1/2) → [0, 1],
g_1(x) = 1/2^(n−2) if x = 1/2^n for n ≥ 2 where n ∈ N,
g_1(x) = 2x otherwise;

g_2: [1/2, 1] → [0, 1],
g_2(x) = 2x − 1.
Then g_1 and g_2 are bijections. Using these functions we can define the following
bijection from [0, 1] to A ∪ B:

h: [0, 1] → A ∪ B,
h(x) = f_A g_1(x) if x ∈ [0, 1/2),
h(x) = f_B g_2(x) if x ∈ [1/2, 1].

Since A and B are disjoint and h is a bijection from [0, 1] to A ∪ B,
|A ∪ B| = c.
(c) Let ⟨d_0, d_1, d_2, ...⟩ and ⟨e_0, e_1, ..., e_{n−1}⟩ be enumerations without repetitions
of D and E respectively. Define a function f as follows:
f: N → D × E,
f(k) = ⟨d_{⌊k/n⌋}, e_{k mod n}⟩.

Then f is a bijection and therefore |D × E| = ℵ_0.
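At the level of indices, the map k ↦ ⟨⌊k/n⌋, k mod n⟩ is ordinary integer division with remainder, and its inverse is ⟨i, j⟩ ↦ i·n + j. A minimal sketch with illustrative names:

```python
from typing import Tuple


def pair(k: int, n: int) -> Tuple[int, int]:
    """Index pair (i, j) of f(k) = <d_i, e_j>, where E has n elements."""
    return k // n, k % n


def unpair(i: int, j: int, n: int) -> int:
    """Inverse map: the unique k with pair(k, n) == (i, j)."""
    return i * n + j


n = 3
print([pair(k, n) for k in range(7)])
# [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2), (2, 0)]
```

The existence of a two-sided inverse is exactly what makes f a bijection.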


4. No such set exists, although we have not yet developed the tools necessary to show
this. Later we will show that the cardinal numbers can be ordered and that no cardinal
number is greater than every finite cardinal number and yet less than ℵ_0. Moreover,
we will show |S| < |P(S)| for every set S. It will follow that if S is finite, then P(S)
is finite, and if S is infinite, then |P(S)| > ℵ_0.
6. (a) Some numbers have two distinct representations in the conventional binary
representation; for example, .10000... = .01111.... We must make sure that
the nonunique representation does not invalidate the conclusion of the
diagonalization. Using the procedure described in the problem statement on the
matrix
.10000...
.10000...
.10000...
.10000...

would produce the number .011111..., which is different from every representation
in the list but denotes a number equal to the first item on the list.
7. The digits of y form an infinite string which has a left end but no right end. Reversing
the digits results in a string which has a right end but no left end, i.e., this string is
not a member of Σ*, where Σ is the set of decimal digits. Since only strings in Σ*
represent elements of N, the result of the diagonalization is not an element of N.

Section 6.3

1. The map f: A′ → A defined by

f(x) = x
is an injection. Therefore, by Definition 6.3.2, |A′| ≤ |A|. ∎
3. It suffices to show that an injection exists from B to A. Let g: A → B be a surjection.
Define f: B → A as follows. Let b ∈ B. Then f(b) = a where a is an arbitrary but
fixed element of g^(-1)({b}). Then f is an injection from B to A. Hence, |B| ≤ |A|. ∎
6. This assertion follows directly from the definition of |A|< |B| and Theorem 6.1.3.
(a) We first show that the order relation ≤ on S is a partial order.
(i) Let a be a cardinal number in S and let A be a set such that |A| = a.
The identity function on A, 1_A, is a bijection from A to A. It follows that
an injection exists from A to A and therefore |A| ≤ |A|. Hence, a ≤ a
for any a ∈ S, which establishes that the order relation ≤ is reflexive.
(ii) Let a and b be elements of S and suppose a ≤ b and b ≤ a. It follows from
Theorem 6.3.3 that a = b, and hence ≤ is antisymmetric.
(iii) Let a, b and c be elements of S and assume a ≤ b and b ≤ c. Let A, B
and C be sets with cardinalities a, b and c respectively. Since a ≤ b,
an injection f exists from A to B. Since b ≤ c, an injection g exists from
B to C. Let h be the composite function h = gf, where gf: A → C. Then,
by Theorem 4.2.1(b), h is injective and therefore a ≤ c. It follows that
≤ is transitive.
To show that ≤ is a linear order, we need to show that any two
elements of S are comparable, i.e., either a ≤ b or b ≤ a. By Theorem
6.3.2, for any a, b ∈ S, a < b, a = b, or b < a. By Definition 6.3.2, if
a < b, then a ≤ b; if a = b, then a ≤ b; and if b < a, then b ≤ a. Hence
a and b are comparable and therefore ≤ is a linear order. ∎
10. (a) |Q| = ℵ_0. We show this by noting that
Q = {0} ∪ Q+ ∪ Q−
where Q+ is the set of positive rationals and Q− is the set of negative rationals.
Clearly |Q+| = |Q−| and therefore Q is the union of three countable sets.
Hence, by Theorem 6.2.3, Q is countable, i.e., |Q| ≤ ℵ_0. Since there is an
injection from Q+ to Q and |Q+| = ℵ_0, it follows that ℵ_0 ≤ |Q|. Therefore, by
Theorem 6.3.3, |Q| = ℵ_0.
(b) |[0, 1] × [0, 1]| = c.
(i) The function f: [0, 1] → [0, 1] × [0, 1] defined by f(x) = ⟨x, 0⟩ is an
injection. Therefore, c = |[0, 1]| ≤ |[0, 1] × [0, 1]|.
(ii) Let x = .x_0 x_1 x_2 ... and y = .y_0 y_1 y_2 ... be the decimal expansions of
x, y ∈ [0, 1], where we choose a representation which does not terminate
in an infinite sequence of 9's. (Thus, .50000... is acceptable, but .4999...
is not. This ensures that each x ∈ [0, 1] will have a unique representation.)
Define g as follows:

g: [0, 1] × [0, 1] → [0, 1],
g(⟨x, y⟩) = z,
where z = .x_0 y_0 x_1 y_1 x_2 y_2 .... Then g is an injection, which shows that
|[0, 1] × [0, 1]| ≤ |[0, 1]| = c.
Hence, by Theorem 6.3.3, |[0, 1] × [0, 1]| = c. ∎
13. (a) The assertion is true. Let f be a bijection from A to B, and define

g: P(A) → P(B),
g(S) = f(S) for all S ⊆ A.
Then g is a bijection and the result follows. ∎
(b) The assertion is true. Let f be an injection from A to B, and g be an injection
from C to D. We construct an injection
h: A^C → B^D.
Let r ∈ A^C. Define h(r) ∈ B^D to be that function s: D → B such that if r(c) = a,
then s(g(c)) = f(a). The function s can be defined arbitrarily on D − g(C).
Then h is an injection, and the assertion follows. ∎

Section 6.4

1. (a) ℵ_0
(c) 0 if n = 0; ℵ_0 if n ≥ 1.
@ 0
2. (a) c
(b) ℵ_0
3. Let α, β, and δ be cardinal numbers of the sets A, B and C respectively and assume
A, B and C are pairwise disjoint. Then
α + β = |A ∪ B|
      = |B ∪ A|    by commutativity of set union
      = β + α,
so addition of cardinal numbers is commutative. Moreover,

α + (β + δ) = |A| + |B ∪ C|
            = |A ∪ (B ∪ C)|
            = |(A ∪ B) ∪ C|    by associativity of set union
            = |A ∪ B| + |C|
            = (α + β) + δ;
hence addition of cardinal numbers is associative. ∎
4. Although we have not proved it, the result of the operations of addition,
multiplication and exponentiation of cardinal numbers is independent of the sets chosen as
representatives for the cardinal numbers, i.e., if |A| = |B|, |C| = |D| and
A ∩ C = B ∩ D = ∅, then
|A| + |C| = |B| + |D|.
This is not the case with the operation of subtraction proposed in the problem. For
example, let A = B = C = N and let D be the set of even integers. Then |A| = |B|
and |C| = |D|, but |A − C| = 0 ≠ ℵ_0 = |B − D|.
5. (a) Let A, B, and D be sets such that |A| = a, |B| = b, |D| = d, and
A ∩ D = B ∩ D = ∅.
Since a ≤ b, there exists an injection f: A → B. Define g as follows:
g: A ∪ D → B ∪ D,
g(x) = f(x) if x ∈ A,
     = x    if x ∈ D.
Then g is an injection from A ∪ D to B ∪ D; hence |A ∪ D| ≤ |B ∪ D|.
Since A ∩ D = B ∩ D = ∅, it follows that a + d ≤ b + d. ∎
(b) Let a = n, b = n + 1, and d = ℵ_0. Then a < b but a + d = ℵ_0 = b + d.
8. The set {0, 1}^N has cardinality 2^ℵ_0. Since this is the set of characteristic functions of
subsets of N, it follows that |P(N)| = 2^ℵ_0. In (b) of the examples immediately preceding
Theorem 6.3.5, we showed |P(N)| = c; hence 2^ℵ_0 = c. In (c) of the same examples,
we showed |N^N| = c. But |N^N| = ℵ_0^ℵ_0. Since for every n ≥ 2, 2 ≤ n ≤ ℵ_0, it
follows from Theorem 6.4.8 that

c = 2^ℵ_0 ≤ n^ℵ_0 ≤ ℵ_0^ℵ_0 = c;

hence n^ℵ_0 = c if n ≥ 2. ∎

Section 7.1

1. Let 1 be a left identity; then for all x,

1 ∘ x = x.

By commutativity it follows that

1 ∘ x = x ∘ 1.

Combining these assertions we conclude

1 ∘ x = x ∘ 1 = x.

2. unary
+ — [x—y| max min |x|

(a) Y Y Y Y Y Y Y Y
(b) Y Y N Y Y Y N Y
(c) N N N Y Y Y N Y
(d) N N N N Y Y Y Y
(e) N N N N Y Y N N
(f) Y Y Y Y Y Y Y

4. Suppose 0_l is a left zero and 0_r is a right zero. Then

0_l = 0_l ∘ 0_r = 0_r. ∎

6. (a) This algebra is just a presentation of the integers {0, 1, 2, 3} under addition mod 4.
The operation is commutative and associative. The element a is an identity.
All elements have inverses (because the element a appears in every row of the
operation table.) No zero element exists (because no row (column) has entries
which are all equal to the row (column) label).

7. (a) a

ae

(d) a b

a a b
b b b

(f) a b

a aa
(h)    a  b

  a    a  b
  b    a  a

In this algebra, (ba)b ≠ b(ab).

Section 7.2

1. Let T_k = {x | x ∈ R and x ≤ k}. Then k is a zero element of ⟨T_k, max⟩, but no
identity element exists.
2. We must show that ⟨S_k, +⟩ is a subalgebra of ⟨I, +⟩. By definition of S_k it follows
that S_k ⊆ I. Furthermore, since k ≥ 0, the set S_k is closed under addition, i.e.,
if x ≥ k and y ≥ k, then x + y ≥ k;
therefore ⟨S_k, +⟩ is an algebra. It follows that ⟨S_k, +⟩ is a subalgebra of ⟨I, +⟩
and hence a subsemigroup of ⟨I, +⟩. ∎
5. Let ⟨T, ∘′, 1′⟩ be a subalgebra of a monoid ⟨S, ∘, 1⟩. Then T ⊆ S, 1′ = 1, and
a ∘′ b = a ∘ b for all a, b ∈ T. The operation ∘ is associative on S; hence the
operation ∘′ is associative on T since
(a ∘′ b) ∘′ c = (a ∘ b) ∘ c = a ∘ (b ∘ c) = a ∘′ (b ∘′ c).
Moreover, 1′ is an identity with respect to ∘′, since
1′ ∘′ x = 1 ∘ x = x.
Therefore ⟨T, ∘′, 1′⟩ is a monoid. ∎

8. t+ke{ 0 12 3 4 x
tal)
©
©

NOW
hb
WN&

a
&

9. Suppose A = ⟨T, ∘, ¯, 1⟩ is a group, and for some 0 ∈ T, 0 ∘ x = 0 for every
x ∈ T. Since A is a group, there is an element 0̄ such that
0 ∘ 0̄ = 1.
But 0 ∘ 0̄ = 0, hence 0 = 1; i.e., the identity of A is the element 0. Then for every
x ∈ T,
x = x ∘ 1 = x ∘ 0 = 0
and therefore T = {0}. ∎
10. Let A = ⟨S, ∘, ¯, 1⟩ be a group.
(a) We prove the contrapositive of the implication [x ≠ y] ⇒ [a ∘ x ≠ a ∘ y].
a ∘ x = a ∘ y ⇒ ā ∘ (a ∘ x) = ā ∘ (a ∘ y)
              ⇒ (ā ∘ a) ∘ x = (ā ∘ a) ∘ y
              ⇒ 1 ∘ x = 1 ∘ y
              ⇒ x = y.
In the same way, we can show that if x ∘ a = y ∘ a, then x = y. ∎
(b) By definition, a ∘ S = {a ∘ x | x ∈ S}. Since S is closed under ∘, a ∘ S ⊆ S.
Now suppose y is an arbitrary element of S. Then for some x ∈ S, namely
x = ā ∘ y,
a ∘ x = a ∘ (ā ∘ y) = (a ∘ ā) ∘ y = 1 ∘ y = y;
hence a ∘ S ⊇ S. Therefore a ∘ S = S. Similarly, one can show that
S = S ∘ a. ∎
(c) Let x be the inverse of ā; then
ā ∘ x = x ∘ ā = 1,
and
x = 1 ∘ x = (a ∘ ā) ∘ x = a ∘ (ā ∘ x) = a ∘ 1 = a. ∎
12. Variety Cardinality

a group 1
b semigroup 1
c semigroup 2
d group 4
e semigroup 3

13. (b) The algebra ⟨{R^n | n ∈ I+}, composition, R^k⟩ is a monoid if and only if
R^k R^j = R^j for all positive j. This holds if and only if R^k R = R. Thus a
necessary and sufficient condition is that there exist a k such that R^(k+1) = R. (Note
that there need not be a k such that R^k = R^0: an example is
R = {⟨a, b⟩, ⟨b, a⟩, ⟨c, a⟩}.)
16. (a) Since k binary digits are used to represent each representable integer, the
carrier has 2^k elements. The variety is a group, because 0 is an additive identity
and if 2^k − x is added to any representable integer x, the result will be 0.
(b) The carrier still has 2^k elements, but the variety is a monoid with identity
element 0. For every representable x and y, the operation ⊕ of the monoid is
defined by x ⊕ y = min(x + y, 2^k − 1).
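The difference between the two answers to Problem 16 can be seen by brute force on a small word size. In the sketch below the function names are illustrative, and part (a) is assumed to use addition modulo 2^k, as the remark about 2^k − x suggests.

```python
from typing import Callable

Op = Callable[[int, int, int], int]


def mod_add(x: int, y: int, k: int) -> int:
    """Assumed operation of part (a): addition modulo 2^k."""
    return (x + y) % 2**k


def sat_add(x: int, y: int, k: int) -> int:
    """Operation of part (b): addition that sticks at the largest value 2^k - 1."""
    return min(x + y, 2**k - 1)


def is_associative(op: Op, k: int) -> bool:
    r = range(2**k)
    return all(op(op(x, y, k), z, k) == op(x, op(y, z, k), k)
               for x in r for y in r for z in r)


def has_inverses(op: Op, k: int) -> bool:
    """True if every element x has some y with x op y == 0 (the identity)."""
    r = range(2**k)
    return all(any(op(x, y, k) == 0 for y in r) for x in r)


k = 3
print(is_associative(mod_add, k), has_inverses(mod_add, k))   # True True
print(is_associative(sat_add, k), has_inverses(sat_add, k))   # True False
```

Both operations are associative with identity 0, but only the modular one supplies inverses, which is why (a) is a group and (b) is only a monoid.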

Section 7.3
1. (a) An isomorphism is a bijective map from one carrier to another; if the carriers
of two algebras have different cardinalities, then no bijections exist from one to
the other.
(b) Let A, = <{a, b}, o> and A, = <{ce, d}, [_}>, where o and (] have the operation
tables o lab led

a c ce oe

2. Let A_1 = ⟨S_1, ∘, k_1⟩, A_2 = ⟨S_2, □, k_2⟩ and A_3 = ⟨S_3, △, k_3⟩.

(i) The relation ~ is reflexive since 1_S is an isomorphism from any algebra ⟨S, ∘, k⟩
to itself.
(ii) Suppose A_1 ~ A_2; then there is some isomorphism h from A_1 to A_2. We will
show h^(-1) is an isomorphism from A_2 to A_1. The inverse h^(-1) exists, since h is a
bijection from S_1 to S_2. Choose elements c, d in S_2, and suppose h(a) = c and
h(b) = d. Then, since h is a homomorphism,
h(a ∘ b) = h(a) □ h(b) = c □ d,
and
h(k_1) = k_2.
It follows that
h^(-1)(c □ d) = h^(-1)(h(a) □ h(b))
             = h^(-1)(h(a ∘ b))
             = h^(-1)h(a ∘ b)
             = a ∘ b
             = h^(-1)(c) ∘ h^(-1)(d),
and
h^(-1)(k_2) = h^(-1)(h(k_1)) = h^(-1)h(k_1) = k_1.
Thus h^(-1) is an isomorphism from A_2 to A_1, which establishes that A_2 ~ A_1
and that ~ is symmetric.
(iii) Suppose A_1 ~ A_2 and A_2 ~ A_3, and let h be an isomorphism from A_1 to A_2
and g be an isomorphism from A_2 to A_3. We show that gh is an isomorphism
from A_1 to A_3:
gh(a ∘ b) = g(h(a ∘ b))
          = g(h(a)) △ g(h(b))
          = gh(a) △ gh(b).
Moreover,
gh(k_1) = g(k_2) = k_3.
It follows that A_1 ~ A_3 and that ~ is transitive. ∎
5. (Proof of Theorem 7.3.3b) Let A = ⟨S, ∘, 1⟩ be a monoid and A′ = ⟨S′, ∘′, 1′⟩
(note that A′ need not be a monoid). The same proof given in the text for part (a)
of the Theorem establishes that the operation ∘′ is closed and associative over the set
h(S). To show that 1′ is an identity with respect to ∘′ for the set h(S), we note that
h(1) = 1′ since h is a homomorphism from A to A′. Then for any x ∈ h(S), there is
some a ∈ S such that h(a) = x, and
1′ ∘′ x = h(1) ∘′ h(a) = h(1 ∘ a) = h(a) = x.
Thus 1′ is an identity for the set h(S) and hence ⟨h(S), ∘′, 1′⟩ is a monoid. ∎
6. (a) The function f: N → S is defined by
f(n) = n mod 2^k
and is a homomorphism since
f(a + b) = (a + b) mod 2^k = (a mod 2^k + b mod 2^k) mod 2^k
         = a mod 2^k ⊕ b mod 2^k
         = f(a) ⊕ f(b).

(c) If we use ~~ to denote concatenation, the function f can be represented as


f(r) = 07 (nmod 24!) < 2*-1,
= O07 n ~~ forn
= 1 (n mod 24-1) for n> 2*-!,
Ifa+6 < 2*-', then both a and 6 are less than 2*~!, and
f(a + 6) =0~™ (a+ b)
=07a@0~b
= f(a) © f().
Ifa -+ b> 2*-1, then
f(a + b) =1~ (a+ db) mod 2*7!
= x ~~ (amod 2*~!) @ y ~ (6 mod 24!)
x = 1 and
where x = Oor y = Oor y = 1. In any case,

fla + b) =f@ Os),


which establishes that fis a homomorphism.

Section 7.4

1. We first show that ~ is an equivalence relation.

(i) Reflexivity: p/q ~ p/q, since pq = pq.
(ii) Symmetry: p/q ~ r/s ⇒ ps = rq ⇒ rq = ps ⇒ r/s ~ p/q.
(iii) Transitivity:
p/q ~ r/s ∧ r/s ~ t/u
⇒ ps = rq ∧ ru = ts              by definition of ~
⇒ ps·ut = rq·ut ∧ ru = ts        multiplication by ut
⇒ pu·ts = tq·ru ∧ ru = ts        by commutativity of ·

We must now proceed by cases.

Case 1: If ru and ts are nonzero, then pu = tq by cancellation, and we
conclude that p/q ~ t/u.
Case 2: If ru and ts are equal to 0, then r = t = 0, because u and s cannot be
0. Since ps = rq = 0 and s cannot be 0, it follows that p = 0. Then pu = tq, and we
conclude that p/q ~ t/u. It follows that ~ is transitive.
We next show that for arbitrary fractions a, b and c, if a ~ b then a + c ~ b + c.
Let a = p/q, b = r/s, and c = t/u.
a ~ b ⇒ p/q ~ r/s ⇒ ps = rq.

Then
a + c ~ b + c ⇔ p/q + t/u ~ r/s + t/u
              ⇔ (pu + tq)/qu ~ (ru + ts)/su
              ⇔ (pu + tq)su = (ru + ts)qu.
But since ps = rq, (pu + tq)su = psuu + tqsu = rquu + tqsu = (ru + ts)qu. Hence,
a + c ~ b + c. Moreover, since + is commutative, a ~ b ⇒ c + a ~ c + b. To show
that a ~ b implies a − c ~ b − c, the preceding proof can be altered by replacing
each occurrence of + by −. To prove that a ~ b implies that c − a ~ c − b,
however, we cannot appeal to commutativity because − is not commutative. But the
proof is essentially the same; we need only show
(tq − pu)us = (ts − ru)uq.
To show that ~ is a congruence relation for unary minus, we observe that

p/q ~ r/s ⇒ ps = rq ⇒ −ps = −rq ⇒ (−p)/q ~ (−r)/s ⇒ −(p/q) ~ −(r/s).
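The congruence property can be exercised on concrete fractions. In the sketch below a fraction is an unreduced pair (numerator, denominator) with nonzero denominator; the helper names are illustrative and the operations follow the formulas used in the proof.

```python
from typing import Tuple

Frac = Tuple[int, int]          # (numerator, denominator), denominator != 0


def equivalent(a: Frac, b: Frac) -> bool:
    """p/q ~ r/s if and only if ps = rq."""
    (p, q), (r, s) = a, b
    return p * s == r * q


def add(a: Frac, c: Frac) -> Frac:
    """p/q + t/u = (pu + tq)/qu, left unreduced on purpose."""
    (p, q), (t, u) = a, c
    return p * u + t * q, q * u


def sub(a: Frac, c: Frac) -> Frac:
    """p/q - t/u = (pu - tq)/qu."""
    (p, q), (t, u) = a, c
    return p * u - t * q, q * u


def neg(a: Frac) -> Frac:
    return -a[0], a[1]


a, b, c = (1, 2), (2, 4), (3, 5)
print(equivalent(a, b))                     # True
print(add(a, c), add(b, c))                 # (11, 10) (22, 20)
print(equivalent(add(a, c), add(b, c)))     # True
print(equivalent(sub(c, a), sub(c, b)))     # True
```

Keeping the pairs unreduced matters here: the point of the exercise is that equivalent inputs give equivalent, not necessarily identical, outputs.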
3. (a) This is not a congruence relation, since
1~ —2but —-1+1% —-2+1.
(b) This is not a congruence relation because it is not an equivalence relation; it is
reflexive and symmetric, but not transitive.
4. Let ~ be any equivalence relation over {0, 1, 2, ..., k} such that every equivalence
class of ~ is a sequence of successive integers:
a ~ b ⇒ ∀x[a ≤ x ≤ b ⇒ a ~ x].
Then ~ is a congruence relation on the algebra ⟨{0, 1, 2, ..., k}, max⟩.
5. (a) Since K is an ideal, K ∘ 0 ⊆ K. But by the properties of a zero element,
K ∘ 0 = {0}. Therefore, {0} ⊆ K; i.e., 0 ∈ K.
6. The set of multiples of any integer k is an ideal of ⟨I, ·⟩.
The relation ~ is a congruence relation on ⟨S, □⟩ if ~ is an equivalence relation
and for all a, b, c, d ∈ S, if a ~ b, then
(i) □(a, c, d) ~ □(b, c, d)
(ii) □(c, a, d) ~ □(c, b, d)
(iii) □(c, d, a) ~ □(c, d, b).
From these conditions we can show the following (which can be used as an
alternative definition):
An equivalence relation ~ is a congruence relation over ⟨S, □⟩ if and only if
for all elements a, a′, b, b′, c, c′ ∈ S,

a ~ a′ ∧ b ~ b′ ∧ c ~ c′ ⇒ □(a, b, c) ~ □(a′, b′, c′).


9. (a) [Λ] consists of the set {continue}*, where * denotes the star closure.
(b) For any string x, we can obtain the shortest string in [x] by the following
procedure.
(i) Delete all occurrences of continue from x, giving x′.
(ii) Delete all symbols to the right of the leftmost occurrence of end in x′.
(c) Since x·continue ~ continue·x ~ x,
[x]·[continue] = [x·continue] = [x], and
[continue]·[x] = [continue·x] = [x].

Section 7.5

1. If n = 0, then m = 0 and the congruence relation on A is the universal relation
N × N. In this case, the quotient algebra is isomorphic to ⟨{0}, +⟩, where 0 + 0 = 0.
If n ≠ 0, then the function h is injective and the congruence relation induced on A
is equality. In this case, the quotient algebra is isomorphic to A.
2. Define the map f from A/~ to ⟨h(S), ∘′, △′, k′⟩ as follows:

f: S/~ → h(S);
f([x]) = h(x).
(i) We first show that f is well-defined by showing that if [x] = [y], then
f([x]) = f([y]). If [x] = [y], then x ~ y and therefore h(x) = h(y). Since
f([x]) = h(x) and f([y]) = h(y), it follows that f([x]) = f([y]). Therefore f is
well-defined.
(ii) We next show that f is a bijection.
To show that f is injective, we note that
[x] ≠ [y] ⇒ x ≁ y
          ⇒ h(x) ≠ h(y)
          ⇒ f([x]) ≠ f([y]).
To show f is surjective, we observe that for any a ∈ S′,
a ∈ h(S) ⇒ ∃x[h(x) = a]
         ⇒ ∃x[f([x]) = a].
(iii) We now show that f preserves the operations. We use ∘ and △ to denote the
operations of A/~ as well as those of A; ∘′ and △′ are the operations of A′.
f([x] ∘ [y]) = f([x ∘ y])
             = h(x ∘ y)
             = h(x) ∘′ h(y)
             = f([x]) ∘′ f([y])
f(△[x]) = f([△x])
        = h(△x)
        = △′h(x)
        = △′f([x])
Thus the map f preserves the operations. Moreover, f([k]) = h(k) = k′, so the
constant of A/~ is mapped to that of ⟨h(S), ∘′, △′, k′⟩. Thus f is an isomorphism.
3. The product monoid is <S × S', ⊡, <1, 1'>>, where
<a, b> ⊡ <c, d> = <a ∘ c, b ∘' d>.

The operation is associative, since

(<a, b> ⊡ <c, d>) ⊡ <e, f> = <a ∘ c, b ∘' d> ⊡ <e, f>
= <(a ∘ c) ∘ e, (b ∘' d) ∘' f>
= <a ∘ (c ∘ e), b ∘' (d ∘' f)>
= <a, b> ⊡ <c ∘ e, d ∘' f>
= <a, b> ⊡ (<c, d> ⊡ <e, f>).
Furthermore, <1, 1'> is a left identity, since

<1, 1'> ⊡ <a, b> = <1 ∘ a, 1' ∘' b> = <a, b>;



an analogous proof can be used to show that <1, 1'> is a right identity. It follows that
the product algebra of two monoids is a monoid. ∎
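
The associativity and identity laws for a product monoid can also be confirmed exhaustively for small carriers. The following Python sketch assumes two monoids picked purely for illustration (they do not come from the text): the integers mod 3 under addition, and the truth values under logical and. The names `prod_op`, `carrier`, and `identity` are likewise introduced here.

```python
from itertools import product

# Two small monoids, chosen only for illustration.
S1, op1, e1 = range(3), (lambda a, c: (a + c) % 3), 0        # integers mod 3, +, 0
S2, op2, e2 = (False, True), (lambda b, d: b and d), True    # truth values, and, True

carrier = list(product(S1, S2))   # all pairs <a, b>
identity = (e1, e2)               # the pair of the two identities

def prod_op(p, q):
    """Combine first components with op1 and second components with op2."""
    return (op1(p[0], q[0]), op2(p[1], q[1]))

# Check associativity over every triple of pairs.
associative = all(
    prod_op(prod_op(p, q), r) == prod_op(p, prod_op(q, r))
    for p, q, r in product(carrier, repeat=3)
)
# Check that the pair of identities is both a left and a right identity.
two_sided_identity = all(
    prod_op(identity, p) == p == prod_op(p, identity) for p in carrier
)
```

Both checks succeed because each component inherits the law from its own monoid, which is exactly the structure of the proof.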
6. (a) Always.
(b) This is easily shown by establishing that the function
h: A → (A × A')/~,
h(x) = [x] = {<x, »>},
is an isomorphism.
7. (a) This can be shown by constructing the operation tables of the two algebras
and showing that they are identical except for notation. In particular, the
map f such that f(<0, 0>) = 0, f(<1, 1>) = 1, f(<0, 2>) = 2, f(<1, 0>) = 3,
f(<0, 1>) = 4 and f(<1, 2>) = 5 is an isomorphism from A₂ × A₃ to A₆.
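
Instead of writing out both operation tables by hand, the comparison can be done mechanically. The sketch below assumes, since the problem statement is not reproduced here, that the three algebras are the integers mod 2, mod 3, and mod 6 under modular addition, and that the product uses componentwise addition. The helper name `add_pair` is hypothetical.

```python
from itertools import product

# The map from the answer, written as a lookup table on pairs.
f = {(0, 0): 0, (1, 1): 1, (0, 2): 2, (1, 0): 3, (0, 1): 4, (1, 2): 5}

def add_pair(p, q):
    """Componentwise addition: mod 2 in the first slot, mod 3 in the second (assumed)."""
    return ((p[0] + q[0]) % 2, (p[1] + q[1]) % 3)

pairs = list(product(range(2), range(3)))

# f is a bijection onto {0, ..., 5}.
is_bijection = sorted(f[p] for p in pairs) == list(range(6))
# f carries componentwise addition to addition mod 6.
preserves_addition = all(
    f[add_pair(p, q)] == (f[p] + f[q]) % 6 for p, q in product(pairs, repeat=2)
)
```

Under these assumptions both conditions hold, so the two operation tables agree up to renaming of elements.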
BIBLIOGRAPHY

Axo, ALFRED V., JoHN E. Hopcrort, AND JEFFREY D. ULLMAN, The Design and Analysis
of Computer Algorithms. Reading, Mass.: Addison-Wesley, 1974.
Ano, ALFRED V., AND JEFFREY D. ULLMAN, The Theory of Parsing, Translation, and Com-
piling. Englewood Cliffs, N.J.: Prentice-Hall, Inc., 1972.
BELLMAN, RICHARD, KENNETH L. COOKE, AND Jo ANN Lockett, Algorithms, Graphs and
Computers. New York: Academic Press, 1970.
BUSACKER, RoBERT G., AND THomas L. Saaty, Finite Graphs and Networks; An Introduc-
tion with Applications. New York: McGraw-Hill, 1965.
CouEN, Paut J., Set Theory and the Continuum Hypothesis. New York: W. A. Benjamin,
1966.
Coun, P. M., Universal Algebra. New York: Harper & Row, 1965.
DeLonc, Howarp, A Profile of Mathematical Logic. Reading, Mass.: Addison-Wesley,
1970.
Deo, NaR sIN GH, Gra ph The ory with Appl icat ions to Eng ine eri ng and Com put er Scie nce.
Englewood Cliffs, N.J.: Prentice-Hall, Inc., 1974.
Exspas, B., ET AL., “An Ass ess men t of Tec hni que s for Pro vin g Pro gra m Cor rec tne ss, ”
ACM Computing Surveys, Volume 4, Number 2, June 1972.
Even, SHimMon, Algorithmic Combinatorics. New York: The Macmillan Co., 1973.

Fioyp, R. W., “Assigning Mea nin gs to Pro gra ms, ” in Mat hem ati cal Aspe cts of Com put er
Science, Proc . Sym p. Appl . Mat h., Vol ume 19, ed. J. T. Schw artz ., Pro vid enc e, R. I:
American Mathematical Society, 1967.
FRALEIGH, J. B., A Firs t Cou rse in Abs tra ct Alg ebr a. Rea din g, Mas s.: Add iso n-W esl ey,
1969.
Git, ArTHUR, Applied Alg ebr a for the Com put er Sci enc es. Eng lew ood Clif fs, N.J .:
Prentice-Hall, Inc., 1976.
GRATZER, G., Universal Algebra. New York: Van Nostrand, 1968.

391
392 BIBLIOGRAPHY

Hautmos, PauL R., Naive Set Theory. New York: Van Nostrand, 1960.
HEerSTEIN, I. N., Topics in Algebra. Waltham, Mass.: Blaisdell, 1964.
Hoare, C. A. R., “An axiomatic basis for computer programming,” Communications of
the ACM, Volume 12, Number 10, October, 1969.
Knut, D.E., The Art of Computer Programming; Vol. I/ Fundamental Algorithms
(2nd Ed.). Reading, Mass.: Addison-Wesley, 1973.
Knutu, D. E., The Art of Computer Programming; Vol. 3/ Sorting and Searching, Reading,
Mass.: Addison-Wesley, 1973.
Knutu, D. E., Surreal Numbers, Reading, Mass.: Addison-Wesley, 1974.
KRrivINE, Jean-Louis, Introduction to Axiomatic Set Theory. Dordrecht, Holland: D.
Reid! Publishing Co., 1971.
LANDAU, EpMuUND, Foundations of Analysis. New York: Chelsea Publishing Co., 1951.
Liu, C. L., Introduction to Combinatorial Mathematics. New York: McGraw-Hill, 1968.
MACLANE, SAUNDERS, AND GARRETT BirkHorFF, Algebra. New York: The Macmillan Co.,
1967.
Mak}, D.P., and M. THompson, Mathematical Models with Applications. Englewood
Cliffs, N.J.: Prentice-Hall, Inc., 1973.
Manna, ZOHAR, Mathematical Theory of Computation. New York: McGraw-Hill, 1974.
Minsky, MARVIN, Computation: Finite and Infinite Machines. Englewood Cliffs, N.J.:
Prentice-Hall, 1967. ;

Monk, J. DonaLp, Introduction to Set Theory. New York: McGraw-Hill, 1969.


NIVEN, Ivan, Mathematics of Choice, or How to Count Without Counting. New York:
Random House, 1965.
REsSCHER, NICHOLAS, Many-valued Logic. New York: McGraw-Hill, 1969.
Roserts, Frep S., Discrete Mathematical Models. Englewood Cliffs, N.J.: Prentice-Hall,
Inc., 1976.
ROSEN, SAUL, ED., Programming Systems and Languages. New York: McGraw-Hill, 1967.
SHOENFIELD, JosepH R., Mathematical Logic. Reading, Mass.: Addison-Wesley, 1967.
STEINHAUS, H., Mathematical Snapshots (3rd American ed.). New York: Oxford Univer-
sity Press, 1969.
STOLL, RoBert R., Set Theory and Logic. San Fransisco: W. H. Freeman and Co., 1963.
Stone, Haroip S., Discrete Mathematical Structures. Chicago: Science Research
Associates, 1973.
Suppes, PATRICK C., Axiomatic Set Theory. New York: Van Nostrand, 1960.
VILENKIN,N. YA., Stories About Sets. New York: Academic Press, 1968.
WILDER, RAYMOND L., Introduction to the Foundations of Mathematics (2nd ed.). New
York: John Wiley, 1965,